Tool Timeouts and Error Handling in Agent Tool Pipelines
Learn how to build resilient agent tool pipelines using timeouts, failure_error_function, and tool_error_formatter in the OpenAI Agents SDK.
Why Tool Error Handling Matters
In production agent systems, tools fail. APIs time out, databases go down, rate limits trigger, and invalid inputs slip through. Without proper error handling, a single tool failure can crash your entire agent run or produce confusing outputs.
The OpenAI Agents SDK provides three mechanisms to handle tool failures gracefully:
- Timeouts — prevent tools from hanging indefinitely
- failure_error_function — customize what the agent sees when a tool fails
- tool_error_formatter — format Python exceptions into agent-friendly messages
Setting Tool Timeouts
Every function tool accepts a timeout parameter that limits how long the tool can run before being cancelled. This is critical for tools that call external APIs:
import httpx
from agents import function_tool

@function_tool(timeout=10)
async def call_slow_api(query: str) -> str:
    """Search an external API that might be slow."""
    async with httpx.AsyncClient() as client:
        # Pass the query via params so httpx URL-encodes it.
        response = await client.get(
            "https://api.example.com/search",
            params={"q": query},
            timeout=8.0,  # shorter than the tool timeout, so the request fails first
        )
        return response.text
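The timeout value is in seconds. If the tool does not return within that window, the SDK cancels the execution and reports a failure to the agent. Also set a timeout on your HTTP client, as shown above, so that network calls fail fast. Keeping the client timeout shorter than the tool timeout (8 seconds versus 10 here) lets the request fail with a specific error before the tool-level timeout fires.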
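To make the cancel-and-report behavior concrete, the sketch below reproduces it with only the standard library. It is not the SDK's implementation: run_with_timeout and slow_tool are hypothetical names, and asyncio.wait_for stands in for whatever mechanism the SDK uses internally.

```python
import asyncio


async def slow_tool(delay: float) -> str:
    """Hypothetical tool body that simulates a slow external call."""
    await asyncio.sleep(delay)
    return "done"


async def run_with_timeout(delay: float, timeout: float) -> str:
    """Run slow_tool, converting a timeout into a tool-style result string."""
    try:
        # wait_for cancels the inner coroutine once the deadline passes.
        return await asyncio.wait_for(slow_tool(delay), timeout=timeout)
    except asyncio.TimeoutError:
        return f"Tool timed out after {timeout} seconds."


fast = asyncio.run(run_with_timeout(delay=0.01, timeout=1.0))
slow = asyncio.run(run_with_timeout(delay=1.0, timeout=0.05))
```

The first call finishes inside its window and returns the tool's own result. The second is cancelled at the deadline, and the caller receives a message it can act on instead of waiting indefinitely.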
Handling Tool Failures with failure_error_function
When a tool raises an exception, the default behavior is to send the error message back to the agent as a tool result. You can customize this with failure_error_function:
from urllib.parse import quote

import httpx
from agents import RunContextWrapper, function_tool

def handle_weather_failure(
    ctx: RunContextWrapper,
    error: Exception,
) -> str:
    """Return a user-friendly message when the weather tool fails."""
    return "The weather service is currently unavailable. Please suggest the user try again in a few minutes."

@function_tool(failure_error_function=handle_weather_failure)
async def get_weather(city: str) -> str:
    """Get current weather for a city."""
    async with httpx.AsyncClient() as client:
        response = await client.get(
            # Percent-encode the city so spaces and slashes stay inside one path segment.
            f"https://weather-api.example.com/{quote(city, safe='')}"
        )
        response.raise_for_status()
        data = response.json()
        return f"{city}: {data['temp']}F, {data['condition']}"
The failure_error_function receives the context and the exception, and returns a string that is sent to the agent as the tool result. This lets you control what the agent reads: instead of a raw exception message, it sees a clear instruction about what to tell the user.
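A failure handler can also branch on the exception type so that the agent gets guidance matching the failure. The sketch below is a hypothetical handler written against the (ctx, error) signature described above. ctx is typed as Any so the snippet runs without the SDK installed, and the handler name and message wording are illustrative assumptions.

```python
from typing import Any


def handle_lookup_failure(ctx: Any, error: Exception) -> str:
    """Hypothetical handler: map exception types to agent-facing guidance."""
    if isinstance(error, TimeoutError):
        return "The service timed out. Suggest the user retry in a few minutes."
    if isinstance(error, ConnectionError):
        return "The service is unreachable. Tell the user it is temporarily down."
    if isinstance(error, (KeyError, ValueError)):
        return "The service returned unexpected data. Do not retry; apologize to the user."
    # Fallback: stay generic so internal details do not leak to the model.
    return "The lookup failed. Ask the user to try again later."


timeout_msg = handle_lookup_failure(None, TimeoutError("read timed out"))
missing_msg = handle_lookup_failure(None, KeyError("temp"))
other_msg = handle_lookup_failure(None, RuntimeError("boom"))
```

A handler like this would be passed to the decorator in the same way as handle_weather_failure. Keeping the fallback generic avoids echoing internal error text to the model.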
Formatting Errors at the Agent Level
While failure_error_function works per-tool, you can set a global error formatter at the agent level using tool_error_formatter. This applies to all tools on the agent:
from agents import Agent, RunContextWrapper, function_tool

def format_tool_error(
    ctx: RunContextWrapper,
    tool_name: str,
    error: Exception,
) -> str:
    """Format tool errors consistently across all tools."""
    return f"Tool '{tool_name}' failed: {type(error).__name__}. Please try a different approach or inform the user about the issue."

@function_tool
def query_database(sql: str) -> str:
    """Run a read-only SQL query."""
    # Simulated failure for demonstration.
    raise ConnectionError("Database connection timed out")

@function_tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email to a recipient."""
    # Simulated failure for demonstration.
    raise TimeoutError("SMTP server not responding")

agent = Agent(
    name="Office Assistant",
    instructions="You help with database queries and emails. If a tool fails, explain the issue clearly and suggest alternatives.",
    tools=[query_database, send_email],
    tool_error_formatter=format_tool_error,
)
The tool_error_formatter receives the tool name along with the error, so you can log, categorize, or route errors differently based on which tool failed.
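As one way to route by tool name, the formatter can consult a lookup table. This sketch keeps the (ctx, tool_name, error) signature used in the example above. TOOL_HINTS and its messages are hypothetical, and ctx is typed as Any so the snippet runs on its own.

```python
from typing import Any

# Hypothetical per-tool guidance; tools not listed fall back to a generic hint.
TOOL_HINTS = {
    "query_database": "Answer from information already in the conversation if possible.",
    "send_email": "Offer to draft the email text so the user can send it manually.",
}


def format_tool_error(ctx: Any, tool_name: str, error: Exception) -> str:
    """Build a consistent message, adding a tool-specific recovery hint."""
    hint = TOOL_HINTS.get(tool_name, "Inform the user about the issue.")
    return f"Tool '{tool_name}' failed: {type(error).__name__}. {hint}"


db_msg = format_tool_error(None, "query_database", ConnectionError("timed out"))
unknown_msg = format_tool_error(None, "get_weather", ValueError("bad city"))
```

The table keeps the message format uniform while letting each tool carry its own recovery advice. Adding a tool only requires a new entry.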
Combining Timeouts with Error Handlers
In production, you want both — timeouts to prevent hanging, and error handlers to recover gracefully:
import logging

import httpx
from agents import RunContextWrapper, function_tool

logger = logging.getLogger(__name__)

def handle_api_failure(ctx: RunContextWrapper, error: Exception) -> str:
    logger.error("API tool failed: %s", error)
    # httpx raises its own TimeoutException, which is not a TimeoutError subclass.
    if isinstance(error, (TimeoutError, httpx.TimeoutException)):
        return "The external service took too long to respond. Please try again or ask a different question."
    return f"An error occurred: {error}. Please try a different approach."

@function_tool(timeout=15, failure_error_function=handle_api_failure)
async def enrich_company_data(domain: str) -> str:
    """Look up company information from a domain name."""
    # Client timeout is shorter than the tool timeout so the request fails first.
    async with httpx.AsyncClient(timeout=10.0) as client:
        resp = await client.get(f"https://api.enrichment.com/{domain}")
        resp.raise_for_status()
        return resp.text
Defensive Tool Design Patterns
Beyond the SDK's built-in mechanisms, follow these patterns for resilient tools:
Validate inputs early. Check parameters before doing expensive work:
@function_tool
def transfer_funds(from_account: str, to_account: str, amount: float) -> str:
    """Transfer funds between accounts."""
    if amount <= 0:
        return "Error: Transfer amount must be positive."
    if amount > 10000:
        return "Error: Transfers over $10,000 require manual approval."
    # Proceed with transfer...
    return f"Transferred ${amount:.2f} from {from_account} to {to_account}."
Return errors as strings instead of raising. When a failure is expected and recoverable, return an error message as a normal tool result rather than raising an exception. This gives the agent clear information without triggering the error-handling machinery:

@function_tool
def lookup_order(order_id: str) -> str:
    """Look up an order by ID."""
    if not order_id.startswith("ORD-"):
        return "Invalid order ID format. Order IDs start with 'ORD-' followed by a number."
    # Normal lookup logic...
    return f"Order {order_id}: shipped, arriving March 15."
Log errors for observability. The agent gets a friendly message, but your monitoring system should see the real error:
def handle_failure_with_logging(ctx: RunContextWrapper, error: Exception) -> str:
    # The handler runs outside an except block, so pass the exception explicitly.
    logger.error("Tool failed", exc_info=error)
    # Send to your error tracking service
    return "This operation failed. Please try again or contact support."
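One way to connect the friendly message to the logged error is a short reference ID that appears in both. The sketch below is an assumption layered on the handler above, not an SDK feature. handle_failure_with_reference and _ListHandler are hypothetical names, uuid4 is just one way to generate the ID, and ctx is typed as Any so the snippet runs without the SDK.

```python
import logging
import uuid
from typing import Any

logger = logging.getLogger("tools")


def handle_failure_with_reference(ctx: Any, error: Exception) -> str:
    """Log the full error under a short ID and return that ID to the agent."""
    reference = uuid.uuid4().hex[:8]
    # exc_info accepts an exception instance, so its traceback (if any) is logged.
    logger.error("Tool failed [ref=%s]", reference, exc_info=error)
    return f"This operation failed (reference {reference}). Please try again or contact support."


class _ListHandler(logging.Handler):
    """Collects log messages so the demo can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.messages: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.messages.append(record.getMessage())


capture = _ListHandler()
logger.addHandler(capture)
message = handle_failure_with_reference(None, ConnectionError("db down"))
```

If the user later quotes the reference to support, the team can search the logs for that ID and find the underlying exception, while the agent never sees the internal details.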
Key Takeaways
- Set timeout on every tool that calls external services
- Use failure_error_function for per-tool error messages
- Use tool_error_formatter for agent-wide error formatting
- Validate inputs early and return error strings for recoverable issues
- Always log the real error for your team while sending friendly messages to the agent