Why Tool Error Handling Matters

In production agent systems, tools fail. APIs time out, databases go down, rate limits trigger, and invalid inputs slip through. Without proper error handling, a single tool failure can crash your entire agent run or produce confusing outputs.

The OpenAI Agents SDK provides three mechanisms to handle tool failures gracefully:

Timeouts — prevent tools from hanging indefinitely
failure_error_function — customize what the agent sees when a tool fails
tool_error_formatter — format Python exceptions into agent-friendly messages

Setting Tool Timeouts

Every function tool accepts a timeout parameter that limits how long the tool can run before being cancelled. This is critical for tools that call external APIs:

flowchart LR
    INPUT(["User input"])
    AGENT["Agent<br/>name plus instructions"]
    HAND{"Handoff to<br/>another agent?"}
    SUB["Sub-agent<br/>specialist"]
    GUARD{"Guardrail<br/>passed?"}
    TOOL["Tool call"]
    SDK[("Tracing<br/>OpenAI dashboard")]
    OUT(["Final output"])
    INPUT --> AGENT --> HAND
    HAND -->|Yes| SUB --> GUARD
    HAND -->|No| GUARD
    GUARD -->|Yes| TOOL --> AGENT
    GUARD -->|Block| OUT
    AGENT --> OUT
    AGENT --> SDK
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style SDK fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff

from agents import function_tool

@function_tool(timeout=10)
async def call_slow_api(query: str) -> str:
    """Search an external API that might be slow."""
    import httpx
    async with httpx.AsyncClient() as client:
        response = await client.get(
            "https://api.example.com/search",
            params={"q": query},  # let httpx URL-encode the query string
            timeout=8.0,
        )
        return response.text

The timeout value is in seconds. If the tool does not return within that window, the SDK cancels the execution and reports a failure to the agent. Also set a timeout on the HTTP client itself, as the example does. Keeping it shorter than the tool timeout (8 seconds against 10) lets the network call fail first with a specific error instead of being cut off by the tool-level limit.
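To make the cancel-on-timeout behavior concrete, here is a minimal standard-library sketch of the same idea using asyncio.wait_for. It illustrates the pattern only and is not the SDK's actual implementation; slow_tool and run_with_timeout are hypothetical names introduced for this example.

```python
import asyncio


async def slow_tool(delay: float) -> str:
    """Hypothetical stand-in for a tool that waits on a slow external service."""
    await asyncio.sleep(delay)
    return "search results"


async def run_with_timeout(delay: float, timeout: float) -> str:
    """Run slow_tool, cancelling it if it exceeds `timeout` seconds."""
    try:
        # wait_for cancels the awaited coroutine once the deadline passes.
        return await asyncio.wait_for(slow_tool(delay), timeout=timeout)
    except asyncio.TimeoutError:
        # Report the failure as text instead of letting the exception escape.
        return f"Tool timed out after {timeout} seconds."


fast = asyncio.run(run_with_timeout(delay=0.01, timeout=1.0))
slow = asyncio.run(run_with_timeout(delay=1.0, timeout=0.05))
```

The first call finishes inside its window and returns the tool's result; the second is cancelled and returns the timeout message, which is roughly what the agent would see in place of a hung tool call.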

Handling Tool Failures with failure_error_function

When a tool raises an exception, the default behavior is to send the error message back to the agent as a tool result. You can customize this with failure_error_function:

from agents import function_tool, RunContextWrapper

def handle_weather_failure(
    ctx: RunContextWrapper,
    error: Exception,
) -> str:
    """Return a user-friendly message when the weather tool fails."""
    return "The weather service is currently unavailable. Please suggest the user try again in a few minutes."

@function_tool(failure_error_function=handle_weather_failure)
async def get_weather(city: str) -> str:
    """Get current weather for a city."""
    import httpx
    async with httpx.AsyncClient() as client:
        response = await client.get(
            f"https://weather-api.example.com/{city}"
        )
        response.raise_for_status()
        data = response.json()
        return f"{city}: {data['temp']}F, {data['condition']}"

The failure_error_function receives the run context and the exception, and returns a string that is sent to the agent as the tool result. This lets you control what the agent sees: instead of a raw exception message, it gets a clear instruction about what to tell the user.
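Because the handler is an ordinary function of the context and the exception, it can also branch on the exception type. The sketch below is one possible shape, not a prescribed pattern: CityNotFoundError and describe_weather_failure are hypothetical names, and ctx is typed as Any so the snippet runs without the SDK installed (in real use it would be a RunContextWrapper).

```python
from typing import Any


class CityNotFoundError(Exception):
    """Hypothetical error raised when the weather service does not know the city."""


def describe_weather_failure(ctx: Any, error: Exception) -> str:
    """Map an exception to the message the agent should receive."""
    if isinstance(error, CityNotFoundError):
        return "No weather data exists for that city. Ask the user to check the spelling."
    if isinstance(error, TimeoutError):
        return "The weather service timed out. Suggest that the user try again shortly."
    # Fallback for anything unexpected.
    return "The weather service is currently unavailable. Suggest that the user try again in a few minutes."


# The handler is a plain function, so it can be exercised without running an agent.
not_found = describe_weather_failure(None, CityNotFoundError("Atlantis"))
timed_out = describe_weather_failure(None, TimeoutError())
fallback = describe_weather_failure(None, ValueError("bad payload"))
```

Keeping the handler free of side effects like this makes it easy to unit test each branch before wiring it into a tool.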

Formatting Errors at the Agent Level

While failure_error_function works per-tool, you can set a global error formatter at the agent level using tool_error_formatter. This applies to all tools on the agent:

from agents import Agent, function_tool, RunContextWrapper

def format_tool_error(
    ctx: RunContextWrapper,
    tool_name: str,
    error: Exception,
) -> str:
    """Format tool errors consistently across all tools."""
    return f"Tool '{tool_name}' failed: {type(error).__name__}. Please try a different approach or inform the user about the issue."

@function_tool
def query_database(sql: str) -> str:
    """Run a read-only SQL query."""
    raise ConnectionError("Database connection timed out")

@function_tool
def send_email(to: str, subject: str, body: str) -> str:
    """Send an email to a recipient."""
    raise TimeoutError("SMTP server not responding")

agent = Agent(
    name="Office Assistant",
    instructions="You help with database queries and emails. If a tool fails, explain the issue clearly and suggest alternatives.",
    tools=[query_database, send_email],
    tool_error_formatter=format_tool_error,
)

The tool_error_formatter receives the tool name along with the error, so you can log, categorize, or route errors differently based on which tool failed.
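As a sketch of routing by tool name, the formatter below looks up tool-specific advice in a dictionary. It assumes the (ctx, tool_name, error) signature used in the example above, which is worth confirming against your installed SDK version; RECOVERY_HINTS and format_tool_error_by_tool are hypothetical names, and ctx is typed as Any so the snippet runs on its own.

```python
from typing import Any

# Hypothetical mapping from tool name to the advice the agent should receive.
RECOVERY_HINTS = {
    "query_database": "Tell the user the data is temporarily unavailable.",
    "send_email": "Offer to draft the email so the user can send it manually.",
}


def format_tool_error_by_tool(ctx: Any, tool_name: str, error: Exception) -> str:
    """Build an error message whose advice depends on which tool failed."""
    hint = RECOVERY_HINTS.get(tool_name, "Try a different approach or inform the user.")
    return f"Tool '{tool_name}' failed with {type(error).__name__}. {hint}"


message = format_tool_error_by_tool(
    None, "send_email", TimeoutError("SMTP server not responding")
)
unknown = format_tool_error_by_tool(None, "resize_image", ValueError("bad size"))
```

Tools without an entry in the mapping fall back to a generic hint, so adding a new tool never leaves the agent without guidance.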

Combining Timeouts with Error Handlers

In production, you want both — timeouts to prevent hanging, and error handlers to recover gracefully:

import logging

import httpx
from agents import function_tool, RunContextWrapper

logger = logging.getLogger(__name__)

def handle_api_failure(ctx: RunContextWrapper, error: Exception) -> str:
    logger.error("API tool failed: %s", error)
    # httpx raises its own timeout exceptions, which do not subclass the built-in TimeoutError
    if isinstance(error, (TimeoutError, httpx.TimeoutException)):
        return "The external service took too long to respond. Please try again or ask a different question."
    return f"An error occurred: {error}. Please try a different approach."

@function_tool(timeout=15, failure_error_function=handle_api_failure)
async def enrich_company_data(domain: str) -> str:
    """Look up company information from a domain name."""
    async with httpx.AsyncClient() as client:
        resp = await client.get(f"https://api.enrichment.com/{domain}")
        resp.raise_for_status()
        return resp.text

Defensive Tool Design Patterns

Beyond the SDK's built-in mechanisms, follow these patterns for resilient tools:

Validate inputs early. Check parameters before doing expensive work:

@function_tool
def transfer_funds(from_account: str, to_account: str, amount: float) -> str:
    """Transfer funds between accounts."""
    if amount <= 0:
        return "Error: Transfer amount must be positive."
    if amount > 10000:
        return "Error: Transfers over $10,000 require manual approval."
    # Proceed with transfer...
    return f"Transferred ${amount:.2f} from {from_account} to {to_account}."

Return errors as strings instead of raising. When a failure is expected and recoverable, return an error message as a normal tool result rather than raising an exception. This gives the agent clear information without triggering the error-handling machinery:

@function_tool
def lookup_order(order_id: str) -> str:
    """Look up an order by ID."""
    if not order_id.startswith("ORD-"):
        return "Invalid order ID format. Order IDs start with 'ORD-' followed by a number."
    # Normal lookup logic...
    return f"Order {order_id}: shipped, arriving March 15."

Log errors for observability. The agent gets a friendly message, but your monitoring system should see the real error:

def handle_failure_with_logging(ctx: RunContextWrapper, error: Exception) -> str:
    logger.exception("Tool failed", exc_info=error)
    # Send to your error tracking service
    return "This operation failed. Please try again or contact support."
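One way to connect the friendly message back to the logged error is to include a short reference ID in both. The sketch below is an assumption about how you might do this, not part of the SDK: handle_failure_with_reference and ListHandler are hypothetical names, the in-memory handler exists only so the example can inspect what was logged, and ctx is typed as Any so the snippet runs without the SDK installed.

```python
import logging
import uuid
from typing import Any

logger = logging.getLogger("tool_errors")


def handle_failure_with_reference(ctx: Any, error: Exception) -> str:
    """Log the full error with a reference ID and return a friendly message."""
    reference = uuid.uuid4().hex[:8]
    # Passing the exception instance as exc_info attaches it to the log record.
    logger.error("Tool failed [ref=%s]", reference, exc_info=error)
    return f"This operation failed (reference {reference}). Please try again or contact support."


class ListHandler(logging.Handler):
    """Collects log records in memory so the example can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


capture = ListHandler()
logger.addHandler(capture)
reply = handle_failure_with_reference(
    None, ConnectionError("Database connection timed out")
)
```

The agent-facing reply carries only the reference ID, while the log record keeps the original exception, so support staff can look up the real cause from whatever the user reports.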

Key Takeaways

Set timeout on every tool that calls external services
Use failure_error_function for per-tool error messages
Use tool_error_formatter for agent-wide error formatting
Validate inputs early and return error strings for recoverable issues
Always log the real error for your team while sending friendly messages to the agent
