
Tool Guardrails: Protecting Function Execution

Learn how to implement tool input and output guardrails in the OpenAI Agents SDK to validate function arguments, skip dangerous calls, and replace tool outputs before they reach the agent.

Why Tool Execution Needs Its Own Guardrails

Input guardrails catch bad user messages. Output guardrails catch bad agent responses. But between those two checkpoints, the agent calls tools — and tool calls are where the real damage happens. A malformed tool call can delete database records, send emails to the wrong recipient, charge a credit card for the wrong amount, or leak internal data through an API.

Tool guardrails in the OpenAI Agents SDK intercept tool execution at two points: before the function runs (tool input guardrails) and after it returns (tool output guardrails). They give you the ability to validate arguments, skip dangerous calls entirely, or replace tool outputs with sanitized versions.

Tool Input Guardrails: Validating Before Execution

A tool input guardrail inspects the arguments that the agent has decided to pass to a function. It runs after the LLM has generated the tool call but before the actual function executes.

The execution flow, as a Mermaid diagram:

flowchart LR
    INPUT(["User input"])
    AGENT["Agent<br/>name plus instructions"]
    HAND{"Handoff to<br/>another agent?"}
    SUB["Sub-agent<br/>specialist"]
    GUARD{"Guardrail<br/>passed?"}
    TOOL["Tool call"]
    SDK[("Tracing<br/>OpenAI dashboard")]
    OUT(["Final output"])
    INPUT --> AGENT --> HAND
    HAND -->|Yes| SUB --> GUARD
    HAND -->|No| GUARD
    GUARD -->|Yes| TOOL --> AGENT
    GUARD -->|Block| OUT
    AGENT --> OUT
    AGENT --> SDK
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style SDK fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff
First, define the tools the agent can call:

from agents import Agent, Runner, function_tool
from pydantic import BaseModel
import asyncio

@function_tool
def transfer_funds(from_account: str, to_account: str, amount: float) -> str:
    """Transfer funds between customer accounts."""
    # In production, this calls your banking API
    return f"Transferred ${amount:.2f} from {from_account} to {to_account}"

@function_tool
def get_account_balance(account_id: str) -> str:
    """Get the current balance for an account."""
    # Simulated lookup
    balances = {"ACC001": 5420.50, "ACC002": 12300.00}
    balance = balances.get(account_id, 0.0)
    return f"Account {account_id} balance: ${balance:.2f}"

Now define a tool input guardrail that validates transfer amounts:

async def transfer_amount_guardrail(ctx, agent, tool_call):
    """Block transfers above the auto-approval limit."""
    if tool_call.function.name != "transfer_funds":
        return None  # Only check transfer_funds calls

    import json
    args = json.loads(tool_call.function.arguments)
    amount = args.get("amount", 0)

    if amount > 10000:
        return {
            "skip": True,
            "replacement_output": (
                "Transfer blocked: amounts over $10,000 require "
                "manager approval. Please escalate this request."
            ),
        }

    if amount <= 0:
        return {
            "skip": True,
            "replacement_output": (
                "Transfer blocked: amount must be a positive number."
            ),
        }

    return None  # Allow the call to proceed

When the guardrail returns None, the tool call proceeds normally. When it returns a dictionary with skip: True, the actual function is never called, and the replacement_output is fed back to the agent as if the tool had returned that value.


Attaching Tool Input Guardrails to an Agent

banking_agent = Agent(
    name="BankingAgent",
    instructions="""You are a banking support agent. You can check
    account balances and transfer funds between accounts. Always
    confirm the details with the customer before executing a transfer.""",
    model="gpt-4o",
    tools=[transfer_funds, get_account_balance],
    tool_use_guardrails=[
        {
            "type": "input",
            "guardrail_function": transfer_amount_guardrail,
        },
    ],
)

This is a fundamentally different safety model from relying on prompt instructions alone. The prompt says "confirm before transferring," but the guardrail enforces a hard limit regardless of what the model decides to do.

Tool Output Guardrails: Sanitizing After Execution

Tool output guardrails run after the function returns but before the result is passed back to the agent. They are useful for redacting sensitive data, normalizing formats, or adding warnings to tool results.

async def redact_tool_output_guardrail(ctx, agent, tool_call, tool_output):
    """Redact sensitive fields from tool outputs before the agent sees them."""
    import re

    output_str = str(tool_output)

    # Redact SSNs
    output_str = re.sub(
        r"(\d{3})-(\d{2})-(\d{4})",
        r"***-**-\3",
        output_str,
    )

    # Redact credit card numbers (keep last 4)
    output_str = re.sub(
        r"\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?(\d{4})",
        r"****-****-****-\1",
        output_str,
    )

    if output_str != str(tool_output):
        return {"replacement_output": output_str}

    return None  # No modification needed

The agent sees the redacted version. It can still reference "the card ending in 4242" or "the last four of your SSN" without the full sensitive data ever appearing in the conversation context. This is critical because the full conversation context is often logged, cached, or sent to other services.
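You can sanity-check the redaction patterns in isolation before wiring them into the guardrail. This standalone snippet repeats the two substitutions from the guardrail above against a sample string:

```python
import re

def redact(text: str) -> str:
    """Apply the same SSN and card-number substitutions used in the guardrail."""
    # Redact SSNs, keeping the last four digits
    text = re.sub(r"(\d{3})-(\d{2})-(\d{4})", r"***-**-\3", text)
    # Redact 16-digit card numbers, keeping the last four digits
    text = re.sub(
        r"\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?(\d{4})",
        r"****-****-****-\1",
        text,
    )
    return text

print(redact("SSN 123-45-6789, card 4242 4242 4242 4242"))
# → SSN ***-**-6789, card ****-****-****-4242
```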

Attaching Output Guardrails

customer_agent = Agent(
    name="CustomerAgent",
    instructions="Help customers with account inquiries.",
    model="gpt-4o",
    tools=[lookup_customer, get_transactions],
    tool_use_guardrails=[
        {
            "type": "output",
            "guardrail_function": redact_tool_output_guardrail,
        },
    ],
)

Combining Input and Output Tool Guardrails

For maximum protection, apply both input and output guardrails to the same agent. Input guardrails prevent dangerous calls. Output guardrails sanitize the results of allowed calls.

secure_agent = Agent(
    name="SecureAgent",
    instructions="You are a secure financial assistant.",
    model="gpt-4o",
    tools=[transfer_funds, get_account_balance, lookup_customer],
    tool_use_guardrails=[
        {
            "type": "input",
            "guardrail_function": transfer_amount_guardrail,
        },
        {
            "type": "input",
            "guardrail_function": block_after_hours_guardrail,
        },
        {
            "type": "output",
            "guardrail_function": redact_tool_output_guardrail,
        },
    ],
)

Skipping Calls vs Replacing Output

Tool guardrails give you two distinct intervention strategies, and choosing the right one depends on the scenario.

Skipping the call means the function never executes. Use this when the tool call itself is dangerous — transferring too much money, deleting data, or calling an external API with invalid parameters.


Replacing the output means the function executes normally, but its return value is modified before the agent sees it. Use this when the function is safe to call but its output contains sensitive data that should not enter the conversation context.

async def selective_guardrail(ctx, agent, tool_call):
    """Example showing both skip and allow-with-modification patterns."""
    import json

    if tool_call.function.name == "delete_record":
        # SKIP: Never allow deletion through the agent
        return {
            "skip": True,
            "replacement_output": (
                "Record deletion is not available through this interface. "
                "Please submit a deletion request through the admin portal."
            ),
        }

    if tool_call.function.name == "search_users":
        args = json.loads(tool_call.function.arguments)
        query = args.get("query", "")
        if len(query) < 3:
            # SKIP: Prevent overly broad searches
            return {
                "skip": True,
                "replacement_output": (
                    "Search query must be at least 3 characters. "
                    "Please ask the customer for more specific information."
                ),
            }

    return None  # Allow all other calls

Real-World Pattern: Audit Logging Through Tool Guardrails

Tool guardrails are an excellent place to implement audit logging because they see every tool call the agent makes, including the arguments and outputs.

import json
from datetime import datetime

async def audit_log_guardrail(ctx, agent, tool_call):
    """Log every tool call for audit purposes. Never skip or modify."""
    args = json.loads(tool_call.function.arguments)

    audit_entry = {
        "timestamp": datetime.utcnow().isoformat(),
        "agent_name": agent.name,
        "tool_name": tool_call.function.name,
        "arguments": args,
        "session_id": getattr(ctx, "session_id", "unknown"),
    }

    # Write to your audit log (database, file, or external service)
    await write_audit_log(audit_entry)

    # Always return None — this guardrail never blocks
    return None

This guardrail observes without interfering. Every tool call is logged with full context, giving you a complete audit trail of what the agent did, when, and with what parameters. This is invaluable for compliance, debugging, and understanding agent behavior in production.
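The write_audit_log function is left undefined above. A minimal sketch that appends JSON lines to a local file — the path and JSON-lines format are our assumptions, not part of the SDK — could be:

```python
import asyncio
import json

AUDIT_LOG_PATH = "tool_audit.jsonl"  # assumed local file; swap for your log store

def _append_line(line: str) -> None:
    with open(AUDIT_LOG_PATH, "a", encoding="utf-8") as f:
        f.write(line + "\n")

async def write_audit_log(entry: dict) -> None:
    """Append one audit entry per line as JSON."""
    line = json.dumps(entry, default=str)
    # Offload the blocking file write so the event loop is not stalled
    await asyncio.to_thread(_append_line, line)
```

Using asyncio.to_thread keeps the guardrail non-blocking even though file I/O is synchronous; in production you would likely write to a database or logging service instead.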

Best Practices

Fail closed, not open. If your guardrail encounters an error during evaluation (network timeout, parsing failure), skip the tool call rather than allowing it. An errored guardrail should block, not pass.

Keep guardrail logic simple. Tool guardrails add latency to every tool call. Use fast checks — argument validation, threshold comparisons, regex matching. Reserve LLM-based evaluation for input and output guardrails where the overhead is amortized across the full request.

Test with adversarial tool calls. Craft test cases where the model generates edge-case arguments: negative amounts, empty strings, SQL injection in search queries, extremely long inputs. Your guardrails should handle all of these gracefully.
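Such tests don't need the full SDK: simple stand-in objects that mimic the tool_call shape used throughout this article are enough. The snippet below repeats the transfer_amount_guardrail logic from earlier so it runs on its own:

```python
import asyncio
import json
from types import SimpleNamespace

async def transfer_amount_guardrail(ctx, agent, tool_call):
    """Same logic as the guardrail defined earlier in the article."""
    if tool_call.function.name != "transfer_funds":
        return None
    args = json.loads(tool_call.function.arguments)
    amount = args.get("amount", 0)
    if amount > 10000:
        return {"skip": True, "replacement_output": "Transfer blocked: requires manager approval."}
    if amount <= 0:
        return {"skip": True, "replacement_output": "Transfer blocked: amount must be positive."}
    return None

def make_call(name, **args):
    """Build a minimal stand-in for the tool_call object the SDK would pass."""
    return SimpleNamespace(function=SimpleNamespace(name=name, arguments=json.dumps(args)))

# Edge cases: negative, zero, just over the limit, and a normal transfer
for amount, blocked in [(-50, True), (0, True), (10000.01, True), (250, False)]:
    call = make_call("transfer_funds", from_account="ACC001", to_account="ACC002", amount=amount)
    result = asyncio.run(transfer_amount_guardrail(None, None, call))
    assert (result is not None) == blocked, f"unexpected result for amount={amount}"
print("all adversarial cases handled")
```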

Separate concerns. Use one guardrail per concern — one for amount limits, one for audit logging, one for PII redaction. This makes them independently testable and easy to enable or disable per environment.
