Tool Guardrails: Protecting Function Execution
Learn how to implement tool input and output guardrails in the OpenAI Agents SDK to validate function arguments, skip dangerous calls, and replace tool outputs before they reach the agent.
Why Tool Execution Needs Its Own Guardrails
Input guardrails catch bad user messages. Output guardrails catch bad agent responses. But between those two checkpoints, the agent calls tools — and tool calls are where the real damage happens. A malformed tool call can delete database records, send emails to the wrong recipient, charge a credit card for the wrong amount, or leak internal data through an API.
Tool guardrails in the OpenAI Agents SDK intercept tool execution at two points: before the function runs (tool input guardrails) and after it returns (tool output guardrails). They give you the ability to validate arguments, skip dangerous calls entirely, or replace tool outputs with sanitized versions.
Tool Input Guardrails: Validating Before Execution
A tool input guardrail inspects the arguments that the agent has decided to pass to a function. It runs after the LLM has generated the tool call but before the actual function executes.
flowchart LR
INPUT(["User input"])
AGENT["Agent<br/>name plus instructions"]
HAND{"Handoff to<br/>another agent?"}
SUB["Sub-agent<br/>specialist"]
GUARD{"Guardrail<br/>passed?"}
TOOL["Tool call"]
SDK[("Tracing<br/>OpenAI dashboard")]
OUT(["Final output"])
INPUT --> AGENT --> HAND
HAND -->|Yes| SUB --> GUARD
HAND -->|No| GUARD
GUARD -->|Yes| TOOL --> AGENT
GUARD -->|Block| OUT
AGENT --> OUT
AGENT --> SDK
style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
style SDK fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
style OUT fill:#059669,stroke:#047857,color:#fff
from agents import Agent, Runner, function_tool
from pydantic import BaseModel
import asyncio

@function_tool
def transfer_funds(from_account: str, to_account: str, amount: float) -> str:
    """Transfer funds between customer accounts."""
    # In production, this calls your banking API
    return f"Transferred ${amount:.2f} from {from_account} to {to_account}"

@function_tool
def get_account_balance(account_id: str) -> str:
    """Get the current balance for an account."""
    # Simulated lookup
    balances = {"ACC001": 5420.50, "ACC002": 12300.00}
    balance = balances.get(account_id, 0.0)
    return f"Account {account_id} balance: ${balance:.2f}"
Now define a tool input guardrail that validates transfer amounts:
async def transfer_amount_guardrail(ctx, agent, tool_call):
    """Block transfers above the auto-approval limit."""
    if tool_call.function.name != "transfer_funds":
        return None  # Only check transfer_funds calls

    import json
    args = json.loads(tool_call.function.arguments)
    amount = args.get("amount", 0)

    if amount > 10000:
        return {
            "skip": True,
            "replacement_output": (
                "Transfer blocked: amounts over $10,000 require "
                "manager approval. Please escalate this request."
            ),
        }

    if amount <= 0:
        return {
            "skip": True,
            "replacement_output": (
                "Transfer blocked: amount must be a positive number."
            ),
        }

    return None  # Allow the call to proceed
When the guardrail returns None, the tool call proceeds normally. When it returns a dictionary with skip: True, the actual function is never called, and the replacement_output is fed back to the agent as if the tool had returned that value.
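Because the guardrail is a plain async function, you can exercise this contract without running the full SDK. The sketch below reproduces the guardrail so it runs standalone and uses a `SimpleNamespace` stub in place of the SDK's tool-call object — the stub's shape (`tool_call.function.name` / `.arguments`) is an assumption mirroring only the attributes the guardrail reads:

```python
import asyncio
import json
from types import SimpleNamespace

async def transfer_amount_guardrail(ctx, agent, tool_call):
    """Same logic as above, reproduced so this snippet runs standalone."""
    if tool_call.function.name != "transfer_funds":
        return None
    amount = json.loads(tool_call.function.arguments).get("amount", 0)
    if amount > 10000:
        return {"skip": True, "replacement_output": "Transfer blocked: requires manager approval."}
    if amount <= 0:
        return {"skip": True, "replacement_output": "Transfer blocked: amount must be positive."}
    return None

def make_call(name, **kwargs):
    """Stub mirroring the tool_call attributes the guardrail reads (an assumption)."""
    return SimpleNamespace(function=SimpleNamespace(name=name, arguments=json.dumps(kwargs)))

# Within the limit: None means the real function would run.
allowed = asyncio.run(transfer_amount_guardrail(None, None, make_call("transfer_funds", amount=500.0)))
# Over the limit: a skip dict means the function never executes.
blocked = asyncio.run(transfer_amount_guardrail(None, None, make_call("transfer_funds", amount=50000.0)))
print(allowed, blocked["skip"])  # → None True
```

Testing guardrails this way, as ordinary functions, keeps the validation logic verifiable independently of any model behavior.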
Attaching Tool Input Guardrails to an Agent
banking_agent = Agent(
    name="BankingAgent",
    instructions="""You are a banking support agent. You can check
    account balances and transfer funds between accounts. Always
    confirm the details with the customer before executing a transfer.""",
    model="gpt-4o",
    tools=[transfer_funds, get_account_balance],
    tool_use_guardrails=[
        {
            "type": "input",
            "guardrail_function": transfer_amount_guardrail,
        },
    ],
)
This is a fundamentally different safety model from relying on prompt instructions. The prompt says "confirm before transferring," but the guardrail enforces a hard limit regardless of what the model decides to do.
Tool Output Guardrails: Sanitizing After Execution
Tool output guardrails run after the function returns but before the result is passed back to the agent. They are useful for redacting sensitive data, normalizing formats, or adding warnings to tool results.
async def redact_tool_output_guardrail(ctx, agent, tool_call, tool_output):
    """Redact sensitive fields from tool outputs before the agent sees them."""
    import re

    output_str = str(tool_output)

    # Redact SSNs (keep last 4 digits)
    output_str = re.sub(
        r"(\d{3})-(\d{2})-(\d{4})",
        r"***-**-\3",
        output_str,
    )

    # Redact credit card numbers (keep last 4)
    output_str = re.sub(
        r"\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?(\d{4})",
        r"****-****-****-\1",
        output_str,
    )

    if output_str != str(tool_output):
        return {"replacement_output": output_str}

    return None  # No modification needed
The agent sees the redacted version. It can still reference "the card ending in 4242" or "the last four of your SSN" without the full sensitive data ever appearing in the conversation context. This is critical because the full conversation context is often logged, cached, or sent to other services.
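You can observe the redaction directly by running the same two substitutions on a sample tool output (the regex logic is reproduced here so the snippet runs standalone):

```python
import re

def redact(text: str) -> str:
    """Apply the same SSN and card-number redactions as the guardrail above."""
    # SSN: keep only the last four digits
    text = re.sub(r"(\d{3})-(\d{2})-(\d{4})", r"***-**-\3", text)
    # Card number: keep only the last group of four
    text = re.sub(r"\d{4}[-\s]?\d{4}[-\s]?\d{4}[-\s]?(\d{4})", r"****-****-****-\1", text)
    return text

sample = "Customer SSN: 123-45-6789, card on file: 4242 4242 4242 4242"
redacted = redact(sample)
print(redacted)
# → Customer SSN: ***-**-6789, card on file: ****-****-****-4242
```

Note the ordering: the SSN substitution runs first, so its nine digits can never be partially consumed by the sixteen-digit card pattern.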
Attaching Output Guardrails
customer_agent = Agent(
    name="CustomerAgent",
    instructions="Help customers with account inquiries.",
    model="gpt-4o",
    tools=[lookup_customer, get_transactions],
    tool_use_guardrails=[
        {
            "type": "output",
            "guardrail_function": redact_tool_output_guardrail,
        },
    ],
)
Combining Input and Output Tool Guardrails
For maximum protection, apply both input and output guardrails to the same agent. Input guardrails prevent dangerous calls. Output guardrails sanitize the results of allowed calls.
secure_agent = Agent(
    name="SecureAgent",
    instructions="You are a secure financial assistant.",
    model="gpt-4o",
    tools=[transfer_funds, get_account_balance, lookup_customer],
    tool_use_guardrails=[
        {
            "type": "input",
            "guardrail_function": transfer_amount_guardrail,
        },
        {
            "type": "input",
            "guardrail_function": block_after_hours_guardrail,
        },
        {
            "type": "output",
            "guardrail_function": redact_tool_output_guardrail,
        },
    ],
)
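The block_after_hours_guardrail referenced above is not shown in this article. A minimal sketch following the same pattern might look like the following — the 09:00-17:00 UTC window is an assumption for illustration, and the injectable now parameter is an addition that makes the time check testable:

```python
import asyncio
from datetime import datetime, timezone
from types import SimpleNamespace

async def block_after_hours_guardrail(ctx, agent, tool_call, now=None):
    """Skip fund transfers outside business hours (09:00-17:00 UTC, assumed)."""
    if tool_call.function.name != "transfer_funds":
        return None
    now = now or datetime.now(timezone.utc)
    if not 9 <= now.hour < 17:
        return {
            "skip": True,
            "replacement_output": (
                "Transfer blocked: transfers are only processed "
                "between 09:00 and 17:00 UTC."
            ),
        }
    return None

# Quick check with a stub tool call and fixed times:
_call = SimpleNamespace(function=SimpleNamespace(name="transfer_funds", arguments="{}"))
at_night = asyncio.run(block_after_hours_guardrail(
    None, None, _call, now=datetime(2024, 1, 1, 23, 0, tzinfo=timezone.utc)))
at_noon = asyncio.run(block_after_hours_guardrail(
    None, None, _call, now=datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)))
print(at_night["skip"], at_noon)  # → True None
```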
Skipping Calls vs Replacing Output
Tool guardrails give you two distinct intervention strategies, and choosing the right one depends on the scenario.
Skipping the call means the function never executes. Use this when the tool call itself is dangerous — transferring too much money, deleting data, or calling an external API with invalid parameters.
Replacing the output means the function executes normally, but its return value is modified before the agent sees it. Use this when the function is safe to call but its output contains sensitive data that should not enter the conversation context.
async def selective_guardrail(ctx, agent, tool_call):
    """Example showing both skip and allow-with-modification patterns."""
    import json

    if tool_call.function.name == "delete_record":
        # SKIP: Never allow deletion through the agent
        return {
            "skip": True,
            "replacement_output": (
                "Record deletion is not available through this interface. "
                "Please submit a deletion request through the admin portal."
            ),
        }

    if tool_call.function.name == "search_users":
        args = json.loads(tool_call.function.arguments)
        query = args.get("query", "")
        if len(query) < 3:
            # SKIP: Prevent overly broad searches
            return {
                "skip": True,
                "replacement_output": (
                    "Search query must be at least 3 characters. "
                    "Please ask the customer for more specific information."
                ),
            }

    return None  # Allow all other calls
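As with the transfer guardrail, this routing logic can be checked in isolation with a stubbed tool call (the guardrail is reproduced here so the snippet runs standalone; the stub's shape is an assumption about what the SDK passes):

```python
import asyncio
import json
from types import SimpleNamespace

async def selective_guardrail(ctx, agent, tool_call):
    """Same logic as above, reproduced so this snippet runs standalone."""
    if tool_call.function.name == "delete_record":
        return {"skip": True, "replacement_output": "Record deletion is not available through this interface."}
    if tool_call.function.name == "search_users":
        args = json.loads(tool_call.function.arguments)
        if len(args.get("query", "")) < 3:
            return {"skip": True, "replacement_output": "Search query must be at least 3 characters."}
    return None

def make_call(name, **kwargs):
    return SimpleNamespace(function=SimpleNamespace(name=name, arguments=json.dumps(kwargs)))

deletion = asyncio.run(selective_guardrail(None, None, make_call("delete_record", record_id="R1")))
short_search = asyncio.run(selective_guardrail(None, None, make_call("search_users", query="ab")))
good_search = asyncio.run(selective_guardrail(None, None, make_call("search_users", query="smith")))
print(deletion["skip"], short_search["skip"], good_search)  # → True True None
```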
Real-World Pattern: Audit Logging Through Tool Guardrails
Tool guardrails are an excellent place to implement audit logging because they see every tool call the agent makes, including the arguments and outputs.
import json
from datetime import datetime, timezone

async def audit_log_guardrail(ctx, agent, tool_call):
    """Log every tool call for audit purposes. Never skip or modify."""
    args = json.loads(tool_call.function.arguments)

    audit_entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "agent_name": agent.name,
        "tool_name": tool_call.function.name,
        "arguments": args,
        "session_id": getattr(ctx, "session_id", "unknown"),
    }

    # Write to your audit log (database, file, or external service)
    await write_audit_log(audit_entry)

    # Always return None — this guardrail never blocks
    return None
This guardrail observes without interfering. Every tool call is logged with full context, giving you a complete audit trail of what the agent did, when, and with what parameters. This is invaluable for compliance, debugging, and understanding agent behavior in production.
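The write_audit_log helper is left to your infrastructure. One minimal sketch — an assumption, appending JSON Lines to a local file and offloading the blocking write to a thread so the guardrail stays non-blocking — could be:

```python
import asyncio
import json

AUDIT_LOG_PATH = "audit.jsonl"  # hypothetical path; point at your real sink in production

def _append_line(entry: dict) -> None:
    # Each audit entry becomes one JSON line: easy to grep, tail, and replay.
    with open(AUDIT_LOG_PATH, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

async def write_audit_log(entry: dict) -> None:
    """Offload the blocking file write so the event loop is never stalled."""
    await asyncio.to_thread(_append_line, entry)

# Example usage:
asyncio.run(write_audit_log({"tool_name": "transfer_funds", "arguments": {"amount": 500}}))
```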
Best Practices
Fail closed, not open. If your guardrail encounters an error during evaluation (network timeout, parsing failure), skip the tool call rather than allowing it. An errored guardrail should block, not pass.
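One way to get fail-closed behavior without repeating try/except in every guardrail is a small wrapper. This is a sketch; the skip-dict return shape follows the convention used throughout this article:

```python
import asyncio
import functools

def fail_closed(guardrail):
    """Wrap a tool input guardrail so any internal error blocks the call."""
    @functools.wraps(guardrail)
    async def wrapper(ctx, agent, tool_call):
        try:
            return await guardrail(ctx, agent, tool_call)
        except Exception:
            # The guardrail errored: block the tool call rather than letting it through.
            return {
                "skip": True,
                "replacement_output": (
                    "Tool call blocked: safety check could not be completed. "
                    "Please try again or escalate."
                ),
            }
    return wrapper

@fail_closed
async def broken_guardrail(ctx, agent, tool_call):
    raise RuntimeError("simulated evaluation failure")

result = asyncio.run(broken_guardrail(None, None, None))
print(result["skip"])  # → True
```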
Keep guardrail logic simple. Tool guardrails add latency to every tool call. Use fast checks — argument validation, threshold comparisons, regex matching. Reserve LLM-based evaluation for input and output guardrails where the overhead is amortized across the full request.
Test with adversarial tool calls. Craft test cases where the model generates edge-case arguments: negative amounts, empty strings, SQL injection in search queries, extremely long inputs. Your guardrails should handle all of these gracefully.
Separate concerns. Use one guardrail per concern — one for amount limits, one for audit logging, one for PII redaction. This makes them independently testable and easy to enable or disable per environment.