AI Agent Reliability Patterns: Retries, Fallbacks, and Circuit Breakers for Production Agents
How to build reliable AI agents using battle-tested distributed systems patterns: retry strategies, fallback chains, circuit breakers, and graceful degradation.
Agents Fail. The Question Is How Gracefully.
AI agents in production face a constant stream of failures: API rate limits, tool execution errors, malformed LLM outputs, timeouts on external services, and model hallucinations that derail multi-step plans. What separates a demo agent from a production agent is not capability; it is reliability engineering.
The good news is that decades of distributed systems engineering have produced patterns that apply directly to agent systems.
Pattern 1: Structured Retries
Not all failures are equal. Your retry strategy should match the failure type:
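```python
import logging

import anthropic
from tenacity import (before_sleep_log, retry, retry_if_exception_type,
                      stop_after_attempt, wait_exponential_jitter)

logger = logging.getLogger(__name__)
# Reads ANTHROPIC_API_KEY from the environment; max_retries=0 disables the
# SDK's built-in retries so the tenacity policy below is the only one in effect
client = anthropic.AsyncAnthropic(max_retries=0)

@retry(
    # Only transient errors are retried; invalid requests and auth failures fail fast
    retry=retry_if_exception_type((anthropic.RateLimitError, anthropic.APITimeoutError)),
    wait=wait_exponential_jitter(initial=1, max=60),  # exponential backoff plus random jitter, capped at 60s
    stop=stop_after_attempt(5),
    before_sleep=before_sleep_log(logger, logging.WARNING),
)
async def call_llm(messages, tools):
    return await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=1024,  # required by the Messages API
        messages=messages,
        tools=tools,
    )
```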
Key principles:
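- Exponential backoff: Each retry waits roughly twice as long as the previous one, giving a rate-limited or overloaded service time to recover
- Jitter: Randomize each wait so that many agents hitting the same limit do not retry in lockstep (the "thundering herd" problem)
- Selective retry: Retry only transient errors such as rate limits and timeouts. Invalid requests and authentication failures fail the same way on every attempt, so do not retry them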
- Maximum attempts: Always cap retries to prevent infinite loops
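The tenacity decorator above handles these principles declaratively. To show what it does under the hood, here is a minimal dependency-free sketch of the same policy. `TransientError` is a hypothetical marker exception (in practice you would map SDK errors such as rate limits and timeouts onto it), and the "full jitter" formula, a uniform delay in `[0, min(cap, base * 2**attempt)]`, is one common choice rather than the only one.

```python
import asyncio
import random


class TransientError(Exception):
    """Hypothetical marker for retryable failures (rate limits, timeouts)."""


async def retry_async(func, *, max_attempts=5, base=1.0, cap=60.0, sleep=asyncio.sleep):
    """Call func() until it succeeds, retrying only TransientError.

    Any other exception (e.g. an invalid request) propagates immediately.
    The delay before retry n (0-based) is uniform in [0, min(cap, base * 2**n)]:
    exponential backoff with "full jitter".
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            await sleep(random.uniform(0, min(cap, base * 2 ** attempt)))
```

Passing `sleep` as a parameter lets tests substitute a fake that records delays instead of waiting; in production the default `asyncio.sleep` is used.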
Pattern 2: Model Fallback Chains
When your primary model is unavailable or degraded, fall back to alternatives:
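```python
import logging

logger = logging.getLogger(__name__)


# Illustrative exception types: each provider adapter maps its SDK's errors onto these
class ServiceUnavailableError(Exception): ...
class RateLimitError(Exception): ...
class AllModelsUnavailableError(Exception): ...


MODEL_CHAIN = [
    {"model": "claude-sonnet-4-20250514", "provider": "anthropic"},
    {"model": "gpt-4o", "provider": "openai"},
    {"model": "claude-3-5-haiku-20241022", "provider": "anthropic"},  # Cheaper, faster, less capable
]


async def resilient_llm_call(messages, tools):
    for model_config in MODEL_CHAIN:
        try:
            # call_provider is a placeholder for a per-provider adapter that
            # translates messages and tool schemas into that provider's format
            return await call_provider(
                model=model_config["model"],
                provider=model_config["provider"],
                messages=messages,
                tools=tools,
            )
        except (ServiceUnavailableError, RateLimitError) as e:
            logger.warning("Falling back from %s: %s", model_config["model"], e)
    raise AllModelsUnavailableError("Exhausted all model fallbacks")
```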
Important considerations:
- Prompts may need adjustment for different models (tool schemas, system prompt format)
- Track which model actually served each request for quality monitoring
- Quality may degrade with fallback models, so alert when the primary model has been unavailable for an extended period
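The last two considerations, tracking which model served each request and alerting on a prolonged primary outage, can be sketched as follows. This is an illustration under assumptions: `TransientProviderError` stands in for whatever rate-limit or 503 errors your providers raise, `call` is any async function that takes a model name, and the 300-second alert threshold is an arbitrary example value.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, List, Optional


class TransientProviderError(Exception):
    """Hypothetical stand-in for rate-limit / 503 errors from any provider."""


@dataclass
class FallbackResult:
    served_by: str  # which model actually produced the output
    output: str


async def call_with_fallback(
    chain: List[str], call: Callable[[str], Awaitable[str]]
) -> FallbackResult:
    """Try each model in order and record which one served the request."""
    for model in chain:
        try:
            return FallbackResult(served_by=model, output=await call(model))
        except TransientProviderError:
            continue  # try the next model in the chain
    raise RuntimeError("Exhausted all model fallbacks")


@dataclass
class FallbackMonitor:
    primary: str
    alert_after_s: float = 300.0  # assumed threshold for an "extended" outage
    primary_down_since: Optional[float] = None

    def record(self, served_by: str, now: float) -> bool:
        """Return True once fallbacks have been serving for at least alert_after_s."""
        if served_by == self.primary:
            self.primary_down_since = None  # primary is healthy again
            return False
        if self.primary_down_since is None:
            self.primary_down_since = now
        return now - self.primary_down_since >= self.alert_after_s


if __name__ == "__main__":
    async def fake_call(model: str) -> str:
        if model == "primary-model":
            raise TransientProviderError("503 Service Unavailable")
        return f"answer from {model}"

    result = asyncio.run(call_with_fallback(["primary-model", "backup-model"], fake_call))
    print(result.served_by)  # backup-model
```

Keeping the monitor separate from the call path means the `served_by` field can also be written to request logs for per-model quality dashboards.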
Pattern 3: Circuit Breakers
Prevent cascading failures by stopping calls to a failing service:
```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout  # seconds to wait before a trial call
        self.state = "CLOSED"  # CLOSED = normal, OPEN = blocking, HALF_OPEN = testing
        self.last_failure_time = None

    async def call(self, func, *args, **kwargs):
        if self.state == "OPEN":
            if time.monotonic() - self.last_failure_time >= self.recovery_timeout:
                self.state = "HALF_OPEN"  # let a trial request through
            else:
                raise CircuitOpenError("Circuit breaker is open")
        try:
            result = await func(*args, **kwargs)
        except Exception:
            self.failure_count += 1
            self.last_failure_time = time.monotonic()
            # A failed trial call re-opens immediately; otherwise open at the threshold
            if self.state == "HALF_OPEN" or self.failure_count >= self.failure_threshold:
                self.state = "OPEN"
            raise
        # Any success closes the circuit and resets the consecutive-failure count
        self.state = "CLOSED"
        self.failure_count = 0
        return result
```
Use a separate circuit breaker for each external dependency (LLM provider, tool APIs, databases), so that an outage in one dependency does not block calls to healthy ones.
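One way to keep breakers per dependency is a small registry keyed by dependency name. The sketch below is a separate, synchronous variant of the `CircuitBreaker` class above: the explicit `allow()` / `record_success()` / `record_failure()` API and the injectable `clock` are assumptions made here so the state machine can be tested without real waiting, and the dependency names are hypothetical.

```python
import time
from collections import defaultdict


class Breaker:
    """Minimal breaker; clock is injectable (an assumption, for testability)."""

    def __init__(self, threshold=5, recovery_timeout=60.0, clock=time.monotonic):
        self.threshold = threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means CLOSED

    def allow(self) -> bool:
        # Closed, or open long enough that a trial (half-open) call is permitted
        return self.opened_at is None or self.clock() - self.opened_at >= self.recovery_timeout

    def record_success(self):
        self.failures, self.opened_at = 0, None  # close and reset

    def record_failure(self):
        self.failures += 1
        was_open = self.opened_at is not None  # a failed trial call re-opens
        if was_open or self.failures >= self.threshold:
            self.opened_at = self.clock()


class BreakerRegistry:
    """One breaker per external dependency, e.g. "anthropic", "search_api", "postgres"."""

    def __init__(self, **breaker_kwargs):
        self._breakers = defaultdict(lambda: Breaker(**breaker_kwargs))

    def get(self, dependency: str) -> Breaker:
        return self._breakers[dependency]
```

A caller checks `registry.get("search_api").allow()` before the call and records the outcome afterwards; because each dependency has its own instance, a tripped LLM breaker leaves tool and database calls unaffected.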
Pattern 4: Idempotent Tool Execution
Agent tools must be safe to retry. If a tool call times out, the agent (or retry logic) may call it again. Non-idempotent tools can cause double-charges, duplicate records, or other side effects.
Design principles:
- Use idempotency keys for operations that create or modify resources
- Keep read operations free of side effects so they are naturally idempotent
- Log tool execution results and check for existing results before re-executing
- Use database transactions with unique constraints to prevent duplicates
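The key-check-and-store principles above can be sketched with the standard library's `sqlite3`. Assumptions to note: the idempotency key is derived here from a task ID, tool name, and arguments (some systems instead mint one key per logical operation), results must be JSON-serializable, and `execute_once` and `idempotency_key` are hypothetical helper names. The sketch assumes a single worker; with concurrent workers you would reserve the key before running the tool, or pass the key through to a downstream API that deduplicates on it.

```python
import hashlib
import json
import sqlite3


def make_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tool_results ("
        " idempotency_key TEXT PRIMARY KEY,"  # unique constraint rejects duplicate keys
        " result TEXT NOT NULL)"
    )
    return conn


def idempotency_key(task_id, tool_name, args):
    """Deterministic key: the same task, tool, and arguments always hash the same."""
    payload = json.dumps({"task": task_id, "tool": tool_name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()


def execute_once(conn, key, tool_fn, *args):
    """Run tool_fn at most once per key; on a retry, replay the stored result."""
    row = conn.execute(
        "SELECT result FROM tool_results WHERE idempotency_key = ?", (key,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # already executed: do not repeat the side effect
    result = tool_fn(*args)
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO tool_results VALUES (?, ?)", (key, json.dumps(result)))
    return result
```

If the agent retries a timed-out `create_order` call with the same key, the second call returns the recorded order instead of creating another one.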
Pattern 5: Graceful Degradation
When full functionality is unavailable, provide reduced but useful service:
- Tool failure: If a search tool fails, the agent can still answer from its parametric knowledge (with appropriate caveats)
- Context retrieval failure: If RAG retrieval fails, fall back to a general response with a disclaimer
- Timeout: If the agent cannot complete a complex task within the time budget, return partial results with an explanation
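Two of the degradation paths above, falling back when retrieval fails and returning partial results when the time budget runs out, might look like this with `asyncio.wait_for`. This is a sketch: `search`, `generate`, and the step functions are hypothetical async callables supplied by the caller, steps are assumed to return strings, and which exceptions count as "retrieval failed" (here timeouts and `ConnectionError`) is a choice you would adapt to your tools.

```python
import asyncio
from dataclasses import dataclass, field
from typing import List


@dataclass
class AgentResponse:
    answer: str
    degraded: bool = False
    notes: List[str] = field(default_factory=list)  # caveats shown to the user


async def answer_with_degradation(question, search, generate, timeout_s=5.0):
    """search(question) and generate(question, context) are hypothetical async callables."""
    try:
        docs = await asyncio.wait_for(search(question), timeout=timeout_s)
    except (asyncio.TimeoutError, ConnectionError) as exc:
        # Retrieval failed: answer from the model's own knowledge, with a caveat
        answer = await generate(question, context=None)
        note = f"Search unavailable ({type(exc).__name__}); answer is not grounded in retrieved sources."
        return AgentResponse(answer=answer, degraded=True, notes=[note])
    return AgentResponse(answer=await generate(question, context=docs))


async def run_with_budget(steps, budget_s):
    """Run async steps in order; if the budget runs out, return the partial results."""
    completed = []

    async def run_all():
        for step in steps:
            completed.append(await step())

    try:
        await asyncio.wait_for(run_all(), timeout=budget_s)
    except asyncio.TimeoutError:
        note = f"Time budget of {budget_s}s exceeded after {len(completed)} of {len(steps)} steps."
        return AgentResponse(answer="; ".join(completed), degraded=True, notes=[note])
    return AgentResponse(answer="; ".join(completed))
```

The `degraded` flag and `notes` let the calling application decide how prominently to surface the caveat, rather than burying it in the generated text.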
Pattern 6: Checkpointing for Long-Running Agents
Agents that run for minutes or hours should checkpoint their state:
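```python
class CheckpointedAgent:
    # load_checkpoint, plan, execute_step, save_checkpoint, partial_result and
    # final_result are hooks supplied by the concrete agent implementation.
    async def run(self, task):
        checkpoint = await self.load_checkpoint(task.id)
        # plan() resumes after the last step recorded in the checkpoint
        for step in self.plan(task, resume_from=checkpoint):
            result = await self.execute_step(step)
            # Persist after every step so a crash loses at most one step of work
            await self.save_checkpoint(task.id, step, result)
            if result.failed and not result.retryable:
                return self.partial_result(task.id)
        return self.final_result(task.id)
```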
If the agent crashes or the process restarts, it resumes from the last checkpoint instead of starting over.
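The class above leaves storage abstract. As one concrete possibility, here is a file-based sketch that assumes each step is a named synchronous function with a JSON-serializable result, and one JSON file per task; `FileCheckpointStore` and `run_task` are hypothetical names. Writing to a temporary file and then calling `os.replace` means a crash mid-write leaves the previous checkpoint intact rather than a half-written file.

```python
import json
import os
import tempfile


class FileCheckpointStore:
    """Persist completed step results per task as JSON (hypothetical layout)."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, task_id):
        return os.path.join(self.directory, f"{task_id}.json")

    def load(self, task_id):
        try:
            with open(self._path(task_id)) as f:
                return json.load(f)
        except FileNotFoundError:
            return {"completed": {}}  # first run: nothing to resume

    def save(self, task_id, state):
        # Write to a temp file in the same directory, then atomically replace
        fd, tmp = tempfile.mkstemp(dir=self.directory)
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
        os.replace(tmp, self._path(task_id))


def run_task(task_id, steps, store):
    """steps: ordered list of (name, fn). Steps already checkpointed are skipped."""
    state = store.load(task_id)
    for name, fn in steps:
        if name in state["completed"]:
            continue  # finished before the crash or restart
        state["completed"][name] = fn()
        store.save(task_id, state)
    return state["completed"]
```

Re-running `run_task` with the same `task_id` after a crash re-executes only the step that was in flight, which is another reason the tools behind each step should be idempotent.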
Measuring Reliability
Track these metrics to quantify agent reliability:
- Task completion rate: Percentage of tasks completed successfully
- Mean time to completion: Average wall-clock time per task
- Retry rate: How often retries are needed (high rates indicate systemic issues)
- Fallback rate: How often requests are served by a fallback instead of the primary model or tool
- Error categorization: Breakdown of failures by type (rate limit, timeout, parsing, tool error)
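These metrics can be computed from per-task records. The sketch below assumes a hypothetical `TaskRecord` shape, defines retry rate as the share of tasks that needed at least one retry, and computes mean time to completion over successful tasks only; other definitions (for example, retries per request) are equally reasonable.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class TaskRecord:
    succeeded: bool
    duration_s: float  # wall-clock time for the task
    retries: int = 0
    served_by_fallback: bool = False
    error_type: Optional[str] = None  # e.g. "rate_limit", "timeout", "parsing", "tool_error"


def reliability_report(records: List[TaskRecord]) -> dict:
    n = len(records)
    if n == 0:
        raise ValueError("no task records")
    durations = [r.duration_s for r in records if r.succeeded]
    return {
        "task_completion_rate": sum(r.succeeded for r in records) / n,
        # Mean over successful tasks only; None if nothing succeeded
        "mean_time_to_completion_s": mean(durations) if durations else None,
        "retry_rate": sum(r.retries > 0 for r in records) / n,
        "fallback_rate": sum(r.served_by_fallback for r in records) / n,
        # Counts the error each task hit, whether or not it later recovered
        "errors_by_type": Counter(r.error_type for r in records if r.error_type),
    }
```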