The Chain of Responsibility Pattern: Cascading Agent Attempts Until Success
Implement the Chain of Responsibility pattern for AI agents with fallback chains, capability matching, and cost-optimized ordering to handle requests efficiently.
What Is the Chain of Responsibility?
The Chain of Responsibility pattern passes a request along a chain of handlers. Each handler examines the request and either processes it or passes it to the next handler in the chain. The request travels down the chain until a handler successfully processes it, or the chain is exhausted.
In AI agent systems, this pattern is invaluable for building fallback chains. You might try a fast, cheap model first, fall back to a more capable model if the first one fails, and escalate to a specialized agent or human as a last resort. Each link in the chain can also check whether it has the right capabilities before attempting to handle the request.
Core Implementation
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Request:
    content: str
    required_capabilities: set[str]
    metadata: dict


@dataclass
class Response:
    content: str
    handler_name: str
    success: bool
    cost: float  # estimated cost in USD


class AgentHandler(ABC):
    def __init__(self, name: str, capabilities: set[str],
                 cost_per_call: float):
        self.name = name
        self.capabilities = capabilities
        self.cost_per_call = cost_per_call
        self._next: AgentHandler | None = None

    def set_next(self, handler: "AgentHandler") -> "AgentHandler":
        self._next = handler
        return handler

    def can_handle(self, request: Request) -> bool:
        return request.required_capabilities.issubset(self.capabilities)

    def handle(self, request: Request) -> Response | None:
        if self.can_handle(request):
            try:
                result = self.process(request)
                if result.success:
                    return result
            except Exception as e:
                print(f"{self.name} failed: {e}")
        if self._next:
            print(f"{self.name} passing to {self._next.name}")
            return self._next.handle(request)
        return None

    @abstractmethod
    def process(self, request: Request) -> Response:
        pass
Building Concrete Handlers
import openai


class LightweightAgent(AgentHandler):
    def __init__(self):
        super().__init__(
            name="GPT-4o-mini",
            capabilities={"text_generation", "summarization",
                          "classification"},
            cost_per_call=0.001,
        )
        self.client = openai.OpenAI()

    def process(self, request: Request) -> Response:
        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": request.content}],
        )
        content = response.choices[0].message.content
        # Simple quality check
        if len(content) < 20:
            return Response(content, self.name, success=False,
                            cost=self.cost_per_call)
        return Response(content, self.name, success=True,
                        cost=self.cost_per_call)


class PowerfulAgent(AgentHandler):
    def __init__(self):
        super().__init__(
            name="GPT-4o",
            capabilities={"text_generation", "summarization",
                          "classification", "reasoning",
                          "code_generation"},
            cost_per_call=0.01,
        )
        self.client = openai.OpenAI()

    def process(self, request: Request) -> Response:
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": request.content}],
        )
        return Response(
            response.choices[0].message.content,
            self.name, success=True,
            cost=self.cost_per_call,
        )


class HumanEscalation(AgentHandler):
    def __init__(self):
        super().__init__(
            name="Human Reviewer",
            capabilities={"text_generation", "summarization",
                          "classification", "reasoning",
                          "code_generation", "human_judgment"},
            cost_per_call=5.0,
        )

    def process(self, request: Request) -> Response:
        # In production, this would create a ticket or send
        # a notification to a human review queue
        return Response(
            content="[Escalated to human review queue]",
            handler_name=self.name,
            success=True,
            cost=self.cost_per_call,
        )
Assembling the Chain
def build_cost_optimized_chain() -> AgentHandler:
    lightweight = LightweightAgent()
    powerful = PowerfulAgent()
    human = HumanEscalation()
    # Chain: cheap -> expensive -> human
    lightweight.set_next(powerful)
    powerful.set_next(human)
    return lightweight


chain = build_cost_optimized_chain()

# Simple request — handled by lightweight agent
simple = Request(
    content="Summarize this paragraph in one sentence.",
    required_capabilities={"summarization"},
    metadata={},
)
result = chain.handle(simple)
print(f"Handled by: {result.handler_name}, Cost: ${result.cost}")

# Complex request — needs reasoning, skips to powerful agent
complex_req = Request(
    content="Analyze the time complexity of this algorithm.",
    required_capabilities={"reasoning", "code_generation"},
    metadata={},
)
result = chain.handle(complex_req)
print(f"Handled by: {result.handler_name}, Cost: ${result.cost}")
The capability check in can_handle means the chain intelligently skips handlers that lack the required capabilities, so a request needing reasoning jumps straight to GPT-4o without wasting a call on GPT-4o-mini.
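You can watch this skip behavior without spending any API calls by using stub handlers. The classes below are illustrative stand-ins for the real handlers, not part of the pattern above:

```python
from dataclasses import dataclass


@dataclass
class StubRequest:
    required_capabilities: set


class StubHandler:
    def __init__(self, name: str, capabilities: set):
        self.name = name
        self.capabilities = capabilities
        self._next = None

    def set_next(self, handler):
        self._next = handler
        return handler

    def handle(self, request):
        # A handler lacking a required capability passes the request
        # on immediately, without spending a model call.
        if request.required_capabilities.issubset(self.capabilities):
            return self.name
        return self._next.handle(request) if self._next else None


cheap = StubHandler("cheap", {"summarization"})
strong = StubHandler("strong", {"summarization", "reasoning"})
cheap.set_next(strong)

print(cheap.handle(StubRequest({"summarization"})))  # cheap
print(cheap.handle(StubRequest({"reasoning"})))      # strong
```

A request requiring a capability no handler offers falls off the end of the chain and returns None, which is the case the last FAQ entry below addresses.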
flowchart TD
CALL(["Inbound Call"])
HEALTH{"Primary<br/>agent healthy?"}
PRIMARY["Primary agent<br/>LLM provider A"]
SECONDARY["Hot standby<br/>LLM provider B"]
QUEUE[("Persisted<br/>call state")]
HUMAN(["Live human<br/>fallback"])
DONE(["Caller served"])
CALL --> HEALTH
HEALTH -->|Yes| PRIMARY
HEALTH -->|Timeout or 5xx| SECONDARY
PRIMARY --> QUEUE
SECONDARY --> QUEUE
PRIMARY --> DONE
SECONDARY --> DONE
SECONDARY -->|Both fail| HUMAN
style HEALTH fill:#f59e0b,stroke:#d97706,color:#1f2937
style PRIMARY fill:#4f46e5,stroke:#4338ca,color:#fff
style SECONDARY fill:#0ea5e9,stroke:#0369a1,color:#fff
style HUMAN fill:#dc2626,stroke:#b91c1c,color:#fff
style DONE fill:#059669,stroke:#047857,color:#fff
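The failover logic in the diagram reduces to the same chain idea: try the primary provider, fall back to the hot standby on a timeout or server error, and escalate to a human only when both fail. A minimal sketch, assuming a hypothetical `call_provider` transport function that raises on timeouts and 5xx responses:

```python
class ProviderError(Exception):
    """Raised when a provider times out or returns a 5xx."""


def call_provider(name: str, prompt: str) -> str:
    # Hypothetical transport layer; a real implementation would issue
    # the request and raise ProviderError on timeout or 5xx.
    raise ProviderError(f"{name} unavailable")


def answer_call(prompt: str) -> str:
    for provider in ("provider_a", "provider_b"):
        try:
            return call_provider(provider, prompt)
        except ProviderError:
            continue  # fall through to the hot standby
    # Both providers failed: route to the live human fallback
    return "[Escalated to live human fallback]"
```

Persisting call state between attempts (the "Persisted call state" node) is what lets the standby resume the conversation rather than restart it.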
FAQ
How do I order the handlers for cost efficiency?
Place the cheapest handler first and the most expensive last. This ensures simple requests are handled cheaply while complex requests still get resolved. Track the percentage of requests handled at each level to monitor whether your chain ordering is optimal.
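One lightweight way to track that percentage is a counter keyed by handler name. The sketch below reuses the `Response` shape from earlier; the `record` and `level_percentages` helpers are illustrative, not part of the pattern:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Response:  # same shape as the Response dataclass above
    content: str
    handler_name: str
    success: bool
    cost: float


handled_at = Counter()


def record(result) -> None:
    # None means the chain was exhausted without a successful handler
    handled_at[result.handler_name if result else "unhandled"] += 1


def level_percentages() -> dict:
    total = sum(handled_at.values()) or 1
    return {name: round(100 * n / total, 1)
            for name, n in handled_at.items()}


record(Response("ok", "GPT-4o-mini", True, 0.001))
record(Response("ok", "GPT-4o-mini", True, 0.001))
record(Response("ok", "GPT-4o", True, 0.01))
record(None)
print(level_percentages())
```

If most requests are falling through to the expensive handler, the cheap handler is wasted overhead and you may want to tighten its capability set or remove it.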
What if I want to try all handlers and pick the best result?
That is a different pattern — closer to Map-Reduce or an ensemble. The Chain of Responsibility is specifically designed for "first success wins" semantics. If you need to compare outputs from multiple agents, use a fan-out approach and a separate evaluator to pick the best.
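A minimal fan-out sketch, with plain callables standing in for agents and a scoring function standing in for the evaluator (both are illustrative assumptions, not part of the chain above):

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(agents, prompt):
    # Query every agent in parallel instead of stopping at first success
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda agent: agent(prompt), agents))


def pick_best(candidates, score):
    # A separate evaluator chooses the winner; score() is a stand-in
    # for an LLM judge or heuristic quality metric
    return max(candidates, key=score)


agents = [lambda p: p.upper(), lambda p: p + "!", lambda p: p]
results = fan_out(agents, "hello")
best = pick_best(results, score=len)  # "hello!"
```

Note the cost profile is inverted: fan-out pays for every agent on every request, whereas the chain pays for at most one successful handler plus any failed attempts before it.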
How do I handle the case where no handler in the chain can process a request?
The handle method returns None when the chain is exhausted. Wrap the chain call in logic that detects this and returns a graceful error to the user, such as "We could not process your request. A support ticket has been created."
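A minimal wrapper for that, with a stub chain standing in for one where no handler matched (the stub and the wrapper are illustrative; a real version would also create the ticket):

```python
def handle_with_fallback(chain, request) -> str:
    # Wrap the chain so an exhausted chain still returns something usable
    result = chain.handle(request)
    if result is None:
        # create_support_ticket(request) would go here in production
        return ("We could not process your request. "
                "A support ticket has been created.")
    return result.content


class ExhaustedChain:
    # Stand-in for a chain in which no handler could process the request
    def handle(self, request):
        return None


msg = handle_with_fallback(ExhaustedChain(), request=None)
```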
#AgentDesignPatterns #ChainOfResponsibility #Python #AgenticAI #FaultTolerance #LearnAI #AIEngineering
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.