The Chain of Responsibility Pattern: Cascading Agent Attempts Until Success
Implement the Chain of Responsibility pattern for AI agents with fallback chains, capability matching, and cost-optimized ordering to handle requests efficiently.
What Is the Chain of Responsibility?
The Chain of Responsibility pattern passes a request along a chain of handlers. Each handler examines the request and either processes it or passes it to the next handler in the chain. The request travels down the chain until a handler successfully processes it, or the chain is exhausted.
In AI agent systems, this pattern is invaluable for building fallback chains. You might try a fast, cheap model first, fall back to a more capable model if the first one fails, and escalate to a specialized agent or human as a last resort. Each link in the chain can also check whether it has the right capabilities before attempting to handle the request.
Core Implementation
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Request:
    content: str
    required_capabilities: set[str]
    metadata: dict


@dataclass
class Response:
    content: str
    handler_name: str
    success: bool
    cost: float  # estimated cost in USD


class AgentHandler(ABC):
    def __init__(self, name: str, capabilities: set[str],
                 cost_per_call: float):
        self.name = name
        self.capabilities = capabilities
        self.cost_per_call = cost_per_call
        self._next: AgentHandler | None = None

    def set_next(self, handler: "AgentHandler") -> "AgentHandler":
        self._next = handler
        return handler

    def can_handle(self, request: Request) -> bool:
        return request.required_capabilities.issubset(self.capabilities)

    def handle(self, request: Request) -> Response | None:
        if self.can_handle(request):
            try:
                result = self.process(request)
                if result.success:
                    return result
            except Exception as e:
                print(f"{self.name} failed: {e}")
        if self._next:
            print(f"{self.name} passing to {self._next.name}")
            return self._next.handle(request)
        return None

    @abstractmethod
    def process(self, request: Request) -> Response:
        pass
Building Concrete Handlers
import openai


class LightweightAgent(AgentHandler):
    def __init__(self):
        super().__init__(
            name="GPT-4o-mini",
            capabilities={"text_generation", "summarization",
                          "classification"},
            cost_per_call=0.001,
        )
        self.client = openai.OpenAI()

    def process(self, request: Request) -> Response:
        response = self.client.chat.completions.create(
            model="gpt-4o-mini",
            messages=[{"role": "user", "content": request.content}],
        )
        content = response.choices[0].message.content
        # Simple quality check
        if len(content) < 20:
            return Response(content, self.name, success=False,
                            cost=self.cost_per_call)
        return Response(content, self.name, success=True,
                        cost=self.cost_per_call)


class PowerfulAgent(AgentHandler):
    def __init__(self):
        super().__init__(
            name="GPT-4o",
            capabilities={"text_generation", "summarization",
                          "classification", "reasoning",
                          "code_generation"},
            cost_per_call=0.01,
        )
        self.client = openai.OpenAI()

    def process(self, request: Request) -> Response:
        response = self.client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": request.content}],
        )
        return Response(
            response.choices[0].message.content,
            self.name, success=True,
            cost=self.cost_per_call,
        )


class HumanEscalation(AgentHandler):
    def __init__(self):
        super().__init__(
            name="Human Reviewer",
            capabilities={"text_generation", "summarization",
                          "classification", "reasoning",
                          "code_generation", "human_judgment"},
            cost_per_call=5.0,
        )

    def process(self, request: Request) -> Response:
        # In production, this would create a ticket or send
        # a notification to a human review queue
        return Response(
            content="[Escalated to human review queue]",
            handler_name=self.name,
            success=True,
            cost=self.cost_per_call,
        )
Assembling the Chain
def build_cost_optimized_chain() -> AgentHandler:
    lightweight = LightweightAgent()
    powerful = PowerfulAgent()
    human = HumanEscalation()
    # Chain: cheap -> expensive -> human
    lightweight.set_next(powerful)
    powerful.set_next(human)
    return lightweight


chain = build_cost_optimized_chain()

# Simple request — handled by lightweight agent
simple = Request(
    content="Summarize this paragraph in one sentence.",
    required_capabilities={"summarization"},
    metadata={},
)
result = chain.handle(simple)
print(f"Handled by: {result.handler_name}, Cost: ${result.cost}")

# Complex request — needs reasoning, skips to powerful agent
complex_req = Request(
    content="Analyze the time complexity of this algorithm.",
    required_capabilities={"reasoning", "code_generation"},
    metadata={},
)
result = chain.handle(complex_req)
print(f"Handled by: {result.handler_name}, Cost: ${result.cost}")
The capability check in can_handle means the chain intelligently skips handlers that lack the required capabilities, so a request needing reasoning jumps straight to GPT-4o without wasting a call on GPT-4o-mini.
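You can watch this skip behavior without spending any API calls by using stub handlers. The classes below are illustrative stand-ins for the real handlers, not part of the pattern above:

```python
from dataclasses import dataclass


@dataclass
class StubRequest:
    required_capabilities: set


class StubHandler:
    def __init__(self, name: str, capabilities: set):
        self.name = name
        self.capabilities = capabilities
        self._next = None

    def set_next(self, handler):
        self._next = handler
        return handler

    def handle(self, request):
        # A handler lacking a required capability passes the request
        # on immediately, without spending a model call.
        if request.required_capabilities.issubset(self.capabilities):
            return self.name
        return self._next.handle(request) if self._next else None


cheap = StubHandler("cheap", {"summarization"})
strong = StubHandler("strong", {"summarization", "reasoning"})
cheap.set_next(strong)

print(cheap.handle(StubRequest({"summarization"})))  # cheap
print(cheap.handle(StubRequest({"reasoning"})))      # strong
```

A request requiring a capability no handler offers falls off the end of the chain and returns None, which is the case the last FAQ entry below addresses.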
flowchart TD
CALL(["Inbound Call"])
HEALTH{"Primary<br/>agent healthy?"}
PRIMARY["Primary agent<br/>LLM provider A"]
SECONDARY["Hot standby<br/>LLM provider B"]
QUEUE[("Persisted<br/>call state")]
HUMAN(["Live human<br/>fallback"])
DONE(["Caller served"])
CALL --> HEALTH
HEALTH -->|Yes| PRIMARY
HEALTH -->|Timeout or 5xx| SECONDARY
PRIMARY --> QUEUE
SECONDARY --> QUEUE
PRIMARY --> DONE
SECONDARY --> DONE
SECONDARY -->|Both fail| HUMAN
style HEALTH fill:#f59e0b,stroke:#d97706,color:#1f2937
style PRIMARY fill:#4f46e5,stroke:#4338ca,color:#fff
style SECONDARY fill:#0ea5e9,stroke:#0369a1,color:#fff
style HUMAN fill:#dc2626,stroke:#b91c1c,color:#fff
style DONE fill:#059669,stroke:#047857,color:#fff
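The failover logic in the diagram reduces to the same chain idea: try the primary provider, fall back to the hot standby on a timeout or server error, and escalate to a human only when both fail. A minimal sketch, assuming a hypothetical `call_provider` transport function that raises on timeouts and 5xx responses:

```python
class ProviderError(Exception):
    """Raised when a provider times out or returns a 5xx."""


def call_provider(name: str, prompt: str) -> str:
    # Hypothetical transport layer; a real implementation would issue
    # the request and raise ProviderError on timeout or 5xx.
    raise ProviderError(f"{name} unavailable")


def answer_call(prompt: str) -> str:
    for provider in ("provider_a", "provider_b"):
        try:
            return call_provider(provider, prompt)
        except ProviderError:
            continue  # fall through to the hot standby
    # Both providers failed: route to the live human fallback
    return "[Escalated to live human fallback]"
```

Persisting call state between attempts (the "Persisted call state" node) is what lets the standby resume the conversation rather than restart it.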
FAQ
How do I order the handlers for cost efficiency?
Place the cheapest handler first and the most expensive last. This ensures simple requests are handled cheaply while complex requests still get resolved. Track the percentage of requests handled at each level to monitor whether your chain ordering is optimal.
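One lightweight way to track that percentage is a counter keyed by handler name. The sketch below reuses the `Response` shape from earlier; the `record` and `level_percentages` helpers are illustrative, not part of the pattern:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Response:  # same shape as the Response dataclass above
    content: str
    handler_name: str
    success: bool
    cost: float


handled_at = Counter()


def record(result) -> None:
    # None means the chain was exhausted without a successful handler
    handled_at[result.handler_name if result else "unhandled"] += 1


def level_percentages() -> dict:
    total = sum(handled_at.values()) or 1
    return {name: round(100 * n / total, 1)
            for name, n in handled_at.items()}


record(Response("ok", "GPT-4o-mini", True, 0.001))
record(Response("ok", "GPT-4o-mini", True, 0.001))
record(Response("ok", "GPT-4o", True, 0.01))
record(None)
print(level_percentages())
```

If most requests are falling through to the expensive handler, the cheap handler is wasted overhead and you may want to tighten its capability set or remove it.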
What if I want to try all handlers and pick the best result?
That is a different pattern — closer to Map-Reduce or an ensemble. The Chain of Responsibility is specifically designed for "first success wins" semantics. If you need to compare outputs from multiple agents, use a fan-out approach and a separate evaluator to pick the best.
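A minimal fan-out sketch, with plain callables standing in for agents and a scoring function standing in for the evaluator (both are illustrative assumptions, not part of the chain above):

```python
from concurrent.futures import ThreadPoolExecutor


def fan_out(agents, prompt):
    # Query every agent in parallel instead of stopping at first success
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda agent: agent(prompt), agents))


def pick_best(candidates, score):
    # A separate evaluator chooses the winner; score() is a stand-in
    # for an LLM judge or heuristic quality metric
    return max(candidates, key=score)


agents = [lambda p: p.upper(), lambda p: p + "!", lambda p: p]
results = fan_out(agents, "hello")
best = pick_best(results, score=len)  # "hello!"
```

Note the cost profile is inverted: fan-out pays for every agent on every request, whereas the chain pays for at most one successful handler plus any failed attempts before it.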
How do I handle the case where no handler in the chain can process a request?
The handle method returns None when the chain is exhausted. Wrap the chain call in logic that detects this and returns a graceful error to the user, such as "We could not process your request. A support ticket has been created."
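A minimal wrapper for that, with a stub chain standing in for one where no handler matched (the stub and the wrapper are illustrative; a real version would also create the ticket):

```python
def handle_with_fallback(chain, request) -> str:
    # Wrap the chain so an exhausted chain still returns something usable
    result = chain.handle(request)
    if result is None:
        # create_support_ticket(request) would go here in production
        return ("We could not process your request. "
                "A support ticket has been created.")
    return result.content


class ExhaustedChain:
    # Stand-in for a chain in which no handler could process the request
    def handle(self, request):
        return None


msg = handle_with_fallback(ExhaustedChain(), request=None)
```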
#AgentDesignPatterns #ChainOfResponsibility #Python #AgenticAI #FaultTolerance #LearnAI #AIEngineering
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.