
Building Multi-Agent Systems with Claude: Coordination Without a Framework

Build multi-agent systems using the raw Anthropic API without any framework. Learn patterns for routing, delegation, result aggregation, and inter-agent communication using plain Python and Claude.

Why Build Without a Framework

Frameworks like LangChain and CrewAI provide convenient abstractions for multi-agent systems, but they also introduce complexity, version lock-in, and opaque behavior that can be hard to debug. For many production systems, building multi-agent coordination directly on the Anthropic API gives you full control over the communication protocol, error handling, and cost management.

The patterns in this guide use nothing beyond the anthropic Python SDK and standard library modules. You will learn the fundamental coordination patterns that every multi-agent framework implements under the hood.

Pattern 1: Router Agent

A router agent examines the user input and delegates to specialized agents. Before the code, the diagram below sketches the broader coordination loop that all of the patterns in this guide fit into: a supervisor dispatches work to specialized workers, a critic checks output against a rubric, and a shared scratchpad carries intermediate state between them.

```mermaid
flowchart TD
    INPUT(["Task input"])
    SUPER["Supervisor agent<br/>plans plus monitors"]
    W1["Worker 1<br/>research"]
    W2["Worker 2<br/>code"]
    W3["Worker 3<br/>writing"]
    CRITIC{"Output meets<br/>rubric?"}
    REWORK["Rework or<br/>retry path"]
    SHARED[("Shared scratchpad<br/>and memory")]
    OUT(["Final result"])
    INPUT --> SUPER
    SUPER --> W1 --> CRITIC
    SUPER --> W2 --> CRITIC
    SUPER --> W3 --> CRITIC
    W1 --> SHARED
    W2 --> SHARED
    W3 --> SHARED
    SHARED --> SUPER
    CRITIC -->|Pass| OUT
    CRITIC -->|Fail| REWORK --> SUPER
    style SUPER fill:#4f46e5,stroke:#4338ca,color:#fff
    style CRITIC fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OUT fill:#059669,stroke:#047857,color:#fff
    style SHARED fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
```
```python
import anthropic
import json

client = anthropic.Anthropic()

def router_agent(user_input: str) -> dict:
    response = client.messages.create(
        model="claude-3-5-haiku-20241022",
        max_tokens=256,
        system="""Classify the user request into exactly one category.
Return JSON: {"category": "...", "reasoning": "..."}
Categories: technical_support, billing, sales, general""",
        messages=[{"role": "user", "content": user_input}]
    )
    return json.loads(response.content[0].text)

def technical_agent(user_input: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="You are a technical support specialist. Diagnose issues and provide step-by-step solutions.",
        messages=[{"role": "user", "content": user_input}]
    )
    return response.content[0].text

def billing_agent(user_input: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system="You are a billing specialist. Help with invoices, payments, and subscription changes.",
        messages=[{"role": "user", "content": user_input}]
    )
    return response.content[0].text

AGENTS = {
    "technical_support": technical_agent,
    "billing": billing_agent,
}

def handle_request(user_input: str) -> str:
    route = router_agent(user_input)
    agent_fn = AGENTS.get(route["category"])
    if agent_fn:
        return agent_fn(user_input)
    # Default fallback for categories without a dedicated agent
    return client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        messages=[{"role": "user", "content": user_input}]
    ).content[0].text

result = handle_request("My API key stopped working after I rotated it")
print(result)
```

The router uses a cheap, fast model (Haiku) for classification, then dispatches to a more capable model (Sonnet) with a specialized system prompt. This is cost-efficient because most of the routing decisions are simple classification tasks.
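One fragile point in the router above is calling `json.loads` directly on model output: the classifier can occasionally wrap JSON in prose or invent a category. A small defensive parser keeps the router from crashing. This is a sketch, and the `VALID_CATEGORIES` set and the fall-back-to-`general` policy are assumptions, not part of the original code:

```python
import json

VALID_CATEGORIES = {"technical_support", "billing", "sales", "general"}

def parse_route(raw: str) -> dict:
    """Parse the router's reply, falling back to 'general' on bad output."""
    try:
        route = json.loads(raw)
    except json.JSONDecodeError:
        route = None
    # Reject non-dict replies and unknown categories in one place
    if not isinstance(route, dict) or route.get("category") not in VALID_CATEGORIES:
        return {"category": "general", "reasoning": "unrecognized router output"}
    return route
```

With this in place, `handle_request` would call `parse_route(response.content[0].text)` instead of `json.loads`, and a misbehaving classifier degrades to the general agent rather than an exception.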

Pattern 2: Parallel Delegation

When a task has independent subtasks, run multiple agents concurrently:

```python
import anthropic
import asyncio

async def run_agent(system: str, prompt: str) -> str:
    client = anthropic.AsyncAnthropic()
    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=2048,
        system=system,
        messages=[{"role": "user", "content": prompt}]
    )
    return response.content[0].text

async def parallel_analysis(document: str) -> dict:
    tasks = {
        "summary": run_agent(
            "You are a summarization agent. Produce a 3-paragraph executive summary.",
            document
        ),
        "risks": run_agent(
            "You are a risk analysis agent. Identify all potential risks and rate them high/medium/low.",
            document
        ),
        "action_items": run_agent(
            "You are a project management agent. Extract all action items with owners and deadlines.",
            document
        ),
    }

    # asyncio.gather runs all three coroutines concurrently; awaiting them
    # one at a time in a loop would execute them sequentially instead
    outputs = await asyncio.gather(*tasks.values())
    return dict(zip(tasks.keys(), outputs))

document = "Meeting notes: We discussed the Q2 product roadmap..."
analysis = asyncio.run(parallel_analysis(document))
for section, content in analysis.items():
    print(f"\n=== {section.upper()} ===\n{content}")
```

Three agents analyze the same document simultaneously, each with a different focus. This completes in the time of the slowest agent rather than the sum of all three, dramatically reducing total latency.
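Unbounded fan-out through `asyncio.gather` can trip provider rate limits as the number of agents grows. A common mitigation is to cap in-flight requests with an `asyncio.Semaphore` — sketched here with an assumed helper name, `bounded_gather`, not an SDK feature:

```python
import asyncio

async def bounded_gather(coros, limit: int = 3):
    """Run coroutines concurrently, but never more than `limit` at once."""
    sem = asyncio.Semaphore(limit)

    async def _guarded(coro):
        async with sem:  # blocks while `limit` coroutines are in flight
            return await coro

    # Results come back in the same order the coroutines were passed in
    return await asyncio.gather(*(_guarded(c) for c in coros))
```

In `parallel_analysis`, replacing `asyncio.gather(*tasks.values())` with `bounded_gather(tasks.values(), limit=2)` would keep at most two Claude calls in flight at any moment.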

Pattern 3: Sequential Pipeline

Some tasks require agents to work in sequence, where each agent's output feeds the next:

```python
import anthropic

client = anthropic.Anthropic()

def pipeline_agent(system: str, input_text: str) -> str:
    response = client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system=system,
        messages=[{"role": "user", "content": input_text}]
    )
    return response.content[0].text

def code_review_pipeline(code: str) -> dict:
    # Stage 1: Bug detection
    bugs = pipeline_agent(
        "You are a bug detection agent. Identify bugs, off-by-one errors, and logic flaws. "
        "List each bug with its line reference and severity.",
        f"Review this code for bugs:\n\n{code}"
    )

    # Stage 2: Security review (informed by bugs found)
    security = pipeline_agent(
        "You are a security review agent. Identify security vulnerabilities including "
        "injection, authentication, and data exposure issues.",
        f"Code:\n{code}\n\nBugs already found:\n{bugs}\n\nNow identify security issues not covered above."
    )

    # Stage 3: Synthesize into a final report
    report = pipeline_agent(
        "You are a technical writing agent. Synthesize code review findings into a "
        "clear, actionable report organized by priority.",
        f"Bugs found:\n{bugs}\n\nSecurity issues:\n{security}\n\n"
        "Create a unified code review report."
    )

    return {"bugs": bugs, "security": security, "report": report}

result = code_review_pipeline("def login(user, pw): ...")
print(result["report"])
```

Each stage adds context for the next. The security agent knows about bugs already found, so it focuses on security-specific issues. The synthesizer combines both perspectives into a coherent report.
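All three stages share one shape: format a prompt from earlier outputs, call an agent, store the result. Once a pipeline grows past a few stages, it helps to factor that shape into a generic runner. This is a sketch — the `run_pipeline` name and the `(name, system, template)` tuple convention are assumptions; `call_fn` would be `pipeline_agent` from the snippet above:

```python
def run_pipeline(stages, initial_input: str, call_fn) -> dict:
    """Run agents in sequence; each stage's prompt template can reference
    the initial input and every earlier stage's output by name.

    stages: list of (name, system_prompt, prompt_template) tuples.
    call_fn: function (system, prompt) -> str, e.g. pipeline_agent.
    """
    outputs = {"input": initial_input}
    for name, system, template in stages:
        prompt = template.format(**outputs)  # later stages see earlier outputs
        outputs[name] = call_fn(system, prompt)
    return outputs
```

The code review pipeline then becomes data: a list of stage tuples where the security template references `{bugs}` and the report template references both `{bugs}` and `{security}`.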

Pattern 4: Aggregator with Quality Check

After parallel agents produce results, an aggregator merges and validates them:

```python
import anthropic
import asyncio

async def research_with_aggregation(topic: str) -> str:
    client = anthropic.AsyncAnthropic()

    # Parallel research from different perspectives
    # (run_agent is the helper defined in Pattern 2)
    perspectives = await asyncio.gather(
        run_agent("Research from a technical perspective. Cite specific technologies.", topic),
        run_agent("Research from a business/market perspective. Include market data.", topic),
        run_agent("Research from a user experience perspective. Focus on usability.", topic),
    )

    # Aggregator merges and deduplicates
    combined = "\n\n---\n\n".join(
        f"Perspective {i+1}:\n{p}" for i, p in enumerate(perspectives)
    )

    response = await client.messages.create(
        model="claude-sonnet-4-20250514",
        max_tokens=4096,
        system="You are a research synthesis agent. Merge multiple research perspectives "
               "into a single coherent report. Remove duplicates, resolve contradictions, "
               "and organize by theme. Flag any contradictions between sources.",
        messages=[{"role": "user", "content": combined}]
    )
    return response.content[0].text

report = asyncio.run(research_with_aggregation("State of AI agents in enterprise software 2026"))
print(report)
```

The aggregator is a critical quality control step. It catches contradictions between agents, removes redundancy, and produces a unified output that is higher quality than any single agent could produce alone.
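When the aggregator flags serious problems, you often want to loop back for rework rather than ship the report — the fail path in the supervisor diagram near the top of this article. A minimal generate-critique-rework loop can gate on the critic's verdict. This is a sketch: the `PASS`-prefix convention and the injected `produce`/`critique` callables are assumptions that stand in for real agent calls:

```python
def review_loop(produce, critique, max_rounds: int = 3):
    """Alternate drafting and critique until the critic approves.

    produce: function (feedback or None) -> draft string.
    critique: function (draft) -> feedback string; by convention it
    starts with 'PASS' when the draft meets the rubric.
    """
    feedback = None
    draft = None
    for _ in range(max_rounds):
        draft = produce(feedback)       # first round gets feedback=None
        feedback = critique(draft)
        if feedback.strip().upper().startswith("PASS"):
            break                       # rubric satisfied, stop reworking
    return draft                        # best draft after max_rounds either way
```

Capping the rounds matters: without `max_rounds`, a critic that never approves turns the loop into an unbounded token spend.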


Error Handling in Multi-Agent Systems

Production multi-agent systems need robust error handling:

```python
import anthropic
import asyncio

async def safe_agent_call(name: str, system: str, prompt: str) -> dict:
    try:
        client = anthropic.AsyncAnthropic()
        response = await client.messages.create(
            model="claude-sonnet-4-20250514",
            max_tokens=2048,
            system=system,
            messages=[{"role": "user", "content": prompt}]
        )
        return {"agent": name, "status": "success", "result": response.content[0].text}
    except anthropic.RateLimitError:
        return {"agent": name, "status": "rate_limited", "result": None}
    except anthropic.APIError as e:
        return {"agent": name, "status": "error", "result": str(e)}

async def resilient_pipeline(prompt: str):
    results = await asyncio.gather(
        safe_agent_call("analyst", "You are a data analyst.", prompt),
        safe_agent_call("writer", "You are a technical writer.", prompt),
        safe_agent_call("reviewer", "You are a code reviewer.", prompt),
    )

    successful = [r for r in results if r["status"] == "success"]
    failed = [r for r in results if r["status"] != "success"]

    if failed:
        print(f"Warning: {len(failed)} agents failed: {[f['agent'] for f in failed]}")

    return successful
```

Wrapping each agent call in error handling ensures that one failure does not take down the entire system. Log failures for debugging but continue with whatever results are available.
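Rate limits and transient network errors usually succeed on a later attempt, so retrying with exponential backoff is worth layering under `safe_agent_call`. A hedged sketch — the `with_retries` helper and its parameters are assumptions; in practice you would pass `retry_on=(anthropic.RateLimitError,)` and wrap the `client.messages.create` call:

```python
import asyncio

async def with_retries(coro_fn, retry_on=(Exception,), attempts: int = 3,
                       base_delay: float = 1.0):
    """Call coro_fn (a zero-argument coroutine factory), retrying on the
    given exception types with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return await coro_fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            await asyncio.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

The factory must be zero-argument because a coroutine object can only be awaited once; each retry needs a fresh one, e.g. `with_retries(lambda: run_agent(system, prompt))`.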

FAQ

When should I use a framework instead of raw API calls?

Use a framework when you need features like persistent memory across sessions, complex state machines, built-in tracing dashboards, or agent-to-agent handoff protocols. Use raw API calls when you need full control, minimal dependencies, or when your coordination pattern does not fit a framework's assumptions.

How do I manage costs in multi-agent systems?

Use cheap models (Haiku) for routing, classification, and simple tasks. Reserve expensive models (Opus, Sonnet) for tasks requiring deep reasoning. Use prompt caching aggressively to reduce repeated context costs. Set max_tokens appropriately for each agent — a summarizer needs fewer tokens than a code generator.
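Prompt caching works by marking the stable prefix of a request with `cache_control` so the API can reuse the processed tokens across calls. One way to apply it to the agent functions above is a small helper — the `cached_system` name is an assumption, but the block structure matches the Anthropic Messages API:

```python
def cached_system(prompt_text: str) -> list:
    """Return a `system` parameter whose text block is marked cacheable,
    so repeated agent calls with the same long prompt hit the cache."""
    return [{
        "type": "text",
        "text": prompt_text,
        "cache_control": {"type": "ephemeral"},
    }]

# Usage sketch: pass the wrapped prompt wherever a plain string was used, e.g.
# client.messages.create(..., system=cached_system(LONG_SPECIALIST_PROMPT), ...)
```

This pays off most for agents with long, unchanging system prompts that are invoked many times, such as the specialists behind a router.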

How do agents share state without a framework?

Pass state explicitly through function arguments. For simple systems, a shared dictionary works. For complex systems, use a message queue (Redis, NATS) or a shared database. The key principle is making state flow explicit rather than relying on implicit shared memory.
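The simplest explicit-state option is a small scratchpad object threaded through every agent call, mirroring the shared scratchpad in the diagram at the top of the article. A sketch — the `Scratchpad` class is an assumption, not a library type:

```python
from dataclasses import dataclass, field

@dataclass
class Scratchpad:
    """Explicit shared state passed between agents; no hidden globals."""
    notes: dict = field(default_factory=dict)

    def write(self, agent: str, content: str) -> None:
        # Each agent appends under its own name, so provenance is preserved
        self.notes.setdefault(agent, []).append(content)

    def read_all(self) -> str:
        # Flatten into a prompt-ready transcript, one tagged line per entry
        return "\n".join(
            f"[{agent}] {entry}"
            for agent, entries in self.notes.items()
            for entry in entries
        )
```

Each agent function takes the scratchpad as an argument, writes its findings, and the supervisor feeds `pad.read_all()` into the next prompt — the state flow stays visible in the call signatures.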


#Anthropic #Claude #MultiAgent #Architecture #Coordination #AgenticAI #LearnAI #AIEngineering
