
Building Multi-Agent Voice Systems with the OpenAI Agents SDK

A developer guide to building multi-agent voice systems with the OpenAI Agents SDK — triage, handoffs, shared state, and tool calling.

Why one agent is not enough

A single agent with fifty tools and a thousand-line system prompt will work — badly. It will hallucinate tool names, forget constraints, and generally underperform a smaller agent focused on one job. Multi-agent systems split the problem: a triage agent that identifies intent, specialist agents that handle each intent deeply, and handoffs that move the conversation between them without losing context.

This post walks through building a multi-agent voice system with the OpenAI Agents SDK, the same pattern CallSphere uses across its real estate, healthcare, and sales verticals.

caller → triage_agent
              │
              ├── buyer_intent ───► buyer_specialist
              ├── seller_intent ──► seller_specialist
              ├── rental_intent ──► rental_specialist
              └── tour_intent ────► tour_coordinator

Architecture overview

┌───────────────────────────────────────┐
│          Session state (shared)      │
│  • caller info                        │
│  • conversation history               │
│  • collected fields                   │
└──────────────┬────────────────────────┘
               │
               ▼
┌───────────────────────────────────────┐
│ Triage agent (thin, routing only)     │
└──────────────┬────────────────────────┘
               │ handoff
    ┌──────────┼──────────┐
    ▼          ▼          ▼
┌───────┐  ┌───────┐  ┌───────┐
│buyer  │  │seller │  │rental │
│agent  │  │agent  │  │agent  │
└───┬───┘  └───┬───┘  └───┬───┘
    │          │          │
    ▼          ▼          ▼
   tools      tools      tools

Prerequisites

  • Python 3.11+ and the openai-agents package.
  • An OpenAI API key with access to the Realtime models. The Agents SDK itself is an open-source package and needs no separate access.
  • Per-agent tool definitions.

Step-by-step walkthrough

1. Define the specialists and the triage agent

from agents import Agent, Runner, handoff

# search_listings, book_tour, create_valuation_lead, search_rentals, and book_showing
# are your own tool functions (decorated with @function_tool) defined elsewhere.

# Specialists come first so the triage agent can reference them in its handoffs.
buyer_agent = Agent(
    name="Buyer Specialist",
    instructions="You help home buyers. Ask qualifying questions, check availability, and book tours.",
    tools=[search_listings, book_tour],
)

seller_agent = Agent(
    name="Seller Specialist",
    instructions="You help home sellers. Collect property details and schedule valuation calls.",
    tools=[create_valuation_lead],
)

rental_agent = Agent(
    name="Rental Specialist",
    instructions="You help rental inquiries. Collect preferences and schedule showings.",
    tools=[search_rentals, book_showing],
)

# Triage carries no tools of its own: it only greets, classifies, and hands off.
triage = Agent(
    name="Triage",
    instructions=(
        "Greet the caller and identify whether they are buying, selling, or renting. "
        "Hand off to the correct specialist as soon as you know."
    ),
    handoffs=[handoff(buyer_agent), handoff(seller_agent), handoff(rental_agent)],
)

2. Share session state

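from agents import RunContextWrapper

class SessionState:
    """Per-call state. Pass it as context= to Runner.run; it is not sent to the model."""

    def __init__(self, call_id: str, caller_phone: str):
        self.call_id = call_id
        self.caller_phone = caller_phone
        self.collected = {}  # fields gathered so far during the call

# Tools and handoff callbacks receive the state as RunContextWrapper[SessionState]
# and read or update it through wrapper.context.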

3. Run the loop

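async def run_call(call_id: str, caller_phone: str, user_turns: list[str]):
    state = SessionState(call_id, caller_phone)
    current_agent = triage  # every call starts at triage
    messages = []
    for user_text in user_turns:
        messages.append({"role": "user", "content": user_text})
        result = await Runner.run(current_agent, input=messages, context=state)
        # result.final_output is the text to speak back to the caller.
        # Carry the full history (tool calls and handoffs included) into the next turn.
        messages = result.to_input_list()
        # Stay with whichever agent handled this turn instead of restarting at triage.
        current_agent = result.last_agent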

4. Handle handoffs cleanly

The SDK records each transfer as a HandoffOutputItem in result.new_items, with the source and target agents attached. Use it to log the handoff and keep the shared state consistent.

flowchart LR
    INPUT(["User input"])
    AGENT["Agent<br/>name plus instructions"]
    HAND{"Handoff to<br/>another agent?"}
    SUB["Sub-agent<br/>specialist"]
    GUARD{"Guardrail<br/>passed?"}
    TOOL["Tool call"]
    SDK[("Tracing<br/>OpenAI dashboard")]
    OUT(["Final output"])
    INPUT --> AGENT --> HAND
    HAND -->|Yes| SUB --> GUARD
    HAND -->|No| GUARD
    GUARD -->|Yes| TOOL --> AGENT
    GUARD -->|Block| OUT
    AGENT --> OUT
    AGENT --> SDK
    style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
    style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
    style SDK fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style OUT fill:#059669,stroke:#047857,color:#fff

from agents import HandoffOutputItem

async def observe(result):
    # result.new_items holds everything produced during the run, in order.
    for item in result.new_items:
        if isinstance(item, HandoffOutputItem):
            # log_handoff is your own async logging helper. The item carries no
            # reason field, so only the two agents are logged here.
            await log_handoff(item.source_agent, item.target_agent)

5. Bridge to the Realtime API

Route the user's audio-derived transcripts into the Runner and pipe the final_output back to the TTS side of the Realtime session. Keep one agent-SDK context per call.
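
The bridge can be sketched without committing to a particular transport. The version below is a minimal sketch under assumptions: transcripts is any async iterator of finalized user transcripts, run_turn wraps one agent turn and returns the text to speak, and speak hands that text to the speech side of the session. All three names are hypothetical, and none of them is an SDK or Realtime API call.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def bridge_call(
    transcripts: AsyncIterator[str],
    run_turn: Callable[[str], Awaitable[str]],
    speak: Callable[[str], Awaitable[None]],
) -> int:
    """Feed each transcript to the agent layer and speak the reply.

    Returns the number of turns handled. Turns are processed strictly in
    order, so replies are never spoken out of sequence.
    """
    turns = 0
    async for text in transcripts:
        text = text.strip()
        if not text:
            continue  # skip empty transcripts, such as silence
        reply = await run_turn(text)
        await speak(reply)
        turns += 1
    return turns


async def _demo() -> list[str]:
    spoken: list[str] = []

    async def fake_transcripts() -> AsyncIterator[str]:
        for line in ["I want to buy a house", "  ", "Three bedrooms"]:
            yield line

    async def fake_run_turn(text: str) -> str:
        return f"agent heard: {text}"

    async def fake_speak(text: str) -> None:
        spoken.append(text)

    handled = await bridge_call(fake_transcripts(), fake_run_turn, fake_speak)
    assert handled == 2  # the whitespace-only transcript is skipped
    return spoken


demo_output = asyncio.run(_demo())
```

In a real call, run_turn would wrap Runner.run with the current agent and the per-call state, and speak would write to the Realtime session.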

6. Guardrails per agent

Each specialist gets its own constraints: the buyer agent cannot book valuations, the seller agent cannot search listings. This prevents the combined prompt bloat that kills single-agent systems.
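
One way to make those constraints checkable is an explicit allowlist per agent, verified at startup. This is a sketch, not an SDK feature: the agent and tool names come from step 1, while TOOL_ALLOWLIST and check_tool_scopes are hypothetical helpers introduced for illustration.

```python
# Hypothetical allowlist: which tool names each specialist may carry.
TOOL_ALLOWLIST: dict[str, set[str]] = {
    "Buyer Specialist": {"search_listings", "book_tour"},
    "Seller Specialist": {"create_valuation_lead"},
    "Rental Specialist": {"search_rentals", "book_showing"},
}


def check_tool_scopes(configured: dict[str, list[str]]) -> list[str]:
    """Return a list of violations; an empty list means every agent is in scope."""
    violations = []
    for agent_name, tool_names in configured.items():
        # Unknown agents get an empty allowlist, so every tool is a violation.
        allowed = TOOL_ALLOWLIST.get(agent_name, set())
        for tool_name in tool_names:
            if tool_name not in allowed:
                violations.append(f"{agent_name} may not use {tool_name}")
    return violations


ok = check_tool_scopes({"Buyer Specialist": ["search_listings", "book_tour"]})
bad = check_tool_scopes(
    {"Seller Specialist": ["create_valuation_lead", "search_listings"]}
)
```

Running a check like this in CI or at process start catches a tool that has drifted onto the wrong specialist before it reaches a live call.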

Production considerations

  • State scope: shared session state is fine; shared mutable global state is not.
  • Handoff loops: add a max-handoff counter. The SDK's max_turns limit on Runner.run will eventually stop a loop by raising MaxTurnsExceeded, but only after you have paid for every wasted turn.
  • Tool permissions: agents only see the tools they need.
  • Telemetry: record which agent handled each turn for post-call analytics.
  • Handoff summaries: the outgoing agent should summarize what it learned so the incoming agent does not re-ask.

CallSphere's real implementation

CallSphere uses the OpenAI Agents SDK for every multi-agent vertical. Real estate runs 10 agents (triage, buyer, seller, rental, tour coordinator, qualification, finance, showing, negotiation, handoff-to-human). Healthcare combines 14 tools behind a lighter triage/specialist split. Salon runs 4 agents (receptionist, booking, upsell, recovery). After-hours escalation has 7 tools around an urgency-classifier triage. IT helpdesk pairs 10 tools with RAG behind a triage agent. The sales pod uses 5 GPT-4 specialists plus ElevenLabs TTS.

The voice plane under all of them is the OpenAI Realtime API with gpt-4o-realtime-preview-2025-06-03, PCM16 at 24kHz, and server VAD. Handoffs happen inside a single Realtime session so there is no audio drop between agents. A GPT-4o-mini post-call pipeline writes per-agent metrics so customers can see which specialist is closing and which is leaking. CallSphere supports 57+ languages with sub-second end-to-end latency.
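
For buffer sizing, the audio format above translates directly into byte counts. The arithmetic below is a sketch that assumes mono audio; the 20 ms frame size is an illustrative choice, not something stated in this article.

```python
SAMPLE_RATE_HZ = 24_000  # from the text above
BYTES_PER_SAMPLE = 2     # PCM16 means 16-bit samples
CHANNELS = 1             # assumption: mono audio


def pcm16_bytes(duration_ms: int) -> int:
    """Raw byte count for duration_ms of audio at the settings above."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


per_second = pcm16_bytes(1000)  # 24,000 samples * 2 bytes = 48,000 bytes
per_frame = pcm16_bytes(20)     # 480 samples * 2 bytes = 960 bytes
```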

Common pitfalls

  • Too many agents: 3-10 is a sweet spot; 20 is usually over-decomposed.
  • Specialists that re-ask basics: use handoff summaries.
  • Shared tools across specialists: defeats the point of role separation.
  • Handoff loops: cap the count and escalate on loop.
  • Ignoring per-agent evals: regressions hide in aggregate metrics.
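
A handoff summary can be as simple as rendering the collected fields into a short note for the incoming agent. The function below is a minimal sketch: build_handoff_summary and the field names are hypothetical, and how the note is injected (stored on session state, or prepended to the next agent's input) depends on your setup.

```python
def build_handoff_summary(
    from_agent: str,
    collected: dict[str, str],
    open_questions: list[str],
) -> str:
    """Render what the outgoing agent learned as a short note for the next agent."""
    lines = [f"Handoff from {from_agent}."]
    if collected:
        # Sort keys so the same state always produces the same note.
        known = "; ".join(f"{key}: {value}" for key, value in sorted(collected.items()))
        lines.append(f"Already known: {known}.")
    else:
        lines.append("Nothing collected yet.")
    if open_questions:
        lines.append("Still to ask: " + ", ".join(open_questions) + ".")
    lines.append("Do not re-ask anything listed as already known.")
    return " ".join(lines)


summary = build_handoff_summary(
    "Triage",
    {"intent": "buy", "city": "Austin"},
    ["budget", "timeline"],
)
```

Building the note from structured state instead of asking the model to summarize keeps it deterministic and cheap, at the cost of missing anything that was never written to the state.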

FAQ

Can I use this without the Realtime API?

Yes. The Agents SDK is transport-agnostic; Realtime is just one front-end.

How do I A/B test a single agent in a multi-agent graph?

Version the agent separately and route X% of triage handoffs to the new version.
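
The percentage split can be made deterministic by hashing the call ID, so a call stays on one version from start to finish. This is a sketch under assumptions: pick_variant and the "candidate" and "control" labels are hypothetical, and mapping each label to an actual agent version is left to the caller.

```python
import hashlib


def pick_variant(call_id: str, rollout_percent: int) -> str:
    """Deterministically assign a call to "candidate" or "control".

    Hashing the call ID keeps the assignment stable for the whole call,
    so a caller never switches versions mid-conversation.
    """
    if not 0 <= rollout_percent <= 100:
        raise ValueError("rollout_percent must be between 0 and 100")
    digest = hashlib.sha256(call_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in 0..99
    return "candidate" if bucket < rollout_percent else "control"


variant = pick_variant("call-0001", 20)
```

Triage can then hand off to the new specialist version when the result is "candidate" and to the current one otherwise. Logging the variant with each call keeps per-agent evals comparable.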

What is a reasonable number of tools per specialist?

3-10. Past 15 the model starts confusing tool signatures.

How do I handle human escalation?

Add a transfer_to_human tool on every specialist and a dedicated escalation agent.

Does handoff cost extra tokens?

Yes, but less than the equivalent monolithic prompt.

Next steps

Want to see a 10-agent real-estate stack running live? Book a demo, read the technology page, or see pricing.
