Building Multi-Agent Voice Systems with the OpenAI Agents SDK
A developer guide to building multi-agent voice systems with the OpenAI Agents SDK — triage, handoffs, shared state, and tool calling.
Why one agent is not enough
A single agent with fifty tools and a thousand-line system prompt will work — badly. It will hallucinate tool names, forget constraints, and generally underperform a smaller agent focused on one job. Multi-agent systems split the problem: a triage agent that identifies intent, specialist agents that handle each intent deeply, and handoffs that move the conversation between them without losing context.
This post walks through building a multi-agent voice system with the OpenAI Agents SDK, the same pattern CallSphere uses across its real estate, healthcare, and sales verticals.
caller → triage_agent
             │
             ├── buyer_intent ───► buyer_specialist
             ├── seller_intent ──► seller_specialist
             ├── rental_intent ──► rental_specialist
             └── tour_intent ────► tour_coordinator
Architecture overview
┌───────────────────────────────────────┐
│ Session state (shared)                │
│  • caller info                        │
│  • conversation history               │
│  • collected fields                   │
└──────────────┬────────────────────────┘
               │
               ▼
┌───────────────────────────────────────┐
│ Triage agent (thin, routing only)     │
└──────────────┬────────────────────────┘
               │ handoff
    ┌──────────┼──────────┐
    ▼          ▼          ▼
┌───────┐  ┌───────┐  ┌───────┐
│buyer  │  │seller │  │rental │
│agent  │  │agent  │  │agent  │
└───┬───┘  └───┬───┘  └───┬───┘
    │          │          │
    ▼          ▼          ▼
  tools      tools      tools
Prerequisites
- Python 3.11+ and the openai-agents package.
- An OpenAI key with Realtime + Agents SDK access.
- Per-agent tool definitions.
Step-by-step walkthrough
1. Define the specialists and the triage agent
from agents import Agent, Runner, handoff

# search_listings, book_tour, create_valuation_lead, search_rentals, and book_showing
# are the per-agent tools from the prerequisites; define them before this block.
buyer_agent = Agent(
    name="Buyer Specialist",
    instructions="You help home buyers. Ask qualifying questions, check availability, and book tours.",
    tools=[search_listings, book_tour],
)

seller_agent = Agent(
    name="Seller Specialist",
    instructions="You help home sellers. Collect property details and schedule valuation calls.",
    tools=[create_valuation_lead],
)

rental_agent = Agent(
    name="Rental Specialist",
    instructions="You help rental inquiries. Collect preferences and schedule showings.",
    tools=[search_rentals, book_showing],
)

# Triage has no tools of its own; it only routes to a specialist.
triage = Agent(
    name="Triage",
    instructions=(
        "Greet the caller and identify whether they are buying, selling, or renting. "
        "Hand off to the correct specialist as soon as you know."
    ),
    handoffs=[handoff(buyer_agent), handoff(seller_agent), handoff(rental_agent)],
)
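The specialists above reference tools such as search_listings and book_tour that this post does not define. As a rough sketch of what two of them might look like, assuming an in-memory listings table: the Listing type, the _LISTINGS data, the parameter names, and the return shapes are all illustrative assumptions, not CallSphere's implementation. With the Agents SDK you would typically wrap functions like these with the function_tool decorator before passing them to tools=[...].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    listing_id: str
    city: str
    price: int
    bedrooms: int


# Hypothetical in-memory data standing in for a real listings backend.
_LISTINGS = [
    Listing("L-100", "Austin", 450_000, 3),
    Listing("L-101", "Austin", 625_000, 4),
    Listing("L-102", "Dallas", 380_000, 2),
]


def search_listings(city: str, max_price: int, min_bedrooms: int = 0) -> list[dict]:
    """Return listings in `city` at or under `max_price`, cheapest first."""
    matches = [
        item for item in _LISTINGS
        if item.city.lower() == city.lower()
        and item.price <= max_price
        and item.bedrooms >= min_bedrooms
    ]
    matches.sort(key=lambda item: item.price)
    return [
        {"listing_id": m.listing_id, "price": m.price, "bedrooms": m.bedrooms}
        for m in matches
    ]


def book_tour(listing_id: str, caller_phone: str, slot: str) -> dict:
    """Book a tour if the listing exists; otherwise return a structured error."""
    if not any(item.listing_id == listing_id for item in _LISTINGS):
        return {"ok": False, "error": f"unknown listing {listing_id}"}
    return {"ok": True, "listing_id": listing_id, "caller_phone": caller_phone, "slot": slot}
```

Returning a structured error instead of raising gives the model something it can read and react to, for example by asking the caller to repeat the listing.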
2. Share session state
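from agents import RunContextWrapper  # SDK wrapper type; tools read the state via wrapper.context

class SessionState:
    """One instance per call, passed to Runner.run(..., context=state)."""

    def __init__(self, call_id: str, caller_phone: str):
        self.call_id = call_id
        self.caller_phone = caller_phone
        self.collected = {}  # fields gathered so far, shared by every agent on the call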
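To make the purpose of the collected dictionary concrete, here is a small stdlib-only sketch of how agents might write to and read from it. REQUIRED_FIELDS, record_field, and missing_fields are hypothetical names introduced for illustration. The sketch passes the state object around directly rather than going through the SDK's context wrapper.

```python
class SessionState:
    """Same shape as the SessionState above, repeated so this sketch runs on its own."""

    def __init__(self, call_id: str, caller_phone: str):
        self.call_id = call_id
        self.caller_phone = caller_phone
        self.collected = {}


# Hypothetical per-intent requirements; a real deployment would define its own.
REQUIRED_FIELDS = {
    "buyer": ("budget", "city", "bedrooms"),
    "seller": ("address", "timeline"),
}


def record_field(state: SessionState, name: str, value) -> None:
    """Store a value learned during the call so later agents can reuse it."""
    state.collected[name] = value


def missing_fields(state: SessionState, intent: str) -> list[str]:
    """List the fields a specialist still has to ask for, in a stable order."""
    return [f for f in REQUIRED_FIELDS[intent] if f not in state.collected]
```

A specialist that checks missing_fields before asking a question will skip anything triage has already captured.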
3. Run the loop
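async def run_call(call_id: str, caller_phone: str, user_turns: list[str]):
    state = SessionState(call_id, caller_phone)
    current_agent = triage  # every call starts at triage
    messages = []
    for user_text in user_turns:
        messages.append({"role": "user", "content": user_text})
        result = await Runner.run(current_agent, input=messages, context=state)
        # Carry the full history (messages, tool calls, handoffs) into the next turn.
        messages = result.to_input_list()
        # Stay with whichever agent finished the turn, so a specialist keeps the call after a handoff.
        current_agent = result.last_agent
    return messages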
4. Handle handoffs cleanly
The SDK records each handoff in the run result: result.new_items contains a HandoffOutputItem, with source_agent and target_agent fields, for every transfer. Use it to log the handoff and keep the shared state consistent.
flowchart LR
INPUT(["User input"])
AGENT["Agent<br/>name plus instructions"]
HAND{"Handoff to<br/>another agent?"}
SUB["Sub-agent<br/>specialist"]
GUARD{"Guardrail<br/>passed?"}
TOOL["Tool call"]
SDK[("Tracing<br/>OpenAI dashboard")]
OUT(["Final output"])
INPUT --> AGENT --> HAND
HAND -->|Yes| SUB --> GUARD
HAND -->|No| GUARD
GUARD -->|Yes| TOOL --> AGENT
GUARD -->|Block| OUT
AGENT --> OUT
AGENT --> SDK
style AGENT fill:#4f46e5,stroke:#4338ca,color:#fff
style GUARD fill:#f59e0b,stroke:#d97706,color:#1f2937
style SDK fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
style OUT fill:#059669,stroke:#047857,color:#fff
from agents import HandoffOutputItem

async def observe(result):
    # new_items lists everything the run produced: messages, tool calls, and handoffs.
    for item in result.new_items:
        if isinstance(item, HandoffOutputItem):
            # log_handoff is your own logging helper, not part of the SDK.
            await log_handoff(item.source_agent.name, item.target_agent.name)
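The observer above calls log_handoff, which is left undefined. One possible shape for it, as a sketch: it appends to an in-memory list (HANDOFF_LOG is a hypothetical stand-in for a database or tracing backend) and accepts either agent objects or plain names, with an optional free-text reason.

```python
import asyncio
import time

# In-memory sink for illustration; a real system would persist these records.
HANDOFF_LOG: list[dict] = []


async def log_handoff(from_agent, to_agent, reason=None) -> dict:
    """Record one handoff. Accepts objects with a .name attribute or plain strings."""
    record = {
        "ts": time.time(),
        "from": getattr(from_agent, "name", from_agent),
        "to": getattr(to_agent, "name", to_agent),
        "reason": reason,
    }
    HANDOFF_LOG.append(record)
    return record
```

Keeping the function async lets you later swap the list append for a non-blocking database write without touching the callers.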
5. Bridge to the Realtime API
Route the user's audio-derived transcripts into the Runner and pipe the final_output back to the TTS side of the Realtime session. Keep one agent-SDK context per call.
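The bridge described above can be sketched without committing to a particular transport. In this sketch, bridge_call, run_turn, and speak are hypothetical names: run_turn stands for "send this transcript through the Runner and return final_output", and speak stands for "hand this text to the TTS side of the Realtime session". The fake_* helpers exist only so the example runs without audio or network access.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable


async def bridge_call(
    transcripts: AsyncIterator[str],
    run_turn: Callable[[str], Awaitable[str]],
    speak: Callable[[str], Awaitable[None]],
) -> int:
    """Feed each finalized transcript to the agent layer and speak the reply.

    Returns the number of turns that were actually handled.
    """
    turns = 0
    async for text in transcripts:
        text = text.strip()
        if not text:  # skip empty transcripts (silence, noise)
            continue
        reply = await run_turn(text)
        await speak(reply)
        turns += 1
    return turns


# --- Stand-ins so the sketch runs on its own ---
async def fake_transcripts():
    for t in ["Hi, I'm looking to buy a house", "   ", "Three bedrooms in Austin"]:
        yield t


async def echo_agent(text: str) -> str:
    return f"(agent reply to: {text})"


spoken: list[str] = []


async def fake_speak(reply: str) -> None:
    spoken.append(reply)
```

Calling asyncio.run(bridge_call(fake_transcripts(), echo_agent, fake_speak)) handles two turns here, because the whitespace-only transcript is skipped. One bridge_call per phone call matches the advice to keep one agent-SDK context per call.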
6. Guardrails per agent
Each specialist gets its own constraints: the buyer agent cannot book valuations, the seller agent cannot search listings. This prevents the combined prompt bloat that kills single-agent systems.
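One way to make these per-agent constraints checkable is an explicit allow-list, sketched below. The agent and tool names come from the definitions earlier in this post; ALLOWED_TOOLS, ToolNotPermitted, check_tool_call, and overlapping_tools are hypothetical helpers, not SDK features.

```python
# Which tools each agent may call. Names mirror the agents defined earlier.
ALLOWED_TOOLS = {
    "Buyer Specialist": {"search_listings", "book_tour"},
    "Seller Specialist": {"create_valuation_lead"},
    "Rental Specialist": {"search_rentals", "book_showing"},
    "Triage": set(),  # routing only, no tools
}


class ToolNotPermitted(Exception):
    """Raised when an agent attempts a tool outside its role."""


def check_tool_call(agent_name: str, tool_name: str) -> None:
    """Raise if an agent tries to call a tool outside its allow-list."""
    allowed = ALLOWED_TOOLS.get(agent_name, set())
    if tool_name not in allowed:
        raise ToolNotPermitted(f"{agent_name} may not call {tool_name}")


def overlapping_tools() -> dict[str, list[str]]:
    """Report tools granted to more than one agent, a sign of blurred roles."""
    owners: dict[str, list[str]] = {}
    for agent, tools in ALLOWED_TOOLS.items():
        for tool in tools:
            owners.setdefault(tool, []).append(agent)
    return {tool: agents for tool, agents in owners.items() if len(agents) > 1}
```

Running overlapping_tools() in a unit test is a cheap way to notice when a tool quietly gets shared between specialists.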
Production considerations
- State scope: shared session state is fine; shared mutable global state is not.
- Handoff loops: add a max-handoff counter. Runner.run's max_turns limit will eventually stop a runaway loop by raising MaxTurnsExceeded, but only after the wasted turns have been paid for.
- Tool permissions: agents only see the tools they need.
- Telemetry: record which agent handled each turn for post-call analytics.
- Handoff summaries: the outgoing agent should summarize what it learned so the incoming agent does not re-ask.
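Two of the points above, the handoff cap and per-turn telemetry, fit in one small tracker. This is a sketch under assumptions: CallTracker and HandoffLimitExceeded are hypothetical names, and the default cap of 4 is an arbitrary example value rather than a recommendation from the SDK.

```python
from collections import Counter


class HandoffLimitExceeded(Exception):
    """Raised when a call bounces between agents more often than allowed."""


class CallTracker:
    def __init__(self, max_handoffs: int = 4):  # 4 is an arbitrary example cap
        self.max_handoffs = max_handoffs
        self.handoffs = []  # ordered (source, target) pairs
        self.turns_by_agent = Counter()

    def record_turn(self, agent_name: str) -> None:
        """Telemetry: count which agent handled each turn."""
        self.turns_by_agent[agent_name] += 1

    def record_handoff(self, source: str, target: str) -> None:
        """Count the handoff and stop the call from looping past the cap."""
        self.handoffs.append((source, target))
        if len(self.handoffs) > self.max_handoffs:
            raise HandoffLimitExceeded(
                f"{len(self.handoffs)} handoffs exceeds cap of {self.max_handoffs}"
            )
```

Catching HandoffLimitExceeded in the call loop is a natural place to escalate to a human instead of letting the agents keep passing the caller around.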
CallSphere's real implementation
CallSphere uses the OpenAI Agents SDK for every multi-agent vertical. Real estate runs 10 agents (triage, buyer, seller, rental, tour coordinator, qualification, finance, showing, negotiation, handoff-to-human). Healthcare combines 14 tools behind a lighter triage/specialist split. Salon runs 4 agents (receptionist, booking, upsell, recovery). After-hours escalation has 7 tools around an urgency-classifier triage. IT helpdesk pairs 10 tools with RAG behind a triage agent. The sales pod uses 5 GPT-4 specialists plus ElevenLabs TTS.
The voice plane under all of them is the OpenAI Realtime API with gpt-4o-realtime-preview-2025-06-03, PCM16 at 24kHz, and server VAD. Handoffs happen inside a single Realtime session so there is no audio drop between agents. A GPT-4o-mini post-call pipeline writes per-agent metrics so customers can see which specialist is closing and which is leaking. CallSphere supports 57+ languages with sub-second end-to-end latency.
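The audio format above translates directly into buffer sizes, which is useful when sizing frames for the voice plane. A small worked calculation, assuming single-channel (mono) audio; the function names are illustrative.

```python
SAMPLE_RATE_HZ = 24_000   # 24kHz, as stated above
BYTES_PER_SAMPLE = 2      # PCM16 means 16 bits per sample
CHANNELS = 1              # assumption: mono audio


def pcm16_bytes(duration_ms: int) -> int:
    """Bytes needed to hold `duration_ms` of audio in this format."""
    samples = SAMPLE_RATE_HZ * duration_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def pcm16_duration_ms(num_bytes: int) -> float:
    """Playback duration, in milliseconds, of a buffer of `num_bytes`."""
    return num_bytes * 1000 / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)
```

Under these assumptions one second of audio is 48,000 bytes and a 20 ms frame is 960 bytes.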
Common pitfalls
- Too many agents: 3-10 is a sweet spot; 20 is usually over-decomposed.
- Specialists that re-ask basics: use handoff summaries.
- Shared tools across specialists: defeats the point of role separation.
- Handoff loops: cap the count and escalate on loop.
- Ignoring per-agent evals: regressions hide in aggregate metrics.
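For the handoff-summary fix mentioned above, a minimal sketch of a summary builder follows. build_handoff_summary is a hypothetical helper; the wording of the summary is an illustrative choice, and in practice the text would be placed where the incoming agent can read it, such as its input history.

```python
def build_handoff_summary(
    source: str,
    target: str,
    collected: dict,
    open_questions: list[str],
) -> str:
    """Summarize what the outgoing agent learned so the next one does not re-ask."""
    lines = [f"Handoff from {source} to {target}."]
    if collected:
        # Sort keys so the summary is stable across runs.
        known = "; ".join(f"{key}={value}" for key, value in sorted(collected.items()))
        lines.append(f"Already known (do not re-ask): {known}.")
    if open_questions:
        lines.append("Still needed: " + ", ".join(open_questions) + ".")
    return "\n".join(lines)
```

Feeding the shared state's collected dictionary into this function keeps the summary and the stored fields from drifting apart.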
FAQ
Can I use this without the Realtime API?
Yes. The Agents SDK is transport-agnostic; Realtime is just one front-end.
How do I A/B test a single agent in a multi-agent graph?
Version the agent separately and route X% of triage handoffs to the new version.
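Routing X% of handoffs to a new version works best when the assignment is deterministic per call, so a caller never flips versions mid-conversation. A sketch using hash-based bucketing; ab_bucket, pick_agent_version, and the experiment name "buyer-v2" are hypothetical.

```python
import hashlib


def ab_bucket(call_id: str, experiment: str, percent_new: int) -> str:
    """Deterministically assign a call to 'new' or 'control'.

    The same call_id and experiment always land in the same bucket.
    """
    digest = hashlib.sha256(f"{experiment}:{call_id}".encode()).digest()
    slot = int.from_bytes(digest[:8], "big") % 100  # 0..99
    return "new" if slot < percent_new else "control"


def pick_agent_version(call_id: str, agents_by_version: dict, percent_new: int = 10):
    """Choose which version of a specialist triage should hand off to."""
    return agents_by_version[ab_bucket(call_id, "buyer-v2", percent_new)]
```

Including the experiment name in the hash keeps bucket assignments independent across experiments, so the same callers are not always the ones in the test group.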
What is a reasonable number of tools per specialist?
3-10. Past 15 the model starts confusing tool signatures.
How do I handle human escalation?
Add a transfer_to_human tool on every specialist and a dedicated escalation agent.
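As a sketch of what the transfer_to_human tool might do behind the scenes: EscalationQueue is a hypothetical in-memory stand-in for a real call-transfer or ticketing backend, and the ticket fields are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class EscalationQueue:
    """In-memory stand-in for a real call-transfer or ticketing backend."""

    tickets: list = field(default_factory=list)

    def transfer_to_human(
        self, call_id: str, from_agent: str, reason: str, collected: dict
    ) -> dict:
        """Create an escalation ticket carrying everything learned so far."""
        ticket = {
            "ticket_id": f"esc-{len(self.tickets) + 1}",
            "call_id": call_id,
            "from_agent": from_agent,
            "reason": reason,
            "collected": dict(collected),  # copy so later edits don't alter the ticket
        }
        self.tickets.append(ticket)
        return ticket
```

Passing the collected fields along means the human who picks up the call starts with the same context the specialist had.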
Does handoff cost extra tokens?
Yes, but less than the equivalent monolithic prompt.
Next steps
Want to see a 10-agent real-estate stack running live? Book a demo, read the technology page, or see pricing.