Hierarchical Agent Handoffs (OpenAI Agents SDK) vs Vapi Squads

Triage to specialist to return-to-orchestrator pattern explained with code. CallSphere's OpenAI Agents SDK handoffs vs Vapi Squads' linear chain.

TL;DR

CallSphere's voice agents use a triage → specialist → return-to-orchestrator pattern via the OpenAI Agents SDK. Vapi's answer to multi-agent is Squads, which chain agents linearly without a return path. The hierarchical pattern lets a caller bounce between three or four specialists in one call without losing context. The linear pattern works when the conversation has a fixed sequence and the caller doesn't switch topics. Most production voice AI workloads — healthcare intake, real estate buyer calls, IT helpdesk — benefit from hierarchical handoffs because the caller controls the topic and changes it mid-call.

The Multi-Agent Pattern That Actually Scales

A single mega-prompt that handles every skill collapses to around 70% reliability under production load. Every team that scales past one workflow ends up rebuilding around multiple specialist agents. The architectural question is how those agents are wired together.

Two dominant patterns:

  1. Linear chain: Agent A talks until handoff, then Agent B talks until handoff, then Agent C closes. No return path. (Vapi Squads.)
  2. Hierarchical with return: A triage agent at the root dispatches to specialists. Specialists can hand back to triage or to peers. State persists across handoffs. (CallSphere via OpenAI Agents SDK.)

Linear is simpler to design. Hierarchical handles caller-driven topic changes natively.
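
To make the difference concrete, here is a minimal sketch in plain Python that is not tied to either vendor's API. The agent names and the `route` helper are hypothetical. Each topology is modeled as a map from an agent to the agents it may hand off to, and `route` tries to serve a caller who asks for agents in a given order.

```python
# Hypothetical model: each topology maps an agent to the agents it may hand off to.
LINEAR = {"opener": ["qualifier"], "qualifier": ["closer"], "closer": []}
HIERARCHICAL = {
    "triage": ["billing", "scheduling"],
    "billing": ["triage"],
    "scheduling": ["triage"],
}


def route(topology, start, requested):
    """Follow the requested agents in order; return the path taken, or None if stuck.

    A request is served by a direct handoff when one exists. Otherwise the
    current agent hands to a neighbor that can reach the target in one more
    hop, which models a return to triage followed by a re-dispatch.
    """
    path = [start]
    current = start
    for target in requested:
        if target == current:
            continue
        if target in topology[current]:
            path.append(target)
        else:
            via = next((n for n in topology[current] if target in topology[n]), None)
            if via is None:
                return None  # no handoff sequence reaches the requested agent
            path.extend([via, target])
        current = target
    return path
```

In this toy model, a caller who goes billing, then scheduling, then back to billing is served by the hierarchical graph through repeated returns to triage, while a caller who asks the linear chain to go back from closer to qualifier gets `None`.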

Vapi Squads in Detail

Vapi Squads define a sequence of agent definitions that participate in a call. Each agent has its own system prompt, tools, and voice. Transitions are triggered either by the agent calling a handoff function or by Vapi's matching logic.

What works:

  • Cleanly separates concerns. Sales script agents — opener, qualifier, closer — fit Squads perfectly.
  • Voice changes per agent give an audible cue when the call shifts.
  • Each agent has a focused prompt and toolset.

What doesn't:

  • No return path. Once you hand off to a specialist, you can't return to the original agent without redesigning the chain.
  • No central listener. There's no triage agent always present that can intercept "wait, actually, can you check my balance again?" mid-call.
  • State sharing is metadata-shaped. Persistent state across agents is bolt-on rather than first-class.

CallSphere Hierarchical Handoffs

The OpenAI Agents SDK gives every agent a list of handoffs — other agents it can transfer the conversation to. The runtime tracks the active agent and forwards new turns to it. Importantly, specialists can hand back to triage or to peers, which gives you arbitrary topology, not just a linear chain.

Production topologies on CallSphere:

  • Real Estate: Triage + 10 specialists (Property Search with vision, Buyer Lead, Seller Lead, Mortgage Pre-Qual, Tour Scheduling, Listing Inquiry, Open House, Market Analytics, Closing Coordinator). Up to 6 specialist visits per call.
  • Salon: Triage + 4 specialists (Booking, Service Recommendation, Reminder/Reschedule).
  • IT Helpdesk: Triage + 10 specialists with ChromaDB RAG behind the answer specialist.
  • After-Hours: Triage + 7 specialists with escalation policy.
  • Sales: 5 GPT-4 specialists + ElevenLabs "Sarah" voice.
  • Healthcare: Single Head Agent + 14 function-calling tools (single-domain depth, not multi-agent breadth).

Code-Level Pattern

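from agents import Agent

# The tool functions referenced below (search_listings, analyze_photo, and so on)
# are assumed to be defined elsewhere, for example with the SDK's @function_tool decorator.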

property_search = Agent(
    name="Property Search",
    instructions="Search listings, including buyer-uploaded photos via vision.",
    tools=[search_listings, analyze_photo],
)

buyer_lead = Agent(
    name="Buyer Lead",
    instructions="Qualify budget, timeline, and preapproval. Hand off to property_search if listing-specific. Hand back to triage if scope changes.",
    tools=[qualify_buyer, capture_contact],
    handoffs=[property_search],
)

mortgage_prequal = Agent(
    name="Mortgage Pre-Qual",
    instructions="Walk caller through pre-qual estimate.",
    tools=[run_prequal_estimate],
)

tour_scheduling = Agent(
    name="Tour Scheduling",
    instructions="Book a property tour at the listing.",
    tools=[book_tour, send_confirmation_sms],
)

triage = Agent(
    name="Triage",
    instructions=(
        "Listen for caller intent. Route to specialist. "
        "Return here when specialist completes."
    ),
    handoffs=[buyer_lead, mortgage_prequal, tour_scheduling, property_search],
)

# Wire return paths
buyer_lead.handoffs.append(triage)
mortgage_prequal.handoffs.append(triage)
tour_scheduling.handoffs.append(triage)
property_search.handoffs.append(triage)

The graph is now a hub and spoke: Triage ↔ each of the four specialists, plus a direct Buyer Lead → Property Search edge. Any specialist can return to Triage, and Buyer Lead can hand directly to Property Search without bouncing through Triage.
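
A topology like this can be checked before it ships. The sketch below is plain Python and does not use the SDK: it copies the handoff edges from the example above into a dictionary and uses breadth-first search to list any agent that Triage can dispatch to but that has no route back. The `HANDOFFS` dictionary and both helper names are hypothetical.

```python
from collections import deque

# Handoff edges as wired in the example above (agent name -> allowed targets).
HANDOFFS = {
    "Triage": ["Buyer Lead", "Mortgage Pre-Qual", "Tour Scheduling", "Property Search"],
    "Buyer Lead": ["Property Search", "Triage"],
    "Property Search": ["Triage"],
    "Mortgage Pre-Qual": ["Triage"],
    "Tour Scheduling": ["Triage"],
}


def can_reach(graph, source, target):
    """Breadth-first search over handoff edges."""
    seen, queue = {source}, deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def agents_without_return_path(graph, root="Triage"):
    """List agents the root can dispatch to that cannot hand control back."""
    return sorted(
        agent
        for agent in graph
        if agent != root
        and can_reach(graph, root, agent)
        and not can_reach(graph, agent, root)
    )
```

For the wiring above, the check returns an empty list. Removing Triage from a specialist's targets makes that specialist show up in the result, which is the kind of regression a test on the topology could catch.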

Sequence Diagram of a Real Call

sequenceDiagram
    participant Caller
    participant T as Triage
    participant BL as Buyer Lead
    participant PS as Property Search
    participant MQ as Mortgage Pre-Qual
    participant TS as Tour Scheduling
    Caller->>T: "Hi, I'm interested in buying"
    T->>BL: handoff(intent="buyer")
    Caller->>BL: budget + timeline
    Caller->>BL: "Here's a photo I texted you"
    BL->>PS: handoff(reason="listing photo")
    PS-->>Caller: "That's listing 123, $689k, 3BR"
    PS->>BL: handoff(complete)
    Caller->>BL: "Am I pre-qualified?"
    BL->>MQ: handoff(reason="prequal check")
    MQ-->>Caller: pre-qual estimate
    MQ->>BL: handoff(complete)
    BL->>TS: handoff(reason="schedule tour")
    TS-->>Caller: tour confirmed Saturday 2pm
    TS->>T: handoff(complete)
    T-->>Caller: closing summary

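Six specialist visits, two hand-backs to Buyer Lead, and one return to Triage, all in one call. The peer transitions shown here (Buyer Lead to Mortgage Pre-Qual and Tour Scheduling, plus the hand-backs to Buyer Lead) require those agents in each other's handoffs lists, in addition to the wiring in the code above. The same flow on a linear Squad would have to be designed up-front and would not handle the caller's "actually, am I prequalified?" detour gracefully.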
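
A call trace can be compared against the declared topology mechanically. The sketch below is plain Python with hypothetical variable names. It lists the agent-to-agent transitions from the diagram, counts visits, and reports any transition that is not an edge in the code-level wiring shown earlier.

```python
# Agent-to-agent transitions copied from the sequence diagram, in order.
TRANSITIONS = [
    ("Triage", "Buyer Lead"),
    ("Buyer Lead", "Property Search"),
    ("Property Search", "Buyer Lead"),
    ("Buyer Lead", "Mortgage Pre-Qual"),
    ("Mortgage Pre-Qual", "Buyer Lead"),
    ("Buyer Lead", "Tour Scheduling"),
    ("Tour Scheduling", "Triage"),
]

# Edges declared in the code-level pattern earlier in the article.
WIRED = {
    "Triage": {"Buyer Lead", "Mortgage Pre-Qual", "Tour Scheduling", "Property Search"},
    "Buyer Lead": {"Property Search", "Triage"},
    "Property Search": {"Triage"},
    "Mortgage Pre-Qual": {"Triage"},
    "Tour Scheduling": {"Triage"},
}

# Every transition that lands on a non-triage agent counts as a specialist visit.
specialist_visits = sum(1 for _, dst in TRANSITIONS if dst != "Triage")
returns_to_triage = sum(1 for _, dst in TRANSITIONS if dst == "Triage")

# Transitions in the trace that the declared wiring does not allow.
missing_edges = [(src, dst) for src, dst in TRANSITIONS if dst not in WIRED[src]]
```

Here `missing_edges` contains the four peer transitions that involve Buyer Lead and are not in the wiring, which is the set of edges to add if the diagram's flow is the intended one.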

Capability Comparison

| Capability | CallSphere (Agents SDK) | Vapi Squads |
| --- | --- | --- |
| Topology | Hierarchical with return | Linear chain |
| Triage agent | Native | Workaround |
| Return-to-orchestrator | Native | No |
| Vision-capable specialist | Yes (Real Estate) | DIY |
| RAG-backed specialist | Yes (IT + ChromaDB) | DIY |
| Specialists per vertical | Up to 10 | Limited by chain length |
| Shared session state | SDK-managed | Manual via metadata |
| Mid-call topic switch | Native | Not natively supported |
| Tool subset per agent | Yes | Yes |
| Voice change per agent | Optional | Native |

State Sharing Across Handoffs

The Agents SDK passes session state forward to the next agent. CallSphere extends this with a vertical-specific context object (caller name, intent, captured fields, prior tool outputs). Specialists read what they need and add what they discover. Triage reads the final state to compose the closing summary.
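
As an illustration of what such a context object might look like, here is a minimal sketch in plain Python. The fields follow the list above. The class name, the helper names, and the sample values in the usage note are hypothetical, and the sketch does not show how CallSphere or the SDK actually carries the object between agents.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class CallContext:
    """Hypothetical per-call state shared by every agent in the graph."""

    caller_name: Optional[str] = None
    intent: Optional[str] = None
    captured_fields: Dict[str, str] = field(default_factory=dict)
    # Each entry is (agent name, short description of the tool output).
    tool_outputs: List[Tuple[str, str]] = field(default_factory=list)


def record(ctx, agent, output, **fields):
    """A specialist adds what it discovered before handing off."""
    ctx.captured_fields.update(fields)
    ctx.tool_outputs.append((agent, output))


def closing_summary(ctx):
    """Triage composes the summary from everything the specialists recorded."""
    lines = [f"Caller: {ctx.caller_name or 'unknown'} (intent: {ctx.intent or 'unknown'})"]
    lines += [f"- {agent}: {output}" for agent, output in ctx.tool_outputs]
    return "\n".join(lines)
```

A specialist would call something like `record(ctx, "Property Search", "matched listing 123", listing_id="123")`, and Triage would call `closing_summary(ctx)` once control returns to it at the end of the call.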

In Squads, state sharing is metadata fields you populate on transitions. Possible, but more code.

When Linear Squads Are the Right Tool

Linear works when:

  • The call has a fixed script (sales call: opener → qualifier → closer).
  • The caller does not control the topic.
  • You want a different voice per agent for an audible script cue.

CallSphere uses a roughly linear pattern in Sales (5 GPT-4 specialists + Sarah voice) for exactly that reason.

When Hierarchical Wins

Hierarchical wins when:

  • The caller controls the topic (healthcare intake, IT helpdesk, real estate buyer).
  • More than two skill domains share one call.
  • You want to compose a closing summary from all specialist outputs at the end.

This covers most enterprise voice AI workloads in 2026.

FAQ

Can Vapi do hierarchical handoffs at all?

You can simulate them by giving every Squad agent a transfer back to a known triage agent, but the Squads pattern is linear at its core and the experience reflects that. State sharing is also more manual.

How does CallSphere prevent infinite handoff loops?

Each agent has a max-handoff-depth budget per call. Triage tracks the count and short-circuits with a polite escalation if the depth exceeds the budget.
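
One way such a budget could be enforced is a small counter that the call loop consults on every transition. This is a sketch under assumptions, not CallSphere's implementation: the class names, the exception, and the idea of raising to trigger escalation are all hypothetical.

```python
class HandoffBudgetExceeded(Exception):
    """Raised when a call uses up its handoff budget."""


class HandoffBudget:
    """Counts agent-to-agent transitions for one call (hypothetical helper)."""

    def __init__(self, max_handoffs):
        self.max_handoffs = max_handoffs
        self.count = 0

    def record(self, from_agent, to_agent):
        """Register one transition; raise once the budget is exceeded."""
        self.count += 1
        if self.count > self.max_handoffs:
            raise HandoffBudgetExceeded(
                f"{self.count} handoffs exceeds budget of {self.max_handoffs} "
                f"(last: {from_agent} -> {to_agent})"
            )
```

The caller of `record` would catch `HandoffBudgetExceeded` and switch to the escalation path instead of performing the handoff.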

What about cost? Don't more agents mean more tokens?

The system prompt for each specialist is much shorter than a mega-prompt would be. Total tokens per call are typically lower than a single-agent design because each specialist sees only the relevant context.

How do specialists know when to hand back?

Each specialist's instructions tell it the completion criteria. The Agents SDK exposes every agent in a specialist's handoffs list to the model as a callable tool (named transfer_to_<agent_name> by default). The triage agent's instructions reinforce "specialists will return here when done."

Does the caller hear the handoff?

By default, no. The handoff is silent. CallSphere optionally adds a "let me get a specialist for you" line for some verticals (after-hours escalation), tunable per workflow.

Designing Your Own Agent Topology

If you're considering building a multi-agent voice stack, here's the design heuristic CallSphere uses across verticals.

Step one: list every distinct skill the agent needs. Not categories — specific verbs. "Qualify a buyer," "search listings," "estimate mortgage pre-qual," "book a tour."

Step two: cluster the skills by who controls the topic. If a single skill always follows another in fixed order, they belong together. If a caller can request a skill out of order, it deserves its own specialist.

Step three: identify the triage role. Triage doesn't do work; it routes. It listens for intent at the start and again whenever a specialist returns control.

Step four: define handoff criteria as text in each specialist's prompt. "Hand back to triage when the caller's intent shifts away from buying" is an explicit, model-readable instruction. The model reads it as part of the agent's instructions and decides when to call the handoff.

Step five: budget the max handoff depth. We default to 8 transitions per call. Beyond that, the call escalates to a human via the After-Hours triage path. This prevents pathological handoff loops.

Step six: instrument every transition. Each handoff is a row in PostgreSQL with the trigger, the from-agent, the to-agent, the call ID, and the timestamp. Post-call analytics by gpt-4o-mini reads these rows and attributes outcomes to topology decisions.

The output of these six steps is a per-vertical agent graph. CallSphere's graphs are version-controlled in git alongside the prompts. We treat the topology as code.
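
Step six can be sketched as a single table plus two helpers. The example uses Python's built-in `sqlite3` with an in-memory database as a stand-in for PostgreSQL so that it runs anywhere. The table name, column names, and helper functions are hypothetical, and the columns mirror the fields listed in step six.

```python
import sqlite3
from datetime import datetime, timezone


def make_store():
    """Create an in-memory handoff log (SQLite stands in for PostgreSQL here)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE handoff_events (
            call_id        TEXT NOT NULL,
            from_agent     TEXT NOT NULL,
            to_agent       TEXT NOT NULL,
            trigger_reason TEXT NOT NULL,
            occurred_at    TEXT NOT NULL
        )
        """
    )
    return conn


def log_handoff(conn, call_id, from_agent, to_agent, trigger_reason):
    """Insert one row per transition, timestamped in UTC."""
    conn.execute(
        "INSERT INTO handoff_events VALUES (?, ?, ?, ?, ?)",
        (call_id, from_agent, to_agent, trigger_reason,
         datetime.now(timezone.utc).isoformat()),
    )


def edge_counts(conn, call_id):
    """Return (from_agent, to_agent, count) rows for one call, sorted by edge."""
    cursor = conn.execute(
        "SELECT from_agent, to_agent, COUNT(*) FROM handoff_events "
        "WHERE call_id = ? GROUP BY from_agent, to_agent "
        "ORDER BY from_agent, to_agent",
        (call_id,),
    )
    return cursor.fetchall()
```

Grouping by edge is one simple way to see which transitions a call used most, and it is the kind of aggregate a post-call analytics step could start from.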

Try CallSphere

See hierarchical handoffs in production. Book a demo or browse Healthcare and Real Estate.
