Hierarchical Handoffs: Tree-Structured Delegation in 2026
Handoffs let one agent transfer control to a specialist end-to-end, conversation context included. We compare OpenAI Agents SDK handoffs to LangGraph subgraphs, with CallSphere's after-hours 7-agent ladder as the reference design.
TL;DR — Use handoffs (not "agents-as-tools") when the chosen specialist should own the next reply, not just contribute a paragraph. The OpenAI Agents SDK exposes handoffs as tool calls; LangGraph implements them via subgraphs. CallSphere's after-hours stack uses a 7-rung hierarchical ladder — Primary → Secondary → 6 fallbacks.
The pattern
A handoff is an explicit, irreversible transfer of control from one agent to another. The receiving agent inherits the conversation and decides what happens next. Contrast this with "agent-as-tool", where the parent agent stays in control and merely consumes the child's response.
In a hierarchy, handoffs flow down the tree (delegation) and sometimes back up (escalation). Children rarely talk laterally — when they do, route through the parent.
```mermaid
flowchart TD
  ROOT[Root triage] -->|handoff: clinical| L1[Clinical lead]
  ROOT -->|handoff: ops| L2[Ops lead]
  L1 -->|handoff: scheduling| C1[Scheduler]
  L1 -->|handoff: nurse line| C2[Nurse triage]
  L2 -->|handoff: billing| C3[Billing]
  L2 -->|handoff: insurance| C4[Insurance]
  C2 -->|escalate| L1
  L1 -->|escalate to human| HUMAN[Human RN]
```
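The tree above can be sketched as a minimal, framework-free data structure. This is an illustrative sketch only: `Agent`, `hand_off`, and the intent keys are made up for this example, not any SDK's API.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    parent: "Agent | None" = None                  # escalation target (up the tree)
    children: dict = field(default_factory=dict)   # intent -> child agent

    def add(self, intent: str, child: "Agent") -> "Agent":
        child.parent = self
        self.children[intent] = child
        return child

def hand_off(current: Agent, intent: str) -> Agent:
    """Irreversible transfer: returns the agent that now owns the conversation."""
    if intent == "escalate":
        return current.parent or current           # back up toward the root
    return current.children.get(intent, current)   # unknown intent: keep ownership

root = Agent("Root triage")
clinical = root.add("clinical", Agent("Clinical lead"))
ops = root.add("ops", Agent("Ops lead"))
scheduler = clinical.add("scheduling", Agent("Scheduler"))

owner = hand_off(root, "clinical")       # delegation down the tree
owner = hand_off(owner, "scheduling")    # Clinical lead -> Scheduler
print(owner.name)                        # Scheduler
owner = hand_off(owner, "escalate")      # escalation back up
print(owner.name)                        # Clinical lead
```

Note that ownership moves with every call: there is no parent loop integrating results, which is exactly what distinguishes this from agent-as-tool.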
When to use it
- The right answer depends on identity — a billing question should be answered by billing, not summarized by triage.
- You need clean trace boundaries per specialist for compliance.
- Specialists carry their own system prompt, model, and tool set that shouldn't bleed into siblings.
- A two-tier (or deeper) decomposition emerges naturally: clinical vs ops, sales vs support, primary vs fallback.
CallSphere implementation
CallSphere's after-hours product is a 7-agent hierarchical handoff ladder:
- Primary Aria — front-line triage, answers easy questions, hands off otherwise.
- Secondary Aria — handles overflow when Primary times out or hits a tool error.
- Six fallbacks — voicemail capture, SMS deflection, callback scheduler, emergency router, missed-caller logger, escalation-to-human.
Each rung has its own prompt, its own toolset, and its own SLA. A handoff carries the full conversation transcript plus structured intent ("scheduling", "emergency", "billing"). Across the platform: 37 agents · 90+ tools · 115+ DB tables · 6 verticals, with handoffs implemented as edges in a Postgres-backed state machine.
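What a handoff carries can be made concrete with a small payload sketch. The field names below are assumptions for illustration, not CallSphere's actual schema.

```python
from dataclasses import dataclass

@dataclass
class HandoffPayload:
    # Hypothetical shape; field names are illustrative, not CallSphere's schema.
    transcript: list       # full conversation so far: [{"role": ..., "content": ...}]
    intent: str            # structured intent: "scheduling" | "emergency" | "billing"
    from_agent: str
    to_agent: str
    hop_count: int = 0     # incremented on every handoff; abort past a ceiling

payload = HandoffPayload(
    transcript=[{"role": "user", "content": "I need tonight's on-call nurse"}],
    intent="emergency",
    from_agent="primary-aria",
    to_agent="emergency-router",
    hop_count=1,
)
print(payload.intent)  # emergency
```

The point of a typed payload is that nothing implicit rides along: if a field isn't in the schema, the receiving agent never sees it, and that failure is visible at the boundary rather than mid-conversation.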
Build steps with code
```python
import asyncio

from agents import Agent, Runner, handoff  # OpenAI Agents SDK

billing = Agent(name="Billing", instructions="Resolve invoice questions...")
support = Agent(name="Support", instructions="Tech issues only...")
triage = Agent(
    name="Triage",
    instructions="Route to billing or support. Hand off, don't summarize.",
    handoffs=[handoff(billing), handoff(support)],
)

async def main() -> None:
    result = await Runner.run(triage, "My invoice is wrong")
    print(result.final_output)  # the Billing agent owns this reply

asyncio.run(main())
```
In LangGraph the same idea uses Command(goto="billing", update={...}) to jump nodes while updating shared state. Both express irreversible delegation; pick the framework, not the pattern.
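To see the jump-with-update semantics without installing LangGraph, here is a dependency-free sketch. The node names and the `(goto, update)` tuple convention are illustrative stand-ins, not LangGraph's API.

```python
# Dependency-free sketch of Command(goto=..., update={...}) semantics.
# Each node returns (next_node_or_None, state_update).

def triage(state):
    target = "billing" if "invoice" in state["question"] else "support"
    return target, {"route": target}          # goto + shared-state update in one step

def billing(state):
    return None, {"answer": "Billing here: let's fix that invoice."}

def support(state):
    return None, {"answer": "Support here."}

NODES = {"triage": triage, "billing": billing, "support": support}

def run(state, entry="triage"):
    node = entry
    while node is not None:                   # follow goto jumps to a terminal node
        node, update = NODES[node](state)
        state.update(update)                  # merge the update into shared state
    return state

final = run({"question": "My invoice is wrong"})
print(final["answer"])  # Billing here: let's fix that invoice.
```

The key property mirrored here is that routing and state mutation happen atomically: the triage node never sees the conversation again after the jump.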
Pitfalls
- Ping-pong handoffs — A → B → A → B. Add a hop counter; abort at 4.
- Dropped context — handoffs that strip history confuse the receiving agent. Always pass the last 6 turns.
- Dead-end leaves — fallback agents with no way back to a human. Always include an escalation edge.
- Frozen hierarchy — hard-coded trees break when the org adds a 4th department. Externalize the tree to config (YAML or DB).
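The ping-pong and frozen-hierarchy pitfalls can be guarded with a few lines, assuming the tree is loaded from config. The `TREE` shape and `MAX_HOPS` value below are illustrative, not a prescribed schema.

```python
# In production TREE would come from YAML or a DB row, not source code.
TREE = {
    "triage":    {"handoffs": ["clinical", "ops"], "escalate_to": None},
    "clinical":  {"handoffs": ["scheduler"],       "escalate_to": "triage"},
    "scheduler": {"handoffs": [],                  "escalate_to": "clinical"},
}
MAX_HOPS = 4

def hand_off(current: str, target: str, hops: int):
    if hops >= MAX_HOPS:
        return "human", hops                  # ping-pong guard: bail out to a person
    allowed = TREE[current]["handoffs"] + [TREE[current]["escalate_to"]]
    if target not in allowed:
        raise ValueError(f"{current} may not hand off to {target}")
    return target, hops + 1

agent, hops = "triage", 0
agent, hops = hand_off(agent, "clinical", hops)
agent, hops = hand_off(agent, "scheduler", hops)
print(agent, hops)  # scheduler 2
```

Because the tree lives in data, adding a fourth department is a config change, and because every hop passes through one function, the counter and the escalation edge cannot be forgotten by an individual agent.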
FAQ
Q: Handoff or agent-as-tool? Handoff if the specialist should reply directly to the user. Agent-as-tool if the parent should integrate and rephrase.
Q: Can a child hand off back up? Yes — that's escalation. Just track depth so you don't loop.
Q: How does this differ from supervisor + specialists? Supervisor stays in control between every turn. Hierarchical handoffs let a specialist own multiple consecutive turns until it hands back.
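The difference shows up clearly in a toy sketch; the router and specialist stubs below are hypothetical.

```python
def route(msg: str) -> str:
    return "billing" if "invoice" in msg else "support"

specialists = {"billing": lambda m: "[billing] " + m,
               "support": lambda m: "[support] " + m}

msgs = ["my invoice is wrong", "and the date is off"]

# Supervisor: re-routes before EVERY turn, so turn 2 lands on support.
supervised = [specialists[route(m)](m) for m in msgs]

# Hierarchical handoff: routes once, then billing owns the following turns.
owner = route(msgs[0])
handed_off = [specialists[owner](m) for m in msgs]

print(supervised[1])   # [support] and the date is off
print(handed_off[1])   # [billing] and the date is off
```

Turn 2 mentions no invoice, so the supervisor misroutes it to support, while the handoff keeps it with billing, which still holds the context.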
Q: Does OpenAI's Agents SDK support nested handoffs? Yes. Each agent can declare its own handoffs, and runs traverse the tree until a terminal agent produces final output.
Q: What about streaming? Handoffs preserve the stream — the receiving agent picks up streaming on the same connection. Frameworks differ; test before production.
## Hierarchical Handoffs: Tree-Structured Delegation in 2026 — operator perspective

Once you've shipped hierarchical handoffs to a real workload, the design questions change. You stop asking "can the agent do this?" and start asking "can the agent do this within a 1.2s p95 and under $0.04 per session?" That contract is what separates a demo from a production system. CallSphere learned this the expensive way while wiring 37 specialized agents to 90+ tools across 115+ database tables: every integration that didn't enforce schemas at the tool boundary eventually paged someone.

## Why this matters for AI voice + chat agents

Agentic AI in a real call center is a different beast from a single-LLM chatbot. Instead of one model answering one prompt, you orchestrate a small team: a router that decides intent, specialists that own a vertical (booking, intake, billing, escalation), and tools that read and write to the same Postgres your CRM trusts. Handoffs are where most production bugs hide: when Agent A passes context to Agent B, anything that isn't explicit in the message gets lost, and the user feels it as the agent "forgetting." That's why the systems that hold up under load are the ones with typed tool schemas, deterministic state stored outside the conversation, and a hard ceiling on tool calls per session.

The cost story is just as important: a multi-agent loop can quietly burn 10x the tokens of a single-LLM design if you let it think out loud at every step. The fix isn't a smarter model; it's smaller agents, shorter prompts, cached system messages, and evals that fail the build when p95 latency or per-session cost regresses. CallSphere runs this pattern across 6 verticals in production, and the rule has held every time: the agent you can debug in five minutes will out-survive the agent that's "smarter" on a benchmark.

## FAQs

**Q: How do you scale hierarchical handoffs without blowing up token cost?**
A: Scaling comes from constraint, not capability. The deployments that hold up keep each agent narrow, cap tool calls per turn, cache the system prompt, and pin a smaller model for routing while reserving the larger model for synthesis. CallSphere's stack (37 agents · 90+ tools · 115+ DB tables · 6 verticals live) is sized that way on purpose.

**Q: What stops hierarchical handoffs from looping forever on edge cases?**
A: Hard ceilings beat heuristics. A maximum step count, an idempotency key on every tool call, and a fallback to a deterministic script when confidence drops below a threshold are what keep the loop bounded. Evals that simulate noisy inputs catch the rest before they reach a real caller.

**Q: Where does CallSphere use hierarchical handoffs in production today?**
A: CallSphere runs this pattern in Sales and After-Hours Escalation, alongside the other live verticals (Healthcare, Real Estate, Salon, IT Helpdesk). The same orchestrator code path serves voice and chat; the difference is the tool set the router exposes.

## See it live

Spin up a walkthrough at https://realestate.callsphere.tech or grab 20 minutes on the calendar: https://calendly.com/sagar-callsphere/new-meeting.