Nested Handoff History and Conversation Management in Multi-Agent Systems

Learn how to manage conversation history across agent boundaries using nest_handoff_history, per-handoff overrides, CONVERSATION HISTORY blocks, and handoff_history_mapper in the OpenAI Agents SDK.

The Context Challenge in Multi-Agent Systems

When multiple agents collaborate on a task, conversation history management becomes critical. Each handoff creates a decision point: should the target agent see everything that happened before, a filtered subset, or a restructured view of the history?

The OpenAI Agents SDK provides several mechanisms for controlling how conversation history flows across agent boundaries. Understanding these mechanisms is the difference between a multi-agent system that works reliably and one that confuses itself with irrelevant context.

nest_handoff_history in RunConfig

The nest_handoff_history flag in RunConfig controls the fundamental structure of how history is presented to target agents after a handoff. When enabled, it wraps the pre-handoff conversation in a clearly delimited block rather than flattening it into the target agent's message stream.

flowchart TD
    INPUT(["Task input"])
    SUPER["Supervisor agent<br/>plans plus monitors"]
    W1["Worker 1<br/>research"]
    W2["Worker 2<br/>code"]
    W3["Worker 3<br/>writing"]
    CRITIC{"Output meets<br/>rubric?"}
    REWORK["Rework or<br/>retry path"]
    SHARED[("Shared scratchpad<br/>and memory")]
    OUT(["Final result"])
    INPUT --> SUPER
    SUPER --> W1 --> CRITIC
    SUPER --> W2 --> CRITIC
    SUPER --> W3 --> CRITIC
    W1 --> SHARED
    W2 --> SHARED
    W3 --> SHARED
    SHARED --> SUPER
    CRITIC -->|Pass| OUT
    CRITIC -->|Fail| REWORK --> SUPER
    style SUPER fill:#4f46e5,stroke:#4338ca,color:#fff
    style CRITIC fill:#f59e0b,stroke:#d97706,color:#1f2937
    style OUT fill:#059669,stroke:#047857,color:#fff
    style SHARED fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b

Default Behavior (nest_handoff_history=False)

By default, the target agent receives the full conversation history as a flat sequence of messages. This means the target agent sees all previous messages as if they were part of its own conversation:

from agents import Agent, Runner, handoff, RunConfig
import asyncio

agent_b = Agent(
    name="AgentB",
    instructions="You are Agent B. Continue the conversation.",
    model="gpt-4o",
)

agent_a = Agent(
    name="AgentA",
    instructions="Greet the user, then hand off to Agent B.",
    model="gpt-4o",
    handoffs=[handoff(agent_b, tool_description_override="Transfer to Agent B")],
)

async def main():
    # Default: flat history
    config = RunConfig()
    result = await Runner.run(
        agent_a,
        input="Hello, I need help with my account.",
        run_config=config,
    )
    print(result.final_output)

asyncio.run(main())

With flat history, Agent B sees something like:

User: Hello, I need help with my account.
Assistant (AgentA): Hi there! Let me transfer you to the right specialist.
[handoff to AgentB]

Agent A's replies and Agent B's own replies both carry the assistant role, so Agent B cannot easily tell which earlier assistant messages were written by a different agent. This can lead to confusion, especially when Agent A gave instructions or made promises that Agent B should not be bound by.

Nested Behavior (nest_handoff_history=True)

When you enable nested history, the pre-handoff conversation is wrapped in a CONVERSATION HISTORY block:

from agents import Agent, Runner, handoff, RunConfig
import asyncio

agent_b = Agent(
    name="AgentB",
    instructions="You are Agent B. Review the conversation history and continue helping the user.",
    model="gpt-4o",
)

agent_a = Agent(
    name="AgentA",
    instructions="Greet the user, then hand off to Agent B.",
    model="gpt-4o",
    handoffs=[handoff(agent_b, tool_description_override="Transfer to Agent B")],
)

async def main():
    config = RunConfig(nest_handoff_history=True)
    result = await Runner.run(
        agent_a,
        input="Hello, I need help with my account.",
        run_config=config,
    )
    print(result.final_output)

asyncio.run(main())

With nested history, Agent B sees something structured like:

--- CONVERSATION HISTORY ---
User: Hello, I need help with my account.
Assistant (AgentA): Hi there! Let me transfer you to the right specialist.
--- END CONVERSATION HISTORY ---

This clear demarcation helps Agent B understand:

  • What was said before it joined
  • Which messages are from the user versus previous agents
  • That the conversation is a continuation, not a fresh start
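
The effect of nesting can be sketched without the SDK. The helper below is a hypothetical illustration: nest_history and its delimiter strings are assumptions made for this example, not the SDK's implementation. It collapses a flat list of role-tagged messages into one delimited message, which is the property the points above rely on.

```python
def nest_history(messages: list[dict], previous_agent: str) -> list[dict]:
    """Collapse prior messages into one delimited history message (illustrative only)."""
    lines = ["--- CONVERSATION HISTORY ---"]
    for msg in messages:
        # Label each turn so the reader can tell the user from the earlier agent.
        speaker = "User" if msg["role"] == "user" else f"Assistant ({previous_agent})"
        lines.append(f"{speaker}: {msg['content']}")
    lines.append("--- END CONVERSATION HISTORY ---")
    # A single message carries the whole transcript, so the target agent
    # cannot mistake earlier assistant turns for its own.
    return [{"role": "assistant", "content": "\n".join(lines)}]


flat = [
    {"role": "user", "content": "Hello, I need help with my account."},
    {"role": "assistant", "content": "Hi there! Let me transfer you to the right specialist."},
]
nested = nest_history(flat, previous_agent="AgentA")
print(nested[0]["content"])
```

The flat list had two messages with separate roles; the nested result has one, and every earlier turn is explicitly attributed inside it.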

Per-Handoff History Overrides

You can override the global nest_handoff_history setting on individual handoffs. This lets you use different strategies for different handoff targets:

from agents import Agent, handoff, RunConfig

escalation_agent = Agent(
    name="EscalationAgent",
    instructions="""You are a senior escalation manager. Review the
    full conversation history carefully to understand what has already
    been tried before you intervene.""",
    model="gpt-4o",
)

faq_agent = Agent(
    name="FAQAgent",
    instructions="""You answer frequently asked questions. You do not
    need prior conversation context — just answer the question directly.""",
    model="gpt-4o",
)

triage_agent = Agent(
    name="TriageAgent",
    instructions="Route to the right department.",
    model="gpt-4o",
    handoffs=[
        # Escalation needs full nested history to review what happened
        handoff(
            escalation_agent,
            tool_description_override="Escalate complex issues",
            nest_handoff_history=True,
        ),
        # FAQ opts out of nesting and receives the flat history instead.
        # Disabling nesting does not drop history; use an input_filter for that.
        handoff(
            faq_agent,
            tool_description_override="Answer common questions",
            nest_handoff_history=False,
        ),
    ],
)

The per-handoff override takes precedence over the global RunConfig setting. This gives you fine-grained control:

Handoff Target  | Global Setting | Per-Handoff Override | Effective Behavior
EscalationAgent | False          | True                 | Nested
FAQAgent        | True           | False                | Flat
SupportAgent    | True           | (none)               | Nested (inherits global)
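
The precedence rule in the table reduces to a single conditional. This is a minimal sketch of that rule only; effective_nesting is a hypothetical helper name, not an SDK function.

```python
from typing import Optional


def effective_nesting(global_setting: bool, override: Optional[bool]) -> bool:
    """Per-handoff override wins when it is set; otherwise inherit the global value."""
    return global_setting if override is None else override


rows = [
    ("EscalationAgent", False, True),
    ("FAQAgent", True, False),
    ("SupportAgent", True, None),
]
for name, global_setting, override in rows:
    mode = "Nested" if effective_nesting(global_setting, override) else "Flat"
    print(f"{name}: {mode}")
```

Using None rather than False for "no override" is what lets a handoff explicitly turn nesting off while another one simply inherits the global setting.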

The CONVERSATION HISTORY Block

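When nest_handoff_history is enabled, the SDK collapses the prior conversation into a single structured block that the target agent receives ahead of any new turns.

The exact delimiters, and the message role used to carry the block, can differ between SDK versions, so treat the format below as illustrative rather than literal. The intent is to be unambiguous to the LLM: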

[CONVERSATION HISTORY FROM PREVIOUS AGENT: AgentA]
User: I need to cancel my subscription.
AgentA: I understand you want to cancel. Let me transfer you to our retention team.
[END CONVERSATION HISTORY]

Why This Matters for Agent Quality

Without nesting, a common failure mode occurs when the target agent "adopts" the previous agent's persona. If Agent A said "I'll look into that for you," Agent B might continue as if it made that promise. With nested history, Agent B clearly sees this was a different agent's statement.

Another failure mode is tool confusion. If Agent A called tools and the results are in the flat history, Agent B might try to reference those tool results as if they were its own. Nesting makes the boundary explicit.

handoff_history_mapper for Custom Forwarding

For maximum control, use handoff_history_mapper, a function that transforms the conversation history into whatever format you want before it reaches the target agent. The mapper is configured on RunConfig and is consulted when nested history is enabled. History items arrive as plain dictionaries, so read their fields with .get() rather than attribute access:

from agents import Agent, RunConfig, handoff

def summarize_history_mapper(history: list) -> list:
    """Replace full history with a summary message."""
    if not history:
        return history

    # Extract just the user messages. Items are dict-like input items,
    # so hasattr(msg, "role") would always be False here.
    user_messages = []
    for msg in history:
        if isinstance(msg, dict) and msg.get("role") == "user":
            content = msg.get("content")
            user_messages.append(content if isinstance(content, str) else str(content))

    summary = "Previous conversation summary:\n"
    for i, text in enumerate(user_messages, 1):
        summary += f"{i}. User said: {text}\n"

    # Return a single summary message
    return [{"role": "system", "content": summary}]

specialist_agent = Agent(
    name="Specialist",
    instructions="Help the user based on the conversation summary provided.",
    model="gpt-4o",
)

triage_agent = Agent(
    name="Triage",
    instructions="Route to specialist.",
    model="gpt-4o",
    handoffs=[
        handoff(
            specialist_agent,
            tool_description_override="Specialist for complex issues",
        ),
    ],
)

# Pass this to Runner.run(..., run_config=config). The mapper is a run-level
# setting and only takes effect when nesting is enabled.
config = RunConfig(
    nest_handoff_history=True,
    handoff_history_mapper=summarize_history_mapper,
)

Advanced History Mapper: Role-Based Filtering

from agents import Agent, RunConfig, handoff

def role_based_mapper(allowed_roles: list[str]):
    """Create a mapper that only forwards messages from specific roles."""
    allowed = set(allowed_roles)

    def _mapper(history: list) -> list:
        # History items are dict-like, so look up "role" with .get().
        # Items without a role (for example tool calls) are dropped.
        return [
            msg for msg in history
            if isinstance(msg, dict) and msg.get("role") in allowed
        ]
    return _mapper

triage_agent = Agent(
    name="Triage",
    instructions="Route to specialist.",
    model="gpt-4o",
    handoffs=[
        handoff(
            specialist_agent,
            tool_description_override="Specialist",
        ),
    ],
)

# Only forward user and system messages; strip all assistant responses.
# The mapper is set on RunConfig and applies when nesting is enabled.
config = RunConfig(
    nest_handoff_history=True,
    handoff_history_mapper=role_based_mapper(["user", "system"]),
)

History Mapper with Token Counting

For production systems where context window management is critical:

from agents import Agent, RunConfig, handoff

def token_budget_mapper(max_tokens: int = 2000):
    """Keep only the most recent messages that fit within a token budget."""
    def _mapper(history: list) -> list:
        budget = max_tokens
        kept = []

        # Walk from most recent to oldest and stop at the first message
        # that no longer fits.
        for msg in reversed(history):
            # Items are dict-like; attribute access would miss the content.
            content = msg.get("content", "") if isinstance(msg, dict) else ""
            if not isinstance(content, str):
                content = str(content)
            # Rough approximation: 4 chars ≈ 1 token
            estimated_tokens = len(content) // 4

            if estimated_tokens > budget:
                break
            kept.append(msg)
            budget -= estimated_tokens

        # Restore chronological order
        kept.reverse()
        return kept
    return _mapper

triage_agent = Agent(
    name="Triage",
    instructions="Route to specialist.",
    model="gpt-4o",
    handoffs=[
        handoff(
            specialist_agent,
            tool_description_override="Specialist",
        ),
    ],
)

# The mapper is set on RunConfig and applies when nesting is enabled.
config = RunConfig(
    nest_handoff_history=True,
    handoff_history_mapper=token_budget_mapper(max_tokens=3000),
)
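
To see which messages survive a budget, it helps to trace the trimming logic on a small input. The sketch below is SDK-free: keep_recent_within_budget is a hypothetical name, the messages are plain dictionaries, and the four-characters-per-token estimate is the same rough assumption used above.

```python
def keep_recent_within_budget(history: list[dict], max_tokens: int) -> list[dict]:
    """Keep the newest messages whose estimated size fits the budget."""
    budget = max_tokens
    kept = []
    for msg in reversed(history):
        cost = len(str(msg.get("content", ""))) // 4  # rough 4-chars-per-token estimate
        if cost > budget:
            break  # everything older than this message is dropped as well
        kept.append(msg)
        budget -= cost
    kept.reverse()  # back to chronological order
    return kept


history = [
    {"role": "user", "content": "A" * 400},       # about 100 tokens
    {"role": "assistant", "content": "B" * 200},  # about 50 tokens
    {"role": "user", "content": "C" * 80},        # about 20 tokens
]
trimmed = keep_recent_within_budget(history, max_tokens=80)
print([m["content"][0] for m in trimmed])  # ['B', 'C']
```

Note that the oldest message, often the user's original request, is the first to go. If that opening request matters to the target agent, one option is to pin it and apply the budget only to the messages after it.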

Managing Context Across Agent Boundaries

Beyond history manipulation, there are patterns for managing shared state across agents using the context parameter:

from agents import Agent, Runner, handoff, RunContextWrapper
import asyncio

# Shared context type
class ConversationContext:
    def __init__(self):
        self.customer_id: str | None = None
        self.verified: bool = False
        self.notes: list[str] = []
        self.handoff_chain: list[str] = []

async def track_handoff(context: RunContextWrapper[ConversationContext]) -> None:
    context.context.handoff_chain.append("billing")
    context.context.notes.append("Handed off to billing")

billing_agent = Agent(
    name="BillingAgent",
    instructions="Handle billing. Check context.verified before making changes.",
    model="gpt-4o",
)

verification_agent = Agent(
    name="VerificationAgent",
    instructions="""Verify the customer's identity by asking for their
    account email and last 4 digits of payment method.""",
    model="gpt-4o",
    handoffs=[
        handoff(
            billing_agent,
            tool_description_override="Transfer to billing after verification",
            on_handoff=track_handoff,
        ),
    ],
)

async def main():
    ctx = ConversationContext()
    ctx.customer_id = "cust_12345"

    result = await Runner.run(
        verification_agent,
        input="I need to dispute a charge on my account",
        context=ctx,
    )

    print(f"Handoff chain: {ctx.handoff_chain}")
    print(f"Notes: {ctx.notes}")

asyncio.run(main())
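
One caveat: the context object is local state for your code and is not sent to the model, so an instruction such as "Check context.verified" cannot be enforced by the LLM on its own. A safer pattern is to enforce the check inside the code that performs the change. The sketch below assumes a hypothetical apply_refund function standing in for a billing tool body; it uses no SDK calls.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConversationContext:
    customer_id: Optional[str] = None
    verified: bool = False
    notes: list = field(default_factory=list)


def apply_refund(ctx: ConversationContext, amount: float) -> str:
    """Hypothetical tool body: refuse to act unless verification already happened."""
    if not ctx.verified:
        ctx.notes.append("Refund blocked: customer not verified")
        return "Cannot process refund until identity is verified."
    ctx.notes.append(f"Refund of {amount:.2f} approved")
    return f"Refunded {amount:.2f} to {ctx.customer_id}."


ctx = ConversationContext(customer_id="cust_12345")
print(apply_refund(ctx, 25))   # blocked: verified is still False
ctx.verified = True            # set by the verification step, in code
print(apply_refund(ctx, 25))   # allowed
```

Because the guard lives in code, it holds regardless of how the model interprets the conversation history it was handed.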

Best Practices

1. Use nested history for chains longer than 2 agents. When conversations pass through 3 or more agents, flat history becomes confusing. Nesting makes boundaries explicit.

2. Strip tool calls when handing to non-technical agents. If a diagnostic agent ran API health checks, the billing agent does not need to see those tool calls. Use handoff_filters.remove_all_tools or a custom filter.

3. Budget your context window. Each handoff accumulates history. For long-running multi-agent conversations, use handoff_history_mapper with token budgets to keep history within limits.

4. Use the context object for state, not history. Do not rely on conversation history to pass structured state between agents. Use the context parameter on Runner.run() for typed, reliable state sharing.

5. Log handoff history transformations. In production, log what was filtered out so you can debug cases where the target agent lacked necessary context.
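
For the logging practice, one lightweight approach is to wrap whatever history-transforming function you use so that each call records what it removed. This is a sketch under assumptions: logged and users_only are hypothetical helpers, history items are plain dictionaries, and the logger name is arbitrary.

```python
import logging
from typing import Callable

logger = logging.getLogger("handoff_history")

Mapper = Callable[[list], list]


def logged(mapper: Mapper, name: str) -> Mapper:
    """Wrap a history mapper so each call logs the items it did not forward unchanged."""
    def _wrapped(history: list) -> list:
        result = mapper(history)
        kept_ids = {id(item) for item in result}
        dropped = [item for item in history if id(item) not in kept_ids]
        logger.info("%s: forwarded %d items, dropped %d", name, len(result), len(dropped))
        for item in dropped:
            logger.debug("%s dropped: %r", name, item)
        return result
    return _wrapped


def users_only(history: list) -> list:
    """Example mapper: forward only user messages."""
    return [m for m in history if m.get("role") == "user"]


history = [
    {"role": "user", "content": "My invoice is wrong."},
    {"role": "assistant", "content": "Let me check."},
    {"role": "user", "content": "It was charged twice."},
]
filtered = logged(users_only, "users_only")(history)
print(len(filtered))
```

The wrapper leaves the mapper's return value untouched, so it can be added or removed without changing what the target agent receives.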

Summary

Conversation history management is the unsexy but essential infrastructure of multi-agent systems. Use nest_handoff_history to create clear boundaries between agent conversations. Use per-handoff overrides for different strategies per target. Use handoff_history_mapper for complete control over what gets forwarded. And use the context object for reliable state sharing that does not depend on the LLM interpreting conversation history correctly.
