
Response Compaction: Managing Long Agent Conversations

Master OpenAIResponsesCompactionSession for automatic and manual compaction of long agent conversations, including token management, custom triggers, and compaction strategies.

The Long Conversation Problem

Every AI agent faces a fundamental constraint: the context window. A conversation that starts with a simple question and evolves over dozens of turns accumulates history. At some point, the raw history exceeds the model's context limit — or the input token cost becomes untenable.

Naive solutions (truncating the oldest messages, using a sliding window) throw away potentially important context. The user might reference something from the beginning of the conversation, and if you dropped it, the agent hallucinates or asks the user to repeat themselves.

Response compaction is a smarter approach: instead of dropping old messages, the system summarizes them — compressing the history into a shorter representation that preserves the essential information.
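As a rough sketch, independent of any SDK, the difference between the two approaches looks like this. `summarize` stands in for an LLM summarization call and is purely illustrative:

```python
def truncate(history: list[str], keep: int) -> list[str]:
    """Naive sliding window: keep only the last `keep` turns."""
    return history[-keep:]

def compact(history: list[str], keep: int, summarize) -> list[str]:
    """Compaction: summarize older turns instead of discarding them."""
    old, recent = history[:-keep], history[-keep:]
    if not old:
        return recent
    summary = summarize(old)  # in practice, one LLM call over the old turns
    return [f"[Summary of earlier conversation] {summary}"] + recent

history = [f"turn {i}" for i in range(10)]
compacted = compact(history, keep=3,
                    summarize=lambda turns: f"{len(turns)} earlier turns")
```

With truncation, seven turns vanish; with compaction, they collapse into one summary item that still rides along in the prompt.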

OpenAIResponsesCompactionSession

The OpenAI Agents SDK provides OpenAIResponsesCompactionSession — a session wrapper that automatically compacts conversation history when it gets too long.

from agents.extensions.sessions import (
    SQLiteSession,
    OpenAIResponsesCompactionSession,
)

base_session = SQLiteSession(db_path="./conversations.db")

compaction_session = OpenAIResponsesCompactionSession(
    session=base_session,
)

This wraps any base session with compaction capabilities. When the conversation history crosses a token threshold, the session automatically summarizes older turns before they are sent to the model.

How Auto-Compaction Works

The compaction session monitors the token count of the conversation history. When it crosses the configured threshold, it triggers compaction automatically:

  1. The session estimates the token count of all stored items.
  2. If the count exceeds the threshold, compaction is triggered.
  3. The older portion of the conversation is sent to the model for summarization.
  4. The summary replaces the detailed history.
  5. Recent messages are preserved in full detail.
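The steps above can be sketched in plain Python. This is a minimal illustration, not the SDK's internals; `estimate_tokens` and `summarize` are stand-in helpers:

```python
def estimate_tokens(items: list[dict]) -> int:
    """Crude estimate: roughly 4 characters per token."""
    return sum(len(str(item)) for item in items) // 4

def maybe_compact(items: list[dict], threshold: int,
                  keep_recent: int, summarize) -> list[dict]:
    """Steps 1-5: estimate, compare, summarize older turns, preserve recent."""
    if estimate_tokens(items) <= threshold:   # steps 1-2: under budget, no-op
        return items
    old, recent = items[:-keep_recent], items[-keep_recent:]
    if not old:
        return items
    summary = {"role": "assistant", "content": summarize(old)}  # steps 3-4
    return [summary] + recent                 # step 5: recent kept verbatim
```

The real session persists the summary back into storage; this sketch only shows the shape of the transformation.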
from agents import Agent, Runner
from agents.extensions.sessions import (
    SQLiteSession,
    OpenAIResponsesCompactionSession,
)

base = SQLiteSession(db_path="./compact_demo.db")
session = OpenAIResponsesCompactionSession(session=base)

agent = Agent(
    name="LongConversationAgent",
    instructions="You are a research assistant helping with a long project.",
)

# This conversation can run for hundreds of turns
# Compaction kicks in automatically when history gets too long
async def research_session(session_id: str):
    questions = [
        "Let's research quantum computing applications.",
        "What about quantum error correction?",
        "How does surface code work?",
        # ... hundreds more turns
        "Summarize everything we've discussed about error correction.",
    ]

    for q in questions:
        result = await Runner.run(
            agent, q, session=session, session_id=session_id
        )
        print(result.final_output)

The agent can handle arbitrarily long conversations without hitting context limits or accumulating unbounded costs.

Manual Compaction with run_compaction()

Sometimes you want to trigger compaction explicitly — for example, at the end of a logical section of conversation, or before a handoff to another agent.

from agents.extensions.sessions import (
    SQLiteSession,
    OpenAIResponsesCompactionSession,
)

base = SQLiteSession(db_path="./sessions.db")
session = OpenAIResponsesCompactionSession(session=base)

# After a long discussion, manually compact
await session.run_compaction(session_id="project-alpha")

# Now the history is summarized and shorter
items = await session.get_items("project-alpha")
print(f"Items after compaction: {len(items)}")

Manual compaction is useful at natural conversation boundaries:

async def handle_conversation_phase(
    session: OpenAIResponsesCompactionSession,
    session_id: str,
    agent: Agent,
    messages: list[str],
):
    """Process a phase of conversation, then compact."""
    for msg in messages:
        await Runner.run(agent, msg, session=session, session_id=session_id)

    # Compact after each phase to keep history manageable
    await session.run_compaction(session_id)
    print(f"Phase complete, history compacted for {session_id}")

Disabling Auto-Compaction

If you want full control over when compaction happens, disable the automatic trigger:

session = OpenAIResponsesCompactionSession(
    session=base_session,
    auto_compact=False,  # Disable automatic compaction
)

# Now compaction only happens when you call it explicitly
await session.run_compaction(session_id)

This is useful when:

  • You have custom logic for when compaction should occur
  • You want to compact only at specific conversation milestones
  • You need to ensure compaction does not interrupt time-sensitive interactions
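For the time-sensitive case, one pattern is to gate manual compaction on conversation inactivity so the summarization call never runs mid-exchange. A sketch, with the gating logic entirely hypothetical:

```python
import time

class IdleCompactor:
    """Gate manual compaction on conversation inactivity."""

    def __init__(self, idle_seconds: float = 30.0):
        self.idle_seconds = idle_seconds
        self.last_activity = time.monotonic()

    def touch(self) -> None:
        """Call whenever a user message arrives."""
        self.last_activity = time.monotonic()

    def should_compact_now(self) -> bool:
        """True once the conversation has been idle long enough."""
        return time.monotonic() - self.last_activity >= self.idle_seconds
```

A background task can poll `should_compact_now()` and call `run_compaction()` only during quiet periods.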

Custom Compaction Triggers with should_trigger_compaction

For fine-grained control, implement a custom callback that decides when compaction should fire:

from agents.extensions.sessions import (
    SQLiteSession,
    OpenAIResponsesCompactionSession,
)

def custom_trigger(items: list, token_estimate: int) -> bool:
    """Custom logic for when to trigger compaction."""
    # Compact if over 50,000 tokens
    if token_estimate > 50_000:
        return True

    # Compact if over 100 items regardless of token count
    if len(items) > 100:
        return True

    # Don't compact small conversations
    return False

base = SQLiteSession(db_path="./sessions.db")
session = OpenAIResponsesCompactionSession(
    session=base,
    should_trigger_compaction=custom_trigger,
)

Advanced: Time-Based Compaction

Compact history that is older than a certain threshold:


from datetime import datetime, timedelta, timezone

def time_based_trigger(items: list, token_estimate: int) -> bool:
    """Compact if the oldest item is more than 2 hours old."""
    if not items:
        return False

    oldest_timestamp = items[0].get("created_at")
    if oldest_timestamp:
        oldest = datetime.fromisoformat(oldest_timestamp)
        if oldest.tzinfo is None:
            # Assume naive timestamps were stored in UTC
            oldest = oldest.replace(tzinfo=timezone.utc)
        age = datetime.now(timezone.utc) - oldest
        if age > timedelta(hours=2) and token_estimate > 10_000:
            return True

    return False

Token Management in Long Conversations

Compaction is one part of a broader token management strategy. Here is a complete approach:

Layer 1: Session Limits

Cap the number of items loaded from the session:

from agents.extensions.sessions import SessionSettings

# Load at most the 50 most recent session items into each model request
settings = SessionSettings(limit=50)

Layer 2: Compaction

Summarize older history to reduce token usage:

session = OpenAIResponsesCompactionSession(session=base)

Layer 3: Token Budgeting

Track and budget token usage across the conversation:

class TokenBudgetManager:
    def __init__(self, max_input_tokens: int = 100_000):
        self.max_input_tokens = max_input_tokens
        self.total_input_tokens = 0
        self.total_output_tokens = 0

    def track_usage(self, result):
        """Track token usage from a run result."""
        usage = result.raw_responses[-1].usage
        self.total_input_tokens += usage.input_tokens
        self.total_output_tokens += usage.output_tokens

    def should_compact(self) -> bool:
        """Signal compaction when approaching budget."""
        return self.total_input_tokens > self.max_input_tokens * 0.8

    def get_report(self) -> dict:
        return {
            "total_input": self.total_input_tokens,
            "total_output": self.total_output_tokens,
            "budget_remaining": self.max_input_tokens - self.total_input_tokens,
        }

Combining All Layers

budget = TokenBudgetManager(max_input_tokens=200_000)

async def managed_conversation(session_id: str, message: str):
    result = await Runner.run(
        agent,
        message,
        session=compaction_session,
        session_id=session_id,
        session_settings=SessionSettings(limit=80),
    )

    budget.track_usage(result)

    if budget.should_compact():
        await compaction_session.run_compaction(session_id)
        print("Compacted due to token budget pressure")

    return result.final_output

What Gets Preserved During Compaction

Compaction is summarization, not blind truncation: it is lossy by nature, but deliberately so. The model that performs compaction is instructed to preserve:

  • Key facts and decisions made during the conversation
  • User preferences and stated requirements
  • Action items and commitments
  • Names, dates, numbers, and other specific details
  • The overall trajectory and context of the conversation

What gets compressed:

  • Verbose explanations that can be summarized
  • Back-and-forth clarification exchanges
  • Redundant information repeated across turns
  • Tool call details (replaced with outcome summaries)

The result is a compact representation that captures the essence of the conversation while using far fewer tokens.
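If you roll your own compaction step, a summarization prompt along these lines would encode the preserve/compress split described above. The prompt text is a hypothetical example, not the SDK's internal prompt:

```python
COMPACTION_PROMPT = (
    "Summarize the conversation below for use as context in a continuing "
    "chat. Preserve key facts and decisions, user preferences and stated "
    "requirements, action items, and all names, dates, and numbers. "
    "Compress verbose explanations, clarification exchanges, and repeated "
    "information. Replace tool-call details with one-line outcome summaries."
)

def build_compaction_request(old_turns: list[str]) -> list[dict]:
    """Package the older turns as messages for a summarization call."""
    return [
        {"role": "system", "content": COMPACTION_PROMPT},
        {"role": "user", "content": "\n".join(old_turns)},
    ]
```

The returned message list can be sent to any chat completion endpoint; the response becomes the summary item that replaces the old turns.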
