Conversation Summarization: Generating Concise Summaries of Long Agent Interactions

Build conversation summarization systems that generate concise, actionable summaries of long AI agent interactions with key point extraction, decision tracking, and follow-up items.

Why Summarize Conversations?

Long conversations with AI agents accumulate context that becomes unwieldy. A 30-message support interaction buries the actual decisions and next steps under layers of troubleshooting dialog. Conversation summarization extracts the essential information — what was discussed, what was decided, what actions remain — and presents it in a form that humans and other agents can use efficiently.

Summaries serve multiple purposes: handoff context when transferring to a human agent, session continuity when a user returns later, audit trails for compliance, and analytics data for improving agent performance.

Modeling Conversation Turns

Start by structuring raw conversation data into a form suitable for summarization.

flowchart TD
    MSG(["New message"])
    WORKING["Working memory<br/>rolling window"]
    EPISODIC[("Episodic memory<br/>past sessions")]
    SEMANTIC[("Semantic memory<br/>facts and preferences")]
    SUM["Summarizer<br/>compresses old turns"]
    ROUTER{"Retrieve<br/>needed memories"}
    PROMPT["Assembled context"]
    LLM["LLM"]
    UPD["Memory updater<br/>writes new facts"]
    MSG --> WORKING --> ROUTER
    ROUTER -->|Past sessions| EPISODIC
    ROUTER -->|User facts| SEMANTIC
    EPISODIC --> SUM --> PROMPT
    SEMANTIC --> PROMPT
    WORKING --> PROMPT --> LLM --> UPD
    UPD --> EPISODIC
    UPD --> SEMANTIC
    style ROUTER fill:#4f46e5,stroke:#4338ca,color:#fff
    style LLM fill:#f59e0b,stroke:#d97706,color:#1f2937
    style EPISODIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style SEMANTIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional

class TurnType(Enum):
    GREETING = "greeting"
    QUESTION = "question"
    ANSWER = "answer"
    ACTION = "action"
    DECISION = "decision"
    COMPLAINT = "complaint"
    RESOLUTION = "resolution"
    SMALL_TALK = "small_talk"

@dataclass
class ConversationTurn:
    speaker: str  # "user" or "agent"
    content: str
    timestamp: datetime
    turn_type: TurnType = TurnType.ANSWER
    importance: float = 0.5  # 0.0-1.0
    entities: dict = field(default_factory=dict)
    is_key_point: bool = False

class TurnClassifier:
    def __init__(self):
        self.type_indicators = {
            TurnType.QUESTION: ["?", "how", "what", "when", "can you"],
            TurnType.COMPLAINT: [
                "problem", "issue", "broken", "wrong", "not working",
            ],
            TurnType.DECISION: [
                "let's go with", "i'll take", "yes proceed",
                "confirmed", "agreed",
            ],
            TurnType.ACTION: [
                "i've initiated", "done", "completed",
                "processed", "updated", "created",
            ],
            TurnType.RESOLUTION: [
                "resolved", "fixed", "that works", "thank you",
                "all set", "that solves",
            ],
            TurnType.GREETING: [
                "hello", "hi ", "hey", "good morning", "good afternoon",
            ],
        }
        self.high_importance_types = {
            TurnType.DECISION, TurnType.ACTION,
            TurnType.RESOLUTION, TurnType.COMPLAINT,
        }

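    def classify(self, turn: ConversationTurn) -> ConversationTurn:
        """Label a turn by keyword hits; mutates and returns the same turn.

        Any turn_type already set on the turn is overwritten. Matching is
        plain substring search, so "hey" also matches inside "they".
        """
        content_lower = turn.content.lower()
        best_type = TurnType.ANSWER  # fallback when no indicator matches
        best_score = 0

        for turn_type, indicators in self.type_indicators.items():
            hits = sum(1 for ind in indicators if ind in content_lower)
            # Strict ">" means a tie goes to the type listed first
            if hits > best_score:
                best_score = hits
                best_type = turn_type

        turn.turn_type = best_type
        turn.importance = (
            0.8 if best_type in self.high_importance_types else 0.4
        )
        turn.is_key_point = turn.importance >= 0.7
        return turn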
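
Because the classifier above relies on substring checks, short indicators can fire inside longer words. One possible refinement, sketched here under the assumption that whole-word matching is what you want, compiles each indicator into a regular expression with word boundaries. The helper names `compile_indicators` and `count_hits` are hypothetical and not part of the classifier above.

```python
import re

def compile_indicators(indicators: list[str]) -> list[re.Pattern]:
    """Hypothetical helper: turn keyword strings into whole-word patterns."""
    patterns = []
    for raw in indicators:
        keyword = raw.strip()
        if keyword == "?":
            # "?" is punctuation, so word boundaries do not apply to it.
            patterns.append(re.compile(r"\?"))
        else:
            patterns.append(
                re.compile(rf"\b{re.escape(keyword)}\b", re.IGNORECASE)
            )
    return patterns

def count_hits(text: str, patterns: list[re.Pattern]) -> int:
    """Count how many indicator patterns occur at least once in text."""
    return sum(1 for pattern in patterns if pattern.search(text))

greeting = compile_indicators(["hello", "hi ", "hey", "good morning"])
question = compile_indicators(["?", "how", "what", "when", "can you"])

text = "They will show me the invoice"
# Substring search finds "hey" inside "They" and "how" inside "show".
naive = sum(1 for kw in ["hey", "how"] if kw in text.lower())
print(naive)                        # 2
print(count_hits(text, greeting))   # 0
print(count_hits(text, question))   # 0
```

Swapping `count_hits` in for the substring sum inside `classify` would keep the scoring logic unchanged while cutting this class of false positives. It does not address negation or sarcasm, which keyword matching cannot see.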

Key Point Extraction

Not every turn matters for the summary. Extract key points — decisions, actions, complaints, and resolutions — while filtering noise.

import re

@dataclass
class KeyPoint:
    content: str
    category: str
    timestamp: datetime
    speaker: str

class KeyPointExtractor:
    # Filler phrases are matched as whole words, case-insensitively, so
    # "um " is removed but the "um " inside "maximum order" is not.
    _FILLER_RE = re.compile(
        r"\b(?:um|uh|well|so basically|i mean|you know),?\s+",
        re.IGNORECASE,
    )

    def __init__(self, importance_threshold: float = 0.6):
        self.threshold = importance_threshold
        self.classifier = TurnClassifier()

    def extract(
        self, turns: list[ConversationTurn]
    ) -> list[KeyPoint]:
        # classify() mutates each turn in place and returns it
        classified = [self.classifier.classify(t) for t in turns]
        key_points = []

        for turn in classified:
            if turn.importance < self.threshold:
                continue

            # Skip near-duplicates of the most recent key point only
            if key_points and self._is_redundant(
                turn.content, key_points[-1].content
            ):
                continue

            key_points.append(KeyPoint(
                content=self._clean_content(turn.content),
                category=turn.turn_type.value,
                timestamp=turn.timestamp,
                speaker=turn.speaker,
            ))

        return key_points

    def _is_redundant(self, new: str, existing: str) -> bool:
        # Share of the new turn's words already present in the existing one
        new_words = set(new.lower().split())
        existing_words = set(existing.lower().split())
        if not new_words or not existing_words:
            return False
        overlap = len(new_words & existing_words)
        return overlap / len(new_words) > 0.7

    def _clean_content(self, content: str) -> str:
        # Remove filler phrases that are followed by whitespace
        return self._FILLER_RE.sub("", content).strip()
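
The redundancy check above compares each candidate only with the most recent key point, and it splits on whitespace, so "refund," and "refund" count as different words. A possible alternative, shown here as a minimal sketch, normalizes punctuation and compares against every key point kept so far using Jaccard similarity. This is a swapped-in technique, not the overlap ratio used above, and the function names and the 0.7 threshold are assumptions for illustration.

```python
import string

def normalize(text: str) -> set[str]:
    """Lowercase and strip surrounding punctuation from each word."""
    words = (w.strip(string.punctuation) for w in text.lower().split())
    return {w for w in words if w}

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity: shared words divided by all distinct words."""
    words_a, words_b = normalize(a), normalize(b)
    if not words_a or not words_b:
        return 0.0
    return len(words_a & words_b) / len(words_a | words_b)

def dedupe(contents: list[str], threshold: float = 0.7) -> list[str]:
    """Keep a text only if it is not too similar to any text already kept."""
    kept: list[str] = []
    for text in contents:
        if any(jaccard(text, prior) > threshold for prior in kept):
            continue
        kept.append(text)
    return kept

points = [
    "I processed a refund of $49.99.",
    "It should arrive in 3-5 business days.",
    "I processed a refund of $49.99",  # repeat, but not adjacent
]
print(dedupe(points))
```

Comparing against all kept points costs more per turn than checking only the last one, which matters little for conversations of a few dozen turns.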

The Summarization Engine

Combine key points into structured, actionable summaries with distinct sections.

@dataclass
class ConversationSummary:
    topic: str
    duration_minutes: float
    total_turns: int
    key_points: list[KeyPoint]
    decisions: list[str]
    actions_taken: list[str]
    pending_items: list[str]
    outcome: str
    formatted: str = ""

class SummarizationEngine:
    def __init__(self):
        self.extractor = KeyPointExtractor()

    def summarize(
        self, turns: list[ConversationTurn], topic: str = "Support Interaction"
    ) -> ConversationSummary:
        if not turns:
            return ConversationSummary(
                topic=topic, duration_minutes=0,
                total_turns=0, key_points=[],
                decisions=[], actions_taken=[],
                pending_items=[], outcome="No conversation data.",
            )

        key_points = self.extractor.extract(turns)
        duration = (
            turns[-1].timestamp - turns[0].timestamp
        ).total_seconds() / 60

        decisions = [
            kp.content for kp in key_points
            if kp.category == "decision"
        ]
        actions = [
            kp.content for kp in key_points
            if kp.category == "action"
        ]
        complaints = [
            kp.content for kp in key_points
            if kp.category == "complaint"
        ]

        outcome = self._determine_outcome(key_points)
        pending = self._find_pending_items(turns, actions)

        summary = ConversationSummary(
            topic=topic,
            duration_minutes=round(duration, 1),
            total_turns=len(turns),
            key_points=key_points,
            decisions=decisions,
            actions_taken=actions,
            pending_items=pending,
            outcome=outcome,
        )
        summary.formatted = self._format(summary, complaints)
        return summary

    def _determine_outcome(self, key_points: list[KeyPoint]) -> str:
        has_resolution = any(
            kp.category == "resolution" for kp in key_points
        )
        has_complaint = any(
            kp.category == "complaint" for kp in key_points
        )

        if has_resolution:
            return "Resolved"
        if has_complaint:
            return "Unresolved - requires follow-up"
        return "Completed"

    def _find_pending_items(
        self, turns: list[ConversationTurn], completed_actions: list[str]
    ) -> list[str]:
        pending = []
        for turn in turns:
            lower = turn.content.lower()
            if any(
                phrase in lower
                for phrase in ["will follow up", "i'll check", "get back to",
                               "pending", "waiting for"]
            ):
                pending.append(turn.content)
        return pending

    def _format(
        self, summary: ConversationSummary, complaints: list[str]
    ) -> str:
        lines = [
            f"## {summary.topic}",
            f"Duration: {summary.duration_minutes} min | "
            f"Turns: {summary.total_turns} | "
            f"Outcome: {summary.outcome}",
            "",
        ]

        if complaints:
            lines.append("### Issues Reported")
            for c in complaints:
                lines.append(f"- {c}")
            lines.append("")

        if summary.decisions:
            lines.append("### Decisions Made")
            for d in summary.decisions:
                lines.append(f"- {d}")
            lines.append("")

        if summary.actions_taken:
            lines.append("### Actions Taken")
            for a in summary.actions_taken:
                lines.append(f"- {a}")
            lines.append("")

        if summary.pending_items:
            lines.append("### Pending Follow-Up")
            for p in summary.pending_items:
                lines.append(f"- {p}")

        return "\n".join(lines)

Using the Engine

from datetime import datetime, timedelta

base = datetime(2026, 3, 17, 10, 0)
turns = [
    ConversationTurn("user", "Hi, I have a billing problem",
                     base, TurnType.COMPLAINT),
    ConversationTurn("agent", "I'm sorry to hear that. What's the issue?",
                     base + timedelta(seconds=15)),
    ConversationTurn("user", "I was charged twice for order ORD-9921",
                     base + timedelta(seconds=45), TurnType.COMPLAINT),
    ConversationTurn("agent", "I've found the duplicate charge and "
                     "processed a refund of $49.99.",
                     base + timedelta(minutes=2), TurnType.ACTION),
    ConversationTurn("user", "Yes proceed with the refund, confirmed.",
                     base + timedelta(minutes=3), TurnType.DECISION),
    ConversationTurn("agent", "Refund completed. It will appear in "
                     "3-5 business days.",
                     base + timedelta(minutes=4), TurnType.RESOLUTION),
    ConversationTurn("user", "Thank you, that solves my issue.",
                     base + timedelta(minutes=5), TurnType.RESOLUTION),
]

engine = SummarizationEngine()
summary = engine.summarize(turns, topic="Billing: Duplicate Charge")
print(summary.formatted)

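This produces a clean summary with issues, decisions, actions, and outcome, ready for agent handoff or session records. Note that KeyPointExtractor re-runs TurnClassifier on every turn, so the turn types passed to ConversationTurn above are overwritten by the keyword classifier. As a result, the "charged twice" message matches no indicator, falls back to ANSWER, and is left out of the summary, while "Refund completed" is listed under actions rather than treated as a resolution.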
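
For handoff or session records, the summary often has to leave the process as data rather than formatted text. The sketch below shows one way to serialize it to JSON, assuming the standard library is sufficient. `KeyPointRecord` and `SummaryRecord` are trimmed stand-ins that mirror the dataclasses above so the example runs on its own, and `to_handoff_json` is a hypothetical helper name.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime

@dataclass
class KeyPointRecord:  # stand-in mirroring the KeyPoint fields
    content: str
    category: str
    timestamp: datetime
    speaker: str

@dataclass
class SummaryRecord:  # trimmed stand-in for ConversationSummary
    topic: str
    outcome: str
    key_points: list[KeyPointRecord] = field(default_factory=list)
    pending_items: list[str] = field(default_factory=list)

def to_handoff_json(summary) -> str:
    """Serialize a summary dataclass, writing datetimes as ISO 8601."""
    def encode(value):
        # json.dumps calls this only for objects it cannot serialize itself.
        if isinstance(value, datetime):
            return value.isoformat()
        raise TypeError(f"Cannot serialize {type(value).__name__}")

    # asdict() recurses into nested dataclasses and lists.
    return json.dumps(asdict(summary), default=encode, indent=2)

record = SummaryRecord(
    topic="Billing: Duplicate Charge",
    outcome="Resolved",
    key_points=[
        KeyPointRecord(
            content="Processed a refund of $49.99.",
            category="action",
            timestamp=datetime(2026, 3, 17, 10, 2),
            speaker="agent",
        )
    ],
)
print(to_handoff_json(record))
```

The same function should work on the full ConversationSummary, since `asdict` handles any dataclass instance.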

FAQ

When should summarization be triggered?

Trigger summarization at three points: at conversation end for archival and analytics, at agent handoff so the receiving agent has full context, and at session timeout so returning users can review what happened. For long conversations (over 20 turns), also generate running summaries every 10 turns to keep the active context window manageable.
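
The trigger rules above can be sketched as a single predicate. The event names and the function `should_summarize` are hypothetical, and reading "over 20 turns, every 10 turns" as "at turns 30, 40, 50, and so on" is an assumption you may want to adjust.

```python
from typing import Optional

# Hypothetical event names for the three lifecycle triggers.
TERMINAL_EVENTS = {"conversation_end", "handoff", "session_timeout"}

def should_summarize(
    turn_count: int,
    event: Optional[str] = None,
    long_after: int = 20,
    every: int = 10,
) -> bool:
    """Return True when a summary should be generated now."""
    if event in TERMINAL_EVENTS:
        return True
    # Running summaries: only past the long-conversation mark, and only
    # on every `every`-th turn.
    return turn_count > long_after and turn_count % every == 0

print(should_summarize(5, event="handoff"))  # True
print(should_summarize(20))                  # False
print(should_summarize(30))                  # True
```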

How do you handle multi-topic conversations in a single summary?

Detect topic shifts using intent classification and segment the conversation into topic blocks before summarizing. Generate a per-topic summary and a brief overall summary. This prevents important details from one topic being buried by the volume of another. Use headings in the formatted output to visually separate topics.
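
As a rough illustration of the segmentation step, the sketch below assumes each turn already carries a topic label from some upstream intent classifier, which is not shown. It groups consecutive turns with the same label into blocks and renders one heading per block. The function names are hypothetical.

```python
from itertools import groupby

def segment_by_topic(
    labeled_turns: list[tuple[str, str]]
) -> list[tuple[str, list[str]]]:
    """Group consecutive (topic, text) pairs that share a topic."""
    # groupby merges only adjacent items with the same key, so a topic
    # that comes back later starts a new block.
    return [
        (topic, [text for _, text in group])
        for topic, group in groupby(labeled_turns, key=lambda pair: pair[0])
    ]

def format_segments(segments: list[tuple[str, list[str]]]) -> str:
    """Render one heading per topic block, with its turns as bullets."""
    lines: list[str] = []
    for topic, texts in segments:
        lines.append(f"### {topic.title()}")
        lines.extend(f"- {text}" for text in texts)
        lines.append("")
    return "\n".join(lines).rstrip()

labeled = [
    ("billing", "I was charged twice."),
    ("billing", "Please refund the duplicate."),
    ("shipping", "Also, where is my order?"),
    ("billing", "One more billing question."),
]
segments = segment_by_topic(labeled)
print(format_segments(segments))
```

In a fuller version, each block's turns would be passed to the summarization engine separately instead of being listed verbatim.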

What makes a summary actionable versus just informative?

An actionable summary includes three elements: what happened (key points), what was decided (decisions), and what still needs to happen (pending items with owners and deadlines where available). Summaries that only list what was discussed without extracting decisions and next steps force the reader to re-read the full conversation anyway, defeating the purpose.
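
One way to carry owners and deadlines is to store pending items as small records instead of plain strings. The `PendingItem` class below is a hypothetical sketch, and how the owner and due date get extracted from the conversation is left open.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class PendingItem:
    description: str
    owner: Optional[str] = None   # who is responsible, if known
    due: Optional[date] = None    # deadline, if one was mentioned

    def render(self) -> str:
        """Format as a bullet, adding owner and due date when present."""
        parts = [self.description]
        if self.owner:
            parts.append(f"owner: {self.owner}")
        if self.due:
            parts.append(f"due: {self.due.isoformat()}")
        return "- " + " | ".join(parts)

items = [
    PendingItem("Confirm refund posted", "billing team", date(2026, 3, 24)),
    PendingItem("Send satisfaction survey"),
]
for item in items:
    print(item.render())
```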


