Learn Agentic AI

Memory Consolidation: Compressing and Summarizing Agent Memories Over Time

Build a memory consolidation pipeline that compresses detailed agent memories into summaries, preserving essential information while reducing storage and improving retrieval quality.

Why Raw Memories Do Not Scale

An agent that records every interaction verbatim will accumulate thousands of memory items within days. Searching through raw conversation turns is slow, expensive, and produces noisy results. The agent ends up retrieving five slightly different wordings of the same fact instead of one clean summary.

Memory consolidation solves this by periodically compressing groups of related memories into concise summaries. The detailed records are archived or deleted, and the summary takes their place. This mirrors how human memory works during sleep — the brain replays experiences and encodes the essential patterns while discarding surface details.

Consolidation Triggers

Consolidation should not run after every interaction; it needs a trigger. The flowchart below shows where the summarizer sits in a typical agent memory loop, and the common triggers follow it:

flowchart TD
    MSG(["New message"])
    WORKING["Working memory<br/>rolling window"]
    EPISODIC[("Episodic memory<br/>past sessions")]
    SEMANTIC[("Semantic memory<br/>facts and preferences")]
    SUM["Summarizer<br/>compresses old turns"]
    ROUTER{"Retrieve<br/>needed memories"}
    PROMPT["Assembled context"]
    LLM["LLM"]
    UPD["Memory updater<br/>writes new facts"]
    MSG --> WORKING --> ROUTER
    ROUTER -->|Past sessions| EPISODIC
    ROUTER -->|User facts| SEMANTIC
    EPISODIC --> SUM --> PROMPT
    SEMANTIC --> PROMPT
    WORKING --> PROMPT --> LLM --> UPD
    UPD --> EPISODIC
    UPD --> SEMANTIC
    style ROUTER fill:#4f46e5,stroke:#4338ca,color:#fff
    style LLM fill:#f59e0b,stroke:#d97706,color:#1f2937
    style EPISODIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style SEMANTIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b

Count-based — consolidate after every N new memories are added to a category.

Time-based — consolidate all memories older than a threshold (e.g., 24 hours).

Size-based — consolidate when the memory store exceeds a storage budget.

A small trigger class can check all three conditions:

from datetime import datetime, timedelta
from dataclasses import dataclass, field

@dataclass
class MemoryItem:
    content: str
    created_at: datetime
    category: str = "general"
    consolidated: bool = False
    metadata: dict = field(default_factory=dict)

class ConsolidationTrigger:
    def __init__(
        self,
        count_threshold: int = 20,
        age_threshold_hours: int = 24,
        size_threshold: int = 100,
    ):
        self.count_threshold = count_threshold
        self.age_threshold = timedelta(hours=age_threshold_hours)
        self.size_threshold = size_threshold

    def should_consolidate(
        self, memories: list[MemoryItem]
    ) -> bool:
        unconsolidated = [
            m for m in memories if not m.consolidated
        ]
        if len(unconsolidated) >= self.count_threshold:
            return True
        if len(memories) >= self.size_threshold:
            return True
        now = datetime.now()
        old_items = [
            m for m in unconsolidated
            if (now - m.created_at) > self.age_threshold
        ]
        if len(old_items) >= 5:
            return True
        return False
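Sanity-checking the trigger is straightforward. The sketch below lowers count_threshold so the trigger fires on a handful of synthetic items; the two classes are repeated in condensed form so it runs standalone.

```python
# Quick standalone check of the trigger logic, with a lowered
# count threshold so it fires on a handful of synthetic items.
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class MemoryItem:
    content: str
    created_at: datetime
    category: str = "general"
    consolidated: bool = False
    metadata: dict = field(default_factory=dict)

class ConsolidationTrigger:
    def __init__(self, count_threshold=20, age_threshold_hours=24,
                 size_threshold=100):
        self.count_threshold = count_threshold
        self.age_threshold = timedelta(hours=age_threshold_hours)
        self.size_threshold = size_threshold

    def should_consolidate(self, memories: list[MemoryItem]) -> bool:
        unconsolidated = [m for m in memories if not m.consolidated]
        if len(unconsolidated) >= self.count_threshold:
            return True
        if len(memories) >= self.size_threshold:
            return True
        now = datetime.now()
        old = [m for m in unconsolidated
               if (now - m.created_at) > self.age_threshold]
        return len(old) >= 5

trigger = ConsolidationTrigger(count_threshold=3)
store = [MemoryItem("note 0", datetime.now()),
         MemoryItem("note 1", datetime.now())]
print(trigger.should_consolidate(store))  # False: below every threshold
store.append(MemoryItem("note 2", datetime.now()))
print(trigger.should_consolidate(store))  # True: count threshold hit
```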

Summary Generation

The consolidation engine groups related memories and generates a summary using an LLM. The prompt instructs the model to extract key facts, decisions, and preferences while discarding filler.

from openai import AsyncOpenAI

async def consolidate_memories(
    memories: list[MemoryItem],
    client: AsyncOpenAI,
) -> str:
    combined_text = "\n".join(
        f"- [{m.created_at.isoformat()}] {m.content}"
        for m in memories
    )
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a memory consolidation engine. "
                    "Compress the following memory items into a "
                    "concise summary that preserves all key facts, "
                    "user preferences, decisions, and action items. "
                    "Remove redundancy and filler. Output only the "
                    "summary, no preamble."
                ),
            },
            {
                "role": "user",
                "content": combined_text,
            },
        ],
        temperature=0.1,
    )
    return response.choices[0].message.content

Detail Preservation

Not every detail should be compressed away. Some memories contain exact values that summaries tend to round or generalize — API keys, specific dates, numerical thresholds. A detail preservation step extracts and stores these separately.

import re

def extract_preservable_details(
    memories: list[MemoryItem],
) -> list[dict]:
    details = []
    patterns = {
        "date": r"\d{4}-\d{2}-\d{2}",
        # note: this pattern also matches the digit groups inside dates
        "number": r"\b\d+\.?\d*\b",
        "email": r"[\w.-]+@[\w.-]+",
        "url": r"https?://[^\s]+",
    }
    for mem in memories:
        for detail_type, pattern in patterns.items():
            matches = re.findall(pattern, mem.content)
            for match in matches:
                details.append({
                    "type": detail_type,
                    "value": match,
                    "source": mem.content[:80],
                })
    return details
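Running the extractor on a sample memory makes the pattern overlap visible. The function is repeated below in a standalone form that takes a plain string; the sample text is illustrative.

```python
# Standalone version of the detail extractor, applied to one sample string.
import re

PATTERNS = {
    "date": r"\d{4}-\d{2}-\d{2}",
    "number": r"\b\d+\.?\d*\b",
    "email": r"[\w.-]+@[\w.-]+",
    "url": r"https?://[^\s]+",
}

def extract_details(text: str) -> list[dict]:
    details = []
    for detail_type, pattern in PATTERNS.items():
        for match in re.findall(pattern, text):
            details.append({
                "type": detail_type,
                "value": match,
                "source": text[:80],
            })
    return details

sample = "Launch moved to 2025-03-01, contact ops@example.com for details"
for d in extract_details(sample):
    print(d["type"], d["value"])
# date 2025-03-01
# number 2025
# number 03
# number 01
# email ops@example.com
```

Note that the number pattern also fires on the digit groups inside the date, so downstream code should deduplicate or prioritize the more specific match.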

The Full Consolidation Pipeline

Putting it together, the pipeline groups memories by category, generates summaries, preserves critical details, and replaces the originals.

class MemoryConsolidator:
    def __init__(self, client: AsyncOpenAI):
        self.client = client
        self.trigger = ConsolidationTrigger()

    async def run(
        self, store: list[MemoryItem]
    ) -> list[MemoryItem]:
        if not self.trigger.should_consolidate(store):
            return store

        # Group by category
        groups: dict[str, list[MemoryItem]] = {}
        fresh: list[MemoryItem] = []
        for mem in store:
            if mem.consolidated:
                fresh.append(mem)
                continue
            groups.setdefault(mem.category, []).append(mem)

        # Consolidate each group
        for category, items in groups.items():
            if len(items) < 3:
                fresh.extend(items)
                continue
            summary = await consolidate_memories(
                items, self.client
            )
            details = extract_preservable_details(items)
            # the raw `items` can be archived here rather than discarded
            consolidated = MemoryItem(
                content=summary,
                created_at=datetime.now(),
                category=category,
                consolidated=True,
                metadata={
                    "source_count": len(items),
                    "preserved_details": details,
                },
            )
            fresh.append(consolidated)

        return fresh

Storage Optimization

After consolidation, the raw memories can be archived to cold storage (a separate database table or file) rather than deleted entirely. This gives you an audit trail while keeping the active memory store lean.
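A minimal sketch of that archival step, assuming a JSONL file as the cold store; the memory_archive.jsonl path and the dict shape are illustrative, not part of the pipeline above.

```python
# Append raw memories to a JSONL cold-storage file before the
# consolidator replaces them with a summary.
import json
from datetime import datetime
from pathlib import Path

def archive_memories(memories: list[dict], path: Path) -> int:
    """Append raw memory dicts to a JSONL file; returns count written."""
    with path.open("a", encoding="utf-8") as f:
        for mem in memories:
            f.write(json.dumps(mem) + "\n")
    return len(memories)

archive = Path("memory_archive.jsonl")
raw = [
    {"content": "User prefers dark mode",
     "created_at": datetime.now().isoformat()},
    {"content": "Meeting moved to Friday",
     "created_at": datetime.now().isoformat()},
]
written = archive_memories(raw, archive)
print(written)  # 2
```

Because the file is append-only, each consolidation cycle adds to the audit trail without touching earlier records.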

A typical consolidation cycle reduces memory count by 60 to 80 percent. Running it daily keeps the active store small enough for fast retrieval while preserving all the information that matters.

FAQ

Does summarization lose important nuance?

It can if the prompt is not carefully written. The detail preservation step catches structured data like dates and numbers. For subjective nuance, instruct the LLM to preserve sentiment and reasoning, not just facts. Test by comparing agent behavior before and after consolidation.

How often should consolidation run?

For active agents, once per day or once per 50 new memories is a good starting point. Agents with bursty usage patterns benefit from count-based triggers so consolidation runs after intense sessions rather than during quiet periods.

Can I consolidate already-consolidated memories?

Yes. This is called multi-level consolidation. Daily summaries can be consolidated into weekly summaries, and weekly summaries into monthly summaries. Each level compresses further, creating a pyramid of increasingly abstract knowledge.
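The pyramid can be sketched by tagging each memory with a level and consolidating one level at a time; summarize() below is a stand-in for the LLM call shown earlier.

```python
# Multi-level consolidation sketch: each pass consumes items at one
# level and emits a single item at the next level up.
def summarize(contents: list[str]) -> str:
    # placeholder for the consolidate_memories LLM call
    return f"Summary of {len(contents)} items"

def consolidate_level(items: list[dict], level: int) -> list[dict]:
    current = [m for m in items if m["level"] == level]
    if len(current) < 2:
        return items  # nothing worth merging at this level
    rest = [m for m in items if m["level"] != level]
    merged = {
        "content": summarize([m["content"] for m in current]),
        "level": level + 1,
    }
    return rest + [merged]

store = [{"content": f"daily summary {i}", "level": 1} for i in range(7)]
store = consolidate_level(store, level=1)  # 7 dailies -> 1 weekly
print([m["level"] for m in store])  # [2]
```

A weekly job would then call consolidate_level(store, level=2) to roll weeklies into monthlies, and so on up the pyramid.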


#MemoryConsolidation #Summarization #AgentMemory #Python #AgenticAI #LearnAI #AIEngineering
