
Agent Memory Patterns: Episodic, Semantic, and Procedural Stores in Production

Production LLM agents in 2026 separate episodic, semantic, and procedural memory. Here is how to design each store and the tradeoffs that matter.

Why One Memory Store Is Not Enough

Early LLM agents treated memory as one big vector store: dump every conversation chunk, retrieve the nearest neighbors, hope for the best. By 2026, the teams shipping reliable agents at scale have stopped doing this. They borrow the cognitive science taxonomy of episodic, semantic, and procedural memory because each kind needs different storage, different write rules, and very different retrieval behavior.

This guide walks through the three-store pattern, the tradeoffs that matter in production, and the open-source projects (Letta, Zep, Mem0, MemGPT, Cognee) implementing each piece.

The Three Stores

flowchart TB
    User[User Turn] --> Agent[Agent Orchestrator]
    Agent --> EM[Episodic Memory<br/>Time-stamped events]
    Agent --> SM[Semantic Memory<br/>Distilled facts]
    Agent --> PM[Procedural Memory<br/>Skills + workflows]
    EM --> Vec[(Vector + Time Index)]
    SM --> KG[(Knowledge Graph)]
    PM --> Skill[(Skill Registry)]
    Vec --> Retrieve[Retrieval Layer]
    KG --> Retrieve
    Skill --> Retrieve
    Retrieve --> LLM[LLM Context]

Episodic Memory

Episodic memory is the timeline of what happened. Each entry is a tuple of (timestamp, agent_id, user_id, event_type, content, embedding). The right primitive is a vector store with a strong time dimension: pgvector with a B-tree index on occurred_at, or Zep's purpose-built temporal graph.

Write rule: append-only. Every turn, every tool call, every tool result.
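A minimal sketch of what an append-only episodic store looks like, using an in-memory list in place of pgvector. The `EpisodicEvent` and `EpisodicStore` names are illustrative, not from any of the libraries mentioned; the fields mirror the tuple described above.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class EpisodicEvent:
    occurred_at: datetime
    agent_id: str
    user_id: str
    event_type: str          # e.g. "user_turn", "tool_call", "tool_result"
    content: str
    embedding: list[float]

class EpisodicStore:
    """Append-only: events are never mutated or deleted on the hot path."""

    def __init__(self) -> None:
        self._events: list[EpisodicEvent] = []

    def append(self, event: EpisodicEvent) -> None:
        self._events.append(event)

    def events_since(self, cutoff: datetime) -> list[EpisodicEvent]:
        # In production this is an indexed query on occurred_at.
        return [e for e in self._events if e.occurred_at >= cutoff]
```

In a real deployment the append is a single INSERT into a table with both a vector column and an occurred_at index, which is why it is cheap enough to run inline.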


Retrieval rule: hybrid — combine semantic similarity to the current query with recency decay. A simple but durable formula is score = 0.7 * cosine + 0.3 * exp(-age_days / half_life).
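The scoring formula above translates directly into code. This is a sketch with the article's stated weights (0.7 similarity, 0.3 recency) exposed as defaults; note that exp(-age_days / half_life) is exponential decay with half_life as the time constant, not a strict "half every half_life days" curve.

```python
import math

def hybrid_score(cosine_sim: float, age_days: float,
                 half_life: float = 14.0,
                 w_sim: float = 0.7, w_recency: float = 0.3) -> float:
    """score = w_sim * cosine + w_recency * exp(-age_days / half_life)."""
    return w_sim * cosine_sim + w_recency * math.exp(-age_days / half_life)

def rank_events(candidates: list[tuple[str, float, float]],
                k: int = 5) -> list[str]:
    """candidates: (event_id, cosine_sim, age_days) triples from the
    vector index; returns the top-k event ids by hybrid score."""
    scored = sorted(candidates,
                    key=lambda c: hybrid_score(c[1], c[2]),
                    reverse=True)
    return [event_id for event_id, _, _ in scored[:k]]
```

The practical effect: a moderately relevant event from yesterday can outrank a highly relevant event from three months ago, which is usually what you want in a conversation.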

Semantic Memory

Semantic memory is the distilled, deduplicated set of facts the agent has learned. "User prefers vegetarian food," "ACME's renewal date is October 15," "the database is named prod-east-1." This is not a transcript — it is the lessons drawn from many transcripts.

The right primitive in 2026 is a knowledge graph. Mem0, Cognee, and Graphiti all implement this with Neo4j, Kuzu, or Memgraph as the backing store. Updates run asynchronously: a background process consumes episodic events and emits CRUD operations on the graph.

Write rule: deduplicate on entity + relation. Use entity resolution (canonical-name matching plus embedding clustering) before insert.

Retrieval rule: graph traversal from the entities mentioned in the query. Limit by hop count (typically 2 or 3) and edge weight.
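Hop-limited traversal can be sketched as a breadth-first search over an adjacency map. This stands in for a Cypher query against Neo4j or Memgraph; the dict-of-edges graph shape here is an assumption for illustration.

```python
from collections import deque

# graph: entity -> list of (neighbor, relation, edge_weight)
Graph = dict[str, list[tuple[str, str, float]]]

def facts_within_hops(graph: Graph, seeds: list[str],
                      max_hops: int = 2,
                      min_weight: float = 0.0) -> list[tuple]:
    """BFS from the entities mentioned in the query, limited by hop
    count and edge weight; returns (subject, relation, object, weight)."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    out = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nbr, relation, weight in graph.get(node, []):
            if weight < min_weight:
                continue
            out.append((node, relation, nbr, weight))
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return out
```

The hop limit is what keeps the retrieved subgraph small enough to serialize into the LLM context without drowning the prompt.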

Procedural Memory

Procedural memory is "how I did X last time it worked." It stores the sequence of tool calls that successfully completed a task type. The right primitive is a skill or workflow registry — JSON documents keyed by task signature, retrieved by similarity to the current goal.


Write rule: only on verified success. Never write a skill from a failed or human-cancelled trajectory.

Retrieval rule: exact or near-exact match on task type, then embed the goal and pick the top-k templates.
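Both rules can be sketched as a small registry keyed by task signature. `SkillRegistry` is a hypothetical name; the enforcement of "only on verified success" lives in the caller, which should invoke `record_success` strictly after a trajectory is confirmed complete.

```python
class SkillRegistry:
    """Skill documents keyed by task type; each document records the
    tool-call sequence that worked, plus a confidence score."""

    def __init__(self) -> None:
        self._skills: dict[str, list[dict]] = {}

    def record_success(self, task_type: str, steps: list[str]) -> None:
        # Write rule: call this only for verified-successful trajectories;
        # failed or human-cancelled runs never reach this method.
        self._skills.setdefault(task_type, []).append(
            {"steps": steps, "confidence": 1.0}
        )

    def lookup(self, task_type: str, k: int = 3) -> list[dict]:
        # Exact match on task type; a fuller system would fall back to
        # embedding similarity between the current goal and stored
        # task signatures for near-matches.
        return self._skills.get(task_type, [])[:k]
```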

The Asynchronous Memory Pipeline

The single biggest mistake in 2026 production agents is doing memory writes inline with the user-facing request. Episodic writes can be inline (low cost), but semantic and procedural writes are LLM-driven and slow. Run them on a queue:

sequenceDiagram
    participant U as User
    participant A as Agent
    participant E as Episodic Store
    participant Q as Queue (NATS / SQS)
    participant W as Memory Worker
    participant S as Semantic + Procedural
    U->>A: Message
    A->>E: append event
    A->>U: response
    E-->>Q: emit event
    Q->>W: deliver
    W->>W: extract facts + skills
    W->>S: upsert

This keeps p95 latency low and makes memory enrichment idempotent and re-runnable.
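The consumer side of that diagram can be sketched with the standard library's queue, standing in for NATS or SQS. The extraction step is stubbed out (in production it is the slow, LLM-driven part); keying every upsert on the event id is what makes re-delivery idempotent.

```python
import queue
import threading

def memory_worker(events: "queue.Queue[dict | None]",
                  semantic_upserts: list[dict]) -> None:
    """Drain episodic events off the queue and enrich the semantic and
    procedural stores off the user-facing hot path."""
    while True:
        event = events.get()
        if event is None:            # shutdown sentinel
            break
        # Stub for the LLM-driven fact/skill extraction step.
        # Upserts are keyed on event_id, so redelivering the same
        # event produces the same write (idempotent).
        semantic_upserts.append({"event_id": event["id"], "facts": []})
        events.task_done()

# Usage sketch:
# q: queue.Queue = queue.Queue()
# threading.Thread(target=memory_worker, args=(q, upserts)).start()
```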

Forgetting and Conflicts

The hard parts in 2026 are not writing or reading; they are forgetting and conflict resolution. Three patterns are working in practice:

  • TTL on episodic: keep raw events for 30-90 days, then drop. The semantic store retains what mattered.
  • Provenance on semantic: every fact has the source episode IDs. When a contradicting fact arrives, run a tiny LLM judge to merge or supersede.
  • Versioned procedural: skills are versioned; failures decrement a confidence score; below a threshold, the skill is retired.
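The versioned-procedural pattern reduces to a small update rule. The threshold and penalty values here are illustrative assumptions, not numbers from any of the projects listed below.

```python
RETIRE_THRESHOLD = 0.3  # assumed value; tune per deployment

def update_skill_confidence(skill: dict, succeeded: bool,
                            penalty: float = 0.2,
                            reward: float = 0.05) -> dict:
    """Failures decrement a skill's confidence, successes nudge it up;
    once confidence drops below the threshold, the skill is retired."""
    c = skill["confidence"]
    c = min(1.0, c + reward) if succeeded else max(0.0, c - penalty)
    return {**skill, "confidence": c, "retired": c < RETIRE_THRESHOLD}
```

Retired skills are kept (versioned) rather than deleted, so a later success on the same task type can be compared against what used to work.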

Open-Source Implementations Worth Studying

  • Letta (formerly MemGPT) — best reference for the OS-paging analogy applied to LLM context
  • Mem0 — production-ready, three-store implementation with graph backend
  • Zep — temporal knowledge graph as a service
  • Cognee — open-source memory engine with strong GraphRAG support
  • Graphiti — Neo4j-backed temporal graph from Zep, open source
