Learn Agentic AI

AI Agent Memory Systems: Short-Term, Long-Term, and Episodic Memory

Explore the three types of memory that AI agents use — short-term, long-term, and episodic — with practical Python implementations and guidance on when to use each type.

Why Agents Need Memory

An LLM without memory is like a person with amnesia — brilliant in the moment but unable to learn from past interactions. Every API call starts from scratch. The model does not remember what tools it called, what the user said yesterday, or what mistakes it made last time.

Memory gives agents continuity. It allows them to reference previous conversations, avoid repeating errors, accumulate knowledge over time, and provide personalized responses. Without memory, every agent interaction is isolated and context-free.

The Three Types of Agent Memory

Agent memory systems draw from cognitive science, mapping roughly to how human memory works.

flowchart LR
    Q(["User query"])
    STM["Short-term memory<br/>conversation history"]
    LTM[("Long-term memory<br/>vector DB, e.g. pgvector or Pinecone")]
    EPI[("Episodic memory<br/>past task episodes")]
    CTX["Augmented context<br/>system plus memories"]
    LLM["LLM generation<br/>Claude or GPT"]
    OUT(["Response"])
    Q --> STM --> CTX
    Q --> LTM --> CTX
    Q --> EPI --> CTX
    CTX --> LLM --> OUT
    style LTM fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style EPI fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff

1. Short-Term Memory (Working Memory)

Short-term memory is the conversation history — the messages array that gets sent to the LLM on every call. It holds the current task context: what the user asked, what tools were called, and what results came back.

class ShortTermMemory:
    """Manages the conversation context window."""

    def __init__(self, max_messages: int = 50):
        self.messages: list[dict] = []
        self.max_messages = max_messages

    def add(self, message: dict):
        self.messages.append(message)
        # Evict oldest messages if over limit (keep system prompt)
        if len(self.messages) > self.max_messages:
            system = [m for m in self.messages if m["role"] == "system"]
            others = [m for m in self.messages if m["role"] != "system"]
            # Keep system messages + most recent others
            self.messages = system + others[-(self.max_messages - len(system)):]

    def get_context(self) -> list[dict]:
        return self.messages.copy()

When to use: Always. Every agent has short-term memory — it is the conversation itself. The challenge is managing its size as conversations grow, since LLMs have finite context windows.
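The eviction above caps by message count, but context limits are measured in tokens, so production systems more often trim to a token budget. A minimal sketch — the 4-characters-per-token estimate is a rough assumption; swap in the model's real tokenizer (e.g. tiktoken) for accurate counts:

```python
def trim_to_token_budget(messages: list[dict], max_tokens: int = 8000) -> list[dict]:
    """Keep system messages plus the most recent messages that fit the budget."""

    def est_tokens(m: dict) -> int:
        # Rough heuristic: ~4 characters per token
        return max(1, len(m.get("content", "")) // 4)

    system = [m for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]

    budget = max_tokens - sum(est_tokens(m) for m in system)
    kept: list[dict] = []
    for m in reversed(others):  # walk newest-first
        cost = est_tokens(m)
        if budget - cost < 0:
            break
        budget -= cost
        kept.append(m)
    return system + list(reversed(kept))  # restore chronological order
```

The newest-first walk guarantees the most recent turns survive, which matters more than older ones for task continuity.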

2. Long-Term Memory (Semantic Memory)

Long-term memory persists across conversations. It stores facts, user preferences, learned procedures, and domain knowledge that the agent can retrieve when relevant. Typically implemented with a vector database.

from openai import OpenAI

client = OpenAI()

class LongTermMemory:
    """Vector-based persistent memory store."""

    def __init__(self):
        self.memories: list[dict] = []  # In production, use a vector DB

    def store(self, content: str, metadata: dict | None = None):
        embedding = client.embeddings.create(
            model="text-embedding-3-small",
            input=content,
        ).data[0].embedding

        self.memories.append({
            "content": content,
            "embedding": embedding,
            "metadata": metadata or {},
        })

    def recall(self, query: str, top_k: int = 5) -> list[str]:
        query_embedding = client.embeddings.create(
            model="text-embedding-3-small",
            input=query,
        ).data[0].embedding

        # Cosine similarity search
        scored = []
        for mem in self.memories:
            score = self._cosine_similarity(query_embedding, mem["embedding"])
            scored.append((score, mem["content"]))

        scored.sort(reverse=True, key=lambda x: x[0])
        return [content for _, content in scored[:top_k]]

    @staticmethod
    def _cosine_similarity(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = sum(x ** 2 for x in a) ** 0.5
        norm_b = sum(x ** 2 for x in b) ** 0.5
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

When to use: When your agent interacts with the same user across multiple sessions, needs to remember facts from previous conversations, or must access a large knowledge base that does not fit in the context window.

3. Episodic Memory (Experience Memory)

Episodic memory stores complete past experiences — entire task executions with their inputs, steps taken, outcomes, and what went right or wrong. This lets agents learn from their own history.

from datetime import datetime
from dataclasses import dataclass, field

@dataclass
class Episode:
    task: str
    steps: list[dict]
    outcome: str  # "success", "failure", "partial"
    lessons: list[str]
    timestamp: str = field(default_factory=lambda: datetime.now().isoformat())

class EpisodicMemory:
    """Stores and retrieves complete task episodes."""

    def __init__(self):
        self.episodes: list[Episode] = []

    def record(self, episode: Episode):
        self.episodes.append(episode)

    def recall_similar(self, task: str, max_episodes: int = 3) -> list[Episode]:
        """Find episodes with similar tasks. In production, use embeddings."""
        # Simple keyword matching — replace with vector search
        scored = []
        task_words = set(task.lower().split())
        for ep in self.episodes:
            ep_words = set(ep.task.lower().split())
            overlap = len(task_words & ep_words) / max(len(task_words), 1)
            scored.append((overlap, ep))
        scored.sort(reverse=True, key=lambda x: x[0])
        return [ep for _, ep in scored[:max_episodes]]

    def get_lessons_for_task(self, task: str) -> list[str]:
        """Extract lessons learned from similar past tasks."""
        similar = self.recall_similar(task)
        lessons = []
        for ep in similar:
            lessons.extend(ep.lessons)
        return lessons

When to use: When your agent performs recurring tasks and you want it to improve over time. Episodic memory is particularly valuable for agents that handle operations tasks (deployments, incident response) where learning from past incidents directly improves future performance.

Combining All Three Memory Types

In practice, a well-designed agent uses all three types together:


def build_agent_context(
    user_input: str,
    short_term: ShortTermMemory,
    long_term: LongTermMemory,
    episodic: EpisodicMemory,
) -> list[dict]:
    # Start with short-term (conversation history)
    messages = short_term.get_context()

    # Inject relevant long-term memories
    relevant_facts = long_term.recall(user_input, top_k=3)
    if relevant_facts:
        memory_text = "Relevant information from previous interactions:\n"
        memory_text += "\n".join(f"- {fact}" for fact in relevant_facts)
        messages.insert(1, {"role": "system", "content": memory_text})

    # Inject lessons from similar past tasks
    lessons = episodic.get_lessons_for_task(user_input)
    if lessons:
        lesson_text = "Lessons from similar past tasks:\n"
        lesson_text += "\n".join(f"- {lesson}" for lesson in lessons)
        messages.insert(1, {"role": "system", "content": lesson_text})

    return messages

This function builds the complete context for each LLM call by layering memories: the current conversation (short-term), relevant facts (long-term), and past experience (episodic). The LLM receives all of this as context and can draw on any memory type during reasoning.
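Note one subtlety in the insertion order: both memory blocks go in at index 1, right after the system prompt, so whichever is inserted last ends up first. A minimal illustration with plain dicts (the message contents are illustrative):

```python
messages = [
    {"role": "system", "content": "You are a helpful agent."},
    {"role": "user", "content": "Deploy the billing service."},
]

# Mirrors build_agent_context: facts inserted first, lessons second,
# both at index 1 — so lessons end up ahead of facts
messages.insert(1, {"role": "system", "content": "Relevant information: ..."})
messages.insert(1, {"role": "system", "content": "Lessons from past tasks: ..."})

# Final order: system prompt, lessons, facts, user message
```

If you want facts to appear before lessons, reverse the two insert calls.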

FAQ

How do I decide which memory type to implement first?

Start with short-term memory — you already have it (the messages array). Add long-term memory next if your agent serves repeat users or needs access to a knowledge base. Add episodic memory last, as it requires tracking complete task executions and extracting lessons, which adds significant complexity.

Will memory make my agent slower?

Long-term memory recall adds latency (typically 50-200ms for a vector database query). The accuracy and personalization gains usually outweigh that cost, and you can mitigate it by running memory retrievals in parallel with other operations, caching frequent queries, and limiting the number of memories injected into context.
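Running long-term and episodic retrieval concurrently is straightforward with asyncio. A sketch with stubbed retrieval calls — the sleep durations stand in for real vector-DB and episode-store latency:

```python
import asyncio

async def recall_long_term(query: str) -> list[str]:
    await asyncio.sleep(0.05)  # stands in for a ~50ms vector DB query
    return ["fact about " + query]

async def recall_episodic(query: str) -> list[str]:
    await asyncio.sleep(0.05)  # stands in for an episode lookup
    return ["lesson about " + query]

async def gather_memories(query: str):
    # Both retrievals run concurrently: total wait is roughly the
    # slower of the two, not the sum of both
    facts, lessons = await asyncio.gather(
        recall_long_term(query), recall_episodic(query)
    )
    return facts, lessons

facts, lessons = asyncio.run(gather_memories("deploy"))
```

The same pattern extends to any number of independent memory stores: add them to the gather call.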

How do I prevent memory from growing indefinitely?

Implement memory eviction policies. For short-term memory, use a sliding window or summarize older messages. For long-term memory, set a maximum size and evict based on recency, relevance score, or access frequency. For episodic memory, keep only episodes from the last N days or the top-K most relevant episodes per task category.


#AIMemory #AIAgents #RAG #VectorDatabase #Python #AgenticAI #LearnAI #AIEngineering

