
Time-Decay Memory for Chat Agents: Ebbinghaus Curves in Practice

Good agent memory needs to forget. Time-decay weights recent memories higher; Ebbinghaus-style curves auto-evict stale entries; TTL tiers keep allergies forever and small-talk for an hour.

TL;DR — Agents that never forget end up flooded with stale, irrelevant context. Time-decay memory weights recent memories higher (exponential decay on recency), uses TTL tiers for category-specific lifetimes ("dietary allergies" = forever; "today's mood" = 24 hours), and auto-evicts low-utility entries. Best-in-class 2026 agents use Ebbinghaus-curve decay with reinforcement on recall.

The technique

Naive memory: dump every turn into a vector store and retrieve top-K each time. This fails in three ways: (1) stale facts (the user moved cities a year ago); (2) salience inversion (the agent prefers a single vivid memory over a more recent one that contradicts it); (3) unbounded cost (memory grows without limit).

Time-decay memory multiplies semantic similarity by a recency function: score = sim * exp(-lambda * age). lambda sets the half-life (half_life = ln 2 / lambda): use longer half-lives for stable facts, shorter ones for volatile state.
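A minimal sketch of that scoring rule (function names are illustrative, not from any particular library):

```python
import math

def decayed_score(similarity: float, age_seconds: float, lam: float) -> float:
    """Recency-weighted retrieval score: sim * exp(-lambda * age)."""
    return similarity * math.exp(-lam * age_seconds)

def lambda_for_half_life(half_life_seconds: float) -> float:
    """Choose lambda so the score halves every half_life_seconds."""
    return math.log(2) / half_life_seconds

# A memory scored exactly at its half-life keeps half its weight.
lam = lambda_for_half_life(7 * 86400)        # 7-day half-life
score = decayed_score(0.9, 7 * 86400, lam)   # ≈ 0.45
```

Picking lambda via a target half-life is usually easier to reason about than tuning the raw rate directly.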

Ebbinghaus-curve memory goes further: each memory has a continuous decay rate. Successful recalls reinforce the memory (push the curve out); unused memories decay and eventually evict.
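The reinforcement mechanic can be sketched like this (a toy model, assuming each recall multiplies the decay rate by a fixed factor):

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    decay_lambda: float   # decay rate; lower = flatter Ebbinghaus curve
    hit_count: int = 0

def strength(mem: Memory, age_seconds: float) -> float:
    """Retention on an Ebbinghaus-style forgetting curve: exp(-lambda * age)."""
    return math.exp(-mem.decay_lambda * age_seconds)

def reinforce(mem: Memory, factor: float = 0.9) -> None:
    """A successful recall hardens the memory: flatter curve, slower decay."""
    mem.decay_lambda *= factor
    mem.hit_count += 1
```

Each recall pushes the curve out; a memory that is never recalled keeps its initial rate and eventually decays below the eviction threshold.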

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →
flowchart LR
  T[New turn] --> EX[Extract facts]
  EX --> TT{TTL tier}
  TT -->|allergy| INF[Infinite TTL]
  TT -->|preference| LONG[1y TTL]
  TT -->|context| SHORT[7d TTL]
  TT -->|chat-only| SES[session]
  INF --> S[(Memory store)]
  LONG --> S
  SHORT --> S
  Q[Query] --> R[Retrieve]
  R --> SC[score = sim * exp -lambda*age]
  SC --> RE[Reinforce on hit]
  RE --> S

How it works

Each memory entry carries { id, text, embedding, created_at, last_accessed_at, ttl_tier, decay_lambda, hit_count }.

  • Write: an LLM tags the fact with a TTL tier (immutable / long / short / session) and an initial decay_lambda.
  • Retrieve: score = cos(q, m.embedding) * exp(-m.decay_lambda * (now - m.last_accessed_at)).
  • Hit: last_accessed_at updates, hit_count increments, decay_lambda decreases (the memory hardens).
  • Evict: a nightly job removes entries where exp(-lambda * age) < 0.05 and hit_count == 0.
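A minimal sketch of that nightly eviction job, assuming entries are plain dicts with the fields listed above:

```python
import math, time

EVICTION_THRESHOLD = 0.05

def should_evict(entry: dict, now: float) -> bool:
    """Eviction rule: never-recalled entries whose retention has decayed
    below the threshold. Immutable tiers (lambda = 0) never drop below
    1.0, so they are never evicted by this rule."""
    age = now - entry["last_accessed_at"]
    retention = math.exp(-entry["decay_lambda"] * age)
    return entry["hit_count"] == 0 and retention < EVICTION_THRESHOLD

def run_evictor(entries: list[dict]) -> list[dict]:
    """Return the entries that survive the nightly sweep."""
    now = time.time()
    return [e for e in entries if not should_evict(e, now)]
```

Any entry with hit_count > 0 is spared here; reinforced memories only leave via the TTL floor or an explicit cap.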

CallSphere implementation

Every CallSphere voice/chat agent runs time-decay memory:

  • Allergies + insurance numbers in Healthcare = infinite TTL
  • Preferred broker / preferred school district in OneRoof = 1-year TTL
  • Last 5 ticket subjects in UrackIT IT helpdesk = 30-day TTL
  • Mood, current task, in-call context = session-only

Decay parameters live per vertical. Healthcare's medication-allergy memory has lambda = 0 (immutable). Real-estate buyer urgency ("we want to close in 30 days") has lambda = 0.05/day so it fades after the buying window.
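To sanity-check those numbers: lambda = 0.05/day gives a roughly two-week half-life, so by the end of the 30-day buying window the urgency memory keeps only about a fifth of its weight:

```python
import math

lam = 0.05                        # per day: the buyer-urgency rate above
half_life = math.log(2) / lam     # ≈ 13.9 days
weight_30d = math.exp(-lam * 30)  # ≈ 0.22 of original weight after the window
```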

37 agents · 90+ tools · 115+ DB tables · 6 verticals. $149/$499/$1499, 14-day trial, 22% affiliate. See multi-turn memory at work on /demo.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Build steps with code

import math, time

# Assumed helpers (not shown): classify_ttl (LLM tier classifier), embed
# (embedding model), vector_search and db (storage layer).

TTL_TIERS = {
    # tier: (decay_lambda per day, TTL floor in seconds)
    "immutable": (0.0, None),          # never evict, no decay
    "long":      (0.001, 365*86400),   # 1 year
    "short":     (0.01, 30*86400),     # 30 days
    "session":   (0.1, 86400),         # 24 hours
}

def write_memory(text):
    tier = classify_ttl(text)    # LLM call: returns one of TTL_TIERS keys
    lam, _ttl = TTL_TIERS[tier]  # TTL floor is re-derived from ttl_tier at eviction time
    db.insert("memory", {
        "text": text, "embedding": embed(text),
        "created_at": time.time(), "last_accessed_at": time.time(),
        "ttl_tier": tier, "decay_lambda": lam, "hit_count": 0,
    })

def retrieve(q, top_k=5):
    cands = vector_search(embed(q), k=50)   # candidates carry .cos_sim
    now = time.time()
    scored = [
        # decay_lambda is per day, so convert the age to days before decaying
        (m, m.cos_sim * math.exp(-m.decay_lambda * (now - m.last_accessed_at) / 86400))
        for m in cands
    ]
    top = sorted(scored, key=lambda x: -x[1])[:top_k]
    for m, _ in top:
        db.update("memory", m.id, {
            "last_accessed_at": now,
            "hit_count": m.hit_count + 1,
            "decay_lambda": m.decay_lambda * 0.9,  # reinforce: harden on recall
        })
    return [m for m, _ in top]

  1. LLM-classify TTL on write. The classifier is the silent ranker.
  2. Reinforce on retrieval; do not just return: update.
  3. Run a nightly evictor for hit_count == 0 and effective_score < 0.05.
  4. Cap memory size per user; spillover evicts oldest session-tier first.
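Step 4 (per-user cap with tier-ordered spillover) can be sketched like this; the cap value and field names are illustrative:

```python
TIER_EVICTION_ORDER = ["session", "short", "long"]  # never spill "immutable"
MAX_MEMORIES_PER_USER = 2000  # hypothetical cap

def enforce_cap(entries: list[dict], cap: int = MAX_MEMORIES_PER_USER) -> list[dict]:
    """Per-user spillover: evict oldest session-tier first, then short, then long."""
    if len(entries) <= cap:
        return entries
    keep = list(entries)
    for tier in TIER_EVICTION_ORDER:
        candidates = sorted(
            (e for e in keep if e["ttl_tier"] == tier),
            key=lambda e: e["created_at"],
        )
        while candidates and len(keep) > cap:
            keep.remove(candidates.pop(0))   # drop the oldest in this tier
        if len(keep) <= cap:
            break
    return keep
```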

Pitfalls

  • Wrong TTL classifier: tagging "I love pizza" as immutable pollutes future calls. Calibrate.
  • Decay too aggressive: agent forgets a real allergy. Always test on a golden set.
  • No staleness detection: a "highly retrieved" memory is not necessarily correct. Add explicit contradiction handling.
  • Reinforcement loop: mis-classified memory keeps getting hit, never decays. Add a max_hit_count guardrail.
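The last pitfall's guardrail is a one-line change to the reinforcement step; the cap values here are illustrative:

```python
MAX_HIT_COUNT = 50        # hypothetical guardrail: stop hardening after this
MIN_DECAY_LAMBDA = 1e-7   # never let a memory become effectively immutable

def reinforce(entry: dict) -> dict:
    """Reinforce on recall, but stop hardening past the guardrail so a
    mis-classified memory cannot lock itself in forever."""
    entry["hit_count"] += 1
    if entry["hit_count"] <= MAX_HIT_COUNT:
        entry["decay_lambda"] = max(entry["decay_lambda"] * 0.9, MIN_DECAY_LAMBDA)
    return entry
```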

FAQ

Decay or TTL? Both. TTL is the floor (mass eviction), decay is the score modifier.

Embedding store or graph? Hybrid — embedding for fuzzy recall, graph for entity-heavy recalls. See vw6g-15 on graph memory.

Per-user or global? Per-user always. Cross-user memory is a privacy violation.

Cost? ~$0.001 per memory write (the TTL classifier). Cheap.

See it on /demo? Yes — the multi-turn demo logs decay scores in the trace panel.


