
Time-Decay Memory for Chat Agents: Ebbinghaus Curves in Practice

Good agent memory needs to forget. Time-decay weights recent memories higher; Ebbinghaus-style curves auto-evict stale entries; TTL tiers keep allergies forever and small-talk for an hour.

TL;DR — Agents that never forget end up flooded with stale, irrelevant context. Time-decay memory weights recent memories higher (exponential decay on recency), uses TTL tiers for category-specific lifetimes ("dietary allergies" = forever; "today's mood" = 24 hours), and auto-evicts low-utility entries. Best-in-class 2026 agents use Ebbinghaus-curve decay with reinforcement on recall.

The technique

Naive memory: dump every turn into a vector store and retrieve top-K each time. This fails in three ways: (1) stale facts (the user moved cities a year ago); (2) salience inversion (the agent prefers a single vivid memory over a more recent one that contradicts it); (3) unbounded cost (memory grows without limit).

Time-decay memory multiplies semantic similarity by a recency function: score = sim * exp(-lambda * age). lambda sets the half-life (half_life = ln 2 / lambda): use longer half-lives for stable facts, shorter ones for volatile state.
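A minimal sketch of that scoring rule (function names are illustrative, not from any particular library):

```python
import math

def decayed_score(similarity: float, age_seconds: float, lam: float) -> float:
    """Recency-weighted retrieval score: sim * exp(-lambda * age)."""
    return similarity * math.exp(-lam * age_seconds)

def lambda_for_half_life(half_life_seconds: float) -> float:
    """Choose lambda so the score halves every half_life_seconds."""
    return math.log(2) / half_life_seconds

# A memory scored exactly at its half-life keeps half its weight.
lam = lambda_for_half_life(7 * 86400)        # 7-day half-life
score = decayed_score(0.9, 7 * 86400, lam)   # ≈ 0.45
```

Picking lambda via a target half-life is usually easier to reason about than tuning the raw rate directly.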

Ebbinghaus-curve memory goes further: each memory has a continuous decay rate. Successful recalls reinforce the memory (push the curve out); unused memories decay and eventually evict.
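The reinforcement mechanic can be sketched like this (a toy model, assuming each recall multiplies the decay rate by a fixed factor):

```python
import math
from dataclasses import dataclass

@dataclass
class Memory:
    decay_lambda: float   # decay rate; lower = flatter Ebbinghaus curve
    hit_count: int = 0

def strength(mem: Memory, age_seconds: float) -> float:
    """Retention on an Ebbinghaus-style forgetting curve: exp(-lambda * age)."""
    return math.exp(-mem.decay_lambda * age_seconds)

def reinforce(mem: Memory, factor: float = 0.9) -> None:
    """A successful recall hardens the memory: flatter curve, slower decay."""
    mem.decay_lambda *= factor
    mem.hit_count += 1
```

Each recall pushes the curve out; a memory that is never recalled keeps its initial rate and eventually decays below the eviction threshold.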

Hear it before you finish reading

Talk to a live CallSphere AI voice agent in your browser — 60 seconds, no signup.

Try Live Demo →
flowchart LR
  T[New turn] --> EX[Extract facts]
  EX --> TT{TTL tier}
  TT -->|allergy| INF[Infinite TTL]
  TT -->|preference| LONG[1y TTL]
  TT -->|context| SHORT[7d TTL]
  TT -->|chat-only| SES[session]
  INF --> S[(Memory store)]
  LONG --> S
  SHORT --> S
  Q[Query] --> R[Retrieve]
  R --> SC[score = sim * exp -lambda*age]
  SC --> RE[Reinforce on hit]
  RE --> S

How it works

Each memory entry carries { id, text, embedding, created_at, last_accessed_at, ttl_tier, decay_lambda, hit_count }.

  • Write: an LLM tags the fact with a TTL tier (immutable / long / short / session) and an initial decay_lambda.
  • Retrieve: score = cos(q, m.embedding) * exp(-m.decay_lambda * (now - m.last_accessed_at)).
  • Hit: last_accessed_at updates, hit_count increments, decay_lambda decreases (the memory hardens).
  • Evict: a nightly job removes entries where exp(-lambda * age) < 0.05 and hit_count == 0.
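A minimal sketch of that nightly eviction job, assuming entries are plain dicts with the fields listed above:

```python
import math, time

EVICTION_THRESHOLD = 0.05

def should_evict(entry: dict, now: float) -> bool:
    """Eviction rule: never-recalled entries whose retention has decayed
    below the threshold. Immutable tiers (lambda = 0) never drop below
    1.0, so they are never evicted by this rule."""
    age = now - entry["last_accessed_at"]
    retention = math.exp(-entry["decay_lambda"] * age)
    return entry["hit_count"] == 0 and retention < EVICTION_THRESHOLD

def run_evictor(entries: list[dict]) -> list[dict]:
    """Return the entries that survive the nightly sweep."""
    now = time.time()
    return [e for e in entries if not should_evict(e, now)]
```

Any entry with hit_count > 0 is spared here; reinforced memories only leave via the TTL floor or an explicit cap.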

CallSphere implementation

Every CallSphere voice/chat agent runs time-decay memory:

  • Allergies + insurance numbers in Healthcare = infinite TTL
  • Preferred broker / preferred school district in OneRoof = 1-year TTL
  • Last 5 ticket subjects in UrackIT IT helpdesk = 30-day TTL
  • Mood, current task, in-call context = session-only

Decay parameters live per vertical. Healthcare's medication-allergy memory has lambda = 0 (immutable). Real-estate buyer urgency ("we want to close in 30 days") has lambda = 0.05/day so it fades after the buying window.
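To sanity-check those numbers: lambda = 0.05/day gives a roughly two-week half-life, so by the end of the 30-day buying window the urgency memory keeps only about a fifth of its weight:

```python
import math

lam = 0.05                        # per day: the buyer-urgency rate above
half_life = math.log(2) / lam     # ≈ 13.9 days
weight_30d = math.exp(-lam * 30)  # ≈ 0.22 of original weight after the window
```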

37 agents · 90+ tools · 115+ DB tables · 6 verticals. $149/$499/$1499, 14-day trial, 22% affiliate. See multi-turn memory at work on /demo.

Still reading? Stop comparing — try CallSphere live.

CallSphere ships complete AI voice agents per industry — 14 tools for healthcare, 10 agents for real estate, 4 specialists for salons. See how it actually handles a call before you book a demo.

Build steps with code

import math, time

# Assumed helpers (not shown): classify_ttl (LLM tier classifier), embed
# (embedding model), vector_search and db (storage layer).

TTL_TIERS = {
    # tier: (decay_lambda per day, TTL floor in seconds)
    "immutable": (0.0, None),          # never evict, no decay
    "long":      (0.001, 365*86400),   # 1 year
    "short":     (0.01, 30*86400),     # 30 days
    "session":   (0.1, 86400),         # 24 hours
}

def write_memory(text):
    tier = classify_ttl(text)    # LLM call: returns one of TTL_TIERS keys
    lam, _ttl = TTL_TIERS[tier]  # TTL floor is re-derived from ttl_tier at eviction time
    db.insert("memory", {
        "text": text, "embedding": embed(text),
        "created_at": time.time(), "last_accessed_at": time.time(),
        "ttl_tier": tier, "decay_lambda": lam, "hit_count": 0,
    })

def retrieve(q, top_k=5):
    cands = vector_search(embed(q), k=50)   # candidates carry .cos_sim
    now = time.time()
    scored = [
        # decay_lambda is per day, so convert the age to days before decaying
        (m, m.cos_sim * math.exp(-m.decay_lambda * (now - m.last_accessed_at) / 86400))
        for m in cands
    ]
    top = sorted(scored, key=lambda x: -x[1])[:top_k]
    for m, _ in top:
        db.update("memory", m.id, {
            "last_accessed_at": now,
            "hit_count": m.hit_count + 1,
            "decay_lambda": m.decay_lambda * 0.9,  # reinforce: harden on recall
        })
    return [m for m, _ in top]

  1. LLM-classify TTL on write. The classifier is the silent ranker.
  2. Reinforce on retrieval; do not just return: update.
  3. Run a nightly evictor for hit_count == 0 and effective_score < 0.05.
  4. Cap memory size per user; spillover evicts oldest session-tier first.
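Step 4 (per-user cap with tier-ordered spillover) can be sketched like this; the cap value and field names are illustrative:

```python
TIER_EVICTION_ORDER = ["session", "short", "long"]  # never spill "immutable"
MAX_MEMORIES_PER_USER = 2000  # hypothetical cap

def enforce_cap(entries: list[dict], cap: int = MAX_MEMORIES_PER_USER) -> list[dict]:
    """Per-user spillover: evict oldest session-tier first, then short, then long."""
    if len(entries) <= cap:
        return entries
    keep = list(entries)
    for tier in TIER_EVICTION_ORDER:
        candidates = sorted(
            (e for e in keep if e["ttl_tier"] == tier),
            key=lambda e: e["created_at"],
        )
        while candidates and len(keep) > cap:
            keep.remove(candidates.pop(0))   # drop the oldest in this tier
        if len(keep) <= cap:
            break
    return keep
```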

Pitfalls

  • Wrong TTL classifier: tagging "I love pizza" as immutable pollutes future calls. Calibrate.
  • Decay too aggressive: agent forgets a real allergy. Always test on a golden set.
  • No staleness detection: a "highly retrieved" memory is not necessarily correct. Add explicit contradiction handling.
  • Reinforcement loop: mis-classified memory keeps getting hit, never decays. Add a max_hit_count guardrail.
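The last pitfall's guardrail is a one-line change to the reinforcement step; the cap values here are illustrative:

```python
MAX_HIT_COUNT = 50        # hypothetical guardrail: stop hardening after this
MIN_DECAY_LAMBDA = 1e-7   # never let a memory become effectively immutable

def reinforce(entry: dict) -> dict:
    """Reinforce on recall, but stop hardening past the guardrail so a
    mis-classified memory cannot lock itself in forever."""
    entry["hit_count"] += 1
    if entry["hit_count"] <= MAX_HIT_COUNT:
        entry["decay_lambda"] = max(entry["decay_lambda"] * 0.9, MIN_DECAY_LAMBDA)
    return entry
```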

FAQ

Decay or TTL? Both. TTL is the floor (mass eviction), decay is the score modifier.

Embedding store or graph? Hybrid — embedding for fuzzy recall, graph for entity-heavy recalls. See vw6g-15 on graph memory.

Per-user or global? Per-user always. Cross-user memory is a privacy violation.

Cost? ~$0.001 per memory write (the TTL classifier). Cheap.

See it on /demo? Yes — the multi-turn demo logs decay scores in the trace panel.


