
Memory Search Strategies: Recency, Relevance, and Importance-Weighted Retrieval

Implement and tune multi-signal memory retrieval for AI agents using recency, relevance, and importance scoring functions with combined ranking and parameter tuning strategies.

The Retrieval Quality Problem

An agent's memory is only as good as its retrieval. Storing a thousand perfectly organized memories means nothing if the agent pulls back the wrong five when answering a question. Most naive implementations use a single signal — either recency (most recent first) or relevance (best embedding match). Both fail in predictable ways.

Recency-only retrieval ignores critical old memories. Relevance-only retrieval surfaces stale facts that match the query semantically but are no longer accurate. Production agents need multi-signal ranking that balances recency, relevance, and importance.

The Three Scoring Functions

Each signal produces a score between 0 and 1 for every memory candidate. The diagram below shows where retrieval sits in the overall memory pipeline; the scoring functions in this section rank the candidates that the retriever pulls from episodic and semantic memory.

flowchart TD
    MSG(["New message"])
    WORKING["Working memory<br/>rolling window"]
    EPISODIC[("Episodic memory<br/>past sessions")]
    SEMANTIC[("Semantic memory<br/>facts and preferences")]
    SUM["Summarizer<br/>compresses old turns"]
    ROUTER{"Retrieve<br/>needed memories"}
    PROMPT["Assembled context"]
    LLM["LLM"]
    UPD["Memory updater<br/>writes new facts"]
    MSG --> WORKING --> ROUTER
    ROUTER -->|Past sessions| EPISODIC
    ROUTER -->|User facts| SEMANTIC
    EPISODIC --> SUM --> PROMPT
    SEMANTIC --> PROMPT
    WORKING --> PROMPT --> LLM --> UPD
    UPD --> EPISODIC
    UPD --> SEMANTIC
    style ROUTER fill:#4f46e5,stroke:#4338ca,color:#fff
    style LLM fill:#f59e0b,stroke:#d97706,color:#1f2937
    style EPISODIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style SEMANTIC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b

Recency Score

Recency decays exponentially from the memory's last access time. Recent memories score near 1.0, and old memories approach 0.0.

import math
from datetime import datetime
from dataclasses import dataclass

@dataclass
class Memory:
    content: str
    embedding: list[float]
    created_at: datetime
    last_accessed: datetime
    importance: float = 0.5
    access_count: int = 0

def recency_score(
    memory: Memory,
    now: datetime,
    half_life_hours: float = 24.0,
) -> float:
    hours_elapsed = (
        (now - memory.last_accessed).total_seconds() / 3600
    )
    decay_rate = math.log(2) / half_life_hours
    return math.exp(-decay_rate * hours_elapsed)

The half-life parameter controls the decay speed. A 24-hour half-life means a memory accessed yesterday gets a recency score of 0.5. A 168-hour half-life (one week) gives the same memory a score of about 0.91.
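As a quick sanity check on those numbers, here is a minimal sketch of the same decay formula, with elapsed hours passed in directly rather than computed from a Memory object:

```python
import math

def decay(hours_elapsed: float, half_life_hours: float) -> float:
    # Same exponential decay as recency_score above
    return math.exp(-math.log(2) / half_life_hours * hours_elapsed)

print(round(decay(24, 24.0), 3))   # 0.5: exactly one half-life has passed
print(round(decay(24, 168.0), 3))  # 0.906: one-seventh of a half-life
```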


Relevance Score

Relevance measures how semantically close a memory is to the current query. In production, this is the cosine similarity between the query embedding and the memory embedding.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def relevance_score(
    memory: Memory,
    query_embedding: list[float],
) -> float:
    sim = cosine_similarity(memory.embedding, query_embedding)
    # Normalize from [-1, 1] to [0, 1]
    return (sim + 1) / 2
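With toy two-dimensional vectors standing in for real embeddings, the normalization maps the full cosine range onto [0, 1]:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def relevance(embedding: list[float], query: list[float]) -> float:
    # Normalize cosine similarity from [-1, 1] to [0, 1]
    return (cosine_similarity(embedding, query) + 1) / 2

print(relevance([1.0, 0.0], [2.0, 0.0]))   # 1.0: same direction
print(relevance([1.0, 0.0], [0.0, 1.0]))   # 0.5: orthogonal
print(relevance([1.0, 0.0], [-1.0, 0.0]))  # 0.0: opposite direction
```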

Importance Score

Importance is a property of the memory itself, not the query. It reflects how critical this information is regardless of context. User preferences, explicit instructions, and key decisions have high importance. Transient observations have low importance.

def importance_score(memory: Memory) -> float:
    base = memory.importance
    # Boost based on access frequency
    access_boost = min(memory.access_count * 0.02, 0.2)
    return min(base + access_boost, 1.0)
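A few sample values show how the access-frequency boost behaves: +0.02 per access, capped at +0.2, with the total clamped to 1.0. This is a standalone variant of importance_score that takes plain arguments instead of a Memory:

```python
def importance(base: float, access_count: int) -> float:
    access_boost = min(access_count * 0.02, 0.2)
    return min(base + access_boost, 1.0)

print(round(importance(0.5, 0), 2))    # 0.5: no accesses, no boost
print(round(importance(0.5, 5), 2))    # 0.6: five accesses add +0.1
print(round(importance(0.5, 100), 2))  # 0.7: boost caps at +0.2
print(round(importance(0.95, 100), 2)) # 1.0: total clamps at 1.0
```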

Combined Ranking

The three signals are combined with configurable weights. This lets you tune the retrieval behavior for different use cases.

@dataclass
class RetrievalWeights:
    recency: float = 0.3
    relevance: float = 0.5
    importance: float = 0.2

    def __post_init__(self):
        total = self.recency + self.relevance + self.importance
        self.recency /= total
        self.relevance /= total
        self.importance /= total

def combined_score(
    memory: Memory,
    query_embedding: list[float],
    now: datetime,
    weights: RetrievalWeights,
    half_life_hours: float = 24.0,
) -> float:
    r = recency_score(memory, now, half_life_hours)
    rel = relevance_score(memory, query_embedding)
    imp = importance_score(memory)
    return (
        weights.recency * r
        + weights.relevance * rel
        + weights.importance * imp
    )

def retrieve(
    memories: list[Memory],
    query_embedding: list[float],
    weights: RetrievalWeights | None = None,
    top_k: int = 5,
    half_life_hours: float = 24.0,
) -> list[Memory]:
    weights = weights or RetrievalWeights()
    now = datetime.now()
    scored = [
        (
            combined_score(
                m, query_embedding, now, weights, half_life_hours
            ),
            m,
        )
        for m in memories
    ]
    scored.sort(key=lambda x: x[0], reverse=True)
    results = []
    for _, mem in scored[:top_k]:
        mem.last_accessed = now
        mem.access_count += 1
        results.append(mem)
    return results
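Putting it all together, here is a condensed, self-contained run with toy 2-D embeddings and fixed timestamps (the function bodies are abbreviated versions of the ones above, and the memory contents are invented). A highly relevant memory from an hour ago should outrank both an important but off-topic one and a stale near-match:

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Memory:
    content: str
    embedding: list[float]
    last_accessed: datetime
    importance: float = 0.5
    access_count: int = 0

def recency(m: Memory, now: datetime, half_life: float = 24.0) -> float:
    hours = (now - m.last_accessed).total_seconds() / 3600
    return math.exp(-math.log(2) / half_life * hours)

def relevance(m: Memory, q: list[float]) -> float:
    dot = sum(x * y for x, y in zip(m.embedding, q))
    na = math.sqrt(sum(x * x for x in m.embedding))
    nb = math.sqrt(sum(x * x for x in q))
    sim = dot / (na * nb) if na and nb else 0.0
    return (sim + 1) / 2

def importance(m: Memory) -> float:
    return min(m.importance + min(m.access_count * 0.02, 0.2), 1.0)

def retrieve(memories, q, now, w=(0.3, 0.5, 0.2), top_k=2):
    # w = (recency, relevance, importance) weights
    return sorted(
        memories,
        key=lambda m: (
            w[0] * recency(m, now)
            + w[1] * relevance(m, q)
            + w[2] * importance(m)
        ),
        reverse=True,
    )[:top_k]

now = datetime(2024, 1, 10, 12, 0)
mems = [
    Memory("likes dark mode", [1.0, 0.0], now - timedelta(days=30), 0.9),
    Memory("asked about billing", [0.0, 1.0], now - timedelta(hours=1), 0.4),
    Memory("mentioned the weather", [0.1, 0.9], now - timedelta(days=7), 0.1),
]
top = retrieve(mems, [0.0, 1.0], now)
print([m.content for m in top])
# ['asked about billing', 'mentioned the weather']
```

The fresh, on-topic billing memory wins easily; the week-old near-match still beats the high-importance but irrelevant preference under the default weights.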

Tuning the Weights

Different agent scenarios need different weight profiles.

Customer support agents should weight importance heavily (0.4) so that account details and policies always surface. Recency matters moderately (0.3) because recent tickets provide context.

Research agents should weight relevance heavily (0.6) since the user is searching for specific knowledge. Recency and importance split the remainder.


Personal assistants should weight recency highly (0.4) because users usually ask about recent events. Importance handles persistent preferences.

# Weight profiles for common scenarios
SUPPORT_WEIGHTS = RetrievalWeights(
    recency=0.3, relevance=0.3, importance=0.4
)
RESEARCH_WEIGHTS = RetrievalWeights(
    recency=0.15, relevance=0.6, importance=0.25
)
ASSISTANT_WEIGHTS = RetrievalWeights(
    recency=0.4, relevance=0.35, importance=0.25
)

A/B Testing Your Retrieval

To tune weights empirically, log what the agent retrieves and whether the user's question was answered successfully. Compare retrieval quality across weight configurations.

@dataclass
class RetrievalLog:
    query: str
    weights_used: RetrievalWeights
    retrieved_ids: list[str]
    user_satisfied: bool | None = None

    def to_dict(self) -> dict:
        return {
            "query": self.query,
            "weights": {
                "recency": self.weights_used.recency,
                "relevance": self.weights_used.relevance,
                "importance": self.weights_used.importance,
            },
            "retrieved_count": len(self.retrieved_ids),
            "satisfied": self.user_satisfied,
        }

Collect these logs, segment by weight configuration, and compare the satisfaction rate. Shift weights toward configurations that produce higher satisfaction.
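One way to run that comparison, assuming logs have been serialized with to_dict() as above (the aggregation helper itself is a sketch, not a fixed API):

```python
from collections import defaultdict

def satisfaction_by_config(logs: list[dict]) -> dict[tuple, float]:
    """Group serialized retrieval logs by weight configuration and
    compute the satisfaction rate for each, skipping unlabeled logs."""
    buckets: dict[tuple, list[bool]] = defaultdict(list)
    for log in logs:
        if log["satisfied"] is None:
            continue
        w = log["weights"]
        key = (w["recency"], w["relevance"], w["importance"])
        buckets[key].append(log["satisfied"])
    return {key: sum(votes) / len(votes) for key, votes in buckets.items()}

logs = [
    {"weights": {"recency": 0.3, "relevance": 0.5, "importance": 0.2},
     "satisfied": True},
    {"weights": {"recency": 0.3, "relevance": 0.5, "importance": 0.2},
     "satisfied": False},
    {"weights": {"recency": 0.15, "relevance": 0.6, "importance": 0.25},
     "satisfied": True},
    {"weights": {"recency": 0.15, "relevance": 0.6, "importance": 0.25},
     "satisfied": None},  # unlabeled: excluded from the rate
]
print(satisfaction_by_config(logs))
# {(0.3, 0.5, 0.2): 0.5, (0.15, 0.6, 0.25): 1.0}
```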

FAQ

Should the weights be static or adaptive?

Start with static weights tuned per use case. Adaptive weights that shift based on query type add complexity. For example, a question starting with "what did I just say" should boost recency, while "what is our refund policy" should boost importance. Implementing query-type detection is a good optimization once the static baseline works well.
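A hypothetical keyword-based router along those lines might look like this (the keyword lists and weight triples are illustrative choices, not from any library):

```python
def adaptive_weights(query: str) -> tuple[float, float, float]:
    """Return (recency, relevance, importance) weights based on
    crude keyword detection of the query type."""
    q = query.lower()
    if any(kw in q for kw in ("just say", "just said", "a moment ago")):
        return (0.6, 0.3, 0.1)   # conversational recall: recency-heavy
    if any(kw in q for kw in ("policy", "always", "rule")):
        return (0.2, 0.3, 0.5)   # durable facts: importance-heavy
    return (0.3, 0.5, 0.2)       # static default

print(adaptive_weights("what did I just say?"))      # (0.6, 0.3, 0.1)
print(adaptive_weights("what is our refund policy")) # (0.2, 0.3, 0.5)
```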

What if two memories score identically?

Break ties with creation time — newer memories first. In practice, exact ties are rare because the three signals create a high-resolution scoring space. If you see many ties, your embeddings may lack discriminative power.
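The tie-break can live directly in the sort key by sorting on a (score, created_at) tuple, so equal scores fall back to newest-first (toy data for illustration):

```python
from datetime import datetime

# (combined_score, created_at, content) triples
candidates = [
    (0.82, datetime(2024, 1, 1), "old fact"),
    (0.82, datetime(2024, 3, 1), "newer fact"),
    (0.91, datetime(2023, 6, 1), "best match"),
]
# Descending sort: score first, creation time breaks exact ties
ranked = sorted(candidates, key=lambda c: (c[0], c[1]), reverse=True)
print([c[2] for c in ranked])  # ['best match', 'newer fact', 'old fact']
```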

How many memories should I retrieve?

Start with 5 and adjust. Too few and the agent misses context. Too many and you waste context window tokens on low-value memories. Monitor context window utilization and reduce top_k if the agent is frequently truncating.


#MemoryRetrieval #SearchRanking #AgentMemory #Python #AgenticAI #LearnAI #AIEngineering
