
Text Similarity and Semantic Matching for Agent Applications

Implement text similarity and semantic matching for AI agents using cosine similarity, sentence-transformers, cross-encoders, and deduplication strategies with production Python examples.

Why Agents Need Semantic Matching

An AI agent frequently needs to answer questions like: "Is this new support ticket a duplicate of an existing one?" "Which FAQ entry best matches the user's question?" "Are these two product descriptions referring to the same item?" These are all text similarity problems — and solving them with simple string matching fails immediately. "How do I reset my password?" and "I forgot my login credentials" mean the same thing but share almost no words.

Semantic matching compares the meaning of texts rather than their surface form. It is fundamental to agent capabilities including knowledge retrieval, deduplication, intent matching, and conversational memory search.

Cosine Similarity: The Foundation

Cosine similarity measures the angle between two vectors. When applied to text embeddings, it captures semantic closeness on a scale from -1 (opposite meaning) to 1 (identical meaning), with 0 indicating no relationship.

flowchart LR
    A(["Text A"]) --> ENC["Sentence<br/>encoder"]
    B(["Text B"]) --> ENC
    ENC --> VA["Embedding a"]
    ENC --> VB["Embedding b"]
    VA --> COS["Cosine similarity<br/>dot(a, b) / (|a| |b|)"]
    VB --> COS
    COS --> OUT(["Score in [-1, 1]"])
    style ENC fill:#4f46e5,stroke:#4338ca,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff

import numpy as np

def cosine_similarity(vec_a: np.ndarray, vec_b: np.ndarray) -> float:
    """Compute cosine similarity between two vectors."""
    dot_product = np.dot(vec_a, vec_b)
    norm_a = np.linalg.norm(vec_a)
    norm_b = np.linalg.norm(vec_b)
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return float(dot_product / (norm_a * norm_b))

The quality of cosine similarity depends entirely on the quality of the embeddings. TF-IDF vectors capture lexical overlap. Sentence embeddings capture semantic meaning. Always use embeddings designed for the similarity task at hand.
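To see the lexical-vs-semantic gap concretely, here is a small sketch using scikit-learn's TfidfVectorizer (an assumption about your stack; any TF-IDF implementation behaves similarly). The two sentences from the introduction share almost no vocabulary, so their TF-IDF cosine similarity is near zero even though their intent is identical:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

texts = [
    "How do I reset my password?",
    "I forgot my login credentials and need to recover access.",
]

# TF-IDF only sees surface words, so these near-synonymous
# sentences score close to zero.
tfidf = TfidfVectorizer().fit_transform(texts)
lexical_score = float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])
print(round(lexical_score, 4))  # close to 0 despite identical intent
```

Compare this with the ~0.72 the same pair scores under sentence embeddings in the next section: same texts, very different verdicts, purely because of what the vectors encode.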


Sentence-Transformers: Semantic Embeddings

The sentence-transformers library produces dense embeddings optimized for semantic similarity.

from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

def compute_similarity(text_a: str, text_b: str) -> float:
    """Compute semantic similarity between two texts."""
    embeddings = model.encode([text_a, text_b])
    similarity = np.dot(embeddings[0], embeddings[1]) / (
        np.linalg.norm(embeddings[0]) * np.linalg.norm(embeddings[1])
    )
    return round(float(similarity), 4)

# Semantically similar but lexically different
score = compute_similarity(
    "How do I reset my password?",
    "I forgot my login credentials and need to recover access.",
)
print(score)  # ~0.72

# Semantically different
score = compute_similarity(
    "How do I reset my password?",
    "What are your business hours?",
)
print(score)  # ~0.15

Batch Similarity for Knowledge Base Matching

Agents often need to find the most relevant document from a large collection. Batch encoding and matrix operations make this efficient.

from sentence_transformers import SentenceTransformer
import numpy as np

class SemanticMatcher:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.corpus_embeddings = None
        self.corpus_texts = []

    def index_corpus(self, texts: list[str]):
        """Pre-compute embeddings for the entire corpus."""
        self.corpus_texts = texts
        self.corpus_embeddings = self.model.encode(
            texts,
            normalize_embeddings=True,
            show_progress_bar=False,
        )

    def search(self, query: str, top_k: int = 5) -> list[dict]:
        """Find the most similar documents to a query."""
        query_embedding = self.model.encode(
            [query], normalize_embeddings=True
        )

        # Cosine similarity via dot product (embeddings are normalized)
        scores = np.dot(self.corpus_embeddings, query_embedding.T).flatten()
        top_indices = np.argsort(scores)[-top_k:][::-1]

        return [
            {
                "text": self.corpus_texts[i],
                "score": round(float(scores[i]), 4),
                "index": int(i),
            }
            for i in top_indices
        ]

# Usage
matcher = SemanticMatcher()
matcher.index_corpus([
    "How to reset your password",
    "Billing and payment FAQ",
    "Account recovery steps",
    "Shipping and delivery times",
    "Return and refund policy",
])

results = matcher.search("I can't log into my account")
# [{'text': 'Account recovery steps', 'score': 0.68},
#  {'text': 'How to reset your password', 'score': 0.62}, ...]

Cross-Encoders for High-Precision Re-Ranking

Bi-encoders (sentence-transformers) are fast because they encode texts independently. Cross-encoders are slower but more accurate because they process both texts together, capturing fine-grained interactions.

The best pattern is a two-stage pipeline: bi-encoder retrieves candidates, cross-encoder re-ranks them.

from sentence_transformers import CrossEncoder

cross_encoder = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str]) -> list[dict]:
    """Re-rank candidates using a cross-encoder for higher precision."""
    pairs = [[query, candidate] for candidate in candidates]
    scores = cross_encoder.predict(pairs)

    ranked = sorted(
        zip(candidates, scores),
        key=lambda x: x[1],
        reverse=True,
    )

    return [
        {"text": text, "score": round(float(score), 4)}
        for text, score in ranked
    ]

# Bi-encoder retrieves top 20, cross-encoder re-ranks to top 5
candidates = matcher.search("billing dispute", top_k=20)
candidate_texts = [c["text"] for c in candidates]
final_results = rerank("billing dispute", candidate_texts)[:5]

Text Deduplication

Agents processing large volumes of data need to detect and remove duplicates. Semantic deduplication catches paraphrased duplicates that exact-match deduplication misses.

from sentence_transformers import SentenceTransformer
import numpy as np

def deduplicate(
    texts: list[str],
    threshold: float = 0.85,
) -> list[str]:
    """Remove semantically duplicate texts."""
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embeddings = model.encode(texts, normalize_embeddings=True)

    similarity_matrix = np.dot(embeddings, embeddings.T)
    unique_indices = []
    seen = set()

    for i in range(len(texts)):
        if i in seen:
            continue
        unique_indices.append(i)
        for j in range(i + 1, len(texts)):
            if similarity_matrix[i][j] > threshold:
                seen.add(j)

    return [texts[i] for i in unique_indices]

texts = [
    "How do I cancel my subscription?",
    "I want to cancel my subscription please",
    "What is your refund policy?",
    "Can I get a refund?",
    "How to unsubscribe from the service",
]

unique = deduplicate(texts, threshold=0.80)
# Removes near-duplicates, keeps representative texts

Choosing the Right Similarity Threshold

Thresholds vary by use case:


  • Deduplication: 0.85-0.95 (high threshold, only very similar texts)
  • FAQ matching: 0.60-0.75 (moderate, allows paraphrasing)
  • Topic clustering: 0.50-0.65 (lower, groups related content)
  • Semantic search: 0.30-0.50 (lowest, broad retrieval)

Always calibrate thresholds on your specific data. Embed 100 known similar pairs and 100 known dissimilar pairs, compute similarities, and choose a threshold that maximizes your F1 score.
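That calibration loop can be sketched in a few lines. This assumes you have already embedded your labeled pairs and collected their cosine scores; the arrays below are toy stand-ins for the 100 similar and 100 dissimilar pairs described above:

```python
import numpy as np

def best_threshold(scores: np.ndarray, labels: np.ndarray) -> tuple[float, float]:
    """Sweep candidate thresholds and return (threshold, F1) maximizing F1.

    scores: cosine similarities for labeled pairs
    labels: 1 for known-similar pairs, 0 for known-dissimilar pairs
    """
    best_t, best_f1 = 0.0, 0.0
    for t in np.round(np.arange(0.05, 1.0, 0.05), 2):
        preds = scores >= t
        tp = int(np.sum(preds & (labels == 1)))
        fp = int(np.sum(preds & (labels == 0)))
        fn = int(np.sum(~preds & (labels == 1)))
        if tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / (tp + fn)
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best_f1:
            best_t, best_f1 = float(t), f1
    return best_t, best_f1

# Toy calibration set standing in for your labeled pairs
scores = np.array([0.9, 0.8, 0.75, 0.7, 0.4, 0.3, 0.2, 0.1])
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0])
threshold, f1 = best_threshold(scores, labels)
print(threshold, f1)  # 0.45 1.0
```

A coarse 0.05 sweep is usually enough; if two thresholds tie on F1, prefer the higher one for deduplication (precision matters more) and the lower one for retrieval (recall matters more).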

FAQ

What is the difference between bi-encoders and cross-encoders, and when should I use each?

Bi-encoders encode each text independently into a fixed vector, making them extremely fast for retrieval because you can pre-compute corpus embeddings. Cross-encoders process both texts simultaneously through the transformer, making them slower but significantly more accurate. Use bi-encoders for initial retrieval over large collections (thousands to millions of documents) and cross-encoders for re-ranking a small set of candidates (10-50 documents).

How do I handle multilingual text similarity?

Use multilingual sentence-transformers models like "paraphrase-multilingual-MiniLM-L12-v2" which embed texts from 50+ languages into the same vector space. This means a Spanish query will match an English document if they are semantically similar. No translation step is needed — the model handles cross-lingual alignment internally.

How much does embedding model size affect similarity quality?

Significantly. The "all-MiniLM-L6-v2" model (80MB) is a good balance of speed and quality for most agent applications. Larger models like "all-mpnet-base-v2" (420MB) provide roughly 3-5% better accuracy on benchmarks. For production systems where milliseconds matter, the smaller model is usually sufficient. For applications where accuracy is critical (legal document matching, medical record deduplication), invest in the larger model.


#TextSimilarity #SemanticSearch #SentenceTransformers #Embeddings #AIAgents #Python #AgenticAI #LearnAI #AIEngineering

