Building a Semantic FAQ System: Finding Answers Using Vector Similarity
Build an intelligent FAQ system that understands user questions by meaning rather than keywords, using vector similarity to match queries to answers with confidence thresholds and graceful fallback behavior.
The Problem with Keyword FAQ Search
Traditional FAQ systems match user questions to answers using keyword overlap or simple string matching. A customer asking "Can I get my money back?" will not match an FAQ titled "Refund Policy" because they share no common words. Semantic FAQ systems solve this by embedding both the question and the FAQ entries into a shared vector space, where meaning determines relevance.
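To make the failure concrete, here is a toy keyword-overlap scorer (a hypothetical helper, not part of the system built below) showing how a reworded question scores zero against the FAQ it should match:

```python
def keyword_overlap(query: str, faq_title: str) -> float:
    """Fraction of query words that also appear in the FAQ title."""
    query_words = set(query.lower().split())
    title_words = set(faq_title.lower().split())
    return len(query_words & title_words) / len(query_words) if query_words else 0.0

# No shared words at all, so keyword matching scores this pair at zero
print(keyword_overlap("can i get my money back", "refund policy"))       # 0.0
# Only a near-verbatim restatement scores above zero
print(keyword_overlap("what is your refund policy", "refund policy"))    # 0.4
```

A semantic system sidesteps this entirely: "money back" and "refund" land close together in embedding space even though they share no characters.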
Designing the FAQ Data Model
A semantic FAQ system stores each FAQ entry with multiple question variations. Different users phrase the same question differently, and pre-computing embeddings for several phrasings dramatically improves match quality.
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class FAQEntry:
    id: str
    canonical_question: str
    answer: str
    question_variations: List[str] = field(default_factory=list)
    category: str = "general"
    metadata: dict = field(default_factory=dict)

    @property
    def all_questions(self) -> List[str]:
        return [self.canonical_question] + self.question_variations
# Example FAQ data
faqs = [
    FAQEntry(
        id="refund-001",
        canonical_question="What is your refund policy?",
        answer="We offer a full refund within 30 days of purchase...",
        question_variations=[
            "Can I get my money back?",
            "How do I request a refund?",
            "What if I'm not satisfied with my purchase?",
            "Is there a money-back guarantee?",
        ],
        category="billing",
    ),
    FAQEntry(
        id="shipping-001",
        canonical_question="How long does shipping take?",
        answer="Standard shipping takes 5-7 business days...",
        question_variations=[
            "When will my order arrive?",
            "What are the delivery times?",
            "How fast do you ship?",
        ],
        category="shipping",
    ),
]
Building the Semantic FAQ Engine
The engine embeds all question variations and maps them back to their parent FAQ entries. When a user asks a question, we find the closest variation and return the corresponding answer.
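A detail worth pausing on: the engine below passes `normalize_embeddings=True` and then scores with a plain dot product. That works because for unit-length vectors the dot product equals cosine similarity, as this small NumPy check (toy 2-D vectors standing in for real embeddings) illustrates:

```python
import numpy as np

# Two toy "embedding" vectors
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

# Full cosine similarity formula
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Normalize first, then take a plain dot product
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
dot_of_units = np.dot(a_unit, b_unit)

print(cosine, dot_of_units)  # both 0.96
```

Normalizing once at index time means every query is scored with a single matrix-vector product, with no per-query norm computations.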
from sentence_transformers import SentenceTransformer
import numpy as np
from typing import List, Optional

class SemanticFAQEngine:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.faqs: List[FAQEntry] = []
        self.embeddings: Optional[np.ndarray] = None
        self.variation_to_faq: List[int] = []  # maps variation index -> FAQ index

    def load_faqs(self, faqs: List[FAQEntry]):
        """Embed all question variations and build the index."""
        self.faqs = faqs
        all_questions = []
        self.variation_to_faq = []
        for faq_idx, faq in enumerate(faqs):
            for question in faq.all_questions:
                all_questions.append(question)
                self.variation_to_faq.append(faq_idx)
        self.embeddings = self.model.encode(
            all_questions, normalize_embeddings=True
        )
        print(
            f"Indexed {len(faqs)} FAQs with "
            f"{len(all_questions)} total variations"
        )

    def find_answer(
        self,
        user_question: str,
        top_k: int = 3,
        threshold: float = 0.55,
    ) -> List[dict]:
        """Find the most relevant FAQ answers for a user question."""
        query_emb = self.model.encode(
            [user_question], normalize_embeddings=True
        )
        similarities = np.dot(self.embeddings, query_emb.T).flatten()
        # Over-fetch candidates: several variations may map to the same
        # FAQ, so we need extra headroom to return top_k distinct entries
        top_indices = np.argsort(similarities)[::-1][:top_k * 3]
        seen_faq_ids = set()
        results = []
        for idx in top_indices:
            score = float(similarities[idx])
            if score < threshold:
                break
            faq_idx = self.variation_to_faq[idx]
            faq = self.faqs[faq_idx]
            if faq.id in seen_faq_ids:
                continue
            seen_faq_ids.add(faq.id)
            results.append({
                "faq_id": faq.id,
                "question": faq.canonical_question,
                "answer": faq.answer,
                "confidence": score,
                "category": faq.category,
            })
            if len(results) >= top_k:
                break
        return results
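Before wiring in a real model, it helps to sanity-check the index bookkeeping in isolation. This sketch repeats the flatten-and-map step from `load_faqs` on the example FAQ data above (embeddings omitted), showing why two FAQ entries produce nine rows in the index:

```python
# Each FAQ contributes its canonical question plus every variation;
# variation_to_faq records which FAQ each flattened row belongs to.
faq_questions = [
    [  # refund-001: canonical + 4 variations
        "What is your refund policy?",
        "Can I get my money back?",
        "How do I request a refund?",
        "What if I'm not satisfied with my purchase?",
        "Is there a money-back guarantee?",
    ],
    [  # shipping-001: canonical + 3 variations
        "How long does shipping take?",
        "When will my order arrive?",
        "What are the delivery times?",
        "How fast do you ship?",
    ],
]

all_questions = []
variation_to_faq = []
for faq_idx, questions in enumerate(faq_questions):
    for question in questions:
        all_questions.append(question)
        variation_to_faq.append(faq_idx)

print(len(all_questions))  # 9 rows to embed
print(variation_to_faq)    # [0, 0, 0, 0, 0, 1, 1, 1, 1]
```

Whichever row wins the similarity search, a single list lookup recovers the parent FAQ, which is what lets `find_answer` dedupe results by `faq.id`.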
Threshold Tuning
The similarity threshold is critical. Too high and you miss valid matches; too low and you return irrelevant answers. Here is a systematic approach to finding the right threshold.
def tune_threshold(
    engine: SemanticFAQEngine,
    test_queries: List[dict],  # {"query": str, "expected_faq_id": str}
):
    """Find the threshold that maximizes F1 score."""
    thresholds = np.arange(0.30, 0.80, 0.05)
    best_f1 = 0
    best_threshold = 0.5
    for threshold in thresholds:
        tp, fp, fn = 0, 0, 0
        for test in test_queries:
            results = engine.find_answer(
                test["query"], top_k=1, threshold=threshold
            )
            if results:
                if results[0]["faq_id"] == test["expected_faq_id"]:
                    tp += 1
                else:
                    fp += 1
            else:
                fn += 1
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) > 0 else 0)
        print(f"Threshold={threshold:.2f}: P={precision:.2f} "
              f"R={recall:.2f} F1={f1:.2f}")
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    print(f"\nBest threshold: {best_threshold:.2f} (F1={best_f1:.2f})")
    return best_threshold
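To make the metrics concrete, here is the arithmetic for one hypothetical threshold: suppose 10 labeled test queries yield 8 correct matches (tp), 1 wrong FAQ returned (fp), and 1 query that fell below the threshold entirely (fn):

```python
# Hypothetical counts from evaluating 10 test queries at one threshold
tp, fp, fn = 8, 1, 1

precision = tp / (tp + fp)  # 8/9: of the answers we returned, how many were right
recall = tp / (tp + fn)     # 8/9: of the answerable queries, how many did we answer
f1 = 2 * precision * recall / (precision + recall)

print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")  # all ≈ 0.889
```

Raising the threshold moves errors from the fp column to the fn column (wrong answers become silence); lowering it does the reverse. F1 balances the two, which is why the tuner optimizes it rather than either metric alone.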
Graceful Fallback
When no FAQ matches above the threshold, the system should offer a helpful fallback rather than showing nothing.
def answer_with_fallback(
    engine: SemanticFAQEngine,
    user_question: str,
    threshold: float = 0.55,
) -> dict:
    """Return best FAQ answer or a structured fallback response."""
    results = engine.find_answer(user_question, top_k=3, threshold=threshold)
    if results and results[0]["confidence"] > 0.75:
        return {
            "type": "confident_match",
            "answer": results[0]["answer"],
            "confidence": results[0]["confidence"],
        }
    elif results:
        return {
            "type": "suggestions",
            "message": "I found some related questions:",
            "suggestions": [
                {"question": r["question"], "confidence": r["confidence"]}
                for r in results
            ],
        }
    else:
        return {
            "type": "fallback",
            "message": "I could not find a matching answer. "
                       "Would you like to contact support?",
            "query_logged": True,
        }
FAQ
How many question variations should each FAQ entry have?
Aim for 3-5 variations per FAQ entry. Each variation should represent a genuinely different phrasing, not just minor word swaps. Collect real user questions from support logs or chat transcripts to create authentic variations. More variations improve recall but also increase index size.
Should I embed the answer text as well?
Generally no. Embedding the question is more effective because users typically phrase their input as a question, and the FAQ answer text often contains detailed explanations that dilute the semantic signal. If you find that some answers contain key phrases users search for, consider adding those phrases as additional question variations instead.
How do I handle FAQ entries that are very similar to each other?
If two FAQ entries have similarity above 0.85, consider merging them or adding a disambiguation step. In the search results, you can group highly similar FAQs and present them as related topics, letting the user choose the most relevant one.
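Such near-duplicates can be detected offline with a pairwise similarity scan over the FAQ embeddings. A minimal sketch, using toy 2-D vectors in place of real embeddings and the 0.85 merge threshold suggested above:

```python
import numpy as np

# Toy stand-ins for canonical-question embeddings (one row per FAQ)
embeddings = np.array([
    [1.00, 0.00],   # faq 0
    [0.95, 0.312],  # faq 1 -- phrased almost identically to faq 0
    [0.00, 1.00],   # faq 2 -- unrelated topic
])
# Normalize rows so dot products are cosine similarities
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

sims = embeddings @ embeddings.T
# Scan the upper triangle for suspiciously similar pairs
duplicate_pairs = [
    (i, j, float(sims[i, j]))
    for i in range(len(sims))
    for j in range(i + 1, len(sims))
    if sims[i, j] > 0.85
]
print(duplicate_pairs)  # only the (0, 1) pair crosses the threshold
```

Running a scan like this whenever the FAQ set changes catches redundant entries before they start competing with each other at query time.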