Building a Semantic FAQ System: Finding Answers Using Vector Similarity
Build an intelligent FAQ system that understands user questions by meaning rather than keywords, using vector similarity to match queries to answers with confidence thresholds and graceful fallback behavior.
The Problem with Keyword FAQ Search
Traditional FAQ systems match user questions to answers using keyword overlap or simple string matching. A customer asking "Can I get my money back?" will not match an FAQ titled "Refund Policy" because they share no common words. Semantic FAQ systems solve this by embedding both the question and the FAQ entries into a shared vector space, where meaning determines relevance.
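To make the failure concrete, here is a toy keyword-overlap scorer (a hypothetical helper, not part of the system built below) showing how a reworded question scores zero against the FAQ it should match:

```python
def keyword_overlap(query: str, faq_title: str) -> float:
    """Fraction of query words that also appear in the FAQ title."""
    query_words = set(query.lower().split())
    title_words = set(faq_title.lower().split())
    return len(query_words & title_words) / len(query_words) if query_words else 0.0

# No shared words at all, so keyword matching scores this pair at zero
print(keyword_overlap("can i get my money back", "refund policy"))       # 0.0
# Only a near-verbatim restatement scores above zero
print(keyword_overlap("what is your refund policy", "refund policy"))    # 0.4
```

A semantic system sidesteps this entirely: "money back" and "refund" land close together in embedding space even though they share no characters.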
Designing the FAQ Data Model
A semantic FAQ system stores each FAQ entry with multiple question variations. Different users phrase the same question differently, and pre-computing embeddings for several phrasings dramatically improves match quality.
from dataclasses import dataclass, field
from typing import List, Optional
import numpy as np

@dataclass
class FAQEntry:
    id: str
    canonical_question: str
    answer: str
    question_variations: List[str] = field(default_factory=list)
    category: str = "general"
    metadata: dict = field(default_factory=dict)

    @property
    def all_questions(self) -> List[str]:
        return [self.canonical_question] + self.question_variations
# Example FAQ data
faqs = [
    FAQEntry(
        id="refund-001",
        canonical_question="What is your refund policy?",
        answer="We offer a full refund within 30 days of purchase...",
        question_variations=[
            "Can I get my money back?",
            "How do I request a refund?",
            "What if I'm not satisfied with my purchase?",
            "Is there a money-back guarantee?",
        ],
        category="billing",
    ),
    FAQEntry(
        id="shipping-001",
        canonical_question="How long does shipping take?",
        answer="Standard shipping takes 5-7 business days...",
        question_variations=[
            "When will my order arrive?",
            "What are the delivery times?",
            "How fast do you ship?",
        ],
        category="shipping",
    ),
]
Building the Semantic FAQ Engine
The engine embeds all question variations and maps them back to their parent FAQ entries. When a user asks a question, we find the closest variation and return the corresponding answer.
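A detail worth pausing on: the engine below passes `normalize_embeddings=True` and then scores with a plain dot product. That works because for unit-length vectors the dot product equals cosine similarity, as this small NumPy check (toy 2-D vectors standing in for real embeddings) illustrates:

```python
import numpy as np

# Two toy "embedding" vectors
a = np.array([3.0, 4.0])
b = np.array([4.0, 3.0])

# Full cosine similarity formula
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Normalize first, then take a plain dot product
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
dot_of_units = np.dot(a_unit, b_unit)

print(cosine, dot_of_units)  # both 0.96
```

Normalizing once at index time means every query is scored with a single matrix-vector product, with no per-query norm computations.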
from sentence_transformers import SentenceTransformer
import numpy as np
from typing import List, Optional

class SemanticFAQEngine:
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)
        self.faqs: List[FAQEntry] = []
        self.embeddings: Optional[np.ndarray] = None
        self.variation_to_faq: List[int] = []  # maps variation index -> FAQ index

    def load_faqs(self, faqs: List[FAQEntry]):
        """Embed all question variations and build the index."""
        self.faqs = faqs
        all_questions = []
        self.variation_to_faq = []
        for faq_idx, faq in enumerate(faqs):
            for question in faq.all_questions:
                all_questions.append(question)
                self.variation_to_faq.append(faq_idx)
        self.embeddings = self.model.encode(
            all_questions, normalize_embeddings=True
        )
        print(
            f"Indexed {len(faqs)} FAQs with "
            f"{len(all_questions)} total variations"
        )

    def find_answer(
        self,
        user_question: str,
        top_k: int = 3,
        threshold: float = 0.55,
    ) -> List[dict]:
        """Find the most relevant FAQ answers for a user question."""
        query_emb = self.model.encode(
            [user_question], normalize_embeddings=True
        )
        similarities = np.dot(self.embeddings, query_emb.T).flatten()
        # Over-fetch candidates: several variations may map to the same
        # FAQ, so we need extra headroom to return top_k distinct entries
        top_indices = np.argsort(similarities)[::-1][:top_k * 3]
        seen_faq_ids = set()
        results = []
        for idx in top_indices:
            score = float(similarities[idx])
            if score < threshold:
                break
            faq_idx = self.variation_to_faq[idx]
            faq = self.faqs[faq_idx]
            if faq.id in seen_faq_ids:
                continue
            seen_faq_ids.add(faq.id)
            results.append({
                "faq_id": faq.id,
                "question": faq.canonical_question,
                "answer": faq.answer,
                "confidence": score,
                "category": faq.category,
            })
            if len(results) >= top_k:
                break
        return results
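Before wiring in a real model, it helps to sanity-check the index bookkeeping in isolation. This sketch repeats the flatten-and-map step from `load_faqs` on the example FAQ data above (embeddings omitted), showing why two FAQ entries produce nine rows in the index:

```python
# Each FAQ contributes its canonical question plus every variation;
# variation_to_faq records which FAQ each flattened row belongs to.
faq_questions = [
    [  # refund-001: canonical + 4 variations
        "What is your refund policy?",
        "Can I get my money back?",
        "How do I request a refund?",
        "What if I'm not satisfied with my purchase?",
        "Is there a money-back guarantee?",
    ],
    [  # shipping-001: canonical + 3 variations
        "How long does shipping take?",
        "When will my order arrive?",
        "What are the delivery times?",
        "How fast do you ship?",
    ],
]

all_questions = []
variation_to_faq = []
for faq_idx, questions in enumerate(faq_questions):
    for question in questions:
        all_questions.append(question)
        variation_to_faq.append(faq_idx)

print(len(all_questions))  # 9 rows to embed
print(variation_to_faq)    # [0, 0, 0, 0, 0, 1, 1, 1, 1]
```

Whichever row wins the similarity search, a single list lookup recovers the parent FAQ, which is what lets `find_answer` dedupe results by `faq.id`.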
Threshold Tuning
The similarity threshold is critical. Too high and you miss valid matches; too low and you return irrelevant answers. Here is a systematic approach to finding the right threshold.
def tune_threshold(
    engine: SemanticFAQEngine,
    test_queries: List[dict],  # {"query": str, "expected_faq_id": str}
):
    """Find the threshold that maximizes F1 score."""
    thresholds = np.arange(0.30, 0.80, 0.05)
    best_f1 = 0
    best_threshold = 0.5
    for threshold in thresholds:
        tp, fp, fn = 0, 0, 0
        for test in test_queries:
            results = engine.find_answer(
                test["query"], top_k=1, threshold=threshold
            )
            if results:
                if results[0]["faq_id"] == test["expected_faq_id"]:
                    tp += 1
                else:
                    fp += 1
            else:
                fn += 1
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) > 0 else 0)
        print(f"Threshold={threshold:.2f}: P={precision:.2f} "
              f"R={recall:.2f} F1={f1:.2f}")
        if f1 > best_f1:
            best_f1 = f1
            best_threshold = threshold
    print(f"\nBest threshold: {best_threshold:.2f} (F1={best_f1:.2f})")
    return best_threshold
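To make the metrics concrete, here is the arithmetic for one hypothetical threshold: suppose 10 labeled test queries yield 8 correct matches (tp), 1 wrong FAQ returned (fp), and 1 query that fell below the threshold entirely (fn):

```python
# Hypothetical counts from evaluating 10 test queries at one threshold
tp, fp, fn = 8, 1, 1

precision = tp / (tp + fp)  # 8/9: of the answers we returned, how many were right
recall = tp / (tp + fn)     # 8/9: of the answerable queries, how many did we answer
f1 = 2 * precision * recall / (precision + recall)

print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")  # all ≈ 0.889
```

Raising the threshold moves errors from the fp column to the fn column (wrong answers become silence); lowering it does the reverse. F1 balances the two, which is why the tuner optimizes it rather than either metric alone.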
Graceful Fallback
When no FAQ matches above the threshold, the system should offer a helpful fallback rather than showing nothing.
def answer_with_fallback(
    engine: SemanticFAQEngine,
    user_question: str,
    threshold: float = 0.55,
) -> dict:
    """Return best FAQ answer or a structured fallback response."""
    results = engine.find_answer(user_question, top_k=3, threshold=threshold)
    if results and results[0]["confidence"] > 0.75:
        return {
            "type": "confident_match",
            "answer": results[0]["answer"],
            "confidence": results[0]["confidence"],
        }
    elif results:
        return {
            "type": "suggestions",
            "message": "I found some related questions:",
            "suggestions": [
                {"question": r["question"], "confidence": r["confidence"]}
                for r in results
            ],
        }
    else:
        return {
            "type": "fallback",
            "message": "I could not find a matching answer. "
                       "Would you like to contact support?",
            "query_logged": True,
        }
FAQ
How many question variations should each FAQ entry have?
Aim for 3-5 variations per FAQ entry. Each variation should represent a genuinely different phrasing, not just minor word swaps. Collect real user questions from support logs or chat transcripts to create authentic variations. More variations improve recall but also increase index size.
Should I embed the answer text as well?
Generally no. Embedding the question is more effective because users typically phrase their input as a question, and the FAQ answer text often contains detailed explanations that dilute the semantic signal. If you find that some answers contain key phrases users search for, consider adding those phrases as additional question variations instead.
How do I handle FAQ entries that are very similar to each other?
If two FAQ entries have similarity above 0.85, consider merging them or adding a disambiguation step. In the search results, you can group highly similar FAQs and present them as related topics, letting the user choose the most relevant one.
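Such near-duplicates can be detected offline with a pairwise similarity scan over the FAQ embeddings. A minimal sketch, using toy 2-D vectors in place of real embeddings and the 0.85 merge threshold suggested above:

```python
import numpy as np

# Toy stand-ins for canonical-question embeddings (one row per FAQ)
embeddings = np.array([
    [1.00, 0.00],   # faq 0
    [0.95, 0.312],  # faq 1 -- phrased almost identically to faq 0
    [0.00, 1.00],   # faq 2 -- unrelated topic
])
# Normalize rows so dot products are cosine similarities
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

sims = embeddings @ embeddings.T
# Scan the upper triangle for suspiciously similar pairs
duplicate_pairs = [
    (i, j, float(sims[i, j]))
    for i in range(len(sims))
    for j in range(i + 1, len(sims))
    if sims[i, j] > 0.85
]
print(duplicate_pairs)  # only the (0, 1) pair crosses the threshold
```

Running a scan like this whenever the FAQ set changes catches redundant entries before they start competing with each other at query time.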