Learn Agentic AI

Building an FAQ Agent: Automatic Question Answering from Knowledge Bases

Build a production FAQ agent that retrieves answers from knowledge bases using semantic search, applies confidence thresholds to avoid hallucination, and tracks unanswered questions to improve coverage over time.

The Problem with Static FAQ Pages

Traditional FAQ pages fail customers in two ways. First, customers must guess the exact wording the company used to describe their problem. Second, the list grows unwieldy over time — a 200-item FAQ page helps no one. An FAQ agent solves both problems by understanding the customer's question semantically and retrieving the most relevant answer regardless of how it was phrased.

Architecture Overview

An FAQ agent has three core components: a knowledge base with embeddings, a retrieval layer that finds relevant answers, and a generation layer that synthesizes a natural response with confidence scoring.

flowchart LR
    Q(["User query"])
    EMB["Embed query<br/>text-embedding-3"]
    VEC[("Vector DB<br/>pgvector or Pinecone")]
    RET["Top-k retrieval<br/>k = 3"]
    PROMPT["Augmented prompt<br/>system plus context"]
    LLM["LLM generation<br/>Claude or GPT"]
    CITE["Inline citations<br/>and page anchors"]
    OUT(["Grounded answer"])
    Q --> EMB --> VEC --> RET --> PROMPT --> LLM --> CITE --> OUT
    style EMB fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style VEC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff

The knowledge base layer stores each FAQ entry alongside an embedding of its question:
from dataclasses import dataclass
from openai import AsyncOpenAI
import numpy as np

@dataclass
class FAQEntry:
    id: str
    question: str
    answer: str
    embedding: list[float]
    category: str
    last_updated: str

@dataclass
class RetrievalResult:
    entry: FAQEntry
    similarity: float

class FAQKnowledgeBase:
    def __init__(self, client: AsyncOpenAI):
        self.client = client
        self.entries: list[FAQEntry] = []

    async def embed_text(self, text: str) -> list[float]:
        response = await self.client.embeddings.create(
            model="text-embedding-3-small",
            input=text,
        )
        return response.data[0].embedding

    async def add_entry(
        self, id: str, question: str, answer: str, category: str
    ):
        embedding = await self.embed_text(question)
        self.entries.append(
            FAQEntry(
                id=id,
                question=question,
                answer=answer,
                embedding=embedding,
                category=category,
                last_updated="2026-03-17",  # placeholder; stamp with the current date in production
            )
        )

Semantic Retrieval with Confidence Scoring

The retrieval layer computes cosine similarity between the user question and every FAQ entry. This is where confidence thresholds become critical — returning a wrong answer is far worse than admitting the agent does not know.

def cosine_similarity(a: list[float], b: list[float]) -> float:
    a_arr, b_arr = np.array(a), np.array(b)
    return float(
        np.dot(a_arr, b_arr)
        / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr))
    )

class FAQRetriever:
    def __init__(self, kb: FAQKnowledgeBase):
        self.kb = kb
        self.high_confidence = 0.85
        self.low_confidence = 0.65

    async def retrieve(
        self, query: str, top_k: int = 3
    ) -> list[RetrievalResult]:
        query_embedding = await self.kb.embed_text(query)
        results = []
        for entry in self.kb.entries:
            sim = cosine_similarity(query_embedding, entry.embedding)
            results.append(RetrievalResult(entry=entry, similarity=sim))
        results.sort(key=lambda r: r.similarity, reverse=True)
        return results[:top_k]

    async def answer(self, query: str) -> dict:
        results = await self.retrieve(query)
        if not results:
            return {
                "answer": None,
                "confidence": "none",
                "should_track": True,
            }
        top = results[0]
        if top.similarity >= self.high_confidence:
            return {
                "answer": top.entry.answer,
                "confidence": "high",
                "source_id": top.entry.id,
                "similarity": top.similarity,
                "should_track": False,
            }
        elif top.similarity >= self.low_confidence:
            return {
                "answer": top.entry.answer,
                "confidence": "medium",
                "source_id": top.entry.id,
                "similarity": top.similarity,
                "should_track": True,
            }
        else:
            return {
                "answer": None,
                "confidence": "low",
                "should_track": True,
            }
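To see the routing behavior without calling an embeddings API, here is a minimal standalone sketch using hand-built three-dimensional vectors. The vectors are toy data; the thresholds mirror the FAQRetriever values above:

```python
import numpy as np

HIGH, LOW = 0.85, 0.65  # same thresholds as FAQRetriever

def cosine_similarity(a: list[float], b: list[float]) -> float:
    a_arr, b_arr = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a_arr, b_arr) / (np.linalg.norm(a_arr) * np.linalg.norm(b_arr)))

def route(similarity: float) -> str:
    """Map a top-1 similarity score to a confidence bucket."""
    if similarity >= HIGH:
        return "high"
    if similarity >= LOW:
        return "medium"
    return "low"

# Toy "embeddings": nearly parallel vectors score close to 1.0,
# orthogonal ones score near 0.0.
query = [1.0, 0.1, 0.0]
close_entry = [1.0, 0.0, 0.0]
unrelated_entry = [0.0, 0.0, 1.0]

print(route(cosine_similarity(query, close_entry)))      # high
print(route(cosine_similarity(query, unrelated_entry)))  # low
```

The same three-way split drives everything downstream: only high answers ship unreviewed, medium answers ship but get logged, and low answers trigger the fallback path.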

Tracking Unanswered Questions

Every question the agent cannot confidently answer is an opportunity to improve the knowledge base. An unanswered question tracker clusters similar failures and surfaces the most impactful gaps.

from datetime import datetime, timezone
from collections import defaultdict

class UnansweredTracker:
    def __init__(self):
        self.questions: list[dict] = []

    def track(self, query: str, confidence: str, top_similarity: float):
        self.questions.append({
            "query": query,
            "confidence": confidence,
            "top_similarity": top_similarity,
            # timezone-aware timestamp; datetime.utcnow() is deprecated
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })

    def get_gap_report(self, min_occurrences: int = 3) -> list[dict]:
        """Group similar unanswered questions and rank by frequency."""
        clusters = defaultdict(list)
        for q in self.questions:
            # Simple grouping by first 5 words
            key = " ".join(q["query"].lower().split()[:5])
            clusters[key].append(q)

        gaps = []
        for key, items in clusters.items():
            if len(items) >= min_occurrences:
                gaps.append({
                    "cluster_key": key,
                    "count": len(items),
                    "sample_queries": [i["query"] for i in items[:3]],
                    "avg_similarity": sum(
                        i["top_similarity"] for i in items
                    ) / len(items),
                })
        gaps.sort(key=lambda g: g["count"], reverse=True)
        return gaps
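Here is a condensed, standalone version of the same grouping rule with made-up queries, showing how near-duplicate phrasings collapse into one gap:

```python
from collections import defaultdict

def gap_report(queries: list[str], min_occurrences: int = 3) -> list[dict]:
    # Same grouping rule as UnansweredTracker: cluster by the
    # first five lowercased words of each query.
    clusters = defaultdict(list)
    for q in queries:
        clusters[" ".join(q.lower().split()[:5])].append(q)
    gaps = [
        {"cluster_key": key, "count": len(items), "sample_queries": items[:3]}
        for key, items in clusters.items()
        if len(items) >= min_occurrences
    ]
    gaps.sort(key=lambda g: g["count"], reverse=True)
    return gaps

unanswered = [
    "how do I cancel my subscription today",
    "How do I cancel my subscription immediately?",
    "how do i cancel my subscription",
    "where is my invoice",
]
report = gap_report(unanswered)
print(report[0]["cluster_key"], report[0]["count"])  # how do i cancel my 3
```

Prefix grouping is deliberately crude; in production you would cluster on the query embeddings themselves, but the prefix version needs no API calls and already surfaces the obvious gaps.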

Generating Natural Responses

Rather than returning raw FAQ text, the agent uses an LLM to synthesize a conversational answer grounded in the retrieved content. This prevents hallucination by constraining the model to only use provided sources.

async def generate_faq_response(
    client: AsyncOpenAI, query: str, faq_answer: str
) -> str:
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a customer support assistant. Answer the "
                    "customer question using ONLY the provided FAQ "
                    "content. Do not add information not present in "
                    "the source. Be concise and helpful."
                ),
            },
            {
                "role": "user",
                "content": (
                    f"Customer question: {query}\n\n"
                    f"FAQ source: {faq_answer}"
                ),
            },
        ],
        max_tokens=300,
    )
    return response.choices[0].message.content
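One way to wire retrieval and generation together is a small dispatcher that only calls the LLM on high or medium confidence and falls back to a canned escalation message otherwise. The sketch below uses illustrative names (`respond`, `fake_generate`, the fallback wording), and a stub stands in for `generate_faq_response` so the flow runs without an API key:

```python
import asyncio

FALLBACK = (
    "I'm not sure about that one. I've logged your question "
    "and a human agent will follow up."
)

async def respond(retrieval: dict, generate) -> str:
    """Route a retrieval result (the dict shape returned by
    FAQRetriever.answer) to grounded generation or a safe fallback."""
    if retrieval["confidence"] in ("high", "medium") and retrieval["answer"]:
        return await generate(retrieval["answer"])
    return FALLBACK

# Stub in place of generate_faq_response for a dry run.
async def fake_generate(faq_answer: str) -> str:
    return f"Based on our FAQ: {faq_answer}"

high = {"answer": "Refunds take 5 business days.", "confidence": "high"}
low = {"answer": None, "confidence": "low"}

print(asyncio.run(respond(high, fake_generate)))
print(asyncio.run(respond(low, fake_generate)))
```

Keeping the routing decision outside the generation function means the LLM never even sees a question the retriever could not ground, which is the simplest hallucination guard available.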

FAQ

What embedding model should I use for FAQ retrieval?

OpenAI's text-embedding-3-small offers an excellent balance of quality and cost for FAQ workloads. It handles paraphrases well and runs at a fraction of the cost of larger models. For multilingual FAQs, text-embedding-3-large performs better across languages.


How do I set the right confidence threshold?

Start with a high threshold (0.85) and measure your false positive rate — cases where the agent returns a wrong answer confidently. Then lower the threshold gradually while monitoring accuracy. Most teams settle between 0.75 and 0.85 depending on their tolerance for incorrect responses versus unanswered questions.
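That sweep is easy to automate once you have a labeled evaluation set of (top similarity, answer-was-correct) pairs. The data below is synthetic, purely to illustrate the calculation:

```python
def false_positive_rate(evals: list[tuple[float, bool]], threshold: float) -> float:
    """Fraction of answers served at this threshold that were wrong.

    evals: (top_similarity, answer_was_correct) pairs from a labeled set.
    """
    served = [correct for sim, correct in evals if sim >= threshold]
    if not served:
        return 0.0
    return sum(1 for correct in served if not correct) / len(served)

# Synthetic eval set: high-similarity matches are usually right,
# borderline ones often are not.
evals = [
    (0.91, True), (0.88, True), (0.86, False),
    (0.78, False), (0.74, False), (0.70, False),
    (0.60, False),
]

for threshold in (0.85, 0.75, 0.65):
    print(threshold, round(false_positive_rate(evals, threshold), 2))
```

Plotting this curve against the fraction of questions left unanswered at each threshold makes the accuracy-versus-coverage trade-off explicit before you change anything in production.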

How often should I update the knowledge base?

Review your unanswered question tracker weekly. Any cluster with more than five occurrences represents a meaningful gap. Also re-embed entries whenever the underlying answer content changes, since stale embeddings paired with updated text create inconsistencies.
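One lightweight way to catch stale embeddings is to store a content hash next to each embedding and re-embed only when the hash no longer matches. This is a sketch with illustrative field names; hash whichever text you actually embed (the question, the answer, or both):

```python
import hashlib

def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def needs_reembedding(entry: dict) -> bool:
    """True when the stored text no longer matches the hash recorded
    at the time its embedding was computed."""
    return content_hash(entry["answer"]) != entry["embedding_hash"]

entry = {
    "answer": "Refunds take 5 business days.",
    "embedding_hash": content_hash("Refunds take 5 business days."),
}
assert not needs_reembedding(entry)   # unchanged, embedding still valid

entry["answer"] = "Refunds take 3 business days."  # content edited
assert needs_reembedding(entry)       # hash mismatch, re-embed this entry
```

A nightly job that scans for mismatches keeps embeddings and text in lockstep without re-embedding the entire knowledge base on every edit.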


#FAQAgent #KnowledgeBase #SemanticSearch #RAG #CustomerSupport #AgenticAI #LearnAI #AIEngineering
