AI Engineering

pg_trgm + pgvector Hybrid Retrieval: Build Better RAG in Postgres (2026)

Pure vector search tops out around 0.62 Recall@10 on our eval set. Adding pg_trgm + tsvector with Reciprocal Rank Fusion lifts it to 0.84+. Real SQL, real RRF math, and a Prisma helper you can drop into any agent.

TL;DR — Vector search misses exact matches (model names, error codes, drug names). Lexical search misses paraphrases. Hybrid retrieval — running both and merging with Reciprocal Rank Fusion — combines their strengths. In Postgres you get this for free with pgvector + pg_trgm + tsvector.

What you'll build

A single search_chunks(query, k) function that runs vector ANN, full-text BM25-flavored ranking, and trigram fuzzy matching, then fuses the three lists with RRF. Recall@10 climbs from ~0.62 (vector only) to ~0.84+.

Schema

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TABLE chunks (
  id BIGSERIAL PRIMARY KEY,
  doc_id UUID NOT NULL,
  body TEXT NOT NULL,
  body_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED,
  embedding vector(1536) NOT NULL
);

CREATE INDEX chunks_tsv_idx ON chunks USING gin (body_tsv);
CREATE INDEX chunks_trgm_idx ON chunks USING gin (body gin_trgm_ops);
CREATE INDEX chunks_hnsw_idx ON chunks USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 128);

Architecture

flowchart LR
  Q[User query] --> EMB[Embed]
  Q --> TS[Tsvector parse]
  Q --> TRG[Trigram tokens]
  EMB --> ANN[HNSW top-50]
  TS --> BM25[ts_rank top-50]
  TRG --> FUZZ[trgm sim top-50]
  ANN --> RRF[Reciprocal Rank Fusion]
  BM25 --> RRF
  FUZZ --> RRF
  RRF --> TOPK[Top-k to LLM]

Step 1 — Vector candidates

WITH vec AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1::vector) AS rk
  FROM chunks ORDER BY embedding <=> $1::vector LIMIT 50
)
SELECT * FROM vec;
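The `<=>` operator is pgvector's cosine-distance operator: 1 minus cosine similarity, so lower means closer. For intuition (and client-side sanity checks), the same math in TypeScript — `cosineDistance` is our own helper name, not a pgvector API:

```typescript
// Cosine distance as computed by pgvector's <=> operator:
// 1 - (a · b) / (|a| * |b|). Lower means more similar.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 0, orthogonal vectors score 1 — which is why the `ORDER BY embedding <=> $1` is ascending.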

Step 2 — Full-text candidates

WITH lex AS (
  SELECT id, ROW_NUMBER() OVER (
    ORDER BY ts_rank_cd(body_tsv, websearch_to_tsquery('english', $2)) DESC
  ) AS rk
  FROM chunks
  WHERE body_tsv @@ websearch_to_tsquery('english', $2)
  ORDER BY rk LIMIT 50
)
SELECT * FROM lex;

Note the ORDER BY rk before LIMIT — without it Postgres is free to keep an arbitrary 50 matching rows, not the 50 best-ranked ones.

Step 3 — Trigram fuzzy candidates

WITH fuzz AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY similarity(body, $2) DESC) AS rk
  FROM chunks
  WHERE body % $2
  ORDER BY similarity(body, $2) DESC LIMIT 50
)
SELECT * FROM fuzz;
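pg_trgm's `similarity()` is the ratio of shared trigrams to the union of trigrams from both strings, after lowercasing and padding each word before extraction. A simplified TypeScript approximation — not byte-for-byte identical to the C implementation, but close enough to see why long bodies dilute scores:

```typescript
// Simplified approximation of pg_trgm trigram extraction: lowercase,
// pad each word with two leading spaces and one trailing space, take
// all 3-grams. (The real extension differs in edge cases.)
function trigrams(s: string): Set<string> {
  const grams = new Set<string>();
  for (const word of s.toLowerCase().split(/\W+/).filter(Boolean)) {
    const padded = `  ${word} `;
    for (let i = 0; i <= padded.length - 3; i++) {
      grams.add(padded.slice(i, i + 3));
    }
  }
  return grams;
}

// similarity() = |shared trigrams| / |union of trigrams|
function trgmSimilarity(a: string, b: string): number {
  const ta = trigrams(a), tb = trigrams(b);
  let shared = 0;
  for (const g of ta) if (tb.has(g)) shared++;
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}
```

Identical strings score 1.0; a short query against a long body shares only a few of the body's many trigrams, so the denominator swamps the score — the motivation for a separate short trgm column.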

Step 4 — Fuse with RRF

CREATE OR REPLACE FUNCTION search_chunks(
  q_embed vector(1536),
  q_text  text,
  k       int DEFAULT 10,
  rrf_k   int DEFAULT 60
)
RETURNS TABLE (id bigint, body text, score float) AS $$
WITH
  vec AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> q_embed) AS rk
    FROM chunks ORDER BY embedding <=> q_embed LIMIT 50
  ),
  lex AS (
    SELECT id, ROW_NUMBER() OVER (
      ORDER BY ts_rank_cd(body_tsv, websearch_to_tsquery('english', q_text)) DESC
    ) AS rk
    FROM chunks
    WHERE body_tsv @@ websearch_to_tsquery('english', q_text)
    ORDER BY rk LIMIT 50
  ),
  fuzz AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY similarity(body, q_text) DESC) AS rk
    FROM chunks WHERE body % q_text
    ORDER BY rk LIMIT 50
  ),
  fused AS (
    SELECT id, SUM(1.0 / (rrf_k + rk)) AS score FROM (
      SELECT id, rk FROM vec
      UNION ALL SELECT id, rk FROM lex
      UNION ALL SELECT id, rk FROM fuzz
    ) u GROUP BY id
  )
SELECT c.id, c.body, f.score
FROM fused f JOIN chunks c USING (id)
ORDER BY f.score DESC LIMIT k;
$$ LANGUAGE sql STABLE;
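The fusion math inside search_chunks is easy to verify in isolation. The same RRF formula in TypeScript — each document scores the sum of 1/(rrf_k + rank) over every candidate list it appears in (`rrfFuse` is our own illustration helper):

```typescript
// Reciprocal Rank Fusion: score(doc) = Σ over lists of 1 / (k + rank),
// where rank is the 1-based position in that list. Docs appearing in
// multiple lists accumulate score, which is what surfaces them.
function rrfFuse(lists: number[][], rrfK = 60): Map<number, number> {
  const scores = new Map<number, number>();
  for (const list of lists) {
    list.forEach((id, i) => {
      const rank = i + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (rrfK + rank));
    });
  }
  return scores;
}

// At k=60, a doc ranked #2 in two lists beats a doc ranked #1 in one:
// 2 * 1/62 ≈ 0.0323 > 1/61 ≈ 0.0164
```

That cross-list accumulation is the whole point: a chunk that is merely decent in both the vector and lexical lists outranks one that is excellent in a single list.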

Step 5 — Call from Prisma

// pgvector expects a '[0.1,0.2,...]' text literal; Prisma serializes a
// plain number[] as a Postgres array, which will not cast to vector.
const vecLiteral = `[${vec.join(",")}]`;
const rows = await prisma.$queryRaw`
  SELECT * FROM search_chunks(${vecLiteral}::vector, ${q}, 10)
`;

Step 6 — Tune RRF k

k=60 is the canonical value from the original RRF paper (Cormack et al., 2009), evaluated on TREC runs. Lower (k=10) emphasizes top-1 hits; higher (k=120) flattens the contribution of rank differences. A/B test on your eval set.
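To see why lower k emphasizes top ranks, compare the score gap between rank 1 and rank 10 at different k values — a quick arithmetic check, not tied to any particular eval set:

```typescript
// RRF contribution of a single list position.
const rrfScore = (rank: number, k: number) => 1 / (k + rank);

// Ratio of rank-1 score to rank-10 score: at k=10 it is 20/11 ≈ 1.82,
// at k=60 it is 70/61 ≈ 1.15, at k=120 it is 130/121 ≈ 1.07.
const ratioAt = (k: number) => rrfScore(1, k) / rrfScore(10, k);
```

As k grows, the ratio tends toward 1 — every rank in the top 50 contributes nearly equally, which rewards breadth across lists over a single strong hit.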


Pitfalls

  • Empty queries — websearch_to_tsquery never raises a syntax error, but an empty or stopword-only query yields an empty tsquery that matches nothing, silently zeroing the lexical branch. Guard before querying.
  • Trigram on long text — the % operator compares whole-string similarity, so long chunk bodies dilute scores. For long docs, store title || ' ' || summary in a separate trgm-indexed column.
  • Different LIMITs per branch — keep them equal (50 each) so no single list dominates the fusion.
  • Forgetting STABLE — SQL functions default to VOLATILE, which blocks planner optimizations such as inlining.
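The empty-query pitfall is easiest to guard client-side, before the query ever reaches Postgres. A minimal sketch (`sanitizeQuery` is our own helper name):

```typescript
// Reject empty or whitespace-only queries before the lexical branch:
// an empty tsquery matches nothing, silently dropping one of the
// three RRF lists. Returns null so the caller can fall back.
function sanitizeQuery(q: string): string | null {
  const trimmed = q.trim();
  return trimmed.length > 0 ? trimmed : null;
}
```

If it returns null, skip the call entirely or fall back to vector-only retrieval rather than fusing against two empty lists.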

CallSphere production note

CallSphere's knowledge agents use this exact RRF pattern across the 115+ DB tables that hold product docs, FAQs, and call transcripts. The Healthcare vertical (HIPAA, Prisma healthcare_voice) tunes rrf_k=40 to favor exact drug-name matches; OneRoof (RLS) keeps rrf_k=60; UrackIT (Supabase + ChromaDB) ports the same RRF shape to its non-HIPAA stack.

FAQ

Q: How much does hybrid actually beat vector-only? On TREC-COVID and other BEIR splits, hybrid + RRF typically lifts nDCG@10 by 15–25%, though the gain varies by corpus.

Q: Do I need a reranker after RRF? Optional. A cross-encoder reranker on the top-20 fused results adds another 3–7% nDCG.


Q: What about pg_textsearch / BM25 in Postgres? TigerData's pg_textsearch ships true BM25 scoring — a drop-in replacement for ts_rank_cd if you need it.

Q: pg_trgm vs FTS — when does each win? Trgm wins on misspellings and partial words; FTS wins on phrase/stem matching.

Q: Single SQL vs three round trips? Single SQL is faster and atomic. Always fuse server-side.

