AI Engineering

pg_trgm + pgvector Hybrid Retrieval: Build Better RAG in Postgres (2026)

Pure vector search tops out around 0.62 Recall@10 on our eval set. Adding pg_trgm + tsvector with Reciprocal Rank Fusion lifts it to 0.84+. Real SQL, real RRF math, and a Prisma helper you can drop into any agent.

TL;DR — Vector search misses exact matches (model names, error codes, drug names). Lexical search misses paraphrases. Hybrid retrieval — running both and merging with Reciprocal Rank Fusion — combines their strengths. In Postgres you get this for free with pgvector + pg_trgm + tsvector.

What you'll build

A single search_chunks(query, k) function that runs vector ANN, full-text BM25-flavored ranking, and trigram fuzzy matching, then fuses the three lists with RRF. Recall@10 climbs from ~0.62 (vector only) to ~0.84+.

Schema

CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TABLE chunks (
  id BIGSERIAL PRIMARY KEY,
  doc_id UUID NOT NULL,
  body TEXT NOT NULL,
  body_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED,
  embedding vector(1536) NOT NULL
);

CREATE INDEX chunks_tsv_idx ON chunks USING gin (body_tsv);
CREATE INDEX chunks_trgm_idx ON chunks USING gin (body gin_trgm_ops);
CREATE INDEX chunks_hnsw_idx ON chunks USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 128);

Architecture

flowchart LR
  Q[User query] --> EMB[Embed]
  Q --> TS[Tsvector parse]
  Q --> TRG[Trigram tokens]
  EMB --> ANN[HNSW top-50]
  TS --> BM25[ts_rank top-50]
  TRG --> FUZZ[trgm sim top-50]
  ANN --> RRF[Reciprocal Rank Fusion]
  BM25 --> RRF
  FUZZ --> RRF
  RRF --> TOPK[Top-k to LLM]

Step 1 — Vector candidates

WITH vec AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1::vector) AS rk
  FROM chunks ORDER BY embedding <=> $1::vector LIMIT 50
)
SELECT * FROM vec;
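The `<=>` operator is pgvector's cosine-distance operator: 1 minus cosine similarity, so lower means closer. For intuition (and client-side sanity checks), the same math in TypeScript — `cosineDistance` is our own helper name, not a pgvector API:

```typescript
// Cosine distance as computed by pgvector's <=> operator:
// 1 - (a · b) / (|a| * |b|). Lower means more similar.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 0, orthogonal vectors score 1 — which is why the `ORDER BY embedding <=> $1` is ascending.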

Step 2 — Full-text candidates

WITH lex AS (
  SELECT id, ROW_NUMBER() OVER (
    ORDER BY ts_rank_cd(body_tsv, websearch_to_tsquery('english', $2)) DESC
  ) AS rk
  FROM chunks
  WHERE body_tsv @@ websearch_to_tsquery('english', $2)
  ORDER BY rk LIMIT 50
)
SELECT * FROM lex;

Note the ORDER BY rk before LIMIT — without it Postgres is free to keep an arbitrary 50 matching rows, not the 50 best-ranked ones.

Step 3 — Trigram fuzzy candidates

WITH fuzz AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY similarity(body, $2) DESC) AS rk
  FROM chunks
  WHERE body % $2
  ORDER BY similarity(body, $2) DESC LIMIT 50
)
SELECT * FROM fuzz;
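pg_trgm's `similarity()` is the ratio of shared trigrams to the union of trigrams from both strings, after lowercasing and padding each word before extraction. A simplified TypeScript approximation — not byte-for-byte identical to the C implementation, but close enough to see why long bodies dilute scores:

```typescript
// Simplified approximation of pg_trgm trigram extraction: lowercase,
// pad each word with two leading spaces and one trailing space, take
// all 3-grams. (The real extension differs in edge cases.)
function trigrams(s: string): Set<string> {
  const grams = new Set<string>();
  for (const word of s.toLowerCase().split(/\W+/).filter(Boolean)) {
    const padded = `  ${word} `;
    for (let i = 0; i <= padded.length - 3; i++) {
      grams.add(padded.slice(i, i + 3));
    }
  }
  return grams;
}

// similarity() = |shared trigrams| / |union of trigrams|
function trgmSimilarity(a: string, b: string): number {
  const ta = trigrams(a), tb = trigrams(b);
  let shared = 0;
  for (const g of ta) if (tb.has(g)) shared++;
  const union = ta.size + tb.size - shared;
  return union === 0 ? 0 : shared / union;
}
```

Identical strings score 1.0; a short query against a long body shares only a few of the body's many trigrams, so the denominator swamps the score — the motivation for a separate short trgm column.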

Step 4 — Fuse with RRF

CREATE OR REPLACE FUNCTION search_chunks(
  q_embed vector(1536),
  q_text  text,
  k       int DEFAULT 10,
  rrf_k   int DEFAULT 60
)
RETURNS TABLE (id bigint, body text, score float) AS $$
WITH
  vec AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> q_embed) AS rk
    FROM chunks ORDER BY embedding <=> q_embed LIMIT 50
  ),
  lex AS (
    SELECT id, ROW_NUMBER() OVER (
      ORDER BY ts_rank_cd(body_tsv, websearch_to_tsquery('english', q_text)) DESC
    ) AS rk
    FROM chunks
    WHERE body_tsv @@ websearch_to_tsquery('english', q_text)
    ORDER BY rk LIMIT 50
  ),
  fuzz AS (
    SELECT id, ROW_NUMBER() OVER (ORDER BY similarity(body, q_text) DESC) AS rk
    FROM chunks WHERE body % q_text
    ORDER BY rk LIMIT 50
  ),
  fused AS (
    SELECT id, SUM(1.0 / (rrf_k + rk)) AS score FROM (
      SELECT id, rk FROM vec
      UNION ALL SELECT id, rk FROM lex
      UNION ALL SELECT id, rk FROM fuzz
    ) u GROUP BY id
  )
SELECT c.id, c.body, f.score
FROM fused f JOIN chunks c USING (id)
ORDER BY f.score DESC LIMIT k;
$$ LANGUAGE sql STABLE;
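The fusion math inside search_chunks is easy to verify in isolation. The same RRF formula in TypeScript — each document scores the sum of 1/(rrf_k + rank) over every candidate list it appears in (`rrfFuse` is our own illustration helper):

```typescript
// Reciprocal Rank Fusion: score(doc) = Σ over lists of 1 / (k + rank),
// where rank is the 1-based position in that list. Docs appearing in
// multiple lists accumulate score, which is what surfaces them.
function rrfFuse(lists: number[][], rrfK = 60): Map<number, number> {
  const scores = new Map<number, number>();
  for (const list of lists) {
    list.forEach((id, i) => {
      const rank = i + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (rrfK + rank));
    });
  }
  return scores;
}

// At k=60, a doc ranked #2 in two lists beats a doc ranked #1 in one:
// 2 * 1/62 ≈ 0.0323 > 1/61 ≈ 0.0164
```

That cross-list accumulation is the whole point: a chunk that is merely decent in both the vector and lexical lists outranks one that is excellent in a single list.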

Step 5 — Call from Prisma

// pgvector expects a '[0.1,0.2,...]' text literal; Prisma serializes a
// plain number[] as a Postgres array, which will not cast to vector.
const vecLiteral = `[${vec.join(",")}]`;
const rows = await prisma.$queryRaw`
  SELECT * FROM search_chunks(${vecLiteral}::vector, ${q}, 10)
`;

Step 6 — Tune RRF k

k=60 is the canonical value from the original RRF paper (Cormack et al., 2009), evaluated on TREC runs. Lower (k=10) emphasizes top-1 hits; higher (k=120) flattens the contribution of rank differences. A/B test on your eval set.
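To see why lower k emphasizes top ranks, compare the score gap between rank 1 and rank 10 at different k values — a quick arithmetic check, not tied to any particular eval set:

```typescript
// RRF contribution of a single list position.
const rrfScore = (rank: number, k: number) => 1 / (k + rank);

// Ratio of rank-1 score to rank-10 score: at k=10 it is 20/11 ≈ 1.82,
// at k=60 it is 70/61 ≈ 1.15, at k=120 it is 130/121 ≈ 1.07.
const ratioAt = (k: number) => rrfScore(1, k) / rrfScore(10, k);
```

As k grows, the ratio tends toward 1 — every rank in the top 50 contributes nearly equally, which rewards breadth across lists over a single strong hit.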


Pitfalls

  • Empty queries — websearch_to_tsquery never raises a syntax error, but an empty or stopword-only query yields an empty tsquery that matches nothing, silently zeroing the lexical branch. Guard before querying.
  • Trigram on long text — the % operator compares whole-string similarity, so long chunk bodies dilute scores. For long docs, store title || ' ' || summary in a separate trgm-indexed column.
  • Different LIMITs per branch — keep them equal (50 each) so no single list dominates the fusion.
  • Forgetting STABLE — SQL functions default to VOLATILE, which blocks planner optimizations such as inlining.
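The empty-query pitfall is easiest to guard client-side, before the query ever reaches Postgres. A minimal sketch (`sanitizeQuery` is our own helper name):

```typescript
// Reject empty or whitespace-only queries before the lexical branch:
// an empty tsquery matches nothing, silently dropping one of the
// three RRF lists. Returns null so the caller can fall back.
function sanitizeQuery(q: string): string | null {
  const trimmed = q.trim();
  return trimmed.length > 0 ? trimmed : null;
}
```

If it returns null, skip the call entirely or fall back to vector-only retrieval rather than fusing against two empty lists.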

CallSphere production note

CallSphere's knowledge agents use this exact RRF pattern across the 115+ DB tables that hold product docs, FAQs, and call transcripts. The Healthcare vertical (HIPAA, Prisma healthcare_voice) tunes rrf_k=40 to favor exact drug-name matches; OneRoof (RLS) keeps rrf_k=60; UrackIT (Supabase + ChromaDB) ports the same RRF shape to its non-HIPAA stack.

FAQ

Q: How much does hybrid actually beat vector-only? On TREC-COVID and other BEIR splits, hybrid + RRF typically lifts nDCG@10 by 15–25%, though the gain varies by corpus.

Q: Do I need a reranker after RRF? Optional. A cross-encoder reranker on the top-20 fused results adds another 3–7% nDCG.


Q: What about pg_textsearch / BM25 in Postgres? TigerData's pg_textsearch ships true BM25 scoring — a drop-in replacement for ts_rank_cd if you need it.

Q: pg_trgm vs FTS — when does each win? Trgm wins on misspellings and partial words; FTS wins on phrase/stem matching.

Q: Single SQL vs three round trips? Single SQL is faster and atomic. Always fuse server-side.

