pgvector Quickstart: Wire AI Agent Embeddings into Postgres in 30 Min (2026)
A working pgvector quickstart for AI agents — install the extension, design a hybrid table, build an HNSW index, and query top-k matches from Prisma and Python. Real schema, real code, no marketing fluff.
TL;DR — pgvector 0.8.x on Postgres 17/18 is the simplest sane vector store for AI agents in 2026. You get ACID, joins, RLS, and HNSW indexes without bolting on a separate vector DB. This guide takes you from CREATE EXTENSION to a working top-k query in under 30 minutes.
What you'll build
A single agent_memories table that stores 1536-dimension OpenAI embeddings alongside metadata, indexed for sub-50ms top-k retrieval. By the end you'll have:
- pgvector installed and verified
- A schema with vector + JSONB metadata + tenant isolation
- An HNSW index tuned for ~1M rows
- A working Prisma + Python retriever returning top-5 hits
Schema
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE agent_memories (
  id         BIGSERIAL PRIMARY KEY,
  tenant_id  UUID NOT NULL,
  agent_id   UUID NOT NULL,
  content    TEXT NOT NULL,
  embedding  vector(1536) NOT NULL,
  metadata   JSONB NOT NULL DEFAULT '{}'::jsonb,
  created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE INDEX agent_memories_embedding_hnsw
  ON agent_memories
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
CREATE INDEX agent_memories_tenant_idx ON agent_memories (tenant_id, agent_id);
CREATE INDEX agent_memories_meta_gin ON agent_memories USING gin (metadata);
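Before committing to ~1M rows, it helps to sanity-check raw storage. A back-of-envelope sketch (a rough estimate of the vector payload only; it ignores HNSW graph links, tuple headers, and TOAST, so real usage is higher):

```python
def vector_storage_estimate_gb(rows: int, dims: int = 1536) -> float:
    """Lower bound: 4 bytes per float32 dimension, vector data only."""
    bytes_per_vector = dims * 4
    return rows * bytes_per_vector / 1024**3

# ~1M rows of 1536-dim vectors is roughly 5.7 GB of raw vector data
print(round(vector_storage_estimate_gb(1_000_000), 1))
```

If that number already strains your instance, consider halfvec storage before reaching for a separate vector DB.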
Architecture
flowchart LR
  USER[User turn] --> EMB[OpenAI embed]
  EMB --> PG[(Postgres + pgvector)]
  PG -->|top-k cosine| RET[Retrieved chunks]
  RET --> LLM[GPT-4o]
  LLM --> REPLY[Agent reply]
  REPLY --> WRITE[Write to memory]
  WRITE --> PG
Step 1 — Install pgvector
On Postgres 18 with the postgresql-18-pgvector package:
sudo apt install postgresql-18-pgvector
psql -d agents -c "CREATE EXTENSION vector;"
psql -d agents -c "SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';"
Step 2 — Define the Prisma model
model AgentMemory {
  id        BigInt   @id @default(autoincrement())
  tenantId  String   @map("tenant_id") @db.Uuid
  agentId   String   @map("agent_id") @db.Uuid
  content   String
  embedding Unsupported("vector(1536)")
  metadata  Json     @default("{}")
  createdAt DateTime @default(now()) @map("created_at") @db.Timestamptz(6)

  @@map("agent_memories")
  @@index([tenantId, agentId])
}
Prisma has no native vector type, so declare the column as Unsupported("vector(1536)") and run raw SQL for inserts and ANN searches.
Step 3 — Insert with raw SQL
import { PrismaClient } from "@prisma/client";

const prisma = new PrismaClient();

export async function remember(
  tenantId: string,
  agentId: string,
  content: string,
  embedding: number[],
) {
  const vec = `[${embedding.join(",")}]`;
  await prisma.$executeRaw`
    INSERT INTO agent_memories (tenant_id, agent_id, content, embedding)
    VALUES (${tenantId}::uuid, ${agentId}::uuid, ${content}, ${vec}::vector)
  `;
}
Step 4 — Top-k retrieval
export async function recall(
  tenantId: string,
  agentId: string,
  query: number[],
  k = 5,
) {
  const vec = `[${query.join(",")}]`;
  return prisma.$queryRaw`
    SELECT id, content, metadata,
           1 - (embedding <=> ${vec}::vector) AS similarity
    FROM agent_memories
    WHERE tenant_id = ${tenantId}::uuid
      AND agent_id = ${agentId}::uuid
    ORDER BY embedding <=> ${vec}::vector
    LIMIT ${k}
  `;
}
The <=> operator is cosine distance. Use <-> for L2 distance and <#> for inner product; note that <#> returns the negative inner product, so ascending ORDER BY still puts the best matches first.
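To make the operator concrete, here is a pure-Python sketch of the cosine distance that <=> computes, matching the 1 - (embedding <=> q) similarity used in the recall query:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """What pgvector's <=> computes: 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # identical vectors -> 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors -> 1.0
```

Distance of 0 means identical direction, 1 means orthogonal, which is why the query converts it to similarity with 1 - distance.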
Step 5 — Tune ef_search per query
BEGIN;
SET LOCAL hnsw.ef_search = 100;
SELECT id FROM agent_memories
ORDER BY embedding <=> $1::vector LIMIT 10;
COMMIT;
The default hnsw.ef_search is 40, which is too low for tables beyond ~100k rows. Note that SET LOCAL only takes effect inside a transaction; use plain SET to raise it for the whole session.
Step 6 — Python equivalent (LangChain agents)
import os

import psycopg
from openai import OpenAI

client = OpenAI()
conn = psycopg.connect(os.environ["DATABASE_URL"])

def embed(text: str) -> list[float]:
    return client.embeddings.create(
        model="text-embedding-3-small", input=text
    ).data[0].embedding

def recall(tenant: str, agent: str, query: str, k: int = 5):
    # psycopg sends a Python list as a Postgres array, which ::vector
    # rejects -- serialize it to the '[...]' literal pgvector expects.
    v = "[" + ",".join(map(str, embed(query))) + "]"
    with conn.cursor() as cur:
        cur.execute(
            """
            SELECT content, 1 - (embedding <=> %s::vector) AS sim
            FROM agent_memories
            WHERE tenant_id = %s AND agent_id = %s
            ORDER BY embedding <=> %s::vector LIMIT %s
            """,
            (v, tenant, agent, v, k),
        )
        return cur.fetchall()
Pitfalls
- Forgetting to cast: '[1,2,3]'::vector is required; raw arrays fail.
- Index built before data load: HNSW builds much faster on a populated table; bulk load first, then create the index.
- Default ef_search = 40: too low; bump it to 100+ for production recall.
- No tenant filter: without one, the ANN search ranks across every tenant's vectors. Always include tenant_id in the WHERE.
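The cast pitfall is easiest to avoid with one shared helper that renders a Python list as a pgvector literal before it reaches any SQL (a minimal sketch; the pgvector-python package ships adapters that handle this automatically if you prefer):

```python
def to_vector_literal(embedding: list[float]) -> str:
    """Render [0.1, 0.2] as '[0.1,0.2]' so that ::vector casts cleanly."""
    return "[" + ",".join(map(str, embedding)) + "]"

print(to_vector_literal([0.1, 0.2, 0.3]))  # [0.1,0.2,0.3]
```

Route every insert and query through it so a raw array can never reach the ::vector cast.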
CallSphere production note
CallSphere runs 115+ DB tables across 6 verticals on PostgreSQL — including a dedicated healthcare_voice Prisma schema for HIPAA workloads, OneRoof's RLS-enforced tenant model, and UrackIT's Supabase + ChromaDB hybrid. Across 37 specialized agents and 90+ tools, every long-term memory write goes through pgvector with HNSW. Pricing tiers: $149 / $499 / $1,499, 14-day trial, 22% affiliate program.
FAQ
Q: Should I use IVFFlat or HNSW?
HNSW for ≥99% of workloads in 2026: better recall and no lists to tune. IVFFlat only when memory is constrained.
Q: How many vectors before pgvector hurts? Up to ~10M with HNSW + decent hardware. Beyond that consider sharding (Citus) or pgvectorscale.
Q: Can I store 3072-dim embeddings (text-embedding-3-large)? Yes, but HNSW indexes cap at 2,000 dimensions for the vector type (halfvec raises the indexable limit to 4,000). Store as halfvec or matryoshka-truncate to 1536.
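If you want text-embedding-3-large anyway, the matryoshka truncate-and-renormalize trick looks like this (a sketch; OpenAI's embeddings API can also return fewer dimensions directly via its dimensions parameter):

```python
import math

def matryoshka_truncate(embedding: list[float], dims: int = 1536) -> list[float]:
    """Keep the first `dims` components, then re-normalize to unit length
    so cosine and inner-product comparisons stay meaningful."""
    head = embedding[:dims]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

v = matryoshka_truncate([3.0, 4.0, 0.5, 0.2], dims=2)
print(v)  # [0.6, 0.8], unit length after truncation
```

This works because the text-embedding-3 models are trained so that leading dimensions carry most of the signal.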
Q: Does pgvector support metadata filtering?
Yes — combine WHERE clauses with ANN ordering. Postgres planner handles it.
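As an illustration, a combined query might pair a JSONB containment predicate with the ANN ORDER BY like this (a hypothetical query-builder sketch; in real code keep the parameterized execution style from the earlier steps):

```python
def build_filtered_ann_query() -> str:
    """Compose a JSONB containment filter with ANN ordering; the planner
    applies the WHERE predicates alongside the HNSW index scan."""
    return """
    SELECT id, content
    FROM agent_memories
    WHERE tenant_id = %s
      AND metadata @> %s::jsonb
    ORDER BY embedding <=> %s::vector
    LIMIT %s
    """

sql = build_filtered_ann_query()
print("@>" in sql and "ORDER BY embedding <=>" in sql)  # True
```

The GIN index on metadata from the schema above serves the @> predicate, so filtered retrieval stays fast.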
Q: Cosine vs L2 vs dot product? Cosine for normalized OpenAI embeddings, dot product when you control normalization yourself.
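If you do control normalization, unit-scaling once at write time makes inner product rank identically to cosine, which is the cheaper comparison. A sketch:

```python
import math

def normalize(v: list[float]) -> list[float]:
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# On unit vectors, dot product equals cosine similarity
a, b = normalize([1.0, 2.0]), normalize([2.0, 1.0])
print(round(dot(a, b), 3))  # 0.8
```

Normalize every embedding before insert, then index with vector_ip_ops instead of vector_cosine_ops and skip the per-query norm computation.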
Try CallSphere AI Voice Agents
See how AI voice agents work for your industry. Live demo available -- no signup required.