Learn Agentic AI

Vector Databases for RAG: Comparing pgvector, Pinecone, Chroma, and Weaviate

A practical comparison of four popular vector databases for RAG — pgvector, Pinecone, Chroma, and Weaviate — covering setup, indexing, query performance, and when to choose each one.

Why Vector Databases Matter for RAG

The retrieval step in RAG depends on finding document chunks whose embedding vectors are closest to the query vector. A vector database is purpose-built for this operation: it stores high-dimensional vectors, builds indexes for fast approximate nearest-neighbor (ANN) search, and returns results in milliseconds even across millions of documents.
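Under the hood, "closest" almost always means cosine similarity. A toy NumPy sketch of the measurement itself, using 4-dimensional stand-ins for real embedding vectors:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 1.0, 0.0])
c = np.array([0.0, 1.0, 0.0, 1.0])

print(cosine_similarity(a, b))  # identical direction -> 1.0
print(cosine_similarity(a, c))  # orthogonal -> 0.0
```

A vector database does exactly this comparison, but against millions of stored vectors at once, using an index instead of a full scan.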

Choosing the right vector database depends on your scale, infrastructure preferences, and operational requirements. This post gives you working code and honest tradeoffs for four leading options.

Option 1: pgvector — Vectors Inside PostgreSQL

pgvector is a PostgreSQL extension that adds vector data types and similarity search operators. If you already run Postgres, this is the lowest-friction path to production RAG.

The retrieval pipeline in question, as a Mermaid flowchart:

flowchart LR
    Q(["User query"])
    EMB["Embed query<br/>text-embedding-3"]
    VEC[("Vector DB<br/>pgvector or Pinecone")]
    RET["Top-k retrieval<br/>k = 8"]
    PROMPT["Augmented prompt<br/>system plus context"]
    LLM["LLM generation<br/>Claude or GPT"]
    CITE["Inline citations<br/>and page anchors"]
    OUT(["Grounded answer"])
    Q --> EMB --> VEC --> RET --> PROMPT --> LLM --> CITE --> OUT
    style EMB fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style VEC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff
A minimal setup with psycopg2:

import psycopg2
import numpy as np

conn = psycopg2.connect("postgresql://user:pass@localhost/ragdb")
cur = conn.cursor()

# Enable the extension
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")

# Create a table with a vector column
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT NOT NULL,
        metadata JSONB DEFAULT '{}',
        embedding vector(1536)
    )
""")

# Insert a document with its embedding
embedding = np.random.rand(1536).tolist()  # replace with real embedding
cur.execute(
    "INSERT INTO documents (content, metadata, embedding) VALUES (%s, %s, %s)",
    ("Refund policy for enterprise...", '{"source": "policies.md"}', str(embedding))
)

# Query: find 5 nearest neighbors
query_vec = np.random.rand(1536).tolist()
cur.execute("""
    SELECT id, content, 1 - (embedding <=> %s::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> %s::vector
    LIMIT 5
""", (str(query_vec), str(query_vec)))

results = cur.fetchall()
for doc_id, content, sim in results:
    print(f"[{sim:.3f}] {content[:80]}...")

conn.commit()

Create an HNSW index for fast queries at scale:

CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
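The m and ef_construction settings control build-time graph quality. pgvector also exposes a query-time knob, hnsw.ef_search (default 40): larger values scan more of the graph per query, trading latency for recall. It can be set per session:

SET hnsw.ef_search = 100;
-- subsequent ORDER BY embedding <=> ... queries in this session use the new value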

Pros: No new infrastructure — lives in your existing Postgres. Full SQL joins with relational data. ACID transactions. Metadata filtering via standard WHERE clauses.


Cons: Slower than purpose-built vector DBs at very large scale (50M+ vectors). Limited to single-node without partitioning.

Option 2: Pinecone — Fully Managed Cloud Vector DB

Pinecone is a managed service that handles scaling, replication, and index management. You interact through an API — no servers to operate.

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create the index if it doesn't already exist (create_index errors on duplicates)
if "rag-docs" not in pc.list_indexes().names():
    pc.create_index(
        name="rag-docs",
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

index = pc.Index("rag-docs")

# Upsert vectors
index.upsert(vectors=[
    {
        "id": "doc-001",
        "values": embedding_vector,
        "metadata": {"source": "policies.md", "category": "billing"}
    }
])

# Query with metadata filter
results = index.query(
    vector=query_vector,
    top_k=5,
    include_metadata=True,
    filter={"category": {"$eq": "billing"}}
)

for match in results["matches"]:
    print(f"[{match['score']:.3f}] {match['id']} — {match['metadata']}")
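Bulk ingestion is usually batched rather than upserted one vector at a time. A sketch of the chunking step, using a hypothetical 250-document corpus of placeholder zero vectors; the actual upsert call is left commented since it needs a live index:

```python
def batched(items: list, size: int = 100) -> list:
    # Yield successive fixed-size batches from a list
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical corpus: (id, vector, metadata) triples
corpus = [(f"doc-{i:03d}", [0.0] * 1536, {"source": "demo"}) for i in range(250)]

batches = list(batched(corpus))
print([len(b) for b in batches])  # [100, 100, 50]

# for batch in batches:
#     index.upsert(vectors=[
#         {"id": i, "values": v, "metadata": m} for i, v, m in batch
#     ])
```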

Pros: Zero infrastructure management. Scales to billions of vectors. Built-in metadata filtering. SOC2 compliant.

Cons: Vendor lock-in. Network latency for every query. Monthly costs grow with scale. Data leaves your network.

Option 3: Chroma — Open-Source and Embedded

Chroma is an open-source embedding database designed for simplicity. It can run in-process (embedded) or as a client-server deployment.

import os

import chromadb
from chromadb.utils import embedding_functions

# In-memory or persistent
client = chromadb.PersistentClient(path="./chroma_db")

# Use OpenAI embeddings automatically
ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small"
)

collection = client.get_or_create_collection(
    name="documents",
    embedding_function=ef,
    metadata={"hnsw:space": "cosine"}
)

# Add documents — Chroma embeds them automatically
collection.add(
    ids=["doc-001", "doc-002"],
    documents=["Refund policy for enterprise plans...", "Billing cycle details..."],
    metadatas=[{"source": "policies.md"}, {"source": "billing.md"}]
)

# Query
results = collection.query(
    query_texts=["What is the refund policy?"],
    n_results=5,
    where={"source": "policies.md"}
)

for doc, dist in zip(results["documents"][0], results["distances"][0]):
    print(f"[{1-dist:.3f}] {doc[:80]}...")

Pros: Simple API. Embedded mode has zero network overhead. Free and open-source. Great for prototyping.

Cons: Single-node only in embedded mode. Limited production tooling (no built-in backups, monitoring). Performance degrades past a few million vectors.


Option 4: Weaviate — Hybrid Search Built-In

Weaviate is an open-source vector database that natively supports both vector search and keyword (BM25) search, making hybrid retrieval straightforward.

import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()  # or connect_to_weaviate_cloud()

# Define a collection with vectorizer
collection = client.collections.create(
    name="Document",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small"
    ),
    properties=[
        wc.Property(name="content", data_type=wc.DataType.TEXT),
        wc.Property(name="source", data_type=wc.DataType.TEXT),
    ]
)

# Insert — Weaviate auto-embeds the content
collection.data.insert({"content": "Refund policy...", "source": "policies.md"})

# Hybrid search (vector + keyword); scores must be requested explicitly
results = collection.query.hybrid(
    query="refund policy enterprise",
    alpha=0.7,  # 0 = pure keyword, 1 = pure vector
    limit=5,
    return_metadata=weaviate.classes.query.MetadataQuery(score=True),
)

for obj in results.objects:
    print(f"[{obj.metadata.score:.3f}] {obj.properties['content'][:80]}...")

client.close()
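The alpha parameter above weights the two retrievers against each other. With Weaviate's relative-score fusion, the blend is roughly a weighted sum of the normalized per-retriever scores; a toy sketch of that arithmetic (the scores are made-up normalized values, not real Weaviate output):

```python
def hybrid_score(vec_score: float, bm25_score: float, alpha: float = 0.7) -> float:
    # alpha = 1.0 -> pure vector search, alpha = 0.0 -> pure keyword (BM25)
    return alpha * vec_score + (1 - alpha) * bm25_score

# A doc that matches semantically but shares few exact keywords
print(hybrid_score(vec_score=0.9, bm25_score=0.2))
# A doc with an exact keyword hit but a weak semantic match
print(hybrid_score(vec_score=0.3, bm25_score=0.95))
```

At alpha = 0.7 the first document wins: the semantic signal dominates, but the keyword hit still contributes.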

Pros: Native hybrid search. Auto-vectorization. Multi-tenancy support. Active open-source community.

Cons: Heavier operational footprint than an embedded library. Steeper learning curve. Runs as a standalone Go server process with its own memory footprint to provision and monitor.

Quick Comparison Table

Feature         pgvector         Pinecone             Chroma         Weaviate
Hosting         Self-managed     Managed              Either         Either
Hybrid search   Manual BM25      Keyword filter only  No             Native
Max scale       ~10M vectors     Billions             ~5M vectors    ~100M vectors
Best for        Postgres shops   Zero-ops teams       Prototyping    Hybrid search

FAQ

Can I start with Chroma and migrate to Pinecone or pgvector later?

Yes. Your embedding vectors are portable — they are just arrays of floats. Export them from Chroma and import into any other vector store. The main migration effort is adapting your query code and metadata schema to the target system's API.
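As a concrete sketch of that migration path, here is the reshaping step from a Chroma export (via `collection.get(include=["embeddings", "documents", "metadatas"])`) into rows for the pgvector `documents` table from Option 1. The export dict below is hand-written sample data standing in for a real export:

```python
import json

# Shape of a Chroma export: parallel lists keyed by field
export = {
    "ids": ["doc-001", "doc-002"],
    "embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "documents": ["Refund policy...", "Billing cycle..."],
    "metadatas": [{"source": "policies.md"}, {"source": "billing.md"}],
}

# Reshape into (content, metadata_json, vector_literal) rows for a pgvector INSERT
rows = [
    (doc, json.dumps(meta), "[" + ",".join(str(x) for x in emb) + "]")
    for doc, meta, emb in zip(export["documents"], export["metadatas"], export["embeddings"])
]

print(rows[0])
# ('Refund policy...', '{"source": "policies.md"}', '[0.1,0.2,0.3]')

# Then, against the pgvector table:
# cur.executemany(
#     "INSERT INTO documents (content, metadata, embedding) VALUES (%s, %s, %s::vector)",
#     rows)
```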

Should I use a vector database or just compute cosine similarity in application code?

For under ~10,000 documents, brute-force cosine similarity in NumPy is fast enough and simpler to operate. As the corpus grows, the ANN indexes in a vector database provide sub-linear search time that brute force cannot match; in practice the crossover where a dedicated vector DB clearly pays off is around 50K-100K vectors.
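The brute-force baseline is only a few lines of NumPy. This sketch (random vectors stand in for real embeddings) returns the indices of the k most similar rows:

```python
import numpy as np

def top_k_cosine(query: np.ndarray, matrix: np.ndarray, k: int) -> np.ndarray:
    # Normalize so that a plain dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = m @ q
    # argpartition finds the k best in O(n); only those k are fully sorted
    idx = np.argpartition(-sims, k)[:k]
    return idx[np.argsort(-sims[idx])]

rng = np.random.default_rng(0)
docs = rng.standard_normal((10_000, 1536))
query = docs[42] + 0.01 * rng.standard_normal(1536)  # slightly perturbed copy of doc 42

print(top_k_cosine(query, docs, 5))  # doc 42 should rank first
```

At 10K vectors this runs in milliseconds; the cost grows linearly with corpus size, which is exactly what ANN indexes avoid.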

Is pgvector production-ready?

Yes. pgvector is used in production at companies of all sizes. With HNSW indexing, it handles millions of vectors with low-millisecond query times. The main limitation is that it runs on a single PostgreSQL node, so if you need distributed vector search across billions of vectors, a purpose-built solution like Pinecone or Weaviate is more appropriate.


#RAG #VectorDatabase #Pgvector #Pinecone #Chroma #Weaviate #AgenticAI #LearnAI #AIEngineering

