Learn Agentic AI

Vector Databases for RAG: Comparing pgvector, Pinecone, Chroma, and Weaviate

A practical comparison of four popular vector databases for RAG — pgvector, Pinecone, Chroma, and Weaviate — covering setup, indexing, query performance, and when to choose each one.

Why Vector Databases Matter for RAG

The retrieval step in RAG depends on finding document chunks whose embedding vectors are closest to the query vector. A vector database is purpose-built for this operation: it stores high-dimensional vectors, builds indexes for fast approximate nearest-neighbor (ANN) search, and returns results in milliseconds even across millions of documents.
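Under the hood, "closest" almost always means cosine similarity. A toy NumPy sketch of the measurement itself, using 4-dimensional stand-ins for real embedding vectors:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # cos(theta) = (a . b) / (||a|| * ||b||)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 0.0, 1.0, 0.0])
b = np.array([1.0, 0.0, 1.0, 0.0])
c = np.array([0.0, 1.0, 0.0, 1.0])

print(cosine_similarity(a, b))  # identical direction -> 1.0
print(cosine_similarity(a, c))  # orthogonal -> 0.0
```

A vector database does exactly this comparison, but against millions of stored vectors at once, using an index instead of a full scan.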

Choosing the right vector database depends on your scale, infrastructure preferences, and operational requirements. This post gives you working code and honest tradeoffs for four leading options.

Option 1: pgvector — Vectors Inside PostgreSQL

pgvector is a PostgreSQL extension that adds vector data types and similarity search operators. If you already run Postgres, this is the lowest-friction path to production RAG.

The retrieval pipeline in question, as a Mermaid flowchart:

flowchart LR
    Q(["User query"])
    EMB["Embed query<br/>text-embedding-3"]
    VEC[("Vector DB<br/>pgvector or Pinecone")]
    RET["Top-k retrieval<br/>k = 8"]
    PROMPT["Augmented prompt<br/>system plus context"]
    LLM["LLM generation<br/>Claude or GPT"]
    CITE["Inline citations<br/>and page anchors"]
    OUT(["Grounded answer"])
    Q --> EMB --> VEC --> RET --> PROMPT --> LLM --> CITE --> OUT
    style EMB fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style VEC fill:#ede9fe,stroke:#7c3aed,color:#1e1b4b
    style LLM fill:#4f46e5,stroke:#4338ca,color:#fff
    style OUT fill:#059669,stroke:#047857,color:#fff
A minimal setup with psycopg2:

import psycopg2
import numpy as np

conn = psycopg2.connect("postgresql://user:pass@localhost/ragdb")
cur = conn.cursor()

# Enable the extension
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")

# Create a table with a vector column
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT NOT NULL,
        metadata JSONB DEFAULT '{}',
        embedding vector(1536)
    )
""")

# Insert a document with its embedding
embedding = np.random.rand(1536).tolist()  # replace with real embedding
cur.execute(
    "INSERT INTO documents (content, metadata, embedding) VALUES (%s, %s, %s)",
    ("Refund policy for enterprise...", '{"source": "policies.md"}', str(embedding))
)

# Query: find 5 nearest neighbors
query_vec = np.random.rand(1536).tolist()
cur.execute("""
    SELECT id, content, 1 - (embedding <=> %s::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> %s::vector
    LIMIT 5
""", (str(query_vec), str(query_vec)))

results = cur.fetchall()
for doc_id, content, sim in results:
    print(f"[{sim:.3f}] {content[:80]}...")

conn.commit()

Create an HNSW index for fast queries at scale:

CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
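The m and ef_construction settings control build-time graph quality. pgvector also exposes a query-time knob, hnsw.ef_search (default 40): larger values scan more of the graph per query, trading latency for recall. It can be set per session:

SET hnsw.ef_search = 100;
-- subsequent ORDER BY embedding <=> ... queries in this session use the new value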

Pros: No new infrastructure — lives in your existing Postgres. Full SQL joins with relational data. ACID transactions. Metadata filtering via standard WHERE clauses.


Cons: Slower than purpose-built vector DBs at very large scale (50M+ vectors). Limited to single-node without partitioning.

Option 2: Pinecone — Fully Managed Cloud Vector DB

Pinecone is a managed service that handles scaling, replication, and index management. You interact through an API — no servers to operate.

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="your-api-key")

# Create the index if it doesn't already exist (create_index errors on duplicates)
if "rag-docs" not in pc.list_indexes().names():
    pc.create_index(
        name="rag-docs",
        dimension=1536,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

index = pc.Index("rag-docs")

# Upsert vectors
index.upsert(vectors=[
    {
        "id": "doc-001",
        "values": embedding_vector,
        "metadata": {"source": "policies.md", "category": "billing"}
    }
])

# Query with metadata filter
results = index.query(
    vector=query_vector,
    top_k=5,
    include_metadata=True,
    filter={"category": {"$eq": "billing"}}
)

for match in results["matches"]:
    print(f"[{match['score']:.3f}] {match['id']} — {match['metadata']}")
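Bulk ingestion is usually batched rather than upserted one vector at a time. A sketch of the chunking step, using a hypothetical 250-document corpus of placeholder zero vectors; the actual upsert call is left commented since it needs a live index:

```python
def batched(items: list, size: int = 100) -> list:
    # Yield successive fixed-size batches from a list
    for i in range(0, len(items), size):
        yield items[i:i + size]

# Hypothetical corpus: (id, vector, metadata) triples
corpus = [(f"doc-{i:03d}", [0.0] * 1536, {"source": "demo"}) for i in range(250)]

batches = list(batched(corpus))
print([len(b) for b in batches])  # [100, 100, 50]

# for batch in batches:
#     index.upsert(vectors=[
#         {"id": i, "values": v, "metadata": m} for i, v, m in batch
#     ])
```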

Pros: Zero infrastructure management. Scales to billions of vectors. Built-in metadata filtering. SOC2 compliant.

Cons: Vendor lock-in. Network latency for every query. Monthly costs grow with scale. Data leaves your network.

Option 3: Chroma — Open-Source and Embedded

Chroma is an open-source embedding database designed for simplicity. It can run in-process (embedded) or as a client-server deployment.

import os

import chromadb
from chromadb.utils import embedding_functions

# In-memory or persistent
client = chromadb.PersistentClient(path="./chroma_db")

# Use OpenAI embeddings automatically
ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small"
)

collection = client.get_or_create_collection(
    name="documents",
    embedding_function=ef,
    metadata={"hnsw:space": "cosine"}
)

# Add documents — Chroma embeds them automatically
collection.add(
    ids=["doc-001", "doc-002"],
    documents=["Refund policy for enterprise plans...", "Billing cycle details..."],
    metadatas=[{"source": "policies.md"}, {"source": "billing.md"}]
)

# Query
results = collection.query(
    query_texts=["What is the refund policy?"],
    n_results=5,
    where={"source": "policies.md"}
)

for doc, dist in zip(results["documents"][0], results["distances"][0]):
    print(f"[{1-dist:.3f}] {doc[:80]}...")

Pros: Simple API. Embedded mode has zero network overhead. Free and open-source. Great for prototyping.

Cons: Single-node only in embedded mode. Limited production tooling (no built-in backups, monitoring). Performance degrades past a few million vectors.


Option 4: Weaviate — Hybrid Search Built-In

Weaviate is an open-source vector database that natively supports both vector search and keyword (BM25) search, making hybrid retrieval straightforward.

import weaviate
import weaviate.classes.config as wc

client = weaviate.connect_to_local()  # or connect_to_weaviate_cloud()

# Define a collection with vectorizer
collection = client.collections.create(
    name="Document",
    vectorizer_config=wc.Configure.Vectorizer.text2vec_openai(
        model="text-embedding-3-small"
    ),
    properties=[
        wc.Property(name="content", data_type=wc.DataType.TEXT),
        wc.Property(name="source", data_type=wc.DataType.TEXT),
    ]
)

# Insert — Weaviate auto-embeds the content
collection.data.insert({"content": "Refund policy...", "source": "policies.md"})

# Hybrid search (vector + keyword); scores must be requested explicitly
results = collection.query.hybrid(
    query="refund policy enterprise",
    alpha=0.7,  # 0 = pure keyword, 1 = pure vector
    limit=5,
    return_metadata=weaviate.classes.query.MetadataQuery(score=True),
)

for obj in results.objects:
    print(f"[{obj.metadata.score:.3f}] {obj.properties['content'][:80]}...")

client.close()
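The alpha parameter above weights the two retrievers against each other. With Weaviate's relative-score fusion, the blend is roughly a weighted sum of the normalized per-retriever scores; a toy sketch of that arithmetic (the scores are made-up normalized values, not real Weaviate output):

```python
def hybrid_score(vec_score: float, bm25_score: float, alpha: float = 0.7) -> float:
    # alpha = 1.0 -> pure vector search, alpha = 0.0 -> pure keyword (BM25)
    return alpha * vec_score + (1 - alpha) * bm25_score

# A doc that matches semantically but shares few exact keywords
print(hybrid_score(vec_score=0.9, bm25_score=0.2))
# A doc with an exact keyword hit but a weak semantic match
print(hybrid_score(vec_score=0.3, bm25_score=0.95))
```

At alpha = 0.7 the first document wins: the semantic signal dominates, but the keyword hit still contributes.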

Pros: Native hybrid search. Auto-vectorization. Multi-tenancy support. Active open-source community.

Cons: Heavier operational footprint than an embedded library. Steeper learning curve. Runs as a standalone Go server process with its own memory footprint to provision and monitor.

Quick Comparison Table

Feature         pgvector         Pinecone             Chroma         Weaviate
Hosting         Self-managed     Managed              Either         Either
Hybrid search   Manual BM25      Keyword filter only  No             Native
Max scale       ~10M vectors     Billions             ~5M vectors    ~100M vectors
Best for        Postgres shops   Zero-ops teams       Prototyping    Hybrid search

FAQ

Can I start with Chroma and migrate to Pinecone or pgvector later?

Yes. Your embedding vectors are portable — they are just arrays of floats. Export them from Chroma and import into any other vector store. The main migration effort is adapting your query code and metadata schema to the target system's API.
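As a concrete sketch of that migration path, here is the reshaping step from a Chroma export (via `collection.get(include=["embeddings", "documents", "metadatas"])`) into rows for the pgvector `documents` table from Option 1. The export dict below is hand-written sample data standing in for a real export:

```python
import json

# Shape of a Chroma export: parallel lists keyed by field
export = {
    "ids": ["doc-001", "doc-002"],
    "embeddings": [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    "documents": ["Refund policy...", "Billing cycle..."],
    "metadatas": [{"source": "policies.md"}, {"source": "billing.md"}],
}

# Reshape into (content, metadata_json, vector_literal) rows for a pgvector INSERT
rows = [
    (doc, json.dumps(meta), "[" + ",".join(str(x) for x in emb) + "]")
    for doc, meta, emb in zip(export["documents"], export["metadatas"], export["embeddings"])
]

print(rows[0])
# ('Refund policy...', '{"source": "policies.md"}', '[0.1,0.2,0.3]')

# Then, against the pgvector table:
# cur.executemany(
#     "INSERT INTO documents (content, metadata, embedding) VALUES (%s, %s, %s::vector)",
#     rows)
```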

Should I use a vector database or just compute cosine similarity in application code?

For under ~10,000 documents, brute-force cosine similarity in NumPy is fast enough and simpler to operate. As the corpus grows, the ANN indexes in a vector database provide sub-linear search time that brute force cannot match; in practice the crossover where a dedicated vector DB clearly pays off is around 50K-100K vectors.
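The brute-force baseline is only a few lines of NumPy. This sketch (random vectors stand in for real embeddings) returns the indices of the k most similar rows:

```python
import numpy as np

def top_k_cosine(query: np.ndarray, matrix: np.ndarray, k: int) -> np.ndarray:
    # Normalize so that a plain dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    m = matrix / np.linalg.norm(matrix, axis=1, keepdims=True)
    sims = m @ q
    # argpartition finds the k best in O(n); only those k are fully sorted
    idx = np.argpartition(-sims, k)[:k]
    return idx[np.argsort(-sims[idx])]

rng = np.random.default_rng(0)
docs = rng.standard_normal((10_000, 1536))
query = docs[42] + 0.01 * rng.standard_normal(1536)  # slightly perturbed copy of doc 42

print(top_k_cosine(query, docs, 5))  # doc 42 should rank first
```

At 10K vectors this runs in milliseconds; the cost grows linearly with corpus size, which is exactly what ANN indexes avoid.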

Is pgvector production-ready?

Yes. pgvector is used in production at companies of all sizes. With HNSW indexing, it handles millions of vectors with low-millisecond query times. The main limitation is that it runs on a single PostgreSQL node, so if you need distributed vector search across billions of vectors, a purpose-built solution like Pinecone or Weaviate is more appropriate.


#RAG #VectorDatabase #Pgvector #Pinecone #Chroma #Weaviate #AgenticAI #LearnAI #AIEngineering

