Cost Math for Vector Databases at Scale: Storage, Compute, and Egress
Per-vector cost economics matter at scale. This article covers the 2026 numbers for storage, compute, and egress, and how to model total cost of ownership (TCO).
What Costs Money in Vector DBs
Three line items:
- Storage (the vectors and the index)
- Compute (queries, inserts, indexing)
- Egress (data transfer out of the cloud)
Plus operational overhead: monitoring, backups, ops staff. At small scale these are noise. At 100M+ vectors they decide whether the project is viable.
The Storage Math
A 1024-dim float32 vector is 4 KB (1024 values × 4 bytes). HNSW graph overhead typically brings the total footprint to 2-3x the raw vectors; the figures below use 3x:
- 1M vectors: ~12 GB
- 10M: ~120 GB
- 100M: ~1.2 TB
- 1B: ~12 TB
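The arithmetic behind these figures can be sketched in a few lines. This is a rough estimate, not a sizing tool: the function name and the `total_multiplier` parameter are illustrative, and the 3x multiplier is simply the upper end of the overhead range quoted above.

```python
def index_footprint_gb(num_vectors: int,
                       dims: int = 1024,
                       bytes_per_value: int = 4,
                       total_multiplier: float = 3.0) -> float:
    """Estimate total footprint (raw vectors plus index) in decimal GB.

    total_multiplier is an assumption: total size as a multiple of the
    raw vectors. Real overhead depends on the index parameters.
    """
    raw_bytes = num_vectors * dims * bytes_per_value
    return raw_bytes * total_multiplier / 1e9


for n in (1_000_000, 10_000_000, 100_000_000, 1_000_000_000):
    print(f"{n:>13,} vectors: ~{index_footprint_gb(n):,.0f} GB")
```

The results land slightly above the rounded figures in the list because 1024 × 4 bytes is 4,096 bytes rather than an even 4,000.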
Quantization changes these:
- int8: divide by ~3
- binary: divide by ~30
- Matryoshka 512: divide by ~2
For a 100M-vector corpus with int8 quantization, the footprint drops from ~1.2 TB to roughly 400 GB (1.2 TB / 3), which is manageable on a single large node.
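As a quick sketch, the divisors above can be applied to any baseline footprint. The dictionary keys and function name are illustrative, and the divisors are the approximate figures from this section rather than measured values.

```python
# Approximate divisors from the list above, applied to the total footprint.
QUANTIZATION_DIVISORS = {
    "float32": 1.0,         # baseline, no reduction
    "int8": 3.0,
    "binary": 30.0,
    "matryoshka_512": 2.0,  # truncating 1024 dims to 512
}


def quantized_footprint_gb(base_gb: float, scheme: str) -> float:
    """Divide a baseline footprint by the approximate divisor for a scheme."""
    return base_gb / QUANTIZATION_DIVISORS[scheme]


base_gb = 1200.0  # ~1.2 TB for 100M vectors, from the storage figures above
for scheme in QUANTIZATION_DIVISORS:
    print(f"{scheme:>15}: ~{quantized_footprint_gb(base_gb, scheme):,.0f} GB")
```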
The Compute Math
Vector queries are CPU/GPU-bound on the HNSW traversal. Cost depends on:
- Index size (in RAM)
- Query rate
- Top-K
- Reranking compute
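For 1000 QPS on a 10M-vector HNSW index in 2026, a typical 16-core, 64GB-RAM instance suffices, provided the index fits in RAM. By the storage math above, that implies quantization (int8 brings ~120 GB down to ~40 GB). Cost: hundreds of dollars per month on cloud, less on dedicated hardware.
For 10x that query rate, you typically need horizontal scaling: replicas, not bigger nodes.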
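A back-of-envelope replica count follows directly from that rule of thumb. In this sketch, `qps_per_node` defaults to the 1000 QPS figure above, while `node_monthly_usd` is a hypothetical placeholder in the "hundreds of dollars" range, not a quoted price.

```python
import math


def replicas_needed(target_qps: float,
                    qps_per_node: float = 1000.0,
                    ha_minimum: int = 1) -> int:
    """Number of identical replicas needed to serve target_qps.

    ha_minimum sets a floor on the count, e.g. 2 for high availability.
    """
    return max(ha_minimum, math.ceil(target_qps / qps_per_node))


def monthly_compute_usd(target_qps: float,
                        node_monthly_usd: float = 500.0,  # hypothetical price
                        qps_per_node: float = 1000.0) -> float:
    """Compute cost when every replica is the same instance type."""
    return replicas_needed(target_qps, qps_per_node) * node_monthly_usd


print(replicas_needed(10_000))      # replicas for 10x the baseline query rate
print(monthly_compute_usd(10_000))  # cost under the placeholder node price
```

Every replica holds a full copy of the index, so storage cost scales with the replica count as well.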
The Egress Math
Cloud providers charge for egress. If your vector DB is in cloud A and your application is in cloud B, every query result incurs a transfer charge.
Mitigations:
- Co-locate vector DB and application in the same region
- Use private connectivity (PrivateLink, Interconnect) for cross-region traffic
- Process at the vector DB and return only summaries
For high-volume systems, egress can be 20-40 percent of vector DB costs.
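To see how quickly egress adds up, here is a minimal estimate of monthly transfer cost from query volume and payload size. The function and parameter names are illustrative, and the `usd_per_gb` default is a placeholder assumption; check your provider's actual rate.

```python
SECONDS_PER_MONTH = 30 * 24 * 3600


def monthly_egress_usd(qps: float,
                       top_k: int,
                       bytes_per_result: int,
                       usd_per_gb: float = 0.09) -> float:  # assumed rate
    """Estimate monthly egress cost for query results leaving the cloud."""
    bytes_per_month = qps * top_k * bytes_per_result * SECONDS_PER_MONTH
    return bytes_per_month / 1e9 * usd_per_gb


# Returning full vectors plus metadata (~5 KB per hit, an assumed payload):
full = monthly_egress_usd(qps=1000, top_k=10, bytes_per_result=5000)

# Returning only IDs and scores (~100 bytes per hit, also assumed):
slim = monthly_egress_usd(qps=1000, top_k=10, bytes_per_result=100)

print(f"full payloads: ${full:,.0f}/mo, slim payloads: ${slim:,.0f}/mo")
```

The gap between the two cases is why returning summaries or IDs instead of full payloads is listed as a mitigation.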
Cost Curves by Scale
- 1M vectors: ~$50/mo cloud
- 10M vectors: ~$500/mo cloud
- 100M vectors: ~$3-8K/mo cloud
- 1B vectors: ~$30-100K/mo cloud
Numbers vary widely by provider and configuration. The shape is what matters: cost scales roughly linearly with vector count while the index fits in RAM, then jumps when you cross a hardware boundary.
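The jump at hardware boundaries can be illustrated with a simple step function: once the index no longer fits in one node's RAM, you pay for another node. Both `usable_ram_gb_per_node` and `node_monthly_usd` are hypothetical values chosen for illustration, and the model ignores replicas, storage tiers, and managed-service pricing.

```python
import math


def nodes_for_index(index_gb: float,
                    usable_ram_gb_per_node: float = 48.0) -> int:
    """Nodes needed to hold the index in RAM.

    48 GB usable is an assumed figure for a 64 GB machine after OS and
    process overhead.
    """
    return max(1, math.ceil(index_gb / usable_ram_gb_per_node))


def monthly_cost_usd(index_gb: float,
                     node_monthly_usd: float = 500.0,  # hypothetical price
                     usable_ram_gb_per_node: float = 48.0) -> float:
    """Monthly node cost under the placeholder price."""
    return nodes_for_index(index_gb, usable_ram_gb_per_node) * node_monthly_usd


for gb in (40, 48, 49, 120, 400):
    print(f"{gb:>4} GB index: {nodes_for_index(gb)} node(s), "
          f"${monthly_cost_usd(gb):,.0f}/mo")
```

Going from 48 GB to 49 GB doubles the cost in this model, which is the kind of discontinuity to look for when forecasting growth.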
Self-Hosted vs Managed
Managed vector DBs (Pinecone, Qdrant Cloud, Weaviate Cloud) are easy but more expensive at scale. The 2026 crossover for most workloads:
- Up to ~10M vectors: managed wins on ergonomics
- 10M-100M: depends on team capability
- 100M+: self-hosted is typically substantially cheaper
Self-hosted requires monitoring, backup, and incident response — real ops cost.
Hidden Costs
Beyond the headline:
- Re-embedding when the model upgrades (compute + egress)
- Backups (storage cost ~ 1-3x of primary)
- Replicas (multiply primary cost)
- Multi-region (multiply primary cost; egress between regions)
- Compliance (BAA, residency, audits)
For a typical mid-sized deployment, hidden costs add 30-100 percent to the headline cost.
TCO Modeling
For a credible TCO model:
- Vector storage cost
- Index overhead (1-3x storage)
- Replicas (typically 2-3 for HA)
- Backup storage (1-3x primary)
- Compute for queries (peak QPS × hours)
- Egress (per-query × volume)
- Re-embedding per year (corpus size × frequency)
- Operational labor (10-20 percent of compute cost)
Forecast over 3 years for the right capex/opex picture.
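The components above can be combined into a small model. This is a sketch under stated assumptions, not a definitive formula: every field name is illustrative, all prices in the example are hypothetical placeholders, index overhead is treated as additional to raw storage, and compute is assumed to be the total across replicas.

```python
from dataclasses import dataclass


@dataclass
class TcoInputs:
    raw_vector_gb: float             # raw vector storage
    index_overhead_x: float          # overhead as a multiple of raw (1-3x)
    replicas: int                    # typically 2-3 for HA
    backup_x: float                  # backup storage, multiple of primary (1-3x)
    storage_usd_per_gb_month: float  # hypothetical unit price
    compute_usd_per_month: float     # total across replicas (assumption)
    egress_usd_per_month: float
    reembed_usd_per_run: float       # cost to re-embed the full corpus once
    reembeds_per_year: float
    ops_labor_fraction: float        # 0.10-0.20 of compute cost


def annual_tco_usd(x: TcoInputs) -> float:
    """Sum the yearly cost of each TCO component."""
    primary_gb = x.raw_vector_gb * (1 + x.index_overhead_x)
    stored_gb = primary_gb * x.replicas + primary_gb * x.backup_x
    storage = stored_gb * x.storage_usd_per_gb_month * 12
    compute = x.compute_usd_per_month * 12
    egress = x.egress_usd_per_month * 12
    reembedding = x.reembed_usd_per_run * x.reembeds_per_year
    labor = compute * x.ops_labor_fraction
    return storage + compute + egress + reembedding + labor


# Example with placeholder numbers for a 100M-vector corpus.
example = TcoInputs(
    raw_vector_gb=400.0, index_overhead_x=2.0, replicas=2, backup_x=1.0,
    storage_usd_per_gb_month=0.10, compute_usd_per_month=5000.0,
    egress_usd_per_month=1000.0, reembed_usd_per_run=2000.0,
    reembeds_per_year=1.0, ops_labor_fraction=0.15,
)
print(f"annual: ${annual_tco_usd(example):,.0f}")
print(f"3-year: ${3 * annual_tco_usd(example):,.0f}")
```

Multiplying a single year by three assumes a flat corpus. For a growing corpus, recompute the inputs per year and sum the results.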
Cost-Reduction Levers
- Quantization (roughly 3-30x storage reduction, depending on the scheme)
- Matryoshka truncation (2-4x reduction)
- Hot/cold tiering (cold tier on cheaper storage)
- Read replicas instead of larger primaries
- Co-location to eliminate egress
- Caching at the application layer (avoids repeated queries)
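Of these levers, application-layer caching is the easiest to sketch. The example below wraps a stand-in query function with `functools.lru_cache` from the standard library. `search_vector_db` is a hypothetical placeholder for a real client call, and the module-level counter exists only to make the saved calls visible.

```python
from functools import lru_cache

backend_calls = 0  # counts how often the (stand-in) vector DB is hit


def search_vector_db(query: str, top_k: int) -> tuple[str, ...]:
    """Hypothetical stand-in for a real vector DB client call."""
    global backend_calls
    backend_calls += 1
    return tuple(f"doc-{i}-for-{query}" for i in range(top_k))


@lru_cache(maxsize=10_000)
def cached_search(query: str, top_k: int = 5) -> tuple[str, ...]:
    """Return cached results for repeated (query, top_k) pairs."""
    return search_vector_db(query, top_k)


cached_search("refund policy")
cached_search("refund policy")   # served from the cache
cached_search("shipping times")
print(f"backend calls: {backend_calls}")
print(cached_search.cache_info())
```

Cached results go stale when the index changes, so a real deployment would pair this with a TTL or clear the cache after writes (`cached_search.cache_clear()`).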
What CallSphere Spends
For our blog dedup system on pgvector with ~3K vectors, the cost is essentially zero (covered by the existing Postgres instance). For our agent memory layer at higher scale, we run Qdrant on a dedicated VM — costs in the low hundreds per month.
For the volumes most teams operate at, vector DB cost is a minor line item. It becomes major only at very large scale.