pgvector vs Pinecone: A 90-Day Production Benchmark (2026)
4.8M documents, 110k daily queries, real users. The cost and latency numbers, not the marketing.
Table of Contents
- The Workload
- The Systems Under Test
- Cost Breakdown
- Latency
- Update Latency: Where pgvector Wins
- Operational Complexity: Where Pinecone Wins
- When Pinecone Genuinely Wins
- When pgvector Genuinely Wins
- The Wildcards: Qdrant, Weaviate, Turbopuffer
- The Pinecone Investor Question
- Migration Path, In Case You're Already on Pinecone
- Tuning pgvector for the Specific Workload Matters
- A Note on Embedding Model Choice
- So Which One Should You Pick?
Pinecone's leaked S-1 in April 2026 surfaced what most infrastructure engineers already suspected: enterprise revenue is concentrated in 200-some accounts, growth has flattened, and pgvector has eaten the long tail. I ran a 90-day production benchmark across both systems on the same workload (semantic search over a 4.8M-document content corpus with 110k daily queries), and the cost-performance gap is not what marketing pages claim.
This is what we actually measured, the failure modes we hit, and where each system genuinely wins.
The Workload
The benchmark target was a content recommendation system for a media platform: a corpus of articles with OpenAI text-embedding-3-small embeddings (1536 dimensions), updated continuously as new content is published. Search queries are user-context vectors. Latency budget: 95ms p95 end-to-end. Update budget: new article searchable within 60 seconds of publish.
Test setup:
- 4,832,114 documents at peak corpus size.
- 110,400 search queries/day average.
- 1,400 new documents/day.
- 800 document updates/day.
- 50 concurrent query workers at peak.
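For orientation, here's a minimal sketch of what this workload implies on the Postgres side. The column names are illustrative, not our production schema, which carries considerably more metadata:

```sql
-- Minimal illustrative schema (column names are hypothetical).
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE articles (
    id           bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    title        text NOT NULL,
    category     text NOT NULL,
    language     text NOT NULL,
    status       text NOT NULL DEFAULT 'draft',  -- 'draft' or 'published'
    published_at timestamptz,
    embedding    vector(1536) NOT NULL           -- text-embedding-3-small
);
```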
We ran each system in production for 30 days each, with a 30-day overlap period where we A/B'd traffic. Real users, real content, real money. The product surface was identical: a "More like this" widget at the bottom of every article, identical to the architecture described in browser AI's impact on edge function bills, except instead of running inference at the edge we were running similarity search at the data layer.
The Systems Under Test
Pinecone: serverless tier, US-East-1, p1.x1 indexes. Auto-scaling enabled. Two indexes (corpus + user-context), dotproduct metric.
pgvector: Neon Postgres, scale plan, pgvector 0.8.0. HNSW index (m=16, ef_construction=64). Same metric. Single Neon project, autoscaling 2-8 CU.
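In pgvector terms, that index spec translates to roughly this DDL (assuming the illustrative `articles` table above; `vector_ip_ops` is the operator class matching Pinecone's dotproduct metric):

```sql
-- HNSW index with the test parameters, using the inner-product
-- operator class to mirror Pinecone's dotproduct metric.
CREATE INDEX articles_embedding_hnsw
    ON articles
    USING hnsw (embedding vector_ip_ops)
    WITH (m = 16, ef_construction = 64);
```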
Both systems served traffic from a Next.js API route on Vercel edge functions, with identical caching and identical reranking via Cohere Rerank 3.5. Same recall-tuning approach. Same observability stack (PostHog + Honeycomb).
Cost Breakdown
This is the headline:
| Cost Component | Pinecone | pgvector (Neon) |
|---|---|---|
| Storage (4.8M vectors) | $487/mo | included in DB |
| Query operations (3.3M/mo) | $1,260/mo | included in DB |
| Database compute | n/a | $340/mo |
| Database storage | $98/mo (metadata) | $190/mo (everything) |
| Bandwidth | $42/mo | $18/mo |
| **Total monthly** | **$1,887** | **$548** |
Pinecone cost 3.4x what pgvector cost for this workload. The ratio holds roughly down to 500k vectors and roughly up to 20M vectors. Above 50M vectors the equation changes. pgvector requires more careful tuning and the Neon compute cost starts to scale faster than Pinecone's tier pricing.
A few things to note about the comparison. The Neon storage number is higher because it includes the full article metadata, not just the vectors. If you have similar metadata stored elsewhere with Pinecone, the gap narrows. The Pinecone bandwidth was higher mostly because of larger response payloads from the metadata-filter API. Pinecone returns metadata inline by default, while pgvector lets us project only the columns we need.
We also ran a sanity check on Pinecone's reserved-capacity pricing, which is marketed at "up to 60% savings" but in practice required a 12-month commit. For our workload, even reserved capacity put Pinecone at $1,140/month. Still more than 2x pgvector and with significantly less flexibility.
Latency
Pinecone wins here, but by less than the marketing implies.
| Latency | Pinecone p50 | Pinecone p95 | pgvector p50 | pgvector p95 |
|---|---|---|---|---|
| Vector search only | 18ms | 47ms | 23ms | 71ms |
| End-to-end (with rerank, cached) | 38ms | 84ms | 41ms | 93ms |
| End-to-end (with rerank, uncached) | 71ms | 142ms | 78ms | 156ms |
Pinecone is roughly 20% faster at the vector layer. At the end-to-end level, both fit comfortably in a sub-100ms p95 budget. The difference for users is invisible. Both feel instant.
The end-to-end p95 of 156ms on pgvector pushed us briefly over our 150ms internal SLO, which forced two optimizations:
- Pre-computing user-context vectors on session start instead of on each query.
- Pulling the rerank step out of the critical path for the top 80% of cached queries.
After those two fixes, p95 was 91ms on pgvector. Within budget without needing to switch.
Worth noting: latency on both systems degrades with restrictive metadata filters. Pinecone handles filters better at extreme cardinality (10M+ distinct filter values) because its query engine maintains specialized filter indexes. For our workload, filters were coarse (category, language, publish date range) and both systems handled them equally well.
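For reference, the coarse-filter query on the pgvector side looked roughly like this (`:query_embedding` and the literal filter values are placeholders):

```sql
-- Top-k similar published articles, restricted by coarse filters.
-- <#> is pgvector's negative-inner-product operator, so ascending
-- order returns the highest dot-product matches first.
SELECT id, title
FROM articles
WHERE status = 'published'
  AND category = 'technology'
  AND language = 'en'
  AND published_at >= now() - interval '90 days'
ORDER BY embedding <#> :query_embedding
LIMIT 20;
```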
Update Latency: Where pgvector Wins
The forgotten dimension of vector DB choice is how fast new data becomes searchable. We measured the time from "INSERT or upsert to vector DB" to "vector appears in top-k results for a relevant query."
Pinecone: 4-15 seconds typical, occasionally 60+ seconds during their serverless scaling events. We had three incidents during the test month where freshness lagged for 5+ minutes.
pgvector with HNSW: under 100ms. The vector is searchable as soon as the transaction commits.
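There's no separate indexing pipeline to wait on; the write and the read are plain Postgres statements. A sketch, with `:embedding` placeholders standing in for real vectors:

```sql
-- Insert a freshly published article; the HNSW entry is built
-- inside the same transaction, so it is visible on COMMIT.
INSERT INTO articles (title, category, language, status, published_at, embedding)
VALUES ('New article', 'technology', 'en', 'published', now(), :embedding);

-- A query issued immediately afterwards can already return it.
SELECT id, title
FROM articles
ORDER BY embedding <#> :user_context_embedding
LIMIT 20;
```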
For a publication where editorial wants articles searchable the moment they hit publish, this matters. For a static catalog updated nightly, it doesn't. The product team at the media client specifically cited freshness as the deciding factor when we presented both options. They'd been burned by delayed indexing in their previous Algolia setup and wanted no surprises.
Operational Complexity: Where Pinecone Wins
I want to be honest about this. pgvector running in production is more work than Pinecone serverless. Specific things we had to deal with:
- Index maintenance. The HNSW index grew faster than we expected. We rebuilt it once during the test, which required a 4-minute write-pause window. With Pinecone you never think about this.
- Vacuum and bloat. Heavy update workloads on a vector column cause heap bloat. We tuned `autovacuum_vacuum_scale_factor` to 0.05 for the article table (see the sketch after this list). With Pinecone you never think about this.
- Connection pooling. Neon's serverless adapter handles this, but we had to switch from the standard adapter to the pooled adapter mid-test when a traffic spike exhausted connections. Pinecone scales transparently.
- Query planning. Postgres's planner sometimes picks the wrong path between the HNSW index and a sequential scan when filters are restrictive. We `SET enable_seqscan = OFF` on certain query paths (also sketched below). With Pinecone you never write SQL at all.
- Backup and restore. Backing up an HNSW index is cheap; restoring is not. pg_restore rebuilds the index from scratch, which took 22 minutes on our 4.8M-row table. For DR scenarios this is acceptable but worth planning for.
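Two of those fixes are one-liners worth showing (a sketch; the table name matches the illustrative schema above):

```sql
-- More aggressive autovacuum on the heavily updated vector table:
-- trigger at 5% dead tuples instead of the 20% default.
ALTER TABLE articles
    SET (autovacuum_vacuum_scale_factor = 0.05);

-- Force the planner off sequential scans for one query path,
-- scoped to a single transaction so nothing leaks.
BEGIN;
SET LOCAL enable_seqscan = off;
-- ... the filtered similarity query from earlier ...
COMMIT;
```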
If you're a three-engineer team without a database operator, the operational cost of pgvector is real. Plan for one engineer-day a month of tuning at meaningful scale. The flip side: most of that engineer-day is reusable knowledge. Postgres skills transfer to a hundred other problems in your stack. Pinecone skills transfer to Pinecone.
When Pinecone Genuinely Wins
After all this, I still recommend Pinecone for some teams:
- Massive scale: more than 100M vectors, especially with frequent metadata filters.
- Multi-tenant SaaS where you need per-tenant indexes with isolation guarantees Postgres doesn't naturally provide.
- Teams without a DB engineer who'd rather pay 3x for managed reliability.
- Workloads where freshness doesn't matter but absolute query latency does.
- Sparse-vector hybrid search at high volume, where Pinecone's serverless hybrid index outperforms what's currently easy to do in pgvector (though the gap is narrowing with the pg_trgm + pgvector hybrid patterns that landed in Postgres 17).
When pgvector Genuinely Wins
For most teams I work with, pgvector wins on:
- Cost: 3-5x cheaper at the 100k-50M vector scale.
- Freshness: instant search-after-insert.
- Operational simplicity, if you already run Postgres.
- Transactional consistency: your vector and your metadata commit in the same transaction.
- Joins: SQL joins between vector search results and your business tables are vastly simpler than the metadata-filter pattern Pinecone forces you into (see the sketch after this list).
- Privacy: your data never leaves your database. For some compliance regimes (notably anything touching HIPAA-protected health information or EU AI Act-regulated decisioning) this is non-negotiable.
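The join point deserves a concrete example. With pgvector, combining similarity search with business data is a single query. A sketch against the illustrative schema above, assuming a hypothetical `authors` table and an `author_id` column:

```sql
-- Similar articles joined against a business table in one query.
-- With a standalone vector DB this is two round-trips plus
-- client-side stitching of IDs.
SELECT a.id, a.title, au.name AS author_name
FROM articles AS a
JOIN authors  AS au ON au.id = a.author_id
WHERE a.status = 'published'
  AND au.is_partner            -- hypothetical business-side filter
ORDER BY a.embedding <#> :query_embedding
LIMIT 20;
```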
The Wildcards: Qdrant, Weaviate, Turbopuffer
We didn't test these in production but we have data from short tests.
Qdrant: strong feature set, especially around hybrid search and filtering. Open-source. Cost competitive with pgvector if self-hosted. Their cloud offering is reasonable. Particularly strong for image and multimodal search where you need sparse-and-dense hybrid retrieval.
Weaviate: historically the strongest hybrid search story. Cloud pricing has come down since their March 2026 repricing. Their built-in modules for re-ranking and generative search make it appealing for teams that don't want to assemble the pipeline.
Turbopuffer: the newest entrant, S3-backed, very cheap at large scale, latency higher than the others. We've seen teams use it for cold-storage of long-tail vectors with a warm-cache pgvector layer for the top 1%.
If you're starting from scratch and Postgres isn't already in your stack, Qdrant is the strongest non-Postgres option in my view. If Postgres is already in your stack, the case to add another vector database for most workloads is hard to make.
The Pinecone Investor Question
Pinecone's IPO speculation began in late 2025 after they tried and failed to raise at a $5B valuation. The leaked S-1 in April showed slowing growth and customer concentration. If you're betting infrastructure on a vendor whose financials look fragile, that's a risk factor independent of technical fit.
This isn't a Pinecone-is-doomed call. Their tech is solid and their customer base is sticky. But if a vendor goes through an acquisition or aggressive cost-cutting, expect price changes, expect feature freezes, expect the kind of mild enshittification that makes you wish you'd kept your data in a database you control. Teams that keep their vector data in Postgres pay an option premium (operational work) for the freedom to migrate cheaply if the market shifts.
Migration Path, In Case You're Already on Pinecone
If you're considering moving off Pinecone, the migration is more tractable than it looks. We did a parallel test on a different client (12M vectors) in five engineering days:
- Stand up the pgvector instance with the same dimensionality.
- Backfill via Pinecone's `fetch` API in batches of 100. Roughly six hours wall-clock for 12M vectors.
- Set up dual-write to both systems for new vectors.
- Shadow-read from pgvector and compare top-k overlap against Pinecone for one week (the overlap query is sketched after this list).
- Cut over once overlap is stable above 95% (it was 97.4% on the first run for us).
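For the shadow-read step we logged both systems' top-20 into one table and computed overlap in SQL. A sketch, assuming a hypothetical `shadow_results(query_id, system, doc_id)` log table:

```sql
-- Per-query top-20 overlap between the two systems.
-- 'shadow_results' is a hypothetical logging table: one row per
-- (query, system, returned document).
SELECT p.query_id,
       count(g.doc_id)::numeric / 20 AS overlap_at_20
FROM shadow_results AS p
LEFT JOIN shadow_results AS g
       ON g.query_id = p.query_id
      AND g.doc_id   = p.doc_id
      AND g.system   = 'pgvector'
WHERE p.system = 'pinecone'
GROUP BY p.query_id;
```

Averaging this over a week of traffic is what produced the 97.4% figure above.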
The lock-in is operational habit, not data lock-in. Pinecone exports cleanly. Don't let the migration cost narrative scare you out of doing the math.
Tuning pgvector for the Specific Workload Matters
A subtle point: pgvector's defaults are conservative. The single biggest performance lever we found was ef_search at query time, the HNSW search-effort parameter. Defaults to 40. We raised it to 80 for our hot query path and recall went from 94% to 99% with only a 6ms latency cost. For a recommendation widget where users notice missing-but-relevant items more than they notice latency, that trade-off was obviously worth it.
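In pgvector that's a one-line session setting (the GUC is named `hnsw.ef_search`):

```sql
-- Raise HNSW search effort on the hot query path.
-- Default is 40; 80 bought us 94% -> 99% recall for ~6ms of latency.
SET hnsw.ef_search = 80;
```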
Other tuning that moved the needle:
- `maintenance_work_mem = '4GB'` during index builds. The default is 64MB, which makes initial index creation glacial (see the DDL sketch after this list).
- `SET enable_seqscan = OFF` on specific query paths where the planner kept choosing a sequential scan over the HNSW index. We later wrapped this in a Postgres function so application code doesn't need to set it manually.
- Partial indexes for filtered subsets. We have one HNSW index over published-only articles and another over all articles. Queries that always filter on status hit the partial index and skip a redundant filter step. Roughly 12ms saved per query.
- Quantization. pgvector 0.8.0 added quantization support, and we're testing scalar quantization (int8) for the long tail of our corpus. It halves storage with negligible recall loss for our use case.
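The first and third items translate directly to DDL. A sketch against the illustrative schema above:

```sql
-- Index builds: the 64MB default for maintenance_work_mem makes
-- large HNSW builds glacial.
SET maintenance_work_mem = '4GB';

-- Partial HNSW index over published articles only, so queries that
-- always filter on status skip the redundant filter step.
CREATE INDEX articles_published_embedding_hnsw
    ON articles
    USING hnsw (embedding vector_ip_ops)
    WITH (m = 16, ef_construction = 64)
    WHERE status = 'published';
```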
If you're going to invest in pgvector, plan to spend two engineer-days getting these dials right. The defaults are safe, not optimal.
A Note on Embedding Model Choice
Independent of vector DB choice, the embedding model matters more than people credit. We initially used OpenAI's text-embedding-3-large (3072 dimensions). Switching to text-embedding-3-small (1536 dimensions) halved storage, cut latency 30%, and reduced recall by less than 1% on our evaluation set. For most production workloads, the smaller embedding wins unless you've measurably shown otherwise on your specific corpus.
Cohere's embed-english-v3 is also worth testing. We A/B'd it against OpenAI for two weeks on a different client and got marginally better recall at slightly higher cost. The right answer is to run your own eval. There's no universal best embedding model, and the choice interacts with your domain in ways that benchmark tables never capture.
So Which One Should You Pick?
For most teams in the 100k-50M vector range, which is most teams, pgvector on a managed Postgres host gives you 90% of Pinecone's performance at 30% of the cost, with better freshness and simpler joins. Pinecone is still the right call for very large multi-tenant SaaS and teams without a database engineer. Everyone else should default to Postgres and stop paying the vector-DB premium. The benchmark math has moved decisively in pgvector's favor since the 0.8.0 release, and there's no version of the Pinecone roadmap I've seen that closes the cost gap. Run your own benchmark on your own workload, but the headline number, $548 vs $1,887 for identical product behavior, is the kind of difference that pays for a Senior Database Engineer's time many times over.