Vector Databases in 2026: pgvector, Pinecone, and the Rise of Hybrid Search
The RAG Revolution Needed a New Kind of Storage
Retrieval-Augmented Generation (RAG) changed how we build AI applications. Instead of baking all knowledge into a model’s weights, you retrieve relevant documents at inference time. Suddenly, every team building AI products needed to store and query high-dimensional vectors efficiently.
The vector database market exploded: Pinecone, Weaviate, Qdrant, Milvus, Chroma, LanceDB. And then pgvector, an open-source extension, quietly brought vector search to PostgreSQL, and a lot of teams reconsidered whether they needed a new database at all.
How Vector Search Works
Before comparing products, a quick primer.
An embedding is a dense numerical representation of content — text, images, code, audio — projected into a high-dimensional space (typically 384 to 3072 dimensions) where semantic similarity corresponds to geometric proximity.
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-m3")  # 1024-dim, multilingual

texts = [
    "How to configure a Kubernetes ingress",
    "What is the capital of France?",
    "Setting up NGINX as a reverse proxy",
]

embeddings = model.encode(texts)
print(embeddings.shape)  # (3, 1024)

# "Configure Kubernetes ingress" and "NGINX reverse proxy"
# will have high cosine similarity despite different keywords
Approximate Nearest Neighbor (ANN) algorithms (HNSW, IVF, LSH) make it practical to find the K most similar vectors among millions in milliseconds.
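To make "K most similar vectors" concrete, here is a minimal sketch of the exact, brute-force search that ANN indexes approximate. The function name `top_k_cosine` and the hand-picked 3-dimensional vectors are illustrative assumptions, not output from a real embedding model.

```python
import numpy as np

def top_k_cosine(query: np.ndarray, corpus: np.ndarray, k: int) -> list[tuple[int, float]]:
    """Exact cosine top-k by scanning every row (the baseline ANN approximates)."""
    # Normalize so that a plain dot product equals cosine similarity.
    q = query / np.linalg.norm(query)
    c = corpus / np.linalg.norm(corpus, axis=1, keepdims=True)
    sims = c @ q
    # Sort by descending similarity and keep the k best rows.
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]

# Toy "embeddings" chosen by hand for illustration.
corpus = np.array([
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
query = np.array([0.8, 0.6, 0.0])

hits = top_k_cosine(query, corpus, k=2)
print(hits)
```

This scan costs one dot product per stored vector, which is fine for thousands of rows and too slow for millions. ANN indexes trade a small amount of recall for skipping most of those comparisons.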
The Contenders in 2026
pgvector (PostgreSQL Extension)
The most impactful development in the vector space wasn’t a new database — it was a 150KB PostgreSQL extension.
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Add a vector column to an existing table
ALTER TABLE documents ADD COLUMN embedding vector(1536);

-- Create an HNSW index for fast ANN search
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);

-- Semantic search: <=> returns cosine distance, so similarity = 1 - distance
SELECT id, title, content,
       1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE 1 - (embedding <=> $1::vector) > 0.75
ORDER BY embedding <=> $1::vector
LIMIT 10;
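The `$1::vector` parameter in the search query expects pgvector's text form, a bracketed comma-separated list. The sketch below shows one way to build that literal from application code and how the similarity threshold relates to the distance operator. The helper names are hypothetical, and no database connection is made.

```python
def to_pgvector_literal(values: list[float]) -> str:
    """Format a list of numbers as pgvector's text representation, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

def cosine_distance_to_similarity(distance: float) -> float:
    """The <=> operator yields cosine distance; similarity is 1 - distance."""
    return 1.0 - distance

# Hypothetical 3-dim query embedding; real ones would have e.g. 1536 values.
literal = to_pgvector_literal([0.25, -0.5, 1])
print(literal)

# "similarity > 0.75" in the WHERE clause is the same as "distance < 0.25".
max_distance = 1.0 - 0.75
print(max_distance, cosine_distance_to_similarity(max_distance))
```

In practice the pgvector Python package can register an adapter for drivers such as psycopg so that lists or NumPy arrays are passed directly. Drivers also differ in placeholder syntax (psycopg uses `%s` rather than `$1`).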
Recent pgvector releases (0.5 through 0.7) added:
- HNSW indexing with an improved recall/performance tradeoff (0.5)
- Parallel HNSW index builds (0.6)
- Sparse vector support via sparsevec (0.7)
- Half-precision storage via halfvec (0.7), for roughly 2x storage savings
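The 2x figure for half-precision storage is easy to check with back-of-the-envelope arithmetic. This sketch assumes the per-value sizes given in the pgvector README as I understand them (4 bytes per dimension plus an 8-byte header for vector, 2 bytes per dimension plus 8 for halfvec). It counts raw column data only and ignores index size and row overhead.

```python
def vector_bytes(dimensions: int, bytes_per_dimension: int, header_bytes: int = 8) -> int:
    """Approximate on-disk size of one stored vector value."""
    return dimensions * bytes_per_dimension + header_bytes

def column_gib(num_vectors: int, dimensions: int, bytes_per_dimension: int) -> float:
    """Approximate size of a whole vector column in GiB."""
    return num_vectors * vector_bytes(dimensions, bytes_per_dimension) / 2**30

dims = 1536            # matches the vector(1536) column used above
rows = 1_000_000       # hypothetical corpus size

full = column_gib(rows, dims, 4)   # vector: 32-bit floats
half = column_gib(rows, dims, 2)   # halfvec: 16-bit floats

print(f"vector:  {full:.2f} GiB")
print(f"halfvec: {half:.2f} GiB")
print(f"ratio:   {full / half:.2f}x")
```

The ratio lands just under 2x because the fixed header does not shrink. Whether the precision loss is acceptable depends on the embedding model and should be checked against your own recall measurements.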
Why teams choose it:
- Already running PostgreSQL? Zero new infrastructure.
- ACID transactions across vector and relational data
- Skills your operations team already has
- pgvecto.rs (a Rust-based alternative extension) handles 1M+ vectors well on modern hardware
- Works with Supabase, Neon, RDS, AlloyDB
When it struggles:
- 100M+ vectors at sub-10ms latency requirements
- Multi-tenant SaaS with per-tenant isolation at scale
- Pure throughput: dedicated vector DBs still win benchmarks
Pinecone
The managed vector database that dominated enterprise adoption. Pinecone Serverless (launched 2024) changed the economics:
from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")

# Connect to an existing serverless index (no pods to manage)
index = pc.Index("knowledge-base")

# Upsert vectors with metadata.
# `embedding` is assumed to be a 1536-dim list of floats computed elsewhere.
index.upsert(
    vectors=[
        {
            "id": "doc-001",
            "values": embedding,
            "metadata": {
                "title": "Getting Started with Kubernetes",
                "category": "infrastructure",
                "updated_at": "2026-05-01",
                "source": "internal-wiki",
            },
        }
    ],
    namespace="engineering-docs",
)

# Query with a metadata filter (`query_embedding` is likewise assumed to exist)
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"category": {"$in": ["infrastructure", "devops"]}},
    namespace="engineering-docs",
    include_metadata=True,
)
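The filter argument in the query above uses a small operator language. As a rough mental model of what `{"category": {"$in": [...]}}` selects, here is a toy matcher in plain Python. It is an illustration only, supports just `$in` and `$eq`, and is not how Pinecone evaluates filters internally; the function name `matches` is hypothetical.

```python
from typing import Any

def matches(metadata: dict[str, Any], flt: dict[str, Any]) -> bool:
    """Return True if `metadata` satisfies every field condition in `flt`."""
    for field, condition in flt.items():
        value = metadata.get(field)
        for op, operand in condition.items():
            if op == "$in":
                if value not in operand:
                    return False
            elif op == "$eq":
                if value != operand:
                    return False
            else:
                raise ValueError(f"unsupported operator: {op}")
    return True

# Hypothetical metadata records, shaped like the upsert example.
records = [
    {"id": "doc-001", "category": "infrastructure"},
    {"id": "doc-002", "category": "marketing"},
    {"id": "doc-003", "category": "devops"},
]
flt = {"category": {"$in": ["infrastructure", "devops"]}}

selected = [r["id"] for r in records if matches(r, flt)]
print(selected)
```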
Pinecone strengths:
- Fully managed, with no operational overhead
- Excellent at 10M-1B+ vector scale
- Namespace isolation for multi-tenancy
- 99.99% SLA with enterprise support
- Hybrid dense+sparse search
Downsides: Cost at scale is real. At 100M vectors, you’re spending meaningful money. And vendor lock-in is complete — there’s no self-hosted option.
Qdrant
The open-source vector database that became the “serious engineering team” choice:
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    FieldCondition,
    Filter,
    MatchValue,
    PointStruct,
    VectorParams,
)

client = QdrantClient(url="http://localhost:6333")

# Create a collection with named vectors (multi-vector support)
client.create_collection(
    collection_name="products",
    vectors_config={
        "text": VectorParams(size=768, distance=Distance.COSINE),
        "image": VectorParams(size=512, distance=Distance.DOT),
    },
)

# Insert a point with its payload.
# text_embedding (768-dim) and image_embedding (512-dim) are assumed to exist.
client.upsert(
    collection_name="products",
    points=[
        PointStruct(
            id=1,
            vector={
                "text": text_embedding,
                "image": image_embedding,
            },
            payload={
                "name": "Mechanical Keyboard",
                "price": 149.99,
                "in_stock": True,
                "tags": ["hardware", "peripherals"],
            },
        )
    ],
)

# Search the named "text" vector with a payload filter.
# Note: newer qdrant-client releases deprecate search() in favor of query_points().
results = client.search(
    collection_name="products",
    query_vector=("text", query_embedding),
    query_filter=Filter(
        must=[FieldCondition(key="in_stock", match=MatchValue(value=True))]
    ),
    limit=10,
)
Qdrant standouts:
- Self-hosted (Docker, Kubernetes) or Qdrant Cloud
- Written in Rust — impressive performance
- Payload indexing for fast pre-filtering
- Native hybrid search over sparse and dense vectors
- Multi-vector support per point
- WASM-based client for edge deployments
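Why pre-filtering matters is easiest to see with a toy comparison. The sketch below uses exact search over hand-picked 2-dimensional unit vectors and a hypothetical `in_stock` flag. It is a simplified model of the idea, not a description of how Qdrant's filtered index works internally.

```python
import numpy as np

def pre_filter_search(query, vectors, allowed, k):
    """Restrict to allowed rows first, then rank: returns k hits whenever k allowed rows exist."""
    sims = vectors @ query                      # rows and query are unit length
    candidates = np.flatnonzero(allowed)
    order = candidates[np.argsort(-sims[candidates])]
    return [int(i) for i in order[:k]]

def post_filter_search(query, vectors, allowed, k):
    """Rank everything, take the top k, then drop disallowed rows: may return fewer than k."""
    sims = vectors @ query
    top_k = np.argsort(-sims)[:k]
    return [int(i) for i in top_k if allowed[i]]

vectors = np.array([
    [1.0, 0.0],    # closest to the query, but out of stock
    [0.8, 0.6],    # second closest, also out of stock
    [0.6, 0.8],
    [0.0, 1.0],
    [-1.0, 0.0],
])
in_stock = np.array([False, False, True, True, True])
query = np.array([1.0, 0.0])

print(pre_filter_search(query, vectors, in_stock, k=2))
print(post_filter_search(query, vectors, in_stock, k=2))
```

With post-filtering, the two nearest neighbors are both discarded and the caller gets an empty result even though matching items exist. Filtering before (or during) the vector search avoids that, which is why indexed payload fields are worth setting up for selective filters.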
LanceDB
The youngest entrant but the most interesting for AI workloads:
import lancedb

db = lancedb.connect("./lancedb")  # Local or cloud (lancedb.connect("s3://bucket/path"))
table = db.open_table("documents")

# Lance columnar format is queryable without loading into memory
results = (
    table.search(query_embedding)
    .metric("cosine")
    .limit(10)
    .where("category = 'technical'")
    .select(["id", "title", "content"])
    .to_pandas()
)
LanceDB uses the Lance columnar format — designed for ML workloads. It supports storing the raw data (text, images, video) alongside embeddings, enabling multimodal retrieval without separate storage systems.
Hybrid Search: The Production Standard
Pure vector search has a well-known weakness: it misses exact matches. A search for “CVE-2024-1234” or “getUserById” performs poorly in embedding space. Production RAG systems combine:
- Dense retrieval: Semantic similarity via embeddings
- Sparse retrieval: BM25/TF-IDF keyword matching (exact and partial)
- Reciprocal Rank Fusion (RRF): Combine and re-rank results
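To see why the sparse side catches exact identifiers, here is a minimal BM25 scorer in plain Python. It is a sketch under stated assumptions: whitespace tokenization, the common defaults `k1=1.5` and `b=0.75`, the Lucene-style non-negative IDF variant, and a made-up three-document corpus. Production systems would use a search engine or a library rather than this.

```python
import math
from collections import Counter

def bm25_scores(query_terms: list[str], docs: list[list[str]],
                k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each tokenized document against the query terms with BM25."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for d in docs for term in set(d))
    scores = []
    for doc in docs:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue  # term absent: contributes nothing
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            length_norm = 1 - b + b * len(doc) / avg_len
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores.append(score)
    return scores

# Hypothetical corpus for illustration.
corpus = [
    "patch notes for the login service",
    "advisory cve-2024-1234 affects the login service",
    "how to rotate api keys",
]
docs = [text.lower().split() for text in corpus]

scores = bm25_scores(["cve-2024-1234"], docs)
best = max(range(len(scores)), key=scores.__getitem__)
print(corpus[best])
```

Only the document containing the literal token scores above zero, regardless of how an embedding model would have placed that identifier in vector space.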
from qdrant_client import models

# Hybrid search in Qdrant.
# Assumes `client` from the earlier example, a "docs" collection with named
# vectors "text-dense" and "text-sparse", and precomputed query vectors.
results = client.query_points(
    collection_name="docs",
    prefetch=[
        # Dense retrieval
        models.Prefetch(query=dense_vector, using="text-dense", limit=50),
        # Sparse retrieval (e.g. BM25-style term weights)
        models.Prefetch(
            query=models.SparseVector(indices=sparse_indices, values=sparse_values),
            using="text-sparse",
            limit=50,
        ),
    ],
    query=models.FusionQuery(fusion=models.Fusion.RRF),  # Fuse both result lists via RRF
    limit=10,
)
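The fusion step that the query above delegates to the database is simple enough to sketch by hand. Each document scores the sum of 1 / (k + rank) over every result list it appears in. The constant `k=60` is the value commonly used for RRF and is an assumption here; the engine's internal constant and tie-breaking may differ.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Higher fused score first; ties keep first-seen order (sorted() is stable).
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical result lists from the dense and sparse retrievers.
dense_hits = ["a", "b", "c"]
sparse_hits = ["c", "a", "d"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Because RRF uses only ranks, it needs no score normalization between cosine similarities and BM25 scores, which live on very different scales. Documents found by both retrievers rise above those found by only one.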
Choosing the Right Tool
| Use Case | Recommended |
|---|---|
| Existing PostgreSQL, <5M vectors | pgvector |
| Managed, 10M+ vectors, enterprise | Pinecone Serverless |
| Self-hosted, need full control | Qdrant |
| AI pipelines + raw data co-location | LanceDB |
| Supabase/Neon stack | pgvector (built-in) |
| Multi-modal (text + images) | LanceDB or Qdrant |
Embedding Models Matter As Much As the Database
The choice of embedding model often impacts retrieval quality more than the vector store:
| Model | Dimensions | Best For |
|---|---|---|
| text-embedding-3-small (OpenAI) | 1536 | General purpose, English |
| text-embedding-3-large (OpenAI) | 3072 | High-accuracy, English |
| BAAI/bge-m3 | 1024 | Multilingual, hybrid search |
| nomic-embed-text | 768 | Local deployment, good quality |
| jina-embeddings-v3 | 1024 | Long documents, multilingual |
For Korean content (increasingly relevant as Korean-language AI adoption grows), multilingual models such as BAAI/bge-m3 and jina-embeddings-v3 tend to outperform English-optimized models.
Conclusion
The vector database landscape has matured from “grab whatever works” to a clear set of tradeoffs. For most teams in 2026:
- Start with pgvector if you’re on PostgreSQL and scale is modest. The operational simplicity is underrated.
- Graduate to Qdrant (self-hosted) or Pinecone (managed) when you hit pgvector’s limits.
- Always use hybrid search in production — pure dense retrieval leaves too many relevant results on the table.
The real competitive advantage isn’t the vector store — it’s the quality of your embeddings, the freshness of your data, and how well you re-rank and post-process results before sending to the LLM. The database is plumbing. The strategy is what wins.
