Vector Databases Explained: The Secret Sauce Behind Modern AI Applications
Many of the AI applications you've used recently, from ChatGPT with memory to semantic search and recommendation engines, rely on vector databases. Here's what you need to know to build with them.
What is a Vector Database?
Traditional databases store data in rows and columns. Vector databases store embeddings — high-dimensional numerical representations of data.
Text: "The quick brown fox jumps over the lazy dog"
↓ Embedding Model
Vector: [0.23, -0.45, 0.12, 0.89, ..., 0.34] # 1536 dimensions
Why does this matter? Similar meanings produce similar vectors. “I love pizza” and “Pizza is my favorite food” have nearly identical vectors, even though the words differ.
The Math (Simplified)
Similarity is measured by cosine similarity or Euclidean distance:
import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
# Similar sentences → similarity ≈ 0.95
# Different sentences → similarity ≈ 0.20
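As a toy illustration, here are made-up 3-dimensional vectors standing in for real embeddings (real ones have hundreds or thousands of dimensions):

```python
# Toy vectors standing in for real embeddings
pizza_a = np.array([0.8, 0.1, 0.3])     # "I love pizza"
pizza_b = np.array([0.75, 0.15, 0.35])  # "Pizza is my favorite food"
weather = np.array([-0.2, 0.9, -0.4])   # "It might rain tomorrow"

print(cosine_similarity(pizza_a, pizza_b))  # high, close to 1.0
print(cosine_similarity(pizza_a, weather))  # much lower (negative here)
```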
Vector Database Comparison
| Database | Type | Hosting | Best For | Pricing |
|---|---|---|---|---|
| Pinecone | Managed | Cloud only | Production at scale | $70/mo+ |
| Weaviate | Open source | Self-host or Cloud | Hybrid search | Free / Managed |
| Milvus | Open source | Self-host or Zilliz | Large scale | Free / Managed |
| Qdrant | Open source | Self-host or Cloud | High performance | Free / Managed |
| pgvector | Extension | PostgreSQL | Simple use cases | Free |
| Chroma | Open source | Self-host or Cloud | Local dev, prototyping | Free |
Quick Recommendations
- Just getting started? → pgvector (if you have Postgres) or Chroma
- Production RAG? → Pinecone or Qdrant Cloud
- Self-hosting at scale? → Milvus or Qdrant
- Need hybrid search? → Weaviate
Core Concepts
1. Embeddings
Vectors are created by embedding models. The most common:
| Model | Dimensions | Provider | Best For |
|---|---|---|---|
| text-embedding-3-large | 3072 | OpenAI | General purpose |
| text-embedding-3-small | 1536 | OpenAI | Cost-effective |
| voyage-3 | 1024 | Voyage AI | Retrieval-focused |
| bge-large-en-v1.5 | 1024 | BAAI (open) | Self-hosting |
| mxbai-embed-large | 1024 | Mixedbread | Open, high quality |
2. Indexing
Vector databases use specialized indexes for fast similarity search:
- HNSW (Hierarchical Navigable Small World) — Best for most cases
- IVF (Inverted File Index) — Good for large datasets
- Flat — Exact search, slow but accurate
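For example, pgvector lets you pick the index type when you create the index; a minimal sketch, assuming the psycopg2 cursor and documents table from the pgvector example later in this article:

```python
# HNSW: the default choice for most workloads
cur.execute("CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)")

# IVFFlat alternative: cheaper to build; 'lists' should be tuned to dataset size
cur.execute(
    "CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops) WITH (lists = 100)"
)
```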
3. Distance Metrics
| Metric | Use When |
|---|---|
| Cosine | Text similarity (most common) |
| Euclidean (L2) | When magnitude matters |
| Dot Product | Normalized vectors (equals cosine, but cheaper) |
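All three metrics are one-liners in NumPy:

```python
import numpy as np

a = np.array([0.23, -0.45, 0.12])
b = np.array([0.20, -0.40, 0.15])

cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
euclidean = np.linalg.norm(a - b)
dot = np.dot(a, b)

# For unit-length (normalized) vectors, dot product and cosine are identical,
# which is why many databases default to dot product for speed.
```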
Building a RAG Application
The most common use case. Let’s build one step by step.
Step 1: Create Embeddings
from openai import OpenAI

client = OpenAI()

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding
# Embed your documents
documents = [
    "Python is a programming language.",
    "Machine learning uses statistical models.",
    "Vector databases store embeddings.",
]
embeddings = [get_embedding(doc) for doc in documents]
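The OpenAI embeddings endpoint also accepts a list of inputs, so you can embed the whole corpus in one request instead of one call per document:

```python
# One API call for the whole batch (input may be a list of strings)
response = client.embeddings.create(
    model="text-embedding-3-small",
    input=documents
)
embeddings = [item.embedding for item in response.data]
```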
Step 2: Store in Vector Database
With Pinecone:
from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")
index = pc.Index("my-index")

# Upsert vectors
vectors = [
    {"id": f"doc_{i}", "values": emb, "metadata": {"text": doc}}
    for i, (doc, emb) in enumerate(zip(documents, embeddings))
]
index.upsert(vectors=vectors)
With pgvector:
import numpy as np
import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect("postgresql://...")
register_vector(conn)
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT,
        embedding vector(1536)
    )
""")

for doc, emb in zip(documents, embeddings):
    cur.execute(
        "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
        (doc, np.array(emb))  # register_vector adapts numpy arrays to the vector type
    )
conn.commit()
With Qdrant:
from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct

# Use a separate name so we don't shadow the OpenAI client from Step 1
qdrant = QdrantClient(url="http://localhost:6333")

qdrant.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)

points = [
    PointStruct(id=i, vector=emb, payload={"text": doc})
    for i, (doc, emb) in enumerate(zip(documents, embeddings))
]
qdrant.upsert(collection_name="documents", points=points)
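With Chroma (handy for local prototyping), a minimal sketch using its in-memory client:

```python
import chromadb

chroma = chromadb.Client()  # in-memory; use PersistentClient(path=...) to keep data
collection = chroma.create_collection(name="documents")
collection.add(
    ids=[f"doc_{i}" for i in range(len(documents))],
    embeddings=embeddings,
    documents=documents
)
```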
Step 3: Search
def search(query: str, top_k: int = 5):
    query_embedding = get_embedding(query)

    # Pick ONE of the following, depending on which database you chose in Step 2:

    # Pinecone
    results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)

    # pgvector (<=> is cosine distance)
    cur.execute("""
        SELECT content, embedding <=> %s::vector AS distance
        FROM documents
        ORDER BY distance
        LIMIT %s
    """, (np.array(query_embedding), top_k))
    results = cur.fetchall()

    # Qdrant
    results = qdrant.search(
        collection_name="documents",
        query_vector=query_embedding,
        limit=top_k
    )
    return results
Step 4: Generate with Context (RAG)
def rag_query(question: str) -> str:
    # Retrieve relevant context (assumes Qdrant-style results with .payload)
    results = search(question, top_k=3)
    context = "\n".join([r.payload["text"] for r in results])

    # Generate an answer grounded in the retrieved context
    # ('client' is the OpenAI client from Step 1)
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content
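A quick sanity check with the toy documents from Step 1:

```python
print(rag_query("What do vector databases store?"))
# Expect an answer grounded in "Vector databases store embeddings."
```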
Advanced Patterns
Hybrid Search
Combine vector (semantic) and keyword (BM25) search:
# Weaviate hybrid search (v3 Python client)
import weaviate

weaviate_client = weaviate.Client("http://localhost:8080")

result = weaviate_client.query.get("Document", ["content"])\
    .with_hybrid(query="machine learning", alpha=0.5)\
    .with_limit(5)\
    .do()
# alpha=0 → pure keyword (BM25), alpha=1 → pure vector search
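If your database doesn't offer hybrid search natively, you can fuse the two result lists yourself, for example with reciprocal rank fusion (RRF); a minimal sketch, assuming you already have ranked document IDs from a keyword search and a vector search:

```python
def reciprocal_rank_fusion(keyword_ids: list[str], vector_ids: list[str], k: int = 60):
    # Each document scores 1 / (k + rank) in every list it appears in
    scores: dict[str, float] = {}
    for ranking in (keyword_ids, vector_ids):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)
```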
Filtering
Most vector DBs support metadata filtering:
# Pinecone
results = index.query(
    vector=query_embedding,
    filter={"category": {"$eq": "technology"}, "year": {"$gte": 2024}},
    top_k=10
)

# Qdrant
from qdrant_client.models import Filter, FieldCondition, MatchValue

results = qdrant.search(
    collection_name="docs",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="technology"))]
    )
)
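With pgvector, filtering is plain SQL: add a WHERE clause alongside the distance ordering. A sketch, assuming a hypothetical category column on the documents table:

```python
# Hypothetical: assumes a 'category' column on the documents table
cur.execute("""
    SELECT content, embedding <=> %s::vector AS distance
    FROM documents
    WHERE category = %s
    ORDER BY distance
    LIMIT %s
""", (np.array(query_embedding), "technology", top_k))
```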
Chunking Strategies
Documents longer than the embedding model's context window need to be split into chunks:
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,     # measured in characters by default, not tokens
    chunk_overlap=50,
    separators=["\n\n", "\n", ". ", " "]
)
chunks = splitter.split_text(long_document)
Chunking best practices:
- 256-512 tokens for precise retrieval
- 512-1024 tokens when answers need more surrounding context
- Always include some overlap so facts that straddle a chunk boundary aren't lost
- Preserve semantic boundaries (paragraphs, sentences)
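Since the splitter above measures chunk_size in characters, it's worth verifying the token counts your embedding model will actually see; a quick check with tiktoken:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by OpenAI embedding models

for chunk in chunks:
    n_tokens = len(enc.encode(chunk))
    assert n_tokens <= 512, f"Chunk too long: {n_tokens} tokens"
```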
Re-ranking
Improve results with a re-ranker:
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def search_with_rerank(query: str, top_k: int = 5):
    # Over-fetch candidates, then let the cross-encoder reorder them
    initial_results = search(query, top_k=20)

    # Score each (query, document) pair (assumes Qdrant-style results)
    pairs = [[query, r.payload["text"]] for r in initial_results]
    scores = reranker.predict(pairs)

    # Sort by re-ranker score, highest first
    reranked = sorted(zip(initial_results, scores), key=lambda x: x[1], reverse=True)
    return [r for r, s in reranked[:top_k]]
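Usage looks the same as plain search; the trade-off is extra latency for the cross-encoder pass:

```python
# Hypothetical query; reuses search_with_rerank defined above
top_docs = search_with_rerank("how do vector databases index embeddings?", top_k=5)
```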
Performance Optimization
Indexing Parameters
For HNSW (most common):
| Parameter | Low | Medium | High |
|---|---|---|---|
| ef_construction | 128 | 256 | 512 |
| M | 16 | 32 | 64 |
| ef_search | 64 | 128 | 256 |
Higher values mean better recall, at the cost of slower indexing and search and more memory.
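How you set these depends on the database. In Qdrant, for example, they map to HnswConfigDiff at collection creation and hnsw_ef at query time; a sketch, reusing the qdrant client and query_embedding from earlier:

```python
from qdrant_client.models import HnswConfigDiff, SearchParams, VectorParams, Distance

qdrant.create_collection(
    collection_name="tuned_docs",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=32, ef_construct=256)  # the "Medium" column above
)

results = qdrant.search(
    collection_name="tuned_docs",
    query_vector=query_embedding,
    search_params=SearchParams(hnsw_ef=128),
    limit=10
)
```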
Batch Operations
Always batch inserts:
# Good - batch insert
index.upsert(vectors=batch_of_1000)

# Bad - one network round trip per vector
for v in vectors:
    index.upsert(vectors=[v])
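A simple helper to slice a large upsert into fixed-size batches (batch sizes around 100 vectors per request are commonly used with Pinecone; the helper itself is a generic sketch):

```python
def batched_upsert(index, vectors: list[dict], batch_size: int = 100):
    # Send vectors in fixed-size slices instead of one giant (or per-item) request
    for start in range(0, len(vectors), batch_size):
        index.upsert(vectors=vectors[start:start + batch_size])
```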
Dimension Reduction
Fewer dimensions mean faster search and lower storage costs:
# OpenAI supports dimension reduction
response = client.embeddings.create(
model="text-embedding-3-large",
input=text,
dimensions=1024 # Reduced from 3072
)
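If your embedding provider doesn't offer a dimensions parameter, you can sometimes truncate and re-normalize the vectors yourself; this only works well for models trained with Matryoshka-style objectives. A sketch:

```python
import numpy as np

def truncate_embedding(embedding: list[float], dims: int = 1024) -> np.ndarray:
    # Only valid for Matryoshka-style models; check your model's documentation
    v = np.array(embedding[:dims])
    return v / np.linalg.norm(v)  # re-normalize so cosine/dot comparisons stay valid
```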
Pricing Comparison (2026)
| Service | Free Tier | Starter | Production |
|---|---|---|---|
| Pinecone | 100K vectors | $70/mo (1M) | Custom |
| Weaviate Cloud | 100K vectors | $25/mo | Custom |
| Qdrant Cloud | 1GB | $25/mo | Custom |
| Zilliz (Milvus) | 100K vectors | $65/mo | Custom |
| pgvector | Self-host | Self-host | Self-host |
When to Use What
pgvector: You have Postgres, <1M vectors, simple queries
Pinecone: Production RAG, managed solution, need reliability
Qdrant: High performance, good filtering, open source preference
Weaviate: Hybrid search, multi-modal, GraphQL API
Milvus: Massive scale (billions of vectors), distributed
Chroma: Prototyping, local development, quick experiments
Conclusion
Vector databases are the foundation of modern AI applications. Whether you're building a chatbot with memory, a semantic search feature, or a recommendation engine, you'll need to understand embeddings and vector search.
Quick start path:
- Start with pgvector or Chroma for prototyping
- Learn chunking and embedding strategies
- Move to managed (Pinecone/Qdrant Cloud) for production
- Add hybrid search and re-ranking as needed
The technology is mature enough for production. The main decisions are hosting (managed vs self-hosted) and scale (millions vs billions of vectors).
Building with vectors? The nearest neighbor is closer than you think.