Vector Databases Explained: The Secret Sauce Behind Modern AI Applications

Every AI application you’ve used recently — ChatGPT with memory, semantic search, recommendation engines — relies on vector databases. Here’s everything you need to know to build with them.


What is a Vector Database?

Traditional databases store data in rows and columns. Vector databases store embeddings — high-dimensional numerical representations of data.

Text: "The quick brown fox jumps over the lazy dog"
                    ↓ Embedding Model
Vector: [0.23, -0.45, 0.12, 0.89, ..., 0.34]  # 1536 dimensions

Why does this matter? Similar meanings produce similar vectors. “I love pizza” and “Pizza is my favorite food” have nearly identical vectors, even though the words differ.

The Math (Simplified)

Similarity is measured by cosine similarity or Euclidean distance:

import numpy as np

def cosine_similarity(a, b):
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Similar sentences → similarity ≈ 0.95
# Different sentences → similarity ≈ 0.20
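
A quick sanity check with toy 3-dimensional vectors (values made up purely to exercise the function):

a = np.array([0.2, 0.8, 0.1])
b = np.array([0.25, 0.75, 0.05])  # points in nearly the same direction as a
c = np.array([-0.9, 0.1, 0.4])    # points somewhere else entirely

print(cosine_similarity(a, b))  # ≈ 0.99
print(cosine_similarity(a, c))  # ≈ -0.07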

Vector Database Comparison

| Database | Type | Hosting | Best For | Pricing |
|---|---|---|---|---|
| Pinecone | Managed | Cloud only | Production at scale | $70/mo+ |
| Weaviate | Open source | Self-host or Cloud | Hybrid search | Free / Managed |
| Milvus | Open source | Self-host or Zilliz | Large scale | Free / Managed |
| Qdrant | Open source | Self-host or Cloud | High performance | Free / Managed |
| pgvector | Extension | PostgreSQL | Simple use cases | Free |
| Chroma | Open source | Self-host or Cloud | Local dev, prototyping | Free |

Quick Recommendations

  • Just getting started? → pgvector (if you have Postgres) or Chroma
  • Production RAG? → Pinecone or Qdrant Cloud
  • Self-hosting at scale? → Milvus or Qdrant
  • Need hybrid search? → Weaviate


Core Concepts

1. Embeddings

Vectors are created by embedding models. The most common:

| Model | Dimensions | Provider | Best For |
|---|---|---|---|
| text-embedding-3-large | 3072 | OpenAI | General purpose |
| text-embedding-3-small | 1536 | OpenAI | Cost-effective |
| voyage-3 | 1024 | Voyage AI | Retrieval-focused |
| bge-large-en-v1.5 | 1024 | BAAI (open) | Self-hosting |
| mxbai-embed-large | 1024 | Mixedbread | Open, high quality |
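
The open models in this table can be run locally; a minimal sketch using the sentence-transformers library (model name as published on Hugging Face):

from sentence_transformers import SentenceTransformer

# Downloads the model on first use; bge-large-en-v1.5 produces 1024-dim vectors
model = SentenceTransformer("BAAI/bge-large-en-v1.5")
vectors = model.encode(["Vector databases store embeddings."])
print(vectors.shape)  # (1, 1024)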

2. Indexing

Vector databases use specialized indexes for fast similarity search (a pgvector example follows this list):

  • HNSW (Hierarchical Navigable Small World) — Best for most cases
  • IVF (Inverted File Index) — Good for large datasets
  • Flat — Exact search, slow but accurate
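
For example, pgvector builds an HNSW index with plain SQL; a sketch reusing the documents table and connection from Step 2 below (parameter values are illustrative, not tuned recommendations):

# Add an HNSW index for cosine distance on the embedding column
cur.execute("""
    CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 128)
""")
conn.commit()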

3. Distance Metrics

| Metric | Use When |
|---|---|
| Cosine | Text similarity (most common) |
| Euclidean (L2) | Magnitude matters |
| Dot Product | Vectors are pre-normalized |
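
For intuition, all three metrics in a few lines of numpy (a sketch; real databases compute these inside the index, not in Python):

import numpy as np

def euclidean_distance(a, b):
    return np.linalg.norm(a - b)  # lower = more similar

def dot_product(a, b):
    return np.dot(a, b)           # higher = more similar

# For unit-length (normalized) vectors, the dot product equals cosine
# similarity, which is why it's the cheap choice for pre-normalized embeddings.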

Building a RAG Application

Retrieval-augmented generation (RAG) is the most common use case. Let's build one step by step.

Step 1: Create Embeddings

from openai import OpenAI

client = OpenAI()

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    return response.data[0].embedding

# Embed your documents
documents = [
    "Python is a programming language.",
    "Machine learning uses statistical models.",
    "Vector databases store embeddings.",
]

embeddings = [get_embedding(doc) for doc in documents]

Step 2: Store in Vector Database

With Pinecone:

from pinecone import Pinecone

pc = Pinecone(api_key="your-api-key")
index = pc.Index("my-index")  # assumes an index already created with dimension 1536

# Upsert vectors
vectors = [
    {"id": f"doc_{i}", "values": emb, "metadata": {"text": doc}}
    for i, (doc, emb) in enumerate(zip(documents, embeddings))
]
index.upsert(vectors=vectors)

With pgvector:

import psycopg2
from pgvector.psycopg2 import register_vector

conn = psycopg2.connect("postgresql://...")
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector")  # enable pgvector (once per database)
conn.commit()
register_vector(conn)
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT,
        embedding vector(1536)
    )
""")

for doc, emb in zip(documents, embeddings):
    cur.execute(
        "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
        (doc, emb)
    )
conn.commit()

With Qdrant:

from qdrant_client import QdrantClient
from qdrant_client.models import VectorParams, Distance, PointStruct

# Named qdrant (not client) so it doesn't shadow the OpenAI client above
qdrant = QdrantClient(url="http://localhost:6333")

qdrant.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE)
)

points = [
    PointStruct(id=i, vector=emb, payload={"text": doc})
    for i, (doc, emb) in enumerate(zip(documents, embeddings))
]
qdrant.upsert(collection_name="documents", points=points)

Step 3: Search for Similar Vectors

def search(query: str, top_k: int = 5):
    query_embedding = get_embedding(query)

    # Keep ONE of the following blocks, matching the database you chose above.

    # Pinecone
    results = index.query(vector=query_embedding, top_k=top_k, include_metadata=True)

    # pgvector (<=> is cosine distance, so smaller = more similar)
    cur.execute("""
        SELECT content, embedding <=> %s::vector AS distance
        FROM documents
        ORDER BY distance
        LIMIT %s
    """, (query_embedding, top_k))
    results = cur.fetchall()

    # Qdrant
    results = qdrant.search(
        collection_name="documents",
        query_vector=query_embedding,
        limit=top_k
    )

    return results

Step 4: Generate with Context (RAG)

def rag_query(question: str) -> str:
    # Retrieve the most relevant chunks (field access below assumes Qdrant-style
    # results; adapt it if you used Pinecone or pgvector in search)
    results = search(question, top_k=3)
    context = "\n".join([r.payload["text"] for r in results])

    # Generate an answer grounded in the retrieved context
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Answer based on this context:\n{context}"},
            {"role": "user", "content": question}
        ]
    )
    return response.choices[0].message.content
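
Calling it end to end (the exact wording of the answer will vary by model and documents):

answer = rag_query("What do vector databases store?")
print(answer)  # e.g. something like "Vector databases store embeddings."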

Advanced Patterns

Hybrid Search

Combine vector (semantic) and keyword (BM25) search:

# Weaviate hybrid search (v3 Python client; client here is a weaviate.Client)
# alpha=0 is pure keyword (BM25), alpha=1 is pure vector search
result = client.query.get("Document", ["content"])\
    .with_hybrid(query="machine learning", alpha=0.5)\
    .with_limit(5)\
    .do()

Filtering

Most vector DBs support metadata filtering:

# Pinecone
results = index.query(
    vector=query_embedding,
    filter={"category": {"$eq": "technology"}, "year": {"$gte": 2024}},
    top_k=10
)

# Qdrant
from qdrant_client.models import Filter, FieldCondition, MatchValue

results = qdrant.search(
    collection_name="documents",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[FieldCondition(key="category", match=MatchValue(value="technology"))]
    )
)

Chunking Strategies

Documents that exceed an embedding model's input limit need to be split into chunks:

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=512,
    chunk_overlap=50,
    separators=["\n\n", "\n", ". ", " "]
)

chunks = splitter.split_text(long_document)

Chunking best practices (sizes are in tokens; a token-based splitter is sketched after this list):

  • 256-512 tokens for precise retrieval
  • 512-1024 tokens when answers need more context
  • Always include overlap so facts aren't cut off at chunk boundaries
  • Preserve semantic boundaries (paragraphs, sentences)
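
Since chunk_size above counts characters, not tokens, one way to split by tokens directly is LangChain's tiktoken-based constructor (a sketch, assuming the tiktoken package is installed; cl100k_base is the tokenizer used by OpenAI's embedding models):

from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",  # OpenAI embedding-model tokenizer
    chunk_size=512,               # now measured in tokens, not characters
    chunk_overlap=50,
)
chunks = splitter.split_text(long_document)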

Re-ranking

Improve results with a re-ranker:

from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def search_with_rerank(query: str, top_k: int = 5):
    # Get initial results (fetch more than needed)
    initial_results = search(query, top_k=20)
    
    # Re-rank (payload access assumes Qdrant-style results from search above)
    pairs = [[query, r.payload["text"]] for r in initial_results]
    scores = reranker.predict(pairs)
    
    # Sort by reranker score
    reranked = sorted(zip(initial_results, scores), key=lambda x: x[1], reverse=True)
    return [r for r, s in reranked[:top_k]]

Performance Optimization

Indexing Parameters

For HNSW (most common):

| Parameter | Low | Medium | High |
|---|---|---|---|
| ef_construction | 128 | 256 | 512 |
| M | 16 | 32 | 64 |
| ef_search | 64 | 128 | 256 |

Higher values = better recall, but slower indexing and search, and more memory.
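
ef_search is usually set per query rather than at index build time. In Qdrant, for instance, it is exposed as hnsw_ef; a sketch reusing the qdrant client and a query_embedding from the RAG example above:

from qdrant_client.models import SearchParams

results = qdrant.search(
    collection_name="documents",
    query_vector=query_embedding,
    limit=5,
    search_params=SearchParams(hnsw_ef=128),  # raise for better recall, at some latency cost
)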

Batch Operations

Always batch inserts:

# Good: insert in batches (one network round trip per batch)
BATCH_SIZE = 100
for i in range(0, len(vectors), BATCH_SIZE):
    index.upsert(vectors=vectors[i:i + BATCH_SIZE])

# Bad: one round trip per vector
for v in vectors:
    index.upsert(vectors=[v])

Dimension Reduction

Smaller dimensions = faster search:

# OpenAI supports dimension reduction
response = client.embeddings.create(
    model="text-embedding-3-large",
    input=text,
    dimensions=1024  # Reduced from 3072
)

Pricing Comparison (2026)

| Service | Free Tier | Starter | Production |
|---|---|---|---|
| Pinecone | 100K vectors | $70/mo (1M) | Custom |
| Weaviate Cloud | 100K vectors | $25/mo | Custom |
| Qdrant Cloud | 1GB | $25/mo | Custom |
| Zilliz (Milvus) | 100K vectors | $65/mo | Custom |
| pgvector | Self-host | Self-host | Self-host |

When to Use What

pgvector: You have Postgres, <1M vectors, simple queries

Pinecone: Production RAG, managed solution, need reliability

Qdrant: High performance, good filtering, open source preference

Weaviate: Hybrid search, multi-modal, GraphQL API

Milvus: Massive scale (billions of vectors), distributed

Chroma: Prototyping, local development, quick experiments

Conclusion

Vector databases are the foundation of modern AI applications. Whether you're building a chatbot with memory, a semantic search engine, or a recommendation system, you'll need to understand embeddings and vector search.

Quick start path:

  1. Start with pgvector or Chroma for prototyping
  2. Learn chunking and embedding strategies
  3. Move to managed (Pinecone/Qdrant Cloud) for production
  4. Add hybrid search and re-ranking as needed

The technology is mature enough for production. The main decisions are hosting (managed vs self-hosted) and scale (millions vs billions of vectors).


Building with vectors? The nearest neighbor is closer than you think.
