# Vector Databases

Vector database selection, embedding storage, approximate nearest neighbor (ANN) algorithms, and vector search optimization. Use this skill when choosing vector stores, designing semantic search, or optimizing similarity search performance.
melodic-software · 58 stars · Updated 2025-12-27

## When to Use This Skill
Use this skill when:

- Choosing between vector database options
- Designing semantic/similarity search systems
- Optimizing vector search performance
- Understanding ANN algorithm trade-offs
- Scaling vector search infrastructure
- Implementing hybrid search (vectors + filters)
Keywords: vector database, embeddings, vector search, similarity search, ANN, approximate nearest neighbor, HNSW, IVF, FAISS, Pinecone, Weaviate, Milvus, Qdrant, Chroma, pgvector, cosine similarity, semantic search
## Vector Database Comparison
### Managed Services
| Database | Strengths | Limitations | Best For |
|---|---|---|---|
| Pinecone | Fully managed, easy scaling, enterprise | Vendor lock-in, cost at scale | Enterprise production |
| Zilliz Cloud | Milvus-based, high performance | Learning curve | High-scale production |
| MongoDB Atlas Vector | Existing MongoDB users | Newer feature | MongoDB shops |
| Elastic Vector | Existing Elastic stack | Resource heavy | Search platforms |

Quick install: `npx skillvault add melodic-software/melodic-software-claude-code-plugins-plugins-systems-design-skills-vector-databases-skill-md`
### Self-Hosted Options

| Database | Strengths | Limitations | Best For |
|---|---|---|---|
| Milvus | Feature-rich, scalable, GPU support | Operational complexity | Large-scale production |
| Qdrant | Rust performance, filtering, easy | Smaller ecosystem | Performance-focused |
| Weaviate | Modules, semantic, hybrid | Memory usage | Knowledge applications |
| Chroma | Simple, Python-native | Limited scale | Development, prototyping |
| pgvector | PostgreSQL extension | Performance limits | Postgres shops |
| FAISS | Library (not a DB), fastest | No persistence, no filtering | Research, embedded |
### Selection Decision Tree

```
Need managed, don't want to run operations?
├── Yes → Pinecone (simplest) or Weaviate Cloud
└── No (self-hosted)
    └── Already using PostgreSQL?
        ├── Yes, <1M vectors → pgvector
        └── No
            └── Need maximum performance at scale?
                ├── Yes → Milvus or Qdrant
                └── No
                    └── Prototyping/development?
                        ├── Yes → Chroma
                        └── No → Qdrant (balanced choice)
```
## ANN Algorithms
### Algorithm Overview

**Exact KNN:**

- Searches ALL vectors
- O(n) time complexity
- Perfect accuracy
- Impractical at scale

**Approximate NN (ANN):**

- Searches a SUBSET of vectors
- O(log n) to O(1) complexity
- Near-perfect accuracy
- Practical at any scale
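To make the exact-KNN baseline concrete, here is a minimal brute-force sketch in NumPy. The function name and data are illustrative, not from any particular library:

```python
import numpy as np

def exact_knn(vectors: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Brute-force k-nearest neighbors: scans every vector, O(n) per query."""
    # Euclidean distance from the query to all n vectors at once
    dists = np.linalg.norm(vectors - query, axis=1)
    # Indices of the k smallest distances, sorted nearest-first
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
db = rng.normal(size=(10_000, 64))   # 10K vectors, 64 dims
q = rng.normal(size=64)
top5 = exact_knn(db, q, k=5)
```

At 10K vectors this completes in milliseconds; the O(n) scan is what becomes impractical at hundreds of millions of vectors, motivating the ANN indexes below.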
### HNSW (Hierarchical Navigable Small World)

```
Layer 3: ○───────────────────────○   (sparse, long connections)
         │                       │
Layer 2: ○───○───────○───────○───○   (medium density)
         │   │       │       │   │
Layer 1: ○─○─○─○─○─○─○─○─○─○─○─○─○   (denser)
         │││││││││││││││││││││││││
Layer 0: ○○○○○○○○○○○○○○○○○○○○○○○○○   (all vectors)

Search: start at the top layer, greedily descend
```

- Fast: O(log n) search time
- High recall: typically >95%
- Memory: extra graph storage
| Parameter | Description | Trade-off |
|---|---|---|
| M | Connections per node | Memory vs. recall |
| ef_construction | Build-time search width | Build time vs. recall |
| ef_search | Query-time search width | Latency vs. recall |
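The greedy descent at the heart of HNSW can be illustrated on a toy single-layer graph. This sketch brute-forces graph construction and uses one layer only, so it is not real HNSW; all names and the graph-building shortcut are illustrative:

```python
import math

def build_knn_graph(points, M=4):
    """Toy graph: connect each point to its M nearest neighbors.

    Real HNSW builds the graph incrementally across multiple layers;
    this O(n^2) construction is only for demonstration."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:M]
    return graph

def greedy_search(points, graph, query, entry=0):
    """Start at an entry point; hop to the closest neighbor until stuck."""
    current = entry
    while True:
        best = min(graph[current], key=lambda j: math.dist(points[j], query))
        if math.dist(points[best], query) < math.dist(points[current], query):
            current = best
        else:
            return current   # local optimum: no neighbor is closer

points = [(float(i),) for i in range(50)]   # 1-D chain of points
graph = build_knn_graph(points, M=4)
nearest = greedy_search(points, graph, query=(37.2,))   # returns 37
```

On real, high-dimensional data a single greedy pass can get stuck in local optima; HNSW mitigates this with the layered structure (long-range hops on upper layers) and a beam of `ef_search` candidates instead of a single current node.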
### IVF (Inverted File Index)

```
Clustering phase:
┌─────────────────────────────────────────┐
│ Cluster vectors into K centroids        │
│                                         │
│   ●        ●        ●        ●          │
│  /│\      /│\      /│\      /│\         │
│ ○○○○○    ○○○○○    ○○○○○    ○○○○○        │
│ Cluster 1 Cluster 2 Cluster 3 Cluster 4 │
└─────────────────────────────────────────┘
```

Search phase:

1. Find the nprobe nearest centroids.
2. Search only those clusters.
3. Much faster than exhaustive search.
| Parameter | Description | Trade-off |
|---|---|---|
| nlist | Number of clusters | Build time vs. search quality |
| nprobe | Clusters to search | Latency vs. recall |
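A minimal IVF sketch in NumPy. For brevity it samples centroids at random instead of training them with k-means, so it is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
db = rng.normal(size=(5_000, 32)).astype(np.float32)

# "Training": pick nlist centroids (real IVF runs k-means here)
nlist = 16
centroids = db[rng.choice(len(db), size=nlist, replace=False)]

# Assign every vector to its nearest centroid -> inverted lists
assignments = np.argmin(
    np.linalg.norm(db[:, None] - centroids[None], axis=2), axis=1)
inverted_lists = {c: np.where(assignments == c)[0] for c in range(nlist)}

def ivf_search(query, nprobe=4):
    """Search only the nprobe clusters whose centroids are nearest the query."""
    order = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    candidates = np.concatenate([inverted_lists[c] for c in order])
    dists = np.linalg.norm(db[candidates] - query, axis=1)
    return candidates[np.argmin(dists)]
```

With `nprobe=nlist` the search degenerates to an exact scan; shrinking `nprobe` trades recall for latency, which is exactly the knob in the table above.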
### IVF-PQ (Product Quantization)

```
Original vector (128 dims):
[0.1, 0.2, ..., 0.9]              (128 × 4 bytes = 512 bytes)

PQ-compressed (8 subvectors, 8-bit codes):
[23, 45, 12, 89, 56, 34, 78, 90]  (8 bytes)
```

Memory reduction: 64x
Accuracy trade-off: ~5% recall drop
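A product-quantization sketch in NumPy. Real PQ trains each codebook with k-means; the random codebooks and all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, K = 128, 8, 256          # dims, subvectors, codewords per codebook
sub = D // M                   # 16 dims per subvector

# One codebook of K centroids per subvector (random here, k-means in practice)
codebooks = rng.normal(size=(M, K, sub)).astype(np.float32)

def pq_encode(v):
    """Map each 16-dim subvector to the index of its nearest codeword."""
    parts = v.reshape(M, sub)
    codes = [np.argmin(np.linalg.norm(codebooks[m] - parts[m], axis=1))
             for m in range(M)]
    return np.array(codes, dtype=np.uint8)   # 8 bytes instead of 512

def pq_decode(codes):
    """Approximate reconstruction by concatenating the chosen codewords."""
    return np.concatenate([codebooks[m][c] for m, c in enumerate(codes)])

v = rng.normal(size=D).astype(np.float32)
codes = pq_encode(v)           # codes.nbytes == 8, v.nbytes == 512
```

The 64x figure in the diagram is exactly `v.nbytes / codes.nbytes`; the recall drop comes from distances being computed against the decoded approximation rather than the original vector.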
### Algorithm Comparison

| Algorithm | Search Speed | Memory | Build Time | Recall |
|---|---|---|---|---|
| Flat/Brute | Slow (O(n)) | Low | None | 100% |
| IVF | Fast | Low | Medium | 90-95% |
| IVF-PQ | Very fast | Very low | Medium | 85-92% |
| HNSW | Very fast | High | Slow | 95-99% |
| HNSW+PQ | Very fast | Medium | Slow | 90-95% |
### When to Use Which

```
< 100K vectors:
└── Flat index (exact search is fast enough)

100K - 1M vectors:
└── HNSW (best recall/speed trade-off)

1M - 100M vectors:
├── Memory available → HNSW
└── Memory constrained → IVF-PQ or HNSW+PQ

> 100M vectors:
└── Sharded IVF-PQ or distributed HNSW
```
## Distance Metrics
### Common Metrics

| Metric | Formula | Range | Best For |
|---|---|---|---|
| Cosine Similarity | A·B / (‖A‖ ‖B‖) | [-1, 1] | Normalized embeddings |
| Dot Product | A·B | (-∞, ∞) | When magnitude matters |
| Euclidean (L2) | √Σ(Aᵢ-Bᵢ)² | [0, ∞) | Absolute distances |
| Manhattan (L1) | Σ\|Aᵢ-Bᵢ\| | [0, ∞) | High-dimensional sparse |
### Metric Selection

```
Embeddings pre-normalized (unit vectors)?
├── Yes → Cosine = Dot Product (use Dot, it's faster)
└── No
    └── Is magnitude meaningful?
        ├── Yes → Dot Product
        └── No → Cosine Similarity
```

Note: most embedding models output normalized vectors, so dot product is usually the best choice.
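The equivalence for unit vectors is easy to check directly (a small NumPy demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=384), rng.normal(size=384)

def cosine(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Normalize to unit length, as most embedding models already do
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)

# For unit vectors the norms are 1, so cosine similarity IS the dot product
assert np.isclose(cosine(a_n, b_n), np.dot(a_n, b_n))
```

This is why databases expose dot product as the cheaper option: it skips the two norm computations and the division on every comparison.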
## Filtering and Hybrid Search
### Pre-filtering vs. Post-filtering

**Pre-filtering (Filter → Search):**

1. Apply the metadata filter (e.g. `category = "electronics"`): 10K of 1M vectors remain.
2. Run the vector search on those 10K vectors. Much faster, and every hit is guaranteed to match the filter.

**Post-filtering (Search → Filter):**

1. Run the vector search on all 1M vectors and return the top-1000.
2. Apply the metadata filter. May return fewer than K results!
### Hybrid Search Architecture

```
Query: "wireless headphones under $100"
                  │
          ┌───────┴───────┐
          ▼               ▼
  ┌───────────────┐ ┌───────────────┐
  │ Vector search │ │ Filter build  │
  │ "wireless     │ │ price < 100   │
  │  headphones"  │ │               │
  └───────────────┘ └───────────────┘
          │               │
          └───────┬───────┘
                  ▼
          ┌───────────────┐
          │Combine results│
          └───────────────┘
```
| Metadata Type | Index Strategy | Query Example |
|---|---|---|
| Categorical | Bitmap/hash index | category = "books" |
| Numeric range | B-tree | price BETWEEN 10 AND 50 |
| Keyword search | Inverted index | tags CONTAINS "sale" |
| Geospatial | R-tree/geohash | location NEAR (lat, lng) |
## Scaling Strategies
### Sharding Approaches

```
Naive sharding (by ID):
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│ IDs 0-N │ │IDs N-2N │ │IDs 2N-3N│
└─────────┘ └─────────┘ └─────────┘
Query → Search ALL shards → Merge results

Semantic sharding (by cluster):
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│  Tech   │ │ Health  │ │ Finance │
│  docs   │ │  docs   │ │  docs   │
└─────────┘ └─────────┘ └─────────┘
Query → Route to relevant shard(s) → Faster!
```
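Routing in semantic sharding reduces to a nearest-centroid lookup. A sketch with hand-picked 2-D centroids (real systems derive one or more centroids per shard from clustering the corpus):

```python
import numpy as np

# One representative centroid per shard (illustrative values)
shard_centroids = {
    "tech": np.array([1.0, 0.0]),
    "health": np.array([0.0, 1.0]),
    "finance": np.array([-1.0, 0.0]),
}

def route(query_vec, top_shards=1):
    """Send the query only to the shard(s) whose centroid is nearest."""
    ranked = sorted(shard_centroids,
                    key=lambda s: np.linalg.norm(shard_centroids[s] - query_vec))
    return ranked[:top_shards]

route(np.array([0.9, 0.2]))   # routes to the "tech" shard
```

Routing to `top_shards > 1` trades extra fan-out for recall, since relevant documents near a cluster boundary may live in a neighboring shard.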
### Replication

```
            ┌───────────────┐
            │ Load Balancer │
            └───────────────┘
              │     │     │
              ▼     ▼     ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│Replica 1│ │Replica 2│ │Replica 3│
│ (Read)  │ │ (Read)  │ │ (Read)  │
└─────────┘ └─────────┘ └─────────┘
      │         │         │
      └─────────┼─────────┘
                │
           ┌─────────┐
           │ Primary │
           │ (Write) │
           └─────────┘
```
### Scaling Decision Matrix

| Scale (vectors) | Architecture | Replication |
|---|---|---|
| < 1M | Single node | Optional |
| 1-10M | Single node, more RAM | For HA |
| 10-100M | Sharded, few nodes | Required |
| 100M-1B | Sharded, many nodes | Required |
| > 1B | Sharded + tiered | Required |
### Index Build Optimization

| Optimization | Description | Impact |
|---|---|---|
| Batch insertion | Insert in batches of 1K-10K | 10x faster |
| Parallel build | Multi-threaded index construction | 2-4x faster |
| Incremental index | Add to existing index | Avoids rebuild |
| GPU acceleration | Use GPU for training (IVF) | 10-100x faster |
### Query Optimization

| Optimization | Description | Impact |
|---|---|---|
| Warm cache | Keep index in memory | 10x latency reduction |
| Query batching | Batch similar queries | Higher throughput |
| Reduce dimensions | PCA, random projection | 2-4x faster |
| Early termination | Stop when "good enough" | Lower latency |
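Dimension reduction via Gaussian random projection approximately preserves pairwise distances (the Johnson-Lindenstrauss property). A NumPy sketch; the sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d_new = 2_000, 1536, 256

X = rng.normal(size=(n, d)).astype(np.float32)

# Random projection matrix; the 1/sqrt(d_new) scaling preserves
# expected norms, so projected distances approximate originals
P = rng.normal(size=(d, d_new)).astype(np.float32) / np.sqrt(d_new)
X_small = X @ P        # 6x fewer dims -> proportionally cheaper distances
```

Distances in the projected space are typically within a few percent of the originals at 256 output dims, which is often an acceptable trade for the 6x reduction in per-query work; PCA gives a tighter fit but requires a training pass.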
### Memory Optimization

```
Memory per vector:
1536 dims × 4 bytes ≈ 6KB per vector

1M vectors:
  Raw:          ~6GB
  + HNSW graph: +2-4GB (M-dependent)
  = 8-10GB total

With PQ (64 subquantizers):
  1M vectors: ~64MB → ~100x reduction
```
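The arithmetic above can be packaged as a tiny estimator for the raw-vector portion (index overhead such as the HNSW graph is excluded here; the function name is illustrative):

```python
def vector_memory_gb(n_vectors: int, dims: int, bytes_per_dim: float = 4) -> float:
    """Raw storage for the vectors themselves, in decimal GB.

    Excludes index overhead (HNSW graph, IVF centroids, metadata)."""
    return n_vectors * dims * bytes_per_dim / 1e9

raw_gb = vector_memory_gb(1_000_000, 1536)    # ~6.1 GB for float32
pq_gb = vector_memory_gb(1_000_000, 64, 1)    # ~0.064 GB with 64 one-byte PQ codes
```

The ratio between the two results is the ~100x reduction quoted in the box; the few-GB gap to the box's "6GB" figure is just decimal vs. binary GB rounding.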
## Operational Considerations
### Backup and Recovery

| Strategy | Description | RPO/RTO |
|---|---|---|
| Snapshots | Periodic full backup | Hours |
| WAL replication | Write-ahead log streaming | Minutes |
| Real-time sync | Synchronous replication | Seconds |
### Monitoring Metrics

| Metric | Description | Alert Threshold |
|---|---|---|
| Query latency p99 | 99th percentile latency | > 100ms |
| Recall | Search accuracy | < 90% |
| QPS | Queries per second | Capacity dependent |
| Memory usage | Index memory | > 80% |
| Index freshness | Time since last update | Domain dependent |
### Index Maintenance

- Compaction: merge small segments
- Reindex: rebuild a degraded index
- Vacuum: remove deleted vectors
- Optimize: tune parameters

Schedule these tasks during low-traffic periods.
## Common Patterns
### Multi-Tenant Vector Search

Option 1: namespace/collection per tenant (`tenant_1_collection`, `tenant_2_collection`, ...)

- Pro: complete isolation
- Con: many indexes, operational overhead

Option 2: single shared collection with a tenant filter (`metadata: { tenant_id: "..." }`, pre-filtered by `tenant_id`)

- Pro: simpler operations
- Con: requires efficient filtering
### Real-Time Updates

```
Write path:
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Write     │    │   Buffer    │    │   Merge     │
│   Request   │───▶│  (Memory)   │───▶│  to Index   │
└─────────────┘    └─────────────┘    └─────────────┘
```

Strategy:

1. Buffer writes in memory.
2. Periodically merge them into the main index.
3. Search both the main index and the buffer.
4. Compact periodically.
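The strategy above can be sketched with a brute-force "main index" plus an in-memory buffer. The class and threshold are illustrative, not any real engine's API:

```python
import numpy as np

class BufferedIndex:
    """Toy index: new vectors land in a buffer; searches scan main + buffer."""

    def __init__(self, dims: int, merge_threshold: int = 1000):
        self.main = np.empty((0, dims), dtype=np.float32)
        self.buffer: list[np.ndarray] = []
        self.merge_threshold = merge_threshold

    def add(self, v: np.ndarray) -> None:
        self.buffer.append(v.astype(np.float32))
        if len(self.buffer) >= self.merge_threshold:
            self.merge()                        # periodic compaction

    def merge(self) -> None:
        """Fold buffered vectors into the main index (a rebuild in real systems)."""
        if self.buffer:
            self.main = np.vstack([self.main, np.stack(self.buffer)])
            self.buffer = []

    def search(self, q: np.ndarray) -> np.ndarray:
        """Nearest vector across the merged index AND the unmerged buffer,
        so writes are visible immediately."""
        pending = np.stack(self.buffer) if self.buffer else np.empty((0, len(q)))
        all_vecs = np.vstack([self.main, pending])
        return all_vecs[np.argmin(np.linalg.norm(all_vecs - q, axis=1))]
```

Real engines use an ANN index for the main segment and a flat scan only over the small buffer, which is why the buffer must stay bounded and merges must run regularly.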
### Embedding Versioning

```
Version 1 embeddings ──┐
                       ├──▶ Parallel indexes during migration
Version 2 embeddings ──┘
                       │
                       │    ┌─────────────────────┐
                       └───▶│ Gradual reindexing  │
                            │ Blue-green switch   │
                            └─────────────────────┘
```
## Cost Estimation
### Storage Costs

```
Storage (GB) = vectors × dimensions × bytes per dim × replicas / 10⁹
Cost / month = Storage (GB) × $/GB/month

Example:
10M vectors × 1536 dims × 4 bytes × 3 replicas ≈ 184 GB
At $0.10/GB/month ≈ $18.40/month storage
```

Note: serving from memory costs considerably more than disk storage.
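The worked example as code, using decimal GB; the price is just the example's assumption:

```python
def storage_cost_per_month(n_vectors: int, dims: int, replicas: int,
                           bytes_per_dim: int = 4,
                           usd_per_gb: float = 0.10) -> tuple[float, float]:
    """Return (storage in decimal GB, monthly storage cost in USD)."""
    gb = n_vectors * dims * bytes_per_dim * replicas / 1e9
    return gb, gb * usd_per_gb

gb, usd = storage_cost_per_month(10_000_000, 1536, replicas=3)
# gb ~ 184.3, usd ~ 18.43 (the text rounds to 184 GB and $18.40)
```

The same formula with the serving medium's $/GB (RAM or NVMe instead of object storage) gives the much larger in-memory serving cost the note warns about.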
### Compute Costs

Factors:

- QPS (queries per second)
- Latency requirements
- Index type (HNSW needs more RAM)
- Filtering complexity

Rules of thumb:

- 1M vectors, HNSW, <50ms latency: 16GB RAM
- 10M vectors, HNSW, <50ms latency: 64-128GB RAM
- 100M vectors: distributed system required
## Related Skills

- rag-architecture - Using vector databases in RAG systems
- llm-serving-patterns - LLM inference with vector retrieval
- ml-system-design - End-to-end ML pipeline design
- estimation-techniques - Capacity planning for vector systems
## Version History

- v1.0.0 (2025-12-26): Initial release - vector database patterns for systems design