# Vector Databases

Vector database selection, embedding storage, approximate nearest neighbor (ANN) algorithms, and vector search optimization. Use this skill when choosing vector stores, designing semantic search, or optimizing similarity search performance.
melodic-software · 58 stars · Updated 2025-12-27

## When to Use This Skill
Use this skill when:

- Choosing between vector database options
- Designing semantic/similarity search systems
- Optimizing vector search performance
- Understanding ANN algorithm trade-offs
- Scaling vector search infrastructure
- Implementing hybrid search (vectors + filters)
Keywords: vector database, embeddings, vector search, similarity search, ANN, approximate nearest neighbor, HNSW, IVF, FAISS, Pinecone, Weaviate, Milvus, Qdrant, Chroma, pgvector, cosine similarity, semantic search
## Vector Database Comparison
### Managed Services
| Database | Strengths | Limitations | Best For |
|---|---|---|---|
| Pinecone | Fully managed, easy scaling, enterprise | Vendor lock-in, cost at scale | Enterprise production |
| Zilliz Cloud | Milvus-based, high performance | Learning curve | High-scale production |
| MongoDB Atlas Vector | Existing MongoDB users | Newer feature | MongoDB shops |
| Elastic Vector | Existing Elastic stack | Resource heavy | Search platforms |

Quick install: `npx skillvault add melodic-software/melodic-software-claude-code-plugins-plugins-systems-design-skills-vector-databases-skill-md`
### Self-Hosted Options

| Database | Strengths | Limitations | Best For |
|---|---|---|---|
| Milvus | Feature-rich, scalable, GPU support | Operational complexity | Large-scale production |
| Qdrant | Rust performance, filtering, easy | Smaller ecosystem | Performance-focused |
| Weaviate | Modules, semantic, hybrid | Memory usage | Knowledge applications |
| Chroma | Simple, Python-native | Limited scale | Development, prototyping |
| pgvector | PostgreSQL extension | Performance limits | Postgres shops |
| FAISS | Library (not a DB), fastest | No persistence, no filtering | Research, embedded |
### Selection Decision Tree

```
Need managed, don't want to run operations?
├── Yes → Pinecone (simplest) or Weaviate Cloud
└── No (self-hosted)
    └── Already using PostgreSQL?
        ├── Yes, <1M vectors → pgvector
        └── No
            └── Need maximum performance at scale?
                ├── Yes → Milvus or Qdrant
                └── No
                    └── Prototyping/development?
                        ├── Yes → Chroma
                        └── No → Qdrant (balanced choice)
```
## ANN Algorithms
### Algorithm Overview

**Exact KNN:**

- Searches ALL vectors
- O(n) time complexity
- Perfect accuracy
- Impractical at scale

**Approximate NN (ANN):**

- Searches a SUBSET of vectors
- O(log n) to O(1) complexity
- Near-perfect accuracy
- Practical at any scale
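To make the exact-KNN baseline concrete, here is a minimal brute-force sketch in NumPy. The function name and data are illustrative, not from any particular library:

```python
import numpy as np

def exact_knn(vectors: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Brute-force k-nearest neighbors: scans every vector, O(n) per query."""
    # Euclidean distance from the query to all n vectors at once
    dists = np.linalg.norm(vectors - query, axis=1)
    # Indices of the k smallest distances, sorted nearest-first
    return np.argsort(dists)[:k]

rng = np.random.default_rng(0)
db = rng.normal(size=(10_000, 64))   # 10K vectors, 64 dims
q = rng.normal(size=64)
top5 = exact_knn(db, q, k=5)
```

At 10K vectors this completes in milliseconds; the O(n) scan is what becomes impractical at hundreds of millions of vectors, motivating the ANN indexes below.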
### HNSW (Hierarchical Navigable Small World)

```
Layer 3: ○───────────────────────○   (sparse, long connections)
         │                       │
Layer 2: ○───○───────○───────○───○   (medium density)
         │   │       │       │   │
Layer 1: ○─○─○─○─○─○─○─○─○─○─○─○─○   (denser)
         │││││││││││││││││││││││││
Layer 0: ○○○○○○○○○○○○○○○○○○○○○○○○○   (all vectors)

Search: start at the top layer, greedily descend
```

- Fast: O(log n) search time
- High recall: typically >95%
- Memory: extra graph storage
| Parameter | Description | Trade-off |
|---|---|---|
| M | Connections per node | Memory vs. recall |
| ef_construction | Build-time search width | Build time vs. recall |
| ef_search | Query-time search width | Latency vs. recall |
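The greedy descent at the heart of HNSW can be illustrated on a toy single-layer graph. This sketch brute-forces graph construction and uses one layer only, so it is not real HNSW; all names and the graph-building shortcut are illustrative:

```python
import math

def build_knn_graph(points, M=4):
    """Toy graph: connect each point to its M nearest neighbors.

    Real HNSW builds the graph incrementally across multiple layers;
    this O(n^2) construction is only for demonstration."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: math.dist(p, points[j]))
        graph[i] = others[:M]
    return graph

def greedy_search(points, graph, query, entry=0):
    """Start at an entry point; hop to the closest neighbor until stuck."""
    current = entry
    while True:
        best = min(graph[current], key=lambda j: math.dist(points[j], query))
        if math.dist(points[best], query) < math.dist(points[current], query):
            current = best
        else:
            return current   # local optimum: no neighbor is closer

points = [(float(i),) for i in range(50)]   # 1-D chain of points
graph = build_knn_graph(points, M=4)
nearest = greedy_search(points, graph, query=(37.2,))   # returns 37
```

On real, high-dimensional data a single greedy pass can get stuck in local optima; HNSW mitigates this with the layered structure (long-range hops on upper layers) and a beam of `ef_search` candidates instead of a single current node.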
### IVF (Inverted File Index)

```
Clustering phase:
┌─────────────────────────────────────────┐
│ Cluster vectors into K centroids        │
│                                         │
│   ●        ●        ●        ●          │
│  /│\      /│\      /│\      /│\         │
│ ○○○○○    ○○○○○    ○○○○○    ○○○○○        │
│ Cluster 1 Cluster 2 Cluster 3 Cluster 4 │
└─────────────────────────────────────────┘
```

Search phase:

1. Find the nprobe nearest centroids.
2. Search only those clusters.
3. Much faster than exhaustive search.
| Parameter | Description | Trade-off |
|---|---|---|
| nlist | Number of clusters | Build time vs. search quality |
| nprobe | Clusters to search | Latency vs. recall |
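A minimal IVF sketch in NumPy. For brevity it samples centroids at random instead of training them with k-means, so it is illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
db = rng.normal(size=(5_000, 32)).astype(np.float32)

# "Training": pick nlist centroids (real IVF runs k-means here)
nlist = 16
centroids = db[rng.choice(len(db), size=nlist, replace=False)]

# Assign every vector to its nearest centroid -> inverted lists
assignments = np.argmin(
    np.linalg.norm(db[:, None] - centroids[None], axis=2), axis=1)
inverted_lists = {c: np.where(assignments == c)[0] for c in range(nlist)}

def ivf_search(query, nprobe=4):
    """Search only the nprobe clusters whose centroids are nearest the query."""
    order = np.argsort(np.linalg.norm(centroids - query, axis=1))[:nprobe]
    candidates = np.concatenate([inverted_lists[c] for c in order])
    dists = np.linalg.norm(db[candidates] - query, axis=1)
    return candidates[np.argmin(dists)]
```

With `nprobe=nlist` the search degenerates to an exact scan; shrinking `nprobe` trades recall for latency, which is exactly the knob in the table above.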
### IVF-PQ (Product Quantization)

```
Original vector (128 dims):
[0.1, 0.2, ..., 0.9]              (128 × 4 bytes = 512 bytes)

PQ-compressed (8 subvectors, 8-bit codes):
[23, 45, 12, 89, 56, 34, 78, 90]  (8 bytes)
```

Memory reduction: 64x
Accuracy trade-off: ~5% recall drop
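A product-quantization sketch in NumPy. Real PQ trains each codebook with k-means; the random codebooks and all names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
D, M, K = 128, 8, 256          # dims, subvectors, codewords per codebook
sub = D // M                   # 16 dims per subvector

# One codebook of K centroids per subvector (random here, k-means in practice)
codebooks = rng.normal(size=(M, K, sub)).astype(np.float32)

def pq_encode(v):
    """Map each 16-dim subvector to the index of its nearest codeword."""
    parts = v.reshape(M, sub)
    codes = [np.argmin(np.linalg.norm(codebooks[m] - parts[m], axis=1))
             for m in range(M)]
    return np.array(codes, dtype=np.uint8)   # 8 bytes instead of 512

def pq_decode(codes):
    """Approximate reconstruction by concatenating the chosen codewords."""
    return np.concatenate([codebooks[m][c] for m, c in enumerate(codes)])

v = rng.normal(size=D).astype(np.float32)
codes = pq_encode(v)           # codes.nbytes == 8, v.nbytes == 512
```

The 64x figure in the diagram is exactly `v.nbytes / codes.nbytes`; the recall drop comes from distances being computed against the decoded approximation rather than the original vector.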
### Algorithm Comparison

| Algorithm | Search Speed | Memory | Build Time | Recall |
|---|---|---|---|---|
| Flat/Brute | Slow (O(n)) | Low | None | 100% |
| IVF | Fast | Low | Medium | 90-95% |
| IVF-PQ | Very fast | Very low | Medium | 85-92% |
| HNSW | Very fast | High | Slow | 95-99% |
| HNSW+PQ | Very fast | Medium | Slow | 90-95% |
### When to Use Which

```
< 100K vectors:
└── Flat index (exact search is fast enough)

100K - 1M vectors:
└── HNSW (best recall/speed trade-off)

1M - 100M vectors:
├── Memory available → HNSW
└── Memory constrained → IVF-PQ or HNSW+PQ

> 100M vectors:
└── Sharded IVF-PQ or distributed HNSW
```
## Distance Metrics
### Common Metrics

| Metric | Formula | Range | Best For |
|---|---|---|---|
| Cosine Similarity | A·B / (‖A‖ ‖B‖) | [-1, 1] | Normalized embeddings |
| Dot Product | A·B | (-∞, ∞) | When magnitude matters |
| Euclidean (L2) | √Σ(Aᵢ-Bᵢ)² | [0, ∞) | Absolute distances |
| Manhattan (L1) | Σ\|Aᵢ-Bᵢ\| | [0, ∞) | High-dimensional sparse |
### Metric Selection

```
Embeddings pre-normalized (unit vectors)?
├── Yes → Cosine = Dot Product (use Dot, it's faster)
└── No
    └── Is magnitude meaningful?
        ├── Yes → Dot Product
        └── No → Cosine Similarity
```

Note: most embedding models output normalized vectors, so dot product is usually the best choice.
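The equivalence for unit vectors is easy to check directly (a small NumPy demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=384), rng.normal(size=384)

def cosine(x, y):
    return np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))

# Normalize to unit length, as most embedding models already do
a_n = a / np.linalg.norm(a)
b_n = b / np.linalg.norm(b)

# For unit vectors the norms are 1, so cosine similarity IS the dot product
assert np.isclose(cosine(a_n, b_n), np.dot(a_n, b_n))
```

This is why databases expose dot product as the cheaper option: it skips the two norm computations and the division on every comparison.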
## Filtering and Hybrid Search
### Pre-filtering vs. Post-filtering

**Pre-filtering (Filter → Search):**

1. Apply the metadata filter (e.g. `category = "electronics"`): 10K of 1M vectors remain.
2. Run the vector search on those 10K vectors. Much faster, and every hit is guaranteed to match the filter.

**Post-filtering (Search → Filter):**

1. Run the vector search on all 1M vectors and return the top-1000.
2. Apply the metadata filter. May return fewer than K results!
### Hybrid Search Architecture

```
Query: "wireless headphones under $100"
                  │
          ┌───────┴───────┐
          ▼               ▼
  ┌───────────────┐ ┌───────────────┐
  │ Vector search │ │ Filter build  │
  │ "wireless     │ │ price < 100   │
  │  headphones"  │ │               │
  └───────────────┘ └───────────────┘
          │               │
          └───────┬───────┘
                  ▼
          ┌───────────────┐
          │Combine results│
          └───────────────┘
```
| Metadata Type | Index Strategy | Query Example |
|---|---|---|
| Categorical | Bitmap/hash index | category = "books" |
| Numeric range | B-tree | price BETWEEN 10 AND 50 |
| Keyword search | Inverted index | tags CONTAINS "sale" |
| Geospatial | R-tree/geohash | location NEAR (lat, lng) |
## Scaling Strategies
### Sharding Approaches

```
Naive sharding (by ID):
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│ IDs 0-N │ │IDs N-2N │ │IDs 2N-3N│
└─────────┘ └─────────┘ └─────────┘
Query → Search ALL shards → Merge results

Semantic sharding (by cluster):
┌─────────┐ ┌─────────┐ ┌─────────┐
│ Shard 1 │ │ Shard 2 │ │ Shard 3 │
│  Tech   │ │ Health  │ │ Finance │
│  docs   │ │  docs   │ │  docs   │
└─────────┘ └─────────┘ └─────────┘
Query → Route to relevant shard(s) → Faster!
```
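Routing in semantic sharding reduces to a nearest-centroid lookup. A sketch with hand-picked 2-D centroids (real systems derive one or more centroids per shard from clustering the corpus):

```python
import numpy as np

# One representative centroid per shard (illustrative values)
shard_centroids = {
    "tech": np.array([1.0, 0.0]),
    "health": np.array([0.0, 1.0]),
    "finance": np.array([-1.0, 0.0]),
}

def route(query_vec, top_shards=1):
    """Send the query only to the shard(s) whose centroid is nearest."""
    ranked = sorted(shard_centroids,
                    key=lambda s: np.linalg.norm(shard_centroids[s] - query_vec))
    return ranked[:top_shards]

route(np.array([0.9, 0.2]))   # routes to the "tech" shard
```

Routing to `top_shards > 1` trades extra fan-out for recall, since relevant documents near a cluster boundary may live in a neighboring shard.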
### Replication

```
            ┌───────────────┐
            │ Load Balancer │
            └───────────────┘
              │     │     │
              ▼     ▼     ▼
┌─────────┐ ┌─────────┐ ┌─────────┐
│Replica 1│ │Replica 2│ │Replica 3│
│ (Read)  │ │ (Read)  │ │ (Read)  │
└─────────┘ └─────────┘ └─────────┘
      │         │         │
      └─────────┼─────────┘
                │
           ┌─────────┐
           │ Primary │
           │ (Write) │
           └─────────┘
```
### Scaling Decision Matrix

| Scale (vectors) | Architecture | Replication |
|---|---|---|
| < 1M | Single node | Optional |
| 1-10M | Single node, more RAM | For HA |
| 10-100M | Sharded, few nodes | Required |
| 100M-1B | Sharded, many nodes | Required |
| > 1B | Sharded + tiered | Required |
### Index Build Optimization

| Optimization | Description | Impact |
|---|---|---|
| Batch insertion | Insert in batches of 1K-10K | 10x faster |
| Parallel build | Multi-threaded index construction | 2-4x faster |
| Incremental index | Add to existing index | Avoids rebuild |
| GPU acceleration | Use GPU for training (IVF) | 10-100x faster |
### Query Optimization

| Optimization | Description | Impact |
|---|---|---|
| Warm cache | Keep index in memory | 10x latency reduction |
| Query batching | Batch similar queries | Higher throughput |
| Reduce dimensions | PCA, random projection | 2-4x faster |
| Early termination | Stop when "good enough" | Lower latency |
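Dimension reduction via Gaussian random projection approximately preserves pairwise distances (the Johnson-Lindenstrauss property). A NumPy sketch; the sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, d_new = 2_000, 1536, 256

X = rng.normal(size=(n, d)).astype(np.float32)

# Random projection matrix; the 1/sqrt(d_new) scaling preserves
# expected norms, so projected distances approximate originals
P = rng.normal(size=(d, d_new)).astype(np.float32) / np.sqrt(d_new)
X_small = X @ P        # 6x fewer dims -> proportionally cheaper distances
```

Distances in the projected space are typically within a few percent of the originals at 256 output dims, which is often an acceptable trade for the 6x reduction in per-query work; PCA gives a tighter fit but requires a training pass.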
### Memory Optimization

```
Memory per vector:
1536 dims × 4 bytes ≈ 6KB per vector

1M vectors:
  Raw:          ~6GB
  + HNSW graph: +2-4GB (M-dependent)
  = 8-10GB total

With PQ (64 subquantizers):
  1M vectors: ~64MB → ~100x reduction
```
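The arithmetic above can be packaged as a tiny estimator for the raw-vector portion (index overhead such as the HNSW graph is excluded here; the function name is illustrative):

```python
def vector_memory_gb(n_vectors: int, dims: int, bytes_per_dim: float = 4) -> float:
    """Raw storage for the vectors themselves, in decimal GB.

    Excludes index overhead (HNSW graph, IVF centroids, metadata)."""
    return n_vectors * dims * bytes_per_dim / 1e9

raw_gb = vector_memory_gb(1_000_000, 1536)    # ~6.1 GB for float32
pq_gb = vector_memory_gb(1_000_000, 64, 1)    # ~0.064 GB with 64 one-byte PQ codes
```

The ratio between the two results is the ~100x reduction quoted in the box; the few-GB gap to the box's "6GB" figure is just decimal vs. binary GB rounding.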
## Operational Considerations
### Backup and Recovery

| Strategy | Description | RPO/RTO |
|---|---|---|
| Snapshots | Periodic full backup | Hours |
| WAL replication | Write-ahead log streaming | Minutes |
| Real-time sync | Synchronous replication | Seconds |
### Monitoring Metrics

| Metric | Description | Alert Threshold |
|---|---|---|
| Query latency p99 | 99th percentile latency | > 100ms |
| Recall | Search accuracy | < 90% |
| QPS | Queries per second | Capacity dependent |
| Memory usage | Index memory | > 80% |
| Index freshness | Time since last update | Domain dependent |
### Index Maintenance

- Compaction: merge small segments
- Reindex: rebuild a degraded index
- Vacuum: remove deleted vectors
- Optimize: tune parameters

Schedule these tasks during low-traffic periods.
## Common Patterns
### Multi-Tenant Vector Search

Option 1: namespace/collection per tenant (`tenant_1_collection`, `tenant_2_collection`, ...)

- Pro: complete isolation
- Con: many indexes, operational overhead

Option 2: single shared collection with a tenant filter (`metadata: { tenant_id: "..." }`, pre-filtered by `tenant_id`)

- Pro: simpler operations
- Con: requires efficient filtering
### Real-Time Updates

```
Write path:
┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Write     │    │   Buffer    │    │   Merge     │
│   Request   │───▶│  (Memory)   │───▶│  to Index   │
└─────────────┘    └─────────────┘    └─────────────┘
```

Strategy:

1. Buffer writes in memory.
2. Periodically merge them into the main index.
3. Search both the main index and the buffer.
4. Compact periodically.
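The strategy above can be sketched with a brute-force "main index" plus an in-memory buffer. The class and threshold are illustrative, not any real engine's API:

```python
import numpy as np

class BufferedIndex:
    """Toy index: new vectors land in a buffer; searches scan main + buffer."""

    def __init__(self, dims: int, merge_threshold: int = 1000):
        self.main = np.empty((0, dims), dtype=np.float32)
        self.buffer: list[np.ndarray] = []
        self.merge_threshold = merge_threshold

    def add(self, v: np.ndarray) -> None:
        self.buffer.append(v.astype(np.float32))
        if len(self.buffer) >= self.merge_threshold:
            self.merge()                        # periodic compaction

    def merge(self) -> None:
        """Fold buffered vectors into the main index (a rebuild in real systems)."""
        if self.buffer:
            self.main = np.vstack([self.main, np.stack(self.buffer)])
            self.buffer = []

    def search(self, q: np.ndarray) -> np.ndarray:
        """Nearest vector across the merged index AND the unmerged buffer,
        so writes are visible immediately."""
        pending = np.stack(self.buffer) if self.buffer else np.empty((0, len(q)))
        all_vecs = np.vstack([self.main, pending])
        return all_vecs[np.argmin(np.linalg.norm(all_vecs - q, axis=1))]
```

Real engines use an ANN index for the main segment and a flat scan only over the small buffer, which is why the buffer must stay bounded and merges must run regularly.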
### Embedding Versioning

```
Version 1 embeddings ──┐
                       ├──▶ Parallel indexes during migration
Version 2 embeddings ──┘
                       │
                       │    ┌─────────────────────┐
                       └───▶│ Gradual reindexing  │
                            │ Blue-green switch   │
                            └─────────────────────┘
```
## Cost Estimation
### Storage Costs

```
Storage (GB) = vectors × dimensions × bytes per dim × replicas / 10⁹
Cost / month = Storage (GB) × $/GB/month

Example:
10M vectors × 1536 dims × 4 bytes × 3 replicas ≈ 184 GB
At $0.10/GB/month ≈ $18.40/month storage
```

Note: serving from memory costs considerably more than disk storage.
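The worked example as code, using decimal GB; the price is just the example's assumption:

```python
def storage_cost_per_month(n_vectors: int, dims: int, replicas: int,
                           bytes_per_dim: int = 4,
                           usd_per_gb: float = 0.10) -> tuple[float, float]:
    """Return (storage in decimal GB, monthly storage cost in USD)."""
    gb = n_vectors * dims * bytes_per_dim * replicas / 1e9
    return gb, gb * usd_per_gb

gb, usd = storage_cost_per_month(10_000_000, 1536, replicas=3)
# gb ~ 184.3, usd ~ 18.43 (the text rounds to 184 GB and $18.40)
```

The same formula with the serving medium's $/GB (RAM or NVMe instead of object storage) gives the much larger in-memory serving cost the note warns about.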
### Compute Costs

Factors:

- QPS (queries per second)
- Latency requirements
- Index type (HNSW needs more RAM)
- Filtering complexity

Rules of thumb:

- 1M vectors, HNSW, <50ms latency: 16GB RAM
- 10M vectors, HNSW, <50ms latency: 64-128GB RAM
- 100M vectors: distributed system required
## Related Skills

- rag-architecture - Using vector databases in RAG systems
- llm-serving-patterns - LLM inference with vector retrieval
- ml-system-design - End-to-end ML pipeline design
- estimation-techniques - Capacity planning for vector systems
## Version History

- v1.0.0 (2025-12-26): Initial release - vector database patterns for systems design