Design MongoDB architectures with document modeling, indexing (ESR rule), sharding, aggregation pipelines, replica sets, and WiredTiger tuning.
Invoke this skill when designing, reviewing, or optimizing MongoDB database architectures for applications requiring flexible schema, document-based data models, horizontal scalability, or high availability.
Trigger Conditions:
Out of Scope:
NOW_ET using NIST/time.gov semantics (America/New_York, ISO-8601).
Abort Conditions:
Use Case: Fast path for common scenarios (80% of requests).
Steps:
- Equality: {status: "active"}
- Sort: .sort({created_at: -1})
- Range: {age: {$gt: 18}}
- Index: db.users.createIndex({status: 1, created_at: -1, age: 1})
Output: Schema design decision, 3-5 index recommendations with ESR justification, top 3 bottlenecks.
Use Case: Comprehensive architecture for production deployments.
Steps:
Embedding vs Referencing Table:
| Criteria | Embed | Reference |
|---|---|---|
| Relationship | 1-to-1, 1-to-many (low cardinality) | many-to-many, 1-to-many (high cardinality) |
| Access Pattern | Always queried together | Often queried independently |
| Update Frequency | Infrequent updates | Frequent updates to related data |
| Data Growth | Bounded, predictable | Unbounded, grows over time |
| Document Size | <16 MB total | Risk of exceeding 16 MB limit |
| Atomic Writes | Need atomicity across related data | Atomicity not required |
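The criteria in the table can be folded into a rough decision helper. The sketch below is illustrative only: `chooseModel` and its input field names are hypothetical, and the rule that any single "reference" signal wins is an assumption rather than MongoDB guidance.

```javascript
// Hypothetical helper: turns the table's criteria into a rough recommendation.
// Assumption: any single "reference" signal outweighs the convenience of embedding.
function chooseModel(rel) {
  const referenceSignals = [
    rel.cardinality === "many-to-many" || rel.cardinality === "one-to-many-high",
    rel.queriedIndependently,
    rel.frequentUpdates,
    rel.unboundedGrowth,
  ];
  return referenceSignals.some(Boolean) ? "reference" : "embed";
}

// A user's handful of addresses: bounded, always read together with the user.
const addresses = chooseModel({
  cardinality: "one-to-many-low",
  queriedIndependently: false,
  frequentUpdates: false,
  unboundedGrowth: false,
});

// A product's reviews: grow without bound and are paged independently.
const reviews = chooseModel({
  cardinality: "one-to-many-high",
  queriedIndependently: true,
  frequentUpdates: false,
  unboundedGrowth: true,
});

console.log(addresses, reviews); // embed reference
```

Real decisions usually weigh these criteria against each other (for example, atomicity needs may justify embedding despite frequent updates), so treat the output as a starting point.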
Schema Validation (MongoDB 8.0):
db.createCollection("users", {
  validator: {
    $jsonSchema: {
      bsonType: "object",
      required: ["email", "created_at"],
      properties: {
        email: {bsonType: "string", pattern: "^.+@.+$"},
        age: {bsonType: "int", minimum: 0, maximum: 150},
        created_at: {bsonType: "date"}
      }
    }
  }
})
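To see which documents the validator above would accept, the same rules can be approximated in plain JavaScript. This is a rough sketch, not a BSON validator: `checkUser` is a hypothetical helper, and `Number.isInteger` only approximates the `int` BSON type check.

```javascript
// Hypothetical client-side mirror of the validator rules (not a BSON validator).
function checkUser(doc) {
  const errors = [];
  // required: ["email", "created_at"]
  for (const field of ["email", "created_at"]) {
    if (!(field in doc)) errors.push(`missing ${field}`);
  }
  // email: string matching ^.+@.+$
  if ("email" in doc && !(typeof doc.email === "string" && /^.+@.+$/.test(doc.email))) {
    errors.push("email must be a string matching ^.+@.+$");
  }
  // age: integer in [0, 150] (optional field)
  if ("age" in doc && !(Number.isInteger(doc.age) && doc.age >= 0 && doc.age <= 150)) {
    errors.push("age must be an integer between 0 and 150");
  }
  // created_at: date
  if ("created_at" in doc && !(doc.created_at instanceof Date)) {
    errors.push("created_at must be a date");
  }
  return errors;
}

const ok = checkUser({ email: "ada@example.com", age: 30, created_at: new Date() });
const bad = checkUser({ email: "not-an-email", age: 200 });

console.log(ok.length, bad.length); // 0 3
```

The second document fails three rules: `created_at` is missing, `email` does not match the pattern, and `age` exceeds the maximum.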
Index Types and Use Cases:
| Index Type | Use Case | Example |
|---|---|---|
| Single Field | Equality or range on one field | db.users.createIndex({email: 1}) |
| Compound (ESR) | Multi-field queries (Equality, Sort, Range) | db.orders.createIndex({status: 1, created_at: -1, total: 1}) |
| Multikey | Arrays (e.g., tags, categories) | db.products.createIndex({tags: 1}) |
| Text | Full-text search | db.posts.createIndex({content: "text"}) |
| Geospatial | Location-based queries (2dsphere) | db.locations.createIndex({coordinates: "2dsphere"}) |
| Hashed | Sharding, equality-only queries | db.sessions.createIndex({session_id: "hashed"}) |
| Wildcard | Flexible schema with many fields | db.events.createIndex({"metadata.$**": 1}) |
| Partial | Index subset of documents | db.users.createIndex({last_login: 1}, {partialFilterExpression: {active: true}}) |
Covered Query Optimization:
// Covered query: all fields in index, no document access
db.orders.createIndex({user_id: 1, status: 1, total: 1})
db.orders.find({user_id: 12345, status: "shipped"}, {_id: 0, user_id: 1, status: 1, total: 1})
// explain() shows: totalDocsExamined: 0 (covered by index)
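The coverage condition can be checked mechanically before creating an index. The sketch below is a simplified approximation: `canBeCovered` is a hypothetical helper that only compares field names, and it ignores other conditions such as array (multikey) fields.

```javascript
// Hypothetical check: a query can be covered only if every filtered and
// returned field is part of the index, and _id is excluded (unless indexed).
function canBeCovered(indexSpec, filter, projection) {
  const indexed = new Set(Object.keys(indexSpec));
  const filterOk = Object.keys(filter).every((f) => indexed.has(f));
  const returned = Object.keys(projection).filter((f) => projection[f] === 1);
  const projectionOk = returned.every((f) => indexed.has(f));
  const idOk = indexed.has("_id") || projection._id === 0;
  return filterOk && projectionOk && idOk;
}

const index = { user_id: 1, status: 1, total: 1 };
const filter = { user_id: 12345, status: "shipped" };

const covered = canBeCovered(index, filter, { _id: 0, user_id: 1, status: 1, total: 1 });
// Omitting `_id: 0` returns _id by default, which forces a document fetch.
const notCovered = canBeCovered(index, filter, { user_id: 1, status: 1, total: 1 });

console.log(covered, notCovered); // true false
```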
ESR Rule Application:
// Query: Find active users, sort by created_at descending, filter age > 18
db.users.find({status: "active", age: {$gt: 18}}).sort({created_at: -1})
// Correct index (ESR):
// Equality: status (exact match)
// Sort: created_at (sort field)
// Range: age (range filter)
db.users.createIndex({status: 1, created_at: -1, age: 1})
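The ordering above can be expressed as a small helper that assembles an index specification from a query's parts. `buildEsrIndex` and its argument shape are hypothetical, introduced only to illustrate the rule.

```javascript
// Hypothetical helper: orders index keys as Equality, then Sort, then Range.
function buildEsrIndex({ equality = [], sort = {}, range = [] }) {
  const spec = {};
  for (const field of equality) spec[field] = 1;
  for (const [field, dir] of Object.entries(sort)) spec[field] = dir;
  for (const field of range) {
    // Skip fields already placed as equality or sort keys.
    if (!(field in spec)) spec[field] = 1;
  }
  return spec;
}

const spec = buildEsrIndex({
  equality: ["status"],
  sort: { created_at: -1 },
  range: ["age"],
});

console.log(JSON.stringify(spec)); // {"status":1,"created_at":-1,"age":1}
```

The resulting object can be passed directly to `createIndex`, since JavaScript preserves insertion order for these string keys.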
Shard Key Selection (MongoDB 8.0):
| Shard Key Type | Use Case | Pros | Cons |
|---|---|---|---|
| Hashed | Monotonically increasing IDs, even distribution | Uniform write distribution, no hotspots | Cannot use range queries efficiently on shard key |
| Ranged | Time-series data, natural ordering | Efficient range queries, targeted reads | Risk of hotspots if monotonic (e.g., timestamp) |
| Compound | Multi-tenant apps, complex access patterns | Balances distribution and query targeting | More complex to design |
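The hotspot risk in the table can be illustrated with a small simulation. This is a sketch under stated assumptions: FNV-1a stands in for MongoDB's actual hash function (which differs), and the three-shard range layout is hypothetical.

```javascript
// FNV-1a (32-bit) stands in for MongoDB's real hash function, which differs.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h;
}

const SHARDS = 3;
// Assumed ranged layout: chunk boundaries were fixed by earlier data, so every
// ID at or above 2000 falls in the last shard's open-ended range.
const rangedShard = (id) => (id < 1000 ? 0 : id < 2000 ? 1 : 2);
const hashedShard = (id) => fnv1a(String(id)) % SHARDS;

const ranged = new Array(SHARDS).fill(0);
const hashed = new Array(SHARDS).fill(0);
// 3000 new inserts with monotonically increasing IDs.
for (let id = 5000; id < 8000; id++) {
  ranged[rangedShard(id)]++;
  hashed[hashedShard(id)]++;
}

console.log("ranged:", ranged); // all 3000 writes land on the last shard
console.log("hashed:", hashed); // writes are split across shards
```

In a real cluster the balancer eventually splits and migrates chunks, but new monotonic inserts keep arriving at whichever shard owns the top range.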
Hashed Sharding Example:
// Enable sharding on database
sh.enableSharding("myapp")
// Shard collection with hashed shard key
sh.shardCollection("myapp.users", {user_id: "hashed"})
// MongoDB 8.0: Move unsharded collection to specific shard
db.adminCommand({moveCollection: "myapp.analytics", toShard: "shard02"})
Ranged Sharding Example (Time-Series):
// Shard by timestamp for time-series data
sh.shardCollection("myapp.events", {timestamp: 1})
// Zone sharding (MongoDB 8.0): Route data by date ranges to specific shards
sh.addShardToZone("shard01", "recent")
sh.updateZoneKeyRange("myapp.events", {timestamp: ISODate("2025-01-01")}, {timestamp: MaxKey}, "recent")
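Zone ranges include their lower bound and exclude their upper bound. The lookup below sketches that rule in plain JavaScript; `zoneFor` is a hypothetical helper, and a `null` maximum is used here to model `MaxKey`.

```javascript
// Hypothetical lookup: returns the zone whose [min, max) range contains ts.
// A null max models MaxKey (no upper bound).
function zoneFor(ts, zones) {
  for (const { min, max, zone } of zones) {
    if (ts >= min && (max === null || ts < max)) return zone;
  }
  return null; // no zone: the data may be placed on any shard
}

const zones = [{ min: new Date("2025-01-01T00:00:00Z"), max: null, zone: "recent" }];

const onBoundary = zoneFor(new Date("2025-01-01T00:00:00Z"), zones);
const before = zoneFor(new Date("2024-12-31T23:59:59Z"), zones);

console.log(onBoundary, before); // recent null
```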
Avoid Scatter-Gather Queries:
- db.users.find({user_id: 12345}) → targets single shard
- db.users.find({email: "[email protected]"}) → scatter-gather across all shards (slow)
Pipeline Stages (Execution Order Matters):
// Optimized aggregation: $match early, $project late, use indexes
db.orders.aggregate([
  // Stage 1: $match FIRST (uses index, reduces documents)
  {$match: {status: "shipped", created_at: {$gte: ISODate("2025-01-01")}}},
  // Stage 2: $sort (uses index if compound index exists)
  {$sort: {created_at: -1}},
  // Stage 3: $group (aggregation after filtering)
  {$group: {
    _id: "$user_id",
    total_spent: {$sum: "$total"},
    order_count: {$sum: 1}
  }},
  // Stage 4: $lookup AFTER $group (one join per user instead of one per order)
  {$lookup: {
    from: "users",
    localField: "_id",
    foreignField: "_id",
    as: "user_details"
  }},
  // Stage 5: $project LAST (reduce network transfer)
  {$project: {_id: 1, total_spent: 1, order_count: 1, user_details: 1}}
])
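To make the effect of filtering before grouping concrete, the sketch below reproduces the `$match` and `$group` stages over an in-memory array. The sample orders are hypothetical data, and plain array operations stand in for the server-side stages.

```javascript
// Hypothetical sample data standing in for the orders collection.
const orders = [
  { user_id: 1, status: "shipped", total: 40, created_at: new Date("2025-02-01") },
  { user_id: 1, status: "shipped", total: 60, created_at: new Date("2025-03-01") },
  { user_id: 2, status: "pending", total: 25, created_at: new Date("2025-02-15") },
  { user_id: 2, status: "shipped", total: 10, created_at: new Date("2024-12-31") },
];

// $match: drop non-matching documents before any further work.
const cutoff = new Date("2025-01-01");
const matched = orders.filter((o) => o.status === "shipped" && o.created_at >= cutoff);

// $group: one accumulator per user_id ($sum of total, $sum: 1 for the count).
const byUser = new Map();
for (const o of matched) {
  const acc = byUser.get(o.user_id) ?? { _id: o.user_id, total_spent: 0, order_count: 0 };
  acc.total_spent += o.total;
  acc.order_count += 1;
  byUser.set(o.user_id, acc);
}

const result = [...byUser.values()];
console.log(result); // [ { _id: 1, total_spent: 100, order_count: 2 } ]
```

Only two of the four orders survive the filter, so the grouping step touches half the data; on a real collection the reduction is typically far larger.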
Index Sort Optimization:
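- If a {status: 1, created_at: -1} index exists, $match + $sort uses the index (no in-memory sort).
- Without a matching index, the sort runs in memory, and large sorts may need to spill to disk (allowDiskUse: true).
Sharded Aggregation (MongoDB 8.0):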
- Place $match early to enable shard targeting.
Standard 3-Member Replica Set:
rs.initiate({
  _id: "myReplicaSet",
  members: [
    {_id: 0, host: "mongo1.example.com:27017", priority: 2}, // Primary (high priority)
    {_id: 1, host: "mongo2.example.com:27017", priority: 1}, // Secondary
    {_id: 2, host: "mongo3.example.com:27017", arbiterOnly: true} // Arbiter (no data)
  ]
})
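The reason for using an odd number of voting members follows from simple majority arithmetic, sketched below. `electionMath` is a hypothetical helper name, not a MongoDB function.

```javascript
// Votes needed to elect a primary, and how many voting members may fail.
function electionMath(votingMembers) {
  const majority = Math.floor(votingMembers / 2) + 1;
  return { majority, faultTolerance: votingMembers - majority };
}

console.log(electionMath(3)); // { majority: 2, faultTolerance: 1 }
console.log(electionMath(4)); // { majority: 3, faultTolerance: 1 }
console.log(electionMath(5)); // { majority: 3, faultTolerance: 2 }
```

A fourth member adds no fault tolerance over three. Note also that an arbiter votes but holds no data, so in the configuration above a "majority" write needs both data-bearing members to be available.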
Read Preferences:
- primary (default): All reads from primary (strong consistency).
- primaryPreferred: Read from primary, fallback to secondary if unavailable.
- secondary: Read from secondary (may read stale data).
- secondaryPreferred: Read from secondary, fallback to primary.
- nearest: Read from lowest-latency member.
Write Concerns:
- w: 1: Acknowledge after primary write (fast, risk of data loss on primary failure). Since MongoDB 5.0 this is the implicit default only for replica sets with arbiters, such as the set above; otherwise the implicit default is w: "majority".
- w: "majority": Acknowledge after majority of replica set members (slower, durable).
- w: 3: Acknowledge after 3 members (explicit count).
- j: true: Wait for write to journal (disk) before acknowledging.
MongoDB 8.0 Replica Set Enhancements:
WiredTiger Cache Sizing (MongoDB 8.0):
# Default: the larger of 50% of (RAM - 1 GB) or 256 MB