As a senior specialist in DocumentDB and MongoDB optimization, you are tasked with analyzing MongoDB usage in the API and its Kafka consumer. Your goal is to identify all CRUD queries made to MongoDB in both projects and propose optimizations.
🚀 Phased Execution Strategy
This skill uses automatic checkpointing with session memory to handle the heavy analysis workload without timeouts.
Execution Workflow
The analysis is broken into 4 independent phases of a few minutes each. Each phase:
✅ Checks for previous phase data in session memory
📊 Processes its specific scope
💾 Saves results to /memories/session/mongodb-analysis-{phase}.json
📝 Shows a summary of findings
➡️ Indicates next phase to run
How to use:
Run: @workspace /fc-optimize-mongodb (or with phase=1)
Review phase summary
Continue: @workspace /fc-optimize-mongodb phase=2
Repeat until Phase 4 generates the final report
Recovery: If interrupted, just restart from the last completed phase. All previous work is preserved in session memory.
Phase 4: Report Generation
Duration: 1-2 min
Focus: Final markdown report + PlantUML diagrams
Output: mongodbAnalysis.md + .puml files
Scope and Source of Truth
CRITICAL Analyze ONLY the files in the api_python and kafka-consumer directories, ignoring files in api_python/admin/database/migrations/.
The two projects are separate but share the same MongoDB database, each with its own Beanie model definitions.
api_python owns the database schema and migrations, while kafka-consumer consumes Kafka events and updates the MongoDB collections accordingly. Treat api_python as the source of truth.
Analyze the current MongoDB schema (api_python/admin/database/indexes/definition.py, api_python/odm/models.py).
CRITICAL Do not update source code. Only analyze, writing intermediate results to /memories/session/ and the final report to docs/ai/{date:YYYY-mm-dd}-mongodbAnalysis/mongodbAnalysis.md.
CRITICAL Do not invent queries. List queries exactly as they appear in the codebase.
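The report path above uses a {date:YYYY-mm-dd} placeholder; a small sketch of how it can be resolved (the directory layout mirrors the convention stated above):

```python
from datetime import date
from pathlib import Path

def report_path(base: Path = Path("docs/ai")) -> Path:
    """Resolve the {date:YYYY-mm-dd} placeholder in the report location."""
    stamp = date.today().strftime("%Y-%m-%d")
    return base / f"{stamp}-mongodbAnalysis" / "mongodbAnalysis.md"
```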
Pre-Analysis Checks (Phase 1 only)
If the kafka-consumer folder doesn't exist, use askQuestions to propose creating a symlink
Use askQuestions to confirm that the user has updated the branches before starting
📋 Phase 1: Discovery & Inventory (2-3 min)
Objectives
Capture repository metadata (branch, commit hash)
Load collection inventory from index definitions
Build complete query inventory from both projects
Identify obsolete collections (collections with indexes but no queries)
Actions
Read api_python/admin/database/indexes/definition.py for collection list
Read api_python/odm/models.py for schema
Run parallel grep searches for all Beanie queries in both projects:
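A sketch of how such a scan can work. The method names in the regex are the standard Beanie/PyMongo CRUD surface, but the exact list to grep is an assumption here, not taken from the original skill:

```python
import re
from pathlib import Path

# Common Beanie/PyMongo CRUD entry points (assumed scan targets).
QUERY_PATTERN = re.compile(
    r"\.(find|find_one|find_all|get|insert|insert_one|insert_many|"
    r"update|update_one|update_many|delete|delete_one|delete_many|"
    r"aggregate|save|replace)\s*\("
)

def scan_queries(root: Path) -> dict[str, list[str]]:
    """Map each source file under root to the CRUD calls found in it."""
    found: dict[str, list[str]] = {}
    for path in root.rglob("*.py"):
        if "migrations" in path.parts:  # excluded per the scope rules above
            continue
        calls = QUERY_PATTERN.findall(path.read_text(errors="ignore"))
        if calls:
            found[str(path)] = calls
    return found
```

Obsolete-collection candidates are then the set difference between the collections listed in definition.py and those reachable from any scanned query.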
Index types to consider when proposing changes:
Partial Indexes (instead of sparse indexes for complex filters)
Sparse Indexes (for optional fields)
Partial Indexes with filter expressions (instead of indexing large arrays)
TTL Indexes (for automatic expiration)
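Hedged sketches of what these index types look like as createIndexes specifications (the collection and field names are illustrative, not taken from the codebase):

```python
# Index specifications as they would appear in MongoDB's createIndexes
# command. Field names below are hypothetical examples.

# Sparse index: only documents that HAVE the optional field are indexed.
sparse_idx = {"key": {"optional_field": 1}, "sparse": True}

# Partial index: finer control than sparse -- index only documents matching
# a filter. Partial filters support a restricted operator set ($exists: true,
# $eq, comparisons, $type); "field.0": {$exists: true} is the usual trick
# to index only documents whose array is non-empty.
partial_idx = {
    "key": {"instrument": 1},
    "partialFilterExpression": {"instrument.0": {"$exists": True}},
}

# TTL index: documents expire automatically after the given delay (seconds).
ttl_idx = {"key": {"created_at": 1}, "expireAfterSeconds": 86400}
```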
5.4 Index Report format
For each index change suggestion, provide:
Current index definition
Proposed index definition
Rationale for change
Migration approach (if needed)
Beanie code snippet
mongosh code snippet
Production verification query (explain())
6. Query Optimization
Propose optimizations to the queries themselves
Check for N+1 query patterns
Suggest batch operations where applicable
Recommend aggregation pipeline improvements
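The N+1 check above can be illustrated with a hedged sketch (the model and field names are hypothetical, and the "collection" is an in-memory stand-in): instead of one find_one per ID inside a loop, fetch the whole batch with a single $in-style query and join in memory.

```python
def fetch_one_by_one(collection, urns):
    """N+1 anti-pattern: one lookup (round-trip) per URN."""
    return [next((d for d in collection if d["music_urn"] == u), None)
            for u in urns]

def fetch_batched(collection, urns):
    """Batched form: one $in-style query, then an in-memory join.
    With Beanie this would be a single Model.find(In(field, urns)) call."""
    wanted = set(urns)
    by_urn = {d["music_urn"]: d for d in collection if d["music_urn"] in wanted}
    return [by_urn.get(u) for u in urns]
```

Both return the same rows in the same order, but the batched version issues one round-trip instead of N.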
7. Production Verification Strategy
CRITICAL For each optimization recommendation, categorize:
🔴 CRITICAL - Full collection scan, missing index, immediate fix
⚠️ PARTIAL - Using compound index prefix, may be acceptable
🪛 TO CHECK - Needs production explain() verification before deciding
For each 🪛 TO CHECK item, provide exact explain() query:
// Example for music_urn single-field query
db.musics_instances.find({
music_urn: "urn:music:xyz123"
}).explain("executionStats")
// Check for:
// - totalDocsExamined vs nReturned ratio
// - executionTimeMillis < 100ms threshold
// - indexName used by query planner
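The checks listed in the comments above can be automated; a sketch that evaluates an explain("executionStats") result against those thresholds (the thresholds mirror the comments; the document shape is the standard executionStats output):

```python
def triage_explain(explain: dict, max_ratio: float = 10.0, max_ms: int = 100) -> list[str]:
    """Flag suspicious numbers in an explain("executionStats") document."""
    stats = explain["executionStats"]
    issues = []
    examined, returned = stats["totalDocsExamined"], stats["nReturned"]
    if returned and examined / returned > max_ratio:
        issues.append(f"examined/returned ratio {examined}/{returned} too high")
    if returned == 0 and examined > 0:
        issues.append("documents examined but none returned")
    if stats["executionTimeMillis"] >= max_ms:
        issues.append(f"execution time {stats['executionTimeMillis']}ms over threshold")
    return issues
```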
docs/ai/{date:YYYY-mm-dd}-mongodbAnalysis/mongodbProposedSchema.puml - With recommended changes
13. Code Snippets
For each suggestion, provide:
Current code/index definition
Proposed code/index definition
Migration approach (if needed)
Production verification query (explain())
14. Consider These Specific Patterns
Instrument/Array Field Indexing:
Check music.instrument and music.instrument_structure indexing
Recommend partial indexes over sparse indexes for existence checks
Compound Index Effectiveness:
Verify queries using only first field of compound index
Flag as 🪛 TO CHECK for production verification
Delete Operations:
Check if delete queries on compound index prefix need dedicated index
Example: user_musics_id deletes using compound unique index
Update Many Operations:
Ensure indexed fields for Kafka Consumer update_many operations
Example: music_urn updates in recommendation collections
Projection Analysis:
Cross-reference query .project() calls with index fields
Identify covered query opportunities
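The covered-query check above can be sketched as a set comparison (a hedged illustration; real covered-query analysis must also verify the projection excludes _id, since returning _id forces a document fetch unless _id is in the index):

```python
def is_covered_candidate(query_fields: set[str], projection: set[str],
                         index_fields: set[str]) -> bool:
    """A query can be served from the index alone when every filtered and
    projected field lives in the index and _id is excluded."""
    return (query_fields | projection) <= index_fields and "_id" not in projection
```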
Priority Matrix
Use this matrix to prioritize findings:
| Severity | Criteria | Action | Example |
| --- | --- | --- | --- |
| 🔴 CRITICAL | Full collection scan on frequent operation, missing index on high-volume update/delete | Immediate fix required | Missing music_urn index on Kafka update_many |
| 🔥 HIGH | Schema inconsistency, race conditions, data integrity risk | Fix within sprint | Kafka consumer model drift missing created_at |
| ⚠️ MEDIUM | Using compound index prefix, suboptimal but functional | Optimize if bottleneck | Single-field query on compound index |
| 📀 OPTIMIZATION | Recommended optimization | Optimize if bottleneck | Index projection improvement |
| 🪛 TO CHECK | Assumption needs production verification | Test with explain() | Compound index effectiveness |
| ℹ️ LOW | Collection scan on infrequent admin task | Monitor, fix if becomes issue | Translation count query |
| 🗑️ OBSOLETE | Unused collection or index | Safe to remove | Collection with no queries |
Anti-Patterns to Avoid
DO NOT:
❌ Assume compound index prefix is always suboptimal (flag as 🪛 TO CHECK instead)
❌ Recommend indexes for one-time migration queries
❌ Suggest indexes for infrequent admin tasks (collection scan acceptable)
❌ Mark as "missing index" without checking if query is actually used
❌ Accept sparse index on large arrays without considering partial index alternative
❌ Ignore collections just because they seem unused (verify with query count)
❌ Provide generic recommendations (always include specific code snippets)
❌ Skip production verification for assumptions (always provide explain() queries)
DO:
✅ Cross-reference every collection against actual query usage
✅ Verify timestamp and correlation_id consistency
✅ Flag single-field queries on compound indexes as 🪛 TO CHECK
✅ Analyze instrument/array indexing for partial index opportunities
✅ Provide exact explain() queries for production verification
Success Criteria
A complete analysis MUST include:
Coverage Metrics:
Total collections analyzed
Collections with queries found
Collections with NO queries (obsolete candidates)
Total queries inventoried
Queries with full index coverage
Queries with partial coverage (🪛 TO CHECK)
Queries missing indexes (🔴 CRITICAL)
Schema Validation:
All collections checked for created_at and updated_at
Kafka-updated collections checked for correlation_id recommendation
Model consistency verified between api_python and kafka-consumer
Field existence or nested field indexing analyzed for partial index opportunities
Optimization Opportunities:
At least 3 specific 🪛 TO CHECK items with explain() queries
All 🔴 CRITICAL missing indexes identified with code snippets
Obsolete collections identified with removal recommendation
Covered query opportunities analyzed for top queries
Production Readiness:
Exact explain() queries for all 🪛 TO CHECK items
Migration approach for schema changes (timestamps, correlation_id)
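For the migration approach on timestamp/correlation_id schema changes, a hedged sketch of a backfill. The field names follow the examples in this document, but the in-memory list stands in for a collection; against a live database this logic would be a single update_many via pymongo or mongosh:

```python
from datetime import datetime, timezone

def backfill_timestamps(docs: list[dict]) -> int:
    """Backfill missing created_at/updated_at, mirroring an update_many
    whose filter is {"created_at": {"$exists": False}}."""
    now = datetime.now(timezone.utc)
    changed = 0
    for doc in docs:
        if "created_at" not in doc:  # the $exists: false filter
            doc["created_at"] = doc.get("updated_at", now)
            doc.setdefault("updated_at", now)
            changed += 1
    return changed
```

Running the filter-based form on the server keeps the backfill a single round-trip and makes it safely re-runnable, since already-migrated documents no longer match.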