Systematic root cause analysis using lean methods (5 Whys, Ishikawa, Gemba). Use when encountering unexpected behavior, errors, or defects to find and fix the true root cause rather than symptoms.
Systematically identify and fix the root cause of defects, errors, or unexpected behavior using lean problem-solving methods. Stop fixing symptoms - find the true cause.
Shu (守): Follow the 5 Whys method strictly; document each step.
Ha (破): Combine methods (Ishikawa + 5 Whys); adapt depth to problem complexity.
Ri (離): Develop domain-specific diagnostic patterns; teach others the methods.
When to use:
When NOT to use:
/rai-story-plan instead)
Inputs required:
Output:
| Method | Use When | Depth |
|---|---|---|
| 5 Whys | Single causal chain | Quick (5-10 min) |
| Ishikawa | Multiple possible causes | Medium (15-30 min) |
| Gemba | Need to observe actual behavior | Variable |
| A3 | Complex problems requiring documentation | Deep (30+ min) |
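The table above can be read as a small decision rule. The sketch below is one hypothetical way to encode it; `ProblemTraits`, `choose_method`, and the order in which traits are checked are assumptions made for illustration and are not part of the skill itself.

```python
from dataclasses import dataclass


@dataclass
class ProblemTraits:
    """Hypothetical description of a problem, used only for this sketch."""
    needs_observation: bool = False         # actual behavior has not been observed yet
    multiple_possible_causes: bool = False  # more than one plausible cause
    needs_documentation: bool = False       # complex problem requiring a written record


def choose_method(traits: ProblemTraits) -> str:
    """Map problem traits to a method name from the table.

    The check order (A3, Gemba, Ishikawa, 5 Whys) is an assumption.
    """
    if traits.needs_documentation:
        return "A3"
    if traits.needs_observation:
        return "Gemba"
    if traits.multiple_possible_causes:
        return "Ishikawa"
    # Default: a single causal chain is the 5 Whys case.
    return "5 Whys"
```

In practice the methods are often combined, so treat the returned name as a starting point rather than an exclusive choice.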
Go see the actual problem. Don't rely on descriptions.
Problem Statement Template:
WHAT is happening: [specific behavior]
WHEN it happens: [conditions/triggers]
WHERE it occurs: [location in code/system]
EXPECTED behavior: [what should happen]
Verification: Problem statement is specific and reproducible.
If you can't continue: Cannot reproduce → Gather more information, check logs.
Ask "Why?" five times to drill down to root cause.
Rules:
Template:
## 5 Whys Analysis
**Problem:** [Problem statement]
1. **Why?** [First-level cause]
→ Because: [Evidence/observation]
2. **Why?** [Second-level cause]
→ Because: [Evidence/observation]
3. **Why?** [Third-level cause]
→ Because: [Evidence/observation]
4. **Why?** [Fourth-level cause]
→ Because: [Evidence/observation]
5. **Why?** [Root cause]
→ Because: [Evidence/observation]
**Root Cause:** [Summary]
**Countermeasure:** [Fix]
Verification: Root cause is actionable and explains all symptoms.
If you can't continue: Chain branches → Use Ishikawa for multiple causes.
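As a minimal sketch of how the 5 Whys template could be filled in programmatically, the following renders a causal chain into the same Markdown layout. The names `Why` and `render_five_whys` are hypothetical, and treating the last entry in the chain as the root cause is an assumption of this sketch.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Why:
    cause: str     # the answer to "Why?"
    evidence: str  # the observation that supports that answer


def render_five_whys(problem: str, chain: list[Why], countermeasure: str) -> str:
    """Render a causal chain using the 5 Whys template layout."""
    if not chain:
        raise ValueError("at least one Why is required")
    # Every step in the template carries evidence, so reject steps without it.
    missing = [i for i, w in enumerate(chain, start=1) if not w.evidence.strip()]
    if missing:
        raise ValueError(f"missing evidence for step(s): {missing}")

    lines = ["## 5 Whys Analysis", f"**Problem:** {problem}"]
    for i, w in enumerate(chain, start=1):
        lines.append(f"{i}. **Why?** {w.cause}")
        lines.append(f"→ Because: {w.evidence}")
    # Assumption: the deepest cause reached is reported as the root cause.
    lines.append(f"**Root Cause:** {chain[-1].cause}")
    lines.append(f"**Countermeasure:** {countermeasure}")
    return "\n".join(lines)
```

Rejecting a step with empty evidence keeps the chain tied to observations rather than guesses, which is what the "Because" lines in the template are for.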
For problems with multiple potential causes, use the fishbone diagram.
Categories (6 M's for software):
                    ┌─── Method (process, algorithm)
                    │
                    ├─── Machine (hardware, infrastructure)
                    │
                    ├─── Material (data, inputs, dependencies)
PROBLEM ◄───────────┤
                    ├─── Measurement (metrics, monitoring)
                    │
                    ├─── Manpower (skills, knowledge gaps)
                    │
                    └─── Milieu (environment, configuration)
For each category, list potential causes:
## Ishikawa Analysis
**Problem:** [Problem statement]
### Method
- [ ] Algorithm logic error
- [ ] Missing edge case handling
- [ ] Incorrect sequence of operations
### Machine
- [ ] Resource constraints (memory, CPU)
- [ ] Platform-specific behavior
- [ ] Network issues
### Material
- [ ] Invalid input data
- [ ] Dependency version mismatch
- [ ] Missing configuration
### Measurement
- [ ] Inadequate error messages
- [ ] Missing logs at failure point
- [ ] Incorrect assertions in tests
### Manpower
- [ ] Documentation gap
- [ ] Unclear requirements
- [ ] Knowledge not shared
### Milieu
- [ ] Environment variable missing
- [ ] Different dev vs prod config
- [ ] File path differences
**Most Likely Causes:** [Top 2-3]
**Investigation Order:** [Priority]
Verification: At least 3 categories explored; most likely causes identified.
If you can't continue: All causes eliminated → Broaden investigation scope.
Test hypotheses systematically.
Investigation Log:
## Investigation Log
| Hypothesis | Test | Result | Conclusion |
|------------|------|--------|------------|
| [Cause 1] | [How tested] | [What happened] | Confirmed/Eliminated |
| [Cause 2] | [How tested] | [What happened] | Confirmed/Eliminated |
Verification: Root cause confirmed with evidence.
If you can't continue: No hypothesis confirmed → Return to Step 3, add categories.
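The investigation log can also be kept as structured data and rendered into the table format shown above. This is a sketch under assumptions: `Entry` and `InvestigationLog` are hypothetical names, and a boolean `confirmed` flag stands in for the Confirmed/Eliminated column.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entry:
    hypothesis: str
    test: str
    result: str
    confirmed: bool  # True -> "Confirmed", False -> "Eliminated"


@dataclass
class InvestigationLog:
    entries: list[Entry] = field(default_factory=list)

    def record(self, hypothesis: str, test: str, result: str, confirmed: bool) -> None:
        """Append one tested hypothesis to the log."""
        self.entries.append(Entry(hypothesis, test, result, confirmed))

    def confirmed_causes(self) -> list[str]:
        """Return the hypotheses that testing confirmed."""
        return [e.hypothesis for e in self.entries if e.confirmed]

    def to_markdown(self) -> str:
        """Render the log in the Investigation Log table layout."""
        rows = [
            "## Investigation Log",
            "| Hypothesis | Test | Result | Conclusion |",
            "|------------|------|--------|------------|",
        ]
        for e in self.entries:
            conclusion = "Confirmed" if e.confirmed else "Eliminated"
            rows.append(f"| {e.hypothesis} | {e.test} | {e.result} | {conclusion} |")
        return "\n".join(rows)
```

If `confirmed_causes()` comes back empty after every hypothesis has been tested, that corresponds to the "No hypothesis confirmed" case above.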
Fix the root cause, not symptoms.
Fix Checklist:
Verification: Problem no longer occurs; tests pass.
If you can't continue: Fix incomplete → Document partial fix, create follow-up task.
For significant issues, add prevention measures.
Prevention Options:
Verification: Prevention measure in place.
work/debug/{issue-name}/analysis.md (optional, for complex issues)

| Symptom Pattern | Common Root Causes |
|---|---|
| "Works locally, fails in CI" | Environment config, path differences, missing deps |
| "Intermittent failure" | Race condition, external dependency, resource limit |
| "Broke after update" | Dependency change, API breaking change, config drift |
| "Only fails with certain data" | Edge case, encoding issue, type coercion |
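The symptom table can double as a lookup for seeding hypotheses. The sketch below copies the four rows into a dictionary; `COMMON_ROOT_CAUSES` and `likely_causes` are hypothetical names, and exact matching on a normalized symptom string is an assumption made to keep the example small.

```python
from __future__ import annotations

# Rows copied from the symptom table; keys are lowercased for matching.
COMMON_ROOT_CAUSES: dict[str, list[str]] = {
    "works locally, fails in ci": ["Environment config", "Path differences", "Missing deps"],
    "intermittent failure": ["Race condition", "External dependency", "Resource limit"],
    "broke after update": ["Dependency change", "API breaking change", "Config drift"],
    "only fails with certain data": ["Edge case", "Encoding issue", "Type coercion"],
}


def likely_causes(symptom: str) -> list[str]:
    """Return candidate root causes for a known symptom pattern, else an empty list."""
    key = symptom.strip().strip('"').lower()
    return list(COMMON_ROOT_CAUSES.get(key, []))
```

The returned causes are hypotheses to test, not conclusions; each still needs evidence before it counts as a root cause.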
This skill implements the Jidoka principle: stop and fix quality issues immediately.
When you detect a defect during any skill:
/rai-debug to find root cause

| Problem Complexity | Max Investigation Time |
|---|---|
| Simple (single component) | 15 minutes |
| Medium (multiple components) | 30 minutes |
| Complex (system-wide) | 60 minutes |
If the investigation exceeds its time box, document the findings so far and escalate.
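One way to enforce the time box is a small timer checked between investigation steps. This is a minimal sketch: `TimeBox` and `TIME_BOX_MINUTES` are hypothetical names, the limits are taken from the table above, and the injectable `clock` parameter exists only to make the class testable.

```python
import time
from typing import Callable

# Limits in minutes, taken from the time box table.
TIME_BOX_MINUTES = {"simple": 15, "medium": 30, "complex": 60}


class TimeBox:
    """Tracks whether an investigation has exceeded its time box."""

    def __init__(self, complexity: str, clock: Callable[[], float] = time.monotonic):
        if complexity not in TIME_BOX_MINUTES:
            raise ValueError(f"unknown complexity: {complexity!r}")
        self.limit_seconds = TIME_BOX_MINUTES[complexity] * 60
        self._clock = clock
        self._start = clock()  # the time box starts when the object is created

    def elapsed_seconds(self) -> float:
        return self._clock() - self._start

    def exceeded(self) -> bool:
        """True once more time has passed than the limit allows."""
        return self.elapsed_seconds() > self.limit_seconds
```

A caller would create `TimeBox("medium")` at the start of the investigation and check `exceeded()` after each hypothesis test, switching to documenting and escalating once it returns `True`.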