Use when encountering any bug, test failure, or unexpected behavior. 4-phase root cause investigation — NO fixes without understanding the problem first.
Random fixes waste time and create new bugs. Quick patches mask underlying issues.
Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.
Violating the letter of this process is violating the spirit of debugging.
The Iron Law
NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST
If you haven't completed Phase 1, you cannot propose fixes.
When to Use
Use for ANY technical issue:
Test failures
Bugs in production
Unexpected behavior
Performance problems
Build failures
Integration issues
Use this ESPECIALLY when:
Under time pressure (emergencies make guessing tempting)
"Just one quick fix" seems obvious
You've already tried multiple fixes
Previous fix didn't work
You don't fully understand the issue
Don't skip when:
Issue seems simple (simple bugs have root causes too)
You're in a hurry (rushing guarantees rework)
Someone wants it fixed NOW (systematic is faster than thrashing)
The Four Phases
You MUST complete each phase before proceeding to the next.
Phase 1: Root Cause Investigation
BEFORE attempting ANY fix:
1. Read Error Messages Carefully
Don't skip past errors or warnings
They often contain the exact solution
Read stack traces completely
Note line numbers, file paths, error codes
Action: Use read_file on the relevant source files. Use search_files to find the error string in the codebase.
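As a minimal sketch of reading a stack trace completely, the snippet below pulls every file path, line number, and function name out of a Python traceback, plus the final error line. The helper names (`extract_frames`, `summarize`) and the sample traceback are illustrative assumptions, not part of any tool named in this skill.

```python
import re

# Matches the standard CPython frame line: File "path", line N, in func
FRAME_RE = re.compile(
    r'^\s*File "(?P<path>[^"]+)", line (?P<line>\d+), in (?P<func>\S+)',
    re.MULTILINE,
)

# Hypothetical traceback text, used only for illustration.
SAMPLE = '''Traceback (most recent call last):
  File "tests/test_module.py", line 12, in test_name
    result = compute(data)
  File "src/module.py", line 48, in compute
    return total / count
ZeroDivisionError: division by zero
'''


def extract_frames(traceback_text):
    """Return (path, line, function) for every frame, outermost first."""
    return [
        (m["path"], int(m["line"]), m["func"])
        for m in FRAME_RE.finditer(traceback_text)
    ]


def summarize(traceback_text):
    """Collect all frames plus the final error line of the traceback."""
    lines = traceback_text.strip().splitlines()
    return {"frames": extract_frames(traceback_text), "error": lines[-1]}
```

The last frame is where the exception was raised, but the earlier frames show how execution got there, which is why the whole trace is kept rather than only the final line.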
2. Reproduce Consistently
Can you trigger it reliably?
What are the exact steps?
Does it happen every time?
If not reproducible → gather more data, don't guess
Action: Use the terminal tool to run the failing test or trigger the bug:
# Run specific failing test
pytest tests/test_module.py::test_name -v
# Run with verbose output
pytest tests/test_module.py -v --tb=long
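To answer "does it happen every time?" with data rather than impressions, one option is to run the same command repeatedly and record how often it fails. This is a rough sketch; `failure_rate` is a hypothetical helper, and it assumes a non-zero exit code means the bug was triggered.

```python
import subprocess


def failure_rate(command, runs=10):
    """Run `command` `runs` times and return the fraction of failing runs.

    Assumption: a non-zero exit code means the failure was reproduced.
    """
    failures = 0
    for _ in range(runs):
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            failures += 1
    return failures / runs
```

For example, `failure_rate(["pytest", "tests/test_module.py::test_name", "-q"], runs=20)` returning 1.0 suggests a deterministic bug, while a value between 0 and 1 points toward flakiness (ordering, timing, shared state) and calls for more data before forming a hypothesis.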
3. Check Recent Changes
What changed that could cause this?
Git diff, recent commits
New dependencies, config changes
Action:
# Recent commits
git log --oneline -10
# Uncommitted changes
git diff
# Changes in specific file
git log -p --follow src/problematic_file.py | head -100
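When the list of recent commits is long, a binary search over it narrows down which change introduced the failure; `git bisect` automates the same idea. The sketch below shows only the search logic. `first_bad_commit` and the `is_bad` callback are hypothetical; in practice `is_bad` would check out the commit and run the failing test.

```python
def first_bad_commit(commits, is_bad):
    """Return the first commit for which is_bad() is true.

    Assumptions: `commits` is ordered oldest to newest, the first entry is
    known good, the last entry is known bad, and once the bug appears it
    stays present in every later commit.
    """
    good, bad = 0, len(commits) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(commits[mid]):
            bad = mid   # bug already present: look earlier
        else:
            good = mid  # still working: look later
    return commits[bad]
```

With n commits this needs about log2(n) test runs instead of n, which matters when each reproduction is slow.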
4. Gather Evidence in Multi-Component Systems
WHEN system has multiple components (API → service → database, CI → build → deploy):
BEFORE proposing fixes, add diagnostic instrumentation:
For EACH component boundary:
Log what data enters the component
Log what data exits the component
Verify environment/config propagation
Check state at each layer
Run once to gather evidence showing WHERE it breaks.
THEN analyze evidence to identify the failing component.
THEN investigate that specific component.
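The boundary instrumentation described above can be sketched as a logging decorator that records what enters and exits each component. Everything here is illustrative: `log_boundary` and the three stand-in layers (`handle_request`, `load_user`, `query_db`) are hypothetical names, not part of any real system.

```python
import functools
import logging

logger = logging.getLogger("boundary")


def log_boundary(component):
    """Log inputs, outputs, and exceptions at one component boundary."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            logger.info("%s IN  args=%r kwargs=%r", component, args, kwargs)
            try:
                result = func(*args, **kwargs)
            except Exception:
                logger.exception("%s RAISED", component)
                raise
            logger.info("%s OUT result=%r", component, result)
            return result
        return wrapper
    return decorator


@log_boundary("api")
def handle_request(payload):
    return load_user(payload.get("user_id"))


@log_boundary("service")
def load_user(user_id):
    return query_db(user_id)


@log_boundary("database")
def query_db(user_id):
    # Hypothetical stand-in for a real query.
    if user_id is None:
        return None
    return {"id": user_id, "name": "example"}
```

After calling `logging.basicConfig(level=logging.INFO)`, a single run of `handle_request({})` shows `user_id` already being `None` when it leaves the API layer, so the investigation focuses there rather than on the database.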
5. Trace Data Flow
WHEN error is deep in the call stack:
Where does the bad value originate?
What called this function with the bad value?
Keep tracing upstream until you find the source
Fix at the source, not at the symptom
Action: Use search_files to trace references:
# Find where the function is called
search_files("function_name\\(", path="src/", file_glob="*.py")
# Find where the variable is set
search_files("variable_name\\s*=", path="src/", file_glob="*.py")
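Static search can be complemented at runtime: when a bad value is detected, record the call chain that delivered it so the trace points upstream. A minimal sketch using the standard `traceback` module follows; the order-handling functions are hypothetical examples.

```python
import traceback


def set_price(order, price):
    if price < 0:
        # Drop set_price's own frame, then keep the two nearest callers.
        stack = traceback.extract_stack()[:-1]
        chain = " -> ".join(frame.name for frame in stack[-2:])
        raise ValueError(f"negative price {price!r}; called via {chain}")
    order["price"] = price


def apply_discount(order, discount):
    # The symptom surfaces in set_price, but the bad value is computed here.
    set_price(order, order["price"] - discount)


def load_order(raw):
    order = {"price": raw["price"]}
    apply_discount(order, raw["discount"])
    return order
```

Here the error is raised in `set_price`, yet the chain `load_order -> apply_discount` shows the value came from an unvalidated discount, so that is where a fix would belong.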
Phase 1 Completion Checklist
Error messages fully read and understood
Issue reproduced consistently
Recent changes identified and reviewed
Evidence gathered (logs, state, data flow)
Problem isolated to specific component/code
Root cause hypothesis formed
STOP: Do not proceed to Phase 2 until you understand WHY it's happening.
Phase 2: Pattern Analysis
Find the pattern before fixing:
1. Find Working Examples
Locate similar working code in the same codebase
What works that's similar to what's broken?
Action: Use search_files to find comparable patterns:
Phase 1 (Root Cause): Read errors, reproduce, check changes, gather evidence, trace data flow. Success criterion: Understand WHAT and WHY.
Phase 2 (Pattern): Find working examples, compare, identify differences. Success criterion: Know what's different.
Phase 3 (Hypothesis): Form theory, test minimally, one variable at a time. Success criterion: Confirmed or new hypothesis.
Phase 4 (Implementation): Create regression test, fix root cause, verify. Success criterion: Bug resolved, all tests pass.
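Comparing a working example against the broken one and then changing one variable at a time can be sketched as follows. This assumes the two cases can be expressed as flat configuration dictionaries; `differing_keys`, `isolate_cause`, and the `passes` callback are hypothetical names introduced for illustration.

```python
def differing_keys(working, broken):
    """Keys whose values differ between the working and broken configs."""
    keys = sorted(set(working) | set(broken))
    return [k for k in keys if working.get(k) != broken.get(k)]


def isolate_cause(working, broken, passes):
    """Apply each difference to the working config on its own.

    Returns the keys that, changed individually, make `passes` fail.
    """
    culprits = []
    for key in differing_keys(working, broken):
        candidate = dict(working)  # start from the known-good state
        if key in broken:
            candidate[key] = broken[key]
        else:
            candidate.pop(key, None)
        if not passes(candidate):
            culprits.append(key)
    return culprits
```

If no single change reproduces the failure, the cause is likely an interaction between differences, which is itself a new hypothesis to test rather than a reason to start guessing.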
Hermes Agent Integration
Investigation Tools
Use these Hermes tools during Phase 1:
search_files — Find error strings, trace function calls, locate patterns
read_file — Read source code with line numbers for precise analysis
terminal — Run tests, check git history, reproduce bugs
web_search/web_extract — Research error messages, library docs
With delegate_task
For complex multi-component debugging, dispatch investigation subagents:
delegate_task(
    goal="Investigate why [specific test/behavior] fails",
    context="""
    Follow systematic-debugging skill:
    1. Read the error message carefully
    2. Reproduce the issue
    3. Trace the data flow to find root cause
    4. Report findings — do NOT fix yet
    Error: [paste full error]
    File: [path to failing code]
    Test command: [exact command]
    """,
    toolsets=['terminal', 'file']
)
With test-driven-development
When fixing bugs:
Write a test that reproduces the bug (RED)
Debug systematically to find root cause
Fix the root cause (GREEN)
The test proves the fix and prevents regression
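A small illustration of that loop, using a made-up `average` function: the first test is written to fail against the buggy version (RED), then the root cause (division by the length of an empty list) is fixed so it passes (GREEN). The function, the tests, and the choice to return `0.0` for empty input are assumptions for the example only.

```python
def average(values):
    # Root-cause fix: empty input used to raise ZeroDivisionError.
    if not values:
        return 0.0
    return sum(values) / len(values)


def test_average_of_empty_list_returns_zero():
    # Reproduces the original bug; fails before the fix, passes after it.
    assert average([]) == 0.0


def test_average_of_values_unchanged():
    # Guards existing behavior so the fix does not introduce a regression.
    assert average([2, 4, 6]) == 4.0
```

Keeping the reproducing test in the suite after the fix is what turns a one-off repair into lasting protection.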
Real-World Impact
From debugging sessions:
Systematic approach: 15-30 minutes to fix
Random fixes approach: 2-3 hours of thrashing
First-time fix rate: 95% (systematic) vs 40% (random fixes)
New bugs introduced: Near zero (systematic) vs common (random fixes)
No shortcuts. No guessing. Systematic always wins.