技能内容

Overview

Random fixes waste time and create new bugs. Quick patches mask underlying issues.

Core principle: ALWAYS find root cause before attempting fixes. Symptom fixes are failure.

Violating the letter of this process is violating the spirit of debugging.

The Iron Law

NO FIXES WITHOUT ROOT CAUSE INVESTIGATION FIRST

If you haven't completed Phase 1, you cannot propose fixes.

When to Use

Use for ANY technical issue:

Test failures
Bugs in production
Unexpected behavior
Performance problems
Build failures
Integration issues

Use this ESPECIALLY when:

Under time pressure (emergencies make guessing tempting)
"Just one quick fix" seems obvious

Read Error Messages Carefully
- Don't skip past errors or warnings
- They often contain the exact solution
- Read stack traces completely
- Note line numbers, file paths, error codes
Reproduce Consistently
- Can you trigger it reliably?
- What are the exact steps?
- Does it happen every time?
- If not reproducible → gather more data, don't guess
Check Recent Changes
- What changed that could cause this?
- Git diff, recent commits
- New dependencies, config changes
- Environmental differences

Gather Evidence in Multi-Component Systems

WHEN system has multiple components (CI → build → signing, API → service → database):

BEFORE proposing fixes, add diagnostic instrumentation:

For EACH component boundary:
  - Log what data enters component
  - Log what data exits component
  - Verify environment/config propagation
  - Check state at each layer

Run once to gather evidence showing WHERE it breaks
THEN analyze evidence to identify failing component
THEN investigate that specific component

Example (multi-layer system):

# Layer 1: Workflow
echo "=== Secrets available in workflow: ==="
echo "IDENTITY: ${IDENTITY:+SET}${IDENTITY:-UNSET}"

# Layer 2: Build script
echo "=== Env vars in build script: ==="
env | grep IDENTITY || echo "IDENTITY not in environment"

# Layer 3: Signing script
echo "=== Keychain state: ==="
security list-keychains
security find-identity -v

# Layer 4: Actual signing
codesign --sign "$IDENTITY" --verbose=4 "$APP"

This reveals: Which layer fails (secrets → workflow ✓, workflow → build ✗)

Trace Data Flow

WHEN error is deep in call stack:

Quick version:
- Where does bad value originate?
- What called this with bad value?
- Keep tracing up until you find the source
- Fix at source, not at symptom

Excuse	Reality
"Issue is simple, don't need process"	Simple issues have root causes too. Process is fast for simple bugs.
"Emergency, no time for process"	Systematic debugging is FASTER than guess-and-check thrashing.
"Just try this first, then investigate"	First fix sets the pattern. Do it right from the start.
"I'll write test after confirming fix works"	Untested fixes don't stick. Test first proves it.
"Multiple fixes at once saves time"	Can't isolate what worked. Causes new bugs.
"Reference too long, I'll adapt the pattern"	Partial understanding guarantees bugs. Read it completely.
"I see the problem, let me fix it"	Seeing symptoms ≠ understanding root cause.
"One more fix attempt" (after 2+ failures)	3+ failures = architectural problem. Question pattern, don't fix again.

Phase	Key Activities	Success Criteria
1. Root Cause	Read errors, reproduce, check changes, gather evidence	Understand WHAT and WHY
2. Pattern	Find working examples, compare	Identify differences
3. Hypothesis	Form theory, test minimally	Confirmed or new hypothesis
4. Implementation	Create test, fix, verify	Bug resolved, tests pass

Observe the Symptom

Error: git init failed in /Users/jesse/project/packages/core

Find Immediate Cause - What code directly causes this?

await execFileAsync('git', ['init'], { cwd: projectDir })

Ask: What Called This?

WorktreeManager.createSessionWorktree(projectDir, sessionId)
  → called by Session.initializeWorkspace()
  → called by Session.create()
  → called by test at Project.create()

Keep Tracing Up - What value was passed?
- projectDir = '' (empty string!)
- Empty string as cwd resolves to process.cwd()

Find Original Trigger - Where did empty string come from?

const context = setupCoreTest() // Returns { tempDir: '' }
Project.create('name', context.tempDir) // Accessed before beforeEach!

async function gitInit(directory: string) {
  const stack = new Error().stack
  console.error('DEBUG git init:', {
    directory,
    cwd: process.cwd(),
    nodeEnv: process.env.NODE_ENV,
    stack,
  })
  await execFileAsync('git', ['init'], { cwd: directory })
}

# Run tests one-by-one, stop at first polluter
for f in src/**/*.test.ts; do
  npm test "$f" && [ -d .git ] && echo "POLLUTER: $f" && break
done

function createProject(name: string, workingDirectory: string) {
  if (!workingDirectory || workingDirectory.trim() === '') {
    throw new Error('workingDirectory cannot be empty')
  }
  if (!existsSync(workingDirectory)) {
    throw new Error(`workingDirectory does not exist: ${workingDirectory}`)
  }
}

function initializeWorkspace(projectDir: string, sessionId: string) {
  if (!projectDir) {
    throw new Error('projectDir required for workspace initialization')
  }
}

async function gitInit(directory: string) {
  if (process.env.NODE_ENV === 'test') {
    const normalized = normalize(resolve(directory))
    const tmpDir = normalize(resolve(tmpdir()))
    if (!normalized.startsWith(tmpDir)) {
      throw new Error(`Refusing git init outside temp dir during tests`)
    }
  }
}

async function gitInit(directory: string) {
  logger.debug('About to git init', {
    directory,
    cwd: process.cwd(),
    stack: new Error().stack,
  })
}

Systematic Debugging | Skills Pool

Systematic Debugging

Systematic Debugging

Overview

The Iron Law

When to Use

The Four Phases

Phase 1: Root Cause Investigation

Phase 2: Pattern Analysis

Phase 3: Hypothesis and Testing

Phase 4: Implementation

Red Flags - STOP and Follow Process

Common Rationalizations

Quick Reference

Technique: Root Cause Tracing

The Tracing Process

Adding Stack Traces

Finding Which Test Causes Pollution

Technique: Defense-in-Depth Validation

Why Multiple Layers

The Four Layers

Applying Defense-in-Depth

Real-World Impact

Automation Audit Ops

Github Qa Labels

Jupyter Notebook

Tidb Integrationtest Recorder

Quality Nonconformance

Hugging Face Trackio