Deep consistency audit of the entire repository infrastructure. Launches 4 parallel specialist agents to find factual errors, code bugs, count mismatches, and cross-document inconsistencies. Then fixes all issues and loops until clean. Use when: after making broad changes, before releases, or when user says "audit", "find inconsistencies", "check everything".
Run a comprehensive consistency audit across the entire repository, fix all issues found, and loop until clean.
Before spawning agents, run the mechanical parity checks:
```bash
python3 scripts/check-skill-integrity.py --verbose
```
This catches four classes of bug that agent-based audits have historically missed:
1. `allowed-tools` ↔ body tool-invocation parity (e.g. body spawns `Task` but `Task` is not in `allowed-tools` — the v1.7.0 PR #92 miss).
2. `argument-hint` ↔ body flag parity (flags documented but not advertised, or vice versa).
3. Broken `[text](path#anchor)` links (the `#category-11-numerical-discipline` miss on PR #87).
4. `paths:` ↔ skill implementation parity (rule claims a skill follows a protocol but the skill body has none of the protocol keywords — the `/interview-me` miss on PR #92).

If Phase 0 reports P0 or P1 findings, fix them (or tune the regex if they are false positives) before launching the 4 agents. The mechanical layer is cheaper and more precise than agent prompts for these classes.
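A minimal sketch of the first parity class, assuming skills carry a YAML-style `allowed-tools:` line in their frontmatter. The tool list and regexes here are illustrative; the real check lives in `scripts/check-skill-integrity.py`:

```python
from __future__ import annotations

import re

# Illustrative tool names; the real script derives these from the runtime.
TOOLS = ("Task", "Bash", "Write", "Edit", "Read", "Grep", "Glob")

def parity_findings(skill_md: str) -> list[str]:
    """Return tools the skill body invokes that allowed-tools does not advertise."""
    m = re.search(r"^allowed-tools:\s*\[?([^\]\n]*)", skill_md, re.MULTILINE)
    allowed = {t.strip() for t in m.group(1).split(",")} if m else set()
    body = skill_md.split("---", 2)[-1]  # text after the frontmatter block
    used = {t for t in TOOLS if re.search(rf"\b{t}\b", body)}
    return sorted(used - allowed)
```

Anything this returns is a P0/P1 candidate: the body promises a tool the frontmatter never grants.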
Launch these 4 agents simultaneously using Task with subagent_type=general-purpose. Each agent's prompt must tell it to read .claude/references/audit-pet-peeves.md and explicitly check for each class of bug before reporting clean. The pet-peeves file is a living catalogue of drift patterns review bots have caught; it grows with each PR.
Focus: guide/workflow-guide.qmd
Focus: all executable code in the repo — `.claude/hooks/*.py`, `.claude/hooks/*.sh`, `scripts/*.py`, `scripts/*.sh`, `.claude/scripts/*.sh`. Not just `.claude/hooks/` — when PR #93 added new code under `scripts/`, the original narrow scope meant Copilot + Codex caught 5 bugs the audit missed.
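For reference, that scope can be enumerated mechanically so the agent never silently narrows it. This is a sketch; `audit_targets` is a hypothetical helper, not an existing repo script:

```python
from __future__ import annotations

from pathlib import Path

# Mirrors the glob patterns named in the agent's focus line.
SCOPE = [
    ".claude/hooks/*.py",
    ".claude/hooks/*.sh",
    "scripts/*.py",
    "scripts/*.sh",
    ".claude/scripts/*.sh",
]

def audit_targets(repo_root: str) -> list[str]:
    """Every executable file the code-audit agent must cover, repo-relative."""
    root = Path(repo_root)
    return sorted(
        str(p.relative_to(root)) for pattern in SCOPE for p in root.glob(pattern)
    )
```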
Hook-specific checks (Stop/PreToolUse/SessionStart protocols, CLAUDE_PROJECT_DIR usage, hash-length consistency) apply only to .claude/hooks/. Everything below applies to ALL executable code:
- `/tmp/` usage in anything that manages state (should use `~/.claude/sessions/`)
- Hash-length consistency (`[:8]` across all hooks) [hooks only]
- Fail-open behavior (`try`/`except` with `sys.exit(0)`). Python `read_text()` must catch `UnicodeError` (not just `OSError`) if the script is promised fail-open for corrupt files. Bash `set -u` without `set -e` or explicit post-command checks does NOT catch command failures — verify.
- Stop-hook blocking via `{"decision": "block", "reason": "..."}` on stdout — modern; this is what `log-reminder.py` uses and it works correctly
- Non-blocking hooks always exit 0. PreCompact hooks MUST exit 0 (stdout is discarded by the harness — use stderr for diagnostics)
- `from __future__ import annotations` for Python 3.8+ compatibility
- Correct SessionStart field names (`source`, not `type`)

Focus: `.claude/skills/*/SKILL.md` and `.claude/rules/*.md`
- `disable-model-invocation: true` is set where expected
- `allowed-tools` values are sensible
- `allowed-tools` actually covers every tool the skill body invokes. For every `Task` spawn, `Bash` command, or `Write`/`Edit` call mentioned in the skill's Steps / Phases / Workflow body, verify the tool appears in the `allowed-tools` array. Common miss: skill body says "spawn agent-X via Task with context=fork" but `Task` is absent from `allowed-tools` — runtime permission error or silent bypass. Caught this class of bug after Codex/Copilot flagged it on PR #92 (4 skills promised `Task` in their Post-Flight sections but 3 of 4 had no `Task` permission).
- `paths:` scope matches skill implementation. If rule X lists skill Y in `paths:`, verify skill Y actually implements the protocol rule X mandates. A rule claiming a skill follows a protocol is meaningless if the skill doesn't.
- All `paths:` entries reference existing directories (including `templates/`)

Focus: `README.md`, `docs/index.html`, `docs/workflow-guide.html`
Categorize each finding:
Common false alarms to watch for:
- `## Title` inside `:::` divs — this is standard syntax, NOT a heading bug
- `allowed-tools` linter warning — known linter bug (Claude Code issue #25380), the field IS valid
- `CHANGELOG.md` under past version headings — those are snapshots; do NOT update
- `log-reminder.py` outputting `{"decision": "block"}` with `sys.exit(0)` — this IS the modern Claude Code Stop-hook block protocol, NOT a bug

Count drift specifically: search for every phrasing variant. A common failure mode is that `replace_all` on one phrasing (e.g., "26 skills") misses sibling phrasings in the same repo. When checking counts, grep for ALL of:
"N skills", "N skill " (with space)"N slash commands""N specialized" (as in "N specialized agents")"template's N" (informal count in prose)"skills," vs "skills, and" are treated as different strings by replace_all
Verify zero matches for the OLD number across the whole tree before declaring clean.

Apply fixes in parallel where possible. For each fix:
If guide/workflow-guide.qmd was modified:
```bash
quarto render guide/workflow-guide.qmd
cp guide/workflow-guide.html docs/workflow-guide.html
```
After fixing, launch a fresh set of 4 agents to verify.
Max loops: 5 (to prevent infinite cycling)
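The fix-and-verify cycle with its 5-round cap can be sketched as follows; `run_audit` and `fix` are hypothetical callables standing in for one round of 4 agents and the fix phase:

```python
def audit_until_clean(run_audit, fix, max_loops: int = 5) -> bool:
    """Loop audit -> fix until a round reports no issues, or the cap is hit."""
    for _ in range(max_loops):
        issues = run_audit()   # one round: spawn agents, collect genuine findings
        if not issues:
            return True        # CLEAN
        fix(issues)            # apply fixes, then re-verify with fresh agents
    return False               # issues remaining after max_loops
```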
These are real bugs found across 7 rounds — check for these specifically:
| Bug Pattern | Where to Check | What Went Wrong |
|---|---|---|
| Stale counts ("19 skills" → "21") | Guide, README, landing page | Added skills but didn't update all mentions |
| Hook exit codes | All Python hooks | Exit 2 in PreCompact silently discards stdout |
| Hook field names | post-compact-restore.py | SessionStart uses source, not type |
| State in /tmp/ | All Python hooks | Should use ~/.claude/sessions/<hash>/ |
| Hash length mismatch | All Python hooks | Some used [:12], others [:8] |
| Missing fail-open | Python hooks __main__ | Unhandled exception → exit 1 → confusing behavior |
| Python 3.10+ syntax | All Python hooks | Type hints like `dict \| None` break on older runtimes |
| Missing directories | quality_reports/specs/ | Referenced in rules but never created |
| Always-on rule listing | Guide + README | meta-governance omitted from listings |
| macOS-only commands | Skills, rules | open without xdg-open fallback |
| Stale hook references | Rules, guide, CHANGELOG, settings.json | Removed hooks still mentioned somewhere |
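To make the recurring hook bugs concrete, here is a hypothetical skeleton that folds in the table's fixes — fail-open entry point, `~/.claude/sessions/` state, 8-character hash, the SessionStart `source` field, and pre-3.10-safe annotations. Names are illustrative, not this repo's actual hooks:

```python
from __future__ import annotations  # keeps modern type hints 3.8-safe

import hashlib
import json
import sys
from pathlib import Path

def session_dir(transcript_path: str) -> Path:
    # State lives under ~/.claude/sessions/, never /tmp/; hash is [:8] everywhere.
    digest = hashlib.sha256(transcript_path.encode()).hexdigest()[:8]
    return Path.home() / ".claude" / "sessions" / digest

def run_hook(stdin=sys.stdin) -> int:
    """Entry point; wire up as `sys.exit(run_hook())` under __main__."""
    try:
        payload = json.load(stdin)
        if payload.get("source") == "resume":  # SessionStart sends `source`, not `type`
            _ = session_dir(payload.get("transcript_path", ""))  # restore state from here
        return 0
    except Exception as exc:  # fail-open: a broken hook must never block the session
        print(f"hook error: {exc}", file=sys.stderr)  # stderr only; stdout may be parsed
        return 0
```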
After each round, report:
## Round N Audit Results
### Issues Found: X genuine, Y false alarms
| # | Severity | File | Issue | Status |
|---|----------|------|-------|--------|
| 1 | Critical | file.py:42 | Description | Fixed |
| 2 | Medium | file.qmd:100 | Description | Fixed |
### Verification
- [ ] No stale counts (grep confirms)
- [ ] All hooks have fail-open + future annotations
- [ ] Guide renders successfully
- [ ] docs/ updated
### Result: [CLEAN | N issues remaining]