Audit, repair, and continuously correct Harness Engineering drift in repositories that already have some form of agent control plane. Use when inspecting root and local AGENTS.md, docs/PLANS.md, docs/OBSERVABILITY.md, docs/exec-plans, generated Harness manifests, and repo-local `python3 scripts/check_harness.py`; then auto-fix low-risk drift, refresh stale managed files, and create a remediation execution plan for high-risk semantic rewrites. Especially relevant for prompts such as `检查这个项目的 Harness 是否健康`, `修复文档索引和 AGENTS 偏离`, `纠正 Harness 漂移`, `做 doc-gardening`, `做 Harness audit`, or `$harness-garden`.
Detect and repair drift between the Harness Engineering control plane and the actual repository. The control plane is only useful when it tells the truth — this skill makes sure it does.
This is a low-freedom skill. Drift detection must be systematic and exhaustive. Remediation must be conservative and mechanically verifiable. Follow the four phases below in order. Do not skip phases. Do not auto-fix anything until Phase 2 (Semantic Drift Audit) is complete and all findings are triaged.
Announce at start: I'm using harness-garden to audit and repair control plane drift.
A control plane that doesn't reflect reality is worse than no control plane at all. Agents trust AGENTS.md and docs/ to navigate the codebase. When those files describe deleted modules, reference renamed paths, or list outdated build commands, agents make confident but wrong decisions. OpenAI calls this "a graveyard of stale rules" — and it's how monolithic manuals fail.
The primary risk this skill fights is false confidence: the control plane looks healthy (files exist, structure is intact, check_harness.py passes) but the content no longer reflects reality. A manifest check tells you the control plane exists; a garden audit tells you it's still true.
A secondary risk is harness over-complexity. Per Anthropic's observation, every harness component encodes an assumption about what the model can't do — and those assumptions go stale fast. A module AGENTS.md created when the codebase had three modules may be unnecessary overhead after a refactor consolidated them into one. The garden should strip harness complexity that no longer pulls its weight.
All four Harness skills (harness-bootstrap, harness-garden, harness-feat, harness-fix) use these terms consistently:
| Term | Definition |
|---|---|
| Control plane | The set of AGENTS.md, docs/, scripts/, and manifest that guide agent work |
| Execution plan | A versioned, checkpointed plan in docs/exec-plans/ |
| Managed doc | A doc created and maintained under harness lifecycle |
| Unmanaged doc | An existing team doc the harness must NOT overwrite |
| Manifest | Machine-readable inventory of control plane artifacts (docs/generated/harness-manifest.md) |
| Preflight | Verification check run before starting any task (scripts/check_harness.py) |
| Drift | When control plane artifacts no longer reflect repo reality |
Start every garden audit with the mechanical checks. These are fast, deterministic, and surface the most obvious problems.
python3 scripts/check_harness.py
Record the output. The preflight's pass/fail results are the baseline findings list for the rest of the audit.
If check_harness.py does not exist, the repo has no harness — suggest running harness-bootstrap instead and stop.
Read docs/generated/harness-manifest.md. For every entry in the Control Plane Artifacts table:

- The listed path must exist
- The entry's type must match reality (e.g., an entry typed as a directory is actually a directory)

Then scan the repo for harness-related files that are NOT in the manifest:

- AGENTS.md files (root or nested) not listed
- docs/exec-plans/active/*.md files not tracked
- docs/generated/ files not listed
- scripts/check_harness.py variants not listed

Untracked artifacts are a sign of manual edits that bypassed the harness lifecycle. They need to be added to the manifest or removed.
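The untracked-artifact scan can be sketched as a substring match against the manifest text; a real implementation would parse the manifest table properly, so treat this as an approximation:

```python
# Hedged sketch: find AGENTS.md files and active plans that the manifest
# never mentions. Substring matching against the raw manifest text is
# crude but errs on the side of fewer false "untracked" reports.
from pathlib import Path

def find_untracked(repo_root: str = ".") -> list[str]:
    root = Path(repo_root)
    manifest = root / "docs" / "generated" / "harness-manifest.md"
    tracked = manifest.read_text() if manifest.exists() else ""
    candidates = list(root.rglob("AGENTS.md"))
    active = root / "docs" / "exec-plans" / "active"
    if active.exists():
        candidates += list(active.glob("*.md"))
    # A file is untracked if its repo-relative path never appears in the manifest.
    return sorted(
        str(p.relative_to(root)) for p in candidates
        if str(p.relative_to(root)) not in tracked
    )
```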
Check all markdown links in ALL managed docs (not just root AGENTS.md):

- [text](path) where path is a relative file reference must resolve to an existing file
- Anchor links (#section-name) should reference existing headings in the target file

Record all broken links with their source file and line.
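A minimal checker for the relative-link rule, assuming standard [text](path) markdown links; anchor (#section-name) validation is omitted for brevity:

```python
# Sketch: report relative markdown links that do not resolve to a file.
import re
from pathlib import Path

# Captures the path portion of [text](path) or [text](path#anchor);
# pure-anchor links like (#section) intentionally don't match.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)(?:#[^)]*)?\)")

def broken_links(doc: Path) -> list[tuple[int, str]]:
    """Return (line_number, target) pairs for unresolvable relative links."""
    findings = []
    for lineno, line in enumerate(doc.read_text().splitlines(), start=1):
        for target in LINK_RE.findall(line):
            if target.startswith(("http://", "https://", "mailto:")):
                continue  # external links are out of scope here
            if not (doc.parent / target).exists():
                findings.append((lineno, target))
    return findings
```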
Go beyond file existence — check if the CONTENT matches the actual repository. This is the core of what makes a garden audit valuable beyond check_harness.py.
Read references/drift-taxonomy.md for the complete catalog of drift types. The summary below covers the most critical checks.
For the root AGENTS.md and every module-level AGENTS.md:
- Build and test commands should work: invoke them in --help mode to verify they exist. If a command mentions a specific binary (e.g., ./gradlew, pnpm), confirm the binary exists.
- Referenced paths (e.g., "src/auth/") must exist.

Read docs/ARCHITECTURE.md and cross-reference against the actual directory structure:
- Described modules and components should match the packages declared in package.json / pom.xml / pyproject.toml / Cargo.toml

Read docs/OBSERVABILITY.md:
Scan docs/exec-plans/active/:
- Finished plans should have been moved to completed/
- Long-inactive plans still in active/ should be reviewed for abandonment

Scan docs/exec-plans/completed/:
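The abandonment check can be approximated with file modification times; git log dates would be more accurate, and the 30-day threshold here is an assumption, not a harness rule:

```python
# Hedged sketch: flag active plans untouched for max_age_days, using
# filesystem mtime as a cheap proxy for last activity.
import time
from pathlib import Path

def stale_active_plans(repo_root: str = ".", max_age_days: int = 30) -> list[str]:
    active = Path(repo_root) / "docs" / "exec-plans" / "active"
    if not active.exists():
        return []
    cutoff = time.time() - max_age_days * 86400
    return sorted(
        p.name for p in active.glob("*.md") if p.stat().st_mtime < cutoff
    )
```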
- Links back to docs/PLANS.md should be valid

Check docs/PLANS.md:
- Its index should list every plan in exec-plans/active/ and exec-plans/completed/

Search for new directories or packages that look like modules but have no AGENTS.md and are not mentioned in the root AGENTS.md:
- Directories containing a package manifest: package.json, pom.xml, Cargo.toml, go.mod, etc.
- Directories with recent commit activity (git log --since)

Each discovered module should be evaluated against the same criteria from harness-bootstrap Phase 1.4 to decide if it warrants its own AGENTS.md.
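Module discovery can be sketched by looking for package manifests without a sibling AGENTS.md; the manifest filename list is illustrative, not exhaustive:

```python
# Hedged sketch: any non-root directory holding a package manifest but no
# AGENTS.md is a candidate new module worth evaluating.
from pathlib import Path

PACKAGE_MANIFESTS = {"package.json", "pom.xml", "Cargo.toml", "go.mod", "pyproject.toml"}

def modules_without_agents(repo_root: str = ".") -> list[str]:
    root = Path(repo_root)
    hits = set()
    for manifest in root.rglob("*"):
        if manifest.name in PACKAGE_MANIFESTS and manifest.is_file():
            module_dir = manifest.parent
            if module_dir != root and not (module_dir / "AGENTS.md").exists():
                hits.add(str(module_dir.relative_to(root)))
    return sorted(hits)
```

Candidates returned here are inputs to the harness-bootstrap Phase 1.4 evaluation, not automatic AGENTS.md creations.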
Check for module-level AGENTS.md files that reference modules which have been deleted, renamed, or consolidated into other modules.
These are high-confidence drift — the AGENTS.md is provably stale.
Evaluate whether current harness components are still load-bearing. Per Anthropic's rippable harness principle, every harness component encodes an assumption about what the model can't do — and those assumptions go stale as models improve or the codebase evolves.
Check for:
- Module AGENTS.md files whose modules were consolidated and no longer justify separate guidance
- A docs/ARCHITECTURE.md that essentially duplicates what's already in the README
- Directories such as docs/references/ or docs/exec-plans/completed/ that contain only .gitkeep after months of use — are they still needed?

Flag these as "complexity reduction candidates" — suggestions, not auto-fixes.
Classify every finding from Phases 1 and 2 by risk level. Then apply fixes in order of confidence.
Read references/auto-fix-rules.md for the complete list of what can be safely auto-fixed.
Each finding gets a severity based on two axes:
Impact — how much damage does this drift cause to agent accuracy?
| Level | Description |
|---|---|
| Critical | Agent will make materially wrong decisions (wrong build commands, deleted APIs) |
| High | Agent will be confused or misled (broken links, stale architecture) |
| Medium | Agent may waste time (outdated patterns, missing index entries) |
| Low | Cosmetic or informational (stale dates, trivial naming) |
Certainty — how confident are we that this is actually drift?
| Level | Description |
|---|---|
| Proven | Mechanically verified (file doesn't exist, command not found) |
| High | Strong evidence from code search (imports don't match, path renamed) |
| Medium | Inferential evidence (pattern seems outdated, module looks consolidated) |
| Low | Suspicion only (semantic claim is hard to verify) |
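The two axes combine into the triage routing described in the next sections; a minimal sketch, with the convention that in-between combinations default to proposed fixes:

```python
# Sketch of the triage rule: flag anything critical or uncertain, auto-fix
# only proven low-stakes drift, and propose everything in between.
IMPACT = {"low": 0, "medium": 1, "high": 2, "critical": 3}
CERTAINTY = {"low": 0, "medium": 1, "high": 2, "proven": 3}

def triage(impact: str, certainty: str) -> str:
    # Flag when Certainty <= Medium OR Impact = Critical.
    if impact == "critical" or CERTAINTY[certainty] <= CERTAINTY["medium"]:
        return "flag-for-human"
    # Auto-fix only when Impact <= Medium AND Certainty = Proven.
    if IMPACT[impact] <= IMPACT["medium"] and certainty == "proven":
        return "auto-fix"
    return "proposed-fix"
```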
Apply only when Impact ≤ Medium AND Certainty = Proven:
- The mechanical fixes enumerated in references/auto-fix-rules.md
- Moving finished execution plans from active/ to completed/

After applying auto-fixes, re-run python3 scripts/check_harness.py to verify the fixes didn't introduce new issues.
Apply when Certainty ≥ High but Impact = High or the fix requires content rewriting:
For each proposed fix, include what the fix changes, the evidence behind it, the risk if the evidence is wrong, and the proposed change as a diff or description.
Apply when Certainty ≤ Medium OR Impact = Critical:
For each flag, include the concern, the evidence that triggered it, and a suggested starting point for investigation.
Commit auto-fixes as `chore(harness): garden audit auto-fixes` and remediation edits as `docs(harness): garden audit remediation`.

Produce a structured report that summarizes the audit. Read references/garden-report-template.md for the complete format.
=== Harness Garden Report ===
Repository: <repo name>
Audit date: <today>
Previous audit: <last manifest verification date, or "first audit">
## Health Score
Control plane health: <X>% (<N> of <M> artifacts current)
Checked: <number of checks performed>
Auto-fixed: <number of auto-fixes applied>
Proposed fixes: <number of proposed fixes>
Flagged for human: <number of flags>
## Auto-fixes Applied
<For each auto-fix:>
- [path] <description of fix> (evidence: <brief evidence>)
## Proposed Fixes
<For each proposed fix:>
### <title>
- What: <description>
- Why: <evidence>
- Risk: <what could go wrong>
- Proposed change: <diff or description>
## Flagged for Human Review
<For each flag:>
### <title>
- Concern: <description>
- Evidence: <what triggered this>
- Suggestion: <starting point>
## Manifest Update
Updated Last Verified dates for all passing artifacts.
<List of artifacts with updated dates>
## Complexity Audit
<If any complexity reduction candidates were found:>
- <description of candidate and recommendation>
## Next Audit
Recommended next garden audit: <date, based on freshness thresholds>
The health score measures what percentage of the control plane is current and accurate:
health_score = (passing_checks / total_checks) * 100
Where passing_checks counts every check that passed or was auto-fixed, and total_checks is every check performed across Phases 1 and 2.
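As a direct translation of the formula:

```python
# Health score: percentage of checks that are current and accurate.
def health_score(passing_checks: int, total_checks: int) -> float:
    if total_checks == 0:
        return 100.0  # nothing to check: vacuously healthy
    return round(passing_checks / total_checks * 100, 1)
```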
After generating the report:

- Update docs/generated/harness-manifest.md — set Last Verified to today's date for all artifacts that passed or were auto-fixed

Garden audits are safe to re-run at any time: a repeat run on an already-healthy repo should produce no changes beyond refreshed Last Verified dates.
These rules protect the repo from well-intentioned but harmful automation.
Never modify unmanaged docs. README, CONTRIBUTING, team-maintained docs — report on them if they contain broken links to harness artifacts, but never edit them.
Auto-fixes must be mechanically certain. If there's any ambiguity about the correct fix, it's a proposed fix or a flag, not an auto-fix. The threshold for auto-fix is: a script could do it deterministically.
Semantic drift detection uses code search, not guessing. To verify "the auth module uses JWT tokens", search for JWT-related imports or code in the auth module. Do not guess based on the module name.
Preserve git history. Garden changes are atomic commits with clear messages. Never force-push, rebase, or squash garden commits into unrelated work.
The manifest is the source of truth for what the harness manages. If a file isn't in the manifest, the garden doesn't audit its content (only notes it as potentially untracked).
Don't fix what you can't verify. If you can't determine whether a semantic claim is still accurate, flag it — don't rewrite it based on inference.
Log everything. Every finding, every auto-fix, every skip — the garden report is the audit trail. Another agent or human should be able to reconstruct exactly what happened.
Freshness refresh is not content validation. Updating the Last Verified date means the artifact was checked and found accurate (or fixed). Do not refresh dates for artifacts you couldn't fully verify.
| Repo state | Garden behavior |
|---|---|
| Freshly bootstrapped (<7 days) | Light audit: manifest check + cross-links only |
| Active development | Full audit with emphasis on semantic drift in ARCHITECTURE/AGENTS |
| Post-major-refactor | Deep audit: expect high drift, prioritize module-level accuracy |
| Stable / maintenance mode | Focus on staleness and complexity reduction |
| Monorepo with many modules | Audit module boundaries carefully; look for new/deleted modules |
| File | When to read |
|---|---|
| references/drift-taxonomy.md | Phase 2: understanding all drift types |
| references/auto-fix-rules.md | Phase 3: determining what can be safely auto-fixed |
| references/garden-report-template.md | Phase 4: generating the final report |
| references/freshness-thresholds.md | Deciding staleness thresholds by artifact type |