Claims-first codebase audit that extracts documentation claims and verifies them against code. Use when asked to "audit", "verify docs match code", "check if README claims are true", or "validate documentation accuracy". Falsification-first approach.
Documentation makes claims. Code makes behavior. This skill finds divergence.
Core principle: Extract every testable claim from documentation, then attempt to falsify each one against the actual codebase. Claims that survive falsification are VERIFIED. Claims that don't are exposed.
This is not a code review. Code reviews ask "is this code good?" This skill asks "is what the documentation says actually true?"
(For code quality review, see code-review-gemini or code-review-claude; verdicts here follow the critical-research verdict system.)

Ask the user (or determine from context) the audit scope: which documents to audit, and which parts of the codebase to verify them against.
If not specified, default to: README.md + CLAUDE.md in the project root, verified against the full codebase.
Read each target document and extract every testable claim — any statement that asserts something about the codebase that can be verified as true or false.
Categories of claims:
- Structure claims (files, directories, or modules the docs say exist)
- Setup claims ("Run `npm install` to set up the project")
- Behavior claims ("Authentication is required for all API endpoints")
- Command claims ("Run `npm test` to execute tests", "Deploy with `make deploy`")

Record each claim with its source location (file:line).
Ignore non-testable statements: opinions, aspirations, future plans ("we plan to..."), and subjective assessments ("easy to use").
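Claim extraction can be sketched as a small scanner. This is an illustrative sketch, not part of the skill itself: the `Claim` shape and the regex patterns are assumptions, and a real pass would cover structure, setup, and behavior claims, not just command claims.

```python
import re
from dataclasses import dataclass

@dataclass
class Claim:
    text: str    # the quoted claim
    source: str  # "file:line"

# Illustrative patterns: command-style claims to keep, non-testable phrasing to skip.
COMMAND_CLAIM = re.compile(r"run\s+`([^`]+)`|deploy with\s+`([^`]+)`", re.IGNORECASE)
NON_TESTABLE = re.compile(r"we plan to|easy to use", re.IGNORECASE)

def extract_claims(path: str, lines: list[str]) -> list[Claim]:
    claims = []
    for i, line in enumerate(lines, start=1):
        if NON_TESTABLE.search(line):
            continue  # opinions, aspirations, future plans are ignored
        if COMMAND_CLAIM.search(line):
            claims.append(Claim(text=line.strip(), source=f"{path}:{i}"))
    return claims

claims = extract_claims("README.md", [
    "Run `npm test` to execute tests",
    "We plan to add hot-reloading",
])
```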
Assign each claim a risk level based on the consequence of it being false:
| Risk | Criteria | Example |
|---|---|---|
| Critical | False claim causes data loss, security breach, or production failure | "Authentication is required for all API endpoints" |
| High | False claim causes broken setup, wasted developer hours, or wrong architecture decisions | "Run npm install to set up the project" |
| Medium | False claim causes confusion or minor inefficiency | "The config file supports hot-reloading" |
| Low | False claim is cosmetic or trivial | "The project follows Conventional Commits" |
Apply verification methods based on risk level:
| Method | All Claims | Critical + High | Critical Only |
|---|---|---|---|
| A: Static Analysis | Yes | Yes | Yes |
| B: Test Evidence + Mirror Test | Yes | Yes | Yes |
| C: Runtime Probe | — | Yes | Yes |
| D: Dependency Trace | — | Yes | Yes |
| E: Mutation Test | — | — | Yes |
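The method matrix above reduces to a simple lookup. A minimal sketch (the function name is an assumption; the method letters follow the table):

```python
def methods_for(risk: str) -> list[str]:
    """Return the verification methods (A-E) to apply for a risk level."""
    methods = ["A", "B"]          # all claims: static analysis + test evidence
    if risk in ("critical", "high"):
        methods += ["C", "D"]     # runtime probe + dependency trace
    if risk == "critical":
        methods.append("E")       # mutation test for critical claims only
    return methods
```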
**Method A (Static Analysis).** Search the codebase for evidence that supports or contradicts the claim:
- Grep to find relevant code patterns
- Glob to verify file structure claims
- Read to inspect specific implementations

**Method B (Test Evidence + Mirror Test).** Check whether tests exist that verify the claim, then apply the Mirror Test.
**Method C (Runtime Probe).** For Critical and High risk claims, attempt to verify through execution.
**Method D (Dependency Trace).** Trace the dependency chain for claims about integrations.
**Method E (Mutation Test).** For the highest-risk claims, consider what would happen if the claim were false.
For each claim, search for counter-evidence first. This order matters: starting with supporting evidence creates confirmation bias.
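The counter-evidence-first order can be encoded as a small loop. A hedged sketch: the `search` helper and the pattern lists are hypothetical stand-ins (e.g. a thin wrapper around Grep), not a real API.

```python
def verify(counter_patterns: list[str], support_patterns: list[str], search):
    """Falsification-first verification.

    `search(pattern)` is an assumed helper returning matching code locations.
    Counter-evidence is checked before supporting evidence, so a contradiction
    settles the verdict before confirmation bias can creep in.
    """
    for p in counter_patterns:
        hits = search(p)
        if hits:
            return ("FALSE", hits)
    for p in support_patterns:
        hits = search(p)
        if hits:
            return ("VERIFIED", hits)
    return ("UNVERIFIED", [])
```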
Each claim receives one of these verdicts (aligned with the critical-research verdict system):
| Verdict | Meaning | Action Required |
|---|---|---|
| VERIFIED | Claim confirmed by code evidence | None |
| PARTIALLY VERIFIED | Claim is true in some cases but not all | Update docs to reflect scope |
| UNVERIFIED | Cannot confirm or deny with available tools | Flag for manual review |
| FALSE | Claim contradicted by code evidence | Fix code or fix docs |
| UNFALSIFIABLE | Claim is too vague to test | Rewrite claim to be testable |
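The verdict table maps directly onto code. A minimal sketch of the verdict-to-action mapping (names are assumptions; the strings mirror the table):

```python
# Action required per verdict, straight from the table above (None = no action).
VERDICT_ACTIONS = {
    "VERIFIED": None,
    "PARTIALLY VERIFIED": "Update docs to reflect scope",
    "UNVERIFIED": "Flag for manual review",
    "FALSE": "Fix code or fix docs",
    "UNFALSIFIABLE": "Rewrite claim to be testable",
}

def needs_remediation(verdict: str) -> bool:
    # Only FALSE and PARTIALLY VERIFIED claims get a remediation entry.
    return verdict in ("FALSE", "PARTIALLY VERIFIED")
```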
For every FALSE or PARTIALLY VERIFIED claim, generate a specific remediation:
CLAIM: [quoted claim] (source: file:line)
VERDICT: FALSE
EVIDENCE: [what was found]
REMEDIATION: [specific action — either fix the code or fix the docs]
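The remediation record above can be rendered from a small structure. A sketch under the assumption of these field names (the example values are hypothetical):

```python
from dataclasses import dataclass

@dataclass
class Finding:
    claim: str
    source: str       # "file:line"
    verdict: str      # e.g. "FALSE" or "PARTIALLY VERIFIED"
    evidence: str
    remediation: str

    def render(self) -> str:
        """Emit the four-line remediation record."""
        return (
            f'CLAIM: "{self.claim}" (source: {self.source})\n'
            f"VERDICT: {self.verdict}\n"
            f"EVIDENCE: {self.evidence}\n"
            f"REMEDIATION: {self.remediation}"
        )
```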
Compile findings into the output format below.
# Codebase Audit Report
## Audit Scope
- **Documents audited**: [list with file paths]
- **Codebase scope**: [directories/files checked]
- **Date**: [audit date]
## Documentation Accuracy Score
- **Total claims extracted**: X
- **VERIFIED**: Y (Z%)
- **PARTIALLY VERIFIED**: A (B%)
- **UNVERIFIED**: C (D%)
- **FALSE**: E (F%)
- **UNFALSIFIABLE**: G (H%)
- **Overall accuracy**: Y / (X - G) = I% (excluding UNFALSIFIABLE claims)
## Critical + High Risk Findings
### FALSE Claims
| # | Claim | Source | Evidence | Remediation |
|---|-------|--------|----------|-------------|
| 1 | "..." | file:line | [what was found] | [fix code / fix docs] |
### PARTIALLY VERIFIED Claims
| # | Claim | Source | Scope Limitation | Remediation |
|---|-------|--------|------------------|-------------|
| 1 | "..." | file:line | [true for X, false for Y] | [update docs] |
## Medium + Low Risk Findings
### FALSE Claims
| # | Claim | Source | Evidence | Remediation |
|---|-------|--------|----------|-------------|
### PARTIALLY VERIFIED Claims
| # | Claim | Source | Scope Limitation | Remediation |
|---|-------|--------|------------------|-------------|
## UNVERIFIED Claims (Manual Review Needed)
| # | Claim | Source | Why Unverifiable |
|---|-------|--------|-----------------|
## UNFALSIFIABLE Claims (Docs Need Rewriting)
| # | Claim | Source | Suggested Rewrite |
|---|-------|--------|-------------------|
## Verified Claims (Passing)
[List of claims that survived falsification, grouped by document]
## Methodology Notes
- [Any limitations of this audit]
- [What could not be checked and why]
- [Assumptions made during verification]
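The overall accuracy formula from the score section (VERIFIED divided by total minus UNFALSIFIABLE) can be checked with a quick computation. Numbers here are hypothetical:

```python
def accuracy(total: int, verified: int, unfalsifiable: int) -> float:
    """Overall accuracy = VERIFIED / (total - UNFALSIFIABLE)."""
    testable = total - unfalsifiable
    return verified / testable if testable else 0.0

# Hypothetical audit: 30 claims, 20 VERIFIED, 3 UNFALSIFIABLE
print(round(accuracy(30, 20, 3) * 100, 1))  # prints 74.1
```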
User: "Audit the README against the actual codebase"
Step 1 → Scope: README.md, verified against full repo
Step 2 → Extract 23 claims (structure, setup, API, testing)
Step 3 → Classify: 3 Critical, 7 High, 8 Medium, 5 Low
Step 4 → Apply methods A+B for all, C+D for Critical+High, E for Critical
Step 5 → Falsification-first execution
Step 6 → Verdicts: 15 VERIFIED, 3 PARTIALLY VERIFIED, 2 FALSE, 1 UNVERIFIED, 2 UNFALSIFIABLE
Step 7 → Remediation for 5 non-passing claims
Step 8 → Report with 71% accuracy score (15 / 21, excluding the 2 UNFALSIFIABLE claims)
User: "Check if our API docs match the actual endpoints"
Step 1 → Scope: docs/api.md, verified against src/routes/
Step 2 → Extract 45 claims (endpoints, params, responses, auth)
Step 3 → Classify: 8 Critical (auth claims), 15 High (endpoint behavior), 22 Medium
Step 4 → Methods A+B+C+D for Critical+High, A+B for Medium
Step 5 → Find 3 endpoints documented but not implemented, 2 undocumented endpoints
Step 6 → 5 FALSE claims, rest VERIFIED
Step 7 → Remediation: add missing endpoints or remove from docs
Step 8 → Report with 89% accuracy score
Runtime probes may have side effects (e.g. running `npm test`) and should be confirmed with the user before executing.