Rigorous self-assessment checklist before marking any task as complete. Use when about to claim task completion, before final commit, when user asks "is it done?", or when transitioning from implementation to reporting. Prevents premature completion claims by requiring evidence for every assertion.
Workflow Reference: See Master Workflow for how this skill fits into the verification stage of the agentic workflow.
STOP. You are NOT done yet. Generate this checklist and provide EVIDENCE for every item.
```mermaid
flowchart TD
Start(["Begin WORKS Check -- Section 2"]) --> Q{"Task type?"}
Q -->|"Executable code -- compiled, scripted, or CLI-run"| A1["Execution check<br>Terminal output showing successful run<br>(exit code 0 is NOT enough)"]
Q -->|"Static asset -- docs, configs, analysis"| B1["Accuracy check<br>Verified against source code or schema?"]
A1 --> A2["Real data check<br>Ran changed code path against real data<br>not just read the diff?"]
A2 --> A3["Regression check<br>Evidence that existing tests still pass?"]
A3 --> A4["Edge case check<br>Evidence of testing failure scenarios?"]
A4 --> AEvidence["Record code evidence<br>execution output, real data test,<br>test results, edge case result"]
B1 --> B2["Clarity check<br>Follows the established format?"]
B2 --> B3["Validity check<br>Links and references resolve?"]
B3 --> BEvidence["Record static evidence<br>accuracy check method,<br>format standard, link validation method"]
AEvidence --> Done(["WORKS Check complete -- proceed to Section 3"])
BEvidence --> Done
```
For executable code:
EVIDENCE:
- Execution output: [paste actual output]
- Real data test: [command run, input used, output observed]
- Test results: [paste test output]
- Edge case tested: [describe scenario and result]
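Execution evidence can be captured programmatically. The sketch below is a minimal illustration, assuming Python tooling; the `print(2 + 2)` command is a placeholder for running your actual changed code path.

```python
# Sketch: capture execution evidence, not just an exit code.
# The command here is a stand-in for running your changed code path.
import subprocess
import sys

def execution_evidence(cmd):
    """Run a command and record exit code AND observed output as evidence."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "command": cmd,
        "exit_code": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
    }

# Exit code 0 alone is NOT enough: the observed output is the evidence.
evidence = execution_evidence([sys.executable, "-c", "print(2 + 2)"])
```

Paste the recorded `stdout`/`stderr` into the evidence block; an empty output field is itself a red flag.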
For static assets:
EVIDENCE:
- Accuracy check: [how verified]
- Format compliance: [standard followed]
- Links validated: [method used]
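One way to produce the "Links validated" evidence is a scripted check rather than eyeballing. This sketch only resolves relative links as file paths; external URLs would need an HTTP check and are skipped here.

```python
# Sketch: validate that relative markdown links resolve to existing files.
# External URLs and in-page anchors are deliberately skipped.
import pathlib
import re

def broken_relative_links(md_text, base="."):
    """Return relative link targets that do not resolve to an existing file."""
    targets = re.findall(r"\[[^\]]*\]\(([^)]+)\)", md_text)
    return [
        t for t in targets
        if not t.startswith(("http://", "https://", "#"))
        and not (pathlib.Path(base) / t).exists()
    ]
```

An empty return value is the evidence to record, along with the base directory the check ran against.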
For bug fixes specifically:
EVIDENCE:
- Pre-fix behavior: [what was observed]
- Post-fix behavior: [what is now observed]
- Regression test added: [yes/no, location]
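The "Regression test added" line should point at a test that pins both sides of the before/after comparison. The function and bug below are hypothetical, shown only to illustrate the shape of such a test.

```python
# Sketch: a regression test that pins the before/after behavior.
# parse_amount and its bug are hypothetical examples.
def parse_amount(s):
    # Post-fix: strip the currency symbol before converting.
    # Pre-fix, parse_amount("$10.50") raised ValueError.
    return float(s.lstrip("$"))

def test_parse_amount_regression():
    assert parse_amount("$10.50") == 10.5  # post-fix behavior
    assert parse_amount("3.25") == 3.25    # unaffected path still works

test_parse_amount_regression()
```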
For quality gates, regardless of task type:
EVIDENCE:
- Pre-commit: [output or "not configured"]
- Linting: [tool and result]
- Type check: [tool and result]
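A gate only counts as evidence if it can actually fail. The sketch below uses a stdlib compile check as a stand-in; a real project would run its configured linter and type checker instead (tool names like ruff or mypy are assumptions about your setup, not requirements).

```python
# Sketch: a stdlib syntax gate as a stand-in for real lint/type tooling.
import pathlib
import py_compile
import tempfile

def compile_gate(source):
    """Return PASS/FAIL evidence for a syntax-level gate on Python source."""
    path = pathlib.Path(tempfile.mkdtemp()) / "candidate.py"
    path.write_text(source)
    try:
        py_compile.compile(str(path), doraise=True)
        return "PASS"
    except py_compile.PyCompileError:
        return "FAIL"
```

Record both the tool invoked and its result; "linter ran" without the result string is not evidence.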
If the task has an issue-classification field in its metadata, verify the response matched the issue type. If no issue-classification is present, mark N/A and proceed.
```mermaid
flowchart TD
Start(["Begin Proportional Response Check"]) --> Q1{"issue-classification<br>present in task metadata?"}
Q1 -->|"absent"| Skip["SKIP -- existing WORKS/FIXED/Quality Gates apply"]
Q1 -->|"present"| Q2{"Classification type?"}
Q2 -->|"procedural"| P["Sweep completeness<br>Codebase search returns zero<br>remaining instances of the pattern"]
Q2 -->|"defect"| D["Root cause addressed<br>Fix targets root cause from evidence chain<br>+ scenario in scenario-target succeeds"]
Q2 -->|"recurring-pattern"| R["Guardrail added<br>New gate/check exists AND<br>covers the defect CLASS not just instance"]
Q2 -->|"missing-guardrail"| M["Gate gap filled<br>Guardrail triggers in the<br>exposing scenario"]
Q2 -->|"unbounded-design"| U["Design implemented<br>Matches chosen direction +<br>trade-offs documented"]
P --> Evidence
D --> Evidence
R --> Evidence
M --> Evidence
U --> Evidence
Skip --> Done(["Proportional Check complete"])
Evidence["Record proportional evidence"] --> Done
```
EVIDENCE:
- Issue Classification: [type or "not classified"]
- Scenario Target: [scenario -> improvement, or "not specified"]
- Proportional Check: [PASS/FAIL/N/A]
- Check detail: [what was verified and result]
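For the "procedural" branch, sweep completeness means a search over the codebase returns zero remaining instances. A minimal sketch, assuming a textual pattern and Python files; the pattern and glob are placeholders for your codebase's specifics.

```python
# Sketch: sweep-completeness check as a recursive search.
# Pattern and file glob are placeholders for your codebase.
import pathlib

def remaining_instances(root, pattern, glob="*.py"):
    """Return (path, line_number, line) for every remaining hit; empty means PASS."""
    hits = []
    for path in sorted(pathlib.Path(root).rglob(glob)):
        for lineno, line in enumerate(path.read_text().splitlines(), 1):
            if pattern in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

The evidence to record is the exact pattern searched and the (empty) hit list, not the phrase "I searched".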
When work was delegated to a sub-agent, the agent's success report is NOT evidence.
Does `git diff` show the expected changes, and only the expected changes?
EVIDENCE:
- Agent report: [what agent claimed]
- VCS diff: [files changed, scope matches expectation]
- Independent verification: [command run, output observed]
If no agents were used, mark N/A and proceed.
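The agent-report-versus-diff comparison can be made mechanical. In this sketch the file lists are hypothetical; in practice the second list comes from `git diff --name-only`.

```python
# Sketch: cross-check an agent's claimed changes against the actual VCS diff.
# Both file lists are hypothetical examples.
def claim_discrepancies(agent_claimed, vcs_changed):
    """Return claims the diff does not support, and changes the agent never mentioned."""
    claimed, changed = set(agent_claimed), set(vcs_changed)
    return {
        "claimed_but_not_changed": sorted(claimed - changed),
        "changed_but_not_claimed": sorted(changed - claimed),
    }
```

Either non-empty list means the agent's report and the repository disagree, and the repository wins.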
If any of these thoughts occur, STOP and run the verification command:
| Rationalization | Response |
|---|---|
| "Should work now" | Run the verification command |
| "I'm confident" | Confidence is not evidence |
| "Just this once" | No exceptions |
| "Linter passed so build passes" | Linter does not check compilation |
| "Agent said success" | Verify independently (Section 6) |
| "I'm tired" | Exhaustion is not an excuse |
| "Partial check is enough" | Partial check proves nothing about the whole |
| "Different words so rule doesn't apply" | Spirit over letter |
Red flags in your own output: if you catch yourself writing any of the rationalizations above, the gate has not been passed.
If you cannot demonstrate it working in practice with evidence, it is NOT done.
| Claim | Required Evidence |
|---|---|
| "Code works" | Terminal output showing execution against real data |
| "Tests pass" | Actual test output, not assumption |
| "Bug fixed" | Before/after comparison |
| "Data synced" | Read the destination after writing — not the tool output |
| "Docs accurate" | Cross-reference with source |
| "Config valid" | Validation command output |
| "Root cause fixed" | Evidence chain from grooming + fix addresses root cause claim |
| "Guardrail added" | New gate/check exists and triggers in exposing scenario |
| "Agent completed" | VCS diff reviewed + independent verification command run |
VERIFICATION SUMMARY:
Task Type: [FIX/FEATURE/REFACTOR/DOCS/INVESTIGATION]
WORKS Check: [PASS/FAIL] - Evidence: ___
FIXED Check: [PASS/FAIL/N/A] - Evidence: ___
Proportional Check: [PASS/FAIL/N/A] - Evidence: ___
Quality Gates: [PASS/FAIL] - Evidence: ___
Agent Delegation: [PASS/FAIL/N/A] - Evidence: ___
Honesty Check: [PASS/FAIL]
VERDICT: [COMPLETE / NOT COMPLETE - reason]