Gather evidence that work actually works before declaring it done: run tests, check output, collect pass/fail tallies, report faithfully.
Before declaring a task complete, gather evidence that it actually works. At minimum: run the test, execute the script, check the output.
Use this skill when:
- A subagent-driven-development task is about to be declared complete

DO NOT use this skill for:
To declare something DONE, you MUST provide:
1. **Command:** the exact command you ran
2. **Output:** the actual output, not a paraphrase
3. **Timestamp:** when the verification ran
Example:

✅ Verification Complete

**Command:** `pytest tests/ -v`

**Output:**

    tests/auth/test_login.py::test_valid_user PASSED
    tests/auth/test_login.py::test_invalid_user PASSED
    tests/auth/test_password_reset.py PASSED

    3 passed in 0.24s

**Timestamp:** 2026-04-13 14:32 UTC
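This evidence can be captured mechanically rather than transcribed by hand. A minimal Python sketch; the `run_and_capture` helper is illustrative, not part of the skill:

```python
import subprocess
from datetime import datetime, timezone

def run_and_capture(command: list[str]) -> dict:
    """Run a verification command and keep the raw evidence:
    the exact command, its combined output, the exit code,
    and a UTC timestamp of when it actually ran."""
    result = subprocess.run(command, capture_output=True, text=True)
    return {
        "command": " ".join(command),
        "output": result.stdout + result.stderr,
        "exit_code": result.returncode,  # 0 means the command succeeded
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC"),
    }

# Stand-in command so the sketch runs anywhere; in practice this
# would be the real test invocation, e.g. ["pytest", "tests/", "-v"].
evidence = run_and_capture(["echo", "3 passed in 0.24s"])
print(evidence["command"], "exit:", evidence["exit_code"])
```

Keeping the exit code alongside the output means the DONE/NEEDS_FIX verdict can later be derived from evidence rather than from memory.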
| Task Type | Verification |
|---|---|
| Feature | Feature test passes + manual E2E smoke test |
| Bug fix | Regression test passes + original bug no longer reproduces |
| Refactor | All existing tests pass (API unchanged) |
| Documentation | Links valid, examples run without error |
| Config/build | Build succeeds, artifact is usable |
| Utility function | Unit tests pass, no type errors |
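The table can be encoded as a lookup so a verifier never improvises which checks apply. The commands below are placeholders for a real project's tooling, not prescribed by this skill:

```python
# Placeholder checks per task type; substitute the project's real commands.
VERIFICATION_PLAN = {
    "feature": ["pytest tests/feature/", "manual E2E smoke test"],
    "bug fix": ["pytest tests/regression/", "reproduce the original bug"],
    "refactor": ["pytest tests/"],
    "documentation": ["validate links", "run documentation examples"],
}

def checks_for(task_type: str) -> list[str]:
    """All listed checks must pass before the task may be declared DONE.
    Unknown task types fall back to running the full test suite."""
    return VERIFICATION_PLAN.get(task_type.lower(), ["pytest tests/"])
```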
If verification fails:
1. Report the failure faithfully; do not claim done
2. Quote the specific failing tests and error output
3. Route back to the implementer with the specific failures for rework
Example failure report:

❌ Verification Failed

**Command:** `pytest tests/ -v`

**Output:**

    tests/auth/test_login.py::test_valid_user FAILED
    AssertionError: Expected 200, got 401
    tests/auth/test_password_reset.py PASSED

    1 failed, 1 passed

**Issue:** Login test failing with 401; likely a token generation problem. Route back to the implementer for rework.
❌ Claiming "all tests pass" when output shows failures
❌ Not running verification (just trusting the code looks right)
❌ Running only partial tests (run full suite relevant to task)
❌ Skipping manual verification for user-facing features
❌ Suppressing or simplifying failing checks to manufacture green result
❌ Saying "this should work" instead of actually running commands
✅ Always run the command, capture output
✅ Report outcomes faithfully (failures are OK, misreporting is not)
✅ Complete all relevant tests before marking done
✅ If you can't verify (no test exists, can't run), say so explicitly
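Faithful reporting can be enforced by deriving the verdict from the tally line itself, so a green status cannot be manufactured. A sketch that parses pytest-style summary lines; the regex is an assumption about that format:

```python
import re

def parse_tally(summary_line: str) -> dict:
    """Extract pass/fail counts from a pytest-style summary line,
    e.g. '1 failed, 1 passed'. Missing categories count as zero."""
    counts = {"passed": 0, "failed": 0, "error": 0}
    for number, label in re.findall(r"(\d+) (passed|failed|error)", summary_line):
        counts[label] = int(number)
    return counts

def verdict(summary_line: str) -> str:
    """Report faithfully: DONE only when nothing failed or errored."""
    counts = parse_tally(summary_line)
    if counts["failed"] or counts["error"]:
        return f"NEEDS_FIX ({counts['failed']} failed)"
    return f"DONE ({counts['passed']} passed)"

print(verdict("3 passed in 0.24s"))   # DONE (3 passed)
print(verdict("1 failed, 1 passed"))  # NEEDS_FIX (1 failed)
```

Because the verdict is computed from the counts, "all tests pass" can only ever be claimed when the output actually says so.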
After verifying each completed task, produce this summary:
## Verification Summary
**Task:** [Task name]
**Branch:** [git branch if applicable]
### Tests Run
- Command: `[exact command]`
- Result: [X/Y passed, Z failed]
- Output: [relevant snippet]
### Manual Checks (if applicable)
- [Feature smoke test]: [result]
- [Integration point]: [result]
### Evidence Collected
- ✅ All specified tests pass
- ✅ Output matches expected
- ✅ No regressions detected
### Conclusion
**Status:** [DONE / NEEDS_FIX with specific issues]
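A sketch of filling a trimmed version of this template from the collected tallies, so that **Status** is computed from the numbers rather than asserted; the field names mirror the template above, and the function itself is illustrative:

```python
def render_summary(task: str, command: str, passed: int, failed: int,
                   output_snippet: str) -> str:
    """Fill a trimmed Verification Summary; the status line is
    derived from the tallies, never stated independently."""
    status = "DONE" if failed == 0 else f"NEEDS_FIX: {failed} test(s) failing"
    return "\n".join([
        "## Verification Summary",
        f"**Task:** {task}",
        "### Tests Run",
        f"- Command: `{command}`",
        f"- Result: {passed}/{passed + failed} passed, {failed} failed",
        f"- Output: {output_snippet}",
        "### Conclusion",
        f"**Status:** {status}",
    ])

print(render_summary("Login fix", "pytest tests/ -v", 1, 1,
                     "AssertionError: Expected 200, got 401"))
```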
This skill is invoked after subagent-driven-development completes all tasks.
It gates the final "task complete" state: you cannot declare done without verification evidence.
If verification fails, it routes back to implementer subagent with specific failures for rework.
Verification before completion is not about extra bureaucracy. It's about confidence.
The benefit of running `pytest -v` is not just that tests pass — it's that YOU KNOW they passed because you watched it happen. No assumptions, no guesses, just evidence.
That evidence is what lets you confidently report "done" to the user instead of "probably done, let's hope."