Quality review of test files and manual evidence documents. Goes beyond existence checks — evaluates assertion coverage, edge case handling, naming conventions, and evidence completeness. Produces ADEQUATE/INCOMPLETE/MISSING verdict per story. Run before QA sign-off or on demand.
/smoke-check verifies that test files exist and pass. This skill
goes further — it reviews the quality of those tests and evidence documents.
A test file that exists and passes may still leave critical behaviour uncovered.
A manual evidence doc that exists may lack the sign-offs required for closure.
Output: Summary report (in conversation) + optional production/qa/evidence-review-[date].md
When to run:
- Before QA sign-off or on demand
- As part of /team-qa (Phase 5)

Modes:
- `/test-evidence-review [story-path]` — review a single story's evidence
- `/test-evidence-review sprint` — review all stories in the current sprint
- `/test-evidence-review [system-name]` — review all stories in an epic/system

Based on the argument:
- **Single story**: Read the story file directly. Extract: Story Type, Test Evidence section, story slug, system name.
- **Sprint**: Read the most recently modified file in production/sprints/. Extract the list of story file paths from the sprint plan. Read each story file.
- **System**: Glob production/epics/[system-name]/story-*.md. Read each.
For each story, collect:
- **Type** field (Logic / Integration / Visual/Feel / UI / Config/Data)
- **Test Evidence** section — the stated expected test file path or evidence doc

For each story, find the evidence:
- Logic stories: Glob tests/unit/[system]/[story-slug]_test.*; if nothing matches, scan tests/unit/[system]/ for files containing the story slug
- Integration stories: Glob tests/integration/[system]/[story-slug]_test.*; also check production/session-logs/ for playtest records mentioning the story
- Visual/Feel and UI stories: Glob production/qa/evidence/[story-slug]-evidence.*
- Config/Data stories: Glob production/qa/smoke-*.md (any smoke check report)
Note what was found (path) or not found (gap) for each story.
For each test file found, read it and evaluate:
Count the number of distinct assertions (lines containing assert, expect, check, verify, or engine-specific assertion patterns). Low assertion count is a quality signal — a test that makes only 1 assertion per test function may not cover the range of expected behaviour.
Thresholds:
For each acceptance criterion in the story that contains a number, threshold, or "when X happens" conditional: check whether a test function name or test body references that specific case.
Heuristics:
Test function names should describe: the scenario + the expected result.
Pattern: test_[scenario]_[expected_outcome]
Flag functions named generically (test_1, test_run, testBasic) as
naming issues — they make failures harder to diagnose.
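The generic-name flag can be sketched as a regex check. The exact list of generic stems is an assumption; extend it to match the project's test framework:

```python
import re

# Names with no scenario or expected-outcome content, e.g. test_1, testBasic.
GENERIC_NAME_RE = re.compile(r"^test_?(\d+|run|basic|main|it)?$", re.IGNORECASE)

def flag_generic_names(test_names):
    """Return the subset of test names too generic to diagnose a failure from."""
    return [name for name in test_names if GENERIC_NAME_RE.match(name)]
```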
For Logic stories where the GDD has a Formulas section: check that the test file contains at least one test whose name or comment references the formula name or a formula value. A test that exercises a formula without mentioning it by name is harder to maintain when the formula changes.
For each evidence document found, read it and evaluate:
The evidence doc should reference each acceptance criterion from the story. Check: does the evidence doc contain each criterion (or a clear rephrasing)? Missing criteria mean a criterion was never verified.
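A sketch of the linkage check, assuming exact (case-insensitive) matching; a real review should also accept clear rephrasings, which this does not attempt:

```python
def criterion_linkage(criteria, evidence_text):
    """Split criteria into those referenced in the evidence doc and those
    missing from it. Missing criteria were never verified."""
    text = evidence_text.lower()
    referenced = [c for c in criteria if c.lower() in text]
    missing = [c for c in criteria if c.lower() not in text]
    return referenced, missing
```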
Check for three sign-off lines (or equivalent fields): Developer, Designer, and QA Lead.
If any are missing or blank: flag as INCOMPLETE — the story cannot be fully closed without all required sign-offs.
For Visual/Feel stories: check whether screenshot file paths are referenced in the evidence doc. If referenced, Glob for them to confirm they exist.
For UI stories: check whether a walkthrough sequence (step-by-step interaction log) is present.
Evidence doc should have a date. If the date is earlier than the story's last major change (heuristic: compare against sprint start date from the sprint plan), flag as POTENTIALLY STALE — the evidence may not cover the final implementation.
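The staleness heuristic is a two-date comparison; sketched here under the assumption that both dates have already been parsed from the evidence doc and the sprint plan:

```python
from datetime import date

def freshness(evidence_date: date, sprint_start: date) -> str:
    """Evidence dated before the sprint start may not cover the final
    implementation, per the heuristic above."""
    return "current" if evidence_date >= sprint_start else "POTENTIALLY STALE"
```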
For each story, assign a verdict:
| Verdict | Meaning |
|---|---|
| ADEQUATE | Test/evidence exists, passes quality checks, all criteria covered |
| INCOMPLETE | Test/evidence exists but has quality gaps (thin assertions, missing sign-offs) |
| MISSING | No test or evidence found for a story type that requires it |
The overall sprint/system verdict is the worst story verdict present.
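The worst-of rule can be expressed as a severity ordering over the three verdicts from the table; the ordering itself follows the table, only the code shape is illustrative:

```python
# MISSING is worse than INCOMPLETE, which is worse than ADEQUATE.
SEVERITY = {"ADEQUATE": 0, "INCOMPLETE": 1, "MISSING": 2}

def overall_verdict(story_verdicts):
    """Overall sprint/system verdict is the worst story verdict present."""
    return max(story_verdicts, key=SEVERITY.__getitem__)
```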
## Test Evidence Review
> **Date**: [date]
> **Scope**: [single story path | Sprint [N] | [system name]]
> **Stories reviewed**: [N]
> **Overall verdict**: ADEQUATE / INCOMPLETE / MISSING
---
### Story-by-Story Results
#### [Story Title] — [Type] — [ADEQUATE/INCOMPLETE/MISSING]
**Test/evidence path**: `[path]` (found) / (not found)
**Automated test quality** *(Logic/Integration only)*:
- Assertion coverage: [N per function on average] — [adequate / thin / none]
- Edge cases: [covered / partial / not found]
- Naming: [consistent / [N] generic names flagged]
- Formula traceability: [yes / no — formula names not referenced in tests]
**Manual evidence quality** *(Visual/Feel/UI only)*:
- Criterion linkage: [N/M criteria referenced]
- Sign-offs: [Developer ✓ | Designer ✗ | QA Lead ✗]
- Artefacts: [screenshots present / missing / N/A]
- Freshness: [dated [date] — current / potentially stale]
**Issues**:
- BLOCKING: [description] *(prevents story-done)*
- ADVISORY: [description] *(should fix before release)*
---
### Summary
| Story | Type | Verdict | Issues |
|-------|------|---------|--------|
| [title] | Logic | ADEQUATE | None |
| [title] | Integration | INCOMPLETE | Thin assertions (avg 1.2/function) |
| [title] | Visual/Feel | INCOMPLETE | QA lead sign-off missing |
| [title] | Logic | MISSING | No test file found |
**BLOCKING items** (must resolve before story can be closed): [N]
**ADVISORY items** (should address before release): [N]
Present the report in conversation.
Ask: "May I write this test evidence review to
production/qa/evidence-review-[date].md?"
This is optional — the report is useful standalone. Write only if the user wants a persistent record.
After the report, offer next steps:
- For BLOCKING items: "These must be resolved before /story-done can mark the story Complete. Would you like to address any of them now?"
- For thin assertion coverage: "Run /test-helpers [system] to see scaffolded assertion patterns for common cases."
- For missing sign-offs: "Share [evidence-path] with them to complete sign-off."

Verdict: COMPLETE — evidence review finished. Use CONCERNS if BLOCKING items were found.