Reviews a generated test plan for completeness, consistency, and quality using a 5-criteria rubric. Scores, auto-revises, and re-scores (max 2 cycles).
Internal orchestrator that reviews and scores a test plan using the quality rubric (5 criteria, 0-2 each, 10-point scale). Auto-revises failing plans and re-scores up to 2 times.
This skill is not user-invocable. It is called by:
- test-plan.create (Step 4)

Parse $ARGUMENTS to extract:
- <feature_dir>: the feature directory containing TestPlan.md

If no arguments are provided and test-plan.create just generated a test plan in this session, use that feature directory automatically.
Read <feature_dir>/TestPlan.md
Read frontmatter to extract strat_key:
uv run python scripts/frontmatter.py read <feature_dir>/TestPlan.md
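The frontmatter read can be sketched in plain Python (a minimal sketch assuming the conventional `---`-delimited block with flat `key: value` pairs; `scripts/frontmatter.py` remains the authoritative implementation):

```python
import re

def read_frontmatter(text: str) -> dict:
    """Parse a ----delimited frontmatter block into a flat dict.

    Minimal sketch: handles simple `key: value` pairs only.
    """
    match = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return fields

plan = "---\nstrat_key: STRAT-123\nauto_revised: false\n---\n# Test Plan\n"
print(read_frontmatter(plan)["strat_key"])  # STRAT-123
```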
Fetch the source strategy from Jira using the strat_key:
mcp__atlassian__getJiraIssue with issueIdOrKey=<strat_key>
If MCP is unavailable, check for a local strategy file in artifacts/strat-tasks/<strat_key>.md (rfe-creator convention). If neither is available, warn the user that grounding and scope fidelity scoring will be degraded, and proceed with the test plan content only.
Store the raw strategy text for passing to sub-agents.
Read the score agent prompt from ${CLAUDE_SKILL_DIR}/prompts/score-agent.md.
Launch a forked score agent with these substitutions:
- {FEATURE_DIR} = feature directory path
- {TEST_PLAN_PATH} = <feature_dir>/TestPlan.md
- {STRATEGY_TEXT} = raw strategy description text from Step 1
- {CALIBRATION_DIR} = ${CLAUDE_SKILL_DIR}/calibration/

The score agent evaluates the test plan against a 5-criterion rubric (specificity, grounding, scope fidelity, actionability, consistency) and returns a structured assessment with per-criterion scores and a grounding cross-reference table.
Completeness checks performed by the score agent:
| Section | Check |
|---|---|
| 1.1 Purpose | Does it clearly state what is being tested and why? |
| 1.2 Scope | Are in-scope and out-of-scope explicitly defined? |
| 1.3 Test Objectives | Are there 3-7 concrete, measurable objectives? |
| 2.1 Test Levels | Are the selected levels appropriate for the feature type? |
| 2.3 Priorities | Are P0/P1/P2 definitions specific to this feature, not generic? |
| 3.1 Cluster Config | Are versions and dependencies specified or marked TBD? |
| 3.2 Test Data | Are test data requirements concrete enough to act on? |
| 4 Endpoints/Methods | Are entries grounded in source documents, not fabricated? |
| 6.1 E2E Scenarios | Is the E2E Scenario Summary populated with TC-E2E-* entries? (Note: expected to be empty until create-cases runs) |
| 6.2 E2E Coverage | Does each P0 endpoint from Section 4 have E2E scenario coverage in Section 6.2? (Note: expected to be empty until create-cases runs) |
| 7.1 Disconnected | Addressed with testing considerations or explicitly marked Not Applicable with justification? |
| 7.2 Upgrade | Addressed with testing considerations or explicitly marked Not Applicable with justification? |
| 7.3 Performance | Addressed with testing considerations or explicitly marked Not Applicable with justification? |
| 7.4 RBAC | Addressed with testing considerations or explicitly marked Not Applicable with justification? |
| 8 Risks | Are risks specific to this feature, not boilerplate? |
| 9 Environment | Is there enough detail to set up a test environment? |
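For reference, the rubric arithmetic is simple to sketch (the criterion keys below are illustrative identifiers for the five rubric criteria):

```python
RUBRIC_CRITERIA = ("specificity", "grounding", "scope_fidelity",
                   "actionability", "consistency")

def total_score(scores: dict[str, int]) -> int:
    """Sum the five 0-2 criterion scores into the 10-point total."""
    return sum(scores[c] for c in RUBRIC_CRITERIA)

def all_passing(scores: dict[str, int]) -> bool:
    """A plan skips the revision loop only when every criterion scores 2."""
    return all(scores[c] == 2 for c in RUBRIC_CRITERIA)

example = {"specificity": 2, "grounding": 1, "scope_fidelity": 2,
           "actionability": 2, "consistency": 2}
print(total_score(example), all_passing(example))  # 9 False
```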
Read the review agent prompt from ${CLAUDE_SKILL_DIR}/prompts/review-agent.md.
Launch a forked review agent with these substitutions:
- {FEATURE_DIR} = feature directory path
- {ASSESSMENT_TEXT} = full output from the score agent (Step 2)
- {FIRST_PASS} = true (first assessment cycle)

The review agent writes <feature_dir>/TestPlanReview.md with rubric scores, feedback, and validated frontmatter.
Consistency checks performed by the review agent:
After the review agent completes, read the review frontmatter:
uv run python scripts/frontmatter.py read <feature_dir>/TestPlanReview.md
If all five criteria in scores.* are 2, proceed to Step 5 (done).
If any criterion in scores.* is < 2, enter the revision loop.
Initialize cycle counter: reassess_cycle=0
4a. Filter for revision:
uv run python scripts/filter_for_revision.py <feature_dir>
If output is SKIP, stop the loop and proceed to Step 5.
4b. Launch revise agent (fork):
Read the revise agent prompt from ${CLAUDE_SKILL_DIR}/prompts/revise-agent.md.
Launch with substitutions:
- {FEATURE_DIR} = feature directory path
- {STRATEGY_TEXT} = raw strategy text from Step 1

The revise agent edits TestPlan.md (only sections mapped to failing criteria) and sets auto_revised=true.
4c. Check if reassessment is needed:
uv run python scripts/frontmatter.py read <feature_dir>/TestPlanReview.md
If auto_revised is false, the revise agent found nothing to change — stop the loop.
Increment reassess_cycle. If reassess_cycle >= 2, stop — max cycles reached. Proceed to Step 5.
4d. Save cumulative state:
uv run python scripts/preserve_review_state.py save <feature_dir>
4e. Re-score:
Delete the existing review file to force a clean re-assessment:
rm <feature_dir>/TestPlanReview.md
Repeat Step 2 (score agent) with the revised TestPlan.md.
4f. Re-review:
Repeat Step 3 (review agent) with {FIRST_PASS}=false.
4g. Restore before_scores and revision history:
uv run python scripts/preserve_review_state.py restore <feature_dir>
4h. Check criteria again:
Read the review frontmatter. If all criteria are now 2, stop.
If any criterion remains < 2 and cycles remain, go back to 4a.
If cycles are exhausted, stop and proceed to Step 5.
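The control flow of steps 4a-4h can be sketched as Python (`step` is a hypothetical callback standing in for the real sub-agent launches and helper scripts):

```python
MAX_CYCLES = 2

def revision_loop(scores: dict, step) -> dict:
    """Mirror of steps 4a-4h: revise failing plans, re-score up to MAX_CYCLES."""
    reassess_cycle = 0
    while any(s < 2 for s in scores.values()):
        if step("filter_for_revision") == "SKIP":  # 4a: nothing actionable
            break
        step("revise")                             # 4b: edit failing sections
        if not step("auto_revised"):               # 4c: agent changed nothing
            break
        reassess_cycle += 1
        if reassess_cycle >= MAX_CYCLES:           # max cycles reached
            break
        step("save_state")                         # 4d: preserve before_scores
        step("delete_review")                      # 4e: force clean re-assessment
        scores = step("rescore_and_review")        # 4e-4f: score + review again
        step("restore_state")                      # 4g: restore revision history
    return scores                                  # 4h via the loop condition
```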
Read the final review file and present a summary to the user:
## Test Plan Review — {feature_name}
**Score: {score}/10 — Verdict: {verdict}**
| Criterion | Score |
|-----------|-------|
| Specificity | {n}/2 |
| Grounding | {n}/2 |
| Scope Fidelity | {n}/2 |
| Actionability | {n}/2 |
| Consistency | {n}/2 |
{If before_score differs from score:}
**Delta: {before_score} → {score} ({+/-difference})**
{If verdict = Ready:}
The test plan is ready for test case generation. Run `/test-plan.create-cases <feature_dir>` to proceed.
{If verdict = Revise (after max cycles):}
The test plan improved but still has issues. Review `<feature_dir>/TestPlanReview.md` for remaining feedback. Consider providing additional source documents (ADR, API spec) to resolve grounding gaps.
{If verdict = Rework:}
The test plan needs significant rework. This may indicate the source strategy lacks sufficient detail. Review `<feature_dir>/TestPlanReview.md` for specific issues.
{If this plan is already in an open PR and reviewer comments exist:}
Use `/test-plan.resolve-feedback <PR_URL>` to triage and apply PR feedback items.
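The delta line in the summary above can be rendered with a small helper (illustrative sketch):

```python
def delta_line(before: int, after: int) -> str:
    """Format the before/after score delta shown when scores differ."""
    diff = after - before
    sign = "+" if diff >= 0 else ""
    return f"**Delta: {before} → {after} ({sign}{diff})**"

print(delta_line(6, 9))  # **Delta: 6 → 9 (+3)**
```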
Related commands:
- /test-plan.create
- /test-plan.create-cases
- /test-plan.resolve-feedback <PR_URL>