Decide the final outcome of the current iteration. Use during the decision phase to evaluate implementation and benchmark results and determine if the candidate is accepted, rejected, inconclusive, or failed. Triggers on: decide, evaluate outcome, evolution decide.
Determine the final outcome of the current iteration and persist the decision.
Input: `implementation.md` and `benchmark.md` from the current iteration directory.

Output:

- `decision.md` with the outcome and reasoning
- `iteration.json` with one of the allowed final states

Artifacts in the current iteration directory:

- `iteration.json` (required): contains iteration number, baseline version, hypothesis, implementation, and benchmark metadata
- `hypothesis.md`: the proposed improvement
- `implementation.md`: summary of candidate changes
- `benchmark.md`: benchmark results
- `correctness/results.md`: correctness-gate results and benchmark eligibility

1. **Read All Artifacts.** Open `implementation.md` and `benchmark.md` to review what was implemented and how it performed. Open `iteration.json` for the full context, including hypothesis, state, and metrics.
2. **Evaluate Against Acceptance Policy.** Determine the outcome based on:
   - `accepted`: `benchmark.sufficientForPromotion` is true with a statistically meaningful improvement over the baseline, and the candidate can be promoted under the versioning policy.
3. **Write decision.md.** Record the decision with:
   - `policyStage`, completed games, and whether `benchmark.sufficientForPromotion` was satisfied
   - for an `accepted` outcome, the promotion metadata required by the versioning policy (`previousVersion`, `promotedVersion`, baseline refs, and version-artifact path)
4. **Update iteration.json.** Follow the canonical state-machine contract in `tasks/prd-wiggum-evolution-loop.md`:
   - leave `benchmarked` by transitioning to `deciding` before choosing an outcome
   - set `state` to exactly one of `"accepted"`, `"rejected"`, `"inconclusive"`, or `"failed"`
   - keep `stateMachine.currentPhase` aligned with the final state
   - add a `decision` object with:
     - `outcome`: one of the allowed final states
     - `reasoning`: explanation of the decision
     - `evidence`: key metrics that supported the decision, including benchmark policy fields used for the call

The decision artifact should follow this structure in `decision.md`:
# Iteration N Decision
## Outcome
One of: accepted, rejected, inconclusive, failed.
## Reasoning
Explanation of why this outcome was selected based on the evidence.
## Evidence
- Implementation: summary of what was changed
- Benchmark: key results (games completed, win rate, ELO estimate)
- Policy: which acceptance criteria were met or not met
## Recommendations
Suggestions for future iterations (e.g., "retry with more games", "explore related idea from iteration X", "abandon this direction").
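
As an illustration of the structure above, the following minimal sketch renders a `decision.md` body from plain strings. The function name `render_decision` and its parameters are hypothetical and not part of the skill; only the headings and the three evidence bullets come from the template.

```python
# The four outcomes the template allows.
ALLOWED_OUTCOMES = ("accepted", "rejected", "inconclusive", "failed")


def render_decision(iteration: int, outcome: str, reasoning: str,
                    implementation: str, benchmark: str, policy: str,
                    recommendations: str) -> str:
    """Return decision.md text following the template (hypothetical helper)."""
    if outcome not in ALLOWED_OUTCOMES:
        raise ValueError(f"outcome must be one of {ALLOWED_OUTCOMES}, got {outcome!r}")
    lines = [
        f"# Iteration {iteration} Decision",
        "",
        "## Outcome",
        "",
        outcome,
        "",
        "## Reasoning",
        "",
        reasoning,
        "",
        "## Evidence",
        "",
        f"- Implementation: {implementation}",
        f"- Benchmark: {benchmark}",
        f"- Policy: {policy}",
        "",
        "## Recommendations",
        "",
        recommendations,
    ]
    # End the file with a single trailing newline.
    return "\n".join(lines) + "\n"
```

The caller is still responsible for writing the returned text to `decision.md` in the iteration directory.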
| State | Meaning | Baseline |
|---|---|---|
| `accepted` | Candidate validated and promoted | Updated to new version |
| `rejected` | Candidate evaluated and discarded | Unchanged |
| `inconclusive` | Evidence insufficient for clear decision | Unchanged |
| `failed` | Implementation or benchmark infrastructure failure | Unchanged |
The final-state rules come from the iteration state machine in tasks/prd-wiggum-evolution-loop.md. The decision phase must not skip directly from implemented to a final outcome.
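
The transition rule can be sketched as two small guards over the parsed `iteration.json` record. This is a minimal sketch under stated assumptions, not the canonical contract: the helper names `begin_deciding` and `finalize_decision` are hypothetical, and it assumes `stateMachine.currentPhase` simply mirrors the `state` value. The PRD remains authoritative for the real schema and for any other paths into `failed`.

```python
import copy

# The four final states named in this document.
FINAL_STATES = {"accepted", "rejected", "inconclusive", "failed"}


def begin_deciding(iteration: dict) -> dict:
    """Move a benchmarked iteration into the deciding phase (hypothetical helper)."""
    if iteration.get("state") != "benchmarked":
        raise ValueError(f"expected state 'benchmarked', got {iteration.get('state')!r}")
    updated = copy.deepcopy(iteration)
    updated["state"] = "deciding"
    # Assumption: currentPhase mirrors the state value.
    updated.setdefault("stateMachine", {})["currentPhase"] = "deciding"
    return updated


def finalize_decision(iteration: dict, outcome: str, reasoning: str,
                      evidence: dict) -> dict:
    """Record a final outcome; only valid from the deciding phase."""
    if outcome not in FINAL_STATES:
        raise ValueError(f"outcome must be one of {sorted(FINAL_STATES)}, got {outcome!r}")
    if iteration.get("state") != "deciding":
        raise ValueError("a final outcome may only be set from 'deciding'")
    updated = copy.deepcopy(iteration)
    updated["state"] = outcome
    updated.setdefault("stateMachine", {})["currentPhase"] = outcome
    updated["decision"] = {
        "outcome": outcome,
        "reasoning": reasoning,
        "evidence": evidence,
    }
    return updated
```

Both helpers return a modified copy, so the original record stays untouched until the caller decides to write the result back to disk.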
Write only `decision.md` and update `iteration.json`; do not modify engine code, benchmark results, or other iteration artifacts.

After this skill runs, the following must be true:
- `decision.md` exists in the iteration directory with the outcome and reasoning
- `iteration.json` has been updated with one of the four allowed final states: `accepted`, `rejected`, `inconclusive`, or `failed`
- `iteration.json` includes a `decision` object with `outcome` and `reasoning` fields

References:

- `.claude/evolution/CLAUDE.md`
- `scripts/evolution-loop.sh`
- `scripts/benchmark-version.sh`
- `tasks/prd-wiggum-evolution-loop.md`
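
The postconditions this skill must satisfy can be checked mechanically. The sketch below is one possible check, assuming the file names used in this document; the function name `check_postconditions` is hypothetical, and the check only confirms that `decision.md` exists, not that its contents are well formed.

```python
import json
from pathlib import Path

# The four final states named in this document.
FINAL_STATES = {"accepted", "rejected", "inconclusive", "failed"}


def check_postconditions(iteration_dir: Path) -> list[str]:
    """Return a list of violated postconditions (empty when all hold)."""
    problems = []

    # decision.md must exist; its contents are not inspected here.
    if not (iteration_dir / "decision.md").is_file():
        problems.append("decision.md is missing")

    json_path = iteration_dir / "iteration.json"
    if not json_path.is_file():
        problems.append("iteration.json is missing")
        return problems

    data = json.loads(json_path.read_text(encoding="utf-8"))

    # state must be one of the four final states.
    if data.get("state") not in FINAL_STATES:
        problems.append(f"state is not final: {data.get('state')!r}")

    # decision must be an object with non-empty outcome and reasoning.
    decision = data.get("decision")
    if not isinstance(decision, dict):
        problems.append("decision object is missing")
    else:
        for field in ("outcome", "reasoning"):
            if not decision.get(field):
                problems.append(f"decision.{field} is missing or empty")

    return problems
```

An empty list means every postcondition checked here holds for the given iteration directory.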