Use when solving any non-trivial problem where the solution space is uncertain — research options via a separate agent, adversarially critique them via a different agent, then loop (implement → critique) until the critic finds nothing. Skip only for single-line or trivial changes.
Separate the hand that builds from the hand that tears down. The builder cannot credibly critique its own output.
| Use | Skip |
|---|---|
| Solution space uncertain | Single-line change |
| 2+ plausible approaches | Trivial typo or reformat |
| Correctness is load-bearing | Throwaway experiment |
| Research would reduce uncertainty | Mechanical rename |
Coding task? Every subagent prompt (explorer, critic, implementer) must include: "Before starting, load the <language>-coding-style skill and follow its rules."
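A minimal sketch of how the required sentence could be prepended to every subagent prompt. The helper name `with_style_preamble` is hypothetical; only the quoted instruction text comes from this skill.

```python
def with_style_preamble(prompt: str, language: str) -> str:
    """Prefix a subagent prompt with the required coding-style instruction."""
    # The sentence below is the one mandated above, with <language> filled in.
    preamble = (
        f"Before starting, load the {language}-coding-style skill "
        "and follow its rules."
    )
    return f"{preamble}\n\n{prompt}"


if __name__ == "__main__":
    print(with_style_preamble("Review the attached diff.", "python"))
```

Routing explorer, critic, and implementer prompts through one helper makes it harder to forget the instruction for any single role.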
Each iteration tackles one change. All four steps run in every iteration. Do not advance to the next change until the current one passes all steps.
| Step | Phase | Actor | Output |
|---|---|---|---|
| 1 | Explore | Separate research agent | Ranked options + cited sources |
| 2 | Critique explorations | Different critic agent | Winner with CONCRETE TEXT |
| 3 | Implement | Implementer (subagent or main) | One diff |
| 4 | Review gate (parallel) | Critic A + Critic B + E2E agent | All three run concurrently; wait for all |
| | Exit | Main thread | Apply / commit / report |
Agent separation: see Red Flags. Main thread orchestrates; agents produce.
Spawn a separate agent. Prompt must include:
claim verification hierarchy. Primary sources only for T1.
Spawn a DIFFERENT agent: not the explorer, not the main thread.
The critic's prompt must include:
One change, one diff. Code tasks: implementer invokes superpowers:test-driven-development, debugging-discipline, and the applicable <language>-coding-style skill.
Spawn all three agents in a single message (parallel Agent tool calls). Wait for all three to complete before evaluating results. Every reviewer prompt must include the original user requirements verbatim — reviewers catch requirement deviations, not just technical issues.
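The parallel gate can be sketched with `asyncio.gather`, which starts all three reviewers together and returns only once every one has finished. `run_agent` is a hypothetical stand-in for whatever agent-spawning tool is actually in use, and the role names are illustrative.

```python
import asyncio


async def run_agent(role: str, prompt: str) -> dict:
    # Hypothetical placeholder: a real version would call the agent tool.
    await asyncio.sleep(0)
    return {"role": role, "prompt": prompt}


def reviewer_prompt(role: str, user_requirements: str, diff: str) -> str:
    # Every reviewer receives the original user requirements verbatim.
    return (
        f"Role: {role}\n"
        f"Original user requirements (verbatim):\n{user_requirements}\n\n"
        f"Diff under review:\n{diff}"
    )


async def run_review_gate(user_requirements: str, diff: str) -> list:
    roles = ["critic_a", "critic_b", "e2e"]
    tasks = [
        run_agent(role, reviewer_prompt(role, user_requirements, diff))
        for role in roles
    ]
    # gather waits for all three and preserves the order of `roles`.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    for result in asyncio.run(run_review_gate("Must be idempotent.", "<diff>")):
        print(result["role"])
```

Evaluating results only after `gather` returns keeps all findings tied to the same gate run.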
Every issue from Critic A and Critic B must carry exactly one code:
| Code | Meaning | Effect |
|---|---|---|
| REJECT | Would make the change wrong, unsafe, or contradictory | Triggers gate re-run after fix |
| CONDITIONAL | Fix needed, but obvious/trivial enough to trust without re-review | Must be fixed; no re-run needed |
| NIT | Soft recommendation | May be ignored |
Both critics tag every issue per the severity codes table above.
Emit only issues affecting correctness, safety, or fidelity to the concrete text. This includes interface contract fulfillment: does every interface implementation actually work, not just compile? Polish and taste items are NITs at most.
Different agent from Critic A.
Focus — adversarial, long-term lens:
<language>-coding-style skill. Does the diff follow naming, error handling, structure, and idiom conventions?
Emit only issues that matter for long-term health. "Would refactor eventually" is not an issue; "will cause bugs or confusion within 3 months" is.
Code/debugging tasks only. Skip for non-code tasks (docs, config, design).
Collect results from all three agents. Apply severity logic:
Gate retry and cycle limits are defined in the Escalation table.
Clean pass = zero REJECTs + zero CONDITIONALs + E2E pass, all from the same gate run.
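The clean-pass rule and the severity codes can be expressed as two small predicates. This is a sketch under assumptions: the `Issue` type and function names are hypothetical, and the code deliberately does not decide what happens after an E2E failure, since that is governed by the severity logic and Escalation table rather than by this example.

```python
from dataclasses import dataclass

VALID_CODES = ("REJECT", "CONDITIONAL", "NIT")


@dataclass(frozen=True)
class Issue:
    code: str  # one of VALID_CODES
    text: str

    def __post_init__(self):
        # Every issue must carry exactly one known severity code.
        if self.code not in VALID_CODES:
            raise ValueError(f"unknown severity code: {self.code}")


def is_clean_pass(issues, e2e_passed: bool) -> bool:
    """Zero REJECTs, zero CONDITIONALs, and E2E pass, all from one gate run."""
    blocking = [i for i in issues if i.code in ("REJECT", "CONDITIONAL")]
    return e2e_passed and not blocking


def needs_gate_rerun(issues) -> bool:
    """Only a REJECT triggers a gate re-run after the fix."""
    return any(i.code == "REJECT" for i in issues)
```

NITs never block: a run whose only findings are NITs is still a clean pass, provided E2E passed in that same run.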
A FRESH agent — not any of the cycle agents — gets one chance to break the loop before escalating to the user.
One loop-breaker invocation per change, regardless of trigger. If the granted retry fails → hard escalate to user.
| Decision | Meaning | Effect |
|---|---|---|
| ACCEPT | Remaining issues are cosmetic, speculative, or not worth another iteration | Accept current state with reasoning. Gate passes. |
| RETRY | Remaining issues are real and fixable | Grant exactly one more attempt (gate retry or full cycle, matching the trigger). Provide specific guidance. |
Single decision table for all limit hits. One loop-breaker per change total.
| Trigger | Condition | Action | If retry fails |
|---|---|---|---|
| Gate retry cap | 3 gate retries failed within one cycle | Invoke loop-breaker (if not yet used for this change) | Hard escalate to user |
| Cycle limit | 3 full cycles failed for one change | Invoke loop-breaker (if not yet used for this change) | Hard escalate to user |
| Loop-breaker already used | Either limit hit but loop-breaker was consumed by prior trigger | Skip loop-breaker → hard escalate to user immediately | — |
Hard escalate = report to user with: (a) original problem, (b) what each cycle tried, (c) loop-breaker's assessment (if invoked), (d) last blocking issue, (e) next-best alternative from explorer's ranking. Silent punts forbidden.
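The escalation rules reduce to two decisions plus a report shape. The sketch below assumes hypothetical names (`on_limit_hit`, `after_loop_breaker`, `HardEscalationReport`); the branching and the five report fields follow the tables and the hard-escalate definition above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HardEscalationReport:
    original_problem: str                    # (a)
    cycle_attempts: List[str]                # (b) what each cycle tried
    loop_breaker_assessment: Optional[str]   # (c) None if never invoked
    last_blocking_issue: str                 # (d)
    next_best_alternative: str               # (e) from the explorer's ranking


def on_limit_hit(loop_breaker_used: bool) -> str:
    """Same action for either trigger: gate retry cap or cycle limit."""
    # One loop-breaker per change; if it was consumed, escalate immediately.
    return "hard_escalate" if loop_breaker_used else "invoke_loop_breaker"


def after_loop_breaker(decision: str, retry_passed: bool = False) -> str:
    """Map the loop-breaker's decision to the next action."""
    if decision == "ACCEPT":
        return "gate_passes"
    if decision == "RETRY":
        # Exactly one more attempt; a failed retry means hard escalation.
        return "gate_passes" if retry_passed else "hard_escalate"
    raise ValueError(f"unknown decision: {decision}")
```

Making every report field required except the loop-breaker assessment mirrors the rule that silent punts are forbidden.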
Cycle limit defined in Escalation table (3 full cycles per change).
| Symptom | Fix |
|---|---|
| Implementing 2+ changes before re-critiquing | Stop. One at a time |
| "Good enough" at cycle 3 | Invoke loop-breaker, don't settle or force |
| Any two of {explorer, Step 2 critic, implementer, Critic A, Critic B, E2E agent, loop-breaker} are the same agent | Banned. Up to seven distinct agents (six per normal cycle + loop-breaker at limits) |
| Skipping E2E inside loop | E2E is part of the review gate — runs every iteration, not at the end |
| Skipping exploration or critique for later iterations | Every iteration runs all four steps — none are optional |
| Winner lacks concrete text | Critic under-specified. Re-spawn with "concrete text required" |
| No rejected list in Step 2 | Critic is not adversarial. Re-spawn |
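The agent-separation rule in the table above can be checked mechanically if each role is recorded with the identifier of the agent that filled it. This is a hedged sketch: the function name and the idea of string agent ids are assumptions, not part of this skill.

```python
def shared_agents(assignments: dict) -> dict:
    """Return every agent id that is filling more than one role.

    `assignments` maps a role name (e.g. "explorer", "critic_a") to an
    agent id. An empty result means all roles are held by distinct agents.
    """
    roles_by_agent = {}
    for role, agent_id in assignments.items():
        roles_by_agent.setdefault(agent_id, []).append(role)
    return {
        agent_id: roles
        for agent_id, roles in roles_by_agent.items()
        if len(roles) > 1
    }


if __name__ == "__main__":
    cycle = {
        "explorer": "agent-1",
        "step2_critic": "agent-2",
        "implementer": "agent-3",
        "critic_a": "agent-4",
        "critic_b": "agent-4",  # violation: same agent as critic_a
        "e2e_agent": "agent-5",
    }
    print(shared_agents(cycle))
```

Running the check before evaluating a gate surfaces a shared critic early, before its findings are trusted.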
| Skill | Difference |
|---|---|
| superpowers:brainstorming | Explores user intent before design. This skill explores solutions after intent is clear. |
| agent-teams-execution | Full multi-role pipeline for large builds. This skill is the medium-task pattern (explore → critique → implement → parallel review gate). Borrow its Snitch rubber-stamp check: critic citing zero issues beyond producer's self-reports = re-spawn with harsher prompt. |
| superpowers:systematic-debugging | For diagnosing a known bug. This skill is for open-ended improvement/design research. |
| proof-driven-development | Proves correctness of logic. This skill selects which logic to build. |