4-phase execution loop for work units - IMPLEMENT, VALIDATE, ADVERSARIAL REVIEW, COMMIT
Core principle: Trust nothing. Verify everything. Review adversarially.
This skill defines a generalized 4-phase execution loop that any orchestrator can invoke when implementing work units. It replaces linear "implement then review" flows with a rigorous cycle that independently validates results and adversarially reviews against a written spec contract.
This skill is mode-agnostic — the 4-phase execution loop works identically in both Task Mode and Team Mode. The differences:
- One Task() per work unit: each work unit is executed by a fresh Task() instance in BOTH modes. Never a teammate, never resumed.

See ./guides/agent-coordination.md for full mode detection and coordination details.
After drafting an implementation plan (Step 1: Plan Validation), submit it to the Plan Review Gate before presenting to the user. The gate spawns 3 adversarial reviewers (Feasibility, Completeness, Scope & Alignment) — all must PASS. See skills/plan-review-gate/SKILL.md for details.
Do NOT use for: Single-file bug fixes, copy changes, or tasks without a spec.
Before submitting a plan to the Design Review Gate, the orchestrator MUST verify every item on this checklist. This prevents expensive design review cycles on fundamentally broken plans.
Note: The `Plan` subagent type cannot write files (it has read-only access by design). If you spawn an Architect as a Plan subagent, it will return the plan as text in its response. The orchestrator must write the plan to `PLAN.md` itself.
If the plan includes HTTP endpoints or WebSocket protocols, verify:
If the plan includes a user interface:
- [ ] `.env.example` includes all required env vars

If any checklist item fails, fix the plan BEFORE submitting to the Design Review Gate.
Every plan submitted for Design Review MUST include these sections:
### POST /api/todos
- **Request Body**: `{ title: string }` (required, 1-500 chars, trimmed)
- **Success**: `201 Created` -> `{ id, title, completed, createdAt, updatedAt }`
- **Errors**: `400` (validation) / `500` (internal)
A work unit is the atomic unit of orchestrated execution. Before entering the 4-phase loop, decompose the implementation plan into work units.
Each work unit contains:
| Field | Description | Example |
|---|---|---|
| ID | Unique identifier (BEADS task ID) | bd-wu-001 |
| Title | Human-readable name | "Implement auth middleware" |
| Spec | Written specification with acceptance criteria | Link to design doc section |
| DoD Items | Enumerated, verifiable done criteria | [ ] Middleware rejects expired tokens |
| Dependencies | Other work units that must complete first | [bd-wu-000] |
| File Scope | Files this work unit may touch | src/middleware/auth.ts, src/middleware/auth.test.ts |
| Human Checkpoint | Whether to pause for human review after completion | true for risky changes |
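The fields above can be captured in a small type that the orchestrator uses to track work units in memory. This is an illustrative sketch only; the interface and field names are assumptions, not part of the BEADS schema:

```typescript
// Hypothetical in-memory shape of a work unit; fields mirror the table above.
interface WorkUnit {
  id: string;              // BEADS task ID, e.g. "bd-wu-001"
  title: string;           // human-readable name
  spec: string;            // link to (or text of) the spec with acceptance criteria
  dodItems: string[];      // enumerated, verifiable done criteria
  dependencies: string[];  // IDs of work units that must complete first
  fileScope: string[];     // files this unit may touch
  humanCheckpoint: boolean; // pause for human review after completion?
}

// A unit is ready to enter the 4-phase loop once every dependency is complete.
function isReady(wu: WorkUnit, completed: Set<string>): boolean {
  return wu.dependencies.every((dep) => completed.has(dep));
}
```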
Work units form a directed acyclic graph (DAG):
wu-001 (schema changes) ───┐
├──→ wu-003 (API endpoints) ───→ wu-005 (integration tests)
wu-002 (shared utilities) ──┘ │
▼
wu-004 (UI components) ────────────────────────────────────→ wu-006 (e2e tests)
Rules for decomposition:
# Create work units as BEADS tasks under the epic
bd create "WU-001: <title>" --type task --parent <epic-id> \
--description "Spec: <spec-section>\nDoD:\n- [ ] <item-1>\n- [ ] <item-2>\nFile scope: <files>\nCheckpoint: <yes/no>"
# Set up dependencies
bd dep add <wu-003> <wu-001>
bd dep add <wu-003> <wu-002>
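One way to derive execution order from the DAG is Kahn-style wave scheduling: each wave contains the work units whose dependencies are already complete, so units within a wave can run in parallel. A minimal sketch (the function name and input shape are illustrative):

```typescript
// Compute execution "waves" from a map of work-unit ID -> dependency IDs.
// Each wave contains units whose dependencies are satisfied by earlier waves.
function executionWaves(deps: Record<string, string[]>): string[][] {
  const done = new Set<string>();
  const pending = new Set(Object.keys(deps));
  const waves: string[][] = [];
  while (pending.size > 0) {
    const wave = [...pending].filter((wu) => deps[wu].every((d) => done.has(d)));
    if (wave.length === 0) throw new Error("Cycle detected: not a DAG");
    for (const wu of wave) {
      done.add(wu);
      pending.delete(wu);
    }
    waves.push(wave);
  }
  return waves;
}
```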
For each work unit, execute these four phases in sequence. Do not skip phases. Do not combine phases. Do not proceed to the next phase until the current phase produces a clear outcome.
┌─────────────────────────────────────────────────────────────────┐
│ 4-PHASE EXECUTION LOOP │
│ │
│ ┌──────────┐ ┌──────────┐ ┌──────────────┐ ┌──────┐ │
│ │ IMPLEMENT│───→│ VALIDATE │───→│ ADVERSARIAL │───→│COMMIT│ │
│ │ │ │ │ │ REVIEW │ │ │ │
│ └──────────┘ └──────────┘ └──────┬───────┘ └──────┘ │
│ ▲ │ │
│ │ FAIL │ │
│ └─────────────────────────────────┘ │
│ │
│ On FAIL: fix → re-validate → FRESH review → max 3 → escalate │
└─────────────────────────────────────────────────────────────────┘
The coding subagent executes against the work unit spec.
Orchestrator actions:
Subagent spawn template:
You are the CODER AGENT for work unit ${wuId}.
## Spec
${spec}
## Definition of Done
${dodItems.map((item, i) => `${i+1}. ${item}`).join('\n')}
## File Scope
You may ONLY modify these files: ${fileScope.join(', ')}
## Project Context
${projectContext}
## Rules
- Follow TDD: write failing test first, then implement to make it pass
- Do NOT modify files outside your file scope
- Do NOT self-certify — the orchestrator will validate independently
- When complete, report what you changed and what tests you added
- NEVER use --no-verify on git commits — pre-commit hooks are mandatory
- NEVER use git push --force
- NEVER suppress linter/type errors with eslint-disable, @ts-ignore, or as any
- NEVER skip tests or claim "tests pass" without actually running them
Phase 1 output: List of changed files and new tests.
The orchestrator independently runs quality gates. Never trust subagent self-reports.
Orchestrator actions (run these yourself, NOT via the coding subagent):
# 1. Type checking
npx tsc --noEmit
# 2. Linting
npx eslint <changed-files>
# 3. Run tests (full suite, not just new tests)
npx vitest run
# 4. Coverage enforcement (BLOCKING — read .coverage-thresholds.json)
# If .coverage-thresholds.json exists, read the enforcement command and run it
# This is NOT optional. Coverage below threshold = VALIDATION FAIL.
if [ -f .coverage-thresholds.json ]; then
CMD=$(node -e "console.log(JSON.parse(require('fs').readFileSync('.coverage-thresholds.json','utf-8')).enforcement.command)")
eval "$CMD"
fi
# 5. Verify file scope was respected (SCOPE = this unit's declared files)
SCOPE="<file-scope-files>"
git diff --name-only | while read -r file; do
  case " $SCOPE " in
    *" $file "*) ;;                   # in scope
    *) echo "OUT OF SCOPE: $file" ;;  # violation — VALIDATION FAIL
  esac
done
Phase 2 outcomes:
Critical rule: The orchestrator runs validation commands directly. The orchestrator does NOT ask the coding subagent "did the tests pass?" and accept the answer.
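A sketch of this rule in code, assuming a Node-based orchestrator and the tsc/eslint/vitest toolchain shown above. The gate list and `runGate` helper are illustrative; the essential point is that the verdict is derived from exit codes the orchestrator observed itself, never from the coder's self-report:

```typescript
import { spawnSync } from "node:child_process";

// Illustrative gate commands for a TypeScript/vitest project.
const gates = ["npx tsc --noEmit", "npx eslint .", "npx vitest run"];

// The orchestrator runs each gate itself and captures the exit code.
function runGate(command: string): number {
  const [cmd, ...args] = command.split(" ");
  return spawnSync(cmd, args, { stdio: "inherit" }).status ?? 1;
}

// Pure verdict: every gate must exit 0, or Phase 2 FAILs.
function validationVerdict(exitCodes: number[]): "PASS" | "FAIL" {
  return exitCodes.every((code) => code === 0) ? "PASS" : "FAIL";
}
```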
A separate review subagent checks the implementation against the spec contract. This is NOT the same as a collaborative code review — it's adversarial verification.
Key differences from collaborative review:
| Collaborative Review | Adversarial Review |
|---|---|
| APPROVED / CHANGES REQUIRED | PASS / FAIL |
| Subjective quality assessment | Binary spec compliance check |
| Reviewer suggests improvements | Reviewer finds contract violations |
| Same reviewer can re-review | Fresh reviewer required on re-review |
| Uses code-review-rubric.md | Uses adversarial-review-rubric.md |
Orchestrator actions:
Reviewer spawn template:
You are the ADVERSARIAL REVIEWER for work unit ${wuId}.
## Mode
Adversarial — your job is to FIND FAILURES, not to approve.
## Rubric
Read and follow: ./rubrics/adversarial-review-rubric.md
## Spec
${spec}
## Definition of Done
${dodItems.map((item, i) => `${i+1}. ${item}`).join('\n')}
## What to Review
Run: git diff main..HEAD -- ${fileScope.join(' ')}
## Rules
- Check EACH DoD item. Cite file:line evidence for PASS or expected-vs-found for FAIL.
- Any single BLOCKING issue means overall FAIL.
- You have NO context from previous reviews. Judge fresh.
- Do NOT suggest improvements. Only report PASS or FAIL with evidence.
Phase 3 outcomes:
Fresh reviewer rule: On re-review after FAIL, the orchestrator MUST spawn a new review subagent. Never pass previous findings to the new reviewer. Never reuse the same reviewer instance. This prevents anchoring bias and ensures independent verification.
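The fresh-reviewer rule can be made structural by building each reviewer prompt from only the spec, DoD items, and file scope, so previous findings cannot leak in. A hypothetical sketch (`buildReviewerPrompt` is not an existing API):

```typescript
// Build a reviewer prompt containing ONLY spec, DoD, and diff scope.
// Nothing about prior attempts or prior findings is ever included.
function buildReviewerPrompt(
  wuId: string,
  spec: string,
  dodItems: string[],
  fileScope: string[],
): string {
  return [
    `You are the ADVERSARIAL REVIEWER for work unit ${wuId}.`,
    `## Spec\n${spec}`,
    `## Definition of Done\n${dodItems.map((d, i) => `${i + 1}. ${d}`).join("\n")}`,
    `## What to Review\nRun: git diff main..HEAD -- ${fileScope.join(" ")}`,
  ].join("\n\n");
}
```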
Only after PASS from adversarial review.
Orchestrator actions:
# Stage only files within the work unit's file scope
git add <file-scope-files>
# Commit with reference to work unit
git commit -m "feat(wu-${wuId}): <description>
DoD items verified:
$(dodItems.map((item, i) => `- [x] ${item}`).join('\n'))
Reviewed-by: adversarial-review (PASS)"
After commit:
bd close <wu-task-id> --reason "4-phase loop complete. PASS."

After commit, update SERVICE-INVENTORY.md:
If this work unit created or modified services, factories, database tables, or shared modules, update SERVICE-INVENTORY.md with the new entries. This document is read by subsequent coder agents to avoid duplicating existing services.
Quality gates are BLOCKING STATE TRANSITIONS, not advisory recommendations. The orchestrator CANNOT advance to the next phase without gate passage.
IMPLEMENT ──→ VALIDATE ──→ REVIEW ──→ COMMIT
│ │
↓ ↓
FAIL: FAIL:
fix + re-run fix + re-validate
+ FRESH re-review
│ │
(max 3) (max 3)
│ │
↓ ↓
ESCALATE ESCALATE
(to human) (to human)
The VALIDATE gate enforces the .coverage-thresholds.json thresholds. Track each attempt visibly: "Re-review attempt 1/3", "Re-review attempt 2/3", etc.
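The blocking-transition semantics can be sketched as a small state machine: PASS advances and resets the retry counter, FAIL retries up to the cap, and hitting the cap escalates. Names and the exact retry accounting are illustrative:

```typescript
type Phase = "IMPLEMENT" | "VALIDATE" | "REVIEW" | "COMMIT";

const MAX_ATTEMPTS = 3;

// Blocking transition: advancing requires PASS. On FAIL, retry with a
// visible counter; after the 3rd failure, escalate to a human. Never skip.
function nextState(
  phase: Phase,
  gateResult: "PASS" | "FAIL",
  priorFailures: number, // failures already recorded for this phase
): { phase: Phase; retries: number } | "ESCALATE" {
  if (gateResult === "PASS") {
    const order: Phase[] = ["IMPLEMENT", "VALIDATE", "REVIEW", "COMMIT"];
    const next = order[order.indexOf(phase) + 1];
    return { phase: next ?? "COMMIT", retries: 0 };
  }
  if (priorFailures + 1 >= MAX_ATTEMPTS) return "ESCALATE";
  return { phase, retries: priorFailures + 1 };
}
```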
When multiple work units have no dependencies on each other, execute them in parallel — but with structured convergence points.
┌──── WU-001: IMPLEMENT ────┐
│ │
Fan-out ──────┼──── WU-002: IMPLEMENT ────┼──── Converge for VALIDATE
│ │
└──── WU-003: IMPLEMENT ────┘
│
┌──── WU-001: REVIEW ────────┤
│ │
Fan-out ──────┼──── WU-002: REVIEW ────────┼──── Sequential COMMIT
│ │
└──── WU-003: REVIEW ────────┘
Rules for parallel execution:
Stale notification handling: When parallel subagent results arrive after the orchestrator has moved past their work unit (e.g., at a checkpoint), acknowledge them briefly in one line. Do NOT print a full "still waiting at checkpoint" block for each stale notification — this clutters the conversation and wastes context window.
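Fan-out with structured convergence can be sketched with Promise.allSettled, which waits for every parallel implementation (success or failure) before the next phase begins. `implementUnit` here is a hypothetical wrapper around spawning a coder subagent:

```typescript
// Run independent work units in parallel and converge: the returned map is
// complete only after ALL units finish, so validation never starts early.
// Failures are captured as Error values rather than aborting the batch.
async function fanOut<T>(
  unitIds: string[],
  implementUnit: (id: string) => Promise<T>,
): Promise<Map<string, T | Error>> {
  const settled = await Promise.allSettled(unitIds.map(implementUnit));
  const results = new Map<string, T | Error>();
  settled.forEach((res, i) => {
    results.set(
      unitIds[i],
      res.status === "fulfilled" ? res.value : new Error(String(res.reason)),
    );
  });
  return results;
}
```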
The orchestrator MUST maintain a project context document that grows with each work unit. This is passed to every coder subagent to prevent context loss.
# Project Context (Maintained by Orchestrator)
## Tooling
- Package manager: <npm/pnpm/yarn>
- Test runner: <vitest/jest> (<config-file>)
- Linter: <eslint> (<config-file>)
- Build: <vite/webpack/tsc> (<config-file>)
## Completed Work Units
| WU | Title | Key Files | Services Created |
|----|-------|-----------|-----------------|
## Established Patterns
- <pattern-1>: <description>
- <pattern-2>: <description>
## Active Services
See SERVICE-INVENTORY.md
Update rules:
The Project Context Document MUST be written to .beads/context/project-context.md and kept in sync with the in-memory version. This ensures the context survives context compaction and session boundaries.
# Create directory if needed
mkdir -p .beads/context
# Write/update after each Phase 4 (COMMIT) and at orchestration start
# The file should always reflect the current state of execution
When to write:
The file is NOT committed to git during execution — it's a working document. It gets cleaned up after the PR is created (or left for the next session if interrupted).
Approved plans and execution state are persisted to .beads/ so agents can recover after context compaction or session interruption.
| File | Contents | Written When |
|---|---|---|
.beads/plans/active-plan.md | The adversarially-reviewed, user-approved implementation plan | After plan review gate PASS + user approval |
.beads/context/project-context.md | Project Context Document (tooling, completed WUs, patterns) | After each Phase 4 COMMIT |
.beads/context/execution-state.md | Current work unit, phase, retry count | After each phase transition |
After the Plan Review Gate approves a plan AND the user approves it, persist immediately:
mkdir -p .beads/plans
# Write the approved plan with metadata header
cat > .beads/plans/active-plan.md << 'PLAN_EOF'
# Active Plan
<!-- approved: <timestamp> -->
<!-- gate-iterations: <N> -->
<!-- user-approved: true -->
<!-- status: in-progress -->
<full plan text including work unit decomposition, DoD items, file scopes, dependencies>
PLAN_EOF
After each phase transition, update the execution state:
cat > .beads/context/execution-state.md << 'STATE_EOF'
# Execution State
<!-- updated: <timestamp> -->
## Current Position
- Active work unit: <wu-id>
- Current phase: <IMPLEMENT|VALIDATE|REVIEW|COMMIT>
- Retry count: <0-3>
## Work Unit Status
| WU | Status | Phase | Retries |
|----|--------|-------|---------|
| WU-001 | COMPLETE | COMMITTED | 0 |
| WU-002 | IN-PROGRESS | VALIDATE | 1 |
| WU-003 | PENDING | — | 0 |
## Blocked / Escalated
<any blocked or escalated work units with context>
STATE_EOF
When the orchestrator detects it has lost context (after compaction or in a new session), it recovers by reading persisted state:
1. Check: Does `.beads/plans/active-plan.md` exist with `status: in-progress`?
- YES → Context was lost mid-execution. Recover.
- NO → No active execution. Start fresh.
2. Recovery steps:
a. Read `.beads/plans/active-plan.md` — reload the approved plan
b. Read `.beads/context/project-context.md` — reload completed work and patterns
c. Read `.beads/context/execution-state.md` — find where execution stopped
d. Run `bd prime --work-type recovery` — reload relevant knowledge base facts
e. Resume from the current work unit and phase
3. Announce recovery to user:
"Recovered execution context from BEADS. Resuming from WU-<id>, Phase <phase>."
When to trigger recovery: The orchestrator should check for .beads/plans/active-plan.md at the start of any orchestrated execution. If the file exists with status: in-progress and the orchestrator has no plan in its current context, it's a recovery scenario.
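The recovery check itself is a simple file test. A minimal sketch, assuming the persistence layout above (`.beads/plans/active-plan.md` with an HTML-comment status marker):

```typescript
import { existsSync, readFileSync } from "node:fs";

// Returns true when a persisted plan exists and is still in progress,
// meaning the orchestrator lost context mid-execution and should recover.
function needsRecovery(planPath = ".beads/plans/active-plan.md"): boolean {
  if (!existsSync(planPath)) return false;
  const plan = readFileSync(planPath, "utf-8");
  return plan.includes("status: in-progress");
}
```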
After the PR is created (or the plan is abandoned):
# Mark plan as completed (cross-platform sed)
if [[ "$OSTYPE" == "darwin"* ]]; then
sed -i '' 's/status: in-progress/status: completed/' .beads/plans/active-plan.md
else
sed -i 's/status: in-progress/status: completed/' .beads/plans/active-plan.md
fi
# Archive execution state (don't delete — useful for post-mortem)
mv .beads/context/execution-state.md .beads/context/execution-state-<timestamp>.md
# Project context can be kept for reference
Human checkpoints are planned pauses, not reactive escalations. They are defined in the spec before execution begins.
When reaching a checkpoint, present this report and wait for explicit human approval:
## Checkpoint: <checkpoint-name>
### Completed Work Units
| WU | Title | Status | Review |
| --- | --- | --- | --- |
| WU-001 | Schema migration | PASS | Adversarial PASS |
| WU-002 | Service layer | PASS | Adversarial PASS |
### Key Decisions Made
- <decision-1>: <rationale>
- <decision-2>: <rationale>
### What Comes Next
- WU-003: <description>
- WU-004: <description>
### Questions for Human (if any)
- <question>
---
**Action required**: Reply to continue, or provide feedback to adjust course.
Do NOT continue past a checkpoint without human response. This is not a notification — it's a gate.
After ALL work units are complete and committed, run a final comprehensive review across the entire change set. This catches cross-unit integration issues that per-unit reviews miss.
# 1. Combined diff — see the full picture
git diff main..HEAD
# 2. Full test suite — not just changed files
npx vitest run
# 3. Type check — catch cross-unit type conflicts
npx tsc --noEmit
# 4. Lint — catch cross-unit style issues
npx eslint .
# 5. Coverage — verify overall coverage thresholds
if [ -f .coverage-thresholds.json ]; then
CMD=$(node -e "console.log(JSON.parse(require('fs').readFileSync('.coverage-thresholds.json','utf-8')).enforcement.command)")
eval "$CMD"
fi
# 6. Commit history — verify clean, logical commits
git log main..HEAD --oneline
## Final Comprehensive Review
### Overall Verdict: PASS / FAIL
### Work Units Summary
| WU | Title | Impl | Validate | Review | Commit |
| --- | --- | --- | --- | --- | --- |
| WU-001 | <title> | Done | Pass | Pass | <sha> |
| WU-002 | <title> | Done | Pass | Pass | <sha> |
### Quality Gates
- [ ] All tests pass
- [ ] Type check clean
- [ ] Lint clean
- [ ] Coverage thresholds met
- [ ] No cross-unit integration issues
### Remaining Issues
<any issues found during final review>
### Ready for PR: YES / NO
After the final comprehensive review passes but BEFORE creating the PR, run /self-reflect to extract learnings into the knowledge base. This captures implementation insights, debugging discoveries, and architectural decisions while context is freshest — not deferred to post-merge when details have faded.
## Pre-PR Knowledge Capture
Final review PASSED. Before creating the PR, extracting learnings...
/self-reflect
Learnings captured: [N] items added to knowledge base.
Committing knowledge base updates...
Proceeding to PR creation.
Why before PR, not after merge? By the time a PR is merged, the implementing agent's context may be gone (session ended, context compacted). The richest insights — why a certain approach was chosen, what debugging dead-ends were hit, which patterns emerged — exist NOW, immediately after implementation. Capture them now.
Knowledge base changes are part of the PR. After self-reflect updates the knowledge base files, commit them alongside the implementation. This ensures learnings are reviewed as part of the PR and land atomically with the code that generated them. Do NOT defer knowledge base commits to a separate PR or post-merge step.
When things go wrong during the 4-phase loop, follow this structured recovery.
Identify what failed and gather evidence:
# Capture the failure
# - Which phase failed? (IMPLEMENT, VALIDATE, REVIEW)
# - What was the error message or FAIL reason?
# - Which DoD items are affected?
Categorize the failure:
| Classification | Description | Action |
|---|---|---|
| Fixable | Clear error, known fix | Retry with specific fix instructions |
| Ambiguous | Unclear root cause | Investigate before retrying |
| External | Dependency, access, or environment issue | Escalate immediately |
For fixable and ambiguous failures:
Track retry count:
bd label add <task-id> retry:1 # or retry:2, retry:3
After 3 failed attempts, escalate to human with full context:
## Escalation: Work Unit <wu-id> Failed After 3 Attempts
### Failure History
| Attempt | Phase | Error | Fix Tried |
| --- | --- | --- | --- |
| 1 | VALIDATE | Tests fail: auth.test.ts:34 | Fixed mock setup |
| 2 | REVIEW | DoD #3 not met: missing edge case | Added edge case test |
| 3 | VALIDATE | Type error in cross-module import | Restructured imports |
### Root Cause Assessment
<best understanding of why this keeps failing>
### Options
1. <option-1>
2. <option-2>
3. Abandon this work unit and restructure
### Recommendation
<which option and why>
These are explicit DON'Ts. Violating any of these undermines the entire orchestration pattern.
| # | Anti-Pattern | Why It's Wrong | What to Do Instead |
|---|---|---|---|
| 1 | Self-certifying — coding subagent says "tests pass" and you believe it | Subagents can hallucinate, skip tests, or misinterpret results | Orchestrator runs validation commands independently |
| 2 | Skipping adversarial review — "the code looks fine, let's commit" | Visual inspection misses spec violations; confirmation bias | Always run adversarial review against DoD items |
| 3 | Reusing a reviewer — same subagent re-reviews after FAIL | Anchoring bias: reviewer remembers previous findings and checks for those specifically instead of reviewing fresh | Spawn a new reviewer instance with no prior context |
| 4 | Passing previous findings to new reviewer — "last reviewer found X, check if fixed" | Creates anchoring bias; new reviewer should find issues independently | Pass only: spec, DoD items, diff. Nothing about previous reviews |
| 5 | Trusting subagent file scope claims — "I only changed the files in scope" | Subagents may accidentally modify files outside scope | Run git diff --name-only and verify each file independently |
| 6 | Combining phases — "implement and validate in one step" | Removes the independence that makes validation meaningful | Run each phase as a distinct step with its own output |
| 7 | Continuing past a checkpoint without human response | Defeats the purpose of proactive checkpoints | Wait. If urgent, escalate — don't skip |
| 8 | Skipping final comprehensive review — "all units passed individually" | Per-unit reviews can't catch cross-unit integration issues | Always run the final review after all units are committed |
| 9 | Skipping coverage enforcement — "tests pass, coverage doesn't matter" | Coverage thresholds exist for a reason; low coverage means untested paths | Read .coverage-thresholds.json and run the enforcement command. Block on failure. |
| 10 | Building UI components in isolation — all components tested but never wired into the app | Users can't interact with components that aren't rendered | Plan must include integration WUs that wire components into the app shell |
| 11 | Proceeding without external credentials — building features that require API keys without verifying the user has them | Features will fail at runtime; user discovers this after 10+ commits | Checkpoint before external-service WUs to verify credentials are configured |
| 12 | Advisory quality gates — treating FAIL as a suggestion rather than a blocking transition | Undermines the entire trust model; equivalent to skipping the gate | Quality gates are state transitions. FAIL means retry or escalate, never skip. |
| 13 | Using --no-verify — bypassing pre-commit hooks on git commits | Pre-commit hooks catch lint errors, type errors, and formatting issues before they enter history | Never use --no-verify. Fix the underlying issue instead. |
| 14 | Skipping design review gate after brainstorming — going directly from brainstorming to writing-plans | Expensive implementation work begins on unreviewed designs | Always run the 5-agent design review gate between brainstorming and planning |
| 15 | Skipping plan review gate — presenting a plan to the user without adversarial review | Plans with feasibility gaps, missing requirements, or scope creep reach implementation | Always run the 3-reviewer plan review gate before presenting any plan |
Before execution (Plan Validation):
For each work unit:
- [ ] File scope verified with git diff --name-only

Quality Gate Rules:
After all work units:
- [ ] Run /self-reflect to capture learnings (BEFORE PR creation)