Contract-driven build loop with independent builder and evaluator agents. Mandatory 5-round verification per slice. Use for medium/large product work spanning 3+ files.
Two independent agents. They NEVER share context. They communicate ONLY through files.
Builder Agent ──writes code──> project files
docs/contract.md
Evaluator Agent ──reads──> project files + docs/contract.md
──writes──> docs/eval-round-N.md
──runs──> scripts/verify.sh, tests, curl, browser
Builder Agent ──reads──> docs/eval-round-N.md
──fixes code──> project files
The orchestrator (you) coordinates. You do NOT build or evaluate. You launch agents and pass files.
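1. Write docs/spec.md from the user request. Get user sign-off.
2. Write docs/contract.md, which must include the verification steps and the verification log used in the rounds below.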
Every slice runs EXACTLY 5 rounds. No exceptions. No early exit.
For round N (1 through 5):
Step A — Launch Builder Agent:
Use the Agent tool with this prompt:
"You are a Builder. Read docs/contract.md and docs/eval-round-{N-1}.md (if exists).
Build or fix the code to satisfy the contract and address all eval feedback.
Do NOT evaluate your own work. Do NOT claim it works. Just write code.
When done, commit with message 'build: round N'."
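
The Builder prompt changes slightly per round, because round 1 has no earlier evaluation to read. A minimal sketch of rendering it, assuming a hypothetical helper named `builder_prompt` and a `docs/` directory relative to the working directory (neither is defined by this workflow):

```python
from pathlib import Path


def builder_prompt(round_number: int, docs_dir: Path = Path("docs")) -> str:
    """Render the Builder prompt for one round (hypothetical helper)."""
    if not 1 <= round_number <= 5:
        raise ValueError("round_number must be between 1 and 5")

    reads = [(docs_dir / "contract.md").as_posix()]
    previous_eval = docs_dir / f"eval-round-{round_number - 1}.md"
    # Round 1 has no previous evaluation; later rounds mention it only if the file exists.
    if round_number > 1 and previous_eval.exists():
        reads.append(previous_eval.as_posix())

    return (
        f"You are a Builder. Read {' and '.join(reads)}.\n"
        "Build or fix the code to satisfy the contract and address all eval feedback.\n"
        "Do NOT evaluate your own work. Do NOT claim it works. Just write code.\n"
        f"When done, commit with message 'build: round {round_number}'."
    )
```

Resolving the file list up front keeps the prompt free of conditional wording, so the Builder is never told to read a file that does not exist.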
Step B — Launch Evaluator Agent (SEPARATE agent, fresh context):
Use the Agent tool with this prompt:
"You are an Evaluator. You have NEVER seen the builder's reasoning.
Read docs/contract.md. Then verify the build:
1. Run scripts/verify.sh — report full output
2. Run the project test suite — report full output
3. Execute each verification step from the contract LITERALLY
4. Try to break things: empty inputs, wrong types, rapid actions, missing auth,
mobile viewport, concurrent requests
5. Check rubrics/ for domain-specific quality gates
Write docs/eval-round-N.md with:
- PASS or FAIL for each verification step
- Exact error messages and reproduction steps for failures
- Red-team findings (issues not in contract but found by trying to break things)
- Overall verdict: PASS / FAIL
Be harsh. Your job is to find what's wrong, not confirm what's right.
Round 1-2: finding real bugs is expected.
Round 3-4: finding edge cases and polish issues.
Round 5: finding anything at all means the builder needs more work."
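
Step C needs the overall verdict from the evaluation file. A minimal sketch of reading it, assuming the Evaluator writes a line that starts with `Overall verdict:` (the prompt above does not fix the exact report layout, and `overall_verdict` is a hypothetical helper):

```python
import re

# Assumed layout: a line such as "- Overall verdict: PASS" or "Overall verdict: FAIL".
VERDICT_LINE = re.compile(
    r"^\s*(?:[-*]\s*)?Overall verdict:\s*(.*)$",
    re.IGNORECASE | re.MULTILINE,
)


def overall_verdict(report_text: str) -> str:
    """Return "PASS" only when the verdict line says PASS and nothing else."""
    match = VERDICT_LINE.search(report_text)
    if match is None:
        # A report without a verdict line is treated as a failure.
        return "FAIL"
    words = set(re.findall(r"\b(PASS|FAIL)\b", match.group(1).upper()))
    # An ambiguous line such as "PASS / FAIL" also counts as a failure.
    return "PASS" if words == {"PASS"} else "FAIL"
```

The parser is deliberately conservative: anything it cannot read as an unambiguous PASS is recorded as FAIL, which matches the instruction to be harsh.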
Step C — Record round in docs/contract.md verification log:
Round N: [PASS/FAIL] — [one-line summary of eval findings]
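
The log line above can be written mechanically. A minimal sketch, assuming the verification log sits at the end of docs/contract.md so that appending is enough (`record_round` is a hypothetical helper, and the position of the log within the file is an assumption):

```python
from pathlib import Path

SEPARATOR = "\u2014"  # the dash used in the log format above


def record_round(contract_path: Path, round_number: int, verdict: str, summary: str) -> str:
    """Append one verification-log entry and return the line that was written."""
    if verdict not in ("PASS", "FAIL"):
        raise ValueError("verdict must be 'PASS' or 'FAIL'")

    # Collapse newlines and repeated spaces so the summary stays on one line.
    one_line = " ".join(summary.split())
    entry = f"Round {round_number}: {verdict} {SEPARATOR} {one_line}"

    existing = contract_path.read_text(encoding="utf-8") if contract_path.exists() else ""
    # Start on a fresh line if the file does not already end with a newline.
    prefix = "" if not existing or existing.endswith("\n") else "\n"
    with contract_path.open("a", encoding="utf-8") as handle:
        handle.write(f"{prefix}{entry}\n")
    return entry
```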
Repeat A→B→C for all 5 rounds.
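
The A→B→C cycle can be sketched as a fixed-length loop. This is an illustration only: `launch_builder`, `launch_evaluator`, and `record` are hypothetical callables standing in for the Agent tool invocations and the log update, which this document does not specify as code.

```python
from typing import Callable, List

TOTAL_ROUNDS = 5  # every slice runs exactly five rounds


def run_slice(
    launch_builder: Callable[[int], None],
    launch_evaluator: Callable[[int], str],
    record: Callable[[int, str], None],
) -> List[str]:
    """Run steps A, B, and C for rounds 1 through 5 and return the verdicts."""
    verdicts: List[str] = []
    for round_number in range(1, TOTAL_ROUNDS + 1):
        launch_builder(round_number)              # Step A: builder writes or fixes code
        verdict = launch_evaluator(round_number)  # Step B: separate agent, fresh context
        record(round_number, verdict)             # Step C: update the verification log
        verdicts.append(verdict)
        # No early exit: a PASS does not end the loop.
    return verdicts
```

The loop contains no `break` on purpose, so a PASS in an early round still leads into the next round's stricter focus.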
- docs/qa-report.md: final state of all verification steps
- docs/handoff.md: what changed, what's next

| Round | Evaluator focus |
|---|---|
| 1 | Does it exist and basically function? Run contract verification steps. |
| 2 | Confirm round 1 failures are fixed. Check error handling and input validation. |
| 3 | Edge cases: empty data, huge data, wrong types, concurrent access. |
| 4 | Polish: mobile viewport, accessibility, performance, loading states. |
| 5 | Adversarial: try to break auth, bypass permissions, corrupt data. If nothing breaks, the slice is solid. |
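
If the orchestrator wants to remind each Evaluator of its round-specific focus, the table can be kept as a lookup. A small sketch, where the shortened labels and the name `evaluator_focus` are illustrative rather than part of this workflow:

```python
# Shortened labels summarizing the table above (wording is illustrative).
ROUND_FOCUS = {
    1: "Existence and basic function",
    2: "Error handling and input validation",
    3: "Edge cases",
    4: "Polish",
    5: "Adversarial",
}


def evaluator_focus(round_number: int) -> str:
    """Return the focus label for a round, rejecting rounds outside 1 to 5."""
    try:
        return ROUND_FOCUS[round_number]
    except KeyError:
        raise ValueError(f"no such round: {round_number}") from None
```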