Use when executing phased implementation that needs build verification, testing, and multi-perspective review before each commit. Triggers on multi-phase plans, protocol fixes, platform builds, or any work where sub-agents implement and reviewers gate quality. Not for single-file edits or quick fixes. Supersedes executing-plans and subagent-driven-development when platform-native review is needed alongside general code review.
Orchestrate phased implementation using sub-agents for research, testing, coding, and review — with hard gates between phases. The orchestrator delegates, coordinates, and fixes. Sub-agents execute.
If you don't have clear requirements for the work — what to build, what "done" looks like, what constraints apply — offer to interview the user first:
"I can start planning, but I'd get better results with more context. Want me to interview you about [the domain] first? I'll ask detailed questions about requirements, edge cases, and tradeoffs, then write a spec we can plan from."
If they say yes, conduct an in-depth interview: dig into technical details, UI/UX considerations, edge cases, security, performance, integration points. Don't accept vague answers. When done, write the spec to a file and use it as input to the phase planning.
If they say no or the requirements are already clear (existing spec, detailed issue, prior conversation context), skip straight to phase planning.
digraph build_loop {
rankdir=TB;
node [fontsize=11];
"Research" [shape=box, label="1. RESEARCH\nExplore agent reads scope"];
"Test First" [shape=box, label="2. TEST FIRST\nAgent writes failing tests"];
"Implement" [shape=box, label="3. IMPLEMENT\nAgent writes code to pass tests"];
"Build Gate" [shape=diamond, label="4. BUILD?"];
"Test Gate" [shape=diamond, label="5. TESTS?"];
"Reviews" [shape=box, label="6-7. DUAL REVIEW\nNative + Code (parallel)"];
"Issues?" [shape=diamond, label="CRITICAL\nor HIGH?"];
"Fix" [shape=box, label="8. FIX\nOrchestrator fixes"];
"Commit" [shape=box, label="9. COMMIT"];
"Next" [shape=doublecircle, label="NEXT PHASE"];
"Research" -> "Test First";
"Test First" -> "Implement";
"Implement" -> "Build Gate";
"Build Gate" -> "Test Gate" [label="pass"];
"Build Gate" -> "Implement" [label="fail"];
"Test Gate" -> "Reviews" [label="pass"];
"Test Gate" -> "Implement" [label="fail"];
"Reviews" -> "Issues?";
"Issues?" -> "Fix" [label="yes"];
"Issues?" -> "Commit" [label="no"];
"Fix" -> "Build Gate";
"Commit" -> "Next";
}
Before starting any phase, declare:
Phase: [name]
Goal: [one sentence]
Build: [exact command]
Test: [exact command]
Native reviewer: [skill or agent type]
Code reviewer: [skill or agent type]
Scope: [files to touch]
This prevents scope creep and gives sub-agents clear boundaries.
Every sub-agent dispatch includes:
Omitting any of these produces bad results. 30 seconds writing a complete prompt saves 5 minutes re-dispatching.
Dispatch an Explore agent to read files in scope and report: what exists, what's broken, what's missing. Always run this even when the code seems familiar — sub-agents start with zero context and will guess at types without it.
Dispatch an agent to write failing tests BEFORE implementation. The agent receives research findings, test file locations, and the test framework.
For protocol/networking: test exact wire format, mock expected server responses, cover error cases. For UI: test state transitions, view hierarchy expectations, accessibility. For logic: test boundary conditions and the specific bug or feature being addressed.
Tests must fail. If they pass, the behavior already exists (skip implementation) or the tests are wrong.
Dispatch an agent to write minimal code making the tests pass. The agent receives: failing test file paths, allowed file scope, any protocol/API reference, and the build command.
The agent runs the build command before returning. If build fails, it fixes until clean.
Run the build command yourself (don't trust the sub-agent's report alone). If it fails, send the error output back to step 3. Build failures in unrelated files that predate this phase can be noted and skipped if they don't affect the phase scope.
Run the test command. All tests must pass — both new tests from step 2 and any pre-existing tests. If failures occur, send the output back to step 3.
Dispatch a platform-specialist review agent (read-only). Choose based on domain:
| Domain | Reviewer |
|---|---|
| iOS/Swift | apple-swift-language-expert, apple-swiftui-mastery, apple-networking-apis |
| macOS | apple-macos-ux-full, apple-architecture-patterns |
| Web frontend | design-maestro, nextjs-app-router-architecture |
| General | code-reviewer-guardian |
The reviewer receives: phase goal, diff, spec/protocol reference, and instructions to rate findings by severity.
Dispatch a general code quality reviewer in parallel with step 6. This agent checks correctness, error handling, memory management, thread safety, security. Same severity system. Read-only.
Severity scale (P0–P3):
The orchestrator reads both review reports and fixes P0 and P1 issues. Do not delegate fixes — only the orchestrator has seen both reports and the full phase context. After fixing, loop back to step 4 (build gate).
P2 and P3 findings are noted but do not block the phase.
Once build passes, tests pass, and no P0/P1 issues remain: commit with a descriptive message, update task tracking, move to next phase.
If a sub-agent fails (context overflow, can't fix build, returns garbage):
Both reviewers should cover relevant dimensions from this list. Not every dimension applies to every phase — pick what's relevant.
Code dimensions:
| Dimension | What to check |
|---|---|
| Correctness | Logic errors, protocol mismatches, wrong behavior |
| Security | Credential handling, injection, auth bypass |
| Concurrency | Data races, actor isolation, Sendable compliance |
| Performance | N+1 queries, unnecessary allocations, blocking main thread |
| Error handling | Missing catches, silent failures, crash paths |
| Platform idioms | Native patterns vs fighting the framework |
UI/Design dimensions (when phase touches views or styling):
| Dimension | What to check |
|---|---|
| Visual hierarchy | Focal points, information flow, heading/body/meta distinction |
| Interaction states | Default, hover, active, focus, disabled all present |
| Accessibility | VoiceOver labels, touch targets (44pt min), WCAG AA contrast, dynamic type |
| Responsiveness | Phone/tablet layouts, safe areas, keyboard handling |
| Loading/empty/error | Every async view has all three states, not just the happy path |
| Design token usage | Colors from theme, not hardcoded hex; spacing from scale, not magic numbers |
| Parameter | iOS | Web | Python |
|---|---|---|---|
| Build | xcodebuild ... build | npm run build | python -m py_compile |
| Test | xcodebuild ... test | npm test | pytest |
| Native reviewer | apple-swift-language-expert | design-maestro | — |
| Code reviewer | code-reviewer-guardian | code-reviewer-guardian | code-reviewer-guardian |