Launch and manage ralph-orchestrator planner-builder-reviewer loops for autonomous multi-step implementation. Use this skill whenever the user says "ralph loop", "ralph orchestrate", "ralph run", wants to delegate work to a plan/build/review cycle, mentions phase plans, wants to configure loop iterations (max activations), hat workflows, cost budgets, or guardrails. Also trigger when the user asks to "orchestrate", "delegate to ralph", "launch a loop", "reduce max to N", or references the planner/builder/reviewer pattern. Covers project setup, spec writing, tmux launch, loop monitoring, steering, and ceremony.
Configure and launch ralph-orchestrator. Install: cargo install --git https://github.com/mikeyobrien/ralph-orchestrator ralph-cli
    .kiro/specs/                # specs (committed)
      NN-<name>/
        requirements.md         # input: the spec AND the ralph prompt (-P)
        design.md               # output: archived scratchpad after loop
        progress.txt            # output: timestamped task log
        steering/               # optional: product vision, tech decisions
    .ralph/                     # orchestrator (gitignored runtime state)
      ralph.yml                 # config (core.specs_dir → .kiro/specs)
      hats/greenfield.yml       # hat definitions with ceremony instructions
      agent/scratchpad.md       # hat handoff channel (ephemeral)
      agent/memories.md         # persistent constraints (seed with ceremony rules)
Rules: No root pollution. requirements.md IS the prompt — no separate prompt.md. design.md is an output, not an input. .kiro/specs/ is canonical — migrate legacy locations. Number spec dirs sequentially (01-, 02-, …).

Pre-launch checks:

    ralph loops list                                  # existing loops?
    tmux has-session -t ralph-$NAME 2>/dev/null && echo "running"
    ls .ralph/loop.lock 2>/dev/null                   # stale lock?

If running: monitor, don't start another. If stale lock with no session: rm .ralph/loop.lock.
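These checks can be wrapped in one guard function. A sketch only; it assumes the `ralph-<name>` session naming used above, and the function name is illustrative:

```shell
#!/usr/bin/env bash
# Guard against double-launching a loop: reuse the pre-launch checks above.
# $1 = loop/spec name, $2 = runtime dir (defaults to .ralph).
check_loop() {
  local name="$1" ralph_dir="${2:-.ralph}"
  if tmux has-session -t "ralph-$name" 2>/dev/null; then
    echo "running"        # monitor it, don't start another
  elif [ -e "$ralph_dir/loop.lock" ]; then
    echo "stale-lock"     # lock left behind with no live session
    rm "$ralph_dir/loop.lock"
  else
    echo "clear"          # safe to launch
  fi
}
```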
    work.start → [planner] → plan.ready → [builder] → build.done → [reviewer]
                     ↑                        ↑                        │
                     │                        └── work.resume ─────────┤ (code bug)
                     └──────────────────── replan ─────────────────────┘ (plan wrong)
                                                                       │
                                                       LOOP_COMPLETE (verified)
Hats communicate via scratchpad (task breakdown + acceptance criteria), memories (persistent constraints), and event payloads. Sonnet MUST NOT be reviewer — it ignores event constraints and causes stale loops. Use Opus for planner and reviewer; Opus or Sonnet for builder.
    SPEC="<spec-name>"
    NAME="ralph-$SPEC"        # tmux session name
    PROJECT_PATH="$(pwd)"     # repo root
    tmux new-session -d -s "$NAME" -c "$PROJECT_PATH"
    tmux send-keys -t "$NAME:0" "ralph run -q -c .ralph/ralph.yml -H .ralph/hats/greenfield.yml -P .kiro/specs/$SPEC/requirements.md > .ralph/run.log 2>&1" Enter
    tmux split-window -v -t "$NAME:0" -c "$PROJECT_PATH" -l 80%
    tmux send-keys -t "$NAME:0.1" "bash .ralph/monitor.sh" Enter
    tmux resize-pane -t "$NAME:0.0" -y 3
Monitor: .ralph/monitor.sh (copy from scripts/monitor.sh in this skill) or ralph tui (needs a real TTY; attach directly). Set up CronCreate every 3 minutes to poll events + commits, and auto-cancel the job when the loop terminates.
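A minimal stand-in for the monitor script might look like this. A sketch, assuming the launch command above redirected output to .ralph/run.log and using the `ralph events` call from the actions table:

```shell
#!/usr/bin/env bash
# Minimal monitor sketch: one poll = recent events plus the run-log tail;
# repeat on a 3-minute cadence until a stop is requested.
poll_once() {
  local log="${1:-.ralph/run.log}"
  echo "== events =="
  ralph events --last 10 2>/dev/null || echo "(ralph not reachable)"
  echo "== log tail =="
  tail -n 20 "$log" 2>/dev/null || echo "(no log yet)"
}

monitor() {
  while :; do
    poll_once "$1"
    [ -e .ralph/stop-requested ] && break  # auto-cancel when loop stops
    sleep 180                              # 3-minute cadence
  done
}
```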
For Codex, periodic Ralph monitoring can be implemented by reusing a watcher sub-agent as a timer-like harness. Recommended Codex watch cycle: have the watcher poll events and recent commits, then classify the loop as progressing, stuck, failed, or needs-input. Use this when Codex is the meta-orchestrator and no built-in timer tool is available. If another frontend provides a native timer tool, prefer that simpler primitive over a sub-agent watch loop.
| Action | Command |
|---|---|
| Loop status | ralph loops list |
| Events | ralph events --last 10 |
| View plan | cat .ralph/agent/scratchpad.md |
| Stop | touch .ralph/stop-requested |
| Resume | ralph run --continue |
| Steer | ralph wave emit human.guidance --payloads "msg" |
| Validate | ralph preflight -c .ralph/ralph.yml -H .ralph/hats/greenfield.yml |
| Dry run | ralph run --dry-run -c .ralph/ralph.yml -H .ralph/hats/greenfield.yml -P .kiro/specs/<name>/requirements.md |
requirements.md is human-authored and variable-depth. The planner fills gaps; it never overrides what the human specified.
Thin (domain is well-understood):

    Rework Docker/Splunk infra from the devcontainer template.
    Splunk 10.2.1+, Python 3.12. Ports: 18000/18088/18089.
    Accept when: `docker compose config` validates and all 7 checks pass.
Detailed (high stakes, specific opinions):

    ### Requirement 1: Strategy wiring
    1. WHEN scene JSON has `"strategy": "rule-based"`, THE Bot SHALL use RuleBasedStrategy
    2. WHEN strategy is omitted, THE Bot SHALL default to RuleBasedStrategy
    3. WHEN unknown strategy specified, THE ScenePlayer SHALL raise ScenePlayerError
Include: code deliverables, infrastructure tasks, handoff artifacts, verification requirements. Claude can help draft — but should not author unilaterally.
    human writes requirements.md → ralph run -P requirements.md
            ↓
    planner → builder → reviewer loop
            ↓
    ceremony: archive scratchpad → design.md
              write progress.txt
            ↓
    update PROJECT_PLAN.md → next spec
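The ceremony step can be scripted. A minimal sketch assuming the layout above; the function name is illustrative:

```shell
#!/usr/bin/env bash
# Post-loop ceremony sketch: archive the scratchpad as design.md and
# append a timestamped line to progress.txt (paths per the spec layout).
ceremony() {
  local spec_dir=".kiro/specs/$1"
  cp .ralph/agent/scratchpad.md "$spec_dir/design.md"
  printf '%s LOOP_COMPLETE: %s\n' "$(date -u +%FT%TZ)" "$1" \
    >> "$spec_dir/progress.txt"
}
```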
One spec = one loop. Only write requirements.md before launching. Archive after loop. Migrate legacy spec locations on sight.
Each full plan→build→review cycle costs 3 iterations minimum. Budget accordingly:

    iterations_needed = (num_phases × 3) + retries_buffer
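The formula maps directly to shell arithmetic. A hypothetical helper, with a retry buffer of 3 by default:

```shell
# Iteration budget helper for the formula above: 3 iterations per
# plan→build→review phase, plus a retry buffer (default 3).
budget() {
  local phases="$1" retries="${2:-3}"
  echo $(( phases * 3 + retries ))
}

# budget 1   # → 6  (matches the single-phase row below)
# budget 2   # → 9  (two phases, default buffer)
```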
| Spec shape | Tasks | Recommended max_iterations |
|---|---|---|
| Single phase, 1-3 tasks | 1-3 | 6 |
| Two phases, 3-5 tasks | 3-5 | 9-12 |
| Three phases, 5-8 tasks | 5-8 | 12-15 |
| Complex multi-phase | 8+ | 15-20 or split into multiple specs |
Common budget killers: review rejections (work.resume) and replans (phase.next → replanning) — each costs 1 iteration.

Task merging rule: If tasks are sequential dependencies within the same phase (A feeds B feeds C), merge them into one task. Three separate build→review cycles for dependent work burns 9 iterations; one batched cycle burns 3.
Continuation loops: When a loop terminates at max_iterations with remaining tasks, write a new spec referencing prior commits. Seed the scratchpad/memories with what's already done. Budget only for the remaining work.
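Seeding can be as simple as appending a summary block to memories.md. A sketch; the helper name and summary text are illustrative:

```shell
# Continuation-loop seeding sketch: record what the prior loop finished
# so the next loop doesn't redo it. Summary wording is up to the human.
seed_continuation() {
  local summary="$1"
  mkdir -p .ralph/agent
  {
    echo "## Completed by prior loop"
    echo "$summary"
  } >> .ralph/agent/memories.md
}
```

Usage, e.g.: `seed_continuation "Tasks 1-3 landed; see git log for the prior loop's commits"`.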
Create a hat file per spec (e.g. .ralph/hats/05-parity.yml), never edit the
greenfield template in-place. Choose the right hat topology for the work:
work.start → [planner] → plan.ready → [builder] → build.done → [reviewer]
Use when: the plan is uncertain and may need revision based on review findings.
The reviewer can emit replan to change direction. Best for exploratory work
and specs with 4+ tasks.
work.start → [planner+builder] → build.done → [reviewer]
Use when: the task requires deep research that the builder needs in-context. The planner reads the codebase, understands the architecture, and implements in the same session — no lossy scratchpad handoff. Best for: small specs (1-3 tasks) where context is critical. Warning: combining roles on large specs causes the first activation to absorb too much context and stall. Prefer 3-hat with phase-scoped planning for specs with 4+ tasks.
work.start → [planner] → plan.ready → [builder+reviewer]
Use when: tasks have clear pass/fail criteria and the builder should self-verify. Saves an iteration per task by skipping the separate review cycle. The builder runs tests, lint, dryrun before emitting. Best for: well-defined tasks with mechanical acceptance criteria (all tests pass, dryrun clean, files exist).
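That pre-emit self-check can be sketched as a small harness. The harness itself is illustrative; the command list is whatever the spec's acceptance criteria name (tests, lint, dryrun):

```shell
# Builder self-verification sketch for the builder+reviewer hat:
# run every check and only emit build.done when all of them pass.
verify_before_emit() {
  local failures=0
  for check in "$@"; do
    if eval "$check" >/dev/null 2>&1; then
      echo "PASS: $check"
    else
      echo "FAIL: $check"
      failures=$((failures + 1))
    fi
  done
  return "$failures"   # nonzero blocks the build.done emit
}
```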
For specs with many tasks, instruct the planner to plan only the next 2-3
tasks per activation — not the entire spec. This prevents the planner from
spending a full iteration reading the whole codebase upfront. The planner
advances phase by phase as the reviewer emits phase.next.
The reviewer must run verification commands itself. It should never reject
because the build.done payload lacks formatted evidence. If the reviewer
wants to verify coverage, lint, or dryrun — it runs them. It decides based
on its own command output, not on what the builder reported.
Run ralph with -q inside tmux (as in the launch command above). Event loop config:

    event_loop:
      starting_event: "work.start"
      completion_promise: "LOOP_COMPLETE"
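Putting these pieces together, a per-spec hat file might be sketched like this. Only event_loop and its two keys appear in this document; the hats section shape (model, triggers, emits) is an assumption made to illustrate the topology and model guidance above, so check hats/greenfield.yml for the real schema:

```yaml
# Sketch only: event_loop keys are from this doc; the hats section
# field names are assumptions, not confirmed ralph-orchestrator schema.
event_loop:
  starting_event: "work.start"
  completion_promise: "LOOP_COMPLETE"
hats:
  planner:
    model: opus              # Opus for planner (per the model guidance)
    triggers: [work.start, replan]
    emits: [plan.ready]
  builder:
    model: sonnet            # Opus or Sonnet for builder
    triggers: [plan.ready, work.resume]
    emits: [build.done]
  reviewer:
    model: opus              # never Sonnet as reviewer
    triggers: [build.done]
    emits: [work.resume, replan, phase.next]
```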