Execute all plans in a phase with wave-based parallelization
Codex shell compatibility: `gpd` is on PATH; otherwise invoke as `GPD_ACTIVE_RUNTIME=codex uv run gpd ...`.
</codex_runtime_notes>
Orchestrator stays lean: discover plans, analyze dependencies, group into waves, spawn subagents, collect results. Each subagent loads the full execute-plan context and handles its own plan.
Execution scope: Each plan may involve any combination of the work types classified below (derivation, numerical, literature, paper-writing, formalism, analysis, validation).
Context budget: ~15% orchestrator, 100% fresh per subagent. </objective>
<execution_context>
<core_principle> Orchestrator coordinates, not executes. Each subagent loads the full execute-plan context. Orchestrator: discover plans -> analyze deps -> group waves -> spawn agents -> handle checkpoints -> collect results -> validate physics. </core_principle>
<required_reading>
Read STATE.md before any operation to load project context.
For agent selection strategy and verification failure routing, see @./.codex/get-physics-done/references/orchestration/meta-orchestration.md.
</required_reading>
INIT=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local init execute-phase "${PHASE_ARG}")
if [ $? -ne 0 ]; then
echo "ERROR: gpd initialization failed: $INIT"
exit 1
fi
Parse JSON for: executor_model, verifier_model, commit_docs, autonomy, review_cadence, research_mode, parallelization, max_unattended_minutes_per_plan, max_unattended_minutes_per_wave, checkpoint_after_n_tasks, checkpoint_after_first_load_bearing_result, checkpoint_before_downstream_dependent_tasks, verifier_enabled, branching_strategy, branch_name, phase_found, phase_dir, phase_number, phase_name, phase_slug, plans, incomplete_plans, plan_count, incomplete_count, state_exists, roadmap_exists, project_contract, contract_intake, effective_reference_intake, active_reference_context, reference_artifacts_content, selected_protocol_bundle_ids, protocol_bundle_context.
If phase_found is false: Error -- phase directory not found.
If plan_count is 0: Error -- no plans found in phase.
If state_exists is false but .gpd/ exists: Offer reconstruct or continue.
Treat project_contract as the authoritative machine-readable execution contract when present.
Treat effective_reference_intake as the carry-forward anchor ledger for refs, baselines, prior outputs, and unresolved context gaps.
Use active_reference_context and reference_artifacts_content to interpret that ledger, not to replace it with markdown-only guesses.
When parallelization is false, plans within a wave execute sequentially.
Mode-aware behavior:
- autonomy=supervised: Pause for user confirmation before each wave. Show the plan summary and wait for approval.
- autonomy=balanced (default): Execute waves automatically; pause only if errors, ambiguities, or scope-changing decisions arise at a wave boundary.
- autonomy=yolo: Execute all waves without user prompts on clean passes. Do NOT skip required correctness gates, first-result sanity checks, skeptical review stops, or anchor-gated fanout reviews. A clean pass may auto-continue only after the gate is explicitly cleared.
- research_mode=explore: Favor thoroughness — always run verification, expand context budget.
- research_mode=exploit: Favor speed — skip optional research steps, tighten the context budget, but never skip required first-result, skeptical, or pre-fanout review gates.
- research_mode=adaptive: Start with explore-style coverage, then narrow only after prior decisive contract_results, decisive comparison_verdicts, or an explicit approach lock show that the method family is stable. Do NOT narrow just because a wave advanced or one proxy passed.
- review_cadence: Controls when bounded review gates appear; autonomy controls who must approve or inspect those gates. These are separate axes.
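As a minimal sketch of the wave-boundary decision under these modes (HAS_ERROR and SCOPE_CHANGE are hypothetical stand-ins for the error and scope-change conditions, not fields from init):

```shell
# Hedged sketch: who pauses at a wave boundary, by autonomy mode.
AUTONOMY="balanced"
HAS_ERROR=false
SCOPE_CHANGE=false
PAUSE=false
case "$AUTONOMY" in
  supervised) PAUSE=true ;;                  # always confirm before a wave
  balanced)
    if [ "$HAS_ERROR" = true ] || [ "$SCOPE_CHANGE" = true ]; then
      PAUSE=true                             # pause only on trouble
    fi ;;
  yolo) ;;                                   # required gates handled separately
esac
echo "pause=$PAUSE"
```

Required correctness gates are orthogonal to this: they fire regardless of which branch sets PAUSE.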
</step>
"none": Skip, continue on current branch.
"per-phase" or "per-milestone": Use pre-computed branch_name from init:
git checkout -b "$BRANCH_NAME" 2>/dev/null || git checkout "$BRANCH_NAME"
All subsequent commits go to this branch. User handles merging. </step>
PHASE_GOAL=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local roadmap get-phase "${phase_number}" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json get .goal --default "")
PLAN_OBJECTIVES=""
for plan in "$phase_dir"/*-PLAN.md; do
OBJ=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local frontmatter get "$plan" --field objective 2>/dev/null)
PLAN_OBJECTIVES="${PLAN_OBJECTIVES} ${OBJ}"
done
PHASE_TEXT="${PHASE_GOAL} ${PLAN_OBJECTIVES}"
Classify using keyword matching (a phase may have multiple classes):
PHASE_CLASSES=()
echo "$PHASE_TEXT" | grep -qiE "derive|prove|show that|analytical|closed.form|exact result" && PHASE_CLASSES+=("derivation")
echo "$PHASE_TEXT" | grep -qiE "simulat|compute|discretiz|grid|convergence|benchmark|finite.element|Monte Carlo|numerical" && PHASE_CLASSES+=("numerical")
echo "$PHASE_TEXT" | grep -qiE "survey|review|compare approaches|what is known|prior work|literature" && PHASE_CLASSES+=("literature")
echo "$PHASE_TEXT" | grep -qiE "write paper|draft|manuscript|submit|LaTeX" && PHASE_CLASSES+=("paper-writing")
echo "$PHASE_TEXT" | grep -qiE "define|set up framework|establish conventions|Lagrangian|Hamiltonian|action" && PHASE_CLASSES+=("formalism")
echo "$PHASE_TEXT" | grep -qiE "analyz|compare|interpret|extract|fit|scaling" && PHASE_CLASSES+=("analysis")
echo "$PHASE_TEXT" | grep -qiE "verify|cross.check|reproduce|validate|test against" && PHASE_CLASSES+=("validation")
[ ${#PHASE_CLASSES[@]} -eq 0 ] && PHASE_CLASSES+=("mixed")
Log the classification: "Phase ${phase_number} classified as: ${PHASE_CLASSES[*]}"
Use classification for:
- Agent selection (agent-infrastructure.md, Meta-Orchestration Intelligence > Agent Selection by Phase Type)
- Context budget allocation (agent-infrastructure.md, Meta-Orchestration Intelligence > Context Budget Allocation)
- Verification scope (numerical phases emphasize 5.5 convergence and 5.14 statistics; validation phases run the full relevant registry)
- Execution adaptation (adapt_to_computation_type below)
</step>
# Defaults
CONVENTION_LOCK_REQUIRED=false
PRE_EXECUTION_AGENTS=()
INTER_WAVE_CHECKS=("convention" "dimensional")
EXECUTOR_CONTEXT_HINT="standard"
WAVE_TIMEOUT_FACTOR=1.0
FORCE_SEQUENTIAL=false
YOLO_RESTRICTIONS=()
Per-class overrides (applied cumulatively for multi-class phases):
| Class | Parameter Overrides |
|---|---|
| derivation | CONVENTION_LOCK_REQUIRED=true — refuse to start if conventions unlocked. INTER_WAVE_CHECKS+=("identity_scan") — check for unverified identities between waves. EXECUTOR_CONTEXT_HINT="derivation-heavy" — hint executors to allocate 70% of context to step-by-step work. WAVE_TIMEOUT_FACTOR=1.5 — derivations run longer. YOLO_RESTRICTIONS+=("no_skip_verification") — even in yolo mode, do NOT skip verification for derivation phases (sign errors cost more than the verification). |
| numerical | INTER_WAVE_CHECKS+=("convergence_spot_check") — between waves, scan SUMMARY for convergence metrics and flag regressions. EXECUTOR_CONTEXT_HINT="code-heavy" — hint executors to reserve context for code output and numerical tables. PRE_EXECUTION_AGENTS+=("experiment-designer") — if experiment-designer is enabled, spawn before wave 1 to validate parameter ranges. |
| literature | FORCE_SEQUENTIAL=true — literature plans build on each other's findings; parallel risks redundant searches. EXECUTOR_CONTEXT_HINT="reading-heavy" — hint executors to budget for large literature ingestion. INTER_WAVE_CHECKS=("convention") — skip dimensional checks (no equations). |
| paper-writing | PRE_EXECUTION_AGENTS+=("notation-coordinator") — ensure notation glossary is current before any section drafting. INTER_WAVE_CHECKS+=("latex_compile") — compile after each wave to catch LaTeX errors early. EXECUTOR_CONTEXT_HINT="prose-heavy" — hint executors to balance equation density with exposition. |
| formalism | CONVENTION_LOCK_REQUIRED=true. PRE_EXECUTION_AGENTS+=("notation-coordinator") — conventions must be established before framework setup. INTER_WAVE_CHECKS+=("identity_scan"). |
| analysis | INTER_WAVE_CHECKS+=("plausibility_scan") — between waves, scan results for physically implausible values (NaN, sign changes, order-of-magnitude jumps). |
| validation | YOLO_RESTRICTIONS+=("no_skip_verification" "no_skip_inter_wave") — validation phases must run all checks regardless of autonomy mode. INTER_WAVE_CHECKS+=("identity_scan" "convergence_spot_check" "plausibility_scan") — run all inter-wave checks. |
Apply overrides:
for CLASS in "${PHASE_CLASSES[@]}"; do
case "$CLASS" in
derivation)
CONVENTION_LOCK_REQUIRED=true
INTER_WAVE_CHECKS+=("identity_scan")
EXECUTOR_CONTEXT_HINT="derivation-heavy"
WAVE_TIMEOUT_FACTOR=1.5
YOLO_RESTRICTIONS+=("no_skip_verification")
;;
numerical)
INTER_WAVE_CHECKS+=("convergence_spot_check")
EXECUTOR_CONTEXT_HINT="code-heavy"
PRE_EXECUTION_AGENTS+=("experiment-designer")
;;
literature)
FORCE_SEQUENTIAL=true
EXECUTOR_CONTEXT_HINT="reading-heavy"
INTER_WAVE_CHECKS=("convention")
;;
paper-writing)
PRE_EXECUTION_AGENTS+=("notation-coordinator")
INTER_WAVE_CHECKS+=("latex_compile")
EXECUTOR_CONTEXT_HINT="prose-heavy"
;;
formalism)
CONVENTION_LOCK_REQUIRED=true
PRE_EXECUTION_AGENTS+=("notation-coordinator")
INTER_WAVE_CHECKS+=("identity_scan")
;;
analysis)
INTER_WAVE_CHECKS+=("plausibility_scan")
;;
validation)
YOLO_RESTRICTIONS+=("no_skip_verification" "no_skip_inter_wave")
INTER_WAVE_CHECKS+=("identity_scan" "convergence_spot_check" "plausibility_scan")
;;
esac
done
echo "Execution adaptation: convention_lock=${CONVENTION_LOCK_REQUIRED}, pre_agents=[${PRE_EXECUTION_AGENTS[*]}], inter_wave=[${INTER_WAVE_CHECKS[*]}], context_hint=${EXECUTOR_CONTEXT_HINT}, timeout_factor=${WAVE_TIMEOUT_FACTOR}"
Convention lock enforcement:
If CONVENTION_LOCK_REQUIRED=true:
CONV_STATUS=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local --raw convention check)
if [ "$CONV_STATUS" != "locked" ] && [ "$CONV_STATUS" != "complete" ]; then
echo "ERROR: Phase class (${PHASE_CLASSES[*]}) requires locked conventions before execution."
echo "Convention status: ${CONV_STATUS}"
echo ""
echo "Fix with one of:"
echo " gpd convention set"
echo " /gpd-validate-conventions"
echo ""
echo "HALTING — convention errors in derivation/formalism phases compound across every step."
exit 1
fi
This is a hard gate. When CONVENTION_LOCK_REQUIRED=true and conventions are not locked, execution MUST NOT proceed. Do not skip this gate in any autonomy mode (including yolo). Convention errors are irreversible — they invalidate all downstream results.
Pre-execution agent spawning:
If PRE_EXECUTION_AGENTS is non-empty, spawn them sequentially before wave 1:
for AGENT_TYPE in "${PRE_EXECUTION_AGENTS[@]}"; do
case "$AGENT_TYPE" in
notation-coordinator)
AGENT_MODEL=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local resolve-model gpd-notation-coordinator)
# Spawn notation-coordinator to verify/establish conventions
# task(subagent_type="gpd-notation-coordinator", model="{AGENT_MODEL}", readonly=false, ...)
;;
experiment-designer)
AGENT_MODEL=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local resolve-model gpd-experiment-designer)
# Spawn experiment-designer to validate parameter ranges
# task(subagent_type="gpd-experiment-designer", model="{AGENT_MODEL}", readonly=false, ...)
;;
esac
done
Force-sequential override:
If FORCE_SEQUENTIAL=true, override PARALLELIZATION to false for this phase regardless of config setting. Log: "Phase class (${PHASE_CLASSES[*]}) forces sequential execution within waves."
YOLO mode restrictions:
If autonomy=yolo and YOLO_RESTRICTIONS is non-empty, restrict yolo behavior:
- no_skip_verification: Do not skip the verification step even in yolo mode. Derivation and validation phases produce irreversible errors that cost more to debug than to verify.
- no_skip_inter_wave: Do not skip inter-wave gates even in yolo mode. Convention drift between waves in these phase types creates compound errors.

Log any restrictions: "YOLO mode restricted for phase class (${PHASE_CLASSES[*]}): ${YOLO_RESTRICTIONS[*]}"
Context hint propagation:
Include EXECUTOR_CONTEXT_HINT in the executor spawn prompt so subagents can self-regulate:
<context_hint>{EXECUTOR_CONTEXT_HINT}</context_hint>
Hint meanings:
- standard: Default allocation — balanced between derivation, code, and prose.
- derivation-heavy: Reserve 70% of context for step-by-step mathematical work. Minimize prose. Use \therefore, not paragraphs.
- code-heavy: Reserve space for code blocks, numerical output tables, and convergence plots. Summarize analytical steps briefly.
- reading-heavy: Reserve space for literature citations and comparisons. Budget for reading 5-10 paper summaries.
- prose-heavy: Balance equations with exposition. Every equation needs 2-3 sentences of context.
</step>
Report: "Found {plan_count} plans in {phase_dir} ({incomplete_count} incomplete)" </step>
VALIDATION_FAILED=false
# Validate each plan's frontmatter and structure
for plan in "$phase_dir"/*-PLAN.md; do
if ! gpd verify plan "$plan"; then
echo "ERROR: plan-structure validation failed for $(basename "$plan")"
VALIDATION_FAILED=true
fi
done
# Validate wave dependencies
if ! gpd phase validate-waves "$phase_number"; then
echo "ERROR: wave dependency validation failed"
VALIDATION_FAILED=true
fi
# Check cross-references in plans
for plan in "$phase_dir"/*-PLAN.md; do
if ! gpd verify references "$plan"; then
echo "ERROR: reference validation failed for $(basename "$plan")"
VALIDATION_FAILED=true
fi
done
if [ "$VALIDATION_FAILED" = true ]; then
echo "Structural validation failed. Fix the issues above before proceeding."
fi
If VALIDATION_FAILED is true: Present all collected errors to the user. Do not proceed with execution until structural issues are resolved.
</step>
PLAN_INDEX=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local phase index "${phase_number}")
Parse JSON for: phase, plans[] (each with id, wave, interactive, objective, files_modified, task_count, has_summary), waves (map of wave number -> plan IDs), incomplete, has_checkpoints.
Filtering: Skip plans where has_summary: true. If --gaps-only: also skip non-gap_closure plans. If all filtered: "No matching incomplete plans" -> exit.
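A minimal sketch of the has_summary filter, using only file existence (the real workflow reads the plan index JSON from `gpd phase index`; the temp-directory fixture below is purely illustrative):

```shell
# Hedged sketch: a plan counts as complete when its SUMMARY.md exists
# next to the plan file. Fixture paths are hypothetical.
phase_dir=$(mktemp -d)
touch "$phase_dir/01-01-PLAN.md" "$phase_dir/01-01-SUMMARY.md"   # complete
touch "$phase_dir/01-02-PLAN.md"                                 # incomplete
REMAINING=()
for plan in "$phase_dir"/*-PLAN.md; do
  summary="${plan%-PLAN.md}-SUMMARY.md"
  [ -f "$summary" ] && continue      # has_summary: true -> skip
  REMAINING+=("$(basename "$plan" -PLAN.md)")
done
if [ ${#REMAINING[@]} -eq 0 ]; then
  echo "No matching incomplete plans"
else
  echo "Incomplete: ${REMAINING[*]}"
fi
```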
Intra-wave dependency validation: Verify that no plan's depends_on references another plan in the SAME wave (a dependency that cannot be satisfied when both plans run concurrently):
INTRA_WAVE_CONFLICT=false
# For each wave, check that no plan depends on another plan in the same wave
for WAVE_NUM in $(echo "$PLAN_INDEX" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json keys .waves); do
WAVE_PLANS=$(echo "$PLAN_INDEX" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json list ".waves[\"$WAVE_NUM\"]")
for PLAN_ID in $WAVE_PLANS; do
DEPS=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local frontmatter get \
"${phase_dir}/${PLAN_ID}-PLAN.md" --field depends_on 2>/dev/null)
for DEP in $(echo "$DEPS" | tr ',' ' '); do
if echo "$WAVE_PLANS" | grep -q "^${DEP}$"; then
echo "ERROR: Plan ${PLAN_ID} depends on ${DEP}, but both are in wave ${WAVE_NUM}"
INTRA_WAVE_CONFLICT=true
fi
done
done
done
Parallel file conflict detection: For waves with 2+ plans, check files_modified frontmatter for overlaps:
FILE_CONFLICT=false
# For each wave with 2+ plans, check for file modification overlaps
for WAVE_NUM in $(echo "$PLAN_INDEX" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json keys .waves); do
WAVE_PLANS=($(echo "$PLAN_INDEX" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json list ".waves[\"$WAVE_NUM\"]"))
if [ ${#WAVE_PLANS[@]} -gt 1 ]; then
ALL_FILES=()
for PLAN_ID in "${WAVE_PLANS[@]}"; do
FILES=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local frontmatter get \
"${phase_dir}/${PLAN_ID}-PLAN.md" --field files_modified 2>/dev/null)
for F in $(echo "$FILES" | tr ',' ' '); do
if [[ " ${ALL_FILES[*]} " =~ " ${F} " ]]; then
echo "WARNING: File '${F}' modified by multiple plans in wave ${WAVE_NUM}"
FILE_CONFLICT=true
fi
ALL_FILES+=("${F}")
done
done
fi
done
If INTRA_WAVE_CONFLICT is true: STOP — present the dependency issue and do not proceed.
If FILE_CONFLICT is true: WARN — present the overlap and offer to serialize the conflicting plans within the wave.
Report:
## Execution Plan
**Phase {X}: {Name}** -- {total_plans} plans across {wave_count} waves
| Wave | Plans | What it builds |
|------|-------|----------------|
| 1 | 01-01, 01-02 | {from plan objectives, 3-8 words} |
| 2 | 01-03 | ... |
REVIEW_CADENCE=$(echo "$INIT" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json get .review_cadence --default adaptive)
MAX_UNATTENDED_MINUTES_PER_PLAN=$(echo "$INIT" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json get .max_unattended_minutes_per_plan --default 45)
MAX_UNATTENDED_MINUTES_PER_WAVE=$(echo "$INIT" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json get .max_unattended_minutes_per_wave --default 90)
CHECKPOINT_AFTER_N_TASKS=$(echo "$INIT" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json get .checkpoint_after_n_tasks --default 3)
CHECKPOINT_AFTER_FIRST_RESULT=$(echo "$INIT" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json get .checkpoint_after_first_load_bearing_result --default true)
CHECKPOINT_BEFORE_DOWNSTREAM=$(echo "$INIT" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json get .checkpoint_before_downstream_dependent_tasks --default true)
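One plausible way to combine these caps with the class-derived WAVE_TIMEOUT_FACTOR from adapt_to_computation_type (the multiplication is an assumption for illustration, not a documented formula):

```shell
# Assumed application of WAVE_TIMEOUT_FACTOR: scale the configured wave cap.
MAX_UNATTENDED_MINUTES_PER_WAVE=90
WAVE_TIMEOUT_FACTOR=1.5            # e.g. derivation phases run longer
EFFECTIVE_WAVE_MINUTES=$(awk -v m="$MAX_UNATTENDED_MINUTES_PER_WAVE" \
  -v f="$WAVE_TIMEOUT_FACTOR" 'BEGIN { printf "%d", m * f }')
echo "$EFFECTIVE_WAVE_MINUTES"     # 135
```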
Core invariant: autonomy decides who gets interrupted. review_cadence decides when the system must stop, inspect, or re-question. Even in yolo, required first-result and pre-fanout gates still run; the difference is that a clean pass can auto-continue.
These gates are task-level safety rails, not line-by-line interruptions. Even in supervised, checkpoint after each plan task or required gate, not after every algebraic micro-step.
For each wave, classify whether downstream fanout is risky:
A wave counts as risky when any of its plans:
- has task_count >= CHECKPOINT_AFTER_N_TASKS, has no authored checkpoints, or is likely to exceed MAX_UNATTENDED_MINUTES_PER_PLAN
- belongs to the derivation, formalism, numerical, or validation phase classes

When a wave is risky:
- FIRST_RESULT_GATE_REQUIRED=true
- PRE_FANOUT_REVIEW_REQUIRED=true
- SEGMENT_TASK_CAP=${CHECKPOINT_AFTER_N_TASKS}

When a wave is not risky: leave these gates unset and let the wave run under the normal cadence rules.
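The risky-wave test above can be sketched as follows (TASK_COUNT and PHASE_CLASSES are sample values; in the workflow they come from the plan index and the classification step):

```shell
# Hedged sketch of risky-wave classification.
TASK_COUNT=5
CHECKPOINT_AFTER_N_TASKS=3
PHASE_CLASSES=("numerical")
RISKY=false
[ "$TASK_COUNT" -ge "$CHECKPOINT_AFTER_N_TASKS" ] && RISKY=true
case " ${PHASE_CLASSES[*]} " in
  *" derivation "*|*" formalism "*|*" numerical "*|*" validation "*) RISKY=true ;;
esac
if [ "$RISKY" = true ]; then
  FIRST_RESULT_GATE_REQUIRED=true
  PRE_FANOUT_REVIEW_REQUIRED=true
  SEGMENT_TASK_CAP=$CHECKPOINT_AFTER_N_TASKS
fi
echo "risky=$RISKY cap=${SEGMENT_TASK_CAP:-unset}"
```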
Skeptical re-questioning rule: if the first material result only validates a proxy, internal consistency story, or supporting artifact while decisive anchors, benchmark references, or contract-backed acceptance tests remain unresolved, stop and explicitly re-question the framing before allowing downstream fanout. Record the skeptical re-questioning fields (summary, weakest unchecked anchor, disconfirming observation) in the execution gate event.
For each wave:
Convention lock check (before parallel execution):
Before launching parallel plans, verify convention consistency:
/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local convention check
If the check reports unlocked or inconsistent conventions, fix with gpd convention set before launching the wave.

Pre-flight convention check for parallel waves: Before spawning wave executors in parallel, verify all plans in the wave reference the same convention_lock values. For each plan in the wave, extract any convention references (metric signature, Fourier convention, unit system) and cross-compare. If any plan's conventions differ from the locked values, resolve the discrepancy before spawning. This prevents the most insidious class of parallel-execution bugs: two agents computing with different sign conventions whose results are later combined.
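The cross-compare can be sketched as follows (the field and values are illustrative, e.g. metric signature; in practice they come from each plan's frontmatter and the convention lock):

```shell
# Hedged sketch: every plan in the wave must agree with the locked value.
LOCKED_SIGNATURE="mostly-plus"
PLAN_SIGNATURES=("mostly-plus" "mostly-minus")   # one entry per wave plan
CONV_MISMATCH=false
for SIG in "${PLAN_SIGNATURES[@]}"; do
  if [ "$SIG" != "$LOCKED_SIGNATURE" ]; then
    echo "Convention mismatch: plan declares '$SIG', lock says '$LOCKED_SIGNATURE'"
    CONV_MISMATCH=true
  fi
done
# If CONV_MISMATCH=true, resolve before spawning any executor in this wave.
echo "mismatch=$CONV_MISMATCH"
```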
Create wave-level checkpoint before any plan in the wave starts:
WAVE_CHECKPOINT="gpd-checkpoint/phase-${phase_number}-wave-${WAVE_NUM}-$(date +%s)-$$"
git tag "${WAVE_CHECKPOINT}"
Store the tag for wave-level recovery.
Describe what's being done (BEFORE spawning):
Read each plan's <objective>. Extract what's being computed/derived and why.
---
## Wave {N}
**{Plan ID}: {Plan Name}**
{2-3 sentences: what this derives/computes/simulates, mathematical approach, why it matters for the overall research}
Spawning {count} agent(s)...
---
If this wave is marked risky fanout: run probe_then_fanout instead of blind full-wave scaleout.
Spawn executor agents:
Pass paths only -- executors read files themselves with their fresh 200k context. This keeps orchestrator context lean (~10-15%).
Runtime delegation: Spawn a subagent for the task below. Adapt the `task()` call to your runtime's agent spawning mechanism. If `model` resolves to `null` or an empty string, omit it so the runtime uses its default model. Always pass `readonly=false` for file-producing agents. If subagent spawning is unavailable, execute these steps sequentially in the main context.
task(
subagent_type="gpd-executor",
model="{executor_model}",
readonly=false,
prompt="First, read ./.codex/agents/gpd-executor.md for your role and instructions.
<objective>
Execute plan {plan_number} of phase {phase_number}-{phase_name}.
Commit each task atomically. Create SUMMARY.md.
Return state updates (position, decisions, metrics) in your response -- do NOT write STATE.md directly.
</objective>
<context_hint>{EXECUTOR_CONTEXT_HINT}</context_hint>
<phase_class>{PHASE_CLASSES}</phase_class>
<protocol_bundles>{selected_protocol_bundle_ids}</protocol_bundles>
<protocol_bundle_context>{protocol_bundle_context}</protocol_bundle_context>
<review_cadence>{REVIEW_CADENCE}</review_cadence>
<max_unattended_minutes_per_plan>{MAX_UNATTENDED_MINUTES_PER_PLAN}</max_unattended_minutes_per_plan>
<max_unattended_minutes_per_wave>{MAX_UNATTENDED_MINUTES_PER_WAVE}</max_unattended_minutes_per_wave>
<segment_task_cap>{SEGMENT_TASK_CAP}</segment_task_cap>
<first_result_gate>{FIRST_RESULT_GATE_REQUIRED}</first_result_gate>
<checkpoint_before_downstream>{CHECKPOINT_BEFORE_DOWNSTREAM}</checkpoint_before_downstream>
<bounded_execution>{true}</bounded_execution>
<files_to_read>
Read these files at execution start using the read_file tool:
- Workflow: ./.codex/get-physics-done/workflows/execute-plan.md
- Summary template: ./.codex/get-physics-done/templates/summary.md
- Checkpoints ref: ./.codex/get-physics-done/references/orchestration/checkpoints.md
- Validation ref: ./.codex/get-physics-done/references/verification/core/verification-core.md (+ domain-specific verification file)
- Plan: {phase_dir}/{plan_file}
- State: .gpd/STATE.md
- Config: .gpd/config.json (if exists)
</files_to_read>
<success_criteria>
- [ ] All tasks executed with mathematical rigor
- [ ] Each task committed individually
- [ ] Dimensional consistency verified at each step
- [ ] Limiting cases checked where specified in plan
- [ ] SUMMARY.md created in plan directory
- [ ] State updates returned (NOT written to STATE.md directly)
</success_criteria>
"
)
Wait for all agents in wave to complete.
Progress feedback during wave execution: As each plan completes (or fails), immediately report to the user:
[Phase {N}, Wave {W}] Plan {plan_id} complete ({completed}/{total} in wave)
Result: {one-line summary from SUMMARY.md or failure reason}
This ensures the user sees progress even when waves have multiple parallel plans. Do not wait for the entire wave to finish before showing any output.
If any executor agent fails to spawn or returns an error: Check if the agent committed any work (git log --oneline -3). If commits exist, the agent may have completed but failed to report — spot-check output files and proceed. If no work was done, record the plan as failed for this wave. After all other agents complete, report failed plans and offer: 1) Retry failed plans in a new wave, 2) Execute failed plans in the main context, 3) Skip failed plans and continue. Do not abort the entire phase for individual plan failures.
Report completion -- spot-check claims first:
For each SUMMARY.md:
Verify first 2 files from key-files.created exist on disk
Check git log --oneline --grep="{phase}-{plan}" returns >=1 commit
Check for ## Self-Check: FAILED marker
Check for ## Validation: FAILED marker (physics-specific)
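A minimal sketch of the FAILED-marker checks (a temp-file fixture stands in for a real SUMMARY.md; the git-log and key-files checks are omitted here since they need a repository):

```shell
# Hedged sketch: scan a SUMMARY.md for self-check / validation failure markers.
SUMMARY_FILE=$(mktemp)
cat > "$SUMMARY_FILE" <<'EOF'
## Self-Check: PASSED
## Validation: PASSED
EOF
SPOT_FAIL=false
grep -q '^## Self-Check: FAILED' "$SUMMARY_FILE" && SPOT_FAIL=true
grep -q '^## Validation: FAILED' "$SUMMARY_FILE" && SPOT_FAIL=true
echo "spot_fail=$SPOT_FAIL"
```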
Validate the gpd_return envelope:
RETURN_CHECK=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local --raw validate-return "${SUMMARY_FILE}")
if [ "$RETURN_CHECK" != "passed" ]; then
echo "WARNING: validate-return failed for $(basename "$SUMMARY_FILE")"
# Mark plan as NEEDS_REVIEW but continue — missing envelope is not fatal
fi
If ANY spot-check fails: report which plan failed, route to wave_failure_handling -- do NOT silently continue.
IMPORTANT: Executor subagents MUST NOT write STATE.md directly. Return state updates (position, decisions, metrics) in the structured return envelope. The orchestrator applies them sequentially after each agent completes. This prevents parallel write conflicts where multiple agents overwrite each other's STATE.md changes.
After each plan completes successfully (not just after each wave), the orchestrator runs:
- gpd state advance immediately
- gpd state record-metric for the completed plan

If pass:
---
## Wave {N} Complete
**{Plan ID}: {Plan Name}**
{What was derived/computed -- from SUMMARY.md}
{Notable deviations or unexpected results, if any}
{Limiting cases verified: list}
{If more waves: what this enables for next wave}
---
Handle failures -- see wave_failure_handling below.
Execute checkpoint plans between waves -- see <checkpoint_handling>.
Before unlocking downstream dependent waves, confirm that risky-wave plans passed the first meaningful review point:
If this gate fails: STOP — do not let wrong early assumptions scale out.
Machine-state requirement for risky fanout gates: when this review point pauses execution, record it as live execution state, not only prose. Emit an execution gate event with:
- checkpoint_reason: pre_fanout
- pre_fanout_review_pending: true
- downstream_locked: true
- last_result_label or last_artifact_path for the first load-bearing output being reviewed
- skeptical_requestioning_required: true when the first result still looks proxy-only, anchor-thin, or otherwise short of the decisive evidence the contract still owes
- skeptical_requestioning_summary, weakest_unchecked_anchor, and disconfirming_observation whenever skeptical re-questioning is required

If the runtime or agent only emits a fanout-lock event, normalize it into the same live review stop: treat the lock as checkpoint_reason=pre_fanout, mark waiting_for_review=true, and keep downstream locked until the review is explicitly cleared.
Gate clears are reason-scoped: clearing first_result must not erase pre_fanout or skeptical review flags, and skeptical re-questioning should be cleared explicitly when it is resolved.
For pre_fanout, the matching gate-clear and fanout unlock are separate transitions: the clear records the review outcome, the unlock releases downstream work. Keep the segment live on status, notify, and resume surfaces until both have been observed. Do not silently continue on "looks fine" prose alone.
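Reason-scoped clearing can be sketched with a per-reason state map (the bash associative array and the reason names below mirror the gate reasons in this section; the helper function is illustrative):

```shell
# Hedged sketch: clearing one gate reason must leave the others pending.
declare -A GATE_STATE=(
  [first_result]=pending
  [pre_fanout]=pending
  [skeptical]=pending
)
clear_gate() { GATE_STATE[$1]=cleared; }   # clears exactly one reason
clear_gate first_result
echo "first_result=${GATE_STATE[first_result]} pre_fanout=${GATE_STATE[pre_fanout]}"
```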
Inter-wave verification gate (if more waves remain):
Before spawning the next wave, run lightweight verification on the just-completed wave's outputs. This catches errors cheaply before they propagate to downstream waves.
Determine if gate is enabled from init/context fields only:
- review_cadence == dense: enable inter-wave verification.
- review_cadence == adaptive: enable it when the completed wave established or challenged a decisive evidence path, introduced a new baseline/estimator that later waves depend on, or left any skeptical or pre-fanout state unresolved.
- review_cadence == sparse: skip the routine gate unless the just-completed wave triggered a failed sanity check, anchor gap, or pre-fanout dependency warning.

If enabled:
First, collect the SUMMARY.md files produced by the just-completed wave:
# Collect SUMMARY files from the plans that executed in the current wave
wave_summaries=()
for PLAN_ID in $(echo "$PLAN_INDEX" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json list ".waves[\"$WAVE_NUM\"]"); do
SUMMARY_PATH="${phase_dir}/${PLAN_ID}-SUMMARY.md"
[ -f "$SUMMARY_PATH" ] && wave_summaries+=("$SUMMARY_PATH")
done
Run lightweight checks on the wave's SUMMARY.md outputs:
a. Convention consistency — verify convention lock hasn't drifted:
CONV_CHECK=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local --raw convention check)
if [ "$CONV_CHECK" = "incomplete" ]; then
echo "WARNING: Convention lock has unlocked fields"
fi
b. Dimensional spot-check — scan the wave's SUMMARY.md files for key results and verify dimensional consistency:
For each SUMMARY.md produced in the just-completed wave, extract key equations (from the key_results or equations frontmatter fields) and check that their stated dimensions are consistent.
This is a lightweight scan (~2-5k tokens), not a full dimensional analysis. It checks the SUMMARY outputs, not the derivation internals.
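Pulling the cadence rules above together, the gate-enable decision might be sketched as follows (DECISIVE_PATH_TOUCHED and UNRESOLVED_GATE are hypothetical signals derived from the completed wave, not init fields):

```shell
# Hedged sketch: should the inter-wave verification gate run?
REVIEW_CADENCE="adaptive"
DECISIVE_PATH_TOUCHED=true
UNRESOLVED_GATE=false
GATE_ENABLED=false
case "$REVIEW_CADENCE" in
  dense) GATE_ENABLED=true ;;
  adaptive)
    if [ "$DECISIVE_PATH_TOUCHED" = true ] || [ "$UNRESOLVED_GATE" = true ]; then
      GATE_ENABLED=true
    fi ;;
  sparse) ;;   # only on failed sanity check / anchor gap / fanout warning
esac
echo "gate_enabled=$GATE_ENABLED"
```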
Inter-wave transition display:
Before spawning the next wave, display a physics-meaningful progress update that connects what was just computed to what comes next:
---
Wave {N} -> Wave {N+1} transition
Completed: {brief physics summary of wave N results -- e.g., "Exact diagonalization of 2D Hubbard model for N=4,8,12 sites"}
Enables: {what wave N+1 will use from these results -- e.g., "Finite-size scaling analysis using the energy spectra from Wave 1"}
Starting: {brief description of wave N+1 plans -- e.g., "Extracting critical exponents via data collapse (plans 03, 04)"}
---
Extract the "Completed" summary from the wave N completion report (step 6 above). Extract "Enables" and "Starting" from the wave N+1 plan objectives. Keep each line to one sentence.
1. Identify the failure and its downstream impact:
# Collect all plans from waves AFTER the current wave
LATER_PLANS=()
for LATER_WAVE in $(echo "$PLAN_INDEX" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json keys .waves | awk -v w="$WAVE_NUM" '$1 > w'); do
for P in $(echo "$PLAN_INDEX" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json list ".waves[\"$LATER_WAVE\"]"); do
LATER_PLANS+=("$P")
done
done
# Which of those later plans depend on the failed plan?
DEPENDENT_PLANS=()
for LATER_PLAN in "${LATER_PLANS[@]}"; do
DEPS=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local frontmatter get \
"${phase_dir}/${LATER_PLAN}-PLAN.md" --field depends_on 2>/dev/null)
if echo "$DEPS" | grep -q "${FAILED_PLAN_ID}"; then
DEPENDENT_PLANS+=("${LATER_PLAN}")
fi
done
2. Report failure with dependency analysis:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GPD > WAVE {N} FAILURE
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
**Failed plan:** {PLAN_ID} -- {plan name}
**Reason:** {failure description from spot-check or agent report}
### Wave {N} Status
| Plan | Status |
| ---- | ------ |
| {plan-A} | Passed |
| {plan-B} | FAILED |
| {plan-C} | Passed |
### Downstream Impact
Plans that depend on {FAILED_PLAN_ID} (will be auto-skipped):
{list of dependent plans with their wave numbers, or "None -- no downstream dependencies"}
──────────────────────────────────────────────────────
Options:
1. "Rollback failed plan only" (preferred) -- revert only the commits from the failed plan
using the TASK_COMMITS record. Keep all successful plans in this wave.
2. "Continue" -- skip failed plan + dependents, execute remaining waves
3. "Rollback wave" -- revert all wave {N} work to wave checkpoint
4. "Stop" -- halt phase execution, preserve all completed work
──────────────────────────────────────────────────────
3. Handle user choice:
Continue:
Mark the failed plan as skipped in the wave tracker
Auto-skip all plans in DEPENDENT_PLANS in subsequent waves with message:
Skipping {PLAN_ID}: depends on failed plan {FAILED_PLAN_ID}
Track skipped plans in SKIPPED_PLANS array with reasons for the recovery report
Proceed to next wave, filtering out dependent plans
Rollback wave:
Revert to the wave checkpoint:
WAVE_CHECKPOINT_COMMIT=$(git rev-list -n 1 "${WAVE_CHECKPOINT}")
git revert --no-commit HEAD...${WAVE_CHECKPOINT_COMMIT}
git commit -m "$(cat <<EOF
revert: rollback wave ${WAVE_NUM} of phase ${phase_number}
Failed plan: ${FAILED_PLAN_ID}
Reason: ${FAILURE_REASON}
Checkpoint: ${WAVE_CHECKPOINT}
EOF
)"
Ask: "Retry wave {N}?" or "Stop execution?"
If retry: re-enter the wave execution loop for wave N
If stop: proceed to recovery report
Stop:
Halt phase execution, preserve all completed work, and proceed to the recovery report.
4. Auto-skip dependent plans during subsequent waves:
When processing plans in waves N+1, N+2, etc., check each plan against the SKIPPED_PLANS list:
for DEP in $(echo "$PLAN_DEPS" | tr ',' ' '); do
# SKIPPED_PLANS entries may carry a ":reason" suffix, so match the bare id
# followed by ':' or the separating space, not just the id with spaces around it
if [[ " ${SKIPPED_PLANS[*]} " =~ " ${DEP}[: ]" ]]; then
echo "SKIP: Plan ${PLAN_ID} depends on skipped/failed plan ${DEP}"
SKIPPED_PLANS+=("${PLAN_ID}:depends_on_${DEP}")
continue 2  # resume with the next plan in the enclosing wave loop
fi
done
Handoff verification: Do not trust the runtime handoff status by itself. Verify expected output files, the structured return envelope, and git commits before treating a subagent as failed.
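A sketch of that triangulation, assuming this workflow's SUMMARY naming convention and that plan commits mention the plan id in their messages (both conventions appear elsewhere in this document; the helper itself is hypothetical, and envelope parsing is runtime-specific so it is omitted here):

```shell
# Sketch: triangulate a subagent handoff before declaring the plan failed.
# Checks two of the three signals: the expected SUMMARY file, and at least
# one git commit whose message mentions the plan id.
verify_handoff() {
  local phase_dir="$1" plan_id="$2" failures=0
  if [ ! -f "${phase_dir}/${plan_id}-SUMMARY.md" ]; then
    echo "MISSING: ${plan_id}-SUMMARY.md"
    failures=$((failures + 1))
  fi
  # -C works because phase_dir lives inside the repository worktree
  if ! git -C "$phase_dir" log --oneline --grep="$plan_id" -n 1 2>/dev/null | grep -q .; then
    echo "MISSING: no commit mentioning ${plan_id}"
    failures=$((failures + 1))
  fi
  return "$failures"
}
```

A zero return means both signals are present; a nonzero return counts the missing signals, so the orchestrator can report exactly what the handoff lacked.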
</step>
Flow:
Spawn agent for checkpoint plan
Agent runs until checkpoint task or validation gate -> returns structured state
Agent return includes: completed tasks table, current task + blocker, checkpoint type/details, what's awaited, and the bounded execution segment envelope
Checkpoint-specific fields in the return:
- checkpoint_reason: first_result_gate_pending or pre_fanout_review_pending; pre_fanout_review_cleared when review was accepted but downstream unlock is still outstanding
- skeptical_requestioning_required and skeptical_requestioning_summary
- weakest_unchecked_anchor
- disconfirming_observation
- downstream_locked
Present to user:
## Checkpoint: [Type]
**Plan:** 03-03 Perturbation Expansion
**Progress:** 2/3 tasks complete
[Checkpoint Details from agent return]
[Awaiting section from agent return]
User responds: "approved"/"done" | issue description | decision selection
Spawn continuation agent (NOT resume) using ./.codex/get-physics-done/templates/continuation-prompt.md template:
- {completed_tasks_table}: From checkpoint return
- {resume_task_number} + {resume_task_name}: Current task
- {user_response}: What user provided
- {resume_instructions}: Based on checkpoint type (see template for type-specific instructions)
- {execution_segment}: The returned bounded-segment state, including checkpoint cause, current cursor, resume preconditions, downstream-lock status, and any skeptical re-questioning fields that must survive into the continuation
Continuation agent verifies previous commits, continues from resume point
Repeat until plan completes or user stops
Why fresh agent, not resume: Resume relies on internal serialization that breaks with parallel tool calls. Fresh agents with explicit state are more reliable.
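Passing explicit state means filling the continuation template's {placeholder} slots. A minimal sketch using pure bash substitution (the placeholder names come from the template referenced above; the helper itself is hypothetical):

```shell
# Hypothetical helper: replace {key} placeholders in a template with values.
# Usage: fill_template TEMPLATE_FILE KEY VALUE [KEY VALUE ...]
fill_template() {
  local text
  text=$(cat "$1")
  shift
  while [ "$#" -ge 2 ]; do
    # Quoted pattern makes {key} literal; replacement needs no escaping
    text=${text//"{$1}"/$2}
    shift 2
  done
  printf '%s\n' "$text"
}
```

Usage: `fill_template ./.codex/get-physics-done/templates/continuation-prompt.md resume_task_number 3 user_response approved > continuation.md`. Bash substitution avoids the escaping pitfalls of sed when user responses contain slashes or ampersands.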
Checkpoints in parallel waves: Agent pauses and returns while other parallel agents may complete. Present checkpoint, spawn continuation, wait for all before next wave. </step>
Count the SUMMARY files that will be read and estimate their impact on orchestrator context:
SUMMARY_COUNT=$(ls "${phase_dir}"/*-SUMMARY.md 2>/dev/null | wc -l)
ESTIMATED_TOKENS=$(( SUMMARY_COUNT * 3000 ))
CONTEXT_BUDGET=${CONTEXT_BUDGET:-200000} # Model-dependent; 200k for most current models
BUDGET_PERCENT=$(( ESTIMATED_TOKENS * 100 / CONTEXT_BUDGET ))
If BUDGET_PERCENT exceeds 15%: warn before proceeding:
WARNING: Reading ${SUMMARY_COUNT} SUMMARY files will consume ~${BUDGET_PERCENT}% of orchestrator context.
Consider using summary-extract for one-liners only instead of full SUMMARY reads.
If >15%, use summary-extract for one-liners instead of reading full SUMMARY files:
for summary in "${phase_dir}"/*-SUMMARY.md; do
/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local summary-extract "$summary" --field one_liner
done
## Phase {X}: {Name} Execution Complete
**Waves:** {N} | **Plans:** {M}/{total} complete
| Wave | Plans | Status |
| ---- | ---------------- | -------- |
| 1 | plan-01, plan-02 | Complete |
| CP | plan-03 | Verified |
| 2 | plan-04 | Complete |
### Plan Details
1. **03-01**: [one-liner from SUMMARY.md]
2. **03-02**: [one-liner from SUMMARY.md]
### Validation Summary
[Aggregate limiting case checks, dimensional consistency results, cross-checks]
### Issues Encountered
[Aggregate from SUMMARYs, or "None"]
Scan all SUMMARY.md files from this phase for figure-related artifacts:
# Find figures referenced in SUMMARY files
for SUMMARY in "${phase_dir}"/*-SUMMARY.md; do
# Extract key-files.created entries that look like figures
grep -E '\.(pdf|png|eps|svg|jpg|jpeg|tiff)' "$SUMMARY" 2>/dev/null
done
# Also scan durable figure roots for generated plot files
PHASE_ARTIFACT_DIR="artifacts/phases/${phase_number}-${phase_slug}"
find "${PHASE_ARTIFACT_DIR}" figures/ paper/figures/ -maxdepth 3 \
\( -name "*.pdf" -o -name "*.png" -o -name "*.eps" \) 2>/dev/null | \
grep -iE "fig|plot|phase_diag|spectrum|convergence|diagram" 2>/dev/null
Generated figures and plots should live in stable workspace roots such as artifacts/phases/${phase_number}-${phase_slug}/, figures/, or paper/figures/, not under .gpd/phases/**.
If any figures found:
Read the figure tracker template:
cat ./.codex/get-physics-done/templates/paper/figure-tracker.md
If .gpd/paper/FIGURE_TRACKER.md already exists: Append new figures to the existing registry. Do not overwrite existing entries.
If it does not exist: Create it from the template:
mkdir -p .gpd/paper
Write .gpd/paper/FIGURE_TRACKER.md with:
- Source phase set to the current phase number
- Source file set to the script or notebook that generated it (from SUMMARY key-files)
- Data file(s) set to any associated data files (from SUMMARY key-files)
- Status set to "Data ready" or "Draft" based on file inspection
- Last updated set to today's date
Commit:
PRE_CHECK=$(/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local pre-commit-check --files .gpd/paper/FIGURE_TRACKER.md 2>&1) || true
echo "$PRE_CHECK"
/home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local commit \
"docs(phase-${phase_number}): update figure tracker" \
--files .gpd/paper/FIGURE_TRACKER.md
If no figures found: Skip silently (not all phases produce visual outputs).
Experimental comparison artifact: If any plan in this phase compared theoretical predictions with experimental or observational data (PHENO-type objectives, or plans whose SUMMARY mentions "experimental comparison", "pull", "chi-squared", or "theory vs data"), create .gpd/paper/EXPERIMENTAL_COMPARISON.md using ./.codex/get-physics-done/templates/paper/experimental-comparison.md. Populate with comparison tables, pull values, and discrepancy classifications from the plan SUMMARYs. Skip if no experimental comparison was performed.
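The detection above can be sketched as a keyword scan over the phase's SUMMARY files. The keyword list mirrors the prose (bare "pull" is narrowed to "pull value" here to avoid false positives on words like "pure"; treat the pattern as a starting point, not an exhaustive classifier):

```shell
# Sketch: decide whether this phase performed an experimental comparison
# by scanning its SUMMARY files for the signal keywords listed above.
has_experimental_comparison() {
  grep -ilE "experimental comparison|chi-?squared|theory vs data|pull value" \
    "$1"/*-SUMMARY.md 2>/dev/null | grep -q .
}
```

If the function returns success, create .gpd/paper/EXPERIMENTAL_COMPARISON.md from the template; otherwise skip the artifact.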
This step runs unconditionally -- for fully successful phases it is a brief confirmation; for phases with failures it is the critical decision point.
1. Collect execution outcomes:
# Collect all plan IDs from the phase plan index
ALL_PLAN_IDS=($(echo "$PLAN_INDEX" | /home/qol/.gpd/venv/bin/python -m gpd.runtime_cli --runtime codex --config-dir ./.codex --install-scope local json pluck .plans id))
# Initialize outcome tracking arrays
# Note: FAILED_IDS, SKIPPED_IDS, and their reason maps should be maintained
# by the orchestrator during execute_waves and wave_failure_handling steps.
# FAILED_IDS+=("plan_id") when a plan fails spot-checks or agent reports failure.
# SKIPPED_IDS+=("plan_id") when a plan is auto-skipped due to dependency on failed plan.
declare -A FAILURE_REASONS # Map plan_id -> failure description
declare -A SKIP_REASONS # Map plan_id -> "depends_on_${dep_id}"
PLANS_SUCCEEDED=() # Plans with SUMMARY.md and passing spot-checks
PLANS_FAILED=() # Plans that failed during execution
PLANS_SKIPPED=() # Plans skipped due to dependency on failed plans
PLANS_ROLLED_BACK=() # Plans whose work was reverted
for PLAN_ID in "${ALL_PLAN_IDS[@]}"; do
if [ -f "${phase_dir}/${PLAN_ID}-SUMMARY.md" ]; then
PLANS_SUCCEEDED+=("${PLAN_ID}")
elif [[ " ${FAILED_IDS[*]} " =~ " ${PLAN_ID} " ]]; then
PLANS_FAILED+=("${PLAN_ID}:${FAILURE_REASONS[$PLAN_ID]}")
elif [[ " ${SKIPPED_IDS[*]} " =~ " ${PLAN_ID} " ]]; then
PLANS_SKIPPED+=("${PLAN_ID}:${SKIP_REASONS[$PLAN_ID]}")
fi
done
2. Present recovery report:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
GPD > PHASE {X} EXECUTION REPORT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
### Results
| Plan | Status | Detail |
| ---- | ------ | ------ |
| {id} | Passed | {one-liner from SUMMARY} |
| {id} | FAILED | {failure reason} |
| {id} | Skipped | Depends on failed {dep_id} |
**Summary:** {succeeded_count} passed, {failed_count} failed, {skipped_count} skipped
3. If ALL plans passed: Proceed to verify_phase_goal as normal. Report is informational only.
4. If ANY failures or skips occurred:
Create a recovery section in the phase directory. For physics-specific root cause analysis, consult ./.codex/get-physics-done/templates/recovery-plan.md:
RECOVERY_FILE="${phase_dir}/PHASE-RECOVERY.md"
Write PHASE-RECOVERY.md:
---
c. Unverified identity scan -- check for IDENTITY_CLAIM tags without verification:
# Scan wave artifacts for unverified identity claims
PHASE_ARTIFACT_DIR="artifacts/phases/${phase_number}-${phase_slug}"
for summary in "${wave_summaries[@]}"; do
grep -rl "IDENTITY_SOURCE: training_data" \
"$summary" "${PHASE_ARTIFACT_DIR}" figures/ data/ simulations/ paper/figures/ \
2>/dev/null | while read -r f; do
if ! grep -q "IDENTITY_VERIFIED:" "$f" 2>/dev/null; then
echo "WARNING: Unverified training_data identity in $f"
fi
done
done
Prefer paths surfaced through SUMMARY key-files or contract deliverables. Do not assume durable artifacts live beside the SUMMARY in .gpd/phases/**.
If unverified identities are found: flag as WARNING. These identities may be correct but have not been numerically tested -- downstream waves building on them carry unquantified risk.
d. Computation-type-specific checks (driven by INTER_WAVE_CHECKS from adapt_to_computation_type):
If convergence_spot_check in INTER_WAVE_CHECKS (numerical phases):
Scan the wave's SUMMARY.md files for convergence-related metrics. Look for keywords: convergence, error, residual, tolerance, iterations, grid_size. Flag if:
for summary in "${wave_summaries[@]}"; do
# Extract numerical metrics from SUMMARY frontmatter or key_results
grep -iE "converge|residual|error.*=.*[0-9]|tolerance" "$summary" 2>/dev/null | while read -r line; do
echo "CONVERGENCE: $line"
done
done
If plausibility_scan in INTER_WAVE_CHECKS (analysis/validation phases):
Scan the wave's SUMMARY.md outputs for physically implausible values:
for summary in "${wave_summaries[@]}"; do
grep -iE "NaN|Inf|= -[0-9]|diverge" "$summary" 2>/dev/null | while read -r line; do
echo "PLAUSIBILITY WARNING: $line"
done
done
If latex_compile in INTER_WAVE_CHECKS (paper-writing phases):
If pdflatex is available, compile the paper after each wave to catch LaTeX errors early:
if command -v pdflatex &>/dev/null && [ -f paper/main.tex ]; then
# Subshell keeps the orchestrator's working directory unchanged even on failure
( cd paper && pdflatex -interaction=nonstopmode main.tex 2>&1 | grep -E "^!" | head -5 )
fi
Flag any LaTeX errors as WARNING -- they should be fixed before the next wave adds more content.
If any check fails:
---
## Inter-wave verification gate
**Convention check:** {PASS | WARNING: {details}}
**Dimensional check:** {PASS | WARNING: {details}}
**Identity check:** {PASS | WARNING: {N} unverified training_data identities}
**Convergence check:** {PASS | WARNING: {details} | SKIPPED (not numerical phase)}
**Plausibility check:** {PASS | WARNING: {details} | SKIPPED (not analysis/validation phase)}
**LaTeX compile:** {PASS | WARNING: {N} errors | SKIPPED (not paper-writing phase)}
Options:
1. Continue to next wave (accept warnings)
2. Fix issues before continuing
3. Stop execution and investigate
---
Present options and wait for user response (or auto-continue in YOLO mode when every flagged check is a WARNING rather than an error -- unless YOLO_RESTRICTIONS includes no_skip_inter_wave, in which case always present).
If disabled: Skip verification gate, proceed directly to step 10. Exception: if YOLO_RESTRICTIONS includes no_skip_inter_wave, the gate runs even when disabled by config.
Cost: ~2-5k tokens per inter-wave gate. For a 4-wave phase with deep-theory profile, this is ~10-15k tokens overhead -- negligible compared to the cost of a sign error propagating through 3 subsequent waves.