Use when main results pass result-to-claim (claim_supported=yes or partial) and ablation studies are needed for paper submission. A secondary Codex reviewer designs ablations from a reviewer's perspective, while the main agent checks feasibility and implementation.
Systematically design ablation studies that answer the questions reviewers will ask.
/result-to-claim with claim_supported = yes or partial/auto-review-loop identifies missing ablationsResolve automation defaults in this precedence order:
PROJECT_AUTOMATION.md in the project rootCLAUDE.md in the project rootBefore invoking the secondary reviewer, read ../shared-references/agent-role-charter.md and apply the Ablation Reviewer role.
Read available project files to build the full picture:
docs/research_contract.md, project notes, or AGENTS.mdEXPERIMENT_LOG.md, EXPERIMENT_TRACKER.md, or W&Bresult-to-claim output or project notesspawn_agent:
model: gpt-5.4
reasoning_effort: xhigh
message: |
You are a skeptical senior reviewer planning ablation studies for a paper in
multi-agent control / robotics / learning systems. Your job is to propose the
minimum decisive ablations needed to survive peer review.
Given this method and results, design ablations that:
1. Isolate the contribution of each novel component
2. Answer questions reviewers will definitely ask
3. Test sensitivity to key hyperparameters
4. Compare against natural alternative design choices
Method: [description from project files]
Components: [list of removable/replaceable components]
Current results: [key metrics from experiments]
Claims: [what we claim and current evidence]
For each ablation, specify:
- name: what to change
- what_it_tests: the specific question this answers
- expected_if_component_matters: what we predict if the component is important
- priority: 1 (must-run) to 5 (nice-to-have)
Also provide:
- coverage_assessment
- unnecessary_ablations
- suggested_order
- estimated_compute
Do not pad the plan with ornamental sweeps. Favor reviewer-facing ablations that
clearly sharpen the paper's claim boundaries.
Normalize the response into:
## Ablation Plan
### Component Ablations
| # | Name | What It Tests | Expected If Matters | Priority |
|---|------|---------------|---------------------|----------|
### Hyperparameter Sensitivity
| # | Parameter | Values to Test | What It Tests | Priority |
|---|-----------|---------------|---------------|----------|
### Design Choice Comparisons
| # | Name | What It Tests | Priority |
|---|------|---------------|----------|
### Coverage Assessment
[what reviewer questions these ablations answer]
### Unnecessary Ablations
[what to skip]
### Run Order
[optimized order]
### Estimated Compute
[total GPU-hours]
Before running anything, check:
EXPERIMENT_LOG.mdwhat_it_tests and expected_if_component_matters.