This skill should be used when the user asks to "design experiments", "plan experiments", "how many runs do I need", "which baselines should I use", "plan ablations", "power analysis", or "how many seeds", or after hypothesis formulation and before running any experiments.
Pre-experiment planning that translates hypotheses into a concrete, executable experiment plan with baselines, ablations, sample size, resource estimation, and execution ordering.
Core Features
1. Baseline Selection
Select and justify comparison baselines:
Trivial baseline: Random chance, majority class, or simplest heuristic
Standard baseline: Most common method in the field
SOTA baseline: Best published result on the benchmark
Ablation baseline: Proposed method minus the key component
Fairness checklist: Same preprocessing, splits, hyperparameter budget
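One illustrative way to record the baseline roster and enforce the fairness checklist in code (a sketch; the class, field names, and example baselines are not part of any required schema):

```python
from dataclasses import dataclass

@dataclass
class Baseline:
    name: str
    kind: str           # "trivial" | "standard" | "sota" | "ablation"
    justification: str
    same_preprocessing: bool = True
    same_splits: bool = True
    same_hp_budget: bool = True

baselines = [
    Baseline("majority-class", "trivial", "Floor that any learned model must beat"),
    Baseline("logistic-regression", "standard", "Most common method in the field"),
]

# Fairness checklist: every baseline must share preprocessing, splits, and HP budget.
unfair = [b.name for b in baselines
          if not (b.same_preprocessing and b.same_splits and b.same_hp_budget)]
```

Flagging unfair comparisons at plan time is cheaper than discovering them at review time.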
2. Ablation Planning
Design ablation studies to isolate component contributions:
Component identification: Which parts of the method are novel?
Ablation ordering: Which components to remove first (most to least important)
Expected impact: Predicted effect of each ablation (for hypothesis validation)
Interaction effects: Which components might interact?
Causal Claim Ablation Requirements
For every mechanistic claim the paper intends to make (e.g., "freezing attention heads preserves syntactic knowledge"):
Identify confounds: What else changes when you apply this intervention? (parameter count, compute, capacity, regularization effect)
Design parameter-matched controls: If intervention A has different trainable parameters than baseline B, add a control C that matches A's parameter count but changes a different component.
Design component isolation ablations: If claiming component X is special, test:
Freeze X only
Freeze everything EXCEPT X
Freeze a random subset of the same size as X
Pre-register the ablation logic: Before running experiments, document which ablation supports which claim.
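The three isolation conditions above can be enumerated mechanically. A sketch (the component names and the seeded random control are illustrative):

```python
import random

def isolation_conditions(components, target, seed=0):
    """Build the three freeze sets needed to test whether `target` is special:
    freeze only the target, freeze everything except it, and freeze a random
    subset of the same size as a control."""
    target = set(target)
    rest = [c for c in components if c not in target]
    rng = random.Random(seed)  # fixed seed so the control is reproducible
    return {
        "freeze_target_only": target,
        "freeze_all_except_target": set(rest),
        "freeze_random_same_size": set(rng.sample(rest, k=len(target))),
    }

heads = [f"head_{i}" for i in range(12)]
conds = isolation_conditions(heads, target=["head_3", "head_7"])
```

Generating the conditions from one function also makes it easy to pre-register them: the output can be written into the experiment plan verbatim before any run starts.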
Ablation completeness check (include in experiment plan):

| Claim | Required ablation | Designed? | Confounds addressed? |
|---|---|---|---|
| [claim 1] | [ablation] | [yes/no] | [list] |
2b. Dataset Representativeness Rule
If the paper makes claims about a CATEGORY of tasks (e.g., "syntactic tasks", "semantic tasks", "reasoning tasks"):
Minimum 2 datasets per category, ideally 3.
Datasets within a category must differ in at least one of: size, domain, metric, or difficulty.
If only 1 dataset per category is feasible (compute budget), the paper MUST:
Frame claims as "on [dataset name]" not "for [task type]".
Explicitly state this as a limitation in the experiment plan.
Not use category-level language in title or abstract.
If claims are about specific benchmarks (not categories), 1 dataset per benchmark is fine.
Confound checklist (mandatory per experimental factor):
For each factor you claim to study (e.g., "task type"), list:
What other variables co-vary with this factor? (dataset size, metric type, label distribution, sequence length)
Can you separate the factor of interest from these confounds?
If not, what is the honest scope of your claim?
Include the confound checklist as a table in the experiment plan.
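A minimal sketch of how the checklist might be kept alongside the plan (the factor name, confound list, and scope text are illustrative):

```python
confound_checklist = {
    "task type": {
        "covaries_with": ["dataset size", "metric type", "label distribution"],
        "separable": False,
        "honest_scope": "claims hold on the specific datasets tested, not the category",
    },
}

# Every studied factor needs all three questions answered before experiments run.
for factor, entry in confound_checklist.items():
    assert {"covaries_with", "separable", "honest_scope"} <= entry.keys(), factor
```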
2c. Motivation-Metric Alignment
If the introduction or motivation mentions any efficiency-related benefit (speed, memory, compute cost):
Then the experiment plan MUST include measurements of:
Wall-clock training time (not just parameter count)
Peak GPU memory usage
Throughput (examples/second)
At minimum, report these for the main comparison conditions.
Parameter count alone is NOT an efficiency metric — identical parameter counts can have very different compute costs.
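A hedged sketch of a throughput measurement (the workload is a stand-in for a training step; peak GPU memory would be read separately, e.g. via `torch.cuda.max_memory_allocated` in a PyTorch setup):

```python
import time

def measure_throughput(step_fn, examples_per_step, n_steps=10, warmup=2):
    """Wall-clock throughput in examples/second, with warmup steps excluded
    so one-time setup costs do not distort the estimate."""
    for _ in range(warmup):
        step_fn()
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps * examples_per_step / elapsed

# Stand-in workload; replace with an actual training step.
tput = measure_throughput(lambda: sum(i * i for i in range(10_000)),
                          examples_per_step=32)
```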
3. Sample Size & Seeds
Determine the number of runs needed:
Convention-based: Community standard for the benchmark (e.g., "5 seeds is standard for BCI-IV-2a")
ALETHEIA default: Plan 5 seeds per condition for full runs unless a power analysis or the venue requires otherwise; do not default to 10 seeds in experiment-plan.md (this reduces GPU waste and aligns with rules/compute-budget.md).
Power analysis (optional, see caveat below): When prior effect size and variance are available
6. Compute Requirements (estimated at design time)
Include in the experiment plan:
| Resource | Estimate |
|---|---|
| GPU type needed | [minimum VRAM, recommended type] |
| Per-run time | [estimated minutes] |
| Total runs | [conditions x seeds] |
| Total GPU-hours | [per-run x total runs] |
| Storage | [dataset size + checkpoints + outputs] |
Feasibility check: Can this experiment be completed within available resources? If not, which conditions should be prioritized or cut?
This ensures resource constraints are considered during design, not discovered later at /plan-compute.
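The table arithmetic is simple enough to keep next to the plan; a sketch with illustrative numbers:

```python
def compute_estimate(conditions, seeds, minutes_per_run):
    """Total runs and GPU-hours for a full sweep (conditions x seeds)."""
    total_runs = conditions * seeds
    gpu_hours = total_runs * minutes_per_run / 60
    return total_runs, gpu_hours

# Illustrative: 6 conditions x 5 seeds at 40 minutes each
runs, hours = compute_estimate(conditions=6, seeds=5, minutes_per_run=40)
feasible = hours <= 24  # compare against the available GPU budget
```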
7. Expected Results (mandatory per hypothesis)
For each hypothesis, document BEFORE running experiments:
If H[N] is TRUE:
Expected metric values: [specific numbers or ranges]
Expected patterns: [what the data should look like]
Expected effect size: [Cohen's d or similar]
If H[N] is FALSE:
Expected metric values: [what you'd see instead]
Expected patterns: [alternative explanation]
What this would mean for the contribution: [implications]
This forces pre-commitment to outcome interpretation and prevents post-hoc rationalization.
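One way to pin these expectations down is a structured record written before the first run (all numbers and text below are illustrative, not prescribed values):

```python
# Pre-registered expectations for one hypothesis (all values illustrative)
expected = {
    "H1": {
        "if_true": {
            "metric_range": (0.78, 0.84),  # band on the primary metric
            "pattern": "proposed method beats the standard baseline on every seed",
            "effect_size_d": 0.8,
        },
        "if_false": {
            "metric_range": (0.70, 0.76),
            "pattern": "seed distributions overlap with the baseline",
            "implication": "the novel component adds no measurable benefit",
        },
    },
}

# Both branches must exist for every hypothesis before any run starts.
assert all({"if_true", "if_false"} <= h.keys() for h in expected.values())
```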
Input Modes
Mode A: Pipeline (from predecessor)
Hypotheses -- from hypothesis-formulation output (hypotheses.md)
Available resources -- GPU hours, datasets, time constraints
Target venue (optional) -- for calibrating experiment thoroughness
Mode B: Standalone (manual)
Research goal -- user describes what they want to test in free text
Method description -- user describes their approach
Available resources -- user specifies compute, data, time constraints
The skill reconstructs implicit hypotheses from the description before designing experiments. When running in Mode B, it must state: "No hypotheses.md found. I've inferred the following testable hypotheses from your description -- please confirm before proceeding."
Outputs
experiment-plan.md containing:
Baselines: Selected baselines with justification for each
Ablations: Components to ablate, with justification for each
Datasets & splits: Which datasets, how to split, cross-validation strategy
Metrics: Primary metric, secondary metrics, with justification
Sample size / runs: Number of seeds, subjects, folds
Power analysis (optional, see caveat below)
Resource estimate: Estimated GPU hours, storage, wall time
Execution order: Which experiments to run first (quick validation before full sweep)
Checkpoints: Decision points (stop-or-go after each experiment block)
Initial experiment-state.json (see Iteration Loop State section)
Power Analysis: Optional with Explicit Caveat
Power analysis is included only when the user provides or the skill can estimate:
Expected effect size (from prior work or pilot data)
Variance estimate (from prior work or pilot data)
Desired significance level and power
When parameters are available: Compute the recommended sample size and include the calculation.
When parameters are NOT available (the common case):
State: "Power analysis skipped -- no prior effect size or variance estimates available."
Use a convention-based default instead (e.g., "5 seeds is standard for this benchmark; 3 seeds minimum for a quick validation pass").
Flag this as a limitation: "Sample size is based on community convention, not statistical power. If the effect is small, more runs may be needed."
The skill must NEVER silently assume effect size parameters and present a power analysis as if it were well-grounded. Assumed parameters must be explicitly marked: "ASSUMED: effect size d=0.5 (medium, no prior data). This power analysis is illustrative only."
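When parameters are available, the calculation can be as simple as the normal-approximation formula below (a sketch; for small samples a t-based correction typically adds a run or two per group):

```python
from math import ceil
from statistics import NormalDist

def n_per_group(d, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided
    two-sample comparison at standardized effect size d (Cohen's d)."""
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

# ASSUMED: d = 0.5 (medium, no prior data) -- illustrative only
n = n_per_group(d=0.5)
```

Larger assumed effects shrink the requirement sharply, which is exactly why assumed parameters must be flagged: an optimistic d quietly justifies an underpowered study.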
Iteration Loop State
experiment-state.json
On first run, create experiment-state.json in the project root: