Guides structured 4-stage experiment execution with attempt budgets and gate conditions: Stage 1 initial implementation (reproduce baseline), Stage 2 hyperparameter tuning, Stage 3 proposed method validation, Stage 4 ablation study. Integrates with evo-memory (load prior strategies, trigger IVE/ESE) and experiment-craft (5-step diagnostic on failure). Use when: user has a planned experiment, needs to reproduce baselines, organize experiment workflow, or systematically validate a method. Do NOT use for debugging a specific experiment failure (use experiment-craft) or designing which experiments to run (use paper-planning).
A structured 4-stage framework for executing research experiments from initial implementation through ablation study, with attempt budgets and gate conditions that prevent wasted effort. This follows the Experiment Tree Search design from the EvoScientist paper, where the engineer agent iteratively generates executable code, runs experiments, and records structured execution results at each stage.
Experiments fail for two reasons: wrong order and no stopping criteria. Most researchers jump straight to testing their novel method without verifying their baseline setup, then wonder why results don't make sense. Others spend weeks tuning hyperparameters without a budget, hoping the next run will work.
The 4-stage pipeline solves both problems. It enforces a strict order (each stage validates assumptions the next stage depends on) and assigns attempt budgets (forcing systematic thinking over brute-force iteration).
If coming from idea-tournament, your research proposal (Phase 4) provides the experiment plan — datasets, baselines, metrics, and ablation design — that maps directly to Stages 1-4 below.
Before entering the pipeline, load Experimentation Memory (M_E) from prior cycles:
/memory/experiment-memory.md

Each stage follows a generate → execute → record → diagnose → revise loop:
| Stage | Goal | Budget (N_E^s) | Gate Condition |
|---|---|---|---|
| 1. Initial Implementation | Get baseline code running and reproduce known results | ≤20 attempts | Metrics within 2% of reported values (or within reported variance) |
| 2. Hyperparameter Tuning | Optimize config for your setup | ≤12 attempts | Stable config, variance < 5% across 3 runs |
| 3. Proposed Method | Implement & validate novel method | ≤12 attempts | Outperforms tuned baseline on primary metric, consistent across 3 runs |
| 4. Ablation Study | Prove each component's contribution | ≤18 attempts | All claims evidenced with controlled experiments |
Each stage saves artifacts to /experiments/stageN_name/.
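As a minimal sketch, the stage table above can be tracked with a small state object. The names (`Stage`, `Pipeline`, `record_attempt`) are illustrative, not part of any skill API; only the budgets and the strict stage order come from the table.

```python
from dataclasses import dataclass, field

@dataclass
class Stage:
    name: str
    budget: int          # N_E^s, max attempts for this stage
    attempts: int = 0
    gate_passed: bool = False

    def record_attempt(self, passed_gate: bool) -> None:
        if self.attempts >= self.budget:
            raise RuntimeError(f"{self.name}: budget of {self.budget} attempts exhausted")
        self.attempts += 1
        self.gate_passed = passed_gate

@dataclass
class Pipeline:
    stages: list = field(default_factory=lambda: [
        Stage("initial_implementation", 20),
        Stage("hyperparameter_tuning", 12),
        Stage("proposed_method", 12),
        Stage("ablation_study", 18),
    ])

    def current_stage(self) -> Stage:
        # Stage order is strict: the first stage whose gate is not yet
        # passed is the only one allowed to run.
        for s in self.stages:
            if not s.gate_passed:
                return s
        raise RuntimeError("all gates passed; hand off to paper-writing")
```

Exceeding a budget raises instead of silently continuing, which is the point: budget exhaustion is an escalation event (to evo-memory IVE), not a soft limit.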
Within every stage, repeat this cycle for each attempt:
On a failed attempt, load experiment-craft for the 5-step diagnostic flow.

Goal: Find or generate executable baseline code and verify it reproduces published results. This stage corresponds to the paper's "initial implementation" — the engineer agent searches for working code, runs it, and records structured execution results.
Why this matters: If you can't get the baseline running and reproducing known results, every subsequent comparison is meaningless. Initial implementation validates your data pipeline, evaluation code, training infrastructure, and understanding of prior work.
Budget: ≤20 attempts (N_E^1=20). Baselines can be tricky — missing details in papers, version mismatches, unreported preprocessing steps. 20 attempts gives enough room to debug without allowing infinite tinkering.
Gate: Primary metrics within 2% of reported values (or within the reported variance if provided).
Process:
When to load experiment-craft: If attempts 1-5 all fail significantly (>10% gap), switch to the 5-step diagnostic flow to isolate the cause before burning more attempts.
Output: /experiments/stage1_baseline/ containing results, config, and verified baseline code.
See references/stage-protocols.md for detailed initial implementation checklists.
Goal: Find the optimal hyperparameter configuration for YOUR specific setup.
Why this matters: Published hyperparameters are tuned for the authors' setup. Your hardware, data version, framework version, or subtle implementation differences mean their config may not be optimal for you. Tuning now prevents confounding your novel method's results with suboptimal baselines.
Budget: ≤12 attempts. Hyperparameter tuning has diminishing returns. If 12 structured attempts don't find a stable config, the problem is likely deeper than hyperparameters.
Gate: Stable configuration found — variance < 5% across 3 independent runs with different random seeds.
Process:
Priority order for tuning: Learning rate → batch size → loss weights → regularization → architecture-specific params. This order reflects typical sensitivity.
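The priority order and the stability gate can be sketched together. Here `run_experiment` is an assumed callback returning the primary metric for a config and seed, and "variance < 5%" is read as the relative spread of the metric across seeded runs staying under 5% — one plausible interpretation, not a canonical one.

```python
import statistics

# Tune in this order; earlier entries typically dominate sensitivity.
TUNING_ORDER = ["learning_rate", "batch_size", "loss_weights",
                "regularization", "arch_specific"]

def is_stable(run_experiment, config, seeds=(0, 1, 2), max_rel_spread=0.05):
    """Stage 2 gate: metric spread across independent seeded runs
    stays under 5% of the mean."""
    scores = [run_experiment(config, seed=s) for s in seeds]
    mean = statistics.mean(scores)
    if mean == 0:
        return False
    return (max(scores) - min(scores)) / abs(mean) < max_rel_spread
```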
When to load experiment-craft: If results are highly unstable (variance > 20%) across runs, there's likely a training instability issue. Use diagnostic flow.
Output: /experiments/stage2_tuning/ containing tuning logs, final config, and stability verification.
See references/attempt-budget-guide.md for budget rationale and adjustment rules.
Goal: Implement and validate your novel method, demonstrating improvement over the tuned baseline.
Why this matters: This is the core contribution. But because you've verified the baseline (Stage 1) and optimized the config (Stage 2), any improvement you see is genuinely attributable to your method — not to a better-tuned setup or a broken baseline.
Budget: ≤12 attempts. Your method should work within a reasonable number of iterations if the underlying idea is sound. Excessive attempts suggest a fundamental problem, not a tuning issue.
Gate: Outperforms the tuned baseline on the primary metric. The improvement should be consistent across at least 3 runs.
Process:
Integration strategy: Add your method's components one at a time to the working baseline. Each added component should stay within 20% of the baseline's performance — if a single component causes a >20% regression, isolate and debug it before proceeding. Never integrate the full method in one shot.
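The integration strategy above can be sketched as a loop. `run_with` is an assumed callback that runs the baseline plus a list of enabled components and returns the primary metric; the function and its names are hypothetical.

```python
def integrate_incrementally(run_with, components, baseline_score,
                            max_regression=0.20):
    """Add one component at a time; abort if any single addition
    regresses more than 20% below the baseline."""
    enabled = []
    for comp in components:
        score = run_with(enabled + [comp])
        if score < baseline_score * (1 - max_regression):
            raise RuntimeError(
                f"component {comp!r} regressed beyond 20%; "
                "isolate and debug before proceeding")
        enabled.append(comp)
    return enabled
```

The abort-on-regression behavior enforces the rule in the text: a broken component is debugged in isolation, never papered over by integrating the rest of the method on top of it.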
When to load experiment-craft: When your method underperforms the baseline despite correct implementation. The 5-step diagnostic flow will help distinguish between implementation bugs and fundamental issues.
Critical decision — failure classification: If the method underperforms the baseline after exhausting the attempt budget, hand off to evo-memory for IVE (Idea Validation Evolution) — this is evo-memory's job, not this skill's. IVE triggers under two conditions:
The evo-memory skill will classify the failure as:
Output: /experiments/stage3_method/ containing method code, results, comparison with baseline.
Goal: Prove that each component of your method contributes meaningfully to the final result.
Why this matters: Reviewers will ask "is component X really necessary?" for every part of your method. Without ablation, you can't answer. More importantly, ablation helps YOU understand why your method works — sometimes components you thought were important aren't, and vice versa.
Budget: ≤18 attempts. Ablation requires multiple controlled experiments — one per component being ablated, plus interaction effects. 18 attempts covers a method with 4-5 components.
Gate: Every claimed contribution is supported by a controlled experiment showing its effect.
Process:
Three ablation designs:
When to load experiment-craft: If ablation results contradict your hypothesis (removing a component improves results), use diagnostic flow to understand why.
Output: /experiments/stage4_ablation/ containing ablation results table, per-component analysis.
See references/stage-protocols.md for detailed ablation design patterns.
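One common ablation design, leave-one-out, can be sketched as rerunning the full method with each component disabled in turn. `run_with` is again an assumed callback over a set of enabled components; the real design patterns live in references/stage-protocols.md.

```python
def leave_one_out(run_with, components):
    """Return the full-method score and, per component, the metric
    drop caused by removing it (positive delta = real contribution)."""
    full = run_with(set(components))
    deltas = {}
    for comp in components:
        ablated = run_with(set(components) - {comp})
        deltas[comp] = full - ablated
    return full, deltas
```

A near-zero or negative delta is exactly the "component you thought was important isn't" case: route it through the diagnostic flow before claiming a contribution.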
When a stage attempt fails, refer to the experiment-craft skill for structured diagnosis:
Trigger points: After any failed attempt in any stage. Especially important:
Every attempt across all stages should be logged in a structured format that captures not just WHAT you did but WHY and WHAT YOU LEARNED. These logs feed into evo-memory's Experiment Strategy Evolution (ESE) mechanism.
For each attempt, record:
See references/code-trajectory-logging.md for the full logging format and how logs feed into evo-memory.
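As an illustrative shape only, an attempt log entry might look like the following. The field names are assumptions modeled on the WHAT / WHY / WHAT-YOU-LEARNED framing above; the canonical format is defined in references/code-trajectory-logging.md.

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class AttemptLog:
    stage: int              # 1-4
    attempt: int            # index within the stage budget
    change: str             # WHAT you did
    hypothesis: str         # WHY you expected it to help
    result: str             # observed metrics or failure mode
    analysis: str           # WHAT YOU LEARNED (feeds evo-memory ESE)
    reusable: bool = False  # tag strategies worth extracting

    def to_json(self) -> str:
        return json.dumps(asdict(self))
```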
Prioritize these rules during experiment execution:
Initial implementation is not wasted time: It validates your entire infrastructure — data pipeline, evaluation code, training setup. Skipping it means every subsequent result is built on unverified ground. Most "method doesn't work" bugs are actually baseline setup bugs.
Budget limits prevent rabbit holes: Fixed attempt budgets force you to think systematically. When you know you have 12 attempts, you design each one to maximize information. Without limits, attempt #47 is rarely more informative than attempt #12 — it's just more desperate.
Stage order is non-negotiable: Each stage validates assumptions the next depends on. Skipping Stage 1 means Stage 3 results could be wrong due to a broken baseline. Skipping Stage 2 means Stage 3 improvements might just be better hyperparameters, not a better method. There are no shortcuts.
Ablation is not optional cleanup: It's the primary evidence that your method works for the right reasons. A method that outperforms the baseline but has no ablation is a method you don't understand. Reviewers know this.
Failed attempts are data, not waste: Each failed attempt narrows the search space and reveals something about the problem. Log failures carefully — they feed into evo-memory and prevent future researchers from repeating the same mistakes.
Early termination is a feature: Stopping before budget exhaustion is smart, not lazy. If the gate is clearly unachievable after systematic attempts, escalate to evo-memory IVE rather than burning remaining budget on increasingly random variations.
When all four stages are complete, pass these artifacts to paper-writing:
| Artifact | Source Stage | Used By |
|---|---|---|
| Initial implementation results | Stage 1 | Comparison tables, setup verification |
| Optimal hyperparameter config | Stage 2 | Reproducibility section |
| Method vs baseline comparison | Stage 3 | Main results table |
| Ablation study results | Stage 4 | Ablation table, contribution claims |
| Code trajectory logs (all stages) | All stages | Method section details, supplementary |
| Implementation details and tricks | Stages 1-3 | Method section, reproducibility (captured in trajectory log Analysis fields and [Reusable] tags) |
Also pass results to evo-memory for evolution updates:
Refer to the evo-memory skill to read Experimentation Memory:
→ Read M_E at /memory/experiment-memory.md
Refer to the experiment-craft skill for 5-step diagnostic: → Run diagnosis → Return to pipeline
Refer to the evo-memory skill for failure classification: → Run IVE protocol
Refer to the evo-memory skill for strategy extraction: → Run ESE protocol with trajectory logs
Refer to the paper-writing skill: → Pass all stage artifacts
| Topic | Reference File | When to Use |
|---|---|---|
| Per-stage checklists and patterns | stage-protocols.md | Detailed guidance for each stage |
| Budget rationale and adjustment | attempt-budget-guide.md | When budgets feel too tight or too loose |
| Code trajectory logging format | code-trajectory-logging.md | Recording attempts for evo-memory |
| Stage log template | stage-log-template.md | Logging a single stage's progress |
| Pipeline tracker template | pipeline-tracker-template.md | Tracking the full 4-stage pipeline |