Use this skill when the user wants data in which the agent must execute experiments, inspect logs and outputs, and judge whether the intended research results were reproduced. Trigger it for requests like 'generate reproduction verification tasks', 'make result-checking workflows', 'create reproduce.sh evaluation data', or 'give me code-agent tasks about running and validating research experiments'. Do not use it for requests that involve only first-pass paper-to-code implementation.
[Case 1]
[Case 2]
To synthesize data for this capability, you must strictly follow the 3-phase pipeline below. Do not hallucinate steps. Read the reference file for each phase in order, one phase at a time:
Phase 1: Environment Exploration
Read the exploration guidelines to discover raw knowledge seeds:
references/EXPLORATION.md
Phase 2: Trajectory Selection
Once Phase 1 is complete, read the selection criteria to evaluate the trajectory:
references/SELECTION.md
Phase 3: Data Synthesis
Once a trajectory passes Phase 2, read the synthesis instructions to generate the final data:
references/SYNTHESIS.md
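The phase ordering above can be sketched as a small loader. This is only an illustrative sketch, not part of the skill itself: the names PHASES, load_phase_guides, and the root parameter are hypothetical, and it assumes the three reference files sit under a common root directory at the relative paths listed above.

```python
from pathlib import Path

# Phase names and relative paths are taken from the list above.
# The PHASES constant itself is a hypothetical helper for illustration.
PHASES = [
    ("Environment Exploration", "references/EXPLORATION.md"),
    ("Trajectory Selection", "references/SELECTION.md"),
    ("Data Synthesis", "references/SYNTHESIS.md"),
]


def load_phase_guides(root="."):
    """Return (phase_name, guide_text) pairs in pipeline order.

    Stops with FileNotFoundError at the first missing guide, so a later
    phase is never loaded when an earlier one is unavailable.
    """
    guides = []
    for name, rel_path in PHASES:
        path = Path(root) / rel_path
        if not path.is_file():
            raise FileNotFoundError(f"Guide for phase '{name}' not found: {path}")
        guides.append((name, path.read_text(encoding="utf-8")))
    return guides
```

Calling load_phase_guides() from the skill's directory would yield the three guides in order; failing on the first missing file mirrors the rule that each phase is read only after the previous one.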