Experiment-planning skill for research papers in systems, networking, and AI. Use when Codex must design or audit baselines, metrics, workloads, ablations, statistical checks, scaling studies, sensitivity analysis, and failure tests so that a paper's claims are actually supported.
Read ../references/workflow.md, ../references/venues.md, and ../references/memory.md.
Before proposing new experiments, load relevant project memory if it exists, so that established evaluation rules, baseline policies, and previously failed directions are not rediscovered from scratch.
Design evaluation from claims backward: start from what the paper asserts and derive the evidence each assertion requires.
For each claim, specify:
- the claim as a falsifiable statement,
- the metric(s) that would support or refute it,
- the baseline(s) it must beat or match,
- the workload(s) or dataset(s) it is measured on,
- the statistical check that decides whether the difference is real.
Produce an experiment matrix with columns: claim, experiment, metric, baselines, workload/dataset, configuration, seeds and repetitions, and status.
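The experiment matrix can be generated mechanically from the claims and axes rather than maintained by hand. A minimal sketch using only Python's standard library; the claim text, workloads, metric, and column names below are illustrative placeholders, not a prescribed schema:

```python
import csv
import io
import itertools

# Illustrative claim and evaluation axes; real values come from the paper's claims.
claims = ["C1: system X reduces p99 latency vs baseline B"]
workloads = ["uniform", "skewed"]
seeds = [0, 1, 2]

# One row per (claim, workload, seed) combination.
rows = [
    {"claim": c, "workload": w, "seed": s,
     "metric": "p99_latency_ms", "baseline": "B", "status": "planned"}
    for c, w, s in itertools.product(claims, workloads, seeds)
]

# Serialize as CSV so the matrix can be tracked alongside the paper.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(len(rows))  # 1 claim x 2 workloads x 3 seeds = 6 rows
```

Generating the matrix this way makes gaps visible: any (claim, workload, seed) cell that is never filled in is an unsupported part of a claim.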
Default rigor checks:
- multiple seeds or runs, with variance (not just means) reported,
- baselines tuned with effort comparable to the proposed system,
- identical hardware, software versions, and workload settings across comparisons,
- ablations that isolate the contribution of each component,
- results reported on data or workloads not used for tuning.
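A statistical check over per-seed results can be sketched with a paired percentile bootstrap, assuming Python's standard library; the per-seed numbers below are hypothetical placeholders, not real measurements:

```python
import random
import statistics

# Hypothetical per-seed throughputs (higher is better); replace with real runs.
system = [112.0, 109.5, 114.2, 110.8, 113.1]
baseline = [104.3, 106.1, 103.8, 105.5, 104.9]

def bootstrap_ci(diffs, iters=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of paired differences."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs)))
        for _ in range(iters)
    )
    return means[int(alpha / 2 * iters)], means[int((1 - alpha / 2) * iters) - 1]

diffs = [s - b for s, b in zip(system, baseline)]
lo, hi = bootstrap_ci(diffs)
# The improvement claim is supported only if the whole interval excludes zero.
print(lo > 0)
```

Reporting the interval itself, rather than a single mean, is what lets a reviewer judge whether the claimed improvement survives run-to-run variance.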
Flag common paper-killing problems:
- untuned or outdated baselines,
- single-seed or single-run results presented as conclusive,
- metric or workload cherry-picking,
- leakage between tuning and evaluation data,
- claims broader than the experiments support,
- missing ablations for key design choices.
When the session establishes a durable evaluation rule, baseline policy, or failed direction that should not be relearned next time, propose a project-memory entry for it.