Reference for designing nerd experiments — competing theories, sweep harnesses, ground truth strategies, metric selection, and feasibility checks. Use when creating or reviewing experiment plans.
Ground truth — how "correct" is defined, with circularity caveats
Theory-linked acceptance criteria — each theory can be confirmed or rejected
Theory Types
Parameter is wrong: a different value would improve the metric
Model is wrong: a different model (not just parameters) would fit better
Feature is unnecessary: removing it entirely causes no degradation
Metric is wrong: we're optimizing the wrong thing
Data is the bottleneck: the parameter doesn't matter because the input data is the real problem
Architecture is the bottleneck: no parameter value can fix this
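The theory types above become falsifiable when each one is paired with an explicit accept/reject predicate over sweep results. A minimal sketch (the config names, thresholds, and result values are illustrative):

```python
# Each theory pairs a claim with a testable prediction over sweep results.
# `results` maps a config name to its metric value (hypothetical shape).
theories = {
    "parameter_wrong": lambda r: max(r.values()) > r["baseline"] * 1.05,
    "feature_unnecessary": lambda r: r["ablated"] >= r["baseline"] * 0.99,
}

results = {"baseline": 0.70, "tuned": 0.76, "ablated": 0.69}
verdicts = {name: check(results) for name, check in theories.items()}
# Here "parameter is wrong" is confirmed (a tuned config beats baseline by >5%),
# while "feature is unnecessary" is rejected (ablation degrades the metric).
```

Writing the predicate down before running the sweep is what makes the acceptance criteria theory-linked rather than post-hoc.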
Ground Truth Strategies
Auto-resolved data: measures sensitivity, not absolute quality (circular)
User feedback: clicks/engagement (sparse, position-biased)
Hand-labeled: gold standard (expensive, small samples)
Synthetic: tests metric computation, not real-world quality
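One way to sanity-check an auto-resolved ground truth before trusting it is to measure its agreement with a small hand-labeled sample. A sketch, with made-up labels:

```python
def agreement(auto_labels, gold_labels):
    """Fraction of items where the auto-resolved label matches the hand label."""
    matches = sum(a == g for a, g in zip(auto_labels, gold_labels))
    return matches / len(gold_labels)

# Low agreement means the auto-resolved metric mostly measures its own
# resolution rule (circularity), not real-world quality.
score = agreement([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 match -> 0.75
```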
Feasibility Check
```python
from math import prod

combos = prod(range_sizes)                 # range_sizes: length of each parameter's sweep range
est_seconds = combos * data_size * cost_per_eval
if est_seconds > 3600:                     # over an hour: reduce ranges or use random search
    ...
```
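The random-search fallback can be folded into the check itself: run the full grid when it fits the budget, otherwise sample as many configs as the budget allows. A runnable sketch (the ranges, budget, and costs are illustrative):

```python
import itertools
import random
from math import prod

def plan_sweep(ranges, data_size, cost_per_eval, budget_s=3600, seed=0):
    """Return the full grid if it fits the time budget, else a random sample."""
    grid = list(itertools.product(*ranges.values()))
    full_cost = prod(len(r) for r in ranges.values()) * data_size * cost_per_eval
    if full_cost <= budget_s:
        return grid
    # Budget only covers this many configs; sample them reproducibly.
    n = max(1, int(budget_s / (data_size * cost_per_eval)))
    return random.Random(seed).sample(grid, min(n, len(grid)))

ranges = {"k": [1, 5, 10, 20], "alpha": [0.1, 0.5, 1.0]}
configs = plan_sweep(ranges, data_size=100, cost_per_eval=0.5)  # full grid fits
```

The fixed seed keeps the sampled subset reproducible across runs, which matters when comparing sweeps.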
Analytical vs Experimental
Not all parameters can be swept. Parameters in non-executable files (markdown agent prompts, documentation, config comments) or those requiring human judgment are analytical — they can be reasoned about with competing theories but cannot be iterated in a nerd-loop.
For analytical parameters:
Still generate competing theories and testable predictions
Recommendations come from reasoning and analogies, not measured data
Do NOT propose sweep harnesses or eval commands
Mark as experiment_type: "analytical" in the plan
For experimentable parameters:
Full sweep harness with automated metric
The metric must be: automated (shell command → number), deterministic, sensitive to changes, and fast (<5 min)
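A minimal sketch of such a harness, where the metric is any shell command that prints a single number (the `echo` stand-in and parameter names are illustrative, not a real eval command):

```python
import itertools
import subprocess

def eval_metric(cmd):
    """Run a shell command that prints one number; return it as the metric."""
    out = subprocess.run(cmd, shell=True, capture_output=True, text=True, check=True)
    return float(out.stdout.strip())

def sweep(ranges, cmd_template):
    """Evaluate the metric command for every combination of parameter values."""
    results = {}
    for combo in itertools.product(*ranges.values()):
        params = dict(zip(ranges.keys(), combo))
        results[combo] = eval_metric(cmd_template.format(**params))
    return results

# Stand-in command so the sketch is self-contained; a real harness would
# substitute something like "./eval.sh --k {k}" that emits one number.
scores = sweep({"k": [1, 5]}, "echo {k}")
```

Keeping the metric behind a single shell command is what makes the harness automated: the loop never needs human judgment between evaluations.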
Anti-Patterns
Single-hypothesis experiments (only testing "is the parameter optimal?")
Sweeping dead code parameters (verify the hot path first)
Optimizing metrics that don't correlate with user satisfaction
No ablation (never tested if the feature matters at all)
Insufficient data (<30 points per config = noise)
No baseline (always include current values as comparison)
Proposing a nerd-loop on parameters that have no automated metric — use analytical batch analysis instead
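Several of these anti-patterns disappear if the sweep's arms are declared up front with the baseline and an ablation arm built in. A sketch with hypothetical parameter names:

```python
# Always include the current value (baseline) and a no-feature arm (ablation)
# alongside the candidate configs, and enforce a per-config sample floor.
CURRENT_K = 10
MIN_POINTS = 30  # fewer than ~30 points per config is noise

arms = {
    "baseline": {"k": CURRENT_K, "feature_on": True},
    "ablation": {"k": CURRENT_K, "feature_on": False},
    **{f"k={k}": {"k": k, "feature_on": True} for k in (1, 5, 20)},
}
```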