A/B test CLAUDE.md instruction changes against eval benchmarks. Capture baselines, test variants, compare results.
Systematically improve Mycelium instructions through measurement. Adapted from n-trax.
baseline -- Capture current performance
  Run: /eval-runner run-all
  Output: .claude/optimization/baseline.json, containing timestamp, CLAUDE.md hash, and overall and per-category metrics (pass_rate, avg_iterations, avg_time)

test <variant> -- Test a variant
  Variant: .claude/optimization/variants/<variant>.md
  Run: /eval-runner run-all
  Output: .claude/optimization/results/<variant>.json

report -- Compare all variants
  Generate a comparison table: variant, pass rate, delta vs baseline, avg iterations, decision (keep/revert).
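
As a rough illustration of the baseline and report steps, the Python sketch below builds a baseline record and derives comparison rows from it. The JSON layout (`overall`, `per_category`, `claude_md_hash`), the SHA-256 hash, the function names, and the keep/revert rule (keep only when pass rate beats the baseline) are all assumptions made for this example; the source does not specify them, and the numbers are illustrative inputs, not measured results.

```python
import hashlib
from datetime import datetime, timezone


def make_baseline(claude_md_text, overall, per_category):
    """Build a baseline record: timestamp, CLAUDE.md hash, and metrics."""
    digest = hashlib.sha256(claude_md_text.encode("utf-8")).hexdigest()
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "claude_md_hash": digest,  # assumed field name
        "overall": overall,
        "per_category": per_category,
    }


def compare(baseline, results, min_delta=0.0):
    """Return one report row per variant, best pass rate first."""
    base_rate = baseline["overall"]["pass_rate"]
    rows = []
    for name, result in results.items():
        rate = result["overall"]["pass_rate"]
        delta = rate - base_rate
        rows.append({
            "variant": name,
            "pass_rate": rate,
            "delta": delta,
            "avg_iterations": result["overall"]["avg_iterations"],
            # Assumed rule: keep only if pass rate improves by more than min_delta.
            "decision": "keep" if delta > min_delta else "revert",
        })
    return sorted(rows, key=lambda r: r["pass_rate"], reverse=True)


def format_table(rows):
    """Render the rows as a plain-text comparison table."""
    lines = ["variant | pass rate | delta | avg iterations | decision"]
    for r in rows:
        lines.append(
            f"{r['variant']} | {r['pass_rate']:.2f} | {r['delta']:+.2f} | "
            f"{r['avg_iterations']:.1f} | {r['decision']}"
        )
    return "\n".join(lines)


# Illustrative inputs only (hypothetical variants and metrics).
baseline = make_baseline(
    "# CLAUDE.md contents",
    overall={"pass_rate": 0.70, "avg_iterations": 2.4, "avg_time": 95.0},
    per_category={},
)
results = {
    "terse-rules": {"overall": {"pass_rate": 0.80, "avg_iterations": 2.0, "avg_time": 88.0}},
    "long-preamble": {"overall": {"pass_rate": 0.65, "avg_iterations": 2.9, "avg_time": 110.0}},
}
rows = compare(baseline, results)
print(format_table(rows))
```

In practice the baseline and each variant result would be loaded from the JSON files listed above rather than built inline. A stricter `min_delta` may be worth considering if eval runs are noisy.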

exemplar <eval-name> -- Capture winning trajectory
  After a clean eval win (1 iteration, fast), save the approach to .claude/optimization/exemplars/.
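
One way to make the "clean win" check concrete is sketched below in Python. The result fields (`passed`, `iterations`, `time`), the 60-second cutoff for "fast", and the `<eval-name>.md` file name are hypothetical choices for illustration; the source only says a clean win means 1 iteration and a fast run.

```python
from pathlib import Path


def is_clean_win(result, max_seconds=60.0):
    """True when the eval passed in one iteration within the assumed time cutoff."""
    return (
        bool(result["passed"])
        and result["iterations"] == 1
        and result["time"] <= max_seconds  # assumed definition of "fast"
    )


def save_exemplar(eval_name, approach, exemplar_dir=".claude/optimization/exemplars"):
    """Write the winning approach to <exemplar_dir>/<eval_name>.md (assumed naming)."""
    path = Path(exemplar_dir) / f"{eval_name}.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(approach, encoding="utf-8")
    return path
```

A caller would check `is_clean_win(result)` first and call `save_exemplar` only when it returns true, so that slow or multi-iteration runs are never stored as exemplars.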