Autonomous experiment loop: hypothesize > modify > test > evaluate > keep/discard > repeat. Run N experiments automatically with measurable metrics. Works for performance optimization, A/B testing, prompt engineering, and any measurable improvement task.
Autonomous, iterative improvement inspired by Karpathy's autoresearch methodology. Define a metric, set a target, and let the loop run until the target is met or the iteration limit is reached.
1. HYPOTHESIZE -> Form a specific, falsifiable improvement hypothesis
2. MODIFY -> Apply the minimal code/config/prompt change
3. TEST -> Run the measurement suite (benchmarks, tests, evals)
4. EVALUATE -> Compare result against baseline and previous best
5. DECIDE -> KEEP if better, DISCARD (git reset --hard) if worse
Repeat steps 1-5 until target met OR max_iterations reached
Each iteration is atomic: one hypothesis, one change, one measurement, one decision.
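The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the tool's actual implementation: `run_loop`, `Result`, and the four callbacks (`propose`, `apply_change`, `measure`, `revert`) are hypothetical names, and the toy callbacks mutate an in-memory score instead of editing files and calling git.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Result:
    iteration: int
    hypothesis: str
    score: float
    kept: bool


def run_loop(
    baseline: float,
    target: float,
    max_iterations: int,
    propose: Callable[[int], str],        # HYPOTHESIZE
    apply_change: Callable[[str], None],  # MODIFY
    measure: Callable[[], float],         # TEST
    revert: Callable[[], None],           # DISCARD
    higher_is_better: bool = True,
) -> Tuple[float, List[Result]]:
    """Run atomic iterations until the target is met or the limit is hit."""
    best = baseline
    history: List[Result] = []

    def better(a: float, b: float) -> bool:
        return a > b if higher_is_better else a < b

    def target_met(score: float) -> bool:
        return score >= target if higher_is_better else score <= target

    for i in range(1, max_iterations + 1):
        if target_met(best):
            break
        hypothesis = propose(i)
        apply_change(hypothesis)
        score = measure()
        kept = better(score, best)   # EVALUATE against the previous best
        if kept:
            best = score             # KEEP
        else:
            revert()                 # DISCARD
        history.append(Result(i, hypothesis, score, kept))
    return best, history


# Toy stand-ins (assumptions): each "change" shifts an in-memory score.
state = {"score": 10.0, "backup": 10.0}
candidates = {
    "cache results": 2.0,
    "add logging": -3.0,
    "batch requests": 1.5,
    "inline helper": 4.0,
}
names = list(candidates)


def propose(i: int) -> str:
    return names[(i - 1) % len(names)]


def apply_change(name: str) -> None:
    state["backup"] = state["score"]
    state["score"] += candidates[name]


def measure() -> float:
    return state["score"]


def revert() -> None:
    state["score"] = state["backup"]


best, history = run_loop(10.0, 15.0, 10, propose, apply_change, measure, revert)
for r in history:
    print(r.iteration, r.hypothesis, r.score, "KEEP" if r.kept else "DISCARD")
```

In a real run, `apply_change` would edit code, config, or a prompt, `measure` would run the benchmark or eval suite, and `revert` would restore the working tree. Passing `higher_is_better=False` covers metrics such as latency, where lower values win.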
Define an experiment in your task or in thoughts/EXPERIMENTS.md: