Set up and run an autonomous experiment loop for any optimization target. Gathers what to optimize, then starts the loop immediately. Use when asked to "run autoresearch", "optimize X in a loop", "set up autoresearch for X", or "start experiments".
Autonomous experiment loop: try ideas, keep what works, discard what doesn't, never stop.
## Tools

- `init_experiment` — configure session (name, metric, unit, direction). Call again to re-initialize with a new baseline when the optimization target changes.
- `run_experiment` — runs the command, times it, captures output.
- `log_experiment` — records the result. `keep` auto-commits. `discard`/`crash`/`checks_failed` auto-reverts code changes (autoresearch files preserved). Always include a secondary metrics dict. Dashboard: ctrl+x.

## Setup

1. `git checkout -b autoresearch/<goal>-<date>`
2. Create `autoresearch.md` and `autoresearch.sh` (see below). Commit both.
3. `init_experiment` → run baseline → `log_experiment` → start looping immediately.

## autoresearch.md

This is the heart of the session. A fresh agent with no context should be able to read this file and run the loop effectively. Invest time making it excellent.
# Autoresearch: <goal>
## Objective
<Specific description of what we're optimizing and the workload.>
## Metrics
- **Primary**: <name> (<unit>, lower/higher is better)
- **Secondary**: <name>, <name>, ...
## How to Run
`./autoresearch.sh` — outputs `METRIC name=number` lines.
## Files in Scope
<Every file the agent may modify, with a brief note on what it does.>
## Off Limits
<What must NOT be touched.>
## Constraints
<Hard rules: tests must pass, no new deps, etc.>
## What's Been Tried
<Update this section as experiments accumulate. Note key wins, dead ends,
and architectural insights so the agent doesn't repeat failed approaches.>
Update autoresearch.md periodically — especially the "What's Been Tried" section — so resuming agents have full context.
## autoresearch.sh

Bash script (`set -euo pipefail`) that: pre-checks fast (catches syntax errors in <1s), runs the benchmark, and outputs `METRIC name=value` lines to stdout. These lines are automatically parsed by `run_experiment` — the primary metric (matching `init_experiment`'s `metric_name`) and any secondary metrics are extracted, shown in the TUI, and suggested as exact values for `log_experiment`. If no METRIC lines are found, the agent falls back to manually extracting values from the output. Keep the script fast — every second is multiplied by hundreds of runs. Update it during the loop as needed.
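A minimal sketch of such a script. The workload (sorting 100k numbers) and metric names are illustrative stand-ins — replace them with the real benchmark command and the metric names passed to `init_experiment`:

```shell
#!/bin/bash
set -euo pipefail

# Fast pre-check: verify required tools exist before paying for the benchmark
command -v seq >/dev/null && command -v sort >/dev/null

# Stand-in benchmark workload (replace with the real command)
start=$(date +%s%N)
seq 100000 | sort -rn > /dev/null
end=$(date +%s%N)
elapsed_ms=$(( (end - start) / 1000000 ))

# Emit metrics in the exact "METRIC name=value" form that run_experiment parses
echo "METRIC sort_time_ms=${elapsed_ms}"
echo "METRIC lines_processed=100000"
```

Keeping the timing inside the script (rather than relying on `run_experiment`'s wall clock) lets you exclude pre-check overhead from the primary metric.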
## autoresearch.config.json (optional)

JSON config file that lives in the pi session's working directory (`ctx.cwd`). Supported fields:

- `maxIterations` (number) — maximum experiments before auto-stopping.
- `workingDir` (string) — override the directory for all autoresearch operations: file I/O (`autoresearch.jsonl`, `autoresearch.md`, `autoresearch.sh`, `autoresearch.checks.sh`, `autoresearch.ideas.md`), command execution, and git operations. Supports absolute paths or relative paths (resolved against `ctx.cwd`). The config file itself always stays in `ctx.cwd`. Fails if the directory doesn't exist.

```json
{
  "workingDir": "/path/to/project",
  "maxIterations": 50
}
```
## autoresearch.checks.sh (optional)

Bash script (`set -euo pipefail`) for backpressure/correctness checks: tests, types, lint, etc. Only create this file when the user's constraints require correctness validation (e.g., "tests must pass", "types must check").

When this file exists:

- It runs automatically as part of every `run_experiment`.
- Failing checks are reported clearly by `run_experiment` — log the result as `checks_failed`.
- Never `keep` a result when checks have failed.
- Checks are subject to a timeout (`checks_timeout_seconds`).

When this file does not exist, everything behaves exactly as before — no changes to the loop.
Keep output minimal. Only the last 80 lines of checks output are fed back to the agent on failure. Suppress verbose progress/success output and let only errors through. This keeps context lean and helps the agent pinpoint what broke.
```bash
#!/bin/bash
set -euo pipefail
# Run tests — dot reporter keeps success output minimal
pnpm test --run --reporter=dot 2>&1 | tail -50
# Typecheck — print only error lines on failure, then fail the script
# (a plain `| grep -i error || true` would swallow the failing exit code)
tc_out=$(pnpm typecheck 2>&1) || { echo "$tc_out" | grep -i error || true; exit 1; }
```
LOOP FOREVER. Never ask "should I continue?" — the user expects autonomous work.

- Better on the primary metric → `keep`. Worse/equal → `discard`. Secondary metrics rarely affect this decision.
- On resume, if `autoresearch.md` exists, read it plus the git log, then continue looping.
- NEVER STOP. The user may be away for hours. Keep going until interrupted.
When you discover complex but promising optimizations that you won't pursue right now, append them as bullets to autoresearch.ideas.md. Don't let good ideas get lost.
On resume (context limit, crash), check autoresearch.ideas.md — prune stale/tried entries, experiment with the rest. When all paths are exhausted, delete the file and write a final summary.
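Appending to the ideas file is a one-liner; the entry text below is a hypothetical example:

```shell
#!/bin/bash
set -euo pipefail

# Record a promising-but-deferred idea so it isn't lost (entry is illustrative)
cat >> autoresearch.ideas.md <<'EOF'
- Replace the per-row loop with a vectorized pass — needs a data-layout refactor, revisit later.
EOF
```

Using `>>` means repeated appends accumulate bullets rather than overwriting earlier ideas.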
If the user sends a message while an experiment is running, finish the current run_experiment + log_experiment cycle first, then incorporate their feedback in the next iteration. Don't abandon a running experiment.