Autonomous experiment loop — pick a metric (test speed, bundle size, build time, type-check time, etc.), then iteratively optimize it by making changes, measuring, keeping improvements, and reverting regressions. Usage: /autoresearch [optimization goal]
Run an autonomous experiment loop to optimize $ARGUMENTS in this monorepo.
Important: this is a Go backend + Next.js frontend monorepo using Nix/direnv. All build/test/lint commands require the Nix environment. If direnv is not active, prefix commands with `direnv exec .`.
If $ARGUMENTS is empty or vague, ask the user for:
- The benchmark command (e.g. `gotest`, `cd backend/ && go test ./internal/domain/codereview/...`, `cd apps/app && pnpm build`)
- The metric and its direction (lower or higher)
- The files/directories in scope

If $ARGUMENTS is clear enough, infer reasonable defaults and confirm with the user before proceeding. For example:
- `/autoresearch codereview test speed` → command: `cd backend/ && go test ./internal/domain/codereview/...`, metric: wall-clock seconds (lower is better), scope: `backend/internal/domain/codereview/`
- `/autoresearch app bundle size` → command: `cd apps/app && pnpm build`, metric: bundle size bytes (lower is better), scope: `apps/app/`
- `/autoresearch go build time` → command: `cd backend/ && go build ./...`, metric: wall-clock seconds (lower is better), scope: `backend/`

Look for autoresearch.jsonl and autoresearch.md in the repo root (they may be on disk but gitignored from a previous session).
- If they exist and match the current goal: read autoresearch.md to understand what's been tried, and read autoresearch.jsonl to reconstruct experiment history. Summarize progress so far, then skip to Phase 2 (the loop). Do NOT re-run setup or overwrite these files.
- If they exist but belong to a different goal: move them to .autoresearch/<old-goal-slug>/ for reference, then start fresh.

Create a branch for this optimization work. Use the repo's branch-create skill:
/branch-create chore/autoresearch-<short-goal-slug>
If already on a non-main branch (e.g. resuming), skip branch creation.
Before writing anything, deeply read all files in scope. Understand the architecture, patterns, and dependencies. This investment pays off — blind changes waste experiment cycles.
Create autoresearch.md at the repo root with this template:
# Autoresearch: <goal>
## Objective
<one-paragraph description of what we're optimizing and why>
## Metric
- **Primary**: <metric name> (<unit>) — <lower|higher> is better
- **Secondary**: <any secondary metrics to track, e.g. "test count must not decrease">
## Command
\`\`\`bash
<the benchmark command>
\`\`\`
## Checks Command
\`\`\`bash
<correctness checks command, if any — e.g. gotest, golint, cd apps/app && pnpm type-check>
\`\`\`
## Files in Scope
<list of directories/files that may be modified>
## Off Limits
<files/patterns that must not be touched>
## Constraints
<things that must not break>
## Baseline
- **Value**: <filled after first run>
- **Date**: <filled after first run>
## What's Been Tried
<filled as experiments run — keep this updated>
## Dead Ends
<approaches that were tried and reverted — don't repeat these>
Create autoresearch.sh at the repo root:
```bash
#!/usr/bin/env bash
set -euo pipefail
<the benchmark command>
```
Make it executable: chmod +x autoresearch.sh
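As a sketch, a filled-in autoresearch.sh for a test-speed goal might time the command and print the metric on its last line so it is easy to parse. The `BENCH_CMD` placeholder and the `METRIC_MS=` output convention below are illustrative assumptions, not anything this workflow prescribes:

```shell
#!/usr/bin/env bash
set -euo pipefail
# Illustrative benchmark script: BENCH_CMD stands in for the real command
# (e.g. "cd backend/ && go test -count=1 ./..."). Printing the metric as
# the final KEY=VALUE line makes extraction trivial.
BENCH_CMD=${BENCH_CMD:-"true"}
start=$(date +%s%N)          # nanoseconds since epoch (GNU date)
bash -c "$BENCH_CMD"
end=$(date +%s%N)
echo "METRIC_MS=$(( (end - start) / 1000000 ))"
```

This assumes GNU `date` (available in the Nix environment); BSD `date` does not support `%N`.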
If the user specified constraints that require correctness validation, also create autoresearch.checks.sh:
```bash
#!/usr/bin/env bash
set -euo pipefail
<the correctness check command(s)>
```
Make it executable: chmod +x autoresearch.checks.sh
Default checks: if the user doesn't specify a checks command, create a lightweight autoresearch.checks.sh scoped to the files being optimized — not /preflight (which is too heavy for every iteration). Choose checks that validate correctness without duplicating the benchmark:
- Backend: `cd backend/ && go vet ./... && golangci-lint run` (skip `go test` if the benchmark already runs tests)
- Frontend: `cd apps/app && pnpm type-check` (skip `pnpm build` if the benchmark already builds)
- Lint: `golint` for Go changes, `lint` for frontend changes

Reserve /preflight for the periodic full validation (step 2.6) and the final stopping check.
Warm-up: run ./autoresearch.sh once and discard the result. The first run of Go commands is always slower due to compilation cache warming and module resolution. Skipping warm-up inflates the baseline.
Run ./autoresearch.sh and extract the metric value. Record this as the baseline.
Log the baseline to autoresearch.jsonl (one JSON object per line):
```jsonl
{"type":"config","name":"<goal>","metric_name":"<name>","metric_unit":"<unit>","direction":"<lower|higher>","timestamp":"<ISO 8601>"}
{"run":1,"metric":<value>,"status":"keep","description":"baseline","timestamp":"<ISO 8601>"}
```
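A minimal way to append such records from the loop is a small shell helper. The `log_run` function below is hypothetical (not part of this workflow) and assumes the description contains no double quotes or backslashes:

```shell
#!/usr/bin/env bash
# Hypothetical helper: appends one experiment record per line.
# Args: run number, metric value, status, description.
# Assumes the description needs no JSON escaping.
log_run() {
  printf '{"run":%d,"metric":%s,"status":"%s","description":"%s","timestamp":"%s"}\n' \
    "$1" "$2" "$3" "$4" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> autoresearch.jsonl
}

log_run 1 12.4 keep "baseline"
```

For descriptions that may contain special characters, `jq -cn` would be a safer serializer.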
Update autoresearch.md with the baseline value.
Commit these session files:
```bash
git add -f autoresearch.md autoresearch.sh autoresearch.jsonl
git add -f autoresearch.checks.sh 2>/dev/null || true
git commit -m "chore: autoresearch baseline for <goal>"
```
Note: session files are gitignored (autoresearch.* in .gitignore), so git add -f is needed to force-track them during the session. They get untracked at stop time.
Each iteration:
- Read autoresearch.md — especially "What's Been Tried" and "Dead Ends" — to avoid repeating failed approaches.
- Make a focused code change. Prefer small, isolated changes — they're easier to reason about and attribute metric changes to.
Rules:
Run the benchmark:
```bash
time ./autoresearch.sh 2>&1
```
Extract the primary metric from the output. Then run correctness checks:
```bash
./autoresearch.checks.sh 2>&1
```
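Extracting the metric can be as simple as parsing the benchmark's last output line. This sketch assumes the script ends with a `KEY=VALUE` line, which is an illustrative convention, not something autoresearch.sh is guaranteed to do:

```shell
#!/usr/bin/env bash
# Stand-in for: out=$(./autoresearch.sh 2>&1)
out=$'ok  \tgithub.com/example/pkg\t1.234s\nMETRIC_MS=1234'
# Take the last line and keep everything after the "=".
metric=$(printf '%s\n' "$out" | tail -n1 | cut -d= -f2)
echo "metric=$metric"
```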
Compare the metric to the previous best value:
| Outcome | Action |
|---|---|
| Metric improved | KEEP — stage and commit the change |
| Metric equal + simpler | KEEP — less code/complexity at same perf is a win |
| Metric equal (no simplification) or worse | DISCARD — revert all changes |
| Benchmark crashed | DISCARD — revert all changes |
| Checks failed | DISCARD — revert all changes, even if metric improved |
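For a lower-is-better metric, the numeric part of the decision table can be sketched as follows (all variable names are hypothetical; `awk` handles the floating-point comparison that plain `[` cannot):

```shell
#!/usr/bin/env bash
best=10.0          # best metric so far
metric=9.2         # this run's metric
checks_ok=true     # did autoresearch.checks.sh pass?

if [ "$checks_ok" != true ]; then
  decision=discard                # checks failed: revert even if faster
elif awk -v m="$metric" -v b="$best" 'BEGIN{exit !(m < b)}'; then
  decision=keep                   # strictly better: commit, update best
  best=$metric
else
  decision=discard                # equal or worse: revert
fi
echo "$decision"
```

The "metric equal + simpler" row is a judgment call about code complexity and is not captured by this numeric comparison.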
Keep (metric improved):
```bash
git add -A
git commit -m "perf: <short description of what changed>

Autoresearch run #<N>: <metric_name> <old_value> → <new_value> <unit> (<percentage>% improvement)"
```
Discard (anything else):
```bash
git checkout -- .
git clean -fd
```
Append to autoresearch.jsonl:
```jsonl
{"run":<N>,"metric":<value>,"status":"<keep|discard|crash|checks_failed>","description":"<what was tried>","timestamp":"<ISO 8601>"}
```
Update autoresearch.md:
Every 5 kept experiments, run /preflight as a full validation pass. This catches accumulated drift in codegen, formatting, or lint that individual checks might miss. If preflight fails, fix the issues and amend the last commit before continuing.
Go back to step 2.1. Do not stop. Do not ask for permission to continue.
Keep autoresearch.md current. This is the session's memory. A fresh agent should be able to resume from this file alone.

This is a Go backend + Next.js frontend monorepo using Nix/direnv. Key patterns:
- All repo scripts (`gotest`, `golint`, `bufgen`, etc.) need Nix/direnv. If direnv is not active, prefix with `direnv exec .`
- Backend (`backend/`):
  - Domain layout: `internal/domain/<name>/{service/,worker/,db/,invoke/}`
  - Tests: `gotest` (Nix script, runs `go test ./...` from `backend/`) or `cd backend/ && go test ./internal/domain/<name>/...`
  - Integration tests: `cd backend/ && go test -tags=integration ./...`
  - Lint: `golint` / `golint-fix`
  - Build: `cd backend/ && go build ./...`
- Frontend (`apps/app/`):
  - Tests: `cd apps/app && pnpm test`
  - Build: `cd apps/app && pnpm build`
  - Type-check: `type-check` (Nix script)
  - Lint: `lint` (Nix script)
- Useful benchmark commands: `gotest` or `cd backend/ && go test ./internal/domain/<name>/...`, `cd backend/ && go build ./...`, `golint`, `cd apps/app && pnpm build` → check `.next/` artifacts, `type-check`
- Go test caching: use `-count=1` to disable caching (e.g. `go test -count=1 ./...`), otherwise metrics will be misleading after unchanged runs
- Codegen order: `bufgen` → `sqlc generate` → `gomocks` (run in order when changing APIs)
- Generated code is off limits: `db/` dirs (sqlc-generated), `pkg/proto/` (buf-generated), `mock_*.go` (gomocks-generated)
- `packages/shared/*` can affect multiple frontend consumers — be cautious
- Follow existing patterns (see `backend/AGENTS.md`)

When resuming (autoresearch.md + autoresearch.jsonl exist):
- Read autoresearch.md — understand the goal, what's been tried, and dead ends
- Read autoresearch.jsonl — reconstruct the full experiment history and current best metric

The user can interrupt at any time. When they do:
- Run /preflight to ensure the final state is clean (codegen, lint, format, tests all pass)
- Finalize autoresearch.md and prepare a summary to include in the PR body (step 7).
- Untrack the session files:

```bash
git rm --cached autoresearch.md autoresearch.sh autoresearch.jsonl
git rm --cached autoresearch.checks.sh 2>/dev/null || true
git commit -m "chore: untrack autoresearch session files"
```
The files remain in the working directory (already gitignored via `autoresearch.*` in `.gitignore`). They are also recoverable from earlier commits on the branch via `git show <commit>:autoresearch.md`.

- Run /changelog-add to generate a changelog entry summarizing the optimization work
- Run /create-pr to open a pull request. Include the session history from step 3 as a collapsible `<details>` section in the PR body under "## Autoresearch Log". The /create-pr skill will handle the conventional commit title and ask for user approval. Since /preflight and /changelog-add already ran, the create-pr pre-flight checks should pass automatically.