Autonomous experiment loop: agent iterates on code, runs experiments, keeps improvements, discards failures, repeats indefinitely. Use for rapid metric optimization — model training (val_loss, accuracy), performance tuning (latency, throughput), or any measurable objective. Inspired by Karpathy's autoresearch.
Run an autonomous cycle of rapid experiments against a single target metric. Each iteration: hypothesize → modify code → run → evaluate → keep or discard. No human in the loop until manually interrupted.
Does not replace apm-ds-exp. That skill is for single, carefully-planned experiments with quality gates and user approval. This skill is for high-volume autonomous iteration.
Agree on these parameters before starting:
- Tag: a short name for this run (e.g. apr2). Branch: autoresearch/<tag>.
- Metric: the target metric and its direction (minimize or maximize).
- Runner: the command that runs one experiment (e.g. uv run train.py, python benchmark.py, npm run perf-test).
- Extraction: how to pull the metric from output (e.g. grep "^val_loss:" run.log). If the metric is not printed in a greppable format, agree on a parsing approach.

Setup:

1. Verify the autoresearch/<tag> branch does not exist. Create it from current HEAD.
2. Create results.tsv with a header row (see Results tracking below).
3. Run the baseline and record it in results.tsv. This is the starting point for all comparisons.

LOOP FOREVER:

1. Review results.tsv and recent git history. Identify patterns: what worked, what failed, what's unexplored.
2. Hypothesize one change, apply it, and git commit with a concise message describing the change.
3. Run <runner> > run.log 2>&1. Redirect everything — do NOT use tee or let output flood your context.
4. Extract the metric from run.log.
5. On crash: tail -n 50 run.log for the error. If it's a simple fix (typo, import, off-by-one), fix and re-run. If the idea is fundamentally broken, log as crash and move on.
6. Record the result in results.tsv.
7. If the metric improved, keep: this commit becomes keep_ref — the new baseline for future comparisons.
8. Otherwise discard: git reset --hard <keep_ref> to roll back to the last kept commit. Do not use HEAD~1 — there may be multiple commits since the last keep (e.g. crash fix attempts).
9. If stuck (a keep led to a local optimum that blocks further progress), you may rewind past it to an earlier keep_ref from results.tsv. Do this very sparingly — it discards validated improvements.

All else equal, simpler is better.
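The setup-plus-one-iteration cycle can be sketched as a self-contained shell demo. Everything specific here is an assumption for illustration: the `demo` tag, a stand-in `train.sh` runner that prints `val_loss: <float>` (minimized), and the hard-coded metric values — substitute the agreed tag, runner, and extraction command.

```shell
#!/bin/sh
set -eu
# Sandbox demo in a throwaway repo; train.sh stands in for the real runner
# and prints the metric in the agreed greppable format.
dir=$(mktemp -d); cd "$dir"
git init -q && git config user.email demo@demo && git config user.name demo
git checkout -qb autoresearch/demo

printf 'commit\tval_loss\tstatus\tdescription\n' > results.tsv

# Baseline: run, extract the metric, record the first keep.
printf 'echo "val_loss: 0.9979"\n' > train.sh
git add train.sh && git commit -qm baseline
sh train.sh > run.log 2>&1                       # redirect everything
best=$(grep '^val_loss:' run.log | awk '{print $2}')
keep_ref=$(git rev-parse --short HEAD)
printf '%s\t%s\tkeep\tbaseline\n' "$keep_ref" "$best" >> results.tsv

# One experiment: modify, commit, run, evaluate, keep or discard.
printf 'echo "val_loss: 0.9932"\n' > train.sh    # pretend improvement
git commit -qam "increase LR to 0.04"
sh train.sh > run.log 2>&1
metric=$(grep '^val_loss:' run.log | awk '{print $2}')
# Keep iff strictly lower, since this metric is minimized.
if awk -v m="$metric" -v b="$best" 'BEGIN{exit !(m < b)}'; then
    status=keep; keep_ref=$(git rev-parse --short HEAD); best=$metric
else
    status=discard
fi
printf '%s\t%s\t%s\t%s\n' "$(git rev-parse --short HEAD)" "$metric" \
    "$status" "increase LR to 0.04" >> results.tsv
[ "$status" = keep ] || git reset --hard "$keep_ref"

cat results.tsv
```

Note the discard path resets to `$keep_ref`, not `HEAD~1`, exactly because crash-fix attempts may have stacked extra commits since the last keep.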
Weigh complexity cost against improvement magnitude on every keep/discard decision.
Results tracking: results.tsv — tab-separated, not committed to git (leave untracked).
Required columns: commit, the primary metric, status, description. Beyond these, add any secondary metrics that help interpret results — decide based on the task. For DL: peak memory, training time, MFU, total tokens, num params. For dev: p50/p99 latency, throughput, binary size. Use judgment.
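Appending a row is a single printf. A minimal sketch — the column set and values here follow the DL header example below; adapt both to whatever columns were agreed:

```shell
# Append one experiment row to results.tsv (tab-separated, untracked).
# Columns assumed: commit, val_bpb, peak_vram_gb, mfu_pct, status, description.
printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
    "b2c3d4e" "0.9932" "44.2" "40.1" "keep" "increase LR to 0.04" \
    >> results.tsv
```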
Header example (DL task):

```
commit   val_bpb  peak_vram_gb  mfu_pct  status   description
a1b2c3d  0.9979   44.0          39.8     keep     baseline
b2c3d4e  0.9932   44.2          40.1     keep     increase LR to 0.04
c3d4e5f  1.0050   44.0          38.5     discard  switch to GeLU activation
d4e5f6g  0.0000   0.0           0.0      crash    double model width (OOM)
```
Header example (dev task):

```
commit   p99_ms  throughput_rps  status  description
a1b2c3d  142     3200            keep    baseline
b2c3d4e  118     3450            keep    switch to connection pooling
```
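The review step at the top of each iteration — and locating an earlier keep_ref for a rare rewind — reduces to a couple of awk one-liners. A sketch: the column positions ($1 = commit, $5 = status) assume the DL header example; the rows written here are just that example, made self-contained:

```shell
# Recreate the DL example rows so the snippet runs standalone.
printf 'commit\tval_bpb\tpeak_vram_gb\tmfu_pct\tstatus\tdescription\n' > results.tsv
printf 'a1b2c3d\t0.9979\t44.0\t39.8\tkeep\tbaseline\n' >> results.tsv
printf 'b2c3d4e\t0.9932\t44.2\t40.1\tkeep\tincrease LR to 0.04\n' >> results.tsv
printf 'c3d4e5f\t1.0050\t44.0\t38.5\tdiscard\tswitch to GeLU activation\n' >> results.tsv

# Kept commits, oldest first: the last is the current baseline; the one
# before it is the rewind target if the latest keep is a local optimum.
keeps=$(awk -F'\t' '$5 == "keep" { print $1 }' results.tsv)
echo "$keeps"

# Status counts: a quick view of keep/discard/crash ratios so far.
awk -F'\t' 'NR > 1 { n[$5]++ } END { for (s in n) print s, n[s] }' results.tsv
```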
Column notes:

- commit: short git hash (7 chars)
- status: keep, discard, or crash
- description: short text — what this experiment tried
- Use 0 / 0.0 for metrics on crashed runs

Crash handling: use your judgment. If the crash is something dumb and easy to fix (typo, missing import, shape mismatch, off-by-one) — fix it and re-run the same idea. If the idea itself is fundamentally broken (OOM on a model that's 10× too large, an approach that can't converge) — skip it, log crash in the TSV, revert, and move on. Do not waste iterations on a dead end.
If a per-run budget is defined and a run exceeds 2× that budget — kill the process and treat it as a crash.
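The 2× cutoff can be enforced mechanically with coreutils timeout(1). A sketch — the 1-second budget and the `sleep 10` stand-in runner are placeholders chosen so the kill path actually fires; in the real loop, use the agreed budget and runner:

```shell
# Kill any run that exceeds twice the agreed per-run budget.
BUDGET_S=1
status=0
timeout "$((2 * BUDGET_S))" sleep 10 > run.log 2>&1 || status=$?
if [ "$status" -eq 124 ]; then   # timeout(1) exits 124 when it kills the command
    echo "run exceeded 2x budget: log as crash, revert, move on"
fi
```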
Never commit results.tsv — it stays untracked.