스킬 파일

Autoresearch: Autonomous Iterative Experimentation

Name: Autoresearch: Autonomous Iterative Experimentation
Author: tinspham209

Autonomous iterative experimentation loop for any programming task. Guides the user through defining goals, measurable metrics, and scope constraints, then runs an autonomous loop of code changes, testing, measuring, and keeping/discarding results. Inspired by Karpathy's autoresearch. USE FOR: autonomous improvement, iterative optimization, experiment loop, auto research, performance tuning, automated experimentation, hill climbing, try things automatically, optimize code, run experiments, autonomous coding loop, improve Next.js build, reduce bundle size, speed up pnpm build, optimize Sanity GROQ queries, improve Lighthouse score. DO NOT USE FOR: one-shot tasks, simple bug fixes, code review, or tasks without a measurable metric.

tinspham2090 스타2026. 3. 31.

직업
카테고리: 실험실 도구

스킬 내용

An autonomous experimentation loop for any programming task. You define the goal and how to measure it; the agent iterates autonomously -- modifying code, running experiments, measuring results, and keeping or discarding changes -- until interrupted.

This skill is inspired by Karpathy's autoresearch, generalized from ML training to any programming task with a measurable outcome.

Inut Design Project Context

This skill is running inside the Inut Design codebase — a Next.js 12 production e-commerce site (laptop skins, stickers, print services) based in Da Nang, Vietnam.

Stack: Next.js 12 pages router · React 18 · TypeScript (strict: false) · MUI v5 · Sanity v2 · Zustand · pnpm

Always use pnpm — never npm or yarn. Commands: pnpm lint, pnpm build, pnpm build:analyze.

Protected Flows — Never Modify Without Explicit Approval

Autoresearch: Autonomous Iterative Experimentation

tinspham2090 스타2026. 3. 31.

직업
카테고리: 실험실 도구

스킬 내용

This skill is inspired by Karpathy's autoresearch, generalized from ML training to any programming task with a measurable outcome.

Inut Design Project Context

This skill is running inside the Inut Design codebase — a Next.js 12 production e-commerce site (laptop skins, stickers, print services) based in Da Nang, Vietnam.

Stack: Next.js 12 pages router · React 18 · TypeScript (strict: false) · MUI v5 · Sanity v2 · Zustand · pnpm

Always use pnpm — never npm or yarn. Commands: pnpm lint, pnpm build, pnpm build:analyze.

Protected Flows — Never Modify Without Explicit Approval

관련 스킬

Protected area	File(s)
Cart state + localStorage key `inut-lighters-cart`	`store/cart/lightersCart.ts`
Pricing calculations	`utils/priceCalculator.ts`
Checkout + Sanity order write	`pages/checkout/lighters.tsx`, `api-client/sanity-client.ts`
Dual analytics (GA4 + UmamiJS)	`utils/analytics.ts`, `utils/umamiAnalytics.ts`

Goal	Command	Extract	Direction
Lint errors	`pnpm lint 2>&1 \| grep -c "error"`	number	lower
Build success	`pnpm build`	exit code 0	pass/fail
Build time (s)	`{ time pnpm build; } 2>&1 \| grep real`	seconds	lower
First Load JS (kB)	`pnpm build 2>&1 \| grep "First Load JS shared"`	kB value	lower
Bundle analysis	`pnpm build:analyze`	visual output	qualitative
Regression tests	`pnpm regression`	exit code 0	pass/fail
Lighter regression	`pnpm regression:lighter`	exit code 0	pass/fail

Create a branch: Propose a tag based on today's date (e.g., autoresearch/mar17). Create the branch: git checkout -b autoresearch/<tag>.
Read in-scope files: Read all files that are in scope to build full context of the current state.
Initialize results.tsv: Create results.tsv in the repo root with the header row:
```
experiment	commit	metric	status	description
```
Add results.tsv and run.log to .git/info/exclude (append if not already present) so they stay untracked without modifying any tracked files.
Run the baseline: Execute the metric command on the current unmodified code. Record the result as experiment 0 with status baseline in results.tsv.
Report baseline to the user:

Baseline established: [metric_name] = [value] Starting autonomous experimentation loop.

LOOP:
  1. THINK   - Analyze previous results and the current code.
               Generate an experiment hypothesis.
               Consider: what worked, what didn't, what hasn't been tried.

  2. EDIT    - Modify the in-scope file(s) to implement the idea.
               Keep changes focused and minimal per experiment.

  3. COMMIT  - git add + git commit with a short descriptive message.
               Format: "experiment: <short description of what changed>"

  4. RUN     - Execute the metric command.
               Redirect output to run.log so it does not flood the context window.
               Use shell-appropriate redirection:
               - Bash/Zsh: `<command> > run.log 2>&1`
               - PowerShell: `<command> *> run.log`

  5. MEASURE - Extract the metric from run.log.
               If extraction fails (crash/error), read the last 50 lines
               of run.log for the error.

  6. DECIDE  - Compare metric to the current best:
               - IMPROVED: Keep the commit. Update the "best" baseline.
                 Log status = "keep".
               - SAME OR WORSE: Revert. `git reset --hard HEAD~1`.
                 Log status = "discard".
               - CRASH: Attempt a quick fix (typo, import, simple error).
                 Amend the experiment commit (`git commit --amend`) with the fix
                 and rerun. The experiment keeps its original number.
                 If unfixable after 2 attempts, revert the entire experiment
                 (`git reset --hard HEAD~1`) and log status = "crash".

  7. LOG     - Append a row to results.tsv:
               experiment_number  commit_hash  metric_value  status  description

  8. CONTINUE - Go to step 1.

Print the full results.tsv as a formatted table.
Summarize:
- Total experiments run
- Experiments kept / discarded / crashed
- Starting metric (baseline) vs. final metric
- Improvement percentage
- Top 3 most impactful changes
Show the cumulative git log of kept experiments: git log --oneline <start_commit>..HEAD
Run final verification (Inut Design):
```
pnpm lint && pnpm build
```
If any touched area includes cart, checkout, or Sanity writes, also run:
```
pnpm regression:lighter
```
Recommend next steps: Based on the results, suggest what a human researcher might try next (ideas that were too risky/complex for automated experimentation).

experiment	commit	metric	status	description
0	a1b2c3d	0.997900	baseline	unmodified code
1	b2c3d4e	0.993200	keep	increase learning rate to 0.04
2	c3d4e5f	1.005000	discard	switch to GeLU activation
3	d4e5f6g	0.000000	crash	double model width (OOM)

Parameter	Value
Goal	...
Metric command	...
Metric extraction	...
Direction	lower is better / higher ...
In-scope files	...
Out-of-scope files	...
Constraints	...
Max experiments	...
Simplicity policy	...

Autoresearch: Autonomous Iterative Experimentation

Inut Design Project Context

Protected Flows — Never Modify Without Explicit Approval

Autoresearch: Autonomous Iterative Experimentation

Inut Design Project Context

Protected Flows — Never Modify Without Explicit Approval

Safe Optimization Targets

Inut Design Metric Command Cheatsheet

Agent Behavior Rules

Phase 1: Setup (Interactive)

1.1 Define the Goal

1.2 Define the Metric

1.3 Define the Scope

1.4 Define Constraints

1.5 Define the Experiment Budget (Optional)

1.6 Simplicity Criterion

1.7 Confirm Setup

Phase 2: Branch & Baseline

Phase 3: Experiment Loop

For each experiment:

Experiment Strategy

Handling Constraints

Phase 4: Reporting

Quick Reference

Results TSV Format

Git Workflow

Key Principles

Automation Audit Ops

Github Qa Labels

Jupyter Notebook

Tidb Integrationtest Recorder

Quality Nonconformance

Hugging Face Trackio