Autonomous improvement loop
Communicate with the user in their language; detect it from their messages or the system locale.
Parse the first argument after /sindri:
| Input | Action |
|---|---|
| `init` | Run Init flow below |
| `loop` | Run Loop flow below |
| `cycle` | Run Cycle flow below |
| (none) | Show available commands and help user pick |
If no argument, tell the user:

- `/sindri init` — Set up the improvement loop (explore project, design evaluate, scaffold `.sindri/`)
- `/sindri loop` — Start the continuous experiment loop
- `/sindri cycle` — Run one experiment cycle
Set up an autonomous improvement loop in the current project. The most critical output is a well-designed evaluate function.
Understand the project before asking anything. Use Bash, Glob, Grep, Read — not AskUserQuestion.
Build a mental model of:
After exploration, summarize what you found and suggest potential improvement areas. Let the user pick or refine the goal before proceeding.
Run `sindri init`, then update `.sindri/config.yaml` with what you learned:

- `artifact` — detected source directory or files
- `run` — detected from package.json, Makefile, etc.
- `timeout` — estimated from project complexity

Show the config to the user for confirmation.
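Under those defaults, the resulting config might look like this. This is a sketch: the field names come from the list above, but the values and YAML layout are assumptions, not the tool's documented schema.

```yaml
# .sindri/config.yaml — illustrative values only
artifact: src/        # detected source directory or files
run: npm test         # detected from package.json / Makefile
timeout: 300          # seconds, estimated from project complexity
```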
This is the most important step. A bad metric makes the entire loop useless.
Use AskUserQuestion to have a deep conversation. Ask informed questions based on Phase 1.
Start with the goal:
Determine measurability:
```
Can you measure the result as a number?
├── Yes: What number? (ms, %, count, KB...)
│   ├── Available immediately? → Direct measurement
│   └── Takes days/weeks? → Find a leading indicator
│       "What's the earliest signal that tells you it's working?"
│
└── No: What does "better" mean to you?
    └── "If you were reviewing this yourself, what would you check?"
        → Decompose into T/F checklist (5-10 yes/no items)
        → NEVER use numeric scores (1-10). LLM scores drift.
```
If hybrid: ask how much weight each dimension deserves.
Write the function. Patterns:
```ts
// Direct measurement — benchmark() is a project-specific helper
export function evaluate(): number {
  const ms = benchmark()
  return 1000 / ms // higher score = faster
}
```

```ts
// T/F checklist (for subjective criteria) — llm, output, and
// checklist are project-specific; score is the fraction of passes
export async function evaluate(): Promise<number> {
  const checks = await llm.checkAll(output, checklist)
  return checks.filter(Boolean).length / checks.length
}
```

```ts
// Hybrid — weights come from the conversation with the user
export async function evaluate(): Promise<number> {
  return measure() * 0.6 + (await checklist()) * 0.4
}
```
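The checklist pattern can also be made concrete without an LLM. A minimal self-contained sketch, assuming the output under test is a plain string; the three rules are hypothetical examples, not part of sindri:

```typescript
// Each check is a yes/no predicate; the score is the pass rate (0..1).
type Check = (output: string) => boolean

const checklist: Check[] = [
  o => o.length > 0,                 // produced something
  o => !o.includes("TODO"),          // no leftover TODOs
  o => o.split("\n").length <= 100,  // stays concise
]

export function evaluate(output: string): number {
  const passed = checklist.filter(check => check(output)).length
  return passed / checklist.length
}
```

Because each item is strictly true/false, repeated runs on the same output give the same score — which is exactly why the tree above forbids 1-10 style numeric ratings.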
Show the user and get explicit confirmation before moving on.
If the metric requires time to accumulate (ad CTR, A/B test, SEO), recommend a cycle interval.
Ask: "How long does it take for meaningful data to come in after a change?"
Set schedule in config.yaml (seconds between cycles):
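For instance, a daily cadence for a slow-moving metric could look like this (the `schedule` key is named above; the value is an illustrative assumption):

```yaml
schedule: 86400   # seconds between cycles — one cycle per day
```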
Tell the user to use /sindri cycle periodically instead of /sindri loop.
Fill in the Domain Context section of .sindri/agents.md:
Tell the user:
sindri is ready. Start the loop: `/sindri loop` (continuous) or `/sindri cycle` (one at a time).
Start or resume the continuous experiment loop.
- Does `.sindri/` exist? If not, run `/sindri init` first.
- Does `.sindri/evaluate.ts` contain real logic (not just `return 0`)? If not, tell the user.
- Read `.sindri/agents.md` — this is your complete operating manual.

Run exactly ONE experiment cycle and stop.
Use this for delayed-feedback domains where data needs time to accumulate between cycles.
- Does `.sindri/` exist? If not, run `/sindri init` first.
- Does `.sindri/evaluate.ts` contain real logic? If not, tell the user.
- Read `.sindri/agents.md`.
- Read `.sindri/results/<branch>.jsonl` for history.
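Reading that history file can be sketched as follows. This is a hypothetical example: JSONL means one JSON object per line, but the record fields (`cycle`, `score`) are assumptions, not a documented schema.

```typescript
import { readFileSync } from "node:fs"

// Assumed record shape — adjust to whatever the loop actually writes.
interface CycleResult {
  cycle: number
  score: number
}

export function readHistory(path: string): CycleResult[] {
  return readFileSync(path, "utf8")
    .split("\n")
    .filter(line => line.trim().length > 0) // skip blank trailing lines
    .map(line => JSON.parse(line) as CycleResult)
}
```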