Run a strict multi-iteration Reflex Loop with phases (PLAN, PRODUCE||PREPARE, EVALUATE, CRITIQUE, REFINE) to improve an artifact until quality gates pass or iteration limits are reached. Use when user asks for iterative refinement, quality-gated generation, or "generate -> critique -> refine" loops.
Run a result-focused iterative loop with strict phase contracts, evaluation rules, and persistent state between sessions.
Terminology:
- run: one loop execution (state persisted in run.json, identified by run_id)

Each iteration executes 6 phases with parallel execution where possible:
- PLAN - short plan for current iteration
- PRODUCE - produce one artifact.md ← runs in parallel with PREPARE
- PREPARE - generate check scripts and test definitions from rules ← runs in parallel with PRODUCE
- EVALUATE - run prepared checks + content rules against artifact, score result. Uses parallel Task agents for independent check groups
- CRITIQUE - precise issues + fixes (only if fail)
- REFINE - rewrite artifact using critique (only if fail)

```
      PLAN
        │
  ┌─────┴─────┐
  ↓           ↓          ← parallel (Task tool)
PRODUCE    PREPARE
(artifact) (checks)
  ↓           ↓
  └─────┬─────┘
        ↓
    EVALUATE             ← parallel check execution (Task tool)
   ┌────┼────┐
   ↓    ↓    ↓
 exec content aggregate
   └────┼────┘
        ↓
 CRITIQUE (if fail)
        ↓
  REFINE (if fail)
```
Stop when quality is good enough, no major issues remain, or iteration limit is reached.
Use exactly 3+1 files for state (where current.json exists only while a loop is active):
.ai-factory/evolution/current.json
.ai-factory/evolution/<task-alias>/run.json
.ai-factory/evolution/<task-alias>/history.jsonl
.ai-factory/evolution/<task-alias>/artifact.md
Do not create extra index files or per-iteration folder trees unless user explicitly asks.
- current.json: pointer to active loop only; delete it when loop becomes completed/stopped/failed
- run.json: single source of truth for current loop state
- history.jsonl: append-only event log (one JSON object per line)
- artifact.md: single source of truth for artifact content (written after PRODUCE and REFINE phases, never duplicated in run.json)

Parse $ARGUMENTS:
- status - show active loop status from current.json and stop
- resume [alias] - continue active loop or loop by alias
- stop [reason] - stop active loop with reason (user_stop if omitted)
- new <task> or no mode + task text - start new loop
- list - list all task aliases with status (running/stopped/completed/failed)
- history [alias] - show event history for a loop (default: active loop)
- clean [alias|--all] - remove loop files for a stopped/completed/failed loop (requires user confirmation, always confirm before deleting)

If no task and no active loop exists, ask user for task prompt.
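The dispatch above can be sketched as a small parser. This is an illustrative sketch, not part of the skill; the helper name `parse_arguments` and the tuple shape are assumptions.

```python
# Sketch of $ARGUMENTS dispatch. Mode names come from the list above;
# anything that is not a known mode is treated as task text for "new".
MODES = {"status", "resume", "stop", "new", "list", "history", "clean"}

def parse_arguments(raw: str) -> tuple[str, str]:
    """Split raw arguments into (mode, payload)."""
    parts = raw.strip().split(maxsplit=1)
    if not parts:
        # No args: caller falls through to the active loop, or asks for a task.
        return ("new", "")
    head, rest = parts[0], parts[1] if len(parts) > 1 else ""
    if head in MODES:
        return (head, rest)
    # Bare task text starts a new loop.
    return ("new", raw.strip())

print(parse_arguments("status"))                  # ('status', '')
print(parse_arguments("resume courses-api-ddd"))  # ('resume', 'courses-api-ddd')
print(parse_arguments("OpenAPI 3.1 spec"))        # ('new', 'OpenAPI 3.1 spec')
```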
Read these files if present:
- .ai-factory/DESCRIPTION.md
- .ai-factory/ARCHITECTURE.md
- .ai-factory/RULES.md

Use them to keep outputs aligned with project conventions.
Read .ai-factory/skill-context/aif-loop/SKILL.md — MANDATORY if the file exists.
This file contains project-specific rules accumulated by $aif-evolve from patches,
codebase conventions, and tech-stack analysis. These rules are tailored to the current project.
How to apply skill-context rules:
- Enforcement: after generating any output artifact, verify it against all skill-context rules. If any rule is violated — fix the output before presenting it to the user.
If command is status, stop, list, history, or clean, execute and stop:
- status: read current.json; if file exists, read pointed run.json and display alias | status | iteration | phase | current_step | last_score | updated_at; if file is missing, report that no loop is active
- stop [reason]: stop active running loop only; set run.json.status = "stopped" and run.json.stop.reason = <reason or "user_stop">, append stopped event to history.jsonl, then delete current.json (active pointer cleared) and exit
- list: scan .ai-factory/evolution/ directories, read each run.json, display table of alias | status | iteration | last_score | updated_at
- history [alias]: read history.jsonl for the alias (or active loop), display formatted event timeline
- clean [alias|--all]: show what will be deleted, ask for explicit user confirmation via AskUserQuestion, then delete loop directory. Only clean stopped/completed/failed loops — refuse to clean running loops. Update current.json if needed.

For a new loop, first ensure the state directory exists: mkdir -p .ai-factory/evolution
Generate:
- task_alias: lowercase hyphen slug (3-64 chars)
- run_id: <task_alias>-<yyyyMMdd-HHmmss>

current.json:

```json
{
  "active_run_id": "courses-api-ddd-20260218-120000",
  "task_alias": "courses-api-ddd",
  "status": "running",
  "updated_at": "2026-02-18T12:00:00Z"
}
```
run.json:

```json
{
  "run_id": "courses-api-ddd-20260218-120000",
  "task_alias": "courses-api-ddd",
  "status": "running",
  "iteration": 1,
  "max_iterations": 4,
  "phase": "A",
  "current_step": "PLAN",
  "task": {
    "prompt": "OpenAPI 3.1 spec + DDD notes + JSON examples",
    "ideal_result": "..."
  },
  "criteria": {
    "name": "loop_default_v1",
    "version": 1,
    "phase": {
      "A": { "threshold": 0.8, "active_levels": ["A"] },
      "B": { "threshold": 0.9, "active_levels": ["A", "B"] }
    },
    "rules": []
  },
  "plan": [],
  "prepared_checks": null,
  "evaluation": null,
  "critique": null,
  "stop": { "passed": false, "reason": "" },
  "last_score": 0,
  "stagnation_count": 0,
  "created_at": "2026-02-18T12:00:00Z",
  "updated_at": "2026-02-18T12:00:00Z"
}
```
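Since run.json is the single source of state truth and may be re-read on resume, writes should never leave a half-written file. A sketch, assuming a write-temp-then-rename strategy (the spec does not mandate this; `write_json_atomic` is an illustrative helper):

```python
import json
import os
import tempfile
from datetime import datetime, timezone

def write_json_atomic(path: str, data: dict) -> None:
    """Persist a state file (run.json / current.json) via temp file + rename,
    refreshing updated_at, so a crash mid-write cannot corrupt the file."""
    data["updated_at"] = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    d = os.path.dirname(path) or "."
    os.makedirs(d, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=d, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(data, f, indent=2)
    os.replace(tmp, path)  # atomic on POSIX: readers see old or new file, never a mix
```

Usage: `write_json_atomic(".ai-factory/evolution/current.json", pointer_dict)` after every phase transition.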
When resuming a loop:
- Read run.json to get current_step and iteration
- Read history.jsonl to confirm consistency
- If run.json.current_step indicates a phase was interrupted:
  - PRODUCE_PREPARE: always re-run both PRODUCE and PREPARE (idempotent — artifact overwrites, checks regenerate)
- If run.json.status is stopped, completed, or failed, inform user and suggest new (for failed runs, also show the last phase_error event from history.jsonl so user understands what went wrong)

If the task prompt contains enough context to infer task type and ideal result:
- Draft criteria from references/CRITERIA-TEMPLATES.md and propose an iteration limit (default: 4)
- Confirm criteria via AskUserQuestion, even if criteria were already present in the task text
- Confirm the iteration limit via AskUserQuestion, even if iteration count was already present in the task text

Critical guardrail: never treat criteria or iteration limits parsed from task text as final until the user explicitly confirms both.

Otherwise, ask concise setup questions before first iteration:
- task type and ideal result
- criteria (offer templates from references/CRITERIA-TEMPLATES.md)
- iteration limit (default: 4)

Generate evaluation rules from answers:
- use references/CRITERIA-TEMPLATES.md as starting point

Persist answers and generated rules inside run.json.criteria (snapshot for reproducibility).
Normalization rules before persisting:
- run.json.max_iterations is the single source of truth for iteration limit
- Every rule must carry the full field set (id, description, severity, weight, phase, check)
- If a rule has no weight, derive from severity (fail=2, warn=1, info=0)

Before running phases, load:
- references/PHASE-CONTRACTS.md - strict I/O contracts for each phase
- references/RULE-SCHEMA.md - rule format and score calculation

Phases:
- PLAN - generates iteration plan (sequential)
- PRODUCE - generates artifact (parallel with PREPARE)
- PREPARE - generates check scripts/definitions from rules + task prompt (parallel with PRODUCE)
- EVALUATE - runs prepared checks + content rules, aggregates score (parallel check groups via Task)
- CRITIQUE - identifies issues with fix instructions (sequential, only on fail)
- REFINE - applies fixes to artifact (sequential, only on fail)

Two levels of parallelism via Task tool:
- PRODUCE and PREPARE run as parallel Task agents after PLAN completes. Both depend only on PLAN output.
- EVALUATE spawns parallel Task agents for independent check groups (executable checks via Bash, content rules via Read/Grep), then aggregates results into final score.

Each phase produces its defined output (see PHASE-CONTRACTS.md). No envelope wrapping. No router output.
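The weight normalization and score aggregation mentioned above can be sketched as follows. The authoritative formula lives in references/RULE-SCHEMA.md (not reproduced here), so the weighted pass ratio below is an assumption; only the severity-to-weight mapping comes from this document.

```python
# fail=2, warn=1, info=0 — from the normalization rules above.
SEVERITY_WEIGHT = {"fail": 2, "warn": 1, "info": 0}

def normalize_rule(rule: dict) -> dict:
    """Derive a missing weight from severity; other required fields must exist already."""
    rule.setdefault("weight", SEVERITY_WEIGHT[rule["severity"]])
    return rule

def aggregate_score(results: list[dict]) -> float:
    """Assumed aggregation: weighted pass ratio over evaluated rules.
    Replace with the formula from references/RULE-SCHEMA.md."""
    total = sum(r["weight"] for r in results)
    if total == 0:
        return 1.0  # only info-level rules: nothing can block
    return sum(r["weight"] for r in results if r["passed"]) / total

results = [
    normalize_rule({"id": "r1", "severity": "fail", "passed": True}),
    normalize_rule({"id": "r2", "severity": "warn", "passed": False}),
]
print(aggregate_score(results))  # 2/3, i.e. roughly 0.667
```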
For each iteration:
1. Set run.json.current_step = "PLAN", run PLAN phase
2. Set run.json.current_step = "PRODUCE_PREPARE", launch both as parallel Task agents:
   - PRODUCE writes artifact.md
   - PREPARE materializes check scripts/definitions
3. Set run.json.current_step = "EVALUATE", run EVALUATE phase:
   - spawn parallel Task agents for independent check groups:
     - executable checks: Task with Bash
     - content rules: Task with Read/Grep
4. If passed=false:
   - Set run.json.current_step = "CRITIQUE", run CRITIQUE phase
   - Set run.json.current_step = "REFINE", run REFINE phase, overwrite artifact.md
5. If phase=A and passed=true:
   - switch to phase=B, activate B-level rules
   - Set run.json.current_step = "PREPARE", re-run PREPARE with phase=B to materialize B-level checks (no PLAN/PRODUCE — artifact already passed A)
   - Set run.json.current_step = "EVALUATE", run EVALUATE against the same artifact with B-level prepared checks
   - if it passes, stop (reason=threshold_reached)
6. If phase=B and passed=true:
   - stop the loop (reason=threshold_reached)

If Task tool is unavailable or returns errors, fall back to sequential execution: PLAN → PRODUCE → PREPARE → EVALUATE → CRITIQUE → REFINE. The loop must work without parallelism.
Stop when any condition is met:
- phase=B and passed=true (reason=threshold_reached)
- No fail-severity rules failed in current evaluation (reason=no_major_issues) — even if score is below threshold, the artifact has no blocking issues and only warn/info remain
- iteration >= run.max_iterations (reason=iteration_limit)
- User requested stop (reason=user_stop)
- Score stagnated (reason=stagnation)

Track score progress:
- delta = score - last_score
- if delta < 0.02 and there are no severity fail blockers, increment stagnation_count
- if stagnation_count >= 2, stop with stagnation

After each phase output:
- update run.json (including current_step)
- append event to history.jsonl
- update current.json.updated_at
- write artifact.md to disk after PRODUCE and REFINE phases
- before REFINE overwrites artifact.md, save a SHA-256 hash of the previous artifact in the refinement_done event payload as "previous_artifact_hash" (enables integrity verification without bloating history)

Event names:
run_started, plan_created, artifact_created, checks_prepared, evaluation_done, critique_done, refinement_done, phase_switched, iteration_advanced, phase_error, stopped, failed

history.jsonl example line:

```json
{"ts":"2026-02-18T12:01:10Z","run_id":"courses-api-ddd-20260218-120000","iteration":1,"phase":"A","step":"EVALUATE","event":"evaluation_done","status":"ok","payload":{"score":0.72,"passed":false}}
```
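Appending such a line can be sketched as below; `append_event` is an illustrative helper, not part of the skill.

```python
import json
from datetime import datetime, timezone

def append_event(history_path: str, run_id: str, iteration: int, phase: str,
                 step: str, event: str, payload: dict, status: str = "ok") -> None:
    """Append one JSON object per line, matching the example line above."""
    line = {
        "ts": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "run_id": run_id, "iteration": iteration, "phase": phase,
        "step": step, "event": event, "status": status, "payload": payload,
    }
    # Append-only: old events are never edited or rewritten.
    with open(history_path, "a") as f:
        f.write(json.dumps(line, separators=(",", ":")) + "\n")
```

Usage: `append_event(path, run_id, 1, "A", "EVALUATE", "evaluation_done", {"score": 0.72, "passed": False})`.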
After the loop stops (any reason):
- Show a final summary (iteration, max_iterations, phase, final score, stop reason)
- If stop reason = iteration_limit and latest evaluation has passed=false, include mandatory distance-to-success details:
  - gap to threshold (threshold - score, floor at 0)
  - fail-severity rule count + blocking rule IDs
  - rule pass ratio (passed_rules / total_rules)
- Point to the artifact on disk (.ai-factory/evolution/<alias>/artifact.md)
- Suggest follow-ups: $aif-plan to implement it, $aif-verify to check it, $aif-docs to integrate it
- Set run.json.status based on stop reason, and if current.json points to this loop, delete current.json (no active loop remains):

| Stop reason | Status |
|---|---|
| threshold_reached | completed |
| no_major_issues | completed |
| user_stop | stopped |
| iteration_limit | stopped |
| stagnation | stopped |
| phase_error | failed |
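The stagnation tracking and stop-reason mapping above can be sketched together. One caveat: the document does not say whether progress resets stagnation_count; the reset below is an assumption, flagged in the comment.

```python
STOP_STATUS = {  # from the table above
    "threshold_reached": "completed",
    "no_major_issues": "completed",
    "user_stop": "stopped",
    "iteration_limit": "stopped",
    "stagnation": "stopped",
    "phase_error": "failed",
}

def update_stagnation(run: dict, score: float, has_fail_blockers: bool) -> None:
    """delta < 0.02 with no fail blockers increments stagnation_count;
    two consecutive stalls stop the loop with reason=stagnation."""
    delta = score - run["last_score"]
    if delta < 0.02 and not has_fail_blockers:
        run["stagnation_count"] += 1
    else:
        run["stagnation_count"] = 0  # assumption: real progress resets the counter
    run["last_score"] = score
    if run["stagnation_count"] >= 2:
        run["status"] = STOP_STATUS["stagnation"]
        run["stop"] = {"passed": False, "reason": "stagnation"}

run = {"last_score": 0.70, "stagnation_count": 0, "status": "running", "stop": {}}
update_stagnation(run, 0.71, has_fail_blockers=False)   # delta 0.01 → count 1
update_stagnation(run, 0.715, has_fail_blockers=False)  # delta 0.005 → count 2, stop
print(run["status"], run["stagnation_count"])  # stopped 2
```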
Show a compact summary after each iteration — do NOT dump full run.json or artifact.md content into the conversation. The artifact is already on disk; duplicating it wastes context.
```
── Iteration {N}/{max} | Phase {A|B} | Score: {score} | {PASS|FAIL} ──
Plan: {1-line summary of plan focus}
Hash: {first 8 chars of artifact SHA-256}
Changed: {list of added/modified sections, or "initial generation"}
Failed: {comma-separated rule IDs, or "none"}
Warnings: {comma-separated rule IDs, or "none"}
Artifact: .ai-factory/evolution/<alias>/artifact.md
```
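Filling that template can be sketched as below; `render_summary` and its argument shapes are illustrative, with field names mirroring run.json.

```python
def render_summary(run: dict, changed: list[str], failed: list[str],
                   warnings: list[str], artifact_hash: str) -> str:
    """Render the compact per-iteration summary from the template above."""
    ev = run["evaluation"]
    verdict = "PASS" if ev["passed"] else "FAIL"
    return "\n".join([
        f"── Iteration {run['iteration']}/{run['max_iterations']} | "
        f"Phase {run['phase']} | Score: {ev['score']} | {verdict} ──",
        f"Plan: {run['plan'][0] if run['plan'] else '-'}",
        f"Hash: {artifact_hash[:8]}",  # first 8 chars of artifact SHA-256
        f"Changed: {', '.join(changed) or 'initial generation'}",
        f"Failed: {', '.join(failed) or 'none'}",
        f"Warnings: {', '.join(warnings) or 'none'}",
        f"Artifact: .ai-factory/evolution/{run['task_alias']}/artifact.md",
    ])
```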
- Hash — lets the user verify which version they're looking at without reading the full artifact
- Changed — shows what actually moved between iterations so regressions are visible from the summary alone

If passed=false, append a compact critique summary (rule ID + 1-line fix instruction per issue). Do not repeat the full artifact or full evaluation object.
When the loop terminates with reason=iteration_limit and passed=false, append a compact distance_to_success block to the final response.
Show the full artifact content (not just summary) only when the user explicitly asks for it.
The loop generates significant context per iteration (subagent results, evaluation data, critique). After several iterations the conversation context grows large, degrading LLM quality.
All loop state is persisted to disk — clearing context loses nothing. The resume command fully reconstructs state from files.
Recommend clearing context to the user in these situations:
- iteration >= 3 — context is already heavy

After the iteration summary, append:
```
💡 Context is growing. Recommended: /clear then $aif-loop resume
All state is saved on disk — nothing will be lost.
```
Do not force or auto-clear. The user decides. If the user ignores the recommendation, continue normally.
If a phase produces output that does not match its contract:
- append a phase_error event to history.jsonl
- stop with reason=phase_error and display the error
- persist the failed state in run.json

If run.json is missing or unparseable:
- read history.jsonl to reconstruct the last known state
- rebuild run.json from the most recent events (last iteration, phase, score, etc.)
- if history.jsonl is also missing/empty, inform user and suggest starting a new loop

Invariants:
- run.json is the only source of current state truth (does NOT store artifact content)
- artifact.md on disk is the single source of truth for artifact content — never duplicate it in run.json
- history.jsonl is append-only; do not edit old events
- never dump full run.json into conversation

Examples:
$aif-loop new OpenAPI 3.1 spec + DDD notes + JSON examples
$aif-loop resume
$aif-loop resume courses-api-ddd
$aif-loop status
$aif-loop stop
$aif-loop list
$aif-loop history
$aif-loop history courses-api-ddd
$aif-loop clean courses-api-ddd
$aif-loop clean --all