Run perpetual DxEngine self-improvement loop - evaluate, fix, auto-merge, repeat until interrupted
You are running a perpetual self-improvement loop. This runs indefinitely until the user interrupts. Each cycle: analyze failures → propose data fix → evaluate → auto-merge if improved → repeat.
IMPORTANT: You may ONLY modify data files (data/*.json). Never modify Python code, test vignettes, or evaluation harness code.
Shell variables (N, rejections): These do NOT persist between Bash tool calls. Track them in your own context and substitute literal values into bash commands (e.g., --output state/eval/iter_3.json not --output state/eval/iter_${N}.json).
Ensure you are on the master branch:

```bash
git checkout master 2>/dev/null || true
```
Ensure the training vignettes exist:

```bash
ls tests/eval/vignettes/train/*.json 2>/dev/null | wc -l
```

If the count is zero, generate them:

```bash
uv run python tests/eval/generate_vignettes.py
```
Run the baseline evaluation:

```bash
mkdir -p state/eval
uv run python .claude/skills/improve/scripts/evaluate.py --output state/eval/baseline.json
```
Initialize the iteration counter (N=0) and the consecutive-rejection counter (rejections=0). Track both in your own context, per the note above.
Repeat the following forever. Do NOT stop, do NOT ask the user anything, do NOT present a summary and wait. Just keep going.
Step 1 - Analyze failures:

```bash
uv run python .claude/skills/improve/scripts/analyze_failures.py state/eval/baseline.json --output state/eval/analysis.json
```

Read the analysis. If `$ARGUMENTS.focus` is set, filter to matching fixes only.
Step 2 - Pick a fix. Priority order: pick the fix that would affect the most failing vignettes. Never repeat a fix you already tried and rejected.
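The selection rule above can be sketched in code. The analysis.json shape used here (a "fixes" list with "affected_vignettes") is an assumption for illustration, not the documented output of analyze_failures.py:

```python
# Pick the candidate fix affecting the most failing vignettes,
# skipping any fix already tried and rejected.
# NOTE: field names below are illustrative assumptions about
# analysis.json, not its real schema.
analysis = {
    "fixes": [
        {"description": "add LR for finding X", "affected_vignettes": ["v1", "v2", "v3"]},
        {"description": "add pattern for disease Y", "affected_vignettes": ["v4"]},
    ]
}
tried_and_rejected = {"add pattern for disease Y"}

candidates = [f for f in analysis["fixes"]
              if f["description"] not in tried_and_rejected]
best = max(candidates, key=lambda f: len(f["affected_vignettes"]))
print(best["description"])  # add LR for finding X
```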
Step 3 - Edit the appropriate data file directly on master:

- data/likelihood_ratios.json - for LR additions/modifications (stay within the LR safety bounds)
- data/disease_lab_patterns.json - for pattern additions
- data/finding_rules.json - for finding rule additions
Step 4 - Run the test suite:

```bash
uv run pytest tests/ -x -q
```

If any test fails, revert (`git checkout -- data/`) and go back to Step 2 with a different fix.
Step 5 - Evaluate. Increment N, then run the following, substituting the literal value of N per the note above:

```bash
uv run python .claude/skills/improve/scripts/evaluate.py --output state/eval/iter_${N}.json --quiet
uv run python .claude/skills/improve/scripts/compare_scores.py state/eval/baseline.json state/eval/iter_${N}.json
```
Step 6 - If ACCEPT (score improved AND no regressions AND no new FPs):

```bash
git add data/
git commit -m "improve: [description] (score X.XXXX → Y.YYYY)"
cp state/eval/iter_${N}.json state/eval/baseline.json
```

Reset rejections=0. Print a one-line status: ✓ Iteration N: [description] (score X.XXXX → Y.YYYY).
If REJECT (score didn't improve OR regressions OR new FPs):

```bash
git checkout -- data/
```

Increment rejections. Print: ✗ Iteration N: [description] - REJECTED ([reason]).
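The accept/reject decision reduces to a single predicate over the score comparison. The field names used below are assumptions about what compare_scores.py might report, for illustration only, not its actual output format:

```python
# Mirror of the ACCEPT/REJECT criteria: score improved AND no
# regressions AND no new false positives.
# NOTE: these dict keys are illustrative assumptions, not the real
# compare_scores.py output schema.
def decide(comparison: dict) -> str:
    improved = comparison["new_score"] > comparison["old_score"]
    no_regressions = not comparison["regressions"]
    no_new_fps = not comparison["new_false_positives"]
    return "ACCEPT" if improved and no_regressions and no_new_fps else "REJECT"

print(decide({"old_score": 0.8123, "new_score": 0.8311,
              "regressions": [], "new_false_positives": []}))   # ACCEPT
print(decide({"old_score": 0.8123, "new_score": 0.8311,
              "regressions": ["v7"], "new_false_positives": []}))  # REJECT
```

Note that any one failed criterion forces REJECT, even when the headline score went up.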
Step 7 - Pause conditions (print status and pause for the next /improve invocation):

- rejections >= 5 consecutive: print "Paused: 5 consecutive rejections, diminishing returns. Re-run /improve to continue with fresh analysis."

Otherwise: go back to Step 1 immediately. Do not stop. Do not summarize. Do not ask the user. Just keep improving.
data/lab_ranges.json is also editable under the data-only rule. Work directly on master. No branches, no merge questions.

Schema for data/likelihood_ratios.json:

```json
{
  "finding_key": {
    "description": "Clinical finding description",
    "diseases": {
      "disease_name": {
        "lr_positive": 5.0,
        "lr_negative": 0.5
      }
    }
  }
}
```
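When adding or tuning entries, it helps to remember the standard likelihood-ratio arithmetic these values feed: posterior odds = prior odds × LR. A minimal sketch using the placeholder values from the schema above:

```python
# Standard diagnostic likelihood-ratio update: posterior odds equal
# prior odds times the applicable LR. Entry values are the schema's
# placeholders, not real clinical data.
entry = {"lr_positive": 5.0, "lr_negative": 0.5}

def update_odds(prior_odds: float, finding_present: bool) -> float:
    lr = entry["lr_positive"] if finding_present else entry["lr_negative"]
    return prior_odds * lr

print(update_odds(0.1, True))   # 0.5  -- a present finding raises the odds
print(update_odds(0.1, False))  # 0.05 -- an absent finding lowers them
```

This is why lr_positive should exceed 1.0 and lr_negative should sit below 1.0 for a finding that genuinely supports the disease; values on the wrong side of 1.0 push the odds in the opposite direction.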
Schema for data/disease_lab_patterns.json:

```json
{
  "disease_name": {
    "description": "Disease description",
    "pattern": {
      "analyte_name": {
        "direction": "increased|decreased|normal",
        "typical_z_score": 2.5,
        "weight": 0.80
      }
    },
    "collectively_abnormal": false,
    "prevalence": "1 in 100"
  }
}
```
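One plausible way the direction and weight fields could combine into a match score; this is not DxEngine's actual scoring algorithm, just an illustration of the fields' roles (the ±1.0 z-score cutoffs are arbitrary for the sketch):

```python
# Illustrative scoring of observed z-scores against a disease pattern:
# weight-sum of analytes whose direction matches, normalized by the
# total weight. NOT the engine's real algorithm.
pattern = {
    "sodium": {"direction": "decreased", "typical_z_score": -2.5, "weight": 0.80},
    "potassium": {"direction": "increased", "typical_z_score": 2.0, "weight": 0.60},
}

def direction_matches(direction: str, z: float) -> bool:
    if direction == "increased":
        return z > 1.0
    if direction == "decreased":
        return z < -1.0
    return -1.0 <= z <= 1.0  # "normal"

def pattern_score(observed_z: dict) -> float:
    total = sum(p["weight"] for p in pattern.values())
    matched = sum(p["weight"] for name, p in pattern.items()
                  if name in observed_z
                  and direction_matches(p["direction"], observed_z[name]))
    return matched / total

print(pattern_score({"sodium": -3.1, "potassium": 2.4}))        # 1.0
print(round(pattern_score({"sodium": -3.1}), 2))                # 0.57
```

A higher weight thus makes an analyte count for more of the score, which is why weights you add should reflect how specific the analyte is for the disease.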
Schema for data/finding_rules.json:

```json
{
  "single_rules": [
    {
      "test": "analyte_name",
      "operator": "gt|lt|gte|lte|above_uln|below_lln|within_range|gt_mult_uln|between",
      "threshold": 10.0,
      "finding_key": "finding_key_name"
    }
  ]
}
```
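A minimal interpreter for the operator vocabulary above. The exact semantics are assumptions for illustration: inclusive bounds for within_range and between, a two-element threshold for between, and gt_mult_uln reading threshold as a multiple of the upper limit of normal. The rule instance (alt_marked_elevation) is hypothetical:

```python
# Sketch interpreter for single_rules operators; semantics are assumed,
# not taken from the evaluation harness.
def evaluate_rule(rule: dict, value: float, lln: float = None, uln: float = None) -> bool:
    op, thr = rule["operator"], rule.get("threshold")
    if op == "gt":           return value > thr
    if op == "lt":           return value < thr
    if op == "gte":          return value >= thr
    if op == "lte":          return value <= thr
    if op == "above_uln":    return value > uln
    if op == "below_lln":    return value < lln
    if op == "within_range": return lln <= value <= uln
    if op == "gt_mult_uln":  return value > thr * uln  # threshold = multiple of ULN
    if op == "between":      return thr[0] <= value <= thr[1]
    raise ValueError(f"unknown operator: {op}")

# Hypothetical rule: ALT above 3x the upper limit of normal.
rule = {"test": "alt", "operator": "gt_mult_uln", "threshold": 3.0,
        "finding_key": "alt_marked_elevation"}
print(evaluate_rule(rule, value=200.0, uln=40.0))  # True: 200 > 3 * 40
```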