Run the improvement loop: analyze the failed queries from an evaluation run and generate persistent learnings as improvements to the knowledge base.
/improve [run_id]
- Reads: evals/results/<run_id>/
- Spawns: improver agent with failure context
- Writes: knowledge/

/improve run_001
│
├─> Load failures from evals/results/run_001/
│
├─> For each failure:
│   ├─> Analyze: What went wrong?
│   ├─> Categorize: Example? Function? Doc?
│   └─> Write: Update knowledge/
│
└─> Report: "Added 3 examples, 1 function, 2 doc entries"
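The loop above can be sketched in Python. This is a hypothetical illustration, not the improver agent itself: the failure-file fields (`passed`, `missing_helper`, `schema_surprise`) and the categorization heuristic are assumptions.

```python
import json
from pathlib import Path


def categorize(failure: dict) -> str:
    """Route a failure to a knowledge category (assumed heuristic)."""
    if failure.get("missing_helper"):
        return "function"   # fix belongs in knowledge/functions.py
    if failure.get("schema_surprise"):
        return "doc"        # fix belongs in knowledge/schema.md
    return "example"        # default: add a worked example


def improve(run_id: str, base: str = "evals/results") -> dict:
    """Load a run's result files, count failures per category."""
    counts = {"example": 0, "function": 0, "doc": 0}
    for path in sorted(Path(base, run_id).glob("*.json")):
        failure = json.loads(path.read_text())
        if failure.get("passed", True):
            continue  # only failures feed the improvement loop
        counts[categorize(failure)] += 1
    return counts
```

A real run would also write the categorized learnings into `knowledge/` before emitting the summary report.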
# Improvement Report: [run_id]
## Summary
- Failures analyzed: N
- Improvements generated: M
  - Examples: X
  - Functions: Y
  - Documentation: Z
## Changes Made
### knowledge/examples.md
- Added: [pattern name] for query [q_id]
### knowledge/functions.py
- Added: `function_name()` - [description]
### knowledge/schema.md
- Added: [data fact or edge case discovery]
## Next Steps
- Re-run evaluation to verify improvements
- Command: `/run-baseline test`
- evals/results/<run_id>/*.json - Failed evaluation results
- knowledge/* - Existing knowledge (to avoid duplicates)
- knowledge/examples.md - Worked examples
- knowledge/functions.py - Helper functions
- knowledge/schema.md - Schema/data discoveries
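Because existing knowledge is consulted to avoid duplicates, each write should check for a prior entry before appending. A minimal sketch, assuming each worked example in knowledge/examples.md sits under a `##` heading named after its pattern (that heading convention is an assumption):

```python
from pathlib import Path


def add_example(path: Path, pattern_name: str, body: str) -> bool:
    """Append a worked example unless one with the same name already exists."""
    existing = path.read_text() if path.exists() else ""
    heading = f"## {pattern_name}"
    if heading in existing:
        return False  # duplicate: leave the knowledge file untouched
    with path.open("a") as f:
        f.write(f"\n{heading}\n\n{body}\n")
    return True
```

Returning a boolean lets the caller tally how many entries were actually added for the final report.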