After solving a non-trivial problem, detect generalizable learnings and propose skill updates so future interactions benefit automatically. Always active — applies to every interaction.
Skills improve through a three-phase lifecycle. The agent operates in one phase at a time depending on whether ground truth is available.
You MUST evaluate whether to enter the skill evolution workflow when ANY of these events occur during a conversation:
When a trigger fires: Finish solving the user's problem first, then evaluate whether the learning is generalizable (not user-specific) before entering Phase 1 or Phase 2.
Do NOT trigger for: Trivial typos, user-specific data/paths, one-off configuration issues, or problems already covered by existing skills.
Enter this phase when you can score your output — a ground truth answer exists, a test suite passes/fails, or a known-correct result can be compared against.
Inside the learning phase, run an evolutionary loop before proposing anything:
Work against a sandbox copy of the relevant skill files (skills/*/SKILL.md). The sandbox is conceptual for interactive agents (Cursor, Claude Code): iterate internally before presenting to the user, and do not propose on the first attempt if the score failed. For CI/batch contexts, the sandbox is literal — experimental skill modifications are made in a temp directory, validated by running tests, then promoted.
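For the literal CI/batch case, the loop can be sketched in Python. The `candidate_edits` callables and the `run_tests` scorer are placeholders standing in for whatever edit and validation machinery the pipeline actually uses, not a real API:

```python
# Sketch of the literal sandbox loop: copy the skill into a temp
# directory, apply one candidate edit, score it, and promote only
# the first edit that passes. The original skill is never mutated.
import shutil
import tempfile
from pathlib import Path


def sandbox_iterate(skill_dir: Path, candidate_edits, run_tests):
    """Try each candidate edit in an isolated copy; return the first that scores."""
    for edit in candidate_edits:
        with tempfile.TemporaryDirectory() as tmp:
            work = Path(tmp) / skill_dir.name
            shutil.copytree(skill_dir, work)  # sandbox copy of the skill
            edit(work)                        # mutate only the sandboxed copy
            if run_tests(work):               # score against ground truth
                return edit                   # promote this edit
    return None                               # nothing passed; do not propose
```

Returning `None` on failure is the key property: a failed loop produces no proposal at all, matching the rule above about not proposing on a failed score.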
Use whatever ground truth is available:
| Ground truth | How to score |
|---|---|
| Behavioral tests | must_include / must_not_include patterns pass |
| Code execution | solution.py runs without error, produces expected output |
| Solver status | cuOpt returns Optimal / FeasibleFound / SUCCESS |
| Constraint satisfaction | All constraints in the formulation are met |
| Known answer | Output matches the expected value within tolerance |
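The behavioral-tests row can be sketched as a small scorer; the function name and signature are illustrative, not an existing API:

```python
# Hypothetical scorer for the "behavioral tests" ground truth: an
# output passes when every must_include pattern matches and no
# must_not_include pattern does.
import re


def score_behavioral(output: str, must_include=(), must_not_include=()):
    """Return True when all inclusion patterns match and no exclusion does."""
    if any(re.search(p, output) is None for p in must_include):
        return False
    if any(re.search(p, output) for p in must_not_include):
        return False
    return True
```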
If no ground truth is available, you are in Phase 2 (inference), not Phase 1.
When the score passes, distill the learning into a skill artifact. Two types:
Markdown (SKILL.md patches) — gotchas, patterns, examples, table rows. Add these where an existing skills/*/SKILL.md would benefit.
Code (assets/*.py) — reusable helper functions, reference solutions. Place these in skills/*/assets/ alongside existing assets and validate them with ci/test_skills_assets.sh.
Always place the learning in the single skill where it has the widest effect. Do NOT duplicate the same content across multiple skills.
Choose the target using this priority:
1. Common formulation skills (lp-milp-formulation, routing-formulation, cuopt-user-rules) — if the learning applies regardless of language or interface, put it here. All downstream API skills already read the common skill.
2. API-specific skills (cuopt-lp-milp-api-python, cuopt-routing-api-python) — if the learning is specific to one API or language.

If a gotcha affects both Python and C users but is about the solver behavior (not the API), it belongs in the common formulation skill, not in both api-python and api-c.
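Under the assumption of the skill names mentioned in this section, the priority can be sketched as a small routing helper; the parameters and the fallback name are hypothetical:

```python
# Illustrative chooser for the single widest-effect target skill:
# interface-agnostic learnings go to the common formulation skill,
# API-specific ones to the matching API skill.
def choose_target_skill(topic: str, api_specific: bool, api_name: str = "") -> str:
    """Return the one skill directory that gives the widest effect."""
    common = {
        "lp-milp": "lp-milp-formulation",
        "routing": "routing-formulation",
    }
    if not api_specific:
        # Solver-behavior learnings land here even when observed via one API.
        return common.get(topic, "cuopt-user-rules")
    return f"cuopt-{topic}-api-{api_name}"
```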
Present to the user as:
Skill update proposal:
Skill: skills/<name>/SKILL.md (or skills/<name>/assets/<file>.py)
Type: markdown | code
Phase: learning (scored)
Section: <where it goes>
Trigger: <what happened that surfaced this>
Score: <how it was validated — e.g. "solver returned Optimal", "test passed">
Change: <the exact content to add or modify>
Only apply after the user approves. If the user declines, do not persist.
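One way to keep the template consistent across proposals is to render it from a small data structure. This sketch simply mirrors the fields above; it is not a prescribed format:

```python
# Minimal data structure for a scored (Phase 1) skill update proposal,
# with a render() that reproduces the template field-for-field.
from dataclasses import dataclass


@dataclass
class SkillUpdateProposal:
    skill: str      # e.g. "skills/<name>/SKILL.md"
    type: str       # "markdown" | "code"
    phase: str      # "learning (scored)"
    section: str
    trigger: str
    score: str
    change: str

    def render(self) -> str:
        return (
            "Skill update proposal:\n"
            f"Skill: {self.skill}\n"
            f"Type: {self.type}\n"
            f"Phase: {self.phase}\n"
            f"Section: {self.section}\n"
            f"Trigger: {self.trigger}\n"
            f"Score: {self.score}\n"
            f"Change: {self.change}"
        )
```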
Enter this phase during normal user interactions where no ground truth exists to score against.
Read and apply skills (including any content added by prior learning phases) to solve the user's problem.
While solving, note insights — observations that could not be scored but may be valuable.
Present insights to the user as lower-confidence proposals, clearly marked:
Skill insight (unscored):
Skill: skills/<name>/SKILL.md
Type: markdown | code
Phase: inference (unscored)
Section: <where it goes>
Trigger: <what happened>
Change: <the exact content to add or modify>
Note: This was not validated against ground truth. Review carefully.
The user may approve, decline, or defer for offline reflection.
After inference interactions, review accumulated insights to find patterns.
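A minimal way to surface such patterns is to count repeated proposal targets across sessions; the `(skill, section)` tuple shape here is an assumption about how insights might be logged:

```python
# Flag any insight target proposed more than once across sessions as
# a candidate for a reflection-phase proposal.
from collections import Counter


def recurring_insights(insights, min_count=2):
    """insights: iterable of (skill, section) tuples from prior sessions."""
    counts = Counter(insights)
    return [target for target, n in counts.items() if n >= min_count]
```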
Promote patterns that recur across multiple insights as proposals marked Phase: reflection (pattern-validated).

Every change made through skill evolution MUST be tagged so its origin is traceable.
Wrap added content with start and end boundary markers so it is easy to locate, review, and remove:
<!-- skill-evolution:start — <short trigger description> -->
<added content>
<!-- skill-evolution:end -->
For example, a new table row:
<!-- skill-evolution:start — large objective recursion fix -->
| Maximum recursion depth | Building big expr with chained `+` | Use `LinearExpression(vars_list, coeffs_list, constant)` |
<!-- skill-evolution:end -->
Or a new subsection:
<!-- skill-evolution:start — warmstart gotcha -->
### Warmstart gotcha
Content here...
<!-- skill-evolution:end -->
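The marker convention above lends itself to small helpers for wrapping new content and locating previously tagged blocks for review or removal. This is a sketch built on the exact marker strings shown in the examples, not part of the skill format itself:

```python
# Wrap content in skill-evolution boundary markers, and extract all
# tagged blocks from a SKILL.md body as (description, content) pairs.
import re

START = "<!-- skill-evolution:start — {desc} -->"
END = "<!-- skill-evolution:end -->"


def wrap(content: str, desc: str) -> str:
    """Surround added content with start/end boundary markers."""
    return f"{START.format(desc=desc)}\n{content}\n{END}"


def tagged_blocks(text: str):
    """Return (description, body) pairs for every skill-evolution block."""
    pattern = re.compile(
        r"<!-- skill-evolution:start — (.*?) -->\n(.*?)\n<!-- skill-evolution:end -->",
        re.DOTALL,
    )
    return pattern.findall(text)
```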
When skill evolution creates an entirely new skill directory, add `origin: skill-evolution` to the YAML frontmatter:
---