Analyze skill execution traces to identify issues and automatically evolve/improve skills. Use when users provide trace files (JSON) from skill runs and want to improve skill performance based on real execution data. Triggers on requests like "analyze traces", "evolve skill based on traces", "improve skill from execution history", "find issues in skill traces", or when working with skill trace/log files.
Analyze skill execution traces to discover issues, identify improvement opportunities, and apply fixes to skill files.
Each trace is a JSON object with this structure (`success` is a boolean, `true` or `false`):
```json
{
  "id": "uuid",
  "request": "user's original request",
  "skills_used": ["skill-name"],
  "success": true,
  "total_turns": 2,
  "total_input_tokens": 5000,
  "total_output_tokens": 200,
  "duration_ms": 7000,
  "steps": [
    {"role": "assistant", "content": "...", "tool_name": null},
    {"role": "tool", "tool_name": "...", "tool_input": {}, "tool_result": "..."}
  ],
  "llm_calls": [
    {"turn": 1, "stop_reason": "tool_use", "input_tokens": 2500, "output_tokens": 50}
  ]
}
```
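A minimal Python sketch of loading traces in this structure, assuming the file holds either a single trace object or a JSON list of them (the bundled scripts may expect a specific layout). `Trace`, `load_traces`, `total_tokens`, and `tool_calls` are hypothetical names for illustration, not part of the skill's scripts.

```python
import json
from dataclasses import dataclass, field, fields
from typing import Any


@dataclass
class Trace:
    """Typed view of one trace record; field names follow the structure above."""
    id: str
    request: str
    skills_used: list[str]
    success: bool
    total_turns: int
    total_input_tokens: int
    total_output_tokens: int
    duration_ms: int
    steps: list[dict[str, Any]] = field(default_factory=list)
    llm_calls: list[dict[str, Any]] = field(default_factory=list)

    @property
    def total_tokens(self) -> int:
        # Input plus output tokens for the whole run.
        return self.total_input_tokens + self.total_output_tokens

    def tool_calls(self) -> list[dict[str, Any]]:
        # Tool invocations are the steps whose role is "tool".
        return [s for s in self.steps if s.get("role") == "tool"]


def load_traces(text: str) -> list[Trace]:
    """Parse JSON holding one trace object or a list of them (layout is an assumption)."""
    data = json.loads(text)
    records = data if isinstance(data, list) else [data]
    known = {f.name for f in fields(Trace)}
    # Ignore keys this sketch does not model; missing required keys raise TypeError.
    return [Trace(**{k: v for k, v in r.items() if k in known}) for r in records]
```

Using a dataclass makes a missing required field fail loudly at load time instead of surfacing later as a `KeyError` deep in the analysis.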
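This skill can receive two types of input (at least one required):
- Traces: JSON trace files from skill runs, in the structure above
- Feedback: the user's description of what to improve in the skill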
When both are provided, combine insights: use traces to validate/discover issues and feedback to prioritize and guide fixes.
If traces are provided, run the analysis script:
```bash
scripts/analyze_traces.py <traces.json> [--skill <name>] [--format json|text]
```
Output includes:
If feedback is provided, identify the user's improvement goals and map them to actionable changes.
If both are provided, cross-reference: does the feedback align with trace-discovered issues? Use feedback to prioritize which trace-identified problems to fix first.
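One way to express this prioritization, as a hypothetical sketch: assume trace analysis yields a count of traces per issue type, and the feedback has already been mapped to a set of issue types. Feedback-aligned issues come first, then the most frequent ones; issues raised only in feedback are returned separately so they are not dropped. `prioritize` and both input shapes are illustrative assumptions.

```python
from collections import Counter


def prioritize(issue_counts: dict[str, int], feedback_focus: set[str]) -> tuple[list[str], list[str]]:
    """Return (trace issues in fix order, issues mentioned only in feedback)."""
    ordered = sorted(
        issue_counts,
        # False sorts before True, so feedback-aligned issues come first;
        # then higher counts first; the name breaks ties deterministically.
        key=lambda issue: (issue not in feedback_focus, -issue_counts[issue], issue),
    )
    # Feedback items with no supporting trace evidence still deserve a look.
    feedback_only = sorted(feedback_focus - set(issue_counts))
    return ordered, feedback_only


counts = Counter({"high_turn_count": 5, "tool_errors": 3, "high_token_usage": 1})
print(prioritize(counts, {"high_token_usage", "repeated_tool_calls"}))
```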
For failed or problematic traces, extract full context:
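```bash
# Extract context for all failed traces
scripts/extract_issue_context.py <traces.json> --failed
# Inspect one trace, including its LLM call details
scripts/extract_issue_context.py <traces.json> --trace-id <id> --show-llm
# Extract traces with unusually high turn counts
scripts/extract_issue_context.py <traces.json> --high-turns
```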
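The three selection modes can be approximated in plain Python, which helps when checking what the script should surface. This is a sketch, not the script's implementation: `select_traces` and `summarize` are hypothetical, and treating "high turns" as `total_turns > 4` borrows the `avg_turns` warning level from the metrics table, which is an assumption.

```python
import json
from typing import Any, Optional

TraceDict = dict[str, Any]


def select_traces(
    traces: list[TraceDict],
    failed: bool = False,
    trace_id: Optional[str] = None,
    high_turns: Optional[int] = None,
) -> list[TraceDict]:
    """Apply each requested filter in turn; filters combine with AND."""
    selected = traces
    if failed:
        selected = [t for t in selected if not t.get("success", False)]
    if trace_id is not None:
        selected = [t for t in selected if t.get("id") == trace_id]
    if high_turns is not None:
        selected = [t for t in selected if t.get("total_turns", 0) > high_turns]
    return selected


def summarize(trace: TraceDict, show_llm: bool = False) -> str:
    """Render the request, each step, and optionally per-turn LLM call stats."""
    lines = [f"Trace {trace.get('id')} success={trace.get('success')}",
             f"Request: {trace.get('request')}"]
    for i, step in enumerate(trace.get("steps", []), start=1):
        if step.get("role") == "tool":
            args = json.dumps(step.get("tool_input"))
            lines.append(f"{i}. tool {step.get('tool_name')} input={args} -> {step.get('tool_result')}")
        else:
            lines.append(f"{i}. {step.get('role')}: {step.get('content')}")
    if show_llm:
        for call in trace.get("llm_calls", []):
            lines.append(f"turn {call.get('turn')}: stop={call.get('stop_reason')} "
                         f"in={call.get('input_tokens')} out={call.get('output_tokens')}")
    return "\n".join(lines)
```

For example, `select_traces(traces, high_turns=4)` followed by `summarize(t, show_llm=True)` on each result shows where the extra turns were spent.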
Skip this step if only feedback was provided (no traces).
Map issues to skill components using `references/issue-patterns.md`:
| Issue Type | Likely Fix Location |
|---|---|
| execution_failure | scripts/, error handling |
| high_turn_count | SKILL.md clarity, add examples |
| tool_errors | scripts/, input validation |
| high_token_usage | SKILL.md verbosity, progressive disclosure |
| repeated_tool_calls | SKILL.md decision trees |
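A hypothetical sketch of tagging a single trace with the issue types in this table. The fix locations are copied from the table; the detection rules are illustrative heuristics, not the rules the bundled scripts use. In particular, treating any tool result containing "error" as a tool error, and an identical back-to-back tool call as a repeat, are assumptions, and the turn and token limits borrow the averaged warning levels from the metrics table for per-trace use.

```python
import json
from typing import Any

# Issue type -> likely fix location, copied from the table above.
FIX_LOCATIONS = {
    "execution_failure": "scripts/, error handling",
    "high_turn_count": "SKILL.md clarity, add examples",
    "tool_errors": "scripts/, input validation",
    "high_token_usage": "SKILL.md verbosity, progressive disclosure",
    "repeated_tool_calls": "SKILL.md decision trees",
}

MAX_TURNS = 4        # assumption: per-trace use of the avg_turns warning level
MAX_TOKENS = 30_000  # assumption: per-trace use of the avg_tokens warning level


def detect_issues(trace: dict[str, Any]) -> set[str]:
    issues = set()
    if not trace.get("success", False):
        issues.add("execution_failure")
    if trace.get("total_turns", 0) > MAX_TURNS:
        issues.add("high_turn_count")
    tokens = trace.get("total_input_tokens", 0) + trace.get("total_output_tokens", 0)
    if tokens > MAX_TOKENS:
        issues.add("high_token_usage")
    tool_steps = [s for s in trace.get("steps", []) if s.get("role") == "tool"]
    # Heuristic: a tool result mentioning "error" counts as a tool error.
    if any("error" in str(s.get("tool_result", "")).lower() for s in tool_steps):
        issues.add("tool_errors")
    # Heuristic: two consecutive tool calls (ignoring assistant steps between
    # them) with the same tool name and identical input count as a repeat.
    calls = [(s.get("tool_name"), json.dumps(s.get("tool_input"), sort_keys=True))
             for s in tool_steps]
    if any(a == b for a, b in zip(calls, calls[1:])):
        issues.add("repeated_tool_calls")
    return issues
```

Each detected issue can then be looked up in `FIX_LOCATIONS` to decide which file to open first.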
For feedback-only input, map the user's suggestions directly to the appropriate skill components.
Read the target skill's files and apply changes based on the analysis:
Scope constraints — strictly follow:
| Metric | Warning threshold | Action |
|---|---|---|
| success_rate | <90% | Review failures |
| avg_turns | >4 | Simplify workflow |
| avg_tokens | >30000 | Reduce context |
| duration_ms | >60000 | Optimize scripts |
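A minimal sketch of computing these metrics over a batch of traces and flagging the ones past their warning threshold. It assumes each metric is averaged over all traces; the table does not say whether `duration_ms` is an average or a maximum, so averaging it is an assumption. `aggregate` and `check_thresholds` are hypothetical names, and `statistics.mean` raises `StatisticsError` on an empty list.

```python
from statistics import mean
from typing import Any

# (metric, comparison, limit, action), copied from the table above.
THRESHOLDS = [
    ("success_rate", "<", 0.90, "Review failures"),
    ("avg_turns", ">", 4, "Simplify workflow"),
    ("avg_tokens", ">", 30_000, "Reduce context"),
    ("duration_ms", ">", 60_000, "Optimize scripts"),
]


def aggregate(traces: list[dict[str, Any]]) -> dict[str, float]:
    """Average each metric across traces (duration_ms averaged: an assumption)."""
    return {
        "success_rate": mean(1.0 if t["success"] else 0.0 for t in traces),
        "avg_turns": mean(t["total_turns"] for t in traces),
        "avg_tokens": mean(t["total_input_tokens"] + t["total_output_tokens"] for t in traces),
        "duration_ms": mean(t["duration_ms"] for t in traces),
    }


def check_thresholds(metrics: dict[str, float]) -> list[str]:
    """Return one 'metric=value: action' line per breached threshold, in table order."""
    out = []
    for name, op, limit, action in THRESHOLDS:
        value = metrics[name]
        breached = value < limit if op == "<" else value > limit
        if breached:
            out.append(f"{name}={value:g}: {action}")
    return out
```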
Low success rate:
High turn count:
High token usage: