Analyze eval failure diagnosis and make targeted improvements to picobot workspace files and tool implementations. Given structured diagnosis JSON with failure categories and fix targets, make the minimum edits needed to address each diagnosed issue.
You are given a diagnosis JSON describing eval failures. Your job is to make the minimum targeted edits to fix the diagnosed issues, then summarize what you changed.
The diagnosis JSON is passed as your input context. It has this structure:
{
  "run_id": "...",
  "pass_rate": 0.83,
  "failures": [
    {
      "task_id": "L1-write-file",
      "category": "TOOL_MISUSE",
      "summary": "Agent used write_file with wrong path",
      "fix_target": "TOOLS.md",
      "fix_suggestion": "Add rule: always use ~/workspace/ prefix for file operations",
      "tool_trace_summary": "...",
      "agent_response": "..."
    }
  ]
}
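Because several failures can point at the same file, it can help to group them by fix_target before editing, so each file is touched in a single pass. The following is a minimal sketch of that grouping, not part of picobot itself: `group_by_fix_target` is a hypothetical helper, and the sample diagnosis (including the `L1-read-file` task id) is made up for illustration.

```python
import json
from collections import defaultdict


def group_by_fix_target(diagnosis: dict) -> dict:
    """Group failure records by fix_target, keeping first-seen order."""
    groups = defaultdict(list)
    for failure in diagnosis.get("failures", []):
        groups[failure["fix_target"]].append(failure)
    return dict(groups)


# Hypothetical diagnosis with three failures; two share a fix_target.
diagnosis = json.loads("""
{
  "run_id": "example",
  "pass_rate": 0.5,
  "failures": [
    {"task_id": "L1-write-file", "category": "TOOL_MISUSE", "fix_target": "TOOLS.md"},
    {"task_id": "L2-multi-file", "category": "TOOL_BUG", "fix_target": "tool:filesystem"},
    {"task_id": "L1-read-file", "category": "TOOL_MISUSE", "fix_target": "TOOLS.md"}
  ]
}
""")

groups = group_by_fix_target(diagnosis)
for target, failures in groups.items():
    # One iteration per file to edit, with every failure that targets it.
    print(target, [f["task_id"] for f in failures])
```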
Each failure has:
- task_id: which eval task failed
- category: failure classification (PROMPT_GAP, TOOL_MISUSE, TOOL_BUG, PLANNING_ERROR, HALLUCINATION, CONTEXT_MISSING, FORMAT_MISMATCH, TIMEOUT)
- summary: human-readable description of what went wrong
- fix_target: which file to edit (see target resolution below)
- fix_suggestion: specific suggested fix
Group failures by fix_target so you make one pass per file. If you edit a .py file, run ruff check --fix on it after editing.
Map each fix_target value to a file path. The prompt that invokes you will
specify the exact workspace path; look for the "File path overrides" section.
By default workspace files are at .eval-workspace/ relative to your cwd.
| fix_target pattern | Default file path |
|---|---|
| AGENTS.md | .eval-workspace/AGENTS.md |
| TOOLS.md | .eval-workspace/TOOLS.md |
| SOUL.md | .eval-workspace/SOUL.md |
| IDENTITY.md | .eval-workspace/IDENTITY.md |
| USER.md | .eval-workspace/USER.md |
| tool:{name} | picobot/agent/tools/{name}.py |
| tool_schema:{name} | picobot/agent/tools/{name}.py (the JSON schema dict within the file) |
| context | picobot/agent/context.py |
| eval_task:{id} | picobot/eval/tasks/level{N}.yaml (extract level from the task id prefix) |
If a fix_target does not match any of these patterns, skip it and note it in the summary.
Always check the "File path overrides" section in your prompt for the authoritative paths.
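The default mapping in the table can be sketched as a small resolver. This is an illustration only: `resolve_fix_target` is a hypothetical name, the `workspace` parameter stands in for whatever the "File path overrides" section specifies, and the level extraction assumes task ids begin with `L<digits>-` as in `L1-write-file`.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

WORKSPACE_FILES = {"AGENTS.md", "TOOLS.md", "SOUL.md", "IDENTITY.md", "USER.md"}


def resolve_fix_target(fix_target: str, workspace: str = ".eval-workspace") -> Optional[str]:
    """Return the default path for a fix_target, or None if no pattern matches."""
    if fix_target in WORKSPACE_FILES:
        return str(PurePosixPath(workspace) / fix_target)
    if fix_target == "context":
        return "picobot/agent/context.py"

    prefix, sep, name = fix_target.partition(":")
    if not sep or not name:
        return None  # unknown target: skip it and note it in the summary
    if prefix in ("tool", "tool_schema"):
        # Both patterns resolve to the same file; tool_schema refers to the
        # schema dict inside it.
        return f"picobot/agent/tools/{name}.py"
    if prefix == "eval_task":
        # Assumption: the level is the number after a leading "L" in the task id.
        match = re.match(r"L(\d+)-", name)
        if match:
            return f"picobot/eval/tasks/level{match.group(1)}.yaml"
    return None


print(resolve_fix_target("eval_task:L1-write-file"))
```

A `None` result corresponds to the "skip it and note it in the summary" rule for unmatched targets.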
Workspace files (AGENTS.md, TOOLS.md, SOUL.md, IDENTITY.md, USER.md)
Location: .eval-workspace/{filename} (or as specified in the "File path overrides" prompt section)
Tool implementations (tool:{name})
Location: picobot/agent/tools/{name}.py
- Make the minimum change that addresses the fix_suggestion.
- After editing, run: ruff check --fix picobot/agent/tools/{name}.py
Tool schemas (tool_schema:{name})
Location: picobot/agent/tools/{name}.py (the schema/parameters dict)
Context (context)
Location: picobot/agent/context.py
Eval tasks (eval_task:{id})
Location: picobot/eval/tasks/level{N}.yaml
- Only edit an eval task when the failure category is FORMAT_MISMATCH.
- Adjust only the match criterion (for example, exact_match to contains), never weaken the task itself.
- If the category is not FORMAT_MISMATCH, skip the edit and note why.
Never edit any of these files regardless of what the diagnosis says:
- picobot/eval/runner.py
- picobot/eval/graders/*
- picobot/eval/schema.py
- picobot/eval/diagnose.py
- picobot/eval/improve.py
- picobot/eval/baseline.py
- picobot/cli/commands.py
- picobot/agent/loop.py (unless fix_target is literally loop)
If a fix_target resolves to a forbidden file, skip it and note in the summary:
"Skipped {fix_target}: forbidden file."
After all edits are complete, print a structured summary in this exact format:
## Changes Made
- **{file}**: {description of change} (fixes {task_id} {category})
- **{file}**: {description of change} (fixes {task_id} {category})
If any fixes were skipped, add a section:
## Skipped
- **{task_id}**: {reason for skipping}
Examples:
## Changes Made
- **TOOLS.md**: Added rule to always use ~/workspace/ prefix for file paths (fixes L1-write-file TOOL_MISUSE)
- **picobot/agent/tools/filesystem.py**: Added path normalization in write_file (fixes L2-multi-file TOOL_BUG)
## Skipped
- **L3-eval-runner**: fix_target "runner.py" is a forbidden file
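The format above could be rendered by a small helper like the one below. It is a sketch under stated assumptions: `format_summary` and its tuple-based inputs are hypothetical and not part of picobot.

```python
def format_summary(changes, skipped):
    """Render the summary.

    changes: iterable of (file, description, task_id, category) tuples.
    skipped: iterable of (task_id, reason) tuples; the Skipped section is
             omitted when this is empty.
    """
    lines = ["## Changes Made"]
    for file, description, task_id, category in changes:
        lines.append(f"- **{file}**: {description} (fixes {task_id} {category})")
    if skipped:
        lines.append("## Skipped")
        for task_id, reason in skipped:
            lines.append(f"- **{task_id}**: {reason}")
    return "\n".join(lines)


summary = format_summary(
    changes=[
        (
            "TOOLS.md",
            "Added rule to always use ~/workspace/ prefix for file paths",
            "L1-write-file",
            "TOOL_MISUSE",
        )
    ],
    skipped=[("L3-eval-runner", 'fix_target "runner.py" is a forbidden file')],
)
print(summary)
```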
If a fix_suggestion is unclear or seems risky, skip it and explain why in the Skipped section.
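Skips also come from the forbidden-file rule described earlier. A minimal sketch of that check follows, assuming paths are repo-relative with forward slashes; `is_forbidden` and `skip_note` are hypothetical helpers, and `exact.py` in the usage line is a made-up grader filename.

```python
from fnmatch import fnmatchcase

# Forbidden paths from the list above; the graders entry is a glob.
FORBIDDEN_PATTERNS = (
    "picobot/eval/runner.py",
    "picobot/eval/graders/*",
    "picobot/eval/schema.py",
    "picobot/eval/diagnose.py",
    "picobot/eval/improve.py",
    "picobot/eval/baseline.py",
    "picobot/cli/commands.py",
)
LOOP_PATH = "picobot/agent/loop.py"


def is_forbidden(path: str, fix_target: str) -> bool:
    """Return True if `path` must not be edited for this fix_target."""
    if path == LOOP_PATH:
        # loop.py is editable only when fix_target is literally "loop".
        return fix_target != "loop"
    return any(fnmatchcase(path, pattern) for pattern in FORBIDDEN_PATTERNS)


def skip_note(fix_target: str) -> str:
    """Build the note required when a target resolves to a forbidden file."""
    return f"Skipped {fix_target}: forbidden file."


if is_forbidden("picobot/eval/graders/exact.py", "tool:exact"):
    print(skip_note("tool:exact"))
```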