Eval Improve Skill

You are given a diagnosis JSON describing eval failures. Your job is to make the minimum targeted edits to fix the diagnosed issues, then summarize what you changed.

Input

The diagnosis JSON is passed as your input context. It has this structure:

{
  "run_id": "...",
  "pass_rate": 0.83,
  "failures": [
    {
      "task_id": "L1-write-file",
      "category": "TOOL_MISUSE",
      "summary": "Agent used write_file with wrong path",
      "fix_target": "TOOLS.md",
      "fix_suggestion": "Add rule: always use ~/workspace/ prefix for file operations",
      "tool_trace_summary": "...",
      "agent_response": "..."
    }
  ]
}

Each failure has:

task_id — which eval task failed
category — failure classification (PROMPT_GAP, TOOL_MISUSE, TOOL_BUG, PLANNING_ERROR, HALLUCINATION, CONTEXT_MISSING, FORMAT_MISMATCH, TIMEOUT)
summary — human-readable description of what went wrong

Eval Improve Skill

You are given a diagnosis JSON describing eval failures. Your job is to make the minimum targeted edits to fix the diagnosed issues, then summarize what you changed.

Input

The diagnosis JSON is passed as your input context. It has this structure:

{
  "run_id": "...",
  "pass_rate": 0.83,
  "failures": [
    {
      "task_id": "L1-write-file",
      "category": "TOOL_MISUSE",
      "summary": "Agent used write_file with wrong path",
      "fix_target": "TOOLS.md",
      "fix_suggestion": "Add rule: always use ~/workspace/ prefix for file operations",
      "tool_trace_summary": "...",
      "agent_response": "..."
    }
  ]
}

Each failure has:

task_id — which eval task failed
category — failure classification (PROMPT_GAP, TOOL_MISUSE, TOOL_BUG, PLANNING_ERROR, HALLUCINATION, CONTEXT_MISSING, FORMAT_MISMATCH, TIMEOUT)
summary — human-readable description of what went wrong

fix_target pattern	Default file path
`AGENTS.md`	`.eval-workspace/AGENTS.md`
`TOOLS.md`	`.eval-workspace/TOOLS.md`
`SOUL.md`	`.eval-workspace/SOUL.md`
`IDENTITY.md`	`.eval-workspace/IDENTITY.md`
`USER.md`	`.eval-workspace/USER.md`
`tool:{name}`	`picobot/agent/tools/{name}.py`
`tool_schema:{name}`	`picobot/agent/tools/{name}.py` (the JSON schema dict within the file)
`context`	`picobot/agent/context.py`
`eval_task:{id}`	`picobot/eval/tasks/level{N}.yaml` (extract level from the task id prefix)

Eval Improve

Eval Improve Skill

Input

Eval Improve

Eval Improve Skill

Input

Process

Target Resolution

Edit Rules by Target Type

Workspace files (AGENTS.md, TOOLS.md, SOUL.md, USER.md, IDENTITY.md)

Tool implementations (fix_target = `tool:{name}`)

Tool JSON schemas (fix_target = `tool_schema:{name}`)

System prompt assembly (fix_target = `context`)

Eval task definitions (fix_target = `eval_task:{id}`)

Forbidden Files

Output

Principles

Test

Feature Flags

Unit Tests

Integration Tests

Write Frontend Tests

Golang Testing

Eval Improve

Eval Improve Skill

Input

Eval Improve

Eval Improve Skill

Input

Process

Target Resolution

Edit Rules by Target Type

Workspace files (AGENTS.md, TOOLS.md, SOUL.md, USER.md, IDENTITY.md)

Tool implementations (fix_target = tool:{name})

Tool JSON schemas (fix_target = tool_schema:{name})

System prompt assembly (fix_target = context)

Eval task definitions (fix_target = eval_task:{id})

Forbidden Files

Output

Principles

Test

Feature Flags

Unit Tests

Integration Tests

Write Frontend Tests

Golang Testing

Tool implementations (fix_target = `tool:{name}`)

Tool JSON schemas (fix_target = `tool_schema:{name}`)

System prompt assembly (fix_target = `context`)

Eval task definitions (fix_target = `eval_task:{id}`)