Debug stuck Hawk/Inspect AI evaluations. Use when user mentions "stuck eval", "eval not progressing", "eval hanging", "samples not completing", "eval set frozen", "runner stuck", "500 errors in eval", "retry loop", "eval timeout", or asks why an evaluation isn't finishing.
Before debugging, confirm you're authenticated:

```sh
hawk auth access-token > /dev/null || echo "Run 'hawk login' first"
```

Core commands:

- `hawk status <eval-set-id>` - JSON report with pod state, logs, and metrics
- `hawk logs <eval-set-id>`, or `hawk logs -f` for follow mode
- `hawk list samples <eval-set-id>` - see completion status

| Log Pattern | Meaning | Resolution |
|---|---|---|
| `[uuid task/id/epoch model] Retrying request to /responses` | OpenAI SDK retry with sample context | Test the API directly with curl to see the real error |
| `[uuid task/id/epoch model] -> model retry N ... [ErrorType code]` | Inspect retry with error summary | Check the error type; use curl for full details |
| `500 - Internal server error` | API issue | Download the buffer, find the failing request, test through middleman AND directly against the provider |
| `400 - invalid_request_error` | Token/context limit exceeded | Check message count and the model's context window |
| `Pod UID mismatch` | Sandbox pod was killed and restarted | No fix needed; the sample errored out and Inspect will retry |
| Empty output, `pending: true` | API returned malformed response | Restart the eval (buffer resumes) |
| `OOMKilled` in pod status | Memory exhaustion | Increase pod memory limits |
- Retry log lines carry a `[sample_uuid task/sample_id/epoch model]` prefix. Inspect's own retries also include a compact error-summary suffix like `[RateLimitError 429 rate_limit_exceeded]`. The OpenAI SDK's internal retry messages still don't show the actual error; use curl for full details.
- Download the sample `.buffer/` from S3 rather than accessing the runner pod directly.
- Use `from inspect_ai.log import read_eval_log` instead of manually extracting zips.
- Middleman is the auth proxy. If middleman fails but direct provider calls work, it's a middleman issue.
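A sketch of reading a log with Inspect's reader rather than unzipping the `.eval` file by hand. The log path is a placeholder, and the `.status`/`.samples`/`.error` field names follow Inspect's `EvalLog` model as I understand it; verify against your installed `inspect_ai` version.

```python
def count_errored(samples) -> int:
    """Count samples that recorded an error (intended for EvalLog.samples)."""
    return sum(1 for s in samples if getattr(s, "error", None) is not None)

if __name__ == "__main__":
    # Assumed API: read_eval_log parses a .eval log into an EvalLog object.
    from inspect_ai.log import read_eval_log

    log = read_eval_log("logs/2024-.../task.eval")  # placeholder path
    print(f"status={log.status}, errored samples={count_errored(log.samples or [])}")
```

A high errored-sample count alongside `status=started` is a hint that the eval is churning through retries rather than progressing.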
```sh
TOKEN=$(hawk auth access-token)

# Test through middleman
curl --max-time 300 -X POST https://middleman.internal.metr.org/anthropic/v1/messages \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"model": "claude-sonnet-4-20250514", "max_tokens": 100, "messages": [{"role": "user", "content": "Say hello"}]}'

# Test the OpenAI-compatible endpoint
curl --max-time 300 -X POST https://middleman.internal.metr.org/openai/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 100}'
```
```sh
# Delete the stuck eval and restart it
hawk delete <eval-set-id>
hawk eval-set <config.yaml>
```
The sample buffer in S3 lets Inspect resume from where it left off (unless you pass `--no-resume`).
Task progress logs include `HTTP retries: X`. High retry counts indicate API instability even while tasks complete.
Severity: retry count × wait time = stuck duration. E.g., 45 retries × 1800 s ≈ 22.5 hours stuck.
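The severity formula above as a worked example; `stuck_hours` is a hypothetical helper name, and the values mirror the 45-retries-at-1800-second example in the text.

```python
def stuck_hours(retries: int, wait_seconds: float) -> float:
    """Estimate how long a sample has been stuck: retries x per-retry wait, in hours."""
    return retries * wait_seconds / 3600

# Mirrors the example above: 45 retries at an 1800 s wait.
print(stuck_hours(45, 1800))  # 22.5
```

Anything in the tens of hours is a signal to kill and restart the eval rather than wait out the backoff.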
See docs/debugging-stuck-evals.md for more detail.