Try Model On Env

Skill file

Smoke-test a model on one sample from a reasoning-gym environment. Shows the exact prompt the model sees, runs inference, then evaluates the environment's own format and correctness reward functions. Use when you want to quickly verify a model or checkpoint works correctly on a specific task.

sol0invictus0 · February 28, 2026

Occupation
Category: Machine Learning

Skill content

Smoke-test a model against one sample from a reasoning-gym environment.
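As a rough picture of what such a smoke test involves, the sketch below runs one prompt through a model and scores the completion for format and correctness. Everything in it is an assumption made for illustration: the names `format_reward`, `correctness_reward`, `smoke_test`, and `fake_model` are hypothetical, the tag pattern is a guess at a typical `<think>`/`<answer>` layout, and exact-match scoring stands in for the environment's own reward functions, which the real diagnostic script uses instead.

```python
import re
from typing import Callable

# Assumed layout: reasoning inside <think>, final answer inside <answer>.
TAGGED = re.compile(r"<think>.*?</think>\s*<answer>(.*?)</answer>", re.DOTALL)


def format_reward(completion: str) -> float:
    """Return 1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if TAGGED.search(completion) else 0.0


def correctness_reward(completion: str, expected: str) -> float:
    """Return 1.0 if the text inside <answer> equals the expected answer.

    Exact string match is a simplification; real environments score answers
    with their own task-specific logic.
    """
    match = TAGGED.search(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == expected.strip() else 0.0


def smoke_test(prompt: str, expected: str, generate: Callable[[str], str]) -> dict:
    """Run one sample through `generate` and collect both reward values."""
    completion = generate(prompt)
    return {
        "prompt": prompt,
        "completion": completion,
        "format": format_reward(completion),
        "correct": correctness_reward(completion, expected),
    }


def fake_model(prompt: str) -> str:
    """Stub standing in for real model inference."""
    return "<think>3 + 4 = 7</think><answer>7</answer>"


report = smoke_test("What is 3 + 4?", "7", fake_model)
print(report["prompt"])
print(report["completion"])
print(report["format"], report["correct"])
```

Passing the model as a plain callable keeps the scoring logic independent of how inference is done, so the same check applies to a HuggingFace ID or a local checkpoint.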

Arguments

$ARGUMENTS[0] — env_name (required): reasoning-gym task name, e.g. countdown, maze, gsm8k. If missing, ask the user before proceeding.
$ARGUMENTS[1] — model_name (optional): HuggingFace ID or local checkpoint path. Default: Qwen/Qwen2.5-0.5B-Instruct
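The argument rules above could be handled along these lines. This is a minimal sketch, not the skill's actual parsing code: `parse_skill_args` and `SkillArgs` are hypothetical names, and returning `None` is just one way to signal that the caller should ask the user for the environment name.

```python
from typing import NamedTuple, Optional, Sequence

# Default model stated in the argument list above.
DEFAULT_MODEL = "Qwen/Qwen2.5-0.5B-Instruct"


class SkillArgs(NamedTuple):
    env_name: str
    model_name: str


def parse_skill_args(argv: Sequence[str]) -> Optional[SkillArgs]:
    """Parse [env_name, model_name?].

    Returns None when env_name is missing or blank, so the caller can
    ask the user instead of guessing a task.
    """
    if not argv or not argv[0].strip():
        return None
    has_model = len(argv) > 1 and bool(argv[1].strip())
    model_name = argv[1] if has_model else DEFAULT_MODEL
    return SkillArgs(env_name=argv[0], model_name=model_name)


print(parse_skill_args(["countdown"]))
print(parse_skill_args(["maze", "./checkpoints/step-100"]))
print(parse_skill_args([]))
```

The checkpoint path in the second call is a made-up example value.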

Steps

Run the diagnostic script from the project root with a 10-minute timeout:

python .claude/skills/try-model-on-env/scripts/diagnostic.py $ARGUMENTS
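If the command is launched from Python rather than a shell, the 10-minute limit can be enforced with the standard library. The sketch below is one possible wrapper under that assumption; `run_with_timeout` and `diagnostic_command` are hypothetical helper names, and only the script path is taken from the command above.

```python
import subprocess
import sys
from typing import List, Sequence

TIMEOUT_SECONDS = 10 * 60  # the 10-minute limit from the step above
SCRIPT = ".claude/skills/try-model-on-env/scripts/diagnostic.py"


def diagnostic_command(args: Sequence[str]) -> List[str]:
    """Build the command line; must be run from the project root."""
    return [sys.executable, SCRIPT, *args]


def run_with_timeout(
    cmd: Sequence[str], timeout_s: float = TIMEOUT_SECONDS
) -> subprocess.CompletedProcess:
    """Run `cmd` and capture its output as text.

    Raises subprocess.TimeoutExpired if the process exceeds `timeout_s`.
    """
    return subprocess.run(
        list(cmd), capture_output=True, text=True, timeout=timeout_s
    )


# Harmless demonstration command; the real call would be
# run_with_timeout(diagnostic_command(["countdown"])).
demo = run_with_timeout([sys.executable, "-c", "print('ok')"])
print(demo.returncode, demo.stdout.strip())
```

Capturing stdout and stderr makes it straightforward to show the full output afterwards, and a `TimeoutExpired` exception gives a clear signal that the model load or generation stalled.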

Show the full output, calling out each phase clearly:
- System prompt check — does it include <think> and <answer> tag instructions?