Delegate analysis tasks to GitHub Copilot CLI as a parallel subagent for MotrixLab RL project. Handles automated policy playback frame capture, VLM-based visual behavior analysis, screenshot analysis, image file inspection, simulation frame interpretation, reward curve analysis, and general research conversations.
Use the Copilot CLI subagent for analysis tasks in MotrixLab RL workflows:
NOT for: training execution, `train.py`/`view.py`/`play.py` commands, or TensorBoard launching.
Use this skill for analysis, not execution.
IMPORTANT: Always use the free gpt-4.1 model:
copilot --model gpt-4.1 ...
| Model | Cost | Use Case |
|---|---|---|
| gpt-4.1 | Free | Always use this |
| gpt-5, claude-opus-4.5 | Premium | Avoid unless explicitly requested |
The capture_vlm.py script automates the full pipeline: play policy → capture frames → send to VLM → get visual analysis report.
# Play best policy, capture 20 frames, analyze with gpt-4.1
uv run scripts/capture_vlm.py --env <env-name>
# Specify policy, capture settings, and custom VLM focus
uv run scripts/capture_vlm.py --env <env-name> \
--policy runs/<env-name>/.../best_agent.pt \
--capture-every 30 --max-frames 30 \
--vlm-prompt "Focus on leg coordination and whether the robot reaches the target"
# Capture only (no VLM), analyze later manually
uv run scripts/capture_vlm.py --env <env-name> --no-vlm
# Use a different VLM model
uv run scripts/capture_vlm.py --env <env-name> --vlm-model gpt-4.1
| Flag | Default | Description |
|---|---|---|
| --env | (required) | Environment name |
| --policy | auto-discover | Policy checkpoint path |
| --train-backend | torch | Inference backend (torch/jax) |
| --num-envs | 1 | Parallel envs for playback |
| --capture-every | 15 | Capture a frame every N sim steps |
| --max-frames | 20 | Total frames to capture |
| --warmup-steps | 30 | Steps before capture starts |
| --capture-delay | 0.15 | Seconds to wait before each screenshot |
| --no-vlm | false | Skip VLM, only save frames |
| --vlm-model | gpt-4.1 | Copilot CLI model |
| --vlm-prompt | (default) | Custom analysis focus |
| --vlm-batch-size | 10 | Frames per VLM call |
| --output-dir | auto | Frame output directory |
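Given the defaults above, the capture schedule can be sketched as follows (an approximation assuming frames are taken every `--capture-every` steps once `--warmup-steps` have elapsed; the exact timing inside `capture_vlm.py` may differ):

```python
def capture_steps(warmup=30, every=15, max_frames=20):
    """Approximate sim steps at which frames are captured, from the default flags."""
    return [warmup + i * every for i in range(max_frames)]

# With the defaults, frames land at steps 30, 45, 60, ..., 315.
```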
If `--policy` is omitted, the script auto-discovers the latest checkpoint. Frames are saved under `starter_kit_log/vlm_captures/{env}/{timestamp}/` and sent to the VLM (`gpt-4.1`) with a structured analysis prompt, which produces a `vlm_analysis.md` report. Output layout:
starter_kit_log/vlm_captures/<env-name>/<timestamp>/
├── frame_00045.png # Captured simulation frames
├── frame_00060.png
├── frame_00075.png
├── ...
├── capture_metadata.txt # Run configuration
└── vlm_analysis.md # VLM analysis report
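Because each run gets its own timestamped directory, a small helper can locate the newest one for follow-up analysis (a sketch, not project code; it assumes the timestamp directory names sort lexicographically in chronological order):

```python
from pathlib import Path

def latest_capture(env_name, root="starter_kit_log/vlm_captures"):
    """Return the newest timestamped capture directory for an env, or None."""
    base = Path(root, env_name)
    if not base.is_dir():
        return None
    runs = sorted(p for p in base.iterdir() if p.is_dir())
    return runs[-1] if runs else None
```

The returned path can then be fed to `--add-dir` for a re-analysis pass.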
# Train, then immediately check visual quality
uv run scripts/train.py --env <env-name> --train-backend torch
uv run scripts/capture_vlm.py --env <env-name> --max-frames 15
# Capture frames from policy A
uv run scripts/capture_vlm.py --env <env-name> \
--policy runs/.../checkpoint_A.pt --output-dir analysis/policy_a
# Capture frames from policy B
uv run scripts/capture_vlm.py --env <env-name> \
--policy runs/.../checkpoint_B.pt --output-dir analysis/policy_b
# Compare with VLM
copilot --model gpt-4.1 --allow-all \
--add-dir analysis/policy_a --add-dir analysis/policy_b \
-p "Compare robot behavior between policy_a/ and policy_b/ frames. Which policy has better gait, navigation, and stability?" -s
# Just capture
uv run scripts/capture_vlm.py --env <env-name> --no-vlm
# Analyze specific frames later
copilot --model gpt-4.1 --allow-all \
--add-dir starter_kit_log/vlm_captures/<env-name>/latest \
-p "Examine frame_00060.png and frame_00075.png — the robot seems to stumble. What's happening?" -s
# VLM focus on leg issues
uv run scripts/capture_vlm.py --env <env-name> \
--vlm-prompt "The robot's rear legs seem to drag. Focus on rear leg joint angles and contact patterns."
# VLM focus on navigation failures
uv run scripts/capture_vlm.py --env <env-name> \
--vlm-prompt "The robot circles instead of going straight to target. Analyze heading and path curvature."
$result = copilot --model gpt-4.1 --allow-all -p "<prompt>" -s
| Flag | Purpose |
|---|---|
| --model gpt-4.1 | Use free model |
| --allow-all | Grant all permissions (file access, tools) |
| -p "<prompt>" | Execute this prompt non-interactively |
| -s | Silent mode (output only agent response) |
# Add MotrixLab directories for context
copilot --model gpt-4.1 --allow-all --add-dir d:\MotrixLab\starter_kit\<task> -p "<prompt>" -s
# Multiple directories
copilot --model gpt-4.1 --allow-all --add-dir d:\MotrixLab\starter_kit --add-dir d:\MotrixLab\runs -p "<prompt>" -s
# Start interactive session for complex analysis
copilot --model gpt-4.1 --allow-all -i "Let's analyze the VBot navigation reward structure together"
# After view.py or train.py --render is running, capture window
# Use external screenshot tool, then analyze:
$analysis = copilot --model gpt-4.1 --allow-all -p "Look at d:\MotrixLab\screenshots\vbot_render.png. Describe: 1) Robot pose and stance 2) Terrain features visible 3) Distance from goal marker 4) Any collision or instability signs" -s
# Analyze multiple frames from a training run
$frames = copilot --model gpt-4.1 --allow-all --add-dir d:\MotrixLab\renders\episode_001 -p "Examine all PNG frames in this directory. Create a timeline of robot behavior: stance changes, terrain traversal progress, any falls or recovery attempts." -s
# Re-analyze previously captured frames with a new prompt
$captureDir = "starter_kit_log/vlm_captures/<env-name>/<timestamp>"
copilot --model gpt-4.1 --allow-all --add-dir $captureDir -p "Re-examine these policy evaluation frames. This time focus specifically on: 1) Whether the robot reaches the target platform 2) Any reward hacking behavior 3) Energy efficiency of the gait" -s
# Capture pre and post action states
$comparison = copilot --model gpt-4.1 --allow-all -p "Compare d:\MotrixLab\screenshots\before_stairs.png and d:\MotrixLab\screenshots\after_stairs.png. Did the VBot successfully ascend? What gait pattern is visible?" -s
# When robot falls or fails navigation
$diagnosis = copilot --model gpt-4.1 --allow-all -p "Look at d:\MotrixLab\screenshots\failure_frame.png. What caused the VBot to fail? Check: 1) Leg configuration 2) Body orientation 3) Terrain interaction 4) Likely cause of termination" -s
# Analyze competition instructions
$task_info = copilot --model gpt-4.1 --allow-all -p "Read d:\MotrixLab\starter_kit\<task>\<task>.pdf. Summarize: 1) Task objectives 2) Terrain description 3) Scoring criteria 4) Time limits 5) Robot constraints" -s
# Side-by-side comparison
$comparison = copilot --model gpt-4.1 --allow-all -p "Read all PDF files from d:\MotrixLab\starter_kit\. Create a comparison table of: terrain complexity, distance, time limits, scoring weights." -s
# If you exported TensorBoard plots as images
$rewards = copilot --model gpt-4.1 --allow-all -p "Look at d:\MotrixLab\analysis\reward_curve.png. Describe the training progress: 1) Convergence pattern 2) Reward scale 3) Any plateaus or instabilities 4) Estimated episodes to convergence" -s
# Multiple metric plots
$metrics = copilot --model gpt-4.1 --allow-all --add-dir d:\MotrixLab\analysis\tensorboard_exports -p "Examine all training plots. For each metric (episode_reward, policy_loss, value_loss, entropy): describe trend, identify anomalies, suggest hyperparameter adjustments." -s
# Compare two different hyperparameter settings
$comparison = copilot --model gpt-4.1 --allow-all -p "Compare reward curves: d:\MotrixLab\analysis\run_lr001.png vs d:\MotrixLab\analysis\run_lr0001.png. Which learning rate produces faster convergence? More stable training?" -s
# Deep dive into reward function design
$rewards = copilot --model gpt-4.1 --allow-all -p "Read d:\MotrixLab\starter_kit\<task>\vbot\cfg.py. Analyze the RewardConfig class: 1) List all reward components with weights 2) Identify potential reward hacking risks 3) Suggest improvements for navigation task" -s
# Understand the neural network structure
$arch = copilot --model gpt-4.1 --allow-all --add-dir d:\MotrixLab\motrix_rl\src\motrix_rl -p "Find the PPO policy network definition. Describe: layer sizes, activation functions, observation preprocessing, action output format." -s
# Understand terrain and physics setup
$scene = copilot --model gpt-4.1 --allow-all -p "Read d:\MotrixLab\starter_kit\<task>\vbot\xmls\<scene>.xml. Describe: 1) Terrain geometry 2) Obstacle placements 3) Goal marker positions 4) Physics parameters" -s
# Ask subagent to research while you work on something else
$research = copilot --model gpt-4.1 --allow-all --add-dir d:\MotrixLab -p "Research question: What curriculum learning strategies would help VBot learn stair climbing? Consider the reward structure in cfg.py and suggest a 3-stage curriculum." -s
Write-Host "Research findings: $research"
# Have subagent analyze your hypothesis
$test = copilot --model gpt-4.1 --allow-all -p "Hypothesis: The VBot fails on stairs because heading_tracking reward conflicts with position_tracking when approaching stairs at an angle. Analyze cfg.py reward weights and confirm or refute this." -s
# Connect project to RL best practices
$lit = copilot --model gpt-4.1 --allow-all -p "Compare the PPO hyperparameters in d:\MotrixLab\motrix_rl\src\motrix_rl\skrl\cfg.py against recommended settings from Schulman et al. (2017) and SKRL documentation. What should be adjusted for locomotion tasks?" -s
The subagent is stateless: each invocation is independent. To exchange information between calls:
# Get analysis result and use it
$reward_analysis = copilot --model gpt-4.1 --allow-all -p "What is the termination penalty in <task> cfg.py?" -s
if ($reward_analysis -match "termination") {
Write-Host "Termination penalty detected - robot will be conservative"
}
# First: identify the problem
$problem = copilot --model gpt-4.1 --allow-all -p "Look at d:\MotrixLab\screenshots\failure.png. What type of failure is this?" -s
# Second: suggest fix based on problem
$solution = copilot --model gpt-4.1 --allow-all -p "The VBot failure type is: $problem. What reward modifications in cfg.py would prevent this?" -s
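Chained calls like these can also be driven from Python via `subprocess`. The sketch below only assembles the argument list (`build_copilot_cmd` is a hypothetical helper, not part of the project), which keeps the command construction testable without actually invoking the CLI:

```python
def build_copilot_cmd(prompt, add_dirs=(), model="gpt-4.1"):
    """Assemble a non-interactive, silent Copilot CLI invocation."""
    cmd = ["copilot", "--model", model, "--allow-all"]
    for d in add_dirs:
        cmd += ["--add-dir", str(d)]
    cmd += ["-p", prompt, "-s"]
    return cmd

# Pass the result to subprocess.run(cmd, capture_output=True, text=True)
# and read .stdout to capture the agent's reply for the next prompt.
```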
# Generate and save detailed analysis
copilot --model gpt-4.1 --allow-all -p "Analyze d:\MotrixLab\starter_kit\<task>\vbot\cfg.py thoroughly. Output a markdown report covering: environment overview, reward breakdown, training recommendations." -s > d:\MotrixLab\analysis\task_report.md
# One-command visual debugging: play policy, capture frames, get VLM diagnosis
uv run scripts/capture_vlm.py --env <env-name> \
--max-frames 25 --capture-every 10 \
--vlm-prompt "This policy was trained for 5M steps but the robot seems to fall. Diagnose the issue."
# Read the analysis report
Get-Content starter_kit_log/vlm_captures/<env-name>/*/vlm_analysis.md
$errorDir = "d:\MotrixLab\debug\episode_failure_$(Get-Date -Format 'yyyyMMdd_HHmmss')"
New-Item -ItemType Directory -Path $errorDir -Force
# After capturing failure frames to $errorDir...
$diagnosis = copilot --model gpt-4.1 --allow-all --add-dir $errorDir -p "Examine all frames in this directory showing a failed episode. Create a failure timeline: 1) Initial state 2) Critical moment before failure 3) Failure frame 4) Root cause analysis" -s
Write-Host "Failure diagnosis: $diagnosis"
# Analyze locomotion quality
$gait = copilot --model gpt-4.1 --allow-all --add-dir d:\MotrixLab\renders\locomotion_test -p "Examine the sequence of VBot frames. Analyze gait quality: 1) Foot contact pattern 2) Stride symmetry 3) Body stability 4) Compare to typical quadruped trotting gait" -s
| Context | Prompt Template |
|---|---|
| Screenshot state | "What is the robot's current pose? Is it stable or about to fall?" |
| Failure diagnosis | "What caused this failure? Check leg positions, terrain contact, body angle." |
| Reward analysis | "Is this reward structure balanced? Any risk of reward hacking?" |
| Config comparison | "What differs between these configs? Which is harder?" |
| Progress check | "Based on this reward curve, is training converging? How many more steps needed?" |
| Gait quality | "Is this a healthy quadruped gait? What locomotion issues are visible?" |
| File | Analysis Purpose |
|---|---|
| scripts/capture_vlm.py | VLM frame capture + analysis pipeline |
| starter_kit/{task}/vbot/cfg.py | Reward structure, env params |
| starter_kit/{task}/vbot/vbot_*_np.py | Reward function implementation |
| starter_kit/{task}/vbot/xmls/*.xml | Scene geometry, physics |
| motrix_rl/src/motrix_rl/cfgs.py | PPO hyperparameters |
| runs/*/checkpoints/ | Trained policy analysis |
| starter_kit_log/vlm_captures/ | VLM capture outputs + analysis reports |
| starter_kit_docs/{task}/Task_Reference.md | Task-specific env IDs, reward scales, terrain data |
Full list of environment IDs, terrains, and packages is documented in:
- starter_kit_docs/navigation1/Task_Reference.md → "Environment IDs" section
- starter_kit_docs/navigation2/Task_Reference.md → "Environment IDs" section
- Use capture_vlm.py first: for policy visual analysis, always prefer the automated pipeline over manual screenshots
- Capture subagent output into a variable: $result = copilot ...
- Use the -s flag: silent mode gives clean output for parsing
- Use --add-dir when the subagent needs multiple files
- Keep --vlm-batch-size ≤ 10 to avoid context overflow
- Use --vlm-prompt to focus the VLM on specific suspected bugs

| Limitation | Workaround |
|---|---|
| Stateless between calls | Chain invocations, pass context in prompts |
| Cannot run training | Use for analysis only, execute commands yourself |
| Large images slow | Crop or resize before analysis |
| PDF parsing imperfect | Ask specific questions, verify key details |
| Model constrained | Must use gpt-4.1 for free tier |
| Screenshot captures full screen | Position MotrixSim window prominently before capture |
| No in-engine camera capture | Scene XMLs lack <camera> definitions; uses PIL ImageGrab instead |
| VLM batch size limit | Keep ≤ 10 frames per batch to avoid token limits |
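The batch-size limit can be respected with a simple chunking helper when driving the VLM manually (a sketch mirroring what `--vlm-batch-size` presumably does inside `capture_vlm.py`; the function name is illustrative):

```python
def batch_frames(frame_paths, batch_size=10):
    """Split frame paths into VLM-sized batches to stay under token limits."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [frame_paths[i:i + batch_size]
            for i in range(0, len(frame_paths), batch_size)]
```

Each batch then becomes one `copilot ... -p "..." -s` call, with results concatenated into the final report.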