SSH job queue for multi-seed/multi-config ML experiments with OOM-aware retry, stale-screen cleanup, and wave-transition race prevention. Use when the user says "batch experiments", "队列实验" ("queue experiments"), "run grid", "multi-seed sweep", or "auto-chain experiments", or when /run-experiment is insufficient for 10+ jobs that need orchestration.
Orchestrate large batches of ML experiments on SSH remote GPU servers with proper state tracking, OOM retry, stale cleanup, and wave transitions.
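Stale cleanup is the least obvious of these pieces. The following is a minimal sketch of its screen-session half. It assumes each job runs in a GNU `screen` session named `exp_<job_id>` (a hypothetical naming convention) and that the manifest supplies the set of job ids still marked RUNNING. The sketch only builds the cleanup commands; running them over SSH is left to the caller.

```python
import re

# Matches one session line of `screen -ls`, e.g.
# "\t12345.exp_a_s0\t(04/16/2026 10:00:00 AM)\t(Detached)"
SESSION_RE = re.compile(r"^\s*(\d+)\.(\S+)\s.*\((Attached|Detached|Dead[^)]*)\)\s*$")


def parse_screen_ls(output: str) -> list[tuple[str, str, str]]:
    """Return (pid, name, status) for every session listed by `screen -ls`."""
    sessions = []
    for line in output.splitlines():
        m = SESSION_RE.match(line)
        if m:
            sessions.append((m.group(1), m.group(2), m.group(3)))
    return sessions


def stale_screen_commands(output: str, running_jobs: set[str], prefix: str = "exp_") -> list[str]:
    """Build cleanup commands for screens that no longer belong to a running job.

    Assumption (hypothetical convention): each job's session is named
    `<prefix><job_id>`, and `running_jobs` holds the ids the manifest marks RUNNING.
    """
    cmds = []
    wipe = False
    for pid, name, status in parse_screen_ls(output):
        if not name.startswith(prefix):
            continue  # never touch sessions this queue did not create
        if status.startswith("Dead"):
            wipe = True  # dead sockets are removed by `screen -wipe`
        elif name[len(prefix):] not in running_jobs:
            cmds.append(f"screen -S {pid}.{name} -X quit")
    if wipe:
        cmds.append("screen -wipe")
    return cmds
```

Filtering on the prefix keeps the cleanup from killing unrelated sessions, such as an interactive notebook, that happen to share the server.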
Use when /run-experiment is insufficient:
Do NOT use for:
… /run-experiment)

Based on a session audit (2026-04-16), the major wall-clock sinks in multi-seed grid experiments are:
All of these are pure engineering friction, which orchestration can eliminate.
A manifest lists jobs with explicit state:
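The following is a minimal sketch of such a manifest. It assumes four states (PENDING, RUNNING, DONE, FAILED), an integer `wave` per job, and a budget of three attempts. The state names, fields, and the `Job`, `on_exit`, and `next_launchable` helpers are hypothetical and are not this skill's actual schema. The sketch shows two of the behaviours named above:

- An exit whose log tail contains PyTorch's `CUDA out of memory` message is requeued rather than failed.
- A wave is released only once every job in earlier waves is terminal.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class State(str, Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"


TERMINAL = {State.DONE, State.FAILED}
OOM_MARKER = "CUDA out of memory"  # substring of PyTorch's OOM error message


@dataclass
class Job:  # hypothetical manifest entry
    job_id: str
    wave: int
    cmd: str
    state: State = State.PENDING
    attempts: int = 0
    max_attempts: int = 3  # assumed retry budget


def on_exit(job: Job, exit_code: int, log_tail: str) -> None:
    """Update a RUNNING job's state from its exit code and the tail of its log."""
    job.attempts += 1
    if exit_code == 0:
        job.state = State.DONE
    elif OOM_MARKER in log_tail and job.attempts < job.max_attempts:
        job.state = State.PENDING  # OOM assumed transient (e.g. shared GPU): requeue
    else:
        job.state = State.FAILED  # real crash, or OOM retries exhausted


def next_launchable(jobs: list[Job]) -> list[Job]:
    """Return PENDING jobs of the lowest wave that still has unfinished jobs.

    No job of wave N+1 is returned while any wave-N job is PENDING or RUNNING,
    so the next wave cannot start while the previous one is still in flight.
    """
    open_waves = [j.wave for j in jobs if j.state not in TERMINAL]
    if not open_waves:
        return []
    current = min(open_waves)
    return [j for j in jobs if j.wave == current and j.state == State.PENDING]


def manifest_json(jobs: list[Job]) -> str:
    """Serialize the manifest, writing each state as its plain string value."""
    return json.dumps([{**asdict(j), "state": j.state.value} for j in jobs], indent=2)
```

Treating FAILED as terminal means one crashed seed does not stall the next wave. Whether that is the right policy depends on whether later waves consume earlier results.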