Machine Learning
RL Training Debugger
A systematic, four-phase diagnostic process for identifying and fixing RL training failures. Use when (1) reward is not growing or has plateaued, (2) gradients are exploding or vanishing, (3) entropy has collapsed or the policy has died, (4) Q-values are diverging in SAC, (5) the user says "debug training", "reward stuck", or "why isn't it learning", (6) validating a trainer before real training, (7) running probe environments, (8) the user mentions "explained variance", "advantage stats", "action collapse", or any diagnostic metric, (9) running a pre-flight check before training, (10) debugging a reward function, analyzing reward components, or measuring reward SNR, (11) verifying an environment, checking observation sufficiency, or diagnosing action space issues, (12) assessing task design, curriculum necessity, or exploration problems. Works with PPO, SAC, MM-RKHS, and any TorchRL-based trainer.
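As a minimal illustration of two of the diagnostic metrics named above, explained variance and advantage stats can be computed from rollout data like this. This is a hedged sketch, not part of the skill itself: the function names and the batch-level summary format are illustrative choices, and real trainers typically log these per update from their own buffers.

```python
from statistics import mean, pstdev, pvariance

def explained_variance(values, returns):
    """1 - Var(returns - values) / Var(returns).

    Near 1.0: the value function predicts returns well.
    Near 0 or negative: the critic carries little or misleading signal,
    a common symptom behind "reward stuck" PPO runs.
    """
    var_r = pvariance(returns)
    if var_r == 0.0:
        return 0.0  # constant returns: explained variance is undefined
    residuals = [r - v for v, r in zip(values, returns)]
    return 1.0 - pvariance(residuals) / var_r

def advantage_stats(advantages):
    """Summary stats of an advantage batch.

    A std near zero means a flat learning signal; extreme min/max
    values can point at reward-scale or normalization problems.
    """
    return {
        "mean": mean(advantages),
        "std": pstdev(advantages),
        "min": min(advantages),
        "max": max(advantages),
    }
```

A perfect critic yields `explained_variance(values, values) == 1.0`; a critic that predicts a constant yields roughly 0.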