Train models with verifiers environments using hosted RL or prime-rl. Use when asked to configure RL runs, tune key hyperparameters, diagnose instability, set up difficulty filtering and oversampling, or create practical train and eval loops for new environments.
Run stable RL training loops with environment-aware hyperparameter choices and clear diagnostics.
Hosted RL workflow:

```bash
prime lab setup
```

prime-rl workflow:

```bash
prime lab setup --prime-rl
uv run prime-rl configs/prime-rl/wiki-search.toml
```
Treat prime-rl as a power-user path and assume users are comfortable working with GPU infrastructure and troubleshooting. prime-rl training requires local GPU access.

Configure model endpoints in configs/endpoints.toml for eval and train loops. Non-reasoning models: gpt-4.1 series, qwen3 instruct series. Reasoning models: gpt-5 series, qwen3 thinking series, glm series.

Install the environment and smoke-test it with a small eval. Avoid --skip-upload unless the user explicitly requests that deviation:

```bash
prime env install my-env
prime eval run my-env -m openai/gpt-4.1-mini -n 20 -r 3 -s
```
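Eval and train loops read model endpoints from configs/endpoints.toml. Below is a minimal sketch of one entry, assuming an OpenAI-compatible endpoint; every key name here (name, model, base_url, api_key_var) is an illustrative assumption, so check your installed prime-rl version for the actual schema:

```toml
# Hypothetical configs/endpoints.toml entry -- key names are assumptions.
[[endpoints]]
name = "gpt-4.1-mini"          # alias referenced by eval/train loops
model = "openai/gpt-4.1-mini"  # non-reasoning default for smoke tests
base_url = "https://api.openai.com/v1"
api_key_var = "OPENAI_API_KEY" # environment variable holding the API key
```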
Push the environment with visibility PUBLIC or PRIVATE:

```bash
prime env push my-env --visibility PUBLIC
```

or

```bash
prime env push my-env --visibility PRIVATE
```
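The config keys covered in the tuning notes of this section can be collected into one sketch. This is a hedged illustration, not the canonical prime-rl config: the key names (rollouts_per_example, batch_size, buffer.online_difficulty_filtering, oversampling_factor, max_concurrent, max_async_level) come from this document, but the section layout and example values are assumptions; verify against a generated config.

```toml
# Hedged sketch of a training config; section names are assumptions.
[[env]]
id = "owner/my-env"        # the environment pushed above

[orchestrator]             # hypothetical section name
rollouts_per_example = 16  # rollouts per prompt (group size)
batch_size = 512           # total rollout samples per step, not groups;
                           # keep divisible by rollouts_per_example
max_concurrent = 64        # keep >= rollouts_per_example * workers_per_env
max_async_level = 2        # bound asynchrony to limit off-policy drift

[buffer]
online_difficulty_filtering = true
oversampling_factor = 2.0  # sample extra prompts, keep the informative ones
# Set easy_threshold / hard_threshold only after observing reward distributions.
```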
Reference the pushed environment as owner/my-env in the training config ([[env]].id).

Tune rollouts_per_example and batch_size together:
- Treat batch_size as total rollout samples per step, not number of groups.
- Keep batch_size divisible by rollouts_per_example.
- Smaller runs: rollouts_per_example = 8, batch_size = 128 (or lower).
- Larger runs: rollouts_per_example = 16, batch_size = 512 (common strong starting point).

Difficulty filtering and oversampling:
- Set buffer.online_difficulty_filtering = true.
- Use oversampling_factor > 1 (for example 2.0).
- Tune easy_threshold and hard_threshold only after observing reward distributions.

Concurrency and asynchrony:
- Ensure max_concurrent >= rollouts_per_example * workers_per_env.
- Bound asynchrony (max_async_level) and monitor off-policy drift.

Return: