Train models with verifiers environments using hosted RL or prime-rl. Use when asked to configure RL runs, tune key hyperparameters, diagnose instability, set up difficulty filtering and oversampling, or create practical train and eval loops for new environments.
Run stable RL training loops with environment-aware hyperparameter choices and clear diagnostics.
Hosted RL workflow:

```bash
prime lab setup
```

prime-rl workflow:

```bash
prime lab setup --prime-rl
uv run prime-rl configs/prime-rl/wiki-search.toml
```
Treat prime-rl as a power-user path and assume users are comfortable working with GPU infrastructure and troubleshooting. prime-rl training requires local GPU access.

Configure model endpoints in configs/endpoints.toml for eval and train loops. Non-reasoning models: gpt-4.1 series, qwen3 instruct series. Reasoning models: gpt-5 series, qwen3 thinking series, glm series.

Install the environment and smoke-test it with a small eval. Avoid --skip-upload unless the user explicitly requests that deviation:

```bash
prime env install my-env
prime eval run my-env -m openai/gpt-4.1-mini -n 20 -r 3 -s
```
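Eval and train loops read model endpoints from configs/endpoints.toml. Below is a minimal sketch of one entry, assuming an OpenAI-compatible endpoint; every key name here (name, model, base_url, api_key_var) is an illustrative assumption, so check your installed prime-rl version for the actual schema:

```toml
# Hypothetical configs/endpoints.toml entry -- key names are assumptions.
[[endpoints]]
name = "gpt-4.1-mini"          # alias referenced by eval/train loops
model = "openai/gpt-4.1-mini"  # non-reasoning default for smoke tests
base_url = "https://api.openai.com/v1"
api_key_var = "OPENAI_API_KEY" # environment variable holding the API key
```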
Push the environment with visibility PUBLIC or PRIVATE:

```bash
prime env push my-env --visibility PUBLIC
```

or

```bash
prime env push my-env --visibility PRIVATE
```
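The config keys covered in the tuning notes of this section can be collected into one sketch. This is a hedged illustration, not the canonical prime-rl config: the key names (rollouts_per_example, batch_size, buffer.online_difficulty_filtering, oversampling_factor, max_concurrent, max_async_level) come from this document, but the section layout and example values are assumptions; verify against a generated config.

```toml
# Hedged sketch of a training config; section names are assumptions.
[[env]]
id = "owner/my-env"        # the environment pushed above

[orchestrator]             # hypothetical section name
rollouts_per_example = 16  # rollouts per prompt (group size)
batch_size = 512           # total rollout samples per step, not groups;
                           # keep divisible by rollouts_per_example
max_concurrent = 64        # keep >= rollouts_per_example * workers_per_env
max_async_level = 2        # bound asynchrony to limit off-policy drift

[buffer]
online_difficulty_filtering = true
oversampling_factor = 2.0  # sample extra prompts, keep the informative ones
# Set easy_threshold / hard_threshold only after observing reward distributions.
```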
Reference the pushed environment as owner/my-env in the training config ([[env]].id).

Tune rollouts_per_example and batch_size together:
- Treat batch_size as total rollout samples per step, not number of groups.
- Keep batch_size divisible by rollouts_per_example.
- Smaller runs: rollouts_per_example = 8, batch_size = 128 (or lower).
- Larger runs: rollouts_per_example = 16, batch_size = 512 (common strong starting point).

Difficulty filtering and oversampling:
- Set buffer.online_difficulty_filtering = true.
- Use oversampling_factor > 1 (for example 2.0).
- Tune easy_threshold and hard_threshold only after observing reward distributions.

Concurrency and asynchrony:
- Ensure max_concurrent >= rollouts_per_example * workers_per_env.
- Bound asynchrony (max_async_level) and monitor off-policy drift.

Return: