Skill file

Launch Experiment

Name: Launch Experiment
Author: hao-ai-lab

Generate and execute a training launch command for FastVideo models

hao-ai-lab | 3,401 stars | 9 March 2026

Profession: Data Scientists
Categories: Machine Learning

Skill content

Purpose

Construct a fully specified torchrun training command for a FastVideo model, given a target pipeline, dataset, and hyperparameter overrides. This skill automates the boilerplate: setting environment variables, picking the right entrypoint, and applying defaults from the closest example script.

Prerequisites

The repo is cloned and fastvideo is installed (uv pip install -e ".[dev]").
Dataset is preprocessed (see docs/training/data_preprocess.md).
WANDB_API_KEY is set in the environment (or WANDB_MODE=offline for local).
GPU resources are available (multi-GPU requires NCCL).

Inputs

Parameter	Required	Description
`pipeline`	Yes	Training pipeline type: `finetune`, `distill-dmd`, `self-forcing`, `lora`, `consistency`

Steps

1. Identify the training entrypoint

Pipeline	Entrypoint
`finetune` (Wan T2V)	`fastvideo/training/wan_training_pipeline.py`
`finetune` (Wan I2V)	`fastvideo/training/wan_i2v_training_pipeline.py`
`finetune` (LTX-2)	`fastvideo/training/ltx2_training_pipeline.py`
`finetune` (MatrixGame)	`fastvideo/training/matrixgame_training_pipeline.py`
`distill-dmd`	`fastvideo/training/wan_distillation_pipeline.py`
`self-forcing`	`fastvideo/training/wan_self_forcing_distillation_pipeline.py`
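
The lookup in the table above can be sketched as a small shell function. The function name `resolve_entrypoint` and the model keys used to tell the `finetune` variants apart (`wan-t2v-*`, `wan-i2v-*`, `ltx2`, `matrixgame`) are assumptions for illustration. No entrypoint is listed here for `lora` or `consistency`, so the sketch reports them as unresolved rather than guessing.

```shell
#!/bin/sh
# Hypothetical helper: map (pipeline, model) to the entrypoint from the table.
resolve_entrypoint() {
    pipeline="$1"
    model="$2"
    case "$pipeline" in
        finetune)
            case "$model" in
                wan-t2v-*)  echo "fastvideo/training/wan_training_pipeline.py" ;;
                wan-i2v-*)  echo "fastvideo/training/wan_i2v_training_pipeline.py" ;;
                ltx2)       echo "fastvideo/training/ltx2_training_pipeline.py" ;;
                matrixgame) echo "fastvideo/training/matrixgame_training_pipeline.py" ;;
                *) echo "unknown finetune model: $model" >&2; return 1 ;;
            esac ;;
        distill-dmd)  echo "fastvideo/training/wan_distillation_pipeline.py" ;;
        self-forcing) echo "fastvideo/training/wan_self_forcing_distillation_pipeline.py" ;;
        # Pipelines without a listed entrypoint are an error, not a guess.
        *) echo "no entrypoint listed for pipeline: $pipeline" >&2; return 1 ;;
    esac
}
```

Failing loudly on an unlisted pipeline keeps the launch from silently starting the wrong training script.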

2. Resolve default hyperparameters

Model	Example Script Directory
`wan-t2v-1.3B`	`examples/training/finetune/wan_t2v_1.3B/crush_smol/`
`wan-i2v-14B`	`examples/training/finetune/wan_i2v_14B_480p/crush_smol/`
`ltx2`	`examples/training/finetune/ltx2/`
`matrixgame`	`examples/training/finetune/MatrixGame2.0/`
`distill-dmd`	`scripts/distill/v1_distill_dmd_wan.sh`

3. Set environment variables

export WANDB_API_KEY="${WANDB_API_KEY}"
export WANDB_BASE_URL="https://api.wandb.ai"
export FASTVIDEO_ATTENTION_BACKEND=FLASH_ATTN
export TOKENIZERS_PARALLELISM=false
export TRITON_CACHE_DIR=/tmp/triton_cache
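
The prerequisites allow `WANDB_MODE=offline` for local runs without an API key. A minimal sketch of that fallback follows; the function name `configure_wandb` is hypothetical and not part of FastVideo.

```shell
#!/bin/sh
# Hypothetical pre-launch check: fall back to offline logging when no key is set.
configure_wandb() {
    if [ -z "${WANDB_API_KEY:-}" ]; then
        export WANDB_MODE=offline
        echo "WANDB_API_KEY not set; using WANDB_MODE=offline" >&2
    fi
}
```

Calling `configure_wandb` before the exports leaves an existing key and mode untouched.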

4. Construct the torchrun command

torchrun --nnodes 1 --nproc_per_node <num_gpus> \
    <entrypoint> \
    --pretrained_model_name_or_path <model_hf_id> \
    --data_path "<data_path>" \
    --output_dir "<output_dir>" \
    --wandb_run_name "<run_name>" \
    --tracker_project_name "<project_name>" \
    --log_validation \
    <...all hyperparameters...>
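
One way to fill in the template is to print the assembled command for review before running it. The sketch below uses the values from the example request in this skill (Wan T2V 1.3B, 4 GPUs). The `run_or_print` helper, the `DRY_RUN`, `MODEL_HF_ID`, and `OUTPUT_DIR` variables, and the assumption that each override key maps to a flag of the same name are all illustrative and should be checked against the example script.

```shell
#!/bin/sh
# Sketch: print the launch command for review unless DRY_RUN=0 is set.
run_or_print() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '%s\n' "$*"      # dry run: show the command only
    else
        "$@"                    # real run: execute with arguments intact
    fi
}

# Assumed flag names for the overrides: --learning_rate, --max_train_steps.
# The model ID and output directory stay as placeholders until supplied.
run_or_print torchrun --nnodes 1 --nproc_per_node 4 \
    fastvideo/training/wan_training_pipeline.py \
    --pretrained_model_name_or_path "${MODEL_HF_ID:-<model_hf_id>}" \
    --data_path "data/crush_smol_preprocessed/" \
    --output_dir "${OUTPUT_DIR:-<output_dir>}" \
    --log_validation \
    --learning_rate 5e-5 \
    --max_train_steps 1000
```

Defaulting to a dry run means nothing is launched until the printed command has been checked and `DRY_RUN=0` is set explicitly.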

5. Log to experiment journal

## [YYYY-MM-DD] Experiment: <run_name>
- **Hypothesis**: <user-provided or auto-generated>
- **Config**: model=<model>, lr=<lr>, sp_size=<sp>, gpus=<n>, script=<entrypoint>
- **W&B run**: <pending — will be updated by monitor skill>
- **Status**: running
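
An entry in the format above could be appended with a small helper. The function name `log_experiment`, its argument order, and the journal file path are assumptions; the W&B line is left as pending, as in the template.

```shell
#!/bin/sh
# Hypothetical helper: append one journal entry in the template's format.
log_experiment() {
    journal="$1"; run_name="$2"; hypothesis="$3"; model="$4"
    lr="$5"; sp="$6"; gpus="$7"; entrypoint="$8"
    {
        printf '%s\n' "## [$(date +%Y-%m-%d)] Experiment: $run_name"
        printf '%s\n' "- **Hypothesis**: $hypothesis"
        printf '%s\n' "- **Config**: model=$model, lr=$lr, sp_size=$sp, gpus=$gpus, script=$entrypoint"
        printf '%s\n' "- **W&B run**: pending"
        printf '%s\n' "- **Status**: running"
        printf '\n'
    } >> "$journal"   # append so earlier entries are preserved
}
```

For example, `log_experiment journal.md run1 "lower lr stabilizes loss" wan-t2v-1.3B 5e-5 1 4 fastvideo/training/wan_training_pipeline.py` appends a dated entry to a file named `journal.md` (a hypothetical path).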

Example Usage

Launch a Wan T2V 1.3B finetune on 4 GPUs with lr=5e-5 and max_train_steps=1000:

  pipeline: finetune
  model: wan-t2v-1.3B
  data_path: data/crush_smol_preprocessed/
  num_gpus: 4
  overrides:
    learning_rate: 5e-5
    max_train_steps: 1000

Launch Experiment

Purpose

Prerequisites

Inputs

Steps

1. Identify the training entrypoint

2. Resolve default hyperparameters

3. Set environment variables

4. Construct the torchrun command

5. Log to experiment journal

Outputs

Example Usage

References

Changelog
