Run distributed GPU training jobs on CoreWeave with multi-node PyTorch. Use when training models across multiple GPUs, setting up distributed training, or running fine-tuning jobs on CoreWeave H100 clusters. Trigger with phrases like "coreweave training", "coreweave multi-gpu", "distributed training coreweave", "fine-tune on coreweave".
Run distributed GPU training on CoreWeave: single-node multi-GPU and multi-node training with PyTorch DDP, Slurm-on-Kubernetes, and shared storage.
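In multi-node DDP, every process must hold a unique global rank, and the launcher (e.g. `torchrun`) derives it from the node's rank and the process's local rank on that node. A minimal sketch of that mapping, assuming a torchrun-style layout of one process per GPU (the helper names here are illustrative, not part of any CoreWeave or PyTorch API):

```python
def global_rank(node_rank: int, local_rank: int, gpus_per_node: int) -> int:
    """Map (node, local GPU index) to the unique global rank DDP expects."""
    return node_rank * gpus_per_node + local_rank

def world_size(num_nodes: int, gpus_per_node: int) -> int:
    """Total number of DDP processes across the cluster."""
    return num_nodes * gpus_per_node

# Example: 2 nodes x 8 GPUs -> global ranks 0..15.
# GPU 3 on the second node (node_rank=1) gets global rank 11.
print(global_rank(1, 3, 8))   # -> 11
print(world_size(2, 8))       # -> 16
```

These are the values that end up in the `RANK` and `WORLD_SIZE` environment variables torchrun sets for each worker; the per-node manifests below only need to agree on `gpus_per_node` and the node's own rank.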
# training-job.yaml
apiVersion: batch/v1