Optimize data loading pipeline to prevent GPU starvation. Use when setting up DataLoader or data preprocessing.
name data-loading description Optimize data loading pipeline to prevent GPU starvation. Use when setting up DataLoader or data preprocessing. metadata {"category":"tooling","trigger-keywords":"data,loading,dataloader,dataset,preprocessing,augmentation","applicable-stages":"10","priority":"6","version":"1.0","author":"researchclaw","references":"PyTorch Data Loading Tutorial, pytorch.org"} Efficient Data Loading Best Practice Use num_workers = min(8, os.cpu_count()) for DataLoader Enable pin_memory=True when using GPU Use persistent_workers=True to avoid re-spawning Pre-compute and cache transformations when possible For image data: use torchvision.transforms.v2 (faster) For large datasets: consider memory-mapped files or WebDataset Profile with torch.utils.bottleneck to find I/O bottlenecks