Run GPU workloads on Modal's serverless infrastructure. Use when the user needs remote GPU compute for training, inference, benchmarks, or batch processing and the Modal CLI is available.
Use the `modal` CLI for serverless GPU workloads. There is no pod lifecycle to manage: write a decorated Python script and run it.
pip install modal
modal setup
| Command | Description |
|---|---|
| `modal run script.py` | Run a script on Modal (ephemeral) |
| `modal run --detach script.py` | Run detached (background) |
| `modal deploy script.py` | Deploy persistently |
| `modal serve script.py` | Serve with hot-reload (dev) |
| `modal shell --gpu a100` | Interactive shell with GPU |
| `modal app list` | List deployed apps |
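
When these commands are launched from a Python script rather than typed by hand, the argument list can be assembled programmatically. The following is a minimal sketch, not part of Modal: `build_modal_command` and `launch` are hypothetical helper names, and the sketch covers only the script-based subcommands and the `--detach` flag shown in the table.

```python
import subprocess
from typing import List

# Subcommands from the table above that take a script path.
SCRIPT_SUBCOMMANDS = ("run", "deploy", "serve")


def build_modal_command(subcommand: str, script: str, detach: bool = False) -> List[str]:
    """Build the argv list for `modal <subcommand> <script>` (hypothetical helper)."""
    if subcommand not in SCRIPT_SUBCOMMANDS:
        raise ValueError(f"unsupported subcommand: {subcommand!r}")
    # The table only shows --detach together with `modal run`.
    if detach and subcommand != "run":
        raise ValueError("--detach is only handled for `modal run` in this sketch")
    argv = ["modal", subcommand]
    if detach:
        argv.append("--detach")
    argv.append(script)
    return argv


def launch(subcommand: str, script: str, detach: bool = False) -> int:
    """Run the command and return its exit code; needs the modal CLI on PATH."""
    return subprocess.run(build_modal_command(subcommand, script, detach)).returncode
```

Building the list separately from running it keeps the command easy to log or inspect before anything is sent to Modal.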
Available GPU types: T4, L4, A10G, L40S, A100, A100-80GB, H100, H200, B200
Multi-GPU: append a count to the type, for example "H100:4" for 4x H100s.
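
The GPU string format can be illustrated with a small helper that validates the type against the list above and appends the count. This is a sketch under stated assumptions: `gpu_spec` is a hypothetical name, not a Modal function, and the set of accepted types is simply copied from this document.

```python
# GPU names copied from the list above; this is not queried from Modal.
GPU_TYPES = ("T4", "L4", "A10G", "L40S", "A100", "A100-80GB", "H100", "H200", "B200")


def gpu_spec(gpu_type: str, count: int = 1) -> str:
    """Return a GPU string such as "H100" or "H100:4" (hypothetical helper)."""
    name = gpu_type.upper()  # accept lowercase input such as "a100"
    if name not in GPU_TYPES:
        raise ValueError(f"unknown GPU type: {gpu_type!r}")
    if count < 1:
        raise ValueError("count must be at least 1")
    # A single GPU uses the bare type name; multiple GPUs use TYPE:COUNT.
    return name if count == 1 else f"{name}:{count}"


print(gpu_spec("H100", 4))  # prints H100:4
print(gpu_spec("a100"))     # prints A100
```

The returned string is the kind of value passed as the `gpu` argument of a Modal function decorator.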
import modal

app = modal.App("experiment")
image = modal.Image.debian_slim(python_version="3.11").pip_install("torch==2.8.0")

@app.function(gpu="A100", image=image, timeout=600)
def train():
    import torch
    # training code here

@app.local_entrypoint()
def main():
    train.remote()
command -v modal
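
The same availability check can be done from Python before attempting to launch anything. This is a minimal sketch using the standard library's `shutil.which`; `find_modal_cli` is a hypothetical helper name.

```python
import shutil
from typing import Optional


def find_modal_cli() -> Optional[str]:
    """Return the path of the modal executable on PATH, or None if absent."""
    return shutil.which("modal")


if __name__ == "__main__":
    path = find_modal_cli()
    if path is None:
        print("modal CLI not found; install it with: pip install modal")
    else:
        print(f"modal CLI found at {path}")
```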