Deploy and manage GPU/CPU pods, network volumes, and templates on RunPod. Use when the user asks to launch a GPU server, manage cloud compute, run ML training or inference on remote GPUs, or interact with the RunPod platform in any way. Triggers: runpod, gpu cloud, launch pod, spin up server, deploy gpu, remote training, cloud gpu, inference server.
Set RUNPOD_API_KEY environment variable, or prompt user for their API key.
Authorization: Bearer <RUNPOD_API_KEY>
Two APIs exist. Use REST for everything unless noted otherwise.
| API | Base URL | Use For |
|---|---|---|
| REST (primary) | https://rest.runpod.io/v1 | All CRUD operations on pods, volumes, templates |
| GraphQL | https://api.runpod.io/graphql | GPU availability queries, runtime metrics, spot instance deployment |
CAUTION: api.runpod.io is GraphQL only. REST calls to api.runpod.io will fail silently. Always use rest.runpod.io for REST.
To check GPU availability, use GraphQL; REST has no availability filter.
POST https://api.runpod.io/graphql
```graphql
query {
  gpuTypes {
    id
    displayName
    memoryInGb
    communityPrice
    securePrice
    stockStatus # LOW, MEDIUM, HIGH, or null (unavailable)
    communityCloud
    secureCloud
  }
}
```
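As a sketch, the query above can be issued and filtered from Python. This uses `requests`; the `in_stock` helper and its VRAM floor are illustrative, not part of the API.

```python
import requests

GRAPHQL_URL = "https://api.runpod.io/graphql"

QUERY = """
query {
  gpuTypes {
    id
    displayName
    memoryInGb
    securePrice
    stockStatus
  }
}
"""

def in_stock(gpu_types, min_vram=0):
    """Keep GPUs that report any stock and meet a VRAM floor.
    stockStatus is null (None) when the GPU is unavailable."""
    return [
        g for g in gpu_types
        if g.get("stockStatus") is not None and g.get("memoryInGb", 0) >= min_vram
    ]

def fetch_available_gpus(api_key, min_vram=24):
    """POST the availability query and return only in-stock GPUs."""
    resp = requests.post(
        GRAPHQL_URL,
        json={"query": QUERY},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    return in_stock(resp.json()["data"]["gpuTypes"], min_vram)
```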
| GPU | VRAM | ID | Notes |
|---|---|---|---|
| RTX 4090 | 24 GB | NVIDIA GeForce RTX 4090 | Dev/testing, cheap |
| RTX 4000 Ada | 20 GB | NVIDIA RTX 4000 Ada Generation | Light inference |
| L40S | 48 GB | NVIDIA L40S | Best value for training |
| A40 | 48 GB | NVIDIA A40 | Inference workhorse |
| RTX 6000 Ada | 48 GB | NVIDIA RTX 6000 Ada Generation | Alternative 48 GB |
| A100 80 GB | 80 GB | NVIDIA A100 80GB PCIe | Large models |
| A100 SXM | 80 GB | NVIDIA A100-SXM4-80GB | Higher bandwidth A100 |
| H100 SXM | 80 GB | NVIDIA H100 80GB HBM3 | Fastest training |
| H100 NVL | 94 GB | NVIDIA H100 NVL | Max VRAM H100 |
Availability of premium cards (H100, A100, L40S) can change within seconds. Always wrap deploys in retry logic with a fallback GPU list.
| Use Case | Try In Order |
|---|---|
| Training (48 GB+) | L40S → A40 → RTX 6000 Ada → A100 80 GB |
| Inference (24 GB+) | RTX 4090 → RTX 4000 Ada → L40S |
| Frontier models | H100 SXM → H100 NVL → A100 SXM → A100 80 GB |
| Budget dev | RTX 4090 → RTX 4000 Ada |
POST https://rest.runpod.io/v1/pods
Content-Type: application/json
Authorization: Bearer <API_KEY>
```json
{
  "name": "my-pod",
  "imageName": "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04",
  "gpuTypeIds": ["NVIDIA L40S"],
  "gpuCount": 1,
  "cloudType": "ALL",
  "containerDiskInGb": 50,
  "volumeInGb": 100,
  "volumeMountPath": "/workspace",
  "ports": ["8888/http", "22/tcp"],
  "env": {
    "JUPYTER_PASSWORD": "mypassword",
    "HF_TOKEN": "hf_..."
  }
}
```
Returns 201 with full pod object including id.
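Combining this create call with the fallback chains above gives the retry logic the availability warning asks for. A sketch using `requests`; the chain contents mirror the use-case table and the helper names are illustrative:

```python
import requests

REST_URL = "https://rest.runpod.io/v1"

# Fallback chains from the use-case table (first choice first).
FALLBACKS = {
    "training": [
        "NVIDIA L40S",
        "NVIDIA A40",
        "NVIDIA RTX 6000 Ada Generation",
        "NVIDIA A100 80GB PCIe",
    ],
    "inference": [
        "NVIDIA GeForce RTX 4090",
        "NVIDIA RTX 4000 Ada Generation",
        "NVIDIA L40S",
    ],
}

def deploy_with_fallback(api_key, base_config, gpu_chain):
    """Try each GPU in order until a pod is created (HTTP 201).
    base_config is the pod-create body minus gpuTypeIds."""
    headers = {"Authorization": f"Bearer {api_key}"}
    for gpu_id in gpu_chain:
        body = {**base_config, "gpuTypeIds": [gpu_id]}
        resp = requests.post(f"{REST_URL}/pods", json=body, headers=headers, timeout=60)
        if resp.status_code == 201:
            return resp.json()  # full pod object, including "id"
        # Out of stock or rejected: fall through to the next GPU in the chain.
    raise RuntimeError(f"No GPU in chain available: {gpu_chain}")
```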
| Field | Type | Default | Notes |
|---|---|---|---|
| name | string | "my pod" | Max 191 chars |
| imageName | string | required | Container image tag |
| gpuTypeIds | string[] | — | Array. GPU pods only |
| gpuCount | int | 1 | Multi-GPU: 2, 4, 8 |
| gpuTypePriority | string | "availability" | availability or custom (use ordering in array) |
| computeType | string | "GPU" | GPU or CPU |
| cpuFlavorIds | string[] | — | CPU pods only: cpu3c, cpu3g, cpu3m, cpu5c, cpu5g, cpu5m |
| cloudType | string | "SECURE" | SECURE, COMMUNITY, or ALL |
| containerDiskInGb | int | 50 | Ephemeral — wiped on every restart |
| volumeInGb | int | 20 | Persists across restarts, mounted at volumeMountPath |
| volumeMountPath | string | "/workspace" | Where volume mounts |
| networkVolumeId | string | — | Attach a network volume (replaces local volume) |
| ports | string[] | ["8888/http","22/tcp"] | Format: port/protocol |
| env | object | {} | {"KEY": "value"} format |
| dockerEntrypoint | string[] | [] | Override image ENTRYPOINT |
| dockerStartCmd | string[] | [] | Override image CMD |
| templateId | string | — | UUID from template list |
| interruptible | bool | false | Spot pricing (can be preempted at any time) |
| locked | bool | false | Prevents accidental stop/reset |
| dataCenterIds | string[] | all | e.g. ["US-TX-3","US-KS-2","EU-RO-1"] |
| dataCenterPriority | string | "availability" | availability or custom |
| countryCodes | string[] | — | Filter by country |
| allowedCudaVersions | string[] | — | e.g. ["12.4","12.3"] |
| minRAMPerGPU | int | 8 | GPU pods only |
| minVCPUPerGPU | int | 2 | GPU pods only |
| vcpuCount | int | 2 | CPU pods only |
| supportPublicIp | bool | — | Community Cloud only |
| globalNetworking | bool | false | Low-latency private networking (limited availability) |
Spot instances (podRentInterruptable) are only available via GraphQL.
POST https://api.runpod.io/graphql
```graphql
mutation {
  podRentInterruptable(input: {
    name: "spot-training"
    imageName: "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04"
    gpuTypeId: "NVIDIA L40S"
    gpuCount: 1
    cloudType: SECURE
    bidPerGpu: 0.40
    containerDiskInGb: 50
    volumeInGb: 100
    volumeMountPath: "/workspace"
    ports: "8888/http,22/tcp"
    env: [{ key: "HF_TOKEN", value: "hf_..." }]
  }) {
    id
    machineId
    machine { podHostId }
  }
}
```
CRITICAL DIFFERENCE between the two APIs:
GraphQL `env` is a `[{key, value}]` array; REST `env` is a `{"key": "value"}` object.
GraphQL `gpuTypeId` is a singular string; REST `gpuTypeIds` is a plural array.
GraphQL `ports` is a single comma-separated string; REST `ports` is an array.
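These three shape differences can be translated mechanically. A hypothetical helper that converts a REST pod-create body into `podRentInterruptable` input shape (other fields pass through unchanged):

```python
def rest_to_graphql_input(rest_body):
    """Translate a REST pod-create body into the GraphQL spot-deploy input shape."""
    gql = dict(rest_body)
    if "gpuTypeIds" in gql:
        # Plural array -> singular string (GraphQL takes one GPU type).
        gql["gpuTypeId"] = gql.pop("gpuTypeIds")[0]
    if "ports" in gql:
        # Array -> single comma-separated string.
        gql["ports"] = ",".join(gql["ports"])
    if "env" in gql:
        # {"KEY": "value"} object -> [{key, value}] array.
        gql["env"] = [{"key": k, "value": v} for k, v in gql["env"].items()]
    return gql
```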
SPOT WARNING: Interruptible pods can be terminated at any time with zero notice. Always checkpoint work to network volumes or external storage.
GET https://rest.runpod.io/v1/pods
Returns array of pod objects with fields: id, name, desiredStatus (RUNNING/EXITED/TERMINATED), costPerHr, gpu, image, ports, publicIp, portMappings, env, volumeInGb, containerDiskInGb, machine, lastStatusChange, interruptible, locked.
GET https://rest.runpod.io/v1/pods/{podId}
REST doesn't return live GPU utilization or container metrics. Use GraphQL:
```graphql
query {
  myself {
    pods {
      id
      name
      desiredStatus
      costPerHr
      runtime {
        uptimeInSeconds
        ports { ip isIpPublic privatePort publicPort type }
        gpus { id gpuUtilPercent memoryUtilPercent }
        container { cpuPercent memoryPercent }
      }
    }
  }
}
```
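A sketch that runs the query above and condenses each pod into one line. The `summarize` helper is illustrative; note that `runtime` is null for stopped pods, so it must be guarded:

```python
import requests

METRICS_QUERY = """
query {
  myself {
    pods {
      id
      name
      desiredStatus
      costPerHr
      runtime {
        uptimeInSeconds
        gpus { id gpuUtilPercent memoryUtilPercent }
      }
    }
  }
}
"""

def summarize(pod):
    """One-line status summary; tolerates runtime being None (stopped pods)."""
    rt = pod.get("runtime") or {}
    gpus = rt.get("gpus") or []
    util = max((g["gpuUtilPercent"] for g in gpus), default=0)
    return f'{pod["name"]} [{pod["desiredStatus"]}] ${pod["costPerHr"]}/hr gpu={util}%'

def list_pod_metrics(api_key):
    resp = requests.post(
        "https://api.runpod.io/graphql",
        json={"query": METRICS_QUERY},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=30,
    )
    resp.raise_for_status()
    return [summarize(p) for p in resp.json()["data"]["myself"]["pods"]]
```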
POST https://rest.runpod.io/v1/pods/{podId}/stop
POST https://rest.runpod.io/v1/pods/{podId}/start
POST https://rest.runpod.io/v1/pods/{podId}/restart
POST https://rest.runpod.io/v1/pods/{podId}/reset
DELETE https://rest.runpod.io/v1/pods/{podId}
TERMINATE IS IRREVERSIBLE. All data not on a network volume is permanently destroyed.
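The lifecycle endpoints above can be wrapped in a small helper that forces explicit confirmation before the irreversible DELETE. A sketch using `requests`; the function names and the `confirm_terminate` guard are illustrative:

```python
import requests

REST_URL = "https://rest.runpod.io/v1"
ACTIONS = ("stop", "start", "restart", "reset")

def pod_action_url(pod_id, action):
    """Build the lifecycle endpoint URL; terminate is a DELETE on the pod itself."""
    if action == "terminate":
        return f"{REST_URL}/pods/{pod_id}"
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    return f"{REST_URL}/pods/{pod_id}/{action}"

def pod_action(api_key, pod_id, action, confirm_terminate=False):
    """POST a lifecycle action, or DELETE the pod with explicit confirmation."""
    headers = {"Authorization": f"Bearer {api_key}"}
    if action == "terminate":
        if not confirm_terminate:
            raise RuntimeError("terminate is irreversible; pass confirm_terminate=True")
        resp = requests.delete(pod_action_url(pod_id, action), headers=headers, timeout=60)
    else:
        resp = requests.post(pod_action_url(pod_id, action), headers=headers, timeout=60)
    resp.raise_for_status()
    return resp
```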
Resume a preempted spot pod with a new bid via GraphQL:
```graphql
mutation {
  podBidResume(input: {
    podId: "abc123"
    bidPerGpu: 0.40
    gpuCount: 1
  }) {
    id
    desiredStatus
  }
}
```
PATCH https://rest.runpod.io/v1/pods/{podId}
Content-Type: application/json
```json
{
  "env": {"NEW_VAR": "value"},
  "ports": ["8888/http", "22/tcp", "5000/http"],
  "volumeInGb": 200,
  "imageName": "runpod/pytorch:2.8.0-py3.11-cuda12.6-devel-ubuntu22.04"
}
```
Updatable fields: containerDiskInGb, containerRegistryAuthId, dockerEntrypoint, dockerStartCmd, env, globalNetworking, imageName, locked, name, ports, volumeInGb, volumeMountPath.
Updates may trigger a pod reset. Warn user that running processes will be interrupted.
Network volumes are persistent storage independent of any pod. They survive pod termination and can be attached to any pod in the same datacenter. Critical for ML workflows.
POST https://rest.runpod.io/v1/networkvolumes
```json
{
  "name": "training-data",
  "size": 100,
  "dataCenterId": "US-TX-3"
}
```
GET https://rest.runpod.io/v1/networkvolumes
GET https://rest.runpod.io/v1/networkvolumes/{networkVolumeId}
PATCH https://rest.runpod.io/v1/networkvolumes/{networkVolumeId}
DELETE https://rest.runpod.io/v1/networkvolumes/{networkVolumeId}
Set networkVolumeId in the pod create call. This replaces the local volume — the network volume mounts at volumeMountPath (default /workspace).
```json
{
  "name": "my-pod",
  "imageName": "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04",
  "gpuTypeIds": ["NVIDIA L40S"],
  "networkVolumeId": "vol_abc123",
  "volumeMountPath": "/workspace"
}
```
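The create-then-attach flow can be sketched as below, assuming the volume object returned by the create call carries `id` and `dataCenterId` fields as in the request body (an assumption; check the actual response). Pinning `dataCenterIds` keeps the pod in the volume's datacenter:

```python
import requests

REST_URL = "https://rest.runpod.io/v1"

def attach_config(pod_config, volume):
    """Merge a network volume into a pod-create body, pinning the datacenter
    so the pod lands where the volume lives."""
    return {
        **pod_config,
        "networkVolumeId": volume["id"],
        "dataCenterIds": [volume["dataCenterId"]],  # assumed response field
        "volumeMountPath": pod_config.get("volumeMountPath", "/workspace"),
    }

def create_volume_and_pod(api_key, volume_body, pod_config):
    """Create a network volume, then a pod attached to it."""
    headers = {"Authorization": f"Bearer {api_key}"}
    vol = requests.post(f"{REST_URL}/networkvolumes", json=volume_body,
                        headers=headers, timeout=60)
    vol.raise_for_status()
    body = attach_config(pod_config, vol.json())
    pod = requests.post(f"{REST_URL}/pods", json=body, headers=headers, timeout=60)
    pod.raise_for_status()
    return pod.json()
```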
Network volumes are bound to a datacenter; set dataCenterIds on pod create to match.

GET https://rest.runpod.io/v1/templates
POST https://rest.runpod.io/v1/templates
```json
{
  "name": "my-training-template",
  "imageName": "runpod/pytorch:2.4.0-py3.11-cuda12.4.1-devel-ubuntu22.04",
  "containerDiskInGb": 50,
  "volumeInGb": 100,
  "volumeMountPath": "/workspace",
  "ports": ["8888/http", "22/tcp"],
  "env": {"JUPYTER_PASSWORD": "default"},
  "dockerStartCmd": []
}
```
PATCH https://rest.runpod.io/v1/templates/{templateId}
DELETE https://rest.runpod.io/v1/templates/{templateId}
Template must not be in use by any pod or serverless endpoint.
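A sketch of launching a pod from a saved template, assuming template fields act as defaults for any pod-create fields left unset (the `templateId` field in the pod-create table suggests this, but verify against the API). Helper names are illustrative:

```python
import requests

REST_URL = "https://rest.runpod.io/v1"

def pod_body_from_template(template_id, name, gpu_type_ids):
    """Minimal pod-create body that defers image, ports, env, etc. to the template."""
    return {"name": name, "templateId": template_id, "gpuTypeIds": gpu_type_ids}

def launch_from_template(api_key, template_id, name, gpu_type_ids):
    resp = requests.post(
        f"{REST_URL}/pods",
        json=pod_body_from_template(template_id, name, gpu_type_ids),
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()
```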