Cloud GPU processing via RunPod serverless. Use when setting up RunPod endpoints, deploying Docker images, managing GPU resources, troubleshooting endpoint issues, or understanding costs. Covers all 5 toolkit images (qwen-edit, realesrgan, propainter, sadtalker, qwen3-tts).
Run open-source AI models on cloud GPUs via RunPod serverless. Pay-per-second, no minimums.
# 1. Create account at https://runpod.io
# 2. Add API key to .env
echo "RUNPOD_API_KEY=your_key_here" >> .env
# 3. Deploy any tool with --setup
python tools/image_edit.py --setup
python tools/upscale.py --setup
python tools/dewatermark.py --setup
python tools/sadtalker.py --setup
python tools/qwen3_tts.py --setup
Each --setup command:
Saves the endpoint ID to .env (e.g. RUNPOD_QWEN_EDIT_ENDPOINT_ID)
All images are public on GHCR, so no authentication is needed.
| Tool | Docker Image | GPU | VRAM | Typical Cost |
|---|---|---|---|---|
| image_edit | ghcr.io/conalmullan/video-toolkit-qwen-edit:latest | A6000/L40S | 48GB+ | ~$0.05-0.15/job |
| upscale | ghcr.io/conalmullan/video-toolkit-realesrgan:latest | RTX 3090/4090 | 24GB | ~$0.01-0.05/job |
| dewatermark | ghcr.io/conalmullan/video-toolkit-propainter:latest | RTX 3090/4090 | 24GB | ~$0.05-0.30/job |
| sadtalker | ghcr.io/conalmullan/video-toolkit-sadtalker:latest | RTX 4090 | 24GB | ~$0.05-0.15/job |
| qwen3_tts | ghcr.io/conalmullan/video-toolkit-qwen3-tts:latest | ADA 24GB | 24GB | ~$0.01-0.05/job |
Total monthly cost: Rarely exceeds $10 even with heavy use.
All tools follow the same pattern:
Local CLI → Upload input to cloud storage → RunPod API → Poll for result → Download output
Input upload uses Cloudflare R2 when configured (R2_ACCOUNT_ID, R2_ACCESS_KEY_ID, R2_SECRET_ACCESS_KEY, R2_BUCKET_NAME), falling back to free upload services.
Jobs are submitted to the /run endpoint, then polled at /status/{job_id} until complete.
Endpoint settings:
workersMin: 0 (scale to zero when idle, no cost)
workersMax: 1 (max concurrent jobs; increase for throughput)
idleTimeout: 5 (seconds before the worker scales down)
Across all endpoints, you share a total worker pool based on your RunPod plan. If you hit limits, reduce workersMax on endpoints you're not actively using.
Each tool stores its endpoint ID in .env:
| Tool | Env Var |
|---|---|
| image_edit | RUNPOD_QWEN_EDIT_ENDPOINT_ID |
| upscale | RUNPOD_UPSCALE_ENDPOINT_ID |
| dewatermark | RUNPOD_DEWATERMARK_ENDPOINT_ID |
| sadtalker | RUNPOD_SADTALKER_ENDPOINT_ID |
| qwen3_tts | RUNPOD_QWEN3_TTS_ENDPOINT_ID |
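The table above maps directly onto a small lookup. The following is a minimal sketch, not the toolkit's actual code: `endpoint_id_for` is a hypothetical helper, and it assumes the contents of .env have already been loaded into the process environment.

```python
import os

# Mapping copied from the table above: tool name -> .env variable name.
ENDPOINT_ENV_VARS = {
    "image_edit": "RUNPOD_QWEN_EDIT_ENDPOINT_ID",
    "upscale": "RUNPOD_UPSCALE_ENDPOINT_ID",
    "dewatermark": "RUNPOD_DEWATERMARK_ENDPOINT_ID",
    "sadtalker": "RUNPOD_SADTALKER_ENDPOINT_ID",
    "qwen3_tts": "RUNPOD_QWEN3_TTS_ENDPOINT_ID",
}


def endpoint_id_for(tool, env=None):
    """Return the endpoint ID for `tool`, or raise with a hint to run --setup.

    Hypothetical helper: assumes .env was already loaded into the environment.
    """
    env = os.environ if env is None else env
    var = ENDPOINT_ENV_VARS[tool]  # KeyError for unknown tool names
    value = env.get(var, "").strip()
    if not value:
        raise RuntimeError(f"{var} is not set; run the tool with --setup first")
    return value
```

Passing a plain dict as `env` makes the lookup easy to exercise without touching the real environment.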
To free worker slots without deleting the endpoint, set workersMax=0 via the RunPod dashboard or GraphQL API.
Use the queries below to inspect and manage endpoints programmatically. RunPod disables GraphQL introspection, so field names cannot be discovered from the API; the ones shown here are verified and must be spelled exactly.
All API calls require Authorization: Bearer $RUNPOD_API_KEY.
GraphQL API: POST https://api.runpod.io/graphql
REST API (serverless): https://api.runpod.ai/v2/{endpoint_id}/...
List all endpoints:
query { myself { endpoints { id name gpuIds templateId workersMax workersMin } } }
Current spend rate:
query { myself { currentSpendPerHr spendDetails { localStoragePerHour networkStoragePerHour gpuComputePerHour } } }
List pods:
query { myself { pods { id name runtime { uptimeInSeconds } machine { gpuDisplayName } desiredStatus } } }
Common mistakes: Field names are camelCase with full words: localStoragePerHour, not localStoragePerHr. The endpoint list field is endpoints, not serverlessWorkers. spending is not a field; use currentSpendPerHr and spendDetails.
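One way to send these queries from Python is sketched below using only the standard library. The function names are illustrative. The request body shape (`{"query": ...}`) and the `errors` key in the response follow the usual GraphQL-over-HTTP convention and are assumed here, not confirmed by this document.

```python
import json
import urllib.request

GRAPHQL_URL = "https://api.runpod.io/graphql"
LIST_ENDPOINTS = (
    "query { myself { endpoints "
    "{ id name gpuIds templateId workersMax workersMin } } }"
)


def build_graphql_request(query, api_key):
    """Build (but do not send) a POST request carrying the GraphQL query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        GRAPHQL_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def run_graphql(query, api_key, timeout=30):
    """Send the query and return the decoded `data` object."""
    request = build_graphql_request(query, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("errors"):  # assumed: standard GraphQL error envelope
        raise RuntimeError(payload["errors"])
    return payload["data"]
```

For example, `run_graphql(LIST_ENDPOINTS, os.environ["RUNPOD_API_KEY"])` would return the `myself` object, given a valid key and network access.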
Update endpoint GPU or config:
mutation { saveEndpoint(input: {
id: "endpoint_id",
name: "endpoint-name",
templateId: "template_id",
gpuIds: "AMPERE_24",
workersMin: 0,
workersMax: 1
}) { id gpuIds } }
saveEndpoint requires name and templateId even for updates — query first to get current values.
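Because the mutation needs the current name and templateId, it can help to build it from the result of the endpoint list query. The sketch below assumes `current` is one item from that list; `build_save_endpoint` is a hypothetical helper, not part of the toolkit.

```python
import json

FIELDS = ["id", "name", "templateId", "gpuIds", "workersMin", "workersMax"]


def build_save_endpoint(current, **changes):
    """Merge `changes` into the endpoint's current values and render the mutation.

    `current` is one endpoint dict as returned by the list query above.
    """
    unknown = set(changes) - set(FIELDS)
    if unknown:
        raise ValueError(f"unsupported fields: {sorted(unknown)}")
    merged = {**{key: current[key] for key in FIELDS}, **changes}
    # json.dumps yields valid GraphQL literals for plain strings and integers.
    args = ", ".join(f"{key}: {json.dumps(merged[key])}" for key in FIELDS)
    return f"mutation {{ saveEndpoint(input: {{ {args} }}) {{ id gpuIds }} }}"
```

Calling `build_save_endpoint(endpoint, gpuIds="AMPERE_24,ADA_24")` changes only the GPU list and carries every other field over unchanged.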
| Action | Method | URL |
|---|---|---|
| Submit job | POST | /v2/{id}/run |
| Check status | GET | /v2/{id}/status/{job_id} |
| Cancel job | POST | /v2/{id}/cancel/{job_id} |
| List pending | GET | /v2/{id}/requests |
| Health/stats | GET | /v2/{id}/health |
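The submit and status rows above combine into a simple polling loop. This is a sketch under stated assumptions, not the toolkit's implementation: the `{"input": ...}` request body, the `id` field in the /run response, and the status strings in `TERMINAL` are not listed in this document and are assumed here. The transport is injectable so the loop can be exercised without network access.

```python
import json
import time
import urllib.request

BASE_URL = "https://api.runpod.ai/v2"
# Assumption: final status strings; they are not documented in this file.
TERMINAL = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def http_json(method, url, api_key, payload=None):
    """Default transport: send one request and decode the JSON response."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(url, data=data, method=method, headers={
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def run_job(endpoint_id, job_input, api_key,
            http=http_json, interval=2.0, sleep=time.sleep):
    """Submit a job to /run, then poll /status/{job_id} until it finishes."""
    job = http("POST", f"{BASE_URL}/{endpoint_id}/run", api_key,
               {"input": job_input})  # assumed request body shape
    status_url = f"{BASE_URL}/{endpoint_id}/status/{job['id']}"
    while True:
        status = http("GET", status_url, api_key)
        if status.get("status") in TERMINAL:
            return status
        sleep(interval)
```

This loop has no upper bound on waiting time; a real caller would likely add a timeout.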
Health response includes job counts and worker state:
{
"jobs": { "completed": 16, "failed": 1, "inProgress": 0, "inQueue": 2, "retried": 0 },
"workers": { "idle": 0, "initializing": 1, "ready": 0, "running": 0, "throttled": 0 }
}
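A response like the one above can be reduced to a short diagnosis. The labels and rules in this sketch are illustrative assumptions about how to read the counters, not RunPod's own terminology.

```python
def summarize_health(health):
    """Return a rough diagnosis string from a /health response dict.

    The labels ("throttled", "starting", "no-workers", "ok") are hypothetical.
    """
    jobs = health.get("jobs", {})
    workers = health.get("workers", {})
    queued = jobs.get("inQueue", 0)
    active = (workers.get("running", 0)
              + workers.get("ready", 0)
              + workers.get("idle", 0))
    if queued and not active:
        if workers.get("throttled", 0):
            return "throttled"   # jobs waiting, only throttled workers
        if workers.get("initializing", 0):
            return "starting"    # jobs waiting on a cold start
        return "no-workers"      # jobs waiting, nothing coming up
    return "ok"
```

Applied to the sample response above (two queued jobs, one initializing worker), this returns "starting".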
Note: /requests only returns pending/queued jobs. Completed job history is not available via the API; check the RunPod web console for logs.
| ID | GPU | VRAM | Typical Cost |
|---|---|---|---|
| AMPERE_24 | RTX 3090 | 24GB | ~$0.34/hr |
| ADA_24 | RTX 4090 | 24GB | ~$0.69/hr |
| AMPERE_48 | A6000 | 48GB | ~$0.76/hr |
| AMPERE_80 | A100 | 80GB | ~$1.99/hr |
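Since billing is per second, the hourly rates above convert to a per-job estimate by dividing by 3600 and multiplying by the job's GPU time. A rough sketch, using the approximate rates from the table. It covers only the seconds passed in; worker start-up or idle time would be extra if billed.

```python
# Approximate hourly rates copied from the table above (USD per hour).
HOURLY_RATE = {
    "AMPERE_24": 0.34,
    "ADA_24": 0.69,
    "AMPERE_48": 0.76,
    "AMPERE_80": 1.99,
}


def job_cost(gpu_id, seconds):
    """Estimate the cost in USD of `seconds` of GPU time billed per second."""
    return HOURLY_RATE[gpu_id] / 3600 * seconds


# A 2-minute job on an RTX 3090: 0.34 / 3600 * 120, roughly $0.011.
estimate = job_cost("AMPERE_24", 120)
```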
Availability note: ADA_24 (4090) is frequently throttled/unavailable on RunPod. Always configure endpoints with multiple fallback GPU types (comma-separated) to avoid jobs getting stuck in queue indefinitely:
gpuIds: "AMPERE_24,ADA_24" # Try 3090 first, fall back to 4090
All toolkit tools also enforce a 5-minute queue timeout — if no GPU is available within 300 seconds, the job is automatically cancelled to prevent runaway billing from failed initialization cycles.
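The queue timeout described above can be sketched as a polling loop that cancels a job stuck in the queue. This is an illustration, not the toolkit's actual code: `get_status` and `cancel` are hypothetical callables supplied by the caller, and the status strings "IN_QUEUE" and "IN_PROGRESS" are assumed.

```python
import time

QUEUE_TIMEOUT_S = 300  # the 5-minute limit described above


def wait_with_queue_timeout(get_status, cancel, clock=time.monotonic,
                            sleep=time.sleep, interval=5.0):
    """Poll until the job finishes, cancelling it if it stays queued too long.

    get_status() returns the job's status string; cancel() requests
    cancellation. Both are hypothetical callables provided by the caller.
    """
    started = clock()
    while True:
        status = get_status()
        if status == "IN_QUEUE":
            if clock() - started >= QUEUE_TIMEOUT_S:
                cancel()
                return "CANCELLED_QUEUE_TIMEOUT"
        elif status != "IN_PROGRESS":
            return status  # COMPLETED, FAILED, or another final state
        sleep(interval)
```

The timeout applies only while the job is still queued; once a worker picks it up, the loop waits for a final status without a limit.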
R2 uses the S3-compatible API but requires --region auto:
AWS_ACCESS_KEY_ID="$R2_ACCESS_KEY_ID" \
AWS_SECRET_ACCESS_KEY="$R2_SECRET_ACCESS_KEY" \
aws s3api list-objects-v2 \
--bucket "$R2_BUCKET_NAME" \
--endpoint-url "https://${R2_ACCOUNT_ID}.r2.cloudflarestorage.com" \
--region auto
Common mistake: Omitting --region auto causes an InvalidRegionName error. Valid R2 regions: wnam, enam, weur, eeur, apac, oc, auto.
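The same AWS CLI call can be driven from Python. The sketch below only assembles the argument list and environment shown above, then hands them to `subprocess`; the function names are illustrative, and it assumes the AWS CLI is installed and on PATH.

```python
import os
import subprocess


def r2_list_command(env):
    """Return (argv, child_env) for listing the R2 bucket with the AWS CLI."""
    argv = [
        "aws", "s3api", "list-objects-v2",
        "--bucket", env["R2_BUCKET_NAME"],
        "--endpoint-url",
        f"https://{env['R2_ACCOUNT_ID']}.r2.cloudflarestorage.com",
        "--region", "auto",  # required; omitting it causes InvalidRegionName
    ]
    child_env = dict(env)
    # The AWS CLI reads credentials from these two standard variables.
    child_env["AWS_ACCESS_KEY_ID"] = env["R2_ACCESS_KEY_ID"]
    child_env["AWS_SECRET_ACCESS_KEY"] = env["R2_SECRET_ACCESS_KEY"]
    return argv, child_env


def list_r2_objects(env=None):
    """Run the command and return its stdout (requires the AWS CLI on PATH)."""
    source = dict(os.environ) if env is None else env
    argv, child_env = r2_list_command(source)
    done = subprocess.run(argv, env=child_env, check=True,
                          capture_output=True, text=True)
    return done.stdout
```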
When you push a new Docker image version, RunPod may still use the cached old one. To force a pull:
Set imageName to use @sha256:DIGEST notation
Switch back to the :latest tag after confirming the new image is in use
If cold starts are a problem, set workersMin: 1 (costs money when idle).
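To make the digest notation from the force-pull steps concrete, the sketch below rewrites a tagged image reference into its digest-pinned form. `pin_to_digest` is a hypothetical helper; the digest itself has to come from elsewhere, for example the `digest: sha256:...` line that `docker push` prints.

```python
def pin_to_digest(image, digest):
    """Replace the tag (if any) on `image` with an @sha256 digest reference."""
    if not digest.startswith("sha256:"):
        raise ValueError("digest must look like 'sha256:<hex>'")
    image = image.split("@", 1)[0]         # drop an existing digest, if present
    head, _, last = image.rpartition("/")  # the tag lives in the last path part
    last = last.split(":", 1)[0]           # drop ':latest' or any other tag
    name = f"{head}/{last}" if head else last
    return f"{name}@{digest}"
```

Splitting on the last path component keeps a registry port such as `localhost:5000/...` from being mistaken for a tag.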
The model needs more VRAM than the GPU provides. Options:
Lower --resize-ratio (default 0.5 for safety)
Reduce --steps
You've hit your plan's concurrent worker limit. Set workersMax=0 on endpoints you're not using.
All Dockerfiles live in docker/runpod-*/. Images use runpod/pytorch as the base to share layers across tools.
Building for RunPod (from Apple Silicon Mac):
docker buildx build --platform linux/amd64 -t ghcr.io/conalmullan/video-toolkit-<name>:latest docker/runpod-<name>/
docker push ghcr.io/conalmullan/video-toolkit-<name>:latest
GHCR packages default to private — you must manually make them public for RunPod to pull them. Go to GitHub > Packages > Package Settings > Change Visibility.
Keep workersMin: 0 on all endpoints (scale to zero)
Set workersMax=0 to disable idle endpoints without deleting them