Use when launching cloud VMs, Kubernetes pods, or Slurm jobs for GPU/TPU/CPU workloads, training or fine-tuning models on cloud GPUs, deploying inference servers (vllm, TGI, etc.) with autoscaling, writing or debugging SkyPilot task YAML files, using spot/preemptible instances for cost savings, comparing GPU prices across clouds, managing compute across 25+ clouds, Kubernetes, Slurm, and on-prem clusters with failover between them, troubleshooting resource availability or SkyPilot errors, or optimizing cost and GPU availability.
SkyPilot is a unified framework to run AI workloads on any cloud, Kubernetes cluster, or Slurm cluster. It provides a single interface to launch clusters, run jobs, and serve models across 25+ clouds (AWS, GCP, Azure, CoreWeave, Nebius, Lambda, Together AI, RunPod, and more), Kubernetes clusters, and Slurm clusters.
Use SkyPilot when you need to:
- Launch clusters and run GPU/TPU/CPU workloads on cloud VMs, Kubernetes pods, or Slurm jobs
- Train or fine-tune models, optionally on spot/preemptible instances for cost savings
- Deploy inference servers (vLLM, TGI, etc.) with autoscaling
- Manage compute across 25+ clouds, Kubernetes, Slurm, and on-prem clusters, with failover between them
- Compare GPU prices and optimize cost and availability across clouds
Don't use SkyPilot for:
SkyPilot has three core abstractions. Use the right one for each stage of your workflow:
1. SkyPilot Clusters (`sky launch` / `sky exec`) — Interactive development and debugging
   - Connect your editor (`code --remote ssh-remote+CLUSTER`) and iterate quickly

2. Managed Jobs (`sky jobs launch`) — Long-running training and batch jobs
3. SkyServe (`sky serve up`) — Production model serving
   - Use `sky launch` plus an open port to test your serving setup, then use `sky serve up` to scale

Bootstrap: confirm SkyPilot is installed, connected to an API server, and has cloud credentials. Once confirmed, skip straight to the user's task.
Step 1: Check installation and API server connectivity
```shell
sky api info
```
| Output contains | Meaning | Next action |
|---|---|---|
| Server version and status | Server is running and connected | Bootstrap done. Skip to user's task. |
| `No SkyPilot API server is connected` | No server connected | Go to "Start or connect a server" below. |
| `Could not connect to SkyPilot API server` | Remote server unreachable or auth expired | Tell the user and suggest `sky api login --relogin -e <endpoint>` to reconnect. |
| `command not found: sky` | SkyPilot not installed | Go to "Install SkyPilot" below. |
Install SkyPilot (only if the `sky` command is not found):

```shell
pip install "skypilot[aws,gcp,kubernetes]"  # Pick the clouds the user needs
```
Ask the user which clouds they need if unclear, then re-run `sky api info`.
Start or connect a server (only if no API server is connected):
Ask the user:
Do you have an existing SkyPilot API server to connect to, or should I start one locally?
- Connect to an existing server: `sky api login -e <API_SERVER_URL>` — get the URL from the user.
- Start one locally: `sky api start`

After either path, re-run `sky api info` to confirm the server is reachable.
Step 2: Check cloud credentials (only for fresh setups — skip if the server was already running)
```shell
sky check -o json
```
This shows which clouds are enabled or disabled. If the user's target cloud is not enabled, guide them through credential setup (see Troubleshooting).
Use `-o json` with status/query commands to get structured JSON output instead of tables.
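The `-o json` output can be consumed from scripts. A minimal Python sketch; the exact schema of `sky status -o json` is an assumption here (a JSON array of records with `name` and `status` fields), so inspect real output before relying on specific keys:

```python
import json
import subprocess

def cluster_statuses() -> dict:
    """Return {cluster_name: status} from `sky status -o json`."""
    out = subprocess.run(
        ["sky", "status", "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {c["name"]: c["status"] for c in json.loads(out)}

# The parsing step, demonstrated on a hypothetical sample of the output:
sample = '[{"name": "mycluster", "status": "UP"}, {"name": "dev", "status": "STOPPED"}]'
print({c["name"]: c["status"] for c in json.loads(sample)})
```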
Clusters — interactive development and debugging:
| Command | Description |
|---|---|
| `sky launch -c NAME task.yaml` | Launch a cluster or run a task |
| `sky exec NAME task.yaml` | Run task on existing cluster (skips provisioning); syncs workdir each time |
| `sky exec NAME task.yaml -d` | Same, but detach immediately (don't stream logs) |
| `sky status -o json` | Show all clusters |
| `sky logs NAME` | Stream job logs from a cluster |
| `sky logs NAME --no-follow` | Print existing logs and exit immediately |
| `sky logs NAME --tail 50` | Print last 50 lines of logs and exit |
| `sky logs NAME --status` | Exit with code 0=succeeded, 100=failed, 101=not finished, 102=not found, 103=cancelled |
| `sky queue NAME -o json` | List jobs on a cluster with status (structured JSON) |
| `sky stop NAME` / `sky start NAME` | Stop/restart to save costs (preserves disk) |
| `sky down NAME` | Tear down a cluster completely |
| `sky gpus list -o json` | List available GPU types across clouds |
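The `sky logs NAME --status` exit codes make it easy to branch on job state in automation. A Python sketch; the mapping mirrors the documented codes, while the wrapper function itself is hypothetical, not part of SkyPilot:

```python
import subprocess

# Exit codes documented for `sky logs NAME --status`.
STATUS_BY_CODE = {
    0: "succeeded",
    100: "failed",
    101: "not finished",
    102: "not found",
    103: "cancelled",
}

def job_status(cluster: str) -> str:
    """Run `sky logs CLUSTER --status` and translate its exit code."""
    proc = subprocess.run(["sky", "logs", cluster, "--status"], capture_output=True)
    return STATUS_BY_CODE.get(proc.returncode, f"unknown ({proc.returncode})")
```

For example, a polling script can retry or alert when `job_status("mycluster")` returns `"failed"`.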
Managed Jobs — long-running unattended workloads:
| Command | Description |
|---|---|
| `sky jobs launch task.yaml` | Launch a managed job (auto lifecycle + recovery) |
| `sky jobs queue -o json` | Show all managed jobs and their status |
| `sky jobs logs JOB_ID` | Stream logs from a managed job |
| `sky jobs cancel JOB_ID` | Cancel a managed job |
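Managed jobs take the same task YAML as clusters. A minimal sketch for a spot-recovered training job — the accelerator type, script name, and paths are illustrative:

```yaml
# job.yaml — launched with: sky jobs launch job.yaml
resources:
  accelerators: A100:1
  use_spot: true        # spot instances; the managed job recovers on preemption

workdir: .              # synced to the remote machine

setup: |
  pip install -r requirements.txt

run: |
  python train.py
```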
SkyServe — model serving with autoscaling:
| Command | Description |
|---|---|
| `sky serve up serve.yaml -n NAME` | Start a model serving service |
| `sky serve status NAME` | Show service status and endpoint URL |
| `sky serve update NAME new.yaml` | Update a running service (rolling) |
| `sky serve down NAME` | Tear down a service |
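A service YAML is a task YAML plus a `service` section. A minimal sketch — the model, port, probe path, and replica count are illustrative, so check the SkyServe docs for the full schema:

```yaml
# serve.yaml — launched with: sky serve up serve.yaml -n myservice
service:
  readiness_probe: /v1/models   # path probed to decide a replica is ready
  replicas: 2

resources:
  accelerators: A100:1
  ports: 8000

run: |
  vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
```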
For complete CLI reference, see CLI Reference.
```shell
# Launch a GPU cluster
sky launch -c mycluster --gpus H100 -- nvidia-smi

# Run a task from YAML
sky launch -c mycluster task.yaml

# SSH into the cluster
ssh mycluster

# Connect VSCode or Cursor to the cluster for interactive development
code --remote ssh-remote+mycluster /home/user/sky_workdir
# or: cursor --remote ssh-remote+mycluster /home/user/sky_workdir

# Tear down
sky down mycluster
```
The task YAML is SkyPilot's primary interface. All fields are optional.
# task.yaml