Decision-making guide and Vast.ai CLI workflow for migrating ML projects to a new VM. Helps categorize files (Git vs cloud storage vs manual vs skip), handle dot-folders (.claude/, .env, .vscode/), plan infrastructure, and provision Vast.ai instances via CLI (search offers, launch, connect, manage lifecycle). Use when the user asks to (1) decide what goes on GitHub vs cloud storage for a new ML project, (2) categorize files for migration, (3) handle dot-folders and personal configs during migration, (4) plan infrastructure for a new ML project on a cloud GPU provider, (5) decide whether to use Docker, venv, or conda for a project, (6) search for or rent a Vast.ai instance, (7) manage Vast.ai instances (stop, start, destroy, list), (8) choose GPU type or spot vs on-demand for a workload, (9) set up Vast.ai account-level env vars. Does NOT handle post-SSH setup execution — see vm-setup skill for that.
Help the user decide how to structure a new ML project for portability across VMs, and provision Vast.ai instances via CLI.
For any ML project, categorize every file into one of four buckets:
| Bucket | What belongs | Transfer via |
|---|---|---|
| Git | Code, configs, pyproject.toml, Dockerfile, .github/, docs, small data (<50 MB), CLAUDE.md, .claude/skills/, scripts | GitHub |
| Cloud storage | Model weights (.pt, .bin, .safetensors), checkpoints, large datasets (>50 MB), optionally HF cache | B2 / S3 / GCS |
| Manual | .env (secrets), .claude/memory/ (personal), .claude/settings.local.json (machine-specific) | SCP / rsync |
| Skip | __pycache__/, .mypy_cache/, .pytest_cache/, .wandb/, .ipynb_checkpoints/, nohup.out | Never (regenerated) |
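The bucket table can be sketched as a shell function for quick classification. This is a sketch, not an exhaustive ruleset; the glob patterns are assumptions drawn from the table above:

```shell
# Sketch: map a repo-relative path to a migration bucket (patterns assumed from the table).
bucket_for() {
  case "$1" in
    __pycache__/*|*/__pycache__/*|.mypy_cache/*|.pytest_cache/*|.wandb/*|*.ipynb_checkpoints*|nohup.out)
      echo skip ;;
    *.pt|*.bin|*.safetensors|checkpoints/*)
      echo cloud-storage ;;
    .env|.claude/memory/*|.claude/settings.local.json)
      echo manual ;;
    *)
      echo git ;;
  esac
}

bucket_for weights/model.safetensors   # cloud-storage
bucket_for .env                        # manual
bucket_for train.py                    # git
```

Anything not matched falls through to Git, which matches the table's default (code, configs, docs, small data).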
| Dot-folder/file | Commit to Git? | Transfer manually? | Notes |
|---|---|---|---|
| .gitignore | Yes | N/A | Essential |
| .dockerignore | Yes | N/A | Essential |
| .github/ | Yes | N/A | CI/CD |
| CLAUDE.md | Yes | N/A | Project instructions |
| .claude/skills/ (shared) | Yes | N/A | Shared knowledge |
| .claude/settings.local.json | No | Optional | Machine-specific paths |
| .claude/memory/ | No | Optional | Personal context |
| .claude/worktrees/ | No | No | Regenerated by GSD tooling |
| .planning/ | Depends | No | Git if shared with team, skip if personal |
| .env | No | SCP | Secrets, never Git |
| .vscode/settings.json | No | Use Settings Sync | Auto-syncs via VS Code |
| .cache/, .huggingface/ | No | Via cloud storage | Or re-download |
| .wandb/ | No | No | Regenerated |
| .mypy_cache/, .pytest_cache/ | No | No | Regenerated |
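The "No" rows above translate into a minimal .gitignore fragment. A sketch only; adjust paths to your project:

```gitignore
# Secrets and machine-specific config: never committed
.env
.claude/settings.local.json
.claude/memory/
.claude/worktrees/
.vscode/

# Regenerated artifacts: never committed, never transferred
__pycache__/
.mypy_cache/
.pytest_cache/
.wandb/
.ipynb_checkpoints/
nohup.out
```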
| Scenario | Recommendation |
|---|---|
| Fast internet on target VM | Re-download (simpler) |
| Slow/metered internet | Transfer via cloud storage |
| Custom/fine-tuned models | Always transfer (not re-downloadable) |
| Base models only | Re-download |
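The scenario table reduces to a two-input decision rule. A hedged sketch; the argument names (base/custom, fast/slow) are my own shorthand for the table's rows:

```shell
# Sketch: choose a weight-transfer method.
# Args: model kind (base|custom), target VM link speed (fast|slow).
transfer_plan() {
  if [ "$1" = custom ]; then
    echo cloud-storage        # fine-tuned weights are not re-downloadable
  elif [ "$2" = fast ]; then
    echo re-download          # base model + fast link: re-downloading is simpler
  else
    echo cloud-storage        # base model + slow/metered link
  fi
}

transfer_plan custom fast     # cloud-storage
transfer_plan base fast       # re-download
transfer_plan base slow       # cloud-storage
```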
| Approach | When to use |
|---|---|
| Generic Docker image + mount code | Reusable across projects, fast iteration, cloud GPU providers (Vast.ai, RunPod) |
| Project-specific Docker image | Need exact reproducibility, complex build, shared with team |
| venv + pyproject.toml | Small projects, no Docker needed, local development |
| conda | Need non-Python deps (C libraries, specific CUDA) |
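For the venv row, a minimal sketch of setup on a fresh machine (assumes python3 is installed and a pyproject.toml sits at the repo root):

```shell
python3 -m venv .venv          # project-local environment
. .venv/bin/activate
python -m pip --version        # pip now resolves inside .venv
# pip install -e .             # editable install from pyproject.toml; must be re-run on each new machine
```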
On common base images (e.g., vastai/pytorch, nvidia/pytorch):
- Do not add a `USER` directive to the Dockerfile; it breaks the boot sequence.
- A `user` account (UID 1001) typically exists with passwordless sudo.
- Use `su - user`, or SSH as `user` directly, for non-root sessions.

If the source project used `pip install -e .`, the target VM must re-run it after cloning: editable installs create symlinks to the source directory, which won't exist on the new machine.
End-to-end workflow for spinning up a new VM via CLI. Requires vastai CLI authenticated (vastai set api-key).
# 1. Launch best-match instance directly
vastai launch instance -g RTX_4090 -n 1 -i vastai/pytorch --ssh --direct -d 64
# 2. Wait for instance to start, then get SSH command
vastai show instances
vastai ssh-url <INSTANCE_ID>
# 3. SCP .env and connect
SCP_PREFIX=$(vastai scp-url <ID>)
scp -P <port> .env ${SCP_PREFIX}/workspace/.env
ssh -p <port> root@<host>
# 4. Run vm-setup skill on the instance
# Search with specific constraints
vastai search offers 'reliability>0.98 num_gpus=1 gpu_name=RTX_4090 disk_space>100 cpu_ram>32 cuda_vers>=12.1'
# Then create from a specific offer ID
vastai create instance <OFFER_ID> --image vastai/pytorch --ssh --direct --disk 64
Set once, injected into every new instance automatically:
vastai create env-var -k WANDB_API_KEY -v 'wk_...'
vastai create env-var -k HF_TOKEN -v 'hf_...'
vastai create env-var -k ANTHROPIC_API_KEY -v 'sk-ant-...'
vastai create env-var -k GH_TOKEN -v 'ghp_...'
vastai show instances # list all
vastai stop instance <ID> # pause billing
vastai start instance <ID> # resume
vastai destroy instance <ID> # permanent delete
vastai logs <ID> # view logs
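To act on several instances at once, the instance list can be filtered with jq. A sketch assuming `vastai show instances --raw` emits a JSON array with `id` and `actual_status` fields (field names unverified; check your CLI version). The inline sample below stands in for that output:

```shell
# Sketch: extract IDs of running instances from JSON shaped like `--raw` output.
sample='[{"id":111,"actual_status":"running"},{"id":222,"actual_status":"exited"}]'
running_ids=$(echo "$sample" | jq -r '.[] | select(.actual_status=="running") | .id')
echo "$running_ids"            # 111
# for id in $running_ids; do vastai stop instance "$id"; done   # pause billing on all of them
```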
After SSH into the instance, use the vm-setup skill to run script/setup.sh which installs all tools, clones the repo, and authenticates services.
Read references/migration-checklist.md for:
- .gitignore template for ML projects

Read references/vastai-cli.md for: