Decision-making guide and Vast.ai CLI workflow for migrating ML projects to a new VM. Helps categorize files (Git vs cloud storage vs manual vs skip), handle dot-folders (.claude/, .env, .vscode/), plan infrastructure, and provision Vast.ai instances via CLI (search offers, launch, connect, manage lifecycle). Use when the user asks to (1) decide what goes on GitHub vs cloud storage for a new ML project, (2) categorize files for migration, (3) handle dot-folders and personal configs during migration, (4) plan infrastructure for a new ML project on a cloud GPU provider, (5) decide whether to use Docker, venv, or conda for a project, (6) search for or rent a Vast.ai instance, (7) manage Vast.ai instances (stop, start, destroy, list), (8) choose GPU type or spot vs on-demand for a workload, (9) set up Vast.ai account-level env vars. Does NOT handle post-SSH setup execution — see vm-setup skill for that.
Help the user decide how to structure a new ML project for portability across VMs, and provision Vast.ai instances via CLI.
For any ML project, categorize every file into one of four buckets:
| Bucket | What belongs | Transfer via |
|---|---|---|
| Git | Code, configs, pyproject.toml, Dockerfile, .github/, docs, small data (<50 MB), CLAUDE.md, .claude/skills/, scripts | GitHub |
| Cloud storage | Model weights (.pt, .bin, .safetensors), checkpoints, large datasets (>50 MB), optionally HF cache | B2 / S3 / GCS |
| Manual | .env (secrets), .claude/memory/ (personal), .claude/settings.local.json (machine-specific) | SCP / rsync |
| Skip | __pycache__/, .mypy_cache/, .pytest_cache/, .wandb/, .ipynb_checkpoints/, nohup.out | Never (regenerated) |
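The bucket table can be sketched as a shell function for quick classification. This is a sketch, not an exhaustive ruleset; the glob patterns are assumptions drawn from the table above:

```shell
# Sketch: map a repo-relative path to a migration bucket (patterns assumed from the table).
bucket_for() {
  case "$1" in
    __pycache__/*|*/__pycache__/*|.mypy_cache/*|.pytest_cache/*|.wandb/*|*.ipynb_checkpoints*|nohup.out)
      echo skip ;;
    *.pt|*.bin|*.safetensors|checkpoints/*)
      echo cloud-storage ;;
    .env|.claude/memory/*|.claude/settings.local.json)
      echo manual ;;
    *)
      echo git ;;
  esac
}

bucket_for weights/model.safetensors   # cloud-storage
bucket_for .env                        # manual
bucket_for train.py                    # git
```

Anything not matched falls through to Git, which matches the table's default (code, configs, docs, small data).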
| Dot-folder/file | Commit to Git? | Transfer manually? | Notes |
|---|---|---|---|
| .gitignore | Yes | N/A | Essential |
| .dockerignore | Yes | N/A | Essential |
| .github/ | Yes | N/A | CI/CD |
| CLAUDE.md | Yes | N/A | Project instructions |
| .claude/skills/ (shared) | Yes | N/A | Shared knowledge |
| .claude/settings.local.json | No | Optional | Machine-specific paths |
| .claude/memory/ | No | Optional | Personal context |
| .claude/worktrees/ | No | No | Regenerated by GSD tooling |
| .planning/ | Depends | No | Git if shared with team, skip if personal |
| .env | No | SCP | Secrets, never Git |
| .vscode/settings.json | No | Use Settings Sync | Auto-syncs via VS Code |
| .cache/, .huggingface/ | No | Via cloud storage | Or re-download |
| .wandb/ | No | No | Regenerated |
| .mypy_cache/, .pytest_cache/ | No | No | Regenerated |
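The "No" rows above translate into a minimal .gitignore fragment. A sketch only; adjust paths to your project:

```gitignore
# Secrets and machine-specific config: never committed
.env
.claude/settings.local.json
.claude/memory/
.claude/worktrees/
.vscode/

# Regenerated artifacts: never committed, never transferred
__pycache__/
.mypy_cache/
.pytest_cache/
.wandb/
.ipynb_checkpoints/
nohup.out
```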
| Scenario | Recommendation |
|---|---|
| Fast internet on target VM | Re-download (simpler) |
| Slow/metered internet | Transfer via cloud storage |
| Custom/fine-tuned models | Always transfer (not re-downloadable) |
| Base models only | Re-download |
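The scenario table reduces to a two-input decision rule. A hedged sketch; the argument names (base/custom, fast/slow) are my own shorthand for the table's rows:

```shell
# Sketch: choose a weight-transfer method.
# Args: model kind (base|custom), target VM link speed (fast|slow).
transfer_plan() {
  if [ "$1" = custom ]; then
    echo cloud-storage        # fine-tuned weights are not re-downloadable
  elif [ "$2" = fast ]; then
    echo re-download          # base model + fast link: re-downloading is simpler
  else
    echo cloud-storage        # base model + slow/metered link
  fi
}

transfer_plan custom fast     # cloud-storage
transfer_plan base fast       # re-download
transfer_plan base slow       # cloud-storage
```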
| Approach | When to use |
|---|---|
| Generic Docker image + mount code | Reusable across projects, fast iteration, cloud GPU providers (Vast.ai, RunPod) |
| Project-specific Docker image | Need exact reproducibility, complex build, shared with team |
| venv + pyproject.toml | Small projects, no Docker needed, local development |
| conda | Need non-Python deps (C libraries, specific CUDA) |
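For the venv row, a minimal sketch of setup on a fresh machine (assumes python3 is installed and a pyproject.toml sits at the repo root):

```shell
python3 -m venv .venv          # project-local environment
. .venv/bin/activate
python -m pip --version        # pip now resolves inside .venv
# pip install -e .             # editable install from pyproject.toml; must be re-run on each new machine
```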
On common base images (e.g., vastai/pytorch, nvidia/pytorch):
- Do not add a `USER` directive to the Dockerfile; it breaks the boot sequence.
- A `user` account (UID 1001) typically exists with passwordless sudo.
- Use `su - user`, or SSH as `user` directly, for non-root sessions.

If the source project used `pip install -e .`, the target VM must re-run it after cloning: editable installs create symlinks to the source directory, which won't exist on the new machine.
End-to-end workflow for spinning up a new VM via CLI. Requires vastai CLI authenticated (vastai set api-key).
# 1. Launch best-match instance directly
vastai launch instance -g RTX_4090 -n 1 -i vastai/pytorch --ssh --direct -d 64
# 2. Wait for instance to start, then get SSH command
vastai show instances
vastai ssh-url <INSTANCE_ID>
# 3. SCP .env and connect
SCP_PREFIX=$(vastai scp-url <ID>)
scp -P <port> .env ${SCP_PREFIX}/workspace/.env
ssh -p <port> root@<host>
# 4. Run vm-setup skill on the instance
# Search with specific constraints
vastai search offers 'reliability>0.98 num_gpus=1 gpu_name=RTX_4090 disk_space>100 cpu_ram>32 cuda_vers>=12.1'
# Then create from a specific offer ID
vastai create instance <OFFER_ID> --image vastai/pytorch --ssh --direct --disk 64
Set once, injected into every new instance automatically:
vastai create env-var -k WANDB_API_KEY -v 'wk_...'
vastai create env-var -k HF_TOKEN -v 'hf_...'
vastai create env-var -k ANTHROPIC_API_KEY -v 'sk-ant-...'
vastai create env-var -k GH_TOKEN -v 'ghp_...'
vastai show instances # list all
vastai stop instance <ID> # pause billing
vastai start instance <ID> # resume
vastai destroy instance <ID> # permanent delete
vastai logs <ID> # view logs
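To act on several instances at once, the instance list can be filtered with jq. A sketch assuming `vastai show instances --raw` emits a JSON array with `id` and `actual_status` fields (field names unverified; check your CLI version). The inline sample below stands in for that output:

```shell
# Sketch: extract IDs of running instances from JSON shaped like `--raw` output.
sample='[{"id":111,"actual_status":"running"},{"id":222,"actual_status":"exited"}]'
running_ids=$(echo "$sample" | jq -r '.[] | select(.actual_status=="running") | .id')
echo "$running_ids"            # 111
# for id in $running_ids; do vastai stop instance "$id"; done   # pause billing on all of them
```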
After SSH into the instance, use the vm-setup skill to run script/setup.sh which installs all tools, clones the repo, and authenticates services.
Read references/migration-checklist.md for:
- .gitignore template for ML projects

Read references/vastai-cli.md for: