Write and operate robust, reusable experiment runner scripts (Bash) for ML/data pipelines: config-driven runs, GPU sharding, CPU fallback, structured logs, checkpoint/resume, and post-processing chaining. Use when the user asks how to run a project efficiently via scripts, or asks you to create/standardize run scripts.
Experiment Run Scripts (Fast execution + robust authoring)
Goals
Run efficiently: launch parallel workers (often per-GPU) with clean logs and deterministic inputs.
Write maintainable scripts: keep scripts as orchestrators (resource selection, sharding, logging, chaining), with the experiment logic living in Python modules/configs.
Non-negotiable conventions
Run from repo root (or make paths robust): treat configs/data/outputs as repo-relative by default.
Config-driven: the script takes a single CONFIG path (YAML/JSON) and passes it through; avoid editing code to switch datasets/params.
Stable outputs: results and logs always land in a predictable outputs/<run-name>/... folder (or a config-defined output dir).
Resume-first: rerunning the same command should be safe. Prefer idempotent “skip if output exists” behavior in the underlying program.
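The conventions above can be sketched as a minimal runner function. The `run_experiment` name, the `outputs/<run-name>/` layout, the `done.marker` sentinel, and the `train.py` entry point are all illustrative placeholders, not a required project structure.

```shell
#!/usr/bin/env bash
set -euo pipefail

# run_experiment CONFIG — config-driven, stable outputs, resume-first.
run_experiment() {
    local config="$1"
    local run_name out_dir
    run_name="$(basename "${config%.*}")"   # e.g. configs/foo.yaml -> foo
    out_dir="outputs/${run_name}"           # predictable, repo-relative
    mkdir -p "$out_dir"

    # Resume-first: rerunning the same command is safe.
    if [[ -f "$out_dir/done.marker" ]]; then
        echo "[skip] $run_name"
        return 0
    fi

    # Single CONFIG path passed straight through (hypothetical trainer).
    python train.py --config "$config" --out-dir "$out_dir" \
        2>&1 | tee "$out_dir/run.log"
    touch "$out_dir/done.marker"
}
```

Because completion is recorded with a marker file rather than inferred from partial outputs, an interrupted run reruns cleanly on the next invocation.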
Related skills
Operator quickstart (copy/paste)
Environment
Python deps: install with the project’s dependency manager (requirements.txt, pyproject.toml, conda env, etc.).
GPU deps (optional): if nvidia-smi is available, the script can shard across GPUs; otherwise it must fall back to a single CPU shard.
Common pattern: batch parallelism over seeds/tasks
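A sketch of this pattern: assign seeds round-robin over the shard list, launch one background worker per seed with exactly one visible device, then wait for all of them. `run_batch` and the worker calling convention (seed as `$1`) are illustrative assumptions, not project APIs.

```shell
#!/usr/bin/env bash

GPUS=(0 1)   # in real scripts, fill from nvidia-smi or fall back to ("cpu")

# run_batch WORKER SEED... — one background worker per seed, sharded over GPUS.
run_batch() {
    local worker="$1"; shift
    local pids=() i=0 rc=0
    for seed in "$@"; do
        local gpu="${GPUS[$(( i % ${#GPUS[@]} ))]}"
        # Each worker sees exactly one device and receives its seed as $1.
        CUDA_VISIBLE_DEVICES="$gpu" "$worker" "$seed" &
        pids+=("$!")
        i=$(( i + 1 ))
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || rc=1    # fail the batch if any worker fails
    done
    return "$rc"
}
```

Usage might look like `run_batch ./scripts/train_one.sh 0 1 2 3` (hypothetical worker script); waiting on each PID individually, rather than a bare `wait`, lets the batch report a nonzero exit when any seed fails.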