Reproducible training runs — W&B sweeps, seeded, eval harness, model registry discipline.
- Seed everything: `transformers.set_seed(42)`, plus `random.seed(42)`, `np.random.seed(42)`, and `torch.manual_seed(42)`.
- Make the dataloader deterministic where it doesn't hurt throughput; turn on `torch.use_deterministic_algorithms(True)` for ablations.
- Pin dependencies in `pyproject.toml` with `~=` at minimum.
- Stamp every run with `(code_sha, config_sha, data_sha)`.
- Keep configs in `ml/training/config/`, with a default config plus CLI overrides:
```sh
uv run python ml/training/train.py --config-name=ner_clinicalbert lr=5e-5 epochs=5
```
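The seeding and determinism bullets above can be collected into one helper. A minimal sketch, assuming `torch` and `numpy` are installed (`set_all_seeds` is a hypothetical name, not part of the repo; `transformers.set_seed` already covers the first three seeds, so this spells them out explicitly):

```python
import os
import random

import numpy as np
import torch


def set_all_seeds(seed: int = 42, deterministic: bool = False) -> None:
    """Seed every RNG the training stack touches.

    deterministic=True trades throughput for reproducible kernels,
    so enable it on ablations, not on full production runs.
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op without CUDA
    os.environ["PYTHONHASHSEED"] = str(seed)
    if deterministic:
        # Some CUDA ops require this env var before deterministic mode works.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.use_deterministic_algorithms(True)
```

Calling this once at the top of `train.py`, before any data loading, keeps the seed in one place instead of scattered across the script.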
- Initialize tracking with `wandb.init(project="clinical-nlp", job_type="train", config=cfg)`.
- Tag runs by type: `baseline`, `ablation`, `production-candidate`.
- Keep sweep definitions in `ml/training/sweeps/`.
- Eval harness: `ml/training/evaluation/eval.py` takes `--model-path` + `--golden-set` and outputs a structured JSON + markdown report.
- Register models under versioned names (`clinical-nlp/ner-v1`, `-v2`); promote the chosen version to `production`. CI deploy reads the `production` tag.
- Log every ablation in `ml/ablations.md`: what was changed, hypothesis, result, conclusion.
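The run-stamping and `wandb.init` steps can be wired together. A minimal sketch, assuming the training repo is a git checkout; `run_provenance` and `sha256_of` are hypothetical helper names, and hashing the config and data files with `hashlib` is one possible way to produce `config_sha` and `data_sha`:

```python
import hashlib
import subprocess
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Content hash of a config or data file, truncated for readability."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:12]


def run_provenance(config_path: Path, data_path: Path) -> dict:
    """Collect the three SHAs stamped onto every run."""
    code_sha = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return {
        "code_sha": code_sha,
        "config_sha": sha256_of(config_path),
        "data_sha": sha256_of(data_path),
    }
```

The returned dict can then be merged into the tracked config, e.g. `wandb.init(project="clinical-nlp", job_type="train", config={**cfg, **run_provenance(cfg_path, data_path)})`, so every run carries its exact code, config, and data identity.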