Reproducible training runs — W&B sweeps, seeded, eval harness, model registry discipline.
- Seed everything: `transformers.set_seed(42)`, plus `random.seed(42)`, `np.random.seed(42)`, and `torch.manual_seed(42)`.
- Make the dataloader deterministic where it doesn't hurt throughput; turn on `torch.use_deterministic_algorithms(True)` for ablations.
- Pin dependencies in `pyproject.toml` with `~=` at minimum.
- Stamp every run with `(code_sha, config_sha, data_sha)`.
- Keep configs in `ml/training/config/`, with a default config plus CLI overrides:
```sh
uv run python ml/training/train.py --config-name=ner_clinicalbert lr=5e-5 epochs=5
```
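The seeding and determinism bullets above can be collected into one helper. A minimal sketch, assuming `torch` and `numpy` are installed (`set_all_seeds` is a hypothetical name, not part of the repo; `transformers.set_seed` already covers the first three seeds, so this spells them out explicitly):

```python
import os
import random

import numpy as np
import torch


def set_all_seeds(seed: int = 42, deterministic: bool = False) -> None:
    """Seed every RNG the training stack touches.

    deterministic=True trades throughput for reproducible kernels,
    so enable it on ablations, not on full production runs.
    """
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op without CUDA
    os.environ["PYTHONHASHSEED"] = str(seed)
    if deterministic:
        # Some CUDA ops require this env var before deterministic mode works.
        os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
        torch.use_deterministic_algorithms(True)
```

Calling this once at the top of `train.py`, before any data loading, keeps the seed in one place instead of scattered across the script.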
- Initialize tracking with `wandb.init(project="clinical-nlp", job_type="train", config=cfg)`.
- Tag runs by type: `baseline`, `ablation`, `production-candidate`.
- Keep sweep definitions in `ml/training/sweeps/`.
- Eval harness: `ml/training/evaluation/eval.py` takes `--model-path` + `--golden-set` and outputs a structured JSON + markdown report.
- Register models under versioned names (`clinical-nlp/ner-v1`, `-v2`); promote the chosen version to `production`. CI deploy reads the `production` tag.
- Log every ablation in `ml/ablations.md`: what was changed, hypothesis, result, conclusion.
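The run-stamping and `wandb.init` steps can be wired together. A minimal sketch, assuming the training repo is a git checkout; `run_provenance` and `sha256_of` are hypothetical helper names, and hashing the config and data files with `hashlib` is one possible way to produce `config_sha` and `data_sha`:

```python
import hashlib
import subprocess
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Content hash of a config or data file, truncated for readability."""
    return hashlib.sha256(path.read_bytes()).hexdigest()[:12]


def run_provenance(config_path: Path, data_path: Path) -> dict:
    """Collect the three SHAs stamped onto every run."""
    code_sha = subprocess.run(
        ["git", "rev-parse", "--short", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    return {
        "code_sha": code_sha,
        "config_sha": sha256_of(config_path),
        "data_sha": sha256_of(data_path),
    }
```

The returned dict can then be merged into the tracked config, e.g. `wandb.init(project="clinical-nlp", job_type="train", config={**cfg, **run_provenance(cfg_path, data_path)})`, so every run carries its exact code, config, and data identity.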