# Standards for logging and tracking experiments
Before running any experiment, capture:
```shell
# 1. Git state
git rev-parse HEAD    # commit hash
git status --short    # working directory status
git diff --stat       # uncommitted changes summary

# 2. Environment
python --version
pip list | grep -E "torch|transformers|accelerate|numpy"   # key packages
nvidia-smi --query-gpu=name,memory.total --format=csv      # GPU info

# 3. Config validation
# Verify the config file is valid YAML/JSON and that all paths it references exist
```
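The config-validation step can be sketched in Python. A minimal version, assuming a JSON config; the `path_keys` tuple naming which fields hold filesystem paths is illustrative and should be adjusted to your actual config schema:

```python
import json
import os
import sys


def validate_config(config_path: str, path_keys=("data_dir", "output_dir")) -> dict:
    """Load a JSON config and check that every path-valued field exists.

    `path_keys` is a hypothetical list of fields holding filesystem paths;
    wire it to the fields your configs actually use.
    """
    with open(config_path) as f:
        config = json.load(f)  # raises ValueError on invalid JSON
    missing = [k for k in path_keys if k in config and not os.path.exists(config[k])]
    if missing:
        sys.exit(f"Config error: paths do not exist for keys {missing}")
    return config
```

Failing fast here, before any GPU time is spent, is the point: a typo'd data path should abort the run, not surface an hour into training.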
Each experiment record in `results/` should follow this structure:
```markdown
# Experiment: {experiment-name}
Date: {YYYY-MM-DD HH:MM}
Git Commit: {hash} (clean: yes/no)
Config: {path-to-config-file}

## Setup
- GPU: {gpu-type} x {count}
- Training time: {hours}h {minutes}m
- Effective batch size: {batch_size * grad_accum * gpu_count}

## Config Summary
{Key hyperparameters in table format}

## Results

### Primary Metrics
| Metric | Value |
|--------|-------|
| ... | ... |

### Training Curve Notes
- Converged at epoch: ...
- Final train loss: ...
- Best validation score at epoch: ...

## Comparison with Baselines
| Method | Metric-1 | Metric-2 | Notes |
|--------|----------|----------|-------|
| Baseline | ... | ... | paper-reported |
| Ours | ... | ... | this run |

## Observations
- [What worked, what didn't, surprising findings]

## Next Steps
- [What to try next based on these results]
```
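The fixed header fields of this template lend themselves to scripted generation. A minimal sketch; every parameter name (`commit`, `grad_accum`, and so on) is illustrative rather than a reference to any particular training framework:

```python
from datetime import datetime


def render_record_header(name: str, commit: str, clean: bool, config_path: str,
                         batch_size: int, grad_accum: int, gpu_count: int) -> str:
    """Fill the fixed header fields of the experiment-record template.

    Parameter names are hypothetical; connect them to wherever your
    training script stores its run metadata.
    """
    effective_batch = batch_size * grad_accum * gpu_count
    return "\n".join([
        f"# Experiment: {name}",
        f"Date: {datetime.now():%Y-%m-%d %H:%M}",
        f"Git Commit: {commit} (clean: {'yes' if clean else 'no'})",
        f"Config: {config_path}",
        "",
        "## Setup",
        f"- Effective batch size: {effective_batch}",
    ])
```

Generating the header programmatically keeps the recorded effective batch size consistent with the values the run actually used, rather than relying on hand-copied numbers.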
Name record files `results/{exp-name}_{YYYYMMDD}.md` for standalone experiments, or `results/{series-name}/run_{NNN}.md` for runs within a series.

When comparing multiple experiments, generate a summary table:
```markdown
# Experiment Comparison: {series-name}
| Run | Date | Key Change | Metric-1 | Metric-2 | Notes |
|-----|------|------------|----------|----------|-------|
| run_001 | ... | baseline | ... | ... | ... |
| run_002 | ... | +augment | ... | ... | ... |
| run_003 | ... | +lr-decay | ... | ... | ... |
```
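A summary table like this can be assembled from per-run result dicts. A sketch assuming hypothetical keys `run`, `date`, `change`, `metric1`, `metric2`, and `notes`; rename them to match however your runs store their results:

```python
def comparison_table(series: str, runs: list[dict]) -> str:
    """Render the cross-run summary table from per-run result dicts.

    Each dict is assumed (illustratively) to carry 'run', 'date', 'change',
    'metric1', 'metric2', and 'notes' keys.
    """
    lines = [
        f"# Experiment Comparison: {series}",
        "| Run | Date | Key Change | Metric-1 | Metric-2 | Notes |",
        "|-----|------|------------|----------|----------|-------|",
    ]
    for r in runs:
        lines.append(
            f"| {r['run']} | {r['date']} | {r['change']} "
            f"| {r['metric1']} | {r['metric2']} | {r['notes']} |"
        )
    return "\n".join(lines)
```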
Failed experiments should still be recorded, with a status line in place of the results sections: `Status: FAILED — {reason}`