ML research loop supporting both fully autonomous and collaborative (human-runs-training, agent-analyzes) modes. Conducts literature review, proposes architecture changes, implements code behind flags, diagnoses results, hunts bugs in measurement pipelines, designs isolation experiments, and iterates. Use when: you want systematic research on a model — from architecture audit through implementation, experimentation, and evidence-driven pivots.
Systematic ML research loop in two modes:

- Autonomous: the agent runs training itself and iterates end to end.
- Collaborative: the human runs training; the agent analyzes results, diagnoses issues, and proposes next steps.
| File | Contents | Read when |
|---|---|---|
| diagnostic-investigation.md | Steps 5b.1-5b.8: symptom identification, root cause diagnosis, targeted literature search, agenda reprioritization, measurement verification, isolation experiments, cross-experiment evidence, component health | Step 5 — Evaluate Results |
| collaborative-and-templates.md | Human feedback checkpoint (Step 5c), progress summaries, working file template, journal entry template, final report template, config format | Bootstrap (init files), Journaling, Collaborative mode |
All sub-files in: `.github/skills/autonomous-research/`
Before trusting ANY metric: Is it computed? Computed correctly? Does it measure what we think? When paradoxical → investigate measurement code before model.
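This checklist can be applied mechanically with a small sanity harness that compares each reported metric against an independent recomputation and checks for frozen-at-init values. The function and parameter names below (`check_metric`, `init_value`) are a hypothetical sketch, not part of the actual pipeline:

```python
import math

def check_metric(name, reported, recompute, init_value=0.0, atol=1e-6):
    """Sanity-check one reported metric against an independent recomputation.

    Returns a list of problems; an empty list means the metric looks trustworthy.
    """
    problems = []
    fresh = recompute()
    if math.isnan(reported) or math.isinf(reported):
        problems.append(f"{name}: reported value is NaN/inf")
    elif abs(reported - fresh) > atol:
        problems.append(f"{name}: reported {reported} != recomputed {fresh}")
    if reported == init_value and fresh != init_value:
        # A metric frozen at its init value was probably never written to.
        problems.append(f"{name}: stuck at init value {init_value}")
    return problems
```

For example, `check_metric("IC", 0.42, lambda: 0.42)` returns an empty list, while a metric reported as `0.0` that recomputes to `0.75` is flagged both as a mismatch and as stuck at its init value.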
Decide what to investigate based on evidence:
Read prior state (parallel sub-agents):
- /memories/repo/architecture.md
- /home/elerson/Documents/Obsidian Vault/Fovea/Training models - Ideas.md

Validate measurement pipeline: For each target metric, trace computation → storage → reporting. Check polarity/encoding. Check init values. Fix bugs first.
Establish baseline: Use recent results (< 24h) or run default-config training. Record ALL metrics.
Architecture audit: Document strengths, weaknesses, unused components, bottlenecks, capacity, inductive bias.
Literature survey (parallel sub-agents):
Read Ideas Inbox: Parse ideas, score by (impact × feasibility) / complexity, mark as QUEUED/TESTED/DEFERRED/OUT_OF_SCOPE.
Build research agenda: Rank all candidates. Interleave inbox items with literature/diagnosis ideas.
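The scoring and ranking steps above can be sketched directly; the idea names and scores below are hypothetical examples, not real inbox entries:

```python
def score_idea(impact, feasibility, complexity):
    """Priority = (impact x feasibility) / complexity, per the inbox rubric."""
    return (impact * feasibility) / max(complexity, 1e-9)

def build_agenda(ideas):
    """ideas: list of (name, impact, feasibility, complexity) tuples.

    Returns names ranked by descending priority score.
    """
    ranked = sorted(ideas, key=lambda t: score_idea(*t[1:]), reverse=True)
    return [name for name, *_ in ranked]

agenda = build_agenda([
    ("rotary-embeddings", 8, 7, 3),   # hypothetical inbox items
    ("deeper-decoder",    6, 5, 8),
    ("fix-lr-schedule",   5, 9, 1),
])
# low-complexity, high-feasibility ideas rise to the top of the agenda
```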
Initialize working file → See collaborative-and-templates.md for template.
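Implementation in this loop gates every change behind an off-by-default flag so the baseline path is reproducible. A minimal sketch of the pattern with argparse; the flag name `--use_gated_residual` and the toy block are hypothetical:

```python
import argparse

def build_parser():
    parser = argparse.ArgumentParser()
    # Each experimental change gets its own off-by-default flag, so a run
    # without the flag reproduces the baseline exactly.
    parser.add_argument("--use_gated_residual", action="store_true")
    return parser

def residual_block(x, args):
    if args.use_gated_residual:
        return x + 0.5 * x   # experimental branch (hypothetical change)
    return x + x             # baseline branch, untouched

exp = build_parser().parse_args(["--use_gated_residual"])
base = build_parser().parse_args([])
```

Keeping the baseline branch untouched means any regression can be attributed to the flagged change alone.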
Implement each change behind a flag (`if args.flag:`); record suggested settings in configuration_suggestions.md.

Framework checks (JAX/Flax/JIT):
- Are non-array arguments to jitted functions marked static_argnums?
- Parameters created via model.init(), not model.apply()?
- Change static_argnums carefully: each new static value triggers recompilation.

Autonomous mode:
- `pkill -f "train_" 2>/dev/null; sleep 2`
- `{command} --{flag} {value} > /tmp/run_{N}.log 2>&1 &`

Collaborative mode:
(see collaborative-and-templates.md)

Poll log every 60-90 seconds:
If results look paradoxical → read diagnostic-investigation.md and execute Steps 5b.1-5b.8.
For the human feedback checkpoint (Step 5c) → see collaborative-and-templates.md.
→ Use journal entry template from collaborative-and-templates.md.
→ Use final report template from collaborative-and-templates.md.
Metrics to watch in the log: `loss_total`, IC, BSS, RegAcc, Dir Acc, grad_norm, NaN/inf, Epoch, SpreadRatio, AUC
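A parser for this watch list might look like the sketch below. The regex assumes a hypothetical `name: value` log format; adapt the patterns to whatever the training script actually prints (metrics with spaces in their names, e.g. "Dir Acc", need their own patterns):

```python
import math
import re

METRIC_RE = re.compile(
    r"\b(loss_total|IC|BSS|RegAcc|grad_norm|SpreadRatio|AUC)\b\s*[:=]\s*"
    r"([-+]?(?:\d+\.?\d*(?:[eE][-+]?\d+)?|nan|inf))",
    re.IGNORECASE,
)

def parse_log_line(line):
    """Extract watched metrics from one log line and flag NaN/inf immediately."""
    metrics, alerts = {}, []
    for name, raw in METRIC_RE.findall(line):
        value = float(raw)
        metrics[name] = value
        if math.isnan(value) or math.isinf(value):
            alerts.append(f"{name} is {raw}: stop the run and diagnose")
    return metrics, alerts
```

Flagging NaN/inf at parse time lets the loop kill a diverged run on the next poll instead of waiting for the epoch to finish.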
(check liveness with `kill -0 $PID`)

Key files:
- projects/training_lab_jax/models/hierarchical_transformer.py
- projects/training_lab_jax/experiments/train_hierarchical.py
- projects/training_lab_jax/experiments/experiment_journal.md
- configuration_suggestions.md