Standardize mechanistic interpretability experiments for TRM-style recurrent models with a fixed trace -> SAE features -> intervention -> report loop. Use when implementing or running reproducible tomography studies, feature ablation and patch tests, and baseline-versus-intervened evaluation reports built from stepwise traces.
Use this workflow to prevent one-off experiment scripts and keep outputs comparable.
Enforce exactly three commands:
- trminterp trace
- trminterp sae fit
- trminterp intervene

Enforce exactly two report formats:
- trace.jsonl
- report.json

Allow optional caches/artifacts:
- states.npy
- sae.pt
- sae_metrics.json

Run in this order:
1. trminterp trace
2. trminterp sae fit
3. trminterp intervene

Use deterministic seeds for every stage. Write all generated outputs under a single run directory. Never mix code changes and generated outputs in the same commit unless explicitly requested.
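A minimal sketch of the seeding and run-directory rules, assuming a stdlib-only Python driver. The stage keys and the helpers `stage_seed`, `init_run_dir`, and `seed_everything` are hypothetical, not part of `trminterp`. The idea is that every stage seed is derived from one base seed in a way that stays identical across processes and machines.

```python
import hashlib
import random
from pathlib import Path

# Hypothetical stage keys mirroring the three commands, in run order.
STAGES = ("trace", "sae_fit", "intervene")


def stage_seed(base_seed: int, stage: str) -> int:
    """Derive a stable 32-bit seed for one stage from the run's base seed.

    ASSUMPTION: hashing "<base_seed>:<stage>" with SHA-256 is an illustrative
    scheme; any fixed, documented derivation works if it never changes.
    """
    digest = hashlib.sha256(f"{base_seed}:{stage}".encode()).digest()
    return int.from_bytes(digest[:4], "big")


def init_run_dir(root: str, run_name: str, base_seed: int) -> tuple[Path, dict[str, int]]:
    """Create the single run directory and return it with the per-stage seeds."""
    run_dir = Path(root) / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir, {stage: stage_seed(base_seed, stage) for stage in STAGES}


def seed_everything(seed: int) -> None:
    """Seed the stdlib RNG; add numpy/torch seeding here if a stage uses them."""
    random.seed(seed)
```

SHA-256 is used instead of the built-in `hash()` because CPython randomizes string hashing per process unless `PYTHONHASHSEED` is set, which would silently break reproducibility.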
If the repo has no trminterp binary yet, map to existing modules and keep output names stable:
trminterp trace:
python -m mechinterp_cli trace --out-dir <run_dir>/traces --seed <seed>
Then normalize/rename the selected trace file to <run_dir>/trace.jsonl. Optionally export stacked states to <run_dir>/states.npy.
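The normalize step can be sketched as follows, assuming the selected trace is a JSONL file under `<run_dir>/traces` whose records carry the recurrent state as a flat list of floats under a key named `h`. That key, the `normalize_trace` name, and the explicit `selected_name` argument are assumptions; the actual record schema is defined in references/v0-spec.md.

```python
import json
import shutil
from pathlib import Path

import numpy as np


def normalize_trace(run_dir: str, selected_name: str, export_states: bool = True) -> Path:
    """Copy the selected trace to <run_dir>/trace.jsonl and optionally stack its states.

    ASSUMPTION: each JSONL record stores the state under the key "h".
    """
    run = Path(run_dir)
    src = run / "traces" / selected_name
    dst = run / "trace.jsonl"
    shutil.copyfile(src, dst)  # stable output name expected by later stages

    if export_states:
        with dst.open() as f:
            rows = [json.loads(line)["h"] for line in f if line.strip()]
        states = np.asarray(rows, dtype=np.float32)  # shape [N, D]
        np.save(run / "states.npy", states)
    return dst
```

The sketch copies rather than moves the file so the raw traces remain available if a different trace needs to be selected later.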
trminterp sae fit:
python -m analysis.sae --summary-json <run_dir>/summary.json --out-json <run_dir>/sae_metrics.json --save-model-json <run_dir>/sae.pt
trminterp intervene:
python -m analysis.causality --summary-json <run_dir>/summary.json --seed <seed> --out-json <run_dir>/report.json
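To keep the fallback reproducible rather than ad hoc, the three module invocations can be wrapped in one driver. This is a sketch: `fallback_commands` and `run_pipeline` are hypothetical names, the flags are exactly those listed above, and trace normalization is passed in as a caller-supplied callable because its selection logic is repo-specific.

```python
import subprocess
import sys
from pathlib import Path


def fallback_commands(run_dir: str, seed: int) -> list[list[str]]:
    """Build the module-based equivalents of the three commands, in run order.

    Flags are copied from the mapping above; nothing else is added.
    """
    run = Path(run_dir)
    py = sys.executable  # same interpreter for every stage
    summary = str(run / "summary.json")
    return [
        [py, "-m", "mechinterp_cli", "trace",
         "--out-dir", str(run / "traces"), "--seed", str(seed)],
        [py, "-m", "analysis.sae", "--summary-json", summary,
         "--out-json", str(run / "sae_metrics.json"),
         "--save-model-json", str(run / "sae.pt")],
        [py, "-m", "analysis.causality", "--summary-json", summary,
         "--seed", str(seed), "--out-json", str(run / "report.json")],
    ]


def run_pipeline(run_dir: str, seed: int, normalize_trace) -> None:
    """Run trace, normalize, sae fit, intervene; stop at the first failure.

    `normalize_trace` is a hypothetical callable that writes <run_dir>/trace.jsonl.
    """
    trace_cmd, sae_cmd, intervene_cmd = fallback_commands(run_dir, seed)
    subprocess.run(trace_cmd, check=True)
    normalize_trace(run_dir)
    subprocess.run(sae_cmd, check=True)
    subprocess.run(intervene_cmd, check=True)
```

`check=True` makes `subprocess.run` raise `CalledProcessError` on a non-zero exit, so a failed stage never feeds partial outputs to the next one.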
Keep the report focused on baseline vs intervened outcomes and top causal features.
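One way to keep the report this narrow is to build it from three inputs only: baseline metrics, intervened metrics, and a per-feature effect score. The field names below (`baseline`, `intervened`, `delta`, `top_causal_features`) and the `build_report` helper are a hypothetical layout for illustration; references/v0-spec.md defines the actual `report.json` payload.

```python
def build_report(baseline: dict[str, float], intervened: dict[str, float],
                 feature_effects: dict[int, float], top_k: int = 5) -> dict:
    """Assemble a compact baseline-vs-intervened report (hypothetical layout).

    `feature_effects` maps SAE feature index -> measured change in the outcome
    when that feature is intervened on.
    """
    shared = sorted(baseline.keys() & intervened.keys())  # compare like with like
    delta = {k: intervened[k] - baseline[k] for k in shared}
    # Rank by magnitude so strongly negative effects count as causal too.
    ranked = sorted(feature_effects.items(), key=lambda kv: abs(kv[1]), reverse=True)
    return {
        "baseline": {k: baseline[k] for k in shared},
        "intervened": {k: intervened[k] for k in shared},
        "delta": delta,
        "top_causal_features": [
            {"feature": int(i), "effect": float(e)} for i, e in ranked[:top_k]
        ],
    }
```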
Implement only these policies in v0:
- ablate_features
- clamp_features
- patch_features

Avoid GUI work, feature browsers, seed-matching research tooling, and model-zoo abstractions in this skill version.
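The three v0 policies differ only in how they edit the SAE feature vector. The sketch below assumes feature activations are NumPy arrays with the feature axis last. `apply_in_state_space` is a hypothetical helper that splices the edit back into the hidden state as `h' = h + decode(a_edit) - decode(a)`; this error-preserving splice is an assumption, not something the spec states, and it keeps the SAE reconstruction error instead of replacing `h` with `decode(a_edit)`.

```python
import numpy as np


def ablate_features(a: np.ndarray, idx: list[int]) -> np.ndarray:
    """Zero the selected feature activations."""
    out = a.copy()
    out[..., idx] = 0.0
    return out


def clamp_features(a: np.ndarray, idx: list[int], value: float) -> np.ndarray:
    """Force the selected features to a fixed activation value."""
    out = a.copy()
    out[..., idx] = value
    return out


def patch_features(a: np.ndarray, donor: np.ndarray, idx: list[int]) -> np.ndarray:
    """Copy the selected features from a donor activation (e.g. another trace, same step)."""
    out = a.copy()
    out[..., idx] = donor[..., idx]
    return out


def apply_in_state_space(h, a, a_edit, decode):
    """Return h' = h + decode(a_edit) - decode(a) (ASSUMED error-preserving splice)."""
    return h + decode(a_edit) - decode(a)
```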
Keep internal abstractions minimal:
- TraceDataset.load(path) -> states[N,D], meta
- SAE.encode(h) -> a, SAE.decode(a) -> h_hat
- InterventionPolicy.apply(step, h, sae, context) -> h_prime
- Evaluator.compare(baseline_trace, new_trace) -> report

Read references/v0-spec.md for the exact payload expectations.
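A compact sketch of the first three interfaces, assuming NumPy and a ReLU sparse autoencoder. The JSONL key `h`, the SAE parameter names (`W_enc`, `b_enc`, `W_dec`, `b_dec`), and the `AblateFeatures` example policy are hypothetical; the SAE is not taken from `analysis.sae`, and `Evaluator.compare` is omitted.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any, Protocol

import numpy as np


class TraceDataset:
    @staticmethod
    def load(path: str) -> tuple[np.ndarray, list[dict[str, Any]]]:
        """Return states[N, D] plus per-step metadata.

        ASSUMPTION: each record holds the state under "h"; all other keys are metadata.
        """
        states, meta = [], []
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                rec = json.loads(line)
                states.append(rec.pop("h"))
                meta.append(rec)
        return np.asarray(states, dtype=np.float32), meta


@dataclass
class SAE:
    """Minimal ReLU sparse autoencoder (illustrative parameter layout)."""
    W_enc: np.ndarray  # [D, F]
    b_enc: np.ndarray  # [F]
    W_dec: np.ndarray  # [F, D]
    b_dec: np.ndarray  # [D]

    def encode(self, h: np.ndarray) -> np.ndarray:
        # Subtract the decoder bias, project to F features, keep positives only.
        return np.maximum((h - self.b_dec) @ self.W_enc + self.b_enc, 0.0)

    def decode(self, a: np.ndarray) -> np.ndarray:
        return a @ self.W_dec + self.b_dec


class InterventionPolicy(Protocol):
    def apply(self, step: int, h: np.ndarray, sae: SAE, context: dict) -> np.ndarray: ...


@dataclass
class AblateFeatures:
    """Hypothetical policy: zero chosen features at chosen steps, keeping reconstruction error."""
    features: list[int]
    steps: set[int]

    def apply(self, step: int, h: np.ndarray, sae: SAE, context: dict) -> np.ndarray:
        if step not in self.steps:
            return h  # untouched steps pass through unchanged
        a = sae.encode(h)
        a_edit = a.copy()
        a_edit[..., self.features] = 0.0
        return h + sae.decode(a_edit) - sae.decode(a)
```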