Record experiment plans and run outcomes under `kb/programs/<program-id>/experiments/runs/`, maintain a durable follow-up view, and capture statuses, metrics, failure modes, linked artifacts, and next actions. Use when Codex needs to persist experiment execution details instead of leaving them in chat or only in a static experiment matrix.
Convert run-by-run experimentation into durable program memory.
Context inputs: `AGENTS.md` plus `kb/memory/user-profile.yaml`, `workflow/state.yaml`, and any existing `experiments/*` context.

Outputs: run records under `kb/programs/<program-id>/experiments/runs/`, the run index `kb/programs/<program-id>/experiments/run-log.yaml`, and the follow-up view `kb/programs/<program-id>/experiments/follow-up.md`.

Keep the static design in `experiments/matrix.yaml`; this skill complements it with run-by-run execution memory.

```shell
python3 .agents/skills/research-experiment-tracker/scripts/track_experiment.py log-run \
  --program-id my-program \
  --title "hierarchical interface baseline v0" \
  --intent "Verify whether a minimal hierarchical interface can at least reach baseline parity" \
  --status failed \
  --result-summary "Task success rate did not exceed the baseline, and low-level interface alignment was unstable." \
  --failure-mode "alignment between the latent interface and the low-level API is unstable" \
  --next-action "Fall back to the simpler skill token interface first, then redo the controlled experiment." \
  --metric success_rate=0.18 \
  --metric recovery_rate=0.05
```
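After a successful `log-run`, the run index presumably gains an entry like the following. This is a hypothetical sketch: the field names are inferred from the CLI flags above, not taken from a documented schema.

```yaml
# Hypothetical run-log.yaml entry (field names inferred from the CLI flags)
- title: hierarchical interface baseline v0
  status: failed
  intent: Verify whether a minimal hierarchical interface can at least reach baseline parity
  result_summary: Task success rate did not exceed the baseline, and low-level interface alignment was unstable.
  failure_mode: alignment between the latent interface and the low-level API is unstable
  next_action: Fall back to the simpler skill token interface first, then redo the controlled experiment.
  metrics:
    success_rate: 0.18
    recovery_rate: 0.05
```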
```shell
python3 .agents/skills/research-experiment-tracker/scripts/track_experiment.py preview \
  --program-id my-program \
  --title "experiment record preview" \
  --intent "Preview only; do not write any files."
```
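A durable run log is only useful if follow-ups can be queried later. The sketch below shows one way to triage logged runs for pending next actions; the entry shape (`status`, `next_action`, `metrics`) is an assumption mirroring the CLI flags, not the script's documented schema.

```python
# Hypothetical follow-up triage over run-log entries.
# Entry fields are assumed to mirror the log-run CLI flags.
runs = [
    {"title": "hierarchical interface baseline v0", "status": "failed",
     "next_action": "fall back to the simpler skill token interface",
     "metrics": {"success_rate": 0.18, "recovery_rate": 0.05}},
    {"title": "skill token interface v1", "status": "succeeded",
     "next_action": None,
     "metrics": {"success_rate": 0.41, "recovery_rate": 0.22}},
]

def pending_follow_ups(entries):
    """Return titles of non-successful runs that still list a next action."""
    return [e["title"] for e in entries
            if e["status"] != "succeeded" and e.get("next_action")]

print(pending_follow_ups(runs))  # -> ['hierarchical interface baseline v0']
```

Keeping `next_action` as a first-class field is what makes the follow-up view queryable instead of buried in prose summaries.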
Experiment design itself belongs to `method-designer`; this skill logs execution and follow-up, not the whole design pack.