Post-training pipeline - sync checkpoint to skt, run eval on L40S, collect results. Use when user wants end-to-end automation after training completes.
Automated pipeline: training completion -> checkpoint sync -> evaluation -> result collection.
Use the run_pipeline MCP tool for fully automated execution:

```python
run_pipeline(
    source_cluster="rlwrld1",
    job_id="12345",
    eval_task_name="task-Cube_Box-5cmLeft",
    dest_cluster="skt",
)
```
This automatically: watches training job -> finds checkpoint -> syncs to skt -> submits eval -> waits for eval -> parses metrics -> updates experiment store.
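The automated flow above can be sketched as a small orchestrator. This is a minimal illustration of the control flow, not the real run_pipeline implementation; the tool functions are passed in as stand-ins, and submit-eval plus wait-for-eval are collapsed into one call here:

```python
import time

def run_pipeline(source_cluster, job_id, eval_task_name, dest_cluster,
                 watch_job, find_checkpoint_path, sync_checkpoint,
                 run_eval, parse_eval_metrics, update_store,
                 poll_interval=30):
    """Watch training, then sync -> eval -> parse -> record (sketch only)."""
    # Poll the training job until it reaches a terminal state.
    while watch_job(source_cluster, job_id) != "COMPLETED":
        time.sleep(poll_interval)
    # Locate the checkpoint and relay it to the eval cluster.
    ckpt = find_checkpoint_path(source_cluster, job_id)
    remote_ckpt = sync_checkpoint(ckpt, source_cluster, dest_cluster)
    # Submit the eval and collect its metrics (submit + wait collapsed here).
    eval_job = run_eval(dest_cluster, remote_ckpt, eval_task_name)
    metrics = parse_eval_metrics(eval_job)
    # Record the result in the experiment store.
    update_store(job_id, metrics)
    return metrics
```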
Individual tools used by the pipeline (also usable step by step):
- job_info or watch_job to track training (watch_job polls until done)
- find_checkpoint_path helper auto-detects the checkpoint location
- sync_checkpoint relays via the local Mac (rsync down, then rsync up), or remote_exec to run huggingface-cli upload
- run_eval tool to submit the eval (using eval_configs.isaacsim_default)
- parse_eval_metrics for best_metric and status
- experiment_summary

User: "Once training finishes, run eval automatically too" -> watch job -> sync ckpt to skt -> submit eval -> collect results
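The metrics-parsing step (parse_eval_metrics) can be sketched as a small log parser. The log format, the success_rate metric name, and the completion marker below are assumptions for illustration, not the actual eval output:

```python
import re

def parse_eval_metrics(log_text: str) -> dict:
    """Extract a best metric and a terminal status from eval log text (sketch)."""
    # Collect every reported metric value; the pattern is an assumed log format.
    metrics = [float(m) for m in re.findall(r"success_rate[=:]\s*([0-9.]+)", log_text)]
    # An assumed completion marker distinguishes finished from running evals.
    status = "COMPLETED" if "Eval finished" in log_text else "RUNNING"
    return {"best_metric": max(metrics) if metrics else None, "status": status}

log = "epoch 1 success_rate=0.42\nepoch 2 success_rate=0.55\nEval finished"
print(parse_eval_metrics(log))  # {'best_metric': 0.55, 'status': 'COMPLETED'}
```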
User: "Move the checkpoint that finished on aws to skt and run eval" -> sync_checkpoint aws->skt -> run_eval on skt
User: "Summarize the results of the experiments that just finished" -> find completed experiments -> tail eval logs -> parse metrics -> update store
Config references:
- training_presets.isaacsim_finetune_default - standard training config
- eval_configs.isaacsim_default - standard eval config
- eval_configs.isaacsim_tasks - task name mapping
- sync.eval_hub - skt (centralized eval)
- sync.eval_partition - l40s-gpu
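The eval_configs.isaacsim_tasks task-name mapping can be pictured as a simple dict lookup. The key below comes from the run_pipeline example above; the value is an invented placeholder showing the shape, not an entry from the real config:

```python
# Hypothetical stand-in for eval_configs.isaacsim_tasks; the real mapping
# lives in the skill's config, not here.
ISAACSIM_TASKS = {
    "task-Cube_Box-5cmLeft": "Isaac-Cube-Box-5cmLeft-v0",  # placeholder value
}

def resolve_task(eval_task_name: str) -> str:
    """Translate a user-facing eval task name into a simulator task id."""
    return ISAACSIM_TASKS[eval_task_name]
```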