Post-training pipeline - sync checkpoint to skt, run eval on L40S, collect results. Use when user wants end-to-end automation after training completes.
Automated pipeline: training completion -> checkpoint sync -> evaluation -> result collection.
Use the run_pipeline MCP tool for fully automated execution:

```python
run_pipeline(
    source_cluster="rlwrld1",
    job_id="12345",
    eval_task_name="task-Cube_Box-5cmLeft",
    dest_cluster="skt",
)
```
This automatically: watches training job -> finds checkpoint -> syncs to skt -> submits eval -> waits for eval -> parses metrics -> updates experiment store.
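The automated flow above can be sketched as a small orchestrator. This is a minimal illustration of the control flow, not the real run_pipeline implementation; the tool functions are passed in as stand-ins, and submit-eval plus wait-for-eval are collapsed into one call here:

```python
import time

def run_pipeline(source_cluster, job_id, eval_task_name, dest_cluster,
                 watch_job, find_checkpoint_path, sync_checkpoint,
                 run_eval, parse_eval_metrics, update_store,
                 poll_interval=30):
    """Watch training, then sync -> eval -> parse -> record (sketch only)."""
    # Poll the training job until it reaches a terminal state.
    while watch_job(source_cluster, job_id) != "COMPLETED":
        time.sleep(poll_interval)
    # Locate the checkpoint and relay it to the eval cluster.
    ckpt = find_checkpoint_path(source_cluster, job_id)
    remote_ckpt = sync_checkpoint(ckpt, source_cluster, dest_cluster)
    # Submit the eval and collect its metrics (submit + wait collapsed here).
    eval_job = run_eval(dest_cluster, remote_ckpt, eval_task_name)
    metrics = parse_eval_metrics(eval_job)
    # Record the result in the experiment store.
    update_store(job_id, metrics)
    return metrics
```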
Individual tools used by the pipeline (also usable step by step):
- job_info or watch_job to track training (watch_job polls until done)
- find_checkpoint_path helper auto-detects the checkpoint location
- sync_checkpoint relays via the local Mac (rsync down, then rsync up), or remote_exec to run huggingface-cli upload
- run_eval tool to submit the eval (using eval_configs.isaacsim_default)
- parse_eval_metrics for best_metric and status
- experiment_summary

User: "Once training finishes, run eval automatically too" -> watch job -> sync ckpt to skt -> submit eval -> collect results
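The metrics-parsing step (parse_eval_metrics) can be sketched as a small log parser. The log format, the success_rate metric name, and the completion marker below are assumptions for illustration, not the actual eval output:

```python
import re

def parse_eval_metrics(log_text: str) -> dict:
    """Extract a best metric and a terminal status from eval log text (sketch)."""
    # Collect every reported metric value; the pattern is an assumed log format.
    metrics = [float(m) for m in re.findall(r"success_rate[=:]\s*([0-9.]+)", log_text)]
    # An assumed completion marker distinguishes finished from running evals.
    status = "COMPLETED" if "Eval finished" in log_text else "RUNNING"
    return {"best_metric": max(metrics) if metrics else None, "status": status}

log = "epoch 1 success_rate=0.42\nepoch 2 success_rate=0.55\nEval finished"
print(parse_eval_metrics(log))  # {'best_metric': 0.55, 'status': 'COMPLETED'}
```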
User: "Move the checkpoint that finished on aws to skt and run eval" -> sync_checkpoint aws->skt -> run_eval on skt
User: "Summarize the results of the experiments that just finished" -> find completed experiments -> tail eval logs -> parse metrics -> update store
Config references:
- training_presets.isaacsim_finetune_default - standard training config
- eval_configs.isaacsim_default - standard eval config
- eval_configs.isaacsim_tasks - task name mapping
- sync.eval_hub - skt (centralized eval)
- sync.eval_partition - l40s-gpu
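The eval_configs.isaacsim_tasks task-name mapping can be pictured as a simple dict lookup. The key below comes from the run_pipeline example above; the value is an invented placeholder showing the shape, not an entry from the real config:

```python
# Hypothetical stand-in for eval_configs.isaacsim_tasks; the real mapping
# lives in the skill's config, not here.
ISAACSIM_TASKS = {
    "task-Cube_Box-5cmLeft": "Isaac-Cube-Box-5cmLeft-v0",  # placeholder value
}

def resolve_task(eval_task_name: str) -> str:
    """Translate a user-facing eval task name into a simulator task id."""
    return ISAACSIM_TASKS[eval_task_name]
```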