Name: CI Workflow Guide
Author: sgl-project

.github/workflows/pr-test.yml

 ┌──────────────┐
 │ build kernel │
 └──────┬───────┘
        │
        ├─ check-changes ──── detects which packages changed
        │                      (main_package, sgl_kernel, jit_kernel, multimodal_gen)
        │
        ├─ call-gate ──────── pr-gate.yml (draft? label? rate limit?)
        │
        ├─────────────────────────────────────────────────────┐
        │                                                     │
        ▼                                                     │
 ┌─────────────────────────────────────┐                      │
 │          Stage A (~3 min)           │                      │
 │         pre-flight check            │                      │
 │                                     │                      │
 │  ┌─────────────────────────────┐    │                      │
 │  │ stage-a-test-1-gpu-small    │    │                      │
 │  │ (small GPUs)                │    │                      │
 │  └─────────────────────────────┘    │                      │
 │  ┌─────────────────────────────┐    │                      │
 │  │ stage-a-test-cpu            │    │                      │
 │  │ (CPU)                       │    │                      │
 │  └─────────────────────────────┘    │                      │
 └──────┬──────────────────────────────┘                      │
        │                                                     │
        ▼                                                     ▼
 ┌─────────────────────────────────────┐          ┌──────────────────────────┐
 │          Stage B (~30 min)          │          │      kernel test         │
 │           basic tests               │          └──────────────────────────┘
 │                                     │          ┌──────────────────────────┐
 │  ┌─────────────────────────────┐    │          │   multimodal gen test    │
 │  │ stage-b-test-1-gpu-small    │    │          └──────────────────────────┘
 │  │ (small GPUs, e.g. 5090)     │    │
 │  └─────────────────────────────┘    │
 │  ┌─────────────────────────────┐    │
 │  │ stage-b-test-1-gpu-large    │    │
 │  │ (large GPUs, e.g. H100)     │    │
 │  └─────────────────────────────┘    │
 │  ┌─────────────────────────────┐    │
 │  │ stage-b-test-2-gpu-large    │    │
 │  │ (large GPUs, e.g. H100)     │    │
 │  └─────────────────────────────┘    │
 └──────┬──────────────────────────────┘
        │
        ▼
 ┌─────────────────────────────────────┐
 │          Stage C (~30 min)          │
 │          advanced tests             │
 │                                     │
 │  ┌─────────────────────────────┐    │
 │  │ stage-c-test-4-gpu-h100     │    │
 │  │ (H100 GPUs)                 │    │
 │  └─────────────────────────────┘    │
 │  ┌─────────────────────────────┐    │
 │  │ stage-c-test-8-gpu-h200     │    │
 │  │ (8 x H200 GPUs)             │    │
 │  └─────────────────────────────┘    │
 │  ┌─────────────────────────────┐    │
 │  │ stage-c-test-4-gpu-b200     │    │
 │  │ (4 x B200 GPUs)             │    │
 │  └─────────────────────────────┘    │
 │  ┌─────────────────────────────┐    │
 │  │ Other advanced tests        │    │
 │  │ (DeepEP, PD Disagg, GB300)  │    │
 │  └─────────────────────────────┘    │
 └──────┬──────────────────────────────┘
        │
        ▼
 ┌─────────────────────────────────────┐
 │         pr-test-finish              │
 │  aggregates all results, fails if   │
 │  any job failed/cancelled           │
 └─────────────────────────────────────┘
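
The final box of the diagram can be sketched in a few lines of Python. This is only an illustration of the stated rule ("fails if any job failed/cancelled"), not the workflow's actual implementation. The function name `pr_test_finish` and the input mapping are hypothetical; the result strings follow the values GitHub Actions reports for `needs.<job_id>.result` (`success`, `failure`, `cancelled`, `skipped`).

```python
from typing import Mapping

# Results that, per the diagram, should turn the whole run red.
FAILING = {"failure", "cancelled"}


def pr_test_finish(results: Mapping[str, str]) -> bool:
    """Hypothetical aggregator: return True if no job failed or was cancelled."""
    bad = sorted(name for name, result in results.items() if result in FAILING)
    for name in bad:
        print(f"job {name} ended with {results[name]}")
    return not bad
```

Under this sketch a skipped job does not fail the run; only `failure` and `cancelled` do, which matches the wording in the diagram.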

| Layer | Mechanism | Granularity | Disabled on schedule? |
|---|---|---|---|
| 1. Test method → file | `unittest -f` (failfast) | One test method fails → entire test file stops immediately | Yes |
| 2. File → suite | `run_unittest_files()` default | One test file fails → entire suite stops (`--continue-on-error` off) | Yes |
| 3. Job → job (same stage) | `check-stage-health` action | One job fails → other waiting jobs in same stage fast-fail (red X) | Yes |
| 4. Stage → stage (cross-stage) | `wait-for-stage` + `needs` | Stage A fails → stage B/C jobs skip entirely (never get a runner) | Yes (wait jobs skipped) |
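
Layers 1 and 2 can be illustrated with the standard library alone. The sketch below is an assumption-laden toy, not SGLang's `run_unittest_files()`: `run_methods_failfast` and `run_files` are hypothetical names, and each "file" is modelled as a callable that returns whether it passed. The only real API used is `unittest.TextTestRunner(failfast=True)`, which is what the `-f` flag enables.

```python
import io
import unittest
from typing import Callable, List, Sequence, Tuple


def run_methods_failfast() -> int:
    """Layer 1: with failfast, the first failing method stops the file."""

    class Demo(unittest.TestCase):
        def test_1_fails(self):
            self.fail("first failure")

        def test_2_not_reached(self):
            pass

    suite = unittest.defaultTestLoader.loadTestsFromTestCase(Demo)
    runner = unittest.TextTestRunner(stream=io.StringIO(), failfast=True)
    result = runner.run(suite)
    return result.testsRun  # number of test methods that were started


def run_files(
    files: Sequence[Tuple[str, Callable[[], bool]]],
    continue_on_error: bool = False,
) -> List[str]:
    """Layer 2 (hypothetical model): run files in order, return those executed."""
    executed = []
    for name, run in files:
        executed.append(name)
        if not run() and not continue_on_error:
            break  # remaining files in the suite are not run
    return executed
```

With `continue_on_error=True`, which is how the table describes scheduled runs, every file is executed even after a failure.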

| Aspect | PR (`pull_request`) | Scheduled (`cron`, every 6h) | `/rerun-stage` (`workflow_dispatch`) |
|---|---|---|---|
| Stage ordering | Sequential: A → B → C via `wait-for-stage-*` | Parallel (all at once) | Single target stage only |
| Cross-job fast-fail | Yes (`check-stage-health`) | Yes | Yes |
| continue-on-error | No (stop at first failure within suite) | Yes (run all tests) | No |
| Retry | Enabled | Enabled | Enabled |
| max_parallel | 3 (default), 14 if `high priority` label | 14 | 3 (default), 14 if `high priority` |
| PR gate | Yes (draft, label, rate limit) | Skipped | Skipped |
| Concurrency | `cancel-in-progress: true` per branch | Queue (no cancel) | Isolated per stage+SHA |
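
As a reading aid, a few rows of the table above can be encoded as a small lookup. This is a sketch under stated assumptions: `ModeSettings` and `settings_for` are hypothetical names, the event strings are GitHub's trigger names (`schedule` is the event behind a `cron` trigger), and only the stage ordering, continue-on-error, max_parallel and PR gate rows are modelled.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class ModeSettings:
    sequential_stages: bool   # A -> B -> C gating between stages
    continue_on_error: bool   # keep running a suite after a failure
    max_parallel: int
    pr_gate: bool             # draft / label / rate-limit checks


def settings_for(event: str, labels: Iterable[str] = ()) -> ModeSettings:
    """Hypothetical helper mirroring the execution-mode table."""
    if event == "schedule":
        return ModeSettings(False, True, 14, False)
    max_parallel = 14 if "high priority" in set(labels) else 3
    if event == "pull_request":
        return ModeSettings(True, False, max_parallel, True)
    if event == "workflow_dispatch":
        # /rerun-stage targets a single stage, so there is no cross-stage ordering.
        return ModeSettings(False, False, max_parallel, False)
    raise ValueError(f"unsupported event: {event}")
```

For example, `settings_for("pull_request", ["high priority"])` yields `max_parallel=14` with the PR gate still enabled.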

[
  {"prefix": "stage-b-test-1-gpu-small", "expected_count": 8},
  {"prefix": "stage-b-test-1-gpu-large", "expected_count": 14},
  {"prefix": "stage-b-test-2-gpu-large", "expected_count": 4},
  {"prefix": "stage-b-test-4-gpu-b200", "expected_count": 1}
]
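
One plausible reading of this spec is that a stage gate waits until, for each `prefix`, `expected_count` matching jobs have finished successfully. The sketch below evaluates a spec of that shape against a snapshot of job conclusions. It is an assumption about the semantics, not the `wait-for-jobs` action's code: `stage_state`, the `jobs` mapping, and the three return values are all hypothetical.

```python
import json
from typing import Dict, List, Optional

SPEC: List[dict] = json.loads("""
[
  {"prefix": "stage-b-test-1-gpu-small", "expected_count": 8},
  {"prefix": "stage-b-test-1-gpu-large", "expected_count": 14},
  {"prefix": "stage-b-test-2-gpu-large", "expected_count": 4},
  {"prefix": "stage-b-test-4-gpu-b200", "expected_count": 1}
]
""")


def stage_state(spec: List[dict], jobs: Dict[str, Optional[str]]) -> str:
    """Return "failed", "pending" or "ready".

    `jobs` maps a job name to its conclusion, or None while it is still running.
    """
    pending = False
    for entry in spec:
        matching = [c for name, c in jobs.items() if name.startswith(entry["prefix"])]
        if any(c in ("failure", "cancelled") for c in matching):
            return "failed"  # a single bad job blocks the next stage
        if sum(1 for c in matching if c == "success") < entry["expected_count"]:
            pending = True   # not all partitions have reported success yet
    return "pending" if pending else "ready"
```

Matching by prefix rather than exact name lets one entry cover every matrix partition of a job, which is presumably why the spec carries an `expected_count`.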

CI Workflow Guide

SGLang CI Workflow Orchestration Guide

Naming Conventions

Key Files

Architecture Overview

Fast-Fail Layers

Execution Modes

Stage Gating (`wait-for-jobs` action)

Cross-Job Fast-Fail (`check-stage-health` action)
