Guide to SGLang CI workflow orchestration — stage ordering, fast-fail, gating, partitioning, execution modes, and debugging CI failures. Use when modifying CI workflows, adding stages, debugging CI pipeline issues, or understanding how tests are dispatched and gated across stages.
This skill covers the CI infrastructure layer — how tests are dispatched, gated, and fast-failed across stages. For test authoring (templates, fixtures, registration, model selection), see the write-sglang-test skill.
- stage-{a,b,c}-test-{gpu_count}-gpu-{hardware} (e.g., stage-b-test-1-gpu-small)
- {gpu_count}-gpu-{hardware} (e.g., 1-gpu-5090, 4-gpu-h100, 8-gpu-h200)

| File | Role |
|---|---|
| .github/workflows/pr-test.yml | Main workflow: all stages, jobs, conditions, matrix definitions |
| .github/workflows/pr-gate.yml | PR gating: draft check, run-ci label, per-user rate limiting |
| .github/actions/check-stage-health/action.yml | Cross-job fast-fail: queries API for any failed job |
| .github/actions/wait-for-jobs/action.yml | Stage gating: polls API until stage jobs complete |
| .github/actions/check-maintenance/action.yml | Maintenance mode check |
| test/run_suite.py | Suite runner: collects, filters, partitions, executes tests |
| python/sglang/test/ci/ci_register.py | Test registration (AST-parsed markers), LPT auto-partition |
| python/sglang/test/ci/ci_utils.py | run_unittest_files(): execution, retry, continue-on-error |
| scripts/ci/utils/slash_command_handler.py | Handles slash commands from PR comments |
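
When debugging, it can help to split a job name back into its parts. The following is a minimal sketch that parses the stage-{a,b,c}-test-{gpu_count}-gpu-{hardware} pattern listed before the file table, including the matrix suffix such as " (3)" that GitHub appends to matrix jobs. `parse_stage_job` and `StageJob` are hypothetical names for illustration, not part of the SGLang repo, and jobs that do not follow the GPU pattern (for example stage-a-test-cpu) are deliberately not matched.

```python
import re
from typing import NamedTuple, Optional

# Pattern taken from the naming convention; the optional " (N)" suffix is the
# matrix index shown in job names such as "stage-b-test-1-gpu-small (3)".
JOB_NAME_RE = re.compile(
    r"^stage-(?P<stage>[abc])-test-(?P<gpu_count>\d+)-gpu-(?P<hardware>[a-z0-9]+)"
    r"(?: \((?P<partition>\d+)\))?$"
)


class StageJob(NamedTuple):
    """Hypothetical container for the parsed parts of a stage job name."""
    stage: str
    gpu_count: int
    hardware: str
    partition: Optional[int]


def parse_stage_job(name: str) -> Optional[StageJob]:
    """Return the parsed job name, or None if it does not follow the GPU pattern."""
    match = JOB_NAME_RE.match(name)
    if match is None:
        return None
    partition = match.group("partition")
    return StageJob(
        stage=match.group("stage"),
        gpu_count=int(match.group("gpu_count")),
        hardware=match.group("hardware"),
        partition=int(partition) if partition is not None else None,
    )
```

For example, `parse_stage_job("stage-b-test-1-gpu-small (3)")` yields stage "b", one GPU, hardware "small", and partition 3.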
┌──────────────┐
│ build kernel │
└──────┬───────┘
│
├─ check-changes ──── detects which packages changed
│ (main_package, sgl_kernel, jit_kernel, multimodal_gen)
│
├─ call-gate ──────── pr-gate.yml (draft? label? rate limit?)
│
├─────────────────────────────────────────────────────┐
│ │
▼ │
┌─────────────────────────────────────┐ │
│ Stage A (~3 min) │ │
│ pre-flight check │ │
│ │ │
│ ┌─────────────────────────────┐ │ │
│ │ stage-a-test-1-gpu-small │ │ │
│ │ (small GPUs) │ │ │
│ └─────────────────────────────┘ │ │
│ ┌─────────────────────────────┐ │ │
│ │ stage-a-test-cpu │ │ │
│ │ (CPU) │ │ │
│ └─────────────────────────────┘ │ │
└──────┬──────────────────────────────┘ │
│ │
▼ ▼
┌─────────────────────────────────────┐ ┌──────────────────────────┐
│ Stage B (~30 min) │ │ kernel test │
│ basic tests │ └──────────────────────────┘
│ │ ┌──────────────────────────┐
│ ┌─────────────────────────────┐ │ │ multimodal gen test │
│ │ stage-b-test-1-gpu-small │ │ └──────────────────────────┘
│ │ (small GPUs, e.g. 5090) │ │
│ └─────────────────────────────┘ │
│ ┌─────────────────────────────┐ │
│ │ stage-b-test-1-gpu-large │ │
│ │ (large GPUs, e.g. H100) │ │
│ └─────────────────────────────┘ │
│ ┌─────────────────────────────┐ │
│ │ stage-b-test-2-gpu-large │ │
│ │ (large GPUs, e.g. H100) │ │
│ └─────────────────────────────┘ │
└──────┬──────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Stage C (~30 min) │
│ advanced tests │
│ │
│ ┌─────────────────────────────┐ │
│ │ stage-c-test-4-gpu-h100 │ │
│ │ (H100 GPUs) │ │
│ └─────────────────────────────┘ │
│ ┌─────────────────────────────┐ │
│ │ stage-c-test-8-gpu-h200 │ │
│ │ (8 x H200 GPUs) │ │
│ └─────────────────────────────┘ │
│ ┌─────────────────────────────┐ │
│ │ stage-c-test-4-gpu-b200 │ │
│ │ (4 x B200 GPUs) │ │
│ └─────────────────────────────┘ │
│ ┌─────────────────────────────┐ │
│ │ Other advanced tests │ │
│ │ (DeepEP, PD Disagg, GB300) │ │
│ └─────────────────────────────┘ │
└──────┬──────────────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ pr-test-finish │
│ aggregates all results, fails if │
│ any job failed/cancelled │
└─────────────────────────────────────┘
Every stage test job includes a check-stage-health step after checkout — if any job in the run has already failed, the job fast-fails (red X) with a root cause annotation.
Scheduled runs skip wait-for-stage-* jobs, running all stages in parallel. Fast-fail is also disabled.
4 layers of fast-fail, from fine to coarse:
| Layer | Mechanism | Granularity | Disabled on schedule? |
|---|---|---|---|
| 1. Test method → file | unittest -f (failfast) | One test method fails → entire test file stops immediately | Yes |
| 2. File → suite | run_unittest_files() default | One test file fails → entire suite stops (--continue-on-error off) | Yes |
| 3. Job → job (same stage) | check-stage-health action | One job fails → other waiting jobs in same stage fast-fail (red X) | Yes |
| 4. Stage → stage (cross-stage) | wait-for-stage + needs | Stage A fails → stage B/C jobs skip entirely (never get a runner) | Yes (wait jobs skipped) |

1. -f flag appended to all python3 -m pytest / unittest invocations in ci_utils.py
2. --continue-on-error flag in run_suite.py: off for PRs, on for scheduled runs
3. check-stage-health auto-detects the schedule event and skips; it filters out cascade failures to show only root cause jobs
4. wait-for-stage-* jobs are conditioned on github.event_name == 'pull_request', so they are skipped for scheduled runs

| Aspect | PR (pull_request) | Scheduled (cron, every 6h) | /rerun-stage (workflow_dispatch) |
|---|---|---|---|
| Stage ordering | Sequential: A → B → C via wait-for-stage-* | Parallel (all at once) | Single target stage only |
| Cross-job fast-fail | Yes (check-stage-health) | No (check-stage-health skips on schedule) | Yes |
| continue-on-error | No (stop at first failure within suite) | Yes (run all tests) | No |
| Retry | Enabled | Enabled | Enabled |
| max_parallel | 3 (default), 14 if high priority label | 14 | 3 (default), 14 if high priority |
| PR gate | Yes (draft, label, rate limit) | Skipped | Skipped |
| Concurrency | cancel-in-progress: true per branch | Queue (no cancel) | Isolated per stage+SHA |
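
The continue-on-error row (layer 2 of the fast-fail table) is easiest to see in code. Below is a minimal sketch of the behavior, not the actual run_unittest_files() implementation: `run_suite_sketch` and its `run_file` callback are hypothetical names, and retry handling is left out.

```python
from typing import Callable, List, Sequence, Tuple


def run_suite_sketch(
    files: Sequence[str],
    run_file: Callable[[str], bool],
    continue_on_error: bool = False,
) -> Tuple[bool, List[str], List[str]]:
    """Run test files in order.

    Returns (all_passed, failed_files, executed_files).
    With continue_on_error=False (the PR behavior described above), the first
    failing file stops the suite. With continue_on_error=True (the scheduled
    behavior), every file runs and all failures are collected.
    """
    failed: List[str] = []
    executed: List[str] = []
    for path in files:
        executed.append(path)
        if not run_file(path):  # run_file returns True on success
            failed.append(path)
            if not continue_on_error:
                break  # fast-fail: skip the remaining files
    return (not failed, failed, executed)
```

With files a, b, c where only b fails, the PR-style call stops after b, while the scheduled-style call also runs c and still reports b as failed.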
Stage gating (wait-for-jobs action)

wait-for-stage-a and wait-for-stage-b are lightweight ubuntu-latest jobs that poll the GitHub Actions API.
How it works:
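1. Calls listJobsForWorkflowRun to list all jobs in the current run
2. Matches jobs against each spec's prefix (e.g., stage-b-test-1-gpu-small (3))
3. If any matched job has conclusion === 'failure' → fail immediately (fast-fail)
4. If the completed jobs for every spec reach expected_count → success
5. Otherwise, sleep for poll-interval-seconds (default: 60s) and retry
6. Stop waiting after max-wait-minutes (240 min for stage-a, 480 min for stage-b)

Job specs example (stage-b):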
[
{"prefix": "stage-b-test-1-gpu-small", "expected_count": 8},
{"prefix": "stage-b-test-1-gpu-large", "expected_count": 14},
{"prefix": "stage-b-test-2-gpu-large", "expected_count": 4},
{"prefix": "stage-b-test-4-gpu-b200", "expected_count": 1}
]
Critical: expected_count must match the matrix size. If you add or remove matrix entries, update the wait job's spec accordingly.
PR only: Condition github.event_name == 'pull_request' && !inputs.target_stage — scheduled runs and /rerun-stage skip these entirely, allowing parallel execution.
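
The polling steps described for the wait-for-jobs action can be sketched in Python. This is an illustration under stated assumptions rather than the action's real code (which runs against the GitHub Actions API): `evaluate_stage` and `wait_for_stage` are hypothetical names, the job dictionaries use the `name`, `status`, and `conclusion` fields returned by the jobs API, and both the "completed count reaches expected_count" success rule and the "timeout" result are assumptions about details the text does not spell out.

```python
import time
from typing import Callable, Dict, List


def evaluate_stage(jobs: List[Dict], specs: List[Dict]) -> str:
    """Classify one snapshot of the run as 'failure', 'success', or 'pending'."""
    all_done = True
    for spec in specs:
        matched = [j for j in jobs if j["name"].startswith(spec["prefix"])]
        # Any failed job in the stage ends the wait immediately (fast-fail).
        if any(j.get("conclusion") == "failure" for j in matched):
            return "failure"
        completed = [j for j in matched if j.get("status") == "completed"]
        if len(completed) < spec["expected_count"]:
            all_done = False  # keep scanning: a later spec may contain a failure
    return "success" if all_done else "pending"


def wait_for_stage(
    fetch_jobs: Callable[[], List[Dict]],
    specs: List[Dict],
    poll_interval_seconds: int = 60,
    max_wait_minutes: int = 240,
    sleep: Callable[[float], None] = time.sleep,
    now: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the stage succeeds, fails, or the wait budget is exhausted."""
    deadline = now() + max_wait_minutes * 60
    while True:
        state = evaluate_stage(fetch_jobs(), specs)
        if state != "pending":
            return state
        if now() >= deadline:
            return "timeout"  # assumed outcome; the source only names the limit
        sleep(poll_interval_seconds)
```

The sketch also shows why expected_count matters: if the spec expects more jobs than the matrix produces, the stage never reaches "success" and the wait runs until the time limit.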
Cross-job fast-fail (check-stage-health action)

Composite action called after checkout in every stage test job (21 jobs total across pr-test.yml, pr-test-multimodal-gen.yml, pr-test-sgl-kernel.yml, pr-test-jit-kernel.yml).
How it works:
1. Calls listJobsForWorkflowRun for the current workflow run
2. Finds jobs with conclusion === 'failure' whose failing step is NOT check-stage-health (excludes cascade failures)
3. If any are found, calls core.setFailed() with the list of root cause job names

Cascade filtering: When job A fast-fails due to the health check, it also has conclusion: failure. Without filtering, job B would list both the original failure AND job A's fast-fail. The filter checks each failed job's steps array: if the failing step name contains check-stage-health or Check stage health, the job is excluded from the root cause list.
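
The cascade filter can be sketched as a small pure function. This is an illustrative Python version, not the action's actual script: `root_cause_jobs` is a hypothetical name, and the input is assumed to be a list of job objects shaped like the jobs API response (each with `name`, `conclusion`, and a `steps` array whose entries carry `name` and `conclusion`).

```python
from typing import Dict, List

# Step-name markers that identify a health-check fast-fail (from the text above).
HEALTH_STEP_MARKERS = ("check-stage-health", "Check stage health")


def root_cause_jobs(jobs: List[Dict]) -> List[str]:
    """Return names of failed jobs that did not fail in the health-check step."""
    roots: List[str] = []
    for job in jobs:
        if job.get("conclusion") != "failure":
            continue
        failing_steps = [
            step["name"]
            for step in job.get("steps", [])
            if step.get("conclusion") == "failure"
        ]
        is_cascade = any(
            marker in name for name in failing_steps for marker in HEALTH_STEP_MARKERS
        )
        if not is_cascade:
            roots.append(job["name"])
    return roots
```

A job that failed in its own test step is reported, while a job whose only failing step is the health check is treated as a cascade and dropped from the list.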
Usage pattern: