Onboard 1-node GitHub MR functional tests for GB200 from existing mr-scoped 2-node tests. Use when the user asks to add GB200 github-mr tests, create single-node variants of existing tests, or expand CI coverage for GB200.
Create 1-node (mr-github) variants of existing 2-node (mr-scoped) GB200 functional tests.
Each GB200 node has 4 GPUs. A 2-node test uses 8 GPUs total; the 1-node variant uses 4.
GB200 functional tests live in tests/test_utils/recipes/gb200/:
| Recipe file | Notes |
|---|---|
| gpt.yaml | GPT dense tests, nodes: 2, gpus: 4 (8 total) |
| moe.yaml | MoE tests, nodes: 2, gpus: 4 (8 total) |
| moe-1node.yaml | Existing 1-node MoE tests, nodes: 1, gpus: 4 (4 total) |
| gpt-1node.yaml | 1-node GPT tests (create if not present) |
Model configs live at:
tests/functional_tests/test_cases/{model}/{test_case}/model_config.yaml
1-node test cases use the _1node suffix:
tests/functional_tests/test_cases/{model}/{test_case}_1node/model_config.yaml
Scan the products: block in gpt.yaml and moe.yaml for entries with scope: [mr, ...] or scope: [mr-slim, ...]. These are the 2-node tests that need 1-node mr-github counterparts.
Ignore tests already covered in *-1node.yaml files, and ignore nightly, weekly, mr-broken scopes.
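The scan can be sketched with plain grep. The sample recipe layout below (a test_case: line immediately followed by its scope: line) is an assumption about how the products: entries are shaped; adjust the pattern if the real yaml differs.

```shell
# Sketch: list mr / mr-slim test cases from a recipe file.
recipe=$(mktemp)
cat > "$recipe" <<'EOF'
products:
  - test_case: gpt3_mr_tp2pp2
    scope: [mr]
  - test_case: gpt3_weekly_tp4pp2
    scope: [weekly]
  - test_case: gpt3_mrslim_tp1pp4
    scope: [mr-slim]
EOF
# -B1 pulls the test_case line that precedes each matching scope line
candidates=$(grep -B1 -E 'scope: \[(mr|mr-slim)[],]' "$recipe" \
             | grep 'test_case:' | awk '{print $3}')
echo "$candidates"
rm -f "$recipe"
```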
For each candidate, read its model_config.yaml and extract the key parallelism arguments:
--tensor-model-parallel-size (TP)
--pipeline-model-parallel-size (PP)
--expert-model-parallel-size (EP)
--expert-tensor-parallel-size (ETP)
--context-parallel-size (CP)
--global-batch-size
--micro-batch-size
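Extracting those arguments can be scripted; a minimal sketch, assuming the model_config.yaml stores them as a MODEL_ARGS mapping (the exact layout below is an assumption):

```shell
# Sketch: read the parallelism knobs out of a model_config.yaml.
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
MODEL_ARGS:
  --tensor-model-parallel-size: 2
  --pipeline-model-parallel-size: 4
  --expert-model-parallel-size: 1
  --micro-batch-size: 1
  --global-batch-size: 32
EOF
# grep the "--flag: value" line, print the value
get_arg() { grep -- "$1:" "$cfg" | awk '{print $2}'; }
TP=$(get_arg --tensor-model-parallel-size)
PP=$(get_arg --pipeline-model-parallel-size)
EP=$(get_arg --expert-model-parallel-size)
echo "TP=$TP PP=$PP EP=$EP"
rm -f "$cfg"
```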
The world-size formula is: world_size = TP × PP × CP × DP (CP is 1 unless set), where DP ≥ EP.
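The formula gives a quick mechanical check; a minimal sketch for one 4-GPU GB200 node:

```shell
# Sketch: verify a parallelism layout fits the 4 GPUs of one GB200 node.
TP=2; PP=2; CP=1; EP=1; GPUS=4
DP=$(( GPUS / (TP * PP * CP) ))   # DP is whatever is left over
if [ $(( TP * PP * CP * DP )) -eq "$GPUS" ] && [ "$DP" -ge "$EP" ]; then
  echo "fits: DP=$DP"
else
  echo "does not fit"
fi
```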
Going from 8 GPUs → 4 GPUs:
| Condition | Action |
|---|---|
| TP × PP ≤ 4 | Trivial copy. Config unchanged; DP is halved automatically. |
| TP × PP = 8 (e.g. tp4 pp2) | Reduce PP. Set PP = PP / 2 (e.g. pp2 → pp1). Verify TP × PP_new ≤ 4. |
| EP > 4 (e.g. ep8 with tp1 pp1) | Reduce EP. Set EP = 4. Experts stay at num-experts (each EP rank holds more experts). |
| EP > 4 and TP × PP > 4 | Reduce both PP and EP as above. |
| ETP test (EP × ETP ≤ TP × DP) | Check EP × ETP ≤ TP × DP_new after PP reduction. Usually satisfied when pp → 1. |
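The table above can be sketched as a small helper (plan_1node is a hypothetical name; it covers the TP × PP and EP rows, not the ETP check):

```shell
# Sketch of the 8-GPU -> 4-GPU decision table as a shell function.
plan_1node() {   # args: TP PP EP; prints the action for a 4-GPU node
  local TP=$1 PP=$2 EP=$3 action=""
  if [ $(( TP * PP )) -gt 4 ]; then action="halve PP"; fi
  if [ "$EP" -gt 4 ]; then action="${action:+$action, }set EP=4"; fi
  echo "${action:-trivial copy}"
}
plan_1node 2 2 1   # tp2 pp2      -> trivial copy
plan_1node 4 2 1   # tp4 pp2      -> halve PP
plan_1node 1 1 8   # ep8          -> set EP=4
plan_1node 2 4 8   # tp2 pp4 ep8  -> halve PP, set EP=4
```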
Do not change GBS — let gradient accumulation absorb the reduced DP.
Create the _1node model config directories:

```shell
# Trivial copy
mkdir -p tests/functional_tests/test_cases/{model}/{test_case}_1node
cp tests/functional_tests/test_cases/{model}/{test_case}/model_config.yaml \
   tests/functional_tests/test_cases/{model}/{test_case}_1node/model_config.yaml
# Then apply any parallelism changes (EP or PP) with the Edit tool
```
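For a scripted alternative to hand-editing, a sed one-liner can apply the PP reduction after the copy. This assumes the "--pipeline-model-parallel-size: N" line format; the temp file here only stands in for the copied config.

```shell
# Sketch: halve PP (pp2 -> pp1) in a freshly copied 1-node config.
cfg=$(mktemp)
printf -- '  --pipeline-model-parallel-size: 2\n' > "$cfg"
sed -i 's/--pipeline-model-parallel-size: 2/--pipeline-model-parallel-size: 1/' "$cfg"
new_pp_line=$(cat "$cfg")
echo "$new_pp_line"
rm -f "$cfg"
```

Note that GNU sed's in-place flag is `-i`; BSD/macOS sed needs `-i ''`.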
For GPT tests — create tests/test_utils/recipes/gb200/gpt-1node.yaml (if absent) by cloning gpt.yaml's spec block with nodes: 1. Use this template for the spec: