Add a new model to the SGLang Cookbook, including documentation, sidebar, config generator component, and model YAML configuration.
Interactive, multi-step workflow. Collect inputs incrementally — don't ask for everything upfront.
Ask the user for:
- The model's HuggingFace page or ID (e.g., Qwen/Qwen3-Coder-Next). Fetch the page to extract description, capabilities, etc. If the model isn't public yet, ask the user to paste what they know (name, param count, architecture, capabilities, context length).
- Whether the model has multiple variants — see Qwen3CoderConfigGenerator and Qwen3NextConfigGenerator for multi-variant patterns.
- The `sglang serve --model-path` command with all flags (tp, dp, ep, etc.). Not `python -m sglang.launch_server` (deprecated, issue #33). If the model card provides one, use it as a starting point but verify the format.
- The target SGLang version (e.g., v0.5.8); the model YAML goes in `data/models/src/<version>/`.

Read ALL reference templates first, then create files.
Reference templates to read:

- Docs: `docs/autoregressive/` (e.g., Qwen3-Coder.md, DeepSeek-V3_2.md)
- Components: `src/components/autoregressive/` (e.g., Qwen3NextConfigGenerator/index.js)
- Model YAML: `data/models/src/<version>/<similar-model>.yaml`. List `data/models/src/` for available versions.
- `sidebars.js`
- `data/models/vendors.yaml`

Conventions:

- Config generators live at `src/components/autoregressive/<ModelName>ConfigGenerator/index.js` (not nested in vendor folders)
- Model YAML goes in `data/models/src/<version>/` (not directly in `data/models/src/`)
- Base ConfigGenerator component: `src/components/base/ConfigGenerator`
- Always `sglang serve` — never `python -m sglang.launch_server`
- Check for existing PRs (`gh pr list --search "<model name>"`) to avoid duplicate work
- For `commandRule` options, follow the `Object.entries(this.options).forEach(...)` pattern from existing generators
- Only include platforms the user has actually tested.
| Platform | Vendor | Memory | Docker Image |
|---|---|---|---|
| A100 | NVIDIA | 80GB | lmsysorg/sglang:<ver> |
| H100 | NVIDIA | 80GB | lmsysorg/sglang:<ver> |
| H200 | NVIDIA | 141GB | lmsysorg/sglang:<ver> |
| B200 | NVIDIA | 180GB | lmsysorg/sglang:<ver> |
| B300 | NVIDIA | 275GB | lmsysorg/sglang:<ver> |
| MI300X | AMD | 192GB | lmsysorg/sglang:<ver>-rocm720-mi30x |
| MI325X | AMD | 256GB | lmsysorg/sglang:<ver>-rocm720-mi30x |
| MI350X | AMD | 288GB | lmsysorg/sglang:<ver>-rocm720-mi35x |
| MI355X | AMD | 288GB | lmsysorg/sglang:<ver>-rocm720-mi35x |
TP calculation: model_weight_GB / gpu_mem_GB, round up to nearest power of 2. Leave 20-30% headroom.
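The rule above can be sketched roughly as follows (this is an illustration, not the generator's actual code; the example numbers are illustrative, not measured):

```javascript
// Rough sketch of the TP rule above — not the actual generator code.
// weightGB: model weights in GB (params × bytes/param: FP8 ≈ 1, BF16 ≈ 2).
function estimateTp(weightGB, gpuMemGB, headroom = 0.25) {
  const usable = gpuMemGB * (1 - headroom);        // leave 20-30% headroom
  const minGpus = Math.max(1, weightGB / usable);  // raw GPU count needed
  return 2 ** Math.ceil(Math.log2(minGpus));       // round up to power of 2
}

// A ~480 GB FP8 checkpoint on H200 (141 GB): estimateTp(480, 141) → 8
// The same model in BF16 (~960 GB): estimateTp(960, 141) → 16 (the ~2x rule)
```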
Platform-specific flags (only add if tested):
- NVIDIA Blackwell: `--attention-backend trtllm_mha`
- AMD MI355X: `--attention-backend triton`
- AMD env vars: `SGLANG_USE_AITER=1`, `SGLANG_ROCM_FUSED_DECODE_MLA=0`
- Verify attention heads divide evenly across GPUs (`heads_per_gpu % 16 == 0`)

Create `docs/autoregressive/<Vendor>/<ModelName>.md`:
- Use TODO output placeholders for command outputs
- Use TODO result placeholders for benchmark results

Benchmark commands:
- Accuracy (GSM8K): `python3 benchmark/gsm8k/bench_sglang.py --port <port>`
- Accuracy (MMLU): `python3 benchmark/mmlu/bench_sglang.py --port <port>`
- Accuracy (MMMU): `python3 benchmark/mmmu/bench_sglang.py --port <port>` — uses a universal answer regex that works across models. Don't use model-specific parsing (e.g., `<|begin_of_box|>`) as it breaks with standard answer formats.
- Latency: `python3 -m sglang.bench_serving --backend sglang --num-prompts 10 --max-concurrency 1 ...`
- Throughput: `python3 -m sglang.bench_serving --backend sglang --num-prompts 1000 --max-concurrency 100 ...`

Keep benchmarks concise. Order: accuracy first, then speed. Don't add multiple scenarios or concurrency levels unless asked.
Notes:
- Don't hardcode sampling parameters (`temperature`, `top_p`) — SGLang uses `generation_config.json` defaults
- Include thinking-disabled (`enable_thinking: False`) examples
- Format raw response objects (`ChatCompletionMessage(...)`) into readable structured output

Edit `sidebars.js` — add the new entry under the right vendor.
Update docs/intro.md (homepage):
- Mark entries `- [x]` if the doc has real content, `- [ ]` if it's a stub/placeholder
- Keep NEW tags to 3 or fewer total — if adding one, remove the oldest first (check git history)
- Order in `intro.md` should match `sidebars.js`

Create `src/components/autoregressive/<ModelName>ConfigGenerator/index.js`.
- Extend the base ConfigGenerator component
- Define `modelConfigs` with per-hardware `tp` and `mem` values: `h200: { fp8: { tp: 8, mem: 0.85 }, bf16: { tp: 16, mem: 0.85 } }`

In `generateCommand`:
```js
const isAMD = ['mi300x','mi325x','mi350x','mi355x'].includes(hardware);
const isBlackwell = ['b200','b300'].includes(hardware);
if (isAMD) { /* AMD-specific flags */ }
if (isBlackwell) { /* Blackwell-specific flags */ }
```
- Add a `commandRule` for optional features (tool calling, reasoning parser, etc.)

Reasoning parser: For hybrid models, use an Enabled/Disabled toggle (the model always thinks; the parser just separates the output). For separate Instruct/Thinking variants, the toggle changes the model name suffix.
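As a hedged illustration of the `commandRule` pattern (option names, labels, and the exact base-class API here are assumptions — copy the real shape from an existing generator, not from this sketch):

```javascript
// Hypothetical sketch of an options map with a commandRule, mirroring the
// Object.entries(this.options).forEach(...) pattern. Names are illustrative.
class ExampleConfigGenerator {
  constructor() {
    this.options = {
      toolcalling: {
        label: 'Tool Calling',
        items: ['disabled', 'enabled'],
        // Maps the selected value to extra CLI flags (empty string = no-op)
        commandRule: (value) =>
          value === 'enabled' ? ' \\\n  --tool-call-parser <parser>' : '',
      },
    };
  }

  applyOptionRules(values, cmd) {
    Object.entries(this.options).forEach(([key, opt]) => {
      if (opt.commandRule) cmd += opt.commandRule(values[key]);
    });
    return cmd;
  }
}
```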
DP Attention: Disabled (Low Latency) / Enabled (High Throughput). The --dp value commonly matches --tp but this isn't mandatory. Handle in generateCommand, not via static commandRule:
```js
if (values.dpattention === 'enabled') {
  cmd += ` \\\n --dp ${tpValue} \\\n --enable-dp-attention`;
}
```
In config tips, describe --dp matching --tp as a common pattern, not a requirement.
Large models (>400B): BF16 needs ~2x GPUs vs FP8. Reflect this in modelConfigs. Omit combos that don't fit.
Multiple variants: Add modelSize and/or quantization selectors. See GLM51ConfigGenerator, GLM5ConfigGenerator, Qwen3CoderConfigGenerator, Qwen3NextConfigGenerator for patterns.
Platform-required flags: If a platform requires certain flags to function at all (e.g., AMD MI355X needs --attention-backend triton), add them unconditionally for that platform — NOT gated behind optional checkboxes like "Performance Optimizations". Optional optimizations go inside checkbox guards; required-to-work flags go outside.
No dead code: Don't define commandRule on options if generateCommand handles them directly (the rules will never be called). Don't use getDynamicItems if the items don't depend on other option values — use static items instead. Don't leave unused helper functions.
No silent ignores: If a feature (e.g., DP attention) is unsupported on a platform, either disable the UI option or show an explicit message (like a "Work In Progress" note). Never silently drop user selections.
Scope discipline: If adding support for one platform, don't accidentally add global flags. Always check conditionals: if (quantization === 'fp8') without a hardware guard affects ALL platforms. Be explicit: if (hardware === 'h200' && quantization === 'fp8').
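A minimal sketch of that guard (the flag and values are illustrative only):

```javascript
// Scope discipline: the hardware guard keeps an FP8-only flag from leaking
// onto every platform. Flag names here are illustrative.
function buildQuantFlags(hardware, quantization) {
  let flags = '';
  // BAD — applies to ALL platforms:
  //   if (quantization === 'fp8') { flags += ' --quantization fp8'; }
  // GOOD — explicit hardware guard:
  if (hardware === 'h200' && quantization === 'fp8') {
    flags += ' --quantization fp8';
  }
  return flags;
}
```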
License accuracy: Always verify the actual HuggingFace model license before writing the license section. Don't copy from other model docs — licenses vary (Apache 2.0, MIT, community licenses, etc.).
Create data/models/src/<version>/<modelname>.yaml:
Recipes to include:

- `default` — balanced single-node
- `high-throughput-dp` — if DP attention is supported
- `speculative-mtp` or `speculative-eagle` — if speculative decoding is supported

Valid `thinking_capability` enum values: `non_thinking`, `thinking`, `hybrid`. Don't use `hybrid_thinking` or other variants — pre-commit validation rejects them.
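A loosely hedged skeleton of what such a file might contain (every field name other than `thinking_capability` and the recipe names is a guess — copy a real file from `data/models/src/<version>/` and check the schema in `data/schema`):

```yaml
# Hypothetical skeleton — field names other than thinking_capability and the
# recipe names are guesses; validate with data/scripts/compile_models.py.
thinking_capability: hybrid        # non_thinking | thinking | hybrid only
recipes:
  - name: default                  # balanced single-node
  - name: high-throughput-dp       # only if DP attention is supported
```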
Ensure venv exists:
```bash
python3 -m venv .venv
source .venv/bin/activate && pip install pre-commit pyyaml
```
Compile and validate:
```bash
source .venv/bin/activate && python data/scripts/compile_models.py
cd data/schema && npm install && npm test
```
Full build (catches import errors, broken links, component issues — more reliable than the dev server):

```bash
npm run build
```
Dev server for visual check:

```bash
npm start
```

Check the page renders at http://localhost:3000.
User deploys the model, runs test scripts, pastes results. Replace TODO placeholders with actual outputs:
Ask for:
- Tuned `mem-fraction-static` values

Add the results to the docs.
Can be triggered with /add-model review. Also consider running /review-pr on the PR for an automated checklist pass.
Review the complete documentation for:
- All code fences are properly closed
- Consistent `base_url` port on the same page
- TODO placeholders replaced with actual results
- `export default` matches the actual class name (common copy-paste bug)
- `sglang serve` everywhere — no deprecated `python -m sglang.launch_server` or `python3 -m sglang.launch_server`
- `modelConfigs` include both `tp` and `mem` values per hardware/quantization
- `--dp` value dynamically matches `--tp` in the generator
- Homepage (`docs/intro.md`) includes the new model entry and matches the sidebar order
- Raw response objects (`ChatCompletionMessage(...)`) are formatted into readable structured output (Reasoning/Content/Tool Calls sections)
- No dead code (unused `commandRule`, unused helper functions, `getDynamicItems` returning static arrays)
- Env var examples use `export VAR=<your-value>`, not `export VAR=${VAR}` (which is a bash no-op)

Always create a new branch — never commit to main directly.
```bash
git checkout -b add-<model-name>
# ... make changes ...
git add <specific files>
git commit -m "Add <Model Name> cookbook"
git push -u origin add-<model-name>
gh pr create --title "Add <Model Name> cookbook" --body "..."
```
When checking homepage entries, verify the doc has real content — not just a "Community contribution welcome" stub.