Scope: aggregate/IFB (in-flight batching) serving with prefill and decode colocated, single node, PyTorch backend, non-speculative by default. The one exception is DeepSeek-R1, where MTP is the standard mode because all checked-in DeepSeek-R1 configs include it.
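The scope rule can be sketched as a small predicate. This is a minimal sketch under stated assumptions: DeploymentShape and in_scope are hypothetical names (not a TensorRT-LLM API), and DeepSeek-R1 is identified by a plain substring match on the model name, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DeploymentShape:
    """Hypothetical request summary; not a TensorRT-LLM type."""
    model: str
    disaggregated: bool = False        # True = separate prefill and decode workers
    num_nodes: int = 1
    backend: str = "pytorch"
    spec_method: Optional[str] = None  # e.g. "MTP"; None = non-speculative


def in_scope(shape: DeploymentShape) -> bool:
    """Return True if the request fits the guide's scope."""
    # Aggregate/IFB only: prefill and decode colocated on a single node.
    if shape.disaggregated or shape.num_nodes != 1:
        return False
    # PyTorch backend only.
    if shape.backend.lower() != "pytorch":
        return False
    # Non-speculative requests are in scope by default.
    if shape.spec_method is None:
        return True
    # The only speculative exception is DeepSeek-R1 MTP.
    # Assumption: a substring match on the model name identifies DeepSeek-R1.
    return shape.spec_method.upper() == "MTP" and "deepseek-r1" in shape.model.lower()


print(in_scope(DeploymentShape("deepseek-ai/DeepSeek-R1", spec_method="MTP")))  # True
print(in_scope(DeploymentShape("deepseek-ai/DeepSeek-R1", disaggregated=True)))  # False
```

A False result is the cue to take the best-effort path for adjacent requests rather than refuse outright.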

Input: model, GPU, ISL (input sequence length), OSL (output sequence length), concurrency, TP (tensor-parallel size), and performance objective (Min Latency | Balanced | Max Throughput | unspecified). Output: a repo-grounded starting YAML file for trtllm-serve --config.
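One way to pin down the input contract before searching for a config is a typed record. This sketch assumes hypothetical ServeRequest, Objective, and parse_objective names; an unspecified objective is modeled as None, and the four numeric fields are required to be positive. The model and GPU strings in the usage line are illustrative.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Objective(Enum):
    MIN_LATENCY = "Min Latency"
    BALANCED = "Balanced"
    MAX_THROUGHPUT = "Max Throughput"


def parse_objective(text: Optional[str]) -> Optional[Objective]:
    """Map the user's wording to an Objective; None means unspecified."""
    if text is None or text.strip().lower() in ("", "unspecified"):
        return None
    for objective in Objective:
        if objective.value.lower() == text.strip().lower():
            return objective
    raise ValueError(f"unknown objective: {text!r}")


@dataclass(frozen=True)
class ServeRequest:
    """Hypothetical input record for the config lookup."""
    model: str
    gpu: str
    isl: int          # input sequence length, in tokens
    osl: int          # output sequence length, in tokens
    concurrency: int  # concurrent requests
    tp: int           # tensor-parallel size
    objective: Optional[Objective] = None

    def __post_init__(self) -> None:
        # Reject non-positive sizes early instead of matching against them.
        for name in ("isl", "osl", "concurrency", "tp"):
            if getattr(self, name) <= 0:
                raise ValueError(f"{name} must be positive")


req = ServeRequest("deepseek-ai/DeepSeek-R1", "B200", isl=1024, osl=1024,
                   concurrency=64, tp=8, objective=parse_objective("max throughput"))
print(req.objective)  # Objective.MAX_THROUGHPUT
```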

If the request is adjacent to but outside this scope (for example, disaggregated serving or speculative decoding other than DeepSeek-R1 MTP), give a best-effort answer that starts from the nearest in-scope config, clearly label which fields are inferred and which are verified, and point to the relevant feature doc in docs/source/features/ (e.g., speculative-decoding, disagg-serving, parallel-strategy) or to examples/llm-api/.
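The inferred-vs-verified labeling can be tracked mechanically while adjusting the nearest config. A minimal sketch, assuming configs are already loaded into plain dicts and that provenance is tracked per top-level field only; best_effort_config is a hypothetical helper, and the field values in the usage are illustrative.

```python
from typing import Any, Dict, Tuple


def best_effort_config(
    nearest: Dict[str, Any], adjustments: Dict[str, Any]
) -> Tuple[Dict[str, Any], Dict[str, str]]:
    """Apply adjustments to the nearest in-scope config and label each top-level field.

    "verified": copied unchanged from the checked-in config.
    "inferred": added or changed for the out-of-scope request.
    """
    config = dict(nearest)  # shallow copy: the source dict's top-level keys stay untouched
    provenance = {key: "verified" for key in nearest}
    for key, value in adjustments.items():
        if key not in config or config[key] != value:
            config[key] = value
            provenance[key] = "inferred"
    return config, provenance


nearest = {"max_batch_size": 64, "max_num_tokens": 8192}  # illustrative values
config, provenance = best_effort_config(nearest, {"max_batch_size": 128})
print(provenance)  # {'max_batch_size': 'inferred', 'max_num_tokens': 'verified'}
```

Nested blocks are labeled as a whole here; finer-grained labels would need a recursive walk over each block.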

Constraints

Speculative exclusion: Exclude configs containing speculative_config by default. Exception: exact checked-in DeepSeek-R1 MTP configs (configs in examples/configs/ whose speculative_config sets decoding_type: MTP). When including MTP, copy the full speculative_config block verbatim; never interpolate speculative fields.
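The exclusion rule and the verbatim-copy requirement can be sketched over configs loaded as plain dicts keyed by file path. Assumptions, all hypothetical: the helper names, the example paths and values, and the convention that a config's file path names its model, so a "deepseek-r1" substring identifies DeepSeek-R1 configs. The speculative_config block is deep-copied whole, so no field in it is edited or synthesized.

```python
import copy
from typing import Any, Dict, List


def is_mtp(config: Dict[str, Any]) -> bool:
    """True if the config's speculative_config sets decoding_type: MTP."""
    spec = config.get("speculative_config")
    return isinstance(spec, dict) and spec.get("decoding_type") == "MTP"


def eligible_configs(configs: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return paths of non-speculative configs plus DeepSeek-R1 MTP configs."""
    keep = []
    for path, cfg in configs.items():
        if "speculative_config" not in cfg:
            keep.append(path)
        # Assumption: the file path names the model (hypothetical convention).
        elif is_mtp(cfg) and "deepseek-r1" in path.lower():
            keep.append(path)
    return keep


def with_verbatim_mtp(base: Dict[str, Any], source: Dict[str, Any]) -> Dict[str, Any]:
    """Attach the source's whole speculative_config block, unmodified."""
    if not is_mtp(source):
        raise ValueError("source is not an MTP config")
    result = copy.deepcopy(base)
    # Deep-copy the block as-is: no field is edited, dropped, or synthesized.
    result["speculative_config"] = copy.deepcopy(source["speculative_config"])
    return result


configs = {  # hypothetical paths and illustrative values
    "examples/configs/deepseek-r1-mtp.yaml": {
        "speculative_config": {"decoding_type": "MTP", "num_nextn_predict_layers": 3}},
    "examples/configs/other-eagle.yaml": {"speculative_config": {"decoding_type": "Eagle"}},
    "examples/configs/other.yaml": {"max_batch_size": 64},
}
print(eligible_configs(configs))
# ['examples/configs/deepseek-r1-mtp.yaml', 'examples/configs/other.yaml']
```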

Serve Config Guide

Response Format

Step 0: Lock Objective and Decode Mode

Step 1: Exact Database Match

Step 2: Nearest Checked-In Config

Step 3: Read Model Docs

Step 4: Adjust Source-Backed Fields

Validation Checklist