ModelOpt Post-Training Quantization

Produce a quantized checkpoint from a pretrained model. Read examples/llm_ptq/README.md first — it has the support matrix, CLI flags, and accuracy guidance.

Step 1 — Environment

Read skills/common/environment-setup.md and skills/common/workspace-management.md. After completing them you should know:

ModelOpt source is available
Local or remote (+ cluster config if remote)
SLURM / Docker+GPU / bare GPU
Launcher available?
Which workspace to use
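
The execution-environment item in the checklist above can be probed mechanically. The following is a minimal sketch, assuming that finding `sbatch`, `docker`, and `nvidia-smi` on PATH is a good enough proxy for SLURM, Docker+GPU, and bare GPU respectively. The function name `detect_execution_env` and the precedence order are illustrative assumptions and are not part of ModelOpt or the setup guides.

```python
import shutil
from typing import Callable, Optional


def detect_execution_env(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> str:
    """Guess the execution environment from executables found on PATH.

    Assumptions (not taken from ModelOpt): `sbatch` implies SLURM,
    `docker` together with `nvidia-smi` implies Docker+GPU, and
    `nvidia-smi` alone implies a bare GPU host.
    """
    has_gpu_tools = which("nvidia-smi") is not None
    if which("sbatch") is not None:
        return "slurm"
    if which("docker") is not None and has_gpu_tools:
        return "docker+gpu"
    if has_gpu_tools:
        return "bare-gpu"
    return "unknown"
```

Calling `detect_execution_env()` with no arguments inspects the current machine. The `which` parameter exists only so the lookup can be replaced, for example when probing a remote host through a different mechanism.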

Step 2 — Is the model supported?

Check the support table in examples/llm_ptq/README.md for verified HF models.

Listed → supported, use hf_ptq.py (step 4A/4B)
Not listed → read references/unsupported-models.md to determine whether the model can still work with hf_ptq.py or whether a custom script is needed (step 4C)
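
The branching above can be sketched as a small helper. This is an illustration only: `choose_ptq_path` is a hypothetical name, the set of supported models has to be copied by hand from the support table in examples/llm_ptq/README.md, and both the case-insensitive match and the use of a launcher flag to pick between 4A and 4B are assumptions.

```python
from typing import Iterable


def choose_ptq_path(
    model_id: str,
    supported_models: Iterable[str],
    use_launcher: bool,
) -> str:
    """Return the Step 4 path label ("4A", "4B", or "4C") for a model.

    `supported_models` is whatever the caller transcribed from the README
    support table; this helper does not read the README itself.
    """
    # Assumption: model names are compared case-insensitively.
    listed = {name.lower() for name in supported_models}
    if model_id.lower() not in listed:
        # Unlisted model: consult references/unsupported-models.md first.
        return "4C"
    # Listed model: hf_ptq.py, either via the launcher (4B) or directly (4A).
    return "4B" if use_launcher else "4A"
```

For example, `choose_ptq_path("example-org/some-model", {"example-org/some-model"}, use_launcher=False)` returns `"4A"`, where the model name is a placeholder rather than an entry from the real support table.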

References

| Reference | When to read |
| --- | --- |
| `skills/common/environment-setup.md` | Step 1: always |
| `skills/common/workspace-management.md` | Step 1: always |
| `references/launcher-guide.md` | Step 4B only (launcher path) |
| `tools/launcher/CLAUDE.md` | Step 4B only, if you need more launcher detail |
| `references/unsupported-models.md` | Step 4C only (unlisted model) |
| `references/checkpoint-validation.md` | Step 5: validate quantization pattern matches recipe |
| `skills/common/remote-execution.md` | Step 4A/4C only, if target is remote |
| `skills/common/slurm-setup.md` | Step 4A/4C only, if using SLURM manually (not launcher) |
| `references/slurm-setup-ptq.md` | Step 4A/4C only, PTQ-specific SLURM (container, GPU sizing, FSDP2) |
| `examples/llm_ptq/README.md` | Step 3: support matrix, CLI flags, accuracy |
| `modelopt/torch/quantization/config.py` | Step 3: format definitions |
| `modelopt/torch/export/model_utils.py` | Step 4C: TRT-LLM export type mapping |
| `modelopt_recipes/` | Step 3: pre-built recipes |
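
The reading rules in the table above can also be expressed as a lookup. This is a rough sketch rather than an official tool: `references_for` and its `remote` and `manual_slurm` flags are hypothetical names, and grouping both SLURM documents under the manual-SLURM condition is an assumption about how the table is meant to be read.

```python
from typing import Dict, List

# References that apply to a step unconditionally, transcribed from the table.
_BASE_REFERENCES: Dict[str, List[str]] = {
    "1": [
        "skills/common/environment-setup.md",
        "skills/common/workspace-management.md",
    ],
    "3": [
        "examples/llm_ptq/README.md",
        "modelopt/torch/quantization/config.py",
        "modelopt_recipes/",
    ],
    "4A": [],
    "4B": [
        "references/launcher-guide.md",
        "tools/launcher/CLAUDE.md",  # optional: only for more launcher detail
    ],
    "4C": [
        "references/unsupported-models.md",
        "modelopt/torch/export/model_utils.py",
    ],
    "5": ["references/checkpoint-validation.md"],
}


def references_for(
    step: str, remote: bool = False, manual_slurm: bool = False
) -> List[str]:
    """Return the files to read for a step, given the environment flags."""
    refs = list(_BASE_REFERENCES.get(step, []))
    # The remote and manual-SLURM references apply only to paths 4A and 4C.
    if step in ("4A", "4C"):
        if remote:
            refs.append("skills/common/remote-execution.md")
        if manual_slurm:
            refs.append("skills/common/slurm-setup.md")
            refs.append("references/slurm-setup-ptq.md")
    return refs
```

For instance, `references_for("4A", remote=True)` yields only `skills/common/remote-execution.md`, while the same flags have no effect on step 4B because the launcher path has its own guide.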
