Adapt AutoRound to support a new diffusion model architecture (DiT, UNet, hybrid AR+DiT). Use when a new diffusion model fails quantization, needs custom output configs, requires a custom pipeline function, or is a hybrid architecture with both autoregressive and diffusion components.
AutoRound's DiffusionCompressor works with standard diffusers pipelines (e.g., FLUX). This skill covers what code changes are needed when a new diffusion model doesn't work out-of-the-box. Common reasons for adaptation:
- Missing `output_configs` entry for the model's transformer blocks
- Non-standard inference API (the pipeline is not callable as `pipe(prompts, ...)`)
- Hybrid architecture whose autoregressive component is not registered

Quick smoke test to see which failure mode you hit:

```python
from auto_round import AutoRound

ar = AutoRound(
    "your-org/your-diffusion-model",
    scheme="W4A16",
    iters=2,
    nsamples=2,
    num_inference_steps=5,
)
ar.quantize_and_save(output_dir="./test_output", format="fake")
```
| Error / Symptom | Root Cause | Fix Section |
|---|---|---|
| "using LLM mode" instead of Diffusion | Model not detected as diffusion | Step 1 |
| `assert len(output_config) == len(tmp_output)` | Block output config mismatch | Step 2 |
| Pipeline call fails | Non-standard inference API | Step 3 |
| Hybrid model only quantizes DiT | AR component not handled | Step 4 |
AutoRound detects diffusion models by checking for model_index.json in the
model directory:
```python
# auto_round/utils/model.py
def is_diffusion_model(model_or_path):
    # Checks for model_index.json presence
    ...
```
If your model doesn't have model_index.json, either add one to the model directory or force diffusion handling via `ExtraConfig`:

```python
from auto_round.compressors import ExtraConfig

ar = AutoRound(
    model,
    extra_config=ExtraConfig(diffusion_config=DiffusionConfig(...)),
)
```
diffusion_load_model() uses AutoPipelineForText2Image.from_pretrained() and
extracts pipe.transformer as the quantizable model. If your model uses a
different attribute (e.g., pipe.unet), this needs adjustment in
auto_round/utils/model.py.
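The adjustment amounts to selecting a different attribute from the loaded pipeline. A minimal sketch of the idea, using a hypothetical helper name (AutoRound's real logic lives in diffusion_load_model()):

```python
# Hypothetical sketch (not AutoRound's actual code): locate the quantizable
# module on a diffusers pipeline, preferring `transformer` over `unet`.
def get_quantizable_module(pipe):
    for attr in ("transformer", "unet"):
        module = getattr(pipe, attr, None)
        if module is not None:
            return attr, module
    raise ValueError("pipeline has neither .transformer nor .unet")
```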
This is the most common adaptation needed. The output_configs dict maps
transformer block class names to their output tensor names. Without this,
calibration crashes because AutoRound doesn't know how to collect activations.
```python
import diffusers

pipe = diffusers.AutoPipelineForText2Image.from_pretrained("your-model")
for name, module in pipe.transformer.named_modules():
    if hasattr(module, "forward") and "block" in name.lower():
        print(f"{name}: {type(module).__name__}")
```
Then add an entry for each block type to the `output_configs` dict. Edit auto_round/compressors/diffusion/compressor.py:
```python
output_configs = {
    "FluxTransformerBlock": ["encoder_hidden_states", "hidden_states"],
    "FluxSingleTransformerBlock": ["encoder_hidden_states", "hidden_states"],
    # Add your block type:
    "YourTransformerBlock": ["hidden_states"],  # output tensor names, in order
}
```
The list must match the exact order of tensors returned by the block's
forward() method.
Check the block's forward() method in the diffusers source code to see which tensors it returns (usually hidden_states, sometimes also encoder_hidden_states).

Example: if forward() returns (hidden_states, encoder_hidden_states):

```python
output_configs["YourBlock"] = ["hidden_states", "encoder_hidden_states"]
```

Example: if forward() returns just hidden_states:

```python
output_configs["YourBlock"] = ["hidden_states"]
```
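To catch ordering mistakes early, you can sanity-check an entry against what a block actually returned during a forward pass. A hypothetical helper (not part of AutoRound), assuming block outputs arrive as a single tensor or a tuple of tensors:

```python
# Hypothetical helper: verify an output_configs entry lists exactly one
# name per tensor the block's forward() returned.
def check_output_config(block_outputs, names):
    outputs = block_outputs if isinstance(block_outputs, tuple) else (block_outputs,)
    if len(outputs) != len(names):
        raise ValueError(
            f"output_configs mismatch: forward() returned {len(outputs)} "
            f"tensors but the config lists {len(names)} names"
        )
```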
If your model's inference API differs from the standard
pipe(prompts, guidance_scale=..., num_inference_steps=...), provide a custom
pipeline function.
Easiest route: pass a `pipeline_fn` parameter (no code changes needed):

```python
def your_model_pipeline_fn(pipe, prompts, guidance_scale=7.5,
                           num_inference_steps=28, generator=None, **kwargs):
    """Custom pipeline function for YourModel."""
    for prompt in (prompts if isinstance(prompts, list) else [prompts]):
        pipe.generate(
            prompt=prompt,
            cfg_scale=guidance_scale,
            steps=num_inference_steps,
            generator=generator,
        )
```
```python
ar = AutoRound(
    "your-model",
    pipeline_fn=your_model_pipeline_fn,
    num_inference_steps=28,
    guidance_scale=7.5,
)
```
If using diffusion_load_model() directly:
```python
pipe._autoround_pipeline_fn = your_model_pipeline_fn
```
For full control, override _run_pipeline():
```python
import torch

from auto_round.compressors.diffusion.compressor import DiffusionCompressor

class YourModelCompressor(DiffusionCompressor):
    def _run_pipeline(self, prompts):
        generator = (
            None
            if self.generator_seed is None
            else torch.Generator(device=self.pipe.device).manual_seed(self.generator_seed)
        )
        self.pipe.your_custom_generate(
            prompts,
            steps=self.num_inference_steps,
            cfg=self.guidance_scale,
            generator=generator,
        )
```
For models with both autoregressive and diffusion components (e.g., GLM-Image).
Edit auto_round/compressors/diffusion/hybrid.py:
```python
HYBRID_AR_COMPONENTS = [
    "vision_language_encoder",  # GLM-Image
    "your_ar_component",        # Your model's AR attribute name
]
```
The attribute name must match what exists on the diffusers pipeline object
(i.e., pipe.your_ar_component).
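A quick way to confirm the attribute name is right before launching a full quantization run (hypothetical helper, not an AutoRound API):

```python
# Hypothetical check: return the first configured AR component name that
# actually exists on the pipeline object, or None if none match.
def find_ar_component(pipe, candidates):
    for name in candidates:
        if getattr(pipe, name, None) is not None:
            return name
    return None
```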
Also in hybrid.py, add the DiT-specific output config:
```python
output_configs["YourDiTBlock"] = ["hidden_states", "encoder_hidden_states"]
```
In auto_round/special_model_handler.py, add a block handler for the AR
component so AutoRound knows which layers to quantize:
```python
def _get_your_hybrid_multimodal_block(model, quant_vision=False):
    block_names = []
    if quant_vision and hasattr(model, "vision_encoder"):
        block_names.append([f"vision_encoder.blocks.{i}" for i in range(len(model.vision_encoder.blocks))])
    block_names.append([f"language_model.layers.{i}" for i in range(len(model.language_model.layers))])
    return block_names

SPECIAL_MULTIMODAL_BLOCK["your_model_type"] = _get_your_hybrid_multimodal_block
```
The HybridCompressor runs two phases, each with its own calibration dataset: a text dataset for the AR component and an image-caption dataset for the DiT:

```python
ar = AutoRound(
    "your-hybrid-model",
    dataset="coco2014",               # DiT calibration
    ar_dataset="NeelNanda/pile-10k",  # AR calibration
    quant_ar=True,
    quant_dit=True,
)
```
If your model needs a specific dataset format:
Edit auto_round/compressors/diffusion/dataset.py:
```python
def get_diffusion_dataloader(dataset_name, nsamples, ...):
    # Add handling for your dataset format
    if dataset_name == "your_custom_dataset":
        return _load_your_dataset(dataset_name, nsamples)
    ...
```
The default coco2014 dataset works for most text-to-image models. Custom
datasets need a TSV file with id and caption columns.
```python
def test_your_diffusion_model():
    ar = AutoRound(
        "your-org/your-diffusion-model",
        scheme="W4A16",
        iters=2,
        nsamples=4,
        num_inference_steps=5,
        guidance_scale=7.5,
    )
    compressed_model, layer_config = ar.quantize()
    assert len(layer_config) > 0, "No layers quantized"
    ar.save_quantized(output_dir="./test_output", format="fake")
```
For hybrid models, test both phases:
```python
ar = AutoRound(
    "your-hybrid-model",
    quant_ar=True,
    quant_dit=True,
    iters=2,
    nsamples=4,
)
```
Adaptation checklist:

- `is_diffusion_model()` detects the model (or diffusion mode is forced via extra_config)
- `output_configs` entry added with the correct output tensor names and order
- `pipeline_fn` provided if the inference API is non-standard
- For hybrid models: `HYBRID_AR_COMPONENTS` and `SPECIAL_MULTIMODAL_BLOCK` entries added, plus the DiT output config in hybrid.py
- Quantization with the `fake` format works end-to-end

| File | Purpose |
|---|---|
| auto_round/compressors/diffusion/compressor.py | DiffusionCompressor, output_configs dict |
| auto_round/compressors/diffusion/hybrid.py | HybridCompressor, HYBRID_AR_COMPONENTS |
| auto_round/compressors/diffusion/dataset.py | Calibration dataset loading |
| auto_round/utils/model.py | is_diffusion_model(), diffusion_load_model() |
| auto_round/special_model_handler.py | AR block handlers for hybrid models |
| auto_round/autoround.py | Model type routing (diffusion vs hybrid vs LLM) |
| Model | Type | What Was Adapted |
|---|---|---|
| FLUX.1-dev | Pure DiT | output_configs for FluxTransformerBlock/FluxSingleTransformerBlock |
| GLM-Image | Hybrid AR+DiT | HYBRID_AR_COMPONENTS + SPECIAL_MULTIMODAL_BLOCK + DiT output_configs |
| NextStep | Custom pipeline | pipeline_fn parameter for non-standard inference API |