Name: Add Vlm Model
Author: intel


model.layers

thinker.model.layers

language_model.layers

def _get_your_vlm_multimodal_block(model, quant_vision=False):
    """Get block names for YourVLM model.

    YourVLM structure:
    - model.vision_encoder.blocks: vision encoder
    - model.projector.layers: vision-language projector
    - model.language_model.layers: text decoder

    By default, only the text decoder is quantized. Set quant_vision=True
    to include vision encoder and projector blocks.
    """
    block_names = []

    if quant_vision:
        if hasattr(model, "model") and hasattr(model.model, "vision_encoder"):
            if hasattr(model.model.vision_encoder, "blocks"):
                block_names.append(
                    [f"model.vision_encoder.blocks.{i}" for i in range(len(model.model.vision_encoder.blocks))]
                )
        # Add projector if it has quantizable layers
        if hasattr(model, "model") and hasattr(model.model, "projector"):
            if hasattr(model.model.projector, "layers"):
                block_names.append([f"model.projector.layers.{i}" for i in range(len(model.model.projector.layers))])

    # Language model layers (always quantized)
    if hasattr(model, "model") and hasattr(model.model, "language_model"):
        if hasattr(model.model.language_model, "layers"):
            block_names.append(
                [f"model.language_model.layers.{i}" for i in range(len(model.model.language_model.layers))]
            )

    return block_names
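
To see the shape of what such a handler returns, the sketch below runs the same attribute walk against a stand-in object built from `types.SimpleNamespace` instead of a real model. `collect_blocks` and `fake_model` are hypothetical names used only for illustration; they are not part of AutoRound.

```python
from types import SimpleNamespace


def collect_blocks(model, dotted_path):
    """Return ["<path>.0", "<path>.1", ...], or [] if any attribute is missing."""
    obj = model
    for attr in dotted_path.split("."):
        obj = getattr(obj, attr, None)
        if obj is None:
            return []
    return [f"{dotted_path}.{i}" for i in range(len(obj))]


# Stand-in for a model with 2 vision blocks, 3 decoder layers, and no projector.
fake_model = SimpleNamespace(
    model=SimpleNamespace(
        vision_encoder=SimpleNamespace(blocks=[object(), object()]),
        language_model=SimpleNamespace(layers=[object(), object(), object()]),
    )
)

text_blocks = collect_blocks(fake_model, "model.language_model.layers")
vision_blocks = collect_blocks(fake_model, "model.vision_encoder.blocks")
missing = collect_blocks(fake_model, "model.projector.layers")
print(text_blocks)
```

Each group of block names is one list, so a handler that appends one list per sub-model returns a list of lists, and a missing sub-model simply contributes nothing.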

SPECIAL_MULTIMODAL_BLOCK["your_vlm"] = _get_your_vlm_multimodal_block

# If your VLM supports text-only calibration (most do):
SUPPORT_ONLY_TEXT_MODELS.append("your_vlm")

# If your VLM has batch size limitations:
mllms_with_limited_bs = (
    ...,
    "your_vlm",
)
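
How the tuple is consumed is not shown in this guide. As a rough illustration only, one plausible use is to fall back to a batch size of 1 for listed model types and recover the effective batch through gradient accumulation. `resolve_batch_size` is a hypothetical helper, and the tuple below is a local stand-in, not the real one.

```python
# Local stand-in for the tuple edited above (contents are illustrative).
mllms_with_limited_bs = ("some_other_vlm", "your_vlm")


def resolve_batch_size(model_type, batch_size, grad_accum_steps=1):
    """Return (batch_size, grad_accum_steps), clamping limited models to bs=1."""
    if model_type in mllms_with_limited_bs and batch_size != 1:
        # Keep the effective batch size by accumulating gradients instead.
        return 1, grad_accum_steps * batch_size
    return batch_size, grad_accum_steps


limited = resolve_batch_size("your_vlm", 8)
unlimited = resolve_batch_size("llava", 8)
```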

{
    "model_type": "your_vlm",
    "format_user": "<|user|>\n{content}\n",
    "format_assistant": "<|assistant|>\n{content}\n",
    "format_system": "<|system|>\n{content}\n",
    "format_observation": "",
    "system": "",
    "separator": "",
    "stop_words": ["<|end|>"]
}
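
The `format_*` fields use a `{content}` placeholder. A minimal sketch of how such a template could be rendered into a prompt is shown below, assuming plain `str.format` substitution and concatenation with `separator`; the `render` helper is hypothetical and AutoRound's own rendering logic may differ.

```python
import json

# Raw string so that "\n" reaches the JSON parser as an escape sequence.
template = json.loads(r'''
{
    "model_type": "your_vlm",
    "format_user": "<|user|>\n{content}\n",
    "format_assistant": "<|assistant|>\n{content}\n",
    "format_system": "<|system|>\n{content}\n",
    "separator": "",
    "stop_words": ["<|end|>"]
}
''')


def render(messages):
    """Render (role, content) pairs with the matching format_<role> string."""
    parts = [
        template[f"format_{role}"].format(content=content)
        for role, content in messages
    ]
    return template["separator"].join(parts)


prompt = render([("system", "Be brief."), ("user", "Describe the image.")])
print(prompt)
```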

_register_template(
    "your_vlm",
    default_dataset="liuhaotian/llava_conv_58k",  # or appropriate dataset
    processor=PROCESSORS["default"],  # or a custom processor
)

def _your_vlm_processor(raw_data, model_path, seqlen, processor=None, **kwargs):
    """Process calibration data for YourVLM.

    Args:
        raw_data: Dataset samples
        model_path: Path to the model
        seqlen: Sequence length for calibration
        processor: The model's processor

    Returns:
        list: Processed samples ready for calibration
    """
    # Build prompts with images and text
    ...

PROCESSORS["your_vlm"] = _your_vlm_processor
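
The processor body is left open above. As a rough sketch of the kind of work it does, the function below turns conversation samples into truncated token lists. It assumes a LLaVA-style sample layout (`image`, plus `conversations` entries with `from` and `value` keys) and uses whitespace splitting in place of the model's real tokenizer; `sketch_processor` is a hypothetical name and its signature is simplified.

```python
def sketch_processor(raw_data, seqlen, tokenize=str.split):
    """Build one prompt per sample and truncate it to seqlen tokens."""
    processed = []
    for sample in raw_data:
        turns = []
        for turn in sample["conversations"]:
            role = "user" if turn["from"] == "human" else "assistant"
            turns.append(f"<|{role}|>\n{turn['value']}\n")
        tokens = tokenize("".join(turns))
        processed.append({"image": sample.get("image"), "tokens": tokens[:seqlen]})
    return processed


samples = [
    {
        "image": "a.jpg",
        "conversations": [
            {"from": "human", "value": "<image>\nWhat is this?"},
            {"from": "gpt", "value": "A cat."},
        ],
    }
]
out = sketch_processor(samples, seqlen=4)
```

A real processor would pass images and text through the model's own processor object so that image placeholders expand to the correct number of tokens.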

def _your_vlm_forward(model, **kwargs):
    """Custom forward pass for YourVLM during calibration."""
    # Handle special input processing
    # Route inputs to correct sub-models
    return model.language_model(**kwargs)

from functools import partial


def _handle_special_model(model):
    ...
    # Swap in the custom forward only for this model type.
    if hasattr(model, "config") and model.config.model_type == "your_vlm":
        model.forward = partial(_your_vlm_forward, model)
    return model
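
The `partial` binding works because an attribute set on the instance shadows the method defined on the class. The sketch below shows this with a plain Python stand-in; `FakeVLM` and `custom_forward` are hypothetical, and dropping `pixel_values` is only an example of input routing, not something the guide prescribes.

```python
from functools import partial


class FakeVLM:
    """Stand-in model whose language_model just reports the kwargs it saw."""

    def __init__(self):
        self.language_model = lambda **kw: {"seen": sorted(kw)}

    def forward(self, **kwargs):
        raise RuntimeError("original forward should be bypassed")


def custom_forward(model, **kwargs):
    # Drop an input the text decoder does not accept (illustrative key).
    kwargs.pop("pixel_values", None)
    return model.language_model(**kwargs)


vlm = FakeVLM()
vlm.forward = partial(custom_forward, vlm)  # instance attribute shadows the method
result = vlm.forward(input_ids=[1, 2], attention_mask=[1, 1], pixel_values=None)
```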

@register_dataset("your_vlm_dataset")
class YourVLMDataset:
    def __init__(self, dataset_name, model_path, seqlen, **kwargs): ...

    def __len__(self):
        return len(self.data)

    def __iter__(self):
        for sample in self.data:
            yield sample
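
For readers unfamiliar with decorator-based registries, here is a minimal sketch of the pattern. The `DATASETS` dict and this `register_dataset` are local stand-ins written for illustration; AutoRound's actual registry and constructor arguments may differ.

```python
DATASETS = {}  # hypothetical registry mapping a name to a dataset class


def register_dataset(name):
    """Return a class decorator that records the class under the given name."""
    def decorator(cls):
        DATASETS[name] = cls
        return cls
    return decorator


@register_dataset("toy_vlm_dataset")
class ToyVLMDataset:
    def __init__(self, samples, seqlen):
        # Truncate each sample to at most seqlen items.
        self.data = [s[:seqlen] for s in samples]

    def __len__(self):
        return len(self.data)

    def __iter__(self):
        yield from self.data


ds = DATASETS["toy_vlm_dataset"](["abcdef", "xy"], seqlen=3)
items = list(ds)
```

Looking the class up by name lets calibration code pick a dataset from a string argument without importing the class directly.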

from auto_round import AutoRound


def test_your_vlm_quantization():
    model_name = "your-org/your-vlm-small"  # placeholder model ID
    ar = AutoRound(
        model_name,
        bits=4,
        group_size=128,
        iters=2,  # keep the smoke test fast
        nsamples=2,
        quant_nontext_module=False,  # text-only quantization
    )
    compressed_model, _ = ar.quantize()
    ar.save_quantized(output_dir="./tmp_your_vlm", format="auto_round")

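# Same placeholder model as in the test, now including the vision modules.
model_name = "your-org/your-vlm-small"
ar = AutoRound(
    model_name,
    bits=4,
    group_size=128,
    quant_nontext_module=True,  # also quantize vision encoder
)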

| Model Type | Block Handler | Template | Special Forward |
| --- | --- | --- | --- |
| `llava` | `_get_llava_multimodal_block` | llava template | No |
| `qwen2_vl` | `_get_qwen2_vl_multimodal_block` | qwen2_vl template | No |
| `qwen2_5_omni` | `_get_qwen2_5_omni_multimodal_block` | qwen2_5_omni template | Yes (`_qwen2_5_omni_forward`) |
| `qwen3_omni_moe` | `_get_qwen3_omni_moe_multimodal_block` | qwen3_omni_moe template | Yes (`_qwen3_omni_moe_forward`) |
| `deepseek_vl_v2` | `_get_deepseek_vl2_multimodal_block` | deepseek_vl_v2 template | Yes (`_deepseek_vl2_forward`) |
| `glm_image` | `_get_glm_image_multimodal_block` | glm_image template | No |
| `phi3_v` | via generic handler | phi3_v template | No |

| What | Where | Mechanism |
| --- | --- | --- |
| Block handler | `special_model_handler.py` | `SPECIAL_MULTIMODAL_BLOCK[model_type]` |
| Text-only support | `special_model_handler.py` | `SUPPORT_ONLY_TEXT_MODELS` list |
| Batch limit | `special_model_handler.py` | `mllms_with_limited_bs` tuple |
| Template | `compressors/mllm/templates/*.json` | `_register_template()` |
| Processor | `compressors/mllm/template.py` | `PROCESSORS` dict |
| Custom forward | `special_model_handler.py` | `_handle_special_model()` |
| Dataset loader | `calib_dataset.py` | `@register_dataset()` |

Add Vlm Model

Adding a New Vision-Language Model to AutoRound

Overview

Prerequisites

Step 1: Add Multimodal Block Handler

1a. Create a block discovery function

1b. Register in the `SPECIAL_MULTIMODAL_BLOCK` dict

1c. Add to support lists

Step 2: Add Calibration Template

2a. Create template JSON

2b. Register the template

2c. Add a custom processor (if needed)

Step 3: Handle Special Forward Pass (If Needed)

Step 4: Add Custom Calibration Dataset (Optional)

Step 5: Test

Step 6: Update Documentation

Reference: Existing VLM Implementations

Key Registration Points
