Ptq Workflow Integration | Skills Pool
Use when integrating a new PTQ workflow into cache-dit: designing the quantize/load API shape, backend-specific config validation, save/load manifests, and benchmark and regression tests, or reviewing a PTQ integration plan. Uses the SVDQ PTQ integration only as a style and coverage reference. Do not copy the SVDQ implementation mechanically.
vipshop | 1,145 stars | 08.04.2026 | Categories: Data Engineering
PTQ Workflow Integration for cache-dit
Goal
Integrate one PTQ workflow in a way that feels native to cache-dit:
public API stays simple
backend-specific logic stays localized
save/load UX is predictable
fast tests and slow tests cover different risks
docs show only the public workflow
This skill is based on lessons from the SVDQ PTQ integration, but SVDQ PTQ is a reference only.
Core Rule
Do not mechanically copy SVDQ PTQ files.
Use the SVDQ PTQ integration to learn:
what belongs in public API vs private backend code
where config validation should live
where backend serialization and load logic should live
how tests should be split by cost and purpose
what documentation shape is acceptable for cache-dit
Do not reuse SVDQ PTQ by blind copy-paste.
Specifically do not copy without redesigning first:
private helper structure
class names or helper names
logging layout
exact file decomposition
prompts, thresholds, or benchmark constants
test bodies that only happen to fit SVDQ
Treat SVDQ as a style reference, not a template to replay.
When to Use
Use this skill when you need to:
add a new PTQ backend or algorithm into cache-dit
extend an existing PTQ backend with save/load support
decide where PTQ files and tests should live
review whether a PTQ integration follows cache-dit API style
plan coverage for a PTQ feature before coding
Do not use this skill for:
generic quantization work that does not involve PTQ workflow design
blind upstream porting
one-off benchmark scripts with no repository integration
Reference Style Rule
Use repo-relative references only.
For cache-dit files, use paths like src/cache_dit/quantization/config.py.
For docs, use paths like docs/user_guide/QUANTIZATION.md.
For tests, use paths like tests/quantization/test_svdquant_ptq.py.
Do not write machine-local absolute paths into the skill.
Design Principles to Keep
1. Public API symmetry matters
Prefer a user-facing flow like:
cache_dit.quantize(...)
cache_dit.load(...)
QuantizeConfig(...)
If save/load is part of the workflow, keep quantize and load at the same API layer unless there is a very strong reason not to.
2. Keep backend-specific knobs grouped and validated
Prefer validated backend-specific kwargs or a clearly scoped backend config section over many new top-level config fields.
The pattern to follow is:
generic config contract in src/cache_dit/quantization/config.py
backend-specific validation still triggered from that shared config layer
backend math and orchestration remain under the backend package
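The pattern above can be sketched roughly as follows. This is a minimal illustration, not cache-dit's actual config code: `ToyQuantizeConfig`, `register_backend_validator`, the `toy_ptq` backend name, and its `rank` and `num_calibration_steps` kwargs are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

# Hypothetical registry: backend name -> validator for that backend's kwargs.
_BACKEND_VALIDATORS: Dict[str, Callable[[Dict[str, Any]], None]] = {}


def register_backend_validator(name: str):
    """Decorator a backend package could use to plug its checks into the shared layer."""
    def decorator(fn):
        _BACKEND_VALIDATORS[name] = fn
        return fn
    return decorator


@register_backend_validator("toy_ptq")
def _validate_toy_ptq(kwargs: Dict[str, Any]) -> None:
    # The rules themselves live with the backend.
    allowed = {"rank", "num_calibration_steps"}
    unknown = set(kwargs) - allowed
    if unknown:
        raise ValueError(f"toy_ptq: unknown backend kwargs {sorted(unknown)}")
    rank = kwargs.get("rank", 32)
    if not isinstance(rank, int) or rank <= 0:
        raise ValueError(f"toy_ptq: rank must be a positive int, got {rank!r}")


@dataclass
class ToyQuantizeConfig:
    """Illustrative stand-in, not cache-dit's QuantizeConfig."""
    backend: str
    backend_kwargs: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.backend not in _BACKEND_VALIDATORS:
            raise ValueError(f"unsupported backend: {self.backend!r}")
        # Validation is triggered from the shared config object,
        # so a bad combination fails before any backend work starts.
        _BACKEND_VALIDATORS[self.backend](self.backend_kwargs)
```

With this shape, a misspelled kwarg such as `{"rnak": 16}` is rejected when the config is constructed, and the top-level config gains no new fields per backend.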
3. Hide internal PTQ machinery
Private PTQ context objects, calibrators, observers, or loaders should stay internal unless users truly need them.
Tests and docs should generally use public APIs only.
4. Save/load should be ergonomic and deterministic
If the PTQ workflow serializes checkpoints:
normalize output to a deterministic file name
keep a machine-readable manifest next to the checkpoint when directory loading is supported
resolve config, file path, and directory path through one internal load path resolver
validate metadata before mutating the target module
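A minimal sketch of the first three points, assuming a JSON checkpoint for simplicity. The file names, `save_checkpoint`, and `resolve_checkpoint` are hypothetical, and a real backend would most likely store tensors in a different format.

```python
import json
from pathlib import Path

# Hypothetical fixed names; a real backend would choose its own.
CHECKPOINT_NAME = "toy_ptq_checkpoint.json"
MANIFEST_NAME = "toy_ptq_manifest.json"


def save_checkpoint(state, serialize_to, quant_type):
    """Write the checkpoint under a fixed name, with a manifest next to it."""
    out_dir = Path(serialize_to)
    out_dir.mkdir(parents=True, exist_ok=True)
    ckpt_path = out_dir / CHECKPOINT_NAME
    ckpt_path.write_text(json.dumps(state, sort_keys=True))
    manifest = {"checkpoint": CHECKPOINT_NAME, "quant_type": quant_type}
    (out_dir / MANIFEST_NAME).write_text(json.dumps(manifest, sort_keys=True))
    return ckpt_path


def resolve_checkpoint(source):
    """Single resolver for a config-like object, a checkpoint file, or a directory."""
    # A config-like object is reduced to its serialize_to path first.
    source = getattr(source, "serialize_to", source)
    path = Path(source)
    if path.is_file():
        return path
    if path.is_dir():
        manifest_path = path / MANIFEST_NAME
        if not manifest_path.is_file():
            raise FileNotFoundError(f"no {MANIFEST_NAME} in {path}")
        manifest = json.loads(manifest_path.read_text())
        ckpt_path = path / manifest["checkpoint"]
        if not ckpt_path.is_file():
            raise FileNotFoundError(f"manifest points to a missing file: {ckpt_path}")
        return ckpt_path
    raise FileNotFoundError(f"cannot resolve a checkpoint from {source!r}")
```

Because every input form goes through one function, directory load, file load, and config-driven load cannot drift apart in behavior.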
5. Slow validation must be opt-in
Fast regression coverage should run by default when feasible.
Pipeline-scale validation, compile validation, and large-model comparisons should be environment-gated.
File Placement Guidelines
Only add new files when they correspond to a real boundary.
Usually edit existing shared files for
generic config schema: src/cache_dit/quantization/config.py
public quantize/load routing: src/cache_dit/quantization/dispatch.py
package exports if public API changes: src/cache_dit/__init__.py
optional quantization package exports: src/cache_dit/quantization/__init__.py
user docs: docs/user_guide/QUANTIZATION.md
Usually add backend-local files for
PTQ orchestration and serialization: src/cache_dit/quantization/<backend>/ptq.py
backend math or accumulation logic: src/cache_dit/quantization/<backend>/quantizer.py
backend module wrappers or load helpers: src/cache_dit/quantization/<backend>/...
Usually add tests in
backend public-workflow tests: tests/quantization/test_<backend>_ptq.py
backend math or lower-level tests: tests/quantization/test_<backend>_quantizer.py
Add a separate shared test utility file only if multiple test files genuinely reuse the same helpers.
What the SVDQ PTQ Integration Teaches About API Design
Use these as design lessons, not copy targets.
Shared config layer
src/cache_dit/quantization/config.py is the right place for:
quant type parsing and normalization
backend auto-resolution
validation of PTQ-specific unsupported combinations
normalization of serialize_to
validation of backend-specific kwargs
Do not push these checks into only the backend implementation file if the public config object can reject them earlier.
Backend PTQ implementation layer
src/cache_dit/quantization/svdquant/ptq.py demonstrates the right kind of responsibilities for a backend PTQ file:
run calibration through the public config callback
quantize target modules
serialize checkpoint artifacts
write a lightweight JSON manifest for directory load UX
resolve load inputs and validate metadata
attach runtime metadata back onto the loaded module
This is the right layer for backend save/load orchestration.
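As a rough illustration of the first two responsibilities, the sketch below runs calibration through a user-supplied callback and derives per-layer scales from what was observed. It is not the SVDQ implementation: `ToyLayer`, `ToyConfig`, the `calibrate_fn` field name, and the symmetric int8 absmax scaling are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


class ToyLayer:
    """Stand-in for a quantizable layer; it simply echoes its input."""
    def __init__(self, name: str) -> None:
        self.name = name
        self.observer = None

    def __call__(self, x: List[float]) -> List[float]:
        if self.observer is not None:
            self.observer(self.name, x)
        return x


@dataclass
class ToyConfig:
    # Hypothetical field: the user drives forward passes inside this callback.
    calibrate_fn: Callable[[List[ToyLayer]], None]


def run_ptq(layers: List[ToyLayer], config: ToyConfig) -> Dict[str, float]:
    """Observe activations during calibration, then return int8 scales per layer."""
    absmax: Dict[str, float] = {}

    def observe(name: str, x: List[float]) -> None:
        absmax[name] = max(absmax.get(name, 0.0), max(abs(v) for v in x))

    for layer in layers:
        layer.observer = observe
    try:
        config.calibrate_fn(layers)
    finally:
        # Observers are private machinery and are removed even on failure.
        for layer in layers:
            layer.observer = None
    return {name: value / 127.0 for name, value in absmax.items()}
```

The user only supplies a callback on the config object; the observer hooks never appear in the public surface.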
Public docs and tests layer
docs/user_guide/QUANTIZATION.md and tests/quantization/test_svdquant_ptq.py demonstrate the preferred user story:
user interacts with QuantizeConfig
user calls cache_dit.quantize
user calls cache_dit.load
user does not need private PTQ classes
Keep new PTQ integrations aligned with that style unless the backend genuinely requires a different experience.
Serialization and Load Checklist
If the PTQ workflow needs saved checkpoints, check all of the following:
The serialized checkpoint file name is deterministic.
A colocated manifest exists when directory loading is supported.
The load path accepts ergonomic inputs only when they can be resolved unambiguously.
Metadata validation happens before module mutation.
Quant type mismatches fail clearly.
Missing manifest or malformed metadata fail clearly.
Round-trip tests prove loaded output matches quantized output.
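The ordering requirement in this checklist (validate first, mutate last) can be sketched as below. `ToyModule`, `validate_metadata`, `load_into`, and the metadata keys `quant_type` and `layers` are hypothetical; the point is only that every failure path exits before the module is touched.

```python
from typing import Any, Dict


class ToyModule:
    """Stand-in for a module whose weights get replaced on load."""
    def __init__(self) -> None:
        self.weights: Dict[str, Any] = {"proj": [0.5, -0.25]}
        self.quant_type = None


def validate_metadata(metadata: Dict[str, Any], expected_quant_type: str) -> None:
    for key in ("quant_type", "layers"):
        if key not in metadata:
            raise ValueError(f"malformed metadata: missing {key!r}")
    if metadata["quant_type"] != expected_quant_type:
        raise ValueError(
            f"quant type mismatch: checkpoint has {metadata['quant_type']!r}, "
            f"expected {expected_quant_type!r}"
        )


def load_into(module: ToyModule, checkpoint: Dict[str, Any], expected_quant_type: str) -> ToyModule:
    metadata = checkpoint.get("metadata")
    if not isinstance(metadata, dict):
        raise ValueError("checkpoint has no metadata")
    validate_metadata(metadata, expected_quant_type)
    missing = set(metadata["layers"]) - set(checkpoint.get("weights", {}))
    if missing:
        raise ValueError(f"incomplete checkpoint: missing weights for {sorted(missing)}")
    # Only now is the target module mutated.
    module.weights = dict(checkpoint["weights"])
    module.quant_type = metadata["quant_type"]
    return module
```

A failed load therefore leaves the original module usable, which is also what the failure-case tests should assert.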
Test Strategy
Do not ship a PTQ integration with only one slow end-to-end test.
Fast tests should cover
public API quantization replaces the expected layers
save/load roundtrip restores the quantized module
config validation rejects unsupported combinations
directory load, checkpoint load, and config-driven load if all are supported
invalid metadata and incomplete checkpoint failure cases
exclusion or filtering behavior
backend-specific calibration or buffering behavior when applicable
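To show the shape of such fast tests without a real model, here is a small sketch covering layer replacement and exclusion. Everything in it is a toy: `toy_quantize` stands in for the public quantize entry point, and the dict-based "model" and layer classes are invented for the example.

```python
from typing import Dict, Iterable


class ToyLinear:
    def __init__(self, weight) -> None:
        self.weight = list(weight)


class ToyQuantLinear:
    def __init__(self, weight) -> None:
        # Round each weight to the nearest multiple of 1/8 as a fake quantizer.
        self.weight = [round(w * 8) / 8 for w in weight]


def toy_quantize(model: Dict[str, object], exclude: Iterable[str] = ()) -> Dict[str, object]:
    """Replace every ToyLinear whose name matches no exclude substring."""
    excluded = tuple(exclude)
    for name, layer in list(model.items()):
        if isinstance(layer, ToyLinear) and not any(tag in name for tag in excluded):
            model[name] = ToyQuantLinear(layer.weight)
    return model


def make_tiny_model() -> Dict[str, object]:
    return {
        "blocks.0.attn": ToyLinear([0.1, 0.9]),
        "blocks.0.mlp": ToyLinear([0.3]),
        "norm_out": "not-a-linear",
    }


def test_quantize_replaces_expected_layers():
    model = toy_quantize(make_tiny_model())
    assert isinstance(model["blocks.0.attn"], ToyQuantLinear)
    assert isinstance(model["blocks.0.mlp"], ToyQuantLinear)
    assert model["norm_out"] == "not-a-linear"


def test_exclude_keeps_layer_untouched():
    model = toy_quantize(make_tiny_model(), exclude=["mlp"])
    assert isinstance(model["blocks.0.attn"], ToyQuantLinear)
    assert isinstance(model["blocks.0.mlp"], ToyLinear)
```

Tests of this size run in milliseconds, so they can stay in the default test run.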
Slow tests should cover
at least one real pipeline or model integration path
serialization plus reload closure
one primary quality gate
additional metrics reported separately from the hard gate
latency, transformer memory, or peak memory when those are part of the PTQ value proposition
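One way to keep a single hard gate separate from reported metrics is sketched below. PSNR as the primary gate, the helper names, and the threshold passed in by the caller are illustrative assumptions, not values taken from the SVDQ tests.

```python
import math
from typing import Dict, Sequence


def mse(reference: Sequence[float], candidate: Sequence[float]) -> float:
    return sum((r - c) ** 2 for r, c in zip(reference, candidate)) / len(reference)


def psnr_db(reference: Sequence[float], candidate: Sequence[float], peak: float = 1.0) -> float:
    err = mse(reference, candidate)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)


def evaluate(reference: Sequence[float], quantized: Sequence[float], min_psnr_db: float) -> Dict[str, float]:
    """Compute all metrics, but fail only on the primary gate."""
    report = {
        "psnr_db": psnr_db(reference, quantized),
        "mse": mse(reference, quantized),
        "max_abs_err": max(abs(r - q) for r, q in zip(reference, quantized)),
    }
    # Additional metrics are printed for the report, never asserted on.
    for name, value in report.items():
        print(f"{name}: {value:.6f}")
    if report["psnr_db"] < min_psnr_db:
        raise AssertionError(
            f"PSNR {report['psnr_db']:.2f} dB is below the gate of {min_psnr_db:.2f} dB"
        )
    return report
```

Keeping the gate to one explainable number makes a failing slow test easy to interpret, while the printed report still records the other measurements.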
Optional compile tests should cover
loading an already quantized module
enabling compile configs if the repo expects them
torch.compile(...)
one warmup run
one actual inference run
Compile validation should be behind a separate environment variable, not mixed into the default slow test path.
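A minimal sketch of the gating with the standard `unittest` module, assuming two hypothetical environment variable names (`TOY_PTQ_RUN_SLOW` and `TOY_PTQ_RUN_COMPILE`). The test bodies are placeholders.

```python
import os
import unittest

# Hypothetical variable names; each tier has its own switch.
RUN_SLOW = os.environ.get("TOY_PTQ_RUN_SLOW") == "1"
RUN_COMPILE = os.environ.get("TOY_PTQ_RUN_COMPILE") == "1"


class FastTests(unittest.TestCase):
    """Runs by default."""
    def test_roundtrip_small(self):
        self.assertEqual([1, 2, 3], list((1, 2, 3)))


@unittest.skipUnless(RUN_SLOW, "set TOY_PTQ_RUN_SLOW=1 to run pipeline-scale validation")
class SlowPipelineTests(unittest.TestCase):
    def test_pipeline_quality_gate(self):
        pass  # placeholder for a real pipeline comparison


@unittest.skipUnless(RUN_COMPILE, "set TOY_PTQ_RUN_COMPILE=1 to run compile validation")
class CompileTests(unittest.TestCase):
    def test_compiled_inference(self):
        pass  # placeholder for load, compile, warmup, and one inference run
```

With neither variable set, only `FastTests` executes, and the skip messages tell the reader exactly which variable enables each tier.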
Test Style Rules to Preserve
Follow these rules unless there is a strong reason to violate them:
integration tests should use public APIs, not private PTQ classes
slow tests should self-skip with clear environment-variable guidance
deterministic generators or seeds should be used for pipeline tests
large test artifacts should go under repo-local .tmp/tests/...
benchmark tables and visuals are reports, not pass/fail criteria unless explicitly required
hard accuracy gates should stay minimal and explainable
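Two of these rules, seeded inputs and repo-local artifacts, can be sketched with the standard library alone. The helper names are hypothetical, and a real pipeline test would more likely seed a framework-level generator instead of `random.Random`.

```python
import random
from pathlib import Path
from typing import List, Optional


def artifact_dir(test_name: str, repo_root: Optional[Path] = None) -> Path:
    """Return (and create) <repo_root>/.tmp/tests/<test_name>."""
    root = Path(repo_root) if repo_root is not None else Path.cwd()
    path = root / ".tmp" / "tests" / test_name
    path.mkdir(parents=True, exist_ok=True)
    return path


def make_calibration_inputs(seed: int, count: int) -> List[float]:
    """Seeded generator so repeated runs see identical inputs."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(count)]
```

Using a private `random.Random` instance avoids depending on global random state that other tests may have changed.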
Suggested Coverage Map
When integrating a new PTQ workflow, think in layers:
config/schema layer
backend quantizer/math layer
serialization/load layer
public API layer
model or pipeline integration layer
optional compile layer
If one of these layers is intentionally out of scope, say so explicitly in the PR or plan.
Recommended Implementation Order
1. Survey the existing public quantization API and decide whether the new PTQ flow fits it.
2. Add or update shared config validation in src/cache_dit/quantization/config.py.
3. Implement backend PTQ orchestration in src/cache_dit/quantization/<backend>/ptq.py.
4. Add backend helper logic only where a real separation exists.
5. Wire dispatch and exports only after backend behavior is stable.
6. Add fast tests for public API, roundtrip, validation, and failure cases.
7. Add env-gated slow tests for real model integration.
8. Add optional compile tests only if compile compatibility matters.
9. Update docs/user_guide/QUANTIZATION.md with public API examples only.
Review Questions
Before merging a PTQ integration, ask:
Does the user-facing flow still look like cache-dit?
Are backend-specific options validated centrally?
Are save/load artifacts deterministic and discoverable?
Can the feature be tested quickly without a giant model?
Are slow tests opt-in and clearly scoped?
Do docs and tests avoid private PTQ symbols?
Does this integration borrow SVDQ lessons without copying SVDQ internals?
Common Mistakes
exposing private PTQ context classes in docs or integration tests
adding too many top-level config fields instead of validated backend-specific kwargs
implementing save/load UX only through one exact file path
skipping malformed metadata tests
writing only slow model tests and no fast public-API tests
making compile validation part of the default slow path
hardcoding local machine paths into docs or instructions
copying SVDQ helper structure line-for-line
Reference Touchpoints
These are the main SVDQ PTQ reference files for style and coverage shape:
src/cache_dit/quantization/config.py
src/cache_dit/quantization/svdquant/ptq.py
tests/quantization/test_svdquant_ptq.py
tests/quantization/test_svdquant_quantizer.py
docs/user_guide/QUANTIZATION.md
Use them to understand cache-dit conventions.
Do not reproduce them mechanically.