Implements core VideoQuant quantization algorithms (TPQ, SQJL, MAMP) with CPU-optimized kernels
NOTE: Startup and cleanup are handled by worker-base. This skill defines the WORK PROCEDURE.
This worker handles implementation of VideoQuant's core quantization algorithms:
Features using this skill:
None - pure Python/NumPy/PyTorch implementation.
Write failing tests first (RED)
Define tensor shapes and dtypes
Implement TPQ (Temporal-Polar Quantization)
Implement SQJL (Spatial-QJL)
Implement MAMP (Metric-Aware Mixed Precision)
Run tests to verify (GREEN)
Manual verification
{
"salientSummary": "Implemented TPQ with polar transform achieving 99.2% roundtrip accuracy and 2.7x effective compression. SQJL sign-bit quantization verified with zero memory overhead. MAMP layer profiles configured for DiT attention types.",
"whatWasImplemented": "Three-stage quantization pipeline: (1) TPQ with recursive polar compression, (2) SQJL with JL projection and 1-bit sign encoding, (3) MAMP with layer-specific precision allocation. CPU-optimized kernels using Numba JIT.",
"whatWasLeftUndone": "Integration with Diffusers pipeline pending. CUDA kernels not implemented (no GPU).",
"verification": {
"commandsRun": [
{"command": "python -m pytest tests/test_tpq.py -v", "exitCode": 0, "observation": "5 tests passed, roundtrip accuracy 99.2%"},
{"command": "python -m pytest tests/test_sqjl.py -v", "exitCode": 0, "observation": "4 tests passed, distance preservation 0.98"},
{"command": "python -m pytest tests/test_mamp.py -v", "exitCode": 0, "observation": "5 tests passed"},
{"command": "python benchmarks/compression_ratio.py", "exitCode": 0, "observation": "2.7x compression achieved at 3.5-bit average"}
],
"interactiveChecks": []
},
"tests": {
"added": [
{"file": "tests/test_tpq.py", "cases": [
{"name": "test_polar_transform_accuracy", "verifies": "VAL-TPQ-001"},
{"name": "test_recursive_compression", "verifies": "VAL-TPQ-002"},
{"name": "test_bit_allocation", "verifies": "VAL-TPQ-003"},
{"name": "test_temporal_redundancy", "verifies": "VAL-TPQ-004"},
{"name": "test_roundtrip_accuracy", "verifies": "VAL-TPQ-005"}
]},
{"file": "tests/test_sqjl.py", "cases": [
{"name": "test_jl_distance_preservation", "verifies": "VAL-SQJL-001"},
{"name": "test_sign_bit_overhead", "verifies": "VAL-SQJL-002"},
{"name": "test_unbiased_estimator", "verifies": "VAL-SQJL-003"},
{"name": "test_spatial_preservation", "verifies": "VAL-SQJL-004"}
]},
{"file": "tests/test_mamp.py", "cases": [
{"name": "test_layer_precision", "verifies": "VAL-MAMP-001"},
{"name": "test_timestep_allocation", "verifies": "VAL-MAMP-002"}
]}
]
},
"discoveredIssues": [
{
"severity": "low",
"description": "Recursive polar transform O(N log N) - could optimize with iterative approach",
"suggestedFix": "Consider GPU/CUDA for production"
}
]
}