This skill provides guidance for implementing custom compression encoders that must be compatible with existing decoders (especially arithmetic coding). It should be used when the task requires writing a compressor/encoder that produces output compatible with a given decompressor/decoder, or when implementing arithmetic coding or similar bit-level compression schemes.
This skill guides the implementation of compression encoders that must produce output compatible with an existing decoder. The key challenge is making encoder state transitions mirror what the decoder expects, exactly. This is especially critical for arithmetic coding, where even minor state drift causes decompression failures.
Critical Success Factors
1. Decoder-First Analysis
Before writing any encoder code:
Create a decoder simulator in Python - Reimplement the decoder logic to enable step-by-step tracing
Trace with minimal inputs - Create the smallest possible test cases (single bit, single byte) and manually compute expected values
Map the state machine - Document exactly how fraction, range, low, and other state variables change on each operation
Derive the byte output formula mathematically - Work out the exact relationship between encoder state and output bytes on paper before coding
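The mapping below is an example of that on-paper derivation. It assumes a hypothetical LZMA-style binary range coder and is not taken from any particular target decoder. The assumptions are an 11-bit probability p that the next bit is 0, a 32-bit range R, an encoder lower bound L, and a decoder code value C. Substitute the real decoder's variables (for example fraction instead of C) and formulas.

$$
B = \left\lfloor \frac{R}{2^{11}} \right\rfloor \cdot p
$$

$$
\begin{aligned}
\text{encode } 0 &: \quad R \leftarrow B, & L &\leftarrow L \\
\text{encode } 1 &: \quad R \leftarrow R - B, & L &\leftarrow L + B \\
\text{decode} &: \quad \text{bit} = 0 \iff C < B \\
\text{decoded } 0 &: \quad R \leftarrow B, & C &\leftarrow C \\
\text{decoded } 1 &: \quad R \leftarrow R - B, & C &\leftarrow C - B \\
\text{while } R < 2^{24} &: \quad R \leftarrow 2^{8} R, & L &\leftarrow 2^{8} \left(L \bmod 2^{24}\right), \quad C \leftarrow 2^{8} C + \text{next byte}
\end{aligned}
$$

Under these assumptions the invariant is C = V - L, where V is the value of the emitted bytes inside the decoder's current window, so 0 <= C < R. Both sides apply the same update to R. That makes R the first variable worth comparing when tracing. The top byte of L (plus any carry out of bit 32) is what eventually becomes an output byte.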
2. Incremental Complexity Approach
Build encoder functionality in strict order of complexity:
Single bit encoding - Verify encoding a single 0 bit, then a single 1 bit
Single integer encoding - Verify encoding one small integer
Multiple values - Encode a sequence and verify each step
Full compression - Only attempt complete file compression after simpler cases work
Never skip to full file compression before validating simpler cases.
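The ladder can be automated so that a failure is reported at the simplest stage that breaks. The sketch below is a minimal version built on several assumptions. The names run_ladder and int_to_bits are hypothetical. The codec under test is wrapped as a roundtrip(bits) -> bits function that encodes and then decodes. Integers are written as 8-bit MSB-first bit lists purely for illustration.

```python
from typing import Callable, List, Optional

Bits = List[int]


def int_to_bits(value: int, width: int) -> Bits:
    # Hypothetical fixed-width, MSB-first representation of an integer.
    return [(value >> i) & 1 for i in reversed(range(width))]


def run_ladder(roundtrip: Callable[[Bits], Bits]) -> Optional[str]:
    """Run stages from simplest to hardest; return the first failing stage, or None."""
    stages = [
        ("single 0 bit", [[0]]),
        ("single 1 bit", [[1]]),
        ("single integer", [int_to_bits(v, 8) for v in (0, 1, 5, 255)]),
        ("multiple values", [int_to_bits(3, 8) + int_to_bits(200, 8) + int_to_bits(7, 8)]),
        # Deterministic pseudo-pattern standing in for "full compression".
        ("long input", [[(i * 7 + i // 3) % 2 for i in range(4096)]]),
    ]
    for name, cases in stages:
        for bits in cases:
            if roundtrip(bits) != bits:
                return name  # stop at the first (simplest) failing stage
    return None
```

For example, a roundtrip that drops the final bit of multi-bit inputs passes both single-bit stages and is reported as failing at "single integer". That points debugging at the right level of complexity.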
3. Dual Simulation Verification
Maintain both encoder and decoder implementations in Python that can be run side-by-side:
For each encoding operation:
1. Run encoder to produce output
2. Run decoder simulator consuming that output
3. Assert encoder state matches decoder state
4. Assert decoded value matches original value
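The sketch below shows that loop in Python for a hypothetical LZMA-style binary range coder with one adaptive probability. It uses 11-bit probabilities, a 32-bit range, and carry propagation through a cached byte. Encoder, Decoder, update and lockstep are illustrative names, not the API of any target decoder, so swap in the real decoder's arithmetic. The decoder reads several bytes ahead. For that reason the sketch encodes the whole sequence first, then replays the decoder and compares (range, prob) snapshots step by step.

```python
PROB_BITS = 11                    # probabilities are 11-bit fixed point
PROB_INIT = 1 << (PROB_BITS - 1)  # 1024, i.e. P(bit == 0) = 0.5
MOVE_BITS = 5                     # adaptation speed of the model
TOP = 1 << 24                     # renormalize when range drops below this

def update(prob, bit):
    # The model update must be identical in encoder and decoder.
    if bit == 0:
        return prob + (((1 << PROB_BITS) - prob) >> MOVE_BITS)
    return prob - (prob >> MOVE_BITS)

class Encoder:
    def __init__(self):
        self.low, self.range = 0, 0xFFFFFFFF
        self.cache, self.cache_size = 0, 1  # pending byte plus run of 0xFF bytes
        self.prob = PROB_INIT
        self.out = bytearray()
        self.trace = []                     # (range, prob) after each bit

    def _shift_low(self):
        # Emit pending bytes once a carry out of bit 32 can no longer change them.
        if self.low < 0xFF000000 or self.low >= 1 << 32:
            carry = self.low >> 32
            temp = self.cache
            while True:
                self.out.append((temp + carry) & 0xFF)
                temp = 0xFF
                self.cache_size -= 1
                if self.cache_size == 0:
                    break
            self.cache = (self.low >> 24) & 0xFF
        self.cache_size += 1
        self.low = (self.low & 0x00FFFFFF) << 8

    def encode_bit(self, bit):
        bound = (self.range >> PROB_BITS) * self.prob
        if bit == 0:
            self.range = bound
        else:
            self.low += bound
            self.range -= bound
        self.prob = update(self.prob, bit)
        while self.range < TOP:  # same condition the decoder uses
            self.range <<= 8
            self._shift_low()
        self.trace.append((self.range, self.prob))

    def finish(self):
        for _ in range(5):       # flush low and the cached byte
            self._shift_low()
        return bytes(self.out)

class Decoder:
    def __init__(self, data):
        self.data, self.pos = data, 0
        self.range, self.code = 0xFFFFFFFF, 0
        self.prob = PROB_INIT
        self.trace = []
        for _ in range(5):       # the first byte is always 0 and shifts out
            self.code = ((self.code << 8) | self._next_byte()) & 0xFFFFFFFF

    def _next_byte(self):
        b = self.data[self.pos] if self.pos < len(self.data) else 0
        self.pos += 1
        return b

    def decode_bit(self):
        bound = (self.range >> PROB_BITS) * self.prob
        if self.code < bound:
            self.range, bit = bound, 0
        else:
            self.code -= bound
            self.range -= bound
            bit = 1
        self.prob = update(self.prob, bit)
        while self.range < TOP:
            self.range <<= 8
            self.code = ((self.code << 8) | self._next_byte()) & 0xFFFFFFFF
        self.trace.append((self.range, self.prob))
        return bit

def lockstep(bits):
    """Encode bits, then decode and assert value and state match at every step."""
    enc = Encoder()
    for b in bits:
        enc.encode_bit(b)
    data = enc.finish()
    dec = Decoder(data)
    for i, expected in enumerate(bits):
        got = dec.decode_bit()
        assert got == expected, f"bit {i}: decoded {got}, expected {expected}"
        assert dec.trace[i] == enc.trace[i], f"state drift at bit {i}"
    return data
```

In this coder, range and the probability follow identical updates on both sides, so they can be compared directly. Encoder low and decoder code are only related through the invariant, so they are not compared here.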
4. State Synchronization
For arithmetic coding specifically:
Renormalization must match exactly - The encoder's decision to output a byte must align with when the decoder expects to read one
Count arrays must stay synchronized - Any probability model updates must happen identically in encoder and decoder
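Range calculations must use identical formulas - Integer division, order of operations, and overflow behavior must all match. For example, Python's // floors while C's / truncates toward zero, and Python integers never wrap around the way C's fixed-width types do.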
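When the reference decoder is written in C and the simulator in Python, two helpers cover most arithmetic mismatches. This is a small sketch assuming the decoder uses uint32_t state and C99 signed division. The names u32 and c_div are illustrative.

```python
MASK32 = 0xFFFFFFFF


def u32(x: int) -> int:
    """Emulate C uint32_t wrap-around on Python's unbounded integers."""
    return x & MASK32


def c_div(a: int, b: int) -> int:
    """C99 integer division: truncates toward zero (Python's // floors)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q
```

Formula order matters as much as the operators themselves. With range 0xFFFFFFFF and probability 1024, (range >> 11) * 1024 gives 0x7FFFFC00, but (range * 1024) >> 11 gives 0x7FFFFFFF. Copy the decoder's exact expression. For non-negative operands, c_div and // agree, so unsigned C code only needs the masking.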
Verification Strategy
Unit Tests Before Integration
Create explicit unit tests for each encoding primitive:
```python
def test_encode_bit():
    # Test that encode_bit(0) followed by decode produces 0
    # Test that encode_bit(1) followed by decode produces 1
    ...

def test_encode_integer():
    # Test encoding/decoding small integers
    # Test boundary cases (0, the maximum value for the chosen width)
    ...

def test_state_sync():
    # After each operation, encoder.low and encoder.range
    # must predict decoder.fraction and decoder.range
    ...
```
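When test_state_sync fails, the useful fact is the first step at which the two traces diverge. The sketch below shows a minimal helper for that. It assumes each side records a list of per-step state dicts. first_divergence and the pairs parameter are hypothetical. Fields that are not directly equal, such as encoder.low versus decoder.fraction, need a transform applied before being compared this way.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

State = Dict[str, Any]


def first_divergence(
    enc_trace: List[State],
    dec_trace: List[State],
    pairs: Sequence[Tuple[str, str]] = (("range", "range"),),
) -> Optional[Tuple[int, str, Any, Any]]:
    """Return (step, fields, encoder_value, decoder_value) for the first mismatch, or None."""
    for step, (enc, dec) in enumerate(zip(enc_trace, dec_trace)):
        for enc_key, dec_key in pairs:
            if enc.get(enc_key) != dec.get(dec_key):
                return step, f"{enc_key}/{dec_key}", enc.get(enc_key), dec.get(dec_key)
    if len(enc_trace) != len(dec_trace):
        # One side performed more operations than the other.
        step = min(len(enc_trace), len(dec_trace))
        return step, "<length>", len(enc_trace), len(dec_trace)
    return None
```

Reporting only the first mismatch keeps attention on the root cause, because every later step inherits the drift.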
Debug Output Protocol
Add comprehensive debug output to both encoder and decoder: