PaddleOCR Troubleshooting Skill

This skill provides a systematic approach to diagnosing and resolving two symptoms in the ContractDiff engine when the paddleocr parser is used: text differences that fail to align correctly, and highlight coordinates that unexpectedly cover the entire page.

Background Context

The ContractDiff comparison engine relies heavily on fine-grained, line-level or short paragraph-level bounding boxes (bboxes). These are critical for:

Accurate Diffing: The two-stage diff algorithm (paragraph alignment -> character diff) requires reasonably sized chunks. Huge chunks cause misalignment.
Accurate Highlighting: The frontend PDF canvas expects coordinates bounding actual text lines.
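
To illustrate why chunk size matters, here is a minimal sketch of a two-stage diff (paragraph alignment, then character diff) built on Python's standard difflib. It is not the ContractDiff implementation; the function name two_stage_diff and the positional pairing of replaced paragraphs are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def two_stage_diff(old_paras, new_paras):
    """Align paragraphs first, then diff characters inside replaced pairs.

    Hypothetical helper for illustration only; the real engine may differ.
    """
    changes = []
    # Stage 1: align whole paragraphs as opaque items.
    para_matcher = SequenceMatcher(a=old_paras, b=new_paras, autojunk=False)
    for tag, i1, i2, j1, j2 in para_matcher.get_opcodes():
        if tag != "replace":
            continue
        # Simplification: pair replaced paragraphs by position.
        for old, new in zip(old_paras[i1:i2], new_paras[j1:j2]):
            # Stage 2: character-level diff inside one aligned pair.
            char_matcher = SequenceMatcher(a=old, b=new, autojunk=False)
            char_ops = [
                (op, old[a1:a2], new[b1:b2])
                for op, a1, a2, b1, b2 in char_matcher.get_opcodes()
                if op != "equal"
            ]
            changes.append((old, new, char_ops))
    return changes


old = ["Term: 12 months.", "Fee: $100."]
new = ["Term: 12 months.", "Fee: $200."]
result = two_stage_diff(old, new)
```

With line-sized chunks, stage 1 isolates the single changed paragraph and stage 2 only has to compare two short strings. If the parser returned the whole page as one chunk, stage 1 would have nothing to align and every edit would be reported against that one page-sized block.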

PaddleOCR-VL-1.5 handles layout processing through the UseLayoutDetection parameter:

true (Classical Layout): Crops the page into fine-grained fragments (text, tables). Returns precise bounding boxes and properly chunks text. Required for the Diff engine to function correctly.

Step-by-Step Troubleshooting

1. Verify Backend Configuration
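
One possible way to check this step, assuming the backend assembles its OCR request options as a plain dict. Only the UseLayoutDetection key comes from this document; the payload shape and the function name layout_detection_enabled are hypothetical.

```python
def layout_detection_enabled(payload: dict) -> bool:
    """Return True if UseLayoutDetection is set to true in the payload.

    Accepts the boolean True or the string "true" (any case), because
    values read from environment variables or JSON-ish configs are
    sometimes strings. A missing key counts as disabled.
    """
    value = payload.get("UseLayoutDetection")
    if isinstance(value, str):
        return value.strip().lower() == "true"
    return value is True


# Hypothetical payloads for illustration.
good_payload = {"UseLayoutDetection": True}
bad_payload = {"UseLayoutDetection": "false"}
missing_payload = {}
```

Treating a missing key as disabled is a deliberate choice in this sketch: it forces the configuration to state the setting explicitly instead of relying on a default.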

2. Run Diagnostic Script
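
A diagnostic of this kind could look like the sketch below. It flags parsed blocks whose bounding box covers a large share of the page, which is the symptom described above. The block format (a dict with a "bbox" of [x0, y0, x1, y1] in page units), the function name find_oversized_blocks, and the 0.5 threshold are all assumptions, not the project's actual script.

```python
def find_oversized_blocks(blocks, page_width, page_height, max_ratio=0.5):
    """Return (index, area_ratio) for blocks covering more than max_ratio of the page.

    Assumes each block is a dict with "bbox": [x0, y0, x1, y1].
    """
    page_area = page_width * page_height
    oversized = []
    for index, block in enumerate(blocks):
        x0, y0, x1, y1 = block["bbox"]
        # Clamp negative extents so malformed boxes count as zero area.
        area = max(0.0, x1 - x0) * max(0.0, y1 - y0)
        ratio = area / page_area
        if ratio > max_ratio:
            oversized.append((index, round(ratio, 3)))
    return oversized


# Hypothetical parser output: one page-sized block and one line-sized block.
sample_blocks = [
    {"bbox": [0, 0, 600, 800]},
    {"bbox": [50, 100, 550, 120]},
]
flagged = find_oversized_blocks(sample_blocks, page_width=600, page_height=800)
```

Any flagged block suggests the parser returned page-level rather than line-level regions, which points back to the UseLayoutDetection setting checked in step 1.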

3. Apply Resolution
