Diagnoses and troubleshoots ContractDiff diffing-engine and coordinate-alignment issues related to PaddleOCR VL layouts.
This skill provides a systematic approach to diagnosing and resolving issues in the ContractDiff engine where text differences fail to align correctly and coordinate highlights unexpectedly cover the entire page when using the paddleocr parser.
The ContractDiff comparison engine relies heavily on fine-grained, line-level or short paragraph-level bounding boxes (bboxes). These are critical for aligning text differences and for placing accurate coordinate highlights.
PaddleOCR-VL-1.5 handles layout processing through the UseLayoutDetection parameter:
true (Classical Layout): Crops the page into fine-grained fragments (text, tables). Returns precise bounding boxes and properly chunks text. Required for the Diff engine to function correctly.
false (Native VL Multimodal): Uses the VL model to natively recognize full-page multi-modal structures (retaining cross-page tables and title hierarchies). However, it outputs the entire page as a single ocr block with a bounding box spanning [0, 0, pageWidth, pageHeight]. This destroys the granularity necessary for the contract comparison engine.
Check backend/config.yaml and backend/config/config.go.
Look for use_layout_detection under paddleocr. If it is set to false, or is commented out (in which case the Go config layer defaults it to false), this is the root cause of the whole-page coordinate issue.
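The config check above can be approximated with a small script. This is a minimal sketch, not part of the skill: the function name layout_detection_enabled is hypothetical, and it assumes paddleocr is a top-level key in backend/config.yaml with use_layout_detection indented beneath it. It scans lines by hand instead of using a YAML library, so it only suits simple configs.

```python
def layout_detection_enabled(config_text: str) -> bool:
    """Return True only if use_layout_detection under paddleocr is explicitly true.

    Assumed layout (hypothetical, adjust to the real file):
        paddleocr:
          use_layout_detection: true
    """
    in_section = False
    for raw in config_text.splitlines():
        # Drop comments so that a commented-out key counts as missing.
        # Naive: this would also cut a value that contains a '#'.
        line = raw.split("#", 1)[0].rstrip()
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip())
        if indent == 0:
            # A new top-level key starts; track whether it is the paddleocr section.
            in_section = line.strip() == "paddleocr:"
            continue
        if in_section:
            key, _, value = line.strip().partition(":")
            if key == "use_layout_detection":
                return value.strip().lower() == "true"
    # Key absent or commented out: treated as false, matching the default described above.
    return False
```

For example, passing the contents of backend/config.yaml to layout_detection_enabled and getting False would point to the whole-page coordinate issue described above.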
Use the provided Node.js diagnostic script to hit the local API and verify the parsed output structure.
node .agents/skills/paddleocr_troubleshooting/scripts/test_parser.js <path_to_pdf>
The script will upload the PDF to the local backend at http://localhost:28080, poll for completion, and analyze the resulting paragraphs.
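The kind of analysis involved can be sketched as follows. This is an illustration, not the actual test_parser.js logic: the function name find_whole_page_blocks, the paragraph fields page and bbox, the [x0, y0, x1, y1] box order, and the 2% tolerance are all assumptions about the parsed output, so adapt them to the real response shape.

```python
from collections import defaultdict


def find_whole_page_blocks(paragraphs, page_sizes, tolerance=0.02):
    """Return the page numbers whose single paragraph spans (nearly) the whole page.

    paragraphs: list of dicts with hypothetical keys "page" and "bbox" ([x0, y0, x1, y1]).
    page_sizes: dict mapping page number to (width, height) in the same units as bbox.
    """
    boxes_by_page = defaultdict(list)
    for paragraph in paragraphs:
        boxes_by_page[paragraph["page"]].append(paragraph["bbox"])

    suspicious = []
    for page, boxes in sorted(boxes_by_page.items()):
        if len(boxes) != 1:
            continue  # several paragraphs on the page: layout detection produced fragments
        x0, y0, x1, y1 = boxes[0]
        width, height = page_sizes[page]
        covered = ((x1 - x0) * (y1 - y0)) / (width * height)
        if covered >= 1.0 - tolerance:
            suspicious.append(page)
    return suspicious
```

A page is flagged only when it has exactly one paragraph and that paragraph's box covers almost the full page area, which matches the [0, 0, pageWidth, pageHeight] symptom of disabled layout detection.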
If the script prints [WARN] Page X only has ONE paragraph covering the entire page!, then layout detection is missing or disabled.
To restore proper diffing and accurate coordinate highlights, modify the active backend/config.yaml: