Paddleocr Doc Parsing

Name: Paddleocr Doc Parsing
Author: Arry8

Use this skill to extract structured Markdown/JSON from PDFs and document images—tables with cell-level precision, formulas as LaTeX, figures, seals, charts, headers/footers, multi-column layout and correct reading order. Trigger terms: 文档解析, 版面分析, 版面还原, 表格提取, 公式识别, 多栏排版, 扫描件结构化, 发票, 财报, 复杂 PDF, PDF转Markdown, 图表, 阅读顺序; reading order, formula, LaTeX, layout parsing, structure extraction, PP-StructureV3, PaddleOCR-VL.

Arry8 · 1 star · 3 Apr 2026

Categories: Documents

PaddleOCR Document Parsing Skill

When to Use This Skill

Trigger keywords (routing): Bilingual trigger terms (Chinese and English) are listed in the YAML description above—use that field for discovery and routing.

Use this skill for:

- Documents with tables (invoices, financial reports, spreadsheets)
- Documents with mathematical formulas (academic papers, scientific documents)
- Documents with charts and diagrams
- Multi-column layouts (newspapers, magazines, brochures)
- Complex document structures requiring layout analysis
- Any document requiring structured understanding

Do not use for:

- Simple text-only extraction
- Quick OCR tasks where speed is critical
- Screenshots or simple images with clear text

Installation

Scripts declare their dependencies inline. No separate install step is needed; uv resolves dependencies automatically when a script is run:

```
uv run scripts/layout_caller.py --help
```

How to Use This Skill

Basic Workflow

Identify the input source:
- User provides URL: Use the --file-url parameter
- User provides local file path: Use the --file-path parameter
Execute document parsing:
```
uv run scripts/layout_caller.py --file-url "URL provided by user" --pretty
```
Or for local files:
```
uv run scripts/layout_caller.py --file-path "file path" --pretty
```
Optional: explicitly set file type:
```
uv run scripts/layout_caller.py --file-url "URL provided by user" --file-type 0 --pretty
```
- --file-type 0: PDF
- --file-type 1: image
- If omitted, the type is auto-detected from the file extension. For local files, a recognized extension (.pdf, .png, .jpg, .jpeg, .bmp, .tiff, .tif, .webp) is required; otherwise pass --file-type explicitly. For URLs with unrecognized extensions, the service attempts inference.
Performance note: Parsing time scales with document complexity. Single-page images typically complete in 1-5 seconds; large PDFs (50+ pages) may take several minutes. Allow adequate time before assuming a timeout.
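The extension rule for local files can be sketched as follows. This is only an illustration of the documented behavior, not the script's actual detection code: `infer_file_type` is a hypothetical helper, and the extension sets are copied from the list above.

```python
from pathlib import Path
from typing import Optional

# Extensions listed in this document; the helper itself is hypothetical.
PDF_EXTENSIONS = {".pdf"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tiff", ".tif", ".webp"}


def infer_file_type(path: str) -> Optional[int]:
    """Return 0 for PDF, 1 for image, or None if the extension is not recognized."""
    suffix = Path(path).suffix.lower()
    if suffix in PDF_EXTENSIONS:
        return 0
    if suffix in IMAGE_EXTENSIONS:
        return 1
    # Unrecognized extension: the caller should pass --file-type explicitly.
    return None
```

When the helper returns None for a local file, pass --file-type 0 or --file-type 1 on the command line instead of relying on auto-detection.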

Default behavior: save raw JSON to a temp file:
- If --output is omitted, the script saves automatically under the system temp directory
- Default path pattern: <system-temp>/paddleocr/doc-parsing/results/result_<timestamp>_<id>.json
- If --output is provided, it overrides the default temp-file destination
- If --stdout is provided, JSON is printed to stdout and no file is saved
- In save mode, the script prints the absolute saved path on stderr: Result saved to: /absolute/path/...
- In default/custom save mode, read and parse the saved JSON file before responding
- Use --stdout only when you explicitly want to skip file persistence
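A minimal sketch of the save-mode handling, assuming the caller has captured the script's stderr as text. The helper names `find_saved_path` and `load_saved_result` are hypothetical; only the `Result saved to:` prefix comes from the behavior described above.

```python
import json
import re
from pathlib import Path
from typing import Optional

# Matches the line the script prints on stderr in save mode.
SAVED_LINE = re.compile(r"^Result saved to:[ \t]*(.+)$", re.MULTILINE)


def find_saved_path(stderr_text: str) -> Optional[Path]:
    """Return the saved JSON path reported on stderr, or None if absent."""
    match = SAVED_LINE.search(stderr_text)
    return Path(match.group(1).strip()) if match else None


def load_saved_result(stderr_text: str) -> dict:
    """Locate the saved file from stderr output and parse it as JSON."""
    path = find_saved_path(stderr_text)
    if path is None:
        raise ValueError("no 'Result saved to:' line found in stderr")
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)
```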
Parse JSON response:
- Check the ok field: true means success, false means error
- The output contains complete document data: text, tables, formulas (LaTeX), figures, seals, headers/footers, and reading order
- Use the appropriate field based on what the user needs:
  - text — full document text across all pages
  - result.result.layoutParsingResults[n].markdown.text — page-level markdown
  - result.result.layoutParsingResults[n].prunedResult — structured layout data with positions and confidence
- Handle errors: If ok is false, display error.message
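The checks above can be sketched in a few lines. The field names (`ok`, `error.message`, `result.result.layoutParsingResults[n].markdown.text`) are the ones listed in this section; the function name `page_markdown` is hypothetical.

```python
from typing import Any, Dict, List


def page_markdown(response: Dict[str, Any]) -> List[str]:
    """Return page-level markdown strings, raising if the response reports an error."""
    if not response.get("ok"):
        error = response.get("error") or {}
        raise RuntimeError(error.get("message", "unknown error"))
    pages = response["result"]["result"]["layoutParsingResults"]
    return [page["markdown"]["text"] for page in pages]
```

Joining the returned list with blank lines gives a document-level markdown view in page order.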
Present results to user:
- Display content based on what the user requested (see "Complete Output Display" below)
- If the content is empty, the document may contain no extractable text
- In save mode, always tell the user the saved file path and that full raw JSON is available there

Complete Output Display

Correct approach: show the complete content.

User: "Extract all the text from this document"
Agent: I've parsed the complete document. Here's all the extracted text:

[Display entire text field or concatenated regions in reading order]

Document Statistics:
- Total regions: 25
- Text blocks: 15
- Tables: 3
- Formulas: 2
Quality: Excellent (confidence: 0.92)

Incorrect approach: do not truncate the content when the user asked for all of it.

User: "Extract all the text"
Agent: "I found a document with multiple sections. Here's the beginning:
'Introduction...' (content truncated for brevity)"

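Usage Examples

Parse a document from a URL:
```
uv run scripts/layout_caller.py \
  --file-url "https://example.com/paper.pdf" \
  --pretty
```
Parse a local file:
```
uv run scripts/layout_caller.py \
  --file-path "./financial_report.pdf" \
  --pretty
```
Print JSON to stdout without saving a file:
```
uv run scripts/layout_caller.py \
  --file-url "URL" \
  --stdout \
  --pretty
```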

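First-Time Configuration

If the API URL has not been configured, the script returns an error like the following:
```
{
  "ok": false,
  "text": "",
  "result": null,
  "error": {
    "code": "CONFIG_ERROR",
    "message": "PADDLEOCR_DOC_PARSING_API_URL not configured. Get your API at: https://paddleocr.com"
  }
}
```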

Handling Large Files

Optimize Large Images Before Parsing
```
uv run scripts/optimize_file.py input.png output.jpg --quality 85
uv run scripts/layout_caller.py --file-path "output.jpg" --pretty
```

Use URL for Large Local Files (Recommended)
```
uv run scripts/layout_caller.py --file-url "https://your-server.com/large_file.pdf"
```

Process Specific Pages (PDF Only)
```
# Extract pages 1-5
uv run scripts/split_pdf.py large.pdf pages_1_5.pdf --pages "1-5"

# Mixed ranges are supported
uv run scripts/split_pdf.py large.pdf selected_pages.pdf --pages "1-5,8,10-12"

# Then process the smaller file
uv run scripts/layout_caller.py --file-path "pages_1_5.pdf"
```
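To make the `--pages` syntax concrete, here is a small sketch of how a mixed range string such as "1-5,8,10-12" could expand into page numbers. It assumes 1-based, inclusive ranges; how split_pdf.py actually parses the value is not shown in this document, and `parse_page_spec` is a hypothetical name.

```python
from typing import List


def parse_page_spec(spec: str) -> List[int]:
    """Expand a spec like "1-5,8,10-12" into a list of 1-based page numbers."""
    pages: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start < 1 or end < start:
                raise ValueError(f"invalid page range: {part!r}")
            # Ranges are treated as inclusive on both ends (assumption).
            pages.extend(range(start, end + 1))
        else:
            page = int(part)
            if page < 1:
                raise ValueError(f"invalid page number: {part!r}")
            pages.append(page)
    return pages
```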

Tips for Better Results

Large or high-resolution images: Compress with optimize_file.py before parsing, since oversized inputs can degrade layout detection:
```
uv run scripts/optimize_file.py input.png optimized.jpg --quality 85
```
Check confidence: result.result.layoutParsingResults[n].prunedResult includes confidence scores per layout element — low values indicate regions worth reviewing
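One way to surface low-confidence regions is to walk prunedResult and collect every element whose score falls below a threshold. The exact layout of prunedResult is not described in this document, so the sketch below makes two loud assumptions: that confidence is stored under a key named `score`, and that 0.5 is a reasonable default cut-off. Both are illustrative, as are the function names.

```python
from typing import Any, Iterator, List, Tuple


def iter_scored(node: Any, path: str = "prunedResult") -> Iterator[Tuple[str, float]]:
    """Yield (path, score) for every dict that carries a numeric "score" key (assumed name)."""
    if isinstance(node, dict):
        score = node.get("score")
        if isinstance(score, (int, float)) and not isinstance(score, bool):
            yield path, float(score)
        for key, value in node.items():
            yield from iter_scored(value, f"{path}.{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_scored(item, f"{path}[{index}]")


def low_confidence(pruned_result: Any, threshold: float = 0.5) -> List[Tuple[str, float]]:
    """Return the paths and scores of elements below the (assumed) threshold."""
    return [(path, score) for path, score in iter_scored(pruned_result) if score < threshold]
```

The returned paths point at the elements worth reviewing by hand; adjust the key name and threshold once the real structure of the saved JSON has been inspected.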

Testing the Skill
```
uv run scripts/smoke_test.py
uv run scripts/smoke_test.py --skip-api-test
uv run scripts/smoke_test.py --test-url "https://..."
```
