Convert PDF documents to AI-accessible markdown format using IBM's Docling library. This skill should be used when the user needs to extract content from PDFs including text, figures, and tables in a structured markdown format. It handles scientific papers, technical documents, reports, and any PDF requiring content extraction for AI processing or analysis.
Convert PDF documents to structured markdown using IBM's Docling library. Extract complete document content including text, figures (as PNG files), and tables (as separate markdown files) in an AI-accessible format optimized for further processing and analysis.
Use this skill when the user needs to:
Convert a PDF using the bundled script:
uv run --python 3.10 scripts/convert_pdf.py input.pdf output_folder
This produces:
full_document.md - Complete markdown with cleaned references
figures/ - Numbered PNG files (figure_001.png, figure_002.png, etc.)
tables/ - Individual markdown tables (table_001.md, table_002.md, etc.)
metadata.json - Document statistics and conversion timing
The script uses uv to manage dependencies automatically, so no manual installation is required: the script's inline metadata declares all required packages.
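For reference, uv reads inline dependencies from a specially formatted comment block at the top of a script, using the inline script metadata format standardized in PEP 723. A minimal sketch of such a header follows. The Python bound and the docling entry are assumptions and are not copied from the bundled script, whose actual dependency list may be longer or pinned.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "docling",  # assumed entry; the real script may list more packages or pin versions
# ]
# ///
```

When a header like this is present, uv run builds an isolated environment containing the listed packages before executing the script. That is why no separate install step is needed.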
Run the conversion script with the following syntax:
uv run --python 3.10 scripts/convert_pdf.py <pdf_file> <output_folder> [options]
Required arguments:
pdf_file - Path to the input PDF file
output_folder - Directory where output will be saved
Optional arguments:
--image-resolution-scale F - Scale factor for extracted images (default: 2.0)
Examples:
# Basic conversion
uv run --python 3.10 scripts/convert_pdf.py paper.pdf output/
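The documented interface maps naturally onto Python's argparse. The sketch below is a hypothetical reconstruction of that interface and not the bundled script's code; the build_parser name is illustrative. It shows how the two positional arguments and the optional scale flag, with its 2.0 default, fit together.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented command-line interface."""
    parser = argparse.ArgumentParser(
        description="Convert a PDF to markdown, figures, and tables with Docling."
    )
    # Required positional arguments, in the documented order.
    parser.add_argument("pdf_file", help="Path to the input PDF file")
    parser.add_argument("output_folder", help="Directory where output will be saved")
    # Optional flag; argparse exposes it as args.image_resolution_scale.
    parser.add_argument(
        "--image-resolution-scale",
        type=float,
        default=2.0,
        metavar="F",
        help="Scale factor for extracted images (default: 2.0)",
    )
    return parser
```

For example, parsing ["paper.pdf", "output/", "--image-resolution-scale", "3"] yields image_resolution_scale == 3.0. Omitting the flag leaves it at 2.0.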
The conversion creates a structured output directory:
output_folder/
├── full_document.md # Complete markdown (cleaned references)
├── figures/ # PNG images
│ ├── figure_001.png
│ ├── figure_002.png
│ └── ...
├── tables/ # Markdown tables
│ ├── table_001.md
│ ├── table_002.md
│ └── ...
└── metadata.json # Conversion statistics
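Downstream code usually needs to locate these files after a run. The following is a minimal sketch that assumes only the layout shown above, with hypothetical helper names summarize_output and load_tables. It gathers the paths in reading order. Because the file names are zero-padded, a plain sort keeps figure_002 before figure_010.

```python
from pathlib import Path


def summarize_output(output_folder):
    """Collect the paths the converter is documented to produce."""
    root = Path(output_folder)
    return {
        "markdown": root / "full_document.md",
        "metadata": root / "metadata.json",
        # Zero-padded names (figure_001, figure_002, ...) sort correctly as text.
        "figures": sorted((root / "figures").glob("figure_*.png")),
        "tables": sorted((root / "tables").glob("table_*.md")),
    }


def load_tables(output_folder):
    """Return the text of every extracted markdown table, in numbered order."""
    return [
        path.read_text(encoding="utf-8")
        for path in summarize_output(output_folder)["tables"]
    ]
```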
Key features:
Figures are saved as numbered PNG files in the figures/ directory
The scripts/convert_pdf.py script is a standalone Python script with inline dependencies that:
Important: The script is designed to be run with uv run, which handles environment creation and dependency management automatically. Do not run it directly with python3 unless its dependencies are already installed.
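To make the pipeline concrete, here is a hedged sketch of how a conversion like this can be written against Docling's Python API, using DocumentConverter with PdfPipelineOptions as in Docling's figure-export example. It is not the bundled script. Reference cleaning and metadata.json are omitted, the convert_pdf and numbered_name names are hypothetical, and mapping --image-resolution-scale onto images_scale is an assumption. Docling attribute names may also vary between releases. docling is imported inside the function, so the naming helper works even without it installed.

```python
from pathlib import Path


def numbered_name(prefix: str, index: int, suffix: str) -> str:
    """Zero-padded, 1-based file names such as figure_001.png or table_002.md."""
    return f"{prefix}_{index:03d}.{suffix}"


def convert_pdf(pdf_file, output_folder, image_resolution_scale=2.0):
    """Hypothetical Docling conversion; requires the docling package at call time."""
    # Lazy imports: this module loads even when docling is not installed.
    from docling.datamodel.base_models import InputFormat
    from docling.datamodel.pipeline_options import PdfPipelineOptions
    from docling.document_converter import DocumentConverter, PdfFormatOption

    options = PdfPipelineOptions()
    options.images_scale = image_resolution_scale  # assumed to match the CLI flag
    options.generate_picture_images = True         # keep bitmaps for figures
    converter = DocumentConverter(
        format_options={InputFormat.PDF: PdfFormatOption(pipeline_options=options)}
    )
    doc = converter.convert(pdf_file).document

    out = Path(output_folder)
    (out / "figures").mkdir(parents=True, exist_ok=True)
    (out / "tables").mkdir(parents=True, exist_ok=True)
    (out / "full_document.md").write_text(doc.export_to_markdown(), encoding="utf-8")

    for i, picture in enumerate(doc.pictures, start=1):
        image = picture.get_image(doc)  # PIL image, or None if unavailable
        if image is not None:
            image.save(out / "figures" / numbered_name("figure", i, "png"), "PNG")

    for i, table in enumerate(doc.tables, start=1):
        # Recent docling-core releases take the parent document here.
        markdown = table.export_to_markdown(doc=doc)
        (out / "tables" / numbered_name("table", i, "md")).write_text(
            markdown, encoding="utf-8"
        )
```

In this sketch, figures whose image cannot be retrieved are skipped, so the numbering may have gaps. The bundled script may handle that case differently.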