Use when converting PDF documents to Markdown format for documentation or content processing.
required_canon_version: >=3.0.0
Version: 0.1.0
Status: Draft
Use when converting PDF documents to Markdown format, typically for documentation purposes or to make PDF content more accessible and editable.
Input: an `input.json` file with the following structure:

```json
{
  "pdf_path": "path/to/document.pdf",
  "output_path": "path/to/output.md",
  "options": {
    "extract_images": false,
    "preserve_formatting": true,
    "page_breaks": "---"
  }
}
```
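To illustrate one way the input file might be consumed, the following Python sketch reads `input.json`, rejects a missing `pdf_path` or `output_path`, and fills in the documented option defaults. The function name `load_input`, the type checks, and the exception types it raises are assumptions for illustration. They are not part of this skill's contract.

```python
import json

# Defaults for the optional "options" fields, as documented for this skill.
DEFAULT_OPTIONS = {
    "extract_images": False,
    "preserve_formatting": True,
    "page_breaks": "---",
}

# Expected type of each option (hypothetical validation, not a stated requirement).
OPTION_TYPES = {
    "extract_images": bool,
    "preserve_formatting": bool,
    "page_breaks": str,
}


def load_input(path: str) -> dict:
    """Read input.json, check the required fields, and apply option defaults."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)

    for key in ("pdf_path", "output_path"):
        value = data.get(key)
        if not isinstance(value, str) or not value:
            raise ValueError(f"'{key}' is required and must be a non-empty string")

    # Merge user-supplied options over the defaults; a missing or null
    # "options" object falls back to the defaults entirely.
    options = {**DEFAULT_OPTIONS, **(data.get("options") or {})}
    for key, expected in OPTION_TYPES.items():
        if not isinstance(options[key], expected):
            raise TypeError(f"options.{key} must be of type {expected.__name__}")

    return {
        "pdf_path": data["pdf_path"],
        "output_path": data["output_path"],
        "options": options,
    }
```

Merging into a fresh dictionary keeps `DEFAULT_OPTIONS` unmodified between calls.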
Parameters:
- `pdf_path` (required, string): Absolute or relative path to the input PDF file.
- `output_path` (required, string): Path where the Markdown file will be written.
- `options.extract_images` (optional, boolean): Whether to extract embedded images (default: `false`).
- `options.preserve_formatting` (optional, boolean): Attempt to preserve text formatting such as bold and italic (default: `true`).
- `options.page_breaks` (optional, string): String inserted between pages (default: `"---"`).

Output: a Markdown file written to `output_path` containing:
```markdown
# Document Title

Section header

Paragraph text with **bold** and *italic* formatting.

| Column 1 | Column 2 |
|----------|----------|
| Data 1 | Data 2 |

---

Page 2 content continues...
```
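The following minimal sketch shows how the tables and page breaks in this output could be produced. It assumes that tables arrive as lists of rows whose cells are strings or `None`, which is the shape pdfplumber's `extract_tables()` returns, and that the first row is the header. `table_to_markdown` and `join_pages` are hypothetical helper names.

```python
from typing import Optional, Sequence


def _cell(value: Optional[str]) -> str:
    """Normalize one cell: None becomes empty, newlines flatten, pipes are escaped."""
    text = "" if value is None else str(value)
    return text.replace("\n", " ").replace("|", "\\|").strip()


def table_to_markdown(rows: Sequence[Sequence[Optional[str]]]) -> str:
    """Render a table (first row = header) as a GitHub-style pipe table."""
    if not rows:
        return ""
    # Pad ragged rows so every row has the same number of columns.
    width = max(len(row) for row in rows)
    padded = [[_cell(c) for c in row] + [""] * (width - len(row)) for row in rows]
    header, body = padded[0], padded[1:]
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",
    ]
    lines += ["| " + " | ".join(row) + " |" for row in body]
    return "\n".join(lines)


def join_pages(pages: Sequence[str], page_break: str = "---") -> str:
    """Join per-page Markdown, separating pages with the page_breaks string."""
    # Blank lines around the separator keep "---" from turning the previous
    # line into a setext heading.
    separator = f"\n\n{page_break}\n\n"
    return separator.join(page.strip() for page in pages) + "\n"
```

For example, `table_to_markdown([["Column 1", "Column 2"], ["Data 1", "Data 2"]])` yields the three-line table from the sample output, using a shorter `|---|---|` delimiter row.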
Dependencies:
- `pdfplumber>=0.9.0` - PDF text and structure extraction

Test fixtures:
- `fixtures/basic/` - Simple PDF conversion test
- `fixtures/multi-page/` - Multi-page document with page breaks
- `fixtures/tables/` - PDF containing tables, for testing table extraction
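A text-only conversion with pdfplumber might look like the sketch below. It relies on `pdfplumber.open`, `pdf.pages`, and `page.extract_text()`. The function name `convert_pdf_to_markdown` is hypothetical, and the sketch deliberately omits table extraction, image extraction, and bold/italic detection.

```python
from pathlib import Path


def convert_pdf_to_markdown(pdf_path: str, output_path: str, page_break: str = "---") -> int:
    """Extract plain text from each page and write it to output_path as Markdown.

    Returns the number of pages converted.
    """
    source = Path(pdf_path)
    if not source.is_file():
        raise FileNotFoundError(f"PDF not found: {pdf_path}")

    # Imported here so the path check above works even without the dependency.
    import pdfplumber

    pages = []
    with pdfplumber.open(str(source)) as pdf:
        for page in pdf.pages:
            # extract_text() can return None for pages without a text layer.
            pages.append((page.extract_text() or "").strip())

    # Blank lines around the separator keep it from forming a setext heading.
    markdown = f"\n\n{page_break}\n\n".join(pages) + "\n"
    Path(output_path).write_text(markdown, encoding="utf-8")
    return len(pages)
```

Usage, with the paths from the sample `input.json`: `convert_pdf_to_markdown("path/to/document.pdf", "path/to/output.md")`.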