Skill File

PDF to Markdown Converter

Name: PDF to Markdown Converter
Author: Eduardo-Rezende-MDK

Convert entire PDF documents to clean, structured Markdown for full context loading. Use this skill when the user wants to extract ALL text from a PDF into context (not grep/search), when discussing or analyzing PDF content in full, when the user mentions "load the whole PDF", "bring the PDF into context", "read the entire PDF", or when partial extraction/grepping would miss important context. This is the preferred method for PDF text extraction over page-by-page or grep approaches.

Eduardo-Rezende-MDK · 0 stars · 2026-03-21

Occupation
Category: Documents

Skill Content

Extract complete PDF content as structured Markdown using IBM Docling AI, preserving:

Headers (detected by font size, converted to # tags)
Bold, italic, monospace formatting
Tables (high-accuracy extraction using TableFormer AI model)
Lists (ordered and unordered)
Multi-column layouts (correct reading order)
Code blocks
Images (extracted and copied next to output with relative paths)

When to Use This Skill

USE THIS when:

User wants the "whole PDF" or "entire document" in context
Analyzing, summarizing, or discussing PDF content
User says "load", "read", "bring in", "extract" a PDF
Grepping/searching would miss context or structure
PDF has tables, formatting, or structure to preserve

Environment Setup

This skill uses a dedicated virtual environment at ~/.claude/skills/pdf-to-markdown/.venv/ to avoid polluting the user's working directory.

First-Time Setup (if .venv doesn't exist)

cd ~/.claude/skills/pdf-to-markdown && uv venv .venv && uv pip install --python .venv/bin/python pymupdf docling docling-core

Verify Installation

~/.claude/skills/pdf-to-markdown/.venv/bin/python -c "import pymupdf; import docling; import docling_core; print('OK')"
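
The one-line import check can also be written as a small script that reports which packages are missing instead of stopping at the first failed import. This is only a sketch: the function name `missing_modules` is hypothetical and is not part of the skill's own scripts. It should be run with the venv interpreter so that it inspects the right environment.

```python
import importlib.util

# Import names of the three packages installed during setup.
REQUIRED_MODULES = ("pymupdf", "docling", "docling_core")


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level module names the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("OK" if not missing else "Missing: " + ", ".join(missing))
```

Unlike the `python -c` check, `find_spec` only locates the modules and does not import them, so it will not catch a package that is installed but broken.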

Quick Start

# Convert PDF to markdown (always extracts images)
~/.claude/skills/pdf-to-markdown/.venv/bin/python ~/.claude/skills/pdf-to-markdown/scripts/pdf_to_md.py document.pdf

# Output: document.md + images/ folder (next to the .md file)

Standard Workflow

Step 1: Ensure the skill venv exists

test -d ~/.claude/skills/pdf-to-markdown/.venv || (cd ~/.claude/skills/pdf-to-markdown && uv venv .venv && uv pip install --python .venv/bin/python pymupdf docling docling-core)
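
For callers that drive this step from Python rather than a shell, the same "create the venv only if it is missing" logic might look roughly like the sketch below. The helper names (`venv_python`, `needs_setup`, `setup_commands`) are hypothetical. The commands are returned as argument lists rather than executed, so the caller decides whether and how to run them. Absolute paths stand in for the `cd` in the shell line.

```python
from pathlib import Path

# Skill location taken from the commands in this document.
SKILL_DIR = Path.home() / ".claude" / "skills" / "pdf-to-markdown"


def venv_python(skill_dir: Path = SKILL_DIR) -> Path:
    """Path to the interpreter inside the skill's dedicated venv."""
    return skill_dir / ".venv" / "bin" / "python"


def needs_setup(skill_dir: Path = SKILL_DIR) -> bool:
    """True when the .venv directory is missing (the `test -d` check)."""
    return not (skill_dir / ".venv").is_dir()


def setup_commands(skill_dir: Path = SKILL_DIR) -> list[list[str]]:
    """The two uv invocations from the setup line, as argument lists."""
    return [
        ["uv", "venv", str(skill_dir / ".venv")],
        ["uv", "pip", "install", "--python", str(venv_python(skill_dir)),
         "pymupdf", "docling", "docling-core"],
    ]
```

A caller could pass each list to `subprocess.run(cmd, check=True)` when `needs_setup()` returns true.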

Step 2: Convert PDF to Markdown

~/.claude/skills/pdf-to-markdown/.venv/bin/python ~/.claude/skills/pdf-to-markdown/scripts/pdf_to_md.py "/path/to/document.pdf"

Step 3: Read the output

# Output is written to document.md in the same directory as the PDF
cat /path/to/document.md
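
The convert-then-read steps can be sketched as a thin Python wrapper around the same command line. The function names here are hypothetical. The only behavior assumed is what this document states: the script takes the PDF path as its argument and writes a `.md` file with the same stem next to the PDF.

```python
import subprocess
from pathlib import Path

SKILL_DIR = Path.home() / ".claude" / "skills" / "pdf-to-markdown"


def build_convert_command(pdf_path, skill_dir: Path = SKILL_DIR) -> list[str]:
    """Argument list equivalent to the conversion command line."""
    return [
        str(skill_dir / ".venv" / "bin" / "python"),
        str(skill_dir / "scripts" / "pdf_to_md.py"),
        str(pdf_path),
    ]


def markdown_path_for(pdf_path) -> Path:
    """document.pdf -> document.md in the same directory."""
    return Path(pdf_path).with_suffix(".md")


def convert_and_read(pdf_path) -> str:
    """Run the converter, then return the generated Markdown text."""
    subprocess.run(build_convert_command(pdf_path), check=True)
    return markdown_path_for(pdf_path).read_text(encoding="utf-8")
```

Passing the path as a single list element avoids the shell quoting that the command-line form needs for paths containing spaces.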

Caching

Cache Commands

# Clear cache for a specific PDF
~/.claude/skills/pdf-to-markdown/.venv/bin/python ~/.claude/skills/pdf-to-markdown/scripts/pdf_to_md.py document.pdf --clear-cache

# Clear entire cache
~/.claude/skills/pdf-to-markdown/.venv/bin/python ~/.claude/skills/pdf-to-markdown/scripts/pdf_to_md.py --clear-all-cache

# Show cache statistics
~/.claude/skills/pdf-to-markdown/.venv/bin/python ~/.claude/skills/pdf-to-markdown/scripts/pdf_to_md.py --cache-stats
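
The output of `--cache-stats` is not shown in this document. As an illustration only, the sketch below computes simple statistics from a cache directory that holds one subdirectory per PDF. The function `cache_stats` and the fields it returns are assumptions, not the script's actual report.

```python
from pathlib import Path


def cache_stats(cache_root: Path) -> dict:
    """Count cache entries (one directory per PDF) and the bytes they occupy."""
    if not cache_root.is_dir():
        return {"entries": 0, "total_bytes": 0}
    entries = [p for p in cache_root.iterdir() if p.is_dir()]
    total = sum(
        f.stat().st_size
        for entry in entries
        for f in entry.rglob("*")
        if f.is_file()
    )
    return {"entries": len(entries), "total_bytes": total}
```

For the cache location used by this skill, it could be called as `cache_stats(Path.home() / ".cache" / "pdf-to-markdown")`.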

Cache Contents

~/.cache/pdf-to-markdown/<cache_key>/
├── metadata.json    # source path, mtime, size, total_pages
├── full_output.md   # cached full markdown
└── images/          # extracted images
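
Because `metadata.json` records the source file's mtime and size, a cache entry can be checked for staleness by comparing those values with the PDF's current state. The sketch below shows one way such a check might work. The JSON key names `mtime` and `size` are assumptions based on the comment in the layout above, and how the script derives `<cache_key>` is not covered here.

```python
import json
from pathlib import Path


def cache_is_fresh(cache_entry: Path, pdf_path: Path) -> bool:
    """True if the entry has cached output and its metadata matches the PDF."""
    meta_file = cache_entry / "metadata.json"
    output_file = cache_entry / "full_output.md"
    if not meta_file.is_file() or not output_file.is_file():
        return False
    # Key names "mtime" and "size" are assumed, not confirmed by the script.
    meta = json.loads(meta_file.read_text(encoding="utf-8"))
    stat = pdf_path.stat()
    return meta.get("mtime") == stat.st_mtime and meta.get("size") == stat.st_size
```

Comparing mtime and size is cheap but not airtight. A file rewritten with the same size and timestamp would still be treated as unchanged.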

Image Handling

**[Image: figure_1.png (1200x800, 125.3KB)]**
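
If downstream tooling needs the details of an extracted image, a placeholder line in this format can be parsed with a regular expression. This is a minimal sketch that assumes every placeholder follows exactly the pattern of the example line above. The `ImagePlaceholder` type and the `parse_placeholder` function are hypothetical names.

```python
import re
from typing import NamedTuple, Optional


class ImagePlaceholder(NamedTuple):
    filename: str
    width: int
    height: int
    size_kb: float


# Matches e.g. **[Image: figure_1.png (1200x800, 125.3KB)]**
_PLACEHOLDER = re.compile(
    r"\*\*\[Image: (?P<name>.+?) "
    r"\((?P<w>\d+)x(?P<h>\d+), (?P<kb>\d+(?:\.\d+)?)KB\)\]\*\*"
)


def parse_placeholder(line: str) -> Optional[ImagePlaceholder]:
    """Extract filename, pixel dimensions, and size in KB, or None if no match."""
    m = _PLACEHOLDER.search(line)
    if m is None:
        return None
    return ImagePlaceholder(m["name"], int(m["w"]), int(m["h"]), float(m["kb"]))
```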

---

PDF to Markdown Converter

When to Use This Skill

Environment Setup

First-Time Setup (if .venv doesn't exist)

Verify Installation

Quick Start

Standard Workflow

Step 1: Ensure the skill venv exists

Step 2: Convert PDF to Markdown

Step 3: Read the output

Caching

How It Works

Cache Commands

Cache Contents

Image Handling

Auto-View Behavior for Images

Output Format

Header (metadata)
