Name: Document to Markdown
Author: managementmaars-art

Document to Markdown

PDF converter powered by MinerU — convert PDF to Word, Markdown, HTML, LaTeX, or plain text. Also handles image-to-text OCR, scanned document recognition, and Office formats (DOCX, PPTX, Excel). Supports 80+ languages. Use this skill when the user wants to convert, extract, read, parse, or summarize any PDF or document. Also applies when the user shares a PDF file or link and asks about its content, needs tables or formulas extracted, wants PDF OCR, or says things like 'turn this into a doc' or 'what does this paper say'.

managementmaars-art · 0 stars · April 16, 2026

Professional
Category: Documents

Convert PDF, images, Office docs, and more to clean Markdown using the MinerU Open API CLI. No API key needed for basic use.

Language Rule

Reply to the user in the SAME language they use. This is non-negotiable.

Core Workflow

Extraction is often just the first step. The typical flow is:

1. Extract: use mineru-open-api to convert the document to Markdown
2. Read & Process: help the user with what they actually need

MinerU outputs raw Markdown — it doesn't interpret or restructure the content. If the user asks to "extract the tables", "summarize the paper", or "find the key findings", you need to read the output and do that work yourself. MinerU handles the OCR and layout; you handle the understanding.

Use -o to save to a file when the user wants persistent output (conversion, batch processing). Skip -o and read stdout directly when the content is consumed immediately (summarization, Q&A).
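The -o rule above can be sketched as a small helper that assembles the command line. This is only an illustration: the helper name `build_command`, the file names, and the argument order (mode, then source, then `-o`) are assumptions, not something this page confirms about the CLI.

```python
from typing import List, Optional


def build_command(source: str, mode: str = "flash-extract",
                  output: Optional[str] = None) -> List[str]:
    """Assemble an argv list for mineru-open-api.

    Assumption: the CLI takes the mode first, then the source, then -o.
    """
    if mode not in ("flash-extract", "extract"):
        raise ValueError(f"unknown mode: {mode}")
    argv = ["mineru-open-api", mode, source]
    if output is not None:
        # Persistent output: the user wants a file they can keep.
        argv += ["-o", output]
    return argv


# Conversion request: save the result to a (hypothetical) file.
print(build_command("paper.pdf", output="paper.md"))
# Summary or Q&A: no -o, so the Markdown is read from stdout instead.
print(build_command("paper.pdf"))
```

The list could be handed to `subprocess.run(..., capture_output=True, text=True)` when the Markdown is consumed immediately, which keeps the stdout path and the saved-file path in one place.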

For example:

"帮我把这个PDF转成markdown" ("Help me convert this PDF to Markdown") → use -o to save to file, done

When to Use Which

| Situation | Mode |
| --- | --- |
| "What does this PDF say?" | flash-extract |
| Quick summary or content scan | flash-extract |
| Need images/tables/formulas preserved | extract |
| Document > 10 MB or > 20 pages | extract |
| Batch converting multiple files | extract |
| Need DOCX/LaTeX/HTML output | extract |
| Scanned document needs OCR | extract with `--ocr` |
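The decision table above can be expressed as a small function. This is a sketch under stated assumptions: the `Request` fields and the lowercase format labels ("docx", "latex", "html") are hypothetical names introduced for illustration; only the thresholds, the two mode names, and `--ocr` come from the table.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Request:
    """Hypothetical description of what the user is asking for."""
    size_mb: float = 0.0
    pages: int = 0
    file_count: int = 1
    needs_assets: bool = False       # images, tables, or formulas must survive
    output_format: str = "markdown"  # hypothetical labels: "docx", "latex", "html"
    scanned: bool = False


def choose_mode(req: Request) -> List[str]:
    """Return the mode (plus --ocr when needed) suggested by the table."""
    needs_extract = (
        req.scanned
        or req.needs_assets
        or req.size_mb > 10
        or req.pages > 20
        or req.file_count > 1
        or req.output_format in {"docx", "latex", "html"}
    )
    if not needs_extract:
        return ["flash-extract"]
    args = ["extract"]
    if req.scanned:
        args.append("--ocr")
    return args


print(choose_mode(Request()))                       # quick content scan
print(choose_mode(Request(pages=35)))               # long document
print(choose_mode(Request(scanned=True)))           # scanned document
```

Defaulting to flash-extract and escalating only when a row of the table applies mirrors the table's ordering: the fast mode covers quick reads, and every other row points to extract.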

Language Support

| Language | Code | Language | Code |
| --- | --- | --- | --- |
| Chinese + English | `ch` | Japanese | `japan` |
| English | `en` | Korean | `korean` |
| French | `fr` | Chinese Traditional | `chinese_cht` |
| German | `de` | Spanish | `es` |
| Russian | `ru` | Arabic | `ar` |
| Portuguese | `pt` | Hindi | `hi` |
| Italian | `it` | Vietnamese | `vi` |
| Thai | `th` | Turkish | `tr` |
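The language table can be kept as a lookup so that a language name resolves to its code before a command is built. This is a minimal sketch: `language_code` is a hypothetical helper, and since this section does not show how the code is passed to the CLI, the sketch stops at resolving the code.

```python
# Codes copied from the language table; names are lowercased for lookup.
LANGUAGE_CODES = {
    "chinese + english": "ch",
    "english": "en",
    "french": "fr",
    "german": "de",
    "russian": "ru",
    "portuguese": "pt",
    "italian": "it",
    "thai": "th",
    "japanese": "japan",
    "korean": "korean",
    "chinese traditional": "chinese_cht",
    "spanish": "es",
    "arabic": "ar",
    "hindi": "hi",
    "vietnamese": "vi",
    "turkish": "tr",
}


def language_code(name: str) -> str:
    """Resolve a language name from the table to its code."""
    key = name.strip().lower()
    if key not in LANGUAGE_CODES:
        raise ValueError(f"language not in table: {name!r}")
    return LANGUAGE_CODES[key]


print(language_code("Japanese"))
print(language_code("Chinese Traditional"))
```

Note that several codes are not ISO 639-1 codes (`ch`, `japan`, `korean`, `chinese_cht`), so a lookup is safer than guessing from the language name.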

Document to Markdown

Language Rule

Core Workflow

Two Extraction Modes

flash-extract — Fast, no auth

extract — Precision, auth required

When to Use Which

Language Support

Data Flow

Troubleshooting
