Skill file

MarkItDown

Name: MarkItDown
Author: RayChang

Convert files and URLs to Markdown using Microsoft's markitdown library via uvx (zero-install). This skill should be used when the user provides a file (PDF, DOCX, PPTX, XLSX, HTML, CSV, JSON, XML, EPUB, Jupyter notebook, audio, ZIP) or a URL (including YouTube, Wikipedia, RSS feeds) and wants its content converted to Markdown for context ingestion. Also triggers when the user explicitly asks to convert a file to Markdown, read a non-text document, extract content from a URL, batch convert documents, or do document analysis.

RayChang · 0 stars · 2026/04/17

Category: Documents

Skill content

Convert files and URLs to Markdown using Microsoft's markitdown via uvx (zero-install). Preserves document structure (headings, lists, tables, links) for optimal LLM context ingestion.

Setup — One-time auto-invoke registration

Trigger this setup on first install when the user runs /markitdown setup or says "set up markitdown" / "configure markitdown". The goal: register a preference in the user's global Claude config so Claude auto-prefers this skill whenever a file or URL needs to be read.

Steps

1. Determine the target config file. Default to ~/.claude/CLAUDE.md (global user config). If the user explicitly asks for project-level (--project or "project-level"), target ./CLAUDE.md in the current working directory.
2. Check for existing registration. Read the target file. If it already contains a ## File & URL Reading heading, stop and tell the user: "markitdown is already registered in <path>. No changes made."
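The two steps above can be sketched in shell. This is a minimal sketch, not the skill's actual implementation: the function names `config_target` and `register_markitdown` are hypothetical, the block to register is passed in as a file, and the final append step is an assumption about what happens when no existing registration is found.

```shell
# Step 1 (hypothetical helper): choose the target config file.
# "--project" selects the project-level file; anything else selects the global one.
config_target() {
  if [ "${1:-}" = "--project" ]; then
    echo "./CLAUDE.md"
  else
    echo "$HOME/.claude/CLAUDE.md"
  fi
}

# Step 2 (hypothetical helper): register the block only once.
# $1 = target config file, $2 = file holding the block to register.
register_markitdown() {
  target=$1
  snippet=$2
  mkdir -p "$(dirname "$target")"
  # Stop if the heading is already present in the target file.
  if [ -f "$target" ] && grep -qF '## File & URL Reading' "$target"; then
    echo "markitdown is already registered in $target. No changes made."
    return 0
  fi
  # Assumed final step: append the block, separated by a blank line.
  { echo; cat "$snippet"; } >> "$target"
  echo "Registered markitdown in $target."
}
```

Running `register_markitdown "$(config_target)" snippet.md` twice leaves a single copy of the heading in the file, which is the point of the check in step 2.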


## File & URL Reading
- When the user provides a file path or URL to read, invoke the `markitdown` skill (via Skill tool) first
- Supported by markitdown: PDF, DOCX, PPTX, XLSX/XLS, HTML, EPUB, CSV, JSON, XML, ZIP, audio (WAV/MP3), YouTube URLs, general web URLs
- Use Read tool directly instead for:
  - Plain text: `.txt`, `.md`
  - Source code: `.ts`, `.js`, `.py`, `.go`, etc.
  - Images: `.jpg`, `.png`, `.gif`, `.webp`, etc. — Claude reads natively (multimodal); markitdown does support OCR but Read is preferred
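The routing rule in this block can be illustrated with a small shell function. It is only a sketch: `reader_for` is a hypothetical name, the extension list is copied from the bullets above, and anything not listed there (plain text, source code, images, files without an extension) falls through to the Read tool.

```shell
# Hypothetical router: print which tool the rules above would pick for a source.
reader_for() {
  src=$1
  # URLs (including YouTube links) always go to markitdown.
  case "$src" in
    http://*|https://*) echo "markitdown"; return 0 ;;
  esac
  # Lower-case the extension so "REPORT.PDF" matches as well.
  ext=$(printf '%s' "${src##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf|docx|pptx|xlsx|xls|html|epub|csv|json|xml|zip|wav|mp3) echo "markitdown" ;;
    *) echo "read" ;;
  esac
}
```

For example, `reader_for slides.pptx` prints `markitdown`, while `reader_for main.py` prints `read`.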

| File Type | Use Case | Command |
| --- | --- | --- |
| PDF | Reports, papers | `markitdown report.pdf` |
| DOCX | Word documents | `markitdown document.docx` |
| PPTX | Presentations | `markitdown slides.pptx` |
| XLSX/XLS | Spreadsheets, data tables | `markitdown data.xlsx` |
| HTML | Web pages | `markitdown page.html` |
| URL | Live web content | `markitdown "https://example.com"` |
| YouTube | Video transcripts | `markitdown "https://youtube.com/watch?v=..."` |
| Wikipedia | Wiki articles | `markitdown "https://en.wikipedia.org/wiki/..."` |
| RSS/Atom | Feed content | `markitdown "https://example.com/feed.xml"` |
| .ipynb | Jupyter notebooks | `markitdown notebook.ipynb` |
| CSV/JSON/XML | Structured data | `markitdown data.csv` |
| ZIP | Archive contents (iterates over each file) | `markitdown archive.zip` |
| Audio | EXIF metadata and speech transcription | `markitdown recording.wav` |
| EPUB | E-books | `markitdown book.epub` |
| MSG | Outlook emails | `markitdown email.msg` |

Conversion Command

uvx --from 'markitdown[all]' markitdown "<source>"

Batch Conversion

# Sequential: convert every PDF in one directory
for f in /path/to/docs/*.pdf; do
  uvx --from 'markitdown[all]' markitdown "$f" -o "${f%.pdf}.md"
done

# Parallel (4 jobs): NUL-delimited paths keep names with spaces or quotes intact
find /path/to/docs -type f \( -name "*.pdf" -o -name "*.docx" -o -name "*.pptx" \) -print0 | \
  xargs -0 -P 4 -I {} sh -c 'uvx --from "markitdown[all]" markitdown "$1" -o "${1%.*}.md"' _ {}
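Before launching a large batch, it can help to preview which output file each source would produce. The sketch below does a dry run only and never calls markitdown; `md_target` and `plan_batch` are hypothetical helper names, and the naming rule (replace the last extension with `.md`) is taken from the commands above.

```shell
# Hypothetical helper: derive the Markdown output path for a source file
# by replacing its last extension with ".md".
md_target() {
  printf '%s\n' "${1%.*}.md"
}

# Dry run: print the command that would be executed for each file given.
plan_batch() {
  for f in "$@"; do
    printf 'uvx --from markitdown[all] markitdown %s -o %s\n' "$f" "$(md_target "$f")"
  done
}
```

`md_target "docs/report.pdf"` prints `docs/report.md`. Note that two sources sharing a base name, such as `notes.pdf` and `notes.docx`, map to the same output file, so one would overwrite the other in a mixed-format batch.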

Error Handling

| Error | Resolution |
| --- | --- |
| `uvx` not found | Inform the user to install uv: `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| Conversion fails on a URL | Verify the URL is accessible; try fetching with `curl` first |
| Empty output | The file may be image-only; inform the user that text extraction was not possible |
| Stdin input | Pipe content with extension hint: `cat file \| uvx --from 'markitdown[all]' markitdown -x .html` |
| Import/dependency error | Ensure Python >= 3.10 is available; uvx handles the rest |
| Partial format support | Try selective extras: `uvx --from 'markitdown[pdf,docx]' markitdown file` |
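Some of the checks in this table can be automated around a conversion. The following is a rough sketch under stated assumptions: `have_uvx` and `check_conversion` are hypothetical helpers, and the messages are paraphrases of the table rows rather than output produced by markitdown itself.

```shell
# Preflight (hypothetical helper): succeed only when uvx is on PATH.
have_uvx() {
  command -v uvx >/dev/null 2>&1
}

# Post-conversion check (hypothetical helper).
# $1 = exit status of the conversion command, $2 = output file.
# Prints a short diagnosis and returns non-zero when the result is unusable.
check_conversion() {
  status=$1
  out=$2
  if [ "$status" -ne 0 ]; then
    echo "conversion failed (exit $status); check the source path or URL"
    return 1
  fi
  # -s is true only for files that exist and are non-empty.
  if [ ! -s "$out" ]; then
    echo "empty output; the source may be image-only"
    return 2
  fi
  echo "ok"
}
```

A typical call sequence would be `have_uvx || echo "uvx not found; install uv first"`, then the conversion with `-o report.md`, then `check_conversion $? report.md`.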

MCP Server

pip install markitdown-mcp

Docker

docker run --rm -i ghcr.io/microsoft/markitdown:latest < document.pdf > output.md

Selective Extras

uvx --from 'markitdown[pdf]' markitdown report.pdf
uvx --from 'markitdown[docx,pptx]' markitdown presentation.pptx
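Choosing the extra from the file extension can be scripted. This is a sketch only: `extras_for` is a hypothetical helper, it covers just the three extras that appear in this document (`pdf`, `docx`, `pptx`), and it falls back to `markitdown[all]` for everything else.

```shell
# Hypothetical helper: print the package spec to pass to "uvx --from"
# based on the file extension. Unknown types fall back to the full install.
extras_for() {
  # Lower-case the extension so "SLIDES.PPTX" matches as well.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    pdf|docx|pptx) echo "markitdown[$ext]" ;;
    *) echo "markitdown[all]" ;;
  esac
}
```

It would be used as `uvx --from "$(extras_for report.pdf)" markitdown report.pdf`, where `extras_for report.pdf` prints `markitdown[pdf]`.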

Examples

# Convert a PDF
uvx --from 'markitdown[all]' markitdown report.pdf

# Convert a URL
uvx --from 'markitdown[all]' markitdown "https://example.com/article"

# Convert and save to file
uvx --from 'markitdown[all]' markitdown presentation.pptx -o /tmp/slides.md

# YouTube transcript
uvx --from 'markitdown[all]' markitdown "https://www.youtube.com/watch?v=dQw4w9WgXcQ"

# Wikipedia article
uvx --from 'markitdown[all]' markitdown "https://en.wikipedia.org/wiki/Markdown"

# Jupyter notebook
uvx --from 'markitdown[all]' markitdown analysis.ipynb

# Pipe from stdin
cat page.html | uvx --from 'markitdown[all]' markitdown -x .html

# Batch convert all PDFs in a directory
for f in *.pdf; do uvx --from 'markitdown[all]' markitdown "$f" -o "${f%.pdf}.md"; done

