Convert PDF/EPUB library to Markdown and generate Obsidian MOC notes
Converts a local PDF/EPUB book and paper library to Markdown, then generates
lightweight MOC notes in Obsidian. Callable standalone or from /obsidian-setup
as an optional post-setup step.
Step 1 -- Conversion (scripts/convert-to-md.py)
Scans immediate subdirectories of --library-root as categories. Files at the
root itself go to a "General" category. Converts to Markdown and writes to
--output-root. EPUB is preferred over PDF when both exist. Resumable: skips
files already converted.
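The selection rules above can be sketched in a few lines. This is a minimal illustration, not the actual logic of convert-to-md.py: the function name `plan_conversions`, the `<output-root>/<category>/<name>.md` layout, and the choice to look only at files directly inside each category folder are all assumptions.

```python
from pathlib import Path

SUPPORTED = {".epub", ".pdf"}  # assumed set of convertible extensions


def plan_conversions(library_root: Path, output_root: Path) -> list[tuple[Path, Path]]:
    """Return (source, target) pairs that still need converting."""
    chosen: dict[tuple[str, str], Path] = {}
    for entry in sorted(library_root.iterdir()):
        if entry.is_dir():
            # Each immediate subdirectory is a category.
            category, files = entry.name, sorted(entry.iterdir())
        else:
            # Files at the root itself fall into "General".
            category, files = "General", [entry]
        for f in files:
            if not f.is_file() or f.suffix.lower() not in SUPPORTED:
                continue
            key = (category, f.stem)
            # Prefer EPUB over PDF when both exist for the same title.
            if key not in chosen or f.suffix.lower() == ".epub":
                chosen[key] = f
    plan = []
    for (category, stem), src in sorted(chosen.items()):
        target = output_root / category / f"{stem}.md"  # hypothetical layout
        if not target.exists():  # resumable: skip files already converted
            plan.append((src, target))
    return plan
```

Keying on category plus file stem is what makes an EPUB and a PDF of the same title collapse into one entry; the real script may match titles differently.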
Step 2 -- MOC generation (scripts/generate-book-moc.py)
For each source file: extracts TOC and embedded metadata (no AI, no tokens).
Writes a structured Obsidian note to <vault-root>/02. AI-Vault/Library/Books/
or .../Papers/. The note stores a library-relative path so Claude can read
the full content on demand. Full book content never goes into Obsidian.
Automatically calls generate-library-moc.py when new notes are created.
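To make the idea of a lightweight note concrete, here is a rough sketch of turning already-extracted metadata and TOC entries into note text. The function name `build_moc_note`, the frontmatter keys, and the note layout are hypothetical; the format written by generate-book-moc.py may differ. The only detail taken from the description above is that the note records a library-relative path rather than the book content.

```python
from pathlib import Path


def build_moc_note(title: str, author: str, toc: list[tuple[int, str]],
                   source: Path, library_root: Path) -> str:
    """Render a small note from metadata and (level, heading) TOC entries."""
    # Store the path relative to the library root, not an absolute path.
    rel = source.relative_to(library_root).as_posix()
    lines = [
        "---",
        f'title: "{title}"',      # hypothetical frontmatter keys
        f'author: "{author}"',
        f'source: "{rel}"',
        "---",
        "",
        f"# {title}",
        "",
        "## Contents",
    ]
    for level, heading in toc:
        # Indent nested TOC entries by two spaces per level.
        lines.append(f"{'  ' * (level - 1)}- {heading}")
    return "\n".join(lines) + "\n"
```

Because only the TOC and metadata are rendered, the note stays small no matter how long the book is.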
Step 3 -- Library MOC (scripts/generate-library-moc.py, called automatically)
Scans all MOC notes and regenerates <vault-root>/02. AI-Vault/Library/Library MOC.md
with counts, categories, and wiki-link entries. Only runs when at least one new
note was created. To force a refresh, run:
python scripts/generate-library-moc.py --vault-root <path>

| Argument | Description | Example |
|---|---|---|
| --library-root | Root of your book/paper directory | /run/media/user/USB/Books |
| --vault-root | Obsidian vault root | /home/user/Documents/Obsidian |
| --output-root | Where to write converted Markdown | Defaults to <library-root>/../Markdown |
No paths are hardcoded. The scripts require these arguments on every run; only the shell wrappers fill in a default for --output-root.
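As an illustration of how a script can enforce this, the sketch below declares the two arguments that the conversion command uses as required options. It is an assumption about how convert-to-md.py might parse its command line, not a copy of it; `build_parser` is a hypothetical name.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Parser sketch for the conversion step: no defaults, no hardcoded paths."""
    parser = argparse.ArgumentParser(prog="convert-to-md.py")
    parser.add_argument("--library-root", type=Path, required=True,
                        help="Root of your book/paper directory")
    parser.add_argument("--output-root", type=Path, required=True,
                        help="Where to write converted Markdown")
    return parser
```

With `required=True`, argparse exits with a usage error when either option is missing, so a run can never silently fall back to a built-in path.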
| Dependency | Required For | Install |
|---|---|---|
| pymupdf4llm | PDF conversion and TOC extraction | pip install pymupdf4llm |
| pandoc | EPUB conversion | Auto-installed (see below) |
| tqdm | Progress bar (optional) | pip install tqdm |
| watchdog | Watch mode only | pip install watchdog |
The script attempts to install pandoc automatically:
- Windows: winget install JohnMacFarlane.Pandoc
- macOS: brew install pandoc
- Linux: prints sudo apt-get install -y pandoc and exits -- run the command, then re-run the script

On Linux, pandoc installation requires elevated access. Claude will print the command; do not assume sudo is available without a password.
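The per-platform behavior can be summarized as a lookup plus one rule: on Linux the command is printed rather than run. The sketch below assumes the operating system is identified with `platform.system()`; the function name `pandoc_install_plan` is hypothetical and the real script may decide differently.

```python
import platform
from typing import Optional

# Install commands as listed in this document, keyed by platform.system() value.
INSTALL_COMMANDS = {
    "Windows": "winget install JohnMacFarlane.Pandoc",
    "Darwin": "brew install pandoc",
    "Linux": "sudo apt-get install -y pandoc",
}


def pandoc_install_plan(system: Optional[str] = None) -> tuple[str, bool]:
    """Return (command, run_automatically) for the given OS name."""
    system = system or platform.system()
    if system not in INSTALL_COMMANDS:
        raise RuntimeError(f"No known pandoc install command for {system!r}")
    # Linux needs elevated access, so the command is only printed for the user.
    return INSTALL_COMMANDS[system], system != "Linux"
```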
1. Gate Check
Ask: "Do you have a PDF/EPUB book or paper library you want to index?"
If no, skip. If the Obsidian vault is not set up yet, suggest running /obsidian-setup first.
2. Collect Paths
Ask the user for --library-root and --vault-root (both required) and, optionally, --output-root.
Validate both required paths exist before proceeding. If either does not exist, report the error and stop.
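A minimal sketch of that validation, assuming both values arrive as strings and that "exists" means "is an existing directory". The helper name `missing_paths` is hypothetical.

```python
from pathlib import Path


def missing_paths(library_root: str, vault_root: str) -> list[str]:
    """Return one error message per required path that is not a directory."""
    errors = []
    for flag, value in (("--library-root", library_root),
                        ("--vault-root", vault_root)):
        if not Path(value).expanduser().is_dir():
            errors.append(f"{flag} does not exist or is not a directory: {value}")
    return errors
```

An empty list means it is safe to continue; otherwise report the messages and stop before any script runs.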
3. Dependency Check
python -c "import pymupdf4llm; print('pymupdf4llm OK')"
pandoc --version
If pymupdf4llm is missing: abort -- PDF conversion is impossible without it.
If pandoc is missing: the script will attempt auto-install. On Linux it will
print an install command and exit; run it, then re-run the pipeline.
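The same checks can be done from Python without importing the packages. This is a sketch under the assumption that each Python dependency is importable under the name shown in the dependency table; `check_dependencies` and `must_abort` are hypothetical helpers.

```python
import importlib.util
import shutil


def check_dependencies() -> dict[str, bool]:
    """Report which dependencies are available in the current environment."""
    return {
        # find_spec locates a module without executing it.
        "pymupdf4llm": importlib.util.find_spec("pymupdf4llm") is not None,
        "tqdm": importlib.util.find_spec("tqdm") is not None,
        "watchdog": importlib.util.find_spec("watchdog") is not None,
        # pandoc is an external binary, so look it up on PATH instead.
        "pandoc": shutil.which("pandoc") is not None,
    }


def must_abort(status: dict[str, bool]) -> bool:
    """Only a missing pymupdf4llm is fatal; the rest degrade or auto-install."""
    return not status["pymupdf4llm"]
```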
4. Run Pipeline
python scripts/convert-to-md.py \
--library-root "<library-root>" \
--output-root "<output-root>"
python scripts/generate-book-moc.py \
--library-root "<library-root>" \
--vault-root "<vault-root>" \
--output-root "<output-root>"
Report convert-to-md.py output as "Conversion:" and generate-book-moc.py
output as "MOC generation:" separately so partial failures are visible.
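One way to keep the two reports separate is to run each script as its own subprocess and label its result. The sketch below is an assumption about how a caller might do this; `run_step` and `run_pipeline` are hypothetical names, and running the second step even when the first fails is a choice made here for visibility, not documented behavior.

```python
import subprocess
import sys


def run_step(label: str, argv: list[str]) -> bool:
    """Run one command and print its outcome under the given label."""
    result = subprocess.run(argv, capture_output=True, text=True)
    status = "ok" if result.returncode == 0 else f"failed (exit {result.returncode})"
    print(f"{label}: {status}")
    if result.stdout:
        print(result.stdout, end="")
    if result.stderr:
        print(result.stderr, end="", file=sys.stderr)
    return result.returncode == 0


def run_pipeline(library_root: str, vault_root: str, output_root: str) -> dict[str, bool]:
    """Run both scripts and return a per-step success map."""
    convert_ok = run_step("Conversion", [
        sys.executable, "scripts/convert-to-md.py",
        "--library-root", library_root, "--output-root", output_root,
    ])
    moc_ok = run_step("MOC generation", [
        sys.executable, "scripts/generate-book-moc.py",
        "--library-root", library_root, "--vault-root", vault_root,
        "--output-root", output_root,
    ])
    return {"Conversion": convert_ok, "MOC generation": moc_ok}
```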
5. Verify MOC
Check <vault-root>/02. AI-Vault/Library/Library MOC.md was created or updated.
If Obsidian MCP is active, read it to confirm entry count matches expectations.
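When MCP is not available, the entry count can also be checked straight from the file. The sketch assumes every entry in Library MOC.md is an Obsidian wiki-link such as `[[Note]]` or `[[Note|alias]]`; the helper names are hypothetical.

```python
import re
from pathlib import Path

# Matches [[Target]], [[Target#Heading]] and [[Target|Alias]]; group 1 is the target.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")


def library_moc_path(vault_root: str) -> Path:
    """Location of the library MOC inside the vault, as given in this document."""
    return Path(vault_root) / "02. AI-Vault" / "Library" / "Library MOC.md"


def count_moc_entries(text: str) -> int:
    """Count distinct wiki-link targets in the MOC text."""
    return len({m.group(1).strip() for m in WIKILINK.finditer(text)})
```

For example, `count_moc_entries(library_moc_path(vault).read_text(encoding="utf-8"))` gives a number to compare against the count of notes created.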
For direct use without Claude:
# Linux/macOS
bash scripts/run-library-pipeline.sh /path/to/library /path/to/vault
# Windows
scripts\run-library-pipeline.bat "C:\Books" "C:\Vault"
Wrappers prompt interactively if paths are not passed as arguments.
Output-root defaults to a Markdown folder next to library-root.
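Expressed in Python, that default is a sibling folder of the library root. The wrappers themselves are shell and batch files, so this is only an equivalent sketch; `default_output_root` is a hypothetical name.

```python
from pathlib import Path


def default_output_root(library_root: str) -> Path:
    """Equivalent of <library-root>/../Markdown: a folder next to the library."""
    return Path(library_root).parent / "Markdown"
```

Using `.parent` instead of appending `..` keeps the resulting path free of dot segments.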
python scripts/convert-to-md.py --library-root ... --output-root ... --dry-run
python scripts/convert-to-md.py --library-root ... --output-root ... --category Cybersec
python scripts/convert-to-md.py --library-root ... --output-root ... --watch
python scripts/generate-book-moc.py --library-root ... --vault-root ... --file path/to/book.pdf
python scripts/generate-library-moc.py --vault-root /path/to/vault
| Missing | Behavior |
|---|---|
| pymupdf4llm | Cannot convert PDFs or extract TOC. Script exits 1 immediately. |
| pandoc | Auto-install attempted. If it fails on Linux, script exits 1 with install command. |
| tqdm | No progress bar. Continues normally. |
| watchdog | Watch mode unavailable. Batch mode still works. |
| --output-root not provided | Shell wrappers default to <library-root>/../Markdown. Scripts require it explicitly. |
| Obsidian not running | MOC notes written directly to vault filesystem. MCP tools unavailable but not required. |
| Library root empty | Script reports "No PDF or EPUB files found" and exits cleanly. |
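The tqdm row describes a common optional-dependency pattern, sketched below. This is an assumption about how the scripts might degrade, not their actual code; `convert_all` and its `convert_one` callback are hypothetical.

```python
try:
    from tqdm import tqdm  # optional: only used for the progress bar
except ImportError:
    def tqdm(iterable, **kwargs):
        """Fallback with the same call shape: iterate without a progress bar."""
        return iterable


def convert_all(files, convert_one):
    """Apply convert_one to every file, showing progress when tqdm is installed."""
    results = []
    for f in tqdm(files, desc="Converting"):
        results.append(convert_one(f))
    return results
```

Because the fallback accepts the same arguments, the loop body is identical whether or not tqdm is present.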
When called from /obsidian-setup, ask the gate question inline:
"Do you have a PDF/EPUB library you want to import into Obsidian?"
If yes, run this skill in full starting from Step 2. If no, skip silently.
- --library-root and --vault-root were collected from the user before running any script
- pymupdf4llm is installed; abort if missing
- Library MOC.md exists at <vault-root>/02. AI-Vault/Library/ after a run that created new notes