Read and generate PDF files. Read: extract text, parse tables, analyze structure from existing PDFs. Generate: create PDFs from Markdown, HTML, or structured content with professional design. Also fill PDF forms. Use this skill whenever the task involves reading from or creating PDF files. Keywords: PDF, read PDF, extract text, parse table, generate PDF, export PDF, Markdown to PDF, convert to PDF, report, briefing, invoice, form.
Read, generate, and process PDF files. Covers: extracting text and tables from existing PDFs, converting Markdown to styled PDF, creating professional documents (reports, proposals, invoices), and filling PDF forms.
| Task | Route | Method |
|---|---|---|
| Create new PDF from scratch | CREATE | Dual-engine: HTML cover + ReportLab body |
| Convert Markdown to PDF | MD-TO-PDF | Markdown → styled HTML → Chrome printToPDF |
| Read/extract text from PDF | READ | pdfplumber (tables), pypdf (text) |
| Fill PDF form fields | FILL | pypdf form filling |
| Reformat existing content as PDF | REFORMAT | Extract → restructure → CREATE |
PDFs with covers benefit from two rendering engines:
| Part | Engine | Why |
|---|---|---|
| Cover page | HTML+CSS → Chrome/Playwright → PDF | CSS natively supports gradients, grids, blend modes, custom fonts — hard with PDF drawing APIs |
| Body | ReportLab | Stable paragraph flow, pagination, headers/footers, page numbers |
| Merge | pypdf | Combine cover.pdf + body.pdf into final.pdf |
If Chrome/Playwright is unavailable, fall back to ReportLab for covers (simpler design, but still functional).
Each type has a distinct visual identity (cover pattern, fonts, accent color):
| Type | Cover Style | Best For |
|---|---|---|
| Report | Bold title, date banner | Business reports, quarterly reviews |
| Proposal | Logo-centric, value proposition | Sales proposals, partnerships |
| Whitepaper | Academic, abstract box | Technical deep-dives |
| Manual | Section tabs, version badge | User guides, SOPs |
| Invoice | Compact header, line items | Billing, quotes |
| Certificate | Ornamental border, seal area | Awards, credentials |
| Newsletter | Multi-column, headline | Internal comms, updates |
| Resume | Sidebar + main content | CVs, professional profiles |
| Presentation | Full-bleed images, large type | Handouts, summaries |
| Contract | Numbered clauses, signature blocks | Legal, agreements |
| Research | Abstract + keywords + citation | Academic papers |
| Catalog | Grid layout, product cards | Product listings |
| Brochure | Tri-fold layout, visual-heavy | Marketing materials |
| Menu | Category sections, pricing | Restaurant, services |
| Letterhead | Header + footer branding | Official correspondence |
| Industry | Recommended Accent | Hex |
|---|---|---|
| Finance/Legal | Navy blue | #1B365D |
| Healthcare | Teal | #008080 |
| Technology | Electric blue | #0066FF |
| Education | Forest green | #228B22 |
| Creative | Coral | #FF6B6B |
| Government | Dark red | #8B0000 |
| Consulting | Charcoal + gold | #333333 / #D4AF37 |
Structure document body using these block types:
| Block | Description |
|---|---|
| heading | Section/subsection titles (H1-H4) |
| paragraph | Body text with optional indent |
| bullet_list | Unordered list items |
| numbered_list | Ordered list items |
| table | Row/column data with header row |
| image | Embedded image with caption |
| code_block | Monospace code with syntax highlighting |
| blockquote | Indented quotation with attribution |
| callout | Highlighted box (info, warning, tip) |
| chart | Bar, line, pie rendered as image |
| math | LaTeX formula rendered as image |
| page_break | Force new page |
| horizontal_rule | Section separator |
| footnote | Numbered reference at page bottom |
Convert Markdown files to professionally typeset PDFs using pure Python (ReportLab). No browser needed.
python3 scripts/md2pdf.py --input report.md --output report.pdf --title "My Report" --theme corporate
| Theme | Personality |
|---|---|
corporate | Professional blue, clean lines |
academic | Formal, serif, dense |
modern | Bold gradients, sans-serif |
minimal | White space, understated |
creative | Vibrant accents, playful |
nature | Earth tones, organic |
elegant | Serif, gold accents |
tech | Dark theme, monospace accents |
magazine | Editorial layout, pull quotes |
chinese_formal | GB/T compliant, SimHei/FangSong |
python3 md2pdf.py \
--input report.md \
--output report.pdf \
--title "Quarterly Report" \
--subtitle "Q1 2026" \
--author "Research Team" \
--theme corporate \
--toc \ # Generate table of contents
--watermark "DRAFT" \ # Add watermark
--header-title "Q1 Report" # Running header
pip install reportlab — 5MB, pure Python, zero C dependencies, zero browser. Works everywhere including sandboxed environments.
# Text extraction
from pypdf import PdfReader
reader = PdfReader("file.pdf")
for page in reader.pages:
print(page.extract_text())
# Table extraction
import pdfplumber
with pdfplumber.open("file.pdf") as pdf:
for page in pdf.pages:
tables = page.extract_tables()
for table in tables:
for row in table:
print(row)
from pypdf import PdfReader, PdfWriter
reader = PdfReader("form.pdf")
writer = PdfWriter()
writer.append(reader)
# Fill form fields
writer.update_page_form_field_values(
writer.pages[0],
{"field_name": "value", "date_field": "2026-04-01"}
)
with open("filled.pdf", "wb") as f:
writer.write(f)
| Element | Font | Size | Weight |
|---|---|---|---|
| Title | Sans-serif (Helvetica/Arial) | 24-32pt | Bold |
| Subtitle | Sans-serif | 14-18pt | Regular |
| H1 | Sans-serif | 18-20pt | Bold |
| H2 | Sans-serif | 14-16pt | Bold |
| Body | Serif (Times/Georgia) or Sans | 10-12pt | Regular |
| Caption | Sans-serif | 8-9pt | Italic |
| Code | Monospace (Courier) | 9-10pt | Regular |
| Footer | Sans-serif | 8pt | Regular |
Line spacing: 1.2-1.5× font size for body text. Margins: Print-ready = 1 inch (72pt) all sides. Digital = 0.75 inch. Page size: A4 (595×842pt) for international, Letter (612×792pt) for US.
# PDF → PNG pages for visual review
pdftoppm -png -r 150 output.pdf /tmp/preview/page
| Pitfall | Solution |
|---|---|
| Fonts display wrong on recipient's machine | Embed fonts in ReportLab: pdfmetrics.registerFont(TTFont(...)) |
| Tables split mid-row across pages | Use ReportLab KeepTogether or splitFirst/splitLast |
| Images blurry | Use original resolution; don't downscale before embedding |
| Cover CSS renders differently | Test with same Chrome version used for production |
| Chinese/Japanese text missing | Register CJK fonts (SimSun, MS Gothic) with ReportLab |
| Large file size | Compress images before embedding; use JPEG for photos, PNG for diagrams |
| Script | Purpose |
|---|---|
scripts/palette.py | Generate design tokens (colors, fonts) for a document type |
scripts/cover.py | Generate HTML cover page from design tokens |
scripts/render_cover.js | Render HTML cover to PDF via Chrome/Playwright |
scripts/render_body.py | Generate body PDF via ReportLab from content blocks |
scripts/merge.py | Merge cover.pdf + body.pdf into final.pdf |
scripts/fill_inspect.py | Inspect PDF form field names and types |
scripts/fill_write.py | Fill PDF form fields programmatically |
scripts/reformat_parse.py | Extract structure from existing PDF for reformatting |
scripts/make.sh | End-to-end pipeline: palette → cover → body → merge |
scripts/md2pdf.py | Convert Markdown to typeset PDF via ReportLab (10 themes, CJK, TOC, bookmarks, zero browser deps) |
design/design.md — Complete design system reference: 15 document type recipes, accent color palettes, font pairings, cover patterns, content block JSON schemas.
reportlab, pypdfpdfplumberpdftoppm for PNG renderingreportlab[cjk] or manual font registrationThis skill incorporates knowledge from: