Use this skill any time you need to READ, INSPECT, EXTRACT, or ANALYZE the CONTENTS of a PDF file. Triggers: "read this PDF", "what does this PDF say", "summarize this document", "extract the table from this PDF", "analyze this report", "what's in the attached PDF", user uploads a .pdf file and asks about its content, "get the text from this PDF", "OCR this scanned document", "extract form fields". This skill auto-detects document type and applies the correct strategy. Do NOT use for PDF creation, merging, splitting, or watermarking — use the `pdf` skill for those tasks.
Load "references/strategies.md" before proceeding.
Run this before choosing an extraction strategy:
import fitz # pymupdf
doc = fitz.open("/mnt/user-data/uploads/file.pdf")
sample = min(3, len(doc))
total_chars = sum(len(doc[i].get_text()) for i in range(sample))
avg_chars = total_chars / sample if sample else 0
page0 = doc[0] if sample else None  # guard against a zero-page document
is_landscape = page0 is not None and page0.rect.width > page0.rect.height
if avg_chars < 50:
    doc_type = "scanned"
elif is_landscape:
    doc_type = "slides"