PDF Reading — Content Extraction & Inspection

Use this skill any time you need to READ, INSPECT, EXTRACT, or ANALYZE the CONTENTS of a PDF file. Triggers: "read this PDF", "what does this PDF say", "summarize this document", "extract the table from this PDF", "analyze this report", "what's in the attached PDF", user uploads a .pdf file and asks about its content, "get the text from this PDF", "OCR this scanned document", "extract form fields". This skill auto-detects document type and applies the correct strategy. Do NOT use for PDF creation, merging, splitting, or watermarking — use the `pdf` skill for those tasks.

KILWA730 stars, 12 Apr 2026

Categories: Documents

Skill content

Load "references/strategies.md" before proceeding.

Step 0 — Detect Document Type First

Run this before choosing an extraction strategy:

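import fitz  # PyMuPDF (pip install pymupdf)

doc = fitz.open("/mnt/user-data/uploads/file.pdf")
if len(doc) == 0:
    # doc[0] below would fail with an IndexError on an empty document.
    raise ValueError("PDF has no pages")

# Average number of extractable characters over the first pages (at most 3).
sample = min(3, len(doc))
total_chars = sum(len(doc[i].get_text()) for i in range(sample))
avg_chars = total_chars / sample

# Page orientation is judged from the first page only.
page0 = doc[0]
is_landscape = page0.rect.width > page0.rect.height

if avg_chars < 50:
    # Almost no text layer: likely image-only pages that need OCR.
    doc_type = "scanned"
elif is_landscape:
    # Landscape first page: treat the file as a slide deck.
    doc_type = "slides"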