Name: Extract
Author: nicograef

Document Extraction

Unified skill for extracting content from common document formats. Identify the file type, then follow the corresponding reference.

Workflow

1. Identify the file format from its extension (.pdf, .docx / .doc, .xlsx / .xls).
2. Follow the format-specific reference:
   - PDF → pdf.md
   - Word (.docx) → docx.md
   - Excel (.xlsx) → xlsx.md
3. Extract the requested content (text, tables, metadata, or images, depending on what the user needs).
4. Present results in a clean, structured format (Markdown tables, code blocks, or DataFrames as appropriate).
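
For the final step, extracted rows can be rendered as a Markdown table with a few lines of standard-library Python. This is only a minimal sketch: `to_markdown_table` is a hypothetical helper name, not part of this skill or its references, and it assumes every row has the same number of cells as the header.

```python
def to_markdown_table(headers, rows):
    """Render headers and rows as a pipe-delimited Markdown table.

    Hypothetical helper for illustration; assumes each row has
    as many cells as there are headers.
    """
    def fmt(cells):
        # Escape pipes so cell text cannot break the column layout.
        return "| " + " | ".join(str(c).replace("|", "\\|") for c in cells) + " |"

    lines = [fmt(headers), "| " + " | ".join("---" for _ in headers) + " |"]
    lines.extend(fmt(row) for row in rows)
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_markdown_table(["Item", "Qty"], [["Bolt", 12], ["Nut", 30]]))
```

For larger tabular results, a DataFrame is likely the better fit, as the step above suggests.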

| Extension | Format | Reference |
| --- | --- | --- |
| `.pdf` | PDF (digital or scanned) | pdf.md |
| `.docx` | Word (Office Open XML) | docx.md |
| `.xlsx` | Excel (Office Open XML) | xlsx.md |
| `.doc` | Legacy Word (binary) | Convert to `.docx` first; see docx.md |
| `.xls` | Legacy Excel (binary) | Convert to `.xlsx` or use `xlrd`; see xlsx.md |
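
The routing in the table above could be sketched as a simple extension lookup. The names `REFERENCES`, `LEGACY`, and `pick_reference` are assumptions made for illustration. The sketch also trusts the file extension alone, which may be wrong for misnamed files.

```python
from pathlib import Path

# Extension -> reference file, mirroring the table above.
REFERENCES = {
    ".pdf": "pdf.md",
    ".docx": "docx.md",
    ".xlsx": "xlsx.md",
    ".doc": "docx.md",   # legacy binary Word
    ".xls": "xlsx.md",   # legacy binary Excel
}

# Legacy formats that need conversion or special handling first.
LEGACY = {".doc", ".xls"}


def pick_reference(filename):
    """Return (reference_file, is_legacy) for a filename.

    Hypothetical helper: matches on the lowercased extension only.
    """
    ext = Path(filename).suffix.lower()
    if ext not in REFERENCES:
        raise ValueError(f"Unsupported format: {ext or '(no extension)'}")
    return REFERENCES[ext], ext in LEGACY


if __name__ == "__main__":
    print(pick_reference("Report.PDF"))
    print(pick_reference("budget_2009.xls"))
```

Returning a legacy flag alongside the reference keeps the conversion decision with the caller, who can then follow the notes in docx.md or xlsx.md.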