Use when a user asks to read, analyze, summarize, or extract from a heavyweight file such as PDF, DOCX, PPTX, XLSX, CSV, or TSV. Convert the file into markdown or CSV first, generate a lightweight index, and only spend model tokens on the compressed artifact. Trigger on requests like "read this PDF", "look through this spreadsheet", "summarize this deck", or any time raw file ingestion would waste tokens.
Agents waste money and context when they read heavyweight files raw. This skill turns bulky documents into cheaper working artifacts first, then tells the main agent how much reasoning power the file actually deserves.
Run the converter first:

```
uv run \
  --with pdfplumber \
  --with python-docx \
  --with python-pptx \
  --with openpyxl \
  python skills/heavy-file-ingestion/scripts/convert_heavy_file.py /absolute/path/to/file.ext
```

Then read the generated `index.md` or `index.json` before anything else. It should tell you what is in the file, how clean the extraction was, and whether escalation is justified.
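The conversion step above dispatches each format to a dedicated extraction library. A minimal sketch of that dispatch, assuming a simple extension-to-library mapping (the names and structure here are illustrative, not the actual code of `convert_heavy_file.py`):

```python
from pathlib import Path

# Hypothetical extension-to-extractor mapping; convert_heavy_file.py's
# real dispatch logic may differ.
EXTRACTORS = {
    ".pdf": "pdfplumber",
    ".docx": "python-docx",
    ".pptx": "python-pptx",
    ".xlsx": "openpyxl",
    ".csv": "stdlib csv",
    ".tsv": "stdlib csv",
}

def pick_extractor(path: str) -> str:
    """Choose an extraction library from the file extension."""
    ext = Path(path).suffix.lower()
    if ext not in EXTRACTORS:
        raise ValueError(f"unsupported heavyweight format: {ext}")
    return EXTRACTORS[ext]
```

Keeping the mapping in one table makes it cheap to add a new format or swap the library behind one.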
If you have markitdown installed and want to prefer it for PDF or DOCX conversion, rerun with:

```
python skills/heavy-file-ingestion/scripts/convert_heavy_file.py /absolute/path/to/file.ext --prefer markitdown
```
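One way the preference flag could be honored is to check whether markitdown is importable and fall back to the per-format stack when it is not. This is a hedged sketch, not the script's actual logic:

```python
import importlib.util
from typing import Optional

# Hypothetical resolution of a --prefer markitdown flag; function name and
# defaults are assumptions for illustration.
def choose_converter(fmt: str, prefer: Optional[str] = None) -> str:
    markitdown_available = importlib.util.find_spec("markitdown") is not None
    # Honor the preference only for formats markitdown handles well here,
    # and only when the library is actually installed.
    if prefer == "markitdown" and fmt in {"pdf", "docx"} and markitdown_available:
        return "markitdown"
    defaults = {"pdf": "pdfplumber", "docx": "python-docx"}
    return defaults.get(fmt, "stdlib")
```

Failing soft like this means the same command works whether or not the optional dependency is present.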
Read `index.md` first. The skill should leave behind:

- `index.md` with file counts, structure hints, preview lines, and a recommended next step
- `index.json` with the same information in machine-friendly form
- `references/open-source-stack.md` when you need to choose a better extractor or explain why one was picked
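Since `index.json` carries the same information in machine-friendly form, an agent can use it to decide programmatically whether escalation is justified. A minimal sketch, assuming illustrative field names (`extraction_quality`, `pages`) that are not a documented schema:

```python
import json
from pathlib import Path

def should_escalate(index_path: str) -> bool:
    """Decide from index.json whether the full file deserves more tokens.

    The field names below are assumptions for illustration; inspect the
    actual index.json this skill produces before relying on them.
    """
    index = json.loads(Path(index_path).read_text())
    # Escalate only when extraction was messy, or the file is large enough
    # that the preview lines cannot stand in for the full content.
    messy = index.get("extraction_quality", "clean") != "clean"
    large = index.get("pages", 0) > 50
    return messy or large
```

A check like this keeps the default path cheap: most files are answered from the index alone, and only the hard cases spend full-file tokens.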