Use in Codex when a user asks to read, analyze, summarize, or extract from a heavyweight file such as PDF, DOCX, PPTX, XLSX, CSV, or TSV. Convert the file into markdown or CSV first with the bundled script, generate a lightweight index, and spend model tokens only on the compressed artifact.
Codex can run local commands and inspect files, so direct ingestion of bulky documents is usually the wrong move. Convert first, index second, reason last.
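To make the "index second" step concrete, here is a minimal sketch of what a lightweight index over a converted markdown artifact could look like. The function name `build_index` and the output layout are assumptions for illustration; the index that the bundled script actually writes may differ.

```python
import re

# Matches ATX-style markdown headings such as "## Findings".
HEADING = re.compile(r"^(#{1,6})\s+(.*\S)\s*$")


def build_index(markdown_text: str) -> str:
    """Return a short outline of a markdown artifact (hypothetical format).

    The first line carries size counts; each following line is one heading,
    indented by depth and tagged with its line number in the artifact.
    """
    lines = markdown_text.splitlines()
    entries = []
    for number, line in enumerate(lines, start=1):
        match = HEADING.match(line)
        if match:
            depth = len(match.group(1))
            indent = "  " * (depth - 1)
            entries.append(f"{indent}- {match.group(2)} (line {number})")
    summary = (
        f"{len(lines)} lines, {len(markdown_text)} characters, "
        f"{len(entries)} headings"
    )
    return "\n".join([summary, *entries])


if __name__ == "__main__":
    sample = "# Report\n\nintro\n\n## Findings\nbody\n"
    print(build_index(sample))
```

An outline like this lets the model decide which part of the artifact to open before reading any of the body text.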
Run the bundled converter:
python scripts/convert_heavy_file.py /absolute/path/to/file.ext
To supply the extraction libraries for a single run, invoke it through uv:
uv run \
  --with pdfplumber \
  --with python-docx \
  --with python-pptx \
  --with openpyxl \
  python scripts/convert_heavy_file.py /absolute/path/to/file.ext
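As a rough sketch of how an agent might choose between the two invocations, the snippet below checks which extraction libraries are importable and builds the command accordingly. The helper names `missing_packages` and `build_command` are hypothetical. Passing only the missing packages to `--with` is a variation on the command above, which lists all four.

```python
import importlib.util

# Package name (as passed to `uv run --with`) -> module name it provides.
EXTRACTORS = {
    "pdfplumber": "pdfplumber",
    "python-docx": "docx",
    "python-pptx": "pptx",
    "openpyxl": "openpyxl",
}


def missing_packages(is_installed=None):
    """List the packages whose module cannot be imported in this environment."""
    if is_installed is None:
        def is_installed(module):
            return importlib.util.find_spec(module) is not None
    return [pkg for pkg, module in EXTRACTORS.items() if not is_installed(module)]


def build_command(path, missing):
    """Build the argv list: plain python if nothing is missing, else uv run."""
    base = ["python", "scripts/convert_heavy_file.py", path]
    if not missing:
        return base
    command = ["uv", "run"]
    for pkg in missing:
        command += ["--with", pkg]
    return command + base


if __name__ == "__main__":
    print(build_command("/absolute/path/to/file.ext", missing_packages()))
```

The resulting list can be handed to `subprocess.run` as is, which avoids shell quoting issues with paths that contain spaces.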
Read index.md first, not the original file.
read_extracted_artifact: inspect the generated markdown or CSV
cheap_model_or_stronger_converter: retry with a better deterministic tool or use a cheaper model on the extracted artifact only
manual_review: tell the user the deterministic route failed and propose the next cheapest fallback
Use the .ob1/ folder as the working directory for follow-up analysis.
references/open-source-stack.md explains the tool choices and fallback tiers.
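The three tier names can be read as an escalation order. The sketch below shows one way to route between them. The tier names come from this document, but the function `next_step`, its inputs, and both thresholds are illustrative assumptions, not the converter's actual logic.

```python
def next_step(extracted_chars: int, attempts: int,
              min_chars: int = 200, max_attempts: int = 2) -> str:
    """Pick the next tier after a conversion attempt.

    extracted_chars: size of the extracted markdown or CSV (0 if none).
    attempts: how many deterministic conversions have been tried so far.
    min_chars and max_attempts are assumed thresholds for illustration.
    """
    if extracted_chars >= min_chars:
        # The deterministic route produced something worth reading.
        return "read_extracted_artifact"
    if attempts < max_attempts:
        # Extraction was empty or too thin; escalate one tier.
        return "cheap_model_or_stronger_converter"
    # Cheaper options are exhausted; hand the decision back to the user.
    return "manual_review"


if __name__ == "__main__":
    print(next_step(extracted_chars=5000, attempts=1))
    print(next_step(extracted_chars=0, attempts=1))
    print(next_step(extracted_chars=0, attempts=2))
```

Keeping the decision in a small pure function makes the escalation order easy to test and to adjust when the thresholds turn out to be wrong for a given file type.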