Convert local documents such as PDF, DOCX, PPTX, XLSX, XLS, HTML, text files, and Outlook messages into Markdown reference files using MarkItDown. Use when Codex needs to preserve source documents as local Markdown for dataset building, reference capture, provenance review, or later analysis, especially when the output should include source metadata and a conversion timestamp.
Use this skill to turn source documents into project-local Markdown references with a small provenance header. Prefer it when collecting supporting material for dataset work, especially if the document should be reviewed later by humans or LLMs without reopening the original binary file.
This skill is pre-installed in the project's Docker environment. When the autonomous loop runs via docker compose up, the document-to-markdown dependencies (MarkItDown, Python 3.12) are already available to the agent.
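Outside the Docker environment it can help to confirm the same dependencies before converting anything. The following is a minimal sketch, not part of the skill itself: the helper name missing_requirements is hypothetical, and treating Python 3.12 as a minimum version and markitdown as the importable module name are assumptions.

```python
import importlib.util
import sys
from typing import List, Tuple


def missing_requirements(
    min_python: Tuple[int, int] = (3, 12),  # assumption: 3.12 treated as a minimum
    module: str = "markitdown",             # assumption: top-level import name
) -> List[str]:
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ expected, found {found}")
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module) is None:
        problems.append(f"module {module!r} is not importable")
    return problems
```

Calling missing_requirements() with no arguments checks the defaults; printing the returned list gives a quick readiness report before running the converter.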
Convert one file into artifacts/markdownified/:
python skills/document-to-markdown/scripts/convert_document.py \
path/to/source.pdf \
--output-dir artifacts/markdownified
Include a source URL when the file came from the web:
python skills/document-to-markdown/scripts/convert_document.py \
downloads/report.docx \
--output-dir artifacts/markdownified \
--source-url https://example.org/report.docx
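When several files need converting, the two commands above can be assembled programmatically. This is a rough sketch under stated assumptions: build_command and convert_all are hypothetical helpers, and the only flags used are the ones shown above (--output-dir and --source-url).

```python
import subprocess
import sys
from pathlib import Path
from typing import Iterable, List, Optional, Tuple

# Script path as it appears in the commands above.
SCRIPT = Path("skills/document-to-markdown/scripts/convert_document.py")


def build_command(
    source: Path, output_dir: Path, source_url: Optional[str] = None
) -> List[str]:
    """Assemble the argument list for one conversion (hypothetical helper)."""
    cmd = [sys.executable, str(SCRIPT), str(source), "--output-dir", str(output_dir)]
    if source_url:
        # Only pass --source-url when the file actually came from the web.
        cmd += ["--source-url", source_url]
    return cmd


def convert_all(
    items: Iterable[Tuple[Path, Optional[str]]], output_dir: Path
) -> None:
    """Run the converter once per (source, source_url) pair."""
    for source, source_url in items:
        # check=True raises CalledProcessError if a conversion fails.
        subprocess.run(build_command(source, output_dir, source_url), check=True)
```

For example, convert_all([(Path("downloads/report.docx"), "https://example.org/report.docx")], Path("artifacts/markdownified")) would issue the same call as the second command above.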
1. Save or locate the source document in downloads/ or another non-source location.
2. Run scripts/convert_document.py on the local file.
3. Write the Markdown output to artifacts/markdownified/ unless the user wants a different location.
4. Pass --source-url whenever the local document was downloaded from a public source.

The converter prepends YAML frontmatter with:

- source_file
- source_url when provided
- converted_at_utc
- converter

Keep that header intact so later dataset work can recover provenance and conversion timing.
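To make the header concrete, here is a minimal sketch of writing and reading such frontmatter. It is an illustration rather than the script's actual implementation: the function names, the timestamp format, and the converter value "markitdown" are assumptions. Only the four field names come from the description above. Values are written as JSON strings, which are also valid YAML double-quoted scalars.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Optional


def build_frontmatter(
    source_file: str,
    converter: str,
    source_url: Optional[str] = None,
    now: Optional[datetime] = None,
) -> str:
    """Return a YAML frontmatter block followed by a blank line."""
    now = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
    fields = {"source_file": source_file}
    if source_url:
        fields["source_url"] = source_url  # omitted when no URL is known
    # Assumed timestamp format: ISO 8601 with a trailing Z for UTC.
    fields["converted_at_utc"] = now.strftime("%Y-%m-%dT%H:%M:%SZ")
    fields["converter"] = converter
    lines = ["---"]
    lines += [f"{key}: {json.dumps(value)}" for key, value in fields.items()]
    lines.append("---")
    return "\n".join(lines) + "\n\n"


def read_frontmatter(markdown: str) -> Dict[str, str]:
    """Recover the fields written by build_frontmatter; {} if no header."""
    lines = markdown.splitlines()
    if not lines or lines[0] != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line == "---":  # closing delimiter ends the header
            break
        key, _, value = line.partition(": ")
        fields[key] = json.loads(value)
    return fields
```

The reader handles only the flat, quoted key-value layout produced by this sketch. A general YAML parser would be the safer choice for headers written by other tools.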
Use scripts/convert_document.py for deterministic conversion. It:
- writes to --output or a derived file path under --output-dir
- leaves an existing output file untouched unless --force is set

If the document type is unsupported or conversion quality is poor, keep the original file and note the limitation rather than silently rewriting the Markdown by hand.
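The output-path and overwrite behaviour can be sketched as follows. This is an assumption-laden illustration, not the script's code: resolve_output_path and write_markdown are hypothetical names, deriving the file name as the source stem plus .md is a guess at the derivation rule, and raising FileExistsError is one plausible way to protect existing output when force is not set.

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(
    source: Path,
    output: Optional[Path] = None,
    output_dir: Optional[Path] = None,
) -> Path:
    """Prefer an explicit output path; otherwise derive one under output_dir."""
    if output is not None:
        return output
    if output_dir is None:
        raise ValueError("pass either output or output_dir")
    # Assumed derivation rule: same base name with a .md extension.
    return output_dir / (source.stem + ".md")


def write_markdown(target: Path, text: str, force: bool = False) -> Path:
    """Write text to target, refusing to overwrite unless force is True."""
    if target.exists() and not force:
        raise FileExistsError(f"{target} already exists; pass force=True to overwrite")
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target
```

Failing loudly on an existing file keeps earlier conversions, and their converted_at_utc timestamps, from being replaced by accident.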
If conversion problems recur and a better structured-document pipeline is needed, consider evaluating Docling as a heavier fallback. It is not the default here because the dependency stack is much larger than the current MarkItDown-based path.
Edit PDFs with natural-language instructions using the nano-pdf CLI.