Convert DOCX files into markdown while preserving headings, lists, tables, metadata, and extracted images.
Convert DOCX files to well-formatted markdown files with images extracted.
python3 docx_to_markdown.py "<docx_path>" -o "<output_path>"
_files_/ folder with document prefixoutput_dir/
├── document.docx # Original file
├── document.md # Extracted markdown
└── _files_/ # Images folder
├── DocTitle_image1.png
├── DocTitle_figure2.jpg
└── ...
Images are extracted with a document prefix derived from the title:
DieEmptyUnleash (first 3 words, special chars removed)_files_/DieEmptyUnleash_image1.pngThis prevents filename collisions when extracting multiple documents to the same folder.
---