Overview

A toolkit for processing Microsoft Word documents: it extracts text, tables, headings, and metadata from .docx files. It is useful for analyzing scientific manuscripts, grant applications, protocols, and supplementary documents shared in Word format.

The toolkit uses python-docx for structured extraction and preserves the document hierarchy (headings, paragraphs, tables) so that downstream semantic analysis can rely on it.
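
To make the idea of structured extraction concrete, here is a minimal sketch of how headings, paragraphs, tables, and metadata could be pulled out with python-docx. It is not the code of docx_extract.py: the function names (heading_level, extract_structure, extract_file) and the shape of the returned dictionary are assumptions made for illustration, and heading detection assumes Word's built-in "Heading N" style names.

```python
"""Illustrative sketch only; not the actual docx_extract.py implementation."""


def heading_level(style_name):
    """Return N for a built-in 'Heading N' style name, otherwise None."""
    prefix = "Heading "
    if style_name and style_name.startswith(prefix):
        suffix = style_name[len(prefix):]
        if suffix.isdigit():
            return int(suffix)
    return None


def extract_structure(doc):
    """Collect headings, body paragraphs, tables, and core metadata.

    `doc` is expected to behave like a python-docx Document, i.e. to
    expose .paragraphs, .tables, and .core_properties.
    """
    headings, paragraphs = [], []
    for para in doc.paragraphs:
        text = para.text.strip()
        if not text:
            continue  # skip empty paragraphs
        level = heading_level(para.style.name)
        if level is not None:
            headings.append({"level": level, "text": text})
        else:
            paragraphs.append(text)

    # Each table becomes a list of rows; each row is a list of cell texts.
    tables = [
        [[cell.text.strip() for cell in row.cells] for row in table.rows]
        for table in doc.tables
    ]

    props = doc.core_properties
    metadata = {
        "author": props.author,
        "created": props.created,
        "revision": props.revision,
    }
    return {
        "headings": headings,
        "paragraphs": paragraphs,
        "tables": tables,
        "metadata": metadata,
    }


def extract_file(path):
    """Open a .docx file with python-docx and extract its structure."""
    from docx import Document  # lazy import; requires python-docx

    return extract_structure(Document(path))
```

Calling extract_file("/path/to/manuscript.docx") would return a dictionary with the four keys above. Note that this sketch collects paragraphs and tables separately, so it does not record where a table sits relative to the surrounding text.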

Usage

# Extract everything from a .docx file
python3 skills/docx/scripts/docx_extract.py --file /path/to/manuscript.docx

# Extract only headings (document structure)
python3 skills/docx/scripts/docx_extract.py --file /path/to/protocol.docx --extract headings

# Extract tables only (supplementary data)
python3 skills/docx/scripts/docx_extract.py --file /path/to/supplementary.docx --extract tables

# Extract metadata (author, date, revision)
python3 skills/docx/scripts/docx_extract.py --file /path/to/grant.docx --extract metadata
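
When the script is driven from Python, for example to process many files, the commands above can be assembled programmatically. The helper below is a hypothetical sketch: build_command and EXTRACT_MODES are names introduced here, and the list of modes contains only the --extract values shown in the usage examples (the script may accept others).

```python
# Hypothetical helper; only the flags shown in the usage examples are used.
SCRIPT = "skills/docx/scripts/docx_extract.py"

# Assumption: these are the --extract values that appear in the examples.
EXTRACT_MODES = ("headings", "tables", "metadata")


def build_command(docx_path, extract=None):
    """Return the argument list for one docx_extract.py invocation.

    With extract=None the command extracts everything, matching the
    first usage example.
    """
    if extract is not None and extract not in EXTRACT_MODES:
        raise ValueError(f"unsupported --extract value: {extract!r}")
    argv = ["python3", SCRIPT, "--file", str(docx_path)]
    if extract is not None:
        argv += ["--extract", extract]
    return argv
```

The returned list can be passed to subprocess.run(argv, check=True). Building a list rather than a shell string avoids quoting problems with paths that contain spaces.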

Output Format
