Overview

A .docx file is a ZIP archive containing XML files.
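Because the container is an ordinary ZIP file, its part names can be read straight from the ZIP central directory. The following is a minimal, dependency-free Node.js sketch; the function name `listZipEntries` is hypothetical, and it assumes a non-ZIP64 archive whose entry names are ASCII or UTF-8.

```javascript
// Minimal sketch: list the entry names of a ZIP archive (such as a .docx)
// by reading its central directory. Assumes a non-ZIP64 archive.
function listZipEntries(buf) {
  const EOCD_SIG = 0x06054b50; // End Of Central Directory record
  const CDH_SIG = 0x02014b50;  // central directory file header

  // The EOCD record is 22 bytes plus an optional comment of up to 65535
  // bytes, so scan backwards from the end of the buffer for its signature.
  let eocd = -1;
  const stop = Math.max(0, buf.length - 22 - 0xffff);
  for (let i = buf.length - 22; i >= stop; i--) {
    if (buf.readUInt32LE(i) === EOCD_SIG) { eocd = i; break; }
  }
  if (eocd < 0) throw new Error("not a ZIP archive (no EOCD record found)");

  const count = buf.readUInt16LE(eocd + 10); // total number of entries
  let offset = buf.readUInt32LE(eocd + 16);  // where the central directory starts

  const names = [];
  for (let n = 0; n < count; n++) {
    if (buf.readUInt32LE(offset) !== CDH_SIG) throw new Error("corrupt central directory");
    const nameLen = buf.readUInt16LE(offset + 28);
    const extraLen = buf.readUInt16LE(offset + 30);
    const commentLen = buf.readUInt16LE(offset + 32);
    names.push(buf.toString("utf8", offset + 46, offset + 46 + nameLen));
    // Skip the fixed 46-byte header plus the variable-length fields.
    offset += 46 + nameLen + extraLen + commentLen;
  }
  return names;
}
```

On a typical Word-produced file, `listZipEntries(fs.readFileSync("document.docx"))` should include `[Content_Types].xml`, `_rels/.rels` and `word/document.xml`. Unpacking with `unpack.py` is still the practical route when you need the XML itself.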

Quick Reference

Task	Approach
Read/analyze content	`pandoc` or unpack for raw XML
Create new document	Use `docx-js` - see Creating New Documents below
Edit existing document	Unpack → edit XML → repack - see Editing Existing Documents below

Converting .doc to .docx

Legacy .doc files must be converted before editing:

python scripts/office/soffice.py --headless --convert-to docx document.doc
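A file's extension does not always match its contents, so it can help to check the leading bytes before deciding whether conversion is needed. This sketch (the `detectWordFormat` helper is hypothetical) relies on two well-known signatures: ZIP archives begin with `PK\x03\x04`, and legacy .doc files are OLE2 compound files beginning with `D0 CF 11 E0 A1 B1 1A E1`. Other formats share these signatures (any ZIP file; .xls and .ppt for OLE2), so a match is a hint rather than proof.

```javascript
// Minimal sketch: tell a legacy .doc file from a .docx by its leading bytes.
//  - .docx is a ZIP archive, which starts with "PK\x03\x04".
//  - Legacy .doc is an OLE2 Compound File, whose header starts with
//    D0 CF 11 E0 A1 B1 1A E1.
const ZIP_MAGIC = Buffer.from([0x50, 0x4b, 0x03, 0x04]);
const OLE2_MAGIC = Buffer.from([0xd0, 0xcf, 0x11, 0xe0, 0xa1, 0xb1, 0x1a, 0xe1]);

function detectWordFormat(header) {
  if (header.length >= OLE2_MAGIC.length && header.subarray(0, 8).equals(OLE2_MAGIC)) {
    return "doc"; // legacy binary format: convert before editing
  }
  if (header.length >= ZIP_MAGIC.length && header.subarray(0, 4).equals(ZIP_MAGIC)) {
    return "docx"; // ZIP container (could still be another ZIP-based format)
  }
  return "unknown";
}
```

For example, `detectWordFormat(fs.readFileSync("document.doc").subarray(0, 8))` returns `"doc"` for a genuine legacy Word file, which is the case the conversion command above is for.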

Reading Content

# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md

# Raw XML access
python scripts/office/unpack.py document.docx unpacked/
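After unpacking, the body text lives in `unpacked/word/document.xml` (assuming the default part name), inside `<w:t>` elements grouped into `<w:p>` paragraphs. The regex-based sketch below, with hypothetical function names, is only meant to show that structure; pandoc remains the better tool for real extraction, since it handles tables, lists, and tracked changes.

```javascript
// Naive sketch: pull paragraph text out of word/document.xml.
// Regex-based, so it ignores tables, fields, footnotes, and tracked-change
// semantics; it only reads <w:t> runs and splits paragraphs on </w:p>.
const ENTITIES = { amp: "&", lt: "<", gt: ">", quot: '"', apos: "'" };

// Decode the five predefined XML entities (numeric references are left as-is).
function decodeXml(s) {
  return s.replace(/&(amp|lt|gt|quot|apos);/g, (_, name) => ENTITIES[name]);
}

function paragraphsFromDocumentXml(xml) {
  const paragraphs = [];
  for (const chunk of xml.split("</w:p>")) {
    // Matches <w:t> or <w:t xml:space="preserve">, but not <w:tab/> or <w:tbl>.
    const runs = [...chunk.matchAll(/<w:t(?:\s[^>]*)?>([^<]*)<\/w:t>/g)];
    // Paragraphs without any text are skipped.
    if (runs.length > 0) paragraphs.push(runs.map((m) => decodeXml(m[1])).join(""));
  }
  return paragraphs;
}
```

`paragraphsFromDocumentXml(fs.readFileSync("unpacked/word/document.xml", "utf8"))` returns one string per non-empty paragraph. Text inside `<w:delText>` (tracked deletions) is not matched, while tracked insertions come through as ordinary text.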

Converting to Images

# Convert to PDF, then render each page as a JPEG at 150 DPI
python scripts/office/soffice.py --headless --convert-to pdf document.docx
pdftoppm -jpeg -r 150 document.pdf page
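When rendering pages from a Node.js script, the two commands can be wrapped in a function. This sketch assumes `python` and `pdftoppm` are on `PATH` and that `soffice.py` writes `document.pdf` into the current directory, as the commands above imply; `renderPages` and its options are hypothetical. The command runner is a parameter so the sequence can be checked without launching LibreOffice.

```javascript
const path = require("path");
const { execFileSync } = require("child_process");

// Sketch: DOCX -> PDF -> one JPEG per page, using the same two commands.
// Assumes the PDF lands in the current directory as <basename>.pdf.
function renderPages(docxPath, { dpi = 150, prefix = "page", run = defaultRun } = {}) {
  const pdfPath = path.basename(docxPath, path.extname(docxPath)) + ".pdf";
  run("python", ["scripts/office/soffice.py", "--headless", "--convert-to", "pdf", docxPath]);
  run("pdftoppm", ["-jpeg", "-r", String(dpi), pdfPath, prefix]);
  return pdfPath;
}

// Default runner: execute the command for real and show its output.
function defaultRun(cmd, args) {
  execFileSync(cmd, args, { stdio: "inherit" });
}
```

`renderPages("document.docx", { dpi: 100 })` runs both steps with real processes. The JPEG names come from the `page` prefix, with `pdftoppm` appending the page number.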

Accepting Tracked Changes

# Write a copy of input.docx with all tracked changes accepted
python scripts/accept_changes.py input.docx output.docx
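At the XML level, accepting a tracked change means dropping the content of each `<w:del>` element and keeping the runs inside each `<w:ins>` element without the wrapper. The naive sketch below illustrates only that idea: it is not necessarily how `accept_changes.py` works, the function name is hypothetical, and it deliberately ignores moves, formatting changes, and deleted paragraph marks, so use the script for real documents.

```javascript
// Naive sketch of accepting run-level tracked changes in document.xml text.
// Ignores moves (<w:moveFrom>/<w:moveTo>) and formatting changes
// (<w:rPrChange>, <w:pPrChange>). Self-closing <w:ins/>/<w:del/> markers on
// paragraph marks are simply dropped; paragraphs are NOT merged.
function acceptRunLevelChanges(xml) {
  return xml
    // Self-closing markers. \b keeps <w:insideH/> and <w:delText> untouched.
    .replace(/<w:(?:ins|del)\b[^>]*\/>/g, "")
    // Whole deletions, including the <w:delText> they contain.
    .replace(/<w:del\b[^>]*>[\s\S]*?<\/w:del>/g, "")
    // Unwrap insertions, keeping the runs inside them.
    .replace(/<\/?w:ins\b[^>]*>/g, "");
}
```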

Creating New Documents

Setup

const fs = require('fs');
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
        Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
        InternalHyperlink, Bookmark, FootnoteReferenceRun, PositionalTab,
        PositionalTabAlignment, PositionalTabRelativeTo, PositionalTabLeader,
        TabStopType, TabStopPosition, Column, SectionType,
        TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
        VerticalAlign, PageNumber, PageBreak } = require('docx');

// Build a document with one section, then serialize it and write it to disk
const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));

Validation

# Validate the generated file
python scripts/office/validate.py doc.docx
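Before running the full validator, a cheap sanity check can confirm that the package contains the parts Word needs. The sketch takes a list of ZIP entry names (for example from an unpacked directory listing or any ZIP tool) and reports which expected parts are missing. `missingParts` and the exact part list are assumptions, and passing this check says nothing about whether the XML itself is valid.

```javascript
// Quick pre-check sketch, not a substitute for validate.py.
// [Content_Types].xml is required by Open Packaging Conventions, and Word
// finds the main document part through _rels/.rels. word/document.xml is
// only the conventional name of that main part.
const EXPECTED_PARTS = ["[Content_Types].xml", "_rels/.rels", "word/document.xml"];

function missingParts(entryNames) {
  const present = new Set(entryNames);
  return EXPECTED_PARTS.filter((part) => !present.has(part));
}
```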

Page Size

// CRITICAL: docx-js defaults to A4, not US Letter.
// Always set the page size explicitly for consistent results.
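Since docx-js defaults to A4, a US Letter document has to spell out its page size. The sketch below does that in the units Word uses: page size and margins are measured in twips (1/1440 inch), so US Letter is 12240 by 15840 and A4 is 11906 by 16838. The constant names and the 1-inch margins are illustrative choices, not library defaults.

```javascript
// Sketch: explicit US Letter page settings for docx-js.
// 1 inch = 1440 twips (twentieths of a point).
const TWIPS_PER_INCH = 1440;
const inches = (n) => Math.round(n * TWIPS_PER_INCH);

// Shape assumed to match docx-js section properties (properties.page).
const US_LETTER_PAGE = {
  size: { width: inches(8.5), height: inches(11) }, // 12240 x 15840 twips
  margin: { top: inches(1), right: inches(1), bottom: inches(1), left: inches(1) },
};

// For comparison: A4 (210 x 297 mm), the size used when none is given.
const A4_PAGE_SIZE = { width: 11906, height: 16838 };
```

In recent docx-js versions the object is passed per section, e.g. `new Document({ sections: [{ properties: { page: US_LETTER_PAGE }, children: [/* content */] }] })`.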

DOCX creation, editing, and analysis | Skills Pool
