Use this skill whenever the user wants to create, read, edit, or manipulate Word documents (.docx files). Triggers include: any mention of 'Word doc', 'word document', '.docx', or requests to produce professional documents with formatting like tables of contents, headings, page numbers, or letterheads. Also use when extracting or reorganizing content from .docx files, inserting or replacing images in documents, performing find-and-replace in Word files, working with tracked changes or comments, or converting content into a polished Word document. If the user asks for a 'report', 'memo', 'letter', 'template', or similar deliverable as a Word or .docx file, use this skill. Do NOT use for PDFs, spreadsheets, Google Docs, or general coding tasks unrelated to document generation.
A .docx file is a ZIP archive containing XML files.
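To make the container format concrete, here is a minimal Python sketch that uses only the standard library. It builds a tiny in-memory archive that mimics the layout of a .docx, lists its parts, and reads the paragraph text back. The part name `word/document.xml` and the WordprocessingML namespace are standard; the sample text and the helper names `build_sample` and `paragraph_texts` are illustrative assumptions. A real document carries more parts, such as `[Content_Types].xml` and relationship files, so this sample archive is not expected to open in Word.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_sample() -> bytes:
    """Build a tiny ZIP that mimics the layout of a .docx (illustrative only)."""
    body = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second </w:t></w:r><w:r><w:t>paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", body)
    return buf.getvalue()


def paragraph_texts(docx_bytes: bytes) -> list[str]:
    """Return the text of each <w:p>, joining the <w:t> runs inside it."""
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    ns = {"w": W_NS}
    return [
        "".join(t.text or "" for t in p.iterfind(".//w:t", ns))
        for p in root.iterfind(".//w:p", ns)
    ]


sample = build_sample()
with zipfile.ZipFile(io.BytesIO(sample)) as zf:
    part_names = zf.namelist()  # names of the XML parts inside the archive
texts = paragraph_texts(sample)
print(part_names)
print(texts)
```

Passing the bytes of a real .docx to `paragraph_texts` should work the same way, although text inside tracked deletions, fields, or text boxes would need extra handling.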
| Task | Approach |
|---|---|
| Read/analyze content | pandoc or unpack for raw XML |
| Create new document | Use docx-js - see Creating New Documents below |
| Edit existing document | Unpack → edit XML → repack - see Editing Existing Documents below |
| Convert to PDF | Layered fallback: 1) Native tools (macOS: textutil+cupsfilter) 2) pandoc+wkhtmltopdf 3) LibreOffice - see Dependencies below |
Legacy .doc files must be converted before editing:
python scripts/office/soffice.py --headless --convert-to docx document.doc
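File extensions are not always trustworthy, so it can help to check the container type before deciding whether conversion is needed. The sketch below is one possible check and is not part of the skill's scripts: it compares the first bytes of a file against the ZIP local-file signature and the OLE2 compound-file signature used by legacy .doc files. The function names and return labels are hypothetical, and a ZIP signature only shows that the file is a ZIP archive, not that it is a valid .docx.

```python
# Signature of the OLE2 compound file format used by legacy .doc files
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header that starts a non-empty ZIP archive such as a .docx
ZIP_MAGIC = b"PK\x03\x04"


def sniff_word_format(header: bytes) -> str:
    """Classify a file by its leading bytes: 'zip', 'ole2', or 'unknown'."""
    if header.startswith(ZIP_MAGIC):
        return "zip"
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    return "unknown"


def sniff_file(path: str) -> str:
    """Read the first 8 bytes of a file and classify them."""
    with open(path, "rb") as fh:
        return sniff_word_format(fh.read(8))


print(sniff_word_format(b"PK\x03\x04" + b"\x00" * 4))
print(sniff_word_format(OLE2_MAGIC))
```

A result of `ole2` suggests the file should go through the conversion command above before any XML editing is attempted.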
# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md
# Raw XML access
python scripts/office/unpack.py document.docx unpacked/
Use the layered fallback strategy (see the Dependencies section), trying each option in order:
# Priority 1: macOS native tools (fastest; converts via plain text, so formatting is lost)
textutil -convert txt document.docx -output temp.txt
cupsfilter temp.txt > output.pdf
# Priority 2: pandoc + wkhtmltopdf (lightweight)
pandoc document.docx -o output.pdf --pdf-engine=wkhtmltopdf
# Priority 3: LibreOffice (fallback)
python scripts/office/soffice.py --headless --convert-to pdf document.docx
# First convert to PDF (use strategy above), then:
pdftoppm -jpeg -r 150 document.pdf page
To produce a clean document with all tracked changes accepted (requires LibreOffice):
python scripts/accept_changes.py input.docx output.docx
npm install -g docx (for creating new documents)
Priority 1: Platform-Native Tools (Fastest, No Installation)
macOS:
# Check availability
which textutil && which cupsfilter
# Convert docx → txt → pdf
textutil -convert txt document.docx -output temp.txt
cupsfilter temp.txt > output.pdf
Windows: Check for Microsoft Print to PDF (system built-in)
Linux: Check for system tools
Priority 2: Lightweight Cross-Platform (Recommended)
pandoc + wkhtmltopdf (~100MB total):
# Check availability
which pandoc && which wkhtmltopdf
# Convert
pandoc document.docx -o output.pdf --pdf-engine=wkhtmltopdf
Installation:
Priority 3: LibreOffice (Fallback, ~300-500MB)
Only download if native tools and pandoc are unavailable:
macOS:
# Check if user has LibreOffice installed
if [ -d "/Applications/LibreOffice.app" ]; then
/Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to pdf document.docx
else
echo "LibreOffice not found. Download from: https://www.libreoffice.org/download/download/"
fi
Windows:
# Check installation
where soffice
# If not found, download portable version:
# https://www.libreoffice.org/download/portable-versions/
Linux:
# Check installation
which soffice
# Install if needed
sudo apt-get install libreoffice # Debian/Ubuntu
sudo yum install libreoffice # RHEL/CentOS
Priority 1: User's Microsoft Word (If Available)
Priority 2: LibreOffice (Required)
This is the most reliable open-source solution for .doc conversion. Follow LibreOffice installation above.
Note: .doc is the legacy binary format (Word 97-2003). Most modern files are .docx, the default since Word 2007.
Requires LibreOffice. Follow installation above.
Before attempting conversion, check available tools:
# Check platform
OS=$(uname -s)
# Check for native tools (macOS)
if [ "$OS" = "Darwin" ]; then
if command -v textutil &> /dev/null && command -v cupsfilter &> /dev/null; then
echo "✓ macOS native tools available"
exit 0
fi
fi
# Check for pandoc + wkhtmltopdf
if command -v pandoc &> /dev/null && command -v wkhtmltopdf &> /dev/null; then
echo "✓ pandoc + wkhtmltopdf available"
exit 0
fi
# Check for LibreOffice
if command -v soffice &> /dev/null; then
echo "✓ LibreOffice available"
exit 0
fi
echo "✗ No PDF conversion tools found"
echo "Recommended: Install pandoc + wkhtmltopdf (~100MB)"
exit 1
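The same layered check can be written as a function that returns the chosen strategy instead of exiting, which is easier to call from other code. The following Python sketch is a hedged rendering of the script above, not part of the skill's scripts: `pick_pdf_converter`, `fake_which`, and the returned labels are hypothetical names, and the `which` parameter exists only so the tool lookup can be swapped out when testing.

```python
import platform
import shutil
from typing import Callable, Optional


def pick_pdf_converter(
    system: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return a label for the first available strategy, or None if none is found."""
    system = system or platform.system()  # "Darwin" on macOS, like `uname -s`

    def has(*tools: str) -> bool:
        # A strategy is usable only if every tool it needs is on PATH
        return all(which(tool) is not None for tool in tools)

    if system == "Darwin" and has("textutil", "cupsfilter"):
        return "macos-native"
    if has("pandoc", "wkhtmltopdf"):
        return "pandoc-wkhtmltopdf"
    if has("soffice"):
        return "libreoffice"
    return None


def fake_which(installed: set) -> Callable[[str], Optional[str]]:
    """Build a stand-in for shutil.which that pretends only `installed` exists."""
    return lambda tool: f"/usr/bin/{tool}" if tool in installed else None


# Uses the real PATH; the result depends on the machine running it
print(pick_pdf_converter())
```

Keeping the priority order in a single function means the caller only has to map the returned label to the matching command from the sections above.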
Generate .docx files with JavaScript, then validate. Install: npm install -g docx
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
InternalHyperlink, Bookmark, FootnoteReferenceRun, PositionalTab,
PositionalTabAlignment, PositionalTabRelativeTo, PositionalTabLeader,
TabStopType, TabStopPosition, Column, SectionType,
TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
VerticalAlign, PageNumber, PageBreak } = require('docx');
const fs = require('fs'); // needed for writeFileSync below
const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
After creating the file, validate it. If validation fails, unpack, fix the XML, and repack.
python scripts/office/validate.py doc.docx
// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results