Name: Overview
Author: best


Reading content:

```
# Text extraction with tracked changes
pandoc --track-changes=all document.docx -o output.md

# Raw XML access
python scripts/office/unpack.py document.docx unpacked/
```

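Converting to PDF:

```
# Priority 1: macOS native tools (fastest; goes through plain text, so formatting is not preserved)
textutil -convert txt document.docx -output temp.txt
cupsfilter temp.txt > output.pdf

# Priority 2: pandoc + wkhtmltopdf (lightweight)
# pandoc defaults to a LaTeX engine for PDF output, so name wkhtmltopdf explicitly
pandoc document.docx -o output.pdf --pdf-engine=wkhtmltopdf

# Priority 3: LibreOffice (fallback)
python scripts/office/soffice.py --headless --convert-to pdf document.docx
```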

Converting to images:

```
# First convert to PDF (use the strategy above), then:
pdftoppm -jpeg -r 150 document.pdf page
```
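The `-r 150` flag sets the rendering resolution in DPI, so the image size follows from the page size in inches. A minimal sketch of that arithmetic, assuming a US Letter page; `pagePixels` is a hypothetical helper, and pdftoppm's own rounding may differ by a pixel:

```javascript
// Hypothetical helper: approximate raster size of a page rendered at a given DPI.
function pagePixels(widthIn, heightIn, dpi) {
  return {
    width: Math.round(widthIn * dpi),
    height: Math.round(heightIn * dpi),
  };
}

// US Letter (8.5 x 11 in) at the 150 DPI used in the command above.
const letter = pagePixels(8.5, 11, 150);
console.log(letter); // { width: 1275, height: 1650 }
```

Raising the resolution increases both dimensions linearly, so file size grows roughly with the square of the DPI.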

Accepting tracked changes:

```
python scripts/accept_changes.py input.docx output.docx
```

Native system tools (priority 1):

macOS:
```
# Check availability
which textutil && which cupsfilter

# Convert docx → txt → pdf
textutil -convert txt document.docx -output temp.txt
cupsfilter temp.txt > output.pdf
```

Windows: Check for Microsoft Print to PDF (system built-in)
Linux: Check for system tools

pandoc + wkhtmltopdf (~100MB total):
```
# Check availability
which pandoc && which wkhtmltopdf

# Convert
pandoc document.docx -o output.pdf --pdf-engine=wkhtmltopdf
```
Installation:
- pandoc: https://pandoc.org/installing.html
- wkhtmltopdf: https://wkhtmltopdf.org/downloads.html

LibreOffice (fallback):

macOS:
```
# Check if user has LibreOffice installed
if [ -d "/Applications/LibreOffice.app" ]; then
  /Applications/LibreOffice.app/Contents/MacOS/soffice --headless --convert-to pdf document.docx
else
  echo "LibreOffice not found. Download from: https://www.libreoffice.org/download/download/"
fi
```

Windows:
```
# Check installation
where soffice

# If not found, download portable version:
# https://www.libreoffice.org/download/portable-versions/
```

Linux:
```
# Check installation
which soffice

# Install if needed
sudo apt-get install libreoffice  # Debian/Ubuntu
sudo yum install libreoffice      # RHEL/CentOS
```

Dependency check script:

```
# Check platform
OS=$(uname -s)

# Check for native tools (macOS)
if [ "$OS" = "Darwin" ]; then
  if command -v textutil >/dev/null 2>&1 && command -v cupsfilter >/dev/null 2>&1; then
    echo "✓ macOS native tools available"
    exit 0
  fi
fi

# Check for pandoc + wkhtmltopdf
if command -v pandoc >/dev/null 2>&1 && command -v wkhtmltopdf >/dev/null 2>&1; then
  echo "✓ pandoc + wkhtmltopdf available"
  exit 0
fi

# Check for LibreOffice
if command -v soffice >/dev/null 2>&1; then
  echo "✓ LibreOffice available"
  exit 0
fi

# Nothing usable was found: report and exit non-zero
echo "✗ No PDF conversion tools found"
echo "Recommended: Install pandoc + wkhtmltopdf (~100MB)"
exit 1
```
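The same priority order can be expressed as a pure function, which is easier to unit-test than a shell script. This is only a sketch: `pickConverter` and its return labels are hypothetical names, and the caller is assumed to have already collected the set of commands found on `PATH`.

```javascript
// Hypothetical helper mirroring the shell script's order of checks.
// `platform` is a Node-style platform string (e.g. process.platform),
// `available` is a Set of command names found on PATH.
function pickConverter(platform, available) {
  const has = (cmd) => available.has(cmd);

  // Priority 1: macOS native tools
  if (platform === 'darwin' && has('textutil') && has('cupsfilter')) {
    return 'native';
  }
  // Priority 2: pandoc + wkhtmltopdf
  if (has('pandoc') && has('wkhtmltopdf')) {
    return 'pandoc+wkhtmltopdf';
  }
  // Priority 3: LibreOffice
  if (has('soffice')) {
    return 'libreoffice';
  }
  return null; // no PDF conversion tool found
}

console.log(pickConverter('linux', new Set(['pandoc', 'wkhtmltopdf', 'soffice'])));
// prints "pandoc+wkhtmltopdf"
```

Returning `null` instead of throwing lets the caller decide whether a missing converter is fatal, which matches the script's non-zero exit code.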

Creating new documents (setup):

```
const fs = require('fs'); // needed for writeFileSync below
const { Document, Packer, Paragraph, TextRun, Table, TableRow, TableCell, ImageRun,
        Header, Footer, AlignmentType, PageOrientation, LevelFormat, ExternalHyperlink,
        InternalHyperlink, Bookmark, FootnoteReferenceRun, PositionalTab,
        PositionalTabAlignment, PositionalTabRelativeTo, PositionalTabLeader,
        TabStopType, TabStopPosition, Column, SectionType,
        TableOfContents, HeadingLevel, BorderStyle, WidthType, ShadingType,
        VerticalAlign, PageNumber, PageBreak } = require('docx');

// Build the document, then serialize it to a .docx file
const doc = new Document({ sections: [{ children: [/* content */] }] });
Packer.toBuffer(doc).then(buffer => fs.writeFileSync("doc.docx", buffer));
```

Validation:

```
python scripts/office/validate.py doc.docx
```

Page size:

```
// CRITICAL: docx-js defaults to A4, not US Letter
// Always set page size explicitly for consistent results
```
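To make the page-size advice concrete, here is a minimal sketch of a US Letter section. It assumes docx-js takes page dimensions in DXA (twentieths of a point, 1440 per inch) under `properties.page.size`; `pageSizeDxa` and `letterSection` are hypothetical names introduced for illustration.

```javascript
const DXA_PER_INCH = 1440; // 1 inch = 72 points = 1440 twentieths of a point

// Hypothetical helper: page size in DXA for dimensions given in inches.
function pageSizeDxa(widthIn, heightIn) {
  return {
    width: Math.round(widthIn * DXA_PER_INCH),
    height: Math.round(heightIn * DXA_PER_INCH),
  };
}

// Section options for US Letter (8.5 x 11 in). Assumption: this object is
// passed as an element of `sections` in `new Document({ sections: [...] })`.
const letterSection = {
  properties: { page: { size: pageSizeDxa(8.5, 11) } },
  children: [/* content */],
};

console.log(letterSection.properties.page.size); // { width: 12240, height: 15840 }
```

Computing the size from inches keeps the intent readable and avoids hard-coding 12240 and 15840 in several places.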

Quick reference:

| Task | Approach |
|---|---|
| Read/analyze content | `pandoc` or unpack for raw XML |
| Create new document | Use `docx-js`; see Creating New Documents |
| Edit existing document | Unpack → edit XML → repack; see Editing Existing Documents |
| Convert to PDF | Layered fallback: 1) native tools (macOS: textutil + cupsfilter), 2) pandoc + wkhtmltopdf, 3) LibreOffice; see Dependencies & Installation |
