Use this skill for creating, editing, and reviewing DOCX files, including generation, formatting, content controls, tracked changes, comments, accessibility checks, redaction, rendering, and diff-based QA workflows.
Use this skill when you need to create or modify .docx files in this container environment and verify them visually.
You do not “know” a DOCX is satisfactory until you’ve rendered it and visually inspected page images.
DOCX text extraction (or reading XML) will miss layout defects: clipping, overlap, missing glyphs, broken tables, spacing drift, and header/footer issues.
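To illustrate why text extraction alone cannot catch layout defects, here is a minimal sketch using only the standard library. It builds two stand-in packages that contain just `word/document.xml` (an assumption made for brevity; these are not complete DOCX files) and differ only in paragraph indentation. The helper names `make_package` and `paragraph_texts` are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_package(indent_twips):
    """Build a stand-in ZIP holding only word/document.xml (not a full DOCX)."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        f'<w:p><w:pPr><w:ind w:left="{indent_twips}"/></w:pPr>'
        "<w:r><w:t>Quarterly totals</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    return buf.getvalue()


def paragraph_texts(package_bytes):
    """Return the visible text of each paragraph, ignoring all layout properties."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]


# One paragraph has no indent; the other is indented far beyond a normal page width.
normal = paragraph_texts(make_package(0))
pushed_off_page = paragraph_texts(make_package(20000))
print(normal == pushed_off_page)  # the extracted text cannot tell them apart
```

The two extractions are identical even though the second paragraph would likely render clipped or off the page, which is the kind of defect only a rendered image reveals.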
Shipping gate: before delivering any DOCX, you must:
Run render_docx.py to produce page-<N>.png images (optionally also a PDF with --emit_pdf).
If rendering fails, fix rendering first (LibreOffice profile/HOME) rather than guessing.
Rendered artifacts (PNGs and optional PDFs) are for internal QA only. Unless the user explicitly asks for intermediates, deliver only the requested final file (e.g., when the task asks for a DOCX, deliver the DOCX, not page images or PDFs).
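The first half of the shipping gate (confirming that page images exist before reviewing them) might be sketched as follows. This assumes the renderer writes files named `page-<N>.png` with a plain integer `N` into the output directory; the helper name `rendered_pages` is hypothetical and not part of the skill package.

```python
import re
from pathlib import Path

# Assumption: rendered pages are named page-<N>.png with an integer N.
PAGE_RE = re.compile(r"^page-(\d+)\.png$")


def rendered_pages(output_dir):
    """Return page images sorted by page number; raise if none were produced."""
    pages = []
    for path in Path(output_dir).iterdir():
        match = PAGE_RE.match(path.name)
        if match:
            pages.append((int(match.group(1)), path))
    if not pages:
        raise RuntimeError(f"no page-<N>.png files in {output_dir}; fix rendering first")
    pages.sort()  # numeric order, so page-10 comes after page-9
    return [path for _, path in pages]
```

Sorting numerically keeps the review order stable for documents longer than nine pages. The visual inspection itself still has to be done by looking at each image.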
For generating new documents or major rewrites/repackages, follow the design standards below unless the user explicitly requests otherwise. The user's instructions always take precedence; otherwise, adhere to these standards.
When creating the document design, do not compromise on the content or introduce factual/technical errors. Do not produce something that looks polished but is not actually what the user requested.
It is very important that the document is professional and aesthetically pleasing. As such, you should follow this general workflow to make your final delivered document:
While making and revising the DOCX, please adhere to and check against these quality reminders, to ensure the deliverable is visually high quality:
When the user asks to edit an existing document, preserve the original and make minimal, local changes:
# 1) Render any DOCX to PNGs (visual QA)
python render_docx.py input.docx --output_dir out
# 2) Remove reviewer comments (finalization)
python scripts/comments_strip.py input.docx --out no_comments.docx
# 3) Accept tracked changes (finalization)
python scripts/accept_tracked_changes.py input.docx --mode accept --out accepted.docx
# 4) Accessibility audit (+ optional safe fixes)
python scripts/a11y_audit.py input.docx
python scripts/a11y_audit.py input.docx --out_json a11y_report.json
python scripts/a11y_audit.py input.docx --fix_image_alt from_filename --out a11y_fixed.docx
# 5) Redact sensitive text (layout-preserving by default)
python scripts/redact_docx.py input.docx redacted.docx --emails --phones
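If the finalization commands above need to run back to back, one possible wrapper is sketched below. The script paths and flags are copied from the commands above; chaining them in this order (strip comments, then accept tracked changes on the result) is an assumption, and the function names are hypothetical.

```python
import subprocess
import sys


def finalize_commands(src, stripped, final):
    """Return argv lists: strip comments first, then accept tracked changes."""
    return [
        [sys.executable, "scripts/comments_strip.py", src, "--out", stripped],
        [sys.executable, "scripts/accept_tracked_changes.py", stripped,
         "--mode", "accept", "--out", final],
    ]


def run_finalization(src, stripped, final):
    """Run each step in order, stopping at the first failure."""
    for argv in finalize_commands(src, stripped, final):
        subprocess.run(argv, check=True)
```

Passing argv lists rather than a shell string avoids quoting problems with file names that contain spaces. The final output should still be rendered and reviewed before delivery.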
This skill is organized for progressive discovery: start here, then jump into task- or OOXML-specific docs.
DOCS SKILL PACKAGE
Root:
Tasks:
OOXML:
Troubleshooting:
Scripts:
Core building blocks (importable helpers):
scripts/docx_ooxml_patch.py: low-level OOXML patch helper (tracked changes, comments, hyperlinks, relationships). Other scripts reuse this.
scripts/fields_materialize.py: materialize SEQ/REF field display text for deterministic headless rendering/QA.

High-leverage utilities (also importable, but commonly invoked as CLIs):
render_docx.py: canonical DOCX → PNG renderer (optional PDF via --emit_pdf; do not deliver intermediates unless asked).
scripts/render_and_diff.py: render + per-page image diff between two DOCXs.
scripts/content_controls.py: list / wrap / fill Word content controls (SDTs) for forms/templates.
scripts/captions_and_crossrefs.py: insert Caption paragraphs for tables/figures + optional bookmarks around caption numbers.
scripts/insert_ref_fields.py: replace [[REF:bookmark]] markers with real REF fields (cross-references).
scripts/internal_nav.py: add internal navigation links (static TOC + Top/Bottom + figN/tblN jump links).
scripts/style_lint.py: report common formatting/style inconsistencies.
scripts/style_normalize.py: conservative cleanup (clear run-level overrides; optional paragraph overrides).
scripts/redact_docx.py: layout-preserving redaction/anonymization.
scripts/privacy_scrub.py: remove personal metadata + rsid* attributes.
scripts/set_protection.py: restrict editing (read-only / comments / forms).
scripts/comments_extract.py: extract comments to JSON (text, author/date, resolved flag, anchored snippets).
scripts/comments_strip.py: remove all comments (final-delivery mode).

Audits / conversions / niche helpers:
scripts/fields_report.py, scripts/heading_audit.py, scripts/section_audit.py, scripts/images_audit.py, scripts/footnotes_report.py, scripts/watermark_audit_remove.py
scripts/xlsx_to_docx_table.py, scripts/docx_table_to_csv.py
scripts/insert_toc.py, scripts/insert_note.py, scripts/apply_template_styles.py, scripts/accept_tracked_changes.py, scripts/make_fixtures.py

v7 additions (stress-test helpers):
scripts/watermark_add.py: add a detectable VML watermark object into an existing header.
scripts/comments_add.py: add multiple comments (by paragraph substring match) and wire up comments.xml plumbing if needed.
scripts/comments_apply_patch.py: append/replace comment text and mark/clear resolved state (w:done=1).
scripts/add_tracked_replacements.py: generate tracked-change replacements (<w:del> + <w:ins>) in-place.
scripts/a11y_audit.py: audit a11y issues; can also apply simple fixes via --fix_table_headers / --fix_image_alt.
scripts/flatten_ref_fields.py: replace REF/PAGEREF field blocks with their cached visible text for deterministic rendering.
scripts/xlsx_to_docx_table.py also marks header rows as repeating headers (w:tblHeader) to improve a11y and multi-page tables.
Examples:
Note:
manifest.txt is machine-readable and is used by download tooling. It must contain only relative file paths (one per line).
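A quick check of the manifest rule could look like the sketch below. It flags blank lines, surrounding whitespace, and absolute paths (POSIX or Windows style). The function name `manifest_problems` is hypothetical, and the exact checks the real download tooling applies are not specified here, so treat this as an assumption-based lint rather than the tooling's own validation.

```python
from pathlib import PurePosixPath, PureWindowsPath


def manifest_problems(text):
    """Return (line_number, reason) pairs for lines that break the manifest rule."""
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            problems.append((lineno, "blank line"))
        elif line != line.strip():
            problems.append((lineno, "surrounding whitespace"))
        elif PurePosixPath(line).is_absolute() or PureWindowsPath(line).is_absolute():
            problems.append((lineno, "absolute path"))
    return problems
```

For example, `manifest_problems(open("manifest.txt").read())` returning an empty list means every line is a non-empty relative path under these checks.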
This is a quick index so you can jump from a helper script to the right task guide.
style_lint.py, style_normalize.py → tasks/style_lint_normalize.md
apply_template_styles.py → tasks/templates_style_packs.md
section_audit.py → tasks/sections_layout.md
heading_audit.py → tasks/headings_numbering.md
images_audit.py, a11y_audit.py → tasks/images_figures.md, tasks/accessibility_a11y.md
captions_and_crossrefs.py → tasks/captions_crossrefs.md
xlsx_to_docx_table.py → tasks/tables_spreadsheets.md
docx_table_to_csv.py → tasks/tables_spreadsheets.md
fields_report.py, fields_materialize.py → tasks/fields_update.md
insert_ref_fields.py, flatten_ref_fields.py → tasks/fields_update.md, tasks/captions_crossrefs.md
insert_toc.py → tasks/toc_workflow.md
add_tracked_replacements.py, accept_tracked_changes.py → tasks/clean_tracked_changes.md
comments_add.py, comments_extract.py, comments_apply_patch.py, comments_strip.py → tasks/comments_manage.md
privacy_scrub.py → tasks/privacy_scrub_metadata.md
redact_docx.py → tasks/redaction_anonymization.md
watermark_add.py, watermark_audit_remove.py → tasks/watermarks_background.md
internal_nav.py → tasks/navigation_internal_links.md
merge_docx_append.py → tasks/multi_doc_merge.md
content_controls.py → tasks/forms_content_controls.md
set_protection.py → tasks/protection_restrict_editing.md
render_and_diff.py, render_docx.py → tasks/compare_diff.md, tasks/verify_render.md
make_fixtures.py → tasks/fixtures_edge_cases.md
docx_ooxml_patch.py → used across guides for targeted patches

tasks/: task playbooks (what to do step-by-step)
ooxml/: advanced OOXML patches (tracked changes, comments, hyperlinks, fields)
scripts/: reusable helper scripts
examples/: small runnable examples

Rule of thumb: every meaningful edit batch must end with a render + PNG review. No exceptions.
"80/20" here means: follow the simplest workflow that covers most DOCX tasks reliably.
Golden path (don’t mix-and-match unless debugging):
python-docx (paragraphs, runs, styles, tables, headers/footers).
Use the packaged renderer (dedicated LibreOffice profile + writable HOME):
python render_docx.py /mnt/data/input.docx --output_dir /mnt/data/out
# If debugging LibreOffice:
python render_docx.py /mnt/data/input.docx --output_dir /mnt/data/out --verbose
# Optional: also write <input_stem>.pdf to --output_dir (for debugging/archival):
python render_docx.py /mnt/data/input.docx --output_dir /mnt/data/out --emit_pdf
Then inspect the generated page-<N>.png files.
Success criteria (render + visual QA):
Note: LibreOffice sometimes prints scary-looking stderr (e.g., error : Unknown IO error) even when output is correct. Treat the render as successful if the PNGs exist and look right (and if you used --emit_pdf, the PDF exists and is non-empty).
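The file-existence half of the success criteria in the note above can be sketched as a check on output files rather than on stderr. This assumes the PNGs match `page-*.png` and that the PDF is written as `<input_stem>.pdf` in the same output directory, as described for `--emit_pdf`; the function name `render_outputs_present` is hypothetical.

```python
from pathlib import Path


def render_outputs_present(output_dir, input_stem, emit_pdf=False):
    """Judge a render by its output files, ignoring whatever stderr printed."""
    out = Path(output_dir)
    has_pages = any(out.glob("page-*.png"))
    if not emit_pdf:
        return has_pages
    pdf = out / f"{input_stem}.pdf"
    # With --emit_pdf, the PDF must also exist and be non-empty.
    return has_pages and pdf.is_file() and pdf.stat().st_size > 0
```

A `True` result only confirms the files exist; whether the pages "look right" still requires opening the PNGs.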
tasks/read_review.md
tasks/create_edit.md
tasks/accessibility_a11y.md
tasks/comments_manage.md
tasks/protection_restrict_editing.md
tasks/privacy_scrub_metadata.md
tasks/multi_doc_merge.md
tasks/style_lint_normalize.md
tasks/forms_content_controls.md
tasks/captions_crossrefs.md
tasks/redaction_anonymization.md
tasks/verify_render.md
tasks/fields_update.md
tasks/toc_workflow.md
tasks/navigation_internal_links.md
tasks/headings_numbering.md
tasks/sections_layout.md
tasks/images_figures.md
tasks/tables_spreadsheets.md
ooxml/tracked_changes.md
ooxml/comments.md
ooxml/hyperlinks_and_fields.md
troubleshooting/libreoffice_headless.md
tasks/clean_tracked_changes.md
tasks/compare_diff.md
tasks/templates_style_packs.md
tasks/watermarks_background.md
tasks/footnotes_endnotes.md
tasks/fixtures_edge_cases.md