Handle docx files using shell-based XML extraction and python-docx via run_shell when standard tools fail
Use this skill when:
- read_file cannot extract content from .docx files
- execute_code_sandbox encounters failures when creating or modifying docx files

Use run_shell to unzip the .docx file (which is a ZIP archive) and extract the main document XML:
unzip -p input.docx word/document.xml > document.xml
Extract text content by parsing the XML. Use run_shell to execute Python code:
python3 << 'EOF'
import xml.etree.ElementTree as ET
tree = ET.parse('document.xml')
root = tree.getroot()
namespace = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}
text_content = []
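for elem in root.iter():
    # Tags look like '{namespace}t'; matching on '}t' selects only text elements
    # named t and skips other tags that merely end in the letter t
    if elem.tag.endswith('}t'):
        if elem.text:
            text_content.append(elem.text)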
text = ''.join(text_content)
print(text)
EOF
For better text extraction that handles paragraphs:
python3 << 'EOF'
import xml.etree.ElementTree as ET
tree = ET.parse('document.xml')
root = tree.getroot()
ns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}
paragraphs = []
for p in root.findall('.//w:p', ns):
    para_text = []
    for t in p.findall('.//w:t', ns):
        if t.text:
            para_text.append(t.text)
    if para_text:
        paragraphs.append(''.join(para_text))
for para in paragraphs:
    print(para)
EOF
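The paragraph loop above drops tabs and manual line breaks, because those are stored as separate w:tab and w:br elements inside a run rather than as text. The following is a minimal sketch of one way to keep them, not a complete extractor: the helper names and the SAMPLE string are hypothetical, and it only inspects direct children of w:r so that tab-stop definitions in paragraph properties are not mistaken for tab characters.

```python
import xml.etree.ElementTree as ET

W = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
W_P, W_R = f'{{{W}}}p', f'{{{W}}}r'
W_T, W_TAB, W_BR = f'{{{W}}}t', f'{{{W}}}tab', f'{{{W}}}br'


def paragraph_text(p):
    """Join the text of one w:p element, mapping w:tab and w:br to whitespace."""
    parts = []
    for run in p.iter(W_R):
        # Only direct children of a run carry content (w:t, w:tab, w:br, ...)
        for node in run:
            if node.tag == W_T and node.text:
                parts.append(node.text)
            elif node.tag == W_TAB:
                parts.append('\t')
            elif node.tag == W_BR:
                parts.append('\n')
    return ''.join(parts)


def extract_paragraphs(xml_string):
    """Return one string per w:p element found in the document XML."""
    root = ET.fromstring(xml_string)
    return [paragraph_text(p) for p in root.iter(W_P)]


# Hypothetical two-paragraph document used only for illustration
SAMPLE = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:r><w:t>Name</w:t><w:tab/><w:t>Value</w:t></w:r></w:p>'
    '<w:p><w:r><w:t>line one</w:t><w:br/><w:t>line two</w:t></w:r></w:p>'
    '</w:body></w:document>'
)

print(extract_paragraphs(SAMPLE))
```

Page breaks are also encoded as w:br, so this sketch turns them into plain newlines as well.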
When execute_code_sandbox fails for docx operations, use run_shell instead:
python3 << 'EOF'
from docx import Document
from docx.shared import Inches, Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = Document()
# Add heading
doc.add_heading('Document Title', 0)
# Add paragraph
doc.add_paragraph('Your content here.')
# Add section with heading
doc.add_heading('Section Title', level=1)
doc.add_paragraph('Section content with multiple paragraphs.')
# Add table if needed
table = doc.add_table(rows=3, cols=3)
table.style = 'Table Grid'
# Save document
doc.save('output.docx')
print("Document created successfully")
EOF
A structured memo follows the same pattern, with one heading per section:
python3 << 'EOF'
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH
doc = Document()
# Title, centered
title = doc.add_heading('Business Strategy Memo', 0)
title.alignment = WD_ALIGN_PARAGRAPH.CENTER
# Executive Summary
doc.add_heading('Executive Summary', level=1)
doc.add_paragraph('Brief overview of key points and recommendations.')
# Market Overview
doc.add_heading('Market Overview', level=1)
doc.add_paragraph('Analysis of current market conditions and trends.')
# Recommendations
doc.add_heading('Recommendations', level=1)
doc.add_paragraph('Actionable recommendations based on analysis.')
doc.save('Strategy_Memo.docx')
EOF
Complete example showing extraction and creation:
# Step 1: Extract content from existing docx
unzip -p existing.docx word/document.xml > doc.xml
# Step 2: Parse and transform content
python3 << 'EOF'
import xml.etree.ElementTree as ET
tree = ET.parse('doc.xml')
root = tree.getroot()
ns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}
content = []
for p in root.findall('.//w:p', ns):
    para_text = []
    for t in p.findall('.//w:t', ns):
        if t.text:
            para_text.append(t.text)
    if para_text:
        content.append(''.join(para_text))
# Write extracted content to file for reference
with open('extracted.txt', 'w') as f:
    for line in content:
        f.write(line + '\n')
EOF
# Step 3: Create new docx with modified content
python3 << 'EOF'
from docx import Document
doc = Document()
doc.add_heading('Updated Document', 0)
with open('extracted.txt', 'r') as f:
    for line in f:
        if line.strip():
            doc.add_paragraph(line.strip())
doc.save('updated.docx')
EOF
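After the workflow above has written a file, a quick sanity check can confirm that the result is still a readable .docx. The sketch below relies only on the facts used earlier (a .docx is a ZIP archive containing word/document.xml); the function name looks_like_docx is hypothetical, and passing this check does not guarantee that Word will open the file.

```python
import zipfile
import xml.etree.ElementTree as ET


def looks_like_docx(path):
    """Return True if path is a ZIP archive whose word/document.xml parses as XML."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as z:
        if 'word/document.xml' not in z.namelist():
            return False
        try:
            ET.fromstring(z.read('word/document.xml'))
        except ET.ParseError:
            return False
    return True
```

For example, `looks_like_docx('updated.docx')` can be called from run_shell right after the save step.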
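- run_shell: use it instead of execute_code_sandbox for docx operations
- unzip -p: the -p flag writes the extracted file to stdout, which is useful for piping or redirecting
- w: namespace prefix: WordprocessingML elements such as w:p and w:t belong to this namespace, so XPath queries need the namespace mapping
- pip install python-docx: run this in the shell if python-docx is not available

If python-docx is not installed: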
pip install python-docx
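Before installing, it can help to check whether the package is already importable. This is a small sketch, assuming the package exposes the import name docx (as the examples in this document use); has_python_docx is a hypothetical helper name.

```python
import importlib.util


def has_python_docx():
    """python-docx is imported as `docx`, so look for that module name."""
    return importlib.util.find_spec('docx') is not None


if has_python_docx():
    print('python-docx is available')
else:
    print('python-docx is missing; install it or use the XML extraction approach')
```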
If unzip is not available:
# Use Python's zipfile module instead
python3 -c "import zipfile; z=zipfile.ZipFile('file.docx'); print(z.read('word/document.xml').decode('utf-8', errors='ignore'))"
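The one-liner above prints raw XML. As a rough sketch, the same zipfile approach can be combined with the paragraph parsing shown earlier so that no intermediate document.xml file is needed; read_docx_paragraphs is a hypothetical name, and the function assumes the archive contains word/document.xml.

```python
import zipfile
import xml.etree.ElementTree as ET

NS = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}


def read_docx_paragraphs(path):
    """Return the non-empty paragraphs of a .docx file as a list of strings."""
    # Read the main document part straight out of the ZIP archive
    with zipfile.ZipFile(path) as z:
        root = ET.fromstring(z.read('word/document.xml'))
    paragraphs = []
    for p in root.findall('.//w:p', NS):
        # A paragraph's text is split across one or more w:t elements
        text = ''.join(t.text or '' for t in p.findall('.//w:t', NS))
        if text:
            paragraphs.append(text)
    return paragraphs
```

Calling `read_docx_paragraphs('file.docx')` returns the same lines that the unzip-based steps would print.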
If XML parsing fails:
- Match tags with .endswith('t') instead of full namespace matching for flexibility. This match is loose: it also accepts other tags ending in "t", such as w:instrText, so .endswith('}t') is the stricter choice when only w:t elements are wanted.
- Handle encoding problems by decoding with the errors='ignore' parameter
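To illustrate namespace-agnostic matching, the sketch below compares an element's local name (the part after the closing brace) instead of the full qualified tag. The helper names and the SAMPLE string are hypothetical; the sample includes a w:instrText element to show that it is not picked up.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on qualified tags."""
    return tag.rsplit('}', 1)[-1]


def text_nodes(root):
    """Collect the text of every element whose local name is exactly 't'."""
    return [e.text for e in root.iter() if local_name(e.tag) == 't' and e.text]


# Hypothetical fragment: one real text run plus a field instruction
SAMPLE = (
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body><w:p><w:r><w:t>Hello</w:t><w:instrText>PAGE</w:instrText></w:r></w:p></w:body>'
    '</w:document>'
)

print(text_nodes(ET.fromstring(SAMPLE)))  # ['Hello']
```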