Skill Profile

Docx Shell Workaround

Name: Docx Shell Workaround
Author: HKUDS

Handle docx files using shell-based XML extraction and python-docx via run_shell when standard tools fail

HKUDS · 5,421 stars · March 24, 2026

Professional
Category: Documents

Skill Content

When to Use

Use this skill when:

read_file cannot extract content from .docx files
execute_code_sandbox fails when creating or modifying .docx files
You need a reliable fallback for docx file manipulation

Reading DOCX Files

Step 1: Extract document.xml using unzip

Use run_shell to unzip the .docx file (which is a ZIP archive) and extract the main document XML:

unzip -p input.docx word/document.xml > document.xml
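
Because a .docx is an ordinary ZIP archive, the same extraction can also be done with Python's standard library, which additionally shows which parts the archive contains. This is a minimal sketch. `read_document_xml` and `list_parts` are hypothetical helper names. The in-memory sample holds only `word/document.xml`, whereas a real Word file also carries parts such as `[Content_Types].xml` and `_rels/.rels`.

```python
import io
import zipfile

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def read_document_xml(source) -> bytes:
    """Return word/document.xml from a .docx path or file object (like `unzip -p`)."""
    with zipfile.ZipFile(source) as z:
        return z.read("word/document.xml")


def list_parts(source) -> list:
    """Return the names of all archive members (like `unzip -l`)."""
    with zipfile.ZipFile(source) as z:
        return z.namelist()


# Hypothetical stand-in for input.docx, built in memory so the sketch runs anywhere.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr(
        "word/document.xml",
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Hello</w:t></w:r></w:p>"
        "</w:body></w:document>",
    )

print(list_parts(sample))  # ['word/document.xml']
print(read_document_xml(sample).decode("utf-8"))
```

With a real file, `read_document_xml('input.docx')` returns the same bytes that `unzip -p input.docx word/document.xml` writes to stdout.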

Step 2: Parse XML with ElementTree via run_shell

Extract text content by parsing the XML. Use run_shell to execute Python code:

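python3 << 'EOF'
import xml.etree.ElementTree as ET

tree = ET.parse('document.xml')
root = tree.getroot()
# ElementTree stores namespaced tags as '{namespace-uri}local-name'
W = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}'

# Keep only <w:t> text runs. An exact tag match avoids other elements whose
# names end in "t", such as w:instrText (field codes) and w:delText (deleted text).
text_content = []
for elem in root.iter():
    if elem.tag == W + 't' and elem.text:
        text_content.append(elem.text)

text = ''.join(text_content)
print(text)
EOF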
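
ElementTree reports namespaced tags in `{namespace-uri}local-name` form, so selecting `<w:t>` elements reliably means comparing the full qualified tag rather than testing a suffix. The sketch below contrasts the two on a small, made-up fragment: a suffix test such as `endswith('t')` also matches `w:document`, `w:instrText` (field codes), and `w:delText` (tracked deletions), which pulls hidden text into the output.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_T = f"{{{W_NS}}}t"  # qualified name of <w:t>

# Hypothetical fragment: visible text, a field code and a tracked deletion.
doc_xml = (
    f'<w:document xmlns:w="{W_NS}"><w:body><w:p>'
    "<w:r><w:t>Visible </w:t></w:r>"
    "<w:r><w:instrText>PAGE</w:instrText></w:r>"
    "<w:del><w:r><w:delText>removed</w:delText></w:r></w:del>"
    "<w:r><w:t>text</w:t></w:r>"
    "</w:p></w:body></w:document>"
)
root = ET.fromstring(doc_xml)

print(root.tag)  # {http://schemas.openxmlformats.org/wordprocessingml/2006/main}document

# Suffix test: also collects field codes and deleted text.
suffix_match = "".join(e.text or "" for e in root.iter() if e.tag.endswith("t"))
# Exact test: only the visible w:t runs.
exact_match = "".join(e.text or "" for e in root.iter() if e.tag == W_T)

print(suffix_match)  # Visible PAGEremovedtext
print(exact_match)   # Visible text
```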
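
Step 3: Optional - More robust XML parsing

To keep paragraph boundaries, iterate over the w:p elements and join the w:t text inside each one, printing one paragraph per line:
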
python3 << 'EOF'
import xml.etree.ElementTree as ET

tree = ET.parse('document.xml')
root = tree.getroot()
ns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}

paragraphs = []
for p in root.findall('.//w:p', ns):
    para_text = []
    for t in p.findall('.//w:t', ns):
        if t.text:
            para_text.append(t.text)
    if para_text:
        paragraphs.append(''.join(para_text))

for para in paragraphs:
    print(para)
EOF
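
The paragraph loop flattens tables: every cell paragraph comes out as its own line, and the row and column structure is lost. The following is a minimal sketch of an alternative that walks `w:tbl`, then `w:tr`, then `w:tc` with ElementTree. `extract_tables` and `cell_text` are hypothetical helpers, and the sample XML is made up. A table nested inside a cell is returned as its own entry rather than inlined into that cell.

```python
import xml.etree.ElementTree as ET

NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}


def cell_text(tc: ET.Element) -> str:
    """Join the w:t runs of each direct paragraph in a cell, one line per paragraph."""
    lines = []
    for p in tc.findall("w:p", NS):
        lines.append("".join(t.text or "" for t in p.findall(".//w:t", NS)))
    return "\n".join(lines)


def extract_tables(root: ET.Element) -> list:
    """Return every table as a list of rows, each row a list of cell strings."""
    tables = []
    for tbl in root.findall(".//w:tbl", NS):
        rows = []
        for tr in tbl.findall("w:tr", NS):
            rows.append([cell_text(tc) for tc in tr.findall("w:tc", NS)])
        tables.append(rows)
    return tables


# Hypothetical 2x2 table.
W = NS["w"]
sample = (
    f'<w:document xmlns:w="{W}"><w:body><w:tbl>'
    "<w:tr><w:tc><w:p><w:r><w:t>Name</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>Qty</w:t></w:r></w:p></w:tc></w:tr>"
    "<w:tr><w:tc><w:p><w:r><w:t>Apples</w:t></w:r></w:p></w:tc>"
    "<w:tc><w:p><w:r><w:t>3</w:t></w:r></w:p></w:tc></w:tr>"
    "</w:tbl></w:body></w:document>"
)

for table in extract_tables(ET.fromstring(sample)):
    for row in table:
        print("\t".join(row))
```

On a real file, pass `ET.parse('document.xml').getroot()` instead of the sample.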

Creating DOCX Files

Use python-docx via run_shell (NOT execute_code_sandbox)

python3 << 'EOF'
from docx import Document
from docx.shared import Inches, Pt
from docx.enum.text import WD_ALIGN_PARAGRAPH

doc = Document()

# Add heading
doc.add_heading('Document Title', 0)

# Add paragraph
doc.add_paragraph('Your content here.')

# Add section with heading
doc.add_heading('Section Title', level=1)
doc.add_paragraph('Section content with multiple paragraphs.')

# Add table if needed
table = doc.add_table(rows=3, cols=3)
table.style = 'Table Grid'

# Save document
doc.save('output.docx')
print("Document created successfully")
EOF
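
After `doc.save('output.docx')`, a quick check that uses only the standard library can confirm that the archive opens cleanly and that `word/document.xml` holds the expected number of paragraphs and tables. This is a minimal sketch. `docx_stats` is a hypothetical helper, and the demo uses a small in-memory archive instead of a real `output.docx`.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {"w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main"}


def docx_stats(source) -> dict:
    """Count top-level paragraphs, tables and table cells in a .docx."""
    with zipfile.ZipFile(source) as z:
        bad = z.testzip()  # name of the first member with a bad CRC, or None
        if bad is not None:
            raise ValueError(f"corrupt member in archive: {bad}")
        root = ET.fromstring(z.read("word/document.xml"))
    body = root.find("w:body", NS)
    return {
        "paragraphs": len(body.findall("w:p", NS)),  # excludes paragraphs inside tables
        "tables": len(root.findall(".//w:tbl", NS)),
        "cells": len(root.findall(".//w:tc", NS)),
    }


# Hypothetical archive: one paragraph, a 1x2 table, then an empty paragraph.
W = NS["w"]
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr(
        "word/document.xml",
        f'<w:document xmlns:w="{W}"><w:body>'
        "<w:p><w:r><w:t>Title</w:t></w:r></w:p>"
        "<w:tbl><w:tr><w:tc><w:p/></w:tc><w:tc><w:p/></w:tc></w:tr></w:tbl>"
        "<w:p/>"
        "</w:body></w:document>",
    )

print(docx_stats(buf))  # {'paragraphs': 2, 'tables': 1, 'cells': 2}
```

On the real file, call `docx_stats('output.docx')` from a `python3 << 'EOF'` heredoc run through run_shell.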

Example: Creating a structured business document

python3 << 'EOF'
from docx import Document
from docx.enum.text import WD_ALIGN_PARAGRAPH

doc = Document()

# Title
title = doc.add_heading('Business Strategy Memo', 0)
title.alignment = WD_ALIGN_PARAGRAPH.CENTER

# Executive Summary
doc.add_heading('Executive Summary', level=1)
doc.add_paragraph('Brief overview of key points and recommendations.')

# Market Overview
doc.add_heading('Market Overview', level=1)
doc.add_paragraph('Analysis of current market conditions and trends.')

# Recommendations
doc.add_heading('Recommendations', level=1)
doc.add_paragraph('Actionable recommendations based on analysis.')

doc.save('Strategy_Memo.docx')
EOF
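
To confirm the memo's structure without python-docx, read each heading's paragraph style back out of `word/document.xml`. This minimal sketch assumes the style IDs that python-docx's default template typically writes: `Title` for `add_heading(..., 0)` and `Heading1`, `Heading2`, and so on for levels 1 and up. Documents built from other templates or localized Word versions may use different IDs. `outline` is a hypothetical helper, and the sample XML is made up.

```python
import re
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
HEADING_ID = re.compile(r"Heading(\d)$")  # assumed style-ID pattern


def outline(root: ET.Element) -> list:
    """Return (level, text) for Title (level 0) and HeadingN paragraphs."""
    result = []
    for p in root.iter(f"{{{W}}}p"):
        style = p.find("w:pPr/w:pStyle", NS)
        if style is None:
            continue  # body paragraph without an explicit style
        style_id = style.get(f"{{{W}}}val") or ""
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        if style_id == "Title":
            result.append((0, text))
        elif (m := HEADING_ID.match(style_id)):
            result.append((int(m.group(1)), text))
    return result


sample = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:pPr><w:pStyle w:val="Title"/></w:pPr><w:r><w:t>Business Strategy Memo</w:t></w:r></w:p>'
    '<w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Executive Summary</w:t></w:r></w:p>'
    "<w:p><w:r><w:t>Brief overview of key points and recommendations.</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

for level, text in outline(ET.fromstring(sample)):
    print("  " * level + text)
```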
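
Full Workflow Example

This workflow extracts the paragraphs of an existing document, saves them as plain text, and rebuilds them in a new document:
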
# Step 1: Extract content from existing docx
unzip -p existing.docx word/document.xml > doc.xml

# Step 2: Parse and transform content
python3 << 'EOF'
import xml.etree.ElementTree as ET

tree = ET.parse('doc.xml')
root = tree.getroot()
ns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}

content = []
for p in root.findall('.//w:p', ns):
    para_text = []
    for t in p.findall('.//w:t', ns):
        if t.text:
            para_text.append(t.text)
    if para_text:
        content.append(''.join(para_text))

# Write extracted content to file for reference
# Explicit UTF-8 so non-ASCII text round-trips regardless of the shell locale
with open('extracted.txt', 'w', encoding='utf-8') as f:
    for line in content:
        f.write(line + '\n')
EOF

# Step 3: Create new docx with modified content
python3 << 'EOF'
from docx import Document

doc = Document()
doc.add_heading('Updated Document', 0)

with open('extracted.txt', 'r', encoding='utf-8') as f:
    for line in f:
        if line.strip():
            doc.add_paragraph(line.strip())

doc.save('updated.docx')
EOF
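
Rebuilding from `extracted.txt` keeps only plain text, so styles, tables, and images from `existing.docx` are lost. When the goal is to change wording while keeping the formatting, a different technique is to edit `word/document.xml` in place and copy every other archive member unchanged. This minimal sketch assumes each phrase to be replaced sits inside a single `w:t` element. Word often splits a phrase across several runs, and such matches are missed. `replace_in_docx` is a hypothetical helper. It uses a regular expression instead of an ElementTree round-trip because ElementTree may rename namespace prefixes when it serializes.

```python
import html
import io
import re
import zipfile
from xml.sax.saxutils import escape

# One <w:t> element (with or without attributes such as xml:space) and its text.
W_T = re.compile(rb"(<w:t(?:\s[^>]*)?>)([^<]*)(</w:t>)")


def replace_in_docx(src, dst, old: str, new: str) -> int:
    """Copy src to dst, replacing `old` with `new` inside each w:t element.

    Every other archive member is copied byte for byte, so styles, images
    and relationships are untouched. Text split across runs is not matched.
    Returns the number of replacements made.
    """
    if not old:
        raise ValueError("old must be a non-empty string")
    count = 0

    def swap(m: re.Match) -> bytes:
        nonlocal count
        # Assumes document.xml is UTF-8, which is what Word writes.
        text = html.unescape(m.group(2).decode("utf-8"))
        count += text.count(old)
        text = text.replace(old, new)
        return m.group(1) + escape(text).encode("utf-8") + m.group(3)

    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if item.filename == "word/document.xml":
                data = W_T.sub(swap, data)
            zout.writestr(item, data)  # reuses the member's name and compression
    return count


# Hypothetical two-part archive standing in for existing.docx.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
src = io.BytesIO()
with zipfile.ZipFile(src, "w", zipfile.ZIP_DEFLATED) as z:
    z.writestr(
        "word/document.xml",
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Tom &amp; Jerry</w:t></w:r></w:p>"
        "</w:body></w:document>",
    )
    z.writestr("word/styles.xml", "<styles/>")  # stands in for untouched parts

dst = io.BytesIO()
n = replace_in_docx(src, dst, "Tom", "Tim")
with zipfile.ZipFile(dst) as z:
    out_xml = z.read("word/document.xml").decode("utf-8")
    styles = z.read("word/styles.xml")

print(n)  # 1
print(out_xml)
```

On real files, pass paths such as `replace_in_docx('existing.docx', 'updated.docx', 'old', 'new')`. Always write to a different path than the source, because opening the destination in `"w"` mode truncates it.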
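
Troubleshooting

If python-docx is not installed (ModuleNotFoundError: No module named 'docx'), install it:
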
pip install python-docx

If unzip is unavailable or fails, read the archive with Python's zipfile module instead:

python3 -c "import zipfile; z=zipfile.ZipFile('file.docx'); print(z.read('word/document.xml').decode('utf-8', errors='ignore'))"
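
When read_file or unzip fails, first check whether the file really is a .docx. A legacy binary .doc (OLE2 format) or a file truncated during upload is not a valid .docx archive, and neither approach in this skill will read it. This is a minimal sketch. `diagnose_docx` and `_first_bytes` are hypothetical helpers. The OLE2 signature `D0 CF 11 E0 A1 B1 1A E1` is the standard header of legacy .doc files. The sketch also reports whether python-docx is importable in the current interpreter.

```python
import importlib.util
import io
import zipfile

OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # header of legacy .doc files


def _first_bytes(source, n: int = 8) -> bytes:
    """Read the first n bytes from a path or a seekable binary file object."""
    if hasattr(source, "read"):
        source.seek(0)
        return source.read(n)
    with open(source, "rb") as f:
        return f.read(n)


def diagnose_docx(source) -> list:
    """Return a list of problems; an empty list means the file looks like a .docx."""
    if _first_bytes(source) == OLE2_MAGIC:
        return ["legacy binary .doc (OLE2 format), not a .docx"]
    if not zipfile.is_zipfile(source):
        return ["not a ZIP archive: possibly truncated or not a Word file"]
    with zipfile.ZipFile(source) as z:
        if "word/document.xml" not in z.namelist():
            return ["ZIP archive has no word/document.xml"]
    return []


print("python-docx importable:", importlib.util.find_spec("docx") is not None)

legacy_doc = io.BytesIO(OLE2_MAGIC + b"\x00" * 512)  # hypothetical .doc stand-in
print(diagnose_docx(legacy_doc))

other_zip = io.BytesIO()
with zipfile.ZipFile(other_zip, "w") as z:
    z.writestr("readme.txt", "hello")
print(diagnose_docx(other_zip))  # ['ZIP archive has no word/document.xml']
```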
