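Pdf To En Word

Name: Pdf To En Word
Author: FeatherHunter

A skill for converting scanned PDFs (including images) into English Word documents. It covers content recognition, landscape-page handling, Chinese-to-English translation, and manual docx construction, while preserving table structure and layout.

FeatherHunter · 16 stars · 18.04.2026

Occupation: Typists
Categories: Documents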

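# pdf-to-en-word: PDF to English Word Document Skill

## Skill Overview

This skill provides a complete workflow for converting scanned PDFs or images into English Word documents (docx format). The core pipeline is:

PDF/image → content recognition → Chinese-to-English translation → English Word document construction

## Use Cases

- Converting scanned PDFs or images into editable English Word documents
- Preserving the original table structure and layout
- Translating Chinese into English (for international registration requirements)
- Meeting strict conversion-accuracy requirements (every element is controlled manually)

## Core Workflow

Required phases: Phase 1 (image content recognition) → Phase 3 (Chinese-to-English translation) → Phase 4 (Word document construction)

Optional phase: Phase 2 (HTML prototype), used for visual preview

Routing by document type: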

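┌──────────────────────────────────────────────────┐
│                 Input image/PDF                  │
└────────────────────┬─────────────────────────────┘
                     │
      ┌──────────────┴──────────────┐
      │ Single page or multi-page?  │
      └──────┬───────────────┬──────┘
             │               │
 Single page │               │ Multiple pages
             ▼               ▼
 Run phases 1→3→4 directly   Per-page subtasks + merge and assembly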


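### Multi-Page Document Workflow

Main session (coordinates and schedules; does not process page content)
   │
   │  1. Split the PDF into single-page images
   │
   │  2. Dispatch all subtasks in parallel:
   ├──> Subtask 1: Read image → recognize + translate + generate XML → Write to file → return metadata summary
   ├──> Subtask 2: Read image → recognize + translate + generate XML → Write to file → return metadata summary
   ├──> Subtask 3: Read image → recognize + translate + generate XML → Write to file → return metadata summary
   └──> ...
   (All subtasks run in parallel; the same glossary is passed to each)

   3. A script merges all XML files into a single document.xml
   4. ZIP-package the result as the final docx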

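1. Split the PDF → obtain a list of image paths [page_1.png, page_2.png, ...]
   (This step does not read image content; it only collects file paths)

2. Create a temporary directory temp/ for intermediate files

3. Dispatch all subtasks in parallel (Task tool), passing each one its image path plus the glossary (if any):
   Subtask 1 prompt: "Image path: D:\...\page_1.png, output file: temp/page_1.xml" + glossary + subtask prompt template
   Subtask 2 prompt: "Image path: D:\...\page_2.png, output file: temp/page_2.xml" + glossary + subtask prompt template
   ...
   (All subtasks start in parallel; each one reads its image with the Read tool, writes its XML fragment to the output file, and returns only a metadata summary)

4. The main agent receives only the metadata summaries returned by the subtasks:
   For example: {"tables": 2, "paragraphs": 5, "file": "temp/page_1.xml"}
   (The XML content stays in files and never enters the main agent's context)

5. Run lightweight validation (file completeness, anomaly detection)

6. Call the merge script to assemble document.xml and ZIP-package it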
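
The dispatch steps above can be sketched in JavaScript as follows. This is a minimal sketch, not the skill's actual mechanism: `buildSubtasks` and `runAll` are hypothetical names, and `runSubtask` stands in for whatever actually launches a subtask (for example the Task tool) and resolves to its metadata summary.

```javascript
// Hypothetical helper: build one subtask descriptor per page image.
function buildSubtasks(imagePaths, glossary, tempDir = 'temp') {
  return imagePaths.map((imagePath, i) => ({
    page: i + 1,
    imagePath,
    outputFile: `${tempDir}/page_${i + 1}.xml`,
    glossary: glossary || null, // the same glossary goes to every subtask
  }));
}

// Run all subtasks in parallel. `runSubtask` is an assumed async function that
// resolves to a metadata summary; a failed page is recorded instead of aborting the batch.
async function runAll(subtasks, runSubtask) {
  const settled = await Promise.allSettled(subtasks.map((t) => runSubtask(t)));
  return settled.map((r, i) => ({
    page: subtasks[i].page,
    ok: r.status === 'fulfilled',
    summary: r.status === 'fulfilled' ? r.value : null,
  }));
}

const subtasks = buildSubtasks(['page_1.png', 'page_2.png'], { '批号': 'Batch No.' });
console.log(subtasks.map((t) => t.outputFile));
```

Using `Promise.allSettled` rather than `Promise.all` keeps one failed page from discarding the results of the others, which fits the per-page retry rule.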

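Each subtask must write the complete XML content to its designated output file (using the Write tool).

Each subtask returns only a concise metadata summary in the following format:

Page: X
Table count: X
Table 1: X rows × X columns
Table 2: X rows × X columns
Paragraph count: X
Output file: temp/page_X.xml

Do not include XML content in the return value (this avoids bloating the main agent's context).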
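
A summary in this line-per-field form is easy to turn into an object on the main-agent side. The sketch below assumes one `key: value` pair per line and accepts either an ASCII or a full-width colon; `parseSummary` and the field names in the sample are illustrative, not part of the skill.

```javascript
// Parse a "key: value" summary (one pair per line) into a plain object.
// Accepts both ASCII ":" and full-width "：" as the separator.
function parseSummary(text) {
  const result = {};
  for (const line of text.split(/\r?\n/)) {
    const m = line.match(/^\s*([^:：]+?)\s*[:：]\s*(.*?)\s*$/);
    if (m) result[m[1]] = m[2];
  }
  return result;
}

const summary = parseSummary(
  'Page: 1\nTable count: 2\nParagraph count: 5\nOutput file: temp/page_1.xml'
);
console.log(summary['Output file']); // temp/page_1.xml
```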

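The glossary is provided by the user; the AI does not generate it automatically.

Rules:
1. The user supplies a glossary file (if any) when starting the task, in JSON format:
   For example: { "四环素": "Tetracycline", "胶囊": "Capsule", "批号": "Batch No." }

2. If the user provides no glossary, none is used: the AI translates freely, as long as the result meets professional standards for the pharmaceutical domain

3. The same glossary is passed to every subtask, so all pages follow the same translations

4. Subtasks no longer return terminology information, and the glossary is not accumulated dynamically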
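
Because the glossary is user-supplied, it may be worth checking its shape before passing it to every subtask. The following is a minimal sketch under the assumption that the glossary is a flat JSON object of source term to English term, as in the example above; `validateGlossary` is a hypothetical helper.

```javascript
// Validate a glossary object parsed from the user's JSON file.
// Returns a list of problems; an empty list means the glossary is usable.
function validateGlossary(glossary) {
  if (glossary === null || typeof glossary !== 'object' || Array.isArray(glossary)) {
    return ['glossary must be a JSON object of source term -> English term'];
  }
  const problems = [];
  for (const [term, translation] of Object.entries(glossary)) {
    if (term.trim() === '') problems.push('empty source term');
    if (typeof translation !== 'string' || translation.trim() === '') {
      problems.push(`term "${term}" has no English translation`);
    }
  }
  return problems;
}

const glossary = JSON.parse('{ "四环素": "Tetracycline", "胶囊": "Capsule", "批号": "Batch No." }');
console.log(validateGlossary(glossary)); // []
```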

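Subagent validation checklist (run before returning the summary):
Note: validate against the image already loaded in memory and the XML just generated; do not Read the files again

1. Check the XML content against the image:
   - All text content is included (compare paragraph by paragraph)
   - Table row and column counts match the image
   - Numbers, dates, and batch numbers are copied exactly (not translated)
2. Metadata self-consistency:
   - The reported table count matches the actual number of <w:tbl> tags
   - The reported paragraph count matches the actual number of <w:p> tags
3. XML format check:
   - All tags are properly closed
   - Namespace declarations are present
4. If any problem is found, fix it and rewrite the file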
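
The metadata self-consistency item can be checked mechanically. The sketch below counts opening tags with a regular expression; `countTags` and `checkMetadata` are hypothetical names, and the `{ tables, paragraphs }` shape of the reported metadata is an assumption. Note that `<w:p>` elements inside table cells are counted too, so the reported paragraph count has to follow the same convention.

```javascript
// Count opening tags of a given element, e.g. "w:p" or "w:tbl".
// The lookahead keeps "<w:p" from matching "<w:pPr>" and "<w:tbl" from matching "<w:tblPr>".
function countTags(xml, tag) {
  const re = new RegExp(`<${tag}(?=[\\s>/])`, 'g');
  return (xml.match(re) || []).length;
}

// Compare the reported counts with the tags actually present in the XML fragment.
function checkMetadata(xml, reported) {
  const actual = { tables: countTags(xml, 'w:tbl'), paragraphs: countTags(xml, 'w:p') };
  return {
    actual,
    consistent: actual.tables === reported.tables && actual.paragraphs === reported.paragraphs,
  };
}

const sampleXml =
  '<w:p><w:pPr><w:jc w:val="center"/></w:pPr><w:r><w:t>A</w:t></w:r></w:p>' +
  '<w:tbl><w:tblPr/><w:tr><w:tc><w:p/></w:tc></w:tr></w:tbl>';
console.log(checkMetadata(sampleXml, { tables: 1, paragraphs: 2 }).consistent); // true
```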

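1. File completeness: confirm that every temp/page_X.xml file has been generated
2. Anomaly detection: if a page reports 0 paragraphs and 0 tables, reprocess that page
3. Retry on failure: if a page's subtask failed or its file was not generated, re-dispatch that page's subtask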
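
These three checks reduce to one decision per page. The following sketch is an illustration only: `pagesToRetry` is a hypothetical function, `summaries` is assumed to map page numbers to `{ tables, paragraphs }`, and `existingFiles` is assumed to be the list of file names found in temp/.

```javascript
// Decide which pages must be re-dispatched.
// A page is retried if its summary is missing, its file is missing,
// or it reports 0 tables and 0 paragraphs.
function pagesToRetry(pageCount, summaries, existingFiles) {
  const files = new Set(existingFiles);
  const retry = [];
  for (let page = 1; page <= pageCount; page++) {
    const s = summaries[page];
    const missingFile = !files.has(`page_${page}.xml`);
    const empty = s ? s.tables === 0 && s.paragraphs === 0 : false;
    if (!s || missingFile || empty) retry.push(page);
  }
  return retry;
}

const retry = pagesToRetry(
  3,
  { 1: { tables: 1, paragraphs: 2 }, 2: { tables: 0, paragraphs: 0 } },
  ['page_1.xml', 'page_2.xml']
);
console.log(retry); // [ 2, 3 ]
```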

// merge_and_pack.js - merge the page XML files and package them as a docx
const fs = require('fs');
const path = require('path');
const archiver = require('archiver');

// 1. Merge all page XML fragments
const tempDir = './temp';
const pages = fs.readdirSync(tempDir)
  .filter(f => f.match(/^page_\d+\.xml$/))
  .sort((a, b) => {
    const numA = parseInt(a.match(/\d+/)[0], 10);
    const numB = parseInt(b.match(/\d+/)[0], 10);
    return numA - numB;
  });

const xmlParts = pages.map(p => fs.readFileSync(path.join(tempDir, p), 'utf8'));
const content = xmlParts.join('\n<w:p><w:r><w:br w:type="page"/></w:r></w:p>\n');

const documentXml = `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    ${content}
    <w:sectPr>
      <w:pgSz w:w="11906" w:h="16838"/>
      <w:pgMar w:top="1134" w:right="1134" w:bottom="1134" w:left="1134"/>
    </w:sectPr>
  </w:body>
</w:document>`;

// 2. Create the directory structure
fs.mkdirSync('./word/_rels', { recursive: true });
fs.mkdirSync('./_rels', { recursive: true });
fs.mkdirSync('./docProps', { recursive: true });
fs.writeFileSync('./word/document.xml', documentXml);

// 3. Write the other required XML files (copied from the Appendix 1 templates)
fs.writeFileSync('./word/styles.xml', `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:styles xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:docDefaults>
        <w:rPrDefault>
            <w:rPr>
                <w:rFonts w:ascii="Calibri" w:eastAsia="SimSun" w:hAnsi="Calibri" w:cs="Times New Roman"/>
                <w:sz w:val="24"/>
                <w:szCs w:val="24"/>
            </w:rPr>
        </w:rPrDefault>
    </w:docDefaults>
    <w:style w:type="paragraph" w:default="1" w:styleId="Normal">
        <w:name w:val="Normal"/>
    </w:style>
    <w:style w:type="table" w:styleId="TableGrid">
        <w:name w:val="Table Grid"/>
        <w:tblPr>
            <w:tblBorders>
                <w:top w:val="single" w:sz="4" w:color="000000"/>
                <w:left w:val="single" w:sz="4" w:color="000000"/>
                <w:bottom w:val="single" w:sz="4" w:color="000000"/>
                <w:right w:val="single" w:sz="4" w:color="000000"/>
                <w:insideH w:val="single" w:sz="4" w:color="000000"/>
                <w:insideV w:val="single" w:sz="4" w:color="000000"/>
            </w:tblBorders>
        </w:tblPr>
    </w:style>
</w:styles>`);

fs.writeFileSync('./word/settings.xml', `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:settings xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:zoom w:percent="100"/>
    <w:defaultTabStop w:val="720"/>
    <w:characterSpacingControl w:val="doNotCompress"/>
    <w:compat>
        <w:compatSetting w:name="compatibilityMode" w:uri="http://schemas.microsoft.com/office/word" w:val="15"/>
    </w:compat>
</w:settings>`);

fs.writeFileSync('./word/fontTable.xml', `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:fonts xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:font w:name="Calibri"><w:panose1 w:val="020F0502020204030204"/><w:charset w:val="00"/></w:font>
    <w:font w:name="SimSun"><w:panose1 w:val="02010600040101010101"/><w:charset w:val="86"/></w:font>
    <w:font w:name="Times New Roman"><w:panose1 w:val="02020603050405020304"/><w:charset w:val="00"/></w:font>
</w:fonts>`);

fs.writeFileSync('./word/_rels/document.xml.rels', `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
    <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings" Target="settings.xml"/>
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable" Target="fontTable.xml"/>
</Relationships>`);

fs.writeFileSync('./[Content_Types].xml', `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
    <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
    <Default Extension="xml" ContentType="application/xml"/>
    <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
    <Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/>
    <Override PartName="/word/settings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml"/>
    <Override PartName="/word/fontTable.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"/>
    <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
    <Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/>
</Types>`);

fs.writeFileSync('./_rels/.rels', `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
    <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/>
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties" Target="docProps/app.xml"/>
</Relationships>`);

fs.writeFileSync('./docProps/core.xml', `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:dcterms="http://purl.org/dc/terms/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <dc:title>Converted Document</dc:title>
    <dc:creator>PDF-to-Word Converter</dc:creator>
    <cp:revision>1</cp:revision>
    <dcterms:created xsi:type="dcterms:W3CDTF">${new Date().toISOString()}</dcterms:created>
    <dcterms:modified xsi:type="dcterms:W3CDTF">${new Date().toISOString()}</dcterms:modified>
</cp:coreProperties>`);

fs.writeFileSync('./docProps/app.xml', `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">
    <Template>Normal.dotm</Template>
    <TotalTime>0</TotalTime>
    <Application>Microsoft Office Word</Application>
    <DocSecurity>0</DocSecurity>
    <AppVersion>16.0000</AppVersion>
</Properties>`);

// 4. ZIP-package as docx
const output = fs.createWriteStream('./output.docx');
const archive = archiver('zip', { zlib: { level: 9 } });
archive.pipe(output);

// Relationship files (and [Content_Types].xml) are stored uncompressed
archive.file('./[Content_Types].xml', { name: '[Content_Types].xml', store: true });
archive.file('./_rels/.rels', { name: '_rels/.rels', store: true });
archive.file('./word/_rels/document.xml.rels', { name: 'word/_rels/document.xml.rels', store: true });

// The other XML files are compressed
archive.file('./word/document.xml', { name: 'word/document.xml' });
archive.file('./word/styles.xml', { name: 'word/styles.xml' });
archive.file('./word/settings.xml', { name: 'word/settings.xml' });
archive.file('./word/fontTable.xml', { name: 'word/fontTable.xml' });
archive.file('./docProps/core.xml', { name: 'docProps/core.xml' });
archive.file('./docProps/app.xml', { name: 'docProps/app.xml' });

archive.finalize();

node merge_and_pack.js
# writes output.docx


**Subtask prompt template**:


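### Phase 1: Image Content Recognition

**Goal**: Accurately extract the text, tables, and layout information in the image

**Methodology**:
1. **Visual structure analysis**
   - Identify the document type (form, report, log, etc.)
   - Analyze the layout hierarchy (headings, body text, tables, headers and footers)
   - Locate key information areas

2. **Content extraction principles**
   - Printed text: read the text directly
   - Handwriting: infer reasonably from context, and mark uncertain spots with [?]
   - Tables: identify the row/column structure, merged cells, background colors, and border styles
   - **Header and footer recognition**:
     - Header: company name, document title, logo, and similar items repeated at the top of every page
     - Footer: page number, date, version number, and similar items repeated at the bottom of every page
     - Handling: place them as ordinary paragraphs at the top/bottom of the page content, at 9pt
   - **Non-text content recognition**:
     - Seals/signatures: mark the position with [SEAL: XXX] or [SIGNATURE]
     - Logo: mark the position with [LOGO]
     - Flowcharts/chemical formulas: mark with [DIAGRAM] and describe the content
   - **Font size inference**: judge the text level from its visual size
     - Large and bold/centered → heading level (16-18pt)
     - Medium size → body level (11-12pt)
     - Smaller and inside a table → table content level (10pt)
     - Smallest, gray, or at the bottom → annotation level (9pt)

3. **Interactive confirmation strategy** (flexible, not mandatory)
   - Confirm key information (names, dates, numbers) with the user
   - For ambiguous content, offer options and let the user choose how to handle it
   - Confirm formatting preferences (handwriting style vs. standard print)
   - *Note: if the user explicitly asks you to "use your own judgment / infer reasonably", you may skip confirmation and proceed directly*

4. **Landscape page handling**
   - **Detection**: determine the image orientation (width > height means landscape)
   - **Rotation**: rotate landscape images by 90° to portrait
   - **Principle**: do not split tables; keep the original structure intact
     - Content that fit on one landscape page still fits on one page after rotation
     - Tables become longer but keep their structure; Word wraps text automatically
     - Content completeness takes priority over appearance

**Output**: Structured content data (can be stored as JSON or an intermediate format)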
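
The font-size levels and the landscape test translate directly into small helpers. This is a sketch under stated assumptions: the level names (`heading`, `body`, `table`, `note`) and the choice of the lower end of each size range are illustrative, not prescribed by the skill.

```javascript
// Map a recognized text level to a point size and the matching <w:sz> value (half-points).
// The sizes take the lower end of the ranges given above; that choice is an assumption.
const LEVEL_PT = { heading: 16, body: 11, table: 10, note: 9 };

function sizeForLevel(level) {
  const pt = LEVEL_PT[level] ?? LEVEL_PT.body; // unknown levels fall back to body text
  return { pt, halfPoints: pt * 2 };
}

// Landscape detection as described above: width greater than height.
function isLandscape(width, height) {
  return width > height;
}

console.log(sizeForLevel('heading')); // { pt: 16, halfPoints: 32 }
console.log(isLandscape(3508, 2480)); // true
```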

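### Phase 2: HTML Prototype Construction (Optional)

**Goal**: Create a visual HTML page that serves as the "design draft" for the Word document (skip this phase if you only need to produce the Word file quickly)

**Methodology**:
1. **Page specification**
   - Use the CSS `@page` rule to set the A4 size (210mm × 297mm)
   - Set the page margins (typically 15-20mm)
   - Use a background color to simulate print preview

2. **Structure reconstruction strategy**
   - **Text blocks**: use semantic HTML tags (h1-h6, p, span)
   - **Tables**: use `<table>` with fixed column widths (in mm), and make sure the row and column counts match the image
   - **Form elements**: use flex/grid layout to reproduce fill-in areas
   - **Underlines/fill-in lines**: use border-bottom or text-decoration

3. **Style control principles**
   - Use absolute units (mm, pt) rather than relative units (px, em)
   - Fonts: SimSun for Chinese, Calibri for English (the most standard fonts for Word documents)
   - Font size inference: infer automatically from the visual size of the text in the image
     - Headings: visually larger, bold, centered → 16-18pt
     - Body text: medium size, regular weight → 11-12pt
     - Table content: smaller, compact → 10pt
     - Notes/footnotes: smallest → 9pt

**Output**: A single-file HTML page that can be previewed in a browser and visually matches the image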
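
A prototype page following these rules could be generated as shown below. This is a minimal sketch: `buildPrototypeHtml` is a hypothetical helper, and the 20mm default margin, the gray preview background, and the `.page` class name are illustrative choices within the ranges given above.

```javascript
// Build a single-file HTML prototype with an A4 page box.
// bodyHtml is the already-built content (headings, paragraphs, tables).
function buildPrototypeHtml(bodyHtml, marginMm = 20) {
  return `<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<style>
  @page { size: 210mm 297mm; margin: ${marginMm}mm; }
  body { background: #888; margin: 0; }
  .page {
    width: 210mm; min-height: 297mm; box-sizing: border-box;
    padding: ${marginMm}mm; margin: 10mm auto; background: #fff;
    font-family: Calibri, SimSun, sans-serif; font-size: 11pt;
  }
  table { border-collapse: collapse; table-layout: fixed; }
  td, th { border: 0.5pt solid #000; font-size: 10pt; padding: 1mm; }
</style>
</head>
<body>
<div class="page">
${bodyHtml}
</div>
</body>
</html>`;
}

const html = buildPrototypeHtml('<h1 style="font-size:16pt;text-align:center">Batch Record</h1>', 15);
console.log(html.length > 0); // true
```

The `.page` box reproduces the A4 sheet on screen, while the `@page` rule applies when the prototype is printed or exported from the browser.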

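### Phase 3: Chinese-to-English Translation

**Goal**: Translate the Chinese content into English (for international registration requirements)

**Core principles**:
1. **Copy key data exactly; never translate it**:
   - Numbers, dates, batch numbers, expiry dates, dosages, specifications, percentages
   - Chemical formulas, molecular formulas, codes, serial numbers
   - This data must match the source exactly, including its formatting

2. **Professional translation**:
   - Use an LLM for context-aware translation
   - Translate terminology accurately and follow the glossary (if any)
   - For uncertain terms, keep the original and add the English in parentheses

**Methodology**:
1. **LLM dynamic translation** (recommended)
   - Use an LLM directly for context-aware translation
   - The LLM decides how to translate technical terms based on context
   - The glossary ensures consistency across pages
   - For an implementation example, see the Technical Appendix (LLM translation approach)

2. **Layout adaptation**
   - English is usually longer than Chinese, so column widths need adjusting
   - Use auto-fit column widths: `<w:tcW w:w="0" w:type="auto"/>`
   - Reduce the font size (10pt is usually enough for English)
   - Keep the original alignment

**Output**: Translated content data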
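
The "copy key data exactly" rule can be spot-checked after translation. The sketch below is a heuristic, not part of the skill: it treats every token containing a digit as key data and reports those that do not appear verbatim in the translation. `keyDataTokens` and `missingKeyData` are hypothetical names, and the tokenizer does not strip trailing punctuation such as a sentence-final period.

```javascript
// Extract tokens that contain digits (numbers, dates, batch numbers, percentages).
function keyDataTokens(text) {
  return text.match(/[A-Za-z0-9][A-Za-z0-9.\-\/:%]*/g)?.filter((t) => /\d/.test(t)) ?? [];
}

// Return the key-data tokens of the source that do not appear verbatim in the translation.
function missingKeyData(source, translated) {
  return keyDataTokens(source).filter((token) => !translated.includes(token));
}

const source = '批号：20240315A，含量 99.5%，有效期至 2026-03-14';
const good = 'Batch No.: 20240315A, content 99.5%, valid until 2026-03-14';
const bad = 'Batch No.: 20240315A, content 99.5%, valid until 14 March 2026';
console.log(missingKeyData(source, good)); // []
console.log(missingKeyData(source, bad));  // [ '2026-03-14' ]
```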

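### Phase 4: Building the Word Document Manually

**Goal**: Build a docx file that conforms to the OOXML specification from scratch

**Core idea**:


**Build steps**:

1. **Create the directory structure**


2. **Write the core XML files**
- See the Technical Appendix (XML file templates) for the full content of each file
- **document.xml**: describes the content in WordprocessingML
- **styles.xml**: defines the document styles (required)
- **Other files**: use the standard templates and adjust only the metadata

3. **Package as ZIP**
- See the Technical Appendix (ZIP packaging options)
- Key requirement: relationship files (.rels) are stored uncompressed; the other XML files use DEFLATE compression
- Change the file extension to `.docx`
- **Important**: keep UTF-8 encoding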

**OOXML key element quick reference**:

| Element | Purpose | Example | Parent element |
|------|------|------|--------|
| `<w:document>` | Root element | Contains the whole document | - |
| `<w:body>` | Document body | Contains paragraphs and tables | `<w:document>` |
| `<w:p>` | Paragraph | Text blocks, table cell content | `<w:body>`, `<w:tc>` |
| `<w:r>` | Text run | A span of text with uniform formatting | `<w:p>` |
| `<w:rPr>` | Run properties | Font, size, bold | `<w:r>` |
| `<w:t>` | Text content | The text actually displayed | `<w:r>` |
| `<w:tbl>` | Table | Container for tabular data | `<w:body>` |
| `<w:tr>` | Table row | One row of a table | `<w:tbl>` |
| `<w:tc>` | Table cell | A cell within a row | `<w:tr>` |
| `<w:tcPr>` | Cell properties | Width, background color | `<w:tc>` |
| `<w:jc>` | Alignment | left/center/right | `<w:pPr>` |
| `<w:b>` | Bold | Bold text | `<w:rPr>` |
| `<w:u>` | Underline | single/double | `<w:rPr>` |
| `<w:sz>` | Font size | In half-points (24 = 12pt) | `<w:rPr>` |
| `<w:color>` | Text color | `<w:color w:val="FF0000"/>` red | `<w:rPr>` |
| `<w:spacing>` | Line spacing | `<w:spacing w:line="360"/>` 1.5× line spacing | `<w:pPr>` |
| `<w:ind>` | Indentation | `<w:ind w:firstLineChars="200"/>` first-line indent of 2 characters | `<w:pPr>` |
| `<w:shd>` | Background/shading | `<w:shd w:val="clear" w:color="auto" w:fill="CCCCCC"/>` gray background | `<w:tcPr>` |
| `<w:tblBorders>` | Table borders | Defines all borders | `<w:tblPr>` |
| `<w:gridCol>` | Column width definition | `<w:gridCol w:w="2000"/>` | `<w:tblGrid>` |
| `<w:gridSpan>` | Horizontal merge | Number of columns spanned (gridSpan=2 merges 2 columns) | `<w:tcPr>` |
| `<w:vMerge>` | Vertical merge | restart = first cell; no val = continuation cell | `<w:tcPr>` |
| `<w:sectPr>` | Section properties | Page size, margins, section breaks | `<w:body>` |

**Common attribute value quick reference**:

| Attribute | Value | Notes |
|------|-----|------|
| Line spacing `w:line` | 240 | Single line spacing (12pt) |
| Line spacing `w:line` | 360 | 1.5× line spacing |
| Line spacing `w:line` | 480 | Double line spacing |
| First-line indent `w:firstLineChars` | 200 | About 2 characters (200/100 = 2) |
| Color `w:val` | 000000 | Black |
| Color `w:val` | FF0000 | Red |
| Color `w:val` | 0000FF | Blue |
| Color `w:val` | CCCCCC | Light gray |
| Color `w:val` | FFFF00 | Yellow (highlight) |
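
To show how these elements and values fit together, here is a minimal paragraph builder. It is a sketch, not the skill's implementation: `escapeXml` and `buildParagraph` are hypothetical helpers, and the option names (`bold`, `sizePt`, `align`, `lineMultiple`) are assumptions chosen for the example.

```javascript
// Escape the characters that are special in XML text and attribute values.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build a <w:p> element. sizePt is converted to half-points for <w:sz>,
// and lineMultiple to w:line (240 = single line spacing).
function buildParagraph(text, { bold = false, sizePt = 12, align = 'left', lineMultiple = 1 } = {}) {
  const pPr =
    `<w:pPr><w:spacing w:line="${Math.round(240 * lineMultiple)}" w:lineRule="auto"/>` +
    `<w:jc w:val="${align}"/></w:pPr>`;
  const rPr = `<w:rPr>${bold ? '<w:b/>' : ''}<w:sz w:val="${sizePt * 2}"/></w:rPr>`;
  return `<w:p>${pPr}<w:r>${rPr}<w:t xml:space="preserve">${escapeXml(text)}</w:t></w:r></w:p>`;
}

console.log(buildParagraph('Assay & Purity', { bold: true, sizePt: 16, align: 'center', lineMultiple: 1.5 }));
```

Escaping matters because an unescaped `&` or `<` in recognized text makes document.xml malformed, which Word reports as a corrupted file.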

## Technical Appendix

### Appendix 1: Complete XML File Templates

#### 1. [Content_Types].xml (content type declarations)
```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
 <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
 <Default Extension="xml" ContentType="application/xml"/>
 <Override PartName="/word/document.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
 <Override PartName="/word/styles.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/>
 <Override PartName="/word/settings.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml"/>
 <Override PartName="/word/fontTable.xml" ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"/>
 <Override PartName="/docProps/core.xml" ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
 <Override PartName="/docProps/app.xml" ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/>
</Types>
```

#### 2. _rels/.rels (package-level relationships)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>
    <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/package/2006/relationships/metadata/core-properties" Target="docProps/core.xml"/>
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/extended-properties" Target="docProps/app.xml"/>
</Relationships>

#### 3. word/_rels/document.xml.rels (document-level relationships)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
    <Relationship Id="rId1" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles" Target="styles.xml"/>
    <Relationship Id="rId2" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings" Target="settings.xml"/>
    <Relationship Id="rId3" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable" Target="fontTable.xml"/>
</Relationships>

#### 4. word/styles.xml (style definitions)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:styles xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:docDefaults>
        <w:rPrDefault>
            <w:rPr>
                <w:rFonts w:ascii="Calibri" w:eastAsia="SimSun" w:hAnsi="Calibri" w:cs="Times New Roman"/>
                <w:sz w:val="24"/>
                <w:szCs w:val="24"/>
            </w:rPr>
        </w:rPrDefault>
    </w:docDefaults>
    <w:style w:type="paragraph" w:default="1" w:styleId="Normal">
        <w:name w:val="Normal"/>
    </w:style>
    <w:style w:type="table" w:styleId="TableGrid">
        <w:name w:val="Table Grid"/>
        <w:tblPr>
            <w:tblBorders>
                <w:top w:val="single" w:sz="4" w:color="000000"/>
                <w:left w:val="single" w:sz="4" w:color="000000"/>
                <w:bottom w:val="single" w:sz="4" w:color="000000"/>
                <w:right w:val="single" w:sz="4" w:color="000000"/>
                <w:insideH w:val="single" w:sz="4" w:color="000000"/>
                <w:insideV w:val="single" w:sz="4" w:color="000000"/>
            </w:tblBorders>
        </w:tblPr>
    </w:style>
</w:styles>

#### 5. word/settings.xml (document settings)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:settings xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:zoom w:percent="100"/>
    <w:defaultTabStop w:val="720"/>
    <w:characterSpacingControl w:val="doNotCompress"/>
    <w:compat>
        <w:compatSetting w:name="compatibilityMode" w:uri="http://schemas.microsoft.com/office/word" w:val="15"/>
    </w:compat>
</w:settings>

#### 6. word/fontTable.xml (font table)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:fonts xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
    <w:font w:name="Calibri">
        <w:panose1 w:val="020F0502020204030204"/>
        <w:charset w:val="00"/>
    </w:font>
    <w:font w:name="SimSun">
        <w:panose1 w:val="02010600040101010101"/>
        <w:charset w:val="86"/>
    </w:font>
    <w:font w:name="Times New Roman">
        <w:panose1 w:val="02020603050405020304"/>
        <w:charset w:val="00"/>
    </w:font>
</w:fonts>

#### 7. docProps/core.xml (core properties)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties 
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <dc:title><!-- document title --></dc:title>
    <dc:subject><!-- subject --></dc:subject>
    <dc:creator><!-- creator --></dc:creator>
    <cp:lastModifiedBy><!-- last modified by --></cp:lastModifiedBy>
    <cp:revision>1</cp:revision>
    <dcterms:created xsi:type="dcterms:W3CDTF">2025-01-01T00:00:00Z</dcterms:created>
    <dcterms:modified xsi:type="dcterms:W3CDTF">2025-01-01T00:00:00Z</dcterms:modified>
</cp:coreProperties>

#### 8. docProps/app.xml (application properties)

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">
    <Template>Normal.dotm</Template>
    <TotalTime>0</TotalTime>
    <Pages>1</Pages>
    <Words>0</Words>
    <Characters>0</Characters>
    <Application>Microsoft Office Word</Application>
    <DocSecurity>0</DocSecurity>
    <Lines>0</Lines>
    <Paragraphs>0</Paragraphs>
    <AppVersion>16.0000</AppVersion>
</Properties>

### Appendix 2: document.xml Structure Reference

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document 
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">
    <w:body>
        <!-- Insert content here: paragraphs, tables, etc. -->
        
        <w:sectPr>
            <w:pgSz w:w="11906" w:h="16838"/>
            <w:pgMar w:top="1134" w:right="1134" w:bottom="1134" w:left="1134" 
                     w:header="720" w:footer="720" w:gutter="0"/>
        </w:sectPr>
    </w:body>
</w:document>

<w:p>
    <w:pPr>
        <w:jc w:val="center"/>  <!-- centered, optional -->
    </w:pPr>
    <w:r>
        <w:rPr>
            <w:b/>  <!-- bold, optional -->
            <w:sz w:val="24"/>  <!-- 12pt font size, optional -->
        </w:rPr>
        <w:t>Paragraph text</w:t>
    </w:r>
</w:p>

<w:r>
    <w:rPr>
        <w:u w:val="single"/>
    </w:rPr>
    <w:t xml:space="preserve"> filled-in content </w:t>
</w:r>

<w:tbl>
    <w:tblPr>
        <w:tblStyle w:val="TableGrid"/>
        <w:tblW w:w="9000" w:type="dxa"/>
        <!-- Border definition (optional; can also be defined in styles.xml) -->
        <w:tblBorders>
            <w:top w:val="single" w:sz="4" w:color="000000"/>
            <w:left w:val="single" w:sz="4" w:color="000000"/>
            <w:bottom w:val="single" w:sz="4" w:color="000000"/>
            <w:right w:val="single" w:sz="4" w:color="000000"/>
            <w:insideH w:val="single" w:sz="4" w:color="000000"/>
            <w:insideV w:val="single" w:sz="4" w:color="000000"/>
        </w:tblBorders>
    </w:tblPr>
    <w:tblGrid>
        <!-- Column widths, in twips -->
        <w:gridCol w:w="1000"/>
        <w:gridCol w:w="1000"/>
        <!-- more columns... -->
    </w:tblGrid>
    <!-- Header row -->
    <w:tr>
        <w:tc>
            <w:tcPr>
                <w:tcW w:w="1000" w:type="dxa"/>
                <w:vAlign w:val="center"/>
            </w:tcPr>
            <w:p>
                <w:pPr>
                    <w:jc w:val="center"/>
                </w:pPr>
                <w:r>
                    <w:rPr>
                        <w:b/>
                    </w:rPr>
                    <w:t>Column heading</w:t>
                </w:r>
            </w:p>
        </w:tc>
        <!-- more cells... -->
    </w:tr>
    <!-- data rows... -->
</w:tbl>
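
The table template above can be filled programmatically from recognized cell data. The following sketch assumes a rectangular 2D array of cell strings with no merged cells; `escapeXml` and `buildTable` are hypothetical helpers, and rendering the first row bold and centered is an illustrative choice that mirrors the header row in the template.

```javascript
// Escape the characters that are special in XML text.
function escapeXml(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Build a <w:tbl> from a 2D array of cell strings.
// colWidths are in twips and must have one entry per column.
function buildTable(rows, colWidths) {
  const grid = colWidths.map((w) => `<w:gridCol w:w="${w}"/>`).join('');
  const total = colWidths.reduce((a, b) => a + b, 0);
  const trs = rows.map((cells, r) => {
    const tcs = cells.map((text, c) => {
      const pPr = r === 0 ? '<w:pPr><w:jc w:val="center"/></w:pPr>' : '';
      const rPr = r === 0 ? '<w:rPr><w:b/></w:rPr>' : '';
      return `<w:tc><w:tcPr><w:tcW w:w="${colWidths[c]}" w:type="dxa"/></w:tcPr>` +
        `<w:p>${pPr}<w:r>${rPr}<w:t xml:space="preserve">${escapeXml(String(text))}</w:t></w:r></w:p></w:tc>`;
    }).join('');
    return `<w:tr>${tcs}</w:tr>`;
  }).join('');
  return `<w:tbl><w:tblPr><w:tblStyle w:val="TableGrid"/><w:tblW w:w="${total}" w:type="dxa"/></w:tblPr>` +
    `<w:tblGrid>${grid}</w:tblGrid>${trs}</w:tbl>`;
}

const tableXml = buildTable([['Item', 'Result'], ['pH', '6.5']], [2000, 3000]);
console.log(tableXml);
```

Every `<w:tc>` contains at least one `<w:p>`, which the format requires even for empty cells.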

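### Appendix 3: Multi-Page Documents and Merged Cells

#### Multi-page document handling

<w:body>
    <!-- Page 1 content -->
    <w:p>...</w:p>
    <w:tbl>...</w:tbl>
    <w:p>
        <w:r>
            <w:br w:type="page"/>  <!-- page break -->
        </w:r>
    </w:p>
    <!-- Page 2 content -->
    <w:p>...</w:p>
    <w:tbl>...</w:tbl>

    <!-- Final sectPr (required) -->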
    <w:sectPr>
        <w:pgSz w:w="11906" w:h="16838"/>
        <w:pgMar w:top="1134" w:right="1134" w:bottom="1134" w:left="1134"/>
    </w:sectPr>
</w:body>

#### Merged cell handling

<w:tc>
    <w:tcPr>
        <w:gridSpan w:val="2"/>  <!-- merge 2 columns -->
    </w:tcPr>
    <w:p><w:r><w:t>Content spanning 2 columns</w:t></w:r></w:p>
</w:tc>

<!-- Row 1: start of the merge -->
<w:tc>
    <w:tcPr>
        <w:vMerge w:val="restart"/>
    </w:tcPr>
    <w:p><w:r><w:t>Merged content</w:t></w:r></w:p>
</w:tc>
<!-- Row 2: continuation -->
<w:tc>
    <w:tcPr>
        <w:vMerge/>  <!-- no val: continues the cell above -->
    </w:tcPr>
    <w:p/>
</w:tc>

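### Appendix 4: ZIP Packaging Options

# Package from inside the temp/ directory
cd temp && zip -r -0 -X ../output.docx "[Content_Types].xml" _rels word docProps
# -0: store only, no compression (for mixed compression, package in separate steps)
# -X: do not save extra file attributes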

const archiver = require('archiver');
const fs = require('fs');

const output = fs.createWriteStream('output.docx');
const archive = archiver('zip', { zlib: { level: 9 } });

archive.pipe(output);

// Relationship files (and [Content_Types].xml) are stored uncompressed
archive.file('temp/[Content_Types].xml', { name: '[Content_Types].xml', store: true });
archive.file('temp/_rels/.rels', { name: '_rels/.rels', store: true });
archive.file('temp/word/_rels/document.xml.rels', { name: 'word/_rels/document.xml.rels', store: true });

// The other XML files are compressed
archive.file('temp/word/document.xml', { name: 'word/document.xml' });
archive.file('temp/word/styles.xml', { name: 'word/styles.xml' });
archive.file('temp/word/settings.xml', { name: 'word/settings.xml' });
archive.file('temp/word/fontTable.xml', { name: 'word/fontTable.xml' });
archive.file('temp/docProps/core.xml', { name: 'docProps/core.xml' });
archive.file('temp/docProps/app.xml', { name: 'docProps/app.xml' });

archive.finalize();

import zipfile

with zipfile.ZipFile('output.docx', 'w', zipfile.ZIP_DEFLATED) as zf:
    # Relationship files (and [Content_Types].xml) are stored uncompressed
    zf.write('temp/[Content_Types].xml', '[Content_Types].xml', compress_type=zipfile.ZIP_STORED)
    zf.write('temp/_rels/.rels', '_rels/.rels', compress_type=zipfile.ZIP_STORED)
    # The other XML files are compressed
    zf.write('temp/word/document.xml', 'word/document.xml')
    zf.write('temp/word/styles.xml', 'word/styles.xml')
    # ... remaining files

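### Appendix 5: Unit Conversion Helpers

// mm to twips (1 inch = 25.4 mm = 1440 twips)
const mmToTwips = (mm) => Math.round(mm * 1440 / 25.4);

// pt to half-points (the unit used by <w:sz>)
const ptToHalfPoints = (pt) => pt * 2;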
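
As a worked use of the millimetre-to-twips conversion, the sketch below splits the usable width of an A4 page into column widths for `<w:gridCol>`. `columnWidths` is a hypothetical helper; the 20mm margin default matches the `w:pgMar` value of 1134 twips used in the templates, and proportional splitting is an assumption about how widths would be chosen.

```javascript
const mmToTwips = (mm) => Math.round((mm * 1440) / 25.4);

// Split the usable page width (page width minus left and right margins) into
// column widths proportional to `ratios`. The last column absorbs rounding error.
function columnWidths(ratios, pageWidthMm = 210, marginMm = 20) {
  const usable = mmToTwips(pageWidthMm - 2 * marginMm);
  const sum = ratios.reduce((a, b) => a + b, 0);
  const widths = ratios.map((r) => Math.floor((usable * r) / sum));
  widths[widths.length - 1] += usable - widths.reduce((a, b) => a + b, 0);
  return widths;
}

console.log(mmToTwips(210), mmToTwips(297)); // 11906 16838 (the A4 w:pgSz values)
console.log(columnWidths([1, 2, 2]));        // [ 1927, 3855, 3856 ]
```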

### Appendix 6: Namespace Quick Reference

| Prefix | Namespace URL | Purpose |
|------|------|------|
| `w` | `http://schemas.openxmlformats.org/wordprocessingml/2006/main` | Core WordprocessingML elements |
| `r` | `http://schemas.openxmlformats.org/officeDocument/2006/relationships` | Relationships (used to link styles, fonts, etc.) |
| `cp` | `http://schemas.openxmlformats.org/package/2006/metadata/core-properties` | Core properties (title, author, etc.) |
| `dc` | `http://purl.org/dc/elements/1.1/` | Dublin Core metadata |
| `dcterms` | `http://purl.org/dc/terms/` | Dublin Core terms |
| (none) | `http://schemas.openxmlformats.org/package/2006/content-types` | Root namespace of [Content_Types].xml |
| (none) | `http://schemas.openxmlformats.org/package/2006/relationships` | Root namespace of .rels files |
| (none) | `http://schemas.openxmlformats.org/officeDocument/2006/extended-properties` | Root namespace of app.xml |

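### Appendix 7: LLM Dynamic Translation Approach

You are a professional document translation assistant. Translate the following Chinese content into English.

Core principles:
1. Copy key data exactly and never translate it: numbers, dates, batch numbers, expiry dates, dosages, specifications, percentages, chemical formulas
2. Translate technical terms accurately; when uncertain, keep the original and add the English in parentheses
3. Preserve the original formatting and punctuation
4. Follow the translations in the glossary (if any)

Glossary:
{terminology}

Content to translate:
{content}

Output the translation directly, with no explanation.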
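
Filling the two placeholders of this template is plain string substitution. The sketch below is one way to do it: `buildPrompt` is a hypothetical helper, and rendering the glossary as one `source -> target` pair per line, with `(none)` when no glossary is supplied, is an assumption rather than a format the skill prescribes.

```javascript
// Fill the {terminology} and {content} placeholders of a prompt template.
function buildPrompt(template, content, glossary) {
  const entries = Object.entries(glossary || {});
  const terminology = entries.length
    ? entries.map(([zh, en]) => `${zh} -> ${en}`).join('\n')
    : '(none)';
  // Replacer functions keep "$" sequences in the inputs from being treated as special patterns.
  return template
    .replace('{terminology}', () => terminology)
    .replace('{content}', () => content);
}

const template = 'Glossary:\n{terminology}\n\nContent to translate:\n{content}';
console.log(buildPrompt(template, '批号：20240315A', { '批号': 'Batch No.' }));
```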

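### Appendix 8: Key Attribute Checklist

| Attribute | Element | Notes |
|------|------|------|
| `xml:space="preserve"` | `<w:t>` | Preserves spaces in the text; required for text with leading or trailing spaces |
| `encoding="UTF-8"` | XML declaration `<?xml?>` | Every XML file must specify UTF-8 encoding in its declaration |
| `xmlns:w` | Root element | WordprocessingML namespace declaration |

| Attribute | Purpose | Example value |
|------|------|------|
| `w:val="single"` | Single underline | `<w:u w:val="single"/>` |
| `w:val="double"` | Double underline | `<w:u w:val="double"/>` |
| `w:val="center"` on `<w:jc>` | Centered alignment | left/center/right/both |
| `w:type="auto"` | Automatic column width | Used for auto-fit table column widths |
| `w:type="dxa"` | Absolute unit (twips) | Used for fixed table column widths |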

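### Non-Text Content Handling

| Element type | Marker | Handling |
|------|------|------|
| Seal | `[SEAL: XXX company seal]` | Keep the placeholder and remind the user to add it |
| Signature | `[SIGNATURE: Zhang San]` | Keep the placeholder and remind the user to add it |
| Logo | `[LOGO: XXX company]` | Embed the original image if available; otherwise keep the placeholder |
| Flowchart | `[DIAGRAM: description]` | Describe the structure in text and remind the user to add it |
| Chemical formula | Keep the original | Chemical formulas are not translated; copy them exactly |
| Barcode/QR code | `[BARCODE/QRCODE]` | Keep the placeholder |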
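
Since these placeholders are meant to be followed up by the user, it can help to list them after the document is built. The sketch below scans text for the marker forms shown in the table plus the `[?]` uncertainty marker; `findPlaceholders` is a hypothetical helper and the returned object shape is an assumption.

```javascript
// Find placeholder markers such as [SEAL: ...], [SIGNATURE], [LOGO: ...],
// [DIAGRAM: ...], [BARCODE/QRCODE] and the uncertainty marker [?].
function findPlaceholders(text) {
  const re = /\[(SEAL|SIGNATURE|LOGO|DIAGRAM|BARCODE\/QRCODE|\?)(?::\s*([^\]]*))?\]/g;
  return [...text.matchAll(re)].map((m) => ({
    type: m[1],
    detail: m[2] ?? null, // text after the colon, if any
    index: m.index,
  }));
}

const found = findPlaceholders('Approved by [SIGNATURE: Zhang San] [SEAL: QA Dept], lot 12[?]');
console.log(found.map((p) => p.type)); // [ 'SIGNATURE', 'SEAL', '?' ]
```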
