LLM-enforced PDF-to-DOCX paper summaries: read each PDF, write Chinese review-style sections (研究背景 / research background, 研究内容 / research content, 主要结论 / main conclusions) with anti-repetition QA, extract figures and tables, and render the result into a provided template .docx (e.g., 示例.docx). Use for batch-processing academic PDFs into high-quality, logic-driven Word summaries.
Generate per-paper Word summaries that match a provided template .docx, while forcing the agent to: (1) read the PDF, (2) write a logical Chinese review (not sentence-by-sentence translation), and (3) pass QA before producing the final docx.
Inputs:
Template .docx (e.g., 示例.docx)
Source PDF (e.g., pdfs/<paper>.pdf)
Output directory (e.g., out/<paper-id>/)
Steps:
python "${CODEX_HOME:-$HOME/.codex}/skills/pdf-docx-paper-summary-llm/scripts/extract_paper_pack.py" \
--pdf 'pdfs/<paper>.pdf' \
--assets-dir 'out/<paper-id>/assets' \
--out-json 'out/<paper-id>/paper_pack.json'
Create docs/plans/<paper-id>.json following:
references/config-format.md
references/writing-standard.md (this is the “示例.docx” writing standard)
Hard rules (do not violate):
python "${CODEX_HOME:-$HOME/.codex}/skills/pdf-docx-paper-summary-llm/scripts/qa_config.py" \
--config 'docs/plans/<paper-id>.json' \
--pack 'out/<paper-id>/paper_pack.json'
If QA fails, revise the JSON (do not “paper over” errors with generic filler).
python "${CODEX_HOME:-$HOME/.codex}/skills/pdf-docx-paper-summary-llm/scripts/build_paper_docx.py" \
--template '示例.docx' \
--config 'docs/plans/<paper-id>.json' \
--pdf 'pdfs/<paper>.pdf' \
--out-docx 'out/<paper-id>/<paper-id>_summary.docx' \
--assets-dir 'out/<paper-id>/assets'
Process a directory of PDFs by looping the per-PDF workflow. In batch mode, do not move on to the next paper until the current paper passes qa_config.py.
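The batch rule above can be sketched as a small shell wrapper around the three scripts. This is a minimal sketch, not part of the skill itself: the function names (paper_id, process_paper, process_all) are hypothetical, the paper id is assumed to be the PDF file name without its extension, and qa_config.py is assumed to exit non-zero when QA fails. The config JSON still has to be written by the agent after reading the PDF, so the wrapper stops when it is missing.

```shell
# Sketch of the batch workflow: halt at the first paper that has no config or fails QA.
SKILL_DIR="${CODEX_HOME:-$HOME/.codex}/skills/pdf-docx-paper-summary-llm"
TEMPLATE="${TEMPLATE:-示例.docx}"

# Assumption: the paper id is the PDF file name without its .pdf extension.
paper_id() {
  basename "$1" .pdf
}

# Run the per-PDF workflow for one paper; return non-zero as soon as a step fails.
process_paper() {
  local pdf="$1" id out config
  id="$(paper_id "$pdf")"
  out="out/$id"
  config="docs/plans/$id.json"
  mkdir -p "$out/assets"

  python "$SKILL_DIR/scripts/extract_paper_pack.py" \
    --pdf "$pdf" \
    --assets-dir "$out/assets" \
    --out-json "$out/paper_pack.json" || return 1

  # The config is written by the agent after reading the PDF; it is not generated here.
  if [ ! -f "$config" ]; then
    echo "missing $config: write it before continuing" >&2
    return 2
  fi

  # Assumption: qa_config.py signals a QA failure through a non-zero exit status.
  python "$SKILL_DIR/scripts/qa_config.py" \
    --config "$config" \
    --pack "$out/paper_pack.json" || return 3

  python "$SKILL_DIR/scripts/build_paper_docx.py" \
    --template "$TEMPLATE" \
    --config "$config" \
    --pdf "$pdf" \
    --out-docx "$out/${id}_summary.docx" \
    --assets-dir "$out/assets"
}

# Process every PDF in a directory, stopping at the first paper that does not pass.
process_all() {
  local dir="${1:-pdfs}" pdf
  for pdf in "$dir"/*.pdf; do
    [ -e "$pdf" ] || continue
    process_paper "$pdf" || { echo "stopped at $pdf" >&2; return 1; }
  done
}
```

After sourcing the block, `process_all pdfs` walks the directory in glob order. When it stops, revise or write the JSON for the reported paper and run it again; papers that already passed are simply rebuilt.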
Recommended output layout:
docs/plans/<paper-id>.json
out/<paper-id>/paper_pack.json
out/<paper-id>/assets/FigXX.png
out/<paper-id>/<paper-id>_summary.docx
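A quick way to confirm that a paper ended up in this layout is to test for each expected path. The helper below is a hypothetical sketch (check_outputs is not part of the skill). It assumes a paper may legitimately have no extracted figures, so it checks only that the assets directory exists, not that it contains any FigXX.png files.

```shell
# Report which of the expected outputs are missing for one paper id.
check_outputs() {
  local id="$1" missing=0 f
  for f in \
    "docs/plans/$id.json" \
    "out/$id/paper_pack.json" \
    "out/$id/${id}_summary.docx"; do
    if [ ! -f "$f" ]; then
      echo "missing: $f" >&2
      missing=1
    fi
  done
  # Assumption: figures are optional, so only the assets directory itself is required.
  if [ ! -d "out/$id/assets" ]; then
    echo "missing: out/$id/assets/" >&2
    missing=1
  fi
  return "$missing"
}
```

Run `check_outputs <paper-id>` from the project root; it returns 0 only when every expected path is present.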