Audit site-log and maintenance-log PDFs and generate one Word report per PDF. Use when the user wants to batch process 工程日志 (engineering logs), 养护日志 (maintenance logs), 施工日志 (construction logs), 巡检日志 (inspection logs), or 园林/绿化项目 PDF台账 (landscaping/greening project PDF ledgers); detect duplicate construction photos; detect clear text-image mismatches under a conservative standard, especially weekday/date errors; and write a .docx report next to each PDF.
Use this skill to batch audit project log PDFs and output one Word report per PDF.
分析报告.docx next to each source PDF
Use a strict standard. Only report issues that are clearly supported.
Report:
Do not report vague risks by default. Examples that should usually not be reported unless clearly proven:
Each report contains only two sections:
重复图片问题 (duplicate image issues)
明确的图文不符问题 (clear text-image mismatch issues)
Rules:
未发现...
Before duplicate detection, ignore obvious non-photo assets such as repeated header/footer/template images.
Default thresholds:
If a PDF has unusual layout, adjust thresholds carefully rather than disabling filtering completely.
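The filtering step described above can be sketched as a simple size gate. This is only an illustration, not the bundled script's code: the `ImageInfo` record and both function names are hypothetical, and the threshold values are borrowed from the example override command in this document, so they may differ from the script's real defaults.

```python
from dataclasses import dataclass


@dataclass
class ImageInfo:
    """Hypothetical record for one image extracted from a PDF page."""
    page: int
    width: int
    height: int
    num_bytes: int


def is_probable_photo(img, min_width=500, min_height=350, min_bytes=50000):
    # Assumed thresholds: small or very light images are likely logos,
    # rules, or other template graphics rather than site photos.
    return (
        img.width >= min_width
        and img.height >= min_height
        and img.num_bytes >= min_bytes
    )


def filter_photos(images, **thresholds):
    # Keep only images that pass every threshold; pass overrides as keywords.
    return [img for img in images if is_probable_photo(img, **thresholds)]
```

For an unusual layout, loosening a single threshold (for example `filter_photos(images, min_bytes=10000)`) keeps the other filters active instead of switching filtering off.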
Use the bundled script:
./scripts/analyze_pdf_site_logs.py
python3 scripts/analyze_pdf_site_logs.py "/path/to/file.pdf"
python3 scripts/analyze_pdf_site_logs.py "/path/to/folder" --recursive
python3 scripts/analyze_pdf_site_logs.py "/path/to/folder" --recursive \
--min-width 500 --min-height 350 --min-bytes 50000
python3 scripts/analyze_pdf_site_logs.py "/path/to/folder" --recursive \
--json-out /tmp/pdf_audit_summary.json
python3 scripts/analyze_pdf_site_logs.py "/path/to/folder" --recursive \
--sample-check 5 --json-out /tmp/pdf_audit_summary.json
分析报告.docx
sample_checks with random re-check results
If a report claims many pages share the same image, first suspect repeated template graphics, not bad hashing.
Check:
If needed, rerun with stricter thresholds.
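One way to tell template graphics from genuine duplicate photos is to look at how many pages share the same image. The sketch below assumes exact-byte SHA-256 hashing and a hypothetical `template_ratio` cutoff; the bundled script may hash and classify images differently.

```python
import hashlib
from collections import defaultdict


def group_by_hash(page_images):
    """Map each image digest to the set of pages it appears on.

    page_images: iterable of (page_number, image_bytes) pairs.
    """
    pages_by_hash = defaultdict(set)
    for page, data in page_images:
        digest = hashlib.sha256(data).hexdigest()
        pages_by_hash[digest].add(page)
    return pages_by_hash


def classify_repeats(page_images, total_pages, template_ratio=0.5):
    # template_ratio is an assumed cutoff: an image present on at least this
    # fraction of pages is treated as a template graphic, not a duplicate photo.
    duplicates, templates = {}, {}
    for digest, pages in group_by_hash(page_images).items():
        if len(pages) < 2:
            continue  # appears on a single page, so not a repeat
        if len(pages) / total_pages >= template_ratio:
            templates[digest] = sorted(pages)
        else:
            duplicates[digest] = sorted(pages)
    return duplicates, templates
```

An image that lands in `templates` is a signal to rerun with stricter thresholds rather than to report a duplicate.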
When confidence matters:
This workflow is tuned for engineering, maintenance, and construction log PDFs (工程/养护/施工类日志). If the user wants deeper visual mismatch inspection beyond date errors, treat that as a stricter second-pass review, not as default batch output.
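A weekday/date error is one of the few mismatches that can be proven mechanically. The following is a minimal sketch, assuming the log text writes dates in the form `2024年3月15日 星期五`; the pattern and the `find_weekday_errors` name are illustrative and are not taken from the bundled script.

```python
import re
from datetime import date

# Index matches date.weekday(): 0 is Monday (一), 6 is Sunday (日).
WEEKDAYS = "一二三四五六日"

# Assumed date format: YYYY年M月D日 followed by 星期X (天 accepted for Sunday).
PATTERN = re.compile(r"(\d{4})年(\d{1,2})月(\d{1,2})日\s*星期([一二三四五六日天])")


def find_weekday_errors(text):
    """Return (matched_text, expected) pairs for dates whose weekday is wrong."""
    errors = []
    for m in PATTERN.finditer(text):
        year, month, day, written = m.groups()
        try:
            actual = date(int(year), int(month), int(day))
        except ValueError:
            # The calendar date itself does not exist (e.g. February 30).
            errors.append((m.group(0), "invalid date"))
            continue
        expected = WEEKDAYS[actual.weekday()]
        if written == "天":
            written = "日"  # normalize the colloquial form of Sunday
        if written != expected:
            errors.append((m.group(0), f"星期{expected}"))
    return errors
```

Because the check compares the written weekday against the calendar, every reported item is clearly supported, which fits the strict standard above.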