Skill Profile

Synoptic: PDF Document Analysis and Visualization

Name: Synoptic: PDF Document Analysis and Visualization
Author: Davinci-Meg

Analyze PDF documents and visualize their structure, citations, relationships, and summaries. Use this skill whenever the user wants to understand a PDF's structure, see how sections relate to each other, visualize citation networks, get section-by-section summaries, or see information density across pages. Trigger on phrases like "analyze this PDF", "show me the structure", "visualize this document", "summarize this paper", "how is this document organized", "show citation map", "what are the key sections", or when a user uploads/references a PDF and wants to understand its content at a structural level.

Davinci-Meg, 0 stars, February 25, 2026

Occupation
Category: Documents

Skill Content

Complete Workflow

Step 1: PDF Text Extraction

Extract text from the input PDF using pdftotext (from poppler-utils) as the primary method:

pdftotext -layout <input.pdf> extracted.txt

If pdftotext is not installed or fails, fall back to the Python extraction script:

python scripts/extract_pdf.py <input.pdf> extracted.txt

Output: extracted.txt — page-delimited plain text of the entire PDF.
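The primary-then-fallback choice above can be sketched in Python roughly as follows. This is a minimal sketch, not the skill's own code: the helper names `build_extract_command` and `extract_text` are hypothetical, and it assumes `scripts/extract_pdf.py` is reachable from the current working directory.

```python
import shutil
import subprocess
import sys


def build_extract_command(pdf_path, out_path, have_pdftotext):
    """Return the argv list for the preferred extraction method."""
    if have_pdftotext:
        return ["pdftotext", "-layout", pdf_path, out_path]
    # Fallback named in the workflow; path assumed relative to the skill directory.
    return [sys.executable, "scripts/extract_pdf.py", pdf_path, out_path]


def extract_text(pdf_path, out_path="extracted.txt"):
    """Try pdftotext first; fall back to the Python script if it is missing or fails."""
    if shutil.which("pdftotext"):
        result = subprocess.run(build_extract_command(pdf_path, out_path, True))
        if result.returncode == 0:
            return "pdftotext"
    # Raises CalledProcessError if the fallback script also fails.
    subprocess.run(build_extract_command(pdf_path, out_path, False), check=True)
    return "extract_pdf.py"
```

Keeping the command construction separate from the call makes the choice easy to inspect without actually running either tool.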

Step 2: Local Statistics Calculation (No API Needed)

Run the page statistics script to compute per-page metrics locally:

python scripts/page_stats.py extracted.txt page_stats.json
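The contents of `page_stats.py` are not shown here, so the following is only an illustration of what per-page metrics could look like. It assumes pages in `extracted.txt` are separated by form feed characters (which is how `pdftotext` marks page breaks by default); the function name `page_stats` and the field names `charCount`, `wordCount`, and `lineCount` are hypothetical.

```python
def page_stats(text):
    """Compute simple per-page metrics from form-feed-delimited text."""
    pages = text.split("\f")
    # A form feed after the last page leaves an empty trailing piece; drop it.
    if pages and pages[-1].strip() == "":
        pages.pop()
    stats = []
    for number, page in enumerate(pages, start=1):
        stats.append({
            "page": number,
            "charCount": len(page),            # raw characters, including whitespace
            "wordCount": len(page.split()),    # whitespace-separated tokens
            "lineCount": len(page.splitlines()),
        })
    return stats
```

Metrics like these are cheap to compute locally and give a rough picture of information density across pages without any model call.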

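Step 3: AI Analysis (Claude Performs This Directly)

Write the analysis to analysis.json using the following structure:

{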
  "title": "Document title",
  "structure": {
    "sections": [
      {
        "id": "sec-1",
        "title": "Section Title",
        "level": 1,
        "page": 1,
        "charCount": 1234,
        "summary": "Brief summary of the section content.",
        "children": [
          {
            "id": "sec-1-1",
            "title": "Subsection Title",
            "level": 2,
            "page": 2,
            "charCount": 567,
            "summary": "Brief summary of the subsection.",
            "children": []
          }
        ]
      }
    ]
  },
  "citations": {
    "references": [
      {
        "id": "ref-1",
        "label": "[1]",
        "title": "Referenced work title",
        "authors": "Author A, Author B",
        "year": 2023
      }
    ],
    "inTextCitations": [
      {
        "referenceId": "ref-1",
        "page": 3,
        "context": "Surrounding sentence where the citation appears."
      }
    ]
  },
  "relationships": {
    "edges": [
      {
        "from": "sec-1",
        "to": "sec-2",
        "type": "prerequisite",
        "label": "Section 1 introduces concepts used in Section 2"
      }
    ]
  },
  "summary": {
    "overall": "A 2-3 sentence overall summary of the document.",
    "keywords": ["keyword1", "keyword2", "keyword3"],
    "sections": [
      {
        "id": "sec-1",
        "summary": "Summary of this section."
      }
    ]
  }
}
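Because sections nest through `children` and several parts of the structure point at each other by `id`, a small consistency check can catch dangling references before the report is built. The sketch below is one possible way to do that; the helpers `flatten_sections` and `check_analysis` are hypothetical and not part of the skill's scripts.

```python
def flatten_sections(sections):
    """Yield every section and subsection in document order (depth-first)."""
    for section in sections:
        yield section
        yield from flatten_sections(section.get("children", []))


def check_analysis(analysis):
    """Return a list of problems; an empty list means all ids resolve."""
    problems = []
    section_ids = {s["id"] for s in flatten_sections(analysis["structure"]["sections"])}
    reference_ids = {r["id"] for r in analysis["citations"]["references"]}

    # Relationship edges must connect known sections.
    for edge in analysis["relationships"]["edges"]:
        for end in ("from", "to"):
            if edge[end] not in section_ids:
                problems.append(f"edge {end} points at unknown section {edge[end]}")
    # In-text citations must point at a listed reference.
    for cite in analysis["citations"]["inTextCitations"]:
        if cite["referenceId"] not in reference_ids:
            problems.append(
                f"citation on page {cite['page']} points at unknown reference {cite['referenceId']}"
            )
    # Per-section summaries must match a section in the structure tree.
    for item in analysis["summary"]["sections"]:
        if item["id"] not in section_ids:
            problems.append(f"summary entry for unknown section {item['id']}")
    return problems
```

Applied to the template above as written, the check would flag the edge to `sec-2`, since the template lists only `sec-1` and `sec-1-1`; a real analysis would include every section it links.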

Step 4: HTML Report Generation

Generate the HTML report from the two JSON files:

python scripts/generate_report.py page_stats.json analysis.json synoptic_report.html

Step 5: Open Report

Open the generated report with the command for your platform:

start synoptic_report.html      # Windows
open synoptic_report.html       # macOS
xdg-open synoptic_report.html   # Linux
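If the workflow is driven from Python rather than a shell, the platform choice can be made in code. The sketch below mirrors the three commands above; `open_command` and `open_report` are hypothetical helper names, and the Windows branch assumes `cmd.exe` is available, since `start` is a shell builtin rather than a standalone program.

```python
import subprocess
import sys


def open_command(path, platform=sys.platform):
    """Return the argv list that opens `path` with the default handler on `platform`."""
    if platform.startswith("win"):
        # `start` is a cmd.exe builtin; the empty string fills its window-title argument.
        return ["cmd", "/c", "start", "", path]
    if platform == "darwin":
        return ["open", path]
    # Assumed default for Linux and other desktops that provide xdg-open.
    return ["xdg-open", path]


def open_report(path="synoptic_report.html"):
    """Open the generated report in the default browser."""
    subprocess.run(open_command(path), check=True)
```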
