Skill Profile

Pdf Extractor

Name: Pdf Extractor
Author: alibaba

Extract text, tables, and form data from PDF documents for analysis and processing. Use when user asks to extract, parse, or analyze PDF files.

alibaba, 9,296 stars, January 26, 2026

Occupation
Category: Documents

Skill Content

PDF Extractor Skill

You are a PDF extraction specialist. When the user asks to extract data from a PDF document, follow these instructions.

Instructions

Validate Input
- Confirm that a PDF file path is provided.
- If no directory is given, assume the PDF file is in the current working directory.
- Use the shell or read_file tool to check that the file exists.
- Verify that the file is a valid PDF.
Extract Content
- Execute the extraction script using the shell tool:
```
python scripts/extract_pdf.py <pdf_file_path>
```
- The script outputs the extracted data as JSON.
Process Results
- Parse the JSON output from the script.
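
The three steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the skill's actual implementation: the helper names (`resolve_pdf_path`, `looks_like_pdf`, `run_extractor`) are hypothetical, the `%PDF-` header check is only one simple heuristic for "valid PDF format", and the script is assumed to print its JSON to standard output.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Optional


def resolve_pdf_path(name: str, cwd: Optional[Path] = None) -> Path:
    """Resolve a PDF path; relative names are taken from the working directory."""
    path = Path(name)
    if not path.is_absolute():
        path = (cwd or Path.cwd()) / path
    return path


def looks_like_pdf(path: Path) -> bool:
    """Heuristic check: the file exists and starts with the PDF header bytes."""
    if not path.is_file():
        return False
    with path.open("rb") as handle:
        return handle.read(5) == b"%PDF-"


def run_extractor(pdf_path: Path, script: Path = Path("scripts/extract_pdf.py")) -> dict:
    """Validate the input, run the extraction script, and parse its JSON output.

    Assumption: the script writes a single JSON document to stdout.
    """
    if not looks_like_pdf(pdf_path):
        raise ValueError(f"not a readable PDF file: {pdf_path}")
    completed = subprocess.run(
        [sys.executable, str(script), str(pdf_path)],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script exits non-zero
    )
    return json.loads(completed.stdout)
```

A caller would typically write `run_extractor(resolve_pdf_path("report.pdf"))` and handle `ValueError`, `subprocess.CalledProcessError`, and `json.JSONDecodeError` separately, since each points to a different failing step.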

Output Format

```
{
  "success": true,
  "filename": "report.pdf",
  "text": "Full text content...",
  "page_count": 10,
  "tables": [
    {
      "page": 1,
      "data": [["Header1", "Header2"], ["Value1", "Value2"]]
    }
  ],
  "metadata": {
    "title": "Document Title",
    "author": "Author Name",
    "created": "2024-01-01"
  }
}
```
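
Once the JSON is parsed, the table data can be reshaped for presentation. The sketch below assumes, based only on the sample output, that the first row of each table's `data` is a header row and that `"success": false` signals a failed extraction. The function names are hypothetical.

```python
from typing import Dict, List, Tuple


def table_to_records(table: dict) -> List[Dict[str, str]]:
    """Turn one table into a list of row dicts keyed by the header row.

    Assumption: the first row of table["data"] holds the column headers.
    """
    data = table.get("data", [])
    if not data:
        return []
    header, *rows = data
    return [dict(zip(header, row)) for row in rows]


def extract_tables(result: dict) -> List[Tuple[int, List[Dict[str, str]]]]:
    """Return (page, records) pairs for every table in a parsed result."""
    # Assumption: a false or missing "success" field means the extraction failed.
    if not result.get("success"):
        raise RuntimeError("extraction script reported failure")
    return [(table["page"], table_to_records(table)) for table in result.get("tables", [])]
```

For the sample output, `extract_tables` yields a single entry for page 1 whose one record maps `Header1` to `Value1` and `Header2` to `Value2`.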

Examples

User: "Extract text from report.pdf"
Action: Execute script, return full text content

User: "Get the tables from financial-report.pdf"
Action: Execute script, extract and format table data

User: "What's the metadata of document.pdf?"
Action: Execute script, return document properties

