Invoice Processing

Workflow

Invoice Processing:
- [ ] Step 1: Log start time
- [ ] Step 2: Extract PDF text
- [ ] Step 3: Parse invoice fields
- [ ] Step 4: Validate (run validate_invoice.py)
- [ ] Step 5: Fix errors and re-validate if needed
- [ ] Step 6: Save final output AND eval log

Step 1: Log start time

Record the start time for eval tracking:

from datetime import datetime
start_time = datetime.now().isoformat()

Step 2: Extract text

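from pypdf import PdfReader

# Read the invoice and join the text of every page into one string.
reader = PdfReader("invoice.pdf")
text = ""
for page in reader.pages:
    page_text = page.extract_text()
    if page_text:  # skip pages that yield no text
        text += page_text + "\n"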

Step 3: Parse fields

Parse the extracted text into the following fields:

{
  "vendor": "...",
  "invoice_number": "...",
  "date": "YYYY-MM-DD",
  "total": 0.00
}
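
One way to fill the field template above from the extracted text is a small set of regular expressions. This is a minimal sketch, not the skill's own parser: the function name `parse_invoice_fields` and every pattern in `PATTERNS` are hypothetical and assume labels such as "Vendor:", "Invoice Number:", "Date:" and "Total:" appear in the text. Real invoices vary widely, so the patterns will likely need adjusting per vendor.

```python
import re

# Hypothetical label patterns; adjust them to the invoices you actually receive.
PATTERNS = {
    "vendor": re.compile(r"^(?:Vendor|From):\s*(.+)$", re.IGNORECASE | re.MULTILINE),
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    # Anchored at line start so that "Subtotal:" is not picked up as the total.
    "total": re.compile(
        r"^Total(?:\s+Due)?:\s*\$?([\d,]+\.\d{2})", re.IGNORECASE | re.MULTILINE
    ),
}


def parse_invoice_fields(text):
    """Return the four template fields; a field is None when its pattern finds nothing."""
    fields = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1).strip() if match else None
    if fields["total"] is not None:
        # Drop thousands separators before converting to a number.
        fields["total"] = float(fields["total"].replace(",", ""))
    return fields
```

Fields left as None signal that the text did not match, which is easier to catch during validation than a guessed value.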

Step 6: Save results

Save the final output, then record the eval log with collect_eval.py:

python scripts/collect_eval.py "<task_id>" "<original_task_prompt>" "<output_file>" "<notes>"

For example:

python scripts/collect_eval.py "invoice-basic" "Extract invoice data from invoice.pdf" "output.json" "validation passed on first attempt"
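
The eval command takes an output file, so the parsed fields have to be written to disk first. A minimal sketch of that step, assuming the fields are held in a plain dict; the helper name `save_output` is hypothetical, and `output.json` is simply the file name used in the example command.

```python
import json
from pathlib import Path


def save_output(fields, output_file="output.json"):
    """Write the parsed invoice fields as indented JSON and return the file path."""
    path = Path(output_file)
    path.write_text(json.dumps(fields, indent=2) + "\n", encoding="utf-8")
    return path
```

Writing the file before calling collect_eval.py keeps the eval log pointing at output that actually exists.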

Output format

{
  "vendor": "Company Name",
  "invoice_number": "INV-2025-001",
  "date": "2025-01-15",
  "total": 1250.00,
  "currency": "USD",
  "line_items": []
}
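
The structure above lends itself to a few mechanical checks before the result is saved. The sketch below is not validate_invoice.py, whose actual rules are not shown here; the function name `check_invoice` and the specific checks (all six keys present, a real calendar date in YYYY-MM-DD form, a non-negative numeric total, a list for line_items) are assumptions drawn only from the example record.

```python
import re
from datetime import datetime

# Keys taken from the example record; the real validator may require more or fewer.
REQUIRED_KEYS = ("vendor", "invoice_number", "date", "total", "currency", "line_items")


def check_invoice(record):
    """Return a list of problems; an empty list means the record passed these checks."""
    errors = [f"missing field: {key}" for key in REQUIRED_KEYS if key not in record]

    if "date" in record:
        value = str(record["date"])
        try:
            # The regex enforces zero-padded YYYY-MM-DD; strptime rejects impossible dates.
            if not re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
                raise ValueError(value)
            datetime.strptime(value, "%Y-%m-%d")
        except ValueError:
            errors.append("date is not YYYY-MM-DD")

    if "total" in record:
        total = record["total"]
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(total, bool) or not isinstance(total, (int, float)) or total < 0:
            errors.append("total is not a non-negative number")

    if "line_items" in record and not isinstance(record["line_items"], list):
        errors.append("line_items is not a list")

    return errors
```

Returning every problem at once, rather than stopping at the first, suits the fix-and-re-validate step: one pass shows everything that needs correcting.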
