Name: Invoice Processing
Author: qte77

Source: $ARGUMENTS, used as $SOURCE_DIR (defaults: ./inbox/ → ./output/invoices.xlsx)

Deterministic pipeline: scan documents → extract fields → validate → write spreadsheet.

Workflow

Discover documents: glob $SOURCE_DIR for *.pdf, *.png, *.jpg, and *.jpeg files
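A minimal Python sketch of the discovery step. It assumes a flat (non-recursive) source directory and matches extensions case-insensitively so that files like SCAN.PDF are not skipped; `discover_documents` and `INVOICE_EXTENSIONS` are hypothetical names, not part of the command.

```python
from pathlib import Path

# Extensions from the discovery step, compared in lowercase (an assumption:
# a plain glob would be case-sensitive on most Linux filesystems).
INVOICE_EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg"}


def discover_documents(source_dir: str = "./inbox/") -> list[Path]:
    """Return invoice files directly inside source_dir, sorted by path.

    Sorting fixes the processing order, so output rows come out in the
    same order on every run.
    """
    root = Path(source_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"source directory not found: {root}")
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in INVOICE_EXTENSIONS
    )
```

Calling `discover_documents()` with no argument scans the default ./inbox/; subdirectories are ignored, even ones whose names end in `.pdf`.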
Extract fields: read each document (using OCR where needed) and extract:
- vendor (company name)
- date (normalize to YYYY-MM-DD)
- amount (numeric value; keep the currency symbol as printed)
- invoice_number (as printed; null if absent)
- category (auto-classify as one of: travel, office, software, meals, other)
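The field normalization above can be sketched with the Python standard library. Everything here is an assumption for illustration: `normalize_date`, `parse_amount`, `classify_category`, the accepted date formats, and the keyword table are hypothetical. Day-first formats are tried before month-first ones, so 03/04/2024 is read as 3 April; amounts assume `,` thousands separators and `.` decimals. The keyword lookup is a deterministic stand-in for the auto-classification, not necessarily how the command classifies.

```python
import re
from datetime import datetime
from decimal import Decimal
from typing import Optional

# Hypothetical printed date formats, tried in order. Day-first before
# month-first, so "03/04/2024" normalizes to 2024-04-03.
DATE_FORMATS = (
    "%Y-%m-%d",
    "%d.%m.%Y",
    "%d/%m/%Y",
    "%m/%d/%Y",
    "%d %B %Y",
    "%B %d, %Y",
    "%d %b %Y",
    "%b %d, %Y",
)

# Hypothetical keyword table; the first matching category wins, else "other".
CATEGORY_KEYWORDS = {
    "travel": ("airline", "flight", "hotel", "taxi", "rail"),
    "software": ("software", "subscription", "saas", "license", "cloud"),
    "meals": ("restaurant", "cafe", "catering", "lunch", "dinner"),
    "office": ("office", "stationery", "printer", "paper", "furniture"),
}

# Digits with optional ',' grouping and an optional '.' fraction.
AMOUNT_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def parse_amount(raw: str) -> tuple[Optional[Decimal], str]:
    """Split '$1,234.50' into (Decimal('1234.50'), '$').

    A leading '-' or accounting parentheses make the value negative.
    Decimal avoids binary float rounding on money.
    """
    text = raw.strip()
    match = AMOUNT_RE.search(text)
    if match is None:
        return None, ""
    value = Decimal(match.group().replace(",", ""))
    prefix, suffix = text[: match.start()], text[match.end():]
    if "-" in prefix or ("(" in prefix and ")" in suffix):
        value = -value
    # Whatever else surrounds the number is kept as the currency marker.
    currency = (prefix + suffix).strip(" -()")
    return value, currency


def classify_category(text: str) -> str:
    """Pick a category by case-insensitive keyword match on the text."""
    lowered = text.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(word in lowered for word in keywords):
            return category
    return "other"
```

Returning `None` for an unparseable date or amount, rather than guessing, leaves the decision to the validation step and keeps the pipeline deterministic.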
Validate: before writing, flag these anomalies:
- Duplicate invoice_number across documents
- amount is zero or negative
- date more than 90 days in the past, or in the future
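The validation rules above, sketched in Python. Records are assumed to be dicts with `invoice_number` (string or None), `amount` (a `Decimal` or number) and `date` (a YYYY-MM-DD string); `find_anomalies` and `MAX_AGE_DAYS` are hypothetical names. Duplicates are matched on the exact invoice number string, and null invoice numbers are never treated as duplicates. Passing `today` explicitly keeps the 90-day check reproducible.

```python
from collections import Counter
from datetime import date, timedelta
from typing import Optional

MAX_AGE_DAYS = 90  # age threshold from the validation rules


def find_anomalies(records: list[dict], today: Optional[date] = None) -> list[tuple[int, str]]:
    """Return (record index, reason) pairs; the records are not modified."""
    if today is None:
        today = date.today()
    oldest_allowed = today - timedelta(days=MAX_AGE_DAYS)
    # Count only non-null invoice numbers; a missing number cannot collide.
    counts = Counter(r["invoice_number"] for r in records if r.get("invoice_number"))

    flags = []
    for i, rec in enumerate(records):
        number = rec.get("invoice_number")
        if number and counts[number] > 1:
            flags.append((i, f"duplicate invoice_number {number}"))

        amount = rec.get("amount")
        if amount is not None and amount <= 0:
            flags.append((i, "amount is zero or negative"))

        if rec.get("date"):
            when = date.fromisoformat(rec["date"])
            if when > today:
                flags.append((i, "date is in the future"))
            elif when < oldest_allowed:
                flags.append((i, f"date is more than {MAX_AGE_DAYS} days in the past"))
    return flags
```

Flagged records are reported rather than dropped, so the spreadsheet step can still write them alongside their flags. A date exactly 90 days before `today` is not flagged.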
