Extract structured data from invoices and receipts (PDFs and images). Output JSON, CSV, or build a running expense ledger. Use when someone shares an invoice to process, asks to track expenses, categorize spending, or prepare tax documents.
Turn invoices and receipts into structured expense data. Extract from PDFs and images, auto-categorize spending, and maintain a running CSV ledger.
Hybrid approach: A Python script handles PDF text extraction and ledger management, while you (the agent) parse the invoice content — LLMs understand varied formats far better than regex.
pip install pdfplumber
# Fallback: PyPDF2 (auto-used if pdfplumber unavailable)
Script: scripts/extract.py (relative to this skill directory)
Config: expense-config.json (same directory)
python3 scripts/extract.py pdf <file-path>
Read the output text, parse it into structured JSON (see schema below), then confirm with the user before adding to ledger.
Use the image tool with a prompt like:
"Extract all invoice/receipt data from this image. Return vendor, invoice number, date, line items, subtotal, tax, total, and currency."
Parse the result into structured JSON, then confirm with the user before adding to ledger.
Always present extracted data for user review before writing to the ledger:
📋 Invoice Extracted
Vendor: Amazon
Date: 2026-04-01
Invoice #: INV-2026-001
Description: Office supplies — keyboard and monitor
Total: €539.96 (incl. €100.97 tax)
Category: office (auto)
Add to ledger? (yes/edit/skip)
Format output for the current channel — adapt formatting to match what the platform supports. See references/formatting.md for platform-specific examples.
On confirmation, write the JSON to a temp file and run:
python3 scripts/extract.py ledger add /tmp/invoice-entry.json
Or pipe via stdin:
echo '<json>' | python3 scripts/extract.py ledger add -
If the user says "edit", modify the requested fields and re-confirm. If "skip", discard.
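The temp-file route above can be sketched in Python. This is a minimal illustration, not part of the skill: the helper names `write_entry_file` and `ledger_add_command` are hypothetical, and only the `ledger add <file>` invocation comes from this document.

```python
import json
import tempfile
from pathlib import Path


def write_entry_file(entry: dict) -> Path:
    """Write a confirmed entry to a temp JSON file and return its path (hypothetical helper)."""
    with tempfile.NamedTemporaryFile(
        "w", suffix=".json", prefix="invoice-entry-", delete=False, encoding="utf-8"
    ) as handle:
        json.dump(entry, handle, ensure_ascii=False)
        return Path(handle.name)


def ledger_add_command(entry_path: Path) -> list[str]:
    """Build the documented `ledger add` invocation as an argument list (hypothetical helper)."""
    return ["python3", "scripts/extract.py", "ledger", "add", str(entry_path)]


entry = {"vendor": "Amazon", "date": "2026-04-01", "total": 539.96, "currency": "EUR"}
path = write_entry_file(entry)
command = ledger_add_command(path)
print(" ".join(command))
# To execute it from the skill directory: subprocess.run(command, check=True)
```

Passing an argument list rather than a shell string avoids quoting problems when a vendor name or description contains quotes.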
python3 scripts/extract.py batch <folder-path>
Process each file (PDFs via the pdf command, images via the image tool), then show this summary after processing all files:
📦 Batch Results — 8 files processed
1. Amazon EU S.a.r.l. — €191.84 — office
2. Tesco — €25.26 — food
3. DigitalOcean LLC — €35.81 — software
4. Insomnia Coffee — €9.84 — food
5. ACME Solutions Ltd — €3,867.11 — uncategorized ⚠️
... (errors shown separately)
Total: €4,129.86 across 5 entries (1 error)
Add all to ledger? (yes/edit/skip)
On confirmation, add all entries at once. If the user wants to edit, modify specific entries and re-confirm.
View entries with optional filters:
python3 scripts/extract.py ledger view [filters]
--from DATE Entries from this date (YYYY-MM-DD)
--to DATE Entries up to this date
--category CAT Filter by category name
--vendor VENDOR Filter by vendor (partial match)
--format json|csv Output format (default: json)
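To make the filter semantics concrete, here is a rough sketch of how the filters might combine. It is an assumption-laden illustration rather than the script's code: the function name `filter_entries`, the `id` key, inclusive date bounds, and case-insensitive vendor matching are all assumptions.

```python
from datetime import date


def filter_entries(entries, date_from=None, date_to=None, category=None, vendor=None):
    """Return the entries that satisfy every filter that was supplied."""
    selected = []
    for entry in entries:
        entry_date = date.fromisoformat(entry["date"])
        # Date bounds are treated as inclusive (assumption)
        if date_from and entry_date < date.fromisoformat(date_from):
            continue
        if date_to and entry_date > date.fromisoformat(date_to):
            continue
        if category and entry.get("category") != category:
            continue
        # Partial vendor match; ignoring case is an assumption
        if vendor and vendor.lower() not in entry.get("vendor", "").lower():
            continue
        selected.append(entry)
    return selected


ledger = [
    {"id": 1, "vendor": "Amazon EU S.a.r.l.", "date": "2026-04-01", "total": 191.84, "category": "office"},
    {"id": 2, "vendor": "Tesco", "date": "2026-04-03", "total": 25.26, "category": "food"},
    {"id": 3, "vendor": "DigitalOcean LLC", "date": "2026-05-01", "total": 35.81, "category": "software"},
]

april = filter_entries(ledger, date_from="2026-04-01", date_to="2026-04-30")
print([e["id"] for e in april])
```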
Edit an entry:
python3 scripts/extract.py ledger edit --id N --vendor "New Name"
python3 scripts/extract.py ledger edit --id N --total 250.00 --category software
python3 scripts/extract.py ledger edit --id N --date 2026-04-02
Editable fields: --vendor, --total, --date, --description, --category, --currency, --subtotal, --tax. Multiple fields can be combined in one command. The dedup hash is recalculated automatically after an edit.
Delete an entry:
python3 scripts/extract.py ledger delete --id N
Removes the entry, renumbers the remaining IDs, and creates a backup.
Undo last add:
python3 scripts/extract.py ledger undo
Removes the most recently added entry (highest ID). One-level undo only.
Category summaries:
python3 scripts/extract.py ledger summary [--period week|month|year]
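As an illustration of what a per-period category summary involves, the sketch below groups totals by period and category. The function names, the period key formats (such as `2026-04`), and the rounding are assumptions, not the script's actual output.

```python
from collections import defaultdict
from datetime import date


def period_key(iso_date: str, period: str) -> str:
    """Map a YYYY-MM-DD date to a week, month, or year bucket label."""
    d = date.fromisoformat(iso_date)
    if period == "year":
        return f"{d.year}"
    if period == "month":
        return f"{d.year}-{d.month:02d}"
    if period == "week":
        iso = d.isocalendar()  # (ISO year, ISO week, ISO weekday)
        return f"{iso[0]}-W{iso[1]:02d}"
    raise ValueError(f"unknown period: {period}")


def summarize(entries, period="month"):
    """Sum totals per category within each period bucket."""
    totals = defaultdict(lambda: defaultdict(float))
    for entry in entries:
        bucket = period_key(entry["date"], period)
        totals[bucket][entry.get("category", "uncategorized")] += entry["total"]
    # Round once at the end to keep float noise out of the report
    return {b: {c: round(v, 2) for c, v in cats.items()} for b, cats in totals.items()}


entries = [
    {"vendor": "Amazon", "date": "2026-04-01", "total": 539.96, "category": "office"},
    {"vendor": "Tesco", "date": "2026-04-03", "total": 25.26, "category": "food"},
    {"vendor": "Insomnia Coffee", "date": "2026-05-02", "total": 9.84, "category": "food"},
]
monthly = summarize(entries, "month")
yearly = summarize(entries, "year")
print(monthly)
```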
Structure all extracted invoice data as:
{
"vendor": "Amazon",
"invoiceNumber": "INV-2026-001",
"date": "2026-04-01",
"dueDate": "2026-04-30",
"description": "Office supplies — keyboard and monitor",
"lineItems": [
{"description": "Mechanical Keyboard", "quantity": 1, "unitPrice": 89.99},
{"description": "USB-C Monitor", "quantity": 1, "unitPrice": 349.00}
],
"subtotal": 438.99,
"tax": 100.97,
"total": 539.96,
"currency": "EUR",
"category": "office"
}
Required for ledger: vendor, total, date
Optional: everything else — the script handles missing fields gracefully
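Before showing an entry for confirmation, a quick consistency check can catch parsing slips. The sketch below is one possible approach, not something the script is known to do: `validate_entry` and the half-cent tolerance are assumptions, while the required fields and field names follow the schema above.

```python
from datetime import date

REQUIRED_FIELDS = ("vendor", "total", "date")


def validate_entry(entry: dict, tolerance: float = 0.005) -> list[str]:
    """Return a list of human-readable problems; an empty list means the entry looks sane."""
    problems = []
    for field in REQUIRED_FIELDS:
        if entry.get(field) in (None, ""):
            problems.append(f"missing required field: {field}")
    if entry.get("date"):
        try:
            date.fromisoformat(entry["date"])
        except (TypeError, ValueError):
            problems.append("date is not a valid ISO date")
    items = entry.get("lineItems") or []
    if items and "subtotal" in entry:
        items_sum = sum(item["quantity"] * item["unitPrice"] for item in items)
        if abs(items_sum - entry["subtotal"]) > tolerance:
            problems.append("line items do not add up to subtotal")
    if all(key in entry for key in ("subtotal", "tax", "total")):
        if abs(entry["subtotal"] + entry["tax"] - entry["total"]) > tolerance:
            problems.append("subtotal + tax does not equal total")
    return problems


sample = {
    "vendor": "Amazon",
    "date": "2026-04-01",
    "lineItems": [
        {"description": "Mechanical Keyboard", "quantity": 1, "unitPrice": 89.99},
        {"description": "USB-C Monitor", "quantity": 1, "unitPrice": 349.00},
    ],
    "subtotal": 438.99,
    "tax": 100.97,
    "total": 539.96,
}
print(validate_entry(sample))
```

Any reported problem is worth surfacing in the confirmation prompt rather than silently correcting.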
Entries are auto-categorized by keyword matching against expense-config.json: the vendor name and description are checked against each category's keywords (case-insensitive).
python3 scripts/extract.py categories
Users can customize by editing the config. Suggest adding new keywords when a vendor doesn't match.
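The matching rule can be sketched as follows. The keyword table here is hypothetical and does not reflect the real layout of expense-config.json (see the config reference for that); the `uncategorized` fallback mirrors the batch example earlier in this document.

```python
# Hypothetical keyword table; the real one lives in expense-config.json
CATEGORY_KEYWORDS = {
    "office": ["amazon", "keyboard", "monitor"],
    "food": ["tesco", "coffee"],
    "software": ["digitalocean"],
}


def categorize(vendor: str, description: str = "", keywords=CATEGORY_KEYWORDS) -> str:
    """Return the first category with a keyword found in the vendor or description."""
    haystack = f"{vendor} {description}".lower()
    for category, words in keywords.items():
        if any(word.lower() in haystack for word in words):
            return category
    return "uncategorized"


print(categorize("Insomnia Coffee"))
print(categorize("ACME Solutions Ltd"))
```

With first-match-wins ordering, a vendor that hits keywords in two categories gets whichever category is listed first, which is one reason to keep keywords specific.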
Export ledger entries in platform-specific CSV formats for direct import into accounting software.
python3 scripts/extract.py ledger export --platform <name> [filters] [--output FILE]
Filters: --from DATE, --to DATE, --category CAT, --vendor VENDOR
| Platform | Use Case | Notes |
|---|---|---|
| xero | Bills/Expenses import | DD/MM/YYYY dates, includes AccountCode & TaxRate |
| freeagent | Out-of-pocket expenses | No header row, needs claimantName in config |
| wave | Bank transactions | Negative amounts for expenses |
| generic | Excel/Google Sheets | Full detail, clean format |
# Export all entries for Xero
python3 scripts/extract.py ledger export --platform xero
# Export April expenses to a file
python3 scripts/extract.py ledger export --platform xero --from 2026-04-01 --to 2026-04-30 --output /tmp/xero-export.csv
# Filter by category for FreeAgent
python3 scripts/extract.py ledger export --platform freeagent --category travel --output /tmp/freeagent-travel.csv
Define custom export formats in expense-config.json under exportPresets:
{
"exportPresets": {
"my-accounting": {
"columns": ["date", "vendor", "amount", "category", "notes"],
"headerRow": true,
"dateFormat": "%m/%d/%Y",
"amountHandling": "positive",
"fieldMapping": {
"date": "date",
"vendor": "vendor",
"amount": "total",
"category": "category",
"notes": "description"
}
}
}
}
The fieldMapping maps CSV column names → ledger field names. Use: --platform my-accounting
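To show how a preset like the one above could drive an export, here is a rough sketch. It is not the script's implementation: `export_csv` is a hypothetical name, and the `"negative"` value for amountHandling is an assumption (only `"positive"` appears in this document).

```python
import csv
import io
from datetime import date

PRESET = {
    "columns": ["date", "vendor", "amount", "category", "notes"],
    "headerRow": True,
    "dateFormat": "%m/%d/%Y",
    "amountHandling": "positive",
    "fieldMapping": {
        "date": "date",
        "vendor": "vendor",
        "amount": "total",
        "category": "category",
        "notes": "description",
    },
}


def export_csv(entries, preset) -> str:
    """Render ledger entries as CSV text according to a preset."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    if preset.get("headerRow", True):
        writer.writerow(preset["columns"])
    for entry in entries:
        row = []
        for column in preset["columns"]:
            field = preset["fieldMapping"][column]  # CSV column -> ledger field
            value = entry.get(field, "")
            if field == "date" and value:
                value = date.fromisoformat(value).strftime(preset["dateFormat"])
            elif field == "total" and value != "":
                amount = abs(float(value))
                if preset.get("amountHandling") == "negative":  # assumed option value
                    amount = -amount
                value = f"{amount:.2f}"
            row.append(value)
        writer.writerow(row)
    return buffer.getvalue()


entries = [{"vendor": "Amazon", "date": "2026-04-01", "total": 539.96,
            "category": "office", "description": "Office supplies"}]
output = export_csv(entries, PRESET)
print(output, end="")
```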
If no --output is specified, CSV goes to stdout. For file attachments:
1. Export with --output /tmp/invoice-export-<platform>-<timestamp>.csv
2. Include MEDIA:<path-to-csv> in the reply

Example:
Here's your Xero import file (12 entries, April 2026).
MEDIA:/tmp/invoice-export-xero-20260406.csv
If the user names a platform that isn't built-in and isn't in their custom presets:
1. Use web_search to find "[platform name] CSV import format expenses"
2. Build a fieldMapping from our ledger fields to their columns
3. Save the preset to expense-config.json under exportPresets

The config file (expense-config.json) lives in the skill root directory. See references/configuration.md for the full config reference.
# Use a custom config
python3 scripts/extract.py --config /path/to/config.json <command>
--force to override

For edge cases (encrypted PDFs, scanned/image-only PDFs, dependency errors), see references/notes.md.
defaults.dateFormat