Extract structured data from invoices, receipts, and bills using DeepRead. Pre-built schemas for vendor, line items, totals, tax, due dates. 97%+ accuracy with human-in-the-loop flags. Free 2,000 pages/month.
Extract structured data from invoices, receipts, purchase orders, and bills. Submit a PDF or image, get back typed JSON with vendor, line items, totals, tax, and due dates — with confidence flags telling you exactly which fields need human review.
This skill instructs the agent to POST documents to
https://api.deepread.techand poll for results. No system files are modified.
Submit an invoice PDF and get structured JSON like this:
{
"vendor": {"value": "Acme Corp", "hil_flag": false, "found_on_page": 1},
"invoice_number": {"value": "INV-2026-0042", "hil_flag": false, "found_on_page": 1},
"invoice_date": {"value": "2026-03-15", "hil_flag": false, "found_on_page": 1},
"due_date": {"value": "2026-04-15", "hil_flag": false, "found_on_page": 1},
"subtotal": {"value": 1150.00, "hil_flag": false, "found_on_page": 1},
"tax": {"value": 100.00, "hil_flag": false, "found_on_page": 1},
"total": {"value": 1250.00, "hil_flag": false, "found_on_page": 1},
"currency": {"value": "USD", "hil_flag": false, "found_on_page": 1},
"payment_terms": {"value": "Net 30", "hil_flag": true, "reason": "Inferred from dates"},
"line_items": {"value": [
{"description": "Consulting services - March", "quantity": 40, "unit_price": 25.00, "amount": 1000.00},
{"description": "Software license", "quantity": 1, "unit_price": 150.00, "amount": 150.00}
], "hil_flag": false, "found_on_page": 1}
}
Fields with hil_flag: true need human review. Everything else is high-confidence and can be auto-processed.
open "https://www.deepread.tech/dashboard/?utm_source=clawhub"
Save it:
export DEEPREAD_API_KEY="sk_live_your_key_here"
Use this pre-built schema for invoices. It covers the most common fields across invoice formats:
{
"type": "object",
"properties": {
"vendor": {"type": "string", "description": "Company or vendor name on the invoice"},
"vendor_address": {"type": "string", "description": "Vendor's full mailing address"},
"invoice_number": {"type": "string", "description": "Invoice number or reference ID"},
"invoice_date": {"type": "string", "description": "Date the invoice was issued (YYYY-MM-DD)"},
"due_date": {"type": "string", "description": "Payment due date (YYYY-MM-DD)"},
"po_number": {"type": "string", "description": "Purchase order number if referenced"},
"bill_to": {"type": "string", "description": "Name and address of the entity being billed"},
"subtotal": {"type": "number", "description": "Subtotal before tax and discounts"},
"tax": {"type": "number", "description": "Total tax amount"},
"discount": {"type": "number", "description": "Total discount applied"},
"total": {"type": "number", "description": "Total amount due including tax"},
"currency": {"type": "string", "description": "Currency code (USD, EUR, GBP, etc.)"},
"payment_terms": {"type": "string", "description": "Payment terms (Net 30, Due on receipt, etc.)"},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string", "description": "Item or service description"},
"quantity": {"type": "number", "description": "Quantity"},
"unit_price": {"type": "number", "description": "Price per unit"},
"amount": {"type": "number", "description": "Line total"}
}
},
"description": "List of line items on the invoice"
}
}
}
import requests
import json
import time
API_KEY = "sk_live_YOUR_KEY"
BASE = "https://api.deepread.tech"
headers = {"X-API-Key": API_KEY}
# Invoice schema
schema = json.dumps({
"type": "object",
"properties": {
"vendor": {"type": "string", "description": "Company or vendor name"},
"invoice_number": {"type": "string", "description": "Invoice number"},
"invoice_date": {"type": "string", "description": "Date issued (YYYY-MM-DD)"},
"due_date": {"type": "string", "description": "Payment due date (YYYY-MM-DD)"},
"subtotal": {"type": "number", "description": "Subtotal before tax"},
"tax": {"type": "number", "description": "Total tax amount"},
"total": {"type": "number", "description": "Total amount due"},
"currency": {"type": "string", "description": "Currency code (USD, EUR, etc.)"},
"line_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"description": {"type": "string"},
"quantity": {"type": "number"},
"unit_price": {"type": "number"},
"amount": {"type": "number"}
}
},
"description": "Line items"
}
}
})
# Submit invoice
with open("invoice.pdf", "rb") as f:
job = requests.post(
f"{BASE}/v1/process",
headers=headers,
files={"file": f},
data={"schema": schema},
).json()
job_id = job["id"]
print(f"Processing invoice: {job_id}")
# Poll for results
delay = 5
while True:
time.sleep(delay)
result = requests.get(f"{BASE}/v1/jobs/{job_id}", headers=headers).json()
if result["status"] == "completed":
data = result["result"]["data"]
# Auto-process high-confidence fields
for field, value in data.items():
if isinstance(value, dict):
if value.get("hil_flag"):
print(f" REVIEW: {field} = {value['value']} ({value.get('reason')})")
else:
print(f" OK: {field} = {value['value']}")
break
elif result["status"] == "failed":
print(f"Failed: {result.get('error')}")
break
delay = min(delay * 1.5, 15)
# Submit invoice with schema
JOB_ID=$(curl -s -X POST https://api.deepread.tech/v1/process \
-H "X-API-Key: $DEEPREAD_API_KEY" \
-F "[email protected]" \
-F 'schema={"type":"object","properties":{"vendor":{"type":"string","description":"Company name"},"invoice_number":{"type":"string","description":"Invoice number"},"total":{"type":"number","description":"Total due"},"due_date":{"type":"string","description":"Due date"},"line_items":{"type":"array","items":{"type":"object","properties":{"description":{"type":"string"},"amount":{"type":"number"}}},"description":"Line items"}}}' \
| python3 -c "import sys,json; print(json.load(sys.stdin)['id'])")
echo "Processing: $JOB_ID"
# Poll
while true; do
sleep 5
RESULT=$(curl -s "https://api.deepread.tech/v1/jobs/$JOB_ID" -H "X-API-Key: $DEEPREAD_API_KEY")
STATUS=$(echo "$RESULT" | python3 -c "import sys,json; print(json.load(sys.stdin)['status'])")
echo " Status: $STATUS"
[ "$STATUS" = "completed" ] || [ "$STATUS" = "failed" ] && break
done
echo "$RESULT" | python3 -c "import sys,json; print(json.dumps(json.load(sys.stdin)['result']['data'], indent=2))"
For processing multiple invoices, submit them in parallel and collect results:
import requests
import json
import time
from concurrent.futures import ThreadPoolExecutor
API_KEY = "sk_live_YOUR_KEY"
BASE = "https://api.deepread.tech"
headers = {"X-API-Key": API_KEY}
schema = json.dumps({...}) # Use the invoice schema above
def process_invoice(file_path):
with open(file_path, "rb") as f:
job = requests.post(
f"{BASE}/v1/process",
headers=headers,
files={"file": f},
data={"schema": schema},
).json()
job_id = job["id"]
delay = 5
while True:
time.sleep(delay)
result = requests.get(f"{BASE}/v1/jobs/{job_id}", headers=headers).json()
if result["status"] in ("completed", "failed"):
return result
delay = min(delay * 1.5, 15)
# Process 10 invoices in parallel
invoice_files = ["invoice_01.pdf", "invoice_02.pdf", "invoice_03.pdf"]
with ThreadPoolExecutor(max_workers=5) as pool:
results = list(pool.map(process_invoice, invoice_files))
for r in results:
if r["status"] == "completed":
data = r["result"]["data"]
vendor = data.get("vendor", {}).get("value", "Unknown")
total = data.get("total", {}).get("value", 0)
print(f" {vendor}: ${total}")
Connect your own OpenAI, Google, or OpenRouter key via the dashboard. All invoice processing routes through your provider — zero DeepRead LLM costs, page quota skipped.
Set it up: https://www.deepread.tech/dashboard/byok
clawhub install uday390/deepread-ocrclawhub install uday390/deepread-form-fillclawhub install uday390/deepread-piiclawhub install uday390/deepread-agent-setupclawhub install uday390/deepread-byokGet started free: https://www.deepread.tech/dashboard/?utm_source=clawhub