Invoice Ingest

Purpose

Raw, faithful extraction. Read every page of the PDF exactly once. For each document, transcribe exactly what it says — amounts, invoice numbers, dates, vendor names — without correcting, reconciling, or choosing between conflicting values. The ingest layer is a transcription layer, not an analysis layer.

You are a camera, not an editor.

If the draw summary says invoice #305970 and the supporting doc says #303970, record BOTH values exactly as they appear in their respective documents. Do not pick one.
If the draw summary says $38.35 and the receipt says $38.33, record BOTH values. Do not "fix" either.
If a date looks wrong (6/1/2025 for a February 2026 draw), record it as-is. Flag it in warnings, but do not change it.

Corrections, comparisons, and judgments happen downstream in the matcher and analyzer.
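The sketch below shows one way to represent "record BOTH values". It assumes a hypothetical `FieldRecord` that keeps every transcribed string together with the document and page it came from. The class names, source labels, page numbers, draw date, and the 120-day threshold in `flag_suspicious_dates` are illustrative assumptions, not part of this spec. Nothing here picks a winner. `conflicting()` only reports a disagreement for the matcher, and the date check appends a warning without touching the raw value.

```python
from dataclasses import dataclass, field
from datetime import date, datetime


@dataclass(frozen=True)
class SourcedValue:
    raw: str     # text exactly as printed, e.g. "#305970" or "$38.35"
    source: str  # which document it was read from (hypothetical labels)
    page: int    # 1-based PDF page (hypothetical in the example below)


@dataclass
class FieldRecord:
    name: str
    values: list[SourcedValue] = field(default_factory=list)
    warnings: list[str] = field(default_factory=list)

    def record(self, raw: str, source: str, page: int) -> None:
        # Append every observation; never overwrite or choose between values.
        self.values.append(SourcedValue(raw, source, page))

    def conflicting(self) -> bool:
        # Reports disagreement for downstream use; does not resolve it.
        return len({v.raw for v in self.values}) > 1


def flag_suspicious_dates(rec: FieldRecord, draw_date: date, max_age_days: int = 120) -> None:
    """Warn about dates after the draw or older than max_age_days (hypothetical threshold).
    The raw strings in rec.values are never modified."""
    for v in rec.values:
        try:
            parsed = datetime.strptime(v.raw, "%m/%d/%Y").date()
        except ValueError:
            rec.warnings.append(f"{rec.name}: could not parse {v.raw!r} (page {v.page})")
            continue
        age = (draw_date - parsed).days
        if age < 0:
            rec.warnings.append(f"{rec.name}: {v.raw!r} (page {v.page}) is after the draw date")
        elif age > max_age_days:
            rec.warnings.append(f"{rec.name}: {v.raw!r} (page {v.page}) is {age} days before the draw date")


# Both invoice numbers are kept exactly as printed.
invoice_number = FieldRecord("invoice_number")
invoice_number.record("#305970", "draw_summary", 1)
invoice_number.record("#303970", "supporting_doc", 7)

# The suspicious date is flagged, not changed. The draw date is hypothetical.
invoice_date = FieldRecord("date")
invoice_date.record("6/1/2025", "supporting_doc", 7)
flag_suspicious_dates(invoice_date, draw_date=date(2026, 2, 15))
```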

Deep Extraction Requirement

Every page must be read. Every document must be individually extracted. Every amount must be verified against the source. If the PDF has 40 pages with 30 separate invoices, read all 40 and extract all 30.
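The page-coverage rule can be checked mechanically before output is written. This sketch assumes a hypothetical reading log (`pages_read`, one entry per page visit) and assumes that each extracted document lists the 1-based pages it came from. A page may legitimately hold more than one document, so coverage is checked per page rather than per document. The function name, the message wording, and the assumption that there are no blank or separator pages are all illustrative.

```python
from collections import Counter


def check_deep_extraction(page_count: int, pages_read: list[int], documents: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means every page was read exactly once
    and every page that was read produced at least one extracted document."""
    visits = Counter(pages_read)  # missing keys count as 0 without being inserted
    problems = []
    for page in range(1, page_count + 1):
        if visits[page] == 0:
            problems.append(f"page {page} was never read")
        elif visits[page] > 1:
            problems.append(f"page {page} was read {visits[page]} times")
    # Page numbers outside the PDF indicate a bookkeeping error in the log.
    for page in sorted(p for p in visits if not 1 <= p <= page_count):
        problems.append(f"page {page} is outside the {page_count}-page PDF")
    # Assumes no blank or separator pages: each page read should yield a document.
    covered = {p for doc in documents for p in doc["pages"]}
    for page in range(1, page_count + 1):
        if visits[page] and page not in covered:
            problems.append(f"page {page} was read but nothing was extracted from it")
    return problems


# Hypothetical 4-page PDF: page 3 read twice, page 4 skipped.
problems = check_deep_extraction(
    page_count=4,
    pages_read=[1, 2, 3, 3],
    documents=[{"pages": [1, 2]}, {"pages": [3]}],
)
```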

Field	Required	Notes
vendor	Yes	Company name from letterhead, exactly as printed
invoice_number	If present	Vendor's invoice number, exactly as printed
date	Yes	Invoice date from the document
total_amount	Yes	Total on the invoice
amount_due	Yes	Balance due (may differ from total_amount when prior payments have been applied)
document_type	Yes	See types below
po_number	If present	Purchase order number
line_items	Yes	Every line item with description and amount
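A presence check for the field table could look like the following. It assumes that each document is a plain dict keyed by the field names above, with values kept as the printed strings. `schema_problems`, the sample vendor, and the `document_type` value are hypothetical placeholders, since the allowed types are not listed in this section. The check deliberately does not compare `amount_due` with `total_amount`, because judging values belongs downstream.

```python
# Mirrors the Required column of the field table (hypothetical constant name).
# invoice_number and po_number are "If present" fields and are not checked here.
REQUIRED = ("vendor", "date", "total_amount", "amount_due", "document_type", "line_items")


def schema_problems(doc: dict) -> list[str]:
    """Report missing fields only; values are never compared or corrected."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED
        if doc.get(name) in (None, "")
    ]
    # Every line item needs both a description and an amount.
    for i, item in enumerate(doc.get("line_items") or []):
        for key in ("description", "amount"):
            if item.get(key) in (None, ""):
                problems.append(f"line_items[{i}] missing {key}")
    return problems


complete = {
    "vendor": "Acme Supply Co.",   # hypothetical vendor
    "invoice_number": "#305970",
    "date": "6/1/2025",            # kept as printed, even if suspicious
    "total_amount": "$38.35",
    "amount_due": "$38.35",
    "document_type": "invoice",    # placeholder; real types are defined elsewhere
    "po_number": None,             # not printed on this document
    "line_items": [{"description": "Fasteners", "amount": "$38.35"}],
}
partial = {"vendor": "Acme Supply Co.", "line_items": [{"description": "Lumber"}]}
```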

Input

Output

Output Schema

Critical: Parent line items are transcribed from the parent

Critical: Supporting documents are transcribed from the documents themselves

Why this matters

Workflow

Step 1: Read the Parent Invoice (pages 1-2 typically)

Step 2: Read Supporting Documents (remaining pages)

Step 3: Extract Each Supporting Document

Step 4: Format Normalization (cosmetic only)

Step 5: Write extracted.json

Error Handling
