Document parsing pipeline for financial statements (PDFs, images, CSVs) using OpenRouter vision models. Use this skill when working with statement uploads, parsing, confidence scoring, or supported institutions.
Core Definition: Parsing financial statements using vision models with confidence scoring.
```mermaid
flowchart TB
    A[Upload PDF/Image/CSV] --> S[Store to Object Storage]
    S --> P[Create PARSING Statement]
    P --> B{File Type}
    B -->|PDF/Image| C["OpenRouter Vision Model"]
    B -->|CSV| D[Structured Parser]
    C --> E[Extract JSON]
    D --> E
    E --> F{Confidence Score}
    F -->|≥85| G[Auto-Accept]
    F -->|60-84| H[Review Queue]
    F -->|<60| I[Manual Entry]
    G --> J[(PostgreSQL)]
    H --> J
```
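
The File Type branch in the flow above can be sketched as a small dispatcher. This is a minimal illustration, not the project's actual code: `ParserKind`, `choose_parser`, and the extension sets are hypothetical names, and the list of image extensions is an assumption (the flow only says "PDF/Image" and "CSV").

```python
from enum import Enum
from pathlib import Path


class ParserKind(Enum):
    VISION = "vision"          # PDF/Image branch: OpenRouter vision model
    STRUCTURED = "structured"  # CSV branch: structured parser


# Assumed extension lists; the flow only names "PDF/Image" and "CSV".
VISION_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg"}
STRUCTURED_SUFFIXES = {".csv"}


def choose_parser(filename: str) -> ParserKind:
    """Pick the parsing branch from the file extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    if suffix in VISION_SUFFIXES:
        return ParserKind.VISION
    if suffix in STRUCTURED_SUFFIXES:
        return ParserKind.STRUCTURED
    raise ValueError(f"Unsupported statement file type: {suffix or filename!r}")
```

Rejecting unknown extensions up front keeps unsupported uploads out of both parsing branches; a real service would likely also check the MIME type rather than trust the filename alone.
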
| Factor | Weight | Criteria |
|---|---|---|
| Balance Check | 40% | opening + Σtxn ≈ closing (±0.1) |
| Field Completeness | 30% | Required fields present |
| Format Consistency | 20% | Valid date/amount formats |
| Transaction Count | 10% | Reasonable (1-500) |
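
One way to combine the four factors into a single 0-100 score is a weighted sum. The sketch below assumes each factor has already been reduced to a value between 0 and 1 (for example, the fraction of required fields present); that normalisation, and the names `WEIGHTS` and `confidence_score`, are illustrative assumptions rather than the pipeline's actual API.

```python
# Weights from the table above, expressed as percentage points.
WEIGHTS = {
    "balance_check": 40,
    "field_completeness": 30,
    "format_consistency": 20,
    "transaction_count": 10,
}


def confidence_score(factors: dict[str, float]) -> float:
    """Return a 0-100 score from per-factor values in the range [0, 1]."""
    missing = WEIGHTS.keys() - factors.keys()
    if missing:
        raise ValueError(f"Missing factors: {sorted(missing)}")
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = factors[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"Factor {name!r} must be in [0, 1], got {value}")
        total += weight * value
    return total
```

With these weights, a statement that fails the balance check outright but is otherwise perfect scores 60, while one that passes every factor scores 100.
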

Thresholds: ≥85 auto-accept, 60-84 review queue, <60 manual entry.

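
The routing on the confidence score (≥85 auto-accept, 60-84 review queue, <60 manual entry, as in the flowchart) can be sketched as follows. `Route` and `route_statement` are hypothetical names. Treating a fractional score such as 84.5 as "review queue" is an assumption, since the flowchart only gives integer bands.

```python
from enum import Enum


class Route(Enum):
    AUTO_ACCEPT = "auto_accept"
    REVIEW_QUEUE = "review_queue"
    MANUAL_ENTRY = "manual_entry"


def route_statement(score: float) -> Route:
    """Map a 0-100 confidence score to the next pipeline step."""
    if score >= 85:
        return Route.AUTO_ACCEPT
    if score >= 60:
        # Covers 60 up to (but not including) 85, fractional scores included.
        return Route.REVIEW_QUEUE
    return Route.MANUAL_ENTRY
```
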
| Institution | Format | Tier |
|---|---|---|
| DBS/POSB | | v1 |
| CMB (China Merchants Bank) | | v1 |
| Maybank | | v1 |
| Wise | PDF/CSV | v1 |
| Brokerage (generic) | PDF/CSV | v1 |
| Insurance (generic) | | v1 |
| OCBC | | Extended |
| MariBank | | Extended |
| GXS | | Extended |

To prevent floating-point errors:
- Never use `float` for amount fields. MUST use `Decimal`.
- Amount columns: `DECIMAL(18,2)`.

Statements left in `parsing` for longer than 30 minutes are marked `rejected`.

Source files:
- `apps/backend/src/models/statement.py`
- `apps/backend/src/schemas/extraction.py`
- `apps/backend/src/services/extraction.py`
- `apps/backend/src/services/validation.py`
- `apps/backend/src/services/storage.py`
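
To illustrate why amounts are kept as `Decimal`, the sketch below parses amount strings into two-decimal values (matching a `DECIMAL(18,2)` column) and applies the ±0.1 balance check from the scoring criteria. The helper names `to_amount` and `balances_match`, the half-up rounding mode, and the stripping of thousands separators are assumptions made for the example, not the validation service's actual behaviour.

```python
from decimal import ROUND_HALF_UP, Decimal

CENT = Decimal("0.01")              # two decimal places, as in DECIMAL(18,2)
BALANCE_TOLERANCE = Decimal("0.1")  # the ±0.1 from the Balance Check criterion


def to_amount(raw: str) -> Decimal:
    """Parse an amount string into a two-decimal Decimal without using float."""
    cleaned = raw.replace(",", "").strip()  # assumed: "," is a thousands separator
    return Decimal(cleaned).quantize(CENT, rounding=ROUND_HALF_UP)


def balances_match(opening: Decimal, transactions: list[Decimal], closing: Decimal) -> bool:
    """Check opening + sum of transactions against closing, within tolerance."""
    expected = opening + sum(transactions, Decimal("0"))
    return abs(expected - closing) <= BALANCE_TOLERANCE
```

The difference shows up even in trivial sums: `Decimal("0.1") + Decimal("0.2") == Decimal("0.3")` is true, while `0.1 + 0.2 == 0.3` is false with binary floats. That kind of drift can accumulate across a statement's transactions.
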