AI provider integration for invoice scanning. ProviderInterface contract, OpenAI/Gemini/Mistral implementations, extraction prompts, schema validation, confidence scoring, PDF/image processing, base64 encoding.
All AI providers implement ProviderInterface:
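```php
interface ProviderInterface
{
    // Provider name
    public function getName(): string;

    // Whether the provider is ready to use
    public function isAvailable(): bool;

    // Sends the document and prompts to the AI; returns the raw response text
    public function analyzeDocument(
        string $content,
        string $mimeType,
        string $prompt,
        string $systemPrompt
    ): string;
}
```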
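Because the contract is this small, a canned-response implementation can stand in for a real provider when you test the rest of the pipeline without network access. This is a minimal sketch that assumes PHP 8.0+ (constructor property promotion). `FakeProvider` and its `$calls` property are hypothetical and not part of the plugin.

```php
<?php

interface ProviderInterface
{
    public function getName(): string;
    public function isAvailable(): bool;
    public function analyzeDocument(
        string $content,
        string $mimeType,
        string $prompt,
        string $systemPrompt
    ): string;
}

// Hypothetical test double: returns a canned reply instead of calling a remote API.
final class FakeProvider implements ProviderInterface
{
    /** @var list<array{mimeType: string, prompt: string}> Calls recorded for assertions. */
    public array $calls = [];

    public function __construct(private ?string $cannedResponse)
    {
    }

    public function getName(): string
    {
        return 'fake';
    }

    // "Available" here simply means a canned reply was configured.
    public function isAvailable(): bool
    {
        return $this->cannedResponse !== null;
    }

    public function analyzeDocument(
        string $content,
        string $mimeType,
        string $prompt,
        string $systemPrompt
    ): string {
        if ($this->cannedResponse === null) {
            throw new RuntimeException('FakeProvider has no canned response configured');
        }
        $this->calls[] = ['mimeType' => $mimeType, 'prompt' => $prompt];
        return $this->cannedResponse;
    }
}
```

The recorded `$calls` let a test check which MIME type and prompt the orchestration code passed through.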
| Provider | Class | Default Model | API |
|---|---|---|---|
| OpenAI | OpenAIProvider | gpt-5-mini | Chat Completions |
| Gemini | GeminiProvider | gemini-2.5-flash | GenerateContent |
| Mistral | MistralProvider | mistral-small-latest | Chat Completions |
| Custom | OpenAICompatibleProvider | User-configured | OpenAI-compatible |
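The API column matters because the two request families embed an image differently. Below is a minimal sketch of both request bodies. It assumes the shapes documented for OpenAI's Chat Completions API (an `image_url` content part holding a data URI) and for the Gemini API's `generateContent` REST endpoint (`inline_data` with raw base64 and a separate `mime_type`). The function names are hypothetical. The HTTP call is omitted, and Gemini takes the model in the URL path (`models/{model}:generateContent`) instead of in the body.

```php
<?php

// Hypothetical helper: OpenAI-style Chat Completions body with an inline image.
function buildChatCompletionsBody(string $model, string $systemPrompt, string $prompt, string $bytes, string $mimeType): array
{
    return [
        'model' => $model,
        'messages' => [
            ['role' => 'system', 'content' => $systemPrompt],
            [
                'role' => 'user',
                'content' => [
                    ['type' => 'text', 'text' => $prompt],
                    // The image travels as a base64 data URI inside an image_url part.
                    ['type' => 'image_url', 'image_url' => ['url' => 'data:' . $mimeType . ';base64,' . base64_encode($bytes)]],
                ],
            ],
        ],
    ];
}

// Hypothetical helper: Gemini generateContent body built from the same inputs.
function buildGenerateContentBody(string $systemPrompt, string $prompt, string $bytes, string $mimeType): array
{
    return [
        'system_instruction' => ['parts' => [['text' => $systemPrompt]]],
        'contents' => [[
            'role' => 'user',
            'parts' => [
                ['text' => $prompt],
                // Gemini expects raw base64 plus a separate MIME type, not a data URI.
                ['inline_data' => ['mime_type' => $mimeType, 'data' => base64_encode($bytes)]],
            ],
        ]],
    ];
}
```

Either array would then be serialized with `json_encode()` and sent as the request body.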
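To add a new provider:

1. Create `Lib/Provider/{Name}Provider.php` implementing `ProviderInterface`.
2. Register it in the `ExtractionService::getProvider()` switch.
3. Add its defaults to `AiScanSettings::getDefaults()` (`{name}_api_key`, `{name}_model`).
4. Add its configuration fields to `XMLView/AiScanConfig.xml`.
5. Add its translation strings to `Translation/*.json`.

Processing pipeline:

File upload → MIME detection → Content encoding → Prompt assembly → API call → JSON parse → Schema validation → Supplier matching → Return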
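The provider registration described above covers three code locations: the provider class, the factory in `getProvider()`, and the `{name}_api_key` / `{name}_model` defaults. Here they are sketched for an imaginary `example` provider. Everything named `Example*`, the default model string, and the use of a PHP 8 `match` instead of a `switch` are assumptions for illustration. The real `ExtractionService::getProvider()` and `AiScanSettings::getDefaults()` may differ.

```php
<?php

interface ProviderInterface
{
    public function getName(): string;
    public function isAvailable(): bool;
    public function analyzeDocument(string $content, string $mimeType, string $prompt, string $systemPrompt): string;
}

// Provider class (hypothetical Lib/Provider/ExampleProvider.php).
final class ExampleProvider implements ProviderInterface
{
    public function __construct(private string $apiKey, private string $model)
    {
    }

    public function getName(): string
    {
        return 'example';
    }

    // Treat the provider as usable once an API key is configured.
    public function isAvailable(): bool
    {
        return $this->apiKey !== '';
    }

    public function analyzeDocument(string $content, string $mimeType, string $prompt, string $systemPrompt): string
    {
        throw new RuntimeException("HTTP call to model {$this->model} omitted in this sketch");
    }
}

// Defaults follow the {name}_api_key / {name}_model naming convention.
function getDefaults(): array
{
    return ['example_api_key' => '', 'example_model' => 'example-model-1'];
}

// Factory: one match arm per provider name.
function getProvider(string $name, array $settings): ProviderInterface
{
    $settings += getDefaults(); // stored settings take precedence over defaults
    return match ($name) {
        'example' => new ExampleProvider($settings['example_api_key'], $settings['example_model']),
        default => throw new InvalidArgumentException("Unknown AI provider: {$name}"),
    };
}
```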
| MIME Type | Encoding | Method |
|---|---|---|
| image/jpeg, image/png, image/webp | Base64 data URI | `data:{mime};base64,{content}` |
| application/pdf | Text extraction first | `pdftotext` via shell, fallback to base64 |
| application/octet-stream | Detected by extension | Routed to image or PDF handling |
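The routing in the table can be sketched as follows. The sketch assumes PHP 8.0+, `shell_exec` enabled, and `pdftotext` (from Poppler or Xpdf) on the `PATH`. `resolveMimeType()`, `extractPdfText()` and `encodeForProvider()` are hypothetical names. Returning extracted PDF text tagged as `text/plain` is also an assumed convention, not the plugin's.

```php
<?php

// Image types sent inline as a base64 data URI.
const IMAGE_MIME_TYPES = ['image/jpeg', 'image/png', 'image/webp'];

// data:{mime};base64,{content}
function toDataUri(string $bytes, string $mimeType): string
{
    return 'data:' . $mimeType . ';base64,' . base64_encode($bytes);
}

// application/octet-stream is resolved from the file extension.
function resolveMimeType(string $detectedMime, string $fileName): string
{
    if ($detectedMime !== 'application/octet-stream') {
        return $detectedMime;
    }
    return match (strtolower(pathinfo($fileName, PATHINFO_EXTENSION))) {
        'jpg', 'jpeg' => 'image/jpeg',
        'png' => 'image/png',
        'webp' => 'image/webp',
        'pdf' => 'application/pdf',
        default => 'application/octet-stream',
    };
}

// Returns the PDF's text, or null so the caller can fall back to base64.
function extractPdfText(string $pdfPath): ?string
{
    // A "-" output file makes pdftotext write to stdout; escapeshellarg() guards the path.
    $output = shell_exec('pdftotext ' . escapeshellarg($pdfPath) . ' - 2>/dev/null');
    if (!is_string($output) || trim($output) === '') {
        return null; // binary missing, command failed, or the PDF has no text layer
    }
    return $output;
}

// Returns [mimeType, payload] ready to hand to a provider.
function encodeForProvider(string $path, string $detectedMime, string $fileName): array
{
    $mime = resolveMimeType($detectedMime, $fileName);
    if ($mime === 'application/pdf') {
        $text = extractPdfText($path);
        if ($text !== null) {
            return ['text/plain', $text];
        }
    } elseif (!in_array($mime, IMAGE_MIME_TYPES, true)) {
        throw new InvalidArgumentException("Unsupported file type: {$mime}");
    }
    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException("Cannot read {$path}");
    }
    return [$mime, toDataUri($bytes, $mime)];
}
```

If `extractPdfText()` returns null, the PDF falls through to the same base64 data URI path as images.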
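PDF handling: `pdftotext` is tried first because it is fast and lightweight. It only works for text-based PDFs, so other PDFs fall back to base64.

Prompts:

- Extraction prompt: `Settings('AiScan', 'extraction_prompt')`
- Default system prompt: `ExtractionService::getDefaultSystemPrompt()`
- Assembly: `ExtractionService`
- Placeholders: `{{FILE_NAME}}`, `{{MIME_TYPE}}`

The AI must return this JSON structure: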
```json
{
  "document_type": "invoice|receipt|proforma|credit_note|unknown",
  "supplier": {
    "name": "string",
    "tax_id": "string",
    "email": "string|null",
    "phone": "string|null",
    "website": "string|null",
    "address": "string|null"
  },
  "customer": { "name": "string", "tax_id": "string", "address": "string|null" },
  "invoice": {
    "number": "string (required)",
    "issue_date": "YYYY-MM-DD (required)",
    "due_date": "YYYY-MM-DD|null",
    "currency": "ISO 4217 3-letter",
    "subtotal": "number",
    "tax_amount": "number",
    "withholding_amount": "number",
    "total": "number (required)",
    "summary": "string|null",
    "payment_terms": "string|null"
  },
  "taxes": [{ "name": "string", "rate": "number", "base": "number", "amount": "number" }],
  "lines": [{
    "description": "string",
    "quantity": "number",
    "unit_price": "number",
    "discount": "number",
    "tax_rate": "number",
    "line_total": "number",
    "sku": "string|null"
  }],
  "confidence": {
    "supplier_name": "0.0-1.0",
    "supplier_tax_id": "0.0-1.0",
    "invoice_number": "0.0-1.0",
    "issue_date": "0.0-1.0",
    "total": "0.0-1.0",
    "lines": "0.0-1.0"
  },
  "warnings": ["string"]
}
```
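The round trip around this schema has two parts: filling the `{{FILE_NAME}}` and `{{MIME_TYPE}}` placeholders before the call, and decoding the reply afterward while enforcing the fields marked "(required)". The sketch below assumes PHP 7.3+ for `JSON_THROW_ON_ERROR`. `$storedPrompt` stands in for the `extraction_prompt` setting, and falling back to a built-in prompt when the setting is empty is an assumption. `FALLBACK_PROMPT` (including its wording), `assemblePrompt()` and `parseExtraction()` are all hypothetical. The system prompt from `getDefaultSystemPrompt()` is not modeled.

```php
<?php

// Hypothetical fallback prompt; the wording is invented for this sketch.
const FALLBACK_PROMPT = 'Extract the invoice data from {{FILE_NAME}} ({{MIME_TYPE}}) and reply with JSON only.';

// Fill the placeholders, using FALLBACK_PROMPT when no prompt is stored.
function assemblePrompt(?string $storedPrompt, string $fileName, string $mimeType): string
{
    $template = ($storedPrompt !== null && trim($storedPrompt) !== '') ? $storedPrompt : FALLBACK_PROMPT;
    // strtr() makes a single pass, so placeholder-like text inside the file name is not re-expanded.
    return strtr($template, ['{{FILE_NAME}}' => $fileName, '{{MIME_TYPE}}' => $mimeType]);
}

// Decode the reply and enforce the fields marked "(required)" in the schema.
function parseExtraction(string $raw): array
{
    // JSON_THROW_ON_ERROR turns malformed JSON into a JsonException instead of null.
    $data = json_decode($raw, true, 512, JSON_THROW_ON_ERROR);
    if (!is_array($data)) {
        throw new UnexpectedValueException('AI response is not a JSON object');
    }
    $missing = [];
    foreach (['number', 'issue_date', 'total'] as $field) {
        $value = $data['invoice'][$field] ?? null;
        if ($value === null || $value === '') {
            $missing[] = "invoice.{$field}";
        }
    }
    if ($missing !== []) {
        throw new UnexpectedValueException('Missing required fields: ' . implode(', ', $missing));
    }
    return $data;
}
```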
`SchemaValidator::validate()` checks and normalizes the response.

Required fields: `invoice.number`, `invoice.issue_date`, `invoice.total`.

| Input Format | Normalized To |
|---|---|
| `31/12/2024`, `12-31-2024`, `31.12.2024` | `2024-12-31` |
| `1.234,56` (European) | `1234.56` |
| `1,234.56` (US) | `1234.56` |
| `€`, `$`, `£`, `¥` | `EUR`, `USD`, `GBP`, `JPY` |
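The table's normalizations, along with the rounding-tolerant totals comparison described next, can be approximated with a few helpers. This is a sketch, not `SchemaValidator` itself, and all function names are hypothetical. Dash-separated dates are read month-first and slash- or dot-separated dates day-first, only because that is how the examples above are written. The amount heuristic treats the right-most separator as the decimal mark. The symbol map covers only the four listed symbols, even though `$` and `¥` are ambiguous in practice. The totals tolerance is compared in integer cents, with an assumed default of one cent.

```php
<?php

// Normalize a date string to YYYY-MM-DD, or return null.
function normalizeDate(string $raw): ?string
{
    // pattern => [day group, month group, year group]
    $patterns = [
        '~^(\d{4})-(\d{1,2})-(\d{1,2})$~' => [3, 2, 1],   // 2024-12-31 (already ISO)
        '~^(\d{1,2})/(\d{1,2})/(\d{4})$~' => [1, 2, 3],   // 31/12/2024 (day first)
        '~^(\d{1,2})-(\d{1,2})-(\d{4})$~' => [2, 1, 3],   // 12-31-2024 (month first)
        '~^(\d{1,2})\.(\d{1,2})\.(\d{4})$~' => [1, 2, 3], // 31.12.2024 (day first)
    ];
    foreach ($patterns as $regex => [$di, $mi, $yi]) {
        if (preg_match($regex, trim($raw), $matches) === 1) {
            $day = (int) $matches[$di];
            $month = (int) $matches[$mi];
            $year = (int) $matches[$yi];
            // checkdate() rejects impossible dates such as 31/02/2024.
            return checkdate($month, $day, $year) ? sprintf('%04d-%02d-%02d', $year, $month, $day) : null;
        }
    }
    return null;
}

// Normalize "1.234,56", "1,234.56", "€ 12,50" and similar to a float.
function normalizeAmount(string $raw): ?float
{
    // Drop currency symbols, spaces and letters; keep digits, separators and minus.
    $s = preg_replace('/[^\d.,\-]/u', '', $raw);
    if ($s === null || $s === '') {
        return null;
    }
    $lastComma = strrpos($s, ',');
    $lastDot = strrpos($s, '.');
    if ($lastComma !== false && $lastDot !== false) {
        // Both present: the right-most separator is the decimal one.
        $s = $lastComma > $lastDot
            ? str_replace(['.', ','], ['', '.'], $s) // 1.234,56 -> 1234.56
            : str_replace(',', '', $s);              // 1,234.56 -> 1234.56
    } elseif ($lastComma !== false) {
        // Only commas: a single comma followed by 1-2 digits is read as a decimal comma.
        $s = (substr_count($s, ',') === 1 && preg_match('/,\d{1,2}$/', $s) === 1)
            ? str_replace(',', '.', $s)
            : str_replace(',', '', $s);
    } elseif (substr_count($s, '.') > 1) {
        // Several dots can only be thousands grouping (1.234.567).
        $s = str_replace('.', '', $s);
    }
    return is_numeric($s) ? (float) $s : null;
}

// Map the listed symbols to ISO 4217 codes; pass valid 3-letter codes through.
function normalizeCurrency(string $raw): ?string
{
    $symbols = ['€' => 'EUR', '$' => 'USD', '£' => 'GBP', '¥' => 'JPY'];
    $raw = trim($raw);
    if (isset($symbols[$raw])) {
        return $symbols[$raw];
    }
    $upper = strtoupper($raw);
    return preg_match('/^[A-Z]{3}$/', $upper) === 1 ? $upper : null;
}

// subtotal + tax_amount - withholding_amount ≈ total, compared in whole cents.
function totalsAreConsistent(float $subtotal, float $taxAmount, float $withholdingAmount, float $total, int $toleranceCents = 1): bool
{
    $toCents = static fn (float $v): int => (int) round($v * 100);
    $expected = $toCents($subtotal) + $toCents($taxAmount) - $toCents($withholdingAmount);
    return abs($expected - $toCents($total)) <= $toleranceCents;
}
```

Comparing cents avoids binary floating-point noise (in PHP, `0.1 + 0.2 == 0.3` is false). Single-separator inputs such as `1.234` stay ambiguous under any heuristic.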
Totals check: `subtotal + tax_amount - withholding_amount ≈ total`, within a small tolerance for rounding.

Per-field confidence (0.0 to 1.0) is reported in the `confidence` object.
Display rules: see the usability-accessibility skill for visual requirements.
Security and reliability:

- Never execute extracted data (no `eval`, no SQL built from extracted data)
- Parse responses with `json_decode` and check for errors; reject non-JSON responses
- Enforce `request_timeout` on API calls
- Write to `AiScanLog` only when logging is enabled

Key files:

- `Lib/Provider/ProviderInterface.php`: provider contract
- `Lib/Provider/OpenAIProvider.php`: OpenAI implementation
- `Lib/Provider/GeminiProvider.php`: Gemini implementation
- `Lib/Provider/MistralProvider.php`: Mistral implementation
- `Lib/Provider/OpenAICompatibleProvider.php`: custom endpoint
- `Lib/ExtractionService.php`: extraction orchestration
- `Lib/SchemaValidator.php`: schema validation and normalization
- `Lib/AiScanSettings.php`: settings with provider defaults