Extract structured contact data from one or many business card images and prepare a review-friendly CSV. Use when a user uploads multiple business card photos, wants batch processing across a folder of images, needs field normalization, or asks for a CSV that humans can verify and correct before import.
Batch-process business card images and return a CSV that is optimized for manual review.
Prefer a flat, deterministic extraction workflow over prose. The primary deliverable is a CSV. A short preview table or summary is optional.
Always keep this column order:
source_file,record_id,name,company,title,mobile,phone,phone_ext,fax,email,address,website,tax_id,country,language_detected,confidence,review_status,notes
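
As a minimal sketch of how the fixed column order could be enforced in code: the `COLUMNS` list mirrors the header above, while `make_row` is a hypothetical helper name (not part of the original instructions) that fills missing fields with blanks and rejects unexpected keys.

```python
# Column order copied from the header line above.
COLUMNS = [
    "source_file", "record_id", "name", "company", "title", "mobile",
    "phone", "phone_ext", "fax", "email", "address", "website", "tax_id",
    "country", "language_detected", "confidence", "review_status", "notes",
]


def make_row(fields: dict) -> list:
    """Return values in the fixed column order, using "" for missing fields.

    Hypothetical helper: raising on unknown keys is an assumption made here
    so that typos in field names are caught early.
    """
    unknown = set(fields) - set(COLUMNS)
    if unknown:
        raise KeyError(f"unexpected columns: {sorted(unknown)}")
    return [fields.get(col, "") for col in COLUMNS]
```

Building every row through one helper like this keeps the header and the data aligned even when a card is missing most fields.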
The CSV is often opened in Microsoft Excel, which has two common pitfalls:
- When writing a .csv file to disk, output UTF-8 with BOM (a.k.a. utf-8-sig). Excel on Windows commonly relies on the BOM to auto-detect UTF-8.
- Excel may treat values that start with =, +, -, or @ as formulas. This can happen for phone numbers like +886...

To prevent this, add a leading single quote (') in the cell value:
- Apply to: mobile, phone, phone_ext, fax, tax_id.
- Add the ' at export time. Do not otherwise change the printed value; keep punctuation/spacing as printed.

CSV formatting:
- Use a comma (,) as delimiter and always include the header row.
- If a value contains a comma, a line break, or a double quote ("), escape it per CSV rules.
- Use consistent line endings (\n is fine). Ensure there is a trailing newline at EOF.

Field definitions:
- source_file: original filename if available; otherwise use a stable label such as card_001.
- record_id: deterministic row id such as BC-0001, BC-0002.
- name: person name only. Do not include honorifics unless printed as part of the name.
- company: company or organization.
- title: job title / department.
- mobile: mobile/cell number only.
- phone: main office line only.
- phone_ext: extension only, without labels like ext or 分機.
- fax: fax number only.
- email: lowercase when safe to normalize.
- address: full mailing address in original language.
- website: website/domain as printed.
- tax_id: business registration / tax id / 統編 if present.
- country: fill only if explicit or strongly implied by the address/phone format; otherwise leave blank.
- language_detected: zh-Hant, zh-Hans, en, ja, ko, mixed, or another short label that best matches the card.
- confidence: high, medium, or low for the overall row.
- review_status: default to needs_review for every row.
- notes: record uncertainty, ambiguity, duplicate values, partially visible text, or OCR-like issues.

Extraction rules:
- Split phone numbers across phone, phone_ext, mobile, and fax.
- […] phone, additional ambiguity in notes, and lower confidence.
- […] source_file or notes.
- Record uncertainty in notes rather than guessing.

The CSV is intended for human validation and correction.
- Set review_status to needs_review.
- Use notes to explain exactly what needs checking, for example:
  - name unclear
  - possible mobile/phone swap
  - email partially obscured
  - card rotated and low contrast
- Set confidence=low whenever key identity fields are uncertain.
- Return the CSV in a csv block.

Example notes:
- uncertain company suffix
- website inferred from printed domain spacing
- fax label present but number partially cut off
- bilingual card; English title chosen as printed
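
The export rules in this document (UTF-8 with BOM, the leading-quote guard against Excel formulas, comma delimiter with a header row, standard CSV escaping, \n line endings with a trailing newline, and the needs_review default) can be sketched with Python's standard csv module. This is an illustration under stated assumptions, not a required implementation: `write_cards` and `guard` are hypothetical names, the guard is applied only when a value actually starts with a formula character, record ids are assigned from row position, and the sample values in the usage note are made up.

```python
import csv

COLUMNS = [
    "source_file", "record_id", "name", "company", "title", "mobile",
    "phone", "phone_ext", "fax", "email", "address", "website", "tax_id",
    "country", "language_detected", "confidence", "review_status", "notes",
]
GUARDED = {"mobile", "phone", "phone_ext", "fax", "tax_id"}
FORMULA_PREFIXES = ("=", "+", "-", "@")


def guard(column: str, value: str) -> str:
    """Prefix ' so Excel shows the value as text instead of a formula.

    Assumption: only values that start with =, +, - or @ are prefixed.
    """
    if column in GUARDED and value.startswith(FORMULA_PREFIXES):
        return "'" + value
    return value


def write_cards(path: str, rows: list) -> None:
    """Write rows (dicts keyed by column name) as a review-ready CSV."""
    # utf-8-sig emits the BOM; newline="" lets the csv module control endings.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(COLUMNS)
        for i, row in enumerate(rows, start=1):
            # Defaults first, so values supplied in the row take precedence.
            record = {
                "record_id": f"BC-{i:04d}",
                "review_status": "needs_review",
                **row,
            }
            writer.writerow([guard(c, record.get(c, "")) for c in COLUMNS])
```

For example, `write_cards("cards.csv", [{"source_file": "card_001", "name": "Lin, Amy", "mobile": "+886 912 345 678"}])` would quote the name because it contains a comma and store the mobile number with a leading '. The csv module's default minimal quoting handles commas, double quotes, and embedded line breaks, so no manual escaping is needed.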