Extract and process data from claim forms (expense claims, insurance claims, medical claims, staff reimbursements, petty cash claims). Reads PDF/Excel/image claim forms, extracts claimant details, line items, amounts, and outputs structured Excel summaries. Use when the user mentions claim forms, expense claims, reimbursement claims, claim extraction, or needs to process/digitize claim documents.
Skill for extracting and processing claim forms in an accounting context. Handles expense claims, insurance claims, medical claims, staff reimbursements, petty cash claims, and mileage claims.

Install the dependencies:

```bash
pip install pdfplumber openpyxl python-docx tabulate
```

`/claim-form-extraction <filepath.pdf>`: Extract claim details from a PDF claim form using pdfplumber.

```python
import re

import pdfplumber


def extract_claim_pdf(filepath):
    """Extract header fields, line items, and totals from a PDF claim form."""
    # extract_field() and extract_line_items() are helpers not shown in this section.
    full_text = ""
    all_tables = []
    with pdfplumber.open(filepath) as pdf:
        for page in pdf.pages:
            text = page.extract_text()
            if text:  # extract_text() may return nothing for image-only pages
                full_text += text + "\n"
            all_tables.extend(page.extract_tables())

    # Header fields: each pattern captures the value that follows a known label.
    claim_data = {
        # "Employee" is skipped when followed by ID/No/Number so it cannot capture the employee ID line.
        "claimant_name": extract_field(full_text, r"(?:(?:Claimant|Employee)\s*Name|Claimant|Employee(?!\s*(?:ID|No|Number)\b)|Name)\s*[:\-]?\s*(.+)"),
        "employee_id": extract_field(full_text, r"(?:Employee\s*(?:ID|No|Number)|Staff\s*(?:ID|No))\s*[:\-]?\s*(\S+)"),
        "department": extract_field(full_text, r"(?:Department|Dept|Division)\s*[:\-]?\s*(.+)"),
        "claim_date": extract_field(full_text, r"(?:Date|Claim\s*Date|Submission\s*Date)\s*[:\-]?\s*([\d/\-\.]+)"),
        "claim_number": extract_field(full_text, r"(?:Claim\s*(?:No|Number|Ref)|Reference)\s*[:\-]?\s*(\S+)"),
        "claim_type": extract_field(full_text, r"(?:Type|Category|Claim\s*Type)\s*[:\-]?\s*(.+)"),
        "claim_period": extract_field(full_text, r"(?:Period|Claim\s*Period|For\s*(?:the\s*)?(?:Month|Period))\s*[:\-]?\s*(.+)"),
    }

    # Line items: the extracted tables and the raw text are both passed to the helper.
    line_items = extract_line_items(all_tables, full_text)

    # Total claimed: optional "RM" currency prefix; thousands separators are removed.
    total_pattern = r"(?:Total|Grand\s*Total|Amount\s*Claimed|Total\s*Claim)\s*[:\-]?\s*(?:RM\s*)?([\d,]+\.?\d*)"
    total_match = re.search(total_pattern, full_text, re.IGNORECASE)
    claim_data["total_claimed"] = float(total_match.group(1).replace(",", "")) if total_match else None

    # Approval details.
    claim_data["approved_by"] = extract_field(full_text, r"(?:Approved\s*[Bb]y|Authorised\s*[Bb]y|HOD)\s*[:\-]?\s*(.+)")
    claim_data["approval_date"] = extract_field(full_text, r"(?:Approval\s*Date|Date\s*Approved)\s*[:\-]?\s*([\d/\-\.]+)")

    return {"header": claim_data, "line_items": line_items}
```
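
The PDF extractor calls `extract_field` and `extract_line_items` without defining them. The following is a minimal sketch of what those helpers might look like, not the skill's actual implementation. It assumes case-insensitive matching, that line-item tables carry a header row containing an "Amount" column, and that total rows start with the word "Total". `parse_amount` is a hypothetical helper introduced only for this illustration.

```python
import re


def extract_field(text, pattern):
    """Return the first captured group for `pattern` in `text`, or None."""
    match = re.search(pattern, text, re.IGNORECASE)
    if not match:
        return None
    value = match.group(1).strip()
    return value or None


def parse_amount(cell):
    """Convert a cell such as 'RM 1,234.50' to a float; None if not numeric."""
    if cell is None:
        return None
    cleaned = re.sub(r"[^\d.\-]", "", str(cell))
    try:
        return float(cleaned)
    except ValueError:
        return None


def extract_line_items(tables, full_text=""):
    """Turn tables that have a header row into dicts keyed by lowercase header.

    `full_text` is accepted to match the call site but is unused in this sketch.
    """
    items = []
    for table in tables:
        if not table or len(table) < 2:
            continue
        headers = [str(h or "").strip().lower() for h in table[0]]
        if "amount" not in headers:
            continue  # assumption: line-item tables have an "Amount" column
        for row in table[1:]:
            # Assumption: total rows contain a cell starting with "Total".
            if any(str(c or "").strip().lower().startswith("total") for c in row):
                continue
            record = dict(zip(headers, row))
            amount = parse_amount(record.get("amount"))
            if amount is None:
                continue  # blank or non-numeric rows are skipped
            record["amount"] = amount
            items.append(record)
    return items
```

Summing `item["amount"]` over the returned items and comparing it with `total_claimed` gives a quick consistency check on the extraction.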

`/claim-form-extraction <filepath.xlsx>`: Extract claim details from Excel-based claim forms using openpyxl, with pattern matching for label-value pairs and tabular line items.
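
As a rough sketch of the label-value matching described above, the helper below scans worksheet rows for a known label and takes the next non-empty cell to its right as the value. The names `label_value_pairs`, `HEADER_LABELS`, and `extract_claim_xlsx` and the alias sets are assumptions for illustration. Only the header fields are covered; line items are left out.

```python
def label_value_pairs(rows, labels):
    """Scan rows of cell values for known labels and return {field: value}.

    `rows` is an iterable of tuples, as produced by
    worksheet.iter_rows(values_only=True). `labels` maps an output field name
    to the set of lowercase label texts that may precede its value.
    """
    found = {}
    for row in rows:
        cells = list(row)
        for idx, cell in enumerate(cells):
            if not isinstance(cell, str):
                continue
            label = cell.strip().rstrip(":").strip().lower()
            for field, aliases in labels.items():
                if field in found or label not in aliases:
                    continue
                # The value is the next non-empty cell to the right of the label.
                value = next((c for c in cells[idx + 1:] if c not in (None, "")), None)
                if value is not None:
                    found[field] = value
    return found


# Hypothetical label aliases; adjust to the wording used on the actual forms.
HEADER_LABELS = {
    "claimant_name": {"name", "claimant", "employee name"},
    "employee_id": {"employee id", "employee no", "staff id", "staff no"},
    "department": {"department", "dept", "division"},
    "claim_date": {"date", "claim date", "submission date"},
}


def extract_claim_xlsx(filepath):
    """Read the active sheet and return the header fields found on it."""
    # Imported lazily so label_value_pairs() can be used without openpyxl.
    from openpyxl import load_workbook

    # data_only=True returns cached formula results instead of formula strings.
    wb = load_workbook(filepath, data_only=True)
    rows = list(wb.active.iter_rows(values_only=True))
    return {"header": label_value_pairs(rows, HEADER_LABELS)}
```

Keeping the matching logic separate from the file reading makes it possible to test the label rules on plain tuples without a workbook.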

`/claim-form-extraction batch <folder>`: Process multiple claim forms from a folder and consolidate them into a single summary.
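
One way to structure the batch step is sketched below, assuming each file type has an extractor callable that returns a dict with `header` and `line_items`. The function name `process_claim_folder` and the `source_file` key are hypothetical. Failures are collected so that one unreadable form does not stop the whole batch.

```python
from pathlib import Path


def process_claim_folder(folder, extractors):
    """Run the matching extractor on every supported file in `folder`.

    `extractors` maps a lowercase file suffix (e.g. ".pdf") to a callable that
    takes a file path and returns a dict. Returns (results, errors).
    """
    results, errors = [], []
    for path in sorted(Path(folder).iterdir()):
        extractor = extractors.get(path.suffix.lower())
        if not path.is_file() or extractor is None:
            continue  # skip subfolders and unsupported file types
        try:
            claim = extractor(str(path))
        except Exception as exc:  # record the failure and keep going
            errors.append({"source_file": path.name, "error": str(exc)})
            continue
        claim["source_file"] = path.name
        results.append(claim)
    return results, errors
```

A call might look like `process_claim_folder("claims/", {".pdf": extract_claim_pdf})`, with further suffixes added as extractors become available.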

`/claim-form-extraction summary`: Output extracted claims as a professionally formatted Excel workbook.

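
The exact workbook layout is not specified here, so the following is only a sketch under assumed choices: one sheet, one row per claim, a bold header row, a frozen top row, and a two-decimal number format on the amount column. The column list and the names `build_summary_rows` and `write_summary_xlsx` are hypothetical.

```python
def build_summary_rows(claims):
    """Flatten extracted claims into a header row plus one row per claim."""
    # Assumed column selection; keys match the header dict from the extractor.
    columns = ["claim_number", "claimant_name", "department", "claim_date", "total_claimed"]
    rows = [[c.replace("_", " ").title() for c in columns]]
    for claim in claims:
        header = claim.get("header", {})
        rows.append([header.get(c) for c in columns])
    return rows


def write_summary_xlsx(claims, out_path):
    """Write the summary rows to an Excel file with light formatting."""
    # Imported lazily so build_summary_rows() works without openpyxl installed.
    from openpyxl import Workbook
    from openpyxl.styles import Font

    wb = Workbook()
    ws = wb.active
    ws.title = "Claims Summary"
    for row in build_summary_rows(claims):
        ws.append(row)
    for cell in ws[1]:
        cell.font = Font(bold=True)  # bold header row
    # Two-decimal format on the last column, which holds the claimed total.
    for (cell,) in ws.iter_rows(min_row=2, min_col=ws.max_column, max_col=ws.max_column):
        cell.number_format = "#,##0.00"
    ws.freeze_panes = "A2"  # keep the header visible while scrolling
    wb.save(out_path)
```

Missing header fields come through as empty cells, which makes incomplete extractions easy to spot in the output.
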
| Claim Type | Typical Fields | Notes |
|---|---|---|
| Expense Claim | Date, description, category, amount, receipt no. | Most common; meals, transport, supplies |
| Mileage Claim | Date, from/to, distance (km), rate, amount | Rate varies by company |
| Petty Cash Claim | Date, description, voucher no., amount | Usually small amounts with cash receipts |
| Medical Claim | Date, clinic/hospital, treatment, amount | May have annual limits |
| Insurance Claim | Policy no., incident date, description, amount | Often multi-page with supporting docs |
| Overtime Claim | Date, hours, rate, description | May need manager approval |
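
The table above can also drive simple validation. The sketch below encodes a few of its rows as expected column names and adds an arithmetic check for mileage claims (distance times rate should equal the amount). The names `EXPECTED_FIELDS`, `missing_fields`, and `check_mileage_amount`, and the tolerance of 0.01, are assumptions for illustration.

```python
# Expected line-item columns per claim type, taken from the table above.
EXPECTED_FIELDS = {
    "expense": ["date", "description", "category", "amount", "receipt no."],
    "mileage": ["date", "from/to", "distance (km)", "rate", "amount"],
    "petty cash": ["date", "description", "voucher no.", "amount"],
    "medical": ["date", "clinic/hospital", "treatment", "amount"],
}


def missing_fields(claim_type, headers):
    """Return the expected columns that are absent from `headers`."""
    present = {str(h).strip().lower() for h in headers}
    expected = EXPECTED_FIELDS.get(claim_type.strip().lower(), [])
    return [field for field in expected if field not in present]


def check_mileage_amount(distance_km, rate, amount, tolerance=0.01):
    """Check that distance * rate matches the claimed amount within tolerance."""
    return abs(distance_km * rate - amount) <= tolerance
```

A non-empty result from `missing_fields` suggests the form was misclassified or a column was lost during extraction.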