CSV transaction parsing, categorization, and tax deduction identification for financial planning
Parse bank and credit card CSV exports, categorize transactions, and flag tax-relevant items.
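Most bank exports follow a few common layouts. Detect the date, description, and amount columns automatically, then normalize amounts so that charges are negative (the convention used in the JSON output):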
| Bank Style | Date Column | Description | Amount | Notes |
|---|---|---|---|---|
| Chase | "Transaction Date" | "Description" | "Amount" | Negative = charge |
| Amex | "Date" | "Description" | "Amount" | Positive = charge |
| Generic | First date-like column | Longest text column | Numeric column | Varies |
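Here is a minimal standard-library sketch of the column detection described above. The header names come from the table. Several details are assumptions and are not taken from `scripts/categorize.py`: the accepted date formats (`%m/%d/%Y`, `%Y-%m-%d`), the helper names (`detect_layout`, `generic_layout`, `read_transactions`), and the "negative = charge" default for generic files.

```python
import csv
import io
from datetime import datetime

# Header names from the table above; charge_sign is the sign a charge has in the raw file.
KNOWN_LAYOUTS = {
    "chase": {"date": "Transaction Date", "desc": "Description", "amount": "Amount", "charge_sign": -1},
    "amex": {"date": "Date", "desc": "Description", "amount": "Amount", "charge_sign": 1},
}
DATE_FORMATS = ("%m/%d/%Y", "%Y-%m-%d")  # assumed; extend for other banks


def parse_date(value):
    """Return a date, or None if the value matches no known format."""
    if not value:
        return None
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


def parse_amount(value):
    """Return a float for strings like '-1,234.56' or '$5.75', else None."""
    if not value:
        return None
    try:
        return float(value.replace("$", "").replace(",", "").strip())
    except ValueError:
        return None


def detect_layout(headers):
    """Match a known bank layout by header names (Chase is checked first)."""
    for layout in KNOWN_LAYOUTS.values():
        if {layout["date"], layout["desc"], layout["amount"]} <= set(headers):
            return layout
    return None


def generic_layout(headers, rows):
    """Fallback: first date-like column, first numeric column, longest text column."""
    def column_parses(col, parser):
        values = [r[col] for r in rows if r.get(col)]
        return bool(values) and all(parser(v) is not None for v in values)

    date_col = next((h for h in headers if column_parses(h, parse_date)), None)
    amount_col = next((h for h in headers if h != date_col and column_parses(h, parse_amount)), None)
    others = [h for h in headers if h not in (date_col, amount_col)]
    # "Longest" is measured as total characters across all rows.
    desc_col = max(others, key=lambda h: sum(len(r.get(h) or "") for r in rows), default=None)
    if None in (date_col, amount_col, desc_col):
        raise ValueError("could not detect date, description, and amount columns")
    # Sign convention is unknown for generic files; assume negative = charge.
    return {"date": date_col, "desc": desc_col, "amount": amount_col, "charge_sign": -1}


def read_transactions(text):
    """Parse CSV text into dicts whose amounts use negative = charge."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    headers = reader.fieldnames or []
    layout = detect_layout(headers) or generic_layout(headers, rows)
    transactions = []
    for row in rows:
        date = parse_date(row.get(layout["date"]))
        amount = parse_amount(row.get(layout["amount"]))
        if date is None or amount is None:
            continue  # skip blank or malformed rows
        transactions.append({
            "date": date.isoformat(),
            "description": (row.get(layout["desc"]) or "").strip(),
            "amount": amount if layout["charge_sign"] < 0 else -amount,
        })
    return transactions
```

Known layouts are checked before the heuristic fallback. A file with standard headers therefore never depends on guessing from the data.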
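Assign each transaction to exactly one category by matching its description against the keywords below as case-insensitive substrings. When more than one category matches, apply a fixed priority so the result is deterministic. Short keywords such as `att`, `bp`, and `heb` also occur inside unrelated words (`MATTRESS` contains `att`), so treat those hits as low-confidence or match on word boundaries.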

| Category | Keywords |
|---|---|
| Housing | mortgage, rent, hoa, property tax, homeowner |
| Utilities | electric, power, gas company, water, internet, comcast, att, verizon, tmobile |
| Transportation | shell, exxon, chevron, bp, uber, lyft, parking, toll |
| Food | grocery, kroger, heb, walmart grocer, whole foods, restaurant, mcdonald, starbucks, doordash, grubhub |
| Medical | pharmacy, cvs, walgreens, doctor, hospital, dental, optom, therapy, labcorp |
| Business:Software | aws, azure, google cloud, digitalocean, github, cloudflare, namecheap |
| Charitable | donation, charity, united way, red cross, church, tithe |
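The following sketches one way to implement the matching, under these assumptions. The dict order of `CATEGORY_KEYWORDS` sets the priority when several categories match. A whole-word hit is reported as `"high"` confidence and a substring-only hit as `"low"`; this mapping onto the output's `confidence` field is hypothetical. `categorize` is an illustrative name.

```python
import re

# Keyword lists copied from the table above; dict order is the assumed priority.
CATEGORY_KEYWORDS = {
    "Housing": ["mortgage", "rent", "hoa", "property tax", "homeowner"],
    "Utilities": ["electric", "power", "gas company", "water", "internet",
                  "comcast", "att", "verizon", "tmobile"],
    "Transportation": ["shell", "exxon", "chevron", "bp", "uber", "lyft", "parking", "toll"],
    "Food": ["grocery", "kroger", "heb", "walmart grocer", "whole foods", "restaurant",
             "mcdonald", "starbucks", "doordash", "grubhub"],
    "Medical": ["pharmacy", "cvs", "walgreens", "doctor", "hospital", "dental",
                "optom", "therapy", "labcorp"],
    "Business:Software": ["aws", "azure", "google cloud", "digitalocean", "github",
                          "cloudflare", "namecheap"],
    "Charitable": ["donation", "charity", "united way", "red cross", "church", "tithe"],
}


def _first_match(text, whole_word):
    """Return the first category (in priority order) with a matching keyword."""
    for category, keywords in CATEGORY_KEYWORDS.items():
        for kw in keywords:
            if whole_word:
                if re.search(r"\b" + re.escape(kw) + r"\b", text):
                    return category
            elif kw in text:
                return category
    return None


def categorize(description):
    """Return (category, confidence), or (None, None) if nothing matches."""
    text = description.lower()
    category = _first_match(text, whole_word=True)
    if category:
        return category, "high"
    category = _first_match(text, whole_word=False)
    if category:
        return category, "low"  # substring-only hit, e.g. "att" inside "mattress"
    return None, None
```

Whole-word matching runs first across all categories. As a result, a clean match in a later category beats an accidental substring hit in an earlier one.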
Flag tax-relevant transactions. A transaction can carry several flags or none:
| Flag | Criteria | Tax Form |
|---|---|---|
| `deductible` | Charitable donations, state/local taxes | Schedule A |
| `business_expense` | Any `Business:*` category | Schedule C |
| `hsa` | HSA contributions or qualified medical expenses | Form 8889 |
| `education` | Tuition, student loan interest | Form 8863 (tuition); Form 1098-E (lender's interest statement) |
| `home_office` | Internet, utilities (pro-rated by business-use share; self-employed filers) | Form 8829 |
| `estimated_tax` | IRS/state tax payments | Form 1040-ES |
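Here is a sketch of flag assignment. The category-based rules follow the table. The regular expressions for property tax, tuition, student loans, IRS payments, and HSA are hypothetical keywords, since none of them appear in the category lists. `home_office` is set only when the caller states that a home office applies. The amount still has to be pro-rated separately.

```python
import re

# Hypothetical description patterns for flags that no category covers.
FLAG_PATTERNS = {
    "deductible": re.compile(r"property tax"),
    "education": re.compile(r"tuition|student loan"),
    "estimated_tax": re.compile(r"\birs\b|estimated tax"),
    "hsa": re.compile(r"\bhsa\b"),
}


def tax_flags(category, description, home_office=False):
    """Return the list of tax flags for one categorized transaction."""
    text = description.lower()
    flags = []
    if category == "Charitable":
        flags.append("deductible")          # Schedule A
    if category and category.startswith("Business:"):
        flags.append("business_expense")    # Schedule C
    if category == "Medical":
        flags.append("hsa")                 # whether it is qualified still needs review
    if home_office and category == "Utilities":
        flags.append("home_office")         # amount must still be pro-rated
    for flag, pattern in FLAG_PATTERNS.items():
        if flag not in flags and pattern.search(text):
            flags.append(flag)
    return flags
```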
Usage:

```bash
python3 scripts/categorize.py transactions.csv --output categorized.json
python3 scripts/categorize.py transactions.csv --output categorized.json --year 2025
```
Output format:

```json
{
  "summary": {
    "total_income": 85000.00,
    "total_expenses": 42000.00,
    "by_category": {"Food": -5200.00, "Housing": -18000.00},
    "tax_flags": {"deductible": -3200.00, "business_expense": -8500.00}
  },
  "transactions": [
    {
      "date": "2025-01-15",
      "description": "STARBUCKS #1234",
      "amount": -5.75,
      "category": "Food",
      "tax_flags": [],
      "confidence": "high"
    }
  ],
  "uncategorized": []
}
```
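This is a sketch of how the `summary` block could be derived from per-transaction records. It makes three assumptions: amounts already use negative = charge, `total_expenses` is reported as a positive number (as in the example), and any record without a category counts as uncategorized. `summarize` is a hypothetical helper name.

```python
from collections import defaultdict


def summarize(transactions):
    """Build the output structure from categorized, flagged transactions."""
    by_category = defaultdict(float)
    by_flag = defaultdict(float)
    income = expenses = 0.0
    for t in transactions:
        amount = t["amount"]
        if amount > 0:
            income += amount
        else:
            expenses += -amount  # reported as a positive total
        if t["category"]:
            by_category[t["category"]] += amount
        for flag in t["tax_flags"]:
            by_flag[flag] += amount  # flag totals keep the transaction's sign
    return {
        "summary": {
            "total_income": round(income, 2),
            "total_expenses": round(expenses, 2),
            "by_category": {k: round(v, 2) for k, v in by_category.items()},
            "tax_flags": {k: round(v, 2) for k, v in by_flag.items()},
        },
        "transactions": transactions,
        "uncategorized": [t for t in transactions if not t["category"]],
    }
```

Passing the result to `json.dump(result, f, indent=2)` writes a file shaped like the example. With only the keyword lists above, income deposits have no category, so they land in `uncategorized`.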
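Review checklist:

- Review the `uncategorized` list: add keywords or assign categories manually.
- Verify `business_expense` flags: make sure each one is a legitimate business cost.
- Cross-check `deductible` totals against your tax software.
- finance/paperless-ops)
- Use the `--split` flag to separate transactions by cardholder, if supported.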
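For the first checklist item, this small sketch groups uncategorized descriptions by merchant text so the most frequent ones can become new keywords. It assumes that stripping digits and `#` is enough to merge store numbers. `keyword_candidates` is a hypothetical name.

```python
import re
from collections import Counter


def keyword_candidates(uncategorized, top=10):
    """Rank normalized merchant strings among uncategorized transactions."""
    counts = Counter()
    for t in uncategorized:
        # Drop store numbers so "SHOP #12" and "SHOP #34" group together.
        name = re.sub(r"[#\d]+", " ", t["description"].lower())
        name = " ".join(name.split())
        if name:
            counts[name] += 1
    return counts.most_common(top)
```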