Extract metadata from bank-account and credit-card statement PDFs, output reviewable CSV/XLSX indexes, and build month-by-month compliance matrices for discovery productions. Use when Codex needs to process statement folders, capture issuer/account/date/Bates metadata from the first page with limited second-page fallback, pause for human cleanup of the exported CSVs, and then generate bank and credit-card compliance matrices for a specified year range.
Use this skill to turn statement productions into two deliverables:
Use scripts/run_matter.py as the main entry point. Keep the workflow in two phases so a paralegal can clean the extracted CSVs before matrix generation.
python <skill-dir>\scripts\run_matter.py --phase extract --bank-folder "<bank-folder>" --credit-folder "<credit-folder>" --output-dir "<output-folder>" --bates-regex "<matter-regex>"
This creates:
<output-folder>\bank\bank_statement_index.csv
<output-folder>\bank\bank_statement_index.xlsx
<output-folder>\credit\credit_card_statement_index.csv
<output-folder>\credit\credit_card_statement_index.xlsx

python <skill-dir>\scripts\run_matter.py --phase matrices --output-dir "<output-folder>" --year-start 2020 --year-end 2026
This creates:
<output-folder>\bank_compliance_matrices.md
<output-folder>\credit_card_compliance_matrices.md
<output-folder>\compliance_matrices.xlsx

--phase all runs extraction and then stops with a reminder to clean the CSVs before matrix generation. It does not auto-build matrices.
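As a rough sketch of the two-phase workflow above, the two commands could be assembled from Python as shown here. The helper names extract_command and matrices_command are hypothetical and not part of the skill; only flags that appear in the commands above are used, and every path is a placeholder.

```python
import sys
from pathlib import Path


def extract_command(skill_dir, bank_folder, credit_folder, output_dir, bates_regex):
    """Build the argument list for the extraction phase (hypothetical helper)."""
    script = Path(skill_dir) / "scripts" / "run_matter.py"
    return [
        sys.executable, str(script),
        "--phase", "extract",
        "--bank-folder", str(bank_folder),
        "--credit-folder", str(credit_folder),
        "--output-dir", str(output_dir),
        "--bates-regex", bates_regex,
    ]


def matrices_command(skill_dir, output_dir, year_start, year_end):
    """Build the argument list for the matrix phase (hypothetical helper)."""
    script = Path(skill_dir) / "scripts" / "run_matter.py"
    return [
        sys.executable, str(script),
        "--phase", "matrices",
        "--output-dir", str(output_dir),
        "--year-start", str(year_start),
        "--year-end", str(year_end),
    ]


if __name__ == "__main__":
    # Placeholder values; the Bates pattern here is an assumed example.
    phase_one = extract_command("skill", "bank", "credit", "out", r"ABC\d{6}")
    phase_two = matrices_command("skill", "out", 2020, 2026)
    # Print for review. To execute, pass one list to subprocess.run(cmd, check=True),
    # and run phase two only after the CSVs have been cleaned by hand.
    print(phase_one)
    print(phase_two)
```

Keeping the phases as two separate calls mirrors the intent of the workflow: nothing in the sketch chains matrix generation directly onto extraction.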
Bank extraction uses:
scripts/extract_bank_statements.py

Credit-card extraction uses:
scripts/extract_credit_card_statements.py

Both extractors:
ocr_image.ps1 when text extraction is insufficient
--bates-regex

Treat the cleaned CSVs as the source of truth for compliance matrices.
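To illustrate how a matter-specific Bates pattern might be applied with first-page capture and limited second-page fallback, here is a minimal sketch. The function find_bates and the sample pattern are hypothetical; the actual extractors may work differently, and the page text would come from PDF text extraction or OCR rather than literal strings.

```python
import re


def find_bates(page_texts, bates_regex, max_pages=2):
    """Return the first Bates match, checking page 1 and then page 2 only."""
    pattern = re.compile(bates_regex)
    for text in page_texts[:max_pages]:
        match = pattern.search(text or "")
        if match:
            return match.group(0)
    # No match within the allowed pages: leave the field blank for human review.
    return None


if __name__ == "__main__":
    # Assumed example text; the Bates number appears only on the second page.
    pages = ["Statement Period 01/01/2021 - 01/31/2021", "Page 2 of 4  ABC000124"]
    print(find_bates(pages, r"\bABC\d{6}\b"))  # prints ABC000124
```

Returning None rather than guessing keeps unresolved values visible during CSV cleanup.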
Conservative cleanup is appropriate:
Account Holder(s) Name(s)

Do not normalize across accounts unless the relationship is unambiguous.
Matrix generation uses:
scripts/generate_compliance_matrices.py

The matrix builder:
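For orientation only, a month-by-month matrix for a year range can be sketched as below. This is not the logic of generate_compliance_matrices.py, whose output format is not described here; the function month_matrix, the "X" mark, and the choice to key each statement by a single date are all assumptions made for illustration.

```python
from datetime import date


def month_matrix(statement_dates, year_start, year_end):
    """Map each year to 12 cells: 'X' where a statement date falls, '' otherwise."""
    covered = {(d.year, d.month) for d in statement_dates}
    return {
        year: ["X" if (year, month) in covered else "" for month in range(1, 13)]
        for year in range(year_start, year_end + 1)
    }


if __name__ == "__main__":
    # Assumed sample data for one account: January and March 2020 produced.
    dates = [date(2020, 1, 31), date(2020, 3, 31)]
    matrix = month_matrix(dates, 2020, 2021)
    for year, cells in matrix.items():
        # A '.' marks a month with no statement on file.
        print(year, " ".join(cell or "." for cell in cells))
```

Empty cells are the point of the exercise: they show the months for which no statement was produced.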
Expect the user to provide:
If the user provides only one statement folder type, run only that extraction piece.
If imports fail, install:
python -m pip install pypdf openpyxl pymupdf pillow
These scripts rely on Windows PowerShell for OCR fallback.
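To see which of the Python packages listed above are missing before running the install command, a small check along these lines may help. The helper missing_packages is hypothetical. The import names reflect common usage: pymupdf has traditionally been imported as fitz and pillow as PIL.

```python
import importlib.util

# pip package name -> module name used at import time.
REQUIRED = {
    "pypdf": "pypdf",
    "openpyxl": "openpyxl",
    "pymupdf": "fitz",
    "pillow": "PIL",
}


def missing_packages(required=None):
    """Return the pip package names whose import module cannot be found."""
    required = REQUIRED if required is None else required
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("python -m pip install " + " ".join(missing))
    else:
        print("All required packages are importable.")
```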
scripts/run_matter.py
Use as the main workflow entry point for paralegals.

scripts/extract_bank_statements.py
Use for bank-account productions.

scripts/extract_credit_card_statements.py
Use for credit-card productions.

scripts/generate_compliance_matrices.py
Use after CSV review to generate markdown and Excel matrices.

scripts/ocr_image.ps1
Use only as the OCR helper called by the Python extractors.