Processes Coles grocery invoices to extract structured data and predict future orders. Use when user uploads/pastes invoice content, asks to analyze grocery purchases, or wants shopping predictions.
Analyze Coles grocery store invoices using Python scripts to convert PDFs, extract structured data, and predict future orders with budget forecasts.
Activate when the user:
Before using the scripts, ensure dependencies are installed:
# Create virtual environment (optional but recommended)
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
Required packages: , ,
pymupdf4llmpandasprophetThe processing pipeline consists of 3 main scripts:
Place Coles invoice PDFs in the input_invoices/ directory.
python 01_convert.py
This converts each PDF in input_invoices/ to a Markdown file in the same folder using pymupdf4llm.
python 03_extract_data.py
Parses the Markdown invoices and extracts:
Output: output_extracted/extracted_data.json
python 04_predict_orders.py
Analyzes purchase history and:
The extraction script parses Markdown looking for:
Invoice Metadata:
**Invoice number:** #123456**Invoice date:** 7 December 2024**Invoice time:** 14:30:00Product Categories:
Categories appear as bold headers (e.g., **Dairy**, **Bakery**, **Meat & Seafood**)
Product Line Items:
Format: [Product Name](link) Ordered Picked UnitPrice TotalPrice
Example:
[Coles Full Cream Milk 3L](https://...) 2 2 $4.65 $9.30
Extracted fields:
{
"filename": "ea[REDACTED]_044712.md",
"invoice_number": "[REDACTED]",
"invoice_date": "7 December 2024",
"invoice_time": "14:30:00",
"categories": [
{
"name": "Dairy",
"items": [
{
"product": "Coles Full Cream Milk 3L",
"weight": "3L",
"link": "https://...",
"ordered": "2",
"picked": "2",
"unit_price": "$4.65",
"total_price": "$9.30"
}
]
}
]
}
Order #1 - Approx Date: 2025-12-15 - Total Est. Cost: $95.50
Product | Qty | Unit $ | Total $
--------------------------------------------------------------------------------
Coles Full Cream Milk 3L... | 2 | $4.65 | $9.30
--- Estimated Monthly Budget ---
2025-December: $785.80
2026-January: $738.55
2026-February: $692.40
ea12345_67890.md become ea[REDACTED]_67890.md