Guide users through running the message routing sub-pattern discovery analysis using data_probe. Use when the user wants to analyze support ticket messages, discover sub-patterns, run the message routing pipeline, configure a new analysis run, or interpret analysis results. Covers config setup, running the script, and interpreting outputs.
**PHI warning:** the input CSV and the per-message output CSVs (`classifications.csv`, `glp1_classifications.csv`, `glp1_discovery_classifications.csv`) contain real member messages (PHI). The reports and aggregates (`sub_pattern_report.md`, `string_matching_report.md`, `discoveries.json`, `validation.json`, `glp1_summary.json`, `glp1_discovery_batches.json`, `glp1_raw_batches.json`) contain only aggregated data (no PHI).

The pipeline discovers sub-patterns within each ticket category (e.g., "35% of Account/Billing is cancellation requests").
The analysis needs `omada_llm_utils` access. Ensure `.envrc.private` exists in the `data_probe` project root:

```sh
export ENV=staging
export STAGING_LLM_GATEWAY_ACCESS_ID=your_access_id
export STAGING_LLM_GATEWAY_HMAC_KEY=your_hmac_key
```

Then run `direnv allow` or `source .envrc.private` before executing.
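Before launching a run, it can help to confirm the credentials are actually exported. A minimal preflight sketch (the function name is illustrative, not part of the project):

```python
import os

def missing_env_vars(environ=os.environ):
    """Return the names of required gateway variables that are absent or empty."""
    required = [
        "ENV",
        "STAGING_LLM_GATEWAY_ACCESS_ID",
        "STAGING_LLM_GATEWAY_HMAC_KEY",
    ]
    return [name for name in required if not environ.get(name)]
```

If this returns a non-empty list, source `.envrc.private` before retrying.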
The input CSV must have these columns (configured in the YAML):

| Column | Role |
|---|---|
| `member_message_id` | Row identifier |
| `ticket_subject` | Category grouping key |
| `ticket_category` | `support` or `safety` |
| `member_message_text` | Primary text for the LLM |
| `coach_note_text` | Secondary text (coach's context) |
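A quick way to verify a candidate CSV has the required header before configuring a run, without reading any message content (PHI stays untouched; only the header row is inspected). This is an illustrative helper, not part of the pipeline:

```python
import csv

REQUIRED_COLUMNS = {
    "member_message_id",
    "ticket_subject",
    "ticket_category",
    "member_message_text",
    "coach_note_text",
}

def check_input_columns(path):
    """Read only the header row and return any missing required columns."""
    with open(path, newline="") as f:
        header = next(csv.reader(f))
    return REQUIRED_COLUMNS - set(header)
```

An empty return set means the file is ready to point the config at.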
The user runs these commands themselves in their terminal. Do not execute these.
```sh
cd /Users/jonathan.wrobel/workspace/data_probe
source .envrc.private
python scripts/run_message_routing.py --name "my_run_name" --top-n 4
```
```sh
cd /Users/jonathan.wrobel/workspace/data_probe
source .envrc.private

# Step 1: Classify all messages as GLP-1 yes/no/possibly
python scripts/run_message_routing.py --mode classify-glp1 --name "glp1_v1"

# Step 2: Discover sub-issue types from "yes" messages (samples 300)
python scripts/run_message_routing.py --mode discover-glp1 --from-run "glp1_v1" --name "glp1_subs"

# Test run (limit to 100 messages)
python scripts/run_message_routing.py --mode classify-glp1 --name "test" --limit 100
```
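Step 2 samples down the "yes" classifications before discovery. In spirit, the default `--sample 300` behaves like the sketch below (the function name and seed are illustrative, not from the actual script):

```python
import random

def sample_for_discovery(message_ids, n=300, seed=42):
    """Down-sample 'yes' message IDs before sub-issue discovery.

    If there are fewer than n messages, all of them are used.
    """
    if len(message_ids) <= n:
        return list(message_ids)
    rng = random.Random(seed)
    return rng.sample(message_ids, n)
```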
| Flag | Purpose | Example |
|---|---|---|
| `--mode` | `discover`, `classify-glp1`, or `discover-glp1` | `--mode classify-glp1` |
| `--name` | Label the run (output subfolder name) | `--name "glp1_v1"` |
| `--top-n N` | Analyze only the top N categories by priority | `--top-n 4` |
| `--categories "Cat1" "Cat2"` | Analyze specific categories | `--categories "Scale - Coach Escalation"` |
| `--batch-size N` | Messages per LLM call (default: 25) | `--batch-size 15` |
| `--limit N` | Limit total messages processed (test runs) | `--limit 100` |
| `--from-run NAME` | For `discover-glp1`: source `classify-glp1` run | `--from-run "glp1_v1"` |
| `--sample N` | For `discover-glp1`: sample size (default: 300) | `--sample 300` |
| `--skip-validation` | Skip heuristic keyword testing (`discover` mode only) | `--skip-validation` |
| `--config PATH` | Use an alternate config file | `--config configs/custom.yaml` |
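The `--batch-size` flag controls how many messages go into each LLM call. Conceptually it is simple fixed-size chunking, as in this illustrative sketch (the 25 default mirrors the flag's default):

```python
def make_batches(messages, batch_size=25):
    """Chunk messages into groups of at most batch_size, one group per LLM call."""
    return [messages[i:i + batch_size] for i in range(0, len(messages), batch_size)]
```

Lowering `--batch-size` trades more LLM calls for fewer tokens per call, which is the usual fix for timeouts on large categories.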
Edit `configs/message_routing.yaml` to change:

- `input.base_path` and `input.files` — point to the actual data CSV
- `llm.target_categories` — reorder or filter categories
- `llm.batch_size` — adjust for token limits
- `output.directory` — base output path

Outputs for `--mode discover`:

```
output/message_routing/<run_name>/
├── sub_pattern_report.md      # SME-facing (safe to read)
├── string_matching_report.md  # Engineering-facing (safe to read)
├── classifications.csv        # Per-message detail (PHI — DO NOT READ)
├── discoveries.json           # Raw LLM discovery data (safe to read)
└── validation.json            # Keyword precision/recall data (safe to read)
```
Outputs for `--mode classify-glp1`:

```
output/message_routing/<run_name>/
├── glp1_classifications.csv          # Per-message with message text (PHI — DO NOT READ)
├── glp1_summary.json                 # Aggregate counts and percentages (safe to read)
├── glp1_raw_batches.json             # Raw LLM batch results (safe to read)
└── checkpoints/glp1_classification/  # Per-batch checkpoints for resume
```
Outputs for `--mode discover-glp1`:

```
output/message_routing/<run_name>/
├── glp1_discovery_classifications.csv  # Per-message with sub-category (PHI — DO NOT READ)
├── glp1_discovery_batches.json         # Raw LLM discovery results (safe to read)
└── checkpoints/glp1_discovery/         # Per-batch checkpoints for resume
```
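`glp1_summary.json` holds the PHI-free rollup of the classification run. The aggregation amounts to turning raw label counts into percentages, roughly like this sketch (field names are illustrative, not the file's actual schema):

```python
def summarize_counts(counts):
    """Convert raw yes/no/possibly counts into counts plus percentages."""
    total = sum(counts.values())
    return {
        label: {"count": n, "pct": round(100.0 * n / total, 1)}
        for label, n in counts.items()
    }
```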
The `sub_pattern_report.md` contains one table per category:
| # | Sub-Pattern | Count | % of Category | Description | SME Decision |
|---|---|---|---|---|---|
| 1 | Cancellation Request | 1,927 | 35.0% | Member wants to cancel | TBD |
The SME fills in the last column: **HOTL** (auto-route + auto-respond), **HITL** (auto-route, coach responds), or **Manual**.
The `string_matching_report.md` shows whether sub-patterns can be detected at runtime via keyword matching, without calling an LLM. A sub-pattern is "string-matchable" if its combined keywords achieve >= 90% precision and >= 80% recall.
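The string-matchability test is just precision/recall of substring matching scored against the LLM labels. A minimal sketch of the idea (names are illustrative; this is not the `heuristic_validator.py` implementation):

```python
def keyword_metrics(messages, labels, keywords):
    """Precision/recall of naive case-insensitive substring matching.

    labels[i] is True when the LLM assigned message i to the sub-pattern.
    """
    tp = fp = fn = 0
    for text, is_pattern in zip(messages, labels):
        hit = any(k.lower() in text.lower() for k in keywords)
        if hit and is_pattern:
            tp += 1
        elif hit and not is_pattern:
            fp += 1
        elif not hit and is_pattern:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def is_string_matchable(precision, recall):
    """The report's threshold: >= 90% precision and >= 80% recall."""
    return precision >= 0.90 and recall >= 0.80
```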
Module layout:

```
data_probe/analyses/message_routing/
├── __init__.py
├── formatters.py            # MessageRoutingFormatter — format_items() + format_items_with_category()
├── prompts.py               # Discovery, merge, and GLP-1 classification prompts
├── domain_context.py        # GLP-1 program context + signal phrases loader
├── glp1_signal_phrases.txt  # Signal phrases for GLP-1 detection (one per line)
├── heuristic_validator.py   # Keyword precision/recall testing
└── report_generator.py      # Reports: SME, string matching, classifications, GLP-1 summary
```
Key files:

- `scripts/run_message_routing.py` (three modes: `discover`, `classify-glp1`, `discover-glp1`)
- `configs/message_routing.yaml`
- `data_probe.core.llm_client.LLMClient` for LLM calls

Troubleshooting:

| Issue | Fix |
|---|---|
| `STAGING_LLM_GATEWAY_ACCESS_ID` missing | Create/source `.envrc.private` |
| LLM timeout on large categories | Reduce `--batch-size` or split with `--categories` |
| Output folder already exists | Contents are overwritten; use a new `--name` |
| 100% precision/recall everywhere | Dataset too small — sub-patterns need overlap to test false positives |