Structured data research: search sources, extract structured data, archive raw sources, deduplicate, maintain canonical tracker pages, and backlink entities. Parameterized via YAML recipes for investor updates, donations, company updates, or any other email-to-structured-data pipeline.
One skill for any email-to-structured-data pipeline. The only differences between tracking investor updates, expenses, and company metrics are the search queries, extraction schemas, and tracker page format. All three use the same 7-phase pipeline with parameterized recipes.
Ask the user what they want to track. Either:
Recipes are YAML files at `~/.gbrain/recipes/{name}.yaml`. Use `gbrain research init` to scaffold a new one.
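A recipe might look something like this. This is a hypothetical sketch: the field names (`search`, `extraction`, `classification`, `tracker_page`) are illustrative guesses, not the actual schema shipped with GBrain.

```yaml
# Hypothetical recipe sketch -- field names are illustrative, not the real schema.
name: investor-updates
search:
  queries:
    - 'subject:"investor update"'
extraction:
  patterns:                     # deterministic regex pass, tried first
    mrr: 'MRR[:\s]+\$?([\d,.]+[KM]?)'
    arr: 'ARR[:\s]+\$?([\d,.]+[KM]?)'
classification:
  skip:                         # noise to drop before extraction
    - newsletter
    - marketing
tracker_page: trackers/investor-updates.md
```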
Check the brain first (we may already have this data). Then:
Deterministic extraction first (regex patterns from the recipe), with LLM fallback. Log every LLM fallback for future regex improvement (the fail-improve loop). Skip marketing, newsletters, and other noise based on the recipe's classification rules.
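The deterministic-first flow with fallback logging can be sketched as follows. This is a minimal sketch: `PATTERNS`, `fallback_log`, and the `llm_extract` callback are hypothetical stand-ins, not GBrain APIs; real patterns come from the recipe YAML.

```python
import re

# Hypothetical patterns; a real run loads these from the recipe YAML.
PATTERNS = {
    "mrr": re.compile(r"MRR[:\s]+\$?([\d,.]+[KM]?)"),
    "arr": re.compile(r"ARR[:\s]+\$?([\d,.]+[KM]?)"),
}

fallback_log = []  # every LLM fallback is recorded for future regex improvement


def extract(body: str, llm_extract=None) -> dict:
    """Deterministic regex pass first; LLM only for fields the regex missed."""
    result = {}
    for field, pattern in PATTERNS.items():
        m = pattern.search(body)
        if m:
            result[field] = m.group(1)
        elif llm_extract is not None:
            # Fail-improve loop: log the miss so the pattern can be
            # extended and the LLM call eliminated next time.
            fallback_log.append({"field": field, "snippet": body[:80]})
            result[field] = llm_extract(field, body)
    return result
```

When a field shows up repeatedly in the fallback log, that is the signal to promote it to a regex in the recipe.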
EXTRACTION INTEGRITY RULE: when writing values to a tracker, re-read them from the saved raw source file rather than from working memory. This prevents a known hallucination bug where 13 of 13 batch-processed amounts were wrong when recalled from LLM working memory, even though the saved files were correct.
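One way to enforce the rule, as a sketch. The function name and the JSON file format are assumptions for illustration; the point is only that the value written to the tracker comes from the archived file, never from batch working memory.

```python
import json
from pathlib import Path


def tracker_value(raw_path: str, field: str):
    """Re-read a value from the archived raw file just before writing
    it to the tracker; never trust a number recalled from memory."""
    data = json.loads(Path(raw_path).read_text())
    return data[field]
```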
Archive raw sources according to type:

- `put_raw_data` for email bodies, API responses
- `file_upload` for PDF attachments, documents
- `redirect.yaml` pointers for large files in storage

Before adding to a tracker, deduplicate against existing entries.
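The dedup check before appending can be sketched like this. The key choice (date plus company, matched against the tracker's markdown rows) is an assumption for illustration, not the actual rule.

```python
def is_duplicate(tracker_markdown: str, date: str, company: str) -> bool:
    """Return True if a row with the same date and company already
    exists in the tracker table (assumed dedup key: date + company)."""
    needle = f"| {date} | {company} |"
    return needle in tracker_markdown
```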
Three example recipes ship with GBrain (see `~/.gbrain/recipes/`).
The canonical tracker is a brain page at the recipe's `tracker_page` path, maintained as markdown tables:
### 2026
| Date | Company | MRR | ARR | Growth | Status |
|------|---------|-----|-----|--------|--------|
| 2026-04-01 | Example Co | $188K | $2.3M | +14.7% MoM | [Source](link) |
Each entry links to its raw source. Running totals at the bottom of each section.
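Appending an entry to the tracker table can be sketched as follows, assuming the column layout shown above; the function and its parameters are hypothetical helpers, not part of GBrain.

```python
def append_row(table_lines: list, date: str, company: str, mrr: str,
               arr: str, growth: str, source_url: str) -> list:
    """Append one markdown row to the tracker table; each row links
    back to its raw source, per the format above."""
    row = (f"| {date} | {company} | {mrr} | {arr} | {growth} "
           f"| [Source]({source_url}) |")
    return table_lines + [row]
```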
References `skills/conventions/quality.md` for citation and back-linking rules.