Continuous bootstrapping pipeline for the Flavors vault. Discovers unresolved mentions, creates missing notes via web-research, repairs links, and runs quality analysis — looping until convergence. Trigger phrases: bootstrap flavors vault, fill missing flavor notes, run flavor pipeline, discover and fill flavor stubs.
Runs the deterministic bootstrap orchestrator (bootstrap_round.py) over the Flavors vault. One round snapshots the vault, resolves all link violations and mention candidates, compiles note standards, reconciles duplicates, and optionally launches LLM fill agents from a validated manifest. Rounds repeat until all blocking queues reach zero.
Agent kickoff target: point orchestration agents at this skill first. The default handoff is:
1. flavor-bootstrap
2. prepare_fill_batches.py
3. build_fill_wave.py --wave <next-wave>
4. _system/current-fill-wave.json

All commands run from the Flavors vault root: /home/mango/workspace/Obsidian/Flavors
# Phase 0: analyze unresolved link targets before the next round
PYTHONPATH=. python3.11 flavor_graph/analyze_link_violations.py
# After reviewing _system/link-violation-analysis.md, append safe aliases and
# rerun deterministic passes:
PYTHONPATH=. python3.11 flavor_graph/analyze_link_violations.py --apply-aliases
# Build prioritized fill batches from the current manifest:
PYTHONPATH=. python3.11 flavor_graph/prepare_fill_batches.py
# Build one concrete operator packet for the next fill wave:
PYTHONPATH=. python3.11 flavor_graph/build_fill_wave.py --wave high_ref
# Inspect research cache status:
PYTHONPATH=. python3.11 flavor_graph/research_cache.py stats
# Read a cached research payload for one title:
PYTHONPATH=. python3.11 flavor_graph/research_cache.py get --title "Soda water"
# Run one bootstrap round (deterministic passes only):
cd /home/mango/workspace/Obsidian/Flavors
PYTHONPATH=. python3.11 flavor_graph/bootstrap_round.py
# Run until convergence (up to 5 rounds):
PYTHONPATH=. python3.11 flavor_graph/bootstrap_round.py --until-convergence
# Run with all auto-fixes disabled (audit only):
PYTHONPATH=. python3.11 flavor_graph/bootstrap_round.py --no-fix-links --no-fix-schema
# Inspect current quality state without modifying anything:
PYTHONPATH=. python3.11 run_checks.py --note-quality --tier-report
# Review the unresolved-link analysis artifact before fill waves:
python3.11 -c "import json; print(json.dumps(json.load(open('_system/link-violation-analysis.json')), indent=2)[:4000])"
# View what's in the create-manifest (notes ready for LLM fill):
python3.11 -c "import json; print(json.dumps(json.load(open('_system/bootstrap-create-manifest.json')), indent=2)[:4000])"
# View the manual review queue:
python3.11 -c "import json; print(json.dumps(json.load(open('_system/bootstrap-manual-review.json')), indent=2)[:4000])"
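The three inspection one-liners above share one pattern, so a small helper can replace them. This is a minimal sketch, not part of the pipeline: `preview_json` is a hypothetical name, and the only behaviour assumed is what the one-liners already do (load, pretty-print, truncate to 4000 characters).

```python
import json
from pathlib import Path


def preview_json(path, limit=4000):
    """Return the first `limit` characters of a pretty-printed JSON file.

    Hypothetical helper: mirrors the inspection one-liners, but closes the
    file handle and keeps non-ASCII titles readable.
    """
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    return json.dumps(data, indent=2, ensure_ascii=False)[:limit]


# Example, run from the vault root:
# print(preview_json("_system/bootstrap-create-manifest.json"))
```

Keeping the truncation inside the helper means a very large manifest still prints a bounded preview.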
Fill is a manual LLM operator step. After running the deterministic round, read the create-manifest and launch fill agents using the protocol below. The orchestrator does not launch fill agents itself.
Run this once before the next fill cycle whenever unresolved links spike or the create-manifest looks suspiciously small.
1. Run PYTHONPATH=. python3.11 flavor_graph/analyze_link_violations.py
2. Review _system/link-violation-analysis.md
3. If the Safe Alias Proposals section looks correct, run PYTHONPATH=. python3.11 flavor_graph/analyze_link_violations.py --apply-aliases
4. Run PYTHONPATH=. python3.11 run_checks.py --bootstrap-round
5. Run PYTHONPATH=. python3.11 flavor_graph/prepare_fill_batches.py
6. Run PYTHONPATH=. python3.11 flavor_graph/build_fill_wave.py --wave <generic_parents|high_ref|medium_ref|low_ref> [--offset N] [--limit N]

Interpret the analysis buckets like this:
- Reference/Ingredient aliases.md; rerun deterministic passes immediately
- [[Soda water]], [[Port wine]], etc.)

This phase should happen before large LLM fill waves. Otherwise brand products can be hidden in duplicate buckets and garnish noise can pollute the manifest.
Before launching any research subagents, build a concrete wave packet. This removes repeated manual manifest triage and prompt assembly.
1. Run PYTHONPATH=. python3.11 flavor_graph/prepare_fill_batches.py
2. Run PYTHONPATH=. python3.11 flavor_graph/build_fill_wave.py --wave high_ref
3. Review _system/current-fill-wave.md
4. Launch one research-free subagent per entry in _system/current-fill-wave.json

Recommended default packet sizes:
- generic_parents: 8
- high_ref: 8
- medium_ref: 10
- low_ref: 12

The current wave packet includes:
- prompt_stub text for the research subagent

This is the default batching workflow for all fill work.
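To make the recommended packet sizes concrete, here is a minimal sketch of how a wave could be cut from a prioritized list. The `slice_wave` function is hypothetical, and its `offset`/`limit` semantics are an assumption modelled on the `--offset N` and `--limit N` flags; the real `build_fill_wave.py` may behave differently.

```python
# Recommended default packet sizes, taken from the list above.
DEFAULT_PACKET_SIZES = {
    "generic_parents": 8,
    "high_ref": 8,
    "medium_ref": 10,
    "low_ref": 12,
}


def slice_wave(entries, wave, offset=0, limit=None):
    """Return one packet of entries for `wave`.

    Assumption: `offset` skips entries already handled in earlier packets and
    `limit` overrides the default packet size for this wave.
    """
    if wave not in DEFAULT_PACKET_SIZES:
        raise ValueError(f"unknown wave: {wave!r}")
    size = DEFAULT_PACKET_SIZES[wave] if limit is None else limit
    return entries[offset:offset + size]
```

For example, `slice_wave(batch, "high_ref", offset=8)` would return the second packet of eight entries under these assumptions.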
Before launching workers for a packet, check _system/current-fill-wave.json.
Each entry now includes:
- cache.status: fresh, stale, or missing
- orchestrator_action: reuse_cached_research or launch_worker
- skill_chain: ordered free → budget → paid fallback path using the skills available in this OpenCode instance

Cache policy:
- If cache.status == fresh, reuse the cached payload and skip web research
- If cache.status == stale, optionally reuse as a draft but prefer a fresh worker pass
- If cache.status == missing, launch the worker specified by research_plan

Persist returned research with:
PYTHONPATH=. python3.11 flavor_graph/research_cache.py put \
--title "Ingredient Name" \
--skill research-free \
--mode quick \
--worker-tier small \
--payload-json '{"wiki_link":"...","summary_facts":["..."],"flavor_descriptors":["..."],"components":["..."]}'
The orchestrator should treat the cache as the default reuse layer between rounds.
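The cache policy above amounts to a three-way decision per wave entry. The sketch below is illustrative only: `plan_entry` and the `seed_with_cached_draft` flag are hypothetical names, and the nested `entry["cache"]["status"]` layout is an assumption inferred from the `cache.status` field name.

```python
def plan_entry(entry):
    """Map a wave entry's cache status to an orchestrator action.

    Assumes entries look like {"cache": {"status": "fresh" | "stale" | "missing"}}.
    A missing "cache" key is treated as status "missing".
    """
    status = entry.get("cache", {}).get("status", "missing")
    if status == "fresh":
        # Reuse the cached payload and skip web research.
        return {"action": "reuse_cached_research", "seed_with_cached_draft": False}
    if status == "stale":
        # Prefer a fresh worker pass, optionally seeded with the stale draft.
        return {"action": "launch_worker", "seed_with_cached_draft": True}
    if status == "missing":
        return {"action": "launch_worker", "seed_with_cached_draft": False}
    raise ValueError(f"unexpected cache status: {status!r}")
```

Raising on an unknown status keeps a malformed packet from silently triggering paid research.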
Use the main bootstrap model only for orchestration:
- question tool for human tail decisions

Push the note-level nitty gritty down to smaller or cheaper workers.
Default split:
- research-free quick/standard for most generic parents, syrups, bitters, fruit notes, and common product notes
- research-cheap standard for nuanced producer-specific spirits, amaros, vermouths, sherries, and liqueurs
- research-premium only when medium still cannot surface usable sensory differentiation or trustworthy sources

Cost policy for this instance:
- research-free uses Big Pickle and is free
- research-cheap uses DeepSeek Chat and is budget-tier
- research-premium uses Zen models and is not free

So the orchestrator should always prefer:
1. research-free
2. research-cheap
3. research-premium

The wave packet now carries this recommendation per entry as research_plan so the orchestrator does not need to reason about model size every time.
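The preference order can be read as a simple escalation rule: try the cheapest skill that has not been tried yet. The following is a minimal sketch under that reading; `next_skill` is a hypothetical helper, not something the wave packet or orchestrator is known to expose.

```python
# Cheapest first, as stated in the cost policy above.
SKILL_PREFERENCE = ("research-free", "research-cheap", "research-premium")


def next_skill(already_tried):
    """Return the cheapest skill not yet tried, or None when all are exhausted."""
    for skill in SKILL_PREFERENCE:
        if skill not in already_tried:
            return skill
    return None
```

Returning `None` rather than looping back gives the orchestrator an explicit point at which to hand the item to manual review.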
Each round runs these steps in order:
Snapshot — Parse the entire vault once into a shared VaultSnapshot. Captures per-note: path, type, aliases, section map, all wikilinks with section context, prose text. Emits _system/vault-snapshot.json.
Registry — Build a typed canonical resolver from snapshot data + synonym files. Handles case, punctuation, hyphen, plural/singular normalisation, and expected-type filtering by section context.
Link integrity — Classify every link and prose mention: canonical valid / non-canonical (auto-fix) / wrong-type (blocked) / unlinked mention (auto-add) / unresolved (queued) / ambiguous (manual review). Auto-fix scope: prefix cleanup, .md suffix cleanup, case normalisation, alias rewrites, prose-link insertion. Emits _system/link-integrity-report.json.
Candidate pipeline — For each unresolved mention, classify as: existing canonical note → link fix only; alias → link rewrite only; duplicate candidate → reconciler; valid variant → keep both; true new note → create-manifest; stoplist/noise → discard; ambiguous → manual review. No empty stub files are created at any point. Emits _system/bootstrap-candidates.json, _system/bootstrap-create-manifest.json, _system/bootstrap-manual-review.json.
Duplicate reconciler — Classify duplicate groups, choose canonical deterministically (highest content, deepest folder, populated status preferred), migrate aliases, rewrite all inbound links vault-wide, delete duplicates. Runs before the note compiler so deleted duplicates don't inflate the failure count. Variants kept only when taxonomy + prose clearly separate them; ambiguous pairs go to manual review. Emits _system/duplicate-reconciler-report.json.
Note standards compiler — Full-vault structural lint: type/folder correctness, required section presence by type, forbidden structure by type (e.g. flavor notes must have no frontmatter), Wiki link: presence. Auto-fix scope: creates missing frontmatter, repairs wrong type/status/tags fields, inserts missing *Recorded on* datestamp. Does NOT perform link canonicalization — that is B3's exclusive job. Emits _system/note-compiler-report.json.
Convergence check — Evaluate all blocking queues. Print round summary. If converged, stop. If not, report remaining counts and surface the create-manifest for operator-driven fill.
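The Registry step above normalises case, punctuation, hyphens, and plural/singular forms before resolving a name. The sketch below shows one naive way such a key could be built. It is an assumption for illustration, not the logic in `node_registry.py`: `normalise_title` is a hypothetical name and the plural rule is deliberately crude.

```python
import re


def normalise_title(title):
    """Build a naive lookup key: casefold, de-hyphenate, strip punctuation,
    and drop a trailing "s" from longer words."""
    text = title.casefold()
    text = re.sub(r"[-_]+", " ", text)       # hyphens and underscores become spaces
    text = re.sub(r"[^\w\s]", "", text)      # remove remaining punctuation
    words = []
    for word in text.split():
        # Crude singularisation: "bitters" -> "bitter", but keep "glass".
        if len(word) > 3 and word.endswith("s") and not word.endswith("ss"):
            word = word[:-1]
        words.append(word)
    return " ".join(words)
```

Two spellings resolve to the same note when their keys match, for example `normalise_title("Soda-Water") == normalise_title("soda water")`. A production resolver would need an exception list for words such as "molasses" that this rule mangles.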
Fill is a manual operator step that happens between deterministic rounds. After the round completes, if bootstrap-create-manifest.json is non-empty, the operator reads it and launches fill agents per the Fill Agent Protocol below. After fill, re-run bootstrap_round.py to validate and detect any new violations introduced by the filled notes.
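Which candidates end up in the create-manifest follows from the classification list in the candidate pipeline step. The sketch below restates that routing as a lookup table. The label strings and `partition_candidates` are hypothetical; only the classification-to-destination pairs come from the step description.

```python
# Classification -> destination, paraphrasing the candidate pipeline step.
ROUTES = {
    "existing_canonical": "link_fix",
    "alias": "link_rewrite",
    "duplicate_candidate": "reconciler",
    "valid_variant": "keep_both",
    "true_new_note": "create_manifest",
    "noise": "discard",
    "ambiguous": "manual_review",
}


def partition_candidates(candidates):
    """Group (title, classification) pairs by destination bucket."""
    buckets = {destination: [] for destination in ROUTES.values()}
    for title, classification in candidates:
        buckets[ROUTES[classification]].append(title)
    return buckets
```

Only the `create_manifest` bucket feeds the operator fill wave; every other destination is handled deterministically or goes to manual review.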
The pipeline has converged when all blocking queues are zero:
| Queue | Convergence value |
|---|---|
| link_violations.unresolved | == 0 |
| link_violations.wrong_type | == 0 |
| mention_candidates.unresolved | == 0 |
| note_compiler.failures | == 0 |
| duplicate_groups.auto_resolvable | == 0 |
Acceptable non-zero: manual_review.count — surfaced to the user, does not block convergence.
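The convergence rule can be expressed as a check over the five blocking queues. This is a minimal sketch that assumes the metrics are available as a nested dict whose keys follow the dotted queue names; the actual layout of `_system/bootstrap-round-metrics.json` is not specified here and may differ.

```python
# The five blocking queues from the convergence table.
BLOCKING_QUEUES = (
    "link_violations.unresolved",
    "link_violations.wrong_type",
    "mention_candidates.unresolved",
    "note_compiler.failures",
    "duplicate_groups.auto_resolvable",
)


def lookup(metrics, dotted_name):
    """Resolve a dotted name such as "note_compiler.failures" in a nested dict."""
    node = metrics
    for key in dotted_name.split("."):
        node = node[key]
    return node


def blocking_counts(metrics):
    """Return the current count for every blocking queue."""
    return {name: lookup(metrics, name) for name in BLOCKING_QUEUES}


def is_converged(metrics):
    """Converged when every blocking queue is zero; manual_review.count is ignored."""
    return all(count == 0 for count in blocking_counts(metrics).values())
```

Because `manual_review.count` is not in `BLOCKING_QUEUES`, a non-empty manual review queue never prevents convergence, which matches the rule above.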
_system/bootstrap-manual-review.json accumulates items that the orchestrator cannot resolve deterministically:
To action manual review items:
- Review _system/bootstrap-manual-review.json
- Re-run bootstrap_round.py; the resolved item will no longer appear

Surface all open manual review items to the user after each run.
Human-in-the-loop requirement:
When the queue is small enough to review directly, the agent must use the question tool to ask the human what to do with each remaining tail item. Present exactly these choices:
This question-tool round is mandatory before closing a session with remaining tail items.
When bootstrap_round.py --fill runs, it reads the create-manifest and launches fill agents. Each agent receives a typed batch of manifest entries and must follow this protocol:
Batch sizes (ceiling per agent):
For each manifest entry:
Book search first (ingredients and compounds — mandatory).
Before any web research, check what the reference library says. Use book_search MCP directly:
- book_search(query="<name> flavor taste aroma", limit=6): all books
- book_search(query="<name> smells aroma", book_id="nose_dive", limit=4)
- book_id="smugglers_cove" and book_id="cocktail_codex"
- book_id="liquid_intelligence"
- book_get_page(book_id, page, context_pages=1) for the full passage.
- [Book Title, p.N]: ...) in the # Notes section.

Web research. Start with the research-free skill. Escalate only if it is too thin.
1. research-free quick mode (3 searchers)
2. research-free standard mode
3. research-cheap only when the free pass returns no usable sensory detail or no trustworthy product coverage
4. research-premium only for stubborn high-value items (complex amaros, obscure fortified wines, hard-to-source producer notes)

A result is too thin when it fails any of these:
- Wiki link:

Write the note. Follow the type-specific fill prompt exactly. Sensory-first prose. Inline wikilinks throughout. Use obsidian_write_note on the path specified in the manifest entry.
Validate immediately. After writing each note, run the per-note schema validator:
PYTHONPATH=. python3.11 run_checks.py --note-standards --paths "Flavors/<relative-path>"
Then manually verify: correct type in frontmatter, *Recorded on YYYY-MM-DD* line present (ingredient/compound only), required sections present, no frontmatter on flavor notes. Do not run the full bootstrap round per-note — --note-standards is a single-note schema check only. Fix any reported issues before moving to the next note.
Log. Append one activity log entry to Flavors/_system/llm-activity-log.md covering the full batch. Log any fill failures to Flavors/_system/bootstrap-issues.md.
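Part of the manual verification in the validate step can be scripted. The sketch below covers only two of the listed checks: the `*Recorded on YYYY-MM-DD*` line for ingredient and compound notes, and the no-frontmatter rule for flavor notes. `check_note`, the type strings, and the assumption that frontmatter starts with a `---` line are all hypothetical, and this does not replace `run_checks.py --note-standards`.

```python
import re

# Matches a whole line such as "*Recorded on 2024-01-05*".
RECORDED_ON = re.compile(r"^\*Recorded on \d{4}-\d{2}-\d{2}\*\s*$", re.MULTILINE)


def has_frontmatter(text):
    """Assume YAML frontmatter opens with '---' on the first line."""
    return text.startswith("---\n")


def check_note(text, note_type):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []
    if note_type == "flavor" and has_frontmatter(text):
        problems.append("flavor note must not have frontmatter")
    if note_type in ("ingredient", "compound") and not RECORDED_ON.search(text):
        problems.append("missing *Recorded on YYYY-MM-DD* line")
    return problems
```

Required-section checks are left out on purpose because the section lists live in the type-specific fill prompts.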
Type-specific fill invariants:
- *Recorded on* line, no anchor sections. Only valid format is ## Flavor description: followed by a paragraph. Read Flavors/Flavors/Minty Flavor.md as the canonical example.

The B4 candidate pipeline classifies every unresolved mention before any note is created. Only true new notes reach the create-manifest:
When the manifest contains many entries, fill in this order:
Practical default order now:
1. generic_parents
2. high_ref
3. medium_ref
4. low_ref

Signals that increase priority:
When the quality gate re-queues a note, do not blindly regenerate it.
Instead:
Typical upgrade moves:
- `# Classification`
- `# Clone Recipe` / `# Composition` for composed ingredients (bitters, syrups, liqueurs, tinctures): this link is the ingredient's embedding source; the pipeline computes the ingredient's vector from the linked recipe by weighted composition, identical to cocktail embedding; tasting notes on the ingredient note are documentation only

Use these as fill quality calibration; do not copy wording from them:
- Flavors/Ingredients/Bitter Ingredients/Angostura bitters.md
- Flavors/Ingredients/Liqueur Ingredients/Pierre Ferrand Dry Curaçao.md
- Flavors/Ingredients/Spirit Ingredients/Plymouth gin.md
- Flavors/Ingredients/Liqueur Ingredients/Bénédictine.md
- Flavors/Ingredients/Amaro.md

After each fill batch:
- run_checks.py --note-standards --paths ....
- PYTHONPATH=. python3.11 run_checks.py --link-integrity --fix if the batch introduced or rewrote links broadly.
- `## Flavor profile` and `# Components` are not padded with low-value links.
- `# Clone Recipe` / `# Composition` linkage when applicable: the linked recipe must have a populated ingredient list with ratios for the embedding to be correct.
- Flavors/_system/stub-index.md.
- Flavors/_system/llm-activity-log.md.

Print after each round:
=== Flavor Bootstrap — Round N Complete ===
Link Integrity
Unresolved violations: A (blocking)
Wrong-type violations: B (blocking)
Auto-fixed: C
Candidate Pipeline
Unresolved mentions: D (blocking)
Added to create-manifest: E
Added to manual review: F
Note Standards
Failures: G (blocking)
Auto-fixed: H
Duplicate Reconciler
Auto-resolvable groups: I (blocking)
Deleted duplicates: J
Added to manual review: K
Create Manifest (for next operator fill wave)
New notes queued: L (ingredients: l1, compounds: l2, flavors: l3)
Manual Review Queue
Open items: N (non-blocking — requires human decision)
Converged: YES / NO
Remaining blocking total: A + B + D + G + I
Next action: [none — converged] | [run fill wave from create-manifest, then re-run] | [action manual review items, then re-run]
Append to Flavors/_system/bootstrap-issues.md whenever you encounter:
| Category | Examples |
|---|---|
| false_positive | Candidate that is not a real ingredient/compound/flavor |
| resolver_miss | Name that should have resolved via alias but came back unresolved |
| fill_quality | Web research returned thin content; fill prompt produced malformed note |
| link_repair | Compiler missed a bidirectional edge; prose mention not caught |
| synonym_gap | Common synonym not in compound-synonyms.md; unnecessary stub created |
| schema | Unexpected note structure broke a compiler pass |
| convergence | Same violations reappearing across rounds; quality not improving |
Entry format:
## YYYY-MM-DD Round N — <short title>
- **Category:** <category>
- **Subject:** <note name, script, or "general">
- **Observed:** <one or two sentences>
- **Impact:** <notes or links affected>
- **Suggested fix:** <concrete action>
- **Status:** open | fixed | wont-fix
After the final convergence round, scan for open entries with low-effort fixes (synonym row, stoplist entry) and apply them inline before finishing. Surface remaining open entries to the user.
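Issue entries are easier to keep consistent when they are rendered from fields instead of typed by hand. The sketch below is one way to do that. `format_issue_entry` is a hypothetical helper; the categories, statuses, and line layout are copied from the table and entry format above.

```python
from datetime import date

CATEGORIES = {
    "false_positive", "resolver_miss", "fill_quality", "link_repair",
    "synonym_gap", "schema", "convergence",
}
STATUSES = {"open", "fixed", "wont-fix"}
SEPARATOR = "\u2014"  # the dash used between round number and title in the entry format


def format_issue_entry(round_number, title, category, subject, observed,
                       impact, suggested_fix, status="open", on=None):
    """Render one bootstrap-issues.md entry as a Markdown string."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    day = (on or date.today()).isoformat()
    lines = [
        f"## {day} Round {round_number} {SEPARATOR} {title}",
        f"- **Category:** {category}",
        f"- **Subject:** {subject}",
        f"- **Observed:** {observed}",
        f"- **Impact:** {impact}",
        f"- **Suggested fix:** {suggested_fix}",
        f"- **Status:** {status}",
    ]
    return "\n".join(lines) + "\n"
```

Validating the category and status up front keeps the log searchable when scanning for open entries after the final convergence round.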
- Navigation/ pages.
- llm-activity-log.md at the end of any round that modifies content.
- bootstrap-issues.md as you encounter them; do not batch at end.

| File | Purpose |
|---|---|
| Flavors/flavor_graph/bootstrap_round.py | B7: round orchestrator (primary entrypoint) |
| Flavors/flavor_graph/vault_snapshot.py | B1: shared vault parse pass |
| Flavors/flavor_graph/node_registry.py | B2: type-scoped canonical resolver |
| Flavors/flavor_graph/link_integrity.py | B3: section-aware link compiler |
| Flavors/flavor_graph/candidate_pipeline.py | B4: no-stub candidate classification |
| Flavors/flavor_graph/validate_note_standards.py | B5: full-vault note standards compiler |
| Flavors/flavor_graph/duplicate_reconciler.py | B6: duplicate detection and deletion |
| Flavors/Reference/How to add new ingredients.md | Entry point: note types, both workflows, wikilink rules |
| Flavors/Reference/Ingredient fillout prompt.md | Ingredient fill template and rules |
| Flavors/Reference/Compound fillout prompt.md | Compound fill template and rules |
| Flavors/Reference/Flavor fillout prompt.md | Flavor fill template and rules |
| Flavors/Reference/Recipe fillout prompt.md | Ingredient-defining recipe fill template (clone recipes, liqueurs, bitters, syrups) |
| Flavors/Reference/Cocktail recipe fillout prompt.md | Cocktail recipe fill template |
| Flavors/Reference/compound-synonyms.md | IUPAC↔common name synonym table |
| Flavors/_system/bootstrap-create-manifest.json | Validated new notes ready for LLM agents (B4 output) |
| Flavors/_system/bootstrap-candidates.json | All candidates with classification (B4 output) |
| Flavors/_system/bootstrap-manual-review.json | Ambiguous cases requiring human decision (B4 output) |
| Flavors/_system/bootstrap-round-metrics.json | Per-round convergence metrics (B7 output) |
| Flavors/_system/link-integrity-report.json | Link violations (B3 output) |
| Flavors/_system/note-compiler-report.json | Note schema failures (B5 output) |
| Flavors/_system/duplicate-reconciler-report.json | Duplicate resolution log (B6 output) |
| Flavors/_system/vault-snapshot.json | Shared vault parse (B1 output) |
| Flavors/_system/llm-activity-log.md | Session audit log |
| Flavors/_system/bootstrap-issues.md | Pipeline issues and observability log |
| Projects/Areas/Build/Code/Flavors/Roadmap.md | Authoritative design philosophy and milestone status |
| Flavors/CLAUDE.md | Vault invariants and authoring rules |
Location: This skill lives at Flavors/.opencode/skills/flavor-bootstrap/SKILL.md. To invoke from the main workspace: use_skill("flavor-bootstrap").