Pull course sources from Notion and Box Drive, fetch or ingest web and Playwright-captured HTML, extract text from local PDFs, and emit agent-ready source bundles (extracted.md + metadata) for Marcus and specialists.
DEPRECATED: This skill has been superseded by Texas (`skills/bmad-agent-texas/`), a full memory agent with extraction validation, cross-validation, fallback chains, and a Marcus delegation contract. The scripts in this directory are retained for backward compatibility but will be removed once all callers are migrated to Texas.
Consolidates reference material into a single consumable bundle so Irene, Gary, and other agents do not depend on copy-paste or stale context. Supports:
- Notion via `NotionClient` (`NOTION_API_KEY`)
- Box Drive (`BOX_DRIVE_PATH`) file listing and text reads
- Local PDFs: `extract_pdf_text()` / `wrangle_local_pdf()` via pypdf (text-native PDFs; scanned/OCR out of scope unless added later)
- Gamma: do not use `fetch_url` / `summarize_url_for_envelope` on `https://gamma.app/docs/...` links. They return Cloudflare interstitials and are the wrong integration surface.
  - Use Gamma tooling (`bmad-agent-gamma` + `gamma-api-mastery` / Gamma MCP): export PDF or PNG, then `wrangle_local_pdf()` on the export, or Playwright save + `wrangle_playwright_saved_html()`.
  - Raises `GammaDocsURLNotSupportedError` with remediation text.
- Local preflight: call `require_local_source_files([...])` or check `verify_local_source_paths()` so missing SME PDFs fail loudly (no silent empty bundles).
- `course-content/`.

This is a skill, not a dedicated agent. Marcus invokes it early in a run; specialists receive `extracted.md` paths in their envelopes.
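
The Gamma guard and the local preflight described above both follow a "fail loudly" pattern. The sketch below illustrates that pattern with the standard library only. It is not the code in `source_wrangler_operations.py`: `GammaDocsURLError`, `guard_gamma_docs_url`, `missing_local_sources`, and `require_sources` are hypothetical names, and the real functions may check different things or use different signatures.

```python
from __future__ import annotations

from pathlib import Path
from urllib.parse import urlparse


class GammaDocsURLError(ValueError):
    """Hypothetical stand-in for GammaDocsURLNotSupportedError."""


def guard_gamma_docs_url(url: str) -> str:
    """Return url unchanged, or raise if it is a gamma.app/docs/ link."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    if host in {"gamma.app", "www.gamma.app"} and parsed.path.startswith("/docs/"):
        # Remediation text mirrors the options listed in the skill description.
        raise GammaDocsURLError(
            f"{url} is a Gamma docs link; export a PDF or PNG and extract "
            "from the export, or save the page with Playwright instead."
        )
    return url


def missing_local_sources(paths: list[str]) -> list[str]:
    """Return the subset of paths that are not existing regular files."""
    return [p for p in paths if not Path(p).is_file()]


def require_sources(paths: list[str]) -> None:
    """Raise FileNotFoundError naming every missing file (no silent empty bundle)."""
    missing = missing_local_sources(paths)
    if missing:
        raise FileNotFoundError("Missing source files: " + ", ".join(missing))
```

Running both checks before any extraction means a bad URL or a missing SME PDF stops the run at the start instead of producing an empty bundle.
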
- `NotionClient.append_paragraphs`

| Path | Purpose |
|---|---|
| `./references/bundle-format.md` | Output layout (`extracted.md`, `metadata.json`, `raw/`) |
| `./references/playwright-assisted-capture.md` | MCP → save HTML → `wrangle_playwright_saved_html` |
| `./references/notion-and-box.md` | Env vars and conventions |
| `./scripts/source_wrangler_operations.py` | HTML + PDF text extraction, URL fetch (with Gamma guard), bundle IO, Box listing, local preflight |
| `scripts/api_clients/notion_client.py` | Notion API |
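
The Playwright route in the table above ends with text extraction from a saved HTML file. As a rough illustration of that last step, the sketch below strips markup with the standard library's `html.parser`. It is an assumption about the general approach, not the implementation of `wrangle_playwright_saved_html`; `html_to_text` and `_TextExtractor` are hypothetical names.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping script/style/noscript content."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one text node per line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)
```

A saved capture could then be read with `Path(saved_file).read_text(encoding="utf-8")` and passed to `html_to_text` before the result is written into a bundle.
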
- `.env` holds `NOTION_API_KEY`, `BOX_DRIVE_PATH`
- `course-content/staging/source-bundles/{run_or_slug}/`
- `course-content/staging/ad-hoc/source-bundles/{run_or_slug}/` so ingest artifacts stay in the scratch tree (`bmad-agent-marcus` → `references/mode-management.md`)
- `mode_state.json` itself; Marcus (or the invoking agent) chooses `output_dir` for `write_source_bundle()` from the active modality
- `resources/exemplars/` only when a human promotes a capture to a named exemplar

Marcus adds to context envelopes:

- `source_bundle_path` or explicit path to `extracted.md`
- `provenance` summary from `metadata.json` for transparency

Irene/Gary use the content as supplemental `input_text` / `user_constraints` / `additionalInstructions`, not as a substitute for the slide brief contract.
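
To make the bundle layout and the envelope fields concrete, here is a minimal sketch that writes `extracted.md`, `metadata.json`, and `raw/` into an output directory and then derives the two envelope fields named above. It is not `write_source_bundle()`: the function names, the arguments, and the shape of `metadata.json` (a single `provenance` list) are assumptions made for illustration, and `./references/bundle-format.md` remains the authority on the real layout.

```python
import json
from pathlib import Path


def write_bundle_sketch(output_dir, text, provenance):
    """Write a bundle directory (hypothetical layout) and return its path."""
    out = Path(output_dir)
    (out / "raw").mkdir(parents=True, exist_ok=True)  # holds original captures
    (out / "extracted.md").write_text(text, encoding="utf-8")
    metadata = {"provenance": provenance}  # assumed schema, for illustration
    (out / "metadata.json").write_text(
        json.dumps(metadata, indent=2), encoding="utf-8"
    )
    return out


def envelope_fields(bundle_dir):
    """Build the envelope fields a caller might pass to a specialist."""
    bundle = Path(bundle_dir)
    meta = json.loads((bundle / "metadata.json").read_text(encoding="utf-8"))
    return {
        "source_bundle_path": str(bundle),
        "provenance": meta.get("provenance", []),
    }
```

The caller picks `output_dir`, for example one of the `course-content/staging/.../{run_or_slug}/` paths listed above, so the sketch stays unaware of which mode is active.
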
Edit PDFs with natural-language instructions using the nano-pdf CLI.