Single entry point for all Meridian manual funding pipeline operations. Parses any pipeline request, assesses database state, determines where to start in the 7-phase pipeline, spawns Agent Teams for discovery (thoroughness via cross-checking) and Task tool agents for processing (deterministic batch work). Reports progress after each phase with anomaly warnings. Use for any pipeline-related command.
Meridian is a Policy & Funding Intelligence Platform. This manual pipeline discovers funding opportunities (grants, rebates, incentives, tax credits) from sources that don't have APIs — utilities, county governments, state agencies, foundations.
Goal: Find EVERY relevant opportunity with complete specs. Thoroughness is non-negotiable — a missed source means missed opportunities for our sales team.
Business context: Meridian is a GC/ESCO (general contractor / energy services company). Our clients are commercial entities: municipalities, school districts, hospitals, businesses, government agencies. We find funding that lets these clients hire us for construction and installation projects.
All spawned agents MUST use model: "sonnet" EXCEPT the analysis-agent.
| Agent Type | Model | Why |
|---|---|---|
| source-registry-agent | sonnet | Web search + propose — doesn't need deep reasoning |
| program-discovery-agent | sonnet | URL crawl + extract — pattern matching, not analysis |
| opportunity-discovery-agent | sonnet | Status check + staging insert — straightforward |
| extraction-agent | sonnet | Field mapping from page content — structured work |
| analysis-agent | opus | Content enhancement + scoring adjustment needs deep reasoning |
| storage-agent | sonnet | Data sanitization + UPSERT — mechanical |
Include model: "sonnet" (or model: "opus" for analysis) in every Agent() or
Task() call. If omitted, the agent inherits the orchestrator's model (opus),
which wastes tokens on work that doesn't need it.
Three-tier data model: Sources → Programs → Opportunities
- `funding_sources`
- `funding_programs`
- `funding_opportunities`

Quality standard: Match API pipeline rigor:
- `scoringAnalyzer.js` (0-10 scale)

Database: NO DEFAULT ENVIRONMENT. The user MUST specify which environment to use.
- Reads: `mcp__postgres__query` (read-only MCP tool — connected to whichever DB is configured)
- "use dev" → `source .env.local && psql "$DEV_CLAUDE_URL"`
- "use staging" → `source .env.local && psql "$STAGING_CLAUDE_URL"`
- "use prod" / "use production" → `source .env.local && psql "$PROD_CLAUDE_URL"`
- See `docs/prd/db-security/production-database-configuration.md`

| # | Phase | Reads | Writes | Execution |
|---|---|---|---|---|
| 1 | Source Registry | Web search | funding_sources + source_program_urls | Agent Team (3 search teammates + cross-check) |
| 2 | Program Discovery | source_program_urls | funding_programs | Two rounds: Scout Team (find URLs) → Extractor Team (extract + store) |
| 3 | Opportunity Discovery | funding_programs (smart schedule) | manual_funding_opportunities_staging | Agent Team (N teammates per source group) |
| 4 | Extraction | Staging extraction_status='pending' | Staging extraction_data, raw_content | Task tool: extraction-agent (batches of 20) |
| 5 | Analysis | Staging analysis_status='pending' | Staging analysis_data | Task tool: analysis-agent (batches of 20) |
| 6 | Storage | Staging storage_status='pending' | funding_opportunities (pending_review) + coverage areas | Task tool: storage-agent |
| 7 | Review & Publish | promotion_status='pending_review' | promotion_status flip | Read-only reporter → directs to /admin/review UI |
Dependencies: Each phase depends on the previous. Chain them in order. Phase 7 is NEVER auto-triggered — always requires explicit admin action.
Parse the user's request to determine starting phase and auto-chain behavior:
| User Says | Start Phase | Auto-Chain |
|---|---|---|
| "Run pipeline for [STATE] [TYPE]" | 1 | 1→2→3→4→5→6 |
| "Register sources: [STATE] [TYPE]" | 1 | 1 only |
| "Find all sources in [STATE]" | 1 | 1 only (multi-type — see Section 3.5) |
| "Discover programs for [X]" | 2 | 2 only (or 2→3→4→5→6 if "and process") |
| "Find opportunities for [X]" | 3 | 3→4→5→6 |
| "Process staging" / "Run staging" | 4 | 4→5→6 |
| "Extract pending" | 4 | 4 only |
| "Analyze pending" | 5 | 5 only |
| "Store pending" | 6 | 6 only |
| "Review pending" / "Publish approved" | 7 | 7 only |
| "Check staging status" / "Pipeline status for [X]" | — | Read-only report |
Extract state_code, type(s), or source name from the request:
Clear scope — proceed without asking:
- `state_code='AZ', type='Utility'`
- look up `funding_sources` by name (when a specific source is named)
- `state_code='CA', type='County'`
- stale filter: `programs_last_searched_at IS NULL OR < 90 days`

Broad scope — type not specified — determine applicable types:
Common natural language → type mapping (use as guidance, not strict rules):
When type is not specified or is broad, the orchestrator determines which types are relevant and spawns one team per type (see Section 3.5 and Section 6).
If you can reasonably interpret the request, proceed. Do NOT ask for confirmation on every request — only when there is genuine ambiguity that could lead to wasted work.
ASK the user when:
DO NOT ask when:
Format for clarification — ask ONE focused question, not a menu of options:
"You said 'sources in the Southwest.' Which states should I cover? (e.g., AZ, NV, NM, UT, CO)"
When the user's request covers multiple funder types (or doesn't specify one), the orchestrator determines which types are applicable and spawns one team per type in parallel. Each team gets its own clean context focused on a single funder type.
Not every funder type applies to every state or request. Use this logic:
| Funder Type | When to Include |
|---|---|
| Utility | Almost always — every state has utilities with rebate programs |
| County | When scope includes local government or specific counties |
| Municipality | When scope includes cities/local or specific metro areas |
| State | When scope is broad ("all sources") or user mentions state agencies |
| Foundation | When scope is broad or user mentions grants/philanthropy |
| Tribal | Only if the state has tribal lands/tribal utilities |
| Federal | Only if user specifically mentions federal or scope is national |
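The inclusion rules above can be condensed into a small decision helper. A sketch under assumed inputs — the `broad`, `mentions`, and `has_tribal_lands` fields are hypothetical representations of the parsed scope:

```python
# Illustrative funder-type applicability, mirroring the table above.
# Inputs are hypothetical: `broad` = scope is "all sources", `mentions` = keywords
# the user used, `has_tribal_lands` = whether the state has tribal lands/utilities.
def applicable_types(broad: bool, mentions: set, has_tribal_lands: bool) -> list:
    types = ["Utility"]  # almost always — every state has utilities with rebates
    if broad or "counties" in mentions:
        types.append("County")
    if broad or "cities" in mentions:
        types.append("Municipality")
    if broad or "state agencies" in mentions:
        types.append("State")
    if broad or "grants" in mentions or "philanthropy" in mentions:
        types.append("Foundation")
    if has_tribal_lands:
        types.append("Tribal")
    if "federal" in mentions:
        types.append("Federal")
    return types
```

One team is then created with one teammate pair (or trio) per returned type.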
Create one team with paired teammates for each funder type. All teammates run in parallel within the single team. Cross-checking happens between pair partners of the same funder type. (TeamCreate only allows one team per leader.)
User: "Find all sources in Nevada"
TeamCreate(team_name="source-discovery-NV", ...)
Spawn all teammates in parallel:
→ utility-regulatory (strategies 2+3) ┐
→ utility-aggregator (strategies 4+6) ├ Utility trio
→ utility-direct (strategies 1+5) ┘
→ county-direct (strategies 1+5+7) ┐ County pair
→ county-aggregator (strategy 4) ┘
→ muni-direct (strategies 1+5+7) ┐ Municipality pair
→ muni-aggregator (strategy 4) ┘
→ state-direct (strategies 5+7) ┐ State pair
→ state-aggregator (strategy 4) ┘
→ foundation-aggregator (strategies 4+6) ┐ Foundation pair
→ foundation-direct (strategies 1+7) ┘
= 11 teammates in one team, all searching in parallel
Each pair cross-checks within its funder type. Teammates from different types do NOT need to cross-check each other.
After all teammates complete, the orchestrator validates, deduplicates, and performs all database writes (Step 4 of the Concrete Tool Call Pattern below).
Before executing any phase, run these prerequisite checks. Use mcp__postgres__query
with raw SQL — substitute actual values (no bind variables).
-- Check 1: Sources exist for this scope?
-- Example for Arizona utilities (substitute actual state_code and type):
SELECT COUNT(*) as source_count
FROM funding_sources
WHERE state_code = 'AZ' AND type = 'Utility';
-- Check 2a: Catalog URLs exist for these sources? (Phase 2 prerequisite)
SELECT COUNT(*) as url_count,
COUNT(DISTINCT spu.source_id) as sources_with_urls
FROM source_program_urls spu
JOIN funding_sources fs ON fs.id = spu.source_id
WHERE fs.state_code = 'AZ' AND fs.type = 'Utility';
-- Check 2b: Programs exist for these sources?
SELECT COUNT(*) as program_count
FROM funding_programs fp
JOIN funding_sources fs ON fs.id = fp.source_id
WHERE fs.state_code = 'AZ' AND fs.type = 'Utility';
-- Check 3: Programs due for checking? (smart schedule)
SELECT COUNT(*) as due_count
FROM funding_programs fp
JOIN funding_sources fs ON fs.id = fp.source_id
WHERE fs.state_code = 'AZ'
AND fp.status IN ('active', 'unknown')
AND fp.next_check_at <= NOW()
AND NOT EXISTS (
SELECT 1 FROM funding_opportunities fo
WHERE fo.status = 'Open'
AND fo.close_date IS NOT NULL
AND (fo.program_id = fp.id
OR (fo.funding_source_id = fp.source_id
AND fo.title ILIKE '%' || fp.name || '%'))
);
-- Check 4: Staging pipeline counts
SELECT
COUNT(*) FILTER (WHERE extraction_status = 'pending') as pending_extraction,
COUNT(*) FILTER (WHERE extraction_status = 'complete' AND analysis_status = 'pending') as pending_analysis,
COUNT(*) FILTER (WHERE analysis_status = 'complete' AND storage_status = 'pending') as pending_storage,
COUNT(*) FILTER (WHERE storage_status = 'error') as errors
FROM manual_funding_opportunities_staging;
-- Check 5: Review queue
SELECT
COUNT(*) FILTER (WHERE promotion_status = 'pending_review') as pending_review,
COUNT(*) FILTER (WHERE promotion_status = 'promoted') as promoted,
COUNT(*) FILTER (WHERE promotion_status = 'rejected') as rejected
FROM funding_opportunities
WHERE promotion_status IS NOT NULL;
Not all checks apply to every phase; run only the checks relevant to the requested starting phase.
Report counts to the user before proceeding. Example: "Found 8 sources, 34 programs, 12 due for checking → proceeding with Phase 3."
If a user requests Phase N but prerequisites are missing, handle intelligently:
Phase 2 requested, no sources:
"No [TYPE] sources registered for [STATE]. Want me to register sources first? (Phase 1 → Phase 2)"
Phase 2 requested, sources exist but no catalog URLs:
"Found X sources but no catalog URLs in source_program_urls. Want me to re-run Phase 1 to discover catalog URLs first?"
Phase 2 requested, sources exist with catalog URLs:
"Found X sources with Y catalog URLs. Proceeding with program discovery."
Phase 3 requested, no sources:
"No [TYPE] sources registered for [STATE]. Want me to run the full pipeline? (register → discover programs → find opportunities → process)"
Phase 3 requested, sources exist but no programs:
"Found X sources but no programs discovered yet. Want me to discover programs first, then find opportunities?"
Phase 3 requested, programs exist but none due:
"Found X programs, but none are due for checking until [earliest next_check_at]. Force-check anyway?"
Phase 4 requested, staging empty:
"No pending extraction records in staging. Want me to find opportunities first?"
Always report what exists before offering to chain:
"Found 8 sources, 34 programs, 0 due for checking — all programs have open opportunities already."
Discovery phases (1-3) MUST use the TeamCreate tool to create Agent Teams.
This is NOT optional. The ONLY exceptions are:
- TeamCreate returns an error (then fall back to standalone — see Fallback below)

WHY: Standalone Task() calls produce a single agent that runs all strategies
sequentially with no cross-checking. Teams produce multiple agents that search in
parallel and validate each other's findings — this is how we ensure thoroughness.
SELF-CHECK before every discovery phase spawn:
"Am I about to call
TeamCreate? If not, STOP. I must use TeamCreate." If I'm about to callTask(subagent_type=...)without ateam_name, I am doing it wrong.
CRITICAL: Teammates are PROPOSERS, not writers. Teammates search the web and propose entities via SendMessage to the orchestrator. The orchestrator validates, deduplicates, and performs all database writes. This ensures zero bad data enters the database.
Here is the EXACT sequence of tool calls for spawning a Phase 1 discovery team. Follow this pattern literally — do not improvise an alternative.
STEP 1: Create ONE team for all funder types in this run
─────────────────────────
TeamCreate(
team_name="source-discovery-FL",
description="Phase 1: Discover FL County + State funding sources"
)
STEP 2: Spawn ALL teammates IN PARALLEL (one message, all Task calls together)
─────────────────────────
// IMPORTANT: model: "sonnet" for ALL Phase 1 teammates (not opus)
// County pair:
Task(
subagent_type="source-registry-agent",
model="sonnet",
team_name="source-discovery-FL",
name="county-direct",
prompt="You are the COUNTY-DIRECT teammate. Execute strategies 1+5+7.
State: FL
## SCOPE
You are searching for type='County' sources ONLY.
If you find an entity that should be a different type (Utility, Municipality,
State, etc.), DO NOT propose it. Note it in your report as 'out-of-scope
entity found: [name], suggested type: [X]' and the orchestrator will decide.
Also search for regional Councils of Governments (COGs) that administer
CDBG/HOME on behalf of small counties — propose these as type='Other'.
Read SEARCH-REFERENCE.md for detailed instructions.
## CRITICAL: DO NOT WRITE TO THE DATABASE
You are a PROPOSER, not a writer. Search, verify, and propose entities.
Send your proposed entity list to team-lead via SendMessage.
The orchestrator will validate and INSERT.
When done, broadcast your entity list to the team for cross-checking.
Cross-check with county-aggregator's findings."
)
Task(
subagent_type="source-registry-agent",
model="sonnet",
team_name="source-discovery-FL",
name="county-aggregator",
prompt="You are the COUNTY-AGGREGATOR teammate. Execute strategy 4.
State: FL
## SCOPE
You are searching for type='County' sources ONLY.
[same scope block as above]
## CRITICAL: DO NOT WRITE TO THE DATABASE
[same no-write block as above]
Cross-check with county-direct's findings."
)
// State pair (same pattern with type='State' in SCOPE block):
Task(
subagent_type="source-registry-agent",
model="sonnet",
team_name="source-discovery-FL",
name="state-direct",
prompt="... [same pattern, SCOPE says type='State' ONLY] ..."
)
Task(
subagent_type="source-registry-agent",
model="sonnet",
team_name="source-discovery-FL",
name="state-aggregator",
prompt="... [same pattern, SCOPE says type='State' ONLY] ..."
)
STEP 3: Wait for teammates to search and cross-check
─────────────────────────
Teammates will:
a) Execute their assigned search strategies (the slow part — 10-15 min)
b) Broadcast their proposed entity list to the team via SendMessage
c) Cross-check other teammates' proposals (flag overlaps, stale entities, type issues)
d) Send FINAL PROPOSED LIST to team-lead with confidence levels
Each proposed entity includes:
- name (official name from the entity's own website)
- website URL
- type
- description
- proposed catalog URLs (for source_program_urls)
- confidence (HIGH/MEDIUM/LOW)
- name_source ("from footer", "from About Us page", "from breadcrumb", "from search result only")
STEP 4: Orchestrator validates and writes (the fast part — 3-5 min)
─────────────────────────
For each proposed entity, the orchestrator:
a) DEDUP CHECK — by URL AND normalized name:
SELECT id, name, website FROM funding_sources
WHERE state_code = $1
AND (website ILIKE '%' || $domain || '%'
OR LOWER(name) ILIKE '%' || $normalized_name || '%')
AND name NOT LIKE '[DEPRECATED-%';
Also check source_program_urls for URL matches.
b) TYPE VALIDATION — does the proposed type match the run scope? If a teammate
proposed type='Utility' in a County-scoped run, the orchestrator flags it as
out-of-scope (log it, don't insert).
c) NAME SPOT-CHECK — for any entity where name_source is "from search result only"
(not verified against official website), do a quick WebFetch of the website URL
and check page title / breadcrumb / footer for the official name. Correct if needed.
d) INSERT — if passes all checks:
INSERT INTO funding_sources (name, website, type, sectors, state_code, pipeline, description)
VALUES (..., 'manual', ...);
INSERT INTO source_program_urls (source_id, url) VALUES (...);
e) LOG — write to claude_change_log:
INSERT INTO claude_change_log (table_name, operation, pipeline_phase, batch_id, record_count, change_reason)
VALUES ('funding_sources', 'INSERT', 'source_registry', $batch_id, 1, 'Registered: [name]');
STEP 5: Clean up
─────────────────────────
- Send shutdown_request to all teammates
- TeamDelete to clean up
- Report: "Phase 1 complete: X sources registered, Y catalog URLs, Z out-of-scope flagged"
- Proceed to Phase 2
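The dedup check in Step 4a compares by URL domain and normalized name. A minimal sketch of the two comparison inputs — the helper names are hypothetical; the SQL above performs the actual check:

```python
from urllib.parse import urlparse

def domain_of(url: str) -> str:
    """Lowercased host with any leading 'www.' stripped, for URL-based dedup."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def normalize_name(name: str) -> str:
    """Lowercase and collapse whitespace so minor formatting differences still match."""
    return " ".join(name.lower().split())
```

The normalized values feed the `$domain` and `$normalized_name` substitutions in the dedup SELECT.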
| Funder Type | Team Size | Teammates & Strategy Groups |
|---|---|---|
| Utility | 3 | regulatory (strategies 2+3), aggregator (strategies 4+6), direct (strategies 1+5) |
| Tribal | 3 | regulatory (strategy 3/EIA), aggregator (strategy 4), direct (strategies 1+5) |
| County | 2 | direct (strategies 1+5+7), aggregator (strategy 4) |
| Municipality | 2 | direct (strategies 1+5+7), aggregator (strategy 4) |
| State | 2 | direct (strategies 5+7), aggregator (strategy 4) |
| Foundation | 2 | aggregator (strategies 4+6), direct (strategies 1+7) |
| Federal | 2 | direct (strategy 5), aggregator (strategy 4) |
LIMITATION: TeamCreate allows only ONE team per leader. You CANNOT create
multiple parallel teams. Instead, create one team containing ALL teammates,
organized into pairs by funder type.
Each pair cross-checks within its funder type. Teammates from different types do NOT cross-check each other (a county teammate doesn't validate state findings).
User: "Find county and state sources in Florida"
TeamCreate(team_name="source-discovery-FL", description="Phase 1: FL County + State sources")
Spawn all teammates in parallel (one message):
→ county-direct (strategies 1+5+7, type=County) ┐ County pair
→ county-aggregator (strategy 4, type=County) ┘ cross-checks
→ state-direct (strategies 5+7, type=State) ┐ State pair
→ state-aggregator (strategy 4, type=State) ┘ cross-checks
= 4 teammates, all searching in parallel
Naming convention for multi-type teammates: Prefix with funder type to keep
pairs clear: county-direct, county-aggregator, state-direct, state-aggregator,
utility-regulatory, utility-aggregator, utility-direct, etc.
Broader example:
User: "Find all sources in Georgia"
TeamCreate(team_name="source-discovery-GA", description="Phase 1: All GA sources")
Spawn all teammates in parallel:
→ utility-regulatory (strategies 2+3) ┐
→ utility-aggregator (strategies 4+6) ├ Utility trio
→ utility-direct (strategies 1+5) ┘
→ county-direct (strategies 1+5+7) ┐ County pair
→ county-aggregator (strategy 4) ┘
→ state-direct (strategies 5+7) ┐ State pair
→ state-aggregator (strategy 4) ┘
= 7 teammates total in one team
Teammates cross-check within their funder-type pair only:
SendMessage(type="broadcast")If a teammate's pair partner shuts down early, other teammates can still cross-check those findings (as happened in FL test: state-direct validated county-aggregator's findings when county-direct had already shut down).
| Group Name | Strategy Numbers | Focus |
|---|---|---|
| regulatory | Strategies 2 + 3 | PUC databases, EIA federal data |
| aggregator | Strategies 4 + 6 | DSIRE, EnergySage, ACEEE, foundation databases |
| direct | Strategies 1 + 5 + 7 | Direct listing search, agency search, taxonomy-driven search |
Follow the Concrete Tool Call Pattern above. Key points:
subagent_type="source-registry-agent", model="sonnet"## CRITICAL: DO NOT WRITE TO THE DATABASE block in every teammate promptPhase 2 uses a two-round approach to manage context size:
This split prevents context bloat — scouts stay lightweight (browse + identify), while extractors handle the heavy extraction work with focused assignments.
| Parameter | Rule |
|---|---|
| Sources per batch | 3 at a time (group by funder type within batches) |
| Scouts per batch | 1 scout per 2-3 catalog URLs + 1 searcher per source |
| Max teammates | Target 6-12 per team (3 sources × ~3-4 teammates each) |
| Extractors | 1 per ~10 programs (sized after Round 1 results) |
| Source ordering | Within funder type, order by DB entry time (created_at) |
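The sizing rules in the table above can be sketched as arithmetic. The function names and input fields are hypothetical; the ratios (3 sources per batch, 1 scout per 2-3 catalog URLs plus 1 searcher per source, 1 extractor per ~10 programs) come from the table:

```python
import math

def plan_batches(sources: list, batch_size: int = 3) -> list:
    """Split sources into batches of 3, preserving funder-type/created_at order."""
    return [sources[i:i + batch_size] for i in range(0, len(sources), batch_size)]

def scouts_for(source: dict) -> int:
    """1 scout per 2-3 catalog URLs, plus 1 supplementary searcher."""
    return max(1, math.ceil(source["url_count"] / 3)) + 1

def extractors_for(program_count: int) -> int:
    """1 extractor per ~10 programs, sized after Round 1 results."""
    return max(1, math.ceil(program_count / 10))
```

For example, 7 sources yield 3 batches, and a source with 5 catalog URLs gets 2 scouts plus 1 searcher.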
STEP 0: Query sources + catalog URL counts for the scope
─────────────────────────
SELECT fs.id, fs.name, fs.type,
(SELECT COUNT(*) FROM source_program_urls spu WHERE spu.source_id = fs.id) as url_count
FROM funding_sources fs
WHERE fs.state_code = 'FL' AND fs.type = 'County'
AND (fs.programs_last_searched_at IS NULL
OR fs.programs_last_searched_at < NOW() - INTERVAL '90 days')
ORDER BY fs.created_at;
→ Plan batches: 3 sources at a time
STEP 1: Create scout team for batch
─────────────────────────
TeamCreate(
team_name="program-discovery-FL-county-batch-1",
description="Phase 2 Round 1: Find programs for FL County sources (batch 1)"
)
STEP 2: Create shared tasks
─────────────────────────
TaskCreate(subject="Scout catalog URLs for programs", ...)
TaskCreate(subject="Supplementary web search for missed programs", ...)
TaskCreate(subject="Cross-check and deduplicate program lists", ...)
STEP 3: Spawn ALL scout teammates IN PARALLEL
─────────────────────────
// For each source in batch, spawn scouts for its catalog URLs:
// Source 1: Alachua County (3 catalog URLs → 1 scout + 1 searcher)
Task(
subagent_type="program-discovery-agent",
team_name="program-discovery-FL-county-batch-1",
name="alachua-scout",
prompt="mode: scout
Source: Alachua County Office of Sustainability (source_id: abc-123)
Catalog URLs to crawl:
1. https://sustainability.alachuacounty.us/programs
2. https://sustainability.alachuacounty.us/grants
3. https://dsire.org/alachua-county
Crawl each URL, follow links 1 level deep, identify funding programs.
DB reads: mcp__postgres__query
Report your program list to the team when done.
Cross-check with alachua-searcher's findings."
)
Task(
subagent_type="program-discovery-agent",
team_name="program-discovery-FL-county-batch-1",
name="alachua-searcher",
prompt="mode: scout, role: searcher
Source: Alachua County Office of Sustainability (source_id: abc-123)
Do supplementary web search for programs NOT on the catalog pages.
Search queries:
- 'Alachua County sustainability rebate programs'
- 'Alachua County energy efficiency incentives 2026'
- 'Alachua County grants environment'
DB reads: mcp__postgres__query
Report your program list to the team when done.
Cross-check with alachua-scout's findings."
)
// Source 2: Broward County (2 catalog URLs → 1 scout + 1 searcher)
Task(
subagent_type="program-discovery-agent",
team_name="program-discovery-FL-county-batch-1",
name="broward-scout",
prompt="mode: scout
Source: Broward County ... (source_id: def-456)
Catalog URLs: [...]
..."
)
Task(
subagent_type="program-discovery-agent",
team_name="program-discovery-FL-county-batch-1",
name="broward-searcher",
prompt="mode: scout, role: searcher
Source: Broward County ... (source_id: def-456)
..."
)
// Source 3: similar pattern
// = 6 teammates for 3 sources (2 per source: 1 scout + 1 searcher)
STEP 4: Wait for scouts to complete and cross-check
─────────────────────────
Scouts will:
a) Crawl their assigned URLs / run web searches
b) Report program lists to the team
c) Cross-check with their pair partner (scout ↔ searcher per source)
d) Converge on a merged program list per source
STEP 5: Collect scout results and clean up
─────────────────────────
- Orchestrator collects all program lists from scouts
- Deduplicates across sources (same program might appear under multiple sources)
- Sends shutdown_request to all teammates
- TeamDelete to clean up
- Repeat for next batch until all sources processed
Source_id MUST travel through the entire handoff chain. Every program reported by
scouts must include its source_id. The orchestrator passes source_id to extractors.
Extractors write source_id as the FK on funding_programs.
After all scout batches complete:
STEP 1: Create extractor team
─────────────────────────
TeamCreate(
team_name="program-extraction-FL-county",
description="Phase 2 Round 2: Extract and store FL County programs"
)
STEP 2: Spawn extractor teammates with explicit assignments
─────────────────────────
// Divide programs among extractors (~10 programs each)
Task(
subagent_type="program-discovery-agent",
team_name="program-extraction-FL-county",
name="extractor-1",
prompt="mode: extractor
DB writes: source .env.local && psql \"$DEV_CLAUDE_URL\"
DB reads: mcp__postgres__query
Extract and store these programs:
1. Green Business Program — https://sustainability.alachuacounty.us/programs/green-business
source_id: abc-123, source_name: Alachua County Office of Sustainability
2. Energy Audit Program — https://sustainability.alachuacounty.us/programs/energy-audit
source_id: abc-123, source_name: Alachua County Office of Sustainability
3. Tree Planting Grants — https://sustainability.alachuacounty.us/programs/tree-grants
source_id: abc-123, source_name: Alachua County Office of Sustainability
PDFs: [https://sustainability.alachuacounty.us/docs/tree-grant-app.pdf]
... (up to ~10 programs)
For each: visit URL, extract structured fields, dedup check, INSERT/UPDATE
funding_programs. Set status='active', next_check_at=NOW(), pipeline='manual'.
Update programs_last_searched_at on each source when all its programs are done."
)
Task(
subagent_type="program-discovery-agent",
team_name="program-extraction-FL-county",
name="extractor-2",
prompt="mode: extractor
... (next batch of ~10 programs)"
)
// = 1 extractor per ~10 programs
STEP 3: Wait for extractors to complete
─────────────────────────
Extractors will:
a) Visit each program URL
b) Extract structured data (7+ fields)
c) Dedup against existing funding_programs
d) INSERT/UPDATE funding_programs
e) Report results to team lead
STEP 4: Collect results and clean up
─────────────────────────
- Orchestrator collects extraction reports
- Sends shutdown_request to all extractors
- TeamDelete to clean up
- Reports: "Phase 2 complete: X programs (Y new, Z updated) across N sources"
For Round 1 (scouts), fall back to standalone Task tool:
Task(subagent_type="program-discovery-agent",
prompt="mode: scout. Run standalone for [source]. Crawl all catalog URLs AND do web search...")
For Round 2 (extractors), Task tool is an acceptable primary alternative:
Task(subagent_type="program-discovery-agent",
prompt="mode: extractor. Extract these programs: [list]. DB writes: ...")
Extractors are deterministic — they don't need cross-checking, so standalone mode is fine.
Checkers are REPORTERS, not writers. They crawl URLs, assess status and funding, and send structured reports to the orchestrator. The orchestrator validates, deduplicates, and performs all database writes.
Team name: opportunity-check-[SCOPE] (e.g., opportunity-check-AZ-utility)
STEP 0: Pre-flight (orchestrator runs directly)
─────────────────────────
// Smart scheduling query — get eligible programs
mcp__postgres__query:
SELECT fp.id, fp.name, fp.description, fp.program_urls,
fp.status as program_status, fp.source_id,
fs.name as source_name, fs.state_code, fs.type
FROM funding_programs fp
JOIN funding_sources fs ON fs.id = fp.source_id
WHERE fp.status IN ('active', 'unknown')
AND fp.next_check_at <= NOW()
-- Scope filter (substitute actual values):
AND fs.state_code = 'AZ' AND fs.type = 'Utility'
AND NOT EXISTS (
SELECT 1 FROM funding_opportunities fo
WHERE fo.status = 'Open'
AND fo.close_date IS NOT NULL
AND (fo.program_id = fp.id
OR (fo.funding_source_id = fp.source_id
AND fo.title ILIKE '%' || fp.name || '%'))
)
ORDER BY fp.source_id, fp.name;
// Report count and stop if zero eligible.
STEP 1: Create team + spawn checkers (model: "sonnet")
─────────────────────────
TeamCreate(team_name="opportunity-check-AZ-utility", ...)
// Group programs by source (~10-15 per checker)
Task(
subagent_type="opportunity-discovery-agent",
model="sonnet",
team_name="opportunity-check-AZ-utility",
name="checker-aps",
prompt="You are an opportunity discovery REPORTER (not a writer).
Read: .claude/skills/opportunity-discovery/SKILL.md
## CRITICAL: DO NOT WRITE TO THE DATABASE
Report your findings via SendMessage. The orchestrator writes.
Your assigned programs: [list with IDs, URLs]
For each program, report ALL of these fields:
status, window_type, open_date, close_date, application_url,
funding_status, funding_note, has_details, guidelines_url,
suggested_next_check, next_check_reason, new_urls_found, flags
Send your complete report to team-lead when done."
)
STEP 2: Collect checker reports
─────────────────────────
Checkers will:
a) Crawl each program's URLs
b) Assess application status (Open / Upcoming / Nothing)
c) Assess funding status (verified_active / presumed_active / limited_funding /
oversubscribed / exhausted)
d) Assess window type (dated / rolling / cycle_based)
e) Report ALL findings to team lead via SendMessage (NO DB writes)
STEP 3: Orchestrator validates and writes (the critical step)
─────────────────────────
For each program in the checker reports, the orchestrator applies this logic:
=== RULE: Open + exhausted should NEVER coexist on a stored opportunity ===
IF checker reports funding_status=exhausted:
→ Check: does an Open opportunity exist for this program_id?
→ IF YES: UPDATE it to status='Closed', funding_status='exhausted',
funding_note=[evidence], funding_verified_at=NOW()
→ IF NO: do nothing (no staging record — money is gone)
→ Set next_check_at = NOW() + 30 days (short cycle to catch refunding)
→ Log: "Closed [title] due to funding exhaustion"
→ Do NOT compare hashes — the page changed (exhaustion language appeared),
but we don't want to re-stage it, we want to close it.
IF checker reports status=Open AND funding_status != exhausted:
→ Check: does an Open opportunity already exist for this program_id?
→ IF YES AND window_type=rolling: compare source_hash.
- Hash UNCHANGED: just UPDATE funding_verified_at=NOW() and
funding_status=[checker's assessment]. No new staging record.
- Hash CHANGED: content was updated (new amounts, new eligibility, etc.)
— create a new staging record to re-extract the updated content.
→ IF YES AND window_type=dated with different dates: new round — create staging record
→ IF NO existing opportunity: create new staging record with all fields
→ Set next_check_at per the scheduling table in opportunity-discovery SKILL.md
IF checker reports status=Upcoming:
→ IF has_details=true: create staging record (capture the details early)
→ IF has_details=false: do NOT create staging record, just set
next_check_at = NOW() + 30 days (check monthly until details appear)
IF checker reports status=Nothing:
→ Do NOT create staging record
→ Set next_check_at per checker's suggestion
For ALL programs: UPDATE funding_programs.last_checked_at = NOW()
For ALL programs: UPDATE funding_programs.next_check_at = [from report]
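The Step 3 rules above can be condensed into a single decision function. This is a sketch only — the action labels returned are hypothetical; the orchestrator performs the actual SQL described above:

```python
# Condensed form of the orchestrator's write-decision rules for Phase 3.
# `report` is a checker report; `existing_open` = an Open opportunity exists for
# the program; `hash_changed` = source_hash differs from the stored one.
def decide(report: dict, existing_open: bool, hash_changed: bool) -> str:
    if report["funding_status"] == "exhausted":
        # Open + exhausted must never coexist: close if open, otherwise do nothing.
        return "close_existing" if existing_open else "no_action"
    if report["status"] == "Open":
        if existing_open and report["window_type"] == "rolling":
            return "stage_restage" if hash_changed else "touch_verified"
        if existing_open and report["window_type"] == "dated":
            return "stage_new_round"  # assumes the reported dates differ
        return "stage_new"
    if report["status"] == "Upcoming":
        return "stage_early" if report.get("has_details") else "recheck_30d"
    return "schedule_next_check"  # status == Nothing
```

Every branch also updates `last_checked_at` and `next_check_at`, per the two UPDATE rules above.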
STEP 4: Clean up and chain
─────────────────────────
- Shutdown all checkers, TeamDelete
- Report: "Phase 3 complete: X staged, Y closed-exhausted, Z unchanged, W nothing-found"
- If auto-chaining to Phase 4: check staging counts and proceed
Fall back to standalone Task tool:
Task(subagent_type="opportunity-discovery-agent",
prompt="Standalone mode. Check all programs matching [SCOPE] for open/upcoming
opportunities. Run pre-flight yourself, then process all eligible programs.
DB writes: psql \"$DEV_CLAUDE_URL\" -c \"...\"")
Log a warning: "TeamCreate failed — falling back to standalone mode."
If TeamCreate returns an error for any discovery phase, fall back to standalone Task tool.
Log a warning: "TeamCreate failed — falling back to standalone mode. Cross-checking skipped."
This is the ONLY acceptable reason to use standalone mode for discovery phases.
Use the Task tool for deterministic processing phases. These don't need cross-checking — just fetch, process, and write.
Phases 4-6 use a claim-then-process pattern to avoid race conditions when
multiple agents run in parallel. Each agent claims a batch by querying pending
records sorted by id, processing them sequentially, and updating status on
each record before moving to the next. Do NOT use LIMIT/OFFSET for parallel
batching — it causes missed or double-processed records if the dataset changes.
Sequential batching (safe): Spawn one agent at a time. Each agent queries
WHERE status='pending' ORDER BY id LIMIT 20, processes the batch, updates
status to 'complete' or 'error', then the orchestrator spawns the next agent
for the remaining pending records.
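One way to express a single batch claim in SQL (a sketch — the 'claimed' interim status is an assumption; substitute whatever in-progress value the staging table actually uses):

```sql
-- Sketch: claim the next batch of 20 pending records by id order.
-- 'claimed' is a hypothetical interim status; FOR UPDATE SKIP LOCKED
-- keeps the claim safe even if a second agent runs concurrently.
UPDATE manual_funding_opportunities_staging
SET extraction_status = 'claimed'
WHERE id IN (
  SELECT id
  FROM manual_funding_opportunities_staging
  WHERE extraction_status = 'pending'
  ORDER BY id
  LIMIT 20
  FOR UPDATE SKIP LOCKED
)
RETURNING id;
```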
The extraction-agent loads the extraction skill (.claude/skills/extraction/SKILL.md).
It fetches content from staging record URLs, extracts 24 structured fields into
extraction_data JSONB, stores raw_content (50KB cap), and computes source_hash.
-- Count pending (via mcp__postgres__query)
SELECT COUNT(*) FROM manual_funding_opportunities_staging
WHERE extraction_status = 'pending';
Task(subagent_type="extraction-agent",
prompt="Phase 4: Extract pending staging records.
Skill file: .claude/skills/extraction/SKILL.md
Taxonomy file: lib/constants/taxonomies.js (MUST read before extraction)
Process:
1. Read taxonomies
2. Query staging WHERE extraction_status='pending' ORDER BY id LIMIT 20
3. For each record: claim → fetch URLs → compute source_hash → extract → update
4. Output batch report
Database reads: mcp__postgres__query
Database writes: source .env.local && psql \"$DEV_CLAUDE_URL\"")
After each agent completes, re-check pending count. If more remain, spawn another.
Expected report format: complete/skipped/error counts per record.

The analysis-agent loads the analysis skill (.claude/skills/analysis/SKILL.md).
It reads V2 analysis reference files, performs LLM content enhancement (6 fields)
and deterministic scoring (5 components from scoringAnalyzer.js), and merges results
into analysis_data JSONB. Filtering is handled by the orchestrator post-batch.
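The merge into analysis_data might take this shape (a sketch — the 'scoring' key is the one the post-batch filter reads; the 'enhancement' key name and the inline JSON values are hypothetical):

```sql
-- Sketch: merge LLM enhancement output and deterministic scoring
-- into one JSONB column. 'enhancement' is a hypothetical key name;
-- the filter reads analysis_data->'scoring'->>'finalScore'.
UPDATE manual_funding_opportunities_staging
SET analysis_data = jsonb_build_object(
      'enhancement', '{"actionableSummary": "..."}'::jsonb,
      'scoring',     '{"finalScore": 3.5}'::jsonb
    ),
    analysis_status = 'complete'
WHERE id = '<record_id>';  -- placeholder
```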
-- Count pending (via mcp__postgres__query)
SELECT COUNT(*) FROM manual_funding_opportunities_staging
WHERE extraction_status = 'complete' AND analysis_status = 'pending';
If count > 0: spawn analysis agents sequentially (1 per batch of 20)
Task(subagent_type="analysis-agent",
prompt="Phase 5: Analyze extracted staging records.
Skill file: .claude/skills/analysis/SKILL.md
V2 reference files (MUST read before analysis):
- lib/agents-v2/core/analysisAgent/contentEnhancer.js
- lib/agents-v2/core/analysisAgent/scoringAnalyzer.js
- lib/agents-v2/core/analysisAgent/parallelCoordinator.js
Taxonomy file: lib/constants/taxonomies.js (MUST read before analysis)
Process:
1. Read taxonomies + V2 analysis files
2. Query staging WHERE extraction_status='complete'
AND analysis_status='pending' ORDER BY id LIMIT 20
3. For each record: claim → content enhancement (6 fields)
→ deterministic scoring → merge → update as 'complete'
4. Output batch report with score distribution
Note: actionableSummary uses the 'How to Win' prompt from Skill Section 3b.
Note: Filtering is NOT the agent's job — orchestrator handles it post-batch.
Database reads: mcp__postgres__query
Database writes: source .env.local && psql \"$DEV_CLAUDE_URL\"")
After each agent completes, re-check pending count. If more remain, spawn another.
Expected report format: complete/error counts per record + score distribution.
After ALL analysis agents complete, run the filter SQL:
-- Orchestrator runs this via psql (NOT the analysis agent)
UPDATE manual_funding_opportunities_staging
SET analysis_status = 'filtered',
analysis_error = 'Filtered: finalScore ' ||
(analysis_data->'scoring'->>'finalScore') || ' below threshold 2'
WHERE analysis_status = 'complete'
AND (analysis_data->'scoring'->>'finalScore')::numeric < 2;
Report: "Filtered X of Y records (finalScore < 2)"
If count == 0: skip, report "No pending analysis records"
-- Count pending (via mcp__postgres__query)
SELECT COUNT(*) FROM manual_funding_opportunities_staging
WHERE analysis_status = 'complete' AND storage_status = 'pending';
Task(subagent_type="storage-agent",
prompt="Phase 6: Store analyzed records to production.
REQUIRED — read these V2 reference files first:
- lib/agents-v2/core/storageAgent/dataSanitizer.js
- lib/services/locationMatcher.js
- lib/agents-v2/core/storageAgent/utils/fieldMapping.js
Query staging WHERE analysis_status='complete'
AND storage_status='pending' ORDER BY id LIMIT 20.
For each record:
1. Sanitize fields per dataSanitizer functions
2. UPSERT to funding_opportunities with:
- promotion_status = 'pending_review'
- api_source_id = NULL
- api_opportunity_id = 'manual'
- program_id from staging.program_id
- funding_source_id from staging.source_id
3. Link coverage areas from eligible_locations
4. Update staging: storage_status='complete',
opportunity_id=<returned UUID>, stored_by='storage-agent'
TEXT FIELDS MUST BE COPIED VERBATIM — no truncation.
Dollar-quote with $STOR$...$STOR$.
DB writes: source .env.local && psql \"$DEV_CLAUDE_URL\"")
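The UPSERT in step 2 might be shaped like this (a sketch only — the real column set comes from fieldMapping.js, and the ON CONFLICT target shown is an assumption):

```sql
-- Sketch: UPSERT one analyzed record into production.
-- Column list, sample title, and conflict target are illustrative;
-- fieldMapping.js defines the real mapping.
INSERT INTO funding_opportunities
  (api_opportunity_id, api_source_id, program_id, funding_source_id,
   title, promotion_status)
VALUES
  ('manual', NULL, '<staging.program_id>', '<staging.source_id>',
   $STOR$Commercial HVAC Rebate$STOR$, 'pending_review')
ON CONFLICT (program_id, title) DO UPDATE
  SET promotion_status = EXCLUDED.promotion_status
RETURNING id;  -- UUID written back to staging.opportunity_id
```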
After each agent completes, re-check count. If more pending, spawn another.
Report: "Stored X records (Y new, Z updated). Coverage areas linked: N."

After each phase completes, report to the user:
═══ PHASE [N] COMPLETE: [Phase Name] ═══
Records processed: X (Y new, Z updated)
Errors: N (details in staging table)
Warnings: [list any flags]
→ Starting Phase [N+1]: [Next Phase Name]...
Track cumulative stats across all phases for the final summary.
Apply these rules throughout pipeline execution:
| Situation | Action |
|---|---|
| Scope is genuinely ambiguous | ASK — one focused clarification question, then proceed (see Section 3) |
| Zero results from any phase | STOP — ask user before continuing |
| Unusually low count (1 program for major utility) | WARN — continue but flag in summary |
| URL failures > 30% for a source | WARN — flag source as potentially stale |
| All programs have open opportunities | REPORT — "all programs currently covered, nothing new to check" |
| Staging has errors from previous run | WARN — "X errors from previous run. Retry these?" |
| Large batch (100+ records entering a phase) | CONFIRM — "100+ records. Proceed?" |
| Agent Team timeout or error | FALLBACK — switch to sequential Task tool |
| Cross-check finds discrepancy between teammates | FLAG — include in summary for user review |
| Source returns significantly different count than last run | WARN — "PG&E had 15 programs last run, now 8. Some may have been discontinued." |
After all phases complete, present the full pipeline summary:
═══════════════════════════════════════════
PIPELINE COMPLETE: [Scope Description]
═══════════════════════════════════════════
Phase 1 — Sources: X registered (Y new, Z enriched)
Phase 1 — Catalog URLs: X discovered
Phase 2 — Programs: X discovered (Y new, Z updated)
Phase 3 — Opportunities: X found (Y Open, Z Upcoming)
Phase 4 — Extracted: X of Y (Z errors)
Phase 5 — Analyzed: X of Y (avg score: N.N)
Phase 6 — Stored: X of Y (promotion_status='pending_review')
Phase 6 — Coverage: X areas linked
Flags:
- [any warnings accumulated during pipeline]
Next step: "Review pending" to approve for publication
═══════════════════════════════════════════
For single-phase runs, show only the relevant phase stats.
The orchestrator handles all audit logging — individual agents have zero audit overhead.
At pipeline start, generate a batch_id:
batch_id = "run-YYYYMMDD-HHMM" (e.g., "run-20260206-1430")
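A minimal way to generate that identifier in shell (UTC, matching the example format):

```shell
# Generate a batch_id like "run-20260206-1430" (UTC).
batch_id="run-$(date -u +%Y%m%d-%H%M)"
echo "$batch_id"
```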
After each phase completes, log a summary row per table affected:
INSERT INTO claude_change_log
(table_name, operation, pipeline_phase, batch_id, record_count, change_reason)
VALUES
('funding_sources', 'INSERT', 'source_registry', 'run-20260206-1430', 8,
'Registered 8 utility sources for Arizona (5 new, 3 enriched)');
After pipeline completes, the final summary can also query the audit log:
SELECT pipeline_phase, table_name, operation, record_count, change_reason
FROM claude_change_log
WHERE batch_id = 'run-20260206-1430'
ORDER BY executed_at;
This provides full traceability without burdening individual agents. For record-level
detail, query the actual tables by created_at within the pipeline run timeframe.
Auto-close is handled automatically by the update_opportunity_statuses() PostgreSQL
function, scheduled via pg_cron to run daily at 00:05 UTC. It performs three operations:
- Opens upcoming opportunities once open_date <= CURRENT_DATE
- Closes open opportunities once close_date < CURRENT_DATE
- Reopens opportunities incorrectly marked closed whose close_date is still in the future (fixes bad data)

The funding_opportunities_with_geography view also has a calculated_status column
that computes these transitions on-the-fly for display.
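For reference, a daily 00:05 UTC job like this is typically registered with pg_cron as follows (the job name here is hypothetical; the function name comes from above):

```sql
-- Illustrative: registering the daily 00:05 job with pg_cron.
-- 'auto-close-opportunities' is a hypothetical job name; pg_cron
-- interprets the schedule in GMT by default.
SELECT cron.schedule(
  'auto-close-opportunities',
  '5 0 * * *',                      -- minute 5, hour 0, every day
  $$SELECT update_opportunity_statuses();$$
);
```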
No manual auto-close is needed in the pipeline. The orchestrator relies on the cron job and view for accurate status. If you need to force a status refresh, run:
SELECT update_opportunity_statuses();
Database connection: See Section 1 (Mission) for read/write connection details.