Preamble (run first) | Skills Pool

Skill ファイル

Preamble (run first)

Build and maintain the centralized parent company mapping table. Resolves company names and codes to their ultimate parent using parallel.ai enrichment, with ownership timelines, tickers, and network affiliations. One table, all data sources. Code-based joins, not name-based. Drives the Soria platform through `mcp__soria__*` tools and the `parallel.ai` API. Use when asked to "parent map", "who owns", "map these companies", "consolidate companies", "resolve parent", or when a new data source needs parent company rollup for dives. Use after /ingest (companies exist in warehouse), before /dive (dbt intermediate/marts need parent join). Parallel to /map (value mapping). (soria-stack)

Soria-Inc0 スター2026/04/18

職業
カテゴリ: Backend

スキル内容

Preamble (run first)

mkdir -p ~/.soria-stack/artifacts
echo "SKILL: parent-map"
echo "---"
echo "Checking for prior artifacts..."
ls -t ~/.soria-stack/artifacts/ingest-*.md ~/.soria-stack/artifacts/parent-map-*.md 2>/dev/null | head -5

Read ETHOS.md. Key principles: #9 (historical names stay historical), #10 (validate with eyes).

Skill routing (always active)

Need to scrape/extract source data first → invoke /ingest
Need to normalize values (not parent companies) → invoke /map
Parent mapping done, ready to build a dive → invoke /dive
Parent mapping done, ready to verify → invoke /verify

/parent-map — "Who owns this company?"

You are building the single source of truth for parent company assignments across all data sources. Every dive that shows company-level data joins to this table. Get it wrong and market share, concentration metrics, and company trend lines are all broken.

関連 Skill

input_name          -- Company name as it appears in source data
code                -- Numeric/string join key (group code, cocode, contract ID, CCN)
code_type           -- What the code is: naic_group, naic_cocode, ma_contract, cms_ccn, etc.
current_parent_name -- Raw parent from parallel.ai
canonical_parent    -- Cleaned display name for dives (value-mapped)
ticker              -- Stock ticker if public, NULL if private
affiliation         -- Network: BCBS, Delta Dental, Kaiser Permanente, or NULL
sources             -- Citation URLs from parallel.ai
ownership_timeline  -- JSON array of ownership changes with dates
source              -- Where mapping came from: parallel.ai, manual, naic_demographics
created_at          -- When mapping was created

-- NAIC: two-stage join (group first, cocode fallback)
LEFT JOIN staging.stg_parent_mapping m_group
  ON d.group_code = m_group.code AND m_group.code_type = 'naic_group'
LEFT JOIN staging.stg_parent_mapping m_cocode
  ON d.cocode = m_cocode.code AND m_cocode.code_type = 'naic_cocode'
-- Result: COALESCE(m_group.canonical_parent, m_cocode.canonical_parent, d.company_name)

-- MA Enrollment: direct join on contract
LEFT JOIN staging.stg_parent_mapping m
  ON d.contract_number = m.code AND m.code_type = 'ma_contract'

-- SNF/HHA Cost Reports: join on CCN
LEFT JOIN staging.stg_parent_mapping m
  ON d.provider_ccn = m.code AND m.code_type = 'cms_ccn'

-- Generic: any dataset with a code
LEFT JOIN staging.stg_parent_mapping m
  ON d.{code_column} = m.code AND m.code_type = '{type}'

mcp__soria__warehouse_query(sql="
WITH src_entities AS (
  SELECT DISTINCT
    {code_column} AS code,
    '{code_type}' AS code_type,
    {name_column} AS entity_name
  FROM {source_table}
  WHERE {code_column} IS NOT NULL
)
SELECT
  COUNT(*) AS total,
  COUNT(CASE WHEN m.code IS NOT NULL THEN 1 END) AS matched,
  COUNT(CASE WHEN m.code IS NULL THEN 1 END) AS unmapped
FROM src_entities e
LEFT JOIN staging.stg_parent_mapping m
  ON e.code = m.code AND e.code_type = m.code_type
")

-- Grouped companies: one per group_code (preferred)
SELECT DISTINCT group_code AS code, 'naic_group' AS code_type, group_name
FROM staging.stg_naic_demographics
WHERE group_code IS NOT NULL AND group_code != '0'

-- Ungrouped: one per cocode (fallback)
SELECT DISTINCT cocode AS code, 'naic_cocode' AS code_type, full_company_name
FROM staging.stg_naic_demographics
WHERE group_code IS NULL OR group_code = '0'

SELECT DISTINCT contract_number AS code, 'ma_contract' AS code_type, parent_organization
FROM staging.stg_ma_monthly_enrollment_by_plan

SELECT DISTINCT provider_ccn AS code, 'cms_ccn' AS code_type, facility_name
FROM staging.stg_snf_cost_reports

{
  "current_parent_name": {
    "type": "string",
    "description": "Current ultimate parent company name as of 2025"
  },
  "ticker": {
    "type": ["string", "null"],
    "description": "Stock ticker if publicly traded, null if private"
  },
  "ownership_timeline": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "parent_name": {"type": "string"},
        "from_date": {"type": ["string", "null"], "description": "YYYY-MM-DD or YYYY"},
        "to_date": {"type": ["string", "null"], "description": "YYYY-MM-DD or YYYY, null if current"},
        "event": {"type": "string", "description": "founding, rename, acquisition, merger, spinoff, current, independent"}
      }
    }
  }
}

Good:  "Anthem Inc - U.S. health insurance holding company, NAIC group 671. Rebranded to Elevance Health in 2022."
Good:  "H5216 - CMS Medicare Advantage contract operated by Humana Insurance Company in Kentucky."
Bad:   "Anthem Inc"
Bad:   "H5216"

# Create task group
group = client.beta.taskGroup.create({})

# Add runs in chunks of 100
for chunk in chunks(inputs, 100):
    client.beta.taskGroup.addRuns(group.taskgroup_id, {
        "default_task_spec": {"output_schema": schema},
        "inputs": [{"input": c, "processor": "base"} for c in chunk]
    })

# Poll for completion
while True:
    status = client.beta.taskGroup.retrieve(group.taskgroup_id)
    if not status.is_active:
        break
    sleep(30)

# Stream results
results = client.beta.taskGroup.getRuns(group.taskgroup_id,
    include_input=True, include_output=True)

canonical = {
    'UnitedHealth Group Incorporated': 'UnitedHealth Group',
    'UnitedHealth Group, Inc.': 'UnitedHealth Group',
    'The Cigna Group (Cigna Corporation)': 'The Cigna Group',
    'Elevance Health, Inc.': 'Elevance Health',
    ...
}

for each canonical_parent:
    if ALL_CAPS and len > 4: flag
    if len > 40: flag (verbose parallel output not cleaned)
    if contains '(': flag (parenthetical)
    if starts with 'No ' or 'Not ': flag (failed lookup)
    if contains 'd/b/a': flag

bcbs_keywords = ['blue cross', 'blue shield', 'bcbs', 'anthem', 'elevance',
                  'carefirst', 'premera', 'cambia', 'wellmark', 'highmark',
                  'capital blue', 'guidewell', 'florida blue']
delta_keywords = ['delta dental']
kaiser_keywords = ['kaiser']

AETNA GRP → CVS Health
Anthem Inc Grp → Elevance Health
WELLCARE GRP → Centene
Spectrum Hlth GRP → Corewell Health

Save as CSV with full schema.

Upload via the standard scraper flow:

mcp__soria__scraper_upload_urls(scraper="parent_mapping", count=1)
# upload the CSV to the returned presigned URL
mcp__soria__scraper_confirm_uploads(scraper="parent_mapping")

Schema mappings (CSV):

mcp__soria__schema_mappings(action="read", group_id="{id}")
mcp__soria__schema_mappings(action="update", group_id="{id}", mappings={...})

Publish to warehouse with force:

mcp__soria__warehouse_manage(action="publish", group_id="{id}", force=True)

Re-run dbt for any downstream staging/intermediate that reads from stg_parent_mapping (dbt run --select +stg_parent_mapping+).

mcp__soria__warehouse_query(sql="
WITH src_entities AS (
  SELECT DISTINCT
    {code_column} AS code,
    '{code_type}' AS code_type
  FROM {source_table}
  WHERE {code_column} IS NOT NULL
)
SELECT
  COUNT(*) AS total,
  COUNT(CASE WHEN m.code IS NOT NULL THEN 1 END) AS matched
FROM src_entities e
LEFT JOIN staging.stg_parent_mapping m
  ON e.code = m.code AND e.code_type = m.code_type
")

-- Deduplicate: one parent per code+code_type
ded_parent_mapping AS (
  SELECT code, code_type, canonical_parent, ticker, affiliation
  FROM staging.stg_parent_mapping
  QUALIFY ROW_NUMBER() OVER (
    PARTITION BY code, code_type ORDER BY input_name
  ) = 1
),

-- Join to source data
jnd_add_parent AS (
  SELECT s.*,
    m.canonical_parent AS parent_company,
    m.affiliation,
    m.ticker
  FROM {source_cte} s
  LEFT JOIN ded_parent_mapping m
    ON s.{code_column} = m.code AND m.code_type = '{code_type}'
)

jnd_add_parent AS (
  SELECT j.*,
    COALESCE(
      m_group.canonical_parent,
      m_cocode.canonical_parent,
      j.full_company_name
    ) AS parent_company,
    COALESCE(m_group.affiliation, m_cocode.affiliation) AS affiliation,
    COALESCE(m_group.ticker, m_cocode.ticker) AS ticker
  FROM source j
  LEFT JOIN ded_parent_mapping m_group
    ON j.group_code = m_group.code AND m_group.code_type = 'naic_group'
    AND j.group_code IS NOT NULL AND j.group_code != '0'
  LEFT JOIN ded_parent_mapping m_cocode
    ON j.cocode = m_cocode.code AND m_cocode.code_type = 'naic_cocode'
)

cat > ~/.soria-stack/artifacts/parent-map-$(date +%Y%m%d-%H%M%S).md << 'ARTIFACT'
# Parent Mapping: [Data Source]

## Scope
[Code type, source table, total entities]

## Coverage
[Already mapped, newly mapped, failures collapsed to self-named]

## Parallel.ai Run
[Task group ID, success rate, cost]

## Reconciliation
[Canonical parents created, affiliations tagged, name variants added]

## Match Rate
[100% achieved? If not, what's unmatched and why]

## dbt Model
[Which intermediate/marts model was wired up, join pattern used]

## Outcome
Status: [DONE | DONE_WITH_CONCERNS | BLOCKED | NEEDS_CONTEXT]
Lesson: [What was interesting or unexpected]
ARTIFACT