Verify these every time this skill is invoked.
Always pull the latest reference files before proceeding — schema and query patterns are updated regularly.
cd ~/hai-growth-claude && git pull
If the pull succeeds, continue. If it fails (e.g. no network, auth issue), note it to the user and continue with local files.
gcloud --version
If not found: brew install --cask gcloud-cli
Do NOT rely on `gcloud auth list` — it shows expired tokens as "active". Test with an actual API call:
bq show --format=prettyjson hs-ai-production:hai_dev.fact_fellow_perf 2>&1 | head -5
If auth error, run the auth commands directly — do NOT print them for the user to copy-paste. Execute them in the terminal:
gcloud auth login --enable-gdrive-access
gcloud auth application-default login
These commands will open a browser for the user to complete OAuth. Wait for each to finish before proceeding.
--enable-gdrive-access is required — the eligibility filter queries hai_on_hold, a Google Sheets-backed table that needs Drive OAuth scope.
Before running your first bq query, present the user with a plan and get approval:
Use EnterPlanMode and wait for approval before executing. Once the user approves the plan, proceed with execution — do not ask for approval again for follow-up queries within the same task (e.g., retries, refinements, exports, or verification queries).
Read this FIRST to decide what to do.
User asks about...
│
├─ "eligible/available fellows" or "who can work on X" or CSV export
│ → STOP. Read references/eligibility.md NOW. Use the Standard Output Query.
│
├─ fellow matching (education, domain, skills, experience)
│ → STOP. Read references/eligibility.md § "Education & Background Queries".
│ → Apply dual verification (hai_profiles_dim + resumes). Both sources required.
│
├─ fellow/reviewer performance, approval rates, TIC, AHT
│ → Read references/query-patterns.md § "Approval Rates" or § "Reviewer Performance"
│ → Tables: fact_fellow_perf, fact_reviewer_perf, fact_task_activity
│
├─ onboarding funnel
│ → Read references/query-patterns.md § "Funnel Analysis & Drop-Off"
│ → Table: fact_project_funnel
│
├─ funnel flags, Census sync, Fivetran sync, drip comms, Iterable segments, screener flags
│ → STOP. Read references/onboarding-funnel-drip-campaign-setup.md NOW.
│ → Event-driven sync: one row per (fellow × milestone), keyed by deterministic event_id.
│ → Ask user: project_id, slug, assessment y/n, and whether project uses Otter/Feather screener.
│
├─ paid marketing spend, impressions, clicks, CTR, CPM, cross-channel spend
│ → STOP. Read references/fact-paid-marketing.md NOW. Unified table across LinkedIn, Meta, Reddit, Google.
│ → Use `fact_paid_marketing` for spend/impressions/clicks queries. No microcurrency conversion needed.
│
├─ Indeed effectiveness, Indeed job title performance, Indeed cost per applicant
│ → STOP. Read references/fact-paid-marketing.md § "Indeed Effectiveness — Spend × Conversions" NOW.
│ → Join diamond_growth_indeed (spend) with diamond_growth_ashby (conversions, filter source = Indeed).
│
├─ ad campaign performance (Meta, Google, LinkedIn, Reddit, Indeed, ZipRecruiter)
│ → See Team Workflows § "Marketing: Ad campaign performance" below for table pointers.
│
├─ attribution, UTM, sign-ups by channel, cost per sign-up, funnel conversion by campaign
│ → STOP. Read references/fact-hai-attribution.md NOW. Best attribution table — enriched campaign/ad names, Indeed backfill.
│
├─ Reddit ads (spend, impressions, subreddit targeting, conversions)
│ → STOP. Read references/reddit-ads-tables.md NOW. Spend is in microcurrency (÷ 1,000,000).
│ → Default table: campaign_report joined to campaign for names.
│
├─ engagement, engagement score, engagement bucket
│ → STOP. Read references/engagement-score.md NOW.
│ → Classifies fellows into no/low/medium/high engagement tiers from fact_project_funnel.
│
├─ lifecycle comms, email communications, push notifications, fellows invited/onboarding emails
│ → STOP. Read references/lifecycle-comms.md NOW. No profile_id — join via user_id or email.
│ → **Very large table (~13B rows).** Always filter by sent_at date range.
│
├─ Otter / Feather
│ → STOP. Read references/otter-tables.md NOW. Different identity model (email, not profile_id).
│ → Read references/query-patterns-otter.md for all Otter SQL patterns.
│
├─ referrals, referral incentives, referral payouts, referral-to-project mapping
│ → See Team Workflows § "Marketing: Referrals" below.
│ → Tables: hai_public.referrals + hs-ai-sandbox.hai_dev.referrals_project_match (join on incentive ID).
│
└─ anything else (UTM, Ashby, resume lookup)
→ See Team Workflows below, then check references/query-patterns.md
You MUST read the relevant reference files before writing any SQL. Do not guess column names or query patterns.
| Situation | Read this file FIRST |
|---|---|
| "Who is eligible/available?" | references/eligibility.md — MANDATORY. Contains the full Standard Output Query (4 CTEs + SELECT + JOINs + WHERE). You must apply every criterion. Do not skip any. |
| Education, degrees, background | references/eligibility.md — contains the Dual-Source Rule: you must query BOTH hai_profiles_dim AND hai_public.resumes. |
| Approval rates, fellow counts | references/query-patterns.md § "Approval Rates by Project", § "Active Fellow Counts" |
| Onboarding funnel | references/query-patterns.md § "Funnel Analysis & Drop-Off" |
| Funnel flags, Census/Fivetran sync, drip comms queries, screener flags | references/onboarding-funnel-drip-campaign-setup.md — MANDATORY. Event-driven sync templates (one row per fellow × milestone). Ask user for project_id, slug, assessment y/n, and Otter y/n before generating. |
| Resume search (keywords, experience, education) | references/query-patterns.md § "Resume Keyword Search", § "Resume Experience & Project Extraction" |
| Reviewer performance (R1/R2) | references/query-patterns.md § "Reviewer Performance (R1/R2)" |
| Task lifecycle, comments, block values | references/query-patterns.md § "Task Lifecycle Analysis", § "Comment / Quality Analysis", § "Block Values Analysis" |
| Otter/Feather campaigns | references/query-patterns-otter.md — approval rates, campaign health, cross-table joins + references/otter-tables.md for schemas |
| Paid marketing spend, impressions, clicks (cross-channel, including Indeed) | references/fact-paid-marketing.md — fact_paid_marketing (LinkedIn, Meta, Reddit, Google) + diamond_growth_indeed (Indeed). Spend in USD. UNION ALL pattern included. |
| Indeed effectiveness, job title ROI, cost per applicant | references/fact-paid-marketing.md § "Indeed Effectiveness — Spend × Conversions" — join diamond_growth_indeed (spend) with diamond_growth_ashby (conversions). |
| Attribution, UTM source, sign-ups by campaign, cost per sign-up/FO/allocated | references/fact-hai-attribution.md — best attribution table. Enriched campaign/ad/adset names, Indeed backfill, funnel stage definitions, cost metric formulas. |
| Fellow engagement score / engagement tiers | references/engagement-score.md — classifies fellows into no/low/medium/high engagement from fact_project_funnel email open + funnel milestones. |
| Lifecycle comms, email comms, push notifications, fellows invited | references/lifecycle-comms.md — standalone reference. No profile_id — join via user_id or email_address. ~13B rows — always filter by sent_at. |
| Reddit ads (spend, targeting, conversions) | references/reddit-ads-tables.md for schemas, joins, and query patterns. Spend is microcurrency. |
| Column names or types | references/fact-tables.md or references/dimension-tables.md. If still unsure, run bq show --format=prettyjson PROJECT:SCHEMA.TABLE. |
If BigQuery says "Unrecognized name" — stop, run bq show on the table, and find where the column actually lives.
| Want this column? | It's NOT in | It IS in | Join on |
|---|---|---|---|
| email | hai_profiles_dim | hai_public.profiles | profile_id = profiles.id |
| full_name | hai_profiles_dim | hai_public.profiles | profile_id = profiles.id |
| status | hai_profiles_dim | hai_public.profiles | profile_id = profiles.id |
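For instance, pulling email or status alongside dimension fields requires the join in the table above. A minimal sketch (dataset qualification for hai_profiles_dim assumed — verify in references/dimension-tables.md):

```sql
-- Enrich hai_profiles_dim rows with email/status from hai_public.profiles.
-- Join key per the mapping table: profile_id = profiles.id
SELECT
  d.profile_id,
  p.email,
  p.status,
  d.highest_education_level
FROM `hs-ai-production.hai_dev.hai_profiles_dim` d
JOIN `hs-ai-production.hai_public.profiles` p
  ON d.profile_id = p.id
LIMIT 10
```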
These are the most common errors users hit. Follow these rules strictly.
Always wrap SQL in single quotes. Backticks (for BQ table names) work inside single quotes without escaping. Double quotes with backslash-escaped backticks (\`) will fail.
# CORRECT — single quotes, backticks just work
bq query --use_legacy_sql=false --format=csv '
SELECT * FROM `hs-ai-production.hai_dev.fact_fellow_perf` LIMIT 10
'
# WRONG — double quotes require escaping backticks, which breaks
bq query --use_legacy_sql=false --format=csv "
SELECT * FROM \`hs-ai-production.hai_dev.fact_fellow_perf\` LIMIT 10
"
If your SQL contains single quotes (e.g., WHERE status = 'verified'), you cannot nest them inside a single-quoted shell string. Use a heredoc instead:
bq query --use_legacy_sql=false --format=csv <<'EOF'
SELECT * FROM `hs-ai-production.hai_public.profiles`
WHERE status = 'verified'
EOF
Always use <<'EOF' (quoted EOF) so the shell does not interpret backticks or variables inside the heredoc.
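A quick local demonstration of why the quoted delimiter matters — no bq needed; the table name here is a placeholder:

```shell
# With <<'EOF' the shell leaves $VARS and `backticks` untouched;
# with <<EOF it expands them before the text reaches the command.
quoted=$(cat <<'EOF'
SELECT * FROM `proj.ds.table` WHERE note = '$HOME'
EOF
)
unquoted=$(cat <<EOF
$HOME
EOF
)
echo "$quoted"    # $HOME and backticks survive literally
echo "$unquoted"  # $HOME was expanded by the shell
```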
Users do NOT have bigquery.jobs.create on handshake-production. Always run queries from hs-ai-production (the default project). Reference handshake-production tables via fully-qualified names:
# CORRECT — job runs on hs-ai-production, references handshake-production table
bq query --use_legacy_sql=false '
SELECT * FROM `handshake-production.hai_dev.fact_comments` LIMIT 10
'
# WRONG — explicitly sets project to handshake-production
bq query --project_id=handshake-production --use_legacy_sql=false '...'
Never compare TIMESTAMP and DATE directly. Always cast to the same type:
-- CORRECT
WHERE DATE(timestamp_col) >= DATE '2026-01-01'
WHERE timestamp_col >= TIMESTAMP('2026-01-01')
-- WRONG — will error: "No matching signature for operator >="
WHERE timestamp_col >= DATE '2026-01-01'
Always ensure query results have no duplicate rows. Use DISTINCT or QUALIFY ROW_NUMBER():
-- For resumes (multiple per profile), take latest:
QUALIFY ROW_NUMBER() OVER (PARTITION BY r.profileId ORDER BY r.updated_at DESC) = 1
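As a fuller sketch, deduplicating to one latest resume per profile looks like this (columns and the PROCESSED filter taken from the resume patterns elsewhere in this doc):

```sql
-- One row per profile: keep only the most recently updated processed resume.
SELECT
  r.profileId,
  r.updated_at,
  r.parsed_data
FROM `hs-ai-production.hai_public.resumes` r
WHERE r.status = 'PROCESSED'
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY r.profileId ORDER BY r.updated_at DESC
) = 1
```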
When matching company + role together (e.g., "Google software engineers"), do NOT use broad TO_JSON_STRING LIKE — it matches keywords across unrelated sections. Instead, UNNEST the experience array:
-- CORRECT — matches company + position in the SAME job entry
FROM `hs-ai-production.hai_public.resumes` r,
UNNEST(JSON_EXTRACT_ARRAY(r.parsed_data, '$.experience')) AS exp
WHERE LOWER(JSON_EXTRACT_SCALAR(exp, '$.company')) LIKE '%google%'
AND LOWER(JSON_EXTRACT_SCALAR(exp, '$.position')) LIKE '%software engineer%'
-- WRONG — "google" could be in skills, "software engineer" in a different job
WHERE LOWER(TO_JSON_STRING(r.parsed_data)) LIKE '%google%'
AND LOWER(TO_JSON_STRING(r.parsed_data)) LIKE '%software engineer%'
Use broad TO_JSON_STRING LIKE only for single-keyword searches (e.g., "has React experience anywhere").
This UNNEST pattern applies to any multi-field resume search: company + title, school + degree, etc.
Do NOT use cat heredocs or $() command substitution for CSV export — these trigger Claude Code security prompts. Use printf + bq query >> file as shown in the Google Drive CSV Export section.
Criteria varies per project (education, domain, experience, location). Follow these rules in order:
Step 1 (only if applicable): Apply eligibility filter.
STOP — if the user said "eligible", "available", or "who can work on X", or you are exporting a CSV, you MUST read references/eligibility.md NOW and use the Standard Output Query as your base. Do not proceed without reading it.
If the user is just searching for fellows by background/skills without mentioning availability, skip this step.
Before running the query, ask the user: "Is this for an Otter/Feather project?" If yes, the Otter-specific eligibility filters (KYC verification + IPRoyal proxy exclusion) MUST be applied in the WHERE clause. See references/eligibility.md for the exact lines.
Step 2: Match criteria using dual verification.
STOP — if you are filtering by education, domain, expertise, or background, you MUST read references/eligibility.md § "Education & Background Queries (Dual-Source Rule)" NOW. Do not write SQL until you have read the join patterns.
Any filter on education, domain, expertise, or background MUST be verified in both sources:
- hai_profiles_dim: structured fields (highest_education_level, domain, subdomain, major, major_group, graduate_institution_name)
- hai_public.resumes: parsed resume JSON (parsed_data.education[].degree, parsed_data.education[].fieldOfStudy, parsed_data.experience[].position)

Do not match on Source A alone — domain = 'STEM' may miss a fellow with a Ph.D. in Astrophysics classified under a different subdomain. Do not match on Source B alone — parsed resume data can be noisy or incomplete. Use both and include columns from each in the output so the user can verify.
See references/eligibility.md § "Education & Background Queries (Dual-Source Rule)" for join patterns.
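A minimal shape for the dual verification — this only illustrates the "query both sources, surface both columns" rule; the real join patterns live in eligibility.md, and the dataset qualification plus the 'PhD' literal are assumptions:

```sql
-- Dual-source check: structured dim fields AND parsed-resume fields in one output.
SELECT
  d.profile_id,
  d.highest_education_level,                                          -- Source A: structured
  d.major,                                                            -- Source A: structured
  JSON_EXTRACT_SCALAR(edu, '$.degree')        AS resume_degree,       -- Source B: resume
  JSON_EXTRACT_SCALAR(edu, '$.fieldOfStudy')  AS resume_field_of_study -- Source B: resume
FROM `hs-ai-production.hai_dev.hai_profiles_dim` d
JOIN `hs-ai-production.hai_public.resumes` r
  ON r.profileId = d.profile_id,
UNNEST(JSON_EXTRACT_ARRAY(r.parsed_data, '$.education')) AS edu
WHERE d.highest_education_level = 'PhD'
   OR LOWER(JSON_EXTRACT_SCALAR(edu, '$.degree')) LIKE '%ph.d%'
```

Surfacing both resume_* and dim columns in the output lets the user spot disagreements between the two sources.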
Step 3: Apply additional filters. Layer on as needed:
- Location: hai_profiles_dim (state_code, country_code)
- Resume keyword search: hai_public.resumes with LOWER(TO_JSON_STRING(parsed_data)) LIKE '%keyword%' (single-keyword only — for multi-field matching like company + role, use UNNEST; see Error Prevention)
- Resume link: hai_profiles_dim.resume_url_in_product
- Resume summary: hai_public.resumes.short_parsed_data (72% populated; use parsed_data as fallback)
- Work experience: hai_public.resumes.parsed_data → $.experience[] (95% populated)
- Performance: fact_fellow_perf (pre-computed: approval rate, AHT, TIC, hours)
- Active fellows: fact_fellow_perf WHERE active = TRUE
- Reviewer performance: fact_reviewer_perf
- Task detail: fact_tasks or fact_task_activity
- Comments: fact_comments
- Block values: fact_block_values
- Onboarding funnel: fact_project_funnel (PSO → Assessment → Contract → First Claim → First Submit → First Approval)
- Paid marketing: hs-ai-production.hai_dev.fact_paid_marketing — daily ad-level data across LinkedIn, Meta, Reddit, Google. Spend already in USD. Use this for cross-channel comparisons, total spend, CTR, CPM.

Default to campaign-level reports unless the user asks for ad or keyword detail.
- Meta: hs-ai-production.hai_facebook_ads.basic_campaign (default), also basic_ad, basic_ad_set + *_actions and *_cost_per_action_type views
- Google: hs-ai-production.hai_google_ads_google_ads.google_ads__campaign_report (default), also google_ads__ad_report, google_ads__keyword_report
- LinkedIn: hs-ai-production.hai_external_linkedin_ads.linkedin_ads__campaign_report (default), also linkedin_ads__creative_report
- Reddit: hs-ai-production.reddit_ads.campaign_report (default), also ad_group_report, ad_report + *_conversions_report variants. Spend is in microcurrency (÷ 1,000,000). Join to campaign for names. See references/reddit-ads-tables.md.
- Indeed: hs-ai-sandbox.hai_dev.diamond_growth_indeed (Google Sheets-backed)
- ZipRecruiter: hs-ai-sandbox.hai_dev.growth_ziprecruiter (Google Sheets-backed)
- UTM / sign-up attribution: hai_user_growth_dim JOIN hai_profiles_dim
- Ashby: diamond_growth_ashby
- Referrals: hai_public.referrals (created_at, status, awarded_at, paid_at, incentive_amount_cents)
- Referral-to-project mapping: hs-ai-sandbox.hai_dev.referrals_project_match (Google Sheets-backed). Maps incentive_rule_id to project name/ID with incentive amount and status.
- Join: hai_public.referrals.referral_incentive_id = referrals_project_match.incentive_rule_id
- Units: referrals stores incentive_amount_cents (divide by 100 for dollars); referrals_project_match stores incentive_amount as STRING

STOP — you MUST read references/otter-tables.md NOW before writing any Otter/Feather SQL. The identity model, statuses, and grouping are all different from HAI. Do not guess.
Key differences from HAI:
- Grouping: campaign + task_batch (not project_id)
- Identity: email (not profile_id) — map via fact_otter_email_mapping
- Statuses: completed, signed_off, needs_work, fixing_done
- Tables: fact_otter_task_activity, fact_otter_tasks, fact_otter_comments, fact_otter_fellow_perf, fact_otter_reviewer_perf

| Setting | Value |
|---|---|
| BQ Projects | handshake-production, hs-ai-production, hs-ai-sandbox |
| Schemas | hai_dev (curated fact/dim), hai_public (raw platform), hai_facebook_ads, hai_google_ads_google_ads, hai_external_linkedin_ads, reddit_ads |
| Refresh | Hourly via Airflow |
| Region | US only |
GPA — stored as integers (385 = 3.85): current_cumulative_gpa / 100.0 AS gpa
Safe Division — always use NULLIF: SUM(x) / NULLIF(SUM(y), 0)
Approval Rate — from fact_fellow_perf (stored 0.0–1.0): ROUND(approval_rate * 100, 2) AS approval_rate_pct
TIC — (total_major_issues + 0.33 * total_minor_issues) / NULLIF(tasks_attempted, 0)
Week Truncation — always Monday: DATE_TRUNC(DATE(col), WEEK(MONDAY))
Time Columns:
| Pattern | Meaning |
|---|---|
*_raw | Uncapped — inflated by timers left on |
*_capped | Capped at 1.5x time limit — best for analysis |
payable_* | Capped at time limit — matches billing |
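The conventions above combine like this in practice — a sketch only; `last_activity_at` is an assumed column name, so verify it (and the others) against references/fact-tables.md before running:

```sql
-- Weekly fellow performance using the house conventions:
-- Monday weeks, percentage-scaled approval rate, NULLIF-safe TIC.
SELECT
  DATE_TRUNC(DATE(last_activity_at), WEEK(MONDAY))  AS week,              -- always Monday
  ROUND(AVG(approval_rate) * 100, 2)                AS approval_rate_pct, -- stored 0.0-1.0
  SUM(total_major_issues + 0.33 * total_minor_issues)
    / NULLIF(SUM(tasks_attempted), 0)               AS tic                -- safe division
FROM `hs-ai-production.hai_dev.fact_fellow_perf`
GROUP BY week
ORDER BY week
```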
- NOT LOWER(email) LIKE '%@joinhandshake.com%'
- active = TRUE (activity within last 7 days)
- r.status = 'PROCESSED'
- LOWER(p.status) = 'verified'

Every ops/fellow query MUST include these columns, in this order.
| # | Column | Source |
|---|---|---|
| 1 | profile_id | hai_profiles_dim or profiles |
| 2 | email | hai_public.profiles |
| 3 | first_name | hai_profiles_dim or profiles |
| 4 | last_name | hai_profiles_dim or profiles |
| 5 | status | hai_public.profiles |
| 6 | current_onboarding_stage | hai_public.profiles |
| 7 | resume_url_in_product | hai_profiles_dim |
| 8 | highest_education_level | hai_profiles_dim |
| 9 | domain | hai_profiles_dim |
| 10 | subdomain | hai_profiles_dim |
| # | Column | Source | Purpose |
|---|---|---|---|
| 11 | available | Computed from eligibility CTEs (see references/eligibility.md) | Final availability verdict: Available - Idle, Available - Project Paused, or Unavailable - Active |
| 12 | current_project | fact_fellow_status | Shows what project the fellow is on (context for availability) |
| 13 | last_activity | fact_fellow_status | Last activity date (idle if 20+ days ago) |
| 14 | otter_ringfenced | fact_fellow_status | TRUE if Otter activity in last 30 days |
| 15 | on_hold | CASE WHEN oh.profile_id IS NOT NULL THEN TRUE ELSE FALSE END | TRUE if fellow is on the on-hold sheet |
| 16 | opt_cpt | survey_opt CTE | TRUE if fellow requires OPT/CPT sponsorship |
| 17 | country_code | hai_public.profiles | Must be US for eligibility |
Columns 12–17 are the eligibility breakdown — they show the underlying data behind the available verdict so anyone reading the CSV can verify the logic without re-running the query.
After the standard columns above, add:
- major, resume_degree, resume_field_of_study (if filtering by education)

The complete SQL that produces columns 1–17 (CTEs, SELECT, JOINs, and WHERE) is in references/eligibility.md under "Standard Output Query". Use that as your base query and extend it — do not write the column logic from scratch.
When the user asks to export results to CSV, you MUST follow this exact template.
ls /Users/*/Library/CloudStorage/GoogleDrive-* 2>/dev/null
If no Drive found, skip export (results are already in the terminal).
mkdir -p "<gdrive_path>/My Drive/claude-bq"
printf '# source_tables: <tables>\n# query_date: YYYY-MM-DD\n# query: <summary>\n' > "<path>/YYYY-MM-DD_<description>.csv"
bq query --format=csv --use_legacy_sql=false <<'EOF' >> "<path>/YYYY-MM-DD_<description>.csv"
SELECT ...
EOF
wc -l < "<path>/YYYY-MM-DD_<description>.csv"
Report the row count to the user (subtract 4 for the metadata header lines + CSV column header).
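A filled-in example of the template. The date, description, and query are hypothetical, the path is /tmp rather than a real Drive mount, and the bq step is guarded with `command -v` so the snippet degrades gracefully when bq is absent or unauthenticated:

```shell
# Real exports go under "<gdrive_path>/My Drive/claude-bq"; /tmp is for illustration.
f="/tmp/2026-02-10_verified-fellows.csv"

# Metadata header: printf, never cat heredocs or $() (see Error Prevention).
printf '# source_tables: hs-ai-production.hai_public.profiles\n# query_date: 2026-02-10\n# query: verified fellow emails\n' > "$f"

# Query output appended below the header.
if command -v bq >/dev/null 2>&1; then
  bq query --format=csv --use_legacy_sql=false <<'EOF' >> "$f"
SELECT id, email FROM `hs-ai-production.hai_public.profiles`
WHERE LOWER(status) = 'verified'
EOF
fi

wc -l < "$f"   # report this count to the user, minus 4 header lines
```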
- Filename: YYYY-MM-DD_<description>.csv (lowercase, hyphens, max 60 chars)
- Metadata header: # source_tables, # query_date, and # query lines
- printf for header, heredoc for query, >> to append — do NOT use cat heredocs for the metadata, $() substitution, or backslash-escaped paths (see Error Prevention)

Always cite source table(s) when presenting results:
Sources:
hs-ai-production.hai_dev.fact_fellow_perf, hs-ai-production.hai_dev.fact_task_activity
| File | Contents | Read when... |
|---|---|---|
| references/eligibility.md | Full eligibility filter + education dual-source rule | User asks about eligible/available fellows, or any education query |
| references/fact-tables.md | Column schemas for fact tables (fellow perf, tasks, reviewer perf, block values) | You need exact column names, types, or join keys |
| references/otter-tables.md | Column schemas for 5 Otter/Feather tables | You need Otter table schemas |
| references/dimension-tables.md | Schemas for hai_profiles_dim, hai_user_growth_dim, and hai_public tables (resumes, profiles) | You need profile dimensions or resume data |
| references/reddit-ads-tables.md | Column schemas for 24 Reddit Ads tables (Fivetran sync) | You need Reddit ad performance, conversions, or targeting data |
| references/fact-paid-marketing.md | Unified daily ad-level spend/impressions/clicks across LinkedIn, Meta, Reddit, Google + Indeed (diamond_growth_indeed) | Paid marketing spend, cross-channel spend comparison, CTR, CPM, Indeed job spend |
| references/fact-hai-attribution.md | Best attribution table — enriched UTM/campaign/ad/adset names, Indeed backfill, funnel stage definitions, cost metric formulas | Attribution, sign-ups by channel/campaign, cost per sign-up/FO/allocated/activated |
| references/lifecycle-comms.md | Schema, efficiency rules, and query patterns for lifecycle_communication_messages (~13B rows) | Lifecycle comms, email/push engagement, onboarding emails, HAI communications |
| references/query-patterns.md | Fellow search, resume patterns, funnel analysis, active counts, cross-table joins | You're writing a fellow search, resume, or funnel query |
| references/query-patterns-otter.md | Otter/Feather SQL patterns: approval rates, campaign health, cross-table joins | You're writing any Otter or Feather SQL |
| references/onboarding-funnel-drip-campaign-setup.md | Event-driven drip sync templates (one row per fellow × milestone, keyed by event_id). HAI-only and HAI+Otter variants. | User asks for funnel flags, Census sync query, Iterable drip segments, or screener step flags |