Verify these every time this skill is invoked.
Always pull the latest reference files before proceeding — schema and query patterns are updated regularly.
cd ~/hai-growth-claude && git pull
If the pull succeeds, continue. If it fails (e.g. no network, auth issue), note it to the user and continue with local files.
gcloud --version
If not found: brew install --cask gcloud-cli
Do NOT rely on `gcloud auth list` — it shows expired tokens as "active". Test with an actual API call:
bq show --format=prettyjson hs-ai-production:hai_dev.fact_fellow_perf 2>&1 | head -5
If auth error, run the auth commands directly — do NOT print them for the user to copy-paste. Execute them in the terminal:
gcloud auth login --enable-gdrive-access
gcloud auth application-default login
These commands will open a browser for the user to complete OAuth. Wait for each to finish before proceeding.
--enable-gdrive-access is required — the eligibility filter queries hai_on_hold, a Google Sheets-backed table that needs Drive OAuth scope.
Before running your first bq query, present the user with a plan and get approval:
Use EnterPlanMode and wait for approval before executing. Once the user approves the plan, proceed with execution — do not ask for approval again for follow-up queries within the same task (e.g., retries, refinements, exports, or verification queries).
Read this FIRST to decide what to do.
User asks about...
│
├─ "eligible/available fellows" or "who can work on X" or CSV export
│ → STOP. Read references/eligibility.md NOW. Use the Standard Output Query.
│
├─ fellow matching (education, domain, skills, experience)
│ → STOP. Read references/eligibility.md § "Education & Background Queries".
│ → Apply dual verification (hai_profiles_dim + resumes). Both sources required.
│
├─ fellow/reviewer performance, approval rates, TIC, AHT
│ → Read references/query-patterns.md § "Approval Rates" or § "Reviewer Performance"
│ → Tables: fact_fellow_perf, fact_reviewer_perf, fact_task_activity
│
├─ onboarding funnel
│ → Read references/query-patterns.md § "Funnel Analysis & Drop-Off"
│ → Table: fact_project_funnel
│
├─ funnel flags, Census sync, Fivetran sync, drip comms, Iterable segments, screener flags
│ → STOP. Read references/onboarding-funnel-drip-campaign-setup.md NOW.
│ → Event-driven sync: one row per (fellow × milestone), keyed by deterministic event_id.
│ → Ask user: project_id, slug, assessment y/n, and whether project uses Otter/Feather screener.
│
├─ paid marketing spend, impressions, clicks, CTR, CPM, cross-channel spend
│ → STOP. Read references/fact-paid-marketing.md NOW. Unified table across LinkedIn, Meta, Reddit, Google.
│ → Use `fact_paid_marketing` for spend/impressions/clicks queries. No microcurrency conversion needed.
│
├─ Indeed effectiveness, Indeed job title performance, Indeed cost per applicant
│ → STOP. Read references/fact-paid-marketing.md § "Indeed Effectiveness — Spend × Conversions" NOW.
│ → Join diamond_growth_indeed (spend) with diamond_growth_ashby (conversions, filter source = Indeed).
│
├─ ad campaign performance (Meta, Google, LinkedIn, Reddit, Indeed, ZipRecruiter)
│ → See Team Workflows § "Marketing: Ad campaign performance" below for table pointers.
│
├─ attribution, UTM, sign-ups by channel, cost per sign-up, funnel conversion by campaign
│ → STOP. Read references/fact-hai-attribution.md NOW. Best attribution table — enriched campaign/ad names, Indeed backfill.
│
├─ Reddit ads (spend, impressions, subreddit targeting, conversions)
│ → STOP. Read references/reddit-ads-tables.md NOW. Spend is in microcurrency (÷ 1,000,000).
│ → Default table: campaign_report joined to campaign for names.
│
├─ engagement, engagement score, engagement bucket
│ → STOP. Read references/engagement-score.md NOW.
│ → Classifies fellows into no/low/medium/high engagement tiers from fact_project_funnel.
│
├─ lifecycle comms, email communications, push notifications, fellows invited/onboarding emails
│ → STOP. Read references/lifecycle-comms.md NOW. No profile_id — join via user_id or email.
│ → **Very large table (~13B rows).** Always filter by sent_at date range.
│
├─ Otter / Feather
│ → STOP. Read references/otter-tables.md NOW. Different identity model (email, not profile_id).
│ → Read references/query-patterns-otter.md for all Otter SQL patterns.
│
├─ referrals, referral incentives, referral payouts, referral-to-project mapping
│ → See Team Workflows § "Marketing: Referrals" below.
│ → Tables: hai_public.referrals + hs-ai-sandbox.hai_dev.referrals_project_match (join on incentive ID).
│
└─ anything else (UTM, Ashby, resume lookup)
→ See Team Workflows below, then check references/query-patterns.md
You MUST read the relevant reference files before writing any SQL. Do not guess column names or query patterns.
| Situation | Read this file FIRST |
|---|---|
| "Who is eligible/available?" | references/eligibility.md — MANDATORY. Contains the full Standard Output Query (4 CTEs + SELECT + JOINs + WHERE). You must apply every criterion. Do not skip any. |
| Education, degrees, background | references/eligibility.md — contains the Dual-Source Rule: you must query BOTH hai_profiles_dim AND hai_public.resumes. |
| Approval rates, fellow counts | references/query-patterns.md § "Approval Rates by Project", § "Active Fellow Counts" |
| Onboarding funnel | references/query-patterns.md § "Funnel Analysis & Drop-Off" |
| Funnel flags, Census/Fivetran sync, drip comms queries, screener flags | references/onboarding-funnel-drip-campaign-setup.md — MANDATORY. Event-driven sync templates (one row per fellow × milestone). Ask user for project_id, slug, assessment y/n, and Otter y/n before generating. |
| Resume search (keywords, experience, education) | references/query-patterns.md § "Resume Keyword Search", § "Resume Experience & Project Extraction" |
| Reviewer performance (R1/R2) | references/query-patterns.md § "Reviewer Performance (R1/R2)" |
| Task lifecycle, comments, block values | references/query-patterns.md § "Task Lifecycle Analysis", § "Comment / Quality Analysis", § "Block Values Analysis" |
| Otter/Feather campaigns | references/query-patterns-otter.md — approval rates, campaign health, cross-table joins + references/otter-tables.md for schemas |
| Paid marketing spend, impressions, clicks (cross-channel, including Indeed) | references/fact-paid-marketing.md — fact_paid_marketing (LinkedIn, Meta, Reddit, Google) + diamond_growth_indeed (Indeed). Spend in USD. UNION ALL pattern included. |
| Indeed effectiveness, job title ROI, cost per applicant | references/fact-paid-marketing.md § "Indeed Effectiveness — Spend × Conversions" — join diamond_growth_indeed (spend) with diamond_growth_ashby (conversions). |
| Attribution, UTM source, sign-ups by campaign, cost per sign-up/FO/allocated | references/fact-hai-attribution.md — best attribution table. Enriched campaign/ad/adset names, Indeed backfill, funnel stage definitions, cost metric formulas. |
| Fellow engagement score / engagement tiers | references/engagement-score.md — classifies fellows into no/low/medium/high engagement from fact_project_funnel email open + funnel milestones. |
| Lifecycle comms, email comms, push notifications, fellows invited | references/lifecycle-comms.md — standalone reference. No profile_id — join via user_id or email_address. ~13B rows — always filter by sent_at. |
| Reddit ads (spend, targeting, conversions) | references/reddit-ads-tables.md for schemas, joins, and query patterns. Spend is microcurrency. |
| Column names or types | references/fact-tables.md or references/dimension-tables.md. If still unsure, run bq show --format=prettyjson PROJECT:SCHEMA.TABLE. |
If BigQuery says "Unrecognized name" — stop, run bq show on the table, and find where the column actually lives.
| Want this column? | It's NOT in | It IS in | Join on |
|---|---|---|---|
| email | hai_profiles_dim | hai_public.profiles | profile_id = profiles.id |
| full_name | hai_profiles_dim | hai_public.profiles | profile_id = profiles.id |
| status | hai_profiles_dim | hai_public.profiles | profile_id = profiles.id |
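For instance, pulling email or status alongside dimension fields requires the join in the table above. A minimal sketch (dataset qualification for hai_profiles_dim assumed — verify in references/dimension-tables.md):

```sql
-- Enrich hai_profiles_dim rows with email/status from hai_public.profiles.
-- Join key per the mapping table: profile_id = profiles.id
SELECT
  d.profile_id,
  p.email,
  p.status,
  d.highest_education_level
FROM `hs-ai-production.hai_dev.hai_profiles_dim` d
JOIN `hs-ai-production.hai_public.profiles` p
  ON d.profile_id = p.id
LIMIT 10
```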
These are the most common errors users hit. Follow these rules strictly.
Always wrap SQL in single quotes. Backticks (for BQ table names) work inside single quotes without escaping. Double quotes with backslash-escaped backticks (\`) will fail.
# CORRECT — single quotes, backticks just work
bq query --use_legacy_sql=false --format=csv '
SELECT * FROM `hs-ai-production.hai_dev.fact_fellow_perf` LIMIT 10
'
# WRONG — double quotes require escaping backticks, which breaks
bq query --use_legacy_sql=false --format=csv "
SELECT * FROM \`hs-ai-production.hai_dev.fact_fellow_perf\` LIMIT 10
"
If your SQL contains single quotes (e.g., WHERE status = 'verified'), you cannot nest them inside a single-quoted shell string. Use a heredoc instead:
bq query --use_legacy_sql=false --format=csv <<'EOF'
SELECT * FROM `hs-ai-production.hai_public.profiles`
WHERE status = 'verified'
EOF
Always use <<'EOF' (quoted EOF) so the shell does not interpret backticks or variables inside the heredoc.
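A quick local demonstration of why the quoted delimiter matters — no bq needed; the table name here is a placeholder:

```shell
# With <<'EOF' the shell leaves $VARS and `backticks` untouched;
# with <<EOF it expands them before the text reaches the command.
quoted=$(cat <<'EOF'
SELECT * FROM `proj.ds.table` WHERE note = '$HOME'
EOF
)
unquoted=$(cat <<EOF
$HOME
EOF
)
echo "$quoted"    # $HOME and backticks survive literally
echo "$unquoted"  # $HOME was expanded by the shell
```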
Users do NOT have bigquery.jobs.create on handshake-production. Always run queries from hs-ai-production (the default project). Reference handshake-production tables via fully-qualified names:
# CORRECT — job runs on hs-ai-production, references handshake-production table
bq query --use_legacy_sql=false '
SELECT * FROM `handshake-production.hai_dev.fact_comments` LIMIT 10
'
# WRONG — explicitly sets project to handshake-production
bq query --project_id=handshake-production --use_legacy_sql=false '...'
Never compare TIMESTAMP and DATE directly. Always cast to the same type:
-- CORRECT
WHERE DATE(timestamp_col) >= DATE '2026-01-01'
WHERE timestamp_col >= TIMESTAMP('2026-01-01')
-- WRONG — will error: "No matching signature for operator >="
WHERE timestamp_col >= DATE '2026-01-01'
Always ensure query results have no duplicate rows. Use DISTINCT or QUALIFY ROW_NUMBER():
-- For resumes (multiple per profile), take latest:
QUALIFY ROW_NUMBER() OVER (PARTITION BY r.profileId ORDER BY r.updated_at DESC) = 1
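As a fuller sketch, deduplicating to one latest resume per profile looks like this (columns and the PROCESSED filter taken from the resume patterns elsewhere in this doc):

```sql
-- One row per profile: keep only the most recently updated processed resume.
SELECT
  r.profileId,
  r.updated_at,
  r.parsed_data
FROM `hs-ai-production.hai_public.resumes` r
WHERE r.status = 'PROCESSED'
QUALIFY ROW_NUMBER() OVER (
  PARTITION BY r.profileId ORDER BY r.updated_at DESC
) = 1
```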
When matching company + role together (e.g., "Google software engineers"), do NOT use broad TO_JSON_STRING LIKE — it matches keywords across unrelated sections. Instead, UNNEST the experience array:
-- CORRECT — matches company + position in the SAME job entry
FROM `hs-ai-production.hai_public.resumes` r,
UNNEST(JSON_EXTRACT_ARRAY(r.parsed_data, '$.experience')) AS exp
WHERE LOWER(JSON_EXTRACT_SCALAR(exp, '$.company')) LIKE '%google%'
AND LOWER(JSON_EXTRACT_SCALAR(exp, '$.position')) LIKE '%software engineer%'
-- WRONG — "google" could be in skills, "software engineer" in a different job
WHERE LOWER(TO_JSON_STRING(r.parsed_data)) LIKE '%google%'
AND LOWER(TO_JSON_STRING(r.parsed_data)) LIKE '%software engineer%'
Use broad TO_JSON_STRING LIKE only for single-keyword searches (e.g., "has React experience anywhere").
This UNNEST pattern applies to any multi-field resume search: company + title, school + degree, etc.
Do NOT use cat heredocs or $() command substitution for CSV export — these trigger Claude Code security prompts. Use printf + bq query >> file as shown in the Google Drive CSV Export section.
Criteria varies per project (education, domain, experience, location). Follow these rules in order:
Step 1 (only if applicable): Apply eligibility filter.
STOP — if the user said "eligible", "available", or "who can work on X", or you are exporting a CSV, you MUST read references/eligibility.md NOW and use the Standard Output Query as your base. Do not proceed without reading it.
If the user is just searching for fellows by background/skills without mentioning availability, skip this step.
Before running the query, ask the user: "Is this for an Otter/Feather project?" If yes, the Otter-specific eligibility filters (KYC verification + IPRoyal proxy exclusion) MUST be applied in the WHERE clause. See references/eligibility.md for the exact lines.
Step 2: Match criteria using dual verification.
STOP — if you are filtering by education, domain, expertise, or background, you MUST read references/eligibility.md § "Education & Background Queries (Dual-Source Rule)" NOW. Do not write SQL until you have read the join patterns.
Any filter on education, domain, expertise, or background MUST be verified in both sources:
- hai_profiles_dim: structured fields (highest_education_level, domain, subdomain, major, major_group, graduate_institution_name)
- hai_public.resumes: parsed resume JSON (parsed_data.education[].degree, parsed_data.education[].fieldOfStudy, parsed_data.experience[].position)

Do not match on Source A alone — domain = 'STEM' may miss a fellow with a Ph.D. in Astrophysics classified under a different subdomain. Do not match on Source B alone — parsed resume data can be noisy or incomplete. Use both and include columns from each in the output so the user can verify.
See references/eligibility.md § "Education & Background Queries (Dual-Source Rule)" for join patterns.
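A minimal shape for the dual verification — this only illustrates the "query both sources, surface both columns" rule; the real join patterns live in eligibility.md, and the dataset qualification plus the 'PhD' literal are assumptions:

```sql
-- Dual-source check: structured dim fields AND parsed-resume fields in one output.
SELECT
  d.profile_id,
  d.highest_education_level,                                          -- Source A: structured
  d.major,                                                            -- Source A: structured
  JSON_EXTRACT_SCALAR(edu, '$.degree')        AS resume_degree,       -- Source B: resume
  JSON_EXTRACT_SCALAR(edu, '$.fieldOfStudy')  AS resume_field_of_study -- Source B: resume
FROM `hs-ai-production.hai_dev.hai_profiles_dim` d
JOIN `hs-ai-production.hai_public.resumes` r
  ON r.profileId = d.profile_id,
UNNEST(JSON_EXTRACT_ARRAY(r.parsed_data, '$.education')) AS edu
WHERE d.highest_education_level = 'PhD'
   OR LOWER(JSON_EXTRACT_SCALAR(edu, '$.degree')) LIKE '%ph.d%'
```

Surfacing both resume_* and dim columns in the output lets the user spot disagreements between the two sources.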
Step 3: Apply additional filters. Layer on as needed:
- Location: hai_profiles_dim (state_code, country_code)
- Resume keyword search: hai_public.resumes with LOWER(TO_JSON_STRING(parsed_data)) LIKE '%keyword%' (single-keyword only — for multi-field matching like company + role, use UNNEST; see Error Prevention)
- Resume link: hai_profiles_dim.resume_url_in_product
- Resume summary: hai_public.resumes.short_parsed_data (72% populated; use parsed_data as fallback)
- Work experience: hai_public.resumes.parsed_data → $.experience[] (95% populated)
- Performance: fact_fellow_perf (pre-computed: approval rate, AHT, TIC, hours)
- Active fellows: fact_fellow_perf WHERE active = TRUE
- Reviewer performance: fact_reviewer_perf
- Task detail: fact_tasks or fact_task_activity
- Comments: fact_comments
- Block values: fact_block_values
- Onboarding funnel: fact_project_funnel (PSO → Assessment → Contract → First Claim → First Submit → First Approval)
- Paid marketing: hs-ai-production.hai_dev.fact_paid_marketing — daily ad-level data across LinkedIn, Meta, Reddit, Google. Spend already in USD. Use this for cross-channel comparisons, total spend, CTR, CPM.

Default to campaign-level reports unless the user asks for ad or keyword detail.
- Meta: hs-ai-production.hai_facebook_ads.basic_campaign (default), also basic_ad, basic_ad_set + *_actions and *_cost_per_action_type views
- Google: hs-ai-production.hai_google_ads_google_ads.google_ads__campaign_report (default), also google_ads__ad_report, google_ads__keyword_report
- LinkedIn: hs-ai-production.hai_external_linkedin_ads.linkedin_ads__campaign_report (default), also linkedin_ads__creative_report
- Reddit: hs-ai-production.reddit_ads.campaign_report (default), also ad_group_report, ad_report + *_conversions_report variants. Spend is in microcurrency (÷ 1,000,000). Join to campaign for names. See references/reddit-ads-tables.md.
- Indeed: hs-ai-sandbox.hai_dev.diamond_growth_indeed (Google Sheets-backed)
- ZipRecruiter: hs-ai-sandbox.hai_dev.growth_ziprecruiter (Google Sheets-backed)
- UTM / sign-up attribution: hai_user_growth_dim JOIN hai_profiles_dim
- Ashby: diamond_growth_ashby
- Referrals: hai_public.referrals (created_at, status, awarded_at, paid_at, incentive_amount_cents)
- Referral-to-project mapping: hs-ai-sandbox.hai_dev.referrals_project_match (Google Sheets-backed). Maps incentive_rule_id to project name/ID with incentive amount and status.
- Join: hai_public.referrals.referral_incentive_id = referrals_project_match.incentive_rule_id
- Units: referrals stores incentive_amount_cents (divide by 100 for dollars); referrals_project_match stores incentive_amount as STRING

STOP — you MUST read references/otter-tables.md NOW before writing any Otter/Feather SQL. The identity model, statuses, and grouping are all different from HAI. Do not guess.
Key differences from HAI:
- Grouping: campaign + task_batch (not project_id)
- Identity: email (not profile_id) — map via fact_otter_email_mapping
- Statuses: completed, signed_off, needs_work, fixing_done
- Tables: fact_otter_task_activity, fact_otter_tasks, fact_otter_comments, fact_otter_fellow_perf, fact_otter_reviewer_perf

| Setting | Value |
|---|---|
| BQ Projects | handshake-production, hs-ai-production, hs-ai-sandbox |
| Schemas | hai_dev (curated fact/dim), hai_public (raw platform), hai_facebook_ads, hai_google_ads_google_ads, hai_external_linkedin_ads, reddit_ads |
| Refresh | Hourly via Airflow |
| Region | US only |
GPA — stored as integers (385 = 3.85): current_cumulative_gpa / 100.0 AS gpa
Safe Division — always use NULLIF: SUM(x) / NULLIF(SUM(y), 0)
Approval Rate — from fact_fellow_perf (stored 0.0–1.0): ROUND(approval_rate * 100, 2) AS approval_rate_pct
TIC — (total_major_issues + 0.33 * total_minor_issues) / NULLIF(tasks_attempted, 0)
Week Truncation — always Monday: DATE_TRUNC(DATE(col), WEEK(MONDAY))
Time Columns:
| Pattern | Meaning |
|---|---|
*_raw | Uncapped — inflated by timers left on |
*_capped | Capped at 1.5x time limit — best for analysis |
payable_* | Capped at time limit — matches billing |
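The conventions above combine like this in practice — a sketch only; `last_activity_at` is an assumed column name, so verify it (and the others) against references/fact-tables.md before running:

```sql
-- Weekly fellow performance using the house conventions:
-- Monday weeks, percentage-scaled approval rate, NULLIF-safe TIC.
SELECT
  DATE_TRUNC(DATE(last_activity_at), WEEK(MONDAY))  AS week,              -- always Monday
  ROUND(AVG(approval_rate) * 100, 2)                AS approval_rate_pct, -- stored 0.0-1.0
  SUM(total_major_issues + 0.33 * total_minor_issues)
    / NULLIF(SUM(tasks_attempted), 0)               AS tic                -- safe division
FROM `hs-ai-production.hai_dev.fact_fellow_perf`
GROUP BY week
ORDER BY week
```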
- NOT LOWER(email) LIKE '%@joinhandshake.com%'
- active = TRUE (activity within last 7 days)
- r.status = 'PROCESSED'
- LOWER(p.status) = 'verified'

Every ops/fellow query MUST include these columns, in this order.
| # | Column | Source |
|---|---|---|
| 1 | profile_id | hai_profiles_dim or profiles |
| 2 | email | hai_public.profiles |
| 3 | first_name | hai_profiles_dim or profiles |
| 4 | last_name | hai_profiles_dim or profiles |
| 5 | status | hai_public.profiles |
| 6 | current_onboarding_stage | hai_public.profiles |
| 7 | resume_url_in_product | hai_profiles_dim |
| 8 | highest_education_level | hai_profiles_dim |
| 9 | domain | hai_profiles_dim |
| 10 | subdomain | hai_profiles_dim |
| # | Column | Source | Purpose |
|---|---|---|---|
| 11 | available | Computed from eligibility CTEs (see references/eligibility.md) | Final availability verdict: Available - Idle, Available - Project Paused, or Unavailable - Active |
| 12 | current_project | fact_fellow_status | Shows what project the fellow is on (context for availability) |
| 13 | last_activity | fact_fellow_status | Last activity date (idle if 20+ days ago) |
| 14 | otter_ringfenced | fact_fellow_status | TRUE if Otter activity in last 30 days |
| 15 | on_hold | CASE WHEN oh.profile_id IS NOT NULL THEN TRUE ELSE FALSE END | TRUE if fellow is on the on-hold sheet |
| 16 | opt_cpt | survey_opt CTE | TRUE if fellow requires OPT/CPT sponsorship |
| 17 | country_code | hai_public.profiles | Must be US for eligibility |
Columns 12–17 are the eligibility breakdown — they show the underlying data behind the available verdict so anyone reading the CSV can verify the logic without re-running the query.
After the standard columns above, add:
- major, resume_degree, resume_field_of_study (if filtering by education)

The complete SQL that produces columns 1–17 (CTEs, SELECT, JOINs, and WHERE) is in references/eligibility.md under "Standard Output Query". Use that as your base query and extend it — do not write the column logic from scratch.
When the user asks to export results to CSV, you MUST follow this exact template.
ls /Users/*/Library/CloudStorage/GoogleDrive-* 2>/dev/null
If no Drive found, skip export (results are already in the terminal).
mkdir -p "<gdrive_path>/My Drive/claude-bq"
printf '# source_tables: <tables>\n# query_date: YYYY-MM-DD\n# query: <summary>\n' > "<path>/YYYY-MM-DD_<description>.csv"
bq query --format=csv --use_legacy_sql=false <<'EOF' >> "<path>/YYYY-MM-DD_<description>.csv"
SELECT ...
EOF
wc -l < "<path>/YYYY-MM-DD_<description>.csv"
Report the row count to the user (subtract 4 for the metadata header lines + CSV column header).
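A filled-in example of the template. The date, description, and query are hypothetical, the path is /tmp rather than a real Drive mount, and the bq step is guarded with `command -v` so the snippet degrades gracefully when bq is absent or unauthenticated:

```shell
# Real exports go under "<gdrive_path>/My Drive/claude-bq"; /tmp is for illustration.
f="/tmp/2026-02-10_verified-fellows.csv"

# Metadata header: printf, never cat heredocs or $() (see Error Prevention).
printf '# source_tables: hs-ai-production.hai_public.profiles\n# query_date: 2026-02-10\n# query: verified fellow emails\n' > "$f"

# Query output appended below the header.
if command -v bq >/dev/null 2>&1; then
  bq query --format=csv --use_legacy_sql=false <<'EOF' >> "$f"
SELECT id, email FROM `hs-ai-production.hai_public.profiles`
WHERE LOWER(status) = 'verified'
EOF
fi

wc -l < "$f"   # report this count to the user, minus 4 header lines
```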
- Filename: YYYY-MM-DD_<description>.csv (lowercase, hyphens, max 60 chars)
- Metadata header: # source_tables, # query_date, and # query lines
- printf for header, heredoc for query, >> to append — do NOT use cat heredocs for the metadata, $() substitution, or backslash-escaped paths (see Error Prevention)

Always cite source table(s) when presenting results:
Sources:
hs-ai-production.hai_dev.fact_fellow_perf, hs-ai-production.hai_dev.fact_task_activity
| File | Contents | Read when... |
|---|---|---|
| references/eligibility.md | Full eligibility filter + education dual-source rule | User asks about eligible/available fellows, or any education query |
| references/fact-tables.md | Column schemas for fact tables (fellow perf, tasks, reviewer perf, block values) | You need exact column names, types, or join keys |
| references/otter-tables.md | Column schemas for 5 Otter/Feather tables | You need Otter table schemas |
| references/dimension-tables.md | Schemas for hai_profiles_dim, hai_user_growth_dim, and hai_public tables (resumes, profiles) | You need profile dimensions or resume data |
| references/reddit-ads-tables.md | Column schemas for 24 Reddit Ads tables (Fivetran sync) | You need Reddit ad performance, conversions, or targeting data |
| references/fact-paid-marketing.md | Unified daily ad-level spend/impressions/clicks across LinkedIn, Meta, Reddit, Google + Indeed (diamond_growth_indeed) | Paid marketing spend, cross-channel spend comparison, CTR, CPM, Indeed job spend |
| references/fact-hai-attribution.md | Best attribution table — enriched UTM/campaign/ad/adset names, Indeed backfill, funnel stage definitions, cost metric formulas | Attribution, sign-ups by channel/campaign, cost per sign-up/FO/allocated/activated |
| references/lifecycle-comms.md | Schema, efficiency rules, and query patterns for lifecycle_communication_messages (~13B rows) | Lifecycle comms, email/push engagement, onboarding emails, HAI communications |
| references/query-patterns.md | Fellow search, resume patterns, funnel analysis, active counts, cross-table joins | You're writing a fellow search, resume, or funnel query |
| references/query-patterns-otter.md | Otter/Feather SQL patterns: approval rates, campaign health, cross-table joins | You're writing any Otter or Feather SQL |
| references/onboarding-funnel-drip-campaign-setup.md | Event-driven drip sync templates (one row per fellow × milestone, keyed by event_id). HAI-only and HAI+Otter variants. | User asks for funnel flags, Census sync query, Iterable drip segments, or screener step flags |