Designs and runs DuneSQL regression queries that compare CI `test_schema` tables against production spells after pipeline-only dbt changes. Once the SQL is tailored to the feature branch and CI table name, it uses the Dune MCP to execute the queries and validate parity. Use for prod-vs-CI regression, lineage parity checks, or row/metric validation when data should not drift; invoke manually per branch, not on every edit.
CI table name: `test_schema.git_dunesql_<hash>_<schema>_<alias>`.

- **Hash:** the dbt workflows run on `pull_request` (e.g. `tokens.yml`). For those runs, `${{ github.sha }}` in `dbt_run.yml` is the merge commit GitHub builds for `refs/pull/<N>/merge`, not the PR branch head shown in the PR commits list. So the `origin/<branch>` tip (e.g. `4ed0409…`) will not match the CI hash (e.g. `8d61409…`). Reproduce it locally with `git fetch origin pull/<PR_NUMBER>/merge`, then `git rev-parse FETCH_HEAD | tr - _ | cut -c1-7`; or copy the hash from the CI output (workflow run / logs). Workflows triggered by `push` (if any) would use the pushed commit; then `origin/<branch>` matches.
- **Schema/alias:** `{custom_schema}_{alias}` from the model config (tokens.transfers → tokens_transfers). Confirm in the dbt run initial model(s) when unsure.

Spellbook note: see also `.cursor/skills/debug-ci/SKILL.md` for CI context. Never embed API keys in the skill.
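The table-name pieces above can be sketched as small helpers. These are illustrative, not part of the repo: `ci_table_name` and `merge_commit_short_hash` are hypothetical names, and the real hash should always be cross-checked against the workflow logs.

```python
import subprocess

def ci_table_name(git_hash: str, model_ref: str, schema_prefix: str = "test_schema") -> str:
    """Build the CI table name from a 7-char merge-commit hash and a dbt
    model reference like 'tokens.transfers' (-> tokens_transfers)."""
    schema_alias = model_ref.replace(".", "_")
    return f"{schema_prefix}.git_dunesql_{git_hash}_{schema_alias}"

def merge_commit_short_hash(pr_number: int) -> str:
    """Mirror of: git fetch origin pull/<N>/merge && git rev-parse FETCH_HEAD,
    truncated to 7 chars (requires a checkout with 'origin' configured)."""
    subprocess.run(["git", "fetch", "origin", f"pull/{pr_number}/merge"], check=True)
    full = subprocess.run(["git", "rev-parse", "FETCH_HEAD"],
                          check=True, capture_output=True, text=True).stdout.strip()
    return full[:7]
```

`ci_table_name("8d61409", "tokens.transfers")` yields the fully qualified CI table for the merge-commit example above.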
After the SQL is built from branch context (PR merge SHA or the correct `github.sha` source, plus `{schema}_{alias}` and filters), execute it through the Dune MCP tools:
Check `mcps/user-dune/tools/` (or the enabled Dune server's tool descriptors) and open the JSON for each tool you call so arguments match the contract.

- `createDuneQuery`: create a temporary query (`is_temp` defaults to true) with a clear name, the full DuneSQL query text, and an optional description. Capture the returned `query_id`.
- `executeQueryById`: run with that `query_id` (and `performance` if needed). Capture `execution_id` from the response.
- `getExecutionResults`: pass `executionId` (ULID), increase `timeout` for heavy scans, use `limit` for previews. Interpret `state`: COMPLETED → check `data.rows` and `resultMetadata.totalRowCount`; FAILED → use `errorMessage` / `errorMetadata` to fix the SQL and repeat.

Run exploratory SQL (min dates, raw counts) and the final diff query through the same pipeline. If MCP is unavailable, fall back to `python scripts/dune_query.py` (repo root, `DUNE_API_KEY` in `.env`).
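The execute-then-poll loop can be sketched without tying it to a specific client; `fetch_state` below stands in for whatever call returns a `getExecutionResults`-style dict (the function and parameter names are assumptions for illustration):

```python
import time

def wait_for_results(fetch_state, max_attempts: int = 30, delay_s: float = 2.0):
    """Poll a callable returning a getExecutionResults-style dict.
    Returns data.rows on COMPLETED; raises with errorMessage on FAILED."""
    for _ in range(max_attempts):
        resp = fetch_state()
        state = resp.get("state")
        if state == "COMPLETED":
            return resp["data"]["rows"]
        if state == "FAILED":
            raise RuntimeError(resp.get("errorMessage", "query failed"))
        time.sleep(delay_s)  # PENDING / EXECUTING: keep polling
    raise TimeoutError("execution did not finish in time")
```

The same loop serves exploratory queries and the final diff query; only the SQL changes.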
Partitioning: Dune tooling expects filters on partition columns (e.g. block_date) where applicable—keep regression windows as tight as the comparison allows.
Use `git diff`, `dbt compile`, `_schema.yml`, and the model SQL to list relevant columns (partition keys, `block_date`, `block_time`, `block_number`, metrics like `amount_usd`, etc.). When in doubt, inspect the Dune table schema or compiled SQL for CI and prod.

- **Core metrics:** `count(1)` (rows) and `sum(amount_usd)` (or the spell's primary USD column). That pair usually surfaces pipeline issues quickly. Extend or swap metrics from the schema (e.g. `sum(amount_raw)`, volume fields) per lineage; tokens-style spells are the template, not a universal rule.
- **Grain:** pick a grain (`group by` / join keys) that exists on both CI and prod. It is not always `blockchain`: common Spellbook grains include `blockchain`, `block_month`, `block_date`, `project`, or combinations (e.g. `blockchain + block_date`). Match what the spell actually keys on for the question you are answering.
- **Aggregate first:** start with aggregate grains (`block_month` / `block_date` slices) to confirm totals and distributions before joining on a row-level unique key (`unique_key`, `tx_hash + evt_index`, etc.). Cheap grain checks catch most pipeline regressions; unique-key diffs are for pinpointing survivors. Compare `count(1)` and other numeric columns that make sense from the schema.
- **Dynamic filters (required):** do not copy example `block_date` / `block_number` literals from docs or chat. Always run phase 1 on Dune (or MCP), read the result set, then substitute the returned bounds into every later query. Examples in reference.md use placeholders `<MIN_BLOCK_DATE>`, `<MIN_BLOCK_NUMBER>` for that reason.
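The core-metrics check can be generated mechanically once the metric column is known. A minimal sketch (`raw_totals_sql` is a hypothetical helper; run the output on both the CI and prod tables with the same `where`):

```python
def raw_totals_sql(table: str, metric: str, shared_where: str) -> str:
    """Raw totals for one side of the comparison: row count plus a sum
    over the spell's primary metric column, under the shared filters."""
    return (
        f"select count(1) as total_rows, sum({metric}) as total_metric\n"
        f"from {table}\n"
        f"where {shared_where}"
    )
```

Swap `metric` per lineage (e.g. `amount_raw` instead of `amount_usd`) without touching the rest of the query.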
1. Phase 1 (bounds): `select min(block_date) as min_block_date, min(block_number) as min_block_number from <ci_table> where block_date != current_date`.
2. Shared `where` fragment (same on all CTEs): `block_date != current_date and block_date >= date '<MIN_BLOCK_DATE>' and block_number >= <MIN_BLOCK_NUMBER>`; adjust or drop `block_number` if the model is not chain-scoped that way.
3. Raw totals: `count(1)` and `sum(<metric>)` on prod and CI with the identical `where`; row count and metric totals should match before trusting a grain inner join.
4. Grain aggregates: same `where`, same `group by` grain, same metrics (e.g. `count(1)`, `sum(amount_usd)`).
5. Diff: `inner join` on the full grain (all `group by` columns); use `full outer join` temporarily to inspect missing keys. Filter on `abs(diff_rows) > 0 or abs(diff_usd) > <tolerance>` (e.g. 5 for float noise) only after spot-checking unfiltered join output.

Build the shared `where` from phase 1 bounds (see workflow). Keep it on every CTE; join only on grain keys.
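Substituting the phase 1 bounds into the shared fragment can be sketched as follows (`shared_where` is a hypothetical helper; the bounds must come from an actual phase 1 run, never from example literals):

```python
from datetime import date
from typing import Optional

def shared_where(min_block_date: date, min_block_number: Optional[int] = None) -> str:
    """Render the shared where fragment from phase 1 bounds. The
    block_number predicate is dropped when the model is not scoped that way."""
    parts = [
        "block_date != current_date",
        f"block_date >= date '{min_block_date.isoformat()}'",
    ]
    if min_block_number is not None:
        parts.append(f"block_number >= {min_block_number}")
    return " and ".join(parts)
```

Paste the returned fragment verbatim into every CTE so CI and prod are filtered identically.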
| Output | Meaning |
|---|---|
| CI row count | count(1) on CI with full shared where. |
| Prod row count | Same on prod. |
| Row-count diff | prod - ci (expect 0 before grain join). |
| CI raw metric | e.g. sum(amount_usd) on CI with the same where (spell-dependent column). |
| Prod raw metric | Same on prod. |
| Raw metric abs diff | e.g. abs(prod_usd - ci_usd) (use tolerance for floats). |
| Inner-join grain count | count(1) from prod_agg inner join ci_agg without diff filter. |
| Inner-join metric sums | sum(prod grain total_usd) vs sum(ci grain total_usd) over that join (should match raw totals when every row maps to one grain and keys align). |
| Diff grain count | count(1) from the join with diff filter on rows / USD. 0 ⇒ pass. |
Optional: count(distinct …) per grain column on each side vs inner-join count.
See reference.md for phase 1 SQL, placeholder filters, grain diff query, and a rollup that returns counts + USD in one row.
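The pass/fail reading of the final row can be stated as a one-line check (a sketch; `regression_verdict` is an illustrative name, and `diff_rows` is the row list from the filtered diff query):

```python
def regression_verdict(diff_rows: list) -> str:
    """Interpret the filtered diff output: zero surviving grains means
    CI matches prod under the shared filters."""
    if not diff_rows:
        return "pass"
    return f"fail: {len(diff_rows)} diverging grain(s)"
```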
```sql
with ci as (
    select
        <grain_columns>
        , count(1) as total_rows
        , sum(<metric_column>) as total_metric
    from
        test_schema.git_dunesql_<GIT_HASH>_<schema>_<alias>
    where
        block_date != current_date
        and block_date >= date '<MIN_BLOCK_DATE>'
        and <optional_aligned_predicates_e_g_block_number>
    group by
        <grain_columns>
)
, prod as (
    select
        <grain_columns>
        , count(1) as total_rows
        , sum(<metric_column>) as total_metric
    from
        <prod_catalog>.<prod_schema>.<prod_alias>
    where
        block_date != current_date
        and block_date >= date '<MIN_BLOCK_DATE>'
        and <same_optional_predicates_as_ci>
    group by
        <grain_columns>
)
select
    <compare_columns>
from prod
inner join ci
    on <join_keys>
where
    (
        abs(prod.total_rows - ci.total_rows) > 0
        or abs(prod.total_metric - ci.total_metric) > <tolerance>
    )
order by
    1
```
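Filling the template from branch context can be automated; a minimal sketch, assuming a `grain_diff_sql` helper (hypothetical name) that takes the CI/prod tables, grain columns, metric, and the shared `where` fragment:

```python
def grain_diff_sql(ci_table: str, prod_table: str, grain: list,
                   metric: str, where: str, tolerance: float = 5) -> str:
    """Render the grain-diff template: aggregate both sides on the same
    grain, then keep only grains whose row or metric totals diverge."""
    g = ", ".join(grain)
    on = " and ".join(f"prod.{c} = ci.{c}" for c in grain)

    def agg(table: str) -> str:
        # one side of the comparison, aggregated on the shared grain
        return (f"select {g}, count(1) as total_rows, sum({metric}) as total_metric "
                f"from {table} where {where} group by {g}")

    return (
        f"with ci as ({agg(ci_table)})\n"
        f", prod as ({agg(prod_table)})\n"
        f"select {', '.join(f'prod.{c}' for c in grain)}\n"
        f", prod.total_rows - ci.total_rows as diff_rows\n"
        f", prod.total_metric - ci.total_metric as diff_metric\n"
        f"from prod inner join ci on {on}\n"
        f"where abs(prod.total_rows - ci.total_rows) > 0\n"
        f"   or abs(prod.total_metric - ci.total_metric) > {tolerance}\n"
        f"order by 1"
    )
```

Generating the SQL keeps the `where` fragment and grain columns identical across both CTEs, which is the property the whole comparison depends on.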
The same workflow applies in any dbt + Dune (or Trino) repo: adjust prod relation (catalog.schema.table), CI schema/table prefix, and column names. Copy this folder to ~/.cursor/skills/dbt-prod-ci-regression/ for a personal default.
See reference.md for a filled tokens.transfers example and exploratory snippets.