Write efficient BigQuery SQL with partition pruning, clustering, cost estimation, and slot management. Use when writing or optimizing BigQuery queries, designing partitioned/clustered tables, estimating query cost with dry-run, or debugging slot contention. Do not use for PostgreSQL/MySQL queries (prefer orm-patterns) or real-time OLTP workloads.
Write efficient BigQuery SQL with proper partitioning, clustering, cost estimation, and query optimization.
## Query optimization checklist

- Estimate cost before running: `bq query --dry_run --use_legacy_sql=false 'SELECT ...'` reports bytes scanned without executing the query.
- Partition on a `DATE` or `TIMESTAMP` column (or ingestion-time `_PARTITIONTIME`) and always filter it in `WHERE` so partitions are pruned.
- Cluster on frequently filtered columns (`user_id`, `event_name`) after partitioning.
- Avoid `SELECT *`: it scans all columns and inflates cost.
- `JOIN` with explicit keys. Check `INFORMATION_SCHEMA.JOBS` for bytes billed.
- Use `APPROX_COUNT_DISTINCT` for cardinality estimates on large tables: roughly 2% error but about 10x faster.
- Query `INFORMATION_SCHEMA.JOBS_BY_PROJECT` for `total_slot_ms` to find expensive queries.

## Table design

```sql
CREATE TABLE project.dataset.events (
  event_date DATE NOT NULL,
  event_name STRING NOT NULL,
  user_id STRING,
  properties JSON,
  created_at TIMESTAMP NOT NULL
)
PARTITION BY event_date
CLUSTER BY event_name, user_id
OPTIONS (
  partition_expiration_days = 365,
  require_partition_filter = true
);
```
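A query against a table defined this way must filter the partition column (because of `require_partition_filter`), and filtering a clustering column narrows the scan further. A sketch assuming the `events` table above; the `'page_view'` event name is illustrative:

```sql
-- Prunes to January's partitions via event_date, then uses the
-- event_name clustering to skip blocks within each partition.
SELECT
  event_date,
  APPROX_COUNT_DISTINCT(user_id) AS approx_users
FROM project.dataset.events
WHERE event_date BETWEEN '2024-01-01' AND '2024-01-31'  -- satisfies require_partition_filter
  AND event_name = 'page_view'                          -- clustering column filter
GROUP BY event_date;
```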
```sh
# Dry run to check bytes scanned (cost = bytes * $6.25/TiB on-demand)
bq query --dry_run --use_legacy_sql=false \
  'SELECT user_id, COUNT(*) FROM project.dataset.events
   WHERE event_date BETWEEN "2024-01-01" AND "2024-01-31"
   GROUP BY 1'
```
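The dry run reports a byte count; converting it to an on-demand dollar estimate is simple arithmetic (bytes / 1024^4 * rate). A minimal sketch, assuming the $6.25/TiB on-demand rate and a hypothetical byte count pasted from the dry-run output:

```sh
# Hypothetical byte count from a dry run: 512 GiB scanned
BYTES=549755813888
# On-demand estimate: bytes -> TiB -> USD at $6.25/TiB
echo "$BYTES" | awk '{printf "%.4f\n", $1 / 1024^4 * 6.25}'   # prints 3.1250
```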
```sql
-- Check actual cost of recent queries
SELECT
  job_id,
  total_bytes_billed / POW(1024, 4) AS tib_billed,
  total_slot_ms / 1000 AS slot_seconds
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
ORDER BY total_bytes_billed DESC
LIMIT 10;
```
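To debug slot contention rather than raw cost, compare slot time to wall-clock time: average slots used is roughly `total_slot_ms` divided by job duration in milliseconds. A sketch against the same `INFORMATION_SCHEMA.JOBS_BY_PROJECT` view; the `avg_slots` alias is illustrative:

```sql
-- Jobs whose average slot usage is high relative to runtime are the
-- likeliest sources of slot contention.
SELECT
  job_id,
  total_slot_ms / NULLIF(TIMESTAMP_DIFF(end_time, creation_time, MILLISECOND), 0) AS avg_slots
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
  AND state = 'DONE'
ORDER BY avg_slots DESC
LIMIT 10;
```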
## Best practices

- Set `require_partition_filter = true` on large tables: prevents accidental full-table scans.
- Use `APPROX_` functions for dashboards; use exact aggregates for financial data.
- Prefer `MERGE` over `DELETE` + `INSERT` for upserts: a single-pass atomic operation.

## Related skills

- `orm-patterns`: OLTP database patterns
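The MERGE-for-upserts recommendation can be sketched as a single atomic statement. Assumptions here: a staging table named `project.dataset.events_staging` with the same schema, and `(event_date, event_name, user_id)` as the match key:

```sql
-- Upsert from a staging table in one single-pass, atomic statement,
-- instead of a DELETE followed by an INSERT.
MERGE project.dataset.events AS t
USING project.dataset.events_staging AS s
ON t.event_date = s.event_date
   AND t.event_name = s.event_name
   AND t.user_id = s.user_id
WHEN MATCHED THEN
  UPDATE SET properties = s.properties, created_at = s.created_at
WHEN NOT MATCHED THEN
  INSERT (event_date, event_name, user_id, properties, created_at)
  VALUES (s.event_date, s.event_name, s.user_id, s.properties, s.created_at);
```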