Run read-only SQL against BigQuery public datasets with local result capture, cost safeguards, and reproducibility outputs.
You are BigQuery Public, a specialised ClawBio agent for read-only access to BigQuery public datasets. Your role is to execute safe SQL against public reference tables, save local outputs, and keep sensitive user data off the cloud.
The skill saves report.md and result.json locally and records reproducibility metadata. It accepts SELECT / WITH queries only, and can fall back to the bq CLI.

Supported input formats:

| Format | Extension | Required Fields | Example |
|---|---|---|---|
| Inline SQL | n/a | `--query` | ``SELECT * FROM `bigquery-public-data.samples.shakespeare` LIMIT 5`` |
| SQL file | `.sql` | `--input <file.sql>` | `queries/shakespeare_top_words.sql` |
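The two input modes in the table can be dispatched with a small argument parser. This is only a sketch of how such a CLI might be wired up, not the skill's actual implementation; the flag names mirror the table above.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse the two mutually exclusive input modes: inline SQL or a .sql file."""
    parser = argparse.ArgumentParser(prog="bigquery_public.py")
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--query", help="inline SQL string")
    source.add_argument("--input", help="path to a .sql file")
    parser.add_argument("--output", default="/tmp/bigquery_public")
    return parser.parse_args(argv)


def load_sql(args):
    """Return the SQL text regardless of which input mode was used."""
    if args.query is not None:
        return args.query
    return Path(args.input).read_text()
```

With `parse_args(["--query", "SELECT 1"])` the SQL comes straight from the flag; with `--input`, it is read from disk at execution time.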
When the user asks to query BigQuery public data:

1. Validate the SQL (SELECT / WITH only) and run it, falling back to the bq CLI when needed.
2. Save report.md, result.json, tables/results.csv, and a reproducibility bundle to the output directory.

```bash
# Inline SQL
python skills/bigquery-public/bigquery_public.py \
  --query "SELECT corpus, word, word_count FROM \`bigquery-public-data.samples.shakespeare\` LIMIT 5" \
  --output /tmp/bigquery_public

# SQL file
python skills/bigquery-public/bigquery_public.py \
  --input path/to/query.sql \
  --output /tmp/bigquery_public

# Preview a larger query without editing the SQL file
python skills/bigquery-public/bigquery_public.py \
  --input path/to/query.sql \
  --preview 20 \
  --output /tmp/bigquery_preview

# Discover tables before writing SQL
python skills/bigquery-public/bigquery_public.py \
  --list-tables isb-cgc.TCGA_bioclin_v0 \
  --output /tmp/bigquery_tables

# Demo mode (offline fixture)
python skills/bigquery-public/bigquery_public.py --demo --output /tmp/bigquery_demo

# Via ClawBio runner
python clawbio.py run bigquery --demo
python clawbio.py run bigquery --query "SELECT 1 AS example" --output /tmp/bigquery_public
python clawbio.py run bigquery --describe isb-cgc.TCGA_bioclin_v0.Clinical --output /tmp/bigquery_schema
```
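Read-only enforcement (SELECT / WITH only) can be approximated with a first-keyword check. This is a minimal sketch, not the skill's actual validator; a production version would need real SQL parsing to catch multi-statement scripts.

```python
import re

READ_ONLY_STARTERS = ("select", "with")


def is_read_only(sql: str) -> bool:
    """Heuristic: strip comments and whitespace, then require the statement
    to start with SELECT or WITH. Not a full parser -- DML hidden behind
    comments or in later statements needs proper parsing to catch."""
    # Remove line comments and block comments before inspecting.
    stripped = re.sub(r"--[^\n]*", " ", sql)
    stripped = re.sub(r"/\*.*?\*/", " ", stripped, flags=re.DOTALL)
    return stripped.strip().lower().startswith(READ_ONLY_STARTERS)
```

Anything failing this check would be rejected before a job is ever submitted, which is what keeps the skill safely read-only.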
To verify the skill works:

```bash
python clawbio.py run bigquery --demo
```

Expected output: a local report and CSV preview built from a bundled snapshot of `bigquery-public-data.samples.shakespeare`.
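A quick scripted check that the demo actually produced its artifacts might look like the following sketch; the helper name is hypothetical, and the file names match the report, JSON, and CSV described in this document.

```python
from pathlib import Path


def verify_demo_outputs(output_dir: str) -> list:
    """Return the expected artifacts that are missing from output_dir."""
    expected = ["report.md", "result.json", "tables/results.csv"]
    root = Path(output_dir)
    return [rel for rel in expected if not (root / rel).exists()]
```

An empty return value (e.g. from `verify_demo_outputs("/tmp/bigquery_demo")`) means the demo run passed.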
Authentication uses Application Default Credentials when available, falling back to bq if already logged in. Costs are controlled via --max-bytes-billed, --max-rows, and an optional dry run.

Key parameters:

| Parameter | Default |
|---|---|
| `--location` | `US` |
| `--max-rows` | `100` |
| `--max-bytes-billed` | `1,000,000,000` |

Output structure:

```
output_directory/
├── report.md
├── result.json
├── tables/
│   └── results.csv
└── reproducibility/
    ├── commands.sh
    ├── environment.yml
    ├── job_metadata.json
    ├── provenance.json
    └── query.sql
```
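The reproducibility bundle above can be produced with plain stdlib writes. This is a sketch under the assumption that job metadata arrives as a dict; the function name is hypothetical, and environment.yml (an environment export) is omitted for brevity.

```python
import json
from pathlib import Path


def write_repro_bundle(output_dir: str, sql: str, command: str, job_meta: dict) -> Path:
    """Write the reproducibility/ files shown in the output layout."""
    repro = Path(output_dir) / "reproducibility"
    repro.mkdir(parents=True, exist_ok=True)
    (repro / "query.sql").write_text(sql)
    (repro / "commands.sh").write_text("#!/bin/sh\n" + command + "\n")
    (repro / "job_metadata.json").write_text(json.dumps(job_meta, indent=2))
    # Provenance records what produced the outputs, not the outputs themselves.
    (repro / "provenance.json").write_text(
        json.dumps({"sql": sql, "command": command}, indent=2)
    )
    return repro
```

Rerunning commands.sh against the saved query.sql should reproduce the same result set, modulo upstream dataset changes recorded in job_metadata.json.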
Required:

- google-cloud-bigquery — Python BigQuery client
- google-auth — ADC detection and auth

Optional:

- bq CLI — fallback backend when ADC is missing

This v1 skill is intended for explicit invocation through `clawbio.py run bigquery`. Natural-language routing is intentionally out of scope for the first release.
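The cost safeguards described earlier (--max-rows, --max-bytes-billed) reduce to simple decision logic once a dry run has estimated the bytes to be scanned. A pure-Python sketch with hypothetical names, deliberately independent of the BigQuery client so the policy is easy to test offline:

```python
def check_cost(estimated_bytes: int, max_bytes_billed: int = 1_000_000_000) -> None:
    """Refuse to run a query whose dry-run estimate exceeds the byte budget."""
    if estimated_bytes > max_bytes_billed:
        raise RuntimeError(
            f"query would scan {estimated_bytes} bytes, "
            f"over the {max_bytes_billed}-byte limit"
        )


def clamp_rows(rows, max_rows: int = 100):
    """Truncate fetched results to the row cap before writing results.csv."""
    out = []
    for row in rows:
        if len(out) >= max_rows:
            break
        out.append(row)
    return out
```

Keeping the guard separate from the client call means the same policy applies whether the query runs through the Python client or the bq CLI fallback.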