Run SQL queries against DuckDB databases or ad-hoc files. Accepts raw SQL or natural language. Use whenever the user wants to query data, explore tables, run analytics, or ask questions about datasets — even if they don't say "DuckDB" explicitly.
Input: $@
STATE_DIR=""
test -f .duckdb-skills/state.sql && STATE_DIR=".duckdb-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.duckdb-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.duckdb-skills/$PROJECT_ID"
Mode: Ad-hoc if --file flag present, SQL references file paths, or no state. Session if state exists and input references tables/is natural language.
For natural language, first get schema context (session mode):
pixi run duckdb -init "$STATE_DIR/state.sql" -csv -c "SELECT table_name FROM duckdb_tables() ORDER BY table_name;"
pixi run duckdb -init "$STATE_DIR/state.sql" -csv -c "DESCRIBE <table_name>;"
Skip for DESCRIBE, SUMMARIZE, aggregations, or queries with LIMIT. For unbounded queries on >1M rows: suggest adding LIMIT or aggregation before running.
Ad-hoc (sandboxed):
pixi run duckdb :memory: -csv <<'SQL'
SET allowed_paths=['FILE_PATH'];
SET enable_external_access=false;
SET allow_persistent_secrets=false;
SET lock_configuration=true;
<QUERY>;
SQL
Session:
pixi run duckdb -init "$STATE_DIR/state.sql" -csv <<'SQL'
<QUERY>;
SQL
FROM duckdb_tables()find "$PWD" -name "<filename>"FROM table WHERE x > 10 — implicit SELECT *GROUP BY ALL / ORDER BY ALL — auto-detect columnsSELECT * EXCLUDE (col1, col2) / REPLACE (expr AS col)UNION ALL BY NAME — combine tables with different column ordersLIMIT 10% — percentage limitSELECT x: 42 — prefix alias syntaxcount() instead of count(*)SELECT i+1 AS j, j+2 AS kCOLUMNS(*) with regex, EXCLUDE, REPLACE, lambdasFILTER (WHERE ...) for conditional aggregationmax(col, 3) — top-N as list; arg_max(arg, val, n), min_by(arg, val, n)SET VARIABLE x = expr -> getvariable('x')FROM 'file.csv', FROM 'data.parquet'FROM 'data/part-*.parquet''hello'.upper(), col.trim().lower()[x*2 FOR x IN list_col]col[1:3], col[-1]SELECT s.* FROM (SELECT {'a': 1} AS s)format('{}->{}', a, b)CREATE OR REPLACE TABLE — no DROP neededCREATE TABLE ... AS SELECTINSERT INTO ... BY NAME — match by column nameINSERT OR IGNORE INTO / INSERT OR REPLACE INTO