Run SQL queries against the attached DuckDB database or ad-hoc against files. Accepts raw SQL or natural language questions. Uses DuckDB Friendly SQL idioms.
You are helping the user query data using DuckDB.
Input: $@
Follow these steps in order.
Look for an existing state file in either location:

```sh
STATE_DIR=""
test -f .duckdb-skills/state.sql && STATE_DIR=".duckdb-skills"
PROJECT_ROOT="$(git rev-parse --show-toplevel 2>/dev/null || echo "$PWD")"
PROJECT_ID="$(echo "$PROJECT_ROOT" | tr '/' '-')"
test -f "$HOME/.duckdb-skills/$PROJECT_ID/state.sql" && STATE_DIR="$HOME/.duckdb-skills/$PROJECT_ID"
```
If found, verify the databases it references are still accessible:

```sh
duckdb -init "$STATE_DIR/state.sql" -c "SHOW DATABASES;"
```
Now determine the mode:

- Ad-hoc mode: the `--file` flag is present, the SQL references file paths/literals (e.g. `FROM 'data.csv'`), or `STATE_DIR` is empty.
- Session mode: `STATE_DIR` is set and the input references table names, is natural language, or is SQL without file references.

If no state file exists and no file is referenced, fall back to ad-hoc mode against `:memory:` — the user must reference files directly in their SQL.
If the state file exists but any ATTACH in it fails, warn the user and fall back to ad-hoc mode.
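The mode decision above can be sketched as a rough shell heuristic. This is illustrative only — `INPUT` and `MODE` are hypothetical names not used elsewhere in this skill, and the real decision also weighs whether the input reads as natural language:

```sh
# Illustrative sketch of the mode decision; INPUT and MODE are hypothetical names.
INPUT="FROM 'data.csv' SELECT count()"   # the user's raw input
STATE_DIR=""                             # empty when no state file was found

# Ad-hoc when a file is referenced (--file flag or a quoted path after FROM),
# or when there is no session state to fall back on.
if [ -z "$STATE_DIR" ] || printf '%s' "$INPUT" | grep -qiE -- "--file|from[[:space:]]+'[^']+'"; then
  MODE=adhoc
else
  MODE=session
fi
echo "$MODE"
```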
Check that the duckdb CLI is installed:

```sh
command -v duckdb
```
If not found, delegate to /duckdb-skills:install-duckdb and then continue.
If the input is natural language (not valid SQL), generate SQL using the Friendly SQL reference below.
In session mode, first retrieve the schema to inform query generation:

```sh
duckdb -init "$STATE_DIR/state.sql" -csv -c "
SELECT table_name FROM duckdb_tables() ORDER BY table_name;
"
```

Then for relevant tables:

```sh
duckdb -init "$STATE_DIR/state.sql" -csv -c "DESCRIBE <table_name>;"
```
Use the schema context and the Friendly SQL reference to generate the most appropriate query.
Before executing, estimate whether the query could produce a very large result that would consume excessive tokens when returned to this conversation.
Session mode — check row counts for the tables involved:

```sh
duckdb -init "$STATE_DIR/state.sql" -csv -c "
SELECT table_name, estimated_size, column_count
FROM duckdb_tables()
WHERE table_name IN ('<table1>', '<table2>');
"
```

Ad-hoc mode — probe the source:

```sh
duckdb :memory: -csv -c "
SET allowed_paths=['FILE_PATH'];
SET enable_external_access=false;
SET allow_persistent_secrets=false;
SET lock_configuration=true;
SELECT count() AS row_count FROM 'FILE_PATH';
"
```
Evaluate:

- The query has a `LIMIT`, `count()`, or other aggregation that bounds the output -> safe, proceed.
- The query is unbounded and the source is large -> suggest adding `LIMIT 1000` or an aggregation to keep the output manageable. Ask for confirmation before running as-is.
- Skip this step for queries that are intrinsically bounded (e.g. `DESCRIBE`, `SUMMARIZE`, aggregations, `count()`).
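As a concrete sketch of the bounded rewrite, suppose the user asked for `SELECT * FROM 'events.csv'` and the probe showed a large row count (`events.csv` here is a hypothetical stand-in created inline so the example runs):

```sh
# Create a tiny stand-in file so the example is self-contained.
printf 'id,kind\n1,click\n2,view\n3,click\n' > events.csv

# Bounded rewrite of an unbounded "SELECT * FROM 'events.csv'":
duckdb :memory: -csv <<'SQL'
SELECT * FROM 'events.csv' LIMIT 1000;
SQL
```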
Ad-hoc mode (sandboxed — only the referenced file is accessible):

```sh
duckdb :memory: -csv <<'SQL'
SET allowed_paths=['FILE_PATH'];
SET enable_external_access=false;
SET allow_persistent_secrets=false;
SET lock_configuration=true;
<QUERY>;
SQL
```
Replace FILE_PATH with the actual file path extracted from the query or --file argument.
If multiple files are referenced, include all paths in the allowed_paths list.
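For example, a join across two hypothetical files `orders.csv` and `customers.csv` (names and columns are illustrative; the files are created inline so the sketch runs) needs both paths in `allowed_paths`:

```sh
# Hypothetical input files for the sketch:
printf 'customer_id,amount\n1,10\n1,5\n' > orders.csv
printf 'id,name\n1,alice\n' > customers.csv

# Both referenced files must appear in allowed_paths, or the sandbox blocks them.
duckdb :memory: -csv <<'SQL'
SET allowed_paths=['orders.csv', 'customers.csv'];
SET enable_external_access=false;
SET allow_persistent_secrets=false;
SET lock_configuration=true;
SELECT c.name, sum(o.amount) AS total
FROM 'orders.csv' o
JOIN 'customers.csv' c ON o.customer_id = c.id
GROUP BY c.name;
SQL
```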
Session mode (user-trusted database):

```sh
duckdb -init "$STATE_DIR/state.sql" -csv -c "<QUERY>"
```

For multi-line queries, use a heredoc with `-init`:

```sh
duckdb -init "$STATE_DIR/state.sql" -csv <<'SQL'
<QUERY>;
SQL
```
Always use heredocs (<<'SQL') for multi-line queries to avoid shell quoting issues.
If execution fails, handle common errors:

- Missing extension (e.g. `Extension "X" not loaded`): delegate to `/duckdb-skills:install-duckdb <ext>`, then retry.
- Unknown table: list the available tables `FROM duckdb_tables()` and suggest corrections.
- File not found: run `find "$PWD" -name "<filename>" 2>/dev/null` to locate the file and suggest the corrected path.
- Other errors: run `/duckdb-skills:duckdb-docs <error message or relevant keywords>` to search the documentation for guidance, then apply the fix and retry.

Show the query output to the user. If the result has more than 100 rows, note the truncation and suggest adding `LIMIT` to the query.
For natural language questions, also provide a brief interpretation of the results.
When generating SQL, prefer these idiomatic DuckDB constructs:
- `FROM table WHERE x > 10` (implicit `SELECT *`)
- `LIMIT 10%` returns a percentage of rows
- Prefix aliases: `SELECT x: 42` instead of `SELECT 42 AS x`
- `count()` as shorthand for `count(*)`
- Reusable column aliases: `SELECT i+1 AS j, j+2 AS k`
- `count() FILTER (WHERE x > 10)` for conditional aggregation
- `max(col, 3)` returns the top 3 values as a list; also `arg_max(arg, val, n)`, `min_by(arg, val, n)`
- Variables via `getvariable('x')`
- Query files directly: `FROM 'file.csv'`, `FROM 'data.parquet'`
- `FROM 'data/part-*.parquet'` reads multiple files with a glob
- Function chaining: `'hello'.upper()` or `col.trim().lower()`
- List comprehensions: `[x*2 FOR x IN list_col]`
- Slicing `col[1:3]`, negative indexing `col[-1]`
- Struct unpacking: `SELECT s.* FROM (SELECT {'a': 1, 'b': 2} AS s)`
- List literals: `[1, 2, 3]`
- `format('{}->{}', a, b)` for string formatting
- `CREATE OR REPLACE TABLE` instead of `DROP TABLE IF EXISTS` first
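Several of these idioms combine naturally in one query — here is a runnable probe using `range(10)` (values 0–9) as a stand-in for a real table, showing FROM-first syntax, the `count()` shorthand, and `FILTER`:

```sh
# FROM-first syntax, count() shorthand, and FILTER in one query.
duckdb :memory: -csv <<'SQL'
FROM range(10) t(i)
SELECT
    count() AS total,
    count() FILTER (WHERE i >= 5) AS upper_half;
SQL
```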