DuckDB for Python data engineering and React analytics applications. Use when working with SQL queries on local/remote data, building full-stack data apps with DuckDB-WASM in the browser, analyzing parquet/csv/json files without a database server, setting up OLAP pipelines with pandas/Polars integration, or creating React components for in-browser analytics. Covers Python API, WASM bindings, extensions (httpfs, json, spatial), and performance optimization for single-node analytics.
Complete guide for DuckDB in both Python data pipelines and React browser applications.
Choose your domain:
| I want to... | Go to |
|---|---|
| Query data in Python scripts | Python Overview |
| Build lazy data pipelines | Python Relational API |
| Load CSV/Parquet/JSON | Python Data Ingestion |
| Add a React analytics dashboard | WASM + React |
| Check syntax quickly | Quick Reference |
| Fix common errors | Troubleshooting |
Python:
pip install duckdb pandas pyarrow
# or with Polars support
pip install duckdb polars pyarrow
React/WASM:
npm install @duckdb/duckdb-wasm
This skill includes helper scripts in scripts/:
| Script | Purpose |
|---|---|
| validate_sql.py | Validate DuckDB SQL syntax without executing |
| generate_types.py | Generate TypeScript types from SQL queries |
Usage:
# Validate SQL syntax
python scripts/validate_sql.py "SELECT * FROM events WHERE date > '2024-01-01'"
# Generate TypeScript types
python scripts/generate_types.py "SELECT id, name, price FROM products" --name Product --output types.ts
import duckdb
# In-memory (fastest for <10GB)
conn = duckdb.connect()
# Persistent database
conn = duckdb.connect("analytics.db")
# Context manager (recommended)
with duckdb.connect("analytics.db") as conn:
    result = conn.execute("SELECT * FROM events").fetchdf()
Critical: Connections are not thread-safe. Create one connection per thread.
# CSV files
df = conn.execute("SELECT * FROM 'data.csv' LIMIT 100").fetchdf()
# Parquet files (columnar, faster)
df = conn.execute("SELECT * FROM 'data.parquet'").fetchdf()
# Arrow directly (zero-copy)
arrow_table = conn.execute("SELECT * FROM events").fetch_arrow_table()
# Pandas DataFrame
conn.execute("CREATE TABLE events AS SELECT * FROM df")
import * as duckdb from '@duckdb/duckdb-wasm';
// Initialize in useEffect
const initDb = async () => {
  const JSDELIVR_BUNDLES = duckdb.getJsDelivrBundles();
  const bundle = await duckdb.selectBundle(JSDELIVR_BUNDLES);
  const worker_url = URL.createObjectURL(
    new Blob([`importScripts("${bundle.mainWorker}");`], {type: 'text/javascript'})
  );
  const worker = new Worker(worker_url);
  const logger = new duckdb.ConsoleLogger();
  const db = new duckdb.AsyncDuckDB(logger, worker);
  await db.instantiate(bundle.mainModule, bundle.pthreadWorker);
  return db;
};
Essential extensions for common workflows:
| Extension | Purpose | Install |
|---|---|---|
| httpfs | S3/HTTPS/GCS access | INSTALL httpfs; LOAD httpfs; |
| json | JSON parsing | Built-in (DuckDB 0.10+) |
| parquet | Parquet I/O | Built-in |
| spatial | Geospatial queries | INSTALL spatial; LOAD spatial; |
| fts | Full-text search | INSTALL fts; LOAD fts; |
INSTALL httpfs;
LOAD httpfs;
SET s3_region='us-east-1';
SET s3_access_key_id='...';
SET s3_secret_access_key='...';
SELECT * FROM 's3://bucket/data.parquet' LIMIT 100;
Performance tips:
-- ART indexes mainly speed up point lookups; analytical scans rarely need them
CREATE INDEX idx ON my_table(my_column);
-- Drop row-order guarantees for faster parallel execution when order is irrelevant
SET preserve_insertion_order = false;
Migrating from other tools:
| From | To DuckDB |
|---|---|
| SQLite | Similar embedded model; faster analytics, richer type support |
| pandas | df = conn.execute("...").fetchdf() |
| Polars | pl_df = conn.execute("...").pl() |
| PostgreSQL | Largely compatible SQL dialect; embedded (no server) |
| BigQuery | Similar SQL, local execution |