SQLite PRAGMA tuning and schema patterns for high-throughput bulk insert at 100K+ row scale
When bulk-inserting into SQLite databases that grow beyond ~200MB (100K+ files, millions of rows), default PRAGMAs and schema constraints cause severe per-row cost growth. This skill covers the PRAGMA stack, index management, and constraint handling needed for bulk-insert throughput.
-- Connection defaults (always set):
PRAGMA journal_mode=WAL;
PRAGMA busy_timeout=5000;
PRAGMA synchronous=NORMAL;  -- safe default
PRAGMA cache_size=-65536;   -- 64MB (negative value = size in KiB)
PRAGMA mmap_size=268435456; -- 256MB
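A minimal sketch of applying these connection defaults from Python's stdlib sqlite3 (the helper name and file path are illustrative; the repo applies the same stack from C++):

```python
import os
import sqlite3
import tempfile

def apply_connection_defaults(conn: sqlite3.Connection) -> None:
    # Mirrors the always-set PRAGMA stack above.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA busy_timeout=5000")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute("PRAGMA cache_size=-65536")    # 64MB (negative = KiB)
    conn.execute("PRAGMA mmap_size=268435456")  # 256MB

# WAL requires a real file, not :memory:
path = os.path.join(tempfile.mkdtemp(), "index.db")
conn = sqlite3.connect(path)
apply_connection_defaults(conn)
print(conn.execute("PRAGMA journal_mode").fetchone()[0])  # -> wal
```

journal_mode=WAL is persistent in the database file; the other PRAGMAs are per-connection and must be re-applied on every connect.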
-- Bulk/turbo mode (set before bulk insert begins):
PRAGMA synchronous=OFF;       -- skip WAL fsync; safe against app crash, not OS crash
PRAGMA wal_autocheckpoint=0;  -- no mid-run checkpoints; single checkpoint at end
PRAGMA temp_store=MEMORY;     -- temp tables in RAM
PRAGMA cache_size=-131072;    -- 128MB page cache
PRAGMA mmap_size=2147483648;  -- 2GB; must cover expected final DB size
PRAGMA foreign_keys=OFF;      -- disable FK checks during bulk insert
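A sketch of the turbo toggle and its matching end-of-run teardown, assuming the single-checkpoint-at-end pattern above (function names here are illustrative; the repo gates this in enable_turbo and cmd_index.cpp):

```python
import sqlite3

TURBO_PRAGMAS = (
    "synchronous=OFF",
    "wal_autocheckpoint=0",
    "temp_store=MEMORY",
    "cache_size=-131072",
    "mmap_size=2147483648",
    "foreign_keys=OFF",
)

def enable_turbo(conn: sqlite3.Connection) -> None:
    # Apply before the first bulk INSERT.
    for p in TURBO_PRAGMAS:
        conn.execute(f"PRAGMA {p}")

def finish_bulk(conn: sqlite3.Connection) -> None:
    # Single checkpoint at end of run, then restore durable settings.
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)")
    conn.execute("PRAGMA synchronous=NORMAL")
    conn.execute("PRAGMA foreign_keys=ON")
    conn.execute("PRAGMA wal_autocheckpoint=1000")  # SQLite's default
```

Call enable_turbo before the bulk run starts and finish_bulk only after the final COMMIT; leaving synchronous=OFF active past the bulk phase extends the OS-crash exposure window for no throughput gain.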
Before the bulk run, drop secondary indexes and rebuild them after the final commit:
DROP INDEX IF EXISTS ...
Implicit indexes created by inline UNIQUE constraints (e.g. stable_key TEXT UNIQUE) cannot be dropped; they survive DROP INDEX. For highest throughput, create tables without UNIQUE during bulk, then add the constraint afterward with CREATE UNIQUE INDEX (SQLite has no ALTER TABLE ... ADD CONSTRAINT).
Set mmap_size ≥ the expected final database size. When the DB grows past the mmap window, every B-tree page access beyond the window falls back to read() system calls. This is the primary cause of non-linear per-row cost growth.
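A sketch of the no-UNIQUE-during-bulk pattern (table and column names such as files and stable_key are illustrative, not the repo's actual schema):

```python
import os
import sqlite3
import tempfile

conn = sqlite3.connect(os.path.join(tempfile.mkdtemp(), "bulk.db"))
# Created WITHOUT "stable_key TEXT UNIQUE", so no implicit index
# exists to slow down the bulk run.
conn.execute("CREATE TABLE files (id INTEGER PRIMARY KEY, stable_key TEXT)")

with conn:  # one transaction around the whole bulk insert
    conn.executemany(
        "INSERT INTO files(stable_key) VALUES (?)",
        ((f"key-{i:06d}",) for i in range(100_000)),
    )

# Enforce uniqueness after the fact; raises IntegrityError if
# duplicates slipped in during the bulk run.
conn.execute("CREATE UNIQUE INDEX idx_files_stable_key ON files(stable_key)")
```

The post-bulk CREATE UNIQUE INDEX builds the index in one pass over sorted data, which is cheaper than maintaining the B-tree incrementally on every insert.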
From codetopo: src/db/connection.h (enable_turbo), src/db/schema.h (drop_bulk_indexes/rebuild_indexes), src/cli/cmd_index.cpp (bulk_mode flag gating).
Pitfall: a turbo flag can be set (config.turbo = true) without the function that applies the PRAGMAs ever being called. Always verify flags reach their codepath with a test or log.
When parent rows are inserted before their children and child foreign keys are taken from last_insert_rowid(), FK violations are structurally impossible, so disabling the checks loses nothing.
Symptom of an undersized mmap_size: page access silently degrades to read() calls instead of memory-mapped access.
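Given the flag-gating pitfall, one cheap defense is to read the effective PRAGMA values back rather than trusting the config flag; a sketch (the verification helper is illustrative):

```python
import sqlite3

def assert_turbo_applied(conn: sqlite3.Connection) -> None:
    # Query the effective values; a flag that never reached its
    # codepath fails loudly here instead of silently running slow.
    assert conn.execute("PRAGMA synchronous").fetchone()[0] == 0, \
        "synchronous != OFF"
    assert conn.execute("PRAGMA wal_autocheckpoint").fetchone()[0] == 0, \
        "wal_autocheckpoint still enabled"

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA synchronous=OFF")
conn.execute("PRAGMA wal_autocheckpoint=0")
assert_turbo_applied(conn)  # raises AssertionError if turbo never applied
```

The same read-back works for any PRAGMA in the stack; running it once at the start of the bulk phase catches the set-but-never-applied bug class.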