Optimize ClickHouse Cloud costs — compute scaling, storage tiering, compression, and query efficiency for lower bills. Use when analyzing ClickHouse Cloud bills, reducing storage costs, or optimizing compute utilization. Trigger: "clickhouse cost", "clickhouse billing", "reduce clickhouse spend", "clickhouse pricing", "clickhouse expensive", "clickhouse storage cost".
Reduce ClickHouse Cloud costs through storage optimization, compression tuning, TTL policies, compute scaling, and query efficiency improvements.
| Component | Pricing Model | Key Driver |
|---|---|---|
| Compute | Per-hour per replica | vCPU + memory tier |
| Storage | Per GB-month | Compressed data on disk |
| Network | Per GB egress | Query result sizes |
| Backups | Per GB stored | Backup retention |
ClickHouse Cloud bills storage on compressed bytes on disk, and ClickHouse compresses extremely well (often 10-20x), so the dominant cost driver is usually compute, not storage.
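To see how compressed bytes translate into a monthly storage bill, a rough estimate can be computed straight from system.parts. The $/GB-month rate below is an illustrative placeholder, not a published price; substitute your plan's actual rate.
-- Rough monthly storage cost estimate (rate is a placeholder, not ClickHouse Cloud's actual price)
WITH 0.10 AS usd_per_gb_month
SELECT
    formatReadableSize(sum(bytes_on_disk)) AS compressed_total,
    round(sum(bytes_on_disk) / 1e9 * usd_per_gb_month, 2) AS est_usd_per_month
FROM system.parts
WHERE active;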
-- Storage cost breakdown by table
SELECT
database,
table,
formatReadableSize(sum(bytes_on_disk)) AS compressed_size,
formatReadableSize(sum(data_uncompressed_bytes)) AS raw_size,
round(sum(data_uncompressed_bytes) / sum(bytes_on_disk), 1) AS compression_ratio,
sum(rows) AS total_rows,
count() AS parts
FROM system.parts
WHERE active
GROUP BY database, table
ORDER BY sum(bytes_on_disk) DESC;
-- Storage by column (find bloated columns)
SELECT
table,
column,
type,
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
formatReadableSize(sum(column_data_uncompressed_bytes)) AS raw,
round(sum(column_data_uncompressed_bytes) / sum(column_data_compressed_bytes), 1) AS ratio
FROM system.parts_columns
WHERE active AND database = 'analytics'
GROUP BY table, column, type
ORDER BY sum(column_data_compressed_bytes) DESC
LIMIT 30;
-- Check current codec per column
SELECT name, type, compression_codec
FROM system.columns
WHERE database = 'analytics' AND table = 'events';
-- Apply better codecs to large columns
ALTER TABLE analytics.events
MODIFY COLUMN properties String CODEC(ZSTD(3)); -- JSON blobs
ALTER TABLE analytics.events
MODIFY COLUMN created_at DateTime CODEC(DoubleDelta, ZSTD); -- Timestamps
ALTER TABLE analytics.events
MODIFY COLUMN user_id UInt64 CODEC(Delta, ZSTD); -- Sequential IDs
-- Force a rewrite so the new codecs apply to existing parts (new codecs only affect newly written parts; otherwise wait for background merges)
OPTIMIZE TABLE analytics.events FINAL;
-- Check new compression ratio
SELECT
column,
formatReadableSize(sum(column_data_compressed_bytes)) AS compressed,
round(sum(column_data_uncompressed_bytes) / sum(column_data_compressed_bytes), 1) AS ratio
FROM system.parts_columns
WHERE active AND database = 'analytics' AND table = 'events'
GROUP BY column ORDER BY sum(column_data_compressed_bytes) DESC;
-- Expire old data automatically (reduces storage)
ALTER TABLE analytics.events
MODIFY TTL created_at + INTERVAL 90 DAY;
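TTL expiry itself can be made cheap by dropping whole expired parts instead of rewriting them; a sketch, most effective when the partition key aligns with the TTL column:
-- Drop whole expired parts instead of rewriting them
-- (works best when the table is partitioned on the TTL column, e.g. toYYYYMM(created_at))
ALTER TABLE analytics.events
MODIFY SETTING ttl_only_drop_parts = 1;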
-- Move older data to a cheaper storage tier (requires a storage policy that defines 'hot' and 'cold' volumes)
ALTER TABLE analytics.events
MODIFY TTL
created_at + INTERVAL 30 DAY TO VOLUME 'hot',
created_at + INTERVAL 90 DAY TO VOLUME 'cold',
created_at + INTERVAL 365 DAY DELETE;
-- Drop entire partitions manually (fastest way to delete bulk data; assumes PARTITION BY toYYYYMM(created_at))
ALTER TABLE analytics.events
DROP PARTITION '202401'; -- Drops January 2024
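Before dropping, it helps to list partitions and their sizes to confirm exactly what will be removed:
-- List partitions with size and row counts before dropping
SELECT
    partition,
    formatReadableSize(sum(bytes_on_disk)) AS size,
    sum(rows) AS total_rows
FROM system.parts
WHERE active AND database = 'analytics' AND table = 'events'
GROUP BY partition
ORDER BY partition;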
-- Check TTL status
SELECT database, table, result_ttl_expression
FROM system.tables
WHERE database = 'analytics';
-- ClickHouse Cloud: Scale compute dynamically
-- Configure in Cloud Console:
-- - Auto-scaling: min 2 / max 8 replicas
-- - Idle timeout: 5 minutes (auto-suspend when no queries)
-- - Use "Development" tier for staging environments
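To pick sensible min/max replicas and an idle timeout, look at how query volume varies over the day; a sketch against system.query_log (the right thresholds depend entirely on your workload):
-- Query volume per hour over the last week (informs auto-scaling bounds and idle timeout)
SELECT
    toStartOfHour(event_time) AS hour,
    count() AS queries
FROM system.query_log
WHERE type = 'QueryFinish'
  AND event_time >= now() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour;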
-- Reduce per-query compute consumption
SET max_threads = 4; -- Use fewer cores per query
SET max_memory_usage = 5000000000; -- 5GB cap per query
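To make these limits stick without a SET in every session, they can be bundled into a settings profile and attached to the relevant users; the profile and user names below are illustrative.
-- Persist the limits via a settings profile (names are illustrative)
CREATE SETTINGS PROFILE IF NOT EXISTS low_cost
    SETTINGS max_threads = 4, max_memory_usage = 5000000000;
ALTER USER reporting_user SETTINGS PROFILE 'low_cost';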
-- Server-side async inserts (reduces insert compute)
SET async_insert = 1;
SET async_insert_max_data_size = 10000000; -- Flush at 10MB
SET async_insert_busy_timeout_ms = 5000; -- or every 5 seconds
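If only some writers produce high-frequency small inserts, the same settings can be applied per statement rather than session-wide. The column list below is assumed from the earlier examples, and wait_for_async_insert = 0 returns before the data is flushed, so use it only where that trade-off is acceptable.
-- Per-statement async insert (fire-and-forget; columns assumed from earlier examples)
INSERT INTO analytics.events (user_id, event_type, properties, created_at)
SETTINGS async_insert = 1, wait_for_async_insert = 0
VALUES (42, 'purchase', '{}', now());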
-- Find the most expensive queries (by data scanned)
SELECT
normalized_query_hash,
count() AS executions,
formatReadableSize(sum(read_bytes)) AS total_read,
round(avg(query_duration_ms)) AS avg_ms,
any(substring(query, 1, 200)) AS sample
FROM system.query_log
WHERE type = 'QueryFinish'
AND event_time >= now() - INTERVAL 7 DAY
GROUP BY normalized_query_hash
ORDER BY sum(read_bytes) DESC
LIMIT 20;
-- Use materialized views to avoid repeated full scans
-- Instead of: SELECT count() FROM events WHERE toDate(created_at) = today()
-- Pre-compute:
-- CREATE MATERIALIZED VIEW daily_counts_mv TO daily_counts AS
-- SELECT toDate(created_at) AS date, count() AS cnt FROM events GROUP BY date;
-- Then: SELECT cnt FROM daily_counts WHERE date = today()
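A minimal runnable version of the pre-aggregation above, assuming a SummingMergeTree target table (names as in the comment); summing on read covers parts that background merges have not collapsed yet.
-- Sketch: target table plus materialized view for the daily count
CREATE TABLE analytics.daily_counts
(
    date Date,
    cnt UInt64
)
ENGINE = SummingMergeTree
ORDER BY date;

CREATE MATERIALIZED VIEW analytics.daily_counts_mv TO analytics.daily_counts AS
SELECT toDate(created_at) AS date, count() AS cnt
FROM analytics.events
GROUP BY date;

-- Sum on read in case rows have not been merged yet
SELECT sum(cnt) AS cnt FROM analytics.daily_counts WHERE date = today();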
-- Use PREWHERE to read less data
SELECT user_id, properties FROM analytics.events
PREWHERE event_type = 'purchase' -- Evaluated first; remaining columns read only for matching rows
WHERE created_at >= today() - 7;
// Track query costs in your application
// (assumes @clickhouse/client >= 1.2, which exposes response_headers on the result set)
import type { ClickHouseClient } from '@clickhouse/client';

async function queryWithCostTracking<T>(
  client: ClickHouseClient,
  sql: string,
): Promise<{ rows: T[]; cost: { readRows: number; readBytes: number; durationMs: number } }> {
  const start = Date.now();
  const rs = await client.query({ query: sql, format: 'JSONEachRow' });
  const rows = await rs.json<T>();
  const durationMs = Date.now() - start;

  // The X-ClickHouse-Summary header carries read_rows / read_bytes for the query
  const rawSummary = rs.response_headers['x-clickhouse-summary'];
  const summary = typeof rawSummary === 'string' ? JSON.parse(rawSummary) : {};
  const cost = {
    readRows: Number(summary.read_rows ?? 0),
    readBytes: Number(summary.read_bytes ?? 0),
    durationMs,
  };

  // Log for cost analysis
  console.log({ query: sql.slice(0, 100), ...cost });

  return { rows, cost };
}
- max_threads limited for non-critical queries
- async_insert enabled for high-frequency small inserts
- system.query_log analysis

| Issue | Cause | Solution |
|---|---|---|
| Storage growing fast | No TTL, no drops | Add TTL or schedule partition drops |
| High compute bill | Full-scan queries | Add materialized views, fix ORDER BY |
| Egress charges | Large result sets | Add LIMIT, use aggregations |
| Idle compute cost | No auto-suspend | Enable idle timeout in Cloud console |
For architecture patterns, see clickhouse-reference-architecture.