Archivo del skill

ClickHouse Schema Design (Core Workflow A)

Name: ClickHouse Schema Design (Core Workflow A)
Author: jeremylongshore

Design ClickHouse schemas with MergeTree engines, ORDER BY keys, and partitioning. Use when creating new tables, choosing engines, designing sort keys, or modeling data for analytical workloads. Trigger: "clickhouse schema design", "clickhouse table design", "clickhouse ORDER BY", "clickhouse partitioning", "MergeTree table".

jeremylongshore1,965 estrellas22 mar 2026

Ocupación
Categorías: Ingeniería de Datos

Contenido de la habilidad

Overview

Design ClickHouse tables with correct engine selection, ORDER BY keys, partitioning, and codec choices for analytical workloads.

Prerequisites

@clickhouse/client connected (see clickhouse-install-auth)
Understanding of your query patterns (what you filter and group on)

Instructions

Step 1: Choose the Right Engine

Engine	Best For	Dedup?	Example
`MergeTree`	General analytics, append-only logs	No	Clickstream, IoT
`ReplacingMergeTree`	Mutable rows (upserts)	Yes (on merge)	User profiles, state
`SummingMergeTree`

Skills relacionados

ClickHouse Schema Design (Core Workflow A) | Skills Pool

-- Good: filter by tenant, then by time ranges
ORDER BY (tenant_id, event_type, created_at)

-- Bad: UUID first means every query scans the full index
ORDER BY (event_id, created_at)  -- event_id is random UUID

CREATE TABLE analytics.events (
    event_id     UUID DEFAULT generateUUIDv4(),
    tenant_id    UInt32,
    event_type   LowCardinality(String),
    user_id      UInt64,
    session_id   String,
    properties   String CODEC(ZSTD(3)),  -- JSON blob, compress well
    url          String CODEC(ZSTD(1)),
    ip_address   IPv4,
    country      LowCardinality(FixedString(2)),
    created_at   DateTime64(3) DEFAULT now64(3)
)
ENGINE = MergeTree()
ORDER BY (tenant_id, event_type, toDate(created_at), user_id)
PARTITION BY toYYYYMM(created_at)
TTL created_at + INTERVAL 1 YEAR
SETTINGS index_granularity = 8192;

CREATE TABLE analytics.users (
    user_id      UInt64,
    email        String,
    plan         LowCardinality(String),
    mrr_cents    UInt32,
    properties   String CODEC(ZSTD(3)),
    updated_at   DateTime DEFAULT now()
)
ENGINE = ReplacingMergeTree(updated_at)   -- keeps latest row per ORDER BY key
ORDER BY user_id;

-- Query with FINAL to get deduplicated results
SELECT * FROM analytics.users FINAL WHERE user_id = 42;

CREATE TABLE analytics.daily_stats (
    date         Date,
    tenant_id    UInt32,
    event_type   LowCardinality(String),
    event_count  UInt64,
    unique_users AggregateFunction(uniq, UInt64)
)
ENGINE = AggregatingMergeTree()
ORDER BY (tenant_id, event_type, date);

Partition Expression	Typical Use	Parts Per Partition
`toYYYYMM(date)`	Most common — monthly	Target 10-1000
`toMonday(date)`	Weekly rollups	More parts, finer drops
`toYYYYMMDD(date)`	Daily TTL drops	Many parts — use carefully
None	Small tables (<1M rows)	Fine

-- Column-level compression codecs
column1  UInt64 CODEC(Delta, ZSTD(3)),      -- Time series / sequential IDs
column2  Float64 CODEC(Gorilla, ZSTD(1)),   -- Floating point (similar values)
column3  String CODEC(ZSTD(3)),              -- General text / JSON
column4  DateTime CODEC(DoubleDelta, ZSTD),  -- Timestamps (near-sequential)

import { createClient } from '@clickhouse/client';

const client = createClient({ url: process.env.CLICKHOUSE_HOST! });

async function applySchema() {
  await client.command({ query: 'CREATE DATABASE IF NOT EXISTS analytics' });

  await client.command({
    query: `
      CREATE TABLE IF NOT EXISTS analytics.events (
        event_id   UUID DEFAULT generateUUIDv4(),
        tenant_id  UInt32,
        event_type LowCardinality(String),
        user_id    UInt64,
        payload    String CODEC(ZSTD(3)),
        created_at DateTime DEFAULT now()
      )
      ENGINE = MergeTree()
      ORDER BY (tenant_id, event_type, created_at)
      PARTITION BY toYYYYMM(created_at)
    `,
  });

  console.log('Schema applied.');
}

Error	Cause	Solution
`ORDER BY expression not in primary key`	PRIMARY KEY != ORDER BY	Remove explicit PRIMARY KEY or align
`Too many parts (300+)`	Over-partitioning	Use coarser partition expression
`Cannot convert String to UInt64`	Wrong data type	Match insert types to schema
`TTL expression type mismatch`	TTL on non-date column	TTL must reference DateTime column

ClickHouse Schema Design (Core Workflow A)

Overview

Prerequisites

Instructions

Step 1: Choose the Right Engine

ClickHouse Schema Design (Core Workflow A)

Overview

Prerequisites

Instructions

Step 1: Choose the Right Engine

Step 2: Design the ORDER BY (Sort Key)

Step 3: Schema Examples

Event Analytics Table

User Profile Table (Upserts)

Daily Aggregation Table

Step 4: Partitioning Guidelines

Step 5: Codecs and Compression

Applying Schema via Node.js

Error Handling

Resources

Next Steps

Clickhouse Io

Clickhouse Io

Claude Devfleet

Clickhouse Io

Ai First Engineering

Postgres Patterns