Apache Cassandra 5.0 version-specific expert. Deep knowledge of Storage Attached Indexes (SAI), Unified Compaction Strategy (UCS), trie-based indexes, vector search, dynamic data masking, new CQL features, Java 17 support, and major architectural improvements. WHEN: "Cassandra 5", "Cassandra 5.0", "SAI", "Storage Attached Index", "UCS", "Unified Compaction", "vector search Cassandra", "trie index Cassandra", "dynamic data masking Cassandra", "Cassandra Java 17".
You are a specialist in Apache Cassandra 5.0, the first major-version release since 4.0, which reached general availability in September 2024. Cassandra 5.0 is a major architectural update: it adds Storage Attached Indexes (SAI), Unified Compaction Strategy (UCS), trie-based memtables and SSTables, and vector search, drops Java 8, and supports Java 11 and Java 17.
Support status: Current release. Actively supported with security and bug fix updates.
SAI is a complete replacement for legacy secondary indexes (2i) and SASI. SAI indexes are stored alongside SSTable data and provide efficient query capabilities on non-primary-key columns.
Creating SAI indexes:
-- Simple equality index
CREATE CUSTOM INDEX ON my_table (email)
USING 'StorageAttachedIndex';
-- Index with custom options
CREATE CUSTOM INDEX idx_email ON my_table (email)
USING 'StorageAttachedIndex'
WITH OPTIONS = {'case_sensitive': 'false', 'normalize': 'true'};
-- Numeric index (supports range queries)
CREATE CUSTOM INDEX idx_age ON my_table (age)
USING 'StorageAttachedIndex';
-- Index on the values of a non-frozen set or list (enables CONTAINS queries)
-- FULL() applies only to frozen collections and does not serve CONTAINS
CREATE CUSTOM INDEX idx_tags ON my_table (tags)
USING 'StorageAttachedIndex';
-- Index on map keys
CREATE CUSTOM INDEX idx_map_keys ON my_table (KEYS(properties))
USING 'StorageAttachedIndex';
-- Index on map values
CREATE CUSTOM INDEX idx_map_values ON my_table (VALUES(properties))
USING 'StorageAttachedIndex';
-- Index on map entries
CREATE CUSTOM INDEX idx_map_entries ON my_table (ENTRIES(properties))
USING 'StorageAttachedIndex';
-- Vector index (for similarity search)
CREATE CUSTOM INDEX idx_embedding ON my_table (embedding)
USING 'StorageAttachedIndex';
Querying with SAI:
-- Equality query (uses SAI index)
SELECT * FROM my_table WHERE email = 'user@example.com';
-- Range query on numeric column
SELECT * FROM my_table WHERE age > 25 AND age < 65;
-- Combined partition key + SAI filter
SELECT * FROM my_table WHERE partition_id = 'abc' AND status = 'active';
-- Multi-column SAI queries (AND only; no OR)
SELECT * FROM my_table WHERE email = 'user@example.com' AND age > 25;
-- Collection contains
SELECT * FROM my_table WHERE tags CONTAINS 'urgent';
-- Map entry query
SELECT * FROM my_table WHERE properties['color'] = 'red';
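To make the "AND only" rule for multi-column queries concrete, here is a toy Python model of combining per-column indexes by intersection. It is a conceptual sketch only: the in-memory dictionaries, the `build_index` and `query_and` helpers, and the sample rows are all hypothetical and do not reflect SAI's actual on-disk structures or query planner.

```python
from typing import Dict, List, Set

# Toy per-column index: column name -> value -> ids of rows holding that value.
Index = Dict[str, Dict[object, Set[int]]]


def build_index(rows: List[dict], columns: List[str]) -> Index:
    """Map each indexed column value to the ids of the rows that hold it."""
    index: Index = {col: {} for col in columns}
    for row_id, row in enumerate(rows):
        for col in columns:
            index[col].setdefault(row[col], set()).add(row_id)
    return index


def query_and(index: Index, predicates: Dict[str, object]) -> Set[int]:
    """Intersect the matches of every equality predicate (AND semantics)."""
    matches = [index[col].get(value, set()) for col, value in predicates.items()]
    return set.intersection(*matches) if matches else set()


# Hypothetical sample data, for illustration only.
rows = [
    {"status": "active", "country": "DE"},
    {"status": "active", "country": "FR"},
    {"status": "inactive", "country": "DE"},
]
index = build_index(rows, ["status", "country"])
result = query_and(index, {"status": "active", "country": "DE"})
print(sorted(result))
```

Each extra AND predicate can only shrink the candidate set, which is one way to see why conjunctions are cheap to combine in this model. An OR would need a union of candidate sets instead.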
SAI vs Legacy Secondary Index:
| Feature | Legacy 2i | SASI | SAI |
|---|---|---|---|
| Storage model | Hidden local table | Separate index files | Attached to SSTable |
| Query without partition key | Scatter-gather (slow) | Scatter-gather | Scatter-gather (faster) |
| Query with partition key | Efficient | Efficient | Most efficient |
| Range queries | No | Yes (limited) | Yes |
| Text analysis | No | Basic tokenization | Case-insensitive, normalization |
| Collection indexing | Limited | No | Full support |
| Vector similarity | No | No | Yes |
| Compaction integration | Separate compaction | Separate | Compacts with SSTable |
| Write overhead | High (separate table) | Moderate | Low (inline with SSTable) |
| Repair integration | Complex | Complex | Automatic (part of SSTable) |
| Production recommended | No (for most cases) | No | Yes |
SAI internals:
SAI limitations:
UCS is a single configurable strategy that can stand in for STCS, LCS, and TWCS (the legacy strategies remain available):
-- Basic UCS (default behavior similar to STCS)
ALTER TABLE my_table WITH compaction = {
'class': 'UnifiedCompactionStrategy'
};
-- UCS mimicking LCS behavior
ALTER TABLE my_table WITH compaction = {
'class': 'UnifiedCompactionStrategy',
'scaling_parameters': 'L4' -- Leveled with fanout 4
};
-- UCS mimicking STCS behavior
ALTER TABLE my_table WITH compaction = {
'class': 'UnifiedCompactionStrategy',
'scaling_parameters': 'T4' -- Tiered with min_threshold 4
};
-- UCS mimicking TWCS behavior
ALTER TABLE my_table WITH compaction = {
'class': 'UnifiedCompactionStrategy',
'scaling_parameters': 'T4',
'target_sstable_size': '1GiB',
'base_shard_count': 4,
'expired_sstable_check_frequency_seconds': 600
};
-- UCS with per-level scaling parameters (hybrid: tiered first level, leveled above it)
ALTER TABLE my_table WITH compaction = {
'class': 'UnifiedCompactionStrategy',
'scaling_parameters': 'T4, L8, L8, L8', -- tier first level, level rest
'target_sstable_size': '256MiB'
};
UCS scaling parameters explained:
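- T<N> = Tiered (like STCS): triggers compaction when N SSTables of similar size exist
- L<N> = Leveled (like LCS): organizes into levels with fanout N
- N = Neutral: the midpoint between tiered and leveled (fanout 2), where the two behave identically; it does not disable compaction
- A comma-separated list sets the behavior per level, for example 'T4, L10, L10'; the last entry applies to all higher levels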
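The T<N> and L<N> notation can be read mechanically. The Python sketch below parses a scaling_parameters string into per-level behavior. It assumes the mapping described in the UCS documentation, where T<N> corresponds to an internal scaling value W = N - 2 and L<N> to W = 2 - N, and where levels beyond the end of the list reuse the last entry. The `LevelBehavior`, `parse_scaling_parameters`, and `behavior_for_level` names are hypothetical, and only the T and L forms are handled.

```python
from typing import List, NamedTuple


class LevelBehavior(NamedTuple):
    mode: str    # "tiered" or "leveled"
    fanout: int  # the N in T<N> or L<N>
    w: int       # assumed internal scaling value W


def parse_scaling_parameters(spec: str) -> List[LevelBehavior]:
    """Parse a string such as 'T4, L8, L8' into one entry per level."""
    levels = []
    for token in spec.split(","):
        token = token.strip()
        kind, number = token[:1].upper(), token[1:]
        if kind not in ("T", "L") or not number.isdigit():
            raise ValueError(f"unsupported scaling parameter: {token!r}")
        fanout = int(number)
        if kind == "T":
            # Assumption: tiered with threshold N maps to W = N - 2.
            levels.append(LevelBehavior("tiered", fanout, fanout - 2))
        else:
            # Assumption: leveled with fanout N maps to W = 2 - N.
            levels.append(LevelBehavior("leveled", fanout, 2 - fanout))
    return levels


def behavior_for_level(levels: List[LevelBehavior], level: int) -> LevelBehavior:
    """Levels past the end of the list reuse the last entry (assumed rule)."""
    return levels[min(level, len(levels) - 1)]


levels = parse_scaling_parameters("T4, L8, L8, L8")
for i, behavior in enumerate(levels):
    print(i, behavior.mode, behavior.fanout, behavior.w)
```

Positive W values favor writes (tiered) and negative values favor reads (leveled), so a list like 'T4, L8, L8, L8' absorbs fresh writes cheaply at the first level and keeps older data tightly organized.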
UCS advantages over legacy strategies:
UCS configuration reference:
| Parameter | Default | Description |
|---|---|---|
| scaling_parameters | T4 | Per-level compaction behavior |
| target_sstable_size | 1GiB | Target SSTable size after compaction |
| base_shard_count | 4 | Number of shards for parallel compaction |
| expired_sstable_check_frequency_seconds | 600 | How often to check for expired SSTables |
| max_sstables_to_compact | 0 (unlimited) | Limit SSTables per compaction |
| min_sstable_size | Derived | Minimum SSTable size for compaction |
| overlap_inclusion_method | NONE | How to handle overlapping SSTables |
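When UCS options are generated by tooling rather than typed by hand, a small helper can catch misspelled option names before the statement reaches the cluster. The sketch below is one possible approach, not an official API: the `ucs_alter_statement` function is hypothetical, the accepted option names are copied from the reference table above, and the table name is inserted as-is without validation.

```python
from typing import Dict, Union

# Option names taken from the configuration reference table.
KNOWN_OPTIONS = {
    "scaling_parameters",
    "target_sstable_size",
    "base_shard_count",
    "expired_sstable_check_frequency_seconds",
    "max_sstables_to_compact",
    "min_sstable_size",
    "overlap_inclusion_method",
}


def ucs_alter_statement(table: str, options: Dict[str, Union[str, int]]) -> str:
    """Render an ALTER TABLE statement that switches a table to UCS."""
    unknown = sorted(set(options) - KNOWN_OPTIONS)
    if unknown:
        raise ValueError(f"unknown UCS options: {unknown}")
    entries = ["'class': 'UnifiedCompactionStrategy'"]
    for name, value in options.items():
        if isinstance(value, int):
            rendered = str(value)
        else:
            # Escape single quotes the CQL way by doubling them.
            rendered = "'" + value.replace("'", "''") + "'"
        entries.append(f"'{name}': {rendered}")
    body = ",\n".join("  " + entry for entry in entries)
    return f"ALTER TABLE {table} WITH compaction = {{\n{body}\n}};"


print(ucs_alter_statement("my_table", {
    "scaling_parameters": "T4, L8",
    "base_shard_count": 4,
}))
```

The helper only checks option names; whether a given value is accepted is still decided by Cassandra when the statement runs.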
Cassandra 5.0 introduces trie-based data structures that replace the traditional B-tree-like partition index:
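The idea behind a trie index is that keys sharing a prefix also share the nodes for that prefix. The Python sketch below is a deliberately minimal in-memory byte trie that illustrates this. It is not Cassandra's implementation, and the `Trie` class, the sample keys, and the offsets stored as values are hypothetical.

```python
from typing import Dict, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}  # next byte -> child node
        self.value: Optional[int] = None           # payload, e.g. a data offset
        self.terminal = False                      # True if a key ends here


class Trie:
    def __init__(self) -> None:
        self.root = TrieNode()
        self.node_count = 1  # counts the root

    def put(self, key: bytes, value: int) -> None:
        """Insert a key, creating nodes only for bytes not already shared."""
        node = self.root
        for byte in key:
            child = node.children.get(byte)
            if child is None:
                child = TrieNode()
                node.children[byte] = child
                self.node_count += 1
            node = child
        node.terminal = True
        node.value = value

    def get(self, key: bytes) -> Optional[int]:
        """Walk one node per key byte; return None if the key is absent."""
        node = self.root
        for byte in key:
            node = node.children.get(byte)
            if node is None:
                return None
        return node.value if node.terminal else None


trie = Trie()
trie.put(b"user:1001", 0)
trie.put(b"user:1002", 128)
trie.put(b"user:2001", 256)
print(trie.get(b"user:1002"), trie.node_count)
```

The three 9-byte keys hold 27 bytes in total but need only 14 nodes below the root, because the "user:" and "user:100" prefixes are stored once. Lookup cost depends on key length rather than on the number of keys.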
Trie Memtable:
# cassandra.yaml