Time-partitioned segment storage patterns for stupid-db. Covers mmap-backed segments, rotation, eviction, TTL, MessagePack format, and the segment lifecycle. Use when working on storage, segment management, or data retention.
Active (writing) → Sealed (read-only, mmap) → Archived (optional) → Evicted (deleted)
Active segments are written through SegmentWriter. Sealed segments are read through memmap2 for zero-copy access.

[segment file]
┌──────────────────────────┐
│ Header (magic + version) │
├──────────────────────────┤
│ Document 1 (msgpack) │
│ Document 2 (msgpack) │
│ ... │
│ Document N (msgpack) │
├──────────────────────────┤
│ Index (doc_id → offset) │
├──────────────────────────┤
│ Footer (index offset) │
└──────────────────────────┘
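The layout above can be sketched with the standard library alone. This is an illustration, not stupid-db's actual on-disk format: the magic bytes `SEGM`, the little-endian field widths, the length stored in each index entry, and the `u64` document IDs are all assumptions. Document payloads are treated as opaque bytes that would already be MessagePack-encoded.

```rust
use std::collections::HashMap;

// Hypothetical constants: the real magic bytes and version are not given here.
const MAGIC: &[u8; 4] = b"SEGM";
const VERSION: u32 = 1;
const HEADER_LEN: usize = 8; // 4-byte magic + 4-byte version
const FOOTER_LEN: usize = 8; // u64 index offset
const INDEX_ENTRY_LEN: usize = 24; // doc_id, offset, len stored as u64 each

/// Builds a segment image in memory: header, documents, index, footer.
pub fn build_segment(docs: &[(u64, Vec<u8>)]) -> Vec<u8> {
    let mut buf = Vec::new();
    buf.extend_from_slice(MAGIC);
    buf.extend_from_slice(&VERSION.to_le_bytes());

    // Append each document and remember where it landed.
    let mut index = Vec::with_capacity(docs.len());
    for (doc_id, payload) in docs {
        index.push((*doc_id, buf.len() as u64, payload.len() as u64));
        buf.extend_from_slice(payload);
    }

    // The index follows the documents; the footer records where it starts.
    let index_offset = buf.len() as u64;
    for (doc_id, offset, len) in index {
        buf.extend_from_slice(&doc_id.to_le_bytes());
        buf.extend_from_slice(&offset.to_le_bytes());
        buf.extend_from_slice(&len.to_le_bytes());
    }
    buf.extend_from_slice(&index_offset.to_le_bytes());
    buf
}

/// Reads a little-endian u64 at `at`, or None if out of bounds.
fn read_u64(bytes: &[u8], at: usize) -> Option<u64> {
    let mut raw = [0u8; 8];
    raw.copy_from_slice(bytes.get(at..at + 8)?);
    Some(u64::from_le_bytes(raw))
}

/// Validates the header, follows the footer, and loads doc_id -> (offset, len).
pub fn load_index(segment: &[u8]) -> Option<HashMap<u64, (usize, usize)>> {
    if segment.len() < HEADER_LEN + FOOTER_LEN || !segment.starts_with(MAGIC) {
        return None;
    }
    let mut version = [0u8; 4];
    version.copy_from_slice(&segment[4..8]);
    if u32::from_le_bytes(version) != VERSION {
        return None;
    }
    let index_end = segment.len() - FOOTER_LEN;
    let index_start = read_u64(segment, index_end)? as usize;
    if index_start < HEADER_LEN || index_start > index_end {
        return None;
    }
    let mut index = HashMap::new();
    for entry in segment[index_start..index_end].chunks_exact(INDEX_ENTRY_LEN) {
        let doc_id = read_u64(entry, 0)?;
        let offset = read_u64(entry, 8)? as usize;
        let len = read_u64(entry, 16)? as usize;
        index.insert(doc_id, (offset, len));
    }
    Some(index)
}

/// Returns a borrowed slice of the document bytes; nothing is copied.
pub fn get_doc<'a>(
    segment: &'a [u8],
    index: &HashMap<u64, (usize, usize)>,
    doc_id: u64,
) -> Option<&'a [u8]> {
    let (offset, len) = *index.get(&doc_id)?;
    segment.get(offset..offset.checked_add(len)?)
}
```

Because `get_doc` works on any `&[u8]`, the same lookup would apply unchanged to a memory-mapped file, which dereferences to a byte slice.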
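Documents are encoded as MessagePack with rmp-serde.

use std::fs::File;
use memmap2::MmapOptions;

let file = File::open(segment_path)?;
// SAFETY: the segment is sealed, so the file must not be modified or truncated while mapped.
let mmap = unsafe { MmapOptions::new().map(&file)? };
// &mmap[offset..offset + len] gives zero-copy access to the document bytes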
Why mmap: Sealed segments are read-only and potentially larger than RAM. mmap lets the OS manage page caching efficiently.
pub struct SegmentWriter {
    file: BufWriter<File>,
    index: HashMap<DocId, u64>, // doc_id → file offset
    doc_count: usize,
    created_at: DateTime<Utc>,
}

impl SegmentWriter {
    pub fn write_document(&mut self, doc: &Document) -> Result<()>;
    pub fn seal(self) -> Result<SealedSegment>;
    pub fn should_rotate(&self, config: &SegmentConfig) -> bool;
}
pub struct SegmentReader {
    mmap: Mmap,
    index: HashMap<DocId, u64>,
    time_range: (DateTime<Utc>, DateTime<Utc>),
}

impl SegmentReader {
    pub fn get(&self, doc_id: &DocId) -> Result<Option<Document>>;
    pub fn scan(&self, filter: &Filter) -> Result<Vec<Document>>;
    pub fn scan_range(&self, start: DateTime<Utc>, end: DateTime<Utc>) -> Result<Vec<Document>>;
}
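A time-range scan only needs to open segments whose `time_range` overlaps the query window. The sketch below shows that pruning step. It assumes inclusive bounds and uses plain `u64` Unix timestamps in place of `DateTime<Utc>` to stay dependency-free; `SegmentMeta` and `segments_for_range` are hypothetical names, not part of the API above.

```rust
/// Hypothetical per-segment metadata; timestamps are Unix seconds, inclusive.
#[derive(Debug, Clone, PartialEq)]
pub struct SegmentMeta {
    pub id: u64,
    pub min_ts: u64,
    pub max_ts: u64,
}

impl SegmentMeta {
    /// Two closed intervals overlap when each one starts before the other ends.
    pub fn overlaps(&self, start: u64, end: u64) -> bool {
        self.min_ts <= end && start <= self.max_ts
    }
}

/// Returns the IDs of the segments worth scanning for [start, end].
pub fn segments_for_range(segments: &[SegmentMeta], start: u64, end: u64) -> Vec<u64> {
    segments
        .iter()
        .filter(|s| s.overlaps(start, end))
        .map(|s| s.id)
        .collect()
}
```

Segments that fail the overlap check are never mapped, so the cost of a narrow query depends on the number of matching segments rather than on total data size.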
pub struct SegmentRotator {
    config: SegmentConfig, // max_age, max_size
}

impl SegmentRotator {
    pub fn should_rotate(&self, writer: &SegmentWriter) -> bool;
    pub fn rotate(&self, writer: SegmentWriter) -> Result<(SealedSegment, SegmentWriter)>;
}
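The rotation check can be sketched as a pure function over the writer's age and size. The document names `max_age` and `max_size` but does not say how they combine, so treating either limit as sufficient is an assumption here. `RotationLimits` and `WriterStats` are hypothetical types introduced only for this example.

```rust
use std::time::Duration;

/// Hypothetical subset of SegmentConfig relevant to rotation.
pub struct RotationLimits {
    pub max_segment_age: Duration,
    pub max_segment_size: usize,
}

/// Hypothetical snapshot of the active writer's state.
pub struct WriterStats {
    pub age: Duration,
    pub bytes_written: usize,
}

/// Assumption: reaching either limit is enough to trigger rotation.
pub fn should_rotate(stats: &WriterStats, limits: &RotationLimits) -> bool {
    stats.age >= limits.max_segment_age || stats.bytes_written >= limits.max_segment_size
}
```

Keeping the decision separate from the file handling makes it cheap to evaluate after every write.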
pub struct SegmentEvictor {
    retention: Duration, // 15-30 days
}

impl SegmentEvictor {
    // Returns IDs of evicted segments for vector index + graph cleanup
    pub fn evict_expired(&self, segments: &mut Vec<SealedSegment>) -> Vec<SegmentId>;
}
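When a segment is evicted, cleanup cascades to the vector index and the graph: the segment IDs returned by evict_expired identify which entries to remove.
This is why edges in the graph carry segment metadata: it enables efficient pruning.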
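A minimal sketch of eviction followed by graph pruning, using only the standard library. It assumes `SegmentId` is a `u64`, that segment age is measured from a `sealed_at` timestamp, and that each edge stores the ID of the segment it came from. The `Edge` type and `prune_edges` function are hypothetical, and file deletion is left out.

```rust
use std::collections::HashSet;
use std::time::{Duration, SystemTime};

pub type SegmentId = u64; // assumption: the real SegmentId type is not shown

/// Simplified stand-in for a sealed segment.
pub struct SealedSegment {
    pub id: SegmentId,
    pub sealed_at: SystemTime,
}

/// Hypothetical graph edge tagged with the segment that produced it.
pub struct Edge {
    pub from: u64,
    pub to: u64,
    pub segment_id: SegmentId,
}

/// Drops segments older than `retention` and returns their IDs.
pub fn evict_expired(
    segments: &mut Vec<SealedSegment>,
    retention: Duration,
    now: SystemTime,
) -> Vec<SegmentId> {
    let mut evicted = Vec::new();
    segments.retain(|seg| {
        // duration_since fails if sealed_at is in the future; treat that as age zero.
        let age = now
            .duration_since(seg.sealed_at)
            .unwrap_or(Duration::from_secs(0));
        let expired = age > retention;
        if expired {
            evicted.push(seg.id);
        }
        !expired
    });
    evicted
}

/// Cascade step: remove every edge that came from an evicted segment.
pub fn prune_edges(edges: &mut Vec<Edge>, evicted: &[SegmentId]) {
    let evicted: HashSet<SegmentId> = evicted.iter().copied().collect();
    edges.retain(|e| !evicted.contains(&e.segment_id));
}
```

With the segment ID stored on each edge, pruning is a single pass with set lookups and never needs to re-read the deleted segment.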
pub struct SegmentConfig {
    pub max_segment_age: Duration,    // default: 1 hour
    pub max_segment_size: usize,      // default: 100 MB
    pub retention_days: u32,          // default: 30
    pub compression: CompressionType, // zstd level
    pub data_dir: PathBuf,
}
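The documented defaults could be captured in a `Default` implementation, sketched below. Several details are assumptions: the `CompressionType` variants, the zstd level of 3, the placeholder `data_dir`, and reading "100MB" as 100 MiB. The `retention` helper is also hypothetical.

```rust
use std::path::PathBuf;
use std::time::Duration;

/// Hypothetical stand-in: the real CompressionType variants are not shown.
#[derive(Debug, Clone, PartialEq)]
pub enum CompressionType {
    None,
    Zstd { level: i32 },
}

#[derive(Debug, Clone)]
pub struct SegmentConfig {
    pub max_segment_age: Duration,
    pub max_segment_size: usize,
    pub retention_days: u32,
    pub compression: CompressionType,
    pub data_dir: PathBuf,
}

impl Default for SegmentConfig {
    fn default() -> Self {
        SegmentConfig {
            max_segment_age: Duration::from_secs(60 * 60), // 1 hour
            max_segment_size: 100 * 1024 * 1024,           // 100 MB, read here as MiB
            retention_days: 30,
            compression: CompressionType::Zstd { level: 3 }, // level is an assumption
            data_dir: PathBuf::from("data/segments"),        // placeholder path
        }
    }
}

impl SegmentConfig {
    /// Converts retention_days into the Duration an evictor would compare against.
    pub fn retention(&self) -> Duration {
        Duration::from_secs(u64::from(self.retention_days) * 86_400)
    }
}
```

Callers can then override single fields with struct update syntax, for example `SegmentConfig { retention_days: 15, ..Default::default() }`.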
| Operation | Complexity | Notes |
|---|---|---|
| Write document | O(1) amortized | Append to file |
| Read by doc_id | O(1) | Index lookup + mmap read |
| Scan with filter | O(n) per segment | Full scan, filter in memory |
| Rotation | O(1) | Close file, open new |
| Eviction | O(1) per segment | Delete file |
| Time range scan | O(segments) | Check time range, scan matching |
Recent architectural decision: segments are read sequentially rather than in parallel to limit memory usage. When a scan spans many segments, parallel reads can cause memory spikes because many mmap regions fault pages in at the same time.
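The sequential approach can be sketched as a loop that opens one segment, filters it, and releases it before moving to the next. `scan_sequential` and its `open` and `keep` callbacks are hypothetical; whether stupid-db structures its scan loop this way is an assumption.

```rust
/// Scans segments one at a time so that at most one segment's documents
/// are resident at any moment. `open` stands in for mapping and decoding
/// a segment; `keep` is the filter predicate.
pub fn scan_sequential<S, D, O, P>(sources: &[S], open: O, keep: P) -> Vec<D>
where
    O: Fn(&S) -> Vec<D>,
    P: Fn(&D) -> bool,
{
    let mut hits = Vec::new();
    for source in sources {
        // Only this segment is loaded at this point.
        let docs = open(source);
        // `docs` is consumed here, so it is freed before the next segment is opened.
        hits.extend(docs.into_iter().filter(|d| keep(d)));
    }
    hits
}
```

The trade-off is latency: a scan takes roughly the sum of the per-segment times instead of the maximum, in exchange for a working set bounded by a single segment plus the matching results.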