Use when creating benchmark scenarios for new openzl codec nodes in unitBench - adding kernel-level encode/decode benchmarks or graph-level compress/decompress benchmarks for codecs like bitsplit, delta, transpose, entropy, etc.
Create benchmark scenarios for openzl codec nodes. Two benchmark types exist: kernel (test encode/decode kernel functions directly) and graph (test a node within a full compress/decompress pipeline).
Before creating scenarios, ask the user:
- Does this node have a standalone kernel function (e.g., ZL_bitSplitEncode, ZL_bitSplitDecode)?
- Should the node be tested in a graph?
- What data types/widths does the node operate on?
| What | Where |
|---|---|
| Kernel benchmarks | benchmark/unitBench/scenarios/codecs/<name>.c and .h |
| Graph benchmarks | benchmark/unitBench/scenarios/<name>_graph.c and .h |
| Scenario registration | benchmark/unitBench/benchList.h |
| BUCK file | benchmark/unitBench/BUCK |
| Test data | /tmp/ (use dd if=/dev/urandom) |
All paths relative to the openzl dev root.
Test encode/decode kernel functions directly. Requires a standalone kernel API.
Header (`.h`): Add declarations to an existing `scenarios/codecs/<codec>.h` or create a new one:

```c
// Decode
size_t <codec>Decode_<type>_prep(void* src, size_t srcSize, const BenchPayload* bp);
size_t <codec>Decode_<type>_outSize(const void* src, size_t srcSize);
size_t <codec>Decode_<type>_wrapper(const void* src, size_t srcSize, void* dst, size_t dstCapacity, void* customPayload);

// Encode
size_t <codec>Encode_<type>_prep(void* src, size_t srcSize, const BenchPayload* bp);
size_t <codec>Encode_<type>_outSize(const void* src, size_t srcSize);
size_t <codec>Encode_<type>_wrapper(const void* src, size_t srcSize, void* dst, size_t dstCapacity, void* customPayload);
```
Source (`.c`):
- Decode scenario: `prep` packs split streams contiguously into `src`; `wrapper` recomputes the stream pointers and calls the decode kernel; `outSize` returns `(srcSize / sumSrcElt) * dstEltWidth`.
- Encode scenario: `prep` fills `src` with random values; `wrapper` calls the encode kernel, writing streams contiguously into `dst`; `outSize` returns `(srcSize / srcEltWidth) * sumDstElt`.

Reference implementation: see `scenarios/codecs/bitSplit.c` for the complete pattern with multiple data type examples.
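To make the prep/wrapper/outSize triad concrete, here is a minimal self-contained sketch. It is not a real openzl scenario: the "kernel" is a plain memcpy standing in for an actual encode kernel, and the `BenchPayload` struct is a hypothetical stand-in for unitBench's real one. Only the three-function shape mirrors the pattern described above.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for unitBench's BenchPayload. */
typedef struct BenchPayload { int unused; } BenchPayload;

/* prep: fill src with pseudo-random input; returns the prepared size. */
size_t toyEncode_u32_prep(void* src, size_t srcSize, const BenchPayload* bp)
{
    (void)bp;
    unsigned char* p = (unsigned char*)src;
    for (size_t i = 0; i < srcSize; i++)
        p[i] = (unsigned char)(rand() & 0xFF);
    return srcSize;
}

/* outSize: (srcSize / srcEltWidth) * sumDstElt. The toy identity codec
 * keeps 4 output bytes per 4-byte element, so output equals input size. */
size_t toyEncode_u32_outSize(const void* src, size_t srcSize)
{
    (void)src;
    const size_t srcEltWidth = 4;
    const size_t sumDstElt   = 4;
    return (srcSize / srcEltWidth) * sumDstElt;
}

/* wrapper: call the "kernel" (memcpy here) and return bytes written. */
size_t toyEncode_u32_wrapper(
        const void* src, size_t srcSize, void* dst, size_t dstCapacity, void* customPayload)
{
    (void)customPayload;
    const size_t outSize = toyEncode_u32_outSize(src, srcSize);
    if (outSize > dstCapacity)
        return 0;
    memcpy(dst, src, outSize);
    return outSize;
}
```

A real scenario replaces the memcpy with the codec's kernel call and, on the decode side, has `prep` lay out the split streams contiguously so `wrapper` can recompute their pointers.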
Test a node within a full compress/decompress graph. Required when no standalone kernel exists. Also useful alongside kernel benchmarks to measure graph overhead.
Header (`.h`):

```c
// Copyright (c) Meta Platforms, Inc. and affiliates.
#ifndef GUARD_MACRO_H
#define GUARD_MACRO_H

#include "openzl/shared/portability.h"
#include "openzl/zl_compressor.h"

ZL_BEGIN_C_DECLS

ZL_GraphID <name>_graph(ZL_Compressor* cgraph);

ZL_END_C_DECLS

#endif
```
Source (`.c`): Build the graph using `ZL_Compressor_registerStaticGraph_fromNode1o`. Typical pattern: tokenize input -> apply node -> downstream graph.

```c
#include "openzl/codecs/zl_<codec>.h" // ZL_NODE_<YOUR_NODE>
#include "openzl/zl_compressor.h"
#include "openzl/zl_public_nodes.h" // ZL_NODE_INTERPRET_AS_LE*

ZL_GraphID my_graph(ZL_Compressor* cgraph)
{
    if (ZL_isError(ZL_Compressor_setParameter(
                cgraph, ZL_CParam_formatVersion, ZL_MAX_FORMAT_VERSION))) {
        abort();
    }
    if (ZL_isError(ZL_Compressor_setParameter(
                cgraph, ZL_CParam_compressionLevel, 1))) {
        abort();
    }
    return ZL_Compressor_registerStaticGraph_fromNode1o(
            cgraph,
            ZL_NODE_INTERPRET_AS_LE64, // tokenizer matching element width
            ZL_Compressor_registerStaticGraph_fromNode1o(
                    cgraph, ZL_NODE_YOUR_NODE, ZL_GRAPH_STORE));
    // ZL_GRAPH_STORE to benchmark the node in isolation
    // ZL_GRAPH_ZSTD to benchmark with compression
}
```
Tokenizer node must match element width: ZL_NODE_INTERPRET_AS_LE16 (2 bytes), ZL_NODE_INTERPRET_AS_LE32 (4 bytes), ZL_NODE_INTERPRET_AS_LE64 (8 bytes).
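The width-to-tokenizer rule above can be captured in a small lookup. The helper below is hypothetical (not part of openzl's API); the string names simply mirror the nodes listed above, so a scenario author can sanity-check the pairing:

```c
#include <stddef.h>

/* Hypothetical helper (not part of openzl): map an element width to the
 * name of the matching ZL_NODE_INTERPRET_AS_LE* tokenizer node. */
const char* tokenizerNameForWidth(size_t eltWidth)
{
    switch (eltWidth) {
    case 2: return "ZL_NODE_INTERPRET_AS_LE16";
    case 4: return "ZL_NODE_INTERPRET_AS_LE32";
    case 8: return "ZL_NODE_INTERPRET_AS_LE64";
    default: return NULL; /* no listed tokenizer for this width */
    }
}
```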
Reference: See scenarios/sao_graph.c for a complex multi-stream graph example.
In `benchList.h`, add the header include at the top of the file, then add entries to the `scenarioList[]` array (maintain alphabetical order):

```c
#include "benchmark/unitBench/scenarios/<name>_graph.h"

// Kernel scenarios (set .func via the first positional arg)
{ "<codec>Decode_<type>", <codec>Decode_<type>_wrapper, .prep = <codec>Decode_<type>_prep, .outSize = <codec>Decode_<type>_outSize },
{ "<codec>Encode_<type>", <codec>Encode_<type>_wrapper, .prep = <codec>Encode_<type>_prep, .outSize = <codec>Encode_<type>_outSize },

// Graph scenario (set .graphF - the harness auto-wires init and compression)
{ "<graphName>", .graphF = <name>_graph },
```
Add a library target for graph benchmarks (following the sao_graph pattern):

```
zs_library(
    name = "<name>_graph",
    srcs = ["scenarios/<name>_graph.c"],
    headers = ["scenarios/<name>_graph.h"],
    deps = [
        "../..:zstronglib",
    ],
)
```
Kernel .c/.h files are auto-included by the unitBench binary's glob(["**/*.c"]).
Test data size must be a multiple of the element width for the codec/node being tested. For example, fp64 (8-byte elements) needs a file size divisible by 8. Using standard sizes like 1MB/10MB works for all common element widths.
```sh
# Generate test data (use sizes that are multiples of the element width)
mkdir -p /tmp/openzl_bench
dd if=/dev/urandom of=/tmp/openzl_bench/test_1MB.bin bs=1M count=1
dd if=/dev/urandom of=/tmp/openzl_bench/test_10MB.bin bs=1M count=10

# Build with BUCK in opt mode (best practice - optimized, no ASAN)
buck build @//mode/opt //openzl/dev/benchmark/unitBench:unitBench

# Run a benchmark via buck run
buck run @//mode/opt //openzl/dev/benchmark/unitBench:unitBench -- <scenarioName> /tmp/openzl_bench/test_10MB.bin

# Useful options (after the -- separator)
#   -i <seconds>  benchmark duration (default ~2s)
#   -B <bytes>    split input into blocks
#   --csv         CSV output for parsing
#   -z            compression only (skip decompression round-trip)

# List all scenarios
buck run @//mode/opt //openzl/dev/benchmark/unitBench:unitBench -- --list
```
Always use buck build/run @//mode/opt for benchmarking. If buck is not available, fall back to make unitBench (from the openzl dev root).
Key points:
- Decode `outSize` returns `(srcSize / sumSrcElt) * dstEltWidth`; encode returns `(srcSize / srcEltWidth) * sumDstElt`.
- Set `ZL_CParam_formatVersion` to `ZL_MAX_FORMAT_VERSION` for newer nodes.
- Keep test data in `/tmp/`, not in the source tree.
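To see the two size formulas do real work, consider a hypothetical codec (not an actual openzl node) whose split streams total 9 bytes per 8-byte input element, e.g. 8 data bytes plus 1 tag byte. Encode grows the data by 9/8; decode reverses it:

```c
#include <stddef.h>

/* Hypothetical codec layout: each 8-byte element becomes split streams
 * totaling 9 bytes. The same constants drive both formula directions. */
enum {
    srcEltWidth = 8, sumDstElt = 9,  /* encode: 8 bytes in -> 9 bytes out */
    sumSrcElt   = 9, dstEltWidth = 8 /* decode: 9 bytes in -> 8 bytes out */
};

size_t encodeOutSize(size_t srcSize) { return (srcSize / srcEltWidth) * sumDstElt; }
size_t decodeOutSize(size_t srcSize) { return (srcSize / sumSrcElt) * dstEltWidth; }
```

Round-tripping any element-aligned size through `encodeOutSize` then `decodeOutSize` returns the original size, which is a quick sanity check when writing a new scenario's `outSize` functions.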