Help write an interestingness test for shrinkray (test-case reducer). Use when the user needs to reduce a test case, write or fix an interestingness test, or is working with test-case reduction.
You are helping the user write an interestingness test — a script that shrinkray (or another test-case reducer) uses to determine whether a candidate test case still exhibits the property of interest. Always assume shrinkray unless the user specifically asks for a different reducer.
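An interestingness test is an executable that receives a candidate test case and exits with status 0 if the candidate is interesting (it still exhibits the property of interest) and with a non-zero status otherwise.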
The reducer calls this script thousands of times with progressively smaller/simpler variants of the original file. The quality of the interestingness test is the single most important factor determining the quality of the reduced output.
Before writing anything, determine:
Get the user to describe the specific behavior they want to preserve during reduction. Ask for:
What format is the test case? This affects validity checking:
Is there a risk of undefined behavior or bug slippage? Especially critical for C/C++ wrong-code bugs.
Order checks from cheapest/most-likely-to-fail first to most-expensive last. The reducer invokes this script thousands of times, so every millisecond counts.
#!/bin/bash
# Phase 1: Quick rejection (syntax, size, required content)
# Phase 2: Validity checks (compilation, parsing, UB detection)
# Phase 3: Bug reproduction (run the program, check for the specific bug)
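The same cheapest-first ordering can be sketched in Python. This is a minimal illustration, not part of shrinkray: the names run_phases and file_is_non_empty are hypothetical, and each check is assumed to be a function that takes the candidate's path and returns True when the candidate passes.

```python
import os
from typing import Callable, Sequence

# A check takes the candidate's path and returns True if the candidate passes.
Check = Callable[[str], bool]


def file_is_non_empty(path: str) -> bool:
    """Phase 1 style check: the candidate must contain at least one byte."""
    return os.path.getsize(path) > 0


def run_phases(path: str, checks: Sequence[Check]) -> bool:
    """Run checks in the given order and stop at the first one that fails.

    all() over a generator short-circuits, so expensive checks placed late
    in the sequence never run once a cheap early check has rejected the file.
    """
    return all(check(path) for check in checks)
```

List the checks from cheapest to most expensive when calling run_phases, so that most uninteresting candidates are rejected before any compiler or tool is started.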
Reject obviously-broken candidates fast:
# Ensure the file is non-empty (prevents reducing to nothing)
test -s "$1" || exit 1
# Ensure required constructs are still present (optional, use sparingly)
grep -q 'some_essential_function' "$1" || exit 1
Warning: Don't over-constrain with grep checks. Every constraint you add is something the reducer can't remove, potentially preventing deeper reduction. Only add grep checks when you're getting bad results without them.
Ensure the reduced test case is still well-formed enough to be meaningful. This prevents the reducer from finding a different, trivial bug (slippage).
For compiler crash bugs:
# Usually skip validity checks — the crash IS on invalid code, and that's fine.
# Only add checks if you're getting slippage to a different crash.
For wrong-code (miscompilation) bugs — this is critical:
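# Must compile cleanly under strict warnings with BOTH compilers
# (-Werror turns warnings into a non-zero exit; without it warnings are ignored)
gcc -Wall -Wextra -pedantic -Werror -c "$1" 2>/dev/null || exit 1
clang -Wall -Wextra -pedantic -Werror -c "$1" 2>/dev/null || exit 1
# Runtime UB detection (if the bug involves execution)
# (-fno-sanitize-recover makes UBSan abort with a non-zero exit instead of only printing a report)
gcc -fsanitize=undefined -fno-sanitize-recover=all -o test_ub "$1" && timeout 5 ./test_ub || exit 1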
For tool bugs (linters, formatters, parsers):
# Usually just need the file to be parseable by a reference tool
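# Pass the path as an argument so quotes or spaces in it cannot break the Python code
python3 -c 'import ast, sys; ast.parse(open(sys.argv[1]).read())' "$1" 2>/dev/null || exit 1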
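If the interestingness test is itself written in Python, the same parseability check can run in-process, which avoids starting a second interpreter for every candidate. The sketch below assumes the test case is Python source; the function name parses_as_python is hypothetical.

```python
import ast


def parses_as_python(source: bytes) -> bool:
    """Return True if the bytes parse as Python source code.

    Reduced candidates can be badly malformed, so this also treats null
    bytes (ValueError on some versions) and pathological nesting
    (RecursionError, MemoryError) as "does not parse".
    """
    try:
        ast.parse(source)
    except (SyntaxError, ValueError, RecursionError, MemoryError):
        return False
    return True
```

A test would read the candidate with open(path, 'rb').read() and reject it when parses_as_python returns False.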
Check for the specific bug, not just any failure. The recommended pattern is to capture output to a file, check the exit code, then grep the file. In a pipeline (tool | grep) the shell reports the exit status of the last command, so the tool's own exit code, which is often an important signal, is lost.
# BAD: Too broad — will find any crash, not YOUR crash
some_tool "$1" 2>&1; test $? -ne 0
# BAD: Piping loses the exit code of some_tool
some_tool "$1" 2>&1 | grep -q "specific error"
# GOOD: Capture output, check exit code, then grep
some_tool "$1" > output.txt 2>&1
exit_code=$?
# Check exit code first (cheap) — e.g., SIGSEGV = 139, SIGABRT = 134
test $exit_code -ne 0 || exit 1
# Then check for the specific error message
grep -q "specific error: in function_name" output.txt
This pattern lets you check both the exit code AND the output, which is more precise than either alone. It also means the grep runs on a local file (fast) rather than blocking on a pipe.
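For tests written in Python, the capture-then-check pattern might look like the sketch below. The function name fails_with_message and its parameters are hypothetical; it assumes the bug shows up as a non-zero exit combined with a known message on stdout or stderr.

```python
import subprocess
from typing import Sequence


def fails_with_message(
    cmd: Sequence[str], needle: bytes, timeout: float = 10.0
) -> bool:
    """Return True only if cmd exits non-zero AND its output contains needle."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into stdout, like 2>&1
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A hang is a different problem from the bug being reduced.
        return False
    # Check the exit code first (cheap), then look for the specific message.
    if result.returncode == 0:
        return False
    return needle in result.stdout
```

Because subprocess.run returns both the exit code and the captured output, neither signal is lost the way it is in a shell pipeline.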
For wrong-code bugs (differential testing):
gcc -O0 -o exe0 "$1" && gcc -O2 -o exe2 "$1" || exit 1
timeout 5 ./exe0 > out0.txt 2>&1 || exit 1
timeout 5 ./exe2 > out2.txt 2>&1 || exit 1
! diff -q out0.txt out2.txt >/dev/null 2>&1
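The comparison step of a differential test can also be written in Python. This is a sketch under stated assumptions: run_capture and outputs_differ are hypothetical names, the two commands are assumed to be already-built executables (for example ./exe0 and ./exe2), and a candidate counts as interesting only when both runs succeed and their output differs.

```python
import subprocess
from typing import Optional, Sequence


def run_capture(cmd: Sequence[str], timeout: float) -> Optional[bytes]:
    """Return combined stdout/stderr, or None if cmd fails, hangs, or cannot start."""
    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        return None
    return result.stdout if result.returncode == 0 else None


def outputs_differ(
    cmd_a: Sequence[str], cmd_b: Sequence[str], timeout: float = 5.0
) -> bool:
    """True only if both commands succeed and produce different output."""
    out_a = run_capture(cmd_a, timeout)
    if out_a is None:
        return False
    out_b = run_capture(cmd_b, timeout)
    if out_b is None:
        return False
    return out_a != out_b
```

Requiring both runs to succeed mirrors the `|| exit 1` guards in the shell version: a crash or hang in either build is rejected rather than counted as a difference.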
Always set resource limits to prevent hangs and memory bombs (reduction can introduce infinite loops):
ulimit -t 10 # CPU time limit in seconds
ulimit -v 2000000 # Virtual memory limit (~2GB)
Or use timeout for wall-clock limits:
timeout 5 ./program "$1"
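A Python test can apply comparable limits to the child process with the standard resource module (Unix only). The following is a sketch, not a shrinkray API: run_limited and its parameters are hypothetical names. Note that RLIMIT_AS is given in bytes, whereas ulimit -v takes kilobytes.

```python
import resource
import subprocess
from typing import Optional, Sequence


def _lower_soft_limit(which: int, value: int) -> None:
    """Set the soft limit to value without exceeding the existing hard limit."""
    _, hard = resource.getrlimit(which)
    soft = value if hard == resource.RLIM_INFINITY else min(value, hard)
    resource.setrlimit(which, (soft, hard))


def run_limited(
    cmd: Sequence[str],
    cpu_seconds: int = 10,
    wall_seconds: float = 30.0,
    memory_bytes: Optional[int] = None,
) -> Optional[int]:
    """Run cmd under CPU, wall-clock, and optional memory limits.

    Returns the exit code (negative if killed by a signal), or None if the
    wall-clock timeout expired.
    """

    def apply_limits() -> None:
        # Runs in the child just before exec, so only the child is limited.
        _lower_soft_limit(resource.RLIMIT_CPU, cpu_seconds)
        if memory_bytes is not None:
            _lower_soft_limit(resource.RLIMIT_AS, memory_bytes)

    try:
        result = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            preexec_fn=apply_limits,
            timeout=wall_seconds,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode
```

The CPU limit catches busy loops, while the wall-clock timeout catches processes that sleep or block without using CPU time.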
When the buggy tool crashes (non-zero exit) under the bad condition but succeeds (zero exit) normally, the exit code convention is inverted from what the reducer expects. Use shell negation or explicit exit code handling:
#!/bin/bash
# The tool crashes on interesting inputs — invert the exit code
! some_tool "$1" 2>/dev/null
But this is too broad (any failure counts). Better to capture and check specifically:
#!/bin/bash
some_tool "$1" > output.txt 2>&1
grep -q "specific crash message" output.txt
For bugs in tools that process structured input, a powerful pattern is: first verify the input is valid according to a reference implementation, then check that the buggy tool misbehaves. This prevents the reducer from producing degenerate inputs that trivially crash the tool.
#!/bin/bash
# "Not bogus" — reference tool accepts it
reference_tool "$1" >/dev/null 2>&1 || exit 1
# But buggy tool crashes on it
buggy_tool "$1" > output.txt 2>&1
grep -q "specific error" output.txt
This is especially important when the bug is "tool crashes on valid input." Without the validity check, the reducer will find the simplest invalid input that crashes the tool, which is rarely the bug you care about.
For complex logic (exception type checking, AST comparison, multi-step validation), Python is often clearer than bash. shrinkray works with any executable:
#!/usr/bin/env python3
"""Interestingness test — reduce to trigger a specific bug."""
import subprocess
import sys


def is_interesting(filename: str) -> bool:
    # Phase 1: Quick rejection
    with open(filename, 'rb') as f:
        content = f.read()
    if len(content) == 0:
        return False

    # Phase 2: Validity (reference tool accepts it)
    result = subprocess.run(
        ['reference_tool', filename],
        capture_output=True, timeout=10
    )
    if result.returncode != 0:
        return False

    # Phase 3: Bug reproduction
    result = subprocess.run(
        ['buggy_tool', filename],
        capture_output=True, timeout=10
    )
    return b"specific error message" in result.stderr


if __name__ == '__main__':
    try:
        sys.exit(0 if is_interesting(sys.argv[1]) else 1)
    except Exception:
        # A timeout (subprocess.TimeoutExpired) or any other error means "not interesting"
        sys.exit(1)
shrinkray runs your interestingness test in a temporary directory, not your original working directory. This is the single most common source of confusion. Your test will fail if it assumes it's running in your project directory or that any files other than the test case exist.
Always use the command-line argument form ($1) to read the test case. This is the simplest and most reliable approach — $1 is an absolute path to a temporary copy of the test case, so it works regardless of what directory the test runs in:
#!/bin/bash
some_tool "$1" > output.txt 2>&1
grep -q "specific error" output.txt
This is the recommended default. Only use other input modes if you have a specific reason:
stdin — Only if the tool you're testing exclusively reads from stdin and doesn't accept filenames:
#!/bin/bash
# Tool only reads stdin, no filename argument
stdin_only_tool < "$1" > output.txt 2>&1
grep -q "specific error" output.txt
basename — Only needed for creduce compatibility or if the tool requires the file to have a specific name/extension and be in the CWD. The file is placed in the CWD with the same basename as the original:
#!/bin/bash
# Only use this if the tool requires a specific filename in CWD
some_tool original_name.ext > output.txt 2>&1
grep -q "specific error" output.txt
If your test only uses one mode, tell shrinkray to skip the others:
shrinkray --input-type=arg ./test.sh file.c
Because the test runs in a temp directory:
Relative paths such as ./helper.sh or ../data/expected.json will fail.
Do not write to fixed locations such as /tmp/output.txt; parallel runs will clobber each other.
Common pattern for tests that need auxiliary files:
#!/bin/bash
# Resolve paths relative to the test script's location, not CWD
SCRIPT_DIR="$(cd "$(dirname "$0")" && pwd)"
# Reference auxiliary files by absolute path
"$SCRIPT_DIR/reference_tool" "$1" >/dev/null 2>&1 || exit 1
diff <(some_tool "$1") "$SCRIPT_DIR/expected_output.txt"
Or just hardcode absolute paths:
#!/bin/bash
/home/user/tools/reference_compiler -c "$1" 2>/dev/null || exit 1
gcc -O2 -c "$1" > output.txt 2>&1
grep -q "internal compiler error" output.txt
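In a Python interestingness test, the equivalent of the SCRIPT_DIR idiom is to resolve auxiliary files against the script's own location. A minimal sketch, assuming a hypothetical helper named aux_path:

```python
from pathlib import Path


def aux_path(script_path: str, name: str) -> Path:
    """Return the absolute path of `name` in the directory containing the script.

    resolve() makes the script path absolute first, so the result does not
    depend on the temporary working directory the test runs in.
    """
    return Path(script_path).resolve().parent / name
```

Inside the test script you would call it as aux_path(__file__, "expected_output.txt"), where expected_output.txt stands for whatever auxiliary file sits next to the script.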
shrinkray's --also-interesting feature (exit code 101 by default) lets you record variants that are interesting for a different reason without derailing the main reduction:
#!/bin/bash
output=$(buggy_tool "$1" 2>&1)
# Primary bug we're reducing for
if echo "$output" | grep -q "TypeError: unexpected None"; then
exit 0
fi
# Different but related bug — record it but don't reduce toward it
if echo "$output" | grep -q "TypeError:"; then
exit 101
fi
exit 1
Recorded variants are saved in .shrinkray/ history for later investigation.
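The same three-way decision can be expressed as a small Python function that maps tool output to an exit code. The names below are hypothetical, and the value 101 is the default described above; adjust it if you configure a different code.

```python
INTERESTING = 0          # the primary bug: keep reducing toward this
ALSO_INTERESTING = 101   # a related bug: record it, do not reduce toward it
UNINTERESTING = 1        # neither


def classify(output: str) -> int:
    """Map the buggy tool's output to the exit code the test should return."""
    # Check the most specific message first, otherwise the broader
    # "TypeError:" match would swallow the primary bug.
    if "TypeError: unexpected None" in output:
        return INTERESTING
    if "TypeError:" in output:
        return ALSO_INTERESTING
    return UNINTERESTING
```

A test script would capture the tool's output and finish with sys.exit(classify(output)).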
#!/bin/bash
# Reduce a GCC internal compiler error
ulimit -t 10
ulimit -v 2000000
gcc -O2 -c "$1" > output.txt 2>&1
# GCC ICEs produce a specific error string — check for it
grep -q "internal compiler error: in fold_convert_loc" output.txt
#!/bin/bash
ulimit -t 10
ulimit -v 2000000
# Must compile cleanly (reject UB-introducing reductions)
gcc -Wall -Wextra -pedantic -Werror -c "$1" 2>/dev/null || exit 1
clang -Wall -Wextra -pedantic -Werror -c "$1" 2>/dev/null || exit 1
# Must not trigger UB at runtime (-fno-sanitize-recover makes UBSan exit non-zero on UB)
gcc -fsanitize=undefined -fno-sanitize-recover=all -o test_ub "$1" 2>/dev/null || exit 1
timeout 5 ./test_ub >/dev/null 2>&1 || exit 1
# Differential test: O0 vs O2 must produce different output
gcc -O0 -o exe0 "$1" && gcc -O2 -o exe2 "$1" || exit 1
out0=$(timeout 5 ./exe0 2>&1) || exit 1
out2=$(timeout 5 ./exe2 2>&1) || exit 1
[ "$out0" != "$out2" ]
#!/usr/bin/env python3
import libcst
import sys
from pathlib import Path