Testing scientific code: pytest, testthat, nf-test, snapshot testing, CI with GitHub Actions, and practical priorities for academic software
TRIGGER when: user asks about testing bioinformatics code, writing tests for pipelines, pytest or testthat on the cluster, snapshot testing, CI/CD for scientific software, or how much testing is appropriate for academic code.
Scientific code has a fundamental testing challenge: for many analyses, there is no "known correct answer" to test against (the oracle problem). Despite this, testing remains essential. Untested code is unreliable code, and unreliable code produces unreliable science. The goal is not 100% coverage but strategic testing that catches the errors most likely to invalidate your results.
Most bioinformatics training does not cover testing. This skill provides practical, prioritized guidance for researchers at Fred Hutch.
Test at several levels, in rough priority order:

- Unit tests: test individual functions with known inputs and outputs.
- Integration tests: verify that pipeline steps connect correctly.
- Snapshot tests: capture outputs from a "known good" run and compare future runs against them.
- End-to-end tests: run the full pipeline on a small, curated test dataset.
- Negative tests: verify the pipeline fails gracefully on bad input.
# tests/test_analysis.py
import pytest
import numpy as np
from mypackage.analysis import normalize, differential_expression
def test_normalize_basic():
"""Known input/output pair."""
raw = np.array([[10, 20], [30, 40]])
result = normalize(raw)
expected = np.array([[0.25, 0.5], [0.75, 1.0]])
np.testing.assert_allclose(result, expected, rtol=1e-5)
def test_normalize_empty():
"""Edge case: empty input."""
with pytest.raises(ValueError, match="empty"):
normalize(np.array([]))
def test_normalize_single_sample():
"""Edge case: single sample."""
raw = np.array([[5, 10]])
result = normalize(raw)
assert result.shape == (1, 2)
@pytest.fixture
def small_dataset(tmp_path):
"""Create a minimal test dataset."""
data_file = tmp_path / "counts.csv"
data_file.write_text("gene,sample1,sample2\nTP53,100,200\nBRCA1,50,75\n")
return data_file
def test_differential_expression(small_dataset):
"""Integration test with fixture."""
result = differential_expression(small_dataset)
assert "pvalue" in result.columns
assert all(result["pvalue"] >= 0)
assert all(result["pvalue"] <= 1)
Run tests:
# Load Python on Gizmo
ml fhPython/3.11.3-foss-2023a
# Run all tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=mypackage --cov-report=term-missing
# Run a specific test
pytest tests/test_analysis.py::test_normalize_basic
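Known input/output pairs like the ones in test_normalize_basic can also be made table-driven with pytest.mark.parametrize, so adding an edge case is one more tuple rather than a new function. A sketch, with a stand-in normalize that assumes the same max-scaling behavior as the example above:

```python
import numpy as np
import pytest


def normalize(raw):
    # Stand-in with the max-scaling behavior assumed in the examples above.
    return raw / raw.max()


@pytest.mark.parametrize(
    "raw, expected",
    [
        (np.array([[10, 20], [30, 40]]), np.array([[0.25, 0.5], [0.75, 1.0]])),
        (np.array([[5, 10]]), np.array([[0.5, 1.0]])),
    ],
)
def test_normalize_known_pairs(raw, expected):
    np.testing.assert_allclose(normalize(raw), expected, rtol=1e-5)
```

Each tuple runs as its own test, so `pytest -v` reports which input/output pair failed.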
@pytest.fixture(scope="session")
def reference_genome(tmp_path_factory):
"""Download or create a small reference genome for testing.
Session-scoped so it's only created once per test run."""
ref_dir = tmp_path_factory.mktemp("reference")
ref_file = ref_dir / "chr22_subset.fa"
# Create a minimal FASTA or copy from test data
ref_file.write_text(">chr22\nACGTACGTACGT\n")
return ref_file
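Negative tests (failing gracefully on bad input) use the same machinery. A sketch with a hypothetical load_fasta that validates its input before parsing:

```python
from pathlib import Path

import pytest


def load_fasta(path):
    # Hypothetical loader: a real one would parse records; this just validates.
    text = Path(path).read_text()
    if not text.startswith(">"):
        raise ValueError(f"not a FASTA file: {path}")
    return text


def test_rejects_malformed_fasta(tmp_path):
    bad = tmp_path / "bad.fa"
    bad.write_text("ACGT with no header line\n")
    with pytest.raises(ValueError, match="not a FASTA"):
        load_fasta(bad)
```

A pipeline that silently produces empty output on malformed input is worse than one that stops with a clear error, and these tests are usually the cheapest to write.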
# tests/testthat/test-analysis.R
library(testthat)
test_that("normalize_counts returns expected values", {
raw <- matrix(c(10, 30, 20, 40), nrow = 2)
result <- normalize_counts(raw)
expect_equal(dim(result), c(2, 2))
expect_true(all(result >= 0 & result <= 1))
})
test_that("normalize_counts rejects empty input", {
expect_error(normalize_counts(matrix(nrow = 0, ncol = 0)), "empty")
})
test_that("differential_expression returns valid p-values", {
# Use a small built-in or bundled dataset
result <- run_de(test_counts, test_metadata)
expect_true(all(result$pvalue >= 0 & result$pvalue <= 1))
expect_true("log2FoldChange" %in% colnames(result))
})
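testthat (3rd edition) has built-in snapshot testing via expect_snapshot(): the first run records the printed output under tests/testthat/_snaps/, and later runs compare against it. A minimal sketch, where summarize_counts is an illustrative stand-in for one of your own functions:

```r
# tests/testthat/test-snapshot.R
test_that("summary table output is stable", {
  # First run records the snapshot; commit the _snaps/ directory so
  # intentional changes show up as reviewable diffs.
  expect_snapshot(summarize_counts(test_counts))
})
```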
Run tests:
# In R on Gizmo
ml R/4.4.1-foss-2023b
R -e 'testthat::test_dir("tests/testthat")'
# Or from within an R session:
#   testthat::test_dir("tests/testthat")   # or devtools::test() for a package