Find and use existing GPU-powered computational biology tools for the Whitehead fry cluster, or scaffold a new one. Each tool is a single-purpose Python package that wraps one model or library, takes a YAML config, and produces one kind of output (embeddings, segmentations, predictions) for downstream analysis. Install with conda+uv, run with sbatch.
Install and run any of these on fry right now:
| Tool | What it produces | Wraps | Repo |
|---|---|---|---|
| goudacell | Cell segmentation masks | Cellpose | cheeseman-lab/goudacell |
| emmentalembed | Protein embeddings | ESM | cheeseman-lab/emmentalembed |
Clone and install, substituting the tool's repo name for `TOOL`:

```bash
git clone https://github.com/cheeseman-lab/TOOL.git
cd TOOL
conda create -n TOOL -c conda-forge python=3.11 uv pip -y
conda activate TOOL
uv pip install -e ".[gpu]"
```
Then configure and run:
```bash
cp configs/example_config.yaml my_config.yaml
# Edit my_config.yaml with your paths
sbatch scripts/run.sh my_config.yaml
```
When you need GPU compute for a new task (e.g., structure prediction, variant effect scoring, image classification), scaffold a new tool following the same pattern.
```
TOOL/
├── src/TOOL/
│   ├── __init__.py          # __version__ = "0.1.0"
│   ├── cli.py               # typer CLI entry point
│   └── config.py            # dataclass-backed YAML config
├── configs/
│   └── example_config.yaml  # ship a working example
├── scripts/
│   ├── run.sh               # sbatch GPU job script
│   └── jupyter_gpu.sh       # interactive GPU notebook
├── tests/
│   ├── conftest.py
│   ├── test_install.py      # smoke tests
│   └── test_config.py       # config validation
├── pyproject.toml
├── README.md
└── CLAUDE.md
```
pyproject.toml:

```toml
[build-system]
requires = ["setuptools>=61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "TOOL"
version = "0.1.0"
description = "One-line: what output does this produce?"
readme = "README.md"
requires-python = ">=3.10"
license = { text = "MIT" }
dependencies = [
    "pyyaml>=6.0",
    "typer>=0.9.0",
    "rich>=13.0.0",
]

[project.optional-dependencies]
gpu = [
    "torch==2.7.0",
    # The model/library this tool wraps
]
dev = [
    "pytest>=8.0.0",
    "ruff>=0.4.0",
]

[project.scripts]
TOOL = "TOOL.cli:app"

[tool.setuptools]
package-dir = {"" = "src"}
packages = ["TOOL"]

[tool.ruff]
line-length = 100

[tool.ruff.lint]
select = ["E", "F", "I", "D"]
pydocstyle = { convention = "google" }

[tool.ruff.lint.per-file-ignores]
"tests/*.py" = ["D100", "D103"]

[tool.pytest.ini_options]
testpaths = ["tests"]
```
Key decisions:

- `[gpu]` extra — CPU-only install must work for testing/dev
- `>=` floor pins for most deps; exact `==` pins only for torch or model-specific libs
- `src/` layout — prevents accidental imports from the repo root

src/TOOL/cli.py:
```python
import typer
from rich.console import Console

app = typer.Typer(help="TOOL — one-line description")
console = Console()


@app.command()
def run(config: str = typer.Argument(..., help="Path to config YAML")):
    """Run the tool."""
    from TOOL.config import Config

    cfg = Config.from_yaml(config)
    # ... load model, process input, write output
    console.print("[green]Done![/green]")


@app.command()
def version():
    """Show version and environment info."""
    from TOOL import __version__

    console.print(f"TOOL v{__version__}")


if __name__ == "__main__":
    app()
```

The imports are deferred into the command bodies on purpose: `TOOL --help` and `TOOL version` stay fast even when the GPU stack is slow to import.
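If typer is installed, the two-command pattern above can be exercised in-process with typer's test runner; the command bodies here are placeholders, not the real implementations:

```python
import typer
from typer.testing import CliRunner

app = typer.Typer(help="Demo of the TOOL CLI pattern")


@app.command()
def run(config: str = typer.Argument(..., help="Path to config YAML")):
    """Placeholder run command."""
    typer.echo(f"would load {config}")


@app.command()
def version():
    """Placeholder version command."""
    typer.echo("TOOL v0.1.0")


# Invoke the app without a subprocess or an installed entry point
runner = CliRunner()
result = runner.invoke(app, ["version"])
assert result.exit_code == 0
assert "TOOL v0.1.0" in result.output
```

Registering `run` and `version` as separate `@app.command()`s is what turns the entry point into a subcommand-style CLI (`TOOL run my_config.yaml`, `TOOL version`).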
src/TOOL/config.py:

```python
from dataclasses import dataclass, asdict

import yaml


@dataclass
class Config:
    """Tool configuration."""

    input_dir: str = "."
    output_dir: str = "./output"
    gpu: bool = False
    # ... tool-specific params (model name, batch size, etc.)

    @classmethod
    def from_yaml(cls, path: str) -> "Config":
        with open(path) as f:
            data = yaml.safe_load(f)
        return cls(**data)

    def to_yaml(self, path: str) -> None:
        with open(path, "w") as f:
            yaml.dump(asdict(self), f, default_flow_style=False)
```
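A matching `configs/example_config.yaml` is just the `Config` fields as top-level keys; the paths below are placeholders, and each tool adds its own parameters:

```yaml
input_dir: /path/to/input
output_dir: /path/to/output
gpu: true
```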
scripts/run.sh — GPU batch job:
```bash
#!/bin/bash
#SBATCH --job-name=TOOL
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32gb
#SBATCH --time=04:00:00
#SBATCH --partition=YOUR_GPU_PARTITION
#SBATCH --gres=gpu:1
#SBATCH --output=TOOL-%j.out
# Usage: sbatch scripts/run.sh /path/to/config.yaml

set -e

if [ -z "$1" ]; then
    echo "Usage: sbatch scripts/run.sh /path/to/config.yaml"
    exit 1
fi

CONFIG_PATH="$(realpath "$1")"
CONFIG_DIR="$(dirname "$CONFIG_PATH")"
cd "$CONFIG_DIR"

source ~/.bashrc
conda activate TOOL

echo "================================================"
echo "TOOL — $(date)"
echo "Host: $(hostname)"
echo "Config: ${CONFIG_PATH}"
echo "GPU: $(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null || echo 'none')"
echo "================================================"

TOOL run "${CONFIG_PATH}"
echo "Completed: $(date)"
```
scripts/jupyter_gpu.sh — interactive GPU notebook. Once the job starts, the node hostname and chosen port appear in the `TOOL_jupyter-<jobid>.out` log; open an SSH tunnel to that node and port to reach the notebook:
```bash
#!/bin/bash
#SBATCH --job-name=TOOL_jupyter
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32gb
#SBATCH --time=04:00:00
#SBATCH --partition=YOUR_GPU_PARTITION
#SBATCH --gres=gpu:1
#SBATCH --output=TOOL_jupyter-%j.out

source ~/.bashrc
conda activate TOOL
unset XDG_RUNTIME_DIR

NOTEBOOK_DIR="${SLURM_SUBMIT_DIR:-$(pwd)}"
jupyter-lab \
    --no-browser \
    --port-retries=0 \
    --ip=0.0.0.0 \
    --port=$(shuf -i 8900-10000 -n 1) \
    --notebook-dir="${NOTEBOOK_DIR}"
```
tests/test_install.py:
"""Smoke tests: does the package install and import correctly?"""
def test_import():
import TOOL
def test_version():
from TOOL import __version__
parts = __version__.split(".")
assert len(parts) == 3
def test_cli_entry_point():
import subprocess
result = subprocess.run(["TOOL", "--help"], capture_output=True, text=True)
assert result.returncode == 0
def test_dependencies_importable():
import yaml
import typer
import rich
tests/test_config.py:
```python
from TOOL.config import Config


def test_load_example_config(example_config):
    cfg = Config.from_yaml(str(example_config))
    assert cfg.input_dir is not None


def test_config_roundtrip(tmp_path):
    cfg = Config()
    path = tmp_path / "test_config.yaml"
    cfg.to_yaml(str(path))
    cfg2 = Config.from_yaml(str(path))
    assert cfg == cfg2
```
Required for a finished tool:

- a README.md with install steps and an sbatch example
- `TOOL` replaced with the actual tool name everywhere
- `cli.py` and `config.py` implemented
- a working `example_config.yaml`
- passing tests: `uv pip install -e ".[dev]"`, then `pytest tests/ -v`
- a `[gpu]` extra that adds torch + model deps
- a cluster run path: `sbatch scripts/run.sh config.yaml`
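Assembled from the commands above, a minimal `README.md` skeleton for a new tool might look like this (the headings are a suggestion, not a fixed template):

```markdown
# TOOL

One line: what output does this produce?

## Install

    conda create -n TOOL -c conda-forge python=3.11 uv pip -y
    conda activate TOOL
    uv pip install -e ".[gpu]"   # [gpu] adds torch + the wrapped model

## Run

    cp configs/example_config.yaml my_config.yaml
    # edit my_config.yaml with your paths
    sbatch scripts/run.sh my_config.yaml

## Develop

    uv pip install -e ".[dev]"
    pytest tests/ -v
```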