Name: Pathml
Author: jaechang-hits

Pathml

PathML is an open-source toolkit for computational pathology. Use it to process whole-slide images (WSIs): load slides, extract tiles, apply stain normalization and nuclear segmentation preprocessing, extract features, and train machine learning models. Supports H&E and multiplex imaging. Ideal for building end-to-end digital pathology pipelines from raw WSI files to quantitative outputs.

jaechang-hits | 119 stars | Feb 18, 2026

Categories: Scientific Computing

Overview

PathML is a Python toolkit for computational pathology workflows on whole-slide images (WSIs). It provides a unified pipeline from raw slide files (SVS, NDPI, MRXS, TIFF) through tile extraction, preprocessing (stain normalization, nuclear segmentation, tissue detection), feature extraction, and machine learning. PathML integrates with popular Python ML and image processing libraries and hides the complexity of WSI handling behind its `SlideData` and `Pipeline` abstractions.
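The flow described above can be sketched roughly as follows. This is a minimal sketch, not the skill's own workflow: `is_supported_slide` and `run_he_pipeline` are hypothetical helper names, the `.tif` extension is an assumed alias for TIFF, and the parameter values are illustrative. The PathML import is done lazily inside the function so the extension check can be used on its own.

```python
from pathlib import Path

# Extensions for the formats named in the overview.
# ".tif" is an assumption (common alias for TIFF), not stated in the text.
SUPPORTED_EXTENSIONS = {".svs", ".ndpi", ".mrxs", ".tiff", ".tif"}


def is_supported_slide(path):
    """Return True if the file extension matches one of the listed WSI formats."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def run_he_pipeline(path, tile_size=256):
    """Load one H&E slide and run a blur + tissue-detection pipeline on its tiles."""
    # Imported lazily so is_supported_slide works without PathML installed.
    from pathml.core import SlideData, types
    from pathml.preprocessing import BoxBlur, Pipeline, TissueDetectionHE

    if not is_supported_slide(path):
        raise ValueError(f"Unrecognized slide format: {path}")

    slide = SlideData(str(path), name=Path(path).stem, slide_type=types.HE)
    pipeline = Pipeline([
        BoxBlur(kernel_size=5),                 # illustrative smoothing step
        TissueDetectionHE(mask_name="tissue"),  # stores a mask named "tissue"
    ])
    # distributed=False keeps processing in-process, which is simpler for a first run.
    slide.run(pipeline, distributed=False, tile_size=tile_size)
    return slide
```

Calling `run_he_pipeline("slide.svs")` requires PathML and OpenSlide to be installed; the file name here is a placeholder.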

When to Use

Processing whole-slide H&E images: Tiling a large WSI and normalizing staining variability across slides from different scanners or batches.
Nuclear segmentation on pathology slides: Detecting and segmenting nuclei in H&E or DAPI-stained WSIs using built-in segmentation pipelines.
Building ML training datasets from WSIs: Extracting tiles with associated labels for training tissue classifiers, tumor detectors, or survival prediction models.
Multiplex immunofluorescence (mIF) image analysis: Processing multi-channel IF slides with channel-specific preprocessing and feature extraction.

Key Parameters

Parameter	Default	Range / Options	Effect
`shape`	`(256, 256)`	`(64,64)` – `(1024,1024)`	Tile dimensions in pixels
`stride`	equals `shape`	any tuple ≤ `shape`	Step between tile centers; `stride < shape` gives overlapping tiles
`level`	`0`	`0` – max pyramid level	Pyramid resolution level (0 = full resolution)
`kernel_size`	`5`	odd integers `3`–`21`	Smoothing kernel size in `BoxBlur`
`mask_name`	required	any string	Name of output mask stored in `tile.masks`
`distributed`	`False`	`True`, `False`	Enable Dask distributed processing for large slides
`pad`	`False`	`True`, `False`	Pad edge tiles to full `shape` size
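To make the interaction between `shape`, `stride`, and `pad` concrete, here is a small pure-Python sketch of how a tile grid could be laid out. It is an illustration of the semantics in the table, not PathML's internal implementation; `tile_origins` is a hypothetical helper, and the edge-handling rule is an assumption.

```python
import math


def tile_origins(slide_shape, shape=(256, 256), stride=None, pad=False):
    """Return (row, col) top-left corners of tiles over a slide of size (rows, cols)."""
    if stride is None:
        stride = shape  # default: non-overlapping tiles
    counts = []
    for dim, size, step in zip(slide_shape, shape, stride):
        if pad:
            # Keep partial edge tiles; they would be padded to full size.
            n = math.ceil(dim / step)
        else:
            # Keep only tiles that fit entirely inside the slide.
            n = (dim - size) // step + 1 if dim >= size else 0
        counts.append(n)
    return [
        (i * stride[0], j * stride[1])
        for i in range(counts[0])
        for j in range(counts[1])
    ]
```

Under these assumptions, a 1000 x 1000 region yields 9 tiles with the defaults, 16 tiles with `pad=True`, and 36 overlapping tiles with `stride=(128, 128)`.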

Troubleshooting

Problem	Cause	Solution
`openslide.lowlevel.OpenSlideUnsupportedFormatError`	OpenSlide C library not installed or WSI format unsupported	`conda install -c conda-forge openslide`; check format compatibility
`CUDA out of memory` during segmentation	Tile size too large for GPU	Reduce tile `shape` to `(128, 128)` or run with `distributed=False` on CPU
`slide.tiles` is empty after `generate_tiles`	Level index out of range or all tiles filtered	Use `level=0`; check slide pyramid with `slide.slide.level_count`
Stain normalization produces black tiles	Source slide too low contrast or failed tissue detection	Apply `TissueDetectionHE` before normalization; inspect tissue mask coverage
`KeyError: 'nuclei'` in `tile.masks`	Segmentation pipeline not yet run	Run the `NuclearSegmentation` pipeline with `slide.run()` before accessing masks
Very slow tile generation	High-resolution level 0 on large SVS	Use a lower pyramid level (`level=1` or `level=2`) for faster prototyping
`AttributeError: 'SlideData' object has no attribute 'write'`	Old PathML version	`pip install --upgrade pathml` to get HDF5 save/load support
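Several of the problems above (out-of-range level, missing mask, poor tissue coverage) can be caught early with small defensive checks. The sketch below uses hypothetical helpers and a plain dict as a stand-in for `tile.masks`; it does not call PathML and only illustrates the kind of check that might help.

```python
def check_level(level, level_count):
    """Raise a clear error if the pyramid level does not exist on this slide."""
    if not 0 <= level < level_count:
        raise ValueError(
            f"level={level} is out of range; slide has levels 0..{level_count - 1}"
        )
    return level


def get_mask(masks, name):
    """Fetch a mask by name, explaining the likely cause if it is missing."""
    try:
        return masks[name]
    except KeyError:
        available = ", ".join(sorted(masks)) or "none"
        raise KeyError(
            f"mask '{name}' not found (available: {available}); "
            "run the pipeline that produces it first"
        ) from None


def tissue_fraction(mask):
    """Fraction of nonzero pixels in a 2D mask given as rows of numbers."""
    total = sum(len(row) for row in mask)
    if total == 0:
        return 0.0
    return sum(1 for row in mask for value in row if value) / total
```

A tile whose `tissue_fraction` is close to zero is a likely candidate to skip before stain normalization; the cutoff to use is a judgment call for each dataset.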

Pathml

Overview

When to Use

Prerequisites

Quick Start

Workflow

Step 1: Load a Whole-Slide Image

Step 2: Define a Preprocessing Pipeline

Step 3: Create a TileDataset

Step 4: Run the Preprocessing Pipeline

Step 5: Nuclear Segmentation

Step 6: Feature Extraction

Step 7: Save and Export Processed Slide

Key Parameters

Common Recipes

Recipe: Tissue-Only Tile Filtering

Recipe: Export Tiles as PNG Files

Recipe: Batch Process Multiple Slides

Expected Outputs

Troubleshooting

References
