Bulk RNA-seq batch correction with pyComBat: remove batch effects from merged cohorts, export corrected matrices, and benchmark visualizations.
Apply this skill when a user has multiple bulk expression matrices measured across different batches and needs to harmonise them before downstream analysis. It follows t_bulk_combat.ipynb, which demonstrates the pyComBat workflow on ovarian cancer microarray cohorts.
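To make the goal concrete, the sketch below shows what a batch effect looks like in a merged matrix and how a naive location-only adjustment removes it. This is not the pyComBat algorithm (ComBat also adjusts scale and shrinks the per-batch estimates with empirical Bayes). The gene names, sample names, and the 3.0 shift are synthetic assumptions chosen purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Toy data (synthetic): 6 samples x 4 genes, two batches of 3 samples each.
genes = ["g1", "g2", "g3", "g4"]
samples = [f"s{i}" for i in range(6)]
batch = pd.Series(["A", "A", "A", "B", "B", "B"], index=samples, name="batch")

expr = pd.DataFrame(rng.normal(size=(6, 4)), index=samples, columns=genes)
expr.loc[batch == "B"] += 3.0  # simulated additive batch shift

# Location-only adjustment (NOT ComBat): subtract each batch's per-gene mean,
# then add back the overall per-gene mean.
batch_means = expr.groupby(batch).transform("mean")
centered = expr - batch_means + expr.mean(axis=0)

# After centering, every batch shares the same per-gene mean.
per_batch_before = expr.groupby(batch).mean()
per_batch_after = centered.groupby(batch).mean()
print(per_batch_before.round(2))
print(per_batch_after.round(2))
```

The workflow itself relies on ov.bulk.batch_correction rather than hand-rolled centering; the sketch only illustrates the kind of shift being removed.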
- Import omicverse as ov, anndata, pandas as pd, and matplotlib.pyplot as plt.
- Call ov.ov_plot_set() (aliased ov.plot_set() in some releases) to align figures with omicverse styling.
- Load each cohort's expression matrix with pd.read_pickle(...) or pd.read_csv(...).
- Wrap each matrix in an anndata.AnnData object so adata.obs stores sample metadata.
- Assign a batch column for every cohort (adata.obs['batch'] = '1', '2', ...). Encourage descriptive labels when available.
- Concatenate with anndata.concat([adata1, adata2, adata3], merge='same'); the default join='inner' retains the intersection of genes across batches.
- Confirm that adata reports balanced sample counts per batch; if not, prompt users to re-check inputs.
- Run ov.bulk.batch_correction(adata, batch_key='batch').
- Corrected values are written to adata.layers['batch_correction'] while the original counts remain in adata.X.
- Export the matrices with adata.to_df().T (raw) and adata.to_df(layer='batch_correction').T (corrected).
- Save both tables (.to_csv(...)) plus the harmonised AnnData (adata.write_h5ad('adata_batch.h5ad', compression='gzip')).
- Use the ov.pl.red_color, blue_color, green_color palettes to match batches.
- Store the uncorrected values with adata.layers['raw'] = adata.X.copy() before PCA.
- Run ov.pp.pca(adata, layer='raw', n_pcs=50) and ov.pp.pca(adata, layer='batch_correction', n_pcs=50).
- Plot ov.pl.embedding(..., basis='raw|original|X_pca', color='batch', frameon='small') and repeat for the corrected layer to verify mixing.

# Before ComBat: verify batch column exists and has >1 batch
assert 'batch' in adata.obs.columns, "adata.obs must contain a 'batch' column"
n_batches = adata.obs['batch'].nunique()
assert n_batches > 1, f"Only {n_batches} batch — need >1 for batch correction"
# Verify gene overlap after concatenation
if adata.n_vars < 100:
    print(f"WARNING: Only {adata.n_vars} shared genes after concat; check gene ID harmonization")
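The checks above can be complemented by a small report on per-batch sample counts and gene overlap, computed before concatenation. The following is a minimal sketch using only pandas. The helper name summarise_batches and the max_ratio balance threshold are hypothetical, not part of omicverse or anndata. In a real session, obs could be adata.obs and gene_sets could be each cohort's var_names.

```python
import pandas as pd

def summarise_batches(obs, gene_sets, batch_key="batch", max_ratio=3.0):
    """Hypothetical helper: per-batch sample counts, shared-gene count, balance flag."""
    if batch_key not in obs.columns:
        raise KeyError(f"obs must contain a '{batch_key}' column")
    counts = obs[batch_key].value_counts()
    if len(counts) < 2:
        raise ValueError("need more than one batch for batch correction")
    # Genes present in every cohort (what an inner join would keep).
    shared = set.intersection(*(set(g) for g in gene_sets))
    # max_ratio is an arbitrary illustrative cut-off for "balanced" batches.
    balanced = counts.max() / counts.min() <= max_ratio
    return {
        "counts": counts.to_dict(),
        "n_shared_genes": len(shared),
        "balanced": bool(balanced),
    }

# Toy inputs (synthetic): three batches with partly overlapping gene lists.
obs = pd.DataFrame({"batch": ["1", "1", "2", "2", "2", "3"]})
gene_sets = [["TP53", "BRCA1", "MYC"], ["TP53", "MYC", "EGFR"], ["MYC", "TP53"]]
report = summarise_batches(obs, gene_sets)
print(report)
```

A low n_shared_genes value usually points to mismatched gene identifiers across cohorts, which is the same condition the warning above guards against.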
If the batch_correction layer is missing, ensure the batch_key matches the column name in adata.obs.

Resources: t_bulk_combat.ipynb, omicverse_guide/docs/Tutorials-bulk/data/combat/, reference.md