Use omicverse's pyComBat wrapper to remove batch effects from merged bulk RNA-seq or microarray cohorts, export corrected matrices, and benchmark pre/post correction visualisations.
Apply this skill when a user has multiple bulk expression matrices measured across different batches and needs to harmonise them before downstream analysis. It follows t_bulk_combat.ipynb, which demonstrates the pyComBat workflow on ovarian cancer microarray cohorts.
1. Import omicverse as ov, anndata, pandas as pd, and matplotlib.pyplot as plt, then call ov.ov_plot_set() (aliased ov.plot_set() in some releases) to align figures with omicverse styling.
2. Load each cohort with pd.read_pickle(...) or pd.read_csv(...).
3. Wrap each expression matrix in an anndata.AnnData object so that adata.obs stores sample metadata. AnnData expects samples as rows, so transpose genes × samples tables with .T first.
4. Add a batch column for every cohort (adata.obs['batch'] = '1', '2', ...). Encourage descriptive labels when available.
5. Merge the cohorts with anndata.concat([adata1, adata2, adata3], merge='same'). The default join='inner' keeps only the genes shared by all batches, and merge='same' keeps the gene annotations that agree across batches.
6. Check that the merged adata reports balanced sample counts per batch (for example with adata.obs['batch'].value_counts()); if not, prompt users to re-check their inputs.
7. Run ov.bulk.batch_correction(adata, batch_key='batch'). Corrected values are stored in adata.layers['batch_correction'], while the original values remain in adata.X.
8. Export the raw matrix with adata.to_df().T and the corrected matrix with adata.to_df(layer='batch_correction').T (both genes × samples). Save both tables with .to_csv(...), plus the harmonised AnnData with adata.write_h5ad('adata_batch.h5ad', compression='gzip').
9. Use the ov.utils.red_color, blue_color and green_color palettes to colour batches consistently.
10. Store the raw values with adata.layers['raw'] = adata.X.copy() before PCA.
11. Run ov.pp.pca(adata, layer='raw', n_pcs=50) and ov.pp.pca(adata, layer='batch_correction', n_pcs=50).
12. Plot with ov.utils.embedding(..., basis='raw|original|X_pca', color='batch', frameon='small'), then repeat for the corrected layer to verify that the batches mix.

Troubleshooting: if the batch_correction layer is missing, ensure that batch_key matches the column name in adata.obs.

References: t_bulk_combat.ipynb; omicverse_guide/docs/Tutorials-bulk/data/combat/; reference.md
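The preparation steps (transposing to samples × genes, keeping only shared genes, labelling batches, checking per-batch counts) can be sketched with pandas alone. This is a minimal sketch, not the anndata or omicverse API: the toy cohorts and the stack_batches helper are hypothetical, and in the real workflow anndata.concat performs the gene intersection.

```python
import pandas as pd

# Hypothetical toy cohorts stored genes x samples, as expression tables often are.
cohort_a = pd.DataFrame(
    {"s1": [5.0, 3.0, 8.0], "s2": [6.0, 2.0, 7.0]},
    index=["GENE1", "GENE2", "GENE3"],
)
cohort_b = pd.DataFrame(
    {"s3": [9.0, 4.0], "s4": [10.0, 5.0]},
    index=["GENE1", "GENE3"],
)


def stack_batches(tables):
    """Return a samples x shared-genes matrix and one batch label per sample."""
    # Keep only genes present in every cohort (like an inner join on genes).
    shared = sorted(set.intersection(*(set(t.index) for t in tables)))
    blocks, labels = [], []
    for i, table in enumerate(tables, start=1):
        block = table.loc[shared].T  # samples become rows, as in adata.obs
        blocks.append(block)
        labels.extend([str(i)] * block.shape[0])
    merged = pd.concat(blocks)
    batch = pd.Series(labels, index=merged.index, name="batch")
    return merged, batch


merged, batch = stack_batches([cohort_a, cohort_b])
print(merged.shape)  # (4, 2)
print(batch.value_counts())  # samples per batch, to spot unbalanced inputs
```

GENE2 is dropped because cohort_b does not measure it, which mirrors why the merged object can have fewer genes than any single cohort.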
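To make the correction step less of a black box, the sketch below shows a deliberately simplified per-gene location/scale adjustment in NumPy. It is not pyComBat and not what ov.bulk.batch_correction runs: ComBat additionally shrinks the per-batch estimates with empirical Bayes and can model covariates. The helper name location_scale_adjust is hypothetical; the sketch only illustrates the core idea of aligning each batch's per-gene mean and spread.

```python
import numpy as np


def location_scale_adjust(X, batch):
    """Align each batch's per-gene mean and SD to the pooled values.

    X is a samples x genes array and batch holds one label per sample.
    Simplified illustration only: no empirical Bayes shrinkage and no
    covariate model, so this is NOT pyComBat.
    """
    X = np.asarray(X, dtype=float)
    batch = np.asarray(batch)
    labels = np.unique(batch)
    grand_mean = X.mean(axis=0)  # per-gene mean over all samples

    # Pooled within-batch SD per gene: spread left after removing batch means.
    resid = np.empty_like(X)
    for b in labels:
        idx = batch == b
        resid[idx] = X[idx] - X[idx].mean(axis=0)
    pooled_sd = resid.std(axis=0)

    adjusted = np.empty_like(X)
    for b in labels:
        idx = batch == b
        mu = X[idx].mean(axis=0)
        sd = X[idx].std(axis=0)
        sd[sd == 0] = 1.0  # constant gene within a batch: avoid dividing by zero
        adjusted[idx] = (X[idx] - mu) / sd * pooled_sd + grand_mean
    return adjusted
```

After this adjustment every batch shares the same per-gene mean and standard deviation; the empirical Bayes step in ComBat exists mainly to make those per-batch estimates more stable when batches are small.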
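The before/after PCA comparison can also be summarised with a number. This sketch assumes only NumPy: it projects samples × genes data onto its top principal components and computes a simple between-batch over within-batch scatter ratio, where a lower value after correction suggests better mixing. Both helpers (pca_scores, batch_separation) are hypothetical and not part of omicverse.

```python
import numpy as np


def pca_scores(X, n_pcs=2):
    """Return sample scores on the top n_pcs principal components (via SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each gene across samples
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_pcs] * S[:n_pcs]  # equals Xc projected onto the top loadings


def batch_separation(scores, batch):
    """Between-batch over within-batch scatter in PC space (lower = better mixed)."""
    scores = np.asarray(scores, dtype=float)
    batch = np.asarray(batch)
    overall = scores.mean(axis=0)
    between = 0.0
    within = 0.0
    for b in np.unique(batch):
        pts = scores[batch == b]
        centroid = pts.mean(axis=0)
        between += len(pts) * np.sum((centroid - overall) ** 2)
        within += np.sum((pts - centroid) ** 2)
    return between / within
```

In the omicverse workflow you might compare batch_separation(pca_scores(adata.layers['raw']), adata.obs['batch']) with the same call on adata.layers['batch_correction']; the ratio complements, rather than replaces, the embedding plots.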