Gene set enrichment analysis with correct geneset format handling. Critical guidance for loading pathway databases and running enrichment in OmicVerse.
This skill covers gene set enrichment analysis (GSEA) and pathway enrichment workflows in OmicVerse. It provides critical guidance on the correct data formats and API usage patterns to avoid common errors.
The ov.bulk.geneset_enrichment() function requires a dictionary of gene sets, NOT a file path string. You must first load the geneset file using ov.utils.geneset_prepare().
CORRECT usage:
# Step 1: Download pathway database (if not already available)
ov.utils.download_pathway_database()
# Step 2: Load geneset file into dictionary format - REQUIRED!
pathways_dict = ov.utils.geneset_prepare(
    'genesets/GO_Biological_Process_2021.txt',  # or .gmt file
    organism='Human'  # or 'Mouse'
)
# Step 3: Now run enrichment with the DICTIONARY
enr = ov.bulk.geneset_enrichment(
    gene_list=deg_genes,
    pathways_dict=pathways_dict,  # Pass the DICTIONARY, not file path!
    pvalue_type='auto',
    organism='Human'
)
WRONG - DO NOT USE:
# WRONG! Don't pass file path directly to geneset_enrichment!
# enr = ov.bulk.geneset_enrichment(
#     gene_list=deg_genes,
#     pathways_dict='genesets/GO_Biological_Process_2021.gmt'  # ERROR! String path doesn't work!
# )
# WRONG! geneset_enrichment expects dict, not file path
# enr = ov.bulk.geneset_enrichment(
#     gene_list=deg_genes,
#     pathways_dict='GO_Biological_Process_2021'  # ERROR!
# )
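The dictionary-versus-path rule above can be enforced before the enrichment call. The following is a minimal sketch of such a guard; `ensure_geneset_dict` is a hypothetical helper written for illustration and is not part of OmicVerse.

```python
import os
from collections.abc import Mapping


def ensure_geneset_dict(pathways):
    """Return `pathways` unchanged if it is a mapping of gene sets.

    Hypothetical helper (not an OmicVerse function): it rejects file paths
    and other non-mapping inputs early, with a message that points to the
    loading step.
    """
    if isinstance(pathways, (str, bytes, os.PathLike)):
        raise TypeError(
            "pathways_dict looks like a file path; load the file with "
            "ov.utils.geneset_prepare() first and pass the returned dictionary"
        )
    if not isinstance(pathways, Mapping):
        raise TypeError(
            f"pathways_dict must be a dictionary of gene sets, got {type(pathways).__name__}"
        )
    return pathways


# A dictionary passes through untouched.
checked = ensure_geneset_dict({"SET_A": ["TP53", "BRCA1"]})

# A path string is rejected before it can reach the enrichment call.
try:
    ensure_geneset_dict("genesets/GO_Biological_Process_2021.gmt")
except TypeError as err:
    print(f"Rejected: {err}")
```

Wrapping the `pathways_dict` argument in a check like this turns the mistake into an immediate, readable error rather than a failure inside the enrichment routine.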
| File Extension | Load Method | Notes |
|---|---|---|
| .txt | ov.utils.geneset_prepare() | OmicVerse format |
| .gmt | ov.utils.geneset_prepare() | Standard GMT format |
| .json | json.load() then convert | Custom handling needed |
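To make the target format concrete, here is a rough sketch of the dictionary shape (gene set name mapped to a list of gene symbols), built from GMT-style lines and from JSON using only the standard library. It illustrates the data structure and is not how `ov.utils.geneset_prepare()` is implemented. The function names are hypothetical, and the JSON layout (an object mapping set names to gene lists) is an assumption, since custom JSON files vary.

```python
import json


def parse_gmt_lines(lines):
    """Build {set_name: [genes]} from GMT-style lines.

    Standard GMT rows are tab-separated: name, description, then genes.
    Rows with fewer than three fields (for example blank lines) are skipped.
    """
    gene_sets = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]
    return gene_sets


def gene_sets_from_json(text):
    """Convert JSON text to {set_name: [genes]}.

    Assumes the JSON is an object mapping each set name to a list of genes.
    """
    raw = json.loads(text)
    return {str(name): [str(g) for g in genes] for name, genes in raw.items()}


# Toy inputs, made up for illustration only.
gmt_lines = [
    "SET_A\tfirst toy set\tTP53\tBRCA1\n",
    "SET_B\tsecond toy set\tMYC\tEGFR\tKRAS\n",
]
from_gmt = parse_gmt_lines(gmt_lines)
from_json = gene_sets_from_json('{"SET_C": ["GAPDH", "ACTB"]}')
print(from_gmt)
print(from_json)
```

For `.txt` and `.gmt` files, prefer `ov.utils.geneset_prepare()` as the table states; a hand-written converter like the JSON one is only needed for custom files.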
import omicverse as ov
# 1. Setup
ov.plot_set()
# 2. Ensure pathway database is available
ov.utils.download_pathway_database()
# 3. Load gene sets - ALWAYS use geneset_prepare first!
go_bp = ov.utils.geneset_prepare('genesets/GO_Biological_Process_2021.txt', organism='Human')
go_mf = ov.utils.geneset_prepare('genesets/GO_Molecular_Function_2021.txt', organism='Human')
kegg = ov.utils.geneset_prepare('genesets/KEGG_2021_Human.txt', organism='Human')
# 4. Prepare gene list (e.g., from DEG analysis)
# Assuming dds is a pyDEG object with results
deg_genes = dds.result.loc[dds.result['sig'] != 'normal'].index.tolist()
# 5. Run enrichment with dictionary
enr_go_bp = ov.bulk.geneset_enrichment(
    gene_list=deg_genes,
    pathways_dict=go_bp,  # Dictionary, NOT file path!
    pvalue_type='auto',
    organism='Human'
)
# 6. Visualize results
ov.bulk.geneset_plot(enr_go_bp, figsize=(6, 8), num=10)
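# 7. For multiple databases, run enrichment on each, then combine into dict
enr_go_mf = ov.bulk.geneset_enrichment(
    gene_list=deg_genes,
    pathways_dict=go_mf,
    pvalue_type='auto',
    organism='Human'
)
enr_kegg = ov.bulk.geneset_enrichment(
    gene_list=deg_genes,
    pathways_dict=kegg,
    pvalue_type='auto',
    organism='Human'
)
enr_dict = {
    'GO_BP': enr_go_bp,
    'GO_MF': enr_go_mf,
    'KEGG': enr_kegg
}
colors_dict = {
    'GO_BP': '#1f77b4',
    'GO_MF': '#ff7f0e',
    'KEGG': '#2ca02c'
}
ov.bulk.geneset_plot_multi(enr_dict, colors_dict, num=5)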
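Before running enrichment, it can help to confirm that the gene list actually overlaps the loaded gene sets. The sketch below uses only plain Python containers; `geneset_overlap_report` is a hypothetical helper, not an OmicVerse function, and the gene symbols are toy values.

```python
def geneset_overlap_report(gene_list, pathways_dict):
    """Count how many query genes appear in any gene set.

    Reports exact matches and case-insensitive matches separately. A large
    gap between the two hints at a symbol-case mismatch, such as mouse-style
    symbols (Brca1) queried against human-style gene sets (BRCA1).
    """
    query = set(gene_list)
    universe = set()
    for genes in pathways_dict.values():
        universe.update(genes)

    exact = query & universe
    upper_universe = {g.upper() for g in universe}
    case_insensitive = {g for g in query if g.upper() in upper_universe}

    return {
        "n_query": len(query),
        "n_exact": len(exact),
        "n_case_insensitive": len(case_insensitive),
    }


# Toy data: two query symbols differ from the gene sets only by letter case.
toy_sets = {"P1": ["BRCA1", "MYC"], "P2": ["GAPDH", "ACTB"]}
toy_query = ["Brca1", "Myc", "GAPDH"]
report = geneset_overlap_report(toy_query, toy_sets)
print(report)
```

In a real session, `deg_genes` and a dictionary such as `go_bp` would take the place of the toy inputs. If the exact overlap is zero or very small, check the `organism` setting and the gene identifier type before interpreting an empty enrichment result.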
Cause: Passing file path string instead of dictionary to geneset_enrichment()
Solution: First load with ov.utils.geneset_prepare(), then pass the returned dictionary
Cause: Pathway database not downloaded
Solution: Run ov.utils.download_pathway_database() first
Cause: Gene list doesn't overlap with pathway genes, or organism mismatch
Solution:
- Check that the organism parameter matches your data
After running ov.utils.download_pathway_database():
- GO_Biological_Process_2021.txt
- GO_Molecular_Function_2021.txt
- GO_Cellular_Component_2021.txt
- KEGG_2021_Human.txt
- KEGG_2021_Mouse.txt
- Reactome_2022.txt
- WikiPathway_2023_Human.txt
- Load gene sets with ov.utils.geneset_prepare() before calling geneset_enrichment()
- Run download_pathway_database() once per environment
- Set organism='Human' or organism='Mouse'
- background parameter
- t_deg.ipynb (enrichment section)
- ov.utils.download_pathway_database()
- reference.md