Identifies and visualizes the top expressing genes per cluster across ALL cells (before T/B cell selection), then runs pathway enrichment analysis on those genes. Provides an initial overview of all cell populations by highlighting the most highly expressed genes and their biological functions.
- Requires the SeuratClusteringOfAllCells process
- Runs before TOrBCellSelection (this is a pre-selection analysis)
- Pair with ClusterMarkersOfAllCells for complete pre-selection profiling

[TopExpressingGenesOfAllCells]
cache = true
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
Note: srtobj accepts the output from SeuratClusteringOfAllCells.
[TopExpressingGenesOfAllCells.envs]
# Number of top expressing genes to identify per cluster
n = 250
# Enrichment style
enrich_style = "enrichr" # Options: "enrichr", "clusterprofiler"
# Enrichment databases
dbs = ["KEGG_2021_Human", "MSigDB_Hallmark_2020"]
[TopExpressingGenesOfAllCells.envs.enrich_plots_defaults]
# Plot type for enrichment results
plot_type = "bar" # Options: "bar", "dot", "lollipop", "network", "enrichmap", "wordcloud"
# Device parameters
devpars = {res = 100, width = 800, height = 600}
# Additional output formats
more_formats = []
# Save R code to reproduce plots
save_code = false
# Top terms to display
top_term = 10 # Number of top enriched pathways to show
ncol = 1 # Number of columns in multi-panel plots
[TopExpressingGenesOfAllCells.envs]
# Subset cells before analysis (optional)
subset = ""
[TopExpressingGenesOfAllCells.envs]
# Cache intermediate results
cache = "/tmp" # true, false, or directory path
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 10
dbs = ["MSigDB_Hallmark_2020"]
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 100
dbs = [
"KEGG_2021_Human",
"MSigDB_Hallmark_2020",
"GO_Biological_Process_2025"
]
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 10
dbs = ["MSigDB_Hallmark_2020"]
[TopExpressingGenesOfAllCells.envs.enrich_plots_defaults]
plot_type = "bar"
top_term = 10
What to expect: Top 10 genes per cluster showing broad cell type markers (CD3 for T cells, CD19 for B cells, CD14 for monocytes, etc.)
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 50
[TopExpressingGenesOfAllCells.envs.enrich_plots]
"T Cell Pathways" = {plot_type = "bar", dbs = ["KEGG_2021_Human"]}
"B Cell Pathways" = {plot_type = "bar", dbs = ["KEGG_2021_Human"]}
"Myeloid Pathways" = {plot_type = "bar", dbs = ["KEGG_2021_Human"]}
What to expect: Identification of T cell (CD3E, CD3D), B cell (CD19, MS4A1), and myeloid (CD14, LYZ) signatures across clusters
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 20
dbs = [
"GO_Biological_Process_2025",
"GO_Cellular_Component_2025"
]
[TopExpressingGenesOfAllCells.envs.enrich_plots_defaults]
plot_type = "dot"
top_term = 15
What to expect: Detection of contamination (e.g., EPCAM for epithelial, COL1A1 for fibroblasts, RBC markers)
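
The marker genes named in these examples can also be checked against a top-gene list programmatically. The following is a minimal Python sketch and not part of immunopipe: the `MARKERS` panel contains only genes mentioned above, the hemoglobin gene HBB is an assumed stand-in for "RBC markers", and the input dictionary is hypothetical. In practice the gene lists would be read from each cluster's top_genes.tsv, whose column layout is not assumed here.

```python
# Illustrative marker panel built from the genes named in this document.
# HBB is an assumed stand-in for "RBC markers".
MARKERS = {
    "T cell": {"CD3E", "CD3D"},
    "B cell": {"CD19", "MS4A1"},
    "Myeloid": {"CD14", "LYZ"},
    "Epithelial": {"EPCAM"},
    "Fibroblast": {"COL1A1"},
    "RBC": {"HBB"},
}


def label_clusters(top_genes):
    """Map each cluster to the sorted panel labels whose markers appear in its top genes."""
    labels = {}
    for cluster, genes in top_genes.items():
        present = set(genes)
        labels[cluster] = sorted(
            label for label, markers in MARKERS.items() if markers & present
        )
    return labels


# Hypothetical top-gene lists for three clusters.
example = {
    "c1": ["CD3E", "IL7R", "CD3D"],
    "c2": ["MS4A1", "CD79A", "CD3E"],
    "c3": ["EPCAM", "KRT8"],
}
for cluster, found in label_clusters(example).items():
    note = " (mixed signature)" if len(found) > 1 else ""
    print(cluster, found, note)
```

A cluster that picks up more than one label, such as `c2` here, corresponds to the mixed-marker situation described in the troubleshooting notes.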
TopExpressingGenesOfAllCells vs TopExpressingGenes:
| Aspect | TopExpressingGenesOfAllCells | TopExpressingGenes |
|---|---|---|
| When it runs | BEFORE TOrBCellSelection | AFTER TOrBCellSelection |
| Input data | All cells (unfiltered) | Only selected T or B cells |
| Upstream process | SeuratClusteringOfAllCells | SeuratClustering + TOrBCellSelection |
| Use case | Initial assessment, quality check | Detailed T/B cell analysis |
| Cell types | ALL cell types present | Only T OR B cells |
| Typical markers | CD3, CD19, CD14, etc. | Specific T/B cell subtypes |
| Position in workflow | Pre-selection overview | Post-selection deep dive |
Workflow context:
RNA Input → SeuratPreparing → SeuratClusteringOfAllCells
↓
TopExpressingGenesOfAllCells ← Runs here
↓
TOrBCellSelection (separates T/B)
↓
SeuratClustering (on selected cells)
↓
TopExpressingGenes ← Runs here
Recommendation:
- Use TopExpressingGenesOfAllCells to assess overall data quality and cell type composition
- Use TopExpressingGenes for detailed analysis of T or B cell subtypes

Dependencies:
- SeuratClusteringOfAllCells (upstream, required)
- TOrBCellSelection (optional - this process provides pre-selection context)

Validation rules:
- n parameter: Must be a positive integer (typically 10-500)
- dbs: Must be valid enrichit/Enrichr database names or local GMT file paths
- enrich_style: Must be "enrichr" or "clusterprofiler"
- plot_type: Must be a valid scplotter plot type
- SeuratClusteringOfAllCells must be enabled

Issue: TopExpressingGenesOfAllCells not executed despite being in config
Causes:
- SeuratClusteringOfAllCells not enabled

Solutions:
- Ensure SeuratClusteringOfAllCells is enabled in config
- Validate the config: python -m immunopipe.validate_config config.toml

[SeuratClusteringOfAllCells]
[TopExpressingGenesOfAllCells]
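
The same checks can be sketched in a few lines of Python. This is a hypothetical helper, not an immunopipe API: it assumes that a process counts as enabled when its table is present in the config, and it covers only the n and enrich_style constraints stated in this document. The function takes an already parsed config as a nested dict. On Python 3.11+, such a dict can be obtained with `tomllib.load` on the config file opened in binary mode.

```python
VALID_STYLES = {"enrichr", "clusterprofiler"}


def check_config(cfg):
    """Return a list of problems found in a parsed pipeline config (nested dict)."""
    problems = []
    # Assumption: a process is enabled when its table is present in the config.
    if "SeuratClusteringOfAllCells" not in cfg:
        problems.append("SeuratClusteringOfAllCells is not enabled")
    proc = cfg.get("TopExpressingGenesOfAllCells")
    if proc is None:
        problems.append("TopExpressingGenesOfAllCells is not enabled")
        return problems
    envs = proc.get("envs", {})
    n = envs.get("n")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if n is not None and (isinstance(n, bool) or not isinstance(n, int) or n <= 0):
        problems.append("envs.n must be a positive integer")
    style = envs.get("enrich_style")
    if style is not None and style not in VALID_STYLES:
        problems.append('envs.enrich_style must be "enrichr" or "clusterprofiler"')
    return problems


# A config with the upstream process missing and an invalid n.
broken = {"TopExpressingGenesOfAllCells": {"envs": {"n": 0}}}
for problem in check_config(broken):
    print("-", problem)
```

Returning a list of messages instead of raising on the first problem lets all issues be reported in one pass.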
Issue: Clusters show multiple cell type markers (CD3 + CD19)
Causes:
Solutions:
- Review the clustering settings in SeuratClusteringOfAllCells
- Review the filtering settings in the SeuratPreparing step
- Run TOrBCellSelection after assessment to clean data

Issue: Top genes list lacks expected markers (CD3, CD19, CD14)
Causes:
Solutions:
- Inspect the cluster statistics from SeuratClusterStatsOfAllCells
- Review the clustering settings in SeuratClusteringOfAllCells

Issue: Top genes list dominated by housekeeping genes (RPS, RPL, MT-)
Solutions:
- Increase the n parameter to see beyond housekeeping genes
- Review the filtering settings in the SeuratPreparing step
- Use ClusterMarkersOfAllCells for differential expression

Issue: No pathways enriched despite top genes identified
Causes:
- n too small for meaningful enrichment

Solutions:
- Increase n to 100-500 genes
- Try additional databases (e.g., GO_Biological_Process_2025)

Issue: Enrichment plots fail to render
Causes:
Solutions:
- Adjust the top_term parameter
- Try a simpler plot type (bar, dot)
- Check the installed versions of enrichit and scplotter

<srtobj_stem>.top_expressing_genes/
├── <cluster_name>/              # One subdirectory per cluster (ALL cells)
│   ├── top_genes.tsv            # Top N genes with expression metrics
│   └── enrich/                  # Enrichment results
│       ├── <db_name>/           # One subdirectory per database
│       │   ├── *.Bar-Plot.png   # Enrichment plots
│       │   ├── *.enrich.tsv     # Enrichment tables
│       │   └── ...
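
Given the layout above, the per-cluster tables can be collected with a short script. This is a minimal sketch using only `pathlib`. The function names are hypothetical, the output directory path has to be supplied by the caller, and nothing is assumed about the columns inside the TSV files.

```python
from pathlib import Path


def find_top_gene_tables(outdir):
    """Map cluster directory name -> path of its top_genes.tsv, if present."""
    tables = {}
    for cluster_dir in sorted(Path(outdir).iterdir()):
        table = cluster_dir / "top_genes.tsv"
        if cluster_dir.is_dir() and table.is_file():
            tables[cluster_dir.name] = table
    return tables


def find_enrich_tables(cluster_dir):
    """Map database name -> sorted *.enrich.tsv files under <cluster>/enrich/<db_name>/."""
    enrich_dir = Path(cluster_dir) / "enrich"
    if not enrich_dir.is_dir():
        return {}
    return {
        db_dir.name: sorted(db_dir.glob("*.enrich.tsv"))
        for db_dir in sorted(enrich_dir.iterdir())
        if db_dir.is_dir()
    }


# Usage (the path is a placeholder for the actual output directory):
# for cluster, table in find_top_gene_tables("path/to/output_dir").items():
#     print(cluster, table, list(find_enrich_tables(table.parent)))
```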
Built-in databases:
- KEGG_2021_Human - KEGG pathways (human)
- MSigDB_Hallmark_2020 - MSigDB Hallmark gene sets
- GO_Biological_Process_2025 - GO Biological Process terms
- GO_Cellular_Component_2025 - GO Cellular Component terms
- GO_Molecular_Function_2025 - GO Molecular Function terms
- Reactome_Pathways_2024 - Reactome pathways
- WikiPathways_2024_Human - WikiPathways (human)

Enrichr libraries: See https://maayanlab.cloud/Enrichr/#libraries
Plot types:
- bar - Bar chart of enriched terms
- dot - Dot plot (bubble chart)
- lollipop - Lollipop plot
- network - Network visualization of term relationships
- enrichmap - Enrichment map (similar to Cytoscape)
- wordcloud - Word cloud visualization

Enrichment styles:
- enrichr - Fisher's exact test (Enrichr-style)
- clusterprofiler - Hypergeometric test (clusterProfiler-style)

Related processes:
- TopExpressingGenes - Top genes for selected T/B cells after selection
- ClusterMarkersOfAllCells - Differential expression for all cells before selection
- SeuratClusteringOfAllCells - Clustering on all cells before T/B selection
- TOrBCellSelection - T/B cell separation process
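
The enrichr and clusterprofiler styles are described in terms of Fisher's exact test and the hypergeometric test. For over-representation, the one-sided Fisher's exact test gives the same p-value as the upper tail of the hypergeometric distribution, so a single small calculation illustrates both. The sketch below is plain Python and is not the code used by enrichit, Enrichr, or clusterProfiler. The gene counts are invented for illustration, and real tools may differ in how they define the background and correct for multiple testing.

```python
from math import comb


def enrichment_pvalue(background, set_size, n_top, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    background: number of genes in the universe
    set_size:   number of background genes in the pathway
    n_top:      number of top genes drawn for a cluster
    overlap:    number of top genes that fall in the pathway
    """
    upper = min(set_size, n_top)
    favourable = sum(
        comb(set_size, i) * comb(background - set_size, n_top - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(background, n_top)


# Invented numbers: 20000 background genes, a 200-gene pathway,
# the top 100 genes of a cluster, 5 of which fall in the pathway.
p = enrichment_pvalue(20000, 200, 100, 5)
print(f"p = {p:.3g}")
```

With a fixed pathway, a larger n gives the test more genes to overlap with, which is one reason a very small n can yield no enriched terms.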