Integrate and analyze multiple omics datasets (transcriptomics, proteomics, epigenomics, genomics, metabolomics) for systems biology and precision medicine. Performs cross-omics correlation, multi-omics clustering (MOFA+, NMF), pathway-level integration, and sample matching. Coordinates ToolUniverse skills for expression data (RNA-seq), epigenomics (methylation, ChIP-seq), variants (SNVs, CNVs), protein interactions, and pathway enrichment. Use when analyzing multi-omics datasets, performing integrative analysis, discovering multi-omics biomarkers, studying disease mechanisms across molecular layers, or conducting systems biology research that requires coordinated analysis of transcriptome, genome, epigenome, proteome, and metabolome data.
Coordinate and integrate multiple omics datasets for comprehensive systems biology analysis. Orchestrates specialized ToolUniverse skills to perform cross-omics correlation, multi-omics clustering, pathway-level integration, and unified interpretation.
Multi-omics integration asks whether different molecular layers tell a concordant story. If a gene is upregulated in RNA-seq AND its protein is elevated in proteomics, that is concordant evidence of true biological change. Discordance — high mRNA but low protein, or elevated protein without matching mRNA — may indicate post-transcriptional regulation (miRNA silencing, protein degradation, translational control) and is itself a meaningful finding worth reporting. Not every discordance is noise; some are the most interesting biology.
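The concordance logic above can be made concrete with a small classifier. This is a minimal sketch, not the skill's own implementation: the function name, the labels, and the `threshold` cutoff are all hypothetical, and it assumes per-gene log2 fold changes are already available for both layers.

```python
def classify_concordance(rna_log2fc, protein_log2fc, threshold=1.0):
    """Label one gene by comparing its RNA and protein log2 fold changes.

    threshold is a hypothetical cutoff; tune it to the noise level of each assay.
    """
    def direction(value):
        if value >= threshold:
            return "up"
        if value <= -threshold:
            return "down"
        return "flat"

    rna_dir, prot_dir = direction(rna_log2fc), direction(protein_log2fc)
    if rna_dir == prot_dir:
        return "unchanged" if rna_dir == "flat" else f"concordant_{rna_dir}"
    if prot_dir == "flat":
        return f"rna_only_{rna_dir}"       # mRNA changed, protein did not
    if rna_dir == "flat":
        return f"protein_only_{prot_dir}"  # protein changed, mRNA did not
    return "opposite"                      # both changed, in opposite directions
```

Genes labelled `rna_only_*`, `protein_only_*`, or `opposite` are the discordant cases worth reporting as candidates for post-transcriptional regulation rather than discarding.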
Run ReactomeAnalysis_pathway_enrichment or gseapy on the actual gene lists; never list enriched pathways from memory.
Phase 1: Data Loading & QC
Load each omics type, format-specific QC, normalize
Supported: RNA-seq, proteomics, methylation, CNV/SNV, metabolomics
Phase 2: Sample Matching
Harmonize sample IDs, find common samples, handle missing omics
Phase 3: Feature Mapping
Map features to common gene-level identifiers
CpG->gene (promoter), CNV->gene, metabolite->enzyme
Phase 4: Cross-Omics Correlation
RNA vs Protein (translation efficiency)
Methylation vs Expression (epigenetic regulation)
CNV vs Expression (dosage effect)
eQTL variants vs Expression (genetic regulation)
Phase 5: Multi-Omics Clustering
MOFA+, NMF, SNF for patient subtyping
Phase 6: Pathway-Level Integration
Aggregate omics evidence at pathway level
Score pathway dysregulation with combined evidence
Phase 7: Biomarker Discovery
Feature selection across omics, multi-omics classification
Phase 8: Integrated Report
Summary, correlations, clusters, pathways, biomarkers
See: phase_details.md for complete code and implementation details.
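The Phase 3 mapping of CpG probes to genes can be sketched as a filter followed by a per-gene average. This is only an illustration under stated assumptions: the annotation table and its column names (`cpg`, `gene`, `region`) are hypothetical, and averaging promoter probes is one of several reasonable aggregation choices.

```python
import pandas as pd

def cpg_to_gene(beta, annotation, region="promoter"):
    """Average CpG beta values per gene for probes in the chosen region.

    beta: CpG x sample DataFrame of beta values.
    annotation: DataFrame with hypothetical columns 'cpg', 'gene', 'region'.
    """
    probes = annotation[annotation["region"] == region]
    probes = probes[probes["cpg"].isin(beta.index)]
    # Attach the gene label to each probe row, then average probes per gene.
    labelled = beta.loc[probes["cpg"].to_list()].copy()
    labelled.index = probes["gene"].to_numpy()
    return labelled.groupby(level=0).mean()

# Toy data: two promoter probes for GENE_A, one gene-body probe for GENE_B.
beta = pd.DataFrame(
    {"S1": [0.8, 0.6, 0.1], "S2": [0.4, 0.2, 0.3]},
    index=["cg01", "cg02", "cg03"],
)
annotation = pd.DataFrame({
    "cpg": ["cg01", "cg02", "cg03"],
    "gene": ["GENE_A", "GENE_A", "GENE_B"],
    "region": ["promoter", "promoter", "body"],
})
gene_meth = cpg_to_gene(beta, annotation)  # one row per gene, one column per sample
```
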
| Omics | Formats | QC Focus |
|---|---|---|
| Transcriptomics | CSV/TSV, HDF5, h5ad | Low-count filter, normalize (TPM/DESeq2), log-transform |
| Proteomics | MaxQuant, Spectronaut, DIA-NN | Missing value imputation, median/quantile normalization |
| Methylation | IDAT, beta matrices | Failed probes, batch correction, cross-reactive filter |
| Genomics | VCF, SEG (CNV) | Variant QC, CNV segmentation |
| Metabolomics | Peak tables | Missing values, normalization |
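
As an illustration of the transcriptomics row, the sketch below filters low-count genes, normalizes, and log-transforms. It uses counts per million as a simple stand-in for the TPM/DESeq2 normalization named in the table, and the `min_count` and `min_samples` defaults are illustrative assumptions rather than recommended values.

```python
import numpy as np
import pandas as pd

def basic_rnaseq_qc(counts, min_count=10, min_samples=2):
    """Filter low-count genes, scale to counts per million, then log2-transform.

    counts: gene x sample DataFrame of raw counts.
    min_count and min_samples are illustrative defaults, not recommendations.
    """
    # Keep genes that reach min_count in at least min_samples samples.
    keep = (counts >= min_count).sum(axis=1) >= min_samples
    filtered = counts.loc[keep]
    # Library sizes are computed after filtering; the division aligns on sample columns.
    cpm = filtered / filtered.sum(axis=0) * 1e6
    return np.log2(cpm + 1)

counts = pd.DataFrame(
    {"S1": [100, 0, 50], "S2": [200, 5, 50]},
    index=["G1", "G2", "G3"],
)
logged = basic_rnaseq_qc(counts)  # G2 is dropped by the low-count filter
```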
def match_samples_across_omics(omics_data_dict):
    """Match samples across multiple omics datasets (samples are columns)."""
    if not omics_data_dict:
        raise ValueError("omics_data_dict must contain at least one dataset")
    sample_ids = {k: set(df.columns) for k, df in omics_data_dict.items()}
    # Sort once so every matrix ends up with the same column order.
    common_samples = sorted(set.intersection(*sample_ids.values()))
    matched_data = {k: df[common_samples] for k, df in omics_data_dict.items()}
    return common_samples, matched_data
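Sample matching by set intersection only works once IDs are spelled the same way in every dataset. The sketch below shows one possible harmonization step to run first. The function name and the cleaning rules (strip whitespace, uppercase, underscores to hyphens, optional truncation) are assumptions chosen for illustration; real ID schemes may need different rules.

```python
import pandas as pd

def harmonize_sample_ids(df, id_length=None):
    """Return a copy of df with cleaned column (sample) IDs.

    id_length optionally truncates IDs, e.g. to collapse aliquot-level
    barcodes to patient level; the right length depends on the ID scheme.
    """
    cleaned = df.columns.astype(str).str.strip().str.upper()
    cleaned = cleaned.str.replace("_", "-", regex=False)
    if id_length is not None:
        cleaned = cleaned.str[:id_length]
    out = df.copy()
    out.columns = cleaned
    # Drop repeated IDs created by truncation, keeping the first occurrence.
    return out.loc[:, ~out.columns.duplicated()]

rna = pd.DataFrame([[1, 2, 3]], index=["G1"], columns=["pt-01 ", "PT_02", "pt-03"])
protein = pd.DataFrame([[4, 5, 6]], index=["G1"], columns=["PT-02", "PT-03", "PT-04"])
rna_h, protein_h = harmonize_sample_ids(rna), harmonize_sample_ids(protein)
common = sorted(set(rna_h.columns) & set(protein_h.columns))
```

Without the cleaning step the two toy matrices share no sample IDs at all; after it, two samples match.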
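from scipy.stats import spearmanr, pearsonr  # pearsonr: alternative for linear relationships

# rna and protein: samples x genes DataFrames with rows in the same sample order
# RNA vs Protein: expect positive r ~ 0.4-0.6
# Methylation vs Expression: expect negative r (promoter repression)
# CNV vs Expression: expect positive r (dosage effect)
common_genes = rna.columns.intersection(protein.columns)
correlations = {}
for gene in common_genes:
    r, p = spearmanr(rna[gene], protein[gene])
    correlations[gene] = (r, p)  # keep both the coefficient and its p-value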
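Testing one correlation per gene produces thousands of p-values, so some multiple-testing correction is usually needed before calling any gene significant. The source does not say which correction it uses; the sketch below shows the Benjamini-Hochberg procedure as one common choice, written with NumPy only.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by n / rank.
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1.
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(adjusted_sorted, 1.0)
    return adjusted

adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```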
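import numpy as np
# Score pathway dysregulation using combined evidence from all omics
# Aggregate per-gene evidence, then per-pathway
# rna_fc, protein_fc, meth_diff, cnv: arrays aligned to the same pathway genes
gene_evidence = np.abs(rna_fc) + np.abs(protein_fc) + np.abs(meth_diff) + np.abs(cnv)
pathway_score = np.mean(gene_evidence)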
See: phase_details.md for full implementations of each operation.
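The pathway scoring formula above can be applied to many pathways at once. This is a minimal sketch under assumptions: the function name, the evidence table layout (one row per gene, one column per omics layer), and the toy pathway names are hypothetical. Because the layers sit on different scales, standardizing each column before summing may be worth considering; the sketch keeps the raw sum to match the formula as written.

```python
import pandas as pd

def score_pathways(gene_evidence, pathways):
    """Mean combined absolute evidence per pathway.

    gene_evidence: gene x layer DataFrame (e.g. rna_fc, protein_fc, meth_diff, cnv).
    pathways: dict mapping pathway name to a list of gene symbols.
    """
    # Per-gene evidence: sum of absolute values across layers (NaN is skipped).
    per_gene = gene_evidence.abs().sum(axis=1, skipna=True)
    scores = {}
    for name, genes in pathways.items():
        present = [g for g in genes if g in per_gene.index]
        if present:  # pathways with no measured genes are left out
            scores[name] = per_gene.loc[present].mean()
    return pd.Series(scores).sort_values(ascending=False)

evidence = pd.DataFrame(
    {
        "rna_fc": [2.0, -1.0, 0.5],
        "protein_fc": [1.0, -0.5, 0.0],
        "meth_diff": [-0.3, 0.1, 0.0],
        "cnv": [1.0, 0.0, 0.0],
    },
    index=["G1", "G2", "G3"],
)
pathways = {"PATHWAY_X": ["G1", "G2"], "PATHWAY_Y": ["G3", "G9"], "PATHWAY_Z": ["G9"]}
scores = score_pathways(evidence, pathways)
```
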
| Method | Description | Best For |
|---|---|---|
| MOFA+ | Latent factors explaining cross-omics variation | Identifying shared/omics-specific drivers |
| Joint NMF | Shared decomposition across omics | Patient subtype discovery |
| SNF | Similarity network fusion | Integrating heterogeneous data types |
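
To show what "shared decomposition across omics" means for Joint NMF, here is a minimal NumPy sketch. It is not the implementation used by this skill or by any particular package: it assumes non-negative samples x features matrices with matching sample order, and it uses plain multiplicative updates with a fixed iteration count and no convergence check. The synthetic two-group data is invented for the demonstration.

```python
import numpy as np

def joint_nmf(matrices, n_components=2, n_iter=200, seed=0, eps=1e-9):
    """Factor each non-negative samples x features matrix X_k as W @ H_k.

    W (samples x components) is shared across omics; each H_k is layer-specific.
    Uses multiplicative updates for the summed squared reconstruction error.
    """
    rng = np.random.default_rng(seed)
    n_samples = matrices[0].shape[0]
    W = rng.random((n_samples, n_components)) + eps
    Hs = [rng.random((n_components, X.shape[1])) + eps for X in matrices]
    errors = []
    for _ in range(n_iter):
        # Update each layer-specific loading matrix with W held fixed.
        Hs = [H * (W.T @ X) / (W.T @ W @ H + eps) for X, H in zip(matrices, Hs)]
        # Update the shared sample factor using all layers at once.
        numerator = sum(X @ H.T for X, H in zip(matrices, Hs))
        denominator = W @ sum(H @ H.T for H in Hs) + eps
        W = W * numerator / denominator
        errors.append(sum(np.linalg.norm(X - W @ H) ** 2 for X, H in zip(matrices, Hs)))
    return W, Hs, errors

# Synthetic data: 20 samples in two groups, measured in two layers.
rng = np.random.default_rng(1)
group = np.repeat([0, 1], 10)

def make_layer(n_features):
    profiles = rng.random((2, n_features)) * 5
    return profiles[group] + rng.random((20, n_features)) * 0.1

X_rna, X_protein = make_layer(30), make_layer(15)
W, Hs, errors = joint_nmf([X_rna, X_protein])
```

The rows of `W` are per-sample factor weights shared by both layers, so candidate subtypes can be derived by clustering those rows.
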
| Skill | Used For | Phase |
|---|---|---|
| `tooluniverse-rnaseq-deseq2` | RNA-seq analysis | 1, 4 |
| `tooluniverse-epigenomics` | Methylation, ChIP-seq | 1, 4 |
| `tooluniverse-variant-analysis` | CNV/SNV processing | 1, 3, 4 |
| `tooluniverse-protein-interactions` | Protein network context | 6 |
| `tooluniverse-gene-enrichment` | Pathway enrichment | 6 |
| `tooluniverse-expression-data-retrieval` | Public data retrieval | 1 |
| `tooluniverse-target-research` | Gene/protein annotation | 3, 8 |
Integrate TCGA RNA-seq + proteomics + methylation + CNV to identify patient subtypes, cross-omics driver genes, and multi-omics biomarkers.
Identify SNP -> methylation -> expression regulatory chains (mediation analysis).
Predict drug response using baseline multi-omics profiles; identify resistance/sensitivity pathways.
See: phase_details.md "Use Cases" for detailed step-by-step workflows.
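For the SNP -> methylation -> expression use case, the simplest form of mediation analysis is the product-of-coefficients estimate from two linear regressions. The sketch below is an assumption-laden illustration, not the workflow in phase_details.md: it fits ordinary least squares with NumPy, ignores covariates, and gives point estimates only (a bootstrap or similar would be needed for confidence intervals). The simulated data is invented.

```python
import numpy as np

def ols_coefficients(y, *predictors):
    """Least-squares coefficients for y ~ intercept + predictors (intercept first)."""
    design = np.column_stack([np.ones(len(y)), *predictors])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def mediation_effects(snp, methylation, expression):
    """Product-of-coefficients estimates for SNP -> methylation -> expression."""
    a = ols_coefficients(methylation, snp)[1]        # SNP -> methylation
    _, direct, b = ols_coefficients(expression, snp, methylation)
    total = ols_coefficients(expression, snp)[1]     # SNP -> expression, unadjusted
    return {"indirect": a * b, "direct": direct, "total": total}

# Simulated example: the SNP raises methylation, which lowers expression.
rng = np.random.default_rng(0)
snp = rng.integers(0, 3, size=200).astype(float)     # genotype coded 0/1/2
methylation = 0.5 * snp + rng.normal(0, 0.1, 200)
expression = -0.8 * methylation + rng.normal(0, 0.1, 200)
effects = mediation_effects(snp, methylation, expression)
```
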
| Component | Requirement |
|---|---|
| Omics types | At least 2 datasets |
| Common samples | At least 10 across omics |
| Cross-correlation | Pearson/Spearman computed |
| Clustering | At least one method (MOFA+, NMF, or SNF) |
| Pathway integration | Enrichment with multi-omics evidence scores |
| Report | Summary, correlations, clusters, pathways, biomarkers |
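
The requirements in the table can be checked mechanically before a report is finalized. This is a hypothetical helper, with argument names invented for illustration; the thresholds and section names are taken from the table.

```python
def check_completeness(n_omics, n_common_samples, correlations_done,
                       clustering_methods, pathway_scores_done, report_sections):
    """Return a list of unmet requirements; an empty list means all are met."""
    required_sections = {"summary", "correlations", "clusters", "pathways", "biomarkers"}
    accepted_methods = {"MOFA+", "NMF", "SNF"}
    problems = []
    if n_omics < 2:
        problems.append("need at least 2 omics datasets")
    if n_common_samples < 10:
        problems.append("need at least 10 common samples")
    if not correlations_done:
        problems.append("cross-omics correlation not computed")
    if not accepted_methods & set(clustering_methods):
        problems.append("no clustering method (MOFA+, NMF, or SNF) applied")
    if not pathway_scores_done:
        problems.append("pathway integration scores missing")
    missing = required_sections - {s.lower() for s in report_sections}
    if missing:
        problems.append("report sections missing: " + ", ".join(sorted(missing)))
    return problems
```

For example, `check_completeness(1, 8, False, [], False, ["summary"])` reports all six requirements as unmet.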