Generates complete dual-disease transcriptomic + machine learning research designs from a user-provided disease pair. Use when users want to identify shared DEGs, common hub genes, cross-disease biomarkers, or shared molecular mechanisms between two diseases using public GEO data. Triggers: "shared biomarker study for two diseases", "dual-disease transcriptomic ML paper", "identify common DEGs between disease A and B", "cross-disease hub gene discovery", "shared DEG + PPI + ROC design", "immune infiltration shared biomarker", or "I want to study disease X and Y together". Always outputs four workload configurations (Lite / Standard / Advanced / Publication+) with a recommended primary plan, step-by-step workflow, figure plan, validation strategy, minimal executable version, and publication upgrade path.
Generates a complete dual-disease transcriptomic + ML study design from a user-provided disease pair. Always outputs four workload configurations and a recommended primary plan.
| Style | Description | Example |
|---|---|---|
| A. Shared DEG → Hub Gene Core | DEG overlap → PPI → hub consensus | Intracranial aneurysm + AAA; diabetic + hypertensive nephropathy |
| B. Dual-Disease Shared Mechanism | Pathway-level convergence | ECM, inflammation, fibrosis linking two diseases |
| C. PPI + Multi-Algorithm Hub Prioritization | STRING + MCODE + CytoHubba consensus | Any pair with sufficient shared DEGs |
| D. Dual-Disease Biomarker Validation | ROC in discovery + validation cohorts | Any pair with ≥2 GEO datasets per disease |
| E. Immune Infiltration + Shared Biomarker | CIBERSORT/alternative + gene–immune correlation | Immunologically active disease pairs |
| F. Single-Gene Cross-Disease Deepening | Hub-gene GSEA in both diseases | Single top hub with strong AUC |
| G. Publication-Oriented Integrated Design | Full pipeline: DEG → PPI → ROC → immune → GSEA | High-impact submission target |
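The multi-algorithm consensus in Style C can be sketched in base R as a simple voting rule. This is a minimal illustration rather than a prescribed method: the helper `consensus_hubs`, the algorithm labels (`MCC`, `Degree`, `MCODE`), the gene symbols, and the cutoffs `top_k` and `min_votes` are all hypothetical. It assumes each algorithm's ranking has already been exported from Cytoscape as a character vector of gene symbols, best first.

```r
# Hypothetical helper: keep genes that appear in the top_k of at least
# min_votes hub-ranking algorithms. All names here are illustrative.
consensus_hubs <- function(rankings, top_k = 10, min_votes = 2) {
  # Truncate each ranked gene vector to its top_k unique entries
  top_sets <- lapply(rankings, function(genes) head(unique(genes), top_k))
  # Count how many algorithms nominated each gene
  votes <- table(unlist(top_sets))
  out <- data.frame(gene = as.character(names(votes)),
                    votes = as.integer(votes),
                    stringsAsFactors = FALSE)
  out <- out[out$votes >= min_votes, , drop = FALSE]
  # Most-supported genes first; alphabetical order breaks ties
  out <- out[order(-out$votes, out$gene), , drop = FALSE]
  rownames(out) <- NULL
  out
}

# Toy rankings with made-up gene symbols (NOT real results)
rankings <- list(
  MCC    = c("GENE_A", "GENE_B", "GENE_C", "GENE_D"),
  Degree = c("GENE_B", "GENE_A", "GENE_E", "GENE_C"),
  MCODE  = c("GENE_B", "GENE_F", "GENE_A", "GENE_G")
)
hubs <- consensus_hubs(rankings, top_k = 3, min_votes = 2)
print(hubs)
```

A voting rule is only one way to build a consensus; rank aggregation (for example, mean rank across algorithms) is a reasonable alternative when the top lists barely overlap.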
Identify:
Always generate all four. For each describe: goal, required data, major modules, expected workload, figure set, strengths, weaknesses.
| Config | Goal | Timeframe | Best For |
|---|---|---|---|
| Lite | Shared DEG + basic hub, 1 dataset per disease | 2–4 weeks | Pilot, skeleton manuscript, single-dataset constraint |
| Standard | Full pipeline + validation + ROC + one deepening layer | 5–9 weeks | Core publishable paper |
| Advanced | Standard + immune + GSEA + multi-cohort robustness | 9–14 weeks | Competitive journal target |
| Publication+ | Full multi-layer + experimental suggestions + reviewer defense | 12–20 weeks | High-impact submission |
Select the best-fit configuration and explain why, given disease pair biology, GEO data availability, time constraints, and publication ambition.
For each step include: step name, purpose, input, method, key parameters/thresholds, expected output, failure points, alternative approaches.
Dataset & Preprocessing
Fault tolerance — dataset level:
DEG & Shared Signature
Fault tolerance — DEG intersection:
Enrichment & Shared Mechanism
PPI & Hub Prioritization
Biomarker Performance
Fault tolerance — ROC:
Immune Infiltration (when disease-appropriate per Hard Rule 5)
Single-Gene Deepening (Standard and above)
→ Full figure list and table templates: references/figure_plan_template.md
Core figures: workflow schematic (Fig 1), DEG volcanos + Venn (Fig 2), shared DEG heatmap (Fig 3), GO/KEGG enrichment (Fig 4), PPI + MCODE + hub ranking (Fig 5), ROC curves (Fig 6), immune infiltration + correlation (Fig 7), single-gene GSEA (Fig 8). Tables: dataset summary, shared DEG list, hub rankings, ROC/AUC summary.
State what each layer proves and what it does not prove:
Always include a self-critical section addressing:
Public data only, one discovery dataset per disease, DEG + Venn + GO/KEGG, STRING + MCODE + CytoHubba top gene, ROC in discovery cohort, one-page interpretation. 2–4 week timeline. Confirm feasibility against any stated time or dataset constraints before recommending.
→ Full upgrade impact table: references/upgrade_path.md
Key upgrades, each rated as (impact / added complexity): validation cohort per disease (High / Low–Medium), multi-algorithm hub consensus (High / Low), cross-platform reproducibility logic (High / Medium), immune infiltration (Medium / Medium), single-gene GSEA (Medium / Low), mini-signature of 3–5 genes (Medium / Medium).
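The mini-signature upgrade can be illustrated with a base-R sketch that compares a single gene's AUC against a small multi-gene score. Everything here is an assumption made for illustration: the helpers `auc_rank` and `signature_score`, the mean-z-score scoring rule (which presumes all signature genes are up-regulated in disease), and the simulated expression matrix. In a real analysis the scores would come from GEO expression data, and the signature should be evaluated in a validation cohort rather than the cohort used to select the genes.

```r
# Rank-based AUC (equivalent to the Mann-Whitney U statistic), base R only.
# 'labels' must be 0/1 with 1 = disease; tied scores receive average ranks.
auc_rank <- function(scores, labels) {
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Hypothetical mini-signature score: mean of per-gene z-scores per sample.
# 'expr' is a genes x samples matrix.
signature_score <- function(expr, genes) {
  z <- t(scale(t(expr[genes, , drop = FALSE])))  # z-score each gene across samples
  colMeans(z)
}

# Simulated data for illustration only (NOT real expression values)
set.seed(1)
labels <- rep(c(0, 1), each = 20)
expr <- matrix(rnorm(3 * 40), nrow = 3,
               dimnames = list(c("GENE_A", "GENE_B", "GENE_C"), NULL))
expr[, labels == 1] <- expr[, labels == 1] + 0.8  # shift disease samples upward

single_auc <- auc_rank(expr["GENE_A", ], labels)
sig_auc <- auc_rank(signature_score(expr, rownames(expr)), labels)
cat("Single-gene AUC:", round(single_auc, 3), "\n")
cat("3-gene signature AUC:", round(sig_auc, 3), "\n")
if (single_auc < 0.70) message("Single gene below 0.70; consider the mini-signature.")
```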
When providing R code examples or pipeline frameworks:
# EXAMPLE ID: replace with your actual GSE accession before running
if (length(shared_genes) == 0) {
stop("No shared DEGs found. Recovery options: (1) relax logFC to 0.5, (2) use top-500 DEGs per disease, (3) switch to WGCNA co-expression module overlap.")
}
BiocManager::install() calls where needed.
GEOquery::getGEO("GSEsearch", ...) or direct search at https://www.ncbi.nlm.nih.gov/geo/
Standard R pipeline template:
library(GEOquery); library(limma); library(clusterProfiler); library(pROC)
# Load datasets — EXAMPLE IDs: replace before running
gse_disease1 <- getGEO("GSEXXXXX", GSEMatrix = TRUE)[[1]] # EXAMPLE ID
gse_disease2 <- getGEO("GSEXXXXX", GSEMatrix = TRUE)[[1]] # EXAMPLE ID
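# DEG analysis, applied to both diseases.
# Assumes pData() of each dataset has a two-level factor column named 'group'.
run_deg <- function(gse) {
  design <- model.matrix(~ group, data = pData(gse))
  fit <- eBayes(lmFit(exprs(gse), design))
  subset(topTable(fit, coef = 2, adjust.method = "BH", number = Inf),
         abs(logFC) > 1 & adj.P.Val < 0.05)
}
deg_d1 <- run_deg(gse_disease1)
deg_d2 <- run_deg(gse_disease2)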
# Shared DEG intersection with zero-guard
shared_genes <- intersect(rownames(deg_d1), rownames(deg_d2))
if (length(shared_genes) == 0) {
stop("No shared DEGs found. Recovery: relax logFC to 0.5 or use top-500 DEGs per disease.")
}
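# ROC for top hub gene. EXAMPLE: replace "HUB_GENE" with a feature ID present in rownames(exprs(gse_disease1))
hub_gene <- "HUB_GENE" # EXAMPLE ID
labels <- pData(gse_disease1)$group
expr_scores <- exprs(gse_disease1)[hub_gene, ]
roc_obj <- roc(response = labels, predictor = expr_scores)
auc_val <- as.numeric(auc(roc_obj))
cat("AUC:", auc_val, "\n")
if (auc_val < 0.70) warning("AUC below 0.70 threshold. Consider mini-signature approach.")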
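The recovery options named in the zero-guard message can also be applied automatically instead of stopping at the first empty intersection. The following base-R sketch tries them in order; it is one possible arrangement, not part of the template above. The helper `shared_with_fallback` is hypothetical, ranking the "top-500" genes by adjusted p-value is an assumption, and the toy tables only mimic the shape of `topTable()` output (row names as gene IDs, columns `logFC` and `adj.P.Val`).

```r
# Hypothetical helper: try progressively looser rules until the two DEG
# tables share at least one gene. Inputs mimic limma::topTable() output.
shared_with_fallback <- function(tt1, tt2, top_n = 500) {
  # Genes passing a |logFC| cutoff at adjusted p < 0.05
  pick <- function(tt, lfc) rownames(tt)[abs(tt$logFC) > lfc & tt$adj.P.Val < 0.05]
  # Top-N genes by adjusted p-value (the ranking criterion is an assumption)
  top <- function(tt) rownames(tt)[order(tt$adj.P.Val)][seq_len(min(top_n, nrow(tt)))]
  strategies <- list(
    strict  = function() intersect(pick(tt1, 1),   pick(tt2, 1)),
    relaxed = function() intersect(pick(tt1, 0.5), pick(tt2, 0.5)),
    top_n   = function() intersect(top(tt1), top(tt2))
  )
  for (name in names(strategies)) {
    genes <- strategies[[name]]()
    if (length(genes) > 0) return(list(strategy = name, genes = genes))
  }
  stop("No shared DEGs under any rule; consider WGCNA co-expression module overlap.")
}

# Toy tables with made-up values (NOT real results)
tt1 <- data.frame(logFC = c(1.4, 0.7, -0.2), adj.P.Val = c(0.01, 0.02, 0.60),
                  row.names = c("GENE_A", "GENE_B", "GENE_C"))
tt2 <- data.frame(logFC = c(0.3, 0.9, 1.8), adj.P.Val = c(0.40, 0.03, 0.01),
                  row.names = c("GENE_A", "GENE_B", "GENE_C"))
res <- shared_with_fallback(tt1, tt2)
cat("Strategy used:", res$strategy, "| shared genes:", res$genes, "\n")
```

Returning the strategy name alongside the gene list makes it easy to report in the manuscript which threshold produced the shared signature. The sketch intersects gene IDs regardless of direction of change; requiring concordant logFC signs would be a stricter variant.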
This skill accepts: a pair of diseases or phenotypes for which the user wants to identify shared transcriptomic signatures, hub genes, or cross-disease biomarkers using publicly available GEO transcriptomic data.
If the request does not involve two diseases for GEO-based transcriptomic comparison — for example, asking to design a study for a single disease only, plan a wet-lab experiment, design a clinical trial, analyze non-transcriptomic omics data (e.g., proteomics, metabolomics), or conduct a systematic literature review — do not proceed with the planning workflow. Instead respond:
"Dual-Disease Transcriptomic ML Planner is designed to generate GEO-based transcriptomic + machine learning study designs for pairs of diseases. Your request appears to be outside this scope. Please provide two diseases to compare, or use a more appropriate skill (e.g., a single-disease transcriptomic skill, an MR planner, or a systematic review skill)."
| File | Content | Used In |
|---|---|---|
| references/tissue_and_tool_decisions.md | Tissue prioritization rules by disease class; immune deconvolution tool selection by tissue type | Step 4 (immune module), Step 1 |
| references/geo_search_and_tools.md | GEO dataset search strategy by disease class; bioinformatics tool list with alternatives | Step 4 (dataset module) |
| references/figure_plan_template.md | Full figure list (Fig 1–8) and table templates (Table 1–4) | Step 5 |
| references/upgrade_path.md | Publication upgrade impact vs complexity table | Step 9 |