Complete Two-Sample Mendelian Randomization (MR) research workflow. Use when the user needs to perform causal inference studies using genetic instruments, including: (1) Analyzing causal effects between exposure and outcome, (2) Generating MR visualizations (scatter plots, forest plots, funnel plots, leave-one-out plots), (3) Conducting sensitivity analyses (heterogeneity, pleiotropy, Steiger test), (4) Creating research reports and GitHub Pages for sharing results.
Complete workflow for conducting Two-Sample MR studies using the TwoSampleMR R package.
library(TwoSampleMR)
# 1. Get exposure data
exposure <- extract_instruments("ieu-a-299") # HDL-C
# 2. Get outcome data
outcome <- extract_outcome_data(snps = exposure$SNP, outcomes = "ieu-a-7") # CAD
# 3. Harmonise
dat <- harmonise_data(exposure, outcome)
# 4. Run MR
results <- mr(dat)
# 5. Generate plots
mr_scatter_plot(results, dat)
Create a Docker environment with the required packages:
FROM rocker/r-ver:4.3.0
# System dependencies
RUN apt-get update && apt-get install -y \
    libcurl4-openssl-dev libssl-dev libxml2-dev \
    libfontconfig1-dev libharfbuzz-dev libfribidi-dev \
    libfreetype6-dev libpng-dev libtiff5-dev libjpeg-dev \
    curl git && rm -rf /var/lib/apt/lists/*
# CRAN packages
RUN R -e "install.packages(c('ggplot2', 'dplyr', 'tidyr', 'knitr', 'rmarkdown', 'remotes', 'patchwork'), repos='https://cloud.r-project.org/')"
# GitHub packages (with proxy if needed)
ARG HTTP_PROXY
ARG HTTPS_PROXY
RUN R -e "remotes::install_github('mrcieu/ieugwasr', upgrade='never')"
RUN R -e "remotes::install_github('MRCIEU/TwoSampleMR', upgrade='never')"
WORKDIR /research
Build with proxy:
docker build --build-arg HTTP_PROXY=http://proxy:port \
    --build-arg HTTPS_PROXY=http://proxy:port \
    --network=host -t mr-research:latest .
# Set environment variable
export OPENGWAS_JWT="your_jwt_token_here"
# Or in R
Sys.setenv(OPENGWAS_JWT = "your_jwt_token_here")
Get a free token at: https://api.opengwas.io/
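Since a missing token only surfaces later as a 401 error, it can help to check for it before any API call. The following is a minimal sketch; `require_opengwas_token()` is a hypothetical helper written for this example and is not part of TwoSampleMR or ieugwasr.

```r
# Hypothetical helper (not a TwoSampleMR function): stop early with a clear
# message when the OpenGWAS token has not been set in the environment.
require_opengwas_token <- function(var = "OPENGWAS_JWT") {
  token <- Sys.getenv(var, unset = "")
  if (!nzchar(token)) {
    stop(sprintf("%s is not set; get a token at https://api.opengwas.io/", var),
         call. = FALSE)
  }
  invisible(TRUE)
}
```

Calling `require_opengwas_token()` at the top of an analysis script would then fail fast instead of partway through data extraction.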
library(TwoSampleMR)
# Exposure data (instrument selection)
exposure_dat <- extract_instruments(
  outcomes = "ieu-a-299", # GWAS ID for exposure
  p1 = 5e-8,              # Genome-wide significance
  clump = TRUE,           # LD clumping
  r2 = 0.001              # LD threshold
)
# Outcome data
outcome_dat <- extract_outcome_data(
  snps = exposure_dat$SNP,
  outcomes = "ieu-a-7"    # GWAS ID for outcome
)
dat <- harmonise_data(exposure_dat, outcome_dat)
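Harmonisation can drop SNPs (for example, ambiguous palindromic variants), so it is worth checking how many instruments remain. The sketch below assumes only that the harmonised data frame has the logical `mr_keep` column that `harmonise_data()` adds; `summarise_harmonised()` is a hypothetical helper, and the toy data frame is made up for illustration.

```r
# Hypothetical helper: count SNPs retained for MR after harmonisation.
summarise_harmonised <- function(dat) {
  stopifnot(is.data.frame(dat), "mr_keep" %in% names(dat))
  kept <- sum(dat$mr_keep, na.rm = TRUE)
  c(total = nrow(dat), kept = kept, dropped = nrow(dat) - kept)
}

# Toy stand-in for harmonise_data() output (values are illustrative only).
toy_dat <- data.frame(
  SNP = c("rs1", "rs2", "rs3"),
  mr_keep = c(TRUE, FALSE, TRUE)
)
harmonised_summary <- summarise_harmonised(toy_dat)
print(harmonised_summary)
```

In the real workflow, `summarise_harmonised(dat)` would be called on the object returned above.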
# Run multiple MR methods
mr_results <- mr(dat, method_list = c(
  "mr_ivw",              # Inverse variance weighted (primary)
  "mr_egger_regression", # MR-Egger (pleiotropy test)
  "mr_weighted_median",  # Weighted median
  "mr_weighted_mode"     # Weighted mode
))
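To make the primary method less of a black box, the sketch below computes an IVW point estimate by hand in base R and checks it against a weighted regression through the origin, which is the usual formulation of IVW. The column names mirror those in harmonised data, but the numbers are made up, and `ivw_by_hand()` and `to_odds_ratio()` are hypothetical helpers for illustration, not TwoSampleMR functions.

```r
# Toy SNP-level effects (illustrative values only).
toy <- data.frame(
  beta.exposure = c(0.10, 0.20, 0.15, 0.30),
  beta.outcome  = c(-0.02, -0.05, -0.03, -0.06),
  se.outcome    = c(0.010, 0.015, 0.012, 0.020)
)

# IVW point estimate: weighted ratio with weights 1 / se.outcome^2.
ivw_by_hand <- function(d) {
  w <- 1 / d$se.outcome^2
  sum(w * d$beta.exposure * d$beta.outcome) / sum(w * d$beta.exposure^2)
}
b_hand <- ivw_by_hand(toy)

# The same estimate as a weighted regression with no intercept.
fit <- lm(beta.outcome ~ 0 + beta.exposure, data = toy,
          weights = 1 / se.outcome^2)
b_lm <- unname(coef(fit))

# For a binary outcome, a log-odds estimate can be reported as an odds ratio
# with an approximate 95% confidence interval.
to_odds_ratio <- function(b, se) {
  c(or = exp(b), lci = exp(b - 1.96 * se), uci = exp(b + 1.96 * se))
}
```

In practice the `b` and `se` columns of `mr_results` would be passed to a conversion like `to_odds_ratio()`; TwoSampleMR also provides `generate_odds_ratios()` for the same purpose.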
# Heterogeneity test
heterogeneity <- mr_heterogeneity(dat)
# Pleiotropy test (MR-Egger intercept)
pleiotropy <- mr_pleiotropy_test(dat)
# Steiger directionality test
steiger <- directionality_test(dat)
# Leave-one-out analysis
loo <- mr_leaveoneout(dat)
# Single SNP analysis
single <- mr_singlesnp(dat)
library(ggplot2) # ggsave() comes from ggplot2, which TwoSampleMR does not attach
# Scatter plot
p1 <- mr_scatter_plot(mr_results, dat)
ggsave("scatter_plot.png", p1[[1]], width = 10, height = 8, dpi = 300)
# Forest plot
p2 <- mr_forest_plot(single)
ggsave("forest_plot.png", p2[[1]], width = 10, height = 12, dpi = 300)
# Funnel plot
p3 <- mr_funnel_plot(single)
ggsave("funnel_plot.png", p3[[1]], width = 10, height = 8, dpi = 300)
# Leave-one-out plot
p4 <- mr_leaveoneout_plot(loo)
ggsave("leave_one_out.png", p4[[1]], width = 10, height = 8, dpi = 300)
mr-research/
├── scripts/
│   ├── analysis.R          # Main MR analysis
│   └── sensitivity.R       # Sensitivity analyses
├── data/                   # GWAS data (cached)
├── results/
│   ├── mr_results.csv
│   ├── heterogeneity.csv
│   ├── pleiotropy.csv
│   └── sensitivity/
├── figures/
│   ├── scatter_plot.png
│   ├── forest_plot.png
│   ├── funnel_plot.png
│   └── leave_one_out.png
├── report/
│   └── report.html
├── docs/                   # GitHub Pages
│   └── index.html
├── Dockerfile
└── README.md
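The layout above can be created from R, and result tables written into `results/` under the file names shown. This is a sketch under assumptions: `init_project()` and `save_tables()` are hypothetical helpers, and the demo writes a made-up table into a temporary directory rather than a real project.

```r
# Hypothetical helper: create the directory skeleton shown above.
init_project <- function(root = "mr-research") {
  dirs <- file.path(root, c("scripts", "data", "results/sensitivity",
                            "figures", "report", "docs"))
  for (d in dirs) dir.create(d, recursive = TRUE, showWarnings = FALSE)
  invisible(dirs)
}

# Hypothetical helper: write each named data frame to <dir>/<name>.csv.
save_tables <- function(tables, dir) {
  paths <- file.path(dir, paste0(names(tables), ".csv"))
  for (i in seq_along(tables)) {
    write.csv(tables[[i]], paths[i], row.names = FALSE)
  }
  invisible(paths)
}

# Demo in a temporary directory with a placeholder table.
demo_root <- file.path(tempdir(), "mr-research-demo")
init_project(demo_root)
demo_paths <- save_tables(
  list(mr_results = data.frame(method = "IVW", b = -0.2, se = 0.05)),
  file.path(demo_root, "results")
)
```

In the real workflow the list would hold the analysis objects, for example `list(mr_results = mr_results, heterogeneity = heterogeneity, pleiotropy = pleiotropy)`.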
| ID | Trait | Sample Size |
|---|---|---|
| ieu-a-299 | HDL cholesterol | 99,900 |
| ieu-a-300 | LDL cholesterol | 95,454 |
| ieu-a-301 | Total cholesterol | 94,595 |
| ieu-a-302 | Triglycerides | 88,989 |
| ID | Trait | Sample Size |
|---|---|---|
| ieu-a-7 | Coronary heart disease | 184,305 |
| ebi-a-GCST006414 | Stroke | 446,696 |
| ID | Trait | Sample Size |
|---|---|---|
| ieu-b-40 | BMI | 681,275 |
| ebi-a-GCST90002232 | Type 2 diabetes | 898,130 |
| Problem | Solution |
|---|---|
| 401 Unauthorized | Set OPENGWAS_JWT environment variable |
| Cannot access GitHub | Use HTTP_PROXY build arg in Docker |
| TwoSampleMR install fails | Install ieugwasr first |
| Too few SNPs after clumping | Relax the r2 threshold (e.g., 0.01) or the p-value threshold (e.g., p1 = 5e-6) |
| Metric | Good Result | Concern |
|---|---|---|
| IVW P-value | P < 0.05 (evidence of a causal effect) | P ≥ 0.05 (no evidence of an effect, which is not proof of no effect) |
| MR-Egger intercept | P > 0.05 | P < 0.05 (pleiotropy) |
| Heterogeneity Q | P > 0.05 | P < 0.05 (use random-effects) |
| Steiger direction | TRUE | FALSE (possible reverse causation) |
| Leave-one-out | Stable | Single SNP drives result |
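The thresholds in this table can be applied programmatically as a first-pass screen. The sketch below is a hypothetical helper, `flag_concerns()`, that takes plain values; in a real run they would typically come from the `pval` column of `mr_pleiotropy_test()`, the `Q_pval` column of `mr_heterogeneity()`, and the `correct_causal_direction` column of `directionality_test()`. Leave-one-out stability is not covered because it needs visual or per-SNP inspection.

```r
# Hypothetical helper: TRUE marks a concern according to the table above.
flag_concerns <- function(ivw_p, egger_intercept_p, q_p, steiger_ok,
                          alpha = 0.05) {
  c(
    no_ivw_evidence = ivw_p >= alpha,             # no evidence of an effect
    pleiotropy      = egger_intercept_p < alpha,  # MR-Egger intercept differs from 0
    heterogeneity   = q_p < alpha,                # Cochran's Q indicates heterogeneity
    wrong_direction = !isTRUE(steiger_ok)         # Steiger test does not support direction
  )
}

# Illustrative values only.
flags <- flag_concerns(ivw_p = 0.001, egger_intercept_p = 0.40,
                       q_p = 0.01, steiger_ok = TRUE)
print(flags)
```

With these made-up inputs, only `heterogeneity` is flagged, which per the table would suggest reporting a random-effects IVW estimate.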
Use this prompt to quickly set up similar MR studies with AI:
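Please help me carry out a two-sample Mendelian randomization study analyzing the causal effect of [exposure] on [outcome].
### Data sources
- Exposure: [trait name], dataset ID: [ieu-a-xxx]
- Outcome: [disease name], dataset ID: [ieu-a-xxx]
### Analysis workflow
1. Retrieve GWAS summary statistics from OpenGWAS
2. Select instrumental variables (P < 5×10⁻⁸)
3. Harmonise the data
4. Run the MR analysis (IVW, MR-Egger, etc.)
5. Run sensitivity analyses
6. Generate figures and a report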
Full template: references/ai_prompt_template.md
| Document | Description |
|---|---|
| GWAS Datasets | Complete list of available GWAS datasets |
| AI Prompt Template | Template for AI-powered MR research |
| Research Process | Complete HDL-CVD study documentation |
| Analysis Script | R script template for MR analysis |
| Report Template | HTML report template |