Build and interpret polygenic risk scores (PRS) for complex diseases using GWAS summary statistics. Calculates genetic risk profiles, interprets PRS percentiles, and assesses disease predisposition across conditions including type 2 diabetes, coronary artery disease, and Alzheimer's disease. Use when asked to calculate polygenic risk scores, interpret genetic risk for complex diseases, build custom PRS from GWAS data, or answer questions like "What is my genetic predisposition to breast cancer?"
mims-harvard · 1,271 stars · Mar 29, 2026
Categories: Bioinformatics
Skill Content
Build and interpret polygenic risk scores for complex diseases using genome-wide association study (GWAS) data.
Reasoning Strategy
A polygenic risk score predicts genetic risk, not disease. A high PRS means elevated risk relative to the population. It does not mean the person will develop the condition, and a low PRS does not confer immunity. PRS performance varies dramatically across ancestries: a European-derived PRS applied to a West African population can lose 50–70% of its predictive power because the weights come from a GWAS of European-ancestry samples and reflect European allele frequencies and LD patterns. Effect sizes from discovery GWAS are subject to winner's curse (effects that pass the significance threshold in a single study tend to be overestimated); always prefer weights from large meta-analyses or validated PGS Catalog models. PRS should always be interpreted in the context of non-genetic risk factors: for most complex diseases, environmental factors contribute as much as or more than genetics.
LOOK UP DON'T GUESS: Do not assume effect sizes, allele frequencies, or which SNPs are genome-wide significant for a trait — always query GWAS Catalog (gwas_get_associations_for_trait) for actual data. Do not assume a validated PRS model exists for a trait; check PGS Catalog via PubMed search.
Overview
Related Skills
Use Cases:
"Calculate my genetic risk for type 2 diabetes"
"Build a polygenic risk score for coronary artery disease"
"What's my genetic predisposition to Alzheimer's disease?"
"Interpret my PRS percentile for breast cancer risk"
What This Skill Does:
Extracts genome-wide significant variants (p < 5e-8) from GWAS Catalog
Builds weighted PRS models using effect sizes (beta coefficients or log odds ratios)
Calculates individual risk scores from genotype data
Interprets PRS as population percentiles and risk categories
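The first two steps in the list above can be sketched roughly as follows. This is a minimal illustration, not the skill's actual implementation: the record keys (`rsid`, `effect_allele`, `p_value`, `beta`, `odds_ratio`) and the function name are hypothetical and are not the output schema of any GWAS Catalog tool, and the sketch does no LD clumping.

```python
import math

GENOME_WIDE_P = 5e-8  # significance threshold named in the list above


def build_weights(associations, p_threshold=GENOME_WIDE_P):
    """Keep genome-wide significant records and return per-SNP weights.

    `associations` is a list of dicts with hypothetical keys:
    'rsid', 'effect_allele', 'p_value', and 'beta' or 'odds_ratio'.
    """
    best = {}
    for assoc in associations:
        p = assoc["p_value"]
        if p >= p_threshold:
            continue  # not genome-wide significant
        if assoc.get("beta") is not None:
            weight = assoc["beta"]
        elif assoc.get("odds_ratio") is not None:
            weight = math.log(assoc["odds_ratio"])  # odds ratios enter on the log scale
        else:
            continue  # no usable effect size reported
        rsid = assoc["rsid"]
        # If an rsID appears more than once, keep the most significant record.
        if rsid not in best or p < best[rsid]["p_value"]:
            best[rsid] = {
                "effect_allele": assoc["effect_allele"],
                "weight": weight,
                "p_value": p,
            }
    return best


# Made-up records for illustration only (not real GWAS results).
example = [
    {"rsid": "rsA", "effect_allele": "T", "p_value": 1e-12, "odds_ratio": 1.2},
    {"rsid": "rsB", "effect_allele": "G", "p_value": 1e-6, "beta": 0.3},
    {"rsid": "rsC", "effect_allele": "A", "p_value": 3e-9, "beta": -0.05},
]
weights = build_weights(example)
print(sorted(weights))  # rsB is dropped because its p-value is above the threshold
```

Keeping the effect allele next to each weight matters in practice, because a dosage counted against the wrong allele flips the sign of that SNP's contribution.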
What This Skill Does NOT Do:
Diagnose disease (PRS is probabilistic, not deterministic)
Replace clinical assessment or genetic counseling
Account for non-genetic factors (lifestyle, environment)
Provide treatment recommendations
Methodology
PRS Calculation Formula
A polygenic risk score is calculated as a weighted sum across genetic variants:
PRS = Σ (dosage_i × effect_size_i)
Where:
dosage_i: Number of effect alleles at SNP i (0, 1, or 2)
effect_size_i: Beta coefficient or log(odds ratio) from GWAS
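As a minimal sketch of the weighted sum, assuming dosages and weights are both keyed by SNP identifier and that dosages already count the same effect allele the weights refer to (the function name and the skip-if-missing policy are illustrative choices, not part of the skill):

```python
def raw_prs(dosages, weights):
    """Return (score, n_used): the sum of dosage_i * effect_size_i.

    dosages: {snp_id: effect-allele count between 0 and 2}
    weights: {snp_id: beta or log(odds ratio)}
    SNPs missing from `dosages` are skipped, not imputed.
    """
    score = 0.0
    n_used = 0
    for snp_id, weight in weights.items():
        dosage = dosages.get(snp_id)
        if dosage is None:
            continue  # SNP not genotyped for this individual
        if not 0 <= dosage <= 2:
            raise ValueError(f"dosage for {snp_id} must be between 0 and 2")
        score += dosage * weight
        n_used += 1
    return score, n_used


# Made-up weights and genotypes for illustration.
weights = {"snp1": 0.10, "snp2": -0.05, "snp3": 0.20}
dosages = {"snp1": 2, "snp2": 1, "snp3": 0}
score, n_used = raw_prs(dosages, weights)
print(score, n_used)  # roughly 0.15, computed from 3 SNPs
```

Returning the number of SNPs actually used makes it easy to notice when many model SNPs are missing from the genotype data, which would make the raw score hard to compare with a reference distribution.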
Standardization
Raw PRS is standardized to z-scores for interpretation:
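The usual z-score definition is z = (PRS - mean) / SD, where the mean and standard deviation are taken over raw scores in a reference population. A minimal sketch, assuming the reference scores were computed with the same SNPs and weights in an ancestry-matched sample (the function name and example values are illustrative):

```python
import statistics


def standardize(raw_score, reference_scores):
    """Convert a raw PRS to a z-score against a reference distribution."""
    mean = statistics.fmean(reference_scores)
    sd = statistics.stdev(reference_scores)  # sample standard deviation
    if sd == 0:
        raise ValueError("reference scores have zero variance")
    return (raw_score - mean) / sd


# Made-up reference scores for illustration.
reference = [0.1, 0.2, 0.3, 0.4, 0.5]
print(standardize(0.45, reference))  # positive: above the reference mean
```

A real reference panel would contain thousands of individuals; five values are used here only to keep the example readable.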
Note: a disease_trait search returns associations in which the trait is one of potentially several linked EFO traits. For precise filtering, pass EFO IDs via the efo_trait parameter.
Open Targets Genetics
Integrated genetics platform with fine-mapped credible sets
gnomad_search_variants + gnomad_get_variant — population allele frequencies (ancestry-specific via VEP colocated_variants)
MyVariant_query_variants — CADD, SIFT, PolyPhen, ClinVar, gnomAD in one call
gnomad_get_gene_constraints — gene constraint metrics (pLI, oe_lof) for target prioritization
Key Concepts
Polygenic Risk Scores (PRS)
Polygenic risk scores aggregate the effects of many genetic variants to estimate an individual's genetic predisposition to a trait or disease. Unlike Mendelian diseases caused by single mutations, complex diseases involve hundreds to thousands of variants, each with small effects.
Key Properties:
Continuous distribution: PRS forms a bell curve in populations
Relative risk: Compares individual to population average
Ancestry-specific: PRS accuracy depends on matching GWAS and target ancestry
GWAS (Genome-Wide Association Studies)
GWAS compare allele frequencies between cases and controls (or correlate with trait values) across millions of SNPs to identify disease-associated variants.
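For a single SNP, the case-control comparison reduces to a 2×2 table of allele counts. The sketch below computes the unadjusted allelic odds ratio and its log, which is the scale PRS weights are expressed on. It is an illustration only: real GWAS typically fit regression models with covariates such as ancestry principal components, and the counts here are made up.

```python
import math


def allelic_odds_ratio(case_effect, case_other, control_effect, control_other):
    """Unadjusted odds ratio and log(OR) for the effect allele.

    Arguments are allele counts (two alleles per person), not person counts.
    """
    counts = (case_effect, case_other, control_effect, control_other)
    if min(counts) <= 0:
        raise ValueError("all allele counts must be positive")
    odds_ratio = (case_effect * control_other) / (case_other * control_effect)
    return odds_ratio, math.log(odds_ratio)


# Made-up counts: effect-allele frequency 0.30 in cases, 0.25 in controls.
odds_ratio, log_or = allelic_odds_ratio(600, 1400, 500, 1500)
print(round(odds_ratio, 3), round(log_or, 3))
```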
Study Design:
Discovery cohort: Initial identification of associations
Replication cohort: Validation in independent samples
Understand genetic contribution vs. lifestyle factors
Caution: Personal PRS should not replace medical advice. Results may cause anxiety if not properly contextualized.
Limitations and Considerations
Heritability gap: PRS explains only a fraction of genetic heritability (T2D: ~50% heritable, PRS explains ~10–20%). Rare variants, epistasis, and gene-environment interactions are not captured.
Ancestry bias: European-derived PRS performance drops substantially in non-European populations. Use multi-ancestry GWAS weights when available.
Winner's curse: Discovery effect sizes are overestimated; use meta-analysis weights or PGS Catalog validated models.
Not diagnostic: High PRS does not guarantee disease; low PRS does not guarantee protection. For most complex diseases, environmental factors contribute as much as or more than genetics.
Actionability varies: Alzheimer's PRS has limited actionable interventions; cardiovascular PRS can guide statin or lifestyle decisions. Always consider what the person can do with the information.
Ethical: Genetic data is permanent and familial. In the US, GINA prohibits genetic discrimination in employment and health insurance, but it does not cover life, disability, or long-term care insurance. Provide genetic counseling context.
Workflow
1. Trait Selection
Identify the disease or trait of interest:
Use standard terminology (e.g., "type 2 diabetes" not "T2D")
High risk: > 95th percentile (substantially increased risk)
Clinical Interpretation:
Percentiles assume the PRS is approximately normally distributed in the reference population
Percentiles express risk relative to the population average, not absolute risk
Combine with family history, clinical risk factors
PRS is NOT diagnostic - many high-risk individuals never develop disease
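A minimal sketch of the percentile step, assuming the input is a PRS z-score already standardized against a suitable reference population and that the normality assumption noted above holds. Only the 95th-percentile cutoff comes from the text; the function names are illustrative.

```python
import math

HIGH_RISK_PERCENTILE = 95.0  # cutoff for the high-risk category given above


def percentile_from_z(z):
    """Percentile of a z-score under a standard normal distribution."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


def is_high_risk(z, cutoff=HIGH_RISK_PERCENTILE):
    """True when the z-score falls above the high-risk percentile cutoff."""
    return percentile_from_z(z) > cutoff


print(percentile_from_z(0.0))  # 50.0: an average score sits at the median
print(is_high_risk(2.0))       # True: z = 2.0 is above the 95th percentile
print(is_high_risk(1.0))       # False: z = 1.0 is near the 84th percentile
```

The flag describes position in the population distribution only. Turning it into an absolute risk estimate would need disease prevalence and the other risk factors listed above.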
Best Practices
Use validated PRS from PGS Catalog when available (many have been externally evaluated, and each entry documents variant selection, LD handling, and the ancestry of the source data)
Match ancestries between GWAS and target population; use multi-ancestry GWAS when available
For highly polygenic traits (height, educational attainment), relaxed p-value thresholds capture more signal; for more oligogenic traits (IBD, T1D), strict thresholds tend to perform better
Combine PRS with clinical risk scores (Framingham, QRISK) for integrated prediction
In research: document SNP selection criteria, LD clumping parameters, and ancestry of GWAS; validate in held-out cohorts; report R² or AUC stratified by ancestry
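Reporting AUC stratified by ancestry, as the last item suggests, could look roughly like this. The sketch uses the rank-based definition of AUC (the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as half). The record layout, group labels, and function names are assumptions for illustration, and the pairwise loop is only suitable for small cohorts.

```python
from collections import defaultdict


def auc(scores, labels):
    """Rank-based AUC; labels are 1 for cases and 0 for controls."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties count as half a win
    return wins / (len(cases) * len(controls))


def auc_by_ancestry(records):
    """records: iterable of (ancestry_label, prs, case_status) tuples."""
    groups = defaultdict(lambda: ([], []))
    for ancestry, score, status in records:
        groups[ancestry][0].append(score)
        groups[ancestry][1].append(status)
    return {ancestry: auc(s, y) for ancestry, (s, y) in groups.items()}


# Made-up held-out records for illustration.
records = [
    ("group_a", 0.10, 0), ("group_a", 0.40, 0),
    ("group_a", 0.35, 1), ("group_a", 0.80, 1),
    ("group_b", 0.20, 0), ("group_b", 0.90, 1),
]
print(auc_by_ancestry(records))  # {'group_a': 0.75, 'group_b': 1.0}
```

Computing the metric per group rather than on the pooled cohort keeps a strong result in one ancestry from masking weak performance in another.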
Disclaimer
This skill is for educational and research purposes only.
Not for clinical diagnosis or treatment decisions
Not validated for clinical use - use PGS Catalog models for clinical-grade PRS