Map GWAS loci to ranked candidate genes using a deterministic multi-skill chain (EFO -> GWAS -> coordinates -> Open Targets L2G/coloc -> eQTL -> burden/coding context), with reproducible tables and optional figures. Use when a user provides a trait/EFO term and/or lead variants and needs locus-to-gene prioritization for downstream biology decisions.
Generate a reproducible locus-to-gene mapping for one trait (or a seed set of lead variants), with explicit evidence attribution and conservative confidence labels.
This skill is optimized for bioinformaticians who need executable, traceable mapping from variant signals to plausible causal genes.
Provide at least one anchor source:
trait_query (string), for example chronic obstructive pulmonary disease
efo_id (string), for example EFO_0000341
seed_rsids (list[string]), for example ["rs1873625", "rs7903146"]

target_gene (string), optional gene of interest for highlighting in output
show_child_traits (bool), default true
phenotype_terms (list[string]), optional additional terms to include when finding anchors
max_anchor_associations (int), default 1200
max_loci (int), default 25
max_genes_per_locus (int), default 10
max_coloc_rows_per_locus (int), default 100
max_eqtl_rows_per_variant (int), default 200
genebass_burden_sets (list[string]), default ["pLoF", "missense|LC"]
include_clinvar (bool), default true
include_gnomad_context (bool), default true
include_hpa_tissue_context (bool), default true
include_figures (bool), default false
disable_default_seeds (bool), default false; if false, common traits automatically get built-in seed rsIDs
figure_output_dir (string), default ./output/figures
mapping_output_path (string), default ./output/locus_to_gene_mapping.json
summary_output_path (string), default ./output/locus_to_gene_summary.md

Python 3.11+
requests
matplotlib, seaborn, pandas (plotting dependencies)
scripts/map_locus_to_gene.py

Run:
python locus-to-gene-mapper-skill/scripts/map_locus_to_gene.py \
  --input-json /path/to/input.json \
  --print-result
Quick start (no input JSON file):
python locus-to-gene-mapper-skill/scripts/map_locus_to_gene.py \
  --trait-query "type 2 diabetes" \
  --print-result
Although include_figures defaults to false in the input list, trait-only runs default to include_figures=true unless explicitly disabled with --no-include-figures.
Minimal input JSON:
{
  "trait_query": "type 2 diabetes"
}
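As a rough illustration of how the documented defaults and the "at least one anchor source" rule fit together, the sketch below merges a user payload over the defaults listed above. The names `DEFAULTS`, `ANCHOR_KEYS`, and `build_input` are hypothetical and are not taken from `map_locus_to_gene.py`; the trait-only `include_figures` override is deliberately not modeled.

```python
import json

# Defaults copied from the input list in this document.
# DEFAULTS, ANCHOR_KEYS and build_input are hypothetical names for illustration.
DEFAULTS = {
    "show_child_traits": True,
    "max_anchor_associations": 1200,
    "max_loci": 25,
    "max_genes_per_locus": 10,
    "max_coloc_rows_per_locus": 100,
    "max_eqtl_rows_per_variant": 200,
    "genebass_burden_sets": ["pLoF", "missense|LC"],
    "include_clinvar": True,
    "include_gnomad_context": True,
    "include_hpa_tissue_context": True,
    "include_figures": False,
    "disable_default_seeds": False,
    "figure_output_dir": "./output/figures",
    "mapping_output_path": "./output/locus_to_gene_mapping.json",
    "summary_output_path": "./output/locus_to_gene_summary.md",
}
ANCHOR_KEYS = ("trait_query", "efo_id", "seed_rsids")


def build_input(user_input: dict) -> dict:
    """Merge user values over the documented defaults and require an anchor source."""
    if not any(user_input.get(key) for key in ANCHOR_KEYS):
        raise ValueError("Provide at least one of: " + ", ".join(ANCHOR_KEYS))
    return {**DEFAULTS, **user_input}


payload = build_input({"trait_query": "type 2 diabetes"})
print(json.dumps(payload, indent=2))
```

Writing the printed payload to a file would give something suitable for `--input-json`, assuming the script accepts every documented key.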
Built-in default seeds (when disable_default_seeds=false):
type 2 diabetes / t2d -> rs7903146, rs13266634, rs7756992, rs5219, rs1801282, rs4402960
coronary artery disease / cad -> rs1333049, rs4977574, rs9349379, rs6725887, rs1746048, rs3184504
body mass index / bmi -> rs9939609, rs17782313, rs6548238, rs10938397, rs7498665, rs7138803
asthma -> rs7216389, rs2305480, rs9273349
rheumatoid arthritis -> rs2476601, rs3761847, rs660895
alzheimer disease -> rs429358, rs7412, rs6733839, rs11136000, rs3851179
ldl cholesterol / total cholesterol -> rs7412, rs429358, rs6511720, rs629301, rs12740374, rs11591147

When a user asks for locus-to-gene mapping and gives only a trait (for example, type 2 diabetes), do the following automatically:
Run the script with --trait-query "<user_trait>" --print-result (no manual JSON required).
If the run reports No anchors remained, rerun once with a built-in default seed rsID for that trait (unless disable_default_seeds=true).
mapping_output_path and summary_output_path.
Top 5 cross-locus prioritized genes
Per-locus top gene (score, confidence)
Visualization artifact (figure path(s) or Mermaid fallback block)
Warnings and limitations
inline_image_markdown from script result

Do not ask the user to run python manually unless execution is actually blocked.
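The built-in seed table above can be pictured as a simple alias lookup. This is a minimal sketch: the rsIDs are copied from this document, but `default_seeds_for` is a hypothetical helper, and exact case-insensitive matching on the trait string is an assumption about how the script resolves aliases.

```python
# Seed rsIDs copied from the built-in default seed list in this document.
# The lookup helper and its matching rule are assumptions for illustration.
_SEED_TABLE = [
    (("type 2 diabetes", "t2d"),
     ["rs7903146", "rs13266634", "rs7756992", "rs5219", "rs1801282", "rs4402960"]),
    (("coronary artery disease", "cad"),
     ["rs1333049", "rs4977574", "rs9349379", "rs6725887", "rs1746048", "rs3184504"]),
    (("body mass index", "bmi"),
     ["rs9939609", "rs17782313", "rs6548238", "rs10938397", "rs7498665", "rs7138803"]),
    (("asthma",), ["rs7216389", "rs2305480", "rs9273349"]),
    (("rheumatoid arthritis",), ["rs2476601", "rs3761847", "rs660895"]),
    (("alzheimer disease",),
     ["rs429358", "rs7412", "rs6733839", "rs11136000", "rs3851179"]),
    (("ldl cholesterol", "total cholesterol"),
     ["rs7412", "rs429358", "rs6511720", "rs629301", "rs12740374", "rs11591147"]),
]

# Flatten to one dictionary entry per alias.
DEFAULT_SEEDS = {alias: seeds for aliases, seeds in _SEED_TABLE for alias in aliases}


def default_seeds_for(trait_query: str, disable_default_seeds: bool = False) -> list:
    """Return built-in seed rsIDs for a trait, or an empty list if none apply."""
    if disable_default_seeds:
        return []
    return list(DEFAULT_SEEDS.get(trait_query.strip().lower(), []))


print(default_seeds_for("T2D"))
```

Traits outside the table return an empty list, so the single retry only makes sense when the lookup finds something.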
Use these skills in order. Skip only when an earlier step is not needed by provided inputs.
efo-ontology-skill
  trait_query to canonical EFO term and synonyms.
  show_child_traits=true.
gwas-catalog-skill
variant-coordinate-finder-skill
opentargets-skill
gtex-eqtl-skill
genebass-gene-burden-skill
clinvar-variation-skill (when include_clinvar=true)
gnomad-graphql-skill (when include_gnomad_context=true)
human-protein-atlas-skill (when include_hpa_tissue_context=true)
Never perform additional retrieval after final candidate-gene scoring starts.
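One way to make the ordering above explicit is to derive the chain from the input flags before any retrieval starts. The sketch below covers only the three conditions that this document states (`include_clinvar`, `include_gnomad_context`, `include_hpa_tissue_context`); `plan_skill_chain` is a hypothetical helper, and skipping earlier steps based on provided inputs is not modeled.

```python
# Skill names and order are taken from this document; the planner itself is a sketch.
CORE_SKILLS = [
    "efo-ontology-skill",
    "gwas-catalog-skill",
    "variant-coordinate-finder-skill",
    "opentargets-skill",
    "gtex-eqtl-skill",
    "genebass-gene-burden-skill",
]
OPTIONAL_SKILLS = [
    ("clinvar-variation-skill", "include_clinvar"),
    ("gnomad-graphql-skill", "include_gnomad_context"),
    ("human-protein-atlas-skill", "include_hpa_tissue_context"),
]


def plan_skill_chain(inputs: dict) -> list:
    """Return the ordered skills to call; optional skills follow their flags (default true)."""
    chain = list(CORE_SKILLS)
    for skill, flag in OPTIONAL_SKILLS:
        if inputs.get(flag, True):
            chain.append(skill)
    return chain


for step, skill in enumerate(plan_skill_chain({"include_gnomad_context": False}), start=1):
    print(step, skill)
```

Fixing the plan up front fits the rule that no retrieval happens once candidate-gene scoring has started.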
Always return:
locus_to_gene_mapping.json
locus_to_gene_summary.md

{
  "meta": {
    "trait_query": "...",
    "efo_id": "EFO_...",
    "generated_at": "ISO-8601",
    "sources_queried": []
  },
  "anchors": [
    {
      "rsid": "rs...",
      "grch38": {"chr": "3", "pos": 49629531, "ref": "A", "alt": "C"},
      "lead_trait": "...",
      "p_value": 2e-11,
      "cohort": "..."
    }
  ],
  "loci": [
    {
      "locus_id": "chr3:49000000-50200000",
      "lead_rsid": "rs...",
      "candidate_genes": [
        {
          "symbol": "MST1",
          "ensembl_id": "ENSG...",
          "overall_score": 0.71,
          "confidence": "High|Medium|Low|VeryLow",
          "evidence": {
            "l2g_max": 0.83,
            "coloc_max_h4": 0.84,
            "eqtl_tissues": ["Lung"],
            "rare_variant_support": "none|nominal|strong",
            "coding_support": "none|noncoding|coding",
            "clinvar_support": "none|present",
            "gnomad_context": "...",
            "hpa_tissue_support": ["lung"]
          },
          "rationale": [
            "..."
          ],
          "limitations": [
            "..."
          ]
        }
      ]
    }
  ],
  "cross_locus_ranked_genes": [
    {
      "symbol": "...",
      "supporting_loci": 3,
      "mean_score": 0.62,
      "max_score": 0.81
    }
  ],
  "warnings": [],
  "limitations": []
}
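The `cross_locus_ranked_genes` rows in the schema above can be derived from the per-locus candidates. The following is a minimal sketch under stated assumptions: each gene appears at most once per locus, and rows are sorted by `max_score`, then `supporting_loci`, then symbol. The document does not specify the sort order, and `rank_genes_across_loci` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def rank_genes_across_loci(loci: list) -> list:
    """Aggregate per-locus overall_score values into one row per gene symbol."""
    scores_by_gene = defaultdict(list)
    for locus in loci:
        for gene in locus.get("candidate_genes", []):
            scores_by_gene[gene["symbol"]].append(gene["overall_score"])

    ranked = [
        {
            "symbol": symbol,
            "supporting_loci": len(scores),
            "mean_score": round(mean(scores), 4),
            "max_score": max(scores),
        }
        for symbol, scores in scores_by_gene.items()
    ]
    # Assumed ordering: strongest single-locus score first, then breadth of support.
    ranked.sort(key=lambda row: (-row["max_score"], -row["supporting_loci"], row["symbol"]))
    return ranked


example_loci = [
    {"locus_id": "locus_a", "candidate_genes": [
        {"symbol": "GENE1", "overall_score": 0.71},
        {"symbol": "GENE2", "overall_score": 0.40},
    ]},
    {"locus_id": "locus_b", "candidate_genes": [
        {"symbol": "GENE1", "overall_score": 0.50},
    ]},
]
for row in rank_genes_across_loci(example_loci):
    print(row)
```

The gene symbols and locus identifiers in the example are placeholders, not results from any real run.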
The summary must include sections in this exact order:
Objective
Inputs and scope
Anchor variant summary
Per-locus top genes
Cross-locus prioritized genes
Key caveats
Recommended next analyses

Only produce figures when include_figures=true.
If figures are generated, append this block to JSON:
{
  "figures": [
    {
      "id": "locus_gene_heatmap",
      "path": "./output/figures/locus_gene_heatmap.png",
      "caption": "Top candidate genes by evidence component across loci"
    }
  ]
}
Recommended figure set:
locus_gene_heatmap.png
  L2G, coloc, eQTL, burden, coding).
locus_score_decomposition.png
tissue_support_dotplot.png
If plotting dependencies are unavailable, skip PNG generation and output Mermaid diagrams in markdown as fallback.
The script also returns inline_image_markdown and render_instructions fields to support inline chat rendering.
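The PNG-or-Mermaid decision described above could be made by probing for the plotting dependencies before any figure code runs. This is a sketch only: the shape of the Mermaid diagram (one edge from each locus to its top gene) is an assumption, and `plotting_available` and `mermaid_fallback` are hypothetical names, not functions from the script.

```python
import importlib.util

PLOTTING_MODULES = ("matplotlib", "seaborn", "pandas")
FENCE = "`" * 3  # built at runtime so this example does not contain a literal fence


def plotting_available() -> bool:
    """True only if every plotting dependency can be located for import."""
    return all(importlib.util.find_spec(name) is not None for name in PLOTTING_MODULES)


def mermaid_fallback(top_gene_by_locus: dict) -> str:
    """Render a fenced Mermaid graph linking each locus to its top-ranked gene."""
    lines = [FENCE + "mermaid", "graph LR"]
    for index, (locus_id, symbol) in enumerate(top_gene_by_locus.items()):
        lines.append(f'  L{index}["{locus_id}"] --> G{index}["{symbol}"]')
    lines.append(FENCE)
    return "\n".join(lines)


if plotting_available():
    print("Plotting dependencies found; PNG figures can be generated.")
else:
    print(mermaid_fallback({"chr3:49000000-50200000": "MST1"}))
```

Using `importlib.util.find_spec` avoids importing the heavy libraries just to learn whether they are installed.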
For each candidate gene per locus, compute:
l2g_component: max L2G score for the gene in locus (0..1)
coloc_component: max h4 (or clpp when only CLPP is available), clipped to 0..1
eqtl_component: min(1, relevant_tissue_hits / 3)
burden_component:
  1.0 if burden p < 2.5e-6
  0.6 if 2.5e-6 <= p < 0.05
  0.0 otherwise
coding_component:
  1.0 for coding consequence in target gene with supportive ClinVar annotation
  0.6 for coding consequence in target gene without supportive ClinVar annotation
  0.3 for noncoding-in-gene support only
  0.0 otherwise

Overall score:
overall_score = 0.40*l2g + 0.25*coloc + 0.15*eqtl + 0.10*burden + 0.10*coding
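To make the arithmetic concrete, here is a minimal sketch of the component rules, the weighted sum, and the confidence thresholds stated in this section. The function names and the string encoding of the coding consequence ("coding", "noncoding", "none") are assumptions for illustration; treating a missing burden p-value as no support is also an assumption.

```python
from typing import Optional


def clip01(value: float) -> float:
    """Clip a value to the 0..1 range (used for the coloc component)."""
    return max(0.0, min(1.0, value))


def eqtl_component(relevant_tissue_hits: int) -> float:
    return min(1.0, relevant_tissue_hits / 3)


def burden_component(burden_p: Optional[float]) -> float:
    if burden_p is None:  # assumption: no burden result means no support
        return 0.0
    if burden_p < 2.5e-6:
        return 1.0
    if burden_p < 0.05:
        return 0.6
    return 0.0


def coding_component(consequence: str, clinvar_supportive: bool) -> float:
    if consequence == "coding":
        return 1.0 if clinvar_supportive else 0.6
    if consequence == "noncoding":
        return 0.3
    return 0.0


def overall_score(l2g: float, coloc: float, eqtl: float, burden: float, coding: float) -> float:
    return 0.40 * l2g + 0.25 * coloc + 0.15 * eqtl + 0.10 * burden + 0.10 * coding


def confidence_label(score: float) -> str:
    if score >= 0.75:
        return "High"
    if score >= 0.55:
        return "Medium"
    if score >= 0.35:
        return "Low"
    return "VeryLow"


score = overall_score(
    l2g=0.83,
    coloc=clip01(0.84),
    eqtl=eqtl_component(1),
    burden=burden_component(0.01),
    coding=coding_component("coding", clinvar_supportive=False),
)
print(round(score, 3), confidence_label(score))
```

With these illustrative inputs the sum is 0.332 + 0.21 + 0.05 + 0.06 + 0.06 = 0.712, which falls in the Medium band. Because the weights sum to 1.0 and every component lies in 0..1, the score cannot leave 0..1 unless a component is computed incorrectly.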
Confidence label:
High if score >= 0.75
Medium if 0.55 <= score < 0.75
Low if 0.35 <= score < 0.55
VeryLow if score < 0.35

At least one of trait_query, efo_id, seed_rsids is present.
seed_rsids.
max_loci and log dropped anchors in warnings.
figures metadata.

Fail the run when any of the following occurs:
overall_score outside 0..1.

def map_locus_to_gene(input_json: dict) -> dict:
    ...
Return:
{
  "status": "ok",
  "mapping_output_path": "./output/locus_to_gene_mapping.json",
  "summary_output_path": "./output/locus_to_gene_summary.md",
  "figure_paths": [],
  "warnings": []
}
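Before a payload like the one above is returned, the checks mentioned in this document could be applied to the assembled loci. The sketch below covers two of them: the `overall_score` range failure, and a cap at `max_loci` with dropped loci recorded in `warnings`. The cap is only one reading of a partially preserved line, it assumes the loci are already ordered by priority, and both helper names are hypothetical.

```python
def cap_loci(loci: list, max_loci: int, warnings: list) -> list:
    """Keep the first max_loci loci and record each dropped locus in warnings."""
    kept, dropped = loci[:max_loci], loci[max_loci:]
    for locus in dropped:
        warnings.append(
            f"Dropped locus {locus['locus_id']} (lead {locus['lead_rsid']}): "
            f"exceeds max_loci={max_loci}"
        )
    return kept


def check_scores(loci: list) -> None:
    """Raise ValueError if any candidate gene has overall_score outside 0..1."""
    for locus in loci:
        for gene in locus.get("candidate_genes", []):
            score = gene["overall_score"]
            if not 0.0 <= score <= 1.0:
                raise ValueError(
                    f"overall_score {score} for {gene['symbol']} in "
                    f"{locus['locus_id']} is outside 0..1"
                )


run_warnings = []
example = [
    {"locus_id": "locus_a", "lead_rsid": "rs_a",
     "candidate_genes": [{"symbol": "GENE1", "overall_score": 0.71}]},
    {"locus_id": "locus_b", "lead_rsid": "rs_b",
     "candidate_genes": [{"symbol": "GENE2", "overall_score": 0.42}]},
]
kept_loci = cap_loci(example, max_loci=1, warnings=run_warnings)
check_scores(kept_loci)
print(len(kept_loci), run_warnings)
```

The locus identifiers and rsIDs in the example are placeholders chosen for readability.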
source skill + endpoint family) in rationale lines.