Comprehensive structural variant (SV) analysis skill for clinical genomics. Classifies SVs (deletions, duplications, inversions, translocations), assesses pathogenicity using ACMG-adapted criteria, evaluates gene disruption and dosage sensitivity, and provides clinical interpretation with evidence grading. Use when analyzing CNVs, large deletions/duplications, chromosomal rearrangements, or any structural variants requiring clinical interpretation.
Systematic analysis of structural variants (deletions, duplications, inversions, translocations, complex rearrangements) for clinical genomics interpretation using ACMG-adapted criteria.
KEY PRINCIPLES:
Structural variants (SVs) present unique interpretation challenges:
This skill provides: A systematic workflow integrating SV classification, gene content analysis, dosage sensitivity assessment, population frequencies, and ACMG-adapted criteria into clinically actionable interpretations.
Use this skill when users:
┌─────────────────────────────────────────────────────────────────┐
│ STRUCTURAL VARIANT INTERPRETATION │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Phase 1: SV IDENTITY & CLASSIFICATION │
│ ├── Normalize SV coordinates (hg19/hg38) │
│ ├── Determine SV type (DEL/DUP/INV/TRA/CPX) │
│ ├── Calculate SV size │
│ └── Assess breakpoint precision │
│ │
│ Phase 2: GENE CONTENT ANALYSIS │
│ ├── Identify genes fully contained in SV │
│ ├── Identify genes with breakpoints (disrupted) │
│ ├── Annotate gene function and disease associations │
│ ├── Identify regulatory elements affected │
│ └── Assess gene orientation (for inversions/translocations) │
│ │
│ Phase 3: DOSAGE SENSITIVITY ASSESSMENT │
│ ├── ClinGen dosage sensitivity scores │
│ │ └─ Haploinsufficiency / Triplosensitivity ratings │
│ ├── DECIPHER haploinsufficiency predictions │
│ ├── pLI scores (gnomAD) for loss-of-function intolerance │
│ ├── OMIM gene-disease associations (dominant/recessive) │
│ └── Known dosage-sensitive genes from literature │
│ │
│ Phase 4: POPULATION FREQUENCY CONTEXT │
│ ├── gnomAD SV database (overlapping SVs) │
│ ├── DGV (Database of Genomic Variants) │
│ ├── ClinVar (known pathogenic/benign SVs) │
│ └── Calculate reciprocal overlap with population SVs │
│ │
│ Phase 5: PATHOGENICITY SCORING │
│ ├── Pathogenicity score (0-10 scale) │
│ │ ├─ Gene content weight (40%) │
│ │ ├─ Dosage sensitivity weight (30%) │
│ │ ├─ Population frequency weight (20%) │
│ │ └─ Inheritance/phenotype match weight (10%) │
│ ├── Apply ACMG SV criteria │
│ └── Generate classification recommendation │
│ │
│ Phase 6: LITERATURE & CLINICAL EVIDENCE │
│ ├── PubMed: Similar SVs, gene disruption studies │
│ ├── DECIPHER: Developmental disorder cases │
│ ├── Clinical case reports │
│ └── Functional evidence for gene dosage effects │
│ │
│ Phase 7: ACMG-ADAPTED CLASSIFICATION │
│ ├── Apply SV-specific evidence codes │
│ ├── Calculate final classification │
│ ├── Identify limiting factors │
│ └── Generate clinical recommendations │
│ │
└─────────────────────────────────────────────────────────────────┘
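The Phase 5 weighting in the workflow above can be sketched as a simple weighted sum. This is a minimal illustration, not a validated scoring model: the weights come from the outline, while the function name `pathogenicity_score`, the dictionary keys, and the assumption that each component is itself scored on a 0-10 scale are hypothetical.

```python
# Weights taken from the Phase 5 outline (40/30/20/10).
# Assumption: each component sub-score is already expressed on a 0-10 scale.
WEIGHTS = {
    "gene_content": 0.40,
    "dosage_sensitivity": 0.30,
    "population_frequency": 0.20,
    "inheritance_phenotype": 0.10,
}


def pathogenicity_score(sub_scores):
    """Combine four 0-10 component sub-scores into one 0-10 weighted score."""
    missing = set(WEIGHTS) - set(sub_scores)
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for name in WEIGHTS:
        if not 0 <= sub_scores[name] <= 10:
            raise ValueError(f"{name} must be between 0 and 10")
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)


# Hypothetical sub-scores for illustration only.
example = {
    "gene_content": 9,
    "dosage_sensitivity": 10,
    "population_frequency": 8,
    "inheritance_phenotype": 5,
}
score = pathogenicity_score(example)
print(round(score, 2))
```

A score like this is only a triage aid; the ACMG-adapted classification in Phase 7 remains the basis for the final call.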
Goal: Standardize SV notation and classify type
SV Types:
| Type | Abbreviation | Description | Molecular Effect |
|---|---|---|---|
| Deletion | DEL | Loss of genomic segment | Haploinsufficiency, gene disruption |
| Duplication | DUP | Gain of genomic segment | Triplosensitivity, gene dosage imbalance |
| Inversion | INV | Segment flipped in orientation | Gene disruption at breakpoints, position effects |
| Translocation | TRA | Segment moved to different chromosome | Gene fusions, disruption, position effects |
| Complex | CPX | Multiple rearrangement types | Variable effects |
Key Information to Capture:
Example:
SV: arr[GRCh38] 17q21.31(44039927-44352659)x1
- Type: Deletion (heterozygous)
- Size: 313 kb
- Genes: MAPT, KANSL1 (fully contained)
- Breakpoints: Well-defined (array resolution ±5kb)
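As a rough sketch of the normalization step, the array notation in the example above can be parsed to recover the build, coordinates, copy number, and size. The helper name `parse_array_sv` and the regular expression are assumptions made for illustration; the sketch also assumes an autosomal diploid baseline (copy number 2) and inclusive coordinates, neither of which holds for every case (for example, X-linked calls in males).

```python
import re

# Matches notation such as "arr[GRCh38] 17q21.31(44039927-44352659)x1".
# Accepts "-" or "_" between the coordinates.
ARRAY_SV = re.compile(
    r"arr\[(?P<build>[^\]]+)\]\s*"
    r"(?P<chrom>[0-9]{1,2}|X|Y)(?P<band>[pq][0-9.]+)"
    r"\((?P<start>\d+)[-_](?P<end>\d+)\)"
    r"x(?P<copies>\d+)"
)


def parse_array_sv(notation):
    """Parse a single-segment array SV description into a dict."""
    match = ARRAY_SV.search(notation)
    if match is None:
        raise ValueError(f"unrecognized SV notation: {notation!r}")
    start = int(match.group("start"))
    end = int(match.group("end"))
    if end < start:
        raise ValueError("end coordinate precedes start coordinate")
    copies = int(match.group("copies"))
    # Assumption: diploid baseline, so <2 copies is a loss and >2 is a gain.
    if copies < 2:
        sv_type = "DEL"
    elif copies > 2:
        sv_type = "DUP"
    else:
        sv_type = None
    return {
        "build": match.group("build"),
        "chrom": match.group("chrom"),
        "band": match.group("band"),
        "start": start,
        "end": end,
        "copy_number": copies,
        "type": sv_type,
        "size_bp": end - start + 1,  # inclusive coordinates assumed
    }


sv = parse_array_sv("arr[GRCh38] 17q21.31(44039927-44352659)x1")
print(sv["type"], round(sv["size_bp"] / 1000), "kb")
```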
Goal: Comprehensive annotation of genes affected by SV
Tools:
| Tool | Purpose | Key Data |
|---|---|---|
| `Ensembl_lookup_gene` | Gene structure, coordinates | Gene boundaries, exons, transcripts |
| `NCBI_gene_search` | Gene information | Official symbol, aliases, description |
| `Gene_Ontology_get_term_info` | Gene function | Biological process, molecular function |
| `OMIM_search`, `OMIM_get_entry` | Disease associations | Inheritance, clinical features |
| `DisGeNET_search_gene` | Gene-disease associations | Evidence scores |
Gene Categories:
Fully contained genes - Entire gene within SV boundaries
Partially disrupted genes - Breakpoint within gene
Flanking genes - Within 1 Mb of breakpoints
Example Gene Content Analysis:
def analyze_gene_content(tu, chrom, sv_start, sv_end, sv_type, genes_in_region):
    """
    Identify and annotate all genes within SV region.

    genes_in_region: list of dicts with 'symbol', 'start', 'end' for genes on
    `chrom` near the SV (e.g. from an Ensembl region query). How this list is
    obtained depends on the available tools.
    """
    genes = {
        'fully_contained': [],
        'partially_disrupted': [],
        'flanking': []
    }
    flank_bp = 1_000_000  # flanking window: 1 Mb from either breakpoint

    for gene in genes_in_region:
        gene_start = gene['start']
        gene_end = gene['end']

        # Classify gene relationship to SV
        if gene_start >= sv_start and gene_end <= sv_end:
            # Fully contained
            gene_info = annotate_gene(tu, gene['symbol'])
            genes['fully_contained'].append(gene_info)
        elif gene_start <= sv_end and gene_end >= sv_start:
            # Overlaps the SV but is not fully contained: a breakpoint
            # falls within the gene (partially disrupted)
            gene_info = annotate_gene(tu, gene['symbol'])
            genes['partially_disrupted'].append(gene_info)
        elif abs(gene_start - sv_end) < flank_bp or abs(gene_end - sv_start) < flank_bp:
            # Flanking (within 1 Mb)
            gene_info = annotate_gene(tu, gene['symbol'])
            genes['flanking'].append(gene_info)

    return genes


def annotate_gene(tu, gene_symbol):
    """
    Comprehensive gene annotation.
    """
    # OMIM associations
    omim = tu.tools.OMIM_search(
        operation="search",
        query=gene_symbol,
        limit=5
    )

    # DisGeNET associations
    disgenet = tu.tools.DisGeNET_search_gene(
        operation="search_gene",
        gene=gene_symbol,
        limit=10
    )

    # Gene Ontology
    # Note: Need gene ID first
    ncbi = tu.tools.NCBI_gene_search(
        term=gene_symbol,
        organism="human"
    )

    return {
        'symbol': gene_symbol,
        'omim': omim,
        'disgenet': disgenet,
        'ncbi': ncbi
    }
Report Section:
### 2.1 Fully Contained Genes (Complete Dosage Effect)
| Gene | Function | Disease Association | Inheritance | Evidence |
|------|----------|---------------------|-------------|----------|
| **MAPT** | Microtubule-associated protein tau | Frontotemporal dementia (AD) | Autosomal Dominant | ★★★ |
| **KANSL1** | Histone acetyltransferase complex | Koolen-De Vries syndrome (AD) | Autosomal Dominant | ★★★ |
**Interpretation**: Deletion results in haploinsufficiency of two dosage-sensitive genes. KANSL1 haploinsufficiency is the primary cause of pathogenicity.
*Sources: OMIM, DisGeNET, Ensembl*
### 2.2 Partially Disrupted Genes (Breakpoint Within Gene)
| Gene | Breakpoint Location | Effect | Critical Domains Lost |
|------|-------------------|--------|----------------------|
| **NF1** | Intron 28 of 58 | 5' portion deleted | Yes - GTPase-activating domain |
**Interpretation**: Breakpoint disrupts NF1 coding sequence, likely resulting in loss-of-function. NF1 is haploinsufficient (causes neurofibromatosis type 1).
### 2.3 Flanking Genes (Potential Position Effects)
| Gene | Distance from SV | Regulatory Risk | Evidence |
|------|------------------|-----------------|----------|
| **KCNJ2** | 450 kb upstream | Low | ★☆☆ |
**Note**: Position effects are possible but less common. Consider if phenotype unexplained by contained genes.
Goal: Determine if affected genes are dosage-sensitive
Tools:
| Tool | Purpose | Key Data |
|---|---|---|
| `ClinGen_search_dosage_sensitivity` | Gold standard curation | HI/TS scores (0-3) |
| `ClinGen_search_gene_validity` | Gene-disease validity | Definitive/Strong/Moderate |
| `gnomad_search` (pLI) | Loss-of-function intolerance | pLI score (0-1) |
| `DECIPHER_search` | Developmental disorders | Patient phenotypes with similar SVs |
| `OMIM_get_entry` | Inheritance pattern | AD/AR (AD inheritance may indicate dosage sensitivity; AR suggests heterozygous loss is tolerated) |
ClinGen Dosage Sensitivity Scores:
| Score | Haploinsufficiency (HI) | Triplosensitivity (TS) | Interpretation |
|---|---|---|---|
| 3 | Sufficient evidence | Sufficient evidence | Gene IS dosage-sensitive |
| 2 | Emerging evidence | Emerging evidence | Likely dosage-sensitive |
| 1 | Little evidence | Little evidence | Insufficient evidence |
| 0 | No evidence | No evidence | No established dosage sensitivity |
pLI Score Interpretation (gnomAD):
| pLI Range | Interpretation | LoF Intolerance |
|---|---|---|
| ≥0.9 | Extremely intolerant | High - likely haploinsufficient |
| 0.5-0.9 | Moderately intolerant | Moderate |
| <0.5 | Tolerant | Low - likely NOT haploinsufficient |
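The two lookup tables above can be encoded directly, which helps keep report wording consistent. This is a small sketch: the names `CLINGEN_SCORE_LABELS` and `interpret_pli` are hypothetical, and the thresholds are simply those listed in the tables.

```python
# Labels copied from the ClinGen dosage sensitivity score table above.
CLINGEN_SCORE_LABELS = {
    3: "Sufficient evidence",
    2: "Emerging evidence",
    1: "Little evidence",
    0: "No evidence",
}


def interpret_pli(pli):
    """Map a gnomAD pLI score (0-1) to the interpretation bins listed above."""
    if not 0.0 <= pli <= 1.0:
        raise ValueError("pLI must be between 0 and 1")
    if pli >= 0.9:
        return "Extremely intolerant"
    if pli >= 0.5:
        return "Moderately intolerant"
    return "Tolerant"


# Values taken from the example report rows in this document.
for gene, hi_score, pli in [("KANSL1", 3, 0.99), ("MAPT", 2, 0.85)]:
    print(gene, CLINGEN_SCORE_LABELS[hi_score], interpret_pli(pli))
```

pLI is a prediction rather than a curated assertion, so when the two disagree the ClinGen score is usually the one to weight more heavily.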
Implementation:
def assess_dosage_sensitivity(tu, gene_list):
    """
    Assess dosage sensitivity for all genes in SV.
    Returns dosage scores and interpretation.
    """
    dosage_data = []

    for gene_symbol in gene_list:
        # 1. ClinGen dosage sensitivity (gold standard)
        clingen = tu.tools.ClinGen_search_dosage_sensitivity(
            gene=gene_symbol
        )

        hi_score = None
        ts_score = None
        if clingen.get('data'):
            # Use the first returned curation entry
            for entry in clingen['data']:
                hi_score = entry.get('Haploinsufficiency Score')
                ts_score = entry.get('Triplosensitivity Score')
                break

        # 2. ClinGen gene validity (supports dosage sensitivity)
        validity = tu.tools.ClinGen_search_gene_validity(
            gene=gene_symbol
        )

        validity_level = None
        if validity.get('data'):
            for entry in validity['data']:
                validity_level = entry.get('Classification')
                break

        # 3. pLI score from gnomAD (if available via gene search)
        # Note: May need to use myvariant or other tools
        # pli_score = get_pli_score(tu, gene_symbol)

        # 4. OMIM inheritance pattern
        omim = tu.tools.OMIM_search(
            operation="search",
            query=gene_symbol,
            limit=3
        )

        # Stays None until an inheritance parser is implemented (see below)
        inheritance_pattern = None
        if omim.get('data', {}).get('entries'):
            for entry in omim['data']['entries']:
                mim = entry.get('mimNumber')
                details = tu.tools.OMIM_get_entry(
                    operation="get_entry",
                    mim_number=str(mim)
                )
                # Extract inheritance from details
                # inheritance_pattern = parse_inheritance(details)

        # Integrate evidence
        # Scores may arrive as int or str, so compare on the string form
        dosage_assessment = {
            'gene': gene_symbol,
            'hi_score': hi_score,
            'ts_score': ts_score,
            'validity_level': validity_level,
            'inheritance': inheritance_pattern,
            'is_dosage_sensitive': (str(hi_score) == '3' or str(ts_score) == '3'),
            'evidence_grade': calculate_evidence_grade(hi_score, ts_score, validity_level)
        }

        dosage_data.append(dosage_assessment)

    return dosage_data


def calculate_evidence_grade(hi_score, ts_score, validity):
    """
    Calculate evidence grade for dosage sensitivity.
    """
    # Normalize so that 3 and '3' are treated the same; None becomes 'None'
    hi, ts = str(hi_score), str(ts_score)
    if (hi == '3' or ts == '3') and validity == 'Definitive':
        return '★★★'  # High confidence
    elif hi in ['2', '3'] or ts in ['2', '3']:
        return '★★☆'  # Moderate confidence
    else:
        return '★☆☆'  # Low confidence
Report Section:
### 3. Dosage Sensitivity Assessment
#### Haploinsufficient Genes (Deletions/Disruptions)
| Gene | ClinGen HI Score | pLI | Validity | Disease | Evidence |
|------|-----------------|-----|----------|---------|----------|
| **KANSL1** | 3 (Sufficient) | 0.99 | Definitive | Koolen-De Vries syndrome | ★★★ |
| **MAPT** | 2 (Emerging) | 0.85 | Strong | FTD (rare) | ★★☆ |
**Interpretation**: KANSL1 has definitive evidence for haploinsufficiency. Deletion of one copy is expected to cause Koolen-De Vries syndrome (intellectual disability, hypotonia, distinctive facial features).
*Sources: ClinGen Dosage Sensitivity Map, gnomAD pLI*
#### Triplosensitive Genes (Duplications)
| Gene | ClinGen TS Score | Disease Mechanism | Evidence |
|------|-----------------|-------------------|----------|
| **MECP2** | 3 (Sufficient) | MECP2 duplication syndrome | ★★★ |
| **PMP22** | 3 (Sufficient) | Charcot-Marie-Tooth 1A | ★★★ |
**Note**: For this deletion, triplosensitivity is not applicable. Listed for reference.
#### Non-Dosage-Sensitive Genes
| Gene | HI Score | TS Score | Interpretation |
|------|----------|----------|----------------|
| **GENE_X** | 0 | 0 | No established dosage sensitivity |
| **GENE_Y** | 1 | 1 | Insufficient evidence |
**Interpretation**: These genes lack evidence for dosage sensitivity. Deletion/duplication less likely to be pathogenic solely due to these genes.
Goal: Determine if SV is common in general population (likely benign) or rare (supports pathogenicity)
Tools:
| Tool | Purpose | Key Data |
|---|---|---|
| `gnomad_search` | Population SV frequencies | Overlapping SVs, frequencies |
| `ClinVar_search_variants` | Known pathogenic/benign SVs | Classification, review status |
| `DECIPHER_search` | Patient SVs with phenotypes | Case reports, phenotype similarity |
Frequency Interpretation (adapted from ACMG):
| SV Frequency | ACMG Code | Interpretation |
|---|---|---|
| ≥1% in gnomAD SVs | BA1 (Stand-alone Benign) | Too common for rare disease |
| 0.1-1% | BS1 (Strong Benign) | Likely benign common variant |
| <0.01% | PM2 (Supporting Pathogenic) | Rare, supports pathogenicity |
| Absent | PM2 (Supporting) | Very rare, supports pathogenicity |
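The frequency thresholds in the table above could be applied programmatically along these lines. This is a hedged sketch: `frequency_evidence` is a hypothetical helper, frequencies are assumed to be given as fractions (0.01 means 1%), and because the table does not assign a code to frequencies between 0.01% and 0.1%, the function returns `None` for that range rather than guessing.

```python
def frequency_evidence(allele_frequency):
    """Return the evidence code from the frequency table, or None if unassigned.

    allele_frequency: fraction between 0 and 1, or None if the SV is absent
    from the population database.
    """
    if allele_frequency is None or allele_frequency == 0:
        return "PM2"  # absent from population data
    if not 0 < allele_frequency <= 1:
        raise ValueError("allele_frequency must be a fraction between 0 and 1")
    if allele_frequency >= 0.01:
        return "BA1"  # >= 1%
    if allele_frequency >= 0.001:
        return "BS1"  # 0.1% to 1%
    if allele_frequency < 0.0001:
        return "PM2"  # < 0.01%
    # 0.01% to 0.1%: no code is assigned by the table
    return None


for af in (0.02, 0.005, 0.0005, 0.00005, None):
    print(af, frequency_evidence(af))
```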
Reciprocal Overlap Calculation:
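For proper comparison, calculate the reciprocal overlap between the query SV (A) and the population SV (B). Each term is the overlapping length expressed as a fraction of that SV's own length:
Reciprocal Overlap = min(overlap_length / length_A, overlap_length / length_B)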
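A minimal sketch of the reciprocal overlap calculation follows. The function name `reciprocal_overlap` is hypothetical, both SVs are assumed to lie on the same chromosome, and lengths are taken as `end - start` (half-open coordinates); adjust by one if your coordinates are inclusive.

```python
def reciprocal_overlap(a_start, a_end, b_start, b_end):
    """Return the reciprocal overlap (0.0-1.0) of two same-chromosome intervals.

    The result is the smaller of the two overlap fractions, so a small SV
    nested inside a much larger one scores low.
    """
    len_a = a_end - a_start
    len_b = b_end - b_start
    if len_a <= 0 or len_b <= 0:
        raise ValueError("intervals must have positive length")
    overlap = min(a_end, b_end) - max(a_start, b_start)
    if overlap <= 0:
        return 0.0
    return min(overlap / len_a, overlap / len_b)


# Query SV vs. a hypothetical population SV shifted by half its length.
print(reciprocal_overlap(100, 200, 150, 250))
# A 100 bp SV nested inside a 1000 bp SV.
print(reciprocal_overlap(0, 1000, 0, 100))
```

Taking the minimum of the two fractions is what makes the measure "reciprocal": a match must cover a large share of both SVs, which avoids treating a small common CNV as equivalent to a large rare one that happens to contain it.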