SARS-CoV-2 lineage classification (Pango), Freyja wastewater deconvolution, spike protein mutation tracking, and surveillance pipeline tools
| Tool | Purpose | Output |
|---|---|---|
| Pangolin | Lineage assignment from consensus | CSV with lineage, conflict score, and QC status |
| Nextclade | Clade + QC + amino acid changes | TSV with quality flags |
| Freyja | Wastewater lineage deconvolution | Lineage fractions per sample |
| GISAID / outbreak.info | Lineage frequency tracking | Web API + download |
# Lineage assignment
pangolin sequences.fasta --outfile lineages.csv --threads 4
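# Clade + quality flags (Nextclade v3 syntax: the FASTA is a positional argument;
# --dataset-name fetches a dataset by name, --input-dataset expects a local dataset path)
nextclade run --dataset-name sars-cov-2 \
  --output-tsv nextclade.tsv sequences.fasta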
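The two outputs are usually joined on sequence name before downstream filtering. A minimal sketch, assuming the Pangolin CSV has `taxon`, `lineage`, and `qc_status` columns and the Nextclade TSV has `seqName`, `clade`, and `qc.overallStatus`; the function name `merge_calls`, the `usable` rule, and the toy rows are illustrative assumptions, not part of either tool:

```python
import pandas as pd

def merge_calls(pangolin: pd.DataFrame, nextclade: pd.DataFrame) -> pd.DataFrame:
    """Join Pangolin and Nextclade results on sequence name and flag QC failures."""
    merged = pangolin.merge(
        nextclade, left_on="taxon", right_on="seqName", how="inner"
    )
    # Assumed rule: keep a sequence only if Pangolin QC passed
    # and Nextclade did not rate it "bad".
    merged["usable"] = (merged["qc_status"] == "pass") & (
        merged["qc.overallStatus"] != "bad"
    )
    return merged[
        ["taxon", "lineage", "clade", "qc_status", "qc.overallStatus", "usable"]
    ]

# Toy rows for illustration only (not real results).
pangolin = pd.DataFrame(
    {"taxon": ["s1", "s2"], "lineage": ["BA.2", "Unassigned"], "qc_status": ["pass", "fail"]}
)
nextclade = pd.DataFrame(
    {"seqName": ["s1", "s2"], "clade": ["21L", "21L"], "qc.overallStatus": ["good", "bad"]}
)
result = merge_calls(pangolin, nextclade)
```

With real files, the inputs would come from `pd.read_csv("lineages.csv")` and `pd.read_csv("nextclade.tsv", sep="\t")`. Check that both tools report the same sequence identifiers before relying on the join.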
| Region | Positions | Mutation effects |
|---|---|---|
| NTD | 13–305 | Antibody binding |
| RBD | 319–541 | ACE2 contact; immune escape (E484K, K417N, F486V); affinity (N501Y) |
| Furin cleavage | 681–685 | P681H/R enhance S1/S2 cleavage; associated with higher transmissibility |
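The region table above can be turned into a small lookup for annotating spike substitutions. This is a sketch: the boundaries are copied from the table, while the helper name `spike_region` and the `"other"` label are assumptions made for illustration. It handles only simple substitutions such as `N501Y`, not deletions or insertions.

```python
import re

# Region boundaries copied from the table above (spike residue numbering).
SPIKE_REGIONS = {
    "NTD": (13, 305),
    "RBD": (319, 541),
    "Furin cleavage": (681, 685),
}

# Reference residue, position, alternate residue (e.g. N501Y).
_SUBSTITUTION = re.compile(r"^([A-Z])(\d+)([A-Z*])$")

def spike_region(mutation: str) -> str:
    """Return the spike region containing a substitution, or "other"."""
    match = _SUBSTITUTION.match(mutation)
    if match is None:
        raise ValueError(f"not a simple substitution: {mutation!r}")
    position = int(match.group(2))
    for name, (start, end) in SPIKE_REGIONS.items():
        if start <= position <= end:
            return name
    return "other"

regions = {m: spike_region(m) for m in ["K417N", "E484K", "N501Y", "P681H", "D614G"]}
```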
# 1. Call variants from wastewater BAM
freyja variants wastewater.bam --variants variants.tsv --depths depths.tsv
# 2. Deconvolve lineages (uses UShER barcodes)
freyja demix variants.tsv depths.tsv --output lineages.csv
# 3. Aggregate per-sample demix outputs into one table (directory is a positional argument)
freyja aggregate ./samples/ --output aggregated.tsv
# 4. Plot
freyja plot aggregated.tsv --output lineage_plot.pdf
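For custom plots, the aggregated table can be reshaped into a samples-by-lineages matrix. The sketch below assumes the aggregated file is tab-separated with one row per sample and space-separated `lineages` and `abundances` columns. That layout is an assumption to verify against your Freyja version, and the function name `lineage_table` and the toy text are hypothetical.

```python
import io
import pandas as pd

def lineage_table(aggregated: pd.DataFrame) -> pd.DataFrame:
    """Expand space-separated lineage/abundance strings into a wide table."""
    rows = []
    for sample, row in aggregated.iterrows():
        names = str(row["lineages"]).split()
        fractions = [float(x) for x in str(row["abundances"]).split()]
        for name, fraction in zip(names, fractions):
            rows.append({"sample": sample, "lineage": name, "abundance": fraction})
    long = pd.DataFrame(rows)
    # Lineages absent from a sample get an abundance of 0.
    return long.pivot(index="sample", columns="lineage", values="abundance").fillna(0.0)

# Toy text in the assumed layout (illustration only, not real output).
toy = "\tlineages\tabundances\ns1.tsv\tBA.2 BA.5\t0.6 0.4\ns2.tsv\tBA.5\t1.0\n"
aggregated = pd.read_csv(io.StringIO(toy), sep="\t", index_col=0)
table = lineage_table(aggregated)
```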
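Freyja model: a constrained linear mixture model. Observed variant frequencies are modeled as a weighted sum of lineage barcodes (the matrix of lineage-defining mutations), and Freyja solves for the non-negative lineage fractions that best reproduce them, weighting sites by sequencing depth (hence the depths file passed to demix).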
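The mixture idea can be illustrated with plain non-negative least squares. This is a stand-in technique chosen for brevity, not Freyja's actual solver, which uses a depth-weighted robust objective. The function name `demix` and the toy barcode matrix are assumptions made for the example.

```python
import numpy as np
from scipy.optimize import nnls

def demix(barcodes: np.ndarray, freqs: np.ndarray) -> np.ndarray:
    """Estimate lineage fractions from mutation frequencies.

    barcodes: (n_lineages, n_mutations) 0/1 matrix of lineage-defining mutations.
    freqs:    (n_mutations,) observed alternate-allele frequencies.
    """
    # Solve min ||barcodes.T @ x - freqs||_2 subject to x >= 0.
    coefficients, _residual = nnls(barcodes.T, freqs)
    total = coefficients.sum()
    # Rescale so the fractions sum to 1 when anything was detected.
    return coefficients / total if total > 0 else coefficients

# Toy example: 2 lineages, 3 mutations (illustration only).
barcodes = np.array([[1.0, 1.0, 0.0],
                     [0.0, 1.0, 1.0]])
true_fractions = np.array([0.7, 0.3])
freqs = true_fractions @ barcodes  # noiseless frequencies implied by the mixture
estimated = demix(barcodes, freqs)
```

On noiseless input the estimate recovers the mixing fractions. Real wastewater data adds uneven depth and dropout, which is why depth weighting matters in practice.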
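import numpy as np
import pandas as pd
from scipy.stats import pearsonr
from scipy.ndimage import gaussian_filter1d
# Expected inputs (not defined here):
#   viral_load, cases: equal-length 1-D series sampled at the same interval
#   lineage_freqs: dict mapping lineage name -> 1-D array over time points
lag = 5  # assumed lead in samples (days for daily data); site-specific, must be >= 1
# Smooth wastewater signal before correlating with clinical cases
ww_smooth = gaussian_filter1d(np.asarray(viral_load, dtype=float), sigma=3)
# Lead-time correlation: wastewater at time t vs. reported cases at time t + lag
r, p = pearsonr(ww_smooth[:-lag], np.asarray(cases, dtype=float)[lag:])
# Lineage frequency normalization (stacked area plot); rows = lineages, columns = time points
freq_matrix = np.array(list(lineage_freqs.values()), dtype=float)
freq_norm = freq_matrix / freq_matrix.sum(axis=0) * 100  # percent per time point
# Dominant lineage per time point
lineage_names = list(lineage_freqs.keys())
dominant_idx = np.argmax(freq_norm, axis=0)
dominant = [lineage_names[i] for i in dominant_idx]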
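Since the lead time is rarely known in advance, one option is to scan a range of candidate lags and keep the one with the highest correlation. A minimal sketch on synthetic data; the function name `best_lag`, the `max_lag` default, and the sine-wave series are assumptions for illustration only:

```python
import numpy as np

def best_lag(ww: np.ndarray, cases: np.ndarray, max_lag: int = 14):
    """Return (lag, r) maximizing Pearson r between ww[t] and cases[t + lag]."""
    best = (0, -np.inf)
    for lag in range(0, max_lag + 1):
        x = ww[: len(ww) - lag]   # wastewater at time t
        y = cases[lag:]           # cases at time t + lag
        r = np.corrcoef(x, y)[0, 1]
        if r > best[1]:
            best = (lag, r)
    return best

# Synthetic series: cases reproduce the wastewater signal 5 samples later.
base = np.sin(np.arange(125) / 10.0) + 1.5
ww = base[5:]        # ww[t] = base[t + 5]
cases = base[:120]   # cases[t + 5] = base[t + 5] = ww[t]
lag, r = best_lag(ww, cases)
```

Both series must have the same length and sampling interval. Strong autocorrelation in smoothed series inflates correlations at every lag, so the chosen lag is best treated as a rough estimate.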
Update barcodes regularly (freyja update); stale barcodes misclassify new sublineages as "Other".