Differential protein abundance testing using MSstats, limma, proDA, and scipy/statsmodels for Python. Multiple testing correction with BH FDR.
Statistical testing for differentially abundant proteins between experimental conditions.
python omicsclaw.py run differential-abundance --demo
python omicsclaw.py run differential-abundance --input <protein_matrix.csv> --output <dir>
library(MSstats)
comparison_matrix <- matrix(c(1, -1, 0, 0,
1, 0, -1, 0,
0, 1, -1, 0),
nrow = 3, byrow = TRUE)
rownames(comparison_matrix) <- c('Treatment1-Control', 'Treatment2-Control', 'Treatment1-Treatment2')
colnames(comparison_matrix) <- c('Control', 'Treatment1', 'Treatment2', 'Treatment3')
results <- groupComparison(contrast.matrix = comparison_matrix, data = processed)
sig_proteins <- results$ComparisonResult[results$ComparisonResult$adj.pvalue < 0.05 &
abs(results$ComparisonResult$log2FC) > 1, ]
library(limma)
design <- model.matrix(~ 0 + condition, data = sample_info)
colnames(design) <- levels(sample_info$condition)
fit <- lmFit(protein_matrix, design)
contrast_matrix <- makeContrasts(Treatment - Control, levels = design)
fit2 <- contrasts.fit(fit, contrast_matrix)
fit2 <- eBayes(fit2)
results <- topTable(fit2, number = Inf, adjust.method = 'BH')
sig_results <- results[results$adj.P.Val < 0.05 & abs(results$logFC) > 1, ]
library(QFeatures)
library(proDA)
fit <- proDA(protein_matrix, design = ~ condition, data = sample_info)
results <- test_diff(fit, contrast = 'conditionTreatment')
results$adj_pval <- p.adjust(results$pval, method = 'BH')
sig_results <- results[results$adj_pval < 0.05 & abs(results$diff) > 1, ]
import pandas as pd
import numpy as np
from scipy import stats
from statsmodels.stats.multitest import multipletests
def differential_test(intensities, group1_cols, group2_cols):
results = []
for protein in intensities.index:
g1 = intensities.loc[protein, group1_cols].dropna()
g2 = intensities.loc[protein, group2_cols].dropna()
if len(g1) >= 2 and len(g2) >= 2:
stat, pval = stats.ttest_ind(g1, g2)
log2fc = g2.mean() - g1.mean()
results.append({'protein': protein, 'log2FC': log2fc, 'pvalue': pval})
df = pd.DataFrame(results)
df['adj_pvalue'] = multipletests(df['pvalue'], method='fdr_bh')[1]
return df
sig = results[(results['adj_pvalue'] < 0.05) & (abs(results['log2FC']) > 1)]
library(ggplot2)
ggplot(results, aes(x = log2FC, y = -log10(adj.P.Val))) +
geom_point(aes(color = significant), alpha = 0.6) +
geom_hline(yintercept = -log10(0.05), linetype = 'dashed') +
geom_vline(xintercept = c(-1, 1), linetype = 'dashed') +
scale_color_manual(values = c('grey', 'red')) +
theme_minimal()
| Parameter | Default | Description |
|---|---|---|
--method | limma | limma, msstats, proda, ttest |
--fdr-cutoff | 0.05 | FDR threshold |
--fc-cutoff | 1.0 | Log2FC threshold |
output_directory/
├── report.md
├── result.json
├── statistics.csv
├── figures/
│ └── volcano_plot.png
├── tables/
│ └── differential_results.csv
└── reproducibility/
├── commands.sh
├── requirements.txt
└── checksums.sha256
Trigger conditions:
Chaining partners:
quantification — Upstream normalized expressionprot-enrichment — Downstream pathway enrichmentReference examples tested with: limma 3.58+, MSstats 4.8+, scipy 1.12+, statsmodels 0.14+
Required: numpy, pandas, scipy, statsmodels Optional: MSstats (R), limma (R), proDA (R), QFeatures (R)
quantification — Prepare normalized data for testingprot-enrichment — Pathway enrichment of significant proteinsptm — PTM-level differential analysis