Crispr Screen Analyzer

Name: Crispr Screen Analyzer
Author: openclaw

Process CRISPR screening data to identify essential genes and hit candidates. Performs quality control, statistical analysis (RRA), and hit calling for pooled CRISPR screens including viability screens and drug resistance/sensitivity studies.

openclaw · 4,189 stars · 18 Mar 2026

Occupation
Categories: Computational Chemistry

Skill content

Analyze pooled CRISPR screening data to identify essential genes, drug resistance/sensitivity candidates, and screen quality metrics. Supports Robust Rank Aggregation (RRA) analysis, quality control assessment, and hit identification for functional genomics studies.

Key Capabilities:

Quality Control Assessment: Calculate Gini index, read depth, and dropout metrics to evaluate screen quality
Log Fold Change Calculation: Compute sgRNA-level fold changes between treatment and control conditions
Statistical Analysis: Perform Robust Rank Aggregation (RRA) to identify significantly enriched or depleted sgRNAs
Hit Identification: Apply FDR and fold change thresholds to identify candidate genes
Multi-Sample Support: Process multiple replicates and treatment conditions simultaneously

When to Use

✅ Use this skill when:

Analyzing genome-wide viability screens to identify essential genes required for cell survival
Performing drug resistance screens to find genes whose knockout confers resistance

Integration with Other Skills

Library Design (crispr-grna-designer) → Transduction → Sequencing → fastqc-report-interpreter → crispr-screen-analyzer → go-kegg-enrichment → Hit Validation

Core Capabilities

1. Quality Control Metrics Calculation

from scripts.main import CRISPRScreenAnalyzer

# Initialize analyzer with count matrix and sample annotations
analyzer = CRISPRScreenAnalyzer(
    counts_file="sgrna_counts.txt",
    samplesheet="samples.csv"
)

# Calculate QC metrics
qc_results = analyzer.qc_metrics()

# Review key metrics
print("Quality Control Metrics:")
print("Total reads per sample:")
for sample, reads in qc_results['total_reads'].items():
    print(f"  {sample}: {reads:,} reads")

print("\nGini index (library representation):")
for sample, gini in qc_results['gini_index'].items():
    status = "✅ Good" if gini < 0.3 else "⚠️  Check" if gini < 0.4 else "❌ Poor"
    print(f"  {sample}: {gini:.3f} {status}")

print("\nZero-count sgRNAs (potential dropout):")
for sample, zeros in qc_results['zero_count_sgrnas'].items():
    pct = (zeros / len(analyzer.counts)) * 100
    print(f"  {sample}: {zeros} ({pct:.1f}%)")

Metric	Target Range	Interpretation
Gini Index	<0.3	Measures library evenness; lower = more uniform
Total Reads	>10M per sample	Sufficient depth for statistical power
Zero-count sgRNAs	<5%	Acceptable dropout; higher indicates library loss
Read Distribution	Log-normal	Should follow expected distribution
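The metrics in the table above can be computed directly from a count matrix. The following is a minimal sketch, not the implementation in scripts/main.py: the function names `gini_index` and `qc_summary` and the toy counts are hypothetical, and it assumes a pandas DataFrame with sgRNAs as rows and samples as columns.

```python
import numpy as np
import pandas as pd


def gini_index(counts) -> float:
    """Gini coefficient of a 1-D vector of non-negative read counts.

    0 means every sgRNA has the same count; values near 1 mean a few
    sgRNAs hold almost all reads.
    """
    x = np.sort(np.asarray(counts, dtype=float))
    n = x.size
    total = x.sum()
    if n == 0 or total == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Closed form for sorted data: sum((2i - n - 1) * x_i) / (n * sum(x))
    return float(np.sum((2 * ranks - n - 1) * x) / (n * total))


def qc_summary(counts: pd.DataFrame) -> pd.DataFrame:
    """Per-sample total reads, zero-count percentage, and Gini index."""
    return pd.DataFrame({
        "total_reads": counts.sum(axis=0),
        "zero_count_pct": (counts == 0).mean(axis=0) * 100,
        "gini_index": counts.apply(gini_index, axis=0),
    })


# Toy data (made up for illustration): 4 sgRNAs x 2 samples
toy_counts = pd.DataFrame(
    {"even": [100, 100, 100, 100], "skewed": [0, 0, 0, 400]},
    index=["sg1", "sg2", "sg3", "sg4"],
)
qc_table = qc_summary(toy_counts)
print(qc_table)
```

Some pipelines compute the Gini index on log-transformed counts instead of raw counts, so the <0.3 target is only meaningful when the same convention is used throughout.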

2. Log Fold Change Calculation

from scripts.main import CRISPRScreenAnalyzer

analyzer = CRISPRScreenAnalyzer("counts.txt", "samples.csv")

# Define sample groups
control_samples = ["Control_1", "Control_2", "Control_3"]
treatment_samples = ["Drug_1", "Drug_2", "Drug_3"]

# Calculate log fold changes
lfc = analyzer.calculate_lfc(control_samples, treatment_samples)

# Analyze distribution
print("Log Fold Change Statistics:")
print(f"  Mean: {lfc.mean():.3f}")
print(f"  Std:  {lfc.std():.3f}")
print(f"  Max:  {lfc.max():.3f}")
print(f"  Min:  {lfc.min():.3f}")

# Identify extreme changes
strong_depletion = lfc[lfc < -2]  # Strong negative selection
strong_enrichment = lfc[lfc > 2]   # Strong positive selection

print(f"\nStrongly depleted sgRNAs: {len(strong_depletion)}")
print(f"Strongly enriched sgRNAs: {len(strong_enrichment)}")

lfc = log2((treatment_mean + 1) / (control_mean + 1))

LFC Range	Interpretation	Biological Meaning
LFC < -2	Strong depletion	Essential gene or drug sensitivity
LFC -2 to -1	Moderate depletion	Moderate effect
LFC -1 to 1	No change	No significant effect
LFC 1 to 2	Moderate enrichment	Moderate resistance
LFC > 2	Strong enrichment	Resistance gene or suppressor
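To make the formula and the interpretation table concrete, here is a small sketch that applies both to a toy count matrix. The helper names `log_fold_change` and `label_lfc` are hypothetical, and the counts are made up. The formula as written uses raw means, so the sketch does too; whether `calculate_lfc` normalizes for library size first is not stated here.

```python
import numpy as np
import pandas as pd


def log_fold_change(counts, control, treatment, pseudocount=1.0):
    """log2((treatment_mean + pseudocount) / (control_mean + pseudocount)) per sgRNA."""
    control_mean = counts[control].mean(axis=1)
    treatment_mean = counts[treatment].mean(axis=1)
    return np.log2((treatment_mean + pseudocount) / (control_mean + pseudocount))


def label_lfc(lfc):
    """Bin LFC values using the ranges from the interpretation table."""
    bins = [-np.inf, -2, -1, 1, 2, np.inf]
    labels = ["Strong depletion", "Moderate depletion", "No change",
              "Moderate enrichment", "Strong enrichment"]
    # right=True (the default) puts exact boundary values in the lower bin;
    # the table does not say which side boundaries belong to
    return pd.cut(lfc, bins=bins, labels=labels)


# Toy counts (made up): three sgRNAs, two controls, two treatments
toy = pd.DataFrame(
    {"Ctrl_1": [99, 15, 7], "Ctrl_2": [99, 15, 7],
     "Treat_1": [11, 15, 63], "Treat_2": [12, 15, 63]},
    index=["sg_depleted", "sg_neutral", "sg_enriched"],
)
lfc = log_fold_change(toy, ["Ctrl_1", "Ctrl_2"], ["Treat_1", "Treat_2"])
labels = label_lfc(lfc)
print(pd.DataFrame({"lfc": lfc, "label": labels}))
```

The pseudocount of 1 keeps the ratio defined when an sgRNA has zero reads in either group, at the cost of shrinking fold changes for low-count sgRNAs.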

3. Robust Rank Aggregation (RRA) Statistical Analysis

from scripts.main import CRISPRScreenAnalyzer

analyzer = CRISPRScreenAnalyzer("counts.txt", "samples.csv")

# Calculate LFC first
lfc = analyzer.calculate_lfc(
    control_samples=["Ctrl_1", "Ctrl_2"],
    treatment_samples=["Treat_1", "Treat_2"]
)

# Perform RRA analysis
results = analyzer.rra_analysis(lfc, fdr_threshold=0.05)

# Review top hits
print("Top 10 Most Significant sgRNAs:")
top_hits = results.nsmallest(10, 'fdr')
print(top_hits[['sgrna', 'lfc', 'pvalue', 'fdr']].to_string(index=False))

# Summary statistics
print(f"\nTotal sgRNAs tested: {len(results)}")
print(f"Significant at FDR < 0.05: {sum(results['fdr'] < 0.05)}")
print(f"Significant depletions: {sum((results['fdr'] < 0.05) & (results['lfc'] < 0))}")
print(f"Significant enrichments: {sum((results['fdr'] < 0.05) & (results['lfc'] > 0))}")

4. Hit Identification with Thresholds

from scripts.main import CRISPRScreenAnalyzer

analyzer = CRISPRScreenAnalyzer("counts.txt", "samples.csv")
lfc = analyzer.calculate_lfc(["Ctrl_1", "Ctrl_2"], ["Treat_1", "Treat_2"])
results = analyzer.rra_analysis(lfc)

# Identify hits with multiple thresholds
threshold_configs = [
    {"fdr": 0.05, "lfc": 1.0, "name": "Standard"},
    {"fdr": 0.01, "lfc": 1.5, "name": "Stringent"},
    {"fdr": 0.1, "lfc": 0.5, "name": "Permissive"}
]

for config in threshold_configs:
    hits = analyzer.identify_hits(
        results, 
        fdr_threshold=config['fdr'],
        lfc_threshold=config['lfc']
    )
    
    depletions = hits[hits['lfc'] < 0]
    enrichments = hits[hits['lfc'] > 0]
    
    print(f"\n{config['name']} (FDR<{config['fdr']}, |LFC|>{config['lfc']}):")
    print(f"  Total hits: {len(hits)}")
    print(f"  Depletions: {len(depletions)}")
    print(f"  Enrichments: {len(enrichments)}")

# Save hits for downstream analysis
standard_hits = analyzer.identify_hits(results, fdr_threshold=0.05, lfc_threshold=1.0)
standard_hits.to_csv("hits_standard.csv", index=False)

Category	Criteria	Biological Interpretation
Essential	FDR<0.05, LFC<-1	Required for cell viability
Drug Sensitive	FDR<0.05, LFC<-1	Synthetic lethal with treatment
Drug Resistant	FDR<0.05, LFC>1	Confers resistance to treatment
Suppressor	FDR<0.05, LFC>1	Suppresses phenotype of interest
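The criteria in the table reduce to two filters, an FDR cutoff and an absolute LFC cutoff, plus the sign of the LFC. The sketch below shows that logic on a toy results table. `call_hits` is a hypothetical stand-in, not the `identify_hits` method itself, and the values are made up.

```python
import pandas as pd


def call_hits(results, fdr_threshold=0.05, lfc_threshold=1.0):
    """Keep rows with fdr < fdr_threshold and |lfc| > lfc_threshold, labelled by direction."""
    mask = (results["fdr"] < fdr_threshold) & (results["lfc"].abs() > lfc_threshold)
    hits = results.loc[mask].copy()
    hits["direction"] = hits["lfc"].map(lambda v: "depleted" if v < 0 else "enriched")
    return hits


# Toy results (made up), using the sgrna/lfc/fdr column names from this document
toy_results = pd.DataFrame({
    "sgrna": ["sgA", "sgB", "sgC", "sgD"],
    "lfc": [-2.5, 1.8, -0.4, 3.0],
    "fdr": [0.001, 0.02, 0.01, 0.2],
})
hits = call_hits(toy_results)
print(hits)
```

Note that "Essential" and "Drug Sensitive" share the same numeric criteria, as do "Drug Resistant" and "Suppressor". The category depends on which comparison produced the LFC (for example T14 vs T0 versus Drug vs DMSO), not on the thresholds.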

5. Gene-Level Aggregation

import pandas as pd
from scripts.main import CRISPRScreenAnalyzer

analyzer = CRISPRScreenAnalyzer("counts.txt", "samples.csv")
lfc = analyzer.calculate_lfc(["Ctrl_1", "Ctrl_2"], ["Treat_1", "Treat_2"])
results = analyzer.rra_analysis(lfc)

# Add gene annotations (example mapping)
sgrna_to_gene = pd.read_csv("library_annotation.csv")  # sgRNA, Gene columns
results_with_gene = results.merge(sgrna_to_gene, on='sgrna')

# Aggregate to gene level
gene_results = results_with_gene.groupby('Gene').agg({
    'lfc': 'mean',           # Average LFC across sgRNAs
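    'pvalue': 'min',         # Smallest sgRNA-level p-value (not a calibrated gene-level p-value)
    'fdr': 'min',            # Smallest sgRNA-level FDR (optimistic; treat as a ranking aid)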
    'sgrna': 'count'         # Number of sgRNAs
}).rename(columns={'sgrna': 'sgrna_count'})

# Filter genes with multiple sgRNAs
gene_results = gene_results[gene_results['sgrna_count'] >= 2]

# Identify gene-level hits
gene_hits = gene_results[
    (gene_results['fdr'] < 0.05) & 
    (abs(gene_results['lfc']) > 1.0)
]

print(f"Gene-level hits: {len(gene_hits)}")
print("\nTop 10 hits:")
print(gene_hits.nsmallest(10, 'fdr')[['lfc', 'pvalue', 'fdr', 'sgrna_count']])

Method	Description	Best For
Mean LFC	Average across sgRNAs	General hit calling
Best FDR	Most significant sgRNA	Sensitive, but a single outlier sgRNA can drive the call
Second-best	Second most significant	Reduces outlier effects
STARS/RRA	Rank-based aggregation	Standard CRISPR analysis
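For the rank-based row of the table, the core of Robust Rank Aggregation is the rho score: for a gene with n sgRNAs, sort their normalized ranks and take the smallest probability that the k-th order statistic of n uniform draws falls at or below the observed value. The sketch below follows that published definition. It is an illustration, not necessarily what `rra_analysis` does internally, and `rho_score`, `gene_rho_scores`, and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import beta


def rho_score(normalized_ranks):
    """RRA rho score: min over k of P(k-th order statistic of n uniforms <= r_(k))."""
    r = np.sort(np.asarray(normalized_ranks, dtype=float))
    n = r.size
    k = np.arange(1, n + 1)
    # The k-th smallest of n Uniform(0, 1) draws follows Beta(k, n - k + 1)
    return float(beta.cdf(r, k, n - k + 1).min())


def gene_rho_scores(sgrna_lfc, sgrna_to_gene):
    """Rank sgRNAs by LFC (most depleted first) and score each gene."""
    ranks = sgrna_lfc.rank(method="average") / len(sgrna_lfc)
    return ranks.groupby(sgrna_to_gene).apply(rho_score).sort_values()


# Toy library (made up): two genes with three sgRNAs each
guides = ["a1", "a2", "a3", "b1", "b2", "b3"]
toy_lfc = pd.Series([-4.0, -3.5, -3.0, 0.1, -0.2, 0.3], index=guides)
toy_genes = pd.Series(["GENE_A"] * 3 + ["GENE_B"] * 3, index=guides)

scores = gene_rho_scores(toy_lfc, toy_genes)
print(scores)
```

The rho score is not a p-value on its own. It is typically converted to one with a multiplicity correction or a permutation null before any FDR is computed. To score enrichment instead of depletion, rank on the negated LFC.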

6. Multi-Condition Comparison

from scripts.main import CRISPRScreenAnalyzer

analyzer = CRISPRScreenAnalyzer("counts.txt", "samples.csv")

# Define multiple comparisons
comparisons = {
    "Drug_A": {
        "control": ["DMSO_1", "DMSO_2"],
        "treatment": ["DrugA_1", "DrugA_2"]
    },
    "Drug_B": {
        "control": ["DMSO_1", "DMSO_2"], 
        "treatment": ["DrugB_1", "DrugB_2"]
    },
    "Combination": {
        "control": ["DMSO_1", "DMSO_2"],
        "treatment": ["Combo_1", "Combo_2"]
    }
}

# Analyze all conditions
all_results = {}
for comp_name, samples in comparisons.items():
    lfc = analyzer.calculate_lfc(samples['control'], samples['treatment'])
    results = analyzer.rra_analysis(lfc)
    hits = analyzer.identify_hits(results)
    
    all_results[comp_name] = {
        'lfc': lfc,
        'results': results,
        'hits': hits
    }
    
    print(f"{comp_name}: {len(hits)} hits")

# Find common hits across conditions (compare by sgRNA identifier, not row position)
common_hits = set(all_results['Drug_A']['hits']['sgrna'])
for comp in ['Drug_B', 'Combination']:
    common_hits &= set(all_results[comp]['hits']['sgrna'])

print(f"\nCommon hits across all conditions: {len(common_hits)}")

# Compare LFC correlations between conditions

lfc_drugA = all_results['Drug_A']['lfc']
lfc_drugB = all_results['Drug_B']['lfc']

correlation = lfc_drugA.corr(lfc_drugB)
print(f"\nCorrelation between Drug A and Drug B: {correlation:.3f}")

Comparison Type	Question Addressed	Interpretation
Drug vs Control	What genes mediate drug response?	Resistance/sensitivity mechanisms
Condition A vs B	Differential genetic dependencies	Context-specific essentiality
Time-course	How does genetic dependency change?	Temporal dynamics
Cell line comparison	Cell-type specific dependencies	Lineage-specific vulnerabilities

Complete Workflow Example

# Step 1: Run QC assessment
python scripts/main.py --counts sgrna_counts.txt --samples samples.csv --output qc_results

# Step 2: Perform differential analysis
python scripts/main.py \
  --counts sgrna_counts.txt \
  --samples samples.csv \
  --control "Ctrl_1,Ctrl_2,Ctrl_3" \
  --treatment "Drug_1,Drug_2,Drug_3" \
  --output drug_screen \
  --fdr 0.05

# Step 3: Review results
head -20 drug_screen_sgrna_results.csv

from scripts.main import CRISPRScreenAnalyzer
import pandas as pd

def analyze_crispr_screen(
    counts_file: str,
    samplesheet: str,
    control_samples: list,
    treatment_samples: list,
    output_prefix: str,
    fdr_threshold: float = 0.05,
    lfc_threshold: float = 1.0
) -> dict:
    """
    Complete CRISPR screen analysis workflow.
    """
    # Initialize analyzer
    analyzer = CRISPRScreenAnalyzer(counts_file, samplesheet)
    
    print(f"Loaded {analyzer.counts.shape[0]} sgRNAs x {analyzer.counts.shape[1]} samples")
    
    # Quality control
    print("\n1. Quality Control Assessment...")
    qc = analyzer.qc_metrics()
    
    # Check QC status
    qc_pass = all(gini < 0.4 for gini in qc['gini_index'].values())
    if not qc_pass:
        print("⚠️  Warning: High Gini index detected - check library representation")
    
    # Calculate fold changes
    print("\n2. Calculating log fold changes...")
    lfc = analyzer.calculate_lfc(control_samples, treatment_samples)
    
    # Statistical analysis
    print("\n3. Running RRA analysis...")
    results = analyzer.rra_analysis(lfc, fdr_threshold)
    
    # Identify hits
    print("\n4. Identifying significant hits...")
    hits = analyzer.identify_hits(results, fdr_threshold, lfc_threshold)
    
    # Categorize hits
    depletions = hits[hits['lfc'] < 0]
    enrichments = hits[hits['lfc'] > 0]
    
    # Save results
    results.to_csv(f"{output_prefix}_sgrna_results.csv", index=False)
    hits.to_csv(f"{output_prefix}_hits.csv", index=False)
    
    # Compile summary
    summary = {
        'total_sgrnas': len(results),
        'significant_hits': len(hits),
        'depletions': len(depletions),
        'enrichments': len(enrichments),
        'qc_metrics': qc,
        'output_files': {
            'full_results': f"{output_prefix}_sgrna_results.csv",
            'hits': f"{output_prefix}_hits.csv"
        }
    }
    
    # Print summary
    print(f"\n{'='*60}")
    print("ANALYSIS SUMMARY")
    print(f"{'='*60}")
    print(f"Total sgRNAs: {summary['total_sgrnas']}")
    print(f"Significant hits (FDR<{fdr_threshold}, |LFC|>{lfc_threshold}): {summary['significant_hits']}")
    print(f"  - Depletions: {summary['depletions']}")
    print(f"  - Enrichments: {summary['enrichments']}")
    print("\nResults saved:")
    print(f"  - {summary['output_files']['full_results']}")
    print(f"  - {summary['output_files']['hits']}")
    print(f"{'='*60}")
    
    return summary

# Execute workflow
results = analyze_crispr_screen(
    counts_file="sgrna_counts.txt",
    samplesheet="samples.csv",
    control_samples=["Ctrl_1", "Ctrl_2", "Ctrl_3"],
    treatment_samples=["Drug_1", "Drug_2", "Drug_3"],
    output_prefix="drug_resistance_screen",
    fdr_threshold=0.05,
    lfc_threshold=1.0
)

analysis_results/
├── drug_resistance_screen_sgrna_results.csv  # All sgRNA statistics
├── drug_resistance_screen_hits.csv          # Significant hits only
└── qc_report.txt                            # Quality control summary

Common Patterns

Pattern 1: Viability Screen (Essential Gene Identification)

{
  "screen_type": "viability",
  "comparison": "T14_vs_T0",
  "expected_depletions": "Essential genes (ribosomal, splicing, etc.)",
  "expected_enrichments": "None (unless suppressors of toxicity)",
  "positive_controls": ["RPL30", "RPS19", "PCNA"],
  "negative_controls": ["LacZ", "NTC"],
  "analysis_parameters": {
    "fdr_threshold": 0.05,
    "lfc_threshold": 1.0,
    "gene_aggregation": "mean"
  }
}

Essential Gene Screen Results:
  Total sgRNAs tested: 65,383
  Significantly depleted: 3,847 sgRNAs (FDR<0.05, LFC<-1)
  
Top Essential Genes:
  RPL30: mean LFC = -4.2, 5/5 sgRNAs significant
  RPS19: mean LFC = -3.8, 4/5 sgRNAs significant
  PCNA:  mean LFC = -3.5, 5/5 sgRNAs significant
  
QC Metrics:
  Gini index: 0.25 (excellent library representation)
  Read depth: 25M per sample (sufficient)
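The positive and negative controls listed in the configuration give a quick sanity check: in a working viability screen, the positive controls should be clearly depleted relative to the negative controls. The sketch below computes that gap. `control_separation` is a hypothetical helper. The positive-control LFCs are taken from the example output above, while the negative-control values are made up because the example does not report them.

```python
import pandas as pd


def control_separation(gene_lfc, positive_controls, negative_controls):
    """Median LFC of each control set and the gap between them."""
    pos = gene_lfc.reindex(positive_controls).dropna()
    neg = gene_lfc.reindex(negative_controls).dropna()
    return {
        "positive_median": float(pos.median()),
        "negative_median": float(neg.median()),
        "gap": float(neg.median() - pos.median()),
    }


# Gene-level LFCs: RPL30/RPS19/PCNA from the example output; LacZ/NTC are made up
gene_lfc = pd.Series(
    {"RPL30": -4.2, "RPS19": -3.8, "PCNA": -3.5, "LacZ": 0.1, "NTC": -0.1}
)
separation = control_separation(
    gene_lfc,
    positive_controls=["RPL30", "RPS19", "PCNA"],
    negative_controls=["LacZ", "NTC"],
)
print(separation)
```

A small or negative gap suggests the selection did not work or the screen was harvested too early, which is worth ruling out before interpreting any hits.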

Pattern 2: Drug Resistance Screen

{
  "screen_type": "drug_resistance",
  "treatment": "vemurafenib (2 μM)",
  "control": "DMSO",
  "duration": "14 days",
  "expected_depletions": "Drug sensitizers, synthetic lethal",
  "expected_enrichments": "Drug resistance genes",
  "known_resistance_genes": ["NRAS", "MAP2K1"],
  "analysis_parameters": {
    "fdr_threshold": 0.05,
    "lfc_threshold": 1.0,
    "focus": "enrichments"
  }
}

Drug Resistance Screen Results (Vemurafenib):
  Significant enrichments: 156 sgRNAs (FDR<0.05, LFC>1)
  
Top Resistance Genes:
  NRAS:   mean LFC = +2.8, 4/5 sgRNAs enriched
  MAP2K1: mean LFC = +2.5, 5/5 sgRNAs enriched
  MED12:  mean LFC = +2.1, 3/5 sgRNAs enriched
  
Validation recommended:
  - Test individual sgRNAs in dose-response assay
  - Confirm resistance phenotype with cell viability assay
  - Check for known resistance mechanisms

Pattern 3: Drug Sensitivity/Synthetic Lethality Screen

{
  "screen_type": "drug_sensitivity",
  "treatment": "PARP inhibitor (olaparib)",
  "control": "DMSO",
  "cell_line": "BRCA1-mutant ovarian cancer",
  "expected_depletions": "DNA repair genes (synthetic lethal)",
  "expected_enrichments": "Drug resistance mechanisms",
  "known_synthetic_lethal": ["PARP1", "BRCA2", "PALB2"],
  "analysis_parameters": {
    "fdr_threshold": 0.05,
    "lfc_threshold": 1.0,
    "focus": "depletions"
  }
}

Synthetic Lethality Screen (Olaparib in BRCA1-mutant):
  Significant depletions: 234 sgRNAs (FDR<0.05, LFC<-1)
  
Top Synthetic Lethal Hits:
  BRCA2:   mean LFC = -3.2, 5/5 sgRNAs depleted
  PALB2:   mean LFC = -2.8, 4/5 sgRNAs depleted
  RAD51C:  mean LFC = -2.5, 5/5 sgRNAs depleted
  
Biological Interpretation:
  - Depleted hits are dominated by homologous recombination genes
  - Consistent with known synthetic lethal interactions
  - Potential combination therapy targets identified

Pattern 4: Comparative Screen (Cell Line vs Cell Line)

{
  "screen_type": "comparative",
  "comparison": "Melanoma_vs_Lung_cancer",
  "cell_lines": ["A375", "SKMEL28", "A549", "H1299"],
  "analysis_type": "differential_essentiality",
  "expected_lineage_specific": {
    "melanoma": ["MITF", "SOX10", "TYR"],
    "lung": ["NKX2-1", "TP63"]
  },
  "analysis_parameters": {
    "fdr_threshold": 0.05,
    "lfc_threshold": 1.0,
    "replicate_requirement": 2
  }
}

Comparative Screen: Melanoma vs Lung Cancer
  Melanoma-specific essential: 127 genes
  Lung-specific essential: 203 genes
  Common essential: 1,847 genes
  
Top Melanoma-Specific Dependencies:
  MITF:   LFC diff = -4.5 (essential in melanoma, not lung)
  SOX10:  LFC diff = -3.8
  TYR:    LFC diff = -3.2
  
Top Lung-Specific Dependencies:
  NKX2-1: LFC diff = -3.9
  TP63:   LFC diff = -3.1
  
Therapeutic Implications:
  - Lineage-specific targets identified
  - Potential for tumor-type selective therapy
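The "LFC diff" values in the comparative output can be read as a difference of group means. As a sketch under that assumption, with a hypothetical helper `lineage_lfc_diff` and made-up gene-level LFCs, the comparison looks like this:

```python
import pandas as pd


def lineage_lfc_diff(gene_lfc, group_a, group_b):
    """Mean gene-level LFC in group_a minus the mean in group_b.

    Negative values mean the gene is more depleted (more essential) in group_a.
    """
    diff = gene_lfc[group_a].mean(axis=1) - gene_lfc[group_b].mean(axis=1)
    return diff.sort_values()


# Made-up gene-level LFCs per cell line, for illustration only
toy_gene_lfc = pd.DataFrame(
    {
        "A375": [-4.0, 0.0, -4.0],
        "SKMEL28": [-5.0, 0.0, -4.0],
        "A549": [0.0, -4.0, -4.0],
        "H1299": [0.0, -3.0, -4.0],
    },
    index=["MITF", "NKX2-1", "RPL30"],
)
diff = lineage_lfc_diff(toy_gene_lfc, ["A375", "SKMEL28"], ["A549", "H1299"])
print(diff)
```

With a single signed difference, melanoma-specific dependencies come out negative and lung-specific ones positive, while common essential genes sit near zero. The example output above instead reports each lineage's dependencies as negative values, so check the sign convention before comparing numbers.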

Common CRISPR Screen Types

Screen Type	Comparison	Expected Hits	Typical Duration
Viability	T14 vs T0	Essential genes depleted	10-14 days
Drug Resistance	Drug vs DMSO	Resistance genes enriched	14-21 days
Drug Sensitivity	Drug vs DMSO	Sensitizers depleted	14-21 days
Comparative	Cell A vs Cell B	Lineage-specific dependencies	10-14 days
Sensitizer	Drug A+B vs Drug A	Combination targets	10-14 days

Parameters

Parameter	Type	Default	Required	Description
`--counts`, `-c`	string	-	Yes	sgRNA count matrix file
`--samples`, `-s`	string	-	Yes	Sample annotation file
`--control`	string	-	No	Control samples (comma-separated)
`--treatment`, `-t`	string	-	No	Treatment samples (comma-separated)
`--output`, `-o`	string	-	No	Output directory
`--fdr`	float	0.05	No	FDR threshold
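For readers wiring these options into their own wrapper, the table maps naturally onto `argparse`. The parser below is an illustrative sketch built only from the table. It is not the actual argument handling in scripts/main.py, and `build_parser` and `split_samples` are hypothetical names.

```python
import argparse


def build_parser():
    """Argument parser mirroring the parameter table (illustrative only)."""
    parser = argparse.ArgumentParser(description="CRISPR screen analysis (sketch)")
    parser.add_argument("--counts", "-c", required=True, help="sgRNA count matrix file")
    parser.add_argument("--samples", "-s", required=True, help="Sample annotation file")
    parser.add_argument("--control", help="Control samples (comma-separated)")
    parser.add_argument("--treatment", "-t", help="Treatment samples (comma-separated)")
    parser.add_argument("--output", "-o", help="Output directory")
    parser.add_argument("--fdr", type=float, default=0.05, help="FDR threshold")
    return parser


def split_samples(value):
    """Turn 'Ctrl_1,Ctrl_2' into ['Ctrl_1', 'Ctrl_2']; None becomes an empty list."""
    if not value:
        return []
    return [name.strip() for name in value.split(",") if name.strip()]


args = build_parser().parse_args(
    ["-c", "counts.txt", "-s", "samples.csv", "--control", "Ctrl1,Ctrl2", "-t", "Treat1,Treat2"]
)
print(split_samples(args.control), split_samples(args.treatment), args.fdr)
```

Sample names are passed as one comma-separated string, so names containing commas would need a different separator or repeated flags.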

Basic Usage

# Analyze CRISPR screen data
python scripts/main.py --counts sgrna_counts.txt --samples samplesheet.csv

# With specific control and treatment
python scripts/main.py --counts counts.txt --samples samples.csv --control "Ctrl1,Ctrl2" --treatment "Treat1,Treat2"

# Custom FDR threshold
python scripts/main.py --counts counts.txt --samples samples.csv --fdr 0.01 --output ./results

Risk Assessment

Risk Indicator	Assessment	Level
Code Execution	Python script executed locally	Low
Network Access	No external API calls	Low
File System Access	Read count files, write results	Low
Data Exposure	Processes genomic screening data	Medium
PHI Risk	May contain cell line genetic info	Low

Prerequisites

# Python 3.7+
numpy
pandas
scipy

Column	Description	Usage
`sgrna`	sgRNA identifier	Mapping to genes
`lfc`	Log fold change	Effect size
`pvalue`	Raw p-value	Statistical significance
`fdr`	Adjusted p-value (FDR)	Multiple testing correction
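The `fdr` column is described only as an adjusted p-value. Assuming the Benjamini-Hochberg procedure, which is the most common choice but is not confirmed by this document, the adjustment from `pvalue` to `fdr` can be sketched as follows. `benjamini_hochberg` is a hypothetical helper name.

```python
import numpy as np


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p-value downwards, then cap at 1
    adjusted_sorted = np.minimum.accumulate(scaled[::-1])[::-1].clip(max=1.0)
    adjusted = np.empty(n)
    adjusted[order] = adjusted_sorted
    return adjusted


fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(fdr)
```

Because the adjustment depends on the number of tests, filtering sgRNAs before or after the correction changes the resulting FDR values.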
