Call protein-RNA binding site peaks from CLIP-seq data using CLIPper, PureCLIP, or Piranha. Use when identifying RBP binding sites from aligned CLIP reads.
Reference examples tested with: MACS3 3.0+, bedtools 2.31+, pandas 2.2+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
pip show <package> then help(module.function) to check signatures<tool> --version then <tool> --help to confirm flagsIf code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Call binding sites from my CLIP-seq data" → Identify protein-RNA interaction sites from aligned CLIP reads using specialized peak callers that model crosslink-induced signatures.
clipper -b aligned.bam -s hg38 -o peaks.bed (CLIPper)pureclip -i aligned.bam -bai aligned.bam.bai -g genome.fa -o sites.bed (PureCLIP)Goal: Identify protein-RNA binding sites from aligned CLIP-seq reads.
Approach: Run CLIPper on deduplicated BAM with species-specific gene annotations to call peaks with FDR control and superlocal background correction.
# Basic peak calling
clipper \
-b deduped.bam \
-s hg38 \
-o peaks.bed \
--save-pickle
# With FDR threshold
clipper \
-b deduped.bam \
-s hg38 \
-o peaks.bed \
--FDR 0.05 \
--superlocal
# Specify gene annotations
clipper \
-b deduped.bam \
-s hg38 \
--gene genes.bed \
-o peaks.bed
| Option | Description |
|---|---|
| -b | Input BAM file |
| -s | Species (hg38, mm10) |
| -o | Output BED file |
| --FDR | FDR threshold (default 0.05) |
| --superlocal | Use superlocal background |
| --gene | Custom gene annotation BED |
| --save-pickle | Save intermediate data |
Goal: Identify single-nucleotide crosslink sites using a hidden Markov model that jointly models enrichment and truncation signals.
Approach: Run PureCLIP with aligned BAM and genome FASTA to produce both single-nucleotide crosslink sites and broader binding regions, optionally using an input control BAM.
PureCLIP uses an HMM to model crosslink sites, incorporating enrichment and truncation signals.
# Installation
conda install -c bioconda pureclip
# Basic peak calling
pureclip \
-i deduped.bam \
-bai deduped.bam.bai \
-g genome.fa \
-o crosslink_sites.bed \
-or binding_regions.bed \
-nt 4
# -nt 4: Number of threads. Adjust based on CPU cores.
# -o: Single-nucleotide crosslink sites
# -or: Broader binding regions
| Option | Description |
|---|---|
| -i | Input BAM file |
| -bai | BAM index file |
| -g | Reference genome FASTA |
| -o | Crosslink sites output |
| -or | Binding regions output |
| -nt | Number of threads |
| -iv | Interval file to restrict analysis |
| -dm | Min distance for merging |
# With SMInput control BAM
pureclip \
-i clip.bam \
-bai clip.bam.bai \
-g genome.fa \
-ibam sminput.bam \
-ibai sminput.bam.bai \
-o crosslinks.bed \
-or regions.bed \
-nt 8
# -ibam/-ibai: Input control BAM for background modeling
# Crosslink sites BED contains:
# chr start end name score strand
# Score interpretation:
# Higher scores = more confident crosslink sites
# Filter by score
# score>=3: Medium confidence. Use 5+ for high confidence.
awk '$5 >= 3' crosslink_sites.bed > filtered_sites.bed
# eCLIP (recommended settings)
pureclip -i eclip.bam -bai eclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4 -dm 8
# iCLIP (single-nucleotide resolution)
pureclip -i iclip.bam -bai iclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4
# PAR-CLIP (T-to-C transitions)
pureclip -i parclip.bam -bai parclip.bam.bai -g genome.fa \
-o sites.bed -or regions.bed -nt 4
# Basic usage
Piranha -s deduped.bam -o peaks.bed
# With p-value threshold
Piranha -s deduped.bam -o peaks.bed -p 0.01
# Stranded analysis
Piranha -s deduped.bam -o peaks.bed -p 0.01 -u
# Zero-truncated negative binomial
Piranha -s deduped.bam -o peaks.bed -d ZeroTruncatedNegativeBinomial
# PAR-CLIP specific caller
peakachu adaptive \
-c control.bam \
-t treatment.bam \
-r reference.fa \
-o peakachu_peaks.gff
# Use narrow peak calling mode
macs3 callpeak \
-t deduped.bam \
-f BAM \
-g hs \
-n clip_peaks \
--nomodel \
--extsize 50 \
-q 0.01
Goal: Call CLIP-seq peaks separately on each strand for strand-specific binding analysis.
Approach: Split the BAM into plus and minus strand reads using samtools flag filtering, call peaks on each strand independently, and merge.
# Split BAM by strand
samtools view -h -F 16 deduped.bam | samtools view -Sb - > plus_strand.bam
samtools view -h -f 16 deduped.bam | samtools view -Sb - > minus_strand.bam
# Call peaks on each strand
clipper -b plus_strand.bam -s hg38 -o peaks_plus.bed
clipper -b minus_strand.bam -s hg38 -o peaks_minus.bed
# Combine
cat peaks_plus.bed peaks_minus.bed | sort -k1,1 -k2,2n > peaks_all.bed
# By score
awk '$5 >= 10' peaks.bed > peaks_filtered.bed
# By size
awk '($3 - $2) >= 20 && ($3 - $2) <= 200' peaks.bed > peaks_sized.bed
# By read count (if in name field)
awk '$5 >= 5' peaks.bed > peaks_min5reads.bed
# Use bedtools to find consensus peaks
bedtools intersect -a rep1_peaks.bed -b rep2_peaks.bed -wa | \
sort -u > consensus_peaks.bed
# Require overlap in N replicates
bedtools multiinter -i rep1.bed rep2.bed rep3.bed | \
awk '$4 >= 2' | \
bedtools merge > consensus_peaks.bed
import pandas as pd
def load_clip_peaks(bed_path):
peaks = pd.read_csv(bed_path, sep='\t', header=None,
names=['chrom', 'start', 'end', 'name', 'score', 'strand'])
return peaks
def peak_stats(peaks):
stats = {
'n_peaks': len(peaks),
'mean_width': (peaks['end'] - peaks['start']).mean(),
'median_score': peaks['score'].median(),
'peaks_per_chrom': peaks.groupby('chrom').size().to_dict()
}
return stats
peaks = load_clip_peaks('peaks.bed')
print(peak_stats(peaks))
| Metric | Good Value | Description |
|---|---|---|
| Peak count | 1,000-50,000 | Depends on RBP |
| Peak width | 20-100 nt | Typical for RBP footprint |
| FRiP | >0.1 | Fraction reads in peaks |
# Reads in peaks
reads_in_peaks=$(bedtools intersect -a deduped.bam -b peaks.bed -u | samtools view -c -)
# Total reads
total_reads=$(samtools view -c deduped.bam)
# FRiP
frip=$(echo "scale=4; $reads_in_peaks / $total_reads" | bc)
echo "FRiP: $frip"