Generate alignment statistics using samtools flagstat, stats, depth, and coverage. Use when assessing alignment quality, calculating coverage, or generating QC reports.
Reference examples tested with: pysam 0.22+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
pip show <package> then help(module.function) to check signatures<tool> --version then <tool> --help to confirm flagsIf code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Get alignment statistics and coverage from my BAM file" → Generate read counts, mapping rates, per-chromosome statistics, depth profiles, and coverage summaries.
samtools flagstat, samtools stats, samtools depth, samtools coverage (samtools)pysam.AlignmentFile with pileup() and get_index_statistics() (pysam)Generate alignment statistics using samtools and pysam.
| Command | Output | Speed |
|---|---|---|
flagstat | Read counts by category | Very fast |
idxstats | Per-chromosome counts | Very fast (needs index) |
stats | Comprehensive statistics | Moderate |
depth | Per-position depth | Slow (full scan) |
coverage | Per-region coverage | Fast (needs index) |
Fast summary of alignment flags.
samtools flagstat input.bam
Output:
10000000 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
50000 + 0 supplementary
0 + 0 duplicates
9800000 + 0 mapped (98.00% : N/A)
9950000 + 0 paired in sequencing
4975000 + 0 read1
4975000 + 0 read2
9700000 + 0 properly paired (97.49% : N/A)
9750000 + 0 with itself and mate mapped
100000 + 0 singletons (1.01% : N/A)
25000 + 0 with mate mapped to a different chr
10000 + 0 with mate mapped to a different chr (mapQ>=5)
samtools flagstat -@ 4 input.bam
samtools flagstat input.bam > flagstat.txt
Per-chromosome read counts (requires index).
samtools idxstats input.bam
Output format: chrom length mapped unmapped
chr1 248956422 5000000 1000
chr2 242193529 4800000 800
chrM 16569 50000 100
* 0 0 150000
# Total mapped reads
samtools idxstats input.bam | awk '{sum += $3} END {print sum}'
# Mitochondrial percentage
samtools idxstats input.bam | awk '
/^chrM/ {mt = $3}
{total += $3}
END {print mt/total*100 "% mitochondrial"}'
Comprehensive statistics including insert size, base quality, and more.
samtools stats input.bam > stats.txt
samtools stats input.bam | grep "^SN"
Key summary fields:
raw total sequences - Total readsreads mapped - Mapped readsreads mapped and paired - Properly pairedinsert size average - Mean insert sizeinsert size standard deviation - Insert size spreadaverage length - Mean read lengtherror rate - Mismatch ratesamtools stats input.bam > stats.txt
plot-bamstats -p plots/ stats.txt
samtools stats input.bam chr1:1000000-2000000 > region_stats.txt
Per-position read depth.
samtools depth input.bam > depth.txt
Output: chrom position depth
samtools depth -r chr1:1000-2000 input.bam
samtools depth -a input.bam > depth_with_zeros.txt
samtools depth -d 0 input.bam # No cap (default 8000)
samtools depth -b regions.bed input.bam
samtools depth input.bam | awk '{sum += $3; n++} END {print sum/n}'
Per-chromosome or per-region coverage statistics (faster than depth).
samtools coverage input.bam
Output columns:
#rname - Reference namestartpos - Start positionendpos - End positionnumreads - Number of readscovbases - Bases with coveragecoverage - Percentage of bases coveredmeandepth - Mean depthmeanbaseq - Mean base qualitymeanmapq - Mean mapping qualitysamtools coverage -r chr1:1000000-2000000 input.bam
samtools coverage -b regions.bed input.bam
samtools coverage -m input.bam
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
total = mapped = paired = proper = 0
for read in bam:
total += 1
if not read.is_unmapped:
mapped += 1
if read.is_paired:
paired += 1
if read.is_proper_pair:
proper += 1
print(f'Total: {total}')
print(f'Mapped: {mapped} ({mapped/total*100:.1f}%)')
print(f'Properly paired: {proper} ({proper/paired*100:.1f}%)')
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for stat in bam.get_index_statistics():
print(f'{stat.contig}: {stat.mapped} mapped, {stat.unmapped} unmapped')
import pysam
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for pileup in bam.pileup('chr1', 1000000, 1000001):
print(f'Position {pileup.pos}: depth {pileup.n}')
import pysam
def mean_depth(bam_path, chrom, start, end):
depths = []
with pysam.AlignmentFile(bam_path, 'rb') as bam:
for pileup in bam.pileup(chrom, start, end, truncate=True):
depths.append(pileup.n)
if depths:
return sum(depths) / len(depths)
return 0
depth = mean_depth('input.bam', 'chr1', 1000000, 2000000)
print(f'Mean depth: {depth:.1f}x')
Goal: Compute coverage breadth and depth for a genomic region from a BAM file.
Approach: Iterate pileup columns in the region, count covered positions and accumulate depth, then derive percentages and means.
Reference (pysam 0.22+):
import pysam
def coverage_stats(bam_path, chrom, start, end):
covered = 0
total_depth = 0
with pysam.AlignmentFile(bam_path, 'rb') as bam:
for pileup in bam.pileup(chrom, start, end, truncate=True):
covered += 1
total_depth += pileup.n
length = end - start
pct_covered = covered / length * 100
mean_depth = total_depth / length if length > 0 else 0
return {
'length': length,
'covered_bases': covered,
'pct_covered': pct_covered,
'mean_depth': mean_depth
}
stats = coverage_stats('input.bam', 'chr1', 1000000, 2000000)
print(f'Coverage: {stats["pct_covered"]:.1f}%')
print(f'Mean depth: {stats["mean_depth"]:.1f}x')
Goal: Compute the insert size distribution to assess library preparation quality.
Approach: Iterate properly paired read1 records, accumulate template lengths into a Counter, then compute summary statistics.
Reference (pysam 0.22+):
import pysam
from collections import Counter
insert_sizes = Counter()
with pysam.AlignmentFile('input.bam', 'rb') as bam:
for read in bam:
if read.is_proper_pair and read.is_read1 and read.template_length > 0:
insert_sizes[read.template_length] += 1
sizes = list(insert_sizes.keys())
mean_insert = sum(s * c for s, c in insert_sizes.items()) / sum(insert_sizes.values())
print(f'Mean insert size: {mean_insert:.0f}')
print(f'Min: {min(sizes)}, Max: {max(sizes)}')
| Task | Command |
|---|---|
| Quick counts | samtools flagstat input.bam |
| Per-chrom counts | samtools idxstats input.bam |
| Full stats | samtools stats input.bam |
| Coverage summary | samtools coverage input.bam |
| Per-position depth | samtools depth input.bam |
| Mean depth | samtools depth input.bam | awk '{sum+=$3;n++}END{print sum/n}' |
| Metric | Good | Concerning |
|---|---|---|
| Mapping rate | >95% | <80% |
| Proper pair rate | >90% | <70% |
| Duplicate rate | <20% | >40% |
| Error rate | <1% | >2% |
| Coverage uniformity | <2x CV | >3x CV |