View, convert, and understand SAM/BAM/CRAM alignment files using samtools and pysam. Use when inspecting alignments, converting between formats, or understanding alignment file structure.
Reference examples tested with: pysam 0.22+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
- pip show <package> then help(module.function) to check signatures
- <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Read a BAM file" → Open a binary alignment file and iterate over aligned reads with their mapping coordinates, flags, and quality scores.
- pysam.AlignmentFile() (pysam)
- samtools view (samtools)
- scanBam() (Rsamtools)

View and convert alignment files using samtools and pysam.
| Format | Description | Use Case |
|---|---|---|
| SAM | Text format, human-readable | Debugging, small files |
| BAM | Binary compressed SAM | Standard storage format |
| CRAM | Reference-based compression | Long-term archival, smaller than BAM |
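The three formats in the table can usually be told apart from their first bytes. The following is a minimal sketch, assuming the standard magic numbers (CRAM files start with `CRAM`; BAM is BGZF, a gzip-compatible container, whose decompressed stream starts with `BAM\1`). `sniff_alignment_format` is a hypothetical helper for illustration, not part of samtools or pysam.

```python
import zlib


def sniff_alignment_format(head: bytes) -> str:
    """Guess SAM/BAM/CRAM from the first bytes of a file (hypothetical helper)."""
    if head.startswith(b"CRAM"):
        return "CRAM"
    if head.startswith(b"\x1f\x8b"):
        # gzip magic: BAM is BGZF-compressed, so peek at the first 4 decompressed bytes.
        try:
            start = zlib.decompressobj(wbits=31).decompress(head, 4)
        except zlib.error:
            return "unknown"
        return "BAM" if start == b"BAM\x01" else "unknown"
    if head.startswith(b"@"):
        # Heuristic only: a headerless SAM file starts with a read name instead.
        return "SAM"
    return "unknown"
```

In practice the input would be the first few kilobytes of the file, read in binary mode; the format-specific tools remain the authority on whether a file is valid.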
@HD VN:1.6 SO:coordinate
@SQ SN:chr1 LN:248956422
@RG ID:sample1 SM:sample1
@PG ID:bwa PN:bwa VN:0.7.17
read1 0 chr1 100 60 50M * 0 0 ACGT... FFFF... NM:i:0
Header lines start with @:
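- @HD - Header metadata (version, sort order)
- @SQ - Reference sequence dictionary
- @RG - Read group information
- @PG - Program used to create file

Alignment fields (tab-separated): QNAME, FLAG, RNAME, POS (1-based), MAPQ, CIGAR, RNEXT, PNEXT, TLEN, SEQ, and QUAL, followed by optional TAG:TYPE:VALUE fields such as NM:i:0.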
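To make the layout of header lines and alignment records concrete, here is a small standard-library sketch that splits SAM text into its parts. The function names are hypothetical, and the sketch assumes well-formed, tab-separated input with the eleven mandatory SAM columns; pysam does this parsing for you in real code.

```python
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]
INT_FIELDS = {"FLAG", "POS", "MAPQ", "PNEXT", "TLEN"}


def parse_header_line(line):
    """'@SQ\\tSN:chr1\\tLN:248956422' -> ('SQ', {'SN': 'chr1', 'LN': '248956422'})."""
    record_type, _, rest = line.rstrip("\n").partition("\t")
    if record_type == "@CO":
        # Comment lines carry free text rather than KEY:VALUE pairs.
        return "CO", {"text": rest}
    pairs = (field.split(":", 1) for field in rest.split("\t") if field)
    return record_type[1:], dict(pairs)


def parse_alignment_line(line):
    """Map the 11 mandatory columns to names; collect optional tags separately."""
    cols = line.rstrip("\n").split("\t")
    record = {name: (int(value) if name in INT_FIELDS else value)
              for name, value in zip(SAM_FIELDS, cols)}
    tags = {}
    for tag in cols[11:]:
        key, value_type, value = tag.split(":", 2)
        tags[key] = int(value) if value_type == "i" else value
    record["tags"] = tags
    return record


# Toy record modeled on the example above (sequence and qualities shortened).
record = parse_alignment_line("read1\t0\tchr1\t100\t60\t50M\t*\t0\t0\tACGT\tFFFF\tNM:i:0")
print(record["RNAME"], record["POS"], record["tags"])
```

Note that POS in SAM text is 1-based, whereas pysam's `reference_start` is 0-based.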
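# View the first alignments as SAM text (no header)
samtools view input.bam | head
# View alignments with the header included
samtools view -h input.bam | head -100
# Print the header only
samtools view -H input.bam
# View a region (requires a coordinate-sorted BAM with an index, e.g. from samtools index)
samtools view input.bam chr1:1000-2000
# Count alignment records (secondary and supplementary alignments are included)
samtools view -c input.bam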
Goal: Convert between SAM (text), BAM (binary), and CRAM (reference-compressed) alignment formats.
Approach: Use samtools view with format flags (-b for BAM, -C for CRAM, -h for SAM with header). CRAM requires a reference FASTA with -T.
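# BAM to SAM (keep the header)
samtools view -h -o output.sam input.bam
# SAM to BAM
samtools view -b -o output.bam input.sam
# BAM to CRAM (reference FASTA required)
samtools view -C -T reference.fa -o output.cram input.bam
# CRAM to BAM
samtools view -b -T reference.fa -o output.bam input.cram
# SAM to BAM via standard output
samtools view -b input.sam > output.bam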
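When conversions are scripted, the choice of format flag can be derived from the output file extension. The sketch below only builds the argument list for the `samtools view` commands shown above and does not run anything; `samtools_convert_args` is a hypothetical helper, and the rule that any CRAM input or output needs `-T` simply mirrors those commands.

```python
from pathlib import Path

# Output extension -> samtools view format flag, as used in the commands above.
FORMAT_FLAGS = {".sam": "-h", ".bam": "-b", ".cram": "-C"}


def samtools_convert_args(src, dst, reference=None):
    """Build a `samtools view` argument list for SAM/BAM/CRAM conversion."""
    dst_suffix = Path(dst).suffix.lower()
    if dst_suffix not in FORMAT_FLAGS:
        raise ValueError(f"unsupported output extension: {dst_suffix!r}")
    args = ["samtools", "view", FORMAT_FLAGS[dst_suffix]]
    uses_cram = ".cram" in (dst_suffix, Path(src).suffix.lower())
    if uses_cram:
        # Assumption: follow the examples above and always pass the reference for CRAM.
        if reference is None:
            raise ValueError("CRAM input or output needs a reference FASTA (-T)")
        args += ["-T", str(reference)]
    return args + ["-o", str(dst), str(src)]


print(samtools_convert_args("input.bam", "output.cram", "reference.fa"))
```

The returned list can be handed to `subprocess.run(args, check=True)` once samtools is confirmed to be on the PATH.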
| Flag | Decimal | Meaning |
|---|---|---|
| 0x1 | 1 | Paired |
| 0x2 | 2 | Proper pair |
| 0x4 | 4 | Unmapped |
| 0x8 | 8 | Mate unmapped |
| 0x10 | 16 | Reverse strand |
| 0x20 | 32 | Mate reverse strand |
| 0x40 | 64 | First in pair |
| 0x80 | 128 | Second in pair |
| 0x100 | 256 | Secondary alignment |
| 0x200 | 512 | Failed QC |
| 0x400 | 1024 | PCR duplicate |
| 0x800 | 2048 | Supplementary |
samtools flags 147
# 0x93 147 PAIRED,PROPER_PAIR,REVERSE,READ2
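The FLAG field is a bitwise OR of the values in the table, so decoding it is a matter of testing each bit. A minimal sketch that reproduces the `samtools flags` breakdown in plain Python follows; `decode_flag` is a hypothetical helper, and the short names are the ones samtools prints.

```python
# (bit, name) pairs in ascending bit order, matching the flag table above.
FLAG_NAMES = [
    (0x1, "PAIRED"), (0x2, "PROPER_PAIR"), (0x4, "UNMAP"), (0x8, "MUNMAP"),
    (0x10, "REVERSE"), (0x20, "MREVERSE"), (0x40, "READ1"), (0x80, "READ2"),
    (0x100, "SECONDARY"), (0x200, "QCFAIL"), (0x400, "DUP"),
    (0x800, "SUPPLEMENTARY"),
]


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_NAMES if flag & bit]


print(hex(147), 147, ",".join(decode_flag(147)))
# prints: 0x93 147 PAIRED,PROPER_PAIR,REVERSE,READ2
```

In pysam the same information is exposed through boolean properties such as `read.is_paired` and `read.is_reverse`, so explicit bit tests are mainly useful when working with raw SAM text.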
| Op | Description |
|---|---|
| M | Alignment match (can be mismatch) |
| I | Insertion to reference |
| D | Deletion from reference |
| N | Skipped region (introns in RNA-seq) |
| S | Soft clipping |
| H | Hard clipping |
| = | Sequence match |
| X | Sequence mismatch |
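Example: 50M2I30M = 50 aligned bases (matches or mismatches), a 2-base insertion relative to the reference, then 30 aligned bases. The read covers 82 query bases and 80 reference bases.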
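The operations in the table differ in whether they consume bases of the read, the reference, or both, which determines how long an alignment is on each. The sketch below parses a CIGAR string and sums both lengths. The helper names are hypothetical; the regular expression also accepts the padding operation `P` from the SAM specification, which is not listed in the table above.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")
QUERY_CONSUMING = set("MIS=X")  # operations that advance along the read
REF_CONSUMING = set("MDN=X")    # operations that advance along the reference


def parse_cigar(cigar):
    """'50M2I30M' -> [(50, 'M'), (2, 'I'), (30, 'M')]; '*' (unavailable) -> []."""
    if cigar == "*":
        return []
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError(f"malformed CIGAR: {cigar!r}")
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def cigar_lengths(cigar):
    """Return (query_bases, reference_bases) consumed by the alignment."""
    ops = parse_cigar(cigar)
    query = sum(n for n, op in ops if op in QUERY_CONSUMING)
    reference = sum(n for n, op in ops if op in REF_CONSUMING)
    return query, reference


print(cigar_lengths("50M2I30M"))  # prints: (82, 80)
```

With pysam, `read.cigartuples` gives the parsed operations directly (using numeric operation codes), so a parser like this is mostly useful for raw SAM text.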
Goal: Read and manipulate alignment data programmatically in Python.
Approach: Use pysam.AlignmentFile to open BAM/CRAM files, iterate over reads, and access properties like coordinates, flags, CIGAR, and tags.
import pysam

# Open a BAM file and print the name and 0-based start position of every read
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for read in bam:
        print(f'{read.query_name}\t{read.reference_name}:{read.reference_start}')
# List the reference sequences (@SQ lines) from the header
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for sq in bam.header['SQ']:
        print(f'{sq["SN"]}: {sq["LN"]} bp')
# Print the core fields of the first read, then stop
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for read in bam:
        print(f'Name: {read.query_name}')
        print(f'Flag: {read.flag}')
        print(f'Chrom: {read.reference_name}')
        print(f'Pos: {read.reference_start}')  # 0-based (SAM POS is 1-based)
        print(f'MAPQ: {read.mapping_quality}')
        print(f'CIGAR: {read.cigarstring}')
        print(f'Seq: {read.query_sequence}')
        print(f'Qual: {read.query_qualities}')  # numeric Phred scores
        break
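The QUAL column in SAM text stores one ASCII character per base (Phred score plus 33), while `read.query_qualities` yields the numeric scores. The conversion is sketched below in plain Python; the helper names are hypothetical and the offset of 33 is the SAM convention.

```python
def qualities_from_string(qual, offset=33):
    """Decode a SAM QUAL string into a list of Phred scores."""
    return [ord(char) - offset for char in qual]


def qualities_to_string(scores, offset=33):
    """Encode Phred scores back into a SAM QUAL string."""
    return "".join(chr(score + offset) for score in scores)


def error_probability(score):
    """Phred score Q corresponds to an error probability of 10^(-Q/10)."""
    return 10 ** (-score / 10)


print(qualities_from_string("FFFF"))  # prints: [37, 37, 37, 37]
```

A score of 30 therefore corresponds to an expected error rate of 1 in 1000 for that base call.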
# Report the strand of each properly paired read
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for read in bam:
        if read.is_paired and read.is_proper_pair:
            strand = '-' if read.is_reverse else '+'
            print(f'{read.query_name} on {strand} strand')
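# Fetch reads overlapping a region; fetch() takes 0-based, half-open coordinates and needs an index
with pysam.AlignmentFile('input.bam', 'rb') as bam:
    for read in bam.fetch('chr1', 1000, 2000):
        print(read.query_name)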
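Region strings passed to samtools are 1-based and inclusive, while the start and end arguments of `fetch()` are 0-based and half-open, which is an easy source of off-by-one errors. A small conversion sketch follows; both function names are hypothetical, and only the `contig:start-end` form is handled.

```python
def region_to_fetch_args(region):
    """'chr1:1000-2000' (1-based, inclusive) -> ('chr1', 999, 2000) (0-based, half-open)."""
    contig, sep, span = region.rpartition(":")
    start, dash, end = span.partition("-")
    if not (sep and dash and contig):
        raise ValueError(f"expected contig:start-end, got {region!r}")
    # Only the start shifts by one; the inclusive end equals the exclusive end.
    return contig, int(start.replace(",", "")) - 1, int(end.replace(",", ""))


def fetch_args_to_region(contig, start, end):
    """Inverse conversion: ('chr1', 0, 1000) -> 'chr1:1-1000'."""
    return f"{contig}:{start + 1}-{end}"


print(region_to_fetch_args("chr1:1-1000"))  # prints: ('chr1', 0, 1000)
```

The resulting tuple can be unpacked into a call such as `bam.fetch(*region_to_fetch_args('chr1:1-1000'))`.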
# Convert BAM to SAM; mode 'w' writes SAM text and the header is copied from the input
with pysam.AlignmentFile('input.bam', 'rb') as infile:
    with pysam.AlignmentFile('output.sam', 'w', header=infile.header) as outfile:
        for read in infile:
            outfile.write(read)
# Convert BAM to CRAM; mode 'wc' writes CRAM and needs the reference FASTA
with pysam.AlignmentFile('input.bam', 'rb') as infile:
    with pysam.AlignmentFile('output.cram', 'wc', reference_filename='reference.fa', header=infile.header) as outfile:
        for read in infile:
            outfile.write(read)
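The conversion snippets above differ mainly in the mode string passed to `AlignmentFile`: `r` or `w` followed by nothing for SAM, `b` for BAM, or `c` for CRAM. A minimal sketch that picks the mode from a file extension is shown here; `pysam_mode` is a hypothetical helper, and it assumes files are named with conventional `.sam`, `.bam`, or `.cram` extensions.

```python
from pathlib import Path

# Extension -> format suffix appended to 'r' or 'w' in the pysam mode string.
FORMAT_CODES = {".sam": "", ".bam": "b", ".cram": "c"}


def pysam_mode(path, write=False):
    """Return the AlignmentFile mode string for a path, e.g. 'rb' or 'wc'."""
    suffix = Path(path).suffix.lower()
    if suffix not in FORMAT_CODES:
        raise ValueError(f"unrecognized alignment file extension: {suffix!r}")
    return ("w" if write else "r") + FORMAT_CODES[suffix]


print(pysam_mode("input.bam"), pysam_mode("output.cram", write=True))
# prints: rb wc
```

A generic converter could then open both files with `pysam.AlignmentFile(path, pysam_mode(path, write=...))`, still passing `reference_filename` whenever CRAM is involved.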
| Task | samtools | pysam |
|---|---|---|
| View BAM | samtools view file.bam | AlignmentFile('file.bam', 'rb') |
| View header | samtools view -H file.bam | bam.header |
| Count reads | samtools view -c file.bam | sum(1 for _ in bam) |
| Get region | samtools view file.bam chr1:1-1000 | bam.fetch('chr1', 0, 1000) |
| BAM to SAM | samtools view -h -o out.sam in.bam | Open with 'w' mode |
| SAM to BAM | samtools view -b -o out.bam in.sam | Open with 'wb' mode |
| BAM to CRAM | samtools view -C -T ref.fa -o out.cram in.bam | Open with 'wc' mode |