Count reads per gene from aligned BAM files using Subread featureCounts. Use when processing BAM files from STAR/HISAT2 to generate gene-level counts for DESeq2/edgeR.
Reference examples tested with: DESeq2 1.42+, HISAT2 2.2.1+, STAR 2.7.11+, Subread 2.0+, edgeR 4.0+, pandas 2.2+, scanpy 1.10+
Before using code patterns, verify installed versions match. If versions differ:
pip show <package> then help(module.function) to check signaturespackageVersion('<pkg>') then ?function_name to verify parameters<tool> --version then <tool> --help to confirm flagsIf code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Count reads per gene from my BAM files" → Assign aligned reads to genomic features using a GTF annotation to produce a gene-by-sample count matrix for DE analysis.
featureCounts -a genes.gtf -o counts.txt sample1.bam sample2.bamCount reads mapping to genomic features (genes, exons) from BAM files.
# Single sample
featureCounts -a annotation.gtf -o counts.txt aligned.bam
# Multiple samples (recommended - single matrix output)
featureCounts -a annotation.gtf -o counts.txt sample1.bam sample2.bam sample3.bam
# All BAMs in directory
featureCounts -a annotation.gtf -o counts.txt *.bam
# Count fragments, not reads (required for paired-end)
featureCounts -p --countReadPairs -a annotation.gtf -o counts.txt *.bam
# Check proper pairs only
featureCounts -p --countReadPairs -B -C -a annotation.gtf -o counts.txt *.bam
Flags:
-p - Input is paired-end--countReadPairs - Count fragments instead of reads-B - Only count properly paired reads-C - Don't count chimeric fragments# Unstranded (default)
featureCounts -s 0 -a annotation.gtf -o counts.txt *.bam
# Forward stranded (e.g., dUTP, NSR)
featureCounts -s 1 -a annotation.gtf -o counts.txt *.bam
# Reverse stranded (e.g., Illumina TruSeq, most common)
featureCounts -s 2 -a annotation.gtf -o counts.txt *.bam
Determining strandedness: Use infer_experiment.py from RSeQC or check library prep protocol.
# Count at gene level (default)
featureCounts -t exon -g gene_id -a annotation.gtf -o counts.txt *.bam
# Count at transcript level
featureCounts -t exon -g transcript_id -a annotation.gtf -o counts.txt *.bam
# Count CDS only
featureCounts -t CDS -g gene_id -a annotation.gtf -o counts.txt *.bam
Flags:
-t - Feature type in GTF (default: exon)-g - Meta-feature attribute (default: gene_id)# Discard multi-mappers (default, recommended for DE)
featureCounts -a annotation.gtf -o counts.txt *.bam
# Count multi-mappers (fractional)
featureCounts -M --fraction -a annotation.gtf -o counts.txt *.bam
# Count multi-mappers (full count to each location)
featureCounts -M -a annotation.gtf -o counts.txt *.bam
# Discard reads overlapping multiple features (default)
featureCounts -a annotation.gtf -o counts.txt *.bam
# Count reads overlapping multiple features
featureCounts -O -a annotation.gtf -o counts.txt *.bam
# Fractional count for overlaps
featureCounts -O --fraction -a annotation.gtf -o counts.txt *.bam
# Use multiple threads
featureCounts -T 8 -a annotation.gtf -o counts.txt *.bam
# Use less memory (slower)
featureCounts --largeBAM -a annotation.gtf -o counts.txt *.bam
featureCounts produces two files:
Geneid Chr Start End Strand Length sample1.bam sample2.bam
GENE1 chr1 100 500 + 400 1523 1891
GENE2 chr1 1000 2000 - 1000 892 756
Status sample1.bam sample2.bam
Assigned 1523456 1678234
Unassigned_Unmapped 12345 11234
Unassigned_NoFeatures 234567 245678
# Remove first 6 columns (metadata) to get just counts
cut -f1,7- counts.txt | tail -n +2 > count_matrix.txt
import pandas as pd
counts = pd.read_csv('counts.txt', sep='\t', comment='#')
count_matrix = counts.set_index('Geneid').iloc[:, 5:] # Skip metadata columns
count_matrix.columns = [c.replace('.bam', '') for c in count_matrix.columns]
count_matrix.to_csv('count_matrix.csv')
counts <- read.table('counts.txt', header=TRUE, row.names=1, skip=1)
count_matrix <- counts[, 6:ncol(counts)] # Skip metadata columns
colnames(count_matrix) <- gsub('.bam', '', colnames(count_matrix))
Low assignment rate:
-s)Zero counts for known expressed genes:
-t exon vs -t gene)