Fast miRNA quantification with isomiR detection and A-to-I editing analysis using miRge3. Use when quantifying known miRNAs quickly or analyzing isomiR variants and RNA editing.
Reference examples tested with: numpy 1.26+, pandas 2.2+
Before using code patterns, verify installed versions match. If versions differ:
- pip show <package> then help(module.function) to check signatures
- <tool> --version then <tool> --help to confirm flags

If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
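The version check described above can also be scripted. The following is a minimal sketch that compares installed versions against the "tested with" minimums; the helper names (`parse_version`, `meets_minimum`, `check_package`) are hypothetical, and the parser only handles plain dotted numeric versions.

```python
from importlib import metadata


def parse_version(text):
    """Turn '2.2.1' into (2, 2, 1); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """True if the installed version string is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_package(name, minimum):
    """Return (installed_version or None, ok) for a distribution name."""
    try:
        installed = metadata.version(name)
    except metadata.PackageNotFoundError:
        return None, False
    return installed, meets_minimum(installed, minimum)


if __name__ == "__main__":
    # Minimums taken from the "tested with" line above
    for name, minimum in [("numpy", "1.26"), ("pandas", "2.2")]:
        print(name, check_package(name, minimum))
```

A package that is missing or older than the minimum returns `ok == False`, which is the cue to inspect the installed API before adapting the examples.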
"Quantify miRNAs with isomiR detection" → Fast miRNA annotation and quantification with isomiR variant detection and A-to-I RNA editing analysis from small RNA-seq reads.
miRge3.0 annotate -s sample.fastq -lib miRge3_libs -on human -db mirgenedb -o results/
Goal: Quantify known miRNA expression from small RNA-seq FASTQ files.
Approach: Run miRge3 annotation pipeline with adapter trimming, organism-specific libraries, and multi-sample input.
# Run miRge3 on FASTQ files
miRge3.0 annotate \
-s sample1.fastq.gz,sample2.fastq.gz \
-lib miRge3_libs \
-on human \
-db mirbase \
-o output_dir \
-a TGGAATTCTCGGGTGCCAAGG \
--threads 8
# Key options:
# -s: Input FASTQ files (comma-separated)
# -lib: Path to miRge3 library
# -on: Organism name
# -db: Database (mirbase or mirgenedb)
# -o: Output directory
# -a: 3' adapter sequence
# --threads: Number of CPU threads
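When many samples are processed from a script, it can help to assemble the command as an argument list instead of a shell string. The sketch below mirrors the flags exactly as they appear in the command above; they have not been independently verified against a given miRge3 release, so confirm them with the tool's help output. The function name `build_annotate_command` is hypothetical.

```python
import shlex


def build_annotate_command(samples, lib_path, organism, database,
                           output_dir, adapter=None, threads=None):
    """Assemble the annotate command shown above as an argument list."""
    cmd = [
        "miRge3.0", "annotate",
        "-s", ",".join(samples),  # comma-separated FASTQ files
        "-lib", lib_path,
        "-on", organism,
        "-db", database,
        "-o", output_dir,
    ]
    if adapter:
        cmd += ["-a", adapter]
    if threads:
        cmd += ["--threads", str(threads)]
    return cmd


cmd = build_annotate_command(
    ["sample1.fastq.gz", "sample2.fastq.gz"],
    "miRge3_libs", "human", "mirbase", "output_dir",
    adapter="TGGAATTCTCGGGTGCCAAGG", threads=8,
)
print(shlex.join(cmd))
# To execute: subprocess.run(cmd, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems with file names that contain spaces.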
Goal: Download organism-specific reference libraries required for miRge3 annotation.
Approach: Use miRge3 built-in download command to fetch pre-built bowtie indices and annotations.
# Download pre-built libraries
miRge3.0 --download-library human mirbase
# Libraries include:
# - Bowtie indices for miRNAs, tRNAs, rRNAs
# - miRBase or MirGeneDB annotations
# - A-to-I editing sites
Goal: Identify and quantify isomiR variants including 5'/3' additions, deletions, and internal modifications.
Approach: Enable miRge3 isomiR mode to classify reads by their deviation from canonical miRNA sequences.
# Enable isomiR analysis
miRge3.0 annotate \
-s sample.fastq.gz \
-lib miRge3_libs \
-on human \
-db mirbase \
--isomir \
-o output_dir
# IsomiRs include:
# - 5' variants (templated and non-templated)
# - 3' variants (templated and non-templated)
# - Internal modifications
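To make the idea of "deviation from the canonical sequence" concrete, here is a minimal sketch that handles templated variants only. It assumes the precursor sequence and the canonical miRNA coordinates are known, and it is not miRge3's own classification logic; the function name and the toy sequences are hypothetical.

```python
def classify_templated_isomir(precursor, canon_start, canon_end, read):
    """Return (5' offset, 3' offset) of a read relative to the canonical miRNA.

    Coordinates are 0-based, end-exclusive positions in the precursor.
    A negative 5' offset means the read starts upstream of the canonical start;
    a positive 3' offset means it extends past the canonical end.
    Returns None when the read is not an exact substring of the precursor
    (non-templated additions or internal changes are not handled here).
    """
    start = precursor.find(read)  # first exact match only
    if start == -1:
        return None
    end = start + len(read)
    return start - canon_start, end - canon_end


# Toy precursor; the canonical miRNA is precursor[3:12] == "CCCGGGTTT"
precursor = "AAACCCGGGTTTAAACCC"
print(classify_templated_isomir(precursor, 3, 12, "CCCGGGTTT"))
print(classify_templated_isomir(precursor, 3, 12, "ACCCGGGTTT"))
print(classify_templated_isomir(precursor, 3, 12, "CCCGGGTTTA"))
print(classify_templated_isomir(precursor, 3, 12, "CCCGGGTTTG"))
```

Because `str.find` reports only the first match, a read that occurs more than once in the precursor would need extra handling.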
Goal: Detect adenosine-to-inosine RNA editing events in miRNA sequences.
Approach: Enable miRge3 A-to-I detection mode which identifies editing sites and calculates editing frequencies.
# Detect A-to-I editing
miRge3.0 annotate \
-s sample.fastq.gz \
-lib miRge3_libs \
-on human \
-db mirbase \
--AtoI \
-o output_dir
# Outputs editing sites and frequencies
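As an illustration of what an editing frequency means, the sketch below uses the common definition of edited reads divided by all reads covering the site (inosine is read as G during sequencing). This is an assumed definition for illustration, not necessarily the exact calculation miRge3 performs, and `editing_frequency` is a hypothetical helper.

```python
def editing_frequency(edited_reads, unedited_reads):
    """Fraction of reads at a site that carry the A-to-G change."""
    total = edited_reads + unedited_reads
    if total == 0:
        return float("nan")  # no coverage, frequency undefined
    return edited_reads / total


# 25 reads show G at the site, 75 show the reference A
print(editing_frequency(25, 75))
```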
| File | Description |
|---|---|
| miR.Counts.csv | Raw read counts per miRNA |
| miR.RPM.csv | RPM normalized counts |
| isomiR.Counts.csv | IsomiR-level counts |
| isomiR.summary.csv | IsomiR summary per miRNA |
| annotation.report.html | Interactive QC report |
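Before loading results, a quick check that the files listed in the table exist can save a confusing traceback later. This is a small sketch using only the file names from the table; `missing_outputs` is a hypothetical helper, and if a run writes its results into a subfolder of the output directory, pass that subfolder instead.

```python
import tempfile
from pathlib import Path

# File names copied from the output table above
EXPECTED_FILES = [
    "miR.Counts.csv",
    "miR.RPM.csv",
    "isomiR.Counts.csv",
    "isomiR.summary.csv",
    "annotation.report.html",
]


def missing_outputs(output_dir, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in output_dir."""
    out = Path(output_dir)
    return [name for name in expected if not (out / name).is_file()]


# Demo on a temporary directory that contains only one of the files
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "miR.Counts.csv").write_text("miRNA,sample1\n")
    demo_missing = missing_outputs(tmp)
print(demo_missing)
```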
Goal: Run miRge3 quantification programmatically from Python.
Approach: Call the miRge3 annotate function directly with configuration parameters instead of CLI invocation.
from mirge3.annotate import annotate

# Run programmatically
annotate(
    samples=['sample1.fastq.gz', 'sample2.fastq.gz'],
    lib_path='miRge3_libs',
    organism='human',
    database='mirbase',
    adapter='TGGAATTCTCGGGTGCCAAGG',
    output_dir='results',
    threads=8
)
Goal: Load miRge3 count matrices and isomiR tables into pandas for downstream analysis.
Approach: Read CSV output files and apply minimum count filtering to remove lowly-expressed miRNAs.
import pandas as pd

def load_mirge3_counts(output_dir):
    '''Load miRge3 count matrix'''
    counts = pd.read_csv(f'{output_dir}/miR.Counts.csv', index_col=0)
    return counts

def load_isomirs(output_dir):
    '''Load isomiR-level counts'''
    isomirs = pd.read_csv(f'{output_dir}/isomiR.Counts.csv', index_col=0)
    return isomirs

# Filter low-expressed miRNAs
def filter_low_counts(counts, min_total=10):
    '''Keep miRNAs with total count >= threshold'''
    return counts[counts.sum(axis=1) >= min_total]
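To see what the minimum-count filter does, here is a small worked example on a toy matrix. The miRNA names and counts are made up for illustration, and the layout (miRNAs as rows, samples as columns) is assumed to match the count matrix.

```python
import pandas as pd

# Toy count matrix: rows are miRNAs, columns are samples
counts = pd.DataFrame(
    {"sample1": [120, 3, 0], "sample2": [98, 4, 1]},
    index=["hsa-miR-21-5p", "hsa-miR-122-5p", "hsa-miR-9-5p"],
)

min_total = 10
row_totals = counts.sum(axis=1)          # total reads per miRNA across samples
kept = counts[row_totals >= min_total]   # same rule as the total-count filter
print(row_totals)
print(kept)
```

Only the first miRNA has a row total at or above 10 (218 versus 7 and 1), so it is the only row kept.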
Goal: Normalize and transform miRNA counts for cross-sample comparison.
Approach: Apply RPM normalization to account for library size, then log2-transform for variance stabilization.
import numpy as np

def normalize_rpm(counts):
    '''Normalize to reads per million'''
    total_per_sample = counts.sum(axis=0)
    rpm = counts / total_per_sample * 1e6
    return rpm

def log_transform(rpm, pseudocount=1):
    '''Log2 transform with pseudocount'''
    return np.log2(rpm + pseudocount)
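A worked example on toy numbers shows the effect of the two steps: each sample's RPM values sum to one million regardless of library size, and the pseudocount keeps zero counts finite after the log transform. The miRNA names and counts are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy counts: sample1 has 1000 reads in total, sample2 has 100
counts = pd.DataFrame(
    {"sample1": [600, 300, 100], "sample2": [80, 20, 0]},
    index=["miR-a", "miR-b", "miR-c"],
)

rpm = counts / counts.sum(axis=0) * 1e6   # divide each column by its library size
log_rpm = np.log2(rpm + 1)                # pseudocount of 1 maps zero counts to 0

print(rpm)
print(log_rpm.round(2))
```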
Goal: Summarize isomiR diversity metrics per canonical miRNA.
Approach: Group isomiR-level counts by parent miRNA and compute total reads, variant count, and dominant isoform.
def summarize_isomirs(isomir_counts):
    '''Summarize isomiR diversity per miRNA (expects a 'count' column)'''
    df = isomir_counts.copy()  # avoid mutating the caller's table
    # Group by canonical miRNA; expand=False yields an Index, so assignment is positional
    df['miRNA'] = df.index.str.extract(r'(hsa-\w+-\d+[a-z]*)', expand=False)
    summary = df.groupby('miRNA')['count'].agg(['sum', 'count', 'idxmax'])
    summary.columns = ['total_reads', 'n_isomirs', 'dominant_isomir']
    return summary
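If the isomiR table has one column per sample rather than a single count column, the per-sample counts can be collapsed first. The sketch below assumes that wide layout and uses invented isomiR identifiers; the real identifier format in the miRge3 output may differ, so adjust the regular expression accordingly.

```python
import pandas as pd

# Toy isomiR table: one row per isomiR, one column per sample (assumed layout)
isomirs = pd.DataFrame(
    {"sample1": [90, 10, 40], "sample2": [70, 30, 20]},
    index=[
        "hsa-miR-21-5p|iso_3p:+1",   # hypothetical identifier format
        "hsa-miR-21-5p|iso_5p:-1",
        "hsa-let-7a-5p|iso_3p:-1",
    ],
)

# Collapse samples into a single count per isomiR
table = pd.DataFrame({"count": isomirs.sum(axis=1)})
# Parent miRNA name extracted from the row label
table["miRNA"] = table.index.str.extract(r"(hsa-\w+-\d+[a-z]*)", expand=False)

summary = table.groupby("miRNA")["count"].agg(["sum", "count", "idxmax"])
summary.columns = ["total_reads", "n_isomirs", "dominant_isomir"]
print(summary)
```

Note that this pattern stops before the arm suffix, so 5p and 3p isomiRs of the same hairpin are grouped under one name such as hsa-miR-21.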