Convert raw Nanopore signal data (FAST5/POD5) to nucleotide sequences using Dorado basecaller. Covers model selection, GPU acceleration, modified base detection, and quality filtering. Use when processing raw Nanopore data before alignment. Guppy is deprecated; use Dorado for all new analyses.
Reference examples tested with: samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
<tool> --version then <tool> --help to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
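The version check can be scripted before running any of the examples. This is a minimal sketch: `report_tool_version` is a hypothetical helper name, and it assumes each tool accepts `--version`, which is worth confirming for your installation.

```bash
# Hypothetical helper: report whether a tool is on PATH and, if so, the
# first line of its version output.
# Assumption: the tool supports --version. Some tools print the version to
# stderr, so stderr is merged into stdout.
report_tool_version() {
    local tool="$1"
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "$tool: not found on PATH" >&2
        return 1
    fi
    echo "$tool: $("$tool" --version 2>&1 | head -n 1)"
}

# Example: check the tools used in this guide.
# for t in dorado samtools pod5 chopper NanoPlot; do report_tool_version "$t"; done
```

A non-zero return marks a missing tool, so the helper can gate a pipeline with `report_tool_version dorado || exit 1`.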
"Basecall my Nanopore data" → Convert raw electrical signal (FAST5/POD5) into nucleotide sequences with quality scores, optionally detecting modified bases.
dorado basecaller sup pod5/ > calls.bam (recommended), dorado basecaller sup,5mCG_5hmCG pod5/ > calls_mods.bam (with modifications)
Convert raw electrical signal from Nanopore sequencing into nucleotide sequences.
Dorado is the current production basecaller from Oxford Nanopore Technologies (ONT) and the replacement for Guppy, offering better accuracy and speed than its predecessor.
dorado basecaller sup pod5_dir/ > calls.bam
dorado basecaller fast pod5_dir/ > calls.bam
dorado basecaller hac pod5_dir/ > calls.bam
dorado basecaller sup pod5_dir/ > calls.bam
| Model | Speed | Accuracy | Use Case |
|---|---|---|---|
| fast | Fastest | Lower | Quick preview |
| hac | Medium | High | General use |
| sup | Slowest | Highest | Publication quality |
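The table maps onto a small lookup. In this sketch, `model_for_use_case` and its keywords (`preview`, `general`, `publication`) are hypothetical names chosen for illustration; only the `fast`, `hac`, and `sup` tiers come from the table.

```bash
# Hypothetical helper: translate a use case from the table into a model tier.
model_for_use_case() {
    case "$1" in
        preview)     echo fast ;;  # quick preview
        general)     echo hac ;;   # general use
        publication) echo sup ;;   # publication quality
        *) echo "unknown use case: $1" >&2; return 1 ;;
    esac
}

# Example (requires dorado and a pod5_dir/ of raw reads):
# dorado basecaller "$(model_for_use_case general)" pod5_dir/ > calls.bam
```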
dorado download --model <model_name>@<version>
dorado basecaller <model_name>@<version> pod5_dir/ > calls.bam
dorado download --list
dorado basecaller sup pod5_dir/ --emit-fastq > calls.fastq
dorado basecaller sup,5mCG_5hmCG pod5_dir/ > calls_mods.bam
dorado basecaller sup,5mCG pod5_dir/ > calls_5mc.bam
dorado basecaller sup,6mA pod5_dir/ > calls_6ma.bam
dorado basecaller sup pod5_dir/ --device cuda:0 > calls.bam
dorado basecaller sup pod5_dir/ --device cuda:0,1 > calls.bam
dorado basecaller sup pod5_dir/ --device cpu > calls.bam
dorado basecaller sup pod5_dir/ --batchsize 64 > calls.bam
dorado duplex sup pod5_dir/ > duplex.bam
dorado basecaller sup pod5_dir/ --kit-name SQK-NBD114-24 > calls.bam
dorado demux calls.bam --output-dir demuxed/ --kit-name SQK-NBD114-24
dorado basecaller sup pod5_dir/ --trim adapters > calls.bam
dorado basecaller sup pod5_dir/ --no-trim > calls_untrimmed.bam
dorado basecaller sup pod5_dir/ --resume-from calls.bam > calls_complete.bam
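A wrapper can decide whether to resume based on whether a partial BAM from an interrupted run exists. This is a sketch under stated assumptions: `run_basecall` and the `DORADO` override variable are hypothetical names, not part of Dorado. As in the command above, the resumed output goes to a different file than the partial BAM.

```bash
# Hypothetical wrapper: start a fresh run, or resume if a non-empty partial
# BAM exists. DORADO is an assumed override for the executable name, which
# also allows a dry run with DORADO=echo.
run_basecall() {
    local model="$1" input="$2" partial="$3" out="$4"
    if [ -s "$partial" ]; then
        # Write to a new file; never redirect onto the BAM being resumed from.
        "${DORADO:-dorado}" basecaller "$model" "$input" --resume-from "$partial" > "$out"
    else
        "${DORADO:-dorado}" basecaller "$model" "$input" > "$out"
    fi
}

# Example:
# run_basecall sup pod5_dir/ calls.bam calls_complete.bam
```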
Guppy is deprecated and no longer receiving updates. Use Dorado for all new analyses. Guppy examples below are only for maintaining legacy pipelines.
guppy_basecaller \
-i fast5_dir/ \
-s output_dir/ \
-c dna_r10.4.1_e8.2_400bps_sup.cfg \
--device cuda:0
guppy_basecaller \
-i fast5_dir/ \
-s output_dir/ \
-c dna_r10.4.1_e8.2_400bps_fast.cfg \
--num_callers 8 \
--cpu_threads_per_caller 4
guppy_basecaller \
-i fast5_dir/ \
-s output_dir/ \
-c dna_r10.4.1_e8.2_400bps_hac.cfg \
--device cuda:0
guppy_basecaller \
-i fast5_dir/ \
-s output_dir/ \
-c dna_r10.4.1_e8.2_400bps_sup.cfg \
--device cuda:0
guppy_basecaller --print_workflows
ls /opt/ont/guppy/data/*.cfg
guppy_basecaller \
-i fast5_dir/ \
-s output_dir/ \
-c dna_r10.4.1_e8.2_400bps_modbases_5mc_cg_sup.cfg \
--device cuda:0
guppy_basecaller \
-i fast5_dir/ \
-s output_dir/ \
-c dna_r10.4.1_e8.2_400bps_sup.cfg \
--device cuda:0 \
--barcode_kits SQK-NBD114-24
guppy_basecaller \
-i fast5_dir/ \
-s output_dir/ \
-c dna_r10.4.1_e8.2_400bps_sup.cfg \
--device cuda:0 \
--bam_out \
--index
POD5 is ONT's current file format for raw signal data and replaces FAST5.
pod5 convert fast5 fast5_dir/*.fast5 --output pod5_dir/
pod5 merge pod5_dir/*.pod5 --output merged.pod5
pod5 inspect reads input.pod5
pod5 inspect summary input.pod5
pod5 filter input.pod5 --output subset.pod5 --ids read_ids.txt
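Before converting, it helps to know which format a run directory actually holds. The sketch below checks file extensions only; `has_files` and `signal_format` are hypothetical helper names, and the check covers the top level of the directory without inspecting file contents.

```bash
# Hypothetical helper: succeed if the directory has at least one regular
# file matching the glob pattern (top level only).
has_files() {
    find "$1" -maxdepth 1 -type f -name "$2" | grep -q .
}

# Hypothetical helper: print "pod5", "fast5", "mixed", or "none".
signal_format() {
    local dir="$1"
    local pod5=no fast5=no
    has_files "$dir" '*.pod5' && pod5=yes
    has_files "$dir" '*.fast5' && fast5=yes
    case "$pod5,$fast5" in
        yes,yes) echo mixed ;;
        yes,no)  echo pod5 ;;
        no,yes)  echo fast5 ;;
        *)       echo none ;;
    esac
}

# Example: convert only when the directory holds FAST5 files.
# [ "$(signal_format run_dir/)" = fast5 ] && pod5 convert fast5 run_dir/*.fast5 --output pod5_dir/
```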
gunzip -c calls.fastq.gz | chopper -q 10 -l 500 | gzip > filtered.fastq.gz
gunzip -c calls.fastq.gz | \
awk 'BEGIN{OFS="\n"} {h=$0; getline seq; getline plus; getline qual;
split(h, a, " "); split(a[4], q, "=");
if(q[2] >= 10) print h, seq, plus, qual}' | \
gzip > q10_filtered.fastq.gz
gunzip -c calls.fastq.gz | NanoFilt -q 10 -l 500 | gzip > filtered.fastq.gz
NanoPlot --fastq calls.fastq.gz -o qc_report/ --plots hex dot
NanoPlot --bam calls.bam -o qc_report/
pycoQC -f sequencing_summary.txt -o pycoqc_report.html
seqkit stats calls.fastq.gz
awk 'NR%4==2 {sum+=length($0); count++} END {print "Reads:", count, "Mean length:", sum/count}' calls.fastq
| Model | Use |
|---|---|
| dna_r10.4.1_e8.2_400bps_fast | Quick analysis |
| dna_r10.4.1_e8.2_400bps_hac | Routine work |
| dna_r10.4.1_e8.2_400bps_sup | High accuracy |
| Model | Use |
|---|---|
| dna_r9.4.1_450bps_fast | Quick analysis |
| dna_r9.4.1_450bps_hac | Routine work |
| dna_r9.4.1_450bps_sup | High accuracy |
Goal: Run the full Nanopore basecalling pipeline from raw signal data through quality-filtered reads with a QC report.
Approach: Convert FAST5 to POD5 if needed, basecall with Dorado, convert to FASTQ, filter with chopper, and generate NanoPlot QC.
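#!/bin/bash
# Basecall raw Nanopore data, filter the reads, and produce a QC report.
# Stop on the first error, on unset variables, and on failures inside pipes.
set -euo pipefail
if [ "$#" -lt 2 ]; then
    echo "Usage: $0 <input_dir> <output_dir> [model]" >&2
    exit 1
fi
INPUT=$1
OUTPUT=$2
MODEL=${3:-sup}   # fast, hac, or sup
mkdir -p "$OUTPUT"
# Convert legacy FAST5 files to POD5 when a fast5/ subdirectory is present.
if [ -d "$INPUT/fast5" ]; then
    echo "Converting FAST5 to POD5..."
    mkdir -p "$OUTPUT/pod5"
    pod5 convert fast5 "$INPUT"/fast5/*.fast5 --output "$OUTPUT/pod5/"
    INPUT_DIR="$OUTPUT/pod5"
else
    INPUT_DIR="$INPUT"
fi
echo "Basecalling with $MODEL model..."
dorado basecaller "$MODEL" "$INPUT_DIR" > "$OUTPUT/calls.bam"
echo "Converting to FASTQ..."
samtools fastq "$OUTPUT/calls.bam" | gzip > "$OUTPUT/calls.fastq.gz"
echo "Filtering..."
# Keep reads with mean quality >= 10 and length >= 500 bp.
gunzip -c "$OUTPUT/calls.fastq.gz" | chopper -q 10 -l 500 | gzip > "$OUTPUT/filtered.fastq.gz"
echo "QC report..."
NanoPlot --fastq "$OUTPUT/filtered.fastq.gz" -o "$OUTPUT/qc/"
echo "Done!"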
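After a run, it can be worth confirming that each stage wrote something. The sketch below uses the file names written by the script above; `check_outputs` is a hypothetical helper. It only tests that files are non-empty, so a gzip file containing zero reads would still pass.

```bash
# Hypothetical post-run check for the workflow's output directory.
# Reports every missing or empty file, then returns non-zero if any failed.
check_outputs() {
    local out="$1" f status=0
    for f in calls.bam calls.fastq.gz filtered.fastq.gz; do
        if [ ! -s "$out/$f" ]; then
            echo "missing or empty: $out/$f" >&2
            status=1
        fi
    done
    if [ ! -d "$out/qc" ]; then
        echo "missing QC directory: $out/qc" >&2
        status=1
    fi
    return "$status"
}

# Example:
# check_outputs results/ && echo "All workflow outputs present"
```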
| Model | VRAM Required | Speed (R10.4.1) |
|---|---|---|
| fast | 4 GB | ~450 bases/s |
| hac | 8 GB | ~200 bases/s |
| sup | 12 GB | ~50 bases/s |
dorado basecaller sup pod5_dir/ --batchsize 32 > calls.bam
dorado basecaller fast pod5_dir/ --device cpu > calls.bam
nvidia-smi -l 1
watch -n 1 nvidia-smi
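The VRAM table above can drive an automatic choice of model tier. This is a rough sketch: `model_for_vram` is a hypothetical helper, the thresholds are the table's 4, 8, and 12 GB figures converted to MiB, and some cards report slightly less than their nominal capacity, so the cut-offs may need loosening.

```bash
# Hypothetical helper: map total GPU memory in MiB to a model tier using the
# table's thresholds. Prints "cpu" below 4 GB, meaning fall back to the fast
# model with --device cpu.
model_for_vram() {
    local mib="$1"
    if   [ "$mib" -ge 12288 ]; then echo sup
    elif [ "$mib" -ge 8192 ];  then echo hac
    elif [ "$mib" -ge 4096 ];  then echo fast
    else echo cpu
    fi
}

# Example (needs an NVIDIA driver; reads the first GPU's total memory in MiB):
# vram=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
# echo "Suggested tier: $(model_for_vram "$vram")"
```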