De novo genome assembly from Illumina short reads using SPAdes. Covers bacterial, fungal, and small eukaryotic genome assembly, as well as metagenome and transcriptome assembly modes. Use when assembling genomes from Illumina reads.
Reference examples tested with: FastQC 0.12+, MEGAHIT 1.2+, SPAdes 3.15+
Before using code patterns, verify installed versions match. If versions differ:
<tool> --version then <tool> --help to confirm flags
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
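The version check described above can be scripted. This is a minimal sketch, assuming each tool is on `PATH` and accepts `--version`; `require_tool` is a hypothetical helper name and is not part of SPAdes, MEGAHIT, or FastQC.

```bash
#!/bin/bash
# Hypothetical helper: confirm a tool is installed and print its reported version.
require_tool() {
    local tool="$1"
    if ! command -v "$tool" >/dev/null 2>&1; then
        echo "missing: $tool" >&2
        return 1
    fi
    # Some tools print their version to stderr, so merge both streams
    # and keep only the first line.
    echo "$tool: $("$tool" --version 2>&1 | head -n 1)"
}

# Usage (uncomment once the tools are installed):
# for tool in spades.py megahit fastqc; do require_tool "$tool" || exit 1; done
```

Compare the printed versions against the tested versions listed above before copying any command.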
"Assemble a genome from Illumina reads" → Build a de novo assembly from short paired-end reads using de Bruijn graph algorithms with multiple k-mer sizes.
spades.py -1 R1.fq.gz -2 R2.fq.gz -o output
SPAdes (St. Petersburg genome Assembler) uses a de Bruijn graph approach with multiple k-mer sizes for robust assembly.
conda install -c bioconda spades
spades.py -1 R1.fastq.gz -2 R2.fastq.gz -o output_dir
spades.py -s reads.fastq.gz -o output_dir
spades.py -1 R1.fastq.gz -2 R2.fastq.gz -s unpaired.fastq.gz -o output_dir
spades.py --isolate -1 R1.fq.gz -2 R2.fq.gz -o isolate_assembly
Best for single-organism isolates with uniform coverage.
spades.py --careful -1 R1.fq.gz -2 R2.fq.gz -o careful_assembly
Reduces misassemblies at the cost of speed. Recommended for small genomes. Not compatible with --isolate.
spades.py --meta -1 R1.fq.gz -2 R2.fq.gz -o meta_assembly
For mixed microbial communities with varying coverage.
spades.py --rna -1 R1.fq.gz -2 R2.fq.gz -o rna_assembly
Assembles transcripts from RNA-seq data.
spades.py --plasmid -1 R1.fq.gz -2 R2.fq.gz -o plasmid_assembly
Extracts plasmid sequences from bacterial isolates.
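The mode flags above can be selected programmatically when one script handles several sample types. This is a sketch only: `spades_mode_flag` and its sample-type labels are hypothetical names chosen for illustration, and the function only prints a flag without running SPAdes.

```bash
#!/bin/bash
# Hypothetical helper: map a sample type to the SPAdes mode flag described above.
spades_mode_flag() {
    case "$1" in
        isolate)       printf '%s\n' "--isolate" ;;
        careful)       printf '%s\n' "--careful" ;;
        metagenome)    printf '%s\n' "--meta" ;;
        transcriptome) printf '%s\n' "--rna" ;;
        plasmid)       printf '%s\n' "--plasmid" ;;
        *) echo "unknown sample type: $1" >&2; return 1 ;;
    esac
}

# Usage:
# spades.py "$(spades_mode_flag metagenome)" -1 R1.fq.gz -2 R2.fq.gz -o meta_assembly
```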
| Option | Description |
|---|---|
| -o <dir> | Output directory |
| -t <#> | Number of threads (default: 16) |
| -m <#> | Memory limit in GB (default: 250) |
| -k <#,#,...> | K-mer sizes (auto by default) |
| --careful | Reduce misassemblies |
| --isolate | Isolate mode for uniform coverage |
| --meta | Metagenome mode |
| --rna | RNA-seq assembly |
| --cov-cutoff <#> | Coverage cutoff (default: off) |
| --only-assembler | Skip error correction |
| --continue | Resume interrupted run |
spades.py \
--pe1-1 short_R1.fq.gz --pe1-2 short_R2.fq.gz \
--pe2-1 long_R1.fq.gz --pe2-2 long_R2.fq.gz \
-o output_dir
spades.py \
--pe1-1 paired_R1.fq.gz --pe1-2 paired_R2.fq.gz \
--mp1-1 mate_R1.fq.gz --mp1-2 mate_R2.fq.gz \
-o output_dir
spades.py \
-1 illumina_R1.fq.gz -2 illumina_R2.fq.gz \
--pacbio pacbio.fq.gz \
-o hybrid_assembly
# Or with Nanopore
spades.py \
-1 illumina_R1.fq.gz -2 illumina_R2.fq.gz \
--nanopore nanopore.fq.gz \
-o hybrid_assembly
SPAdes automatically selects appropriate k-mers based on read length.
# For 150bp reads
spades.py -k 21,33,55,77 -1 R1.fq.gz -2 R2.fq.gz -o output
# For 250bp reads
spades.py -k 21,33,55,77,99,127 -1 R1.fq.gz -2 R2.fq.gz -o output
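When setting -k by hand, SPAdes expects odd values below 128 in ascending order. The sketch below checks a list against those rules before a long run starts; `validate_kmers` is a hypothetical helper, not a SPAdes command.

```bash
#!/bin/bash
# Hypothetical helper: check a comma-separated k-mer list before passing it to -k.
# Rules checked: every value is an odd integer below 128, in ascending order.
validate_kmers() {
    local list="$1" prev=0 k
    local IFS=','
    [ -n "$list" ] || { echo "empty k-mer list" >&2; return 1; }
    for k in $list; do
        case "$k" in
            ''|*[!0-9]*) echo "not a number: $k" >&2; return 1 ;;
        esac
        if [ $((k % 2)) -eq 0 ]; then echo "even k: $k" >&2; return 1; fi
        if [ "$k" -ge 128 ]; then echo "k too large: $k" >&2; return 1; fi
        if [ "$k" -le "$prev" ]; then echo "not ascending: $k" >&2; return 1; fi
        prev="$k"
    done
    return 0
}

# Usage:
# validate_kmers 21,33,55,77 && spades.py -k 21,33,55,77 -1 R1.fq.gz -2 R2.fq.gz -o output
```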
output_dir/
├── scaffolds.fasta # Final scaffolds (use this)
├── contigs.fasta # Contigs before scaffolding
├── assembly_graph_with_scaffolds.gfa # Assembly graph with scaffold paths
├── spades.log # Log file
├── params.txt # Parameters used
└── K*/ # Intermediate k-mer assemblies
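A quick check that a run produced the key files listed above can catch silent failures before downstream steps. This is a sketch assuming a standard genome-mode run; `check_spades_output` is a hypothetical helper, and the file list is taken from the layout shown here.

```bash
#!/bin/bash
# Hypothetical helper: confirm the key output files exist and are non-empty.
check_spades_output() {
    local outdir="$1" f missing=0
    for f in scaffolds.fasta contigs.fasta spades.log params.txt; do
        if [ ! -s "$outdir/$f" ]; then
            echo "missing or empty: $outdir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage:
# check_spades_output output_dir || echo "Assembly incomplete; inspect spades.log"
```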
>NODE_1_length_500000_cov_50.5
NODE_1 - Contig/scaffold ID
length_500000 - Sequence length
cov_50.5 - Average k-mer coverage

# Limit memory to 32GB
spades.py -m 32 -1 R1.fq.gz -2 R2.fq.gz -o output
# Use fewer threads
spades.py -t 8 -1 R1.fq.gz -2 R2.fq.gz -o output
spades.py --continue -o output_dir
# If reads already corrected
spades.py --only-assembler -1 R1.fq.gz -2 R2.fq.gz -o output
Goal: Run end-to-end assembly pipelines for specific use cases.
Approach: Combine SPAdes in the appropriate mode with basic statistics reporting.
#!/bin/bash
set -euo pipefail
R1=$1
R2=$2
OUTDIR=$3
THREADS=${4:-16}
echo "=== Bacterial Genome Assembly ==="
# Run SPAdes in isolate mode (--isolate cannot be combined with --careful)
spades.py \
--isolate \
-t "$THREADS" \
-1 "$R1" -2 "$R2" \
-o "$OUTDIR"
# Basic stats
echo "Assembly statistics:"
grep -c "^>" ${OUTDIR}/scaffolds.fasta
seqkit stats ${OUTDIR}/scaffolds.fasta
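If seqkit is not installed, basic statistics can be derived from the SPAdes FASTA headers alone. The sketch below assumes every header follows the NODE_<id>_length_<len>_cov_<cov> pattern described earlier; `header_stats` is a hypothetical helper, and the N50 it reports comes from the header lengths rather than from counting bases.

```bash
#!/bin/bash
# Hypothetical helper: summarize an assembly from SPAdes-style FASTA headers.
# Assumes every header looks like >NODE_<id>_length_<len>_cov_<cov>.
header_stats() {
    grep '^>' "$1" \
        | awk -F'_' '{ print $4 }' \
        | sort -rn \
        | awk '{ len[NR] = $1; total += $1 }
               END {
                   if (NR == 0) { print "no sequences"; exit 1 }
                   # N50: length of the sequence at which the running sum of
                   # lengths (largest first) reaches half of the total.
                   half = total / 2; run = 0
                   for (i = 1; i <= NR; i++) {
                       run += len[i]
                       if (run >= half) { n50 = len[i]; break }
                   }
                   printf "sequences=%d total_bp=%d largest=%d N50=%d\n", NR, total, len[1], n50
               }'
}

# Usage:
# header_stats output_dir/scaffolds.fasta
```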
#!/bin/bash
set -euo pipefail
R1=$1
R2=$2
OUTDIR=$3
spades.py \
--meta \
-t 32 \
-m 200 \
-1 "$R1" -2 "$R2" \
-o "$OUTDIR"
echo "Metagenome assembly complete: ${OUTDIR}/scaffolds.fasta"
spades.py \
--rna \
-t 16 \
-1 rnaseq_R1.fq.gz -2 rnaseq_R2.fq.gz \
-o transcriptome_assembly
| Assembler | Best For |
|---|---|
| SPAdes | Small genomes, bacteria, fungi |
| MEGAHIT | Metagenomes (memory efficient) |
| ABySS | Large genomes |
| Velvet | Legacy, small genomes |
| Trinity | Transcriptomes |
megahit -1 R1.fq.gz -2 R2.fq.gz -o megahit_output -t 16
-m limit
--meta mode (more memory efficient)
--careful mode
--only-assembler if reads pre-corrected