Metagenomic assembly with MEGAHIT, contig binning with MetaBAT2, and MAG quality assessment with CheckM. Includes binning signals, multi-sample strategy, and MIMAG quality tiers.
reads → MEGAHIT assembly → filter ≥500 bp contigs
→ map reads back → jgi_summarize_bam_contig_depths
→ MetaBAT2/CONCOCT/MaxBin2 binning
→ DAS_Tool dereplication
→ CheckM quality assessment
→ filter HQ MAGs (≥90% complete, <5% contamination)
megahit \
-1 decontam_1.fastq.gz \
-2 decontam_2.fastq.gz \
-o megahit_assembly/ \
--min-contig-len 500 \
-t 16 \
-m 0.5 # use at most 50% of available RAM
# Check stats
seqkit stats megahit_assembly/final.contigs.fa
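Key parameter: --k-list 21,29,39,59,79,99,119,141 (default). Iterating over multiple k-mer sizes handles extreme coverage variation (a 1x rare organism and a 1000x abundant organism in the same community): small k values recover low-coverage genomes, while large k values resolve repeats in high-coverage ones.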
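
For the assembly stats check above, the following is a minimal sketch of how contig count, total length, and N50 could be computed with awk and sort alone, for example on a machine without seqkit. The function name `fasta_n50` is hypothetical and not part of MEGAHIT or seqkit.

```shell
# Sketch: contig count, total length and N50 from an uncompressed FASTA file.
# fasta_n50 is a hypothetical helper name used for illustration only.
fasta_n50() {
  # Step 1: emit one sequence length per line (multi-line records are summed).
  awk '/^>/ { if (len) print len; len = 0; next }
       { len += length($0) }
       END { if (len) print len }' "$1" |
  # Step 2: sort lengths from longest to shortest.
  sort -rn |
  # Step 3: walk down the sorted lengths until half of the total is covered.
  awk '{ lens[NR] = $1; total += $1 }
       END {
         half = total / 2
         for (i = 1; i <= NR; i++) {
           cum += lens[i]
           if (cum >= half) {
             printf "contigs=%d total_bp=%d N50=%d\n", NR, total, lens[i]
             exit
           }
         }
       }'
}
```

Usage would look like `fasta_n50 megahit_assembly/final.contigs.fa`. N50 here is the length of the contig at which the cumulative length of the sorted contigs first reaches half of the assembly.
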
# Index assembly and map reads back
bowtie2-build megahit_assembly/final.contigs.fa contigs_index
bowtie2 -x contigs_index -1 decontam_1.fastq.gz -2 decontam_2.fastq.gz \
-p 16 --no-unal | samtools sort -@ 8 -o contigs_mapped.bam
samtools index contigs_mapped.bam
# Generate depth file (required by MetaBAT2)
jgi_summarize_bam_contig_depths \
--outputDepth depths.txt \
contigs_mapped.bam
# Output columns: contigName, contigLen, totalAvgDepth, then one depth and one depth-variance column per BAM file
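
To see how much of the assembly survives a binning length cutoff, the depth file can be summarised directly. This is a sketch only: `depth_summary` is a hypothetical helper, and it assumes the tab-separated layout described above, with a header line, the contig length in column 2, and the total average depth in column 3.

```shell
# Sketch: summarise a jgi_summarize_bam_contig_depths table.
# depth_summary is a hypothetical helper; $1 = depth file, $2 = minimum contig length (default 1500).
depth_summary() {
  awk -F'\t' -v min="${2:-1500}" '
    NR == 1 { next }                       # skip the header line
    { n++; bp += $2 }                      # all contigs
    $2 >= min {                            # contigs passing the length cutoff
      kept++; kept_bp += $2
      depth_bp += $2 * $3                  # length-weighted depth
    }
    END {
      if (n == 0 || kept == 0) { print "no contigs pass the length cutoff"; exit }
      printf "contigs=%d kept=%d kept_bp_fraction=%.3f mean_depth=%.2f\n", n, kept, kept_bp / bp, depth_bp / kept_bp
    }' "$1"
}
```

For example, `depth_summary depths.txt 1500` reports the fraction of assembled bases in contigs of at least 1500 bp and their length-weighted mean depth.
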
# MetaBAT2 (--minClsSize 100000 sets a minimum bin size of 100 kb)
metabat2 \
-i megahit_assembly/final.contigs.fa \
-a depths.txt \
-o bins/bin \
--minContig 1500 \
--minClsSize 100000 \
-t 8 \
--saveCls
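# Run all three binners, then combine with DAS_Tool.
# -i expects one contig-to-bin table per binner (tab-separated: contig ID, bin ID),
# not bin directories; the .tsv file names below are placeholders.
DAS_Tool \
-i metabat2.contigs2bin.tsv,concoct.contigs2bin.tsv,maxbin2.contigs2bin.tsv \
-l MetaBAT2,CONCOCT,MaxBin2 \
-c megahit_assembly/final.contigs.fa \
-o dastool_output/ \
-t 8 \
--write_bins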
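
DAS_Tool takes its bin sets as two-column tables that map each contig to a bin. DAS_Tool ships a helper script for this conversion; the sketch below is a hypothetical stand-in (`fasta_to_contig2bin` is not a DAS_Tool command) that shows the idea, assuming each bin is a FASTA file named `<bin>.<ext>` inside one directory.

```shell
# Sketch: build a contig-to-bin table from a directory of bin FASTA files.
# fasta_to_contig2bin is a hypothetical helper; $1 = bin directory, $2 = file extension (default fa).
fasta_to_contig2bin() {
  bin_dir=$1
  ext=${2:-fa}
  for f in "$bin_dir"/*."$ext"; do
    [ -e "$f" ] || continue                  # no matching files: skip the unexpanded glob
    bin=$(basename "$f" ".$ext")             # bin.1.fa -> bin.1
    # Print "<contig ID><TAB><bin ID>" for every header, dropping any description after the ID.
    awk -v bin="$bin" '/^>/ { sub(/^>/, ""); print $1 "\t" bin }' "$f"
  done
}
```

A call such as `fasta_to_contig2bin bins fa > metabat2.contigs2bin.tsv` would produce one table per binner, where the output file name is only a placeholder.
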
Binning signals used by MetaBAT2: tetranucleotide frequency (TNF) and per-contig coverage depth, the latter read from depths.txt.
checkm lineage_wf bins/ checkm_out/ -x fa -t 16 --pplacer_threads 4
checkm qa checkm_out/lineage.ms checkm_out/ -o 2 > checkm_summary.txt
| Tier | Completeness | Contamination |
|---|---|---|
| High quality | ≥ 90% | < 5% |
| Medium quality | ≥ 50% | < 10% |
| Low quality | < 50% | < 10% |
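
The tiers can be applied to a table of bins with a short awk filter. This is a sketch under stated assumptions: `mimag_tiers` is a hypothetical helper, the input is a three-column TSV (bin, completeness, contamination) that would first have to be extracted from the CheckM summary, and any bin with 10% contamination or more is reported as `fail` because it falls outside the tiers above. Only the completeness and contamination thresholds are checked.

```shell
# Sketch: assign a quality tier to each bin from completeness and contamination.
# mimag_tiers is a hypothetical helper; it reads "bin<TAB>completeness<TAB>contamination" on stdin.
mimag_tiers() {
  awk -F'\t' '
    function tier(comp, cont) {
      if (cont >= 10) return "fail"                 # assumption: outside all listed tiers
      if (comp >= 90 && cont < 5) return "high"
      if (comp >= 50) return "medium"               # contamination is already < 10 here
      return "low"
    }
    { print $1 "\t" tier($2 + 0, $3 + 0) }'
}
```

For example, `printf 'bin.1\t95.2\t1.1\n' | mimag_tiers` labels `bin.1` as `high`.
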
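CheckM estimates completeness and contamination from conserved single-copy marker genes, each expected exactly once in a complete genome (on the order of 100 for the domain-level bacterial set; lineage-specific sets are often larger). Missing markers lower completeness, and markers found in multiple copies raise contamination. The marker set for each bin is selected by its phylogenetic placement in a reference genome tree.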
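
The arithmetic behind marker-based estimates can be illustrated with a simplified sketch. It is not CheckM's actual implementation, which groups markers into collocated sets before averaging; here every marker is weighted equally. The helper name `marker_estimate` and the input format (marker ID and copy count per line, tab-separated) are assumptions made for illustration.

```shell
# Sketch: simplified completeness/contamination from single-copy marker counts.
# marker_estimate is a hypothetical helper; it reads "marker<TAB>copies" on stdin.
marker_estimate() {
  awk -F'\t' '
    { total++ }
    $2 >= 1 { present++ }                 # marker found at least once
    $2 > 1  { extra += $2 - 1 }           # every copy beyond the first counts as contamination
    END {
      if (total == 0) { print "no markers"; exit }
      printf "completeness=%.1f contamination=%.1f\n", 100 * present / total, 100 * extra / total
    }'
}
```

With 10 markers, of which one is missing and one is present in three copies, this yields 90% completeness (9 of 10 found) and 20% contamination (2 extra copies over 10 markers).
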
| Binner | Signal used | Strengths |
|---|---|---|
| MetaBAT2 | TNF + coverage | Fast, widely used |
| CONCOCT | TNF + coverage (PCA) | Good for low-coverage data |
| MaxBin2 | Marker gene abundance + coverage | Marker-guided |
| DAS_Tool | Bin sets from the binners above | Selects a non-redundant, best-scoring bin set from the inputs |