Taxonomic profiling of shotgun metagenomes: host decontamination, Kraken2 classification, Bracken abundance re-estimation
| Feature | 16S amplicon | Shotgun |
|---|---|---|
| Resolution | Genus level | Species/strain level |
| Functional info | No | Yes |
| Host contamination | N/A | Major issue |
| Cost/sample | ~$30-80 | ~$200-500 |
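Tool: Bowtie2, aligning reads against a host reference index; --un-conc-gz writes the read pairs that do not align concordantly to the host (the non-host pairs) as gzipped FASTQ.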
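A minimal sketch of how such a Bowtie2 call might be assembled from Python. The index prefix `host_index/GRCh38` and the input file names are placeholders (assumptions, not from the source); `-x`, `-1`, `-2`, `-p`, `-S` and `--un-conc-gz` are standard Bowtie2 options. Bowtie2 replaces the `%` in the `--un-conc-gz` path with `1` and `2`, one file per mate.

```python
import shlex


def bowtie2_host_removal_cmd(host_index, reads_1, reads_2,
                             out_pattern="decontam_%.fastq.gz", threads=16):
    """Build a Bowtie2 command that keeps pairs not aligning concordantly to the host."""
    return [
        "bowtie2",
        "-x", host_index,              # prefix of the host Bowtie2 index (placeholder)
        "-1", reads_1,                 # forward reads
        "-2", reads_2,                 # reverse reads
        "-p", str(threads),            # alignment threads
        "--un-conc-gz", out_pattern,   # '%' becomes 1 / 2 in the output names
        "-S", "/dev/null",             # the host alignments themselves are discarded
    ]


# Hypothetical sample file names, for illustration only.
cmd = bowtie2_host_removal_cmd("host_index/GRCh38",
                               "sample_R1.fastq.gz", "sample_R2.fastq.gz")
print(shlex.join(cmd))
```

On a machine with Bowtie2 installed, the list could be passed to `subprocess.run(cmd, check=True)`.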
Typical host contamination: gut 1-10%, skin 10-30%, BAL 40-80%, blood 60-95%.
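These host fractions translate directly into lost sequencing depth. The sketch below estimates how many read pairs would remain after host removal; the 20 million pair starting depth is an arbitrary example value, and the fractions are the upper ends of the ranges quoted above.

```python
# Upper ends of the typical host fractions quoted above.
HOST_FRACTION_UPPER = {"gut": 0.10, "skin": 0.30, "BAL": 0.80, "blood": 0.95}


def microbial_pairs(total_pairs, host_fraction):
    """Estimate read pairs left after removing a given fraction of host reads."""
    if not 0.0 <= host_fraction <= 1.0:
        raise ValueError("host_fraction must be between 0 and 1")
    return round(total_pairs * (1.0 - host_fraction))


TOTAL_PAIRS = 20_000_000  # example sequencing depth, not from the source
for sample_type, fraction in HOST_FRACTION_UPPER.items():
    remaining = microbial_pairs(TOTAL_PAIRS, fraction)
    print(f"{sample_type}: about {remaining:,} non-host read pairs")
```

For high-host sample types, this kind of estimate can help decide whether deeper sequencing or host depletion before library preparation is worth considering.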
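Kraken2 maps each k-mer in a read to the lowest common ancestor (LCA) of all reference genomes that contain that k-mer, then classifies the read from those k-mer hits.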
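To illustrate the LCA idea only, here is a small sketch over a toy taxonomy. The `PARENT` map is a hand-written subset chosen for the example, and the functions are hypothetical helpers, not Kraken2 code; Kraken2 itself looks up precomputed LCAs in its database and then scores paths through the taxonomy.

```python
# Toy taxonomy: child -> parent. A hand-written subset for illustration.
PARENT = {
    "root": None,
    "Enterobacteriaceae": "root",
    "Escherichia": "Enterobacteriaceae",
    "Salmonella": "Enterobacteriaceae",
    "Escherichia coli": "Escherichia",
    "Escherichia fergusonii": "Escherichia",
    "Salmonella enterica": "Salmonella",
}


def lineage(taxon):
    """Return the path from a taxon up to the root, taxon first."""
    path = []
    while taxon is not None:
        path.append(taxon)
        taxon = PARENT[taxon]
    return path


def lca(taxa):
    """Lowest common ancestor of one or more taxa in the toy taxonomy."""
    taxa = list(taxa)
    common = set(lineage(taxa[0]))
    for taxon in taxa[1:]:
        common &= set(lineage(taxon))
    # Walking up from any member, the first shared node is the deepest one.
    for node in lineage(taxa[0]):
        if node in common:
            return node
    raise ValueError("taxa share no ancestor")


# A k-mer found in two Escherichia species resolves only to the genus.
print(lca(["Escherichia coli", "Escherichia fergusonii"]))
# A k-mer shared with Salmonella resolves only to the family.
print(lca(["Escherichia coli", "Salmonella enterica"]))
```

The more genomes share a k-mer, the higher in the tree its LCA sits, which is why many reads end up above species level.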
```bash
kraken2 --db kraken2_db/ --paired --gzip-compressed --threads 16 \
  --confidence 0.1 --minimum-hit-groups 3 \
  --report kraken2_report.txt --output kraken2_output.txt \
  decontam_1.fastq.gz decontam_2.fastq.gz
```
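Key parameters:
- --confidence 0.1: minimum fraction of a read's k-mers that must support the assigned taxon; reads below the threshold are moved up the taxonomy until it is met, or left unclassified (reduces false assignments).
- --minimum-hit-groups 3: minimum number of minimizer hit groups required before a read is classified.

Report columns (tab-separated): % of reads in clade, reads in clade, reads assigned directly to the taxon, rank code (U/R/D/K/P/C/O/F/G/S), NCBI taxonomy ID, scientific name (indented by depth).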
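The report is easy to load for quick checks. The sketch below assumes the standard six-column, tab-separated Kraken2 report (no `--report-minimizer-data` columns); `ReportRow` and `parse_kraken2_report` are hypothetical names, and the sample rows are invented for illustration.

```python
import io
from typing import NamedTuple


class ReportRow(NamedTuple):
    percent_clade: float   # % of reads in the clade rooted at this taxon
    reads_clade: int       # reads in the clade
    reads_direct: int      # reads assigned directly to this taxon
    rank: str              # rank code, e.g. S, G, F
    taxid: int             # NCBI taxonomy ID
    name: str              # scientific name, indentation stripped
    depth: int             # tree depth, from two-space indentation


def parse_kraken2_report(handle):
    """Parse a standard six-column Kraken2 report into ReportRow tuples."""
    rows = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6:
            continue  # skip blank or non-standard lines
        pct, clade, direct, rank, taxid, raw_name = fields
        indent = len(raw_name) - len(raw_name.lstrip(" "))
        rows.append(ReportRow(float(pct), int(clade), int(direct),
                              rank.strip(), int(taxid),
                              raw_name.strip(), indent // 2))
    return rows


# Invented example rows; counts and indentation are illustrative only.
SAMPLE = (
    " 10.00\t100\t100\tU\t0\tunclassified\n"
    " 90.00\t900\t5\tR\t1\troot\n"
    " 60.00\t600\t20\tG\t561\t    Escherichia\n"
    " 50.00\t500\t500\tS\t562\t      Escherichia coli\n"
)

rows = parse_kraken2_report(io.StringIO(SAMPLE))
species = [row for row in rows if row.rank == "S"]
print(species)
```

With a real file, `open("kraken2_report.txt")` would replace the `io.StringIO` wrapper.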
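Bracken redistributes reads that Kraken2 assigned to higher-rank LCA nodes (genus and above) down to the species level, using a k-mer distribution built from the same database for the given read length.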
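The redistribution can be pictured as a Bayes-style weighting. The following is a deliberately simplified sketch, not Bracken's implementation: it splits one genus-level read count among that genus's species in proportion to each species' directly assigned reads multiplied by an assumed probability that a read from that species would stop at the genus. All names and numbers are hypothetical.

```python
def redistribute(genus_reads, species_reads, p_genus_given_species):
    """Split genus-level reads among species and return new species estimates.

    genus_reads: reads stuck at the genus node.
    species_reads: {species: reads assigned directly to that species}.
    p_genus_given_species: {species: assumed probability that a read from
        this species is classified only to genus level}.
    """
    weights = {sp: p_genus_given_species[sp] * n
               for sp, n in species_reads.items()}
    total = sum(weights.values())
    if total == 0:
        # Nothing to base the split on; leave the estimates unchanged.
        return {sp: float(n) for sp, n in species_reads.items()}
    return {sp: n + genus_reads * weights[sp] / total
            for sp, n in species_reads.items()}


# Invented example: 100 genus-level reads, two species below that genus.
estimates = redistribute(
    genus_reads=100,
    species_reads={"species A": 300, "species B": 100},
    p_genus_given_species={"species A": 0.25, "species B": 0.5},
)
print(estimates)
```

Species B has fewer direct reads but loses a larger share of its reads to the genus node, so it receives more of the genus reads per direct read than species A does.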
```bash
bracken -d kraken2_db/ -i kraken2_report.txt \
  -o bracken_species.txt -w bracken_species_report.txt \
  -r 150 -l S -t 10
```
| Parameter | Meaning |
|---|---|
| -r 150 | Read length (match your data) |
| -l S | Level: S=species, G=genus, F=family |
| -t 10 | Min reads threshold at species level |
Use the fraction_total_reads column of the Bracken output (bracken_species.txt) for downstream diversity and differential abundance analysis.
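As one example of such downstream use, the sketch below reads `fraction_total_reads` from a Bracken output table and computes the Shannon diversity index. The header matches Bracken's standard output columns; the two rows, including the species names and IDs, are invented placeholders, and the helper names are hypothetical.

```python
import csv
import io
import math


def read_fractions(handle):
    """Return {species name: fraction_total_reads} from a Bracken output table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["name"]: float(row["fraction_total_reads"]) for row in reader}


def shannon_index(fractions):
    """Shannon diversity (natural log) of relative abundances."""
    fractions = [f for f in fractions if f > 0]
    total = sum(fractions)  # renormalise in case fractions do not sum to 1
    return -sum((f / total) * math.log(f / total) for f in fractions)


# Invented two-species table; names, IDs and counts are placeholders.
SAMPLE = (
    "name\ttaxonomy_id\ttaxonomy_lvl\tkraken_assigned_reads\t"
    "added_reads\tnew_est_reads\tfraction_total_reads\n"
    "Species A\t1001\tS\t400\t100\t500\t0.5\n"
    "Species B\t1002\tS\t450\t50\t500\t0.5\n"
)

fractions = read_fractions(io.StringIO(SAMPLE))
shannon = shannon_index(fractions.values())
print(f"Shannon index: {shannon:.4f}")
```

With a real run, `open("bracken_species.txt")` would replace the `io.StringIO` wrapper.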