Shotgun metagenomics workflow with host-depletion-aware QC, taxonomic profiling, functional profiling, AMR follow-up, and reproducible community output tables.

Reference examples assume:
- fastp 0.23+
- kraken2 2.1+
- bracken 2.8+
- metaphlan 4+
- humann 3.9+

Verify the environment first:
kraken2 --version, bracken -v, metaphlan --version, humann --version

Use this skill for shotgun metagenomics when the user needs QC, taxonomic or functional profiling, or AMR follow-up.

Tool notes:
- kraken2 + bracken is a common pragmatic route for taxonomic profiling
- humann for functional profiling

Expected outputs:
- results/taxonomy/bracken_species.tsv
- results/taxonomy/bracken_genus.tsv
- results/function/pathabundance.tsv
- results/amr/amr_summary.tsv
- qc/read_processing_summary.tsv

# Read QC and adapter trimming
fastp \
  -i sample_R1.fastq.gz \
  -I sample_R2.fastq.gz \
  -o qc/sample.clean.R1.fastq.gz \
  -O qc/sample.clean.R2.fastq.gz \
  --html qc/sample.fastp.html \
  --json qc/sample.fastp.json

# Taxonomic classification of the cleaned read pairs
kraken2 \
  --db "$KRAKEN_DB" \
  --paired qc/sample.clean.R1.fastq.gz qc/sample.clean.R2.fastq.gz \
  --report results/taxonomy/sample.kraken.report \
  --output results/taxonomy/sample.kraken.out \
  --confidence 0.1
At minimum, inspect read quality, adapter content, and retained reads. For host-associated samples, remove host reads before community interpretation.
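
As a rough sketch of the retained-reads check, the helper below formats one row for qc/read_processing_summary.tsv from three read counts. The function name `retention_row`, the column layout, and passing the counts as arguments are assumptions made for illustration; the real counts would come from the fastp JSON report and from whichever host-removal tool is used.

```shell
# Hypothetical helper: print one TSV row with read counts and the fraction
# of raw reads retained after trimming and after host removal.
# Arguments: sample name, raw reads, reads after fastp, reads after host removal.
retention_row() {
  awk -v s="$1" -v raw="$2" -v clean="$3" -v nohost="$4" 'BEGIN {
    if (raw <= 0) { print "raw read count must be positive" > "/dev/stderr"; exit 1 }
    printf "%s\t%d\t%d\t%d\t%.4f\t%.4f\n", s, raw, clean, nohost, clean / raw, nohost / raw
  }'
}

# Example with made-up counts (not real data):
retention_row sample 1000000 950000 600000
```

A sharp drop between the trimmed and host-depleted fractions is expected for host-associated samples, which is why both fractions are reported against the raw read count.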
Use a k-mer or marker-based profiler. Document the database and version because abundance results depend strongly on the reference.
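
One possible way to document the database and versions is a small provenance file written next to the results. The sketch below is an assumption, not part of the original workflow: `record_provenance` and its two-column layout are hypothetical, and it only queries tools that accept `--version` (bracken uses `-v` instead, as in the environment check).

```shell
# Hypothetical helper: write the database path and tool versions to a TSV
# so every abundance table can be traced back to its reference.
# Arguments: output file, Kraken 2 database path.
record_provenance() {
  out=$1
  db=$2
  {
    printf 'kraken_db\t%s\n' "$db"
    for tool in kraken2 metaphlan humann; do
      if command -v "$tool" >/dev/null 2>&1; then
        # Keep only the first line of the version banner.
        ver=$("$tool" --version 2>&1 | head -n 1)
      else
        ver="not found"
      fi
      printf '%s\t%s\n' "$tool" "$ver"
    done
  } > "$out"
}
```

It could be called as `record_provenance results/provenance.tsv "$KRAKEN_DB"`, where the output path is again only an example.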
Convert raw classification to species or genus abundance tables suitable for cohort comparison.
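
With the kraken2 + bracken route, this conversion might look like the sketch below. The read length of 150, the per-sample output names, and the helper `bracken_fractions` are assumptions; the Bracken database files must have been built for the chosen read length. The Bracken calls are guarded so the block does nothing when the tool, the database variable, or the report is missing.

```shell
# Assumed paths and read length; adjust to the actual run.
report=results/taxonomy/sample.kraken.report
read_len=150   # assumption: must match a read length the Bracken database was built for

# Run Bracken at species (S) and genus (G) level, but only when the tool,
# the database variable, and the Kraken 2 report are all available.
if command -v bracken >/dev/null 2>&1 && [ -n "${KRAKEN_DB:-}" ] && [ -s "$report" ]; then
  bracken -d "$KRAKEN_DB" -i "$report" -r "$read_len" -l S \
    -o results/taxonomy/sample.bracken_species.tsv
  bracken -d "$KRAKEN_DB" -i "$report" -r "$read_len" -l G \
    -o results/taxonomy/sample.bracken_genus.tsv
fi

# Hypothetical helper: reduce a Bracken table to taxon name and fraction of reads
# (columns 1 and 7: name, fraction_total_reads), skipping the header line.
bracken_fractions() {
  awk -F'\t' 'NR > 1 { print $1 "\t" $7 }' "$1"
}
```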
Run pathway or AMR profiling only after confirming taxonomic QC and read retention are reasonable.
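
The taxonomic half of that check could be automated roughly as follows. This is a sketch under stated assumptions: `taxonomy_gate` is a hypothetical helper, the default limit of 50% unclassified reads is an arbitrary placeholder rather than a recommended threshold, and the report is assumed to be in the standard six-column Kraken 2 format.

```shell
# Hypothetical helper: read the unclassified percentage from a Kraken 2 report
# (rank code "U" in column 4) and fail if it exceeds a chosen limit.
# Arguments: report path, maximum unclassified percent (assumed default: 50).
taxonomy_gate() {
  awk -F'\t' -v max="${2:-50}" '
    $4 == "U" { pct = $1 + 0; found = 1 }
    END {
      if (!found) pct = 0   # a report without a "U" line is treated as 0% here
      printf "unclassified_pct\t%.2f\n", pct
      exit (pct > max ? 1 : 0)
    }' "$1"
}
```

A caller could then write `taxonomy_gate results/taxonomy/sample.kraken.report 40 && echo "proceed to functional profiling"`, choosing a limit suited to the sample type and database.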
Save per-sample tables and merged matrices with clear metadata joins.
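
One way to keep metadata joins simple is a long-format table with an explicit sample column. The sketch below assumes per-sample Bracken files named `<sample>.bracken_species.tsv`; that naming scheme and the helper `merge_bracken_long` are illustrative assumptions, not something the workflow prescribes.

```shell
# Hypothetical helper: stack per-sample Bracken tables into one long table
# (sample, taxon, fraction). The sample name is taken from the file name,
# assuming files are named <sample>.bracken_species.tsv.
merge_bracken_long() {
  printf 'sample\ttaxon\tfraction_total_reads\n'
  for f in "$@"; do
    sample=$(basename "$f" .bracken_species.tsv)
    # Column 1 is the taxon name, column 7 the fraction of total reads.
    awk -F'\t' -v s="$sample" 'NR > 1 { print s "\t" $1 "\t" $7 }' "$f"
  done
}
```

For example, `merge_bracken_long results/taxonomy/*.bracken_species.tsv > results/taxonomy/bracken_species.tsv` would produce a merged table whose `sample` column can be joined directly against a metadata sheet.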
results/
├── taxonomy/
│   ├── sample.kraken.report
│   ├── bracken_species.tsv
│   └── bracken_genus.tsv
├── function/
│   └── pathabundance.tsv
└── amr/
    └── amr_summary.tsv
qc/
├── read_processing_summary.tsv
└── sample.fastp.html