Galaxy Metagenomics Pipeline

Guided metagenomics analysis on Galaxy -- from raw reads to community composition profiles and functional annotations.

Two Main Approaches

Amplicon sequencing (16S/18S/ITS)

Targeted sequencing of a single marker gene or region (16S or 18S rRNA, or ITS). Cheaper per sample and well suited to "who's there?" questions.

Whole metagenome shotgun (WMS)

Sequences all DNA in the sample. More expensive, but answers both "who's there?" and "what are they doing?" (in the sense of functional potential).

Ask the user which type of data they have before proceeding.
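A minimal sketch of that branching step, assuming the user's answer arrives as free text. The accepted keywords and the function name `choose_pipeline` are illustrative, not part of any Galaxy API.

```python
# Hypothetical keyword sets; extend them to match how your users describe their data.
AMPLICON_TERMS = {"amplicon", "16s", "18s", "its"}
SHOTGUN_TERMS = {"shotgun", "wms", "metagenome"}


def choose_pipeline(answer: str) -> str:
    """Map a user's description of their data to 'amplicon' or 'shotgun'."""
    key = answer.strip().lower()
    if key in AMPLICON_TERMS:
        return "amplicon"
    if key in SHOTGUN_TERMS:
        return "shotgun"
    # Anything else is ambiguous: do not guess, ask again.
    raise ValueError(f"Unrecognized data type {answer!r}; ask the user again")
```

Raising on unknown input, rather than defaulting to one pipeline, keeps the "ask before proceeding" rule explicit.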

Shotgun Metagenomics Pipeline

  Raw FASTQ reads
       │
       ▼
  [1] FastQC ──── QC checkpoint
       │
       ▼
  [2] fastp ──── Trim and filter
       │
       ▼
  [3] Host depletion (optional) ──── Remove human reads if needed
       │
       ├───────────────────┐
       ▼                   ▼
  [4a] Kraken2          [4b] MetaPhlAn
  (fast, k-mer)         (marker gene)
       │                   │
       ▼                   ▼
  [5a] Bracken          Taxonomic profile
  (abundance re-est.)
       │
       ▼
  [6] HUMAnN ──── Functional profiling (pathways, gene families)
       │
       ▼
  Community composition + functional capacity
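The diagram can also be written down as a small dependency graph, which makes it easy to derive a valid run order and to bypass the optional host depletion step. This is a sketch only: the step names are labels taken from the diagram, `run_order` is a hypothetical helper, and placing HUMAnN after Bracken mirrors the drawing as an ordering choice rather than a statement about which files HUMAnN reads.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps it waits on, following the diagram.
PIPELINE = {
    "FastQC": set(),
    "fastp": {"FastQC"},
    "Host depletion": {"fastp"},
    "Kraken2": {"Host depletion"},
    "MetaPhlAn": {"Host depletion"},
    "Bracken": {"Kraken2"},
    "HUMAnN": {"Bracken"},  # ordering as drawn, not a data-flow claim
}


def run_order(pipeline, skip=frozenset()):
    """Return one valid execution order, rewiring around skipped steps."""

    def resolve(deps):
        # Replace each skipped dependency with that step's own dependencies.
        resolved = set()
        for dep in deps:
            if dep in skip:
                resolved |= resolve(pipeline[dep])
            else:
                resolved.add(dep)
        return resolved

    graph = {
        step: resolve(deps)
        for step, deps in pipeline.items()
        if step not in skip
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(run_order(PIPELINE))
    print(run_order(PIPELINE, skip={"Host depletion"}))
```

When host depletion is skipped, Kraken2 and MetaPhlAn are rewired to depend directly on fastp, so the order stays valid.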

Common Problems

Problem	Likely Cause	Fix
Most reads unclassified	Wrong database or host contamination	Try a more comprehensive database; check for host reads
Implausible species reported	Database contamination or low-confidence assignments	Raise the classifier's confidence threshold; filter low-abundance taxa
Very low diversity	Over-aggressive quality filtering	Relax fastp parameters; check for primer contamination
Batch effects in diversity	Differences in DNA extraction method	Include extraction batch as a covariate
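Two of the checks above (how many reads are unclassified, and which species survive a low-abundance filter) can be read straight from a Kraken2 report downloaded from the Galaxy history. The sketch below assumes the standard six-column, tab-separated report (percent, clade reads, direct reads, rank code, taxid, name), not the extended minimizer variant. The 0.1% cutoff and the sample rows are made up for illustration.

```python
import io


def parse_kraken2_report(handle):
    """Yield (percent, clade_reads, rank_code, name) for each report row."""
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # skip blank or malformed lines
        percent, clade_reads, _direct, rank, _taxid, name = fields[:6]
        yield float(percent), int(clade_reads), rank.strip(), name.strip()


def summarize(handle, min_percent=0.1):
    """Return (unclassified_percent, {species: clade_reads}) above the cutoff."""
    unclassified = 0.0
    species = {}
    for percent, reads, rank, name in parse_kraken2_report(handle):
        if rank == "U":
            unclassified = percent
        elif rank == "S" and percent >= min_percent:
            species[name] = reads
    return unclassified, species


# Made-up example rows, for illustration only.
SAMPLE = (
    " 20.00\t200\t200\tU\t0\tunclassified\n"
    " 80.00\t800\t0\tR\t1\troot\n"
    " 79.50\t795\t795\tS\t562\t      Escherichia coli\n"
    "  0.05\t5\t5\tS\t1280\t      Staphylococcus aureus\n"
)

if __name__ == "__main__":
    unclassified, kept = summarize(io.StringIO(SAMPLE))
    print(f"unclassified: {unclassified}%  kept species: {kept}")
```

A high unclassified percentage points at the database or host contamination rows of the table. Species that fall below the cutoff are candidates for the false positives described in the second row.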

Key Decision: Kraken2 vs MetaPhlAn

Criterion	Kraken2	MetaPhlAn
Speed	Very fast	Moderate
Accuracy (species)	Good	Better
Quantification	Read counts per taxon (re-estimate with Bracken)	Relative abundance reported directly
Database	Whole reference genomes	Clade-specific marker genes only
False positives	More	Fewer
Recommendation	Initial screening, large datasets	Publication-quality abundance
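To put the two tools side by side, Kraken2/Bracken read counts first need converting to the percent scale that MetaPhlAn reports. A minimal sketch, assuming the counts are already restricted to one taxonomic rank (for example species); `to_relative_abundance` is a hypothetical helper name.

```python
def to_relative_abundance(counts):
    """Convert {taxon: read_count} to {taxon: percent of classified reads}."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("no classified reads to normalize")
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    print(to_relative_abundance({"Taxon A": 30, "Taxon B": 10}))
```

The percentages are relative to classified reads only, so unclassified reads should be tracked separately rather than folded into this denominator.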
