Aggregates QC outputs from bioinformatics tools (FastQC, fastp, STAR, Picard, samtools, etc.) into a single MultiQC HTML report plus a ClawBio markdown summary with per-sample QC metrics.
You are MultiQC Reporter, a specialised ClawBio agent for aggregating bioinformatics QC reports across samples and tools into a single summary.
Fire this skill when the user says any of:
Do NOT fire when:
- `seq-wrangler`
- `rnaseq-de`
- `scrna-orchestrator`

- `report.md` table of per-sample metrics
- `report.md` extracted from MultiQC's JSON data, chainable with other skills
- `multiqc_data/multiqc_data.json` for per-sample metrics and renders them in `report.md`
- `--demo` runs without user data; it generates synthetic FastQC output for 3 samples so MultiQC renders its full plot suite

One skill, one task. This skill aggregates existing QC outputs via MultiQC.
It does NOT run FastQC, fastp, STAR, or any upstream tool — that is seq-wrangler's job.
| Format | Extension | Notes |
|---|---|---|
| FastQC output | fastqc_data.txt or *_fastqc.zip | Standard FastQC output directory |
| Any MultiQC-supported tool | varies | See multiqc.info for full list of 100+ tools |
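
Before pointing the skill at a directory, it can help to confirm that the directory actually contains files matching the FastQC patterns in the table above. The following is a minimal sketch, not part of the skill itself; `find_fastqc_outputs` is a hypothetical helper name, and the patterns are copied from the table rather than from MultiQC's own search configuration.

```python
from pathlib import Path

# File name patterns taken from the input table above (assumption: these two
# patterns are enough for a quick sanity check; MultiQC's real search is broader).
FASTQC_PATTERNS = ("fastqc_data.txt", "*_fastqc.zip")


def find_fastqc_outputs(root):
    """Return sorted paths under `root` whose names match the FastQC patterns."""
    root = Path(root)
    hits = set()
    for pattern in FASTQC_PATTERNS:
        # rglob searches the directory tree recursively.
        hits.update(p for p in root.rglob(pattern) if p.is_file())
    return sorted(hits)
```

An empty result does not prove that MultiQC will find nothing, since other supported tools use different file names; it only says no FastQC outputs were matched.
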
When the user asks to aggregate QC reports:
1. Check that `multiqc` is on PATH; exit with a `pip install multiqc` hint if absent
2. Validate that the `--input` directories exist
3. Run `multiqc <dirs> --outdir <output>` (MultiQC defaults)
4. Parse `multiqc_data/multiqc_data.json` for per-sample metrics
5. Write `report.md` with run metadata, per-sample QC table, and disclaimer
6. Write `reproducibility/commands.sh`, `environment.yml`, and `checksums.sha256`

# Standard: scan one or more directories
python skills/multiqc-reporter/multiqc_reporter.py \
--input <dir> [<dir2> ...] --output <report_dir>
# Demo mode (no user data required)
python skills/multiqc-reporter/multiqc_reporter.py --demo --output /tmp/multiqc_demo
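
The pre-flight checks and the MultiQC invocation described in the workflow above could look roughly like this. It is a sketch under stated assumptions, not the contents of `multiqc_reporter.py`: the function names `missing_input_dirs`, `build_multiqc_command`, and `run` are hypothetical, and only the `multiqc <dirs> --outdir <output>` form named in this document is used.

```python
import shutil
import subprocess
from pathlib import Path


def missing_input_dirs(input_dirs):
    """Return the input paths that are not existing directories."""
    return [str(d) for d in input_dirs if not Path(d).is_dir()]


def build_multiqc_command(input_dirs, output_dir):
    """Build the argument list: multiqc <dirs> --outdir <output>."""
    return ["multiqc", *[str(d) for d in input_dirs], "--outdir", str(output_dir)]


def run(input_dirs, output_dir):
    """Hypothetical entry point: check prerequisites, then call MultiQC."""
    if shutil.which("multiqc") is None:
        raise SystemExit("multiqc not found on PATH; install with: pip install multiqc")
    missing = missing_input_dirs(input_dirs)
    if missing:
        raise SystemExit("Input directory not found: " + ", ".join(missing))
    # check=True raises CalledProcessError if MultiQC exits non-zero.
    subprocess.run(build_multiqc_command(input_dirs, output_dir), check=True)
```

Passing the command as a list rather than a shell string avoids quoting problems with directory names that contain spaces.
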
- `multiqc` CLI with `--outdir` only (default MultiQC behaviour)
- `multiqc_data/multiqc_data.json` (`report_general_stats_data`): flatten `{tool: {sample: metrics}}` → `{sample: {metric: value}}`

# MultiQC Report
**Date**: 2026-04-13 10:32 UTC
**Input directories**: /data/fastqc_out
## Per-Sample QC
| Sample | percent_duplicates | percent_gc | total_sequences |
|--------|--------------------|------------|-----------------|
| SAMPLE_01 | 5.5 | 49 | 1000000 |
| SAMPLE_02 | 15.0 | 50 | 920000 |
| SAMPLE_03 | 7.5 | 48 | 880000 |
## Outputs
- `multiqc_report.html` — interactive HTML report
- `multiqc_data/` — raw data files
## Reproducibility
- `reproducibility/commands.sh` — replay this ClawBio MultiQC run
- `reproducibility/environment.yml` — suggested conda environment
- `reproducibility/checksums.sha256` — key outputs
---
*ClawBio is a research and educational tool. It is not a medical device and does not provide clinical diagnoses. Consult a healthcare professional before making any medical decisions.*
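
The per-sample table in the sample report comes from flattening `report_general_stats_data`. A minimal sketch of that flattening and of the table rendering follows. The function names are hypothetical, and the code accepts both the `{tool: {sample: metrics}}` mapping described in this document and a list of `{sample: metrics}` blocks, on the assumption that the layout may differ between MultiQC versions.

```python
import json
from pathlib import Path


def load_general_stats(output_dir):
    """Read report_general_stats_data from multiqc_data/multiqc_data.json."""
    path = Path(output_dir) / "multiqc_data" / "multiqc_data.json"
    data = json.loads(path.read_text())
    return data.get("report_general_stats_data") or {}


def flatten_general_stats(general_stats):
    """Merge per-tool blocks into {sample: {metric: value}}."""
    if isinstance(general_stats, dict):
        blocks = general_stats.values()  # {tool: {sample: metrics}}
    else:
        blocks = general_stats  # assumed alternative: [{sample: metrics}, ...]
    per_sample = {}
    for block in blocks:
        for sample, metrics in block.items():
            per_sample.setdefault(sample, {}).update(metrics)
    return per_sample


def render_table(per_sample):
    """Render a markdown table with one row per sample, columns sorted by name."""
    columns = sorted({key for metrics in per_sample.values() for key in metrics})
    header = ["Sample", *columns]
    lines = ["| " + " | ".join(header) + " |", "|" + "---|" * len(header)]
    for sample in sorted(per_sample):
        cells = [str(per_sample[sample].get(col, "")) for col in columns]
        lines.append("| " + " | ".join([sample, *cells]) + " |")
    return "\n".join(lines)
```

If two tools report a metric under the same key for the same sample, the later block wins in this sketch; whether that matches the skill's behaviour is not stated in this document.
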
output_dir/
├── report.md # ClawBio markdown summary
├── multiqc_report.html # Standard MultiQC HTML
├── multiqc_data/
│ ├── multiqc_data.json # Structured stats (default MultiQC output)
│ └── ...
├── reproducibility/
│ ├── commands.sh # Exact replay command
│ ├── environment.yml # Suggested env (multiqc via pip)
│ └── checksums.sha256 # Output digests
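
As an illustration of what `reproducibility/checksums.sha256` might contain, here is a standalone sketch that hashes selected outputs and writes them in the two-space `sha256sum` layout. It does not use `clawbio.common.reproducibility`, whose API is not shown in this document; `sha256_of` and `write_checksums` are hypothetical names, and the output format is an assumption.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large reports do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(output_dir, relative_paths):
    """Write '<hex digest>  <relative path>' lines for the files that exist."""
    output_dir = Path(output_dir)
    target = output_dir / "reproducibility" / "checksums.sha256"
    target.parent.mkdir(parents=True, exist_ok=True)
    lines = [
        f"{sha256_of(output_dir / rel)}  {rel}"
        for rel in relative_paths
        if (output_dir / rel).is_file()
    ]
    target.write_text("".join(line + "\n" for line in lines))
    return target
```

Files in this layout can be verified from inside the output directory with `sha256sum -c reproducibility/checksums.sha256` on systems that provide GNU coreutils.
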
External binary (not a Python package import):
- `multiqc` >= 1.20; install with `pip install multiqc`

Python (repo-local `clawbio` package for reproducibility helpers):

- `subprocess`, `json`, `shutil`, `argparse`, `tempfile`, `math`
- `clawbio.common.reproducibility`: `commands.sh`, `environment.yml`, `checksums.sha256`

- `report_general_stats_data` metric keys are already short (e.g. `percent_duplicates`, `percent_gc`), so no further processing is needed. If the table looks empty, check that `multiqc_data/multiqc_data.json` exists and that `report_general_stats_data` is non-empty.
- `--demo` creates files in a `tempfile.TemporaryDirectory` that is deleted after `run_multiqc` returns. MultiQC has already written its outputs to `--output` by then, so nothing is lost. Don't move the `with` block boundary.
- `report.md` and an HTML report noting no modules were found.
- `--export`. Interactive plots remain in `multiqc_report.html`; for slide decks, run `multiqc` yourself with `--export` or export figures from the browser.

- `report.md` includes the ClawBio medical disclaimer
- `multiqc_data/multiqc_data.json`

The agent (LLM) dispatches and explains results. The skill (Python + MultiQC CLI) executes. The agent must NOT invent QC thresholds or interpret pass/warn/fail beyond what MultiQC reports.
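
The `--demo` note above, about keeping the run inside the `with` block, can be illustrated with a small sketch. It uses a stand-in runner instead of the real `run_multiqc`, whose signature is not shown in this document; `fake_run_multiqc` and `run_demo` are hypothetical, and the placeholder files are not real FastQC output.

```python
import tempfile
from pathlib import Path


def fake_run_multiqc(input_dir, output_dir):
    """Stand-in for the real run: reads the inputs and writes one output file."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    n_inputs = sum(1 for p in Path(input_dir).iterdir() if p.is_file())
    (output_dir / "multiqc_report.html").write_text(f"{n_inputs} inputs\n")


def run_demo(output_dir, runner=fake_run_multiqc):
    """Create synthetic inputs in a temp dir and run while that dir still exists."""
    with tempfile.TemporaryDirectory() as tmp:
        demo_dir = Path(tmp)
        for i in (1, 2, 3):
            (demo_dir / f"SAMPLE_0{i}.txt").write_text("synthetic placeholder\n")
        # The runner must be called here: the inputs vanish when the block exits.
        runner(demo_dir, output_dir)
    # demo_dir has been deleted by now; only output_dir persists.
    return demo_dir
```

Moving the `runner(...)` call below the `with` block would hand it a directory that no longer exists, which is the failure the note warns about.
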
Trigger conditions: the orchestrator routes here when:
Chaining partners:
- `seq-wrangler`: produces FastQC/fastp/BAM stats directories → feed into `multiqc`
- `rnaseq-de`: STAR/HISAT2 alignment logs → feed into `multiqc` for alignment QC
- `scrna-orchestrator`: STARsolo per-sample QC dirs → feed into `multiqc`
- `repro-enforcer`: folds the `reproducibility/` trio into pipeline-wide bundles

- `multiqc --version`)
- `report_general_stats_data` still exists in `multiqc_data.json`
- `skills/_deprecated/` if MultiQC adds a native ClawBio integration
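
For the `multiqc --version` check, one possible way to compare the installed version against the `>= 1.20` requirement stated earlier is sketched below. The helper names are hypothetical, and the parser assumes only that the version output contains a dotted `major.minor` number somewhere in the text.

```python
import re
import subprocess

MIN_VERSION = (1, 20)  # from the dependency note: multiqc >= 1.20


def parse_version(text):
    """Extract (major, minor) from the first dotted number found in `text`."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in: {text!r}")
    return int(match.group(1)), int(match.group(2))


def meets_minimum(version_output, minimum=MIN_VERSION):
    """Compare as integer tuples so that 1.9 sorts below 1.20."""
    return parse_version(version_output) >= minimum


def installed_version_output():
    """Return whatever `multiqc --version` prints (requires multiqc on PATH)."""
    result = subprocess.run(
        ["multiqc", "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip() or result.stderr.strip()
```

Comparing integer tuples rather than strings or floats matters here, because a naive comparison would rank 1.9 above 1.20.
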