Name: Bio Read Qc Contamination Screening
Author: majiayu000


# Database locations (each path is the aligner index basename, not a FASTA file)
DATABASE	Human	/path/to/human/genome
DATABASE	Mouse	/path/to/mouse/genome
DATABASE	Ecoli	/path/to/ecoli/genome
DATABASE	PhiX	/path/to/phix/genome
DATABASE	Adapters	/path/to/adapters
DATABASE	rRNA	/path/to/rrna

# Aligner (bowtie2 recommended)
BOWTIE2	/path/to/bowtie2

# Or use BWA
# BWA	/path/to/bwa

# Threads
THREADS	8

# Download common screening databases
fastq_screen --get_genomes

# Downloads to ./FastQ_Screen_Genomes/ in the current directory (or under --outdir),
# together with a ready-to-use fastq_screen.conf
# Includes: Human, Mouse, Rat, E.coli, PhiX, Adapters, etc.

# Number of reads to sample (default 100000)
fastq_screen --subset 200000 sample.fastq.gz

# Use all reads (slow)
fastq_screen --subset 0 sample.fastq.gz

# Set threads
fastq_screen --threads 8 sample.fastq.gz

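# Paired-end data: screening one mate (e.g. R1) as single-end is usually enough
fastq_screen sample_R1.fastq.gz

# Screen both mates together (--paired is not available in every FastQ Screen
# release; confirm with fastq_screen --help before relying on it)
fastq_screen --paired sample_R1.fastq.gz sample_R2.fastq.gz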

# Generate PNG plot (default)
fastq_screen sample.fastq.gz

# No plot (text only)
fastq_screen --nograph sample.fastq.gz

# Tag each read header with the genomes it maps to (writes a tagged FASTQ file)
fastq_screen --tag sample.fastq.gz

# --filter takes one code per DATABASE line in the config, in config order
# (six codes for the six-database config above); combine it with --tag or
# run it on an already tagged file

# Keep reads that map to none of the six genomes (clean reads)
fastq_screen --tag --filter 000000 sample.fastq.gz

# Keep reads mapping uniquely to the first genome (Human), ignoring the rest
fastq_screen --tag --filter 1----- sample.fastq.gz

# Keep reads mapping uniquely to Human and to no other genome
# Human:1, all others:0
fastq_screen --tag --filter 100000 sample.fastq.gz

#Fastq_screen version: 0.15.3
Genome	#Reads_processed	#Unmapped	%Unmapped	#One_hit_one_genome	%One_hit_one_genome	#Multiple_hits_one_genome	%Multiple_hits_one_genome	#One_hit_multiple_genomes	%One_hit_multiple_genomes	Multiple_hits_multiple_genomes	%Multiple_hits_multiple_genomes
Human	100000	2000	2.00	95000	95.00	1000	1.00	1500	1.50	500	0.50
Mouse	100000	98000	98.00	100	0.10	50	0.05	1500	1.50	350	0.35
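
The results table above can be condensed to one figure per genome with a short awk filter. This is a minimal sketch, assuming the tab-delimited layout shown above (genome name in column 1, %Unmapped in column 4); `summarize_screen` is a hypothetical helper name, not part of FastQ Screen.

```shell
# summarize_screen FILE
# Print "<genome><TAB><percent mapped>" for every genome row of a
# *_screen.txt table, where percent mapped = 100 - %Unmapped (column 4).
summarize_screen() {
    awk -F'\t' '
        /^#/           { next }   # version line
        $1 == "Genome" { next }   # column header
        NF < 4         { next }   # blank lines and any short summary line
        { printf "%s\t%.2f\n", $1, 100 - $4 }
    ' "$1"
}

# Demo on the first four columns of the two rows shown above.
demo=$(mktemp)
printf '#Fastq_screen version: 0.15.3\n' > "$demo"
printf 'Genome\t#Reads_processed\t#Unmapped\t%%Unmapped\n' >> "$demo"
printf 'Human\t100000\t2000\t2.00\n' >> "$demo"
printf 'Mouse\t100000\t98000\t98.00\n' >> "$demo"
summarize_screen "$demo"   # prints Human 98.00 and Mouse 2.00, tab-separated
rm -f "$demo"
```

Note that this total includes reads that also hit other genomes, which is why Mouse shows 2.00% here even though only 0.15% of reads hit Mouse alone.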

# Screen all samples
for f in *.fastq.gz; do
    fastq_screen --outdir screen_results/ "$f"
done

# Aggregate with MultiQC
multiqc screen_results/

# Index a FASTA file
bowtie2-build reference.fa reference

# Add to config
# DATABASE	MyGenome	/path/to/reference

Genome	Purpose
Human (GRCh38)	Human samples
Mouse (GRCm39)	Mouse samples
E. coli	Bacterial contamination
PhiX	Illumina spike-in
Adapters	Adapter sequence left over from library prep
rRNA	Ribosomal RNA carry-over
Vectors	Cloning vectors
Mycoplasma	Cell culture contamination

# Download databases
fastq_screen --get_genomes

# Screen samples
fastq_screen --outdir screen_results/ --threads 8 *.fastq.gz

# Check results
multiqc screen_results/

# Screen and tag reads
fastq_screen --tag sample.fastq.gz

# Keep reads that map to Human (uniquely or multiply), ignoring the other five databases
fastq_screen --filter 3----- --tag sample.fastq.gz

# Or remove reads that share k-mers with known contaminant sequences using BBDuk
bbduk.sh in=sample.fastq.gz out=clean.fastq.gz \
    ref=contaminants.fa k=31 hdist=1

Code	Meaning
0	Did not map to genome
1	Mapped uniquely
2	Mapped more than once
3	Mapped (unique or multi)
-	Ignore this genome
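
Because a filter string needs exactly one code per DATABASE line, in config order, it can be safer to derive it from the config than to type it by hand. The sketch below assumes the config layout shown earlier (`DATABASE`, name, path, whitespace-separated); `build_filter` and `filter_all` are hypothetical helper names, not FastQ Screen commands.

```shell
# build_filter CONF TARGET CODE
# Emit a filter string with CODE for the database named TARGET and "-"
# for every other active DATABASE line. Fails if TARGET is not listed
# exactly once.
build_filter() {
    awk -v target="$2" -v code="$3" '
        $1 == "DATABASE" {
            out = out (($2 == target) ? code : "-")
            n += ($2 == target)
        }
        END { if (n != 1) exit 1; print out }
    ' "$1"
}

# filter_all CONF CODE
# Emit CODE repeated once per active DATABASE line.
filter_all() {
    awk -v code="$2" '$1 == "DATABASE" { out = out code } END { print out }' "$1"
}

# Demo with the six databases from the example config.
conf=$(mktemp)
for name in Human Mouse Ecoli PhiX Adapters rRNA; do
    printf 'DATABASE\t%s\t/path/to/%s\n' "$name" "$name" >> "$conf"
done
build_filter "$conf" Human 3   # prints 3-----
build_filter "$conf" PhiX 0    # prints ---0--
filter_all "$conf" 0           # prints 000000
rm -f "$conf"
```

Commented-out entries such as `# DATABASE ...` are skipped because their first field is not `DATABASE`, so the string length follows the databases that are actually active.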

File	Description
`*_screen.txt`	Tab-delimited results
`*_screen.png`	Visualization
`*_screen.html`	HTML report

Sample Type	Expected Pattern
Human sample	>90% Human, <1% others
Mouse sample	>90% Mouse, <1% others
Human + PhiX	>80% Human, ~10% PhiX
Contaminated	Significant % to unexpected genome

Pattern	Likely Cause
High adapter %	Library prep issue
High PhiX %	Spike-in not removed
High E.coli %	Bacterial contamination
High rRNA %	rRNA depletion failed
Multiple species	Sample swap or contamination
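
The patterns above can be checked automatically by flagging any unexpected genome whose genome-specific hits exceed a threshold. This is a rough sketch, assuming the `*_screen.txt` column order shown earlier (%One_hit_one_genome in column 6, %Multiple_hits_one_genome in column 8). The 1% default mirrors the "<1% others" rule of thumb and is an assumption to tune per project; `flag_unexpected` is a hypothetical helper name.

```shell
# flag_unexpected FILE EXPECTED [THRESHOLD]
# Print "<genome><TAB><percent>" for every genome other than EXPECTED whose
# genome-specific hits (column 6 + column 8) exceed THRESHOLD (default 1).
flag_unexpected() {
    awk -F'\t' -v expected="$2" -v limit="${3:-1}" '
        /^#/ || $1 == "Genome" || NF < 8 { next }
        $1 != expected && ($6 + $8) > limit {
            printf "%s\t%.2f\n", $1, $6 + $8
        }
    ' "$1"
}

# Demo on the two result rows shown earlier.
demo=$(mktemp)
printf 'Human\t100000\t2000\t2.00\t95000\t95.00\t1000\t1.00\t1500\t1.50\t500\t0.50\n' > "$demo"
printf 'Mouse\t100000\t98000\t98.00\t100\t0.10\t50\t0.05\t1500\t1.50\t350\t0.35\n' >> "$demo"
flag_unexpected "$demo" Human       # prints nothing: Mouse-specific hits are 0.15%
flag_unexpected "$demo" Human 0.1   # prints Mouse 0.15, tab-separated
rm -f "$demo"
```

Using only genome-specific hits keeps reads shared between related genomes, such as conserved human and mouse sequence, from being reported as contamination.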

Bio Read Qc Contamination Screening

Contamination Screening

FastQ Screen Overview

Basic Usage

Configuration File

Pre-built Databases

Screening Options

Output Options

Filter Codes

Output Files

Results Format

Interpreting Results

Expected Results by Sample Type

Common Issues

MultiQC Integration

Custom Database Setup

Create Bowtie2 Index

Common Databases to Include

Example Workflows

Standard Screening

Remove Contamination
