Convert between sequence file formats (FASTA, FASTQ, GenBank, EMBL) using Biopython Bio.SeqIO. Use when changing file formats or preparing data for different tools.
Reference examples tested with: BioPython 1.83+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
pip show <package> then help(module.function) to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
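The version and signature checks described above can also be done from Python. This is a minimal sketch: the helper names `installed_version` and `describe_callable` are hypothetical, and it assumes the distribution is published under the name `biopython`.

```python
import inspect
from importlib import metadata


def installed_version(package):
    """Return the installed version string for a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def describe_callable(func):
    """Return the callable's name followed by its signature."""
    return f"{func.__name__}{inspect.signature(func)}"


if __name__ == "__main__":
    print("biopython:", installed_version("biopython"))
    try:
        from Bio import SeqIO
    except ImportError:
        print("Biopython is not importable in this environment")
    else:
        # Shows the parameters the installed SeqIO.convert actually accepts.
        print(describe_callable(SeqIO.convert))
```

Comparing the printed signature with the example being adapted is usually quicker than retrying a failing call.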
"Convert this file to a different format" → Read records in one format, optionally add missing annotations, and write in the target format.
SeqIO.convert() for direct conversion, or SeqIO.parse() + SeqIO.write() when modifications are needed (BioPython)
seqkit seq (SeqKit) for FASTA/FASTQ; samtools view for SAM/BAM/CRAM
Convert sequence files between formats using Biopython's Bio.SeqIO module.
from Bio import SeqIO
Convert between formats in a single call. This is the most efficient method when the records need no modification.
count = SeqIO.convert('input.gb', 'genbank', 'output.fasta', 'fasta')
print(f'Converted {count} records')
Parameters:
in_file - Input filename or handle
in_format - Input format string
out_file - Output filename or handle
out_format - Output format string
Returns: Number of records converted
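Because in_file and out_file accept handles as well as filenames, a conversion can run entirely in memory. The sketch below assumes small inputs that fit in a string; the function name `fastq_text_to_fasta_text` and the example reads are made up for illustration.

```python
from io import StringIO

from Bio import SeqIO


def fastq_text_to_fasta_text(fastq_text):
    """Convert FASTQ text to FASTA text in memory; return (fasta_text, record_count)."""
    out_handle = StringIO()
    # Both positional file arguments are handles here instead of filenames.
    count = SeqIO.convert(StringIO(fastq_text), "fastq", out_handle, "fasta")
    return out_handle.getvalue(), count


# Two made-up reads with placeholder quality strings.
example_fastq = "@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n"
fasta_text, count = fastq_text_to_fasta_text(example_fastq)
print(count)
print(fasta_text)
```

The same pattern works with any open text handle, for example one returned by `gzip.open(path, "rt")`.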
| From | To | Notes |
|---|---|---|
| GenBank | FASTA | Loses annotations, keeps sequence |
| FASTA | GenBank | Need to add molecule_type |
| FASTQ | FASTA | Loses quality scores |
| FASTA | FASTQ | Need to add quality scores |
| GenBank | EMBL | Usually works directly |
| Stockholm | FASTA | Alignment to sequences |
SeqIO.convert('input.gb', 'genbank', 'output.fasta', 'fasta')
SeqIO.convert('sequence.gb', 'genbank', 'sequence.fasta', 'fasta')
SeqIO.convert('reads.fastq', 'fastq', 'reads.fasta', 'fasta')
Goal: Convert FASTA to GenBank format, which requires molecule_type annotation.
Approach: Stream records through a generator that injects the missing annotation, then write.
Reference (BioPython 1.83+):
records = SeqIO.parse('input.fasta', 'fasta')
def add_molecule_type(records):
    for record in records:
        record.annotations['molecule_type'] = 'DNA'
        yield record
SeqIO.write(add_molecule_type(records), 'output.gb', 'genbank')
Goal: Convert FASTA to FASTQ by assigning uniform placeholder quality scores.
Approach: Stream records through a generator that adds phred_quality to each, then write as FASTQ.
Reference (BioPython 1.83+):
def add_quality(records, quality=30):
    for record in records:
        record.letter_annotations['phred_quality'] = [quality] * len(record.seq)
        yield record
records = SeqIO.parse('input.fasta', 'fasta')
SeqIO.write(add_quality(records), 'output.fastq', 'fastq')
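To see what the placeholder scores look like on disk, the same approach can be run in memory. This is a sketch with a made-up record; `fasta_text_to_fastq_text` is a hypothetical helper, and it assumes the default 'fastq' format, which uses Sanger encoding (offset 33).

```python
from io import StringIO

from Bio import SeqIO


def fasta_text_to_fastq_text(fasta_text, quality=30):
    """Return FASTQ text where every base carries the same placeholder Phred score."""

    def with_quality(records):
        for record in records:
            record.letter_annotations["phred_quality"] = [quality] * len(record.seq)
            yield record

    out_handle = StringIO()
    records = SeqIO.parse(StringIO(fasta_text), "fasta")
    SeqIO.write(with_quality(records), out_handle, "fastq")
    return out_handle.getvalue()


fastq_text = fasta_text_to_fastq_text(">read1\nACGT\n")
# With offset 33, Phred 30 is written as chr(30 + 33), which is "?".
print(fastq_text)
```

Uniform scores like these are placeholders only; downstream tools that filter or weight by quality will treat every base as equally reliable.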
Goal: Convert all files of one format in a directory to another format.
Approach: Glob for input files, apply SeqIO.convert() to each, and report per-file counts.
Reference (BioPython 1.83+):
from pathlib import Path
for gb_file in Path('.').glob('*.gb'):
    fasta_file = gb_file.with_suffix('.fasta')
    count = SeqIO.convert(str(gb_file), 'genbank', str(fasta_file), 'fasta')
    print(f'{gb_file.name}: {count} records')
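The loop above can be wrapped in a reusable function that returns the per-file counts instead of printing them. This is a sketch under assumptions: `convert_directory` and its parameters are hypothetical names, and the demo uses made-up FASTQ files in a temporary directory so it does not touch real data.

```python
import tempfile
from pathlib import Path

from Bio import SeqIO


def convert_directory(directory, in_suffix, in_format, out_suffix, out_format):
    """Convert every file ending in in_suffix; return {input file name: record count}."""
    counts = {}
    for in_path in sorted(Path(directory).glob(f"*{in_suffix}")):
        out_path = in_path.with_suffix(out_suffix)
        counts[in_path.name] = SeqIO.convert(
            str(in_path), in_format, str(out_path), out_format
        )
    return counts


# Self-contained demo: two small FASTQ files converted to FASTA.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.fastq").write_text("@r1\nACGT\n+\nIIII\n")
    Path(tmp, "b.fastq").write_text("@r1\nAC\n+\nII\n@r2\nGT\n+\nII\n")
    demo_counts = convert_directory(tmp, ".fastq", "fastq", ".fasta", "fasta")
    demo_output = Path(tmp, "a.fasta").read_text()

print(demo_counts)
```

For the GenBank case in the text, the call would be `convert_directory('.', '.gb', 'genbank', '.fasta', 'fasta')`.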
from Bio.SeqRecord import SeqRecord
def uppercase_record(rec):
    # Build a new record with an uppercased sequence; only id and description are carried over
    return SeqRecord(rec.seq.upper(), id=rec.id, description=rec.description)
records = SeqIO.parse('input.fasta', 'fasta')
modified = (uppercase_record(rec) for rec in records)
SeqIO.write(modified, 'output.fasta', 'fasta')
from Bio import AlignIO
AlignIO.convert('alignment.sto', 'stockholm', 'alignment.phy', 'phylip')
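AlignIO.convert() also accepts handles, so an alignment can be converted in memory. The sketch below turns a tiny made-up Stockholm alignment into FASTA, keeping the gap characters; it assumes 'fasta' is accepted as an AlignIO output format, as it is in the Biopython releases referenced here.

```python
from io import StringIO

from Bio import AlignIO

# A minimal two-sequence Stockholm alignment (made-up data).
stockholm_text = "# STOCKHOLM 1.0\nseq1 ACGT-ACGT\nseq2 ACGTTACGT\n//\n"

out_handle = StringIO()
# AlignIO.convert returns the number of alignments written, not the number of sequences.
alignment_count = AlignIO.convert(
    StringIO(stockholm_text), "stockholm", out_handle, "fasta"
)
aligned_fasta = out_handle.getvalue()
print(alignment_count)
print(aligned_fasta)
```

Note that strict PHYLIP output, as in the filename-based call above, limits sequence names to 10 characters, so long identifiers may need shortening first.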
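Can convert directly (no modifications needed): GenBank to EMBL, GenBank to FASTA, FASTQ to FASTA
Requires adding data: FASTA to GenBank (molecule_type), FASTA to FASTQ (quality scores)
May lose data: GenBank to FASTA (annotations), FASTQ to FASTA (quality scores)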
| Error | Cause | Solution |
|---|---|---|
| ValueError: missing molecule_type in annotations | FASTA to GenBank | Add molecule_type annotation |
| ValueError: No suitable quality scores found in letter_annotations of SeqRecord | FASTA to FASTQ | Add phred_quality to letter_annotations |
| KeyError: 'phred_quality' | Wrong FASTQ variant | Try 'fastq-sanger', 'fastq-illumina' |
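The FASTA to GenBank failure in the table can be reproduced and fixed in memory. This sketch uses a made-up record and assumes the optional `molecule_type` argument of SeqIO.convert(), which is available in recent Biopython releases; check the installed signature if in doubt.

```python
from io import StringIO

from Bio import SeqIO

fasta_text = ">seq1\nACGTACGT\n"

# Attempt 1: no molecule_type, so the GenBank writer is expected to raise ValueError.
try:
    SeqIO.convert(StringIO(fasta_text), "fasta", StringIO(), "genbank")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Attempt 2: supply the molecule type through the optional keyword argument.
out_handle = StringIO()
count = SeqIO.convert(
    StringIO(fasta_text), "fasta", out_handle, "genbank", molecule_type="DNA"
)
genbank_text = out_handle.getvalue()

print("first attempt failed with:", error_message)
print("second attempt wrote", count, "record(s)")
```

When records need different molecule types, or other annotations as well, the parse, annotate, and write pattern remains the more flexible option.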
Converting formats?
├── Simple conversion (no data changes)?
│   └── Use SeqIO.convert() directly
├── Need to add annotations?
│   └── Parse, modify records, then write
├── Need to transform sequences?
│   └── Parse, apply transformation, then write
└── Multiple files?
    └── Loop with SeqIO.convert() or batch generator
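The first three branches of the tree can be sketched as a single function: convert directly when nothing changes, otherwise parse, transform, and write. The names `convert_with_optional_transform` and `uppercase` are hypothetical, and the input is a made-up record held in memory.

```python
from io import StringIO

from Bio import SeqIO


def convert_with_optional_transform(in_file, in_format, out_file, out_format, transform=None):
    """Convert directly, or parse -> transform each record -> write. Returns the record count."""
    if transform is None:
        return SeqIO.convert(in_file, in_format, out_file, out_format)
    records = SeqIO.parse(in_file, in_format)
    # A generator expression keeps the pipeline streaming, one record at a time.
    return SeqIO.write((transform(rec) for rec in records), out_file, out_format)


def uppercase(record):
    """Example transform: uppercase the sequence in place and return the record."""
    record.seq = record.seq.upper()
    return record


out_handle = StringIO()
written = convert_with_optional_transform(
    StringIO(">s1\nacgt\n"), "fasta", out_handle, "fasta", transform=uppercase
)
result_text = out_handle.getvalue()
print(written, result_text)
```

An annotation step, such as adding molecule_type, fits the same `transform` slot as a sequence transformation.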