Write biological sequences to files (FASTA, FASTQ, GenBank, EMBL) using Biopython Bio.SeqIO. Use when saving sequences, creating new sequence files, or outputting modified records.
Reference examples tested with: BioPython 1.83+, pysam 0.22+, samtools 1.19+
Before using code patterns, verify installed versions match. If versions differ:
pip show <package> then help(module.function) to check signatures
If code throws ImportError, AttributeError, or TypeError, introspect the installed package and adapt the example to match the actual API rather than retrying.
"Write sequences to a file" → Serialize SeqRecord objects into a formatted sequence file.
SeqIO.write() (BioPython)
writeXStringSet() (Biostrings)
Write SeqRecord objects to sequence files using Biopython's Bio.SeqIO module.
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
Write one or more SeqRecord objects to a file.
SeqIO.write(records, 'output.fasta', 'fasta')
Parameters:
records - Single SeqRecord, list, or iterator of SeqRecords
handle - Filename (string) or file handle
format - Output format string
Returns: Number of records written (integer)
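The return value can be compared against the number of records passed in. The following is a minimal sketch, assuming an in-memory `io.StringIO` handle in place of a real file; the record contents are illustrative only.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Illustrative records; description='' keeps each FASTA header to the ID only.
records = [
    SeqRecord(Seq('ATGC'), id='seq1', description=''),
    SeqRecord(Seq('GCTA'), id='seq2', description=''),
]

handle = StringIO()  # any text-mode handle can stand in for a filename
count = SeqIO.write(records, handle, 'fasta')
fasta_text = handle.getvalue()

print(count)
print(fasta_text)
```

Passing a handle instead of a filename leaves opening and closing the file to the caller.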
Get a string representation without writing to file.
formatted = record.format('fasta')
print(formatted)
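As a small self-contained sketch of the string route, the record below is formatted as FASTA without touching the filesystem. The record itself is an assumption made for illustration; an empty description is set so the header contains only the ID.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Illustrative record; without description='' the default placeholder
# description would be appended to the header line.
record = SeqRecord(Seq('ATGC'), id='seq1', description='')

formatted = record.format('fasta')
print(formatted)
```

The string is the same text SeqIO.write() would put in a file for this record, which makes it convenient for logging or quick inspection.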
Goal: Construct in-memory sequence records suitable for writing to any format.
Approach: Create SeqRecord with at minimum a Seq and id. Add letter_annotations for FASTQ, annotations['molecule_type'] for GenBank/EMBL.
"Create a sequence record from scratch" → Wrap a Seq string in a SeqRecord with metadata fields.
SeqRecord(Seq(...), id=...) (BioPython)
record = SeqRecord(Seq('ATGCGATCGATCG'), id='seq1')
record = SeqRecord(
    Seq('ATGCGATCGATCG'),
    id='seq1',
    name='sequence_one',
    description='Example sequence for demonstration'
)
from Bio.SeqFeature import SeqFeature, FeatureLocation
record = SeqRecord(
    Seq('ATGCGATCGATCG'),
    id='seq1',
    annotations={'molecule_type': 'DNA'}
)
record.features.append(
    SeqFeature(FeatureLocation(0, 9), type='gene', qualifiers={'gene': ['exampleGene']})
)
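One way to check that a feature survives serialization is a round trip through an in-memory GenBank file. This is a sketch under the assumption that a `StringIO` buffer is an acceptable stand-in for a file on disk; the record mirrors the one built above.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqFeature import SeqFeature, FeatureLocation
from Bio.SeqRecord import SeqRecord

record = SeqRecord(
    Seq('ATGCGATCGATCG'),
    id='seq1',
    annotations={'molecule_type': 'DNA'},  # required by the GenBank writer
)
record.features.append(
    SeqFeature(FeatureLocation(0, 9), type='gene', qualifiers={'gene': ['exampleGene']})
)

# Write to an in-memory buffer, rewind, and parse the same text back.
handle = StringIO()
SeqIO.write(record, handle, 'genbank')
handle.seek(0)
parsed = SeqIO.read(handle, 'genbank')

feature = parsed.features[0]
print(feature.type, feature.location, feature.qualifiers['gene'])
```

Locations are 0-based and end-exclusive in Python, so FeatureLocation(0, 9) appears as 1..9 in the GenBank text.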
| Format | String | Notes |
|---|---|---|
| FASTA | 'fasta' | Most universal, sequence + header only |
| FASTQ | 'fastq' | Requires quality scores in letter_annotations |
| GenBank | 'genbank' | Requires annotations and molecule_type |
| EMBL | 'embl' | Similar requirements to GenBank |
| Tab | 'tab' | Simple ID + sequence tabular format |
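The format string is the only argument that changes between output formats. The sketch below renders one illustrative record in the two formats from the table that need no extra annotations; the record and the `outputs` dictionary are assumptions made for the example.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Illustrative record; description='' keeps the FASTA header to the ID only.
record = SeqRecord(Seq('ATGC'), id='seq1', description='')

# 'fasta' and 'tab' need only an ID and a sequence.
outputs = {fmt: record.format(fmt) for fmt in ('fasta', 'tab')}

for fmt, text in outputs.items():
    print(fmt, repr(text))
```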
record = SeqRecord(Seq('ATGC'), id='my_seq', description='test sequence')
SeqIO.write(record, 'output.fasta', 'fasta')
records = [
    SeqRecord(Seq('ATGC'), id='seq1'),
    SeqRecord(Seq('GCTA'), id='seq2'),
    SeqRecord(Seq('TTAA'), id='seq3')
]
count = SeqIO.write(records, 'output.fasta', 'fasta')
print(f'Wrote {count} records')
with open('output.fasta', 'w') as handle:
    SeqIO.write(records, handle, 'fasta')
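Because SeqIO.write() accepts any text-mode handle, the same pattern can write compressed output. The following is a sketch assuming gzip compression from the standard library and a temporary directory; the file name and records are illustrative.

```python
import gzip
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

records = [
    SeqRecord(Seq('ATGC'), id='seq1', description=''),
    SeqRecord(Seq('GCTA'), id='seq2', description=''),
]

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'output.fasta.gz')

# 'wt' opens the gzip stream in text mode, which SeqIO.write() expects.
with gzip.open(path, 'wt') as handle:
    count = SeqIO.write(records, handle, 'fasta')

# Read the compressed file back to confirm its contents.
with gzip.open(path, 'rt') as handle:
    ids = [rec.id for rec in SeqIO.parse(handle, 'fasta')]

print(count, ids)
```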
Goal: Transform sequences in-memory and write the modified versions to a new file.
Approach: Parse input, apply transformation via generator, write output. Using a generator avoids loading all records into memory.
"Modify sequences and save" → Parse records, transform each, write to new file with SeqIO.write().
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
def uppercase_record(rec):
    return SeqRecord(rec.seq.upper(), id=rec.id, description=rec.description)
records = SeqIO.parse('input.fasta', 'fasta')
modified = (uppercase_record(rec) for rec in records)
SeqIO.write(modified, 'output.fasta', 'fasta')
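The same parse, transform, write pipeline can be tried without any files. This is a self-contained sketch in which `StringIO` buffers stand in for the input and output files and the two lowercase sequences are made up for the example.

```python
from io import StringIO

from Bio import SeqIO
from Bio.SeqRecord import SeqRecord

# Illustrative input standing in for input.fasta.
source = StringIO('>seq1 first\natgc\n>seq2 second\nggccaa\n')


def uppercase_record(rec):
    """Return a new record with an uppercased sequence and the same header."""
    return SeqRecord(rec.seq.upper(), id=rec.id, description=rec.description)


# The generator yields one transformed record at a time as the writer asks for it.
modified = (uppercase_record(rec) for rec in SeqIO.parse(source, 'fasta'))

out = StringIO()
count = SeqIO.write(modified, out, 'fasta')
result = out.getvalue()
print(count)
print(result)
```

Since the generator is consumed once by SeqIO.write(), it has to be rebuilt if the records are needed a second time.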
with open('output.fasta', 'a') as handle:
    SeqIO.write(new_records, handle, 'fasta')
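Appending works for formats such as FASTA where records are simply concatenated. The sketch below assumes a temporary file and illustrative records, writes an initial batch, appends one more record, and reads everything back.

```python
import os
import tempfile

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical output location inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'output.fasta')

first_batch = [
    SeqRecord(Seq('ATGC'), id='seq1', description=''),
    SeqRecord(Seq('GCTA'), id='seq2', description=''),
]
new_records = [SeqRecord(Seq('TTAA'), id='seq3', description='')]

# Passing a filename opens the file for writing and replaces any existing content.
SeqIO.write(first_batch, path, 'fasta')

# Opening the file in append mode adds records after the existing ones.
with open(path, 'a') as handle:
    SeqIO.write(new_records, handle, 'fasta')

ids = [rec.id for rec in SeqIO.parse(path, 'fasta')]
print(ids)
```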
record = SeqRecord(Seq('ATGCGATCG'), id='read1')
record.letter_annotations['phred_quality'] = [30, 30, 28, 25, 30, 30, 28, 25, 30]
SeqIO.write(record, 'output.fastq', 'fastq')
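To see how the scores are encoded, the record can be rendered as a string. This sketch assumes the standard 'fastq' format, which stores each Phred score as the ASCII character with code score + 33; an empty description is set so the header line contains only the ID.

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

record = SeqRecord(Seq('ATGCGATCG'), id='read1', description='')
# One score per base; the list length must equal the sequence length.
record.letter_annotations['phred_quality'] = [30, 30, 28, 25, 30, 30, 28, 25, 30]

fastq_text = record.format('fastq')
header, sequence, separator, quality = fastq_text.splitlines()
print(fastq_text)
```

With this encoding a score of 30 becomes '?', 28 becomes '=', and 25 becomes ':'.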
record = SeqRecord(Seq('ATGCGATCGATCG'), id='SEQ001', name='example')
record.annotations['molecule_type'] = 'DNA'
record.annotations['topology'] = 'linear'
record.annotations['organism'] = 'Example organism'
SeqIO.write(record, 'output.gb', 'genbank')
| Error | Cause | Solution |
|---|---|---|
| TypeError: SeqRecord expected | Passed raw string/Seq | Wrap in SeqRecord object |
| ValueError: missing molecule_type | GenBank without annotations | Add record.annotations['molecule_type'] = 'DNA' |
| ValueError: missing quality scores | FASTQ without phred_quality | Add quality scores to letter_annotations |
| ValueError: Sequences must all be the same length | PHYLIP with unequal lengths | Pad or trim sequences first |
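The two annotation-related failures can be reproduced and caught as follows. This is a sketch only: the exact message text may differ between Biopython versions, so the code records whatever message is raised instead of matching on it, and the `errors` dictionary is an illustrative helper.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# A bare record: no molecule_type annotation and no quality scores.
record = SeqRecord(Seq('ATGC'), id='seq1')

errors = {}
for fmt in ('genbank', 'fastq'):
    try:
        SeqIO.write(record, StringIO(), fmt)
    except ValueError as exc:
        errors[fmt] = str(exc)  # keep the message for inspection

for fmt, message in errors.items():
    print(f'{fmt}: {message}')
```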
FASTQ records must have quality scores:
record.letter_annotations['phred_quality'] = [30] * len(record.seq)
GenBank and EMBL records must have molecule_type:
record.annotations['molecule_type'] = 'DNA' # or 'RNA', 'protein'
PHYLIP: All sequences must be the same length. IDs are truncated to 10 characters.
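One way to satisfy the equal-length requirement is to pad shorter sequences with gap characters before writing. The sketch below assumes '-' is an acceptable padding character for the downstream tool; the records and the long identifier are made up to show the truncation.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

records = [
    SeqRecord(Seq('ATGCAT'), id='a_very_long_identifier'),
    SeqRecord(Seq('ATG'), id='short'),
]

# Pad every sequence on the right with '-' up to the longest length.
width = max(len(rec.seq) for rec in records)
padded = [
    SeqRecord(rec.seq + '-' * (width - len(rec.seq)), id=rec.id)
    for rec in records
]

handle = StringIO()
count = SeqIO.write(padded, handle, 'phylip')
phylip_text = handle.getvalue()
print(phylip_text)
```

Because IDs are cut to 10 characters, identifiers that share their first 10 characters would collide, so it is worth checking for uniqueness before writing.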