Name: Bio Alignment Pairwise
Author: GPTomics

from Bio.Align import PairwiseAligner
from Bio.Seq import Seq
from Bio import SeqIO

Mode	Algorithm	Use Case
`global`	Needleman-Wunsch	Full-length alignment, similar-length sequences
`local`	Smith-Waterman	Find best matching regions, different-length sequences
`global` + free end gaps	Semi-global	Overlap detection, fragment-to-reference alignment
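
To make the difference between the first two modes concrete, here is a minimal pure-Python sketch of score-only Needleman-Wunsch and Smith-Waterman. It is an illustration, not Biopython's implementation: the function names are hypothetical, and it assumes a linear gap penalty rather than the affine scheme used elsewhere on this page.

```python
def global_score(a, b, match=2, mismatch=-1, gap=-2):
    """Needleman-Wunsch score (linear gap penalty, assumed for brevity)."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: b aligned to gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: a aligned to gaps
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]  # both sequences must be consumed end to end


def local_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score: cells are floored at 0 and the best cell wins."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


reference = "TTTTACGTACGTTTTT"
fragment = "ACGTACG"
print(global_score(reference, fragment))  # -4: nine forced gap positions
print(local_score(reference, fragment))   # 14: seven matches, flanks ignored
```

The global score is dragged down by the length difference alone, which is why the table recommends local or semi-global alignment when sequence lengths differ.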

Scenario	Align As	Rationale
Nucleotide identity >70%	DNA	Sufficient signal at nucleotide level
Nucleotide identity <70%	Protein	Codon degeneracy masks signal at DNA level; protein alignment is ~3x more sensitive
Noncoding sequences (UTRs, intergenic)	DNA	No protein translation possible
Coding sequences for dN/dS analysis	Protein first, then back-translate codons (PAL2NAL)	Preserves reading frame for selection analysis
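
The decision table above can be written down as a small helper. This is only a sketch: the function name and return strings are hypothetical, and because the table does not say what to do at exactly 70% identity, the code assumes that case is treated like the low-identity row.

```python
def choose_alignment_level(nucleotide_identity, coding=True, for_dnds=False):
    """Suggest the alignment level using the thresholds from the table above."""
    if not 0 <= nucleotide_identity <= 100:
        raise ValueError("nucleotide_identity must be a percentage (0-100)")
    if not coding:
        return "DNA"  # noncoding sequences cannot be translated
    if for_dnds:
        return "protein, then back-translate to codons"
    # Assumption: exactly 70% is handled like the <70% row.
    return "DNA" if nucleotide_identity > 70 else "protein"


print(choose_alignment_level(85))
print(choose_alignment_level(55))
print(choose_alignment_level(55, coding=False))
```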

# Basic aligner with defaults
aligner = PairwiseAligner()

# Configure mode and scoring
aligner = PairwiseAligner(mode='global', match_score=2, mismatch_score=-1, open_gap_score=-10, extend_gap_score=-0.5)

# For protein alignment with substitution matrix
from Bio.Align import substitution_matrices
aligner = PairwiseAligner(mode='global', substitution_matrix=substitution_matrices.load('BLOSUM62'))

seq1 = Seq('ACCGGTAACGTAG')
seq2 = Seq('ACCGTTAACGAAG')

# Get all optimal alignments
alignments = aligner.align(seq1, seq2)
print(f'Found {len(alignments)} optimal alignments')
print(alignments[0])  # Print first alignment

# Get score only (faster for large sequences)
score = aligner.score(seq1, seq2)

target            0 ACCGGTAACGTAG 13
                  0 ||||.|||||.|| 13
query             0 ACCGTTAACGAAG 13

alignment = alignments[0]

# Basic properties
print(alignment.score)                    # Alignment score
print(alignment.shape)                    # (num_seqs, alignment_length)
print(alignment.shape[1])                 # Alignment length (number of columns)
print(len(alignment))                     # Number of sequences (2 for a pairwise alignment)

# Get aligned sequences with gaps
target_aligned = alignment[0, :]          # First sequence (target) with gaps
query_aligned = alignment[1, :]           # Second sequence (query) with gaps

# Get coordinate mapping
print(alignment.aligned)                  # Array of aligned segment coordinates
print(alignment.coordinates)              # Full coordinate array

alignment = alignments[0]
counts = alignment.counts()

print(f'Identities: {counts.identities}')
print(f'Mismatches: {counts.mismatches}')
print(f'Gaps: {counts.gaps}')

# Calculate percent identity
total_aligned = counts.identities + counts.mismatches
percent_identity = counts.identities / total_aligned * 100
print(f'Percent identity: {percent_identity:.1f}%')

# DNA/RNA alignment
aligner = PairwiseAligner(mode='global', match_score=2, mismatch_score=-1, open_gap_score=-10, extend_gap_score=-0.5)

# Protein alignment
from Bio.Align import substitution_matrices
blosum62 = substitution_matrices.load('BLOSUM62')
aligner = PairwiseAligner(mode='global', substitution_matrix=blosum62, open_gap_score=-11, extend_gap_score=-1)

# Local alignment (find best region)
aligner = PairwiseAligner(mode='local', match_score=2, mismatch_score=-1, open_gap_score=-10, extend_gap_score=-0.5)

# Free end gaps on query -- for aligning a fragment against a full-length reference
# or detecting overlap between reads
aligner = PairwiseAligner(mode='global')
aligner.query_left_open_gap_score = 0
aligner.query_left_extend_gap_score = 0
aligner.query_right_open_gap_score = 0
aligner.query_right_extend_gap_score = 0

# Free end gaps on BOTH sequences -- for overlap detection between two reads
aligner = PairwiseAligner(mode='global')
aligner.end_gap_score = 0.0
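
What "free end gaps on query" means for the dynamic programming table can be sketched in pure Python. This is an illustrative score-only version, not Biopython's code: the function name is hypothetical and a linear gap penalty is assumed. Target residues before and after the query cost nothing, while gaps inside the aligned region are still penalized.

```python
def semiglobal_score(target, query, match=2, mismatch=-1, gap=-2):
    """Score a query fragment against a longer target with free query end gaps."""
    n = len(query)
    # Row 0: query residues aligned to gaps in the target are still penalized.
    prev = [j * gap for j in range(n + 1)]
    best = prev[n]
    for i in range(1, len(target) + 1):
        curr = [0]  # leading target residues are skipped for free
        for j in range(1, n + 1):
            diag = prev[j - 1] + (match if target[i - 1] == query[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
        # Trailing target residues are free: the query may end at any row.
        best = max(best, prev[n])
    return best


print(semiglobal_score("TTTTACGTACGTTTTT", "ACGTACG"))  # 14
```

The whole fragment has to be aligned, unlike local mode, but the reference flanks on either side do not lower the score.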

Divergence Level	BLOSUM	PAM	When To Use
Very close (<20% divergence)	BLOSUM80, BLOSUM90	PAM30	Recently duplicated genes, strain comparison
Moderate	BLOSUM62 (default)	PAM120	General-purpose, most analyses
Distant (>50% divergence)	BLOSUM45, BLOSUM50	PAM250	Remote homology detection
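
As a rough sketch, the table can be turned into a lookup that returns a matrix name to pass to `substitution_matrices.load()`. The helper name is hypothetical, and the cutoffs are simply the 20% and 50% boundaries from the table. Treat them as rules of thumb rather than hard limits.

```python
def pick_blosum(percent_divergence):
    """Map estimated divergence (0-100) to a BLOSUM matrix name from the table."""
    if not 0 <= percent_divergence <= 100:
        raise ValueError("percent_divergence must be between 0 and 100")
    if percent_divergence < 20:
        return "BLOSUM80"  # very close sequences
    if percent_divergence > 50:
        return "BLOSUM45"  # distant homologs
    return "BLOSUM62"      # general-purpose default


for divergence in (10, 35, 60):
    print(divergence, pick_blosum(divergence))
```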

from Bio.Align import substitution_matrices
print(substitution_matrices.load())  # List the names of all available matrices

blosum62 = substitution_matrices.load('BLOSUM62')  # General protein (default)
blosum80 = substitution_matrices.load('BLOSUM80')  # Close homologs
blosum45 = substitution_matrices.load('BLOSUM45')  # Distant homologs
nuc44 = substitution_matrices.load('NUC.4.4')      # DNA with IUPAC support

from Bio import SeqIO

records = list(SeqIO.parse('sequences.fasta', 'fasta'))
seq1, seq2 = records[0].seq, records[1].seq

aligner = PairwiseAligner(mode='global', match_score=1, mismatch_score=-1)
alignments = aligner.align(seq1, seq2)

# Limit number of alignments returned (memory efficient)
aligner.max_alignments = 100

for i, alignment in enumerate(alignments):
    print(f'Alignment {i+1}: score={alignment.score}')
    if i >= 4:
        break
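
An alternative to the manual `break` is `itertools.islice`, which works on any iterable and stops pulling items once the limit is reached. The sketch below uses a stand-in generator (`fake_alignments`, hypothetical) instead of a real aligner result so that it runs without Biopython. It assumes the real alignments object is consumed lazily in the same way as the loop above.

```python
from itertools import islice


def take_first(alignments, limit=5):
    """Return at most `limit` items without consuming the rest of the iterator."""
    return list(islice(alignments, limit))


produced = []  # records how many items the stand-in generator actually built


def fake_alignments(total):
    """Hypothetical stand-in for a large, lazily generated set of alignments."""
    for index in range(total):
        produced.append(index)
        yield {"index": index, "score": 22.0}


top = take_first(fake_alignments(1_000_000), limit=5)
print(len(top), len(produced))  # 5 5
```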

alignment = alignments[0]
substitutions = alignment.substitutions

# View as array (rows=target, cols=query)
print(substitutions)

# Access specific substitution counts
# substitutions['A', 'T'] gives count of A aligned to T
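
To show what such a table holds, here is a pure-Python sketch that tallies aligned residue pairs from two gapped rows. It is not how Biopython builds `alignment.substitutions`. The helper name is hypothetical, and the input strings are the two example sequences from earlier on this page, which align without gaps.

```python
from collections import Counter


def count_substitutions(target_aligned, query_aligned, gap="-"):
    """Count (target, query) residue pairs, skipping columns that contain a gap."""
    if len(target_aligned) != len(query_aligned):
        raise ValueError("aligned rows must have the same length")
    return Counter(
        (t, q)
        for t, q in zip(target_aligned, query_aligned)
        if t != gap and q != gap
    )


pairs = count_substitutions("ACCGGTAACGTAG", "ACCGTTAACGAAG")
print(pairs[("G", "T")])  # 1: target G aligned to query T
print(pairs[("T", "A")])  # 1: target T aligned to query A
print(pairs[("A", "A")])  # 4: identical A/A columns
```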

alignment = alignments[0]

# Various output formats
print(format(alignment, 'fasta'))     # FASTA format
print(format(alignment, 'clustal'))   # Clustal format
print(format(alignment, 'psl'))       # PSL format (BLAT)
print(format(alignment, 'sam'))       # SAM format

Parameter	Description	Typical DNA	Typical Protein
`match_score`	Score for identical bases	1-2	Use matrix
`mismatch_score`	Penalty for mismatches	-1 to -3	Use matrix
`open_gap_score`	Cost to start a gap	-5 to -15	-10 to -12
`extend_gap_score`	Cost per gap extension	-0.5 to -2	-0.5 to -1
`substitution_matrix`	Scoring matrix	N/A	BLOSUM62
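
The interplay of `open_gap_score` and `extend_gap_score` is easiest to see with a little arithmetic. The sketch below assumes the convention that the open score covers the first gapped position and the extend score applies to each additional one. The function name is hypothetical.

```python
def affine_gap_score(length, open_gap_score=-10.0, extend_gap_score=-0.5):
    """Score of one contiguous gap under an affine scheme.

    Assumption: the open score pays for the first position, and each further
    position adds the extend score.
    """
    if length < 0:
        raise ValueError("length must be non-negative")
    if length == 0:
        return 0.0
    return open_gap_score + (length - 1) * extend_gap_score


print(affine_gap_score(1))      # -10.0
print(affine_gap_score(3))      # -11.0
# One 4-residue gap versus four separate 1-residue gaps:
print(affine_gap_score(4), 4 * affine_gap_score(1))  # -11.5 -40.0
```

With a steep open penalty and a shallow extend penalty, the aligner prefers one long indel over several scattered ones.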

Error	Cause	Solution
`OverflowError`	Too many optimal alignments	Set `aligner.max_alignments`
Low scores	Wrong scoring scheme	Use substitution matrix for proteins
No alignments in local mode	Scores all negative	Ensure `match_score` > 0

Method	Denominator	Best For
PID1	Aligned positions + internal gaps	Gap-aware, conservative
PID2	Aligned residue pairs (excluding gaps)	Always highest value
PID3	Shorter sequence length	Length-normalized
PID4	Mean sequence length	Best correlation with structural similarity
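
The four definitions can give noticeably different numbers for the same alignment. The following pure-Python sketch computes all of them from two gapped rows. The function name and the example strings are made up for illustration, and "internal gaps" is taken to mean gap columns that are not at either end of the alignment.

```python
def percent_identities(target_aligned, query_aligned, gap="-"):
    """Compute PID1-PID4 (in percent) from two gapped rows of equal length."""
    if len(target_aligned) != len(query_aligned):
        raise ValueError("aligned rows must have the same length")
    columns = list(zip(target_aligned, query_aligned))

    # Trim end-gap columns so that only internal gaps are counted.
    start, end = 0, len(columns)
    while start < end and gap in columns[start]:
        start += 1
    while end > start and gap in columns[end - 1]:
        end -= 1
    core = columns[start:end]

    pairs = [(t, q) for t, q in core if gap not in (t, q)]
    if not pairs:
        raise ValueError("no aligned residue pairs")
    identities = sum(t == q for t, q in pairs)
    internal_gaps = len(core) - len(pairs)
    target_len = len(target_aligned) - target_aligned.count(gap)
    query_len = len(query_aligned) - query_aligned.count(gap)

    denominators = {
        "PID1": len(pairs) + internal_gaps,        # aligned positions + internal gaps
        "PID2": len(pairs),                        # aligned residue pairs only
        "PID3": min(target_len, query_len),        # shorter sequence
        "PID4": (target_len + query_len) / 2,      # mean sequence length
    }
    return {name: 100 * identities / d for name, d in denominators.items()}


pid = percent_identities("ACGT-ACGTTT", "ACCTTACG---")
for name, value in pid.items():
    print(f"{name}: {value:.1f}%")
```

When reporting percent identity, state which denominator was used, since the spread between PID2 and PID4 in this toy example is nearly 20 percentage points.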

Bio Alignment Pairwise

Version Compatibility

Pairwise Sequence Alignment

Required Import

Core Concepts

Choosing the Right Mode

DNA vs Protein Alignment

Creating an Aligner

Performing Alignments

Alignment Output Format

Accessing Alignment Data

Alignment Counts (Identities, Mismatches, Gaps)

Common Scoring Configurations

DNA/RNA Alignment

Protein Alignment

Local Alignment (Find Best Region)

Semiglobal (Overlap/Fragment Alignment)

Substitution Matrix Selection

Affine Gap Penalties: Biological Rationale

Working with SeqRecord Objects

Iterating Over Multiple Alignments

Substitution Matrix from Alignment

Export Alignment to Different Formats

Quick Reference: Scoring Parameters

Common Errors

Percent Identity: Definitions Matter

Interpreting Alignment Significance

When Alignment Is NOT Appropriate
