Name: Python Bio Variables
Author: Pavel-Kravchenko


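# General form of an assignment (a template, not runnable code):
# variable_name = value

# Create two independent empty lists in one statement
# (a = b = [] would instead bind both names to the same list)
a, b = [], []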

# Assign multiple variables at once
a_count, t_count, g_count, c_count = 250, 245, 280, 275
print(f"A: {a_count}, T: {t_count}, G: {g_count}, C: {c_count}")
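
The four counts can be combined into a GC fraction. The following is a minimal sketch that reuses the same example numbers; `total` and `gc_fraction` are names introduced here purely for illustration.

```python
# Example base counts (same illustrative numbers as above)
a_count, t_count, g_count, c_count = 250, 245, 280, 275

total = a_count + t_count + g_count + c_count    # int: 1050
gc_fraction = (g_count + c_count) / total        # "/" always returns a float

print(f"Total bases: {total}")
print(f"GC fraction: {gc_fraction:.3f}")         # 0.529
```

Note that the counts are `int` values while the resulting fraction is a `float`.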

# Swap two variables (elegant Python idiom)
sense_strand = "ATGCGA"
antisense_strand = "TACGCT"

sense_strand, antisense_strand = antisense_strand, sense_strand
print(f"After swap -- sense: {sense_strand}, antisense: {antisense_strand}")

sequence = "ATGCGATCG"
print(f"id of sequence: {id(sequence)}")

# Concatenation builds a new string object and rebinds the name to it
# (strings are immutable, so the original object is never modified)
sequence = sequence + "AAA"
print(f"id after concatenation: {id(sequence)}")
print(f"New value: {sequence}")
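
Because names refer to objects rather than holding values, two names can point at the same mutable object. The sketch below illustrates this with a list; the names `reads`, `alias`, and `snapshot` are hypothetical and chosen only for the example.

```python
# Two names bound to the SAME list object
reads = ["ATGC", "GGCA"]
alias = reads
alias.append("TTAG")               # mutates the shared list
print(reads)                       # ['ATGC', 'GGCA', 'TTAG']
print(alias is reads)              # True

# An explicit copy is an independent object
snapshot = list(reads)
snapshot.append("CCCC")
print(len(reads), len(snapshot))   # 3 4

# Tuple-style assignment creates two separate lists
a, b = [], []
print(a is b)                      # False
```

Strings do not show this behavior in practice because they cannot be modified in place.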

Type        Example                    Bioinformatics use
--------    -----------------------    -----------------------------------
int         42                         Sequence length, read count
float       0.487                      GC content, E-value, p-value
str         "ATGCGA"                   DNA/RNA/protein sequences, gene names
bool        True / False               Is the sequence valid? Has a stop codon?
NoneType    None                       Missing data, function with no return value

# Typical bioinformatics integers
sequence_length = 3088286401          # human genome length in bp
num_genes = 20000                     # approximate protein-coding genes
read_depth = 30                       # sequencing coverage
chromosome_number = 23                # human haploid chromosome count

print(f"Human genome:  {sequence_length:,} bp")   # comma-separated formatting
print(f"Protein-coding genes: ~{num_genes:,}")
print(f"Target coverage: {read_depth}x")

# Typical bioinformatics floats
gc_content = 0.508                    # GC fraction
melting_temp = 72.3                   # PCR primer Tm in degrees C
e_value = 1.5e-42                     # BLAST E-value (scientific notation)
p_value = 0.0031                      # statistical significance

print(f"GC content: {gc_content}")
print(f"GC content as %: {gc_content * 100:.1f}%")
print(f"Melting temp: {melting_temp} C")
print(f"E-value: {e_value}")
print(f"E-value formatted: {e_value:.2e}")
print(f"p-value: {p_value}")
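
Floats are stored in binary, so values such as GC fractions or p-values may not compare exactly equal after arithmetic. A minimal sketch of the usual workaround, using the standard-library `math.isclose` with its default tolerance; the variable names are illustrative.

```python
import math

observed = 0.1 + 0.2            # classic example of binary rounding error
expected = 0.3

print(observed == expected)               # False
print(math.isclose(observed, expected))   # True (default rel_tol=1e-09)
```

Comparing with a tolerance is generally safer than `==` whenever a float is the result of a calculation.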

# Three ways to create strings
dna = "ATGCGATCGATCG"          # double quotes
rna = 'AUGCGAUCGAUCG'          # single quotes (identical behavior)
protein = """MKWVTFISLLLLFSSAYS"""  # triple quotes (can span multiple lines)

# Multi-line string (useful for embedding a whole FASTA record, for example)
fasta_entry = """>sp|P04637|P53_HUMAN Cellular tumor antigen p53
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYPQGLNGTVNLPGRNSFEV"""

print(fasta_entry[:80] + "...")

Index:    0   1   2   3   4   5   6   7
Seq:      A   T   G   C   G   A   T   C
Neg idx: -8  -7  -6  -5  -4  -3  -2  -1

dna = "ATGAAACCCGGGTAA"

# Extract the start codon (first 3 nucleotides)
start_codon = dna[0:3]
print(f"Start codon: {start_codon}")       # ATG

# Extract the stop codon (last 3 nucleotides)
stop_codon = dna[-3:]
print(f"Stop codon:  {stop_codon}")        # TAA

# Extract the coding region between start and stop
coding = dna[3:-3]
print(f"Coding region: {coding}")          # AAACCCGGG

# Every third nucleotide (first position of each codon)
first_positions = dna[0::3]
print(f"First codon positions: {first_positions}")  # AACGT

# Reverse the sequence
reversed_dna = dna[::-1]
print(f"Reversed: {reversed_dna}")         # AATGGGCCCAAAGTA
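
Slicing with a moving start index is one way to split a sequence into codons. This is a sketch that assumes the reading frame starts at index 0; if the length is not a multiple of 3, the last piece will simply be shorter.

```python
dna = "ATGAAACCCGGGTAA"

# Step through the sequence three bases at a time
codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
print(codons)   # ['ATG', 'AAA', 'CCC', 'GGG', 'TAA']
```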

# replace() -- replace all occurrences of a substring
dna = "ATGCGATCG"
rna = dna.replace("T", "U")    # coding-strand DNA to mRNA (T -> U); returns a new string
print(f"DNA: {dna}")
print(f"RNA: {rna}")
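
`replace()` returns a new string and leaves the original untouched because strings are immutable. The sketch below shows what happens when code tries to change a character in place, and one way to build a modified copy instead; `mutated` is an illustrative name.

```python
dna = "ATGCGATCG"

try:
    dna[0] = "G"                 # item assignment is not supported on str
except TypeError as err:
    print(f"TypeError: {err}")

# Build a new string instead
mutated = "G" + dna[1:]
print(dna)       # ATGCGATCG (unchanged)
print(mutated)   # GTGCGATCG
```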

# startswith() and endswith() -- check sequence boundaries
cds = "ATGAAACCCGGGTAA"
print(f"Starts with ATG (start codon)? {cds.startswith('ATG')}")
print(f"Ends with TAA (stop codon)?    {cds.endswith('TAA')}")

# Check for any stop codon
has_stop = cds.endswith(("TAA", "TAG", "TGA"))
print(f"Ends with any stop codon?      {has_stop}")
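
The boundary checks can be combined into a single rough sanity test for a coding sequence. `looks_like_cds` and `STOP_CODONS` are hypothetical names introduced for this sketch, and the check deliberately ignores details such as in-frame internal stop codons.

```python
STOP_CODONS = ("TAA", "TAG", "TGA")

def looks_like_cds(seq):
    """Rough check: starts with ATG, ends with a stop codon, length divisible by 3."""
    seq = seq.upper()
    return (
        seq.startswith("ATG")
        and seq.endswith(STOP_CODONS)   # endswith() accepts a tuple of suffixes
        and len(seq) % 3 == 0
    )

print(looks_like_cds("ATGAAACCCGGGTAA"))   # True
print(looks_like_cds("ATGAAACCCGGGTA"))    # False
```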

# split() and join() -- essential for parsing biological file formats

# Parse a FASTA header
header = ">sp|P04637|P53_HUMAN Cellular tumor antigen p53"
parts = header.split("|")
print(f"Database: {parts[0][1:]}")
print(f"Accession: {parts[1]}")
print(f"Entry name: {parts[2].split()[0]}")

# Join codons with a separator
codons = ["ATG", "AAA", "CCC", "GGG", "TAA"]
formatted = " - ".join(codons)
print(f"\nCodons: {formatted}")

# strip() -- remove whitespace (critical when reading files)
messy_line = "  ATGCGATCG  \n"
clean = messy_line.strip()
print(f"Before strip: {messy_line!r}")   # !r uses repr(), making spaces and \n visible
print(f"After strip:  {clean!r}")
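
The methods above can be combined to pull apart a single FASTA record held in a string. This is a minimal sketch, not a full parser: `parse_fasta_record` is a hypothetical helper, it assumes exactly one record, and it uses `str.splitlines()` in addition to the methods already shown.

```python
def parse_fasta_record(text):
    """Split one FASTA record into (header, sequence). Hypothetical helper."""
    lines = [line.strip() for line in text.strip().splitlines()]
    header = lines[0][1:]             # drop the leading ">"
    sequence = "".join(lines[1:])     # join sequence lines without separators
    return header, sequence

record = """>sp|P04637|P53_HUMAN Cellular tumor antigen p53
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGP
DEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYPQGLNGTVNLPGRNSFEV
"""

header, sequence = parse_fasta_record(record)
accession = header.split("|")[1]
print(accession)         # P04637
print(sequence[:10])     # MEEPQSDPSV
```

For real files with many records, an established parser is usually a better choice than hand-written string handling.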

Variables and Data Types

Variables

Naming rules and conventions

Multiple assignment

Variables are references, not boxes

Data Types Overview

Integers (`int`)

Floating-Point Numbers (`float`)

Floating-point precision warning

Strings (`str`)

Creating strings

String indexing

String slicing

String immutability

Essential string methods for bioinformatics

Pitfalls
