Simulate DNA assembly methods (Golden Gate, Gibson, BioBrick, LCR) in silico using DnaCauldron. Use when predicting assembly outcomes, validating construct designs, catching assembly errors before wet-lab work, generating assembly reports, or simulating restriction-based and homology-based cloning workflows.
DnaCauldron is a Python framework for simulating DNA assembly reactions computationally. It predicts the outcomes of cloning methods such as Golden Gate (Type IIs restriction enzyme assembly), Gibson Assembly, BioBrick Standard Assembly, BASIC Assembly, and Ligase Cycling Reaction (LCR), all without stepping into a wet lab. By modeling restriction digestion, ligation, homology-based recombination, and overhang compatibility, DnaCauldron predicts the sequences that a given set of parts and assembly instructions should produce.
In silico assembly simulation pays off because wet-lab cloning is expensive, time-consuming, and error-prone. A single Golden Gate assembly with incorrect overhang design can waste days of work and hundreds of dollars in reagents. DnaCauldron catches these design flaws before any physical experiment begins, flagging mismatched overhangs, incompatible restriction sites, insufficient homology regions, and topological errors. The framework can process hundreds of assemblies in seconds, which makes it practical inside high-throughput construct design pipelines.
DnaCauldron is developed by the Edinburgh Genome Foundry and is widely used in synthetic biology laboratories and automated DNA foundries. It integrates with Biopython's SeqRecord ecosystem, reads and writes standard GenBank and FASTA formats, and generates detailed HTML and PDF assembly reports with annotated sequence maps, assembly graphs, and quality metrics. The library supports both simple single-step assemblies and complex multi-level hierarchical assembly plans.
Build and use the provided Docker image for zero-configuration setup:
docker build -t dnacauldron-skill ~/.agents/skills/dnacauldron
docker run --rm -v "$(pwd)":/workspace dnacauldron-skill python /scripts/assembly_wrapper.py --help
Run an assembly simulation:
docker run --rm -v "$(pwd)":/workspace dnacauldron-skill python /scripts/assembly_wrapper.py \
  --method golden_gate \
  --parts parts/*.gb \
  --enzyme BsmBI \
  --output /workspace/results
Verify the environment:
docker run --rm dnacauldron-skill python /scripts/setup_check.py
For a local installation without Docker, install with uv:
uv pip install "dnacauldron[reports]" biopython
Or with pip:
pip install "dnacauldron[reports]" biopython matplotlib pandas
Verify installation:
import dnacauldron as dc
print(dc.__version__)
Golden Gate assembly uses Type IIs restriction enzymes (e.g., BsaI, BsmBI, Esp3I) that cut outside their recognition sequence, generating user-defined 4-base overhangs. Parts are designed so that the overhangs of adjacent fragments are complementary, enabling ordered, scarless, multi-part assembly in a single reaction.
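The cut geometry described above can be checked by hand before running a simulation. The following is a minimal sketch using only the standard library. It assumes the common part layout in which a forward recognition site sits on the left and a reverse-complemented site on the right, each separated from its 4-base overhang by one spacer base. `part_overhangs` and `chain_is_circular` are hypothetical helper names for illustration, not DnaCauldron functions, and the two sequences are made up.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def part_overhangs(seq, site="GGTCTC", spacer=1, overhang_len=4):
    """Return the (left, right) overhangs exposed when the enzyme releases
    the part located between a forward site and a reverse site."""
    fwd = seq.find(site)
    rev = seq.rfind(reverse_complement(site))
    if fwd == -1 or rev == -1 or rev < fwd:
        raise ValueError("expected a forward site followed by a reverse site")
    left_start = fwd + len(site) + spacer   # skip the site and the spacer base
    right_end = rev - spacer                # spacer base sits just left of the reverse site
    left = seq[left_start:left_start + overhang_len]
    right = seq[right_end - overhang_len:right_end]
    return left, right


def chain_is_circular(parts):
    """True if every part's right overhang equals the next part's left
    overhang, with the last part wrapping around to the first."""
    ends = [part_overhangs(p) for p in parts]
    return all(ends[i][1] == ends[(i + 1) % len(ends)][0] for i in range(len(ends)))


# Illustrative BsaI parts: GGTCTC + A + overhang ... overhang + A + GAGACC
example_part = "ATTGGTCTCAAATGATCGATCGTACCAGAGACCTTCA"
closing_part = "ATTGGTCTCATACCGCTAGCTAAATGAGAGACCTTCA"

print(part_overhangs(example_part))
print(chain_is_circular([example_part, closing_part]))
```

A check like this only compares overhang strings. It does not look for internal recognition sites, palindromic overhangs, or near-identical overhangs, all of which a full simulation is better placed to catch.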
Basic Golden Gate assembly:
import dnacauldron as dc
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.SeqFeature import SeqFeature, FeatureLocation
# Define parts with BsaI recognition sites and 4-bp overhangs
# BsaI recognizes GGTCTC and cuts 1/5 downstream
# Part format: ...GGTCTC(N1)XXXX[part_sequence]YYYY(N1)GAGACC...
# where XXXX and YYYY are the 4-bp overhangs and (N1) is one spacer base
# Overhang chain: AATG -> TACC -> AGGT -> AGGC -> AATG (closes the circle)
promoter = SeqRecord(
    Seq("ATTGGTCTCAAATGATCGATCGATCGTAGCTAGCTAGCATCGATCGTTACCAGAGACCTTCA"),
    id="promoter",
    annotations={"topology": "linear"}
)
cds = SeqRecord(
    Seq("ATTGGTCTCATACCATGAAAGGTTTCGCTACCGTTGAAGCGCTGAAATAAAGGTAGAGACCTTCA"),
    id="coding_sequence",
    annotations={"topology": "linear"}
)
terminator = SeqRecord(
    Seq("ATTGGTCTCAAGGTAATCGATCGTAGCTTTTTTTTTTGCATCAGGCAGAGACCTTCA"),
    id="terminator",
    annotations={"topology": "linear"}
)
# The backbone segment to keep lies between the two sites, like any other part
backbone = SeqRecord(
    Seq("ATTGGTCTCAAGGCGTAGCTAGCTAGCATCGATCGATCAATGAGAGACCTTCA"),
    id="backbone",
    annotations={"topology": "circular"}
)
# Create a sequence repository
repository = dc.SequenceRepository(
    collections={
        "parts": {
            "promoter": promoter,
            "coding_sequence": cds,
            "terminator": terminator,
            "backbone": backbone,
        }
    }
)
# Define the Type IIs assembly
assembly = dc.Type2sRestrictionAssembly(
    parts=["promoter", "coding_sequence", "terminator", "backbone"],
    enzyme="BsaI"
)
# Run the simulation
simulation = assembly.simulate(sequence_repository=repository)
# Inspect results
for record in simulation.construct_records:
    print(f"Construct: {record.id}, Length: {len(record)} bp")
    print(f"Sequence (first 60 bp): {str(record.seq[:60])}")
Golden Gate with BsmBI enzyme:
import dnacauldron as dc
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
# BsmBI recognizes CGTCTC and cuts 1/5 downstream
# Each part carries a forward site (CGTCTC) on the left and a reverse site
# (GAGACG) on the right; overhangs AATG -> GCTA -> AATG close the circle
part_1 = SeqRecord(
    Seq("AACGTCTCAAATGATCGATCGATCGTAGCTGCTATGAGACGTTCA"),
    id="part_1",
    annotations={"topology": "linear"}
)
part_2 = SeqRecord(
    Seq("AACGTCTCAGCTAGCATCGATCGATCGATCGATCGATAGCAATGTGAGACGTTCA"),
    id="part_2",
    annotations={"topology": "linear"}
)
repository = dc.SequenceRepository(
    collections={"parts": {"part_1": part_1, "part_2": part_2}}
)
assembly = dc.Type2sRestrictionAssembly(
    parts=["part_1", "part_2"],
    enzyme="BsmBI"
)
simulation = assembly.simulate(sequence_repository=repository)
print(f"Number of constructs: {len(simulation.construct_records)}")
Gibson Assembly joins DNA fragments with overlapping ends (typically 15-40 bp of homology) using a mix of exonuclease, polymerase, and ligase. DnaCauldron simulates this by identifying homology regions at fragment boundaries and predicting the fusion product.
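The idea of identifying homology at fragment boundaries can be illustrated without any library. The sketch below looks for the longest suffix of one fragment that is also a prefix of the next, within the 15-40 bp range mentioned above, and fuses the two fragments across it. The helper names `terminal_homology` and `fuse` and the sequences are hypothetical, and this exact-match approach is a simplification: it is not a description of how DnaCauldron selects homologies internally.

```python
def terminal_homology(upstream, downstream, min_size=15, max_size=40):
    """Length of the longest suffix of `upstream` that is also a prefix of
    `downstream`, restricted to [min_size, max_size]; 0 if there is none."""
    longest = min(max_size, len(upstream), len(downstream))
    for size in range(longest, min_size - 1, -1):
        if upstream[-size:] == downstream[:size]:
            return size
    return 0


def fuse(upstream, downstream, min_size=15, max_size=40):
    """Join two fragments across their shared end, keeping one copy of it."""
    size = terminal_homology(upstream, downstream, min_size, max_size)
    if size == 0:
        raise ValueError("no terminal homology in the allowed size range")
    return upstream + downstream[size:]


overlap = "GATTACAGGCTTCACCGTAAGCTTGCATGC"  # 30 bp shared end (made up)
frag_up = "ATGCCGTTAGCAATCGGTAC" + overlap
frag_down = overlap + "TTGACCGGATATCCGTAGCA"

print(terminal_homology(frag_up, frag_down))
print(len(fuse(frag_up, frag_down)))
```

Highly repetitive ends (for example long runs of the same 4-base motif) make the longest match ambiguous, which is one reason to prefer non-repetitive homology regions in real designs.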
import dnacauldron as dc
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
# Define fragments with overlapping ends (30 bp homology regions)
# The last 30 bp of fragment_a matches the first 30 bp of fragment_b
overlap_ab = "ATCGATCGATCGATCGATCGATCGATCGAT"  # 30 bp overlap
fragment_a = SeqRecord(
Seq("ATGATCGATCGTAGCTAGCTAGCATCGATCG" + overlap_ab),
id="fragment_a",
annotations={"topology": "linear"}
)
fragment_b = SeqRecord(
Seq(overlap_ab + "GCTAGCTAGCATCGATCGATCGATCGATCG"),
id="fragment_b",
annotations={"topology": "linear"}
)
repository = dc.SequenceRepository(
collections={
"parts": {
"fragment_a": fragment_a,
"fragment_b": fragment_b,
}
}
)
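# Define Gibson assembly
# Assumption: these keyword names are accepted by your installed DnaCauldron
# version; some releases configure homology through a `homology_checker`
# argument instead, so check the GibsonAssembly signature before relying on them
assembly = dc.GibsonAssembly(
    parts=["fragment_a", "fragment_b"],
    homology_min_size=15,  # minimum overlap in bp
    homology_max_size=80   # maximum overlap in bp
)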
simulation = assembly.simulate(sequence_repository=repository)
for record in simulation.construct_records:
    print(f"Gibson product: {record.id}, Length: {len(record)} bp")
Multi-fragment Gibson Assembly:
import dnacauldron as dc
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
# Three fragments with pairwise overlaps
overlap_1 = "GCTAGCTAGCATCGATCGATCGATCGATCG" # 30 bp
overlap_2 = "ATCGATCGTAGCTAGCTAGCATCGATCGAT" # 30 bp
frag_1 = SeqRecord(
Seq("ATGATCGATCGATCGATCGTAGCTAGCTAG" + overlap_1),
id="gibson_frag_1",
annotations={"topology": "linear"}
)
frag_2 = SeqRecord(
Seq(overlap_1 + "TTCGAAGCTTGCATGCCTGCAGGTCGACTC" + overlap_2),
id="gibson_frag_2",
annotations={"topology": "linear"}
)
frag_3 = SeqRecord(
Seq(overlap_2 + "CTAGAGTCGACCTGCAGGCATGCAAGCTTCG"),
id="gibson_frag_3",
annotations={"topology": "linear"}
)
repository = dc.SequenceRepository(
collections={
"parts": {
"gibson_frag_1": frag_1,
"gibson_frag_2": frag_2,
"gibson_frag_3": frag_3,
}
}
)
assembly = dc.GibsonAssembly(
parts=["gibson_frag_1", "gibson_frag_2", "gibson_frag_3"],
homology_min_size=20,
homology_max_size=50
)
simulation = assembly.simulate(sequence_repository=repository)
for record in simulation.construct_records:
    print(f"Multi-fragment Gibson product: {record.id}")
    print(f"Length: {len(record)} bp")
BioBrick Standard Assembly uses the four restriction enzymes EcoRI, XbaI, SpeI, and PstI. Parts are flanked by a standard prefix (EcoRI-NotI-XbaI) and suffix (SpeI-NotI-PstI). Assembly joins two BioBrick parts by digesting the upstream part with EcoRI and SpeI and the downstream part with XbaI and PstI, then ligating.
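The reason this works is that SpeI (A^CTAGT) and XbaI (T^CTAGA) leave the same CTAG overhang, so their ends ligate into a mixed ACTAGA scar that neither enzyme can cut again. The sketch below reproduces that junction with plain string operations. It assumes a simplified 20 bp prefix and suffix with the three sites written back to back (the registry's real prefix and suffix contain a few extra bases), and `strip_flanks` and `standard_assembly` are hypothetical helper names, not DnaCauldron functions.

```python
PREFIX = "GAATTCGCGGCCGCTCTAGA"  # EcoRI - NotI - XbaI (simplified)
SUFFIX = "ACTAGTGCGGCCGCCTGCAG"  # SpeI - NotI - PstI (simplified)
XBAI, SPEI = "TCTAGA", "ACTAGT"


def strip_flanks(part):
    """Return the insert of a part that carries the prefix and suffix above."""
    if not (part.startswith(PREFIX) and part.endswith(SUFFIX)):
        raise ValueError("part is not flanked by the expected prefix/suffix")
    return part[len(PREFIX):len(part) - len(SUFFIX)]


def standard_assembly(upstream, downstream):
    """Join two parts the way an EcoRI/SpeI + XbaI/PstI ligation would."""
    # SpeI leaves ...A + CTAG overhang; XbaI leaves CTAG overhang + A...
    scar = SPEI[0] + "CTAG" + XBAI[-1]  # "ACTAGA"
    return PREFIX + strip_flanks(upstream) + scar + strip_flanks(downstream) + SUFFIX


rbs_like = PREFIX + "AAAGAGGAGAAA" + SUFFIX   # made-up inserts
cds_like = PREFIX + "ATGCGTAAAGGA" + SUFFIX
composite = standard_assembly(rbs_like, cds_like)

print(composite)
print(composite.count(XBAI), composite.count(SPEI))
```

The composite keeps exactly one XbaI site (in the prefix) and one SpeI site (in the suffix), which is what makes the product a valid part for the next round of assembly.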
import dnacauldron as dc
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
# BioBrick parts with standard prefix and suffix
# Prefix: GAATTC GCGGCCGC TCTAGA (EcoRI - NotI - XbaI)
# Suffix: ACTAGT GCGGCCGC CTGCAG (SpeI - NotI - PstI)
biobrick_prefix = "GAATTCGCGGCCGCTCTAGA"
biobrick_suffix = "ACTAGTGCGGCCGCCTGCAG"
part_rbs = SeqRecord(
Seq(biobrick_prefix + "AAAGAGGAGAAATACTAG" + biobrick_suffix),
id="BBa_B0034",
annotations={"topology": "linear"}
)
part_gfp = SeqRecord(
Seq(biobrick_prefix + "ATGCGTAAAGGAGAAGAACTTTTCACTGGAGTTGTC" + biobrick_suffix),
id="BBa_E0040",
annotations={"topology": "linear"}
)
repository = dc.SequenceRepository(
collections={
"parts": {
"BBa_B0034": part_rbs,
"BBa_E0040": part_gfp,
}
}
)
assembly = dc.BioBrickStandardAssembly(
parts=["BBa_B0034", "BBa_E0040"]
)
simulation = assembly.simulate(sequence_repository=repository)
for record in simulation.construct_records:
    print(f"BioBrick composite: {record.id}, Length: {len(record)} bp")
BASIC (Biopart Assembly Standard for Idempotent Cloning) is a linker-based assembly method. DnaCauldron supports BASIC assembly through its extensible architecture.
import dnacauldron as dc
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
# BASIC assembly uses standardized linkers between parts
# Parts are flanked by integrated prefix (iP) and suffix (iS) sequences
basic_part_1 = SeqRecord(
Seq("ATGATCGATCGTAGCTAGCTAGCATCGATCGATCG"),
id="basic_part_1",
annotations={"topology": "linear"}
)
basic_part_2 = SeqRecord(
Seq("GCTAGCTAGCATCGATCGATCGATCGATCGATCGAT"),
id="basic_part_2",
annotations={"topology": "linear"}
)
# BASIC assembly can be set up through the generic assembly interface
# Consult the DnaCauldron documentation for BASIC-specific classes if available
repository = dc.SequenceRepository(
collections={
"parts": {
"basic_part_1": basic_part_1,
"basic_part_2": basic_part_2,
}
}
)
DnaCauldron supports combining multiple assembly methods in a single workflow, enabling hierarchical assembly strategies where Golden Gate produces Level 1 constructs that feed into Gibson reactions for Level 2 assembly.
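In a hierarchical workflow, an assembly can only be simulated once the constructs it consumes exist, so the assemblies have to be processed in dependency order. The sketch below shows one way to derive that order with the standard library's `graphlib` (Python 3.9+). The plan dictionary, its part names, and the `simulation_order` helper are all hypothetical and only illustrate the ordering problem; they do not represent DnaCauldron's own plan format.

```python
from graphlib import TopologicalSorter

# Hypothetical plan: assembly name -> (method, names of the inputs it consumes)
plan = {
    "L2_cassette": ("gibson", ["L1_promoter_gene", "L1_terminator"]),
    "L1_promoter_gene": ("golden_gate", ["L0_promoter", "L0_gene", "L0_backbone_A"]),
    "L1_terminator": ("golden_gate", ["L0_terminator", "L0_backbone_B"]),
}


def simulation_order(plan):
    """Return assembly names ordered so that every assembly comes after
    the assemblies whose products it uses as inputs."""
    # Only inputs that are themselves assemblies create a dependency;
    # Level 0 parts are assumed to be available up front.
    graph = {
        name: {part for part in inputs if part in plan}
        for name, (_method, inputs) in plan.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = simulation_order(plan)
for name in order:
    method, inputs = plan[name]
    print(f"{name}: {method} from {', '.join(inputs)}")
```

`TopologicalSorter` raises `CycleError` if two assemblies depend on each other, which is a cheap way to reject an inconsistent plan before any simulation runs.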
import dnacauldron as dc
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
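# Step 1: Golden Gate assembly of Level 0 parts into a Level 1 construct
# Each part needs its own pair of overhangs; the last part's right overhang
# matches the first part's left overhang so the product can circularize
overhangs = ["AATG", "TACC", "AGGT", "GGCG"]
level0_parts = {}
for i in range(4):
    left = overhangs[i]
    right = overhangs[(i + 1) % len(overhangs)]
    level0_parts[f"L0_part_{i}"] = SeqRecord(
        # GGTCTC + spacer + left overhang ... right overhang + spacer + GAGACC
        Seq(f"ATTGGTCTCA{left}{'ATCG' * 25}{right}AGAGACCTTCA"),
        id=f"L0_part_{i}",
        annotations={"topology": "linear"}
    )
repository = dc.SequenceRepository(
    collections={"parts": level0_parts}
)
# Level 1 Golden Gate
level1_assembly = dc.Type2sRestrictionAssembly(
    parts=list(level0_parts.keys()),
    enzyme="BsaI"
)
level1_sim = level1_assembly.simulate(sequence_repository=repository)
if level1_sim.construct_records:
    print(f"Level 1 assembly produced {len(level1_sim.construct_records)} construct(s)")
    for rec in level1_sim.construct_records:
        print(f"  {rec.id}: {len(rec)} bp")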
DnaCauldron can generate comprehensive HTML and PDF reports with assembly diagrams, sequence annotations, and diagnostic information. This requires the dnacauldron[reports] extras.
import dnacauldron as dc
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
# Set up parts and assembly
# Overhangs AATG -> TACC -> AATG let the two parts close into a circle
promoter = SeqRecord(
    Seq("ATTGGTCTCAAATGATCGATCGATCGTAGCTAGCTAGCATCGATCGTTACCAGAGACCTTCA"),
    id="promoter",
    annotations={"topology": "linear"}
)
gene = SeqRecord(
    Seq("ATTGGTCTCATACCATGAAAGGTTTCGCTACCGTTGAAGCGCTGAAATAAAATGAGAGACCTTCA"),
    id="gene",
    annotations={"topology": "linear"}
)
repository = dc.SequenceRepository(
collections={"parts": {"promoter": promoter, "gene": gene}}
)
assembly = dc.Type2sRestrictionAssembly(
parts=["promoter", "gene"],
enzyme="BsaI"
)
simulation = assembly.simulate(sequence_repository=repository)
# Generate a full assembly report
simulation.write_report(target="/workspace/assembly_report")
# This creates a directory with:
# - assembly_report/report.html (interactive HTML report)
# - assembly_report/assembly_graph.pdf
# - assembly_report/construct_sequences/ (GenBank files)
# - assembly_report/errors/ (if any issues detected)
Writing construct records to GenBank files:
from Bio import SeqIO
# Write each construct to a GenBank file
for i, record in enumerate(simulation.construct_records):
    # Biopython's GenBank writer requires a molecule_type annotation
    record.annotations.setdefault("molecule_type", "DNA")
    output_file = f"/workspace/construct_{i+1}.gb"
    SeqIO.write(record, output_file, "genbank")
    print(f"Wrote construct to {output_file}")
Writing construct records to FASTA:
from Bio import SeqIO
for i, record in enumerate(simulation.construct_records):
    output_file = f"/workspace/construct_{i+1}.fasta"
    SeqIO.write(record, output_file, "fasta")
    print(f"Wrote FASTA to {output_file}")
Simulate a 3-part Golden Gate assembly (promoter + CDS + terminator) into a backbone vector.
import dnacauldron as dc
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import os
# Step 1: Load parts from GenBank files (if available) or define inline
parts_dir = "/workspace/parts"
if os.path.isdir(parts_dir):
    # Load from GenBank files
    part_records = {}
    for gb_file in os.listdir(parts_dir):
        if gb_file.endswith((".gb", ".gbk")):
            record = SeqIO.read(os.path.join(parts_dir, gb_file), "genbank")
            part_records[record.id] = record