Design novel protein therapeutics (binders, enzymes, scaffolds) using AI-guided de novo design. Uses RFdiffusion for backbone generation, ProteinMPNN for sequence design, ESMFold/AlphaFold2 for validation. Use when asked to design protein binders, therapeutic proteins, or engineer protein function.
AI-guided de novo protein design using RFdiffusion backbone generation, ProteinMPNN sequence optimization, and structure validation for therapeutic protein development.
KEY PRINCIPLES:
Therapeutic protein design starts with the target interaction: what binding surface does the design need to cover? A small pocket suits a nanobody or peptide; a large, flat surface calls for a designed protein. Stability, immunogenicity, and manufacturability further constrain the design space.
When uncertain about any scientific fact, SEARCH databases first rather than reasoning from memory. A database-verified answer is always more reliable than a guess.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Apply when the user asks to design protein binders, design therapeutic proteins, or engineer protein function.

| Phase | Steps |
|---|---|
| 1. Target Characterization | Get structure (PDB, EMDB cryo-EM, AlphaFold); identify binding epitope |
| 2. Backbone Generation (RFdiffusion) | Define constraints, generate at least 5 backbones, filter by geometry |
| 3. Sequence Design (ProteinMPNN) | Design at least 8 sequences per backbone; sample with temperature control |
| 4. Structure Validation (ESMFold/AlphaFold2) | Predict structure, compare to the designed backbone, assess pLDDT/pTM |
| 5. Developability Assessment | Aggregation, pI, expression prediction |
| 6. Report Synthesis | Ranked candidates, FASTA, experimental recommendations |
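
Phase 5 lists pI as a developability check, and the tier table below asks for a neutral pI. As a rough sketch, pI can be estimated by bisecting on the Henderson-Hasselbalch net charge. The pKa values here are approximate textbook figures chosen for illustration (an assumption); dedicated tools use different pKa sets and will report somewhat different numbers, so treat the result as a screen rather than a measurement.

```python
# Rough pI estimate for Phase 5 screening (sketch; pKa values are assumptions).

# Approximate pKa values for illustration only; published tables differ.
PKA_N_TERM = 9.0
PKA_C_TERM = 2.0
PKA_POSITIVE = {"K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a linear chain at the given pH."""
    seq = sequence.upper()
    # Free N-terminus contributes positive charge, free C-terminus negative
    charge = 1.0 / (1.0 + 10 ** (ph - PKA_N_TERM))
    charge -= 1.0 / (1.0 + 10 ** (PKA_C_TERM - ph))
    for residue, pka in PKA_POSITIVE.items():
        charge += seq.count(residue) / (1.0 + 10 ** (ph - pka))
    for residue, pka in PKA_NEGATIVE.items():
        charge -= seq.count(residue) / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence: str, tol: float = 1e-4) -> float:
    """Bisect for the pH at which the net charge crosses zero.

    Net charge decreases monotonically with pH; this assumes the zero
    crossing lies within [0, 14], otherwise the result is clamped there.
    """
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid  # still net positive: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


if __name__ == "__main__":
    for seq in ("KKKKKKKK", "DDDDDDDD"):
        print(seq, round(isoelectric_point(seq), 2))
```

Bisection is used because net charge is monotonic in pH, so no derivative or bracketing search is needed.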

Output files:
- [TARGET]_protein_design_report.md: create this first, with section headers
- [TARGET]_designed_sequences.fasta
- [TARGET]_top_candidates.csv

Every design MUST include: Sequence, Length, Target, Method, and Quality Metrics (pLDDT, pTM, MPNN score, binding prediction).
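
A minimal sketch of writing the FASTA and CSV outputs so that each design carries the required fields. The `Design` record, its field names, the FASTA header layout, and the CSV column order are hypothetical choices for illustration, not a format defined by any of the tools; the metric values in the demo are placeholders, not results.

```python
import csv
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class Design:
    """One designed candidate with the fields every design must report (hypothetical schema)."""
    design_id: str
    sequence: str
    target: str
    method: str
    plddt: float
    ptm: float
    mpnn_score: float
    binding_prediction: str  # free-form here; the required format is not specified

    @property
    def length(self) -> int:
        return len(self.sequence)


CSV_COLUMNS = ["design_id", "sequence", "length", "target", "method",
               "plddt", "ptm", "mpnn_score", "binding_prediction"]


def write_fasta(designs, path, width=60):
    """Write sequences as FASTA, wrapping sequence lines at `width` residues."""
    with open(path, "w") as handle:
        for d in designs:
            handle.write(f">{d.design_id} target={d.target} method={d.method} length={d.length}\n")
            for start in range(0, d.length, width):
                handle.write(d.sequence[start:start + width] + "\n")


def write_candidates_csv(designs, path):
    """Write one CSV row per design, including the derived length column."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=CSV_COLUMNS)
        writer.writeheader()
        for d in designs:
            row = asdict(d)          # dataclass fields only; length is a property
            row["length"] = d.length
            writer.writerow(row)


if __name__ == "__main__":
    # Placeholder values only, not real design results
    demo = [Design("design_001", "MKT" * 30, "TARGET", "RFdiffusion+ProteinMPNN",
                   0.0, 0.0, 0.0, "not yet predicted")]
    out_dir = tempfile.mkdtemp()
    write_fasta(demo, os.path.join(out_dir, "TARGET_designed_sequences.fasta"))
    write_candidates_csv(demo, os.path.join(out_dir, "TARGET_top_candidates.csv"))
    print("wrote files to", out_dir)
```

Keeping Length as a derived property means it can never disagree with the stored sequence.
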

| Tool | Purpose | Key Parameters |
|---|---|---|
| NvidiaNIM_rfdiffusion | Backbone generation | diffusion_steps (NOT num_steps) |
| NvidiaNIM_proteinmpnn | Sequence design | pdb_string (NOT pdb) |
| ESMFold_predict_structure | Fast validation | sequence (NOT seq) |
| NvidiaNIM_alphafold2 | High-accuracy validation | sequence, algorithm |
| NvidiaNIM_esm2_650m | Sequence embeddings | sequences, format |

| Tool | Wrong | Correct |
|---|---|---|
| NvidiaNIM_rfdiffusion | num_steps=50 | diffusion_steps=50 |
| NvidiaNIM_proteinmpnn | pdb=content | pdb_string=content |
| ESMFold_predict_structure | seq="MVLS..." | sequence="MVLS..." |
| NvidiaNIM_alphafold2 | seq="MVLS..." | sequence="MVLS..." |
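
The wrong/correct parameter pairs above can be enforced mechanically before a call is dispatched. This is a sketch only: `fix_arguments` and `require_nvidia_key` are hypothetical helpers, the code does not show how the tools themselves are invoked, and it assumes the NVIDIA_API_KEY requirement applies to the NvidiaNIM_* tools.

```python
import os

# Wrong -> correct parameter names, mirroring the wrong/correct table
PARAM_FIXES = {
    "NvidiaNIM_rfdiffusion": {"num_steps": "diffusion_steps"},
    "NvidiaNIM_proteinmpnn": {"pdb": "pdb_string"},
    "ESMFold_predict_structure": {"seq": "sequence"},
    "NvidiaNIM_alphafold2": {"seq": "sequence"},
}


def fix_arguments(tool_name: str, arguments: dict) -> dict:
    """Return a copy of `arguments` with known wrong keys renamed (hypothetical helper)."""
    renames = PARAM_FIXES.get(tool_name, {})
    fixed = {}
    for key, value in arguments.items():
        new_key = renames.get(key, key)
        if new_key in fixed:
            # e.g. both seq= and sequence= were passed; refuse to guess which wins
            raise ValueError(f"{tool_name}: duplicate argument {new_key!r}")
        fixed[new_key] = value
    return fixed


def require_nvidia_key(tool_name: str) -> None:
    """Fail fast if an NvidiaNIM_* tool is about to run without NVIDIA_API_KEY."""
    if tool_name.startswith("NvidiaNIM_") and not os.environ.get("NVIDIA_API_KEY"):
        raise RuntimeError("NVIDIA_API_KEY is not set; NvidiaNIM_* tools need it")


if __name__ == "__main__":
    print(fix_arguments("NvidiaNIM_rfdiffusion", {"num_steps": 50}))
    # prints {'diffusion_steps': 50}
```

Raising on a duplicate key, rather than silently keeping one value, surfaces calls that mix the wrong and correct names.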

The NvidiaNIM_* tools require the NVIDIA_API_KEY environment variable.

| Tool | Purpose | Key Parameters |
|---|---|---|
| PDBe_get_uniprot_mappings | Find PDB structures | uniprot_id |
| RCSBData_get_entry | Download PDB file | pdb_id |
| alphafold_get_prediction | Get AlphaFold DB structure | accession |
| emdb_search | Search cryo-EM maps | query |
| emdb_get_entry | Get entry details | entry_id |
| UniProt_get_entry_by_accession | Get target sequence | accession |
| InterPro_get_protein_domains | Get domains | accession |

| Tier | Criteria |
|---|---|
| T1 (best) | pLDDT >85, pTM >0.8, low aggregation, neutral pI |
| T2 | pLDDT >75, pTM >0.7, acceptable developability |
| T3 | pLDDT >70, pTM >0.65, developability concerns |
| T4 | Failed validation or major developability issues |
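
The tier table can be applied as a small decision function. This sketch assumes developability has already been summarized into one of four hypothetical levels (`Developability`), and that a design missing the T3 thresholds falls to T4. The table gives no numeric cut-offs for aggregation or "neutral" pI, so those stay categorical here.

```python
from enum import Enum


class Developability(Enum):
    """Hypothetical summary levels for the developability column of the tier table."""
    GOOD = "low aggregation, neutral pI"
    ACCEPTABLE = "acceptable developability"
    CONCERNS = "developability concerns"
    MAJOR = "major developability issues"


def assign_tier(plddt: float, ptm: float, dev: Developability,
                passed_validation: bool = True) -> str:
    """Map validation metrics and a developability level to T1-T4."""
    if not passed_validation or dev is Developability.MAJOR:
        return "T4"
    if plddt > 85 and ptm > 0.8 and dev is Developability.GOOD:
        return "T1"
    if plddt > 75 and ptm > 0.7 and dev in (Developability.GOOD, Developability.ACCEPTABLE):
        return "T2"
    if plddt > 70 and ptm > 0.65:
        return "T3"
    # Assumption: anything below the T3 thresholds is treated as failed validation
    return "T4"


def rank(candidates):
    """Sort candidate dicts by tier, then by descending pLDDT and pTM within a tier."""
    return sorted(candidates, key=lambda c: (c["tier"], -c["plddt"], -c["ptm"]))
```

The strict `>` comparisons mirror the table, so a design sitting exactly on a threshold (for example pLDDT 85) drops one tier.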