Predict and analyze protein 3D structure from amino acid sequence using ESMFold and AlphaFold. Covers de novo structure prediction (ESMFold for sequences up to ~800 residues), AlphaFold model retrieval, quality assessment (pLDDT scores), experimental structure comparison (RCSB), variant structural impact (ProtVar), and sequence physicochemical property calculation (ProtParam). Use when asked to predict protein structure from sequence, assess structure quality, compare predictions to experimental structures, or evaluate how mutations affect protein structure.
End-to-end workflow for protein structure prediction starting from a sequence or UniProt accession. Combines ESMFold de novo prediction, AlphaFold database retrieval, experimental structure benchmarking from RCSB, ProtVar variant impact assessment, and ProtParam sequence property calculation.
KEY PRINCIPLES:
AlphaFold tools take a qualifier parameter (UniProt accession).
When uncertain about any scientific fact, search the databases first rather than reasoning from memory. A database-verified answer is more reliable than a guess.
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Apply when users ask:
Not for (use tooluniverse-protein-structure-retrieval instead): retrieval-only tasks where user provides a PDB ID or wants to browse experimental structures without prediction.
| Parameter | Required | Description | Example |
|---|---|---|---|
| sequence | Yes (for ESMFold) | Amino acid sequence (single-letter FASTA) | MVLSPADKTNVK... |
| uniprot_id | Yes (for AlphaFold) | UniProt accession | P04637, P69905 |
| variant | No | Variant notation for structural impact | P04637 R175H, TP53 R175H |
| max_length | No | ESMFold limit: ~800 residues recommended | — |
Phase 0: Input preparation (sequence retrieval if needed)
|
Phase 1: Sequence properties (ProtParam_calculate)
|
Phase 2: De novo prediction (ESMFold_predict_structure)
|
Phase 3: AlphaFold reference (alphafold_get_prediction + alphafold_get_summary)
|
Phase 4: Experimental structure comparison (RCSBAdvSearch_search_structures, RCSBData_get_entry)
|
Phase 5: Variant structural impact (ProtVar_map_variant + ProtVar_get_function) [if variant provided]
|
Phase 6: Quality synthesis and interpretation
Objective: Obtain or verify the protein sequence needed for ESMFold prediction.
If the user provides a sequence, use it directly for ESMFold_predict_structure. Check length:
- More than 800 residues: ESMFold may fail or produce a lower-quality model; recommend using AlphaFold instead.

If the user provides only a UniProt accession, retrieve the sequence with UniProt_get_entry_by_accession:
- accession: UniProt accession
- Read the sequence from the sequence.value field of the response.

Note: If only a name is given (not an accession), first resolve it with UniProt_search or MyGene_query_genes to get the UniProt accession, then fetch the sequence.
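The input checks in this phase can be sketched in plain Python. This is a minimal illustration, not part of any tool: the helper names `prepare_sequence` and `esmfold_ok` are hypothetical, and restricting input to the 20 standard residue codes is an assumption made here for simplicity.

```python
import re

# Assumption: only the 20 standard amino acid codes are accepted.
VALID_AA = set("ACDEFGHIKLMNPQRSTVWY")
ESMFOLD_MAX_LEN = 800  # recommended ESMFold limit used in this workflow


def prepare_sequence(raw: str) -> str:
    """Drop FASTA header lines and whitespace; return an uppercase sequence."""
    lines = [ln.strip() for ln in raw.splitlines()]
    body = [ln for ln in lines if not ln.startswith(">")]
    seq = re.sub(r"\s+", "", "".join(body)).upper()
    if not seq:
        raise ValueError("empty sequence")
    invalid = sorted(set(seq) - VALID_AA)
    if invalid:
        raise ValueError(f"non-standard residue codes: {invalid}")
    return seq


def esmfold_ok(seq: str, max_len: int = ESMFOLD_MAX_LEN) -> bool:
    """True when the sequence is within the recommended ESMFold length."""
    return len(seq) <= max_len


seq = prepare_sequence(">example header\nMVLSPADKTNVK\nAAWG\n")
print(seq, len(seq), esmfold_ok(seq))
```

The cleaned string has no header and no whitespace, which is the form the tool parameter table asks for when calling ESMFold_predict_structure.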
Objective: Calculate physicochemical properties before prediction to contextualize results.
ProtParam_calculate:
- sequence: amino acid sequence string (single-letter code)

Objective: Predict 3D structure from sequence with ESMFold, which is built on Meta's ESM-2 language model.
ESMFold_predict_structure:
- sequence: amino acid sequence string

Call ESMFold_predict_structure with the sequence.

| pLDDT Range | Interpretation | Reliability |
|---|---|---|
| >90 | Very high confidence | Equivalent to experimental quality |
| 70-90 | High confidence | Backbone reliable, side chains approximate |
| 50-70 | Low confidence | Potentially disordered or flexible region |
| <50 | Very low confidence | Likely intrinsically disordered; do not interpret |

| pTM Score | Fold Confidence |
|---|---|
| >0.8 | High confidence global fold |
| 0.5-0.8 | Moderate; some domains may be uncertain |
| <0.5 | Low global fold confidence |
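The two tables above translate directly into a small scoring helper. The sketch below assumes per-residue pLDDT values are already available as a list on the 0-100 scale (if a tool returns 0-1 values, multiply by 100 first). The function names and the minimum segment length of 3 residues are illustrative choices, not something the tools define.

```python
from statistics import mean


def plddt_band(score: float) -> str:
    """Map one pLDDT value (0-100 scale) to the confidence bands above."""
    if score > 90:
        return "very high"
    if score >= 70:
        return "high"
    if score >= 50:
        return "low"
    return "very low"


def ptm_band(ptm: float) -> str:
    """Map a pTM score to the global fold confidence bands above."""
    if ptm > 0.8:
        return "high"
    if ptm >= 0.5:
        return "moderate"
    return "low"


def low_confidence_segments(plddt, threshold=50.0, min_len=3):
    """Return 1-based inclusive (start, end) runs with pLDDT below threshold."""
    segments = []
    start = None
    for i, score in enumerate(plddt, start=1):
        if score < threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i - 1))
            start = None
    # Close a run that extends to the last residue.
    if start is not None and len(plddt) - start + 1 >= min_len:
        segments.append((start, len(plddt)))
    return segments


# Toy values for illustration only, not real prediction output.
scores = [95, 92, 88, 45, 40, 30, 75, 60, 20]
print(round(mean(scores), 1), plddt_band(mean(scores)))
print(low_confidence_segments(scores))
```

Reporting the low-confidence runs alongside the mean keeps a single high average from hiding a disordered stretch.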
Objective: Retrieve precomputed AlphaFold2 model for comparison and higher-accuracy reference.
alphafold_get_prediction:
- qualifier (or alias uniprot_id / uniprot_accession): UniProt accession (e.g., "P04637")

alphafold_get_summary:
- qualifier (or alias uniprot_id / uniprot_accession): UniProt accession

alphafold_get_annotations (optional):
- qualifier: UniProt accession

Call alphafold_get_prediction and alphafold_get_summary with the accession.

Objective: Check whether experimental structures exist in PDB and how predictions compare.
RCSBAdvSearch_search_structures (search by protein/gene name):
- query: protein name or gene symbol
- limit: number of results (default 10)

RCSBData_get_entry (details for a specific PDB ID):
- pdb_id: 4-character PDB identifier

Objective: Assess how a specific amino acid substitution affects the predicted structure.
ProtVar_map_variant:
- variant: string notation like "P04637 R175H" or HGVS notation

ProtVar_get_function:
- accession: UniProt accession
- position: integer residue position
- variant_aa: mutant amino acid (single letter)

Call ProtVar_map_variant to resolve the variant and confirm the position.
Call ProtVar_get_function with the wild-type position to get domain context.

| Tier | Evidence |
|---|---|
| T1 | Clinical/functional data for this exact variant (from ProtVar) |
| T2 | Variant at experimentally characterized active site or binding interface |
| T3 | Computational pathogenicity prediction (PolyPhen, SIFT from ProtVar) |
| T4 | Position in predicted structured region only |
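Before calling the ProtVar tools it helps to split the variant string into the fields ProtVar_get_function expects and to confirm that the stated wild-type residue matches the sequence. The following is a minimal sketch for the simple "<ID> <AA><pos><AA>" form only; HGVS notation is not handled, and the names `Variant`, `parse_variant`, and `check_wild_type` are hypothetical.

```python
import re
from typing import NamedTuple


class Variant(NamedTuple):
    protein_id: str   # UniProt accession or gene symbol, as typed by the user
    wild_type: str    # single-letter wild-type residue
    position: int     # 1-based residue position
    mutant: str       # single-letter mutant residue


_VARIANT_RE = re.compile(r"^(\S+)\s+([A-Z])(\d+)([A-Z])$")


def parse_variant(text: str) -> Variant:
    """Parse strings such as 'P04637 R175H' into their components."""
    match = _VARIANT_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized variant notation: {text!r}")
    protein_id, wild_type, position, mutant = match.groups()
    return Variant(protein_id, wild_type, int(position), mutant)


def check_wild_type(variant: Variant, sequence: str) -> bool:
    """True when the sequence carries the stated wild-type residue."""
    if not 1 <= variant.position <= len(sequence):
        return False
    return sequence[variant.position - 1] == variant.wild_type


v = parse_variant("P04637 R175H")
print(v.protein_id, v.position, v.mutant)
```

A mismatch from `check_wild_type` usually points to a different isoform or numbering scheme, which is worth resolving before interpreting any structural impact. When the first field is a gene symbol such as TP53, it still needs to be resolved to an accession first.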
Protein summary — name, length, pI, stability index (from ProtParam)
Structure prediction summary table:
| Method | Mean pLDDT | pTM/Global Score | Coverage | Notes |
|---|---|---|---|---|
| ESMFold | X.X | X.X | 100% (full seq) | — |
| AlphaFold | X.X | — | 100% | version vN |
| Experimental (best) | N/A | N/A | XX% | PDB: XXXX, X-ray, X.X Å |
Confidence map — regions of high vs low confidence; highlight disordered regions
Experimental structure comparison — does PDB have coverage? How does prediction align?
Variant impact (if applicable) — domain context, pathogenicity, structural consequence
Recommendations:
| Tool | Key Parameter | Notes |
|---|---|---|
| ESMFold_predict_structure | sequence | Raw amino acid string, no spaces, no FASTA header |
| alphafold_get_prediction | qualifier or uniprot_id | UniProt accession (e.g., "P04637") |
| alphafold_get_summary | qualifier or uniprot_id | Same UniProt accession |
| ProtParam_calculate | sequence | Same sequence string |
| ProtVar_map_variant | variant | Format: "<UniProt_ID> <AA><pos><AA>" e.g., "P04637 R175H" |
| ProtVar_get_function | position | Integer residue number |
| Situation | Fallback |
|---|---|
| ESMFold fails (sequence too long > 800 aa) | Use AlphaFold model only; note length limitation |
| AlphaFold no entry for UniProt ID | Use ESMFold prediction only |
| RCSB search returns no results | Note no experimental structure; proceed with predictions |
| No UniProt accession available | Use ESMFold from raw sequence; skip AlphaFold |
| ProtVar variant not found | Manually assess position from domain annotation in Phase 4 |
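The prediction-related rows of the fallback table can be expressed as a small decision function. This is a sketch under stated assumptions: `Plan` and `choose_methods` are hypothetical names, and the caller is assumed to already know whether an AlphaFold entry exists for the accession.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    methods: list = field(default_factory=list)  # prediction sources to use
    notes: list = field(default_factory=list)    # limitations to report


def choose_methods(seq_len: int, has_accession: bool,
                   alphafold_entry_found: bool = True,
                   max_len: int = 800) -> Plan:
    """Pick prediction sources following the fallback table above."""
    plan = Plan()
    if seq_len <= max_len:
        plan.methods.append("ESMFold")
    else:
        plan.notes.append(
            f"sequence length {seq_len} exceeds the ESMFold limit of {max_len}"
        )
    if not has_accession:
        plan.notes.append("no UniProt accession; AlphaFold skipped")
    elif alphafold_entry_found:
        plan.methods.append("AlphaFold")
    else:
        plan.notes.append("no AlphaFold entry for this accession")
    return plan


print(choose_methods(1200, has_accession=True))
print(choose_methods(300, has_accession=False))
```

Collecting the notes separately makes it straightforward to surface each limitation in the final report instead of silently dropping a method.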
| Database | Coverage | What it provides |
|---|---|---|
| ESMFold | Any protein sequence (up to ~800 aa) | De novo structure prediction from sequence alone |
| AlphaFold DB | UniProt proteins, reviewed and unreviewed (>200M entries) | Precomputed predictions with per-residue pLDDT |
| RCSB PDB | ~220,000 experimental structures | Ground-truth experimental coordinates for comparison |
| ProtVar | Human proteins in UniProt | Variant impact, domain context, clinical annotations |
| ProtParam | Any sequence | Physicochemical sequence properties |