Generate diverse lead compounds for a specific protein target using structure-based drug design with MolCraft. Use this skill when: (1) Designing drug candidates for a known protein target (PDB ID or disease name), (2) Generating structurally diverse molecules with optimized binding affinity, (3) Filtering candidates based on user-defined criteria (docking, ADMET, drug-likeness), (4) Iteratively refining leads through regeneration when criteria are not met.
PharMolix · 1,042 stars · 2026/03/19
Category: Computational chemistry
Skill Content
Generate diverse, drug-like lead compounds targeting a specific protein using AI-powered structure-based drug design.
When to Use
User provides a PDB ID or disease name and wants drug candidates
User wants to design molecules for a specific protein target
User needs diverse leads with user-defined property criteria
User wants iterative refinement with regeneration loop
Inputs
| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| target | str | Yes | PDB ID (e.g., "4xli") or disease name |
| num_candidates | int | No | Initial candidates to generate (default: 40) |
| target_leads | int | No | Desired number of final leads (default: 20) |
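Because `target` accepts either a PDB ID or a free-text disease name, the caller has to decide which workflow path applies. The following is a minimal sketch, assuming the classic four-character PDB ID format (a digit 1-9 followed by three letters or digits). `looks_like_pdb_id` and `choose_path` are hypothetical helpers written for illustration, not part of OpenBioMed or MolCraft.

```python
import re

# Assumption: classic PDB IDs are one digit 1-9 followed by three alphanumerics.
_PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def looks_like_pdb_id(target: str) -> bool:
    """Return True if the string has the shape of a classic PDB ID."""
    return _PDB_ID.fullmatch(target.strip()) is not None


def choose_path(target: str) -> str:
    """Return "A" (direct download) or "B" (agent-based discovery)."""
    return "A" if looks_like_pdb_id(target) else "B"
```

A four-character string that happens to match the pattern (for example a year) would be routed to Path A, so a real implementation would probably confirm that the download succeeds before committing to that path.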
User Criteria (Filtering Thresholds)
| Criterion | Default | Description |
| --- | --- | --- |
| docking_threshold | -10.0 | Maximum docking score (kcal/mol), more negative = better |
| qed_min | 0.4 | Minimum QED score (0-1), higher = more drug-like |
| lipinski_min | 4 | Minimum Lipinski rules obeyed (0-4), 4 = no violations |
| side_effects_max | 18 | Maximum SIDER side effect categories predicted |
| similarity_max | 0.7 | Maximum Tanimoto similarity between selected leads |
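One way to carry these thresholds through the workflow is a small dataclass whose defaults mirror the table. This is a sketch only: `LeadCriteria` and `passes` are hypothetical names, and the per-molecule values are passed in as plain numbers rather than read from a molecule object.

```python
from dataclasses import dataclass


@dataclass
class LeadCriteria:
    """User criteria with the defaults listed in the table above."""
    docking_threshold: float = -10.0  # kcal/mol, more negative is better
    qed_min: float = 0.4
    lipinski_min: int = 4
    side_effects_max: int = 18
    similarity_max: float = 0.7  # applied pairwise during diversity selection

    def passes(self, docking_score: float, qed: float,
               lipinski: int, num_side_effects: int) -> bool:
        """Check the four per-molecule thresholds (similarity is pairwise)."""
        return (docking_score <= self.docking_threshold
                and qed >= self.qed_min
                and lipinski >= self.lipinski_min
                and num_side_effects <= self.side_effects_max)
```

Keeping `similarity_max` in the same object but out of `passes` reflects that it constrains pairs of selected leads, not individual molecules.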
Workflow
Phase 1: Target Identification
├── Path A: PDB ID provided → Download structure directly
└── Path B: Disease/target name provided → Agent-based discovery:
    ├── Agent searches web for PDB structures
    ├── Agent examines each PDB's ligands
    ├── Agent searches literature to validate ligand is a true binder
    │   └── Fallback (if 3 search attempts fail):
    │       └── Judge by molecular weight:
    │           • MW ≥ 150 Da → Likely drug-like binder (accept)
    │           • MW 100-150 Da → Fragment (accept with caution)
    │           • MW < 100 Da → Likely solvent/ion (exclude)
    ├── Agent ranks by resolution, returns best PDB ID
    └── If no valid PDB found → Ask user for PDB ID
Phase 2: Structure Preparation
├── Extract protein chains and ligands
└── Define binding pocket (from reference ligand)
Phase 3: De Novo Generation
├── Generate candidates using MolCraft
└── Save candidates to SDF files
Phase 4: Docking
└── Dock all candidates (AutoDock Vina)
Phase 5: Property + ADMET Calculation
├── Drug-likeness: QED, SA, LogP, Lipinski
└── ADMET: BBB penetration, Side effects (SIDER)
Phase 6: Filtering & Diversity Selection
├── Apply user criteria → Filter candidates
├── Greedy diversity selection (Tanimoto)
└── Regeneration check → Iterate if needed
Phase 7: PLIP Interaction Analysis (selected molecules only)
├── Analyze protein-ligand interactions for selected leads
└── Report hydrophobic contacts, H-bonds, π-stacking, salt bridges
Phase 8: Visualization (selected molecules only)
├── 2D molecule structures (RDKit)
└── 3D rotating complex GIF (PyMOL, requires installation)
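The molecular-weight fallback in Phase 1 can be sketched as a three-way classifier. The function name and return labels are hypothetical, and the handling of the boundaries is an assumption: exactly 150 Da is accepted and exactly 100 Da is treated as a fragment.

```python
def classify_ligand_by_mw(mw_da: float) -> str:
    """Fallback used when literature validation fails after 3 searches."""
    if mw_da >= 150.0:
        return "accept"               # likely drug-like binder
    if mw_da >= 100.0:
        return "accept_with_caution"  # fragment-sized
    return "exclude"                  # likely solvent or ion
```

For example, a water molecule (about 18 Da) is excluded, while a typical kinase inhibitor of several hundred Da is accepted.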
Core Implementation
Phase 1-2: Target Retrieval & Pocket Definition
from open_biomed.tools.tool_registry import TOOLS
from open_biomed.data import Pocket
# Download PDB structure
pdb_tool = TOOLS["protein_pdb_request"]
pdb_file, _ = pdb_tool.run(accession="4xli", mode="file_only")
# Extract protein and ligand
extract_tool = TOOLS["extract_molecules_from_pdb_file"]
results, _ = extract_tool.run(pdb_file=pdb_file[0])
# results[0] contains list of (type, chain_id, entity) tuples
protein = [r[2] for r in results[0] if r[0] == "protein"][0]
ligand = [r[2] for r in results[0] if r[0] == "molecule"][0]
# Define pocket from reference ligand
pocket = Pocket.from_protein_ref_ligand(protein, ligand, radius=10.0)
pocket.estimated_num_atoms = ligand.get_num_atoms()
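To illustrate what a ligand-centered pocket with `radius=10.0` means conceptually, the sketch below keeps every residue that has at least one atom within the radius of any ligand atom. This is an assumption about the general idea, not a description of how `Pocket.from_protein_ref_ligand` is implemented; `residues_near_ligand` and its input layout are hypothetical.

```python
import math


def residues_near_ligand(residue_atoms, ligand_coords, radius=10.0):
    """Return IDs of residues with any atom within `radius` Å of any ligand atom.

    residue_atoms: dict mapping a residue ID to a list of (x, y, z) tuples.
    ligand_coords: list of (x, y, z) tuples for the reference ligand.
    """
    near = []
    for res_id, atoms in residue_atoms.items():
        if any(math.dist(a, l) <= radius for a in atoms for l in ligand_coords):
            near.append(res_id)
    return near
```

With a ligand atom at (1, 0, 0), a residue whose closest atom sits at (9, 0, 0) is kept (8 Å away), while one at (30, 0, 0) is dropped.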
Phase 3: Molecule Generation
from open_biomed.core.pipeline import InferencePipeline
from pytorch_lightning import seed_everything
pipeline = InferencePipeline(
    task="structure_based_drug_design",
    model="molcraft",
    model_ckpt="./checkpoints/molcraft/last_updated.ckpt",
    device="cuda:0"
)

candidates = []
for i in range(num_candidates):
    # Use a different seed per run so each sample differs
    seed_everything(i * 1000 + 42)
    outputs = pipeline.run(pocket=pocket)
    # Skip runs that return no molecule
    if outputs and outputs[0] and outputs[0][0]:
        mol = outputs[0][0]
        mol._add_smiles()
        candidates.append(mol)
Phase 6: Filtering & Diversity Selection
similarity_tool = TOOLS["molecule_similarity"]

# Apply user criteria (assumes Phase 4-5 results are attached to each molecule)
filtered = [
    i for i, mol in enumerate(candidates)
    if mol.docking_score <= docking_threshold
    and mol.qed >= qed_min
    and mol.lipinski >= lipinski_min
    and mol.num_side_effects <= side_effects_max
]

# Build a symmetric similarity matrix indexed by position in `filtered`
n = len(filtered)
sim_matrix = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        sim, _ = similarity_tool.run(
            molecule_1=candidates[filtered[i]],
            molecule_2=candidates[filtered[j]])
        sim_matrix[i][j] = sim_matrix[j][i] = sim[0]

# Greedy diversity selection over positions in `filtered`
selected_pos = [0] if n else []
for pos in range(1, n):
    if all(sim_matrix[pos][s] <= similarity_max for s in selected_pos):
        selected_pos.append(pos)

# Map positions back to indices into `candidates`
selected = [filtered[pos] for pos in selected_pos]
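The greedy selection step can be tried in isolation with toy fingerprints. In this sketch each fingerprint is a set of on-bit indices and Tanimoto similarity is computed directly. Both are assumptions for illustration; in the workflow the similarity values come from the `molecule_similarity` tool. `tanimoto` and `greedy_diverse` are hypothetical helper names.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto similarity of two bit sets: |A ∩ B| / |A ∪ B|."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def greedy_diverse(fingerprints, similarity_max=0.7):
    """Keep each item only if it is dissimilar enough to all items kept so far."""
    selected = []
    for i, fp in enumerate(fingerprints):
        if all(tanimoto(fp, fingerprints[s]) <= similarity_max for s in selected):
            selected.append(i)
    return selected
```

Because the pass is greedy, input order decides which member of a similar pair survives. Sorting candidates by docking score beforehand is one possible choice, though the source does not specify an ordering.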
Regeneration Loop
attempts = 0
while len(selected) < target_leads and attempts < max_attempts:
    attempts += 1
    print(f"Only {len(selected)} leads, need {target_leads}")
    print("Options: 1) Generate more, 2) Relax criteria, 3) Accept")
    # User chooses action. user_choice, generate_more, num_additional and
    # max_attempts are placeholders that are not defined in this snippet.
    if user_choice == "generate":
        new_candidates = generate_more(num_additional)
        candidates.extend(new_candidates)
        # Re-run from Phase 4
    elif user_choice == "relax":
        qed_min = max(0.3, qed_min - 0.1)
        side_effects_max += 3
        # Re-filter
    else:
        break  # Accept the current leads
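The "relax" branch can be factored into a small pure function, which makes the floor on `qed_min` easy to check. This is a sketch: the name `relax` and its keyword parameters are hypothetical, the step sizes and the 0.3 floor are taken from the loop above, and the rounding is an addition to keep repeated subtraction from drifting in floating point.

```python
def relax(qed_min: float, side_effects_max: int,
          qed_floor: float = 0.3, qed_step: float = 0.1,
          side_effects_step: int = 3):
    """Loosen the QED and side-effect thresholds by one step."""
    # Round before clamping so 0.4 - 0.1 becomes exactly 0.3
    new_qed = max(qed_floor, round(qed_min - qed_step, 10))
    return new_qed, side_effects_max + side_effects_step
```

Starting from the defaults, one call gives `(0.3, 21)`. Further calls leave `qed_min` at the floor while `side_effects_max` keeps growing, so a real loop may also want an upper bound on side effects.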