Find, characterize, and source small molecules for chemical biology and drug discovery. Covers compound identification (PubChem, ChEMBL), structure search, binding affinity data, ADMET/drug-likeness prediction, and commercial availability (eMolecules, Enamine). Use when asked to find compounds, assess drug-likeness, search by structure, retrieve binding affinities, or source chemicals.
Systematic small molecule identification, characterization, and sourcing using PubChem, ChEMBL, BindingDB, ADMET-AI, SwissADME, eMolecules, and Enamine. Covers the full pipeline from compound name to structure, activity, ADMET properties, and commercial procurement.
Domain Reasoning
Drug-likeness is not a binary property. Lipinski's Rule of 5 was derived from orally administered, passively absorbed drugs and has many well-known exceptions: natural products, macrocycles, PROTACs, and many approved drugs violate one or more rules. The relevant question is not "does this pass Ro5?" but "does this compound's physicochemical profile match the requirements of the target, the intended route of administration, and the therapeutic context?" Focus on the specific requirements, not rigid rules.
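The point above can be made concrete with a small sketch that reports which Lipinski criteria are violated rather than returning a pass/fail flag, and attaches a caveat when the route is not oral. The `Descriptors` class, the function names, and the example values are all hypothetical and for illustration only; real descriptor values should come from SwissADME or ADMET-AI.

```python
from dataclasses import dataclass


@dataclass
class Descriptors:
    """Physicochemical descriptors; values should come from a tool, not memory."""
    mol_weight: float        # molecular weight in Da
    logp: float              # calculated octanol-water logP
    h_bond_donors: int
    h_bond_acceptors: int


def ro5_violations(d: Descriptors) -> list[str]:
    """Return the names of the Lipinski criteria that the compound violates."""
    checks = {
        "MW > 500": d.mol_weight > 500,
        "logP > 5": d.logp > 5,
        "HBD > 5": d.h_bond_donors > 5,
        "HBA > 10": d.h_bond_acceptors > 10,
    }
    return [name for name, violated in checks.items() if violated]


def ro5_note(d: Descriptors, route: str = "oral") -> str:
    """Summarize violations, with a caveat when Ro5 fits the context poorly."""
    violations = ro5_violations(d)
    summary = f"{len(violations)} Ro5 violation(s): {', '.join(violations) or 'none'}"
    if route != "oral":
        summary += f" (Ro5 was derived for oral drugs; weigh lightly for route '{route}')"
    return summary


# Made-up descriptor values for a large, polar molecule (illustration only).
example = Descriptors(mol_weight=785.0, logp=4.1, h_bond_donors=4, h_bond_acceptors=12)
print(ro5_note(example, route="intravenous"))
```

Listing the violated criteria by name keeps the discussion on the specific property that matters for the target and route, instead of on a single yes/no verdict.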
LOOK UP, DON'T GUESS
Compound identity (CID, ChEMBL ID, SMILES): call PubChem_get_CID_by_compound_name and ChEMBL_search_molecules; do not assume IDs from memory.
ADMET properties: run SwissADME_calculate_adme or ADMETAI_predict_* on the actual SMILES; do not estimate logP, TPSA, or bioavailability.
Binding affinities against a target: query ChEMBL_search_activities or BindingDB_get_ligands_by_uniprot; never cite IC50 values from memory.
Commercial availability: check eMolecules_search or Enamine_search_catalog; do not assume availability.
KEY PRINCIPLES:
Resolve identity first - Always get CID and ChEMBL ID before research
SMILES required for property prediction - Extract canonical SMILES from PubChem early
English names in tools - Use IUPAC or common English names; avoid abbreviations in tool calls
BindingDB is often unavailable - Fall back to ChEMBL activities when BindingDB times out
eMolecules/Enamine return URLs - These tools generate search URLs, not direct data; note this to user
COMPUTE, DON'T DESCRIBE
When analysis requires computation (statistics, data processing, scoring, enrichment), write and run Python code via Bash. Don't describe what you would do — execute it and report actual results. Use ToolUniverse tools to retrieve data, then Python (pandas, scipy, statsmodels, matplotlib) to analyze it.
Phase 1: Compound Identity Resolution
```python
# Step 1: Name -> CID (PubChem canonical identity)
PubChem_get_CID_by_compound_name(compound_name="imatinib")
# -> CID: 5291

# Step 2: Get SMILES and properties (needed for all downstream tools)
PubChem_get_compound_properties_by_CID(
    cid="5291",
    properties="MolecularFormula,MolecularWeight,CanonicalSMILES,InChIKey,IUPACName"
)
# -> canonical SMILES, InChIKey (global identifier)

# Step 3: Get ChEMBL ID (for activity data)
ChEMBL_search_molecules(query="imatinib")
# -> ChEMBL ID (e.g., "CHEMBL941")

# Step 4: Get all synonyms (brand names, INN, etc.)
PubChem_get_compound_synonyms_by_CID(cid="5291")
```
ID resolution priority:
Start with PubChem CID (most universal)
Get ChEMBL ID (for bioactivity data)
Use canonical SMILES for structure-based searches and ADMET
Phase 2: Structure-Based Search
Similarity search (find analogs):
```python
PubChem_search_compounds_by_similarity(
    smiles="CANONICAL_SMILES",
    threshold=85  # Tanimoto threshold 0-100; 85 = highly similar
)
# Returns: list of CIDs of similar compounds

ChEMBL_search_similar_molecules(query="CHEMBL941")  # Or SMILES
# Returns: ChEMBL entries sorted by similarity
```
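To show what a Tanimoto threshold on a 0-100 scale means, here is a minimal sketch that computes the coefficient on toy bit sets. The bit sets are invented and are not real molecular fingerprints; the function names are hypothetical, and the actual fingerprint type used by each database is not something this sketch assumes.

```python
def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(union)


def passes_threshold(a: set[int], b: set[int], threshold: int = 85) -> bool:
    """Compare on the 0-100 scale used by the similarity threshold parameter."""
    return tanimoto(a, b) * 100 >= threshold


# Toy bit sets, not real fingerprints.
reference = set(range(20))                      # bits 0..19
close_analog = set(range(19)) | {25}            # shares 19 bits with reference
distant = set(range(10)) | set(range(30, 40))   # shares 10 bits with reference

print(passes_threshold(reference, close_analog))  # 19 shared / 21 total bits
print(passes_threshold(reference, distant))       # 10 shared / 30 total bits
```

Lowering the threshold widens the analog set at the cost of more structurally distant hits, so it is worth stating the threshold used whenever analogs are reported.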
Substructure search (find compounds containing a scaffold):
```python
PubChem_search_compounds_by_substructure(smiles="SCAFFOLD_SMILES")
# Returns: CIDs of compounds containing the scaffold
```
Phase 3: Bioactivity and Binding Affinity
Get all compounds with activity against a target:
```python
# First find target ChEMBL ID
ChEMBL_search_targets(query="EGFR", organism="Homo sapiens")
# -> target_chembl_id, e.g., "CHEMBL203"

ChEMBL_get_target_activities(
    target_chembl_id="CHEMBL203"
)
# Returns: all compounds with binding data against this target
```
BindingDB (when available — often times out):
```python
BindingDB_get_ligands_by_uniprot(uniprot_id="P00533")  # EGFR
# Returns: Ki, IC50, Kd data with literature references
# Note: BindingDB REST API is frequently unavailable; fall back to ChEMBL
```
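The fallback rule can be sketched as a small wrapper that tries one source and switches to the other on an error or an empty result. This is only an illustration under stated assumptions: `fetch_with_fallback` and the two stub functions are hypothetical, and the stubs stand in for the real tool calls, whose exact error behavior is not specified here.

```python
from typing import Any, Callable


def fetch_with_fallback(
    primary: Callable[[], Any],
    fallback: Callable[[], Any],
    primary_name: str = "BindingDB",
    fallback_name: str = "ChEMBL",
) -> tuple[str, Any]:
    """Try the primary source; on error or empty result, use the fallback.

    Returns (source_name, data) so the report can state where values came from.
    """
    try:
        data = primary()
        if data:  # treat None or an empty list as "no usable answer"
            return primary_name, data
    except Exception as exc:  # assumed: timeouts and HTTP errors raise exceptions
        print(f"{primary_name} unavailable ({type(exc).__name__}); using {fallback_name}")
    return fallback_name, fallback()


# Stand-ins for the real tool calls (hypothetical stubs for illustration).
def bindingdb_stub():
    raise TimeoutError("simulated BindingDB timeout")


def chembl_stub():
    return [{"molecule_chembl_id": "CHEMBL_PLACEHOLDER", "pchembl_value": "7.5"}]


source, records = fetch_with_fallback(bindingdb_stub, chembl_stub)
print(source, len(records))
```

Returning the source name alongside the data makes it easy to tell the user which database the affinity values actually came from.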
pChEMBL Value interpretation:
| pChEMBL | IC50 / Ki | Affinity |
|---|---|---|
| >= 9 | <= 1 nM | Very potent |
| >= 7 | <= 100 nM | Potent |
| >= 6 | <= 1 µM | Moderate |
| >= 5 | <= 10 µM | Weak |
| < 5 | > 10 µM | Inactive |
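The bands above follow from the definition of pChEMBL as the negative base-10 logarithm of the molar IC50, Ki, or Kd. A minimal sketch of the conversion and classification follows; the function names are hypothetical, and the input is assumed to be in nM.

```python
import math


def pchembl_from_nm(value_nm: float) -> float:
    """Convert an IC50/Ki/Kd in nM to a pChEMBL-style value: -log10(molar)."""
    if value_nm <= 0:
        raise ValueError("affinity must be a positive concentration in nM")
    # 1 nM = 1e-9 M, so -log10(value_nm * 1e-9) = 9 - log10(value_nm)
    return 9.0 - math.log10(value_nm)


def affinity_class(pchembl: float) -> str:
    """Map a pChEMBL value to the affinity bands used in this document."""
    if pchembl >= 9:
        return "Very potent"
    if pchembl >= 7:
        return "Potent"
    if pchembl >= 6:
        return "Moderate"
    if pchembl >= 5:
        return "Weak"
    return "Inactive"


for nm in (1.0, 100.0, 1_000.0, 10_000.0):
    p = pchembl_from_nm(nm)
    print(f"{nm:>8.0f} nM -> pChEMBL {p:.1f} ({affinity_class(p)})")
```

Working in pChEMBL units makes values from different assays easier to compare on one scale, but the underlying numbers must still come from ChEMBL or BindingDB rather than memory.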
Phase 4: Drug-likeness and ADMET
SwissADME (comprehensive): call SwissADME_calculate_adme or SwissADME_check_druglikeness with the SMILES as a single string, not a list.
Phase 5: Target Prediction
When you have a novel compound and want to predict targets:
```python
SwissTargetPrediction_predict(
    operation="predict",
    smiles="CANONICAL_SMILES"
)
# Returns: predicted protein targets with probability scores
# Note: SwissTargetPrediction uses structure similarity to known drug-target pairs
# May time out for complex molecules
```
Phase 6: Commercial Availability
eMolecules (aggregates 200+ suppliers — returns search URL, not direct data):
```python
eMolecules_search(query="compound_name")
# -> Returns search_url to visit on eMolecules.com

eMolecules_search_smiles(smiles="CANONICAL_SMILES")
# -> Returns URL for exact/similar structure search
```
Enamine (37B+ make-on-demand compounds — returns URL when API unavailable):
```python
Enamine_search_catalog(query="compound_name")
# -> If API available: returns catalog entries with catalog_id, price
# -> If API unavailable: returns search_url for manual search

Enamine_search_smiles(smiles="CANONICAL_SMILES")
# -> Exact or similarity structure search

Enamine_get_libraries()
# -> Lists available Enamine screening collections
```
Note: eMolecules and Enamine APIs frequently return search URLs rather than live data. Present these to the user as "search here" links.
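One way to handle both outcomes is a formatter that reports catalog entries when they exist and otherwise presents the search link. The result shape below is an assumption: a dict with an optional `entries` list (each with `catalog_id` and `price`) and an optional `search_url`. The `entries` key and the function name are hypothetical, and the URLs and values are placeholders.

```python
def describe_sourcing_result(vendor: str, result: dict) -> str:
    """Turn a sourcing tool result into a short message for the user."""
    entries = result.get("entries") or []  # assumed key; adjust to the real payload
    if entries:
        noun = "entry" if len(entries) == 1 else "entries"
        lines = [f"{vendor}: {len(entries)} catalog {noun}"]
        for entry in entries:
            catalog_id = entry.get("catalog_id", "unknown ID")
            price = entry.get("price", "price not listed")
            lines.append(f"  - {catalog_id}: {price}")
        return "\n".join(lines)
    url = result.get("search_url")
    if url:
        # No live data came back, so say so and hand over the link.
        return f"{vendor}: no live catalog data returned. Search here: {url}"
    return f"{vendor}: no results and no search link returned"


# Placeholder payloads for illustration only.
print(describe_sourcing_result("eMolecules", {"search_url": "https://example.org/search?q=test"}))
print(describe_sourcing_result("Enamine", {"entries": [{"catalog_id": "ID-PLACEHOLDER"}]}))
```

Stating explicitly that no live data was returned avoids implying that availability or pricing has been confirmed.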
Tool Parameter Reference
| Tool | Required Params | Notes |
|---|---|---|
| PubChem_get_CID_by_compound_name | compound_name | Returns list of CIDs; take first or most relevant |
| PubChem_get_CID_by_SMILES | smiles | Use canonical SMILES |
| PubChem_get_compound_properties_by_CID | cid, properties | cid as string; properties comma-separated |
| PubChem_search_compounds_by_similarity | smiles | threshold (int 0-100, default 90) |
| PubChem_search_compounds_by_substructure | smiles | Returns CIDs matching scaffold |
| ChEMBL_search_molecules | query | Name, ChEMBL ID, or InChIKey |
| ChEMBL_get_molecule | chembl_id | Full format: "CHEMBL941" not "941" |
| ChEMBL_search_similar_molecules | query | SMILES or ChEMBL ID |
| ChEMBL_search_activities | molecule_chembl_id OR target_chembl_id | Use pchembl_value__gte=6 to drop weak and inactive records (use 7 to keep potent compounds only) |
| ChEMBL_get_drug_mechanisms | drug_chembl_id or drug_name | For approved drugs only |
| ChEMBL_search_targets | query | Add organism="Homo sapiens" to filter human |
| ChEMBL_get_target_activities | target_chembl_id | Returns all ligands for target |
| SwissADME_calculate_adme | operation="calculate_adme", smiles | SMILES as string (not list) |
| SwissADME_check_druglikeness | operation="check_druglikeness", smiles | SMILES as string |
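Two of the formatting notes in the table (full ChEMBL ID format, `cid` passed as a string) are easy to get wrong, so a small normalization step before each call can help. The helper names below are hypothetical and the sketch only checks the identifier's shape, not whether the identifier exists.

```python
import re


def normalize_chembl_id(raw) -> str:
    """Return an ID in full 'CHEMBL941' form, accepting '941' or 'chembl941'."""
    text = str(raw).strip().upper()
    if re.fullmatch(r"[0-9]+", text):
        return f"CHEMBL{text}"
    if re.fullmatch(r"CHEMBL[0-9]+", text):
        return text
    raise ValueError(f"not a ChEMBL ID: {raw!r}")


def normalize_cid(raw) -> str:
    """Return a PubChem CID as a digit-only string, accepting int or str input."""
    text = str(raw).strip()
    if not re.fullmatch(r"[0-9]+", text):
        raise ValueError(f"not a PubChem CID: {raw!r}")
    return text


print(normalize_chembl_id("941"))
print(normalize_cid(5291))
```

Failing fast on a malformed identifier is cheaper than debugging an empty or misleading tool response later in the pipeline.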
Common Patterns
Pattern 1: Full Compound Profile
Input: Compound name (e.g., "imatinib")
Flow:
1. PubChem_get_CID_by_compound_name -> CID
2. ChEMBL_search_molecules -> ChEMBL ID
3. PubChem_get_compound_properties_by_CID -> canonical SMILES + physicochemical props
4. SwissADME_calculate_adme / ADMETAI_predict_* -> ADMET profile
5. ChEMBL_search_activities(molecule_chembl_id) -> binding data
6. ChEMBL_get_drug_mechanisms -> MOA (if approved drug)
Output: Complete compound profile with identity, ADMET, and activity data
Pattern 2: Analog Discovery
Input: Reference compound SMILES
Flow:
1. PubChem_search_compounds_by_similarity(smiles, threshold=85) -> similar CIDs
2. ChEMBL_search_similar_molecules(query=smiles) -> ChEMBL analogs
3. For each hit: PubChem_get_compound_properties_by_CID -> properties
4. SwissADME_check_druglikeness -> filter by drug-likeness
Output: Ranked list of analogs with activity data and drug-likeness scores
Pattern 3: Target-Based Compound Search
Input: Target name (e.g., "EGFR")
Flow:
1. ChEMBL_search_targets(query="EGFR", organism="Homo sapiens") -> target_chembl_id
2. ChEMBL_get_target_activities(target_chembl_id) -> all ligands with Ki/IC50
3. Filter by pchembl_value >= 7 (potent compounds)
4. For top hits: SwissADME_check_druglikeness -> assess drug-likeness
5. eMolecules_search(query=compound_name) -> check commercial availability
Output: Prioritized list of potent, drug-like, commercially available compounds
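Step 3 of the pattern above (filtering by pChEMBL) can be sketched with the standard library alone. The sketch assumes each activity record is a dict carrying `molecule_chembl_id` and `pchembl_value`, with the value possibly a string or missing; the function name and the example records are made up for illustration.

```python
def rank_potent(activities: list[dict], min_pchembl: float = 7.0) -> list[tuple[str, float]]:
    """Keep each compound's best pChEMBL value at or above the cutoff, best first."""
    best: dict[str, float] = {}
    for record in activities:
        molecule = record.get("molecule_chembl_id")
        raw = record.get("pchembl_value")
        if molecule is None or raw is None:
            continue  # many records carry no pChEMBL value
        try:
            value = float(raw)
        except (TypeError, ValueError):
            continue  # skip values that cannot be parsed as numbers
        if value >= min_pchembl and value > best.get(molecule, float("-inf")):
            best[molecule] = value
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


# Made-up records for illustration; real ones come from the ChEMBL tools.
example_records = [
    {"molecule_chembl_id": "MOL_A", "pchembl_value": "8.2"},
    {"molecule_chembl_id": "MOL_A", "pchembl_value": "7.1"},
    {"molecule_chembl_id": "MOL_B", "pchembl_value": "6.5"},
    {"molecule_chembl_id": "MOL_C", "pchembl_value": None},
    {"molecule_chembl_id": "MOL_D", "pchembl_value": "9.0"},
]
for molecule, value in rank_potent(example_records):
    print(molecule, value)
```

Collapsing to one best value per compound is a simplification; when assays differ in type or quality, it may be better to keep all records and summarize them per assay.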