Programmatic access to the PubChem database (via PUG-REST API and PubChemPy) for searching chemical compounds, retrieving physicochemical properties, performing structure similarity/substructure searches, and obtaining bioactivity data.
Flexible compound search by name, CID, SMILES, InChI, or formula.
Property retrieval via PubChem PUG-REST and PubChemPy (e.g., molecular weight, LogP, canonical SMILES).
Structure search: similarity and substructure.
Bioactivity retrieval linked to PubChem BioAssay records.
Rate-limit-aware implementation (respects PubChem’s limit of at most 5 requests per second).
Python function interface for seamless integration into scientific pipelines.
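The rate limit mentioned above can be honoured with a small throttle placed in front of every request. The following is a minimal sketch only, not the code in scripts/pubchem_ops.py; the `RateLimiter` class and its parameters are hypothetical names introduced for illustration.

```python
import time


class RateLimiter:
    """Space out calls so that at most `max_per_second` start per second.

    Hypothetical helper for illustration; not part of pubchem_ops.py.
    """

    def __init__(self, max_per_second=5, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / max_per_second
        self._clock = clock    # injectable so the class can be tested without waiting
        self._sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until at least `min_interval` has passed since the previous call."""
        now = self._clock()
        if self._last_call is not None:
            remaining = self.min_interval - (now - self._last_call)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_call = now


limiter = RateLimiter(max_per_second=5)
for _ in range(3):
    limiter.wait()   # call this immediately before each HTTP request
```

Spacing requests evenly is a conservative choice: it never bursts, at the cost of a small delay even when only a few requests are made.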
Install the required Python packages:
uv pip install pubchempy requests
pubchempy (version: not pinned)
requests (version: not pinned)
Primary module:
scripts/pubchem_ops.py
python -c "from scripts.pubchem_ops import get_properties; print(get_properties(query_value='Aspirin', query_type='name'))"
Or in Python:
from scripts.pubchem_ops import get_properties
result = get_properties(query_value="Aspirin", query_type="name")
print(result)
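For orientation, a property lookup like the one above roughly corresponds to a PUG-REST request of the form `compound/<namespace>/<identifier>/property/<list>/JSON`. The sketch below only builds that URL for the `name` and `cid` namespaces; `build_property_url` is a hypothetical helper, and how `get_properties` constructs its requests internally is not stated in this document. Other query types (for example formula) may need a different path.

```python
from urllib.parse import quote

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_property_url(query_value, query_type="name",
                       properties=("MolecularWeight", "XLogP", "MolecularFormula")):
    """Build a PUG-REST URL that requests a compound property table as JSON.

    Hypothetical helper for illustration; intended for 'name' and 'cid' lookups.
    """
    # Percent-encode the identifier so names containing spaces survive in the path.
    identifier = quote(str(query_value), safe="")
    return (f"{PUG_REST_BASE}/compound/{query_type}/{identifier}"
            f"/property/{','.join(properties)}/JSON")


print(build_property_url("Aspirin"))
print(build_property_url(2244, query_type="cid"))
```

The resulting URL can be fetched with `requests.get(url, timeout=...)`; the network call is left out here so the sketch runs offline.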
python -c "from scripts.pubchem_ops import structure_search; print(structure_search(query_value='CC(=O)OC1=CC=CC=C1C(=O)O', search_type='similarity'))"
Or in Python:
from scripts.pubchem_ops import structure_search
smiles = "CC(=O)OC1=CC=CC=C1C(=O)O"
result = structure_search(query_value=smiles, search_type="similarity")
print(result)
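As a rough illustration of what a structure search involves at the HTTP level, PUG-REST offers synchronous `fastsimilarity_2d` and `fastsubstructure` operations that return matching CIDs. The sketch below builds such a URL from a SMILES string. It is an assumption that `structure_search` uses these operations; `build_structure_search_url`, `threshold`, and `max_records` are hypothetical names chosen for this example.

```python
from urllib.parse import quote, urlencode

PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"

# Maps the search_type values used in this document to PUG-REST operations.
SEARCH_OPERATIONS = {
    "similarity": "fastsimilarity_2d",
    "substructure": "fastsubstructure",
}


def build_structure_search_url(smiles, search_type="similarity",
                               threshold=90, max_records=10):
    """Build a PUG-REST URL that returns CIDs matching a SMILES query.

    Hypothetical helper for illustration; not taken from pubchem_ops.py.
    """
    if search_type not in SEARCH_OPERATIONS:
        raise ValueError(f"search_type must be one of {sorted(SEARCH_OPERATIONS)}")
    options = {"MaxRecords": max_records}
    if search_type == "similarity":
        # Tanimoto similarity threshold in percent; only meaningful for similarity search.
        options["Threshold"] = threshold
    # SMILES contains characters such as '=' and '(' that must be percent-encoded in a path.
    encoded = quote(smiles, safe="")
    operation = SEARCH_OPERATIONS[search_type]
    return f"{PUG_REST_BASE}/compound/{operation}/smiles/{encoded}/cids/JSON?{urlencode(options)}"


print(build_structure_search_url("CC(=O)OC1=CC=CC=C1C(=O)O", "similarity"))
```

SMILES strings that contain `/` or `#` are awkward to pass in a URL path even when encoded; sending the structure in a POST body is the usual workaround and is not shown here.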
python -c "from scripts.pubchem_ops import get_bioactivity; print(get_bioactivity(cid=2244))"
Or in Python:
from scripts.pubchem_ops import get_bioactivity
result = get_bioactivity(cid=2244)
print(result)
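Bioactivity data for a CID is available from PUG-REST as an assay summary (for example `compound/cid/2244/assaysummary/JSON`), which comes back as a table of column names and rows. The sketch below converts such a table into a list of dictionaries and tallies activity outcomes. The payload layout (`Table`, `Columns`, `Row`, `Cell`) is assumed here and should be checked against a real response, the return format of `get_bioactivity` is not documented in this file, and the example rows are placeholders rather than real results.

```python
from collections import Counter


def table_to_records(payload):
    """Turn a column/row table payload into a list of dicts keyed by column name."""
    table = payload.get("Table", {})
    columns = table.get("Columns", {}).get("Column", [])
    return [dict(zip(columns, row.get("Cell", []))) for row in table.get("Row", [])]


def count_outcomes(records, outcome_column="Activity Outcome"):
    """Count how many rows fall under each activity outcome label."""
    return Counter(r.get(outcome_column, "Unknown") for r in records)


# Placeholder payload for illustration only; these are not real assay results.
example_payload = {
    "Table": {
        "Columns": {"Column": ["AID", "CID", "Activity Outcome"]},
        "Row": [
            {"Cell": [1, 2244, "Active"]},
            {"Cell": [2, 2244, "Inactive"]},
            {"Cell": [3, 2244, "Inactive"]},
        ],
    }
}

records = table_to_records(example_payload)
print(count_outcomes(records))
```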
Primary script: scripts/pubchem_ops.py
Data sources / endpoints:
pubchem.ncbi.nlm.nih.gov/rest/pug
PubChemPy
Supported operations:
get_properties: retrieve physicochemical properties by name/CID/SMILES/InChI/formula.
structure_search: perform similarity or substructure search.
get_bioactivity: retrieve assay and bioactivity-related data by CID.
Input constraints:
query_type must match supported types (e.g., name, cid, smiles, inchi, formula).
search_type must be similarity or substructure.
Error handling:
None if compound is not found.
Troubleshooting considerations:
pubchem.ncbi.nlm.nih.gov.
Additional reference:
references/api_reference.md
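The input constraints and the "None if not found" behaviour described above can be combined in one small wrapper. This is a sketch under assumptions, not the actual logic of scripts/pubchem_ops.py: `lookup_or_none` and the `fetch` callable are hypothetical, and `fetch` stands in for whatever performs the real lookup (for instance a thin wrapper around `pubchempy.get_compounds`, which returns a list of matches).

```python
SUPPORTED_QUERY_TYPES = {"name", "cid", "smiles", "inchi", "formula"}


def lookup_or_none(fetch, query_value, query_type="name"):
    """Validate query_type, call `fetch`, and return None when nothing matches.

    `fetch` is any callable taking (query_value, query_type) and returning a
    list of matches; it is injected so this sketch runs without network access.
    """
    if query_type not in SUPPORTED_QUERY_TYPES:
        raise ValueError(
            f"query_type must be one of {sorted(SUPPORTED_QUERY_TYPES)}, got {query_type!r}"
        )
    matches = fetch(query_value, query_type)
    # An empty result is treated as "compound not found".
    return matches[0] if matches else None


def fake_fetch(query_value, query_type):
    """Stand-in lookup used only to demonstrate the wrapper."""
    known = {("Aspirin", "name"): [{"cid": 2244}]}
    return known.get((query_value, query_type), [])


print(lookup_or_none(fake_fetch, "Aspirin"))
print(lookup_or_none(fake_fetch, "no-such-compound"))
```

Raising `ValueError` for an unsupported `query_type` while returning `None` for a missing compound keeps caller mistakes distinct from legitimate empty results.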