Convert between IUPAC names, SMILES strings, molecular formulas, and common names for chemical compounds. Supports SMILES validation, batch processing, structure standardization, and cheminformatics database preparation for drug discovery workflows.
Interconvert chemical structure representations: IUPAC names, SMILES strings, molecular formulas, and common names. Useful for cheminformatics workflows, database standardization, and compound registration in drug discovery and chemical research.
Key Capabilities:
This skill accepts: compound names (common or IUPAC), SMILES strings, or InChI identifiers. Batch input via CSV or plain text list is also supported.
If the request does not involve converting or validating chemical structure identifiers — for example, asking to predict biological activity, perform docking, or interpret spectra — do not proceed. Instead respond:
"Chemical Structure Converter is designed to interconvert chemical identifiers (names, SMILES, formulas). Please provide a compound name or SMILES string. For other cheminformatics tasks, use a more appropriate tool."
python -m py_compile scripts/main.py
python scripts/main.py --help
Fallback: If no identifier is provided, respond: "No chemical identifier provided. Please supply a compound name (--name), SMILES string (--smiles), or IUPAC name (--iupac). Cannot convert without an input identifier."
from scripts.main import ChemicalStructureConverter
converter = ChemicalStructureConverter()
data = converter.name_to_identifiers("aspirin")
# → IUPAC: 2-acetoxybenzoic acid, SMILES: CC(=O)Oc1ccccc1C(=O)O, Formula: C9H8O4, MW: 180.16
| From → To | Use Case |
|---|---|
| Name → SMILES | Literature to database |
| SMILES → IUPAC | Machine to human readable |
| IUPAC → SMILES | Chemical registration |
| SMILES → Formula | Quick MW calculation |
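
The "quick MW calculation" in the last row can be illustrated with a short sketch that computes a molecular weight from a formula string. This is not the code in scripts/main.py; the function name `molecular_weight` and the small `ATOMIC_WEIGHTS` table are assumptions made for illustration, and only flat formulas such as C9H8O4 (no parentheses, charges, or isotopes) are handled.

```python
import re

# Rounded standard atomic weights for a few common elements (illustrative subset).
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "P": 30.974, "S": 32.06, "Cl": 35.45, "Br": 79.904, "I": 126.904,
}

def molecular_weight(formula):
    """Sum atomic weights for a flat formula such as 'C9H8O4'."""
    # Reject anything that is not a plain sequence of element symbols and counts.
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", formula):
        raise ValueError(f"Unsupported formula: {formula!r}")
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"Unknown element: {symbol}")
        # A missing count means one atom of that element.
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total

print(round(molecular_weight("C9H8O4"), 2))  # 180.16 (aspirin)
```

The result agrees with the MW of 180.16 shown for aspirin in the example output above.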
is_valid, message = converter.validate_smiles("CC(=O)Oc1ccccc1C(=O)O")
# → True, "Valid SMILES syntax"
| Check | Example Error |
|---|---|
| Parentheses | C(=O — missing closing |
| Ring closures | C1CC — ring not closed |
| Atom validity | @ — invalid character |
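
The three checks in the table can be sketched as a single pass over the string. This is a simplified illustration, not the actual `validate_smiles` implementation: the function name `check_smiles_syntax` and the allowed-character set are assumptions, and the check is purely syntactic (it does not verify valence or aromaticity).

```python
# Characters accepted outside bracket atoms in this sketch: organic-subset
# element letters, bond symbols, and the dot separator (an assumption).
ALLOWED_OUTSIDE = set("BCNOPSFIlrbcnops=#$:/\\-.")
DIGITS = set("0123456789")

def check_smiles_syntax(smiles):
    """Return (is_valid, message) after a shallow syntax scan."""
    if not smiles:
        return False, "Empty SMILES string"
    depth = 0            # current parenthesis nesting level
    open_rings = set()   # ring-closure labels seen an odd number of times
    in_bracket = False   # True while inside a [...] bracket atom
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            # Bracket-atom contents (including @ and @@) are not inspected here.
            if ch == "]":
                in_bracket = False
            elif ch == "[":
                return False, f"Nested '[' at position {i}"
        elif ch == "[":
            in_bracket = True
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False, f"Unmatched ')' at position {i}"
        elif ch in DIGITS or ch == "%":
            if ch == "%":
                # Two-digit ring closure such as %12.
                label = smiles[i + 1:i + 3]
                if len(label) != 2 or not set(label) <= DIGITS:
                    return False, f"Malformed ring closure at position {i}"
                i += 2
            else:
                label = ch
            open_rings ^= {label}  # first occurrence opens, second closes
        elif ch not in ALLOWED_OUTSIDE:
            return False, f"Invalid character '{ch}' at position {i}"
        i += 1
    if in_bracket:
        return False, "Unclosed '['"
    if depth:
        return False, "Missing closing parenthesis"
    if open_rings:
        return False, f"Ring not closed: {sorted(open_rings)}"
    return True, "Valid SMILES syntax"

print(check_smiles_syntax("CC(=O)Oc1ccccc1C(=O)O"))
print(check_smiles_syntax("C1CC"))
```

In this sketch `@` is rejected only outside bracket atoms, so a chiral atom such as `[C@@H]` still passes.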
for compound in compound_list:
    data = converter.name_to_identifiers(compound)
    if not data:
        print(f"Warning: '{compound}' not found in database")
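
The batch loop can be extended to collect failures and produce a summary line of the form "X/N compounds converted successfully, Y failed". The sketch below is one possible shape: `convert_batch`, `summarize`, and `DEMO_DB` are hypothetical names, and a plain dict stands in for the converter so that the example runs on its own.

```python
def convert_batch(names, lookup):
    """Run lookup(name) for each name; return (results, failed_names).

    lookup is any callable that returns a dict of identifiers, or a falsy
    value (None or an empty dict) when the compound is unknown.
    """
    results, failed = {}, []
    for name in names:
        data = lookup(name)
        if data:
            results[name] = data
        else:
            failed.append(name)
    return results, failed

def summarize(total, failed):
    """Format the batch summary line, listing failed names if any."""
    line = (f"{total - len(failed)}/{total} compounds converted successfully, "
            f"{len(failed)} failed")
    if failed:
        line += f" ({', '.join(failed)})"
    return line

# Stand-in data for the demo; the real lookup would be
# converter.name_to_identifiers from scripts/main.py.
DEMO_DB = {"aspirin": {"smiles": "CC(=O)Oc1ccccc1C(=O)O", "formula": "C9H8O4"}}

names = ["aspirin", "unobtainium"]
results, failed = convert_batch(names, DEMO_DB.get)
print(summarize(len(names), failed))
# 1/2 compounds converted successfully, 1 failed (unobtainium)
```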
# Convert by compound name
python scripts/main.py --name aspirin
# Convert SMILES to IUPAC
python scripts/main.py --smiles "CC(=O)Oc1ccccc1C(=O)O"
# Validate SMILES
python scripts/main.py --smiles "CCO" --validate
# List all compounds
python scripts/main.py --list
| Parameter | Type | Required | Description |
|---|---|---|---|
| --name, -n | string | No | Compound name |
| --smiles, -s | string | No | SMILES string |
| --iupac, -i | string | No | IUPAC name |
| --validate | flag | No | Validate SMILES syntax |
| --list, -l | flag | No | List available compounds |
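
For orientation, the parameter table maps onto an `argparse` declaration roughly as follows. This is a sketch under the assumption that scripts/main.py uses `argparse`; the helper names `build_parser` and `has_identifier` are hypothetical, and the real script may differ.

```python
import argparse

def build_parser():
    """Declare the flags listed in the parameter table (all optional)."""
    parser = argparse.ArgumentParser(
        description="Convert between chemical identifiers")
    parser.add_argument("--name", "-n", help="Compound name")
    parser.add_argument("--smiles", "-s", help="SMILES string")
    parser.add_argument("--iupac", "-i", help="IUPAC name")
    parser.add_argument("--validate", action="store_true",
                        help="Validate SMILES syntax")
    parser.add_argument("--list", "-l", action="store_true",
                        help="List available compounds")
    return parser

def has_identifier(args):
    """True if at least one input identifier was supplied."""
    return any([args.name, args.smiles, args.iupac])

# Parse an explicit argument list so the sketch does not depend on sys.argv.
args = build_parser().parse_args(["--smiles", "CCO", "--validate"])
print(args.smiles, args.validate, has_identifier(args))
```

A check like `has_identifier` is where the "No chemical identifier provided" fallback would be triggered when neither --list nor any identifier flag is given.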
Every final response must make these explicit:

If a compound is not found locally, refer to https://pubchem.ncbi.nlm.nih.gov/compound/{compound_name} and https://www.chemspider.com/Search.aspx?q={compound_name}. For programmatic lookup, query the PubChem REST API: https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{name}/JSON. The script should automatically query this endpoint when a compound is not found locally.

If scripts/main.py fails, report the failure point and provide manual fallback guidance.

For batch input, report a summary: X/N compounds converted successfully, Y failed (list failed compound names).

The database version is recorded as DB_VERSION in scripts/main.py. To add compounds, update the COMPOUND_DB dict and increment DB_VERSION.

SMILES notation quick reference:

- C = aliphatic carbon, c = aromatic carbon
- = = double bond, # = triple bond
- () = branching, [] = bracket atom (explicit hydrogens, charge, isotope)
- @ = anticlockwise, @@ = clockwise (relative to the order in which the neighbors are written; the R/S label depends on CIP priorities, so @ does not always correspond to S)

Known Limitation: Local database contains common compounds only. Integrate PubChem API for production use.
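
A minimal sketch of the PubChem fallback, using only the standard library and the PUG REST endpoint quoted above. The function names `build_pubchem_url` and `fetch_pubchem_record` are hypothetical, and treating HTTP 404 as "not found" is this sketch's assumption about how to handle unknown names; rate limiting and retries are left out.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

PUBCHEM_NAME_URL = (
    "https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/name/{name}/JSON"
)

def build_pubchem_url(name):
    """Build the lookup URL, percent-encoding the compound name."""
    return PUBCHEM_NAME_URL.format(name=urllib.parse.quote(name, safe=""))

def fetch_pubchem_record(name, timeout=10.0):
    """Return the parsed JSON record, or None if PubChem has no match.

    Other HTTP and network errors are re-raised so the caller can report
    the failure point.
    """
    try:
        with urllib.request.urlopen(build_pubchem_url(name),
                                    timeout=timeout) as resp:
            return json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise

print(build_pubchem_url("acetic acid"))
```

A caller would try the local database first and call `fetch_pubchem_record` only on a miss, which keeps offline use working for the common compounds.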