Convert between IUPAC names, SMILES strings, and molecular formulas for chemical compounds. Supports structure validation, identifier interconversion, and cheminformatics data preparation for drug discovery and chemical research workflows.
Interconvert between different chemical structure representations including IUPAC names, SMILES strings, molecular formulas, and common names. Essential for cheminformatics workflows, database standardization, and compound registration in drug discovery and chemical research.
Key Capabilities:
✅ Use this skill when:
❌ Do NOT use when:
Related Skills:
- chemical-storage-sorter, adme-property-predictor
- molecular-docking-predictor, bio-ontology-mapper
Upstream Skills:
- chemical-storage-sorter: Classify chemicals by hazard group before storage registration
- adme-property-predictor: Convert structures to standardized formats before ADME prediction
- safety-data-sheet-reader: Extract chemical names from SDS for structure lookup
Downstream Skills:
- molecular-docking-predictor: Convert compound libraries to 3D structures for docking
- bio-ontology-mapper: Map chemical structures to standardized ontologies (ChEBI, PubChem)
- lab-inventory-tracker: Register standardized chemical identifiers in inventory
Complete Workflow:
Literature/Patent → chemical-structure-converter → adme-property-predictor → molecular-docking-predictor → Hit Selection
Convert chemical structures between different representation formats for database interoperability.
from scripts.main import ChemicalStructureConverter
converter = ChemicalStructureConverter()
# Convert compound name to all available identifiers
chemical_name = "aspirin"
data = converter.name_to_identifiers(chemical_name)
if data:
    print(f"Compound: {chemical_name}")
    print(f"IUPAC Name: {data['iupac']}")
    print(f"SMILES: {data['smiles']}")
    print(f"Formula: {data['formula']}")
    print(f"Molecular Weight: {data['mw']} g/mol")
# Output:
# Compound: aspirin
# IUPAC Name: 2-acetoxybenzoic acid
# SMILES: CC(=O)Oc1ccccc1C(=O)O
# Formula: C9H8O4
# Molecular Weight: 180.16 g/mol
Supported Conversions:
| From → To | Method | Use Case |
|---|---|---|
| Name → SMILES | Database lookup | Literature to database |
| SMILES → IUPAC | Structure recognition | Machine to human readable |
| IUPAC → SMILES | Name parsing | Chemical registration |
| SMILES → Formula | Atom counting | Quick MW calculation |
Best Practices:
Common Issues and Solutions:
Issue: Compound not in local database
Issue: Multiple valid SMILES for same compound
Validate SMILES syntax to ensure structural integrity before computational processing.
from scripts.main import ChemicalStructureConverter
converter = ChemicalStructureConverter()
# Validate SMILES strings
smiles_examples = [
    "CC(=O)Oc1ccccc1C(=O)O",  # Aspirin - valid
    "CCO",                    # Ethanol - valid
    "C(=O",                   # Invalid - unclosed parenthesis
    "C1CCCCC",                # Invalid - unclosed ring
]
for smiles in smiles_examples:
    is_valid, message = converter.validate_smiles(smiles)
    status = "✅ Valid" if is_valid else "❌ Invalid"
    print(f"{smiles:<30} {status}: {message}")
# Output:
# CC(=O)Oc1ccccc1C(=O)O          ✅ Valid: Valid SMILES syntax
# CCO                            ✅ Valid: Valid SMILES syntax
# C(=O                           ❌ Invalid: Mismatched parentheses
# C1CCCCC                        ❌ Invalid: Ring closure error
Validation Checks:
| Check | Description | Example Error |
|---|---|---|
| Parentheses | Matching ( and ) | C(=O - missing closing |
| Brackets | Matching [ and ] | [Na+ - missing closing |
| Ring closures | Matching digits | C1CC - ring not closed |
| Atom validity | Recognized elements | @ - invalid character |
| Valence | Chemical validity | C(C)(C)(C)(C)C - 5 bonds to C |
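The first three checks in the table are purely structural and can be sketched with the standard library alone. The helper below, `check_smiles_syntax`, is a hypothetical illustration and not the code behind `converter.validate_smiles`; it covers only parentheses, brackets, and ring-closure digits, and it reuses the message strings shown in the example output above as an assumption. Atom validity and valence are not checked.

```python
from typing import Tuple


def check_smiles_syntax(smiles: str) -> Tuple[bool, str]:
    """Hypothetical helper: structural checks only, no chemistry."""
    depth = 0            # current parenthesis nesting depth
    in_bracket = False   # True while inside a [...] bracket atom
    open_rings = set()   # ring-closure labels currently left open
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "[":
            if in_bracket:
                return False, "Mismatched brackets"
            in_bracket = True
        elif ch == "]":
            if not in_bracket:
                return False, "Mismatched brackets"
            in_bracket = False
        elif in_bracket:
            pass  # digits here are isotopes, H counts, or charges
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False, "Mismatched parentheses"
        elif ch == "%":
            # Two-digit ring closure label, e.g. %12
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or not label.isdigit():
                return False, "Ring closure error"
            open_rings ^= {label}
            i += 2
        elif ch.isdigit():
            # First occurrence opens the ring, second closes it
            open_rings ^= {ch}
        i += 1
    if in_bracket:
        return False, "Mismatched brackets"
    if depth != 0:
        return False, "Mismatched parentheses"
    if open_rings:
        return False, "Ring closure error"
    return True, "Valid SMILES syntax"


for example in ["CC(=O)Oc1ccccc1C(=O)O", "C(=O", "C1CCCCC", "[Na+"]:
    print(example, check_smiles_syntax(example))
```

A string that passes these checks can still be chemically impossible, so a valence-aware toolkit is still needed before computational processing.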
Best Practices:
Common Issues and Solutions:
Issue: Valid syntax but chemically impossible
Issue: Tautomeric ambiguity
Process multiple chemical structures simultaneously for database standardization.
from scripts.main import ChemicalStructureConverter
converter = ChemicalStructureConverter()
# Batch process compound list
compound_list = [
    "aspirin",
    "caffeine",
    "glucose",
    "ethanol",
    "unknown_compound"
]
results = []
for compound in compound_list:
    data = converter.name_to_identifiers(compound)
    if data:
        results.append({
            'name': compound,
            'iupac': data['iupac'],
            'smiles': data['smiles'],
            'formula': data['formula'],
            'mw': data['mw']
        })
    else:
        print(f"⚠️ Warning: '{compound}' not found in database")
# Display results table
print("\n" + "="*80)
print(f"{'Name':<20} {'Formula':<15} {'MW':<10} {'SMILES'}")
print("="*80)
for r in results:
    print(f"{r['name']:<20} {r['formula']:<15} {r['mw']:<10.2f} {r['smiles'][:40]}")
Best Practices:
Common Issues and Solutions:
Issue: Synonym confusion
Issue: Mixture or salt forms
Extract molecular formulas and calculate basic properties from SMILES or names.
import re
from scripts.main import ChemicalStructureConverter
converter = ChemicalStructureConverter()
# Analyze compound properties
compounds = ["aspirin", "caffeine", "glucose"]
print("Molecular Properties:")
print("-" * 70)
print(f"{'Compound':<15} {'Formula':<12} {'MW (g/mol)':<12} {'Heavy Atoms'}")
print("-" * 70)
for name in compounds:
    data = converter.name_to_identifiers(name)
    if data:
        # Count heavy atoms (non-hydrogen) from the formula.
        # Each match is (element symbol, count); a missing count means 1.
        # Handles simple formulas such as C9H8O4 (no parentheses or hydrate dots).
        formula = data['formula']
        heavy_atoms = sum(
            int(count or 1)
            for element, count in re.findall(r'([A-Z][a-z]?)(\d*)', formula)
            if element != 'H'
        )
        print(f"{name:<15} {data['formula']:<12} {data['mw']:<12.2f} {heavy_atoms}")
Calculated Properties:
| Property | Calculation | Use Case |
|---|---|---|
| Molecular Weight | Sum of atomic weights | Dosing, filtering |
| Heavy Atoms | Non-hydrogen atoms | Size estimation |
| Formula | Atom count from structure | Database indexing |
| Rotatable Bonds | Count rotatable bonds | Flexibility index |
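As a minimal sketch of the "sum of atomic weights" and "non-hydrogen atoms" rows, the snippet below derives both values from a flat molecular formula. The names `parse_formula`, `molecular_weight`, and `heavy_atom_count` are hypothetical, and `ATOMIC_WEIGHTS` is a deliberately small table of rounded standard atomic weights; none of this is taken from `scripts/main.py`. Formulas with parentheses or hydrate notation are out of scope, and rotatable bonds cannot be derived from a formula at all because they depend on connectivity.

```python
import re

# Rounded standard atomic weights (g/mol); extend as needed.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007,
    "O": 15.999, "S": 32.06, "Cl": 35.45,
}


def parse_formula(formula: str) -> dict:
    """Return {element: count} for a flat formula such as 'C9H8O4'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + int(number or 1)
    return counts


def molecular_weight(formula: str) -> float:
    """Sum atomic weights; raises KeyError for elements not in the table."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())


def heavy_atom_count(formula: str) -> int:
    """Count all atoms except hydrogen."""
    return sum(n for el, n in parse_formula(formula).items() if el != "H")


print(f"{molecular_weight('C9H8O4'):.2f}")  # aspirin: 180.16
print(heavy_atom_count("C9H8O4"))           # aspirin: 13
```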
Best Practices:
Common Issues and Solutions:
Issue: Hydrates and solvates
Standardize chemical representations for database consistency.
from datetime import date
from typing import Optional
from scripts.main import ChemicalStructureConverter
def standardize_compound_entry(name: str, converter) -> Optional[dict]:
    """
    Standardize compound entry with all identifiers.
    Returns standardized entry or None if not found.
    """
    data = converter.name_to_identifiers(name)
    if not data:
        return None
    # Create standardized entry
    standardized = {
        'common_name': name.lower(),
        'iupac_name': data['iupac'],
        'smiles': data['smiles'],
        # Placeholder only: a real InChI needs connectivity layers after the formula
        'inchi': f"InChI=1S/{data['formula']}",
        'molecular_formula': data['formula'],
        'molecular_weight': data['mw'],
        # Record the date the entry was standardized (ISO 8601)
        'standardized_date': date.today().isoformat(),
        'source': 'local_database'
    }
    return standardized
# Example usage
converter = ChemicalStructureConverter()
entry = standardize_compound_entry("aspirin", converter)
if entry:
    print("Standardized Entry:")
    for key, value in entry.items():
        print(f"  {key}: {value}")
Standardization Rules:
| Rule | Standard Form | Example |
|---|---|---|
| Common names | Lowercase | "aspirin" not "Aspirin" |
| IUPAC | Full systematic name | "2-acetoxybenzoic acid" |
| SMILES | Canonical | No stereochemistry if unspecified |
| Formula | Hill system | C, H, then alphabetical |
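The Hill ordering rule in the last row can be sketched as follows. `hill_formula` is a hypothetical helper, not part of the converter. It takes an element-count mapping and places carbon first, hydrogen second, and the remaining elements alphabetically; when the compound contains no carbon, the Hill system orders all elements alphabetically, hydrogen included.

```python
def hill_formula(counts: dict) -> str:
    """Format {element: count} as a Hill-system molecular formula."""
    if "C" in counts:
        # Carbon first, then hydrogen (if present), then the rest A-Z
        rest = sorted(el for el in counts if el not in ("C", "H"))
        order = ["C"] + (["H"] if "H" in counts else []) + rest
    else:
        # No carbon: strictly alphabetical, hydrogen included
        order = sorted(counts)
    # A count of 1 is written without a number
    return "".join(f"{el}{counts[el] if counts[el] > 1 else ''}" for el in order)


print(hill_formula({"O": 4, "H": 8, "C": 9}))  # C9H8O4
print(hill_formula({"S": 1, "O": 4, "H": 2}))  # H2O4S
```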
Best Practices:
Common Issues and Solutions:
Issue: Multiple valid representations
Prepare chemical data for import into cheminformatics databases.
import json
from scripts.main import ChemicalStructureConverter
def prepare_database_import(compound_names: list, converter) -> list:
    """
    Prepare compound list for database import.
    Returns list of standardized database records.
    """
    records = []
    for name in compound_names:
        data = converter.name_to_identifiers(name)
        if data:
            record = {
                'compound_id': f"CMPD_{len(records)+1:04d}",
                'common_name': name,
                'iupac_name': data['iupac'],
                'smiles': data['smiles'],
                'molecular_formula': data['formula'],
                'molecular_weight': data['mw'],
                'status': 'active'
            }
            records.append(record)
        else:
            print(f"⚠️ Skipped: {name} (not in database)")
    return records
# Generate database import file
converter = ChemicalStructureConverter()
compounds = ["aspirin", "caffeine", "glucose", "ethanol"]
db_records = prepare_database_import(compounds, converter)
# Export to JSON for database import
with open('chemical_database_import.json', 'w') as f:
    json.dump(db_records, f, indent=2)
print(f"\nExported {len(db_records)} compounds to database import file")
Database Schema Example:
CREATE TABLE compounds (
    compound_id VARCHAR(20) PRIMARY KEY,
    common_name VARCHAR(255),
    iupac_name VARCHAR(500),
    smiles VARCHAR(1000),
    molecular_formula VARCHAR(50),
    molecular_weight DECIMAL(10,4),
    created_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
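One possible way to load records shaped like those from `prepare_database_import` into this schema is sketched below with Python's built-in `sqlite3` module. The choice of SQLite, the `load_records` helper, and `SAMPLE_RECORD` are assumptions for illustration; the document does not name a target database. Note that the records carry a `status` key that has no column in the schema, so the sketch simply does not insert it.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS compounds (
    compound_id VARCHAR(20) PRIMARY KEY,
    common_name VARCHAR(255),
    iupac_name VARCHAR(500),
    smiles VARCHAR(1000),
    molecular_formula VARCHAR(50),
    molecular_weight DECIMAL(10,4),
    created_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
"""

INSERT_SQL = """
INSERT OR REPLACE INTO compounds
    (compound_id, common_name, iupac_name, smiles,
     molecular_formula, molecular_weight)
VALUES
    (:compound_id, :common_name, :iupac_name, :smiles,
     :molecular_formula, :molecular_weight)
"""

# Illustrative record with the same keys as prepare_database_import output
SAMPLE_RECORD = {
    "compound_id": "CMPD_0001",
    "common_name": "aspirin",
    "iupac_name": "2-acetoxybenzoic acid",
    "smiles": "CC(=O)Oc1ccccc1C(=O)O",
    "molecular_formula": "C9H8O4",
    "molecular_weight": 180.16,
    "status": "active",  # not in the schema, ignored by INSERT_SQL
}


def load_records(conn: sqlite3.Connection, records: list) -> int:
    """Create the table if needed, upsert records, return the row count."""
    conn.execute(SCHEMA)
    with conn:  # commits on success, rolls back on error
        conn.executemany(INSERT_SQL, records)
    return conn.execute("SELECT COUNT(*) FROM compounds").fetchone()[0]


conn = sqlite3.connect(":memory:")
print(load_records(conn, [SAMPLE_RECORD]))  # 1
```

Named placeholders keep the insert tied to the record keys rather than to their order, and `INSERT OR REPLACE` makes re-running an import idempotent for a given `compound_id`.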
Best Practices:
Common Issues and Solutions:
Issue: Character encoding problems
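A common cause of corrupted characters in chemical names (Greek letters, primes, and similar) is relying on the platform's default text encoding. The sketch below writes and reads the import file with an explicit UTF-8 encoding; `export_records` and `read_records` are hypothetical helper names, and this is one reasonable mitigation rather than the skill's documented fix.

```python
import json


def export_records(records: list, path: str) -> None:
    """Write records as UTF-8 JSON, keeping non-ASCII characters readable."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2, ensure_ascii=False)


def read_records(path: str) -> list:
    """Read records back using the same explicit encoding."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

With `ensure_ascii=False`, a name such as "α-tocopherol" is stored literally instead of as a `\u03b1` escape, which keeps the file readable for manual review; both forms load back to the same string.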
From compound names to standardized database:
# Step 1: Convert single compound
python scripts/main.py --name aspirin
# Step 2: Validate SMILES
python scripts/main.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --validate
# Step 3: Convert IUPAC to SMILES
python scripts/main.py --iupac "ethanol"
# Step 4: List available compounds
python scripts/main.py --list
Python API Usage:
from scripts.main import ChemicalStructureConverter
import pandas as pd
def process_compound_library(
    compound_list: list,
    output_file: str = "compound_library.csv"
) -> pd.DataFrame:
    """
    Process compound library for cheminformatics analysis.
    Args:
        compound_list: List of compound names
        output_file: Output CSV filename
    Returns:
        DataFrame with standardized compound data
    """
    converter = ChemicalStructureConverter()
    records = []
    not_found = []
    print("Processing compound library...")
    print("="*60)
    for compound in compound_list:
        data = converter.name_to_identifiers(compound)
        if data:
            records.append({
                'name': compound,
                'iupac': data['iupac'],
                'smiles': data['smiles'],
                'formula': data['formula'],
                'mw': data['mw']
            })
            print(f"✅ {compound}")
        else:
            not_found.append(compound)
            print(f"❌ {compound} - not found")
    print("="*60)
    # Create DataFrame
    df = pd.DataFrame(records)
    # Export to CSV
    df.to_csv(output_file, index=False)
    print(f"\nExported {len(df)} compounds to {output_file}")
    if not_found:
        print(f"\n⚠️ {len(not_found)} compounds not found:")
        for comp in not_found:
            print(f"  - {comp}")
    return df
# Process library
library = ["aspirin", "caffeine", "glucose", "ethanol", "unknown_drug"]
df = process_compound_library(library, "my_library.csv")
print("\nLibrary Summary:")
print(f"Total compounds: {len(df)}")
print(f"Average MW: {df['mw'].mean():.2f} g/mol")
print(f"MW range: {df['mw'].min():.2f} - {df['mw'].max():.2f} g/mol")
Expected Output Files:
chemical_data/
├── compound_library.csv # Standardized compound data
├── missing_compounds.txt # List of compounds not found
├── database_import.json # JSON format for database import
└── validation_report.txt # SMILES validation results
Scenario: Converting compound names from publications to SMILES for database entry.
{
  "task": "literature_to_database",
  "source": "Journal article compound list",
  "input_format": "Common names and IUPAC",
  "output_format": "SMILES for database",
  "volume": "50 compounds",
  "quality_check": "Validate all SMILES"
}
Workflow:
Output Example:
Literature Conversion Results:
Total compounds: 50
Successfully converted: 47 (94%)
Manual review needed: 3
- Compound_23: ambiguous name
- Compound_31: salt form unclear
- Compound_45: stereochemistry unspecified
Database ready: 47 compounds exported
Scenario: Preparing compound library for virtual screening pipeline.
{
  "task": "virtual_screening_prep",
  "library_size": "10,000 compounds",
  "source_formats": ["SDF", "SMILES", "MOL"],
  "target_format": "Canonical SMILES",
  "requirements": [
    "Validate all structures",
    "Remove duplicates",
    "Calculate properties",
    "Flag reactive groups"
  ]
}
Workflow:
Output Example:
Virtual Screening Library Preparation:
Input: 10,000 compounds
After validation: 9,847 (153 invalid SMILES removed)
After deduplication: 9,520 (327 duplicates removed)
Property Distribution:
MW range: 150-650 Da
Average MW: 387.5 Da
MW < 500: 8,234 compounds (86%)
Ready for docking: 9,520 compounds
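The deduplication step in this scenario can be sketched as an order-preserving filter. `deduplicate_smiles` is a hypothetical helper that compares strings exactly, so it only removes true duplicates if every SMILES has already been canonicalized by a cheminformatics toolkit; the sketch assumes that step has happened upstream and does not perform it.

```python
def deduplicate_smiles(smiles_list: list) -> list:
    """Keep the first occurrence of each SMILES string, preserving order."""
    seen = set()
    unique = []
    for smiles in smiles_list:
        key = smiles.strip()  # ignore stray whitespace from file parsing
        if key and key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


# "OCC" is also ethanol, but survives because it is a different string
print(deduplicate_smiles(["CCO", "CCO ", "OCC", "CCO"]))  # ['CCO', 'OCC']
```

The surviving "OCC" entry shows why canonical SMILES matter here: two valid spellings of the same molecule are treated as different compounds by plain string comparison.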
Scenario: Extracting and standardizing compounds from patent text.
{
  "task": "patent_extraction",
  "source": "US Patent with IUPAC names",
  "compounds": "25 specific compounds",
  "challenge": "Complex IUPAC names",
  "output": "SMILES for SAR analysis"
}
Workflow:
Output Example:
Patent Compound Extraction:
Patent: US10,XXX,XXX
Compounds extracted: 25
Successfully converted: 22 (88%)
Novel compounds identified: 3
- Compound A: New scaffold
- Compound B: Known scaffold, new substitution
- Compound C: Prodrug of known compound
SAR Table Generated: 22 compounds × 5 properties
Scenario: Standardizing existing chemical inventory with mixed naming.
{
  "task": "inventory_cleanup",
  "current_state": "Mixed naming conventions",
  "compounds": "500 chemicals",
  "issues": [
    "Inconsistent naming",
    "Missing SMILES",
    "Duplicate entries"
  ]
}
Workflow:
Output Example:
Inventory Cleanup Results:
Original entries: 500
Unique compounds: 487 (13 duplicates removed)
Standardization:
- Common names standardized: 487
- SMILES added: 423
- IUPAC names added: 487
- MW calculated: 487
Data Quality Improvement:
Completeness: 65% → 100%
Consistency: 40% → 98%
Pre-Conversion:
During Conversion:
Post-Conversion:
Database Import:
Input Data Issues:
❌ Ambiguous names → Multiple compounds match name
❌ Mixtures and salts → Complex structures unclear
❌ Stereochemistry omitted → Racemic vs pure unclear
❌ Hydrates vs anhydrous → Different molecular weights
Conversion Errors:
❌ Invalid SMILES → Unbalanced parentheses or brackets
❌ Loss of stereochemistry → Chiral centers become racemic
❌ Tautomeric ambiguity → Keto/enol forms differ
❌ Aromaticity errors → Kekulé vs aromatic forms
Database Issues:
❌ Duplicate entries → Same compound multiple times
❌ Character encoding → Special characters corrupted
❌ Missing fields → Required data not populated
❌ Inconsistent formatting → Mixed naming conventions
Problem: Compound not found in database
Problem: SMILES validation fails
Problem: Stereochemistry lost in conversion
Problem: Multiple SMILES for same compound
Problem: Molecular weight mismatch
Available in references/ directory:
External Resources:
Located in scripts/ directory:
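- main.py - Chemical structure conversion and validation engine
SMILES Notation:
- C = aliphatic carbon
- c = aromatic carbon
- = = double bond
- # = triple bond
- () = branching
- [] = bracket atom with explicit charge, isotope, or hydrogen count
- @ = remaining neighbors listed anticlockwise, viewed from the first neighbor
- @@ = remaining neighbors listed clockwise, viewed from the first neighbor
- @ and @@ describe local neighbor order and do not map one-to-one to S and R, which depend on CIP priorities
IUPAC Naming: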
Molecular Formula (Hill System):
| Parameter | Type | Default | Required | Description |
|---|---|---|---|---|
| `--name`, `-n` | string | - | No | Compound name |
| `--smiles`, `-s` | string | - | No | SMILES string |
| `--iupac`, `-i` | string | - | No | IUPAC name |
| `--validate` | flag | - | No | Validate SMILES syntax |
| `--list`, `-l` | flag | - | No | List available compounds |
# Convert by compound name
python scripts/main.py --name aspirin
# Convert SMILES to IUPAC
python scripts/main.py --smiles "CC(=O)Oc1ccccc1C(=O)O"
# Validate SMILES
python scripts/main.py --smiles "CCO" --validate
# List all compounds
python scripts/main.py --list
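For readers who want to wrap or extend the command line, the flags in the parameter table could be declared with `argparse` roughly as below. This is an illustrative sketch only: `build_parser` is a hypothetical name and the actual argument handling in `scripts/main.py` may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the five documented options; all are optional."""
    parser = argparse.ArgumentParser(
        description="Chemical structure converter (illustrative sketch)"
    )
    parser.add_argument("--name", "-n", help="Compound name")
    parser.add_argument("--smiles", "-s", help="SMILES string")
    parser.add_argument("--iupac", "-i", help="IUPAC name")
    parser.add_argument("--validate", action="store_true",
                        help="Validate SMILES syntax")
    parser.add_argument("--list", "-l", action="store_true",
                        help="List available compounds")
    return parser


# Equivalent to: python scripts/main.py --smiles "CCO" --validate
args = build_parser().parse_args(["--smiles", "CCO", "--validate"])
print(args.smiles, args.validate, args.list)  # CCO True False
```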
| Risk Indicator | Assessment | Level |
|---|---|---|
| Code Execution | Python script executed locally | Low |
| Network Access | No external API calls | Low |
| File System Access | No file access | Low |
| Data Exposure | No sensitive data | Low |
# Python 3.7+
# No additional packages required for scripts/main.py (uses standard library)
# pandas is needed only for the process_compound_library example
Last Updated: 2026-02-09
Skill ID: 185
Version: 2.0 (K-Dense Standard)