Analyze data with `adme-property-predictor` using a reproducible workflow, explicit validation, and structured outputs for review-ready interpretation.
- `adme-property-predictor` using a reproducible workflow, explicit validation, and structured outputs for review-ready interpretation.
- `scripts/main.py`.
- `references/` for task-specific guidance.
- Python: 3.10+. Repository baseline for current packaged skills.
- dataclasses: unspecified. Declared in requirements.txt.
- rdkit: unspecified. Declared in requirements.txt.

cd "20260318/scientific-skills/Data Analytics/adme-property-predictor"
python -m py_compile scripts/main.py
python scripts/main.py --help
Example run plan:
- `CONFIG` block or documented parameters if the script uses fixed settings.
- Run `python scripts/main.py` with the validated inputs.
- See `## Workflow` above for related details.
- `scripts/main.py`.
- `references/` contains supporting rules, prompts, or checklists.

Use this command to verify that the packaged script entry point can be parsed before deeper execution.
python -m py_compile scripts/main.py
Use these concrete commands for validation. They are intentionally self-contained and avoid placeholder paths.
python -m py_compile scripts/main.py
# Example invocation: python scripts/main.py --help
# Example invocation: python scripts/main.py --smiles "CC(=O)Oc1ccccc1C(=O)O" --format json
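The same parse check can be driven from Python, which is convenient inside a test or a pre-flight step. The sketch below uses only the standard-library `py_compile` module; the helper name `entry_point_compiles` is hypothetical and not part of the packaged skill.

```python
import py_compile
from pathlib import Path


def entry_point_compiles(script_path: str) -> bool:
    """Return True if the script exists and byte-compiles, False otherwise."""
    path = Path(script_path)
    if not path.is_file():
        return False
    try:
        # doraise=True raises PyCompileError on a syntax error
        # instead of only printing a message.
        py_compile.compile(str(path), doraise=True)
    except py_compile.PyCompileError:
        return False
    return True
```

Calling `entry_point_compiles("scripts/main.py")` from the skill directory mirrors `python -m py_compile scripts/main.py`. It only confirms that the file parses, not that the script runs correctly.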
Comprehensive pharmacokinetic prediction tool that assesses drug-likeness and ADME properties of small molecules using validated cheminformatics models, molecular descriptors, and structure-property relationships.
Key Capabilities:
Upstream Skills:
- `chemical-structure-converter`: Convert between SMILES, InChI, MOL formats
- `lipinski-rule-filter`: Initial rule-based drug-likeness screening
- `chemical-structure-converter`: Generate 3D conformers for structure-based predictions
- `smiles-de-salter`: Remove salt counterions before analysis

Downstream Skills:

- `drug-candidate-evaluator`: Multi-parameter optimization including ADME
- `toxicity-structure-alert`: Assess safety alongside ADME
- `target-novelty-scorer`: Evaluate target uniqueness for selected candidates
- `biotech-pitch-deck-narrative`: Create investor materials with PK data

Complete Workflow:
Chemical Structure Converter (prepare structures) →
Lipinski Rule Filter (initial filtering) →
ADME Property Predictor (this skill, detailed PK) →
Drug Candidate Evaluator (integrated scoring) →
Toxicity Structure Alert (safety check)
Predict intestinal absorption, solubility, and permeability:
from scripts.adme_predictor import ADMEPredictor
predictor = ADMEPredictor()
# Predict absorption properties
absorption = predictor.predict_absorption(
    smiles="CC(=O)Oc1ccccc1C(=O)O",  # Aspirin
    properties=["all"],  # or specific: ["hia", "caco2", "solubility"]
)
print(absorption.summary())
Predicted Properties:
| Property | Model | Units | Interpretation |
|---|---|---|---|
| HIA | ML + physicochemical | % | Human intestinal absorption; >80% good |
| Caco-2 | QSPR | 10⁻⁶ cm/s | Permeability; >70 high, <25 low |
| Solubility | QSPR | mg/mL | Aqueous solubility; >0.1 mg/mL acceptable |
| LogS | QSPR | log(mol/L) | Log of aqueous solubility; >-4 acceptable |
| Lipinski Pass | Rule-based | boolean | Satisfies Lipinski's Rule of Five (four criteria) |
| Veber Pass | Rule-based | boolean | PSA ≤140 Å², rotatable bonds ≤10 |
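The two rule-based rows in the table can be illustrated without any cheminformatics library once descriptors are available. The following is a minimal sketch, assuming precomputed descriptor values; the `Descriptors` container and both function names are hypothetical and do not describe how the skill's own models are implemented. The thresholds are the commonly cited Rule of Five and Veber limits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed physicochemical descriptors (hypothetical container)."""
    mol_weight: float       # g/mol
    logp: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int
    tpsa: float             # polar surface area in square angstroms
    rotatable_bonds: int


def lipinski_violations(d: Descriptors) -> int:
    """Count violations of the four Rule of Five criteria."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_bond_donors > 5,
        d.h_bond_acceptors > 10,
    ]
    return sum(checks)


def veber_pass(d: Descriptors) -> bool:
    """Veber criteria: PSA <= 140 and at most 10 rotatable bonds."""
    return d.tpsa <= 140 and d.rotatable_bonds <= 10


# Approximate literature values for aspirin, for illustration only.
aspirin = Descriptors(180.16, 1.2, 1, 4, 63.6, 3)
print(lipinski_violations(aspirin), veber_pass(aspirin))
```

Returning a violation count rather than a boolean leaves the pass criterion (zero violations, or at most one) to the caller.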
Best Practices:
Common Issues and Solutions:
Issue: Lipinski pass but poor solubility
Issue: Caco-2 predicts high absorption but HIA low
Predict tissue distribution, protein binding, and brain penetration:
# Predict distribution properties
distribution = predictor.predict_distribution(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    properties=["vd", "ppb", "bbb"],
)
# Access specific predictions
vd = distribution.volume_of_distribution
bbb = distribution.blood_brain_barrier
ppb = distribution.plasma_protein_binding
Predicted Properties:
| Property | Model | Units | Interpretation |
|---|---|---|---|
| Vd | QSPR | L/kg | Volume of distribution; 0.1-10 typical |
| PPB | ML | % | Plasma protein binding; >90% high, <50% low |
| BBB | LogBB | unitless | Brain penetration; >0.3 penetrant |
| fu | Calculated | fraction | Free (unbound) fraction; 1 - PPB/100 |
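The interpretation column above reduces to a few lines of arithmetic. This sketch applies the `fu = 1 - PPB/100` relation and the table's cut-offs; the function names and the "moderate" label for the 50-90% band are assumptions made here for illustration.

```python
def fraction_unbound(ppb_percent: float) -> float:
    """Free fraction from plasma protein binding given in percent."""
    if not 0.0 <= ppb_percent <= 100.0:
        raise ValueError("PPB must be between 0 and 100 percent")
    return 1.0 - ppb_percent / 100.0


def classify_ppb(ppb_percent: float) -> str:
    """Apply the table's cut-offs: >90% high, <50% low."""
    if ppb_percent > 90.0:
        return "high"
    if ppb_percent < 50.0:
        return "low"
    return "moderate"  # label for the in-between band is an assumption


def is_bbb_penetrant(log_bb: float) -> bool:
    """LogBB above 0.3 is read as brain penetrant, per the table."""
    return log_bb > 0.3
```

For example, a compound that is 95% bound has a free fraction of about 0.05 and falls in the "high" binding class.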
Best Practices:
Common Issues and Solutions:
Issue: BBB predictions unreliable for certain chemotypes
Issue: PPB overestimated for acidic drugs
Predict metabolic stability, CYP interactions, and liability sites:
# Predict metabolism properties
metabolism = predictor.predict_metabolism(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    include_site_prediction=True,
)
# Check CYP interactions
cyp_profile = metabolism.cyp_profile
stability = metabolism.metabolic_stability
Predicted Properties:
| Property | Model | Output | Interpretation |
|---|---|---|---|
| CYP Inhibition | ML | IC50 or class | Potential DDI; <1 μM high risk |
| CYP Substrate | Classification | Boolean/Probability | Metabolized by specific CYP |
| Stability | ML | T1/2 or class | Microsomal/hepatocyte stability |
| Liability Sites | Reactivity models | Atom indices | Soft spots for metabolism |
| MAO Substrate | Classification | Boolean | Monoamine oxidase substrate |
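As a small illustration of the "<1 μM high risk" reading for CYP inhibition, the sketch below flags isoforms whose predicted IC50 falls under that threshold. The function name and the IC50 values are made up for the example; the structure of the skill's real `cyp_profile` object is not documented here.

```python
def high_risk_cyp_isoforms(
    ic50_um: dict[str, float], threshold_um: float = 1.0
) -> list[str]:
    """Return isoforms with predicted IC50 below the threshold (in μM)."""
    return sorted(
        isoform for isoform, value in ic50_um.items() if value < threshold_um
    )


# Made-up IC50 values in μM, for illustration only.
example_profile = {"CYP3A4": 0.4, "CYP2D6": 12.0, "CYP2C9": 0.9}
flagged = high_risk_cyp_isoforms(example_profile)
```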
Best Practices:
Common Issues and Solutions:
Issue: False negatives for time-dependent inhibition (TDI)
Issue: Metabolic site prediction shows multiple hotspots
Predict clearance routes and elimination kinetics:
# Predict excretion properties
excretion = predictor.predict_excretion(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    properties=["clearance", "half_life", "route"],
)
# Access predictions
clearance = excretion.clearance_ml_min_kg
t12 = excretion.half_life_hours
route = excretion.primary_route
Predicted Properties:
| Property | Model | Units | Interpretation |
|---|---|---|---|
| CL | QSPR | mL/min/kg | Clearance; <5 low, 5-15 moderate, >15 high |
| T1/2 | QSPR | hours | Half-life; 2-8h typical for oral drugs |
| Route | Classification | renal/biliary/mixed | Primary excretion pathway |
| LogD | QSPR | unitless | Distribution coefficient; affects clearance |
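Clearance, volume of distribution, and half-life are linked by the standard one-compartment relation t1/2 = ln(2) · Vd / CL, which gives a quick consistency check between predicted values. The sketch below applies that relation with the units used in these tables, plus the clearance bands from the table; it is a sanity check under a one-compartment assumption, not a description of the skill's QSPR models, and the function names are hypothetical.

```python
import math


def classify_clearance(cl_ml_min_kg: float) -> str:
    """Apply the table's bands: <5 low, 5-15 moderate, >15 high."""
    if cl_ml_min_kg < 5.0:
        return "low"
    if cl_ml_min_kg <= 15.0:
        return "moderate"
    return "high"


def half_life_hours(vd_l_per_kg: float, cl_ml_min_kg: float) -> float:
    """One-compartment half-life from Vd (L/kg) and CL (mL/min/kg)."""
    if cl_ml_min_kg <= 0:
        raise ValueError("clearance must be positive")
    vd_ml_per_kg = vd_l_per_kg * 1000.0            # L/kg -> mL/kg
    t_half_min = math.log(2) * vd_ml_per_kg / cl_ml_min_kg
    return t_half_min / 60.0                        # minutes -> hours
```

With Vd = 1 L/kg and CL = 10 mL/min/kg this gives roughly 1.16 hours. A predicted half-life far from this estimate suggests that the three predictions should be reviewed together.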
Best Practices:
Common Issues and Solutions:
Issue: Clearance predictions highly variable
Issue: Route prediction contradicts structure
Overall assessment combining all ADME properties:
# Generate comprehensive drug-likeness score
druglikeness = predictor.calculate_druglikeness(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    methods=["qed", "muegge", "golden_triangle"],
)
# Multi-parameter optimization
# Targets are written as strings (an assumption about the accepted format);
# a bare >80 or <0.3 is not valid Python.
mpo_score = predictor.mpo_score(
    smiles="CC(=O)Oc1ccccc1C(=O)O",
    target_profile={"hia": ">80", "bbb": "<0.3", "t12": "2-8h"},
)
Scoring Methods:
| Method | Description | Range | Good Score |
|---|---|---|---|
| QED | Quantitative Estimation of Drug-likeness | 0-1 | >0.6 |
| Muegge | Bioavailability score | 0-6 | >4 |
| MPO | Multi-Parameter Optimization | 0-10 | >6 |
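A target profile for multi-parameter optimization can be expressed as simple threshold strings and checked against predicted values. The sketch below is one possible interpretation, assuming specs of the form `">80"`, `"<0.3"`, or `"2-8h"`; the parser, the function names, and the unweighted "fraction of targets met" score are all assumptions and are not the skill's MPO scoring method.

```python
import re
from typing import Callable

_NUM = r"(-?\d+(?:\.\d+)?)"


def parse_target(spec: str) -> Callable[[float], bool]:
    """Turn '>80', '<0.3' or '2-8h' into a predicate on a numeric value."""
    spec = spec.strip()
    one_sided = re.fullmatch(rf"([<>])\s*{_NUM}\s*[a-zA-Z%]*", spec)
    if one_sided:
        op, bound = one_sided.group(1), float(one_sided.group(2))
        return (lambda v: v > bound) if op == ">" else (lambda v: v < bound)
    ranged = re.fullmatch(rf"{_NUM}\s*-\s*{_NUM}\s*[a-zA-Z%]*", spec)
    if ranged:
        low, high = float(ranged.group(1)), float(ranged.group(2))
        return lambda v: low <= v <= high  # ranges are treated as inclusive
    raise ValueError(f"unrecognised target spec: {spec!r}")


def profile_match(predicted: dict[str, float], target_profile: dict[str, str]) -> float:
    """Fraction of targets met by the predicted values (0.0 to 1.0)."""
    if not target_profile:
        raise ValueError("target_profile is empty")
    met = sum(
        parse_target(spec)(predicted[name])
        for name, spec in target_profile.items()
    )
    return met / len(target_profile)
```

A compound with HIA 92%, LogBB 0.5, and a 4-hour half-life meets two of the three targets in `{"hia": ">80", "bbb": "<0.3", "t12": "2-8h"}`.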
Best Practices:
Common Issues and Solutions:
Issue: Drug-likeness score conflicts with project needs
Analyze compound libraries efficiently:
# Batch process library
results = predictor.batch_predict(
    input_file="library.smi",  # SMILES file
    properties=["all"],
    output_format="csv",
    n_workers=4,  # Parallel processing
)
# Filter by criteria
filtered = results.filter(
    lipinski_pass=True,
    hia__gt=80,
    t12__between=(2, 8),
)
# Rank by multi-parameter score
ranked = results.rank(by="mpo_score", ascending=False)
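The filter call above uses double-underscore suffixes such as `hia__gt` and `t12__between`. The following sketch shows how such lookups could be interpreted over plain dictionaries; it is an illustration of the naming convention only, with hypothetical function names and made-up records, and it does not describe the actual `results.filter` implementation.

```python
import operator

_LOOKUPS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
    "between": lambda value, bounds: bounds[0] <= value <= bounds[1],
}


def matches(record: dict, **criteria) -> bool:
    """Check one record against field or field__lookup criteria."""
    for key, expected in criteria.items():
        field, _, lookup = key.partition("__")
        value = record[field]
        if lookup == "":
            if value != expected:       # plain equality, e.g. lipinski_pass=True
                return False
        elif lookup not in _LOOKUPS:
            raise ValueError(f"unsupported lookup: {lookup!r}")
        elif not _LOOKUPS[lookup](value, expected):
            return False
    return True


def filter_records(records: list[dict], **criteria) -> list[dict]:
    """Keep only the records that satisfy every criterion."""
    return [r for r in records if matches(r, **criteria)]


# Made-up prediction records, for illustration only.
records = [
    {"id": "A", "lipinski_pass": True, "hia": 91.0, "t12": 4.5},
    {"id": "B", "lipinski_pass": True, "hia": 75.0, "t12": 3.0},
    {"id": "C", "lipinski_pass": False, "hia": 95.0, "t12": 6.0},
    {"id": "D", "lipinski_pass": True, "hia": 88.0, "t12": 12.0},
]
kept = filter_records(records, lipinski_pass=True, hia__gt=80, t12__between=(2, 8))
```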
Best Practices:
Common Issues and Solutions:
Issue: Batch processing runs out of memory
Issue: Some compounds fail prediction
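Both batch issues can be approached by streaming the library in fixed-size chunks and recording failures per compound instead of aborting the run. The sketch below assumes a `.smi` layout with the SMILES as the first whitespace-separated token on each line; the function names are hypothetical and `predict` stands in for whatever prediction callable is used.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator


def read_smiles(lines: Iterable[str]) -> Iterator[str]:
    """Yield the first token of each non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield line.split()[0]


def chunked(items: Iterable[str], size: int) -> Iterator[list[str]]:
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def predict_in_chunks(
    smiles: Iterable[str], predict: Callable[[str], object], chunk_size: int = 1000
) -> Iterator[tuple[list, list]]:
    """Yield (results, failures) per chunk so memory use stays bounded."""
    for chunk in chunked(smiles, chunk_size):
        results, failures = [], []
        for smi in chunk:
            try:
                results.append((smi, predict(smi)))
            except Exception as exc:  # keep going; report the compound later
                failures.append((smi, str(exc)))
        yield results, failures
```

Because each chunk is yielded separately, the caller can write results to disk and discard them before the next chunk is processed. The failure list records which compounds need manual review.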
From SMILES to prioritized candidates:
# Step 1: Predict ADME for single compound
# Example invocation: python scripts/main.py \
#   --smiles "CC(=O)Oc1ccccc1C(=O)O" \
#   --properties all \
#   --output aspirin_adme.json
# Step 2: Batch process compound library
# Example invocation: python scripts/main.py \
#   --input library.smi \
#   --properties absorption,distribution \
#   --format csv \
#   --output library_adme.csv
# Step 3: Filter and rank
# Example invocation: python scripts/main.py \
#   --input library_adme.csv \
#   --filter "lipinski_pass=True,hia>80" \
#   --rank-by qed \
#   --top-n 100 \
#   --output top_candidates.csv
Python API Usage:
from scripts.adme_predictor import ADMEPredictor
from scripts.batch_processor import BatchProcessor
# Initialize
predictor = ADMEPredictor()
batch = BatchProcessor()
# Single compound analysis
aspirin = predictor.predict_all("CC(=O)Oc1ccccc1C(=O)O")
print(f"HIA: {aspirin.absorption.hia}%")
print(f"Half-life: {aspirin.excretion.t12} hours")
# Batch screening
results = batch.process(
    input_file="library.smi",
    predictor=predictor,
    properties=["absorption", "distribution"],
    n_workers=4,
)
# Filter good candidates
good_candidates = results[
    (results.lipinski_pass == True)
    & (results.hia > 80)
    & (results.bbb < 0.3)
    & (results.t12.between(2, 8))
]
Expected Output Files:
output/
├── aspirin_adme.json # Single compound detailed results
├── library_adme.csv # Batch screening results
└── top_candidates.csv # Filtered and ranked candidates
Pre-Prediction Checks:
During Prediction:
Post-Prediction Verification:
Before Making Decisions:
For Regulatory Submissions:
Over-Reliance Issues:
❌ Treating predictions as experimental facts → Poor decision making
❌ Single model dependency → Miss model-specific biases
❌ Ignoring prediction confidence → False sense of certainty
Input Issues:
❌ Invalid or non-canonical SMILES → Wrong compound analyzed
❌ Analyzing salt forms → Properties skewed by counterion
Use `smiles-de-salter`; analyze free base/acid
❌ Ignoring stereochemistry → Inaccurate predictions for chiral drugs
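To make the salt-form pitfall concrete, the sketch below keeps the largest dot-separated component of a SMILES string, using a rough count of atom tokens as the size measure. This is a crude string heuristic for illustration only: it assumes dot-separated parts are genuinely separate components, it does not neutralize charges or validate the structure, and it is no substitute for `smiles-de-salter` or a proper cheminformatics toolkit. The function name is hypothetical.

```python
import re

# Rough atom tokens: bracket atoms, two-letter halogens, organic-subset atoms.
_ATOM = re.compile(r"\[[^\]]+\]|Cl|Br|[BCNOPSFI]|[bcnops]")


def largest_fragment(smiles: str) -> str:
    """Return the dot-separated component with the most atom tokens."""
    fragments = [part for part in smiles.split(".") if part]
    if not fragments:
        raise ValueError("empty SMILES string")
    return max(fragments, key=lambda part: len(_ATOM.findall(part)))
```

For sodium acetate written as `CC(=O)[O-].[Na+]`, this returns `CC(=O)[O-]`. The acetate is still charged, which is one reason a dedicated de-salting step is preferable.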
Interpretation Issues:
❌ Focusing on single property → Miss overall profile
❌ Rigid cutoff application → Discard good candidates
❌ Ignoring property correlations → Unrealistic optimization
Domain Issues:
❌ Applying to biologics → Completely inappropriate
❌ Extrapolating beyond training set → Unreliable predictions
Workflow Issues:
❌ No experimental validation → Continue with false leads
❌ Not documenting model versions → Irreproducible results
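One lightweight way to address the reproducibility pitfall is to write a small manifest next to each result file. The sketch below records the Python version, installed package versions, and a hash of the input SMILES using only the standard library; the manifest fields and function names are assumptions, and the skill itself may track model versions differently.

```python
import hashlib
import platform
from datetime import datetime, timezone
from importlib import metadata


def package_version(name: str) -> str:
    """Installed version of a distribution, or a marker if not found."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        # Some installs (for example via conda) may not expose metadata.
        return "not installed"


def build_manifest(smiles_list: list[str], packages: tuple[str, ...] = ("rdkit",)) -> dict:
    """Collect run details that help reproduce a set of predictions."""
    digest = hashlib.sha256("\n".join(smiles_list).encode("utf-8")).hexdigest()
    return {
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "python": platform.python_version(),
        "packages": {name: package_version(name) for name in packages},
        "n_compounds": len(smiles_list),
        "input_sha256": digest,
    }
```

The returned dictionary can be written with `json.dump` alongside the output CSV or JSON, so that a later rerun can be compared against the same inputs and versions.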
Problem: All predictions show "out of domain" warning
Problem: Extreme predictions (negative solubility, >100% absorption)
Problem: Batch processing extremely slow
Problem: Inconsistent predictions across runs
Problem: Properties contradict each other
Problem: Cannot process certain file formats
- `chemical-structure-converter`

Available in `references/` directory:

- `lipinski_rules.md` - Detailed explanation of Rule of 5 and variants
- `qsar_models.md` - Technical documentation of predictive models
- `adme_databases.md` - Experimental ADME data sources for validation
- `property_ranges.md` - Acceptable ranges for marketed drugs by class
- `model_validation.md` - Validation statistics and applicability domains
- `cheminformatics_basics.md` - Introduction to molecular descriptors

Located in `scripts/` directory:

- `main.py` - CLI interface for ADME prediction
- `adme_predictor.py` - Core prediction engine
- `absorption.py` - Absorption property models
- `distribution.py` - Distribution property models
- `metabolism.py` - Metabolism prediction models
- `excretion.py` - Excretion and clearance models
- `druglikeness.py` - QED, MPO, and other scoring functions
- `batch_processor.py` - Library screening and parallel processing
- `validator.py` - Input validation and applicability domain checking

Prediction Speed:
| Task | Time | Hardware |
|---|---|---|
| Single compound | 0.5-2 sec | CPU |
| 100 compounds | 30-60 sec | CPU |
| 1000 compounds | 5-10 min | CPU |
| 1000 compounds | 2-3 min | 4-core parallel |
| 10,000 compounds | 30-60 min | 4-core parallel |
System Requirements:
Optimization Tips:
Model Accuracy (Typical):
⚠️ CRITICAL DISCLAIMER: These predictions are computational estimates for prioritization and guidance only. They do NOT replace experimental ADME studies required for regulatory submissions or clinical decision-making. Always validate predictions with appropriate in vitro and in vivo assays before advancing compounds.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `--smiles` | str | Required | SMILES string of the molecule |
| `--properties` | str | ["all"] | Specific properties to calculate |
| `--format` | str | "json" | Output format |
| `--input` | str | Required | Input CSV file with SMILES column |
| `--output` | str | Required | Output file for results |
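The parameter table can be mirrored with a small `argparse` definition. The sketch below is hypothetical and is not the actual contents of `scripts/main.py`. It assumes that `--smiles` and `--input` are alternatives (exactly one is required), that `--properties` is a comma-separated string, and that `json` and `csv` are the accepted formats.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented parameters (a sketch)."""
    parser = argparse.ArgumentParser(
        prog="main.py", description="ADME property prediction (illustrative CLI)"
    )
    # Assumption: a run takes either one molecule or one input file.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--smiles", help="SMILES string of the molecule")
    source.add_argument("--input", help="Input file with a SMILES column")
    parser.add_argument(
        "--properties", default="all",
        help="Comma-separated properties to calculate, or 'all'",
    )
    parser.add_argument(
        "--format", default="json", choices=("json", "csv"), help="Output format"
    )
    parser.add_argument("--output", required=True, help="Output file for results")
    return parser
```

Parsing `["--smiles", "CCO", "--output", "out.json"]` yields `format == "json"` and `properties == "all"`; passing both `--smiles` and `--input` is rejected by the mutually exclusive group.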
Every final response should make these items explicit when they are relevant:
If `scripts/main.py` fails, report the failure point, summarize what still can be completed safely, and provide a manual fallback.

This skill accepts requests that match the documented purpose of `adme-property-predictor` and include enough context to complete the workflow safely.
Do not continue the workflow when the request is out of scope, missing a critical input, or would require unsupported assumptions. Instead respond:
`adme-property-predictor` only handles its documented workflow. Please provide the missing required inputs or switch to a more suitable skill.
Use the following fixed structure for non-trivial requests:
If the request is simple, you may compress the structure, but still keep assumptions and limits explicit when they affect correctness.