A standardized CLI wrapper for RDKit 3D/2D conformer generation that samples multiple conformers per molecule (ETKDGv3, default 10), optimizes each with a force field (MMFF94s/UFF), keeps the lowest-energy conformer, automatically falls back to 2D layout on total embedding failure with a printed warning, and writes results to SDF or XYZ format. USE WHEN you need to generate 3D (or 2D fallback) molecular geometries from SMILES datasets (.csv/.smi) for downstream tasks such as docking, visualization, or 3D-descriptor computation.
This skill provides practical command patterns for RDKit 3D/2D conformer generation
using the standardized CLI wrapper: <skill_path>/scripts/rdkit_conf_helper.py.
Key behaviors (important for Agents):
- Multi-conformer sampling: the script embeds --num-confs conformers (default 10) per molecule via EmbedMultipleConfs, optimizes each with the chosen force field, and keeps the lowest-energy one. Set --num-confs 1 to revert to single-conformer behavior.
- 2D fallback: if 3D embedding fails entirely, Compute2DCoords is used instead and a [WARN] line is printed to stderr for that molecule.
- SMILES that cannot be processed are skipped and recorded in *.skipped.csv (no crash).
- Molecules that fell back to 2D are recorded in *.fallback.csv.
- Summary line: [INFO] Done: <N_3d> 3D, <N_2d> 2D-fallback, <N_skip> skipped (total input: <N>)
- Result lines:
  [RESULT] conf_sdf=/abs/path.sdf
  [RESULT] conf_xyz=/abs/path.xyz
  [RESULT] fallback_csv=/abs/path.fallback.csv (only if any 2D fallbacks occurred)
  [RESULT] skipped_csv=/abs/path.skipped.csv (only if any SMILES were skipped)
Check CLI help:
uv run <skill_path>/scripts/rdkit_conf_helper.py --help
uv run <skill_path>/scripts/rdkit_conf_helper.py conf --help
Disable environment printing (optional):
uv run <skill_path>/scripts/rdkit_conf_helper.py --no-env conf --smiles "CCO" --output out.sdf
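The [RESULT] key=/abs/path lines and the [INFO] Done: summary described under key behaviors lend themselves to machine parsing. The following is a minimal sketch of how an agent might extract them from captured output; the function name parse_helper_output and the dictionary keys are illustrative assumptions, not part of the helper script, and the regular expressions assume the line formats exactly as quoted in this document.

```python
import re

# Patterns follow the line formats quoted in this document (assumed exact).
RESULT_RE = re.compile(r"^\[RESULT\]\s+(\w+)=(.+)$")
DONE_RE = re.compile(
    r"^\[INFO\] Done: (\d+) 3D, (\d+) 2D-fallback, (\d+) skipped "
    r"\(total input: (\d+)\)"
)


def parse_helper_output(text):
    """Return (results, summary) parsed from the helper's printed output.

    results maps keys such as "conf_sdf" to paths; summary is a dict of
    counts, or None if no "[INFO] Done:" line was found.
    """
    results = {}
    summary = None
    for line in text.splitlines():
        line = line.strip()
        m = RESULT_RE.match(line)
        if m:
            results[m.group(1)] = m.group(2).strip()
            continue
        m = DONE_RE.match(line)
        if m:
            n3d, n2d, nskip, total = map(int, m.groups())
            summary = {"3d": n3d, "2d_fallback": n2d,
                       "skipped": nskip, "total": total}
    return results, summary


# Illustrative output only; the counts are made up for the example.
sample = """\
[INFO] Done: 8 3D, 1 2D-fallback, 1 skipped (total input: 10)
[RESULT] conf_sdf=/abs/path.sdf
[RESULT] fallback_csv=/abs/path.fallback.csv
[RESULT] skipped_csv=/abs/path.skipped.csv
"""
results, summary = parse_helper_output(sample)
```

A non-zero summary["2d_fallback"] or summary["skipped"] is the cue to open the corresponding log file named in results.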
Single SMILES:
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--smiles "CCO" \
--output /tmp/CCO.sdf
Single SMILES with a custom molecule name:
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--smiles "c1ccccc1" \
--name benzene \
--output /tmp/benzene.sdf
From CSV (default SMILES column: smiles):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv \
--smiles-col smiles \
--output data.sdf
From CSV with a name column:
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv \
--smiles-col smiles \
--name-col compound_id \
--output data.sdf
From SMI (second token per line is used as name automatically):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file molecules.smi \
--output molecules.sdf
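The .smi convention above (first whitespace-separated token is the SMILES, optional second token is the name) can be sketched in a few lines of Python. This is an illustration of the convention, not the helper's actual parser: the function name read_smi is hypothetical, and both the skipping of blank lines and the use of the zero-based line index in the mol_<i> default name are assumptions.

```python
def read_smi(lines):
    """Return a list of (smiles, name) pairs from .smi-style lines."""
    records = []
    for i, raw in enumerate(lines):
        tokens = raw.split()
        if not tokens:
            # Assumption: blank lines carry no molecule and are ignored.
            continue
        smiles = tokens[0]
        # Assumption: unnamed molecules get mol_<i> from the line index.
        name = tokens[1] if len(tokens) > 1 else f"mol_{i}"
        records.append((smiles, name))
    return records


records = read_smi(["CCO ethanol", "c1ccccc1", ""])
```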
Default (10 conformers sampled, lowest-energy kept):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --output data.sdf
Single conformer (fastest, least thorough):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --num-confs 1 --output data.sdf
Increase sampling for flexible or macrocyclic molecules:
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --num-confs 50 --output data.sdf
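To make the effect of --num-confs concrete, here is a rough sketch of multi-conformer sampling with the RDKit Python API: embed several conformers, optimize each, and pick the lowest-energy one. It is not the helper's source code. The function name lowest_energy_conformer, the seed value 42, and maxIters=500 are assumptions chosen for illustration, and the sketch omits the helper's ETKDGv2/ETDG fallback chain and its 2D fallback.

```python
from rdkit import Chem
from rdkit.Chem import AllChem


def lowest_energy_conformer(smiles, num_confs=10, seed=42):
    """Embed num_confs conformers and return (mol, conf_id, energy)."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        raise ValueError(f"invalid SMILES: {smiles!r}")
    mol = Chem.AddHs(mol)  # explicit hydrogens before embedding

    params = AllChem.ETKDGv3()
    params.randomSeed = seed  # hypothetical seed for reproducibility
    conf_ids = list(
        AllChem.EmbedMultipleConfs(mol, numConfs=num_confs, params=params)
    )
    if not conf_ids:
        raise RuntimeError("no conformer could be embedded")

    # Prefer MMFF94s; use UFF when MMFF parameters are missing.
    if AllChem.MMFFHasAllMoleculeParams(mol):
        results = AllChem.MMFFOptimizeMoleculeConfs(
            mol, mmffVariant="MMFF94s", maxIters=500
        )
    else:
        results = AllChem.UFFOptimizeMoleculeConfs(mol, maxIters=500)

    # Each result is (not_converged_flag, energy), in conformer order.
    energies = [energy for _flag, energy in results]
    best = min(range(len(energies)), key=energies.__getitem__)
    return mol, conf_ids[best], energies[best]


mol, conf_id, energy = lowest_energy_conformer("CCO", num_confs=3)
```

Larger num_confs values widen the search at a roughly proportional cost in embedding and optimization time, which is why the higher setting is suggested only for flexible or macrocyclic molecules.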
MMFF94s (default, falls back to UFF if unavailable):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --ff mmff94s --output data.mmff.sdf
UFF (universal force field):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --ff uff --output data.uff.sdf
Skip force-field optimization (raw ETKDG geometry only):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --ff none --output data.etkdg_raw.sdf
Write XYZ instead of SDF:
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv \
--format xyz \
--output data.xyz
Large or macrocyclic molecules sometimes fail standard ETKDG; try random initial coordinates:
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file macrocycles.csv \
--use-random-coords \
--max-attempts 500 \
--output macrocycles.sdf
Use a different random seed (reproducibility):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --seed 123 --output data.seed123.sdf
Non-deterministic embedding (seed = -1):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --seed -1 --output data.sdf
By default explicit H atoms are added before embedding for more accurate 3D geometry.
Use --no-hs to keep the molecule as-is (heavy atoms only):
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv --no-hs --output data.noh.sdf
Write the skipped and fallback logs to custom paths:
uv run <skill_path>/scripts/rdkit_conf_helper.py conf \
--file data.csv \
--output data.sdf \
--error-log logs/skipped.csv \
--fallback-log logs/used_2d.csv
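Once a fallback log exists, the affected molecules can be collected into a small retry input. The sketch below assumes the log is a CSV with a header row using the column names documented for the fallback log (idx, smiles, name, dim, ff, note); the function name build_retry_csv and the sample row, including its note text, are made up for illustration.

```python
import csv
import io


def build_retry_csv(fallback_csv_text):
    """Return (retry_csv_text, n_rows) holding only smiles and name."""
    rows = list(csv.DictReader(io.StringIO(fallback_csv_text)))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["smiles", "name"])
    for row in rows:
        writer.writerow([row["smiles"], row["name"]])
    return out.getvalue(), len(rows)


# Illustrative log content; the note text is invented for the example.
fallback_log = (
    "idx,smiles,name,dim,ff,note\n"
    "3,C1CCCCCCCCCCC1,ring12,2,2d_fallback,embedding failed\n"
)
retry_text, n_retry = build_retry_csv(fallback_log)
```

Saving retry_text to a file and rerunning the conf command on it with --smiles-col smiles --name-col name --use-random-coords --max-attempts 500 gives those molecules another chance at a 3D geometry.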
For each molecule, the script runs the following steps in order:
1. Parse the SMILES with Chem.MolFromSmiles.
2. Add explicit hydrogens (Chem.AddHs) -- skipped with --no-hs.
3. Embed conformers (EmbedMultipleConfs, --num-confs candidates, default 10): tries ETKDGv3, then ETKDGv2, then ETDG, then ETDG+useRandomCoords as a fallback chain until at least one conformer is embedded.
4. Optimize with the force field (if --ff is not none): each successfully embedded conformer is individually optimized. MMFF94s transparently falls back to UFF if parameters are unavailable for that molecule.
5. Keep the lowest-energy conformer. With --ff none, the first embedded conformer is kept without energy ranking.
6. 2D fallback: if every embedding attempt fails, the script uses Compute2DCoords (Z=0 for all atoms), prints a [WARN] to stderr, and records the molecule in the fallback log.
SDF output (--format sdf, default):
- The molecule name (--name, --name-col, or auto-generated mol_<i>) is written to the SDF header line.
XYZ output (--format xyz):
- If --no-hs is used, hydrogen atoms are absent from the XYZ.
Fallback log (*.fallback.csv):
- Columns: idx, smiles, name, dim (always 2), ff (always 2d_fallback), note.
Skipped log (*.skipped.csv):
- Columns: idx, smiles, error.
When using this skill for users:
- Confirm the input format: .csv requires a SMILES column (default smiles); .smi uses the first token per line as SMILES, second token (if present) as name.
- For a single molecule, pass the SMILES in quotes, e.g. --smiles "[C@@H](O)(F)Cl".
- For CSV input, use --smiles-col for the SMILES column and --name-col (optional) for molecule identifiers to embed in SDF/XYZ headers.
- Check the [INFO] Done: summary line for the 3D/2D/skip breakdown.
- If molecules appear in *.fallback.csv: consider --use-random-coords or --max-attempts tuning for the affected SMILES.
- Report output paths from the [RESULT] ...=/abs/path lines in stdout.
- Tracing: RDKIT_CONF_HELPER_TRACE=1 uv run <skill_path>/scripts/rdkit_conf_helper.py ...
RDKit conformer generation can produce initial molecular geometries for Packmol packing or as starting structures for GPUMD molecular simulations. XYZ output can be converted to GPUMD model.xyz format via dpdata or ASE.