Generate inorganic crystal structure candidates for computational materials discovery workflows. Use this skill whenever the user wants to build, explore, or diversify a pool of inorganic structures for DFT screening, high-throughput calculations, machine learning dataset construction, or property-guided search. This skill covers the COMPLETE candidate generation pipeline from elements to structures - composition discovery (elements-only entry) -> seed structure creation -> chemical space exploration -> configurational ordering -> defect generation -> ensemble augmentation.
This skill guides the systematic generation of inorganic crystal structure candidates using a suite of tools for composition discovery and structure generation. The methodology is: discover compositions → prototype → explore chemistry → add disorder → resolve disorder → add defects → augment, selecting the appropriate branch(es) for the discovery goal.
The core philosophy: candidate generation is a funnel. Start broad (many compositions,
many chemistries, many configurations), then narrow using physical filters (charge neutrality,
Ewald energy, thermodynamic stability from MP). Always track structures in the ASE database
using ase_store_result so nothing is recomputed.
Entry Points:
composition_enumerator
Generates all charge-balanced compositions from element lists and oxidation state constraints. Use when you know which elements to explore but not which compositions exist.
Key parameters:
- elements: list of element symbols, e.g. ['Li', 'Mn', 'P', 'O']
- oxidation_states: dict mapping elements to allowed oxidation states, e.g. {'Li': [1], 'Mn': [2, 3], 'P': [5], 'O': [-2]}
- max_formula_units: cap on formula unit count (default 6, increase for complex compositions)
- max_atoms_per_formula: hard limit on total atoms (default 30, prevents combinatorial explosion)
- anion_cation_ratio_max: maximum anion:cation ratio (default 4.0, excludes strongly anion-heavy formulas)
- min_cation_fraction: minimum cation fraction (default 0.05, excludes Li₀.₀₁O₀.₉₉-type nonsense)
- require_all_elements: if True, only returns compositions containing ALL specified elements (default True)
- allow_mixed_valence: if True, allows mixed oxidation states (e.g., Mn²⁺/Mn³⁺ in mixed-valence manganates) (default True)
- sort_by: 'atoms' (fewest atoms first), 'anion_ratio' (lowest anion/cation ratio), 'alphabetical'
- output_format: 'minimal' (formula strings) or 'detailed' (full metadata)
Returns:
{
    "success": True,
    "count": 12,
    "compositions": [
        "Li3PO4",    # Ternary: only returned with require_all_elements=False
        "LiMnPO4",   # Target composition! (olivine battery cathode)
        "LiFePO4",   # If Fe added to oxidation_states
        ...
    ],
    # OR with output_format='detailed':
    "compositions": [
        {
            "formula": "LiMnPO4",
            "reduced_formula": "LiMnPO4",
            "num_atoms": 7,
            "cation_count": 3,
            "anion_count": 4,
            "anion_cation_ratio": 1.33,
            "oxidation_states": {"Li": 1, "Mn": 2, "P": 5, "O": -2},
            "charge": 0
        },
        ...
    ]
}
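The charge-balance idea behind composition_enumerator can be illustrated with a few lines of plain Python. This is a minimal sketch, not the tool's implementation: the function name `enumerate_neutral` and its `max_count` argument are hypothetical, and each element takes a single oxidation state per formula, so mixed-valence compositions are not covered.

```python
from functools import reduce
from itertools import product
from math import gcd


def enumerate_neutral(oxidation_states, max_count=4, max_atoms=30):
    """Return reduced, charge-neutral formulas that contain every element.

    Hypothetical helper for illustration only.
    """
    elements = list(oxidation_states)
    hits = {}
    for counts in product(range(1, max_count + 1), repeat=len(elements)):
        # Skip formulas that are too large or are multiples of a smaller one
        if sum(counts) > max_atoms or reduce(gcd, counts) != 1:
            continue
        for states in product(*(oxidation_states[el] for el in elements)):
            if sum(n * q for n, q in zip(counts, states)) == 0:
                formula = "".join(
                    el + (str(n) if n > 1 else "") for el, n in zip(elements, counts)
                )
                hits[formula] = sum(counts)
                break
    # Fewest atoms first, then alphabetical
    return sorted(hits, key=lambda f: (hits[f], f))


states = {"Li": [1], "Mn": [2, 3], "P": [5], "O": [-2]}
formulas = enumerate_neutral(states, max_count=4)
print(formulas)
```

With every count capped at 4, the cations carry at least +8 and four oxygens carry at most -8, so LiMnPO4 is the only solution. Raising `max_count` widens the pool quickly, which is why the real tool exposes size and ratio filters.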
Chemical filters explained:
- max_atoms_per_formula=30: prevents unrealistically large formulas (e.g., Li₁₀Mn₁₀P₁₀O₄₀ with 70 atoms)
- anion_cation_ratio_max=4.0: rejects anion-heavy compositions whose anion:cation ratio exceeds 4.0. Note that LiMn₁₀P₁₀O₅₀ has a ratio of 50/21 ≈ 2.4, so it passes this filter and is rejected by max_atoms_per_formula (71 atoms) instead.
- min_cation_fraction=0.05: prevents trace-cation compositions (e.g., Li₀.₀₁O₀.₉₉, which is unphysical)
When to adjust defaults:
- Raise max_formula_units for complex phases (spinels, Ruddlesden-Popper)
- Adjust anion_cation_ratio_max for metal-rich compounds
- Set require_all_elements=False to include binaries (e.g., Li₂O, Mn₃O₄) alongside ternaries/quaternaries
Workflow:
1. Run composition_enumerator with target elements
2. Filter with stability_analyzer (eliminate compositions far above the convex hull)
pymatgen_substitution_predictor: ICSD-Based Substitution
Predicts likely element substitutions using data mining from 100k+ ICSD structures. Use when you have a known composition and want to find chemically reasonable analogues.
Key parameters:
- composition: starting composition, e.g. 'LiFePO4'
- to_this_composition: if False (default), finds what this composition can become; if True, finds what can transform INTO this composition
- threshold: probability cutoff (0.001 = permissive, 0.1 = strict)
- max_suggestions: limit on the number of suggestions (default None = unlimited)
- group_by_probability: if True, returns {high: [...], medium: [...], low: [...]}
Returns:
{
    "success": True,
    "original_composition": "LiFePO4",
    "direction": "from_this_composition",
    "suggestions": {
        "high": [{"formula": "LiMnPO4", "probability": 0.85}, ...],
        "medium": [{"formula": "LiCoPO4", "probability": 0.45}, ...],
        "low": [{"formula": "LiNiPO4", "probability": 0.02}, ...]
    }
}
Use case: Template-based discovery
# Find what LiFePO₄ can transform into (grouped output is needed for the 'high' bucket)
result = pymatgen_substitution_predictor(
    'LiFePO4', threshold=0.01, group_by_probability=True
)
# Extract high-confidence suggestions
target_formulas = [s['formula'] for s in result['suggestions']['high']]
# For each, check MP for structures
for formula in target_formulas:
    mp_result = mp_search_materials(formula=formula)
    if mp_result['count'] > 0:
        # Structure exists in MP, can use directly
        print(f"{formula}: found in MP, reuse the existing structure")
Limitation: ICSD substitution patterns are conservative (based on existing materials).
For truly novel compositions (e.g., where no close analogues exist in the database),
use composition_enumerator instead.
mp_search_materials: Template Structure Search
Queries Materials Project for structures matching composition/chemistry constraints. Use to find structural templates for target element systems.
Key parameters for template search:
- elements: list of elements, e.g. ['Li', 'Fe', 'P', 'O'] (finds all Li-Fe-P-O compounds)
- num_elements: constrain to binary (2), ternary (3), quaternary (4), etc.
- crystal_system: 'cubic', 'tetragonal', 'orthorhombic', etc. (omit for all)
- spacegroup_number: specific space group (e.g., 225 for fluorite)
- is_stable: True (only thermodynamically stable), False (include metastable)
- limit: max results (default 100)
Template discovery workflow:
# Step 1: Find analogues with similar chemistry (Fe instead of Mn)
li_fe_p_o = mp_search_materials(
    elements=['Li', 'Fe', 'P', 'O'],
    num_elements=4,
    is_stable=True,
    limit=50
)
# Step 2: Extract stoichiometric patterns
patterns = set()
for mat in li_fe_p_o['materials']:
    # Identify pattern: LiMPO₄, Li₃M₂(PO₄)₃, etc.
    patterns.add(mat['composition_reduced'])
# Step 3: Use patterns to constrain composition_enumerator
# (the search above was over Li-Fe-P-O, so the olivine hit is LiFePO4)
if 'LiFePO4' in patterns:
    # Found olivine pattern (AMPO₄)! Prioritize compositions around 7 atoms
    result = composition_enumerator(
        elements=['Li', 'Mn', 'P', 'O'],
        oxidation_states={'Li': [1], 'Mn': [2], 'P': [5], 'O': [-2]},
        max_formula_units=8  # Allows LiMnPO₄ (7 atoms)
    )
When no direct templates exist:
If mp_search_materials(['Li', 'Mn', 'P', 'O']) returns 0 results, try:
1. ['Na', 'Mn', 'P', 'O'] or ['Li', 'Fe', 'P', 'O']
2. ['alkali', 'TM', 'P', 'O'] where TM = transition metal
3. composition_enumerator (exhaustive enumeration)
pymatgen_prototype_builder: Seed Structure
Builds an ideal crystal from a spacegroup number/symbol, species list, and lattice parameters. This is the entry point for any workflow that starts from scratch rather than an existing structure.
Key parameters:
- spacegroup: int (1–230) or Hermann-Mauguin symbol, e.g. 225 or "Fm-3m"
- species: list of element symbols (['La', 'Mn', 'O', 'O', 'O']) or Wyckoff dict
- lattice_parameters: [a, b, c, alpha, beta, gamma] in Å and degrees; [a] works for cubic
- wyckoff_positions: optional dict mapping Wyckoff labels to species/coords
- output_format: 'dict' (default, pass to other tools), 'poscar', 'cif', 'ase'
Returns: structures[i].structure, which can be passed directly to substitution, enumeration, or defect tools.
wyckoff_positions proximity gotcha: Passing a Wyckoff dict (e.g. {'1a': 'Ba', '1b': 'Ti', '3c': 'O'}) can raise
"sites less than 0.01 Å apart" for multi-species prototypes where pymatgen auto-generates
overlapping fractional coords. Preferred approach: supply explicit species and coords lists
instead, and use validate_proximity=False when debugging a new prototype before finalising
lattice parameters.
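When debugging a proximity error it can help to check the minimum interatomic distance yourself before calling the builder. The sketch below is an assumption-laden illustration for cubic cells only; `min_distance_cubic` is a hypothetical helper, not part of pymatgen or of the tools described here.

```python
from itertools import combinations, product
from math import sqrt


def min_distance_cubic(frac_coords, a):
    """Smallest site-to-site distance (Å) in a cubic cell of edge a.

    Checks the 27 nearest periodic images, which is sufficient when all
    fractional coordinates lie in [0, 1].
    """
    best = float("inf")
    for p, q in combinations(frac_coords, 2):
        for shift in product((-1, 0, 1), repeat=3):
            d = sqrt(sum(((pi - qi + s) * a) ** 2 for pi, qi, s in zip(p, q, shift)))
            best = min(best, d)
    return best


# Ideal cubic perovskite: A at the corner, B at the body centre, O at face centres
perovskite = [(0, 0, 0), (0.5, 0.5, 0.5), (0.5, 0.5, 0), (0.5, 0, 0.5), (0, 0.5, 0.5)]
ok = min_distance_cubic(perovskite, a=4.0)

# Adding (0.5, 0.5, 1.0) duplicates the periodic image of (0.5, 0.5, 0)
clash = min_distance_cubic(perovskite + [(0.5, 0.5, 1.0)], a=4.0)
print(ok, clash)
```

A result below 0.01 Å, as in the second call, is the kind of overlap that triggers the "sites less than 0.01 Å apart" error.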
pymatgen_substitution_generator: Ordered Enumeration of Site Replacements
Replaces elements in existing structures by FULLY replacing specific sites, generating ORDERED structures with integer occupancy. Best for isostructural analogue screening across a fixed lattice topology.
CRITICAL: This tool creates ORDERED structures, NOT fractional occupancy.
- fraction=0.15 means "fully replace 15% of sites" (1 site out of 6 → integer occupancy)
- {'Ni': {'replace_with': 'Mn', 'fraction': 0.2}} with 5 Ni sites generates 5 structures, each with 1 different Ni site fully replaced by Mn
- For fractional occupancy, use pymatgen_disorder_generator
Key parameters:
- substitutions: {'Li': 'Na'} (full swap), {'Li': ['Na', 'K']} (one variant per replacement), {'Li': {'replace_with': 'Na', 'fraction': 0.5}} (FULLY replace 50% of Li sites, not partial occupancy)
- n_structures: variants to generate per substitution combination (default 5). For deterministic full swaps (fraction=1.0) set this to 1, since higher values only produce identical duplicates. For partial site replacement (fraction < 1.0), generates different random orderings. Total output = n_structures × num_combinations, capped by max_attempts.
- max_attempts: hard cap on total output count (default 50). If you supply N substitution options with n_structures=k, set max_attempts ≥ N × k or outputs will be silently truncated. Example: 8 B-site metals with n_structures=1 needs max_attempts=8 (or higher); with n_structures=3 needs max_attempts=24.
- enforce_charge_neutrality: set True for ionic materials
- site_selector: 'all', 'wyckoff_4a', 'coordination_6', etc.
- preserve_disorder: preserves existing disorder in inputs (does NOT create fractional occupancy)
When to use over ion_exchange_generator: when you want exploratory doping without
strict stoichiometry adjustment and charge neutrality is handled manually or checked post-hoc.
When to use over disorder_generator: when you need ordered enumerated configurations
for exhaustive DFT calculations or supercell enumeration, NOT for creating statistical disorder.
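The integer-occupancy behaviour and the max_attempts budget can be made concrete with a small stand-alone sketch. Both helpers below are hypothetical, and the rounding rule (nearest integer, at least one site) is an assumption about how a fraction maps to whole sites, not documented tool behaviour.

```python
from itertools import combinations


def ordered_replacements(site_indices, fraction):
    """Every way to fully replace round(fraction * n) sites (at least one)."""
    site_indices = list(site_indices)
    n_replace = max(1, round(fraction * len(site_indices)))
    return [set(chosen) for chosen in combinations(site_indices, n_replace)]


def required_max_attempts(n_options, n_structures):
    """Smallest max_attempts that avoids silent truncation (N × k)."""
    return n_options * n_structures


# 5 Ni sites, fraction=0.2 -> one site replaced per structure
configs = ordered_replacements([0, 1, 2, 3, 4], fraction=0.2)
print(len(configs), configs[0])

# 8 B-site metals with n_structures=3
print(required_max_attempts(8, 3))
```

Each entry in `configs` is the set of site indices that would carry Mn in one ordered structure. No site ever holds a mixture.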
pymatgen_ion_exchange_generator: Charge-Neutral Substitution
Replaces a mobile ion (e.g. Li⁺) with one or more ions, automatically adjusting stoichiometry so that total ionic charge is conserved. Only charge-neutral structures are returned by default.
Key parameters:
- replace_ion: element to replace, e.g. 'Li'
- with_ions: ['Na', 'K'] (equal weight) or {'Na': 0.6, 'Mg': 0.4} (weighted split)
- exchange_fraction: fraction of sites to exchange (0–1), default 1.0
- allow_oxidation_state_change: False (default) = only neutral structures returned
- max_structures: cap on returned structures per input (default 10)
Prototypical use cases: Li → Na/K battery cathode analogues, Ca²⁺ → La³⁺ doping in oxides.
pymatgen_disorder_generator: Add Configurational Disorder (Order → Disorder)
***** REQUIRED TOOL FOR FRACTIONAL SITE OCCUPANCY *****
Converts fully ordered structures into disordered structures with FRACTIONAL site occupancies. This is the ONLY tool for creating partial substitution materials like Li[Ni₀.₈Mn₀.₂]O₂ where ALL sites of an element have mixed occupancy (e.g., every Ni site becomes 80% Ni + 20% Mn).
CRITICAL: This tool creates FRACTIONAL OCCUPANCY, not ordered enumeration.
- {'Ni': {'Ni': 0.8, 'Mn': 0.2}} creates Li₃[Ni₂.₄Mn₀.₆]O₆ (statistical disorder)
- For ordered enumeration, use pymatgen_substitution_generator instead
This is the inverse of enumeration: it creates the disordered input structures needed for SQS generation or systematic enumeration.
Key parameters:
- site_substitutions: dict mapping elements to their disordered (fractional) occupancies
  - Format: {element: {species1: fraction1, species2: fraction2, ...}}
  - {'Ni': {'Ni': 0.8, 'Mn': 0.2}} gives Li[Ni₀.₈Mn₀.₂]O₂
  - {'Co': {'Ni': 0.333, 'Mn': 0.333, 'Co': 0.334}} gives NMC
  - Fractions must sum to 1 (within composition_tolerance)
- site_selector: strategy for which sites receive disorder (default: 'all_equivalent')
  - 'all_equivalent': apply to all symmetry-equivalent sites (recommended)
  - 'wyckoff_4a': specific Wyckoff position
  - 'first_site': only first occurrence (breaks symmetry, use with caution)
- validate_charge_neutrality: True (default), warns if disorder creates charge imbalance
- composition_tolerance: tolerance for fraction sums (default 0.01 = 1%)
Typical workflow:
# Step 1: Get ordered structure from Materials Project
mp_result = mp_get_material_properties(
material_ids=["mp-1097088"], # LiNiO₂
properties=["structure"]
)
# Step 2: Add disorder for partial substitution Li[Ni₀.₈Mn₀.₂]O₂
disordered = pymatgen_disorder_generator(
input_structures=mp_result["properties"][0]["structure"],
site_substitutions={"Ni": {"Ni": 0.8, "Mn": 0.2}} # 80% Ni, 20% Mn on TM sites
)
# Step 3: Generate SQS for DFT calculations
sqs = pymatgen_sqs_generator(
input_structures=disordered["structures"],
supercell_size=16,
n_structures=3
)
Use cases:
When to use: When you have an ordered parent structure (from MP or prototype) and need to model a specific composition with fractional occupancies. Output is deterministic (one disordered structure per input) and ready for SQS or enumeration.
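To see what a site_substitutions dict does to the overall composition, the following sketch applies the fractions to a table of site counts and enforces the sum-to-one check. It is only an illustration under stated assumptions: `disordered_composition` is a hypothetical helper that works on element counts, not on real structures.

```python
def disordered_composition(ordered_counts, site_substitutions, tolerance=0.01):
    """Spread each substituted element's sites over its fractional occupants."""
    composition = {el: float(n) for el, n in ordered_counts.items()}
    for element, occupancy in site_substitutions.items():
        total = sum(occupancy.values())
        if abs(total - 1.0) > tolerance:
            raise ValueError(f"fractions for {element} sum to {total}, expected 1.0")
        n_sites = ordered_counts[element]
        composition[element] -= n_sites  # remove the fully ordered occupant
        for species, frac in occupancy.items():
            composition[species] = composition.get(species, 0.0) + n_sites * frac
    # Drop species whose amount ended up at zero
    return {el: n for el, n in composition.items() if n > 1e-12}


# Three formula units of LiNiO2 with 20% Mn on the Ni sites
mixed = disordered_composition(
    {"Li": 3, "Ni": 3, "O": 6},
    {"Ni": {"Ni": 0.8, "Mn": 0.2}},
)
print(mixed)
```

The result corresponds to Li₃[Ni₂.₄Mn₀.₆]O₆, the fractional formula quoted above. Fractions that do not sum to one within the tolerance raise an error instead of silently changing the stoichiometry.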
disorder_generator vs substitution_generator
The Critical Distinction:
| Aspect | pymatgen_disorder_generator | pymatgen_substitution_generator |
|---|---|---|
| Output type | Fractional occupancy (statistical disorder) | Integer occupancy (ordered enumeration) |
| Site occupancy | Multiple species on same site with fractions summing to 1 | Single species per site (occu=1) |
| Example | Site has 80% Ni + 20% Mn | Site 1 has 100% Mn, Sites 2-5 have 100% Ni |
| Formula | Li₃[Ni₂.₄Mn₀.₆]O₆ (fractional) | Li₅Ni₄MnO₁₀ (integer, ordered) |
| Output count | 1 disordered structure per input | Multiple ordered configurations per input |
| Use for | SQS generation, VCA calculations, statistical models | Supercell enumeration, exhaustive DFT, specific orderings |
Decision Tree:
Do you need partial substitution like Li[Ni₀.₈Mn₀.₂]O₂?
├─ YES: Do you want fractional occupancy (every site has 80%Ni+20%Mn)?
│ ├─ YES → Use `pymatgen_disorder_generator`
│ │ site_substitutions={'Ni': {'Ni': 0.8, 'Mn': 0.2}}
│ │ → Output: 1 structure with fractional occupancy
│ │ → Then: pymatgen_sqs_generator for quasirandom supercells
│ │
│ └─ NO: Want ordered enumeration (1 specific Ni replaced per structure)?
│ └─ YES → Use `pymatgen_substitution_generator`
│ substitutions={'Ni': {'replace_with': 'Mn', 'fraction': 0.2}}
│ → Output: 5 structures, each with different Ni site replaced
│ → Then: Run DFT on each ordered configuration
│
└─ NO: Complete substitution (all Li → Na)?
└─ Use `pymatgen_substitution_generator`
substitutions={'Li': 'Na'}
→ Output: 1 structure with all Li replaced by Na
Common Pitfalls:
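- Using substitution_generator with fraction=0.15 and expecting fractional occupancy (it returns ordered structures; use disorder_generator)
- Using disorder_generator when you want to explore specific orderings (use substitution_generator)
- Setting preserve_disorder=True in substitution_generator and expecting it to create disorder (it only preserves disorder already present in the input)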
Research examples:
- disorder_generator → sqs_generator
- disorder_generator
- substitution_generator with ['Ca','Sr','Ba']
pymatgen_enumeration_generator: Exhaustive Ordering of Disordered Structures
Takes structures with fractional site occupancies and returns all symmetry-inequivalent ordered supercell approximants, ranked by Ewald energy or cell size.
Key parameters:
- supercell_size: supercell multiplier (1–4, default 2); creates a supercell of size [supercell_size, supercell_size, 1] to accommodate fractional occupancies. Keep ≤ 2 for ternaries to avoid combinatorial explosion
- n_structures: max ordered structures returned per input (default 20, max 500)
- sort_by: 'ewald' (default, lowest energy first), 'num_sites', 'random'
- add_oxidation_states: auto-assign oxidation states for Ewald ranking (default True)
- refine_structure: re-symmetrize before enumeration (recommended, default True)
When to use over sqs_generator: when you need the complete ordered-configuration pool,
want to identify the ground-state ordering, or are building a cluster expansion training set.
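The warning about combinatorial explosion can be quantified with a quick upper bound. The sketch below counts orderings before any symmetry reduction for a [s, s, 1] supercell. `ordering_upper_bound` is a hypothetical helper, and the real tool returns far fewer structures because it keeps only symmetry-inequivalent ones.

```python
from fractions import Fraction
from math import comb


def ordering_upper_bound(sites_per_cell, fraction, supercell_size):
    """Number of ways to place the minority species, ignoring symmetry.

    Returns None when the fraction does not correspond to a whole number
    of sites in the chosen supercell.
    """
    n_sites = sites_per_cell * supercell_size ** 2
    n_sub = Fraction(str(fraction)) * n_sites
    if n_sub.denominator != 1:
        return None
    return comb(n_sites, int(n_sub))


small = ordering_upper_bound(sites_per_cell=1, fraction=0.25, supercell_size=2)
large = ordering_upper_bound(sites_per_cell=1, fraction=0.25, supercell_size=4)
mismatch = ordering_upper_bound(sites_per_cell=1, fraction=0.2, supercell_size=2)
print(small, large, mismatch)
```

Going from supercell_size=2 to 4 raises the bound from 4 to 1820 for a single mixed site. A fraction of 0.2 does not fit a four-site supercell at all, which is one reason to pick the supercell with the target composition in mind.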
pymatgen_sqs_generator: Special Quasirandom Structures
Finds a small ordered supercell whose Warren-Cowley pair correlations best mimic a perfectly random alloy. Returns the single best quasirandom approximant per input, not the full ordered-configuration space.
Key parameters:
- supercell_size: target formula units in SQS cell (default 8; use 8–16 for binary, 12–24 for ternary)
- supercell_matrix: explicit [nx, ny, nz] or 3×3 matrix (overrides supercell_size)
- n_structures: independent SQS candidates per input (default 3); ranked by sqs_error
- n_mc_steps: Monte Carlo steps per candidate (default 50 000; increase for multicomponent)
- n_shells: correlation shells in objective function (default 4)
- seed: set for reproducibility
- use_mcsqs: use ATAT mcsqs binary if available (better quality for large systems)
When to use over enumeration_generator: target system is a solid solution / high-entropy
material where disorder is the physical state being modelled, not a defect to be minimised.
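The objective an SQS search minimises can be illustrated on a toy one-dimensional binary chain. This is a deliberately simplified sketch, not the algorithm used by pymatgen_sqs_generator or ATAT: it uses a greedy swap search instead of true Monte Carlo, a periodic 1D lattice instead of a crystal, and hypothetical function names throughout.

```python
import random


def pair_correlations(spins, n_shells):
    """Average spin product for neighbour shells 1..n_shells on a ring."""
    n = len(spins)
    return [
        sum(spins[i] * spins[(i + k) % n] for i in range(n)) / n
        for k in range(1, n_shells + 1)
    ]


def sqs_error(spins, n_shells):
    """Summed deviation from the ideal random-alloy pair correlation."""
    target = (sum(spins) / len(spins)) ** 2
    return sum(abs(c - target) for c in pair_correlations(spins, n_shells))


def toy_sqs(n_a, n_b, n_shells=4, n_steps=2000, seed=42):
    """Greedy swap search starting from a fully segregated chain."""
    rng = random.Random(seed)
    best = [1] * n_a + [-1] * n_b
    best_err = sqs_error(best, n_shells)
    for _ in range(n_steps):
        i, j = rng.sample(range(len(best)), 2)
        trial = best[:]
        trial[i], trial[j] = trial[j], trial[i]  # swapping conserves composition
        err = sqs_error(trial, n_shells)
        if err <= best_err:
            best, best_err = trial, err
    return best, best_err


initial_err = sqs_error([1] * 8 + [-1] * 8, 4)
chain, final_err = toy_sqs(8, 8)
print(initial_err, final_err, chain)
```

The seed plays the same role as the tool's seed parameter: the same seed reproduces the same chain. Adding shells or species makes the objective harder to drive to zero, which is why the parameter list suggests more steps for multicomponent systems.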
pymatgen_defect_generator: Point Defect Supercells
Takes a perfect bulk host structure and generates one supercell per symmetry-inequivalent defect site. Supports vacancies, substitutional dopants, and interstitials.
Key parameters:
- vacancy_species: ['Li', 'O'], generates V_Li and V_O defects
- substitution_species: {'Fe': ['Mn', 'Co']}, generates Mn_Fe and Co_Fe substitutionals
- interstitial_species: ['Li'], finds void sites and inserts Li
- charge_states: {'V_Li': [-1, 0, 1]}, metadata only; structures are always neutral geometry
- supercell_min_atoms: target atoms in defect supercell (default 64; 64–128 for plane-wave DFT)
- inequivalent_only: True (default), generates only symmetry-distinct defects
Downstream: feed outputs to pymatgen_perturbation_generator to rattle defect geometries,
or save directly to the ASE database via ase_store_result.
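For bookkeeping it can be handy to list the defect labels a given parameter set implies before running the generator. The sketch below is an assumption: `defect_labels` is a hypothetical helper, and the "X_i" label for interstitials is a common convention that this document does not itself define.

```python
def defect_labels(vacancy_species=(), substitution_species=None,
                  interstitial_species=(), charge_states=None):
    """Map each implied defect label to its charge-state metadata."""
    labels = [f"V_{el}" for el in vacancy_species]
    for host, dopants in (substitution_species or {}).items():
        labels += [f"{dopant}_{host}" for dopant in dopants]
    labels += [f"{el}_i" for el in interstitial_species]  # assumed naming
    charge_states = charge_states or {}
    # Charge states are metadata only; default to the neutral state
    return {label: charge_states.get(label, [0]) for label in labels}


plan = defect_labels(
    vacancy_species=["Li", "O"],
    substitution_species={"Fe": ["Mn", "Co"]},
    interstitial_species=["Li"],
    charge_states={"V_Li": [-1, 0, 1]},
)
print(plan)
```

The generator may return several supercells per label when a species occupies more than one symmetry-inequivalent site, so this mapping is a lower bound on the number of outputs.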
pymatgen_perturbation_generator: Structural Ensemble / Augmentation
Applies random atomic displacements ("rattling") and/or lattice strain to create ensembles of perturbed structures. Does not change composition.
Key parameters:
- displacement_max: max displacement per atom in Å (default 0.1; typical range 0.05–0.2)
- strain_percent: None (off), scalar (uniform), [min, max] (random range), or 6-element Voigt tensor [e_xx, e_yy, e_zz, e_xy, e_xz, e_yz]
- n_structures: perturbed copies per input (default 10, max 200)
- seed: for reproducibility
Primary uses:
When to use this phase:
Skip this phase if:
Use composition_enumerator to generate ALL charge-balanced compositions:
# Example: Discover Li-Mn-P-O battery cathode compositions
result = composition_enumerator(
elements=['Li', 'Mn', 'P', 'O'],
oxidation_states={
'Li': [1], # Li⁺
'Mn': [2, 3], # Mn²⁺, Mn³⁺
'P': [5], # P⁵⁺ (phosphate)
'O': [-2] # O²⁻
},
max_formula_units=6,
max_atoms_per_formula=30,
require_all_elements=True, # Only quaternary Li-Mn-P-O, not ternaries
sort_by='atoms', # Simplest compositions first
output_format='detailed'
)
# Result: ~12 compositions including LiMnPO₄ and Li₃Mn(PO₄)₂
# (Li-free phases such as Mn₃(PO₄)₂ are excluded by require_all_elements=True)
compositions = result['compositions']
Next step: Filter by stability
stable_compositions = []
for comp in compositions:
    stability = stability_analyzer(composition=comp['formula'])
    if stability['is_stable'] or stability['energy_above_hull'] < 0.1:
        stable_compositions.append(comp['formula'])
# Feed to Phase 1 or query MP for existing structures
Find Materials Project structures with similar chemistry, extract patterns:
# Step 1: Search for analogues (Na or Fe instead of Li/Mn)
templates = mp_search_materials(
    elements=['Na', 'Mn', 'P', 'O'],
    num_elements=4,
    is_stable=True
)
if templates['count'] == 0:
    # Try Fe instead of Mn (well-known LiFePO4)
    templates = mp_search_materials(
        elements=['Li', 'Fe', 'P', 'O'],
        num_elements=4,
        is_stable=True
    )
# Step 2: Extract stoichiometric patterns
patterns = {}
for mat in templates['materials']:
    formula = mat['composition_reduced']
    patterns[formula] = mat['spacegroup_number']
print(f"Found patterns: {patterns}")
# Example output (Fe fallback): {'LiFePO4': 62, 'Li3PO4': 61, ...}
# Step 3: Use patterns to guide composition_enumerator
target_formulas = []
if 'NaMnPO4' in patterns or 'LiFePO4' in patterns:
    # Olivine pattern exists (AMPO₄) → prioritize LiMnPO₄
    target_formulas.append('LiMnPO4')
if 'Li3PO4' in patterns:
    # Phosphate pattern exists → Li₃PO₄ likely!
    target_formulas.append('Li3PO4')
# Proceed to Phase 1 with these target compositions
Find statistically likely substitutions from known materials:
# Starting from known La₂WO₆ structure
substitutions = pymatgen_substitution_predictor(
composition='La2WO6',
to_this_composition=False, # What can La₂WO₆ become?
threshold=0.01,
group_by_probability=True
)
# Extract high-confidence suggestions
high_prob = substitutions['suggestions']['high']
target_formulas = [s['formula'] for s in high_prob]
# Check which ones exist in MP
for formula in target_formulas:
    mp_result = mp_search_materials(formula=formula)
    if mp_result['count'] > 0:
        print(f"{formula}: exists in MP (mp-id: {mp_result['materials'][0]['material_id']})")
    else:
        print(f"{formula}: novel composition candidate!")
Limitation: Substitution predictor is conservative (only suggests observed patterns). For truly novel compositions, use Strategy 1 (enumeration).
START: Have elements, need compositions
│
├─ Known analogue exists? (e.g., LiFePO₄ for Li-Mn-P-O)
│ ├─ YES → Strategy 3 (substitution_predictor) + Strategy 2 (MP templates)
│ └─ NO → Strategy 1 (composition_enumerator)
│
├─ Chemical system well-studied? (battery cathodes, perovskites)
│ ├─ YES → Strategy 2 (MP templates) first, then Strategy 1 if gaps
│ └─ NO → Strategy 1 (composition_enumerator)
│
└─ Exploratory discovery? (don't know what to expect)
└─ Strategy 1 (composition_enumerator) → filter by stability
Output of Phase 0: Ranked list of compositions
Next phase: For each composition, go to Phase 1 (build structure) or Phase 2 (if MP structure exists)
Start here if no structure exists yet.
pymatgen_prototype_builder(
    spacegroup=225,             # Fm-3m (rock-salt)
    species=['Mg', 'O'],        # 1:1 rock-salt oxide (MgO); a Li/O pair would not be charge balanced
    lattice_parameters=[4.21]   # cubic: [a], in Å
)
If a known structure already exists (from mp_get_material_properties, a CIF file, or the
ASE database), skip this step and pass that structure directly.
Common prototypes:
| Prototype | SG # | Symbol | Example |
|---|---|---|---|
| Rock-salt | 225 | Fm-3m | NaCl, LiF, MgO |
| Perovskite | 221 | Pm-3m | BaTiO₃, SrTiO₃ |
| Spinel | 227 | Fd-3m | MgAl₂O₄, LiMn₂O₄ |
| Layered oxide (α-NaFeO₂) | 166 | R-3m | LiCoO₂, LiNiO₂ |
| Olivine | 62 | Pnma | LiFePO₄, LiMnPO₄ |
| Rutile | 136 | P4₂/mnm | TiO₂, SnO₂ |
| Wurtzite | 186 | P6₃mc | ZnO, GaN |
| Fluorite | 225 | Fm-3m | CaF₂, CeO₂ |
Choose the branch based on whether charge-neutrality must be enforced:
Branch A — Exploratory (charge balance not enforced):
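pymatgen_substitution_generator(
    input_structures=seed_structure,
    substitutions={'Li': ['Na', 'K', 'Rb'], 'Fe': ['Mn', 'Co', 'Ni']},
    n_structures=1,   # full swaps are deterministic; higher values only yield duplicates
    max_attempts=9,   # 3 × 3 combinations × n_structures
    enforce_charge_neutrality=False
)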
Use when: screening isostructural analogues, building diverse training sets.
Branch B — Charge-neutral (ionic materials):
pymatgen_ion_exchange_generator(
input_structures=seed_structure,
replace_ion='Li',
with_ions={'Na': 0.5, 'Mg': 0.5},
exchange_fraction=1.0,
max_structures=20
)
Use when: battery cathode analogues, any case where the oxidation-state bookkeeping must be exact.
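The stoichiometry adjustment behind a charge-neutral exchange comes down to simple arithmetic. The sketch below handles only a single replacement ion and is an assumption about the bookkeeping, not the tool's algorithm; `exchange_counts` is a hypothetical helper.

```python
from fractions import Fraction


def exchange_counts(n_sites, old_charge, new_charge):
    """Return (cell_multiplier, n_new_ions) that conserve total ionic charge.

    The cell is multiplied only as far as needed to reach a whole number
    of replacement ions.
    """
    ratio = Fraction(n_sites * old_charge, new_charge)  # new ions per original cell
    multiplier = ratio.denominator
    return multiplier, int(ratio * multiplier)


print(exchange_counts(4, 1, 1))  # 4 Li+ -> Na+
print(exchange_counts(4, 1, 2))  # 4 Li+ -> Mg2+
print(exchange_counts(3, 1, 2))  # 3 Li+ -> Mg2+ needs a doubled cell
print(exchange_counts(1, 2, 3))  # 1 Ca2+ -> La3+ needs a tripled cell
```

Whenever the replacement ion carries a different charge, the number of occupied sites changes and vacancies appear. That is the practical difference from Branch A, which swaps species one-for-one.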
Both branches accept lists of input structures — pipe multiple seeds through in one call.
If Phase 2 produced disordered structures, or if you started from a disordered structure:
Ground-state search (small cells, complete enumeration):
pymatgen_enumeration_generator(
input_structures=disordered_structs,
supercell_size=2,
n_structures=50,
sort_by='ewald'
)
Solid-solution modelling (large / high-entropy systems):
pymatgen_sqs_generator(
input_structures=disordered_struct,
supercell_size=16,
n_structures=5,
n_mc_steps=200000,
seed=42
)
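Decision rule: use pymatgen_enumeration_generator when you need the complete ordered-configuration pool or the ground-state ordering, and keep supercell_size=2. Use pymatgen_sqs_generator when disorder is the physical state being modelled.
Fork off from any ordered structure to study point defects: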
pymatgen_defect_generator(
input_structure=ordered_structure,
vacancy_species=['Li'],
substitution_species={'Fe': ['Mn', 'Co']},
interstitial_species=['Li'],
charge_states={'V_Li': [-1, 0, 1]},
supercell_min_atoms=128
)
Important: Pass only a single, ordered, defect-free host structure. The tool generates one supercell per inequivalent defect site automatically — do not pre-expand the cell.
Apply to any structure from Phases 1–4 to:
pymatgen_perturbation_generator(
input_structures=ordered_or_defect_structures,
displacement_max=0.1,
strain_percent=[-2.0, 2.0],
n_structures=20,
seed=0
)
For defect geometries, use displacement_max=0.05–0.1 Å (subtle rattling). For
ML data augmentation, 0.1–0.2 Å with random strain is typical.
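What rattling plus random strain does to a structure can be sketched in pure Python. This is a simplified illustration under stated assumptions: `rattle` is a hypothetical function, positions are Cartesian, and only isotropic strain drawn from a [min, max] percent range is handled (no Voigt tensor).

```python
import math
import random


def rattle(positions, cell, displacement_max=0.1, strain_percent=None, seed=0):
    """Return (new_positions, new_cell) after isotropic strain and rattling."""
    rng = random.Random(seed)
    scale = 1.0
    if strain_percent is not None:
        low, high = strain_percent
        scale = 1.0 + rng.uniform(low, high) / 100.0
    new_cell = [[scale * x for x in vec] for vec in cell]
    new_positions = []
    for pos in positions:
        # Random direction, with a displacement length of at most displacement_max
        direction = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        length = rng.uniform(0.0, displacement_max)
        new_positions.append(
            [scale * p + length * d / norm for p, d in zip(pos, direction)]
        )
    return new_positions, new_cell


cell = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]
atoms = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]
moved, strained = rattle(atoms, cell, displacement_max=0.1,
                         strain_percent=[-2.0, 2.0], seed=0)
print(moved, strained)
```

Atoms are first carried along with the strained cell and then displaced, so the displacement bound applies relative to the strained positions. Calling the function again with the same seed reproduces the same perturbed structure.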
Always store generated structures so they can be queried later without regeneration:
ase_store_result(
db_path='candidates.db',
atoms_dict=structure['structure'], # MUST use output_format='ase' — see note below
key_value_pairs={
'generator': 'substitution',
'compound': structure['formula'], # NOT 'formula' — see reserved keys below
'campaign': 'cathode_screen_2026',
'source_structure': 'LiCoO2_mp-24850'
}
)
output_format must be 'ase' when feeding into ase_store_result:
ase_store_result requires ASE-native keys (numbers, positions, cell, pbc), which
are only produced when the upstream pymatgen tool is called with output_format='ase'.
Using the default output_format='dict' produces a pymatgen Structure.as_dict() object
(with @module, @class, sites, lattice, etc.) that will be rejected with:
"atoms_dict missing required keys: ['numbers']".
Always set output_format='ase' on any pymatgen tool whose result goes directly to ase_store_result.
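A small pre-flight check can catch the wrong output format before a write is attempted. The helper below is a hypothetical sketch that mirrors the required keys listed above; it is not part of ase_store_result, and its error text only imitates the message quoted in this section.

```python
REQUIRED_ASE_KEYS = ("numbers", "positions", "cell", "pbc")


def check_atoms_dict(atoms_dict):
    """Raise ValueError if the dict lacks the ASE-native keys."""
    missing = [key for key in REQUIRED_ASE_KEYS if key not in atoms_dict]
    if missing:
        raise ValueError(f"atoms_dict missing required keys: {missing}")
    return True


ase_style = {
    "numbers": [3, 9],
    "positions": [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]],
    "cell": [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]],
    "pbc": [True, True, True],
}
pymatgen_style = {"@module": "pymatgen.core.structure", "@class": "Structure",
                  "lattice": {}, "sites": []}

print(check_atoms_dict(ase_style))
try:
    check_atoms_dict(pymatgen_style)
except ValueError as err:
    print(err)
```

Running the check right after each generator call makes a forgotten output_format='ase' visible at the point where it happened, not later at storage time.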
ASE reserved key names — never use these in key_value_pairs:
ASE's db.write() will raise ValueError: Bad key for any of the following built-in column