Tutorial 9 · ASE · pymatgen · RDKit

What you'll learn: how to convert TrainCraft structures to and from pymatgen and RDKit, so that you can use materials databases, symmetry analysis, cheminformatics, and other external tools alongside TrainCraft.

Prerequisites: Tutorial 1. Requires pixi install -e science.


The converter module

traincraft.core.converter provides four functions:

from traincraft.core.converter import (
    ase_to_pymatgen,   # ase.Atoms → pymatgen Structure or Molecule
    pymatgen_to_ase,   # pymatgen Structure/Molecule → ase.Atoms
    ase_to_rdkit,      # ase.Atoms → rdkit.Chem.Mol (non-periodic only)
    rdkit_to_ase,      # rdkit.Chem.Mol → ase.Atoms
)

These are also available as methods on the Structure class:

from traincraft import Structure

s = Structure.from_ase(atoms)
pmg  = s.to_pymatgen()    # → pymatgen Structure or Molecule
rdmol = s.to_rdkit()      # → rdkit.Chem.Mol (non-periodic only)

s2 = Structure.from_pymatgen(pmg)
s3 = Structure.from_rdkit(rdmol)

ASE ↔ pymatgen

Periodic structure (crystal/slab)

from ase.build import bulk
from traincraft.core.converter import ase_to_pymatgen, pymatgen_to_ase

cu = bulk("Cu", "fcc", a=3.61, cubic=True)
pmg = ase_to_pymatgen(cu)   # → pymatgen.core.Structure

print(type(pmg).__name__)    # "Structure"
print(pmg.composition)       # Cu4
print(pmg.lattice)           # 3×3 lattice matrix; a = b = c = 3.61 Å

# Round-trip
cu_back = pymatgen_to_ase(pmg)
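
To confirm that a round trip left the structure unchanged, one option is to compare species and coordinates before and after. The helper below is a minimal sketch, not part of TrainCraft: the name same_geometry is hypothetical, and it assumes the conversion preserves atom order.

```python
import math


def same_geometry(symbols_a, positions_a, symbols_b, positions_b, tol=1e-8):
    """Return True if both atom lists match in species and Cartesian coordinates.

    Hypothetical helper: assumes atom order is preserved by the round trip.
    """
    if list(symbols_a) != list(symbols_b):
        return False
    if len(positions_a) != len(positions_b):
        return False
    # Compare every coordinate component within an absolute tolerance (in Å).
    return all(
        math.isclose(x, y, abs_tol=tol)
        for pa, pb in zip(positions_a, positions_b)
        for x, y in zip(pa, pb)
    )


# With ASE objects one would pass, for example:
#   same_geometry(cu.get_chemical_symbols(), cu.get_positions(),
#                 cu_back.get_chemical_symbols(), cu_back.get_positions())
```

For periodic structures you may also want to compare the cell vectors in the same way.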

Non-periodic molecule

from ase.build import molecule
from traincraft.core.converter import ase_to_pymatgen

h2o = molecule("H2O")
mol = ase_to_pymatgen(h2o)   # → pymatgen.core.Molecule (not Structure)
print(type(mol).__name__)     # "Molecule"
print(mol.formula)            # "H2 O1"

Partially periodic (slab)

A slab that is periodic in x and y but not in z is treated as non-periodic and becomes a pymatgen Molecule, because pymatgen has no direct analogue for partial periodicity. If you need a pymatgen Structure for a slab, set pbc=True on all axes first (using the set_pbc transform), perform your analysis, then restore the original pbc.
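
The set-then-reset pattern described above can be sketched with a small context manager. This is an illustration under assumptions: pymatgen_target and temporary_pbc are hypothetical names, the all-axes rule is taken from the description in this tutorial, and the code relies only on the get_pbc and set_pbc methods that ase.Atoms provides rather than on the TrainCraft set_pbc transform.

```python
from contextlib import contextmanager


def pymatgen_target(pbc):
    """Name of the pymatgen class the rule above would pick for these pbc flags."""
    # Assumption from the tutorial text: only fully periodic input becomes a Structure.
    return "Structure" if all(pbc) else "Molecule"


@contextmanager
def temporary_pbc(atoms, pbc=True):
    """Temporarily override periodicity and restore the original flags on exit.

    Works with any object exposing get_pbc() and set_pbc(), such as ase.Atoms.
    """
    original = tuple(bool(flag) for flag in atoms.get_pbc())
    atoms.set_pbc(pbc)
    try:
        yield atoms
    finally:
        # Restore even if the analysis inside the block raises.
        atoms.set_pbc(original)
```

Inside the with block the slab would convert to a Structure; once the block exits, the original flags are back in place.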


Use case: querying the Materials Project

pymatgen provides a client for the Materials Project API. Once you fetch a structure from MP, bring it into TrainCraft with Structure.from_pymatgen:

from mp_api.client import MPRester
from traincraft import Structure
from traincraft.core.provenance import Provenance

with MPRester("YOUR_API_KEY") as mpr:
    docs = mpr.materials.summary.search(
        material_ids=["mp-30"],   # Cu (mp-30)
        fields=["structure"],
    )
    pmg_struct = docs[0].structure

tc_struct = Structure.from_pymatgen(
    pmg_struct,
    provenance=Provenance(
        origin="generated",
        source="mp:mp-30",
        extra={"mp_id": "mp-30"},
    ),
)
print(tc_struct.atoms.get_chemical_symbols()[:4])
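
Rather than hard-coding the key as in the placeholder "YOUR_API_KEY", you may prefer to read it from the environment. The helper below is a minimal sketch; the function name mp_api_key and the variable name MP_API_KEY are assumptions, so adjust them to match your setup.

```python
import os


def mp_api_key(env=None, var="MP_API_KEY"):
    """Return the Materials Project API key stored in an environment variable.

    Hypothetical helper; the variable name is an assumption, not a TrainCraft setting.
    """
    env = os.environ if env is None else env
    key = env.get(var, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var} environment variable to your Materials Project API key"
        )
    return key


# Usage with the example above (not executed here):
#   with MPRester(mp_api_key()) as mpr:
#       ...
```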

ASE ↔ RDKit

Convert to RDKit Mol

RDKit conversion is only supported for non-periodic structures. The converter writes the atoms to an in-memory XYZ block, parses it with Chem.MolFromXYZBlock, and then uses DetermineBonds (the xyz2mol algorithm) to perceive bonding.

from ase.build import molecule
from traincraft.core.converter import ase_to_rdkit

h2o = molecule("H2O")
mol = ase_to_rdkit(h2o)     # charge=0 by default

print(mol.GetNumAtoms())    # 3
print(mol.GetNumBonds())    # 2

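# RDKit SMILES (the XYZ route keeps hydrogens explicit, so strip them first;
# without RemoveHs the output would be "[H]O[H]")
from rdkit.Chem import MolToSmiles, RemoveHs
print(MolToSmiles(RemoveHs(mol)))     # "O"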

If your molecule has a non-zero charge:

# cation_atoms: an ase.Atoms object for your charged species
mol = ase_to_rdkit(cation_atoms, charge=+1)
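
To make the in-memory XYZ step concrete, here is a minimal sketch of how such a block could be assembled from element symbols and Cartesian coordinates. The function xyz_block is hypothetical and only illustrates the file layout; the actual converter may format its output differently.

```python
def xyz_block(symbols, positions, comment=""):
    """Build an XYZ-format string: atom count, comment line, then one atom per line."""
    if len(symbols) != len(positions):
        raise ValueError("symbols and positions must have the same length")
    lines = [str(len(symbols)), comment]
    for symbol, (x, y, z) in zip(symbols, positions):
        lines.append(f"{symbol} {x:.8f} {y:.8f} {z:.8f}")
    return "\n".join(lines) + "\n"


# Approximate water geometry in Å, for illustration only.
water = xyz_block(
    ["O", "H", "H"],
    [(0.0, 0.0, 0.119), (0.0, 0.763, -0.477), (0.0, -0.763, -0.477)],
)

# In RDKit, a block like this would then be parsed with Chem.MolFromXYZBlock
# and passed to rdDetermineBonds.DetermineBonds(mol, charge=...) to perceive bonds.
```

Because an XYZ block carries no bond or charge information, the total charge has to be supplied separately, which is why ase_to_rdkit takes a charge argument.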

Convert from RDKit Mol

If you have an RDKit Mol with embedded 3D coordinates, convert it back:

from rdkit.Chem import MolFromSmiles, AddHs
from rdkit.Chem.AllChem import EmbedMolecule, MMFFOptimizeMolecule, ETKDGv3
from traincraft.core.converter import rdkit_to_ase

mol = AddHs(MolFromSmiles("CCO"))     # ethanol
params = ETKDGv3()
params.randomSeed = 42
EmbedMolecule(mol, params)
MMFFOptimizeMolecule(mol)

atoms = rdkit_to_ase(mol)             # first conformer by default
# atoms = rdkit_to_ase(mol, conf_id=1)  # specific conformer (only valid if several conformers were embedded)

Use case: SMILES → TrainCraft → MACE

This pipeline starts from a SMILES string, generates conformers with RDKit, and builds one TrainCraft structure per conformer, ready to be evaluated with MACE-OFF23 (the calculator step itself is not shown in this snippet):

from rdkit import Chem
from rdkit.Chem import AllChem, AddHs
from traincraft import Structure
from traincraft.core.provenance import Provenance
from traincraft.core.converter import rdkit_to_ase
from traincraft.config.models import MaceCalc

# Generate 5 conformers
smiles = "c1ccccc1"   # benzene
mol = AddHs(Chem.MolFromSmiles(smiles))
params = AllChem.ETKDGv3()
params.randomSeed = 42   # fixed seed for reproducible embeddings
# EmbedMultipleConfs returns the IDs of the conformers it actually generated
conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=5, params=params))
AllChem.MMFFOptimizeMoleculeConfs(mol)

structures = [
    Structure.from_rdkit(mol, conf_id=i,
                         provenance=Provenance(origin="generated",
                                               source=f"smiles:{smiles}:conf{i}"))
    for i in conf_ids
]
print(f"Generated {len(structures)} conformers")
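
The source tag used above packs the SMILES string and conformer index into one string. If you later need to recover them, a small pair of helpers along these lines could do it. Both functions are hypothetical and assume the smiles:<SMILES>:conf<N> layout from this example; the parser splits from the right because atom-mapped SMILES can themselves contain colons.

```python
def conformer_source(smiles, conf_id):
    """Build the provenance source tag used in the example above."""
    return f"smiles:{smiles}:conf{conf_id}"


def parse_conformer_source(tag):
    """Split a tag of the form 'smiles:<SMILES>:conf<N>' into (smiles, conf_id)."""
    prefix, sep, rest = tag.partition(":")
    if prefix != "smiles" or not sep:
        raise ValueError(f"not a SMILES source tag: {tag!r}")
    # rpartition keeps any colons inside the SMILES (e.g. atom maps) intact.
    smiles, sep, conf = rest.rpartition(":conf")
    if not sep or not conf.isdigit():
        raise ValueError(f"missing conformer index in: {tag!r}")
    return smiles, int(conf)
```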

Use case: symmetry analysis with pymatgen

from pymatgen.symmetry.analyzer import SpacegroupAnalyzer
from traincraft.geometry import build_geometry
from traincraft.config.models import CrystalBuilder, GeometryConfig

s = build_geometry(GeometryConfig(builder=CrystalBuilder(
    name="Si", crystalstructure="diamond", a=5.43, cubic=True
)))

pmg = s.to_pymatgen()
sga = SpacegroupAnalyzer(pmg)
print(sga.get_space_group_symbol())       # "Fd-3m"
print(sga.get_space_group_number())       # 227
primitive = sga.get_primitive_standard_structure()
print(len(primitive))                     # 2 atoms
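
The drop from 8 atoms in the cubic cell to 2 in the primitive cell follows from the lattice centering: the leading letter of the space group symbol gives the number of lattice points per conventional cell. The sketch below illustrates that arithmetic with a hypothetical helper; it is not a pymatgen or TrainCraft function.

```python
# Lattice points per conventional cell for each centering type.
# "R" is counted as 3, which applies to the hexagonal setting.
LATTICE_POINTS = {"P": 1, "A": 2, "B": 2, "C": 2, "I": 2, "R": 3, "F": 4}


def primitive_atom_count(n_conventional, spacegroup_symbol):
    """Atoms in the primitive cell, given the conventional-cell count and symbol."""
    points = LATTICE_POINTS[spacegroup_symbol[0]]
    if n_conventional % points:
        raise ValueError("atom count is not divisible by the number of lattice points")
    return n_conventional // points


# Diamond-cubic Si: 8 atoms in the cubic cell, F-centred, so 8 / 4 atoms remain.
si_primitive = primitive_atom_count(8, "Fd-3m")
```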

Summary

| Conversion | Function | Notes |
| --- | --- | --- |
| ASE → pymatgen | ase_to_pymatgen | Periodic → Structure; molecular → Molecule |
| pymatgen → ASE | pymatgen_to_ase | Works for both Structure and Molecule |
| ASE → RDKit | ase_to_rdkit | Non-periodic only; bonds perceived by xyz2mol |
| RDKit → ASE | rdkit_to_ase | Uses conf_id to select conformer |

Next: Tutorial 10, where you train a MACE model on the dataset you've built. For a deeper understanding of how TrainCraft works under the hood, see the Concepts section; for the complete field reference, see the Config Schema.