Calculators & DFT Labeling¶
A calculator turns a structure into properties (energy, forces, …). TrainCraft uses calculators in two distinct roles:
| Role | TOML | Purpose | Typical choice |
|---|---|---|---|
| Exploration | [calculator] |
The cheap force engine that drives sampling (MD/MC) | mace, emt, tblite |
| Labeling | [labeling.calculator] |
The expensive engine that labels the selected frames | fhi_aims, qe |
This separation is the whole point of the spine: explore cheaply, run the selection funnel, then spend the expensive engine only on the survivors.
TrainCraft is code-agnostic. Any calculator that produces energies and forces can label frames — FHI-aims, Quantum ESPRESSO, VASP, or any other QM code can be plugged in as a single registered factory (an ASE-style calculator + a config model + a registry entry; see Write a Custom Builder for the registry pattern — calculators work the same way). Two DFT engines ship today:
fhi_aims— the reference engine. Its academic license (from MS1P e.V.) is free for academic groups with a voluntary donation; you obtain the source by registration (so its container is built from your own copy — see Run on HPC).qe(Quantum ESPRESSO) — fully open source, no license; runs the whole workflow with open tooling only.
Both can compute polarizability via DFPT (see below).
Available calculators¶
type |
Kind | Properties | Extra deps |
|---|---|---|---|
emt |
ASE force field | energy, forces (stress for bulk) | none |
tblite |
semiempirical GFN-xTB | energy, forces, stress | tblite |
xtb |
semiempirical GFN-xTB | energy, forces | xtb |
mace |
MLIP (foundation / fine-tuned) | energy, forces, stress | mace-torch, torch |
fhi_aims |
DFT (FHI-aims) | energy, forces, stress, dipole, polarizability | FHI-aims binary |
qe |
DFT (Quantum ESPRESSO) | energy, forces, stress, dipole | QE binaries |
mace¶
[calculator]
type = "mace"
model = "mace-mp0" # or "mace-off23"
device = "cuda" # "cpu" locally; "cuda" on the GPU image
# model_path = "my-finetuned.model" # a local checkpoint instead of a foundation model
DFT calculators¶
Both DFT calculators are ASE file-IO calculators. Their run command is never configured in the TOML — it is injected from the environment so the same config works locally and inside a container on HPC (see Run on HPC):
| Env var | Default | Used by |
|---|---|---|
TRAINCRAFT_AIMS_COMMAND |
aims.x |
fhi_aims |
TRAINCRAFT_PW_COMMAND |
pw.x |
qe |
TRAINCRAFT_AIMS_SPECIES_DIR / AIMS_SPECIES_DIR |
— | fhi_aims species defaults |
TRAINCRAFT_PW_PSEUDO_DIR / ESPRESSO_PSEUDO |
— | qe pseudopotentials |
On HPC the Slurm executor sets these for you, e.g.
TRAINCRAFT_PW_COMMAND="srun --mpi=pmix apptainer exec … traincraft-qe.sif pw.x" —
you never hard-code the command (the MPI plugin and container/native wrapper come
from [orchestration.slurm]; see Run on HPC).
fhi_aims (FHI-aims — the reference engine)¶
[labeling.calculator]
type = "fhi_aims"
xc = "pbe"
species_defaults = "tight" # basis level (light/intermediate/tight/…)
kpts = [4, 4, 4] # Monkhorst-Pack grid (periodic systems)
relativistic = "atomic_zora scalar"
properties = ["dipole", "polarizability"] # beyond E/F/stress
# extra = { ... } # any control.in keyword, passed through verbatim
Polarizability via DFPT (the IR/Raman driver) is computed in a single
aims.x run and selected automatically from the requested properties and the
system's periodicity: dfpt = dielectric for periodic cells, dfpt =
polarizability for molecules.
qe (Quantum ESPRESSO — open source)¶
[labeling.calculator]
type = "qe"
ecutwfc = 60.0
kpts = [4, 4, 4]
pseudopotentials = { Si = "Si.pbe-n-kjpaw_psl.1.0.0.UPF" }
properties = ["dipole"]
Fully open source (the source-built traincraft-qe image), so the whole workflow
can run with no licensed software. QE can compute polarizability/dielectric
response via DFPT (ph.x with epsil/fpol); because QE is a periodic code,
molecular polarizability is done in a vacuum supercell. That is a second binary
(ph.x) after the SCF, which TrainCraft's qe plugin does not wire yet — so
for polarizability today use fhi_aims, or extend build_qe with a ph.x step
(a tracked extension point, not a QE limitation).
The labeling stage¶
When [labeling] is present, the pipeline labels the selected frames after
the funnel:
This produces, under the run workspace:
labeled_dft/
labeled.extxyz # frames with E/F/stress(+dipole/pol) and origin=dft_labeled
manifest.json # level of theory, property set, frame count, wall time
frame_0000/ … # per-frame work dirs (so file-IO calcs don't collide)
Each labeled frame is tagged origin="dft_labeled" with its level_of_theory,
so the expensive data stays cleanly separable from the cheap, ML-generated points
(see Provenance). The dataset is built from these labeled frames.
Tip: to try the mechanics with zero deps, set
[labeling.calculator] type = "emt"— EMT stands in for DFT and produces E/F/stress.examples/18does this.