Calculators & DFT Labeling¶

A calculator turns a structure into properties (energy, forces, …). TrainCraft uses calculators in two distinct roles:

Role	TOML	Purpose	Typical choice
Exploration	`[calculator]`	The cheap force engine that drives sampling (MD/MC)	`mace`, `emt`, `tblite`
Labeling	`[labeling.calculator]`	The expensive engine that labels the selected frames	`fhi_aims`, `qe`

This separation is the whole point of the spine: explore cheaply, run the selection funnel, then spend the expensive engine only on the survivors.

TrainCraft is code-agnostic. Any calculator that produces energies and forces can label frames — FHI-aims, Quantum ESPRESSO, VASP, or any other QM code can be plugged in as a single registered factory (an ASE-style calculator + a config model + a registry entry; see Write a Custom Builder for the registry pattern — calculators work the same way). Two DFT engines ship today:

fhi_aims — the reference engine. Its academic license (from MS1P e.V.) is free for academic groups with a voluntary donation; you obtain the source by registration (so its container is built from your own copy — see Run on HPC).
qe (Quantum ESPRESSO) — fully open source, no license; runs the whole workflow with open tooling only.

Both can compute polarizability via DFPT (see below).

Available calculators¶

`type`	Kind	Properties	Extra deps
`emt`	ASE force field	energy, forces (stress for bulk)	none
`tblite`	semiempirical GFN-xTB	energy, forces, stress	`tblite`
`xtb`	semiempirical GFN-xTB	energy, forces	`xtb`
`mace`	MLIP (foundation / fine-tuned)	energy, forces, stress	`mace-torch`, `torch`
`fhi_aims`	DFT (FHI-aims)	energy, forces, stress, dipole, polarizability	FHI-aims binary
`qe`	DFT (Quantum ESPRESSO)	energy, forces, stress, dipole	QE binaries

`mace`¶

[calculator]
type   = "mace"
model  = "mace-mp0"      # or "mace-off23"
device = "cuda"          # "cpu" locally; "cuda" on the GPU image
# model_path = "my-finetuned.model"   # a local checkpoint instead of a foundation model

DFT calculators¶

Both DFT calculators are ASE file-IO calculators. Their run command is never configured in the TOML — it is injected from the environment so the same config works locally and inside a container on HPC (see Run on HPC):

Env var	Default	Used by
`TRAINCRAFT_AIMS_COMMAND`	`aims.x`	`fhi_aims`
`TRAINCRAFT_PW_COMMAND`	`pw.x`	`qe`
`TRAINCRAFT_AIMS_SPECIES_DIR` / `AIMS_SPECIES_DIR`	—	`fhi_aims` species defaults
`TRAINCRAFT_PW_PSEUDO_DIR` / `ESPRESSO_PSEUDO`	—	`qe` pseudopotentials

On HPC the Slurm executor sets these for you, e.g. TRAINCRAFT_PW_COMMAND="srun --mpi=pmix apptainer exec … traincraft-qe.sif pw.x" — you never hard-code the command (the MPI plugin and container/native wrapper come from [orchestration.slurm]; see Run on HPC).

`fhi_aims` (FHI-aims — the reference engine)¶

[labeling.calculator]
type             = "fhi_aims"
xc               = "pbe"
species_defaults = "tight"          # basis level (light/intermediate/tight/…)
kpts             = [4, 4, 4]        # Monkhorst-Pack grid (periodic systems)
relativistic     = "atomic_zora scalar"
properties       = ["dipole", "polarizability"]   # beyond E/F/stress
# extra = { ... }                   # any control.in keyword, passed through verbatim

Polarizability via DFPT (the IR/Raman driver) is computed in a single aims.x run and selected automatically from the requested properties and the system's periodicity: dfpt = dielectric for periodic cells, dfpt = polarizability for molecules.

`qe` (Quantum ESPRESSO — open source)¶

[labeling.calculator]
type    = "qe"
ecutwfc = 60.0
kpts    = [4, 4, 4]
pseudopotentials = { Si = "Si.pbe-n-kjpaw_psl.1.0.0.UPF" }
properties = ["dipole"]

Fully open source (the source-built traincraft-qe image), so the whole workflow can run with no licensed software. QE can compute polarizability/dielectric response via DFPT (ph.x with epsil/fpol); because QE is a periodic code, molecular polarizability is done in a vacuum supercell. That is a second binary (ph.x) after the SCF, which TrainCraft's qe plugin does not wire yet — so for polarizability today use fhi_aims, or extend build_qe with a ph.x step (a tracked extension point, not a QE limitation).

The labeling stage¶

When [labeling] is present, the pipeline labels the selected frames after the funnel:

[labeling.calculator]
type = "fhi_aims"
# … settings as above …

This produces, under the run workspace:

labeled_dft/
  labeled.extxyz        # frames with E/F/stress(+dipole/pol) and origin=dft_labeled
  manifest.json         # level of theory, property set, frame count, wall time
  frame_0000/ …         # per-frame work dirs (so file-IO calcs don't collide)

Each labeled frame is tagged origin="dft_labeled" with its level_of_theory, so the expensive data stays cleanly separable from the cheap, ML-generated points (see Provenance). The dataset is built from these labeled frames.

Tip: to try the mechanics with zero deps, set [labeling.calculator] type = "emt" — EMT stands in for DFT and produces E/F/stress. examples/18 does this.