Training (multi-head MACE)

Once the spine has produced a labelled dataset, the [training] stage turns it into a trained interatomic potential. TrainCraft wraps MACE's mace_run_train rather than reimplementing training: core prepares the train/valid files and renders the command, and the MACE/torch stack does the work inside traincraft-mlip.sif (or a local mace env).

[training]
type = "mace"
name = "cu_finetune"
foundation_model = "medium"      # small | medium | large | mace-mp0 | <path.model>
strategy = "multihead"           # multihead | naive | scratch
heads = ["energy", "forces", "stress"]
e0s = "foundation"
pt_train_file = "mp"             # replay data for multihead fine-tuning

The trainer is a registry plugin (@register("trainer", …)), so adding a backend (MatterSim/Orb/SevenNet/…) is one new file and one registry entry — the same pattern as calculators and samplers.

Where training sits in the pipeline

geometry → sample → select → label → dataset → TRAIN

train consumes the dataset artifact (or, if there is no [dataset] section, the labelled frames directly) and writes a model tree under runs/<name>/model/: train.xyz, valid.xyz, model/<name>.model, checkpoints/, results/, logs/, and a manifest.json recording the foundation model, heads, E0s, the exact rendered command, and the split sizes.
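To make the split and the manifest concrete, here is a small sketch that splits a list of frames deterministically and writes a manifest.json with the fields listed above. The function names, the JSON field names, the 10% validation fraction, and the seed are assumptions chosen for illustration; the real manifest schema may differ.

```python
import json
import random
import tempfile
from pathlib import Path


def split_frames(frames, valid_fraction=0.1, seed=0):
    """Shuffle deterministically and return (train, valid) lists."""
    order = list(range(len(frames)))
    random.Random(seed).shuffle(order)
    n_valid = max(1, round(valid_fraction * len(frames)))
    valid = [frames[i] for i in order[:n_valid]]
    train = [frames[i] for i in order[n_valid:]]
    return train, valid


def write_manifest(model_dir, training, command, n_train, n_valid):
    """Write a provenance record next to the model (hypothetical schema)."""
    manifest = {
        "foundation_model": training.get("foundation_model"),
        "heads": training.get("heads"),
        "e0s": training.get("e0s"),
        "command": command,
        "split": {"train": n_train, "valid": n_valid},
    }
    path = Path(model_dir) / "manifest.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(manifest, indent=2))
    return path


frames = [f"frame_{i}" for i in range(20)]  # stand-ins for labelled structures
train, valid = split_frames(frames)
training = {"foundation_model": "medium", "heads": ["energy", "forces", "stress"], "e0s": "foundation"}
command = ["mace_run_train", "--name", "cu_finetune"]

with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "runs" / "cu_finetune" / "model"
    path = write_manifest(model_dir, training, command, len(train), len(valid))
    manifest = json.loads(path.read_text())

print(manifest["split"])
```

Recording the rendered command and the split sizes together means a model file can later be traced back to exactly how it was trained.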

On HPC the stage runs as a GPU step (--nv, traincraft-mlip.sif); the command is injected from $TRAINCRAFT_MACE_TRAIN_COMMAND so the science stays container-agnostic (the same mechanism as the DFT labelers — see Run on HPC).
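The injection mechanism can be pictured roughly like this: the training arguments are rendered once, and only the launcher prefix changes between a local run and a containerized one. The function render_train_command, the fallback to a bare mace_run_train, and the example apptainer prefix are assumptions for illustration; only the environment variable name comes from the text above.

```python
import os
import shlex


def render_train_command(train_args, env=None):
    """Prefix the training arguments with the launcher taken from the environment.

    Assumption: when the variable is unset, a bare `mace_run_train` is used.
    """
    env = os.environ if env is None else env
    launcher = env.get("TRAINCRAFT_MACE_TRAIN_COMMAND", "mace_run_train")
    # shlex.split respects shell-style quoting in the launcher string.
    return shlex.split(launcher) + list(train_args)


args = ["--name", "cu_finetune"]

# Local run: nothing injected (simulated here with an empty environment).
local = render_train_command(args, env={})

# HPC run: a container launcher is injected (example value, not a TrainCraft default).
hpc_env = {
    "TRAINCRAFT_MACE_TRAIN_COMMAND": "apptainer exec --nv traincraft-mlip.sif mace_run_train"
}
hpc = render_train_command(args, env=hpc_env)

print(shlex.join(local))
print(shlex.join(hpc))
```

Keeping the launcher outside the rendered arguments is what lets the same config run unchanged on a laptop and on a cluster.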

Multi-head property targets

The property set in heads selects the MACE model type and loss. Energy/forces (and stress on periodic data) are the core task; dipole and polarizability are the heads that enable IR and Raman spectra reconstruction (the original scientific driver — see Calculators & DFT Labeling).

| heads include … | MACE --model | MACE --loss |
| --- | --- | --- |
| energy / forces (+ stress) | (foundation default) | (foundation default) |
| dipole only | AtomicDipolesMACE | dipole |
| energy / forces + dipole | EnergyDipolesMACE | energy_forces_dipole |
| + polarizability | AtomicDielectricMACE | dipole_polar |

The dipole/polarizability model types are MACE's dielectric-property models (cf. MACE-MDP and mace-field). Because these model types change upstream faster than the energy/forces path, the exact --model/--loss/keys are overridable via [training.extra]. Verify them against your installed MACE version before a production polarizability run, the same way the QE ph.x polarizability path is gated for labeling.
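The head-to-model mapping described above could be expressed as a small lookup with an override hook. This is a sketch, not TrainCraft's code: select_model_and_loss is a hypothetical name, the "model" and "loss" override keys are assumed, and treating any head set that contains polarizability as the dielectric case is one reading of the "+ polarizability" row.

```python
def select_model_and_loss(heads, extra=None):
    """Map a set of property heads to MACE --model / --loss choices.

    Returns an empty dict when the foundation defaults should be left alone.
    `extra` stands in for user overrides (assumed keys: "model", "loss").
    """
    wanted = set(heads)
    if "polarizability" in wanted:
        choice = {"model": "AtomicDielectricMACE", "loss": "dipole_polar"}
    elif "dipole" in wanted and wanted & {"energy", "forces"}:
        choice = {"model": "EnergyDipolesMACE", "loss": "energy_forces_dipole"}
    elif "dipole" in wanted:
        choice = {"model": "AtomicDipolesMACE", "loss": "dipole"}
    else:
        choice = {}  # energy / forces (+ stress): keep the foundation defaults
    choice.update(extra or {})  # explicit overrides always win
    return choice


print(select_model_and_loss(["energy", "forces", "stress"]))
print(select_model_and_loss(["dipole"]))
print(select_model_and_loss(["energy", "forces", "dipole"]))
print(select_model_and_loss(["dipole", "polarizability"], extra={"loss": "custom_loss"}))
```

Applying the overrides last keeps the table as a default only, which matters because the dielectric model names are the part most likely to differ between MACE versions.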

Labels are written with explicit reference keys (REF_energy, REF_forces, REF_stress, REF_dipole, REF_polarizability) and passed to MACE via --energy_key/--forces_key/… so a frame missing a label for one property is simply skipped for that head rather than trained on zeros.
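The effect of explicit reference keys can be illustrated with plain dictionaries standing in for frames. In the sketch below, a frame without REF_dipole contributes to the energy and forces heads but not to the dipole head. The helper names are hypothetical, and in practice the skipping happens inside MACE's data handling rather than in code like this. The polarizability key flag is left out of KEY_FLAGS because its exact name is not confirmed here.

```python
REF_KEYS = {
    "energy": "REF_energy",
    "forces": "REF_forces",
    "stress": "REF_stress",
    "dipole": "REF_dipole",
    "polarizability": "REF_polarizability",
}

# Key flags to pass on the command line; verify against your installed MACE.
KEY_FLAGS = {
    "energy": "--energy_key",
    "forces": "--forces_key",
    "stress": "--stress_key",
    "dipole": "--dipole_key",
}


def frames_for_head(frames, head):
    """Keep only frames that actually carry a label for this head."""
    key = REF_KEYS[head]
    return [frame for frame in frames if frame.get(key) is not None]


def key_flags(heads):
    """Render key flags for the heads whose flag name is listed in KEY_FLAGS."""
    argv = []
    for head in heads:
        if head in KEY_FLAGS:
            argv += [KEY_FLAGS[head], REF_KEYS[head]]
    return argv


frames = [
    {"id": 0, "REF_energy": -3.7, "REF_forces": [[0.0, 0.0, 0.1]], "REF_dipole": [0.0, 0.0, 0.2]},
    {"id": 1, "REF_energy": -3.6, "REF_forces": [[0.0, 0.1, 0.0]]},  # no dipole label
]

heads = ["energy", "forces", "dipole"]
counts = {head: len(frames_for_head(frames, head)) for head in heads}
print(counts)
print(key_flags(heads))
```

A missing key is treated as "no label", which is different from a label of zero; that distinction is the point of the explicit REF_ keys.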

Fine-tuning defaults (and why)

The defaults follow Tompa, Varga-Umbrich, Batatia, Elena, Bernstein & Csányi, "Fine-tuning MLIP foundation models: strategies for accuracy and transferability" (arXiv:2606.12704). The paper's headline finding is that foundation-model quality, atomic-reference-energy consistency, and stable optimization matter more than the choice of fine-tuning method. Concretely, TrainCraft defaults to:

| Setting | Default | Rationale (paper) |
| --- | --- | --- |
| e0s | "foundation" | Reuse the foundation's isolated-atom energies. Averaging from data is 2–3× worse on forces and can destabilize MD. |
| strategy | "multihead" | Multihead replay is the only method tested that consistently preserves out-of-distribution accuracy and the repulsive wall. Use pt_train_file = "mp" (or your own) for replay. |
| weight_decay | 0.0 | Essential when fine-tuning: non-zero decay pulls weights away from the pretrained solution. |
| ema_decay | 0.995 | A higher EMA decay (> 0.99) stabilizes fine-tuning. |
| energy_weight, forces_weight | 10, 10 | Constant, energy-prioritised loss weights; single-stage (no force→energy schedule). |
| lr | 1e-3 | Naive fine-tuning tolerates 1e-3 to 1e-4; multihead converges best nearer 1e-4. |
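One way to picture how these defaults combine with user settings is a dictionary merge followed by flag rendering. The sketch below is illustrative only: resolve and render_flags are hypothetical helpers, and the flag names follow the mace_run_train options as commonly documented, so check them with mace_run_train --help on your installed version before relying on them.

```python
# Defaults from the table above.
PAPER_DEFAULTS = {
    "e0s": "foundation",
    "strategy": "multihead",
    "weight_decay": 0.0,
    "ema_decay": 0.995,
    "energy_weight": 10.0,
    "forces_weight": 10.0,
    "lr": 1e-3,
}

# Assumed mapping from settings to mace_run_train flags (verify locally).
FLAG_FOR = {
    "e0s": "--E0s",
    "weight_decay": "--weight_decay",
    "ema_decay": "--ema_decay",
    "energy_weight": "--energy_weight",
    "forces_weight": "--forces_weight",
    "lr": "--lr",
}


def resolve(user_settings):
    """User values override the defaults; untouched keys keep their default."""
    return {**PAPER_DEFAULTS, **user_settings}


def render_flags(settings):
    """Turn resolved settings into a flat argv fragment."""
    argv = []
    for key, flag in FLAG_FOR.items():
        argv += [flag, str(settings[key])]
    return argv


# Example: lower the learning rate for multihead replay, keep everything else.
settings = resolve({"lr": 1e-4})
flags = render_flags(settings)
print(" ".join(flags))
```

Only lr changes in this example; the remaining defaults stay in force unless a user sets them explicitly.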

Cost. Multihead replay needs ~3–15× the compute of naive fine-tuning (lower learning rate + dual data batches). Use strategy = "naive" for a narrow, single-application model where forgetting is not a concern, and strategy = "scratch" (with foundation_model unset) to train from random initialization.

LoRA is discussed in the paper as an intermediate regularizer but is not recommended as the primary method for MACE; if you need it, pass the relevant flags through [training.extra].

Running it

# locally (needs the mace env: torch + mace-torch)
pixi run -e mace example-21

# or as part of a Slurm pipeline (train becomes a --nv GPU step)
traincraft submit examples/21_train_mace_finetune.toml --dry-run

See also: Use Your Own MACE Model for loading the fine-tuned .model back in as an exploration calculator, and the Roadmap for dataset health tooling and validation (parity, learning curves, IR/Raman reconstruction), which build on this stage.