Training (multi-head MACE)
Once the spine has produced a labelled dataset, the [training] stage turns
it into a trained interatomic potential. TrainCraft wraps MACE's
mace_run_train rather than reimplementing training: core prepares the
train/valid files and renders the command, and the MACE/torch stack does the work
inside traincraft-mlip.sif (or a local mace env).
[training]
type = "mace"
name = "cu_finetune"
foundation_model = "medium" # small | medium | large | mace-mp0 | <path.model>
strategy = "multihead" # multihead | naive | scratch
heads = ["energy", "forces", "stress"]
e0s = "foundation"
pt_train_file = "mp" # replay data for multihead fine-tuning
The trainer is a registry plugin (@register("trainer", …)), so adding a
backend (MatterSim/Orb/SevenNet/…) is one new file and one registry entry — the
same pattern as calculators and samplers.
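The registry pattern described above can be sketched as follows. This is a minimal illustration and not TrainCraft's actual implementation: the `_REGISTRY` dict, the `resolve` helper, and the `MaceTrainer` class are hypothetical names, and only the `@register("trainer", …)` decorator shape comes from the text.

```python
from typing import Callable, Dict, Tuple

# Hypothetical in-memory registry keyed by (kind, name); an assumption,
# not TrainCraft's real data structure.
_REGISTRY: Dict[Tuple[str, str], type] = {}


def register(kind: str, name: str) -> Callable[[type], type]:
    """Class decorator that files a plugin under (kind, name)."""
    def decorator(cls: type) -> type:
        key = (kind, name)
        if key in _REGISTRY:
            raise ValueError(f"duplicate plugin: {key}")
        _REGISTRY[key] = cls
        return cls
    return decorator


def resolve(kind: str, name: str) -> type:
    """Look up a plugin class, with a readable error for unknown names."""
    try:
        return _REGISTRY[(kind, name)]
    except KeyError:
        known = sorted(n for k, n in _REGISTRY if k == kind)
        raise KeyError(f"unknown {kind} {name!r}; registered: {known}") from None


@register("trainer", "mace")
class MaceTrainer:
    """Stand-in trainer; a new backend would be one more decorated class."""
    def __init__(self, config: dict) -> None:
        self.config = config


# A config with type = "mace" would then select the class by name.
trainer = resolve("trainer", "mace")({"name": "cu_finetune"})
```

With this shape, the stage never needs to import a backend by name, which is what keeps a new trainer down to one file plus one decorated class.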
Where training sits in the pipeline
train consumes the dataset artifact (or, if there is no [dataset] section, the
labelled frames) and writes a model tree under runs/<name>/model/:
train.xyz, valid.xyz, model/<name>.model, checkpoints/, results/,
logs/, and a manifest.json recording the foundation model, heads, E0s, the
exact rendered command, and the split sizes.
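As a rough sketch of the bookkeeping described above, the snippet below splits frames into train/valid sets and writes a manifest.json. The JSON field names, the `valid_fraction` and `seed` parameters, and both helper functions are assumptions made for illustration; the text only states which facts the manifest records.

```python
import json
import random
import shlex
import tempfile
from pathlib import Path


def split_frames(frames, valid_fraction=0.1, seed=0):
    """Shuffle deterministically, then split into (train, valid) lists.

    Keeps at least one validation frame whenever any frames exist.
    """
    shuffled = list(frames)
    random.Random(seed).shuffle(shuffled)
    n_valid = max(1, round(len(shuffled) * valid_fraction))
    return shuffled[n_valid:], shuffled[:n_valid]


def write_manifest(model_dir, training, command, n_train, n_valid):
    """Record what was trained and how; field names are illustrative."""
    manifest = {
        "foundation_model": training["foundation_model"],
        "heads": training["heads"],
        "e0s": training["e0s"],
        "command": shlex.join(command),  # the exact rendered command
        "split": {"train": n_train, "valid": n_valid},
    }
    path = Path(model_dir) / "manifest.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(manifest, indent=2))
    return path


training = {
    "foundation_model": "medium",
    "heads": ["energy", "forces", "stress"],
    "e0s": "foundation",
}
train, valid = split_frames(range(20))
command = ["mace_run_train", "--name", "cu_finetune"]

with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "runs" / "cu_finetune" / "model"
    path = write_manifest(model_dir, training, command, len(train), len(valid))
    recorded = json.loads(path.read_text())
```

Storing the command as one shell-quoted string makes it easy to copy out of the manifest and rerun by hand.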
On HPC the stage runs as a GPU step (--nv, traincraft-mlip.sif); the
command is injected from $TRAINCRAFT_MACE_TRAIN_COMMAND so the science stays
container-agnostic (the same mechanism as the DFT labelers — see
Run on HPC).
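One way the command injection could work is sketched below. The fallback to a bare mace_run_train and the `resolve_train_command` helper are assumptions; the apptainer invocation is only an illustrative value for the variable, not a documented TrainCraft setting.

```python
import os
import shlex

# Assumed fallback when the variable is unset: call MACE's CLI directly.
DEFAULT_COMMAND = "mace_run_train"


def resolve_train_command(env=None):
    """Return the training command prefix as an argv list.

    Reads $TRAINCRAFT_MACE_TRAIN_COMMAND so the container wrapper lives in
    the environment rather than in the science config.
    """
    env = os.environ if env is None else env
    raw = env.get("TRAINCRAFT_MACE_TRAIN_COMMAND", "").strip()
    return shlex.split(raw) if raw else [DEFAULT_COMMAND]


# Illustrative HPC value: run the trainer inside the image with GPU access.
hpc_env = {
    "TRAINCRAFT_MACE_TRAIN_COMMAND":
        "apptainer exec --nv traincraft-mlip.sif mace_run_train"
}
local_argv = resolve_train_command({})
hpc_argv = resolve_train_command(hpc_env)
```

The same TOML file can then run on a laptop or in a container without edits, because only the prefix of the argv list changes.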
Multi-head property targets
The property set in heads selects the MACE model type and loss. Energy/forces
(and stress on periodic data) are the core task; dipole and
polarizability are the heads that enable IR and Raman spectra reconstruction
(the original scientific driver — see Calculators & DFT
Labeling).
| heads include … | MACE --model | MACE --loss |
|---|---|---|
| energy / forces (+ stress) | (foundation default) | (foundation default) |
| dipole only | AtomicDipolesMACE | dipole |
| energy / forces + dipole | EnergyDipolesMACE | energy_forces_dipole |
| + polarizability | AtomicDielectricMACE | dipole_polar |
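The table can be read as a small lookup, sketched here. The function name and the precedence (polarizability checked first, then dipole) are one interpretation of the rows above rather than confirmed TrainCraft behaviour; `None` stands for "leave the foundation default in place".

```python
from typing import Iterable, Optional, Tuple


def select_model_and_loss(heads: Iterable[str]) -> Tuple[Optional[str], Optional[str]]:
    """Map a set of heads to (--model, --loss); None means foundation default."""
    wanted = set(heads)
    # Assumed precedence: any polarizability head selects the dielectric model.
    if "polarizability" in wanted:
        return "AtomicDielectricMACE", "dipole_polar"
    if "dipole" in wanted:
        if wanted & {"energy", "forces"}:
            return "EnergyDipolesMACE", "energy_forces_dipole"
        return "AtomicDipolesMACE", "dipole"
    # Energy/forces (+ stress): keep whatever the foundation model uses.
    return None, None


core = select_model_and_loss(["energy", "forces", "stress"])
dipole_only = select_model_and_loss(["dipole"])
energy_dipole = select_model_and_loss(["energy", "forces", "dipole"])
dielectric = select_model_and_loss(["dipole", "polarizability"])
```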
The dipole/polarizability model types are MACE's dielectric-property models (cf.
MACE-MDP and mace-field). Their interfaces change faster than the energy/forces path, so the
exact --model/--loss/keys are overridable via [training.extra]. Verify
them against your installed MACE version before a production polarizability run,
the same way the QE ph.x polarizability path is gated for labeling.
Labels are written with explicit reference keys (REF_energy, REF_forces,
REF_stress, REF_dipole, REF_polarizability) and passed to MACE via
--energy_key/--forces_key/… so a frame missing a label for one property is
simply skipped for that head rather than trained on zeros.
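A minimal sketch of the key convention follows. Frames are modelled as plain dicts to keep the example dependency-free, and both helpers are hypothetical. The text names only --energy_key and --forces_key explicitly; the `--<head>_key` pattern for the remaining heads is an assumption to check against `mace_run_train --help` for your MACE version.

```python
REF_PREFIX = "REF_"


def key_flags(heads):
    """Render --<head>_key REF_<head> pairs for the requested heads.

    The flag pattern beyond energy/forces is assumed, not confirmed.
    """
    flags = []
    for head in heads:
        flags += [f"--{head}_key", f"{REF_PREFIX}{head}"]
    return flags


def frames_per_head(frames, heads):
    """Count how many frames actually carry a label for each head."""
    return {
        head: sum(1 for frame in frames if f"{REF_PREFIX}{head}" in frame)
        for head in heads
    }


# Toy frames: the second one has no dipole label.
frames = [
    {"REF_energy": -3.7, "REF_forces": [[0.0, 0.0, 0.1]], "REF_dipole": [0.0, 0.0, 0.2]},
    {"REF_energy": -3.6, "REF_forces": [[0.0, 0.1, 0.0]]},
]
heads = ["energy", "forces", "dipole"]
flags = key_flags(heads)
coverage = frames_per_head(frames, heads)
```

A per-head count like `coverage` is a cheap check to run before training, since a head with very few labelled frames will be fitted on much less data than the dataset size suggests.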
Fine-tuning defaults (and why)
The defaults follow Tompa, Varga-Umbrich, Batatia, Elena, Bernstein & Csányi, "Fine-tuning MLIP foundation models: strategies for accuracy and transferability" (arXiv:2606.12704). The paper's headline finding is that foundation-model quality, atomic-reference-energy consistency, and stable optimization matter more than the choice of fine-tuning method. Concretely, TrainCraft defaults to:
| Setting | Default | Rationale (paper) |
|---|---|---|
| e0s | "foundation" | Reuse the foundation's isolated-atom energies. Averaging from data is 2–3× worse on forces and can destabilize MD. |
| strategy | "multihead" | Multihead replay is the only method tested that consistently preserves out-of-distribution accuracy and the repulsive wall. Use pt_train_file = "mp" (or your own) for replay. |
| weight_decay | 0.0 | Essential when fine-tuning: non-zero decay pulls weights away from the pretrained solution. |
| ema_decay | 0.995 | A higher EMA decay (> 0.99) stabilizes fine-tuning. |
| energy_weight, forces_weight | 10, 10 | Constant, energy-prioritised loss weights; single-stage (no force→energy schedule). |
| lr | 1e-3 | Naive fine-tuning tolerates 1e-3–1e-4; multihead converges best nearer 1e-4. |
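To make the defaults concrete, here is a sketch of merging user overrides onto them and rendering the numeric ones as command-line flags. The `FINETUNE_DEFAULTS` dict and both helpers are hypothetical; the flag spellings follow mace_run_train as far as I am aware, but treat them as assumptions and confirm with `mace_run_train --help`.

```python
# Defaults copied from the table above.
FINETUNE_DEFAULTS = {
    "e0s": "foundation",
    "strategy": "multihead",
    "weight_decay": 0.0,
    "ema_decay": 0.995,
    "energy_weight": 10,
    "forces_weight": 10,
    "lr": 1e-3,
}

# Assumed mapping from config keys to mace_run_train flags.
FLAG_NAMES = {
    "e0s": "--E0s",
    "weight_decay": "--weight_decay",
    "ema_decay": "--ema_decay",
    "energy_weight": "--energy_weight",
    "forces_weight": "--forces_weight",
    "lr": "--lr",
}


def effective_settings(overrides):
    """User values win; anything unspecified falls back to the defaults."""
    return {**FINETUNE_DEFAULTS, **overrides}


def render_flags(settings):
    """Render only the settings that map one-to-one onto a flag."""
    return [f"{flag}={settings[key]}" for key, flag in FLAG_NAMES.items()]


# Example: lower the learning rate for a multihead run, keep the rest.
settings = effective_settings({"lr": 1e-4})
flags = render_flags(settings)
```

The `strategy` key is deliberately left out of `FLAG_NAMES`, since it changes how the whole command is built rather than mapping to a single flag.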
Cost. Multihead replay needs ~3–15× the compute of naive fine-tuning (lower learning rate + dual data batches). Use
strategy = "naive" for a narrow, single-application model where forgetting is not a concern, and strategy = "scratch" (with foundation_model unset) to train from random initialization.
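The strategy rules above lend themselves to a small pre-flight check, sketched here. The `strategy_problems` helper is hypothetical, and treating a missing pt_train_file as a problem for multihead runs is an assumption drawn from the replay recommendation rather than a documented requirement.

```python
VALID_STRATEGIES = ("multihead", "naive", "scratch")


def strategy_problems(training):
    """Return human-readable problems with a [training] section (empty if none)."""
    problems = []
    strategy = training.get("strategy", "multihead")
    if strategy not in VALID_STRATEGIES:
        problems.append(f"unknown strategy {strategy!r}")
        return problems
    # Training from random initialization should not name a foundation model.
    if strategy == "scratch" and training.get("foundation_model"):
        problems.append("strategy 'scratch' expects foundation_model to be unset")
    # Assumption: multihead replay needs replay data to replay.
    if strategy == "multihead" and not training.get("pt_train_file"):
        problems.append("strategy 'multihead' expects pt_train_file for replay")
    return problems


ok = strategy_problems(
    {"strategy": "multihead", "foundation_model": "medium", "pt_train_file": "mp"}
)
bad_scratch = strategy_problems({"strategy": "scratch", "foundation_model": "medium"})
```

Returning a list instead of raising lets a dry run report every issue in one pass.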
LoRA is discussed in the paper as an intermediate regularizer but is not
recommended as the primary method for MACE; if you need it, pass the relevant
flags through [training.extra].
Running it
# locally (needs the mace env: torch + mace-torch)
pixi run -e mace example-21
# or as part of a Slurm pipeline (train becomes a --nv GPU step)
traincraft submit examples/21_train_mace_finetune.toml --dry-run
See also: Use Your Own MACE Model for loading the
fine-tuned .model back in as an exploration calculator, and the
Roadmap for dataset health tooling and validation (parity,
learning curves, IR/Raman reconstruction), which build on this stage.