Roadmap
TrainCraft is developed in phased, independently testable chunks. The dataset and selection layer is the spine: everything else builds on it.
✅ Phase 0: Foundation
Done. A clean, installable, testable package with no globals.
- pydantic v2 config models (discriminated unions, extra=forbid)
- Structure (with content hash), registry, Workspace/Job, provenance
- Geometry: file/scratch sources; nanotube/molecule builders; vacuum/supercell/perturb transforms
- Calculators: emt, tblite/xtb, mace (MP0 + fine-tuned)
- Sampling: md (Langevin NVT), rattle (HiPhive)
- Selection funnel: physicality → dedup → diversity (FPS)
- Dataset: extxyz IO with provenance; hash-dedup
- CLI: run/validate/new/plugins
- 6 annotated examples, 19 tests, CI on GitHub Actions
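The selection funnel named in the list above (physicality → dedup → diversity) can be sketched in a few lines. This is a minimal illustration, not TrainCraft's implementation: the frame layout (a `(symbols, positions, descriptor)` tuple), the function names, the `d_min` threshold, and the choice to seed farthest-point sampling with the first frame are all assumptions made for the example.

```python
import hashlib
import math


def min_distance(positions):
    """Smallest interatomic distance in a frame (no periodic images considered)."""
    return min(
        (math.dist(a, b) for i, a in enumerate(positions) for b in positions[i + 1:]),
        default=math.inf,
    )


def content_hash(symbols, positions, ndigits=6):
    """SHA-256 over species and rounded coordinates, so identical frames collide."""
    rows = []
    for sym, pos in zip(symbols, positions):
        # Adding 0.0 turns -0.0 into 0.0 so both format identically.
        coords = ",".join(f"{round(x, ndigits) + 0.0:.{ndigits}f}" for x in pos)
        rows.append(f"{sym}:{coords}")
    return hashlib.sha256("|".join(rows).encode()).hexdigest()


def farthest_point_sampling(descriptors, k):
    """Greedy FPS: repeatedly pick the point farthest from everything chosen so far."""
    if not descriptors or k <= 0:
        return []
    chosen = [0]  # assumption: seed with the first frame
    dist = [math.dist(d, descriptors[0]) for d in descriptors]
    while len(chosen) < min(k, len(descriptors)):
        nxt = max(range(len(descriptors)), key=dist.__getitem__)
        if dist[nxt] == 0.0:
            break  # every remaining point coincides with a chosen one
        chosen.append(nxt)
        dist = [min(old, math.dist(d, descriptors[nxt])) for old, d in zip(dist, descriptors)]
    return chosen


def funnel(frames, k, d_min=0.5):
    """Apply physicality filter, hash dedup, then FPS; returns the kept frames."""
    physical = [f for f in frames if min_distance(f[1]) >= d_min]
    unique, seen = [], set()
    for frame in physical:
        digest = content_hash(frame[0], frame[1])
        if digest not in seen:
            seen.add(digest)
            unique.append(frame)
    picked = farthest_point_sampling([f[2] for f in unique], k)
    return [unique[i] for i in picked]
```

Ordering the stages from cheapest to most expensive means the pairwise-distance work in FPS only runs on frames that already passed the cheaper checks.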
✅ Phase 1, Chunk 1: Molecules on Surfaces + Monte Carlo
Done. Fragment identity + surface adsorbate builders + Metropolis MC.
- core/fragments.py: per-atom tc_fragment array; infer_fragments for reactive runs
- smiles source: RDKit ETKDG + MMFF
- surface_adsorbate builder: single adsorbate on crystalline slab
- surface_packing builder: N-molecule coverage via Packmol
- monte_carlo sampler: translate/rotate/conformer-swap with Metropolis acceptance
- Examples 07–11 (CO on Cu, ethanol, butane)
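For readers unfamiliar with the Metropolis acceptance rule the sampler relies on, here is a minimal, generic sketch. It is not the monte_carlo sampler's code: `energy_fn`, `propose_fn`, and `run_mc` are hypothetical names, energies are assumed to be in eV, and the temperature is assumed to be strictly positive.

```python
import math
import random

K_B = 8.617333262e-5  # Boltzmann constant in eV/K


def metropolis_accept(delta_e, temperature, rng=random):
    """Always accept downhill moves; accept uphill moves with probability exp(-dE / kT)."""
    if delta_e <= 0.0:
        return True
    return rng.random() < math.exp(-delta_e / (K_B * temperature))


def run_mc(energy_fn, propose_fn, state, n_steps, temperature, seed=0):
    """Generic MC loop: propose a trial state, evaluate it, accept or reject."""
    rng = random.Random(seed)
    energy = energy_fn(state)
    accepted = 0
    for _ in range(n_steps):
        trial = propose_fn(state, rng)
        trial_energy = energy_fn(trial)
        if metropolis_accept(trial_energy - energy, temperature, rng):
            state, energy = trial, trial_energy
            accepted += 1
    return state, energy, accepted / n_steps
```

In a real sampler, `propose_fn` would be the place where a translate, rotate, or conformer-swap move is drawn; the acceptance test itself does not depend on the move type as long as proposals are symmetric.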
✅ Phase 1, Chunk 2: Mechanical Geometry Breadth
Done. All common bulk, surface, and 2D structure types.
- core/converter.py: ASE ↔ pymatgen ↔ RDKit bridge
- url source: download → ASE read
- crystal builder: bulk + supercell + vacancy/substitution/interstitial defects
- slab builder: named facet or arbitrary Miller indices; all-framework
- layered builder: graphene/hBN/MX₂; AA/AB stacking; twist (non-periodic moiré flake)
- Transforms: strain (hydrostatic/Voigt), rotate, set_pbc
- Examples 12–14, 36 additional tests
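As an illustration of what a hydrostatic or Voigt strain transform does to a lattice, the sketch below deforms a 3×3 cell. It rests on stated assumptions rather than on TrainCraft's code: the cell stores lattice vectors as rows (as ASE does), the Voigt order is [xx, yy, zz, yz, xz, xy], and the shear entries are engineering strains (γ = 2ε). The function names are hypothetical.

```python
def voigt_to_tensor(voigt):
    """Voigt [xx, yy, zz, yz, xz, xy] -> symmetric 3x3 strain tensor.

    Assumption: shear entries are engineering strains, so each is halved.
    """
    exx, eyy, ezz, gyz, gxz, gxy = voigt
    return [
        [exx, gxy / 2, gxz / 2],
        [gxy / 2, eyy, gyz / 2],
        [gxz / 2, gyz / 2, ezz],
    ]


def hydrostatic(eps):
    """Equal strain along all three axes, no shear."""
    return voigt_to_tensor([eps, eps, eps, 0.0, 0.0, 0.0])


def apply_strain(cell, strain):
    """Deform a row-vector lattice: cell' = cell @ (I + strain)."""
    deform = [
        [(1.0 if i == j else 0.0) + strain[i][j] for j in range(3)]
        for i in range(3)
    ]
    return [
        [sum(row[k] * deform[k][j] for k in range(3)) for j in range(3)]
        for row in cell
    ]
```

Only the cell is deformed here. Atomic positions would need the same deformation applied (or fractional coordinates kept fixed) for the strain to be homogeneous.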
✅ Phase 1, Chunk 3: Database Providers, Liquids, Intercalation, Constraints
Done. Remaining geometry breadth (except the polymer builder).
- Sources: materials_project (mp-api), optimade and pubchem (both dependency-free)
- liquid builder: Packmol multi-species box (explicit cell or density-driven)
- intercalation builder: guests per gallery of a planar layered host, with staging
- constraints transform: FixAtoms on the final structure (fixes the legacy index-misalignment bug after reordering builders)
- Examples 15–17, unit tests (network/Packmol paths skip when deps are absent)
- Deferred: polymer (PySoftK). The dependency is unreliable; the wrapper is to be verified against the live API rather than guessed
🟡 Phase 2: DFT Labeling
Label selected frames with energy, forces, stress, dipole, and polarizability.
- ✅ calculators/dft.py: FHI-aims (fhi_aims) and Quantum ESPRESSO (qe) factories; FHI-aims polarizability via DFPT (dielectric for periodic systems, polarizability for molecular ones, auto-selected). Run command injected from the environment so the plugins stay container-agnostic. QE polarizability raises NotImplementedError (needs a ph.x run).
- ✅ Labeling stage ([labeling]): labels the selected frames, tags them dft_labeled with level of theory, writes labeled_dft/ (labeled.extxyz, manifest.json, per-frame work dirs). examples/18.
- 🔜 Cost-aware labeling: polarizability flagged as the expensive task
- 🔜 Production runs on any Slurm cluster via the DFT container (or runtime=native)
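The idea of injecting the run command from the environment, so that the same calculator plugin works inside or outside a container, can be sketched as follows. The helper name `resolve_run_command` and the variable name `TRAINCRAFT_AIMS_COMMAND` are hypothetical, chosen only for this example; the actual variable names TrainCraft reads are not stated here.

```python
import os
import shlex


def resolve_run_command(env_var, default=None, environ=os.environ):
    """Read a launch command from the environment and split it into argv form.

    The plugin never hard-codes 'srun', 'mpirun' or a binary path; whoever
    launches the job (container wrapper, Slurm script, local shell) decides.
    """
    raw = environ.get(env_var, default)
    if not raw:
        raise RuntimeError(
            f"Set {env_var} to the command that launches the DFT code."
        )
    return shlex.split(raw)


# Hypothetical usage: the variable name is an assumption for illustration.
argv = resolve_run_command(
    "TRAINCRAFT_AIMS_COMMAND",
    environ={"TRAINCRAFT_AIMS_COMMAND": "srun --mpi=pmix aims.x"},
)
```

Passing `environ` explicitly keeps the helper easy to test without touching the real process environment.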
🟡 Cross-cutting: Packaging & HPC Deployment (any Slurm cluster)
Run the real workflow on any Slurm cluster via Apptainer (or the cluster's
own binaries). Nothing is site-specific in the code. See
DESIGN.md §20 and
the Run on HPC (Slurm + Apptainer) guide.
- ✅ Architecture + four Apptainer *.def files (containers/): traincraft-core (CPU orchestrator), traincraft-mlip (GPU MACE), traincraft-qe (QE, open source), traincraft-dft (FHI-aims; private, licensed). DFT images are compiled from source (self-contained UCX+PMIx+OpenMPI).
- ✅ Resumable per-stage execution (traincraft stage) + a portable Slurm executor that renders dependency-chained sbatch scripts (traincraft submit, [orchestration] config) with two cluster-agnostic knobs: runtime (apptainer images | native host binaries) and mpi (pmix | cray_shasta | pmi2 | none). examples/19 (Leonardo, apptainer+pmix), examples/20 (LUMI, native+cray_shasta).
- 🔜 Build + validate the images on a real cluster (single-node DFT, then multi-node)
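To make "dependency-chained sbatch scripts" concrete, the sketch below renders a small driver script in which each stage is submitted with `--dependency=afterok:` on the previous job's ID, captured via `sbatch --parsable`. Both flags are standard Slurm options. The function name, the stage names, and the script paths are hypothetical, and this is not the output format of `traincraft submit`.

```python
import shlex


def render_chain(stages):
    """Render a bash driver that submits stages as a linear afterok chain.

    stages: list of (name, script_path); names must be valid shell identifiers.
    """
    lines = ["#!/usr/bin/env bash", "set -euo pipefail", ""]
    prev = None
    for name, script in stages:
        var = f"JID_{name.upper()}"
        # Each job starts only if the previous one finished successfully.
        dep = f" --dependency=afterok:${{{prev}}}" if prev else ""
        lines.append(f"{var}=$(sbatch --parsable{dep} {shlex.quote(script)})")
        prev = var
    return "\n".join(lines) + "\n"


# Hypothetical stage scripts, for illustration only.
print(render_chain([("label", "label.sh"), ("train", "train.sh")]))
```

Because the chain lives in Slurm's scheduler rather than in a long-running driver process, a failed stage simply leaves its dependents unscheduled, which fits resumable per-stage execution.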
Phase 3: Training + Validation (in progress)
Train a multi-head MACE model and measure quality end-to-end. Delivered in chunks: training first (validation builds on it), then dataset health, then validation.
✅ Chunk 1 — Training (training/). MACE fine-tune / train-from-scratch
wrapper over mace_run_train, as a pluggable trainer registry backend.
Multi-head property targets (energy/forces/stress + dipole + polarizability) map
onto MACE's model types and losses (AtomicDipolesMACE / EnergyDipolesMACE /
AtomicDielectricMACE). The train stage consumes the dataset and emits a model
tree (model/<name>.model + manifest); on HPC it runs as a GPU (--nv) step in
traincraft-mlip.sif with the command injected from the environment. Fine-tuning
defaults follow Tompa et al. (arXiv:2606.12704):
foundation-consistent E0s, multihead replay against forgetting, weight_decay=0,
high EMA, constant energy-prioritised loss weights. See
Training; examples/21.
🔜 Chunk 2 — Dataset health tooling (datasets/). Composition/space/volume
coverage maps, per-element force distributions with outlier flags, extrapolation
grade, redundancy report.
🔜 Chunk 3 — Validation (validation/). Per-property parity + RMSE/MAE per
element, learning curves, NVE/MD stability, EOS/phonons, and IR/Raman spectra
reconstructed from MLIP-driven MD vs DFT/experiment.
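The per-element RMSE/MAE mentioned for validation can be computed as in this minimal sketch. It assumes forces are given as per-atom 3-vectors alongside a list of chemical symbols; the function name and data layout are illustrative, not the planned validation/ API.

```python
import math
from collections import defaultdict


def per_element_force_errors(symbols, f_ref, f_pred):
    """Return {element: (rmse, mae)} accumulated over all force components."""
    sq = defaultdict(float)
    ab = defaultdict(float)
    count = defaultdict(int)
    for sym, ref, pred in zip(symbols, f_ref, f_pred):
        for r, p in zip(ref, pred):
            diff = p - r
            sq[sym] += diff * diff
            ab[sym] += abs(diff)
            count[sym] += 1
    return {s: (math.sqrt(sq[s] / count[s]), ab[s] / count[s]) for s in count}
```

Splitting the error by element helps expose a species that is poorly represented in the dataset even when the global RMSE looks acceptable.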
🔜 Phase 4: Active-Learning Loop
Close the loop: explore → select → label → retrain → converge.
- selection/uncertainty.py: committee/ensemble uncertainty selector
- active_learning/: full loop with resume/idempotency
- Convergence criteria: val force-RMSE + spectral error thresholds
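One common way to build a committee uncertainty selector is to score each frame by how much the ensemble members disagree on forces. The sketch below uses the largest per-atom standard deviation across models. This is an assumption about a reasonable metric, not the planned design of selection/uncertainty.py, and the function names are hypothetical.

```python
import math


def force_disagreement(committee_forces):
    """Max over atoms of the force standard deviation across committee members.

    committee_forces[m][i] is the 3-vector force on atom i predicted by model m.
    """
    n_models = len(committee_forces)
    n_atoms = len(committee_forces[0])
    worst = 0.0
    for i in range(n_atoms):
        mean = [sum(m[i][c] for m in committee_forces) / n_models for c in range(3)]
        var = sum(
            sum((m[i][c] - mean[c]) ** 2 for c in range(3))
            for m in committee_forces
        ) / n_models
        worst = max(worst, math.sqrt(var))
    return worst


def select_uncertain(frames, threshold):
    """Indices of frames whose committee disagreement exceeds the threshold."""
    return [i for i, cf in enumerate(frames) if force_disagreement(cf) > threshold]
```

Frames returned by `select_uncertain` would be the candidates sent on to labeling in an explore → select → label → retrain loop.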
🔜 Phase 5: Orchestration
Parallel execution of the active-learning loop.
- Local engine hardened: threadpool for independent jobs
- QuACC adapter: explore + label stages as a parallel DAG
- Identical science, swappable engine
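A thread pool for independent jobs, as planned for the local engine, might look like the following sketch built on the standard library's `concurrent.futures`. The `run_independent` helper and the job names are hypothetical; the real engine will also need to handle failures, resume, and dependencies between stages.

```python
from concurrent.futures import ThreadPoolExecutor


def run_independent(jobs, max_workers=4):
    """Run zero-argument callables concurrently; return {name: result}.

    jobs: dict mapping a job name to a callable with no dependencies on the others.
    An exception raised inside a job is re-raised when its result is collected.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn) for name, fn in jobs.items()}
        return {name: fut.result() for name, fut in futures.items()}


# Hypothetical jobs standing in for explore/label tasks.
results = run_independent({"explore_a": lambda: 1, "explore_b": lambda: 2})
```

Threads suit this workload when each job mostly waits on an external process or I/O, such as a DFT run; CPU-bound Python work would call for processes instead.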
🔜 Phase 6: Polish & Extras
- Full public API docs + library-usage tutorials (including Raman use case)
- Additional MLIP backends: MatterSim, Orb, SevenNet, CHGNet
Agent workbench: a purpose-built web UI
A single browser app, served from the VM (WebGL rendering needs no X server), that combines a conversational agent (the workflow pattern is introduced in Tutorial 11) with tabbed views over one workflow. It is a front-end over the existing TOML spine — the same configs the CLI and agent already use, no parallel logic:
- Chat — driven by Pi.dev as the agent backend (not a homegrown loop); Pi.dev does the agentic work — reading the schema, writing, validating and running configs — and the workbench renders the conversation and its results inline.
- Geometry — interactive 3D of the structure the agent just built (weas-widget / py3Dmol), with natural-language edits round-tripping to the agent.
- Workflow — the node-based editor: the pipeline DAG (geometry → sample → select → label → dataset → train) as nodes, edited visually and (de)serialised to/from the TOML the CLI runs.
- Dataset — interactive exploration of the generated dataset with chemiscope (structure–property maps linked to per-frame structures, descriptors, energies and forces).
Likely Streamlit/Gradio + stmol/py3Dmol + the chemiscope widget; the node editor emits the serialised TOML DAG. Details TBD.
Dependency graph
graph TD
P0["✅ Phase 0<br/>Foundation"]
P1A["✅ Phase 1 Ch.1<br/>Surfaces + MC"]
P1B["✅ Phase 1 Ch.2<br/>Geometry breadth"]
P1C["✅ Phase 1 Ch.3<br/>providers, liquid, intercalation"]
P2["🟡 Phase 2<br/>DFT labeling"]
HPC["🟡 Containers + HPC<br/>any Slurm cluster"]
P3["🟡 Phase 3<br/>Training + Validation"]
P4["🔜 Phase 4<br/>Active Learning"]
P5["🔜 Phase 5<br/>Orchestration"]
P6["🔜 Phase 6<br/>Polish"]
P0 --> P1A --> P1B --> P1C
P0 --> P2
P2 --> P3 --> P4 --> P5 --> P6
HPC -.-> P2
HPC -.-> P3