https://github.com/metatensor/metatrain
Training and evaluating machine learning models for atomistic systems.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ○ Academic publication links
- ✓ Committers with academic emails: 2 of 16 committers (12.5%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.9%) to scientific vocabulary
Keywords
Keywords from Contributors
Repository
Training and evaluating machine learning models for atomistic systems.
Basic Info
- Host: GitHub
- Owner: metatensor
- License: bsd-3-clause
- Language: Python
- Default Branch: main
- Homepage: https://metatensor.github.io/metatrain/
- Size: 95.3 MB
Statistics
- Stars: 42
- Watchers: 16
- Forks: 12
- Open Issues: 76
- Releases: 11
Topics
Metadata Files
README.md
[CI](https://github.com/metatensor/metatrain/actions?query=branch%3Amain) | [Documentation](https://metatensor.github.io/metatrain) | [Coverage](https://codecov.io/gh/metatensor/metatrain)
metatrain is a command line interface (CLI) to train and evaluate atomistic
models of various architectures. It features a common YAML-based input format to configure
training and evaluation. Trained models are exported as standalone files that can be
used directly in various molecular dynamics (MD) engines (e.g. LAMMPS, i-PI, ASE
...) using the metatomic interface.
The idea behind metatrain is to have a general hub that provides a homogeneous
environment and user interface, transforming every ML architecture into an end-to-end
model that can be connected to an MD engine. Any custom architecture compatible with
TorchScript can be integrated into
metatrain, gaining automatic access to a training and evaluation interface, as well as
compatibility with various MD engines.
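As a rough sketch of what TorchScript compatibility means in practice (plain PyTorch only; this is not metatrain's architecture API, and the toy model below is made up), an architecture's forward pass must survive `torch.jit.script`:

```python
import torch


class ToyPairModel(torch.nn.Module):
    """Hypothetical toy model: a small MLP acting on per-atom features."""

    def __init__(self, n_features: int = 8):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(n_features, 16),
            torch.nn.SiLU(),
            torch.nn.Linear(16, 1),
        )

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        # Sum per-atom contributions into a single scalar ("energy-like") output.
        return self.mlp(features).sum()


# TorchScript check: scripting must succeed without Python-only constructs.
scripted = torch.jit.script(ToyPairModel())
print(scripted(torch.randn(10, 8)))
```

The actual interfaces an architecture has to implement to plug into metatrain are described in the documentation.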
Note:
metatrain does not provide mathematical functionality itself; it relies on external models that implement the various architectures.
List of Implemented Architectures
Currently metatrain supports the following architectures for building an atomistic
model:
| Name | Description |
|--------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
| GAP | Sparse Gaussian Approximation Potential (GAP) using Smooth Overlap of Atomic Positions (SOAP). |
| PET | Point Edge Transformer (PET), an interatomic machine learning potential. |
| NanoPET (experimental) | Re-implementation of the original PET with slightly improved training and evaluation speed. |
| PET (deprecated) | Original implementation of the PET model used for prototyping, now deprecated in favor of the native metatrain PET implementation. |
| SOAP BPNN | A Behler-Parrinello neural network with SOAP features |
Documentation
For details, tutorials, and examples, please visit our documentation.
Installation
Install metatrain with pip:
```bash
pip install metatrain
```
Install specific models by specifying the model name. For example, to install the SOAP-BPNN model:
```bash
pip install metatrain[soap-bpnn]
```
We also offer a conda installation:
```bash
conda install -c conda-forge metatrain
```
⚠️ The conda installation does not install model-specific dependencies and will only work for architectures without optional dependencies such as NanoPET or PET.
After installation, you can use mtt from the command line to train your models!
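As a quick sanity check (a generic assumption that the CLI follows the usual help conventions, not something documented here), the help flag should list the available subcommands:

```bash
# Verify the installation and list available subcommands
mtt --help
```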
Quickstart
To train a model, use the following command:
```bash
mtt train options.yaml
```
Where options.yaml is a configuration file specifying training options. For example, the following configuration trains a SOAP-BPNN model on the QM9 dataset:
```yaml
# architecture used to train the model
architecture:
  name: soap_bpnn
  training:
    num_epochs: 5 # a very short training run

# Mandatory section defining the parameters for system and target data of the
# training set
training_set:
  systems: "qm9_reduced_100.xyz" # file where the positions are stored
  targets:
    energy:
      key: "U0" # name of the target value
      unit: "eV" # unit of the target value

test_set: 0.1 # 10% of the training_set is randomly split off for testing
validation_set: 0.1 # 10% of the training_set is randomly split off for validation
```
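For context, the systems file above is an XYZ-style structure file; assuming it follows the extended XYZ convention that ASE reads and writes, a minimal sketch of producing such a file is shown below. Everything in it is illustrative (filename, structures, random values) except the "U0" key, which matches the targets section above:

```python
import numpy as np
from ase import Atoms
from ase.io import write

# Two toy water-like structures with a scalar target stored under the "U0" key,
# matching targets.energy.key in the options file above.
frames = []
for _ in range(2):
    atoms = Atoms(
        "H2O",
        positions=np.random.rand(3, 3) * 3.0,
        cell=[10.0, 10.0, 10.0],
        pbc=True,
    )
    atoms.info["U0"] = float(np.random.rand())  # target value in eV
    frames.append(atoms)

# Extended XYZ stores per-structure scalars from atoms.info in the comment line.
write("my_dataset.xyz", frames, format="extxyz")
```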
Shell Completion
metatrain comes with completion definitions for its commands for bash and zsh. You
must manually configure your shell to enable completion support.
To make the completions available, source the definitions in your shell’s startup file
(e.g., ~/.bash_profile, ~/.zshrc, or ~/.profile):
```bash
source $(mtt --shell-completion)
```
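To load the completions in every new shell, the same line can be appended to one of the startup files mentioned above, for example ~/.zshrc:

```bash
# Append the completion hook to the zsh startup file (example)
echo 'source $(mtt --shell-completion)' >> ~/.zshrc
```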
Having problems or ideas?
Having a problem with metatrain? Please let us know by submitting an issue.
Submit new features or bug fixes through a pull request.
Contributors
Thanks go to all the people who make metatrain possible:
Citing metatrain
If you found metatrain useful, you can cite its pre-print
(https://doi.org/10.48550/arXiv.2508.15704) as
@misc{metatrain,
title = {Metatensor and Metatomic: Foundational Libraries for Interoperable Atomistic
Machine Learning},
shorttitle = {Metatensor and Metatomic},
author = {Bigi, Filippo and Abbott, Joseph W. and Loche, Philip and Mazitov, Arslan
and Tisi, Davide and Langer, Marcel F. and Goscinski, Alexander and Pegolo, Paolo
and Chong, Sanggyu and Goswami, Rohit and Chorna, Sofiia and Kellner, Matthias and
Ceriotti, Michele and Fraux, Guillaume},
year = {2025},
month = aug,
publisher = {arXiv},
doi = {10.48550/arXiv.2508.15704},
}
Owner
- Name: metatensor
- Login: metatensor
- Kind: organization
- Repositories: 1
- Profile: https://github.com/metatensor
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you found metatrain useful for your work, you can cite it as below."
title: >-
Metatensor and metatomic: foundational libraries for interoperable atomistic machine learning
abstract: |
Incorporation of machine learning (ML) techniques into atomic-scale modeling has proven to be an extremely effective strategy to improve the accuracy and reduce the computational cost of simulations. It also entails conceptual and practical challenges, as it involves combining very different mathematical foundations, as well as software ecosystems that are very well developed in their own merit, but do not share many commonalities. To address these issues and facilitate the adoption of ML in atomistic simulations we introduce two dedicated software libraries. The first one, metatensor, provides cross-platform and cross-language storage and manipulation of arrays with many potentially sparse indices, designed from the ground up for atomistic ML applications. By combining the actual values with metadata that describes their nature and that facilitates the handling of geometric information and gradients with respect to the atomic positions, metatensor provides a common framework to enable data sharing between ML software - typically written in Python - and established atomistic modeling tools - typically written in Fortran, C or C++. The second library, metatomic, provides an interface to store an atomistic ML model and metadata about this model in a portable way, facilitating the implementation, training and distribution of models, and their use across different simulation packages. We showcase a growing ecosystem of tools, from low-level libraries, training utilities, to interfaces with existing software packages that demonstrate the effectiveness of metatensor and metatomic in bridging the gap between traditional simulation software and modern ML frameworks.
type: preprint
database: arXiv.org
repository: arXiv
url: http://arxiv.org/abs/2508.15704
keywords:
- Physics - Chemical Physics
authors:
- family-names: Bigi
given-names: Filippo
- family-names: Abbott
given-names: Joseph W.
- family-names: Loche
given-names: Philip
- family-names: Mazitov
given-names: Arslan
- family-names: Tisi
given-names: Davide
- family-names: Langer
given-names: Marcel F.
- family-names: Goscinski
given-names: Alexander
- family-names: Pegolo
given-names: Paolo
- family-names: Chong
given-names: Sanggyu
- family-names: Goswami
given-names: Rohit
- family-names: Chorna
given-names: Sofiia
- family-names: Kellner
given-names: Matthias
- family-names: Ceriotti
given-names: Michele
- family-names: Fraux
given-names: Guillaume
date-published: 2025-08-21
identifiers:
- type: doi
value: 10.48550/arXiv.2508.15704
Committers
Last synced: 9 months ago
Top Committers
| Name | Email (obfuscated) | Commits |
|---|---|---|
| Filippo Bigi | 9****r | 212 |
| Philip Loche | p****e@p****e | 103 |
| Arslan Mazitov | 9****v | 19 |
| Guillaume Fraux | g****x@e****h | 16 |
| Davide Tisi | d****3@g****m | 7 |
| Sergey Pozdnyakov | s****v@g****m | 6 |
| Sanggyu "Raymond" Chong | 8****g | 4 |
| Tulga-Erdene Sodjargal | 1****n | 4 |
| Joseph W. Abbott | j****t@g****m | 3 |
| Matthias Kellner | 6****e | 2 |
| Paolo Pegolo | p****o@e****h | 2 |
| dependabot[bot] | 4****] | 2 |
| Alexander Goscinski | a****i@p****e | 1 |
| JJ | jj@t****m | 1 |
| Qianjun Xu | 9****X | 1 |
| Markus Fasching | m****l@g****m | 1 |
Committer Domains (Top 20 + Academic)
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 114
- Total pull requests: 549
- Average time to close issues: about 2 months
- Average time to close pull requests: 9 days
- Total issue authors: 20
- Total pull request authors: 22
- Average comments per issue: 0.91
- Average comments per pull request: 1.01
- Merged pull requests: 400
- Bot issues: 1
- Bot pull requests: 1
Past Year
- Issues: 102
- Pull requests: 474
- Average time to close issues: about 1 month
- Average time to close pull requests: 7 days
- Issue authors: 20
- Pull request authors: 20
- Average comments per issue: 0.92
- Average comments per pull request: 0.9
- Merged pull requests: 335
- Bot issues: 1
- Bot pull requests: 1
Top Authors
Issue Authors
- frostedoyster (49)
- bananenpampe (9)
- jwa7 (8)
- Luthaf (8)
- PicoCentauri (7)
- tulga-rdn (7)
- HannaTuerk (4)
- DavideTisi (3)
- ppegolo (3)
- abmazitov (3)
- GardevoirX (2)
- MarkusFas (2)
- SanggyuChong (2)
- johannes-spies (1)
- pfebrer (1)
Pull Request Authors
- frostedoyster (261)
- PicoCentauri (103)
- Luthaf (46)
- abmazitov (32)
- jwa7 (31)
- ppegolo (20)
- DavideTisi (12)
- bananenpampe (9)
- tulga-rdn (8)
- SanggyuChong (7)
- spozdn (3)
- HannaTuerk (2)
- MarkusFas (2)
- GardevoirX (2)
- RMeli (2)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 3,151 last month (pypi)
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 12
- Total maintainers: 3
pypi.org: metatrain
Training and evaluating machine learning models for atomistic systems.
- Documentation: https://metatrain.readthedocs.io/
- License: bsd-3-clause
- Latest release: 2025.9.1 (published 6 months ago)
Rankings
Maintainers (3)
Dependencies
- actions/checkout v3 composite
- actions/setup-python v4 composite
- metatensor-core *
- metatensor-torch *
- torch *
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- readthedocs/actions/preview v1 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- codecov/codecov-action v3 composite
- furo *
- sphinx >=7
- sphinx-toggleprompt *
- tomli *
- actions/checkout v3 composite
- actions/setup-python v4 composite
- codecov/codecov-action v3 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- codecov/codecov-action v4 composite