https://github.com/metatensor/metatrain

Training and evaluating machine learning models for atomistic systems.

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    2 of 16 committers (12.5%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.9%) to scientific vocabulary

Keywords

atomistic-simulations machine-learning molecular-dynamics torch

Keywords from Contributors

materials-science material-science molecule particle-mesh-ewald fast-fourier-transform electrostatics archival interactive projection sequences
Last synced: 5 months ago

Repository

Statistics
  • Stars: 42
  • Watchers: 16
  • Forks: 12
  • Open Issues: 76
  • Releases: 11
Created over 2 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Citation Codeowners

README.md

[![tests status](https://img.shields.io/github/checks-status/metatensor/metatrain/main)](https://github.com/metatensor/metatrain/actions?query=branch%3Amain) [![documentation](https://img.shields.io/badge/📚_documentation-latest-sucess)](https://metatensor.github.io/metatrain) [![coverage](https://codecov.io/gh/metatensor/metatrain/branch/main/graph/badge.svg)](https://codecov.io/gh/metatensor/metatrain)

metatrain is a command line interface (CLI) to train and evaluate atomistic models of various architectures. It uses a common YAML options file to configure training and evaluation. Trained models are exported as standalone files that can be used directly in various molecular dynamics (MD) engines (e.g. LAMMPS, i-PI, ASE) through the metatomic interface.

The idea behind metatrain is to have a general hub that provides a homogeneous environment and user interface, transforming every ML architecture into an end-to-end model that can be connected to an MD engine. Any custom architecture compatible with TorchScript can be integrated into metatrain, gaining automatic access to a training and evaluation interface, as well as compatibility with various MD engines.

Note: metatrain does not provide mathematical functionalities per se, but relies on external models that implement the various architectures.

List of Implemented Architectures

Currently metatrain supports the following architectures for building an atomistic model:

| Name | Description |
|------|-------------|
| GAP | Sparse Gaussian Approximation Potential (GAP) using Smooth Overlap of Atomic Positions (SOAP). |
| PET | Point Edge Transformer (PET), interatomic machine learning potential. |
| NanoPET (experimental) | Re-implementation of the original PET with slightly improved training and evaluation speed. |
| PET (deprecated) | Original implementation of the PET model used for prototyping, now deprecated in favor of the native metatrain PET implementation. |
| SOAP BPNN | A Behler-Parrinello neural network with SOAP features. |

Documentation

For details, tutorials, and examples, please visit our documentation at https://metatensor.github.io/metatrain.

Installation

Install metatrain with pip:

```bash
pip install metatrain
```

Install specific models by specifying the model name. For example, to install the SOAP-BPNN model:

```bash
pip install metatrain[soap-bpnn]
```

We also offer a conda installation:

```bash
conda install -c conda-forge metatrain
```

⚠️ The conda installation does not install model-specific dependencies, so it only works for architectures that have no optional dependencies, such as NanoPET or PET.

After installation, you can use mtt from the command line to train your models!
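A quick way to confirm that the installation worked is to check that the package is visible to Python and that the mtt executable is on the PATH. The following is a minimal sketch using only the Python standard library; the helper name check_install is hypothetical and not part of metatrain.

```python
import shutil
from importlib import metadata


def check_install(package="metatrain", command="mtt"):
    """Return the installed package version and the CLI path (None if missing).

    Hypothetical helper for illustration; it only inspects the current
    environment and does not import or run metatrain itself.
    """
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # the package is not installed in this environment
    # shutil.which returns None when the command is not found on the PATH
    return {"version": version, "cli_path": shutil.which(command)}


if __name__ == "__main__":
    status = check_install()
    print(f"metatrain version: {status['version']}")
    print(f"mtt executable:    {status['cli_path']}")
```

If either value is reported as None, the package or its command line entry point is not available in the active environment.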

Quickstart

To train a model, use the following command:

```bash
mtt train options.yaml
```

Here, options.yaml is a configuration file specifying the training options. For example, the following configuration trains a SOAP-BPNN model on the QM9 dataset:

```yaml
# architecture used to train the model
architecture:
  name: soap_bpnn
  training:
    num_epochs: 5 # a very short training run

# Mandatory section defining the parameters for system and target data of the
# training set
training_set:
  systems: "qm9_reduced_100.xyz" # file where the positions are stored
  targets:
    energy:
      key: "U0" # name of the target value
      unit: "eV" # unit of the target value

test_set: 0.1 # 10% of the training_set is randomly split off for testing
validation_set: 0.1 # 10% of the training_set is randomly split off for validation
```
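To make the two split fractions in the configuration above concrete, here is a minimal sketch of what a random 10% test and 10% validation split amounts to. It is an illustration only, not metatrain's implementation: the function name split_indices, the rounding rule, and the seed are assumptions, and the dataset size of 100 is inferred from the file name.

```python
import random


def split_indices(n_structures, test_fraction=0.1, validation_fraction=0.1, seed=0):
    """Randomly partition structure indices into train/validation/test subsets.

    Hypothetical helper: the rounding and shuffling used here are assumptions
    and may differ from what metatrain does internally.
    """
    n_test = round(n_structures * test_fraction)
    n_validation = round(n_structures * validation_fraction)

    indices = list(range(n_structures))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility

    test = indices[:n_test]
    validation = indices[n_test:n_test + n_validation]
    train = indices[n_test + n_validation:]  # everything not held out
    return train, validation, test


# Assuming the dataset holds 100 structures, as the file name suggests
train, validation, test = split_indices(100)
print(len(train), len(validation), len(test))  # 80 10 10
```

Under these assumptions, 80 structures remain for training while 10 each are held out for validation and testing.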

Shell Completion

metatrain comes with completion definitions for its commands for bash and zsh. You must manually configure your shell to enable completion support.

To make the completions available, source the definitions in your shell’s startup file (e.g., ~/.bash_profile, ~/.zshrc, or ~/.profile):

```bash
source $(mtt --shell-completion)
```

Having problems or ideas?

Having a problem with metatrain? Please let us know by submitting an issue.

Submit new features or bug fixes through a pull request.

Contributors

Thanks go to all the people who make metatrain possible.

Citing metatrain

If you found metatrain useful, you can cite its pre-print (https://doi.org/10.48550/arXiv.2508.15704) as

@misc{metatrain,
  title = {Metatensor and Metatomic: Foundational Libraries for Interoperable Atomistic Machine Learning},
  shorttitle = {Metatensor and Metatomic},
  author = {Bigi, Filippo and Abbott, Joseph W. and Loche, Philip and Mazitov, Arslan and Tisi, Davide and Langer, Marcel F. and Goscinski, Alexander and Pegolo, Paolo and Chong, Sanggyu and Goswami, Rohit and Chorna, Sofiia and Kellner, Matthias and Ceriotti, Michele and Fraux, Guillaume},
  year = {2025},
  month = aug,
  publisher = {arXiv},
  doi = {10.48550/arXiv.2508.15704},
}

Owner

  • Name: metatensor
  • Login: metatensor
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you found metatrain useful for your work, you can cite it as below."
title: >-
  Metatensor and metatomic: foundational libraries for interoperable atomistic machine learning
abstract: |
  Incorporation of machine learning (ML) techniques into atomic-scale modeling has proven to be an extremely effective strategy to improve the accuracy and reduce the computational cost of simulations. It also entails conceptual and practical challenges, as it involves combining very different mathematical foundations, as well as software ecosystems that are very well developed in their own merit, but do not share many commonalities. To address these issues and facilitate the adoption of ML in atomistic simulations we introduce two dedicated software libraries. The first one, metatensor, provides cross-platform and cross-language storage and manipulation of arrays with many potentially sparse indices, designed from the ground up for atomistic ML applications. By combining the actual values with metadata that describes their nature and that facilitates the handling of geometric information and gradients with respect to the atomic positions, metatensor provides a common framework to enable data sharing between ML software - typically written in Python - and established atomistic modeling tools - typically written in Fortran, C or C++. The second library, metatomic, provides an interface to store an atomistic ML model and metadata about this model in a portable way, facilitating the implementation, training and distribution of models, and their use across different simulation packages. We showcase a growing ecosystem of tools, from low-level libraries, training utilities, to interfaces with existing software packages that demonstrate the effectiveness of metatensor and metatomic in bridging the gap between traditional simulation software and modern ML frameworks.
type: preprint
database: arXiv.org
repository: arXiv
url: http://arxiv.org/abs/2508.15704
keywords: 
  - Physics - Chemical Physics
authors: 
  - family-names: Bigi
    given-names: Filippo
  - family-names: Abbott
    given-names: Joseph W.
  - family-names: Loche
    given-names: Philip
  - family-names: Mazitov
    given-names: Arslan
  - family-names: Tisi
    given-names: Davide
  - family-names: Langer
    given-names: Marcel F.
  - family-names: Goscinski
    given-names: Alexander
  - family-names: Pegolo
    given-names: Paolo
  - family-names: Chong
    given-names: Sanggyu
  - family-names: Goswami
    given-names: Rohit
  - family-names: Chorna
    given-names: Sofiia
  - family-names: Kellner
    given-names: Matthias
  - family-names: Ceriotti
    given-names: Michele
  - family-names: Fraux
    given-names: Guillaume
date-published: 2025-08-21
identifiers: 
  - type: doi
    value: 10.48550/arXiv.2508.15704

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 384
  • Total Committers: 16
  • Avg Commits per committer: 24.0
  • Development Distribution Score (DDS): 0.448
Past Year
  • Commits: 244
  • Committers: 14
  • Avg Commits per committer: 17.429
  • Development Distribution Score (DDS): 0.43
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| Filippo Bigi | `9****r` | 212 |
| Philip Loche | `p****e@p****e` | 103 |
| Arslan Mazitov | `9****v` | 19 |
| Guillaume Fraux | `g****x@e****h` | 16 |
| Davide Tisi | `d****3@g****m` | 7 |
| Sergey Pozdnyakov | `s****v@g****m` | 6 |
| Sanggyu "Raymond" Chong | `8****g` | 4 |
| Tulga-Erdene Sodjargal | `1****n` | 4 |
| Joseph W. Abbott | `j****t@g****m` | 3 |
| Matthias Kellner | `6****e` | 2 |
| Paolo Pegolo | `p****o@e****h` | 2 |
| dependabot[bot] | `4****]` | 2 |
| Alexander Goscinski | `a****i@p****e` | 1 |
| JJ | `jj@t****m` | 1 |
| Qianjun Xu | `9****X` | 1 |
| Markus Fasching | `m****l@g****m` | 1 |
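The all-time summary figures can be reproduced from the commit counts listed above. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; this page does not state that definition, so treat it as an assumption that happens to match the reported value.

```python
def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 - (commits by the top committer / total commits)."""
    return 1 - top_committer_commits / total_commits


# All-time figures taken from the statistics on this page
total_commits = 384
total_committers = 16
top_committer_commits = 212  # Filippo Bigi

average = total_commits / total_committers
dds = development_distribution_score(top_committer_commits, total_commits)

print(f"{average:.1f}")  # 24.0
print(f"{dds:.3f}")      # 0.448
```

A score near 0 would mean a single committer wrote almost everything, while values closer to 1 indicate that the work is spread across many committers.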
Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 114
  • Total pull requests: 549
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 9 days
  • Total issue authors: 20
  • Total pull request authors: 22
  • Average comments per issue: 0.91
  • Average comments per pull request: 1.01
  • Merged pull requests: 400
  • Bot issues: 1
  • Bot pull requests: 1
Past Year
  • Issues: 102
  • Pull requests: 474
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 7 days
  • Issue authors: 20
  • Pull request authors: 20
  • Average comments per issue: 0.92
  • Average comments per pull request: 0.9
  • Merged pull requests: 335
  • Bot issues: 1
  • Bot pull requests: 1
Top Authors
Issue Authors
  • frostedoyster (49)
  • bananenpampe (9)
  • jwa7 (8)
  • Luthaf (8)
  • PicoCentauri (7)
  • tulga-rdn (7)
  • HannaTuerk (4)
  • DavideTisi (3)
  • ppegolo (3)
  • abmazitov (3)
  • GardevoirX (2)
  • MarkusFas (2)
  • SanggyuChong (2)
  • johannes-spies (1)
  • pfebrer (1)
Pull Request Authors
  • frostedoyster (261)
  • PicoCentauri (103)
  • Luthaf (46)
  • abmazitov (32)
  • jwa7 (31)
  • ppegolo (20)
  • DavideTisi (12)
  • bananenpampe (9)
  • tulga-rdn (8)
  • SanggyuChong (7)
  • spozdn (3)
  • HannaTuerk (2)
  • MarkusFas (2)
  • GardevoirX (2)
  • RMeli (2)
Top Labels
Issue Labels
  • Priority: Medium (28)
  • Infrastructure: Miscellaneous (22)
  • Bug (21)
  • Discussion (15)
  • Priority: Low (11)
  • PET (10)
  • Infrastructure: Data (10)
  • Infrastructure: CLI (9)
  • Priority: High (9)
  • Enhancement (7)
  • Documentation (6)
  • SOAP BPNN (6)
  • Infrastructure: Logging (6)
  • NanoPET (4)
  • Good first issue (2)
  • Alchemical Model (1)
  • dependencies (1)
Pull Request Labels
  • dependencies (1)
  • PET (1)
  • Infrastructure: Logging (1)
  • SOAP BPNN (1)
  • Priority: High (1)
  • Infrastructure: Miscellaneous (1)
  • Enhancement (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 3,151 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 12
  • Total maintainers: 3
pypi.org: metatrain

Training and evaluating machine learning models for atomistic systems.

  • Versions: 12
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 3,151 Last month
Rankings
Dependent packages count: 10.8%
Average: 35.9%
Dependent repos count: 61.0%
Maintainers (3)
Last synced: 6 months ago

Dependencies

.github/workflows/tests.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
pyproject.toml pypi
  • metatensor-core *
  • metatensor-torch *
  • torch *
.github/workflows/build.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/docs.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/lint.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
.github/workflows/pr-docs-preview.yml actions
  • readthedocs/actions/preview v1 composite
.github/workflows/soap-bpnn-tests.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
docs/requirements.txt pypi
  • furo *
  • sphinx >=7
  • sphinx-toggleprompt *
  • tomli *
.github/workflows/alchemical-model-tests.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v4 composite
  • codecov/codecov-action v3 composite
.github/workflows/pet-tests.yml actions
  • actions/checkout v4 composite
  • actions/setup-python v5 composite
  • codecov/codecov-action v4 composite