torchsim-hepba
High-entropy Prussian blue analogues atomistic modeling with ML potentials.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (16.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: MKPBattery
- License: mit
- Language: Python
- Default Branch: main
- Size: 2.25 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
TorchSim HEPBA Project
This project provides an automated, extensible platform for the atomistic modeling and analysis of high-entropy Prussian blue analogues (HEPBA). By integrating state-of-the-art machine learning interatomic potentials (such as MACE) with efficient structure generation, optimization, and property analysis workflows, the project enables:
- Flexible construction of multi-metal HEPBA supercells, including structural water
- High-throughput structure optimization and property prediction using ML potentials
- Batch comparison and visualization of key structural and energetic properties across different metal compositions
- Output of standard structure files (.cif), analysis results (.json), and publication-ready plots
The codebase is designed for scientific rigor, reproducibility, and extensibility, making it a powerful tool for modern materials discovery and high-throughput screening.
TorchSim is a next-generation open-source atomistic simulation engine for the MLIP era. By rewriting the core primitives of atomistic simulation in PyTorch, it allows orders-of-magnitude acceleration of popular machine learning potentials.
- Automatic batching and GPU memory management allowing significant simulation speedup
- Support for MACE, Fairchem, SevenNet, ORB, MatterSim, graph-pes, and metatensor MLIP models
- Support for classical Lennard-Jones, Morse, and soft-sphere potentials
- Molecular dynamics integration schemes like NVE, NVT Langevin, and NPT Langevin
- Relaxation of atomic positions and cell with gradient descent and FIRE
- Swap Monte Carlo and hybrid swap Monte Carlo algorithms
- An extensible binary trajectory writing format with support for arbitrary properties
- A simple and intuitive high-level API for new users
- Integration with ASE, Pymatgen, and Phonopy
- and more: differentiable simulation, elastic properties, custom workflows...
Quick Start
Here is a quick demonstration of many of the core features of TorchSim: native support for GPUs, MLIP models, ASE integration, simple API, autobatching, and trajectory reporting, all in under 40 lines of code.
Running batched MD
```py
import torch
import torch_sim as ts

# run natively on gpus
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# easily load the model from mace-mp
from mace.calculators.foundations_models import mace_mp
from torch_sim.models import MaceModel

mace = mace_mp(model="small", return_raw_model=True)
mace_model = MaceModel(model=mace, device=device)

# build 50 copies of a 2x2x2 Cu supercell, one trajectory file per copy
from ase.build import bulk

cu_atoms = bulk("Cu", "fcc", a=3.58, cubic=True).repeat((2, 2, 2))
many_cu_atoms = [cu_atoms] * 50
trajectory_files = [f"Cu_traj_{i}.h5md" for i in range(len(many_cu_atoms))]

# run them all simultaneously with batching
final_state = ts.integrate(
    system=many_cu_atoms,
    model=mace_model,
    n_steps=50,
    timestep=0.002,
    temperature=1000,
    integrator=ts.nvt_langevin,
    trajectory_reporter=dict(filenames=trajectory_files, state_frequency=10),
)
final_atoms_list = final_state.to_atoms()

# extract the final energy from the trajectory file
final_energies = []
for filename in trajectory_files:
    with ts.TorchSimTrajectory(filename) as traj:
        final_energies.append(traj.get_array("potential_energy")[-1])

print(final_energies)
```
Running batched relaxation
To then relax those structures with FIRE is just a few more lines.
```py
# relax all of the high temperature states
relaxed_state = ts.optimize(
    system=final_state,
    model=mace_model,
    optimizer=ts.frechet_cell_fire,
    autobatcher=True,
)

print(relaxed_state.energy)
```
Speedup
TorchSim achieves up to 100x speedup compared to ASE with popular MLIPs.
This figure compares the time per atom of ASE and torch_sim. Time per atom is defined as the total time divided by the number of atoms. While ASE can only run a single system of n_atoms (on the $x$ axis) at a time, torch_sim can run as many systems as will fit in memory. On an H100 80 GB card, the maximum number of atoms that fit in memory was ~8,000 for GemNet, ~10,000 for MACE, and ~2,500 for SevenNet. This metric describes model performance by capturing speed and memory usage simultaneously.
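As a small illustration of the metric, the sketch below assumes that time per atom is read as wall time divided by the total number of atoms processed across all batched systems. The function name and the timings are hypothetical placeholders, not benchmark results.

```python
def time_per_atom(total_time_s: float, n_atoms_per_system: int, n_systems: int = 1) -> float:
    """Wall time divided by the total number of atoms simulated in that time."""
    total_atoms = n_atoms_per_system * n_systems
    if total_atoms <= 0:
        raise ValueError("need at least one atom")
    return total_time_s / total_atoms


# Hypothetical timings, for illustration only (not measured values).
single = time_per_atom(total_time_s=2.0, n_atoms_per_system=100)
batched = time_per_atom(total_time_s=4.0, n_atoms_per_system=100, n_systems=50)
print(f"relative speedup: {single / batched:.1f}x")
```

Because the denominator counts atoms across every system in the batch, a batched run can take longer in wall time and still score better on this metric.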
Installation
PyPI Installation
```sh
pip install torch-sim-atomistic
```
Installing from source
```sh
git clone https://github.com/radical-ai/torch-sim
cd torch-sim
pip install .
```
Examples
To understand how TorchSim works, start with the comprehensive tutorials in the documentation.
Core Modules
TorchSim's package structure is summarized in the API reference documentation and drawn as a treemap below.
License
TorchSim is released under an MIT license.
Citation
A manuscript is in preparation. Meanwhile, if you use TorchSim in your research, please cite the Zenodo archive.
Physical-Chemical Principles and Physical Reliability in MACE Machine Learning Potentials
This project uses MACE (Multi Atomic Cluster Expansion, a higher-order equivariant message-passing architecture) and related machine learning interatomic potentials for large-scale atomistic simulations. Although MACE is data-driven, its model design and training process build in a range of physical and chemical principles, so the model is a physically constrained approximation rather than a "black-box fit". Key aspects include:
Symmetry Constraints
- Translational, rotational, and atomic permutation symmetries are automatically satisfied, ensuring the invariance of energy and forces under these operations.
- Local environment features are built to be rotationally equivariant, and the energy predicted from them is invariant.
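The invariances listed above can be checked numerically on any energy function that depends only on interatomic distances. The sketch below uses a toy Lennard-Jones energy as a stand-in; it is not MACE, and the coordinates are arbitrary placeholders.

```python
import math
from itertools import combinations


def pair_energy(positions):
    """Toy Lennard-Jones energy (epsilon = sigma = 1); depends only on distances."""
    energy = 0.0
    for a, b in combinations(positions, 2):
        r = math.dist(a, b)
        energy += 4.0 * (r ** -12 - r ** -6)
    return energy


def rotate_z(point, angle):
    """Rotate a 3D point about the z axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


atoms = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.2, 0.3), (0.9, 1.0, 1.1)]
e0 = pair_energy(atoms)

translated = [(x + 5.0, y - 2.0, z + 0.5) for x, y, z in atoms]
rotated = [rotate_z(p, 0.7) for p in atoms]
permuted = [atoms[2], atoms[0], atoms[3], atoms[1]]

for name, variant in [("translation", translated), ("rotation", rotated), ("permutation", permuted)]:
    unchanged = math.isclose(pair_energy(variant), e0, rel_tol=1e-9)
    print(f"{name}: energy unchanged = {unchanged}")
```

The same three checks can be run against a real potential by swapping `pair_energy` for a call into the model.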
Additivity of Energy
- The total energy is decomposed into a sum of local atomic energies, reflecting the physical principle of locality in atomic interactions.
- Only atoms within a finite cutoff radius contribute to each local energy, so long-range interactions beyond that range are not described explicitly.
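A minimal sketch of this decomposition follows, assuming a toy pair term in place of the learned site energy. In MACE the per-atom energies come from a message-passing network, so the cutoff value and the functional form here are purely illustrative.

```python
import math


def local_energies(positions, cutoff=2.0):
    """Per-atom energies from a toy pair term; each pair energy is split equally between its two atoms."""
    energies = [0.0] * len(positions)
    for i, ri in enumerate(positions):
        for j, rj in enumerate(positions):
            if i == j:
                continue
            r = math.dist(ri, rj)
            if r < cutoff:  # atoms beyond the cutoff do not contribute
                energies[i] += 0.5 * 4.0 * (r ** -12 - r ** -6)
    return energies


# Three nearby atoms plus one atom far outside the cutoff of all others.
atoms = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.2, 0.0), (10.0, 10.0, 10.0)]
e_local = local_energies(atoms)
print("total energy:", sum(e_local))
print("energy of the isolated atom:", e_local[3])
```

The total energy is simply the sum of the per-atom terms, and the distant atom contributes nothing because no neighbor lies inside its cutoff sphere.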
Physical Consistency of Forces
- Forces are obtained as the negative gradient of energy with respect to atomic positions; both energy and forces are fitted during training to ensure physical consistency.
- Stress tensors can also be predicted, maintaining the correct energy-force-stress relationships.
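The relation F_i = -dE/dr_i can be illustrated with a central finite difference on a toy energy function. This is a sketch only: ML potentials obtain forces by automatic differentiation rather than finite differences, and the Lennard-Jones energy and step size `h` below are assumptions chosen for the demonstration.

```python
import math
from itertools import combinations


def energy(positions):
    """Toy Lennard-Jones energy (epsilon = sigma = 1)."""
    total = 0.0
    for a, b in combinations(positions, 2):
        r = math.dist(a, b)
        total += 4.0 * (r ** -12 - r ** -6)
    return total


def numerical_forces(positions, h=1e-5):
    """Central-difference estimate of F_i = -dE/dr_i for every atom and Cartesian component."""
    forces = []
    for i in range(len(positions)):
        f_i = []
        for k in range(3):
            plus = [list(p) for p in positions]
            minus = [list(p) for p in positions]
            plus[i][k] += h
            minus[i][k] -= h
            f_i.append(-(energy(plus) - energy(minus)) / (2.0 * h))
        forces.append(f_i)
    return forces


atoms = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.2, 0.3)]
forces = numerical_forces(atoms)
net_force = [sum(f[k] for f in forces) for k in range(3)]
print("forces:", forces)
print("net force (close to zero):", net_force)
```

Because the energy is translation invariant, the forces derived from it sum to approximately zero, which is the gradient-based origin of the equal-and-opposite-forces property.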
Element Distinction and Chemical Information
- Atomic species are encoded in the input, allowing the model to distinguish different chemical elements.
- Multi-body interactions (two-body, three-body, four-body, etc.) are included to capture complex chemical bonding.
Physically Representative Training Sets
- Training data covers a wide range of structures, temperatures, defects, and compositions to ensure model generalization.
- Unphysical high-energy structures are excluded to prevent the model from learning non-physical trends.
Conservation Laws and Physical Constraints
- Because forces are derived as gradients of a single translation-invariant energy, the force field is conservative and Newton's third law (forces summing to zero) holds by construction.
- Additional physical quantities (e.g., magnetic moments, polarization) can be supported in extended models.
Physical Interpretability
- Local energy distributions can be analyzed to understand structural stability.
- Backpropagation enables tracing the physical origin of energy and force contributions.
Generalization and Physical Reasonableness
- The model is reliable only within the physical-chemical space covered by the training set; predictions that extrapolate beyond it should be treated as unreliable until checked against reference data.
- Physical priors (e.g., hard-sphere repulsion, bond length ranges) can be incorporated to prevent non-physical predictions.
In summary, MACE and similar ML potentials build physical and chemical principles into the model structure, the inputs and outputs, the training objectives, and the data selection, which helps keep simulation results physically meaningful within the training domain.
Challenges and Limitations of Atomistic Simulations
While atomistic simulations (including DFT and ML-based potentials) are powerful tools for understanding and predicting material properties, it is important to recognize their inherent computational challenges and scientific limitations:
1. Enormous Computational Cost
- Number of Atoms: Real materials contain on the order of $10^{23}$ atoms, but simulations are typically limited to hundreds (DFT) or up to millions (ML potentials) of atoms due to computational constraints.
- Degrees of Freedom: Each atom has three spatial coordinates; a system with 1,000 atoms has 3,000 degrees of freedom, leading to a highly complex energy landscape.
- Complex Interactions: Accurate modeling requires considering not only pairwise but also many-body interactions, and for DFT, all electronic degrees of freedom, causing computational cost to scale steeply with system size.
- Short Time Steps: Atomic vibrations occur on the femtosecond (10⁻¹⁵ s) scale, so molecular dynamics simulations require millions to billions of time steps to reach nanosecond or microsecond timescales.
- Sampling Limitations: High-entropy or disordered materials have an astronomical number of possible atomic configurations; a single simulation samples only a tiny fraction of this space.
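The scale of these numbers is easy to reproduce with a few lines of arithmetic. The sketch below assumes a 1 fs timestep and an equimolar five-metal composition on 20 sites; both are illustrative choices, and the count ignores symmetry-equivalent arrangements.

```python
import math


def md_steps(duration_s: float, timestep_s: float = 1e-15) -> int:
    """Number of integration steps needed to cover a simulated duration."""
    return round(duration_s / timestep_s)


def equimolar_arrangements(n_sites: int, n_species: int) -> int:
    """Multinomial count of ways to place equal numbers of each species on n_sites."""
    if n_sites % n_species:
        raise ValueError("n_sites must be divisible by n_species")
    per_species = n_sites // n_species
    return math.factorial(n_sites) // math.factorial(per_species) ** n_species


print("steps for 1 ns:", md_steps(1e-9))
print("steps for 1 microsecond:", md_steps(1e-6))
print("arrangements of 5 metals on 20 sites:", equimolar_arrangements(20, 5))
```

Even a 20-site cell yields hundreds of billions of labelled arrangements, which is why a single relaxed structure should be read as one sample rather than a representative average.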
2. Model Approximations and Uncertainties
- Potential Energy Surface Approximations: DFT relies on exchange-correlation functionals, and ML potentials are trained on finite datasets—neither is a perfect representation of physical reality.
- Transferability: ML models may not reliably predict properties for structures or chemistries not represented in the training set.
3. System Size and Boundary Effects
- Finite Size Effects: Simulated systems are much smaller than real materials, and artificial boundary conditions (e.g., periodic boundaries) can introduce artifacts.
4. Dynamics and Statistical Averaging
- Limited Sampling: Many material properties depend on long-time, ensemble-averaged behavior, while simulations typically provide only short trajectories or snapshots.
- Real-World Complexity: Experimental materials contain impurities, defects, and stresses that are difficult to fully capture in simulations.
5. Scale Gap Between Simulation and Experiment
- Macroscopic vs. Microscopic: Experiments measure macroscopic averages or distributions, while simulations provide microscopic snapshots or limited statistical samples.
- Multiscale Phenomena: Many important processes (e.g., phase transitions, aging, interfacial reactions) span multiple length and time scales, beyond the reach of single-scale atomistic simulations.
6. Practical Examples
- DFT: Simulating a 100-atom system may require hours to days on a supercomputer.
- ML Potentials: Can handle thousands to tens of thousands of atoms on a single GPU, but still face memory and sampling bottlenecks for larger or longer simulations.
- High-Entropy Materials: The number of possible atomic arrangements is astronomical; a single simulation represents only a minuscule subset.
7. Scientific Attitude
- Value: Atomistic simulations are invaluable for revealing microscopic mechanisms, generating structural hypotheses, predicting trends, and guiding experiments.
- Limitations: Results should not be interpreted as absolute truth, but as one piece of evidence to be validated and complemented by experimental data.
In summary, atomistic simulations provide deep insights into the microscopic world, but their results are subject to computational, statistical, and model-based uncertainties. They are best used as a tool for hypothesis generation and trend prediction, in close conjunction with experimental validation.
Project Achievements and Implemented Features
This project has accomplished the following core functionalities and results:
1. Automated Generation and Optimization of High-Entropy Prussian Blue Analogues (HEPBA)
- Flexible specification of multiple metal elements at the M site, enabling high-entropy design.
- Generation of realistic 3D supercell structures, including structural water.
- Automatic output of structure files (.cif) for downstream visualization and analysis.
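One way to picture the M-site specification is as a random, near-equimolar assignment of metal labels to sites, together with the ideal configurational entropy of that mixture. The sketch below is hypothetical: `assign_m_site_metals` and `ideal_mixing_entropy` are illustrative names, not functions from this project, and the real generator may distribute metals differently.

```python
import math
import random


def assign_m_site_metals(n_sites, metals, seed=0):
    """Return a shuffled, as-equimolar-as-possible list of metal labels, one per M site."""
    base, extra = divmod(n_sites, len(metals))
    labels = []
    for idx, metal in enumerate(metals):
        labels.extend([metal] * (base + (1 if idx < extra else 0)))
    random.Random(seed).shuffle(labels)  # seeded for reproducibility
    return labels


def ideal_mixing_entropy(labels):
    """Ideal configurational entropy per site, in units of the gas constant R."""
    n = len(labels)
    fractions = [labels.count(m) / n for m in set(labels)]
    return -sum(x * math.log(x) for x in fractions)


sites = assign_m_site_metals(20, ["Fe", "Co", "Ni", "Mn", "Cu"])
print(sites)
print(f"S_mix = {ideal_mixing_entropy(sites):.3f} R")
```

For five equimolar metals the ideal mixing entropy is ln 5 in units of R, the value that motivates the "high-entropy" label.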
2. Integration of MACE and Other Machine Learning Potentials
- Successful loading and application of MACE and related ML interatomic potentials for efficient prediction of energy, forces, and stress in large systems.
- GPU acceleration supported, significantly improving simulation efficiency.
3. Structure Optimization and Property Analysis
- Support for optimization algorithms such as FIRE, enabling automatic relaxation to stable structures.
- Automatic output of energy and force trajectory plots during optimization.
- Automated analysis and reporting of key properties: energy, density, volume, M-N/C-N bond lengths, etc.
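Of the reported properties, density follows directly from the cell contents and volume. The sketch below shows the unit conversion, assuming masses in atomic mass units and volume in cubic angstroms. The function name is hypothetical, and the example reuses the Cu lattice constant from the Quick Start rather than any HEPBA result.

```python
AMU_TO_GRAM = 1.66053906660e-24   # grams per atomic mass unit
ANGSTROM3_TO_CM3 = 1.0e-24        # cubic centimetres per cubic angstrom


def density_g_per_cm3(total_mass_amu: float, cell_volume_a3: float) -> float:
    """Mass density of a simulation cell in g/cm^3."""
    if cell_volume_a3 <= 0:
        raise ValueError("cell volume must be positive")
    return (total_mass_amu * AMU_TO_GRAM) / (cell_volume_a3 * ANGSTROM3_TO_CM3)


# Conventional fcc Cu cell: 4 atoms, a = 3.58 angstrom.
cu_mass = 4 * 63.546
cu_volume = 3.58 ** 3
print(f"Cu density: {density_g_per_cm3(cu_mass, cu_volume):.2f} g/cm^3")
```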
4. Batch Comparison and Analysis of Multi-Metal Systems
- Batch generation, optimization, and comparison of single-metal HEPBA structures (Fe, Co, Ni, Mn, Cu, etc.).
- Automatic generation of comparison tables and plots (e.g., energy, density, bond lengths), facilitating structure-property relationship analysis.
5. Structure Visualization and Data Output
- Automatic saving of both initial and optimized structures (.cif), directly viewable in tools like VESTA.
- Automatic saving of analysis results (.json), comparison plots (.png), and other data for further processing and publication.
6. Scientific Rigor and Extensibility
- README provides detailed explanations of the physical-chemical constraints in MACE and the scientific limitations of atomistic simulations, demonstrating the project's scientific rigor.
- Codebase is well-structured and extensible, supporting further development for more metal combinations, different potentials, and new simulation tasks.
In summary, this project enables automated structure generation, ML potential-driven optimization and analysis, batch comparison, and visualization for high-entropy Prussian blue analogues, with strong scientific grounding and extensibility—making it a powerful tool for modern materials simulation and high-throughput screening.
Output Files and Their Descriptions
This project generates a variety of output files during structure generation, optimization, and analysis. Below is a detailed description of each output type:
1. Structure Files (.cif)
- initial_{metal}_hepba.cif: The initial (unrelaxed) structure of the HEPBA supercell for each metal (e.g., initial_Fe_hepba.cif).
- optimized_{metal}_hepba.cif: The optimized (relaxed) structure after energy minimization for each metal (e.g., optimized_Cu_hepba.cif).
- Usage: These files can be visualized and analyzed using crystallographic software such as VESTA, OVITO, or ASE.
2. Analysis Results (.json)
- hepba_analysis_results_{metal}.json: Contains detailed property data for each metal's HEPBA structure, including:
  - Initial and optimized energy
  - Volume and density
  - Average bond lengths (M-N, C-N, etc.)
  - Other structural and energetic properties
- Usage: Useful for quantitative comparison, further data analysis, or integration into reports and publications.
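For quantitative comparison, the per-metal JSON files can be collected into a single dictionary. The sketch below assumes only the file-name pattern given above. The key `optimized_energy` and the numbers in the demo are hypothetical placeholders, since the actual schema is not documented here.

```python
import json
import tempfile
from pathlib import Path


def load_results(directory, metals):
    """Load hepba_analysis_results_{metal}.json for each metal that has a file."""
    results = {}
    for metal in metals:
        path = Path(directory) / f"hepba_analysis_results_{metal}.json"
        if path.exists():
            results[metal] = json.loads(path.read_text())
    return results


# Demo with placeholder data; the key name and values are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    for metal, placeholder_energy in [("Fe", -1.0), ("Cu", -2.0)]:
        payload = {"optimized_energy": placeholder_energy}
        (Path(tmp) / f"hepba_analysis_results_{metal}.json").write_text(json.dumps(payload))
    results = load_results(tmp, ["Fe", "Co", "Cu"])

lowest = min(results, key=lambda m: results[m]["optimized_energy"])
print("metals with results:", sorted(results))
print("lowest placeholder energy:", lowest)
```

Metals without a results file are skipped rather than raising, which keeps a partial batch usable.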
3. Optimization Trajectory Plots (.png)
- optimization_{metal}_trajectory.png: Plots showing the evolution of energy and maximum force during the structure optimization process for each metal.
- Usage: Helps assess the convergence and stability of the optimization process.
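The convergence check that such a plot supports can also be expressed numerically. The sketch below assumes a maximum-force criterion with a tolerance of 0.05 eV/Å, a common default in ASE optimizers. The project's own threshold is not stated, and the force history is placeholder data.

```python
import math


def max_force(forces):
    """Largest per-atom force norm in a list of (fx, fy, fz) tuples."""
    return max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)


def first_converged_step(force_history, fmax_tol=0.05):
    """Index of the first step whose maximum force is below fmax_tol, or None."""
    for step, forces in enumerate(force_history):
        if max_force(forces) < fmax_tol:
            return step
    return None


# Placeholder force history for two atoms over three optimization steps.
history = [
    [(0.5, 0.0, 0.0), (0.0, 0.2, 0.0)],
    [(0.1, 0.0, 0.0), (0.0, 0.04, 0.0)],
    [(0.03, 0.0, 0.0), (0.0, 0.01, 0.02)],
]
print("converged at step:", first_converged_step(history))
```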
4. Comparison Plots (metal_comparison.png)
- metal_comparison.png: A summary figure comparing key properties (energy, density, bond lengths, etc.) across all studied metals in a single view.
- Usage: Facilitates visual comparison and highlights trends or differences between different HEPBA compositions.
5. Other Data Files
- Trajectory files (.h5, .h5md, .hdf5, .traj): If molecular dynamics or batch simulations are performed, these files store atomic trajectories and can be analyzed with ASE or TorchSim tools.
In summary, the output files provide comprehensive structural, energetic, and comparative information for high-entropy Prussian blue analogues, supporting both in-depth analysis and publication-quality visualization.
Owner
- Name: Kepei Miao
- Login: MKPBattery
- Kind: user
- Company: University of Miami
- Repositories: 1
- Profile: https://github.com/MKPBattery
Battery health, AI/ PhD candidate in University of Miami
Citation (citation.cff)
cff-version: 1.2.0
title: Torch-Sim
message: If you use this software, please cite it as below.
authors:
  - family-names: Gangan
    given-names: Abhijeet S.
  - family-names: Cohen
    given-names: Orion Archer
  - family-names: Riebesell
    given-names: Janosh
  - family-names: Goodall
    given-names: Rhys
  - family-names: Kolluru
    given-names: Adeesh
  - family-names: Falletta
    given-names: Stefano
license: MIT
license-url: https://github.com/Radical-AI/torch-sim/blob/main/LICENSE
repository-code: https://github.com/Radical-AI/torch-sim
type: software
url: https://github.com/Radical-AI/torch-sim
doi: 10.5281/zenodo.7486816
version: 0.1.0
date-released: 2025-04-02
GitHub Events
Total
- Push event: 1
- Create event: 1
Last Year
- Push event: 1
- Create event: 1
Dependencies
- actions/checkout v4 composite
- actions/deploy-pages v4 composite
- actions/setup-python v5 composite
- actions/upload-pages-artifact v3 composite
- astral-sh/setup-uv v2 composite
- actions/checkout v4 composite
- gaurav-nelson/github-action-markdown-link-check v1 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- astral-sh/setup-uv v2 composite
- codecov/codecov-action v5 composite
- ase >=3.22.0
- h5py >=3.6.0
- matplotlib >=3.4.0
- numpy >=1.21.0
- pandas >=1.3.0
- pymatgen >=2022.0.0
- scikit-learn >=0.24.0
- scipy >=1.7.0
- seaborn >=0.11.0
- torch >=1.10.0
- torchsim >=0.1.0
- vasp >=6.3.0
- ase *
- ase >=3.22.0
- h5py >=3.8.0
- mace-torch >=0.3.12
- matplotlib *
- matplotlib >=3.7.0
- numpy >=1.24.0
- numpy *
- pymatgen >=2023.5.10
- torch >=2.0.0
- torch-sim >=0.1.0
- torch-sim-atomistic *