matgen_baselines
Science Score: 75.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 9 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org, springer.com, acs.org
- ○ Academic email domains
- ✓ Institutional organization owner: Organization bartel-group has institutional domain (bartel.cems.umn.edu)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (7.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: Bartel-Group
- License: mit
- Language: Python
- Default Branch: main
- Size: 55.1 MB
Statistics
- Stars: 8
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Baselines for generative modeling of crystals
This package facilitates the generation of inorganic crystalline materials through random enumeration and ion exchange. It supports generating materials optimized for thermodynamic stability, a targeted electronic band gap, or a desired bulk modulus. These methods establish baselines for evaluating the performance of generative AI models in materials science.
A preprint describing the methods in this package can be found on arXiv (arXiv:2501.02144).
Table of Contents
- Stability and novelty leaderboard
- Installation
- Usage
- Configuration
- Running Generation
- Calculating Decomposition Energies
- Assessing MP Novelty
- How to Cite
Stability and novelty leaderboard
Below are performance metrics for select models trained on the MP-20 dataset. Note this list is not exhaustive, but rather a sampling of different model types (diffusion-based, LLMs, and variational autoencoders).
We also provide results from two template-based methods (described in the preprint) to serve as a baseline.
Generative models
| Method | Median $\Delta E_{\mathrm{d}}$ (meV/atom) | Stability rate | Novelty rate | Novel prototype rate | Novel prototype stability rate |
|--------|:---------------------------:|:--------------:|:------------:|:--------------------:|:------------------------------:|
| MatterGen | 188 | 3.0% | 91.8% | 7.2% | 0% |
| FTCP | 205 | 2.0% | 38.2% | 1.8% | 0% |
| CDVAE | 207 | 1.8% | 96.0% | **8.2%** | 0% |
| CrystaLLM | 442 | 2.4% | 98.2% | 1.0% | 0% |
Template-based methods
| Method | Median $\Delta E_{\mathrm{d}}$ (meV/atom) | Stability rate | Novelty rate | Novel prototype rate | Novel prototype stability rate |
|--------|:---------------------------:|:--------------:|:------------:|:--------------------:|:------------------------------:|
| Random | 409 | 1.4% | **98.6%** | 0% | 0% |
| Ion exchange | **85** | **9.2%** | 72.4% | 0% | 0% |
These results are based on a random sampling of 500 novel structures (not already present in the Materials Project) generated by each method.
- Median $\Delta E_{\mathrm{d}}$: The median decomposition energy, which measures the thermodynamic stability of a material relative to competing phases. Lower values are considered better, signifying materials closer to the hull.
- Stability rate: The percentage of generated materials that are on the hull ($\Delta E_{\mathrm{d}} \leq 0$) defined by stable entries from the Materials Project.
- Novelty rate: The percentage of generated materials that were not already present in the Materials Project. Only these novel materials were considered when computing the stability rate and median decomposition energy.
- Novel prototype rate: Defined as the percentage of proposed materials whose structures cannot be indexed to a known prototype in the AFLOW Encyclopedia of Crystallographic Prototypes.
- Novel prototype stability rate: The stability rate of materials in these novel structure prototypes.
The bold text in each column denotes the best value achieved among all methods. A lower value is better for the median $\Delta E_{\mathrm{d}}$; for all other metrics, a higher value is better.
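The metric definitions above can be made concrete with a short Python sketch that computes them from per-structure results. The `Candidate` record and the `leaderboard_metrics` function are hypothetical and are not part of this package. The denominators are assumptions based on the descriptions above:

- The novelty rate is computed over all generated structures.
- The median and the other rates are computed over novel structures only.
- The novel prototype stability rate is computed over novel-prototype structures.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Candidate:
    """Hypothetical per-structure result; not an object from this package."""
    decomp_energy: float   # decomposition energy in eV/atom (<= 0 means stable)
    is_novel: bool         # True if the structure is absent from the Materials Project
    novel_prototype: bool  # True if no AFLOW prototype matches the structure


def pct(count: int, total: int) -> float:
    """Percentage helper that returns 0.0 for an empty denominator."""
    return 100.0 * count / total if total else 0.0


def leaderboard_metrics(cands: list[Candidate]) -> dict[str, float]:
    # Assumption: only novel structures enter the stability and prototype metrics.
    novel = [c for c in cands if c.is_novel]
    stable = [c for c in novel if c.decomp_energy <= 0]
    proto = [c for c in novel if c.novel_prototype]
    proto_stable = [c for c in proto if c.decomp_energy <= 0]
    return {
        # Convert eV/atom to meV/atom to match the leaderboard units.
        "median_decomp_meV": 1000 * median(c.decomp_energy for c in novel) if novel else float("nan"),
        "stability_rate": pct(len(stable), len(novel)),
        "novelty_rate": pct(len(novel), len(cands)),
        "novel_prototype_rate": pct(len(proto), len(novel)),
        "novel_prototype_stability_rate": pct(len(proto_stable), len(proto)),
    }


results = [
    Candidate(-0.01, True, False),
    Candidate(0.20, True, True),
    Candidate(0.10, True, False),
    Candidate(0.05, False, False),  # already in MP, so excluded from most metrics
]
print(leaderboard_metrics(results))
```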
Installation
The package can be installed as follows:
```bash
git clone https://github.com/Bartel-Group/matgen_baselines.git
cd matgen_baselines
python -m pip install .
```
For ML predictions of band gap and bulk modulus, the user should also install CGCNN. Since this package is not available on PyPI, the following procedure can be used:
```bash
# Run from the matgen_baselines directory entered above
git clone https://github.com/txie-93/cgcnn.git
mv cgcnn base_cgcnn  # install directory
mv cgcnn_setup.py base_cgcnn/setup.py
cd base_cgcnn
python -m pip install -e .
cd ../
```
Usage
The package supports two primary generation methods with optional ML-based filtering:
Generation Methods
- Random Enumeration (`random_enum`)
- Ion Exchange (`ion_exchange`)
  - Generates materials by performing ion substitutions on known materials
  - Can directly target specific properties during generation
  - Can optionally use ML filtering for additional verification
  - Best for targeted exploration around known stable compounds
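The toy sketch below illustrates the idea behind ion exchange. It swaps elements in a parent composition using a fixed list of substitution pairs.

This is a simplification in two ways:

- The function name and the substitution pairs are hypothetical.
- The README lists the data-mined substitution model of Hautier et al. among the works this package relies on, and real ion exchange acts on crystal structures rather than bare compositions.

The sketch therefore shows the concept only, not the package's implementation.

```python
# Hypothetical isovalent substitution pairs (old -> new), chosen only for illustration.
SUBSTITUTIONS = [("Na", "K"), ("Cl", "Br"), ("Mg", "Ca")]


def ion_exchange_candidates(composition: dict[str, float],
                            pairs: list[tuple[str, str]]) -> list[dict[str, float]]:
    """Return new compositions obtained by swapping one element for another.

    Toy, composition-level version of ion exchange: the parent's stoichiometry is kept,
    and a swap is skipped when the incoming element is already present.
    """
    candidates = []
    for old, new in pairs:
        if old in composition and new not in composition:
            swapped = {(new if el == old else el): amt for el, amt in composition.items()}
            candidates.append(swapped)
    return candidates


print(ion_exchange_candidates({"Na": 1, "Cl": 1}, SUBSTITUTIONS))
# [{'K': 1, 'Cl': 1}, {'Na': 1, 'Br': 1}]
```

Substituting isovalent ions (same charge) keeps the parent compound charge-balanced, which is why the example pairs swap alkali for alkali, halide for halide, and alkaline earth for alkaline earth.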
ML Filtering
ML filtering can be applied to either generation method to:
- Predict stability using CHGNet
- Predict band gaps using CGCNN
- Predict bulk modulus using CGCNN
Configuration
Tasks are configured in a config.json file. Here are all the supported combinations with example configurations:
1. Basic Random Enumeration
Generate structures without any property targeting:
```json
{
  "method": "random_enum",
  "num_strucs": 500,
  "filepath": "Randomly-Enumerated"
}
```
2. Random Enumeration with ML Filtering
Generate structures and filter for stability or specific properties using ML:
```json
{
  "method": "random_enum",
  "num_strucs": 500,
  "ml_filter": {
    "type": "band_gap",
    "target": 3,
    "threshold": 0.5
  },
  "filepath": "Random-Enum-ML-Filtered/Property"
}
```
The `"type"` field may be `"stability"`, `"band_gap"`, or `"bulk_modulus"`:
- For `"stability"`, `"threshold"` is the maximum allowed energy above hull (in eV/atom), and no `"target"` is needed.
- For `"band_gap"` and `"bulk_modulus"`, `"target"` is the desired band gap (eV) or bulk modulus (GPa), and `"threshold"` is the allowed deviation from the target.
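The acceptance rule implied by these fields can be written as a short predicate. This sketch assumes that a candidate is kept when its prediction falls within the stated bound, with the boundary included. `passes_ml_filter` is a hypothetical name, not a function exported by the package.

```python
def passes_ml_filter(ml_filter: dict, prediction: float) -> bool:
    """Hypothetical acceptance rule mirroring the ml_filter fields described above.

    - "stability": prediction is the energy above hull (eV/atom); keep if <= threshold.
    - "band_gap" / "bulk_modulus": keep if |prediction - target| <= threshold.
    """
    kind = ml_filter["type"]
    if kind == "stability":
        return prediction <= ml_filter["threshold"]
    if kind in ("band_gap", "bulk_modulus"):
        return abs(prediction - ml_filter["target"]) <= ml_filter["threshold"]
    raise ValueError(f"Unknown ml_filter type: {kind!r}")


band_gap_filter = {"type": "band_gap", "target": 3, "threshold": 0.5}
print(passes_ml_filter(band_gap_filter, 3.4))  # True
print(passes_ml_filter(band_gap_filter, 2.4))  # False
```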
3. Ion Exchange with Direct Property Targeting
Generate structures using ion exchange, targeting stability or properties during generation:
```json
{
  "method": "ion_exchange",
  "num_strucs": 500,
  "filter_type": "stability",
  "threshold": 0.0,
  "filepath": "Ion-Exchanged/Property"
}
```
4. Ion Exchange with Additional ML Verification
Generate structures using ion exchange, then verify with ML:
```json
{
  "method": "ion_exchange",
  "num_strucs": 500,
  "filter_type": "stability",
  "threshold": 0.0,
  "ml_filter": {
    "type": "stability",
    "threshold": 0.0
  },
  "filepath": "Ion-Exchange-ML-Filtered/Stable"
}
```
Full Example Configuration
Here's a complete example showing all possible combinations:
```json
{
  "mp_api_key": "YOUR_MP_API_KEY_HERE",
  "tasks": [
    {
      "method": "random_enum",
      "num_strucs": 500,
      "filepath": "Randomly-Enumerated"
    },
    {
      "method": "random_enum",
      "num_strucs": 500,
      "ml_filter": {
        "type": "stability",
        "threshold": 0.0
      },
      "filepath": "Random-Enum-ML-Filtered/Stable"
    },
    {
      "method": "random_enum",
      "num_strucs": 500,
      "ml_filter": {
        "type": "band_gap",
        "target": 3,
        "threshold": 0.5
      },
      "filepath": "Random-Enum-ML-Filtered/Bandgap"
    },
    {
      "method": "random_enum",
      "num_strucs": 500,
      "ml_filter": {
        "type": "bulk_modulus",
        "target": 400,
        "threshold": 200
      },
      "filepath": "Random-Enum-ML-Filtered/Bulk-Modulus"
    },
    {
      "method": "ion_exchange",
      "num_strucs": 500,
      "filter_type": "stability",
      "threshold": 0.0,
      "filepath": "Ion-Exchanged/Stable"
    },
    {
      "method": "ion_exchange",
      "num_strucs": 500,
      "filter_type": "stability",
      "threshold": 0.0,
      "ml_filter": {
        "type": "stability",
        "threshold": 0.0
      },
      "filepath": "Ion-Exchange-ML-Filtered/Stable"
    },
    {
      "method": "ion_exchange",
      "num_strucs": 500,
      "filter_type": "band_gap",
      "target": 3,
      "threshold": 0.5,
      "filepath": "Ion-Exchanged/Bandgap"
    },
    {
      "method": "ion_exchange",
      "num_strucs": 500,
      "filter_type": "bulk_modulus",
      "target": 400,
      "threshold": 200,
      "filepath": "Ion-Exchanged/Bulk-Modulus"
    }
  ]
}
```
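Before a long run, it can help to sanity-check `config.json` against the combinations listed above. The validator below is a hypothetical helper, not part of the package. It is inferred from the examples in this section, and its rules are assumptions:

- Every task needs `method`, `num_strucs`, and `filepath`.
- `ion_exchange` tasks carry `filter_type` and `threshold`.
- Property filters also need a `target`.

```python
import json

METHODS = {"random_enum", "ion_exchange"}
FILTER_TYPES = {"stability", "band_gap", "bulk_modulus"}
PROPERTY_TYPES = {"band_gap", "bulk_modulus"}


def check_filter(label: str, kind, spec: dict, errors: list[str]) -> None:
    """Append problems with one filter (ion-exchange filter_type or ml_filter) to errors."""
    if kind not in FILTER_TYPES:
        errors.append(f"{label}: unknown filter type {kind!r}")
    if "threshold" not in spec:
        errors.append(f"{label}: missing 'threshold'")
    if kind in PROPERTY_TYPES and "target" not in spec:
        errors.append(f"{label}: missing 'target'")


def validate_config(config: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the config looks valid."""
    errors = []
    if "mp_api_key" not in config:
        errors.append("missing 'mp_api_key'")
    for i, task in enumerate(config.get("tasks", [])):
        label = f"task {i}"
        for key in ("method", "num_strucs", "filepath"):
            if key not in task:
                errors.append(f"{label}: missing {key!r}")
        method = task.get("method")
        if method is not None and method not in METHODS:
            errors.append(f"{label}: unknown method {method!r}")
        # Assumption: ion exchange tasks always carry filter_type/threshold (and target for properties).
        if method == "ion_exchange":
            check_filter(label, task.get("filter_type"), task, errors)
        if "ml_filter" in task:
            ml = task["ml_filter"]
            check_filter(f"{label} ml_filter", ml.get("type"), ml, errors)
    return errors


example = json.loads("""
{
  "mp_api_key": "YOUR_MP_API_KEY_HERE",
  "tasks": [
    {"method": "random_enum", "num_strucs": 500, "filepath": "Randomly-Enumerated"},
    {"method": "ion_exchange", "num_strucs": 500, "filter_type": "band_gap",
     "target": 3, "threshold": 0.5, "filepath": "Ion-Exchanged/Bandgap"}
  ]
}
""")
print(validate_config(example))  # []
```

For a real file, `json.load(open("config.json"))` can be passed to `validate_config` in the same way.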
Running Generation
Once your config.json is set up, start generation with:
```bash
python generate.py
```
Generated structures will be saved as CIF files in the specified output directories.
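A quick way to check how many structures each task produced is to count the CIF files under each `filepath` from `config.json`. The sketch makes three assumptions:

- Output paths are relative to the working directory.
- CIFs may sit in nested subfolders, so the search is recursive.
- A task whose directory does not exist yet counts as zero.

`count_cifs` is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def count_cifs(config: dict, root: str = ".") -> dict[str, int]:
    """Count *.cif files under each task's 'filepath' (searched recursively).

    Assumes output paths are relative to `root`; a missing directory counts as 0.
    """
    counts = {}
    for task in config.get("tasks", []):
        out_dir = Path(root) / task["filepath"]
        counts[task["filepath"]] = (
            sum(1 for _ in out_dir.rglob("*.cif")) if out_dir.is_dir() else 0
        )
    return counts


if __name__ == "__main__" and Path("config.json").exists():
    with open("config.json") as f:
        for path, n in count_cifs(json.load(f)).items():
            print(f"{path}: {n} CIF files")
```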
Calculating Decomposition Energies
The package includes a standalone script calc_decomp.py for calculating decomposition energies of materials using the Materials Project database as a reference. This script processes a CSV file containing material compositions and their computed energies, calculating the energy above hull or decomposition energy for each entry.
⚠️ IMPORTANT: Input energies must be:
- In units of eV/atom
- From GGA/GGA+U calculations (or MLPs trained on GGA/GGA+U calculations) with Materials Project corrections already applied. Energies obtained from CHGNet are directly compatible.
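To illustrate what a decomposition energy is, the sketch below works through a simplified binary A-B system. It has three limitations:

- It assumes formation energies (eV/atom, relative to the pure elements) rather than the total energies the script accepts.
- It uses made-up numbers.
- It is not the code in `calc_decomp.py`.

The hull energy at composition $x$ is the lowest value of the line segments between competing phases that bracket $x$. The decomposition energy is $\Delta E_{\mathrm{d}} = E_f - E_{\mathrm{hull}}(x)$. The candidate itself is excluded from the hull, so stable phases come out negative.

```python
from itertools import product


def hull_energy(x: float, phases: list[tuple[float, float]]) -> float:
    """Lower convex hull of (fraction of B, formation energy in eV/atom) points, evaluated at x.

    In 1D, the hull at x is the minimum over all segments between two phases that
    bracket x (including degenerate segments where a phase sits exactly at x).
    """
    best = float("inf")
    for (xi, ei), (xj, ej) in product(phases, repeat=2):
        if xi <= x <= xj:
            e = ei if xj == xi else ei + (ej - ei) * (x - xi) / (xj - xi)
            best = min(best, e)
    return best


def decomp_energy(x: float, e_form: float, competitors: list[tuple[float, float]]) -> float:
    """Decomposition energy (eV/atom) of a candidate at fraction x of element B.

    `competitors` must exclude the candidate; the pure elements (x = 0 and x = 1,
    E = 0) are always added as references. Positive -> unstable, <= 0 -> stable.
    """
    phases = [(0.0, 0.0), (1.0, 0.0)] + list(competitors)
    return e_form - hull_energy(x, phases)


# Hypothetical binary with one known compound AB at x = 0.5 (E_f = -0.4 eV/atom)
known = [(0.5, -0.4)]
print(decomp_energy(0.25, -0.1, known))  # positive: above the hull, unstable
print(decomp_energy(0.25, -0.3, known))  # negative: below the competing hull, stable
```

For real multicomponent systems, pymatgen's `PhaseDiagram` class handles the hull construction in any number of dimensions.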
Usage
1. Add your Materials Project API key to the script:
   - Open `calc_decomp.py`
   - In the `calculate_decomp_energies()` function, set the API key associated with your Materials Project account
2. Prepare your input CSV file with the following format:
   ```csv
   composition,energy_per_atom
   Fe3Al,-7.4878
   AlFe2,-7.0365
   ```
3. Run the script:
   ```bash
   python calc_decomp.py input_file.csv
   ```

The script will:
- Process each composition
- Calculate decomposition energies using Materials Project data
- Add results to a new `decomp_energy` column
- Save the updated data back to your input file
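If your energies are already collected in Python (for example, CHGNet predictions stored in a dictionary), the standard-library `csv` module can write them in the two-column input format. `write_decomp_input` is a hypothetical helper, not part of the package.

```python
import csv


def write_decomp_input(path: str, energies: dict[str, float]) -> None:
    """Write compositions and energies (eV/atom) in the composition,energy_per_atom format."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["composition", "energy_per_atom"])
        for composition, energy in energies.items():
            writer.writerow([composition, energy])


# Example: write_decomp_input("input_file.csv", {"Fe3Al": -7.4878, "AlFe2": -7.0365})
```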
Output Format
The script will update your input file with a new column:
```csv
composition,energy_per_atom,decomp_energy
Fe3Al,-7.4878,0.0
AlFe2,-7.0365,0.1
```
- Positive decomposition energies indicate the material is unstable
- Zero or negative values indicate the material is stable
- Values are in eV/atom
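The sign convention above translates directly into a post-processing step that splits the updated file into stable and unstable compositions. This is a minimal sketch that assumes the column names shown in the example output. `split_by_stability` is a hypothetical helper, not part of the package.

```python
import csv


def split_by_stability(path: str) -> tuple[list[str], list[str]]:
    """Split compositions in the updated CSV into (stable, unstable) lists.

    Uses the sign convention above: decomp_energy > 0 is unstable, <= 0 is stable.
    """
    stable, unstable = [], []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            bucket = unstable if float(row["decomp_energy"]) > 0 else stable
            bucket.append(row["composition"])
    return stable, unstable


# Example: stable, unstable = split_by_stability("input_file.csv")
```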
Assessing MP Novelty
The package includes a script assess_mp_novelty.py for checking the novelty of crystal structures by comparing them against the Materials Project database. A structure is considered novel if either:
- No materials with the same composition exist in MP, or
- No materials with matching structure exist in MP for the same composition
While this does not necessarily mean the material has never been synthesized, it confirms the material is absent from the MP-20 dataset (a subset of the Materials Project), which is commonly used to train generative models for inorganic crystals.
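The two novelty criteria collapse into a single check. A candidate is novel when none of the MP structures sharing its composition matches it, and an empty set of same-composition entries counts as novel automatically.

The sketch below takes the structure comparison as a parameter:

- The toy data and the `is_novel` name are hypothetical.
- With real pymatgen structures, `StructureMatcher().fit(s1, s2)` is one plausible choice for `matches`, but how `assess_mp_novelty.py` compares structures is an assumption here.

```python
from operator import eq
from typing import Callable, Iterable, TypeVar

S = TypeVar("S")


def is_novel(structure: S, mp_same_composition: Iterable[S],
             matches: Callable[[S, S], bool]) -> bool:
    """Apply the two novelty criteria above.

    `mp_same_composition` holds MP structures with the candidate's composition. If it is
    empty, the composition is absent from MP (criterion 1); otherwise the candidate is
    novel only if none of those structures match it (criterion 2).
    """
    return not any(matches(structure, ref) for ref in mp_same_composition)


# Toy stand-in for MP: composition -> structure-type labels (hypothetical data).
mp_lookup = {"NaCl": ["rocksalt"]}
print(is_novel("rocksalt", mp_lookup.get("NaCl", []), eq))    # False: same composition and structure
print(is_novel("zincblende", mp_lookup.get("NaCl", []), eq))  # True: no structural match
print(is_novel("rocksalt", mp_lookup.get("KBr", []), eq))     # True: composition absent from MP
```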
Usage
1. Set your Materials Project API key:
   - Open `assess_mp_novelty.py`
   - Replace `'YOUR_API_KEY'` with your actual Materials Project API key in the `NoveltyAssessment` class initialization
2. Place your CIF files in a directory (default: `Structures/`)
3. Run the script:
   ```bash
   python assess_mp_novelty.py
   ```

The script will:
- Process all CIF files in the structures directory
- Check each structure against the Materials Project database
- Generate a detailed report showing novelty status for each structure
- Save results to `novelty_results.json`
How to Cite
If you use this code, please consider citing the below paper (available on arXiv):
```bibtex
@article{szymanski_2025_matgen_baselines,
  title={Establishing baselines for generative discovery of inorganic crystals},
  DOI={10.48550/arXiv.2501.02144},
  journal={arXiv},
  author={Szymanski, Nathan J. and Bartel, Christopher J.},
  year={2025}
}
```
You may also consider citing the below papers that this package relies on:
```bibtex
@article{ong2013pymatgen,
  title={Python Materials Genomics (pymatgen): A robust, open-source python library for materials analysis},
  author={Ong, Shyue Ping and Richards, William Davidson and Jain, Anubhav and Hautier, Geoffroy and Kocher, Michael and Cholia, Shreyas and Gunter, Dan and Chevrier, Vincent L. and Persson, Kristin A. and Ceder, Gerbrand},
  journal={Computational Materials Science},
  volume={68},
  pages={314--319},
  year={2013},
  DOI={10.1016/j.commatsci.2012.10.028}
}

@article{jain2013materials,
  title={Commentary: The Materials Project: A materials genome approach to accelerating materials innovation},
  author={Jain, Anubhav and Ong, Shyue Ping and Hautier, Geoffroy and Chen, Wei and Richards, William Davidson and Dacek, Stephen and Cholia, Shreyas and Gunter, Dan and Skinner, David and Ceder, Gerbrand and Persson, Kristin A.},
  journal={APL Materials},
  volume={1},
  pages={011002},
  year={2013},
  DOI={10.1063/1.4812323}
}

@article{eckert2024aflow_library,
  title={The AFLOW library of crystallographic prototypes: Part 4},
  author={Eckert, Hagen and Divilov, Simon and Mehl, Michael J. and Hicks, David and Zettel, Adam C. and Esters, Marco and Campilongo, Xiomara and Curtarolo, Stefano},
  journal={Computational Materials Science},
  volume={240},
  pages={112988},
  year={2024},
  DOI={10.1016/j.commatsci.2024.112988}
}

@article{hautier2011substitutions,
  title={Data Mined Ionic Substitutions for the Discovery of New Compounds},
  author={Hautier, Geoffroy and Fischer, Chris and Ehrlacher, Virginie and Jain, Anubhav and Ceder, Gerbrand},
  journal={Inorganic Chemistry},
  volume={50},
  number={2},
  pages={656--663},
  year={2011},
  DOI={10.1021/ic102031h}
}

@article{deng2023chgnet,
  title={CHGNet as a pretrained universal neural network potential for charge-informed atomistic modelling},
  author={Deng, B. and Zhong, P. and Jun, K. and others},
  journal={Nature Machine Intelligence},
  volume={5},
  pages={1031--1041},
  year={2023},
  DOI={10.1038/s42256-023-00716-3}
}

@article{xie2018cgcnn,
  title={Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties},
  author={Xie, Tian and Grossman, Jeffrey C.},
  journal={Physical Review Letters},
  volume={120},
  number={145301},
  year={2018},
  DOI={10.1103/PhysRevLett.120.145301}
}
```
Owner
- Name: The Design of Materials on Computers Lab
- Login: Bartel-Group
- Kind: organization
- Location: United States of America
- Website: https://bartel.cems.umn.edu/
- Repositories: 1
- Profile: https://github.com/Bartel-Group
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use this code, please consider citing the below paper."
title: "Establishing Baselines for Generative Discovery of Inorganic Crystals"
authors:
  - family-names: "Szymanski"
    given-names: "Nathan J."
  - family-names: "Bartel"
    given-names: "Christopher J."
version: 1.0.0
date-released: 2025-01-07
arxiv: https://arxiv.org/abs/2501.02144
doi: 10.48550/arXiv.2501.02144
repository-code: https://github.com/Bartel-Group/matgen_baselines
license: MIT
keywords:
  - materials-discovery
  - generative-ai
  - inorganic-crystals
  - machine-learning
```
GitHub Events
Total
- Watch event: 7
- Member event: 1
- Push event: 41
- Create event: 2
Last Year
- Watch event: 7
- Member event: 1
- Push event: 41
- Create event: 2
Dependencies
- chgnet >=0.3.0
- mp-api >=0.37.4
- numpy >=1.20.0
- pandas >=1.5.0
- pymatgen >=2024.1.0
- scikit-learn >=1.0.0
- torch >=2.0.0
- tqdm >=4.65.0
- ase ==3.22.1
- keras ==2.3.1
- matminer ==0.6.2
- matplotlib ==3.3.4
- numpy ==1.18.5
- pandas ==1.1.5
- pip ==21.3.1
- plotly ==5.17.0
- pymatgen ==2019.12.22
- scikit-learn ==0.24.2
- scipy ==1.5.4
- seaborn ==0.11.2
- spglib ==2.0.2
- tensorboard ==1.15.0
- tensorflow ==1.15.5
- tqdm ==4.64.1