matbench-genmetrics
matbench-genmetrics: A Python library for benchmarking crystal structure generative models using time-based splits of Materials Project structures - Published in JOSS (2024)
Science Score: 100.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 9 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to joss.theoj.org
- ✓ Committers with academic emails: 1 of 3 committers (33.3%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
Generative materials benchmarking metrics, inspired by guacamol and CDVAE.
Basic Info
- Host: GitHub
- Owner: sparks-baird
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://matbench-genmetrics.readthedocs.io/
- Size: 8.23 MB
Statistics
- Stars: 40
- Watchers: 1
- Forks: 2
- Open Issues: 17
- Releases: 3
Metadata Files
README.md
This is not an official repository of Matbench, but it may eventually be incorporated into Matbench.
matbench-genmetrics 
Generative materials benchmarking metrics, inspired by guacamol and CDVAE.
This repository provides standardized benchmarks for generative models of crystal structures. Each benchmark has a fixed dataset, a predefined split, and associated notions of best (i.e., metrics).

NOTE: This project is separate from https://matbench-discovery.materialsproject.org/ which provides a slick leaderboard and package for benchmarking ML models on crystal stability prediction from unrelaxed structures. This project instead looks at assessing the quality of generative models for crystal structures.
Getting Started
This section covers installation, a dummy example, the output metrics for that example, and descriptions of the benchmark metrics.
Installation
bash
pip install matbench-genmetrics
See Advanced Installation for more information.
Example
NOTE: be sure to set dummy=False for the real/full benchmark run. MPTSMetrics10 is intended for fast prototyping and debugging, as it assumes only 10 generated structures.
```python
from matbench_genmetrics.mp_time_split.utils.gen import DummyGenerator
from matbench_genmetrics.core.metrics import (
    MPTSMetrics10,
    MPTSMetrics100,
    MPTSMetrics1000,
    MPTSMetrics10000,
)

# dummy=True gives a fast, reduced run; set dummy=False for the full benchmark
mptm = MPTSMetrics10(dummy=True)
for fold in mptm.folds:
    # training and validation data for this fold
    train_val_inputs = mptm.get_train_and_val_data(fold)

    dg = DummyGenerator()
    dg.fit(train_val_inputs)
    gen_structures = dg.gen(n=mptm.num_gen)

    mptm.evaluate_and_record(fold, gen_structures)

print(mptm.recorded_metrics)
```
python
{
0: {
"validity": 0.4375,
"coverage": 0.0,
"novelty": 1.0,
"uniqueness": 0.9777777777777777,
},
1: {
"validity": 0.4390681003584229,
"coverage": 0.0,
"novelty": 1.0,
"uniqueness": 0.9333333333333333,
},
2: {
"validity": 0.4401197604790419,
"coverage": 0.0,
"novelty": 1.0,
"uniqueness": 0.8222222222222222,
},
3: {
"validity": 0.4408740359897172,
"coverage": 0.0,
"novelty": 1.0,
"uniqueness": 0.8444444444444444,
},
4: {
"validity": 0.4414414414414415,
"coverage": 0.0,
"novelty": 1.0,
"uniqueness": 0.9111111111111111,
},
}
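The recorded metrics are a plain dictionary keyed by fold, so they can be summarized with the standard library. The sketch below copies the values from the example output above and averages each metric across the five folds; the helper name summarize is hypothetical and not part of matbench-genmetrics.

```python
from statistics import mean

# Per-fold metrics copied from the example output above (dummy=True run).
recorded_metrics = {
    0: {"validity": 0.4375, "coverage": 0.0, "novelty": 1.0,
        "uniqueness": 0.9777777777777777},
    1: {"validity": 0.4390681003584229, "coverage": 0.0, "novelty": 1.0,
        "uniqueness": 0.9333333333333333},
    2: {"validity": 0.4401197604790419, "coverage": 0.0, "novelty": 1.0,
        "uniqueness": 0.8222222222222222},
    3: {"validity": 0.4408740359897172, "coverage": 0.0, "novelty": 1.0,
        "uniqueness": 0.8444444444444444},
    4: {"validity": 0.4414414414414415, "coverage": 0.0, "novelty": 1.0,
        "uniqueness": 0.9111111111111111},
}


def summarize(recorded):
    """Return the mean of each metric across folds (hypothetical helper)."""
    names = next(iter(recorded.values())).keys()
    return {name: mean(fold[name] for fold in recorded.values()) for name in names}


summary = summarize(recorded_metrics)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Averaging across folds hides fold-to-fold variation, so reporting the per-fold values alongside the mean is usually worthwhile.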
Metrics
| Metric | Description |
| ---------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| Validity | A loose measure of how "valid" the set of generated structures is, obtained by comparing the space group number distribution of the generated structures with that of the benchmark data. Formally, this is one minus (the Wasserstein distance between the space group number distributions of the train and generated structures, divided by the distance for the dummy case between train and space_group_number == 1). See also https://github.com/sparks-baird/matbench-genmetrics/issues/44 |
| Coverage | A form of "rediscovery", where held-out structures from the future are "discovered" by the generative model, i.e., the generative model "predicted the future". Formally, this is the match count between held-out test structures and generated structures divided by the number of test structures. |
| Novelty | A measure of how novel the generated structures are relative to the structures that were used to train the generative model. Formally, this is one minus (the match count between train structures and generated structures divided by the number of generated structures). |
| Uniqueness | A measure of whether the generative model suggests repeat structures. Formally, this is one minus (the non-self-comparing match count within generated structures divided by the total possible non-self-comparing matches). |
A match is when StructureMatcher(stol=0.5, ltol=0.3, angle_tol=10.0).fit(s1, s2) evaluates to True.
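The three match-based metrics in the table can be sketched with a generic pairwise matcher. This is a minimal illustration, not the library's implementation: is_match is a hypothetical stand-in for StructureMatcher(...).fit, and counting each test or generated item at most once (any match) is one reading of "match counts" that may differ from the package's.

```python
from itertools import combinations


def coverage(test, gen, is_match):
    """Fraction of held-out test items matched by at least one generated item."""
    hits = sum(any(is_match(t, g) for g in gen) for t in test)
    return hits / len(test)


def novelty(train, gen, is_match):
    """One minus the fraction of generated items that match a train item."""
    hits = sum(any(is_match(g, t) for t in train) for g in gen)
    return 1 - hits / len(gen)


def uniqueness(gen, is_match):
    """One minus matching pairs over all distinct (non-self) pairs.

    Assumes at least two generated items.
    """
    pairs = list(combinations(gen, 2))
    hits = sum(is_match(a, b) for a, b in pairs)
    return 1 - hits / len(pairs)


def same(a, b):
    """Toy matcher on formula strings, standing in for a structure matcher."""
    return a == b


train = ["NaCl", "MgO"]
test = ["CsCl", "ZnS"]
gen = ["NaCl", "CsCl", "CsCl", "TiO2"]

cov = coverage(test, gen, same)
nov = novelty(train, gen, same)
uniq = uniqueness(gen, same)
print(cov, nov, uniq)
```

With 10 generated structures there are 45 distinct pairs, so a single matching pair gives 1 - 1/45, which is consistent with the fold 0 uniqueness in the example output.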
Detailed descriptions of the metrics are given on the Metrics page.
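The validity formula from the table can be sketched with the standard library alone. The code below treats space group numbers as integers from 1 to 230 and computes the one-dimensional Wasserstein distance as the summed absolute difference between empirical CDFs; whether the package computes the distance in exactly this way is an assumption, and the function names are hypothetical.

```python
from collections import Counter

N_SPACE_GROUPS = 230  # space group numbers run from 1 to 230


def cdf(numbers):
    """Empirical CDF of space group numbers, evaluated at 1..230."""
    counts = Counter(numbers)
    total = len(numbers)
    running, out = 0, []
    for sg in range(1, N_SPACE_GROUPS + 1):
        running += counts.get(sg, 0)
        out.append(running / total)
    return out


def wasserstein(u, v):
    """1D Wasserstein-1 distance between two integer-valued samples."""
    # On unit-spaced support, the integral of |F_u - F_v| reduces to a sum.
    return sum(abs(a - b) for a, b in zip(cdf(u), cdf(v)))


def validity(train_sg, gen_sg):
    """One minus the train-gen distance over the train-vs-dummy distance.

    Assumes train_sg is not entirely space group 1 (nonzero denominator).
    """
    dummy_sg = [1] * len(train_sg)  # dummy case: every structure in space group 1
    return 1 - wasserstein(train_sg, gen_sg) / wasserstein(train_sg, dummy_sg)


train_sg = [1, 2, 225, 225]
print(validity(train_sg, train_sg))      # identical distributions
print(validity(train_sg, [1, 1, 1, 1]))  # generator that only outputs space group 1
print(validity(train_sg, [2, 14, 62, 225]))
```

For unweighted one-dimensional samples, scipy.stats.wasserstein_distance computes the same distance and would be the natural choice when SciPy is available.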
We performed a "slow march of time" benchmarking study, which uses the mp-time-split data from a future fold as the "generated" structures for the previous fold. The results are presented in the charts below. See the corresponding notebook for details.

Advanced Installation
PyPI (pip) installation
Create and activate a new conda environment named matbench-genmetrics (-n) with python==3.11.* or your preferred Python version, then install matbench-genmetrics via pip.
bash
conda create -n matbench-genmetrics python==3.11.*
conda activate matbench-genmetrics
pip install matbench-genmetrics
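After installing, a quick way to confirm that the package is visible in the active environment is to query its distribution metadata. This is a small sketch using only the standard library (Python 3.8+); the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("matbench-genmetrics")
if found:
    print(f"matbench-genmetrics {found} is installed")
else:
    print("matbench-genmetrics is not installed in this environment")
```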
Editable installation
In order to set up the necessary environment:
- clone and enter the repository via:
bash
git clone https://github.com/sparks-baird/matbench-genmetrics.git
cd matbench-genmetrics
- create and activate a new conda environment (optional, but recommended)
bash
conda create --name matbench-genmetrics python==3.11.*
conda activate matbench-genmetrics
- perform an editable (-e) installation in the current directory (.):
bash
pip install -e .
NOTE: Some changes, e.g. in setup.cfg, might require you to run pip install -e . again.
Optional and needed only once after git clone:
- install several pre-commit git hooks with:
bash
pre-commit install
# You might also want to run `pre-commit autoupdate`
and check out the configuration under .pre-commit-config.yaml.
The -n, --no-verify flag of git commit can be used to deactivate pre-commit hooks temporarily.
- install nbstripout git hooks to remove the output cells of committed notebooks with:
bash
nbstripout --install --attributes notebooks/.gitattributes
This is useful to avoid large diffs due to plots in your notebooks.
A simple nbstripout --uninstall will revert these changes.
Then take a look into the scripts and notebooks folders.
Dependency Management & Reproducibility
1. Always keep your abstract (unpinned) dependencies updated in environment.yml, and eventually in setup.cfg if you want to ship and install your package via pip later on.
2. Create concrete dependencies as environment.lock.yml for the exact reproduction of your environment with:
bash
conda env export -n matbench-genmetrics -f environment.lock.yml
For multi-OS development, consider using --no-builds during the export.
3. Update your current environment with respect to a new environment.lock.yml using:
bash
conda env update -f environment.lock.yml --prune
Project Organization
txt
├── AUTHORS.md <- List of developers and maintainers.
├── CHANGELOG.md <- Changelog to keep track of new features and fixes.
├── CONTRIBUTING.md <- Guidelines for contributing to this project.
├── Dockerfile <- Build a docker container with `docker build .`.
├── LICENSE.txt <- License as chosen on the command-line.
├── README.md <- The top-level README for developers.
├── configs <- Directory for configurations of model & application.
├── data
│ ├── external <- Data from third party sources.
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ └── raw <- The original, immutable data dump.
├── docs <- Directory for Sphinx documentation in rst or md.
├── environment.yml <- The conda environment file for reproducibility.
├── models <- Trained and serialized models, model predictions,
│ or model summaries.
├── notebooks <- Jupyter notebooks. Naming convention is a number (for
│ ordering), the creator's initials and a description,
│ e.g. `1.0-fw-initial-data-exploration`.
├── pyproject.toml <- Build configuration. Don't change! Use `pip install -e .`
│ to install for development or to build `tox -e build`.
├── references <- Data dictionaries, manuals, and all other materials.
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated plots and figures for reports.
├── scripts <- Analysis and production scripts which import the
│ actual PYTHON_PKG, e.g. train_model.
├── setup.cfg <- Declarative configuration of your project.
├── setup.py <- [DEPRECATED] Use `python setup.py develop` to install for
│ development or `python setup.py bdist_wheel` to build.
├── src
│ └── matbench_genmetrics <- Actual Python package where the main functionality goes.
├── tests <- Unit tests which can be run with `pytest`.
├── .coveragerc <- Configuration for coverage reports of unit tests.
├── .isort.cfg <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
Citing
Baird, S.G.; Sayeed, H.M.; Montoya, J.; Sparks, T.D. (2024). matbench-genmetrics: A Python library for benchmarking crystal structure generative models using time-based splits of Materials Project structures. Journal of Open Source Software, 9(97), 5618, https://doi.org/10.21105/joss.05618
bibtex
@article{Baird2024,
  doi = {10.21105/joss.05618},
  url = {https://doi.org/10.21105/joss.05618},
  year = {2024},
  publisher = {The Open Journal},
  volume = {9},
  number = {97},
  pages = {5618},
  author = {Sterling G. Baird and Hasan M. Sayeed and Joseph Montoya and Taylor D. Sparks},
  title = {matbench-genmetrics: A Python library for benchmarking crystal structure generative models using time-based splits of Materials Project structures},
  journal = {Journal of Open Source Software}
}
Note
This project has been set up using PyScaffold 4.2.2.post1.dev2+ge50b5e1 and the dsproject extension 0.7.2.post1.dev2+geb5d6b6.
Owner
- Name: Sparks/Baird Materials Informatics
- Login: sparks-baird
- Kind: organization
- Email: sterling.baird@utah.edu
- Location: United States of America
- Repositories: 63
- Profile: https://github.com/sparks-baird
Sterling Baird and Taylor Sparks Materials Informatics Projects
JOSS Publication
matbench-genmetrics: A Python library for benchmarking crystal structure generative models using time-based splits of Materials Project structures
Tags
materials informatics, crystal structure, generative modeling, TimeSeriesSplit, benchmarking
Citation (CITATION.cff)
cff-version: "1.2.0"
authors:
  - family-names: Baird
    given-names: Sterling G.
    orcid: "https://orcid.org/0000-0002-4491-6876"
  - family-names: Sayeed
    given-names: Hasan M.
    orcid: "https://orcid.org/0000-0002-6583-7755"
  - family-names: Montoya
    given-names: Joseph
    orcid: "https://orcid.org/0000-0001-5760-2860"
  - family-names: Sparks
    given-names: Taylor D.
    orcid: "https://orcid.org/0000-0001-8020-7711"
contact:
  - family-names: Baird
    given-names: Sterling G.
    orcid: "https://orcid.org/0000-0002-4491-6876"
doi: 10.5281/zenodo.10840604
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
    - family-names: Baird
      given-names: Sterling G.
      orcid: "https://orcid.org/0000-0002-4491-6876"
    - family-names: Sayeed
      given-names: Hasan M.
      orcid: "https://orcid.org/0000-0002-6583-7755"
    - family-names: Montoya
      given-names: Joseph
      orcid: "https://orcid.org/0000-0001-5760-2860"
    - family-names: Sparks
      given-names: Taylor D.
      orcid: "https://orcid.org/0000-0001-8020-7711"
  date-published: 2024-05-27
  doi: 10.21105/joss.05618
  issn: 2475-9066
  issue: 97
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 5618
  title: "matbench-genmetrics: A Python library for benchmarking crystal
    structure generative models using time-based splits of Materials
    Project structures"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.05618"
  volume: 9
title: "matbench-genmetrics: A Python library for benchmarking crystal
  structure generative models using time-based splits of Materials
  Project structures"
GitHub Events
Total
- Watch event: 6
- Fork event: 1
Last Year
- Watch event: 6
- Fork event: 1
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| sgbaird | s****d@u****u | 328 |
| hasan | h****3@g****m | 7 |
| Janosh Riebesell | j****l@g****m | 2 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 28
- Total pull requests: 52
- Average time to close issues: 3 months
- Average time to close pull requests: 4 days
- Total issue authors: 6
- Total pull request authors: 3
- Average comments per issue: 2.68
- Average comments per pull request: 0.52
- Merged pull requests: 51
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- sgbaird (22)
- kjappelbaum (1)
- ml-evs (1)
- sp8rks (1)
- jamesrhester (1)
- hasan-sayeed (1)
Pull Request Authors
- sgbaird (50)
- janosh (1)
- hasan-sayeed (1)
Packages
- Total packages: 3
- Total downloads: pypi 65 last-month
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 0 (may contain duplicates)
- Total versions: 32
- Total maintainers: 1
proxy.golang.org: github.com/sparks-baird/matbench-genmetrics
- Documentation: https://pkg.go.dev/github.com/sparks-baird/matbench-genmetrics#section-documentation
- License: mit
- Latest release: v0.6.5 (published almost 2 years ago)
pypi.org: matbench-genmetrics
Generative materials benchmarking metrics, inspired by CDVAE.
- Homepage: https://github.com/sparks-baird/matbench-genmetrics/
- Documentation: https://matbench-genmetrics.readthedocs.io
- License: MIT
- Latest release: 0.6.5 (published almost 2 years ago)
- Maintainers: 1
conda-forge.org: matbench-genmetrics
- Homepage: https://github.com/sparks-baird/matbench-genmetrics/
- License: MIT
- Latest release: 0.2.1 (published over 3 years ago)
Dependencies
- ipykernel *
- myst-parser *
- nbsphinx *
- nbsphinx-link *
- sphinx >=3.2.1
- sphinx_copybutton *
- sphinx_rtd_theme *
- actions/checkout v3 composite
- actions/download-artifact v3 composite
- actions/setup-python v4 composite
- actions/upload-artifact v3 composite
- coverallsapp/github-action master composite
- actions/checkout v3 composite
- actions/upload-artifact v1 composite
- openjournals/openjournals-draft-action master composite
- mcr.microsoft.com/vscode/devcontainers/python 0-${VARIANT} build
- ipython
- matplotlib
- pip
- plotly
- pre_commit
- pytest
- pytest-cov
- python >=3.6
- python-kaleido
- sphinx
- tox
