matbench-genmetrics

matbench-genmetrics: A Python library for benchmarking crystal structure generative models using time-based splits of Materials Project structures - Published in JOSS (2024)

https://github.com/sparks-baird/matbench-genmetrics

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    1 of 3 committers (33.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

materials-informatics python
Last synced: 4 months ago

Repository

Generative materials benchmarking metrics, inspired by guacamol and CDVAE.

Basic Info
Statistics
  • Stars: 40
  • Watchers: 1
  • Forks: 2
  • Open Issues: 17
  • Releases: 3
Topics
materials-informatics python
Created over 3 years ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing License Citation Authors

README.md


This is not an official Matbench repository, but it may eventually be incorporated into Matbench.

matbench-genmetrics

Generative materials benchmarking metrics, inspired by guacamol and CDVAE.

This repository provides standardized benchmarks for evaluating generative models for crystal structures. Each benchmark has a fixed dataset, a predefined split, and associated metrics that define what "best" means.

NOTE: This project is separate from https://matbench-discovery.materialsproject.org/ which provides a slick leaderboard and package for benchmarking ML models on crystal stability prediction from unrelaxed structures. This project instead looks at assessing the quality of generative models for crystal structures.

Getting Started

Installation, a dummy example, output metrics for the example, and descriptions of the benchmark metrics.

Installation

```bash
pip install matbench-genmetrics
```

See Advanced Installation for more information.

Example

NOTE: be sure to set dummy=False for the real/full benchmark run. MPTSMetrics10 is intended for fast prototyping and debugging, as it assumes only 10 generated structures.

```python
from matbench_genmetrics.mp_time_split.utils.gen import DummyGenerator
from matbench_genmetrics.core.metrics import (
    MPTSMetrics10,
    MPTSMetrics100,
    MPTSMetrics1000,
    MPTSMetrics10000,
)

mptm = MPTSMetrics10(dummy=True)
for fold in mptm.folds:
    train_val_inputs = mptm.get_train_and_val_data(fold)

    dg = DummyGenerator()
    dg.fit(train_val_inputs)
    gen_structures = dg.gen(n=mptm.num_gen)

    mptm.evaluate_and_record(fold, gen_structures)

print(mptm.recorded_metrics)
```

```python
{
    0: {"validity": 0.4375, "coverage": 0.0, "novelty": 1.0, "uniqueness": 0.9777777777777777},
    1: {"validity": 0.4390681003584229, "coverage": 0.0, "novelty": 1.0, "uniqueness": 0.9333333333333333},
    2: {"validity": 0.4401197604790419, "coverage": 0.0, "novelty": 1.0, "uniqueness": 0.8222222222222222},
    3: {"validity": 0.4408740359897172, "coverage": 0.0, "novelty": 1.0, "uniqueness": 0.8444444444444444},
    4: {"validity": 0.4414414414414415, "coverage": 0.0, "novelty": 1.0, "uniqueness": 0.9111111111111111},
}
```
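Any model can be dropped into the loop above as long as it exposes the same `fit`/`gen` interface as `DummyGenerator`. The class below is a hypothetical baseline (the `RandomRepeatGenerator` name and its behavior are illustrative, not part of the package) that learns nothing and simply resamples training structures:

```python
import random


class RandomRepeatGenerator:
    """Hypothetical baseline exposing the fit/gen interface used above.

    gen() just resamples training structures with replacement, so its
    novelty score would be 0 by construction.
    """

    def fit(self, train_structures):
        # A real model would train here; we only keep a reference.
        self.train_structures = list(train_structures)

    def gen(self, n=100):
        # Draw n "generated" structures with replacement from the train set.
        return [random.choice(self.train_structures) for _ in range(n)]


# Placeholder strings stand in for pymatgen Structure objects:
baseline = RandomRepeatGenerator()
baseline.fit(["struct_a", "struct_b", "struct_c"])
samples = baseline.gen(n=5)
print(len(samples))  # 5
```

Such a copy-the-training-set baseline is a useful sanity check: it should score perfectly on uniqueness only by chance, and a benchmark that rewards it highly on novelty would be broken.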

Metrics

| Metric | Description |
| ---------- | ----------- |
| Validity | A loose measure of how "valid" the set of generated structures is, obtained by comparing the space group number distribution of the generated structures with that of the benchmark data. Formally, this is one minus (the Wasserstein distance between the distributions of space group numbers for train and generated structures, divided by the distance for the dummy case between train and `space_group_number == 1`). See also https://github.com/sparks-baird/matbench-genmetrics/issues/44 |
| Coverage | A form of "rediscovery", where held-out structures from the future are "discovered" by the generative model, i.e., the generative model "predicted the future". Formally, this is the match count between held-out test structures and generated structures, divided by the number of test structures. |
| Novelty | A measure of how novel the generated structures are relative to the structures used to train the generative model. Formally, this is one minus (the match count between train structures and generated structures, divided by the number of generated structures). |
| Uniqueness | A measure of whether the generative model suggests repeat structures. Formally, this is one minus (the non-self-comparing match count within the generated structures, divided by the total possible number of non-self-comparing matches). |

A match is when `StructureMatcher(stol=0.5, ltol=0.3, angle_tol=10.0).fit(s1, s2)` evaluates to `True`.
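Taking the formulas above literally, coverage, novelty, and uniqueness reduce to match-count ratios. The sketch below mimics that arithmetic with a toy equality-based `is_match` standing in for pymatgen's `StructureMatcher`; the helper functions and the placeholder strings are illustrative, not the package's API:

```python
from itertools import combinations


def is_match(s1, s2):
    # Placeholder for StructureMatcher(stol=0.5, ltol=0.3, angle_tol=10.0).fit(s1, s2);
    # plain equality here so the arithmetic is easy to follow.
    return s1 == s2


def coverage(test, generated):
    # Match count between held-out test and generated, over number of test structures.
    matches = sum(any(is_match(t, g) for g in generated) for t in test)
    return matches / len(test)


def novelty(train, generated):
    # One minus (match count against train, over number of generated structures).
    matches = sum(any(is_match(g, t) for t in train) for g in generated)
    return 1 - matches / len(generated)


def uniqueness(generated):
    # One minus (non-self pairwise match count, over total possible non-self pairs).
    pairs = list(combinations(generated, 2))
    matches = sum(is_match(a, b) for a, b in pairs)
    return 1 - matches / len(pairs)


train, test, generated = ["A", "B", "C"], ["A", "F"], ["A", "D", "D", "E"]
print(coverage(test, generated))  # 0.5   (1 of 2 test structures rediscovered)
print(novelty(train, generated))  # 0.75  (1 of 4 generated structures matches train)
print(uniqueness(generated))      # 0.8333... (1 matching pair out of 6 possible)
```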

Detailed descriptions of the metrics are given on the Metrics page.

We performed a "slow march of time" benchmarking study, which uses the mp-time-split data from a future fold as the "generated" structures for the previous fold. The results are presented in the charts below. See the corresponding notebook for details.
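The study rests on chronological, expanding train/test folds. The helper below is a simplified stand-in for the actual mp-time-split machinery (the `expanding_time_folds` name and the `(year, id)` tuples are illustrative), mimicking scikit-learn's `TimeSeriesSplit` on a sorted list:

```python
def expanding_time_folds(items, n_folds=5):
    # Sort chronologically, then yield expanding-train / next-chunk-test splits,
    # in the spirit of scikit-learn's TimeSeriesSplit.
    items = sorted(items)
    chunk = len(items) // (n_folds + 1)
    for fold in range(1, n_folds + 1):
        train = items[: fold * chunk]
        test = items[fold * chunk : (fold + 1) * chunk]
        yield train, test


# Entries could be (first_report_year, structure_id) pairs:
entries = [(2000 + i, f"mp-{i}") for i in range(12)]
for train, test in expanding_time_folds(entries):
    # In the "slow march of time" study, the test chunk of one fold doubles
    # as the "generated" set when scoring the preceding fold.
    print(len(train), len(test))
```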

Slow March of Time benchmarking

Advanced Installation

PyPI (pip) installation

Create and activate a new conda environment named matbench-genmetrics (-n) with python==3.11.* or your preferred Python version, then install matbench-genmetrics via pip.

```bash
conda create -n matbench-genmetrics python==3.11.*
conda activate matbench-genmetrics
pip install matbench-genmetrics
```

Editable installation

In order to set up the necessary environment:

  1. clone and enter the repository via:

```bash
git clone https://github.com/sparks-baird/matbench-genmetrics.git
cd matbench-genmetrics
```

  2. create and activate a new conda environment (optional, but recommended)

```bash
conda create -n matbench-genmetrics python==3.11.*
conda activate matbench-genmetrics
```

  3. perform an editable (-e) installation in the current directory (.):

```bash
pip install -e .
```

NOTE: Some changes, e.g. in setup.cfg, might require you to run pip install -e . again.

Optional and needed only once after git clone:

  4. install several pre-commit git hooks with:

```bash
pre-commit install
# You might also want to run `pre-commit autoupdate`
```

and check out the configuration under .pre-commit-config.yaml. The -n, --no-verify flag of git commit can be used to deactivate pre-commit hooks temporarily.

  5. install nbstripout git hooks to remove the output cells of committed notebooks with:

```bash
nbstripout --install --attributes notebooks/.gitattributes
```

This is useful to avoid large diffs due to plots in your notebooks. A simple nbstripout --uninstall will revert these changes.

Then take a look into the scripts and notebooks folders.

Dependency Management & Reproducibility

  1. Always keep your abstract (unpinned) dependencies updated in environment.yml and eventually in setup.cfg if you want to ship and install your package via pip later on.
  2. Create concrete dependencies as environment.lock.yml for the exact reproduction of your environment with:

```bash
conda env export -n matbench-genmetrics -f environment.lock.yml
```

For multi-OS development, consider using --no-builds during the export.

  3. Update your current environment with respect to a new environment.lock.yml using:

```bash
conda env update -f environment.lock.yml --prune
```

Project Organization

```txt
├── AUTHORS.md              <- List of developers and maintainers.
├── CHANGELOG.md            <- Changelog to keep track of new features and fixes.
├── CONTRIBUTING.md         <- Guidelines for contributing to this project.
├── Dockerfile              <- Build a docker container with `docker build .`.
├── LICENSE.txt             <- License as chosen on the command-line.
├── README.md               <- The top-level README for developers.
├── configs                 <- Directory for configurations of model & application.
├── data
│   ├── external            <- Data from third party sources.
│   ├── interim             <- Intermediate data that has been transformed.
│   ├── processed           <- The final, canonical data sets for modeling.
│   └── raw                 <- The original, immutable data dump.
├── docs                    <- Directory for Sphinx documentation in rst or md.
├── environment.yml         <- The conda environment file for reproducibility.
├── models                  <- Trained and serialized models, model predictions,
│                              or model summaries.
├── notebooks               <- Jupyter notebooks. Naming convention is a number (for
│                              ordering), the creator's initials and a description,
│                              e.g. `1.0-fw-initial-data-exploration`.
├── pyproject.toml          <- Build configuration. Don't change! Use `pip install -e .`
│                              to install for development or to build `tox -e build`.
├── references              <- Data dictionaries, manuals, and all other materials.
├── reports                 <- Generated analysis as HTML, PDF, LaTeX, etc.
│   └── figures             <- Generated plots and figures for reports.
├── scripts                 <- Analysis and production scripts which import the
│                              actual PYTHON_PKG, e.g. train_model.
├── setup.cfg               <- Declarative configuration of your project.
├── setup.py                <- [DEPRECATED] Use `python setup.py develop` to install
│                              for development or `python setup.py bdist_wheel` to build.
├── src
│   └── matbench_genmetrics <- Actual Python package where the main functionality goes.
├── tests                   <- Unit tests which can be run with `pytest`.
├── .coveragerc             <- Configuration for coverage reports of unit tests.
├── .isort.cfg              <- Configuration for git hook that sorts imports.
└── .pre-commit-config.yaml <- Configuration of pre-commit git hooks.
```

Citing

Baird, S.G.; Sayeed, H.M.; Montoya, J.; Sparks, T.D. (2024). matbench-genmetrics: A Python library for benchmarking crystal structure generative models using time-based splits of Materials Project structures. Journal of Open Source Software, 9(97), 5618, https://doi.org/10.21105/joss.05618

```bibtex
@article{Baird2024,
  doi = {10.21105/joss.05618},
  url = {https://doi.org/10.21105/joss.05618},
  year = {2024},
  publisher = {The Open Journal},
  volume = {9},
  number = {97},
  pages = {5618},
  author = {Sterling G. Baird and Hasan M. Sayeed and Joseph Montoya and Taylor D. Sparks},
  title = {matbench-genmetrics: A Python library for benchmarking crystal structure generative models using time-based splits of Materials Project structures},
  journal = {Journal of Open Source Software}
}
```

Note

This project has been set up using PyScaffold 4.2.2.post1.dev2+ge50b5e1 and the dsproject extension 0.7.2.post1.dev2+geb5d6b6.

Owner

  • Name: Sparks/Baird Materials Informatics
  • Login: sparks-baird
  • Kind: organization
  • Email: sterling.baird@utah.edu
  • Location: United States of America

Sterling Baird and Taylor Sparks Materials Informatics Projects

JOSS Publication

matbench-genmetrics: A Python library for benchmarking crystal structure generative models using time-based splits of Materials Project structures
Published
May 27, 2024
Volume 9, Issue 97, Page 5618
Authors
Sterling G. Baird
Materials Science & Engineering, University of Utah, United States of America; Acceleration Consortium, University of Toronto, 80 St George St, Toronto, ON, Canada
Hasan M. Sayeed
Materials Science & Engineering, University of Utah, United States of America
Joseph Montoya
Toyota Research Institute, Los Altos, CA, United States of America
Taylor D. Sparks
Materials Science & Engineering, University of Utah, United States of America
Editor
Sophie Beck
Tags
materials informatics crystal structure generative modeling TimeSeriesSplit benchmarking

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Baird
  given-names: Sterling G.
  orcid: "https://orcid.org/0000-0002-4491-6876"
- family-names: Sayeed
  given-names: Hasan M.
  orcid: "https://orcid.org/0000-0002-6583-7755"
- family-names: Montoya
  given-names: Joseph
  orcid: "https://orcid.org/0000-0001-5760-2860"
- family-names: Sparks
  given-names: Taylor D.
  orcid: "https://orcid.org/0000-0001-8020-7711"
contact:
- family-names: Baird
  given-names: Sterling G.
  orcid: "https://orcid.org/0000-0002-4491-6876"
doi: 10.5281/zenodo.10840604
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Baird
    given-names: Sterling G.
    orcid: "https://orcid.org/0000-0002-4491-6876"
  - family-names: Sayeed
    given-names: Hasan M.
    orcid: "https://orcid.org/0000-0002-6583-7755"
  - family-names: Montoya
    given-names: Joseph
    orcid: "https://orcid.org/0000-0001-5760-2860"
  - family-names: Sparks
    given-names: Taylor D.
    orcid: "https://orcid.org/0000-0001-8020-7711"
  date-published: 2024-05-27
  doi: 10.21105/joss.05618
  issn: 2475-9066
  issue: 97
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 5618
  title: "matbench-genmetrics: A Python library for benchmarking crystal
    structure generative models using time-based splits of Materials
    Project structures"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.05618"
  volume: 9
title: "matbench-genmetrics: A Python library for benchmarking crystal
  structure generative models using time-based splits of Materials
  Project structures"

GitHub Events

Total
  • Watch event: 6
  • Fork event: 1
Last Year
  • Watch event: 6
  • Fork event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 337
  • Total Committers: 3
  • Avg Commits per committer: 112.333
  • Development Distribution Score (DDS): 0.027
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
sgbaird s****d@u****u 328
hasan h****3@g****m 7
Janosh Riebesell j****l@g****m 2

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 28
  • Total pull requests: 52
  • Average time to close issues: 3 months
  • Average time to close pull requests: 4 days
  • Total issue authors: 6
  • Total pull request authors: 3
  • Average comments per issue: 2.68
  • Average comments per pull request: 0.52
  • Merged pull requests: 51
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • sgbaird (22)
  • kjappelbaum (1)
  • ml-evs (1)
  • sp8rks (1)
  • jamesrhester (1)
  • hasan-sayeed (1)
Pull Request Authors
  • sgbaird (50)
  • janosh (1)
  • hasan-sayeed (1)
Top Labels
Issue Labels
enhancement (2)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 65 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 32
  • Total maintainers: 1
proxy.golang.org: github.com/sparks-baird/matbench-genmetrics
  • Versions: 17
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.5%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 4 months ago
pypi.org: matbench-genmetrics

Generative materials benchmarking metrics, inspired by CDVAE.

  • Versions: 14
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 65 Last month
Rankings
Dependent packages count: 6.6%
Stargazers count: 17.2%
Average: 22.1%
Forks count: 23.2%
Dependent repos count: 30.6%
Downloads: 33.1%
Maintainers (1)
Last synced: 4 months ago
conda-forge.org: matbench-genmetrics
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Stargazers count: 53.0%
Average: 54.9%
Forks count: 56.7%
Last synced: 4 months ago

Dependencies

docs/requirements.txt pypi
  • ipykernel *
  • myst-parser *
  • nbsphinx *
  • nbsphinx-link *
  • sphinx >=3.2.1
  • sphinx_copybutton *
  • sphinx_rtd_theme *
.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • actions/download-artifact v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • coverallsapp/github-action master composite
.github/workflows/draft-pdf.yml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v1 composite
  • openjournals/openjournals-draft-action master composite
Dockerfile docker
  • mcr.microsoft.com/vscode/devcontainers/python 0-${VARIANT} build
pyproject.toml pypi
setup.py pypi
environment.yml conda
  • ipython
  • matplotlib
  • pip
  • plotly
  • pre_commit
  • pytest
  • pytest-cov
  • python >=3.6
  • python-kaleido
  • sphinx
  • tox