GMP-Featurizer

GMP-Featurizer: A parallelized Python package for efficiently computing the Gaussian Multipole features of atomic systems - Published in JOSS (2023)

https://github.com/tri-amdd/gmp-featurizer

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: arxiv.org, acs.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software
Last synced: 6 months ago

Repository

Feature calculator for GMP features

Basic Info
  • Host: GitHub
  • Owner: TRI-AMDD
  • License: apache-2.0
  • Language: C++
  • Default Branch: main
  • Size: 23.2 MB
Statistics
  • Stars: 5
  • Watchers: 3
  • Forks: 2
  • Open Issues: 1
  • Releases: 1
Created over 3 years ago · Last pushed over 2 years ago
Metadata Files
Readme Contributing License

README.md

GMP-featurizer


This package efficiently and accurately computes the GMP features and their derivatives for any chemical system. The computation is parallelized via Ray.

The theory behind the Gaussian Multipole (GMP) descriptors is described in the original paper and in its arXiv version.

Part of the code in this package is based on the AmpTorch package.

Installation

To install this package, clone this repo:

    git clone https://github.com/TRI-AMDD/GMP-featurizer
    cd GMP-featurizer

Then install the requirements and the package itself:

    pip install -r requirements.txt
    pip install -e .
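
After installation, a quick import check can confirm that the package is visible to the current Python environment. The sketch below uses only the standard library. The module name `GMPFeaturizer` is taken from the import statement in the usage examples of this README; the helper name `is_installed` is an illustrative choice and not part of the package.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the import system can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None


# "GMPFeaturizer" is the module name used in this README's examples.
if is_installed("GMPFeaturizer"):
    print("GMPFeaturizer is importable")
else:
    print("GMPFeaturizer not found; check that `pip install -e .` succeeded")
```

This only checks that the module can be located; it does not import it or verify that the compiled parts of the package work.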

Basic usage

Please refer to the example notebooks for more detailed tutorials.

An example CIF file is provided in the "examples" directory.

Import modules and load data

```
import numpy as np
from GMPFeaturizer import GMPFeaturizer, ASEAtomsConverter, PymatgenStructureConverter
from ase.io import read as aseread

# Load the CIF file as an ASE Atoms object
image = aseread("./examples/test.cif")

# The input to the featurizer should be a non-empty list
images = [image]

# Initialize the converter; here it is the converter for ASE Atoms objects.
# A converter for pymatgen Structure objects is also provided.
converter = ASEAtomsConverter()
# converter = PymatgenStructureConverter()
```

Set up the featurizer

The list of features is the Cartesian product of orders and sigmas, with one exception: order -1 corresponds to the local electron density alone, so the choice of sigma does not matter and there is only one feature for order -1.

With orders [-1, 0, 1, 2] and sigmas [0.1, 0.2, 0.3], the list of features is

[(-1, 0), (0, 0.1), (0, 0.2), (0, 0.3), (1, 0.1), (1, 0.2), (1, 0.3), (2, 0.1), (2, 0.2), (2, 0.3)]

where the first number is the order of the MCSH angular probe and the second number is the sigma of the Gaussian radial probe.

```
GMPs = {
    "GMPs": {
        "orders": [-1, 0, 1, 2],
        "sigmas": [0.1, 0.2, 0.3],
    },
    # path to the pseudopotential file
    "psp_path": "/NC-SR.gpsp",
    # basically the accuracy of the resulting features
    "overlap_threshold": 1e-16,
    # whether the features are squared;
    # no need to change if you are not considering the feature derivatives
    # "square": False,
}

featurizer = GMPFeaturizer(GMPs=GMPs, converter=converter, calc_derivatives=True, verbose=True)
```

Set calc_derivatives=True if you want the feature derivatives with respect to atom positions, which are stored as sparse matrices.
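
The Cartesian-product rule for building the feature list can be sketched in a few lines of plain Python. This is only an illustration of the rule as described in this README, not the package's internal implementation; `expand_gmp_features` is a hypothetical helper, and placing the order -1 feature first is an assumption made to match the list shown in this section.

```python
from itertools import product


def expand_gmp_features(orders, sigmas):
    """Expand orders and sigmas into a list of (order, sigma) pairs.

    Order -1 is emitted once as (-1, 0), since sigma has no effect on it.
    All other orders are paired with every sigma.
    """
    features = []
    if -1 in orders:
        features.append((-1, 0))
    non_negative_orders = [order for order in orders if order >= 0]
    features.extend(product(non_negative_orders, sigmas))
    return features


feature_list = expand_gmp_features([-1, 0, 1, 2], [0.1, 0.2, 0.3])
print(feature_list)
print(len(feature_list))  # 1 + 3 * 3 = 10 features
```

With four orders and three sigmas, the full product would have 12 entries; collapsing order -1 to a single feature gives 10.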

Calculate features and access data

Use the "cores" argument to set the number of cores used for parallelization. The converter must already have been specified when the featurizer was initialized.

```
result = featurizer.prepare_features(images, cores=5)

features = [entry["features"] for entry in result]
feature_primes = [entry["feature_primes"] for entry in result]
```
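
When choosing a value for "cores", one simple heuristic is to cap it by both the number of input structures and the number of CPUs on the machine. The sketch below is an assumption-laden helper, not part of the package: `pick_cores` and its `reserve` parameter are hypothetical, and it assumes work is distributed per structure, which this README does not state.

```python
import os


def pick_cores(n_images, reserve=1):
    """Pick a worker count: at most one per image, leaving `reserve` CPUs free.

    Always returns at least 1.
    """
    available = os.cpu_count() or 1  # os.cpu_count() can return None
    usable = max(1, available - reserve)
    return max(1, min(n_images, usable))


print(pick_cores(n_images=100))
```

The result could then be passed along as `featurizer.prepare_features(images, cores=pick_cores(len(images)))`.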

Specifying the list of GMP features

It's also possible to manually specify the list of GMP features to be computed, instead of specifying orders and sigmas.

    GMPs = {
        "GMPs_detailed_list": [(-1, 0), (0, 0.1), (0, 0.2), (0, 0.3), (1, 0.2), (1, 0.3), (2, 0.3)],
        # path to the pseudopotential file
        "psp_path": "./NC-SR.gpsp",
        # basically the accuracy of the resulting features
        "overlap_threshold": 1e-16,
        # whether the features are squared; no need to change if you do not need the feature derivatives
        # "square": False,
    }
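
A detailed list like the one above can also be generated programmatically, for example from a mapping of each order to the sigmas wanted for it. The following is a minimal sketch; `detailed_list_from_mapping` and `sigmas_by_order` are hypothetical names introduced for illustration, and only the resulting list of (order, sigma) tuples corresponds to what the README passes as "GMPs_detailed_list".

```python
def detailed_list_from_mapping(sigmas_by_order):
    """Flatten {order: [sigma, ...]} into (order, sigma) tuples.

    Orders are visited in ascending order; sigmas keep the order given.
    """
    return [
        (order, sigma)
        for order in sorted(sigmas_by_order)
        for sigma in sigmas_by_order[order]
    ]


# Per-order selection that reproduces the detailed list shown above.
sigmas_by_order = {
    -1: [0],
    0: [0.1, 0.2, 0.3],
    1: [0.2, 0.3],
    2: [0.3],
}

gmps_detailed_list = detailed_list_from_mapping(sigmas_by_order)
print(gmps_detailed_list)
```

Compared with the full Cartesian product of orders [-1, 0, 1, 2] and sigmas [0.1, 0.2, 0.3], this selection leaves out (1, 0.1), (2, 0.1), and (2, 0.2).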

Whole Script

```
import numpy as np
from GMPFeaturizer import GMPFeaturizer, ASEAtomsConverter, PymatgenStructureConverter

# load data
from ase.io import read as aseread
image = aseread("./examples/test.cif")
images = [image]

converter = ASEAtomsConverter()
# converter = PymatgenStructureConverter()

# set up featurizer
GMPs = {
    "GMPs": {
        "orders": [-1, 0, 1, 2],
        "sigmas": [0.1, 0.2, 0.3],
    },
    # path to the pseudopotential file
    "psp_path": "/NC-SR.gpsp",
    # basically the accuracy of the resulting features
    "overlap_threshold": 1e-16,
    # whether the features are squared;
    # no need to change if you are not considering the feature derivatives
    # "square": False,
}
featurizer = GMPFeaturizer(GMPs=GMPs, converter=converter, calc_derivatives=True, verbose=True)

# calculate features
result = featurizer.prepare_features(images, cores=5)

# access data
features = [entry["features"] for entry in result]
feature_primes = [entry["feature_primes"] for entry in result]
```

Save calculated features to / load calculated features from a local folder

Simply set save_features=True when calling the prepare_features function.

The path to the local database is set when initializing the featurizer:

    featurizer = GMPFeaturizer(GMPs=GMPs, converter=converter, calc_derivatives=False, feature_database="cache/features/")
    features = featurizer.prepare_features(images, cores=5, save_features=True)

License

Apache 2.0

Copyright 2023 Toyota Research Institute

Owner

  • Name: Toyota Research Institute - Accelerated Materials Design & Discovery (AMDD)
  • Login: TRI-AMDD
  • Kind: organization

JOSS Publication

GMP-Featurizer: A parallelized Python package for efficiently computing the Gaussian Multipole features of atomic systems
Published
August 10, 2023
Volume 8, Issue 88, Page 5476
Authors
Xiangyun Lei ORCID
Toyota Research Institute, Los Altos, CA, United States of America
Joseph Montoya ORCID
Toyota Research Institute, Los Altos, CA, United States of America
Editor
Rachel Kurchin ORCID
Tags
Parallelization Machine Learning Chemistry Molecular Dynamics

GitHub Events

Total
  • Fork event: 1
Last Year
  • Fork event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 149
  • Total Committers: 3
  • Avg Commits per committer: 49.667
  • Development Distribution Score (DDS): 0.087
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Ray-Lei r****i@t****l 136
Joseph Montoya j****a@t****l 12
Bradley Dice b****e@b****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 7
  • Total pull requests: 4
  • Average time to close issues: 19 days
  • Average time to close pull requests: about 7 hours
  • Total issue authors: 5
  • Total pull request authors: 3
  • Average comments per issue: 3.14
  • Average comments per pull request: 1.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • isdanni (2)
  • ltimmerman3 (1)
  • yw-fang (1)
  • JosephMontoya-TRI (1)
  • bdice (1)
Pull Request Authors
  • JosephMontoya-TRI (2)
  • bdice (1)
  • montoyjh (1)