https://github.com/suncat-center/catlearn

A machine learning environment for atomic-scale modeling in surface science and catalysis.

Science Score: 33.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, aps.org, zenodo.org
  • Committers with academic emails
    4 of 20 committers (20.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (19.9%) to scientific vocabulary

Keywords

atomistic-machine-learning catalysis catalyst computational-chemistry machine-learning materials-informatics materials-science nanotechnology python

Keywords from Contributors

chemical-kinetics catalysis-informatics chemical-engineering chemical-reaction-networks combinatorics
Last synced: 5 months ago

Repository

Basic Info
Statistics
  • Stars: 114
  • Watchers: 18
  • Forks: 67
  • Open Issues: 11
  • Releases: 8
Topics
atomistic-machine-learning catalysis catalyst computational-chemistry machine-learning materials-informatics materials-science nanotechnology python
Created almost 8 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog Contributing License Code of conduct

README.md

CatLearn

An environment for atomistic machine learning in Python for applications in catalysis.

License: GPL v3

Utilities for building and testing atomic machine learning models. Gaussian process (GP) regression routines are implemented. They accept NumPy arrays of training and test feature matrices along with a vector of target values.

In general, any data prepared in this fashion can be fed to the GP routines. A number of additional functions interface with ASE. This integration allows atoms objects to be manipulated through GP predictions and descriptors to be generated dynamically using the many ASE functions.

CatLearn also includes the MLNEB algorithm for efficient transition state search, and the MLMIN algorithm for efficient atomic structure optimization.

Please see the tutorials for a detailed overview of what the code can do and the conventions used in setting up the predictive models. For an overview of all the functionality available, please read the documentation.

Installation

The easiest way to install the code is with:

```shell
$ pip install catlearn
```

This will automatically install the code as well as the dependencies.

Installation without dependencies

If you want to install catlearn without dependencies, you can do:

```shell
$ pip install catlearn --no-deps
```

MLMIN and MLNEB need only ASE 3.17.0 or newer to run, but other parts of the code need the dependencies listed in requirements.txt.
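
One way to confirm that an installed ASE satisfies the 3.17.0 minimum is to compare version tuples rather than raw strings, since a plain string comparison would rank "3.9.0" above "3.17.0". The sketch below is illustrative and not part of CatLearn; `parse_version` and `meets_minimum` are hypothetical helper names, and pre-release suffixes are simply ignored.

```python
from itertools import takewhile


def parse_version(text):
    """Turn a string such as '3.17.0' into the tuple (3, 17, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        # Keep only the leading digits, so '0b1' is read as 0.
        digits = "".join(takewhile(str.isdigit, piece))
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '3.17' to three components.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, required="3.17.0"):
    """Return True if the installed version is at least the required one."""
    return parse_version(installed) >= parse_version(required)
```

With ASE installed, `meets_minimum(ase.__version__)` should indicate whether the minimum stated above is met.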

Developer installation

```shell
$ git clone https://github.com/SUNCAT-Center/CatLearn.git
```

Then add <install_dir>/ to your $PYTHONPATH environment variable.

You can install the dependencies with:

```shell
$ pip install -r requirements.txt
```
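
If changing `$PYTHONPATH` is not convenient, a script can add the clone to the import path at runtime instead. This is a minimal sketch, not something the CatLearn documentation prescribes; `ensure_on_path` is a hypothetical helper and `~/CatLearn` is an assumed clone location.

```python
import os
import sys


def ensure_on_path(install_dir):
    """Prepend install_dir to sys.path unless it is already present."""
    resolved = os.path.abspath(os.path.expanduser(install_dir))
    if resolved not in sys.path:
        sys.path.insert(0, resolved)
    return resolved


# Assumed clone location; replace it with your own <install_dir>.
clone_dir = ensure_on_path("~/CatLearn")
```

This only affects the running interpreter, so it has to be executed before any `import catlearn` statement in the same script.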

Docker

To use the docker image, it is necessary to have docker installed and running. After cloning the project, build and run the image as follows:

```shell
$ docker build -t catlearn .
```

The image can then be used in two ways. It can be run as a bash environment in which CatLearn can be used with all dependencies in place:

```shell
$ docker run -it catlearn bash
```

Alternatively, Python can be run directly from the docker image:

```shell
$ docker run -it catlearn python2 [file.py]
$ docker run -it catlearn python3 [file.py]
```

Use Ctrl + d to exit the docker image when done.

Optional Dependencies

The tutorial scripts generally produce plots of the results. To run them, it is advisable to have at least matplotlib installed:

```shell
$ pip install matplotlib seaborn
```

Tutorials

Helpful examples and test scripts are present in tutorials.

Usage

Set up CatLearn's Gaussian Process model and make some predictions using the following lines of code:

```python
import numpy as np
from catlearn.regression import GaussianProcess

# Define some input data.
train_features = np.arange(200).reshape(50, 4)
target = np.random.random_sample((50,))
test_features = np.arange(100).reshape(25, 4)

# Setup the kernel.
kernel = [{'type': 'gaussian', 'width': 0.5}]

# Train the GP model.
gp = GaussianProcess(kernel_list=kernel, regularization=1e-3,
                     train_fp=train_features, train_target=target,
                     optimize_hyperparameters=True)

# Get the predictions.
prediction = gp.predict(test_fp=test_features)
```
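
To give a rough idea of what a GP prediction with a Gaussian kernel involves, the following NumPy-only sketch computes the posterior mean and variance of a zero-mean GP. It is not CatLearn's implementation: the function names are hypothetical, the hyperparameters are fixed rather than optimized, and adding the regularization directly to the kernel diagonal is an assumption made for this example.

```python
import numpy as np


def gaussian_kernel(a, b, width):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / width ** 2)


def gp_predict(train_x, train_y, test_x, width=0.5, regularization=1e-3):
    """Posterior mean and variance of a zero-mean GP with unit prior variance."""
    k_train = gaussian_kernel(train_x, train_x, width)
    # Assumption: the regularization acts as a noise term on the diagonal.
    k_train[np.diag_indices_from(k_train)] += regularization
    k_cross = gaussian_kernel(test_x, train_x, width)
    # Solve with a Cholesky factor instead of forming an explicit inverse.
    chol = np.linalg.cholesky(k_train)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, train_y))
    mean = k_cross @ alpha
    v = np.linalg.solve(chol, k_cross.T)
    variance = 1.0 - np.sum(v ** 2, axis=0)
    return mean, variance


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
train_x = rng.uniform(0.0, 1.0, size=(30, 2))
train_y = np.sin(train_x.sum(axis=1))
mean, variance = gp_predict(train_x, train_y, train_x[:5])
```

Predicting at points that are already in the training set, as done here, should return values close to the training targets with a small variance.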

Functionality

CatLearn provides functionality to assist in handling atomic data and building models. This includes:

  • API to other codes:
  • Fingerprint generators:
    • Bulk systems
    • Support/slab systems
    • Discrete systems
  • Preprocessing routines:
    • Data cleaning
    • Feature elimination
    • Feature engineering
    • Feature extraction
    • Feature scaling
  • Regression methods:
    • Regularized ridge regression
    • Gaussian processes regression
  • Cross-validation:
    • K-fold cv
    • Ensemble k-fold cv
  • Machine Learning Algorithms:
    • Machine Learning Nudged Elastic Band (ML-NEB) algorithm
  • General utilities:
    • K-means clustering
    • Neighborlist generators
    • Penalty functions
    • SQLite db storage

How to cite CatLearn

If you find CatLearn useful in your research, please cite:

1) M. H. Hansen, J. A. Garrido Torres, P. C. Jennings, 
   Z. Wang, J. R. Boes, O. G. Mamun and T. Bligaard.
   An Atomistic Machine Learning Package for Surface Science and Catalysis.
   https://arxiv.org/abs/1904.00904

If you use CatLearn's ML-NEB module, please cite:

2) J. A. Garrido Torres, M. H. Hansen, P. C. Jennings,
   J. R. Boes and T. Bligaard. Phys. Rev. Lett. 122, 156001.
   https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.122.156001

Contribution

Anyone is welcome to contribute to the project. Please see the contribution guide for help setting up a local copy of the code. There are some TODO items in the README files for the various modules that give suggestions on parts of the code that could be improved.

Owner

  • Name: SUNCAT-Center
  • Login: SUNCAT-Center
  • Kind: organization

GitHub Events

Total
  • Watch event: 11
  • Fork event: 5
Last Year
  • Watch event: 11
  • Fork event: 5

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 1,580
  • Total Committers: 20
  • Avg Commits per committer: 79.0
  • Development Distribution Score (DDS): 0.439
Past Year
  • Commits: 11
  • Committers: 3
  • Avg Commits per committer: 3.667
  • Development Distribution Score (DDS): 0.545
Top Committers
Name Email Commits
Paul C. Jennings j****c@g****m 886
mhangaard m****d@g****m 532
Jose A Garrido Torres j****t@s****u 82
Jacob Boes j****s@g****m 13
schlexer p****r@g****m 11
Jose A. Garrido Torres 3****s@u****m 9
Raul Flores r****2@g****m 7
Andrew Doyle a****5@g****m 6
Martin Hangaard Hansen h****d@s****u 6
Martin Hangaard Hansen m****n@t****m 5
Ziyun Wang z****g@l****m 5
Vladilsav Ivanistsev 5****v@u****m 4
Max Hoffmann m****n@g****m 3
Vieri Wijaya 5****6@u****m 2
Igor Kowalec 5****c@u****m 2
dependabot[bot] 4****]@u****m 2
Markus Ekvall m****l@d****u 2
Markus Ekvall m****l@M****l 1
mamunm m****m@s****u 1
Jiang Li 4****t@u****m 1
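
The all-time statistics above are consistent with a Development Distribution Score defined as one minus the top committer's share of all commits. The sketch below recomputes the figures from the table rows under that assumed definition; `development_distribution_score` is a hypothetical helper name.

```python
def development_distribution_score(commit_counts):
    """Return 1 - (commits by the top committer / total commits)."""
    total = sum(commit_counts)
    if total == 0:
        return 0.0
    return 1.0 - max(commit_counts) / total


# Commit counts copied from the Top Committers table, one entry per row.
all_time = [886, 532, 82, 13, 11, 9, 7, 6, 6, 5,
            5, 4, 3, 2, 2, 2, 2, 1, 1, 1]

dds = development_distribution_score(all_time)
average_commits = sum(all_time) / len(all_time)
```

Rounded to three decimals, `dds` reproduces the listed 0.439, and `average_commits` reproduces the listed 79.0 commits per committer.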

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 26
  • Total pull requests: 77
  • Average time to close issues: 5 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 15
  • Total pull request authors: 14
  • Average comments per issue: 1.65
  • Average comments per pull request: 1.18
  • Merged pull requests: 68
  • Bot issues: 0
  • Bot pull requests: 3
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • pcjennings (5)
  • mhangaard (4)
  • jagarridotorres (3)
  • kimrojas (2)
  • mhoffman (2)
  • mustafaalsalmi1999 (1)
  • jan-janssen (1)
  • keeeto (1)
  • zyt0y (1)
  • schumannj (1)
  • ehermes (1)
  • vieri2006 (1)
  • ssyrnyk (1)
  • lixinyuu (1)
  • raulf2012 (1)
Pull Request Authors
  • mhangaard (37)
  • pcjennings (13)
  • jagarridotorres (6)
  • schlexer (3)
  • dependabot[bot] (3)
  • mamunm (2)
  • vladislavivanistsev (2)
  • jianglst (2)
  • Sudo-Raheel (2)
  • vieri2006 (2)
  • mhoffman (2)
  • raulf2012 (2)
  • jboes (1)
  • ikowalec (1)
Top Labels
Issue Labels
enhancement (5) bug (3) help wanted (2)
Pull Request Labels
enhancement (4) dependencies (3) bug (1) duplicate (1)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 127 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 28
  • Total maintainers: 3
pypi.org: catlearn

Machine Learning using atomic-scale calculations.

  • Versions: 25
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 127 Last month
Rankings
Forks count: 5.5%
Stargazers count: 7.3%
Dependent packages count: 10.0%
Average: 11.4%
Downloads: 12.5%
Dependent repos count: 21.7%
Last synced: 6 months ago
conda-forge.org: catlearn

Utilities for building and testing atomic machine learning models. Gaussian process (GP) regression routines are implemented. They accept NumPy arrays of training and test feature matrices along with a vector of target values. In general, any data prepared in this fashion can be fed to the GP routines. A number of additional functions interface with ASE. This integration allows atoms objects to be manipulated through GP predictions and descriptors to be generated dynamically using the many ASE functions.

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Forks count: 22.8%
Stargazers count: 32.3%
Dependent repos count: 34.0%
Average: 35.1%
Dependent packages count: 51.2%
Last synced: 6 months ago
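
The "Average" entry in each Rankings block matches the arithmetic mean of the other percentile rankings in that block. The following sketch checks this for both packages; the dictionary keys are labels chosen for this example.

```python
def average_ranking(percentiles):
    """Arithmetic mean of a collection of percentile rankings."""
    values = list(percentiles)
    return sum(values) / len(values)


# Percentile rankings copied from the two Rankings blocks.
pypi = {"forks": 5.5, "stargazers": 7.3, "dependent_packages": 10.0,
        "downloads": 12.5, "dependent_repos": 21.7}
conda_forge = {"forks": 22.8, "stargazers": 32.3,
               "dependent_repos": 34.0, "dependent_packages": 51.2}

pypi_average = round(average_ranking(pypi.values()), 1)
conda_average = round(average_ranking(conda_forge.values()), 1)
```

The results agree with the listed averages of 11.4% for pypi and 35.1% for conda-forge.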

Dependencies

Pipfile pypi
  • gpflow * develop
  • matplotlib * develop
  • piprot * develop
  • pyinstrument * develop
  • pytest-cov * develop
  • recommonmark * develop
  • seaborn * develop
  • setuptools * develop
  • sphinx * develop
  • sphinx-autobuild * develop
  • sphinx-rtd-theme * develop
  • sphinxcontrib-napoleon * develop
  • sty ==1.0.0b6 develop
  • twine * develop
  • ase >=3.17.0
  • h5py >=2.7.1
  • networkx >=2.1.0
  • numpy >=1.14.3
  • pandas >=0.23.0
  • scikit-learn >=0.19.1
  • scipy >=1.1.0
  • tqdm *
Pipfile.lock pypi
  • alabaster ==0.7.10 develop
  • argh ==0.26.2 develop
  • attrs ==18.1.0 develop
  • babel ==2.5.3 develop
  • certifi ==2018.4.16 develop
  • chardet ==3.0.4 develop
  • commonmark ==0.5.4 develop
  • coverage ==4.5.1 develop
  • cycler ==0.10.0 develop
  • docutils ==0.14 develop
  • gpflow ==1.1.1 develop
  • idna ==2.6 develop
  • imagesize ==1.0.0 develop
  • jinja2 >=2.10.1 develop
  • kiwisolver ==1.0.1 develop
  • livereload ==2.5.2 develop
  • markupsafe ==1.0 develop
  • matplotlib ==2.2.2 develop
  • more-itertools ==4.1.0 develop
  • multipledispatch ==0.5.0 develop
  • numpy ==1.14.3 develop
  • packaging ==17.1 develop
  • pandas ==0.23.0 develop
  • pathtools ==0.1.2 develop
  • piprot ==0.9.10 develop
  • pkginfo ==1.4.2 develop
  • pluggy ==0.6.0 develop
  • pockets ==0.6.2 develop
  • port-for ==0.3.1 develop
  • py ==1.5.3 develop
  • pygments ==2.2.0 develop
  • pyinstrument ==2.0.2 develop
  • pyinstrument-cext ==0.1.6 develop
  • pyparsing ==2.2.0 develop
  • pytest ==3.5.1 develop
  • pytest-cov ==2.5.1 develop
  • python-dateutil ==2.7.3 develop
  • pytz ==2018.4 develop
  • pyyaml >=4.2b1 develop
  • recommonmark ==0.4.0 develop
  • requests >=2.20.0 develop
  • requests-futures ==0.9.7 develop
  • requests-toolbelt ==0.8.0 develop
  • scipy ==1.1.0 develop
  • seaborn ==0.8.1 develop
  • six ==1.11.0 develop
  • snowballstemmer ==1.2.1 develop
  • sphinx ==1.7.4 develop
  • sphinx-autobuild ==0.7.1 develop
  • sphinx-rtd-theme ==0.3.1 develop
  • sphinxcontrib-napoleon ==0.6.1 develop
  • sphinxcontrib-websupport ==1.0.1 develop
  • sty ==1.0.0b6 develop
  • tornado ==5.0.2 develop
  • tqdm ==4.23.3 develop
  • twine ==1.11.0 develop
  • urllib3 >=1.23 develop
  • watchdog ==0.8.3 develop
  • ase ==3.16.0
  • click ==6.7
  • cycler ==0.10.0
  • decorator ==4.3.0
  • flask ==1.0.2
  • h5py ==2.7.1
  • itsdangerous ==0.24
  • jinja2 >=2.10.1
  • kiwisolver ==1.0.1
  • markupsafe ==1.0
  • matplotlib ==2.2.2
  • networkx ==2.1.0
  • numpy ==1.14.3
  • pandas ==0.23.0
  • pyparsing ==2.2.0
  • python-dateutil ==2.7.3
  • pytz ==2018.4
  • scikit-learn ==0.19.1
  • scipy ==1.1.0
  • six ==1.11.0
  • tqdm ==4.23.3
  • werkzeug >=0.15.3
requirements.txt pypi
  • ase >=3.17.0
  • click >=6.7
  • cycler >=0.10.0
  • decorator >=4.3.0
  • flask >=1.0.2
  • h5py >=2.7.1
  • itsdangerous >=0.24
  • jinja2 >=2.10
  • kiwisolver >=1.0.1
  • markupsafe >=1.0
  • networkx >=2.1.0
  • numpy >=1.14.3
  • pandas >=0.24.0
  • psutil >=5.4.3
  • pyparsing >=2.2.0
  • python-dateutil >=2.7.3
  • pytz >=2018.4
  • scikit-learn >=0.19.1
  • scipy >=1.1.0
  • six >=1.11.0
  • tqdm >=4.23.3
  • werkzeug >=0.14.1
Dockerfile docker
  • jenningspc/catlearn latest build
setup/Dockerfile docker
  • ubuntu latest build