https://github.com/ncfrey/pumml

Positive and Unlabeled Materials Machine Learning (pumml) is a code that uses semi-supervised machine learning to classify materials from only positive and unlabeled examples.

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: acs.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.1%) to scientific vocabulary

Keywords

chemistry density-functional-theory machine-learning materials-design materials-discoveries materials-informatics materials-science physics positive-unlabeled-learning
Last synced: 9 months ago

Repository

Statistics
  • Stars: 37
  • Watchers: 4
  • Forks: 13
  • Open Issues: 3
  • Releases: 0
Created almost 7 years ago · Last pushed over 2 years ago
Metadata Files
Readme, Changelog, Contributing, License, Code of conduct

README.md

pumml

Python versions (per the README badges): 3.6, 3.7

Positive and Unlabeled Materials Machine Learning (pumml) is a code that uses semi-supervised positive and unlabeled (PU) machine learning to classify materials when data is incomplete and only examples of "positive" materials are available. As an example, pumml was used to predict the "synthesizability" of bulk and 2D materials from "positive" examples of synthesized materials.

How to cite pumml

If you use pumml in your research, please cite the following work:

Nathan C. Frey, Jin Wang, Gabriel Iván Vega Bellido, Babak Anasori, Yury Gogotsi, and Vivek B. Shenoy. Prediction of Synthesis of 2D Metal Carbides and Nitrides (MXenes) and Their Precursors with Positive and Unlabeled Machine Learning. ACS Nano 2019 13 (3), 3031-3041.

Please also consider citing the original works that establish the underlying methodology of pumml:

Elkan, Charles, and Keith Noto. Learning classifiers from only positive and unlabeled data. Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2008.

Mordelet, F.; Vert, J.-P. A Bagging SVM to Learn from Positive and Unlabeled Examples. Pattern Recognit. Lett. 2014, 37, 201−209.

Getting pumml

The easiest way to get started with pumml is to create a virtual environment with Python 3.6 and then run pip install pumml.

Alternatively, create a virtual environment, clone this repo, and run python setup.py install in the root directory.

Using pumml

In the example_notebooks folder you will find a Jupyter notebook called basic_example.ipynb that shows the basic functionality of the package. The notebook materials_project_example.ipynb shows how to use pumml to predict the synthetic accessibility of theoretical materials in the Materials Project database. Static images of Materials Project data are available on figshare for experimenting with pumml.

About pumml

More information about using PU learning for materials synthesis prediction can be found in our publication: DOI: 10.1021/acsnano.8b08014 https://pubs.acs.org/doi/abs/10.1021/acsnano.8b08014

Helpful PU learning wrappers for scikit-learn can be found at: Alexandre Drouin, pu-learning, 2013, https://github.com/aldro61/pu-learning
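
As a rough illustration of the Elkan and Noto approach cited above, the sketch below fits a classifier to separate labeled from unlabeled samples and then rescales its output by an estimate of c = p(labeled | positive), computed on held-out labeled positives. It is not the API of pumml or of the pu-learning repository; the function names, the choice of logistic regression, and the 20% hold-out split are assumptions made for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def elkan_noto_fit(X, s, random_state=0):
    """Fit g(x) ~ p(s=1 | x) and estimate c = p(s=1 | y=1).

    X: feature matrix; s: 1 for labeled positives, 0 for unlabeled samples.
    (Hypothetical helper, not part of any library.)
    """
    X_tr, X_ho, s_tr, s_ho = train_test_split(
        X, s, test_size=0.2, stratify=s, random_state=random_state
    )
    clf = LogisticRegression().fit(X_tr, s_tr)
    # c is estimated as the mean of g(x) over held-out labeled positives.
    c = clf.predict_proba(X_ho[s_ho == 1])[:, 1].mean()
    return clf, c


def elkan_noto_predict_proba(clf, c, X):
    """Estimate p(y=1 | x) = g(x) / c, clipped to the range [0, 1]."""
    return np.clip(clf.predict_proba(X)[:, 1] / c, 0.0, 1.0)


# Synthetic demo: two Gaussian clusters, with half of the positives labeled.
rng = np.random.default_rng(0)
X_true_pos = rng.normal(2.0, 1.0, size=(200, 2))
X_true_neg = rng.normal(-2.0, 1.0, size=(200, 2))
X = np.vstack([X_true_pos, X_true_neg])
s = np.zeros(400, dtype=int)
s[:100] = 1  # only the first 100 positives carry a label

clf, c = elkan_noto_fit(X, s)
p_pos = elkan_noto_predict_proba(clf, c, X_true_pos).mean()
p_neg = elkan_noto_predict_proba(clf, c, X_true_neg).mean()
print(f"estimated c = {c:.2f}, mean p(y=1) positives = {p_pos:.2f}, negatives = {p_neg:.2f}")
```

The rescaling relies on the assumption that labeled positives are selected completely at random from all positives; when that does not hold, the estimate of c can be biased.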

In addition to our transductive bagging scheme with decision tree base classifiers, we recommend the robust ensemble of support vector machines (RESVM) method introduced by Claesen et al. RESVM is an alternative PU learning method that provides an excellent benchmark. It is implemented here: Marc Claesen, EnsembleSVM, 2014, https://github.com/claesenm/EnsembleSVM. A Python wrapper is available here: Marc Claesen, resvm, 2014, https://github.com/claesenm/resvm.
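
To make the idea of transductive bagging with decision tree base classifiers concrete, here is a minimal sketch in the spirit of Mordelet and Vert. It does not reproduce pumml's own implementation or API: the function name pu_bagging_scores, the default of 100 estimators, and the choice to draw as many unlabeled samples as there are positives are all assumptions made for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def pu_bagging_scores(X_pos, X_unl, n_estimators=100, random_state=0):
    """Score each unlabeled sample by averaging out-of-bag tree predictions.

    (Hypothetical helper, not part of pumml.)
    """
    rng = np.random.default_rng(random_state)
    n_pos, n_unl = len(X_pos), len(X_unl)
    score_sum = np.zeros(n_unl)
    oob_count = np.zeros(n_unl)

    for _ in range(n_estimators):
        # Bootstrap unlabeled samples and treat them as provisional negatives.
        idx = rng.choice(n_unl, size=n_pos, replace=True)
        X_train = np.vstack([X_pos, X_unl[idx]])
        y_train = np.concatenate([np.ones(n_pos), np.zeros(n_pos)])
        tree = DecisionTreeClassifier(random_state=int(rng.integers(1 << 31)))
        tree.fit(X_train, y_train)

        # Only score unlabeled samples that this tree was not trained on.
        oob = np.setdiff1d(np.arange(n_unl), idx)
        if oob.size == 0:
            continue
        score_sum[oob] += tree.predict_proba(X_unl[oob])[:, 1]
        oob_count[oob] += 1

    # Samples never left out of bag keep a score of 0.
    return score_sum / np.maximum(oob_count, 1)


# Synthetic demo: the unlabeled set hides 50 positives among 50 negatives.
rng = np.random.default_rng(1)
X_pos = rng.normal(2.0, 1.0, size=(50, 2))
X_unl = np.vstack([
    rng.normal(2.0, 1.0, size=(50, 2)),   # hidden positives
    rng.normal(-2.0, 1.0, size=(50, 2)),  # true negatives
])
scores = pu_bagging_scores(X_pos, X_unl)
print("mean score, hidden positives:", scores[:50].mean())
print("mean score, true negatives:", scores[50:].mean())
```

Averaging only out-of-bag predictions is what makes the scheme transductive: every unlabeled sample is scored by trees that never saw it as a provisional negative.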

License

This code is made available under the MIT License.

Owner

  • Name: Nathan Frey
  • Login: ncfrey
  • Kind: user
  • Location: Manhattan, NY
  • Company: Prescient Design • Genentech

Machine Learning Scientist at Prescient Design • Genentech. Previously postdoc at MIT, PhD at UPenn, and scientist at Berkeley Lab.

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 66
  • Total Committers: 4
  • Avg Commits per committer: 16.5
  • Development Distribution Score (DDS): 0.303
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name               Email            Commits
ncfrey             n****y@u****m    46
Vishnu Harshith    6****h@u****m    13
Nathan Frey        n****3@g****m    4
dependabot[bot]    4****]@u****m    3
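
The all-time figures above can be reproduced from the commit counts in the table. The page does not define the Development Distribution Score, so the formula used here, one minus the top committer's share of all commits, is an assumption; it happens to match the reported value.

```python
# Commit counts copied from the Top Committers table.
commits = {
    "ncfrey": 46,
    "Vishnu Harshith": 13,
    "Nathan Frey": 4,
    "dependabot[bot]": 3,
}

total = sum(commits.values())
avg_per_committer = total / len(commits)
# Assumed definition: DDS = 1 - (commits by top committer / total commits).
dds = 1 - max(commits.values()) / total

print(total, avg_per_committer, round(dds, 3))  # prints: 66 16.5 0.303
```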

Issues and Pull Requests

Last synced: 12 months ago

All Time
  • Total issues: 3
  • Total pull requests: 15
  • Average time to close issues: 3 months
  • Average time to close pull requests: 6 days
  • Total issue authors: 1
  • Total pull request authors: 3
  • Average comments per issue: 0.33
  • Average comments per pull request: 0.2
  • Merged pull requests: 12
  • Bot issues: 0
  • Bot pull requests: 6
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ncfrey (3)
Pull Request Authors
  • dependabot[bot] (8)
  • ncfrey (7)
  • VishnuHarshith (2)
Top Labels
Issue Labels
  • enhancement (3)
  • good first issue (1)
Pull Request Labels
  • dependencies (8)
  • enhancement (1)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 21 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 3
  • Total maintainers: 1
pypi.org: pumml

Positive and Unlabeled Materials Machine Learning (pumml) is a code that uses semi-supervised positive and unlabeled (PU) machine learning to classify materials when data is incomplete and only examples of 'positive' materials are available.

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 21 Last month
Rankings
Forks count: 9.8%
Dependent packages count: 10.1%
Stargazers count: 11.1%
Average: 16.9%
Dependent repos count: 21.6%
Downloads: 31.8%
Maintainers (1)
Last synced: 9 months ago
conda-forge.org: pumml

Positive and Unlabeled Materials Machine Learning (pumml) is a code that uses semi-supervised positive and unlabeled (PU) machine learning to classify materials when data is incomplete and only examples of "positive" materials are available. As an example, pumml was used to predict the "synthesizability" of bulk and 2D materials from "positive" examples of synthesized materials.

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 34.0%
Forks count: 38.1%
Average: 41.2%
Stargazers count: 41.4%
Dependent packages count: 51.2%
Last synced: 9 months ago

Dependencies

requirements.txt pypi
  • matminer ==0.6.3
  • matplotlib >=3.1.1
  • monty >=2.0.4
  • numpy ==1.19.4
  • pandas ==1.1.4
  • pymatgen ==2020.11.11
  • scikit-learn ==0.23.2
  • scipy >=1.3.0
  • seaborn >=0.9.0
setup.py pypi
  • matminer ==0.6.3
  • matplotlib >=3.1.1
  • monty >=2.0.4
  • numpy ==1.19.4
  • pandas ==1.1.4
  • pymatgen ==2020.11.11
  • scikit-learn ==0.23.2
  • scipy >=1.3.0
  • seaborn >=0.9.0