pyepal

Multiobjective active learning with tunable accuracy/efficiency tradeoff and clear stopping criterion.

https://github.com/lamalab-org/pyepal

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
✓ Academic publication links: Links to nature.com, zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (5.4%) to scientific vocabulary

Keywords

active-learning hacktoberfest machine-learning multiobjective pareto python

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: lamalab-org
License: apache-2.0
Language: Python
Default Branch: main
Homepage:
Size: 10.5 MB

Statistics

Stars: 40
Watchers: 3
Forks: 6
Open Issues: 25
Releases: 0

Topics

active-learning hacktoberfest machine-learning multiobjective pareto python

Created almost 6 years ago · Last pushed 11 months ago

Metadata Files

Readme Changelog Contributing License Citation

Generalized Python implementation of a modified version of the ε-PAL algorithm [1, 2].

For more detailed documentation, see the docs.

Installation

To install the latest stable release, use

```bash
pip install pyepal
```

or the conda channel (recommended)

```bash
conda install pyepal -c conda-forge
```

To install the latest development version from the head, use

```bash
pip install git+https://github.com/kjappelbaum/pyepal.git
```

Developers can install the extras [testing, docs, pre-commit]. Installation should take only a few minutes.

Additional Notes

- On macOS you might need to install libomp (e.g., brew install libomp) for multithreading in some models.
- We currently support Python 3.7 and 3.8.
- If you want to limit how many CPUs openblas uses, you can export OPENBLAS_NUM_THREADS=1.
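The same thread limit can also be applied from inside a Python script instead of the shell. The following is a minimal sketch, not part of pyepal: the helper name `limit_openblas_threads` is hypothetical, and it assumes the variable is set before NumPy (or anything that loads OpenBLAS) is imported, since the library reads it at load time.

```python
import os


def limit_openblas_threads(n_threads: int = 1) -> None:
    """Set OPENBLAS_NUM_THREADS for the current process (hypothetical helper)."""
    if n_threads < 1:
        raise ValueError("n_threads must be a positive integer")
    os.environ["OPENBLAS_NUM_THREADS"] = str(n_threads)


# Call this first, then import numpy / sklearn / GPy / pyepal afterwards.
limit_openblas_threads(1)
```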

Usage

The main logic is implemented in the PALBase class. There are some prebuilt classes for common use cases (GPy, sklearn) that inherit from this class. For more details about how to use the code and notes about the tutorials see the docs.

Pre-Built classes

scikit-learn

If you want to use a list of sklearn models, you can use the PALSklearn class. To run a single step, follow the code snippet below. The basic principle is the same for all the different PAL classes.

```python
import numpy as np
from pyepal import PALSklearn
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, Matern

# X is your design space: a NumPy array with one row per candidate
# and one column per feature. It must be defined before this point.

# For each objective, initialize a model
gpr_objective_0 = GaussianProcessRegressor(RBF())
gpr_objective_1 = GaussianProcessRegressor(RBF())

# The minimal input to create a PAL instance is a list of models,
# the design space (X, in ML terms "feature matrix") and the number of objectives
palsklearn_instance = PALSklearn(X, [gpr_objective_0, gpr_objective_1], 2)

# The next step is to provide some initial measurements.
# You can do this with the update_train_set function, which you
# can use throughout the active learning process to update the training set.
# For this, provide a numpy array of indices in your design space
# and the corresponding measurements
sampled_indices = np.array([1, 2, 3])
measurements = np.array([[1, 2], [0.8, 1], [7, 1]])
palsklearn_instance.update_train_set(sampled_indices, measurements)

# Now, you're ready to run the first iteration.
# This will return the next index to sample and update all the attributes.
# If there are no unclassified samples left, it will return None and
# print a statement saying that the classification is completed
index_to_sample = palsklearn_instance.run_one_step()
```

GPy

If you want to use a list of GPy models, you can use the PALGPy class.

Coregionalized GPR

Coregionalized GPR models can utilize correlations between the objectives and also work in the cases in which some of the objectives are not measured for all samples.

Custom classes

You will need to implement the _train() and _predict() functions if you inherit from PALBase. If you want to tune the hyperparameters of your models while new training points are added, you can implement a schedule by setting the _should_optimize_hyperparameters() function and the _set_hyperparameters() function, which sets the hyperparameters for the model(s).

If you need to train a model, use self.design_space as the feature matrix and self.y as the target vector. Note that in self.y all objectives are turned into maximization problems. That is, if one of your problems is a minimization problem, PyePAL will flip its sign in self.y.
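The sign convention can be illustrated with plain NumPy. This is a minimal sketch of the idea only, not pyepal's internal code: the function `to_maximization` and the `"max"`/`"min"` labels used here are assumptions made for the illustration.

```python
import numpy as np


def to_maximization(y, goals):
    """Return a copy of y in which every 'min' objective has its sign flipped.

    y:     array of shape (n_samples, n_objectives)
    goals: one "max" or "min" label per objective (illustrative convention)
    """
    y = np.asarray(y, dtype=float)
    if y.shape[1] != len(goals):
        raise ValueError("need exactly one goal per objective")
    if any(goal not in ("max", "min") for goal in goals):
        raise ValueError('goals must be "max" or "min"')
    signs = np.array([1.0 if goal == "max" else -1.0 for goal in goals])
    return y * signs  # broadcasting applies one sign per column


measurements = np.array([[1.0, 2.0], [0.8, 1.0], [7.0, 1.0]])
# Maximize the first objective, minimize the second
flipped = to_maximization(measurements, ["max", "min"])
```

After the flip, a larger value is better in every column, which is why a custom `_train()` or `_predict()` can treat all objectives uniformly.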

A basic example of how a custom class can be implemented is the PALSklearn class:

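```python
import numpy as np
from pyepal import PALBase

# validate_number_models is pyepal's own input-validation helper (import not shown).


class PALSklearn(PALBase):
    """PAL class for a list of Sklearn (GPR) models, with one model per objective"""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        validate_number_models(self.models, self.ndim)

    def _train(self):
        # Fit one model per objective on the points sampled so far
        for i, model in enumerate(self.models):
            model.fit(self.design_space[self.sampled], self.y[self.sampled, i].reshape(-1, 1))

    def _predict(self):
        # Predict mean and standard deviation for the whole design space
        means, stds = [], []
        for model in self.models:
            mean, std = model.predict(self.design_space, return_std=True)
            means.append(mean.reshape(-1, 1))
            stds.append(std.reshape(-1, 1))

        # Stack the per-objective columns (the list `means`, not the last `mean`)
        self._means = np.hstack(means)
        self.std = np.hstack(stds)
```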

For scheduling of the hyperparameter optimization, we have some predefined schedules in the pyepal.pal.schedules module.
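To make the idea of a schedule concrete, here is a minimal sketch of one that re-optimizes the hyperparameters every n-th iteration. It is written for illustration and is not taken from pyepal.pal.schedules: the function `every_nth_iteration`, the stand-in class `ScheduleDemo`, and its `iteration` and `frequency` attributes are all assumptions.

```python
def every_nth_iteration(iteration: int, frequency: int = 10) -> bool:
    """Return True when the hyperparameters should be re-optimized."""
    if frequency < 1:
        raise ValueError("frequency must be a positive integer")
    return iteration % frequency == 0


class ScheduleDemo:
    """Stand-in for a PALBase subclass; only the scheduling hook is shown."""

    def __init__(self, frequency: int = 10):
        self.iteration = 0  # assumed iteration counter (hypothetical here)
        self.frequency = frequency

    def _should_optimize_hyperparameters(self) -> bool:
        return every_nth_iteration(self.iteration, self.frequency)


demo = ScheduleDemo(frequency=5)
decisions = []
for step in range(1, 11):
    demo.iteration = step
    decisions.append(demo._should_optimize_hyperparameters())
# decisions is True only at steps 5 and 10
```

A sparse schedule like this trades some model quality between optimizations for a lower cost per iteration, which matters when hyperparameter optimization dominates the runtime.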

Test the algorithms

If the full design space is known, you can use a while loop to fully explore the space with PyePAL. For the theoretical guarantees of PyePAL to hold, you need to sample until all uncertainties are below epsilon. In practice, it is usually sufficient to stop once no unclassified samples are left. The following snippet does this:

```python
import numpy as np
from pyepal import PALGPy
from pyepal.utils import exhaust_loop
from pyepal.models.gpr import build_model

# X (design space) and y (all measurements) must be defined before this point.

# indices for initialization
sample_idx = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50, 60, 70])

# build one model per objective
model_0 = build_model(X[sample_idx], y[sample_idx], 0)
model_1 = build_model(X[sample_idx], y[sample_idx], 1)

# initialize the PAL instance
palinstance = PALGPy(X, [model_0, model_1], 2, beta_scale=1)
palinstance.update_train_set(sample_idx, y[sample_idx])

# This will run the sampling and training as long as there
# are unclassified samples
exhaust_loop(palinstance, y)
```
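Conceptually, such a loop alternates between asking for the next index and feeding back the corresponding measurement until no index is proposed. The sketch below shows that pattern under stated assumptions: it relies only on the `run_one_step()` and `update_train_set()` methods from the usage example, the function name `explore_until_classified` and the `max_iterations` guard are hypothetical, and it is not the implementation of `exhaust_loop`.

```python
import numpy as np


def explore_until_classified(pal, y, max_iterations=1000):
    """Sample until run_one_step() returns None; return the number of steps taken.

    pal: any object exposing run_one_step() and update_train_set(indices, values)
    y:   array with the measurements for the full design space
    """
    y = np.asarray(y)
    n_steps = 0
    while n_steps < max_iterations:
        index = pal.run_one_step()
        if index is None:  # no unclassified samples left
            break
        index = np.atleast_1d(index)  # accept a scalar or an array of indices
        pal.update_train_set(index, y[index])
        n_steps += 1
    return n_steps
```

With the `palinstance` and `y` from the snippet above, this would be called as `explore_until_classified(palinstance, y)`. The `max_iterations` guard is there only to keep the sketch from looping forever if the stopping condition is never reached.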

To measure the performance, you can use the get_hypervolume function from pyepal.pal.utils. More indicators are implemented in packages like deap, pagmo, or pymoo.
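For intuition about what a hypervolume indicator measures, the sketch below computes it for the two-objective maximization case: the area dominated by the Pareto front and bounded by a reference point. It is an independent illustration, not the implementation of `get_hypervolume`; the names `pareto_mask` and `hypervolume_2d` and the choice of reference point are assumptions.

```python
import numpy as np


def pareto_mask(points):
    """Boolean mask of the non-dominated rows (all objectives maximized)."""
    points = np.asarray(points, dtype=float)
    mask = np.ones(len(points), dtype=bool)
    for i, point in enumerate(points):
        # A point is dominated if another point is at least as good in every
        # objective and strictly better in at least one.
        dominators = np.all(points >= point, axis=1) & np.any(points > point, axis=1)
        mask[i] = not dominators.any()
    return mask


def hypervolume_2d(points, reference):
    """Area dominated by `points` above `reference` for two maximized objectives."""
    points = np.asarray(points, dtype=float)
    reference = np.asarray(reference, dtype=float)
    front = points[pareto_mask(points)]
    front = front[np.all(front > reference, axis=1)]  # drop points outside the box
    # Sorted by descending first objective, the second objective never decreases
    # along the front, so the area is a sum of horizontal slabs.
    front = front[np.argsort(-front[:, 0])]
    area, previous_height = 0.0, reference[1]
    for first, second in front:
        area += (first - reference[0]) * (second - previous_height)
        previous_height = second
    return area


observed = np.array([[3.0, 1.0], [2.0, 2.0], [1.0, 3.0], [1.0, 1.0]])
area = hypervolume_2d(observed, reference=[0.0, 0.0])
```

A larger hypervolume means the discovered front dominates more of the objective space, so tracking it over iterations gives a simple progress measure for an active learning run.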

References

1. Zuluaga, M.; Krause, A.; Püschel, M. E-PAL: An Active Learning Approach to the Multi-Objective Optimization Problem. Journal of Machine Learning Research 2016, 17 (104), 1–32.
2. Zuluaga, M.; Sergent, G.; Krause, A.; Püschel, M. Active Learning for Multi-Objective Optimization; Dasgupta, S., McAllester, D., Eds.; Proceedings of Machine Learning Research; PMLR: Atlanta, Georgia, USA, 2013; Vol. 28, pp 462–470.

Citation

If you find this code useful for your work, please cite:

- Our paper that describes the implementation and an application to materials discovery: Jablonka, K. M.; Jothiappan, G. M.; Wang, S.; Smit, B.; Yoo, B. Bias Free Multiobjective Active Learning for Materials Design and Discovery. Nat Commun 2021, 12 (1), 2312.
- The original paper that describes the ε-PAL algorithm: Zuluaga, M.; Krause, A.; Püschel, M. E-PAL: An Active Learning Approach to the Multi-Objective Optimization Problem. Journal of Machine Learning Research 2016, 17 (104), 1–32.

Acknowledgments

The research was supported by the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement 666983, MaGic), by the NCCR-MARVEL, funded by the Swiss National Science Foundation, and by the Swiss National Science Foundation (SNSF) under Grant 200021_172759. Part of the work was performed as part of the Explore Together internship program at BASF.

Owner

Name: Laboratory for AI for Materials
Login: lamalab-org
Kind: organization

Repositories: 1
Profile: https://github.com/lamalab-org

Research group led by Kevin Maik Jablonka

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it using these metadata."
authors:
  - family-names: Jablonka
    given-names: "Kevin Maik"
    orcid: "https://orcid.org/0000-0003-4894-4660"
  - family-names: "Melpatti Jothiappan"
    given-names: Giriprasad
  - family-names: Wang
    given-names: Shefang
  - family-names: Smit
    given-names: Berend
    orcid: "https://orcid.org/0000-0003-4653-8562"
  - family-names: Brian
    given-names: Yoo
    orcid: "https://orcid.org/0000-0002-0326-0831"
doi: "10.1038/s41467-021-22437-0"
license: "Apache-2.0"
repository-code: "https://github.com/kjappelbaum/pyepal"
title: pyepal
version: v0.7.0

GitHub Events

Total

Delete event: 11
Issue comment event: 22
Pull request review event: 10
Pull request event: 23
Create event: 10

Last Year

Delete event: 11
Issue comment event: 22
Pull request review event: 10
Pull request event: 23
Create event: 10

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 71
Total pull requests: 53
Average time to close issues: 11 days
Average time to close pull requests: 2 months
Total issue authors: 2
Total pull request authors: 3
Average comments per issue: 1.51
Average comments per pull request: 1.06
Merged pull requests: 27
Bot issues: 0
Bot pull requests: 24

Past Year

Issues: 0
Pull requests: 20
Average time to close issues: N/A
Average time to close pull requests: 14 days
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 1.35
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 20

Top Authors

Issue Authors

kjappelbaum (68)
byooooo (3)

Pull Request Authors

kjappelbaum (26)
dependabot[bot] (22)
byooooo (3)

Top Labels

Issue Labels

enhancement (43) documentation (15) bug (7) CI (6) no-issue-activity (6) blocking (5) question (3) feature_request (2) help wanted (1) wontfix (1) wontfixnow (1) priority (1)

Pull Request Labels

dependencies (22) enhancement (6) python (6) documentation (3) CI (2) bugfix (1)

Dependencies

.github/workflows/codeql.yml actions

actions/checkout v3 composite
github/codeql-action/analyze v2 composite
github/codeql-action/autobuild v2 composite
github/codeql-action/init v2 composite

.github/workflows/pre_commit.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/python_package.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite
codecov/codecov-action v1 composite

.github/workflows/release_me.yml actions

GoogleCloudPlatform/release-please-action v2.6.0 composite

docs/_build/html/_static/fonts/source-serif-pro/bower.json bower

pyproject.toml pypi

setup.py pypi