oboe

An AutoML pipeline selection system to quickly select a promising pipeline for a new dataset.

https://github.com/udellgroup/oboe

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
○ codemeta.json file
○ .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 2 of 3 committers (66.7%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (19.1%) to scientific vocabulary

Keywords

automl collaborative-filtering ml-pipelines

Last synced: 9 months ago

Repository

An AutoML pipeline selection system to quickly select a promising pipeline for a new dataset.

Basic Info

Host: GitHub
Owner: udellgroup
License: bsd-3-clause
Language: Python
Default Branch: master
Homepage:
Size: 325 MB

Statistics

Stars: 84
Watchers: 11
Forks: 17
Open Issues: 0
Releases: 0

Topics

automl collaborative-filtering ml-pipelines

Created over 8 years ago · Last pushed over 4 years ago

Metadata Files

Readme License

The Oboe systems

This bundle of libraries, Oboe and TensorOboe, provides automated machine learning (AutoML) systems that use collaborative filtering to find good models for supervised learning tasks within a user-specified time limit. Further hyperparameter tuning can be performed afterwards.
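The collaborative-filtering idea can be illustrated with a small NumPy sketch. It assumes a meta-training error matrix whose rows are datasets and whose columns are models, factorizes it with a plain truncated SVD, and then estimates a new dataset's errors on all models from errors observed on only a few of them, by least squares in the latent space. The function names, shapes, and synthetic data are hypothetical; this is a simplified illustration of the idea, not the Oboe implementation.

```python
import numpy as np


def fit_model_features(error_matrix, rank):
    """Truncated SVD of a (datasets x models) error matrix.

    Returns a (rank x models) matrix whose columns are latent model features.
    """
    _, s, vt = np.linalg.svd(error_matrix, full_matrices=False)
    return s[:rank, None] * vt[:rank]


def infer_errors(model_features, observed_models, observed_errors):
    """Estimate a new dataset's error on every model from a few observed errors."""
    # Find the dataset's latent vector x minimizing ||Y[:, S]^T x - e_S||_2.
    x, *_ = np.linalg.lstsq(
        model_features[:, observed_models].T, observed_errors, rcond=None
    )
    # Predicted errors for all models: x^T Y.
    return x @ model_features


# Synthetic, exactly rank-3 meta-training data: 30 datasets x 12 models.
rng = np.random.default_rng(0)
dataset_factors = rng.random((30, 3))
model_factors = rng.random((3, 12))
error_matrix = dataset_factors @ model_factors

# A new dataset on which only 4 of the 12 models have been run.
true_errors = rng.random(3) @ model_factors
observed = [0, 3, 7, 10]
model_features = fit_model_features(error_matrix, rank=3)
predicted = infer_errors(model_features, observed, true_errors[observed])
print("best model index:", int(np.argmin(predicted)))
```

Because the synthetic matrix is exactly rank 3, four observed models are enough to recover the remaining errors; with real, noisy error matrices the estimates are only approximate.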

The name comes from the musical instrument: in an orchestra, the oboe plays an initial note that the other instruments use to tune to the right frequency before the performance begins. Our Oboe systems play a similar role in AutoML: we use meta-learning to select a promising set of models or to build an ensemble for a new dataset. Users can either use the selected models directly or further fine-tune their hyperparameters.
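Oboe decides which models to actually run on a new dataset using classical experiment design [1]. As a minimal sketch, assume each model is described by a latent feature vector (for example, a column of a low-rank factor of the meta-training error matrix). A greedy D-optimal design then adds, one at a time, the model that most increases the log-determinant of the accumulated information matrix, which favors models that are informative and not redundant with those already chosen. The helper `greedy_d_optimal` and its `ridge` term are hypothetical; the formulation in [1] may differ, for instance in how it accounts for the user-specified time limit.

```python
import numpy as np


def greedy_d_optimal(model_features, n_select, ridge=1e-6):
    """Greedily pick columns of `model_features` (rank x models) that maximize
    log det(sum of y_j y_j^T over picked j + ridge * I)."""
    rank, n_models = model_features.shape
    info = ridge * np.eye(rank)  # small ridge keeps the determinant positive
    selected = []
    for _ in range(n_select):
        best_j, best_logdet = None, -np.inf
        for j in range(n_models):
            if j in selected:
                continue
            y = model_features[:, j : j + 1]
            _, logdet = np.linalg.slogdet(info + y @ y.T)
            if logdet > best_logdet:
                best_j, best_logdet = j, logdet
        selected.append(best_j)
        y = model_features[:, best_j : best_j + 1]
        info = info + y @ y.T  # accumulate the information of the chosen model
    return selected


# Hypothetical latent features: rank 3, 10 candidate models.
rng = np.random.default_rng(0)
print("models to run first:", greedy_d_optimal(rng.standard_normal((3, 10)), n_select=4))

# Columns 0 and 1 are identical models: the design picks one of them plus column 2.
print(greedy_d_optimal(np.array([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]), n_select=2))
```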

On a new dataset:

Oboe searches for promising estimators (supervised learners) by matrix factorization and classical experiment design. It requires a pre-processed dataset: one-hot encode categorical features and then standardize all features to have zero mean and unit variance. For a complete description, refer to our paper OBOE: Collaborative Filtering for AutoML Model Selection at KDD 2019 [1].
TensorOboe searches for promising pipelines, which here are directed graphs of learning components, including imputation, encoding, standardization, dimensionality reduction and estimation. It can therefore accept a raw dataset, possibly with missing entries, mixed feature types, uncentered features, etc. For a complete description, refer to our paper AutoML Pipeline Selection: Efficiently Navigating the Combinatorial Space at KDD 2020 [2].
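For Oboe, the required preprocessing can be done with scikit-learn before calling `fit`. The sketch below assumes a pandas DataFrame with hypothetical column names; `preprocess_for_oboe` is an illustrative helper, not part of the oboe package. It one-hot encodes the categorical columns, passes the numeric columns through, and then standardizes every resulting column.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def preprocess_for_oboe(categorical_cols):
    """Hypothetical helper: one-hot encode categorical columns, pass the rest
    through, then standardize every resulting feature."""
    encode = ColumnTransformer(
        [("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols)],
        remainder="passthrough",  # keep the numeric columns unchanged for now
        sparse_threshold=0,  # always return a dense array
    )
    return make_pipeline(encode, StandardScaler())


df = pd.DataFrame(
    {
        "color": ["red", "blue", "red", "green"],
        "size": [1.0, 2.0, 3.0, 4.0],
        "weight": [10.0, 20.0, 15.0, 5.0],
    }
)
preprocessor = preprocess_for_oboe(["color"])
x = preprocessor.fit_transform(df)
print(x.shape)  # → (4, 5): 3 one-hot columns + 2 numeric columns
print(np.allclose(x.mean(axis=0), 0.0), np.allclose(x.std(axis=0), 1.0))  # → True True
```

The fitted `preprocessor` should then be applied with `transform` to the test features, so that the test set is encoded and scaled with the training statistics.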

This bundle of systems is still under development and subject to change. For any questions, please submit an issue; the authors will respond as soon as possible.

Installation

The easiest way is to install using pip:

pip install oboe

Alternatively, if you want to customize the source code, you can install in editable mode: first clone this repository with git, then run

pip install -e .

in the cloned directory. Note that this downloads some large files (about 100 MB in total) used to warm-start TensorOboe fitting; this saves minutes of setup time at the cost of disk space and network data usage.

It is recommended to install within an isolated environment (a conda virtual environment, for example) to avoid conflicting dependency versions.

Dependencies with verified versions

The Oboe systems work on Python 3.7 or later. The following libraries are required. The listed versions have been verified to work; older versions may work but are not guaranteed.

numpy (1.16.4)
scipy (1.4.1)
pandas (0.24.2)
scikit-learn (0.22.1)
tensorly (0.6.0)
OpenML (0.9.0)
mkl (>=1.0.0)
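To compare an environment against the verified versions listed above, the standard library's `importlib.metadata` (available from Python 3.8, so not on 3.7) can report what is installed. The PyPI distribution names used as keys (for example `scikit-learn` and `openml`) are assumptions, and `mkl` is left out because only a minimum version is given.

```python
from importlib import metadata

# Verified versions from the list above, keyed by (assumed) PyPI distribution name.
VERIFIED = {
    "numpy": "1.16.4",
    "scipy": "1.4.1",
    "pandas": "0.24.2",
    "scikit-learn": "0.22.1",
    "tensorly": "0.6.0",
    "openml": "0.9.0",
}


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions(VERIFIED).items():
    print(f"{name}: installed {version}, verified {VERIFIED[name]}")
```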

Examples

For more detailed examples, please refer to the Jupyter notebooks in the example folder. A basic classification example using Oboe:

```python
method = 'Oboe'  # 'Oboe' or 'TensorOboe'
problem_type = 'classification'

from oboe import AutoLearner, error  # This may take around 15 seconds at first run.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

data = load_iris()
x = np.array(data['data'])
y = np.array(data['target'])
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)

m = AutoLearner(p_type=problem_type, runtime_limit=30, method=method, verbose=False)
m.fit(x_train, y_train)
y_predicted = m.predict(x_test)

print("prediction error (balanced error rate): {}".format(error(y_test, y_predicted, 'classification')))
print("selected models: {}".format(m.get_models()))
```
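The example above reports a balanced error rate. Assuming the common definition, one minus the mean per-class recall (an assumption here; the oboe `error` function itself is not reproduced), it can be computed and cross-checked against scikit-learn's `balanced_accuracy_score`. The helper `balanced_error_rate` is hypothetical.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score


def balanced_error_rate(y_true, y_pred):
    """1 - mean per-class recall, averaged over the classes present in y_true."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return 1.0 - float(np.mean(recalls))


y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
ber = balanced_error_rate(y_true, y_pred)
print(ber)  # → 0.375
print(np.isclose(ber, 1.0 - balanced_accuracy_score(y_true, y_pred)))  # → True
```

Unlike the plain error rate (2 of 6 wrong, about 0.333 here), the balanced version weights each class equally, so mistakes on the smaller class count for more.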

Warm-start meta-training

The large_files folder includes some large numpy arrays that are intermediate results of previous meta-training. This folder is not included in the pip installation, and the files within it can be manually downloaded from this GitHub repository.

By default, TensorOboe skips imputing missing entries in the error tensor and directly uses the pre-imputed error tensor. If you want to impute the error tensor yourself, the original non-imputed error tensor is at large_files/error_tensor_f16_compressed.npz. TensorOboe can then be initialized by setting the original_error_tensor_dir argument to the path of this .npz file and setting mode to 'initialize' when creating the AutoLearner instance: m = AutoLearner(..., method='TensorOboe', mode='initialize', path_to_imputed_error_tensor=<path_to_this_npy_file>).
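The non-imputed error tensor is stored as a compressed NumPy archive. The sketch below loads such an archive and fills its missing entries with a low-rank Tucker approximation. It assumes missing entries are stored as NaN and uses a small synthetic tensor written to a temporary file; the array key `error_tensor`, the ranks, and the iterative truncated-HOSVD imputation are illustrative assumptions, not TensorOboe's actual file layout or algorithm (see [2] for the latter).

```python
import os
import tempfile

import numpy as np


def unfold(tensor, mode):
    """Matricize `tensor` along `mode` (rows indexed by that mode)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def tucker_approx(tensor, ranks):
    """Truncated HOSVD: project each mode onto its leading left singular vectors."""
    factors = [
        np.linalg.svd(unfold(tensor, m), full_matrices=False)[0][:, :r]
        for m, r in enumerate(ranks)
    ]
    approx = tensor
    for m, u in enumerate(factors):
        # Multiply mode m by the projector U U^T, then restore the axis order.
        approx = np.moveaxis(np.tensordot(u @ u.T, approx, axes=(1, m)), 0, m)
    return approx


def impute(tensor, ranks, n_iter=100):
    """Fill NaNs by alternating a low-rank approximation with re-imposing observed values."""
    missing = np.isnan(tensor)
    filled = np.where(missing, np.nanmean(tensor), tensor)  # start from the mean
    for _ in range(n_iter):
        filled = np.where(missing, tucker_approx(filled, ranks), tensor)
    return filled


# Synthetic tensor with exact multilinear rank (2, 2, 2), about 10% of entries missing.
rng = np.random.default_rng(0)
core = rng.random((2, 2, 2))
a, b, c = rng.random((8, 2)), rng.random((6, 2)), rng.random((5, 2))
truth = np.einsum("pqr,ip,jq,kr->ijk", core, a, b, c)
mask = rng.random(truth.shape) < 0.1
observed = np.where(mask, np.nan, truth)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "error_tensor_demo.npz")  # hypothetical file name
    np.savez_compressed(path, error_tensor=observed)
    with np.load(path) as archive:
        print(archive.files)  # ['error_tensor']
        loaded = archive["error_tensor"]

imputed = impute(loaded, ranks=(2, 2, 2))
print("mean abs error on missing entries:", np.abs(imputed[mask] - truth[mask]).mean())
```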

References

[1] Chengrun Yang, Yuji Akimoto, Dae Won Kim, Madeleine Udell. OBOE: Collaborative filtering for AutoML model selection. KDD 2019.

[2] Chengrun Yang, Jicong Fan, Ziyang Wu, Madeleine Udell. AutoML Pipeline Selection: Efficiently Navigating the Combinatorial Space. KDD 2020.

Owner

Name: udellgroup
Login: udellgroup
Kind: organization

Repositories: 10
Profile: https://github.com/udellgroup

GitHub Events

Total

Watch event: 2
Fork event: 1

Last Year

Watch event: 2
Fork event: 1

Committers

Last synced: over 2 years ago

All Time

Total Commits: 278
Total Committers: 3
Avg Commits per committer: 92.667
Development Distribution Score (DDS): 0.439

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
yujiakimoto	y**2@c**u	156
Chengrun Yang	y**3@g**m	117
Yuji Akimoto	y**2@e**u	5

Committer Domains (Top 20 + Academic)

en-cs-keuka.coecis.cornell.edu: 1
cornell.edu: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 8
Total pull requests: 9
Average time to close issues: 2 months
Average time to close pull requests: about 2 months
Total issue authors: 6
Total pull request authors: 4
Average comments per issue: 3.0
Average comments per pull request: 0.11
Merged pull requests: 8
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

mbahmani (2)
sebastianpinedaar (2)
pwoller (1)
eddiebergman (1)
iXanthos (1)
zml24 (1)

Pull Request Authors

chengrunyang (6)
mbahmani (1)
yujiakimoto (1)
ghost (1)

Top Labels

Issue Labels

question (2)
invalid (1)

Pull Request Labels

Packages

Total packages: 2
Total downloads:
- pypi 42 last-month

Total dependent packages: 0
(may contain duplicates)
Total dependent repositories: 5
(may contain duplicates)
Total versions: 3
Total maintainers: 1

proxy.golang.org: github.com/udellgroup/oboe

Documentation: https://pkg.go.dev/github.com/udellgroup/oboe#section-documentation
License: bsd-3-clause
Latest release: v0.2.0
published over 4 years ago

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 5.4%

Average: 5.6%

Dependent repos count: 5.8%

Last synced: 9 months ago

pypi.org: oboe

An AutoML pipeline selection system to quickly select a promising pipeline for a new dataset.

Homepage: https://github.com/udellgroup/oboe
Documentation: https://oboe.readthedocs.io/
License: BSD License
Latest release: 0.2.0
published over 4 years ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 5
Downloads: 42 Last month

Rankings

Dependent repos count: 6.6%

Stargazers count: 7.8%

Forks count: 9.1%

Dependent packages count: 10.0%

Average: 15.5%

Downloads: 44.0%

Maintainers (1)

chengrunyang

Last synced: 9 months ago
