autopytorch

Automatic architecture search and hyperparameter optimization for PyTorch

https://github.com/automl/auto-pytorch

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    2 of 13 committers (15.4%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary

Keywords

automl deep-learning pytorch tabular-data
Last synced: 6 months ago

Repository

Automatic architecture search and hyperparameter optimization for PyTorch

Basic Info
  • Host: GitHub
  • Owner: automl
  • License: apache-2.0
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 19.4 MB
Statistics
  • Stars: 2,477
  • Watchers: 46
  • Forks: 300
  • Open Issues: 75
  • Releases: 4
Topics
automl deep-learning pytorch tabular-data
Created about 7 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License Citation

README.md

Auto-PyTorch

Copyright (C) 2021 AutoML Groups Freiburg and Hannover

While early AutoML frameworks focused on optimizing traditional ML pipelines and their hyperparameters, another trend in AutoML is to focus on neural architecture search. To bring the best of these two worlds together, we developed Auto-PyTorch, which jointly and robustly optimizes the network architecture and the training hyperparameters to enable fully automated deep learning (AutoDL).

Auto-PyTorch is mainly developed to support tabular data (classification, regression) and time series data (forecasting). The newest features in Auto-PyTorch for tabular data are described in the paper "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL" (see below for bibtex ref). Details about Auto-PyTorch for multi-horizon time series forecasting tasks can be found in the paper "Efficient Automated Deep Learning for Time Series Forecasting" (also see below for bibtex ref).

Also, find the documentation here.

From v0.1.0, AutoPyTorch has been updated to further improve usability, robustness and efficiency by using SMAC as the underlying optimization package as well as changing the code structure. Therefore, moving from v0.0.2 to v0.1.0 will break compatibility. In case you would like to use the old API, you can find it at master_old.

Workflow

A rough description of the Auto-PyTorch workflow is shown in the following figure.

AutoPyTorch Workflow

In the figure, Data is provided by the user and Portfolio is a set of neural-network configurations that work well on diverse datasets. The current version only supports the greedy portfolio described in the paper Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL. This portfolio is used to warm-start the SMAC optimization; in other words, the portfolio configurations are evaluated on the provided data as initial configurations. The API then runs the following procedure:

1. Validate input data: process each data type, e.g. encode categorical data, so that Auto-PyTorch can handle it.
2. Create dataset: create a dataset that this API can handle, with a choice of cross-validation or holdout splits.
3. Evaluate baselines (*1):
   • Tabular datasets: train each algorithm in the predefined pool with a fixed hyperparameter configuration, plus a dummy model from sklearn.dummy that represents the worst possible performance.
   • Time series forecasting datasets: train a dummy predictor that repeats the last observed value in each series.
4. Search with SMAC:
   a. Determine budget and cut-off rules with Hyperband.
   b. Sample a pipeline hyperparameter configuration (*2) with SMAC.
   c. Update the observations with the obtained results.
   d. Repeat a.–c. until the budget runs out.
5. Build the best ensemble for the provided dataset from the observations and model selection of the ensemble.

*1: Baselines are a predefined pool of machine learning algorithms, e.g. LightGBM and support vector machines, used to solve either the regression or classification task on the provided dataset.

*2: A pipeline hyperparameter configuration specifies the choice of components in each step, e.g. the target algorithm or the shape of the neural network, together with their corresponding hyperparameters.
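The budget allocation in step 4 can be sketched with a toy, stdlib-only successive-halving loop (the core idea behind Hyperband). This is an illustration, not Auto-PyTorch's actual internals: plain random sampling stands in for SMAC's model-based suggestions, and `evaluate` fakes a training run.

```python
import random

def evaluate(config, budget):
    # Stand-in for training a pipeline for `budget` epochs and returning a
    # validation loss; real Auto-PyTorch trains a network here.
    random.seed((hash(tuple(sorted(config.items()))) + budget) % 2**32)
    return random.random() / budget

def successive_halving(sample_config, min_budget=1, max_budget=9, eta=3, n_configs=9):
    """One Hyperband-style bracket: start many configs on a small budget,
    keep the best 1/eta at each rung, and raise the budget (steps 4a-4d)."""
    configs = [sample_config() for _ in range(n_configs)]
    budget = min_budget
    while budget <= max_budget and configs:
        ranked = sorted(configs, key=lambda c: evaluate(c, budget))  # 4c: observe results
        configs = ranked[: max(1, len(ranked) // eta)]               # keep top 1/eta
        budget *= eta                                                # 4a: raise the budget
    return configs[0]

def sample_config():
    # 4b: in Auto-PyTorch this sample comes from SMAC; plain random
    # sampling over a hypothetical search space stands in for it here.
    return {"lr": random.choice([1e-1, 1e-2, 1e-3]),
            "num_layers": random.randint(1, 4)}

best = successive_halving(sample_config)
print(best)
```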

Installation

PyPI Installation

```sh
pip install autoPyTorch
```

Auto-PyTorch for Time Series Forecasting requires additional dependencies:

```sh
pip install autoPyTorch[forecasting]
```

Manual Installation

We recommend using Anaconda for developing as follows:

```sh
# The following commands assume the user is in a cloned Auto-PyTorch directory.
# We also need to initialize the automl_common submodule; more information:
# https://github.com/automl/automl_common/
git submodule update --init --recursive

# Create the environment
conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
python setup.py install
```

Similarly, to install all the dependencies for Auto-PyTorch-TimeSeriesForecasting:

```sh
git submodule update --init --recursive
conda create -n auto-pytorch python=3.8
conda activate auto-pytorch
conda install swig
pip install -e ".[forecasting]"
```

Examples

In a nutshell:

```py
from autoPyTorch.api.tabular_classification import TabularClassificationTask

# data and metric imports
import sklearn.model_selection
import sklearn.datasets
import sklearn.metrics

X, y = sklearn.datasets.load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = \
    sklearn.model_selection.train_test_split(X, y, random_state=1)

# initialise Auto-PyTorch api
api = TabularClassificationTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    y_test=y_test,
    optimize_metric='accuracy',
    total_walltime_limit=300,
    func_eval_time_limit_secs=50
)

# Calculate test accuracy
y_pred = api.predict(X_test)
score = api.score(y_pred, y_test)
print("Accuracy score", score)
```
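The ensemble that `search` returns is built in step 5 of the workflow. A toy, stdlib-only sketch of greedy ensemble selection (Caruana et al., 2004), the strategy used by auto-sklearn-style systems in this step, with entirely made-up model predictions:

```python
def accuracy(pred, truth):
    """Fraction of correctly predicted labels."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

def greedy_ensemble(model_preds, truth, ensemble_size=5):
    """Repeatedly add (with replacement) the model whose inclusion most
    improves the majority-vote accuracy of the ensemble."""
    ensemble = []
    for _ in range(ensemble_size):
        def vote_with(extra):
            members = ensemble + [extra]
            # majority vote per instance across the current members
            return [max(set(col), key=col.count)
                    for col in zip(*(model_preds[m] for m in members))]
        best = max(model_preds, key=lambda m: accuracy(vote_with(m), truth))
        ensemble.append(best)
    return ensemble

# Hypothetical validation labels and per-model predictions
truth = [0, 1, 1, 0, 1, 0]
model_preds = {
    "mlp":  [0, 1, 1, 0, 0, 0],   # 5/6 correct
    "lgbm": [0, 1, 0, 0, 1, 1],   # 4/6 correct
    "svm":  [1, 1, 1, 0, 1, 0],   # 5/6 correct
}
ensemble = greedy_ensemble(model_preds, truth, ensemble_size=3)
print(ensemble)
```

Selecting with replacement lets strong models appear several times, which implicitly weights them in the vote.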

For Time Series Forecasting Tasks:

```py
from autoPyTorch.api.time_series_forecasting import TimeSeriesForecastingTask

# data and metric imports
from sktime.datasets import load_longley
targets, features = load_longley()

# define the forecasting horizon
forecasting_horizon = 3

# A dataset optimized by APT-TS can be a list of np.ndarray / pd.DataFrame where each
# series represents an element in the list, or a single pd.DataFrame that records the series
# index information: to which series does the timestep belong? This id can be stored as the
# DataFrame's index or a separate column.
# Within each series, we take the last forecasting_horizon values as test targets and the
# items before them as training targets. Normally the values to be forecasted should follow
# the training sets.
y_train = [targets[: -forecasting_horizon]]
y_test = [targets[-forecasting_horizon:]]

# same for features. For univariate models, X_train and X_test can be omitted (set to None)
X_train = [features[: -forecasting_horizon]]
# Here X_test indicates the 'known future features': the features known in advance.
# Features that are unknown could be replaced with NaN or zeros (which will not be used by
# our networks). If no feature is known beforehand, we could also omit X_test.
known_future_features = list(features.columns)
X_test = [features[-forecasting_horizon:]]

start_times = [targets.index.to_timestamp()[0]]
freq = '1Y'

# initialise Auto-PyTorch api
api = TimeSeriesForecastingTask()

# Search for an ensemble of machine learning algorithms
api.search(
    X_train=X_train,
    y_train=y_train,
    X_test=X_test,
    optimize_metric='mean_MAPE_forecasting',
    n_prediction_steps=forecasting_horizon,
    memory_limit=16 * 1024,  # Currently, forecasting models use much more memory
    freq=freq,
    start_times=start_times,
    func_eval_time_limit_secs=50,
    total_walltime_limit=60,
    min_num_test_instances=1000,  # proxy validation sets; only works for tasks with more than 1000 series
    known_future_features=known_future_features,
)

# our dataset can directly generate sequences for new datasets
test_sets = api.dataset.generate_test_seqs()

# Calculate test accuracy
y_pred = api.predict(test_sets)
score = api.score(y_pred, y_test)
print("Forecasting score", score)
```
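The forecasting baseline from step 3 of the workflow, a dummy predictor that repeats the last observed value, is simple enough to sketch in a few lines. The series data below is made up, and the plain MAPE here only illustrates the metric family behind `'mean_MAPE_forecasting'`:

```python
def naive_forecast(history, horizon):
    """Repeat the last observed training value for the whole horizon."""
    return [history[-1]] * horizon

def mape(y_true, y_pred):
    """Mean absolute percentage error; assumes no true value is zero."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

series = [100.0, 104.0, 103.0, 108.0, 110.0, 111.0, 115.0]
horizon = 3
train, test = series[:-horizon], series[-horizon:]

pred = naive_forecast(train, horizon)
print(pred)   # [108.0, 108.0, 108.0]
print(mape(test, pred))
```

Any searched model has to beat this baseline to be worth keeping, which is why it anchors the evaluation.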

For more examples, including customising the search space, parallelising the code, etc., check out the examples folder:

```sh
$ cd examples/
```

Code for the paper is available under examples/ensemble in the TPAMI.2021.3067763 branch.

Contributing

If you want to contribute to Auto-PyTorch, clone the repository and check out our current development branch:

```sh
$ git checkout development
```

License

This program is free software: you can redistribute it and/or modify it under the terms of the Apache license 2.0 (please see the LICENSE file).

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

You should have received a copy of the Apache license 2.0 along with this program (see LICENSE file).

Reference

Please refer to the branch TPAMI.2021.3067763 to reproduce the paper Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL.

```bibtex
@article{zimmer-tpami21a,
  author  = {Lucas Zimmer and Marius Lindauer and Frank Hutter},
  title   = {Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL},
  journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
  year    = {2021},
  note    = {also available under https://arxiv.org/abs/2006.13799},
  pages   = {3079--3090}
}
```

```bibtex
@incollection{mendoza-automlbook18a,
  author    = {Hector Mendoza and Aaron Klein and Matthias Feurer and Jost Tobias Springenberg and Matthias Urban and Michael Burkart and Max Dippel and Marius Lindauer and Frank Hutter},
  title     = {Towards Automatically-Tuned Deep Neural Networks},
  year      = {2018},
  month     = dec,
  editor    = {Hutter, Frank and Kotthoff, Lars and Vanschoren, Joaquin},
  booktitle = {AutoML: Methods, Systems, Challenges},
  publisher = {Springer},
  chapter   = {7},
  pages     = {141--156}
}
```

```bibtex
@inproceedings{deng-ecml22,
  author    = {Difan Deng and Florian Karl and Frank Hutter and Bernd Bischl and Marius Lindauer},
  title     = {Efficient Automated Deep Learning for Time Series Forecasting},
  year      = {2022},
  booktitle = {Machine Learning and Knowledge Discovery in Databases. Research Track - European Conference, {ECML} {PKDD} 2022},
  url       = {https://doi.org/10.48550/arXiv.2205.05511}
}
```

Contact

Auto-PyTorch is developed by the AutoML Groups of the University of Freiburg and Hannover.

Owner

  • Name: AutoML-Freiburg-Hannover
  • Login: automl
  • Kind: organization
  • Location: Freiburg and Hannover, Germany

Citation (CITATION.cff)

preferred-citation:
  type: article
  authors:
  - family-names: "Zimmer"
    given-names: "Lucas"
    affiliation: "University of Freiburg, Germany"    
  - family-names: "Lindauer"
    given-names: "Marius"
    affiliation: "University of Freiburg, Germany"    
  - family-names: "Hutter"
    given-names: "Frank"
    affiliation: "University of Freiburg, Germany"
  doi: "10.1109/TPAMI.2021.3067763"
  journal-title: "IEEE Transactions on Pattern Analysis and Machine Intelligence"
  title: "Auto-PyTorch Tabular: Multi-Fidelity MetaLearning for Efficient and Robust AutoDL"
  year: 2021
  note: "also available under https://arxiv.org/abs/2006.13799"
  start: 3079
  end: 3090

GitHub Events

Total
  • Issues event: 2
  • Watch event: 109
  • Member event: 3
  • Issue comment event: 9
  • Fork event: 17
Last Year
  • Issues event: 2
  • Watch event: 109
  • Member event: 3
  • Issue comment event: 9
  • Fork event: 17

Committers

Last synced: 12 months ago

All Time
  • Total Commits: 224
  • Total Committers: 13
  • Avg Commits per committer: 17.231
  • Development Distribution Score (DDS): 0.732
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
chico f****e@g****m 60
Lucas Zimmer z****l@i****e 53
Ravin Kohli 1****i 43
Matthias Urban u****m@i****e 27
nabenabe0928 s****o@g****m 14
Marius Lindauer m****s@g****m 8
LMZimmer 5****r 7
dwoiwode f****n@d****e 5
bastiscode s****8@g****m 3
ntnguyen88 6****8 1
Tim Hatch t****m@t****m 1
Johnny Burns j****2@g****m 1
Daiki Katsuragawa 5****a 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 65
  • Total pull requests: 55
  • Average time to close issues: 6 months
  • Average time to close pull requests: 4 months
  • Total issue authors: 42
  • Total pull request authors: 10
  • Average comments per issue: 1.89
  • Average comments per pull request: 0.69
  • Merged pull requests: 18
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • nabenabe0928 (11)
  • RobbyW551 (5)
  • ravinkohli (4)
  • franchuterivera (3)
  • ArlindKadra (3)
  • shabir1 (2)
  • Yuang-Deng (2)
  • CHDNY (1)
  • jmrichardson (1)
  • Songenyu (1)
  • zym604 (1)
  • alirostami9972 (1)
  • LokeshBadisa (1)
  • LuciusMos (1)
  • bakirillov (1)
Pull Request Authors
  • ravinkohli (31)
  • nabenabe0928 (7)
  • dengdifan (5)
  • ArlindKadra (3)
  • theodorju (2)
  • franchuterivera (2)
  • marcelovca90 (2)
  • Borda (1)
  • dwoiwode (1)
  • na018 (1)
Top Labels
Issue Labels
enhancement (13) bug (9) fix-later (9) not urgent (6) Documentation (2) refactoring (2) needs-more-information (1)
Pull Request Labels
enhancement (6) bug (2) first priority (1) refactoring (1)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 181 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 9
  • Total maintainers: 4
proxy.golang.org: github.com/automl/Auto-PyTorch
  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 7.0%
Average: 8.2%
Dependent repos count: 9.3%
Last synced: 6 months ago
pypi.org: autopytorch

Auto-PyTorch searches neural architectures using smac

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 181 Last month
Rankings
Stargazers count: 1.5%
Forks count: 3.2%
Average: 9.3%
Dependent packages count: 10.1%
Downloads: 10.1%
Dependent repos count: 21.6%
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • ConfigSpace >=0.4.14,<0.5
  • catboost *
  • dask *
  • distributed >=2.2.0
  • flaky *
  • imgaug >=0.4.0
  • lightgbm *
  • lockfile *
  • numpy *
  • pandas *
  • pynisher >=0.6.3
  • pyrfr >=0.7,<0.9
  • scikit-learn >=0.24.0,<0.25.0
  • scipy >=1.7
  • smac ==0.14.0
  • tabulate *
  • tensorboard *
  • torch *
  • torchvision *
.github/workflows/dist.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/docker-publish.yml actions
  • actions/checkout v2 composite
  • docker/build-push-action ad44023a93711e3deb337508980b4b5e9bcdc5dc composite
  • docker/login-action f054a8b539a109f9f41c372932f1ae047eff08c9 composite
  • docker/metadata-action 98669ae865ea3cffbcbaa878cf57c20bbf1c6c38 composite
.github/workflows/docs.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/long_regression_test.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/pre-commit.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/pytest.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v1 composite
.github/workflows/release.yml actions
  • actions/checkout master composite
  • actions/setup-python v2 composite
  • pypa/gh-action-pypi-publish master composite
Dockerfile docker
  • ubuntu 20.04 build
.binder/requirements.txt pypi
setup.py pypi