AMLTK

AMLTK: A Modular AutoML Toolkit in Python - Published in JOSS (2024)

https://github.com/automl/amltk

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README and JOSS metadata
✓ Academic publication links: Links to joss.theoj.org
✓ Committers with academic emails: 2 of 13 committers (15.4%) from academic institutions
○ Institutional organization owner
✓ JOSS paper metadata: Published in Journal of Open Source Software

Keywords from Contributors

mesh

Scientific Fields

Artificial Intelligence and Machine Learning Computer Science - 40% confidence

Last synced: 6 months ago

Repository

A build-it-yourself AutoML Framework

Basic Info

Host: GitHub
Owner: automl
License: bsd-3-clause
Language: Python
Default Branch: main
Homepage: https://automl.github.io/amltk/
Size: 27.8 MB

Statistics

Stars: 72
Watchers: 10
Forks: 6
Open Issues: 34
Releases: 24

Created over 3 years ago · Last pushed over 1 year ago

Metadata Files

Readme Changelog Contributing License Code of conduct Citation

AutoML Toolkit

A framework for building an AutoML system. The toolkit is modular and extensible, so you can easily swap out components and integrate your own. It is designed to be used in a variety of ways, whether for research, for building your own AutoML tool, or for education.

We focus on three things: building complex parametrized pipelines easily, providing tools to optimize these pipeline parameters, and providing tools to schedule compute tasks on a variety of compute backends. Swapping out any one of these should not require refactoring everything else.

The goal of this toolkit is to drive innovation for AutoML Systems by:

1. Allowing concise research artifacts that can study different design decisions in AutoML.
2. Enabling simple prototypes to scale to the compute you have available.
3. Providing a framework for building real and robust AutoML Systems that are extensible by design.

Please check out our documentation for more:

* Documentation - The homepage
* Guides - How to use the Pipelines, Optimizers and Schedulers in a walkthrough fashion.
* Reference - A short-overview reference for the various components of the toolkit.
* Examples - A collection of examples for using the toolkit in different ways.
* API - The full API reference for the toolkit.

Installation

To install AutoML Toolkit (amltk), you can simply use pip:

```bash
pip install amltk
```

[!TIP] We also provide a list of optional dependencies which you can install if you intend to use them. This allows the toolkit to be as lightweight as possible and play nicely with the tools you use.

* pip install "amltk[notebook]" - For usage in a notebook
* pip install "amltk[sklearn]" - For usage with scikit-learn
* pip install "amltk[smac]" - For using SMAC as an optimizer
* pip install "amltk[optuna]" - For using Optuna as an optimizer
* pip install "amltk[pynisher,threadpoolctl,wandb]" - Various plugins for running compute tasks
* pip install "amltk[cluster,dask,loky]" - Different compute backends to run from
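To see which of these optional integrations are importable in the current environment, you can probe for their modules. This is a minimal standard-library sketch and not part of amltk; the mapping from extra name to import name is an assumption based on each package's usual import name.

```python
from importlib.util import find_spec

# Assumed import names for some of the optional extras listed above.
OPTIONAL_MODULES = {
    "sklearn": "sklearn",
    "smac": "smac",
    "optuna": "optuna",
    "pynisher": "pynisher",
    "threadpoolctl": "threadpoolctl",
    "wandb": "wandb",
    "dask": "dask",
    "loky": "loky",
}


def available_extras(modules: dict[str, str] = OPTIONAL_MODULES) -> dict[str, bool]:
    """Map each extra name to whether its top-level module can be found."""
    return {extra: find_spec(module) is not None for extra, module in modules.items()}


if __name__ == "__main__":
    for extra, found in available_extras().items():
        print(f"{extra:15s} {'installed' if found else 'missing'}")
```

Any extra reported as missing can then be installed with the matching pip command from the list.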

Install from source

To install from source, you can clone this repo and install with pip:

```bash
git clone git@github.com:automl/amltk.git
pip install -e amltk  # -e for editable mode
```

If you plan to contribute, you can install the development dependencies, but we highly recommend checking out our contributing guide first.

```bash
pip install -e "amltk[dev]"
```

Features

Here's a brief overview of 3 of the core components from the toolkit:

Pipelines

Define parametrized machine learning pipelines using a fluent API:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

from amltk.pipeline import Choice, Component, Sequential

pipeline = (
    Sequential(name="my_pipeline")
    >> Component(SimpleImputer, space={"strategy": ["mean", "median"]})  # Choose either mean or median
    >> OneHotEncoder(drop="first")  # No parametrization, no problem
    >> Choice(
        # Our pipeline can choose between two different estimators
        Component(
            RandomForestClassifier,
            space={"n_estimators": (10, 100), "criterion": ["gini", "log_loss"]},
            config={"max_depth": 3},
        ),
        Component(SVC, space={"kernel": ["linear", "rbf", "poly"]}),
        name="estimator",
    )
)

# Parse the search space with a built-in parser or your own custom parser
search_space = pipeline.search_space(parser="configspace")
config = search_space.sample_configuration()

# Configure a pipeline
configured_pipeline = pipeline.configure(config)

# Build the pipeline with a builder, no amltk code in your built model
model = configured_pipeline.build(builder="sklearn")
```
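To make the idea of a parametrized pipeline with a choice more concrete, here is a standard-library sketch of sampling one configuration from a nested search space. It is not how amltk or ConfigSpace work internally. The `SPACE` dictionary, the helper names, and the convention that a list is categorical while a `(low, high)` tuple is an inclusive integer range are all assumptions made for illustration.

```python
import random
from typing import Any

# Hypothetical, simplified stand-in for the pipeline's search space.
SPACE = {
    "SimpleImputer": {"strategy": ["mean", "median"]},
    "estimator": {  # exactly one of these candidates is chosen
        "RandomForestClassifier": {
            "n_estimators": (10, 100),
            "criterion": ["gini", "log_loss"],
        },
        "SVC": {"kernel": ["linear", "rbf", "poly"]},
    },
}


def sample_params(space: dict[str, Any], rng: random.Random) -> dict[str, Any]:
    """Sample one value for every hyperparameter in a flat space."""
    params = {}
    for name, domain in space.items():
        if isinstance(domain, tuple):
            low, high = domain
            params[name] = rng.randint(low, high)  # both ends inclusive
        else:
            params[name] = rng.choice(domain)
    return params


def sample_config(rng: random.Random) -> dict[str, Any]:
    """Pick an estimator first, then sample only that estimator's parameters."""
    chosen = rng.choice(sorted(SPACE["estimator"]))
    return {
        "SimpleImputer": sample_params(SPACE["SimpleImputer"], rng),
        "estimator": chosen,
        "estimator_params": sample_params(SPACE["estimator"][chosen], rng),
    }


config = sample_config(random.Random(0))
print(config)
```

The point of the sketch is that hyperparameters of the estimator that was not chosen never appear in the sampled configuration, which is the kind of conditional structure a `Choice` expresses.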

Optimizers

Optimize your pipelines using a variety of different optimizers, with a unified API and a suite of utilities for recording and taking control of the optimization loop:

```python
from amltk.optimization import Trial, Metric, History

pipeline = ...
accuracy = Metric("accuracy", maximize=True, bounds=(0, 1))
inference_time = Metric("inference_time", maximize=False)

def evaluate(trial: Trial) -> Trial.Report:
    model = pipeline.configure(trial.config).build("sklearn")

    try:
        # Profile the things you'd like
        with trial.profile("fit"):
            model.fit(...)

    except Exception as e:
        # Generate reports from exceptions easily
        return trial.fail(exception=e)

    # Record anything else you'd like
    trial.summary["model_size"] = ...

    # Store whatever you'd like
    predictions = model.predict(...)
    trial.store({"model.pkl": model, "predictions.npy": predictions})
    return trial.success(accuracy=0.8, inference_time=...)

# Easily swap between optimizers, without needing to change the rest of your code
from amltk.optimization.optimizers.smac import SMACOptimizer
from amltk.optimization.optimizers.optuna import OptunaOptimizer
import random

Optimizer = random.choice([SMACOptimizer, OptunaOptimizer])
optimizer = Optimizer(space=pipeline, metrics=[accuracy, inference_time], bucket="results")

# You decide how your optimization loop should work
history = History()
for _ in range(10):
    trial = optimizer.ask()
    report = evaluate(trial)
    history.add(report)
    optimizer.tell(report)

print(history.df())
```

[!TIP] Check out our integrated optimizers or integrate your own using the very same API we use!
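The ask-and-tell loop used above can be illustrated without any optimizer library. The following sketch is a toy random search over a single integer hyperparameter. The `RandomSearch` class, its `best` method, and the `evaluate` objective are hypothetical and do not reflect amltk's optimizer base class or its `Trial` and `Report` types.

```python
import random
from dataclasses import dataclass, field


@dataclass
class RandomSearch:
    """Toy ask-and-tell optimizer over one integer hyperparameter."""

    low: int
    high: int
    seed: int = 0
    history: list[tuple[int, float]] = field(default_factory=list)

    def __post_init__(self) -> None:
        self._rng = random.Random(self.seed)

    def ask(self) -> int:
        """Propose the next configuration to evaluate."""
        return self._rng.randint(self.low, self.high)

    def tell(self, config: int, score: float) -> None:
        """Record the outcome of an evaluated configuration."""
        self.history.append((config, score))

    def best(self) -> tuple[int, float]:
        """Return the (config, score) pair with the highest score so far."""
        return max(self.history, key=lambda item: item[1])


def evaluate(n_estimators: int) -> float:
    # Hypothetical objective that peaks at n_estimators == 60.
    return 1.0 - abs(n_estimators - 60) / 100


optimizer = RandomSearch(low=10, high=100)
for _ in range(10):
    config = optimizer.ask()
    optimizer.tell(config, evaluate(config))

best_config, best_score = optimizer.best()
print(best_config, best_score)
```

Because the loop only depends on `ask` and `tell`, a smarter optimizer exposing the same two methods could replace `RandomSearch` without touching the loop.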

Scheduling

Schedule your optimization jobs or AutoML tasks on a variety of different compute backends. By leveraging compute workers and asyncio, you can easily scale your compute needs, react to events as they happen and swap backends, without needing to modify your code!

```python
from amltk.scheduling import Scheduler

# Create a Scheduler with a backend, here 4 processes
scheduler = Scheduler.with_processes(4)
# scheduler = Scheduler.with_SLURM(...)
# scheduler = Scheduler.with_OAR(...)
# scheduler = Scheduler(executor=my_own_compute_backend)

# Define some compute and wrap it as a task to offload to the scheduler
def expensive_function(x: int) -> float:
    return (2 ** x) / x  # x == 0 raises ZeroDivisionError

task = scheduler.task(expensive_function)

numbers = iter(range(-5, 5))  # an iterator, so next() can be called on it
results = []

# When the scheduler starts, submit 4 tasks to the processes
@scheduler.on_start(repeat=4)
def on_start():
    n = next(numbers)
    task.submit(n)

# When the task is done, store the result
@task.on_result
def on_result(_, result: float):
    results.append(result)

# Easy to incrementally add more functionality
@task.on_result
def launch_next(_, result: float):
    if (n := next(numbers, None)) is not None:
        task.submit(n)

# React to issues when they happen
@task.on_exception
def stop_something_went_wrong(_, exception: Exception):
    scheduler.stop()

# Start the scheduler and run it as you like
scheduler.run(timeout=10)
# ... await scheduler.async_run() for servers and real-time applications
```

[!TIP] Check out our integrated compute backends or use your own!
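The event-driven pattern above (fill the workers at start, submit the next item whenever a result arrives, react to exceptions) can be sketched with the standard library's `concurrent.futures`. This is not amltk's implementation. The `run_event_loop` helper is hypothetical, it uses threads rather than processes, and unlike the example above it records an exception and keeps going instead of stopping.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice


def expensive_function(x: int) -> float:
    return (2 ** x) / x  # raises ZeroDivisionError when x == 0


def run_event_loop(values, workers: int = 4):
    """Keep up to `workers` tasks in flight until `values` is exhausted."""
    numbers = iter(values)
    results: list[float] = []
    errors: list[BaseException] = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # "On start": submit one task per worker.
        pending = {pool.submit(expensive_function, n) for n in islice(numbers, workers)}
        while pending:
            done, pending = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                error = future.exception()
                if error is None:
                    results.append(future.result())  # "on result"
                else:
                    errors.append(error)  # "on exception"
                # Launch the next task, if any input is left.
                n = next(numbers, None)
                if n is not None:
                    pending.add(pool.submit(expensive_function, n))
    return results, errors


results, errors = run_event_loop(range(-5, 5))
print(len(results), "results,", len(errors), "error(s)")
```

Results arrive in completion order, so the order of `results` is not guaranteed to match the input order.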

Extra Material

AutoML Fall School 2023 Colab

Owner

Name: AutoML-Freiburg-Hannover
Login: automl
Kind: organization
Location: Freiburg and Hannover, Germany

Website: www.automl.org
Repositories: 186
Profile: https://github.com/automl

JOSS Publication

AMLTK: A Modular AutoML Toolkit in Python

Published

August 14, 2024

DOI

10.21105/joss.06367

Volume 9, Issue 100, Page 6367

Authors

Edward Bergman

University of Freiburg, Germany

Matthias Feurer

LMU Munich, Germany, Munich Center for Machine Learning

Aron Bahram

University of Freiburg, Germany

Amir Rezaei Balef

University of Tübingen, Germany

Lennart Purucker

University of Freiburg, Germany

Sarah Segel

Leibniz University Hannover, Germany

Marius Lindauer

Leibniz University Hannover, Germany, L3S Research Center, Germany

Frank Hutter

University of Freiburg, Germany

Katharina Eggensperger

University of Tübingen, Germany

Editor

Josh Borrow

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Bergman
  given-names: Edward
  orcid: "https://orcid.org/0009-0003-4390-7614"
- family-names: Feurer
  given-names: Matthias
  orcid: "https://orcid.org/0000-0001-9611-8588"
- family-names: Bahram
  given-names: Aron
  orcid: "https://orcid.org/0009-0002-8896-2863"
- family-names: Balef
  given-names: Amir Rezaei
  orcid: "https://orcid.org/0000-0002-6882-0051"
- family-names: Purucker
  given-names: Lennart
  orcid: "https://orcid.org/0009-0001-1181-0549"
- family-names: Segel
  given-names: Sarah
  orcid: "https://orcid.org/0009-0005-2966-266X"
- family-names: Lindauer
  given-names: Marius
  orcid: "https://orcid.org/0000-0002-9675-3175"
- family-names: Hutter
  given-names: Frank
  orcid: "https://orcid.org/0000-0002-2037-3694"
- family-names: Eggensperger
  given-names: Katharina
  orcid: "https://orcid.org/0000-0002-0309-401X"
contact:
- family-names: Bergman
  given-names: Edward
  orcid: "https://orcid.org/0009-0003-4390-7614"
- family-names: Feurer
  given-names: Matthias
  orcid: "https://orcid.org/0000-0001-9611-8588"
- family-names: Lindauer
  given-names: Marius
  orcid: "https://orcid.org/0000-0002-9675-3175"
- family-names: Hutter
  given-names: Frank
  orcid: "https://orcid.org/0000-0002-2037-3694"
- family-names: Eggensperger
  given-names: Katharina
  orcid: "https://orcid.org/0000-0002-0309-401X"
doi: 10.5281/zenodo.13309537
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Bergman
    given-names: Edward
    orcid: "https://orcid.org/0009-0003-4390-7614"
  - family-names: Feurer
    given-names: Matthias
    orcid: "https://orcid.org/0000-0001-9611-8588"
  - family-names: Bahram
    given-names: Aron
    orcid: "https://orcid.org/0009-0002-8896-2863"
  - family-names: Balef
    given-names: Amir Rezaei
    orcid: "https://orcid.org/0000-0002-6882-0051"
  - family-names: Purucker
    given-names: Lennart
    orcid: "https://orcid.org/0009-0001-1181-0549"
  - family-names: Segel
    given-names: Sarah
    orcid: "https://orcid.org/0009-0005-2966-266X"
  - family-names: Lindauer
    given-names: Marius
    orcid: "https://orcid.org/0000-0002-9675-3175"
  - family-names: Hutter
    given-names: Frank
    orcid: "https://orcid.org/0000-0002-2037-3694"
  - family-names: Eggensperger
    given-names: Katharina
    orcid: "https://orcid.org/0000-0002-0309-401X"
  date-published: 2024-08-14
  doi: 10.21105/joss.06367
  issn: 2475-9066
  issue: 100
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6367
  title: "AMLTK: A Modular AutoML Toolkit in Python"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.06367"
  volume: 9
title: "AMLTK: A Modular AutoML Toolkit in Python"
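For reference managers that expect BibTeX, the `preferred-citation` fields above can be turned into an `@article` entry. This is a small sketch using only values from the CITATION.cff content shown here. The `to_bibtex` helper and the citation key `bergman2024amltk` are hypothetical, and the field mapping (for example `issue` to `number` and `start` to `pages`) is one reasonable choice rather than an official conversion.

```python
# Authors in the order given under preferred-citation (family name, given names).
AUTHORS = [
    ("Bergman", "Edward"),
    ("Feurer", "Matthias"),
    ("Bahram", "Aron"),
    ("Balef", "Amir Rezaei"),
    ("Purucker", "Lennart"),
    ("Segel", "Sarah"),
    ("Lindauer", "Marius"),
    ("Hutter", "Frank"),
    ("Eggensperger", "Katharina"),
]


def to_bibtex(key: str = "bergman2024amltk") -> str:
    """Build a BibTeX @article entry from the preferred-citation fields."""
    fields = {
        "author": " and ".join(f"{family}, {given}" for family, given in AUTHORS),
        "title": "AMLTK: A Modular AutoML Toolkit in Python",
        "journal": "Journal of Open Source Software",
        "publisher": "Open Journals",
        "year": "2024",
        "volume": "9",
        "number": "100",
        "pages": "6367",
        "doi": "10.21105/joss.06367",
        "url": "https://joss.theoj.org/papers/10.21105/joss.06367",
    }
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"


print(to_bibtex())
```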

GitHub Events

Total

Issues event: 1
Watch event: 11
Member event: 2
Issue comment event: 8
Push event: 1
Pull request review event: 2
Pull request review comment event: 1
Pull request event: 2
Fork event: 2

Last Year

Issues event: 1
Watch event: 11
Member event: 2
Issue comment event: 8
Push event: 1
Pull request review event: 2
Pull request review comment event: 1
Pull request event: 2
Fork event: 2

Committers

Last synced: 7 months ago

All Time

Total Commits: 535
Total Committers: 13
Avg Commits per committer: 41.154
Development Distribution Score (DDS): 0.136

Past Year

Commits: 4
Committers: 3
Avg Commits per committer: 1.333
Development Distribution Score (DDS): 0.5

Top Committers

Name	Email	Commits
eddiebergman	e**s@g**m	462
github-actions[bot]	g****]	42
dependabot[bot]	4****]	8
Vladislav Moroshan	v**n@g**m	5
Aron Bahram	a**m@g**m	4
Matthias Feurer	l**s@m**e	4
Pieter Gijsbers	p**s@t**l	3
Ravin Kohli	1****i	2
Sarah Segel	3****l	1
Lennart Purucker	c**t@l**m	1
Katharina Eggensperger	k**r@u**e	1
Benjamin Rombaut	b**t@g**m	1
Amir Rezaei Balef	a**f@g**m	1
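The all-time figures reported above can be reproduced from the per-committer commit counts in this table. The sketch below assumes the Development Distribution Score is one minus the top committer's share of all commits. That definition is not stated on this page, although it matches the reported value. The `summarize` helper is hypothetical.

```python
# Commit counts per committer, copied from the table above.
COMMITS = [462, 42, 8, 5, 4, 4, 3, 2, 1, 1, 1, 1, 1]


def summarize(commits: list[int]) -> tuple[int, float, float]:
    """Return (total commits, average per committer, assumed DDS)."""
    total = sum(commits)
    average = total / len(commits)
    dds = 1 - max(commits) / total  # assumed definition of DDS
    return total, average, dds


total, average, dds = summarize(COMMITS)
print(total, round(average, 3), round(dds, 3))
```

A score close to 0 means a single committer accounts for almost all commits, which is the case here with 462 of 535.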

Committer Domains (Top 20 + Academic)

uni-tuebingen.de: 1 lennart-purucker.com: 1 tue.nl: 1 matthiasfeurer.de: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 40
Total pull requests: 67
Average time to close issues: about 1 month
Average time to close pull requests: 3 days
Total issue authors: 7
Total pull request authors: 7
Average comments per issue: 1.6
Average comments per pull request: 0.57
Merged pull requests: 58
Bot issues: 0
Bot pull requests: 7

Past Year

Issues: 2
Pull requests: 2
Average time to close issues: about 1 month
Average time to close pull requests: about 23 hours
Issue authors: 2
Pull request authors: 2
Average comments per issue: 3.5
Average comments per pull request: 2.5
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

eddiebergman (28)
gomezzz (6)
vladislavalerievich (1)
LennartPurucker (1)
berombau (1)
dependabot[bot] (1)
amirbalef (1)

Pull Request Authors

eddiebergman (89)
vladislavalerievich (10)
dependabot[bot] (8)
LennartPurucker (2)
berombau (2)
PGijsbers (2)
sarah-segel (1)

Top Labels

Issue Labels

bug (10) feature (7) ux (5) documentation (4) ci (2) decision (2) test (1) blocked (1) refactor (1) dependencies (1) python (1)

Pull Request Labels

feature (27) bug (18) ux (12) dependencies (8) python (6) documentation (6) ci (5) chore (4) deprication (4) refactor (2) github_actions (2) invalid (1)

Packages

Total packages: 1
Total downloads:
- pypi 118 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 23
Total maintainers: 1

pypi.org: amltk

AutoML Toolkit: a toolkit for building automl system

Documentation: https://amltk.readthedocs.io/
License: Copyright 2023 AutoML-Freiburg-Hannover-Tübingen Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Latest release: 1.12.1
published over 1 year ago

Versions: 23
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 118 Last month

Rankings

Dependent packages count: 10.0%

Average: 38.8%

Dependent repos count: 67.6%

Maintainers (1)

eddiebergman

Last synced: 6 months ago

Dependencies

.github/workflows/docs.yml actions

actions/cache v3 composite
actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/pre-commit.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

.github/workflows/test.yml actions

actions/checkout v3 composite
actions/setup-python v4 composite

pyproject.toml pypi

more_itertools *
numpy *
pandas *
psutil *
typing_extensions *

.github/workflows/release.yml actions

actions/checkout v3 composite
actions/download-artifact v3 composite
actions/setup-python v4 composite
actions/upload-artifact v3 composite
commitizen-tools/commitizen-action master composite
pypa/gh-action-pypi-publish release/v1 composite
