AMLTK
AMLTK: A Modular AutoML Toolkit in Python - Published in JOSS (2024)
Science Score: 100.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to joss.theoj.org
- ✓ Committers with academic emails: 2 of 13 committers (15.4%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
A build-it-yourself AutoML Framework
Basic Info
- Host: GitHub
- Owner: automl
- License: bsd-3-clause
- Language: Python
- Default Branch: main
- Homepage: https://automl.github.io/amltk/
- Size: 27.8 MB
Statistics
- Stars: 72
- Watchers: 10
- Forks: 6
- Open Issues: 34
- Releases: 24
Metadata Files
README.md
AutoML Toolkit
A framework for building an AutoML system. The toolkit is modular and extensible, so you can swap out components and integrate your own. It can be used for research, for building your own AutoML tool, or for teaching.
We focus on three things: building complex parametrized pipelines easily, providing tools to optimize the parameters of these pipelines, and providing tools to schedule compute tasks on a variety of compute backends. Swapping out any one of these does not require refactoring the rest.
The goal of this toolkit is to drive innovation for AutoML systems by:
1. Allowing concise research artifacts that can study different design decisions in AutoML.
2. Enabling simple prototypes to scale to the compute you have available.
3. Providing a framework for building real and robust AutoML systems that are extensible by design.
Please check out our documentation for more:
* Documentation - The homepage
* Guides - How to use the Pipelines, Optimizers and Schedulers in a walkthrough fashion.
* Reference - A short-overview reference for the various components of the toolkit.
* Examples - A collection of examples for using the toolkit in different ways.
* API - The full API reference for the toolkit.
Installation
To install AutoML Toolkit (amltk), you can simply use pip:
```bash
pip install amltk
```
[!TIP] We also provide a list of optional dependencies which you can install if you intend to use them. This allows the toolkit to be as lightweight as possible and play nicely with the tools you use.
* pip install amltk[notebook] - For usage in a notebook
* pip install amltk[sklearn] - For usage with scikit-learn
* pip install amltk[smac] - For using SMAC as an optimizer
* pip install amltk[optuna] - For using Optuna as an optimizer
* pip install amltk[pynisher, threadpoolctl, wandb] - Various plugins for running compute tasks
* pip install amltk[cluster, dask, loky] - Different compute backends to run from
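Because the extras are optional, it can help to check which of the underlying packages are already importable before choosing an optimizer or backend. The following is a minimal sketch using only the standard library. The names `EXTRA_TO_MODULE` and `available_extras` are hypothetical helpers, not part of AMLTK, and the mapping assumes that each listed extra installs a module of the same name.

```python
from importlib.util import find_spec

# Assumption: each extra makes a top-level module of the same name importable.
EXTRA_TO_MODULE = {
    "sklearn": "sklearn",
    "smac": "smac",
    "optuna": "optuna",
    "dask": "dask",
    "wandb": "wandb",
}


def available_extras(mapping: dict[str, str]) -> dict[str, bool]:
    """Report which optional modules can be imported in this environment."""
    return {extra: find_spec(module) is not None for extra, module in mapping.items()}


for extra, present in available_extras(EXTRA_TO_MODULE).items():
    status = "installed" if present else f"missing (pip install amltk[{extra}])"
    print(f"{extra:10s} {status}")
```

`find_spec` only looks the module up and does not import it, so the check stays cheap even for heavy packages.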
Install from source
To install from source, you can clone this repo and install with pip:
```bash
git clone git@github.com:automl/amltk.git
pip install -e amltk  # -e for editable mode
```
If planning to contribute, you can install the development dependencies but we highly recommend checking out our contributing guide for more.
```bash
pip install -e "amltk[dev]"
```
Features
Here's a brief overview of 3 of the core components from the toolkit:
Pipelines
Define parametrized machine learning pipelines using a fluent API:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder
from sklearn.svm import SVC

from amltk.pipeline import Choice, Component, Sequential

pipeline = (
    Sequential(name="my_pipeline")
    >> Component(SimpleImputer, space={"strategy": ["mean", "median"]})  # Choose either mean or median
    >> OneHotEncoder(drop="first")  # No parametrization, no problem
    >> Choice(
        # Our pipeline can choose between two different estimators
        Component(
            RandomForestClassifier,
            space={"n_estimators": (10, 100), "criterion": ["gini", "log_loss"]},
            config={"max_depth": 3},
        ),
        Component(SVC, space={"kernel": ["linear", "rbf", "poly"]}),
        name="estimator",
    )
)

# Parse the search space with a built-in parser or your own custom parser
search_space = pipeline.search_space(parser="configspace")
config = search_space.sample_configuration()

# Configure a pipeline
configured_pipeline = pipeline.configure(config)

# Build the pipeline with a builder, no amltk code in your built model
model = configured_pipeline.build(builder="sklearn")
```
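To make the `space=` and `config=` shorthand more concrete, here is a plain-Python sketch of how such a space could be sampled. It assumes that a tuple denotes a numeric range and a list denotes a set of categorical choices, which is how the example above reads. `sample_space` is a hypothetical helper for illustration and is not AMLTK's parser, which delegates to libraries such as ConfigSpace.

```python
import random
from typing import Any


def sample_space(space: dict[str, Any], rng: random.Random) -> dict[str, Any]:
    """Draw one configuration from a space written in the README's shorthand."""
    config = {}
    for name, domain in space.items():
        if isinstance(domain, tuple):
            low, high = domain
            if isinstance(low, int) and isinstance(high, int):
                config[name] = rng.randint(low, high)  # inclusive integer range
            else:
                config[name] = rng.uniform(low, high)  # continuous range
        elif isinstance(domain, list):
            config[name] = rng.choice(domain)  # categorical choice
        else:
            config[name] = domain  # anything else is treated as a constant
    return config


rng = random.Random(0)
space = {"n_estimators": (10, 100), "criterion": ["gini", "log_loss"]}
fixed = {"max_depth": 3}

# Fixed settings are merged with the sampled ones, mirroring `config=` above.
config = {**fixed, **sample_space(space, rng)}
print(config)
```

The sampled dictionary has the same shape as the keyword arguments that `RandomForestClassifier` accepts, which is what lets a configured pipeline be built into a plain scikit-learn model.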
Optimizers
Optimize your pipelines using a variety of different optimizers, with a unified API and a suite of utilities for recording and taking control of the optimization loop:
```python
import random

from amltk.optimization import History, Metric, Trial

pipeline = ...
accuracy = Metric("accuracy", maximize=True, bounds=(0, 1))
inference_time = Metric("inference_time", maximize=False)

def evaluate(trial: Trial) -> Trial.Report:
    model = pipeline.configure(trial.config).build("sklearn")

    try:
        # Profile the things you'd like
        with trial.profile("fit"):
            model.fit(...)
        predictions = model.predict(...)
    except Exception as e:
        # Generate reports from exceptions easily
        return trial.fail(exception=e)

    # Record anything else you'd like
    trial.summary["model_size"] = ...

    # Store whatever you'd like
    trial.store({"model.pkl": model, "predictions.npy": predictions})
    return trial.success(accuracy=0.8, inference_time=...)

# Easily swap between optimizers, without needing to change the rest of your code
from amltk.optimization.optimizers.optuna import OptunaOptimizer
from amltk.optimization.optimizers.smac import SMACOptimizer

Optimizer = random.choice([SMACOptimizer, OptunaOptimizer])
optimizer = Optimizer(space=pipeline, metrics=[accuracy, inference_time], bucket="results")

# You decide how your optimization loop should work
history = History()
for _ in range(10):
    trial = optimizer.ask()
    report = evaluate(trial)
    history.add(report)
    optimizer.tell(report)

print(history.df())
```
[!TIP] Check out our integrated optimizers or integrate your own using the very same API we use!
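The ask/tell loop above is a general pattern: the optimizer proposes a trial, you evaluate it however you like, and you report back. The sketch below shows that contract with a toy random search written in plain Python. `ToyTrial`, `ToyReport` and `RandomSearch` are hypothetical stand-ins for illustration only. They do not subclass or reproduce AMLTK's `Optimizer`, `Trial` or `Trial.Report`, and the objective is made up.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyTrial:
    name: str
    config: dict


@dataclass
class ToyReport:
    trial: ToyTrial
    score: float


class RandomSearch:
    """Hypothetical ask/tell optimizer; not AMLTK's Optimizer base class."""

    def __init__(self, choices: dict[str, list], seed: int = 0) -> None:
        self.choices = choices
        self.rng = random.Random(seed)
        self.reports: list[ToyReport] = []

    def ask(self) -> ToyTrial:
        config = {k: self.rng.choice(v) for k, v in self.choices.items()}
        return ToyTrial(name=f"trial-{len(self.reports)}", config=config)

    def tell(self, report: ToyReport) -> None:
        # Random search ignores feedback; a model-based optimizer would update here.
        self.reports.append(report)

    def best(self) -> ToyReport:
        return max(self.reports, key=lambda r: r.score)


def evaluate(trial: ToyTrial) -> ToyReport:
    # Made-up objective: pretend "rbf" with C=1.0 is the best setting.
    score = (trial.config["kernel"] == "rbf") + (trial.config["C"] == 1.0)
    return ToyReport(trial=trial, score=float(score))


optimizer = RandomSearch({"kernel": ["linear", "rbf", "poly"], "C": [0.1, 1.0, 10.0]})
for _ in range(10):
    trial = optimizer.ask()
    optimizer.tell(evaluate(trial))

best = optimizer.best()
print(best.trial.name, best.trial.config, best.score)
```

Because the loop only touches `ask` and `tell`, replacing `RandomSearch` with a smarter strategy leaves the evaluation code untouched, which is the property the unified optimizer API is built around.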
Scheduling
Schedule your optimization jobs or AutoML tasks on a variety of different compute backends. By leveraging compute workers and asyncio, you can easily scale your compute needs, react to events as they happen and swap backends, without needing to modify your code!
```python
from amltk.scheduling import Scheduler

# Create a Scheduler with a backend, here 4 processes
scheduler = Scheduler.with_processes(4)
# scheduler = Scheduler.with_SLURM(...)
# scheduler = Scheduler.with_OAR(...)
# scheduler = Scheduler(executor=my_own_compute_backend)

# Define some compute and wrap it as a task to offload to the scheduler
def expensive_function(x: int) -> float:
    return (2 ** x) / x

task = scheduler.task(expensive_function)

# Wrap the range in iter() so that next() can draw from it
numbers = iter(range(-5, 5))
results = []

# When the scheduler starts, submit 4 tasks to the processes
@scheduler.on_start(repeat=4)
def on_start():
    n = next(numbers)
    task.submit(n)

# When the task is done, store the result
@task.on_result
def on_result(_, result: float):
    results.append(result)

# Easy to incrementally add more functionality
@task.on_result
def launch_next(_, result: float):
    if (n := next(numbers, None)) is not None:
        task.submit(n)

# React to issues when they happen
@task.on_exception
def stop_something_went_wrong(_, exception: Exception):
    scheduler.stop()

# Start the scheduler and run it as you like
scheduler.run(timeout=10)
# ... await scheduler.async_run() for servers and real-time applications
```
[!TIP] Check out our integrated compute backends or use your own!
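For readers who have not worked with executor-style backends, the result and exception handling above can be approximated with the standard library alone. This is a sketch under stated assumptions, not how AMLTK's `Scheduler` is implemented: it uses threads instead of processes to stay self-contained, it handles results in a loop rather than through registered callbacks, and it records failures instead of stopping. Note that `expensive_function` raises `ZeroDivisionError` for `x == 0`, which is the kind of event an exception handler exists for.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def expensive_function(x: int) -> float:
    return (2 ** x) / x  # raises ZeroDivisionError for x == 0


results: dict[int, float] = {}
errors: dict[int, Exception] = {}

with ThreadPoolExecutor(max_workers=4) as executor:
    # Submit every input up front and remember which future belongs to which input.
    futures = {executor.submit(expensive_function, n): n for n in range(-5, 5)}

    for future in as_completed(futures):
        n = futures[future]
        exception = future.exception()
        if exception is not None:
            errors[n] = exception  # the "on exception" branch
        else:
            results[n] = future.result()  # the "on result" branch

print(f"{len(results)} results, failed inputs: {sorted(errors)}")
# prints "9 results, failed inputs: [0]"
```

An event-driven scheduler expresses the same two branches as separate handlers, so new reactions can be added without rewriting a central loop.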
Extra Material
Owner
- Name: AutoML-Freiburg-Hannover
- Login: automl
- Kind: organization
- Location: Freiburg and Hannover, Germany
- Website: www.automl.org
- Repositories: 186
- Profile: https://github.com/automl
JOSS Publication
AMLTK: A Modular AutoML Toolkit in Python
Authors
Tags
Machine Learning, AutoML, Hyperparameter Optimization, Modular, Data Science
Citation (CITATION.cff)
cff-version: "1.2.0"
authors:
- family-names: Bergman
  given-names: Edward
  orcid: "https://orcid.org/0009-0003-4390-7614"
- family-names: Feurer
  given-names: Matthias
  orcid: "https://orcid.org/0000-0001-9611-8588"
- family-names: Bahram
  given-names: Aron
  orcid: "https://orcid.org/0009-0002-8896-2863"
- family-names: Balef
  given-names: Amir Rezaei
  orcid: "https://orcid.org/0000-0002-6882-0051"
- family-names: Purucker
  given-names: Lennart
  orcid: "https://orcid.org/0009-0001-1181-0549"
- family-names: Segel
  given-names: Sarah
  orcid: "https://orcid.org/0009-0005-2966-266X"
- family-names: Lindauer
  given-names: Marius
  orcid: "https://orcid.org/0000-0002-9675-3175"
- family-names: Hutter
  given-names: Frank
  orcid: "https://orcid.org/0000-0002-2037-3694"
- family-names: Eggensperger
  given-names: Katharina
  orcid: "https://orcid.org/0000-0002-0309-401X"
contact:
- family-names: Bergman
  given-names: Edward
  orcid: "https://orcid.org/0009-0003-4390-7614"
- family-names: Feurer
  given-names: Matthias
  orcid: "https://orcid.org/0000-0001-9611-8588"
- family-names: Lindauer
  given-names: Marius
  orcid: "https://orcid.org/0000-0002-9675-3175"
- family-names: Hutter
  given-names: Frank
  orcid: "https://orcid.org/0000-0002-2037-3694"
- family-names: Eggensperger
  given-names: Katharina
  orcid: "https://orcid.org/0000-0002-0309-401X"
doi: 10.5281/zenodo.13309537
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Bergman
    given-names: Edward
    orcid: "https://orcid.org/0009-0003-4390-7614"
  - family-names: Feurer
    given-names: Matthias
    orcid: "https://orcid.org/0000-0001-9611-8588"
  - family-names: Bahram
    given-names: Aron
    orcid: "https://orcid.org/0009-0002-8896-2863"
  - family-names: Balef
    given-names: Amir Rezaei
    orcid: "https://orcid.org/0000-0002-6882-0051"
  - family-names: Purucker
    given-names: Lennart
    orcid: "https://orcid.org/0009-0001-1181-0549"
  - family-names: Segel
    given-names: Sarah
    orcid: "https://orcid.org/0009-0005-2966-266X"
  - family-names: Lindauer
    given-names: Marius
    orcid: "https://orcid.org/0000-0002-9675-3175"
  - family-names: Hutter
    given-names: Frank
    orcid: "https://orcid.org/0000-0002-2037-3694"
  - family-names: Eggensperger
    given-names: Katharina
    orcid: "https://orcid.org/0000-0002-0309-401X"
  date-published: 2024-08-14
  doi: 10.21105/joss.06367
  issn: 2475-9066
  issue: 100
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 6367
  title: "AMLTK: A Modular AutoML Toolkit in Python"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.06367"
  volume: 9
title: "AMLTK: A Modular AutoML Toolkit in Python"
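If you manage references in BibTeX, the preferred citation above can be turned into an entry with a few lines of Python. This is a sketch rather than an official export: the field values are copied from the metadata above, while the citation key `amltk2024`, the helper `to_bibtex`, and the mapping of `issue` to `number`, `start` to `pages` and `date-published` to `year` are assumptions made for illustration.

```python
# Fields copied from the preferred-citation block of CITATION.cff.
citation = {
    "authors": [
        ("Bergman", "Edward"), ("Feurer", "Matthias"), ("Bahram", "Aron"),
        ("Balef", "Amir Rezaei"), ("Purucker", "Lennart"), ("Segel", "Sarah"),
        ("Lindauer", "Marius"), ("Hutter", "Frank"), ("Eggensperger", "Katharina"),
    ],
    "title": "AMLTK: A Modular AutoML Toolkit in Python",
    "journal": "Journal of Open Source Software",
    "year": 2024,
    "volume": 9,
    "number": 100,
    "pages": 6367,
    "doi": "10.21105/joss.06367",
}


def to_bibtex(key: str, entry: dict) -> str:
    """Render a dictionary of citation fields as a BibTeX @article entry."""
    authors = " and ".join(f"{family}, {given}" for family, given in entry["authors"])
    fields = {"author": authors, **{k: v for k, v in entry.items() if k != "authors"}}
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"


bibtex = to_bibtex("amltk2024", citation)  # hypothetical citation key
print(bibtex)
```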
GitHub Events
Total
- Issues event: 1
- Watch event: 11
- Member event: 2
- Issue comment event: 8
- Push event: 1
- Pull request review event: 2
- Pull request review comment event: 1
- Pull request event: 2
- Fork event: 2
Last Year
- Issues event: 1
- Watch event: 11
- Member event: 2
- Issue comment event: 8
- Push event: 1
- Pull request review event: 2
- Pull request review comment event: 1
- Pull request event: 2
- Fork event: 2
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| eddiebergman | e****s@g****m | 462 |
| github-actions[bot] | g****] | 42 |
| dependabot[bot] | 4****] | 8 |
| Vladislav Moroshan | v****n@g****m | 5 |
| Aron Bahram | a****m@g****m | 4 |
| Matthias Feurer | l****s@m****e | 4 |
| Pieter Gijsbers | p****s@t****l | 3 |
| Ravin Kohli | 1****i | 2 |
| Sarah Segel | 3****l | 1 |
| Lennart Purucker | c****t@l****m | 1 |
| Katharina Eggensperger | k****r@u****e | 1 |
| Benjamin Rombaut | b****t@g****m | 1 |
| Amir Rezaei Balef | a****f@g****m | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 40
- Total pull requests: 67
- Average time to close issues: about 1 month
- Average time to close pull requests: 3 days
- Total issue authors: 7
- Total pull request authors: 7
- Average comments per issue: 1.6
- Average comments per pull request: 0.57
- Merged pull requests: 58
- Bot issues: 0
- Bot pull requests: 7
Past Year
- Issues: 2
- Pull requests: 2
- Average time to close issues: about 1 month
- Average time to close pull requests: about 23 hours
- Issue authors: 2
- Pull request authors: 2
- Average comments per issue: 3.5
- Average comments per pull request: 2.5
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- eddiebergman (28)
- gomezzz (6)
- vladislavalerievich (1)
- LennartPurucker (1)
- berombau (1)
- dependabot[bot] (1)
- amirbalef (1)
Pull Request Authors
- eddiebergman (89)
- vladislavalerievich (10)
- dependabot[bot] (8)
- LennartPurucker (2)
- berombau (2)
- PGijsbers (2)
- sarah-segel (1)
Packages
- Total packages: 1
- Total downloads: pypi 118 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 23
- Total maintainers: 1
pypi.org: amltk
AutoML Toolkit: a toolkit for building automl system
- Documentation: https://amltk.readthedocs.io/
- License: Copyright 2023 AutoML-Freiburg-Hannover-Tübingen Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
- Latest release: 1.12.1 (published over 1 year ago)
Maintainers (1)
Dependencies
- actions/cache v3 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- actions/checkout v3 composite
- actions/setup-python v4 composite
- more_itertools *
- numpy *
- pandas *
- psutil *
- typing_extensions *
- actions/checkout v3 composite
- actions/download-artifact v3 composite
- actions/setup-python v4 composite
- actions/upload-artifact v3 composite
- commitizen-tools/commitizen-action master composite
- pypa/gh-action-pypi-publish release/v1 composite
