amlb

OpenML AutoML Benchmarking Framework

https://github.com/openml/automlbenchmark

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
✓ Academic publication links: Links to arxiv.org
✓ Committers with academic emails: 7 of 31 committers (22.6%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (11.5%) to scientific vocabulary

Keywords

automl benchmark machine-learning

Keywords from Contributors

meta-learning tabular-data benchmarking datascience openml huggingface

Last synced: 6 months ago

Repository

OpenML AutoML Benchmarking Framework

Basic Info

Host: GitHub
Owner: openml
License: MIT
Language: Python
Default Branch: master
Homepage: https://openml.github.io/automlbenchmark
Size: 115 MB

Statistics

Stars: 432
Watchers: 15
Forks: 138
Open Issues: 117
Releases: 14

Topics

automl benchmark machine-learning

Created over 7 years ago · Last pushed 6 months ago

Metadata Files

Readme Contributing License Citation

AutoML Benchmark

The OpenML AutoML Benchmark provides a framework for evaluating and comparing open-source AutoML systems. It is extensible: you can add your own AutoML frameworks and datasets. For a thorough explanation of the benchmark and an evaluation of its results, see our paper.

Automatic Machine Learning (AutoML) systems build machine learning pipelines or neural architectures in a data-driven and objective way. They automate much of the drudge work of designing machine learning systems, so that better systems can be developed faster. However, AutoML research is slowed down by two factors:

We currently lack standardized, easily-accessible benchmarking suites of tasks (datasets) that are curated to reflect important problem domains, practical to use, and sufficiently challenging to support a rigorous analysis of performance results.
Subtle differences in the problem definition, such as the design of the hyperparameter search space or the way time budgets are defined, can drastically alter a task’s difficulty. This issue makes it difficult to reproduce published research and compare results from different papers.

This toolkit aims to address these problems by setting up standardized environments for in-depth experimentation with a wide range of AutoML systems.

Website: https://openml.github.io/automlbenchmark/index.html

Documentation: https://openml.github.io/automlbenchmark/docs/index.html

Installation: https://openml.github.io/automlbenchmark/docs/getting_started/

Features:

Curated suites of benchmarking datasets from OpenML (regression, classification).
Code to benchmark a number of popular AutoML systems on regression and classification tasks.
Support for adding new AutoML systems.
Experiments can be run in Docker or Singularity containers.
Experiments can be executed locally or on AWS.

Owner

Name: OpenML
Login: openml
Kind: organization
Email: openmlhq@googlegroups.com
Location: The Future

Website: http://www.openml.org
Twitter: open_ml
Repositories: 56
Profile: https://github.com/openml

Open, Networked Machine Learning

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
title: "AutoML Benchmark"
version: 2.1.7
license: "MIT"
url: "https://github.com/openml/automlbenchmark"
preferred-citation:
  type: article
  authors:
  - family-names: "Gijsbers"
    given-names: "Pieter"
    orcid: "https://orcid.org/0000-0001-7346-8075"
  - family-names: "de Paula Bueno"
    given-names: "Marcos"
  - family-names: "Coors"
    given-names: "Stefan"
    orcid: "https://orcid.org/0000-0001-7346-8075"
  - family-names: "LeDell"
    given-names: "Erin"
  - family-names: "Poirier"
    given-names: "Sébastien"
  - family-names: "Thomas"
    given-names: "Janek"
    orcid: "https://orcid.org/0000-0003-4511-6245"
  - family-names: "Bischl"
    given-names: "Bernd"
    orcid: "https://orcid.org/0000-0001-6002-6980"
  - family-names: "Vanschoren"
    given-names: "Joaquin"
    orcid: "https://orcid.org/0000-0001-7044-9805"
  journal: "Journal of Machine Learning Research"
  start: 1 # First page number
  end: 65 # Last page number
  title: "AMLB: an AutoML Benchmark"
  issue: 101
  volume: 25
  year: 2024
  url: http://jmlr.org/papers/v25/22-0493.html
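For readers who keep references in BibTeX, the preferred-citation fields above map naturally onto an `@article` entry. The following is a minimal sketch, not something shipped with the repository: the helper `to_bibtex` and the citation key `gijsbers2024amlb` are illustrative names, and the field values are copied by hand from the CITATION.cff block.

```python
# Fields copied by hand from the preferred-citation block above.
citation = {
    "authors": [
        ("Gijsbers", "Pieter"),
        ("de Paula Bueno", "Marcos"),
        ("Coors", "Stefan"),
        ("LeDell", "Erin"),
        ("Poirier", "Sébastien"),
        ("Thomas", "Janek"),
        ("Bischl", "Bernd"),
        ("Vanschoren", "Joaquin"),
    ],
    "title": "AMLB: an AutoML Benchmark",
    "journal": "Journal of Machine Learning Research",
    "volume": 25,
    "issue": 101,
    "start": 1,
    "end": 65,
    "year": 2024,
    "url": "http://jmlr.org/papers/v25/22-0493.html",
}


def to_bibtex(cit, key):
    """Render a citation dict as a BibTeX @article entry (hypothetical helper)."""
    # BibTeX separates authors with " and "; each is "Family, Given".
    authors = " and ".join(f"{family}, {given}" for family, given in cit["authors"])
    fields = [
        ("author", authors),
        ("title", cit["title"]),
        ("journal", cit["journal"]),
        ("volume", cit["volume"]),
        ("number", cit["issue"]),
        ("pages", f'{cit["start"]}--{cit["end"]}'),
        ("year", cit["year"]),
        ("url", cit["url"]),
    ]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields)
    return f"@article{{{key},\n{body}\n}}"


bibtex_entry = to_bibtex(citation, "gijsbers2024amlb")
print(bibtex_entry)
```

The CFF `issue` field is mapped to BibTeX's `number` field here; other reference managers may expect a different name.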

GitHub Events

Total

Issues event: 49
Watch event: 31
Delete event: 30
Issue comment event: 135
Push event: 113
Pull request review comment event: 23
Pull request review event: 39
Pull request event: 76
Fork event: 8
Create event: 31

Last Year

Issues event: 49
Watch event: 31
Delete event: 30
Issue comment event: 135
Push event: 113
Pull request review comment event: 23
Pull request review event: 39
Pull request event: 76
Fork event: 8
Create event: 31

Committers

Last synced: over 2 years ago

All Time

Total Commits: 1,157
Total Committers: 31
Avg Commits per committer: 37.323
Development Distribution Score (DDS): 0.537

Past Year

Commits: 43
Committers: 8
Avg Commits per committer: 5.375
Development Distribution Score (DDS): 0.209
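The committer figures above follow from simple ratios. The sketch below assumes that the Development Distribution Score is one minus the share of commits made by the most active committer. That definition is an assumption, not something stated on this page, but it reproduces the all-time value when combined with the 536 commits of the top committer listed under Top Committers. The function names are illustrative.

```python
def avg_commits_per_committer(total_commits, committers):
    """Mean number of commits per committer."""
    return total_commits / committers


def development_distribution_score(top_committer_commits, total_commits):
    # Assumed definition: 1 - share of commits by the most active committer.
    return 1 - top_committer_commits / total_commits


all_time_avg = avg_commits_per_committer(1157, 31)
past_year_avg = avg_commits_per_committer(43, 8)
all_time_dds = development_distribution_score(536, 1157)

print(round(all_time_avg, 3))   # 37.323
print(round(past_year_avg, 3))  # 5.375
print(round(all_time_dds, 3))   # 0.537
```

The past-year DDS cannot be checked the same way, because the page does not list per-committer commit counts for the past year.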

Top Committers

Name	Email	Commits
Sebastien Poirier	s**n@h**i	536
PGijsbers	p**s@t**l	391
Janek Thomas	j**s@w**e	68
ledell	e**n@h**i	52
Coorsaa	s**s@g**t	14
Piotrek	p**6@g**m	11
mwever	w**r@m**e	11
Joaquin Vanschoren	j**n@g**m	11
chico	f**e@g**m	9
Nick Erickson	n**k@a**m	7
Matthias Feurer	f**m@i**e	6
github-actions	g**s@g**m	6
wever	w**r@p**e	5
Eddie Bergman	e**s@g**m	4
Nick Erickson	i**a@g**m	3
Xiaoyun Zhang	b**g@g**m	3
Francisco Rivera Valverde	4****a	3
Qingyun Wu	q**y@v**u	2
Alan Silva	3****r	2
ja-thomas	j****s	2
Nikolay Nikitin	n**o@y**u	1
Oleksandr Shchur	o**r@g**m	1
LevineHuang	l**g@1**m	1
Nandini Nayar	n**9@c**u	1
a-hanf	a****f	1
TrellixVulnTeam	1****m	1
Weisu Yin	w**y@a**m	1
Oleksandr Shchur	s**o@a**m	1
Robinnibor	r**s@g**m	1
dev-rinchin	5****n	1
and 1 more...

Committer Domains (Top 20 + Academic)

amazon.com: 3, h2o.ai: 2, isys-otfml.cs.uni-paderborn.de: 1, cornell.edu: 1, 163.com: 1, yandex.ru: 1, virginia.edu: 1, pc-kb-felix.cs.uni-paderborn.de: 1, github.com: 1, informatik.uni-freiburg.de: 1, mail.uni-paderborn.de: 1, gmx.net: 1, tue.nl: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 127
Total pull requests: 179
Average time to close issues: about 1 year
Average time to close pull requests: about 2 months
Total issue authors: 38
Total pull request authors: 26
Average comments per issue: 2.77
Average comments per pull request: 1.66
Merged pull requests: 134
Bot issues: 0
Bot pull requests: 16

Past Year

Issues: 30
Pull requests: 63
Average time to close issues: about 1 month
Average time to close pull requests: 12 days
Issue authors: 10
Pull request authors: 7
Average comments per issue: 1.87
Average comments per pull request: 1.75
Merged pull requests: 46
Bot issues: 0
Bot pull requests: 16

Top Authors

Issue Authors

PGijsbers (50)
Innixma (13)
sebhrusen (9)
sedol1339 (6)
alanwilter (5)
eddiebergman (4)
cynthiamaia (3)
mfeurer (2)
israel-cj (2)
annawiewer (2)
RamlatchxRamspeicher (2)
juliocartier (1)
thenol (1)
dev-rinchin (1)
Robinnibor (1)

Pull Request Authors

PGijsbers (119)
Innixma (23)
pre-commit-ci[bot] (14)
sebhrusen (10)
SubhadityaMukherjee (5)
limpbot (4)
eddiebergman (3)
adibiasio (2)
shchur (2)
alanwilter (2)
Lopa10ko (2)
dmitryglhf (2)
ja-thomas (2)
kimusaku (2)
coderabbitai[bot] (2)

Top Labels

Issue Labels

enhancement (25), framework (13), bug (13), question (8), Documentation (7), automation (6), aws (6), quality (5), container (3), data (3), openml (2), dependencies (2), framework add (2), Benchmark Design (2), Answered (1), good first issue (1), website (1), change (1), external (1), data add (1)

Pull Request Labels

automation (13), Documentation (12), framework (12), enhancement (11), quality (9), bug (8), WIP (5), framework add (3), website (2), needs reviewer (2), aws (2), Benchmark Design (2), tests (2), external (1), dependencies (1), help wanted (1)

Packages

Total packages: 1
Total downloads: 30 last month (pypi)

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 1
Total maintainers: 1

pypi.org: amlb

Benchmarking for AutoML frameworks

Homepage: https://github.com/openml/automlbenchmark
Documentation: https://amlb.readthedocs.io/
License: MIT
Latest release: 0.0.1 (published over 2 years ago)

Versions: 1
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 30 last month

Rankings

Stargazers count: 3.5%

Forks count: 4.4%

Dependent packages count: 7.6%

Average: 21.2%

Dependent repos count: 69.5%

Maintainers (1)

pgijsbers

Last synced: 6 months ago

Dependencies

examples/custom/extensions/Stacking/requirements.txt pypi

scikit-learn ==0.22.1

frameworks/GAMA/requirements.txt pypi

packaging *

frameworks/H2OAutoML/requirements.txt pypi

colorama >=0.3.8
future *
packaging *
pandas *
requests >=2.10
tabulate *

frameworks/MLPlan/requirements.txt pypi

liac-arff ==2.4
numpy >=1.15,<2.0
pandas >=0.23,<1.0
ruamel.yaml >=0.15,<1.0
scikit-learn >=0.22.2
scipy >=1.5,<1.6
setuptools *
torch >=1.6.0,<1.7.0
tpot >=0.11.0,<0.12
xgboost >=1.1.0,<1.2

frameworks/RandomForest/requirements.txt pypi

pandas *

frameworks/TunedRandomForest/requirements.txt pypi

stopit ==1.1.2

frameworks/autosklearn/requirements.txt pypi

openml *
packaging *
scipy >=0.14.1,<1.7.0

frameworks/autoxgboost/requirements.txt pypi

rpy2 ==2.3.0

frameworks/mlr3automl/requirements.txt pypi

rpy2 ==2.3.0

frameworks/oboe/requirements.txt pypi

cvxpy >=1.0,<2.0
mkl >=1.0.0
multiprocess >=0.70.5
numpy ==1.16.4
openml ==0.10.2
pandas ==0.24.2
scikit-learn ==0.22.1
scipy ==1.4.1
tensorly *

frameworks/ranger/requirements.txt pypi

rpy2 ==2.3.0

frameworks/shared/requirements.in pypi

psutil >=5.4
pyarrow >=4.0
ruamel.yaml >=0.15

frameworks/shared/requirements.txt pypi

numpy ==1.21.0
psutil ==5.8.0
pyarrow ==4.0.1
ruamel.yaml ==0.17.4
ruamel.yaml.clib ==0.2.2

requirements-dev.txt pypi

pip-tools *
pytest *
pytest-mock *

requirements-report.txt pypi

matplotlib *
numpy *
openml *
pandas *
seaborn *
tabulate *

requirements.in pypi

boto3 >=1.9,<2.0
liac-arff >=2.5,<3.0
numpy >=1.20,<2.0
openml ==0.12.2
pandas >=1.2.4,<2.0
psutil >=5.4,<6.0
pyarrow >=4.0
ruamel.yaml >=0.15,<1.0
scikit-learn >=0.24
tables >=3.6

requirements.txt pypi

boto3 ==1.17.74
botocore ==1.20.74
certifi ==2020.12.5
chardet ==4.0.0
idna ==2.10
jmespath ==0.10.0
joblib ==1.0.1
liac-arff ==2.5.0
minio ==7.0.3
numexpr ==2.7.3
numpy ==1.20.3
openml ==0.12.2
pandas ==1.2.4
psutil ==5.8.0
pyarrow ==4.0.0
python-dateutil ==2.8.1
pytz ==2021.1
requests ==2.25.1
ruamel.yaml ==0.17.4
ruamel.yaml.clib ==0.2.2
s3transfer ==0.4.2
scikit-learn ==0.24.2
scipy ==1.6.3
six ==1.16.0
tables ==3.6.1
threadpoolctl ==2.1.0
urllib3 ==1.26.4
xmltodict ==0.12.0
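The top-level requirements.txt looks like the pinned counterpart of requirements.in (pip-tools appears in requirements-dev.txt), although the page does not say so explicitly. Under that assumption, the sketch below checks that each pin listed above satisfies its constraint. The helpers `parse_version` and `satisfies` are illustrative, use only the standard library, and handle purely numeric dotted versions; a real check would more likely rely on a dedicated version-parsing library.

```python
import operator

# Constraints from requirements.in and pins from requirements.txt, as listed above.
constraints = {
    "boto3": ">=1.9,<2.0",
    "liac-arff": ">=2.5,<3.0",
    "numpy": ">=1.20,<2.0",
    "openml": "==0.12.2",
    "pandas": ">=1.2.4,<2.0",
    "psutil": ">=5.4,<6.0",
    "pyarrow": ">=4.0",
    "ruamel.yaml": ">=0.15,<1.0",
    "scikit-learn": ">=0.24",
    "tables": ">=3.6",
}
pins = {
    "boto3": "1.17.74",
    "liac-arff": "2.5.0",
    "numpy": "1.20.3",
    "openml": "0.12.2",
    "pandas": "1.2.4",
    "psutil": "5.8.0",
    "pyarrow": "4.0.0",
    "ruamel.yaml": "0.17.4",
    "scikit-learn": "0.24.2",
    "tables": "3.6.1",
}

OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse_version(text):
    # Simplification: numeric dotted versions only (no pre-release tags).
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Return True if `version` meets every comma-separated clause in `spec`."""
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators before one-character ones.
        for symbol in (">=", "<=", "==", "<", ">"):
            if clause.startswith(symbol):
                bound = parse_version(clause[len(symbol):])
                if not OPS[symbol](parse_version(version), bound):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause}")
    return True


violations = [
    name for name, spec in constraints.items() if not satisfies(pins[name], spec)
]
print(violations)  # [] (every pin satisfies its constraint)
```

Because versions are compared as tuples, `==2.5` would not match `2.5.0` in this sketch; none of the constraints above hit that case.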

.github/workflows/run_all_frameworks.yml actions

actions/cache v2 composite
actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/versioning-reset.yml actions

actions/checkout v3 composite

.github/workflows/versioning.yml actions

actions/checkout v3 composite
actions/github-script v6 composite
author/action-rollback stable composite
