openml

OpenML's Python API for a World of Data and More 💫

https://github.com/openml/openml-python

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • ✓
    CITATION.cff file
    Found CITATION.cff file
  • ✓
    codemeta.json file
    Found codemeta.json file
  • ✓
    .zenodo.json file
    Found .zenodo.json file
  • ○
    DOI references
  • ○
    Academic publication links
  • ✓
    Committers with academic emails
    5 of 53 committers (9.4%) from academic institutions
  • ○
    Institutional organization owner
  • ○
    JOSS paper metadata
  • ○
    Scientific vocabulary similarity
    Low similarity (15.9%) to scientific vocabulary

Keywords

benchmarking data datascience machine-learning meta-learning openml python tabular-data

Keywords from Contributors

closember neuroimaging parallel mesh fmri molecular-dynamics-simulation energy-system battery gtk qt
Last synced: 6 months ago

Repository

Statistics
  • Stars: 304
  • Watchers: 22
  • Forks: 153
  • Open Issues: 94
  • Releases: 14
Topics
benchmarking data datascience machine-learning meta-learning openml python tabular-data
Created almost 12 years ago · Last pushed 6 months ago
Metadata Files
Readme Contributing License Citation

README.md

OpenML-Python

The Python API for a World of Data and More :dizzy:

Installation | Documentation | Contribution guidelines

OpenML-Python provides an easy-to-use Python interface for OpenML, an online platform for open science collaboration in machine learning. It can download data from OpenML and upload data to it, such as datasets and machine learning experiment results.

:joystick: Minimal Example

Use the following code to get the credit-g dataset:

```python
import openml

dataset = openml.datasets.get_dataset("credit-g")  # or by ID get_dataset(31)
X, y, categorical_indicator, attribute_names = dataset.get_data(target="class")
```
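As a rough illustration of how the last two return values fit together, the sketch below uses small hand-written stand-ins rather than a real download. It assumes `categorical_indicator` is a list of booleans aligned with `attribute_names`; the three column names and their flags are illustrative values, not output fetched from OpenML.

```python
# Hand-written stand-ins for two of the values returned by dataset.get_data();
# they are illustrative assumptions, not data fetched from OpenML.
attribute_names = ["checking_status", "duration", "credit_amount"]
categorical_indicator = [True, False, False]

# Pair each column name with its flag to separate categorical from numeric columns.
categorical_columns = [
    name for name, is_cat in zip(attribute_names, categorical_indicator) if is_cat
]
numeric_columns = [
    name for name, is_cat in zip(attribute_names, categorical_indicator) if not is_cat
]

print(categorical_columns)  # ['checking_status']
print(numeric_columns)      # ['duration', 'credit_amount']
```

A split like this is typically the first step before applying different preprocessing to categorical and numeric features.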

Get a task for supervised classification on credit-g:

```python
import openml

task = openml.tasks.get_task(31)
dataset = task.get_dataset()
X, y, categorical_indicator, attribute_names = dataset.get_data(target=task.target_name)

# get splits for the first fold of 10-fold cross-validation
train_indices, test_indices = task.get_train_test_split_indices(fold=0)
```
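To make the meaning of the two index arrays concrete, here is a minimal, library-free sketch of one fold of a plain k-fold split. It is not how OpenML produces its splits: the task defines them, and they may be shuffled or stratified. The helper `k_fold_indices` and the row count of 1000 are assumptions chosen for illustration.

```python
from typing import List, Tuple


def k_fold_indices(n_rows: int, n_folds: int, fold: int) -> Tuple[List[int], List[int]]:
    """Return (train, test) row indices for one fold of an unshuffled k-fold split."""
    if not 0 <= fold < n_folds:
        raise ValueError("fold must satisfy 0 <= fold < n_folds")
    # Spread rows as evenly as possible: the first `extra` folds get one more row.
    base, extra = divmod(n_rows, n_folds)
    start = fold * base + min(fold, extra)
    stop = start + base + (1 if fold < extra else 0)
    test = list(range(start, stop))
    train = list(range(0, start)) + list(range(stop, n_rows))
    return train, test


# Hypothetical dataset size; fold 0 of 10 holds out the first tenth of the rows.
train_indices, test_indices = k_fold_indices(n_rows=1000, n_folds=10, fold=0)
print(len(train_indices), len(test_indices))  # 900 100
```

Every row appears in exactly one of the two lists, which is the property to rely on when indexing `X` and `y` with the arrays returned by the task.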

Use an OpenML benchmarking suite to get a curated list of machine-learning tasks:

```python
import openml

suite = openml.study.get_suite("amlb-classification-all")  # Get a curated list of tasks for classification
for task_id in suite.tasks:
    task = openml.tasks.get_task(task_id)
```

:magic_wand: Installation

OpenML-Python is supported on Python 3.8 to 3.13 and is available on Linux, macOS, and Windows.

You can install OpenML-Python with:

    pip install openml

:page_facing_up: Citing OpenML-Python

If you use OpenML-Python in a scientific publication, we would appreciate a reference to the following paper:

Matthias Feurer, Jan N. van Rijn, Arlind Kadra, Pieter Gijsbers, Neeratyoy Mallik, Sahithya Ravi, Andreas Müller, Joaquin Vanschoren, Frank Hutter
OpenML-Python: an extensible Python API for OpenML
Journal of Machine Learning Research, 22(100):1−5, 2021

Bibtex entry:

    @article{JMLR:v22:19-920,
      author  = {Matthias Feurer and Jan N. van Rijn and Arlind Kadra and Pieter Gijsbers and Neeratyoy Mallik and Sahithya Ravi and Andreas Müller and Joaquin Vanschoren and Frank Hutter},
      title   = {OpenML-Python: an extensible Python API for OpenML},
      journal = {Journal of Machine Learning Research},
      year    = {2021},
      volume  = {22},
      number  = {100},
      pages   = {1--5},
      url     = {http://jmlr.org/papers/v22/19-920.html}
    }

Owner

  • Name: OpenML
  • Login: openml
  • Kind: organization
  • Email: openmlhq@googlegroups.com
  • Location: The Future

Open, Networked Machine Learning

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software in a publication, please cite the metadata from preferred-citation."
preferred-citation:
  type: article
  authors:
  - family-names: "Feurer"
    given-names: "Matthias"
    orcid: "https://orcid.org/0000-0001-9611-8588"
  - family-names: "van Rijn"
    given-names: "Jan N."
    orcid: "https://orcid.org/0000-0003-2898-2168"
  - family-names: "Kadra"
    given-names: "Arlind"
  - family-names: "Gijsbers"
    given-names: "Pieter"
    orcid: "https://orcid.org/0000-0001-7346-8075"
  - family-names: "Mallik"
    given-names: "Neeratyoy"
    orcid: "https://orcid.org/0000-0002-0598-1608"
  - family-names: "Ravi"
    given-names: "Sahithya"
  - family-names: "Müller"
    given-names: "Andreas"
    orcid: "https://orcid.org/0000-0002-2349-9428"
  - family-names: "Vanschoren"
    given-names: "Joaquin"
    orcid: "https://orcid.org/0000-0001-7044-9805"
  - family-names: "Hutter"
    given-names: "Frank"
    orcid: "https://orcid.org/0000-0002-2037-3694"
  journal: "Journal of Machine Learning Research"
  title: "OpenML-Python: an extensible Python API for OpenML"
  abstract: "OpenML is an online platform for open science collaboration in machine learning, used to share datasets and results of machine learning experiments. In this paper, we introduce OpenML-Python, a client API for Python, which opens up the OpenML platform for a wide range of Python-based machine learning tools. It provides easy access to all datasets, tasks and experiments on OpenML from within Python. It also provides functionality to conduct machine learning experiments, upload the results to OpenML, and reproduce results which are stored on OpenML. Furthermore, it comes with a scikit-learn extension and an extension mechanism to easily integrate other machine learning libraries written in Python into the OpenML ecosystem. Source code and documentation are available at https://github.com/openml/openml-python/."
  volume: 22
  year: 2021
  start: 1
  end: 5
  pages: 5
  number: 100
  url: https://jmlr.org/papers/v22/19-920.html

GitHub Events

Total
  • Create event: 35
  • Release event: 1
  • Issues event: 25
  • Watch event: 25
  • Delete event: 42
  • Issue comment event: 95
  • Push event: 189
  • Pull request review comment event: 48
  • Pull request review event: 82
  • Pull request event: 83
  • Fork event: 12
Last Year
  • Create event: 35
  • Release event: 1
  • Issues event: 25
  • Watch event: 25
  • Delete event: 42
  • Issue comment event: 95
  • Push event: 189
  • Pull request review comment event: 48
  • Pull request review event: 82
  • Pull request event: 83
  • Fork event: 12

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 1,353
  • Total Committers: 53
  • Avg Commits per committer: 25.528
  • Development Distribution Score (DDS): 0.696
Past Year
  • Commits: 42
  • Committers: 9
  • Avg Commits per committer: 4.667
  • Development Distribution Score (DDS): 0.667
Top Committers
Name Email Commits
Matthias Feurer f****m@i****e 411
Jan van Rijn j****n@g****m 289
PGijsbers p****s@t****l 159
Andreas Mueller a****r@n****u 134
neeratyoy n****y@g****m 103
Joaquin Vanschoren j****n@g****m 32
Lennart Purucker p****r@c****e 30
sahithyaravi1493 s****3@g****m 29
Arlind Kadra a****a@g****m 28
Zardaloop f****i@g****m 22
dependabot[bot] 4****] 13
Sahithya Ravi 4****3 13
Eddie Bergman e****s@g****m 12
pre-commit-ci[bot] 6****] 12
janvanrijn v****n@c****e 11
Guillaume Lemaitre g****8@g****m 7
a-moadel 4****l 4
Jesper van Engelen c****t@j****l 3
Vishal Parmar v****2@g****m 3
allcontributors[bot] 4****] 2
nabenabe0928 s****o@g****m 2
prabhant p****h@g****m 2
toon t****k@g****m 2
Pieter Gijsbers P****s 1
Abraham Francis a****9@g****m 1
zikun 3****n 1
chadmarchand 3****d 1
William Raynaut w****t@g****m 1
Will Martin 3****n 1
Tim Andrews t****1@g****m 1
and 23 more...
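
The all-time figures above can be reproduced with a few lines of arithmetic. The sketch below assumes that the Development Distribution Score is defined as one minus the top committer's share of all commits; the page does not state the formula, but this definition matches the listed value. The function names are hypothetical.

```python
def average_commits(total_commits: int, total_committers: int) -> float:
    """Mean number of commits per committer."""
    return total_commits / total_committers


def development_distribution_score(top_committer_commits: int, total_commits: int) -> float:
    """Assumed definition: 1 minus the share of commits made by the top committer."""
    return 1 - top_committer_commits / total_commits


# All-time figures from the listing: 1,353 commits, 53 committers,
# and 411 commits by the top committer.
avg = round(average_commits(1353, 53), 3)
dds = round(development_distribution_score(411, 1353), 3)
print(avg)  # 25.528
print(dds)  # 0.696
```

Under this definition a score near 0 means one person wrote almost everything, while a score near 1 means the work is spread across many committers.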

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 108
  • Total pull requests: 207
  • Average time to close issues: about 2 years
  • Average time to close pull requests: 3 months
  • Total issue authors: 36
  • Total pull request authors: 30
  • Average comments per issue: 2.55
  • Average comments per pull request: 2.05
  • Merged pull requests: 138
  • Bot issues: 0
  • Bot pull requests: 30
Past Year
  • Issues: 7
  • Pull requests: 84
  • Average time to close issues: 3 months
  • Average time to close pull requests: 15 days
  • Issue authors: 6
  • Pull request authors: 13
  • Average comments per issue: 0.71
  • Average comments per pull request: 1.13
  • Merged pull requests: 45
  • Bot issues: 0
  • Bot pull requests: 4
Top Authors
Issue Authors
  • PGijsbers (32)
  • mfeurer (11)
  • joaquinvanschoren (7)
  • eddiebergman (7)
  • amueller (5)
  • ArlindKadra (5)
  • ArturDev42 (4)
  • LennartPurucker (2)
  • janvanrijn (2)
  • Neeratyoy (2)
  • learsi1911 (2)
  • pseudotensor (2)
  • Taniya-Das (2)
  • remram44 (1)
  • xieleo5 (1)
Pull Request Authors
  • LennartPurucker (68)
  • PGijsbers (58)
  • dependabot[bot] (28)
  • eddiebergman (21)
  • pre-commit-ci[bot] (19)
  • mfeurer (16)
  • v-parmar (10)
  • SubhadityaMukherjee (9)
  • samplecatalina (6)
  • knyazer (4)
  • janvanrijn (4)
  • Kang13531 (4)
  • ArlindKadra (3)
  • BrunoBelucci (2)
  • Taniya-Das (2)
Top Labels
Issue Labels
Data (17) enhancement (16) Documentation (15) Good First Issue (13) serverside (12) Feature request (9) CI (8) bug (8) Run (6) Flow (4) testing (3) priority (3) Task (2) Requires Feedback (2) dependencies (2) wontfix (1) in progress (1)
Pull Request Labels
dependencies (30) CI (6) testing (5) bug (4) enhancement (3) later (2) Documentation (2) priority (2) in progress (1)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 29,364 last-month
  • Total docker downloads: 2,304
  • Total dependent packages: 29
    (may contain duplicates)
  • Total dependent repositories: 188
    (may contain duplicates)
  • Total versions: 22
  • Total maintainers: 3
pypi.org: openml

Python API for OpenML

  • Documentation: https://openml.readthedocs.io/
  • License: BSD 3-Clause License

    Copyright (c) 2014-2019, Matthias Feurer, Jan van Rijn, Andreas Müller, Joaquin Vanschoren and others. All rights reserved.

    Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

    * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
    * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
    * Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

    License of the files CONTRIBUTING.md, ISSUE_TEMPLATE.md and PULL_REQUEST_TEMPLATE.md:

    Those files are modifications of the respecting templates in scikit-learn and they are licensed under a New BSD license:

    New BSD License

    Copyright (c) 2007–2018 The scikit-learn developers. All rights reserved.

    Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

    a. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
    b. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
    c. Neither the name of the Scikit-learn Developers nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  • Latest release: 0.15.1
    published about 1 year ago
  • Versions: 16
  • Dependent Packages: 29
  • Dependent Repositories: 180
  • Downloads: 29,364 Last month
  • Docker Downloads: 2,304
Rankings
Dependent packages count: 0.6%
Downloads: 1.0%
Dependent repos count: 1.1%
Docker downloads count: 1.7%
Average: 2.0%
Forks count: 3.9%
Stargazers count: 3.9%
Last synced: 6 months ago
conda-forge.org: openml
  • Homepage: https://openml.org/
  • License: BSD-3-Clause
  • Latest release: 0.12.2
    published over 4 years ago
  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 4
Rankings
Forks count: 15.7%
Dependent repos count: 16.0%
Stargazers count: 23.7%
Average: 26.8%
Dependent packages count: 51.6%
Last synced: 6 months ago
anaconda.org: openml

OpenML-Python provides an easy-to-use and straightforward Python interface for OpenML, an online platform for open science collaboration in machine learning. It can download or upload data from OpenML, such as datasets and machine learning experiment results.

  • Homepage: https://openml.org
  • License: BSD-3-Clause
  • Latest release: 0.15.1
    published 7 months ago
  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 4
Rankings
Forks count: 28.1%
Stargazers count: 36.7%
Average: 40.2%
Dependent repos count: 44.7%
Dependent packages count: 51.2%
Last synced: 6 months ago

Dependencies

.github/workflows/dist.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/docs.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/pre-commit.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/release_docker.yaml actions
  • actions/checkout v2 composite
  • docker/build-push-action v2 composite
  • docker/login-action v1 composite
  • docker/setup-buildx-action v1 composite
  • docker/setup-qemu-action v1 composite
.github/workflows/test.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • codecov/codecov-action v1 composite
docker/Dockerfile docker
  • python 3 build
setup.py pypi
  • liac-arff >=2.4.0
  • minio *
  • numpy >=1.6.2
  • pandas >=1.0.0
  • pyarrow *
  • python-dateutil *
  • requests *
  • scikit-learn >=0.18
  • scipy >=0.13.3
  • xmltodict *
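
One way to check a local environment against the minimum versions listed for `setup.py` is sketched below, using only the standard library. The helper names are hypothetical, and the version parsing is deliberately simplistic: it compares leading numeric release segments only, so real tooling would more likely rely on a dedicated version parser such as the `packaging` library.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile
from typing import Tuple

# Lower bounds copied from the dependency list above; unpinned packages are omitted.
MINIMUM_VERSIONS = {
    "liac-arff": "2.4.0",
    "numpy": "1.6.2",
    "pandas": "1.0.0",
    "scikit-learn": "0.18",
    "scipy": "0.13.3",
}


def release_tuple(text: str) -> Tuple[int, ...]:
    """Parse leading numeric release segments, e.g. '1.0.0rc1' -> (1, 0, 0)."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions after padding the shorter one with zeros."""
    a, b = release_tuple(installed), release_tuple(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def check(package: str, minimum: str) -> str:
    """Report whether an installed package satisfies the given lower bound."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return "not installed"
    return "ok" if meets_minimum(installed, minimum) else f"too old ({installed})"


for name, minimum in MINIMUM_VERSIONS.items():
    print(f"{name} >= {minimum}: {check(name, minimum)}")
```

The printed result depends on the environment, so no expected output is shown.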