Recent Releases of openml

openml - v0.15.1

Highlights:

  • Fix usage of environment variables for locating the default cache and configuration directories by @eddiebergman in https://github.com/openml/openml-python/pull/1359
  • Allow skipping the attempt to download parquet files by setting the OPENML_SKIP_PARQUET environment variable to true by @PGijsbers in https://github.com/openml/openml-python/pull/1388
  • Extensive maintenance work by @eddiebergman and @LennartPurucker
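
A minimal Python sketch of opting out of parquet downloads without touching the shell. It relies only on what the note states: setting OPENML_SKIP_PARQUET to true skips the parquet attempt. Setting the variable before openml is imported is a conservative assumption about when the library reads it, and parquet_skipped is a hypothetical helper that mirrors a case-insensitive "true" check rather than openml's own parsing.

```python
import os

# Opt out of parquet downloads. Assumption: set this before importing or using
# openml so the library sees it; openml-python then presumably falls back to ARFF.
os.environ["OPENML_SKIP_PARQUET"] = "true"


def parquet_skipped() -> bool:
    """Hypothetical helper: report whether the skip flag holds a 'true' value."""
    return os.environ.get("OPENML_SKIP_PARQUET", "false").strip().lower() == "true"


print(parquet_skipped())
```

From a shell, exporting OPENML_SKIP_PARQUET=true before starting Python should have the same effect.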

Thanks to everyone who contributed in any way ❤️

What's Changed

  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/openml/openml-python/pull/1329
  • Bump codecov/codecov-action from 3 to 4 by @dependabot in https://github.com/openml/openml-python/pull/1328
  • Disable docker release on PR by @LennartPurucker in https://github.com/openml/openml-python/pull/1360
  • fix(datasets): Add code 111 for dataset description not found error by @eddiebergman in https://github.com/openml/openml-python/pull/1356
  • Test Fixes for v0.15.1 by @LennartPurucker in https://github.com/openml/openml-python/pull/1358
  • fix: Avoid Random State and Other Test Bug by @LennartPurucker in https://github.com/openml/openml-python/pull/1362
  • fix/maint: Make Docs Work Again and Stop Progress.rst Usage by @LennartPurucker in https://github.com/openml/openml-python/pull/1365
  • doc: README Rework by @LennartPurucker in https://github.com/openml/openml-python/pull/1361
  • doc: make all examples use names instead of IDs as reference. by @LennartPurucker in https://github.com/openml/openml-python/pull/1367
  • fix: avoid stripping whitespaces for feature names by @LennartPurucker in https://github.com/openml/openml-python/pull/1368
  • fix: workaround for git test workflow for Python 3.8 by @LennartPurucker in https://github.com/openml/openml-python/pull/1369
  • add: test for dataset comparison and ignore fields by @LennartPurucker in https://github.com/openml/openml-python/pull/1370
  • fix: github workflows and pytest issue by @LennartPurucker in https://github.com/openml/openml-python/pull/1373
  • feat: support for loose init model from run by @LennartPurucker in https://github.com/openml/openml-python/pull/1371
  • fix/maint: avoid exit code (which kills the docs building) by @LennartPurucker in https://github.com/openml/openml-python/pull/1374
  • ux: Provide helpful link to documentation when error due to missing API token by @eddiebergman in https://github.com/openml/openml-python/pull/1364
  • ci: Docker/build-push-action from 5 to 6 by @dependabot in https://github.com/openml/openml-python/pull/1357
  • ci: Bump peter-evans/dockerhub-description from 3 to 4 by @dependabot in https://github.com/openml/openml-python/pull/1326
  • fix: resolve Sphinx style error by @LennartPurucker in https://github.com/openml/openml-python/pull/1375
  • docs: fix broken links after openml.org rework by @LennartPurucker in https://github.com/openml/openml-python/pull/1376
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/openml/openml-python/pull/1380
  • [pre-commit.ci] pre-commit autoupdate by @pre-commit-ci in https://github.com/openml/openml-python/pull/1381
  • Mark test as production by @PGijsbers in https://github.com/openml/openml-python/pull/1384
  • Patch release bump by @PGijsbers in https://github.com/openml/openml-python/pull/1389

Full Changelog: https://github.com/openml/openml-python/compare/v0.15.0...v0.15.1

- Python
Published by PGijsbers about 1 year ago

openml - v0.15.0

What's Changed

  • ADD #1335: Improve MinIO support.
    • Add a progress bar for downloading MinIO files. Enable it by setting show_progress to true in either openml.config or the configuration file.
    • When using download_all_files, files are only downloaded if they do not yet exist in the cache.
  • FIX #1338: Read the configuration file without overwriting it.
  • MAINT #1340: Add Numpy 2.0 support. Update tests to work with scikit-learn <= 1.5.
  • ADD #1342: Add HTTP header to requests to indicate they are from openml-python.
  • ADD #1345: task.get_dataset now takes the same parameters as openml.datasets.get_dataset to allow fine-grained control over file downloads.
  • MAINT #1346: The ARFF file of a dataset is now only downloaded if parquet is not available.
  • MAINT #1349: Removed usage of the distutils module, which enables Python 3.12 compatibility.
  • MAINT #1351: Image archives are now automatically deleted after they have been downloaded and extracted.
  • MAINT #1352, #1354: When fetching tasks and datasets, file download parameters now default to not downloading the file. Files will be downloaded only when a user tries to access properties which require them (e.g., dataset.qualities or dataset.get_data).
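
The lazy-download behaviour of #1352/#1354, combined with the "only download if not cached" rule for download_all_files, can be pictured with a small stdlib sketch. LazyDataset and fake_fetch are hypothetical stand-ins, not openml classes: constructing the object costs nothing, and the file is fetched on first access only if it is not already in the cache.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


class LazyDataset:
    """Hypothetical illustration, not openml's implementation.

    Creating the object downloads nothing; the data file is fetched on first
    access, and only when it is not already present in the cache directory.
    """

    def __init__(self, dataset_id: int, cache_dir: Path, fetch: Callable[[int, Path], None]) -> None:
        self.dataset_id = dataset_id
        self._path = cache_dir / f"dataset_{dataset_id}.csv"
        self._fetch = fetch  # writes the file for dataset_id to the given path

    def get_data(self) -> bytes:
        if not self._path.exists():  # skip the download when the file is cached
            self._fetch(self.dataset_id, self._path)
        return self._path.read_bytes()


downloads: List[int] = []


def fake_fetch(dataset_id: int, path: Path) -> None:
    """Stand-in for a network download that records each call."""
    downloads.append(dataset_id)
    path.write_bytes(b"a,b\n1,2\n")


dataset = LazyDataset(61, Path(tempfile.mkdtemp()), fake_fetch)
print(downloads)  # [] (nothing fetched yet)
dataset.get_data()
dataset.get_data()  # served from the cache
print(downloads)  # [61]
```

With the real library, the equivalent trigger is accessing something like dataset.get_data() or dataset.qualities, as the note says.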

New Contributors

  • @BrunoBelucci made their first contribution in https://github.com/openml/openml-python/pull/1338
  • @knyazer made their first contribution in https://github.com/openml/openml-python/pull/1345

Full Changelog: https://github.com/openml/openml-python/compare/v0.14.2...v0.15.0

- Python
Published by PGijsbers over 1 year ago

openml - Version 0.14.2

This is a minor release with several hotfixes and technical-debt reductions.

  • MAINT #1280: Use the server-provided parquet_url instead of minio_url to determine the location of the parquet file.
  • ADD #716: Add documentation for the remaining attributes of classes and functions.
  • ADD #1261: Add more type-hint annotations.
  • MAINT #1294: Update tests to the new tag specification.
  • FIX #1314: Update fetching a bucket from MinIO.
  • FIX #1315: Make class label retrieval more lenient.
  • ADD #1316: Add support for feature description ontologies.
  • MAINT #1310/#1307: Switch to ruff and resolve all mypy errors.

- Python
Published by LennartPurucker about 2 years ago

openml - Version 0.14

IMPORTANT: This release paves the way towards a breaking update of OpenML-Python. From version 0.15, functions that had the option to return a pandas DataFrame will return a pandas DataFrame by default. This version (0.14) emits a warning if you still use the old access functionality.

More concretely:

  • In 0.15 we will drop the ability to return dictionaries in listing calls and only provide pandas DataFrames. To disable warnings in 0.14 you have to request a pandas DataFrame (using output_format="dataframe").
  • In 0.15 we will drop the ability to return datasets as numpy arrays and only provide pandas DataFrames. To disable warnings in 0.14 you have to request a pandas DataFrame (using dataset_format="dataframe").

Furthermore, from version 0.15, OpenML-Python will no longer download datasets and dataset metadata by default. This version (0.14) emits a warning if you don't explicitly specify the desired behavior.

Please see the pull requests #1258 and #1260 for further information.

  • ADD #1081: New flag that allows disabling downloading dataset features.
  • ADD #1132: New flag that forces a redownload of cached data.
  • FIX #1244: Fixes a rare bug where task listing could fail when the server returned invalid data.
  • DOC #1229: Fixes a comment string for the main example.
  • DOC #1241: Fixes a comment in an example.
  • MAINT #1124: Improve naming of helper functions that govern the cache directories.
  • MAINT #1223, #1250: Update tools used in pre-commit to the latest versions (black==23.30, mypy==1.3.0, flake8==6.0.0).
  • MAINT #1253: Update the citation request to the JMLR paper.
  • MAINT #1246: Warn the user that checking for duplicate runs on the server cannot be done without an API key.
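
A sketch of 0.14 calls written so that none of the deprecation warnings described above fire: DataFrame output is requested explicitly, and the download flags, including download_features_meta_data (#1081) and force_refresh_cache (#1132), are spelled out. These keyword names are assumed to match the 0.14 get_dataset signature and should be checked against the API reference; openml is imported inside the function only so the sketch can be defined without the package installed.

```python
def load_as_dataframes(dataset_id: int, refresh: bool = False):
    """Sketch of 0.14 calls that opt in to the 0.15 behaviour explicitly."""
    import openml  # imported here so defining the sketch does not require openml

    # Listing calls: request a DataFrame instead of the deprecated dict output.
    datasets = openml.datasets.list_datasets(output_format="dataframe")

    # State the download behaviour explicitly instead of relying on the default.
    dataset = openml.datasets.get_dataset(
        dataset_id,
        download_data=True,
        download_qualities=True,
        download_features_meta_data=True,  # assumed name of the #1081 flag
        force_refresh_cache=refresh,  # assumed name of the #1132 flag
    )

    # Data access: request a DataFrame instead of a numpy array.
    X, y, categorical_indicator, attribute_names = dataset.get_data(
        target=dataset.default_target_attribute,
        dataset_format="dataframe",
    )
    return datasets, X, y
```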

- Python
Published by mfeurer over 2 years ago

openml - Version 0.13.1

  • ADD #1028: Add functions to delete runs, flows, datasets, and tasks (e.g., openml.datasets.delete_dataset).
  • ADD #1144: Add locally computed results to the OpenMLRun object’s representation if the run was created locally and not downloaded from the server.
  • ADD #1180: Improve the error message when the checksum of a downloaded dataset does not match the checksum provided by the API.
  • ADD #1201: Make OpenMLTraceIteration a dataclass.
  • DOC #1069: Add argument documentation for the OpenMLRun class.
  • FIX #1197 #559 #1131: Fix the order of ground truth and predictions in the OpenMLRun object and in format_prediction.
  • FIX #1198: Support numpy 1.24 and higher.
  • FIX #1216: Allow unknown task types on the server. This is only relevant when new task types are added to the test server.
  • MAINT #1155: Add a Dependabot GitHub Action to automatically update other GitHub Actions.
  • MAINT #1199: Obtain pre-commit’s flake8 from github.com instead of gitlab.com.
  • MAINT #1215: Support latest numpy version.
  • MAINT #1218: Test Python3.6 on Ubuntu 20.04 instead of the latest Ubuntu (which is 22.04).
  • MAINT #1221 #1212 #1206 #1211: Update GitHub Actions to the latest versions.
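
A sketch of the #1028 deletion calls. Only openml.datasets.delete_dataset is named in the note; the delete_run, delete_flow and delete_task counterparts in openml.runs, openml.flows and openml.tasks are assumed to follow the same pattern (an ID in, a success flag out), and delete_entity is a hypothetical dispatcher. Deleting is a write operation, so an API key is required.

```python
def delete_entity(kind: str, entity_id: int) -> bool:
    """Hypothetical dispatcher over the per-entity delete functions added in 0.13.1."""
    import openml  # imported here so defining the sketch does not require openml

    deleters = {
        "dataset": openml.datasets.delete_dataset,
        "run": openml.runs.delete_run,  # assumed counterpart
        "flow": openml.flows.delete_flow,  # assumed counterpart
        "task": openml.tasks.delete_task,  # assumed counterpart
    }
    if kind not in deleters:
        raise ValueError(f"unknown entity kind: {kind!r}")
    return deleters[kind](entity_id)  # assumed to return True on success
```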

- Python
Published by mfeurer almost 3 years ago

openml - Version 0.13.0

  • FIX #1030: Pre-commit hooks should no longer issue a warning.
  • FIX #1058, #1100: Avoid NoneType error when printing task without class_labels attribute.
  • FIX #1110: Make arguments to create_study and create_suite that are defined as optional by the OpenML XSD actually optional.
  • FIX #1147: openml.flows.flow_exists no longer requires an API key.
  • FIX #1184: Automatically resolve proxies when downloading from minio. Turn this off by setting environment variable no_proxy="*".
  • MAINT #1088: Do CI for Windows on GitHub Actions instead of AppVeyor.
  • MAINT #1104: Fix outdated docstring for list_task.
  • MAINT #1146: Update the pre-commit dependencies.
  • ADD #1103: Add a predictions property to OpenMLRun for easy accessibility of prediction data.
  • ADD #1188: EXPERIMENTAL. Allow downloading all files from a minio bucket with download_all_files=True for get_dataset.
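
A sketch combining two 0.13.0 items: the predictions property on OpenMLRun (#1103) and the no_proxy="*" opt-out for automatic proxy resolution (#1184). fetch_predictions is a hypothetical helper; that predictions returns a pandas DataFrame is an assumption, and openml is imported inside the function only so the sketch can be defined without the package installed.

```python
import os


def fetch_predictions(run_id: int, disable_proxies: bool = False):
    """Hypothetical helper: load a run and return its prediction data."""
    if disable_proxies:
        # FIX #1184: turn off automatic proxy resolution for MinIO downloads.
        os.environ["no_proxy"] = "*"
    import openml  # imported here so defining the sketch does not require openml

    run = openml.runs.get_run(run_id)
    return run.predictions  # ADD #1103; assumed to be a pandas DataFrame
```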

- Python
Published by LennartPurucker about 3 years ago

openml - Version 0.12.1

  • ADD #895/#1038: Measure runtimes of scikit-learn runs also for models which are parallelized via joblib.
  • DOC #1050: Refer to the webpage instead of the XML file in the main example.
  • DOC #1051: Document existing extensions to OpenML-Python besides the shipped scikit-learn extension.
  • FIX #1035: Render class attributes and methods again.
  • FIX #1042: Fixes a rare concurrency issue with OpenML-Python and joblib which caused the joblib worker pool to fail.
  • FIX #1053: Fixes a bug which could prevent importing the package in a docker container.

- Python
Published by mfeurer almost 5 years ago

openml - Version 0.12.0

0.11.1

  • ADD #964: Validate ignore_attribute, default_target_attribute, row_id_attribute are set to attributes that exist on the dataset when calling create_dataset.
  • ADD #979: Dataset features and qualities are now also cached in pickle format.
  • ADD #982: Add helper functions for column transformers.
  • ADD #989: run_model_on_task will now warn the user if the model passed has already been fitted.
  • ADD #1009: Add the option to not download the dataset qualities. The cached version is still used even if the download attribute is false.
  • ADD #1016: Add scikit-learn 0.24 support.
  • ADD #1020: Add option to parallelize evaluation of tasks with joblib.
  • ADD #1022: Allow minimum version of dependencies to be listed for a flow, use more accurate minimum versions for scikit-learn dependencies.
  • ADD #1023: Add admin-only calls for adding topics to datasets.
  • ADD #1029: Add support for fetching dataset from a minio server in parquet format.
  • ADD #1031: Generally improve runtime measurements, add them for some previously unsupported flows (e.g. BaseSearchCV derived flows).
  • DOC #973: Change the task used in the welcome page example so it no longer fails when using a numerical dataset.
  • MAINT #671: Improved the performance of check_datasets_active by only querying the given list of datasets in contrast to querying all datasets. Modified the corresponding unit test.
  • MAINT #891: Changed the way that numerical features are stored. Numerical features that range from 0 to 255 are now stored as uint8, which reduces the storage space required as well as storing and loading times.
  • MAINT #975, #988: Add CI through Github Actions.
  • MAINT #977: Allow short and long scenarios for unit tests. Reduce the workload for some unit tests.
  • MAINT #985, #1000: Improve unit test stability and output readability, and add load balancing.
  • MAINT #1018: Refactor data loading and storage. Data is now compressed on the first call to get_data.
  • MAINT #1024: Remove flaky decorator for study unit test.
  • FIX #883 #884 #906 #972: Various improvements to the caching system.
  • FIX #980: Speed up check_datasets_active.
  • FIX #984: Add a retry mechanism when the server encounters a database issue.
  • FIX #1004: Fixed an issue that prevented installation on some systems (e.g. Ubuntu).
  • FIX #1013: Fixes a bug where OpenMLRun.setup_string was not uploaded to the server, prepares for run_details being sent from the server.
  • FIX #1021: Fixes an issue that could occur when running unit tests and openml-python was not in PATH.
  • FIX #1037: Fixes a bug where a dataset could not be loaded if a categorical feature listed a NaN-like value as a possible category.
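
The storage saving from MAINT #891 comes down to choosing a 1-byte unsigned integer type when a numeric column only holds whole numbers from 0 to 255. The stdlib sketch below illustrates that rule with the array module; compact_numeric_column is hypothetical and is not openml's code, which works on pandas data.

```python
from array import array
from typing import Sequence


def compact_numeric_column(values: Sequence[float]) -> array:
    """Hypothetical illustration of MAINT #891, not openml's code.

    Store a numeric column as unsigned 8-bit integers when every value is a whole
    number in [0, 255]; otherwise keep 64-bit floats.
    """
    if all(float(v).is_integer() and 0 <= v <= 255 for v in values):
        return array("B", (int(v) for v in values))  # 1 byte per value
    return array("d", values)  # 8 bytes per value (C double)


pixels = compact_numeric_column([0.0, 17.0, 255.0])
mixed = compact_numeric_column([0.5, 300.0])
print(pixels.typecode, pixels.itemsize)  # B 1
print(mixed.typecode, mixed.itemsize)  # d 8
```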

- Python
Published by mfeurer almost 5 years ago

openml - Version 0.11.0

  • ADD #753: Allows uploading custom flows to OpenML via OpenML-Python.
  • ADD #777: Allows running a flow on pandas dataframes (in addition to numpy arrays).
  • ADD #888: Allow passing a task_id to run_model_on_task.
  • ADD #894: Support caching of datasets using feather format as an option.
  • ADD #929: Add edit_dataset and fork_dataset to allow editing and forking of uploaded datasets.
  • ADD #866, #943: Add support for scikit-learn's passthrough and drop when uploading flows to OpenML.
  • ADD #879: Add support for scikit-learn's MLP hyperparameter hidden_layer_sizes.
  • ADD #945: PEP 561 compliance for distributing type information.
  • DOC #660: Remove nonexistent argument from docstring.
  • DOC #901: The API reference now documents the config file and its options.
  • DOC #912: API reference now shows create_task.
  • DOC #954: Remove TODO text from documentation.
  • DOC #960: Document how to upload multiple ignore attributes.
  • FIX #873: Fixes an issue which resulted in incorrect URLs when printing OpenML objects after switching the server.
  • FIX #885: Logger no longer registered by default. Added utility functions to easily register logging to console and file.
  • FIX #890: Correct the scaling of data in the SVM example.
  • MAINT #371: list_evaluations default size changed from None to 10_000.
  • MAINT #767: Source distribution installation is now unit-tested.
  • MAINT #781: Add pre-commit and automated code formatting with black.
  • MAINT #804: Rename arguments of list_evaluations to indicate they expect lists of ids.
  • MAINT #836: OpenML supports only pandas version 1.0.0 or above.
  • MAINT #865: OpenML no longer bundles test files in the source distribution.
  • MAINT #881: Improve the error message for too-long URIs.
  • MAINT #897: Dropping support for Python 3.5.
  • MAINT #916: Adding support for Python 3.8.
  • MAINT #920: Improve error messages for dataset upload.
  • MAINT #921: Improve handling of the OpenML server URL in the config file.
  • MAINT #925: Improve error handling and error message when loading datasets.
  • MAINT #928: Restructures the contributing documentation.
  • MAINT #936: Adding support for scikit-learn 0.23.X.
  • MAINT #945: Make OpenML-Python PEP562 compliant.
  • MAINT #951: Converts TaskType class to a TaskType enum.
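
A sketch of two 0.11 conveniences: passing a task ID straight to run_model_on_task (#888) and filtering list_evaluations with list-valued ID arguments (#804) while setting size explicitly rather than relying on the default (#371). run_and_compare is hypothetical, the keyword names are assumed from the openml-python API reference, and openml is imported inside the function only so the sketch can be defined without the package installed.

```python
def run_and_compare(model, task_id: int):
    """Hypothetical helper: run a model on a task and list accuracy evaluations for that task."""
    import openml  # imported here so defining the sketch does not require openml

    # ADD #888: a task ID can be passed instead of a task object.
    # The server-side duplicate check is skipped because it needs an API key.
    run = openml.runs.run_model_on_task(model, task_id, avoid_duplicate_runs=False)

    # MAINT #804: ID filters take lists; MAINT #371: size set explicitly.
    evaluations = openml.evaluations.list_evaluations(
        function="predictive_accuracy",
        tasks=[task_id],
        size=100,
        output_format="dataframe",
    )
    return run, evaluations
```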

- Python
Published by mfeurer over 5 years ago

openml - Version 0.10.2

  • ADD #857: Adds task type ID to list_runs
  • DOC #862: Added license BSD 3-Clause to each of the source files.

- Python
Published by mfeurer over 6 years ago

openml - Version 0.10.1

  • ADD #175: Automatically add the docstrings of scikit-learn objects to the flow and its parameters.
  • ADD #737: New evaluation listing call that includes the hyperparameter settings.
  • ADD #744: It is now possible to only issue a warning and not raise an exception if the package versions for a flow are not met when deserializing it.
  • ADD #783: The URL to download the predictions for a run is now stored in the run object.
  • ADD #790: Adds the uploader name and id as new filtering options for list_evaluations.
  • ADD #792: New convenience function openml.flows.get_flow_id.
  • ADD #861: Debug-level log information is now written to a file in the cache directory (at most 2 MB).
  • DOC #778: Introduces instructions on how to publish an extension to support libraries other than scikit-learn.
  • DOC #785: The examples section is completely restructured into simple examples, advanced examples, and examples showcasing the use of OpenML-Python to reproduce papers that were done with OpenML-Python.
  • DOC #788: New example on manually iterating through the split of a task.
  • DOC #789: Improve the usage of dataframes in the examples.
  • DOC #791: New example for the paper Efficient and Robust Automated Machine Learning by Feurer et al. (2015).
  • DOC #803: New example for the paper Don’t Rule Out Simple Models Prematurely: A Large Scale Benchmark Comparing Linear and Non-linear Classifiers in OpenML by Benjamin Strang et al. (2018).
  • DOC #808: New example demonstrating basic use cases of a dataset.
  • DOC #810: New example demonstrating the use of benchmarking studies and suites.
  • DOC #832: New example for the paper Scalable Hyperparameter Transfer Learning by Valerio Perrone et al. (2019)
  • DOC #834: New example showing how to plot the loss surface for a support vector machine.
  • FIX #305: Do not require the external version in the flow XML when loading an object.
  • FIX #734: Better handling of "old" flows.
  • FIX #736: Attach a StreamHandler to the openml logger instead of the root logger.
  • FIX #758: Fixes an error which made the client API crash when loading sparse data with categorical variables.
  • FIX #779: Do not fail on a corrupt pickle file.
  • FIX #782: Assign the study id to the correct class attribute.
  • FIX #819: Automatically convert column names to type string when uploading a dataset.
  • FIX #820: Make __repr__ work for datasets which do not have an id.
  • MAINT #796: Rename an argument to make the function list_evaluations more consistent.
  • MAINT #811: Print the full error message given by the server.
  • MAINT #828: Create base class for OpenML entity classes.
  • MAINT #829: Reduce the number of data conversion warnings.
  • MAINT #831: Warn if there's an empty flow description when publishing a flow.
  • MAINT #837: Also print the flow XML if a flow fails to validate.
  • FIX #838: Fix list_evaluations_setups to work when the number of evaluations is not a multiple of 100.
  • FIX #847: Fixes an issue where the client API would crash when trying to download a dataset when there are no qualities available on the server.
  • MAINT #849: Move logic of most different publish functions into the base class.
  • MAINT #850: Remove outdated test code.
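
A sketch tying together get_flow_id (#792) and the evaluation listing that includes hyperparameter settings (#737, fixed for non-multiples of 100 in #838). find_flow_and_setups is hypothetical; it uses the list-valued argument names introduced later in 0.11 (MAINT #804), assumes get_flow_id returns a list of IDs when exact_version=False, and imports openml inside the function only so the sketch can be defined without the package installed.

```python
def find_flow_and_setups(flow_name: str):
    """Hypothetical helper: look up flow IDs by name, then list their evaluated setups."""
    import openml  # imported here so defining the sketch does not require openml

    # ADD #792: assumed to return a list of IDs when exact_version is False.
    flow_ids = openml.flows.get_flow_id(name=flow_name, exact_version=False)

    # ADD #737: evaluations together with their hyperparameter settings.
    setups = openml.evaluations.list_evaluations_setups(
        function="predictive_accuracy",
        flows=flow_ids,
        size=150,  # deliberately not a multiple of 100, the case fixed by #838
        output_format="dataframe",
    )
    return flow_ids, setups
```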

- Python
Published by mfeurer over 6 years ago

openml - Version 0.10.0

- Python
Published by mfeurer over 6 years ago

openml - Version 0.8.0

- Python
Published by mfeurer about 7 years ago

openml - v0.5.0

Version that complies with the code provided in 'OpenML Benchmarking Suites and the OpenML100' (https://arxiv.org/abs/1708.03731)

- Python
Published by janvanrijn over 8 years ago