dysweep
Extending the capabilities of Weights and Biases sweep.
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (14.7%) to scientific vocabulary
Repository
Extending the capabilities of Weights and Biases sweep.
Basic Info
- Host: GitHub
- Owner: HamidrezaKmK
- License: MIT
- Language: Python
- Default Branch: main
- Size: 500 KB
Statistics
- Stars: 21
- Watchers: 1
- Forks: 2
- Open Issues: 4
- Releases: 15
Metadata Files
README.md
Dysweep
Enhanced Weights and Biases Sweeps for Systematic Experimentation!
[PyPI](https://pypi.org/project/dysweep/) [DOI](https://zenodo.org/badge/latestdoi/643277847) [Slack](https://join.slack.com/t/dysweep/shared_invite/zt-1ynkfdpdc-wiYHkiLzjrZ8yGqYkM9brQ)

Use the extended capabilities of the *Dysweep* library on top of Weights & Biases for fast and comprehensive experimentation in your research projects.

Table of Contents
- Features
- Applications
- Installation
- Usage
- Visualizing the Sweep
- Tutorial and Use-Cases
- An Image Classification Example
- License
Dysweep is a Python library designed to extend and enhance the functionality of the Weights and Biases (WandB) sweep library. It is built on the belief that an entire experiment should be executable from a configuration dictionary, whether formatted as a YAML or JSON file. Its generic hierarchical configurations make it feasible to manage a wide variety of research tasks, and with its re-running and resuming capabilities, the package works well in tandem with any cloud computing service or cluster that allows large-scale parallel computing.
Features
Dysweep introduces two major enhancements:
Checkpointing for the Sweep Server
Dysweep introduces checkpointing for the sweep server, which is extremely beneficial when preemptions or specific bugs interrupt the sweep process. Even if a sweep is running on a machine that may preempt tasks, or if certain configurations trigger specific bugs, the sweep can resume from a checkpoint directory. Unlike the original WandB sweep, where a lost configuration is simply ignored by the WandB agent function, Dysweep overlays an API on top of the agent that enables individual runs to resume. This is especially useful when only a small fraction of runs fail, eliminating the need to re-run the entire sweep.
Running Sweeps Over Hierarchies
Perhaps the most significant capability of Dysweep is its ability to run sweeps over hierarchically structured parameters. The original WandB sweep configuration is limited to flat parameter sets. However, deep learning experiments often demand more complex, nested sets of configurations. Dysweep enables this, effectively eliminating the need for hard-coding the selection between different classes with primitive methods. Instead, you can define a new YAML that automatically selects between class types and initialization arguments, streamlining the setup process and making it more robust.
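To make the contrast concrete, here is a short sketch in Python. The first dictionary is a standard flat WandB sweep configuration; the second shows the kind of nested search space the hierarchical approach targets. The nested keys (class_path, init_args) are illustrative assumptions, not Dysweep's actual schema.

```python
# A standard WandB sweep configuration: the parameter space is flat.
flat_sweep = {
    "method": "grid",
    "parameters": {
        "learning_rate": {"values": [1e-3, 1e-4]},
        "batch_size": {"values": [32, 64]},
    },
}

# The kind of nested space a hierarchical sweep targets: whole model
# classes are swapped together with their own initialization arguments.
# NOTE: the keys below are illustrative, not Dysweep's actual schema.
hierarchical_space = {
    "model": [
        {"class_path": "torchvision.models.resnet18",
         "init_args": {"num_classes": 10}},
        {"class_path": "torchvision.models.vgg11",
         "init_args": {"num_classes": 10}},
    ],
    "optimizer": {"lr": [1e-3, 1e-4]},
}
```

Encoding the second space in a flat WandB sweep would require manually enumerating every (class, arguments) combination as a primitive value, which is exactly the hard-coding a hierarchical sweep avoids.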
Dysweep is inspired by DyPy, a library used for deep learning experimentation, and mirrors its vision of facilitating fully generic configuration YAML files that encapsulate code snippets. Hence, Dysweep offers a versatile configuration set that empowers you to define experiments at any layer of abstraction.
Applications
Dysweep is envisioned to particularly facilitate the following applications:
Large-Scale Hyperparameter Tuning: Dysweep is geared towards conducting large-scale hyperparameter tuning not just within the confines of a specific model, but across various models and methods. This paves the way for a more comprehensive and detailed study of the effects of hyperparameters.
Running Models Over Different Configurations and Datasets: Once a model is ready, Dysweep enables it to run over a multitude of configurations and datasets. It provides a systematic way to define a sweep in WandB, allowing every experiment to run in parallel across different machines. This significantly eases the process of large-scale computing and data gathering for a particular model.
Installation
You can install the Dysweep library from PyPI using the following command:
```shell
pip install dysweep
```
Usage
Once installed, the Dysweep library provides two scripts:
- One script to initialize a sweep based on a defined base configuration
- Another script to run agents, as well as to resume or re-run specified configurations
A sweep configuration can be defined via a config.yaml file. This configuration file can then be utilized in the command-line to run the desired function as follows:
```bash
dysweep_create -c config.yaml
```
The config.yaml file is a standard ResumanbleSweepConfig configuration file. It includes the following key fields:
1. base_config: Defines the base configuration for creating the sweep.
2. sweep_configuration: Specifies the hierarchical configuration used to update (upsert) the base configuration.
3. project: Identifies the project name under which the sweep is created.
4. entity: Designates the Weights & Biases (WandB) entity employed in the sweep creation.
Additional ResumanbleSweepConfig parameters that may be of interest can be found in the source code.
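For orientation, here is a minimal sketch of the four documented fields, written as the Python dictionary that such a config.yaml would deserialize to. Only the four top-level keys are documented above; the contents of base_config and sweep_configuration are illustrative assumptions, not Dysweep's actual schema.

```python
import yaml

# Minimal sketch of a config.yaml. The contents of base_config and
# sweep_configuration below are illustrative assumptions.
config = {
    "project": "my_project",    # WandB project the sweep is created under
    "entity": "my_entity",      # WandB entity (user or team)
    "base_config": {            # base configuration every run starts from
        "dataset": {"type": "cifar10"},
        "trainer": {"epochs": 10},
    },
    "sweep_configuration": {    # hierarchical updates upserted into base_config
        "dataset": {"type": ["cifar10", "cifar100"]},
    },
}

# Round-trip to the YAML form that dysweep_create would consume.
print(yaml.safe_dump(config, sort_keys=False))
```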
For instance, you can run the following command:
```bash
dysweep_create --config config.yaml --project <my_project> --entity <my_entity>
```
Once you've executed this command, it will output a sweep identifier. This identifier can then be used across multiple machines to run the various sweep configurations.
In addition, you can use the dysweep_run_resume script to execute the agent. This script lets you specify the sweep identifier, the number of runs to execute, and the function from your package to run. If you need to resume a run, specify the run identifier and set resume to True. Detailed guidance is available in our Tutorial.
For example, if you have a function main in a module denoted by path.to.my.package, here's how to run the agent:
```bash
dysweep_run_resume --package <path.to.my.package> --function <main> --sweep_id <sweep_id> --count <run_count>
```
And here are examples for resuming a single run or multiple runs:
```bash
dysweep_run_resume --package <path.to.my.package> --function <main> --sweep_id <sweep_id> --rerun_id <run_id> --resume True
```

```bash
dysweep_run_resume --package <path.to.my.package> --function <main> --sweep_id <sweep_id> --count <run_count> --resume True
```
Visualizing the Sweep
Using the sweep_alias and sweep_identifier values, each subtree of the configuration hierarchy you are sweeping over is visualized under the sweep_identifier value you assign to it. This is especially useful when a particular knob you want to sweep over is buried deep within the hierarchical configuration.
With dysweep, the values being swept over are no longer limited to primitive types; they can be entire dictionaries. To visualize such values, specify a sweep_alias for each value you want to visualize: the alias is a short key that stands in for the full dictionary for better visualization.
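As a loose sketch of that idea, a dictionary-valued sweep option might carry an alias that replaces the full subtree in the UI. The exact placement of sweep_alias in Dysweep's schema is an assumption here, not verified against its source:

```python
# Illustrative only: assumes sweep_alias sits alongside each
# dictionary-valued option; Dysweep's real schema may differ.
model_choices = [
    {
        "sweep_alias": "resnet18",  # shown in the UI in place of the dict
        "class_path": "torchvision.models.resnet18",
        "init_args": {"num_classes": 10},
    },
    {
        "sweep_alias": "vgg11",
        "class_path": "torchvision.models.vgg11",
        "init_args": {"num_classes": 10},
    },
]
```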
In addition to that, each run will contain a dy_config entry in its wandb.config, which you can inspect from the Weights & Biases UI. Using these configurations, you can filter runs, group them, or compare them. For example, if we have a configuration hierarchy where the dataset type sits at dataset.type, we can filter runs by dataset type using the following query:

```
dy_config.dataset.type = "cifar10"
```
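Because dy_config is stored in each run's wandb.config, the same filter can also be applied programmatically through WandB's public API. A minimal sketch, assuming placeholder entity and project names:

```python
import wandb

api = wandb.Api()

# MongoDB-style filter over run configs; "my_entity/my_project"
# is a placeholder for your own WandB path.
runs = api.runs(
    "my_entity/my_project",
    filters={"config.dy_config.dataset.type": "cifar10"},
)
for run in runs:
    print(run.id, run.name)
```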
Tutorial and Use-Cases
We selected a standard task in deep learning - image classification - and utilized various convolutional models and datasets to demonstrate the broad capabilities of Dysweep. We subjected this problem to multiple configurations through our pipeline. For a hands-on understanding of the process, you can refer to our detailed Jupyter notebook available here.
Make sure to install all the requirements before running the tutorial section:
```bash
# The requirements of the main package itself
pip install -r requirements.txt
# The requirements for running the examples
pip install -r requirements-testing.txt
```
License
Dysweep is released under the MIT License.
Owner
- Name: Hamidreza Kamkari
- Login: HamidrezaKmK
- Kind: user
- Location: Toronto, Canada
- Website: https://hamidrezakmk.github.io/
- Twitter: hamid_R_kamkar
- Repositories: 6
- Profile: https://github.com/HamidrezaKmK
Deep Learning Enthusiast and Competitive programmer. Master's student at the University of Toronto.
Citation (CITATION.cff)
cff-version: 1.2.0
title: '"DySweep": Enhanced Weights and Biases Sweeps for Systematic Experimentation'
message: >-
If you use this software, please cite it using the
metadata from this file.
type: software
authors:
- given-names: Hamid, Hamidreza
family-names: Kamkari
email: hamidrezakamkari@gmail.com
repository-code: "https://github.com/HamidrezaKmK/dysweep"
abstract: >-
Dysweep is a powerful Python library designed to enhance the functionality of Weights and Biases sweeps. With Dysweep, conducting systematic and efficient deep learning experiments becomes a breeze. Its features include checkpointing for the Sweep Server, allowing for the resumption of specific runs, and the ability to run sweeps over hierarchies, eliminating the need for hard-coded selection between different classes. Inspired by DyPy, Dysweep provides a versatile configuration set, enabling the definition of experiments at any level of abstraction. Whether it's large-scale hyperparameter tuning or parallel execution of experiments, Dysweep empowers researchers with a systematic and streamlined approach to deep learning experimentation.
keywords:
- Lazy Configurations
- Weights and Biases
- Sweep
- Hierarchical Configuration
- Deep Learning
- Experiments
license: MIT
date-released: "2023-05-21"
GitHub Events
Total
- Watch event: 1
Last Year
- Watch event: 1
Issues and Pull Requests
Last synced: 7 months ago
All Time
- Total issues: 4
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- HamidrezaKmK (4)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: 47 last month (PyPI)
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 19
- Total maintainers: 1
pypi.org: dysweep
Use Weights and Biases Sweeps for Dynamic Configuration generation.
- Homepage: https://github.com/HamidrezaKmK/dysweep
- Documentation: https://dysweep.readthedocs.io/
- License: MIT
- Latest release: 0.1.6 (published over 2 years ago)