dask-cookbook
Notebooks to demonstrate dask functionalities
Science Score: 54.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ✓ .zenodo.json file: found
- ○ DOI references
- ✓ Academic publication links: links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: ProjectPythia
- License: apache-2.0
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://projectpythia.org/dask-cookbook/
- Size: 48.5 MB
Statistics
- Stars: 5
- Watchers: 2
- Forks: 7
- Open Issues: 3
- Releases: 2
Metadata Files
README.md
Dask Cookbook
This Project Pythia Cookbook provides a comprehensive guide to understanding the basic concepts and collections of Dask as well as its integration with Xarray. Dask is a parallel computing library that allows you to scale your computations to multiple cores or even clusters, while Xarray is a library that enables working with labelled multi-dimensional arrays, with a focus on working with netCDF datasets.
Motivation
The motivation behind this repository is to provide a clear and concise resource for anyone looking to learn about the basic concepts of Dask and its integration with Xarray. By providing step-by-step tutorials, we hope to make it easy for users to understand the fundamental concepts of parallel computing and distributed data processing, as well as how to apply them in practice using Dask and Dask+Xarray.
Authors
Negin Sobhani, Brian Vanderwende, Deepak Cherian, and Ben Kirk
Contributors
Note on Content Origin
This cookbook is derived from the material used in the NCAR tutorial "Using Dask on HPC systems", held in February 2023. The NCAR tutorial series also includes an in-depth exploration of Dask on HPC systems, with practical use cases and best practices. For the complete set of NCAR tutorial materials, including this additional HPC content, please refer to the main NCAR tutorial content available here.
Structure
In the first chapter of this cookbook, we provide step-by-step tutorials on the basic concepts of Dask, including Dask arrays and Dask dataframes, which are powerful tools for parallel computing and distributed data processing. We explain the key differences between these Dask data structures and their counterparts in NumPy and Pandas.
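The contrast with NumPy described above can be sketched in a few lines. This is a minimal illustration, not taken from the cookbook notebooks: the array size, the chunk size, and the variable names are arbitrary choices made for this example. It shows the two differences that matter most in practice: a Dask array is split into chunks, and operations on it are lazy until `.compute()` is called. Dask dataframes follow the same pattern with Pandas dataframes as the building blocks.

```python
import numpy as np
import dask.array as da

# NumPy evaluates eagerly and holds the whole array in memory.
np_data = np.arange(1_000_000, dtype="float64")
np_mean = np_data.mean()

# Dask splits the same data into chunks (here 4 chunks of 250,000
# elements; the chunk size is an arbitrary choice for illustration).
dask_data = da.arange(1_000_000, chunks=250_000, dtype="float64")

# Calling .mean() only builds a task graph; no arithmetic happens yet.
lazy_mean = dask_data.mean()

# .compute() executes the graph, processing chunks in parallel where possible.
dask_mean = lazy_mean.compute()

print("number of chunks:", dask_data.numblocks)
print("lazy object type:", type(lazy_mean).__name__)
print("results agree:", bool(np.isclose(np_mean, dask_mean)))
```

For an array this small NumPy will usually be faster; chunking pays off once the data no longer fits comfortably in memory or the work can be spread across several cores.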
In the second chapter of this cookbook, we move on to more advanced topics, such as distributed computing and Dask+Xarray integration. We provide examples of how to use Dask+Xarray to work efficiently with large, labelled multi-dimensional datasets. Finally, we discuss some best practices for Dask+Xarray.
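As a rough sketch of what Dask+Xarray integration looks like, the example below chunks a small synthetic dataset and reduces it through a local distributed client. Everything specific here is an assumption made for illustration: the synthetic `temperature` array, its shape, the chunk size of 73 time steps, and the in-process `Client` settings are not taken from the cookbook, which works with real datasets.

```python
import numpy as np
import pandas as pd
import xarray as xr
from dask.distributed import Client

# Synthetic stand-in for a labelled dataset (values are random, not real data).
time = pd.date_range("2000-01-01", periods=365, freq="D")
rng = np.random.default_rng(0)
temps = xr.DataArray(
    rng.normal(280.0, 5.0, size=(365, 20, 30)),
    dims=("time", "lat", "lon"),
    coords={"time": time},
    name="temperature",
)

# .chunk() converts the underlying NumPy array into a Dask array,
# here with 5 chunks of 73 time steps each.
chunked = temps.chunk({"time": 73})

# Xarray operations on chunked data are lazy, just like plain Dask arrays.
lazy_mean = chunked.mean("time")

# A small in-process distributed cluster; processes=False keeps the workers
# as threads, and dashboard_address=None disables the diagnostics dashboard.
with Client(processes=False, n_workers=1, threads_per_worker=2,
            dashboard_address=None) as client:
    result = lazy_mean.compute()

print("chunks along time:", chunked.chunks[0])
print("result shape:", result.shape)
```

On an HPC system or a larger machine, the `Client` would typically be connected to a cluster with more workers; the Xarray code itself would stay the same.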
Running the Notebooks
You can run the notebooks either using Binder or on your local machine.
Running on Binder
The simplest way to interact with a Jupyter Notebook is through Binder, which enables the execution of a Jupyter Book in the cloud. The details of how this works are not important for now. All you need to know is how to launch a Pythia Cookbooks chapter via Binder. Move your mouse to the top right corner of the book chapter you are viewing, click the rocket ship icon, and select “launch Binder”. After a moment you should be presented with a notebook that you can interact with: you will be able to execute and even change the example programs. The code cells have no output at first; output appears once you execute them by pressing Shift+Enter. Complete details on how to interact with a live Jupyter notebook are described in Getting Started with Jupyter.
Running on Your Own Machine
If you are interested in running this material locally on your computer, you will need to follow this workflow:
- Clone the https://github.com/ProjectPythia/dask-cookbook repository:
  git clone https://github.com/ProjectPythia/dask-cookbook.git
- Move into the dask-cookbook directory:
  cd dask-cookbook
- Create and activate your conda environment from the environment.yml file:
  conda env create -f environment.yml
  conda activate dask-cookbook
- Move into the notebooks directory and start up JupyterLab:
  cd notebooks/
  jupyter lab
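Once the environment is active, a quick sanity check can confirm that the key packages are importable before opening the notebooks. The snippet below is a hypothetical helper, not part of the repository: the function name `missing_packages` and the selection of packages to check are assumptions made for this example, using only the Python standard library.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the subset of top-level package names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


# A few packages the notebooks rely on (import names, chosen for illustration).
required = ["dask", "distributed", "xarray", "pandas", "matplotlib"]

missing = missing_packages(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All checked packages are available.")
```

If anything is reported missing, the most likely cause is that the `dask-cookbook` conda environment is not the active one.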
Acknowledgments
- NCAR CISL/CSG Team
- ESDS Initiative
Owner
- Name: Project Pythia
- Login: ProjectPythia
- Kind: organization
- Email: projectpythia@ucar.edu
- Location: United States of America
- Website: projectpythia.org
- Twitter: Project_Pythia
- Repositories: 21
- Profile: https://github.com/ProjectPythia
Community learning resource for Python-based computing in the geosciences
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this cookbook, please cite it as below."
authors:
  # add additional entries for each author -- see https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md
  - family-names: Sobhani
    given-names: Negin
    website: https://github.com/negin513
  - family-names: Vanderwende
    given-names: Brian
  - family-names: Cherian
    given-names: Deepak
    website: https://github.com/dcherian
  - family-names: Kirk
    given-names: Ben
  - name: "Dask Cookbook contributors" # use the 'name' field to acknowledge organizations
    website: "https://github.com/ProjectPythia/dask-cookbook/graphs/contributors"
title: "Dask Cookbook"
abstract: "A cookbook for Dask workflows."
GitHub Events
Total
- Issues event: 3
- Watch event: 1
- Delete event: 2
- Issue comment event: 10
- Push event: 70
- Pull request review event: 3
- Pull request event: 9
- Create event: 5
Last Year
- Issues event: 3
- Watch event: 1
- Delete event: 2
- Issue comment event: 10
- Push event: 70
- Pull request review event: 3
- Pull request event: 9
- Create event: 5
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 6 months
- Total issue authors: 0
- Total pull request authors: 3
- Average comments per issue: 0
- Average comments per pull request: 1.8
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 0
- Pull requests: 4
- Average time to close issues: N/A
- Average time to close pull requests: about 2 hours
- Issue authors: 0
- Pull request authors: 3
- Average comments per issue: 0
- Average comments per pull request: 0.75
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
- erogluorhan (2)
Pull Request Authors
- jukent (3)
- erogluorhan (2)
- brian-rose (2)
- dependabot[bot] (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- cfgrib
- cftime
- dask
- dask-labextension
- distributed
- h5netcdf
- hvplot
- ipywidgets
- jupyter-book
- jupyter_server
- jupyterlab >=3
- jupyterlab-system-monitor
- matplotlib
- nbterm
- nc-time-axis
- netcdf4
- nodejs
- pandas
- pre-commit
- pydap
- python-graphviz
- scipy
- sphinx-pythia-theme
- xarray