egu-2025-course

Harnessing the Power of Pangeo: Enhancing Your Scientific Data Analysis Workflow with scalable open source tools

https://github.com/pangeo-data/egu-2025-course

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.3%) to scientific vocabulary

Keywords

course egu tutorial
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 10
  • Watchers: 10
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created 11 months ago · Last pushed 10 months ago
Metadata Files
  • Readme
  • Contributing
  • License

README.md

EGU 2025 SC 4.14: Harnessing the Power of Pangeo

Enhancing Your Scientific Data Analysis Workflow with scalable open source tools

Note: This course is made possible thanks to the Pangeo@EOSC platform, a reference deployment of the Pangeo ecosystem on the European Open Science Cloud, developed with the support of [CESNET](https://www.cesnet.cz/en/) through the [EGI-ACE](https://youtu.be/Vc9SZNa2-Os) and [C-SCALE](https://youtu.be/-jBkR_2_vg8) projects. We gratefully acknowledge their contributions.

The analysis and visualisation of data is fundamental to research across the earth and space sciences. The Pangeo community has built an ecosystem of tools designed to simplify these workflows, centred around the Xarray library for n-dimensional data handling and Dask for parallel computing. In this short course, we will offer a gradual introduction to the Pangeo toolkit, through which participants will learn the skills required to scale their local scientific workflows to cloud computing or large HPC systems with minimal changes to existing code. The course is beginner-friendly but assumes a prior understanding of the Python language. We will guide you through hands-on Jupyter notebooks that showcase scalable analysis of in-situ, satellite observation and earth system modelling datasets so that you can apply what you learn.

By the end of this course, you will understand how to:

- Efficiently access large public data archives from Cloud storage using the Pangeo ecosystem of open source software and infrastructure.
- Leverage labelled arrays in Xarray to build accessible, reproducible workflows.
- Use chunking to scale a scientific data analysis with Dask.
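As a rough illustration of the last two learning goals, the sketch below applies labelled selection and chunking to a small synthetic array. It is not taken from the course notebooks: the variable names, grid, and chunk size are assumptions chosen for illustration, and it assumes `numpy`, `pandas`, `xarray`, and `dask` are installed.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for a gridded dataset: one year of daily values
# on a coarse latitude/longitude grid (all names are illustrative).
time = pd.date_range("2024-01-01", periods=366, freq="D")
lat = np.arange(-90, 91, 10)
lon = np.arange(0, 360, 10)
rng = np.random.default_rng(seed=0)
temperature = xr.DataArray(
    rng.normal(loc=15.0, scale=5.0, size=(time.size, lat.size, lon.size)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="temperature",
)

# Labelled selection: pick data by coordinate value, not integer position.
low_latitudes = temperature.sel(lat=slice(-30, 30))

# Chunking: split the array along time so Dask can work on pieces in parallel.
chunked = low_latitudes.chunk({"time": 61})

# Operations on chunked data are lazy; this only builds a task graph.
monthly_mean = chunked.groupby("time.month").mean("time")

# compute() executes the graph and returns an in-memory result.
result = monthly_mean.compute()
print(dict(result.sizes))
```

The same `sel`, `groupby`, and `mean` calls work on an unchunked array; calling `chunk` is the only change needed to hand the computation to Dask, which is the sense in which a workflow can scale with minimal changes to existing code.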

All the Python packages and training materials used are open-source (e.g., MIT, Apache-2, CC-BY-4). Participants will need a laptop and internet access but will not need to install anything. We will be using the free and open Pangeo@EOSC (European Open Science Cloud) platform for this course. We encourage attendees from all career stages and fields of study (e.g., atmospheric sciences, cryosphere, climate, geodesy, ocean sciences) to join us for this short course. We look forward to an interactive session and will be hosting a Q&A and discussion forum at the end of the course, including opportunities to get more involved in Pangeo and open source software development. Join us to learn about open, reproducible, and scalable Earth science!

Prerequisites

We recommend that learners with no prior knowledge of Python review resources such as the Software Carpentry training material and Project Pythia in advance of this short course. Participants should bring a laptop with an internet connection. No software installation is required, as resources will be accessed online using the Pangeo@EOSC platform. Temporary user accounts will be provided for the course, and we will also show attendees how to request an account on Pangeo@EOSC so they can continue working on the platform after the training course.

Set up

If you are participating in this short course, you are welcome to register for Pangeo@EOSC.

First, navigate to https://aai.egi.eu/signup to sign up for an account.

Then, navigate to https://aai.egi.eu/auth/realms/id/account/#/enroll?groupPath=/vo.pangeo.eu to request access.

Lastly, access Pangeo@EOSC at https://pangeo-eosc.vm.fedcloud.eu/ and sign in. Select the quay.io/pangeo/pangeo-notebook option.

Owner

  • Name: Pangeo
  • Login: pangeo-data
  • Kind: organization
  • Location: earth

A community effort for big data geoscience

GitHub Events

Total
  • Watch event: 10
  • Delete event: 2
  • Issue comment event: 2
  • Member event: 4
  • Push event: 18
  • Pull request event: 10
  • Create event: 7
Last Year
  • Watch event: 10
  • Delete event: 2
  • Issue comment event: 2
  • Member event: 4
  • Push event: 18
  • Pull request event: 10
  • Create event: 7

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 20
  • Total Committers: 3
  • Avg Commits per committer: 6.667
  • Development Distribution Score (DDS): 0.4
Past Year
  • Commits: 20
  • Committers: 3
  • Avg Commits per committer: 6.667
  • Development Distribution Score (DDS): 0.4
Top Committers
| Name | Email | Commits |
|---|---|---|
| Anne Fouilloux | a****f@s****o | 12 |
| Max Jones | 1****s | 5 |
| Scott Henderson | s****q@g****m | 3 |

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 5
  • Average time to close issues: N/A
  • Average time to close pull requests: 21 minutes
  • Total issue authors: 0
  • Total pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.6
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 5
  • Average time to close issues: N/A
  • Average time to close pull requests: 21 minutes
  • Issue authors: 0
  • Pull request authors: 3
  • Average comments per issue: 0
  • Average comments per pull request: 0.6
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • maxrjones (4)
  • annefou (4)
  • scottyhq (2)
Top Labels
Issue Labels
Pull Request Labels

Dependencies

.github/workflows/deploy.yml actions
  • actions/checkout v4 composite
  • actions/deploy-pages v4 composite
  • actions/upload-pages-artifact v3 composite
  • mamba-org/setup-micromamba v2 composite
environment.yml conda
  • black-jupyter
  • bottleneck 1.4.2.*
  • cartopy
  • dask
  • dask-gateway
  • folium
  • fsspec
  • graphviz
  • h5netcdf
  • hvplot
  • ipykernel
  • ipyleaflet
  • jupyter-book
  • jupyter_server
  • jupyterlab-myst
  • kerchunk
  • mapclassify
  • matplotlib 3.10.0.*
  • matplotlib-inline 0.1.7.*
  • mystmd
  • netcdf4
  • numpy 1.26.4.*
  • pip
  • pooch
  • pre-commit
  • s3fs
  • scikit-learn
  • xarray 2025.3.0.*
  • zarr >=3.0.6
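
To check whether a local environment matches the pinned versions listed above, a small script along the following lines could be used. This is only a sketch: `parse` and `satisfies_pin` are hypothetical helpers written for this example, they handle just the two pin styles that appear in the list (`X.Y.Z.*` and `>=X.Y.Z`), and they do not reproduce conda's full version-matching rules.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the environment.yml listing above.
PINS = {
    "bottleneck": "1.4.2.*",
    "matplotlib": "3.10.0.*",
    "matplotlib-inline": "0.1.7.*",
    "numpy": "1.26.4.*",
    "xarray": "2025.3.0.*",
    "zarr": ">=3.0.6",
}


def parse(text):
    """Turn '3.0.6' into (3, 0, 6), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def satisfies_pin(installed, pin):
    """Simplified check of an installed version string against one pin."""
    if pin.startswith(">="):
        return parse(installed) >= parse(pin[2:])
    if pin.endswith(".*"):
        prefix = parse(pin[:-2])
        return parse(installed)[: len(prefix)] == prefix
    return parse(installed) == parse(pin)


# Report the status of each pinned package in the current environment.
for name, pin in PINS.items():
    try:
        installed = version(name)
    except PackageNotFoundError:
        print(f"{name}: not installed (pin {pin})")
        continue
    status = "ok" if satisfies_pin(installed, pin) else "mismatch"
    print(f"{name}: {installed} (pin {pin}) {status}")
```

On the Pangeo@EOSC platform the environment is provided by the selected notebook image, so a check like this is mainly useful when recreating the environment on a local machine.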