weather-tools

Tools to make weather data accessible and useful.

https://github.com/google/weather-tools

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.2%) to scientific vocabulary

Keywords

apache-beam python weather
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 233
  • Watchers: 15
  • Forks: 44
  • Open Issues: 87
  • Releases: 8
Created over 4 years ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License

README.md

weather-tools

Apache Beam pipelines to make weather data accessible and useful.

Introduction

This project contributes a series of command-line tools to make common data engineering tasks easier for researchers in climate and weather. These solutions were born out of the need to improve repeated work performed by research teams across Alphabet.

The first tool created was the weather downloader (weather-dl). This makes it easier to ingest data from the European Centre for Medium-Range Weather Forecasts (ECMWF). weather-dl enables users to describe precisely what data they'd like to ingest from ECMWF's catalogs. It also gives them control over how requests are parallelized, so that data can be retrieved efficiently. Downloads are driven from a configuration file, which can be reviewed (and version-controlled) independently of pipeline or analysis code.
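To make the parallelization idea concrete, here is a minimal Python sketch of how one broad data selection could be split into many small, independent requests. It is an illustration only, not weather-dl's actual implementation: the keys in `selection`, the `partition_keys` argument, and the `partition_requests` helper are all assumptions made for this example.

```python
import itertools
from typing import Dict, Iterator, List


def partition_requests(
    selection: Dict[str, List[str]],
    partition_keys: List[str],
) -> Iterator[Dict[str, List[str]]]:
    """Yield one request per combination of values of the partition keys.

    Keys that are not partitioned keep their full list of values in every
    request, so each request is a smaller slice of the same selection.
    """
    # Every combination of values for the keys we split on.
    value_lists = [selection[key] for key in partition_keys]
    for combo in itertools.product(*value_lists):
        request = dict(selection)  # shallow copy of the full selection
        for key, value in zip(partition_keys, combo):
            request[key] = [value]
        yield request


# Hypothetical selection: two years x two months of one variable.
selection = {
    "variable": ["temperature"],
    "year": ["2020", "2021"],
    "month": ["01", "02"],
}

requests = list(partition_requests(selection, ["year", "month"]))
for request in requests:
    print(request["year"][0], request["month"][0], request["variable"])
```

Each yielded request can be fetched independently of the others, which is what makes it possible to run several downloads at the same time.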

We also provide two additional tools to aid climate and weather researchers: the weather mover (weather-mv) and the weather splitter (weather-sp). These CLIs are still in their alpha stages of development, yet they have already been used in production workflows by several partner teams.

We created the weather mover (weather-mv) to load geospatial data from cloud buckets into Google BigQuery. This enables rapid exploratory analysis and visualization of weather data: from BigQuery, scientists can load arbitrary climate data fields into a pandas DataFrame or an xarray Dataset via a simple SQL query.

The weather splitter (weather-sp) helps normalize how archival weather data is stored in cloud buckets. Whether you're trying to merge two datasets with overlapping variables or you simply need to open GRIB data from xarray, it's really useful to split datasets into their component variables.

Installing

We currently recommend that you create a local Python environment (with Anaconda) and install the sources as follows:

```shell
conda env create --name weather-tools --file=environment.yml
conda activate weather-tools
```

Note: Due to its use of third-party binary dependencies such as GDAL and MetView, weather-tools is transitioning from PyPI to Conda for its main release channel. The instructions above are a temporary workaround before our Conda-forge release.

From here, you can use the weather-* tools from your Python environment. Currently, the following tools are available:

  • weather-dl (beta) Download weather data (namely, from ECMWF's API).
  • weather-mv (alpha) Load weather data into analytics engines, like BigQuery.
  • weather-sp (alpha) Split weather data by arbitrary dimensions.

Quickstart

In this tutorial, we will download the ERA5 pressure level dataset and ingest it into Google BigQuery using weather-dl and weather-mv, respectively.

Prerequisites

  1. Register here and here for a license from ECMWF's Copernicus (CDS) API.
  2. You must agree to a dataset's Terms of Use before downloading any data from it (e.g., accept the terms and conditions from here).
  3. Install your license by copying your API URL and key from this page to a new file $HOME/.cdsapirc.[^1] The file should look like this:

     ```
     url: https://cds.climate.copernicus.eu/api
     key: <YOUR_USER_ID>:<YOUR_API_KEY>
     ```
  4. If you do not already have a Google Cloud project, create one by following these steps. If you are working on an existing project, make sure your user has the BigQuery Admin role. To learn more about granting IAM roles to users in Google Cloud, visit the official docs.
  5. Create an empty BigQuery Dataset. This can be done using the Google Cloud Console or via the bq CLI tool. For example:

     ```shell
     bq mk --project_id=$PROJECT_ID $DATASET_ID
     ```
  6. Follow these steps to create a bucket for staging temporary files in Google Cloud Storage.
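
Before running any downloads, it can help to sanity-check the credentials file from step 3. The following Python sketch assumes the file consists of simple `name: value` lines, as in the example above, and treats angle brackets in the key as a sign that the template placeholders were never replaced. The helper names (`parse_cdsapirc`, `check_cdsapirc`, `check_cdsapirc_file`) are hypothetical and not part of weather-tools or the CDS API client.

```python
from pathlib import Path
from typing import Dict


def parse_cdsapirc(text: str) -> Dict[str, str]:
    """Parse 'name: value' lines, as in the $HOME/.cdsapirc example."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only: the URL and key contain colons too.
        name, value = line.split(":", 1)
        settings[name.strip()] = value.strip()
    return settings


def check_cdsapirc(text: str) -> None:
    """Raise ValueError if an entry is missing or still a placeholder."""
    settings = parse_cdsapirc(text)
    for name in ("url", "key"):
        if not settings.get(name):
            raise ValueError(f"missing '{name}' entry")
    if "<" in settings["key"] or ">" in settings["key"]:
        raise ValueError("'key' still contains a placeholder")


def check_cdsapirc_file(path: Path) -> None:
    """Check a credentials file on disk, e.g. Path.home() / '.cdsapirc'."""
    check_cdsapirc(path.read_text())


# Made-up credentials, for illustration only.
example = "url: https://cds.climate.copernicus.eu/api\nkey: 12345:abcdef\n"
check_cdsapirc(example)  # passes silently
print(parse_cdsapirc(example))
```

To check the real file, call `check_cdsapirc_file(Path.home() / ".cdsapirc")`.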

Steps

For the purpose of this tutorial, we will use your local machine to run the data pipelines. Note that all of the weather-tools can also run on Cloud Dataflow, which is fully managed and easier to scale.

  1. Use weather-dl to download the ERA5 pressure level dataset.

     ```bash
     weather-dl configs/era5_example_config_local_run.cfg \
       --local-run  # Use the local machine
     ```

Recommendation: Pass the -d, --dry-run flag to any of these commands to preview the effects.

NOTE: By default, local downloads are saved to the ./local_run directory unless another file system is specified. The recommended output location for weather-dl is Cloud Storage. The source and destination of the download are configured using the .cfg configuration file which is passed to the command. To learn more about this configuration file's format and features, see this reference. To learn more about the weather-dl command, visit here.
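
As a rough illustration of how one file can carry both the source and the destination of a download, the Python sketch below reads an INI-style config with the standard library. The section names, option names, and values are assumptions made for this example; the configuration reference is the authoritative description of the real format.

```python
import configparser
from typing import Dict

# Hypothetical config text; section and option names are assumptions.
CONFIG_TEXT = """
[parameters]
dataset = reanalysis-era5-pressure-levels
target_path = ./local_run/era5-{year}-{month}.nc

[selection]
variable = temperature
year = 2020
month = 01
"""


def summarize_config(text: str) -> Dict[str, str]:
    """Return the source dataset and the resolved destination path."""
    # Disable '%' interpolation so values are read exactly as written.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    params = dict(parser["parameters"])
    selection = dict(parser["selection"])
    # Fill the placeholders in the target path from the selection values.
    destination = params["target_path"].format(**selection)
    return {"source": params["dataset"], "destination": destination}


summary = summarize_config(CONFIG_TEXT)
print(summary)
```

Keeping these settings in a plain text file is what allows a download to be reviewed and version-controlled separately from analysis code.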

  2. (Optional) Split up your downloaded dataset with weather-sp:

     ```shell
     weather-sp --input-pattern "./local_run/era5-*.nc" \
       --output-dir "split_data"
     ```

Visit the weather-sp docs for more information.

  3. Use weather-mv to ingest the downloaded data into BigQuery, in a structured format.

     ```bash
     # --uris: use "./split_data/**.nc" instead if weather-sp was used.
     # --output_table: the path to the destination BigQuery table.
     # --temp_location: needed to stage temporary files before writing to BigQuery.
     weather-mv bigquery --uris "./local_run/**.nc" \
       --output_table "$PROJECT.$DATASET_ID.$TABLE_ID" \
       --temp_location "gs://$BUCKET/tmp" \
       --direct_num_workers 2
     ```

See these docs for more about the weather-mv command.

That's it! After the pipeline has completed, you should be able to query the ingested dataset in the BigQuery SQL workspace and analyze it using BigQuery ML.
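
The ingested table can also be queried from Python. The sketch below builds a simple SQL query and, given Google Cloud credentials, loads the result into a pandas DataFrame with the `google-cloud-bigquery` client. The column names (`time`, `latitude`, `longitude`, `temperature`) and the table name are assumptions made for illustration; check the schema of your own table before relying on them.

```python
import re


def build_field_query(table: str, field: str, limit: int = 1000) -> str:
    """Build a SQL query selecting one data field plus its coordinates."""
    # Table and column names cannot be passed as query parameters, so
    # validate them before interpolating them into the SQL text.
    if not re.fullmatch(r"[\w-]+\.\w+\.\w+", table):
        raise ValueError(f"expected project.dataset.table, got {table!r}")
    if not re.fullmatch(r"\w+", field):
        raise ValueError(f"invalid column name: {field!r}")
    return (
        f"SELECT time, latitude, longitude, {field}\n"
        f"FROM `{table}`\n"
        f"LIMIT {int(limit)}"
    )


def load_field(table: str, field: str):
    """Run the query and return a pandas DataFrame (needs GCP credentials)."""
    # Imported here so the query builder works without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    return client.query(build_field_query(table, field)).to_dataframe()


# Hypothetical table and column names.
sql = build_field_query("my-project.weather.era5", "temperature", limit=10)
print(sql)
```

Calling `load_field("my-project.weather.era5", "temperature")` would run the query; it is left out of the example run because it requires a real project and credentials.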

Contributing

The weather tools are under active development, and contributions are welcome! Please check out our guide to get started.

License

This is not an official Google product.

```
Copyright 2021 Google LLC

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

https://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
```

[^1]: Note that you need to be logged in for the CDS API page to actually show your user ID and API key. Otherwise, it will display a placeholder, which is confusing to some users.

Owner

  • Name: Google
  • Login: google
  • Kind: organization
  • Email: opensource@google.com
  • Location: United States of America

Google ❤️ Open Source

GitHub Events

Total
  • Create event: 9
  • Issues event: 7
  • Watch event: 19
  • Delete event: 8
  • Member event: 4
  • Issue comment event: 18
  • Push event: 30
  • Pull request review comment event: 53
  • Pull request event: 36
  • Pull request review event: 79
  • Fork event: 4
Last Year
  • Create event: 9
  • Issues event: 7
  • Watch event: 19
  • Delete event: 8
  • Member event: 4
  • Issue comment event: 18
  • Push event: 30
  • Pull request review comment event: 53
  • Pull request event: 36
  • Pull request review event: 79
  • Fork event: 4

Committers

Last synced: 6 months ago

All Time
  • Total Commits: 361
  • Total Committers: 41
  • Avg Commits per committer: 8.805
  • Development Distribution Score (DDS): 0.789
Past Year
  • Commits: 27
  • Committers: 9
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.778
Top Committers
Name Email Commits
Alex Merose a****r@g****m 76
Rahul Mahrsee 8****7@u****m 38
DeepGabani 6****8@u****m 27
Alex Merose al@m****m 22
Darshan Prajapati 9****9@u****m 21
aniketinfocusp 1****p@u****m 20
Alex Merose a****n@g****m 18
Alex Rosengarten a****n@g****m 14
dabhi_cusp 1****p@u****m 13
Aniket Singh Rawat 1****t@u****m 12
Jash Rana 1****4@u****m 12
Ulrike Hager u****r@g****m 10
Cillian Fennell c****s@g****m 8
Piyush-Ingale 1****e@u****m 6
Shail Parekh 1****h@u****m 6
Ulrike Hager u****2@g****m 6
ksic8 1****8@u****m 5
pbattaglia p****a@u****m 4
pramodg 6****g@u****m 4
Aaron Bell a****l@g****m 4
David Lowell d****l@g****m 3
Iman Akbari i****i@g****m 3
Steven Greenberg s****g@g****m 3
ksic8 k****h@g****m 3
Alex Rosengarten a****n@g****m 2
Sean Campbell c****n@g****m 2
Valliappa (Lak) Lakshmanan l****k@v****m 2
dependabot[bot] 4****]@u****m 2
ksic8 k****l@i****n 2
mahrsee1997 r****l@i****n 2
and 11 more...

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 150
  • Total pull requests: 416
  • Average time to close issues: 3 months
  • Average time to close pull requests: 9 days
  • Total issue authors: 21
  • Total pull request authors: 30
  • Average comments per issue: 1.17
  • Average comments per pull request: 0.51
  • Merged pull requests: 347
  • Bot issues: 0
  • Bot pull requests: 5
Past Year
  • Issues: 3
  • Pull requests: 58
  • Average time to close issues: 1 day
  • Average time to close pull requests: 5 days
  • Issue authors: 3
  • Pull request authors: 11
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.19
  • Merged pull requests: 43
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • alxmrs (84)
  • mahrsee1997 (13)
  • blackvvine (12)
  • deepgabani8 (8)
  • dabhicusp (5)
  • ksic8 (4)
  • CillianFn (4)
  • mt467 (4)
  • aniketinfocusp (2)
  • DarshanSP19 (2)
  • floraxue (2)
  • jongwooo (1)
  • pramodg (1)
  • heyanand (1)
  • lakshmanok (1)
Pull Request Authors
  • alxmrs (85)
  • aniketinfocusp (58)
  • mahrsee1997 (47)
  • deepgabani8 (41)
  • DarshanSP19 (35)
  • j9sh264 (23)
  • aniketsinghrawat (22)
  • dabhicusp (16)
  • ksic8 (11)
  • shail-parekh (11)
  • Piyush-Ingale (10)
  • CillianFn (9)
  • pbattaglia (8)
  • blackvvine (6)
  • pramodg (5)
Top Labels
Issue Labels
weather-dl (31), weather-mv (30), bug (27), P1 (23), P2 (20), good first issue (17), enhancement (10), documentation (7), help wanted (5), P0 (5), P3 (4), weather-sp (4), can't reproduce (1), 20% (1), wontfix (1)
Pull Request Labels
enhancement (7), weather-mv (7), dependencies (5), weather-dl (3), weather-sp (3), github_actions (3), bug (1)

Packages

  • Total packages: 2
  • Total downloads:
    • pypi 26 last-month
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 15
  • Total maintainers: 1
proxy.golang.org: github.com/google/weather-tools
  • Versions: 8
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago
pypi.org: google-weather-tools

Apache Beam pipelines to make weather data accessible and useful.

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 26 Last month
Rankings
Dependent packages count: 10.0%
Average: 21.3%
Dependent repos count: 21.7%
Downloads: 32.3%
Maintainers (1)
Last synced: 6 months ago

Dependencies

docs/requirements.txt pypi
  • myst-parser ==0.13.7
  • sphinx >=2.1
setup.py pypi
  • apache-beam *
.github/workflows/ci.yml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • conda-incubator/setup-miniconda v2 composite
  • s-weigand/setup-conda v1 composite
  • styfle/cancel-workflow-action 0.7.0 composite
.github/workflows/publish.yml actions
  • actions/checkout v2 composite
  • actions/download-artifact v2 composite
  • actions/setup-python v2.3.1 composite
  • actions/upload-artifact v2 composite
  • pypa/gh-action-pypi-publish v1.4.2 composite
Dockerfile docker
  • apache/beam_python${py_version}_sdk 2.40.0 build
  • continuumio/miniconda3 latest build
environment.yml pypi
  • cython ==0.29.34
  • earthengine-api ==0.1.329
  • firebase-admin ==6.0.1
pyproject.toml pypi
weather_dl/setup.py pypi
weather_mv/setup.py pypi
weather_sp/setup.py pypi