cra5

A large compression model for weather and climate data, which compresses a 400+ TB ERA5 dataset into a new 0.8 TB CRA5 dataset.

https://github.com/taohan10200/cra5

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org, nature.com
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (18.7%) to scientific vocabulary

Keywords

auto-encoder data-compression era5 numerical-weather-forecasting
Last synced: 7 months ago · JSON representation

Repository


Basic Info
  • Host: GitHub
  • Owner: taohan10200
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 15.4 MB
Statistics
  • Stars: 80
  • Watchers: 5
  • Forks: 5
  • Open Issues: 5
  • Releases: 0
Topics
auto-encoder data-compression era5 numerical-weather-forecasting
Created almost 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme

Readme.md


Paper: CRA5: Extreme Compression of ERA5 for Portable Global Climate and Weather Research via an Efficient Variational Transformer

Introduction and get started

The CRA5 dataset is now available on OneDrive.

CRA5 is an extremely compressed version of ERA5, the most popular weather reanalysis dataset. The repository also includes the compression models and a forecasting model, so that researchers can conduct portable weather and climate research.

CRA5 currently provides:

  • A customized variational transformer (VAEformer) for climate data compression
  • The CRA5 dataset, which takes less than 1 TiB but retains the information of the 400+ TiB ERA5 dataset. It covers hourly ERA5 data from 1979 to 2023.
  • A pre-trained auto-encoder for climate and weather data, to support downstream weather research.
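To put these figures in perspective, the storage budget per hourly sample can be estimated with some quick arithmetic. The sketch below assumes complete hourly coverage from 1979-01-01 through 2023-12-31 and treats the stated 1 TiB as an upper bound. Both are assumptions made for illustration, not numbers taken from the dataset files.

```python
from datetime import datetime

# Assumed coverage: every hour from 1979-01-01 00:00 up to,
# but not including, 2024-01-01 00:00.
start = datetime(1979, 1, 1)
end = datetime(2024, 1, 1)
n_hours = int((end - start).total_seconds() // 3600)

# Upper bound on the dataset size stated above: 1 TiB.
budget_bytes = 1 * 1024**4
bytes_per_hour = budget_bytes / n_hours

print(f"hourly samples: {n_hours}")
print(f"budget per sample: {bytes_per_hour / 1024**2:.2f} MiB")
```

Under these assumptions each hourly global snapshot has a budget of only a few MiB, which is what makes the dataset portable.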

Note: Multi-GPU support is now experimental.

Installation

CRA5 supports Python 3.8+ and PyTorch 1.7+. Create and activate a conda environment first:

    conda create --name cra5 python=3.10 -y
    conda activate cra5

Please install cra5 from source:

A C++17 compiler, a recent version of pip (19.0+), and common Python packages are also required (see setup.py for the full list).

To get started locally and install the development version of CRA5, run the following commands in a virtual environment:

```bash
git clone https://github.com/taohan10200/CRA5
cd CRA5

pip install -U pip && pip install -e .
```
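After installing, the version requirements stated above (Python 3.8+ and PyTorch 1.7+) can be checked with a small script. This is a minimal sketch and not part of the CRA5 package: `parse_version` and `meets_minimum` are hypothetical helpers written for this example.

```python
import re
import sys


def parse_version(text):
    """Turn a string such as '2.1.0+cu118' into a tuple such as (2, 1, 0)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)  # keep only the leading digits
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(found, minimum):
    """Return True if version string `found` is at least `minimum`."""
    return parse_version(found) >= parse_version(minimum)


python_version = ".".join(str(n) for n in sys.version_info[:3])
print("Python OK:", meets_minimum(python_version, "3.8"))

try:
    import torch
    print("PyTorch OK:", meets_minimum(torch.__version__, "1.7"))
except ImportError:
    print("PyTorch is not installed")
```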

Test

python test.py

Usage

Using the API:

Supported functions include compression, decompression, latent representation, feature visualization, and reconstruction visualization.

```python
# We build a downloader to help users download the original ERA5 netCDF files for testing:
# data/ERA5/2024/2024-06-01T00:00:00_pressure.nc (513 MiB) and
# data/ERA5/2024/2024-06-01T00:00:00_single.nc (18 MiB)
from cra5.api.era5_downloader import era5_downloader

ERA5_data = era5_downloader('./cra5/api/era5_config.py')  # specify the dataset config for what we want to download
data = ERA5_data.get_form_timestamp(time_stamp="2024-06-01T00:00:00",
                                    local_root='./data/ERA5')

# After getting the ERA5 data ready, you can explore the compression.
from cra5.api import cra5_api

cra5_API = cra5_api()

# ======================= compression functions =====================
# Return a continuous latent y for ERA5 data at 2024-06-01T00:00:00
y = cra5_API.encode_to_latent(time_stamp="2024-06-01T00:00:00")

# Return the arithmetic-coded binary stream of y
bin_stream = cra5_API.latent_to_bin(y=y)

# Or, if you want to directly compress and save the binary stream to a folder
cra5_API.encode_era5_as_bin(time_stamp="2024-06-01T00:00:00", save_root='./data/cra5')

# ======================= decompression functions =====================
# Starting from the bin_stream, you can decode the binary file to the quantized latent.
# Decoding from binary can only get the quantized latent.
y_hat = cra5_API.bin_to_latent(bin_path="./data/CRA5/2024/2024-06-01T00:00:00.bin")

# Return the normalized cra5 data
normalized_x_hat = cra5_API.latent_to_reconstruction(y_hat=y_hat)

# If you have saved or downloaded the binary file, you can directly restore it into a reconstruction.
normalized_x_hat = cra5_API.decode_from_bin("2024-06-01T00:00:00", return_format='normalized')  # normalized cra5 data
x_hat = cra5_API.decode_from_bin("2024-06-01T00:00:00", return_format='de_normalized')  # de-normalized cra5 data

# Show some channels of the latent
cra5_API.show_latent(
    latent=y_hat.squeeze(0).cpu().numpy(),
    time_stamp="2024-06-01T00:00:00",
    show_channels=[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150],
    save_path='./data/vis')
```
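The difference between the continuous latent `y` returned by the encoder and the quantized latent `y_hat` recovered from a binary file can be illustrated with a toy example. The sketch below assumes quantization by rounding to the nearest integer, which is common in learned compression; the README does not say which quantizer VAEformer uses, and `quantize` is a hypothetical helper that is not part of the cra5 API.

```python
import random


def quantize(latent):
    """Round every latent value to the nearest integer (assumed quantizer)."""
    return [float(round(value)) for value in latent]


random.seed(0)
y = [random.uniform(-5.0, 5.0) for _ in range(8)]  # stand-in for a continuous latent
y_hat = quantize(y)                                # what a decoder could recover

# The information lost by rounding is at most 0.5 per latent value.
max_gap = max(abs(a - b) for a, b in zip(y, y_hat))
print("continuous:", [f"{value:.2f}" for value in y])
print("quantized: ", y_hat)
print("largest rounding gap:", round(max_gap, 3))
```

Only the rounded values are entropy coded, so a round trip through the binary file returns `y_hat` rather than `y`.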

```python
# Show some variables of the reconstructed data
cra5_API.show_image(
    reconstruct_data=x_hat.cpu().numpy(),
    time_stamp="2024-06-01T00:00:00",
    show_variables=['z500', 'q500', 'u500', 'v500', 't500', 'w500'],
    save_path='./data/vis')
```
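The API above can return either normalized or de-normalized data. As a rough illustration of how the two relate, the sketch below assumes per-channel z-score normalization. The statistics are hypothetical placeholders, and the README does not specify the normalization scheme CRA5 actually uses.

```python
def normalize(values, mean, std):
    """Map physical values to zero-mean, unit-variance values."""
    return [(value - mean) / std for value in values]


def denormalize(values, mean, std):
    """Invert normalize(): map normalized values back to physical units."""
    return [value * std + mean for value in values]


# Hypothetical statistics for one channel (2 m temperature in kelvin).
t2m_mean, t2m_std = 278.0, 21.0
t2m = [250.0, 278.0, 299.0]

t2m_norm = normalize(t2m, t2m_mean, t2m_std)
t2m_back = denormalize(t2m_norm, t2m_mean, t2m_std)
print(t2m_norm)
print(t2m_back)
```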

Or use the pre-trained model directly:

```python
import torch
from cra5.models.compressai.zoo import vaeformer_pretrained

device = 'cuda' if torch.cuda.is_available() else 'cpu'
print(device)
net = vaeformer_pretrained(quality=268, pretrained=True).eval().to(device)
input_data_norm = torch.rand(1, 268, 721, 1440).to(device)  # This is a proxy weather data. It actually should be a

print(input_data_norm.shape)
with torch.no_grad():
    out_net = net.compress(input_data_norm)

print(out_net)
```
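To get a feel for the achieved compression ratio, the size of the compressed output can be compared with the raw tensor size. The sketch below assumes that `out_net` follows the CompressAI convention of a dict whose `"strings"` entry holds nested lists of byte strings; the structure returned by VAEformer may differ. `count_bytes` and `compression_ratio` are hypothetical helpers, and the byte streams are fake stand-ins.

```python
def count_bytes(obj):
    """Recursively sum the length of every bytes object in nested lists or tuples."""
    if isinstance(obj, (bytes, bytearray)):
        return len(obj)
    if isinstance(obj, (list, tuple)):
        return sum(count_bytes(item) for item in obj)
    return 0


def compression_ratio(out_net, input_shape, bytes_per_value=4):
    """Raw size (float32 by default) divided by the total compressed size."""
    raw_bytes = bytes_per_value
    for dim in input_shape:
        raw_bytes *= dim
    return raw_bytes / count_bytes(out_net["strings"])


# Stand-in for the result of net.compress(...): two fake byte streams.
fake_out = {"strings": [[b"\x00" * 1000], [b"\x00" * 500]], "shape": (1, 1)}
ratio = compression_ratio(fake_out, (1, 268, 721, 1440))
print(f"ratio for the fake streams: {ratio:.0f}x")
```

With a real model output, replace `fake_out` with `out_net`. The ratio printed here reflects only the made-up stream sizes.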

Features

1. The CRA5 dataset is a product of applying VAEformer to atmospheric science. We release it to facilitate research in weather and climate.

  • Train large data-driven numerical weather forecasting models with CRA5

Note: For researchers who do not have enough disk space to store the 300 TiB+ ERA5 dataset but are interested in training a large weather forecasting model such as FengWu-GHR, CRA5 fits the data into less than 1 TiB of disk space.

Our preliminary experiments show that an NWP model trained on CRA5 performs very similarly to one trained on the original ERA5 dataset. With this dataset, you can also readily train forecasting models such as Pangu-Weather, which was published in Nature.

2. VAEformer is a powerful compression model; we hope it can be extended to other domains, such as image and video compression.

3. VAEformer is built on an auto-encoder/decoder architecture. We provide a pretrained VAE for weather research, so you can use VAEformer to obtain latents for downstream research, such as diffusion-based or other generation-based forecasting methods.

  • Using it as an Auto-Encoder-Decoder

Note: For people who are interested in diffusion-based or other generation-based forecasting methods, we provide an auto-encoder and decoder for weather research; you can use VAEformer to obtain the latents for downstream research.

License

CompressAI is licensed under the BSD 3-Clause Clear License

Contributing

We welcome feedback and contributions. Please open a GitHub issue to report bugs, request enhancements or if you have any questions.

Before contributing, please read the CONTRIBUTING.md file.

Authors

Citation

If you use this project, please cite the relevant original publications for the models and datasets, and cite this project as:

    @article{han2024cra5extremecompressionera5,
      title={CRA5: Extreme Compression of ERA5 for Portable Global Climate and Weather Research via an Efficient Variational Transformer},
      author={Tao Han and Zhenghao Chen and Song Guo and Wanghan Xu and Lei Bai},
      year={2024},
      eprint={2405.03376},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2405.03376},
    }

For any work related to the forecasting models, please cite:

    @article{han2024fengwughr,
      title={FengWu-GHR: Learning the Kilometer-scale Medium-range Global Weather Forecasting},
      author={Tao Han and Song Guo and Fenghua Ling and Kang Chen and Junchao Gong and Jingjia Luo and Junxia Gu and Kan Dai and Wanli Ouyang and Lei Bai},
      year={2024},
      eprint={2402.00059},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
    }

The weather variables supported in CRA5 and their numerical error

CRA5 contains a total of 268 variables (channels): 7 pressure-level variables from the ERA5 pressure-level archive, each at 37 pressure levels (259 channels), plus 9 surface variables.
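The count of 268 follows from 7 pressure-level variables on 37 pressure levels plus 9 surface variables. The sketch below rebuilds the channel names used in the table that follows. The ordering of channels is an assumption made for illustration and may not match the layout of the model's input tensor.

```python
# Short names as they appear in the error table.
PRESSURE_VARS = ["z", "q", "u", "v", "t", "r", "w"]
LEVELS = [1000, 975, 950, 925, 900, 875, 850, 825, 800, 775, 750,
          700, 650, 600, 550, 500, 450, 400, 350, 300, 250, 225,
          200, 175, 150, 125, 100, 70, 50, 30, 20, 10, 7, 5, 3, 2, 1]
SURFACE_VARS = ["v10", "u10", "v100", "u100", "t2m", "tcc", "sp", "tp1h", "msl"]

# Assumed order: all levels of one pressure variable, then the next variable,
# with the surface variables appended at the end.
channels = [f"{var}{level}" for var in PRESSURE_VARS for level in LEVELS]
channels += SURFACE_VARS

print(len(LEVELS), "levels")
print(len(channels), "channels")
print(len(set(channels)), "distinct short names")
```

Note that the short names `v10`, `u10`, `v100` and `u100` occur twice in the table, once as pressure-level channels (for example v at 10 hPa) and once as surface winds at 10 m and 100 m, so a short name alone does not uniquely identify a channel.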

| Variable | Channel | Error | Variable | Channel | Error | Variable | Channel | Error | Variable | Channel | Error | Variable | Channel | Error |
|--------|---------|-----------|--------|---------|-----------|--------|---------|-----------|--------|---------|-----------|--------|---------|-----------|
| geopotential | z1000 | 9.386 | specific_humidity | q1000 | 0.00033 | u_component_of_wind | u1000 | 0.416 | v_component_of_wind | v1000 | 0.411 | temperature | t1000 | 0.405 |
| geopotential | z975 | 7.857 | specific_humidity | q975 | 0.00032 | u_component_of_wind | u975 | 0.448 | v_component_of_wind | v975 | 0.442 | temperature | t975 | 0.380 |
| geopotential | z950 | 6.802 | specific_humidity | q950 | 0.00035 | u_component_of_wind | u950 | 0.491 | v_component_of_wind | v950 | 0.479 | temperature | t950 | 0.352 |
| geopotential | z925 | 6.088 | specific_humidity | q925 | 0.00037 | u_component_of_wind | u925 | 0.520 | v_component_of_wind | v925 | 0.505 | temperature | t925 | 0.333 |
| geopotential | z900 | 5.575 | specific_humidity | q900 | 0.00036 | u_component_of_wind | u900 | 0.518 | v_component_of_wind | v900 | 0.503 | temperature | t900 | 0.321 |
| geopotential | z875 | 5.259 | specific_humidity | q875 | 0.00035 | u_component_of_wind | u875 | 0.517 | v_component_of_wind | v875 | 0.503 | temperature | t875 | 0.309 |
| geopotential | z850 | 5.061 | specific_humidity | q850 | 0.00034 | u_component_of_wind | u850 | 0.508 | v_component_of_wind | v850 | 0.493 | temperature | t850 | 0.294 |
| geopotential | z825 | 4.941 | specific_humidity | q825 | 0.00031 | u_component_of_wind | u825 | 0.496 | v_component_of_wind | v825 | 0.481 | temperature | t825 | 0.276 |
| geopotential | z800 | 4.897 | specific_humidity | q800 | 0.00029 | u_component_of_wind | u800 | 0.487 | v_component_of_wind | v800 | 0.472 | temperature | t800 | 0.259 |
| geopotential | z775 | 4.947 | specific_humidity | q775 | 0.00027 | u_component_of_wind | u775 | 0.486 | v_component_of_wind | v775 | 0.468 | temperature | t775 | 0.250 |
| geopotential | z750 | 5.120 | specific_humidity | q750 | 0.00029 | u_component_of_wind | u750 | 0.545 | v_component_of_wind | v750 | 0.524 | temperature | t750 | 0.250 |
| geopotential | z700 | 5.593 | specific_humidity | q700 | 0.00029 | u_component_of_wind | u700 | 0.638 | v_component_of_wind | v700 | 0.607 | temperature | t700 | 0.242 |
| geopotential | z650 | 5.810 | specific_humidity | q650 | 0.00025 | u_component_of_wind | u650 | 0.634 | v_component_of_wind | v650 | 0.610 | temperature | t700 | 0.242 |
| geopotential | z600 | 5.882 | specific_humidity | q600 | 0.00020 | u_component_of_wind | u600 | 0.633 | v_component_of_wind | v600 | 0.597 | temperature | t650 | 0.240 |
| geopotential | z550 | 5.958 | specific_humidity | q550 | 0.00018 | u_component_of_wind | u550 | 0.668 | v_component_of_wind | v550 | 0.616 | temperature | t600 | 0.222 |
| geopotential | z500 | 6.098 | specific_humidity | q500 | 0.00014 | u_component_of_wind | u500 | 0.676 | v_component_of_wind | v500 | 0.603 | temperature | t550 | 0.201 |
| geopotential | z450 | 6.408 | specific_humidity | q450 | 0.00010 | u_component_of_wind | u450 | 0.699 | v_component_of_wind | v450 | 0.649 | temperature | t500 | 0.185 |
| geopotential | z400 | 6.851 | specific_humidity | q400 | 0.00007 | u_component_of_wind | u400 | 0.733 | v_component_of_wind | v400 | 0.686 | temperature | t450 | 0.185 |
| geopotential | z350 | 7.366 | specific_humidity | q350 | 0.00004 | u_component_of_wind | u350 | 0.760 | v_component_of_wind | v350 | 0.704 | temperature | t400 | 0.179 |
| geopotential | z300 | 8.324 | specific_humidity | q300 | 0.00002 | u_component_of_wind | u300 | 0.744 | v_component_of_wind | v300 | 0.704 | temperature | t350 | 0.170 |
| geopotential | z250 | 8.100 | specific_humidity | q250 | 0.00001 | u_component_of_wind | u250 | 0.765 | v_component_of_wind | v250 | 0.701 | temperature | t300 | 0.160 |
| geopotential | z225 | 7.698 | specific_humidity | q225 | 0.00001 | u_component_of_wind | u225 | 0.722 | v_component_of_wind | v225 | 0.642 | temperature | t250 | 0.166 |
| geopotential | z200 | 7.900 | specific_humidity | q200 | 0.00000 | u_component_of_wind | u200 | 0.646 | v_component_of_wind | v200 | 0.563 | temperature | t225 | 0.169 |
| geopotential | z175 | 8.059 | specific_humidity | q175 | 0.00000 | u_component_of_wind | u175 | 0.565 | v_component_of_wind | v175 | 0.509 | temperature | t200 | 0.158 |
| geopotential | z150 | 8.928 | specific_humidity | q150 | 0.00000 | u_component_of_wind | u150 | 0.525 | v_component_of_wind | v150 | 0.458 | temperature | t150 | 0.149 |
| geopotential | z125 | 10.813 | specific_humidity | q125 | 0.00000 | u_component_of_wind | u125 | 0.479 | v_component_of_wind | v125 | 0.417 | temperature | t125 | 0.158 |
| geopotential | z100 | 15.956 | specific_humidity | q100 | 0.00000 | u_component_of_wind | u100 | 0.447 | v_component_of_wind | v100 | 0.373 | temperature | t100 | 0.178 |
| geopotential | z70 | 11.158 | specific_humidity | q70 | 0.00000 | u_component_of_wind | u70 | 0.360 | v_component_of_wind | v70 | 0.275 | temperature | t70 | 0.155 |
| geopotential | z50 | 11.962 | specific_humidity | q50 | 0.00000 | u_component_of_wind | u50 | 0.356 | v_component_of_wind | v50 | 0.242 | temperature | t50 | 0.158 |
| geopotential | z30 | 13.317 | specific_humidity | q30 | 0.00000 | u_component_of_wind | u30 | 0.348 | v_component_of_wind | v30 | 0.221 | temperature | t30 | 0.153 |
| geopotential | z20 | 16.538 | specific_humidity | q20 | 0.00000 | u_component_of_wind | u20 | 0.361 | v_component_of_wind | v20 | 0.229 | temperature | t20 | 0.161 |
| geopotential | z10 | 19.751 | specific_humidity | q10 | 0.00000 | u_component_of_wind | u10 | 0.350 | v_component_of_wind | v10 | 0.232 | temperature | t10 | 0.166 |
| geopotential | z7 | 20.925 | specific_humidity | q7 | 0.00000 | u_component_of_wind | u7 | 0.315 | v_component_of_wind | v7 | 0.225 | temperature | t7 | 0.161 |
| geopotential | z5 | 20.825 | specific_humidity | q5 | 0.00000 | u_component_of_wind | u5 | 0.307 | v_component_of_wind | v5 | 0.212 | temperature | t5 | 0.160 |
| geopotential | z3 | 24.529 | specific_humidity | q3 | 0.00000 | u_component_of_wind | u3 | 0.333 | v_component_of_wind | v3 | 0.246 | temperature | t3 | 0.194 |
| geopotential | z2 | 28.055 | specific_humidity | q2 | 0.00000 | u_component_of_wind | u2 | 0.338 | v_component_of_wind | v2 | 0.239 | temperature | t2 | 0.184 |
| geopotential | z1 | 27.987 | specific_humidity | q1 | 0.00000 | u_component_of_wind | u1 | 0.363 | v_component_of_wind | v1 | 0.245 | temperature | t1 | 0.182 |

| Variable | Channel | Error | Variable | Channel | Error | Variable | Channel | Error |
|--------|---------|-----------|--------|---------|-----------|--------|---------|-----------|
| relative_humidity | r1000 | 3.073 | vertical_velocity | w1000 | 0.059 | 10m_v_component_of_wind | v10 | 0.367 |
| relative_humidity | r975 | 3.192 | vertical_velocity | w975 | 0.067 | 10m_u_component_of_wind | u10 | 0.379 |
| relative_humidity | r950 | 3.588 | vertical_velocity | w950 | 0.078 | 100m_v_component_of_wind | v100 | 0.435 |
| relative_humidity | r925 | 3.877 | vertical_velocity | w925 | 0.086 | 100m_u_component_of_wind | u100 | 0.445 |
| relative_humidity | r900 | 3.982 | vertical_velocity | w900 | 0.090 | 2m_temperature | t2m | 0.720 |
| relative_humidity | r875 | 4.011 | vertical_velocity | w875 | 0.092 | total_cloud_cover | tcc | 0.146 |
| relative_humidity | r850 | 3.933 | vertical_velocity | w850 | 0.093 | surface_pressure | sp | 480.222 |
| relative_humidity | r825 | 3.789 | vertical_velocity | w825 | 0.094 | total_precipitation | tp1h | 0.264 |
| relative_humidity | r800 | 3.555 | vertical_velocity | w800 | 0.096 | mean_sea_level_pressure | msl | 12.685 |
| relative_humidity | r775 | 3.449 | vertical_velocity | w775 | 0.099 | | | |
| relative_humidity | r750 | 3.816 | vertical_velocity | w750 | 0.102 | | | |
| relative_humidity | r700 | 4.265 | vertical_velocity | w700 | 0.110 | | | |
| relative_humidity | r650 | 4.223 | vertical_velocity | w650 | 0.114 | | | |
| relative_humidity | r600 | 4.183 | vertical_velocity | w600 | 0.112 | | | |
| relative_humidity | r550 | 4.411 | vertical_velocity | w550 | 0.106 | | | |
| relative_humidity | r500 | 4.409 | vertical_velocity | w500 | 0.101 | | | |
| relative_humidity | r450 | 4.675 | vertical_velocity | w450 | 0.096 | | | |
| relative_humidity | r400 | 4.831 | vertical_velocity | w400 | 0.091 | | | |
| relative_humidity | r350 | 4.932 | vertical_velocity | w350 | 0.084 | | | |
| relative_humidity | r300 | 5.151 | vertical_velocity | w300 | 0.075 | | | |
| relative_humidity | r250 | 5.134 | vertical_velocity | w250 | 0.056 | | | |
| relative_humidity | r225 | 4.682 | vertical_velocity | w225 | 0.046 | | | |
| relative_humidity | r200 | 3.899 | vertical_velocity | w200 | 0.039 | | | |
| relative_humidity | r175 | 3.063 | vertical_velocity | w175 | 0.034 | | | |
| relative_humidity | r150 | 2.508 | vertical_velocity | w150 | 0.029 | | | |
| relative_humidity | r125 | 2.123 | vertical_velocity | w125 | 0.024 | | | |
| relative_humidity | r100 | 1.844 | vertical_velocity | w100 | 0.018 | | | |
| relative_humidity | r70 | 0.487 | vertical_velocity | w70 | 0.010 | | | |
| relative_humidity | r50 | 0.151 | vertical_velocity | w50 | 0.007 | | | |
| relative_humidity | r30 | 0.097 | vertical_velocity | w30 | 0.005 | | | |
| relative_humidity | r20 | 0.083 | vertical_velocity | w20 | 0.003 | | | |
| relative_humidity | r10 | 0.033 | vertical_velocity | w10 | 0.002 | | | |
| relative_humidity | r7 | 0.016 | vertical_velocity | w7 | 0.001 | | | |
| relative_humidity | r5 | 0.008 | vertical_velocity | w5 | 0.001 | | | |
| relative_humidity | r3 | 0.003 | vertical_velocity | w3 | 0.001 | | | |
| relative_humidity | r2 | 0.001 | vertical_velocity | w2 | 0.000 | | | |
| relative_humidity | r1 | 0.000 | vertical_velocity | w1 | 0.000 | | | |
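The README does not state which metric the "error" column reports. Assuming it is a root-mean-square error between the original and the reconstructed field, computed separately for each channel, a comparable number could be obtained as sketched below. The `rmse` helper and the sample values are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import math


def rmse(reference, reconstruction):
    """Root-mean-square error between two equally long sequences of values."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must have the same length")
    squared = sum((a - b) ** 2 for a, b in zip(reference, reconstruction))
    return math.sqrt(squared / len(reference))


# Toy values for a single channel (four grid points of z500).
z500_reference = [55000.0, 55100.0, 54900.0, 55050.0]
z500_reconstructed = [55003.0, 55096.0, 54900.0, 55050.0]

print(rmse(z500_reference, z500_reconstructed))  # prints 2.5
```

For real data, the same function would be applied to the flattened grid of each channel, using the de-normalized reconstruction so that the error is expressed in the variable's physical units.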

Related links

  • CompressAI Library: https://github.com/InterDigitalInc/CompressAI

Owner

  • Name: tao han
  • Login: taohan10200
  • Kind: user

GitHub Events

Total
  • Issues event: 6
  • Watch event: 45
  • Issue comment event: 5
  • Push event: 1
  • Fork event: 5
Last Year
  • Issues event: 6
  • Watch event: 45
  • Issue comment event: 5
  • Push event: 1
  • Fork event: 5

Committers

Last synced: 8 months ago

All Time
  • Total Commits: 55
  • Total Committers: 1
  • Avg Commits per committer: 55.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 33
  • Committers: 1
  • Avg Commits per committer: 33.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
taohan10200 t****0@1****m 55
Committer Domains (Top 20 + Academic)
163.com: 1

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 10
  • Total pull requests: 0
  • Average time to close issues: 2 months
  • Average time to close pull requests: N/A
  • Total issue authors: 8
  • Total pull request authors: 0
  • Average comments per issue: 2.4
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 7
  • Pull requests: 0
  • Average time to close issues: 18 days
  • Average time to close pull requests: N/A
  • Issue authors: 5
  • Pull request authors: 0
  • Average comments per issue: 1.86
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • px39n (2)
  • tung-nd (2)
  • 0rhisia0 (1)
  • siddevkota (1)
  • Mapirlet (1)
  • gerome-andry (1)
  • Sardingfish (1)
  • vitusbenson (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 21 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 6
  • Total maintainers: 1
pypi.org: cra5

A large compression model for weather and climate data, which compresses a 200+ TB ERA5 dataset into a new 0.7TB CRA5 dataset.

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 21 Last month
Rankings
Dependent packages count: 10.7%
Average: 35.5%
Dependent repos count: 60.3%
Maintainers (1)
Last synced: 7 months ago