atmorep

AtmoRep model code

https://github.com/clessig/atmorep

Science Score: 64.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

AtmoRep model code

Basic Info
  • Host: GitHub
  • Owner: clessig
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 1.37 MB
Statistics
  • Stars: 46
  • Watchers: 8
  • Forks: 15
  • Open Issues: 46
  • Releases: 2
Created over 2 years ago · Last pushed 12 months ago
Metadata Files
Readme License Citation

README.md

IMPORTANT NOTE:

Please note that this repository is no longer maintained since March 1st. Please use the WeatherGenerator code instead: https://github.com/ecmwf/WeatherGenerator

AtmoRep

This repository contains the source code for the AtmoRep models for large scale representation learning of atmospheric dynamics as well as links to the pre-trained models and the required model input data.

The pre-print for the work is available on ArXiv: https://arxiv.org/abs/2308.13280.

```
@misc{Lessig2023atmorep,
  title = {AtmoRep: A stochastic model of atmosphere dynamics using large scale representation learning},
  author = {Christian Lessig and Ilaria Luise and Bing Gong and Michael Langguth and Scarlet Stadler and Martin Schultz},
  eprint = {2308.13280},
  primaryclass = {physics.ao-ph},
  url = {https://arxiv.org/abs/2308.13280},
  year = {2023},
}
```

Starter README

1. Pull code

```
%> git clone git@github.com:clessig/atmorep.git
```

This creates a directory atmorep containing the source code, including the Python scripts for model training and evaluation.

After following the steps described below, the final directory structure will look as follows:

```
└── atmorep/
    ├── atmorep/
    │   └── ...
    ├── data/                        <- top level data directory
    │   ├── normalisation/           <- directory for data normalisations
    │   ├── vorticity/
    │   │   ├── ml105/               <- model levels with monthly GRIB files
    │   │   │   ├── era5_vorticity_y2021_m03_ml137.grib   <- grib data file
    │   │   │   └── ...
    │   │   ├── ml114/
    │   │   ├── ml123/
    │   │   ├── ml137/
    │   │   └── ml96/
    │   ├── temperature/
    │   └── ...
    ├── models/
    │   ├── id4nvwbetz/              <- directory containing model weights and config
    │   │   ├── model_id4nvwbetz.json
    │   │   └── AtmoRep_id4nvwbetz.mod
    │   ├── id<model_id>/
    │   └── ...
    └── results/
        ├── id4nvwbetz/
        └── ...
```

The directories data, models, and results need to be created if they do not exist. All of them can become large and should therefore be located on a file system with sufficient storage space; in that case they can be soft-linked to the default locations above, or the paths can be set in atmorep/config/config.
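The monthly GRIB files follow a fixed naming scheme, era5_&lt;field&gt;_y&lt;year&gt;_m&lt;month&gt;_ml&lt;level&gt;.grib. As a minimal sketch (this helper is illustrative and not part of the AtmoRep code base), the scheme can be parsed like this:

```python
import re

# Pattern for the era5_<field>_y<year>_m<month>_ml<level>.grib naming
# scheme used for the monthly data files shown in the tree above.
FILENAME_RE = re.compile(
    r"era5_(?P<field>\w+)_y(?P<year>\d{4})_m(?P<month>\d{2})_ml(?P<level>\d+)\.grib"
)

def parse_grib_name(name):
    """Return (field, year, month, model_level) for a monthly ERA5 GRIB file."""
    m = FILENAME_RE.fullmatch(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name}")
    return m["field"], int(m["year"]), int(m["month"]), int(m["level"])

print(parse_grib_name("era5_vorticity_y2021_m03_ml137.grib"))
# -> ('vorticity', 2021, 3, 137)
```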

2. Download the data

2.1 Download pre-trained models

Models can be downloaded from: https://datapub.fz-juelich.de/atmorep/trained-models.html

An example for downloading the pre-trained models is given here, in this case for the vorticity model.

```
% atmorep/> mkdir models
% atmorep/> cd models
% atmorep/models/> wget https://datapub.fz-juelich.de/atmorep/models/model_id4nvwbetz.tar.gz
% atmorep/models/> tar xvzf model_id4nvwbetz.tar.gz
% atmorep/models/> ls id4nvwbetz
AtmoRep_id4nvwbetz.mod  model_id4nvwbetz.json
```

2.2 Download model input data (ERA5)

The input data in the required structure can be downloaded from the Jülich datapub server. Direct link to WebDAV https://datapub.fz-juelich.de/atmorep/data/. Alternatively, it can be directly downloaded from MARS using the following script.

Download a subset of files

All data files (fields and normalizations) should be downloaded into the data directory. Un-taring the files will generate the correct folder structure. For example (we will also use the vorticity example below to run the first model, so it is recommended to download it as a first step):

```
% atmorep/> mkdir data
% atmorep/> cd data
% atmorep/data/> wget https://datapub.fz-juelich.de/atmorep/data/vorticity/ml137/era5_vorticity_y2021_ml137.tar
% atmorep/data/> tar xvf era5_vorticity_y2021_ml137.tar
% atmorep/data/> ls -lah vorticity/ml137/
total 18G
era5_vorticity_y2021_m01_ml137.grib
era5_vorticity_y2021_m02_ml137.grib
...
era5_vorticity_y2021_m12_ml137.grib
```

For efficiency reasons, AtmoRep takes monthly ERA5 data as input; each tar file therefore contains 12 GRIB files of about 1.5 GB each.

Coefficients for data normalization per field and level can be downloaded here: https://datapub.fz-juelich.de/atmorep/data/normalization/. They should also be located in the data directory:

```
% atmorep/data/> wget https://datapub.fz-juelich.de/atmorep/data/normalization/normalization_vorticity_ml137.tar.gz
% atmorep/data/> tar xvzf normalization_vorticity_ml137.tar.gz
```
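The exact file format of the normalization coefficients is not documented on this page. Assuming the usual per-field, per-level standardisation (an assumption, not confirmed by the source), applying such coefficients amounts to:

```python
# Hedged sketch: mean and std stand in for values read from the downloaded
# normalization files; the actual AtmoRep I/O code is not reproduced here.
def normalise(values, mean, std):
    """Standardise a sequence of field values with precomputed statistics."""
    return [(v - mean) / std for v in values]

print(normalise([1.0, 2.0, 3.0], mean=2.0, std=1.0))
# -> [-1.0, 0.0, 1.0]
```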

3. Install python packages

Create a python environment, e.g.

% atmorep/> python3 -m venv pyenv

and activate the environment:

% atmorep/> source pyenv/bin/activate

conda is also possible; no environment is strictly required, although we would recommend one. Please make sure to use a recent Python version (we tested with Python 3.10). Then install the AtmoRep package:

% atmorep/> pip install -e .

torch is currently not included (since it is often already available or has particular dependencies, e.g. a specific CUDA version). In the simplest case, it can be installed with:

% atmorep/> pip install torch

We require torch 2.x. (A container solution makes it possible to run even on systems where torch 2.x is not available.)

4. Run the model

Pre-trained models can normally be run with:

% atmorep/> python atmorep/core/evaluate.py

You can easily adapt the configuration by selecting the corresponding model_id in evaluate.py (see below). It defaults to the single-field configuration of vorticity, for which we downloaded the data above.

Depending on your compute hardware, you might also have to run the computations by submitting the job through a batch system or by allocating a compute node in interactive mode (if an interactive session is possible, this is recommended). For an interactive session you will likely need:

% atmorep/> export CUDA_VISIBLE_DEVICES=0,1,2,3
% atmorep/> MASTER_ADDR="$(scontrol show hostnames "$SLURM_JOB_NODELIST" | head -n 1)"

The default evaluation mode is currently global forecast. The output will be similar to this:

```
devices : ['cuda:0', 'cuda:1', 'cuda:2', 'cuda:3']
Wandb run: atmorep-ztvyw7k6-8932958
Running Evaluate.evaluate with mode = global_forecast
Loaded AtmoRep id = 4nvwbetz, ignoring/missing 2 elements.
Loaded model id = 4nvwbetz at epoch = -2.
Number of batches per global forecast: 14
INFO:: data stats vorticity : 5.374998363549821e-05 / 0.9978392720222473
num_accs_per_task : 1
with_hvd : True
hvd_rank : 0

...

wandb_id : ztvyw7k6
dates : [[2021, 2, 10, 12]]
token_overlap : [0, 0]
forecast_num_tokens : 1
validation loss for strategy=forecast at epoch 0 : 0.12402566522359848
validation loss for vorticity : 0.12402566522359848
wandb: Waiting for W&B process to finish... (success).
wandb:
wandb: Run history:
wandb:    val. loss forecast ▁
wandb:    val., forecast, vorticity ▁
wandb:
wandb: Run summary:
wandb:    val. loss forecast 0.12403
wandb:    val., forecast, vorticity 0.12403
wandb:
wandb: You can sync this run to the cloud by running:
wandb: wandb sync /p/project/atmo-rep/lessig/atmorep/atmorep/lessig-cleanup/atmorep/wandb/offline-run-20231124095428-ztvyw7k6
```

For the vorticity example above, we evaluate with global_forecast for a specific date and using only a single model level:

```
mode, options = 'global_forecast', { 'fields[0][2]' : [137],
                                     'dates' : [ [2021, 2, 10, 12] ],
                                     'token_overlap' : [0, 0],
                                     'forecast_num_tokens' : 1,
                                     'attention' : False}
```

We perform a 3 hour forecast, since 1 token is 3 hours wide. Another mode is the BERT masked token model mode used for pre-training:

```
mode, options = 'BERT', {'years_test' : [2021], 'fields[0][2]' : [123, 137]}
```

Again, we chose some custom options: two model levels instead of the default five that were used during pre-training, and 2021 as the test year (since we downloaded that data).

The generated model output (stored in ./results/id{wandb_id}) for the global_forecast example can be post-processed into a spatial map with the following code. The run id at the top needs to be replaced by the wandb id of your run; it can be read off from the console output. Results will be stored as example_0000{0,1,2}.png. The code is an as-simple-as-possible example with many parameters hard-coded; see our analysis code for a proper handling.
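Reading the wandb id off the console output can be scripted; the sketch below (an illustrative stdlib helper, not part of the AtmoRep code base) extracts it from the "Wandb run: ..." line shown in the example output above:

```python
import re

def wandb_id_from_log(line):
    """Extract the wandb run id from a 'Wandb run: atmorep-<id>-<job>' log line."""
    m = re.search(r"Wandb run: atmorep-(\w+)-\d+", line)
    if m is None:
        raise ValueError("no wandb run line found")
    return m.group(1)

# The results of this run would then live under ./results/id{wandb_id}.
print(wandb_id_from_log("Wandb run: atmorep-ztvyw7k6-8932958"))
# -> ztvyw7k6
```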

Owner

  • Name: Christian Lessig
  • Login: clessig
  • Kind: user
  • Location: Potsdam
  • Company: European Center for Medium Range Weather Forecasts

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: Atmorep
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Christian
    family-names: Lessig
    email: christian.lessig@ecmwf.int
    affiliation: European Centre for Medium-Range Weather Forecasts (ECMWF)
  - given-names: Ilaria
    family-names: Luise
    email: ilaria.luise@cern.ch
    affiliation: European Organization for Nuclear Research (CERN)
  - given-names: Martin
    family-names: Schultz
    email: m.schultz@fz-juelich.de
    orcid: 'https://orcid.org/0000-0003-3455-774X'
    affiliation: Forschungszentrum Jülich (FZJ)
  - given-names: Michael
    family-names: Langguth
    email: m.langguth@fz-juelich.de
    orcid: 'https://orcid.org/0000-0003-3354-5333'
    affiliation: Forschungszentrum Jülich (FZJ)
identifiers:
  - type: url
    value: 'https://arxiv.org/abs/2308.13280'
    description: corresponding Preprint
repository-code: 'https://isggit.cs.uni-magdeburg.de/atmorep/atmorep'
url: 'https://www.atmorep.org'
abstract: >-
  AtmoRep is a novel, task-independent stochastic computer 
  model of atmospheric dynamics that can provide skillful 
  results for a wide range of applications. AtmoRep uses 
  large-scale representation learning from artificial 
  intelligence to determine a general description of the 
  highly complex, stochastic dynamics of the atmosphere 
  from the best available estimate of the system's historical 
  trajectory as constrained by observations. This is enabled 
  by a novel self-supervised learning objective and a unique 
  ensemble that samples from the stochastic model with a 
  variability informed by the one in the historical record. 
  Our work establishes that large-scale neural networks can
  provide skillful, task-independent models of atmospheric
  dynamics. With this, they provide a novel means to make
  the large record of atmospheric observations accessible
  for applications and for scientific inquiry, complementing
  existing simulations based on first principles.
license: MIT
commit: b0da5b32ec70295914bbb486dbcb77885671dc45
version: 2.0 (preprint)
date-released: '2023-11-28'

GitHub Events

Total
  • Fork event: 5
  • Create event: 25
  • Commit comment event: 2
  • Issues event: 42
  • Watch event: 8
  • Delete event: 5
  • Member event: 2
  • Issue comment event: 87
  • Push event: 103
  • Gollum event: 10
  • Pull request event: 21
  • Pull request review comment event: 27
  • Pull request review event: 15
Last Year
  • Fork event: 5
  • Create event: 25
  • Commit comment event: 2
  • Issues event: 42
  • Watch event: 8
  • Delete event: 5
  • Member event: 2
  • Issue comment event: 87
  • Push event: 103
  • Gollum event: 10
  • Pull request event: 21
  • Pull request review comment event: 27
  • Pull request review event: 15

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 96
  • Total Committers: 4
  • Avg Commits per committer: 24.0
  • Development Distribution Score (DDS): 0.646
Past Year
  • Commits: 14
  • Committers: 3
  • Avg Commits per committer: 4.667
  • Development Distribution Score (DDS): 0.214
Top Committers
Name Email Commits
Ilaria Luise i****e@c****h 34
Christian Lessig c****g@o****e 23
Christian Lessig c****g@g****m 20
iluise l****a@g****m 19
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 60
  • Total pull requests: 37
  • Average time to close issues: 2 months
  • Average time to close pull requests: 15 days
  • Total issue authors: 14
  • Total pull request authors: 8
  • Average comments per issue: 3.25
  • Average comments per pull request: 0.76
  • Merged pull requests: 23
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 38
  • Pull requests: 26
  • Average time to close issues: 19 days
  • Average time to close pull requests: 12 days
  • Issue authors: 12
  • Pull request authors: 8
  • Average comments per issue: 3.11
  • Average comments per pull request: 0.92
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • iluise (13)
  • sbAsma (8)
  • mlangguth89 (7)
  • kacpnowak (6)
  • clessig (5)
  • ankitpatnala (5)
  • grassesi (4)
  • nish03 (4)
  • sascholle (3)
  • maruf-anu (2)
  • dancivitarese (1)
  • jpolz (1)
  • Sindhu-Vasireddy (1)
  • javak87 (1)
Pull Request Authors
  • iluise (27)
  • grassesi (8)
  • clessig (8)
  • sbAsma (6)
  • kacpnowak (4)
  • mlangguth89 (3)
  • zalbanob (2)
  • jpolz (2)
Top Labels
Issue Labels
enhancement (10) bug (8) core model (6) I/O (5) good first issue (4) scientific (2) help wanted (2) performance (1) analysis (1) triaged (1) question (1)
Pull Request Labels
core model (6) bug (5) enhancement (4) I/O (2)

Dependencies

setup.py pypi
  • cfgrib *
  • cloudpickle *
  • ecmwflibs *
  • matplotlib *
  • netcdf4 *
  • numpy *
  • pandas *
  • pathlib *
  • pytz *
  • torchinfo *
  • typing_extensions *
  • wandb *
  • xarray *
  • zarr *