https://github.com/blutjens/climate-emulator

A comparison of linear regression vs. deep learning for emulating climate models in the presence of internal variability. (public)

Keywords

climate climate-model climatebench emulator internal-variability machine-learning surrogate-modeling

Basic Info
  • Host: GitHub
  • Owner: blutjens
  • License: cc-by-4.0
  • Language: Jupyter Notebook
  • Default Branch: public
  • Homepage:
  • Size: 10.1 MB
Statistics
  • Stars: 16
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 1 year ago · Last pushed 6 months ago

README.md

climate-emulator

Official repository for the paper 'The impact of internal variability on benchmarking deep learning climate emulators'. This repository computes the ClimateBench v1.0 scores for linear pattern scaling (LPS). It then compares LPS with a CNN-LSTM on Em-MPI, our data summary of the MPI-ESM1.2-LR model, which has more realizations. The repository contains all code needed to download the data and reproduce the paper results.

For a tutorial on how to use pattern scaling for climate emulation, please see: climateemulatortutorial.ipynb
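As a rough illustration of the idea behind linear pattern scaling, the sketch below fits an independent least-squares line at each grid cell that maps the global-mean temperature anomaly to the local value. It is a minimal pure-Python sketch on synthetic numbers and not the implementation in this repository; the function names `fit_pattern_scaling` and `predict_pattern_scaling` and the toy data are assumptions made for illustration. Real code would typically operate on NumPy or xarray arrays.

```python
from typing import List, Tuple


def fit_pattern_scaling(global_tas: List[float],
                        local_fields: List[List[float]]) -> List[Tuple[float, float]]:
    """Fit one least-squares line per grid cell.

    global_tas:   global-mean temperature anomaly per year, length n_years
    local_fields: local values per year, shape [n_years][n_cells]
    Returns one (slope, intercept) pair per grid cell.
    """
    n = len(global_tas)
    x_mean = sum(global_tas) / n
    sxx = sum((x - x_mean) ** 2 for x in global_tas)
    coeffs = []
    for cell in range(len(local_fields[0])):
        y = [row[cell] for row in local_fields]
        y_mean = sum(y) / n
        sxy = sum((x - x_mean) * (yi - y_mean) for x, yi in zip(global_tas, y))
        slope = sxy / sxx
        coeffs.append((slope, y_mean - slope * x_mean))
    return coeffs


def predict_pattern_scaling(coeffs: List[Tuple[float, float]],
                            global_tas_value: float) -> List[float]:
    """Predict the local field for a single global-mean temperature value."""
    return [slope * global_tas_value + intercept for slope, intercept in coeffs]


# Toy data (hypothetical): 4 years, 2 grid cells with exactly linear responses
global_tas = [0.0, 0.5, 1.0, 1.5]
local_fields = [[0.1 + 2.0 * t, -0.2 + 0.5 * t] for t in global_tas]

coeffs = fit_pattern_scaling(global_tas, local_fields)
prediction = predict_pattern_scaling(coeffs, 2.0)
```

Because every grid cell is fitted independently, the model has only two parameters per cell, which is part of why it is hard to overfit to internal variability.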

Installation

```
git clone git@github.com:blutjens/climate-emulator.git
cd climate-emulator
conda create --name emcli
conda activate emcli
conda install pip
pip install -r requirements.txt
pip install -e .
ipython kernel install --user --name=emcli # Link conda environment to jupyter notebook
```

Download Em-MPI data summary (<10GB)

```
export DATA_DIR=/path/to/data/dir
mkdir -p $DATA_DIR
python downloademmpi.py --datadir $DATA_DIR
# alternatively, follow instructions at https://huggingface.co/datasets/blutjens/em-mpi
```

Download input4mips emission inputs and ClimateBench NorESM2-LM targets (<2GB)

```
export PATH_CLIMATEBENCH_DATA=$DATA_DIR/data/raw/climatebench/
mkdir -p $PATH_CLIMATEBENCH_DATA
wget https://zenodo.org/record/7064308/files/train_val.tar.gz -P $PATH_CLIMATEBENCH_DATA
tar -xvf "$PATH_CLIMATEBENCH_DATA/train_val.tar.gz" -C $PATH_CLIMATEBENCH_DATA
rm $PATH_CLIMATEBENCH_DATA/train_val.tar.gz
wget https://zenodo.org/record/7064308/files/test.tar.gz -P $PATH_CLIMATEBENCH_DATA
tar -xvf "$PATH_CLIMATEBENCH_DATA/test.tar.gz" -C $PATH_CLIMATEBENCH_DATA
rm $PATH_CLIMATEBENCH_DATA/test.tar.gz
```

Reproduce linear pattern scaling (LPS) results on ClimateBench

```
# Calculate LPS entry on ClimateBench scoreboard; plot LPS error map for tas, pr, dtr, pr90; and plot correlation between temperature and global cumulative CO2
jupyter notebook notebooks/calculateclimatebenchmetrics.ipynb
# The trained weights of the LPS model are also stored in runs/pattern_scaling/default/models/
```
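Errors on a latitude-longitude grid are commonly area-weighted, because grid cells shrink toward the poles. The following sketch shows a cosine-latitude-weighted RMSE on a toy grid. It is an assumption about a typical metric and may differ from the exact ClimateBench metrics computed in the notebook; `weighted_rmse` is a hypothetical helper, not a function from this repository.

```python
import math
from typing import List


def weighted_rmse(pred: List[List[float]], target: List[List[float]],
                  lats_deg: List[float]) -> float:
    """RMSE over a [n_lat][n_lon] grid with cos(latitude) area weights."""
    num = 0.0
    den = 0.0
    for lat, pred_row, target_row in zip(lats_deg, pred, target):
        w = math.cos(math.radians(lat))
        for p, t in zip(pred_row, target_row):
            num += w * (p - t) ** 2
            den += w
    return math.sqrt(num / den)


# Toy 3x2 grid (hypothetical values): the only error sits at the equator
lats = [-60.0, 0.0, 60.0]
target = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
pred = [[1.0, 1.0], [3.0, 3.0], [3.0, 3.0]]
score = weighted_rmse(pred, target, lats)
```

In this toy case the equatorial row carries half of the total weight, so the weighted score is larger than the unweighted RMSE over the same six cells.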

Reproduce internal variability experiment

First code test: Train and evaluate CNN-LSTM on 50-member ensemble-mean Em-MPI data

```
wandb login # call in a terminal with internet access.
export TF_GPU_ALLOCATOR=cuda_malloc_async
export KERAS_BACKEND=torch
# (optional) export WANDB_MODE='offline' # Use if compute node does not have internet access.
vim runs/cnnlstm/mpi-esm1-2-lr/default/config/config.yaml # Then edit paths to point to /path/to/data/dir
python emcli2/models/cnnlstm/train.py --cfgpath 'runs/cnnlstm/mpi-esm1-2-lr/default/config/config.yaml' --data_var 'pr' --verbose
# (optional) set config.yaml -> epochs=100 to train the CNN-LSTM and not just test if the training works.
```

Second code test: Train LPS and CNN-LSTM on single draws of subsets with 1,2,...,50 members. Then plot RMSE over number of realizations.

```
# (Need to first edit paths in below configs)
python emcli2/models/cnnlstm/train.py --trainmmembersubsets --cfgpath runs/cnnlstm/mpi-esm1-2-lr/mmembersubsetswithm50evalonallspcmpdwpmanyr/config/config.yaml --data_var pr
python emcli2/models/patternscaling/model.py --trainmmembersubsets --cfgpath runs/patternscaling/mpi-esm1-2-lr/mmembersubsetswithm50replaceFalseevalonallmanyr/config/config.yaml --data_var pr
python emcli2/utils/plotting.py --plotmmembersubsetsexperiment --data_var pr
```
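The motivation for varying the number of members is that averaging over m realizations damps internal variability, roughly as 1/sqrt(m) if the members are independent. The sketch below illustrates this on synthetic data: it draws m members without replacement from a toy 50-member ensemble, averages them, and measures the RMSE of that ensemble mean against the known forced signal. It shows the noise level in the training target, not the skill of any emulator, and all names and numbers are hypothetical rather than taken from the experiment scripts.

```python
import math
import random
from typing import List

random.seed(0)

N_MEMBERS, N_YEARS = 50, 200
# Toy forced response: a linear trend
signal: List[float] = [0.01 * year for year in range(N_YEARS)]
# Each member = forced signal + independent internal variability (std 1.0)
ensemble = [[s + random.gauss(0.0, 1.0) for s in signal] for _ in range(N_MEMBERS)]


def subset_mean_rmse(m: int) -> float:
    """RMSE between the mean of m randomly drawn members and the forced signal."""
    members = random.sample(ensemble, m)  # draw without replacement
    mean = [sum(vals) / m for vals in zip(*members)]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(mean, signal)) / N_YEARS)


rmse_by_m = {m: subset_mean_rmse(m) for m in (1, 5, 25, 50)}
for m, rmse in rmse_by_m.items():
    print(f"m={m:2d}  RMSE={rmse:.3f}  1/sqrt(m)={1 / math.sqrt(m):.3f}")
```

A single draw per m, as in this sketch, gives a noisy curve; repeating the draw several times per m gives the spread needed for uncertainty bars.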

Full experiment: Train LPS and CNN-LSTM (with multiple seeds) on multiple draws of subsets with 1,2,...,50 members using SLURM. Plot RMSE over the number of realizations, including uncertainty bars. This also reports the final scores on the Em-MPI 50-member dataset.

```
# Send CNN-LSTM off to supercomputer
sbatch train.sh
# Send pattern scaling to supercomputer
sbatch trainpatternscaling.sh
python emcli2/utils/plotting.py --plotmmembersubsetsexperiment --data_var pr
# repeat for --data_var pr and --data_var tas
```

Reproduce the other figures in JAMES24 paper submission

```

notebooks/explorelinearrelationships.ipynb -> Plot functional relationships in cumulative CO2 emissions, surface temperature, and precipitation; also plot for multiple regions

notebooks/explorelocalinternal_variability.ipynb -> Plot internal variability in 3-member NorESM2-LM vs 50-member MPI-ESM1.2-LR ensemble-mean; also plot for multiple regions

notebooks/exploreprdistribution_mpi.ipynb -> Plot precipitation distributions to show they're not log-normally distributed

notebooks/energy_balance.ipynb -> Plot the bias-variance tradeoff experiment on the energy balance model

```
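For orientation, a zero-dimensional energy balance model is often written as C dT/dt = F(t) - λT plus a noise term that stands in for internal variability. The sketch below integrates that equation with a one-year forward Euler step. The parameter values and the name `simulate_ebm` are illustrative assumptions and are not taken from notebooks/energy_balance.ipynb.

```python
import random
from typing import List


def simulate_ebm(forcing: List[float], heat_capacity: float = 8.0,
                 feedback: float = 1.2, noise_std: float = 0.0,
                 seed: int = 0) -> List[float]:
    """Integrate C dT/dt = F - lambda*T (+ white noise) with a 1-year Euler step.

    forcing:       radiative forcing per year in W m^-2
    heat_capacity: C in W yr m^-2 K^-1 (illustrative value)
    feedback:      lambda in W m^-2 K^-1 (illustrative value)
    noise_std:     std of the yearly noise forcing in W m^-2
    Returns the temperature anomaly in K at the end of each year.
    """
    rng = random.Random(seed)
    temperature = 0.0
    out = []
    for f in forcing:
        noise = rng.gauss(0.0, noise_std) if noise_std > 0 else 0.0
        temperature += (f - feedback * temperature + noise) / heat_capacity
        out.append(temperature)
    return out


forcing = [3.7] * 300                                   # constant toy forcing
forced = simulate_ebm(forcing)                          # forced response only
member = simulate_ebm(forcing, noise_std=2.0, seed=1)   # one noisy realization
```

Without noise the temperature relaxes toward F/λ; with noise, each seed gives a different realization around that forced response, which is the setting in which a bias-variance tradeoff can be studied.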

Reproduce the Em-MPI data summary from raw CMIP6 data.

Download raw MPI-ESM1.2-LR from CMIP6 data on ESGF (tested on svante)

```
cd ClimateSet
# Download inputs / forcers:
python -m databuilding.builders.downloader --cfg databuilding/configs/downloader/core_dataset.yaml
# Download outputs / climate variables:
python -m databuilding.builders.downloader --cfg databuilding/configs/downloader/mpi-esm1-2-lr.yaml
# Delete years in piControl that are past np.datetime64. Maintain 400yrs.
cd /d0/lutjens/raw/CMIP6/MPI-ESM1-2-LR/
ls r1i1p1f1/piControl/@(tas|pr|uas|vas|psl|huss|tasmax|tasmin)/250km//@(day|mon)/@(23|24|25|26|27|28|29*) # replace ls w. rm -r
ls r1i1p1f1/piControl/@(tas|pr|uas|vas|psl|huss|tasmax|tasmin)/250km/@(day|mon)/@(225|226|227|228|229*) # replace ls w. rm -r
```

Reprocess MPI-ESM1.2-LR raw data to get the interim Em-MPI data

```
cd ~/climate-emulator
# Get ensemble_summary.nc that contains statistical summaries across members
# Monthly variables, e.g., tas, pr
python emcli2/dataset/mpiesm12lr.py --datavar 'huss' --getensemblesummaries
# Compute Diurnal Temperature Range; takes ~ 16min per scenario for 30 realizations.
python emcli2/dataset/mpiesm12lr.py --datavar 'dtr' --getensemblesummaries
# Compute extreme precipitation; takes 5hrs 30min for historical scenario and 50 realizations.
python emcli2/dataset/mpiesm12lr.py --datavar 'pr90' --getensemblesummaries
# optional: Daily variables, e.g., pr, tasmax, tasmin
python emcli2/dataset/mpiesm12lr.py --datavar 'tasmax' --getensemblesummaries --frequency 'day'
# Get ensemble.nc that merges all years and ensemble members
python emcli2/dataset/mpiesm12lr.py --datavar 'tas' --getensembleconcat
```
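Conceptually, an ensemble summary collapses the member dimension into statistics at each time step. The sketch below does this for a toy ensemble with the standard library. Which statistics ensemble_summary.nc actually stores is not stated here, so the choice of mean, standard deviation, minimum, and maximum is an assumption, as is the helper name `summarize_ensemble`.

```python
import statistics
from typing import Dict, List


def summarize_ensemble(members: List[List[float]]) -> Dict[str, List[float]]:
    """Collapse [n_members][n_times] values into per-time-step statistics."""
    per_time = list(zip(*members))  # transpose to [n_times][n_members]
    return {
        "mean": [statistics.fmean(v) for v in per_time],
        "std": [statistics.pstdev(v) for v in per_time],  # population std
        "min": [min(v) for v in per_time],
        "max": [max(v) for v in per_time],
    }


# Three toy members with two time steps each (hypothetical values)
members = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
summary = summarize_ensemble(members)
```

Storing only such summaries rather than every member is one way a 50-member ensemble can be kept to a modest download size.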

Known issues

```
# In case of an import tensorflow error in wandblogger:
$ pip install tensorflow
# In case xr.open_mfdataset(..., parallel=True) crashes:
$ conda install --channel=conda-forge eccodes
# In case of a segmentation fault: try config.yaml -> "opendataparallel: False"
```

Reference

If this repository is useful for your analysis, please consider citing:

```
@article{lutjens24internalvar,
  title={The impact of internal variability on benchmarking deep learning climate emulators},
  author={Björn Lütjens and Raffaele Ferrari and Duncan Watson-Parris and Noelle Selin},
  year={2024},
  eprint={2408.05288},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2408.05288},
}
```

Owner

  • Name: Björn Lütjens (he/him)
  • Login: blutjens
  • Kind: user
  • Company: MIT

Postdoctoral Associate in tackling climate change with AI @ MIT. Project overview at https://blutjens.github.io/

GitHub Events

Total
  • Release event: 1
  • Watch event: 11
  • Delete event: 1
  • Push event: 4
  • Pull request event: 2
  • Create event: 1
Last Year
  • Release event: 1
  • Watch event: 11
  • Delete event: 1
  • Push event: 4
  • Pull request event: 2
  • Create event: 1
