https://github.com/blutjens/climate-emulator

A comparison of linear regression vs. deep learning for emulating climate models in the presence of internal variability. (public)

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found codemeta.json file)
✓ .zenodo.json file (found .zenodo.json file)
○ DOI references
✓ Academic publication links (links to: arxiv.org, zenodo.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity (low similarity, 12.3%, to scientific vocabulary)

Keywords

climate climate-model climatebench emulator internal-variability machine-learning surrogate-modeling

Last synced: 5 months ago

Repository

Basic Info

Host: GitHub
Owner: blutjens
License: cc-by-4.0
Language: Jupyter Notebook
Default Branch: public
Homepage:
Size: 10.1 MB

Statistics

Stars: 16
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 1

Created over 1 year ago · Last pushed 6 months ago

climate-emulator

Official repository for the paper 'The impact of internal variability on benchmarking deep learning climate emulators'. This repository computes the ClimateBench v1.0 scores for linear pattern scaling (LPS). We then compare LPS with a CNN-LSTM on Em-MPI, our data summary of the MPI-ESM1.2-LR model, which provides more realizations (50 members) than the 3-member NorESM2-LM ensemble. The repository contains all code to download the data and reproduce the paper results.

For a tutorial on how to use pattern scaling for climate emulation, please see: climateemulatortutorial.ipynb
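
To give a feel for what pattern scaling does before opening the tutorial, here is a minimal, dependency-free sketch. It assumes the simplest common formulation: one ordinary least-squares line per grid cell that maps a global-mean temperature to the local value. The function names (`fit_line`, `fit_pattern_scaling`, `predict_pattern_scaling`) and the synthetic data are hypothetical and are not taken from this repository, whose implementation may differ.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y ~ slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_pattern_scaling(global_t, local_fields):
    """Fit one line per grid cell: local value ~ global-mean temperature.

    global_t: list with one global-mean temperature per year
    local_fields: list of years, each a flat list of grid-cell values
    """
    n_cells = len(local_fields[0])
    return [
        fit_line(global_t, [year[cell] for year in local_fields])
        for cell in range(n_cells)
    ]


def predict_pattern_scaling(coeffs, global_t):
    """Predict a flat field for every global-mean temperature."""
    return [[slope * t + icpt for slope, icpt in coeffs] for t in global_t]


# Synthetic demo (assumption): three grid cells that warm at different rates.
random.seed(0)
true_slopes = [0.5, 1.0, 2.0]
global_t = [0.02 * year for year in range(100)]
local_fields = [
    [s * t + random.gauss(0.0, 0.1) for s in true_slopes] for t in global_t
]
coeffs = fit_pattern_scaling(global_t, local_fields)
fitted_slopes = [slope for slope, _ in coeffs]
```

In this toy setup `fitted_slopes` should land close to `true_slopes`; on real data the per-cell slopes form the spatial "pattern" that is scaled by the global-mean temperature.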

Installation

    git clone git@github.com:blutjens/climate-emulator.git
    cd climate-emulator
    conda create --name emcli
    conda activate emcli
    conda install pip
    pip install -r requirements.txt
    pip install -e .
    ipython kernel install --user --name=emcli # Link conda environment to jupyter notebook

Download Em-MPI data summary (<10GB)

```
export DATA_DIR=/path/to/data/dir
mkdir -p $DATA_DIR
python downloademmpi.py --datadir $DATA_DIR
# alternatively, follow instructions at https://huggingface.co/datasets/blutjens/em-mpi
```

Download input4mips emission inputs and ClimateBench NorESM2-LM targets (<2GB)

    export PATH_CLIMATEBENCH_DATA=$DATA_DIR/data/raw/climatebench/
    mkdir -p $PATH_CLIMATEBENCH_DATA
    wget https://zenodo.org/record/7064308/files/train_val.tar.gz -P $PATH_CLIMATEBENCH_DATA
    tar -xvf "$PATH_CLIMATEBENCH_DATA/train_val.tar.gz" -C $PATH_CLIMATEBENCH_DATA
    rm $PATH_CLIMATEBENCH_DATA/train_val.tar.gz
    wget https://zenodo.org/record/7064308/files/test.tar.gz -P $PATH_CLIMATEBENCH_DATA
    tar -xvf "$PATH_CLIMATEBENCH_DATA/test.tar.gz" -C $PATH_CLIMATEBENCH_DATA
    rm $PATH_CLIMATEBENCH_DATA/test.tar.gz

Reproduce linear pattern scaling (LPS) results on ClimateBench

```
# Calculate LPS entry on ClimateBench scoreboard; plot LPS error map for tas, pr, dtr, pr90; and plot correlation between temperature and global cumulative CO2
jupyter notebook notebooks/calculateclimatebenchmetrics.ipynb
# The trained weights of the LPS model are also stored in runs/pattern_scaling/default/models/
```
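
Scores on a latitude-longitude grid are usually area-weighted so that the small polar cells do not dominate. The sketch below shows a latitude-weighted RMSE under the assumption of cos(latitude) weights on a regular grid. The function name `lat_weighted_rmse` and the toy grid are hypothetical; the notebook above remains the reference for the exact metrics reported in the paper.

```python
import math


def lat_weighted_rmse(pred, target, lats_deg):
    """RMSE over a lat-lon grid with cos(latitude) area weights.

    pred, target: nested lists shaped [n_lat][n_lon]
    lats_deg: latitude of each grid row in degrees
    """
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    weighted_sq_err = 0.0
    weight_sum = 0.0
    for w, pred_row, target_row in zip(weights, pred, target):
        for p, t in zip(pred_row, target_row):
            weighted_sq_err += w * (p - t) ** 2
            weight_sum += w
    return math.sqrt(weighted_sq_err / weight_sum)


# Toy grid (assumption): three latitude rows, two longitudes,
# with an error of 1.0 only along the equator.
lats = [-60.0, 0.0, 60.0]
target = [[0.0, 0.0] for _ in lats]
pred = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
score = lat_weighted_rmse(pred, target, lats)
```

Because the equatorial row carries twice the weight of each row at 60 degrees, the weighted score here is larger than the unweighted RMSE of the same error field.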

Reproduce internal variability experiment

First code test: Train and evaluate CNN-LSTM on 50-member ensemble-mean Em-MPI data

```
wandb login # call in a terminal with internet access.
export TF_GPU_ALLOCATOR=cuda_malloc_async
export KERAS_BACKEND=torch
# (optional) export WANDB_MODE='offline' # Use if the compute node does not have internet access.
vim runs/cnnlstm/mpi-esm1-2-lr/default/config/config.yaml # Then edit paths to point to /path/to/data/dir
python emcli2/models/cnnlstm/train.py --cfgpath 'runs/cnnlstm/mpi-esm1-2-lr/default/config/config.yaml' --data_var 'pr' --verbose
# (optional) set config.yaml -> epochs=100 to train the CNN-LSTM and not just test if the training works.
```

Second code test: Train LPS and CNN-LSTM on single draws of subsets with 1,2,...,50 members. Then plot RMSE over number of realizations.

```
# (Need to first edit paths in below configs)
python emcli2/models/cnnlstm/train.py --trainmmembersubsets --cfgpath runs/cnnlstm/mpi-esm1-2-lr/mmembersubsetswithm50evalonallspcmpdwpmanyr/config/config.yaml --data_var pr
python emcli2/models/patternscaling/model.py --trainmmembersubsets --cfgpath runs/patternscaling/mpi-esm1-2-lr/mmembersubsetswithm50replaceFalseevalonallmanyr/config/config.yaml --data_var pr
python emcli2/utils/plotting.py --plotmmembersubsetsexperiment --data_var pr
```
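
Why the number of realizations matters can be illustrated without any emulator. The sketch below is not the repository's experiment: it assumes a synthetic forced signal plus independent unit-variance noise per member as a stand-in for internal variability, draws subsets without replacement, and measures how far each subset mean is from the forced signal. All names (`ensemble_mean`, `rmse`, `rmse_by_n`) are hypothetical.

```python
import math
import random


def ensemble_mean(members):
    """Average a list of equally long time series element-wise."""
    n = len(members)
    return [sum(values) / n for values in zip(*members)]


def rmse(a, b):
    """Root-mean-square error between two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


random.seed(0)
n_years, n_members = 200, 50

# Assumption: a linear forced trend shared by all members.
forced = [0.01 * year for year in range(n_years)]
# Assumption: internal variability modeled as independent Gaussian noise.
members = [
    [f + random.gauss(0.0, 1.0) for f in forced] for _ in range(n_members)
]

rmse_by_n = {}
for n in (1, 2, 5, 10, 25, 50):
    subset = random.sample(members, n)  # single draw, without replacement
    rmse_by_n[n] = rmse(ensemble_mean(subset), forced)
```

Under these assumptions the residual noise in the subset mean shrinks roughly like 1/sqrt(n), which is why targets built from only a few members are noisy and can change which emulator appears to perform better.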

Full experiment: Train LPS and CNN-LSTM (with multiple seeds) on multiple draws of subsets with 1,2,...,50 members using SLURM. Plot RMSE over number of realizations including uncertainty bars. Also report final scores on the Em-MPI 50-member dataset.

```
# Send CNN-LSTM off to supercomputer
sbatch train.sh
# Send pattern scaling to supercomputer
sbatch trainpatternscaling.sh
python emcli2/utils/plotting.py --plotmmembersubsetsexperiment --data_var pr
# repeat for --data_var pr and --data_var tas
```

Reproduce the other figures in JAMES24 paper submission

```
notebooks/explorelinearrelationships.ipynb -> Plot functional relationships in cumulative CO2 emissions, surface temperature, and precipitation; also plot for multiple regions
notebooks/explorelocalinternal_variability.ipynb -> Plot internal variability in 3-member NorESM2-LM vs 50-member MPI-ESM1.2-LR ensemble-mean; also plot for multiple regions
notebooks/exploreprdistribution_mpi.ipynb -> Plot precipitation distributions to show they're not log-normally distributed
notebooks/energy_balance.ipynb -> Plot the bias-variance tradeoff experiment on the energy balance model
```

Reproduce the Em-MPI data summary from raw CMIP6 data.

Download raw MPI-ESM1.2-LR from CMIP6 data on ESGF (tested on svante)

```
cd ClimateSet
# Download inputs / forcers:
python -m databuilding.builders.downloader --cfg databuilding/configs/downloader/core_dataset.yaml
# Download outputs / climate variables:
python -m databuilding.builders.downloader --cfg databuilding/configs/downloader/mpi-esm1-2-lr.yaml
# Delete years in piControl that are past np.datetime64. Maintain 400yrs.
cd /d0/lutjens/raw/CMIP6/MPI-ESM1-2-LR/
ls r1i1p1f1/piControl/@(tas|pr|uas|vas|psl|huss|tasmax|tasmin)/250km//@(day|mon)/@(23|24|25|26|27|28|29*) # replace ls w. rm -r
ls r1i1p1f1/piControl/@(tas|pr|uas|vas|psl|huss|tasmax|tasmin)/250km/@(day|mon)/@(225|226|227|228|229*) # replace ls w. rm -r
```

Reprocess MPI-ESM1.2-LR raw data to get the interim Em-MPI data

```
cd ~/climate-emulator
# Get ensemble_summary.nc that contains statistical summaries across members
# Monthly variables, e.g., tas, pr
python emcli2/dataset/mpiesm12lr.py --datavar 'huss' --getensemblesummaries
# Compute Diurnal Temperature Range; takes ~ 16min per scenario for 30 realizations.
python emcli2/dataset/mpiesm12lr.py --datavar 'dtr' --getensemblesummaries
# Compute extreme precipitation; takes 5hrs 30min for historical scenario and 50 realizations.
python emcli2/dataset/mpiesm12lr.py --datavar 'pr90' --getensemblesummaries
# optional: Daily variables, e.g., pr, tasmax, tasmin
python emcli2/dataset/mpiesm12lr.py --datavar 'tasmax' --getensemblesummaries --frequency 'day'
# Get ensemble.nc that merges all years and ensemble members
python emcli2/dataset/mpiesm12lr.py --datavar 'tas' --getensembleconcat
```
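
As a rough illustration of what a "statistical summary across members" can look like, the sketch below reduces a small ensemble to per-time-step statistics. The choice of statistics (mean, sample standard deviation, minimum, maximum), the function name `ensemble_summary`, and the toy values are assumptions; the actual contents of ensemble_summary.nc are defined by the repository's script.

```python
import statistics


def ensemble_summary(members):
    """Reduce members to per-time-step statistics.

    members: list of equally long time series, one per ensemble member
    returns: dict mapping statistic name to a time series
    """
    summary = {"mean": [], "std": [], "min": [], "max": []}
    for values in zip(*members):  # all members at one time step
        summary["mean"].append(statistics.fmean(values))
        summary["std"].append(statistics.stdev(values))  # needs >= 2 members
        summary["min"].append(min(values))
        summary["max"].append(max(values))
    return summary


# Toy ensemble (assumption): three members, two time steps.
toy_members = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
toy_summary = ensemble_summary(toy_members)
```

Storing only such summaries instead of every member keeps the download small, at the cost of no longer being able to resample individual realizations from the summary alone.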

Known issues

```
# In case of import tensorflow error in wandblogger:
pip install tensorflow
# In case xr.open_mfdataset(..., parallel=True) crashes:
conda install --channel=conda-forge eccodes
# In case of Segmentation fault: try config.yaml -> "opendataparallel: False"
```

Reference

If this repository is useful for your analysis, please consider citing:

@article{lutjens24internalvar,
  title={The impact of internal variability on benchmarking deep learning climate emulators},
  author={Björn Lütjens and Raffaele Ferrari and Duncan Watson-Parris and Noelle Selin},
  year={2024},
  eprint={2408.05288},
  archivePrefix={arXiv},
  url={https://arxiv.org/abs/2408.05288},
}

Owner

Name: Björn Lütjens (he/him)
Login: blutjens
Kind: user
Company: MIT

Website: https://blutjens.github.io/
Twitter: bjornlutjens
Repositories: 31
Profile: https://github.com/blutjens

Postdoctoral Associate in tackling climate change with AI @ MIT. Project overview at https://blutjens.github.io/

GitHub Events

Total

Release event: 1
Watch event: 11
Delete event: 1
Push event: 4
Pull request event: 2
Create event: 1

Last Year

Release event: 1
Watch event: 11
Delete event: 1
Push event: 4
Pull request event: 2
Create event: 1

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 0
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 0
Total pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
