https://github.com/blutjens/climate-emulator
A comparison of linear regression vs. deep learning for emulating climate models in the presence of internal variability. (public)
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ○ DOI references
- ✓ Academic publication links: links to arxiv.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (12.3%) to scientific vocabulary
Statistics
- Stars: 16
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
climate-emulator
Official repository for the paper 'The impact of internal variability on benchmarking deep learning climate emulators'. This repository computes the ClimateBench v1.0 scores for linear pattern scaling (LPS). We then compare LPS with a CNN-LSTM on Em-MPI, our data summary of the MPI-ESM1.2-LR model, which provides more realizations (50 members) than the 3-member NorESM2-LM ensemble used in ClimateBench. The repository contains all code to download the data and reproduce the paper results.
For a tutorial on how to use pattern scaling for climate emulation, please see: climateemulatortutorial.ipynb
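As a rough illustration of the idea behind linear pattern scaling, the sketch below fits one least-squares line per grid cell. It assumes the common formulation in which each local variable is regressed on global-mean surface temperature; the function names, array shapes, and synthetic data are hypothetical and are not taken from this repository, whose implementation may differ.

```python
import numpy as np


def fit_pattern_scaling(global_tas, local_field):
    """Fit one least-squares line per grid cell.

    global_tas:  (n_time,) global-mean surface temperature anomaly
    local_field: (n_time, n_lat, n_lon) local climate variable
    Returns (slope, intercept), each of shape (n_lat, n_lon).
    """
    n_time, n_lat, n_lon = local_field.shape
    # Design matrix with a column of ones for the intercept.
    design = np.stack([global_tas, np.ones(n_time)], axis=1)
    # Flatten the grid so every cell is one regression target.
    targets = local_field.reshape(n_time, -1)
    coeffs, *_ = np.linalg.lstsq(design, targets, rcond=None)
    slope = coeffs[0].reshape(n_lat, n_lon)
    intercept = coeffs[1].reshape(n_lat, n_lon)
    return slope, intercept


def predict_pattern_scaling(global_tas, slope, intercept):
    """Scale the fitted spatial pattern by global-mean temperature."""
    return global_tas[:, None, None] * slope[None] + intercept[None]


# Synthetic demo data (assumption: not from the repository).
rng = np.random.default_rng(0)
global_tas = np.linspace(0.0, 3.0, 50)
true_slope = rng.uniform(0.5, 2.0, size=(4, 8))
local_field = global_tas[:, None, None] * true_slope + 0.1

slope, intercept = fit_pattern_scaling(global_tas, local_field)
prediction = predict_pattern_scaling(global_tas, slope, intercept)
```

Because the synthetic field is exactly linear in `global_tas`, the fitted `slope` recovers `true_slope`. Real model output adds internal variability on top of this forced signal.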
Installation
git clone git@github.com:blutjens/climate-emulator.git
cd climate-emulator
conda create --name emcli
conda activate emcli
conda install pip
pip install -r requirements.txt
pip install -e .
ipython kernel install --user --name=emcli # Link conda environment to jupyter notebook
Download Em-MPI data summary (<10GB)
```
export DATA_DIR=/path/to/data/dir
mkdir -p $DATA_DIR
python downloademmpi.py --datadir $DATA_DIR
# alternatively, follow instructions at https://huggingface.co/datasets/blutjens/em-mpi
```
Download input4mips emission inputs and ClimateBench NorESM2-LM targets (<2GB)
export PATH_CLIMATEBENCH_DATA=$DATA_DIR/data/raw/climatebench/
mkdir -p $PATH_CLIMATEBENCH_DATA
wget https://zenodo.org/record/7064308/files/train_val.tar.gz -P $PATH_CLIMATEBENCH_DATA
tar -xvf "$PATH_CLIMATEBENCH_DATA/train_val.tar.gz" -C $PATH_CLIMATEBENCH_DATA
rm $PATH_CLIMATEBENCH_DATA/train_val.tar.gz
wget https://zenodo.org/record/7064308/files/test.tar.gz -P $PATH_CLIMATEBENCH_DATA
tar -xvf "$PATH_CLIMATEBENCH_DATA/test.tar.gz" -C $PATH_CLIMATEBENCH_DATA
rm $PATH_CLIMATEBENCH_DATA/test.tar.gz
Reproduce linear pattern scaling (LPS) results on ClimateBench
```
# Calculate LPS entry on ClimateBench scoreboard; plot LPS error map for tas, pr, dtr, pr90; and plot correlation between temperature and global cumulative CO2
jupyter notebook notebooks/calculateclimatebenchmetrics.ipynb
# The trained weights of the LPS model are also stored in runs/pattern_scaling/default/models/
```
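Error maps and scores on a latitude-longitude grid are usually area-weighted, since grid cells shrink toward the poles. The following is a minimal sketch of a cosine-latitude-weighted RMSE. It is an illustration only: the function names are hypothetical, and it does not reproduce the exact normalization used for the ClimateBench scoreboard or in the notebook above.

```python
import numpy as np


def area_weights(lat_deg):
    """Cosine-of-latitude weights, normalized to have mean 1."""
    weights = np.cos(np.deg2rad(lat_deg))
    return weights / weights.mean()


def weighted_rmse(pred, target, lat_deg):
    """Area-weighted RMSE between two (n_lat, n_lon) fields."""
    weights = area_weights(lat_deg)[:, None]  # broadcast over longitude
    return float(np.sqrt(np.mean(weights * (pred - target) ** 2)))


# Toy grid with three latitude bands (assumption: illustrative values).
lat_deg = np.array([-60.0, 0.0, 60.0])
target = np.zeros((3, 4))

error_at_equator = np.zeros((3, 4))
error_at_equator[1, :] = 1.0
error_at_60n = np.zeros((3, 4))
error_at_60n[2, :] = 1.0

rmse_equator = weighted_rmse(error_at_equator, target, lat_deg)
rmse_60n = weighted_rmse(error_at_60n, target, lat_deg)
```

With these weights, the same error magnitude counts more at the equator than at 60°N, so `rmse_equator` is larger than `rmse_60n`.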
Reproduce internal variability experiment
First code test: Train and evaluate CNN-LSTM on 50-member ensemble-mean Em-MPI data
```
wandb login # call in a terminal with internet access.
export TF_GPU_ALLOCATOR=cuda_malloc_async
export KERAS_BACKEND=torch
# (optional) export WANDB_MODE='offline' # Use if compute node does not have internet access.
vim runs/cnnlstm/mpi-esm1-2-lr/default/config/config.yaml # Then edit paths to point to /path/to/data/dir
python emcli2/models/cnnlstm/train.py --cfgpath 'runs/cnnlstm/mpi-esm1-2-lr/default/config/config.yaml' --data_var 'pr' --verbose
# (optional) set config.yaml -> epochs=100 to train the CNN-LSTM and not just test if the training works.
```
Second code test: Train LPS and CNN-LSTM on single draws of subsets with 1,2,...,50 members. Then plot RMSE over number of realizations.
```
# (Need to first edit paths in below configs)
python emcli2/models/cnnlstm/train.py --trainmmembersubsets --cfgpath runs/cnnlstm/mpi-esm1-2-lr/mmembersubsetswithm50evalonallspcmpdwpmanyr/config/config.yaml --data_var pr
python emcli2/models/patternscaling/model.py --trainmmembersubsets --cfgpath runs/patternscaling/mpi-esm1-2-lr/mmembersubsetswithm50replaceFalseevalonallmanyr/config/config.yaml --data_var pr
python emcli2/utils/plotting.py --plotmmembersubsetsexperiment --data_var pr
```
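To give a feel for why the number of realizations matters, the sketch below draws member subsets without replacement from a synthetic ensemble and measures how far the subset mean is from the full-ensemble mean. It trains no model and is not the repository's experiment; the function name, ensemble shape, and noise level are assumptions chosen for illustration.

```python
import numpy as np


def subset_mean_rmse(ensemble, m, rng):
    """RMSE between the mean of m random members and the full-ensemble mean.

    ensemble: (n_members, n_time) array of realizations.
    """
    # Draw m distinct members (sampling without replacement).
    members = rng.choice(ensemble.shape[0], size=m, replace=False)
    subset_mean = ensemble[members].mean(axis=0)
    full_mean = ensemble.mean(axis=0)
    return float(np.sqrt(np.mean((subset_mean - full_mean) ** 2)))


# Synthetic ensemble: shared forced trend plus member-specific noise
# standing in for internal variability (assumption: illustrative only).
rng = np.random.default_rng(0)
n_members, n_time = 50, 200
forced = np.linspace(0.0, 2.0, n_time)
ensemble = forced + rng.normal(0.0, 0.5, size=(n_members, n_time))

rmse_by_m = {m: subset_mean_rmse(ensemble, m, rng) for m in (1, 5, 25, 50)}
```

The residual noise in the subset mean shrinks as `m` grows and vanishes when all 50 members are used, which is why targets built from few realizations are noisier to train and evaluate against.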
Full experiment: Train LPS and CNN-LSTM (with multiple seeds) on multiple draws of subsets with 1,2,...,50 members using SLURM. Plot RMSE over number of realizations including uncertainty bars. Also report final scores on the Em-MPI 50-member dataset.
```
# Send CNN-LSTM off to supercomputer
sbatch train.sh
# Send pattern scaling to supercomputer
sbatch trainpatternscaling.sh
python emcli2/utils/plotting.py --plotmmembersubsetsexperiment --data_var pr
# repeat for --data_var pr and --data_var tas
```
Reproduce the other figures in JAMES24 paper submission
```
notebooks/explorelinearrelationships.ipynb -> Plot functional relationships in cumulative CO2 emissions, surface temperature, and precipitation; also plot for multiple regions
notebooks/explorelocalinternal_variability.ipynb -> Plot internal variability in 3-member NorESM2-LM vs 50-member MPI-ESM1.2-LR ensemble-mean; also plot for multiple regions
notebooks/exploreprdistribution_mpi.ipynb -> Plot precipitation distributions to show they're not log-normally distributed
notebooks/energy_balance.ipynb -> Plot the bias-variance tradeoff experiment on the energy balance model
```
Reproduce the Em-MPI data summary from raw CMIP6 data.
Download raw MPI-ESM1.2-LR data from CMIP6 on ESGF (tested on svante)
```
cd ClimateSet
# Download inputs / forcers:
python -m databuilding.builders.downloader --cfg databuilding/configs/downloader/core_dataset.yaml
# Download outputs / climate variables:
python -m databuilding.builders.downloader --cfg databuilding/configs/downloader/mpi-esm1-2-lr.yaml
# Delete years in piControl that are past np.datetime64. Maintain 400yrs.
cd /d0/lutjens/raw/CMIP6/MPI-ESM1-2-LR/
ls r1i1p1f1/piControl/@(tas|pr|uas|vas|psl|huss|tasmax|tasmin)/250km//@(day|mon)/@(23|24|25|26|27|28|29*) # replace ls w. rm -r
ls r1i1p1f1/piControl/@(tas|pr|uas|vas|psl|huss|tasmax|tasmin)/250km/@(day|mon)/@(225|226|227|228|229*) # replace ls w. rm -r
```
Reprocess MPI-ESM1.2-LR raw data to get the interim Em-MPI data
```
cd ~/climate-emulator
# Get ensemble_summary.nc that contains statistical summaries across members
# Monthly variables, e.g., tas, pr
python emcli2/dataset/mpiesm12lr.py --datavar 'huss' --getensemblesummaries
# Compute Diurnal Temperature Range; takes ~ 16min per scenario for 30 realizations.
python emcli2/dataset/mpiesm12lr.py --datavar 'dtr' --getensemblesummaries
# Compute extreme precipitation; takes 5hrs 30min for historical scenario and 50 realizations.
python emcli2/dataset/mpiesm12lr.py --datavar 'pr90' --getensemblesummaries
# optional: Daily variables, e.g., pr, tasmax, tasmin
python emcli2/dataset/mpiesm12lr.py --datavar 'tasmax' --getensemblesummaries --frequency 'day'
# Get ensemble.nc that merges all years and ensemble members
python emcli2/dataset/mpiesm12lr.py --datavar 'tas' --getensembleconcat
```
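The derived variables and ensemble summaries above can be sketched with NumPy as follows. The definitions are assumptions rather than the repository's code: `dtr` is taken as the time mean of daily `tasmax - tasmin`, `pr90` as the 90th percentile of daily precipitation, and the summary as the mean and standard deviation across members. The function names are hypothetical, and the statistics actually stored in ensemble_summary.nc may differ.

```python
import numpy as np


def diurnal_temperature_range(tasmax, tasmin):
    """Time mean of daily (tasmax - tasmin); inputs are (n_days, ...)."""
    return (tasmax - tasmin).mean(axis=0)


def extreme_precipitation(daily_pr, percentile=90):
    """Percentile of daily precipitation along the time axis."""
    return np.percentile(daily_pr, percentile, axis=0)


def ensemble_summary(members):
    """Mean and sample standard deviation across the member axis.

    members: (n_members, ...) array, one slice per realization.
    """
    return {
        "mean": members.mean(axis=0),
        "std": members.std(axis=0, ddof=1),
    }


# Toy inputs (assumption: illustrative values, not model output).
tasmax = np.full((365, 2), 300.0)
tasmin = np.full((365, 2), 290.0)
dtr = diurnal_temperature_range(tasmax, tasmin)

daily_pr = np.arange(101.0)[:, None]
pr90 = extreme_precipitation(daily_pr)

summary = ensemble_summary(np.array([[1.0, 2.0], [3.0, 4.0]]))
```

Computing the derived variable per member first and summarizing across members afterwards keeps nonlinear quantities such as percentiles from being distorted by averaging daily data across realizations.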
Known issues
```
In case of an import tensorflow error in wandblogger: $ pip install tensorflow
In case xr.open_mfdataset(..., parallel=True) crashes: $ conda install --channel=conda-forge eccodes
In case of a segmentation fault: try config.yaml -> "opendataparallel: False"
```
Reference
If this repository is useful for your analysis please consider citing:
@article{lutjens24internalvar,
title={The impact of internal variability on benchmarking deep learning climate emulators},
author={Björn Lütjens and Raffaele Ferrari and Duncan Watson-Parris and Noelle Selin},
year={2024},
eprint={2408.05288},
archivePrefix={arXiv},
url={https://arxiv.org/abs/2408.05288},
}
Owner
- Name: Björn Lütjens (he/him)
- Login: blutjens
- Kind: user
- Company: MIT
- Website: https://blutjens.github.io/
- Twitter: bjornlutjens
- Repositories: 31
- Profile: https://github.com/blutjens
Postdoctoral Associate in tackling climate change with AI @ MIT. Project overview at https://blutjens.github.io/
GitHub Events
Total
- Release event: 1
- Watch event: 11
- Delete event: 1
- Push event: 4
- Pull request event: 2
- Create event: 1
Last Year
- Release event: 1
- Watch event: 11
- Delete event: 1
- Push event: 4
- Pull request event: 2
- Create event: 1
Issues and Pull Requests
Last synced: over 1 year ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- blutjens (1)