climatereconstructionai

Software to train/evaluate models to reconstruct missing values in climate data (e.g., HadCRUT4) based on a U-Net with partial convolutions

https://github.com/freva-clint/climatereconstructionai

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 2 of 13 committers (15.4%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (10.2%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: FREVA-CLINT
License: bsd-3-clause
Language: Python
Default Branch: main
Homepage:
Size: 382 MB

Statistics

Stars: 87
Watchers: 8
Forks: 27
Open Issues: 1
Releases: 7

Created about 6 years ago · Last pushed 11 months ago

Metadata Files

Readme, License

CRAI (Climate Reconstruction AI)

Software to train/evaluate models to reconstruct missing values in climate data (e.g., HadCRUT4) based on a U-Net with partial convolutions.
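
As a rough illustration of what a partial convolution does, the sketch below applies a single-channel, stride-1 kernel to valid cells only, rescales each result by the fraction of valid cells in the window, and marks an output cell as valid when at least one input cell under the kernel was valid. It follows the general idea described by Liu et al. and is not taken from the CRAI code; the function name `partial_conv2d` and the NumPy implementation are assumptions made for illustration.

```python
import numpy as np


def partial_conv2d(x, mask, weight):
    """Single-channel partial convolution (stride 1, no padding, no bias).

    x, mask: 2D arrays of the same shape; mask is 1 for valid, 0 for missing.
    weight: 2D kernel.
    Returns the convolved output and the updated mask.
    """
    kh, kw = weight.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    new_mask = np.zeros((out_h, out_w))
    # Missing values may be stored as NaN; zero them so they cannot leak through.
    x = np.where(mask > 0, x, 0.0)
    for i in range(out_h):
        for j in range(out_w):
            m = mask[i:i + kh, j:j + kw]
            n_valid = m.sum()
            if n_valid > 0:
                window = x[i:i + kh, j:j + kw]
                # Rescale by (window size / number of valid cells).
                out[i, j] = (weight * window * m).sum() * (m.size / n_valid)
                new_mask[i, j] = 1.0
    return out, new_mask
```

With a 3x3 averaging kernel, a window that contains one missing cell still returns the mean of the eight valid cells, and a window with no valid cells returns 0 and stays masked.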

Dependencies

pytorch>=1.11.0
tqdm>=4.64.0
torchvision>=0.12.0
torchmetrics>=0.11.2
numpy>=1.21.6
matplotlib>=3.5.1
tensorboardX>=2.5
tensorboard>=2.9.0
xarray>=2022.3.0
dask>=2022.7.0
netcdf4>=1.5.8
setuptools==59.5.0
xesmf>=0.6.2
cartopy>=0.20.2
numba>=0.55.1

An Anaconda environment with all the required dependencies can be created using environment.yml:

    conda env create -f environment.yml

To activate the environment, use:

    conda activate crai

environment-cuda.yml should be used when working with GPUs using CUDA.

Installation

climatereconstructionAI can be installed using pip in the current directory:

    pip install .

Usage

The software can be used to:
- train a model (training)
- infill climate datasets using a trained model (evaluation)

Input data

The directory containing the climate datasets should have the following sub-directories:
- data and val for training
- test for evaluation

The climate datasets should be in netCDF format and placed in the corresponding sub-directories.
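
Before launching a run, it can help to check that the expected sub-directories exist. The following is a minimal sketch using only the standard library; the helper name `missing_subdirs` and the mode labels are hypothetical and not part of CRAI.

```python
from pathlib import Path


def missing_subdirs(data_root_dir, mode):
    """Return the required sub-directories that are absent under data_root_dir.

    mode: "training" (needs data/ and val/) or "evaluation" (needs test/).
    """
    required = {"training": ("data", "val"), "evaluation": ("test",)}[mode]
    root = Path(data_root_dir)
    return [name for name in required if not (root / name).is_dir()]
```

An empty list means the layout matches the description above for the chosen mode.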

The missing values can be defined separately as masks containing zeros (for the missing values) and ones (for the valid values). These masks should be in netCDF format and have the same dimensions as the climate dataset. For training, the sequence of masks can be shuffled with the --shuffle-masks option.
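
The mask convention can be sketched with NumPy as follows, assuming the missing values in the climate field are stored as NaN. The helper name `mask_from_missing` and the sample values are hypothetical; writing the resulting array to a netCDF file is left out.

```python
import numpy as np


def mask_from_missing(field):
    """Return a mask with the same shape as field: 0 where NaN, 1 elsewhere."""
    return np.where(np.isnan(field), 0.0, 1.0)


# Hypothetical field with dimensions (time, lat, lon) and two missing cells.
field = np.array([[[280.1, np.nan], [np.nan, 279.4]]])
mask = mask_from_missing(field)
```

The mask has the same shape as the field, which is what the description above requires.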

A PyTorch model is required for the evaluation.

Execution

Once installed, the package can be used as:
- a command line interface (CLI):
  - training: `crai-train`
  - evaluation: `crai-evaluate`
- a Python library:
  - training:

        from climatereconstructionai import train
        train()

  - evaluation:

        from climatereconstructionai import evaluate
        evaluate()

For more information about the arguments:

```bash
usage: crai-train [-h] [--data-root-dir DATA_ROOT_DIR] [--mask-dir MASK_DIR]
                  [--log-dir LOG_DIR] [--data-names DATA_NAMES]
                  [--mask-names MASK_NAMES] [--data-types DATA_TYPES]
                  [--n-target-data N_TARGET_DATA] [--device DEVICE]
                  [--shuffle-masks] [--channel-steps CHANNEL_STEPS]
                  [--lstm-steps LSTM_STEPS] [--gru-steps GRU_STEPS]
                  [--encoding-layers ENCODING_LAYERS]
                  [--pooling-layers POOLING_LAYERS]
                  [--conv-factor CONV_FACTOR] [--weights WEIGHTS]
                  [--steady-masks STEADY_MASKS]
                  [--loop-random-seed LOOP_RANDOM_SEED]
                  [--cuda-random-seed CUDA_RANDOM_SEED] [--deterministic]
                  [--attention]
                  [--channel-reduction-rate CHANNEL_REDUCTION_RATE]
                  [--disable-skip-layers] [--disable-first-bn] [--masked-bn]
                  [--lazy-load] [--global-padding] [--normalize-data]
                  [--n-filters N_FILTERS] [--out-channels OUT_CHANNELS]
                  [--dataset-name DATASET_NAME] [--min-bounds MIN_BOUNDS]
                  [--max-bounds MAX_BOUNDS] [--profile]
                  [--val-names VAL_NAMES] [--snapshot-dir SNAPSHOT_DIR]
                  [--resume-iter RESUME_ITER] [--batch-size BATCH_SIZE]
                  [--n-threads N_THREADS] [--multi-gpus] [--finetune]
                  [--lr LR] [--lr-finetune LR_FINETUNE] [--max-iter MAX_ITER]
                  [--log-interval LOG_INTERVAL]
                  [--lr-scheduler-patience LR_SCHEDULER_PATIENCE]
                  [--save-model-interval SAVE_MODEL_INTERVAL]
                  [--n-final-models N_FINAL_MODELS]
                  [--final-models-interval FINAL_MODELS_INTERVAL]
                  [--loss-criterion LOSS_CRITERION]
                  [--eval-timesteps EVAL_TIMESTEPS] [-f LOAD_FROM_FILE]
                  [--vlim VLIM] [--lambda-loss LAMBDA_LOSS]
                  [--val-metrics VAL_METRICS] [--tensor-plots TENSOR_PLOTS]
                  [--early-stopping-delta EARLY_STOPPING_DELTA]
                  [--early-stopping-patience EARLY_STOPPING_PATIENCE]
                  [--n-iters-val N_ITERS_VAL]

options:
  -h, --help
                        show this help message and exit
  --data-root-dir DATA_ROOT_DIR
                        Root directory containing the climate datasets
  --mask-dir MASK_DIR
                        Directory containing the mask datasets
  --log-dir LOG_DIR
                        Directory where the log files will be stored
  --data-names DATA_NAMES
                        Comma separated list of netCDF files (climate dataset) for training/infilling
  --mask-names MASK_NAMES
                        Comma separated list of netCDF files (mask dataset). If None, it extracts the masks from the climate dataset
  --data-types DATA_TYPES
                        Comma separated list of variable types, in the same order as data-names and mask-names
  --n-target-data N_TARGET_DATA
                        Number of data-names (from last) to be used as target data
  --device DEVICE
                        Device used by PyTorch (cuda or cpu)
  --shuffle-masks
                        Select mask indices randomly
  --channel-steps CHANNEL_STEPS
                        Comma separated number of considered sequences for channeled memory: past_steps,future_steps
  --lstm-steps LSTM_STEPS
                        Comma separated number of considered sequences for lstm: past_steps,future_steps
  --gru-steps GRU_STEPS
                        Comma separated number of considered sequences for gru: past_steps,future_steps
  --encoding-layers ENCODING_LAYERS
                        Number of encoding layers in the CNN
  --pooling-layers POOLING_LAYERS
                        Number of pooling layers in the CNN
  --conv-factor CONV_FACTOR
                        Number of channels in the deepest layer
  --weights WEIGHTS
                        Initialization weight
  --steady-masks STEADY_MASKS
                        Comma separated list of netCDF files containing a single mask to be applied to all timesteps. The number of steady-masks must be the same as out-channels
  --loop-random-seed LOOP_RANDOM_SEED
                        Random seed for iteration loop
  --cuda-random-seed CUDA_RANDOM_SEED
                        Random seed for CUDA
  --deterministic
                        Disable cudnn backends for reproducibility
  --attention
                        Enable the attention module
  --channel-reduction-rate CHANNEL_REDUCTION_RATE
                        Channel reduction rate for the attention module
  --disable-skip-layers
                        Disable the skip layers
  --disable-first-bn
                        Disable the batch normalization on the first layer
  --masked-bn
                        Use masked batch normalization instead of standard BN
  --lazy-load
                        Use lazy loading for large datasets
  --global-padding
                        Use a custom padding for global dataset
  --normalize-data
                        Normalize the input climate data to 0 mean and 1 std
  --n-filters N_FILTERS
                        Number of filters for the first/last layer
  --out-channels OUT_CHANNELS
                        Number of channels for the output data
  --dataset-name DATASET_NAME
                        Name of the dataset for format checking
  --min-bounds MIN_BOUNDS
                        Comma separated list of values defining the permitted lower-bound of output values
  --max-bounds MAX_BOUNDS
                        Comma separated list of values defining the permitted upper-bound of output values
  --profile
                        Profile code using tensorboard profiler
  --val-names VAL_NAMES
                        Comma separated list of netCDF files (climate dataset) for validation
  --snapshot-dir SNAPSHOT_DIR
                        Parent directory of the training checkpoints and the snapshot images
  --resume-iter RESUME_ITER
                        Iteration step from which the training will be resumed
  --batch-size BATCH_SIZE
                        Batch size
  --n-threads N_THREADS
                        Number of workers used in the data loader
  --multi-gpus
                        Use multiple GPUs, if any
  --finetune
                        Enable the fine tuning mode (use fine tuning parameterization and disable batch normalization
  --lr LR
                        Learning rate
  --lr-finetune LR_FINETUNE
                        Learning rate for fine tuning
  --max-iter MAX_ITER
                        Maximum number of iterations
  --log-interval LOG_INTERVAL
                        Iteration step interval at which a tensorboard summary log should be written
  --lr-scheduler-patience LR_SCHEDULER_PATIENCE
                        Patience for the lr scheduler
  --save-model-interval SAVE_MODEL_INTERVAL
                        Iteration step interval at which the model should be saved
  --n-final-models N_FINAL_MODELS
                        Number of final models to be saved
  --final-models-interval FINAL_MODELS_INTERVAL
                        Iteration step interval at which the final models should be saved
  --loss-criterion LOSS_CRITERION
                        Index defining the loss function (0=original from Liu et al., 1=MAE of the hole region)
  --eval-timesteps EVAL_TIMESTEPS
                        Sample indices for which a snapshot is created at each iter defined by log-interval
  -f LOAD_FROM_FILE, --load-from-file LOAD_FROM_FILE
                        Load all the arguments from a text file
  --vlim VLIM
                        Comma separated list of vmin,vmax values for the color scale of the snapshot images
  --lambda-loss LAMBDA_LOSS
                        Comma separated list of lambda factors (key) followed by their corresponding values. Overrides the loss_criterion pre-setting
  --val-metrics VAL_METRICS
                        Comma separated list of metrics that are evaluated on the val dataset at log-interval
  --tensor-plots TENSOR_PLOTS
                        Comma separated list of 2D plots to be added to tensorboard (error, distribution, correlation)
  --early-stopping-delta EARLY_STOPPING_DELTA
                        Mean relative delta of the val loss used for the termination criterion
  --early-stopping-patience EARLY_STOPPING_PATIENCE
                        Number of log-interval iterations used for the termination criterion
  --n-iters-val N_ITERS_VAL
                        Number of batch iterations used to average the validation loss
```
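
To make the `--normalize-data`, `--min-bounds` and `--max-bounds` descriptions more concrete, here is a minimal NumPy sketch of scaling data to zero mean and unit standard deviation and of restricting output values to a permitted range. How CRAI actually enforces the bounds is not stated in the help text; plain clipping is an assumption chosen here for simplicity, and both function names are hypothetical.

```python
import numpy as np


def normalize(data):
    """Scale data to zero mean and unit standard deviation, ignoring NaNs."""
    mean = np.nanmean(data)
    std = np.nanstd(data)
    return (data - mean) / std, mean, std


def denormalize_and_bound(output, mean, std, min_bound=None, max_bound=None):
    """Undo the normalization, then clip to the permitted range (assumed)."""
    values = output * std + mean
    if min_bound is not None or max_bound is not None:
        values = np.clip(values, min_bound, max_bound)
    return values
```

Keeping the mean and standard deviation returned by `normalize` allows model output to be mapped back to physical units before any bounds are applied.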

```bash
usage: crai-evaluate [-h] [--data-root-dir DATA_ROOT_DIR]
                     [--mask-dir MASK_DIR] [--log-dir LOG_DIR]
                     [--data-names DATA_NAMES] [--mask-names MASK_NAMES]
                     [--data-types DATA_TYPES]
                     [--n-target-data N_TARGET_DATA] [--device DEVICE]
                     [--shuffle-masks] [--channel-steps CHANNEL_STEPS]
                     [--lstm-steps LSTM_STEPS] [--gru-steps GRU_STEPS]
                     [--encoding-layers ENCODING_LAYERS]
                     [--pooling-layers POOLING_LAYERS]
                     [--conv-factor CONV_FACTOR] [--weights WEIGHTS]
                     [--steady-masks STEADY_MASKS]
                     [--loop-random-seed LOOP_RANDOM_SEED]
                     [--cuda-random-seed CUDA_RANDOM_SEED] [--deterministic]
                     [--attention]
                     [--channel-reduction-rate CHANNEL_REDUCTION_RATE]
                     [--disable-skip-layers] [--disable-first-bn]
                     [--masked-bn] [--lazy-load] [--global-padding]
                     [--normalize-data] [--n-filters N_FILTERS]
                     [--out-channels OUT_CHANNELS]
                     [--dataset-name DATASET_NAME] [--min-bounds MIN_BOUNDS]
                     [--max-bounds MAX_BOUNDS] [--profile]
                     [--model-dir MODEL_DIR] [--model-names MODEL_NAMES]
                     [--evaluation-dirs EVALUATION_DIRS]
                     [--eval-names EVAL_NAMES] [--use-train-stats]
                     [--create-graph] [--plot-results PLOT_RESULTS]
                     [--partitions PARTITIONS] [--maxmem MAXMEM]
                     [--split-outputs] [-f LOAD_FROM_FILE]

options:
  -h, --help
                        show this help message and exit
  --data-root-dir DATA_ROOT_DIR
                        Root directory containing the climate datasets
  --mask-dir MASK_DIR
                        Directory containing the mask datasets
  --log-dir LOG_DIR
                        Directory where the log files will be stored
  --data-names DATA_NAMES
                        Comma separated list of netCDF files (climate dataset) for training/infilling
  --mask-names MASK_NAMES
                        Comma separated list of netCDF files (mask dataset). If None, it extracts the masks from the climate dataset
  --data-types DATA_TYPES
                        Comma separated list of variable types, in the same order as data-names and mask-names
  --n-target-data N_TARGET_DATA
                        Number of data-names (from last) to be used as target data
  --device DEVICE
                        Device used by PyTorch (cuda or cpu)
  --shuffle-masks
                        Select mask indices randomly
  --channel-steps CHANNEL_STEPS
                        Comma separated number of considered sequences for channeled memory: past_steps,future_steps
  --lstm-steps LSTM_STEPS
                        Comma separated number of considered sequences for lstm: past_steps,future_steps
  --gru-steps GRU_STEPS
                        Comma separated number of considered sequences for gru: past_steps,future_steps
  --encoding-layers ENCODING_LAYERS
                        Number of encoding layers in the CNN
  --pooling-layers POOLING_LAYERS
                        Number of pooling layers in the CNN
  --conv-factor CONV_FACTOR
                        Number of channels in the deepest layer
  --weights WEIGHTS
                        Initialization weight
  --steady-masks STEADY_MASKS
                        Comma separated list of netCDF files containing a single mask to be applied to all timesteps. The number of steady-masks must be the same as out-channels
  --loop-random-seed LOOP_RANDOM_SEED
                        Random seed for iteration loop
  --cuda-random-seed CUDA_RANDOM_SEED
                        Random seed for CUDA
  --deterministic
                        Disable cudnn backends for reproducibility
  --attention
                        Enable the attention module
  --channel-reduction-rate CHANNEL_REDUCTION_RATE
                        Channel reduction rate for the attention module
  --disable-skip-layers
                        Disable the skip layers
  --disable-first-bn
                        Disable the batch normalization on the first layer
  --masked-bn
                        Use masked batch normalization instead of standard BN
  --lazy-load
                        Use lazy loading for large datasets
  --global-padding
                        Use a custom padding for global dataset
  --normalize-data
                        Normalize the input climate data to 0 mean and 1 std
  --n-filters N_FILTERS
                        Number of filters for the first/last layer
  --out-channels OUT_CHANNELS
                        Number of channels for the output data
  --dataset-name DATASET_NAME
                        Name of the dataset for format checking
  --min-bounds MIN_BOUNDS
                        Comma separated list of values defining the permitted lower-bound of output values
  --max-bounds MAX_BOUNDS
                        Comma separated list of values defining the permitted upper-bound of output values
  --profile
                        Profile code using tensorboard profiler
  --model-dir MODEL_DIR
                        Directory of the trained models
  --model-names MODEL_NAMES
                        Model names
  --evaluation-dirs EVALUATION_DIRS
                        Directory where the output files will be stored
  --eval-names EVAL_NAMES
                        Prefix used for the output filenames
  --use-train-stats
                        Use mean and std from training data for normalization
  --create-graph
                        Create a Tensorboard graph of the NN
  --plot-results PLOT_RESULTS
                        Create plot images of the results for the comma separated list of time indices
  --partitions PARTITIONS
                        Split the climate dataset into several partitions along the time coordinate
  --maxmem MAXMEM
                        Maximum available memory in MB (overwrite partitions parameter)
  --split-outputs
                        Do not merge the outputs when using multiple models and/or partitions
  -f LOAD_FROM_FILE, --load-from-file LOAD_FROM_FILE
                        Load all the arguments from a text file
```
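
When driving the CLI from a script, the long option lists above can be assembled programmatically. The sketch below builds an argument list from keyword arguments and only prints the resulting command; the helper `build_command`, the paths, the file names and the numeric values are all hypothetical, while the flags themselves are taken from the help text.

```python
import shlex


def build_command(program, **options):
    """Build an argument list for crai-train or crai-evaluate.

    Keyword names use underscores (data_root_dir -> --data-root-dir).
    True adds a bare flag, False or None skips the option, and lists are
    joined with commas, as the help text asks for comma separated values.
    """
    command = [program]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is None or value is False:
            continue
        if value is True:
            command.append(flag)
        elif isinstance(value, (list, tuple)):
            command += [flag, ",".join(str(v) for v in value)]
        else:
            command += [flag, str(value)]
    return command


command = build_command(
    "crai-train",
    data_root_dir="/path/to/datasets",  # hypothetical path
    data_names=["tas_train.nc"],        # hypothetical file name
    mask_dir="/path/to/masks",          # hypothetical path
    mask_names=["tas_mask.nc"],         # hypothetical file name
    device="cpu",
    shuffle_masks=True,
    max_iter=1000,                      # hypothetical value
)
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the package is installed and the paths point to real data.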

Example

An example can be found in the directory demo. The instructions to run the example are given in the README.md file.

License

CRAI is licensed under the terms of the BSD 3-Clause license.

Contributions

CRAI is maintained by the Climate Informatics and Technology group at DKRZ (Deutsches Klimarechenzentrum).
- Previous contributing authors: Naoto Inoue, Christopher Kadow, Stephan Seitz
- Current contributing authors: Johannes Meuer, Maximilian Witte, Étienne Plésiat

Owner

Name: Climate Informatics and Technologies (CLINT)
Login: FREVA-CLINT
Kind: organization
Location: Germany

Repositories: 7
Profile: https://github.com/FREVA-CLINT

GitHub Events

Total

Issues event: 5
Watch event: 28
Issue comment event: 8
Push event: 25
Fork event: 7
Create event: 1

Last Year

Issues event: 5
Watch event: 28
Issue comment event: 8
Push event: 25
Fork event: 7
Create event: 1

Committers

All Time

Total Commits: 1,056
Total Committers: 13
Avg Commits per committer: 81.231
Development Distribution Score (DDS): 0.241

Past Year

Commits: 1
Committers: 1
Avg Commits per committer: 1.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Johannes Meuer	j**r@g**m	802
Étienne Plésiat	p**t@d**e	196
Naoto Inoue	k**4@g**m	20
Naoto Inoue	i**e@h**p	17
Maximilian Witte	m**e@g**e	10
Johannes Meuer	k**3@m**e	3
Stephan Seitz	s**z@f**e	2
Christopher Kadow	c**w@m**e	1
Maximilian Witte	w**e@d**e	1
Maximilian Witte	k**4@l**e	1
Johannes Meuer	k**3@m**e	1
Etienne Plésiat	k**9@l**l	1
Christopher Kadow	b**1@m**e	1

Committer Domains (Top 20 + Academic)

dkrz.de: 2
miklip5.hpc.dkrz.de: 1
miklip3.hpc.dkrz.de: 1
levante1.lvt.dkrz.de: 1
met.fu-berlin.de: 1
fau.de: 1
miklip4.hpc.dkrz.de: 1
gmx.de: 1
hal.t.u-tokyo.ac.jp: 1

Issues and Pull Requests

All Time

Total issues: 4
Total pull requests: 27
Average time to close issues: about 2 months
Average time to close pull requests: about 1 month
Total issue authors: 4
Total pull request authors: 4
Average comments per issue: 2.75
Average comments per pull request: 0.26
Merged pull requests: 18
Bot issues: 0
Bot pull requests: 6

Past Year

Issues: 2
Pull requests: 0
Average time to close issues: about 1 month
Average time to close pull requests: N/A
Issue authors: 2
Pull request authors: 0
Average comments per issue: 3.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

zihoosoon (1)
clhinrichs (1)
zazass8 (1)
Lzy-song (1)

Pull Request Authors

eplesiat (13)
johannesmeuer (8)
dependabot[bot] (6)
faxmitte (3)

Top Labels

Issue Labels

stale (1)

Pull Request Labels

dependencies (6)

Packages

Total packages: 2
Total downloads: unknown

Total dependent packages: 0
(may contain duplicates)
Total dependent repositories: 0
(may contain duplicates)
Total versions: 10

proxy.golang.org: github.com/FREVA-CLINT/climatereconstructionAI

Documentation: https://pkg.go.dev/github.com/FREVA-CLINT/climatereconstructionAI#section-documentation
License: bsd-3-clause
Latest release: v1.0.4
published almost 2 years ago

Versions: 5
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 5.4%

Average: 5.6%

Dependent repos count: 5.8%

Last synced: 10 months ago

proxy.golang.org: github.com/freva-clint/climatereconstructionai

Documentation: https://pkg.go.dev/github.com/freva-clint/climatereconstructionai#section-documentation
License: bsd-3-clause
Latest release: v1.0.4
published almost 2 years ago

Versions: 5
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 5.4%

Average: 5.6%

Dependent repos count: 5.8%

Last synced: 10 months ago

Dependencies

pyproject.toml pypi

cartopy >=0.20.2
matplotlib >= 3.4.3
netcdf4 >=1.5.8
numba >=0.55.1
numpy >= 1.20.1
python >= 3.7
setuptools ==59.5.0
tensorboard >=2.8.0
tensorboardX >= 2.4.0
torch >= 1.8.0
torchvision >= 0.2.1
tqdm >= 4.59.0
xarray >= 0.20.2
xesmf >=0.6.2

requirements.txt pypi

cartopy >=0.20.2
matplotlib >=3.4.3
netcdf4 >=1.5.8
numba >=0.55.1
numpy >=1.20.1
pytorch >=1.8.0
setuptools ==59.5.0
tensorboard >=2.8.0
tensorboardX >=2.4.0
torchvision >=0.2.1
tqdm >=4.59.0
xarray >=0.20.2
xesmf >=0.6.2

.github/workflows/main.yml actions

actions/checkout v2 composite
s-weigand/setup-conda v1 composite

setup.py pypi

environment.yml pypi