icenet-pipeline

The icenet-pipeline repository illustrates operational execution of the IceNet model

https://github.com/icenet-ai/icenet-pipeline

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.1%) to scientific vocabulary
Last synced: 6 months ago

Repository

The icenet-pipeline repository illustrates operational execution of the IceNet model

Basic Info
  • Host: GitHub
  • Owner: icenet-ai
  • License: mit
  • Language: Shell
  • Default Branch: main
  • Homepage:
  • Size: 3.37 MB
Statistics
  • Stars: 3
  • Watchers: 5
  • Forks: 5
  • Open Issues: 18
  • Releases: 0
Created over 4 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

icenet-pipeline

Pipelining tools for operational execution of the IceNet model

Overview

This repository provides CLI commands that let you run the IceNet model end-to-end, producing daily sea ice predictions.

Get the repositories

Please note this repository is tagged to correspond with icenet versions: if you want a particular tag, add --branch vM.M.R to the clone command.

```bash
git clone git@github.com:icenet-ai/icenet-pipeline.git green
ln -s green pipeline
```

Creating the environment

Even with the latest conda, the following may not work due to ongoing issues with the solver not failing or logging clearly.

Using conda

Conda can be used to manage system dependencies for HPC usage; we've tested on the BAS and JASMIN (NERC) HPCs. Your conda dependencies will change based on what is in your system, so please treat this as illustrative.

```bash
cd pipeline
conda env create -n icenet -f environment.yml
conda activate icenet
```

Environment specifics

BAS HPC: just continue.

For JASMIN you'll be missing some things:

```bash
module load jaspy/3.8
conda install -c conda-forge geos proj
```

For your own HPC, who knows... the HPC-specific instructions are very changeable even for those tested, so please adapt as required. ;)

Additional linkage instructions for Tensorflow GPU usage, BEFORE installing IceNet:

```bash
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
chmod +x $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
. $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
```

IceNet installation

Then install IceNet into your environment as applicable. If using conda, enable the environment first.

Bear in mind that when installing icenet (and, by dependency, tensorflow) you will need to be on a CUDA/GPU-enabled machine for binary linkage. As per current (end of 2022) tensorflow guidance, do not install it via conda.
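Before installing, it can help to sanity-check that the conda environment's lib/ directory made it onto the loader path. This snippet is our own addition, not part of the pipeline; the default values are assumptions so it runs standalone, whereas in a real session CONDA_PREFIX comes from `conda activate` and LD_LIBRARY_PATH from the activate.d hook above.

```shell
# Demo defaults (assumptions) so the snippet is self-contained; normally
# CONDA_PREFIX is set by `conda activate` and LD_LIBRARY_PATH by the
# activate.d hook shown earlier.
CONDA_PREFIX="${CONDA_PREFIX:-/opt/conda/envs/icenet}"
LD_LIBRARY_PATH="${LD_LIBRARY_PATH:-}:$CONDA_PREFIX/lib/"

# Check the lib/ dir appears as a distinct entry on the loader path.
case ":$LD_LIBRARY_PATH:" in
  *":$CONDA_PREFIX/lib/:"*) LINKAGE=ok ;;
  *) LINKAGE=missing ;;
esac
echo "tensorflow linkage check: $LINKAGE"
```

If this reports missing, re-source the activate.d script before running pip.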

Developer installation

Using -e is optional, based on whether you want to be able to hack at the source!

```bash
cd ../icenet    # or wherever you've cloned icenet
pip install -e .
```

PyPI installation

If you don't want the source locally, you can now install via PyPI...

```bash
pip install icenet
```

Linking data folders

The system is set up to process data in certain directories. Each pipeline installation can share the source data, so use symlinks for data if applicable; you may also want to store the intermediate folders processed and network_datasets on alternate storage.

In my normal setup, I run several pipelines each with one source data store:

```bash
# From inside the icenet-pipeline cloned directory, assuming the target exists!
ln -s ../data
```

The following illustrates linking larger items to different storage:

```bash
# An example from deployment on JASMIN
ln -s /gws/nopw/j04/icenet/data
mkdir /gws/nopw/j04/icenet/network_datasets
mkdir /gws/nopw/j04/icenet/processed
ln -s /gws/nopw/j04/icenet/network_datasets
ln -s /gws/nopw/j04/icenet/processed
```

Example run of the pipeline

A note on HPCs

The pipeline is often run on SLURM. Previously the SBATCH headers for submission were included, but to avoid portability issues these have been removed; the instructions now exemplify running against this type of HPC with the setup passed on the command line rather than hardcoded in headers.

If you're not using SLURM, just run the commands without sbatch. To use an alternative scheduler, amend sbatch to whatever you need.
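One way to make that switch mechanical (our sketch, not something the repository provides) is to resolve the submission command once and reuse it for every pipeline script:

```shell
# Resolve the submission command: sbatch when present, plain bash otherwise.
SUBMIT="$(command -v sbatch || echo bash)"
echo "submitting jobs with: $SUBMIT"

# Hypothetical usage: launch any pipeline shell script uniformly, e.g.
#   "$SUBMIT" some_pipeline_script.sh arg1 arg2
```

Swap in a different scheduler's submit command the same way if needed.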

Configuration

This pipeline revolves around the ENVS file to provide the necessary configuration items. Derive a new file from ENVS.example, then symbolically link it as ENVS. Comments in ENVS.example assist with the editing process.

```bash
cp ENVS.example ENVS.myconfig
ln -sf ENVS.myconfig ENVS
# Edit ENVS.myconfig to customise parameters for the pipeline
```

These variables will then be picked up during the runs via the ENVS symlink.
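To make that concrete, here's a minimal sketch of the mechanism (our illustration, not from the repository; the values are invented, though HEMI and BATCH_SIZE are variable names used later in this README):

```shell
# Write a tiny config file standing in for a real ENVS.myconfig.
cat > ENVS.myconfig <<'EOF'
export HEMI=north
export BATCH_SIZE=4
EOF

# Point the ENVS symlink at it, as the pipeline expects.
ln -sf ENVS.myconfig ENVS

# Any pipeline script can then pick the variables up by sourcing ENVS.
. ENVS
echo "$HEMI $BATCH_SIZE"   # prints: north 4
```

Switching configurations is then just repointing the symlink; the scripts themselves never change.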

Running the training pipeline

This is a very high-level overview; for a more detailed run-through, please review the icenet-notebooks repository.

Running prediction commands from pre-prepared models

This might be the best starting use case if you want to build intuition about the pipeline facilities using someone else's models!

The shell you're using should be bash.

```bash
# Take a git clone of the pipeline
$ git clone git@github.com:icenet-ai/icenet-pipeline.git a_new_env
$ cd a_new_env
$ conda activate icenet

# We identify a pipeline we want to link to
$ ls -d /data/hpcdata/users/jambyr/icenet/pipeline
/data/hpcdata/users/jambyr/icenet/pipeline

# Copy the environment variable file that was used for training
$ cp -v /data/hpcdata/users/jambyr/icenet/pipeline/ENVS.bas.exp23 .
‘/data/hpcdata/users/jambyr/icenet/pipeline/ENVS.bas.exp23’ -> ‘./ENVS.bas.exp23’

# Repoint your ENVS to the training ENVS file you want to predict against
$ unlink ENVS
$ ln -sf ENVS.bas.exp23 ENVS
$ ls -l ENVS
lrwxrwxrwx 1 [[REDACTED]] [[REDACTED]] 9 Feb 10 11:48 ENVS -> ENVS.bas.exp23

# These can also be modified in ENVS
$ export ICENET_CONDA=$CONDA_PREFIX
$ export ICENET_HOME=`realpath .`

# Links to my source data store
ln -s /data/hpcdata/users/jambyr/icenet/pipeline/data

# Ensures we have a data loader store directory for the pipeline
mkdir processed

# Links to the training data loader store from the other pipeline
ln -s /data/hpcdata/users/jambyr/icenet/pipeline/processed/exp23_south processed/

# Make sure the networks directory exists
mkdir -p results/networks

# Links to the network trained in the other pipeline
ln -s /data/hpcdata/users/jambyr/icenet/pipeline/results/networks/atmos23_south results/networks/
```

And now you can look at running prediction commands against somebody else's networks.

One off: preparing SIC masks

As an additional dataset, IceNet relies on some masks being pre-prepared; you only have to do this on the first run against a data store.

```bash
conda activate icenet
icenet_data_masks north
icenet_data_masks south
```

Running training and prediction commands afresh

Change PREFIX in ENVS to the setup you want to run through:

```bash
source ENVS

SBATCH_ARGS="$ICENET_SLURM_ARGS $ICENET_SLURM_DATA_PART"
sbatch $SBATCH_ARGS run_data.sh north $BATCH_SIZE $WORKERS

SBATCH_ARGS="$ICENET_SLURM_ARGS $ICENET_SLURM_RUN_PART"
./run_train_ensemble.sh \
    -b $BATCH_SIZE -e 200 -f $FILTER_FACTOR -p $PREP_SCRIPT -q 4 \
    ${TRAIN_DATA_NAME}_${HEMI} ${TRAIN_DATA_NAME}_${HEMI} my_demo_${HEMI}

./loader_test_dates.sh ${TRAIN_DATA_NAME}_north >test_dates.north.csv

./run_predict_ensemble.sh -f $FILTER_FACTOR -p $PREP_SCRIPT \
    my_demo_north forecast a_forecast test_dates.north.csv
```

Other helper commands

The following commands illustrate various workflows built on top of, or alongside, the workflow described above. They are useful independently or as bases for your own workflows.

run_forecast_plots.sh

This leverages the IceNet plotting functionality to analyse the specified forecasts.

run_prediction.sh

This command wraps up the preparation of data and running of predictions against pre-trained networks. This contrasts with the use of the test set to run predictions demonstrated previously.

This command assumes that source data is available from the OSI-SAF, ERA5 and ORAS5 datasets for the predictions you want to make. Use icenet_data_sic, icenet_data_era5 and icenet_data_oras5 respectively to fetch them. This workflow is also easily adapted to other datasets, wink wink, nudge nudge.

If you haven't already installed it, install the model-ensembler package, which will work out the generation of ensemble models:

```bash
pip install model-ensembler
```

The process for running predictions is then basically:

```bash
# These lines are required if not set within the ENVS file
export DEMO_TEST_START="2021-10-01"
export DEMO_TEST_END="$DEMO_TEST_START"

./run_prediction.sh demo_test model_name hemi demo_test train_data_name

# Optionally, stick it into azure too, provided you're set up for it
icenet_upload_azure -v -o results/predict/demo_test.nc $DEMO_TEST_START
```

As an example, to generate a prediction run based on the atmos23_south trained model shown above (assuming you have already seeded your data store using the icenet_data_* commands):

```bash
export DEMO_TEST_START="2024-01-01"
export DEMO_TEST_END=$DEMO_TEST_START

./run_prediction.sh demo_forecast atmos23_south south demo_test
```

Implementing and changing environments

The point of having a repository like this is to facilitate easy integration with workflow managers, as well as to allow multiple pipelines to be co-located in the filesystem. To achieve this, have a location that contains your environments and sources, for example:

```
cd hpc/icenet
ls -d1 *
blue
data
green
pipeline -> green
scratch
test

# Optionally you might have local sources for installs (e.g. not pip installed)
icenet.blue
icenet.green
```

Change the location of the pipeline from green to blue:

```bash
TARGET=blue
ln -sfn $TARGET pipeline

# If using a branch, go into icenet.blue and pull / checkout as required, e.g.
cd icenet.blue
git pull
git checkout my-feature-branch
cd ..

# Next update the conda environment, which will be specific to your local disk
ln -sfn $HOME/hpc/miniconda3/envs/icenet-$TARGET $HOME/hpc/miniconda3/envs/icenet
cd pipeline
git pull

# Update the environment
conda env update -n icenet -f environment.yml
conda activate icenet
pip install --upgrade -r requirements-pip.txt
pip install -e ../icenet.$TARGET
```

Credits

  • Tom Andersson - Lead researcher
  • James Byrne - Research Software Engineer
  • Scott Hosking - PI

License

The template_* files are not to be considered part of the icenet-pipeline repository; they're used in publishing forecasts!

Please see LICENSE file for license information!

Owner

  • Name: icenet-ai
  • Login: icenet-ai
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.2.0
title: icenet-pipeline
message: "If you use this software, please cite it as below."
type: software
authors:
  - given-names: James
    family-names: Byrne
    email: jambyr@bas.ac.uk
    affiliation: British Antarctic Survey
    orcid: "https://orcid.org/0000-0003-3731-2377"
  - given-names: Bryn Noel
    family-names: Ubald
    email: bryald@bas.ac.uk
    affiliation: British Antarctic Survey
    orcid: "https://orcid.org/0000-0002-0206-7140"
  - given-names: Ryan
    family-names: Chan
    email: rchan@turing.ac.uk
    affiliation: The Alan Turing Institute
repository-code: "https://github.com/icenet-ai/icenet-pipeline"
url: "https://icenet.ai/"
repository: "https://github.com/icenet-ai/icenet"
abstract: >-
  icenet-pipeline is a repository containing tools that
  enables operational execution of the IceNet probabilistic
  deep-learning library for sea-ice forecasting via a
  Command Line Interface (CLI). It is an end-to-end pipeline
  that enables the generation of forecast outputs.
keywords:
  - sea-ice
  - pipeline
  - forecast
  - machine learning
  - cryosphere
  - antarctic
  - arctic
  - ice
  - deep learning
license: MIT
version: "v0.2.9"

GitHub Events

Total
  • Issues event: 8
  • Issue comment event: 7
  • Push event: 7
  • Pull request event: 9
  • Create event: 1
Last Year
  • Issues event: 8
  • Issue comment event: 7
  • Push event: 7
  • Pull request event: 9
  • Create event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 5
  • Total pull requests: 5
  • Average time to close issues: 5 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 2
  • Total pull request authors: 2
  • Average comments per issue: 0.8
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 5
  • Average time to close issues: 21 days
  • Average time to close pull requests: about 2 months
  • Issue authors: 2
  • Pull request authors: 2
  • Average comments per issue: 0.75
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • bnubald (12)
  • JimCircadian (5)
  • thomaszwagerman (1)
  • matscorse (1)
Pull Request Authors
  • bnubald (13)
  • JimCircadian (4)
Top Labels
Issue Labels
enhancement (9) bug (6) documentation (1)
Pull Request Labels
enhancement (4) bug (3)

Dependencies

environment.yml pypi