icenet-pipeline
The icenet-pipeline repository illustrates operational execution of the IceNet model
Pipelining tools for operational execution of the IceNet model
Overview
This repository provides CLI commands that run the IceNet model end-to-end, so that you can make daily sea ice predictions.
Get the repositories
Please note this repository is tagged to corresponding icenet versions: if
you want to get a particular tag add --branch vM.M.R to the clone command.
```bash
git clone git@github.com:icenet-ai/icenet-pipeline.git green
ln -s green pipeline
```
Creating the environment
Even with the latest conda, the following may not work, due to ongoing issues with the solver not failing or logging clearly.
Using conda
Conda can be used to manage system dependencies for HPC usage. We've tested on the BAS and JASMIN (NERC) HPCs. Your conda dependencies will depend on what is already on your system, so please treat this as illustrative.
```bash
cd pipeline
conda env create -n icenet -f environment.yml
conda activate icenet

### Environment specifics
# BAS HPC just continue
# For JASMIN you'll be missing some things
module load jaspy/3.8
conda install -c conda-forge geos proj
# For your own HPC, who knows... the HPC specific instructions are very
# changeable even for those tested, so please adapt as required. ;)

### Additional linkage instructions for Tensorflow GPU usage BEFORE ICENET
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/envvars.sh
chmod +x $CONDA_PREFIX/etc/conda/activate.d/envvars.sh
. $CONDA_PREFIX/etc/conda/activate.d/envvars.sh
```
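The activation hook above appends to the library path every time it is sourced, so repeated activations can add the same entry more than once. The following is a minimal sketch of an idempotent variant, not something shipped with the pipeline. It uses a temporary directory as a stand-in for the conda prefix so that it runs without conda installed; run it in a throwaway shell, since it overwrites `CONDA_PREFIX`.

```shell
#!/usr/bin/env bash
# Sketch only: a temporary directory stands in for the real conda prefix.
CONDA_PREFIX="$(mktemp -d)"
mkdir -p "$CONDA_PREFIX/etc/conda/activate.d"
hook="$CONDA_PREFIX/etc/conda/activate.d/envvars.sh"

# The quoted heredoc keeps the variables unexpanded until the hook is sourced.
cat > "$hook" <<'EOF'
case ":${LD_LIBRARY_PATH:-}:" in
    *":$CONDA_PREFIX/lib/:"*) ;;  # already present, nothing to do
    *) export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$CONDA_PREFIX/lib/" ;;
esac
EOF

. "$hook"
. "$hook"   # sourcing a second time must not add a second entry
echo "$LD_LIBRARY_PATH"
```

The `case` test wraps the path in colons so that only a whole entry matches, not a prefix of a longer one.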
IceNet installation
Then install IceNet into your environment as applicable. If using conda obviously enable the environment first.
Bear in mind when installing icenet (and by dependency tensorflow) you
will need to be on a CUDA/GPU enabled machine for binary linkage. As per
current (end of 2022) tensorflow guidance do not install it via conda.
Developer installation
Using -e is optional, based on whether you want to be able to hack at the
source!
```bash
cd ../icenet   # or wherever you've cloned icenet
pip install -e .
```
PyPI installation
If you don't want the source locally, you can now install via PyPI...
```bash
pip install icenet
```
Linking data folders
The system is set up to process data in certain directories. Several pipeline
installations can share the same source data, so use a symlink for
data where applicable. You might also want to keep the intermediate folders
processed and network_datasets on alternate storage.
In my normal setup, I run several pipelines each with one source data store:
```bash
# From inside the icenet-pipeline cloned directory, assuming target exists!
ln -s ../data
```
The following illustrates linking the larger folders to different storage:
```bash
# An example from deployment on JASMIN
ln -s /gws/nopw/j04/icenet/data
mkdir /gws/nopw/j04/icenet/network_datasets
mkdir /gws/nopw/j04/icenet/processed
ln -s /gws/nopw/j04/icenet/network_datasets
ln -s /gws/nopw/j04/icenet/processed
```
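To try this linking layout without touching real storage, here is a minimal sketch that builds the same arrangement under a temporary directory. The `storage` and `pipeline` paths are hypothetical stand-ins for your alternate storage and your pipeline checkout.

```shell
#!/usr/bin/env bash
# Sketch: stand-in directories under a temp dir replace the real storage paths.
root="$(mktemp -d)"
storage="$root/storage"      # hypothetical shared / alternate storage
pipeline="$root/pipeline"    # hypothetical pipeline checkout
mkdir -p "$storage/data" "$pipeline"

cd "$pipeline"
ln -s "$storage/data" data                 # share the source data store
for dir in network_datasets processed; do
    mkdir -p "$storage/$dir"               # -p: no error if it already exists
    ln -sfn "$storage/$dir" "$dir"         # -n: replace an existing link rather than follow it
done
ls -l
```

Using `mkdir -p` and `ln -sfn` makes the steps safe to repeat, which helps when several pipelines are set up against the same storage.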
Example run of the pipeline
A note on HPCs
The pipeline is often run on SLURM. The SBATCH headers for submission used to be included in the scripts, but they have been removed to avoid portability issues. The instructions now show how to run against this type of HPC with the setup passed on the command line rather than in hardcoded headers.
If you're not using SLURM, just run the commands without sbatch. To use an alternative just amend sbatch to whatever you need.
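One way to keep the same commands working on machines with and without SLURM is a small wrapper. This is a sketch under assumptions: `launch` is a hypothetical helper, not part of icenet-pipeline, and the demo script is a throwaway file created only so that the example runs on its own.

```shell
#!/usr/bin/env bash
# Hypothetical helper: submit via sbatch when it is installed, otherwise run in place.
launch() {
    if command -v sbatch >/dev/null 2>&1; then
        # SBATCH_ARGS is left unquoted on purpose so it splits into separate options.
        sbatch ${SBATCH_ARGS:-} "$@"
    else
        bash "$@"
    fi
}

# Throwaway script standing in for one of the pipeline's run scripts.
demo_script="$(mktemp)"
printf '#!/usr/bin/env bash\necho "hemisphere=$1"\n' > "$demo_script"

launch_output="$(launch "$demo_script" north)"
echo "$launch_output"
rm -f "$demo_script"
```

To target a different scheduler, swapping the `sbatch` call inside `launch` would be the only change needed.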
Configuration
This pipeline revolves around the ENVS file to provide the necessary
configuration items. This can easily be derived from the ENVS.example file
to a new file, then symbolically linked. Comments are available in
ENVS.example to assist you with the editing process.
```bash
cp ENVS.example ENVS.myconfig
ln -sf ENVS.myconfig ENVS
# Edit ENVS.myconfig to customise parameters for the pipeline
```
These variables will then be picked up during the runs via the ENVS symlink.
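The copy, edit, link and source cycle can be tried in isolation with the sketch below. The two-line `ENVS.example` is a stand-in written for this example, with hypothetical values. The real file has many more settings, and whether it uses `export` is an assumption here.

```shell
#!/usr/bin/env bash
# Sketch: a tiny stand-in for ENVS.example, created in a temp dir.
workdir="$(mktemp -d)"
cd "$workdir"
cat > ENVS.example <<'EOF'
export BATCH_SIZE=4
export WORKERS=8
EOF

cp ENVS.example ENVS.myconfig
# Stand-in for editing the file by hand.
sed -i.bak 's/^export BATCH_SIZE=.*/export BATCH_SIZE=2/' ENVS.myconfig
ln -sf ENVS.myconfig ENVS

# Scripts only ever read ENVS, so repointing the link switches configuration.
. ./ENVS
echo "ENVS -> $(readlink ENVS), BATCH_SIZE=$BATCH_SIZE, WORKERS=$WORKERS"
```

Keeping one file per configuration and switching the symlink means a previous setup is never overwritten when you try a new one.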
Running the training pipeline
Running prediction commands from preprepared models
This might be the best starting use case if you want to build intuition about the pipeline facilities using someone else's models!
The shell you're using should be bash
```bash
# Take a git clone of the pipeline
$ git clone git@github.com:icenet-ai/icenet-pipeline.git anewenv
$ cd anewenv
$ conda activate icenet

# We identify a pipeline we want to link to
$ ls -d /data/hpcdata/users/jambyr/icenet/pipeline
/data/hpcdata/users/jambyr/icenet/pipeline

# Copy the environment variable file that was used for training
$ cp -v /data/hpcdata/users/jambyr/icenet/pipeline/ENVS.bas.exp23 .
‘/data/hpcdata/users/jambyr/icenet/pipeline/ENVS.bas.exp23’ -> ‘./ENVS.bas.exp23’

# Repoint your ENVS to the training ENVS file you want to predict against
$ unlink ENVS
$ ln -sf ENVS.bas.exp23 ENVS
$ ls -l ENVS
lrwxrwxrwx 1 [[REDACTED]] [[REDACTED]] 9 Feb 10 11:48 ENVS -> ENVS.bas.exp23

# These can also be modified in ENVS
$ export ICENET_CONDA=$CONDA_PREFIX
$ export ICENET_HOME=$(realpath .)

# Links to my source data store
ln -s /data/hpcdata/users/jambyr/icenet/pipeline/data

# Ensures we have a data loader store directory for the pipeline
mkdir processed
# Links to the training data loader store from the other pipeline
ln -s /data/hpcdata/users/jambyr/icenet/pipeline/processed/exp23_south processed/

# Make sure the networks directory exists
mkdir -p results/networks
# Links to the network trained in the other pipeline
ln -s /data/hpcdata/users/jambyr/icenet/pipeline/results/networks/atmos23_south results/networks/
```
And now you can look at running prediction commands against somebody else's networks.
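Before running predictions it can help to confirm that every link resolves, because a dangling symlink only shows up later as a missing-file error. The sketch below assumes it is run from the pipeline directory and uses the paths from the walkthrough above. `check_links` is a hypothetical helper, not a pipeline command.

```shell
#!/usr/bin/env bash
# Sketch: report any linked path that does not resolve.
check_links() {
    local path rc=0
    for path in "$@"; do
        if [ -e "$path" ]; then          # -e follows symlinks, so dangling links fail
            echo "ok      $path"
        else
            echo "missing $path" >&2
            rc=1
        fi
    done
    return "$rc"
}

check_links ENVS data processed/exp23_south results/networks/atmos23_south \
    || echo "Fix the links above before running predictions" >&2
```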
One off: preparing SIC masks
As an additional dataset, IceNet relies on some masks being pre-prepared, so you only have to do this on first run against the data store.
```bash
conda activate icenet
icenet_data_masks north
icenet_data_masks south
```
Running training and prediction commands afresh
Change PREFIX to the setup you want to run through in ENVS
```bash
source ENVS

SBATCH_ARGS="$ICENET_SLURM_ARGS $ICENET_SLURM_DATA_PART"
sbatch $SBATCH_ARGS run_data.sh north $BATCH_SIZE $WORKERS

SBATCH_ARGS="$ICENET_SLURM_ARGS $ICENET_SLURM_RUN_PART" ./run_train_ensemble.sh \
    -b $BATCH_SIZE -e 200 -f $FILTER_FACTOR -p $PREP_SCRIPT -q 4 \
    ${TRAIN_DATA_NAME}_${HEMI} ${TRAIN_DATA_NAME}_${HEMI} mydemo_${HEMI}

./loader_test_dates.sh ${TRAIN_DATA_NAME}_north >test_dates.north.csv

./run_predict_ensemble.sh -f $FILTER_FACTOR -p $PREP_SCRIPT \
    mydemo_north forecast aforecast test_dates.north.csv
```
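Because these commands take their arguments from the sourced configuration, an empty variable silently becomes a missing argument. A pre-flight check can catch that before anything is submitted. This is a minimal sketch: `require_vars` is a hypothetical helper, and the values assigned below are stand-ins so that it runs without a real ENVS file.

```shell
#!/usr/bin/env bash
# Sketch: fail early when a setting the commands rely on is empty or unset.
require_vars() {
    local name rc=0
    for name in "$@"; do
        if [ -z "${!name:-}" ]; then     # ${!name} is bash indirect expansion
            echo "ENVS does not set $name" >&2
            rc=1
        fi
    done
    return "$rc"
}

# Stand-in values; normally `source ENVS` provides them.
BATCH_SIZE=4
WORKERS=8
require_vars BATCH_SIZE WORKERS && echo "settings look complete"
```

Extend the list passed to `require_vars` with whichever settings your own run scripts consume.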
Other helper commands
The following commands are illustrative of various workflows built on top of, or alongside, the workflow described above. These are useful to use independently or to base your own workflows on.
run_forecast_plots.sh
This leverages the IceNet plotting functionality to analyse the specified forecasts.
run_prediction.sh
This command wraps up the preparation of data and running of predictions against pre-trained networks. This contrasts with the use of the test set to run predictions, as demonstrated previously.
This command makes assumptions that source data is available for the OSI-SAF,
ERA5 and ORAS5 datasets for the predictions you want to make. Use
icenet_data_sic, icenet_data_era5 and icenet_data_oras5 respectively.
This workflow is also easily adapted to other datasets, wink wink nudge nudge.
If you haven't already installed it, install the model-ensembler package
which will work out the generation of ensemble models:
```bash
pip install model-ensembler
```
The process for running predictions is then basically:
```bash
# These lines are required if not set within the ENVS file
export DEMO_TEST_START="2021-10-01"
export DEMO_TEST_END="$DEMO_TEST_START"

./run_prediction.sh demo_test model_name hemi demo_test train_data_name

# Optionally, stick it into azure too, provided you're set up for it
icenet_upload_azure -v -o results/predict/demo_test.nc $DEMO_TEST_START
```
As an example, to generate a forecast from the atmos23_south trained model shown above (assuming you have already seeded your data store using the icenet_data_* commands):
```bash
export DEMO_TEST_START="2024-01-01"
export DEMO_TEST_END=$DEMO_TEST_START

./run_prediction.sh demo_forecast atmos23_south south demo_test
```
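The start and end dates exported above are plain strings, so a typo is not caught until the prediction run fails. The sketch below checks a forecast window before it is exported. `valid_window` is a hypothetical helper, and it only checks the shape and ordering of the dates, not whether they are real calendar days.

```shell
#!/usr/bin/env bash
# Sketch: validate a forecast window before exporting it.
valid_window() {
    local start="$1" end="$2" iso='^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
    [[ "$start" =~ $iso && "$end" =~ $iso ]] || return 1
    # ISO 8601 dates sort lexicographically, so a string comparison orders them.
    [[ ! "$end" < "$start" ]]
}

start="2024-01-01"
end="$start"          # a single-day forecast, as in the example above
if valid_window "$start" "$end"; then
    echo "window ok: $start to $end"
else
    echo "bad window: $start to $end" >&2
fi
```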
Implementing and changing environments
The point of having a repository like this is to facilitate easy integration with workflow managers, as well as to allow multiple pipelines to be co-located easily in the filesystem. To achieve this, have a location that contains your environments and sources, for example:
```
cd hpc/icenet
ls -d1 *
blue
data
green
pipeline
scratch
test

# pipeline -> green

# Optionally you might have local sources for installs (e.g. not pip installed)
icenet.blue
icenet.green
```
Change the location of the pipeline from green to blue
```bash
TARGET=blue

ln -sfn $TARGET pipeline

# If using a branch, go into icenet.blue and pull / checkout as required, e.g.
cd icenet.blue
git pull
git checkout my-feature-branch
cd ..

# Next update the conda environment, which will be specific to your local disk
ln -sfn $HOME/hpc/miniconda3/envs/icenet-$TARGET $HOME/hpc/miniconda3/envs/icenet
cd pipeline
git pull

# Update the environment
conda env update -n icenet -f environment.yml
conda activate icenet
pip install --upgrade -r requirements-pip.txt
pip install -e ../icenet.$TARGET
```
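The switch itself relies on `ln -sfn` replacing the existing link in place. The sketch below shows that behaviour with two empty stand-in directories under a temporary location, so it can be run anywhere without a real pipeline checkout.

```shell
#!/usr/bin/env bash
# Sketch: repoint a 'pipeline' symlink between two stand-in checkouts.
root="$(mktemp -d)"
mkdir "$root/blue" "$root/green"
cd "$root"

ln -s green pipeline
echo "before: pipeline -> $(readlink pipeline)"

TARGET=blue
# -n stops ln from following the existing link into green/ and creating green/blue there.
ln -sfn "$TARGET" pipeline
echo "after:  pipeline -> $(readlink pipeline)"
```

Without `-n`, the second `ln` would leave `pipeline` pointing at green and create a stray link inside it, which is an easy mistake to miss.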
Credits
- Tom Andersson - Lead researcher
- James Byrne - Research Software Engineer
- Scott Hosking - PI
License
The template_* files are not to be considered with respect to the icenet-pipeline repository; they're used in publishing forecasts!
Please see LICENSE file for license information!
Owner
- Name: icenet-ai
- Login: icenet-ai
- Kind: organization
- Repositories: 14
- Profile: https://github.com/icenet-ai
Citation (CITATION.cff)
cff-version: 1.2.0
title: icenet-pipeline
message: "If you use this software, please cite it as below."
type: software
authors:
  - given-names: James
    family-names: Byrne
    email: jambyr@bas.ac.uk
    affiliation: British Antarctic Survey
    orcid: "https://orcid.org/0000-0003-3731-2377"
  - given-names: Bryn Noel
    family-names: Ubald
    email: bryald@bas.ac.uk
    affiliation: British Antarctic Survey
    orcid: "https://orcid.org/0000-0002-0206-7140"
  - given-names: Ryan
    family-names: Chan
    email: rchan@turing.ac.uk
    affiliation: The Alan Turing Institute
repository-code: "https://github.com/icenet-ai/icenet-pipeline"
url: "https://icenet.ai/"
repository: "https://github.com/icenet-ai/icenet"
abstract: >-
  icenet-pipeline is a repository containing tools that
  enables operational execution of the IceNet probabilistic
  deep-learning library for sea-ice forecasting via a
  Command Line Interface (CLI). It is an end-to-end pipeline
  that enables the generation of forecast outputs.
keywords:
  - sea-ice
  - pipeline
  - forecast
  - machine learning
  - cryosphere
  - antarctic
  - arctic
  - ice
  - deep learning
license: MIT
version: "v0.2.9"