https://github.com/alan-turing-institute/ice-station-zebra

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.7%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: alan-turing-institute
License: mit
Language: Jupyter Notebook
Default Branch: main
Size: 6.79 MB

Statistics

Stars: 3
Watchers: 6
Forks: 0
Open Issues: 26
Releases: 0

Created over 1 year ago · Last pushed 10 months ago

Metadata Files

Readme Contributing License

Ice Station Zebra

A pipeline for predicting sea ice.

Setting up your environment

Tools

You will need to install the following tools if you want to develop this project:

uv

Creating your own configuration file

Create a file in config that is called <your chosen name here>.local.yaml. You will want this to inherit from base.yaml and then apply your own changes on top. For example, the following config will override the base_path option in base.yaml:

```yaml
defaults:
  - base

base_path: /local/path/to/my/data
```

You can then run this with, e.g.:

```bash
uv run zebra datasets create --config-name <your local config>.yaml
```

You can also use this config to override other options in the base.yaml file, as shown below:

```yaml
defaults:
  - base
  - override /model: encodeunetdecode # Use this format if you want to use a different config

# Override specific model parameters
model:
  processor:
    startoutchannels: 37 # Use this format to override specific model parameters in the named configs

base_path: /local/path/to/my/data
```

Alternatively, you can apply overrides to specific options at the command line like this:

```bash
uv run zebra datasets create ++base_path=/local/path/to/my/data
```

Note that persistence.yaml overrides the specific options in base.yaml needed to run the Persistence model.
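The layering described in this section (a local config applied on top of base.yaml, with nested keys overridden individually) can be illustrated with a minimal sketch. This is not how Hydra implements composition; it is a plain-dictionary stand-in that only mimics the "later values win, nested keys merge" behaviour. The `merge_config` helper, the contents of `base`, and the `kernel_size` key are all hypothetical.

```python
from copy import deepcopy


def merge_config(base: dict, override: dict) -> dict:
    """Return a copy of `base` with `override` layered on top (hypothetical helper)."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested sections are merged key by key rather than replaced wholesale.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical stand-in for base.yaml; the real defaults may differ.
base = {
    "base_path": "/default/data",
    "model": {"processor": {"startoutchannels": 64, "kernel_size": 3}},
}
# Mirrors the overrides from the local config in this section.
local = {
    "base_path": "/local/path/to/my/data",
    "model": {"processor": {"startoutchannels": 37}},
}

config = merge_config(base, local)
print(config["base_path"])
print(config["model"]["processor"])
```

Only the keys named in the local config change; every other option keeps its base value, which is why a local file can stay short.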

Running on Baskerville

As uv cannot easily be installed on Baskerville, you should install the zebra package directly into a virtual environment that you have set up.

```bash
source /path/to/venv/activate.sh
pip install -e .
```

This means that later commands like uv run X ... should simply be X ... instead.

Running Zebra commands

Create

You will need a CDS account to download data with anemoi.

Run uv run zebra datasets create to download all datasets locally.

Inspect

Run uv run zebra datasets inspect to inspect all datasets available locally.

Train

Run uv run zebra train to train using the datasets specified in the config.

Note: This will save checkpoints to `${BASEDIR}/training/wandb/run-${DATE}$-${RANDOMSTRING}/checkpoints/${CHECKPOINTNAME}$.ckpt`.
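Since each training run writes into its own run directory, it can be handy to list the checkpoints that exist before evaluating. The following is a minimal sketch that assumes the directory layout given in the note above; `find_checkpoints` is a hypothetical helper, not part of the zebra package, and the run and file names in the demonstration are made up.

```python
import tempfile
from pathlib import Path


def find_checkpoints(base_dir: str) -> list[Path]:
    """List checkpoint files under `base_dir`, most recently modified first."""
    # Assumed layout: <base_dir>/training/wandb/run-*/checkpoints/*.ckpt
    pattern = "training/wandb/run-*/checkpoints/*.ckpt"
    return sorted(
        Path(base_dir).glob(pattern),
        key=lambda path: path.stat().st_mtime,
        reverse=True,
    )


# Demonstration against a throwaway directory with a made-up run name.
with tempfile.TemporaryDirectory() as tmp:
    checkpoint_dir = Path(tmp) / "training" / "wandb" / "run-20250101-abc123" / "checkpoints"
    checkpoint_dir.mkdir(parents=True)
    (checkpoint_dir / "last.ckpt").touch()
    (checkpoint_dir / "notes.txt").touch()  # ignored: not a .ckpt file
    found_names = [path.name for path in find_checkpoints(tmp)]

print(found_names)
```

Any path returned this way could then be passed as the `--checkpoint` argument when evaluating.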

Evaluate

Run uv run zebra evaluate --checkpoint PATH_TO_A_CHECKPOINT to evaluate using a checkpoint from a training run.

Adding a new model

Background

An ice-station-zebra model needs to be able to run over multiple different datasets with different dimensions. These are structured in NTCHW format, where:

- N is the batch size
- T is the number of history (forecast) steps for inputs (outputs)
- C is the number of channels or variables
- H is a height dimension
- W is a width dimension

N and T will be the same for all inputs, but C, H and W might vary.

For example, with a batch size of 2 (N=2), 3 history steps and 4 forecast steps, there will be k inputs of shape (2, 3, C_k, H_k, W_k) and one output of shape (2, 4, C_out, H_out, W_out).
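The shape bookkeeping in this example can be written out as a short sketch. The dataset names and their channel, height and width values are invented for illustration, and `DatasetDims` and `ntchw_shape` are hypothetical helpers rather than anything provided by the zebra package.

```python
from typing import NamedTuple


class DatasetDims(NamedTuple):
    """Per-dataset dimensions that are allowed to differ (hypothetical helper)."""

    channels: int
    height: int
    width: int


def ntchw_shape(batch_size: int, n_steps: int, dims: DatasetDims) -> tuple[int, ...]:
    """Build an NTCHW shape tuple for one dataset."""
    return (batch_size, n_steps, dims.channels, dims.height, dims.width)


# Illustrative values only: two input datasets (k=2) and one output.
inputs = {
    "weather": DatasetDims(channels=9, height=128, width=128),
    "sea_ice": DatasetDims(channels=1, height=432, width=432),
}
output = DatasetDims(channels=1, height=432, width=432)

N, T_HISTORY, T_FORECAST = 2, 3, 4

# N and T are shared by every input; C, H and W come from each dataset.
input_shapes = {name: ntchw_shape(N, T_HISTORY, dims) for name, dims in inputs.items()}
output_shape = ntchw_shape(N, T_FORECAST, output)

print(input_shapes)
print(output_shape)
```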

Standalone models

A standalone model will need to accept a dict[str, TensorNTCHW] which maps dataset names to an NTCHW Tensor of values. The model might want to use one or more of these for training, and will need to produce an output with shape N, T, C_out, H_out, W_out.

As can be seen in the example below, a separate instance of the model is likely to be needed for each output to be predicted.

Pros:

- all input variables are available without transformation

Cons:

- hard to add new inputs
- hard to add new outputs
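To make the standalone interface concrete, here is a minimal sketch of a persistence-style baseline: it picks one dataset from the input mapping and repeats its last history step for every forecast step. Nested lists stand in for tensors so that the sketch runs with the standard library alone; a real model would operate on tensors, and the project's own Persistence model may be implemented differently. All function and dataset names here are hypothetical.

```python
def shape(values) -> tuple[int, ...]:
    """Return the shape of a rectangular nested list."""
    dims = []
    while isinstance(values, list):
        dims.append(len(values))
        values = values[0]
    return tuple(dims)


def make_batch(n: int, t: int, c: int, h: int, w: int) -> list:
    """Build an NTCHW nested list whose values equal the time-step index."""
    return [
        [[[[float(step)] * w for _ in range(h)] for _ in range(c)] for step in range(t)]
        for _ in range(n)
    ]


def persistence_forecast(inputs: dict[str, list], target: str, n_forecast_steps: int) -> list:
    """Repeat the last history step of the `target` dataset for each forecast step."""
    batch = inputs[target]  # NTCHW; the other datasets are ignored by this baseline
    return [[sample[-1] for _ in range(n_forecast_steps)] for sample in batch]


# Two input datasets with different C, H and W but the same N and T.
inputs = {
    "sea_ice": make_batch(n=2, t=3, c=1, h=2, w=2),
    "weather": make_batch(n=2, t=3, c=5, h=4, w=4),
}
forecast = persistence_forecast(inputs, target="sea_ice", n_forecast_steps=4)

print(shape(inputs["sea_ice"]))
print(shape(forecast))
```

The model is tied to one named output, which reflects the trade-off listed above: predicting a second output would need a second instance configured with a different target.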

Processor models

A processor model is part of a larger encode-process-decode step. Start by defining a latent space as (C_latent, H_latent, W_latent) - in the example below, this has been set to (10, 64, 64). The encode-process-decode model automatically creates one encoder for each input and one decoder for each output. The dataset-specific encoder takes the input data and converts it to shape (N, C_latent, H_latent, W_latent), compressing the time and channels dimensions. The k encoded datasets can then be combined in latent space to give a single dataset of shape (N, k * C_latent, H_latent, W_latent).

This is then passed to the processor, which must accept input of shape (N, k * C_latent, H_latent, W_latent) and produce output of the same shape.

This output is then passed to one or more output-specific decoders which take input of shape (N, k * C_latent, H_latent, W_latent) and produce output of shape (N, T, C_out, H_out, W_out), regenerating the time dimension.

Pros:

- easy to add new inputs
- easy to add new outputs

Cons:

- input variables have been transformed into latent space
- time-step information has been compressed into the latent space
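The shapes at each stage of the encode-process-decode description can be traced with a small sketch. It only tracks shape tuples and performs no computation; `trace_shapes` is a hypothetical helper, and the dataset names and dimensions are invented. The latent space of (10, 64, 64) matches the value quoted in the text.

```python
def trace_shapes(
    input_shapes: dict[str, tuple[int, int, int, int, int]],
    latent: tuple[int, int, int],
    output_dims: tuple[int, int, int],
    n_forecast_steps: int,
) -> dict[str, object]:
    """Trace tensor shapes through encode, combine, process and decode."""
    batch_sizes = {shape[0] for shape in input_shapes.values()}
    if len(batch_sizes) != 1:
        raise ValueError("All inputs must share the same batch size N")
    n = batch_sizes.pop()
    k = len(input_shapes)
    c_latent, h_latent, w_latent = latent
    c_out, h_out, w_out = output_dims

    # Each encoder compresses time and channels into (N, C_latent, H_latent, W_latent).
    encoded = {name: (n, c_latent, h_latent, w_latent) for name in input_shapes}
    # The k encoded datasets are combined along the channel dimension.
    combined = (n, k * c_latent, h_latent, w_latent)
    # The processor must return the same shape that it receives.
    processed = combined
    # Each decoder regenerates the time dimension for its output.
    decoded = (n, n_forecast_steps, c_out, h_out, w_out)
    return {"encoded": encoded, "combined": combined, "processed": processed, "decoded": decoded}


# Illustrative inputs: two datasets, batch size 2, 3 history steps.
shapes = trace_shapes(
    input_shapes={"weather": (2, 3, 9, 128, 128), "sea_ice": (2, 3, 1, 432, 432)},
    latent=(10, 64, 64),
    output_dims=(1, 432, 432),
    n_forecast_steps=4,
)
for stage, value in shapes.items():
    print(stage, value)
```

Adding a third input dataset would change only `k`, and therefore the channel count seen by the processor, which is why new inputs and outputs are easy to add in this design.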

Owner

Name: The Alan Turing Institute
Login: alan-turing-institute
Kind: organization
Email: info@turing.ac.uk

Website: https://turing.ac.uk
Repositories: 477
Profile: https://github.com/alan-turing-institute

The UK's national institute for data science and artificial intelligence.

GitHub Events

Total

Create event: 32
Issues event: 51
Watch event: 5
Delete event: 19
Issue comment event: 68
Member event: 2
Push event: 182
Pull request review comment event: 83
Pull request review event: 108
Pull request event: 44

Last Year

Create event: 32
Issues event: 51
Watch event: 5
Delete event: 19
Issue comment event: 68
Member event: 2
Push event: 182
Pull request review comment event: 83
Pull request review event: 108
Pull request event: 44

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 36
Total pull requests: 28
Average time to close issues: 7 days
Average time to close pull requests: 3 days
Total issue authors: 7
Total pull request authors: 7
Average comments per issue: 0.31
Average comments per pull request: 1.11
Merged pull requests: 14
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 36
Pull requests: 28
Average time to close issues: 7 days
Average time to close pull requests: 3 days
Issue authors: 7
Pull request authors: 7
Average comments per issue: 0.31
Average comments per pull request: 1.11
Merged pull requests: 14
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

jemrobinson (13)
marianovitasari20 (7)
aranas (6)
louisavz (4)
IFenton (3)
LydiaFrance (2)
npedrazzini (1)

Pull Request Authors

jemrobinson (10)
marianovitasari20 (8)
npedrazzini (3)
IFenton (3)
aranas (2)
erinuclkwon (1)
LydiaFrance (1)

Top Labels

Issue Labels

P1 (8)
bug (1)

Pull Request Labels

Dependencies

pyproject.toml pypi

hydra-core >=1.3.2

uv.lock pypi

antlr4-python3-runtime 4.9.3
hydra-core 1.3.2
ice-station-zebra 0.1.0
omegaconf 2.3.0
packaging 24.2
pyyaml 6.0.2

notebook/environment.yml conda

icenet
ipykernel
jupyterlab
lightning
netcdf4 <1.6.1
notebook
numpy
pandas
python 3.11.*
seaborn
tensorflow
torch
torchaudio
torchmetrics
torchvision