https://github.com/alan-turing-institute/ice-station-zebra

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.7%) to scientific vocabulary
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: alan-turing-institute
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 6.79 MB
Statistics
  • Stars: 3
  • Watchers: 6
  • Forks: 0
  • Open Issues: 26
  • Releases: 0
Created over 1 year ago · Last pushed 10 months ago
Metadata Files
  • Readme
  • Contributing
  • License

README.md

Ice Station Zebra

A pipeline for predicting sea ice.

Setting up your environment

Tools

You will need to install the following tools if you want to develop this project:

Creating your own configuration file

Create a file in the `config` directory called `<your chosen name here>.local.yaml`. It should inherit from `base.yaml` and then apply your own changes on top. For example, the following config overrides the `base_path` option in `base.yaml`:

```yaml
defaults:
  - base

base_path: /local/path/to/my/data
```

You can then run this with, e.g.:

```bash
uv run zebra datasets create --config-name <your local config>.yaml
```

You can also use this config to override other options in `base.yaml`, as shown below:

```yaml
defaults:
  - base
  - override /model: encodeunetdecode # Use this format if you want to use a different config

# Override specific model parameters
model:
  processor:
    startoutchannels: 37 # Use this format to override specific model parameters in the named configs

base_path: /local/path/to/my/data
```
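As a rough mental model of how a local config layers on top of `base.yaml`, nested keys behave much like a recursive dictionary merge: only the keys you set are replaced, and sibling keys keep their base values. The sketch below is a hypothetical pure-Python illustration, not Hydra's actual composition logic (which also handles config groups, `override` entries and interpolation); `deep_merge`, `other_option` and the default value `32` are invented for the example.

```python
from copy import deepcopy
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where values from `override` replace those in `base`.

    Nested dicts are merged key by key rather than replaced wholesale,
    which is roughly how a local config layers on top of base.yaml.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical stand-ins for the contents of base.yaml and a local config.
base = {
    "base_path": "/default/data",
    "model": {"processor": {"startoutchannels": 32, "other_option": True}},
}
local = {
    "base_path": "/local/path/to/my/data",
    "model": {"processor": {"startoutchannels": 37}},
}

config = deep_merge(base, local)
print(config["base_path"])  # /local/path/to/my/data
print(config["model"]["processor"])  # {'startoutchannels': 37, 'other_option': True}
```

Note that `other_option` survives the merge: overriding one nested parameter does not discard the rest of the named config.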

Alternatively, you can apply overrides to specific options at the command line like this:

```bash
uv run zebra datasets create ++base_path=/local/path/to/my/data
```
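In Hydra's override syntax, the `++` prefix means "add the key if it is missing, otherwise override it", and dotted keys address nested options. The toy function below mimics that behaviour on a plain dict so the effect of a command-line override is easy to see. It is a hypothetical sketch: `apply_override` is not part of the project or of Hydra, it only handles `++` overrides, and it keeps every value as a string, whereas Hydra's grammar also parses numbers, lists and more. The nested key in the second call is illustrative only.

```python
from typing import Any


def apply_override(config: dict[str, Any], override: str) -> dict[str, Any]:
    """Apply a single `++dotted.key=value` override to a nested dict in place.

    Toy illustration only: the value is kept as a string.
    """
    if not override.startswith("++"):
        raise ValueError("this sketch only handles '++key=value' overrides")
    dotted_key, _, value = override[2:].partition("=")
    *parents, leaf = dotted_key.split(".")
    node = config
    for part in parents:
        # '++' means add-or-override, so create missing intermediate levels.
        node = node.setdefault(part, {})
    node[leaf] = value
    return config


config = {"base_path": "/default/data"}
apply_override(config, "++base_path=/local/path/to/my/data")
apply_override(config, "++model.processor.startoutchannels=37")  # hypothetical key path
print(config)
```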

Note that `persistence.yaml` overrides the options in `base.yaml` that are needed to run the Persistence model.

Running on Baskerville

As uv cannot easily be installed on Baskerville, you should install the zebra package directly into a virtual environment that you have set up.

```bash
source /path/to/venv/activate.sh
pip install -e .
```

This means that later commands like `uv run X ...` should simply be `X ...` instead.

Running Zebra commands

Create

You will need a Copernicus Climate Data Store (CDS) account to download data with anemoi.

Run `uv run zebra datasets create` to download all datasets locally.

Inspect

Run `uv run zebra datasets inspect` to inspect all datasets available locally.

Train

Run `uv run zebra train` to train a model using the datasets specified in the config.

ℹ️ This will save checkpoints to `${BASEDIR}/training/wandb/run-${DATE}$-${RANDOMSTRING}/checkpoints/${CHECKPOINTNAME}$.ckpt`.

Evaluate

Run `uv run zebra evaluate --checkpoint PATH_TO_A_CHECKPOINT` to evaluate a checkpoint from a training run.
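If you just want the most recent checkpoint, a small helper can search the `training/wandb/run-*/checkpoints/` layout given in the training note. This is a hypothetical sketch: `latest_checkpoint` is not part of the zebra CLI, it assumes checkpoints follow that path pattern under your `base_path`, and it picks the newest file by modification time.

```python
from __future__ import annotations

from pathlib import Path


def latest_checkpoint(base_dir: str | Path) -> Path | None:
    """Return the most recently modified .ckpt file under the wandb run layout.

    Assumes checkpoints live at
    <base_dir>/training/wandb/run-*/checkpoints/*.ckpt; returns None if
    nothing matches (including when base_dir does not exist).
    """
    candidates = Path(base_dir).glob("training/wandb/run-*/checkpoints/*.ckpt")
    # Pick the newest file by modification time; None if there are no matches.
    return max(candidates, key=lambda p: p.stat().st_mtime, default=None)


ckpt = latest_checkpoint("/local/path/to/my/data")
if ckpt is not None:
    print(f"uv run zebra evaluate --checkpoint {ckpt}")
```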

Adding a new model

Background

An ice-station-zebra model needs to be able to run over multiple different datasets with different dimensions. These are structured in NTCHW format, where:

- N is the batch size
- T is the number of history (forecast) steps for inputs (outputs)
- C is the number of channels or variables
- H is a height dimension
- W is a width dimension

N and T will be the same for all inputs, but C, H and W might vary.

Taking as an example a batch size of 2 (N=2), 3 history steps and 4 forecast steps, we will have k inputs of shape (2, 3, C_k, H_k, W_k) and one output of shape (2, 4, C_out, H_out, W_out).
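The shape rules above can be checked mechanically: every input must share N and T, while C, H and W are free to differ. The sketch below encodes the worked example (N=2, 3 history steps, 4 forecast steps) with plain shape tuples. The `NTCHW` tuple, `check_inputs`, the dataset names and their C/H/W values are all hypothetical, chosen only for illustration.

```python
from typing import NamedTuple


class NTCHW(NamedTuple):
    """Shape of a batch: (batch, time, channels, height, width)."""

    n: int
    t: int
    c: int
    h: int
    w: int


def check_inputs(shapes: dict[str, NTCHW]) -> tuple[int, int]:
    """Check that every input shares N and T; C, H and W may differ.

    Returns the shared (N, T) pair.
    """
    n_values = {s.n for s in shapes.values()}
    t_values = {s.t for s in shapes.values()}
    if len(n_values) != 1 or len(t_values) != 1:
        raise ValueError(f"inputs disagree on N or T: {shapes}")
    return n_values.pop(), t_values.pop()


# The worked example: N=2, 3 history steps, 4 forecast steps.
# Dataset names and their C/H/W values are hypothetical.
inputs = {
    "input_a": NTCHW(n=2, t=3, c=5, h=32, w=32),
    "input_b": NTCHW(n=2, t=3, c=1, h=432, w=432),
}
n, t_history = check_inputs(inputs)
output = NTCHW(n=n, t=4, c=1, h=432, w=432)  # T for the output is the forecast length
print(n, t_history, output)
```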

Standalone models

A standalone model will need to accept a `dict[str, TensorNTCHW]` which maps dataset names to an NTCHW tensor of values. The model might use one or more of these for training, and will need to produce an output of shape (N, T, C_out, H_out, W_out).

As can be seen in the example below, a separate instance of the model is likely to be needed for each output to be predicted.

[Image: standalone model example]
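To make the standalone interface concrete, here is a minimal model in the spirit of a persistence baseline: it takes the dict of named inputs, picks one, and repeats that input's last history step for every forecast step. This is a hypothetical sketch, not the repository's Persistence model (which may work differently); nested Python lists stand in for tensors, and it assumes the chosen input already has the target's channels and grid (C_out, H_out, W_out). `PersistenceLikeModel`, `filled`, `shape` and the dataset names are invented for illustration.

```python
from copy import deepcopy
from typing import Any

Tensor = Any  # a regular nested list standing in for an NTCHW tensor


def filled(value: float, *dims: int) -> Tensor:
    """Build a nested list with the given dims, every entry set to `value`."""
    if not dims:
        return value
    return [filled(value, *dims[1:]) for _ in range(dims[0])]


def shape(x: Tensor) -> tuple[int, ...]:
    """Return the shape of a regular (non-ragged) nested list."""
    dims: list[int] = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0] if x else None
    return tuple(dims)


class PersistenceLikeModel:
    """Hypothetical standalone model: repeat the last history step.

    Reads one named input from the dict and ignores the rest.
    """

    def __init__(self, input_name: str, n_forecast_steps: int) -> None:
        self.input_name = input_name
        self.n_forecast_steps = n_forecast_steps

    def __call__(self, inputs: dict[str, Tensor]) -> Tensor:
        x = inputs[self.input_name]  # shape (N, T, C, H, W)
        # For each sample, copy its final time step once per forecast step.
        return [
            [deepcopy(sample[-1]) for _ in range(self.n_forecast_steps)]
            for sample in x
        ]


inputs = {
    "sea_ice": filled(0.5, 2, 3, 1, 2, 2),  # (N=2, T=3, C=1, H=2, W=2)
    "weather": filled(0.0, 2, 3, 4, 8, 8),  # ignored by this model
}
model = PersistenceLikeModel("sea_ice", n_forecast_steps=4)
prediction = model(inputs)
print(shape(prediction))  # (2, 4, 1, 2, 2)
```

Because the model is bound to a single output, predicting a second target would mean constructing a second instance, which is the per-output pattern described above.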

Pros:

- all input variables are available without transformation

Cons:

- hard to add new inputs
- hard to add new outputs

Processor models

A processor model is part of a larger encode-process-decode step. Start by defining a latent space of shape (C_latent, H_latent, W_latent); in the example below, this has been set to (10, 64, 64). The encode-process-decode model automatically creates one encoder for each input and one decoder for each output.

Each dataset-specific encoder takes its input data and converts it to shape (N, C_latent, H_latent, W_latent), compressing the time and channel dimensions. The k encoded datasets can then be combined in latent space to give a single dataset of shape (N, k * C_latent, H_latent, W_latent).

This is then passed to the processor, which must accept input of shape (N, k * C_latent, H_latent, W_latent) and produce output of the same shape.

This output is then passed to one or more output-specific decoders which take input of shape (N, k * C_latent, H_latent, W_latent) and produce output of shape (N, T, C_out, H_out, W_out), regenerating the time dimension.

[Image: encode-process-decode example with a (10, 64, 64) latent space]
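The shape bookkeeping through encode, combine, process and decode can be traced without any neural-network code. The sketch below only propagates shape tuples; it implements no layers. It uses the (10, 64, 64) latent space from the example, while the two input shapes, the output shape and all function names are hypothetical. In a framework such as PyTorch, the combining step would typically be a concatenation along the channel axis (for example `torch.cat(encoded, dim=1)`), which is what produces the k * C_latent channels.

```python
from typing import NamedTuple

Shape = tuple[int, ...]


class Latent(NamedTuple):
    """Latent space dimensions (C_latent, H_latent, W_latent)."""

    c: int
    h: int
    w: int


def encode_shape(x: Shape, latent: Latent) -> Shape:
    """Encoder: (N, T, C_k, H_k, W_k) -> (N, C_latent, H_latent, W_latent).

    T and C_k are folded into C_latent; H_k and W_k are mapped to the latent grid.
    """
    return (x[0], latent.c, latent.h, latent.w)


def combine_shape(encoded: list[Shape]) -> Shape:
    """Stack k encoded inputs along the channel axis: (N, k * C_latent, H, W)."""
    if any(e != encoded[0] for e in encoded):
        raise ValueError("all encoders must produce the same latent shape")
    n, c, h, w = encoded[0]
    return (n, len(encoded) * c, h, w)


def process_shape(z: Shape) -> Shape:
    """Processor: must return the same shape it receives."""
    return z


def decode_shape(z: Shape, n_forecast: int, c_out: int, h_out: int, w_out: int) -> Shape:
    """Decoder: (N, k * C_latent, H_latent, W_latent) -> (N, T, C_out, H_out, W_out)."""
    return (z[0], n_forecast, c_out, h_out, w_out)


latent = Latent(c=10, h=64, w=64)  # the latent space used in the example
inputs = {"input_a": (2, 3, 5, 32, 32), "input_b": (2, 3, 1, 432, 432)}  # hypothetical

encoded = [encode_shape(s, latent) for s in inputs.values()]
combined = combine_shape(encoded)
processed = process_shape(combined)
decoded = decode_shape(processed, n_forecast=4, c_out=1, h_out=432, w_out=432)
print(combined, decoded)  # (2, 20, 64, 64) (2, 4, 1, 432, 432)
```

The trace shows why this design makes new inputs cheap: adding a dataset only adds one encoder and raises k, while the processor's contract (same shape in and out) is unchanged.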

Pros:

- easy to add new inputs
- easy to add new outputs

Cons:

- input variables have been transformed into latent space
- time-step information has been compressed into the latent space

Owner

  • Name: The Alan Turing Institute
  • Login: alan-turing-institute
  • Kind: organization
  • Email: info@turing.ac.uk

The UK's national institute for data science and artificial intelligence.

GitHub Events

Total
  • Create event: 32
  • Issues event: 51
  • Watch event: 5
  • Delete event: 19
  • Issue comment event: 68
  • Member event: 2
  • Push event: 182
  • Pull request review comment event: 83
  • Pull request review event: 108
  • Pull request event: 44
Last Year
  • Create event: 32
  • Issues event: 51
  • Watch event: 5
  • Delete event: 19
  • Issue comment event: 68
  • Member event: 2
  • Push event: 182
  • Pull request review comment event: 83
  • Pull request review event: 108
  • Pull request event: 44

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 36
  • Total pull requests: 28
  • Average time to close issues: 7 days
  • Average time to close pull requests: 3 days
  • Total issue authors: 7
  • Total pull request authors: 7
  • Average comments per issue: 0.31
  • Average comments per pull request: 1.11
  • Merged pull requests: 14
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 36
  • Pull requests: 28
  • Average time to close issues: 7 days
  • Average time to close pull requests: 3 days
  • Issue authors: 7
  • Pull request authors: 7
  • Average comments per issue: 0.31
  • Average comments per pull request: 1.11
  • Merged pull requests: 14
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jemrobinson (13)
  • marianovitasari20 (7)
  • aranas (6)
  • louisavz (4)
  • IFenton (3)
  • LydiaFrance (2)
  • npedrazzini (1)
Pull Request Authors
  • jemrobinson (10)
  • marianovitasari20 (8)
  • npedrazzini (3)
  • IFenton (3)
  • aranas (2)
  • erinuclkwon (1)
  • LydiaFrance (1)
Top Labels
Issue Labels
  • P1 (8)
  • bug (1)
Pull Request Labels

Dependencies

pyproject.toml pypi
  • hydra-core >=1.3.2
uv.lock pypi
  • antlr4-python3-runtime 4.9.3
  • hydra-core 1.3.2
  • ice-station-zebra 0.1.0
  • omegaconf 2.3.0
  • packaging 24.2
  • pyyaml 6.0.2
notebook/environment.yml conda
  • icenet
  • ipykernel
  • jupyterlab
  • lightning
  • netcdf4 <1.6.1
  • notebook
  • numpy
  • pandas
  • python 3.11.*
  • seaborn
  • tensorflow
  • torch
  • torchaudio
  • torchmetrics
  • torchvision