s2spy

A high-level python package integrating expert knowledge and artificial intelligence to boost (sub) seasonal forecasting

https://github.com/ai4s2s/s2spy

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: nature.com, zenodo.org
  • Committers with academic emails
    2 of 10 committers (20.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.8%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 21
  • Watchers: 3
  • Forks: 7
  • Open Issues: 16
  • Releases: 6
Created almost 4 years ago · Last pushed over 1 year ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation

README.md

s2spy: Boost (sub) seasonal forecasting with AI

A high-level python package integrating expert knowledge and artificial intelligence to boost (sub) seasonal forecasting.

Why s2spy?

Producing reliable sub-seasonal to seasonal (S2S) forecasts with machine learning techniques remains a challenge. Currently, these data-driven S2S forecasts generally suffer from a lack of trust because of:

- Non-transparent data processing and poorly reproducible scientific outcomes
- Technical pitfalls related to machine learning-based predictability (e.g. overfitting)
- Black-box methods without sufficient explanation

To tackle these challenges, we are building s2spy, an open-source, high-level python package. It provides an interface between artificial intelligence and expert knowledge, to boost predictability and physical understanding of S2S processes. By building on efficient data-handling and parallel-computing packages, it can run efficiently across different Big Climate Data platforms. Key components will be explainable AI and causal discovery, which will support the classical scientific interplay between theory, hypothesis generation and data-driven hypothesis testing, enabling knowledge mining from data.

Developing this tool will be a community effort. It helps us achieve trustworthy data-driven forecasts by providing:

- Transparent and reproducible analyses
- Best practices in model verification
- Understanding of the sources of predictability

Installation

To install the latest release of s2spy, do:

```console
python3 -m pip install s2spy
```

To install the in-development version from the GitHub repository, do:

```console
python3 -m pip install git+https://github.com/AI4S2S/s2spy.git
```

Configure the package for development and testing

For developing and testing the package, please follow the developer guide, which can be found here.

Getting started

s2spy provides end-to-end solutions for machine learning (ML) based S2S forecasting.

Datetime operations & Data processing

In a typical ML-based S2S project, the first step is always data processing. Our calendar-based package, lilio, is used for time operations. For instance, suppose a user is looking for predictors of winter climate at seasonal timescales (~180 days). First, a Calendar object is created using daily_calendar:

```py
import lilio

calendar = lilio.daily_calendar(anchor="11-30", length='180d')
calendar = calendar.map_years(2020, 2021)
calendar.show()
# i_interval                         -1                         1
# anchor_year
# 2021         [2021-06-03, 2021-11-30)  [2021-11-30, 2022-05-29)
# 2020         [2020-06-03, 2020-11-30)  [2020-11-30, 2021-05-29)
```
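
The interval boundaries shown by the calendar can be checked by hand. The following is a minimal sketch using only the standard library, assuming each interval is left-closed and exactly 180 days long on either side of the anchor date; the helper `anchor_intervals` is hypothetical and is not part of lilio.

```python
from datetime import date, timedelta


def anchor_intervals(anchor_year, anchor_month=11, anchor_day=30, length_days=180):
    """Return the left-closed intervals just before (-1) and just after (1) the anchor."""
    anchor = date(anchor_year, anchor_month, anchor_day)
    length = timedelta(days=length_days)
    return {
        -1: (anchor - length, anchor),  # [anchor - length, anchor)
        1: (anchor, anchor + length),   # [anchor, anchor + length)
    }


# Print the intervals in the same order as the calendar view above.
for year in (2021, 2020):
    for i_interval, (start, end) in anchor_intervals(year).items():
        print(year, i_interval, f"[{start}, {end})")
```

Counting 180 days back and forward from 30 November reproduces the dates in the calendar output, which makes it easy to see how `anchor` and `length` determine the layout.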

Now, the user can load the data input_data (e.g. a pandas DataFrame) and resample it to the timescales configured in the calendar:

```py
calendar = calendar.map_to_data(input_data)
bins = lilio.resample(calendar, input_data)
bins
#    anchor_year  i_interval                  interval  mean_data  target
# 0         2020          -1  [2020-06-03, 2020-11-30)      275.5    True
# 1         2020           1  [2020-11-30, 2021-05-29)       95.5   False
# 2         2021          -1  [2021-06-03, 2021-11-30)      640.5    True
# 3         2021           1  [2021-11-30, 2022-05-29)      460.5   False
```
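
Conceptually, resampling to a calendar reduces all samples that fall inside each interval to a single value. Below is a plain-Python sketch of that idea, assuming a mean over left-closed intervals; the daily series and the helper `interval_mean` are made up for illustration and do not reproduce lilio's implementation or the numbers above.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical daily series: the value on each day is its offset from 2020-01-01.
first_day = date(2020, 1, 1)
daily = {first_day + timedelta(days=i): float(i) for i in range(2 * 366)}


def interval_mean(series, start, end):
    """Average all values whose date falls in the left-closed interval [start, end)."""
    return mean(value for day, value in series.items() if start <= day < end)


intervals = [
    (date(2020, 6, 3), date(2020, 11, 30)),
    (date(2020, 11, 30), date(2021, 5, 29)),
]
for start, end in intervals:
    print(f"[{start}, {end})", interval_mean(daily, start, end))
```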

Depending on data preparations, we can choose different types of calendars. For more information, see Lilio's documentation.

Cross-validation

Lilio can also generate train/test splits and perform cross-validation. To do that, a splitter is called from sklearn.model_selection e.g. ShuffleSplit and used to split the resampled data:

```py
from sklearn.model_selection import ShuffleSplit

splitter = ShuffleSplit(n_splits=3)
lilio.traintest.split_groups(splitter, bins)
```
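
To see what such a splitter produces on its own, the sketch below applies `ShuffleSplit` directly to a hypothetical array of anchor years. It assumes that whole anchor years are the units being split; the `test_size` and `random_state` values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

# Hypothetical anchor years standing in for the groups of resampled data.
anchor_years = np.arange(2010, 2020)

splitter = ShuffleSplit(n_splits=3, test_size=0.2, random_state=0)

# Each split is a pair of (training years, test years).
splits = []
for train_idx, test_idx in splitter.split(anchor_years):
    splits.append((anchor_years[train_idx], anchor_years[test_idx]))

for train_years, test_years in splits:
    print("train:", train_years, "test:", test_years)
```

Splitting by year rather than by individual sample keeps all intervals that belong to one anchor year on the same side of the train/test boundary.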

All splitter classes from scikit-learn are supported; a list is available in the scikit-learn documentation. Users should follow the scikit-learn documentation on how to use a different splitter class.

Dimensionality reduction

With s2spy, we can perform dimensionality reduction on data. For instance, to perform Response Guided Dimensionality Reduction (RGDR), we configure the RGDR operator and fit it to a precursor field. The fitted operator can then be used to transform the data into the reduced clusters:

```py
rgdr = RGDR(eps_km=600, alpha=0.05, min_area_km2=3000**2)
rgdr.fit(precursor_field, target_timeseries)
clustered_data = rgdr.transform(precursor_field)
_ = rgdr.plot_clusters(precursor_field, target_timeseries, lag=1)
```

(for more information about precursor_field and target_timeseries, check the complete example in this notebook.)
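
As a rough illustration of the "response guided" idea, the sketch below correlates every grid cell of a synthetic precursor field with a target series and keeps the cells whose correlation is significant. It assumes that `alpha` acts as a significance threshold on those correlations; the spatial clustering controlled by `eps_km` and `min_area_km2` is left out. All data and variable names here are hypothetical, and this is not s2spy's implementation.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Hypothetical data: 50 time steps on a 4 x 5 (lat, lon) grid.
n_time, n_lat, n_lon = 50, 4, 5
precursor = rng.normal(size=(n_time, n_lat, n_lon))
# The target follows grid cell (1, 2) closely, plus a little noise.
target = precursor[:, 1, 2] + 0.1 * rng.normal(size=n_time)

alpha = 0.05  # assumed significance level
corr = np.empty((n_lat, n_lon))
p_val = np.empty((n_lat, n_lon))
for i in range(n_lat):
    for j in range(n_lon):
        corr[i, j], p_val[i, j] = pearsonr(precursor[:, i, j], target)

# Keep only grid cells whose correlation with the target is significant.
significant = p_val < alpha
print(np.argwhere(significant))
```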

Currently, s2spy supports dimensionality reduction approaches from scikit-learn.

Tutorials

s2spy supports operations that are common in machine learning pipelines for sub-seasonal to seasonal forecasting research. Tutorials covering supported methods and functionalities are listed in notebooks. To run these notebooks, users need to install JupyterLab. More details about each method can be found in the API reference documentation.

Advanced use cases

You can achieve more by integrating s2spy and lilio into your data-driven S2S forecast workflow! We have a magic cookbook, which includes recipes for complex machine learning based forecasting use cases. These examples show how s2spy and lilio can facilitate your workflow.

Documentation

For detailed information on using the s2spy package, visit the documentation page hosted at Read the Docs.

Contributing

If you want to contribute to the development of s2spy, have a look at the contribution guidelines.

How to cite us

Please use the Zenodo DOI to cite this package if you use it in your research.

Acknowledgements

This package was developed by the Netherlands eScience Center and Vrije Universiteit Amsterdam. Development was supported by the Netherlands eScience Center under grant number NLESC.OEC.2021.005.

This package was created with Cookiecutter and the NLeSC/python-template.

Owner

  • Name: AI4S2S
  • Login: AI4S2S
  • Kind: organization

Citation (CITATION.cff)

# YAML 1.2
---
cff-version: "1.1.0"
title: "s2spy"
authors:
  -
    family-names: Liu
    given-names: Yang
    orcid: "https://orcid.org/0000-0002-1966-8460"
    affiliation: "Netherlands eScience Center"

  -
    family-names: Kalverla
    given-names: Peter
    orcid: "https://orcid.org/0000-0002-5025-7862"
    affiliation: "Netherlands eScience Center"

  -
    family-names: Schilperoort
    given-names: Bart
    orcid: "https://orcid.org/0000-0003-4487-9822"
    affiliation: "Netherlands eScience Center"

  -
    affiliation: "Netherlands eScience Center"
    family-names: Alidoost
    given-names: Fakhereh
    orcid: https://orcid.org/0000-0001-8407-6472

  -
    family-names: Vijverberg
    given-names: Sem
    orcid: "https://orcid.org/0000-0002-1839-2618"
    affiliation: "Vrije Universiteit Amsterdam"

  -
    family-names: van Ingen
    given-names: Jannes
    affiliation: "Vrije Universiteit Amsterdam"

  -
    affiliation: "Netherlands eScience Center"
    family-names: Donnelly
    given-names: Claire
    orcid: https://orcid.org/0000-0002-2546-4528

date-released: 2022-09-02
version: "0.4.1"
repository-code: "https://github.com/AI4S2S/s2spy"
keywords:
  - s2s
  - ai
message: "If you use this software, please cite it using these metadata."
license: Apache-2.0

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 454
  • Total Committers: 10
  • Avg Commits per committer: 45.4
  • Development Distribution Score (DDS): 0.57
Past Year
  • Commits: 3
  • Committers: 2
  • Avg Commits per committer: 1.5
  • Development Distribution Score (DDS): 0.333
Top Committers
| Name | Email | Commits |
|---|---|---|
| Yang | y****u@e****l | 195 |
| Bart Schilperoort | b****t@g****m | 182 |
| Peter Kalverla | p****a@g****m | 42 |
| Bart Schilperoort | b****t@e****l | 14 |
| jannesvaningen | j****n@h****m | 12 |
| Sem | s****g@v****l | 2 |
| Sem Vijverberg | s****g@v****l | 2 |
| Claire Donnelly | 8****s | 2 |
| jannesvaningen | 8****n | 2 |
| NLeSC Python template | n****e | 1 |

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 51
  • Total pull requests: 65
  • Average time to close issues: 2 months
  • Average time to close pull requests: 15 days
  • Total issue authors: 7
  • Total pull request authors: 7
  • Average comments per issue: 1.86
  • Average comments per pull request: 2.83
  • Merged pull requests: 54
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 2
  • Average time to close issues: 14 days
  • Average time to close pull requests: 8 days
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.5
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • geek-yang (17)
  • BSchilperoort (16)
  • semvijverberg (8)
  • Peter9192 (7)
  • ClaireDons (2)
  • jannesvaningen (1)
  • pimmeerdink (1)
Pull Request Authors
  • BSchilperoort (32)
  • geek-yang (21)
  • semvijverberg (5)
  • ClaireDons (3)
  • pimmeerdink (3)
  • jannesvaningen (2)
  • Peter9192 (2)
Top Labels
Issue Labels
  • RDGR (15)
  • enhancement (10)
  • Calendar (10)
  • bug (4)
  • train/test (1)
  • question (1)
Pull Request Labels
  • RDGR (1)

Dependencies

.github/workflows/build.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/cffconvert.yml actions
  • actions/checkout v3 composite
  • citation-file-format/cffconvert-github-action main composite
.github/workflows/documentation.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/markdown-link-check.yml actions
  • actions/checkout v3 composite
  • gaurav-nelson/github-action-markdown-link-check v1 composite
.github/workflows/sonarcloud.yml actions
  • SonarSource/sonarcloud-github-action master composite
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
.github/workflows/python-publish.yml actions
  • actions/checkout v3 composite
  • actions/setup-python v3 composite
  • pypa/gh-action-pypi-publish 27b31702a0e7fc50959f5ad993c78deac1bdfc29 composite
pyproject.toml pypi
  • lilio *
  • matplotlib *
  • netcdf4 *
  • numpy *
  • pandas *
  • scikit-learn *
  • scipy *
  • xarray *