stats-biogeo-2021

Extracting and processing data from MIT's Darwin model, and applying statistical learning methods. In support of submitted manuscript.

https://github.com/leebardon/stats-biogeo-2021

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
○ Academic publication links
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.0%) to scientific vocabulary

Keywords

biomass gams machine-learning mitgcm plankton python

Repository

Basic Info

Host: GitHub
Owner: leebardon
License: mit
Language: Python
Default Branch: master
Homepage:
Size: 908 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 1

Created almost 5 years ago · Last pushed over 2 years ago

How predictable is plankton biogeography using statistical learning methods?

Codebase associated with:
Bardon, L. R., Ward, B. A., Dutkiewicz, S., & Cael, B. B. (2021). Testing the skill of a species distribution model using a 21st century virtual ecosystem. Geophysical Research Letters, 48, e2021GL093455. https://doi.org/10.1029/2021GL093455

This package contains a series of analytical tools to extract and process data from the output of MIT's Darwin marine ecosystem model, embedded in MITgcm. It trains statistical learning models (GAMs) on a subset of historical Darwin ocean data, sampled to mimic real-world observational data, and also on randomly-sampled datasets of various sizes. It quantifies the effect of spatial bias and of training set sample size on the resulting predictions. Altogether, the program allows us to assess the skill of GAMs in predicting the virtual ocean's plankton biogeography, both in present-day spatial extrapolations and at the end of the 21st century, in response to climate change.

STEP 1

Extracts and cleans surface data (Z=0) from Darwin output files (1987-2008, 2079-2100)

Builds a binary sampling matrix (BSM) using a publicly-available ocean measurements dataset

Uses the BSM to sample the Darwin model at real-life ocean-measurement locations

Builds an identically-sized BSM to sample the Darwin model at random locations

Plots a 3D matrix (Lat, Lon, Month) to visualise spatiotemporal distributions (pdf)

Plots a histogram of measurements per month (pdf)

Builds a further 54 randomly-sampled training sets spanning 18 size classes (N=63 to N=11,557)
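
The sampling steps above can be sketched roughly as follows. This is a minimal illustration rather than the code in this repository: the grid shape, the synthetic observation indices, the all-ocean mask, the log-spaced size classes and the three replicates per class are all assumptions, and the function names (build_bsm, build_random_bsm, sample_with_bsm) are hypothetical.

```python
import numpy as np

# Hypothetical coarse grid (lat, lon, month); the real Darwin grid may differ.
SHAPE = (90, 180, 12)


def build_bsm(obs_idx, shape):
    """Binary matrix with 1 wherever at least one observation exists."""
    bsm = np.zeros(shape, dtype=np.uint8)
    lat, lon, month = np.asarray(obs_idx).T
    bsm[lat, lon, month] = 1
    return bsm


def build_random_bsm(n_samples, shape, ocean_mask, rng):
    """Binary matrix with exactly n_samples ones at random ocean cells."""
    candidates = np.flatnonzero(ocean_mask)
    chosen = rng.choice(candidates, size=n_samples, replace=False)
    bsm = np.zeros(shape, dtype=np.uint8)
    bsm.flat[chosen] = 1
    return bsm


def sample_with_bsm(field, bsm):
    """Extract the model values at the cells flagged by the BSM."""
    return field[bsm.astype(bool)]


rng = np.random.default_rng(0)
shape = SHAPE

# Synthetic stand-ins for measurement locations and a Darwin surface field.
obs_idx = np.column_stack([rng.integers(0, n, size=500) for n in shape])
darwin_field = rng.random(shape)
ocean_mask = np.ones(shape, dtype=bool)  # assumption: every cell is ocean

# Observation-based BSM and an identically-sized random BSM.
obs_bsm = build_bsm(obs_idx, shape)
rand_bsm = build_random_bsm(int(obs_bsm.sum()), shape, ocean_mask, rng)
obs_sample = sample_with_bsm(darwin_field, obs_bsm)
rand_sample = sample_with_bsm(darwin_field, rand_bsm)

# 18 size classes from N=63 to N=11,557. Log spacing and 3 replicates
# per class (18 x 3 = 54) are assumptions made for this sketch.
sizes = np.geomspace(63, 11557, num=18).round().astype(int)
training_bsms = [
    build_random_bsm(int(n), shape, ocean_mask, rng)
    for n in sizes
    for _ in range(3)
]
```

Because duplicate observation locations collapse into a single flagged cell, the random BSM is sized from the number of flagged cells rather than from the raw observation count.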

STEP 2

The samples are used as training datasets for Generalised Additive Models (GAMs)

Plankton species are combined into functional groups (pro, pico, cocco, diaz, diatom, dino, zoo)

Biomass is selected as the target variable for the GAMs

Physical variables (SST, SSS, PAR) and nutrients (NO3, PO4, Fe, Si) are set as predictors

GAMs are trained, and partial dependency plots are output (pdf)

GAMs are used to predict plankton biogeography across the whole ocean for 1987-2008 and 2079-2100
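
To make the workflow above concrete, here is a simplified stand-in written with NumPy only. It is not the GAM implementation used in this project: instead of penalised spline smoothers it fits an additive model in which each predictor contributes a cubic polynomial, estimated jointly by least squares. The group-to-species mapping, the toy biomass function and all function names are hypothetical.

```python
import numpy as np

PREDICTORS = ["SST", "SSS", "PAR", "NO3", "PO4", "Fe", "Si"]
DEGREE = 3  # cubic polynomial per predictor, standing in for a spline smoother

# Hypothetical mapping from functional group to species column indices.
GROUPS = {"pro": [0, 1], "diatom": [2, 3, 4]}


def group_biomass(species_biomass, groups):
    """Sum species biomass columns into functional-group totals."""
    return {g: species_biomass[:, idx].sum(axis=1) for g, idx in groups.items()}


def design_matrix(X, mean, std, degree=DEGREE):
    """Intercept plus powers 1..degree of each standardised predictor."""
    Z = (X - mean) / std
    cols = [np.ones(len(Z))]
    for j in range(Z.shape[1]):
        cols.extend(Z[:, j] ** p for p in range(1, degree + 1))
    return np.column_stack(cols)


def fit_additive(X, y):
    """Least-squares fit of y = intercept + sum_j f_j(x_j)."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    coef, *_ = np.linalg.lstsq(design_matrix(X, mean, std), y, rcond=None)
    return {"coef": coef, "mean": mean, "std": std}


def predict_additive(model, X):
    return design_matrix(X, model["mean"], model["std"]) @ model["coef"]


def partial_dependence(model, j, grid, degree=DEGREE):
    """Contribution f_j of predictor j alone, evaluated on a grid of values."""
    z = (grid - model["mean"][j]) / model["std"][j]
    start = 1 + j * degree  # skip the intercept and earlier predictors
    return sum(model["coef"][start + p - 1] * z ** p for p in range(1, degree + 1))


def toy_biomass(X, rng):
    """Synthetic target: depends on the first two predictors plus noise."""
    return 1.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1] ** 2 + rng.normal(0, 0.05, len(X))


rng = np.random.default_rng(1)
species = rng.random((500, 5))  # synthetic per-species biomass
grouped = group_biomass(species, GROUPS)

# Synthetic stand-ins for a sampled training set and the full ocean grid.
X_train = rng.uniform(-1, 1, size=(500, len(PREDICTORS)))
X_ocean = rng.uniform(-1, 1, size=(5000, len(PREDICTORS)))
y_train, y_ocean = toy_biomass(X_train, rng), toy_biomass(X_ocean, rng)

model = fit_additive(X_train, y_train)
y_pred = predict_additive(model, X_ocean)
r2 = 1 - np.sum((y_ocean - y_pred) ** 2) / np.sum((y_ocean - y_ocean.mean()) ** 2)

# Partial dependence of the target on the first predictor (SST here).
sst_grid = np.linspace(-1, 1, 50)
sst_effect = partial_dependence(model, 0, sst_grid)
```

A spline-based GAM would replace the polynomial columns with penalised basis functions, but the train-on-sample, predict-on-whole-ocean pattern stays the same.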

STEP 3

Global biomass maps are plotted for qualitative comparison between target and GAMs predictions

Relative difference (%) maps between Darwin 'truth' and GAMs predictions are plotted for 1987-2008 and 2079-2100

Target and predictions are quantitatively compared with a series of descriptive statistics

The above analyses are repeated for each plankton functional group

Distance, Pearson, and Spearman correlations are calculated

Correlation heatmaps are produced
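
The comparison metrics listed above could be computed along these lines. This is an illustrative NumPy sketch, not the project's own code: the relative-difference definition (prediction minus truth, divided by truth) and the handling of zero-valued cells are assumptions, the data are synthetic, and the function names are hypothetical. The distance correlation builds full pairwise-distance matrices, so it is only practical on modest sample sizes.

```python
import numpy as np


def relative_difference_pct(truth, pred):
    """Percentage difference of pred relative to truth; NaN where truth is 0."""
    truth = np.asarray(truth, dtype=float)
    pred = np.asarray(pred, dtype=float)
    out = np.full(truth.shape, np.nan)
    nonzero = truth != 0
    out[nonzero] = 100.0 * (pred[nonzero] - truth[nonzero]) / truth[nonzero]
    return out


def pearson(x, y):
    return float(np.corrcoef(np.ravel(x), np.ravel(y))[0, 1])


def average_ranks(x):
    """1-based ranks, with tied values sharing their average rank."""
    _, inverse, counts = np.unique(
        np.ravel(x), return_inverse=True, return_counts=True
    )
    last = np.cumsum(counts)
    first = last - counts + 1
    return ((first + last) / 2.0)[inverse]


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def _double_centre(v):
    """Pairwise distance matrix with row, column and grand means removed."""
    d = np.abs(v[:, None] - v[None, :])
    return d - d.mean(axis=0) - d.mean(axis=1, keepdims=True) + d.mean()


def distance_correlation(x, y):
    """Sample distance correlation for two 1-D variables (O(n^2) memory)."""
    A = _double_centre(np.ravel(np.asarray(x, dtype=float)))
    B = _double_centre(np.ravel(np.asarray(y, dtype=float)))
    dcov2 = max((A * B).mean(), 0.0)
    denom = np.sqrt((A * A).mean() * (B * B).mean())
    return float(np.sqrt(dcov2 / denom)) if denom > 0 else 0.0


# Synthetic stand-ins for Darwin 'truth' and GAM predictions.
rng = np.random.default_rng(2)
truth = rng.uniform(0.5, 2.0, size=300)
pred = truth * (1 + rng.normal(0, 0.1, size=300))

rel_diff = relative_difference_pct(truth, pred)
scores = {
    "pearson": pearson(truth, pred),
    "spearman": spearman(truth, pred),
    "distance": distance_correlation(truth, pred),
}
```

Unlike Pearson's coefficient, the distance correlation also picks up non-monotonic dependence, which is one reason to report all three side by side.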

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development or testing.

Prerequisites

First, please ensure that you have a copy of the conda package manager installed locally (miniconda is recommended).

Fork and clone the project repository onto your local machine.

Create Environment

From the root of the cloned project, run:

make create_environment

This creates a virtual environment for the project, in which the project dependencies will be installed, minimising the possibility of conflicts with other elements of your system. You will be prompted to activate it - go ahead and do so :)

Next, inform your python interpreter of the structure of the project, so it understands which internal components should be treated as callable modules:

make setup

Finally, install the project dependencies:

make requirements

Run Program

To run, ensure you're in the root directory (containing runscript.py) and enter:

python runscript.py

PLEASE NOTE

This program has only been tested on Unix environments (Mac/Linux). It may not work on Windows.

Authors

Lee Bardon - Initial work - leebardon

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Owner

Name: Lee Bardon
Login: leebardon
Kind: user
Location: Los Angeles, California
Company: PhD student at USC

Website: https://leebardon.github.io/
Twitter: teatauri
Repositories: 6
Profile: https://github.com/leebardon

Citation (CITATION.cff)

cff-version: 1.2.0
title: StatsBG
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Lee
    family-names: Bardon
    email: leerbardon@gmail.com
    affiliation: University of Southern California
    orcid: 'https://orcid.org/0000-0001-5470-903X'
identifiers:
  - type: doi
    value: 10.1029/2021GL093455
    description: Code base associated with above DOI.
url: "https://github.com/leebardon/stats-biogeo-2021"
version: 1.0.0
date-released: 2021-11-18
abstract: >-
  This package contains a series of analytical tools to
  extract and process data from output from MIT's Darwin marine
  ecosystem model, embedded in MITgcm. It trains statistical
  learning models (GAMs) on a subset of historical Darwin
  ocean data, sampled to mimic real-world observational
  data, and also on randomly-sampled datasets of various
  sizes. It quantifies the effect of spatial bias and of
  training set sample size on the resulting predictions.
  Altogether, the program allows us to assess GAMs model
  skill in predicting the virtual ocean's plankton
  biogeography, both in present-day spatial extrapolations,
  and by the end of the 21st century, as a response to
  climate change.
license: MIT

Committers

Last synced: about 2 years ago

All Time

Total Commits: 63
Total Committers: 1
Avg Commits per committer: 63.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 3
Committers: 1
Avg Commits per committer: 3.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Lee Bardon	l**n@g**m	63

Issues and Pull Requests

Last synced: about 2 years ago

All Time

Total issues: 0
Total pull requests: 43
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 43
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
