Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 2 DOI reference(s) in README -
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (18.1%) to scientific vocabulary
Repository
Code for Flow-MER eFlowEval modelling framework
Basic Info
- Host: GitHub
- Owner: galenholt
- License: other
- Language: R
- Default Branch: master
- Size: 25.8 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
Readme.md
bibliography: references.bib
eFlowEval modeling framework repository
This repository contains the code for the eFlowEval modelling framework. It is an R package, but also contains significant additional files, including scripts, notebooks and shiny apps to create the results in @holt2024 as well as shell scripts to manage running on SLURM HPCs.
To just use the package functionality, use
``` r
install.packages("devtools")
devtools::install_github("galenholt/eFlowEval") ```
If you want to use the other scripts available here, clone this repository, and use {renv} to establish the package environment renv::restore() to ensure there are no package conflicts.
A version at the time of publication is available as release v0.1.0. Note that this does not have package functionality.
Structure of a typical workflow
The directorySet.R script does some setup work to establish standard input and output directories, and make some other configuration changes depending on whether the eFlowEval framework is running on local Windows machines or remote HPCs at the CSIRO. It is likely to need to be edited to use in new environments.
The core functionality is contained as a package, with functions in /R. These are then called by notebooks. These provide tools for checking and processing data, running response models, and producing figures and other outputs. Bespoke functions can be built for the response models, which can be provided here or by the user.
The most up-to-date example of this workflow is in the SRA repository, and this directory is in the process of being updated to that system.
Data itself is typically large and so located elsewhere (with locations as set in directorySet.R). Sources for original data are provided in Supplementary Material. The code will generate a /datOut folder, where processed data is saved, with location defined in directorySet.R.
The current best practice is to use the process_with_checks function to handle both data processing (calling process_data), creating bespoke response models, and calling those with process_with_checks through process_strictures to obtain the outputs. These functions can internally use the other functionality provided, such as weighted spatial aggregations or rolling averages.
At the time of publication the workflow described above was done primarily through scripts. because each response model needs different data or processes data in different ways, the functions in Scripts/DataProcessing are bespoke for each model while using the same fundamental backbone functions and format.
Stricture and other response relationships (e.g. metabolism) are defined with scripts and functions in /Strictures. Like the dataprocessing, these functions are bespoke, capturing the particular responses of each group to the data while using the same fundamental backbone functions and format.The processed responses are then saved to a /strictOut directory the code creates.
Figures and other synthesis of the output are prepared in /Scripts/plotting, and organised and produced in Quarto notebooks in /notebooks.
Data
The data is expected to be in a directory outside the repo, (set as datDir in directorySet.R). Currently, the data is all open-source (ANAE layers, soil temp from MODIS/NASA, soil moisture from AWRA-L (Australian Bureau of Meteorology), and some simple spatial layers for the Murray-Darling Basin and RAMSAR wetland sites, with citations in Supplementary Material. As used, the data sits in a directory at CSIRO and is available on request.
Processing
Most processing is designed to occur either locally or on an HPC running a SLURM job scheduler. The SLURM approach has changed through the life of the project. The most up-to-date approach is in in the SRA repository which dispenses with all but one SLURM script. The SLURM scripts in /SLURM here are deprecated but kept for reproducibility. The HPC process would likely need to be altered for other HPC environments.
The current flow is to have a method that works both locally or on an HPC. By controlling the HPC parallelisation through foreach and future, and so use foreach loops with %dofuture% and modify the plan.
That requires a central control process to spawn subsidiary runs.
In practice, that means we use any_R.sh as the control process. It should point to the file to run. But because we use notebooks, we need to knitr::purl them to R scripts. So anyR.sh calls `runr_hpc.R`, which purls a notebook or passes through a script, and then runs it. Then, that script should start a bunch of jobs.
HPCs often have to have some set of packages already installed that need compiled C libraries (especially sf). To access those, we need to add to libPaths, but that doesn't propagate through {future}s if it's done in a script. So, in the .Rprofile, add
if (grepl('^HPCNAME', Sys.info()["nodename"])) {
renvpaths <- .libPaths()
.libPaths(new = c(renvpaths,'/path/to/hpc/R/library' ))
}
where you get the path to the HPC R library by opening R outside the renv and typing .libPaths().
Typical run
Once everything's set up, use something like
sbatch any_R.sh run_r_hpc.R "MER_data_processing/notebook_with_processing.qmd"
To start a master process in runrhpc that then fires off sub-slurms (presumably) in notebookwithprocessing.qmd.
Contact
For more information, contact Galen Holt, g.holt\@deakin.edu.au.
Reference
Owner
- Name: Galen Holt
- Login: galenholt
- Kind: user
- Repositories: 2
- Profile: https://github.com/galenholt
Citation (CITATION.cff)
# -----------------------------------------------------------
# CITATION file partially created with {cffr} R package, v1.0.0
# See also: https://docs.ropensci.org/cffr/
# -----------------------------------------------------------
cff-version: 1.2.0
message: 'To cite package "eFlowEval" in publications use:'
type: software
license: MIT
title: 'eFlowEval: Tools for assesing ecological response to environmental flows'
version: 0.2.0
abstract: This package provides implementations of the bvstep forward/backward algorithm
to find the best subset of the data that matches the full community. It further
provides functions to iterate that algorithm over some number of random starts to
assess consistency. The `peel` function then uses these to iteratively find the
best subset and remove it.
authors:
- family-names: Holt
given-names: Galen
email: g.holt@deakin.edu.au
orcid: https://orcid.org/0000-0002-7455-9275
- family-names: Macqueen
given-names: Ashley
email: ashley.macqueen@vewh.vic.gov.au
contact:
- family-names: Holt
given-names: Galen
email: g.holt@deakin.edu.au
orcid: https://orcid.org/0000-0002-7455-9275
preferred-citation:
type: article
title: 'A flexible consistent framework for modelling multiple interacting environmental responses to management in space and time'
authors:
- family-names: Holt
given-names: Galen
email: g.holt@deakin.edu.au
orcid: https://orcid.org/0000-0002-7455-9275
- family-names: Macqueen
given-names: Ashley
email: ashley.macqueen@vewh.vic.gov.au
- family-names: Lester
given-names: Rebecca E.
doi: https://doi.org/10.1016/j.jenvman.2024.122054
url: https://www-sciencedirect-com.ezproxy-b.deakin.edu.au/science/article/pii/S0301479724020401
year: '2024'
publisher:
name: Elsevier
volume: 367
issue:
journal: Journal of Environmental Management
start: 122054
repository: https://github.com/galenholt/eFlowEval
repository-code: https://github.com/galenholt/eFlowEval
url: https://github.com/galenholt/eFlowEval
contact:
- family-names: Holt
given-names: Galen
email: g.holt@deakin.edu.au
orcid: https://orcid.org/0000-0002-7455-9275