gsi-wrangling-workflow
Automated data wrangling workflow for Green Stormwater Infrastructure Living Lab at University of Arizona
https://github.com/uarizonagsicampuslivinglab/gsi-wrangling-workflow
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.9%) to scientific vocabulary
Repository
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
gsi-wrangling-workflow
This repository contains code to automatically collect and wrangle data from the GSI Living Lab at University of Arizona. The data set is available upon request. Request data here: GSI Living Lab Data Request.
How does it work?

This repository houses gsi_wrangling.Rmd and gsi_archive.Rmd which are both published (manually) to Posit Connect as scheduled workflows.
gsi_wrangling.Rmd is run daily to pull the most recent data for the Campus Living Lab sites from ZentraCloud, wrangle the data, and append it to a .csv file stored on Box.
gsi_archive.Rmd is run monthly to pull the most recent data and metadata from Box and upload it to Zenodo as a new version of 10.5281/zenodo.10823037.
The gsi-dashboard repository contains code for a Shiny app that is automatically deployed to Posit Connect (using GitHub Actions) when updates are made to the main branch.
This Shiny app reads in the data from Box on start-up and provides interactive visualizations of the data.
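The daily "append to a .csv file" step can be sketched in base R as follows. This is a minimal illustration, not the code in gsi_wrangling.Rmd: the helper name append_new_rows() and the column names (datetime, port, value) are assumptions made for the example. The idea is that re-downloading an overlapping time window should not create duplicate rows.

```r
# Hypothetical helper: append newly downloaded rows to the existing table,
# keeping one row per key (e.g. one reading per port per timestamp).
append_new_rows <- function(existing, new, key_cols) {
  combined <- rbind(existing, new)
  # duplicated() flags later repeats, so rows already on file are the ones kept
  combined[!duplicated(combined[, key_cols, drop = FALSE]), , drop = FALSE]
}

# Toy data with assumed column names
existing <- data.frame(
  datetime = c("2024-01-01 00:00", "2024-01-01 01:00"),
  port = c(1, 1),
  value = c(0.21, 0.22)
)
new <- data.frame(
  datetime = c("2024-01-01 01:00", "2024-01-01 02:00"),
  port = c(1, 1),
  value = c(0.22, 0.25)
)

# The 01:00 reading appears in both tables but is kept only once
updated <- append_new_rows(existing, new, key_cols = c("datetime", "port"))
```

In the real workflow the existing table would be read from Box and the updated table written back there, which this sketch leaves out.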
Contributing
To run the code in this repo locally, you'll need to set up access to Zentra Cloud and to the Box API.
renv
This project uses renv for package management.
When you open this R Project, renv will bootstrap itself and should prompt you to run renv::restore() to install all dependencies.
If for some reason renv::restore() doesn't work for you, you can deactivate renv with renv::deactivate() and install packages the usual way.
renv is primarily used in this project for publishing to Posit Connect, and shouldn't be necessary for you to run any of the code locally.
Zentra Cloud
- Create a .Renviron file (e.g. with usethis::edit_r_environ("project")) and add an environment variable for the Zentra Cloud API token: ZENTRACLOUD_TOKEN=<token>
- If for some reason renv::restore() didn't install the zentracloud R package, you can install it from r-universe or directly from GitLab:
``` r
# r-universe installation
install.packages('zentracloud', repos = c('https://cct-datascience.r-universe.dev', 'https://cloud.r-project.org'))
# GitLab installation
pak::pkg_install("gitlab::meter-group-inc/pubpackages/zentracloud")
```
- The token in .Renviron is not automatically read in by zentracloud, so you'll find code to set options at the top of most scripts:
``` r
zentracloud::setZentracloudOptions(
  token = Sys.getenv("ZENTRACLOUD_TOKEN"),
  domain = "default"
)
```
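Because Sys.getenv() returns an empty string when a variable is missing, a forgotten .Renviron entry can surface later as a confusing API error. One way to fail early is a small guard like the sketch below. The helper require_env() is hypothetical and not part of this repository or of zentracloud.

```r
# Hypothetical helper: return the value of an environment variable,
# or stop with a clear message if it is unset or empty.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop(
      "Environment variable ", name,
      " is not set; add it to .Renviron and restart R.",
      call. = FALSE
    )
  }
  value
}

# Usage (requires the zentracloud package and a real token):
# zentracloud::setZentracloudOptions(
#   token = require_env("ZENTRACLOUD_TOKEN"),
#   domain = "default"
# )
```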
Box
The Box API is accessed using the boxr package.
You'll find instructions on how to authenticate with Box on the boxr website.
If you're a collaborator just interested in running this code locally, you can follow these instructions to authenticate to Box as a user (this is called an "interactive app", which is a little confusing).
Once you've followed those instructions and have added a BOX_CLIENT_ID and BOX_CLIENT_SECRET to the .Renviron file, just be sure to replace box_auth_service() with box_auth() and you should be able to run the code in gsi_wrangling.Rmd.
This automated workflow uses a service app to upload data to a shared box folder when gsi_wrangling.Rmd is run on Posit Connect.
If you need to change any settings or get credentials for this service app, you'll need to request access from Vanessa Buzzard or the CCT Data Science group.
There are additional UA-specific instructions on setting up a service app here: https://cct-datascience.github.io/group-procedures/boxr.html.
R/box_app_setup.R contains some code used when setting up the Box service app authentication (think of it like notes rather than a script to run).
The only thing I've done differently from the boxr documentation is to copy the contents of the .boxr-auth file and add it as an environment variable, BOX_TOKEN_TEXT.
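Rather than editing box_auth_service() to box_auth() by hand, the choice between the two authentication flows could be made from whichever credentials are present. The sketch below is one possible approach, not what gsi_wrangling.Rmd does. The helper choose_box_auth() is hypothetical, and the commented usage assumes that boxr::box_auth_service() accepts a token_text argument and that boxr::box_auth() reads BOX_CLIENT_ID and BOX_CLIENT_SECRET from the environment, as described in the boxr documentation.

```r
# Hypothetical helper: decide which boxr authentication flow to use based on
# which credentials are available as environment variables.
choose_box_auth <- function() {
  if (nzchar(Sys.getenv("BOX_TOKEN_TEXT"))) {
    "service"      # service app credentials, as used on Posit Connect
  } else if (nzchar(Sys.getenv("BOX_CLIENT_ID")) &&
             nzchar(Sys.getenv("BOX_CLIENT_SECRET"))) {
    "interactive"  # user credentials for running locally
  } else {
    stop("No Box credentials found in the environment.", call. = FALSE)
  }
}

# Usage (requires the boxr package and valid credentials):
# switch(
#   choose_box_auth(),
#   service = boxr::box_auth_service(token_text = Sys.getenv("BOX_TOKEN_TEXT")),
#   interactive = boxr::box_auth()
# )
```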
Posit Connect
Both gsi_wrangling.Rmd and gsi_archive.Rmd are published "manually" on Posit Connect at UA (viz.datascience.arizona.edu).
They only need to be re-published if there are changes made to those documents.
Anyone who is a collaborator should be able to publish these documents via RStudio (instructions).
The environment variables ZENTRACLOUD_TOKEN and BOX_TOKEN_TEXT need to be set on Posit Connect for these workflows to run (setting env variables on Posit Connect).
This should only have to happen once, and not each time a document is re-published.
Files
In R/ you will find:
- box_app_setup.R: some code I used when first setting up Box authentication. Not to be run again, but just as an example.
- estimate_data_size.R: a script for extrapolating data size.
- gsi_get_data.R: a function, gsi_get_data(), for downloading and wrangling data from the Zentra Cloud API.
- gsi_get_eto.R: a function, gsi_get_eto(), for downloading potential evapotranspiration data from the ZentraCloud models API endpoint.
- Other functions used to calculate variables such as heat index, wind chill, etc.
Notes
Two sensors at Old Main were plugged into incorrect ports upon installation. On December 11, 2024, the plugs for ports 3 and 5 were switched on the Old Main z6-19485 logger, so that the port and location pairing now matches the actual location of each sensor. See site_info.csv for information on how to correct pre-December 2024 data during analysis.
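As a rough illustration of that kind of correction, the sketch below relabels ports 3 and 5 for readings from logger z6-19485 taken before the swap date. It is only a sketch under stated assumptions: the function name swap_old_main_ports() and the column names (device_sn, port, date) are hypothetical, and site_info.csv remains the authoritative guide to how the correction should be applied, including how to treat readings from the swap day itself.

```r
# Hypothetical sketch: swap the port labels 3 and 5 for one logger on rows
# dated before the plugs were physically switched.
# Column names (device_sn, port, date) are assumptions for this example.
swap_old_main_ports <- function(data, logger = "z6-19485",
                                swap_date = as.Date("2024-12-11")) {
  needs_fix <- data$device_sn == logger & data$date < swap_date
  # Compute both masks before assigning so the swap is symmetric
  is_3 <- needs_fix & data$port == 3
  is_5 <- needs_fix & data$port == 5
  data$port[is_3] <- 5
  data$port[is_5] <- 3
  data
}

# Toy data; "z6-00000" is a made-up serial number for a logger left untouched
readings <- data.frame(
  device_sn = c("z6-19485", "z6-19485", "z6-19485", "z6-00000"),
  port = c(3, 5, 3, 3),
  date = as.Date(c("2024-12-01", "2024-12-01", "2024-12-15", "2024-12-01"))
)
corrected <- swap_old_main_ports(readings)
```

Only the first two rows change here: they belong to the affected logger and predate the swap.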
Contributors
Developed in collaboration with the University of Arizona CCT Data Science team.
Owner
- Name: UArizonaGSICampusLivingLab
- Login: UArizonaGSICampusLivingLab
- Kind: organization
- Repositories: 1
- Profile: https://github.com/UArizonaGSICampusLivingLab
Citation (CITATION.cff)
# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!
cff-version: 1.2.0
title: >-
  Campus Living Lab Green Stormwater Infrastructure Data
  Workflow
message: >-
  To cite the code in this repository, use the metadata in this
  file. If you wish to use the data and cite it, please use
  citation information on the data request form
  <https://forms.gle/63qWCybhvHaHunuH6>.
type: software
version: 0.1.0
authors:
  - given-names: Eric R
    family-names: Scott
    orcid: 'https://orcid.org/0000-0002-7430-7879'
    affiliation: >-
      Communications & Cyber Technologies, Arizona
      Experiment Station, University of Arizona
    email: ericrscott@arizona.edu
  - given-names: Malcolm Javier
    family-names: Barrios
    affiliation: 'College of Engineering, University of Arizona'
  - given-names: Kristina
    family-names: Riemer
    orcid: 'https://orcid.org/0000-0003-3802-3331'
    affiliation: >-
      Communications & Cyber Technologies, Arizona
      Experiment Station, University of Arizona
  - given-names: Vanessa
    family-names: Buzzard
    affiliation: >-
      School of Natural Resources and the Environment,
      University of Arizona
    orcid: 'https://orcid.org/0000-0003-2929-0833'
repository-code: >-
  https://github.com/UArizonaGSICampusLivingLab/gsi-wrangling-workflow
license: MIT
GitHub Events
Total
- Issues event: 2
- Delete event: 1
- Issue comment event: 6
- Push event: 5
- Pull request event: 4
- Create event: 2
Last Year
- Issues event: 2
- Delete event: 1
- Issue comment event: 6
- Push event: 5
- Pull request event: 4
- Create event: 2