gsi-wrangling-workflow

Automated data wrangling workflow for Green Stormwater Infrastructure Living Lab at University of Arizona

https://github.com/uarizonagsicampuslivinglab/gsi-wrangling-workflow

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: UArizonaGSICampusLivingLab
  • License: mit
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 1.44 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Created over 2 years ago · Last pushed 10 months ago
Metadata Files
Readme License Citation

README.md

gsi-wrangling-workflow

Project Status: Active. The project has reached a stable, usable state and is being actively developed.

This repository contains code to automatically collect and wrangle data from the GSI Living Lab at University of Arizona. The data set is available upon request. Request data here: GSI Living Lab Data Request.

How does it work?

This repository houses gsi_wrangling.Rmd and gsi_archive.Rmd, both of which are published (manually) to Posit Connect as scheduled workflows. gsi_wrangling.Rmd runs daily to pull the most recent data for the Campus Living Lab sites from ZentraCloud, wrangle the data, and append it to a .csv file stored on Box. gsi_archive.Rmd runs monthly to pull the most recent data and metadata from Box and upload them to Zenodo as a new version of 10.5281/zenodo.10823037.

The gsi-dashboard repository contains code for a Shiny app that is automatically deployed to Posit Connect (using GitHub Actions) when updates are made to the main branch. This Shiny app reads in the data from Box on start-up and provides interactive visualizations of the data.
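The "append" step of the daily workflow can be illustrated with a minimal base R sketch. It is not taken from gsi_wrangling.Rmd: the function name `append_new_readings()` and the column names `site`, `datetime` and `value` are assumptions made for the example. It shows one way to combine previously stored rows with a fresh pull without duplicating timestamps that appear in both.

``` r
# Hypothetical helper (not from this repository): combine stored readings
# with a fresh pull, keeping one row per key. Column names are assumptions.
append_new_readings <- function(existing, new, keys = c("site", "datetime")) {
  stopifnot(
    all(keys %in% names(existing)),
    all(names(existing) %in% names(new))
  )
  # Align column order, then stack old rows above new rows
  combined <- rbind(existing, new[names(existing)])
  # fromLast = TRUE flags the earlier copy, so the most recent pull wins
  combined <- combined[!duplicated(combined[keys], fromLast = TRUE), , drop = FALSE]
  # Sort by the key columns so the stored file has a stable layout
  combined <- combined[do.call(order, unname(as.list(combined[keys]))), , drop = FALSE]
  rownames(combined) <- NULL
  combined
}

existing <- data.frame(
  site = c("A", "A"),
  datetime = as.POSIXct(c("2024-01-01 00:00", "2024-01-01 01:00"), tz = "UTC"),
  value = c(1.0, 2.0)
)
new <- data.frame(
  site = c("A", "A"),
  datetime = as.POSIXct(c("2024-01-01 01:00", "2024-01-01 02:00"), tz = "UTC"),
  value = c(2.5, 3.0)
)
updated <- append_new_readings(existing, new)
```

Letting the newest pull win on overlapping timestamps is a design choice for this sketch; the real workflow may handle overlaps differently.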

Contributing

To run the code in this repo locally, you'll need to set up access to Zentra Cloud and to the Box API.

renv

This project uses renv for package management. When you open this R Project, renv will bootstrap itself and should prompt you to run renv::restore() to install all dependencies. If for some reason renv::restore() doesn't work for you, you can deactivate renv with renv::deactivate() and install packages the usual way. renv is primarily used in this project for publishing to Posit Connect, and shouldn't be necessary for you to run any of the code locally.

Zentra Cloud

  1. Create a .Renviron file (e.g. with usethis::edit_r_environ("project")) and add an environment variable for the Zentra Cloud API token

ZENTRACLOUD_TOKEN=<token>

  2. If for some reason renv::restore() didn't install the zentracloud R package, you can install it from r-universe or directly from GitLab

``` r
# r-universe installation
install.packages('zentracloud', repos = c('https://cct-datascience.r-universe.dev', 'https://cloud.r-project.org'))

# GitLab installation
pak::pkg_install("gitlab::meter-group-inc/pubpackages/zentracloud")
```

  3. The token in .Renviron is not automatically read in by zentracloud, so you'll find code to set options at the top of most scripts:

``` r
zentracloud::setZentracloudOptions(
  token = Sys.getenv("ZENTRACLOUD_TOKEN"),
  domain = "default"
)
```
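Because `Sys.getenv()` returns an empty string when a variable is missing, a forgotten token can go unnoticed until an API call fails. The sketch below shows one way to fail early with a clearer message. `require_env()` is a hypothetical helper and is not part of zentracloud or of this repository.

``` r
# Hypothetical helper: return the value of an environment variable, or stop
# with an explicit message when it is missing or empty.
require_env <- function(name) {
  value <- Sys.getenv(name, unset = "")
  if (!nzchar(value)) {
    stop(
      "Environment variable ", name,
      " is not set; add it to .Renviron and restart R.",
      call. = FALSE
    )
  }
  value
}

# Possible use when configuring zentracloud (left commented out so this
# snippet runs without a token):
# zentracloud::setZentracloudOptions(
#   token = require_env("ZENTRACLOUD_TOKEN"),
#   domain = "default"
# )
```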

Box

The Box API is accessed using the boxr package. You'll find instructions on how to authenticate with Box on the boxr website.

If you're a collaborator just interested in running this code locally, you can follow these instructions to authenticate to Box as a user (this is called an "interactive app", which is a little confusing). Once you've followed those instructions and have added a BOX_CLIENT_ID and BOX_CLIENT_SECRET to the .Renviron file, just be sure to replace box_auth_service() with box_auth() and you should be able to run the code in gsi_wrangling.Rmd.

This automated workflow uses a service app to upload data to a shared Box folder when gsi_wrangling.Rmd is run on Posit Connect. If you need to change any settings or get credentials for this service app, you'll need to request access from Vanessa Buzzard or the CCT Data Science group. There are additional UA-specific instructions on setting up a service app here: https://cct-datascience.github.io/group-procedures/boxr.html.

R/box_app_setup.R contains some code used when setting up the Box service app authentication (think of it as notes rather than a script to run). The only thing I've done differently from the boxr documentation is to copy the contents of the .boxr-auth file and add them as an environment variable, BOX_TOKEN_TEXT.
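The two authentication routes described above (a service app on Posit Connect, an interactive app for local collaborators) could be selected automatically from the environment variables that are present. The sketch below is an illustration under that assumption, not code from this repository. `box_auth_mode()` and `authenticate_box()` are hypothetical names, and the boxr calls rely on `box_auth_service()` accepting a `token_text` argument and on `box_auth()` reading BOX_CLIENT_ID and BOX_CLIENT_SECRET from the environment, as described in the boxr documentation.

``` r
# Hypothetical helper: decide which boxr authentication route applies,
# based on which environment variables are set.
box_auth_mode <- function(env = Sys.getenv()) {
  has <- function(x) x %in% names(env) && nzchar(env[[x]])
  if (has("BOX_TOKEN_TEXT")) {
    "service"
  } else if (has("BOX_CLIENT_ID") && has("BOX_CLIENT_SECRET")) {
    "interactive"
  } else {
    "none"
  }
}

# Hypothetical wrapper: call the matching boxr function.
# Requires the boxr package to be installed when it is actually called.
authenticate_box <- function() {
  switch(
    box_auth_mode(),
    service = boxr::box_auth_service(token_text = Sys.getenv("BOX_TOKEN_TEXT")),
    interactive = boxr::box_auth(),
    stop("No Box credentials found in the environment.", call. = FALSE)
  )
}
```

With a wrapper like this, the same document could run locally and on Posit Connect without manually replacing one authentication call with the other.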

Posit Connect

Both gsi_wrangling.Rmd and gsi_archive.Rmd are published "manually" on Posit Connect at UA (viz.datascience.arizona.edu). They only need to be re-published if there are changes made to those documents. Anyone who is a collaborator should be able to publish these documents via RStudio (instructions). The environment variables ZENTRACLOUD_TOKEN and BOX_TOKEN_TEXT need to be set on Posit Connect for these workflows to run (setting env variables on Posit Connect). This should only have to happen once, and not each time a document is re-published.

Files

In R/ you will find:

  • box_app_setup.R: some code I used when first setting up Box authentication. Not to be run again, but just as an example.
  • estimate_data_size.R: a script for extrapolating data size
  • gsi_get_data.R: a function, gsi_get_data(), for downloading and wrangling data from the Zentra Cloud API.
  • gsi_get_eto.R: a function, gsi_get_eto(), for downloading potential evapotranspiration data from the ZentraCloud models API endpoint.
  • Other functions used to calculate variables such as heat index, wind chill, etc.
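The helper functions themselves are not reproduced here. As an illustration of the kind of derived variable involved, the sketch below implements the US National Weather Service wind chill formula. The function name, the choice of Fahrenheit and mph, and the handling of out-of-range inputs are assumptions for this example and may differ from the functions in R/.

``` r
# NWS wind chill formula.
# temp_f: air temperature in degrees Fahrenheit; wind_mph: wind speed in mph.
# Returns NA where the formula is not defined (temp > 50 F or wind <= 3 mph).
wind_chill_f <- function(temp_f, wind_mph) {
  wc <- 35.74 + 0.6215 * temp_f - 35.75 * wind_mph^0.16 +
    0.4275 * temp_f * wind_mph^0.16
  ifelse(temp_f <= 50 & wind_mph > 3, wc, NA_real_)
}

# Vectorised use: one cold, windy reading and one warm reading
wc <- wind_chill_f(temp_f = c(0, 60), wind_mph = c(15, 15))
```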

Notes

Two sensors at Old Main were plugged into incorrect ports upon installation. On December 11, 2024, the plugs for ports 3 and 5 on the Old Main z6-19485 logger were swapped, so the port and location pairing now matches the actual location of each sensor. See site_info.csv for information on how to correct pre-December 2024 data during analysis.
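One possible shape for such a correction is sketched below. It is not the documented procedure: site_info.csv remains the authoritative source for whether port or location labels should be changed and for how readings from the day of the swap are treated. The function name and the column names `logger`, `port` and `date` are assumptions for the example.

``` r
# Hypothetical sketch: relabel ports 3 and 5 on the Old Main logger for
# readings recorded before the plugs were swapped.
fix_old_main_ports <- function(data, logger_id = "z6-19485",
                               swap_date = as.Date("2024-12-11")) {
  stopifnot(all(c("logger", "port", "date") %in% names(data)))
  affected <- data$logger == logger_id & data$date < swap_date
  old_port <- data$port
  # which() drops NA comparisons so rows with missing values are left alone
  data$port[which(affected & old_port == 3)] <- 5
  data$port[which(affected & old_port == 5)] <- 3
  data
}

readings <- data.frame(
  logger = c("z6-19485", "z6-19485", "z6-19485", "z6-00000"),
  port = c(3, 5, 3, 3),
  date = as.Date(c("2024-12-01", "2024-12-01", "2024-12-12", "2024-12-01"))
)
corrected <- fix_old_main_ports(readings)
```

Only rows from the named logger dated before the swap are changed; later rows and other loggers pass through untouched.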

Contributors

Developed in collaboration with the University of Arizona CCT Data Science team.

Owner

  • Name: UArizonaGSICampusLivingLab
  • Login: UArizonaGSICampusLivingLab
  • Kind: organization

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Campus Living Lab Green Stormwater Infrastructure Data
  Workflow
message: >-
  To cite the code in this repository, use the metadata in this
  file. If you wish to use the data and cite it, please use 
  citation information on the data request form 
  <https://forms.gle/63qWCybhvHaHunuH6>.
type: software
version: 0.1.0
authors:
  - given-names: Eric R
    family-names: Scott
    orcid: 'https://orcid.org/0000-0002-7430-7879'
    affiliation: >-
      Communications & Cyber Technologies, Arizona
      Experiment Station, University of Arizona
    email: ericrscott@arizona.edu
  - given-names: Malcolm Javier
    family-names: Barrios
    affiliation: 'College of Engineering, University of Arizona'
  - given-names: Kristina
    family-names: Riemer
    orcid: 'https://orcid.org/0000-0003-3802-3331'
    affiliation: >-
      Communications & Cyber Technologies, Arizona
      Experiment Station, University of Arizona
  - given-names: Vanessa
    family-names: Buzzard
    affiliation: >-
      School of Natural Resources and the Environment,
      University of Arizona
    orcid: 'https://orcid.org/0000-0003-2929-0833'
repository-code: >-
  https://github.com/UArizonaGSICampusLivingLab/gsi-wrangling-workflow
license: MIT

GitHub Events

Total
  • Issues event: 2
  • Delete event: 1
  • Issue comment event: 6
  • Push event: 5
  • Pull request event: 4
  • Create event: 2
Last Year
  • Issues event: 2
  • Delete event: 1
  • Issue comment event: 6
  • Push event: 5
  • Pull request event: 4
  • Create event: 2