https://github.com/doi-usgs/rescaling-attributes-template

This repo contains a template {targets} pipeline for rescaling attributes to your intended spatial polygons.

Science Score: 31.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    2 of 2 committers (100.0%) from academic institutions
  • Institutional organization owner
    Organization doi-usgs has institutional domain (www.usgs.gov)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.6%) to scientific vocabulary
Last synced: 5 months ago

Repository

This repo contains a template {targets} pipeline for rescaling attributes to your intended spatial polygons.

Basic Info
  • Host: GitHub
  • Owner: DOI-USGS
  • License: other
  • Language: R
  • Default Branch: main
  • Size: 2.01 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog Contributing License

README.md

USGS

Rescaling Attributes (Template Pipeline)

*Figure: Map of a HUC12 catchment where the outlet is the Delaware River Basin above Ranconcas Creek. The smaller NHDPlus catchments are nested within the catchment, except for one small catchment that crosses the HUC12 boundary at the outlet. There are two prominent labels: (1) `We have data here`, pointing to the smaller NHDPlus catchments, and (2) `We want data here`, pointing to the HUC12 boundary. The map also depicts rivers/streams and uses the World Topo Map as a base map.*

This repo contains a template {targets} pipeline for rescaling attributes to your intended spatial polygons.

Motivation

In WMA, our models and projects each work with their own geospatial boundaries. We often want to use data that were processed for one set of polygons, but first we need those data tied to our own polygons. The hard way to do that is to recreate the initial study and aggregate the data to our polygons. An easier way to get a close estimate is to compute the overlap between the two sets of polygons and rescale the attributes with a simple weighted mean, or another aggregation method that makes sense for the data in question.
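As a toy illustration with hypothetical numbers: suppose a target polygon is covered 70% by one source polygon and 30% by another. The rescaled estimate is just the overlap-weighted mean of the two source values:

```r
# Hypothetical attribute values for two source polygons
values <- c(10, 20)

# Fraction of the target polygon's area covered by each source polygon
weights <- c(0.7, 0.3)

# Overlap-weighted mean: the rescaled estimate for the target polygon
rescaled <- weighted.mean(values, weights)
```

Here the estimate is 0.7 · 10 + 0.3 · 20 = 13. For attributes that are counts or totals rather than densities, an area-weighted sum may be more appropriate than a mean.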

This kind of rescaling has happened a few times; I have tried to document them in the table below:

| Project | Source Attributes | Source Polygons | Target Polygons | Contact | Data Release |
| :---------- | :-------------------- | :------------- | :----------------------------- | :------------------- | :---------- |
| HyTest | NHM-PRMS | NHGF V1.0-1.2 | WBD 10-2020 HUC 12 (mainstems) | Sydney Fox | ? |
| Natl. IWAAs | Water Use | WBD ??? HUC12s | WBD 10-2020 HUC 12 (mainstems) | Anthony Martinez | In Progress |
| PUMP | Geospatial Attributes | NHDPlus V2.1 | NHGF V1.1 | Lauren Koenig-Snyder | In Progress |
| NHGF | Geospatial Attributes | NHDPlus V2.1 | WBD 10-2020 HUC 12 (mainstems) | Ellie White | In Progress |
| RIMBE-WM | Geospatial Attributes | NHDPlus V2.1 | WBD 10-2020 HUC 12 (mainstems) | Ellie White | In Progress |
| RIMBE-SED | SEDAC | County | WBD 10-2020 HUC 12 (mainstems) | Ellie White | In Progress |

Realizing this is a perpetual problem, and attempting to reduce duplicated workflows, we have made a template pipeline that can take in any source and/or target polygon.

Process

The pipeline takes in a set of variables of interest (a subset of the "CAT_[attribute]" characteristics in `nhdplusTools::get_characteristics_metadata()`). As of Feb. 2024, the pipeline has only been stress-tested with ~1,254 variables of interest, as opposed to the full 14,139 available in the dataset.

The source and target polygons in the template are the NHDPlusV2 catchments (CONUS plus crude transboundary catchments) and the WBD 10-2020 HUC12s (CONUS). The Area of Interest (AOI) is defined as the Delaware River Basin. We defined this AOI, rather than using the national polygons, to reduce the computational burden, but the same logic can be applied to a national analysis.

In phase 2, weights are built using `ncdfgeom::calculate_area_intersection_weights()`. The attributes are pulled with `nhdplusTools::get_catchment_characteristics()` and rescaled with basic dplyr functions such as `mutate()`, `group_by()`, and `summarize()`. The formulas below show what we are doing in the process phase.
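In symbols, assuming intersection-area weights normalized by source-polygon area (the exact normalization depends on the arguments passed to `ncdfgeom::calculate_area_intersection_weights()`), the rescaled value is an area-weighted mean:

$$
w_{ij} = \frac{A(s_i \cap t_j)}{A(s_i)}, \qquad
\hat{x}_j = \frac{\sum_i w_{ij}\, x_i}{\sum_i w_{ij}}
$$

where $s_i$ are the source polygons, $t_j$ the target polygons, $A(\cdot)$ denotes area, $x_i$ is the attribute value on source polygon $i$, and $\hat{x}_j$ is the rescaled estimate on target polygon $j$.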

Phase 3 contains density plots and choropleth maps built for one variable of interest to ensure the pipeline is running as intended. Phase 4 contains one map emphasizing areas where the weights should sum to one; caution should be taken in areas where they do not. In addition, we output a flagged version of the weights table in which target polygons are flagged if their weights did not sum to one.
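The flagging step can be sketched in a few lines of base R (the pipeline itself uses dplyr; the column names below are hypothetical):

```r
# Hypothetical weights table: one row per source-target overlap
weights <- data.frame(
  target_id = c("A", "A", "B", "B"),
  weight    = c(0.6, 0.4, 0.7, 0.2)  # target "B" only sums to 0.9
)

# Sum the weights per target polygon
weight_sums <- tapply(weights$weight, weights$target_id, sum)

# Flag target polygons whose weights do not sum to one (within tolerance)
flagged <- data.frame(
  target_id  = names(weight_sums),
  weight_sum = as.numeric(weight_sums),
  flag       = abs(as.numeric(weight_sums) - 1) > 1e-6
)
```

Targets flagged this way typically sit at the edge of the source coverage (for example, along the CONUS boundary), where part of their area has no source polygon beneath it.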

Outputs

The pipeline produces two main outputs: a weights table and a rescaled attributes table, both in .csv format under `2_process/out/`.

Attributes

Weights

How to run the pipeline

Package management with renv

This project uses renv to manage packages used by the pipeline. Renv works behind the scenes to ensure that the same package versions used by the pipeline are used across contributors. It installs specific versions of packages to a renv/ folder within the project directory that it loads when library() is invoked, even if the packages are installed elsewhere (e.g., in the .libPaths() path). When opening the project, renv should, behind the scenes, initiate itself and prompt the user for any additional actions needed. If this is the first time using renv on the project, it may take R a little while to set up as specific package versions are downloaded and installed. See Collaboration in renv for more information.

If this is your first time using renv, install it from CRAN. Then try loading the targets library. If you get an error message saying "there is no package called 'targets'", run `renv::install("targets")` and follow the prompts. You may need to do the same for tarchetypes. Now you are ready to run `tar_make()`; this will install a lot of packages and may take a while.

Run the pipeline

This project uses targets to run the pipeline. We assume you have some basic familiarity with it and proficiency in R. If you need help setting up or working through errors, please contact Ellie White (ewhite@usgs.gov). Follow these steps to run the pipeline with the example data (which the pipeline will fetch for you) and the example attributes (which it will pull from the nhdplusTools package):

1. Open the `rescaling-attributes-template.Rproj` file in RStudio.
2. Open the `_targets.R` file.
3. Load the targets library in the console with `library(targets)`.
4. Run `tar_make()` in the console.
5. If you see "End Pipeline [x minutes]" in the console, you have run the pipeline successfully. Go to `2_process/out` to retrieve the results.

Your specific use case

Now you can modify the pipeline's fetch targets in `1_fetch_targets.R` by substituting your own source polygons, target polygons, area-of-interest geometry, and attribute data, and run `tar_make()` again. Because we are using renv, you will need to install any additional packages with `renv::install()` and update the renv lockfile (similar to an environment.yaml in Python) with `renv::snapshot()`. If you want to modify the pipeline for a national analysis (instead of a regional area of interest), simply substitute `p2_source_intersected` and `p2_target_intersected` with `p2_source` and `p2_target`, respectively.
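For example, a substituted fetch target might look like the sketch below (the target name `p1_source_polygons` and the file path are hypothetical; match them to the actual target names in `1_fetch_targets.R`):

```r
# In 1_fetch_targets.R: point the source-polygon target at your own data
tar_target(
  p1_source_polygons,
  sf::st_read("1_fetch/in/my_source_polygons.gpkg")
)
```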

Profiling

The most expensive target to build is the intersection of the source polygons with the area of interest, which takes ~6 minutes.

Built With

SessionInfo()

```
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.utf8  LC_CTYPE=English_United States.utf8  LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C                           LC_TIME=English_United States.utf8

time zone: America/Chicago
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
 [1] lubridate_1.9.2 forcats_1.0.0   stringr_1.5.1   dplyr_1.1.2     purrr_1.0.2     readr_2.1.4     tidyr_1.3.1
 [8] tibble_3.2.1    ggplot2_3.4.2   tidyverse_2.0.0 targets_1.1.3

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.0   arrow_13.0.0.1     fastmap_1.1.1      digest_0.6.33      fst_0.9.8          base64url_1.4
 [7] fstcore_0.9.14     timechange_0.2.0   mime_0.12          lifecycle_1.0.4    sf_1.0-13          ellipsis_0.3.2
[13] processx_3.8.1     magrittr_2.0.3     compiler_4.3.0     rlang_1.1.1        tools_4.3.0        igraph_1.4.3
[19] utf8_1.2.3         yaml_2.3.7         data.table_1.14.8  knitr_1.43         nhdplusTools_1.0.1 htmlwidgets_1.6.2
[25] bit_4.0.5          classInt_0.4-9     curl_5.0.1         xml2_1.3.4         abind_1.4-5        KernSmooth_2.23-20
[31] withr_3.0.0        grid_4.3.0         fansi_1.0.4        e1071_1.7-13       colorspace_2.1-0   scales_1.2.1
[37] cli_3.6.1          generics_0.1.3     rstudioapi_0.14    httr_1.4.7         tzdb_0.4.0         visNetwork_2.1.2
[43] DBI_1.2.1          pbapply_1.7-0      ncdfgeom_1.2.0     proxy_0.4-27       maps_3.4.1         stars_0.6-4
[49] assertthat_0.2.1   parallel_4.3.0     vctrs_0.6.5        jsonlite_1.8.5     callr_3.7.3        hms_1.1.3
[55] bit64_4.0.5        units_0.8-2        glue_1.6.2         RNetCDF_2.7-1      codetools_0.2-19   ps_1.7.5
[61] stringi_1.8.3      ncmeta_0.3.6       hydroloom_1.0.0    gtable_0.3.4       munsell_0.5.0      pillar_1.9.0
[67] htmltools_0.5.5    R6_2.5.1           sbtools_1.3.0      backports_1.4.1    class_7.3-21       Rcpp_1.0.10
[73] zip_2.3.0          xfun_0.39          pkgconfig_2.0.3
```

Planning

Pipeline planning happened in Mural (see the plan).

Versioning

  • v0.1.0 initial provisional release
  • v0.1.1 bug fixes

Authors

See also the list of contributors who participated in this project.

Contributing

We welcome contributions and suggestions from the community. Please consider reporting bugs or asking questions on the issues page. If you have contributions you would like considered for incorporation into the project you can fork this repository and submit a merge request for review.

Go here for details on adhering to the USGS Code of Scientific Conduct.

License

This project is licensed under the Creative Commons CC0 1.0 Universal License; see the LICENSE.md file for details.

Suggested Citation

In the spirit of open source, please cite any re-use of the source code stored in this repository. Below is the suggested citation:

- White, E., Koenig-Snyder, L., Blodgett, D., & Wieczorek, M. (2024). *Rescaling Attributes Template*. https://code.usgs.gov/wma/dsp/pipeline-templates/rescaling-attributes-template. [workflow]

This repository contains code produced for the Data Assembly function at the United States Geological Survey (USGS). As a work of the United States Government, this product is in the public domain within the United States.

Acknowledgments

  • Thanks to David Blodgett for writing/modifying the main intersecting function used.
  • Thanks to Michael Wieczorek for reviewing a major merge request and comparing this method to the one developed in Python.
  • Thanks to Anthony Martinez for helping with the security and domain review for the release.

Owner

  • Name: U.S. Geological Survey
  • Login: DOI-USGS
  • Kind: organization
  • Email: gs_help_git@usgs.gov
  • Location: United States of America

By integrating our diverse scientific expertise, we understand complex natural science phenomena and provide scientific products that lead to solutions.


Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 59
  • Total Committers: 2
  • Avg Commits per committer: 29.5
  • Development Distribution Score (DDS): 0.475
Past Year
  • Commits: 59
  • Committers: 2
  • Avg Commits per committer: 29.5
  • Development Distribution Score (DDS): 0.475
Top Committers
Name Email Commits
Martinez, Anthony James a****z@u****v 31
ewhite e****e@u****v 28

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0