https://github.com/doi-usgs/rescaling-attributes-template
This repo contains a template {targets} pipeline for rescaling attributes to your intended spatial polygons.
Science Score: 31.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 2 of 2 committers (100.0%) from academic institutions
- ✓ Institutional organization owner: Organization doi-usgs has institutional domain (www.usgs.gov)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.6%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: DOI-USGS
- License: other
- Language: R
- Default Branch: main
- Size: 2.01 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md

Rescaling Attributes (Template Pipeline)

This repo contains a template {targets} pipeline for rescaling attributes to your intended spatial polygons.
Motivation
In the USGS Water Mission Area (WMA), our models and projects work with their own geospatial boundaries. Often we want to use data that was processed to one set of polygons, but first we need that data tied to our own polygons. The hard way is to recreate the initial study and aggregate the data with our polygons. An easier way to get a close estimate of the values we need is to find the amount of overlap between the two sets of polygons and rescale the attributes with a simple weighted mean, or another aggregation method that makes sense for the data in question.
This kind of rescaling has happened several times, and I have tried to document those cases in the table below:

| Project | Source Attributes | Source Polygons | Target Polygons | Contact | Data Release |
| :----------- | :-------------------- | :-------------- | :----------------------------- | :------------------- | :----------- |
| HyTest | NHM-PRMS | NHGF V1.0-1.2 | WBD 10-2020 HUC 12 (mainstems) | Sydney Fox | ? |
| Natl. IWAAs | Water Use | WBD ??? HUC12s | WBD 10-2020 HUC 12 (mainstems) | Anthony Martinez | In Progress |
| PUMP | Geospatial Attributes | NHDPlus V2.1 | NHGF V1.1 | Lauren Koenig-Snyder | In Progress |
| NHGF | Geospatial Attributes | NHDPlus V2.1 | WBD 10-2020 HUC 12 (mainstems) | Ellie White | In Progress |
| RIMBE-WM | Geospatial Attributes | NHDPlus V2.1 | WBD 10-2020 HUC 12 (mainstems) | Ellie White | In Progress |
| RIMBE-SED | SEDAC | County | WBD 10-2020 HUC 12 (mainstems) | Ellie White | In Progress |
Realizing this is a perpetual problem, and attempting to reduce duplicated workflows, we have made a template pipeline that can take in any source and/or target polygon.
Process
The pipeline takes in a set of variables of interest (a subset of the "CAT_[attribute]" variables in `nhdplusTools::get_characteristics_metadata()`). As of Feb. 2024, the pipeline has only been stress-tested with ~1,254 variables of interest, as opposed to the full 14,139 available in the dataset.
The source and target polygons in the template are the NHDPlusV2 (CONUS plus crude transboundary catchments) and WBD 10-2020 HUC12s (CONUS). The Area of Interest (AOI) is defined as the Delaware River Basin. We defined this AOI, rather than using the national polygons, to reduce the computational burden, but the same logic can be applied to a national analysis.
In phase 2, weights are built using ncdfgeom::calculate_area_intersection_weights(). The attributes are pulled with nhdplusTools::get_catchment_characteristics() and rescaled with basic dplyr functions such as mutate(), group_by(), and summarize(). The formulas below show what we are doing in the process phase.
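The rescaling step described above can be sketched with a toy example. This is a minimal illustration, not the pipeline's actual code: the tables, the column names (`source_id`, `target_id`, `w`, `value`), and the numbers are all made up, and the weights are assumed to be the share of each target polygon covered by each source polygon.

```r
library(dplyr)

# Toy source attributes: one value per source polygon (hypothetical IDs and values).
source_attrs <- tibble(
  source_id = c("s1", "s2", "s3"),
  value     = c(10, 20, 40)
)

# Toy weights table (assumed layout): one row per source/target overlap,
# where w is the share of the target polygon covered by the source polygon.
weights <- tibble(
  source_id = c("s1", "s2", "s2", "s3"),
  target_id = c("t1", "t1", "t2", "t2"),
  w         = c(0.25, 0.75, 0.5, 0.5)
)

# Weighted mean of the source values for each target polygon.
rescaled <- weights %>%
  left_join(source_attrs, by = "source_id") %>%
  mutate(weighted_value = w * value) %>%
  group_by(target_id) %>%
  summarize(
    value_rescaled = sum(weighted_value) / sum(w),
    .groups = "drop"
  )

print(rescaled)
# t1: (0.25 * 10 + 0.75 * 20) / 1 = 17.5
# t2: (0.5 * 20 + 0.5 * 40) / 1 = 30
```

Dividing by `sum(w)` keeps the result a mean even where the weights for a target do not add to exactly one. A weighted mean suits intensive attributes such as percentages; totals or counts would need a different aggregation.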

Phase 3 contains density plots and choropleth maps built for one variable of interest to check that the pipeline is running as intended. Phase 4 contains one map emphasizing areas where the weights should add to one. Caution should be taken in areas where they do not. In addition, we output a flagged version of the weights table in which target polygons are flagged if their weights do not add to one.
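The weights check can be sketched as follows. This is an illustration under assumptions rather than the pipeline's own code: the column names, the `flag_incomplete` column, and the tolerance value are hypothetical.

```r
library(dplyr)

# Toy weights table (hypothetical IDs); target t3 is only 60% covered.
weights <- tibble(
  source_id = c("s1", "s2", "s2", "s3", "s4"),
  target_id = c("t1", "t1", "t2", "t2", "t3"),
  w         = c(0.25, 0.75, 0.5, 0.5, 0.6)
)

tolerance <- 1e-6  # assumed tolerance for floating-point error

# Add the per-target weight sum and flag targets whose weights do not add to one.
weights_flagged <- weights %>%
  group_by(target_id) %>%
  mutate(
    w_sum           = sum(w),
    flag_incomplete = abs(w_sum - 1) > tolerance
  ) %>%
  ungroup()

print(weights_flagged)
```

Only the `t3` row is flagged here, since its weights add to 0.6. In practice a looser tolerance may be more appropriate, because sliver geometries rarely add to exactly one.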

Outputs
The pipeline produces two main outputs: a weights table and a rescaled attributes table, both in .csv format under 2_process/out/.
Attributes

Weights

How to run the pipeline
Package management with renv
This project uses renv to manage packages used by the pipeline. Renv works behind the scenes to ensure that the same package versions used by the pipeline are used across contributors. It installs specific versions of packages to a renv/ folder within the project directory that it loads when library() is invoked, even if the packages are installed elsewhere (e.g., in the .libPaths() path). When opening the project, renv should, behind the scenes, initiate itself and prompt the user for any additional actions needed. If this is the first time using renv on the project, it may take R a little while to set up as specific package versions are downloaded and installed. See Collaboration in renv for more information.
If this is your first time using renv, install it from CRAN. Then try loading the targets library. If you get an error message saying "there is no package called ‘targets’", run renv::install("targets") and follow the prompts. You may need to do the same thing for tarchetypes. Now, you are ready to run tar_make(). This will install a lot of packages and may take a while.
Run the pipeline
This project uses targets to run the pipeline. We assume you have some basic familiarity with it and proficiency in R. If you need help setting up or working through errors, please, contact Ellie White (ewhite@usgs.gov). Follow these steps to run the pipeline with the example data, which the pipeline will fetch for you, and the example attributes, which it will pull from the nhdplusTools package:
1) Open the rescaling-attributes-template.Rproj file in RStudio.
2) Open the _targets.R file.
3) Load in the targets library in the console with: library(targets).
4) Run tar_make() in the console.
5) If you see "End Pipeline [x minutes]" in the console, you have run the pipeline successfully. Go to 2_process/out to retrieve the results.
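Steps 3 to 5 can also be run from a script. The wrapper below is a hypothetical convenience, not part of the repository; it assumes the working directory is the project root and that renv has already installed targets.

```r
# Hypothetical helper (not in the repository): build the pipeline only if the
# working directory looks like the project root.
run_template_pipeline <- function() {
  if (!file.exists("_targets.R")) {
    message("No _targets.R found in ", getwd(), ". Open the .Rproj file first.")
    return(invisible(FALSE))
  }
  targets::tar_make()  # builds every outdated target defined in _targets.R
  # List whatever the pipeline wrote to the output folder named in step 5.
  print(list.files(file.path("2_process", "out")))
  invisible(TRUE)
}

run_template_pipeline()
```

Run outside the project, the function only prints a message and returns `FALSE`, so it is safe to source from anywhere.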
Your specific use case
Now you can modify the pipeline's fetch targets in 1_fetch_targets.R by substituting your own source, target, and area-of-interest geometries and attribute data, then run tar_make() again. Because we are using renv, you will need to install any additional packages with renv::install() and update the renv lockfile (similar to an environment.yaml in Python) with renv::snapshot(). If you want to modify the pipeline for national analysis (instead of a regional area of interest), simply substitute p2_source_intersected and p2_target_intersected with p2_source and p2_target, respectively.
Profiling
The most expensive target to build is the intersection of the source polygons with the area of interest, which takes ~6 min.
Built With
sessionInfo()
```
R version 4.3.0 (2023-04-21 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19045)
Matrix products: default
locale:
[1] LC_COLLATE=English_United States.utf8 LC_CTYPE=English_United States.utf8 LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C LC_TIME=English_United States.utf8
time zone: America/Chicago
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lubridate_1.9.2 forcats_1.0.0 stringr_1.5.1 dplyr_1.1.2 purrr_1.0.2 readr_2.1.4 tidyr_1.3.1
[8] tibble_3.2.1 ggplot2_3.4.2 tidyverse_2.0.0 targets_1.1.3
loaded via a namespace (and not attached):
[1] tidyselect_1.2.0 arrow_13.0.0.1 fastmap_1.1.1 digest_0.6.33 fst_0.9.8 base64url_1.4
[7] fstcore_0.9.14 timechange_0.2.0 mime_0.12 lifecycle_1.0.4 sf_1.0-13 ellipsis_0.3.2
[13] processx_3.8.1 magrittr_2.0.3 compiler_4.3.0 rlang_1.1.1 tools_4.3.0 igraph_1.4.3
[19] utf8_1.2.3 yaml_2.3.7 data.table_1.14.8 knitr_1.43 nhdplusTools_1.0.1 htmlwidgets_1.6.2
[25] bit_4.0.5 classInt_0.4-9 curl_5.0.1 xml2_1.3.4 abind_1.4-5 KernSmooth_2.23-20
[31] withr_3.0.0 grid_4.3.0 fansi_1.0.4 e1071_1.7-13 colorspace_2.1-0 scales_1.2.1
[37] cli_3.6.1 generics_0.1.3 rstudioapi_0.14 httr_1.4.7 tzdb_0.4.0 visNetwork_2.1.2
[43] DBI_1.2.1 pbapply_1.7-0 ncdfgeom_1.2.0 proxy_0.4-27 maps_3.4.1 stars_0.6-4
[49] assertthat_0.2.1 parallel_4.3.0 vctrs_0.6.5 jsonlite_1.8.5 callr_3.7.3 hms_1.1.3
[55] bit64_4.0.5 units_0.8-2 glue_1.6.2 RNetCDF_2.7-1 codetools_0.2-19 ps_1.7.5
[61] stringi_1.8.3 ncmeta_0.3.6 hydroloom_1.0.0 gtable_0.3.4 munsell_0.5.0 pillar_1.9.0
[67] htmltools_0.5.5 R6_2.5.1 sbtools_1.3.0 backports_1.4.1 class_7.3-21 Rcpp_1.0.10
[73] zip_2.3.0 xfun_0.39 pkgconfig_2.0.3
```
Planning
Pipeline planning happened in Mural.

Versioning
- v0.1.0 initial provisional release
- v0.1.1 bug fixes
Authors
- Ellie White - Lead Developer - USGS Water Mission Area
- Lauren Koenig-Snyder - Developer - USGS Water Mission Area
See also the list of contributors who participated in this project.
Contributing
We welcome contributions and suggestions from the community. Please consider reporting bugs or asking questions on the issues page. If you have contributions you would like considered for incorporation into the project you can fork this repository and submit a merge request for review.
Go here for details on adhering to the USGS Code of Scientific Conduct.
License
This project is licensed under the Creative Commons CC0 1.0 Universal License. See the LICENSE.md file for details.
Suggested Citation
In the spirit of open source, please cite any re-use of the source code stored in this repository. Below is the suggested citation:
- White, E. & Koenig-Snyder, L. & Blodgett, D. & Wieczorek, M. (2024). Rescaling Attributes Template. https://code.usgs.gov/wma/dsp/pipeline-templates/rescaling-attributes-template. [workflow]
This repository contains code produced for the Data Assembly function at the United States Geological Survey (USGS). As a work of the United States Government, this product is in the public domain within the United States.
Acknowledgments
- Thanks to David Blodgett for writing/modifying the main intersecting function used.
- Thanks to Michael Wieczorek for reviewing a major merge request and comparing this method to the one developed in Python.
- Thanks to Anthony Martinez for helping with the security and domain review for the release.
Owner
- Name: U.S. Geological Survey
- Login: DOI-USGS
- Kind: organization
- Email: gs_help_git@usgs.gov
- Location: United States of America
- Website: https://www.usgs.gov/
- Twitter: USGS
- Repositories: 59
- Profile: https://github.com/DOI-USGS
By integrating our diverse scientific expertise, we understand complex natural science phenomena and provide scientific products that lead to solutions.
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Martinez, Anthony James | a****z@u****v | 31 |
| ewhite | e****e@u****v | 28 |
Issues and Pull Requests
Last synced: 11 months ago
All Time
- Total issues: 0
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 0
- Total pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0