AHGestimation
AHGestimation: An R package for computing robust, mass preserving hydraulic geometries and rating curves - Published in JOSS (2024)
Science Score: 95.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 4 DOI reference(s) in README and JOSS metadata -
✓Academic publication links
Links to: joss.theoj.org -
✓Committers with academic emails
2 of 2 committers (100.0%) from academic institutions -
○Institutional organization owner
-
✓JOSS paper metadata
Published in Journal of Open Source Software
Last synced: 6 months ago
·
JSON representation
Repository
Estimating robust, mass conserving AHG relationships
Basic Info
- Host: GitHub
- Owner: mikejohnson51
- License: mit
- Language: HTML
- Default Branch: master
- Homepage: https://mikejohnson51.github.io/AHGestimation/
- Size: 42.2 MB
Statistics
- Stars: 6
- Watchers: 3
- Forks: 2
- Open Issues: 0
- Releases: 1
Created about 6 years ago
· Last pushed 11 months ago
Metadata Files
Readme
License
README.Rmd
---
output: github_document
bibliography: [paper/paper.bib]
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
dev = "jpeg",
warning = FALSE,
message = FALSE
)
library(dplyr)
library(ggplot2)
library(patchwork)
```
[](https://doi.org/10.21105/joss.06145)
[](https://github.com/mikejohnson51/AHGestimation/actions/workflows/R-CMD-check.yaml)
[](https://choosealicense.com/licenses/mit/)
[](https://www.repostatus.org/#active)
[](https://codecov.io/github/mikejohnson51/AHGestimation)
[](#)
[](https://github.com/mikejohnson51/AHGestimation/actions/workflows/pkgdown.yaml)
# AHGestimation
## Installation
```{r, eval = FALSE}
# install.packages("remotes")
remotes::install_github("mikejohnson51/AHGestimation")
```
# Introduction
The behavior of a river system at a location can be represented as the relationship between discharge (Q), mean depth (Y), mean velocity (V), and mean top width (TW). If you have these measurements over a long period of time, a relationship between how much water (Q) is in the channel and the corresponding Y, V, and TW can be established. The idea of 'at-a-station hydraulic geometry' (AHG) suggests three power laws can adequately describe these relations [@leopold1953hydraulic]:
$$
TW = a \cdot Q^b
$$
$$
Y = c \cdot Q^f
$$
$$
V = k \cdot Q^m
$$
Each **single relationships** describes a component of the channel behavior (see section [Fitting single relationships]). For example, the Q-Y relation is similar to the **traditional rating curve** that relates stage to discharge where stage is given as a height of water (H) above a datum (Ho).
$$
Q = c \cdot (H - H_o)^f
$$
Traditionally, each relationship has been fit independent of one another using ordinary least square regression (OLS) on the log transformed time series of the respective components.
Some research has moved beyond pure OLS fitting with a (typically) singular focus on the rating curve relation. For example the Texas Water Resources Institute suggests a non linear least squares regression (NLS) ([here](https://txwri.github.io/r-manual/stage-discharge.html)), and Bayesian hierarchical modeling have been shown in other cases to be beneficial ([@hrafnkelsson2022generalization], R implementation [here](https://CRAN.R-project.org/package=bdrc))
When trying to fit a **hydraulic system** - or - all three equations - there are additional considerations . Notably continuity dictates that no water is gained or lost such that:
$$
Q = TW·Y·V
$$
and:
$$
(b + f + m) = (a·c·k) = 1
$$
**Critically, neither OLS or NLS solvers can ensure this characteristic of the solution.** (see [Fitting hydraulic systems] and this [article](https://mikejohnson51.github.io/AHGestimation/articles/traditonal-ahg.html)). Instead, this can be achieved by either
(1) Preprocessing data based on thresholds (see @enzminger_thomas_l_2023_7868764 and @afshari2017statistical) (see AHGestimation functions in [here](https://mikejohnson51.github.io/AHGestimation/articles/data-filtering.html)), or
(2) as we suggest in this package, using a evolutionary solver - along with OLS and NLS fits - to optimally solve relationships in accordance with each other (see detailed description [here](https://mikejohnson51.github.io/AHGestimation/articles/improved-ahg.html)).
The aim of this package is to provide increased AHG fitting flexibility, that ensures mass conservation in hydraulic systems, and optimal curve fitting for single relations.
# AHGestimation
Using USGS field measurements at the gage (National Water Information System or NWIS site) located on [Nashua River at East Pepperell, MA](https://waterdata.usgs.gov/nwis/measurements/?site_no=01096500&agency_cd=USGS) we can illustrate 4 capabilities this package offers:
1. Fitting single relations with OLS and NLS ([Fitting single relationships])
2. Estimating hydraulic system with a mass conserving ensemble fitting method ([Fitting hydraulic systems])
3. Data preprocessing (removing outliers based on defined criteria) ([Data Filtering])
4. Deriving cross section shapes and traits from the AHG fits ([Hydraulic Shape Estimation]).
The example data is included in the package, and its development can be seen [here](https://github.com/mikejohnson51/AHGestimation/blob/master/data-raw/nwis.R).
```{r}
library(AHGestimation)
glimpse(nwis)
```
```{r, echo = F}
ggplot(data = nwis ) +
geom_point(aes(x = Q_cms, y = Y_m)) +
theme_light() +
labs(title = 'Q-Y Relationship', x = "Q (cms)", y = "Y (m)") +
ggplot(data = nwis ) +
geom_point(aes(x = Q_cms, y = TW_m)) +
theme_light() +
labs(title = 'Q-TW Relationship', x = "Q (cms)", y = "TW (m)") +
ggplot(data = nwis ) +
geom_point(aes(x = Q_cms, y = V_ms)) +
theme_light() +
labs(title = 'Q-V Relationship', x = "Q (cms)", y = "V (m/s)")
```
## Fitting single relationships
The `AHGestimation::ahg_estimate(...)` function can be used to fit single relationships using both NLS and OLS solutions. Below we fit the Q-Y relation by passing a `data.frame` with the column names `Q` and `Y`. More columns could be included in the input, but only those with `Q`, `Y`, `V`, and/or `TW` are used.
```{r}
(sf = nwis %>%
#input column names must be Q,Y,V and/or TW
select(date, Q = Q_cms, Y = Y_m) %>%
ahg_estimate())
```
The returned object rank sorts the solutions based on the nrmse of the simulated Y values compared to the input data. For convenience, the pBias is also reported. Below, we can see the variation in the NLS (blue) and OLS (red) solutions.
```{r, echo = FALSE}
ggplot() +
geom_point(data = nwis, aes(x = Q_cms, y = Y_m), col = "black") +
geom_line(data = nwis, aes(x = Q_cms, y = sf$coef[1] * Q_cms ^ (sf$exp[1]), col = "NLS")) +
geom_line(data = nwis, aes(x = Q_cms, y = sf$coef[2] * Q_cms ^ (sf$exp[2]), col = "OLS")) +
theme_light() +
theme(legend.position = "bottom") +
labs(title = 'Q-Y Relationship', color = "Fit", x = "Q (cms)", y = "Y (m)") +
scale_color_manual(values = c( "NLS" = "blue", "OLS" = "red"))
```
When 2 relationships (e.g. Q-Y and Q-V) are passed to `ahg_estimate(...)` the default behavior is to return a single solution based on the minimum nrmse. Since only 2 of the 3 hydraulic traits are passed, mass conservation cannot be checked here.
```{r}
(nwis %>%
#input column names must be Q,Y,V and/or TW
select(Q = Q_cms, Y = Y_m, V = V_ms) %>%
ahg_estimate())
```
If you would like all fits (OLS and NLS), the `full_fitting` parameter can be set to `TRUE`.
```{r}
(nwis %>%
#input column names must be Q,Y,V and/or TW
select(Q = Q_cms, Y = Y_m, V = V_ms) %>%
ahg_estimate(full_fitting = TRUE))
```
## Fitting hydraulic systems
When we have data for all three hydraulic relationships we can ensure the solutions found meet continuity/conserve mass.
In this mode the OLS and NLS models are fit first, and if continuity is not met in best solution (e.g. lowest nmse), then an Evolutionary Approach (nsga2; @mco) is implemented (see this [article](https://mikejohnson51.github.io/AHGestimation/articles/optimize-nsga2.html) for more details).
Doing so produces three unique fits for each relationship (27 total combinations). These are crossed to identify the best performing solution that meets continuity at a prescribed allowance. The allowance specifies the amount that each continuity expression can deviate from 1. More on this can be found at the vignette [here](https://mikejohnson51.github.io/AHGestimation/articles/improved-ahg.html)
As before, the results are rank ordered by minimum nrmse and viability (viable = does the solution meet continuity within the prescribed allowance?)
```{r}
(x = nwis %>%
select(Q = Q_cms, Y = Y_m, V = V_ms, TW = TW_m) %>%
ahg_estimate(allowance = .05))
```
Overall an combination of the NLS and nsga2 provides an error minimizing, viable solution:
```{r, echo = F}
ggplot() +
geom_point(data = nwis, aes(x = Q_cms, y = Y_m), col = "black") +
geom_line(data = nwis, aes(x = Q_cms, y = x$Y_coef[1] * Q_cms ^ (x$Y_exp[1]), col = "bestValid")) +
theme_light() +
labs(title = 'Q-Y Relationship', color = "Fit", x = "Q (cms)", y = "Y (m)") +
ggplot(data = nwis ) +
geom_point(data = nwis, aes(x = Q_cms, y = TW_m), col = "black") +
geom_line(data = nwis, aes(x = Q_cms, y = x$TW_coef[1] * Q_cms ^ (x$TW_exp[1]), col = "bestValid")) +
theme_light() +
labs(title = 'Q-TW Relationship', color = "Fit", x = "Q (cms)", y = "TW (m)") +
ggplot(data = nwis ) +
geom_point(data = nwis, aes(x = Q_cms, y = V_ms), col = "black") +
geom_line(data = nwis, aes(x = Q_cms, y = x$V_coef[1] * Q_cms ^ (x$V_exp[1]), col = "bestValid")) +
theme_light() +
labs(title = 'Q-V Relationship', color = "Fit", x = "Q (cms)", y = "V (m/s)") +
plot_layout(guides = "collect") &
theme(legend.position = 'bottom')
```
## Data Filtering
Due to the volatility of river systems, hydraulic data is often very noisy. While the `ahg_estimate` tool is intended to reduce this noise and produce a mass-conserving hydraulic fit, it is also possible to filter the data prior to fitting. The range of data filtering options provided are documented in the [data-filtering vignette](https://mikejohnson51.github.io/AHGestimation/articles/data-filtering.html) and an example is provided below:
```{r}
filtered_data = nwis %>%
select(date, Q = Q_cms, Y = Y_m, V = V_ms, TW = TW_m) %>%
# Keep the most recent 10 year
date_filter(year = 10, keep_max = TRUE) %>%
# Keep data within 3 Median absolute deviations (log residuals)
mad_filter() %>%
# Keep data that respects the Q = vA criteria w/in allowance
qva_filter()
(ahg_fit = ahg_estimate(filtered_data))
```
Ultimately we recommend selecting fits that conserve mass (`viable = TRUE`) and has the lowest error (any of tot_nrmse, V_nrmse, TW_nrmse, or Y_nrmse) depending on the use case.
When the data is effectively filtered we see NLS can provide an error minimizing, valid solution for the system that is quite different then the full data fit. Further, the nsga2 algorithm did not need to be invoked:
```{r, echo = F}
ggplot() +
geom_point(data = nwis, aes(x = Q_cms, y = Y_m, col = "Full")) +
geom_line(data = nwis, aes(x = Q_cms, y = x$Y_coef[1] * Q_cms ^ (x$Y_exp[1]), col = "Full")) +
geom_point(data = filtered_data, aes(x = Q, y = Y, col = "Filter")) +
geom_line(data = filtered_data, aes(x = Q, y = ahg_fit$Y_coef[1] * Q ^ (ahg_fit$Y_exp[1]), col = "Filter")) +
theme_light() +
labs(title = 'Q-Y Relationship') +
scale_color_manual(values = c("Filter" = "red", "Full" = "black")) +
ggplot(data = nwis ) +
geom_point(data = nwis, aes(x = Q_cms, y = TW_m, col = "Full")) +
geom_line(data = nwis, aes(x = Q_cms, y = x$TW_coef[1] * Q_cms ^ (x$TW_exp[1]), col = "Full")) +
geom_point(data = filtered_data, aes(x = Q, y = TW, col = "Filter")) +
geom_line(data = filtered_data, aes(x = Q, y = ahg_fit$TW_coef[1] * Q ^ (ahg_fit$TW_exp[1]), col = "Filter")) +
theme_light() +
labs(title = 'Q-TW Relationship') +
scale_color_manual(values = c("Filter" = "red", "Full" = "black")) +
ggplot(data = nwis) +
geom_point(data = nwis, aes(x = Q_cms, y = V_ms, col = "Full")) +
geom_line(data = nwis, aes(x = Q_cms, y = x$V_coef[1] * Q_cms ^ (x$V_exp[1]), col = "Full")) +
geom_point(data = filtered_data, aes(x = Q, y = V, col = "Filter")) +
geom_line(data = filtered_data, aes(x = Q, y = ahg_fit$V_coef[1] * Q ^ (ahg_fit$V_exp[1]), col = "Filter")) +
theme_light() +
labs(title = 'Q-V Relationship') +
scale_color_manual(values = c("Filter" = "red", "Full" = "black")) +
plot_layout(guides = "collect") &
theme(legend.position = 'bottom')
```
## Hydraulic Shape Estimation
Lastly, a range of functions have been added to extend the AHG parameters into cross section hydraulics and geometry. These come primarily from [@dingman2018field] and are described in detail [here](https://mikejohnson51.github.io/AHGestimation/articles/hydraulics.html)
```{r}
# Compute hydraulic parameters
(hydraulic_params = compute_hydraulic_params(ahg_fit))
# Estimate roughness
# Slope is taken from the NHD reach associated with our gage
compute_n(filtered_data, S = 0.01463675)
```
Of particular note, the `r` value describes the theoretical shape of the channel ranging from a triangle (r = 1) to a rectangle (r = ∞). When paired with a TW and Depth (here assuming max of the record) a generalized cross section can be derived. The returned data.frame provide a point index (from left bank looking upstream) the associated relative x and absolute Y position, and the cross sectional area for that Y.
```{r}
cs = cross_section(r = hydraulic_params$r,
TW = max(filtered_data$TW),
Ymax = max(filtered_data$Y))
glimpse(cs)
```
```{r, echo = FALSE}
ggplot(data = cs) +
geom_point(aes(x = x, y = Y)) +
geom_line(aes(x = x, y = Y)) +
theme_light() +
geom_hline(yintercept = max(cs$Y), color = "darkgreen") +
geom_hline(yintercept = 2, color = "blue", linewidth = 2) +
geom_vline(xintercept = mean(cs$x), color = "darkgreen") +
labs(x = "Relative Width", y = "Depth", title = "Shape") +
ggplot(data = cs) +
geom_point(aes(x = A, y = Y)) +
geom_line(aes(x = A, y = Y)) +
geom_hline(yintercept = 2, color = "blue", linewidth = 2) +
theme_light() +
labs(Y = "Depth", x = "Cross Sectional Area", title = "Area to Depth")
```
# History
The development of this package began as a graduate school project between friends at UC Santa Barbara and UMass Amherst following the 2017 NOAA OWP Summer Institute and clear evidence channel shape may be a limiting factor in National Water Model Performance. It has since evolved to provide an open source utility for robust large scale data synthesis and evaluation. Funding from the National Science Foundation (Grants 1937099, 2033607) provided time to draft @preprint and apply an early version of this software to the [Continental Flood Inundation Mapping (CFIM) synthetic rating curve dataset [@cfim]. Funding from the National Oceanic and Atmospheric Administration's Office of Water Prediction supported the addition of data filtering and hydraulic estimation, improved documentation, and code improvement We are grateful to all involved.
# Contributing
First, thanks for considering a contribution! We hope to make this package a community created resource!
- Please attempt to describe what you want to do prior to contributing by submitting an issue.
- Please follow the typical github fork - pull-request workflow.
- Contributions should be tested with `testthat` by running `devtools::test()`.
- Code style should attempt to follow the [tidyverse style guide](https://style.tidyverse.org/).
- Make sure you use `roxygen` and run `devtools::check()` before contributing.
Other notes:
- Consider running `goodpractice::gp()` on the package before contributing.
- Consider running `devtools::spell_check()` and `devtools::document()` if you wrote documentation.
- Consider running `devtools::build_readme()` if you made any changes.
- This package uses pkgdown. Running `pkgdown::build_site()` will refresh it.
# References
Owner
- Name: MikeJohnson-NOAA
- Login: mikejohnson51
- Kind: user
- Location: Fort Collins, CO
- Company: @lynker-spatial, @NOAA-OWP
- Website: http://mikejohnson51.github.io
- Repositories: 134
- Profile: https://github.com/mikejohnson51
Geography | Data Science | Water Resources
JOSS Publication
AHGestimation: An R package for computing robust, mass preserving hydraulic geometries and rating curves
Published
April 18, 2024
Volume 9, Issue 96, Page 6145
Authors
Tags
hydrology AHG optimization hydraulicsGitHub Events
Total
- Push event: 13
- Pull request event: 1
Last Year
- Push event: 13
- Pull request event: 1
Committers
Last synced: 7 months ago
Top Committers
| Name | Commits | |
|---|---|---|
| mikejohnson51 | j****0@u****u | 66 |
| arashmodrad | a****d@u****u | 4 |
Committer Domains (Top 20 + Academic)
u.boisestate.edu: 1
ucsb.edu: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 4
- Total pull requests: 6
- Average time to close issues: 24 days
- Average time to close pull requests: about 1 month
- Total issue authors: 3
- Total pull request authors: 3
- Average comments per issue: 4.25
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 7 months
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- mikejohnson51 (2)
- mabesa (2)
Pull Request Authors
- arashmodrad (4)
- mikejohnson51 (2)
- JimColl (1)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
.github/workflows/R-CMD-check.yaml
actions
- actions/checkout v2 composite
- actions/upload-artifact master composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/test-coverage.yaml
actions
- actions/checkout v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
DESCRIPTION
cran
- R >= 3.5.0 depends
- DescTools * imports
- dplyr * imports
- mco * imports
- stats * imports
- distill * suggests
- ggplot2 * suggests
- kableExtra * suggests
- knitr * suggests
- nhdplusTools * suggests
- patchwork * suggests
- rmarkdown * suggests
- testthat >= 3.0.0 suggests
- tidyr * suggests
