waterquac

R package for water quality data extraction and anomaly detection

https://github.com/unclecamswaterplans/waterquac

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.2%) to scientific vocabulary

Keywords

anomaly-detection anomaly-detection-algorithm api-wrapper r-package water-quality water-resources
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: UncleCamsWaterPlans
  • License: apache-2.0
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 25.5 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created almost 3 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

waterQUAC

An R library for quality control and anomaly detection in water quality datasets.

Overview

waterQUAC has been developed as a way of sharing water quality anomaly detection functions and data extraction resources. This package provides tools to streamline quality control processes for researchers and water quality professionals.

Features:

Detecting anomalies in time series water quality or quantity data:

The waterQUAC R library was developed to provide water quality practitioners with tools for automated data validation and anomaly detection. The package includes the TSanom() function, which identifies anomalies in time series environmental sensor data by applying a combination of rule-based checks designed to detect:

- Physical bounds violations: Detects values falling outside manufacturer-specified sensor limits.
- Impossible values: Identifies values that are physically implausible, such as negative values from sensors reporting non-negative measurements.
- Flatlining: Identifies repeated or constant values using the rolling standard deviation over a defined time window. This is often associated with errors in digital sensors, whereby an unsuccessful measurement results in the sensor reporting the last value held in memory.
- Spikes: Detects sudden deviations from the rolling median centred on a predefined time window. The deviation of each value from the rolling median is compared to a threshold calculated from the rolling standard deviation of the same window. This technique is similar to Bollinger bands, a common metric in financial trading, substituting the mean with the median to reduce sensitivity to outliers and better represent the expected baseline signal.
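The four checks above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: `roll_apply()` and `flag_anomalies()` are hypothetical helpers, the flag labels and default thresholds are made up, the window is counted in observations (so regular sampling is assumed), and the order in which rules override one another is a guess.

```r
# Hypothetical helper: centred rolling statistic over a window of k observations.
# The window is truncated at the start and end of the series.
roll_apply <- function(x, k, fun) {
  half <- k %/% 2
  n <- length(x)
  vapply(seq_len(n), function(i) {
    fun(x[max(1, i - half):min(n, i + half)])
  }, numeric(1))
}

# Hypothetical rule-based flagger. Later rules override earlier ones (an assumption).
flag_anomalies <- function(x, sensor_min, sensor_max,
                           k = 11, n_sd = 3, flat_tol = 1e-8) {
  x <- as.numeric(x)
  med <- roll_apply(x, k, median)
  sdev <- roll_apply(x, k, sd)

  flag <- rep("OK", length(x))
  # Spikes: distance from the rolling median exceeds n_sd rolling standard deviations
  flag[which(abs(x - med) > n_sd * sdev)] <- "spike"
  # Flatlining: (near) zero rolling standard deviation around the observation
  flag[which(sdev < flat_tol)] <- "flatline"
  # Physical bounds: outside the manufacturer-specified sensor range
  flag[which(x < sensor_min | x > sensor_max)] <- "out_of_bounds"
  # Impossible values: negative readings from a non-negative measurement
  flag[which(x < 0)] <- "impossible"
  flag
}

flag_anomalies(c(12, -1, 700, 15), sensor_min = 0, sensor_max = 650, k = 3)
# "OK" "impossible" "out_of_bounds" "OK"
```

Because the rolling standard deviation in this sketch includes the suspect point itself, an isolated spike is only flagged when the window is wide enough relative to `n_sd` (roughly `sqrt(k) > n_sd`), which is why the default window here is 11 observations.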

TSanom() updates the quality code of each observation, giving the incoming dataset a first pass that allows end users and applications to omit obviously suspect data. Observations that are not flagged by any of the defined rules receive the default code [OK]. User-assigned quality codes can be preserved or selectively overwritten, ensuring that manual or expert-reviewed flags persist.
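The selective overwrite described above can be pictured with a small base R sketch. The helper `merge_quality()` and all code values below are hypothetical and chosen only for illustration; they are not waterQUAC's actual quality codes or internals.

```r
# Hypothetical merge step: an existing quality code is replaced by the
# automatically detected one only if it appears in `overwrite`.
merge_quality <- function(existing, detected, overwrite) {
  ifelse(existing %in% overwrite, detected, existing)
}

existing <- c(1000L, 2500L, 4500L, 3999L)  # made-up codes already on the data
detected <- c(10L, 20L, 20L, 10L)          # made-up codes from the automated rules

merge_quality(existing, detected, overwrite = 1:4000)
# 10 20 4500 10
```

In this sketch the third observation keeps its code (4500) because it falls outside the overwritable range, which mirrors how an expert-reviewed flag would survive an automated pass.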

These methods provide a first-pass filter for raw telemetry data, enabling automated QA/QC at scale. The functions have been integrated into Microsoft Azure-based engineering pipelines that batch process data from over 100 sensors across Queensland. These sensors monitor six key water quality parameters: Water Level, Conductivity, Turbidity, Temperature, Nitrate (as N), and Total Suspended Solids. The pipeline runs hourly, aligned with telemetry intervals, and outputs validated datasets to internal systems and external stakeholders.

Data Extraction

waterQUAC includes data-extraction functions for the following resources:

- Water Monitoring Information Portal
- Long Paddock - SILO
- OpenWeatherMap
- Eagle.IO

Installation

The library can be installed directly from GitHub: `remotes::install_github("https://github.com/UncleCamsWaterPlans/waterQUAC")`

Usage

Anomaly Detection

```r
library(waterQUAC)
library(plotly)

# Example Total Suspended Solids data frame
df <- waterQUAC::TSS_data

# Overwritable QC codes; all others are retained.
# In this case, all codes will be overwritten.
manual_codes <- 1:4000

# Upper and lower limits for the sensor used (Trios Opus)
sensorMin <- 0
sensorMax <- 650

tst <- ts_anom(
  df = df,
  overwrite = manual_codes,
  sensorMin = sensorMin,
  sensorMax = sensorMax
)

# Plot the series, coloured by the assigned quality code
tst |>
  plotly::plot_ly() |>
  plotly::add_markers(
    x = ~ ts,
    y = ~ Value,
    color = ~ Quality
  )
```

Data Extraction

```r
library(waterQUAC)
library(plotly)

# Import discharge data from WMIP (Herbert River at Ingham - 116001F)
discharge <- waterQUAC::wmip_hist("116001F", "discharge", "AT", "20220701", "20230630")

# Extract gridded weather observation data for that location (SILO)
rain <- waterQUAC::silo_grid(
  lat = "-18.62831",
  long = "146.16486",
  start = "20220701",
  finish = "20230630",
  username = "example@email.com.au"
)

# Plot daily rainfall against stream discharge
discharge %>%
  plot_ly() %>%
  add_trace(
    x = ~ time,
    y = ~ value,
    type = "scatter",
    mode = "lines",
    name = "Stream Discharge (m^3/s)",
    fill = "tozeroy",
    line = list(width = 2.5, dash = "solid"),
    connectgaps = TRUE
  ) %>%
  add_trace(
    x = ~ rain$Date,
    y = ~ rain$Rain,
    type = "bar",
    yaxis = "y3",
    name = "Daily Rainfall",
    marker = list(color = "darkblue", opacity = 0.3, size = 10)
  ) %>%
  layout(
    legend = list(orientation = "h"),
    xaxis = list(title = FALSE, showgrid = FALSE, domain = c(0, 0.85)),
    yaxis = list(
      title = list(text = "Stream Discharge (m^3/s)", font = list(size = 15)),
      showgrid = FALSE,
      side = "right"
    ),
    # Secondary, reversed axis so rainfall bars hang from the top of the plot
    yaxis3 = list(
      tickfont = list(color = "darkblue"),
      showgrid = FALSE,
      overlaying = "y",
      side = "right",
      anchor = "free",
      position = 0.92,
      autorange = "reversed",
      title = list(text = "Rainfall (mm)", font = list(size = 15))
    )
  ) %>%
  # Add modebar buttons
  config(
    modeBarButtonsToAdd = list(
      "drawline", "drawopenpath", "drawclosedpath",
      "drawcircle", "drawrect", "eraseshape"
    )
  )
```

Citation

If you use waterQUAC in your research, please cite accordingly (see About > Cite this repository).

Owner

  • Name: Cameron Roberts
  • Login: UncleCamsWaterPlans
  • Kind: user
  • Company: Queensland Department of Environment and Science

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: "Roberts"
    given-names: "Cameron"
    orcid: "https://orcid.org/0000-0002-2976-3154"
title: "waterQUAC: An R Library for Automated Water Quality Data Validation"
version: v1.0.0
date-released: 2022-02-09
url: "https://github.com/UncleCamsWaterPlans/waterQUAC"

GitHub Events

Total
  • Release event: 1
  • Delete event: 1
  • Push event: 9
  • Pull request event: 2
  • Create event: 2
Last Year
  • Release event: 1
  • Delete event: 1
  • Push event: 9
  • Pull request event: 2
  • Create event: 2

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 61
  • Total Committers: 3
  • Avg Commits per committer: 20.333
  • Development Distribution Score (DDS): 0.41
Past Year
  • Commits: 19
  • Committers: 3
  • Avg Commits per committer: 6.333
  • Development Distribution Score (DDS): 0.105
Top Committers
Name Email Commits
UncleCamsWaterPlans c****s@d****u 36
Cameron Roberts 8****s 24
souzadiasf 1****f 1
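
The all-time figures above can be reproduced from the commit counts in the table. The sketch below assumes that the Development Distribution Score is one minus the top committer's share of all commits; that definition is an assumption here, although it matches the listed value of 0.41.

```r
# Commit counts taken from the Top Committers table
commits <- c("UncleCamsWaterPlans" = 36, "Cameron Roberts" = 24, "souzadiasf" = 1)

# Average commits per committer: 61 / 3
avg <- mean(commits)
round(avg, 3)
# 20.333

# Assumed definition: DDS = 1 - (commits by top committer / total commits)
dds <- 1 - max(commits) / sum(commits)
round(dds, 2)
# 0.41
```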

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • UncleCamsWaterPlans (4)