https://github.com/csu-agricultural-water-quality-program/als-data-cleaning-tool

A coding tool developed in R to take water analysis results exported from the ALS WEBTRIEVE™ data portal. Exported data are cleaned, merged, and exported into archiving (e.g., CSV) or visual (e.g., HTML) formats.

https://github.com/csu-agricultural-water-quality-program/als-data-cleaning-tool

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.2%) to scientific vocabulary

Keywords

data-cleaning data-visualization water-quality
Last synced: 5 months ago · JSON representation

Repository

A coding tool developed in R to take water analysis results exported from the ALS WEBTRIEVE™ data portal. Exported data are cleaned, merged, and exported into archiving (e.g., CSV) or visual (e.g., HTML) formats.

Basic Info
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
data-cleaning data-visualization water-quality
Created over 2 years ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Authors

README.md

AWQP Logo

ALS Data Cleaning Tool

A coding tool developed in R to take water analysis results exported from the ALS WEBTRIEVE™ data portal. Exported data are cleaned, merged, and exported into archiving (e.g., CSV) or visual (e.g., HTML) formats. The tool also merges data produced in-house (i.e., pH, Total Suspended Soilds, and Specific Electrical Conductivity) with the final exported data. Before using, please see the data disclosure statement below.

Created By:

A.J. Brown
Agricultural Data Scientist
CSU Agricultural Water Quality Program
Ansley.Brown@colostate.edu

and

Molly "Caz" Bell
Former AWQP Student Research Assitant


Table of Contents

- Data Disclosure Statement for ALS-Data-Cleaning-Tool

Repo Contents

  • Code
    • config.R - an R script where data file paths are declared for use in the other scripts. This is the only script that needs to be edited by the user to run the code locally.
    • file-merger.R - a script that takes the raw files from ALS, cleans the data, and exports as a dataframe and/or CSV file for other uses
    • site_selector.Rmd - a markdown script that produces an HTML report with interactive data visualization and optional data export for a specific research site location as chosen by the user
    • water_report.Rmd - a markdown script that produces a summary of all water quality data for all sites on a YTD basis
  • Data
  • figures
    • a folder containing the AWQP logo for use in this README.
  • Report
    • a folder containing the resulting HMTL report that result from execution of the "water_report.Rmd" script.
    • Please note that this is not where the HTML report is generated, but rather where it was moved after generation. It is currently generated in the Code folder, which is an ongoing bug to be addressed at a later time.
  • siteselectortemporary_results
    • a folder containing the resulting HMTL report and CSV file that result from execution of the "site_selector.Rmd" script.
    • Please note that this is not where the HTML report is generated, but rather where it was moved after generation. It is currently generated in the Code folder, which is an ongoing bug to be addressed at a later time. The CSV, however, is generated in this folder upon execution.
  • TSS
    • a separate folder where an excel file exists containing example total suspended solids, electrical conductivity, and pH data is stored in the format congruent with what AWQP staff use in-house. It is in a separate folder because this mimics the true deployment of this code outside of the github repo.
  • ALSDataCleaning_Tool.Rproj
    • the R project file used to open the codebase into the correct directory for Rstudio. Double click this file to begin using the coding tool in Rstudio.
  • AWQP Water Processing ID Key 2023
    • a separate excel file that details the ID protocols used to label each water sample, that is the basis for this codebase to dissect information back into a more useable format.

Code Workflow

```mermaid

flowchart TD

water((Water Sample Collection and Processing)):::wat --> ALS & AWQP

subgraph "Lab Analysis"
ALS(ALS Environmental Laboratories):::als --> houston & Kelso Kelso(Kelso, WA Laboratory \n ):::labs --> kelso houston(Houston, TX Laboratory):::labs--> result & sample AWQP --> tss(TSS Data):::dat

end tss --> file(Local file folder) result(Nutrient Results Data):::dat --> file sample(Sample Meta Data):::dat --> file kelso(Heavy Metal Data):::dat --> file AWQP(AWQP Laboratory):::labs --> tss

subgraph "ALS Data Cleaning Tool"

file:::r --> import(Import Data):::r
import --> clean(Clean Data):::r
clean --> merge(Merge Data):::r
merge --> export(Process & Export Data):::r

end

subgraph "Results" export -->|Site-Specific Report and Data| E{{Creates data report for one, user-selected research site}}:::out

export -->|Water Sample Report for All Samples| D{{Creates data report for all
collected water sample data in local file folder}}:::out

export -->|Master Database| F{{Export final data into master database for
future use}}:::out

end

classDef labs fill:#E3CDB1, stroke:#333, stroke-width:2px

classDef dat fill:#82C503, stroke:#333, stroke-width:2px

classDef als fill:##CCCCCC, stroke:#333, stroke-width:2px

classDef wat fill:#008FB3, stroke:#333, stroke-width:2px, color:#fff

classDef r fill:#006144, stroke:#333, stroke-width:2px, color:#fff

classDef out fill:#C8C372, stroke:#333, stroke-width:2px, color:#000000

```

Some Helpful Troubleshooting Notes

  • Currently, the output HTML reports get generated in the /Code folder and need to be moved manually to other locations (in this case "Reports" or "siteselectortemporary_results").

  • ALS exports data as ".xls" file format in name only. The actual result files are in ".htm" format. The sample metadata, however is actually in ".xls" format, as found on the "samples" tab.

  • If you edit the .htm file from ALS in excel and save it, it corrupts the file and cannot then be imported. This occurred for us once when we tried to edit an erroneous date.

  • When running file-merger.R, which occurs in all the markdown scripts, you CANNOT have the TSS excel file open, else an error will occur

  • It is helpful to create a "Confidential Data" forlder in the repo you fork, to run the code locally with real data. This avoids copy/pasting the entire repo elsewhere to operate it.

Future Developments

  • Overall
    • Automate the emailing of the report(s) after generation
    • fix the issue with HTML reports being saved in /Code folder
    • ~~Create confidential data folder to run using real ALS data~~
    • ~~create confidential TSS folder to run using real data~~
  • file-merger
    • ~~Add integration of separate excel file results when ALS Houston exports work to other ALS labs (e.g., when Selenium analysis is requested.)~~
    • Fix sample hold time flag "H" in flagData funciton in file_merger.R
    • Add GPS coordinates dictionary to assign coordinates to each location and treatment (as necessary)
    • Develop a method to incorporate flow volume and calculate mass loads
      • Develop a standard format corresponding to ID?
    • Add functionality to merge lab results from Colorado Dept. of Ag AWQP
    • ~~Fix NA values in columns for TSS data by adding AWQP lab reported values (DILUTION, RESULT.REPORTED.TO, MDL, RL)~~
    • ~~Drop unnecesary columns (REPORT.BASIS, PERCENT.MOISTURE, PERCENT.SOLID, LAB.ID.y, MATRIX, HOLD)~~
    • ~~Add static box and whisker plot to show distribution of results for each analyte over all irrigation/storm events~~
    • ~~Add static box and whisker plot to show distribution of results for each analyte over all irrigation/storm events~~
    • ~~Add method to include/exclude lab blanks from the data~~
  • water_report
    • See if we can parse Kerbel results by treatment, and any other sties with treatements
    • ~~Create table of methods used in introduction~~
    • ~~fix Kerbel inflow/outflow not being colorized properly~~
  • site_selector
    • Set the plotly plots to start at the correct y range
    • Fix scatter plotly plot points for Point Sample on the violin plot
    • Potentially create a way to run the functions for multiple sites
    • Add histograms and correlations to scatterplot matrix
    • ~~Put analyte labels outside of the scatterplot matrix and shorten their names with a analyte dict.~~
    • ~~Fix mutate error in scatterplot matrix~~

Data Disclosure Statement for ALS-Data-Cleaning-Tool

1. Purpose and Scope

This Data Disclosure Statement ("Statement") applies to all data contained within the ALS-Data-Cleaning-Tool repository hosted on GitHub (the "Repository"). The Statement's intent is to clarify the purpose, limitations, and appropriate usage of the data shared in the Repository.

2. Data Usage

The data provided in this Repository is for example purposes only. Users are explicitly discouraged from interpreting, sharing, or using this data for applications other than illustrative or educational examples related to the Repository's described purposes.

3. Limitations

The data in this Repository:

  • Has not been controlled, verified, or vetted for use as final data in any application, project, or production environment.
  • May not be complete, accurate, or representative of real-world scenarios.
  • May contain inaccuracies or errors that could affect its utility or interpretation.

4. No Warranty

The data is provided "as is" without any warranties, either express or implied. The maintainers of ALS-Data-Cleaning-Tool make no representations or warranties about the suitability of this data for any purpose. Use at your own risk.

5. Restrictions

Users are prohibited from:

  • Claiming the data as their own original work.
  • Using the data for commercial purposes without prior written permission.
  • Distributing or sharing the data without including this Data Disclosure Statement.

6. Liability

The maintainers of ALS-Data-Cleaning-Tool, contributors, or any affiliated parties shall not be held liable for any direct, indirect, incidental, special, exemplary, or consequential damages (including, but not limited to, procurement of substitute goods or services; loss of use, data, or profits; or business interruption) arising from the use or inability to use the data.

7. Acknowledgement

Users who reference or cite the Repository or its data in any public forums, publications, presentations, or similar mediums should acknowledge the source and include a link to the original Repository and this Data Disclosure Statement.


By accessing, downloading, or using the data, you acknowledge that you have read, understood, and agreed to comply with the terms outlined in this Data Disclosure Statement.

For any questions or clarifications regarding this statement or the data provided, please contact:

A.J. Brown
Agricultural Data Scientist
Colorado State University Agricultural Water Quality Program
Ansley.Brown@colostate.edu

Owner

  • Name: CSU-AWQP
  • Login: CSU-Agricultural-Water-Quality-Program
  • Kind: organization
  • Email: AgWaterQuality@colostate.edu
  • Location: United States of America

The Agricultural Water Quality Program (AWQP) protects Colorado state waters and the environment from the improper use of agricultural chemicals.

GitHub Events

Total
  • Push event: 12
Last Year
  • Push event: 12

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 129
  • Total Committers: 3
  • Avg Commits per committer: 43.0
  • Development Distribution Score (DDS): 0.256
Past Year
  • Commits: 122
  • Committers: 3
  • Avg Commits per committer: 40.667
  • Development Distribution Score (DDS): 0.27
Top Committers
Name Email Commits
ansleybrown1337 a****7@g****m 96
mollycb8 m****8@g****m 32
AJ Brown 3****7 1

Issues and Pull Requests

Last synced: about 2 years ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • ansleybrown1337 (1)
Top Labels
Issue Labels
Pull Request Labels