hydrotools

Suite of tools for retrieving USGS NWIS observations and evaluating National Water Model (NWM) data.

https://github.com/noaa-owp/hydrotools

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
○ DOI references
○ Academic publication links
✓ Committers with academic emails: 4 of 7 committers (57.1%) from academic institutions
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.6%) to scientific vocabulary

Keywords

evaluation forecasting hydrology modeling noaa observations pandas python simulation validation verification

Last synced: 6 months ago

Repository


Basic Info

Host: GitHub
Owner: NOAA-OWP
License: apache-2.0
Language: Python
Default Branch: main
Homepage:
Size: 16.6 MB

Statistics

Stars: 59
Watchers: 4
Forks: 19
Open Issues: 11
Releases: 14

Topics

evaluation forecasting hydrology modeling noaa observations pandas python simulation validation verification

Created about 5 years ago · Last pushed 7 months ago

Metadata Files

Readme Contributing License Citation Security

Documentation

OWPHydroTools GitHub pages documentation

Motivation

We developed OWPHydroTools with data scientists in mind. We attempted to ensure that the simplest methods, such as get, both accept and return data structures frequently used by data scientists working in scientific Python. Specifically, pandas.DataFrames, geopandas.GeoDataFrames, and numpy.arrays are the data structures you will most frequently encounter when using OWPHydroTools. Most methods include sensible defaults that cover the majority of use cases but allow customization if required.

We also attempted to adhere to organizational (NOAA-OWP) data standards where they exist. This means pandas.DataFrames will contain column labels like usgs_site_code, reference_time, value_time, and measurement_unit, which are consistent with organization-wide naming conventions. Our intent is to make retrieving, evaluating, and exporting data as easy and reproducible as possible for scientists, practitioners, and other hydrological experts.

What's here?

We've taken a grab-and-go approach to installation and usage of OWPHydroTools. As with a standard toolbox, you will typically install just the tool or tools that get your job done, without installing all the other tools available. This means a lighter installation load, and new tools can be added to the toolbox without affecting your workflows!

Note that we commonly refer to individual tools in OWPHydroTools as subpackages or by their name (e.g. nwis_client). You will find this terminology in both issues and documentation.

Currently the repository has the following subpackages:

events: Variety of methods used to perform event-based evaluations of hydrometric time series
nwm_client: Provides methods for retrieving National Water Model data from various sources including Google Cloud Platform and NOMADS
metrics: Variety of methods used to compute common evaluation metrics
nwis_client: Provides easy-to-use methods for retrieving data from the USGS NWIS Instantaneous Values (IV) Web Service
svi_client: Provides programmatic access to the Centers for Disease Control and Prevention's (CDC) Social Vulnerability Index (SVI)
_restclient: A generic REST client with a built-in cache that makes the construction and retrieval of GET requests painless

UTC Time

Note: the canonical pandas.DataFrames used by OWPHydroTools use timezone-naive datetimes that assume UTC time. In general, do not assume methods are compatible with timezone-aware datetimes or timestamps. Expect methods to transform timezone-aware datetimes and timestamps into their timezone-naive counterparts at UTC time.
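If you are preparing your own timezone-aware timestamps for use alongside OWPHydroTools data, the pandas sketch below shows one way to produce the timezone-naive UTC form described above. The timestamps are hypothetical, and this is a general pandas idiom rather than an OWPHydroTools function.

```python
import pandas as pd

# Hypothetical timezone-aware observations recorded at UTC-05:00
aware = pd.Series(
    pd.to_datetime(["2021-06-01T00:00:00-05:00", "2021-06-01T01:00:00-05:00"])
)

# Convert to UTC, then drop the timezone: naive datetimes that assume UTC
naive_utc = aware.dt.tz_convert("UTC").dt.tz_localize(None)

print(naive_utc.iloc[0])  # 2021-06-01 05:00:00
```

Converting to UTC before dropping the timezone matters: calling tz_localize(None) directly would keep the local wall-clock times and silently shift the data by five hours.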

Usage

Refer to each subpackage's README.md or documentation for examples of how to use each tool.

Installation

In keeping with Python community practice, we support and recommend using virtual environments in any Python workflow. The following installation guide uses Python's built-in venv module to create a virtual environment in which the tools will be installed. This is only a personal preference; any Python virtual environment manager (conda, pipenv, etc.) should work just fine.

```bash
# Create and activate python environment, requires python >= 3.8
$ python3 -m venv venv
$ source venv/bin/activate
$ python3 -m pip install --upgrade pip

# Install all tools
$ python3 -m pip install hydrotools

# Alternatively you can install a single tool
# This installs the NWIS Client tool
$ python3 -m pip install hydrotools.nwis_client
```

OWPHydroTools Canonical Format

"Canonical" labels are protected and part of a fixed lexicon shared among all hydrotools subpackages. To encourage cross-compatibility, subpackage methods should avoid changing or redefining these columns where they appear. Existing canonical labels are listed below:

value [float32]: Indicates the real value of an individual measurement or simulated quantity.
value_time [datetime64[ns]]: formerly value_date, this indicates the valid time of value.
variable_name [category]: string category that indicates the real-world type of value (e.g. streamflow, gage height, temperature).
measurement_unit [category]: string category indicating the measurement unit (SI or standard) of value
qualifiers [category]: string category that indicates any special qualifying codes or messages that apply to value
series [integer32]: used to disambiguate multiple coincident time series returned by a data source.
configuration [category]: string category used as a label for a particular time series, often used to distinguish types of model runs (e.g. shortrange, mediumrange, assimilation)
reference_time [datetime64[ns]]: formerly start_date, some reference time for a particular model simulation. Could be considered an issue time, start time, end time, or other meaningful reference time. Interpretation is simulation- or forecast-specific.
longitude [category]: float32 category, WGS84 decimal longitude
latitude [category]: float32 category, WGS84 decimal latitude
crs [category]: string category, Coordinate Reference System, typically "EPSG:4326"
geometry [geometry]: GeoPandas compatible GeoSeries used as the default "geometry" column
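As a rough illustration of the canonical format, the sketch below builds a small DataFrame with some of the canonical labels and casts each column to the dtype listed above. The values, units, and qualifier code are made up, and it assumes "integer32" corresponds to numpy's int32.

```python
import pandas as pd

# Hypothetical observations laid out with canonical OWPHydroTools labels
df = pd.DataFrame({
    "value_time": pd.to_datetime(["2021-06-01 05:00", "2021-06-01 06:00"]),
    "variable_name": ["streamflow", "streamflow"],
    "measurement_unit": ["ft3/s", "ft3/s"],
    "qualifiers": ["P", "P"],
    "series": [0, 0],
    "value": [125.0, 130.5],
})

# Cast to the dtypes listed above ("integer32" assumed to mean int32)
df = df.astype({
    "variable_name": "category",
    "measurement_unit": "category",
    "qualifiers": "category",
    "series": "int32",
    "value": "float32",
})

print(df.dtypes)
```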

Non-Canonical Column Labels

"Non-Canonical" labels are subpackage-specific extensions to the canonical standard. Packages may share these non-canonical labels, but cross-compatibility is not guaranteed. Examples of non-canonical labels are given below.

usgs_site_code [category]: string category indicating the USGS Site Code/gage ID
nwm_feature_id [integer32]: indicates the NWM reach feature ID/ComID
nws_lid [category]: string category indicating the NWS Location ID/gage ID
usace_gage_id [category]: string category indicating the USACE gage ID
start [datetime64[ns]]: datetime returned by event_detection that indicates the beginning of an event
end [datetime64[ns]]: datetime returned by event_detection that indicates the end of an event
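Because start and end are both datetime64 columns, event durations fall out of simple subtraction. The table below is a hypothetical stand-in for event_detection output, built by hand rather than by calling the events subpackage.

```python
import pandas as pd

# Hypothetical events table using the non-canonical start/end labels
events = pd.DataFrame({
    "start": pd.to_datetime(["2021-06-01 05:00", "2021-06-03 12:00"]),
    "end": pd.to_datetime(["2021-06-02 05:00", "2021-06-03 18:00"]),
})

# Subtracting two datetime64 columns yields timedelta64 durations
events["duration"] = events["end"] - events["start"]

# Express durations in hours for reporting
print(events["duration"].dt.total_seconds() / 3600.0)
```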

Categorical Data Types

OWPHydroTools uses pandas.DataFrames that contain pandas.Categorical values to increase memory efficiency. Depending on your use case, these values may require special consideration. To check whether a DataFrame returned by an OWPHydroTools subpackage contains pandas.Categorical values, use pandas.DataFrame.info like so:

```python
print(my_dataframe.info())
```

```console
Int64Index: 5706954 entries, 0 to 5706953
Data columns (total 7 columns):
 #   Column            Dtype
 0   value_date        datetime64[ns]
 1   variable_name     category
 2   usgs_site_code    category
 3   measurement_unit  category
 4   value             float32
 5   qualifiers        category
 6   series            category
dtypes: category(5), datetime64[ns](1), float32(1)
memory usage: 141.5 MB
None
```
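If you prefer a programmatic check over reading the info() printout, pandas.DataFrame.select_dtypes can list the categorical columns. The small DataFrame here is hypothetical.

```python
import pandas as pd

# Hypothetical frame mixing categorical and primitive columns
my_dataframe = pd.DataFrame({
    "usgs_site_code": pd.Categorical(["01646500", "01638500"]),
    "measurement_unit": pd.Categorical(["ft3/s", "ft3/s"]),
    "value": pd.Series([125.0, 80.2], dtype="float32"),
})

# select_dtypes returns only the columns whose dtype is "category"
categorical_columns = my_dataframe.select_dtypes(include="category").columns.tolist()
print(categorical_columns)  # ['usgs_site_code', 'measurement_unit']
```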

Columns with Dtype category are pandas.Categorical. In most cases, these columns behave the same as their primitive types (in this case, str). However, categories can sometimes lead to unexpected behavior, for example when using pandas.DataFrame.groupby. pandas.Categorical columns are also incompatible with fixed-format HDF files (use format="table" instead) and may cause unexpected behavior when writing to geospatial formats with geopandas.
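The groupby surprise usually appears after filtering: rows are removed, but their categories stay in the dtype. The sketch below uses made-up site codes to show how the observed option changes the grouped result.

```python
import pandas as pd

# Hypothetical data: three sites stored in a categorical column
df = pd.DataFrame({
    "usgs_site_code": pd.Categorical(["A", "B", "C"]),
    "value": [1.0, 2.0, 3.0],
})

# Filtering removes the "C" row, but "C" remains a category of the dtype
subset = df[df["usgs_site_code"] != "C"]

# observed=False groups over every category, so "C" shows up with count 0
counts_all = subset.groupby("usgs_site_code", observed=False)["value"].count()

# observed=True groups only over categories actually present in the data
counts_observed = subset.groupby("usgs_site_code", observed=True)["value"].count()

print(counts_all)
print(counts_observed)
```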

Possible solutions include:

Cast `Categorical` to `str`

Casting to str will resolve all of the aforementioned issues, including writing to geospatial formats.

```python
my_dataframe['usgs_site_code'] = my_dataframe['usgs_site_code'].astype(str)
```

Remove unused categories

This will remove categories from the Series for which no values are actually present.

```python
my_dataframe['usgs_site_code'] = my_dataframe['usgs_site_code'].cat.remove_unused_categories()
```

Use `observed` option with `groupby`

This limits groupby operations to category values that actually appear in the Series or DataFrame.

```python
mean_flow = my_dataframe.groupby('usgs_site_code', observed=True)['value'].mean()
```

American Geophysical Union 2021 Fall Meeting Poster

OWPHydroTools_AGU2021.pdf

Owner

Name: NOAA-OWP
Login: NOAA-OWP
Kind: organization

Repositories: 28
Profile: https://github.com/NOAA-OWP

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Regina"
  given-names: "Jason A."
  orcid: "https://orcid.org/0000-0002-4091-8647"
- family-names: "Raney"
  given-names: "Austin"
  orcid: "https://orcid.org/0000-0002-7670-1298"
title: "OWPHydroTools"
version: 3.1.0
date-released: 2025-04-09
url: "https://github.com/noaa-owp/hydrotools"

GitHub Events

Total

Create event: 6
Release event: 4
Issues event: 20
Watch event: 5
Delete event: 3
Issue comment event: 57
Push event: 36
Pull request review event: 35
Pull request review comment event: 48
Pull request event: 43
Fork event: 7

Last Year

Create event: 6
Release event: 4
Issues event: 20
Watch event: 5
Delete event: 3
Issue comment event: 57
Push event: 36
Pull request review event: 35
Pull request review comment event: 48
Pull request event: 43
Fork event: 7

Committers

Last synced: 7 months ago

All Time

Total Commits: 1,031
Total Committers: 7
Avg Commits per committer: 147.286
Development Distribution Score (DDS): 0.352

Past Year

Commits: 45
Committers: 4
Avg Commits per committer: 11.25
Development Distribution Score (DDS): 0.178

Top Committers

Name	Email	Commits
Jason Regina	j**c@g**m	668
Austin Raney	a**y@c**u	209
Austin Raney	a**y@n**v	144
Ryan Grout	r**t@n**v	4
Nels	n**r@n**v	2
Josh Cunningham	j**u@g**m	2
HankHerr-NOAA	6****A	2

Committer Domains (Top 20 + Academic)

noaa.gov: 3
crimson.ua.edu: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 86
Total pull requests: 118
Average time to close issues: 6 months
Average time to close pull requests: 18 days
Total issue authors: 15
Total pull request authors: 4
Average comments per issue: 3.36
Average comments per pull request: 2.0
Merged pull requests: 101
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 11
Pull requests: 40
Average time to close issues: 8 days
Average time to close pull requests: 2 days
Issue authors: 3
Pull request authors: 3
Average comments per issue: 1.09
Average comments per pull request: 1.13
Merged pull requests: 34
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

jarq6c (52)
aaraney (16)
amaes3owp (2)
christophertubbs (2)
jameshalgren (2)
fernando-aristizabal (1)
jzleroy (1)
andywood (1)
markwang0 (1)
mikejohnson51 (1)
jmpmcmanus (1)
xfeng2021 (1)
stcui007 (1)
samakraus (1)
hellkite500 (1)

Pull Request Authors

jarq6c (85)
aaraney (45)
groutr (3)
JoshCu (2)

Top Labels

Issue Labels

bug (32) enhancement (26) documentation (13) question (1)

Pull Request Labels

enhancement (43) bug (28) documentation (20)

Dependencies

.github/workflows/deploy-gh-pages.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/run_slow_unit_tests.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/run_caches.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/run_events.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/run_metrics.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/run_nwis_client.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/run_nwm_client.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/run_nwm_client_new.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/run_rest_client.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

.github/workflows/run_svi_client.yml actions

actions/checkout v2 composite
actions/setup-python v2 composite

pyproject.toml pypi

python/_restclient/pyproject.toml pypi

python/caches/pyproject.toml pypi

python/events/pyproject.toml pypi

python/metrics/pyproject.toml pypi

python/nwis_client/pyproject.toml pypi

python/nwm_client/pyproject.toml pypi

python/nwm_client_new/pyproject.toml pypi

python/svi_client/pyproject.toml pypi
