AquaFetch

AquaFetch: A Unified Python Interface for Water Resource Dataset Acquisition and Harmonization - Published in JOSS (2025)

https://github.com/hyex-research/aquafetch

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 128 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: arxiv.org, joss.theoj.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

big-data database hydrology wastewater-treatment water water-quality

Scientific Fields

Economics Social Sciences - 85% confidence
Last synced: 4 months ago

Repository

A Unified Python Interface for Water Resource Data Acquisition

Basic Info
Statistics
  • Stars: 8
  • Watchers: 0
  • Forks: 3
  • Open Issues: 2
  • Releases: 1
Topics
big-data database hydrology wastewater-treatment water water-quality
Created 11 months ago · Last pushed 4 months ago
Metadata Files
Readme Contributing

readme.md


A Unified Python Interface for Water Resource Dataset Acquisition and Harmonization

AquaFetch is a Python package designed for the automated downloading, parsing, cleaning, and harmonization of freely available water resource datasets related to rainfall-runoff processes, surface water quality, and wastewater treatment. The package currently supports approximately 70 datasets, each containing between one and several hundred parameters. It downloads raw data and transforms it into consistent, easy-to-use, analysis-ready formats, so users can access and use the data directly without labor-intensive and time-consuming preprocessing.
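
Harmonization here means that data from different providers end up with the same column names and units. The sketch below illustrates that general idea with a rename-and-convert step. The source names, raw column names, and conversion table are hypothetical and are not taken from AquaFetch; only the target names `q_cms_obs` and `pcp_mm` echo the feature names used in the usage example further down.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical schemas: raw column name -> (harmonized name, unit multiplier).
# None of these sources or raw names come from AquaFetch itself.
SCHEMAS: Dict[str, Dict[str, Tuple[str, float]]] = {
    "source_a": {"Q_ls": ("q_cms_obs", 0.001),        # litres/s -> m3/s
                 "P": ("pcp_mm", 1.0)},               # already in mm
    "source_b": {"discharge_m3s": ("q_cms_obs", 1.0),
                 "precip_cm": ("pcp_mm", 10.0)},      # cm -> mm
}


def harmonize(records: List[Dict[str, Optional[float]]],
              source: str) -> List[Dict[str, Optional[float]]]:
    """Rename columns and convert units so that every source shares one schema."""
    schema = SCHEMAS[source]
    harmonized = []
    for record in records:
        row: Dict[str, Optional[float]] = {}
        for raw_name, value in record.items():
            if raw_name not in schema:
                continue  # drop columns that are not part of the common schema
            name, factor = schema[raw_name]
            row[name] = None if value is None else value * factor  # keep gaps as None
        harmonized.append(row)
    return harmonized
```

For example, `harmonize([{"Q_ls": 2500.0, "P": 3.2}], "source_a")` yields a record with `q_cms_obs` of about 2.5 and `pcp_mm` of 3.2, the same shape as a record coming from `source_b`.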

The package comprises three submodules, each representing a different type of water resource data: rr for rainfall-runoff processes, wq for surface water quality, and wwt for wastewater treatment. The rr submodule offers data for 47,291 catchments worldwide, encompassing both dynamic and static features for each catchment. The dynamic features consist of observed streamflow and meteorological time series, averaged over the catchment area and available at daily and/or hourly time steps. Static features include constant parameters such as land use, soil, topography, and other physiographical characteristics, along with catchment boundaries. This submodule not only provides access to established rainfall-runoff datasets such as CAMELS and LamaH but also introduces new datasets compiled for the first time from publicly accessible online data sources. The wq submodule offers access to 17 surface water quality datasets, each containing various water quality parameters measured at different locations and times. The wwt submodule provides access to over 20,000 experimental measurements related to wastewater treatment techniques such as adsorption, photocatalysis, membrane filtration, and sonolysis.
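
Catchment-averaged meteorological series are commonly derived by weighting each grid cell by the part of its area that lies inside the catchment boundary. The sketch below shows that general technique only; it is not AquaFetch code, and the individual source datasets may have used other averaging procedures.

```python
from typing import List, Sequence


def catchment_average(cell_values: Sequence[float], overlap_areas: Sequence[float]) -> float:
    """Area-weighted mean of gridded values over one catchment.

    cell_values[i] is the value of grid cell i (e.g. precipitation in mm) and
    overlap_areas[i] is the part of that cell's area lying inside the catchment.
    """
    if len(cell_values) != len(overlap_areas):
        raise ValueError("each grid cell needs exactly one overlap area")
    total_area = sum(overlap_areas)
    if total_area <= 0:
        raise ValueError("the catchment must overlap at least one grid cell")
    return sum(v * a for v, a in zip(cell_values, overlap_areas)) / total_area


def catchment_series(grid_series: List[Sequence[float]],
                     overlap_areas: Sequence[float]) -> List[float]:
    """Apply the area-weighted mean at every time step to build a catchment time series."""
    return [catchment_average(step, overlap_areas) for step in grid_series]


# Three grid cells; the third has twice as much area inside the catchment.
areas = [1.0, 1.0, 2.0]
print(catchment_series([[2.0, 4.0, 6.0], [0.0, 0.0, 8.0]], areas))  # [4.5, 4.0]
```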

The development of AquaFetch was inspired by the growing availability of diverse water resource datasets in recent years. As a community-driven project, the codebase is structured to allow contributors to easily add new datasets, ensuring the package continues to expand and evolve to meet future needs.
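
One common way to make adding a dataset cheap is a name-based registry: each dataset class registers itself under a name, and a single entry point looks it up, much as `RainfallRunoff('CAMELS_SE')` selects a dataset by name. The sketch below shows that general pattern with hypothetical names (`register`, `BaseDataset`, `ToyDataset`, `open_dataset`); it does not describe how AquaFetch is actually organized.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping dataset names to classes; not AquaFetch's code.
_REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records a dataset class under ``name``."""
    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise KeyError(f"dataset {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls
    return decorator


class BaseDataset:
    """Minimal interface that every contributed dataset would implement."""

    def stations(self) -> List[str]:
        raise NotImplementedError


@register("TOY_DATASET")
class ToyDataset(BaseDataset):
    """A contributor adds a dataset by subclassing the base class and registering it."""

    def stations(self) -> List[str]:
        return ["1", "2", "3"]


def open_dataset(name: str) -> BaseDataset:
    """Instantiate a registered dataset by its name."""
    try:
        cls = _REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown dataset {name!r}; available: {sorted(_REGISTRY)}") from None
    return cls()
```

With this layout a new dataset only touches its own class; the lookup function and the rest of the code stay unchanged.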

Installation

You can install AquaFetch using pip:

pip install aqua-fetch

The package can also be installed from the master branch on GitHub:

python -m pip install git+https://github.com/hyex-research/AquaFetch.git

To install from a specific branch, such as the dev branch, which contains more recent code:

python -m pip install git+https://github.com/hyex-research/AquaFetch.git@dev

The above commands install only the minimal dependencies required to use the library, namely numpy, pandas, and requests. To install the library with the full list of dependencies, use the all option during installation:

python -m pip install "aqua-fetch[all] @ git+https://github.com/hyex-research/AquaFetch.git"

This installs additional optional dependencies, which include xarray, fiona, netCDF4, and easy_mpl.
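
Some features depend on these optional packages (for example, the xarray return type and fiona boundaries in the usage examples). The helper below is not part of AquaFetch; it uses the standard library's `importlib.util.find_spec` to report which of the optional modules cannot be imported in the current environment. It assumes the import names are `xarray`, `fiona`, `netCDF4`, and `easy_mpl`.

```python
import importlib.util

# Import names of the optional dependencies listed above (assumed).
OPTIONAL_MODULES = ("xarray", "fiona", "netCDF4", "easy_mpl")


def missing_optional(modules=OPTIONAL_MODULES):
    """Return the names in ``modules`` that cannot be found on the import path."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_optional()
    if missing:
        print("Missing optional dependencies:", ", ".join(missing))
        print('Install them with: python -m pip install "aqua-fetch[all] @ '
              'git+https://github.com/hyex-research/AquaFetch.git"')
    else:
        print("All optional dependencies are available.")
```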

Usage

The following sections briefly describe how to use datasets from each of the three submodules, i.e., rr, wq, and wwt. For detailed usage examples, see the documentation.

The core of the rr submodule is the RainfallRunoff class. This class fetches dynamic features (catchment-averaged hydrometeorological data at daily or sub-daily time steps), static features (catchment characteristics related to topography, soil, land use/land cover, or hydrological indices that are constant over time), and the catchment boundary. The following example demonstrates how to fetch data for CAMELS_SE; the method is the same for all available rainfall-runoff datasets.

```python
from aqua_fetch import RainfallRunoff

dataset = RainfallRunoff('CAMELS_SE')  # instead of CAMELS_SE, you can provide any other dataset name

# get data by station id
_, dynamic = dataset.fetch(stations='5', as_dataframe=True)
df = dynamic['5']  # dynamic is a dictionary with station names as keys and DataFrames as values
df.shape  # -> (21915, 4)

# get names of all stations as a list
stns = dataset.stations()
len(stns)  # -> 50

# get data of 10 % of stations as DataFrames
_, dynamic = dataset.fetch(0.1, as_dataframe=True)
len(dynamic)  # -> 5

# dynamic is a dictionary whose values are DataFrames of dynamic features
[df.shape for df in dynamic.values()]
# -> [(21915, 4), (21915, 4), (21915, 4), (21915, 4), (21915, 4)]

# get data of a single (randomly selected) station
_, dynamic = dataset.fetch(stations=1, as_dataframe=True)
len(dynamic)  # -> 1

# get names of available dynamic features
dataset.dynamic_features

# get only selected dynamic features
_, dynamic = dataset.fetch('5', as_dataframe=True,
                           dynamic_features=['pcp_mm', 'airtemp_C_mean', 'q_cms_obs'])
dynamic['5'].shape  # -> (21915, 3)

# get names of available static features
dataset.static_features

# get data of 10 random stations
_, dynamic = dataset.fetch(10, as_dataframe=True)
len(dynamic)  # -> 10

# get both static and dynamic data
static, dynamic = dataset.fetch(stations='5', static_features="all", as_dataframe=True)
static.shape, len(dynamic), dynamic['5'].shape  # -> ((1, 76), 1, (21915, 4))

# without as_dataframe=True (and with xarray installed), the returned data is an xarray Dataset
_, dynamic = dataset.fetch(10)
type(dynamic)  # -> xarray.core.dataset.Dataset
dynamic.dims  # -> FrozenMappingWarningOnValuesAccess({'time': 21915, 'dynamic_features': 4})
len(dynamic.data_vars)  # -> 10

# get coordinates of all stations
coords = dataset.stn_coords()
coords.shape  # -> (50, 2)

# get coordinates of the station whose id is '5'
dataset.stn_coords('5')  # -> 68.035599 21.9758

# get coordinates of two stations
dataset.stn_coords(['5', '736'])

# get area of a single station
dataset.area('5')

# get area of two stations
dataset.area(['5', '736'])

# if the fiona library is installed, the boundary is returned as a fiona Geometry
dataset.get_boundary('5')
```
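
In the examples above, the `stations` argument of `fetch` accepts a station id, an integer (a number of randomly selected stations), or a float (a fraction of all stations). The standalone sketch below shows one way such an argument could be resolved into a list of station ids. `resolve_stations` is hypothetical and not part of AquaFetch; accepting a list of ids is an assumption here, and the library's own rounding and random seeding may differ.

```python
import random
from typing import List, Optional, Sequence, Union

StationSpec = Union[str, int, float, Sequence[str]]


def resolve_stations(spec: StationSpec, all_stations: Sequence[str],
                     seed: Optional[int] = None) -> List[str]:
    """Hypothetical helper: turn a ``stations``-style argument into station ids.

    str -> that single id; int n -> n random ids; float f in (0, 1] -> a random
    fraction f of all ids; any other sequence -> those ids (an assumption).
    """
    rng = random.Random(seed)
    pool = list(all_stations)
    if isinstance(spec, bool):  # bool is a subclass of int, so reject it explicitly
        raise TypeError("stations must not be a bool")
    if isinstance(spec, str):
        chosen = [spec]
    elif isinstance(spec, int):
        if not 0 < spec <= len(pool):
            raise ValueError(f"cannot select {spec} of {len(pool)} stations")
        chosen = rng.sample(pool, spec)
    elif isinstance(spec, float):
        if not 0.0 < spec <= 1.0:
            raise ValueError("a fraction of stations must lie in (0, 1]")
        chosen = rng.sample(pool, max(1, round(spec * len(pool))))
    else:
        chosen = list(spec)
    unknown = sorted(set(chosen) - set(pool))
    if unknown:
        raise ValueError(f"unknown station ids: {unknown}")
    return chosen
```

With 50 station ids, as in CAMELS_SE, `resolve_stations(0.1, ids)` returns 5 ids, matching the `len(dynamic)` result in the example.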

The surface water quality datasets are available through either a functional or an object-oriented API, depending on the complexity of the dataset. The following example shows the usage of two surface water quality datasets. For the complete names of the Python functions and classes, see the documentation.

```python
from aqua_fetch import busan_beach

dataframe = busan_beach()
dataframe.shape  # -> (1446, 14)

dataframe = busan_beach(target=['tetx_coppml', 'sul1_coppml'])
dataframe.shape  # -> (1446, 15)

from aqua_fetch import GRQA

ds = GRQA(path="/path/to/data")
print(ds.parameters)
len(ds.parameters)  # -> 42

country = "Pakistan"
len(ds.fetch_parameter('TEMP', country=country))
```
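
A class-based interface suits datasets such as GRQA that hold many parameters measured at many sites: the object keeps the parsed data and exposes attributes and filtering methods. The sketch below illustrates that design on a tiny in-memory, long-format table (one row per site, parameter, date, and value). `Observation`, `LongFormatWQ`, and `select` are hypothetical names; they do not describe how GRQA stores its data or what `fetch_parameter` returns.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Observation:
    site_id: str
    country: str
    parameter: str  # parameter code, e.g. "TEMP"
    date: str       # ISO date, e.g. "2019-06-01"
    value: float


class LongFormatWQ:
    """Toy object-oriented wrapper around long-format water quality records."""

    def __init__(self, records: List[Observation]):
        self._records = list(records)

    @property
    def parameters(self) -> List[str]:
        """Sorted list of the distinct parameter codes in the data."""
        return sorted({r.parameter for r in self._records})

    def select(self, parameter: str, country: Optional[str] = None) -> List[Observation]:
        """Return observations of one parameter, optionally restricted to one country."""
        return [r for r in self._records
                if r.parameter == parameter and (country is None or r.country == country)]
```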

The wastewater treatment datasets are all available through a functional API. These datasets consist of experiments conducted to remove specific pollutants from wastewater. For the complete list of functions, see the documentation.

```python
from aqua_fetch import ec_removal_biochar

data, *_ = ec_removal_biochar()
data.shape  # -> (3757, 27)

data, encoders = ec_removal_biochar(encoding="le")
data.shape  # -> (3757, 27)

from aqua_fetch import mg_degradation

mg_data, encoders = mg_degradation()
mg_data.shape  # -> (1200, 12)

# the default encoding is None; to use one-hot encoding instead:
mg_data_ohe, encoders = mg_degradation(encoding="ohe")
mg_data_ohe.shape  # -> (1200, 31)
```
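
The `encoding` argument above selects how categorical columns are turned into numbers: `"le"` (label encoding) or `"ohe"` (one-hot encoding). The standalone sketch below shows both techniques on a hypothetical list of catalyst names. It is not AquaFetch's implementation, and the library's category ordering and the contents of the returned `encoders` may differ. Label encoding keeps one column per categorical feature, whereas one-hot encoding replaces a feature with k categories by k binary columns, which is consistent with `mg_data_ohe` having more columns than `mg_data`.

```python
from typing import Dict, List, Sequence, Tuple


def label_encode(values: Sequence[str]) -> Tuple[List[int], Dict[str, int]]:
    """Map each category to an integer code (in sorted order); one column stays one column."""
    mapping = {cat: i for i, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


def one_hot_encode(values: Sequence[str]) -> Tuple[List[List[int]], List[str]]:
    """Expand one categorical column into one 0/1 column per category."""
    categories = sorted(set(values))
    rows = [[1 if v == cat else 0 for cat in categories] for v in values]
    return rows, categories


catalysts = ["TiO2", "ZnO", "TiO2", "CuO"]  # hypothetical categorical column
print(label_encode(catalysts))    # ([1, 2, 1, 0], {'CuO': 0, 'TiO2': 1, 'ZnO': 2})
print(one_hot_encode(catalysts))  # ([[0, 1, 0], [0, 0, 1], [0, 1, 0], [1, 0, 0]], ['CuO', 'TiO2', 'ZnO'])
```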

Summary of Rainfall-Runoff Datasets

| Name | Num. of daily stations | Num. of hourly stations | Num. of dynamic features | Num. of static features | Temporal Coverage | Spatial Coverage | Ref. |
|------|------------------------|-------------------------|--------------------------|-------------------------|-------------------|------------------|------|
| Arcticnet | 106 | | 27 | 35 | 1979 - 2003 | Arctic (Russia) | R-Arcticnet |
| Bull | 484 | | 55 | 214 | 1990 - 2020 | Spain | Aparicio et al., 2024 |
| CABra | 735 | | 13 | 87 | 1980 - 2010 | Brazil | Almagro et al., 2021 |
| CAMELSH | | 5767 | 13 | 779 | 1900 - 2024 | United States of America | Tran et al., 2025 |
| CAMELS_AUS | 222, 561 | | 28 | 166, 187 | 1900 - 2018 | Australia | Flower et al., 2021 |
| CAMELS_BR | 897 | | 10 | 67 | 1920 - 2019 | Brazil | Chagas et al., 2020 |
| CAMELS_COL | 347 | | 6 | 255 | 1981 - 2022 | Colombia | Jimenez et al., 2025 |
| CAMELS_CH | 331 | | 9 | 209 | 1981 - 2020 | Switzerland, Austria, France, Germany, Italy | Hoege et al., 2023 |
| CAMELS_CL | 516 | | 12 | 104 | 1913 - 2018 | Chile | Alvarez-Garreton et al., 2018 |
| CAMELS_DK | 304 | | 13 | 119 | 1989 - 2023 | Denmark | Liu et al., 2024 |
| CAMELS_DE | 1555 | | 21 | 111 | 1951 - 2020 | Germany | Loritz et al., 2024 |
| CAMELS_FI | 320 | | | 111 | 1963 - 2023 | Finland | Seppä, I et al., 2025 |
| CAMELS_FR | 654 | | 22 | 344 | 1970 - 2021 | France | Delaigue et al., 2024 |
| CAMELS_GB | 671 | | 10 | 145 | 1970 - 2015 | Britain | Coxon et al., 2020 |
| CAMELS_IND | 472 | | 20 | 210 | 1980 - 2020 | India | Mangukiya et al., 2024 |
| CAMELS_LUX | 56 | 56 | 25 | 61 | 2004 - 2021 | Luxembourg | Nijzink et al., 2025 |
| CAMELS_SE | 50 | | 4 | 76 | 1961 - 2020 | Sweden | Teutschbein et al., 2024 |
| CAMELS_SK | | 178 | 17 | 215 | 2000 - 2019 | South Korea | Kim et al., 2025 |
| CAMELS_NZ | 369 | 369 | 5 | 39 | 1972 - 2024 | New Zealand | Bushra et al., 2025 |
| CAMELS_US | 671 | | 8 | 59 | 1980 - 2014 | USA | Newman et al., 2014 |
| Caravan_DK | 308 | | 38 | 211 | 1981 - 2020 | Denmark | Koch, 2022 |
| CCAM | 102 | | 16 | 124 | 1990 - 2020 | China | Hao et al., 2021 |
| Finland | 669 | | 10 | 214 | 2012 - 2023 | Finland | Nascimento et al., 2024 & ymparisto.fi |
| GRDCCaravan | 5357 | | 39 | 211 | 1950 - 2023 | Global | Faerber et al., 2023 |
| HYSETS | 14425 | | 20 | 30 | 1950 - 2018 | North America | Arsenault et al., 2020 |
| HYPE | 561 | | 9 | 3 | 1985 - 2019 | Costa Rica | Arciniega-Esparza and Birkel, 2020 |
| Ireland | 464 | | 10 | 214 | 1992 - 2020 | Ireland | Nascimento et al., 2024 & EPA Ireland |
| Italy | 294 | | 10 | 214 | 1992 - 2020 | Italy | Nascimento et al., 2024 & hiscentral.isprambiente.gov.it |
| Japan | 751 | 696 | 27 | 35 | 1979 - 2022 | Japan | Peirong et al., 2023 & river.go.jp |
| LamaHCE | 859 | 859 | 22 | 80 | 1981 - 2019 | Central Europe | Klingler et al., 2021 |
| LamaHIce | 111 | 111 | 36 | 154 | 1950 - 2021 | Iceland | Helgason and Nijssen, 2024 |
| NPCTRCatchments | - | 7 | 14 | 14 | 2013 - 2019 | Canada | Korver et al., 2022 |
| Poland | 1287 | | 10 | 214 | 1992 - 2020 | Poland | Nascimento et al., 2024 & danepubliczne.imgw.pl |
| Portugal | 280 | | 10 | 214 | 1992 - 2020 | Portugal | Nascimento et al., 2024 & SNIRH Portugal |
| RRLuleaSweden | 1 | | 2 | 0 | 2016 - 2019 | Lulea (Sweden) | Broekhuizen et al., 2020 |
| Simbi | 70 | | 3 | 232 | 1920 - 1940 | Haiti | Bathelemy et al., 2024 |
| Slovenia | 117 | | 3 | 214 | 1950 - 2023 | Slovenia | Nascimento et al., 2024 & vode.arso.gov.si |
| Spain | 889 | | 27 | 35 | 1979 - 2020 | Spain | Peirong et al., 2023 & ceh-flumen64 |
| Thailand | 73 | | 27 | 35 | 1980 - 1999 | Thailand | Peirong et al., 2023 & RID project |
| USGS | 12004 | 1541 | 5 | 27 | 1950 - 2018 | USA | USGS NWIS |
| WaterBenchIowa | | 125 | 3 | 7 | 2011 - 2018 | Iowa (USA) | Demir et al., 2022 |
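
As a consistency check between the CAMELS_SE row (1961 - 2020) and the usage example, the number of daily time steps can be computed with the standard library. This assumes the series runs from 1 January 1961 to 31 December 2020 inclusive; the result matches the 21,915 rows returned by `dataset.fetch` for CAMELS_SE.

```python
from datetime import date


def n_daily_steps(start: date, end: date) -> int:
    """Number of daily time steps from start to end, both inclusive."""
    return (end - start).days + 1


# 60 years * 365 days + 15 leap days (1964, 1968, ..., 2020)
steps = n_daily_steps(date(1961, 1, 1), date(2020, 12, 31))
print(steps)  # 21915
```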

Summary of Water Quality Datasets

| Name | Variables Covered | Number of Stations | Temporal Coverage | Spatial Coverage | Ref. |
|------|-------------------|--------------------|-------------------|------------------|------|
| Busan Beach | 14 | 1 | 2018 - 2019 | Busan, South Korea | Jang et al., 2021 |
| Buzzards Bay | 64 | | 1992 - 2018 | Buzzards Bay (USA) | Jakuba et al., 2021 |
| CamelsChem | 28 | 671 | 1980 - 2018 | Conterminous USA | Sterle et al., 2024 |
| CamelsCHChem | 40 | 115 | 1980 - 2020 | Switzerland | Nascimento et al., 2025 |
| Ecoli Mekong River | 10 | | 2011 - 2021 | Mekong River (Houay Pano) | Boithias et al., 2022 |
| Ecoli Mekong River (Laos) | 10 | | 2011 - 2021 | Mekong River (Laos) | Boithias et al., 2022 |
| Ecoli Houay Pano (Laos) | 10 | | 2011 - 2021 | Houay Pano (Laos) | Boithias et al., 2022 |
| GRQA | 42 | | 1898 - 2020 | Global | Virro et al., 2021 |
| GRiMeDB | 1 | 5029 | 1973 - 2021 | Global | Stanley et al., 2023 |
| Oligotrend | 17 | 1846 | 1986 - 2022 | Global | Minaudo et al., 2025 |
| Quadica | 10 | 1386 | 1950 - 2018 | Germany | Ebeling et al., 2022 |
| RC4USCoast | 21 | 140 | 1850 - 2020 | USA | Gomez et al., 2022 |
| San Francisco Bay | 18 | | 1969 - 2015 | San Francisco Bay (USA) | Cloern et al., 2017 |
| Selune River | 5 | | 2021 - 2022 | Selune River (France) | Moustapha Ba et al., 2023 |
| Sylt Roads | 15 | 3 | 1973 - 2019 | North Sea (Arctic) | Rick et al., 2023 |
| SWatCh | 24 | 26322 | 1960 - 2022 | Global | Lobke et al., 2022 |
| White Clay Creek | 2 | | 1977 - 2017 | White Clay Creek (USA) | Newbold and Damiano, 2013 |

Summary of datasets related to wastewater treatment

| Treatment Process | Parameters | Target Pollutant | Data Points | Reference |
|-------------------|------------|------------------|-------------|-----------|
| Adsorption | 26 | Emerging contaminants | 3,757 | Jaffari et al., 2023 |
| Adsorption | 15 | Cr | 219 | Ishtiaq et al., 2024 |
| Adsorption | 30 | Cr(VI), Co(II), Sr(II), Ba(II), I, and Fe | 1,518 | Jaffari et al., 2023 |
| Adsorption | 30 | PO4 | 5,014 | Iftikhar et al., 2024 |
| Adsorption | 12 | Industrial dye | 1,514 | Iftikhar et al., 2023 |
| Adsorption | 17 | Cu, Zn, Pb, Cd, Ni, and As | 689 | Shen et al., 2023 |
| Adsorption | 8 | P | 504 | Leng et al., 2024 |
| Adsorption | 8 | N | 211 | Leng et al., 2024 |
| Adsorption | 13 | As | 1,605 | Huang et al., 2024 |
| Photocatalysis | 11 | Malachite green | 1,200 | Jaffari et al., 2023 |
| Photocatalysis | 23 | Dyes | 1,527 | Kim et al., 2024 |
| Photocatalysis | 15 | 2,4-Dichlorophenoxyacetic acid | 1,044 | Kim et al., 2024 |
| Photocatalysis | - | - | 2,078 | submitted et al., 2024 |
| Photocatalysis | 8 | Tetracycline | 374 | Abdi et al., 2022 |
| Photocatalysis | 7 | TiO2 | 446 | Jiang et al., 2020 |
| Photocatalysis | 8 | Multiple | 457 | Jiang et al., 2020 |
| Membrane | 18 | Micropollutants | 1,906 | Jeong et al., 2021 |
| Membrane | 18 | Salts | 1,586 | Jeong et al., 2023 |
| Sonolysis | 6 | Cyanobacteria | 314 | Jaffari et al., 2024 |

Owner

  • Name: KAUST HYdro-climatic EXtremes (HYEX) Research Group
  • Login: hyex-research
  • Kind: organization

JOSS Publication

AquaFetch: A Unified Python Interface for Water Resource Dataset Acquisition and Harmonization
Published
August 23, 2025
Volume 10, Issue 112, Page 8051
Authors
Ather Abbas ORCID
King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Sara Iftikhar ORCID
King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Hylke E. Beck ORCID
King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
Editor
Ethan White ORCID
Tags
modeling hydrology data water

GitHub Events

Total
  • Create event: 5
  • Release event: 1
  • Issues event: 2
  • Watch event: 6
  • Issue comment event: 10
  • Member event: 3
  • Push event: 81
  • Pull request event: 10
  • Fork event: 3
Last Year
  • Create event: 5
  • Release event: 1
  • Issues event: 2
  • Watch event: 6
  • Issue comment event: 10
  • Member event: 3
  • Push event: 81
  • Pull request event: 10
  • Fork event: 3

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 2
  • Total pull requests: 6
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 2
  • Pull requests: 6
  • Average time to close issues: N/A
  • Average time to close pull requests: less than a minute
  • Issue authors: 2
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tamnva (1)
  • cyn2003 (1)
Pull Request Authors
  • AtrCheema (6)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 161 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 4
  • Total maintainers: 1
pypi.org: aqua-fetch

A Unified Python Interface for Water Resource Data Acquisition and harmonization

  • Versions: 4
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 161 Last month
Rankings
Dependent packages count: 9.7%
Average: 32.3%
Dependent repos count: 54.8%
Maintainers (1)
Last synced: 4 months ago

Dependencies

.binder/requirements.txt pypi
  • imageio *
  • netcdf4 *
  • pyshp *
  • requests *
  • seaborn *
  • statsmodels *
  • xarray *
dev_requirements.txt pypi
  • easy_mpl * development
  • matplotlib * development
  • openpyxl * development
  • pandas <=2.1.4 development
  • requests * development
  • xarray <=2024.7.0 development
docs/requirements.txt pypi
  • ipykernel *
  • nbsphinx *
  • netcdf4 *
  • numpy ==1.26.4
  • openpyxl *
  • pandas ==2.1.4
  • requests *
  • scipy *
  • seaborn *
  • sphinx *
  • sphinx-gallery *
  • sphinx-prompt *
  • sphinx_copybutton *
  • sphinx_issues *
  • sphinx_rtd_theme ==2.0.0
  • sphinx_toggleprompt *
  • xarray ==2024.7.0
requirements.txt pypi
  • numpy *
  • pandas <=2.1.4
  • requests *
setup.py pypi