async-retriever

A part of HyRiver software stack for asynchronous requests with persistent caching

https://github.com/hyriver/async-retriever

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 1 DOI reference(s) in README
✓ Academic publication links: Links to joss.theoj.org
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (4.9%) to scientific vocabulary

Keywords

async asyncio caching python requests

Keywords from Contributors

hydrology mesh genetic-algorithm pipeline-testing datacleaner data-profilers daymet dem usgs webservices

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: hyriver
License: other
Language: Python
Default Branch: main
Homepage: https://docs.hyriver.io
Size: 643 KB

Statistics

Stars: 4
Watchers: 1
Forks: 1
Open Issues: 0
Releases: 29

Topics

async asyncio caching python requests

Created about 5 years ago · Last pushed about 1 year ago

Metadata Files

Readme Changelog Contributing Funding License Code of conduct Citation Authors

README.rst

.. image:: https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/async_retriever_logo.png
    :target: https://github.com/hyriver/HyRiver

|

.. image:: https://joss.theoj.org/papers/b0df2f6192f0a18b9e622a3edff52e77/status.svg
    :target: https://joss.theoj.org/papers/b0df2f6192f0a18b9e622a3edff52e77
    :alt: JOSS

|

.. |pygeohydro| image:: https://github.com/hyriver/pygeohydro/actions/workflows/test.yml/badge.svg
    :target: https://github.com/hyriver/pygeohydro/actions/workflows/test.yml
    :alt: Github Actions

.. |pygeoogc| image:: https://github.com/hyriver/pygeoogc/actions/workflows/test.yml/badge.svg
    :target: https://github.com/hyriver/pygeoogc/actions/workflows/test.yml
    :alt: Github Actions

.. |pygeoutils| image:: https://github.com/hyriver/pygeoutils/actions/workflows/test.yml/badge.svg
    :target: https://github.com/hyriver/pygeoutils/actions/workflows/test.yml
    :alt: Github Actions

.. |pynhd| image:: https://github.com/hyriver/pynhd/actions/workflows/test.yml/badge.svg
    :target: https://github.com/hyriver/pynhd/actions/workflows/test.yml
    :alt: Github Actions

.. |py3dep| image:: https://github.com/hyriver/py3dep/actions/workflows/test.yml/badge.svg
    :target: https://github.com/hyriver/py3dep/actions/workflows/test.yml
    :alt: Github Actions

.. |pydaymet| image:: https://github.com/hyriver/pydaymet/actions/workflows/test.yml/badge.svg
    :target: https://github.com/hyriver/pydaymet/actions/workflows/test.yml
    :alt: Github Actions

.. |pygridmet| image:: https://github.com/hyriver/pygridmet/actions/workflows/test.yml/badge.svg
    :target: https://github.com/hyriver/pygridmet/actions/workflows/test.yml
    :alt: Github Actions

.. |pynldas2| image:: https://github.com/hyriver/pynldas2/actions/workflows/test.yml/badge.svg
    :target: https://github.com/hyriver/pynldas2/actions/workflows/test.yml
    :alt: Github Actions

.. |async| image:: https://github.com/hyriver/async-retriever/actions/workflows/test.yml/badge.svg
    :target: https://github.com/hyriver/async-retriever/actions/workflows/test.yml
    :alt: Github Actions

.. |signatures| image:: https://github.com/hyriver/hydrosignatures/actions/workflows/test.yml/badge.svg
    :target: https://github.com/hyriver/hydrosignatures/actions/workflows/test.yml
    :alt: Github Actions

================ ====================================================================
Package          Description
================ ====================================================================
PyNHD_           Navigate and subset NHDPlus (MR and HR) using web services
Py3DEP_          Access topographic data through National Map's 3DEP web service
PyGeoHydro_      Access NWIS, NID, WQP, eHydro, NLCD, CAMELS, and SSEBop databases
PyDaymet_        Access daily, monthly, and annual climate data via Daymet
PyGridMET_       Access daily climate data via GridMET
PyNLDAS2_        Access hourly NLDAS-2 data via web services
HydroSignatures_ A collection of tools for computing hydrological signatures
AsyncRetriever_  High-level API for asynchronous requests with persistent caching
PyGeoOGC_        Send queries to any ArcGIS RESTful-, WMS-, and WFS-based services
PyGeoUtils_      Utilities for manipulating geospatial, (Geo)JSON, and (Geo)TIFF data
================ ====================================================================

.. _PyGeoHydro: https://github.com/hyriver/pygeohydro
.. _AsyncRetriever: https://github.com/hyriver/async-retriever
.. _PyGeoOGC: https://github.com/hyriver/pygeoogc
.. _PyGeoUtils: https://github.com/hyriver/pygeoutils
.. _PyNHD: https://github.com/hyriver/pynhd
.. _Py3DEP: https://github.com/hyriver/py3dep
.. _PyDaymet: https://github.com/hyriver/pydaymet
.. _PyGridMET: https://github.com/hyriver/pygridmet
.. _PyNLDAS2: https://github.com/hyriver/pynldas2
.. _HydroSignatures: https://github.com/hyriver/hydrosignatures

AsyncRetriever: Asynchronous requests with persistent caching
-------------------------------------------------------------

.. image:: https://img.shields.io/pypi/v/async-retriever.svg
    :target: https://pypi.python.org/pypi/async-retriever
    :alt: PyPi

.. image:: https://img.shields.io/conda/vn/conda-forge/async-retriever.svg
    :target: https://anaconda.org/conda-forge/async-retriever
    :alt: Conda Version

.. image:: https://codecov.io/gh/hyriver/async-retriever/branch/main/graph/badge.svg
    :target: https://codecov.io/gh/hyriver/async-retriever
    :alt: CodeCov

.. image:: https://img.shields.io/pypi/pyversions/async-retriever.svg
    :target: https://pypi.python.org/pypi/async-retriever
    :alt: Python Versions

.. image:: https://static.pepy.tech/badge/async-retriever
    :target: https://pepy.tech/project/async-retriever
    :alt: Downloads

|

.. image:: https://img.shields.io/badge/security-bandit-green.svg
    :target: https://github.com/PyCQA/bandit
    :alt: Security Status

.. image:: https://www.codefactor.io/repository/github/hyriver/async-retriever/badge
   :target: https://www.codefactor.io/repository/github/hyriver/async-retriever
   :alt: CodeFactor

.. image:: https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json
    :target: https://github.com/astral-sh/ruff
    :alt: Ruff

.. image:: https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit&logoColor=white
    :target: https://github.com/pre-commit/pre-commit
    :alt: pre-commit

|

Features
--------

AsyncRetriever is part of the `HyRiver <https://github.com/hyriver/HyRiver>`__ software stack, which
is designed to aid in hydroclimate analysis through web services. This package serves as HyRiver's
engine for asynchronously sending requests and retrieving responses as ``text``, ``binary``, or
``json`` objects. It relies on ``aiohttp-client-cache`` for persistent caching, which speeds up
repeated retrievals. Moreover, thanks to ``nest_asyncio``,
you can use this package in Jupyter notebooks. Although this package is part of the HyRiver
software stack, it can be used for any web calls. There are four functions that you can
use to make web calls:

* ``retrieve_text``: Get responses as ``text`` objects.
* ``retrieve_binary``: Get responses as ``binary`` objects.
* ``retrieve_json``: Get responses as ``json`` objects.
* ``stream_write``: Stream responses and write them to disk in chunks.

You can also use the general-purpose ``retrieve`` function to get responses as any
of the three types. All responses are returned as a list that has the same order as the
input list of requests. Moreover, the ``delete_url_cache`` function
removes from a cache file all cached requests that contain a given URL.
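
The ordering guarantee can be illustrated without any network access. The following is
a minimal sketch of the general ``asyncio`` pattern, not AsyncRetriever's actual
implementation: ``fake_fetch`` and ``fetch_all`` are hypothetical helpers, the
``example.com`` URLs are placeholders, and the ``max_workers`` name only mirrors the
argument used in the Quick start section.

.. code-block:: python

    import asyncio


    async def fake_fetch(url: str, delay: float) -> str:
        """Stand-in for a network call; sleeps instead of contacting a server."""
        await asyncio.sleep(delay)
        return f"response for {url}"


    async def fetch_all(urls: list[str], delays: list[float], max_workers: int = 2) -> list[str]:
        """Run all fetches concurrently, with at most ``max_workers`` in flight."""
        semaphore = asyncio.Semaphore(max_workers)

        async def bounded(url: str, delay: float) -> str:
            async with semaphore:
                return await fake_fetch(url, delay)

        # gather returns results in the order the awaitables were passed in,
        # regardless of which one finishes first.
        return await asyncio.gather(*(bounded(u, d) for u, d in zip(urls, delays)))


    urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
    # The first request is the slowest, yet its response is still first in the list.
    responses = asyncio.run(fetch_all(urls, delays=[0.03, 0.01, 0.02]))

Because the output order follows the input order, responses can be matched to their
requests with a plain ``zip(urls, responses)``.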

You can control the request/response caching behavior and verbosity of the package
by setting the following environment variables:

* ``HYRIVER_CACHE_NAME``: Path to the caching SQLite database. It defaults to
  ``./cache/aiohttp_cache.sqlite``.
* ``HYRIVER_CACHE_EXPIRE``: Expiration time for cached requests in seconds. It defaults to
  one week.
* ``HYRIVER_CACHE_DISABLE``: Disable reading/writing from/to the cache. The default is false.
* ``HYRIVER_SSL_CERT``: Path to an SSL certificate file.

For example, in your code before making any requests you can do:

.. code-block:: python

    import os

    os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
    os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
    os.environ["HYRIVER_CACHE_DISABLE"] = "true"
    os.environ["HYRIVER_SSL_CERT"] = "path/to/cert.pem"
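
If the settings should apply only to part of a script, one option is to set the
variables temporarily and restore the previous values afterwards. The sketch below uses
only the standard library; ``hyriver_env`` is a hypothetical helper, not part of
AsyncRetriever, and it assumes that the variables are read at the time a request is
made, which has not been verified here.

.. code-block:: python

    import os
    from contextlib import contextmanager
    from datetime import timedelta


    @contextmanager
    def hyriver_env(**settings: str):
        """Temporarily set environment variables, then restore the old values."""
        previous = {name: os.environ.get(name) for name in settings}
        os.environ.update(settings)
        try:
            yield
        finally:
            for name, old in previous.items():
                if old is None:
                    os.environ.pop(name, None)
                else:
                    os.environ[name] = old


    # The expiration variable is given in seconds, so convert a timedelta explicitly.
    one_day = str(int(timedelta(days=1).total_seconds()))

    with hyriver_env(HYRIVER_CACHE_EXPIRE=one_day, HYRIVER_CACHE_DISABLE="false"):
        # Requests made here would see the temporary settings.
        inside = os.environ["HYRIVER_CACHE_EXPIRE"]

Environment variable values are always strings, which is why the number of seconds is
converted with ``str`` before it is assigned.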

You can find some example notebooks in the
`HyRiver-examples <https://github.com/hyriver/HyRiver-examples>`__ repository.

You can also try using AsyncRetriever without installing
it on your system by clicking on the binder badge. A Jupyter Lab
instance with the HyRiver stack pre-installed will be launched in your web browser, and you
can start coding!

Moreover, requests for additional functionalities can be submitted via
the project's issue tracker.

Citation
--------
If you use any of the HyRiver packages in your research, we appreciate citations:

.. code-block:: bibtex

    @article{Chegini_2021,
        author = {Chegini, Taher and Li, Hong-Yi and Leung, L. Ruby},
        doi = {10.21105/joss.03175},
        journal = {Journal of Open Source Software},
        month = {10},
        number = {66},
        pages = {1--3},
        title = {{HyRiver: Hydroclimate Data Retriever}},
        volume = {6},
        year = {2021}
    }

Installation
------------

You can install ``async-retriever`` using ``pip``:

.. code-block:: console

    $ pip install async-retriever

Alternatively, ``async-retriever`` can be installed from the ``conda-forge`` repository
using Conda:

.. code-block:: console

    $ conda install -c conda-forge async-retriever

Quick start
-----------

By default, AsyncRetriever creates and/or uses ``./cache/aiohttp_cache.sqlite`` as the cache,
which you can customize with the ``cache_name`` argument. Cached responses expire after one week
by default (see ``HYRIVER_CACHE_EXPIRE`` above), so use the ``delete_url_cache`` function if you
know that a database on a server was updated and you want to retrieve the latest data sooner.
Alternatively, you can use the ``expire_after`` argument to set the expiration time for the cache.
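
Since the cache is an ordinary SQLite file, it can be inspected with Python's built-in
``sqlite3`` module. The sketch below only lists the tables in the file and makes no
assumption about the schema, which belongs to ``aiohttp-client-cache``;
``list_cache_tables`` is a hypothetical helper and the path is the default mentioned
above.

.. code-block:: python

    import sqlite3
    from pathlib import Path


    def list_cache_tables(db_path: str) -> list[str]:
        """Return the table names stored in a SQLite file, or [] if it is missing."""
        path = Path(db_path)
        if not path.exists():
            return []
        # Open read-only so that inspecting the cache cannot modify it.
        uri = path.resolve().as_uri() + "?mode=ro"
        conn = sqlite3.connect(uri, uri=True)
        try:
            rows = conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            ).fetchall()
        finally:
            conn.close()
        return [name for (name,) in rows]


    tables = list_cache_tables("./cache/aiohttp_cache.sqlite")

An empty list means either that no cache file exists yet at that path or that nothing
has been written to it.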

As an example of retrieving a ``binary`` response, let's use the DAAC server to get
NDVI data.
The responses can be directly passed to ``xarray.open_mfdataset`` to get the data as
an ``xarray`` Dataset. We can also disable SSL certificate verification by setting
``ssl=False``.

.. code-block:: python

    import io
    import xarray as xr
    import async_retriever as ar
    from datetime import datetime

    west, south, east, north = (-69.77, 45.07, -69.31, 45.45)
    base_url = "https://thredds.daac.ornl.gov/thredds/ncss/ornldaac/1299"
    dates_itr = ((datetime(y, 1, 1), datetime(y, 1, 31)) for y in range(2000, 2005))
    urls, kwds = zip(
        *[
            (
                f"{base_url}/MCD13.A{s.year}.unaccum.nc4",
                {
                    "params": {
                        "var": "NDVI",
                        "north": f"{north}",
                        "west": f"{west}",
                        "east": f"{east}",
                        "south": f"{south}",
                        "disableProjSubset": "on",
                        "horizStride": "1",
                        "time_start": s.strftime("%Y-%m-%dT%H:%M:%SZ"),
                        "time_end": e.strftime("%Y-%m-%dT%H:%M:%SZ"),
                        "timeStride": "1",
                        "addLatLon": "true",
                        "accept": "netcdf",
                    }
                },
            )
            for s, e in dates_itr
        ]
    )
    resp = ar.retrieve_binary(urls, kwds, max_workers=8, ssl=False)
    data = xr.open_mfdataset(io.BytesIO(r) for r in resp)

We can remove these requests and their responses from the cache like so:

.. code-block:: python

    ar.delete_url_cache(base_url)

.. image:: https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/ndvi.png
    :target: https://github.com/hyriver/HyRiver-examples/blob/main/notebooks/async.ipynb
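
The ``zip(*[...])`` idiom in the NDVI example turns a list of ``(url, kwargs)`` pairs
into two parallel tuples, one of URLs and one of keyword-argument dictionaries. Here is
a small standalone sketch of that idiom; the ``example.com`` URLs and the ``year``
parameter are placeholders chosen for illustration only.

.. code-block:: python

    # Each entry pairs a URL with the keyword arguments for that request.
    pairs = [
        (f"https://example.com/data/{year}.nc", {"params": {"year": str(year)}})
        for year in range(2000, 2003)
    ]

    # zip(*pairs) transposes the list of pairs into two parallel tuples.
    urls, kwds = zip(*pairs)

Note that the unpacking raises a ``ValueError`` when ``pairs`` is empty, so an empty
request list needs to be handled separately.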

For a ``json`` response example, let's get the water level recordings of a NOAA water level
station, 8534720 (Atlantic City, NJ), during 2012, using the CO-OPS API. Note that this CO-OPS
product has a 31-day limit for a single request, so we have to break the request down accordingly.

.. code-block:: python

    import pandas as pd

    station_id = "8534720"
    start = pd.to_datetime("2012-01-01")
    end = pd.to_datetime("2012-12-31")

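    # Build (start, end) pairs, one per calendar month, so that
    # no single request spans more than 31 days.
    s = start
    dates = []
    for e in pd.date_range(start, end, freq="m"):
        dates.append((s.date(), e.date()))
        s = e + pd.offsets.MonthBegin()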

    url = "https://api.tidesandcurrents.noaa.gov/api/prod/datagetter"

    urls, kwds = zip(
        *[
            (
                url,
                {
                    "params": {
                        "product": "water_level",
                        "application": "web_services",
                        "begin_date": s.strftime("%Y%m%d"),
                        "end_date": e.strftime("%Y%m%d"),
                        "datum": "MSL",
                        "station": station_id,
                        "time_zone": "GMT",
                        "units": "metric",
                        "format": "json",
                    }
                },
            )
            for s, e in dates
        ]
    )

    resp = ar.retrieve_json(urls, kwds)
    wl_list = []
    for rjson in resp:
        wl = pd.DataFrame.from_dict(rjson["data"])
        wl["t"] = pd.to_datetime(wl.t)
        wl = wl.set_index(wl.t).drop(columns="t")
        wl["v"] = pd.to_numeric(wl.v, errors="coerce")
        wl_list.append(wl)
    water_level = pd.concat(wl_list).sort_index()
    water_level.attrs = rjson["metadata"]

.. image:: https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/water_level.png
    :target: https://github.com/hyriver/HyRiver-examples/blob/main/notebooks/async.ipynb
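
Splitting by calendar month works here because no month is longer than 31 days. For
other request limits, a more general approach is to cut the period into fixed-length
windows. The following standard-library sketch assumes that the limit counts both the
first and the last day of a window; ``split_dates`` is a hypothetical helper, not part
of AsyncRetriever.

.. code-block:: python

    from datetime import date, timedelta


    def split_dates(start: date, end: date, max_days: int = 31) -> list[tuple[date, date]]:
        """Split [start, end] into consecutive windows of at most ``max_days`` days."""
        if max_days < 1:
            raise ValueError("max_days must be at least 1")
        windows = []
        current = start
        while current <= end:
            # A window of max_days calendar days ends max_days - 1 days after it starts.
            stop = min(current + timedelta(days=max_days - 1), end)
            windows.append((current, stop))
            current = stop + timedelta(days=1)
        return windows


    windows = split_dates(date(2012, 1, 1), date(2012, 12, 31))

Each ``(start, end)`` pair can then be formatted with ``strftime("%Y%m%d")`` and used
for the ``begin_date`` and ``end_date`` parameters, as in the example above.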

Now, let's see an example without any payload or headers. Here's how we can retrieve
harmonic constituents of several NOAA stations from CO-OPS:

.. code-block:: python

    stations = [
        "8410140",
        "8411060",
        "8413320",
        "8418150",
        "8419317",
        "8419870",
        "8443970",
        "8447386",
    ]

    base_url = "https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations"
    urls = [f"{base_url}/{i}/harcon.json?units=metric" for i in stations]
    resp = ar.retrieve_json(urls)

    amp_list = []
    phs_list = []
    for rjson in resp:
        sid = rjson["self"].rsplit("/", 2)[1]
        const = pd.DataFrame.from_dict(rjson["HarmonicConstituents"]).set_index("name")
        amp = const.rename(columns={"amplitude": sid})[sid]
        phase = const.rename(columns={"phase_GMT": sid})[sid]
        amp_list.append(amp)
        phs_list.append(phase)

    amp = pd.concat(amp_list, axis=1)
    phs = pd.concat(phs_list, axis=1)

.. image:: https://raw.githubusercontent.com/hyriver/HyRiver-examples/main/notebooks/_static/tides.png
    :target: https://github.com/hyriver/HyRiver-examples/blob/main/notebooks/async.ipynb
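
The station ID in the loop above is recovered from the ``self`` field of each response
with ``rsplit``. The sketch below shows what that call does on a single string. The
value of ``self_link`` is an assumption about the shape of that field (the request URL
without its query string), not a recorded server response.

.. code-block:: python

    base_url = "https://api.tidesandcurrents.noaa.gov/mdapi/prod/webapi/stations"
    # Assumed shape of the "self" field: the request URL without its query string.
    self_link = f"{base_url}/8410140/harcon.json"

    # Split at most twice from the right: [prefix, station ID, file name].
    prefix, sid, filename = self_link.rsplit("/", 2)

Splitting from the right keeps the extraction independent of how many path segments
precede the station ID.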

Contributing
------------

Contributions are appreciated and very welcome. Please read
``CONTRIBUTING.rst`` for instructions.

Owner

Name: HyRiver
Login: hyriver
Kind: organization
Location: United States of America

Website: https://docs.hyriver.io
Repositories: 11
Profile: https://github.com/hyriver

A suite of Python packages that provides a unified API for retrieving geospatial/temporal data from various web services

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Chegini"
  given-names: "Taher"
  orcid: "https://orcid.org/0000-0002-5430-6000"
- family-names: "Li"
  given-names: "Hong-Yi"
  orcid: "https://orcid.org/0000-0002-9807-3851"
- family-names: "Leung"
  given-names: "L. Ruby"
  orcid: "https://orcid.org/0000-0002-3221-9467"
title: "HyRiver: Hydroclimate Data Retriever"
version: 0.11
doi: 10.21105/joss.03175
date-released: 2021-10-27
url: "https://github.com/cheginit/HyRiver"
preferred-citation:
  type: article
  authors:
  - family-names: "Chegini"
    given-names: "Taher"
    orcid: "https://orcid.org/0000-0002-5430-6000"
  - family-names: "Li"
    given-names: "Hong-Yi"
    orcid: "https://orcid.org/0000-0002-9807-3851"
  - family-names: "Leung"
    given-names: "L. Ruby"
    orcid: "https://orcid.org/0000-0002-3221-9467"
  doi: "10.21105/joss.03175"
  journal: "Journal of Open Source Software"
  month: 10
  start: 1
  end: 3
  title: "HyRiver: Hydroclimate Data Retriever"
  issue: 66
  volume: 6
  year: 2021

GitHub Events

Total

Create event: 5
Release event: 3
Issues event: 4
Delete event: 1
Issue comment event: 5
Push event: 16
Pull request event: 3

Last Year

Create event: 5
Release event: 3
Issues event: 4
Delete event: 1
Issue comment event: 5
Push event: 16
Pull request event: 3

Committers

Last synced: over 2 years ago

All Time

Total Commits: 634
Total Committers: 5
Avg Commits per committer: 126.8
Development Distribution Score (DDS): 0.069

Past Year

Commits: 187
Committers: 3
Avg Commits per committer: 62.333
Development Distribution Score (DDS): 0.037

Top Committers

Name	Email	Commits
cheginit	c**t@g**m	590
dependabot[bot]	4****]	23
pre-commit-ci[bot]	6****]	16
Taher Chegini	t**i@g**m	4
DeepSource Bot	b**t@d**o	1

Committer Domains (Top 20 + Academic)

deepsource.io: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 3
Total pull requests: 49
Average time to close issues: about 1 month
Average time to close pull requests: about 15 hours
Total issue authors: 3
Total pull request authors: 2
Average comments per issue: 6.33
Average comments per pull request: 0.88
Merged pull requests: 40
Bot issues: 0
Bot pull requests: 49

Past Year

Issues: 2
Pull requests: 3
Average time to close issues: about 1 month
Average time to close pull requests: 1 day
Issue authors: 2
Pull request authors: 1
Average comments per issue: 6.0
Average comments per pull request: 1.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 3

Top Authors

Issue Authors

hrvg (1)
wmcaliley-usgs (1)
rmcd-mscb (1)

Pull Request Authors

dependabot[bot] (35)
pre-commit-ci[bot] (19)

Top Labels

Issue Labels

bug (3)

Pull Request Labels

dependencies (35)

Packages

Total packages: 3
Total downloads:
- pypi 23,743 last-month

Total dependent packages: 19
(may contain duplicates)
Total dependent repositories: 12
(may contain duplicates)
Total versions: 45
Total maintainers: 1

pypi.org: async-retriever

High-level API for asynchronous requests with persistent caching.

Homepage: https://docs.hyriver.io/readme/async-retriever.html
Documentation: https://async-retriever.readthedocs.io/
License: MIT
Latest release: 0.19.3
published over 1 year ago

Versions: 29
Dependent Packages: 9
Dependent Repositories: 10
Downloads: 23,743 Last month

Rankings

Dependent packages count: 1.1%

Dependent repos count: 4.6%

Downloads: 4.9%

Average: 13.1%

Stargazers count: 25.0%

Forks count: 29.8%

Maintainers (1)

tchegini

Last synced: 11 months ago

conda-forge.org: async_retriever

Homepage: https://github.com/hyriver/async-retriever
License: MIT
Latest release: 0.3.6
published almost 4 years ago

Versions: 14
Dependent Packages: 5
Dependent Repositories: 1

Rankings

Dependent packages count: 10.4%

Dependent repos count: 24.4%

Average: 40.8%

Stargazers count: 62.2%

Forks count: 66.1%

Last synced: 11 months ago

conda-forge.org: async-retriever

Homepage: https://github.com/hyriver/async-retriever
License: MIT
Latest release: 0.3.6
published almost 4 years ago

Versions: 2
Dependent Packages: 5
Dependent Repositories: 1

Rankings

Dependent packages count: 10.4%

Dependent repos count: 24.4%

Average: 40.8%

Stargazers count: 62.2%

Forks count: 66.1%

Last synced: 11 months ago

Dependencies

.github/workflows/codeql-analysis.yml actions

actions/checkout v3 composite
github/codeql-action/analyze v2 composite
github/codeql-action/autobuild v2 composite
github/codeql-action/init v2 composite

.github/workflows/pre-commit.yml actions

actions/checkout v3 composite
excitedleigh/setup-nox v2.1.0 composite

.github/workflows/release.yml actions

actions/checkout v3 composite
actions/setup-python master composite
docker://pandoc/core * composite
pypa/gh-action-pypi-publish master composite
softprops/action-gh-release v1 composite

.github/workflows/test.yml actions

actions/checkout v3 composite
codecov/codecov-action v3 composite
mamba-org/provision-with-micromamba main composite

pyproject.toml pypi

aiohttp [speedups]>=3.8.3
aiohttp-client-cache >=0.8.1
aiosqlite *
cytoolz *
ujson *

ci/requirements/environment.yml conda

aiodns
aiohttp >=3.8.3
aiohttp-client-cache >=0.8.1
aiosqlite
brotli
cytoolz
nest-asyncio
psutil
pytest-cov
pytest-xdist
ujson