xagg

xagg: A Python package to aggregate gridded data onto polygons - Published in JOSS (2024)

https://github.com/ks905383/xagg

Science Score: 100.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
    1 of 6 committers (16.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

aggregation polygon python raster-data xarray

Keywords from Contributors

mesh

Scientific Fields

Mathematics Computer Science - 84% confidence
Earth and Environmental Sciences Physical Sciences - 43% confidence
Last synced: 6 months ago

Repository

Aggregating gridded data (xarray) to polygons

Basic Info
Statistics
  • Stars: 100
  • Watchers: 2
  • Forks: 15
  • Open Issues: 12
  • Releases: 15
Topics
aggregation polygon python raster-data xarray
Created almost 5 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

xagg


A package to aggregate gridded data in xarray to polygons in geopandas using area-weighting from the relative area overlaps between pixels and polygons.

Installation

The easiest way to install the latest version of xagg is using conda or mamba:

```
conda install -c conda-forge xagg==0.3.3.1
```

or

```
mamba install -c conda-forge xagg==0.3.3.1
```

We recommend installing xagg in a new environment whenever possible, to ensure all (sub)dependencies are correctly loaded.

Alternatively, you can use pip, though not all optional dependencies are available through pip, meaning that certain features may not be available: pip install xagg

Documentation

See the latest documentation at https://xagg.readthedocs.io/en/latest/index.html

Intro

Science often happens on grids: gridded weather products, interpolated pollution data, nighttime lights, and remote sensing all approximate the continuous real world for reasons of data resolution, processing time, or ease of calculation.

However, living things don't live on grids, and rarely play, act, or observe data on grids either. Instead, humans tend to work on the county, state, township, Bezirk, or city level; birds tend to fly along complex migratory corridors; and rain- and watersheds follow valleys and mountains.

So, whenever we need to work with both gridded and geographic data products, we need ways of getting them to match up. We may be interested, for example, in the average temperature over a county, or the average rainfall rate over a watershed.

Enter xagg.

xagg provides an easy-to-use (2 lines!), standardized way of aggregating raster data to polygons. All you need is some gridded data in an xarray Dataset or DataArray and some polygon data in a geopandas GeoDataFrame. Both of these are easy to use for the purposes of xagg - for example, all you need to use a shapefile is to open it:

```
import xarray as xr
import geopandas as gpd

# Gridded data file (netcdf/climate data)
ds = xr.open_dataset('file.nc')

# Shapefile (geopandas reads vector files with read_file)
gdf = gpd.read_file('file.shp')
```

xagg will then figure out the geographic grid (lat/lon) in ds, create polygons for each pixel, and then compute the intersections between every polygon in the GeoDataFrame and every pixel. For each polygon in the GeoDataFrame, the relative area of each covering pixel is calculated. For example, if a polygon (say, a US county) is the size and shape of a grid pixel but is split halfway between two pixels, the weight for each pixel will be 0.5, and the value of the gridded variables on that polygon will just be the average of both.
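The half-and-half case described above can be reproduced with a toy calculation. This is only a sketch of the idea, not xagg's implementation: it assumes axis-aligned rectangular pixels and a rectangular polygon on a flat plane (real lat/lon pixels have areas that vary with latitude), and the helper names `box_overlap_area` and `relative_overlap_weights` are made up for illustration.

```python
def box_overlap_area(a, b):
    """Overlap area of two axis-aligned boxes given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    # Negative width or height means the boxes do not overlap
    return max(width, 0.0) * max(height, 0.0)


def relative_overlap_weights(polygon, pixels):
    """Share of the polygon's covered area that falls in each pixel."""
    areas = [box_overlap_area(polygon, pixel) for pixel in pixels]
    total = sum(areas)
    if total == 0:
        raise ValueError("polygon does not overlap any pixel")
    return [area / total for area in areas]


# Two neighboring 1x1 pixels and their (hypothetical) gridded values
pixels = [(0.0, 0.0, 1.0, 1.0), (1.0, 0.0, 2.0, 1.0)]
pixel_values = [10.0, 20.0]

# A pixel-sized polygon split halfway between the two pixels
polygon = (0.5, 0.0, 1.5, 1.0)

weights = relative_overlap_weights(polygon, pixels)
aggregated_value = sum(w * v for w, v in zip(weights, pixel_values))

print(weights)           # [0.5, 0.5]
print(aggregated_value)  # 15.0
```

With equal weights of 0.5, the aggregated value is the plain average of the two pixel values, as the text states.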

Here is a sample code run, using the loaded files from above:

```
import xagg as xa

# Get overlap between pixels and polygons
weightmap = xa.pixel_overlaps(ds,gdf)

# Aggregate data in [ds] onto polygons
aggregated = xa.aggregate(ds,weightmap)

# aggregated can now be converted into an xarray dataset (using aggregated.to_dataset()),
# or a geopandas geodataframe (using aggregated.to_geodataframe() or aggregated.to_dataframe()
# for a pure pandas result), or directly exported to netcdf, csv, or shp files using
# aggregated.to_csv()/.to_netcdf()/.to_shp()
```

Researchers often need to weight their data by more than just its relative area overlap with a polygon (for example, should pixels with more population count for more?). xagg has built-in support for adding an additional weight grid (another xarray DataArray) into xagg.pixel_overlaps().
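To make the effect of an extra weight grid concrete, here is a small sketch that combines area fractions with population counts. It assumes the two weights are multiplied per pixel and then renormalized, which is one common convention; the text above does not spell out the exact formula xagg uses, and `combine_weights` and all of the numbers are hypothetical.

```python
def combine_weights(area_fractions, extra_weights):
    """Multiply area fractions by an extra weight per pixel, then renormalize."""
    products = [a * w for a, w in zip(area_fractions, extra_weights)]
    total = sum(products)
    if total == 0:
        raise ValueError("combined weights sum to zero")
    return [p / total for p in products]


# Two pixels that each cover half of a polygon
area_fractions = [0.5, 0.5]
# Hypothetical population living in each pixel
population = [3000.0, 1000.0]
# Hypothetical gridded values (for example, temperature)
values = [10.0, 20.0]

combined = combine_weights(area_fractions, population)
weighted_mean = sum(w * v for w, v in zip(combined, values))

print(combined)       # [0.75, 0.25]
print(weighted_mean)  # 12.5
```

Area weighting alone would give 15.0 here; the more populated pixel pulls the result toward its own value.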

Finally, xagg allows for direct exporting of the aggregated data in several commonly used data formats:

  • NetCDF
  • CSV for STATA, R
  • Shapefile for QGIS, further spatial processing

Best of all, xagg is flexible. Multiple variables in your dataset? xagg will aggregate them all, as long as they have at least lat/lon dimensions. Fields in your shapefile that you'd like to keep? xagg keeps all attributes/fields (for example FIPS codes from county datasets) all the way through the final export. Weird dimension names? xagg is trained to recognize all versions of "lat", "Latitude", "Y", "navlat", "Latitude1"... etc. that the author has run into over the years of working with climate data; and this list is easily expandable as a keyword argument if needed.
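The dimension-name matching can be pictured as a lookup against a list of known aliases. The sketch below is an illustration under stated assumptions, not xagg's code: the alias sets, the function name `standardize_dim_names`, and the `extra_lat`/`extra_lon` parameters are all invented here, and xagg's actual list and keyword argument may differ.

```python
# Hypothetical alias lists; xagg's real list may differ
LAT_ALIASES = {"lat", "latitude", "y", "nav_lat", "navlat", "latitude1"}
LON_ALIASES = {"lon", "longitude", "x", "nav_lon", "navlon", "longitude1"}


def standardize_dim_names(dim_names, extra_lat=(), extra_lon=()):
    """Build a rename mapping from recognized aliases to 'lat' and 'lon'."""
    lat_names = LAT_ALIASES | {name.lower() for name in extra_lat}
    lon_names = LON_ALIASES | {name.lower() for name in extra_lon}
    rename = {}
    for name in dim_names:
        key = name.lower()  # match case-insensitively
        if key in lat_names:
            rename[name] = "lat"
        elif key in lon_names:
            rename[name] = "lon"
    return rename


# Names that are not recognized (here "time") are left out of the mapping
rename = standardize_dim_names(["time", "Latitude", "X"])
print(rename)  # {'Latitude': 'lat', 'X': 'lon'}

# Unusual names can be supplied by the caller
custom = standardize_dim_names(["rlat", "rlon"], extra_lat=["rlat"], extra_lon=["rlon"])
print(custom)  # {'rlat': 'lat', 'rlon': 'lon'}
```

A mapping like this could then be passed to xarray's `Dataset.rename` to standardize a dataset before aggregating.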

How to support xagg

The easiest way to support xagg is to star the repository and spread the word!

Please also consider citing xagg if you use it in your research. The preferred citation can be found at the "Cite this repository" button in the About section on the top right of this page. It links to our paper in the Journal of Open Source Software (JOSS).

xagg, like much of open-source software, is a volunteer-run effort. It means a lot to the developers if you reach out and tell us that you're using our software, how it's helped you, and how it can be improved - it makes the long hours fixing bugs feel that much more worth it. (If you're feeling particularly generous, the lead developer would not say no to additional thanks through contributions to his tea fund through Ko-Fi ;) )

Getting Help and Contributing

If you have any questions about how to use xagg, please ask them in the GitHub Discussions forum!

If you spot a bug (xagg not working as advertised), please open an issue if it hasn't yet been raised (or comment on an existing one if you see it listed already). To make sure the issue gets solved as quickly as possible:

  • Include a minimally reproducible example that triggers the bug
  • Include a copy of your environment (for example, the output of conda list) in which the bug occurred

If you'd like to go the extra mile and help us fix the bug, feel free to contribute a pull request! We ask that any PR:

  • Follows a standard development workflow, like this one.
  • If fixing a bug, includes unit tests that fail when confronted with the original bug. GitHub Actions are set up to automatically run all tests in xagg/tests/ upon a push.

If there's a feature that you'd like xagg to have, please start a Discussion in the GitHub Discussions forum, or implement it yourself in a pull request.

For more information on contributing in general, the contribution guidelines to the xarray package are a great starting point (not everything will be directly relevant to xagg, but much of this guide is generally relevant!).

Use Cases

Climate econometrics

Many climate econometrics studies use societal data (mortality, crop yields, etc.) at a political or administrative level (for example, counties) but climate and weather data on grids. Oftentimes, further weighting by population or agricultural density is needed.

Area-weighting of pixels onto polygons ensures that aggregating weather and climate data onto polygons occurs in a robust way. Consider a (somewhat contrived) example: an administrative region lies in relatively flat lowlands, but a pixel that slightly overlaps the polygon primarily covers a wholly different climate (mountainous, desert, etc.). Using a simple mask would weight that pixel the same as the pixels that lie fully within the region, though its information is not necessarily relevant to the climate of the region. Population-weighting may not always be sufficient either; consider Los Angeles, which has multiple significantly different climates, all with high population densities.
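The difference between a simple mask and area-weighting in this contrived example can be put in numbers. The values below are hypothetical and the calculation is a plain weighted mean, shown only to illustrate the argument rather than how xagg computes it.

```python
# Overlap area of the region with each pixel, in units of one pixel's area:
# a lowland pixel fully inside the region and a mountain pixel it barely touches
overlap_areas = [1.0, 0.1]
# Hypothetical mean temperatures (degrees C) of the two pixels
temperatures = [25.0, 5.0]

# Simple mask: every pixel that touches the region counts equally
touching = [t for t, area in zip(temperatures, overlap_areas) if area > 0]
mask_mean = sum(touching) / len(touching)

# Area-weighting: each pixel counts in proportion to its overlap with the region
area_mean = sum(a * t for a, t in zip(overlap_areas, temperatures)) / sum(overlap_areas)

print(mask_mean)            # 15.0
print(round(area_mean, 2))  # 23.18
```

The mask average is dragged ten degrees below the lowland value by a pixel that covers under a tenth of the region, while the area-weighted average stays close to the lowland value.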

xagg allows simple population- and area-weighted averaging, and its export functions turn the aggregated data into output easily used in STATA or R for further calculations.


Project based on the cookiecutter science project template.

Owner

  • Name: Kevin Schwarzwald
  • Login: ks905383
  • Kind: user
  • Location: Harlem, NY
  • Company: IRI, Columbia, LDEO

Climate uncertainty, variability, and impacts by day, urban + transportation policy by evening, rock violin by night.

JOSS Publication

xagg: A Python package to aggregate gridded data onto polygons
Published
December 31, 2024
Volume 9, Issue 104, Page 7239
Authors
Kevin Schwarzwald ORCID
Lamont-Doherty Earth Observatory of Columbia University, Palisades, NY, USA, International Research Institute for Climate and Society, Palisades, NY, USA
Kerrie Geil
Geosystems Research Institute, Mississippi State University, Starkville, MS, USA
Editor
Chris Vernon ORCID
Tags
xarray geopandas raster data spatial statistics

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
- family-names: Schwarzwald
  given-names: Kevin
  orcid: "https://orcid.org/0000-0001-8309-7124"
- family-names: Geil
  given-names: Kerrie
contact:
- family-names: Schwarzwald
  given-names: Kevin
  orcid: "https://orcid.org/0000-0001-8309-7124"
doi: 10.5281/zenodo.13884871
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Schwarzwald
    given-names: Kevin
    orcid: "https://orcid.org/0000-0001-8309-7124"
  - family-names: Geil
    given-names: Kerrie
  date-published: 2024-12-31
  doi: 10.21105/joss.07239
  issn: 2475-9066
  issue: 104
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 7239
  title: "xagg: A Python package to aggregate gridded data onto
    polygons"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.07239"
  volume: 9
title: "xagg: A Python package to aggregate gridded data onto polygons"

GitHub Events

Total
  • Create event: 8
  • Release event: 4
  • Issues event: 4
  • Watch event: 15
  • Delete event: 4
  • Issue comment event: 8
  • Push event: 35
  • Pull request event: 10
  • Fork event: 1
Last Year
  • Create event: 8
  • Release event: 4
  • Issues event: 4
  • Watch event: 15
  • Delete event: 4
  • Issue comment event: 8
  • Push event: 35
  • Pull request event: 10
  • Fork event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 217
  • Total Committers: 6
  • Avg Commits per committer: 36.167
  • Development Distribution Score (DDS): 0.124
Past Year
  • Commits: 96
  • Committers: 3
  • Avg Commits per committer: 32.0
  • Development Distribution Score (DDS): 0.042
Top Committers
Name Email Commits
Kevin Schwarzwald k****d@g****m 190
kerriegeil k****t@g****m 11
dependabot[bot] 4****] 8
jsadler2 j****r@u****v 3
Ray Bell r****0@g****m 3
Jon-Paul Mastrogiacomo j****o@g****m 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 37
  • Total pull requests: 53
  • Average time to close issues: 8 months
  • Average time to close pull requests: 23 days
  • Total issue authors: 22
  • Total pull request authors: 8
  • Average comments per issue: 3.35
  • Average comments per pull request: 0.98
  • Merged pull requests: 49
  • Bot issues: 0
  • Bot pull requests: 9
Past Year
  • Issues: 8
  • Pull requests: 12
  • Average time to close issues: 15 days
  • Average time to close pull requests: about 9 hours
  • Issue authors: 6
  • Pull request authors: 4
  • Average comments per issue: 1.38
  • Average comments per pull request: 0.58
  • Merged pull requests: 11
  • Bot issues: 0
  • Bot pull requests: 3
Top Authors
Issue Authors
  • raybellwaves (6)
  • bradyrx (4)
  • jrising (3)
  • thurber (2)
  • ks905383 (2)
  • kerriegeil (2)
  • JPMastrogiacomo (2)
  • ccallahan45 (1)
  • jamesafranke (1)
  • masawdah (1)
  • econwiz (1)
  • jwyslmh (1)
  • rmcd-mscb (1)
  • dcherian (1)
  • helsharif (1)
Pull Request Authors
  • ks905383 (45)
  • dependabot[bot] (12)
  • kerriegeil (6)
  • raybellwaves (3)
  • JPMastrogiacomo (2)
  • jsadler2 (1)
  • masawdah (1)
  • Hugovdberg (1)
Top Labels
Issue Labels
Pull Request Labels
dependencies (12) github_actions (1)

Packages

  • Total packages: 3
  • Total downloads:
    • pypi 351 last-month
  • Total docker downloads: 11
  • Total dependent packages: 3
    (may contain duplicates)
  • Total dependent repositories: 1
    (may contain duplicates)
  • Total versions: 32
  • Total maintainers: 2
pypi.org: xagg

Aggregating raster data over polygons

  • Versions: 24
  • Dependent Packages: 2
  • Dependent Repositories: 1
  • Downloads: 337 Last month
  • Docker Downloads: 11
Rankings
Docker downloads count: 4.1%
Stargazers count: 8.4%
Dependent packages count: 10.1%
Downloads: 10.1%
Forks count: 10.5%
Average: 10.8%
Dependent repos count: 21.6%
Maintainers (1)
Last synced: 6 months ago
pypi.org: xagg-no-xesmf-deps

Aggregating raster data over polygons

  • Versions: 3
  • Dependent Packages: 1
  • Dependent Repositories: 0
  • Downloads: 14 Last month
Rankings
Dependent packages count: 7.2%
Stargazers count: 9.5%
Forks count: 12.1%
Average: 15.9%
Dependent repos count: 34.8%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: xagg
  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 34.0%
Stargazers count: 38.6%
Average: 42.1%
Forks count: 44.7%
Dependent packages count: 51.2%
Last synced: 6 months ago

Dependencies

setup.py pypi
  • cf_xarray >=0.5.1
  • esmpy >=8.1.0
  • geopandas *
  • netcdf4 *
  • numpy *
  • pandas *
  • scipy *
  • shapely *
  • tables *
  • xarray *
  • xesmf >=0.5.2
.github/workflows/optional/linting.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
  • pre-commit/action v2.0.0 composite
.github/workflows/release.yaml actions
  • actions/checkout v4 composite
  • actions/setup-python v4 composite
  • pypa/gh-action-pypi-publish master composite
.github/workflows/test.yaml actions
  • actions/checkout v4 composite
  • codecov/codecov-action v3 composite
  • mamba-org/provision-with-micromamba main composite
environment.yml conda
  • cf_xarray
  • geopandas >=0.12.0
  • netcdf4
  • numpy
  • pandas
  • pytables
  • pytest
  • scipy
  • xarray
  • xesmf >=0.7.1
docs/environment.yml pypi
  • numpydoc ==1.1.0
  • sphinx_rtd_theme ==0.5.1