RangeExtractor

A performant way to extract subsections of arrays, under a tiling scheme. Meant for arrays with slow I/O.

https://github.com/asinghvi17/rangeextractor.jl

Science Score: 31.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.6%) to scientific vocabulary

Keywords

big-data io raster
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 7
  • Watchers: 2
  • Forks: 0
  • Open Issues: 3
  • Releases: 0
Topics
big-data io raster
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation

README.md

RangeExtractor

RangeExtractor.jl is a package for efficiently extracting and operating on subsets of large (out-of-memory) arrays. It is optimized for arrays that are very slow to load.

Installation

```julia
using Pkg
Pkg.add("RangeExtractor")

using RangeExtractor
```

Quick Start

```julia
using RangeExtractor

# Create sample array
array = ones(20, 20)

# Define regions of interest, as ranges of indices.
# RangeExtractor only accepts tuples of unit ranges.
ranges = [
    (1:4, 1:4),
    (9:20, 11:20),
    (1:15, 11:20),
    (11:20, 1:10)
]

# Define tiling scheme (10x10 tiles)
tiling_strategy = FixedGridTiling{2}(10)

# Extract results, by invoking `extract` with:
# - a function that takes an array and returns some value.
# - a do block, which is a convenient way to provide an anonymous function.
# - a TileOperation, which is a more flexible way to provide an operation.
# Here, we use a do block to sum the values in each range.
results = extract(array, ranges; strategy = tiling_strategy) do A
    sum(A)
end
```
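For comparison, the same per-range sums can be computed with plain Julia indexing on an in-memory array. This is only a reference computation using Base functions, not a description of how RangeExtractor works internally; one would expect `extract` to return matching values for this input, but that is an assumption rather than something verified here.

```julia
array = ones(20, 20)
ranges = [(1:4, 1:4), (9:20, 11:20), (1:15, 11:20), (11:20, 1:10)]

# Reference computation: take a view of each range directly and sum it.
# Each tuple of unit ranges is splatted into the view's index arguments.
reference = [sum(view(array, r...)) for r in ranges]
```

Because the array is all ones, each entry of `reference` equals the number of cells in the corresponding range.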

Key features

  • Multi-threaded, asynchronous processing: extract data from multiple tiles in parallel, and apply the operation to each tile in parallel.
  • Split computations efficiently across tiles: choose whether to materialize the whole requested range or to reduce each section to an intermediate product.
  • Flexible tiling schemes: define your own tiling scheme that encodes your knowledge of the data.
  • Completely flexible operations.

RangeExtractor.jl also integrates with Rasters.jl, so you can call Rasters.zonal(f, raster, strategy; of = geoms, ...) to use RangeExtractor to accelerate your zonal computations.

Generic to any Array

RangeExtractor.jl is designed to be generic to any array type, as long as it supports AbstractArray-like indexing.

Here's an example of using RangeExtractor.jl to calculate zonal statistics on a raster dataset, using a custom operation. This is faster single-threaded than Rasters.jl is multithreaded, since it can split computation a

```julia
using RangeExtractor
using Rasters, ArchGDAL
using RasterDataSources, NaturalEarth
import GeoInterface as GI

# Load raster dataset
ras = Raster(WorldClim{Climate}, :tmin, month=1)

# Get country polygons
countries = naturalearth("admin_0_countries", 10)

# Convert extents to index ranges
ranges = Rasters.dims2indices.((ras,), Rasters.Touches.(GI.extent.(countries.geometry)))

# Define tiling scheme
strategy = FixedGridTiling{2}(100)

# Define zonal statistics operation.
# Here, we use a TileOperation to define a fully custom operation.
# - `contained` is applied to each range that is fully contained within a tile,
#   and returns the final result for that range.
# - `shared` is applied to each range that is partially contained or shared with another tile,
#   and returns some intermediate result that is stored.
# - `combine` is applied to the results of all the `shared` operations for a range,
#   and returns the final result for that range.
op = TileOperation(
    contained = (x, meta) -> zonal(sum, x; of=meta),
    shared = (x, meta) -> zonal(sum, x; of=meta),
    combine = (results, args...) -> sum(results)
)

# Calculate zonal statistics
results = RangeExtractor.extract(
    op,                  # the operation to perform
    ras,                 # the raster to extract from
    ranges,              # the ranges to extract
    countries.geometry;  # the "metadata" - in this case, the polygons to calculate zonal statistics over
    strategy = strategy  # the tiling strategy to use
)
```
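The contained/shared/combine split described in the comments above can be illustrated with a small, package-independent sketch in plain Julia. The helpers `tile_span`, `tile_range`, and `tiled_sum` are hypothetical names introduced only for this illustration; they are not part of RangeExtractor's API, and the sketch assumes square tiles that start at index 1.

```julia
# Hypothetical helper: indices of the tiles that a unit range touches along one
# dimension, for tiles of side `tilesize` starting at index 1.
tile_span(r::UnitRange, tilesize::Int) = fld1(first(r), tilesize):fld1(last(r), tilesize)

# Hypothetical helper: the index range covered by tile `t` along one dimension.
tile_range(t::Int, tilesize::Int) = ((t - 1) * tilesize + 1):(t * tilesize)

# Sum a 2D range tile by tile: one partial result per touched tile, then combine.
function tiled_sum(array::AbstractMatrix, range::NTuple{2,UnitRange{Int}}, tilesize::Int)
    partials = Float64[]
    for tj in tile_span(range[2], tilesize), ti in tile_span(range[1], tilesize)
        # Portion of the requested range that falls inside this tile.
        rows = intersect(range[1], tile_range(ti, tilesize))
        cols = intersect(range[2], tile_range(tj, tilesize))
        # Analogue of the "shared" step: an intermediate result per tile.
        push!(partials, sum(view(array, rows, cols)))
    end
    # Analogue of the "combine" step: merge the per-tile intermediates.
    return sum(partials), length(partials)
end

array = ones(20, 20)
total, ntiles = tiled_sum(array, (9:20, 11:20), 10)      # spans two 10x10 tiles
total_small, ntiles_small = tiled_sum(array, (1:4, 1:4), 10)  # fits in one tile
```

When a range touches only one tile (`ntiles_small == 1` here), the single partial result is already the final answer, which corresponds to the `contained` case; otherwise the per-tile partials play the role of `shared` results that are merged by `combine`.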

Similar approaches elsewhere

  • exactextract in R and Python has a somewhat similar strategy for operating on large, out-of-memory rasters, but it is forced to keep all vector statistics materialized in memory. See https://isciences.github.io/exactextract/performance.html#the-raster-sequential-strategy. It does not support multithreading, or flexible user-defined operations.

Acknowledgements

This effort was funded by the NASA MEaSUREs program in contribution to the Inter-mission Time Series of Land Ice Velocity and Elevation (ITS_LIVE) project (https://its-live.jpl.nasa.gov/).

Owner

  • Name: Anshul Singhvi
  • Login: asinghvi17
  • Kind: user
  • Location: New York, NY
  • Company: Columbia University

BA student in Applied Physics, graduating May 2022.

Citation (CITATION.bib)

@misc{RangeExtractor.jl,
	author  = {Anshul Singhvi <anshulsinghvi@gmail.com> and contributors},
	title   = {RangeExtractor.jl},
	url     = {https://github.com/asinghvi17/RangeExtractor.jl},
	version = {v1.0.0-DEV},
	year    = {2024},
	month   = {11}
}

GitHub Events

Total
  • Create event: 2
  • Commit comment event: 4
  • Issues event: 11
  • Watch event: 8
  • Delete event: 2
  • Issue comment event: 33
  • Push event: 60
  • Pull request event: 6
Last Year
  • Create event: 2
  • Commit comment event: 4
  • Issues event: 11
  • Watch event: 8
  • Delete event: 2
  • Issue comment event: 33
  • Push event: 60
  • Pull request event: 6

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 8
  • Total pull requests: 3
  • Average time to close issues: 1 day
  • Average time to close pull requests: about 3 hours
  • Total issue authors: 4
  • Total pull request authors: 2
  • Average comments per issue: 4.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 2
Past Year
  • Issues: 8
  • Pull requests: 3
  • Average time to close issues: 1 day
  • Average time to close pull requests: about 3 hours
  • Issue authors: 4
  • Pull request authors: 2
  • Average comments per issue: 4.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 2
Top Authors
Issue Authors
  • asinghvi17 (3)
  • alex-s-gardner (3)
  • felixcremer (1)
  • JuliaTagBot (1)
Pull Request Authors
  • dependabot[bot] (3)
  • asinghvi17 (2)
Top Labels
Issue Labels
Pull Request Labels
dependencies (3) github_actions (1)

Packages

  • Total packages: 1
  • Total downloads: unknown
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
juliahub.com: RangeExtractor

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent repos count: 3.2%
Average: 9.8%
Dependent packages count: 16.3%
Last synced: 7 months ago

Dependencies

.github/workflows/CI.yml actions
  • actions/checkout v4 composite
  • codecov/codecov-action v4 composite
  • julia-actions/cache v2 composite
  • julia-actions/julia-buildpkg v1 composite
  • julia-actions/julia-docdeploy v1 composite
  • julia-actions/julia-processcoverage v1 composite
  • julia-actions/julia-runtest v1 composite
  • julia-actions/setup-julia v2 composite
.github/workflows/CompatHelper.yml actions
.github/workflows/TagBot.yml actions
  • JuliaRegistries/TagBot v1 composite