RangeExtractor
A performant way to extract subsections of arrays, under a tiling scheme. Meant for arrays with slow I/O.
Science Score: 31.0%
This score indicates how likely this project is to be science-related, based on the following indicators:
- ✓ CITATION.cff file: found
- ✓ codemeta.json file: found
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (13.6%) to scientific vocabulary
Keywords
Repository
Basic Info
- Host: GitHub
- Owner: asinghvi17
- License: mit
- Language: Julia
- Default Branch: main
- Homepage: https://asinghvi17.github.io/RangeExtractor.jl/
- Size: 390 KB
Statistics
- Stars: 7
- Watchers: 2
- Forks: 0
- Open Issues: 3
- Releases: 0
Topics
Metadata Files
README.md
RangeExtractor
RangeExtractor.jl is a package for efficiently extracting and operating on subsets of large (out-of-memory) arrays. It is optimized for arrays that are very slow to load.
Installation
```julia
using Pkg
Pkg.add("RangeExtractor")

using RangeExtractor
```
Quick Start
```julia
using RangeExtractor

# Create sample array
array = ones(20, 20)

# Define regions of interest, as ranges of indices.
# RangeExtractor only accepts tuples of unit ranges.
ranges = [
    (1:4, 1:4),
    (9:20, 11:20),
    (1:15, 11:20),
    (11:20, 1:10)
]

# Define tiling scheme (10x10 tiles)
tiling_strategy = FixedGridTiling{2}(10)

# Extract results, by invoking `extract` with:
# - a function that takes an array and returns some value.
# - a do block, which is a convenient way to provide an anonymous function.
# - a TileOperation, which is a more flexible way to provide an operation.
# Here, we use a do block to sum the values in each range.
results = extract(array, ranges; strategy = tiling_strategy) do A
    sum(A)
end
```
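To make the relationship between ranges and tiles concrete, the sketch below lists which 10x10 tiles each Quick Start range overlaps. It uses only Base Julia; `overlapping_tiles` is a hypothetical helper written for this illustration and is not part of RangeExtractor.jl, nor a description of how the package computes tiles internally.

```julia
# Hypothetical helper, not part of RangeExtractor.jl: list the fixed-grid tiles
# (as tuples of 1-based tile indices) that an N-dimensional range overlaps.
function overlapping_tiles(rng::NTuple{N, UnitRange{Int}}, tilesize::Int) where {N}
    # Along each axis, index i falls in tile cld(i, tilesize).
    per_axis = map(r -> cld(first(r), tilesize):cld(last(r), tilesize), rng)
    return vec(collect(Iterators.product(per_axis...)))
end

ranges = [(1:4, 1:4), (9:20, 11:20), (1:15, 11:20), (11:20, 1:10)]

for r in ranges
    tiles = overlapping_tiles(r, 10)
    kind = length(tiles) == 1 ? "contained in one tile" : "shared across $(length(tiles)) tiles"
    println(r, " => ", tiles, " (", kind, ")")
end
```

Under this grid, the first and last ranges each sit inside a single tile, while the two middle ranges straddle a tile boundary along the first axis.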
Key features
- Multi-threaded, asynchronous processing: extract data from multiple tiles in parallel, and apply the operation to each tile in parallel.
- Split computations efficiently across tiles: choose whether to materialize the whole requested range or to reduce each section to an intermediate product.
- Flexible tiling schemes: define your own tiling scheme that encodes your knowledge of the data.
- Completely flexible operations.
RangeExtractor.jl also integrates with Rasters.jl, so you can call Rasters.zonal(f, raster, strategy; of = geoms, ...) to use RangeExtractor to accelerate your zonal computations.
Generic to any Array
RangeExtractor.jl is designed to be generic to any array type, as long as it supports AbstractArray-like indexing.
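As a rough illustration of what "AbstractArray-like indexing" involves, the sketch below defines a minimal custom array type in Base Julia. `CountingArray` is a hypothetical wrapper invented for this example (it is not part of RangeExtractor.jl); it counts scalar reads as a stand-in for expensive I/O. Whether a given custom type works with `extract` depends on the package, so treat this as an assumption to verify rather than a guarantee.

```julia
# Hypothetical wrapper type for illustration; not part of RangeExtractor.jl.
# It implements the minimal AbstractArray interface (size + getindex) and
# counts scalar reads to stand in for expensive I/O.
struct CountingArray{T, N, A <: AbstractArray{T, N}} <: AbstractArray{T, N}
    data::A
    reads::Base.RefValue{Int}
end
CountingArray(data::AbstractArray{T, N}) where {T, N} =
    CountingArray{T, N, typeof(data)}(data, Ref(0))

Base.size(a::CountingArray) = size(a.data)

function Base.getindex(a::CountingArray{T, N}, I::Vararg{Int, N}) where {T, N}
    a.reads[] += 1          # record one simulated "slow" read
    return a.data[I...]
end

slow = CountingArray(ones(20, 20))
block = slow[1:4, 1:4]      # range indexing falls back to scalar getindex
println("sum = ", sum(block), ", reads so far = ", slow.reads[])
```

Counting reads this way makes it easy to see how often a given access pattern touches the underlying storage.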
Here's an example of using RangeExtractor.jl to calculate zonal statistics on a raster dataset, using a custom operation. This is faster single-threaded than Rasters.jl is multithreaded, since it can split computation a
```julia
using RangeExtractor
using Rasters, ArchGDAL
using RasterDataSources, NaturalEarth
import GeoInterface as GI

# Load raster dataset
ras = Raster(WorldClim{Climate}, :tmin, month=1)

# Get country polygons
countries = naturalearth("admin_0_countries", 10)

# Convert extents to index ranges
ranges = Rasters.dims2indices.((ras,), Rasters.Touches.(GI.extent.(countries.geometry)))

# Define tiling scheme
strategy = FixedGridTiling{2}(100)

# Define zonal statistics operation.
# Here, we use a TileOperation to define a fully custom operation.
# - `contained` is applied to each range that is fully contained within a tile,
#   and returns the final result for that range.
# - `shared` is applied to each range that is partially contained in, or shared with, another tile,
#   and returns some intermediate result that is stored.
# - `combine` is applied to the results of all the `shared` operations for a range,
#   and returns the final result for that range.
op = TileOperation(
    contained = (x, meta) -> zonal(sum, x; of=meta),
    shared = (x, meta) -> zonal(sum, x; of=meta),
    combine = (results, args...) -> sum(results)
)

# Calculate zonal statistics
results = RangeExtractor.extract(
    op,                  # the operation to perform
    ras,                 # the raster to extract from
    ranges,              # the ranges to extract
    countries.geometry;  # the "metadata" - in this case, the polygons to calculate zonal statistics over
    strategy = strategy  # the tiling strategy to use
)
```
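The `shared` and `combine` steps described above rely on the operation being decomposable: for a sum, adding the per-tile partial sums gives the same answer as summing the whole range at once. The following Base Julia sketch checks this for one range that spans two tiles. `split_by_tile` is a hypothetical helper written for this illustration; it does not use RangeExtractor.jl and is not meant to mirror its internals.

```julia
# Illustrative only: `split_by_tile` is hypothetical and does not use RangeExtractor.jl.

# Clip a range to each tile it overlaps on a fixed grid with `tilesize` cells per side.
function split_by_tile(rng::NTuple{N, UnitRange{Int}}, tilesize::Int) where {N}
    tile_ids = map(r -> cld(first(r), tilesize):cld(last(r), tilesize), rng)
    pieces = NTuple{N, UnitRange{Int}}[]
    for tile in Iterators.product(tile_ids...)
        # Intersect the requested range with this tile's extent along each axis.
        piece = ntuple(N) do d
            lo = (tile[d] - 1) * tilesize + 1
            hi = tile[d] * tilesize
            max(lo, first(rng[d])):min(hi, last(rng[d]))
        end
        push!(pieces, piece)
    end
    return pieces
end

array = reshape(collect(1.0:400.0), 20, 20)
roi = (1:15, 11:20)                                   # spans two 10x10 tiles

pieces = split_by_tile(roi, 10)
partials = [sum(view(array, p...)) for p in pieces]   # the "shared" step, one partial per tile
combined = sum(partials)                              # the "combine" step

println(pieces)
println(combined == sum(view(array, roi...)))
```

Operations that do not decompose this way (a median, for example) would need a different intermediate product, or the whole range materialized before the operation is applied.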
Similar approaches elsewhere
exactextract, available in R and Python, has a somewhat similar strategy for operating on large, out-of-memory rasters, but it is forced to keep all vector statistics materialized in memory. See https://isciences.github.io/exactextract/performance.html#the-raster-sequential-strategy. It does not support multithreading or flexible user-defined operations.
Acknowledgements
This effort was funded by the NASA MEaSUREs program in contribution to the Inter-mission Time Series of Land Ice Velocity and Elevation (ITS_LIVE) project (https://its-live.jpl.nasa.gov/).
Owner
- Name: Anshul Singhvi
- Login: asinghvi17
- Kind: user
- Location: New York, NY
- Company: Columbia University
- Repositories: 5
- Profile: https://github.com/asinghvi17
BA student in Applied Physics, graduating May 2022.
Citation (CITATION.bib)
@misc{RangeExtractor.jl,
author = {Anshul Singhvi <anshulsinghvi@gmail.com> and contributors},
title = {RangeExtractor.jl},
url = {https://github.com/asinghvi17/RangeExtractor.jl},
version = {v1.0.0-DEV},
year = {2024},
month = {11}
}
GitHub Events
Total
- Create event: 2
- Commit comment event: 4
- Issues event: 11
- Watch event: 8
- Delete event: 2
- Issue comment event: 33
- Push event: 60
- Pull request event: 6
Last Year
- Create event: 2
- Commit comment event: 4
- Issues event: 11
- Watch event: 8
- Delete event: 2
- Issue comment event: 33
- Push event: 60
- Pull request event: 6
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 8
- Total pull requests: 3
- Average time to close issues: 1 day
- Average time to close pull requests: about 3 hours
- Total issue authors: 4
- Total pull request authors: 2
- Average comments per issue: 4.5
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 2
Past Year
- Issues: 8
- Pull requests: 3
- Average time to close issues: 1 day
- Average time to close pull requests: about 3 hours
- Issue authors: 4
- Pull request authors: 2
- Average comments per issue: 4.5
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 2
Top Authors
Issue Authors
- asinghvi17 (3)
- alex-s-gardner (3)
- felixcremer (1)
- JuliaTagBot (1)
Pull Request Authors
- dependabot[bot] (3)
- asinghvi17 (2)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
juliahub.com: RangeExtractor
- Homepage: https://asinghvi17.github.io/RangeExtractor.jl/
- Documentation: https://docs.juliahub.com/General/RangeExtractor/stable/
- License: MIT
- Latest release: 0.1.1 (published about 1 year ago)
Rankings
Dependencies
- actions/checkout v4 composite
- codecov/codecov-action v4 composite
- julia-actions/cache v2 composite
- julia-actions/julia-buildpkg v1 composite
- julia-actions/julia-docdeploy v1 composite
- julia-actions/julia-processcoverage v1 composite
- julia-actions/julia-runtest v1 composite
- julia-actions/setup-julia v2 composite
- JuliaRegistries/TagBot v1 composite