arcpbf

Rust crate and R package for processing Esri Protocol Buffers

https://github.com/r-arcgis/arcpbf

Science Score: 26.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.2%) to scientific vocabulary

Keywords

agol arcgis r-spatial
Last synced: 6 months ago

Repository

Rust crate and R package for processing Esri Protocol Buffers

Basic Info
  • Host: GitHub
  • Owner: R-ArcGIS
  • License: apache-2.0
  • Language: Rust
  • Default Branch: main
  • Homepage: https://r.esri.com/arcpbf/
  • Size: 65 MB
Statistics
  • Stars: 10
  • Watchers: 2
  • Forks: 0
  • Open Issues: 1
  • Releases: 0
Topics
agol arcgis r-spatial
Created over 2 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---


[![R-CMD-check](https://github.com/R-ArcGIS/arcpbf/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/R-ArcGIS/arcpbf/actions/workflows/R-CMD-check.yaml)


```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

options(max.print = 100)
```

# arcpbf

`{arcpbf}` is an R package that processes [Esri FeatureCollection Protocol Buffers](https://github.com/Esri/arcgis-pbf/tree/main/proto/FeatureCollection).
It is written in Rust and powered by the [extendr](https://github.com/extendr/extendr) library.

arcpbf has functions for reading protocol buffer (pbf) results from an ArcGIS 
REST API result. pbf results are returned when `f=pbf` in a [query](https://developers.arcgis.com/rest/services-reference/enterprise/query-feature-service-layer-.htm). 

The package is extremely lightweight and fast.

Limitation: this package does not support Z and M dimensions at this point. 


## TL;DR 

- `open_pbf()` will read a FeatureCollection `pbf` file into a raw vector
- `read_pbf()` will read a FeatureCollection `pbf` file _and_ process it
- `resp_body_pbf()` and `resps_data_pbf()` process `httr2_response` objects
  with FeatureCollection pbf bodies
- `process_pbf()` will process a raw vector or a list of raw vectors
- `post_process_pbf()` will apply post processing steps to the results of 
  `process_pbf()`
  - set `use_sf = TRUE` to return an `sf` object if possible. Applied by 
    default in `read_pbf()`, `resp_body_pbf()` and `resps_data_pbf()`.

> ***Developer Note***: Rust must be installed to compile the package. Run the one-line 
installation instructions at https://rustup.rs/. To verify that your Rust installation
is compatible, run `rextendr::rust_sitrep()`. That's it. 

### PBF support

Note that _only_ the FeatureCollection pbf specification is supported by arcpbf.
If you want to process OSM pbf files use [`osmextract::oe_read()`](https://docs.ropensci.org/osmextract/reference/oe_read.html). 
Or, if you want to create and read arbitrary protocol buffers directly in R,
use [`RProtoBuf`](https://cran.r-project.org/web/packages/RProtoBuf).

## Basic usage 


In most cases, we will be processing a protocol buffer directly from an http 
request created with [`{httr2}`](https://httr2.r-lib.org/).

```{r}
library(arcpbf)

# specify the url to send our request to
url <- "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/ACS_Population_by_Race_and_Hispanic_Origin_Boundaries/FeatureServer/2/query?where=1=1&outFields=objectid&resultRecordCount=10&f=pbf&token="
req <- httr2::request(url)
resp <- httr2::req_perform(req)

resp
```

We can process responses with `resp_body_pbf()`. Post-processing is applied by
default because the arguments `post_process` and `use_sf` both default to
`TRUE`. 

```{r}
resp_body_pbf(resp)
```

### Multiple response objects

When running multiple requests in parallel using
`httr2::req_perform_parallel()`, the responses are returned as a list.
`resps_data_pbf()` processes that list of responses in a vectorized
manner.

```{r}
# create a list of requests
reqs <- replicate(5, req, simplify = FALSE)
# perform them in parallel
resps <- httr2::req_perform_parallel(reqs)

# process the responses 
resps_data_pbf(resps)
```
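
A common reason to send several requests at once is paging through a layer with the `resultOffset` and `resultRecordCount` query parameters, both of which appear in URLs elsewhere in this README. The sketch below only builds the requests and does not send them. `page_requests()` is a hypothetical helper, not part of arcpbf, and the page size of 2000 is an assumption that should be checked against the service's maximum record count.

```r
library(httr2)

# Hypothetical helper (not part of arcpbf): build one pbf query request per page.
page_requests <- function(layer_url, n_pages, page_size = 2000) {
  # offsets 0, page_size, 2 * page_size, ...
  offsets <- (seq_len(n_pages) - 1) * page_size
  lapply(offsets, function(offset) {
    request(layer_url) |>
      req_url_path_append("query") |>
      req_url_query(
        where = "1=1",
        outFields = "*",
        resultOffset = offset,
        resultRecordCount = page_size,
        f = "pbf"
      )
  })
}

layer_url <- "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0"
reqs <- page_requests(layer_url, n_pages = 2)

# inspect the generated URLs before sending anything
vapply(reqs, function(x) x$url, character(1))
```

Passing `reqs` to `httr2::req_perform_parallel()` and the resulting list to `resps_data_pbf()` then follows the same pattern as the chunk above.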

### Reading from a file 

In some cases you may have a pbf file on disk that you want to process. 
Use `read_pbf()` to do so. Again, post-processing steps are applied by default. 

```{r}
fp <- system.file("small-points.pbf", package = "arcpbf")
read_pbf(fp)
```


## FeatureCollection Result Types

There are three types of PBF FeatureCollection responses that may be
returned as a result of a [Feature Service Query request](https://developers.arcgis.com/rest/services-reference/enterprise/query-feature-service-layer-.htm).

- **Feature Results**:
  - the default query response type. Contains individual features with their 
    attributes and geometries if available.
- **Count Result**:
  - returned when `returnCountOnly=true` in an API request. Returned as a scalar
    integer vector.
- **Object ID Result**:
  - returned when `returnIdsOnly=true`. A `data.frame` containing object IDs 
    where the column name is set to the object ID field name of the feature 
    service. 
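
As a rough illustration of how the count and object ID result types are requested, the sketch below builds the two queries with `{httr2}` without sending them. The layer URL is the one used in the basic usage example; the variable names are only illustrative.

```r
library(httr2)

layer_url <- "https://services.arcgis.com/P3ePLMYs2RVChkJx/arcgis/rest/services/ACS_Population_by_Race_and_Hispanic_Origin_Boundaries/FeatureServer/2"

# shared query: all features, returned as a protocol buffer
base_req <- request(layer_url) |>
  req_url_path_append("query") |>
  req_url_query(where = "1=1", f = "pbf")

# Count Result: ask only for the number of matching features
count_req <- req_url_query(base_req, returnCountOnly = "true")

# Object ID Result: ask only for the object IDs
ids_req <- req_url_query(base_req, returnIdsOnly = "true")

count_req$url
ids_req$url
```

Performing either request with `httr2::req_perform()` and passing the response to `resp_body_pbf()` should yield the corresponding result type described in the list above.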
    
### Feature Results

Feature results can either omit geometry entirely, for example in the case of a
Table or when the query parameter `returnGeometry=false`, or include it. When
geometry is omitted entirely, the response is processed as a simple
`data.frame`. However, if the response does contain geometry, the response is a 
bit more complex.

Unprocessed feature results with geometries return a named list with 3 elements:

- `attributes`: 
  - a `data.frame` of the fields and their values
- `sr`: 
  - a named list with elements `wkt`, `wkid`, `latest_wkid`, `vcs_wkid`,
  and `latest_vcs_wkid`. These determine the coordinate reference system of the 
  response as well as the vertical coordinate reference system. 
- `geometry`: 
  - an `sfc` object _**without a computed bounding box or coordinate reference 
    system set**_.

```{r}
# read an example pbf without post-processing
fc_fp <- system.file("small-points.pbf", package = "arcpbf")
res <- read_pbf(fc_fp, post_process = FALSE)

res
```

When post-processing is applied to a geometry Feature Result, the CRS is set
and the bounding box is computed. This requires the `sf` package to be available. 

```{r}
post_process_pbf(res)
```
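
The `use_sf` argument of `post_process_pbf()` mentioned in the TL;DR controls whether an `sf` object is returned. The sketch below toggles it on the bundled example file. It assumes the `sf` package is installed, and the exact shape returned with `use_sf = FALSE` is not documented here, so the code only inspects the class rather than assuming one.

```r
library(arcpbf)

fp <- system.file("small-points.pbf", package = "arcpbf")
raw_res <- read_pbf(fp, post_process = FALSE)

# ask for an sf object (requires the sf package)
as_sf <- post_process_pbf(raw_res, use_sf = TRUE)
class(as_sf)

# skip the sf conversion and inspect what comes back
no_sf <- post_process_pbf(raw_res, use_sf = FALSE)
class(no_sf)
```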


## Lower level functions

The function `open_pbf()` will read a pbf file into a raw vector which can be 
passed to `process_pbf()`. In general you will not need this function, but it 
is handy for the sake of example. 

```{r}
pbf_raw <- open_pbf(fc_fp)
head(pbf_raw, 20)
```

This raw vector can be turned into an R object using `process_pbf()`. The output
_will not_ be post processed.

```{r}
res <- process_pbf(pbf_raw)
res
```

Post-processing can be applied to the result of `process_pbf()` using
`post_process_pbf()`.

```{r}
post_process_pbf(res)
```

`post_process_pbf()` can also be applied to a list of processed pbf responses.

```{r}
multi_res <- list(res, res, res)

post_process_pbf(multi_res)
```
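
The TL;DR notes that `process_pbf()` also accepts a list of raw vectors. A minimal sketch of that path, reusing the bundled example file twice in place of separate responses (the variable names are illustrative):

```r
library(arcpbf)

fp <- system.file("small-points.pbf", package = "arcpbf")

# two raw vectors standing in for two separate pbf payloads
raws <- list(open_pbf(fp), open_pbf(fp))

# process the whole list in one call, then post-process the results
processed <- process_pbf(raws)
post_process_pbf(processed)
```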


## Benchmarking

Below is a benchmark that compares processing pbfs to the current approach of processing 
raw JSON in arcgislayers and arcgisutils. It recreates the example from the README of arcgislayers. 

```{r}
jsn <- function() {
  json_reqs <- c(
    "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0/query?outFields=%2A&where=1%3D1&outSR=%7B%22wkid%22%3A4326%7D&returnGeometry=TRUE&token=&f=json&resultOffset=0",
    "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0/query?outFields=%2A&where=1%3D1&outSR=%7B%22wkid%22%3A4326%7D&returnGeometry=TRUE&token=&f=json&resultOffset=2001"
  )
  reqs <- lapply(json_reqs, httr2::request) 
  
  resps <- httr2::req_perform_parallel(reqs) |> 
    lapply(function(x) arcgisutils::parse_esri_json(httr2::resp_body_string(x))) 
  
  do.call(rbind.data.frame, resps) |> 
    sf::st_as_sf()
}

# protobuf processing 
pbf <- function() {
  
  pbf_reqs <- c(
    "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0/query?outFields=%2A&where=1%3D1&outSR=%7B%22wkid%22%3A4326%7D&returnGeometry=TRUE&token=&f=pbf&resultOffset=0",
    "https://services.arcgis.com/P3ePLMYs2RVChkJx/ArcGIS/rest/services/USA_Counties_Generalized_Boundaries/FeatureServer/0/query?outFields=%2A&where=1%3D1&outSR=%7B%22wkid%22%3A4326%7D&returnGeometry=TRUE&token=&f=pbf&resultOffset=2001"
  )
  
  reqs <- lapply(pbf_reqs, httr2::request)
  
  httr2::req_perform_parallel(reqs) |> 
    resps_data_pbf()
}

bench::mark(
  jsn(),
  pbf(),
  check = FALSE,
  relative = TRUE,
  iterations = 5
)
```


## Internals 

Internally, there is a Rust crate [`esripbf`](./src/rust/esripbf), which is
a Rust library built with [`prost`](https://github.com/tokio-rs/prost) to handle the [FeatureCollection Protocol Buffer Specification](https://github.com/Esri/arcgis-pbf/tree/main/proto/FeatureCollection).


## Future Notes

Alternatively, it may make sense to write to a geoarrow array and convert to sfc 
using {wk}. These are just thoughts. 

Owner

  • Name: R-ArcGIS
  • Login: R-ArcGIS
  • Kind: organization

GitHub Events

Total
  • Issues event: 8
  • Watch event: 2
  • Issue comment event: 12
  • Push event: 10
  • Pull request event: 6
  • Create event: 2
Last Year
  • Issues event: 8
  • Watch event: 2
  • Issue comment event: 12
  • Push event: 10
  • Pull request event: 6
  • Create event: 2

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 55
  • Total Committers: 1
  • Avg Commits per committer: 55.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 55
  • Committers: 1
  • Avg Commits per committer: 55.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Josiah Parry j****y@g****m 55

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 9
  • Total pull requests: 5
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 43 minutes
  • Total issue authors: 5
  • Total pull request authors: 1
  • Average comments per issue: 6.33
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 9
  • Pull requests: 5
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 43 minutes
  • Issue authors: 5
  • Pull request authors: 1
  • Average comments per issue: 6.33
  • Average comments per pull request: 0.0
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • elipousson (3)
  • ryanzomorrodi (2)
  • JosiahParry (2)
  • JWilliamsonArch (1)
  • muschellij2 (1)
Pull Request Authors
  • JosiahParry (10)
Top Labels
Issue Labels
bug (2)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 672 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 7
  • Total maintainers: 1
cran.r-project.org: arcpbf

Process ArcGIS Protocol Buffer FeatureCollections

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 672 Last month
Rankings
Dependent packages count: 28.9%
Dependent repos count: 36.9%
Average: 50.8%
Downloads: 86.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.4.1 composite
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
src/rust/Cargo.lock cargo
  • aho-corasick 1.1.2
  • anyhow 1.0.75
  • approx 0.5.1
  • autocfg 1.1.0
  • bitflags 1.3.2
  • bitflags 2.4.1
  • bytes 1.5.0
  • cfg-if 1.0.0
  • either 1.9.0
  • equivalent 1.0.1
  • errno 0.3.5
  • extendr-api 0.6.0
  • extendr-macros 0.6.0
  • fastrand 2.0.1
  • fixedbitset 0.4.2
  • geo-types 0.7.11
  • hashbrown 0.14.2
  • heck 0.4.1
  • home 0.5.5
  • indexmap 2.1.0
  • itertools 0.11.0
  • libR-sys 0.6.0
  • libc 0.2.149
  • libm 0.2.8
  • linux-raw-sys 0.4.10
  • log 0.4.20
  • memchr 2.6.4
  • multimap 0.8.3
  • num-traits 0.2.17
  • once_cell 1.18.0
  • paste 1.0.14
  • petgraph 0.6.4
  • prettyplease 0.2.15
  • proc-macro2 1.0.69
  • prost 0.12.1
  • prost-build 0.12.1
  • prost-derive 0.12.1
  • prost-types 0.12.1
  • quote 1.0.33
  • redox_syscall 0.4.1
  • regex 1.10.2
  • regex-automata 0.4.3
  • regex-syntax 0.8.2
  • rustix 0.38.21
  • serde 1.0.190
  • serde_derive 1.0.190
  • syn 2.0.38
  • tempfile 3.8.1
  • unicode-ident 1.0.12
  • which 4.4.2
  • windows-sys 0.48.0
  • windows-targets 0.48.5
  • windows_aarch64_gnullvm 0.48.5
  • windows_aarch64_msvc 0.48.5
  • windows_i686_gnu 0.48.5
  • windows_i686_msvc 0.48.5
  • windows_x86_64_gnu 0.48.5
  • windows_x86_64_gnullvm 0.48.5
  • windows_x86_64_msvc 0.48.5
src/rust/Cargo.toml cargo
src/rust/arcpbf/Cargo.toml cargo
src/rust/esripbf/Cargo.toml cargo
DESCRIPTION cran
  • rlang * imports
  • collapse >= 2.0.0 suggests
  • data.table * suggests
  • dplyr * suggests
  • httr2 * suggests
  • sf * suggests
.github/workflows/fedora.yaml actions
  • actions/checkout v3 composite
  • dtolnay/rust-toolchain 1.67.0 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite