geostan

geostan: An R package for Bayesian spatial analysis - Published in JOSS (2022)

https://github.com/connordonegan/geostan

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 9 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    2 of 4 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Keywords

bayesian bayesian-inference bayesian-statistics epidemiology modeling public-health r r-package rspatial rstats spatial stan

Scientific Fields

Engineering Computer Science - 60% confidence
Last synced: 4 months ago · JSON representation

Repository

Bayesian spatial analysis

Basic Info
Statistics
  • Stars: 80
  • Watchers: 2
  • Forks: 6
  • Open Issues: 0
  • Releases: 17
Topics
bayesian bayesian-inference bayesian-statistics epidemiology modeling public-health r r-package rspatial rstats spatial stan
Created almost 6 years ago · Last pushed 5 months ago
Metadata Files
Readme Changelog License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  eval = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  fig.width = 7,
  fig.height = 3,
  fig.align = 'center'
)
```


## geostan: Bayesian spatial analysis

The [geostan](https://connordonegan.github.io/geostan/) R package supports a complete spatial analysis workflow with Bayesian models for areal data, including a suite of functions for visualizing spatial data and model results. geostan models were built using [Stan](https://mc-stan.org), a state-of-the-art platform for Bayesian modeling. 

Introductions to the software can be found at [r-spatial.org](https://r-spatial.org/r/2024/08/02/geostan-introduction.html) and in the package [vignettes](https://connordonegan.github.io/geostan/articles/index.html).

Features include:

* **Spatial regression and disease mapping** Statistical models for data recorded across areal units (states, counties, or census tracts) or networks, including spatial econometric models.
* **Spatial analysis tools** Tools for visualizing and measuring spatial autocorrelation and map patterns, for exploratory analysis and model diagnostics.  
* **Observational uncertainty** Incorporate information on data reliability, such as standard errors of American Community Survey estimates, into any geostan model.
* **Missing and Censored observations** Vital statistics and disease surveillance systems like CDC Wonder censor case counts that fall below a threshold number; geostan can model disease or mortality risk for small areas with censored observations or with missing observations.
* **The RStan ecosystem** Interfaces easily with many high-quality R packages for Bayesian modeling.
* **Custom spatial models** Tools for building custom spatial or network models in Stan.

For public health research, geostan complements the [surveil](https://connordonegan.github.io/surveil/) R package for the study of time trends in disease incidence or mortality data.

## Installation

There are two ways to install geostan: directly from the package github repository or from the Comprehensive R Archive Network (CRAN). 
 
### From CRAN

Using your R console, you can install geostan from CRAN:

``` r
install.packages("geostan")
```

For most users, installing from CRAN is the recommended way to obtain geostan.

### From github

You can also install geostan from github:

```r
if (!require('devtools')) install.packages('devtools')
devtools::install_github("connordonegan/geostan")
```

This requires compilation of the Stan models. If you are using Windows and installing with `install_github`, you may need to install [Rtools](https://cran.r-project.org/bin/windows/Rtools/) first (this is not needed when installing from CRAN). To install Rtools:

1. Visit the Rtools site: https://cran.r-project.org/bin/windows/Rtools/
2. Select the version that corresponds to the version of R that you have installed (e.g., R 4.3).
3. After selecting the correct version, look for the "Install Rtools" section (just below the introductory text) and click on the "installer" to download it. For example, for Rtools43 (for R version 4.3), click on "Rtools43 installer."
4. Go to the `.exe` file you just downloaded and double-click to begin installation of Rtools.

If you are using Mac and installing with `install_github` then you may need to install Xcode Command Line Tools first.

## Support

All functions and methods are documented (with examples) on the website [reference](https://connordonegan.github.io/geostan/reference/index.html) page. See the package [vignettes](https://connordonegan.github.io/geostan/articles/index.html) for more on exploratory spatial analysis, spatial measurement error models, spatial regression with raster layers, and building custom spatial model in Stan.

To ask questions, report a bug, or discuss ideas for improvements or new features please visit the [issues](https://github.com/ConnorDonegan/geostan/issues) page, start a [discussion](https://github.com/ConnorDonegan/geostan/discussions), or submit a [pull request](https://github.com/ConnorDonegan/geostan/pulls).

## Usage

Load the package and the `georgia` county mortality data set:
```{r}
library(geostan)
data(georgia)
```

This has county population and mortality data by sex for ages 55-64, and for the period 2014-2018. As is common for public access data, some of the observations missing because the CDC has censored them to protect privacy. 

The `sp_diag` function provides visual summaries of spatial data, including a histogram, Moran scatter plot, and map. The Moran scatter plot displays the values against a summary of their neighboring values, so that the slope of the regression line gives a measure of their degree of autocorrelation.

Here is a quick visual summary of crude female mortality rates (as deaths per 10,000):

```{r fig.width = 8}
# create adjacency matrix ("B" is for binary)
C <- shape2mat(georgia, style = "B")

# crude mortality rate per 10,000 at risk
mortality_rate <- georgia$rate.female * 10e3

# quick spatial diagnostics
sp_diag(mortality_rate, georgia, w = C, name = "Mortality")
```

Mortality rates and other health statistics for counties are, in many cases, highly unstable estimates that cannot be relied upon for public advisories or inference (due to small population sizes). Hence, we need models to make inferences from small area data.

The following code fits a spatial conditional autoregressive (CAR) model to female county mortality data. These models are used for estimating disease risk in small areas like counties, and for analyzing covariation of health outcomes with other area variables. The R syntax for fitting the models is similar to using `lm` or `glm`. We provide the population at risk (the denominator for mortality rates) as an offset term, using the log-transform.

In our Georgia mortality data, three of the observations are missing because they have been censored; per CDC criteria, this means that there were 9 or fewer deaths in those counties. By using the `censor_point` argument and setting it to `censor_point = 9`, we can easily obtain estimates for the censored counties (along with all the others) using models account for the censoring process:

```{r}
# prepare a list of data for CAR models in Stan
cars <- prep_car_data(C)

# fit the model to female mortality rates
fit <- stan_car(deaths.female ~ offset(log(pop.at.risk.female)),
                censor_point = 9,
		data = georgia,
		car_parts = cars,
		family = poisson(),
		iter = 1e3, # no. MCMC samples
		quiet = TRUE) # to silence printing
```

Passing a fitted model to the `sp_diag` function will return a set of diagnostics for spatial models:

```{r fig.width = 8}
sp_diag(fit, georgia)
```

The `print` method returns a summary of the probability distributions for model parameters, as well as Markov chain Monte Carlo (MCMC) diagnostics from Stan (Monte Carlo standard errors of the mean `se_mean`, effective sample size `n_eff`, and the R-hat statistic `Rhat`):

```{r}
print(fit)
```

To extract estimates of the county mortality rates from this, we apply the `fitted` method - in this case, the fitted values from the model are the estimates of the county mortality rates. Multiplying them by 10,000 gives mortality rate per 10,000 at risk:

```{r}
# mortality rates per 10,000 at risk
mortality_est <- fitted(fit) * 10e3

# display rates with county names
county_name <- georgia$NAME
head( cbind(county_name, mortality_est) )
```

The mortality estimates are stored in the column named "mean", and the limits of the 95\% credible interval are found in the columns "2.5%" and "97.5%". Here we create a map of estimates (with some help from `sf` package):

```{r fig.width = 7, fig.height = 4.5}
library(sf)

# put estimates into bins for map colors
x <- mortality_est$mean
brks <- quantile(x, probs = c(0, 0.2, 0.4, 0.6, 0.8, 1)) 
est_cut <- cut(x, breaks = brks, include.lowest = TRUE)
  
# assign colors to values
rank <- as.numeric( est_cut )  
pal_fun <- colorRampPalette( c("#5D74A5FF", "gray90", "#A8554EFF") )
pal <- pal_fun( max(rank) )
colors <-  pal[ rank ]

# set plot margins
og=par(mar=rep(1, 4))

# get boundaries
geom <- sf::st_geometry(georgia)

# map  estimates
plot(geom,
    lwd = 0.2,
    col = colors)

# legend
legend("right",
     fill = pal,
     title = 'Mortality per 10,000',
     legend = levels(est_cut),
     bty = 'n'
)

mtext('County mortality rates per 10,000, Georgia women ages 55-64', side = 3, font = 2)
# reset margins
par(og)
```

Using the credible intervals, we can complement our map with a point-interval plot:

```{r fig.height = 6.5}
# order counties by mortality rate
index <- order(mortality_est$mean, decreasing = TRUE)
dat <- mortality_est[index, ]

# gather estimate with credible interval (95%)
est <- dat$mean
lwr <- dat$`2.5%`
upr <- dat$`97.5%`
y <- seq_along(county_name)
x_lim <- c(min(lwr), max(upr)) |>
      round()

og=par(mar = c(3, 0, 0, 0))

# points
plot(est,
     y,
     pch = 5,
     col = 'gray50',
     bty = 'L',
     axes = FALSE,
     xlim = x_lim,
     ylab = NA,
     xlab = NA)

# intervals
segments(x0 = lwr, x1 = upr,
         y0 = y, y1 = y,
	 col = colors[ index ])

# x axis
axis(1, at = seq(x_lim[1], x_lim[2], by = 20))
mtext('County mortality rates per 10,000, Georgia women ages 55-64', side = 1, line = 2)
par(og)
```

More details and demonstrations can be found in the package [help pages](https://connordonegan.github.io/geostan/reference/index.html) and [vignettes](https://connordonegan.github.io/geostan/articles/index.html).

## Citing geostan

If you use geostan in published work, please include a citation.

Donegan, Connor (2022) "geostan: An R package for Bayesian spatial analysis" *The Journal of Open Source Software*. 7, no. 79: 4716. [https://doi.org/10.21105/joss.04716](https://doi.org/10.21105/joss.04716).

[![DOI](https://joss.theoj.org/papers/10.21105/joss.04716/status.svg)](https://doi.org/10.21105/joss.04716)

```
@Article{,
  title = {{geostan}: An {R} package for {B}ayesian spatial analysis},
  author = {Connor Donegan},
  journal = {The Journal of Open Source Software},
  year = {2022},
  volume = {7},
  number = {79},
  pages = {4716},
  doi = {10.21105/joss.04716},
}
```

Owner

  • Name: Connor Donegan
  • Login: ConnorDonegan
  • Kind: user
  • Company: UT Dallas & UTSW Medical Center

JOSS Publication

geostan: An R package for Bayesian spatial analysis
Published
November 16, 2022
Volume 7, Issue 79, Page 4716
Authors
Connor Donegan ORCID
Geography and Geospatial Information Sciences, The University of Texas at Dallas, Peter O'Donnell Jr. School of Public Health, University of Texas Southwestern Medical Center
Editor
Mehmet Hakan Satman ORCID
Tags
spatial data survey data Bayesian inference

GitHub Events

Total
  • Create event: 4
  • Release event: 1
  • Issues event: 6
  • Watch event: 18
  • Delete event: 6
  • Issue comment event: 6
  • Push event: 10
  • Pull request event: 1
Last Year
  • Create event: 4
  • Release event: 1
  • Issues event: 6
  • Watch event: 18
  • Delete event: 6
  • Issue comment event: 6
  • Push event: 10
  • Pull request event: 1

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 433
  • Total Committers: 4
  • Avg Commits per committer: 108.25
  • Development Distribution Score (DDS): 0.023
Past Year
  • Commits: 30
  • Committers: 1
  • Avg Commits per committer: 30.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Connor Donegan c****n@u****u 423
Andrew Johnson a****n@a****m 8
amytims a****s@s****u 1
Mitzi Morris m****i@p****m 1
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 15
  • Total pull requests: 7
  • Average time to close issues: 2 months
  • Average time to close pull requests: 5 days
  • Total issue authors: 9
  • Total pull request authors: 4
  • Average comments per issue: 4.07
  • Average comments per pull request: 1.43
  • Merged pull requests: 7
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 0
  • Average time to close issues: 3 months
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 2.2
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • ConnorDonegan (4)
  • wcjochem (4)
  • anmiri (1)
  • barryrowlingson (1)
  • rsbivand (1)
  • agila5 (1)
  • andtheWings (1)
  • nipnipj (1)
  • bgoodri (1)
Pull Request Authors
  • jbytecode (3)
  • amytims (2)
  • andrjohns (2)
  • mitzimorris (1)
  • ConnorDonegan (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 548 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 16
  • Total maintainers: 1
cran.r-project.org: geostan

Bayesian Spatial Analysis

  • Versions: 16
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 548 Last month
Rankings
Stargazers count: 7.4%
Forks count: 11.3%
Average: 23.6%
Dependent packages count: 29.8%
Downloads: 34.0%
Dependent repos count: 35.5%
Maintainers (1)
Last synced: 4 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.4.0 depends
  • MASS * imports
  • Matrix >= 1.3 imports
  • Rcpp >= 0.12.0 imports
  • RcppParallel >= 5.0.1 imports
  • ggplot2 >= 3.0.0 imports
  • graphics * imports
  • gridExtra * imports
  • methods * imports
  • rstan >= 2.18.1 imports
  • rstantools >= 2.1.1 imports
  • sf * imports
  • signs * imports
  • spdep >= 1.1 imports
  • stats * imports
  • truncnorm * imports
  • utils * imports
  • bayesplot * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • testthat * suggests