sim2dpredictr

R package for simulating scalar outcomes using spatial predictors.

https://github.com/jmleach-bst/sim2dpredictr

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
    1 of 2 committers (50.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.2%) to scientific vocabulary
Last synced: 9 months ago

Repository

R package for simulating scalar outcomes using spatial predictors.

Basic Info
  • Host: GitHub
  • Owner: jmleach-bst
  • License: gpl-3.0
  • Language: R
  • Default Branch: main
  • Homepage:
  • Size: 344 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 6 years ago · Last pushed over 2 years ago
Metadata Files
Readme License

README.md

sim2Dpredictr


The goal of sim2Dpredictr is to facilitate straightforward simulation of spatially dependent predictors (continuous or binary), which may then be used to simulate continuous, binary, count, or categorical ($> 2$ categories) outcomes within a (generalized) linear model framework. A real-world example is using medical images to model or predict scalar clinical outcomes. This scenario motivated the development of sim2Dpredictr, which was used to simulate data for evaluating the performance of methods for high-dimensional data analysis and prediction (Leach, Aban, and Yi 2022; Leach et al. 2022).

In the first step, we simulate the predictors, i.e., the $\mathbf{X}_i$ part of a GLM, $g(E[Y_i]) = \mathbf{X}_i\mathbf{\beta}$, where $g(\cdot)$ is an appropriate link function.
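For reference, the standard canonical links for continuous, binary, and count outcomes are shown below, writing $\mu_i = E[Y_i]$. These are textbook GLM choices; which link sim2Dpredictr uses for each outcome distribution is described in the package documentation and is not assumed here.

$$
\begin{aligned}
\text{continuous (identity):}\quad & g(\mu_i) = \mu_i = \mathbf{X}_i\mathbf{\beta},\\
\text{binary (logit):}\quad & g(\mu_i) = \log\frac{\mu_i}{1-\mu_i} = \mathbf{X}_i\mathbf{\beta},\\
\text{count (log):}\quad & g(\mu_i) = \log \mu_i = \mathbf{X}_i\mathbf{\beta}.
\end{aligned}
$$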

Continuous predictors are simulated from multivariate normal (MVN) distributions with a focus on specific correlation structures; alternatively, conditional dependence can be specified via a precision matrix, specifically for a conditional autoregressive (CAR) model. Tools are included for constructing a covariance or precision matrix and taking its Cholesky decomposition with either base R or the R package spam, which makes this faster when the matrix is sparse. The Boolean model and thresholding of MVN draws are used to simulate spatially dependent binary maps. The package also includes a tool for specifying a parameter vector whose non-zero elements are spatially clustered. These simulation tools are designed for, but not limited to, testing the performance of variable selection methods when predictors are spatially correlated.
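The Cholesky route to spatially correlated MVN predictors, and thresholding them into binary maps, can be sketched in base R as follows. This is a minimal illustration, not the sim2Dpredictr implementation (chol_s2Dp() and sim_Y_MVN_X() are the package's tools for this). The correlation structure here is an assumption: correlation between two pixels decays as $\rho^d$, where $d$ is the Euclidean distance between them on the image grid.

```r
# Minimal base-R sketch (not the sim2Dpredictr implementation).
# Assumption: correlation between pixels decays as rho^d, where d is the
# Euclidean distance between pixel centres on the image grid.
set.seed(1)

im.res <- c(3, 3)   # image resolution (rows, columns)
rho    <- 0.5
N      <- 100       # number of subjects

# Pixel coordinates, one row per pixel (column-major order, as in as.vector()).
coords <- expand.grid(row = seq_len(im.res[1]), col = seq_len(im.res[2]))
D      <- as.matrix(dist(coords))   # pairwise Euclidean distances
Sigma  <- rho^D                     # correlation matrix (diagonal = rho^0 = 1)

# Upper-triangular Cholesky factor: t(R) %*% R equals Sigma.
R <- chol(Sigma)

# Rows of Z are iid N(0, I); rows of Z %*% R are then N(0, Sigma).
p <- nrow(Sigma)
Z <- matrix(rnorm(N * p), nrow = N, ncol = p)
X <- Z %*% R
colnames(X) <- paste0("X", seq_len(p))

# Spatially dependent binary maps by thresholding the MVN draws at zero.
X.bin <- (X > 0) * 1
```

Because $\rho^d = e^{-d\log(1/\rho)}$ is an exponential correlation function, `Sigma` is positive definite for $0 < \rho < 1$, so `chol()` succeeds.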

In the second step, we use the predictor vectors $\mathbf{X}_i$ to generate scalar outcomes, i.e., the $Y_i$ part of the GLM. The default approach is to use the inverse link function to define subject-specific means, $\mu_i = E[Y_i] = g^{-1}(\mathbf{X}_i\mathbf{\beta})$. For normally distributed outcomes, $g(\cdot)$ is the identity link, so $\mu_i = \mathbf{X}_i\mathbf{\beta}$, and a separate dispersion parameter, $\sigma^2$, specifies the variance. In general, the variance may be a function of the mean as well as of some other dispersion parameter. We can draw directly from the desired distribution, using (some function of) $\mu_i$ (and, if necessary, the dispersion $\sigma^2$) as its parameter(s), to obtain outcomes $Y_i$. Alternatively, if using the inverse link function is too computationally expensive, outcomes can first be generated as continuous values and then thresholded to obtain binary or categorical data.
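Both outcome-generation routes can be sketched in base R for a binary outcome. This is an illustration under stated assumptions, not the package's code: it assumes a logit link, an intercept `B0 = 0`, a stand-in design matrix of independent normals, and a hypothetical coefficient vector `B`. The thresholding route adds standard logistic noise to the linear predictor, which makes it distributionally equivalent to drawing from the inverse-logit mean.

```r
# Minimal base-R sketch of the two routes described above (not the
# sim2Dpredictr implementation). Assumptions: logit link, B0 = 0, and a
# hypothetical coefficient vector with three non-zero entries.
set.seed(2)

N  <- 1000
p  <- 9
X  <- matrix(rnorm(N * p), nrow = N, ncol = p)  # stand-in design matrix
B0 <- 0
B  <- c(1, 1, 0, 1, 0, 0, 0, 0, 0)              # hypothetical coefficients

eta <- drop(B0 + X %*% B)   # linear predictor X_i beta

# Route 1: inverse link gives subject-specific means, then draw Y_i directly.
mu <- plogis(eta)           # g^{-1}(eta) for the logit link
Y1 <- rbinom(N, size = 1, prob = mu)

# Route 2: generate a continuous latent outcome and threshold it.
# With standard logistic noise, P(Y.star > 0) = plogis(eta), so Y2 has the
# same distribution as Y1 (normal noise would instead give a probit model).
Y.star <- eta + rlogis(N)
Y2     <- as.integer(Y.star > 0)

# Continuous outcomes: identity link plus a dispersion parameter sigma^2.
sigma2  <- 1
Y.gauss <- rnorm(N, mean = eta, sd = sqrt(sigma2))
```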

Installation

sim2Dpredictr is available on CRAN:

```r
install.packages("sim2Dpredictr")
```

You can install the latest version of sim2Dpredictr from GitHub with:

```r
devtools::install_github("jmleach-bst/sim2Dpredictr")
```

Example

A simple demonstration follows. Suppose each subject has a $3 \times 3$ standardized, continuous-valued predictor image and a binary outcome. We can generate a spatial cluster of non-zero parameter values with beta_builder(), construct and take the Cholesky decomposition of a correlation (or covariance) matrix with chol_s2Dp(), and generate both the images and the outcomes with sim_Y_MVN_X().

```r
library(sim2Dpredictr)

# Construct spatially clustered non-zero parameters.
Bex <- sim2Dpredictr::beta_builder(row.index = c(1, 1, 2),
                                   col.index = c(1, 2, 1),
                                   im.res = c(3, 3),
                                   B0 = 0,
                                   B.values = rep(1, 3))

# Construct and take the Cholesky decomposition of the correlation matrix.
Rex <- sim2Dpredictr::chol_s2Dp(corr.structure = "ar1",
                                im.res = c(3, 3),
                                rho = 0.5,
                                use.spam = TRUE)

# Simulate a dataset with a spatially dependent design matrix and binary outcomes.
sim.dat <- sim2Dpredictr::sim_Y_MVN_X(N = 3, B = Bex$B,
                                      R = Rex$R, S = Rex$S,
                                      dist = "binomial")

sim.dat
#>   Y        X1         X2        X3         X4          X5         X6         X7
#> 1 0 0.5198678 -1.1035062 -0.805563 -1.3767652 -1.70525968 -0.6388318 -1.5858393
#> 2 0 0.1798315  0.5728877  1.463024 -0.9348662 -0.05594379  0.8592733  0.2619289
#> 3 1 0.1996273  1.7450479  2.500030  1.4882150  1.61903239  1.2503542  0.3439712
#>            X8          X9 subjectID
#> 1 -1.72102736  0.06016316         1
#> 2  0.08300277 -0.28904666         2
#> 3  0.63695327  0.85655952         3
```

Once the dependence framework and the non-zero parameter vector are set, sim_Y_MVN_X() can be used to draw as many data sets as necessary. Variable selection methods are applied to each data set, and summaries from the analyzed data sets are then used to evaluate variable selection performance. The documentation provides details on how to use these functions (and others) to create the desired simulations.
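The replicate-analyze-summarize loop described above can be sketched in base R. Everything here is an illustrative assumption: `simulate_once()` is a hypothetical helper that generates independent predictors and logistic outcomes (in practice sim_Y_MVN_X() would fill this role), and "selection" is taken to mean a Wald p-value below 0.05 in an ordinary logistic `glm()`, standing in for whichever variable selection method is under evaluation.

```r
# Minimal base-R sketch of a simulation-study loop. Assumptions:
# simulate_once() is a hypothetical stand-in for sim2Dpredictr's
# sim_Y_MVN_X(), and "selected" means p < 0.05 in a logistic glm().
set.seed(3)

p     <- 9
B     <- c(1, 1, 0, 1, 0, 0, 0, 0, 0)  # hypothetical true coefficients
n.sim <- 50                            # number of simulated data sets
N     <- 200                           # subjects per data set

simulate_once <- function(N, B) {
  X <- matrix(rnorm(N * length(B)), nrow = N)
  Y <- rbinom(N, size = 1, prob = plogis(drop(X %*% B)))
  data.frame(Y = Y, X)   # columns Y, X1, ..., Xp
}

selected <- matrix(NA, nrow = n.sim, ncol = p)
for (s in seq_len(n.sim)) {
  dat  <- simulate_once(N, B)
  fit  <- glm(Y ~ ., data = dat, family = binomial)
  pval <- summary(fit)$coefficients[-1, "Pr(>|z|)"]  # drop the intercept
  selected[s, ] <- pval < 0.05
}

# Per-predictor selection rates: power for true signals, type I error otherwise.
sel.rate       <- colMeans(selected)
true.pos.rate  <- mean(sel.rate[B != 0])
false.pos.rate <- mean(sel.rate[B == 0])
```

Keeping the true `B` fixed across replicates is what lets the selection rates be read directly as power and type I error for each predictor.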

References

Leach, Justin M, Inmaculada Aban, and Nengjun Yi. 2022. “Incorporating Spatial Structure into Inclusion Probabilities for Bayesian Variable Selection in Generalized Linear Models with the Spike-and-Slab Elastic Net.” *Journal of Statistical Planning and Inference* 217: 141–52.
Leach, Justin M, Lloyd J Edwards, Rajesh Kana, Kristina Visscher, Nengjun Yi, and Inmaculada Aban. 2022. “The Spike-and-Slab Elastic Net as a Classification Tool in Alzheimer’s Disease.” *PLoS ONE* 17: e0262367.

Owner

  • Name: Justin M Leach
  • Login: jmleach-bst
  • Kind: user
  • Location: Birmingham, AL
  • Company: University of Alabama at Birmingham

Assistant Professor of Biostatistics


Committers

Last synced: about 3 years ago

All Time
  • Total Commits: 88
  • Total Committers: 2
  • Avg Commits per committer: 44.0
  • Development Distribution Score (DDS): 0.023
Top Committers
  • Justin Leach (j****h@u****u): 86 commits
  • jmleach-bst (5****t@u****m): 2 commits
Committer Domains (Top 20 + Academic)
uab.edu: 1

Issues and Pull Requests

Last synced: 11 months ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Packages

  • Total packages: 1
  • Total downloads:
    • cran 152 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
cran.r-project.org: sim2Dpredictr

Simulate Outcomes Using Spatially Dependent Design Matrices

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 152 Last month
Rankings
Forks count: 28.8%
Dependent packages count: 29.8%
Stargazers count: 35.2%
Dependent repos count: 35.5%
Average: 41.2%
Downloads: 77.0%
Maintainers (1)
Last synced: 9 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • MASS * imports
  • Rdpack * imports
  • car * imports
  • dplyr * imports
  • ggplot2 * imports
  • magrittr * imports
  • matrixcalc * imports
  • spam >= 2.2 imports
  • tibble * imports
  • tidyverse * imports
  • knitr * suggests
  • rmarkdown * suggests
  • testthat * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite