holodeck

A Tidy Interface for Simulating Multivariate Data

https://github.com/aariq/holodeck

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
○ DOI references
✓ Academic publication links: Links to zenodo.org
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (19.2%) to scientific vocabulary

Keywords

multivariate-data simulated-data simulating-multivariate-data tidy-interface

Last synced: 11 months ago

Repository

A Tidy Interface for Simulating Multivariate Data

Basic Info

Host: GitHub
Owner: Aariq
License: other
Language: R
Default Branch: master
Size: 675 KB

Statistics

Stars: 12
Watchers: 0
Forks: 0
Open Issues: 7
Releases: 3

Topics

multivariate-data simulated-data simulating-multivariate-data tidy-interface

Created over 7 years ago · Last pushed almost 3 years ago

Metadata Files

Readme Changelog License

README.Rmd

---
output: github_document
---




[![R-CMD-check](https://github.com/Aariq/holodeck/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/Aariq/holodeck/actions/workflows/R-CMD-check.yaml)
[![CRAN](https://www.r-pkg.org/badges/version/holodeck)](https://CRAN.R-project.org/package=holodeck) ![downloads](http://cranlogs.r-pkg.org/badges/grand-total/holodeck)
[![Codecov test coverage](https://codecov.io/gh/Aariq/holodeck/branch/master/graph/badge.svg)](https://app.codecov.io/gh/Aariq/holodeck?branch=master)
[![DOI](https://zenodo.org/badge/167047376.svg)](https://zenodo.org/badge/latestdoi/167047376)


```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```




# holodeck: A Tidy Interface for Simulating Multivariate Data

`holodeck` makes it quick and simple to create simulated multivariate data with variables that covary with one another or that discriminate between levels of a categorical variable. The resulting data frames are useful for testing how multivariate statistical techniques perform under different scenarios, for power analysis, or for a quick sanity check when trying out a new multivariate method.

## Installation

From CRAN:
``` r
install.packages("holodeck")
```

Development version from r-universe:
``` r
install.packages('holodeck', repos = c('https://aariq.r-universe.dev', 'https://cloud.r-project.org'))
```

## Load packages

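`holodeck` is built to work with `dplyr` functions, including `group_by()` and the pipe (`%>%`). `purrr` is helpful for iterating over many simulated datasets, and `tibble` is used below to tidy model output. For these examples I'll use `ropls` (distributed through Bioconductor rather than CRAN) for PCA and PLS-DA.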

```{r example, message=FALSE, warning=FALSE}
library(holodeck)
library(dplyr)
library(tibble)
library(purrr)
library(ropls)
```
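Power analysis usually means repeating a simulation many times and summarizing the results, which is where iteration tools like `purrr` come in. The sketch below shows that pattern in base R only, with `vapply()` standing in for `purrr::map_dbl()`. The helper `sim_one_p()` is hypothetical and not part of `holodeck`: each iteration simulates one discriminating variable for three groups (means borrowed from the `discr2` variables in Example 1) and records a one-way ANOVA p-value, and the share of p-values below 0.05 estimates power.

``` r
# Hypothetical helper (not part of holodeck): simulate one discriminating
# variable for three groups and return the one-way ANOVA p-value.
sim_one_p <- function(n_per_group, group_means, sd = 1) {
  group <- factor(rep(seq_along(group_means), each = n_per_group))
  # Each observation's mean is set by its group; noise is N(0, sd^2)
  x <- group_means[as.integer(group)] + rnorm(length(group), sd = sd)
  oneway.test(x ~ group, var.equal = TRUE)$p.value
}

set.seed(2024)
n_sims <- 200
# vapply() plays the role of purrr::map_dbl() here
p_values <- vapply(
  seq_len(n_sims),
  function(i) sim_one_p(n_per_group = 7, group_means = c(0, 0.5, 1)),
  numeric(1)
)

# Estimated power: share of simulations that detect the group difference
power <- mean(p_values < 0.05)
power
```

Changing `n_per_group` or `group_means` and rerunning gives a rough power curve for this simplified, single-variable scenario.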

## Example 1: Investigating PCA and PLS-DA

Let's say we want to learn more about how principal component analysis (PCA) works. Specifically, which matters more when a principal component is formed: the variance of the variables or their covariance? To find out, we can create a data frame with one set of variables that have high covariance and low variance and another set that have low covariance and high variance.

### Generate data

```{r}
set.seed(925)
df1 <- 
  sim_covar(n_obs = 20, n_vars = 5, cov = 0.9, var = 1, name = "high_cov") %>%
  sim_covar(n_vars = 5, cov = 0.1, var = 2, name = "high_var") 
```
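Conceptually, each `sim_covar()` call adds a block of variables that share one variance (`var`) and one pairwise covariance (`cov`). The base-R sketch below illustrates that idea with multivariate normal draws from `MASS::mvrnorm()`. MASS is among `holodeck`'s imports, but this is an assumption about the general approach, not `holodeck`'s actual implementation; the helper `sim_block()` and the column names are hypothetical.

``` r
# Hypothetical helper (not holodeck's code): n_obs draws of n_vars variables
# with common variance `var` and common pairwise covariance `cov`.
sim_block <- function(n_obs, n_vars, var, cov, name) {
  sigma <- matrix(cov, nrow = n_vars, ncol = n_vars)
  diag(sigma) <- var
  x <- MASS::mvrnorm(n = n_obs, mu = rep(0, n_vars), Sigma = sigma)
  colnames(x) <- paste(name, seq_len(n_vars), sep = "_")
  as.data.frame(x)
}

set.seed(925)
# A large n_obs makes the sample covariance close to the target values
block_df <- cbind(
  sim_block(5000, 5, var = 1, cov = 0.9, name = "high_cov"),
  sim_block(5000, 5, var = 2, cov = 0.1, name = "high_var")
)

# Diagonal ~ variance (1 or 2); off-diagonal ~ 0.9 within high_cov,
# ~ 0.1 within high_var, and ~ 0 between the two blocks
round(cov(block_df), 2)
```

With only 20 observations, as in `df1`, the sample covariances will scatter much more widely around these targets.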

We can explore the covariance structure visually with a heatmap of the covariance matrix. The diagonal shows each variable's variance.

```{r}
df1 %>% 
  cov() %>%
  heatmap(Rowv = NA, Colv = NA, symm = TRUE, margins = c(6,6), main = "Covariance")
```

Now let's make this dataset a little more complex. We can add a factor variable, two sets of variables whose means differ among the levels of that factor, and some missing values.

```{r}
set.seed(501)
df2 <-
  df1 %>% 
  sim_cat(n_groups = 3, name = "factor") %>% 
  group_by(factor) %>% 
  sim_discr(n_vars = 5, var = 1, cov = 0, group_means = c(-1.3, 0, 1.3), name = "discr") %>% 
  sim_discr(n_vars = 5, var = 1, cov = 0, group_means = c(0, 0.5, 1), name = "discr2") %>% 
  sim_missing(prop = 0.1) %>% 
  ungroup()
df2
```
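The grouped `sim_discr()` and `sim_missing()` steps can be pictured with a small base-R sketch. It is an illustration under simplifying assumptions, not `holodeck`'s implementation: groups are assigned in equal consecutive blocks (a hypothetical choice; `sim_cat()` may assign them differently), each discriminating variable is its group mean plus independent noise with variance 1 (matching `var = 1, cov = 0`), and the missing-value step blanks out a random 10% of the cells.

``` r
set.seed(501)
n_obs <- 30
group <- factor(rep(c("a", "b", "c"), each = n_obs / 3))
group_means <- c(-1.3, 0, 1.3)

# Five independent discriminating variables: group mean + N(0, 1) noise
discr <- sapply(1:5, function(j) group_means[as.integer(group)] + rnorm(n_obs))
colnames(discr) <- paste0("discr_", 1:5)

# Group-wise means should sit near -1.3, 0 and 1.3
round(apply(discr, 2, function(x) tapply(x, group, mean)), 2)

# Hypothetical missingness step: set a random 10% of cells to NA
prop <- 0.1
n_missing <- round(prop * length(discr))
discr[sample(length(discr), n_missing)] <- NA
sum(is.na(discr))
#> [1] 15
```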


### PCA

```{r}
pca <- opls(select(df2, -factor), fig.pdfC = "none", info.txtC = "none")
  
plot(pca, parAsColFcVn = df2$factor, typeVc = "x-score")

getLoadingMN(pca) %>%
  as_tibble(rownames = "variable") %>% 
  arrange(desc(abs(p1)))
```

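It looks like PCA mostly picks up on the variables with high covariance, **not** the variables that discriminate among levels of `factor`. This makes sense: PCA is an unsupervised analysis, so it never sees `factor` and can only summarize the variance and covariance of the variables themselves.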
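A worked calculation helps explain why the `high_cov` block wins. For a block of p variables that all share variance v and pairwise covariance c, the leading eigenvalue of the covariance matrix (the variance captured by that block's first principal component) is v + (p - 1)c. The sketch below checks this numerically for the two `df1` blocks, treating each population matrix on its own and ignoring sampling noise and the discriminating variables. Because `ropls::opls()` scales variables to unit variance by default, it also repeats the calculation on the correlation matrix. The helpers `cs_matrix()` and `lead_eigen()` are hypothetical.

``` r
# Hypothetical helper: compound-symmetric covariance matrix
cs_matrix <- function(n_vars, var, cov) {
  sigma <- matrix(cov, nrow = n_vars, ncol = n_vars)
  diag(sigma) <- var
  sigma
}

# Largest eigenvalue of a symmetric matrix
lead_eigen <- function(m) max(eigen(m, symmetric = TRUE, only.values = TRUE)$values)

high_cov_sigma <- cs_matrix(5, var = 1, cov = 0.9)
high_var_sigma <- cs_matrix(5, var = 2, cov = 0.1)

lead_eigen(high_cov_sigma)           # 1 + 4 * 0.9 = 4.6
lead_eigen(high_var_sigma)           # 2 + 4 * 0.1 = 2.4

# After unit-variance scaling, covariances become correlations:
# high_var's pairwise correlation drops to 0.1 / 2 = 0.05
lead_eigen(cov2cor(high_var_sigma))  # 1 + 4 * 0.05 = 1.2
```

Under these assumptions the strongly covarying block has the larger leading eigenvalue with or without scaling, even though the `high_var` block has more total variance.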

### PLS-DA

```{r}
plsda <- opls(select(df2, -factor), df2$factor, predI = 2, permI = 10, fig.pdfC = "none", info.txtC = "none")

plot(plsda, typeVc = "x-score")

getVipVn(plsda) %>% 
  tibble::enframe(name = "variable", value = "VIP") %>% 
  arrange(desc(VIP))
```

PLS-DA, a supervised analysis, finds separation among the groups and identifies the discriminating variables we simulated as the ones most responsible for those differences (the highest VIP scores).
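VIP scores come from a fitted multivariate PLS-DA model, but the underlying idea, that a supervised method uses the group labels to decide which variables matter, can be illustrated with a much simpler swapped-in technique: ranking each variable by its one-way ANOVA F statistic across groups. This is not what `ropls` computes. The sketch uses base R and a small hypothetical dataset (not `df2`) with two discriminating variables and two pure-noise variables.

``` r
set.seed(188)
n_per_group <- 10
group <- factor(rep(c("a", "b", "c"), each = n_per_group))
n_obs <- length(group)

# Hypothetical data: discr_* depend on group, noise_* do not
dat <- data.frame(
  discr_1 = c(-1.3, 0, 1.3)[as.integer(group)] + rnorm(n_obs),
  discr_2 = c(0, 0.5, 1)[as.integer(group)] + rnorm(n_obs),
  noise_1 = rnorm(n_obs),
  noise_2 = rnorm(n_obs)
)

# One-way ANOVA F statistic per variable; larger = stronger group separation
f_stat <- vapply(
  dat,
  function(x) unname(oneway.test(x ~ group, var.equal = TRUE)$statistic),
  numeric(1)
)
sort(f_stat, decreasing = TRUE)
```

Unlike VIP, this univariate ranking ignores correlations among variables, so it is only a rough stand-in for intuition.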

Owner

Name: Eric R. Scott
Login: Aariq
Kind: user
Company: University of Arizona, @cct-datascience

Website: www.ericrscott.com
Twitter: leafyericscott
Repositories: 125
Profile: https://github.com/Aariq

Scientific Programmer & Educator at University of Arizona

GitHub Events

Total

Watch event: 1

Last Year

Watch event: 1

Committers

Last synced: about 2 years ago

All Time

Total Commits: 93
Total Committers: 1
Avg Commits per committer: 93.0
Development Distribution Score (DDS): 0.0

Past Year

Commits: 14
Committers: 1
Avg Commits per committer: 14.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
Aariq	s**r@g**m	93

Issues and Pull Requests

Last synced: over 1 year ago

All Time

Total issues: 29
Total pull requests: 6
Average time to close issues: 6 days
Average time to close pull requests: about 5 hours
Total issue authors: 2
Total pull request authors: 1
Average comments per issue: 0.55
Average comments per pull request: 0.17
Merged pull requests: 6
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

Aariq (16)
avalonceleste (1)

Pull Request Authors

Aariq (4)

Top Labels

Issue Labels

enhancement (8) bug (5) needs testing (2)

Pull Request Labels

Packages

Total packages: 1
Total downloads:
- cran 157 last-month

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 3
Total maintainers: 1

cran.r-project.org: holodeck

A Tidy Interface for Simulating Multivariate Data

Homepage: https://github.com/Aariq/holodeck
Documentation: http://cran.r-project.org/web/packages/holodeck/holodeck.pdf
License: MIT + file LICENSE
Latest release: 0.2.2
published almost 3 years ago

Versions: 3
Dependent Packages: 0
Dependent Repositories: 0
Downloads: 157 Last month

Rankings

Stargazers count: 16.3%

Forks count: 28.8%

Dependent packages count: 29.8%

Dependent repos count: 35.5%

Average: 36.2%

Downloads: 70.7%

Maintainers (1)

scottericr@gmail.com

Last synced: 11 months ago

Dependencies

DESCRIPTION cran

MASS * imports
assertthat * imports
dplyr * imports
purrr * imports
rlang * imports
tibble * imports
covr * suggests
ggplot2 * suggests
knitr * suggests
mice * suggests
rmarkdown * suggests
testthat * suggests

.github/workflows/R-CMD-check.yaml actions

actions/checkout v3 composite
r-lib/actions/check-r-package v2 composite
r-lib/actions/setup-pandoc v2 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite

.github/workflows/test-coverage.yaml actions

actions/checkout v3 composite
actions/upload-artifact v3 composite
r-lib/actions/setup-r v2 composite
r-lib/actions/setup-r-dependencies v2 composite
