decorrelate

Decorrelation projection scalable to high dimensional data using estimated correlation with low rank and shrinkage

https://github.com/gabrielhoffman/decorrelate

Science Score: 39.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.9%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 5 years ago · Last pushed 10 months ago
Metadata Files
  • Readme
  • Changelog

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```








### Fast Probabilistic Whitening Transformation for Ultra-High Dimensional Data
Data whitening is a widely used preprocessing step to remove correlation structure, since statistical models often assume independence [(Kessy, et al. 2018)](https://doi.org/10.1080/00031305.2016.1277159). The typical procedure transforms the observed data by an inverse square root of the sample correlation matrix (**Figure 1**). For low-dimensional data (i.e. $n > p$), this produces transformed data with an identity sample covariance matrix. This procedure assumes either that the true covariance matrix is known, or that it is well estimated by the sample covariance matrix. Yet using the sample covariance matrix for this transformation can be problematic since **1)** the complexity is $\mathcal{O}(p^3)$ and **2)** it is not applicable to the high-dimensional (i.e. $n \ll p$) case, since the sample covariance matrix is no longer full rank. Here we use a probabilistic model of the observed data to apply a whitening transformation. This Gaussian Inverse Wishart Empirical Bayes (GIW-EB) model **1)** substantially reduces computational complexity, and **2)** regularizes the eigen-values of the sample covariance matrix to improve out-of-sample performance.
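
As a sketch of the classical transformation described above (not the GIW-EB model), assume $Y$ is an $n \times p$ data matrix with column-centered features and a full-rank sample covariance $S$. The symbols $Y$, $S$, $U$, $\Lambda$ and $Z$ are notation introduced here for illustration:

$$
S = \frac{1}{n-1} Y^\top Y = U \Lambda U^\top, \qquad Z = Y \, U \Lambda^{-1/2} U^\top, \qquad \frac{1}{n-1} Z^\top Z = U \Lambda^{-1/2} \Lambda \Lambda^{-1/2} U^\top = I_p
$$

Reading the factors of $Z$ from left to right, $U$ rotates the data onto the principal axes, $\Lambda^{-1/2}$ rescales each axis to unit variance, and $U^\top$ rotates back to the original axes. When $n \ll p$, some diagonal entries of $\Lambda$ are zero, so $\Lambda^{-1/2}$ does not exist.
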

![**Figure 1: Intuition for data whitening transformation**. **A)** Original data, **B)** Data rotated along principal components, **C)** Data rotated and scaled, **D)** Data rotated, scaled and rotated back to original axes. Green arrows indicate principal axes and lengths indicate eigen-values.](man/figures/README-run.examples-1.png)

## Installation

``` r
devtools::install_github("GabrielHoffman/decorrelate")
```
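
To make the classical whitening procedure described above concrete, here is a minimal base R sketch. It does not use the `decorrelate` package and is not the GIW-EB method. It only illustrates the rotate, scale and rotate-back steps with the sample covariance matrix, and shows why that approach fails when $n < p$. The simulated data and all variable names (`Y`, `W`, `Z`, `Y_hd`, and so on) are assumptions made for this example.

``` r
# Classical whitening in base R (illustration only, NOT the GIW-EB method)
set.seed(1)
n <- 500 # number of samples
p <- 5   # number of features (n > p, so the sample covariance is full rank)

# Simulate correlated data with an autocorrelation structure (rho = 0.9)
rho <- 0.9
Sigma <- rho^abs(outer(seq_len(p), seq_len(p), "-"))
Y <- matrix(rnorm(n * p), n, p) %*% chol(Sigma)

# Center each column, then eigen-decompose the sample covariance matrix
Yc <- scale(Y, center = TRUE, scale = FALSE)
eig <- eigen(cov(Yc), symmetric = TRUE)
U <- eig$vectors
lambda <- eig$values

# Rotate onto principal axes, scale by 1 / sqrt(eigen-value), rotate back
W <- U %*% diag(1 / sqrt(lambda)) %*% t(U)
Z <- Yc %*% W

# The whitened data have identity sample covariance, up to rounding error
max_dev <- max(abs(cov(Z) - diag(p)))

# High-dimensional case: with n < p the sample covariance is rank deficient,
# so the inverse square root used above does not exist
Y_hd <- matrix(rnorm(20 * 50), 20, 50)
rank_hd <- qr(cov(Y_hd))$rank
```

Here `max_dev` measures how far the whitened sample covariance is from the identity matrix, and `rank_hd` is smaller than the 50 features in `Y_hd`. The eigen decomposition is the $\mathcal{O}(p^3)$ step that becomes costly as $p$ grows.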

Owner

  • Name: Gabriel Hoffman
  • Login: GabrielHoffman
  • Kind: user
  • Location: New York
  • Company: Icahn School of Medicine at Mount Sinai

Statistical genomics

GitHub Events

Total
  • Push event: 4
  • Public event: 1
Last Year
  • Push event: 4
  • Public event: 1

Packages

  • Total packages: 1
  • Total downloads:
    • cran 191 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
cran.r-project.org: decorrelate

Decorrelation Projection Scalable to High Dimensional Data

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 191 Last month
Rankings
Dependent packages count: 25.9%
Dependent repos count: 31.8%
Average: 47.8%
Downloads: 85.6%
Maintainers (1)
Last synced: 10 months ago

Dependencies

DESCRIPTION cran
  • R >= 4.2.0 depends
  • methods * depends
  • CholWishart * imports
  • Matrix * imports
  • Rcpp * imports
  • Rfast * imports
  • graphics * imports
  • irlba * imports
  • stats * imports
  • utils * imports
  • CCA * suggests
  • RUnit * suggests
  • clusterGeneration * suggests
  • colorRamps * suggests
  • cowplot * suggests
  • ggplot2 * suggests
  • knitr * suggests
  • latex2exp * suggests
  • mvtnorm * suggests
  • pander * suggests
  • rmarkdown * suggests
  • whitening * suggests
  • yacca * suggests