pcaone

Window based randomized singular value decomposition (winSVD)

https://github.com/zilong-li/pcaoner

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: acm.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.3%) to scientific vocabulary

Keywords

matrix-factorization pca r rcppeigen rsvd svd
Last synced: 5 months ago

Repository


Basic Info
Statistics
  • Stars: 5
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created almost 4 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog Contributing License

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# Accurate and pass-efficient randomized SVD in R

## Introduction

This repository originally implemented the window-based randomized SVD (winSVD) algorithm from the [PCAone](https://github.com/Zilong-Li/PCAone) paper. **The package now aims to provide state-of-the-art randomized SVD algorithms for the R community, beyond the basic [rsvd](https://github.com/erichson/rSVD).**

Two randomized SVD (RSVD) algorithms are currently implemented in this package, listed below in order of accuracy:

- **winSVD**: [window based randomized singular value decomposition](https://genome.cshlp.org/content/33/9/1599)
- **dashSVD**: [dynamic shifts based randomized singular value decomposition](https://dl.acm.org/doi/10.1145/3660629)

The package supports several matrix types:

- `matrix` in base R, for general dense matrices. Other dense types, e.g. `dgeMatrix`, can be cast to `matrix`.
- `dgCMatrix` in the **Matrix** package, for column-major sparse matrices
- `dgRMatrix` in the **Matrix** package, for row-major sparse matrices
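As a minimal sketch of how inputs of each supported type might be prepared, the block below builds them with the **Matrix** package. The matrix sizes, the density, and the variable names are illustrative assumptions, and the call to `pcaone()` is guarded so the block still runs when the package is not installed.

```r
library(Matrix)
library(methods)  # is()/as(), needed when run via Rscript

set.seed(42)

# Column-major sparse matrix (dgCMatrix); size and density are arbitrary choices
sp_col <- rsparsematrix(200, 50, density = 0.1)

# Row-major sparse matrix (dgRMatrix), converted from the column-major one
sp_row <- as(sp_col, "RsparseMatrix")

# A dense Matrix-package object (dgeMatrix), cast to a base R matrix
dense_M <- Matrix(rnorm(200 * 50), 200, 50, sparse = FALSE)
dense   <- as.matrix(dense_M)

# Run the decomposition on each type, only if pcaone is available
if (requireNamespace("pcaone", quietly = TRUE)) {
  for (x in list(dense, sp_col, sp_row)) {
    res <- pcaone::pcaone(x, k = 5)
    str(res$d)  # estimated singular values
  }
}
```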





## Installation

``` r
# install.packages("pcaone") # For the CRAN version
remotes::install_github("Zilong-Li/PCAoneR") # For the latest development version
```

## Example

This is a basic example which shows you how to use pcaone:

```{r example}
library(pcaone)
mat <- matrix(rnorm(100*5000), 5000, 100)
res <- pcaone(mat, k = 10)
str(res)
```

## Benchmarking of accuracy

We define accuracy as the error of the estimated singular values, taking the results of `RSpectra::svds` as the truth. For all RSVD methods, we restrict the number of epochs to `8`, where an epoch is one full pass over the matrix, i.e. how many times the whole matrix must be read if it can only be held on disk.

```{r acc}
library(RSpectra) ## svds
library(rsvd)     ## regular rsvd
library(pcaone)
load(system.file("extdata", "popgen.rda", package="pcaone") )
A <- popgen - rowMeans(popgen) ## center
k <- 40
system.time(s0 <- RSpectra::svds(A, k = k) )
system.time(s1 <- rsvd::rsvd(A, k = k, q = 4))  ## epochs = 2 * number of power iterations, 2*4 = 8
system.time(s3 <- pcaone(A, k = k, method = "winsvd", p = 7))  ## epochs = 1 + p = 8
system.time(s4 <- pcaone(A, k = k, method = "dashsvd", p = 6)) ## epochs = 2 + p = 8

par(mar = c(5, 5, 2, 1))
plot(s0$d-s1$d, ylim = c(0, 10), xlab = "PC index", ylab = "Error of singular values", cex = 1.5, cex.lab = 2)
points(s0$d-s3$d, col = "red", cex = 1.5)
points(s0$d-s4$d, col = "blue", cex = 1.5)
legend("top", legend = c("rSVD", "dashSVD", "winSVD"), pch = 16, col = c("black", "blue", "red"), horiz = TRUE, cex = 1.2, bty = "n")
```
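To make the error metric and the role of power iterations concrete, here is a self-contained base R sketch of a basic randomized SVD. It is an illustration under stated assumptions, not the implementation used by `rsvd` or `pcaone`: the function name `rsvd_sketch`, the oversampling of 10, and the synthetic test matrix are all hypothetical choices. Each power iteration multiplies by `t(A)` and by `A`, so it reads the matrix twice.

```r
# Hypothetical helper: basic randomized SVD with QR-stabilised power iterations
rsvd_sketch <- function(A, k, q = 0, oversample = 10) {
  l <- k + oversample
  Omega <- matrix(rnorm(ncol(A) * l), ncol(A), l)  # Gaussian test matrix
  Q <- qr.Q(qr(A %*% Omega))                       # basis for the sampled range
  for (i in seq_len(q)) {
    Z <- qr.Q(qr(crossprod(A, Q)))  # t(A) %*% Q: one read of A
    Q <- qr.Q(qr(A %*% Z))          # A %*% Z: a second read of A
  }
  B <- crossprod(Q, A)              # small l x ncol(A) projected matrix
  s <- svd(B, nu = k, nv = k)
  list(d = s$d[seq_len(k)], u = Q %*% s$u, v = s$v)
}

# Synthetic matrix with a known, slowly decaying spectrum (an assumption)
set.seed(1)
m <- 500; n <- 200; k <- 10
U <- qr.Q(qr(matrix(rnorm(m * n), m, n)))
V <- qr.Q(qr(matrix(rnorm(n * n), n, n)))
s_true <- 1 / sqrt(seq_len(n))
A_demo <- U %*% (s_true * t(V))     # U %*% diag(s_true) %*% t(V)

truth  <- svd(A_demo)$d[seq_len(k)] # exact singular values as the truth
err_q0 <- truth - rsvd_sketch(A_demo, k, q = 0)$d
err_q4 <- truth - rsvd_sketch(A_demo, k, q = 4)$d
print(cbind(err_q0, err_q4))
```

The errors are computed as truth minus estimate, the same quantity plotted above. Because the estimate comes from an orthogonal projection of the matrix, it cannot exceed the true singular values, so the errors are non-negative up to rounding.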

Now let's see how many epochs `rSVD` and `dashSVD` need to reach the accuracy of `winSVD`.

```{r acc2}
system.time(s1 <- rsvd::rsvd(A, k = k, q = 20))  ## the number of epochs is 2*20=40
system.time(s4 <- pcaone(A, k = k, method = "dashsvd", p = 18)) ## the number of epochs is 2+18=20

par(mar = c(5, 5, 2, 1))
plot(s0$d-s1$d, ylim = c(0, 2), xlab = "PC index", ylab = "Error of singular values", cex = 1.5, cex.lab = 2)
points(s0$d-s3$d, col = "red", cex = 1.5)
points(s0$d-s4$d, col = "blue", cex = 1.5)
legend("top", legend = c("rSVD", "dashSVD", "winSVD"), pch = 16, col = c("black", "blue", "red"), horiz = TRUE, cex = 1.2, bty = "n")
```
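
The epoch counts in the two comparisons follow from the rules stated in the code comments: 2 * q for rSVD, 1 + p for winSVD, and 2 + p for dashSVD. The helper `epochs()` below is a hypothetical convenience function, not part of `pcaone` or `rsvd`, that only tabulates this arithmetic.

```r
# Hypothetical helper: epochs (full passes over the matrix) per method,
# using the counting rules given in the comments of the benchmark code
epochs <- function(method, iters) {
  switch(method,
    rsvd    = 2 * iters,  # q power iterations, two passes each
    winsvd  = 1 + iters,  # 1 + p
    dashsvd = 2 + iters,  # 2 + p
    stop("unknown method: ", method)
  )
}

# First comparison: every method limited to the same budget
first_run <- c(rsvd    = epochs("rsvd", 4),
               winsvd  = epochs("winsvd", 7),
               dashsvd = epochs("dashsvd", 6))

# Second comparison: rSVD and dashSVD given more iterations
second_run <- c(rsvd    = epochs("rsvd", 20),
                winsvd  = epochs("winsvd", 7),
                dashsvd = epochs("dashsvd", 18))

print(rbind(first_run, second_run))
```

Under these counting rules, the second comparison gives rSVD 40 epochs and dashSVD 20 epochs, against 8 for winSVD.
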
## Benchmarking of speed

Let's compare the speed of `pcaone` with the other packages.

```{r time}
library(microbenchmark)
timing <- microbenchmark(
  'RSpectra' = svds(A, k = k),
  'rSVD' = rsvd(A, k = k, q = 20),
  'pcaone.winsvd' = pcaone(A, k = k, p = 7, method = "winsvd"),
  'pcaone.dashsvd' = pcaone(A, k = k, p = 18, method = "dashsvd"),
  times = 10)
print(timing, unit='s')
```

## References

* [Zilong Li, Jonas Meisner, Anders Albrechtsen (2023). Fast and accurate out-of-core PCA framework for large scale biobank data](https://genome.cshlp.org/content/33/9/1599)
* [Feng et al. (2024). Algorithm 1043: Faster Randomized SVD with Dynamic Shifts](https://dl.acm.org/doi/10.1145/3660629)

Owner

  • Login: Zilong-Li
  • Kind: user
  • Location: Copenhagen
  • Company: Copenhagen University

GitHub Events

Total
  • Issues event: 2
  • Watch event: 1
  • Issue comment event: 1
  • Push event: 61
  • Create event: 1
Last Year
  • Issues event: 2
  • Watch event: 1
  • Issue comment event: 1
  • Push event: 61
  • Create event: 1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 89
  • Total Committers: 1
  • Avg Commits per committer: 89.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 55
  • Committers: 1
  • Avg Commits per committer: 55.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Zilong-Li z****4@g****m 89

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: 1 day
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: 1 day
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 2.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jianshu93 (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • cran 180 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
cran.r-project.org: pcaone

Fast and Accurate Randomized Singular Value Decomposition Algorithms with 'PCAone'

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 180 Last month
Rankings
Forks count: 21.9%
Stargazers count: 26.2%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Average: 40.1%
Downloads: 87.0%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.6.0 depends
  • Rcpp * imports
  • testthat >= 3.0.0 suggests
.github/workflows/check-release.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/rhub.yaml actions
  • r-hub/actions/checkout v1 composite
  • r-hub/actions/platform-info v1 composite
  • r-hub/actions/run-check v1 composite
  • r-hub/actions/setup v1 composite
  • r-hub/actions/setup-deps v1 composite
  • r-hub/actions/setup-r v1 composite