pcaone
Window based randomized singular value decomposition (winSVD)
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
✓DOI references
Found 2 DOI reference(s) in README -
✓Academic publication links
Links to: acm.org -
○Committers with academic emails
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (15.3%) to scientific vocabulary
Keywords
matrix-factorization
pca
r
rcppeigen
rsvd
svd
Last synced: 5 months ago
·
JSON representation
Repository
Window based randomized singular value decomposition (winSVD)
Basic Info
- Host: GitHub
- Owner: Zilong-Li
- License: gpl-3.0
- Language: R
- Default Branch: main
- Homepage: https://github.com/Zilong-Li/PCAone
- Size: 9.33 MB
Statistics
- Stars: 5
- Watchers: 2
- Forks: 1
- Open Issues: 0
- Releases: 0
Topics
matrix-factorization
pca
r
rcppeigen
rsvd
svd
Created almost 4 years ago
· Last pushed 9 months ago
Metadata Files
Readme
Changelog
Contributing
License
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# Accurate and pass-efficient randomized SVD in R
## Introduction
This repo initially implements the algorithm so-called window-based Randomized SVD in the [PCAone](https://github.com/Zilong-Li/PCAone) paper. **Now the aim of the package is to implement state-of-the-art Randomized SVD algorithms other than the basic [rsvd](https://github.com/erichson/rSVD) for R community.**
Currently there are 2 versions of RSVD implemented in this package ordered by their accuracy in below.
- **winSVD**: [window based randomized singular value decomposition](https://genome.cshlp.org/content/33/9/1599)
- **dashSVD**: [dynamic shifts based randomized singular value decomposition](https://dl.acm.org/doi/10.1145/3660629)
With surports for a number of matrix type including:
- `matrix` in base R for general dense matrices. or other types can be casted e.g. `dgeMatrix`
- `dgCMatrix` in **Matrix** package, for column major sparse matrices
- `dgRMatrix` in **Matrix** package, for row major sparse matrices
## Installation
``` r
# install.packages("pcaone") # For the CRAN version
remotes::install_github("Zilong-Li/PCAoneR") # For the latest developing version
```
## Example
This is a basic example which shows you how to use pcaone:
```{r example}
library(pcaone)
mat <- matrix(rnorm(100*5000), 5000, 100)
res <- pcaone(mat, k = 10)
str(res)
```
## Benchmarking of accuracy
We define the accuracy as the error of singular values using results of `RSpectra::svds` as truth. For all RSVD, let's restrict the number of epochs as `8`, i.e. how many times the whole matrix is read through if it can only be hold on disk.
```{r acc}
library(RSpectra) ## svds
library(rsvd) ## regular rsvd
library(pcaone)
load(system.file("extdata", "popgen.rda", package="pcaone") )
A <- popgen - rowMeans(popgen) ## center
k <- 40
system.time(s0 <- RSpectra::svds(A, k = k) )
system.time(s1 <- rsvd::rsvd(A, k = k, q = 4)) ## the number of epochs is two times of power iters, 4*2=8
system.time(s3 <- pcaone(A, k = k, method = "winsvd", p = 7)) ## the number of epochs is 1 + p
system.time(s4 <- pcaone(A, k = k, method = "dashsvd", p = 6))## the number of epochs is 2 + p
par(mar = c(5, 5, 2, 1))
plot(s0$d-s1$d, ylim = c(0, 10), xlab = "PC index", ylab = "Error of singular values", cex = 1.5, cex.lab = 2)
points(s0$d-s3$d, col = "red", cex = 1.5)
points(s0$d-s4$d, col = "blue", cex = 1.5)
legend("top", legend = c("rSVD", "dashSVD", "winSVD"), pch = 16,col = c("black", "blue", "red"), horiz = T, cex = 1.2, bty = "n" )
```
Now let's see how many epochs we need for `rSVD`, `sSVD` and `dashSVD` to reach the accuracy of `winSVD`.
```{r acc2}
system.time(s1 <- rsvd::rsvd(A, k = k, q = 20)) ## the number of epochs is 4*20=40
system.time(s4 <- pcaone(A, k = k, method = "dashsvd", p = 18))
par(mar = c(5, 5, 2, 1))
plot(s0$d-s1$d, ylim = c(0, 2), xlab = "PC index", ylab = "Error of singular values", cex = 1.5, cex.lab = 2)
points(s0$d-s3$d, col = "red", cex = 1.5)
points(s0$d-s4$d, col = "blue", cex = 1.5)
legend("top", legend = c("rSVD", "dashSVD", "winSVD"), pch = 16,col = c("black", "blue", "red"), horiz = T, cex = 1.2, bty = "n" )
```
## Benchmarking of speed
Let's see the performance of ```pcaone``` compared to the other packages.
``` {r time}
library(microbenchmark)
timing <- microbenchmark(
'RSpectra' = svds(A,k = k),
'rSVD' = rsvd(A, k=k, q = 20),
'pcaone.winsvd' = pcaone(A, k=k, p = 7),
'pcaone.dashsvd' = pcaone(A, k=k, p = 18, method = "dashsvd"),
times=10)
print(timing, unit='s')
```
## References
* [Zilong Li, Jonas Meisner, Anders Albrechtsen (2023). Fast and accurate out-of-core PCA framework for large scale biobank data](https://genome.cshlp.org/content/33/9/1599)
* [Feng et al. 2024. Algorithm 1043: Faster Randomized SVD with Dynamic Shifts](https://dl.acm.org/doi/10.1145/3660629)
Owner
- Login: Zilong-Li
- Kind: user
- Location: Copenhagen
- Company: Copenhagen University
- Twitter: zilongli21
- Repositories: 7
- Profile: https://github.com/Zilong-Li
GitHub Events
Total
- Issues event: 2
- Watch event: 1
- Issue comment event: 1
- Push event: 61
- Create event: 1
Last Year
- Issues event: 2
- Watch event: 1
- Issue comment event: 1
- Push event: 61
- Create event: 1
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: 1 day
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 2.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- jianshu93 (1)
Pull Request Authors
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
-
Total downloads:
- cran 180 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 2
- Total maintainers: 1
cran.r-project.org: pcaone
Fast and Accurate Randomized Singular Value Decomposition Algorithms with 'PCAone'
- Homepage: https://github.com/Zilong-Li/PCAoneR
- Documentation: http://cran.r-project.org/web/packages/pcaone/pcaone.pdf
- License: GPL (≥ 3)
-
Latest release: 1.1.0
published over 1 year ago
Rankings
Forks count: 21.9%
Stargazers count: 26.2%
Dependent packages count: 29.8%
Dependent repos count: 35.5%
Average: 40.1%
Downloads: 87.0%
Maintainers (1)
Last synced:
6 months ago
Dependencies
DESCRIPTION
cran
- R >= 3.6.0 depends
- Rcpp * imports
- testthat >= 3.0.0 suggests
.github/workflows/check-release.yaml
actions
- actions/checkout v3 composite
- r-lib/actions/check-r-package v2 composite
- r-lib/actions/setup-r v2 composite
- r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/rhub.yaml
actions
- r-hub/actions/checkout v1 composite
- r-hub/actions/platform-info v1 composite
- r-hub/actions/run-check v1 composite
- r-hub/actions/setup v1 composite
- r-hub/actions/setup-deps v1 composite
- r-hub/actions/setup-r v1 composite