irlba

Fast truncated singular value decompositions

https://github.com/bwlewis/irlba

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.0%) to scientific vocabulary

Keywords

pca principal-component-analysis singular-value-decomposition sparse-principal-components svd
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: bwlewis
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 2.63 MB
Statistics
  • Stars: 131
  • Watchers: 9
  • Forks: 17
  • Open Issues: 37
  • Releases: 0
Created over 10 years ago · Last pushed almost 2 years ago
Metadata Files
Readme

README.md

irlba

Implicitly-restarted Lanczos methods for fast truncated singular value decomposition of sparse and dense matrices (also referred to as partial SVD). IRLBA stands for Augmented, Implicitly Restarted Lanczos Bidiagonalization Algorithm. The package provides the following functions (see help on each for details and examples).
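To make the underlying idea concrete, here is a minimal base-R sketch of plain Golub-Kahan-Lanczos bidiagonalization with full reorthogonalization. It is not the package's implementation: it omits the implicit restarts and augmentation that the IRLBA name refers to, it does not handle breakdown (a zero norm), and the function name `gkl_bidiag` and its arguments are hypothetical.

```r
# Plain Golub-Kahan-Lanczos bidiagonalization (illustrative sketch only).
# Builds orthonormal U (m x k), V (n x k) and upper bidiagonal B (k x k)
# such that A %*% V is approximately U %*% B.
gkl_bidiag <- function(A, k, v1 = rnorm(ncol(A))) {
  m <- nrow(A)
  n <- ncol(A)
  U <- matrix(0, m, k)
  V <- matrix(0, n, k)
  alpha <- numeric(k)  # diagonal of B
  beta <- numeric(k)   # superdiagonal of B (the last entry is a residual norm)
  V[, 1] <- v1 / sqrt(sum(v1^2))
  for (j in seq_len(k)) {
    u <- drop(A %*% V[, j])
    if (j > 1) {
      # orthogonalize against all previous left vectors
      Uj <- U[, seq_len(j - 1), drop = FALSE]
      u <- u - drop(Uj %*% crossprod(Uj, u))
    }
    alpha[j] <- sqrt(sum(u^2))
    U[, j] <- u / alpha[j]
    # orthogonalize against all right vectors computed so far
    Vj <- V[, seq_len(j), drop = FALSE]
    w <- drop(crossprod(A, U[, j]))
    w <- w - drop(Vj %*% crossprod(Vj, w))
    beta[j] <- sqrt(sum(w^2))
    if (j < k) V[, j + 1] <- w / beta[j]
  }
  B <- diag(alpha, nrow = k)
  if (k > 1) B[cbind(seq_len(k - 1), 2:k)] <- beta[seq_len(k - 1)]
  list(U = U, V = V, B = B)
}

set.seed(1)
A <- matrix(rnorm(40 * 15), 40, 15)
G <- gkl_bidiag(A, k = 10)
# The singular values of the small matrix B approximate the largest
# singular values of A; compare the leading one with the dense result.
c(lanczos = svd(G$B)$d[1], dense = svd(A)$d[1])
```

The small bidiagonal matrix is cheap to decompose, which is why only matrix-vector products with A are needed. Restarting, as described in the Baglama and Reichel reference, keeps `k` small while still converging to the wanted singular triplets.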

  • irlba() partial SVD function
  • ssvd() l1-penalized matrix decomposition for sparse PCA (based on Shen and Huang's algorithm)
  • prcomp_irlba() principal components function similar to the prcomp function in stats package for computing the first few principal components of large matrices
  • svdr() alternate partial SVD function based on randomized SVD (see also the rsvd package by N. Benjamin Erichson for an alternative implementation)
  • partial_eigen() a very limited partial eigenvalue decomposition for symmetric matrices (see the RSpectra package for more comprehensive truncated eigenvalue decomposition)
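The randomized approach mentioned for svdr() in the list above can be illustrated in base R. The sketch below follows the basic range-finder scheme from Halko, Martinsson, and Tropp; it is not the code used by svdr(), and the function name `rand_svd` and its parameters `extra` and `power_iters` are hypothetical choices for illustration.

```r
# Basic randomized partial SVD (illustrative sketch only).
rand_svd <- function(A, k, extra = 10, power_iters = 2) {
  l <- min(k + extra, nrow(A), ncol(A))  # oversampled subspace size
  # Sample the range of A with a random Gaussian test matrix
  Q <- qr.Q(qr(A %*% matrix(rnorm(ncol(A) * l), ncol(A), l)))
  # A few power iterations sharpen the subspace when the spectrum decays slowly
  for (i in seq_len(power_iters)) {
    Q <- qr.Q(qr(crossprod(A, Q)))
    Q <- qr.Q(qr(A %*% Q))
  }
  # Decompose the small l x n projected matrix and map back
  s <- svd(crossprod(Q, A))
  list(d = s$d[seq_len(k)],
       u = (Q %*% s$u)[, seq_len(k), drop = FALSE],
       v = s$v[, seq_len(k), drop = FALSE])
}

set.seed(1)
# A matrix of exact rank 5, so the top 5 singular values are recoverable
A <- matrix(rnorm(300 * 5), 300, 5) %*% matrix(rnorm(5 * 40), 5, 40)
r <- rand_svd(A, k = 5)
all.equal(r$d, svd(A)$d[1:5])
```

For matrices whose singular values decay slowly, the approximation quality depends on the oversampling and on the number of power iterations, so both should be treated as tuning parameters.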

The help page for each function includes detailed documentation and examples. Also see the package vignette, vignette("irlba", package="irlba").
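As a quick orientation, the following sketch compares irlba() and prcomp_irlba() with their base R counterparts on a small random matrix. The matrix size and the choice of three components are arbitrary; on a problem this small the dense routines are perfectly adequate, and the comparison only shows that the truncated results agree up to the solver tolerance.

```r
library(irlba)
set.seed(1)
A <- matrix(rnorm(200 * 50), 200, 50)

# Three largest singular values: truncated vs. full decomposition
S <- irlba(A, nv = 3)
all.equal(S$d, svd(A)$d[1:3], tolerance = 1e-4)

# First three principal components: truncated vs. stats::prcomp
P <- prcomp_irlba(A, n = 3)
all.equal(as.numeric(P$sdev), as.numeric(prcomp(A)$sdev[1:3]), tolerance = 1e-4)
```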

An overview web page is here: https://bwlewis.github.io/irlba/.

New in 2.3.2

  • Fixed a regression in prcomp_irlba() discovered by Xiaojie Qiu, see https://github.com/bwlewis/irlba/issues/25, and other related problems reported in https://github.com/bwlewis/irlba/issues/32.
  • Added rchk testing to pre-CRAN submission tests.
  • Fixed a sign bug in ssvd() found by Alex Poliakov.

What's new in Version 2.3.1?

  • Fixed an irlba() bug associated with centering (PCA), see https://github.com/bwlewis/irlba/issues/21.
  • Fixed irlba() scaling to conform to scale, see https://github.com/bwlewis/irlba/issues/22.
  • Improved prcomp_irlba() from a suggestion by N. Benjamin Erichson, see https://github.com/bwlewis/irlba/issues/23.
  • Significantly changed/improved svdr() convergence criterion.
  • Added a version of Shen and Huang's Sparse PCA/SVD L1-penalized matrix decomposition (ssvd()).
  • Fixed valgrind errors.
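For context on the centering item above, the sketch below shows what the center argument of irlba() is meant to compute: the partial SVD of the column-centered matrix, without forming that matrix explicitly. The data and sizes are made up for illustration.

```r
library(irlba)
set.seed(1)
A <- matrix(rnorm(200 * 30, mean = 5), 200, 30)

# Implicit centering inside irlba()
implicit <- irlba(A, nv = 3, center = colMeans(A))$d

# Explicit centering followed by a full SVD
explicit <- svd(scale(A, center = TRUE, scale = FALSE))$d[1:3]

all.equal(implicit, explicit, tolerance = 1e-4)
```

Implicit centering matters mostly for sparse matrices, where subtracting column means explicitly would destroy sparsity.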

Deprecated features

I will remove partial_eigen() in a future version. As its documentation states, users are better off using the RSpectra package for eigenvalue computations (although not generally for singular value computations).
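A minimal sketch of the suggested replacement, assuming the RSpectra package is installed: its eigs_sym() function computes a few eigenvalues of a symmetric matrix. The test matrix here is made up, and the call is guarded so the snippet still runs without RSpectra.

```r
set.seed(1)
X <- matrix(rnorm(200 * 20), 200, 20)
S <- crossprod(X)  # 20 x 20 symmetric positive semi-definite matrix

# Dense reference: compute all eigenvalues and keep the three largest
ref <- eigen(S, symmetric = TRUE)$values[1:3]

if (requireNamespace("RSpectra", quietly = TRUE)) {
  # Three eigenvalues of largest magnitude
  e <- RSpectra::eigs_sym(S, k = 3, which = "LM")
  top3 <- sort(e$values, decreasing = TRUE)
  print(all.equal(top3, ref))
} else {
  message("RSpectra is not installed; skipping the comparison")
}
```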

The mult argument is deprecated and will be removed in a future version. We now recommend simply defining a custom class with a custom multiplication operator. The example below illustrates the old and new approaches.

```{r}
library(irlba)
set.seed(1)
A <- matrix(rnorm(100), 10)

# ------------------ old way ----------------------------------------------
# A custom matrix multiplication function that scales the columns of A
# (cf the scale option). This function scales the columns of A to unit norm.
col_scale <- sqrt(apply(A, 2, crossprod))
mult <- function(x, y) {
  # check if x is a vector
  if (is.vector(x)) {
    return((x %*% y) / col_scale)
  }
  # else x is the matrix
  x %*% (y / col_scale)
}
irlba(A, 3, mult=mult)$d
## [1] 1.820227 1.622988 1.067185

# Compare with:
irlba(A, 3, scale=col_scale)$d
## [1] 1.820227 1.622988 1.067185

# Compare with:
svd(sweep(A, 2, col_scale, FUN=`/`))$d[1:3]
## [1] 1.820227 1.622988 1.067185

# ------------------ new way ----------------------------------------------
setClass("scaledmatrix", contains="matrix", slots=c(scale="numeric"))
setMethod("%*%", signature(x="scaledmatrix", y="numeric"),
          function(x, y) x@.Data %*% (y / x@scale))
setMethod("%*%", signature(x="numeric", y="scaledmatrix"),
          function(x, y) (x %*% y@.Data) / y@scale)
a <- new("scaledmatrix", A, scale=col_scale)

irlba(a, 3)$d
## [1] 1.820227 1.622988 1.067185
```

We have learned that using R's existing S4 system is simpler, easier, and more flexible than using custom arguments with idiosyncratic syntax and behavior. We've even used the new approach to implement distributed parallel matrix products for very large problems with amazingly little code.
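The same S4 pattern extends to other implicit operators. As a sketch, the hypothetical class `centeredmatrix` below represents a column-centered matrix without ever forming it, and the snippet checks its two products against explicit centering using base R only. The class name and slot are assumptions made for illustration, not part of the package.

```r
library(methods)

# Hypothetical class: a matrix plus a vector of column means to subtract
setClass("centeredmatrix", contains = "matrix", slots = c(center = "numeric"))

# (A - 1 c') y  =  A y - (c . y) 1
setMethod("%*%", signature(x = "centeredmatrix", y = "numeric"),
          function(x, y) x@.Data %*% y - sum(x@center * y))

# x' (A - 1 c')  =  x' A - sum(x) c'
setMethod("%*%", signature(x = "numeric", y = "centeredmatrix"),
          function(x, y) x %*% y@.Data - sum(x) * y@center)

set.seed(1)
A <- matrix(rnorm(60), 12, 5)
a <- new("centeredmatrix", A, center = colMeans(A))
Ac <- sweep(A, 2, colMeans(A))  # explicit centering, for comparison only

y <- rnorm(5)
x <- rnorm(12)
all.equal(as.vector(a %*% y), as.vector(Ac %*% y))
all.equal(as.vector(x %*% a), as.vector(x %*% Ac))
```

By analogy with the scaledmatrix example, an object like this could presumably be handed to irlba() directly, although that is not verified here and irlba()'s own center argument already covers this particular case.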

Wishlist / help wanted...

  • More Matrix classes supported in the fast code path
  • Help improving the solver for singular values in tricky cases (basically, for ill-conditioned problems and especially for the smallest singular values); in general this may require a combination of more careful convergence criteria and use of harmonic Ritz values; Dmitriy Selivanov has proposed alternative convergence criteria in https://github.com/bwlewis/irlba/issues/29 for example.
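One way to probe the smallest-singular-value case is to compare irlba()'s smallest = TRUE option with a dense SVD on a problem small enough to solve exactly. The sketch below uses a made-up, well-conditioned random matrix; ill-conditioned inputs are where larger errors or convergence warnings would be expected.

```r
library(irlba)
set.seed(1)
A <- matrix(rnorm(100 * 20), 100, 20)

# Two smallest singular values estimated by irlba()
est <- irlba(A, nv = 2, smallest = TRUE)$d

# Dense reference values
ref <- tail(svd(A)$d, 2)

# Relative error of each estimate (sorted so the ordering does not matter)
rel_err <- abs(sort(est) - sort(ref)) / sort(ref)
rel_err
```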

References

  • Baglama, James, and Lothar Reichel. "Augmented implicitly restarted Lanczos bidiagonalization methods." SIAM Journal on Scientific Computing 27.1 (2005): 19-42.
  • Halko, Nathan, Per-Gunnar Martinsson, and Joel A. Tropp. "Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions." (2009).
  • Shen, Haipeng, and Jianhua Z. Huang. "Sparse principal component analysis via regularized low rank matrix approximation." Journal of Multivariate Analysis 99.6 (2008): 1015-1034.
  • Witten, Daniela M., Robert Tibshirani, and Trevor Hastie. "A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis." Biostatistics 10.3 (2009): 515-534.

Owner

  • Name: B. W. Lewis
  • Login: bwlewis
  • Kind: user
  • Location: Appalachia
  • Company: Foraging

Forager, kayaker, mathematician

GitHub Events

Total
  • Issues event: 2
  • Watch event: 2
  • Issue comment event: 2
Last Year
  • Issues event: 2
  • Watch event: 2
  • Issue comment event: 2

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 297
  • Total Committers: 4
  • Avg Commits per committer: 74.25
  • Development Distribution Score (DDS): 0.02
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
  • bwlewis (b****s@i****t): 291 commits
  • Aaron Lun (a****n@c****k): 4 commits
  • Zachary Kurtz (z****z@l****m): 1 commit
  • Will Townes (w****s@g****m): 1 commit

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 67
  • Total pull requests: 8
  • Average time to close issues: 2 months
  • Average time to close pull requests: 7 months
  • Total issue authors: 50
  • Total pull request authors: 5
  • Average comments per issue: 4.88
  • Average comments per pull request: 7.5
  • Merged pull requests: 5
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 2
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • bwlewis (9)
  • LTLA (5)
  • jsams (2)
  • eliocamp (2)
  • simon-anders (2)
  • tkcaccia (2)
  • jcarlen (1)
  • JosiahParry (1)
  • privefl (1)
  • GabrielHoffman (1)
  • vspinu (1)
  • erichson (1)
  • sanjeevRJMU1 (1)
  • dselivanov (1)
  • bapike (1)
Pull Request Authors
  • LTLA (4)
  • zdk123 (1)
  • willtownes (1)
  • Lei-D (1)
  • jan-glx (1)
Top Labels
Issue Labels
bug (6), enhancement (5), question (3), help wanted (1)

Packages

  • Total packages: 2
  • Total downloads:
    • cran 51,624 last-month
  • Total docker downloads: 2,444,462
  • Total dependent packages: 89
    (may contain duplicates)
  • Total dependent repositories: 228
    (may contain duplicates)
  • Total versions: 20
  • Total maintainers: 1
cran.r-project.org: irlba

Fast Truncated Singular Value Decomposition and Principal Components Analysis for Large Dense and Sparse Matrices

  • Versions: 14
  • Dependent Packages: 73
  • Dependent Repositories: 223
  • Downloads: 51,624 Last month
  • Docker Downloads: 2,444,462
Rankings
Dependent repos count: 1.2%
Dependent packages count: 1.2%
Downloads: 2.2%
Stargazers count: 3.5%
Forks count: 4.6%
Average: 5.1%
Docker downloads count: 17.7%
Maintainers (1)
Last synced: 6 months ago
conda-forge.org: r-irlba
  • Versions: 6
  • Dependent Packages: 16
  • Dependent Repositories: 5
Rankings
Dependent packages count: 4.0%
Average: 9.3%
Dependent repos count: 14.7%
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • Matrix * depends
  • R >= 3.6.2 depends
  • methods * imports
  • stats * imports