OnlinePCA.jl: A Julia Package for Out-of-core and Sparse Principal Component Analysis

OnlinePCA.jl: A Julia Package for Out-of-core and Sparse Principal Component Analysis - Published in JOSS (2026)

https://github.com/rikenbit/onlinepca.jl

Science Score: 87.0%

This score indicates how likely this project is to be science-related based on various indicators:

- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 2 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: arxiv.org, scholar.google, ncbi.nlm.nih.gov, sciencedirect.com, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software

Keywords

bioinformatics dimensionality-reduction julia out-of-core-processing pca sparse-matrix

Last synced: 22 days ago

Repository

Online Principal Component Analysis

Basic Info

Host: GitHub
Owner: rikenbit
License: other
Language: Julia
Default Branch: master
Homepage: https://rikenbit.github.io/OnlinePCA.jl/
Size: 9.14 MB

Statistics

Stars: 24
Watchers: 8
Forks: 3
Open Issues: 0
Releases: 21

Topics

bioinformatics dimensionality-reduction julia out-of-core-processing pca sparse-matrix

Created almost 8 years ago · Last pushed 3 months ago

Metadata Files

Readme, Contributing, License, Code of conduct

OnlinePCA.jl

Online Principal Component Analysis

📚 Documentation

Description

OnlinePCA.jl binarizes CSV files, summarizes the information in a data matrix, and provides several online PCA functions for extremely large-scale matrices.

Algorithms

Gradient-based
- GD-PCA
- SGD-PCA
- Oja's method: Erkki Oja et al., 1985; Erkki Oja, 1992
- CCIPCA: Juyang Weng et al., 2003
- RSGD-PCA: Silvere Bonnabel, 2013
- SVRG-PCA: Ohad Shamir, 2015
- RSVRG-PCA: Hongyi Zhang et al., 2016; Hiroyuki Sato et al., 2017
Krylov subspace-based
- Orthogonal Iteration (a power method that calculates multiple eigenvectors at once): Zhaofun Bai, 1987
- Arnoldi method: Zhaofun Bai, 1987
- Lanczos method: Zhaofun Bai, 1987
Random projection-based
- Halko's method: N. Halko et al., 2011; N. Halko et al., 2011
- Algorithm971: George C. Linderman et al., 2017; Huamin Li et al., 2017; Vladimir Rokhlin et al., 2009
- Randomized Block Krylov Iteration: W. Yu et al., 2017
- Single-pass PCA: C. Musco et al., 2015

Learning Parameter Scheduling

- Robbins-Monro: Herbert Robbins et al., 1951
- Momentum: Ning Qian, 1999
- Nesterov's Accelerated Gradient Descent (NAG): Nesterov, 1983
- ADAGRAD: John Duchi et al., 2011
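The four schedule names above can be made concrete with a minimal sketch of their update rules, applied here to gradient descent on a toy quadratic rather than to PCA. The function run_schedule, the decaying step size stepsize / t used for Robbins-Monro, the momentum coefficient beta = 0.9 and the AdaGrad epsilon are illustrative assumptions; OnlinePCA.jl's internal update formulas may differ.

```julia
using LinearAlgebra

# Toy objective f(w) = 0.5 * ||w - TARGET||^2, whose gradient is w - TARGET.
const TARGET = [1.0, -2.0, 3.0]
toygrad(w) = w .- TARGET

# Hypothetical sketch of the four schedules (not OnlinePCA.jl's internal code).
function run_schedule(schedule::Symbol; stepsize::Float64, numiter::Int=500,
                      beta::Float64=0.9, epsilon::Float64=1e-8)
    w = zeros(length(TARGET))
    v = zeros(length(TARGET))   # velocity used by momentum and NAG
    G = zeros(length(TARGET))   # running sum of squared gradients used by AdaGrad
    for t in 1:numiter
        if schedule == :robbins_monro
            # Decaying step size stepsize / t
            w .-= (stepsize / t) .* toygrad(w)
        elseif schedule == :momentum
            # Heavy-ball momentum: accumulate a velocity, then move along it
            v .= beta .* v .+ stepsize .* toygrad(w)
            w .-= v
        elseif schedule == :nag
            # Nesterov: take the gradient at the look-ahead point w + beta * v
            v .= beta .* v .- stepsize .* toygrad(w .+ beta .* v)
            w .+= v
        elseif schedule == :adagrad
            # Per-coordinate step size that shrinks as squared gradients accumulate
            g = toygrad(w)
            G .+= g .^ 2
            w .-= stepsize .* g ./ (sqrt.(G) .+ epsilon)
        else
            throw(ArgumentError("unknown schedule: $schedule"))
        end
    end
    return w
end

for (schedule, stepsize) in [(:robbins_monro, 0.5), (:momentum, 0.1), (:nag, 0.1), (:adagrad, 1.0)]
    w = run_schedule(schedule; stepsize=stepsize)
    println(schedule, ": distance to optimum = ", norm(w .- TARGET))
end
```

As in the package examples further down, each schedule needs its own step size: AdaGrad tolerates a much larger stepsize because its effective step shrinks as squared gradients accumulate.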

Installation

Requirements

Julia 1.0 or later

Installation Methods

Method 1: Using Pkg.add()

```julia
julia> using Pkg
julia> Pkg.add(url="https://github.com/rikenbit/OnlinePCA.jl.git")
```

Method 2: Using Pkg REPL mode

```julia
# Press the "]" key to enter Pkg REPL mode, then type the following command.
(v1.7) pkg> add https://github.com/rikenbit/OnlinePCA.jl
# Press Backspace or Ctrl+C to return to the Julia REPL.
```

Optional Dependencies

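For interactive visualization of PCA results:

```julia
using Pkg
Pkg.add("PlotlyJS")
```

The plotting helper below also uses DataFrames, and the data-generation examples use Distributions and MatrixMarket; install these with Pkg.add in the same way if they are not already available.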

Basic API usage

Preprocessing of CSV

```julia
using OnlinePCA
using OnlinePCA: write_csv
using Distributions
using DelimitedFiles
using SparseArrays
using MatrixMarket

# CSV
tmp = mktempdir()
data = Int64.(ceil.(rand(NegativeBinomial(1, 0.5), 300, 99)))
data[1:50, 1:33] .= 100*data[1:50, 1:33]
data[51:100, 34:66] .= 100*data[51:100, 34:66]
data[101:150, 67:99] .= 100*data[101:150, 67:99]
write_csv(joinpath(tmp, "Data.csv"), data)

# Binarization (Zstandard)
csv2bin(csvfile=joinpath(tmp, "Data.csv"), binfile=joinpath(tmp, "Data.zst"))

# Matrix Market (MM)
mmwrite(joinpath(tmp, "Data.mtx"), sparse(data))

# Summary of data for CSV/Dense Matrix
densepath = mktempdir()
sumr(binfile=joinpath(tmp, "Data.zst"), outdir=densepath)
```

Setting for plot

```julia
using DataFrames
using PlotlyJS

function subplots(respca, group)
    # data frame
    dataleft = DataFrame(pc1=respca[:,1], pc2=respca[:,2], group=group)
    dataright = DataFrame(pc2=respca[:,2], pc3=respca[:,3], group=group)
    # plot
    pleft = Plot(dataleft, x=:pc1, y=:pc2, mode="markers", marker_size=10, group=:group)
    pright = Plot(dataright, x=:pc2, y=:pc3, mode="markers", marker_size=10,
        group=:group, showlegend=false)
    pleft.data[1]["marker_color"] = "red"
    pleft.data[2]["marker_color"] = "blue"
    pleft.data[3]["marker_color"] = "green"
    pright.data[1]["marker_color"] = "red"
    pright.data[2]["marker_color"] = "blue"
    pright.data[3]["marker_color"] = "green"
    pleft.data[1]["name"] = "group1"
    pleft.data[2]["name"] = "group2"
    pleft.data[3]["name"] = "group3"
    pleft.layout["title"] = "PC1 vs PC2"
    pright.layout["title"] = "PC2 vs PC3"
    pleft.layout["xaxis_title"] = "pc1"
    pleft.layout["yaxis_title"] = "pc2"
    pright.layout["xaxis_title"] = "pc2"
    pright.layout["yaxis_title"] = "pc3"
    plot([pleft pright])
end

group = vcat(repeat(["group1"], inner=33), repeat(["group2"], inner=33), repeat(["group3"], inner=33))
```

GD-PCA

```julia
out_gd1 = gd(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="robbins-monro", stepsize=1E-3, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_gd2 = gd(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="momentum", stepsize=1E-3, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_gd3 = gd(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="nag", stepsize=1E-3, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_gd4 = gd(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="adagrad", stepsize=1E-0, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_gd1[1], group) # Top, Left
subplots(out_gd2[1], group) # Top, Right
subplots(out_gd3[1], group) # Bottom, Left
subplots(out_gd4[1], group) # Bottom, Right
```
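As a rough illustration of what GD-PCA optimizes, the sketch below runs full-batch gradient ascent on tr(W'CW) for the covariance C of the row-centred data, re-orthonormalizing W with a QR step after each update. It is a self-contained toy using only the standard library; gd_pca_sketch and its defaults are hypothetical and do not reproduce the package's scheduling, out-of-core reading or data scaling.

```julia
using LinearAlgebra, Random

# Hypothetical sketch (not OnlinePCA.jl's code): gradient ascent on 0.5 * tr(W' C W)
# with C = Xc * Xc' / n, followed by a QR step that keeps W orthonormal.
function gd_pca_sketch(X::AbstractMatrix, dim::Int; stepsize=0.1, numepoch=200,
                       rng=MersenneTwister(1))
    p, n = size(X)                          # features x samples, as in the examples above
    Xc = X .- sum(X, dims=2) ./ n           # centre each feature (row)
    W = Matrix(qr(randn(rng, p, dim)).Q)    # random orthonormal start, p x dim
    for _ in 1:numepoch
        grad = Xc * (Xc' * W) ./ n          # C * W without forming the p x p matrix C
        W = Matrix(qr(W .+ stepsize .* grad).Q)  # ascent step, then re-orthonormalise
    end
    return W
end

rng = MersenneTwister(0)
X = randn(rng, 20, 500) .* vcat([10.0, 8.0, 6.0], ones(17))  # three dominant directions
W = gd_pca_sketch(X, 3)
Xc = X .- sum(X, dims=2) ./ size(X, 2)
U = eigen(Symmetric(Xc * Xc' ./ size(X, 2))).vectors[:, end:-1:end-2]  # top-3 eigenvectors
println("cosines of principal angles: ", svdvals(U' * W))
```

Each epoch touches the data only through the products Xc' * W and Xc * (...), which is what makes a chunked, out-of-core version of this loop possible.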

SGD-PCA

```julia
out_sgd1 = sgd(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="robbins-monro", stepsize=1E-3, numbatch=100, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_sgd2 = sgd(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="momentum", stepsize=1E-3, numbatch=100, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_sgd3 = sgd(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="nag", stepsize=1E-3, numbatch=100, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_sgd4 = sgd(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="adagrad", stepsize=1E-0, numbatch=100, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_sgd1[1], group) # Top, Left
subplots(out_sgd2[1], group) # Top, Right
subplots(out_sgd3[1], group) # Bottom, Left
subplots(out_sgd4[1], group) # Bottom, Right
```

Oja's method

```julia
out_oja1 = oja(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="robbins-monro", stepsize=1E+0, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_oja2 = oja(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="momentum", stepsize=1E-3, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_oja3 = oja(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="nag", stepsize=1E-3, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_oja4 = oja(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="adagrad", stepsize=1E-1, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_oja1[1], group) # Top, Left
subplots(out_oja2[1], group) # Top, Right
subplots(out_oja3[1], group) # Bottom, Left
subplots(out_oja4[1], group) # Bottom, Right
```
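A minimal sketch of Oja's subspace rule, which updates the basis one sample at a time instead of using the full gradient. oja_sketch, its fixed step size and the synthetic data are assumptions for illustration; the package's own implementation adds step-size scheduling and reads the data from the binary file.

```julia
using LinearAlgebra, Random

# Hypothetical sketch of Oja's subspace rule (not OnlinePCA.jl's code). Each centred
# sample x updates the p x dim basis W via  y = W' x,  W <- W + stepsize * (x - W y) y'.
function oja_sketch(X::AbstractMatrix, dim::Int; stepsize=1e-3, numepoch=10,
                    rng=MersenneTwister(2))
    p, n = size(X)                          # features x samples
    Xc = X .- sum(X, dims=2) ./ n           # centre each feature (row)
    W = Matrix(qr(randn(rng, p, dim)).Q)    # orthonormal start
    for _ in 1:numepoch, j in randperm(rng, n)   # one sample at a time, shuffled each epoch
        x = Xc[:, j]
        y = W' * x
        W .+= stepsize .* (x .- W * y) * y'
    end
    return W
end

rng = MersenneTwister(0)
X = randn(rng, 10, 2000) .* vcat([5.0, 4.0, 3.0], ones(7))
W = oja_sketch(X, 3)
Xc = X .- sum(X, dims=2) ./ size(X, 2)
U = eigen(Symmetric(Xc * Xc' ./ size(X, 2))).vectors[:, end:-1:end-2]
Qw = Matrix(qr(W).Q)   # orthonormalise the estimate before comparing subspaces
println("cosines of principal angles: ", svdvals(U' * Qw))
```

The rule recovers the top-3 subspace rather than individual eigenvectors; the step size trades convergence speed against the residual fluctuation around the true subspace.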

CCIPCA

```julia
out_ccipca1 = ccipca(input=joinpath(tmp, "Data.zst"), dim=3, stepsize=1E-0, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_ccipca1[1], group)
```
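CCIPCA replaces a tuned gradient step with averaging: each unnormalized component vector converges to its eigenvalue times its eigenvector, and each sample is deflated before it updates the next component. The sketch below follows that update with the amnesic parameter set to 0 by default; ccipca_sketch and the synthetic data are hypothetical, and how OnlinePCA.jl maps its stepsize argument onto this update is not shown here.

```julia
using LinearAlgebra, Random

# Hypothetical sketch of CCIPCA (candid covariance-free incremental PCA), one pass.
function ccipca_sketch(X::AbstractMatrix, dim::Int; amnesic::Real=0.0)
    p, n = size(X)
    Xc = X .- sum(X, dims=2) ./ n          # centre each feature (row)
    V = zeros(p, dim)                      # unnormalised component estimates
    for t in 1:n
        u = Xc[:, t]                       # current sample, deflated component by component
        for i in 1:min(dim, t)
            if i == t
                V[:, i] = u                # initialise component i with the residual sample
            else
                v = V[:, i]
                V[:, i] = ((t - 1 - amnesic) / t) .* v .+
                          ((1 + amnesic) / t) .* u .* (dot(u, v) / norm(v))
            end
            vhat = V[:, i] ./ norm(V[:, i])
            u = u .- dot(u, vhat) .* vhat  # remove direction i before updating component i + 1
        end
    end
    return V
end

rng = MersenneTwister(0)
X = randn(rng, 10, 5000) .* vcat([5.0, 3.0, 2.0], ones(7))   # variances 25, 9, 4, 1
V = ccipca_sketch(X, 3)
Xc = X .- sum(X, dims=2) ./ size(X, 2)
F = eigen(Symmetric(Xc * Xc' ./ size(X, 2)))
U, λ = F.vectors[:, end:-1:end-2], F.values[end:-1:end-2]
for i in 1:3
    println("component $i: |cos| = ", abs(dot(U[:, i], V[:, i])) / norm(V[:, i]),
            ", norm = ", norm(V[:, i]), " (reference eigenvalue ", λ[i], ")")
end
```

Because norm(V[:, i]) tracks the i-th eigenvalue, this scheme yields variance estimates alongside the directions without a separate pass.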

RSGD-PCA

```julia
out_rsgd1 = rsgd(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="robbins-monro", stepsize=1E+2, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_rsgd2 = rsgd(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="momentum", stepsize=1E-3, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_rsgd3 = rsgd(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="nag", stepsize=1E-3, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_rsgd4 = rsgd(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="adagrad", stepsize=1E-1, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_rsgd1[1], group) # Top, Left
subplots(out_rsgd2[1], group) # Top, Right
subplots(out_rsgd3[1], group) # Bottom, Left
subplots(out_rsgd4[1], group) # Bottom, Right
```

SVRG-PCA

```julia
out_svrg1 = svrg(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="robbins-monro", stepsize=1E-5, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_svrg2 = svrg(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="momentum", stepsize=1E-5, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_svrg3 = svrg(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="nag", stepsize=1E-5, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_svrg4 = svrg(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="adagrad", stepsize=1E-2, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_svrg1[1], group) # Top, Left
subplots(out_svrg2[1], group) # Top, Right
subplots(out_svrg3[1], group) # Bottom, Left
subplots(out_svrg4[1], group) # Bottom, Right
```

RSVRG-PCA

```julia
out_rsvrg1 = rsvrg(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="robbins-monro", stepsize=1E-6, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_rsvrg2 = rsvrg(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="momentum", stepsize=1E-6, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_rsvrg3 = rsvrg(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="nag", stepsize=1E-6, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))
out_rsvrg4 = rsvrg(input=joinpath(tmp, "Data.zst"), dim=3, scheduling="adagrad", stepsize=1E-2, numepoch=10, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_rsvrg1[1], group) # Top, Left
subplots(out_rsvrg2[1], group) # Top, Right
subplots(out_rsvrg3[1], group) # Bottom, Left
subplots(out_rsvrg4[1], group) # Bottom, Right
```

Orthogonal Iteration (Power method)

```julia
out_orthiter = orthiter(input=joinpath(tmp, "Data.zst"), dim=3, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_orthiter[1], group)
```
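Orthogonal iteration generalizes the power method to several vectors at once: a block of vectors is repeatedly multiplied by the covariance and re-orthonormalized. The sketch below works on an in-memory matrix and never forms the covariance explicitly; orthiter_sketch, numiter and the synthetic data are illustrative assumptions rather than the package's implementation.

```julia
using LinearAlgebra, Random

# Sketch of orthogonal (subspace) iteration: multiply a block of dim vectors by
# C = Xc * Xc' / n and re-orthonormalise with QR. C itself is never formed.
function orthiter_sketch(X::AbstractMatrix, dim::Int; numiter::Int=100, rng=MersenneTwister(3))
    p, n = size(X)
    Xc = X .- sum(X, dims=2) ./ n                 # centre each feature (row)
    Q = Matrix(qr(randn(rng, p, dim)).Q)          # random orthonormal start
    for _ in 1:numiter
        Q = Matrix(qr(Xc * (Xc' * Q) ./ n).Q)     # power step C * Q, then QR
    end
    # Rayleigh quotients q' C q approximate the top eigenvalues
    λ = [dot(Q[:, i], Xc * (Xc' * Q[:, i])) / n for i in 1:dim]
    return Q, λ
end

rng = MersenneTwister(0)
X = randn(rng, 20, 500) .* vcat([10.0, 8.0, 6.0], ones(17))
Q, λ = orthiter_sketch(X, 3)
Xc = X .- sum(X, dims=2) ./ size(X, 2)
λref = eigvals(Symmetric(Xc * Xc' ./ size(X, 2)))[end:-1:end-2]   # top three, descending
println("estimated: ", λ, "  reference: ", λref)
```

Convergence speed is governed by the ratio between the dim-th and (dim+1)-th eigenvalues, which is why well-separated leading components converge in few iterations.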

Arnoldi method

```julia
out_arnoldi = arnoldi(input=joinpath(tmp, "Data.zst"), dim=3, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_arnoldi[1], group)
```

Lanczos method

```julia
out_lanczos = lanczos(input=joinpath(tmp, "Data.zst"), dim=3, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_lanczos[1], group)
```
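The Lanczos method builds an orthonormal Krylov basis in which the covariance becomes a small tridiagonal matrix; the largest eigenvalues of that matrix (Ritz values) approximate the top eigenvalues. The sketch uses full reorthogonalization for numerical safety; lanczos_sketch and its krylovdim parameter are hypothetical and not the package's API.

```julia
using LinearAlgebra, Random

# Sketch of the Lanczos method with full reorthogonalisation on C = Xc * Xc' / n.
function lanczos_sketch(X::AbstractMatrix, dim::Int; krylovdim::Int=20, rng=MersenneTwister(4))
    p, n = size(X)
    Xc = X .- sum(X, dims=2) ./ n            # centre each feature (row)
    applyC(v) = Xc * (Xc' * v) ./ n          # C * v without forming C
    m = min(krylovdim, p)
    V = zeros(p, m)                          # orthonormal Krylov basis
    α = zeros(m)                             # diagonal of the tridiagonal matrix T
    β = zeros(m - 1)                         # off-diagonal of T
    v0 = randn(rng, p)
    V[:, 1] = v0 ./ norm(v0)
    for j in 1:m
        w = applyC(V[:, j])
        α[j] = dot(V[:, j], w)
        # Full reorthogonalisation: remove components along all basis vectors so far
        w .-= V[:, 1:j] * (V[:, 1:j]' * w)
        if j < m
            β[j] = norm(w)
            V[:, j + 1] = w ./ β[j]
        end
    end
    F = eigen(SymTridiagonal(α, β))          # Ritz values and vectors of T
    idx = sortperm(F.values, rev=true)[1:dim]
    return V * F.vectors[:, idx], F.values[idx]
end

rng = MersenneTwister(0)
X = randn(rng, 50, 1000) .* vcat([6.0, 5.0, 4.0], ones(47))
Vtop, θ = lanczos_sketch(X, 3)
Xc = X .- sum(X, dims=2) ./ size(X, 2)
λref = eigvals(Symmetric(Xc * Xc' ./ size(X, 2)))[end:-1:end-2]
println("Ritz values: ", θ, "  reference: ", λref)
```

Full reorthogonalization costs O(p·m) extra work per step but avoids the spurious duplicate Ritz values that plain three-term Lanczos develops in floating point.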

Halko's method

```julia
out_halko = halko(input=joinpath(tmp, "Data.zst"), dim=3, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_halko[1], group)
```
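Halko's method projects the data onto a few random directions, refines the resulting basis with power iterations, and then solves a small exact SVD. This sketch uses an oversampling of 10 and two power iterations, values chosen for illustration; halko_sketch is hypothetical and works on an in-memory matrix rather than a binary file.

```julia
using LinearAlgebra, Random

# Sketch of Halko et al.'s randomized SVD: random range finder with oversampling,
# a few power iterations, then an exact SVD of a small projected matrix.
function halko_sketch(X::AbstractMatrix, dim::Int; oversample::Int=10, numpower::Int=2,
                      rng=MersenneTwister(5))
    p, n = size(X)
    A = X .- sum(X, dims=2) ./ n                  # centred data (features x samples)
    l = min(dim + oversample, p, n)               # number of random directions
    Q = Matrix(qr(A * randn(rng, n, l)).Q)        # orthonormal basis for range(A * Ω)
    for _ in 1:numpower
        # Alternate between A' and A, re-orthonormalising to avoid losing small directions
        Q = Matrix(qr(A' * Q).Q)
        Q = Matrix(qr(A * Q).Q)
    end
    F = svd(Q' * A)                               # small l x n problem
    return (Q * F.U)[:, 1:dim], F.S[1:dim], F.V[:, 1:dim]
end

rng = MersenneTwister(0)
X = randn(rng, 50, 1000) .* vcat([6.0, 5.0, 4.0], ones(47))
U, S, V = halko_sketch(X, 3)
A = X .- sum(X, dims=2) ./ size(X, 2)
println("randomized: ", S, "  exact: ", svdvals(A)[1:3])
```

Each power iteration costs two passes over the data, so on disk-backed data the number of power iterations is the main knob trading accuracy against I/O.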

Algorithm 971

```julia
out_algorithm971 = algorithm971(input=joinpath(tmp, "Data.zst"), dim=3, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_algorithm971[1], group)
```

Randomized Block Krylov Iteration

```julia
out_rbkiter = rbkiter(input=joinpath(tmp, "Data.zst"), dim=3, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_rbkiter[1], group)
```
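Randomized block Krylov iteration keeps every intermediate block of the power iterations instead of only the last one, and then extracts the top singular vectors from the whole Krylov space. The sketch below illustrates that idea; rbki_sketch, numiter and the synthetic data are assumptions, not the package's rbkiter implementation.

```julia
using LinearAlgebra, Random

# Sketch of randomized block Krylov iteration: orthonormalise the Krylov space spanned by
# [A*Ω, (A*A')*A*Ω, ..., (A*A')^q * A*Ω] and solve a small SVD inside it (Rayleigh-Ritz).
function rbki_sketch(X::AbstractMatrix, dim::Int; numiter::Int=3, rng=MersenneTwister(6))
    p, n = size(X)
    A = X .- sum(X, dims=2) ./ n                  # centred data (features x samples)
    Y = Matrix(qr(A * randn(rng, n, dim)).Q)      # first block: A * Ω (orthonormalised)
    blocks = [Y]
    for _ in 1:numiter
        # Next block (A*A') * Y; the QR only changes its basis, not the subspace it spans
        Y = Matrix(qr(A * (A' * Y)).Q)
        push!(blocks, Y)
    end
    Q = Matrix(qr(reduce(hcat, blocks)).Q)        # orthonormal basis of the whole Krylov space
    F = svd(Q' * A)                               # small (dim * (numiter + 1)) x n problem
    return (Q * F.U)[:, 1:dim], F.S[1:dim]
end

rng = MersenneTwister(0)
X = randn(rng, 50, 1000) .* vcat([6.0, 5.0, 4.0], ones(47))
Uk, Sk = rbki_sketch(X, 3)
A = X .- sum(X, dims=2) ./ size(X, 2)
println("block Krylov: ", Sk, "  exact: ", svdvals(A)[1:3])
```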

Single-pass PCA type I

```julia
out_singlepass = singlepass(input=joinpath(tmp, "Data.zst"), dim=3, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_singlepass[1], group)
```

Single-pass PCA type II

```julia
out_singlepass2 = singlepass2(input=joinpath(tmp, "Data.zst"), dim=3, rowmeanlist=joinpath(densepath, "Feature_LogMeans.csv"))

subplots(out_singlepass2[1], group)
```

Summarization for 10X-HDF5

```julia
tenxsumr(tenxfile="Data.h5", group="mm10", chunksize=100)
```

Algorithm 971 for 10X-HDF5

```julia
out_tenxpca = tenxpca(tenxfile="Data.h5", scale="sqrt", rowmeanlist="Feature_SqrtMeans.csv", dim=3, chunksize=100, group="mm10")
```

Summary of data for MM/Sparse Matrix

```julia
# Sparsification + Binarization (Zstandard + MM format)
mm2bin(mmfile=joinpath(tmp, "Data.mtx"), binfile=joinpath(tmp, "Data.mtx.zst"))

sparsepath = mktempdir()
sumr(binfile=joinpath(tmp, "Data.mtx.zst"), outdir=sparsepath, mode="sparse_mm")
```

Sparse Randomized SVD for MM format

```julia
out_sparse_rsvd = sparse_rsvd(input=joinpath(tmp, "Data.mtx.zst"), scale="ftt", rowmeanlist=joinpath(sparsepath, "Feature_FTTMeans.csv"), dim=3, chunksize=100)

subplots(out_sparse_rsvd[1], group)
```

Exact Out-of-Core PCA

Unlike the other PCA functions, which take a dimensions × data matrix (in the examples above, each column is a data point), this function assumes a data × dimensions matrix. It is computationally efficient for tall matrices, where the number of data points is much larger than the number of dimensions. The example below first prepares data in this layout. This function can also be used without first running sumr to extract row and column statistics.

```julia
# CSV
tmp2 = mktempdir()
data2 = Int64.(ceil.(rand(NegativeBinomial(1, 0.5), 99, 30)))
data2[1:33, 1:10] .= 100*data2[1:33, 1:10]
data2[34:66, 11:20] .= 100*data2[34:66, 11:20]
data2[67:99, 21:30] .= 100*data2[67:99, 21:30]
write_csv(joinpath(tmp2, "Data2.csv"), data2)

# Matrix Market (MM)
mmwrite(joinpath(tmp2, "Data2.mtx"), sparse(data2))

# Binary COO (BinCOO): one "row column" pair per nonzero entry
data3 = Int64.(ceil.(rand(Binomial(1, 0.2), 99, 33)))
data3[1:33, 1:11] .= 1
data3[34:66, 12:22] .= 1
data3[67:99, 23:33] .= 1

bincoofile = joinpath(tmp2, "Data3.bincoo")
open(bincoofile, "w") do io
    for i in 1:size(data3, 1)
        for j in 1:size(data3, 2)
            if data3[i, j] != 0
                println(io, "$i $j")
            end
        end
    end
end

# Binarization (CSV + Zstandard)
csv2bin(csvfile=joinpath(tmp2, "Data2.csv"), binfile=joinpath(tmp2, "Data2.zst"))

# Binarization (MM + Zstandard)
mm2bin(mmfile=joinpath(tmp2, "Data2.mtx"), binfile=joinpath(tmp2, "Data2.mtx.zst"))

# Binarization (BinCOO + Zstandard)
bincoo2bin(bincoofile=bincoofile, binfile=joinpath(tmp2, "Data3.bincoo.zst"))
```

```julia
# Dense-mode
out_exact_ooc_pca_dense = exact_ooc_pca(input=joinpath(tmp2, "Data2.zst"), scale="raw", dim=3, chunksize=10)

subplots(out_exact_ooc_pca_dense[3], group)
```

```julia
# Sparse-mode (MM)
out_exact_ooc_pca_sparse_mm = exact_ooc_pca(input=joinpath(tmp2, "Data2.mtx.zst"), scale="raw", dim=3, chunksize=10, mode="sparse_mm")

subplots(out_exact_ooc_pca_sparse_mm[3], group)
```

```julia
# Sparse-mode (BinCOO)
out_exact_ooc_pca_sparse_bincoo = exact_ooc_pca(input=joinpath(tmp2, "Data3.bincoo.zst"), scale="raw", dim=3, chunksize=10, mode="sparse_bincoo")

subplots(out_exact_ooc_pca_sparse_bincoo[3], group)
```
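The reason an exact PCA is feasible out of core for tall data × dimensions matrices is that only the column sums and the small d × d matrix X'X are needed, and both can be accumulated chunk by chunk. The sketch below shows that idea on an in-memory matrix sliced into chunks of chunksize rows; chunked_pca_sketch is hypothetical, uses the n - 1 sample covariance, and is not necessarily how exact_ooc_pca is implemented.

```julia
using LinearAlgebra, Random

# Sketch of exact PCA for a tall (data x dimensions) matrix, reading chunksize rows at
# a time. Only a length-d sum and a d x d Gram matrix are kept in memory.
function chunked_pca_sketch(rows::AbstractMatrix, dim::Int; chunksize::Int=10)
    n, d = size(rows)
    s = zeros(d)        # running column sums
    G = zeros(d, d)     # running X' X
    for start in 1:chunksize:n
        chunk = rows[start:min(start + chunksize - 1, n), :]   # in practice: read from disk
        s .+= vec(sum(chunk, dims=1))
        G .+= chunk' * chunk
    end
    μ = s ./ n
    C = (G .- n .* (μ * μ')) ./ (n - 1)   # sample covariance from the accumulated moments
    F = eigen(Symmetric(C))
    idx = sortperm(F.values, rev=true)[1:dim]
    return F.vectors[:, idx], F.values[idx], μ
end

rng = MersenneTwister(0)
rows = randn(rng, 99, 30) .* reshape(vcat([3.0, 2.0, 1.5], fill(0.5, 27)), 1, :)
loadings, λ, μ = chunked_pca_sketch(rows, 3; chunksize=10)
scores = (rows .- μ') * loadings      # 99 x 3 principal component scores
println("top eigenvalues: ", λ)
```

Memory use is O(d²) regardless of the number of rows, so the cost of a tall matrix is one sequential pass plus a d × d eigendecomposition.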

Command line usage

All the CSV preprocessing functions and PCA functions can also be run as command-line tools with the same parameter names, as shown below.

```bash
# CSV → Julia Binary (e.g., csv2bin, mm2bin)
julia YOURHOMEDIR/.julia/v0.x/OnlinePCA/bin/csv2bin \
    --csvfile Data.csv --binfile Data.zst

# Summary statistics extracted from Julia Binary (e.g., sumr, tenxsumr)
julia YOURHOMEDIR/.julia/v0.x/OnlinePCA/bin/sumr \
    --binfile Data.zst

# Perform PCA
julia YOURHOMEDIR/.julia/v0.x/OnlinePCA/bin/gd \
    --input Data.zst --dim 3 --scheduling robbins-monro --stepsize 10 \
    --numepoch 10 --rowmeanlist Feature_LogMeans.csv
```

Distributed Computing with Multiple Stepsize Setting

The online PCA algorithms run until the reconstruction error converges. With the default stopping criteria, the calculation stops when the relative change is below 1E-3 or above 0.03. These thresholds can be changed with the lower and upper options, respectively.
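The stopping rule described above can be sketched as a small predicate. The names lower and upper mirror the options mentioned in the text; defining the relative change as |e_t - e_{t-1}| / e_{t-1} for successive reconstruction errors is an assumption, and OnlinePCA.jl's internal definition may differ.

```julia
# Hypothetical sketch of the stopping rule; not OnlinePCA.jl's internal code.
# Stop when the relative change of the reconstruction error is tiny (converged)
# or unexpectedly large (above the upper threshold).
function should_stop(prev_err::Real, curr_err::Real; lower=1e-3, upper=0.03)
    relchange = abs(curr_err - prev_err) / prev_err
    return relchange < lower || relchange > upper
end

println(should_stop(1.0, 0.9995))  # prints true: change of 5e-4 is below lower
println(should_stop(1.0, 0.99))    # prints false: change of 1e-2 lies between the thresholds
println(should_stop(1.0, 0.9))     # prints true: change of 1e-1 is above upper
```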

Convergence depends on the step size parameter, whose default value is 1000. This value was tuned for single-cell RNA-seq datasets, but the appropriate value may change with the size and dynamic range of the data matrix.

Because the calculations for different step sizes are independent of each other, this step is easy to parallelize with Grid Engine. For example, first create the following template file (e.g., oja_template) containing the online PCA command:

```bash
#!/bin/bash
julia YOURHOMEDIR/.julia/v0.x/OnlinePCA/bin/oja \
    --scale log \
    --input Data.zst \
    --outdir XXXXX \
    --rowmeanlist Feature_LogMeans.csv \
    --dim 10 \
    --stepsize YYYYY \
    --logdir XXXXX/log
```

Then use sed to fill in a different step size in the template for each job, and submit each job with qsub.

```bash
#!/bin/bash
Steps=(1 10 100 1000 10000 100000 1000000)
for i in ${Steps[@]}; do
    OUT="Step"$i
    mkdir -p $OUT
    sed -e "s|XXXXX|$OUT|g" oja_template > TMPojascData.sh
    sed -e "s|YYYYY|$i|g" TMPojascData.sh > ojascData.sh
    chmod +x ojascData.sh
    qsub ojascData.sh
done
```

Even without a distributed computing environment, the runs can be executed concurrently as background processes (just add & at the end of the command).

```bash
#!/bin/bash
Steps=(1 10 100 1000 10000 100000 1000000)
for i in ${Steps[@]}; do
    mkdir -p "Step"$i
    julia YOURHOMEDIR/.julia/v0.x/OnlinePCA/bin/oja \
        --scale log \
        --input Data.zst \
        --outdir "Step"$i \
        --rowmeanlist Feature_LogMeans.csv \
        --dim 10 \
        --stepsize $i \
        --logdir "Step"$i/log &
done

ps | grep julia
```

Contributing

If you have suggestions for how OnlinePCA.jl could be improved, or want to report a bug, open an issue! We'd love all and any contributions.

For more, check out the Contributing Guide.

Author

Koki Tsuyuzaki

Owner

Name: RIKEN BiT
Login: rikenbit
Kind: organization
Location: Japan

Website: https://bit.riken.jp/
Twitter: dritoshien
Repositories: 80
Profile: https://github.com/rikenbit

Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research

JOSS Publication

OnlinePCA.jl: A Julia Package for Out-of-core and Sparse Principal Component Analysis

Published

January 29, 2026

DOI

10.21105/joss.09343

Volume 11, Issue 117, Page 9343

Authors

Koki Tsuyuzaki

Department of Artificial Intelligence Medicine, Graduate School of Medicine, Chiba University, Japan, Laboratory for Bioinformatics Research, RIKEN Center for Biosystems Dynamics Research, Japan

Editor

Chris Vernon

GitHub Events

Total

Release event: 13
Delete event: 4
Pull request event: 32
Issues event: 6
Watch event: 2
Issue comment event: 8
Push event: 59
Create event: 30

Last Year

Release event: 13
Delete event: 4
Pull request event: 32
Issues event: 6
Watch event: 1
Issue comment event: 8
Push event: 59
Create event: 30

Issues and Pull Requests

Last synced: 3 months ago

All Time

Total issues: 3
Total pull requests: 13
Average time to close issues: 33 minutes
Average time to close pull requests: 8 days
Total issue authors: 2
Total pull request authors: 2
Average comments per issue: 1.33
Average comments per pull request: 0.0
Merged pull requests: 12
Bot issues: 0
Bot pull requests: 13

Past Year

Issues: 3
Pull requests: 13
Average time to close issues: 33 minutes
Average time to close pull requests: 8 days
Issue authors: 2
Pull request authors: 2
Average comments per issue: 1.33
Average comments per pull request: 0.0
Merged pull requests: 12
Bot issues: 0
Bot pull requests: 13


Top Authors

Issue Authors

kokitsuyuzaki (2)
JuliaTagBot (1)

Pull Request Authors

github-actions[bot] (10)
dependabot[bot] (3)

Top Labels

Issue Labels

Pull Request Labels

dependencies (3) github_actions (3)

Packages

Total packages: 1
Total downloads: unknown

Total dependent packages: 0
Total dependent repositories: 0
Total versions: 2

juliahub.com: OnlinePCA

Online Principal Component Analysis

Homepage: https://rikenbit.github.io/OnlinePCA.jl/
Documentation: https://docs.juliahub.com/General/OnlinePCA/stable/
License: MIT
Latest release: 0.3.10
published 8 months ago

Versions: 2
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent repos count: 8.4%

Average: 22.3%

Dependent packages count: 36.3%

Last synced: about 1 month ago

Dependencies

.github/workflows/CI.yml actions

actions/checkout v4 composite
julia-actions/cache v2 composite
julia-actions/julia-buildpkg v1 composite
julia-actions/julia-runtest v1 composite
julia-actions/setup-julia v2 composite

.github/workflows/CompatHelper.yml actions

.github/workflows/TagBot.yml actions

JuliaRegistries/TagBot v1 composite

.github/workflows/build_push.yml actions

actions/checkout v4 composite
docker/build-push-action v6 composite
docker/login-action v3 composite

Dockerfile docker

julia 1.8.0-rc1-buster build

.github/workflows/docs.yml actions

actions/checkout v3 composite
julia-actions/setup-julia v1 composite
