DelayedMatrixStats
A port of the matrixStats API to work with DelayedMatrix objects from the DelayedArray package
Science Score: 36.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ✓ Committers with academic emails: 2 of 8 committers (25.0%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.3%) to scientific vocabulary
Keywords from Contributors
bioconductor-package
genomics
gene
bioconductor
bioinformatics
core-package
single-cell
rna-seq
u24ca289073
human-cell-atlas
Last synced: 10 months ago
Repository
Basic Info
Statistics
- Stars: 15
- Watchers: 3
- Forks: 8
- Open Issues: 36
- Releases: 0
Created about 9 years ago · Last pushed about 1 year ago
Metadata Files
Readme
License
README.Rmd
---
output: github_document
editor_options:
  chunk_output_type: console
---
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```
# DelayedMatrixStats
**DelayedMatrixStats** is a port of the
[**matrixStats**](https://CRAN.R-project.org/package=matrixStats) API to work
with *DelayedMatrix* objects from the
[**DelayedArray**](http://bioconductor.org/packages/DelayedArray/) package.
For a *DelayedMatrix*, `x`, the simplest way to apply a function, `f()`, from
**matrixStats** is `matrixStats::f(as.matrix(x))`. However, this "*realizes*"
`x` in memory as a *base::matrix*, which typically defeats the entire purpose
of using a *DelayedMatrix* for storing the data.
The **DelayedArray** package already implements a clever strategy called
"block-processing" for certain common "matrix stats" operations (e.g.
`colSums()`, `rowSums()`). This is a good start, but not all of the
**matrixStats** API is currently supported. Furthermore, certain operations can
be optimized with additional information about `x`. I'll refer to these as
"seed-aware" implementations.
## Installation
You can install **DelayedMatrixStats** from Bioconductor with:
```{r gh-installation, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")
BiocManager::install("DelayedMatrixStats")
```
## Example
This example compares two ways of computing column sums of a *DelayedMatrix*
object:
1. `DelayedArray::colSums()`: The 'block-processing' strategy, implemented in the **DelayedArray** package. The block-processing strategy works for any *DelayedMatrix* object, regardless of the type of *seed*.
2. `DelayedMatrixStats::colSums2()`: The 'seed-aware' strategy, implemented in the **DelayedMatrixStats** package. The seed-aware implementation is optimized for both speed and memory, but only for *DelayedMatrix* objects with certain types of *seed*.
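
To make the idea behind the block-processing strategy concrete, here is a minimal conceptual sketch in base R. It is not the **DelayedArray** implementation: `block_col_sums()` and its `block_ncol` argument are hypothetical names used only for illustration, and an ordinary matrix stands in for a *DelayedMatrix*. The point is simply that only one block of columns is realized at a time.

```r
# Conceptual sketch of block-processing column sums (base R only).
# `block_col_sums()` and `block_ncol` are hypothetical names, not DelayedArray API.
block_col_sums <- function(x, block_ncol = 100L) {
  stopifnot(length(dim(x)) == 2L, block_ncol >= 1L)
  if (ncol(x) == 0L) {
    return(numeric(0))
  }
  out <- numeric(ncol(x))
  starts <- seq.int(1L, ncol(x), by = block_ncol)
  for (start in starts) {
    end <- min(start + block_ncol - 1L, ncol(x))
    # Only this block of columns is realized as an ordinary matrix.
    block <- as.matrix(x[, start:end, drop = FALSE])
    out[start:end] <- colSums(block)
  }
  out
}

set.seed(1)
m <- matrix(runif(50 * 7), nrow = 50, ncol = 7)
res <- block_col_sums(m, block_ncol = 3L)
```

Peak memory use in this sketch is bounded by the block size rather than by the size of the full matrix, which is the property that makes the strategy applicable to any type of *seed*.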
```{r real_pkg_load, message = FALSE, echo = FALSE}
devtools::load_all()
library(sparseMatrixStats)
library(microbenchmark)
library(profmem)
```
```{r fake_pkg_load, message = FALSE, echo = TRUE, eval = FALSE}
library(DelayedMatrixStats)
library(sparseMatrixStats)
library(microbenchmark)
library(profmem)
```
```{r example, message = FALSE}
set.seed(666)
# Fast column sums of DelayedMatrix with matrix seed
dense_matrix <- DelayedArray(matrix(runif(20000 * 600), nrow = 20000,
                                    ncol = 600))
class(seed(dense_matrix))
dense_matrix
microbenchmark(DelayedArray::colSums(dense_matrix),
               DelayedMatrixStats::colSums2(dense_matrix),
               times = 10)
profmem::total(profmem::profmem(DelayedArray::colSums(dense_matrix)))
profmem::total(profmem::profmem(DelayedMatrixStats::colSums2(dense_matrix)))
# Fast, low-memory column sums of DelayedMatrix with sparse matrix seed
sparse_matrix <- seed(dense_matrix)
zero_idx <- sample(length(sparse_matrix), 0.6 * length(sparse_matrix))
sparse_matrix[zero_idx] <- 0
sparse_matrix <- DelayedArray(Matrix::Matrix(sparse_matrix, sparse = TRUE))
class(seed(sparse_matrix))
sparse_matrix
microbenchmark(DelayedArray::colSums(sparse_matrix),
               DelayedMatrixStats::colSums2(sparse_matrix),
               times = 10)
profmem::total(profmem::profmem(DelayedArray::colSums(sparse_matrix)))
profmem::total(profmem::profmem(DelayedMatrixStats::colSums2(sparse_matrix)))
# Fast column sums of DelayedMatrix with Rle-based seed
rle_matrix <- RleArray(Rle(sample(2L, 200000 * 6 / 10, replace = TRUE), 100),
                       dim = c(2000000, 6))
class(seed(rle_matrix))
rle_matrix
microbenchmark(DelayedArray::colSums(rle_matrix),
               DelayedMatrixStats::colSums2(rle_matrix),
               times = 10)
profmem::total(profmem::profmem(DelayedArray::colSums(rle_matrix)))
profmem::total(profmem::profmem(DelayedMatrixStats::colSums2(rle_matrix)))
```
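
The Rle-based case suggests why a seed-aware method can be cheaper than realizing the data: a sum over run-length-encoded values can be computed from the runs alone. The sketch below illustrates that idea with base R's `rle()`. It is an assumption about the general technique, not the package's actual code, and `rle_sum()` is a hypothetical helper.

```r
# Sum a run-length-encoded vector without expanding the runs.
# `rle_sum()` is a hypothetical helper built on base R's rle().
rle_sum <- function(r) {
  stopifnot(inherits(r, "rle"))
  # Convert to double first to avoid integer overflow on long runs.
  sum(as.numeric(r$values) * r$lengths)
}

x <- rep(c(1L, 2L, 1L), times = c(1000L, 500L, 2000L))
r <- rle(x)
n_runs <- length(r$values)  # number of runs, far fewer than length(x)
total <- rle_sum(r)
```

The work done by `rle_sum()` scales with the number of runs rather than the number of elements, so the benefit grows as the runs get longer.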
## Benchmarking
An extensive set of benchmarks is under development at [http://peterhickey.org/BenchmarkingDelayedMatrixStats/](http://peterhickey.org/BenchmarkingDelayedMatrixStats/).
## API coverage
- ✔ = Implemented in **DelayedMatrixStats**
- ☑️ = Implemented in [**DelayedArray**](http://bioconductor.org/packages/DelayedArray/) or [**sparseMatrixStats**](http://bioconductor.org/packages/sparseMatrixStats/)
- ❌ = Not yet implemented
```{r, echo = FALSE, comment="", results = "asis"}
matrixStats <- sort(
  c("colsum", "rowsum", grep("^(col|row)",
                             getNamespaceExports("matrixStats"),
                             value = TRUE)))
sparseMatrixStats <- getNamespaceExports("sparseMatrixStats")
DelayedMatrixStats <- getNamespaceExports("DelayedMatrixStats")
DelayedArray <- getNamespaceExports("DelayedArray")
api_df <- data.frame(
  Method = paste0("`", matrixStats, "()`"),
  `Block processing` = ifelse(
    matrixStats %in% DelayedMatrixStats,
    "✔",
    ifelse(matrixStats %in% c(DelayedArray, sparseMatrixStats), "☑️", "❌")),
  `_base::matrix_ optimized` =
    ifelse(sapply(matrixStats, existsMethod, signature = "matrix_OR_array_OR_table_OR_numeric"),
           "✔",
           "❌"),
  `_Matrix::dgCMatrix_ optimized` =
    ifelse(sapply(matrixStats, existsMethod, signature = "xgCMatrix") | sapply(matrixStats, existsMethod, signature = "dgCMatrix"),
           "✔",
           "❌"),
  `_Matrix::lgCMatrix_ optimized` =
    ifelse(sapply(matrixStats, existsMethod, signature = "xgCMatrix") | sapply(matrixStats, existsMethod, signature = "lgCMatrix"),
           "✔",
           "❌"),
  `_DelayedArray::RleArray_ (_SolidRleArraySeed_) optimized` =
    ifelse(sapply(matrixStats, existsMethod, signature = "SolidRleArraySeed"),
           "✔",
           "❌"),
  `_DelayedArray::RleArray_ (_ChunkedRleArraySeed_) optimized` =
    ifelse(sapply(matrixStats, existsMethod, signature = "ChunkedRleArraySeed"),
           "✔",
           "❌"),
  `_HDF5Array::HDF5Matrix_ optimized` =
    ifelse(sapply(matrixStats, existsMethod, signature = "HDF5ArraySeed"),
           "✔",
           "❌"),
  `_base::data.frame_ optimized` =
    ifelse(sapply(matrixStats, existsMethod, signature = "data.frame"),
           "✔",
           "❌"),
  `_S4Vectors::DataFrame_ optimized` =
    ifelse(sapply(matrixStats, existsMethod, signature = "DataFrame"),
           "✔",
           "❌"),
  check.names = FALSE)
knitr::kable(api_df, row.names = FALSE)
```
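
The coverage table is built by asking, for each function name, whether an S4 method is registered for a given seed class via `existsMethod()` from the **methods** package. The small sketch below shows that check in isolation. The generic `demoColSums()` is hypothetical and defined only for this illustration; it is not part of **DelayedMatrixStats**.

```r
library(methods)

# Hypothetical generic and method, used only to illustrate the check.
setGeneric("demoColSums", function(x, ...) standardGeneric("demoColSums"))
setMethod("demoColSums", "matrix", function(x, ...) colSums(x))

# existsMethod() reports only methods defined directly for the signature;
# it does not consider inherited methods.
has_matrix_method <- existsMethod("demoColSums", signature = "matrix")
has_df_method <- existsMethod("demoColSums", signature = "data.frame")

# Map the logical results to the symbols used in the coverage table.
marks <- ifelse(c(matrix = has_matrix_method, data.frame = has_df_method),
                "✔", "❌")
```

Because `existsMethod()` ignores inheritance, a ❌ in the table means that no method is defined directly for that seed class, not necessarily that the function fails on such objects.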
Owner
- Name: Peter Hickey
- Login: PeteHaitch
- Kind: user
- Website: www.peterhickey.org
- Repositories: 23
- Profile: https://github.com/PeteHaitch
GitHub Events
Total
- Watch event: 2
- Push event: 4
- Fork event: 2
- Create event: 1
Last Year
- Watch event: 2
- Push event: 4
- Fork event: 2
- Create event: 1
Committers
Last synced: about 1 year ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Peter Hickey | p****y@g****m | 254 |
| Hervé Pagès | h****s@f****g | 24 |
| Nitesh Turaga | n****a@g****m | 14 |
| LTLA | i****s@g****m | 13 |
| J Wokaty | j****y@s****u | 10 |
| vobencha | v****a@g****m | 2 |
| vobencha | v****n@r****g | 2 |
| A Wokaty | a****y@s****u | 2 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 84
- Total pull requests: 18
- Average time to close issues: about 1 month
- Average time to close pull requests: 23 days
- Total issue authors: 20
- Total pull request authors: 5
- Average comments per issue: 2.13
- Average comments per pull request: 2.44
- Merged pull requests: 13
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: about 24 hours
- Issue authors: 0
- Pull request authors: 1
- Average comments per issue: 0
- Average comments per pull request: 1.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- PeteHaitch (54)
- LTLA (7)
- HenrikBengtsson (4)
- ryandesign (2)
- MalteThodberg (2)
- daisyyr (1)
- chen-peng-1874 (1)
- ndusek (1)
- sanhe374 (1)
- maxim-h (1)
- derbalkon (1)
- hpages (1)
- tillea (1)
- bschilder (1)
- christie-ga (1)
Pull Request Authors
- hpages (8)
- LTLA (5)
- PeteHaitch (4)
- const-ae (1)
- bbimber (1)
Top Labels
Issue Labels
initial_release (9)
enhancement (7)
help wanted (5)
future_release (4)
benchmarking (3)
out_of_scope (3)
ongoing (3)
question (1)
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
  - bioconductor 1,548,727 total
- Total dependent packages: 45
- Total dependent repositories: 0
- Total versions: 6
- Total maintainers: 1
bioconductor.org: DelayedMatrixStats
Functions that Apply to Rows and Columns of 'DelayedMatrix' Objects
- Homepage: https://github.com/PeteHaitch/DelayedMatrixStats
- Documentation: https://bioconductor.org/packages/release/bioc/vignettes/DelayedMatrixStats/inst/doc/DelayedMatrixStats.pdf
- License: MIT + file LICENSE
- Latest release: 1.30.0 (published about 1 year ago)
Rankings
Dependent repos count: 0.0%
Dependent packages count: 2.0%
Downloads: 2.2%
Average: 4.6%
Forks count: 8.5%
Stargazers count: 10.2%
Maintainers (1)
Last synced: 10 months ago
Dependencies
DESCRIPTION
cran
- DelayedArray >= 0.17.6 depends
- MatrixGenerics >= 1.5.3 depends
- IRanges >= 2.25.10 imports
- Matrix * imports
- S4Vectors >= 0.17.5 imports
- matrixStats >= 0.60.0 imports
- methods * imports
- sparseMatrixStats * imports
- BiocStyle * suggests
- HDF5Array * suggests
- covr * suggests
- knitr * suggests
- microbenchmark * suggests
- profmem * suggests
- rmarkdown * suggests
- testthat * suggests