DelayedMatrixStats

A port of the matrixStats API to work with DelayedMatrix objects from the DelayedArray package

https://github.com/petehaitch/delayedmatrixstats

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    2 of 8 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.3%) to scientific vocabulary

Keywords from Contributors

bioconductor-package genomics gene bioconductor bioinformatics core-package single-cell rna-seq u24ca289073 human-cell-atlas
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: PeteHaitch
  • License: other
  • Language: R
  • Default Branch: devel
  • Homepage:
  • Size: 741 KB
Statistics
  • Stars: 15
  • Watchers: 3
  • Forks: 8
  • Open Issues: 36
  • Releases: 0
Created about 9 years ago · Last pushed about 1 year ago
Metadata Files
Readme License

README.Rmd

---
output: github_document
editor_options: 
  chunk_output_type: console
---



```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

# DelayedMatrixStats

**DelayedMatrixStats** is a port of the 
[**matrixStats**](https://CRAN.R-project.org/package=matrixStats) API to work 
with *DelayedMatrix* objects from the 
[**DelayedArray**](http://bioconductor.org/packages/DelayedArray/) package.

For a *DelayedMatrix*, `x`, the simplest way to apply a function, `f()`, from 
**matrixStats** is `matrixStats::f(as.matrix(x))`. However, this "*realizes*" 
`x` in memory as a *base::matrix*, which typically defeats the entire purpose 
of using a *DelayedMatrix* for storing the data.

The **DelayedArray** package already implements a clever strategy called 
"block-processing" for certain common "matrix stats" operations (e.g. 
`colSums()`, `rowSums()`). This is a good start, but not all of the 
**matrixStats** API is currently supported. Furthermore, certain operations can 
be optimized with additional information about `x`. I'll refer to these as 
"seed-aware" implementations.
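
As a rough illustration of the block-processing idea (not the **DelayedArray** implementation), the base R sketch below computes column sums one block of columns at a time, so only a single block is ever realized as an ordinary matrix. The helper name `block_col_sums()` and its `block_ncol` argument are hypothetical and exist only for this example.

```r
# Conceptual sketch only: column sums computed one block of columns at a time.
# `block_col_sums` and `block_ncol` are hypothetical names, not DelayedArray API.
block_col_sums <- function(x, block_ncol = 100L) {
  out <- numeric(ncol(x))
  if (ncol(x) == 0L) {
    return(out)
  }
  starts <- seq(1L, ncol(x), by = block_ncol)
  for (start in starts) {
    end <- min(start + block_ncol - 1L, ncol(x))
    # Only this block is held in memory as an ordinary matrix.
    block <- as.matrix(x[, start:end, drop = FALSE])
    out[start:end] <- colSums(block)
  }
  out
}

set.seed(1)
m <- matrix(runif(50 * 7), nrow = 50, ncol = 7)
# The blocked result should agree with the all-at-once result.
stopifnot(isTRUE(all.equal(block_col_sums(m, block_ncol = 3L), colSums(m))))
```

A smaller `block_ncol` lowers peak memory use at the cost of more iterations, which is the basic trade-off behind any block-based strategy.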

## Installation

You can install **DelayedMatrixStats** from Bioconductor with:

```{r gh-installation, eval = FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
  install.packages("BiocManager")

BiocManager::install("DelayedMatrixStats")
```

## Example

This example compares two ways of computing column sums of a *DelayedMatrix* 
object:

1. `DelayedArray::colSums()`: The 'block-processing strategy', implemented in the **DelayedArray** package. The block-processing strategy works for any *DelayedMatrix* object, regardless of the type of *seed*.
2. `DelayedMatrixStats::colSums2()`: The 'seed-aware' strategy, implemented in the 
**DelayedMatrixStats** package. The seed-aware implementation is optimized for both speed and memory, but only for *DelayedMatrix* objects with certain types of *seed*.

```{r real_pkg_load, message = FALSE, echo = FALSE}
devtools::load_all()
library(sparseMatrixStats)
library(microbenchmark)
library(profmem)
```

```{r fake_pkg_load, message = FALSE, echo = TRUE, eval = FALSE}
library(DelayedMatrixStats)
library(sparseMatrixStats)
library(microbenchmark)
library(profmem)
```

```{r example, message = FALSE}
set.seed(666)

# Fast column sums of DelayedMatrix with matrix seed
dense_matrix <- DelayedArray(matrix(runif(20000 * 600), nrow = 20000,
                                    ncol = 600))
class(seed(dense_matrix))
dense_matrix
microbenchmark(DelayedArray::colSums(dense_matrix),
               DelayedMatrixStats::colSums2(dense_matrix),
               times = 10)
profmem::total(profmem::profmem(DelayedArray::colSums(dense_matrix)))
profmem::total(profmem::profmem(DelayedMatrixStats::colSums2(dense_matrix)))

# Fast, low-memory column sums of DelayedMatrix with sparse matrix seed
sparse_matrix <- seed(dense_matrix)
zero_idx <- sample(length(sparse_matrix), 0.6 * length(sparse_matrix))
sparse_matrix[zero_idx] <- 0
sparse_matrix <- DelayedArray(Matrix::Matrix(sparse_matrix, sparse = TRUE))
class(seed(sparse_matrix))
sparse_matrix
microbenchmark(DelayedArray::colSums(sparse_matrix),
               DelayedMatrixStats::colSums2(sparse_matrix),
               times = 10)
profmem::total(profmem::profmem(DelayedArray::colSums(sparse_matrix)))
profmem::total(profmem::profmem(DelayedMatrixStats::colSums2(sparse_matrix)))

# Fast column sums of DelayedMatrix with Rle-based seed
rle_matrix <- RleArray(Rle(sample(2L, 200000 * 6 / 10, replace = TRUE), 100),
                       dim = c(2000000, 6))
class(seed(rle_matrix))
rle_matrix
microbenchmark(DelayedArray::colSums(rle_matrix),
               DelayedMatrixStats::colSums2(rle_matrix),
               times = 10)
profmem::total(profmem::profmem(DelayedArray::colSums(rle_matrix)))
profmem::total(profmem::profmem(DelayedMatrixStats::colSums2(rle_matrix)))
```
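
To give some intuition for why a seed-aware method can be cheaper with run-length encoded data, here is a minimal sketch using base R's `rle()`. It is not the package's implementation, and `rle_sum()` is a hypothetical helper name used only for illustration.

```r
# Conceptual sketch only, using base R's rle(); not the package's implementation.
# `rle_sum` is a hypothetical helper name.
rle_sum <- function(r) {
  # Each run contributes value * run length, so the vector is never expanded.
  sum(as.numeric(r$values) * r$lengths)
}

x <- rep(c(1L, 2L, 1L), times = c(1000L, 500L, 2000L))
r <- rle(x)
length(r$values)  # 3 runs stand in for 3500 elements
stopifnot(rle_sum(r) == sum(x))
```

The work here scales with the number of runs rather than the number of elements, which is the kind of extra information about the data that a seed-aware method can exploit.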

## Benchmarking

An extensive set of benchmarks is under development at [http://peterhickey.org/BenchmarkingDelayedMatrixStats/](http://peterhickey.org/BenchmarkingDelayedMatrixStats/).

## API coverage

- ✔ = Implemented in **DelayedMatrixStats**
- ☑️ = Implemented in [**DelayedArray**](http://bioconductor.org/packages/DelayedArray/) or [**sparseMatrixStats**](http://bioconductor.org/packages/sparseMatrixStats/)
- ❌ = Not yet implemented

```{r, echo = FALSE, comment="", results = "asis"}
matrixStats <- sort(
  c("colsum", "rowsum", grep("^(col|row)", 
                             getNamespaceExports("matrixStats"), 
                             value = TRUE)))
sparseMatrixStats <- getNamespaceExports("sparseMatrixStats")
DelayedMatrixStats <- getNamespaceExports("DelayedMatrixStats")
DelayedArray <- getNamespaceExports("DelayedArray")

api_df <- data.frame(
  Method = paste0("`", matrixStats, "()`"),
  `Block processing` = ifelse(
    matrixStats %in% DelayedMatrixStats,
    "✔",
    ifelse(matrixStats %in% c(DelayedArray, sparseMatrixStats), "☑️", "❌")),
  `_base::matrix_ optimized` = 
    ifelse(sapply(matrixStats, existsMethod, signature = "matrix_OR_array_OR_table_OR_numeric"), 
           "✔", 
           "❌"),
  `_Matrix::dgCMatrix_ optimized` = 
    ifelse(sapply(matrixStats, existsMethod, signature = "xgCMatrix") | sapply(matrixStats, existsMethod, signature = "dgCMatrix"), 
           "✔", 
           "❌"),
  `_Matrix::lgCMatrix_ optimized` = 
    ifelse(sapply(matrixStats, existsMethod, signature = "xgCMatrix") | sapply(matrixStats, existsMethod, signature = "lgCMatrix"), 
           "✔", 
           "❌"),
  `_DelayedArray::RleArray_ (_SolidRleArraySeed_) optimized` = 
    ifelse(sapply(matrixStats, existsMethod, signature = "SolidRleArraySeed"),
           "✔", 
           "❌"),
  `_DelayedArray::RleArray_  (_ChunkedRleArraySeed_) optimized` = 
    ifelse(sapply(matrixStats, existsMethod, signature = "ChunkedRleArraySeed"),
           "✔", 
           "❌"),
  `_HDF5Array::HDF5Matrix_ optimized` = 
    ifelse(sapply(matrixStats, existsMethod, signature = "HDF5ArraySeed"),
           "✔", 
           "❌"),
  `_base::data.frame_ optimized` = 
    ifelse(sapply(matrixStats, existsMethod, signature = "data.frame"),
           "✔", 
           "❌"),
  `_S4Vectors::DataFrame_ optimized` =
    ifelse(sapply(matrixStats, existsMethod, signature = "DataFrame"),
           "✔", 
           "❌"), 
  check.names = FALSE)
knitr::kable(api_df, row.names = FALSE)
```
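
The "optimized" columns of the table are derived by asking, via `existsMethod()`, whether a method is defined for a given seed class. The sketch below shows that check on a toy S4 generic; `toySum`, `ToySeedA`, and `ToySeedB` are hypothetical names invented for this example and are not part of any package.

```r
library(methods)

# Toy classes and generic, hypothetical stand-ins for seed types and a
# matrix-stats generic.
setClass("ToySeedA", representation(values = "numeric"))
setClass("ToySeedB", representation(values = "numeric"))
setGeneric("toySum", function(x) standardGeneric("toySum"))

# Only ToySeedA gets a dedicated ("optimized") method.
setMethod("toySum", "ToySeedA", function(x) sum(x@values))

has_a <- existsMethod("toySum", signature = "ToySeedA")  # TRUE
has_b <- existsMethod("toySum", signature = "ToySeedB")  # FALSE
ifelse(c(has_a, has_b), "✔", "❌")
```

Note that `existsMethod()` only reports methods defined directly for the given signature, not inherited ones, so a ❌ in the table means no dedicated method exists rather than that the function cannot be called.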

Owner

  • Name: Peter Hickey
  • Login: PeteHaitch
  • Kind: user

GitHub Events

Total
  • Watch event: 2
  • Push event: 4
  • Fork event: 2
  • Create event: 1
Last Year
  • Watch event: 2
  • Push event: 4
  • Fork event: 2
  • Create event: 1

Committers

Last synced: about 1 year ago

All Time
  • Total Commits: 321
  • Total Committers: 8
  • Avg Commits per committer: 40.125
  • Development Distribution Score (DDS): 0.209
Past Year
  • Commits: 8
  • Committers: 4
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.625
Top Committers
Name Email Commits
Peter Hickey p****y@g****m 254
Hervé Pagès h****s@f****g 24
Nitesh Turaga n****a@g****m 14
LTLA i****s@g****m 13
J Wokaty j****y@s****u 10
vobencha v****a@g****m 2
vobencha v****n@r****g 2
A Wokaty a****y@s****u 2
Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 84
  • Total pull requests: 18
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 23 days
  • Total issue authors: 20
  • Total pull request authors: 5
  • Average comments per issue: 2.13
  • Average comments per pull request: 2.44
  • Merged pull requests: 13
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: about 24 hours
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • PeteHaitch (54)
  • LTLA (7)
  • HenrikBengtsson (4)
  • ryandesign (2)
  • MalteThodberg (2)
  • daisyyr (1)
  • chen-peng-1874 (1)
  • ndusek (1)
  • sanhe374 (1)
  • maxim-h (1)
  • derbalkon (1)
  • hpages (1)
  • tillea (1)
  • bschilder (1)
  • christie-ga (1)
Pull Request Authors
  • hpages (8)
  • LTLA (5)
  • PeteHaitch (4)
  • const-ae (1)
  • bbimber (1)
Top Labels
Issue Labels
initial_release (9) enhancement (7) help wanted (5) future_release (4) benchmarking (3) out_of_scope (3) ongoing (3) question (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 1,548,727 total
  • Total dependent packages: 45
  • Total dependent repositories: 0
  • Total versions: 6
  • Total maintainers: 1
bioconductor.org: DelayedMatrixStats

Functions that Apply to Rows and Columns of 'DelayedMatrix' Objects

  • Versions: 6
  • Dependent Packages: 45
  • Dependent Repositories: 0
  • Downloads: 1,548,727 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 2.0%
Downloads: 2.2%
Average: 4.6%
Forks count: 8.5%
Stargazers count: 10.2%
Maintainers (1)
Last synced: 10 months ago

Dependencies

DESCRIPTION cran
  • DelayedArray >= 0.17.6 depends
  • MatrixGenerics >= 1.5.3 depends
  • IRanges >= 2.25.10 imports
  • Matrix * imports
  • S4Vectors >= 0.17.5 imports
  • matrixStats >= 0.60.0 imports
  • methods * imports
  • sparseMatrixStats * imports
  • BiocStyle * suggests
  • HDF5Array * suggests
  • covr * suggests
  • knitr * suggests
  • microbenchmark * suggests
  • profmem * suggests
  • rmarkdown * suggests
  • testthat * suggests