flashmm
Fast and Scalable Single Cell Differential Expression Analysis using Mixed-effects Models
Repository
Fast and Scalable Single Cell Differential Expression Analysis using Mixed-effects Models
Basic Info
- Host: GitHub
- Owner: BaderLab
- License: other
- Language: R
- Default Branch: main
- Size: 1.75 MB
Statistics
- Stars: 6
- Watchers: 2
- Forks: 0
- Open Issues: 11
- Releases: 1
Created over 1 year ago
· Last pushed 7 months ago
Metadata Files
Readme
Changelog
License
README.Rmd
---
output: github_document
---
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
# FLASH-MM
FLASH-MM is a method (package name: FLASHMM) for analysis of single-cell differential expression using a linear mixed-effects model (LMM). Mixed-effects models are a powerful tool in single-cell studies because they can model intra-subject correlation and inter-subject variability.
The FLASHMM package provides two functions for fitting LMMs: *lmm* and *lmmfit*. The *lmm* function takes summary statistics as input, whereas *lmmfit* is a wrapper around *lmm* that processes cell-level data directly and computes the summary statistics internally. While *lmmfit* is easier to use, it consumes more memory. For extremely large-scale data, it is recommended to precompute and store the summary statistics and then use the *lmm* function to fit the LMMs.
In summary, the FLASHMM package provides the following main functions.
* *lmm*: fit LMM using summary-level data.
* *lmmfit*: fit LMM using cell-level data.
* *lmmtest*: perform statistical tests on the fixed effects and their contrasts.
* *contrast.matrix*: construct contrast matrix of the fixed effects for various comparisons.
* *simuRNAseq*: simulate multi-sample multi-cell-type scRNA-seq data.
## Installation
You can install the FLASHMM package from CRAN:
```{r echo = TRUE, results = "hide", message = FALSE}
install.packages("FLASHMM")
```
Or install the development version from GitHub:
```{r echo = TRUE, results = "hide", message = FALSE}
devtools::install_github("https://github.com/Baderlab/FLASHMM")
```
## Example
This basic example shows how to use FLASHMM for analyzing single-cell differential expression. See the package vignette for details: https://cran.r-project.org/web/packages/FLASHMM/vignettes/FLASHMM-vignette.html.
```{r}
library(FLASHMM)
```
### Simulating scRNA-seq dataset with *simuRNAseq*
Simulate a multi-sample multi-cell-cluster scRNA-seq dataset that contains 25 samples and 4 clusters (cell-types) with 2 treatments.
```{r dataset}
set.seed(2412)
dat <- simuRNAseq(nGenes = 50, nCells = 1000, nsam = 25, ncls = 4, ntrt = 2, nDEgenes = 6)
names(dat)
##counts and meta data
counts <- dat$counts
metadata <- dat$metadata
head(metadata)
##DE genes
dat$DEgenes
rm(dat)
```
The simulated data contains:
* *counts*: a genes-by-cells matrix of expression counts.
* *metadata*: a data frame consisting of samples (sam), cell-types (cls) and treatments (trt).
* *DEgenes*: differentially expressed (DE) genes.
### Differential expression analysis using LMM
The analysis involves the following steps: LMM design, LMM fitting, and hypothesis testing.
**1. Model design**
* Y: gene expression profile (log-transformed counts)
* X: design matrix for fixed effects
* Z: design matrix for random effects
```{r}
Y <- log(counts + 1)
X <- model.matrix(~ 0 + log(libsize) + cls + cls:trt, data = metadata)
Z <- model.matrix(~ 0 + sam, data = metadata)
d <- ncol(Z)
```
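For orientation, the design above corresponds to the standard single-component LMM, written here per gene $g$. The notation ($y_g$, $\beta_g$, $b_g$, $\sigma_{b,g}^2$, $\sigma_g^2$) is introduced for illustration only and is an assumption; the package's internal parameterization may differ.

$$
y_g = X\beta_g + Z b_g + \varepsilon_g, \qquad b_g \sim N(0,\ \sigma_{b,g}^2 I_d), \quad \varepsilon_g \sim N(0,\ \sigma_g^2 I_n)
$$

Here $y_g$ is the expression vector of gene $g$ across the $n$ cells (one row of Y), $\beta_g$ holds the fixed effects, and $b_g$ holds the $d$ sample-level random effects.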
**2. LMM fitting**
**Option 1**: Fit LMMs with *lmmfit* function using cell-level data.
```{r}
fit <- lmmfit(Y, X, Z, d = d)
```
**Option 2**: Fit LMMs with *lmm* function using summary-level data.
```{r}
##(1) Computing summary statistics
n <- nrow(X)
XX <- t(X)%*%X; XY <- t(Y%*%X)
ZX <- t(Z)%*%X; ZY <- t(Y%*%Z); ZZ <- t(Z)%*%Z
Ynorm <- rowSums(Y*Y)
##(2) Fitting LMMs
fitss <- lmm(XX, XY, ZX, ZY, ZZ, Ynorm = Ynorm, n = n, d = d)
identical(fit, fitss)
```
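Because each summary statistic above is a sum over cells, it can be accumulated batch by batch when the full cell-level matrices do not fit in memory. The following is a minimal base-R sketch of that idea, not part of FLASHMM: the toy data, the batch size of 50, and the loop itself are assumptions made for illustration.

```r
## Toy data standing in for Y (genes x cells), X and Z (cells x covariates).
set.seed(1)
n <- 200; p <- 3; q <- 5; g <- 10
Y <- matrix(rnorm(g * n), g, n)
X <- cbind(1, matrix(rnorm(n * (p - 1)), n))
sam <- factor(sample(seq_len(q), n, replace = TRUE), levels = seq_len(q))
Z <- model.matrix(~ 0 + sam)

## Split the cells into batches of 50 (an arbitrary, assumed batch size).
chunks <- split(seq_len(n), ceiling(seq_len(n) / 50))

## Initialize the accumulators with the dimensions used in Option 2.
XX <- matrix(0, p, p); XY <- matrix(0, p, g)
ZX <- matrix(0, q, p); ZY <- matrix(0, q, g); ZZ <- matrix(0, q, q)
Ynorm <- numeric(g)

## Add each batch's contribution to the running sums.
for (idx in chunks) {
  Xi <- X[idx, , drop = FALSE]
  Zi <- Z[idx, , drop = FALSE]
  Yi <- Y[, idx, drop = FALSE]
  XX <- XX + crossprod(Xi)
  XY <- XY + t(Yi %*% Xi)
  ZX <- ZX + crossprod(Zi, Xi)
  ZY <- ZY + t(Yi %*% Zi)
  ZZ <- ZZ + crossprod(Zi)
  Ynorm <- Ynorm + rowSums(Yi * Yi)
}
```

The accumulated quantities match those computed from the full matrices and can be passed to *lmm* in the same way as in Option 2. In practice each batch would be read from disk rather than sliced from an in-memory matrix.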
**3. Hypothesis testing**
```{r, echo = TRUE, message = FALSE, tidy = TRUE, tidy.opts = list(width.cutoff = 80)}
##Testing coefficients (fixed effects)
test <- lmmtest(fit)
#head(test)
##The t-values and p-values are identical to those provided in the LMM fit.
range(test - cbind(t(fit$coef), t(fit$t), t(fit$p)))
fit$p[, 1:4]
#fit$coef[, 1:4]; fit$t[, 1:4]
```
**Differentially expressed (DE) genes**: The coefficients of the interactions, cls*i*:trtB, represent the effects of treatment B versus A in a cell-type, cls*i*.
```{r}
##Coefficients, t-values, and p-values for the genes specific to a cell-type.
index <- grep(":", rownames(fit$coef))
ce <- fit$coef[index, ]
tv <- fit$t[index, ]
pv <- fit$p[index, ]
out <- data.frame(
  gene = rep(colnames(ce), nrow(ce)),
  cluster = rep(rownames(ce), each = ncol(ce)),
  coef = c(t(ce)), t = c(t(tv)), p = c(t(pv)))
##FDR.
out$FDR <- p.adjust(out$p, method = "fdr")
##The DE genes with FDR < 0.05
out[out$FDR < 0.05, ]
```
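The `"fdr"` method of `p.adjust` is the Benjamini-Hochberg (BH) adjustment. As a small illustration of what that adjustment does, the sketch below recomputes it by hand on a made-up vector of p-values (the values are hypothetical, not output from the analysis above).

```r
## Hypothetical p-values for illustration only.
p <- c(0.001, 0.008, 0.039, 0.041, 0.20)
m <- length(p)

## Sort the p-values and scale the i-th smallest by m / i.
o <- order(p)
scaled <- p[o] * m / seq_len(m)

## Enforce monotonicity from the largest p-value downwards, cap at 1,
## and return the adjusted values in the original order.
adj_sorted <- rev(cummin(rev(scaled)))
bh <- pmin(1, adj_sorted)[order(o)]

## The manual computation agrees with p.adjust().
all.equal(bh, p.adjust(p, method = "fdr"))
```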
**Using contrasts**: We can make comparisons using contrasts. For example, the effects of treatment B vs A in all clusters can be tested using the contrast constructed as follows.
```{r}
ct <- numeric(ncol(X))
index <- grep("B", colnames(X))
ct[index] <- 1/length(index)
test <- lmmtest(fit, contrast = ct)
head(test)
```
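Other comparisons can be encoded the same way. As a sketch, the contrast below compares the treatment effect between two cell-types. It is built on a small toy design with two clusters named `1` and `2`; the toy metadata and these level names are assumptions, so the column names should be checked against `colnames(X)` for real data.

```r
## Toy metadata with two clusters and two treatments (hypothetical values).
set.seed(1)
meta <- data.frame(
  libsize = runif(40, 1000, 5000),
  cls = factor(rep(c("1", "2"), each = 20)),
  trt = factor(rep(c("A", "B"), times = 20))
)

## Same fixed-effects formula as in the model design step.
X <- model.matrix(~ 0 + log(libsize) + cls + cls:trt, data = meta)
colnames(X)

## Contrast: (B vs A in cluster 1) minus (B vs A in cluster 2).
ct <- setNames(numeric(ncol(X)), colnames(X))
ct["cls1:trtB"] <- 1
ct["cls2:trtB"] <- -1
ct
```

A vector built this way from the real design matrix would be supplied through the `contrast` argument of *lmmtest*, as in the call above.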
## And More
### Using ML method
To use the maximum likelihood (ML) method to fit the LMM, set `method = "ML"` in the *lmm* and *lmmfit* functions.
```{r LMM_ML, echo = TRUE, message = FALSE, warning = FALSE}
##Fitting LMM using ML method
fit1 <- lmmfit(Y, X, Z, d = d, method = "ML")
```
### LMM with two-component random effects
When appropriate, for example when the measurement time should also be treated as a random effect within a subject, we can fit the data using an LMM with two-component random effects.
```{r, echo = TRUE, message = FALSE, tidy = TRUE, tidy.opts = list(width.cutoff = 80)}
##Design matrix for two-component random effects: Suppose the data contains the measurement time points, denoted as 'time', which are randomly generated.
set.seed(2508)
n <- nrow(metadata)
metadata$time <- sample(1:2, n, replace = TRUE)
Z <- model.matrix(~ 0 + sam + sam:time, data = metadata)
d <- c(ncol(Z)/2, ncol(Z)/2) #dimension
##Fit the LMM with two-component random effects.
fit2 <- lmmfit(Y, X, Z, d = d, method = "ML")
```
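To see why `d` is split into two equal halves, it helps to inspect the columns that the formula `~ 0 + sam + sam:time` produces. The sketch below uses a toy data frame with three samples (the sample names and sizes are made up for illustration).

```r
## Toy metadata: 3 samples with 4 cells each and a random time point per cell.
set.seed(1)
meta <- data.frame(sam = factor(rep(paste0("s", 1:3), each = 4)))
meta$time <- sample(1:2, nrow(meta), replace = TRUE)

## One indicator column per sample, then one sample-by-time column per sample.
Z <- model.matrix(~ 0 + sam + sam:time, data = meta)
colnames(Z)

## The first half of the columns forms the first random-effect component,
## the second half forms the second component.
d <- c(ncol(Z)/2, ncol(Z)/2)
d
```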
### Testing variance components
We use both a z-test and a likelihood ratio test (LRT) to test the second variance component in the LMM with two-component random effects. Since the simulated data were generated by the LMM with single-component random effects, the second variance component should be zero. For the LRT, the two nested models must be fitted using the same method, either REML or ML; when REML is used, they must also share the same design matrix, $X$.
```{r}
##(1) z-test for testing the second variance component
##Z-statistics for the second variance component
i <- grep("var2", rownames(fit2$theta))
z <- fit2$theta[i, ]/fit2$se.theta[i, ]
##One-sided z-test p-values for hypotheses:
##H0: theta <= 0 vs H1: theta > 0
p <- pnorm(z, lower.tail = FALSE)
##(2) LRT for testing the second variance component
LRT <- 2*(fit2$logLik - fit1$logLik)
pLRT <- pchisq(LRT, df = 1, lower.tail = FALSE)
##QQ-plot
qqplot(runif(length(p)), p, xlab = "Uniform quantile", ylab = "Z-test p-value", col = "blue")
abline(0, 1, col = "gray")
qqplot(runif(length(pLRT)), pLRT, xlab = "Uniform quantile", ylab = "LRT p-value", col = "blue")
abline(0, 1, col = "gray")
```
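Written out, the two tests computed in the chunk above are as follows. The symbols are introduced here for illustration: $\hat\theta_2$ is the estimated second variance component, $\ell_1$ and $\ell_2$ are the maximized log-likelihoods of the single-component fit (`fit1`) and the two-component fit (`fit2`), and $\Phi$ is the standard normal distribution function.

$$
z = \frac{\hat\theta_2}{\operatorname{se}(\hat\theta_2)}, \quad p_z = 1 - \Phi(z), \qquad \mathrm{LRT} = 2(\ell_2 - \ell_1), \quad p_{\mathrm{LRT}} = P\left(\chi^2_1 > \mathrm{LRT}\right)
$$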
```{r}
sessionInfo()
```
# Citation
If you find FLASH-MM useful for your publication, please cite:
Xu & Pouyabahar et al., FLASH-MM: fast and scalable single-cell differential expression analysis using linear mixed-effects models, bioRxiv 2025.04.08.647860; doi: https://doi.org/10.1101/2025.04.08.647860
Owner
- Name: Bader Lab, University of Toronto
- Login: BaderLab
- Kind: organization
- Location: Toronto, Canada
- Website: http://www.baderlab.org
- Repositories: 83
- Profile: https://github.com/BaderLab
Packages
- Total packages: 1
-
Total downloads:
- cran 602 last-month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 6
- Total maintainers: 1
cran.r-project.org: FLASHMM
Fast and Scalable Single Cell Differential Expression Analysis using Mixed-Effects Models
- Homepage: https://github.com/BaderLab/FLASHMM
- Documentation: http://cran.r-project.org/web/packages/FLASHMM/FLASHMM.pdf
- License: MIT + file LICENSE
-
Latest release: 1.2.3
published 7 months ago
Rankings
Dependent packages count: 27.4%
Dependent repos count: 33.8%
Average: 49.4%
Downloads: 87.0%