cider

R package CIDER: Meta-Clustering for Single-Cell Data Integration and Evaluation

https://github.com/zhiyhu/cider

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: sciencedirect.com, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.3%) to scientific vocabulary

Repository


Basic Info
Statistics
  • Stars: 6
  • Watchers: 1
  • Forks: 4
  • Open Issues: 0
  • Releases: 1
Archived
Created over 5 years ago · Last pushed about 1 year ago

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# CIDER: Meta-Clustering for Single-Cell Data Integration and Evaluation




[![DOI](https://zenodo.org/badge/296897483.svg)](https://zenodo.org/badge/latestdoi/296897483)

Clustering single-cell RNA-Seq (scRNA-Seq) data from multiple samples or conditions is often challenged by confounding factors, such as batch effects and biologically relevant variability. Existing batch effect removal methods typically rely on the strong assumption that the composition of cell populations is nearly identical across samples. Here we present **CIDER**, a **meta-clustering workflow** based on inter-group similarity measures. The prototype of this method was first applied in [Hu et al., Cancer Cell 2020](https://www.sciencedirect.com/science/article/pii/S1535610820300428).

**For more information, please see our [publication](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02561-2) in Genome Biology (2021).** The published version and new citation information will be available soon.


CIDER can:

1. address the **clustering** task for confounded scRNA-Seq data, or
1. assess the biological correctness of integration as **a test metric**, **without** requiring prior cellular annotations, or
1. compute the similarity among biological populations.









## Installation (archived versions)

Archived versions of CIDER are available on the [CRAN Archive](https://cran.r-project.org/src/contrib/Archive/CIDER/).

First download the package to a local folder, then install it from the file:

```r
## replace the "path_to_pkg" with the real path
install.packages("path_to_pkg/CIDER_0.99.1.tar.gz", repos = NULL, type = "source")
```
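Because `repos = NULL` installs only the local file, dependencies are not resolved automatically and need to be present beforehand. The sketch below is one way to script these steps. It assumes the usual CRAN archive URL layout, that version 0.99.1 is present in the archive, and that the Bioconductor dependencies (`limma`, `edgeR`) are installed through `BiocManager`. The helper names `cider_archive_url()` and `install_cider_archive()` are hypothetical and not part of CIDER.

```r
# Hypothetical helper: build the archive URL for a given CIDER version,
# assuming the usual CRAN layout Archive/<pkg>/<pkg>_<version>.tar.gz.
cider_archive_url <- function(version = "0.99.1") {
  sprintf(
    "https://cran.r-project.org/src/contrib/Archive/CIDER/CIDER_%s.tar.gz",
    version
  )
}

# Hypothetical helper: download the tarball and install it from source.
# It is only defined here; call it explicitly when network access is available.
install_cider_archive <- function(version = "0.99.1", dest_dir = tempdir()) {
  # limma and edgeR are distributed via Bioconductor rather than CRAN.
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(c("limma", "edgeR"), update = FALSE, ask = FALSE)
  # The remaining CRAN dependencies (e.g. Seurat, dbscan, kernlab) must also
  # be installed beforehand, because repos = NULL skips dependency resolution.
  tarball <- file.path(dest_dir, basename(cider_archive_url(version)))
  download.file(cider_archive_url(version), destfile = tarball, mode = "wb")
  install.packages(tarball, repos = NULL, type = "source")
}

cider_archive_url()
```

Calling `install_cider_archive()` would then perform the download and the source installation in one go.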

We are working on compatibility issues so that the package can return to CRAN.

## Compatibility

**Seurat**: our package currently supports only older versions of Seurat, not Seurat v5. We are working to resolve the issue.


## CIDER as an evaluation metric - Quick start

If you have scRNA-Seq data corrected by an integration algorithm (e.g. Seurat-CCA, Harmony, Scanorama), you can use CIDER to evaluate whether the biological populations are correctly aligned.

Before running CIDER evaluation functions, make sure that you have a Seurat object (e.g. `seu.integrated`) with corrected PCs in `seu.integrated@reductions$pca@cell.embeddings`. Seurat-CCA automatically puts the corrected PCs there. If other methods are used, the corrected PCs can be added using `seu.integrated@reductions$pca@cell.embeddings <- corrected.PCs`.

```r
library(CIDER)
seu.integrated <- hdbscan.seurat(seu.integrated)
ider <- getIDEr(seu.integrated, verbose = FALSE)
seu.integrated <- estimateProb(seu.integrated, ider)
```
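When the corrected PCs come from a method other than Seurat-CCA, assigning them directly to the slot bypasses any checks, so it may help to confirm first that the matrix rows match the cells of the object. The following is a minimal sketch in base R. `align_embeddings()` is a hypothetical helper (not part of CIDER or Seurat), and it assumes that `corrected.PCs` is a cells-by-PCs matrix with cell names as row names.

```r
# Hypothetical helper: reorder a cells-by-PCs matrix to match a given cell
# order and give the columns PC_1, PC_2, ... names.
align_embeddings <- function(corrected, cells) {
  stopifnot(is.matrix(corrected), !is.null(rownames(corrected)))
  missing <- setdiff(cells, rownames(corrected))
  if (length(missing) > 0) {
    stop(length(missing), " cell(s) have no corrected embedding")
  }
  aligned <- corrected[cells, , drop = FALSE]
  colnames(aligned) <- paste0("PC_", seq_len(ncol(aligned)))
  aligned
}

# Toy data: three cells stored in a different order from the object.
corrected.PCs <- matrix(1:6, nrow = 3,
                        dimnames = list(c("cell3", "cell1", "cell2"), NULL))
aligned <- align_embeddings(corrected.PCs, c("cell1", "cell2", "cell3"))
rownames(aligned)
```

With a real object, the cell order can be taken from `colnames(seu.integrated)`, and the aligned matrix can then be assigned to `seu.integrated@reductions$pca@cell.embeddings` as described above.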

The evaluation scores (IDER-based similarity and empirical p values) can be visualised by the `scatterPlot` function. A [detailed tutorial of evaluation](https://zhiyhu.github.io/CIDER/articles/evaluation.html) is available.

```r
library(cowplot)  # provides plot_grid()
p1 <- scatterPlot(seu.integrated, "tsne", colour.by = "similarity")
p2 <- scatterPlot(seu.integrated, "tsne", colour.by = "pvalue")
plot_grid(p1, p2, ncol = 3)
```
![](man/figures/evaluation_scatterplot.png)


## Use CIDER for clustering tasks




### Quick start - asCIDER

Here `seu` is a Seurat object with the initial clustering annotation stored in the `initial_cluster` column of the metadata and batch information in `Batch`. The asCIDER example here contains two steps: computing the IDER-based similarity matrix (`getIDEr`) and performing the final clustering (`finalClustering`).

```r
ider <- getIDEr(seu, 
                group.by.var = "initial_cluster",
                batch.by.var = "Batch")
seu <- finalClustering(seu, ider, cutree.h = 0.45)
```
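To give an intuition for the `cutree.h` argument, the toy example below cuts a hierarchical tree at a fixed height using base R only. It assumes, as the parameter name suggests, that the final clustering merges initial clusters by cutting a tree built from the similarity matrix. The distance transform (`1 - similarity`) and the linkage used here are illustrative choices and may differ from what `finalClustering` does internally. The cluster names and similarity values are made up.

```r
# Toy symmetric similarity matrix for four hypothetical batch-specific
# clusters: population A and population B, each seen in batches b1 and b2.
groups <- c("A_b1", "A_b2", "B_b1", "B_b2")
sim <- matrix(c(1.0, 0.9, 0.1, 0.2,
                0.9, 1.0, 0.2, 0.1,
                0.1, 0.2, 1.0, 0.8,
                0.2, 0.1, 0.8, 1.0),
              nrow = 4, byrow = TRUE, dimnames = list(groups, groups))

# Turn similarity into a dissimilarity and build the tree.
hc <- hclust(as.dist(1 - sim), method = "complete")

# Cut at height 0.45: groups whose merge height is below 0.45 are joined.
merged <- cutree(hc, h = 0.45)
merged
```

In this toy case the two A clusters merge at height 0.1 and the two B clusters at 0.2, so both pairs are joined, while A and B stay separate because they would only merge at 0.9. A lower cut height yields more, finer final clusters, and a higher one yields fewer.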

A detailed tutorial of asCIDER is [here](https://zhiyhu.github.io/CIDER/articles/asCIDER.html). If your data do not have prior batch-specific clusters, see the dnCIDER tutorials ([high-level](https://zhiyhu.github.io/CIDER/articles/dnCIDER_highlevel.html) and [detailed walk-through](https://zhiyhu.github.io/CIDER/articles/dnCIDER.html)).


## Quick start - compute the similarity matrix within one batch

Here is the code used to compute the similarity matrix within one batch. 

```r
library(CIDER)
# Make sure the data have a column called "Batch" and assign the same value
# to all cells, for example:
seu$Batch <- "onebatch"
# Run getDistMat. The input needs to be list(seu); tmp.initial.clusters is the
# grouping variable used to compute the similarity matrix.
# The output is the similarity matrix.
dist <- getDistMat(seu_list = list(seu), tmp.initial.clusters = "cell_type")
```
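Once a similarity matrix is available, a common next step is to see which groups are most alike. The sketch below uses a made-up matrix and a hypothetical helper `rank_pairs()` (not part of CIDER). It assumes the result can be treated as a square numeric matrix with group names as row and column names. If `getDistMat` returns a list with one matrix per batch, the first element would need to be extracted first.

```r
# Toy similarity matrix between three hypothetical cell types.
types <- c("T", "NK", "B")
sim <- matrix(c( NA, 0.7, 0.1,
                0.7,  NA, 0.3,
                0.1, 0.3,  NA),
              nrow = 3, byrow = TRUE, dimnames = list(types, types))

# Hypothetical helper: list each unordered pair once, most similar first.
rank_pairs <- function(sim) {
  idx <- which(upper.tri(sim), arr.ind = TRUE)
  pairs <- data.frame(
    group1 = rownames(sim)[idx[, "row"]],
    group2 = colnames(sim)[idx[, "col"]],
    similarity = sim[idx],
    stringsAsFactors = FALSE
  )
  pairs[order(pairs$similarity, decreasing = TRUE), ]
}

ranked <- rank_pairs(sim)
ranked
```

Only the upper triangle is read, so the diagonal values (here `NA`) do not affect the ranking.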

A more dedicated function is under construction.




## Bug reports and issues

Please use [Issues](https://github.com/zhiyhu/CIDER/issues) to report bugs or
seek help. Thank you!

## Citation

Z. Hu, A. A. Ahmed, C. Yau. CIDER: an interpretable meta-clustering framework for single-cell RNA-seq data integration and evaluation. *Genome Biology* 22, Article number: 337 (2021); [doi: https://doi.org/10.1186/s13059-021-02561-2](https://doi.org/10.1186/s13059-021-02561-2)


Owner

  • Name: Zhiyuan Hu
  • Login: zhiyhu
  • Kind: user
  • Company: Wuhan University

Professor/PI @zhiyuan-hu-lab at Wuhan University; collaborator @TSS-Lab at University of Oxford

Citation (CITATION.cff)

cff-version: 0.99.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Hu
    given-names: Zhiyuan
    orcid: https://orcid.org/0000-0000-0000-0000
  - family-names: Ahmed
    given-names: Ahmed
  - family-names: Yau
    given-names: Christopher
title: "zhiyhu/CIDER: Genome Biology Release"
version: GBIO
date-released: 2021-11-20
            

GitHub Events

Total
  • Issues event: 3
  • Issue comment event: 3
  • Push event: 1
  • Fork event: 1
Last Year
  • Issues event: 3
  • Issue comment event: 3
  • Push event: 1
  • Fork event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 84
  • Total Committers: 2
  • Avg Commits per committer: 42.0
  • Development Distribution Score (DDS): 0.095
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
zhiyhu z****u@g****m 76
Zhiyuan Hu 1****u 8

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 4
  • Total pull requests: 2
  • Average time to close issues: about 1 year
  • Average time to close pull requests: less than a minute
  • Total issue authors: 4
  • Total pull request authors: 1
  • Average comments per issue: 2.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • YuchiZou (1)
  • eblchen (1)
  • levinhein (1)
  • Gesmira (1)
  • sjspielman (1)
Pull Request Authors
  • zhiyhu (2)

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • Seurat >= 3.1.0 imports
  • dbscan >= 1.1 imports
  • doParallel * imports
  • edgeR >= 3.28.0 imports
  • foreach >= 1.4.7 imports
  • ggplot2 * imports
  • graphics * imports
  • igraph * imports
  • kernlab >= 0.9 imports
  • limma >= 3.42.0 imports
  • parallel * imports
  • pheatmap >= 1.0.0 imports
  • stats >= 3.6.2 imports
  • utils >= 3.6.2 imports
  • viridis * imports
  • cowplot * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • statmod >= 1.2.2 suggests
  • testthat * suggests