cider

R package CIDER: Meta-Clustering for Single-Cell Data Integration and Evaluation

https://github.com/zhiyhu/cider

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 5 DOI reference(s) in README
✓ Academic publication links: Links to sciencedirect.com, zenodo.org
○ Committers with academic emails
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (17.3%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: zhiyhu
License: mit
Language: HTML
Default Branch: master
Homepage: https://zhiyuan-hu-lab.github.io/CIDER/
Size: 67 MB

Statistics

Stars: 6
Watchers: 1
Forks: 4
Open Issues: 0
Releases: 1

Archived

Created almost 6 years ago · Last pushed over 1 year ago

Metadata Files

Readme Changelog License Citation

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```

# CIDER: Meta-Clustering for Single-Cell Data Integration and Evaluation




[![DOI](https://zenodo.org/badge/296897483.svg)](https://zenodo.org/badge/latestdoi/296897483)

Clustering single-cell RNA-Seq (scRNA-Seq) data from multiple samples or conditions is often challenged by confounding factors, such as batch effects and biologically relevant variability. Existing batch effect removal methods typically rely on the strong assumption that the composition of cell populations is nearly identical across samples. Here we present **CIDER**, a **meta-clustering workflow** based on inter-group similarity measures. The prototype of this method was first applied in [Hu et al., Cancer Cell 2020](https://www.sciencedirect.com/science/article/pii/S1535610820300428).

**For more information, please see our [publication](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02561-2) in Genome Biology (2021).** The published version and new citation information will be available soon.


CIDER can:

1. address the **clustering** task for confounded scRNA-Seq data, or
1. assess the biological correctness of integration as **a test metric**, **without** requiring prior cellular annotations, or
1. compute the similarity among biological populations.









## Installation (archived versions)

Archived versions of CIDER are available on the [CRAN Archive](https://cran.r-project.org/src/contrib/Archive/CIDER/).

First download the package to a local folder, then install it from the file:

```r
## replace the "path_to_pkg" with the real path
install.packages("path_to_pkg/CIDER_0.99.1.tar.gz", repos = NULL, type = "source")
```
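
As an alternative to downloading by hand, the tarball can in principle be installed straight from its archive URL. The sketch below is a minimal illustration, not part of the package: `cran_archive_url` and `install_archived` are hypothetical helper names, and the full URL is an assumption built from the archive folder and file name given above. Because `repos = NULL` does not resolve dependencies, they would need to be installed beforehand (limma and edgeR are distributed through Bioconductor).

```r
# Hypothetical helper: build the CRAN Archive URL for a package version.
cran_archive_url <- function(pkg, version) {
  sprintf("https://cran.r-project.org/src/contrib/Archive/%s/%s_%s.tar.gz",
          pkg, pkg, version)
}

# Hypothetical helper: install an archived source tarball from its URL.
# repos = NULL means dependencies are NOT installed automatically.
install_archived <- function(pkg, version) {
  install.packages(cran_archive_url(pkg, version), repos = NULL, type = "source")
}

cider_url <- cran_archive_url("CIDER", "0.99.1")
print(cider_url)
# install_archived("CIDER", "0.99.1")  # uncomment to run the installation
```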

We are working on compatibility issues to bring the package back to CRAN.

## Compatibility

**Seurat**: the package currently supports only Seurat versions before v5. We are working to resolve the issue.
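
A quick way to check this before running CIDER is to compare the installed Seurat version against 5.0.0. The snippet below is only a sketch: `seurat_is_supported` is a hypothetical helper, and treating every version below 5.0.0 as supported is an assumption (the package metadata lists Seurat >= 3.1.0 as the lower bound).

```r
# Hypothetical helper: TRUE if the given Seurat version predates v5.
seurat_is_supported <- function(version) {
  package_version(as.character(version)) < package_version("5.0.0")
}

# Check the installed Seurat, if any, without loading it,
# and warn when it is v5 or newer.
if (nzchar(system.file(package = "Seurat"))) {
  installed <- utils::packageVersion("Seurat")
  if (!seurat_is_supported(installed)) {
    warning("Seurat ", as.character(installed),
            " detected; CIDER currently supports versions before v5.")
  }
}
```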


## CIDER as an evaluation metric - Quick start

If you have scRNA-Seq data corrected by an integration algorithm (e.g. Seurat-CCA, Harmony, Scanorama), you can use CIDER to evaluate whether the biological populations are correctly aligned.

Before running CIDER evaluation functions, make sure that you have a Seurat object (e.g. `seu.integrated`) with corrected PCs in `seu.integrated@reductions$pca@cell.embeddings`. Seurat-CCA automatically puts the corrected PCs there. If another method is used, the corrected PCs can be added using `seu.integrated@reductions$pca@cell.embeddings <- corrected.PCs`.
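
When the corrected PCs come from an external tool, their rows may not be in the same cell order as the Seurat object. A small check before the assignment can catch this. The following is a minimal base-R sketch under stated assumptions: `align_corrected_pcs` is a hypothetical helper (not a CIDER or Seurat function), and it assumes the corrected PCs are a numeric matrix with cell names as row names.

```r
# Hypothetical helper: validate a corrected-PC matrix and reorder its rows
# to match a given vector of cell names.
align_corrected_pcs <- function(corrected_pcs, cell_names) {
  stopifnot(is.matrix(corrected_pcs), is.numeric(corrected_pcs))
  missing_cells <- setdiff(cell_names, rownames(corrected_pcs))
  if (length(missing_cells) > 0) {
    stop(length(missing_cells), " cell(s) have no corrected PCs, e.g. ",
         missing_cells[1])
  }
  corrected_pcs[cell_names, , drop = FALSE]
}

# Toy example: three cells, two PCs, rows stored in a different order.
toy_pcs <- matrix(1:6, nrow = 3,
                  dimnames = list(c("cell3", "cell1", "cell2"),
                                  c("PC_1", "PC_2")))
aligned <- align_corrected_pcs(toy_pcs, c("cell1", "cell2", "cell3"))
print(aligned)
```

With a real object, the cell names would typically come from `colnames(seu.integrated)`, and the aligned matrix would then be assigned to `seu.integrated@reductions$pca@cell.embeddings` as described above.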

```r
library(CIDER)
seu.integrated <- hdbscan.seurat(seu.integrated)
ider <- getIDEr(seu.integrated, verbose = FALSE)
seu.integrated <- estimateProb(seu.integrated, ider)
```

The evaluation scores (IDER-based similarity and empirical p values) can be visualised by the `scatterPlot` function. A [detailed tutorial of evaluation](https://zhiyhu.github.io/CIDER/articles/evaluation.html) is available.

```r
library(cowplot)  # provides plot_grid()
p1 <- scatterPlot(seu.integrated, "tsne", colour.by = "similarity")
p2 <- scatterPlot(seu.integrated, "tsne", colour.by = "pvalue")
plot_grid(p1, p2, ncol = 3)
```
![](man/figures/evaluation_scatterplot.png)


## Use CIDER for clustering tasks




### Quick start - asCIDER

Here `seu` is a Seurat object with the initial clustering annotation stored in `initial_cluster` of the metadata and batch information in `Batch`. The asCIDER example here contains two steps: computing the IDER-based similarity matrix (`getIDEr`) and performing the final clustering (`finalClustering`).

```r
ider <- getIDEr(seu, 
                group.by.var = "initial_cluster",
                batch.by.var = "Batch")
seu <- finalClustering(seu, ider, cutree.h = 0.45)
```
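
The `cutree.h` argument suggests that the final step cuts a hierarchical tree at a given height. To illustrate what such a cut does in general, here is a base-R sketch on a made-up similarity matrix. It is not CIDER's internal implementation: the group names and similarity values are invented, and converting similarity to dissimilarity as `1 - similarity` is an assumption made for the example.

```r
# Toy similarity matrix between four batch-specific groups (made-up values).
groups <- c("A_batch1", "A_batch2", "B_batch1", "B_batch2")
sim <- matrix(c(1.0, 0.9, 0.1, 0.1,
                0.9, 1.0, 0.1, 0.1,
                0.1, 0.1, 1.0, 0.8,
                0.1, 0.1, 0.8, 1.0),
              nrow = 4, byrow = TRUE, dimnames = list(groups, groups))

# Convert similarity to dissimilarity and build a hierarchical tree
# (hclust uses complete linkage by default).
hc <- hclust(as.dist(1 - sim))

# Cut the tree at height 0.45: groups merging below this height share a cluster.
clusters <- cutree(hc, h = 0.45)
print(clusters)
```

In this toy case the two `A` groups and the two `B` groups each merge well below 0.45, while the two pairs only join at 0.9, so the cut yields two final clusters. A lower height gives more, smaller clusters; a higher one merges more groups.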

A detailed tutorial of asCIDER is [here](https://zhiyhu.github.io/CIDER/articles/asCIDER.html). If your data do not have prior batch-specific clusters, see the dnCIDER tutorials ([high-level](https://zhiyhu.github.io/CIDER/articles/dnCIDER_highlevel.html) and [detailed walk-through](https://zhiyhu.github.io/CIDER/articles/dnCIDER.html)).


## Quick start - compute the similarity matrix within one batch

Here is the code used to compute the similarity matrix within one batch. 

```r
library(CIDER)
# Make sure the data have a column called "Batch" and assign the same value
# to all cells, for example:
seu$Batch <- "onebatch"
# Run getDistMat. The input needs to be list(seu); tmp.initial.clusters
# is the grouping variable for which the similarity matrix is computed.
# The output is the similarity matrix.
dist <- getDistMat(seu_list = list(seu), tmp.initial.clusters = "cell_type")
```

A more dedicated function is under construction.




## Bug reports and issues

Please use [Issues](https://github.com/zhiyhu/CIDER/issues) to report bugs or
seek help. Thank you!

## Citation

Z. Hu, A. A. Ahmed, C. Yau. CIDER: an interpretable meta-clustering framework for single-cell RNA-seq data integration and evaluation. *Genome Biology* 22, Article number: 337 (2021); [doi:10.1186/s13059-021-02561-2](https://doi.org/10.1186/s13059-021-02561-2)

Owner

Name: Zhiyuan Hu
Login: zhiyhu
Kind: user
Company: Wuhan University

Twitter: zhi_yuan_hu
Repositories: 19
Profile: https://github.com/zhiyhu

Professor/PI @zhiyuan-hu-lab at Wuhan University; collaborator @TSS-Lab at University of Oxford

Citation (CITATION.cff)

cff-version: 0.99.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Hu
    given-names: Zhiyuan
    orcid: https://orcid.org/0000-0000-0000-0000
  - family-names: Ahmed
    given-names: Ahmed
  - family-names: Yau
    given-names: Christopher
title: "zhiyhu/CIDER: Genome Biology Release"
version: GBIO
date-released: 2021-11-20

GitHub Events

Total

Issues event: 3
Issue comment event: 3
Push event: 1
Fork event: 1

Last Year

Issues event: 3
Issue comment event: 3
Push event: 1
Fork event: 1

Committers

Last synced: over 2 years ago

All Time

Total Commits: 84
Total Committers: 2
Avg Commits per committer: 42.0
Development Distribution Score (DDS): 0.095

Past Year

Commits: 0
Committers: 0
Avg Commits per committer: 0.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
zhiyhu	z**u@g**m	76
Zhiyuan Hu	1****u	8

Issues and Pull Requests

Last synced: 10 months ago

All Time

Total issues: 4
Total pull requests: 2
Average time to close issues: about 1 year
Average time to close pull requests: less than a minute
Total issue authors: 4
Total pull request authors: 1
Average comments per issue: 2.0
Average comments per pull request: 0.0
Merged pull requests: 2
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 0
Pull request authors: 0
Average comments per issue: 0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

YuchiZou (1)
eblchen (1)
levinhein (1)
Gesmira (1)
sjspielman (1)

Pull Request Authors

zhiyhu (2)

Top Labels

Issue Labels

Pull Request Labels

Dependencies

DESCRIPTION cran

R >= 3.5.0 depends
Seurat >= 3.1.0 imports
dbscan >= 1.1 imports
doParallel * imports
edgeR >= 3.28.0 imports
foreach >= 1.4.7 imports
ggplot2 * imports
graphics * imports
igraph * imports
kernlab >= 0.9 imports
limma >= 3.42.0 imports
parallel * imports
pheatmap >= 1.0.0 imports
stats >= 3.6.2 imports
utils >= 3.6.2 imports
viridis * imports
cowplot * suggests
knitr * suggests
rmarkdown * suggests
statmod >= 1.2.2 suggests
testthat * suggests
