clustifyr

Infer cell types in scRNA-seq data using bulk RNA-seq or gene sets

https://github.com/rnabioco/clustifyr

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    5 of 17 committers (29.4%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.0%) to scientific vocabulary

Keywords

assign-identities clusters marker-genes rna-seq single-cell-rna-seq

Keywords from Contributors

bioconductor-packages gene immune-repertoire ontologies gene-symbols entrez ensembl-ids edger differential-expression bulk-transcriptional-analyses
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 124
  • Watchers: 9
  • Forks: 15
  • Open Issues: 0
  • Releases: 3
Created over 7 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct

README.Rmd

---
output: github_document
---

```{r, echo = FALSE, message = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "man/figures/",
    dpi = 300
)
```

```{r, echo=FALSE, message=FALSE}
st <- data.table::fread("https://bioconductor.org/packages/stats/bioc/clustifyr/clustifyr_stats.tab", data.table = FALSE, verbose = FALSE)
st_all <- dplyr::filter(st, Month == "all")
cl <- as.numeric(data.table::fread("https://raw.githubusercontent.com/raysinensis/clone_counts_public/main/clustifyr_total.txt", verbose = FALSE))
```

# clustifyr 


[![R-CMD-check-bioc](https://github.com/rnabioco/clustifyr/actions/workflows/check-bioc.yml/badge.svg)](https://github.com/rnabioco/clustifyr/actions/workflows/check-bioc.yml)
[![Codecov test coverage](https://codecov.io/gh/rnabioco/clustifyr/branch/devel/graph/badge.svg)](https://app.codecov.io/gh/rnabioco/clustifyr?branch=devel)
[![platforms](https://bioconductor.org/shields/availability/release/clustifyr.svg)](https://bioconductor.org/packages/release/bioc/html/clustifyr.html)
[![bioc](https://bioconductor.org/shields/years-in-bioc/clustifyr.svg)](https://bioconductor.org/packages/release/bioc/html/clustifyr.html)
[![#downloads](`r paste0("https://img.shields.io/badge/%23%20downloads-", sum(st_all[[4]]) + cl, "-brightgreen")`)](https://bioconductor.org/packages/stats/bioc/clustifyr/clustifyr_stats.tab)


clustifyr classifies cells and clusters in single-cell RNA sequencing experiments using reference bulk RNA-seq data sets, sorted microarray expression data, single-cell gene signatures, or lists of marker genes. 

## Installation

Install the Bioconductor version with:

``` r
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("clustifyr")
```

Install the development version with:

``` r
BiocManager::install("rnabioco/clustifyr")
```
 
## Example usage

In this example we use the following built-in input data:

- an expression matrix of single cell RNA-seq data (`pbmc_matrix_small`)
- a metadata data.frame (`pbmc_meta`), with cluster assignments stored in the `"classified"` column
- a vector of variable genes (`pbmc_vargenes`)
- a matrix of mean normalized scRNA-seq UMI counts by cell type (`cbmc_ref`)

We then calculate correlation coefficients and plot them on a pre-calculated projection (stored in `pbmc_meta`).

```{r readme_example, warning=FALSE, message=FALSE}
library(clustifyr)

# calculate correlation
res <- clustify(
    input = pbmc_matrix_small,
    metadata = pbmc_meta$classified,
    ref_mat = cbmc_ref,
    query_genes = pbmc_vargenes
)

# print assignments
cor_to_call(res)

# plot assignments on a projection
plot_best_call(
    cor_mat = res,
    metadata = pbmc_meta,
    cluster_col = "classified"
)
```
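
Conceptually, correlation-based classification can be sketched in base R without the package. The sketch below is an illustration only, not clustifyr's implementation: the toy matrices, the variable names, and the choice of Spearman correlation on per-cluster averages are assumptions made for the example.

```r
# Toy data (hypothetical values): 4 genes x 6 cells, two clusters
genes <- paste0("gene", 1:4)
expr <- matrix(
    c(9, 8, 1, 0, # cell1
      8, 9, 0, 1, # cell2
      9, 7, 1, 1, # cell3
      0, 1, 8, 9, # cell4
      1, 0, 9, 8, # cell5
      1, 1, 7, 9), # cell6
    nrow = 4,
    dimnames = list(genes, paste0("cell", 1:6))
)
clusters <- c("A", "A", "A", "B", "B", "B")

# Toy reference (hypothetical values): 4 genes x 2 cell types
ref <- matrix(
    c(10, 9, 1, 0,
      0, 1, 9, 10),
    nrow = 4,
    dimnames = list(genes, c("T cell", "B cell"))
)

# 1. Average expression of each gene within each cluster
cluster_avg <- sapply(
    split(seq_along(clusters), clusters),
    function(idx) rowMeans(expr[, idx, drop = FALSE])
)

# 2. Correlate cluster averages with reference profiles over shared genes
shared <- intersect(rownames(cluster_avg), rownames(ref))
cor_mat <- cor(
    cluster_avg[shared, , drop = FALSE],
    ref[shared, , drop = FALSE],
    method = "spearman"
)

# 3. Call each cluster as the reference type with the highest correlation
calls <- colnames(cor_mat)[apply(cor_mat, 1, which.max)]
names(calls) <- rownames(cor_mat)

print(round(cor_mat, 2))
print(calls)
```

The result has one row per cluster and one column per reference cell type, which is the same layout that `cor_to_call()` and `plot_best_call()` consume in the example above.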

`clustify()` can also take a clustered `SingleCellExperiment` or `Seurat` object (Seurat v2 through v5) and assign identities.

```{r example_seurat, warning=FALSE, message=FALSE}
# for SingleCellExperiment
sce_small <- sce_pbmc()
clustify(
    input = sce_small, # an SCE object
    ref_mat = cbmc_ref, # matrix of RNA-seq expression data for each cell type
    cluster_col = "cell_type", # name of column in colData containing cell clusters
    obj_out = TRUE # output SCE object with cell type inserted as "type" column
)

# for Seurat
library(Seurat)
s_small <- so_pbmc()
clustify(
    input = s_small,
    cluster_col = "RNA_snn_res.0.5",
    ref_mat = cbmc_ref,
    seurat_out = TRUE
)

# New output option: return calls directly as a vector (in the order of the metadata),
# which can then be inserted into metadata data frames and other workflows
clustify(
    input = s_small,
    cluster_col = "RNA_snn_res.0.5",
    ref_mat = cbmc_ref,
    vec_out = TRUE
)[1:10]
```
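
As a rough illustration of how a per-cell vector lines up with metadata, the base R sketch below expands cluster-level calls to one label per cell and stores them in a data frame. The cluster labels, cell type names, and the `toy_meta` data frame are hypothetical; with clustifyr itself, the vector returned with `vec_out = TRUE` would take the place of `type_vec`.

```r
# Hypothetical cluster-level calls: names are cluster ids, values are cell types
cluster_calls <- c("0" = "CD4 T", "1" = "B", "2" = "Mk")

# Hypothetical per-cell metadata, one row per cell
toy_meta <- data.frame(
    cell = paste0("cell", 1:5),
    cluster = c("0", "1", "0", "2", "1"),
    stringsAsFactors = FALSE
)

# Look up each cell's cluster to get one label per cell, in metadata row order
type_vec <- unname(cluster_calls[as.character(toy_meta$cluster)])

# Because the vector follows the row order, it can be assigned as a new column
toy_meta$type <- type_vec
print(toy_meta)
```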

A new reference matrix can also be built directly from `SingleCellExperiment` and `Seurat` objects. Other scRNA-seq experiment object types are supported too.

```{r example_ref_matrix}
# make reference from SingleCellExperiment objects
sce_small <- sce_pbmc()
sce_ref <- object_ref(
    input = sce_small, # SCE object
    cluster_col = "cell_type" # name of column in colData containing cell identities
)

# make reference from Seurat objects
s_small <- so_pbmc()
s_ref <- seurat_ref(
    seurat_object = s_small,
    cluster_col = "RNA_snn_res.0.5"
)

head(s_ref)
```
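
A reference matrix of this kind holds one row per gene and one column per cell identity. The base R sketch below builds one by averaging expression across the cells of each identity, mirroring the description of `cbmc_ref` as mean normalized counts by cell type. The toy values and names are hypothetical, and the package functions may normalize or aggregate differently.

```r
# Toy normalized expression (hypothetical values): 3 genes x 4 cells
toy_expr <- matrix(
    c(2, 0, 4, # cell1
      4, 2, 6, # cell2
      0, 6, 2, # cell3
      2, 8, 0), # cell4
    nrow = 3,
    dimnames = list(c("geneA", "geneB", "geneC"), paste0("cell", 1:4))
)

# Hypothetical identity of each cell, in column order
cell_type <- c("T", "T", "B", "B")

# Average each gene across the cells of each identity
toy_ref <- vapply(
    unique(cell_type),
    function(ct) rowMeans(toy_expr[, cell_type == ct, drop = FALSE]),
    numeric(nrow(toy_expr))
)

print(toy_ref)
```

A matrix shaped like `toy_ref` (genes in rows, identities in columns) is the layout passed as `ref_mat` in the examples above.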

`clustify_lists()` assigns identities to a matrix, `SingleCellExperiment`, or `Seurat` object based on marker gene lists.
 
```{r example_seurat3, warning=FALSE, message=FALSE}
clustify_lists(
    input = pbmc_matrix_small,
    metadata = pbmc_meta,
    cluster_col = "classified",
    marker = pbmc_markers,
    marker_inmatrix = FALSE
)

clustify_lists(
    input = s_small,
    marker = pbmc_markers,
    marker_inmatrix = FALSE,
    cluster_col = "RNA_snn_res.0.5",
    seurat_out = TRUE
)
```
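
To give a feel for list-based assignment, the base R sketch below scores each cluster against marker gene lists using Jaccard overlap between the cluster's top expressed genes and each list. This is an assumption-laden illustration rather than the method `clustify_lists()` uses internally: the toy averages, the `toy_markers` list, the `top_n` cutoff, and the `jaccard()` helper are all hypothetical.

```r
# Toy per-cluster average expression (hypothetical values): 5 genes x 2 clusters
toy_avg <- matrix(
    c(9, 8, 1, 0, 1, # cluster "0"
      0, 1, 9, 8, 1), # cluster "1"
    nrow = 5,
    dimnames = list(c("CD3E", "IL7R", "MS4A1", "CD79A", "ACTB"), c("0", "1"))
)

# Hypothetical marker gene lists, one character vector per cell type
toy_markers <- list(
    "T cell" = c("CD3E", "IL7R"),
    "B cell" = c("MS4A1", "CD79A")
)

# Take the top_n most highly expressed genes in each cluster
top_n <- 2
top_genes <- lapply(colnames(toy_avg), function(cl) {
    names(sort(toy_avg[, cl], decreasing = TRUE))[seq_len(top_n)]
})
names(top_genes) <- colnames(toy_avg)

# Jaccard index: size of the intersection divided by size of the union
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Score matrix: clusters in rows, marker lists in columns
score <- sapply(toy_markers, function(m) sapply(top_genes, jaccard, b = m))

# Assign each cluster to the marker list with the highest overlap
list_calls <- colnames(score)[apply(score, 1, which.max)]
names(list_calls) <- rownames(score)

print(score)
print(list_calls)
```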

## Additional resources

* [Script](https://github.com/rnabioco/clustifyrdata/blob/master/inst/run_clustifyr.R) for benchmarking, compatible with [`scRNAseq_Benchmark`](https://github.com/tabdelaal/scRNAseq_Benchmark)

* Additional reference data (including Tabula Muris, ImmGen, etc.) are available in a supplemental package, [`clustifyrdatahub`](https://github.com/rnabioco/clustifyrdatahub). Also see the [list](https://rnabioco.github.io/clustifyrdata/articles/download_refs.html) of references available for individual download.

* See the [FAQ](https://github.com/rnabioco/clustifyr/wiki/Frequently-asked-questions) for more details.

Owner

  • Name: RNA Bioscience Initiative (RBI)
  • Login: rnabioco
  • Kind: organization
  • Email: jay.hesselberth@cuanschutz.edu
  • Location: University of Colorado School of Medicine

The RNA Bioscience Initiative Informatics Fellows program

GitHub Events

Total
  • Watch event: 11
  • Delete event: 1
  • Push event: 7
  • Pull request event: 2
  • Fork event: 1
  • Create event: 1
Last Year
  • Watch event: 11
  • Delete event: 1
  • Push event: 7
  • Pull request event: 2
  • Fork event: 1
  • Create event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 727
  • Total Committers: 17
  • Avg Commits per committer: 42.765
  • Development Distribution Score (DDS): 0.541
Past Year
  • Commits: 10
  • Committers: 4
  • Avg Commits per committer: 2.5
  • Development Distribution Score (DDS): 0.7
Top Committers
| Name | Email | Commits |
| --- | --- | --- |
| raysinensis | r****s@g****m | 334 |
| Kent Riemondy | k****y@g****m | 137 |
| Jay Hesselberth | j****h@g****m | 88 |
| agillen | a****n@g****m | 38 |
| Yue Hao | h****k@g****m | 37 |
| chti4479 | c****n@c****u | 29 |
| RF | rf@R****l | 17 |
| RF | rf@c****u | 16 |
| Nitesh Turaga | n****a@g****m | 11 |
| Michelle Daya | m****a@g****l | 5 |
| Ryan Sheridan | r****n@u****u | 4 |
| RF | rf@r****u | 2 |
| J Wokaty | j****y | 2 |
| J Wokaty | j****y@s****u | 2 |
| Ben Busby | D****s | 2 |
| Sidhant Puntambekar | 3****r | 2 |
| Hervé Pagès | h****b@g****m | 1 |

Issues and Pull Requests

Last synced: 7 months ago

All Time
  • Total issues: 40
  • Total pull requests: 71
  • Average time to close issues: 10 months
  • Average time to close pull requests: 3 days
  • Total issue authors: 8
  • Total pull request authors: 5
  • Average comments per issue: 0.95
  • Average comments per pull request: 0.03
  • Merged pull requests: 68
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 2
  • Average time to close issues: 2 days
  • Average time to close pull requests: 24 minutes
  • Issue authors: 1
  • Pull request authors: 2
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kriemo (15)
  • raysinensis (10)
  • jayhesselberth (9)
  • agillen (4)
  • mdaya (1)
  • pauldeboissier (1)
  • saphir746 (1)
  • saketkc (1)
  • tianchengzhe (1)
Pull Request Authors
  • raysinensis (51)
  • agillen (10)
  • kriemo (7)
  • jayhesselberth (5)
  • YueYvetteHao (1)
Top Labels
Issue Labels
enhancement (5)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 15,417 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 7
  • Total maintainers: 1
bioconductor.org: clustifyr

Classifier for Single-cell RNA-seq Using Cell Clusters

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 15,417 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 21.8%
Downloads: 65.5%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 4.0 depends
  • Matrix * imports
  • S4Vectors * imports
  • SingleCellExperiment * imports
  • SummarizedExperiment * imports
  • cowplot * imports
  • dplyr * imports
  • entropy * imports
  • fgsea * imports
  • ggplot2 * imports
  • httr * imports
  • matrixStats * imports
  • methods * imports
  • proxy * imports
  • readr * imports
  • rlang * imports
  • scales * imports
  • stats * imports
  • stringr * imports
  • tibble * imports
  • tidyr * imports
  • utils * imports
  • BiocManager * suggests
  • BiocStyle * suggests
  • ComplexHeatmap * suggests
  • Seurat * suggests
  • covr * suggests
  • ggrepel * suggests
  • gprofiler2 * suggests
  • knitr * suggests
  • purrr * suggests
  • remotes * suggests
  • rmarkdown * suggests
  • shiny * suggests
  • testthat * suggests
.github/workflows/R-CMD-check.yaml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/check-bioc.yaml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v3 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pkgdown.yaml actions
  • JamesIves/github-pages-deploy-action v4.4.1 composite
  • actions/checkout v3 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite
.github/workflows/pr-commands.yaml actions
  • actions/checkout v2 composite
  • r-lib/actions/pr-fetch master composite
  • r-lib/actions/pr-push master composite
  • r-lib/actions/setup-r master composite
.github/workflows/test-coverage.yaml actions
  • actions/checkout v3 composite
  • actions/upload-artifact v3 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite