SpotClean

R package for decontaminating the spot swapping effect and recovering true expression in spatial transcriptomics data

https://github.com/zijianni/spotclean

Science Score: 46.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: nature.com
  • Committers with academic emails
    2 of 5 committers (40.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (7.4%) to scientific vocabulary

Keywords

rna-seq spatial-transcriptomics
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: zijianni
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 4.01 MB
Statistics
  • Stars: 32
  • Watchers: 2
  • Forks: 10
  • Open Issues: 11
  • Releases: 1
Topics
rna-seq spatial-transcriptomics
Created almost 5 years ago · Last pushed over 1 year ago
Metadata Files
Readme

README.md

SpotClean: a computational method to adjust for spot swapping in spatial transcriptomics data

SpotClean is a computational method to adjust for spot swapping in spatial transcriptomics data. Recent spatial transcriptomics experiments utilize slides containing thousands of spots with spot-specific barcodes that bind mRNA. Ideally, unique molecular identifiers at a spot measure spot-specific expression, but this is often not the case due to bleed from nearby spots, an artifact we refer to as spot swapping. SpotClean estimates the contamination rate in observed data and decontaminates the spot swapping effect, thereby increasing the sensitivity and precision of downstream analyses.

Introduction

Spatial transcriptomics (ST), named Method of the Year 2020 by Nature Methods, is a powerful and widely used experimental method for profiling genome-wide gene expression across a tissue. In a typical ST experiment, fresh-frozen (or FFPE) tissue is sectioned and placed onto a slide containing spots, with each spot containing millions of capture oligonucleotides with spatial barcodes unique to that spot. The tissue is imaged, typically via Hematoxylin and Eosin (H&E) staining. Following imaging, the tissue is permeabilized to release mRNA, which then binds to the capture oligonucleotides, generating a cDNA library consisting of transcripts bound by barcodes that preserve spatial information. Data from an ST experiment consist of the tissue image coupled with RNA-sequencing data collected from each spot. A first step in processing ST data is tissue detection, where spots on the slide containing tissue are distinguished from background spots without tissue. Unique molecular identifier (UMI) counts at each spot containing tissue are then used in downstream analyses.

Ideally, a gene-specific UMI at a given spot would represent expression of that gene at that spot, and spots without tissue would show no (or few) UMIs. This is not the case in practice. Messenger RNA bleed from nearby spots causes substantial contamination of UMI counts, an artifact we refer to as spot swapping. On average, we observe that more than 30% of UMIs at a tissue spot did not originate from this spot, but from other spots contaminating it. Spot swapping confounds downstream inferences including normalization, marker gene-based annotation, differential expression and cell type decomposition.
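To make the notion of a per-spot contamination rate concrete, the following toy sketch in base R simulates bleed along a single row of spots. It is an illustration only and not SpotClean's statistical model: the 70% stay rate, the Gaussian bandwidth, and all variable names are assumptions chosen for the example.

```{r}
# Toy 1-D slide: the first 12 of 20 spots are covered by tissue.
n_spots   <- 20
in_tissue <- seq_len(n_spots) <= 12
true_umi  <- ifelse(in_tissue, 1000, 0)   # background spots have no true expression

# Assumed bleed pattern: each spot keeps 70% of its UMIs and spreads the
# remaining 30% over all spots (itself included) with Gaussian weights.
stay_rate <- 0.7
bandwidth <- 2
dist_mat  <- abs(outer(seq_len(n_spots), seq_len(n_spots), "-"))
kernel    <- exp(-dist_mat^2 / (2 * bandwidth^2))
kernel    <- kernel / rowSums(kernel)     # row i: shares sent from spot i to every spot

# contribution[i, j] = UMIs that originated at spot i and are observed at spot j
contribution <- true_umi * (stay_rate * diag(n_spots) + (1 - stay_rate) * kernel)
observed_umi <- colSums(contribution)

# Contamination rate: share of a spot's observed UMIs that came from other spots
own_umi <- diag(contribution)
contamination_rate <- ifelse(observed_umi > 0, 1 - own_umi / observed_umi, NA_real_)

round(contamination_rate[in_tissue], 2)   # tissue spots: strictly between 0 and 1
round(observed_umi[!in_tissue], 1)        # background spots pick up UMIs from tissue
```

In this toy setting the total UMI count is conserved, background spots near the tissue boundary show nonzero counts, and every UMI observed at a background spot is contamination.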

We developed SpotClean to adjust for the effects of spot swapping in ST experiments. SpotClean measures the per-spot contamination rates in observed data and decontaminates gene expression levels, thereby increasing the sensitivity and precision of downstream analyses. Our package SpotClean is built around 10x Visium spatial transcriptomics experiments, currently the most widely used commercial protocol. It provides functions to load raw spatial transcriptomics data from 10x Space Ranger outputs, decontaminate the spot swapping effect, estimate contamination levels, visualize expression profiles and spot labels on the slide, and connect with other widely used packages for further analyses. SpotClean can potentially be extended to other spatial transcriptomics data as long as the gene expression data in both tissue and background regions are available.
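The workflow described above can be sketched roughly as follows, using the example mouse brain data bundled with the package. This is a minimal sketch written from memory of the package vignette, so treat the argument values (such as `maxit = 10`) and the metadata field name as assumptions and check `vignette("SpotClean")` for the authoritative version.

```{r}
# Only run the sketch if SpotClean is installed.
spotclean_available <- requireNamespace("SpotClean", quietly = TRUE)

if (spotclean_available) {
    library(SpotClean)

    # Bundled example data: raw count matrix (tissue and background spots)
    # and the matching slide information.
    data(mbrain_raw)
    data(mbrain_slide_info)

    # Combine counts and slide information into a slide object.
    slide_obj <- createSlide(mbrain_raw, mbrain_slide_info)

    # Decontaminate; maxit = 10 is an arbitrary choice to keep the run short.
    decont_obj <- spotclean(slide_obj, maxit = 10)

    # Estimated per-spot contamination rates (field name assumed from the vignette).
    summary(S4Vectors::metadata(decont_obj)$contamination_rate)
}
```

For real experiments, the raw counts and slide information would instead be read from the 10x Space Ranger output directory using the package's loading functions.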

Installation

Install the GitHub version:

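```{r}
# Install devtools if it is not already available
if (!requireNamespace("devtools", quietly = TRUE))
    install.packages("devtools")

# Install SpotClean from GitHub, building the manual and vignettes
devtools::install_github("zijianni/SpotClean", build_manual = TRUE, build_vignettes = TRUE)
```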

Install the Bioconductor version:

```{r}
# Install BiocManager if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("SpotClean")
```

Load the package after installation:

```{r}
library(SpotClean)
```

Tutorial

After installing the package, access the vignette by running:

```{r}
vignette("SpotClean")
```

Citation

If you use SpotClean, we would appreciate it if you cited our work:

Ni, Z., Prasad, A., Chen, S. et al. SpotClean adjusts for spot swapping in spatial transcriptomics data. Nat Commun 13, 2971 (2022). https://doi.org/10.1038/s41467-022-30587-y

A BibTeX entry for LaTeX users can be found by running:

```{r}
citation("SpotClean")
```

Owner

  • Name: Zijian Ni
  • Login: zijianni
  • Kind: user
  • Location: Seattle, WA
  • Company: Amazon

Applied Scientist @ Amazon Prime ML

GitHub Events

Total
  • Issues event: 5
  • Watch event: 10
  • Issue comment event: 2
Last Year
  • Issues event: 5
  • Watch event: 10
  • Issue comment event: 2

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 104
  • Total Committers: 5
  • Avg Commits per committer: 20.8
  • Development Distribution Score (DDS): 0.452
Past Year
  • Commits: 5
  • Committers: 2
  • Avg Commits per committer: 2.5
  • Development Distribution Score (DDS): 0.4
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| Zijian Ni | 4****0@q****m | 57 |
| Zijian Ni | z****5@w****u | 43 |
| J Wokaty | j****y@s****u | 2 |
| Hannah A Pliner | h****r@g****m | 1 |
| Zijian Ni | 4****i | 1 |

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 7,824 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 7
  • Total maintainers: 1
bioconductor.org: SpotClean

SpotClean adjusts for spot swapping in spatial transcriptomics data

  • Versions: 7
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 7,824 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Forks count: 8.4%
Stargazers count: 12.6%
Average: 22.2%
Downloads: 89.9%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 4.2.0 depends
  • Matrix * imports
  • RColorBrewer * imports
  • S4Vectors * imports
  • Seurat * imports
  • SpatialExperiment * imports
  • SummarizedExperiment * imports
  • dplyr * imports
  • ggplot2 * imports
  • grDevices * imports
  • grid * imports
  • methods * imports
  • readbitmap * imports
  • rhdf5 * imports
  • rjson * imports
  • rlang * imports
  • stats * imports
  • tibble * imports
  • utils * imports
  • viridis * imports
  • BiocStyle * suggests
  • R.utils * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • spelling * suggests
  • testthat >= 2.1.0 suggests