Banksy

BANKSY: spatial clustering

https://github.com/prabhakarlab/banksy

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: springer.com, nature.com, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (16.1%) to scientific vocabulary

Keywords

clustering-algorithm single-cell-omics spatial-omics
Last synced: 6 months ago

Repository

Statistics
  • Stars: 119
  • Watchers: 5
  • Forks: 18
  • Open Issues: 8
  • Releases: 8
Created over 4 years ago · Last pushed 6 months ago

README.Rmd

---
output: github_document
---



```{r, include = FALSE}
knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>",
    fig.path = "man/figures/README-",
    out.width = "100%", 
    dpi = 70
)
```

## Overview

```{r, eval=T, include=F}
start.time <- Sys.time()
```

BANKSY is a method for clustering spatial omics data by augmenting the
features of each cell with both an average of the features of its spatial 
neighbors and neighborhood feature gradients. By incorporating 
neighborhood information for clustering, BANKSY is able to

- improve cell-type assignment in noisy data
- distinguish subtly different cell-types stratified by microenvironment
- identify spatial domains sharing the same microenvironment
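
The augmentation described above can be illustrated with a minimal base R
sketch on simulated data. This is a simplified illustration, not the package's
implementation: it uses an unweighted mean over the `k` nearest neighbors, 
omits the gradient term and any feature scaling, and assumes that a cell's own 
features and its neighborhood mean are weighted by `sqrt(1 - lambda)` and 
`sqrt(lambda)`. All variable names are illustrative.

```{r, eval=F}
set.seed(1)
n_cells <- 50
n_genes <- 4
k <- 5          # number of spatial neighbors (illustrative value)
lambda <- 0.2   # mixing weight between own and neighborhood features

# Simulated expression (genes x cells) and 2D cell locations
expr <- matrix(rpois(n_genes * n_cells, 5), nrow = n_genes)
locs <- matrix(runif(n_cells * 2), ncol = 2)

# Pairwise distances between cells; exclude each cell from its own neighbors
d <- as.matrix(dist(locs))
diag(d) <- Inf

# Unweighted mean expression over the k nearest spatial neighbors
# (assumption: the package uses a weighted mean instead)
nbr_mean <- sapply(seq_len(n_cells), function(i) {
    nn <- order(d[i, ])[seq_len(k)]
    rowMeans(expr[, nn, drop = FALSE])
})

# Stack own and neighborhood features into one augmented matrix
augmented <- rbind(sqrt(1 - lambda) * expr, sqrt(lambda) * nbr_mean)
dim(augmented)
```

With `lambda = 0` the neighborhood block is zeroed out, so any clustering on 
`augmented` depends only on each cell's own expression.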

BANKSY is applicable to a wide array of spatial technologies (e.g. 10x Visium, 
Slide-seq, MERFISH, CosMx, CODEX) and scales well to large datasets. For more 
details, check out:

- the [paper](https://www.nature.com/articles/s41588-024-01664-3),
- the [peer review file](https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-024-01664-3/MediaObjects/41588_2024_1664_MOESM3_ESM.pdf),
- a [tweetorial](https://x.com/shyam_lab/status/1762648072360792479?s=20) on BANKSY,
- a set of [vignettes](https://prabhakarlab.github.io/Banksy) showing basic 
  usage,
- usage compatibility with Seurat ([here](https://github.com/satijalab/seurat-wrappers/blob/master/docs/banksy.md) and [here](https://satijalab.org/seurat/articles/visiumhd_analysis_vignette#identifying-spatially-defined-tissue-domains)),
- a [Python version](https://github.com/prabhakarlab/Banksy_py) of this package,
- a [Zenodo archive](https://zenodo.org/records/10258795) containing scripts to 
  reproduce the analyses in the paper, and the corresponding
  [GitHub Pages](https://github.com/jleechung/banksy-zenodo) 
  (and [here](https://github.com/prabhakarlab/Banksy_py/tree/Banksy_manuscript) for analyses done in Python). 

## Installation

The *Banksy* package can be installed via Bioconductor. This currently requires
R `>= 4.4.0`. 

```{r, eval=F}
BiocManager::install('Banksy')
```
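
Before installing, it can help to confirm that the running R version meets the
requirement stated above and that *BiocManager* is available. The following is
a minimal sketch that only reports what it finds; the variable names `r_ok` 
and `has_biocmanager` are illustrative.

```{r, eval=F}
# Check the R version against the stated minimum
r_ok <- getRversion() >= "4.4.0"

# Check whether BiocManager is already installed
has_biocmanager <- requireNamespace("BiocManager", quietly = TRUE)

if (!r_ok) {
    message("R >= 4.4.0 is required; found ", as.character(getRversion()))
}
if (!has_biocmanager) {
    message('BiocManager not found; install it with install.packages("BiocManager")')
}
```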

To install directly from GitHub instead, use

```{r, eval=F}
remotes::install_github("prabhakarlab/Banksy")
```

To use the legacy version of *Banksy* utilising the `BanksyObject` class, use

```{r, eval=F}
remotes::install_github("prabhakarlab/Banksy@legacy")
```

*Banksy* is also interoperable with [*Seurat*](https://satijalab.org/seurat/) 
via [*SeuratWrappers*](https://github.com/satijalab/seurat-wrappers). 
Documentation on how to run BANKSY on Seurat objects can be found [here](https://github.com/satijalab/seurat-wrappers/blob/master/docs/banksy.md). 
For installation of *SeuratWrappers* with BANKSY version `>= 0.1.6`, run

```{r, eval=F}
remotes::install_github('satijalab/seurat-wrappers')
```

## Quick start

Load *BANKSY*. We'll also load *SpatialExperiment* and *SummarizedExperiment* 
for containing and manipulating the data, *scuttle* for normalization 
and quality control, and *scater*, *ggplot2* and *cowplot* for visualisation.

```{r, eval=T, warning=F, message=F}
library(Banksy)

library(SummarizedExperiment)
library(SpatialExperiment)
library(scuttle)

library(scater)
library(cowplot)
library(ggplot2)
```

Here, we'll run *BANKSY* on mouse hippocampus data. 

```{r, eval=T}
data(hippocampus)
gcm <- hippocampus$expression
locs <- as.matrix(hippocampus$locations)
```

Initialize a SpatialExperiment object and perform basic quality control and 
normalization. 

```{r, eval=T, message=F}
se <- SpatialExperiment(assays = list(counts = gcm), spatialCoords = locs)

# QC based on total counts
qcstats <- perCellQCMetrics(se)
thres <- quantile(qcstats$total, c(0.05, 0.98))
keep <- (qcstats$total > thres[1]) & (qcstats$total < thres[2])
se <- se[, keep]

# Normalization to mean library size
se <- computeLibraryFactors(se)
aname <- "normcounts"
assay(se, aname) <- normalizeCounts(se, log = FALSE)
```
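
To make the normalization step concrete, here is a small base R sketch of 
scaling counts to the mean library size on a toy matrix. It assumes that each 
cell's size factor is its library size divided by the mean library size, and
that normalized counts are the raw counts divided by that factor. The toy 
matrix and variable names are illustrative.

```{r, eval=F}
# Toy count matrix: 3 genes x 3 cells
counts <- matrix(c(10, 0, 5, 20, 4, 6, 2, 8, 30),
    nrow = 3,
    dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3))
)

# Library size per cell and size factors centered at 1
lib_size <- colSums(counts)
size_factor <- lib_size / mean(lib_size)

# Divide each cell (column) by its size factor
normalised <- sweep(counts, 2, size_factor, "/")

# Every cell now has the same total, equal to the mean library size
colSums(normalised)
```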

Compute the neighborhood matrices for *BANKSY*. Setting `compute_agf=TRUE` 
computes both the weighted neighborhood mean ($\mathcal{M}$) and the azimuthal 
Gabor filter ($\mathcal{G}$). The numbers of spatial neighbors used to compute 
$\mathcal{M}$ and $\mathcal{G}$ are `k_geom[1]=15` and `k_geom[2]=30`, 
respectively. We run *BANKSY* at `lambda=0`, corresponding to non-spatial 
clustering, and at `lambda=0.2`, corresponding to *BANKSY* for cell-typing.

> **An important note about choosing the `lambda` parameter for the older [Visium v1 / v2 55um datasets](https://doi.org/10.1038/s41593-020-00787-0) or the original [ST 100um technology](https://doi.org/10.1038/s41596-018-0045-2):**
>
> For most modern high resolution technologies like Xenium, Visium HD, StereoSeq, MERFISH, STARmap PLUS, SeqFISH+, SlideSeq v2, and CosMx (and others), we recommend the usual defaults for `lambda`: for cell typing, use `lambda = 0.2` (as shown below, or in [this vignette](https://prabhakarlab.github.io/Banksy/articles/parameter-selection.html)) and for [domain segmentation](https://prabhakarlab.github.io/Banksy/articles/domain-segment.html), use `lambda = 0.8`. These technologies are either imaging based, having true single-cell resolution (e.g., MERFISH), or are sequencing based, having barcoded spots on the scale of single-cells (e.g., [Visium HD](https://www.10xgenomics.com/products/visium-hd-spatial-gene-expression)). We find that the usual defaults work well at this measurement resolution. 
>
> However, for the older **Visium v1/v2** or **ST** technologies, with their much lower resolution spots (55um and 100um diameter, respectively), we find that `lambda = 0.2` seems to work best for domain segmentation. This could be because each spot already measures the average transcriptome of several cells in a neighbourhood. It seems that `lambda = 0.2` shares enough information between these neighbourhoods to lead to good domain segmentation performance. For example, in the [human DLPFC vignette](https://prabhakarlab.github.io/Banksy/articles/multi-sample.html), we use `lambda = 0.2` on a Visium v1/v2 dataset. Also note that in these lower resolution technologies, each spot can have multiple cells of different types, and as such _cell-typing_ is not defined for them. 


```{r, eval=T}
lambda <- c(0, 0.2)
k_geom <- c(15, 30)

se <- Banksy::computeBanksy(se, assay_name = aname, compute_agf = TRUE, k_geom = k_geom)
```
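
The `lambda` recommendations in the note above can be collected in a small 
lookup. The helper below is hypothetical and not part of *Banksy*; its name, 
arguments and category labels are assumptions made for illustration, and it 
only encodes the values stated in the note.

```{r, eval=F}
# Hypothetical helper: returns the lambda value recommended in the note above
suggest_lambda <- function(resolution = c("single_cell", "multi_cell_spot"),
                           task = c("cell_typing", "domain_segmentation")) {
    resolution <- match.arg(resolution)
    task <- match.arg(task)

    # Single-cell resolution technologies (e.g. MERFISH, Visium HD)
    if (resolution == "single_cell") {
        return(if (task == "cell_typing") 0.2 else 0.8)
    }

    # Lower resolution spots (Visium v1/v2, ST): cell typing is not defined
    if (task == "cell_typing") {
        warning("cell typing is not defined for multi-cell spots")
        return(NA_real_)
    }
    0.2
}

suggest_lambda("single_cell", "domain_segmentation")
suggest_lambda("multi_cell_spot", "domain_segmentation")
```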

Next, run PCA on the BANKSY matrix and perform clustering. Setting 
`use_agf=TRUE` uses both $\mathcal{M}$ and $\mathcal{G}$ to construct the 
BANKSY matrix.

```{r, eval=T}
set.seed(1000)
se <- Banksy::runBanksyPCA(se, use_agf = TRUE, lambda = lambda)
se <- Banksy::runBanksyUMAP(se, use_agf = TRUE, lambda = lambda)
se <- Banksy::clusterBanksy(se, use_agf = TRUE, lambda = lambda, resolution = 1.2)
```

Different clustering runs can be relabeled to minimise their differences with 
`connectClusters`:

```{r, eval=T}
se <- Banksy::connectClusters(se)
```
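
To illustrate what relabeling to minimise differences means, the following base
R sketch maps the labels of one clustering run onto those of a reference run 
by greedily pairing the clusters with the largest overlap. This is a 
simplified stand-in, not the algorithm used by `connectClusters`. The function 
`relabel_greedy` is hypothetical and assumes the query run has no more 
clusters than the reference run.

```{r, eval=F}
# Hypothetical helper: relabel `query` so that its labels agree with `ref`
relabel_greedy <- function(ref, query) {
    # Contingency table: rows are query labels, columns are reference labels
    tab <- unclass(table(query, ref))
    mapping <- character(0)

    while (nrow(tab) > 0 && ncol(tab) > 0) {
        # Pick the (query, ref) pair with the largest remaining overlap
        idx <- arrayInd(which.max(tab), dim(tab))
        i <- idx[1, 1]
        j <- idx[1, 2]
        mapping[rownames(tab)[i]] <- colnames(tab)[j]
        # Remove the matched pair and continue with the rest
        tab <- tab[-i, -j, drop = FALSE]
    }

    # Unmatched query labels (if any) become NA
    unname(mapping[as.character(query)])
}

ref <- c(1, 1, 1, 2, 2, 3, 3)
query <- c(3, 3, 3, 1, 1, 2, 2) # same partition, different label names
relabelled <- relabel_greedy(ref, query)
identical(relabelled, as.character(ref))
```

A greedy pairing can be suboptimal when overlaps are ambiguous; an optimal 
assignment method would avoid this at the cost of extra code.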

Visualise the clustering output for non-spatial clustering (`lambda=0`) and
BANKSY clustering (`lambda=0.2`).

```{r, eval=T, fig.height=5, fig.width=14}
cnames <- colnames(colData(se))
cnames <- cnames[grep("^clust", cnames)]
colData(se) <- cbind(colData(se), spatialCoords(se))

plot_nsp <- plotColData(se,
    x = "sdimx", y = "sdimy",
    point_size = 0.6, colour_by = cnames[1]
)
plot_bank <- plotColData(se,
    x = "sdimx", y = "sdimy",
    point_size = 0.6, colour_by = cnames[2]
)


plot_grid(plot_nsp + coord_equal(), plot_bank + coord_equal(), ncol = 2)
```

For clarity, we can visualise each of the clusters separately:

```{r, eval=T, fig.height=8, fig.width=18}
plot_grid(
    plot_nsp + facet_wrap(~colour_by),
    plot_bank + facet_wrap(~colour_by),
    ncol = 2
)
```

Visualise UMAPs of the non-spatial and BANKSY embeddings:

```{r, eval=T, fig.height=5, fig.width=14}
rdnames <- reducedDimNames(se)

# Escape the dot so that "lam0.2" is matched literally
umap_nsp <- plotReducedDim(se,
    dimred = grep("UMAP.*lam0$", rdnames, value = TRUE),
    colour_by = cnames[1]
)
umap_bank <- plotReducedDim(se,
    dimred = grep("UMAP.*lam0\\.2$", rdnames, value = TRUE),
    colour_by = cnames[2]
)
plot_grid(
    umap_nsp,
    umap_bank,
    ncol = 2
)
```

Runtime for analysis

```{r, eval=T, echo=FALSE}
Sys.time() - start.time
```

Session information

```{r, sess}
sessionInfo()
```

Owner

  • Login: prabhakarlab
  • Kind: user

GitHub Events

Total
  • Create event: 5
  • Release event: 2
  • Issues event: 17
  • Watch event: 44
  • Delete event: 3
  • Issue comment event: 28
  • Push event: 5
  • Pull request event: 1
  • Fork event: 7
Last Year
  • Create event: 5
  • Release event: 2
  • Issues event: 17
  • Watch event: 44
  • Delete event: 3
  • Issue comment event: 28
  • Push event: 5
  • Pull request event: 1
  • Fork event: 7

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 50
  • Total pull requests: 7
  • Average time to close issues: 16 days
  • Average time to close pull requests: 27 days
  • Total issue authors: 39
  • Total pull request authors: 2
  • Average comments per issue: 2.14
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 18
  • Pull requests: 2
  • Average time to close issues: 11 days
  • Average time to close pull requests: less than a minute
  • Issue authors: 17
  • Pull request authors: 1
  • Average comments per issue: 1.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • hzongyao (4)
  • alirezaah98 (3)
  • huoyuying (2)
  • cindyfang70 (2)
  • yuanjiao123 (2)
  • Alwash-317 (2)
  • retogerber (1)
  • TdzBAS (1)
  • cabecunas (1)
  • salasd (1)
  • rstagnit (1)
  • dedebiao (1)
  • WangGuixiangCoder (1)
  • williamsdrake (1)
  • luke-mcn (1)
Pull Request Authors
  • jleechung (5)
  • vipulsinghal02 (2)
Top Labels
Issue Labels
question (1)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 8,943 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 3
  • Total maintainers: 1
bioconductor.org: Banksy

Spatial transcriptomic clustering

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 8,943 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 31.3%
Average: 42.2%
Downloads: 95.4%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 3.5.0 depends
  • ComplexHeatmap * imports
  • RcppHungarian * imports
  • SummarizedExperiment * imports
  • circlize * imports
  • data.table * imports
  • dbscan * imports
  • ggalluvial * imports
  • ggplot2 * imports
  • grid * imports
  • gridExtra * imports
  • igraph * imports
  • irlba * imports
  • leidenAlg * imports
  • matrixStats * imports
  • mclust * imports
  • methods * imports
  • pals * imports
  • plyr * imports
  • reshape2 * imports
  • rlang * imports
  • stats * imports
  • utils * imports
  • uwot * imports
  • SingleCellExperiment * suggests
  • covr * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • scater * suggests
  • scran * suggests
  • testthat >= 3.0.0 suggests
.github/workflows/check-standard.yml actions
  • actions/checkout v3 composite
  • r-lib/actions/check-r-package v2 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite
  • r-lib/actions/setup-r-dependencies v2 composite