nnSVG

nnSVG: scalable method to identify spatially variable genes (SVGs) in spatially-resolved transcriptomics data

https://github.com/lmweber/nnsvg

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: wiley.com, nature.com
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.0%) to scientific vocabulary

Keywords from Contributors

bioconductor-package grna-sequence gene ontology sequencing genomics immune-repertoire proteomics dna-methylation epiallele
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: lmweber
  • License: mit
  • Language: R
  • Default Branch: devel
  • Homepage:
  • Size: 191 KB
Statistics
  • Stars: 21
  • Watchers: 4
  • Forks: 10
  • Open Issues: 4
  • Releases: 0
Created over 4 years ago · Last pushed 10 months ago
Metadata Files
Readme License

README.md

nnSVG

Overview

nnSVG is a method for scalable identification of spatially variable genes (SVGs) in spatially-resolved transcriptomics data.

The nnSVG method is based on nearest-neighbor Gaussian processes (Datta et al., 2016; Finley et al., 2019) and uses the BRISC algorithm (Saha and Datta, 2018) for model fitting and parameter estimation. nnSVG identifies and ranks SVGs with flexible length scales, either across a whole tissue slide or within spatial domains defined by covariates. The method scales linearly with the number of spatial locations and can be applied to datasets containing thousands or more spatial locations.

nnSVG is implemented as an R package within the Bioconductor framework, and is available from Bioconductor.

Our paper describing the method is available from Nature Communications.

Installation

The package can be installed from Bioconductor as follows, using R version 4.2 or above:

    install.packages("BiocManager")
    BiocManager::install("nnSVG")

Alternatively, the latest development version of the package can be installed from GitHub:

    remotes::install_github("lmweber/nnSVG")

If you install from GitHub, the following dependency packages may need to be installed manually from Bioconductor and CRAN (they are installed automatically if you install from Bioconductor instead):

    install.packages("BiocManager")
    BiocManager::install("SpatialExperiment")
    BiocManager::install("STexampleData")
    install.packages("BRISC")

Tutorial

A detailed tutorial is available in the package vignette from Bioconductor.

Input data format

In the examples below, we assume the input data are provided as a SpatialExperiment Bioconductor object. In this case, the outputs are stored in the rowData of the SpatialExperiment object.

Alternatively, the inputs can also be provided as a numeric matrix of normalized and transformed counts (e.g. log-transformed normalized counts, also known as logcounts) and a numeric matrix of spatial coordinates.
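As a rough illustration of the matrix-based input, the sketch below builds a small simulated logcounts matrix and a matching coordinate matrix. The object names (`logcounts_mat`, `coords`) and the simulated values are hypothetical. The call assumes that `nnSVG()` accepts the coordinates through an argument named `spatial_coords`; please check `?nnSVG` for the exact interface in your installed version. The call is skipped if the package is not installed.

```r
# Minimal sketch with simulated data (all object names are hypothetical)
set.seed(123)
n_genes <- 5
n_spots <- 200

# spatial coordinates: one row per spatial location, columns x and y
coords <- cbind(x = runif(n_spots), y = runif(n_spots))

# simulated logcounts: genes in rows, spatial locations in columns
logcounts_mat <- matrix(
  rnorm(n_genes * n_spots, mean = 2),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)), NULL)
)

# add a smooth spatial gradient to the first gene so it varies in space
logcounts_mat[1, ] <- logcounts_mat[1, ] + 3 * coords[, "x"]

# run nnSVG on the matrices (assumed argument name: spatial_coords)
if (requireNamespace("nnSVG", quietly = TRUE)) {
  res <- nnSVG::nnSVG(logcounts_mat, spatial_coords = coords, n_threads = 1)
  print(head(res))
}
```

The two matrices must describe the same spatial locations in the same order: the columns of the expression matrix correspond to the rows of the coordinate matrix.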

Example workflow

A short example workflow is shown below. This is a modified version of the full tutorial available in the package vignette from Bioconductor.

Load packages

    library(nnSVG)
    library(STexampleData)
    library(scran)
    library(ggplot2)

Load example dataset

```r
# load example dataset from STexampleData package
spe <- Visium_humanDLPFC()
dim(spe)
```

```r
## [1] 33538  4992
```

Preprocessing

```r
# keep spots over tissue
spe <- spe[, colData(spe)$in_tissue == 1]
dim(spe)
```

```r
## [1] 33538  3639
```

```r
# spot-level quality control: already performed on this example dataset
```

```r
# filter low-expressed and mitochondrial genes
# using function from nnSVG package with default filtering parameters
spe <- filter_genes(spe)
```

```r
## Gene filtering: removing mitochondrial genes
## removed 13 mitochondrial genes
## Gene filtering: retaining genes with at least 3 counts in at least 0.5% (n = 19) of spatial locations
## removed 30216 out of 33525 genes due to low expression
```
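To make the low-expression rule in the messages above concrete, here is a minimal base R sketch of the same kind of filter on a hand-built toy count matrix. This is an illustration of the rule as described in the message ("at least 3 counts in at least 0.5% of spatial locations"), not the package's own code. The variable names (`min_counts`, `pct_spots`, `min_spots`) are hypothetical, and rounding the spot threshold up is an assumption that happens to reproduce the reported n = 19 for 3,639 spots.

```r
# Toy count matrix: 3 genes x 1000 spots (values chosen by hand for illustration)
n_spots <- 1000
counts_toy <- rbind(
  geneA = c(rep(3, 5), rep(0, n_spots - 5)),  # reaches 3 counts in exactly 5 spots
  geneB = c(rep(3, 4), rep(0, n_spots - 4)),  # reaches 3 counts in only 4 spots
  geneC = rep(2, n_spots)                     # expressed everywhere, never reaches 3
)

min_counts <- 3    # count threshold per spot
pct_spots  <- 0.5  # percentage of spots that must reach the threshold

# minimum number of spots, rounded up (assumption)
min_spots <- ceiling(pct_spots * n_spots / 100)

# keep genes that reach min_counts in at least min_spots spots
keep <- rowSums(counts_toy >= min_counts) >= min_spots
counts_filtered <- counts_toy[keep, , drop = FALSE]
rownames(counts_filtered)

# the same rounding applied to the 3,639 spots in the example dataset
ceiling(pct_spots * 3639 / 100)
```

Only `geneA` passes in this toy example: `geneB` falls one spot short, and `geneC` never reaches the count threshold despite being detected in every spot.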

```r
# calculate logcounts (log-transformed normalized counts) using scran package
# using library size factors
spe <- computeLibraryFactors(spe)
spe <- logNormCounts(spe)
assayNames(spe)
```

```r
## [1] "counts"    "logcounts"
```

Subset data for this example

```r
# select small set of random genes and several known SVGs for faster runtime
# in this example workflow
set.seed(123)
ix_random <- sample(seq_len(nrow(spe)), 10)
known_genes <- c("MOBP", "PCP4", "SNAP25", "HBB", "IGKC", "NPY")
ix_known <- which(rowData(spe)$gene_name %in% known_genes)
ix <- c(ix_known, ix_random)

spe <- spe[ix, ]
dim(spe)
```

```r
## [1]   16 3639
```

Run nnSVG

```r
# set seed for reproducibility
# run nnSVG using a single thread for this example workflow
set.seed(123)
spe <- nnSVG(spe, n_threads = 1)

# show results
rowData(spe)
```

```r
## DataFrame with 16 rows and 17 columns
## [...]
```

Investigate results

The results are stored in the rowData of the SpatialExperiment object.

The main results of interest are:

  • LR_stat: likelihood ratio (LR) statistics used to rank SVGs
  • rank: rank of top SVGs according to LR statistics
  • pval: approximate p-values
  • padj: approximate p-values adjusted for multiple testing
  • prop_sv: effect size defined as proportion of spatial variance
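The relationships between these columns can be checked by hand. The sketch below recomputes `prop_sv`, `LR_stat`, `pval`, and `spcov` in base R from fitted values for three genes, copied from the example results in this README. The formulas are inferred from the column descriptions and from the printed values, so treat them as assumptions rather than the package's definitive implementation: proportion of spatial variance as `sigma.sq / (sigma.sq + tau.sq)`, the LR statistic as twice the log-likelihood difference between the spatial model and the linear model, a chi-squared reference distribution with 2 degrees of freedom, and `spcov` as `sqrt(sigma.sq) / mean`.

```r
# Fitted values for three genes, copied from the example results in this README
res <- data.frame(
  gene_name = c("MOBP", "MKRN1", "JTB"),
  sigma.sq  = c(1.38739383, 0.00632248, 0.07541566),
  tau.sq    = c(0.364188, 0.272432, 0.463623),
  mean      = c(0.805525, 0.295404, 0.654919),
  loglik    = c(-3663.60, -2831.51, -4036.28),
  loglik_lm = c(-5503.33, -2839.08, -4039.07)
)

# effect size: spatial variance as a proportion of total variance
res$prop_sv <- res$sigma.sq / (res$sigma.sq + res$tau.sq)

# likelihood ratio statistic: spatial model vs. non-spatial linear model
res$LR_stat <- -2 * (res$loglik_lm - res$loglik)

# approximate p-value (assumption: chi-squared with 2 degrees of freedom)
res$pval <- pchisq(res$LR_stat, df = 2, lower.tail = FALSE)

# spatial coefficient of variation (assumption: sqrt(sigma.sq) / mean)
res$spcov <- sqrt(res$sigma.sq) / res$mean

res[, c("gene_name", "prop_sv", "LR_stat", "pval", "spcov")]
```

Because the log-likelihoods are printed to two decimal places, the recomputed `LR_stat` and `pval` agree with the reported values only approximately; `prop_sv` and `spcov` match to the displayed precision.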

```r
# number of significant SVGs
table(rowData(spe)$padj <= 0.05)
```

```r
## 
## FALSE  TRUE 
##     7     9 
```

```r
# show results for top n SVGs
n <- 10
rowData(spe)[order(rowData(spe)$rank)[1:n], ]
```

```r
## DataFrame with 10 rows and 17 columns
##                         gene_id gene_name    feature_type   sigma.sq   tau.sq
## ENSG00000168314 ENSG00000168314      MOBP Gene Expression 1.38739383 0.364188
## ENSG00000132639 ENSG00000132639    SNAP25 Gene Expression 0.43003959 0.430106
## ENSG00000211592 ENSG00000211592      IGKC Gene Expression 0.56564845 0.455042
## ENSG00000244734 ENSG00000244734       HBB Gene Expression 0.32942113 0.353754
## ENSG00000183036 ENSG00000183036      PCP4 Gene Expression 0.23102220 0.452735
## ENSG00000122585 ENSG00000122585       NPY Gene Expression 0.28567359 0.280173
## ENSG00000129562 ENSG00000129562      DAD1 Gene Expression 0.02389607 0.464723
## ENSG00000114923 ENSG00000114923    SLC4A3 Gene Expression 0.01147170 0.237260
## ENSG00000133606 ENSG00000133606     MKRN1 Gene Expression 0.00632248 0.272432
## ENSG00000143543 ENSG00000143543       JTB Gene Expression 0.07541566 0.463623
##                        phi    loglik runtime     mean      var    spcov
## ENSG00000168314   1.102018  -3663.60   0.631 0.805525 1.205673 1.462248
## ENSG00000132639   3.033847  -3912.70   0.450 3.451926 0.857922 0.189973
## ENSG00000211592  20.107022  -4531.64   1.054 0.622937 1.007454 1.207340
## ENSG00000244734  27.814098  -4044.96   1.559 0.411262 0.697673 1.395587
## ENSG00000183036   8.272278  -4026.22   0.419 0.687961 0.684598 0.698656
## ENSG00000122585  71.653290  -3995.23   0.843 0.393975 0.567383 1.356646
## ENSG00000129562  10.141894  -3842.24   0.590 0.549318 0.489167 0.281410
## ENSG00000114923  12.765645  -2617.36   0.658 0.250768 0.248816 0.427112
## ENSG00000133606   0.082764  -2831.51   0.612 0.295404 0.278806 0.269171
## ENSG00000143543 119.721419  -4036.28   0.731 0.654919 0.539172 0.419318
##                   prop_sv loglik_lm    LR_stat rank        pval        padj
## ENSG00000168314 0.7920804  -5503.33 3679.46397    1 0.00000e+00 0.00000e+00
## ENSG00000132639 0.4999614  -4884.19 1942.98556    2 0.00000e+00 0.00000e+00
## ENSG00000211592 0.5541822  -5176.53 1289.77508    3 0.00000e+00 0.00000e+00
## ENSG00000244734 0.4821910  -4507.99  926.04573    4 0.00000e+00 0.00000e+00
## ENSG00000183036 0.3378716  -4473.57  894.68884    5 0.00000e+00 0.00000e+00
## ENSG00000122585 0.5048609  -4131.87  273.27818    6 0.00000e+00 0.00000e+00
## ENSG00000129562 0.0489053  -3861.98   39.49098    7 2.65854e-09 6.07667e-09
## ENSG00000114923 0.0461207  -2632.02   29.31376    8 4.31119e-07 8.62238e-07
## ENSG00000133606 0.0226812  -2839.08   15.15227    9 5.12539e-04 9.11181e-04
## ENSG00000143543 0.1399077  -4039.07    5.59664   10 6.09124e-02 9.74599e-02
```

Plot expression of top SVG

Plot expression of the top-ranked SVG.

```r
# plot spatial expression of top-ranked SVG
ix <- which(rowData(spe)$rank == 1)
ix_name <- rowData(spe)$gene_name[ix]
ix_name
```

```r
## [1] "MOBP"
```

```r
# combine spatial coordinates with raw counts of the top-ranked SVG
df <- as.data.frame(cbind(spatialCoords(spe), expr = counts(spe)[ix, ]))

ggplot(df, aes(x = pxl_col_in_fullres, y = pxl_row_in_fullres, color = expr)) + 
  geom_point(size = 0.8) + 
  coord_fixed() + 
  scale_y_reverse() + 
  scale_color_gradient(low = "gray90", high = "blue", 
                       trans = "sqrt", breaks = range(df$expr), 
                       name = "counts") + 
  ggtitle(ix_name) + 
  theme_bw() + 
  theme(plot.title = element_text(face = "italic"), 
        panel.grid = element_blank(), 
        axis.title = element_blank(), 
        axis.text = element_blank(), 
        axis.ticks = element_blank())
```

Spatial expression plot of top-ranked SVG

Citation

Our paper describing nnSVG is available from Nature Communications.

Owner

  • Name: Lukas Weber
  • Login: lmweber
  • Kind: user
  • Location: Boston, MA
  • Company: Boston University

Assistant Professor, Department of Biostatistics, Boston University

GitHub Events

Total
  • Issues event: 4
  • Watch event: 10
  • Issue comment event: 7
  • Push event: 10
  • Fork event: 2
  • Create event: 4
Last Year
  • Issues event: 4
  • Watch event: 10
  • Issue comment event: 7
  • Push event: 10
  • Fork event: 2
  • Create event: 4

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 268
  • Total Committers: 4
  • Avg Commits per committer: 67.0
  • Development Distribution Score (DDS): 0.022
Past Year
  • Commits: 64
  • Committers: 3
  • Avg Commits per committer: 21.333
  • Development Distribution Score (DDS): 0.063
Top Committers
Name Email Commits
Lukas Weber l****u@g****m 262
J Wokaty j****y@s****u 2
Nitesh Turaga n****a@g****m 2
J Wokaty j****y 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 23
  • Total pull requests: 1
  • Average time to close issues: 3 months
  • Average time to close pull requests: about 2 months
  • Total issue authors: 11
  • Total pull request authors: 1
  • Average comments per issue: 1.87
  • Average comments per pull request: 3.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 0
  • Average time to close issues: 14 days
  • Average time to close pull requests: N/A
  • Issue authors: 3
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • lmweber (13)
  • EmilyKate416 (1)
  • zhengxiaoUVic (1)
  • ywzhang071394 (1)
  • rocketeer1998 (1)
  • boyiguo1 (1)
  • haotian-zhuang (1)
  • ansonkn (1)
  • ruqianl (1)
  • const-ae (1)
  • davidecrs (1)
  • juexinwang (1)
Pull Request Authors
  • LylaAtta123 (2)

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 9,570 total
  • Total dependent packages: 2
  • Total dependent repositories: 0
  • Total versions: 9
  • Total maintainers: 1
bioconductor.org: nnSVG

Scalable identification of spatially variable genes in spatially-resolved transcriptomics data

  • Versions: 9
  • Dependent Packages: 2
  • Dependent Repositories: 0
  • Downloads: 9,570 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 29.9%
Downloads: 89.6%
Maintainers (1)
Last synced: 6 months ago

Dependencies

DESCRIPTION cran
  • R >= 4.2 depends
  • BRISC * imports
  • BiocParallel * imports
  • Matrix * imports
  • SingleCellExperiment * imports
  • SpatialExperiment * imports
  • SummarizedExperiment * imports
  • matrixStats * imports
  • methods * imports
  • stats * imports
  • BiocStyle * suggests
  • STexampleData * suggests
  • ggplot2 * suggests
  • knitr * suggests
  • rmarkdown * suggests
  • scran * suggests
  • testthat * suggests
.github/workflows/check-bioc.yml actions
  • JamesIves/github-pages-deploy-action releases/v4 composite
  • actions/cache v3 composite
  • actions/checkout v3 composite
  • actions/upload-artifact master composite
  • docker/build-push-action v1 composite
  • r-lib/actions/setup-pandoc v2 composite
  • r-lib/actions/setup-r v2 composite