https://github.com/cbib/deconvolista


Repository

Basic Info

Host: GitHub
Owner: cbib
Language: R
Default Branch: main
Size: 150 MB

Statistics

Stars: 0
Watchers: 2
Forks: 0
Open Issues: 0
Releases: 0

Created about 2 years ago · Last pushed over 1 year ago

Metadata Files

Readme

DeconvolisSTa: Deconvolution of Spatial Transcriptomics dAta

Introduction

This document describes the Snakemake version of Spotless, a spatial deconvolution pipeline. The pipeline was developed during my internship at CBIB. Deconvolution can be performed with one or more of the following methods: Cell2location, RCTD, NNLS, SpatialDWLS, DDLS, Seurat, MuSiC, and Dirichlet (random).

Required environment installation

The following tools are required to run the pipeline.

Snakemake

The pipeline is implemented in Snakemake, which can be installed by following the tutorial "Setting up Conda and Snakemake".

Singularity

Almost all of the available deconvolution methods are containerized as Docker images. Snakemake, however, runs containers through Singularity rather than Docker, so it converts each Docker image to a Singularity image ('.sif') before using it. To install Singularity, follow the tutorial in the official documentation.

System requirements

The pipeline was run on an Intel(R) Xeon(R) CPU E5-1607 0 @ 3.00GHz with 4 nodes and 40 GiB of RAM. The GPU was an NVIDIA Quadro P6000 24GB PCIe 3.0 with CUDA driver version 535.171.04 and CUDA 12.2.

With a single-cell file of 3.3 GiB, this machine could not run the pipeline because it ran out of memory. Consider reducing the size of the single-cell data file or splitting it into chunks, as we did for ours.

Here are some execution times for synthetic data.

| Method | 100 spots | 1000 spots | 10000 spots |
|-|-----------|------------|-------------|
| Cell2location | 2h 16m | 3h 55m | 3h 37m |
| RCTD | 7m | 41m | 1h |

Pipeline running

Here is how to run the pipeline.

Running parameters

The pipeline takes two input files: sc_input, the single-cell reference file, and sp_input, the spatial transcriptomics file. Both files should be in RDS format. The single-cell file should have an annotation column in its meta.data attribute; these annotations are used as the cell types for deconvolution. It is recommended to check that this column exists before running the pipeline. If the single-cell file is too big (> 3 GB), it is recommended to reduce its size by removing data that deconvolution does not need, for example by deleting unused annotation_level columns when there are several, or by splitting the file into subsamples with equal cell type proportions.
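The chunking idea can be sketched on plain cell indices. This is a minimal Python illustration, not the code used for our data: the pipeline's real inputs are RDS files handled in R, and the function name stratified_chunks is hypothetical. It deals the cells of each annotated type round-robin across the chunks, so every chunk keeps roughly the same cell type proportions.

```python
import random
from collections import defaultdict


def stratified_chunks(labels, n_chunks, seed=0):
    """Split cell indices into n_chunks groups with similar cell type proportions.

    `labels` holds one annotation per cell (the hypothetical content of the
    annotation column). Returns a list of n_chunks lists of cell indices.
    """
    rng = random.Random(seed)
    by_type = defaultdict(list)
    for idx, label in enumerate(labels):
        by_type[label].append(idx)

    chunks = [[] for _ in range(n_chunks)]
    for label in sorted(by_type):
        cells = by_type[label]
        rng.shuffle(cells)
        # Deal the cells of this type round-robin so each chunk gets its share.
        for position, idx in enumerate(cells):
            chunks[position % n_chunks].append(idx)
    return chunks


# Toy annotation column: 6 cells of type "A" and 3 cells of type "B".
labels = ["A"] * 6 + ["B"] * 3
chunks = stratified_chunks(labels, n_chunks=3)
for chunk in chunks:
    print(sorted(labels[i] for i in chunk))  # ['A', 'A', 'B'] for every chunk
```

When a cell type does not divide evenly, the first chunks receive one extra cell of that type, which is usually acceptable for a reference dataset.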

The spatial and single-cell files should use the same gene naming scheme. If they do not, the gene names of the two files must be mapped onto each other so that they are consistent.
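As an illustration of such a mapping, the Python sketch below renames single-cell gene identifiers through a lookup table and keeps the genes that also appear in the spatial data. The function name harmonize_gene_names, the lookup table, and the toy gene lists are assumptions made for this example; they do not describe how the pipeline's map_genes option is implemented.

```python
def harmonize_gene_names(sc_genes, sp_genes, mapping):
    """Rename single-cell genes through `mapping` and find genes shared with the spatial data.

    Returns (shared, dropped): the renamed genes present in sp_genes, and the
    single-cell genes that had no entry in the mapping.
    """
    renamed = [mapping[g] for g in sc_genes if g in mapping]
    dropped = [g for g in sc_genes if g not in mapping]
    sp_set = set(sp_genes)
    shared = [g for g in renamed if g in sp_set]
    return shared, dropped


# Hypothetical example: single-cell data uses Ensembl IDs, spatial data uses symbols.
mapping = {"ENSG00000141510": "TP53", "ENSG00000146648": "EGFR"}
sc_genes = ["ENSG00000141510", "ENSG00000146648", "UNMAPPED_ID"]
sp_genes = ["EGFR", "TP53", "GAPDH"]

shared, dropped = harmonize_gene_names(sc_genes, sp_genes, mapping)
print(shared)   # ['TP53', 'EGFR']
print(dropped)  # ['UNMAPPED_ID']
```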

Here's an example of the command line to run the pipeline:

```bash
snakemake -s main.smk -c8 --config \
    mode="run_dataset" methods=cell2location,rctd \
    sc_input="test_sc_data.rds" sp_input="test_sp_data.rds" \
    output="res" use_gpu="true" skip_metrics="true" \
    annot="subclass" map_genes="false" load_model="true" \
    model_path="mod" --use-singularity \
    --singularity-args '\--nv'
```
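When the same command has to be launched for many samples, it can help to assemble it from a dictionary. The Python sketch below is one possible way to do that; the helper build_snakemake_command is hypothetical and only reuses the flags and keys shown in the command above. It also rejects a methods value that contains whitespace. The values come out without the double quotes used above, which is equivalent once the shell has parsed the command.

```python
import shlex


def build_snakemake_command(config, cores, extra_flags=()):
    """Assemble a snakemake command line from key=value config entries."""
    methods = config.get("methods", "")
    if any(ch.isspace() for ch in methods):
        raise ValueError("methods must be comma-separated without spaces")
    args = ["snakemake", "-s", "main.smk", f"-c{cores}", "--config"]
    args += [f"{key}={value}" for key, value in config.items()]
    args += list(extra_flags)
    # shlex.join quotes only the arguments that need it (here the '\--nv' value).
    return shlex.join(args)


config = {
    "mode": "run_dataset",
    "methods": "cell2location,rctd",
    "sc_input": "test_sc_data.rds",
    "sp_input": "test_sp_data.rds",
    "output": "res",
    "use_gpu": "true",
    "skip_metrics": "true",
}
command = build_snakemake_command(
    config,
    cores=8,
    extra_flags=["--use-singularity", "--singularity-args", r"\--nv"],
)
print(command)
```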

1. The number of cores Snakemake may use is set with -c8. The pipeline parameters are passed as key=value pairs after --config, which Snakemake collects in a dictionary named config.
2. methods is the list of methods to run. Separate them with commas and no spaces.
3. output is the directory where the output files are written.
4. use_gpu enables running on a GPU. By default, the CPU is used.
5. If skip_metrics is set to true, the benchmarking metrics (for instance correlation, RMSE, accuracy, balanced accuracy, and sensitivity) are not computed. To compute them, the spatial data input sp_input must contain a data frame with the deconvolution ground truth in its $relative_spot_composition attribute. The default value of this parameter is false.
6. annot is the name of the annotation column in the single-cell input metadata. Check that it is a valid column name before running. In addition, this column should be a character vector, not a factor. The default value of this parameter is subclass.
7. map_genes should be set to true if the gene names in the single-cell and spatial data inputs do not use the same gene symbol format. The default value of this parameter is false.
8. --use-singularity and --singularity-args '\--nv' enable Singularity and give the containers access to the GPU.
9. When load_model is true, the pipeline skips the build stage of the cell2location model and loads the model from model_path instead. When several spatial samples share the same single-cell reference dataset, this makes it possible to build the cell2location model once and run the predictions for all spatial samples without rebuilding the model each time.
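To make two of the benchmarking metrics concrete, here is a small Python sketch that compares predicted cell type proportions for one spot with the ground truth. It uses the textbook definitions of RMSE and Pearson correlation; the pipeline's own evaluation code may aggregate over spots or cell types differently, and the proportions below are made up for illustration.

```python
import math


def rmse(truth, pred):
    """Root-mean-square error between two equally long proportion vectors."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(truth, pred)) / len(truth))


def pearson(truth, pred):
    """Pearson correlation coefficient between two equally long vectors."""
    n = len(truth)
    mean_t = sum(truth) / n
    mean_p = sum(pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(truth, pred))
    var_t = sum((t - mean_t) ** 2 for t in truth)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    return cov / math.sqrt(var_t * var_p)


# Made-up proportions of three cell types in a single spot.
truth = [0.5, 0.3, 0.2]
pred = [0.4, 0.4, 0.2]

spot_rmse = rmse(truth, pred)
spot_corr = pearson(truth, pred)
print(f"RMSE: {spot_rmse:.4f}, correlation: {spot_corr:.4f}")
```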

Synthetic data generation with Synthspot

The pipeline can also generate synthetic spatial data. Given a single-cell data input, Synthspot generates different synthetic spatial profiles. The generated dataset types include a variety of synthetic and real-world data configurations designed to capture different spatial and cell type distributions.

The listing below shows an example command line that generates an artificial dataset from a gold standard dataset provided by Spotless.

```bash
snakemake -s main.smk -c12 --config mode="generate_data" \
    sc_input="standards/reference/gold_standard_1.rds" \
    dataset_type="aud" reps=1 output="syntheticdatasm" \
    region_var="celltype_coarse" --use-singularity
```

1. sc_input is the single-cell input file used to generate synthetic data.
2. dataset_type is the dataset profile to generate. This parameter is one type or a comma-separated list of the types mapped in the listing below.
3. reps is the number of replicates generated for each selected dataset type. The output will be $N_{types} \times reps$ files indexed with the suffix {type}{rep}.rds.
4. output is the output directory for the generated files.
5. region_var is the column with regional metadata in sc_input@meta.data, if any (for "real" dataset types).
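The number of generated files follows directly from the type list and the replicate count. The Python sketch below enumerates the (type, replicate) pairs under two assumptions that the text does not confirm: replicates are numbered from 1, and "other_type" stands in for a second dataset type name. The helper expected_outputs is hypothetical.

```python
from itertools import product


def expected_outputs(dataset_types, reps):
    """List the (type, replicate) pairs produced for a comma-separated type list."""
    types = dataset_types.split(",")
    # Assumption: replicates are numbered starting at 1.
    return list(product(types, range(1, reps + 1)))


# "aud" comes from the example command; "other_type" is a placeholder name.
pairs = expected_outputs("aud,other_type", reps=3)
print(len(pairs))  # 6, i.e. 2 types x 3 replicates
print(pairs[0])    # ('aud', 1)
```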

Spatial Transcriptomics Visualizations

The pipeline includes an interactive visualization tool for deconvolution results. It is an independent part of the pipeline that displays the deconvolved proportions of each spot on top of the actual Visium spatial image, and the results of different deconvolution methods can be compared. It also displays the spots colored by cluster and shows the raw data. A demo of this tool is available.

```bash
snakemake -s main.smk -c8 --config mode="generate_vis" \
    sp_input="UKF243_T_ST_1_raw.rds" output="vis_output" \
    norm_weights_filepaths="props_rctd.tsv,props_cell2location.tsv" \
    st_coords_filepath="tissue_positions_list_243.csv" \
    data_clustered="seurat_metadata.csv" image_path="tissue_hires.png" \
    scale_factor='0.24414062' deconv_methods=rctd,cell2location
```

1. sp_input is the spatial data file used in deconvolution. It is used only to name the output HTML file.
2. output is the output directory.
3. norm_weights_filepaths is the list of deconvolution result files, one per deconvolution method. The filenames should be separated by commas with no spaces in between, and the files should be tab-separated values (TSV) files.
4. st_coords_filepath is a CSV file indexed by spot that gives each spot's pixel position in the Visium image, in the pxl_col_in_fullres and pxl_row_in_fullres columns. The file should have six columns, as shown in the example below. Data preprocessing adds the column names in_tissue, array_row, array_col, pxl_row_in_fullres, and pxl_col_in_fullres.

| Spot | in_tissue | array_row | array_col | pxl_row_in_fullres | pxl_col_in_fullres |
|------|-----------|-----------|-----------|---------------------|---------------------|
| ACGCCTGACACGCGCT-1 | 0 | 0 | 0 | 721 | 1375 |
| TACCGATCCAACACTT-1 | 0 | 1 | 1 | 796 | 1418 |
| ATTAAAGCGGACGAGC-1 | 0 | 0 | 2 | 721 | 1461 |

5. data_clustered is another CSV file, which associates every spot with its cluster through a column named BayesSpace.
6. image_path is the path to the Visium image of the sample.
7. scale_factor is the scaling factor used to match the pixel coordinates with the Visium image.
8. deconv_methods is the list of deconvolution methods used in the norm_weights_filepaths files.
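The role of scale_factor can be sketched in a few lines of Python. The example assumes the usual 10x Genomics convention, in which full-resolution pixel positions are multiplied by the scale factor to obtain positions on the downsized image, and it assumes the Space Ranger column names pxl_row_in_fullres and pxl_col_in_fullres; the visualization tool's internal handling may differ. The function name scale_spot_coordinates is hypothetical.

```python
import csv
import io


def scale_spot_coordinates(rows, scale_factor):
    """Map each spot to (x, y) image coordinates by scaling full-resolution pixels."""
    scaled = {}
    for row in rows:
        x = float(row["pxl_col_in_fullres"]) * scale_factor
        y = float(row["pxl_row_in_fullres"]) * scale_factor
        scaled[row["Spot"]] = (x, y)
    return scaled


# Two rows taken from the example table above, written as CSV text.
sample = """Spot,in_tissue,array_row,array_col,pxl_row_in_fullres,pxl_col_in_fullres
ACGCCTGACACGCGCT-1,0,0,0,721,1375
TACCGATCCAACACTT-1,0,1,1,796,1418
"""

rows = csv.DictReader(io.StringIO(sample))
coords = scale_spot_coordinates(rows, scale_factor=0.24414062)
for spot, (x, y) in coords.items():
    print(f"{spot}: x={x:.1f}, y={y:.1f}")
```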

Owner

Name: Centre de Bioinformatique de Bordeaux
Login: cbib
Kind: organization
Location: Université de Bordeaux (146, rue Léo Saignat 33076 Bordeaux cedex)

Website: https://www.cbib.u-bordeaux.fr/
Repositories: 15
Profile: https://github.com/cbib

GitHub Events

Total

Member event: 1
Push event: 2

Last Year

Member event: 1
Push event: 2

Dependencies

.github/workflows/test_generate_dataset.yml actions

actions/checkout v3 composite
actions/download-artifact v3 composite
actions/upload-artifact v3 composite

.github/workflows/test_run_dataset_R.yml actions

actions/checkout v3 composite
actions/download-artifact v3 composite
actions/upload-artifact v3 composite

.github/workflows/test_run_dataset_python.yml actions

actions/checkout v3 composite
actions/download-artifact v3 composite
actions/upload-artifact v3 composite

.github/workflows/test_run_dataset_spotlight.yml actions

actions/checkout v3 composite
actions/download-artifact v3 composite
actions/upload-artifact v3 composite

subworkflows/deconvolution/cell2location/Dockerfile docker

nvidia/cuda 10.2-cudnn7-devel-ubuntu18.04 build

subworkflows/deconvolution/destvi/Dockerfile docker

nvidia/cuda 10.2-base-ubuntu18.04 build

subworkflows/deconvolution/dstg/Dockerfile docker

csangara/seurat 4.1.0 build

subworkflows/deconvolution/music/Dockerfile docker

csangara/seurat 4.1.0 build

subworkflows/deconvolution/nnls/Dockerfile docker

csangara/seurat 4.1.0 build

subworkflows/deconvolution/rctd/Dockerfile docker

csangara/seurat 4.1.0 build

subworkflows/deconvolution/spatialdwls/Dockerfile docker

csangara/seurat 4.1.0 build

subworkflows/deconvolution/spotlight/Dockerfile docker

csangara/seurat 4.1.0 build

subworkflows/deconvolution/stereoscope/Dockerfile docker

nvidia/cuda 10.2-base-ubuntu18.04 build

subworkflows/deconvolution/stride/Dockerfile docker

ubuntu 18.04 build

subworkflows/deconvolution/tangram/Dockerfile docker

nvidia/cuda 10.2-base-ubuntu18.04 build

subworkflows/evaluation/Dockerfile docker

rocker/tidyverse 3.6.3 build

subworkflows/deconvolution/spatialdwls/requirements.txt pypi

leidenalg *
networkx <=2.8
pandas <=1.4.2
python-igraph <=0.9.10
python-louvain <=0.16
scikit-learn <=1.0.2

subworkflows/deconvolution/ddls/Dockerfile docker

nvidia/cuda 12.2.0-base-ubuntu22.04 build

subworkflows/deconvolution/stereoscope/environment.yml conda

anndata <=0.7.8
jupyter <=1.0.0
libgcc <=7.2.0
matplotlib <=3.5.1
numba <=0.54.1
numpy <=1.20.3
pandas <=1.3.5
pillow <=8.4.0
pip
python 3.7.*
scanpy <=1.7.2
scikit-learn <=1.0.1
scipy <=1.7.3
umap-learn <=0.5.2

subworkflows/deconvolution/tangram/environment.yml conda

jupyterlab 2.2.6.*
matplotlib-base 3.3.1.*
nb_conda 2.2.1.*
pip 20.2.2.*
python >=3.8.5
pytorch 1.4.0.*
scipy 1.5.2.*
seaborn 0.11.1.*

subworkflows/deconvolution/cell2location/environment.yml conda

bbknn >=1.3.9
cmake >=3.17.0
cython >=0.29.17
dill >=0.3.1
hyperopt >=0.2.4
ipykernel >=5.3.4
ipython >=7.15.0
jupyter >=1.0.0
leidenalg >=0.8.0
loompy >=2.0.16
louvain >=0.6.1
matplotlib >=3.2.1
mkl-service >=2.3.0
nose >=1.3.7
numpy >=1.18.5
pandas >=1.0.4
pip
pygpu >=0.7.6
pytest >=6.2.2
python 3.7.*
python-igraph >=0.8.2
request >=2.83.1
scanpy >=1.5.1
scipy >=1.4.1
seaborn >=0.10.1
sphinx <=4.3.0
