dsb

Normalize CITEseq Data

https://github.com/niaid/dsb

Keywords

cite-seq niaid-tsang-lab

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: niaid
License: other
Language: R
Default Branch: master
Homepage:
Size: 10.9 MB

Statistics

Stars: 66
Watchers: 8
Forks: 13
Open Issues: 6
Releases: 7

Topics

cite-seq niaid-tsang-lab

Created about 6 years ago · Last pushed 11 months ago

Metadata Files

Readme License

README.Rmd

---
output: github_document
---



[![CRAN status](https://www.r-pkg.org/badges/version/dsb)](https://CRAN.R-project.org/package=dsb)


#    dsb: Normalize and denoise antibody-derived-tag data from CITE-seq, ASAP-seq, TEA-seq and related assays.

```{r, include = FALSE}
library(here)
knitr::opts_chunk$set(
  #tidy = TRUE,
  #tidy.opts = list(width.cutoff = 95),
  warning = FALSE, 
  eval = FALSE
)
# root.dir is a package-level knitr option, so it is set through opts_knit
knitr::opts_knit$set(root.dir = here())
```

The dsb R package is available on [**CRAN: latest dsb release**](https://CRAN.R-project.org/package=dsb)  
To install in R use `install.packages('dsb')` 


[**Mulè, Martins, and Tsang, Nature Communications (2022)**](https://www.nature.com/articles/s41467-022-29356-8) describes our deconvolution of ADT noise sources and development of dsb. 
  

#### Vignettes:  
1. [**Using dsb with an end-to-end CITE-seq workflow**](https://CRAN.R-project.org/package=dsb/vignettes/end_to_end_workflow.html)  
2. [**Using dsb when empty droplets are not available**](https://CRAN.R-project.org/package=dsb/vignettes/no_empty_drops.html)  
3. [**Speed up dsb 10-fold: set fast.km = TRUE (great for large datasets with / without empty droplets)**](https://cran.r-project.org/web/packages/dsb/vignettes/fastkm.html) 

4. [**How the dsb method works**](https://CRAN.R-project.org/package=dsb/vignettes/understanding_dsb.html)  
5. [**Using the dsb method in Python**](https://muon.readthedocs.io/en/latest/omics/citeseq.html)  
6. [**Frequently asked questions**](https://CRAN.R-project.org/package=dsb/vignettes/additional_topics.html) 




See notes on [**upstream processing before dsb**](#otheraligners)  

[**Recent Publications**](#pubications) Check out recent publications that used dsb for ADT normalization.  

  
The functions in this package return standard R matrix objects that can be added to any data container, such as a `SingleCellExperiment` or `Seurat` object in R, or an `AnnData` object in Python. 
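
As a rough illustration of adding the returned matrix to a container, the sketch below attaches a simulated normalized matrix to a `Seurat` object as a separate assay. The simulated data, the assay name `"CITE"`, and all object names are assumptions made for this example; the Seurat step only runs if Seurat is installed.

```r
set.seed(3)
cell_ids <- paste0("cell", 1:20)

# stand-in for the matrix returned by dsb: proteins in rows, cells in columns
adt_norm <- matrix(
  rnorm(3 * 20), nrow = 3,
  dimnames = list(c("CD3-PROT", "CD4-PROT", "CD19-PROT"), cell_ids)
)

# simulated RNA counts for the same cells (genes in rows)
rna_counts <- matrix(
  rpois(30 * 20, lambda = 2), nrow = 30,
  dimnames = list(paste0("gene", 1:30), cell_ids)
)

if (requireNamespace("Seurat", quietly = TRUE)) {
  s <- Seurat::CreateSeuratObject(counts = rna_counts)
  # the dsb values are already normalized, so they go in the `data` slot
  s[["CITE"]] <- Seurat::CreateAssayObject(data = adt_norm)
}
```

The column names of the normalized matrix need to match the cell barcodes already present in the container.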

## Background and motivation   

[**Our paper**](https://www.nature.com/articles/s41467-022-29356-8) combined experiments and computational approaches to show that ADT protein data from CITE-seq and related assays are affected by substantial background noise. We observed that ADT reads from empty droplets (which often outnumber cell-containing droplets more than tenfold) closely match levels in unstained spike-in cells, and can therefore serve as a readout of protein-specific ambient noise. dsb also removes cell-to-cell technical variation by estimating a conservative per-cell adjustment factor that combines isotype control levels with the background level of each cell, which is derived from a per-cell mixture model. The 2.0 release of dsb adds faster compute times and functions for normalizing datasets without empty drops.  
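
The ambient-noise idea described above can be sketched in base R on simulated counts. This is a simplified illustration, not the package implementation: the pseudocount of 10, the simulated matrices, and all variable names are assumptions, and the per-cell denoising step (isotype controls plus the per-cell mixture model) is left out.

```r
set.seed(1)
n_prot <- 4; n_cells <- 50; n_empty <- 500
prot_names <- paste0("prot", seq_len(n_prot))

# simulated raw ADT counts: proteins in rows, droplets in columns
empty_mtx <- matrix(
  rpois(n_prot * n_empty, lambda = 5), nrow = n_prot,
  dimnames = list(prot_names, paste0("empty", seq_len(n_empty)))
)
cells_mtx <- matrix(
  rpois(n_prot * n_cells, lambda = 40), nrow = n_prot,
  dimnames = list(prot_names, paste0("cell", seq_len(n_cells)))
)

# log transform with a pseudocount (assumed value for this sketch)
pseudocount <- 10
empty_log <- log(empty_mtx + pseudocount)
cells_log <- log(cells_mtx + pseudocount)

# per-protein mean and standard deviation of the empty-droplet background
mu_empty <- rowMeans(empty_log)
sd_empty <- apply(empty_log, 1, sd)

# rescale each protein by its background: the vectors have one entry per row,
# so R recycles them down the columns and row i is scaled by entry i
ambient_corrected <- (cells_log - mu_empty) / sd_empty
```

After this step, each value is the number of background standard deviations above the mean ambient level for that protein.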


## Installation and quick overview   

The default method is carried out in a single step with a call to the `DSBNormalizeProtein()` function. The objects and arguments used below are:  

- `cells_citeseq_mtx` - a raw ADT count matrix from cell-containing droplets.  
- `empty_drop_citeseq_mtx` - a raw ADT count matrix from non-cell-containing empty / background droplets.  
- `denoise.counts = TRUE` - define and remove the 'technical component' of each cell's protein library.  
- `use.isotype.control = TRUE` - include isotype controls in the modeled dsb technical component.  

```{r, eval = FALSE}

# install.packages('dsb')
library(dsb)

isotype.names = c("MouseIgG1kappaisotype_PROT", "MouseIgG2akappaisotype_PROT", 
                  "Mouse IgG2bkIsotype_PROT", "RatIgG2bkIsotype_PROT")

adt_norm = DSBNormalizeProtein(
  cell_protein_matrix = cells_citeseq_mtx, 
  empty_drop_matrix = empty_drop_citeseq_mtx, 
  denoise.counts = TRUE, 
  use.isotype.control = TRUE, 
  isotype.control.name.vec = isotype.names, 
  fast.km = TRUE # optional
  )
```
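
Before calling the function, it can help to confirm that the two matrices and the isotype names line up. The helper below is a hypothetical convenience function written for this example (it is not part of dsb), and the toy matrices are made up.

```r
# Hypothetical helper: returns a character vector of problems (empty if none)
check_dsb_inputs <- function(cells, empty, isotypes) {
  problems <- character(0)
  if (!identical(rownames(cells), rownames(empty))) {
    problems <- c(problems,
                  "protein rownames differ between cells and empty droplets")
  }
  missing_iso <- setdiff(isotypes, rownames(cells))
  if (length(missing_iso) > 0) {
    problems <- c(problems,
                  paste("isotype controls not found:",
                        paste(missing_iso, collapse = ", ")))
  }
  shared <- intersect(colnames(cells), colnames(empty))
  if (length(shared) > 0) {
    problems <- c(problems,
                  paste(length(shared), "barcodes appear in both matrices"))
  }
  problems
}

# toy inputs: proteins in rows, barcodes in columns
prots <- c("CD3_PROT", "CD19_PROT", "MouseIgG1kappaisotype_PROT")
toy_cells <- matrix(1:6, nrow = 3, dimnames = list(prots, c("bc1", "bc2")))
toy_empty <- matrix(1:9, nrow = 3,
                    dimnames = list(prots, c("bc3", "bc4", "bc5")))

found <- check_dsb_inputs(
  toy_cells, toy_empty,
  isotypes = c("MouseIgG1kappaisotype_PROT", "RatIgG2bkIsotype_PROT")
)
print(found)
```

Isotype names must match the matrix rownames exactly, including any spaces, so a check like this catches typos early.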

## Datasets without empty drops  

Not all datasets have empty droplets available, for example those downloaded from online repositories where only processed data are included. We provide a method to approximate the background distribution of proteins based on data from cells alone. Please see the vignette [Normalizing ADTs if empty drops are not available](https://CRAN.R-project.org/package=dsb/vignettes/no_empty_drops.html) for more details.  

```{r, eval = FALSE}
adt_norm = ModelNegativeADTnorm(
  cell_protein_matrix = cells_citeseq_mtx, 
  denoise.counts = TRUE, 
  use.isotype.control = TRUE, 
  isotype.control.name.vec = isotype.names, 
  fast.km = TRUE # optional
  )
```
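
To give a feel for estimating background from cells alone, the sketch below finds a per-protein background level on simulated counts and subtracts it. It uses base R k-means with two centers as a simple stand-in for whatever model the package fits, and the pseudocount, simulated data, and variable names are all assumptions; see the vignette for the actual method.

```r
set.seed(42)
n_cells <- 200

# simulated raw counts: CD3 has a negative and a positive population,
# CD19 is uniformly low in this toy dataset
counts <- rbind(
  CD3_PROT  = c(rpois(120, lambda = 8), rpois(80, lambda = 300)),
  CD19_PROT = rpois(n_cells, lambda = 6)
)
colnames(counts) <- paste0("cell", seq_len(n_cells))

# log transform with a pseudocount (assumed value for this sketch)
log_counts <- log(counts + 10)

# per protein, split cells into two groups and take the lower group center
# as the estimated background level
background_mean <- apply(log_counts, 1, function(x) {
  km <- kmeans(x, centers = 2, nstart = 10)
  min(km$centers)
})

# center each protein on its estimated background (recycled by row)
centered <- log_counts - background_mean
```

With this centering, cells in the negative population of a protein sit near zero and positive cells sit well above it.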


## 10-fold faster compute time with dsb 2.0    
Setting `fast.km = TRUE` in either the `DSBNormalizeProtein` or `ModelNegativeADTnorm` function speeds up the computation roughly 10-fold, with minimal impact on the results relative to the default settings. See the new [vignette](https://cran.r-project.org/web/packages/dsb/vignettes/fastkm.html) on this topic. 



## What settings should I use?  
See the simple visual guide below. Please search the resolved issues on GitHub for answers to common questions, or open a new issue if your use case has not been addressed.  




### Upstream read alignment to generate raw ADT files prior to dsb   

Any alignment software can be used prior to normalization with dsb. To use the `DSBNormalizeProtein` function described in the manuscript, you need to define cells and empty droplets from the alignment output. Some example guides are below:  

#### Cell Ranger  

See the ["end to end" vignette](https://CRAN.R-project.org/package=dsb/vignettes/end_to_end_workflow.html) for information on defining cells and background droplets from the output files created by Cell Ranger, as in the schematic below.  
Please note that, *whether or not you use dsb*, defining cells with the `filtered_feature_bc_matrix` file from Cell Ranger requires setting the `--expect-cells` argument to roughly your estimated cell recovery per lane, based on how many cells you loaded. See [the note from 10X about this](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/algorithms/overview#cell_calling). The default value of 3000 is likely not suited to most modern experiments.  

```{bash, eval = FALSE}
# Cell Ranger alignment
# a space before each backslash keeps the arguments separate when lines are joined
cellranger count --id=sampleid \
  --transcriptome=transcriptome_path \
  --fastqs=fastq_path \
  --sample=mysample \
  --expect-cells=10000
```

See end to end vignette for detailed information on using Cell Ranger output.  
  

#### CITE-seq-Count  

Important: set the `-cells` argument in `CITE-seq-Count` to ~ 200000. This aligns the top 200000 barcodes per lane by ADT library size.  
[CITE-seq-count documentation](https://hoohm.github.io/CITE-seq-Count/Running-the-script/)
```{bash, eval = FALSE}
# CITE-seq-Count alignment
CITE-seq-Count -R1 TAGS_R1.fastq.gz  -R2 TAGS_R2.fastq.gz \
 -t TAG_LIST.csv -cbf X1 -cbl X2 -umif Y1 -umil Y2 \
  -cells 200000 -o OUTFOLDER
```


#### Alevin 

I recommend following the comprehensive tutorials by Tommy Tang for using Alevin, DropletUtils and dsb for CITE-seq normalization.  
[ADT alignment with Alevin](https://divingintogeneticsandgenomics.com/post/how-to-use-salmon-alevin-to-preprocess-cite-seq-data/)  
[DropletUtils and dsb from Alevin output](https://divingintogeneticsandgenomics.com/post/part-4-cite-seq-normalization-using-empty-droplets-with-the-dsb-package/)  
[Alevin documentation](https://salmon.readthedocs.io/en/latest/alevin.html)  


#### Kallisto bustools pseudoalignment   
I recommend checking out the tutorials and example code below to understand how to use kallisto bustools outputs with dsb.  
[kallisto bustools tutorial by Sarah Ennis](https://github.com/Sarah145/scRNA_pre_process)  
[dsb normalization using kallisto outputs by Terkild Brink Buus](https://github.com/Terkild/CITE-seq_optimization/blob/master/Demux_Preprocess_Downsample.md)   
[kallisto bustools documentation](https://www.kallistobus.tools/tutorials/kb_kite/python/kb_kite/)  

Example script  
```{bash, eval = FALSE}
kb count -i index_file -g gtf_file.t2g -x 10xv3 \
-t n_cores  -o output_dir \
input.R1.fastq.gz input.R2.fastq.gz
```

After alignment, define cells and background droplets empirically with protein- and mRNA-based thresholding, as outlined in the main tutorial.  
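
A minimal base R sketch of this kind of thresholding is shown below on simulated raw counts. The threshold values, the simulated data, and the object names are assumptions for illustration only; in practice the cutoffs should be chosen by inspecting the library size distributions of your own data.

```r
set.seed(7)
n_bg <- 300; n_cell <- 60
barcodes <- paste0("bc", seq_len(n_bg + n_cell))

# simulated raw counts (features x barcodes); the first n_bg barcodes mimic
# background droplets, the rest mimic cell-containing droplets
prot <- cbind(matrix(rpois(10 * n_bg, lambda = 10), nrow = 10),
              matrix(rpois(10 * n_cell, lambda = 400), nrow = 10))
rna  <- cbind(matrix(rpois(50 * n_bg, lambda = 0.5), nrow = 50),
              matrix(rpois(50 * n_cell, lambda = 40), nrow = 50))
rownames(prot) <- paste0("prot", 1:10)
rownames(rna) <- paste0("gene", 1:50)
colnames(prot) <- colnames(rna) <- barcodes

# per-barcode library sizes on the log10 scale
md <- data.frame(
  prot_size = log10(colSums(prot)),
  rna_size  = log10(colSums(rna)),
  row.names = barcodes
)

# illustrative cutoffs: background droplets carry some protein signal but
# little mRNA, while cells are high on both
is_background <- md$prot_size > 1.5 & md$prot_size < 3 & md$rna_size < 2.5
is_cell <- md$prot_size > 3 & md$rna_size > 2.5

empty_drop_citeseq_mtx <- prot[, is_background, drop = FALSE]
cells_citeseq_mtx <- prot[, is_cell, drop = FALSE]
```

The two resulting matrices have the same protein rows and non-overlapping barcodes, which is the shape of input that `DSBNormalizeProtein` takes.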

### Selected publications using dsb   
From other groups 

[Singhaviranon et al. *Nature Immunology* 2025](https://doi.org/10.1038/s41590-024-02044-z) 

[Yayo et al. *Nature* 2024](https://doi.org/10.1038/s41586-024-07944-6) 

[Izzo et al. *Nature* 2024](https://doi.org/10.1038/s41586-024-07388-y) 

[Arieta et al. *Cell* 2023](https://doi.org/10.1016/j.cell.2023.04.007) 

[Magen et al. *Nature Medicine* 2023](https://doi.org/10.1038/s41591-023-02345-0) 

[COMBAT consortium *Cell* 2022](https://doi.org/10.1016/j.cell.2022.01.012) 

[Jardine et al. *Nature* 2021](https://doi.org/10.1038/s41586-021-03929-x) 

[Mimitou et al. *Nature Biotechnology* 2021](https://doi.org/10.1038/s41587-021-00927-2) 


From the Tsang lab 

[Mulè et al. *Immunity* 2024](https://mattpm.net/man/pdf/natural_adjuvant_immunity_2024.pdf) 

[Sparks et al. *Nature* 2023](https://doi.org/10.1038/s41586-022-05670-5) 
 
[Liu et al. *Cell* 2021](https://doi.org/10.1016/j.cell.2021.02.018) 

[Kotliarov et al. *Nature Medicine* 2020](https://doi.org/10.1038/s41591-020-0769-8) 




**Topics covered in other vignettes on CRAN**  

- Integrating dsb with Bioconductor and with Python/Scanpy  
- Using dsb with data lacking isotype controls  
- Integrating dsb with sample multiplexing experiments  
- Using dsb on data with multiple batches  
- Using a different scale / standardization based on empty droplet levels  
- Returning internal stats used by dsb  
- Outlier clipping with the `quantile.clipping` argument  
- Other FAQ

Owner

Name: National Institute of Allergy and Infectious Diseases (NIAID)
Login: niaid
Kind: organization
Location: Bethesda, Maryland, USA

Website: https://www.niaid.nih.gov
Repositories: 40
Profile: https://github.com/niaid

GitHub Events

Total

Create event: 1
Release event: 1
Issues event: 6
Watch event: 2
Issue comment event: 5
Push event: 6

Last Year

Create event: 1
Release event: 1
Issues event: 6
Watch event: 2
Issue comment event: 5
Push event: 6

Committers

Last synced: 9 months ago

All Time

Total Commits: 190
Total Committers: 5
Avg Commits per committer: 38.0
Development Distribution Score (DDS): 0.026

Past Year

Commits: 12
Committers: 1
Avg Commits per committer: 12.0
Development Distribution Score (DDS): 0.0

Top Committers

Name	Email	Commits
MattPM	m**e@g**m	185
maxkarlsson	4****n	2
manurungmd	1****g	1
igor	6****t	1
diegoalexespi	d**a@g**m	1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 48
Total pull requests: 4
Average time to close issues: about 1 month
Average time to close pull requests: 11 days
Total issue authors: 40
Total pull request authors: 4
Average comments per issue: 2.96
Average comments per pull request: 0.25
Merged pull requests: 4
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 5
Pull requests: 0
Average time to close issues: 4 months
Average time to close pull requests: N/A
Issue authors: 5
Pull request authors: 0
Average comments per issue: 0.2
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

sorjuela (3)
diegoalexespi (2)
mdmanurung (2)
jkniffka (2)
codeneeded (2)
bbimber (2)
bio-la (2)
domi84 (1)
cyc2145 (1)
ColeKeenum (1)
danmoore1987 (1)
mcortes-lopez (1)
MattPM (1)
gt7901b (1)
Accio (1)

Pull Request Authors

maxkarlsson (2)
igordot (1)
mdmanurung (1)
diegoalexespi (1)

Packages

Total packages: 2
Total downloads:
- cran 397 last-month
Total docker downloads: 21,889

Total dependent packages: 0
(may contain duplicates)
Total dependent repositories: 2
(may contain duplicates)
Total versions: 18
Total maintainers: 1

proxy.golang.org: github.com/niaid/dsb

Documentation: https://pkg.go.dev/github.com/niaid/dsb#section-documentation
License: other
Latest release: v2.0.0+incompatible
published 11 months ago

Versions: 9
Dependent Packages: 0
Dependent Repositories: 0

Rankings

Dependent packages count: 5.4%

Average: 5.6%

Dependent repos count: 5.8%

Last synced: 6 months ago

cran.r-project.org: dsb

Normalize & Denoise Droplet Single Cell Protein Data (CITE-Seq)

Homepage: https://github.com/niaid/dsb
Documentation: http://cran.r-project.org/web/packages/dsb/dsb.pdf
License: CC0 | file LICENSE
Latest release: 2.0.0
published 11 months ago

Versions: 9
Dependent Packages: 0
Dependent Repositories: 2
Downloads: 397 Last month
Docker Downloads: 21,889

Rankings

Forks count: 6.3%

Stargazers count: 7.0%

Average: 17.7%

Dependent repos count: 19.2%

Downloads: 20.2%

Docker downloads count: 24.8%

Dependent packages count: 28.7%

Maintainers (1)

mattmule@gmail.com

Last synced: 7 months ago

dsb

Science Score: 23.0%
