dsb

Normalize CITEseq Data

https://github.com/niaid/dsb

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 22 DOI reference(s) in README
  • Academic publication links
    Links to: nature.com
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.7%) to scientific vocabulary

Keywords

cite-seq niaid-tsang-lab
Last synced: 6 months ago · JSON representation

Repository

Normalize CITEseq Data

Basic Info
  • Host: GitHub
  • Owner: niaid
  • License: other
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 10.9 MB
Statistics
  • Stars: 66
  • Watchers: 8
  • Forks: 13
  • Open Issues: 6
  • Releases: 7
Topics
cite-seq niaid-tsang-lab
Created about 6 years ago · Last pushed 11 months ago
Metadata Files
Readme License

README.Rmd

---
output: github_document
---



[![CRAN status](https://www.r-pkg.org/badges/version/dsb)](https://CRAN.R-project.org/package=dsb)


#    dsb: Normalize and denoise antibody-derived-tag data from CITE-seq, ASAP-seq, TEA-seq and related assays.

```{r, include = FALSE}
library(here)
knitr::opts_chunk$set(
  #tidy = TRUE,
  #tidy.opts = list(width.cutoff = 95),
  warning = FALSE, 
  eval = FALSE,
  root.dir = here()
)
```

The dsb R package is available on [**CRAN: latest dsb release**](https://CRAN.R-project.org/package=dsb)  
To install in R use `install.packages('dsb')` 
[**Mulè, Martins, and Tsang, Nature Communications (2022)**](https://www.nature.com/articles/s41467-022-29356-8) describes our deconvolution of ADT noise sources and development of dsb.
#### Vignettes: 1. [**Using dsb with an end-to-end CITE-seq workflow**](https://CRAN.R-project.org/package=dsb/vignettes/end_to_end_workflow.html) 2. [**Using dsb when empty droplets are not available**](https://CRAN.R-project.org/package=dsb/vignettes/no_empty_drops.html) 3. [**Speed up dsb 10-fold: set fast.km = TRUE (great for large datasets with / without empty droplets)**](https://cran.r-project.org/web/packages/dsb/vignettes/fastkm.html)
4. [**How the dsb method works**](https://CRAN.R-project.org/package=dsb/vignettes/understanding_dsb.html) 5. [**Using the dsb method in Python**](https://muon.readthedocs.io/en/latest/omics/citeseq.html) 6. [**Frequently asked questions**](https://CRAN.R-project.org/package=dsb/vignettes/additional_topics.html)
See notes on [**upstream processing before dsb**](#otheraligners) [**Recent Publications**](#pubications) Check out recent publications that used dsb for ADT normalization. The functions in this package return standard R matrix objects that can be added to any data container like a `SingleCellExperiment`, `Seurat`, or `AnnData` related python objects. ## Background and motivation [**Our paper**](https://www.nature.com/articles/s41467-022-29356-8) combined experiments and computational approaches to find ADT protein data from CITE-seq and related assays are affected by substantial background noise. We observed that ADT reads from empty droplets—often more than tenfold the number of cell-containing droplets—closely match levels in unstained spike-in cells, and can also serve as a readout of protein-specific ambient noise. We also remove cell-to-cell technical variation by estimating a conservative adjustment factor derived from isotype control levels and per cell background derived from a per cell mixture model. The 2.0 release of dsb includes faster compute times and functions for normalization on datasets without empty drops. ## Installation and quick overview The default method is carried out in a single step with a call to the `DSBNormalizeProtein()` function. `cells_citeseq_mtx` - a raw ADT count matrix `empty_drop_citeseq_mtx` - a raw ADT count matrix from non-cell containing empty / background droplets. `denoise.counts = TRUE` - define and remove the 'technical component' of each cell's protein library. `use.isotype.control = TRUE` - include isotype controls in the modeled dsb technical component. ```{r, eval = FALSE} # install.packages('dsb') library(dsb) isotype.names = c("MouseIgG1kappaisotype_PROT", "MouseIgG2akappaisotype_PROT", "Mouse IgG2bkIsotype_PROT", "RatIgG2bkIsotype_PROT") adt_norm = DSBNormalizeProtein( cell_protein_matrix = cells_citeseq_mtx, empty_drop_matrix = empty_drop_citeseq_mtx, denoise.counts = TRUE, use.isotype.control = TRUE, isotype.control.name.vec = isotype.names, fast.km = TRUE # optional ) ``` ## Datasets without empty drops Not all datasets have empty droplets available, for example those downloaded from online repositories where only processed data are included. We provide a method to approximate the background distribution of proteins based on data from cells alone. Please see the vignette [Normalizing ADTs if empty drops are not available](https://CRAN.R-project.org/package=dsb/vignettes/no_empty_drops.html) for more details. ```{r, eval = FALSE} adt_norm = ModelNegativeADTnorm( cell_protein_matrix = cells_citeseq_mtx, denoise.counts = TRUE, use.isotype.control = TRUE, isotype.control.name.vec = isotype.names, fast.km = TRUE # optional ) ``` ## 10-fold faster compute time with dsb 2.0 To speed up the function 10-fold with minimal impact on the results from those in the default function set `fast.km = TRUE` with either the `DSBNormalizeProtein` or `ModelNegativeADTnorm` functions. See the new [vignette](https://cran.r-project.org/web/packages/dsb/vignettes/fastkm.html) on this topic. ## What settings should I use? See the simple visual guide below. Please search the resolved issues on github for questions or open a new issue if your use case has not been addressed. ### Upstream read alignment to generate raw ADT files prior to dsb Any alignment software can be used prior to normalization with dsb. To use the `DSBNormalizeProtein` function described in the manuscript, you need to define cells and empty droplets from the alignment files. Any alignment pipeline can be used. Some examples guides below: #### Cell Ranger See the ["end to end" vignette](https://CRAN.R-project.org/package=dsb/vignettes/end_to_end_workflow.html) for information on defining cells and background droplets from the output files created from Cell Ranger as in the schematic below. Please note *whether or not you use dsb*, to define cells using the `filtered_feature_bc_matrix` file from Cell Ranger, you need to properly set the `--expect-cells` argument to roughly your estimated cell recovery per lane based on how many cells you loaded. see [the note from 10X about this ](https://support.10xgenomics.com/single-cell-gene-expression/software/pipelines/latest/algorithms/overview#cell_calling). The default value of 3000 is likely not suited to most modern experiments. ```{bash, eval = FALSE} # Cell Ranger alignment cellranger count --id=sampleid\ --transcriptome=transcriptome_path\ --fastqs=fastq_path\ --sample=mysample\ --expect-cells=10000\ ``` See end to end vignette for detailed information on using Cell Ranger output. #### CITE-seq-Count Important: set the `-cells` argument in `CITE-seq-Count` to ~ 200000. This aligns the top 200000 barcodes per lane by ADT library size. [CITE-seq-count documentation](https://hoohm.github.io/CITE-seq-Count/Running-the-script/) ```{bash, eval = FALSE} # CITE-seq-Count alignment CITE-seq-Count -R1 TAGS_R1.fastq.gz -R2 TAGS_R2.fastq.gz \ -t TAG_LIST.csv -cbf X1 -cbl X2 -umif Y1 -umil Y2 \ -cells 200000 -o OUTFOLDER ``` #### Alevin I recommend following the comprehensive tutorials by Tommy Tang for using Alevin, DropletUtils and dsb for CITE-seq normalization. [ADT alignment with Alevin](https://divingintogeneticsandgenomics.com/post/how-to-use-salmon-alevin-to-preprocess-cite-seq-data/) [DropletUtils and dsb from Alevin output](https://divingintogeneticsandgenomics.com/post/part-4-cite-seq-normalization-using-empty-droplets-with-the-dsb-package/) [Alevin documentation](https://salmon.readthedocs.io/en/latest/alevin.html) #### Kallisto bustools pseudoalignment I recommend checking out the tutorials and example code below to understand how to use kallisto bustools outputs with dsb. [kallisto bustools tutorial by Sarah Ennis](https://github.com/Sarah145/scRNA_pre_process) [dsb normalization using kallisto outputs by Terkild Brink Buus](https://github.com/Terkild/CITE-seq_optimization/blob/master/Demux_Preprocess_Downsample.md) [kallisto bustools documentation](https://www.kallistobus.tools/tutorials/kb_kite/python/kb_kite/) Example script ```{bash, eval = FALSE} kb count -i index_file -g gtf_file.t2g -x 10xv3 \ -t n_cores -o output_dir \ input.R1.fastq.gz input.R2.fastq.gz ``` After alignment define cells and background droplets empirically with protein and mRNA based thresholding as outlined in the main tutorial. ### Selected publications using dsb From other groups
[Singhaviranon *Nature Immunology* 2025](https://doi.org/10.1038/s41590-024-02044-z)
[Yayo *Nature* 2024](https://doi.org/10.1038/s41586-024-07944-6)
[Izzo et al. *Nature* 2024](https://doi.org/10.1038/s41586-024-07388-y)
[Arieta et al. *Cell* 2023](https://doi.org/10.1016/j.cell.2023.04.007)
[Magen et al. *Nature Medicine* 2023](https://doi.org/10.1038/s41591-023-02345-0)
[COMBAT consortium *Cell* 2021](https://doi.org/10.1016/j.cell.2022.01.012)
[Jardine et al. *Nature* 2021](https://doi.org/10.1038/s41586-021-03929-x)
[Mimitou et al. *Nature Biotechnology* 2021](https://doi.org/10.1038/s41587-021-00927-2)
From the Tsang lab
[Mulè et al. *Immunity* 2024](https://mattpm.net/man/pdf/natural_adjuvant_immunity_2024.pdf)
[Sparks et al. *Nature* 2023](https://doi.org/10.1038/s41586-022-05670-5)
[Liu et al. *Cell* 2021](https://doi.org/10.1016/j.cell.2021.02.018)
[Kotliarov et al. *Nature Medicine* 2020](https://doi.org/10.1038/s41591-020-0769-8)
**Topics covered in other vignettes on CRAN** Integrating dsb with Bioconductor, integrating dsb with python/Scanpy Using dsb with data lacking isotype controls integrating dsb with sample multiplexing experiments using dsb on data with multiple batches using a different scale / standardization based on empty droplet levels Returning internal stats used by dsb outlier clipping with the quantile.clipping argument other FAQ

Owner

  • Name: National Institute of Allergy and Infectious Diseases (NIAID)
  • Login: niaid
  • Kind: organization
  • Location: Bethesda, Maryland, USA

GitHub Events

Total
  • Create event: 1
  • Release event: 1
  • Issues event: 6
  • Watch event: 2
  • Issue comment event: 5
  • Push event: 6
Last Year
  • Create event: 1
  • Release event: 1
  • Issues event: 6
  • Watch event: 2
  • Issue comment event: 5
  • Push event: 6

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 190
  • Total Committers: 5
  • Avg Commits per committer: 38.0
  • Development Distribution Score (DDS): 0.026
Past Year
  • Commits: 12
  • Committers: 1
  • Avg Commits per committer: 12.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
MattPM m****e@g****m 185
maxkarlsson 4****n 2
manurungmd 1****g 1
igor 6****t 1
diegoalexespi d****a@g****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 48
  • Total pull requests: 4
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 11 days
  • Total issue authors: 40
  • Total pull request authors: 4
  • Average comments per issue: 2.96
  • Average comments per pull request: 0.25
  • Merged pull requests: 4
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 5
  • Pull requests: 0
  • Average time to close issues: 4 months
  • Average time to close pull requests: N/A
  • Issue authors: 5
  • Pull request authors: 0
  • Average comments per issue: 0.2
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • sorjuela (3)
  • diegoalexespi (2)
  • mdmanurung (2)
  • jkniffka (2)
  • codeneeded (2)
  • bbimber (2)
  • bio-la (2)
  • domi84 (1)
  • cyc2145 (1)
  • ColeKeenum (1)
  • danmoore1987 (1)
  • mcortes-lopez (1)
  • MattPM (1)
  • gt7901b (1)
  • Accio (1)
Pull Request Authors
  • maxkarlsson (2)
  • igordot (1)
  • mdmanurung (1)
  • diegoalexespi (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 2
  • Total downloads:
    • cran 397 last-month
  • Total docker downloads: 21,889
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 2
    (may contain duplicates)
  • Total versions: 18
  • Total maintainers: 1
proxy.golang.org: github.com/niaid/dsb
  • Versions: 9
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.4%
Average: 5.6%
Dependent repos count: 5.8%
Last synced: 6 months ago
cran.r-project.org: dsb

Normalize & Denoise Droplet Single Cell Protein Data (CITE-Seq)

  • Versions: 9
  • Dependent Packages: 0
  • Dependent Repositories: 2
  • Downloads: 397 Last month
  • Docker Downloads: 21,889
Rankings
Forks count: 6.3%
Stargazers count: 7.0%
Average: 17.7%
Dependent repos count: 19.2%
Downloads: 20.2%
Docker downloads count: 24.8%
Dependent packages count: 28.7%
Maintainers (1)
Last synced: 7 months ago