DIscBIO

A user-friendly R pipeline for biomarker discovery in single-cell transcriptomics

https://github.com/ocbe-uio/discbio

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov, mdpi.com, zenodo.org
  • Committers with academic emails
    1 of 6 committers (16.7%) from academic institutions
  • Institutional organization owner
    Organization ocbe-uio has institutional domain (www.med.uio.no)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.9%) to scientific vocabulary

Keywords

biomarker-discovery jupyter-notebook r-package scrna-seq single-cell-analysis transcriptomics

Keywords from Contributors

bioconductor-package cell-tracking shiny trajectory-analysis
Last synced: 6 months ago

Repository

A user-friendly R pipeline for biomarker discovery in single-cell transcriptomics

Basic Info
  • Host: GitHub
  • Owner: ocbe-uio
  • License: other
  • Language: Jupyter Notebook
  • Default Branch: dev
  • Homepage:
  • Size: 230 MB
Statistics
  • Stars: 12
  • Watchers: 1
  • Forks: 5
  • Open Issues: 3
  • Releases: 6
Topics
biomarker-discovery jupyter-notebook r-package scrna-seq single-cell-analysis transcriptomics
Created about 6 years ago · Last pushed over 2 years ago
Metadata Files
Readme Changelog Contributing License

README.md

DIscBIO

A user-friendly pipeline for biomarker discovery in single-cell transcriptomics.

DIscBIO is an R package based on PSCAN. It is available on CRAN, the official R package repository, and listed on scRNAtools, a database of software tools for the analysis of single-cell RNA-seq data.

Software tools for single-cell transcriptomics are abundant, with scRNAtools listing over 500 of them for a wide variety of tasks. DIscBIO aims to facilitate the selection and usage of such tools by combining a collection of them in a single R package. DIscBIO is a pipeline that takes users from raw data to biomarker discovery. It consists of four successive steps: data pre-processing, cellular clustering with pseudo-temporal ordering, identification of differentially expressed genes, and biomarker identification.
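To make the four steps more concrete, here is a toy sketch on simulated counts. It uses only base R stand-ins (library-size scaling, k-means, Wilcoxon tests) and is an assumption-laden illustration, not DIscBIO's API or its actual methods; pseudo-temporal ordering is left out, and all object names are hypothetical.

```r
# Toy illustration only: base R stand-ins, not DIscBIO functions.
set.seed(1)
n_genes <- 200
n_cells <- 60
counts <- matrix(
  rpois(n_genes * n_cells, lambda = 5),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)
# Simulate a second cell population: raise the first 10 genes in the last 30 cells
counts[1:10, 31:60] <- counts[1:10, 31:60] + rpois(10 * 30, lambda = 20)

# 1. Pre-processing: drop low-count genes, scale by library size, log-transform
keep <- rowSums(counts) >= 10
filtered <- counts[keep, ]
norm <- t(t(filtered) / colSums(filtered)) * 1e4
logexp <- log2(norm + 1)

# 2. Clustering: k-means on cells (kmeans expects observations in rows)
clusters <- kmeans(t(logexp), centers = 2, nstart = 10)$cluster

# 3. Differential expression: per-gene Wilcoxon test between the two clusters
pvals <- apply(logexp, 1, function(g) {
  wilcox.test(g[clusters == 1], g[clusters == 2], exact = FALSE)$p.value
})
padj <- p.adjust(pvals, method = "BH")

# 4. Candidate biomarkers: the 10 genes with the smallest adjusted p-values
candidates <- head(sort(padj), 10)
print(names(candidates))
```

The real pipeline replaces each stand-in with dedicated methods; the sketch only shows how the output of one step feeds the next.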

The CTCdataset, which is used as input data in the DIscBIO-CTCs-Notebook, contains information from GEO databases GSE51827, GSE55807, GSE67939, GSE75367, GSE109761, GSE111065 and GSE86978, which are made available here under the Open Database License (ODbL).

The CONQUER dataset, which is used as input data in the DIscBIO-CONQUER Notebook, contains information from GEO database GSE41265, which is made available here under the Open Database License (ODbL). The conquer repository is available at http://imlspenticton.uzh.ch:3838/conquer/.

Installation

Stable version

DIscBIO has been published to the Comprehensive R Archive Network (CRAN), and the latest stable version of the package can be installed by running

```r
install.packages("DIscBIO")
```

from any interactive R session.

If you run into any trouble, you might need to install some dependencies. Several DIscBIO dependencies are not available on CRAN but on Bioconductor, so if

```r
install.packages("DIscBIO", dependencies=TRUE)
```

still doesn't solve the issue, try the following:

```r
install.packages("BiocManager")
BiocManager::install("DIscBIO")
```

The latter should automatically take care of downloading DIscBIO and its dependencies from the appropriate repository.

Your installation issues might also be related to rJava. Please find our solution to this problem here.

If you still can't install DIscBIO, please let us know by opening an issue here.

Development version

The development version of the DIscBIO R package can be installed by running

```r
remotes::install_github("ocbe-uio/DIscBIO", build_vignettes=TRUE)
```

in an interactive R session. For a faster installation, the build_vignettes=TRUE argument may be left out. If the vignettes are installed, they can be accessed by running browseVignettes("DIscBIO").

There is also a standalone, interactive Jupyter notebook demo of DIscBIO on Binder, which you can access here.

Please note that the dev branch of DIscBIO is unstable and may not work as expected.

Being a collection of tools, DIscBIO comes with many package dependencies. If you run into problems installing the package using the instructions above, we recommend installing the dependencies separately before trying to install DIscBIO itself. Code for installing the dependencies can be found below:

```r
# Install BiocManager first if it is not available yet
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Install the DIscBIO dependencies from CRAN and Bioconductor
BiocManager::install(
  c(
    "SingleCellExperiment", "methods", "TSCAN", "httr", "mclust", "statmod",
    "igraph", "RWeka", "philentropy", "NetIndices", "png", "grDevices",
    "RColorBrewer", "ggplot2", "rpart", "fpc", "cluster", "rpart.plot",
    "tsne", "AnnotationDbi", "org.Hs.eg.db", "graphics", "stats", "utils",
    "impute", "enrichR"
  )
)
```
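Before reinstalling everything, it can help to see which dependencies are actually absent. The following is a minimal sketch using only base R; `missing_packages` is a hypothetical helper (not part of DIscBIO or BiocManager), and the `deps` vector is just a subset of the list above chosen for illustration.

```r
# Return the subset of `pkgs` that is not found in the current library paths.
# `missing_packages` is a hypothetical helper, not part of DIscBIO.
missing_packages <- function(pkgs) {
  installed <- rownames(utils::installed.packages())
  setdiff(pkgs, installed)
}

# Illustrative subset of the dependency list
deps <- c("SingleCellExperiment", "TSCAN", "RWeka", "org.Hs.eg.db", "enrichR")
to_install <- missing_packages(deps)

if (length(to_install) > 0) {
  message("Still missing: ", paste(to_install, collapse = ", "))
  # BiocManager::install(to_install)  # uncomment to install only these
} else {
  message("All listed dependencies are installed.")
}
```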

Usage

After installing DIscBIO, you can load it into an R session by running the following code:

```r
library(DIscBIO)
```
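If loading fails, a quick check of whether the package is installed, and in which version, can narrow down the problem. This is a small sketch with base R functions only; `pkg_status` is a hypothetical helper, not something DIscBIO provides.

```r
# Report whether a package is installed and, if so, which version.
# `pkg_status` is a hypothetical helper, not part of DIscBIO.
pkg_status <- function(pkg) {
  if (requireNamespace(pkg, quietly = TRUE)) {
    paste(pkg, as.character(utils::packageVersion(pkg)))
  } else {
    paste(pkg, "is not installed")
  }
}

message(pkg_status("DIscBIO"))
```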

Binder Notebooks

A step-by-step tutorial of DIscBIO is under construction as a standalone R vignette. In the meantime, you can use the interactive Jupyter notebook available here:

There are three main Binder notebooks: DIscBIO-MLS-Binder, DIscBIO-CTCs-Notebook and DIscBIO-CONQUER-Binder.

Due to Binder's addressable memory limit of 2 GB, the DIscBIO-CTCs-Notebook is divided into 5 sub-notebooks.

The first time you use Binder, loading the environment might take about 15 minutes. To use the Binder versions of DIscBIO, click on the badge below and then on the notebook you would like to test; these Binder notebooks should be labeled with the word "-Binder-". To run all cells in a notebook, click on Cell in the menu bar and then on Run All.

Binder

Jupyter Notebook

A step-by-step tutorial of how to install Jupyter Notebook is available HERE

Development

DIscBIO is Open Source software licensed under the MIT license, so all contributions are welcome. Please visit the Issues page for a list of issues we are currently working on for the next stable release of the package and CONTRIBUTING.md for some guidelines on how to contribute to the package.

Citation

R package

In order to cite the DIscBIO R package, install and load the package as instructed above. Then, run

```r
citation("DIscBIO")
```
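For reference managers, the same citation can be rendered as BibTeX with `toBibtex()` from the utils package. The sketch below wraps this in `cite_as_bibtex`, a hypothetical convenience function, and only queries DIscBIO if it is installed.

```r
# Convert a package citation to BibTeX.
# `cite_as_bibtex` is a hypothetical wrapper around utils functions.
cite_as_bibtex <- function(pkg) {
  utils::toBibtex(utils::citation(pkg))
}

# citation() fails for packages that are not installed, so check first
if (requireNamespace("DIscBIO", quietly = TRUE)) {
  print(cite_as_bibtex("DIscBIO"))
}
```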

DIscBIO universe

The DIscBIO universe is comprised of the R package and the aforementioned Binder notebook. The GitHub repository contains the source code for this universe. Proper citation of it can be found here.

Peer-reviewed article

Ghannoum et al. present the DIscBIO pipeline in the International Journal of Molecular Sciences (IJMS). A link to the Open Access paper can be found here. To cite the publication in APA format, please use the format below:

Ghannoum S, Leoncio Netto W, Fantini D, Ragan-Kelley B, Parizadeh A, Jonasson E, Ståhlberg A, Farhan H, Köhn-Luque A. DIscBIO: A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics. International Journal of Molecular Sciences. 2021; 22(3):1399. https://doi.org/10.3390/ijms22031399

Project Status: Inactive. The project has reached a stable, usable state but is no longer being actively developed; support/maintenance will be provided as time allows.

Owner

  • Name: Oslo Centre for Biostatistics and Epidemiology
  • Login: ocbe-uio
  • Kind: organization
  • Location: Oslo, Norway

This is where we host some of the scientific software we produce at OCBE, a joint center between the University of Oslo and the Oslo University Hospital.

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 704
  • Total Committers: 6
  • Avg Commits per committer: 117.333
  • Development Distribution Score (DDS): 0.391
Past Year
  • Commits: 38
  • Committers: 1
  • Avg Commits per committer: 38.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Waldir Leoncio w****o@g****m 429
Salim Ghannoum 3****t 144
Waldir Leoncio w****o@m****o 87
Damiano Fantini d****i@g****m 22
Min RK b****k@g****m 21
Alvaro Köhn-Luque 4****h 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 28
  • Total pull requests: 16
  • Average time to close issues: 4 months
  • Average time to close pull requests: about 1 month
  • Total issue authors: 4
  • Total pull request authors: 3
  • Average comments per issue: 1.64
  • Average comments per pull request: 0.75
  • Merged pull requests: 15
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • wleoncio (24)
  • SystemsBiologist (2)
  • abhisheksinghnl (1)
Pull Request Authors
  • wleoncio (12)
  • dami82 (3)
  • minrk (1)
Top Labels
Issue Labels
bug (17) enhancement (9) good first issue (3) invalid (2) wontfix (2) help wanted (2) documentation (1)
Pull Request Labels
bug (4) enhancement (4) documentation (3)

Packages

  • Total packages: 1
  • Total downloads:
    • cran 275 last-month
  • Total docker downloads: 21,613
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 6
  • Total maintainers: 1
cran.r-project.org: DIscBIO

A User-Friendly Pipeline for Biomarker Discovery in Single-Cell Transcriptomics

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 275 Last month
  • Docker Downloads: 21,613
Rankings
Forks count: 10.1%
Stargazers count: 17.9%
Dependent packages count: 29.8%
Average: 33.1%
Dependent repos count: 35.5%
Downloads: 72.2%
Maintainers (1)
Last synced: 6 months ago

Dependencies

binder/environment.yml conda
  • bioconductor-annotationdbi
  • bioconductor-biocversion
  • bioconductor-m3drop
  • bioconductor-multiassayexperiment
  • bioconductor-org.hs.eg.db
  • bioconductor-singlecellexperiment
  • bioconductor-summarizedexperiment
  • bioconductor-tscan
  • gcc_impl_linux-64 >=7.5
  • gxx_impl_linux-64 >=7.5
  • leidenalg
  • numpy
  • pandas
  • python-igraph
  • r-base 4.1.3.*
  • r-biocmanager
  • r-boot
  • r-cluster
  • r-dplyr
  • r-enrichr
  • r-fpc
  • r-ggplot2
  • r-httr
  • r-igraph
  • r-irkernel
  • r-leiden
  • r-mclust
  • r-partykit
  • r-png
  • r-rcolorbrewer
  • r-readr
  • r-reticulate
  • r-rpart
  • r-rpart.plot
  • r-rweka
  • r-statmod
  • r-tsne
DESCRIPTION cran
  • R >= 4.0 depends
  • SingleCellExperiment * depends
  • AnnotationDbi * imports
  • NetIndices * imports
  • RColorBrewer * imports
  • RWeka * imports
  • TSCAN * imports
  • boot * imports
  • cluster * imports
  • fpc * imports
  • ggplot2 * imports
  • grDevices * imports
  • graphics * imports
  • httr * imports
  • igraph * imports
  • impute * imports
  • mclust * imports
  • methods * imports
  • org.Hs.eg.db * imports
  • philentropy * imports
  • png * imports
  • rpart * imports
  • rpart.plot * imports
  • statmod * imports
  • stats * imports
  • tsne * imports
  • utils * imports
  • Seurat * suggests
  • testthat * suggests
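The DESCRIPTION entries above can also be read from an installed copy of the package. The sketch below assumes the usual comma-separated DESCRIPTION field format; `parse_dep_field` and `declared_deps` are hypothetical helpers built on `utils::packageDescription()`, not DIscBIO functions.

```r
# Split a DESCRIPTION dependency field into bare package names.
# `parse_dep_field` is a hypothetical helper, not part of DIscBIO.
parse_dep_field <- function(field) {
  if (is.null(field) || is.na(field)) return(character(0))
  parts <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  parts <- sub("\\s*\\(.*$", "", parts)  # drop version constraints, e.g. "(>= 4.0)"
  parts[nzchar(parts)]
}

# Collect Depends, Imports and Suggests of an installed package as a named list.
declared_deps <- function(pkg) {
  fields <- c("Depends", "Imports", "Suggests")
  desc <- utils::packageDescription(pkg, fields = fields)
  stats::setNames(lapply(fields, function(f) parse_dep_field(desc[[f]])), fields)
}

# packageDescription() only works for installed packages, so check first
if (requireNamespace("DIscBIO", quietly = TRUE)) {
  str(declared_deps("DIscBIO"))
}
```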