deco

DECO :: Decomposing heterogeneous population cohorts for patient stratification and discovery of biomarkers using omic data profiling. R package

https://github.com/fjcamlab/deco

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.7%) to scientific vocabulary

Keywords

bioinformatics deco differential-expression heterogeneity heterogeneous-populations omic-data-profiling omics patients-stratification r sampling-methods stratification

Repository


Basic Info
  • Host: GitHub
  • Owner: fjcamlab
  • Language: R
  • Default Branch: master
  • Homepage:
  • Size: 16.9 MB
Statistics
  • Stars: 4
  • Watchers: 2
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created over 7 years ago · Last pushed over 1 year ago
Metadata Files
Readme

README.md

deco

R package

DECO:

Decomposing heterogeneous population cohorts for patient stratification and discovery of biomarkers using omic data profiling.

We present DECO, a bioinformatic method to explore and find differences in the large, heterogeneous datasets usually produced in biological or biomedical omic-wide studies. The method performs a comprehensive analysis of multidimensional datasets consisting of a collection of samples in which hundreds or thousands of features have been measured with a large-scale high-throughput technology, for example a genomic or proteomic technique. It finds the differences in the feature profiles across the samples and identifies the associations between them, showing the features that best mark a given class or category as well as possible sample outliers that do not follow the pattern of the majority of the corresponding cohort. It can be used to compare two or more classes of samples or for unsupervised comparisons. DECO allows the discovery of multiple classes or categories and is well suited to patient stratification.

The statistical procedures followed in both parts of the method are detailed in the original publication [1]. A detailed vignette is included to explain how to use DECO for the analysis of multidimensional datasets, which may include heterogeneous samples or categories. The aim is to improve the characterization and stratification of complex sample series, focusing mostly on large patient cohorts, where outlier or mislabeled samples are quite likely to exist.

DECO performs a recursive exploration of differential signal changes between samples, finding variables assigned to: (i) the main classes or groups of samples in the studied cohorts; (ii) significant variation or alteration among certain individuals (related or not to an a priori known class); (iii) outlier patterns within feature profiles; and (iv) sample outliers (i.e. individuals that behave differently from the main groups and have specific markers).

Workflow


INSTALLATION

The deco R source package can be downloaded directly from the Bioconductor repository or the GitHub repository. The package contains an experimental dataset as an example, two pre-run R objects and all functions needed to run a DECO analysis.

```r
# Bioconductor repository
if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
}
BiocManager::install("deco")

# GitHub repository using devtools
BiocManager::install("devtools")
devtools::install_github("fjcamlab/deco")
```
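
After installation, a quick check such as the following can confirm that the package is available before running an analysis. This is a minimal sketch that relies only on base R functions (`requireNamespace()` and `packageVersion()`); the variable name `deco_available` is illustrative.

```r
# TRUE if the deco package can be loaded from the current library paths.
deco_available <- requireNamespace("deco", quietly = TRUE)

if (deco_available) {
    message("deco version: ", as.character(packageVersion("deco")))
} else {
    message("deco is not installed; see the installation commands above.")
}
```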


EXAMPLE OF DIFFERENTIAL ANALYSIS USING DECO

The DECO R package has been designed to perform a deep analysis of heterogeneous populations through two main steps: (i) recurrent-sampling differential analysis (RDA), a subsampling procedure (decoRDA()) based on LIMMA, and (ii) a subsequent integration with non-symmetrical correspondence analysis (NSCA, decoNSCA()) to find new subclasses of samples. From these two steps, DECO calculates a new h-statistic, which replaces the original omic data to improve sample stratification. The h-statistic integrates both the omic dispersion and the predictor-response information given by NSCA.
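
The subsampling idea behind the RDA step can be illustrated with a small base R simulation. This is a minimal sketch under stated assumptions, not DECO's implementation: it uses Welch t-tests as a stand-in for LIMMA, a simulated matrix instead of real omic data, and hypothetical names and values (`n_iter`, `r`, `hits`, the 0.05 threshold). It counts in how many random subsamples each feature appears significant, so a feature altered in only part of one class can still accumulate more hits than unaltered features.

```r
set.seed(1)

# Simulated data: 50 features x 40 samples (20 controls, 20 cases).
n_feat <- 50
classes <- rep(c("control", "case"), each = 20)
x <- matrix(rnorm(n_feat * 40), nrow = n_feat,
            dimnames = list(paste0("f", seq_len(n_feat)), paste0("s", 1:40)))

# Feature f1 is shifted in all cases; f2 only in a hidden subgroup of 10 cases.
x["f1", classes == "case"] <- x["f1", classes == "case"] + 3
x["f2", 21:30] <- x["f2", 21:30] + 5

n_iter <- 200  # number of subsampling iterations (illustrative value)
r <- 4         # samples drawn per class in each iteration (illustrative value)
hits <- setNames(integer(n_feat), rownames(x))

for (i in seq_len(n_iter)) {
    ctrl <- sample(which(classes == "control"), r)
    case <- sample(which(classes == "case"), r)
    # Welch t-test per feature on the small subsample (stand-in for LIMMA).
    p <- apply(x, 1, function(v) t.test(v[case], v[ctrl])$p.value)
    sig <- p < 0.05
    hits[sig] <- hits[sig] + 1L
}

# Features ranked by how often they were significant across subsamples.
head(sort(hits, decreasing = TRUE), 5)
```

A single test on the full cohort summarizes each feature with one statistic, whereas the per-iteration counts keep track of which combinations of samples produced the signal. That is the kind of information a later step can use to look for subclasses.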

Pipeline using two categories of samples to compare:

```r
# Loading the R packages
library(deco)
library(BiocParallel) # for parallel computation

# Computing in shared memory
bpparam <- MulticoreParam()

# Loading example data
# Data from two subtypes (ALK+ and ALK-) of Anaplastic Large Cell Lymphoma (ALCL).
data(ALCLdata)
# It includes a subset from an ALCL transcriptomic dataset (GEO id: GSE65823).

# To see the example SummarizedExperiment object
ALCL

# Classes vector to run a binary analysis to compare both classes.
classes.ALCL <- colData(ALCL)[, "Alk.positivity"]
names(classes.ALCL) <- colnames(ALCL)

# RUNNING SUBSAMPLING OF DATA: BINARY design (two classes of samples)
# if annotation and rm.xy == TRUE, then
library(Homo.sapiens)

sub.ma.3r.1K <- decoRDA(
    data = assay(ALCL), classes = classes.ALCL, q.val = 0.01,
    rm.xy = TRUE, r = NULL, control = "pos", annot = FALSE,
    bpparam = bpparam, id.type = "ENSEMBL", iterations = 10000,
    pack.db = "Homo.sapiens"
)

# RUNNING NSCA STEP: Looking for subclasses within a category/class of samples compared
deco.results.ma <- decoNSCA(
    sub = sub.ma.3r.1K, v = 80, method = "ward.D", bpparam = bpparam,
    k.control = 3, k.case = 3, samp.perc = 0.05, rep.thr = 10
)

# Phenotypic data of the ALCL samples.
colData(ALCL)

# h-statistic matrices used to stratify samples
hMatrix.control <- NSCAcluster(deco.results.ma)$Control$NSCA$h # for control samples
hMatrix.case <- NSCAcluster(deco.results.ma)$Case$NSCA$h # for case samples
```
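
Once an h-statistic matrix is available, samples can be grouped with any standard clustering routine. The following is a minimal sketch using base R `hclust()` with the `ward.D` method on a simulated stand-in matrix. The object `h_toy` is hypothetical, and the layout (features in rows, samples in columns) is an assumption that should be checked against the object actually returned by `NSCAcluster()`. It does not reproduce the clustering that DECO performs internally.

```r
set.seed(42)

# Stand-in for an h-statistic matrix: 30 features x 12 samples (assumed layout).
h_toy <- matrix(rnorm(30 * 12), nrow = 30,
                dimnames = list(paste0("feature", 1:30), paste0("sample", 1:12)))
# Give the last six samples a shifted profile on the first ten features.
h_toy[1:10, 7:12] <- h_toy[1:10, 7:12] + 5

# Euclidean distance between samples (columns), then Ward hierarchical clustering.
d <- dist(t(h_toy))
tree <- hclust(d, method = "ward.D")

# Cut the tree into two subclasses and tabulate the assignment.
subclass <- cutree(tree, k = 2)
table(subclass)
```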

Finally, the third main function decoReport() will generate a PDF file containing a summary of the analysis including: top-discriminant features, new subclasses of samples found, and several plots showing any relevant result of the analysis.

```r
# PDF report with feature-sample patterns or subgroups
# Generate PDF report with relevant information and several plots.

# Binary example (ALK+ vs ALK-)
decoReport(deco.results.ma, sub.ma.3r.1K,
    pdf.file = "reportexamplemicroarray_binary.pdf",
    info.sample = as.data.frame(colData(ALCL)[, 8:10]),
    cex.names = 0.3, print.annot = TRUE
)
```


REFERENCES

1: Campos-Laborie FJ, Risueño A, Ortiz-Estevez M, Roson-Burgo B, Droste C, Fontanillo C, Loos R, Sánchez-Santos JM, Trotter MW and De Las Rivas J (2019). DECO: decompose heterogeneous population cohorts for patient stratification and discovery of sample biomarkers using omic data profiling. Bioinformatics, btz148. doi:10.1093/bioinformatics/btz148.

2: Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W and Smyth GK (2015). limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res., 43:e47. doi:10.1093/nar/gkv007.


NEWS

Changes in version 0.99.46 (21-02-2019)
+ Bug fixes.
+ Annotation adapted to new "OrganismDbi" related packages.
+ Three new diagnostic (plot) functions.
+ Enlarged vignette.

Changes in version 0.99.42 (17-12-2018)
+ Bug fixes.
+ Vignette converted into HTML format.
+ Accepted in Bioconductor.

Changes in version 0.99.0 (06-11-2018)
+ Submitted to Bioconductor.

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 120
  • Total Committers: 4
  • Avg Commits per committer: 30.0
  • Development Distribution Score (DDS): 0.3
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
| Name | Email | Commits |
|------|-------|---------|
| fjcamlab | f****b@g****m | 84 |
| Francisco José Campos-Laborie | 3****b | 25 |
| Nitesh Turaga | n****a@g****m | 10 |
| AnzeLovse | A****e | 1 |

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 4,586 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 2
  • Total maintainers: 1
bioconductor.org: deco

Decomposing Heterogeneous Cohorts using Omic Data Profiling

  • Versions: 2
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 4,586 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 24.8%
Downloads: 74.3%
Maintainers (1)
Last synced: over 1 year ago

Dependencies

DESCRIPTION cran
  • AnnotationDbi * depends
  • BiocParallel * depends
  • R >= 3.5.0 depends
  • SummarizedExperiment * depends
  • limma * depends
  • Biobase * imports
  • BiocStyle * imports
  • RColorBrewer * imports
  • ade4 * imports
  • cluster * imports
  • foreign * imports
  • gdata * imports
  • ggplot2 * imports
  • gplots * imports
  • grDevices * imports
  • graphics * imports
  • gridExtra * imports
  • locfit * imports
  • made4 * imports
  • methods * imports
  • reshape2 * imports
  • scatterplot3d * imports
  • sfsmisc * imports
  • stats * imports
  • utils * imports
  • Homo.sapiens * suggests
  • MultiAssayExperiment * suggests
  • curatedTCGAData * suggests
  • knitr * suggests
  • rmarkdown * suggests