GenomicSuperSignature

Interpretation of RNAseq experiments through robust, efficient comparison to public databases

https://github.com/shbrief/genomicsupersignature

Science Score: 59.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: nature.com
  • Committers with academic emails
    1 of 7 committers (14.3%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.0%) to scientific vocabulary

Keywords

bioconductor-package exploratory-data-analysis gsea mesh principal-component-analysis rna-sequencing-profiles transferlearning u24ca289073

Keywords from Contributors

gene ontology immune-repertoire grna-sequence core-package genomics bioinformatics bioconductor orthologs interactive-visualizations
Last synced: 6 months ago

Repository

Interpretation of RNAseq experiments through robust, efficient comparison to public databases

Statistics
  • Stars: 16
  • Watchers: 3
  • Forks: 8
  • Open Issues: 7
  • Releases: 1
Topics
bioconductor-package exploratory-data-analysis gsea mesh principal-component-analysis rna-sequencing-profiles transferlearning u24ca289073
Created over 5 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog

README.md

GenomicSuperSignature

Interpretation of RNA-seq experiments through robust, efficient comparison to public databases

Purpose

Thousands of RNA sequencing profiles have been deposited in public archives, yet remain unused for the interpretation of most newly performed experiments. Methods for leveraging these public resources have focused on the interpretation of existing data, or analysis of new datasets independently, but do not facilitate direct comparison of new to existing experiments. The interpretability of common unsupervised analysis methods such as Principal Component Analysis would be enhanced by efficient comparison of the results to previously published datasets.

Methods

To help identify replicable and interpretable axes of variation in any given gene expression dataset, we performed principal component analysis (PCA) on 536 studies comprising 44,890 RNA sequencing profiles. Sufficiently similar loading vectors, when compared across studies, were combined through simple averaging. We annotated the collection of resulting average loading vectors, which we call Replicable Axes of Variation (RAVs), with details from the originating studies and with gene set enrichment analysis. Functions to match the PCA of new datasets to RAVs from existing studies, extract interpretable annotations, and provide intuitive visualization are implemented in the GenomicSuperSignature R package, available from Bioconductor.
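The idea of combining similar loading vectors across studies can be illustrated with a minimal base R sketch on simulated data. This is not the code used to build the RAVmodel: the simulated studies, the distance `1 - |Pearson correlation|`, average-linkage clustering, and the cut height of 0.5 are all assumptions chosen for illustration.

```r
# Conceptual sketch only: simulated data and base R, not the package's code.
set.seed(1)
n_genes <- 200
genes <- paste0("gene", seq_len(n_genes))

# A hidden axis of variation shared by all simulated studies.
shared_axis <- rnorm(n_genes)

# Simulate one study (genes x samples): shared axis times sample activity, plus noise.
simulate_study <- function(n_samples) {
  activity <- rnorm(n_samples, sd = 3)
  mat <- outer(shared_axis, activity) +
    matrix(rnorm(n_genes * n_samples), nrow = n_genes)
  rownames(mat) <- genes
  mat
}
studies <- lapply(c(30, 40, 50), simulate_study)

# PCA per study; keep the top loading vectors (genes x PCs).
top_loadings <- function(mat, n_pc = 3) {
  pca <- prcomp(t(mat), center = TRUE, scale. = FALSE)
  pca$rotation[, seq_len(n_pc), drop = FALSE]
}
loadings <- do.call(cbind, lapply(studies, top_loadings))
colnames(loadings) <- paste0("study", rep(1:3, each = 3), "_PC", rep(1:3, times = 3))

# Cluster loading vectors; absolute correlation is used because PC sign is arbitrary.
d <- as.dist(1 - abs(cor(loadings)))
clusters <- cutree(hclust(d, method = "average"), h = 0.5)

# Average the members of each cluster after aligning signs to the first member.
average_cluster <- function(idx) {
  members <- loadings[, idx, drop = FALSE]
  signs <- as.vector(sign(cor(members[, 1], members)))
  rowMeans(sweep(members, 2, signs, `*`))
}
rav_like <- sapply(split(seq_along(clusters), clusters), average_cluster)
```

In this toy setup the first PC of every study tracks the shared axis, so those three loading vectors fall into one cluster and their average plays the role of a RAV, while the noise-driven PCs stay in clusters of their own.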

Results

Use cases and benchmark examples are documented on the GenomicSuperSignaturePaper page. All figures and tables can be reproduced using the code and instructions on that page.

Citation

If you use GenomicSuperSignature in published research, please cite:

Oh S, Geistlinger L, Ramos M, Blankenberg D, van den Beek M, Taroni JN, Carey VJ, Waldron L, Davis S. GenomicSuperSignature facilitates interpretation of RNA-seq experiments through robust, efficient comparison to public databases. Nature Communications 2022;13: 3695. doi: 10.1038/s41467-022-31411-3

Other relevant code

The workflow to build the RAVmodel is available from https://github.com/shbrief/model_building, which is archived in Zenodo with the identifier https://doi.org/10.5281/zenodo.6496552. All analyses presented in the GenomicSuperSignature manuscript are reproducible using code accessible from https://github.com/shbrief/GenomicSuperSignaturePaper/ and archived in Zenodo with the identifier https://doi.org/10.5281/zenodo.6496612.

Installation

You can install GenomicSuperSignature from Bioconductor using BiocManager:

```
if (!require("BiocManager"))
    install.packages("BiocManager")

library(BiocManager)
install("GenomicSuperSignature")
```

RAVmodels can be downloaded directly from a Google bucket at no cost. The sizes of the RAVmodels RAVmodel_C2.rds and RAVmodel_PLIERpriors.rds are 476.1MB and 475.1MB, respectively. You can use wget or the GenomicSuperSignature::getModel function.

```
# Download RAVmodel with wget
wget https://storage.googleapis.com/genomicsupersignature/RAVmodel_C2.rds
wget https://storage.googleapis.com/genomicsupersignature/RAVmodel_PLIERpriors.rds

# Download RAVmodel with getModel function
getModel("C2")
getModel("PLIERpriors")
```

Schematic

Overview of GenomicSuperSignature

Schematic illustration of RAVmodel construction and GenomicSuperSignature application. Building the RAVmodel (components in grey) is performed once on a time scale of hours on a high-memory, high-storage server. Users can apply RAVmodel on their data (component in red) using the GenomicSuperSignature R/Bioconductor package (components in blue), which operates on a time scale of seconds for exploratory data analyses (components in orange) on a typical laptop computer.


User's perspective

The GenomicSuperSignature package gives users access to a RAVmodel (Z matrix, blue) and to the annotation information for each RAV. From a gene expression matrix (Y matrix, grey), users can calculate a dataset-level validation score or a sample score matrix (B matrix, red). Starting from a RAV of interest, additional information such as related studies, GSEA results, and MeSH terms can be easily extracted.
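The relationship between the Y, Z, and B matrices can be sketched in base R with simulated data. This is a conceptual illustration rather than the package's implementation: the random `Z`, the gene-wise centering, the plain projection used for `B`, and the use of the top eight PCs with absolute Pearson correlation for the validation-style score are assumptions of this sketch.

```r
# Conceptual sketch only: simulated matrices and base R, not the package's code.
set.seed(2)
n_genes <- 100
n_samples <- 20
genes <- paste0("gene", seq_len(n_genes))

# Z: stand-in for a RAVmodel loading matrix (genes x RAVs), columns scaled to unit length.
Z <- matrix(rnorm(n_genes * 4), nrow = n_genes,
            dimnames = list(genes, paste0("RAV", 1:4)))
Z <- sweep(Z, 2, sqrt(colSums(Z^2)), `/`)

# Y: a new expression matrix (genes x samples) whose main variation follows RAV2.
activity <- rnorm(n_samples, sd = 5)
Y <- outer(Z[, "RAV2"], activity) +
  matrix(rnorm(n_genes * n_samples, sd = 0.2), nrow = n_genes)
dimnames(Y) <- list(genes, paste0("sample", seq_len(n_samples)))

# Keep genes present in both matrices and center each gene across samples.
common <- intersect(rownames(Y), rownames(Z))
Y_centered <- Y[common, ] - rowMeans(Y[common, ])

# B: sample score matrix (samples x RAVs), the projection of each sample onto each RAV.
B <- t(Y_centered) %*% Z[common, ]

# Validation-style score: per RAV, the highest absolute correlation with the top PCs of Y.
pcs <- prcomp(t(Y[common, ]))$rotation[, 1:8]
validation <- apply(abs(cor(Z[common, ], pcs)), 1, max)
best_rav <- names(which.max(validation))
```

Here `best_rav` identifies the RAV that best matches the dominant variation in the new dataset, and the corresponding column of `B` gives a per-sample score that can be compared with sample metadata.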

Information assembled by GenomicSuperSignature

GenomicSuperSignature connects different public databases and prior information through RAVindex, creating the knowledge graph illustrated here. Users can instantly access data and metadata resources from multiple entry points, such as gene expression profiles, MeSH terms, gene sets, and keywords.

Owner

  • Name: Sehyun Oh
  • Login: shbrief
  • Kind: user
  • Location: New York, NY
  • Company: CUNY-SPH

Molecular Biologist & Bioinformatician

GitHub Events

Total
  • Issues event: 8
  • Delete event: 5
  • Issue comment event: 12
  • Push event: 2
  • Pull request review event: 2
  • Pull request review comment event: 2
  • Pull request event: 3
  • Fork event: 1
  • Create event: 8
Last Year
  • Issues event: 8
  • Delete event: 5
  • Issue comment event: 12
  • Push event: 2
  • Pull request review event: 2
  • Pull request review comment event: 2
  • Pull request event: 3
  • Fork event: 1
  • Create event: 8

Committers

Last synced: over 2 years ago

All Time
  • Total Commits: 290
  • Total Committers: 7
  • Avg Commits per committer: 41.429
  • Development Distribution Score (DDS): 0.221
Past Year
  • Commits: 5
  • Committers: 3
  • Avg Commits per committer: 1.667
  • Development Distribution Score (DDS): 0.6
Top Committers
Name Email Commits
Sehyun Oh s****f@h****m 226
Sean Davis s****i@g****m 46
lwaldron l****h@g****m 6
Nitesh Turaga n****a@g****m 6
J Wokaty j****y@s****u 2
Daniel Blankenberg d****g@g****m 2
J Wokaty j****y 2

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 14
  • Total pull requests: 17
  • Average time to close issues: 3 months
  • Average time to close pull requests: about 22 hours
  • Total issue authors: 7
  • Total pull request authors: 5
  • Average comments per issue: 1.79
  • Average comments per pull request: 0.35
  • Merged pull requests: 14
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 3
  • Pull requests: 2
  • Average time to close issues: about 5 hours
  • Average time to close pull requests: less than a minute
  • Issue authors: 2
  • Pull request authors: 2
  • Average comments per issue: 2.67
  • Average comments per pull request: 0.5
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • lwaldron (3)
  • HelloWorldLTY (3)
  • seandavi (3)
  • LiNk-NY (2)
  • bd4everyone (1)
  • shbrief (1)
  • msubirana (1)
Pull Request Authors
  • shbrief (7)
  • seandavi (6)
  • lwaldron (3)
  • bd4everyone (2)
  • blankenberg (1)
Top Labels
Issue Labels
enhancement (2) documentation (1)

Packages

  • Total packages: 1
  • Total downloads:
    • bioconductor 7,786 total
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 6
  • Total maintainers: 1
bioconductor.org: GenomicSuperSignature

Interpretation of RNA-seq experiments through robust, efficient comparison to public databases

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 7,786 Total
Rankings
Dependent repos count: 0.0%
Dependent packages count: 0.0%
Average: 27.9%
Downloads: 83.8%
Maintainers (1)
Last synced: 6 months ago

Dependencies

.github/workflows/check-and-test.yml actions
  • actions/cache v2 composite
  • actions/checkout v2 composite
  • actions/upload-artifact main composite
  • r-lib/actions/setup-pandoc v1 composite
  • r-lib/actions/setup-r v1 composite
DESCRIPTION cran
  • R >= 4.1.0 depends
  • SummarizedExperiment * depends
  • Biobase * imports
  • BiocFileCache * imports
  • ComplexHeatmap * imports
  • S4Vectors * imports
  • dplyr * imports
  • flextable * imports
  • ggplot2 * imports
  • ggpubr * imports
  • grid * imports
  • irlba * imports
  • methods * imports
  • plotly * imports
  • BiocManager * suggests
  • BiocStyle * suggests
  • EnrichmentBrowser * suggests
  • RColorBrewer * suggests
  • bcellViper * suggests
  • circlize * suggests
  • cluster * suggests
  • clusterProfiler * suggests
  • devtools * suggests
  • forcats * suggests
  • knitr * suggests
  • msigdbr * suggests
  • pkgdown * suggests
  • readr * suggests
  • reshape2 * suggests
  • rmarkdown * suggests
  • roxygen2 * suggests
  • stats * suggests
  • testthat * suggests
  • tibble * suggests
  • usethis * suggests
  • utils * suggests
  • wordcloud * suggests