https://github.com/broadinstitute/ssgsea2.0
Single sample Gene Set Enrichment analysis (ssGSEA) and PTM Enrichment Analysis (PTM-SEA)
Statistics
- Stars: 289
- Watchers: 12
- Forks: 79
- Open Issues: 9
- Releases: 0
ssGSEA2.0/PTM-SEA
Resources for gene-centric single sample Gene Set Enrichment Analysis (ssGSEA) of gene expression data (e.g. mRNAs, proteins) and site-centric PTM Signature Enrichment Analysis (PTM-SEA) [1] of phosphoproteomics data sets using the PTM signatures database (PTMsigDB) [1].
Disclaimer
The primary purpose of this repository is to supplement our manuscript in which we describe PTM-SEA and PTMsigDB. While ssGSEA2.0 presents an updated version of the original ssGSEA R implementation, we want to acknowledge that this is not the primary repository for ssGSEA. The official codebase for ssGSEA can be found here, and the official GenePattern module to perform ssGSEA can be accessed here.
ssGSEA 2.0
This is an updated version of the original ssGSEA [2,3] R-implementation. Depending on the input dataset and chosen database (gene sets or PTM signatures), the software performs either ssGSEA or PTM-SEA, respectively. The Molecular Signatures Database (MSigDB) [4] provides a large collection of curated gene sets. Gene sets are stored as plain text in GMT format. A current version of the MSigDB gene set collections can be found in the db/msigdb subfolder. MSigDB gene sets are released under the Creative Commons Attribution 4.0 International License. The license terms can be found in the db/msigdb folder.
File formats supported by ssGSEA2.0/PTM-SEA are Gene Cluster Text (GCT) v1.2 or GCT v1.3 files. Morpheus provides a convenient way to convert your data tables into GCT format.
For more information about the GSEA method and MSigDB please visit http://software.broadinstitute.org/gsea/.
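As a rough illustration of the GMT layout mentioned above, the following Python sketch reads a GMT stream into a dictionary. It assumes the usual GMT convention of one tab-separated line per set (set name, a description field, then the members). The function name parse_gmt and the two made-up sets are hypothetical and are not part of this repository, which is implemented in R.

```python
import io
from typing import Dict, List, TextIO


def parse_gmt(handle: TextIO) -> Dict[str, List[str]]:
    """Read gene sets from a GMT-formatted text stream.

    Assumed layout per line:
    name <TAB> description <TAB> member1 <TAB> member2 ...
    """
    gene_sets: Dict[str, List[str]] = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        name, _description, *members = fields
        # Drop empty fields left behind by trailing tabs.
        gene_sets[name] = [member for member in members if member]
    return gene_sets


# Hypothetical two-set example, kept in memory so the sketch is self-contained.
example = io.StringIO(
    "SET_A\tmade-up description\tTP53\tEGFR\tKRAS\n"
    "SET_B\tmade-up description\tMYC\tBRAF\n"
)
gene_sets = parse_gmt(example)
print(sorted(gene_sets))
```

To read a real collection, the same function could be called on an opened file from the db/msigdb subfolder instead of the in-memory example.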
PTMsigDB v2.0.0
Please check out our new website for PTMsigDB. We have updated PTMsigDB to version v2.0.0, in which we provide better and more consistent annotation of each PTM site. We have also included a disease category comprising signatures associated with certain diseases, curated from the table Disease-associated_sites available at PhosphoSitePlus (PSP) [5].
The PTM signatures database (PTMsigDB) is a collection of modification site-specific signatures of perturbations, kinase activities and signaling pathways curated from more than 2,500 publications, which provides the foundation to perform PTM-SEA. A unique advantage of PTMsigDB over other pathway databases is the annotation of each PTM site with its reported direction of change upon a specific perturbation or signaling event, which is incorporated into the scoring scheme of PTM-SEA. The foundation of PTMsigDB is PhosphoSitePlus (PSP) [5], a comprehensive systems biology resource for PTMs, which provides high-quality curation and annotation of PTMs at the individual residue level. A collection of PTM sites whose levels are collectively regulated in a curated pathway or upon a perturbation is defined as a signature set. Signature sets in PTMsigDB can be separated into different categories: 1) Perturbation signatures derived from treatment of cells with perturbagens such as small molecules or growth factors; 2) Signature sets of molecular signaling pathways; 3) Kinase-substrate signatures; and 4) Disease-associated signature sets.
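To make the idea of direction-aware scoring more concrete, here is a minimal Python sketch. It is not the repository's R implementation: it assumes a simplified running-sum statistic (the area under a weighted running sum over sites ranked by abundance) and assumes that up- and down-annotated sites are scored separately and then combined by subtraction. The function names, the weight parameter and the toy site names are all hypothetical.

```python
from typing import Dict, Set


def running_sum_score(values: Dict[str, float], members: Set[str],
                      weight: float = 1.0) -> float:
    """Area under a weighted running-sum curve for one sample and one site set."""
    # Rank sites from highest to lowest abundance.
    ordered = sorted(values, key=lambda site: values[site], reverse=True)
    hits = [site in members for site in ordered]
    n_miss = hits.count(False)
    n_hit = len(hits) - n_miss
    if n_hit == 0 or n_miss == 0:
        raise ValueError("set must contain some, but not all, measured sites")
    hit_weights = [abs(values[s]) ** weight if h else 0.0
                   for s, h in zip(ordered, hits)]
    total = sum(hit_weights)
    running, area = 0.0, 0.0
    for is_hit, w in zip(hits, hit_weights):
        if is_hit:
            # Fall back to uniform steps if all member values are zero.
            running += w / total if total > 0 else 1.0 / n_hit
        else:
            running -= 1.0 / n_miss
        area += running
    return area


def directional_score(values: Dict[str, float], up_sites: Set[str],
                      down_sites: Set[str], weight: float = 1.0) -> float:
    """Assumed combination rule: score of ';u' sites minus score of ';d' sites."""
    return (running_sum_score(values, up_sites, weight)
            - running_sum_score(values, down_sites, weight))


# Toy sample with made-up site names: 'up' sites are high, 'down' sites are low.
sample = {"SITE_A-p": 3.0, "SITE_B-p": 2.0, "SITE_C-p": 1.0,
          "SITE_D-p": -1.0, "SITE_E-p": -2.0, "SITE_F-p": -3.0}
up_sites = {"SITE_A-p", "SITE_B-p"}
down_sites = {"SITE_E-p", "SITE_F-p"}
score = directional_score(sample, up_sites, down_sites)
print(score > 0)
```

With this combination rule, a sample in which the up-annotated sites rank high and the down-annotated sites rank low receives a positive score, while the reverse pattern yields a negative one.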
To ensure a high degree of compatibility with phosphorylation datasets generated by different software packages and searched against different protein sequence databases, PTMsigDB represents phosphorylation sites in its signatures using three different identifiers: 1) PSP site group ID; 2) UniProt-centric ID; 3) Flanking sequence (Table 1). While the PSP site group ID provides an unambiguous representation of PTM sites within protein families and across species [5], using this type of identifier restricts the analysis to PTM sites present in PSP. We generally recommend using the flanking sequence as the site identifier, since it is less affected by updates made to protein sequence databases.
| Database format | Site accession | Example in PTMsigDB | Example in dataset | Download |
|---|---|---|---|---|
| UniProt-centric | Uniprot_acc;site-type;direction | Q06609;Y315-p;u | Q06609;Y315-p | human, mouse, rat |
| Flanking sequence | +/-7aa flanking seq-type;direction | ETRICKIYDSPCLPE-p;u | ETRICKIYDSPCLPE-p | human, mouse, rat |
| PSP site group id | site_grp_id-type;direction | 448324-p;u | 448324-p | human, mouse, rat |

Table 1: PTM site representation in PTMsigDB. The direction of change for a PTM site in a signature is indicated by ;u (up-regulation) or ;d (down-regulation). Please note that the annotation of directionality is a feature of PTMsigDB (column: Example in PTMsigDB) and must not be included when generating compatible site identifiers for a particular dataset (column: Example in dataset).
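The identifier formats in Table 1 can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, the toy protein sequence is made up so that it reproduces the flanking sequence from Table 1, and the use of "_" to pad windows near protein termini is an assumption rather than something stated in this README.

```python
from typing import Tuple


def split_direction(entry: str) -> Tuple[str, str]:
    """Split a PTMsigDB entry such as 'Q06609;Y315-p;u' into (site_id, direction)."""
    site_id, _, direction = entry.rpartition(";")
    if direction not in ("u", "d") or not site_id:
        raise ValueError(f"no ;u or ;d suffix in {entry!r}")
    return site_id, direction


def uniprot_site_id(accession: str, residue: str, position: int,
                    ptm_type: str = "p") -> str:
    """Build a UniProt-centric dataset identifier, e.g. 'Q06609;Y315-p'."""
    return f"{accession};{residue}{position}-{ptm_type}"


def flanking_site_id(sequence: str, position: int, ptm_type: str = "p",
                     flank: int = 7, pad: str = "_") -> str:
    """Build a +/-7 aa flanking-sequence identifier for a 1-based position.

    The padding character for sites near the termini is an assumption.
    """
    if not 1 <= position <= len(sequence):
        raise ValueError("position lies outside the sequence")
    index = position - 1
    left = sequence[max(0, index - flank):index]
    right = sequence[index + 1:index + 1 + flank]
    window = left.rjust(flank, pad) + sequence[index] + right.ljust(flank, pad)
    return f"{window}-{ptm_type}"


# Made-up sequence whose residue 11 sits in the Table 1 example window.
toy_sequence = "AAAETRICKIYDSPCLPEAAA"
print(split_direction("Q06609;Y315-p;u"))
print(uniprot_site_id("Q06609", "Y", 315))
print(flanking_site_id(toy_sequence, 11))
```

Note that the dataset identifiers produced by the last two helpers carry no ;u or ;d suffix, in line with the note under Table 1.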
PTM-SEA
PTM-Signature Enrichment Analysis (PTM-SEA) is a modified version of ssGSEA to perform site-specific signature analysis by scoring PTMsigDB’s bi-directional signature sets. The input to PTM-SEA is a single site-centric data matrix, m, stored in GCT v1.2 or GCT v1.3 format, and the PTM signatures database (PTMsigDB). Each row in m represents a single phosphorylation site confidently localized to a specific amino acid residue, with measured abundances across samples specified in the columns of m. Multiple phosphorylation sites detected on the same peptide have to be converted into separate site-specific entities for every site. While some proteomics software packages, such as MaxQuant [6], readily produce single site-centric PTM reports, the use of other software packages might require additional preprocessing steps.
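The preprocessing described above could look roughly like the following Python sketch, which expands multi-site peptide rows into one row per site and writes the result as a GCT v1.2 file. Several things here are assumptions for illustration only: the helper names, the made-up accessions and values, and in particular the choice to average rows when the same site is reported by more than one peptide. Missing values are not handled.

```python
import io
from typing import Dict, List, Sequence, TextIO, Tuple

PeptideRow = Tuple[Sequence[str], Sequence[float]]


def expand_multisite_rows(rows: Sequence[PeptideRow]) -> Dict[str, List[float]]:
    """Turn (site_ids, values) peptide rows into one row per site.

    Assumption: if a site occurs in several peptide rows, its values are averaged.
    """
    collected: Dict[str, List[Sequence[float]]] = {}
    for site_ids, values in rows:
        for site_id in site_ids:
            collected.setdefault(site_id, []).append(values)
    return {
        site_id: [sum(column) / len(column) for column in zip(*value_rows)]
        for site_id, value_rows in collected.items()
    }


def write_gct_v12(matrix: Dict[str, List[float]], sample_names: Sequence[str],
                  handle: TextIO) -> None:
    """Write a site-by-sample matrix in GCT v1.2 layout."""
    handle.write("#1.2\n")
    handle.write(f"{len(matrix)}\t{len(sample_names)}\n")
    handle.write("\t".join(["Name", "Description", *sample_names]) + "\n")
    for site_id, values in matrix.items():
        # The Description column simply repeats the site identifier here.
        cells = [site_id, site_id, *(f"{value:g}" for value in values)]
        handle.write("\t".join(cells) + "\n")


# Made-up peptide-level rows: the first peptide carries two phosphosites.
peptide_rows = [
    (["P00001;S10-p", "P00001;T14-p"], [1.0, -0.5]),
    (["P00001;S10-p"], [3.0, 0.5]),
    (["P00002;Y7-p"], [0.2, 0.1]),
]
site_matrix = expand_multisite_rows(peptide_rows)
buffer = io.StringIO()
write_gct_v12(site_matrix, ["sample_1", "sample_2"], buffer)
print(buffer.getvalue())
```

Other ways of resolving repeated sites (for example keeping the most intense peptide) are equally possible; the averaging above is only one option.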
How can I use these tools?
ssGSEA2.0/PTM-SEA can be run on a local PC/Mac in R or RStudio. In addition, ssGSEA2.0/PTM-SEA can be accessed on Broad’s public GenePattern [7] server. Below we provide instructions on how to run ssGSEA2.0/PTM-SEA.
Example dataset
We provide an example dataset that can be used to test PTM-SEA. The dataset is based on Supplemental Table 6 in [1].
GenePattern
GenePattern is a powerful platform to deploy and run software or entire analysis pipelines in a web browser [7]. We have implemented ssGSEA2.0/PTM-SEA as a GenePattern module, which can be accessed at the link below. Please note that access to the public GenePattern server requires a free registration.
PTM-SEA in GenePattern: https://tinyurl.com/PTM-SEA-GP
R-GUI / RStudio
The script ssgsea-gui.R requires little or no knowledge of R or of the command line. Input files and databases can be specified via Windows file dialogs that will be automatically invoked. The first dialog lets you choose a folder containing input files in GCT v1.2 or GCT v1.3 format. The script loops over all GCT files in this directory and runs ssGSEA on each file separately. The second dialog window lets you choose one or multiple gene set databases in GMT format, such as MSigDB. A current version of the MSigDB databases can be found in the db subfolder.
Windows OS
To run the script, source it into a running R session.
macOS
In order to invoke file dialogs as described above, the XQuartz X Window System is required. Once it is installed, ssgsea-gui.R can be sourced into an R session.
R Package
For use in R, Nicole Gay has created an R package that incorporates ssGSEA2.0, along with required dependencies for both R 3.6 and R >= 4.0. Instructions for using the library can be found along with the package on GitHub.
Command line
For integration of ssGSEA2.0/PTM-SEA into your own analysis pipelines, we recommend using the ssgsea-cli.R script, which has been successfully tested on Windows, Mac and Linux OS. Please see ssgsea-cli.R --help for instructions.
Misc
ssGSEA2.0/PTM-SEA parameters
Other parameters for ssGSEA/PTM-SEA can be altered inside the parameters section in ssgsea-gui.R or as arguments on the command line. The default parameters have been chosen carefully and should provide reliable results for most use-case scenarios.
Changes to the original ssGSEA R-implementation
Original code written by Pablo Tamayo. Adapted with additional modifications by D. R. Mani and Karsten Krug. Adaptations include:
- support of multiple CPU cores (doParallel R-package)
- support of GCT v1.3 format using functions from cmapR
- improved handling of missing values
- scoring of directional gene sets (PTMsigDB)
- basic error handling
- improvements in runtime performance
- additional output files like rank plots and parameter files
License
License Agreement for MSigDB v6.0 and above can be found here.
References
[1] Krug, K., Mertins, P., Zhang, B., Hornbeck, P., Raju, R., Ahmad, R., ... Szucs, M., Mundt, F., Forestier, D., Jane-Valbuena, J., Keshishian, H., Gillette, M. A., Tamayo, P., Mesirov, J. P., Jaffe, J. D., Carr, S. A., Mani, D. R. (2019). A curated resource for phosphosite-specific signature analysis. Molecular & Cellular Proteomics (in press). http://doi.org/10.1074/mcp.TIR118.000943
[2] Barbie, D. A., Tamayo, P., Boehm, J. S., Kim, S. Y., Susan, E., Dunn, I. F., ... Hahn, W. C. (2010). Systematic RNA interference reveals that oncogenic KRAS-driven cancers require TBK1. Nature, 462(7269), 108-112. https://doi.org/10.1038/nature08460
[3] Abazeed, M. E., Adams, D. J., Hurov, K. E., Tamayo, P., Creighton, C. J., Sonkin, D., et al. (2013). Integrative Radiogenomic Profiling of Squamous Cell Lung Cancer. Cancer Research, 73(20), 6289-6298. http://doi.org/10.1158/0008-5472.CAN-13-1616
[4] Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., et al. (2005). Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43), 15545-15550. http://doi.org/10.1073/pnas.0506580102
[5] Hornbeck, P. V., Zhang, B., Murray, B., Kornhauser, J. M., Latham, V., & Skrzypek, E. (2015). PhosphoSitePlus, 2014: mutations, PTMs and recalibrations. Nucleic Acids Research, 43(D1), D512-D520. https://doi.org/10.1093/nar/gku1267
[6] Cox, J., & Mann, M. (2008). MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nature Biotechnology, 26(12), 1367-1372. https://doi.org/10.1038/nbt.1511
[7] Reich, M., Liefeld, T., Gould, J., Lerner, J., Tamayo, P., & Mesirov, J. P. (2006). GenePattern 2.0. Nature Genetics, 38(5), 500-501. https://doi.org/10.1038/ng0506-500
Owner
- Name: Broad Institute
- Login: broadinstitute
- Kind: organization
- Location: Cambridge, MA
- Website: http://www.broadinstitute.org/
- Twitter: broadinstitute
- Repositories: 1,083
- Profile: https://github.com/broadinstitute
Broad Institute of MIT and Harvard
GitHub Events
Total
- Issues event: 12
- Watch event: 49
- Delete event: 1
- Issue comment event: 9
- Push event: 1
- Fork event: 1
Last Year
- Issues event: 12
- Watch event: 49
- Delete event: 1
- Issue comment event: 9
- Push event: 1
- Fork event: 1
Committers
Last synced: 10 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| karstenkrug | k****n@b****g | 64 |
| wcorinne | w****e@b****g | 20 |
| Munchic | k****m@g****m | 6 |
| Natalie Clark | n****k@b****g | 4 |
| D. R. Mani | m****r@b****g | 2 |
| Anna Calinawan | a****a@g****m | 1 |
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 37
- Total pull requests: 3
- Average time to close issues: 5 months
- Average time to close pull requests: 10 months
- Total issue authors: 34
- Total pull request authors: 3
- Average comments per issue: 1.38
- Average comments per pull request: 0.33
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 9
- Pull requests: 1
- Average time to close issues: 3 months
- Average time to close pull requests: less than a minute
- Issue authors: 9
- Pull request authors: 1
- Average comments per issue: 0.44
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- shwetajohari (2)
- Lilab-SYSU (2)
- erikfel97 (2)
- newv-cell (1)
- nicolerg (1)
- realdwang (1)
- wanghao1991217 (1)
- snijesh (1)
- ruzy99 (1)
- julia-aguade (1)
- jasiozaucha (1)
- rela-v (1)
- aroon-sg-zz (1)
- qianxu05172019 (1)
- Fatberg (1)
Pull Request Authors
- nmclark2 (2)
- kant (1)
- annapamma (1)