degronopedia-ml-psi

Predict Protein Stability Index from the sequence

https://github.com/filipspl/degronopedia-ml-psi

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.8%) to scientific vocabulary

Keywords

bioinformatics bioinformatics-analysis catboost degrons ml prediction
Last synced: 6 months ago

Repository

Predict Protein Stability Index from the sequence

Basic Info
  • Host: GitHub
  • Owner: filipsPL
  • License: mit
  • Language: Python
  • Default Branch: main
  • Homepage: https://degronopedia.com/
  • Size: 7.21 MB
Statistics
  • Stars: 5
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Topics
bioinformatics bioinformatics-analysis catboost degrons ml prediction
Created almost 3 years ago · Last pushed about 1 year ago
Metadata Files
Readme License Citation Zenodo

README.md

Predict Protein Stability Index (PSI) from the sequence

Program to predict Protein Stability Index (PSI) from the sequence. The ML models were trained with the CatBoost regressor on experimental stability datasets of 24/23-mer peptides covering the N-/C-termini of the human proteome. The final models were evaluated on the test set using the coefficient of determination (R²). R² reached 0.796/0.812 for the N-terminus with the initiator methionine cleaved/not cleaved, respectively, and 0.815 for the C-terminus (the maximum possible R² is 1). See the paper for details and the DEGRONOPEDIA Tutorial for more information.
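
R² is 1 minus the ratio of the residual sum of squares to the total sum of squares. A perfect model therefore scores 1. A model that always predicts the mean of the observed values scores 0. Worse models can go negative. Below is a minimal, dependency-free sketch of that formula. It is for illustration only and is not the evaluation code used in the paper. The numbers are made up and are not PSI data.

```python
from typing import Sequence


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: squared errors of the predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: squared deviations of the observations from their mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Illustrative numbers only (not PSI values).
observed = [1.0, 2.0, 3.0, 4.0]
print(r_squared(observed, observed))              # 1.0 (perfect predictions)
print(r_squared(observed, [2.5, 2.5, 2.5, 2.5]))  # 0.0 (always predicting the mean)
```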

The web version of this tool (and much more!) is available at https://degronopedia.com/. It is a web server that screens proteins for degron motifs and provides insight into the possible degradation of your favorite proteins by the ubiquitin-proteasome system.

Tested with Python 3.8, 3.9, 3.10, and 3.11.


Set up

```sh
# clone the repo
git clone --depth 1 https://github.com/filipsPL/degronopedia-ml-psi

# enter the repository directory (conda.yml lives there)
cd degronopedia-ml-psi

# create a new conda environment
conda env create -f conda.yml
```

Usage

Input is a protein sequence (one-letter amino-acid codes) in plain-text format, e.g.:

```text
MEEPQSDPSVEPPLSQETFSDLWKLLPENNVLSPLPSQAMDDLMLSPDDIEQWFTEDPGPDEAPRMPEAAPPVAPAPAAPTPAAPAPAPSWPLSSSVPSQKTYQGSYGFRLGFLHSGTAKSVTCTYSPALNKMFCQLAKTCPVQLWVDSTPPPGTRVRAMAIYKQSQHMTEVVRRCPHHERCSDSDGLAPPQHLIRVEGNLRVEYLDDRNTFRHSVVVPYEPPEVGSDCTTIHYNYMCNSSCMGGMNRRPILTIITLEDSSGNLLGRNSFEVRVCACPGRDRRT
```
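
If your sequence comes from a FASTA file or is split across lines, you can normalize it into the single-line form shown above before running the program. This is a minimal sketch, and it rests on two assumptions. First, the program expects one line of uppercase one-letter codes with no FASTA header, as in the example above. Second, only the 20 standard amino acids are accepted, which is a conservative guess. `clean_sequence` and `write_sequence_file` are hypothetical helpers, not part of the repository.

```python
from pathlib import Path

# Conservative assumption: only the 20 standard one-letter amino-acid codes.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Turn pasted text (optionally with a FASTA header) into one uppercase line."""
    lines = raw.strip().splitlines()
    if lines and lines[0].startswith(">"):
        lines = lines[1:]  # drop a FASTA header; the README example has none
    # Remove all whitespace inside and between lines, then uppercase.
    seq = "".join("".join(line.split()) for line in lines).upper()
    if not seq:
        raise ValueError("no residues found")
    bad = sorted(set(seq) - STANDARD_AA)
    if bad:
        raise ValueError(f"unexpected characters: {''.join(bad)}")
    return seq


def write_sequence_file(raw: str, path: str = "sequence.txt") -> Path:
    """Write the cleaned sequence as a single line of plain text."""
    out = Path(path)
    out.write_text(clean_sequence(raw) + "\n")
    return out


print(clean_sequence(">example\nMEEPQSDPSV EPPLSQETFS\n"))  # MEEPQSDPSVEPPLSQETFS
```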

Options:

  • --sequence: the file containing the sequence
  • --type: the terminus to predict PSI for. Choices:
    • C for the C-terminus
    • NiMetNo for the N-terminus with the initiator Met cleaved
    • NiMetYes for the N-terminus with the initiator Met NOT cleaved

Running the program:

```sh
# activate the environment
conda activate dp

# run the program
./calculate-desc.py --sequence sequence.txt --type NiMetYes
```

Output will be:

```text
N-terminus with initiator Met NOT cleaved
Predicted PSI: 5.24
```
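
To get predictions for all three termini in one go, the documented command can be wrapped from Python. This minimal sketch makes three assumptions:

  • the script prints its result to standard output in the format shown above
  • it is called from the repository root
  • the `dp` environment is active

`parse_psi` and `predict_all` are hypothetical helper names, not part of the repository.

```python
import re
import subprocess
from typing import Dict

# Terminus types documented in the README.
TYPES = ("C", "NiMetNo", "NiMetYes")
# Matches e.g. "Predicted PSI: 5.24" (assumed output format, as shown above).
PSI_PATTERN = re.compile(r"Predicted PSI:\s*(-?\d+(?:\.\d+)?)")


def parse_psi(output: str) -> float:
    """Extract the number following 'Predicted PSI:' from the program's output."""
    match = PSI_PATTERN.search(output)
    if match is None:
        raise ValueError("no 'Predicted PSI:' line found in output")
    return float(match.group(1))


def predict_all(sequence_file: str = "sequence.txt",
                script: str = "./calculate-desc.py") -> Dict[str, float]:
    """Run the documented CLI once per terminus type and collect the PSI values."""
    results = {}
    for kind in TYPES:
        proc = subprocess.run(
            [script, "--sequence", sequence_file, "--type", kind],
            capture_output=True, text=True, check=True,
        )
        results[kind] = parse_psi(proc.stdout)
    return results


print(parse_psi("N-terminus with initiator Met NOT cleaved\nPredicted PSI: 5.24"))  # 5.24
```

With the environment active, `predict_all()` returns a dictionary keyed by `C`, `NiMetNo`, and `NiMetYes`. Because `check=True` is set, a failing run raises `subprocess.CalledProcessError` instead of being silently skipped.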

Interpretation

Predictions are based on datasets of experimental PSI values, which describe the stability of protein N-/C-termini in an artificial system. In that system, 23-mers covering the termini of nearly the entire human proteome were conjugated to GFP. Their stability was measured relative to that of DsRed, which is translated from the same transcript, using the Global Protein Stability (GPS) high-throughput technique (Koren et al., 2018; Timms et al., 2019). These values therefore provide only limited insight into the stability of the query's N-/C-terminus. The authors of these GPS studies experimentally validated several low-PSI peptides as substrates of cullin-RING E3 ligase complexes. However, medium or high PSI values do not rule out regulation of such termini by N-/C-degron pathways. Other factors may influence this, including tissue specificity, post-translational modifications, and stress conditions.

As the ML training set consists of human peptides, we recommend running the PSI prediction only for sequences from higher mammals.

Tests

To run tests on sample sequences, execute tests.sh.

Authors

Szulc, N. A., Stefaniak, F.

How to cite

Szulc, N. A., Stefaniak, F., Piechota, M., Soszyńska, A., Piórkowska, G., Cappannini, A., Bujnicki, J. M., Maniaci, C., & Pokrzywa, W. (2024). DEGRONOPEDIA: A web server for proteome-wide inspection of degrons. Nucleic Acids Research, 52(W1), W221–W232. https://doi.org/10.1093/nar/gkae238

Owner

  • Name: filips
  • Login: filipsPL
  • Kind: user
  • Location: Warsaw, Poland
  • Company: @thervira @genesilico

  • computer-aided drug design + medicinal chemistry
  • Python programming, web development
  • ML
  • QSAR, tox prediction
  • 🏊 🚴 🏃

Citation (CITATION.cff)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1.2.0
title: >-
  Program to predict Protein Stability Index (PSI) from the sequence.
message: Program to predict Protein Stability Index (PSI) from the sequence.
type: software
authors:
  - given-names: Natalia A.
    family-names: Szulc
    email: nszulc@iimcb.gov.pl
    affiliation: >-
      International Institute of Molecular and Cell
      Biology in Warsaw, Warsaw, Poland
    orcid: "https://orcid.org/0000-0002-2991-3634"
  - orcid: "https://orcid.org/0000-0001-5758-9416"
    given-names: Filip
    family-names: Stefaniak
    affiliation: >-
      International Institute of Molecular and Cell
      Biology in Warsaw, Warsaw, Poland
    email: fstefaniak@iimcb.gov.pl
identifiers:
  - type: doi
    value: 10.5281/zenodo.7498782
    description: Zenodo repository
  - type: url
    value: >-
      https://github.com/filipsPL/degronopedia-ml-psi
    description: GitHub repository
repository-code: "https://github.com/filipsPL/degronopedia-ml-psi"
url: "https://github.com/filipsPL/degronopedia-ml-psi"
abstract: >-
  Program to predict Protein Stability Index (PSI) from the sequence. ML models were developed based on experimental stability datasets for a 24/23-mer covering the N-/C-terminus of the human proteome using the CatBoost regressor.
keywords:
  - bioinformatics
  - PSI
  - degrons
  - machine learning
  - catboost
license: Apache-2.0

GitHub Events

Total
  • Push event: 2
Last Year
  • Push event: 2

Issues and Pull Requests

Last synced: about 1 year ago

All Time
  • Total issues: 0
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 0
  • Total pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0

Dependencies

.github/workflows/thefirst.yml actions
  • actions/checkout v2 composite
  • conda-incubator/setup-miniconda v2.0.1 composite