monster_promoca

This repository contains the folders of the tools MOnSTER and PRO-MOCA

https://github.com/plant-net/monster_promoca

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary
Last synced: 6 months ago · JSON representation ·

Repository

This repository contains the folders of the tools MOnSTER and PRO-MOCA

Basic Info
  • Host: GitHub
  • Owner: Plant-Net
  • License: gpl-3.0
  • Language: Jupyter Notebook
  • Default Branch: main
  • Size: 19.2 MB
Statistics
  • Stars: 2
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created about 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License Citation

README.md

MOnSTER_PROMOCA

This repository contains the folders of the tools MOnSTER and PRO-MOCA

Install dependencies:

Move to the directory ./MOnSTER and run pip install -e in the terminal. \

Software tested with: - Python (v3.8) - pandas (v1.2.4) - numpy (v1.21.2) - UpSetPlot (v0.8.0) - matplotlib (v3.7.1) - scipy (v1.10.1) - radialtree (v0.1.0) - re (v2.2.1) - Bio (Biopython) (v1.70) - sklearn (v1.1.1)

MOnSTER required files:

  • File with enriched motifs in the positive class (one per line)
  • File with protein sequences in the positive class (FASTA format)
  • File with protein sequences in the negative class (FASTA format)

Demo-MOnSTER

In the folder MOnSTER/demo/, the 0_DEMO_MOnSTER.ipynb demo is available step by step, for general purposes.

MOnSTER/demo/1_OOmy_DEMO_MOnSTER.ipynb and MOnSTER/demo/2_PPN_DEMO_MOnSTER.ipynb contains the script used to analyze the data in the manuscript.

The analyzed data are stored in MOnSTER/data/ for the list of motifs and in MOnSTER/data/datasets/ for the sequences of effectors and non effectors.

PROMOCA required files:

  • MOnSTER output MOnSTER_results/clustering_and_CLUMPs/df_motifs_CLUMPs.tsv containing two columns motif|belonging CLUMP ### Demo-PROMOCA In the folder PRO_MOCA/notebooks, the 0_DEMO_PRO_MOCA.ipynb demo is available step by step.

About MOnSTER

MOnSTER is a pipeline to find CLUMPs (CLUster of Motifs of Proteins) that are discriminative between two datasets of sequences that will be called positive and negative.

The CLUMPs are found by gathering together motifs that show similar physicochemical properties. These physicochemical properties can show different ranges of values, for this reason, we propose known 3 methods of scaling the data for this step.

After clustering, the user will visualize the resulting dendogram. To find the best compromise between number of CLUMPs and having sufficiently different CLUMPs, MOnSTER employes the Davies Bouldin score. It is then up to the user to choose number of intervals of distances to find the best cut of the tree to obtain the most reasonable number of CLUMPs. (see doc for more information)

To determine which are the most discriminative CLUMPs, MOnSTER calculates the MOnSTER score for each of them. The score is between 0 and 2 and takes into account: the physicochemical properties of the CLUMPs and the two datasets, and the occurrences of the motifs and the CLUMPs in the two datasets. The physicochemical properties that are employed for the MOnSTER_score are those that show a statistically significant difference in the two datasets.

MOnSTER scores the CLUMPs the highest if they show values of these physicochemical properties that are in line with the positive dataset and if they are more present in the positive dataset compared to the negative one.

About PRO-MOCA

PRO-MOCA (PROtein-MOtif Characteristics Aligner) is an aligner of motif sequences. PRO-MOCA aligns short sequences based on the characteristics of the AAs rather than the AAs themselves.

The algorithm doesn't employ a substitution matrix. Instead, PRO-MOCA employes a classification scheme of the AAs to convert the motifs into patterns of letters. Then looks for a central position of the patterns, where a certain letter is repeated more than in the others. Based on the central position of the alignment, it then calculates the number of gaps ('_') to add to the right and to the left of all the patterns, after centering them. At the end it reconverts the patterns into motifs, and gives a list of motifs with gaps to the left and to the right.

No gaps are admitted into the sequences, only at the extremities. This is because motifs are very short sequences hence introducing gaps could mean losing the information of the motif itself.

Owner

  • Name: Plant-Net
  • Login: Plant-Net
  • Kind: organization

Citation (CITATION.cff)

cff-version: 1.0.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Calia"
  given-names: "Giulia"
- family-names: "Porracciolo"
  given-names: "Paola"
- family-names: "Kozlowski"
  given-names: "Djampa"
- family-names: "Danchin"
  given-names: "Etienne G. J." 
- family-names: "Bottini"
  given-names: "Silvia"
title: "MOnSTER_PROMOCA"
version: 1.0.0
doi: https://doi.org/10.5281/zenodo.11368310
date-released: 2024-05-28
url: "https://github.com/Plant-Net/MOnSTER_PROMOCA"

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1