ymap

An automated method to map yeast variants to protein modifications and functional regions

https://github.com/csb-kul/ymap

Science Score: 13.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.5%) to scientific vocabulary

Keywords

mutated-proteins protein-domains protein-modification protein-protein-interaction protein-structure python systems-biology yeast
Last synced: 6 months ago

Repository


Basic Info
Statistics
  • Stars: 6
  • Watchers: 2
  • Forks: 6
  • Open Issues: 1
  • Releases: 0
Created almost 10 years ago · Last pushed over 8 years ago
Metadata Files
Readme License

README.md

yMap - Yeast Genotype to Phenotype Map (release year 2016)

yMap is a fast, robust, automated Python-based method for mapping large sets of yeast variant data to

        - protein post-translational modifications (PTMs),

        - protein domains,

        - protein-nucleotide binding domains,

        - protein structural regions,

        - protein active and binding sites,

        - protein network visualisation.

yMap collects post-translational modifications from repositories such as UniProt and from sources of annotated PTMs such as PTMcode 2.0 and PTMfunc; see below for details.

In three user-friendly steps, it generates a "final-report" file listing all the non-synonymous mutations that overlap or fall inside the protein functional regions listed above. The final report is complemented by two other files: an enrichment file and a visualisation ID file.
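The core check behind this report is whether a mutated residue coincides with a PTM site or falls inside a region's start-end span. The following is a minimal sketch of that idea, not yMap's own implementation: the tuple layouts and the helper names `overlaps_region` and `map_mutations` are hypothetical, and positions are assumed to be 1-based and inclusive.

```python
from typing import Dict, List, Tuple

# Hypothetical record layouts (not yMap's actual file formats):
#   mutation: (protein_id, residue_position)
#   region:   (protein_id, start, end, feature_name); a single PTM site has start == end
Mutation = Tuple[str, int]
Region = Tuple[str, int, int, str]


def overlaps_region(position: int, start: int, end: int) -> bool:
    """True if a 1-based residue position lies inside [start, end] (inclusive)."""
    return start <= position <= end


def map_mutations(mutations: List[Mutation], regions: List[Region]) -> List[Tuple[str, int, str]]:
    """Return (protein_id, position, feature_name) for every mutation that hits a region."""
    # Index regions by protein so each mutation is only compared with its own protein.
    by_protein: Dict[str, List[Region]] = {}
    for region in regions:
        by_protein.setdefault(region[0], []).append(region)

    hits = []
    for protein_id, position in mutations:
        for _, start, end, name in by_protein.get(protein_id, []):
            if overlaps_region(position, start, end):
                hits.append((protein_id, position, name))
    return hits


if __name__ == "__main__":
    # Made-up proteins and coordinates, for illustration only.
    regions = [("P1", 10, 50, "kinase_domain"), ("P1", 120, 120, "phosphorylation")]
    print(map_mutations([("P1", 25), ("P1", 120), ("P1", 80), ("P2", 25)], regions))
```

Indexing regions by protein keeps the comparison per protein; for very large region sets an interval tree would be the usual next step.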

Dependencies

yMap depends on:

    Python 2.6.x or Python 3.x

    Orange bioinformatics (http://pythonhosted.org/Orange-Bioinformatics/#installation)


PyPi

https://pypi.python.org/pypi/ymap

Installation

        pip install ymap

Usage

        step1: $ ydata          # download all the data needed for proper execution of ymap

        step2: copy the mutation file into the current directory

        step3: $ yproteins      # if the starting file contains mutations at protein level
                                    (SEE example_mutation_file/mutated_proteins.txt).

        step3: $ ygenes         # if the starting file contains mutations at chromosome level, with genetic coordinates
                                    (SEE example_mutation_file/mutation.txt).

        step4: $ yweb           # generates an html-based visualisation of the mutated proteins in the BioGrid db.
                (NOTE: the user will be asked to specify 'path/to/biog.txt' as input)

*To run from source code:

    Change to the directory containing ymap.py

    $ python ymap.py -d ydata (step1)

    $ python ymap.py -p yproteins (step3)

    $ python ymap.py -g ygenes (step3)

    $ python ymap.py -w yweb (step4)

Contents:

        - Introduction to the different types of data (generated/provided in yMap)

        - Introduction to all the methods

        - Results (introduction to the results data)

        - Troubleshooting

Introduction to data:

—————input———

A - Mutation file (tab-separated text, "mutated_proteins.txt"): contains protein common names and mutated residue positions. For ymap to run properly, follow exactly the naming convention of the input files used in the example data.
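Reading such a file amounts to splitting each line on tabs. A minimal sketch, assuming (check this against the example data) that column 1 is the protein common name, column 2 the residue position, and that there is no header line; the helper `read_mutations` is hypothetical and not part of ymap.

```python
import csv
import io
from typing import Iterable, List, Tuple


def read_mutations(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Parse tab-separated mutation lines into (protein_name, position) pairs.

    Assumed layout (hypothetical): column 1 = protein common name,
    column 2 = mutated residue position; any extra columns are ignored.
    """
    mutations = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 2 or not row[0].strip() or row[0].startswith("#"):
            continue  # skip blank, short or comment lines
        name, position = row[0].strip(), row[1].strip()
        if not position.isdigit():
            raise ValueError(f"non-numeric position {position!r} for {name}")
        mutations.append((name, int(position)))
    return mutations


if __name__ == "__main__":
    # Made-up positions, for illustration only.
    example = io.StringIO("BRR2\t120\nCDC28\t19\textra\n\n")
    print(read_mutations(example))
    # With a real file: with open("mutated_proteins.txt", newline="") as fh: read_mutations(fh)
```

Raising on a non-numeric position surfaces malformed input early instead of failing later during mapping.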

———output———(Pre-analysis data needed for ymap execution)

(i) Raw files downloaded from UniProt and stored in the current directory by executing step1 ($ ydata).

1 - uniprotmodraw.txt # UniProt data in raw format

2 - yeastID.txt # file of yeast protein IDs

3 - PTMs.txt # yeast proteins, PTM positions and PTM types

4 - PTMidfile.txt # combined file of 2 and 3.

5 - domains.txt # yeast proteins, domain start, end and names

6 - id_domain.txt # combined file of 2 and 5.

7 - bact.txt # protein IDs with active- and binding-site positions

8 - sites_id.txt # combined file of 2 and 7.

9 - uniprot_bioGrid.txt # all the yeast proteins with their BioGrid IDs
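Several of these files (4, 6 and 8) combine the ID file with a feature file. Conceptually this is a join on a shared identifier. The sketch below illustrates that under assumed layouts: the column positions, the use of a UniProt-style accession as the key and the helper `join_on_id` are hypothetical, not ymap's actual code.

```python
from typing import Dict, List, Sequence


def join_on_id(id_rows: Sequence[Sequence[str]], feature_rows: Sequence[Sequence[str]],
               id_col: int = 0, key_col: int = 0) -> List[List[str]]:
    """Attach the ID columns of id_rows to every feature row sharing the same key.

    Hypothetical layout: id_rows[i][id_col] and feature_rows[j][key_col] hold the
    same identifier (e.g. a UniProt accession). Feature rows without a match are dropped.
    """
    ids: Dict[str, Sequence[str]] = {row[id_col]: row for row in id_rows}
    combined = []
    for row in feature_rows:
        match = ids.get(row[key_col])
        if match is not None:
            # ID columns first, then the remaining feature columns (key not repeated).
            combined.append(list(match) + [c for i, c in enumerate(row) if i != key_col])
    return combined


if __name__ == "__main__":
    # Placeholder identifiers, not real accessions.
    id_rows = [["ACC_A", "GENE_A", "ORF_A"]]
    ptm_rows = [["ACC_A", "15", "Phosphoserine"], ["ACC_B", "3", "N6-acetyllysine"]]
    print(join_on_id(id_rows, ptm_rows))
```

A dictionary lookup keeps the join linear in the size of both files, which matters for genome-wide PTM tables.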

(i-B) Pre-downloaded files from PTMcode and PTMfunc

(PTMfunc)

3DIDaceksitesinterfaceRes_sc.txt

3DIDphosphositesinterfaceRes_sc.txt

3DIDubisitesinterfaceRessc_sc.txt

SCpsitesinteractions_sc.txt

SCubiinteractions_sc.txt

SCacetinteractions.txt

(PTMcode)

schotspot.txt

scbtwproteins.txt

scwithinproteins.txt

(ii) Processed data from UniProt and other resources, generated by executing step1.

A number of files generated from the original UniProt file for further analyses:

PTMs.txt
contains post-translational modifications

PTMidfile.txt
PTMs.txt with all the protein IDs

PDB.txt
contains PDB structural data from UniProt

nucleotide.txt
contains UniProt protein IDs with DNA-protein binding motifs

bact.txt
contains protein active- and binding-site positions

didmap.txt
contains protein domains with all the IDs

id_domain.txt
gff data from frmt.txt with all the IDs

domains.txt
domain data from UniProt

frmt.txt
gff file formatted for further processing

sites_id.txt
active/binding sites with all the IDs

uniprot_bioGrid.txt
contains the BioGrid IDs of all yeast proteins

id_nucleotide.txt
contains the data from nucleotide.txt with all the protein IDs, for processing

Results

(Inside the ymap-results folder, each subfolder contains three files: the mutation analysis file, which lists the mutated proteins, mutation positions, mutated functional regions and data sources; pvalue.txt, with the pathway enrichments; and biog.txt, with the BioGrid IDs of the mutated proteins.)

/PTMs/mutated_proteins.txt
    contains IDs of proteins mutated at PTM sites

/Domains/domains_mapped.txt
    contains IDs of proteins mutated in protein domains

/A-B_binding/ab_mutation_file.txt
    contains IDs of proteins mutated at active and binding sites


PPI - PTMfunc data

    PPI/acetylation
    residue carrying this PTM type is important for protein-protein interaction (PPI)

    PPI/Phosphorylation
    residue carrying this PTM type is important for protein-protein interaction (PPI)

    PPI/ubiquitination
    residue carrying this PTM type is important for protein-protein interaction (PPI)

Interface

    Interface/ubiquitination
    residue carrying this PTM type lies at a protein interface

    Interface/acetylation
    residue carrying this PTM type lies at a protein interface

    Interface/Phosphorylation
    residue carrying this PTM type lies at a protein interface

PTMs_hotSpot
    PTMs concentrated in a small motif, known as a hotspot (Beltrao et al. Cell 2012).

PTMs_between_proteins - PTMcode2.0 data
    PTMs present between two proteins and involved in crosstalk.

PTMs_within_proteins
    PTMs present within a protein and involved in crosstalk.

biog.txt
    contains the BioGrid IDs of the mutated proteins, used by the -w (web) function (present in each subfolder).

p-value.txt
    contains pathway enrichments for each type of mutation observed (present in each subfolder).

final_report.txt
    A refined version of summary.txt. It contains the protein UniProt ID, common name, amino acid mutation position, wild-type amino acid, mutated amino acid, type of mutation (non-synonymous/stop codon), mutation feature type (e.g. PTM type or domain name), mutation feature (e.g. PTM, domain or other) and source of data (e.g. UniProt).
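The README defers the details of the p-value calculation to the paper, and yMap lists Orange Bioinformatics as a dependency. For readers who want to reproduce an over-representation p-value independently, a one-sided hypergeometric test is a common choice; the sketch below shows that test as an assumption, not as yMap's confirmed method. Here N is the number of background proteins, K the number annotated to a pathway, n the number of mutated proteins and k the number of mutated proteins in the pathway. It requires Python 3.8+ for `math.comb`.

```python
from math import comb


def hypergeometric_pvalue(k: int, n: int, K: int, N: int) -> float:
    """One-sided enrichment p-value P(X >= k) for X ~ Hypergeometric(N, K, n).

    Sums the probability of drawing at least k pathway proteins when n proteins
    are drawn without replacement from N, of which K belong to the pathway.
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("require 0 <= k <= n <= N and 0 <= K <= N")
    total = comb(N, n)
    upper = min(n, K)  # at most min(n, K) drawn proteins can be in the pathway
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total  # empty sum (k > upper) correctly gives 0.0


if __name__ == "__main__":
    # Toy numbers: 10 proteins, 5 in the pathway, 2 mutated, both in the pathway.
    print(hypergeometric_pvalue(2, 2, 5, 10))
```

Exact integer binomials avoid floating-point underflow in the tail sum; with many pathways tested, a multiple-testing correction would normally follow.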

Introduction to all the methods

    (How individual methods work in ymap)

NOTE: rename the file containing the mutations to 'mutated_proteins.txt' (see example data) and copy it to path/to/ymap

Function name    Description

mutationtypesfile() calculates the mutation type and amino acid change (where the reference and mutant bases are known)

pTMdata()
Downloads UniProt data as a raw txt file (uniprotmodraw.txt)

clean()
cleans 'uniprotmodraw.txt' into the tab-separated 'PTMs.txt'

iD()
retrieves the different ID types used for mapping (yeastID.txt)

pmap()
if the protein IDs are not SGD IDs, UniProt IDs or common names, this method maps the IDs

ptm_map()
maps the overlap between the mutated codons from the previous method and the PTM sites

dclean()
filters the domain data from the UniProt file (id_domain.txt) before dmap() maps mutations to the yeast domains

dmap()
maps mutations to protein domains (domains_mapped.txt)

enrich()
performs an enrichment analysis of the mutated proteins and returns the p-value of functional enrichment of mutated proteins at different functional regions/residues; see the main text of the paper (Arslan and van Noort 2016) for how the p-value is calculated.

ab()
prepares raw UniProt data (uniprotmodraw.txt) for the yeast active- and binding-site mutation analysis (bact.txt)

id()
maps protein IDs to the proteins containing active and binding sites (sites_id.txt)

mmap()
maps mutations to protein active and binding sites (abmutationfile.txt)

nucleotide()
prepares the UniProt data for mapping nucleotide-binding motifs to mutations

n_map()
maps different protein IDs to the nucleotide data

nucleotide_map()
maps mutations to the nucleotide-binding motifs

bioGrid()
Downloads the BioGrid IDs of yeast proteins from UniProt for further processing, including mapping and web browsing. WARNING: requires a powerful machine, as the file is expensive to open on machines with little memory.

preWeb() maps mutations to BioGrid IDs (biog.txt)

bweb() opens the BioGrid db in the browser, with one tab per mutated protein

pdb_c() filters structural data from UniProt

mu_map() maps mutated proteins to the yeastID file

pdb() maps mutations to protein structural regions

interface() PTMs present at the interface of two proteins and known to play a role in the interaction (Beltrao et al. Cell 2012)

ppi() PTMs present at the interface of two proteins and known to play a role in the interaction (Beltrao et al. Cell 2012)

withinPro() PTMs (predicted) involved in crosstalk within a given protein in baker's yeast (Minguez et al. 2012)

betweenPro() PTMs (predicted) involved in crosstalk between different proteins in baker's yeast (PTMcode 2.0; Minguez et al. 2012)

hotspot() motifs containing PTMs in close proximity, known as hotspots (Beltrao et al. Cell 2012)
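The calculation attributed to mutationtypesfile() (amino acid change and mutation type from known reference and mutant bases) can be illustrated with a codon lookup. This sketch is not ymap's code: the names `classify_mutation` and `codon_index` are hypothetical, it assumes the affected codon and the 1-based position within the coding sequence are already known (mapping chromosome coordinates to a coding sequence needs gene annotations and is not shown), and it uses the standard genetic code, which S. cerevisiae uses for nuclear genes.

```python
from itertools import product
from typing import Tuple

BASES = "TCAG"
# Standard genetic code (NCBI table 1); amino acids listed in TCAG x TCAG x TCAG codon order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codon_index(cds_position: int) -> Tuple[int, int]:
    """Map a 1-based coding-sequence position to (residue number, 0-based offset in codon)."""
    if cds_position < 1:
        raise ValueError("cds_position is 1-based")
    return (cds_position - 1) // 3 + 1, (cds_position - 1) % 3


def classify_mutation(codon: str, offset: int, alt_base: str) -> Tuple[str, str, str]:
    """Return (ref_aa, alt_aa, mutation_type) for a single-base substitution.

    codon is read 5'->3' on the coding strand; offset is 0, 1 or 2.
    Stop-loss changes (ref '*' to an amino acid) are reported as non-synonymous here.
    """
    codon, alt_base = codon.upper(), alt_base.upper()
    if len(codon) != 3 or offset not in (0, 1, 2):
        raise ValueError("need a 3-base codon and an offset in 0..2")
    mutant = codon[:offset] + alt_base + codon[offset + 1:]
    ref_aa, alt_aa = CODON_TABLE[codon], CODON_TABLE[mutant]
    if ref_aa == alt_aa:
        kind = "synonymous"
    elif alt_aa == "*":
        kind = "stop codon"
    else:
        kind = "non-synonymous"
    return ref_aa, alt_aa, kind


if __name__ == "__main__":
    print(classify_mutation("TGG", 2, "A"))  # Trp codon mutated to TGA
    print(codon_index(6))
```

Only non-synonymous and stop-codon results would be carried forward to the functional-region mapping, matching the mutation types listed for final_report.txt.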

Troubleshooting

1 - The annotated PTM files are missing, or there are fewer than nine of them.

Reason: unzipping data/PTMcode+PTMfuncdata/scbtwproteins.txt.zip failed during the $ ydata command. How to correct: manually unzip the scbtw_proteins.txt.zip file and re-run $ ydata (normally this is not needed).
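If no unzip tool is at hand, Python's standard `zipfile` module can do the manual extraction. A minimal sketch, assuming the archives sit in the folder named in the note above (adjust the path to your checkout); the helper `unzip_all` is hypothetical and extracts every .zip archive it finds there, so only run it on the trusted ymap data folder.

```python
import zipfile
from pathlib import Path
from typing import List


def unzip_all(folder: str) -> List[str]:
    """Extract every .zip archive in folder into that same folder; return the member names."""
    extracted: List[str] = []
    for archive in sorted(Path(folder).glob("*.zip")):
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(archive.parent)  # members land next to the archive
            extracted.extend(zf.namelist())
    return extracted


if __name__ == "__main__":
    # Folder name taken from the troubleshooting note; a missing folder simply yields [].
    print(unzip_all("data/PTMcode+PTMfuncdata"))
```

After extraction, re-run $ ydata as described above.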

2 - $ ygenes gives an error message:

“IndexError: string index out of range”

2(b) - The same cause (described below) also makes the mapping of mutations to functional regions such as domains fail, with the message:

"Error: input file contains error position forBRR2protein"

Reason: the mutation positions fall outside the start and end of the respective proteins. (NOTE: to analyse the proteins in the starting file that have correct mutation positions, the user can run the individual methods uniprotdata() and functionaldata() to complete all the analyses, then execute the command-line step3.)

How to correct: check manually whether each mutation position lies between the start and end positions of its protein; if not, correct the problem and re-run the $ ygenes command.
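This manual check can be automated before running $ ygenes. A minimal sketch, assuming you already have a table of protein lengths (for example, sequence lengths taken from UniProt); the helper `find_out_of_range` and the length table are hypothetical and not part of ymap.

```python
from typing import Dict, List, Tuple


def find_out_of_range(mutations: List[Tuple[str, int]], lengths: Dict[str, int]) -> List[str]:
    """List human-readable problems for mutations outside 1..protein length."""
    problems = []
    for name, position in mutations:
        length = lengths.get(name)
        if length is None:
            problems.append(f"{name}: protein not found in length table")
        elif not 1 <= position <= length:
            problems.append(f"{name}: position {position} outside 1..{length}")
    return problems


if __name__ == "__main__":
    # Made-up protein and length, for illustration only.
    for problem in find_out_of_range([("P1", 100), ("P1", 101), ("P2", 5)], {"P1": 100}):
        print(problem)
```

Fixing or removing the reported lines before re-running avoids both error messages shown above.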

3 - yweb fails to locate the directory.

How to correct: in Python 2.x, type the path in quotation marks, "path/to/biog.txt"; in Python 3.x, type it without quotation marks: path/to/biog.txt
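The difference comes from Python itself: in Python 2, `input()` evaluates the typed text as a Python expression, so a path must be quoted to form a string literal, whereas Python 3's `input()` returns the typed text unchanged. A script reading a path as raw text can accept both habits by stripping one pair of surrounding quotes. The sketch below is a hypothetical helper, not ymap's code.

```python
import os


def normalise_path(answer: str) -> str:
    """Strip whitespace and one pair of matching surrounding quotes from a typed path."""
    answer = answer.strip()
    if len(answer) >= 2 and answer[0] == answer[-1] and answer[0] in "\"'":
        answer = answer[1:-1]
    return os.path.expanduser(answer)  # also expands a leading ~ to the home directory


if __name__ == "__main__":
    path = normalise_path(input("path/to/biog.txt: "))
    print(path, "exists" if os.path.isfile(path) else "not found")
```

With this, both "path/to/biog.txt" and path/to/biog.txt resolve to the same file under Python 3.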

Reference

Ahmed Arslan and Vera van Noort. yMap: An automated method to map yeast variants to protein modifications and functional regions. Bioinformatics, October 22, 2016. doi:10.1093/bioinformatics/btw658

Contributors

        http://www.biw.kuleuven.be/CSB/

This work is supported by KU Leuven research fund.


Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 214
  • Total Committers: 2
  • Avg Commits per committer: 107.0
  • Development Distribution Score (DDS): 0.005
Top Committers
Name Email Commits
AhmedArslan a****n@k****e 213
Michiel van Setten m****n@g****m 1

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 0
  • Total pull requests: 2
  • Average time to close issues: N/A
  • Average time to close pull requests: 3 minutes
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
  • setten (2)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 29 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 11
  • Total maintainers: 1
pypi.org: ymap

An automated method to map yeast variants to protein modifications and functional regions

  • Versions: 11
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 29 Last month
Rankings
Dependent packages count: 10.0%
Forks count: 13.3%
Average: 19.7%
Stargazers count: 21.5%
Dependent repos count: 21.7%
Downloads: 32.0%
Maintainers (1)
Last synced: 6 months ago

Dependencies

setup.py pypi
  • install_require ,