https://github.com/bartongroup/ank-analysis
This repository contains the main bits of code used in my analysis of the ankyrin protein repeat family
Science Score: 10.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: zenodo.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (8.7%) to scientific vocabulary
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: bartongroup
- License: mit
- Language: Jupyter Notebook
- Default Branch: master
- Size: 92.1 MB
Statistics
- Stars: 0
- Watchers: 5
- Forks: 0
- Open Issues: 0
- Releases: 0
Created about 6 years ago · Last pushed almost 3 years ago
https://github.com/bartongroup/ANK-analysis/blob/master/
# Ankyrin repeats in context with human genetic variation

This repository contains a set of notebooks and libraries used to analyse the distribution of missense variants within the ankyrin repeat motif and to explain the observed patterns with structural features such as intra-domain contacts, relative solvent accessibility (RSA) or protein-protein interactions.

Included in this repository, in the ``/files`` directory, are the main input files needed to run the notebook. These files are the multiple sequence alignment containing the 7,407 reviewed repeat sequences used in this analysis, as well as the tables resulting from the packages _VarAlign_ and _ProIntVar_.

The ``data_extraction.ipynb`` notebook contains the functions necessary to download all the ankyrin repeat annotation records found in reviewed proteins from UniProt and InterPro. These include manually curated annotations from UniProt as well as annotations from the Prosite, SMART, PFAM and PRINTS databases. Once downloaded, the records are merged in a dataframe and saved. The functions used are contained within the ``retrieve_data_allsp`` library.

In the ``database_integration`` notebook, the sets of annotations belonging to different database signatures are merged sequentially in a specific order. This order was established according to the number and quality of the annotations for each database. From higher to lower confidence: Prosite (PS50088), SMART (SM00248), UniProt, PRINTS (PR01415), PFAM (PF00023) and PFAM (PF13606). This procedure results in a non-redundant set of 7,407 ankyrin repeat sequences.

In the notebook named ``upset``, we plot the intersection between the sets of ankyrin repeat annotations coming from the different databases.

The methodology followed to align the 7,407 unique ankyrin repeat sequences is contained within the ``alignment`` notebook.
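
The sequential, priority-ordered merge described for the ``database_integration`` notebook can be illustrated with a small sketch. This is not the notebook's code: the function names, the tuple layout and the toy records are hypothetical, and the rule that a lower-priority annotation is dropped when it overlaps an already accepted repeat is an assumed redundancy criterion. Only the priority order is taken from the text above.

```python
from collections import defaultdict

# Priority order taken from the README (highest confidence first).
PRIORITY = ["PS50088", "SM00248", "UniProt", "PR01415", "PF00023", "PF13606"]


def overlaps(a_start, a_end, b_start, b_end):
    """Return True if two closed residue intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def merge_by_priority(annotations, priority=PRIORITY):
    """Merge repeat annotations source by source, keeping non-overlapping ones.

    `annotations` is a list of (protein_id, start, end, source) tuples.
    The overlap test is an ASSUMED redundancy criterion, not the notebook's.
    """
    accepted = defaultdict(list)  # protein_id -> [(start, end, source), ...]
    for source in priority:
        for protein, start, end, src in annotations:
            if src != source:
                continue
            if any(overlaps(start, end, s, e) for s, e, _ in accepted[protein]):
                continue  # already covered by a higher-priority annotation
            accepted[protein].append((start, end, src))
    return {protein: sorted(repeats) for protein, repeats in accepted.items()}


# Toy records for a made-up protein "P1" (illustrative only).
toy = [
    ("P1", 10, 42, "PF00023"),
    ("P1", 12, 41, "PS50088"),
    ("P1", 45, 74, "SM00248"),
    ("P1", 80, 110, "PF13606"),
]
merged = merge_by_priority(toy)
```

In this toy case the PFAM (PF00023) record is discarded because it overlaps the Prosite record, while the SMART and PFAM (PF13606) records are kept because they cover regions not yet annotated.
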
The main functions used to align the sequences are found in the ``new_aligner``, ``alignment_editing`` and ``slaver`` Python libraries.

In the ``variant_analysis`` notebook, we analyse the conservation of the family and the distribution of missense variants along the ankyrin repeat motif. Specifically, we calculate the enrichment in missense variation per position in the motif. The functions used are contained within the ``variant_analysis.py`` library.

In the ``structure_validation.ipynb`` notebook, we download the validation data for all the X-ray structures that mapped to our alignment, process them, and merge them into a single dataframe. The functions used are contained within the ``structure_validation.py`` library.

Finally, the last two notebooks are ``contact_maps`` and ``structural_analysis``. The former covers the creation of the contact maps and the enrichment in intra- and inter-repeat contacts per position. The latter analyses other structural features such as the relative solvent accessibility (RSA), secondary structure (SS) and enrichment in protein-substrate interactions, both on a position and surface basis. The functions used can be found in the ``contact_maps.py`` and ``structural_analysis.py`` libraries.

[DOI](https://zenodo.org/badge/latestdoi/274636823)
Owner
- Name: Geoff Barton's Computational Biology Group
- Login: bartongroup
- Kind: organization
- Location: Dundee, Scotland, UK
- Website: https://www.compbio.dundee.ac.uk
- Twitter: bartongrp
- Repositories: 57
- Profile: https://github.com/bartongroup