https://github.com/bartongroup/ank-analysis

This repository contains the main bits of code used in my analysis of the ankyrin protein repeat family

https://github.com/bartongroup/ank-analysis

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.7%) to scientific vocabulary
Last synced: 10 months ago · JSON representation

Repository

This repository contains the main bits of code used in my analysis of the ankyrin protein repeat family

Basic Info
  • Host: GitHub
  • Owner: bartongroup
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: master
  • Size: 92.1 MB
Statistics
  • Stars: 0
  • Watchers: 5
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created about 6 years ago · Last pushed almost 3 years ago

https://github.com/bartongroup/ANK-analysis/blob/master/

# Ankyrin repeats in context with human genetic variation
This repository contais a set of notebooks and libraries used to analyse the distribution of missense variants within the ankyrin repeat motif and explain the observed patterns with structural features such as intra-domain contacts, residue solvent accessibility (RSA) or protein-protein interactions. Included in this repository, in the ``/files`` directory, are the main input files needed to run the notebook. These files are the multiple sequence alignment containing the 7,407 reviewed repeat sequences used in this analysis as well as the tables resulting from the packages _VarAlign_ and _ProIntVar_.

The ``data_extraction.ipynb`` notebook contains the functions necessary to download all the ankyrin repeat annotation records found in reviewed proteins from UniProt and InterPro. These include manually curated annotations from UniProt as well as annotations from the ProSitem SMART, PFAM and PRINTS databases. Once downloaded, the records are merged in a dataframe and saved. The functions used are contained within the ``retrieve_data_allsp`` library.

In the ``database_integration`` notebook, the sets of annotations belonging to different database signatures are merged sequentially in a specific order. This order was established according to the number and quality of the annotations for each database. From higher to lower confidence: Prosite (PS50088), SMART (SM00248), UniProt, PRINTS (PR01415), PFAM (PF00023) and PFAM (PF13606). This procedure, results in a non-redundant set of 7,407 ankyrin repeat sequences.

In the notebook named ``upset``, we plot the intersection betweeen the different sets of ankyrin repeat annotations coming from the different databases.

The methodology followed to align the 7,407 unique ankyrin repeat sequences is contained within the ``alignment`` notebook. The main functions used to align the sequences are found in the ``new_aligner``, ``alignment_editing`` and ``slaver`` python libraries.

In ``variant_analysis`` notebook we analyse the conservation of the family and the distribution of missense variants along the ankyrin repeat motif. Specifically, we calcualte enrichment in missense variation per position in the motif. The functions used are contained within the ``variant_analysis.py`` library.

We download the validation data for all the X-ray structures that mapped to our alignment, process them, and merge them in a single dataframe on the ``structure_validation.ipynb`` notebook. The functions used are contained within the ``structure_validation.py`` library.

Finally, the last two notebooks: ``contact_maps`` and ``structural_analysis``. The former includes the creation of the contact maps, the enrichment in intra- and inter-repeat contacts per position. The latter analyses other structural features such as the relative solvent accessibility (RSA), secondary structure (SS) and enrichment in protein-substrate interactions, both on a position and surface basis. The functions used can be found in the ``contact_maps.py`` and ``structural_analysis.py``libraries.

[![DOI](https://zenodo.org/badge/274636823.svg)](https://zenodo.org/badge/latestdoi/274636823)

Owner

  • Name: Geoff Barton's Computational Biology Group
  • Login: bartongroup
  • Kind: organization
  • Location: Dundee, Scotland, UK

GitHub Events

Total
Last Year