https://github.com/bartongroup/proteofav

Open-source framework for simple and fast integration of protein structure data with sequence annotations and genetic variation

Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Committers with academic emails
    3 of 4 committers (75.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.6%) to scientific vocabulary

Keywords

annotations bioinformatics data-analysis dssp features mmcif pandas pdb python sifts structural-biology variants
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: bartongroup
  • License: mit
  • Language: Python
  • Default Branch: master
  • Homepage:
  • Size: 18.4 MB
Statistics
  • Stars: 4
  • Watchers: 5
  • Forks: 0
  • Open Issues: 13
  • Releases: 1
Created over 10 years ago · Last pushed almost 2 years ago
Metadata Files
Readme Changelog Contributing License Authors

README.rst

ProteoFAV
=========

**Protein Features, Annotations and Variants**

|Pypi| |Build Status| |Documentation| |Python: versions| |License|

.. |Pypi| image:: https://img.shields.io/pypi/v/proteofav.svg
  :target: https://pypi.python.org/pypi/proteofav
.. |Build Status| image:: https://img.shields.io/travis/bartongroup/proteofav.svg
  :target: https://travis-ci.org/bartongroup/proteofav
.. |Documentation| image:: https://readthedocs.org/projects/proteofav/badge/?version=latest
  :target: https://proteofav.readthedocs.io/en/latest/?badge=latest
  :alt: Documentation Status
.. |Coverage Status| image:: https://coveralls.io/repos/github/bartongroup/proteofav/badge.svg?branch=master
  :target: https://coveralls.io/github/bartongroup/proteofav?branch=master
.. |Health| image:: https://landscape.io/github/bartongroup/proteofav/master/landscape.svg?style=flat
  :target: https://landscape.io/github/bartongroup/proteofav/master
.. |Pyup| image:: https://pyup.io/repos/github/bartongroup/proteofav/shield.svg
   :target: https://pyup.io/repos/github/bartongroup/proteofav/
   :alt: Updates
.. |License| image:: http://img.shields.io/badge/license-MIT-brightgreen.svg?style=flat
  :target: https://github.com/bartongroup/proteofav//blob/master/LICENSE.md
.. |Python: versions| image:: https://img.shields.io/badge/python-3.5,_3.6-blue.svg?style=flat
   :target: http://travis-ci.org/bartongroup/proteofav/

ProteoFAV is a Python module that addresses the challenge of cross-mapping protein structures and protein sequences, allowing protein structures to be annotated with sequence features and annotations. It implements methods for working with protein structures (via mmCIF, PDB, PDB Validation, DSSP and SIFTS files), sequence features (via UniProt GFF annotations) and genetic variants (via the UniProt/EBI Proteins, Ensembl REST and TCGA Pan-Cancer APIs). Cross-mapping of structure and sequence is performed with the aid of SIFTS.

ProteoFAV relies heavily on the `Pandas`_ library to quickly load data into DataFrames for fast data exploration and analysis. Structure and sequence data are parsed or fetched into Pandas DataFrames, which are then merged (collapsed) into a single DataFrame.

Protein structure data (sequence and 3D atom coordinates) and their annotations from structural analysis (e.g. interacting interfaces, secondary structure and solvent accessibility), together with protein sequences and their annotations (e.g. genetic variants and other functional information obtained from SIFTS and UniProt), are handled by dedicated classes and methods so that each modular (component) table can be integrated into a single 'merged table'.

.. image:: proteofav.png
   :width: 20pt

The methods implemented in ``proteofav/mergers.py`` allow the different components to be merged into a single Pandas DataFrame.
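
As a rough illustration of the idea (this is not ProteoFAV's actual API), the merge can be thought of as a chain of Pandas joins keyed on residue identifiers. The table contents and column names below (``pdb_resnum``, ``uniprot_resnum``, ``variant`` and so on) are hypothetical and were chosen only for this sketch:

.. code-block:: python

    import pandas as pd

    # Hypothetical per-residue structure table (e.g. parsed from mmCIF/DSSP).
    structure = pd.DataFrame({
        "pdb_resnum": [10, 11, 12],
        "aa": ["M", "K", "V"],
        "secondary_structure": ["H", "H", "E"],
    })

    # Hypothetical SIFTS-like mapping between PDB and UniProt residue numbers.
    sifts = pd.DataFrame({
        "pdb_resnum": [10, 11, 12],
        "uniprot_resnum": [1, 2, 3],
    })

    # Hypothetical variants table keyed by UniProt residue number.
    variants = pd.DataFrame({
        "uniprot_resnum": [2],
        "variant": ["K2R"],
    })

    # Map structure residues to UniProt positions, then attach variants.
    # Left joins keep every structure residue, even those without a variant.
    merged = (
        structure
        .merge(sifts, on="pdb_resnum", how="left")
        .merge(variants, on="uniprot_resnum", how="left")
    )
    print(merged)

Residues without a matching variant end up with a missing value in the ``variant`` column, so the structure table is never truncated by the join.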


Getting Started
---------------

Dependencies
~~~~~~~~~~~~

ProteoFAV was developed to support Python 3.5+ and Pandas 0.20+. Check `requirements`_ for specific requirements.

.. _requirements: https://github.com/bartongroup/ProteoFAV/blob/master/requirements.txt
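
If you want to check an environment against these minimums before installing, a small standard-library sketch such as the following may help. The helper names ``version_tuple`` and ``meets_minimum`` are made up for this example and are not part of ProteoFAV; the comparison only looks at leading numeric components, so treat it as an approximation:

.. code-block:: python

    import sys


    def version_tuple(version):
        """Turn a version string such as '0.20.3' into (0, 20, 3).

        Only leading numeric components are kept, so '1.5.0rc1' -> (1, 5).
        """
        parts = []
        for piece in version.split("."):
            if not piece.isdigit():
                break
            parts.append(int(piece))
        return tuple(parts)


    def meets_minimum(installed, minimum):
        """Return True if the installed version is at least the minimum."""
        have, need = version_tuple(installed), version_tuple(minimum)
        # Pad the shorter tuple with zeros so '0.20' compares equal to '0.20.0'.
        width = max(len(have), len(need))
        have += (0,) * (width - len(have))
        need += (0,) * (width - len(need))
        return have >= need


    # Python itself: compare the running interpreter against 3.5.
    python_ok = sys.version_info >= (3, 5)
    print("Python OK:", python_ok)

    # Pandas: pass in pandas.__version__ once the package is importable.
    print("Pandas OK:", meets_minimum("0.23.4", "0.20"))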


Installation
~~~~~~~~~~~~

To install the stable release, run this command in your terminal:

.. code-block:: console

    $ pip install proteofav

If you don't have `pip`_ installed, this `Python installation guide`_ can guide you through the process.

.. _pip: https://pip.pypa.io
.. _Python installation guide: http://docs.python-guide.org/en/latest/starting/installation/


Installing from source in a virtual environment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Getting ProteoFAV:

.. code-block:: bash

    $ wget https://github.com/bartongroup/ProteoFAV/archive/master.zip -O ProteoFAV.zip
    $ unzip ProteoFAV.zip

    # alternatively, cloning the git repository
    $ git clone https://github.com/bartongroup/ProteoFAV.git


Installing with Virtualenv:

.. code-block:: bash

    $ virtualenv --python `which python` env
    $ source env/bin/activate
    $ pip install -r path/to/ProteoFAV/requirements.txt
    $ python path/to/ProteoFAV/setup.py install


Installing with Conda:

.. code-block:: bash

    $ conda-env create -n proteofav -f path/to/ProteoFAV/requirements.txt
    $ source activate proteofav
    $ cd path/to/ProteoFAV
    $ pip install .


Testing the installation
~~~~~~~~~~~~~~~~~~~~~~~~

Test dependencies should be resolved with:

.. code-block:: bash

    $ python path/to/ProteoFAV/setup.py develop --user


Run the tests with:

.. code-block:: bash

    $ python path/to/ProteoFAV/setup.py test
    # or
    $ cd path/to/ProteoFAV/tests
    $ python -m unittest discover


ProteoFAV Configuration
~~~~~~~~~~~~~~~~~~~~~~~

ProteoFAV uses a configuration file, ``config.ini``, in which the user can specify directory paths as well as URLs for commonly used data sources.

After installing, run:

.. code-block:: bash

    $ proteofav-setup
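
For readers unfamiliar with INI files, the sketch below shows how such a file can be read with Python's standard ``configparser`` module. The section and option names (``paths``, ``urls``, ``db_root``, ``sifts``) and the values are hypothetical and may not match the ``config.ini`` shipped with ProteoFAV:

.. code-block:: python

    import configparser

    # Hypothetical contents: the section and option names below are
    # illustrative and may not match ProteoFAV's real config.ini.
    example_ini = """
    [paths]
    db_root = /tmp/proteofav_data

    [urls]
    sifts = https://example.org/sifts/
    """

    config = configparser.ConfigParser()
    config.read_string(example_ini)

    db_root = config.get("paths", "db_root")
    # The fallback value is returned when an option is missing.
    cache_dir = config.get("paths", "cache_dir", fallback=db_root)
    print(db_root, cache_dir)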


Example Usage
-------------

Example usage is currently provided as a Jupyter Notebook, which can be viewed with `GitHub's`_ file viewer or with the Jupyter `nbviewer`_.

You can download the Jupyter Notebook from `GitHub`_ and test it with your ProteoFAV installation.

.. _GitHub's: https://github.com/bartongroup/ProteoFAV/blob/master/Examples.ipynb
.. _nbviewer: https://nbviewer.jupyter.org/github/bartongroup/ProteoFAV/blob/master/Examples.ipynb
.. _GitHub: https://github.com/bartongroup/ProteoFAV


Contributing and Bug tracking
-----------------------------

Feel free to fork, clone, share and distribute. If you find any bugs or issues, please log them in the `issue tracker`_.

Before submitting a pull request, read the `Contributing Guide`_.

Credits
-------

See the `Credits`_.


Changelog
---------

See the `Changelog`_.


Licensing
---------

The MIT License (MIT). See `license`_ for details.

.. _license: https://github.com/bartongroup/ProteoFAV/blob/master/LICENSE.md
.. _issue tracker: https://github.com/bartongroup/ProteoFAV/issues
.. _docs: https://github.com/bartongroup/ProteoFAV/blob/master/docs/index.rst
.. _Pandas: http://pandas.pydata.org/
.. _Contributing Guide: https://github.com/bartongroup/ProteoFAV/wiki/Contributing-Guide
.. _Changelog: https://github.com/bartongroup/ProteoFAV/blob/master/CHANGELOG.rst
.. _Credits: https://github.com/bartongroup/ProteoFAV/blob/master/AUTHORS.rst

Owner

  • Name: Geoff Barton's Computational Biology Group
  • Login: bartongroup
  • Kind: organization
  • Location: Dundee, Scotland, UK

GitHub Events

Total
  • Watch event: 1
Last Year
  • Watch event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 652
  • Total Committers: 4
  • Avg Commits per committer: 163.0
  • Development Distribution Score (DDS): 0.597
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Stuart MacGowan s****n@d****k 263
Fábio Madeira f****a@g****m 196
tbrittoborges t****s@d****k 144
tbrittoborges t****s@d****k 49
Committer Domains (Top 20 + Academic)

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 32
  • Total pull requests: 24
  • Average time to close issues: 8 months
  • Average time to close pull requests: about 23 hours
  • Total issue authors: 3
  • Total pull request authors: 3
  • Average comments per issue: 0.91
  • Average comments per pull request: 0.83
  • Merged pull requests: 22
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • tbrittoborges (19)
  • biomadeira (9)
  • stuartmac (4)
Pull Request Authors
  • biomadeira (13)
  • stuartmac (6)
  • tbrittoborges (5)
Top Labels
Issue Labels
enhancement (13) Label2 (3) bug (2) question (2) Label1 (2) asdasd (2)
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 11 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 3
  • Total maintainers: 1
pypi.org: proteofav

PROtein Feature Aggregation and Variants.

  • Versions: 3
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 11 Last month
Rankings
Dependent packages count: 7.3%
Dependent repos count: 22.1%
Average: 37.4%
Downloads: 82.9%
Maintainers (1)
Last synced: 6 months ago

Dependencies

requirements.txt pypi
  • biopython >=1.68
  • click >=6.7
  • click_log >=0.2.1
  • lxml >=3.7.3
  • numpy >=1.13.3
  • pandas >=0.20.3
  • requests >=2.18.2
  • requests_cache >=0.4.13
  • responses >=0.8.1
  • scipy >=0.19.1
setup.py pypi