https://github.com/bartongroup/prointvar

The core bits of ProIntVar


Repository

Basic Info

Host: GitHub
Owner: bartongroup
License: mit
Language: Python
Default Branch: master
Size: 3.08 MB

Statistics

Stars: 2
Watchers: 5
Forks: 0
Open Issues: 9
Releases: 1

Created about 9 years ago · Last pushed over 1 year ago


ProIntVar

ProIntVar-Core is a Python module that implements methods for working with protein structures (handles mmCIF, DSSP, SIFTS, protein interactions, etc.) and genetic variation (via UniProt and Ensembl APIs).

ProIntVar-Core is now separate from ProIntVar-Analysis, which contains analysis scripts built on ProIntVar-Core components.

Table of Contents

- Key features
- Overview
- Dependencies
- Installing
- Configuration
- How to use
    - ProIntVar CLI
    - ProIntVar Classes
- Additional Information
    - Project Structure
    - Guidelines on file names and extensions
- Licensing

Key features

- Support for both reading and writing PDB/mmCIF structures
- DSSP running and parsing
- PDB-UniProt structure-sequence mapping with SIFTS (xml) parsing
- Interface (contacts) computing and analysis with Arpeggio
- Addition of Hydrogen atoms with HBPLUS and Reduce
- Download various raw files (structures, sequences, variants, etc.)
- Fetch data from several APIs (Proteins API, PDBe REST API, Ensembl REST, etc.)
- A TableMerger class that simplifies working with protein structures and sequence annotations
- All data is handled with Pandas data structures

Overview

ProIntVar handles data with Pandas DataFrames. Protein structures (sequence and 3D atom coordinates) and their annotations from structural analysis (e.g. interacting interfaces, secondary structure and solvent accessibility), as well as protein sequences and their annotations (e.g. genetic variants and other functional information), are each parsed into a modular component table. These component tables can then be integrated into a single 'merged table'.

screenshot

The methods implemented in prointvar/merger.py merge the different components into a single Pandas DataFrame.

Dependencies

Requires Python 3.5+.

Check requirements.txt for all dependencies.

Installing

Setting up a virtual environment

```sh
$ virtualenv --python `which python` env
$ source env/bin/activate
```

Installing ProIntVar

```sh
# getting the source code
$ git clone https://github.com/bartongroup/ProIntVar.git

# installing requirements
$ cd ProIntVar
$ pip install -r requirements.txt

# then...
$ python setup.py test
$ python setup.py install
```

Configuration

Editing the provided template configuration settings

```sh
$ cd /path/to/desired/working/dir/

# Get a copy of the template config.ini file shipped with ProIntVar
$ ProIntVar-config-setup new_config.ini

# Update the settings according to user preferences and push them
$ ProIntVar-config-load new_config.ini
```

Testing that the new values are correctly loaded by ProIntVar

```sh
$ python
>>> from prointvar.config import config
>>> config.db_tmp
'tmp'
```
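For orientation, the kind of lookup shown above can be mimicked with the standard library's `configparser`. This is only a sketch under assumptions: the `[Paths]` section name, the `db_root` value and the `load_settings` helper are hypothetical and do not describe ProIntVar's actual `config.ini` schema. Only the key names `db_root`, `db_tmp` and `db_pdbx` and the value `'tmp'` are taken from this README.

```python
import configparser

# Hypothetical INI content: the [Paths] section name and the db_root
# value are assumptions made for this illustration.
SAMPLE_INI = """
[Paths]
db_root = /tmp/prointvar_data
db_tmp = tmp
db_pdbx = pdbx
"""


def load_settings(text):
    """Parse INI text and return the [Paths] section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return dict(parser["Paths"])


settings = load_settings(SAMPLE_INI)
print(settings["db_tmp"])
```

Reading the file into a plain dictionary keeps the example independent of how ProIntVar exposes its settings as attributes.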

How to use

ProIntVar CLI

The ProIntVar CLI provides several tools, each with its own options and arguments. Pass `--help` to any tool for more information.

One example use of the CLI is downloading files from the main repositories. The following uses the Downloader interface to download macromolecular structures:

```sh
# downloads structures in mmCIF format to the directory defined in the config.ini
ProIntVar download --mmcif 2pah

# downloads SIFTS record in XML format
ProIntVar download --sifts 2pah
```

ProIntVar Classes

Each main class in ProIntVar works as an independent component that can be used on its own or together with other classes. Generally each main class produces/parses data to a pandas DataFrame. The classes/methods provided in prointvar.merger can be used to merge DataFrames. Merging DataFrames is not trivial, since there must be common features in the tables to be merged. More information on how to use the TableMerger class and which features (columns) from each table can be used to merge with confidence is provided below.
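To see why shared columns matter, here is a minimal sketch in plain pandas. It does not use the TableMerger API. The column names (`chain`, `res_num`, `res_name`, `secondary_structure`) and all values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Toy per-residue tables; column names and values are illustrative assumptions.
structure = pd.DataFrame({
    "chain": ["A", "A", "A"],
    "res_num": [10, 11, 12],
    "res_name": ["MET", "SER", "THR"],
})
secondary = pd.DataFrame({
    "chain": ["A", "A"],
    "res_num": [10, 11],
    "secondary_structure": ["H", "H"],
})

# A left join keeps every structure row; residues with no matching
# row in the second table get NaN in the added column.
merged = structure.merge(secondary, on=["chain", "res_num"], how="left")
print(merged)
```

The join only works because both tables identify a residue by the same pair of columns. Tables that number residues differently would first need a mapping between the two schemes.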

`prointvar.pdbx`

Using PDBXreader to parse an mmCIF-formatted macromolecular structure.

```python
import os
from prointvar.config import config as cfg
from prointvar.pdbx import PDBXreader
from prointvar.fetchers import download_structure_from_pdbe

download_structure_from_pdbe('2pah')
input_struct = os.path.join(cfg.db_root, cfg.db_pdbx, '2pah.cif')
df = PDBXreader(inputfile=input_struct).atoms(format_type="mmcif")

# pandas DataFrame
df.head()
```
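To make the shape of such an atoms table concrete, the sketch below reads the `_atom_site` loop of a tiny mmCIF snippet with the standard library only. It is not how PDBXreader is implemented. The `parse_atom_site` helper and the sample coordinates are hypothetical, and the naive whitespace splitting ignores quoted values, which real files can contain.

```python
SAMPLE_MMCIF = """\
data_demo
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N MET A 1.000 2.000 3.000
ATOM 2 CA MET A 2.500 2.100 3.300
#
"""


def parse_atom_site(text):
    """Return the _atom_site loop as a list of dicts, one per atom."""
    columns, rows = [], []
    in_loop = False
    for line in text.splitlines():
        line = line.strip()
        if line == "loop_":
            # A new loop starts: forget any header collected so far.
            in_loop, columns = True, []
        elif in_loop and line.startswith("_atom_site."):
            # Header line: keep the item name after the category prefix.
            columns.append(line.split(".", 1)[1])
        elif in_loop and columns and line and not line.startswith(("#", "_")):
            # Data line: pair each value with its column name.
            rows.append(dict(zip(columns, line.split())))
        elif line.startswith(("#", "_")):
            # End of the loop, or a header from another category.
            in_loop = False
    return rows


atoms = parse_atom_site(SAMPLE_MMCIF)
print(atoms[1]["label_atom_id"], atoms[1]["Cartn_x"])
```

Each dictionary corresponds to one row of the table, so the list could be passed directly to `pandas.DataFrame` to obtain a per-atom DataFrame.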

We can convert the mmCIF structure to PDB format.

```python
from prointvar.pdbx import PDBXwriter

output_struct = os.path.join(cfg.db_root, cfg.db_pdbx, '2pah.pdb')
w = PDBXwriter(outputfile=output_struct)
w.run(df, format_type="pdb")
```

`prointvar.dssp`

With the DSSP classes we can read DSSP formatted files and also generate DSSP output for mmCIF or PDB structures.

```python
from prointvar.dssp import DSSPrunner, DSSPreader

input_struct = os.path.join(cfg.db_root, cfg.db_pdbx, '2pah.cif')
output_dssp = os.path.join(cfg.db_root, cfg.db_dssp, '2pah.dssp')
DSSPrunner(inputfile=input_struct, outputfile=output_dssp).write()

df2 = DSSPreader(inputfile=output_dssp).read()

# pandas DataFrame
df2.head()
```
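DSSP residue records are fixed-width text, so reading them comes down to slicing each line at known offsets. The sketch below illustrates that idea with the standard library. It is not DSSPreader's implementation. The offsets follow the classic DSSP layout as commonly parsed, which is an assumption about the files in question. The sample lines are synthetic and `make_line` and `parse_dssp_line` are hypothetical helpers.

```python
def make_line(index, res_num, chain, aa, ss, acc):
    """Build a synthetic residue line with fields at the classic DSSP offsets."""
    return "{:>5}{:>5} {} {}  {}{}{:>4}".format(
        index, res_num, chain, aa, ss, " " * 17, acc
    )


def parse_dssp_line(line):
    """Slice one fixed-width residue line into a dict."""
    ss = line[16]
    return {
        "res_num": int(line[5:10]),   # residue number
        "chain": line[11],            # chain identifier
        "aa": line[13],               # one-letter amino acid
        # A blank secondary-structure column means no assigned element.
        "ss": ss if ss != " " else "-",
        "acc": int(line[34:38]),      # solvent accessibility
    }


sample = [
    make_line(1, 118, "A", "V", "E", 57),
    make_line(2, 119, "A", "P", " ", 102),
]
records = [parse_dssp_line(line) for line in sample]
print(records)
```

A real DSSP file also has a header block before the residue records, which a full reader would need to skip.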

`prointvar.sifts`

Parsing the SIFTS UniProt-PDB cross-mapping is just as simple.

```python
from prointvar.sifts import SIFTSreader
from prointvar.fetchers import download_sifts_from_ebi

download_sifts_from_ebi('2pah')
input_sifts = os.path.join(cfg.db_root, cfg.db_sifts, '2pah.xml')
df3 = SIFTSreader(inputfile=input_sifts).read()

# pandas DataFrame
df3.head()
```
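A residue-level cross-mapping of this kind can be pictured as one row per residue, holding its PDB and UniProt numbering side by side. The sketch below extracts such rows from a simplified XML fragment using the standard library. It is not SIFTSreader's implementation. The fragment is modelled on the SIFTS residue layout but omits the XML namespace, and the identifiers (`1abc`, `Q00000`), the numbering and the `residue_mapping` helper are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free fragment; all identifiers and numbers are made up.
SAMPLE_SIFTS = """
<entry>
  <entity entityId="A">
    <segment>
      <listResidue>
        <residue dbSource="PDBe" dbResNum="1" dbResName="MET">
          <crossRefDb dbSource="PDB" dbAccessionId="1abc" dbResNum="10" dbChainId="A"/>
          <crossRefDb dbSource="UniProt" dbAccessionId="Q00000" dbResNum="25" dbResName="M"/>
        </residue>
        <residue dbSource="PDBe" dbResNum="2" dbResName="SER">
          <crossRefDb dbSource="PDB" dbAccessionId="1abc" dbResNum="11" dbChainId="A"/>
        </residue>
      </listResidue>
    </segment>
  </entity>
</entry>
"""


def residue_mapping(xml_text):
    """Return one dict per residue with its PDB and UniProt cross-references."""
    rows = []
    root = ET.fromstring(xml_text)
    for residue in root.iter("residue"):
        row = {"pdb_res_num": None, "chain": None,
               "uniprot_id": None, "uniprot_res_num": None}
        for ref in residue.findall("crossRefDb"):
            source = ref.get("dbSource")
            if source == "PDB":
                row["pdb_res_num"] = ref.get("dbResNum")
                row["chain"] = ref.get("dbChainId")
            elif source == "UniProt":
                row["uniprot_id"] = ref.get("dbAccessionId")
                row["uniprot_res_num"] = ref.get("dbResNum")
        rows.append(row)
    return rows


mapping = residue_mapping(SAMPLE_SIFTS)
print(mapping)
```

Residues without a UniProt cross-reference keep `None` in the UniProt fields, which is the per-residue gap a later merge has to tolerate.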

`prointvar.merger`

Now protein structure, secondary structure and solvent accessibility can be merged onto protein sequence (via SIFTS).

```python
from prointvar.merger import TableMerger

mdf = TableMerger(pdbx_table=df, dssp_table=df2, sifts_table=df3).merge()

# pandas DataFrame
mdf.head()
```

Additional Information

Table merger

TODO

Project Structure

TODO

Guidelines on file names and extensions

PDB/PDBx/mmCIF Macromolecular structures

* PDB and mmCIF formatted files are read from and written to the `db_pdbx` folder, as defined in the configuration file `config.ini`
    - PDB/mmCIF files are written as `<pdb_id>.pdb` or `<pdb_id>.cif`
    - BioUnits from PDBe are written as `<pdb_id>_bio.cif`
    - New structure files written for running DSSP, Reduce, HBPLUS or Arpeggio are generally written in `<4char>_new.pdb` format
    - By-chain/entity structures are written as `<pdb_id>_<chain_id>.pdb`

DSSP Secondary Structure

* DSSP files are read from and written to the `db_dssp` folder
    - DSSP files are generally written as `<pdb_id>.dssp`
    - By-chain/entity DSSP outputs are written as `<pdb_id>_<chain_id>.dssp`
    - Unbound-state DSSP outputs are written as `<pdb_id>_unbound.dssp`

SIFTS Structure-Sequence (PDB-UniProt) cross-reference

* SIFTS files are read from and written to the `db_sifts` folder
    - SIFTS files are written as `<pdb_id>.xml`

Arpeggio Interface Contacts

* Arpeggio files are read from and written to the `db_contacts` folder
    - Arpeggio files are written as `<pdb_id>.contacts`, `<pdb_id>.amam`, `<pdb_id>.amri`, `<pdb_id>.ari` and `<pdb_id>.ri`

HBPLUS Hydrogen-Bond Contacts

* HBPLUS files are read from and written to the `db_contacts` folder
    - HBPLUS files are written as `<pdb_id>.h2b`
    - HBPLUS Hydrogen-filled PDBs are written as `<pdb_id>.h.pdb` in `db_pdbx`

Reduce PDBs filled with Hydrogen

* Reduce files are read from and written to the `db_pdbx` folder
    - Reduce Hydrogen-filled PDBs are written as `<pdb_id>.h.pdb` in `db_pdbx`
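The naming conventions above can be summarised as small helper functions. This is a sketch for illustration only: `structure_filename` and `dssp_filename` are hypothetical names, not functions provided by ProIntVar, and they cover only the structure and DSSP patterns listed in this section.

```python
def structure_filename(pdb_id, chain_id=None, biounit=False, fmt="cif"):
    """Return a structure file name following the listed conventions."""
    if biounit:
        # BioUnits from PDBe: <pdb_id>_bio.cif
        return "{}_bio.cif".format(pdb_id)
    if chain_id is not None:
        # By-chain/entity structures: <pdb_id>_<chain_id>.pdb
        return "{}_{}.pdb".format(pdb_id, chain_id)
    # Plain structures: <pdb_id>.pdb or <pdb_id>.cif
    return "{}.{}".format(pdb_id, fmt)


def dssp_filename(pdb_id, chain_id=None, unbound=False):
    """Return a DSSP file name following the listed conventions."""
    if unbound:
        # Unbound-state DSSP: <pdb_id>_unbound.dssp
        return "{}_unbound.dssp".format(pdb_id)
    if chain_id is not None:
        # By-chain/entity DSSP: <pdb_id>_<chain_id>.dssp
        return "{}_{}.dssp".format(pdb_id, chain_id)
    return "{}.dssp".format(pdb_id)


print(structure_filename("2pah"))
print(structure_filename("2pah", chain_id="A"))
print(dssp_filename("2pah", unbound=True))
```

Joining one of these names onto the matching `db_pdbx` or `db_dssp` folder gives the kind of path built with `os.path.join` in the class examples.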

Licensing

The MIT License (MIT). See license for details.

Owner

Name: Geoff Barton's Computational Biology Group
Login: bartongroup
Kind: organization
Location: Dundee, Scotland, UK

Website: https://www.compbio.dundee.ac.uk
Twitter: bartongrp
Repositories: 57
Profile: https://github.com/bartongroup

GitHub Events

Total

Issues event: 1
Push event: 1

Last Year

Issues event: 1
Push event: 1

Dependencies

requirements.txt pypi

biopython >=1.68
click >=6.7
click_log >=0.2.1
lxml >=4.1.0
lxml >=3.7.3
numpy >=1.13.3
pandas >=0.20.3
proteofav >=0.2.0
requests >=2.18.2
requests_cache >=0.4.13
responses >=0.8.1
scipy >=0.19.1

setup.py pypi
