https://github.com/biocomputingup/geometre

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.5%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: BioComputingUP
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Size: 22.1 MB
Statistics
  • Stars: 4
  • Watchers: 2
  • Forks: 0
  • Open Issues: 0
  • Releases: 1
Created over 2 years ago · Last pushed 9 months ago
Metadata Files
Readme License

README.md

GeomeTRe

GeomeTRe calculates geometrical properties of tandem repeat proteins. It requires a protein structure and the start and end positions of each repeat unit (with optional insertion positions) as input. If insertion positions are provided, those segments are excluded to improve accuracy.

For most known structured tandem repeat proteins (STRPs), repeat unit and insertion coordinates are available from the manually curated RepeatsDB database.
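
The input described above can be pictured as a list of inclusive residue ranges. The following is a minimal sketch, not GeomeTRe's code: the function name `unit_residues` and the `(start, end)` tuple format are hypothetical. It drops every residue that falls inside an insertion range, so only the remaining residues of each unit would be used.

```python
from typing import Optional


def unit_residues(
    units: list[tuple[int, int]],
    insertions: Optional[list[tuple[int, int]]] = None,
) -> list[list[int]]:
    """Hypothetical helper: residue numbers of each unit, without insertion residues.

    units: inclusive (start, end) ranges, one per repeat unit
    insertions: optional inclusive (start, end) ranges to exclude
    """
    excluded: set[int] = set()
    for start, end in insertions or []:
        excluded.update(range(start, end + 1))
    return [
        [res for res in range(start, end + 1) if res not in excluded]
        for start, end in units
    ]


# Usage: the insertion 4-6 is removed from the first unit only
print(unit_residues([(1, 10), (11, 20)], [(4, 6)]))  # first unit: [1, 2, 3, 7, 8, 9, 10]
```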

Processing 100 structures takes about 2 minutes using 10 threads and an SSD.

Algorithm

The algorithm computes the three Tait-Bryan angles (yaw, pitch, and roll) by simulating an airplane that travels through the protein from the N-terminus to the C-terminus. In this analogy, the airplane points toward the centroid of the next repeat unit, and the angles correspond to the maneuvers required to move from one unit to the next.
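
The airplane analogy can be made concrete with rotation matrices. The sketch below is not GeomeTRe's implementation: it assumes two repeat units with the same number of paired atoms (for example equivalent CA atoms), finds the rotation that superposes one onto the other with the Kabsch algorithm, and decomposes that rotation into yaw, pitch, and roll using the Z-Y-X Tait-Bryan convention, which is only one of several possible conventions. All function names are hypothetical.

```python
import numpy as np


def kabsch_rotation(p: np.ndarray, q: np.ndarray) -> np.ndarray:
    """Rotation matrix R that best maps centred points p onto centred points q (both N x 3)."""
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    h = p_c.T @ q_c                          # 3 x 3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))   # d = -1 would indicate a reflection
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


def rotation_zyx(yaw: float, pitch: float, roll: float) -> np.ndarray:
    """Build R = Rz(yaw) @ Ry(pitch) @ Rx(roll) from angles in degrees."""
    y, p, r = np.radians([yaw, pitch, roll])
    rz = np.array([[np.cos(y), -np.sin(y), 0], [np.sin(y), np.cos(y), 0], [0, 0, 1]])
    ry = np.array([[np.cos(p), 0, np.sin(p)], [0, 1, 0], [-np.sin(p), 0, np.cos(p)]])
    rx = np.array([[1, 0, 0], [0, np.cos(r), -np.sin(r)], [0, np.sin(r), np.cos(r)]])
    return rz @ ry @ rx


def tait_bryan_zyx(rot: np.ndarray) -> tuple[float, float, float]:
    """Recover (yaw, pitch, roll) in degrees from R = Rz @ Ry @ Rx (valid for |pitch| < 90)."""
    yaw = np.arctan2(rot[1, 0], rot[0, 0])
    pitch = np.arcsin(-np.clip(rot[2, 0], -1.0, 1.0))
    roll = np.arctan2(rot[2, 1], rot[2, 2])
    return tuple(float(a) for a in np.degrees([yaw, pitch, roll]))


# Usage: rotate and shift a random "unit" by known angles, then recover the angles
rng = np.random.default_rng(0)
unit_a = rng.normal(size=(20, 3))
unit_b = unit_a @ rotation_zyx(30.0, 10.0, -45.0).T + np.array([5.0, 0.0, 0.0])
print(tait_bryan_zyx(kabsch_rotation(unit_a, unit_b)))  # approximately (30.0, 10.0, -45.0)
```

Centring both point sets first makes the recovered rotation independent of the translation between units, which in the analogy corresponds to the airplane's forward movement.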

The algorithm also determines handedness, defined by the direction of the roll (clockwise for right-handed, anticlockwise for left-handed), and the sign of the pitch (positive for upward movement, negative for downward movement).
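
Handedness can be read from the sign of a rotation about a reference axis. This sketch assumes the right-hand rule: a positive rotation looks counterclockwise when viewed from the tip of the axis, and clockwise when looking along it, as the pilot in the analogy would. Whether this matches GeomeTRe's own sign convention is an assumption, and `signed_rotation_angle` and `handedness` are hypothetical helpers.

```python
import numpy as np


def signed_rotation_angle(rot: np.ndarray, axis: np.ndarray) -> float:
    """Rotation angle of `rot` in degrees, signed by whether its rotation axis
    points along (+) or against (-) the reference `axis` (right-hand rule).

    Returns 0.0 when the sign cannot be read from the skew-symmetric part
    (identity or an exact 180-degree turn).
    """
    angle = np.degrees(np.arccos(np.clip((np.trace(rot) - 1.0) / 2.0, -1.0, 1.0)))
    # rot - rot.T equals 2 * sin(angle) times the cross-product matrix of the rotation axis
    w = np.array([rot[2, 1] - rot[1, 2], rot[0, 2] - rot[2, 0], rot[1, 0] - rot[0, 1]])
    if np.linalg.norm(w) < 1e-9:
        return 0.0
    return float(np.sign(np.dot(w, axis)) * angle)


def handedness(signed_angle: float) -> str:
    """Hypothetical labelling: positive -> 'right', negative -> 'left'."""
    if signed_angle > 0:
        return "right"
    if signed_angle < 0:
        return "left"
    return "none"


# Usage: the same 30-degree rotation about z, measured against +z and -z
theta = np.radians(30.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
print(handedness(signed_rotation_angle(rz, np.array([0.0, 0.0, 1.0]))))   # right
print(handedness(signed_rotation_angle(rz, np.array([0.0, 0.0, -1.0]))))  # left
```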

Installation

The software can be installed with pip, or you can clone the repository and use it directly.

To enable PyMOL visualization (optional), we recommend creating a new environment, installing the PyMOL bundle first, and then installing all the other dependencies.

Installation with pip

```bash
pip install git+https://github.com/BioComputingUP/GeomeTRe.git
```

Installation with Conda and PyMOL

```bash
# Create and activate a new conda environment
conda create -n geometre
conda activate geometre

# Install PyMOL
conda install -c conda-forge -c schrodinger pymol-bundle

# Clone the GeomeTRe repository
git clone https://github.com/BioComputingUP/GeomeTRe.git

# Set the path to the module in your environment (adjust it to where you cloned the repository)
export PYTHONPATH="${PYTHONPATH}:/home/user/Desktop/GeomeTRe/src/"
```

Dependencies

The following dependencies are required to run GeomeTRe:
  • Python 3.9 or higher; note that the pinned numpy 2.2.2 and scipy 1.15.0 releases require Python 3.10 or higher
  • Packages (installed automatically via pip install):
    • numpy==2.2.2
    • pandas==2.2.3
    • scipy==1.15.0
    • scikit-learn==1.6.0
    • biopython==1.84
    • tmtools==0.2.0
    • scikit-image==0.25.0
    • requests==2.32.3

PyMOL: if you intend to enable visualization, PyMOL must be installed via conda before running the package:

```bash
conda install -c conda-forge pymol-open-source=2.5.0
```

Usage

GeomeTRe can run in single mode to process one structure, in batch mode to process an entire dataset, or only to render its results in PyMOL.

Single structure execution

```bash
# Single mode without pip: same arguments as the pip command below, but invoking main.py directly
python3 main.py single 2xqh.pdb A result.csv 161_175,176_189,190_203,204_217,218_233,234_249,250_263,264_276,305_326,327_350,373_392,393_416 -ins_def 351_372

# Single mode with pip installation. Arguments: PDB file, chain, output file, repeat units, insertions (optional)
geometre single 2xqh.pdb A result.csv 161_175,176_189,190_203,204_217,218_233,234_249,250_263,264_276,305_326,327_350,373_392,393_416 -ins_def 351_372
```

Visualize output in PyMOL

```bash
# Visualize without pip
python3 main.py draw 2xqh.pdb result.npy

# Visualize with pip installation
geometre pymol 2xqh.pdb result.npy
```

Batch execution

```bash
# Download structures: PDB IDs are extracted from the first column of the TSV file
# and downloaded into the folder given by -pdb_dir
python3 main.py batch data/input_batch_short.tsv data/result_batch.csv -pdb_dir data/pdbs -threads 4

# Without -pdb_dir, no structures are downloaded; instead, the first column of the
# TSV file must contain paths to local structure files (see the Formats section below)
python3 main.py batch data/input_batch_short.tsv data/result_batch.csv -threads 4

# With pip installation
geometre batch data/input_batch_short.tsv data/result_batch.csv -threads 4
```
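
The download step of batch mode can be sketched with the standard library. This is an illustration under stated assumptions, not GeomeTRe's code: it assumes the first TSV column holds bare PDB identifiers, that structures are fetched in PDB format from the RCSB file server (`https://files.rcsb.org/download/<ID>.pdb`), and that files already on disk are reused. It uses `urllib` rather than `requests`, and the helper names are hypothetical.

```python
import csv
import os
import urllib.request
from typing import Iterable

# Assumed download source: RCSB PDB file server (GeomeTRe may use another source or format)
RCSB_URL = "https://files.rcsb.org/download/{}.pdb"


def pdb_ids_from_tsv(lines: Iterable[str], has_header: bool = False) -> list[str]:
    """Hypothetical helper: unique first-column values, in order of first appearance."""
    reader = csv.reader(lines, delimiter="\t")
    if has_header:
        next(reader, None)  # skip the header row, if the file has one
    seen: dict[str, None] = {}  # dicts keep insertion order, so this acts as an ordered set
    for row in reader:
        if row and row[0].strip():
            seen.setdefault(row[0].strip(), None)
    return list(seen)


def download_structures(pdb_ids: Iterable[str], pdb_dir: str) -> list[str]:
    """Hypothetical helper: fetch each structure into pdb_dir unless it is already there."""
    os.makedirs(pdb_dir, exist_ok=True)
    paths = []
    for pdb_id in pdb_ids:
        path = os.path.join(pdb_dir, f"{pdb_id}.pdb")
        if not os.path.exists(path):  # reuse files downloaded by an earlier run
            urllib.request.urlretrieve(RCSB_URL.format(pdb_id), path)
        paths.append(path)
    return paths


# Usage with in-memory rows (the real TSV layout is an assumption)
rows = ["2xqh\tA\n", "2xqh\tB\n"]
print(pdb_ids_from_tsv(rows))  # ['2xqh']
```

Deduplicating the IDs means a structure listed once per chain is downloaded only once.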

Library

GeomeTRe can also be used as a module directly in a Python script:

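```python
from geometre.process import compute

input_file, chain, out_file = "2xqh.pdb", "A", "result.csv"  # structure file, chain ID, output CSV
df = compute(filepath=input_file, chain=chain, units_ids='161_175,176_189,190_203,204_217', o_path=out_file, ins_ids='351_372')
```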

Formats

Output single mode

CSV table with the computed parameters (.csv):
  • pdb_id: the PDB ID of the molecule
  • chain: chain of the PDB structure
  • unit_start: start position of the repeat unit
  • unit_end: end position of the repeat unit
  • curvature: the angle between the vectors connecting the rotation center to two consecutive units
  • twist: the component of the rotation that aligns the two units, with respect to the twist axis
  • twist_hand: the handedness of the rotation with respect to the twist axis
  • pitch: computed the same way as the twist, but orthogonalizing with respect to the pitch axis
  • pitch_hand: the handedness of the rotation with respect to the pitch axis
  • tm-score: the TM-score of the structural alignment
  • yaw: the residual yaw rotation. In a perfect structure this is 0, because the yaw is already compensated for when the axes of the two units are aligned to the standard reference axes. A high yaw indicates that the algorithm performed poorly for that unit pair.

The last two rows report the mean and standard deviation of each parameter. The parameter values in the first row are all zeros, since each row describes a unit relative to the unit before it and the first unit has no predecessor.
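
The curvature definition can be written out directly. In this sketch the rotation center is assumed to be already known (the README does not describe how GeomeTRe locates it), the unit centroids are plain 3D coordinates, and `curvature_angle` is a hypothetical name.

```python
import numpy as np


def curvature_angle(center, centroid_a, centroid_b) -> float:
    """Hypothetical helper: angle in degrees between center->centroid_a and center->centroid_b."""
    u = np.asarray(centroid_a, dtype=float) - np.asarray(center, dtype=float)
    v = np.asarray(centroid_b, dtype=float) - np.asarray(center, dtype=float)
    cos_angle = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    # Clip to [-1, 1] so rounding errors cannot push arccos out of its domain
    return float(np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0))))


# Usage: two unit centroids a quarter turn apart around a center at the origin
print(curvature_angle([0, 0, 0], [10, 0, 0], [0, 10, 0]))  # approximately 90.0
```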

PyMOL parameters for drawing (.npy)

PyMOL drawing

The PyMOL drawing contains the following axes:
  • In red, the twist axis of each repeat unit (RU), which is always parallel to the longest dimension of the protein
  • In green, the pitch axis of each RU
  • In blue, the yaw (curvature) axis of each RU

An example PyMOL drawing (PNG) with explanatory text is shown below.

Example of PyMOL drawing
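
The twist axis is described as parallel to the longest dimension. One standard way to estimate the longest dimension of a set of coordinates is principal component analysis; the sketch below takes the first principal axis via SVD. Whether GeomeTRe derives its twist axis this way is an assumption, `principal_axis` is a hypothetical name, and the sign of the returned vector is arbitrary.

```python
import numpy as np


def principal_axis(coords: np.ndarray) -> np.ndarray:
    """Hypothetical helper: unit vector along the largest spread of an (N, 3) array."""
    centered = coords - coords.mean(axis=0)
    # The first right-singular vector of the centred coordinates is the first principal component
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[0]


# Usage: points stretched along x give an axis close to (1, 0, 0) or (-1, 0, 0)
rng = np.random.default_rng(1)
points = rng.normal(size=(200, 3)) * np.array([10.0, 1.0, 1.0])
print(principal_axis(points))
```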

Output batch mode

  • pdb id: the PDB id of the molecule
  • chain: chain of PDB structure
  • curvature_mean: mean of curvature
  • curvature_std: standard deviation of curvature
  • twist_mean: mean of twist
  • twist_std: standard deviation of twist
  • twist_hand_mean: mean of twist handedness
  • twist_hand_std: standard deviation of twist handedness
  • pitch_mean: mean of pitch
  • pitch_std: standard deviation of pitch
  • pitch_hand_mean: mean of pitch handedness
  • pitch_hand_std: standard deviation of pitch handedness
  • tm-score_mean: mean of the TM-score (computed with tmtools)
  • tm-score_std: standard deviation of the TM-score
  • yaw_mean: mean of yaw
  • yaw_std: standard deviation of yaw
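
The batch columns are per-chain summaries of the single-mode parameters. A pandas sketch of that aggregation follows, producing columns such as `curvature_mean` and `curvature_std`. It assumes input column names (`pdb_id`, `chain`, and the parameter names), one row per consecutive unit pair without the summary rows, and pandas' default sample standard deviation (`ddof=1`); GeomeTRe may use a different estimator or exclude rows differently.

```python
import pandas as pd

# Assumed per-pair parameter columns, modelled on the single-mode output
PARAMS = ["curvature", "twist", "twist_hand", "pitch", "pitch_hand", "tm-score", "yaw"]


def summarize(per_pair: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one row per (pdb_id, chain) with <param>_mean / <param>_std columns."""
    grouped = per_pair.groupby(["pdb_id", "chain"])[PARAMS].agg(["mean", "std"])
    # Flatten the (parameter, statistic) MultiIndex into names like "curvature_mean"
    grouped.columns = [f"{param}_{stat}" for param, stat in grouped.columns]
    return grouped.reset_index()


# Usage with made-up values for one chain and three unit pairs
pairs = pd.DataFrame({"pdb_id": ["2xqh"] * 3, "chain": ["A"] * 3})
for offset, param in enumerate(PARAMS):
    pairs[param] = [offset + 0.0, offset + 1.0, offset + 2.0]
summary = summarize(pairs)
print(summary.columns.tolist()[:4])  # ['pdb_id', 'chain', 'curvature_mean', 'curvature_std']
```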

data/input_batch_short.tsv

TSV file listing proteins from the RepeatsDB database (version 4), used as input for batch-mode calculations.

Owner

  • Name: BioComputing Group, University of Padova
  • Login: BioComputingUP
  • Kind: organization
  • Email: biocomp@bio.unipd.it
  • Location: Italy

GitHub Events

Total
  • Release event: 1
  • Watch event: 1
  • Delete event: 1
  • Push event: 7
  • Public event: 1
  • Create event: 1
Last Year
  • Release event: 1
  • Watch event: 1
  • Delete event: 1
  • Push event: 7
  • Public event: 1
  • Create event: 1

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 121
  • Total Committers: 3
  • Avg Commits per committer: 40.333
  • Development Distribution Score (DDS): 0.554
Past Year
  • Commits: 65
  • Committers: 2
  • Avg Commits per committer: 32.5
  • Development Distribution Score (DDS): 0.308
Top Committers
Name Email Commits
Elisa-blip 1****p 54
Zarifa Osmanli z****i@g****m 47
Damiano Piovesan d****n@g****m 20

Issues and Pull Requests

Last synced: 9 months ago