https://github.com/biocomputingup/geometre
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file
  Found codemeta.json file
- ✓ .zenodo.json file
  Found .zenodo.json file
- ✓ DOI references
  Found 2 DOI reference(s) in README
- ✓ Academic publication links
  Links to: zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity
  Low similarity (15.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: BioComputingUP
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Size: 22.1 MB
Statistics
- Stars: 4
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 1
Metadata Files
README.md
GeomeTRe
GeomeTRe calculates geometrical properties of tandem repeat proteins. It requires a protein structure and the start and end positions of each repeat unit (with optional insertion positions) as input. If insertion positions are provided, those segments are excluded to improve accuracy.
For most known structured tandem repeat proteins (STRPs), repeat unit and insertion coordinates are available from the manually curated RepeatsDB database.
Processing 100 structures takes about 2 minutes with 10 threads and an SSD.
Algorithm
The algorithm computes the three Tait-Bryan angles - yaw, pitch, and roll - by simulating an airplane traversing the protein from its N-terminus to C-terminus. In this analogy, the airplane points to the centroid of the next repeat unit, and the angles correspond to the maneuvers required to move from one unit to the next.
The algorithm also determines handedness, defined by the roll direction of movement (clockwise/right-handed or anticlockwise/left-handed), and the sign of the pitch (positive for upward, negative for downward movement).
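The airplane analogy can be made concrete with a small sketch. This is not GeomeTRe's implementation: it assumes the common aerospace convention (yaw about z, pitch about y, roll about x, composed as R = Rz·Ry·Rx), and every function name below is hypothetical. It builds a rotation from three angles and then recovers them, similar in spirit to decomposing the rotation between two consecutive repeat units.

```python
import math


def rot_x(a):
    """Rotation about the x axis (roll, by assumption)."""
    c, s = math.cos(a), math.sin(a)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]


def rot_y(a):
    """Rotation about the y axis (pitch, by assumption)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0, s], [0, 1, 0], [-s, 0, c]]


def rot_z(a):
    """Rotation about the z axis (yaw, by assumption)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def compose(yaw, pitch, roll):
    """Build R = Rz(yaw) * Ry(pitch) * Rx(roll); angles in radians."""
    return matmul(rot_z(yaw), matmul(rot_y(pitch), rot_x(roll)))


def decompose(r):
    """Recover (yaw, pitch, roll) in radians from R = Rz * Ry * Rx.

    Valid away from gimbal lock, i.e. when |pitch| is below 90 degrees.
    """
    yaw = math.atan2(r[1][0], r[0][0])
    # Clamp to [-1, 1] to guard against floating-point drift before asin.
    pitch = -math.asin(max(-1.0, min(1.0, r[2][0])))
    roll = math.atan2(r[2][1], r[2][2])
    return yaw, pitch, roll


# Round trip: the angles put in are the angles that come out.
rotation = compose(math.radians(30), math.radians(-20), math.radians(45))
print([round(math.degrees(a), 6) for a in decompose(rotation)])
```

Under this assumed convention, the sign of the recovered roll gives the rotation direction and the sign of the pitch distinguishes upward from downward movement; how GeomeTRe maps signs to handedness is not specified here.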
Installation
The software can be installed with pip, or you can clone the repository and run it directly.
To enable the optional PyMOL visualization, we recommend creating a new environment and installing the PyMOL bundle first, followed by the other dependencies.
Installation with pip
```bash
pip install git+https://github.com/BioComputingUP/GeomeTRe.git
```
Installation with Conda and PyMOL
```bash
# Create and activate a new conda environment
conda create -n geometre
conda activate geometre
# Install PyMOL
conda install -c conda-forge -c schrodinger pymol-bundle
# Clone the GeomeTRe repository
git clone https://github.com/BioComputingUP/GeomeTRe.git
# Set path to the module in your environment
export PYTHONPATH="${PYTHONPATH}:/home/user/Desktop/GeomeTRe/src/"
```
Dependencies
The following dependencies are required to run:
- Python 3.9 or higher
- Packages (installed automatically via pip install):
  - numpy==2.2.2
  - pandas==2.2.3
  - scipy==1.15.0
  - scikit-learn==1.6.0
  - biopython==1.84
  - tmtools==0.2.0
  - scikit-image==0.25.0
  - requests==2.32.3
PyMOL: PyMOL must be installed via conda before running the package if you intend to enable visualization.
conda install -c conda-forge pymol-open-source=2.5.0
Usage
GeomeTRe can be used in single mode to process one structure, in batch mode to process an entire dataset, or in drawing mode to render existing results in PyMOL.
Single structure execution
```bash
# Single mode without pip: invoke main.py directly
python3 main.py single 2xqh.pdb A result.csv 161_175,176_189,190_203,204_217,218_233,234_249,250_263,264_276,305_326,327_350,373_392,393_416 -ins_def 351_372
# Single mode with pip installation. Arguments: pdb id, chain, output file, units, insertions (optional)
geometre single 2xqh.pdb A result.csv 161_175,176_189,190_203,204_217,218_233,234_249,250_263,264_276,305_326,327_350,373_392,393_416 -ins_def 351_372
```
Visualize output in PyMOL
```bash
# Visualize without pip
python3 main.py draw 2xqh.pdb result.npy
# Visualize with pip installation
geometre pymol 2xqh.pdb result.npy
```
Batch execution
```bash
# Download structures: PDB IDs are extracted from the first column of the TSV file and the structures are downloaded into the pdb_dir folder.
python3 main.py batch data/input_batch_short.tsv data/result_batch.csv -pdb_dir data/pdbs -threads 4
# If the -pdb_dir argument is omitted, the program does not download structures by PDB id;
# instead, it expects them at the paths given in the first column of the TSV file (see the Formats section below)
python3 main.py batch data/input_batch_short.tsv data/result_batch.csv -threads 4
# With pip installation
geometre batch data/input_batch_short.tsv data/result_batch.csv -threads 4
```
Library
GeomeTRe can be used as a module directly in a Python script:
```python
from geometre.process import compute

df = compute(filepath=input_file, chain=chain, units_ids='161_175,176_189,190_203,204_217', o_path=out_file, ins_ids='351_372')
```
Formats
Output single mode
CSV table with the computed parameters (.csv)
- pdb_id: the PDB id of the molecule
- chain: chain of PDB structure
- unit_start: start position of repeat unit
- unit_end: end position of repeat unit
- curvature: the curvature, computed as the angle between the vectors connecting the rotation center to two consecutive units
- twist: the twist, computed as the component of the rotation that aligns the two units w.r.t. the twist axis
- twist_hand: the handedness of the rotation w.r.t. the twist axis
- pitch: the pitch, computed the same way as the twist, but orthogonalizing w.r.t. the pitch axis
- pitch_hand: the handedness of the rotation w.r.t. the pitch axis
- tm-score: the TM-score of the structural alignment
- yaw: the residual yaw rotation. In a perfect structure this is 0, because the yaw is already compensated for when the axes of the two units are aligned to the standard reference axes. A high yaw indicates poor performance of the algorithm for that unit pair.
- Additionally, the last 2 rows show the mean and standard deviation of each parameter. The first column is all zeros, since the rows refer to the unit and the unit before it.
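Because the last two rows of the single-mode CSV are summary statistics rather than unit pairs, it can help to separate them before further analysis. The sketch below uses only the standard library; the sample values are made up for illustration, only a subset of columns is shown, and the row order (mean first, then standard deviation) is an assumption based on the description above. The helper name `split_result` is hypothetical.

```python
import csv
import io

# Made-up stand-in for a single-mode result file (subset of columns only).
# The last two rows play the role of the mean and standard deviation rows.
SAMPLE = """\
curvature,twist,pitch,yaw
0.40,0.10,0.05,0.01
0.42,0.12,0.06,0.02
0.44,0.14,0.07,0.03
0.42,0.12,0.06,0.02
0.02,0.02,0.01,0.01
"""


def split_result(handle):
    """Return (per_unit_rows, summary_rows) from an open CSV handle.

    Assumes the final two rows are the summary rows, as described above.
    """
    rows = list(csv.DictReader(handle))
    if len(rows) < 2:
        raise ValueError("expected at least the two summary rows")
    return rows[:-2], rows[-2:]


per_unit, (mean_row, std_row) = split_result(io.StringIO(SAMPLE))
print(len(per_unit), "unit-pair rows")
print("mean curvature:", mean_row["curvature"])
print("std curvature:", std_row["curvature"])
```

For a real file, replace `io.StringIO(SAMPLE)` with `open("result.csv", newline="")`.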
PyMOL parameters for drawing (.npy)
PyMOL drawing
The PyMOL drawing contains the following axes:
- In red, the twist axis of each repeat unit (RU), which is always parallel to the longest dimension of the protein
- In green, the pitch axis of each RU
- In blue, the yaw (curvature) axis of each RU

An example PyMOL drawing in PNG format with explanatory text is shown below.

Output batch mode
- pdb id: the PDB id of the molecule
- chain: chain of PDB structure
- curvature_mean: mean of curvature
- curvature_std: standard deviation of curvature
- twist_mean: mean of twist
- twist_std: standard deviation of twist
- twist_hand_mean: mean of twist handedness
- twist_hand_std: standard deviation of twist handedness
- pitch_mean: mean of pitch
- pitch_std: standard deviation of pitch
- pitch_hand_mean: mean of pitch handedness
- pitch_hand_std: standard deviation of pitch handedness
- tm-score_mean: mean of the TM-score
- tm-score_std: standard deviation of the TM-score
- yaw_mean: mean of yaw
- yaw_std: standard deviation of yaw
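Since a high residual yaw signals that the algorithm performed poorly on a unit pair, one plausible use of the batch table is to flag structures whose `yaw_mean` is large. The sketch below is illustrative only: the threshold is a hypothetical value (the units of yaw are not stated here), the identifiers `xxxx`, `yyyy`, and `zzzz` are placeholders, the numbers are made up, and the header names are copied from the column list above rather than from a real output file.

```python
import csv
import io

# Made-up stand-in for a batch result file (subset of columns only).
SAMPLE_BATCH = """\
pdb id,chain,yaw_mean,yaw_std
xxxx,A,0.02,0.01
yyyy,B,0.35,0.20
zzzz,A,0.05,0.02
"""

# Hypothetical cut-off; choose one that suits your data and the yaw units.
YAW_THRESHOLD = 0.2


def flag_high_yaw(handle, threshold=YAW_THRESHOLD):
    """Return the rows whose absolute yaw_mean exceeds the threshold."""
    flagged = []
    for row in csv.DictReader(handle):
        if abs(float(row["yaw_mean"])) > threshold:
            flagged.append(row)
    return flagged


flagged = flag_high_yaw(io.StringIO(SAMPLE_BATCH))
for row in flagged:
    print(row["pdb id"], row["chain"], row["yaw_mean"])
```

Flagged entries are candidates for manual inspection, for example by checking the repeat unit boundaries or the PyMOL drawing.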
data/input_batch_short.tsv
TSV file containing the proteins in the RepeatsDB database (version 4) for batch mode calculations.
Owner
- Name: BioComputing Group, University of Padova
- Login: BioComputingUP
- Kind: organization
- Email: biocomp@bio.unipd.it
- Location: Italy
- Website: https://biocomputingup.it/
- Repositories: 31
- Profile: https://github.com/BioComputingUP
GitHub Events
Total
- Release event: 1
- Watch event: 1
- Delete event: 1
- Push event: 7
- Public event: 1
- Create event: 1
Last Year
- Release event: 1
- Watch event: 1
- Delete event: 1
- Push event: 7
- Public event: 1
- Create event: 1
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Elisa-blip | 1****p | 54 |
| Zarifa Osmanli | z****i@g****m | 47 |
| Damiano Piovesan | d****n@g****m | 20 |
Issues and Pull Requests
Last synced: 9 months ago