Science Score: 77.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 4 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ✓ Committers with academic emails: 2 of 29 committers (6.9%) from academic institutions
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.9%) to scientific vocabulary
Repository
Protein Graph Library
Basic Info
- Host: GitHub
- Owner: a-r-j
- License: mit
- Language: Jupyter Notebook
- Default Branch: master
- Homepage: https://graphein.ai/
- Size: 86.5 MB
Statistics
- Stars: 1,119
- Watchers: 18
- Forks: 137
- Open Issues: 55
- Releases: 19
Metadata Files
README.md
Documentation | Paper | Tutorials | Installation
Protein & Interactomic Graph Library
This package provides functionality for producing geometric representations of protein and RNA structures, and biological interaction networks. We provide compatibility with standard PyData formats, as well as graph objects designed for ease of use with popular deep learning libraries.
What's New?
| Version | Feature |
|---|---|
| 1.7.0 | FoldComp Datasets |
| 1.7.0 | Creating Datasets from the PDB |
| 1.6.0 | Protein Tensor Module |
| 1.5.0 | Protein Graph Creation from AlphaFold2! |
| 1.5.0 | RNA Graph Construction from Dotbracket notation |
| 1.4.0 | Constructing molecular graphs |
| 1.3.0 | Ready-to-go Dataloaders for PyTorch Geometric |
| 1.2.0 | Extracting subgraphs from protein graphs |
| 1.2.0 | Protein Graph Analytics |
| 1.2.0 | Graphein CLI |
| 1.2.0 | Protein Graph Visualisation! |
| 1.1.0 | Protein - Protein Interaction Network Support & Structural Interactomics (Using AlphaFold2!) |
| 1.0.0 | High and Low-level API for massive flexibility - create your own bespoke workflows! |
Example usage
Graphein provides both a programmatic API and a command-line interface for constructing graphs.
CLI
Graphein configs can be specified as .yaml files to batch process graphs from the commandline.
```bash
graphein -c config.yaml -p path/to/pdbs -o path/to/output
```
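For scripted batch runs, the same command can be assembled from Python. The sketch below only builds the argument list from the three flags shown above (`-c`, `-p`, `-o`). The helper name `build_graphein_command` is hypothetical, and actually running the command assumes the `graphein` executable is on your PATH.

```python
import shlex
from typing import List


def build_graphein_command(config: str, pdb_dir: str, out_dir: str) -> List[str]:
    """Assemble the CLI call shown above as an argument list.

    Hypothetical helper: it only uses the -c, -p and -o flags from the
    example command and does not validate the config contents.
    """
    return ["graphein", "-c", config, "-p", pdb_dir, "-o", out_dir]


cmd = build_graphein_command("config.yaml", "path/to/pdbs", "path/to/output")
print(shlex.join(cmd))

# To execute it (requires graphein to be installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems when paths contain spaces.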
Creating a Protein Graph
| | | |
|---|---|---|
| Tutorial (Residue-level) | Tutorial (Atomic) | Docs |

```python
from graphein.protein.config import ProteinGraphConfig
from graphein.protein.graphs import construct_graph

# Build a graph for PDB entry 3eiy using the default configuration
config = ProteinGraphConfig()
g = construct_graph(config=config, pdb_code="3eiy")
```
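To make the idea of a residue-level graph concrete, here is a library-free sketch. It is not Graphein's implementation: nodes are residue indices, consecutive residues are joined by a backbone edge, and any other pair of residues closer than a cutoff gets a spatial edge. The toy coordinates, the 6 Å cutoff, and the function name `build_residue_graph` are illustrative assumptions.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

Coord = Tuple[float, float, float]


def build_residue_graph(
    coords: List[Coord], cutoff: float = 6.0
) -> Dict[str, Set[Tuple[int, int]]]:
    """Return backbone and distance-based edges over residue indices."""
    # Consecutive residues along the chain.
    backbone = {(i, i + 1) for i in range(len(coords) - 1)}
    # Non-consecutive residues that sit within `cutoff` of each other.
    spatial = {
        (i, j)
        for i, j in combinations(range(len(coords)), 2)
        if j - i > 1 and math.dist(coords[i], coords[j]) <= cutoff
    }
    return {"backbone": backbone, "spatial": spatial}


# Toy alpha-carbon positions (in angstroms) for a five-residue chain that folds back on itself.
toy_coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
]
edges = build_residue_graph(toy_coords)
print(sorted(edges["backbone"]))
print(sorted(edges["spatial"]))
```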
Creating a Protein Graph from the AlphaFold Protein Structure Database
| | |
|---|---|
| Tutorial | Docs |

```python
from graphein.protein.config import ProteinGraphConfig
from graphein.protein.graphs import construct_graph
from graphein.protein.utils import download_alphafold_structure

config = ProteinGraphConfig()
# Download the predicted structure for UniProt accession Q5VSL9, then build the graph from the file
fp = download_alphafold_structure("Q5VSL9", aligned_score=False)
g = construct_graph(config=config, path=fp)
```
Creating a Protein Mesh
| | |
|---|---|
| Tutorial | Docs |

```python
from graphein.protein.config import ProteinMeshConfig
from graphein.protein.meshes import create_mesh

# Default mesh configuration
config = ProteinMeshConfig()
verts, faces, aux = create_mesh(pdb_code="3eiy", config=config)
```
Creating Molecular Graphs
Graphein can create molecular graphs from SMILES strings as well as .sdf, .mol2, and .pdb files.

| | |
|---|---|
| Tutorial | Docs |

```python
from graphein.molecule.config import MoleculeGraphConfig
from graphein.molecule.graphs import construct_graph

# Default molecule configuration; the SMILES string below is aspirin
config = MoleculeGraphConfig()
g = construct_graph(smiles="CC(=O)OC1=CC=CC=C1C(=O)O", config=config)
```
Creating an RNA Graph
| | |
|---|---|
| Tutorial | Docs |

```python
from graphein.rna.graphs import construct_rna_graph

# Build the graph from a dotbracket & optional sequence
rna = construct_rna_graph(
    dotbracket='..(((((..(((...)))..)))))...',
    sequence='UUGGAGUACACAACCUGUACACUCUUUC',
)
```
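Dot-bracket notation encodes secondary structure: matching parentheses mark paired bases and dots mark unpaired ones. The sketch below is independent of Graphein and uses a hypothetical helper, `dotbracket_to_edges`, to recover base-pair edges with a stack and add backbone edges between neighbouring positions. It handles only `(`, `)` and `.`, not pseudoknot brackets.

```python
from typing import List, Tuple

EdgeList = List[Tuple[int, int]]


def dotbracket_to_edges(dotbracket: str) -> Tuple[EdgeList, EdgeList]:
    """Return (backbone_edges, base_pair_edges) as 0-based index pairs."""
    backbone = [(i, i + 1) for i in range(len(dotbracket) - 1)]
    pairs: EdgeList = []
    stack: List[int] = []  # positions of '(' still waiting for a partner
    for i, symbol in enumerate(dotbracket):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"Unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif symbol != ".":
            raise ValueError(f"Unsupported symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"Unmatched '(' at position {stack[-1]}")
    return backbone, pairs


dotbracket = "..(((((..(((...)))..)))))..."
sequence = "UUGGAGUACACAACCUGUACACUCUUUC"
backbone, pairs = dotbracket_to_edges(dotbracket)
for i, j in pairs:
    print(f"{sequence[i]}{i} pairs with {sequence[j]}{j}")
```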
Creating a Protein-Protein Interaction Graph
| | |
|---|---|
| Tutorial | Docs |

```python
from graphein.ppi.config import PPIGraphConfig
from graphein.ppi.graphs import compute_ppi_graph
from graphein.ppi.edges import add_string_edges, add_biogrid_edges

config = PPIGraphConfig()
protein_list = ["CDC42", "CDK1", "KIF23", "PLK1", "RAC2", "RACGAP1", "RHOA", "RHOB"]

# Combine interaction edges from STRING and BioGRID
g = compute_ppi_graph(
    config=config,
    protein_list=protein_list,
    edge_construction_funcs=[add_string_edges, add_biogrid_edges],
)
```
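The `edge_construction_funcs` argument takes a list of callables, so edge sources can be mixed and matched. The following library-free sketch imitates that pattern with a plain dictionary graph. The interactions are made-up placeholders, not real STRING or BioGRID data, and `make_edge_func` and `build_graph` are hypothetical stand-ins rather than Graphein functions.

```python
from typing import Callable, Dict, List, Tuple

Graph = Dict[str, set]
EdgeFunc = Callable[[Graph], Graph]

# Made-up interactions standing in for database lookups.
FAKE_SOURCE_A = [("CDC42", "RACGAP1"), ("CDK1", "PLK1")]
FAKE_SOURCE_B = [("CDK1", "PLK1"), ("RHOA", "TP53")]


def make_edge_func(source: str, interactions: List[Tuple[str, str]]) -> EdgeFunc:
    """Create an edge function that adds interactions between known nodes."""

    def add_edges(graph: Graph) -> Graph:
        for u, v in interactions:
            # Skip interactions involving proteins outside the requested list.
            if u in graph["nodes"] and v in graph["nodes"]:
                graph["edges"].add((u, v, source))
        return graph

    return add_edges


def build_graph(protein_list: List[str], edge_construction_funcs: List[EdgeFunc]) -> Graph:
    """Start from the node set, then let each function annotate the graph in turn."""
    graph: Graph = {"nodes": set(protein_list), "edges": set()}
    for func in edge_construction_funcs:
        graph = func(graph)
    return graph


toy = build_graph(
    ["CDC42", "CDK1", "PLK1", "RACGAP1", "RHOA"],
    [make_edge_func("source_a", FAKE_SOURCE_A), make_edge_func("source_b", FAKE_SOURCE_B)],
)
print(sorted(toy["edges"]))
```

Each edge keeps a label for the source it came from, so an interaction reported by both sources appears once per source.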
Creating a Gene Regulatory Network Graph
| | |
|---|---|
| Tutorial | Docs |

```python
from functools import partial

from graphein.grn.config import GRNGraphConfig
from graphein.grn.graphs import compute_grn_graph
from graphein.grn.edges import add_regnetwork_edges, add_trrust_edges

config = GRNGraphConfig()
gene_list = ["AATF", "MYC", "USF1", "SP1", "TP53", "DUSP1"]

# Bind each edge function to the filtering functions from the config
g = compute_grn_graph(
    gene_list=gene_list,
    edge_construction_funcs=[
        partial(add_trrust_edges, trrust_filtering_funcs=config.trrust_config.filtering_functions),
        partial(add_regnetwork_edges, regnetwork_filtering_funcs=config.regnetwork_config.filtering_functions),
    ],
)
```
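`functools.partial` is used above to pre-bind the filtering functions, so that each edge function can later be called without them. A minimal standalone sketch of that idea follows, with a hypothetical `add_edges` function, a toy filter, and invented confidence scores in place of the TRRUST and RegNetwork helpers:

```python
from functools import partial
from typing import Callable, Iterable, List, Tuple

Interaction = Tuple[str, str, float]  # (regulator, target, confidence)


def add_edges(
    edges: List[Interaction],
    candidates: Iterable[Interaction],
    filtering_funcs: Iterable[Callable[[Interaction], bool]] = (),
) -> List[Interaction]:
    """Append candidates that pass every filter (hypothetical helper)."""
    filters = list(filtering_funcs)
    for interaction in candidates:
        if all(keep(interaction) for keep in filters):
            edges.append(interaction)
    return edges


def high_confidence(interaction: Interaction) -> bool:
    """Toy filter: keep interactions scored at 0.5 or above."""
    return interaction[2] >= 0.5


# Invented scores, for illustration only.
candidates = [("MYC", "TP53", 0.9), ("SP1", "DUSP1", 0.2)]

# Bind the data source and the filters now; supply the edge list later.
add_filtered = partial(add_edges, candidates=candidates, filtering_funcs=[high_confidence])
edges = add_filtered([])
print(edges)
```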
Installation
Pip
The simplest install is via pip. N.B. this does not install the ML/DL libraries, which are required for conversion to their data formats and for generating protein structure meshes with PyTorch3D. Further details are given in the documentation.
```bash
pip install graphein # For base install
pip install graphein[extras] # For additional featurisation dependencies
pip install graphein[dev] # For dev dependencies
pip install graphein[all] # To get the lot
```
However, there are a number of (optional) utilities (DSSP, PyMol, GetContacts) that are not available via PyPI:
```
conda install -c salilab dssp # Required for computing secondary structural features
conda install -c schrodinger pymol # Required for PyMol visualisations & mesh generation

# GetContacts - used as an alternative way to compute intramolecular interactions
conda install -c conda-forge vmd-python
git clone https://github.com/getcontacts/getcontacts

# Add folder to PATH
echo "export PATH=\$PATH:`pwd`/getcontacts" >> ~/.bashrc
source ~/.bashrc

# To test the installation, run:
cd getcontacts/example/5xnd
get_dynamic_contacts.py --topology 5xnd_topology.pdb \
                        --trajectory 5xnd_trajectory.dcd \
                        --itypes hb \
                        --output 5xnd_hbonds.tsv
```
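Because these tools and the deep learning libraries are optional, it can help to check what is available before running a pipeline that needs them. The sketch below uses only the standard library. The module names `torch`, `torch_geometric` and `pytorch3d` and the executable names `mkdssp` and `pymol` are assumptions about how the optional dependencies are exposed on a typical install, while `get_dynamic_contacts.py` is the script used in the test command above.

```python
import importlib.util
import shutil
from typing import Dict, Iterable


def find_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report whether each top-level module can be imported."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


def find_executables(names: Iterable[str]) -> Dict[str, bool]:
    """Report whether each executable is found on PATH."""
    return {name: shutil.which(name) is not None for name in names}


# Assumed names; adjust them to match your environment.
print(find_modules(["torch", "torch_geometric", "pytorch3d"]))
print(find_executables(["mkdssp", "pymol", "get_dynamic_contacts.py"]))
```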
Conda environment
The dev environment includes GPU Builds (CUDA 11.1) for each of the deep learning libraries integrated into graphein.
```bash
git clone https://www.github.com/a-r-j/graphein
cd graphein
conda env create -f environment-dev.yml
pip install -e .
```
A lighter install can be performed with:
```bash
git clone https://www.github.com/a-r-j/graphein
cd graphein
conda env create -f environment.yml
pip install -e .
```
Dockerfile
We provide two docker-compose files for local use: one for CPU (docker-compose.cpu.yml) and one for GPU (docker-compose.yml). For GPU usage, please ensure that you have the NVIDIA Container Toolkit installed. After entering the container, install the locally mounted volume (pip install -e .). This will also set up the dev environment locally.
To build (GPU) run:
```
docker-compose up -d --build # start the container
docker-compose down # stop the container
```
Citing Graphein
Please consider citing graphein if it proves useful in your work.
```bibtex
@inproceedings{jamasb2022graphein,
  title={Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Biomolecular Structures and Interaction Networks},
  author={Arian Rokkum Jamasb and Ramon Vi{\~n}as Torn{\'e} and Eric J Ma and Yuanqi Du and Charles Harris and Kexin Huang and Dominic Hall and Pietro Lio and Tom Leon Blundell},
  booktitle={Advances in Neural Information Processing Systems},
  editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
  year={2022},
  url={https://openreview.net/forum?id=9xRZlV6GfOX}
}
```
Owner
- Name: Arian Jamasb
- Login: a-r-j
- Kind: user
- Location: Basel
- Company: University of Cambridge
- Website: jamasb.io
- Twitter: arian_jamasb
- Repositories: 32
- Profile: https://github.com/a-r-j
Principal ML Scientist @PrescientDesign / Tensor Jockey / PhD @ University of Cambridge Prev: MILA, Google X, Relation Therapeutic
Citation (citation.bib)
@inproceedings{jamasb2022graphein,
title={Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Biomolecular Structures and Interaction Networks},
author={Arian Rokkum Jamasb and Ramon Vi{\~n}as Torn{\'e} and Eric J Ma and Yuanqi Du and Charles Harris and Kexin Huang and Dominic Hall and Pietro Lio and Tom Leon Blundell},
booktitle={Advances in Neural Information Processing Systems},
editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
year={2022},
url={https://openreview.net/forum?id=9xRZlV6GfOX}
}
GitHub Events
Total
- Issues event: 18
- Watch event: 94
- Issue comment event: 58
- Push event: 14
- Pull request event: 11
- Fork event: 6
Last Year
- Issues event: 18
- Watch event: 94
- Issue comment event: 58
- Push event: 14
- Pull request event: 11
- Fork event: 6
Committers
Last synced: 9 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| a-r-j | a****b@g****m | 504 |
| Eric Ma | e****g@g****m | 84 |
| cch1999 | c****1@g****m | 31 |
| pre-commit-ci[bot] | 6****] | 24 |
| Kieran Didi | 5****i | 9 |
| Sean Aubin | s****n@p****m | 7 |
| Anton Bushuiev | 6****v | 7 |
| Ramon Viñas Torné | r****t@g****m | 7 |
| AbdulHamid Merii | 4****i | 4 |
| Alex Morehead | a****b@m****u | 4 |
| dependabot-preview[bot] | 2****] | 3 |
| kexinhuang12345 | k****3@n****u | 3 |
| Ryan Greenhalgh | 3****4 | 3 |
| Arian Jamasb | a****b@r****m | 2 |
| Cam | 7****m | 2 |
| Cam | 7****i | 2 |
| Manon Reau | m****u@g****m | 2 |
| Ollie Turnbull | o****1@g****m | 2 |
| avivko | 3****o | 2 |
| Chaitanya Joshi | c****9@g****m | 1 |
| ChuNan Liu | b****u@g****m | 1 |
| David Stein | 4****n | 1 |
| Ikko Eltociear Ashimine | e****r@g****m | 1 |
| Nicktf | 4****8 | 1 |
| Ruibin Liu | r****8@g****m | 1 |
| Steven Lee | 1****1@q****m | 1 |
| Tim | T****s | 1 |
| ricomnl | r****7@g****m | 1 |
| y6q9 | 4****u | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 71
- Total pull requests: 144
- Average time to close issues: about 2 months
- Average time to close pull requests: 26 days
- Total issue authors: 41
- Total pull request authors: 20
- Average comments per issue: 2.3
- Average comments per pull request: 2.31
- Merged pull requests: 114
- Bot issues: 0
- Bot pull requests: 28
Past Year
- Issues: 11
- Pull requests: 16
- Average time to close issues: 5 days
- Average time to close pull requests: 22 days
- Issue authors: 9
- Pull request authors: 5
- Average comments per issue: 0.82
- Average comments per pull request: 0.75
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 5
Top Authors
Issue Authors
- kamurani (8)
- a-r-j (8)
- pengzhangzhi (4)
- universvm (4)
- avivko (2)
- davidfstein (2)
- mawright (2)
- OliviaViessmann (2)
- Jrunchang (2)
- anton-bushuiev (2)
- l-Dr-MR-l (2)
- velocirraptor23 (2)
- johnnytam100 (2)
- 1511878618 (1)
- thollis23 (1)
Pull Request Authors
- a-r-j (73)
- pre-commit-ci[bot] (34)
- kierandidi (19)
- kamurani (9)
- anton-bushuiev (6)
- amorehead (6)
- AH-Merii (5)
- Linsastar (2)
- manonreau (2)
- chris-clem (2)
- elementare (2)
- davidfstein (1)
- chaitjo (1)
- rvinas (1)
- eltociear (1)
Dependencies
- actions/checkout v2 composite
- s-weigand/setup-conda v1 composite
- actions/checkout v3 composite
- dangoslen/changelog-enforcer v3 composite
- actions/checkout v2 composite
- actions/setup-python v2 composite
- actions/checkout v2 composite
- s-weigand/setup-conda v1 composite
- actions/checkout v2 composite
- s-weigand/setup-conda v1 composite
- pytorch/pytorch 1.9.1-cuda11.1-cudnn8-runtime build
- graphein-cpu latest
- graphein-gpu latest
- biopandas >=0.4.1
- biopython *
- bioservices >=1.10.0
- deepdiff *
- loguru *
- matplotlib >=3.4.3
- multipledispatch *
- networkx *
- numpy <1.24.0
- pandas *
- plotly *
- pydantic *
- pyyaml >=5.1,<6.0
- rich *
- rich-click *
- scikit-learn *
- scipy *
- seaborn *
- torchtyping *
- tqdm *
- typing_extensions *
- wget *
- xarray *
- black * development
- flake8 * development
- hypothesis * development
- interrogate * development
- isort * development
- nbstripout * development
- nbval * development
- pandoc * development
- pre-commit * development
- pycodestyle * development
- pydocstyle * development
- pytest * development
- pytest-cov * development
- pytest-xdist * development
- furo *
- ipython *
- m2r2 *
- nbsphinx *
- nbsphinx-link *
- nbstripout *
- pandoc *
- pydocstyle *
- sphinx *
- sphinx-copybutton *
- sphinx-inline-tabs *
- sphinxcontrib-gtagjs *
- sphinxext-opengraph *
- watermark *
- biovec *
- einops *
- mpl_chord_diagram ==0.3.2
- propy3 *
- pyaaisc *
- rdkit *
- selfies *
- smilite *
- bioservices *
- biovec *
- propy3 *
- pyaaisc *
