graphein

Protein Graph Library

https://github.com/a-r-j/graphein

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
    2 of 29 committers (6.9%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.9%) to scientific vocabulary

Keywords

bioinformatics computational-biology deep-learning dgl drug-discovery gene-regulatory-networks geometric-deep-learning graph-neural-networks interactome interactomics ppi-networks protein protein-data-bank protein-design protein-structure python pytorch pytorch-geometric rna structural-biology

Keywords from Contributors

molecule cryptocurrencies pdb-files pdb pandas-dataframe molecular-structures mol2 transformers spacy-extension mlops
Last synced: 6 months ago

Repository

Protein Graph Library

Basic Info
  • Host: GitHub
  • Owner: a-r-j
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: master
  • Homepage: https://graphein.ai/
  • Size: 86.5 MB
Statistics
  • Stars: 1,119
  • Watchers: 18
  • Forks: 137
  • Open Issues: 55
  • Releases: 19
Topics
bioinformatics computational-biology deep-learning dgl drug-discovery gene-regulatory-networks geometric-deep-learning graph-neural-networks interactome interactomics ppi-networks protein protein-data-bank protein-design protein-structure python pytorch pytorch-geometric rna structural-biology
Created over 6 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Funding License Citation

README.md

DOI: 10.1101/2020.07.15.204701 | Project Status: Active – The project has reached a stable, usable state and is being actively developed. | License: MIT | Code style: black



Documentation | Paper | Tutorials | Installation

Protein & Interactomic Graph Library

This package provides functionality for producing geometric representations of protein and RNA structures, and biological interaction networks. We provide compatibility with standard PyData formats, as well as graph objects designed for ease of use with popular deep learning libraries.

What's New?

| | | |
|---|---|---|
| 1.7.0 | FoldComp Datasets | Open In Colab |
| 1.7.0 | Creating Datasets from the PDB | Open In Colab |
| 1.6.0 | Protein Tensor Module | Open In Colab |
| 1.5.0 | Protein Graph Creation from AlphaFold2! | Open In Colab |
| 1.5.0 | RNA Graph Construction from Dotbracket notation | Open In Colab |
| 1.4.0 | Constructing molecular graphs | Open In Colab |
| 1.3.0 | Ready-to-go Dataloaders for PyTorch Geometric | Open In Colab |
| 1.2.0 | Extracting subgraphs from protein graphs | Open In Colab |
| 1.2.0 | Protein Graph Analytics | Open In Colab |
| 1.2.0 | Graphein CLI | |
| 1.2.0 | Protein Graph Visualisation! | Open In Colab |
| 1.1.0 | Protein - Protein Interaction Network Support & Structural Interactomics (Using AlphaFold2!) | Open In Colab |
| 1.0.0 | High and Low-level API for massive flexibility - create your own bespoke workflows! | Open In Colab |

Example usage

Graphein provides both a programmatic API and a command-line interface for constructing graphs.

CLI

Graphein configs can be specified as .yaml files to batch process graphs from the command line.

Docs

```bash
graphein -c config.yaml -p path/to/pdbs -o path/to/output
```
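When the CLI is driven from a script rather than typed by hand, it can help to assemble the argument list in Python first. The sketch below only builds and prints the command shown above; `build_graphein_command` is a hypothetical helper name, and the only flags used are the `-c`, `-p` and `-o` flags from the example. Running it through `subprocess` is left commented out because it assumes the `graphein` executable is on your PATH.

```python
import shlex


def build_graphein_command(config, pdb_dir, out_dir):
    """Return the argument list for the batch-processing call shown above.

    Hypothetical helper: it uses only the -c, -p and -o flags from the README.
    """
    return ["graphein", "-c", str(config), "-p", str(pdb_dir), "-o", str(out_dir)]


cmd = build_graphein_command("config.yaml", "path/to/pdbs", "path/to/output")
print(shlex.join(cmd))

# To actually run it (assumes graphein is installed and on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing a list instead of a single string avoids shell-quoting problems when paths contain spaces.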

Creating a Protein Graph

| | | |
|---|---|---|
| Tutorial (Residue-level) | Tutorial (Atomic) | Docs |
| Open In Colab | Open In Colab | |

```python
from graphein.protein.config import ProteinGraphConfig
from graphein.protein.graphs import construct_graph

# Build a graph for PDB entry 3eiy using the default configuration
config = ProteinGraphConfig()
g = construct_graph(config=config, pdb_code="3eiy")
```
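To give a feel for what a geometric graph over a structure looks like, here is a minimal standard-library sketch of a distance-threshold graph: nodes are residues with 3D coordinates, and two nodes are joined when they lie within a cutoff. This is not Graphein's implementation. The function name, the node labels and the coordinates are all made up for illustration, and the 5.0 cutoff is an arbitrary choice.

```python
import math
from itertools import combinations


def distance_graph(coords, threshold):
    """Return the set of node pairs whose Euclidean distance is <= threshold.

    coords maps a node label to an (x, y, z) tuple.
    """
    edges = set()
    for (a, pos_a), (b, pos_b) in combinations(coords.items(), 2):
        if math.dist(pos_a, pos_b) <= threshold:
            edges.add((a, b))
    return edges


# Toy coordinates, invented for illustration (not taken from any PDB entry)
toy_coords = {
    "A:ALA:1": (0.0, 0.0, 0.0),
    "A:GLY:2": (3.8, 0.0, 0.0),
    "A:SER:3": (7.6, 0.0, 0.0),
    "A:LYS:4": (3.8, 3.0, 0.0),
}

edges = distance_graph(toy_coords, threshold=5.0)
for edge in sorted(edges):
    print(edge)
```

Every pair except the first and third residue (7.6 apart) falls within the cutoff, so the toy graph has five edges.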

Creating a Protein Graph from the AlphaFold Protein Structure Database

| | |
|---|---|
| Tutorial | Docs |
| Open In Colab | |

```python
from graphein.protein.config import ProteinGraphConfig
from graphein.protein.graphs import construct_graph
from graphein.protein.utils import download_alphafold_structure

config = ProteinGraphConfig()
# Download the predicted structure for UniProt accession Q5VSL9, then build the graph from the file
fp = download_alphafold_structure("Q5VSL9", aligned_score=False)
g = construct_graph(config=config, path=fp)
```

Creating a Protein Mesh

| | |
|---|---|
| Tutorial | Docs |
| Open In Colab | |

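```python
from graphein.protein.config import ProteinMeshConfig
from graphein.protein.meshes import create_mesh

# The original snippet used `config` without defining it; a default config is assumed here
config = ProteinMeshConfig()
verts, faces, aux = create_mesh(pdb_code="3eiy", config=config)
```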

Creating Molecular Graphs

Graphein can create molecular graphs from SMILES strings as well as .sdf, .mol2, and .pdb files.

| | |
|---|---|
| Tutorial | Docs |
| Open In Colab | |

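```python
from graphein.molecule.config import MoleculeGraphConfig
from graphein.molecule.graphs import construct_graph

# The original snippet used `config` without defining it; a default config is assumed here
config = MoleculeGraphConfig()
# Call the imported construct_graph (the original called an undefined create_graph)
g = construct_graph(smiles="CC(=O)OC1=CC=CC=C1C(=O)O", config=config)
```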

Creating an RNA Graph

| | |
|---|---|
| Tutorial | Docs |
| Open In Colab | |

```python
from graphein.rna.graphs import construct_rna_graph

# Build the graph from a dotbracket & optional sequence
rna = construct_rna_graph(dotbracket='..(((((..(((...)))..)))))...',
                          sequence='UUGGAGUACACAACCUGUACACUCUUUC')
```
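For readers unfamiliar with dot-bracket notation, the sketch below shows how such a string maps onto graph edges. Each `(` is matched with its closing `)` using a stack to give base-pair edges, and consecutive positions give backbone edges. This is a standard-library illustration, not Graphein's code. `dotbracket_to_edges` is a hypothetical name, and the sketch handles only `.`, `(` and `)` (no pseudoknot brackets).

```python
def dotbracket_to_edges(dotbracket):
    """Return (backbone_edges, base_pair_edges) as lists of 0-based index pairs."""
    stack = []
    base_pairs = []
    for i, symbol in enumerate(dotbracket):
        if symbol == "(":
            stack.append(i)  # remember the opening position
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            base_pairs.append((stack.pop(), i))  # pair with the most recent '('
        elif symbol != ".":
            raise ValueError(f"unsupported symbol {symbol!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    backbone = [(i, i + 1) for i in range(len(dotbracket) - 1)]
    return backbone, base_pairs


backbone, base_pairs = dotbracket_to_edges("..(((((..(((...)))..)))))...")
print(len(backbone), "backbone edges")
print(len(base_pairs), "base-pair edges:", base_pairs)
```

For the 28-character string used in the example above, this yields 27 backbone edges and 8 base pairs, with the inner hairpin stem pairing positions 9 to 11 with 15 to 17 and the outer stem pairing 2 to 6 with 20 to 24.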

Creating a Protein-Protein Interaction Graph

| | |
|---|---|
| Tutorial | Docs |
| Open In Colab | |

```python
from graphein.ppi.config import PPIGraphConfig
from graphein.ppi.graphs import compute_ppi_graph
from graphein.ppi.edges import add_string_edges, add_biogrid_edges

config = PPIGraphConfig()
protein_list = ["CDC42", "CDK1", "KIF23", "PLK1", "RAC2", "RACGAP1", "RHOA", "RHOB"]

# Combine interaction edges from STRING and BioGRID
g = compute_ppi_graph(
    config=config,
    protein_list=protein_list,
    edge_construction_funcs=[add_string_edges, add_biogrid_edges],
)
```

Creating a Gene Regulatory Network Graph

| | |
|---|---|
| Tutorial | Docs |
| Open In Colab | |

```python
from functools import partial  # needed for the partial(...) calls below

from graphein.grn.config import GRNGraphConfig
from graphein.grn.graphs import compute_grn_graph
from graphein.grn.edges import add_regnetwork_edges, add_trrust_edges

config = GRNGraphConfig()
gene_list = ["AATF", "MYC", "USF1", "SP1", "TP53", "DUSP1"]

# Bind each edge function to its filtering functions before passing it in
g = compute_grn_graph(
    gene_list=gene_list,
    edge_construction_funcs=[
        partial(add_trrust_edges, trrust_filtering_funcs=config.trrust_config.filtering_functions),
        partial(add_regnetwork_edges, regnetwork_filtering_funcs=config.regnetwork_config.filtering_functions),
    ],
)
```
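The `partial(...)` calls in the example above bind extra keyword arguments to an edge function before it is handed to the graph builder. The toy sketch below illustrates that pattern under the assumption that a builder calls each edge function with the graph as its only argument. Every name in it (`build_graph`, `add_pair_edges`, `no_self_loops`) and the placeholder data are hypothetical and unrelated to Graphein's internals.

```python
from functools import partial


def build_graph(nodes, edge_construction_funcs):
    """Toy builder: calls every edge function with the graph as its only argument."""
    graph = {"nodes": list(nodes), "edges": []}
    for func in edge_construction_funcs:
        func(graph)
    return graph


def add_pair_edges(graph, pairs=(), filtering_funcs=()):
    """Toy edge function: add each candidate pair that passes every filter."""
    for pair in pairs:
        if all(keep(pair) for keep in filtering_funcs):
            graph["edges"].append(pair)


def no_self_loops(pair):
    return pair[0] != pair[1]


# Placeholder data, invented for illustration
candidate_pairs = [("a", "b"), ("b", "b"), ("c", "a")]

graph = build_graph(
    nodes=["a", "b", "c"],
    edge_construction_funcs=[
        # partial pre-fills the keyword arguments so the builder can call func(graph)
        partial(add_pair_edges, pairs=candidate_pairs, filtering_funcs=[no_self_loops]),
    ],
)
print(graph["edges"])
```

Because the builder only ever passes the graph, any per-source configuration has to be bound in advance, which is what `partial` does.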

Installation

Pip

The simplest install is via pip. N.B. this does not install the ML/DL libraries that are required for conversion to their data formats and for generating protein structure meshes with PyTorch3D. See the installation documentation for further details.

```bash
pip install graphein # For base install
pip install graphein[extras] # For additional featurisation dependencies
pip install graphein[dev] # For dev dependencies
pip install graphein[all] # To get the lot
```

However, there are a number of (optional) utilities (DSSP, PyMol, GetContacts) that are not available via PyPI:

```
conda install -c salilab dssp # Required for computing secondary structural features
conda install -c schrodinger pymol # Required for PyMol visualisations & mesh generation

# GetContacts - used as an alternative way to compute intramolecular interactions
conda install -c conda-forge vmd-python
git clone https://github.com/getcontacts/getcontacts

# Add folder to PATH
echo "export PATH=\$PATH:$(pwd)/getcontacts" >> ~/.bashrc
source ~/.bashrc

# To test the installation, run:
cd getcontacts/example/5xnd
get_dynamic_contacts.py --topology 5xnd_topology.pdb \
                        --trajectory 5xnd_trajectory.dcd \
                        --itypes hb \
                        --output 5xnd_hbonds.tsv
```
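Since the base pip install leaves out the deep learning libraries and the conda-only tools, a quick check of what is present in the current environment can save a confusing import error later. The sketch below uses only the standard library. The module names (`torch`, `torch_geometric`, `dgl`, `pytorch3d`) and executable names (`mkdssp`, `pymol`, `get_dynamic_contacts.py`) are assumptions about how these packages are usually exposed, so adjust them to your setup.

```python
import importlib.util
import shutil


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_executables(names):
    """Return the executable names that are not on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Assumed names; edit to match the libraries and tools you actually need
OPTIONAL_MODULES = ["torch", "torch_geometric", "dgl", "pytorch3d"]
OPTIONAL_TOOLS = ["mkdssp", "pymol", "get_dynamic_contacts.py"]

print("Missing Python modules:", missing_modules(OPTIONAL_MODULES))
print("Missing executables:", missing_executables(OPTIONAL_TOOLS))
```

Both helpers only look things up; they do not import or execute anything, so the check is cheap and has no side effects.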

Conda environment

The dev environment includes GPU Builds (CUDA 11.1) for each of the deep learning libraries integrated into graphein.

```bash
git clone https://www.github.com/a-r-j/graphein
cd graphein
conda env create -f environment-dev.yml
pip install -e .
```

A lighter install can be performed with:

```bash
git clone https://www.github.com/a-r-j/graphein
cd graphein
conda env create -f environment.yml
pip install -e .
```

Dockerfile

We provide two docker-compose files for local use: docker-compose.cpu.yml (CPU) and docker-compose.yml (GPU). For GPU usage, please ensure that the NVIDIA Container Toolkit is installed. After entering the container, install the locally mounted volume (pip install -e .); this also sets up the dev environment locally.

To build (GPU) run:

```
docker-compose up -d --build # start the container
docker-compose down # stop the container
```

Citing Graphein

Please consider citing graphein if it proves useful in your work.

```bibtex
@inproceedings{jamasb2022graphein,
  title={Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Biomolecular Structures and Interaction Networks},
  author={Arian Rokkum Jamasb and Ramon Vi{\~n}as Torn{\'e} and Eric J Ma and Yuanqi Du and Charles Harris and Kexin Huang and Dominic Hall and Pietro Lio and Tom Leon Blundell},
  booktitle={Advances in Neural Information Processing Systems},
  editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
  year={2022},
  url={https://openreview.net/forum?id=9xRZlV6GfOX}
}
```

Owner

  • Name: Arian Jamasb
  • Login: a-r-j
  • Kind: user
  • Location: Basel
  • Company: University of Cambridge

Principal ML Scientist @PrescientDesign / Tensor Jockey / PhD @ University of Cambridge Prev: MILA, Google X, Relation Therapeutic

Citation (citation.bib)

@inproceedings{jamasb2022graphein,
	title={Graphein - a Python Library for Geometric Deep Learning and Network Analysis on Biomolecular Structures and Interaction Networks},
	author={Arian Rokkum Jamasb and Ramon Vi{\~n}as Torn{\'e} and Eric J Ma and Yuanqi Du and Charles Harris and Kexin Huang and Dominic Hall and Pietro Lio and Tom Leon Blundell},
	booktitle={Advances in Neural Information Processing Systems},
	editor={Alice H. Oh and Alekh Agarwal and Danielle Belgrave and Kyunghyun Cho},
	year={2022},
	url={https://openreview.net/forum?id=9xRZlV6GfOX}
}

GitHub Events

Total
  • Issues event: 18
  • Watch event: 94
  • Issue comment event: 58
  • Push event: 14
  • Pull request event: 11
  • Fork event: 6
Last Year
  • Issues event: 18
  • Watch event: 94
  • Issue comment event: 58
  • Push event: 14
  • Pull request event: 11
  • Fork event: 6

Committers

Last synced: 9 months ago

All Time
  • Total Commits: 712
  • Total Committers: 29
  • Avg Commits per committer: 24.552
  • Development Distribution Score (DDS): 0.292
Past Year
  • Commits: 12
  • Committers: 4
  • Avg Commits per committer: 3.0
  • Development Distribution Score (DDS): 0.5
Top Committers
Name Email Commits
a-r-j a****b@g****m 504
Eric Ma e****g@g****m 84
cch1999 c****1@g****m 31
pre-commit-ci[bot] 6****] 24
Kieran Didi 5****i 9
Sean Aubin s****n@p****m 7
Anton Bushuiev 6****v 7
Ramon Viñas Torné r****t@g****m 7
AbdulHamid Merii 4****i 4
Alex Morehead a****b@m****u 4
dependabot-preview[bot] 2****] 3
kexinhuang12345 k****3@n****u 3
Ryan Greenhalgh 3****4 3
Arian Jamasb a****b@r****m 2
Cam 7****m 2
Cam 7****i 2
Manon Reau m****u@g****m 2
Ollie Turnbull o****1@g****m 2
avivko 3****o 2
Chaitanya Joshi c****9@g****m 1
ChuNan Liu b****u@g****m 1
David Stein 4****n 1
Ikko Eltociear Ashimine e****r@g****m 1
Nicktf 4****8 1
Ruibin Liu r****8@g****m 1
Steven Lee 1****1@q****m 1
Tim T****s 1
ricomnl r****7@g****m 1
y6q9 4****u 1
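The all-time committer figures above can be reproduced from the table. The sketch assumes the Development Distribution Score is defined as one minus the top committer's share of all commits. That definition is not stated on this page, but it matches the reported value when applied to the 504 commits by a-r-j out of 712 in total.

```python
def development_distribution_score(top_committer_commits, total_commits):
    """Assumed definition: 1 - (commits by the top committer / total commits)."""
    return 1 - top_committer_commits / total_commits


total_commits = 712       # "Total Commits" above
total_committers = 29     # "Total Committers" above
top_committer = 504       # commits by a-r-j, first row of the table

dds = development_distribution_score(top_committer, total_commits)
average = total_commits / total_committers

print(round(dds, 3))      # 0.292
print(round(average, 3))  # 24.552
```

A score near 0 means one person wrote almost everything, while a score near 1 means commits are spread evenly across many committers.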

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 71
  • Total pull requests: 144
  • Average time to close issues: about 2 months
  • Average time to close pull requests: 26 days
  • Total issue authors: 41
  • Total pull request authors: 20
  • Average comments per issue: 2.3
  • Average comments per pull request: 2.31
  • Merged pull requests: 114
  • Bot issues: 0
  • Bot pull requests: 28
Past Year
  • Issues: 11
  • Pull requests: 16
  • Average time to close issues: 5 days
  • Average time to close pull requests: 22 days
  • Issue authors: 9
  • Pull request authors: 5
  • Average comments per issue: 0.82
  • Average comments per pull request: 0.75
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 5
Top Authors
Issue Authors
  • kamurani (8)
  • a-r-j (8)
  • pengzhangzhi (4)
  • universvm (4)
  • avivko (2)
  • davidfstein (2)
  • mawright (2)
  • OliviaViessmann (2)
  • Jrunchang (2)
  • anton-bushuiev (2)
  • l-Dr-MR-l (2)
  • velocirraptor23 (2)
  • johnnytam100 (2)
  • 1511878618 (1)
  • thollis23 (1)
Pull Request Authors
  • a-r-j (73)
  • pre-commit-ci[bot] (34)
  • kierandidi (19)
  • kamurani (9)
  • anton-bushuiev (6)
  • amorehead (6)
  • AH-Merii (5)
  • Linsastar (2)
  • manonreau (2)
  • chris-clem (2)
  • elementare (2)
  • davidfstein (1)
  • chaitjo (1)
  • rvinas (1)
  • eltociear (1)
Top Labels
Issue Labels
enhancement (9) dependencies (4) help wanted (4) good first issue (3) 0 - Priority P0 (2) 2 - Priority P2 (2) ML (1) bug (1) 1 - Priority P1 (1) documentation (1)
Pull Request Labels
enhancement (3) 1 - Priority P1 (2) bug (2) 0 - Priority P0 (2) help wanted (1) dependencies (1) 2 - Priority P2 (1)

Dependencies

.github/workflows/build.yaml actions
  • actions/checkout v2 composite
  • s-weigand/setup-conda v1 composite
.github/workflows/changelog.yaml actions
  • actions/checkout v3 composite
  • dangoslen/changelog-enforcer v3 composite
.github/workflows/code-style.yaml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
.github/workflows/code-tests-docker.yaml actions
  • actions/checkout v2 composite
  • s-weigand/setup-conda v1 composite
.github/workflows/minimal__install.yaml actions
  • actions/checkout v2 composite
  • s-weigand/setup-conda v1 composite
Dockerfile docker
  • pytorch/pytorch 1.9.1-cuda11.1-cudnn8-runtime build
docker-compose.cpu.yml docker
  • graphein-cpu latest
docker-compose.yml docker
  • graphein-gpu latest
.requirements/base.in pypi
  • biopandas >=0.4.1
  • biopython *
  • bioservices >=1.10.0
  • deepdiff *
  • loguru *
  • matplotlib >=3.4.3
  • multipledispatch *
  • networkx *
  • numpy <1.24.0
  • pandas *
  • plotly *
  • pydantic *
  • pyyaml >=5.1,<6.0
  • rich *
  • rich-click *
  • scikit-learn *
  • scipy *
  • seaborn *
  • torchtyping *
  • tqdm *
  • typing_extensions *
  • wget *
  • xarray *
.requirements/dev.in pypi
  • black * development
  • flake8 * development
  • hypothesis * development
  • interrogate * development
  • isort * development
  • nbstripout * development
  • nbval * development
  • pandoc * development
  • pre-commit * development
  • pycodestyle * development
  • pydocstyle * development
  • pytest * development
  • pytest-cov * development
  • pytest-xdist * development
.requirements/docs.in pypi
  • furo *
  • ipython *
  • m2r2 *
  • nbsphinx *
  • nbsphinx-link *
  • nbstripout *
  • pandoc *
  • pydocstyle *
  • sphinx *
  • sphinx-copybutton *
  • sphinx-inline-tabs *
  • sphinxcontrib-gtagjs *
  • sphinxext-opengraph *
  • watermark *
.requirements/extras.in pypi
  • biovec *
  • einops *
  • mpl_chord_diagram ==0.3.2
  • propy3 *
  • pyaaisc *
  • rdkit *
  • selfies *
  • smilite *
environment.yml pypi
  • bioservices *
  • biovec *
  • propy3 *
  • pyaaisc *
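The version constraints in the dependency lists above, such as `pyyaml >=5.1,<6.0`, `numpy <1.24.0` or `*`, can be read as comma-separated conditions that must all hold. The sketch below is a deliberately simplified checker for purely numeric versions. It is not a PEP 440 implementation and is not how pip resolves dependencies, and `satisfies` is a hypothetical helper name.

```python
import operator

# Two-character operators come first so ">=" is not mistaken for ">"
_OPERATORS = [
    (">=", operator.ge),
    ("<=", operator.le),
    ("==", operator.eq),
    (">", operator.gt),
    ("<", operator.lt),
]


def _parse(version):
    """Split a purely numeric version such as '1.24.0' into a list of ints."""
    return [int(part) for part in version.strip().split(".")]


def _compare(op, left, right):
    """Zero-pad both versions to the same length, then compare them as tuples."""
    width = max(len(left), len(right))
    left = tuple(left + [0] * (width - len(left)))
    right = tuple(right + [0] * (width - len(right)))
    return op(left, right)


def satisfies(version, spec):
    """Return True if version meets every comma-separated clause in spec."""
    if spec.strip() == "*":
        return True  # "*" places no restriction on the version
    for clause in spec.split(","):
        clause = clause.strip()
        for symbol, op in _OPERATORS:
            if clause.startswith(symbol):
                if not _compare(op, _parse(version), _parse(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


print(satisfies("5.4.1", ">=5.1,<6.0"))
print(satisfies("1.24.0", "<1.24.0"))
```

For real projects, the `packaging` library handles pre-releases, wildcards and the other cases this sketch ignores.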