deeprank-gnn-esm

Graph Network for protein-protein interface including language model features

https://github.com/haddocking/deeprank-gnn-esm

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.3%) to scientific vocabulary
Last synced: 8 months ago

Repository

Graph Network for protein-protein interface including language model features

Basic Info
  • Host: GitHub
  • Owner: haddocking
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 722 MB
Statistics
  • Stars: 4
  • Watchers: 14
  • Forks: 4
  • Open Issues: 1
  • Releases: 4
Created about 1 year ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

deeprank-gnn-esm

Graph Network for protein-protein interface including language model features.

For details, refer to our publication at https://academic.oup.com/bioinformaticsadvances/article/4/1/vbad191/7511844

For a detailed protocol on using the deeprank-gnn-esm software, refer to our publication at https://arxiv.org/abs/2407.16375

Installation

Since the project requires several ML-specific libraries, it is easiest to set up with Anaconda:

  • Clone the repository

```bash
git clone https://github.com/haddocking/deeprank-gnn-esm.git
cd deeprank-gnn-esm
```

  • Set up the environment, either CPU or GPU

```bash
conda env create -f environment-cpu.yml && conda activate deeprank-gnn-esm-cpu
```

OR

```bash
conda env create -f environment-gpu.yml && conda activate deeprank-gnn-esm-gpu
```

  • Install

```bash
pip install .
```

Usage

As a scoring function

We provide a command-line interface for deeprank-gnn-esm that scores protein-protein complexes. It is used as follows:

```bash
$ deeprank-gnn-esm-predict -h
usage: deeprank-gnn-esm-predict [-h] pdbfile chainid1 chainid2 numcores

positional arguments:
  pdbfile     Path to the PDB file.
  chainid1    First chain ID.
  chainid2    Second chain ID.
  numcores    Number of cores

optional arguments:
  -h, --help  show this help message and exit
```
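To score several models in one run, the command can be wrapped in a small Python driver. The following is a minimal sketch, not part of the package: the helper names `build_predict_command` and `score_models` are hypothetical, and the only thing assumed about the CLI is the four positional arguments listed in the help text above.

```python
import subprocess
from pathlib import Path


def build_predict_command(pdb_file, chain_id_1, chain_id_2, num_cores=1):
    """Return the argument list for one deeprank-gnn-esm-predict call."""
    return [
        "deeprank-gnn-esm-predict",
        str(pdb_file),
        chain_id_1,
        chain_id_2,
        str(num_cores),
    ]


def score_models(pdb_dir, chain_id_1, chain_id_2, num_cores=1):
    """Run the CLI on every *.pdb file in pdb_dir (hypothetical helper)."""
    for pdb_file in sorted(Path(pdb_dir).glob("*.pdb")):
        cmd = build_predict_command(pdb_file, chain_id_1, chain_id_2, num_cores)
        # check=True raises CalledProcessError if the CLI exits with an error
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Only prints the command; call score_models(...) to actually run it.
    print(" ".join(build_predict_command("1B6C.pdb", "A", "B", 1)))
```

Building the argument list separately from running it keeps the command easy to inspect before launching a long batch.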

Example: score the 1B6C complex

```bash
# download it
$ wget https://files.rcsb.org/view/1B6C.pdb -q

# make sure the environment is activated
$ conda activate deeprank-gnn-esm-gpu
(deeprank-gnn-esm-gpu) $ deeprank-gnn-esm-predict 1B6C.pdb A B 1
2023-06-28 06:08:21,889 predict:64 INFO - Setting up workspace - /home/deeprank-gnn-esm/1B6C-gnn_esm_pred_A_B
2023-06-28 06:08:21,945 predict:72 INFO - Renumbering PDB file.
2023-06-28 06:08:22,294 predict:104 INFO - Reading sequence of PDB 1B6C.pdb
2023-06-28 06:08:22,423 predict:131 INFO - Generating embedding for protein sequence.
2023-06-28 06:08:22,423 predict:132 INFO - ################################################################################
2023-06-28 06:08:32,447 predict:138 INFO - Transferred model to GPU
2023-06-28 06:08:32,450 predict:147 INFO - Read /home/1B6C-gnn_esm_pred_A_B/all.fasta with 2 sequences
2023-06-28 06:08:32,459 predict:157 INFO - Processing 1 of 1 batches (2 sequences)
2023-06-28 06:08:36,462 predict:200 INFO - ################################################################################
2023-06-28 06:08:36,470 predict:205 INFO - Generating graph, using 79 processors
Graphs added to the HDF5 file
Embedding added to the /home/1B6C-gnn_esm_pred_A_B/graph.hdf5 file file
2023-06-28 06:09:03,345 predict:220 INFO - Graph file generated: /home/deeprank-gnn-esm/1B6C-gnn_esm_pred_A_B/graph.hdf5
2023-06-28 06:09:03,345 predict:226 INFO - Predicting fnat of protein complex.
2023-06-28 06:09:03,345 predict:234 INFO - Using device: cuda:0
# ...
2023-06-28 06:09:07,794 predict:280 INFO - Predicted fnat for 1B6C between chainA and chainB: 0.359
2023-06-28 06:09:07,803 predict:290 INFO - Output written to /home/deeprank-gnn-esm/1B6C-gnn_esm_pred/GNN_esm_prediction.csv
```

From the output above you can see that the predicted fnat for the 1B6C complex is 0.359. This value is also written to the GNN_esm_prediction.csv file.

The command above will generate a folder in the current working directory, containing the following:

```text
1B6C-gnn_esm_pred_A_B
├── 1B6C.pdb                 #input pdb file
├── all.fasta                #fasta sequence for the pdb input
├── 1B6C.A.pt                #esm-2 embedding for chainA in protein 1B6C
├── 1B6C.B.pt                #esm-2 embedding for chainB in protein 1B6C
├── graph.hdf5               #input protein graph in hdf5 format
├── GNN_esm_prediction.hdf5  #prediction output in hdf5 format
└── GNN_esm_prediction.csv   #prediction output in csv format
```
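The folder name follows the pattern `<pdb name>-gnn_esm_pred_<chain 1>_<chain 2>`, so the CSV output can be located and read programmatically. The sketch below is illustrative only: `prediction_dir` and `read_prediction_rows` are hypothetical helpers, and it assumes the CSV starts with a header row (the column names are not documented here, so rows are returned keyed by whatever header the file contains).

```python
import csv
from pathlib import Path


def prediction_dir(pdb_file, chain_id_1, chain_id_2, base_dir="."):
    """Output folder for a run, following the 1B6C-gnn_esm_pred_A_B pattern."""
    stem = Path(pdb_file).stem
    return Path(base_dir) / f"{stem}-gnn_esm_pred_{chain_id_1}_{chain_id_2}"


def read_prediction_rows(pdb_file, chain_id_1, chain_id_2, base_dir="."):
    """Return the CSV rows as dictionaries (assumes a header row is present)."""
    out_dir = prediction_dir(pdb_file, chain_id_1, chain_id_2, base_dir)
    csv_path = out_dir / "GNN_esm_prediction.csv"
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))
```

For the run above, `read_prediction_rows("1B6C.pdb", "A", "B")` would look for the file inside `1B6C-gnn_esm_pred_A_B` in the current working directory.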

As a framework

Note about input pdb files

To ensure that the mapping between interface residues and esm-2 embeddings is correct, make sure that, for every chain, residue numbering in the PDB file is continuous and starts at residue '1'.

We provide a script (scripts/pdb_renumber.py) to renumber PDB files accordingly.
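Before generating embeddings, it can help to verify that a file already satisfies this requirement. The following is a minimal sketch that only checks the numbering and does not renumber anything. The function names are hypothetical, it reads the chain ID and residue number from the standard fixed columns of ATOM records, and it ignores insertion codes and HETATM records.

```python
def residue_numbers_by_chain(pdb_lines):
    """Collect the residue numbers of each chain from ATOM records."""
    chains = {}
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        chain_id = line[21]          # column 22: chain identifier
        res_num = int(line[22:26])   # columns 23-26: residue sequence number
        numbers = chains.setdefault(chain_id, [])
        # Record each residue once, on the first atom that belongs to it
        if not numbers or numbers[-1] != res_num:
            numbers.append(res_num)
    return chains


def is_continuous_from_one(numbers):
    """True if the numbers are exactly 1, 2, ..., len(numbers)."""
    return numbers == list(range(1, len(numbers) + 1))


def check_pdb_numbering(pdb_lines):
    """Map each chain ID to True if its numbering is continuous from 1."""
    return {
        chain_id: is_continuous_from_one(numbers)
        for chain_id, numbers in residue_numbers_by_chain(pdb_lines).items()
    }
```

Calling `check_pdb_numbering(open("1B6C.pdb").read().splitlines())` returns one boolean per chain. Any chain reported as `False` should be renumbered before continuing.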

Generate esm-2 embeddings for your protein

  • To generate fasta sequences from PDBs, use the script scripts/get_fasta.py

```bash
usage: get_fasta.py [-h] pdbfilepath chain_id1 chain_id2

positional arguments:
  pdbfilepath  Path to the directory containing PDB files
  chain_id1    Chain ID for the first sequence
  chain_id2    Chain ID for the second sequence

options:
  -h, --help   show this help message and exit

python scripts/get_fasta.py tests/data/pdb/1ATN/ A B
```
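For readers who want to see what this step amounts to, the sketch below derives a one-letter sequence from the CA atoms of a chain and formats it as a FASTA record. It is an illustration under stated assumptions, not the implementation of get_fasta.py: the helper names are hypothetical, alternate locations are not handled, non-standard residues become `X`, and the `1ATN.A` record name is only a guess modelled on the `1B6C.A.pt` embedding file names shown earlier.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def chain_sequence(pdb_lines, chain_id):
    """One-letter sequence of a chain, read from its CA atoms."""
    residues = []
    for line in pdb_lines:
        is_ca = line.startswith("ATOM") and line[12:16].strip() == "CA"
        if is_ca and line[21] == chain_id:
            # Columns 18-20 hold the three-letter residue name
            residues.append(THREE_TO_ONE.get(line[17:20], "X"))
    return "".join(residues)


def fasta_record(name, sequence):
    """Format a single FASTA record (header line plus sequence line)."""
    return f">{name}\n{sequence}\n"
```

A combined file for two chains is then the concatenation of two records, for example `fasta_record("1ATN.A", seq_a) + fasta_record("1ATN.B", seq_b)`.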

  • To generate embeddings in bulk from the combined fasta file, use the extract.py script provided with the esm-2 package:

```bash
$ python esm_2_installation_location/scripts/extract.py \
    esm2_t33_650M_UR50D \
    all.fasta \
    tests/data/embedding/1ATN/ \
    --repr_layers 0 32 33 \
    --include mean per_tok
```

Replace 'esm_2_installation_location' with your installation location, 'all.fasta' with the fasta file generated above, and 'tests/data/embedding/1ATN/' with the output folder for the esm embeddings.

Generate graph

  • Example code to generate residue graphs in hdf5 format:

```python
from deeprank_gnn.GraphGenMP import GraphHDF5

pdb_path = "tests/data/pdb/1ATN/"
pssm_path = "tests/data/pssm/1ATN/"
embedding_path = "tests/data/embedding/1ATN/"
nproc = 20
outfile = "1ATN_residue.hdf5"

GraphHDF5(
    pdb_path=pdb_path,
    pssm_path=pssm_path,
    embedding_path=embedding_path,
    graph_type="residue",
    outfile=outfile,
    nproc=nproc,  # number of cores to use
    tmpdir="./tmpdir",
)
```

  • Example code to add continuous or binary targets to the hdf5 file

```python
import random

import h5py

# Open the graph file in read/write mode and attach targets to every complex.
# Random fnat values are used here purely as placeholders.
hdf5_file = h5py.File("1ATN_residue.hdf5", "r+")
for mol in hdf5_file.keys():
    fnat = random.random()
    bin_class = [1 if fnat > 0.3 else 0]
    hdf5_file.create_dataset(f"/{mol}/score/bin_class", data=bin_class)
    hdf5_file.create_dataset(f"/{mol}/score/fnat", data=fnat)
hdf5_file.close()
```
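The example above fills in random fnat values. In practice the targets usually come from a table of precomputed values. The sketch below shows one way to prepare them before writing to the hdf5 file. It is an assumption-based illustration: the `load_targets` helper, the `model` and `fnat` column names, and the layout of the input table are all hypothetical, and only the 0.3 cutoff is taken from the example.

```python
import csv
import io


def load_targets(csv_text, threshold=0.3):
    """Parse 'model,fnat' rows and derive the binary label from fnat."""
    targets = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        fnat = float(row["fnat"])
        targets[row["model"]] = {
            "fnat": fnat,
            # Same rule as the example: 1 if fnat exceeds the cutoff, else 0
            "bin_class": [1 if fnat > threshold else 0],
        }
    return targets
```

Each entry can then be written to the matching group of the hdf5 file in place of the random values, provided the `model` names match the keys of the file.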

Use pre-trained models to predict

  • Example code to use pre-trained deeprank-gnn-esm model

```python
from deeprank_gnn.ginet import GINet
from deeprank_gnn.NeuralNet import NeuralNet

database_test = "1ATN_residue.hdf5"
gnn = GINet
target = "fnat"
edge_attr = ["dist"]
threshold = 0.3
pretrained_model = "deeprank-GNN-esm/paper_pretrained_models/scoring_of_docking_models/gnn_esm/treg_yfnat_b64_e20_lr0.001_foldall_esm.pth.tar"
node_feature = ["type", "polarity", "bsa", "charge", "embedding"]
device_name = "cuda:0"
num_workers = 10

model = NeuralNet(
    database_test,
    gnn,
    device_name=device_name,
    edge_feature=edge_attr,
    node_feature=node_feature,
    target=target,
    num_workers=num_workers,
    pretrained_model=pretrained_model,
    threshold=threshold,
)

model.test(hdf5="tmpdir/GNN_esm_prediction.hdf5")
```

Owner

  • Name: HADDOCK
  • Login: haddocking
  • Kind: organization
  • Location: Utrecht, The Netherlands

Computational Structural Biology Group @ Utrecht University

Citation (CITATION.CFF)

# This CITATION.cff file was generated with cffinit.
# Visit https://bit.ly/cffinit to generate yours today!

cff-version: 1
title: DeepRank-GNN-esm
message: >-
  If you use this software, please cite it using the
  metadata from this file.
type: software
authors:
  - given-names: Xiaotong
    family-names: Xu
    email: x.xu1@uu.nl
    affiliation: Utrecht University
identifiers:
  - type: url
    value: ''
    description: 'https://github.com/DeepRank/Deeprank-GNN-esm'
repository-code: 'https://github.com/DeepRank/Deeprank-GNN-esm'
abstract: >-
  DeepRank-GNN-esm is the upgraded version of the
  DeepRank-GNN algorithm for ranking PPI complexes 
  with graph neural networks. DeepRank-GNN-esm utilizes 
  protein language model embeddings instead of PSSM 
  features. 
keywords:
  - graph neural network
  - protein-protein interface
  - protein language model
license: Apache-2.0

GitHub Events

Total
  • Create event: 11
  • Release event: 4
  • Issues event: 14
  • Watch event: 4
  • Delete event: 7
  • Issue comment event: 25
  • Push event: 17
  • Public event: 1
  • Pull request review comment event: 17
  • Pull request review event: 15
  • Pull request event: 23
  • Fork event: 6
Last Year
  • Create event: 11
  • Release event: 4
  • Issues event: 14
  • Watch event: 4
  • Delete event: 7
  • Issue comment event: 25
  • Push event: 17
  • Public event: 1
  • Pull request review comment event: 17
  • Pull request review event: 15
  • Pull request event: 23
  • Fork event: 6

Issues and Pull Requests

Last synced: 8 months ago

All Time
  • Total issues: 8
  • Total pull requests: 15
  • Average time to close issues: 8 days
  • Average time to close pull requests: 11 days
  • Total issue authors: 4
  • Total pull request authors: 4
  • Average comments per issue: 0.25
  • Average comments per pull request: 1.6
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 8
  • Pull requests: 15
  • Average time to close issues: 8 days
  • Average time to close pull requests: 11 days
  • Issue authors: 4
  • Pull request authors: 4
  • Average comments per issue: 0.25
  • Average comments per pull request: 1.6
  • Merged pull requests: 9
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • rvhonorato (5)
  • ErikHartman (1)
  • ntxxt (1)
  • AnnaKravchenko (1)
Pull Request Authors
  • ntxxt (13)
  • rvhonorato (6)
  • AnnaKravchenko (1)
  • erikedlund (1)
Top Labels
Issue Labels
enhancement (2) repository (1)
Pull Request Labels
repository (3) enhancement (2) CI/CD (1)

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v4 composite
  • codacy/codacy-coverage-reporter-action 89d6c85cfafaec52c72b6c5e8b2878d33104c699 composite
  • conda-incubator/setup-miniconda v3 composite
.github/workflows/docker-publish.yml actions
  • actions/checkout v4 composite
  • docker/build-push-action v5 composite
  • docker/login-action v3 composite
  • docker/metadata-action v5 composite
Dockerfile docker
  • nvidia/cuda ${CUDA}-cudnn-runtime-ubuntu22.04 build
pyproject.toml pypi
.github/workflows/build.yml actions
  • actions/checkout v4 composite