deeprank-gnn-esm
Graph Network for protein-protein interface including language model features
Science Score: 26.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.1%) to scientific vocabulary
Keywords
Repository
Graph Network for protein-protein interface including language model features
Basic Info
Statistics
- Stars: 30
- Watchers: 6
- Forks: 9
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
:bell: Archiving Note
Since DeepRank-GNN is no longer in active development, we migrated our DeepRank-GNN-esm version to our new repo at haddocking/DeepRank-GNN-esm.
For details, refer to our publication "DeepRank-GNN-esm: a graph neural network for scoring protein-protein models using protein language model" at https://academic.oup.com/bioinformaticsadvances/article/4/1/vbad191/7511844
:snowflake: This repository is now frozen. :snowflake:
DeepRank-GNN-esm
Graph Network for protein-protein interface including language model features
Installation
With Anaconda
Clone the repository
```bash
git clone https://github.com/DeepRank/DeepRank-GNN-esm.git
cd DeepRank-GNN-esm
```

Install either the CPU or GPU version of DeepRank-GNN-esm

```bash
conda env create -f environment-cpu.yml && conda activate deeprank-gnn-esm-cpu-env
```

OR

```bash
conda env create -f environment-gpu.yml && conda activate deeprank-gnn-esm-gpu-env
```

Install the command line tool

```bash
pip install .
```

Run the tests to make sure everything is working

```bash
pytest tests/
```
Usage
As a scoring function
We provide a command-line interface for DeepRank-GNN-ESM that can be used to score protein-protein complexes. The command-line interface can be used as follows:
```bash
usage: deeprank-gnn-esm-predict [-h] pdb_file chain_id_1 chain_id_2

positional arguments:
  pdb_file     Path to the PDB file.
  chain_id_1   First chain ID.
  chain_id_2   Second chain ID.

optional arguments:
  -h, --help   show this help message and exit
```
Example: score the 1B6C complex
```bash
# download it
$ wget https://files.rcsb.org/view/1B6C.pdb -q

# make sure the environment is activated
$ conda activate deeprank-gnn-esm-gpu-env
(deeprank-gnn-esm-gpu-env) $ deeprank-gnn-esm-predict 1B6C.pdb A B
2023-06-28 06:08:21,889 predict:64 INFO - Setting up workspace - /home/DeepRank-GNN-esm/1B6C-gnn_esm_pred_A_B
2023-06-28 06:08:21,945 predict:72 INFO - Renumbering PDB file.
2023-06-28 06:08:22,294 predict:104 INFO - Reading sequence of PDB 1B6C.pdb
2023-06-28 06:08:22,423 predict:131 INFO - Generating embedding for protein sequence.
2023-06-28 06:08:22,423 predict:132 INFO - ################################################################################
2023-06-28 06:08:32,447 predict:138 INFO - Transferred model to GPU
2023-06-28 06:08:32,450 predict:147 INFO - Read /home/1B6C-gnn_esm_pred_A_B/all.fasta with 2 sequences
2023-06-28 06:08:32,459 predict:157 INFO - Processing 1 of 1 batches (2 sequences)
2023-06-28 06:08:36,462 predict:200 INFO - ################################################################################
2023-06-28 06:08:36,470 predict:205 INFO - Generating graph, using 79 processors
Graphs added to the HDF5 file
Embedding added to the /home/1B6C-gnn_esm_pred_A_B/graph.hdf5 file
2023-06-28 06:09:03,345 predict:220 INFO - Graph file generated: /home/DeepRank-GNN-esm/1B6C-gnn_esm_pred_A_B/graph.hdf5
2023-06-28 06:09:03,345 predict:226 INFO - Predicting fnat of protein complex.
2023-06-28 06:09:03,345 predict:234 INFO - Using device: cuda:0
# ...
2023-06-28 06:09:07,794 predict:280 INFO - Predicted fnat for 1B6C between chainA and chainB: 0.359
2023-06-28 06:09:07,803 predict:290 INFO - Output written to /home/DeepRank-GNN-esm/1B6C-gnn_esm_pred/GNN_esm_prediction.csv
```
From the output above you can see that the predicted fnat for the 1B6C complex between chain A and chain B is 0.359; this information is also written to the GNN_esm_prediction.csv file.
The command above will generate a folder in the current working directory, containing the following:
```
1B6C-gnn_esm_pred_A_B
├── 1B6C.pdb                  # input pdb file
├── all.fasta                 # fasta sequence for the pdb input
├── 1B6C.A.pt                 # esm-2 embedding for chain A in protein 1B6C
├── 1B6C.B.pt                 # esm-2 embedding for chain B in protein 1B6C
├── graph.hdf5                # input protein graph in hdf5 format
├── GNN_esm_prediction.hdf5   # prediction output in hdf5 format
└── GNN_esm_prediction.csv    # prediction output in csv format
```
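If you want to pick the score up programmatically, the outputs can be read back with standard tools; a minimal sketch is shown below. The exact column layout of GNN_esm_prediction.csv is not documented here, so the snippet only prints what it finds rather than assuming column names.

```python
# Minimal sketch: read the prediction outputs back into Python.
# The CSV column names are not assumed; we just print whatever is present.
import pandas as pd

predictions = pd.read_csv("1B6C-gnn_esm_pred_A_B/GNN_esm_prediction.csv")
print(predictions.columns.tolist())  # discover the available columns
print(predictions.head())            # one row per scored interface
```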
As a framework
Generate esm-2 embeddings for your protein
Generate fasta sequences in bulk using the script `get_fasta.py`:

```bash
usage: get_fasta.py [-h] pdb_dir output_fasta_name

positional arguments:
  pdb_dir            Path to the directory containing PDB files
  output_fasta_name  Name of the combined output FASTA file

options:
  -h, --help         show this help message and exit
```
Generate embeddings in bulk from the combined fasta file, using the `extract.py` script provided in the esm-2 package:

```bash
$ python esm_2_installation_location/scripts/extract.py \
    esm2_t33_650M_UR50D \
    all.fasta \
    tests/data/embedding/1ATN/ \
    --repr_layers 0 32 33 \
    --include mean per_tok
```

Replace 'esm_2_installation_location' with your installation location, 'all.fasta' with the fasta file generated above, and 'tests/data/embedding/1ATN/' with the output folder name for the esm embeddings.
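Each per-chain embedding ends up in its own .pt file. The sketch below inspects one of them with torch; the file name and the dictionary layout ("representations", "mean_representations") follow the usual output of ESM's extract.py and should be treated as assumptions to adapt to your setup.

```python
# Sketch: inspect one embedding file written by ESM-2's extract.py.
# File name and dict layout are assumptions based on extract.py's usual output.
import torch

data = torch.load("tests/data/embedding/1ATN/1ATN.A.pt")
print(data.keys())                       # typically: label, representations, mean_representations
per_token = data["representations"][33]  # per-residue embeddings from layer 33 (--repr_layers 33, --include per_tok)
print(per_token.shape)                   # (sequence_length, 1280) for esm2_t33_650M_UR50D
```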
Generate graph
Example code to generate residue graphs in hdf5 format:

```python
from deeprank_gnn.GraphGenMP import GraphHDF5

pdb_path = "tests/data/pdb/1ATN/"
pssm_path = "tests/data/pssm/1ATN/"
embedding_path = "tests/data/embedding/1ATN/"
nproc = 20
outfile = "1ATN_residue.hdf5"

GraphHDF5(
    pdb_path=pdb_path,
    pssm_path=pssm_path,
    embedding_path=embedding_path,
    graph_type="residue",
    outfile=outfile,
    nproc=nproc,  # number of cores to use
    tmpdir="./tmpdir",
)
```
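If you want to check what was written before adding targets, the graph file can be walked with h5py; this is a generic inspection sketch and not part of the DeepRank-GNN-esm API.

```python
# Generic sketch: list the groups and datasets stored in the generated graph file.
# Uses only h5py; no assumptions about DeepRank-GNN-esm internals.
import h5py

with h5py.File("1ATN_residue.hdf5", "r") as hdf5_file:
    for mol in hdf5_file.keys():  # one group per model/complex
        print(mol)
        hdf5_file[mol].visit(lambda name: print("   ", name))
```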
Example code to add continuous or binary targets to the hdf5 file:

```python
import h5py
import random

hdf5_file = h5py.File("1ATN_residue.hdf5", "r+")
for mol in hdf5_file.keys():
    fnat = random.random()
    bin_class = [1 if fnat > 0.3 else 0]
    hdf5_file.create_dataset(f"/{mol}/score/bin_class", data=bin_class)
    hdf5_file.create_dataset(f"/{mol}/score/fnat", data=fnat)
hdf5_file.close()
```
Use pre-trained models to predict
Example code to use a pre-trained DeepRank-GNN-esm model:

```python
from deeprank_gnn.ginet import GINet
from deeprank_gnn.NeuralNet import NeuralNet

database_test = "1ATN_residue.hdf5"
gnn = GINet
target = "fnat"
edge_attr = ["dist"]
threshold = 0.3
pretrained_model = "deeprank-GNN-esm/paper_pretrained_models/scoring_of_docking_models/gnn_esm/treg_y_fnat_b64_e20_lr0.001_foldall_esm.pth.tar"
node_feature = ["type", "polarity", "bsa", "charge", "embedding"]
device_name = "cuda:0"
num_workers = 10

model = NeuralNet(
    database_test,
    gnn,
    device_name=device_name,
    edge_feature=edge_attr,
    node_feature=node_feature,
    target=target,
    num_workers=num_workers,
    pretrained_model=pretrained_model,
    threshold=threshold,
)

model.test(hdf5="tmpdir/GNN_esm_prediction.hdf5")
```
Note about input pdb files
To make sure the mapping between interface residues and esm-2 embeddings is correct, the residue numbering of every chain in the PDB file must be continuous and start at residue '1'. We provide a script (scripts/pdb_renumber.py) to do the renumbering.
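A quick way to verify a file before scoring is to parse the ATOM records directly; the snippet below is an illustrative sketch, not part of the package, and scripts/pdb_renumber.py remains the supported way to fix the numbering.

```python
# Illustrative sketch (not part of DeepRank-GNN-esm): check that each chain's
# residue numbering is continuous and starts at 1, as required for the
# ESM-2 embedding mapping. Use scripts/pdb_renumber.py to actually renumber.
from collections import defaultdict

def check_numbering(pdb_path):
    residues = defaultdict(list)  # chain ID -> ordered residue numbers
    with open(pdb_path) as handle:
        for line in handle:
            if line.startswith(("ATOM", "HETATM")):
                chain = line[21]
                resnum = int(line[22:26])
                if not residues[chain] or residues[chain][-1] != resnum:
                    residues[chain].append(resnum)
    for chain, numbers in residues.items():
        ok = numbers == list(range(1, len(numbers) + 1))
        print(f"chain {chain}: {'OK' if ok else 'needs renumbering'}")

check_numbering("1B6C.pdb")
```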
Owner
- Name: DeepRank
- Login: DeepRank
- Kind: organization
- Repositories: 19
- Profile: https://github.com/DeepRank
GitHub Events
Total
- Issues event: 5
- Watch event: 7
- Issue comment event: 6
- Fork event: 2
Last Year
- Issues event: 5
- Watch event: 7
- Issue comment event: 6
- Fork event: 2
Dependencies
- actions/checkout v2 composite
- conda-incubator/setup-miniconda v2 composite