https://github.com/bioinfomachinelearning/deepinteract

A geometric deep learning framework (Geometric Transformers) for predicting protein interface contacts. (ICLR 2022)

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 3 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.0%) to scientific vocabulary

Keywords

bioinformatics deep-learning docker geometric-deep-learning graph-neural-networks machine-learning protein-protein-interactions proteins transformers
Last synced: 5 months ago · JSON representation

Repository

Basic Info
Statistics
  • Stars: 64
  • Watchers: 2
  • Forks: 11
  • Open Issues: 4
  • Releases: 0
Created over 4 years ago · Last pushed over 3 years ago
Metadata Files
Readme Contributing License

README.md

# Source code for Geometric Transformers for Protein Interface Contact Prediction (ICLR 2022)

[![Paper](http://img.shields.io/badge/paper-arxiv.2110.02423-B31B1B.svg)](https://openreview.net/forum?id=CS4463zx6Hi) [![DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.6671582.svg)](https://doi.org/10.5281/zenodo.6671582) [PyPI](https://pypi.org/project/DeepInteract/)

![DeepInteract Architecture](https://github.com/BioinfoMachineLearning/DeepInteract/blob/main/img/DeepInteract_Architecture.png)

![Geometric Transformer](https://github.com/BioinfoMachineLearning/DeepInteract/blob/main/img/Geometric_Transformer.png)

Description

A geometric deep learning pipeline for predicting protein interface contacts.

Citing this work

If you use the code or data associated with this package, please cite:

    @inproceedings{morehead2022geometric,
      title={Geometric Transformers for Protein Interface Contact Prediction},
      author={Alex Morehead and Chen Chen and Jianlin Cheng},
      booktitle={International Conference on Learning Representations},
      year={2022},
      url={https://openreview.net/forum?id=CS4463zx6Hi}
    }

First time setup

The following step is required in order to run DeepInteract:

Genetic databases

This step requires aria2c to be installed on your machine.

DeepInteract needs only one of the following genetic (sequence) databases compatible with HH-suite3 to run:

Install the BFD for HH-suite3

```bash
# Following script originally from AlphaFold2 (https://github.com/deepmind/alphafold):
# Use "$HOME" here: a tilde inside double quotes is not expanded by the shell.
DOWNLOAD_DIR="$HOME/Data/Databases"
ROOT_DIR="${DOWNLOAD_DIR}/bfd"

# Mirror of:
# https://bfd.mmseqs.com/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz.
SOURCE_URL="https://storage.googleapis.com/alphafold-databases/casp14_versions/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz"
BASENAME=$(basename "${SOURCE_URL}")

mkdir --parents "${ROOT_DIR}"
aria2c "${SOURCE_URL}" --dir="${ROOT_DIR}"
tar --extract --verbose --file="${ROOT_DIR}/${BASENAME}" \
  --directory="${ROOT_DIR}"
rm "${ROOT_DIR}/${BASENAME}"

# The CLI argument --hhsuite_db for lit_model_predict.py
# should then become '~/Data/Databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt'
```

(Smaller Alternative) Install the Small BFD for HH-suite3

```bash
# Following script originally from AlphaFold2 (https://github.com/deepmind/alphafold):
# Use "$HOME" here: a tilde inside double quotes is not expanded by the shell.
DOWNLOAD_DIR="$HOME/Data/Databases"
ROOT_DIR="${DOWNLOAD_DIR}/small_bfd"
SOURCE_URL="https://storage.googleapis.com/alphafold-databases/reduced_dbs/bfd-first_non_consensus_sequences.fasta.gz"
BASENAME=$(basename "${SOURCE_URL}")

mkdir --parents "${ROOT_DIR}"
aria2c "${SOURCE_URL}" --dir="${ROOT_DIR}"
pushd "${ROOT_DIR}"
gunzip "${ROOT_DIR}/${BASENAME}"
popd

# The CLI argument --hhsuite_db for lit_model_predict.py
# should then become '~/Data/Databases/small_bfd/bfd-first_non_consensus_sequences.fasta'
```

(Smaller Alternative) Install Uniclust30 for HH-suite3

```bash
# Following script originally from AlphaFold2 (https://github.com/deepmind/alphafold):
# Use "$HOME" here: a tilde inside double quotes is not expanded by the shell.
DOWNLOAD_DIR="$HOME/Data/Databases"
ROOT_DIR="${DOWNLOAD_DIR}/uniclust30"

# Mirror of:
# http://wwwuser.gwdg.de/~compbiol/uniclust/2018_08/uniclust30_2018_08_hhsuite.tar.gz
SOURCE_URL="https://storage.googleapis.com/alphafold-databases/casp14_versions/uniclust30_2018_08_hhsuite.tar.gz"
BASENAME=$(basename "${SOURCE_URL}")

mkdir --parents "${ROOT_DIR}"
aria2c "${SOURCE_URL}" --dir="${ROOT_DIR}"
tar --extract --verbose --file="${ROOT_DIR}/${BASENAME}" \
  --directory="${ROOT_DIR}"
rm "${ROOT_DIR}/${BASENAME}"

# The CLI argument --hhsuite_db for lit_model_predict.py
# should then become '~/Data/Databases/uniclust30/uniclust30_2018_08/uniclust30_2018_08'
```
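Before passing a database prefix to `--hhsuite_db`, it can help to confirm that the download and extraction actually produced a usable database. The following is a minimal sketch, assuming the BFD and Uniclust30 archives unpack into the standard HH-suite3 layout (six `ffdata`/`ffindex` files sharing one prefix). The function name `missing_hhsuite_files` is hypothetical and not part of DeepInteract, and the check does not apply to the Small BFD, which is a single FASTA file.

```python
from __future__ import annotations

from pathlib import Path

# File suffixes that HH-suite3 expects to find next to a database prefix.
HHSUITE_SUFFIXES = (
    "_a3m.ffdata", "_a3m.ffindex",
    "_hhm.ffdata", "_hhm.ffindex",
    "_cs219.ffdata", "_cs219.ffindex",
)


def missing_hhsuite_files(db_prefix: str) -> list[str]:
    """Return the names of expected database files that do not exist.

    `db_prefix` is the value you would pass to --hhsuite_db, e.g.
    '~/Data/Databases/uniclust30/uniclust30_2018_08/uniclust30_2018_08'.
    """
    prefix = Path(db_prefix).expanduser()
    # The prefix itself may contain dots, so append suffixes to the name
    # instead of using Path.with_suffix().
    candidates = [prefix.parent / f"{prefix.name}{suffix}" for suffix in HHSUITE_SUFFIXES]
    return [path.name for path in candidates if not path.is_file()]
```

An empty list means all six files were found; anything else lists what still needs to be downloaded or extracted.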

Repository Directory Structure

    DeepInteract
    │
    └───docker
    │
    └───img
    │
    └───project
         │
         └───checkpoints
         │
         └───datasets
         │   │
         │   └───builder
         │   │
         │   └───DB5
         │   │   │
         │   │   └───final
         │   │   │   │
         │   │   │   └───processed
         │   │   │   │
         │   │   │   └───raw
         │   │   │
         │   │   db5_dgl_data_module.py
         │   │   db5_dgl_dataset.py
         │   │
         │   └───CASP_CAPRI
         │   │   │
         │   │   └───final
         │   │   │   │
         │   │   │   └───processed
         │   │   │   │
         │   │   │   └───raw
         │   │   │
         │   │   casp_capri_dgl_data_module.py
         │   │   casp_capri_dgl_dataset.py
         │   │
         │   └───DIPS
         │   │   │
         │   │   └───final
         │   │   │   │
         │   │   │   └───processed
         │   │   │   │
         │   │   │   └───raw
         │   │   │
         │   │   dips_dgl_data_module.py
         │   │   dips_dgl_dataset.py
         │   │
         │   └───Input
         │   │   │
         │   │   └───final
         │   │   │   │
         │   │   │   └───processed
         │   │   │   │
         │   │   │   └───raw
         │   │   │
         │   │   └───interim
         │   │   │   │
         │   │   │   └───complexes
         │   │   │   │
         │   │   │   └───external_feats
         │   │   │   │   │
         │   │   │   │   └───PSAIA
         │   │   │   │       │
         │   │   │   │       └───INPUT
         │   │   │   │
         │   │   │   └───pairs
         │   │   │   │
         │   │   │   └───parsed
         │   │   │
         │   │   └───raw
         │   │
         │   └───PICP
         │       picp_dgl_data_module.py
         │
         └───test_data
         │
         └───utils
         │    deepinteract_constants.py
         │    deepinteract_modules.py
         │    deepinteract_utils.py
         │    dips_plus_utils.py
         │    graph_utils.py
         │    protein_feature_utils.py
         │    vision_modules.py
         │
         lit_model_predict.py
         lit_model_predict_docker.py
         lit_model_train.py
    .gitignore
    CONTRIBUTING.md
    environment.yml
    LICENSE
    README.md
    requirements.txt
    setup.cfg
    setup.py

Running DeepInteract via Docker

The simplest way to run DeepInteract is using the provided Docker script.

The following steps are required in order to ensure Docker is installed and working correctly:

  1. Install Docker.

  2. Check that DeepInteract will be able to use a GPU by running:

    docker run --rm --gpus all nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04 nvidia-smi

    The output of this command should show a list of your GPUs. If it doesn't, check if you followed all steps correctly when setting up the NVIDIA Container Toolkit or take a look at the following NVIDIA Docker issue.

Now that we know Docker is functioning properly, we can begin building our Docker image for DeepInteract:

  1. Clone this repository and cd into it.

    git clone https://github.com/BioinfoMachineLearning/DeepInteract
    cd DeepInteract/
    DI_DIR=$(pwd)

  2. Download our trained model checkpoints.

    mkdir -p project/checkpoints
    wget -P project/checkpoints https://zenodo.org/record/6671582/files/LitGINI-GeoTran-DilResNet.ckpt
    wget -P project/checkpoints https://zenodo.org/record/6671582/files/LitGINI-GeoTran-DilResNet-DB5-Fine-Tuned.ckpt

  3. Build the Docker image (Warning: Requires ~13GB of Space):

    docker build -f docker/Dockerfile -t deepinteract .

  4. Install the run_docker.py dependencies. Note: You may optionally wish to create a Python Virtual Environment to prevent conflicts with your system's Python environment.

    pip3 install -r docker/requirements.txt

  5. Create the directory in which input features and outputs will be generated:

    mkdir -p project/datasets/Input

  6. Run run_docker.py, pointing it to two input PDB files containing the first and second chains of the complex for which you wish to predict the contact probability map. For example, for the DIPS-Plus test target with the PDB ID 4HEQ:

    python3 docker/run_docker.py \
      --left_pdb_filepath "$DI_DIR"/project/test_data/4heq_l_u.pdb \
      --right_pdb_filepath "$DI_DIR"/project/test_data/4heq_r_u.pdb \
      --input_dataset_dir "$DI_DIR"/project/datasets/Input \
      --ckpt_name "$DI_DIR"/project/checkpoints/LitGINI-GeoTran-DilResNet.ckpt \
      --hhsuite_db ~/Data/Databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
      --num_gpus 0

    This script generates the predicted interface contact map, along with the Geometric Transformer's learned node and edge representations for both chain graphs, and saves them to the given input directory as NumPy array files (e.g., test_data/4heq_contact_prob_map.npy).

  7. Note that with the default `--num_gpus 0` flag, run_docker.py makes the Docker container use only the system's available CPU(s) for prediction. Specifying `--num_gpus 1` instead makes the container use the first available GPU for prediction.
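When predicting for several complexes, it may be convenient to assemble the `run_docker.py` invocation programmatically. The sketch below only builds the argument list from the flags shown in step 6 and does not run anything. The helper name `build_run_docker_command` is hypothetical, and it assumes the script is launched from the repository root.

```python
import shlex


def build_run_docker_command(left_pdb, right_pdb, input_dir, ckpt, hhsuite_db, num_gpus=0):
    """Assemble the argv list for docker/run_docker.py without executing it.

    Only the flags that appear in the README example are used here.
    """
    return [
        "python3", "docker/run_docker.py",
        "--left_pdb_filepath", str(left_pdb),
        "--right_pdb_filepath", str(right_pdb),
        "--input_dataset_dir", str(input_dir),
        "--ckpt_name", str(ckpt),
        "--hhsuite_db", str(hhsuite_db),
        "--num_gpus", str(num_gpus),
    ]


def as_shell_line(command):
    """Render the argv list as a single, safely quoted shell line for logging."""
    return shlex.join(command)
```

The returned list can be handed to `subprocess.run(command, check=True)`; passing a list rather than a string avoids shell-quoting problems with paths that contain spaces.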

Running DeepInteract via a Traditional Installation (for Linux-Based Operating Systems)

First, install and configure the Conda environment:

```bash
# Clone this repository:
git clone https://github.com/BioinfoMachineLearning/DeepInteract

# Change to project directory:
cd DeepInteract
DI_DIR=$(pwd)

# Set up Conda environment locally
conda env create --name DeepInteract -f environment.yml

# Activate Conda environment located in the current directory:
conda activate DeepInteract

# (Optional) Perform a full install of the pip dependencies described in 'requirements.txt':
pip3 install -r requirements.txt

# (Optional) To remove the long Conda environment prefix in your shell prompt, modify the env_prompt setting in your .condarc file with:
conda config --set env_prompt '({name})'
```
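Because `requirements.txt` pins exact versions (for example `pytorch-lightning ==1.4.8`), a quick check of the activated environment can catch mismatches before training or inference. This is a minimal sketch using only the standard library; the helper names `parse_pins` and `check_pins` are hypothetical, and it assumes every relevant line has the simple `name==version` form.

```python
from importlib import metadata


def parse_pins(lines):
    """Parse 'name==version' lines into a dict, skipping blanks and comments."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def check_pins(pins):
    """Return {name: (wanted, installed_or_None)} for every pin that is not met."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

For example, `check_pins(parse_pins(open("requirements.txt")))` returns an empty dict when the environment matches every pin.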

Installing PSAIA

Install GCC 10 for PSAIA:

```bash
# Install GCC 10 for Ubuntu 20.04
sudo apt install software-properties-common
sudo add-apt-repository ppa:ubuntu-toolchain-r/ppa
sudo apt update
sudo apt install gcc-10 g++-10

# Or install GCC 10 for Arch Linux/Manjaro
yay -S gcc10
```

Install QT4 for PSAIA:

```bash
# Install QT4 for Ubuntu 20.04:
sudo add-apt-repository ppa:rock-core/qt4
sudo apt update
sudo apt install libqt4* libqtcore4 libqtgui4 libqtwebkit4 qt4* libxext-dev

# Or install QT4 for Arch Linux/Manjaro
yay -S qt4
```

Compile PSAIA from source:

```bash
# Select the location to install the software:
MY_LOCAL=~/Programs

# Download and extract PSAIA's source code:
mkdir "$MY_LOCAL"
cd "$MY_LOCAL"
wget http://complex.zesoi.fer.hr/data/PSAIA-1.0-source.tar.gz
tar -xvzf PSAIA-1.0-source.tar.gz

# Compile PSAIA (i.e., a GUI for PSA):
cd PSAIA_1.0_source/make/linux/psaia/
qmake-qt4 psaia.pro
make

# Compile PSA (i.e., the protein structure analysis (PSA) program):
cd ../psa/
qmake-qt4 psa.pro
make

# Compile PIA (i.e., the protein interaction analysis (PIA) program):
cd ../pia/
qmake-qt4 pia.pro
make

# Test run any of the above-compiled programs:
cd "$MY_LOCAL"/PSAIA_1.0_source/bin/linux

# Test run PSA inside a GUI:
./psaia/psaia

# Test run PIA through a terminal:
./pia/pia

# Test run PSA through a terminal:
./psa/psa
```

Finally, substitute your absolute filepath for DeepInteract (i.e., where on your local storage device you downloaded the repository to) anywhere DeepInteract's local repository is referenced in project/datasets/builder/psaia_config_file_input.txt.
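The substitution can be done by hand in a text editor, or scripted. Below is a minimal sketch that assumes the config file is plain text in which the repository location appears as a literal path prefix. The function names and the example paths are hypothetical, and `old_root` must be whatever prefix currently appears in your copy of the file.

```python
from pathlib import Path


def retarget_repo_paths(config_text: str, old_root: str, new_root: str) -> str:
    """Replace every occurrence of the old repository root with the new one."""
    old = old_root.rstrip("/")
    new = new_root.rstrip("/")
    return config_text.replace(old, new)


def retarget_config_file(config_path: str, old_root: str, new_root: str) -> int:
    """Rewrite the config file in place and return how many paths were changed."""
    path = Path(config_path)
    text = path.read_text()
    count = text.count(old_root.rstrip("/"))
    path.write_text(retarget_repo_paths(text, old_root, new_root))
    return count
```

A return value of 0 from `retarget_config_file` means `old_root` was not found, so the file was left effectively unchanged and the prefix should be double-checked.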

Training

Download training and cross-validation DGLGraphs

To train, fine-tune, or test DeepInteract models using CASP-CAPRI, DB5-Plus, or DIPS-Plus targets, we first need to download the preprocessed DGLGraphs from Zenodo:

```bash
# Download and extract preprocessed DGLGraphs for CASP-CAPRI, DB5-Plus, and DIPS-Plus
# Requires ~55GB of free space

# Download CASP-CAPRI
mkdir -p project/datasets/CASP_CAPRI/final
cd project/datasets/CASP_CAPRI/final
wget https://zenodo.org/record/6671582/files/final_raw_casp_capri.tar.gz
wget https://zenodo.org/record/6671582/files/final_processed_casp_capri.tar.gz

# Extract CASP-CAPRI
tar -xzf final_raw_casp_capri.tar.gz
tar -xzf final_processed_casp_capri.tar.gz
rm final_raw_casp_capri.tar.gz final_processed_casp_capri.tar.gz

# Download DB5-Plus
mkdir -p ../../DB5/final
cd ../../DB5/final
wget https://zenodo.org/record/6671582/files/final_raw_db5.tar.gz
wget https://zenodo.org/record/6671582/files/final_processed_db5.tar.gz

# Extract DB5-Plus
tar -xzf final_raw_db5.tar.gz
tar -xzf final_processed_db5.tar.gz
rm final_raw_db5.tar.gz final_processed_db5.tar.gz

# Download DIPS-Plus
mkdir -p ../../DIPS/final
cd ../../DIPS/final
wget https://zenodo.org/record/6671582/files/final_raw_dips.tar.gz
wget https://zenodo.org/record/6671582/files/final_processed_dips.tar.gz.partaa
wget https://zenodo.org/record/6671582/files/final_processed_dips.tar.gz.partab

# First, reassemble all processed DGLGraphs
# We split the (tar.gz) archive into two separate parts with
# 'split -b 4096M final_processed_dips.tar.gz "final_processed_dips.tar.gz.part"'
# to upload it to Zenodo, so to recover the original archive:
cat final_processed_dips.tar.gz.parta* >final_processed_dips.tar.gz

# Extract DIPS-Plus
tar -xzf final_raw_dips.tar.gz
tar -xzf final_processed_dips.tar.gz
rm final_processed_dips.tar.gz.parta* final_raw_dips.tar.gz final_processed_dips.tar.gz
```
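On systems where `cat` is unavailable, the reassembly step for the split DIPS-Plus archive can be reproduced with the standard library. This is a sketch of the same byte-wise concatenation, assuming the parts sort lexicographically in the order `split` produced them (`partaa`, `partab`, and so on). The function name `reassemble_parts` is hypothetical.

```python
import shutil
from pathlib import Path


def reassemble_parts(directory, part_prefix, output_name):
    """Concatenate all files named '<part_prefix>*' in sorted order into output_name."""
    directory = Path(directory)
    parts = sorted(directory.glob(f"{part_prefix}*"))
    if not parts:
        raise FileNotFoundError(f"no files matching {part_prefix}* in {directory}")
    output = directory / output_name
    with open(output, "wb") as out:
        for part in parts:
            # Stream each part so multi-gigabyte archives never sit in memory.
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return output
```

For the archive above this would be called as `reassemble_parts(".", "final_processed_dips.tar.gz.part", "final_processed_dips.tar.gz")` from inside the DIPS `final` directory.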

Navigate to the project directory and run the training script with the desired parameters:

```bash
# Hint: Run `python3 lit_model_train.py --help` to see all available CLI arguments
cd project
python3 lit_model_train.py --lr 1e-3 --weight_decay 1e-2
cd ..
```

Inference

Download trained model checkpoints

```bash
# Return to root directory of DeepInteract repository
cd "$DI_DIR"

# Download our trained model checkpoints
mkdir -p project/checkpoints
wget -P project/checkpoints https://zenodo.org/record/6671582/files/LitGINI-GeoTran-DilResNet.ckpt
wget -P project/checkpoints https://zenodo.org/record/6671582/files/LitGINI-GeoTran-DilResNet-DB5-Fine-Tuned.ckpt
```

Predict interface contact probability maps

Navigate to the project directory and run the prediction script with the filenames of the left and right PDB chains.

    # Hint: Run `python3 lit_model_predict.py --help` to see all available CLI arguments
    cd project
    python3 lit_model_predict.py \
      --left_pdb_filepath "$DI_DIR"/project/test_data/4heq_l_u.pdb \
      --right_pdb_filepath "$DI_DIR"/project/test_data/4heq_r_u.pdb \
      --ckpt_dir "$DI_DIR"/project/checkpoints \
      --ckpt_name LitGINI-GeoTran-DilResNet.ckpt \
      --hhsuite_db ~/Data/Databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
    cd ..

This script generates the predicted interface contact map, along with the Geometric Transformer's learned node and edge representations for both chain graphs, and saves them to the given input directory as NumPy array files (e.g., test_data/4heq_contact_prob_map.npy).
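Once a contact probability map has been saved, a common next step is to list the most confident residue pairs. The sketch below assumes the saved array is two-dimensional, with rows indexing residues of the left chain and columns indexing residues of the right chain; that layout is an assumption and should be verified against your own output. The function name `top_contacts` is hypothetical.

```python
import numpy as np


def top_contacts(prob_map, k=10):
    """Return the k highest-probability (left_idx, right_idx, probability) triples."""
    prob_map = np.asarray(prob_map)
    if prob_map.ndim != 2:
        raise ValueError(f"expected a 2D contact map, got shape {prob_map.shape}")
    k = min(k, prob_map.size)
    # Sort all entries of the flattened map, highest probability first.
    flat_order = np.argsort(prob_map, axis=None)[::-1][:k]
    rows, cols = np.unravel_index(flat_order, prob_map.shape)
    return [(int(r), int(c), float(prob_map[r, c])) for r, c in zip(rows, cols)]
```

A typical call would be `top_contacts(np.load("4heq_contact_prob_map.npy"), k=20)`, with the path adjusted to wherever the prediction script wrote its output.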

Acknowledgements

DeepInteract communicates with and/or references the following separate libraries and packages:

We thank all their contributors and maintainers!

License and Disclaimer

Copyright 2021 University of Missouri-Columbia Bioinformatics & Machine Learning (BML) Lab.

DeepInteract Code License

Licensed under the GNU Public License, Version 3.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.gnu.org/licenses/gpl-3.0.en.html.

Third-party software

Use of the third-party software, libraries or code referred to in the Acknowledgements section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.

Owner

  • Name: BioinfoMachineLearning
  • Login: BioinfoMachineLearning
  • Kind: organization

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2

Committers

Last synced: almost 3 years ago

All Time
  • Total Commits: 54
  • Total Committers: 2
  • Avg Commits per committer: 27.0
  • Development Distribution Score (DDS): 0.019
Top Committers
Name Email Commits
Alex Morehead a****d@g****m 53
Jianlin Cheng j****g@g****m 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 18
  • Total pull requests: 3
  • Average time to close issues: about 2 months
  • Average time to close pull requests: less than a minute
  • Total issue authors: 9
  • Total pull request authors: 1
  • Average comments per issue: 4.06
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • terry-r123 (5)
  • onlyonewater (5)
  • peter5842 (2)
  • KiAkize (1)
  • XuBlack (1)
  • gabrielepozzati (1)
  • amorehead (1)
  • olkidel (1)
Pull Request Authors
  • amorehead (3)
Top Labels
Issue Labels
Pull Request Labels
bug (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 16 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 9
  • Total maintainers: 1
pypi.org: deepinteract

A geometric deep learning pipeline for predicting protein interface contacts.

  • Versions: 9
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 16 Last month
Rankings
Stargazers count: 9.5%
Dependent packages count: 10.1%
Forks count: 10.5%
Average: 17.5%
Dependent repos count: 21.6%
Downloads: 36.0%
Maintainers (1)
Last synced: 6 months ago

Dependencies

environment.yml conda
  • aria2 1.34.0
  • biopython 1.78
  • cudatoolkit 11.2.*
  • dssp 3.0.0
  • hhsuite 3.3.0
  • msms 2.6.1
  • numpy 1.21.2
  • pandas 1.4.2
  • pip 21.1.2
  • python 3.8
  • pytorch 1.7.1
  • requests 2.26.0
  • scikit-learn 0.24.2
  • scipy 1.4.1
  • torchaudio 0.7.2
  • torchvision 0.8.2
docker/requirements.txt pypi
  • absl-py ==0.13.0
  • docker ==5.0.2
requirements.txt pypi
  • Sphinx ==4.0.1
  • atom3-py3 ==0.1.9.8
  • biopandas ==0.2.9
  • click ==8.0.1
  • dill ==0.3.4
  • easy-parallel-py3 ==0.1.6.4
  • fairscale ==0.4.0
  • networkx ==2.6.2
  • pytorch-lightning ==1.4.8
  • setuptools ==57.4.0
  • timm ==0.4.12
  • torchmetrics ==0.5.1
  • tqdm ==4.62.0
  • wandb ==0.12.2
setup.py pypi
  • Sphinx ==4.0.1
  • atom3-py3 ==0.1.9.8
  • biopandas ==0.2.9
  • click ==8.0.1
  • dill ==0.3.4
  • easy-parallel-py3 ==0.1.6.4
  • fairscale ==0.4.0
  • networkx ==2.6.2
  • pytorch-lightning ==1.4.8
  • setuptools ==57.4.0
  • timm ==0.4.12
  • torchmetrics ==0.5.1
  • tqdm ==4.62.0
  • wandb ==0.12.2