https://github.com/bioinfomachinelearning/deepinteract
A geometric deep learning framework (Geometric Transformers) for predicting protein interface contacts. (ICLR 2022)
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references — found 3 DOI reference(s) in README
- ✓ Academic publication links — links to: zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity — low similarity (11.0%) to scientific vocabulary
Keywords
Repository
A geometric deep learning framework (Geometric Transformers) for predicting protein interface contacts. (ICLR 2022)
Basic Info
- Host: GitHub
- Owner: BioinfoMachineLearning
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Homepage: https://zenodo.org/record/6671582
- Size: 3 MB
Statistics
- Stars: 64
- Watchers: 2
- Forks: 11
- Open Issues: 4
- Releases: 0
Topics
Metadata Files
README.md
Description
A geometric deep learning pipeline for predicting protein interface contacts.
Citing this work
If you use the code or data associated with this package, please cite:
```bibtex
@inproceedings{morehead2022geometric,
  title={Geometric Transformers for Protein Interface Contact Prediction},
  author={Alex Morehead and Chen Chen and Jianlin Cheng},
  booktitle={International Conference on Learning Representations},
  year={2022},
  url={https://openreview.net/forum?id=CS4463zx6Hi}
}
```
First time setup
The following step is required in order to run DeepInteract:
Genetic databases
This step requires aria2c to be installed on your machine.
DeepInteract needs only one of the following genetic (sequence) databases compatible with HH-suite3 to run:
- BFD (Requires ~1.7TB of Space When Unextracted)
- Small BFD (Requires ~17GB of Space When Unextracted)
- Uniclust30 (Requires ~86GB of Space When Unextracted)
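Before starting a multi-hundred-gigabyte download, it can be worth confirming that the target filesystem has enough room. A minimal sketch (not part of DeepInteract; the size figures mirror the list above, and the target path is an assumption):

```python
import shutil

# Approximate unextracted sizes in GB, taken from the database list above
REQUIRED_GB = {"bfd": 1700, "small_bfd": 17, "uniclust30": 86}

def has_space(path: str, db: str) -> bool:
    """Return True if the filesystem containing `path` has room for `db`."""
    free_gb = shutil.disk_usage(path).free / 1e9
    return free_gb >= REQUIRED_GB[db]

# Example: check whether the root filesystem could hold the Small BFD
print(has_space("/", "small_bfd"))
```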
Install the BFD for HH-suite3
```bash
# Following script originally from AlphaFold2 (https://github.com/deepmind/alphafold):
DOWNLOAD_DIR="~/Data/Databases"
ROOT_DIR="${DOWNLOAD_DIR}/bfd"
mkdir "~/Data" "$DOWNLOAD_DIR" "$ROOT_DIR"

# Mirror of:
# https://bfd.mmseqs.com/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz
SOURCE_URL="https://storage.googleapis.com/alphafold-databases/casp14_versions/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt.tar.gz"
BASENAME=$(basename "${SOURCE_URL}")

mkdir --parents "${ROOT_DIR}"
aria2c "${SOURCE_URL}" --dir="${ROOT_DIR}"
tar --extract --verbose --file="${ROOT_DIR}/${BASENAME}" \
  --directory="${ROOT_DIR}"
rm "${ROOT_DIR}/${BASENAME}"

# The CLI argument --hhsuite_db for lit_model_predict.py
# should then become '~/Data/Databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt'
```
(Smaller Alternative) Install the Small BFD for HH-suite3
```bash
# Following script originally from AlphaFold2 (https://github.com/deepmind/alphafold):
DOWNLOAD_DIR="~/Data/Databases"
ROOT_DIR="${DOWNLOAD_DIR}/small_bfd"
mkdir "~/Data" "$DOWNLOAD_DIR" "$ROOT_DIR"

SOURCE_URL="https://storage.googleapis.com/alphafold-databases/reduced_dbs/bfd-first_non_consensus_sequences.fasta.gz"
BASENAME=$(basename "${SOURCE_URL}")

mkdir --parents "${ROOT_DIR}"
aria2c "${SOURCE_URL}" --dir="${ROOT_DIR}"
pushd "${ROOT_DIR}"
gunzip "${ROOT_DIR}/${BASENAME}"
popd

# The CLI argument --hhsuite_db for lit_model_predict.py
# should then become '~/Data/Databases/small_bfd/bfd-first_non_consensus_sequences.fasta'
```
(Smaller Alternative) Install Uniclust30 for HH-suite3
```bash
# Following script originally from AlphaFold2 (https://github.com/deepmind/alphafold):
DOWNLOAD_DIR="~/Data/Databases"
ROOT_DIR="${DOWNLOAD_DIR}/uniclust30"
mkdir "~/Data" "$DOWNLOAD_DIR" "$ROOT_DIR"

# Mirror of:
# http://wwwuser.gwdg.de/~compbiol/uniclust/2018_08/uniclust30_2018_08_hhsuite.tar.gz
SOURCE_URL="https://storage.googleapis.com/alphafold-databases/casp14_versions/uniclust30_2018_08_hhsuite.tar.gz"
BASENAME=$(basename "${SOURCE_URL}")

mkdir --parents "${ROOT_DIR}"
aria2c "${SOURCE_URL}" --dir="${ROOT_DIR}"
tar --extract --verbose --file="${ROOT_DIR}/${BASENAME}" \
  --directory="${ROOT_DIR}"
rm "${ROOT_DIR}/${BASENAME}"

# The CLI argument --hhsuite_db for lit_model_predict.py
# should then become '~/Data/Databases/uniclust30/uniclust30_2018_08/uniclust30_2018_08'
```
Repository Directory Structure
DeepInteract
│
└───docker
│
└───img
│
└───project
│
└───checkpoints
│
└───datasets
│ │
│ └───builder
│ │
│ └───DB5
│ │ │
│ │ └───final
│ │ │ │
│ │ │ └───processed
│ │ │ │
│ │ │ └───raw
│ │ │
│ │ db5_dgl_data_module.py
│ │ db5_dgl_dataset.py
│ │
│ └───CASP_CAPRI
│ │ │
│ │ └───final
│ │ │ │
│ │ │ └───processed
│ │ │ │
│ │ │ └───raw
│ │ │
│ │ casp_capri_dgl_data_module.py
│ │ casp_capri_dgl_dataset.py
│ │
│ └───DIPS
│ │ │
│ │ └───final
│ │ │ │
│ │ │ └───processed
│ │ │ │
│ │ │ └───raw
│ │ │
│ │ dips_dgl_data_module.py
│ │ dips_dgl_dataset.py
│ │
│ └───Input
│ │ │
│ │ └───final
│ │ │ │
│ │ │ └───processed
│ │ │ │
│ │ │ └───raw
│ │ │
│ │ └───interim
│ │ │ │
│ │ │ └───complexes
│ │ │ │
│ │ │ └───external_feats
│ │ │ │ │
│ │ │ │ └───PSAIA
│ │ │ │ │
│ │ │ │ └───INPUT
│ │ │ │
│ │ │ └───pairs
│ │ │ │
│ │ │ └───parsed
│ │ │
│ │ └───raw
│ │
│ └───PICP
│ picp_dgl_data_module.py
│
└───test_data
│
└───utils
│ deepinteract_constants.py
│ deepinteract_modules.py
│ deepinteract_utils.py
│ dips_plus_utils.py
│ graph_utils.py
│ protein_feature_utils.py
│ vision_modules.py
│
lit_model_predict.py
lit_model_predict_docker.py
lit_model_train.py
.gitignore
CONTRIBUTING.md
environment.yml
LICENSE
README.md
requirements.txt
setup.cfg
setup.py
Running DeepInteract via Docker
The simplest way to run DeepInteract is using the provided Docker script.
The following steps are required in order to ensure Docker is installed and working correctly:
- Install Docker.
- Install the NVIDIA Container Toolkit for GPU support.
- Set up running Docker as a non-root user.
Check that DeepInteract will be able to use a GPU by running:

```bash
docker run --rm --gpus all nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04 nvidia-smi
```

The output of this command should show a list of your GPUs. If it doesn't, check that you followed all steps correctly when setting up the NVIDIA Container Toolkit, or take a look at the following NVIDIA Docker issue.

Now that we know Docker is functioning properly, we can begin building our Docker image for DeepInteract:

1. Clone this repository and `cd` into it:

```bash
git clone https://github.com/BioinfoMachineLearning/DeepInteract
cd DeepInteract/
DI_DIR=$(pwd)
```

2. Download our trained model checkpoints:

```bash
mkdir -p project/checkpoints
wget -P project/checkpoints https://zenodo.org/record/6671582/files/LitGINI-GeoTran-DilResNet.ckpt
wget -P project/checkpoints https://zenodo.org/record/6671582/files/LitGINI-GeoTran-DilResNet-DB5-Fine-Tuned.ckpt
```

3. Build the Docker image (Warning: requires ~13GB of space):

```bash
docker build -f docker/Dockerfile -t deepinteract .
```

4. Install the `run_docker.py` dependencies. Note: you may optionally wish to create a Python virtual environment to prevent conflicts with your system's Python environment:

```bash
pip3 install -r docker/requirements.txt
```

5. Create a directory in which to generate input features and outputs:

```bash
mkdir -p project/datasets/Input
```

6. Run `run_docker.py`, pointing it to two input PDB files containing the first and second chains of a complex for which you wish to predict the contact probability map. For example, for the DIPS-Plus test target with the PDB ID `4HEQ`:

```bash
python3 docker/run_docker.py --left_pdb_filepath "$DI_DIR"/project/test_data/4heq_l_u.pdb --right_pdb_filepath "$DI_DIR"/project/test_data/4heq_r_u.pdb --input_dataset_dir "$DI_DIR"/project/datasets/Input --ckpt_name "$DI_DIR"/project/checkpoints/LitGINI-GeoTran-DilResNet.ckpt --hhsuite_db ~/Data/Databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt --num_gpus 0
```

This script will generate the predicted interface contact map, along with the Geometric Transformer's learned node and edge representations for both chain graphs, and save them to the given input directory as NumPy array files (e.g., `test_data/4heq_contact_prob_map.npy`).

Note that with the default `--num_gpus 0` flag, the Docker container will only make use of the system's available CPU(s) for prediction. By specifying `--num_gpus 1` when executing `run_docker.py`, the container will instead employ the first available GPU.
Running DeepInteract via a Traditional Installation (for Linux-Based Operating Systems)
First, install and configure Conda environment:
```bash
# Clone this repository:
git clone https://github.com/BioinfoMachineLearning/DeepInteract

# Change to project directory:
cd DeepInteract
DI_DIR=$(pwd)

# Set up Conda environment locally:
conda env create --name DeepInteract -f environment.yml

# Activate Conda environment located in the current directory:
conda activate DeepInteract

# (Optional) Perform a full install of the pip dependencies described in 'requirements.txt':
pip3 install -r requirements.txt

# (Optional) To remove the long Conda environment prefix in your shell prompt,
# modify the env_prompt setting in your .condarc file with:
conda config --set env_prompt '({name})'
```
Installing PSAIA
Install GCC 10 for PSAIA:
```bash
# Install GCC 10 for Ubuntu 20.04:
sudo apt install software-properties-common
sudo add-apt-repository ppa:ubuntu-toolchain-r/ppa
sudo apt update
sudo apt install gcc-10 g++-10

# Or install GCC 10 for Arch Linux/Manjaro:
yay -S gcc10
```
Install QT4 for PSAIA:
```bash
# Install QT4 for Ubuntu 20.04:
sudo add-apt-repository ppa:rock-core/qt4
sudo apt update
sudo apt install libqt4* libqtcore4 libqtgui4 libqtwebkit4 qt4* libxext-dev

# Or install QT4 for Arch Linux/Manjaro:
yay -S qt4
```
Compile PSAIA from source:
```bash
# Select the location to install the software:
MY_LOCAL=~/Programs

# Download and extract PSAIA's source code:
mkdir "$MY_LOCAL"
cd "$MY_LOCAL"
wget http://complex.zesoi.fer.hr/data/PSAIA-1.0-source.tar.gz
tar -xvzf PSAIA-1.0-source.tar.gz

# Compile PSAIA (i.e., a GUI for PSA):
cd PSAIA_1.0_source/make/linux/psaia/
qmake-qt4 psaia.pro
make

# Compile PSA (i.e., the protein structure analysis (PSA) program):
cd ../psa/
qmake-qt4 psa.pro
make

# Compile PIA (i.e., the protein interaction analysis (PIA) program):
cd ../pia/
qmake-qt4 pia.pro
make

# Test run any of the above-compiled programs:
cd "$MY_LOCAL"/PSAIA_1.0_source/bin/linux

# Test run PSAIA inside a GUI:
./psaia/psaia

# Test run PIA through a terminal:
./pia/pia

# Test run PSA through a terminal:
./psa/psa
```
Finally, substitute your absolute filepath for DeepInteract (i.e., where on your local storage device you downloaded the repository) anywhere DeepInteract's local repository is referenced in `project/datasets/builder/psaia_config_file_input.txt`.
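If you prefer to script that substitution, a minimal sketch follows. Note that the placeholder string `/path/to/DeepInteract` is hypothetical, used only for illustration; inspect the actual paths in `psaia_config_file_input.txt` and adjust accordingly:

```python
def localize_config(text: str, repo_root: str) -> str:
    """Replace a hypothetical '/path/to/DeepInteract' placeholder with the
    absolute path of the local clone (the placeholder name is an assumption)."""
    return text.replace("/path/to/DeepInteract", repo_root)

# Example: one config line rewritten for a clone at /home/user/DeepInteract
sample = "analyser_output_dir:\t/path/to/DeepInteract/project/datasets/Input\n"
print(localize_config(sample, "/home/user/DeepInteract"))
```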
Training
Download training and cross-validation DGLGraphs
To train, fine-tune, or test DeepInteract models using CASP-CAPRI, DB5-Plus, or DIPS-Plus targets, we first need to download the preprocessed DGLGraphs from Zenodo:
```bash
# Download and extract preprocessed DGLGraphs for CASP-CAPRI, DB5-Plus, and DIPS-Plus
# Requires ~55GB of free space

# Download CASP-CAPRI
mkdir -p project/datasets/CASP_CAPRI/final
cd project/datasets/CASP_CAPRI/final
wget https://zenodo.org/record/6671582/files/final_raw_casp_capri.tar.gz
wget https://zenodo.org/record/6671582/files/final_processed_casp_capri.tar.gz

# Extract CASP-CAPRI
tar -xzf final_raw_casp_capri.tar.gz
tar -xzf final_processed_casp_capri.tar.gz
rm final_raw_casp_capri.tar.gz final_processed_casp_capri.tar.gz

# Download DB5-Plus
mkdir -p ../../DB5/final
cd ../../DB5/final
wget https://zenodo.org/record/6671582/files/final_raw_db5.tar.gz
wget https://zenodo.org/record/6671582/files/final_processed_db5.tar.gz

# Extract DB5-Plus
tar -xzf final_raw_db5.tar.gz
tar -xzf final_processed_db5.tar.gz
rm final_raw_db5.tar.gz final_processed_db5.tar.gz

# Download DIPS-Plus
mkdir -p ../../DIPS/final
cd ../../DIPS/final
wget https://zenodo.org/record/6671582/files/final_raw_dips.tar.gz
wget https://zenodo.org/record/6671582/files/final_processed_dips.tar.gz.partaa
wget https://zenodo.org/record/6671582/files/final_processed_dips.tar.gz.partab

# First, reassemble all processed DGLGraphs.
# We split the (tar.gz) archive into two separate parts with
# 'split -b 4096M final_processed_dips.tar.gz "final_processed_dips.tar.gz.part"'
# to upload it to Zenodo, so to recover the original archive:
cat final_processed_dips.tar.gz.parta* > final_processed_dips.tar.gz

# Extract DIPS-Plus
tar -xzf final_raw_dips.tar.gz
tar -xzf final_processed_dips.tar.gz
rm final_processed_dips.tar.gz.parta* final_raw_dips.tar.gz final_processed_dips.tar.gz
```
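The `split`/`cat` round trip above works because splitting and concatenating operate on raw bytes, so the reassembled file is bit-identical to the original archive. A small self-contained sketch of the same principle (in-memory, with a toy archive rather than the real DIPS-Plus files):

```python
import io
import tarfile

# Build a tiny tar.gz archive in memory
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w:gz") as tar:
    data = b"hello"
    info = tarfile.TarInfo("hello.txt")
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))
blob = buf.getvalue()

# Split the archive's bytes into two parts, as `split -b ...` would
part_a, part_b = blob[: len(blob) // 2], blob[len(blob) // 2 :]

# Concatenation restores the original archive exactly,
# just like `cat final_processed_dips.tar.gz.parta* > final_processed_dips.tar.gz`
reassembled = part_a + part_b

with tarfile.open(fileobj=io.BytesIO(reassembled), mode="r:gz") as tar:
    print(tar.getnames())  # ['hello.txt']
```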
Navigate to the project directory and run the training script with the parameters desired:
```bash
# Hint: Run `python3 lit_model_train.py --help` to see all available CLI arguments
cd project
python3 lit_model_train.py --lr 1e-3 --weight_decay 1e-2
cd ..
```
Inference
Download trained model checkpoints
```bash
# Return to root directory of DeepInteract repository
cd "$DI_DIR"

# Download our trained model checkpoints
mkdir -p project/checkpoints
wget -P project/checkpoints https://zenodo.org/record/6671582/files/LitGINI-GeoTran-DilResNet.ckpt
wget -P project/checkpoints https://zenodo.org/record/6671582/files/LitGINI-GeoTran-DilResNet-DB5-Fine-Tuned.ckpt
```
Predict interface contact probability maps
Navigate to the project directory and run the prediction script with the filenames of the left and right PDB chains.
```bash
# Hint: Run `python3 lit_model_predict.py --help` to see all available CLI arguments
cd project
python3 lit_model_predict.py --left_pdb_filepath "$DI_DIR"/project/test_data/4heq_l_u.pdb --right_pdb_filepath "$DI_DIR"/project/test_data/4heq_r_u.pdb --ckpt_dir "$DI_DIR"/project/checkpoints --ckpt_name LitGINI-GeoTran-DilResNet.ckpt --hhsuite_db ~/Data/Databases/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt
cd ..
```
This script will generate the predicted interface contact map, along with the Geometric Transformer's learned node and edge representations for both chain graphs, and save them to the given input directory as NumPy array files (e.g., `test_data/4heq_contact_prob_map.npy`).
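Once a run finishes, the saved map can be inspected with NumPy. A minimal sketch, using a synthetic array as a stand-in for `test_data/4heq_contact_prob_map.npy` (the real file's shape depends on the lengths of the two input chains):

```python
import numpy as np

# Synthetic stand-in: a (left_len x right_len) matrix of contact probabilities
rng = np.random.default_rng(0)
prob_map = rng.random((50, 60))
np.save("contact_prob_map.npy", prob_map)

# Load the map back, as you would with the real prediction output
loaded = np.load("contact_prob_map.npy")

# Indices of the 5 most confident inter-chain residue contacts
flat = np.argsort(loaded, axis=None)[::-1][:5]
pairs = np.column_stack(np.unravel_index(flat, loaded.shape))
print(pairs.shape)  # (5, 2): (left_residue, right_residue) index pairs
```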
Acknowledgements
DeepInteract communicates with and/or references the following separate libraries and packages:
We thank all their contributors and maintainers!
License and Disclaimer
Copyright 2021 University of Missouri-Columbia Bioinformatics & Machine Learning (BML) Lab.
DeepInteract Code License
Licensed under the GNU Public License, Version 3.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.gnu.org/licenses/gpl-3.0.en.html.
Third-party software
Use of the third-party software, libraries or code referred to in the Acknowledgements section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.
Owner
- Name: BioinfoMachineLearning
- Login: BioinfoMachineLearning
- Kind: organization
- Repositories: 29
- Profile: https://github.com/BioinfoMachineLearning
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2
Committers
Last synced: almost 3 years ago
All Time
- Total Commits: 54
- Total Committers: 2
- Avg Commits per committer: 27.0
- Development Distribution Score (DDS): 0.019
Top Committers
| Name | Email | Commits |
|---|---|---|
| Alex Morehead | a****d@g****m | 53 |
| Jianlin Cheng | j****g@g****m | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 18
- Total pull requests: 3
- Average time to close issues: about 2 months
- Average time to close pull requests: less than a minute
- Total issue authors: 9
- Total pull request authors: 1
- Average comments per issue: 4.06
- Average comments per pull request: 0.0
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 0
- Pull request authors: 0
- Average comments per issue: 0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- terry-r123 (5)
- onlyonewater (5)
- peter5842 (2)
- KiAkize (1)
- XuBlack (1)
- gabrielepozzati (1)
- amorehead (1)
- olkidel (1)
Pull Request Authors
- amorehead (3)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads: pypi — 16 last-month
- Total dependent packages: 0
- Total dependent repositories: 1
- Total versions: 9
- Total maintainers: 1
pypi.org: deepinteract
A geometric deep learning pipeline for predicting protein interface contacts.
- Homepage: https://github.com/BioinfoMachineLearning/DeepInteract
- Documentation: https://deepinteract.readthedocs.io/
- License: GNU Public License, Version 3.0
- Latest release: 1.0.9 (published about 4 years ago)
Rankings
Maintainers (1)
Dependencies
- aria2 1.34.0
- biopython 1.78
- cudatoolkit 11.2.*
- dssp 3.0.0
- hhsuite 3.3.0
- msms 2.6.1
- numpy 1.21.2
- pandas 1.4.2
- pip 21.1.2
- python 3.8
- pytorch 1.7.1
- requests 2.26.0
- scikit-learn 0.24.2
- scipy 1.4.1
- torchaudio 0.7.2
- torchvision 0.8.2
- absl-py ==0.13.0
- docker ==5.0.2
- Sphinx ==4.0.1
- atom3-py3 ==0.1.9.8
- biopandas ==0.2.9
- click ==8.0.1
- dill ==0.3.4
- easy-parallel-py3 ==0.1.6.4
- fairscale ==0.4.0
- networkx ==2.6.2
- pytorch-lightning ==1.4.8
- setuptools ==57.4.0
- timm ==0.4.12
- torchmetrics ==0.5.1
- tqdm ==4.62.0
- wandb ==0.12.2