ppiformer

Learning to design protein-protein interactions with enhanced generalization (ICLR 2024)

https://github.com/anton-bushuiev/ppiformer

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README
✓ Academic publication links: Links to arxiv.org, pubmed.ncbi, ncbi.nlm.nih.gov, zenodo.org
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (14.9%) to scientific vocabulary

Keywords

equivariant-representations machine-learning protein-design protein-protein-interactions proteins

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: anton-bushuiev
License: MIT
Language: Jupyter Notebook
Default Branch: main
Homepage: https://arxiv.org/abs/2310.18515
Size: 2.16 MB

Statistics

Stars: 47
Watchers: 5
Forks: 4
Open Issues: 0
Releases: 1

Created over 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

# PPIformer [![arXiv badge](https://img.shields.io/badge/arXiv-2310.18515-b31b1b.svg)](https://arxiv.org/abs/2310.18515) [![Zenodo badge](https://zenodo.org/badge/DOI/10.5281/zenodo.12789167.svg)](https://doi.org/10.5281/zenodo.12789167) [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT) [![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm-dark.svg)](https://huggingface.co/spaces/anton-bushuiev/PPIformer) [![Open in Spaces](https://huggingface.co/datasets/huggingface/badges/resolve/main/open-in-hf-spaces-sm-dark.svg)](https://huggingface.co/spaces/anton-bushuiev/PPIformer-CPU)

PPIformer is a state-of-the-art predictor of the effects of mutations on protein-protein interactions (PPIs), as quantified by changes in binding energy (ddG). The model was pre-trained on the PPIRef dataset via coarse-grained structural masked modeling and fine-tuned on SKEMPI v2.0 via log odds. PPIformer was shown to successfully identify known favorable mutations of the staphylokinase thrombolytic and of a human antibody against the SARS-CoV-2 spike protein. See our paper (https://arxiv.org/abs/2310.18515) for details.

Please do not hesitate to contact us or create an issue/PR if you have any questions or suggestions. ✌️

Web on Hugging Face Spaces 🤗

A preview of PPIformer is available through an interactive user interface on Hugging Face Spaces: https://huggingface.co/spaces/anton-bushuiev/PPIformer

The Hugging Face Space above runs on Zero GPU, which is currently in beta. If you experience any issues, please try the CPU-only version: https://huggingface.co/spaces/anton-bushuiev/PPIformer-CPU

Installation

Step 1. To install PPIformer locally, clone this repository and install the environment (you may need to adjust the versions of the PyTorch-based packages in the script depending on your system):

```bash
conda create -n ppiformer python==3.10 -y
conda activate ppiformer
git clone https://github.com/anton-bushuiev/PPIformer && pip install -e PPIformer
```

Step 2. (Optional) After installation, you may need to adapt PyTorch to your system. Please see the official PyTorch installation guide for details. For example, if you are using AMD GPUs, you may need to install PyTorch for ROCm:

```bash
pip install -U torch --index-url https://download.pytorch.org/whl/rocm6.0
```

Step 3. (Optional) If you are planning to re-train the model or reproduce test results (see Training and testing below), please clone and install the ppiref and mutils packages locally to download the necessary data (i.e., data split files and ddG-labeled datasets):

```bash
git clone https://github.com/anton-bushuiev/PPIRef && pip install -e PPIRef
git clone https://github.com/anton-bushuiev/mutils && pip install -e mutils
```

Inference

```python
import torch
from ppiformer.tasks.node import DDGPPIformer
from ppiformer.utils.api import download_from_zenodo, predict_ddg, embed
from ppiformer.definitions import PPIFORMER_WEIGHTS_DIR, PPIFORMER_TEST_DATA_DIR

# Download the weights
download_from_zenodo('weights.zip')
```

Predict ddG for a PPI upon mutation

PPIformer was fine-tuned on the SKEMPI v2.0 dataset via log odds. The fine-tuned models can be used to predict the binding energy changes (ddG) for a PPI upon mutation.

```python
# Load the ensemble of fine-tuned models
device = 'cuda' if torch.cuda.is_available() else 'cpu'
models = [
    DDGPPIformer.load_from_checkpoint(
        PPIFORMER_WEIGHTS_DIR / f'ddg_regression/{i}.ckpt',
        map_location=torch.device('cpu')
    ).eval()
    for i in range(3)
]
models = [model.to(device) for model in models]

# Specify input
ppi_path = PPIFORMER_TEST_DATA_DIR / '1bui_A_C.pdb'  # PDB or PPIRef file (see https://ppiref.readthedocs.io/en/latest/extracting_ppis.html)
muts = ['SC16A', 'FC47A', 'SC16A,FC47A']  # List of single- or multi-point mutations

# Predict
ddg = predict_ddg(models, ppi_path, muts)
ddg
# tensor([-0.3708, 1.5188, 1.1482])
```
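The mutation strings above appear to follow the SKEMPI-style layout of wild-type residue, chain identifier, residue number, and mutant residue, with multi-point mutations joined by commas. The sketch below shows how such strings could be split into their fields. `PointMutation` and `parse_mutations` are hypothetical helpers written for illustration and are not part of the `ppiformer` API; the optional lowercase insertion code after the residue number is also an assumption.

```python
import re
from typing import NamedTuple


class PointMutation(NamedTuple):
    wild_type: str  # one-letter code of the original amino acid
    chain: str      # chain identifier in the structure file
    position: str   # residue number, kept as a string to allow an insertion code
    mutant: str     # one-letter code of the substituted amino acid


# Assumed layout: <wild type><chain><residue number><mutant>, e.g. 'SC16A'
_MUTATION_RE = re.compile(r"^([A-Z])([A-Za-z0-9])(-?\d+[a-z]?)([A-Z])$")


def parse_mutations(spec: str) -> list[PointMutation]:
    """Split a comma-separated mutation string into its point mutations."""
    mutations = []
    for token in spec.split(","):
        match = _MUTATION_RE.match(token.strip())
        if match is None:
            raise ValueError(f"Cannot parse mutation: {token!r}")
        mutations.append(PointMutation(*match.groups()))
    return mutations


single = parse_mutations("SC16A")
double = parse_mutations("SC16A,FC47A")
```

Under this reading, `'SC16A'` replaces serine 16 on chain C with alanine, and `'SC16A,FC47A'` applies that substitution together with phenylalanine 47 to alanine on the same chain.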

Embed PPI

PPIformer was pre-trained using structural masked modeling. The pre-trained model can be used to obtain PPI embeddings, similar to BERT embeddings in natural language processing.

```python
# Load the pre-trained model
# NOTE: the import of the `PPIformer` class is not shown in this README;
# import it from the ppiformer package before running this snippet.
device = 'cuda' if torch.cuda.is_available() else 'cpu'
model = PPIformer.load_from_checkpoint(PPIFORMER_WEIGHTS_DIR / 'masked_modeling.ckpt', map_location=torch.device('cpu'))
model = model.to(device).eval()

# Specify input
ppi_path = PPIFORMER_TEST_DATA_DIR / '1bui_A_C.pdb'  # PDB or PPIRef file (see https://ppiref.readthedocs.io/en/latest/extracting_ppis.html)

# Embed (get the final type-0 features). Here, 128-dimensional embedding for each of 124 amino acids in the PPI
embedding = embed(model, ppi_path)
embedding.shape
# torch.Size([124, 128])
```
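Because the embedding has one row per residue, interfaces of different sizes yield matrices of different shapes. One common way to compare them, shown here only as an illustrative sketch and not as the authors' method, is to mean-pool the rows into a single vector and compare vectors by cosine similarity. The helpers `mean_pool` and `cosine_similarity` are hypothetical, and the code works on plain Python lists so that it runs without PyTorch; with a real tensor one could pass `embedding.tolist()` or simply use `embedding.mean(dim=0)`.

```python
import math


def mean_pool(residue_embeddings):
    """Average per-residue vectors into one vector for the whole interface."""
    n = len(residue_embeddings)
    dim = len(residue_embeddings[0])
    return [sum(row[j] for row in residue_embeddings) / n for j in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy stand-ins for two per-residue embedding matrices (2 and 3 residues, 2 dims)
ppi_a = [[1.0, 0.0], [3.0, 0.0]]
ppi_b = [[0.0, 2.0], [0.0, 4.0], [0.0, 6.0]]

pooled_a = mean_pool(ppi_a)
pooled_b = mean_pool(ppi_b)
similarity = cosine_similarity(pooled_a, pooled_b)
```

Mean pooling discards the per-residue detail, so it suits coarse comparisons between whole interfaces rather than residue-level analysis.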

Training and testing

To train and validate PPIformer, please see PPIformer/scripts/README.md. To test the model and reproduce the results from the paper, please see PPIformer/notebooks/test.ipynb.

How it works

The model was pre-trained on the PPIRef dataset via coarse-grained structural masked modeling and fine-tuned on the SKEMPI v2.0 dataset via log odds.

A single pre-training step starts with randomly sampling a protein-protein interaction $\mathbf{c}$ (in this example figure, the staphylokinase dimer A-B from the PDB entry 1C78) from PPIRef. Next, randomly selected residues $M$ are masked to obtain the masked interaction $\mathbf{c}_{\setminus M}$. After that, the interaction is converted into a graph representation $(G,\mathbf{X},\mathbf{E},\mathbf{F}_0,\mathbf{F}_1)$ with masked nodes $M$ (black circles). The model subsequently learns to classify the types of masked amino acids by acquiring an $SE(3)$-invariant hidden representation $\mathbf{H}$ of the whole interface via the encoder $f$ and classifier $g$ (red arrows). On the downstream task of ddG prediction, mutated amino acids are masked, and the probabilities of possible substitutions $\mathbf{P}_{M,:}$ are jointly inferred with the pre-trained model. Finally, the estimate $\widehat{\Delta \Delta G}$ is obtained using the predicted probabilities $p$ of the wild-type $c_i$ and the mutant $m_i$ amino acids via log odds (blue arrows).
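The final log-odds step can be illustrated with a few lines of plain Python. The sketch assumes the estimate has the form $\widehat{\Delta \Delta G} = \sum_{i \in M} \left[\log p(c_i) - \log p(m_i)\right]$, so that a mutation the model considers less likely than the wild type receives a positive value. Both the sign convention and the absence of any scaling are assumptions made for illustration; the paper gives the exact definition. `log_odds_ddg` is a hypothetical helper, not a function from the `ppiformer` package, and the probabilities below are made up.

```python
import math


def log_odds_ddg(probs, wild_types, mutants):
    """Sum log p(wild type) - log p(mutant) over the masked positions.

    probs: one dict per masked position, mapping amino acid -> probability.
    wild_types, mutants: one-letter codes, one per masked position.
    """
    total = 0.0
    for position_probs, wt, mt in zip(probs, wild_types, mutants):
        total += math.log(position_probs[wt]) - math.log(position_probs[mt])
    return total


# Made-up probabilities for a double mutation (S -> A and F -> A)
probs = [
    {"S": 0.5, "A": 0.25},
    {"F": 0.8, "A": 0.1},
]
ddg_estimate = log_odds_ddg(probs, ["S", "F"], ["A", "A"])
```

In this toy case the wild-type residues are 2 and 8 times more probable than their substitutions, so the estimate equals $\log 2 + \log 8 = \log 16$.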

TODO

[x] Pre-training and fine-tuning examples with scripts/run.py
[x] Installation script examples for AMD GPUs and NVIDIA GPUs
[x] SSL-pretrained weights (without fine-tuning)

References

If you find this repository useful, please cite our paper:

@article{
  bushuiev2024learning,
  title={Learning to design protein-protein interactions with enhanced generalization},
  author={Anton Bushuiev and Roman Bushuiev and Petr Kouba and Anatolii Filkin and Marketa Gabrielova and Michal Gabriel and Jiri Sedlar and Tomas Pluskal and Jiri Damborsky and Stanislav Mazurenko and Josef Sivic},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}

Owner

Name: Anton Bushuiev
Login: anton-bushuiev
Kind: user
Location: Prague
Company: Czech Technical University in Prague

Twitter: AntonBushuiev
Repositories: 23
Profile: https://github.com/anton-bushuiev

PhD student. Machine learning / computational biology 🤖🌱

Citation (citation.bib)

@article{
  bushuiev2024learning,
  title={Learning to design protein-protein interactions with enhanced generalization},
  author={Anton Bushuiev and Roman Bushuiev and Petr Kouba and Anatolii Filkin and Marketa Gabrielova and Michal Gabriel and Jiri Sedlar and Tomas Pluskal and Jiri Damborsky and Stanislav Mazurenko and Josef Sivic},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}

GitHub Events

Total

Create event: 1
Release event: 1
Issues event: 4
Watch event: 6
Issue comment event: 3
Push event: 6
Fork event: 1

Last Year

Create event: 1
Release event: 1
Issues event: 4
Watch event: 6
Issue comment event: 3
Push event: 6
Fork event: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 1
Total pull requests: 0
Average time to close issues: about 5 hours
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 3.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: about 5 hours
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 3.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

amin-sagar (2)
AmingWu (1)
edikedik (1)
twidatalla (1)
paoslaos (1)
