masif-neosurf

MaSIF-neosurf: surface-based protein design for ternary complexes.

https://github.com/lpdi-epfl/masif-neosurf

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Keywords

drug-design geometric-deep-learning molecular-surface protein-design
Last synced: 6 months ago

Repository

MaSIF-neosurf: surface-based protein design for ternary complexes.

Basic Info
  • Host: GitHub
  • Owner: LPDI-EPFL
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 17.6 MB
Statistics
  • Stars: 139
  • Watchers: 5
  • Forks: 19
  • Open Issues: 11
  • Releases: 0
Topics
drug-design geometric-deep-learning molecular-surface protein-design
Created about 2 years ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

MaSIF-neosurf – Surface-based protein design for ternary complexes

Code repository for "Targeting protein-ligand neosurfaces with a generalizable deep learning tool".

Description

Molecular recognition events between proteins drive biological processes in living systems. However, higher levels of mechanistic regulation have emerged, where protein-protein interactions are conditioned to small molecules. Here, we present a computational strategy for the design of proteins that target neosurfaces, i.e. surfaces arising from protein-ligand complexes. To do so, we leveraged a deep learning approach based on learned molecular surface representations and experimentally validated binders against three drug-bound protein complexes. Remarkably, surface fingerprints trained only on proteins can be applied to neosurfaces emerging from small molecules, serving as a powerful demonstration of generalizability that is uncommon in deep learning approaches. The designed chemically-induced protein interactions hold the potential to expand the sensing repertoire and the assembly of new synthetic pathways in engineered cells.

Method overview

[Figure: MaSIF-neosurf overview and pipeline]

System requirements

Hardware

MaSIF-seed has been tested on Linux, and we recommend running it in an x86-based Linux Docker container. It can run on an Apple M1 machine, but much more slowly. Reproducing the experiments in the paper requires the full datasets for all proteins, which occupy several terabytes.

Currently, MaSIF takes a few seconds to preprocess each protein. We find the main bottleneck to be the APBS computation of surface charges, which could likely be optimized. Nevertheless, for large datasets of proteins we recommend preprocessing the data on a distributed cluster.
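Because each protein is preprocessed independently, a large dataset parallelizes trivially. The following is a minimal single-machine sketch, not part of the repository: it assumes the `preprocess_pdb.sh` interface from the "Preprocess a PDB file" section (PDB file, chain definition, `-o` output directory), and the helper names and the `(pdb_path, chains)` job list are hypothetical. On a cluster, you would submit each command to the scheduler instead of a local thread pool.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def preprocess_command(pdb_path, chains, out_dir):
    """Build the argument list for one ligand-free preprocessing job (hypothetical helper)."""
    return ["./preprocess_pdb.sh", pdb_path, chains, "-o", out_dir]


def run_one(cmd):
    """Run one preprocessing job and return (command, exit code)."""
    try:
        # The heavy lifting (MSMS, APBS, ...) happens in child processes,
        # so Python threads only wait on them.
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    except FileNotFoundError:
        return cmd, 127  # script not found, like the shell's "command not found"
    except PermissionError:
        return cmd, 126  # script found but not executable (run chmod +x first)
    return cmd, result.returncode


def preprocess_many(jobs, out_dir, max_workers=4):
    """Preprocess (pdb_path, chains) pairs concurrently, max_workers at a time."""
    commands = [preprocess_command(pdb, chains, out_dir) for pdb, chains in jobs]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(run_one, commands))
```

For example, `preprocess_many([("example/1a7x.pdb", "1A7X_A")], "example/output/")` would run the ligand-free example; jobs that need `-l`/`-s` would have to add those flags to their command lists.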

Software

MaSIF relies on external software/libraries to handle Protein Data Bank files and surface files, to compute chemical/geometric features and coordinates, and to perform neural network calculations. The following is the list of required libraries and programs, with the version on which it was tested in parentheses:
  • Python (3.6)
  • reduce (3.23). To add protons to proteins.
  • MSMS (2.6.1). To compute the surface of proteins.
  • BioPython (1.66). To parse PDB files.
  • PyMesh (0.1.14). To handle ply surface files and their attributes, and to regularize meshes.
  • PDB2PQR (2.1.1), multivalue, and APBS (1.5). These programs are necessary to compute electrostatic charges.
  • Open3D (0.5.0.0). Mainly used for RANSAC alignment.
  • Tensorflow (1.9). Used to model, train, and evaluate the actual neural networks. Models were trained and evaluated on an NVIDIA Tesla K40 GPU.
  • StrBioInfo. Used for parsing PDB files and generating biological assemblies for MaSIF-ligand.
  • Dask (2.2.0). Runs function calls on multiple threads (optional, for reproducing some benchmarks).
  • Pymol (2.5.0). This optional program allows one to visualize surface files.
  • RDKit (2021.9.4). For handling small molecules, especially their proton donors and acceptors.
  • OpenBabel (3.1.1.7). For handling small molecules, especially their conversion into MOL2 files for APBS.
  • ProDy (2.0). For handling small molecules, especially ligand extraction from a PDB file.
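Outside the Docker container, it is easy to miss one of these dependencies. A minimal standard-library sketch that reports which programs are absent from `PATH` and which Python modules cannot be found; the command names and import names below are assumptions based on the usual names of these tools, and your installation may differ.

```python
import shutil
from importlib.util import find_spec

# Assumed command names for the external programs listed above.
EXECUTABLES = ["reduce", "msms", "pdb2pqr", "multivalue", "apbs", "obabel"]

# Assumed top-level import names for the Python libraries listed above.
MODULES = ["Bio", "pymesh", "open3d", "tensorflow", "rdkit", "openbabel", "prody"]


def missing_executables(names=EXECUTABLES):
    """Return the programs in `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


def missing_modules(names=MODULES):
    """Return the top-level modules in `names` that cannot be located."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    print("missing programs:", missing_executables() or "none")
    print("missing modules:", missing_modules() or "none")
```

`find_spec` only locates a module without importing it, so the check stays fast even for large libraries such as TensorFlow.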

Installation with Docker

MaSIF is written in Python and does not require compilation. Since MaSIF relies on a few external programs (MSMS, APBS) and libraries (PyMesh, Tensorflow, Scipy, Open3D), we strongly recommend you use the Dockerfile and Docker container. Setting up the environment should take a few minutes only.

```bash
git clone https://github.com/LPDI-EPFL/masif-neosurf.git
cd masif-neosurf
docker build . -t masif-neosurf
docker run -it -v $PWD:/home/$(basename $PWD) masif-neosurf
```

Preprocess a PDB file

Before we can search for complementary binding sites/seeds, we need to triangulate the molecular surface and compute the initial surface features. The script preprocess_pdb.sh takes two required positional arguments: the PDB file and a definition of the chain(s) to include. If a small molecule is part of the molecular surface, we need to tell MaSIF-neosurf where to find it in the PDB file (three-letter code + chain) using the -l flag. Optionally, we can also provide an SDF file with the -s flag, which will be used to infer the correct connectivity information (i.e. bond types). This SDF file can be downloaded, for example, from the PDB website. Finally, we must specify an output directory with the -o flag, in which all the preprocessed files will be saved.

```bash
chmod +x ./preprocess_pdb.sh

# with ligand
./preprocess_pdb.sh example/1a7x.pdb 1A7X_A -l FKA_B -s example/1a7x_C_FKA.sdf -o example/output/

# without ligand
./preprocess_pdb.sh example/1a7x.pdb 1A7X_A -o example/output/
```
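When preprocessing is driven from Python, the argument order described above can be assembled and checked before calling the script. A minimal sketch, not part of the repository: `preprocess_args` is a hypothetical helper, and it assumes the ligand is written as `<code>_<chain>` (e.g. `FKA_B`), which is how we read "three-letter code + chain". Rejecting `-s` without `-l` is a choice of this helper, not documented behavior of the script.

```python
import re

# Assumed ligand format: residue name (up to three characters), underscore,
# chain ID, e.g. "FKA_B" for residue FKA in chain B.
LIGAND_SPEC = re.compile(r"^[A-Z0-9]{1,3}_[A-Za-z0-9]$")


def preprocess_args(pdb_file, chains, out_dir, ligand=None, sdf_file=None):
    """Assemble a preprocess_pdb.sh command line as a list of arguments.

    ligand   -- optional "<code>_<chain>" string, passed with -l
    sdf_file -- optional SDF file used to infer bond types, passed with -s
    """
    if sdf_file is not None and ligand is None:
        # The SDF describes the ligand, so this helper rejects it without one.
        raise ValueError("-s requires a ligand given with -l")
    args = ["./preprocess_pdb.sh", pdb_file, chains]
    if ligand is not None:
        if not LIGAND_SPEC.match(ligand):
            raise ValueError(f"unexpected ligand spec {ligand!r}, expected e.g. 'FKA_B'")
        args += ["-l", ligand]
    if sdf_file is not None:
        args += ["-s", sdf_file]
    return args + ["-o", out_dir]
```

The resulting list can be passed directly to `subprocess.run(..., check=True)`, which raises `CalledProcessError` if the script exits with a non-zero status.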

PyMOL plugin

The PyMOL plugin can be used to visualize preprocessed surface files (.ply file extension). To install it, open the plugin manager in PyMOL, select Install New Plugin -> Install from local file and choose the masif_pymol_plugin.py file. Once installed you can load MaSIF surface files in PyMOL with the following command:

```
loadply 1ABC.ply
```
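The software list notes that the .ply surface files carry attributes alongside the mesh. To see which per-vertex attributes a preprocessed file contains before opening it in PyMOL, a minimal standard-library sketch can read just the PLY header. It relies only on the generic PLY header layout and makes no assumption about MaSIF's attribute names; the function names are hypothetical and `1ABC.ply` is the placeholder from the command above.

```python
def read_ply_header(path):
    """Read the header of a .ply file (ASCII or binary) as text."""
    lines = []
    with open(path, "rb") as fh:
        for raw in fh:
            line = raw.decode("ascii", errors="replace").rstrip("\r\n")
            lines.append(line)
            if line.strip() == "end_header":
                break  # binary vertex/face data follows; stop here
    return "\n".join(lines)


def ply_vertex_properties(header_text):
    """Return (vertex_count, [vertex property names]) from a PLY header."""
    count, props, in_vertex = 0, [], False
    for line in header_text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "element":
            # Properties belong to the most recent "element" line.
            in_vertex = tokens[1] == "vertex"
            if in_vertex:
                count = int(tokens[2])
        elif tokens[0] == "property" and in_vertex:
            # Both "property float x" and "property list ..." end with the name.
            props.append(tokens[-1])
        elif tokens[0] == "end_header":
            break
    return count, props
```

Usage: `print(ply_vertex_properties(read_ply_header("1ABC.ply")))` prints the vertex count and the names of all vertex attributes, including any surface features stored per vertex.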

Computational binder recovery benchmark

For more details on the binder recovery benchmark, please consult the relevant README. The preprocessed dataset can be downloaded from Zenodo.

Running a seed search

For more details on the seed search procedure, please consult the relevant README.
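The seed search itself is documented in its own README. Purely as an illustration of the general idea of comparing surface fingerprints mentioned in the description, here is a sketch that ranks candidate patches by Euclidean distance between descriptor vectors. It is not the repository's implementation: the descriptor length, the distance cutoff, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length descriptors."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def rank_candidates(target, candidates, cutoff):
    """Return (name, distance) pairs with distance <= cutoff, closest first.

    target     -- descriptor (list of floats) of a target surface patch
    candidates -- dict mapping a candidate patch name to its descriptor
    cutoff     -- hypothetical distance threshold for keeping a match
    """
    hits = []
    for name, desc in candidates.items():
        dist = squared_distance(target, desc) ** 0.5
        if dist <= cutoff:
            hits.append((name, dist))
    hits.sort(key=lambda pair: pair[1])
    return hits
```

A linear scan is enough to show the ranking; for millions of patches, a spatial index over the descriptors would be the natural replacement.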

Running a seed refinement and grafting

For more details on the seed refinement and grafting procedure, please consult the relevant README.

License

MaSIF-neosurf is released under the Apache v2.0 license.

Reference

@article{marchand2025,
  author={Marchand, Anthony and Buckley, Stephen and Schneuing, Arne and Pacesa, Martin and Elia, Maddalena and Gainza, Pablo and Elizarova, Evgenia and Neeser, Rebecca M. and Lee, Pao-Wan and Reymond, Luc and Miao, Yangyang and Scheller, Leo and Georgeon, Sandrine and Schmidt, Joseph and Schwaller, Philippe and Maerkl, Sebastian J. and Bronstein, Michael and Correia, Bruno E.},
  title={Targeting protein-ligand neosurfaces with a generalizable deep learning tool},
  journal={Nature},
  year={2025},
  month={Jan},
  day={15},
  issn={1476-4687},
  doi={10.1038/s41586-024-08435-4},
  url={https://doi.org/10.1038/s41586-024-08435-4}
}

Owner

  • Name: Laboratory of Protein Design and Immunoengineering
  • Login: LPDI-EPFL
  • Kind: organization

GitHub Events

Total
  • Issues event: 17
  • Watch event: 101
  • Delete event: 3
  • Member event: 1
  • Issue comment event: 15
  • Push event: 29
  • Pull request review event: 1
  • Pull request event: 6
  • Fork event: 17
  • Create event: 4
Last Year
  • Issues event: 17
  • Watch event: 101
  • Delete event: 3
  • Member event: 1
  • Issue comment event: 15
  • Push event: 29
  • Pull request review event: 1
  • Pull request event: 6
  • Fork event: 17
  • Create event: 4

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 11
  • Total pull requests: 4
  • Average time to close issues: 11 days
  • Average time to close pull requests: about 7 hours
  • Total issue authors: 11
  • Total pull request authors: 2
  • Average comments per issue: 0.18
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 11
  • Pull requests: 4
  • Average time to close issues: 11 days
  • Average time to close pull requests: about 7 hours
  • Issue authors: 11
  • Pull request authors: 2
  • Average comments per issue: 0.18
  • Average comments per pull request: 0.0
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • mjb84 (3)
  • Areszie (1)
  • amber4mint (1)
  • steven0seagal (1)
  • jbderoo (1)
  • mokurin000 (1)
  • bdabykov (1)
  • arontier-sjw (1)
  • hzy317 (1)
  • yxl4567 (1)
  • minami45 (1)
  • Du-Minghui (1)
Pull Request Authors
  • rneeser (2)
  • arneschneuing (1)
  • knawel (1)