ppiref
Dataset and package for working with protein-protein interactions in 3D
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 7 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org, pubmed.ncbi, ncbi.nlm.nih.gov, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (15.7%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: anton-bushuiev
- License: mit
- Language: Jupyter Notebook
- Default Branch: main
- Homepage: https://ppiref.readthedocs.io
- Size: 15.1 MB
Statistics
- Stars: 91
- Watchers: 4
- Forks: 8
- Open Issues: 4
- Releases: 5
Metadata Files
README.md
PPIRef
PPIRef is a Python package for working with 3D structures of protein-protein interactions (PPIs). It is based on the PPIRef dataset, comprising all PPIs from the Protein Data Bank (PDB). The package aims to provide standard data and tools for machine learning and data science applications involving protein-protein interaction structures. PPIRef includes the following functionalities:
- ⭐ Extracting protein-protein interfaces from .pdb files.
- ⭐ Visualizing and analyzing the properties of PPIs.
- ⭐ Comparing, deduplicating and clustering PPI interfaces.
- ⭐ Retrieving similar PPIs from PDB by similar interface structure or sequence.
- ⭐ Downloading, splitting and subsampling prepared PPIs for machine learning applications.
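To make the first item concrete, here is a minimal, self-contained sketch of distance-based interface extraction. A residue is kept if any of its heavy (non-hydrogen) atoms lies within a cutoff of a heavy atom on another chain, in the spirit of the `'KIND': 'heavy'` and `'EXTRACTION RADIUS': 6.0` fields in the sample statistics of this README. This is a simplified illustration, not PPIRef's implementation: the helper names `parse_heavy_atoms` and `interface_residues` are hypothetical, only fixed-column `ATOM` records are read, and alternate locations are not handled.

```python
import math
from collections import defaultdict


def parse_heavy_atoms(pdb_text):
    """Return (chain, (res_name, res_seq), (x, y, z)) for each non-hydrogen ATOM record."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue  # HETATM, headers, etc. are ignored in this sketch
        # Element symbol (columns 77-78); fall back to the first letter of the atom name
        element = line[76:78].strip() or line[12:16].strip()[0]
        if element == "H":
            continue  # keep heavy atoms only
        chain = line[21]
        res_id = (line[17:20].strip(), int(line[22:26]))
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        atoms.append((chain, res_id, xyz))
    return atoms


def interface_residues(atoms, cutoff=6.0):
    """Map each chain to residues with a heavy atom within `cutoff` Å of another chain.

    Uses a quadratic pairwise loop for clarity; real structures would call for a spatial index.
    """
    interface = defaultdict(set)
    for i, (chain_a, res_a, xyz_a) in enumerate(atoms):
        for chain_b, res_b, xyz_b in atoms[i + 1:]:
            if chain_a != chain_b and math.dist(xyz_a, xyz_b) <= cutoff:
                interface[chain_a].add(res_a)
                interface[chain_b].add(res_b)
    return dict(interface)
```

Given the text of a structure file, `interface_residues(parse_heavy_atoms(text), cutoff=6.0)` returns, per chain, the set of residues that would be kept as the interface under these simplified rules.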
Please see the documentation (https://ppiref.readthedocs.io) for usage examples and the API reference. See also our paper (https://doi.org/10.48550/arXiv.2310.18515) for additional details.
Quick start 🚀
Install the PPIRef package.
```bash
conda create -n ppiref python=3.10
conda activate ppiref
git clone https://github.com/anton-bushuiev/PPIRef.git
cd PPIRef; pip install -e .
```
Download the dataset using the package (in Python).
```python
from ppiref.utils.misc import download_from_zenodo
from ppiref.split import read_fold
from ppiref.utils.ppi import PPI

download_from_zenodo('ppi_6A.zip')  # or, for example, 'pdb_redo_ppi_10A.zip' for all 10-Angstrom PPIs from PDB-REDO
# Downloading: 100%|██████████| 6.94G/6.94G [10:19<00:00, 11.2MiB/s]
# Extracting: 100%|██████████| 831382/831382 [02:36<00:00, 5313.49files/s]
```
Read the fold or subset you need (the whole PPIRef50K in this example).
```python
ppi_paths = read_fold('ppiref_6A_filtered_clustered_04', 'whole')
print('Dataset size:', len(ppi_paths))
# Dataset size: 51755
```
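The fold is read as a list of PPI file paths (`ppi_paths`). If you build your own train/test split from such a list, one simple precaution against leakage is to keep all interfaces from the same PDB entry on the same side. The sketch below assumes (hypothetically) that each file name starts with the 4-character PDB code; `pdb_code` and `split_by_pdb_code` are illustrative helpers, not part of PPIRef, and grouping by PDB code is a much weaker criterion than structural deduplication of interfaces.

```python
import random
from pathlib import Path


def pdb_code(path):
    """Hypothetical helper: assume the first 4 characters of the file name are the PDB code."""
    return Path(path).name[:4].lower()


def split_by_pdb_code(paths, test_fraction=0.2, seed=0):
    """Split PPI file paths so that no PDB entry appears in both train and test."""
    codes = sorted({pdb_code(p) for p in paths})  # sort first so the shuffle is reproducible
    random.Random(seed).shuffle(codes)
    n_test = round(len(codes) * test_fraction)
    test_codes = set(codes[:n_test])

    train = [p for p in paths if pdb_code(p) not in test_codes]
    test = [p for p in paths if pdb_code(p) in test_codes]
    return train, test
```

Because the split is over PDB codes rather than files, the fraction of files in the test set can differ from `test_fraction` when entries contribute different numbers of interfaces.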
Now you are ready to work with the PPIRef dataset! For example, to load and inspect a single PPI:
```python
ppi = PPI(ppi_paths[0])
print('Path:', ppi.path)
print('Statistics:', ppi.stats)
ppi.visualize()
# Path: /Users/anton/dev/PPIRef/ppiref/data/ppiref/ppi_6A/hc/3hch_A_B.pdb
# Statistics: {'KIND': 'heavy', 'EXTRACTION RADIUS': 6.0, 'EXPANSION RADIUS': 0.0, 'RESOLUTION': 2.1, 'STRUCTURE METHOD': 'x-ray diffraction', 'DEPOSITION DATE': '2009-05-06', 'RELEASE DATE': '2009-10-13', 'BSA': 682.5337386399999}
```
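The statistics dictionary makes it easy to select interfaces by metadata. Below is a minimal sketch, assuming the keys shown in the example output (`'RESOLUTION'` in Å, `'STRUCTURE METHOD'`, and `'BSA'` for buried surface area); the helper `keep_ppi` and its thresholds are hypothetical choices for illustration, and treating a missing resolution (e.g., for non-crystallographic methods) as a rejection is an assumption.

```python
def keep_ppi(stats, max_resolution=2.5, min_bsa=500.0, methods=("x-ray diffraction",)):
    """Return True if a PPI's metadata passes simple, illustrative quality filters."""
    resolution = stats.get("RESOLUTION")
    if resolution is None or resolution > max_resolution:
        return False  # unknown or insufficient resolution
    if stats.get("STRUCTURE METHOD") not in methods:
        return False  # experimental method not in the allowed list
    return stats.get("BSA", 0.0) >= min_bsa  # require a minimum buried surface area


# Statistics of the sample shown above
example = {'KIND': 'heavy', 'EXTRACTION RADIUS': 6.0, 'EXPANSION RADIUS': 0.0,
           'RESOLUTION': 2.1, 'STRUCTURE METHOD': 'x-ray diffraction',
           'DEPOSITION DATE': '2009-05-06', 'RELEASE DATE': '2009-10-13',
           'BSA': 682.5337386399999}
print(keep_ppi(example))  # True
```

With the package installed, the filter could be applied as `[p for p in ppi_paths if keep_ppi(PPI(p).stats)]`, although loading every file this way may be slow for the full dataset.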
Beyond this, the PPIRef package provides utilities for comparing, deduplicating, and clustering PPI interfaces, and for retrieving PPIs from PDB with a similar interface structure or sequence. Please see the documentation for more details.
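Deduplication can be pictured as greedy selection of representatives: go through the interfaces in order and keep one only if it is farther than a threshold from every representative kept so far. The sketch below shows this generic idea with a user-supplied distance function and toy one-dimensional "embeddings"; it is not PPIRef's deduplication procedure, and `greedy_deduplicate`, its arguments, and the threshold value are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def greedy_deduplicate(items: Sequence[T], distance: Callable[[T, T], float],
                       threshold: float) -> list[T]:
    """Keep an item only if its distance to every already-kept item exceeds `threshold`.

    The result depends on input order: sort `items` first (e.g., by structure quality)
    to control which member of a near-duplicate group survives.
    """
    kept: list[T] = []
    for item in items:
        if all(distance(item, rep) > threshold for rep in kept):
            kept.append(item)
    return kept


# Toy example: 'b' is a near duplicate of 'a', and 'd' of 'c'
embedding = {"a": 0.0, "b": 0.01, "c": 1.0, "d": 1.03}
print(greedy_deduplicate(list(embedding), lambda x, y: abs(embedding[x] - embedding[y]), threshold=0.05))
# ['a', 'c']
```

Each new item is compared against all kept representatives, so the cost grows with the number of retained items; at PDB scale, a nearest-neighbor index over interface embeddings would be the natural replacement for the inner loop.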
TODO
The repository is under development. Please do not hesitate to contact us or create an issue/PR if you have any questions or suggestions ✌️.
Technical
- [x] PPIRef (6A interfaces) on Zenodo
- [x] PPIRef (10A interfaces) on Zenodo (expected in June 2024)
- [x] PPIRef version based on the PDB-REDO database for higher-quality side chains in the structures (expected in June 2024)
- [x] Docstrings
Enhancements
- [ ] Cluster all PPIs to sample from clusters rather than removing near duplicates completely (similar to UniRef seeds)
- [ ] Add RASA (relative accessible surface area) values to classify residues according to Levy (2010)
- [ ] Classify PPIs according to Ofran (2003)
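For the Levy (2010) item, residues are classified from their relative accessible surface area in the unbound monomer (rASA_m) and in the complex (rASA_c). Below is a sketch assuming the 25% threshold and the five regions (interior, surface, support, rim, core) as we understand them from that paper; it takes precomputed rASA percentages because how PPIRef would compute them is not specified, `levy_region` is a hypothetical name, and tie handling at exactly 25% is a choice made for this sketch.

```python
def levy_region(rasa_monomer: float, rasa_complex: float, threshold: float = 25.0) -> str:
    """Classify a residue into Levy (2010)-style structural regions from rASA values in percent."""
    if rasa_complex >= rasa_monomer:
        # No loss of accessibility upon binding: a non-interface residue
        return "interior" if rasa_monomer < threshold else "surface"
    if rasa_monomer < threshold:
        # Interface residue that is already largely buried in the monomer
        return "support"
    # Exposed in the monomer; buried (core) or still exposed (rim) in the complex
    return "rim" if rasa_complex >= threshold else "core"
```

Under these definitions, the interface consists of the support, rim, and core residues, i.e., those whose accessibility drops upon complex formation.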
References
If you find this repository useful, please cite our paper:
```bibtex
@article{bushuiev2024learning,
  title={Learning to design protein-protein interactions with enhanced generalization},
  author={Anton Bushuiev and Roman Bushuiev and Petr Kouba and Anatolii Filkin and Marketa Gabrielova and Michal Gabriel and Jiri Sedlar and Tomas Pluskal and Jiri Damborsky and Stanislav Mazurenko and Josef Sivic},
  booktitle={ICLR 2024 (The Twelfth International Conference on Learning Representations)},
  url={https://doi.org/10.48550/arXiv.2310.18515},
  year={2024}
}
```
If relevant, please also cite the corresponding paper on data leakage in protein interaction benchmarks:
```bibtex
@article{bushuiev2024revealing,
  title={Revealing data leakage in protein interaction benchmarks},
  author={Anton Bushuiev and Roman Bushuiev and Jiri Sedlar and Tomas Pluskal and Jiri Damborsky and Stanislav Mazurenko and Josef Sivic},
  booktitle={ICLR 2024 Workshop on Generative and Experimental Perspectives for Biomolecular Design},
  url={https://doi.org/10.48550/arXiv.2404.10457},
  year={2024}
}
```
If you find any of the external software useful, please cite the corresponding papers (see PPIRef/external/README.md).
Owner
- Name: Anton Bushuiev
- Login: anton-bushuiev
- Kind: user
- Location: Prague
- Company: Czech Technical University in Prague
- Twitter: AntonBushuiev
- Repositories: 23
- Profile: https://github.com/anton-bushuiev
PhD student. Machine learning / computational biology 🤖🌱
Citation (citation.bib)
```bibtex
@article{bushuiev2024learning,
  title={Learning to design protein-protein interactions with enhanced generalization},
  author={Anton Bushuiev and Roman Bushuiev and Petr Kouba and Anatolii Filkin and Marketa Gabrielova and Michal Gabriel and Jiri Sedlar and Tomas Pluskal and Jiri Damborsky and Stanislav Mazurenko and Josef Sivic},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024}
}
```
GitHub Events
Total
- Create event: 2
- Release event: 2
- Issues event: 7
- Watch event: 11
- Issue comment event: 15
- Push event: 11
- Fork event: 1
Last Year
- Create event: 2
- Release event: 2
- Issues event: 7
- Watch event: 11
- Issue comment event: 15
- Push event: 11
- Fork event: 1