plinder
Protein Ligand INteraction Dataset and Evaluation Resource
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 3 DOI reference(s) in README
- ✓ Academic publication links: Links to biorxiv.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.0%) to scientific vocabulary
Repository
Protein Ligand INteraction Dataset and Evaluation Resource
Basic Info
- Host: GitHub
- Owner: plinder-org
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://plinder.sh
- Size: 39.9 MB
Statistics
- Stars: 227
- Watchers: 10
- Forks: 16
- Open Issues: 13
- Releases: 0
Metadata Files
README.md
The Protein Ligand INteractions Dataset and Evaluation Resource
📚 About
PLINDER, short for protein ligand interactions dataset and evaluation resource, is a comprehensive, annotated, high-quality dataset and resource for training and evaluation of protein-ligand docking algorithms:
- > 400k PLI systems across > 11k SCOP domains and > 50k unique small molecules
- 750+ annotations for each system, including protein and ligand properties, quality, matched molecular series and more
- Automated curation pipeline to keep up with the PDB
- 14 PLI metrics and over 20 billion similarity scores
- Unbound (apo) and predicted AlphaFold2 structures linked to holo systems
- Train-val-test splits and the ability to tune splitting based on the learning task
- Robust evaluation harness to simplify and standardize performance comparison between models
The PLINDER project is a community effort, launched by the University of Basel, SIB Swiss Institute of Bioinformatics, VantAI, NVIDIA, MIT CSAIL, and will be regularly updated.
To accelerate community adoption, PLINDER will be used as the field’s new protein-ligand interaction dataset standard as part of a competition at the upcoming 2024 Machine Learning in Structural Biology (MLSB) Workshop at NeurIPS, one of the field's premier academic gatherings. More details about the competition and other practical tips can be found in our recent workshop repo: Moving Beyond Memorization.
👋 Join the P(L)INDER user group Discord Server!
🔢 Plinder versions
We version the plinder dataset with two controls:
- PLINDER_RELEASE: the month stamp of the last RCSB sync
- PLINDER_ITERATION: value that enables iterative development within a release
We version the plinder application using an automated semantic
versioning scheme based on the git commit history.
The plinder.data package is responsible for generating a dataset
release and the plinder.core package makes it easy to interact
with the dataset.
🐛🐛🐛 Known bugs:
- Source dataset contains incorrect entry_release_date dates; please use query_index to get the corrected dates.
- Complexes containing nucleic acid receptors may not be saved correctly.
- ligand_binding_affinity queries have been disabled due to a bug found in parsing BindingDB.
Changelog:
2024-06/v2 (Current):
- New systems added based on the 2024-06 RCSB sync
- Updated system definition to be more stable and depend only on ligand distance rather than PLIP
- Added annotations for crystal contacts
- Improved ligand handling and saving to fix some bond order issues
- Improved covalency detection and annotation to reference each bond explicitly
- Added linked apo/pred structures to v2/links and v2/linked_structures
- Added binding affinity annotations from BindingDB (see known bugs!)
- Added statistics requirement and other changes in the split to enrich test set diversity
2024-04/v1: Version described in the preprint, with updated redundancy removal by protein pocket and ligand similarity.
2024-04/v0: Version used to re-train DiffDock in the paper, with redundancy removal based on <pdbid>_<ligand ccd codes>
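The v0 redundancy key can be illustrated with a short sketch. It assumes each system is described by a PDB ID and a list of ligand CCD codes, and that the codes are sorted and joined with underscores. The exact key format, the record layout and the function names are assumptions for illustration, not the pipeline's implementation.

```python
def redundancy_key(pdb_id, ligand_ccd_codes):
    """Build a '<pdbid>_<ligand ccd codes>' style key (format assumed)."""
    # Normalise case and order so equivalent systems map to the same key.
    codes = "_".join(sorted(code.upper() for code in ligand_ccd_codes))
    return f"{pdb_id.lower()}_{codes}"


def drop_redundant(systems):
    """Keep the first system seen for each redundancy key."""
    seen = set()
    kept = []
    for pdb_id, codes in systems:
        key = redundancy_key(pdb_id, codes)
        if key not in seen:
            seen.add(key)
            kept.append((pdb_id, codes))
    return kept


if __name__ == "__main__":
    # Made-up records purely for illustration.
    example = [("1abc", ["ATP"]), ("1ABC", ["atp"]), ("1abc", ["ATP", "MG"])]
    print(drop_redundant(example))
```

A key of this kind only catches exact matches on entry and ligand identity, which is consistent with v1 moving to similarity-based redundancy removal by protein pocket and ligand.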
🏅 Gold standard benchmark sets
As part of the PLINDER resource we provide train, validation and test splits that are
curated to minimize information leakage based on protein-ligand interaction
similarity.
In addition, we have prioritized systems that have a linked experimental apo
structure or matched molecular series to support realistic inference scenarios for hit
discovery and optimization.
Finally, particular care is taken with the test set, which is further prioritized to contain
high-quality structures to provide unambiguous ground truths for performance
benchmarking.
Moreover, as we anticipate this resource being used for benchmarking a wide range of methods, including those that simultaneously predict protein structure (a.k.a. co-folding) or generate novel ligand structures, we further stratified the test set (by novel ligand, pocket, protein or all) to cover a wide range of tasks.
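As a rough illustration of the test stratification, the sketch below assigns a system to strata from three novelty flags. The flag names, the stratum labels and the choice to let strata overlap are all assumptions made for this example; the dataset's actual annotation columns may differ.

```python
def novelty_strata(novel_ligand, novel_pocket, novel_protein):
    """Return the test strata a system falls into (labels are hypothetical)."""
    strata = []
    if novel_ligand:
        strata.append("novel_ligand")
    if novel_pocket:
        strata.append("novel_pocket")
    if novel_protein:
        strata.append("novel_protein")
    # "all" is read here as: novel in every one of the three respects.
    if novel_ligand and novel_pocket and novel_protein:
        strata.append("novel_all")
    return strata


if __name__ == "__main__":
    print(novelty_strata(True, False, False))
    print(novelty_strata(True, True, True))
```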
👨💻 Getting Started
The PLINDER dataset is provided in two ways:
- You can use the files from the dataset directly with your preferred tooling by downloading the data from the public bucket,
- or you can use the dedicated plinder Python package to interface with the data.
Downloading the dataset
The dataset can be downloaded from the bucket with gsutil.
$ export PLINDER_RELEASE=2024-06 # Current release
$ export PLINDER_ITERATION=v2 # Current iteration
$ mkdir -p ~/.local/share/plinder/${PLINDER_RELEASE}/${PLINDER_ITERATION}/
$ gsutil -m cp -r "gs://plinder/${PLINDER_RELEASE}/${PLINDER_ITERATION}/*" ~/.local/share/plinder/${PLINDER_RELEASE}/${PLINDER_ITERATION}/
For details on the sub-directories, see Documentation.
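The directory layout used by the commands above can also be assembled in Python. This is a minimal sketch that only builds the bucket URL and the local path from the two version variables; the helper name plinder_paths and the fallback defaults are assumptions for illustration and are not part of the plinder package.

```python
import os
from pathlib import Path


def plinder_paths(release=None, iteration=None):
    """Return (bucket_url, local_dir) for a dataset release and iteration.

    Hypothetical helper: it mirrors the gsutil command above and nothing more.
    """
    # Fall back to the environment, then to the current release named above.
    release = release or os.environ.get("PLINDER_RELEASE", "2024-06")
    iteration = iteration or os.environ.get("PLINDER_ITERATION", "v2")
    bucket_url = f"gs://plinder/{release}/{iteration}/"
    local_dir = Path.home() / ".local" / "share" / "plinder" / release / iteration
    return bucket_url, local_dir


if __name__ == "__main__":
    url, path = plinder_paths()
    print(url)
    print(path)
```

Passing the release and iteration explicitly, for example plinder_paths("2024-04", "v1"), points at an older version without touching the environment.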
Installing the Python package
plinder is available on PyPI.
pip install plinder
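To confirm the installation, the installed version can be read from the package metadata with the standard library. This is a small sketch; the function name installed_version is illustrative and not something plinder provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="plinder"):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(f"plinder {found}" if found else "plinder is not installed")
```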
License
Data curated by PLINDER are made available under the Apache License 2.0. All data curated by BindingDB staff are provided under the Creative Commons Attribution 4.0 License. Data imported from ChEMBL are provided under their Creative Commons Attribution-Share Alike 4.0 Unported License.
📝 Documentation
A more detailed description is available on the documentation website.
📃 Citation
Durairaj, Janani, Yusuf Adeshina, Zhonglin Cao, Xuejin Zhang, Vladas Oleinikovas, Thomas Duignan, Zachary McClure, Xavier Robin, Gabriel Studer, Daniel Kovtun, Emanuele Rossi, Guoqing Zhou, Srimukh Prasad Veccham, Clemens Isert, Yuxing Peng, Prabindh Sundareson, Mehmet Akdel, Gabriele Corso, Hannes Stärk, Gerardo Tauriello, Zachary Wayne Carpenter, Michael M. Bronstein, Emine Kucukbenli, Torsten Schwede, Luca Naef. 2024. “PLINDER: The Protein-Ligand Interactions Dataset and Evaluation Resource.” bioRxiv ICML'24 ML4LMS
Please see the citation file for details.
Owner
- Name: plinder-org
- Login: plinder-org
- Kind: organization
- Repositories: 1
- Profile: https://github.com/plinder-org
Citation (CITATION.cff)
cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Durairaj"
given-names: "Janani"
- family-names: "Adeshina"
given-names: "Yusuf"
- family-names: "Cao"
given-names: "Zhonglin"
- family-names: "Zhang"
given-names: "Xuejin"
- family-names: "Oleinikovas"
given-names: "Vladas"
- family-names: "Duignan"
given-names: "Thomas"
- family-names: "McClure"
given-names: "Zachary"
- family-names: "Robin"
given-names: "Xavier"
- family-names: "Studer"
given-names: "Gabriel"
- family-names: "Kovtun"
given-names: "Daniel"
- family-names: "Rossi"
given-names: "Emanuele"
- family-names: "Zhou"
given-names: "Guoqing"
- family-names: "Veccham"
given-names: "Srimukh"
- family-names: "Isert"
given-names: "Clemens"
- family-names: "Peng"
given-names: "Yuxing"
- family-names: "Sundareson"
given-names: "Prabindh"
- family-names: "Akdel"
given-names: "Mehmet"
- family-names: "Corso"
given-names: "Gabriele"
- family-names: "Stärk"
given-names: "Hannes"
- family-names: "Tauriello"
given-names: "Gerardo"
- family-names: "Carpenter"
given-names: "Zachary"
- family-names: "Bronstein"
given-names: "Michael"
- family-names: "Kucukbenli"
given-names: "Emine"
- family-names: "Schwede"
given-names: "Torsten"
- family-names: "Naef"
given-names: "Luca"
title: "PLINDER: The Protein-Ligand Interactions Dataset and Evaluation Resource"
doi: 10.1101/2024.07.17.603955
version: 0.0.1
date-released: 2024-07-17
url: "https://github.com/plinder-org/plinder"
preferred-citation:
type: conference-paper
authors:
- family-names: "Durairaj"
given-names: "Janani"
- family-names: "Adeshina"
given-names: "Yusuf"
- family-names: "Cao"
given-names: "Zhonglin"
- family-names: "Zhang"
given-names: "Xuejin"
- family-names: "Oleinikovas"
given-names: "Vladas"
- family-names: "Duignan"
given-names: "Thomas"
- family-names: "McClure"
given-names: "Zachary"
- family-names: "Robin"
given-names: "Xavier"
- family-names: "Studer"
given-names: "Gabriel"
- family-names: "Kovtun"
given-names: "Daniel"
- family-names: "Rossi"
given-names: "Emanuele"
- family-names: "Zhou"
given-names: "Guoqing"
- family-names: "Veccham"
given-names: "Srimukh"
- family-names: "Isert"
given-names: "Clemens"
- family-names: "Peng"
given-names: "Yuxing"
- family-names: "Sundareson"
given-names: "Prabindh"
- family-names: "Akdel"
given-names: "Mehmet"
- family-names: "Corso"
given-names: "Gabriele"
- family-names: "Stärk"
given-names: "Hannes"
- family-names: "Tauriello"
given-names: "Gerardo"
- family-names: "Carpenter"
given-names: "Zachary"
- family-names: "Bronstein"
given-names: "Michael"
- family-names: "Kucukbenli"
given-names: "Emine"
- family-names: "Schwede"
given-names: "Torsten"
- family-names: "Naef"
given-names: "Luca"
doi: "10.1101/2024.07.17.603955"
journal: "bioRxiv"
eventtitle: "Machine Learning for Life and Material Science, ICML 2024"
month: 7
title: "PLINDER: The Protein-Ligand Interactions Dataset and Evaluation Resource"
year: 2024
GitHub Events
Total
- Issues event: 20
- Watch event: 81
- Delete event: 15
- Issue comment event: 37
- Push event: 122
- Pull request review comment event: 16
- Pull request review event: 36
- Pull request event: 40
- Fork event: 9
- Create event: 29
Last Year
- Issues event: 20
- Watch event: 81
- Delete event: 15
- Issue comment event: 37
- Push event: 122
- Pull request review comment event: 16
- Pull request review event: 36
- Pull request event: 40
- Fork event: 9
- Create event: 29
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 11
- Total pull requests: 5
- Average time to close issues: 14 days
- Average time to close pull requests: 3 days
- Total issue authors: 7
- Total pull request authors: 3
- Average comments per issue: 0.73
- Average comments per pull request: 0.6
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 11
- Pull requests: 5
- Average time to close issues: 14 days
- Average time to close pull requests: 3 days
- Issue authors: 7
- Pull request authors: 3
- Average comments per issue: 0.73
- Average comments per pull request: 0.6
- Merged pull requests: 3
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- danielzeng-gt (2)
- OleinikovasV (2)
- rachitk (2)
- AnjaConev (2)
- patrickbryant1 (2)
- Getiann (2)
- DreRnc (2)
- eunos-1128 (1)
- echen1214 (1)
- leelasd (1)
- Endyff (1)
- AliSaadatV (1)
- PatWalters (1)
- Jakub-11 (1)
- tomcastigl (1)
Pull Request Authors
- yusuf1759 (18)
- OleinikovasV (17)
- tjduigna (15)
- padix-key (12)
- Ninjani (7)
- amorehead (2)
- frgoe003 (2)
- DreRnc (2)
- maciejwisniewski-drugdiscovery (1)
- naefl (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 1
- Total downloads:
- pypi: 14,301 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 30
- Total maintainers: 1
pypi.org: plinder
PLINDER: The Protein-Ligand INteraction Dataset and Evaluation Resource
- Homepage: https://github.com/plinder-org/plinder
- Documentation: https://plinder.readthedocs.io/
- License: Apache Software License
- Latest release: 0.2.25 (published 11 months ago)
Rankings
Maintainers (1)
Dependencies
- actions/checkout v4 composite
- actions/setup-python v5 composite
- actions/cache v4 composite
- actions/checkout v4 composite
- actions/setup-python v5 composite
- py-cov-action/python-coverage-comment-action v3 composite
- tj-actions/changed-files v41 composite
- ${IMAGE_REPO -ghcr.io/plinder-org}/plinder-base
- ${IMAGE_REPO -ghcr.io/plinder-org}/plinder
- mambaorg/micromamba git-c160e88-jammy build
- ${BASE_IMAGE} ${BASE_TAG} build
- ${BASE_IMAGE} ${BASE_TAG} build
- 136 dependencies
- biotite == 0.39.0
- cloudpathlib *
- duckdb *
- eval_type_backport *
- gcsfs *
- gemmi *
- google-cloud-storage *
- hydride *
- mmcif *
- mmpdb @ git+https://github.com/rdkit/mmpdb.git
- molecular-rectifier *
- nbformat *
- networkit >= 11.0
- numpy *
- oddt *
- omegaconf *
- pandas *
- pdb-validation @ git+https://git.scicore.unibas.ch/schwede/ligand-validation.git
- pdb2pqr *
- pdbeccdutils *
- plotly *
- posebusters *
- pyarrow *
- pydantic *
- pytest *
- rdkit ==2023.9.5
- tabulate *
- tqdm *
- typing_extensions *
- actions/checkout v4 composite
- actions/deploy-pages v4 composite
- actions/upload-pages-artifact v3 composite
- mamba-org/setup-micromamba v1 composite
- networkit >=11.0
- tabulate *
- keyrings.google-artifactregistry-auth ==1.1.2