benchscofi
Package which contains implementations of published collaborative filtering-based algorithms for drug repurposing.
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 8 DOI reference(s) in README
- ✓ Academic publication links: Links to joss.theoj.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (17.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: RECeSS-EU-Project
- License: mit
- Language: Python
- Default Branch: master
- Homepage: https://recess-eu-project.github.io/stanscofi
- Size: 3.38 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 2
- Open Issues: 1
- Releases: 4
Metadata Files
README.md

BENCHmark for drug Screening with COllaborative FIltering (benchscofi) Python Package
This repository is part of the EU-funded RECeSS project (#101102016). It hosts implementations of, and/or wrappers around published implementations of, collaborative filtering-based algorithms for easy benchmarking.
Statement of need
As of 2022, current drug development pipelines last around 10 years and cost $2 billion on average, while drug commercialization failure rates go up to 90%. These issues can be mitigated by drug repurposing, where chemical compounds are screened for new therapeutic indications in a systematic fashion. In prior works, this approach has been implemented through collaborative filtering. This semi-supervised learning framework leverages known drug-disease matchings in order to recommend new ones.
There is no standard pipeline to train, validate and compare collaborative filtering-based repurposing methods, which considerably limits the impact of this research field. In benchscofi, the estimated improvement over the state-of-the-art (implemented in the package) can be measured through adequate and quantitative metrics tailored to the problem of drug repurposing across a large set of publicly available drug repurposing datasets.
Install the latest release
The fastest way to get access to all functionalities of benchscofi is to run the following command:
```bash
# Using the Docker image
docker pull recessproject/benchscofi:1.0.1
```
Documentation about benchscofi (and a manual installation) can be found at this page. The complete list of dependencies for benchscofi can be found in requirements.txt (pip).
Licence
This repository is under an OSI-approved MIT license.
Citation
If you use benchscofi in academic research, please cite it as follows:
@article{reda2024stanscofi,
title={stanscofi and benchscofi: a new standard for drug repurposing by collaborative filtering},
author={R{\'e}da, Cl{\'e}mence and Vie, Jill-J{\^e}nn and Wolkenhauer, Olaf},
journal={Journal of Open Source Software},
volume={9},
number={93},
pages={5973},
year={2024}
}
Community guidelines with respect to contributions, issue reporting, and support
You are more than welcome to add your own algorithm to the package!
1. Add a novel implementation / algorithm
Add a new Python file (extension .py) in src/benchscofi/ named <model> (where model is the name of the algorithm). It must contain a subclass of stanscofi.models.BasicModel with the same name as the Python file. At a minimum, implement the methods preprocessing, model_fit and model_predict_proba, and provide a default set of parameters (used for testing purposes). The placeholder file Constant.py, which implements a classification algorithm that labels all datapoints as positive, is a good starting point. Documenting the class and its methods properly is highly recommended. When a new algorithm is pushed to benchscofi, it is tested automatically (tests/test_models.py and TemplateTest.py are run). To run this test locally, run the following in the tests/ folder:
```bash
python3 -m test_models <model> <dataset:default=Synthetic>
```
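To give an idea of the shape such a class could take, here is a minimal, self-contained sketch. The `BasicModel` defined below is a simplified stand-in written for this example only; it is not the real `stanscofi.models.BasicModel`, whose constructor and method signatures may differ. The method name `default_parameters`, the parameter `positive_score`, and the representation of a dataset as a list of (drug, disease) pairs are likewise assumptions made for illustration.

```python
# HYPOTHETICAL stand-in for stanscofi.models.BasicModel so the sketch runs on
# its own. The real base class has a richer interface and may use different
# signatures.
class BasicModel:
    def __init__(self, params):
        self.name = type(self).__name__
        for key, value in params.items():
            setattr(self, key, value)

    def fit(self, dataset):
        # Preprocess the training data, then delegate to the subclass.
        self.model_fit(*self.preprocessing(dataset, is_training=True))

    def predict_proba(self, dataset):
        # Preprocess the data to score, then delegate to the subclass.
        return self.model_predict_proba(*self.preprocessing(dataset, is_training=False))


class Constant(BasicModel):
    """Toy classifier that scores every drug-disease pair as positive."""

    def __init__(self, params=None):
        super().__init__(self.default_parameters() if params is None else params)

    def default_parameters(self):
        # Default parameter set, e.g. for automated testing (assumed name).
        return {"positive_score": 1.0}

    def preprocessing(self, dataset, is_training=True):
        # In this sketch a dataset is simply a list of (drug, disease) pairs.
        return [list(dataset)]

    def model_fit(self, pairs):
        # A constant predictor has nothing to learn.
        pass

    def model_predict_proba(self, pairs):
        # One score per pair, always the positive score.
        return [self.positive_score for _ in pairs]


model = Constant()
model.fit([("drug_A", "disease_1")])
scores = model.predict_proba([("drug_A", "disease_2"), ("drug_B", "disease_1")])
print(scores)  # [1.0, 1.0]
```

In an actual contribution, the class would inherit from the real base class and follow the conventions of Constant.py in the repository.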
2. Rules for contributors
Pull requests and issue flagging are welcome, and can be made through the GitHub interface. Support can be provided by reaching out to recess-project[at]proton.me. However, please note that contributors and users must abide by the Code of Conduct.
Benchmark AUC and NDCG@items values (default parameters, single random training/testing set split) [updated 08/11/23]
These values (rounded to three decimal places) can be reproduced using the following command in the tests/ folder:
```bash
python3 -m test_models <algorithm> <dataset:default=Synthetic> <batch_ratio:default=1>
```
:no_entry: marks a failure to train or to predict. N/A means that the combination has not been tested yet. When present, the percentage in parentheses is the value of batch_ratio that was used (to avoid a memory crash on some of the datasets).
[mem]: memory crash
Algorithm (global AUC) | Synthetic* | TRANSCRIPT [a] | Gottlieb [b] | Cdataset [c] | PREDICT [d] | LRSSL [e] |
-------------------------- | ------------- | ----------------- | ------------- | ------------ | -------------- | --------- |
PMF | 0.922 | 0.579 | 0.598 | 0.604 | 0.656 | 0.611 |
PulearnWrapper | 1.000 | :no_entry: | N/A | :no_entry: | :no_entry: | :no_entry: |
ALSWR | 0.971 | 0.507 | 0.677 | 0.724 | 0.693 | 0.685 |
FastaiCollabWrapper | 1.000 | 0.876 | 0.856 | 0.837 | 0.835 | 0.851 |
SimplePULearning | 0.995 | 0.949 (40%) | :no_entry: err | :no_entry: err | 0.994 (4%) | :no_entry: |
SimpleBinaryClassifier | 0.876 | :no_entry: [mem] | 0.855 | 0.938 (40%) | 0.998 (1%) | :no_entry: |
NIMCGCN | 0.907 | 0.854 | 0.843 | 0.841 | 0.914 (60%) | 0.873 |
FFMWrapper | 0.924 | :no_entry: [mem] | 1.000 (40%) | 1.000 (20%) | :no_entry: [mem] | :no_entry: |
VariationalWrapper | :no_entry: err | :no_entry: err | 0.851 | 0.851 | :no_entry: err | :no_entry: |
DRRS | :no_entry: err | 0.662 | 0.838 | 0.878 | :no_entry: err | 0.892 |
SCPMF | 0.853 | 0.680 | 0.548 | 0.538 | :no_entry: err | 0.708 |
BNNR | 1.000 | 0.922 | 0.949 | 0.959 | 0.990 (1%) | 0.972 |
LRSSL | 0.127 | 0.581 (90%) | 0.159 | 0.846 | 0.764 (1%) | 0.665 |
MBiRW | 1.000 | 0.913 | 0.954 | 0.965 | :no_entry: err | 0.975 |
LibMFWrapper | 1.000 | 0.919 | 0.892 | 0.912 | 0.923 | 0.873 |
LogisticMF | 1.000 | 0.910 | 0.941 | 0.955 | 0.953 | 0.933 |
PSGCN | 0.767 | :no_entry: err | 0.802 | 0.888 | :no_entry: | 0.887 |
DDASKF | 0.779 | 0.453 | 0.544 | 0.264 (20%) | 0.591 | 0.542 |
HAN | 1.000 | 0.870 | 0.909 | 0.905 | 0.904 | 0.923 |
PUextraTrees (`n_estimators=10`) | 0.045 (50%) | 0.325 (50%) | 0.246 (20%) | :no_entry: [mem] | 0.309 (5%) |
XGBoost (`n_estimators=100`) | 0.500 | 0.500 (20%) | 0.500 | 0.500 | 0.500 (1%) | 0.500 (60%) |
The NDCG score is computed across all diseases (global), at k=#items.
Algorithm (global NDCG@k) | Synthetic@300* | TRANSCRIPT@613 [a] | Gottlieb@593 [b] | Cdataset@663 [c] | PREDICT@1577 [d] | LRSSL@763 [e] |
-------------------------- | ------------- | ----------------- | ------------- | ------------ | -------------- | --------- |
PMF | 0.070 | 0.019 | 0.015 | 0.011 | 0.005 | 0.007 |
PulearnWrapper | N/A | :no_entry: | N/A | :no_entry: | :no_entry: | :no_entry: |
ALSWR | 0.000 | 0.177 | 0.236 | 0.406 | 0.193 | 0.424 |
FastaiCollabWrapper | 1.000 | 0.035 | 0.012 | 0.003 | 0.001 | 0.000 |
SimplePULearning | 1.000 | 0.059 (40%) | :no_entry: err | :no_entry: err | 0.025 (4%) | :no_entry: err |
SimpleBinaryClassifier | 0.000 | :no_entry: [mem] | 0.002 | 0.005 (40%) | 0.070 (1%) | :no_entry: err |
NIMCGCN | 0.568 | 0.022 | 0.006 | 0.005 | 0.007 (60%) | 0.014 |
FFMWrapper | 1.000 | :no_entry: [mem] | 1.000 (40%) | 1.000 (20%) | :no_entry: [mem] | :no_entry: |
VariationalWrapper | :no_entry: err | :no_entry: err | 0.011 | 0.010 | :no_entry: err | :no_entry: |
DRRS | :no_entry: err | 0.484 | 0.301 | 0.426 | :no_entry: err | 0.182 |
SCPMF | 0.528 | 0.102 | 0.025 | 0.011 | :no_entry: err | 0.008 |
BNNR | 1.000 | 0.466 | 0.417 | 0.572 | 0.217 (1%) | 0.508 |
LRSSL | 0.206 | 0.032 (90%) | 0.009 | 0.004 | 0.103 (1%) | 0.012 |
MBiRW | 1.000 | 0.085 | 0.267 | 0.352 | :no_entry: err | 0.457 |
LibMFWrapper | 1.000 | 0.419 | 0.431 | 0.605 | 0.502 | 0.430 |
LogisticMF | 1.000 | 0.323 | 0.106 | 0.101 | 0.076 | 0.078 |
PSGCN | 0.969 | :no_entry: err | 0.074 | 0.052 | :no_entry: err | 0.110 |
DDASKF | 1.000 | 0.039 | 0.069 | 0.078 (20%) | 0.065 | 0.069 |
HAN | 1.000 | 0.075 | 0.007 | 0.000 | 0.001 | 0.002 |
PUextraTrees (`n_estimators=10`) | 0.000 (50%) | 0.198 (50%) | 0.162 (20%) | :no_entry: [mem] | 0.235 (5%) |
XGBoost (`n_estimators=100`) | 0.061 | 0.000 (20%) | 0.002 | 0.000 | 0.000 (1%) | 0.000 (60%) |
:no_entry: Note that results from `LibMFWrapper` are not reproducible, and the resulting metrics might vary slightly across runs.
:no_entry: XGBoost and SimpleBinaryClassifier do not take unlabeled points into account (they treat them as negative points).
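For readers less familiar with the two metrics reported above, the following sketch computes a global AUC and a global NDCG@k from a flat list of labels and scores, using the textbook definitions. It is an illustration only and not necessarily how benchscofi or stanscofi compute these values internally. The function names are hypothetical, and labels are assumed to be binary (1 for a known positive pair, 0 otherwise).

```python
import math


def global_auc(y_true, y_score):
    """Probability that a random positive pair is scored above a random negative one."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative pair")
    # Count wins of positives over negatives; ties count for one half.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def dcg_at_k(relevances, k):
    # Discounted cumulative gain over the first k ranks (ranks start at 1).
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(y_true, y_score, k):
    """DCG of the predicted ranking divided by the DCG of the ideal ranking."""
    ranked = [t for _, t in sorted(zip(y_score, y_true),
                                   key=lambda pair: pair[0], reverse=True)]
    ideal = dcg_at_k(sorted(y_true, reverse=True), k)
    return dcg_at_k(ranked, k) / ideal if ideal > 0 else 0.0


y_true = [1, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.1]
auc = global_auc(y_true, y_score)
ndcg = ndcg_at_k(y_true, y_score, k=len(y_true))  # k = number of items
print(f"AUC={auc:.3f} NDCG@4={ndcg:.3f}")  # AUC=0.750 NDCG@4=0.920
```

The two metrics answer different questions: AUC compares every positive pair against every negative one, whereas NDCG weights the top of the ranking more heavily, which is why an algorithm can score well on one table and poorly on the other.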
Datasets
*Synthetic dataset created with the function generate_dummy_dataset in stanscofi.datasets and the following arguments:
```python
npositive=200 #number of positive pairs
nnegative=100 #number of negative pairs
nfeatures=50 #number of pair features
mean=0.5 #mean for the distribution of positive pairs, resp. -mean for the negative pairs
std=1 #standard deviation for the distribution of positive and negative pairs
random_seed=124565 #random seed
```
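To make the role of these arguments concrete, here is a rough, standard-library sketch of the kind of sampling they describe. It is not the implementation of `generate_dummy_dataset`: the function name `make_dummy_pairs`, the use of Gaussian distributions, the +1/-1 label encoding and the returned structure are all assumptions made for illustration, and the values it produces will not match those of stanscofi.

```python
import random


def make_dummy_pairs(npositive, nnegative, nfeatures, mean, std, random_seed):
    """Hypothetical helper: sample feature vectors for positive and negative pairs."""
    rng = random.Random(random_seed)  # local generator, so the seed is reproducible
    features, labels = [], []
    # Positive pairs are centred on +mean, negative pairs on -mean (assumed Gaussian).
    for label, count, center in ((1, npositive, mean), (-1, nnegative, -mean)):
        for _ in range(count):
            features.append([rng.gauss(center, std) for _ in range(nfeatures)])
            labels.append(label)
    return features, labels


features, labels = make_dummy_pairs(
    npositive=200, nnegative=100, nfeatures=50,
    mean=0.5, std=1, random_seed=124565,
)
print(len(features), len(features[0]), labels.count(1), labels.count(-1))  # 300 50 200 100
```

With mean=0.5 and std=1, the two classes overlap substantially in any single feature, so separating them relies on combining information across the 50 features.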
[a] Réda, Clémence. (2023). TRANSCRIPT drug repurposing dataset (2.0.0) [Data set]. Zenodo. doi:10.5281/zenodo.7982976
[b] Gottlieb, A., Stein, G. Y., Ruppin, E., & Sharan, R. (2011). PREDICT: a method for inferring novel drug indications with application to personalized medicine. Molecular systems biology, 7(1), 496.
[c] Luo, H., Li, M., Wang, S., Liu, Q., Li, Y., & Wang, J. (2018). Computational drug repositioning using low-rank matrix approximation and randomized algorithms. Bioinformatics, 34(11), 1904-1912.
[d] Réda, Clémence. (2023). PREDICT drug repurposing dataset (2.0.1) [Data set]. Zenodo. doi:10.5281/zenodo.7983090
[e] Liang, X., Zhang, P., Yan, L., Fu, Y., Peng, F., Qu, L., … & Chen, Z. (2017). LRSSL: predict and interpret drug–disease associations based on data integration using sparse subspace learning. Bioinformatics, 33(8), 1187-1196.
Owner
- Name: RECeSS EU project
- Login: RECeSS-EU-Project
- Kind: user
- Location: Rostock, Germany
- Company: Universität Rostock
- Website: http://recess-eu-project.github.io/
- Twitter: recess_eu_proj
- Repositories: 1
- Profile: https://github.com/RECeSS-EU-Project
The RECeSS (Robust Explainable Controllable Standard for drug Screening) project is funded by a Marie Skłodowska-Curie Postdoctoral Fellowship 2022.
Citation (CITATION.cff)
cff-version: "1.1.0"
authors:
- family-names: Réda
  given-names: Clémence
  orcid: "https://orcid.org/0000-0003-3238-0258"
- family-names: Vie
  given-names: Jill-Jênn
  orcid: "https://orcid.org/0000-0002-9304-2220"
- family-names: Wolkenhauer
  given-names: Olaf
  orcid: "https://orcid.org/0000-0001-6105-2937"
doi: 10.5281/zenodo.10561760
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Réda
    given-names: Clémence
    orcid: "https://orcid.org/0000-0003-3238-0258"
  - family-names: Vie
    given-names: Jill-Jênn
    orcid: "https://orcid.org/0000-0002-9304-2220"
  - family-names: Wolkenhauer
    given-names: Olaf
    orcid: "https://orcid.org/0000-0001-6105-2937"
  date-published: 2024-01-25
  doi: 10.21105/joss.05973
  issn: 2475-9066
  issue: 93
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 5973
  title: "stanscofi and benchscofi: a new standard for drug repurposing
    by collaborative filtering"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.05973"
  volume: 9
title: "stanscofi and benchscofi: a new standard for drug repurposing by
  collaborative filtering"
CodeMeta (codemeta.json)
{
"@context": "https://doi.org/10.5063/schema/codemeta-2.0",
"@type": "SoftwareSourceCode",
"license": "https://spdx.org/licenses/MIT",
"codeRepository": "git+https://github.com/RECeSS-EU-Project/benchscofi.git",
"dateCreated": "2023-08-11",
"datePublished": "2023-08-11",
"dateModified": "2023-08-11",
"downloadUrl": "https://github.com/RECeSS-EU-Project/benchscofi/archive/refs/heads/master.zip",
"issueTracker": "https://github.com/RECeSS-EU-Project/benchscofi/issues",
"name": "benchscofi",
"version": "1.0.0",
"identifier": "10.5281/zenodo.8241505",
"description": "Package which contains implementations of published collaborative filtering-based algorithms for drug repurposing.",
"applicationCategory": "Drug development",
"releaseNotes": "",
"funding": "RECeSS - Robust Explainable Controllable Standard for drug Screening (101102016)",
"developmentStatus": "active",
"isPartOf": "https://recess-eu-project.github.io/",
"readme": "https://github.com/RECeSS-EU-Project/benchscofi/blob/master/README.md",
"softwareVersion": "1.0.0",
"funder": {
"@type": "Organization",
"name": "European Union's Horizon 2020 research and innovation programme"
},
"keywords": [
"Python",
"collaborative filtering",
"benchmark",
"drug repurposing",
"science reproducibility"
],
"programmingLanguage": [
"Python 3"
],
"softwareRequirements": [
"stanscofi>=1.0.1",
"numpy>=1.19.4"
],
"author": [
{
"@type": "Person",
"@id": "https://orcid.org/0000-0003-3238-0258",
"givenName": "Clémence",
"familyName": "Réda",
"email": "clemence.reda@uni-rostock.de",
"affiliation": {
"@type": "Organization",
"name": "Systems Biology and Informatics, University of Rostock, Rostock, Germany"
}
}
],
"maintainer": [
{
"@type": "Person",
"@id": "https://orcid.org/0000-0003-3238-0258",
"givenName": "Clémence",
"familyName": "Réda",
"email": "clemence.reda@uni-rostock.de",
"affiliation": {
"@type": "Organization",
"name": "Systems Biology and Informatics, University of Rostock, Rostock, Germany"
}
}
]
}
GitHub Events
Total
- Release event: 1
- Push event: 2
- Create event: 1
Last Year
- Release event: 1
- Push event: 2
- Create event: 1
Packages
- Total packages: 1
- Total downloads: pypi 55 last month
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 5
- Total maintainers: 1
pypi.org: benchscofi
- Homepage: https://github.com/RECeSS-EU-Project/benchscofi
- Documentation: https://benchscofi.readthedocs.io/
- License: MIT License
- Latest release: 2.0.1 (published 12 months ago)