benchscofi

Package which contains implementations of published collaborative filtering-based algorithms for drug repurposing.

https://github.com/recess-eu-project/benchscofi

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (17.4%) to scientific vocabulary

Keywords

benchmark collaborative-filtering drug-repurposing python science-reproducibility
Last synced: 6 months ago

Repository

Package which contains implementations of published collaborative filtering-based algorithms for drug repurposing.

Basic Info
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 2
  • Open Issues: 1
  • Releases: 4
Topics
benchmark collaborative-filtering drug-repurposing python science-reproducibility
Created over 2 years ago · Last pushed 12 months ago
Metadata Files
Readme License Citation Codemeta Zenodo

README.md


BENCHmark for drug Screening with COllaborative FIltering (benchscofi) Python Package

This repository is part of the EU-funded RECeSS project (#101102016). It hosts implementations of, or wrappers around published implementations of, collaborative filtering-based algorithms for easy benchmarking.

Statement of need

As of 2022, drug development pipelines last around 10 years and cost $2 billion on average, while drug commercialization failure rates reach up to 90%. These issues can be mitigated by drug repurposing, where chemical compounds are systematically screened for new therapeutic indications. In prior works, this approach has been implemented through collaborative filtering, a semi-supervised learning framework that leverages known drug-disease matchings in order to recommend new ones.
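To illustrate the collaborative filtering idea, here is a minimal sketch, not one of the algorithms shipped in benchscofi: a small drug-disease matrix is factorized into low-rank drug and disease factors using only the known matchings, and the product of the factors then scores every pair, including unknown ones. The encoding (1 known positive, -1 known negative, 0 unknown), the rank, the learning rate, the regularization and the number of iterations are all assumptions chosen for the example.

```python
import numpy as np


def factorize(ratings, rank=2, lr=0.05, reg=0.01, n_iter=2000, seed=0):
    """Fit ratings ~ U @ V.T by gradient descent on the known entries only.

    ratings: 2D array (rows: drugs, columns: diseases) holding 1 for a known
    positive matching, -1 for a known negative one and 0 for an unknown pair.
    Returns a dense matrix of scores, one per (drug, disease) pair.
    """
    rng = np.random.default_rng(seed)
    ratings = np.asarray(ratings, dtype=float)
    n_drugs, n_diseases = ratings.shape
    U = 0.1 * rng.standard_normal((n_drugs, rank))     # drug latent factors
    V = 0.1 * rng.standard_normal((n_diseases, rank))  # disease latent factors
    mask = ratings != 0                                # known matchings only
    for _ in range(n_iter):
        # Squared-error residuals on known pairs; unknown pairs contribute nothing
        error = mask * (U @ V.T - ratings)
        grad_U = error @ V + reg * U
        grad_V = error.T @ U + reg * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U @ V.T
```

For instance, `factorize([[1, 1, 0, -1], [1, 0, 1, -1], [0, 1, 1, -1], [-1, -1, 0, 1]])[0, 2]` is the recommendation score for the unknown pair (drug 0, disease 2), inferred from drugs with similar known matchings.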

There is no standard pipeline to train, validate and compare collaborative filtering-based repurposing methods, which considerably limits the impact of this research field. With benchscofi, the improvement of a method over the state of the art (as implemented in the package) can be measured with adequate, quantitative metrics tailored to drug repurposing, across a large set of publicly available drug repurposing datasets.

Install the latest release

The fastest way to get access to all functionalities of benchscofi is to run one of the following commands:

```bash
# Using pip
pip install benchscofi

# Using the Docker image (opens a container)
docker pull recessproject/benchscofi:1.0.1
docker run -it recessproject/benchscofi:1.0.1
```

Documentation about benchscofi (and a manual installation) can be found at this page. The complete list of dependencies for benchscofi can be found in requirements.txt (pip).

Licence

This repository is under an OSI-approved MIT license.

Citation

If you use benchscofi in academic research, please cite it as follows:

    @article{reda2024stanscofi,
      title={stanscofi and benchscofi: a new standard for drug repurposing by collaborative filtering},
      author={R{\'e}da, Cl{\'e}mence and Vie, Jill-J{\^e}nn and Wolkenhauer, Olaf},
      journal={Journal of Open Source Software},
      volume={9},
      number={93},
      pages={5973},
      year={2024}
    }

Community guidelines with respect to contributions, issue reporting, and support

You are more than welcome to add your own algorithm to the package!

1. Add a novel implementation / algorithm

Add a new Python file named <model>.py (where <model> is the name of the algorithm) in src/benchscofi/. This file should contain a subclass of stanscofi.models.BasicModel with the same name as the file, which implements at least the methods preprocessing, model_fit and model_predict_proba, as well as a default set of parameters (used for testing purposes). Please have a look at the placeholder file Constant.py, which implements a classification algorithm that labels all datapoints as positive. It is highly recommended to document your class and its methods properly. When a new algorithm is pushed to benchscofi, it is automatically tested by tests/test_models.py and TemplateTest.py. To run this test locally, run the following command in the tests/ folder:

    python3 -m test_models <model> <dataset:default=Synthetic>
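The placeholder Constant.py remains the reference for the exact interface. As a rough, self-contained sketch of the structure described above, the block below uses a hypothetical stand-in base class, BasicModelStub, instead of importing stanscofi. Its fit/predict_proba wiring, the method signatures and the "score" parameter are assumptions made for illustration and may differ from stanscofi.models.BasicModel; the input is also simplified to a NumPy array of known associations rather than a stanscofi dataset object.

```python
import numpy as np


class BasicModelStub:
    """Hypothetical stand-in for stanscofi.models.BasicModel (assumed interface).

    It only chains the hooks a contributor has to write: preprocessing,
    model_fit and model_predict_proba.
    """

    def __init__(self, params=None):
        params = params if params is not None else self.default_parameters()
        # Expose each parameter as an attribute, e.g. self.score
        for key, value in params.items():
            setattr(self, key, value)

    def fit(self, dataset):
        self.model_fit(*self.preprocessing(dataset, is_training=True))
        return self

    def predict_proba(self, dataset):
        return self.model_predict_proba(*self.preprocessing(dataset, is_training=False))


class Constant(BasicModelStub):
    """Scores every (drug, disease) pair as positive, in the spirit of Constant.py."""

    def default_parameters(self):
        # Default parameters are what the automated tests would rely on
        return {"score": 1.0}

    def preprocessing(self, dataset, is_training=True):
        # Here 'dataset' is simplified to a 2D array of known associations
        return [np.asarray(dataset)]

    def model_fit(self, ratings):
        # A constant model has nothing to learn
        pass

    def model_predict_proba(self, ratings):
        # Same score for every pair, with the shape of the input matrix
        return np.full(ratings.shape, self.score, dtype=float)
```

A real contribution would subclass stanscofi.models.BasicModel directly and be checked with the test_models command above.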

2. Rules for contributors

Pull requests and issue reports are welcome, and can be made through the GitHub interface. Support can be obtained by reaching out to recess-project[at]proton.me. Please note that contributors and users must abide by the Code of Conduct.

Benchmark AUC and NDCG@items values (default parameters, single random training/testing set split) [updated 08/11/23]

These values (rounded to three decimal places) can be reproduced by running the following command in the tests/ folder:

    python3 -m test_models <algorithm> <dataset:default=Synthetic> <batch_ratio:default=1>

:no_entry: entries represent a failure to train or to predict, and N/A entries have not been tested yet. When present, the percentage in parentheses is the value of batch_ratio that was used (to avoid memory crashes on some of the datasets). [mem]: memory crash.

| Algorithm (global AUC) | Synthetic* | TRANSCRIPT [a] | Gottlieb [b] | Cdataset [c] | PREDICT [d] | LRSSL [e] |
| -------------------------- | ------------- | ----------------- | ------------- | ------------ | -------------- | --------- |
| PMF | 0.922 | 0.579 | 0.598 | 0.604 | 0.656 | 0.611 |
| PulearnWrapper | 1.000 | :no_entry: | N/A | :no_entry: | :no_entry: | :no_entry: |
| ALSWR | 0.971 | 0.507 | 0.677 | 0.724 | 0.693 | 0.685 |
| FastaiCollabWrapper | 1.000 | 0.876 | 0.856 | 0.837 | 0.835 | 0.851 |
| SimplePULearning | 0.995 | 0.949 (40%) | :no_entry:err | :no_entry:err | 0.994 (4%) | :no_entry: |
| SimpleBinaryClassifier | 0.876 | :no_entry:[mem] | 0.855 | 0.938 (40%) | 0.998 (1%) | :no_entry: |
| NIMCGCN | 0.907 | 0.854 | 0.843 | 0.841 | 0.914 (60%) | 0.873 |
| FFMWrapper | 0.924 | :no_entry:[mem] | 1.000 (40%) | 1.000 (20%) | :no_entry:[mem] | :no_entry: |
| VariationalWrapper | :no_entry:err | :no_entry:err | 0.851 | 0.851 | :no_entry:err | :no_entry: |
| DRRS | :no_entry:err | 0.662 | 0.838 | 0.878 | :no_entry:err | 0.892 |
| SCPMF | 0.853 | 0.680 | 0.548 | 0.538 | :no_entry:err | 0.708 |
| BNNR | 1.000 | 0.922 | 0.949 | 0.959 | 0.990 (1%) | 0.972 |
| LRSSL | 0.127 | 0.581 (90%) | 0.159 | 0.846 | 0.764 (1%) | 0.665 |
| MBiRW | 1.000 | 0.913 | 0.954 | 0.965 | :no_entry:err | 0.975 |
| LibMFWrapper | 1.000 | 0.919 | 0.892 | 0.912 | 0.923 | 0.873 |
| LogisticMF | 1.000 | 0.910 | 0.941 | 0.955 | 0.953 | 0.933 |
| PSGCN | 0.767 | :no_entry:err | 0.802 | 0.888 | :no_entry: | 0.887 |
| DDASKF | 0.779 | 0.453 | 0.544 | 0.264 (20%) | 0.591 | 0.542 |
| HAN | 1.000 | 0.870 | 0.909 | 0.905 | 0.904 | 0.923 |
| PUextraTrees (`n_estimators=10`) | 0.045 (50%) | 0.325 (50%) | 0.246 (20%) | :no_entry:[mem] | 0.309 (5%) | |
| XGBoost (`n_estimators=100`) | 0.500 | 0.500 (20%) | 0.500 | 0.500 | 0.500 (1%) | 0.500 (60%) |
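For reference, the AUC measures how well predicted scores rank known positive pairs above known negative pairs. The following is a minimal sketch of the standard rank-based (Mann-Whitney) formulation, assuming binary labels (1 positive, 0 negative) over all (drug, disease) pairs pooled together ("global"); it is not necessarily how stanscofi computes the metric internally. Ties receive half credit.

```python
import numpy as np


def global_auc(scores, labels):
    """Probability that a random positive pair is scored above a random negative pair.

    scores, labels: 1D arrays over all (drug, disease) pairs; labels are 1 or 0.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Compare every positive score with every negative score; ties count 1/2
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

Under this definition, a predictor that assigns the same score to every pair obtains exactly 0.5, and a perfect ranking obtains 1.0.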

The NDCG score is computed globally (across all diseases) at k = #items; for each dataset, the value of k is given after @ in the column headers below.
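A minimal sketch of a global NDCG@k follows. It assumes binary relevance (1 for a known positive pair, 0 otherwise), a single ranking of all (drug, disease) pairs pooled together, and the usual log2 position discount; the exact variant used by stanscofi (gain function, tie handling) may differ.

```python
import numpy as np


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k relevances (already ranked)."""
    rel = np.asarray(relevances, dtype=float)[:k]
    # Position i (0-based) is discounted by log2(i + 2)
    discounts = np.log2(np.arange(2, rel.size + 2))
    return float(np.sum(rel / discounts))


def global_ndcg_at_k(scores, labels, k):
    """NDCG@k over all pairs pooled together, with binary relevance."""
    scores = np.asarray(scores, dtype=float)
    labels = (np.asarray(labels) == 1).astype(float)
    ranked = labels[np.argsort(-scores, kind="stable")]  # order predicted by the model
    ideal = np.sort(labels)[::-1]                        # best possible order
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Because the score is normalized by the ideal ranking, a value of 1.000 means that the k top-ranked pairs are ordered exactly as well as possible.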

| Algorithm (global NDCG@k) | Synthetic@300* | TRANSCRIPT@613 [a] | Gottlieb@593 [b] | Cdataset@663 [c] | PREDICT@1577 [d] | LRSSL@763 [e] |
| -------------------------- | ------------- | ----------------- | ------------- | ------------ | -------------- | --------- |
| PMF | 0.070 | 0.019 | 0.015 | 0.011 | 0.005 | 0.007 |
| PulearnWrapper | N/A | :no_entry: | N/A | :no_entry: | :no_entry: | :no_entry: |
| ALSWR | 0.000 | 0.177 | 0.236 | 0.406 | 0.193 | 0.424 |
| FastaiCollabWrapper | 1.000 | 0.035 | 0.012 | 0.003 | 0.001 | 0.000 |
| SimplePULearning | 1.000 | 0.059 (40%) | :no_entry:err | :no_entry:err | 0.025 (4%) | :no_entry:err |
| SimpleBinaryClassifier | 0.000 | :no_entry:[mem] | 0.002 | 0.005 (40%) | 0.070 (1%) | :no_entry:err |
| NIMCGCN | 0.568 | 0.022 | 0.006 | 0.005 | 0.007 (60%) | 0.014 |
| FFMWrapper | 1.000 | :no_entry:[mem] | 1.000 (40%) | 1.000 (20%) | :no_entry:[mem] | :no_entry: |
| VariationalWrapper | :no_entry:err | :no_entry:err | 0.011 | 0.010 | :no_entry:err | :no_entry: |
| DRRS | :no_entry:err | 0.484 | 0.301 | 0.426 | :no_entry:err | 0.182 |
| SCPMF | 0.528 | 0.102 | 0.025 | 0.011 | :no_entry:err | 0.008 |
| BNNR | 1.000 | 0.466 | 0.417 | 0.572 | 0.217 (1%) | 0.508 |
| LRSSL | 0.206 | 0.032 (90%) | 0.009 | 0.004 | 0.103 (1%) | 0.012 |
| MBiRW | 1.000 | 0.085 | 0.267 | 0.352 | :no_entry:err | 0.457 |
| LibMFWrapper | 1.000 | 0.419 | 0.431 | 0.605 | 0.502 | 0.430 |
| LogisticMF | 1.000 | 0.323 | 0.106 | 0.101 | 0.076 | 0.078 |
| PSGCN | 0.969 | :no_entry:err | 0.074 | 0.052 | :no_entry:err | 0.110 |
| DDASKF | 1.000 | 0.039 | 0.069 | 0.078 (20%) | 0.065 | 0.069 |
| HAN | 1.000 | 0.075 | 0.007 | 0.000 | 0.001 | 0.002 |
| PUextraTrees (`n_estimators=10`) | 0.000 (50%) | 0.198 (50%) | 0.162 (20%) | :no_entry:[mem] | 0.235 (5%) | |
| XGBoost (`n_estimators=100`) | 0.061 | 0.000 (20%) | 0.002 | 0.000 | 0.000 (1%) | 0.000 (60%) |

:no_entry: Results from `LibMFWrapper` are not reproducible: the resulting metrics might vary slightly across runs.

:no_entry: XGBoost and SimpleBinaryClassifier do not take unlabeled points into account (they treat them as negative points).
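To make this distinction concrete, the sketch below contrasts two ways of turning a matrix of associations into training labels. It assumes the encoding 1 for a known positive, -1 for a known negative and 0 for an unlabeled pair (an assumption for this example; stanscofi's internal encoding may differ). The first function mirrors the behavior described for XGBoost and SimpleBinaryClassifier, folding unlabeled pairs into the negative class; the second keeps only labeled pairs, as a model fitted on known entries alone would.

```python
import numpy as np


def labels_unlabeled_as_negative(ratings):
    """Binary view: every pair that is not a known positive becomes a negative (0)."""
    ratings = np.asarray(ratings)
    return (ratings == 1).astype(int)


def labels_known_pairs_only(ratings):
    """Keep labeled pairs only: returns the mask of known pairs and their 0/1 labels."""
    ratings = np.asarray(ratings)
    mask = ratings != 0
    return mask, (ratings[mask] == 1).astype(int)
```

On a sparse drug-disease matrix, the first view produces far more negatives than were actually observed, which is why unlabeled pairs end up treated as negative points.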

Datasets

*Synthetic dataset created with the function generate_dummy_dataset in stanscofi.datasets and the following arguments:

    npositive=200       # number of positive pairs
    nnegative=100       # number of negative pairs
    nfeatures=50        # number of pair features
    mean=0.5            # mean of the distribution of positive pairs (resp. -mean for the negative pairs)
    std=1               # standard deviation of the distribution of positive and negative pairs
    random_seed=124565  # random seed
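The kind of data these arguments describe can be approximated as follows. This is a hypothetical re-implementation based only on the argument descriptions (positive pair features drawn around mean, negative ones around -mean, with standard deviation std); it is not stanscofi's generate_dummy_dataset, whose output format and random number generator may differ, so it will not reproduce the benchmark's Synthetic dataset exactly.

```python
import numpy as np


def make_dummy_pairs(npositive=200, nnegative=100, nfeatures=50,
                     mean=0.5, std=1.0, random_seed=124565):
    """Draw feature vectors for positive and negative pairs (hypothetical sketch).

    Positive pairs ~ N(mean, std^2) and negative pairs ~ N(-mean, std^2), feature-wise.
    Returns (X, y) with X of shape (npositive + nnegative, nfeatures) and y in {1, -1}.
    """
    rng = np.random.default_rng(random_seed)
    X_pos = rng.normal(loc=mean, scale=std, size=(npositive, nfeatures))
    X_neg = rng.normal(loc=-mean, scale=std, size=(nnegative, nfeatures))
    X = np.vstack([X_pos, X_neg])
    y = np.concatenate([np.ones(npositive, dtype=int), -np.ones(nnegative, dtype=int)])
    return X, y
```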

[a] Réda, Clémence. (2023). TRANSCRIPT drug repurposing dataset (2.0.0) [Data set]. Zenodo. doi:10.5281/zenodo.7982976

[b] Gottlieb, A., Stein, G. Y., Ruppin, E., & Sharan, R. (2011). PREDICT: a method for inferring novel drug indications with application to personalized medicine. Molecular systems biology, 7(1), 496.

[c] Luo, H., Li, M., Wang, S., Liu, Q., Li, Y., & Wang, J. (2018). Computational drug repositioning using low-rank matrix approximation and randomized algorithms. Bioinformatics, 34(11), 1904-1912.

[d] Réda, Clémence. (2023). PREDICT drug repurposing dataset (2.0.1) [Data set]. Zenodo. doi:10.5281/zenodo.7983090

[e] Liang, X., Zhang, P., Yan, L., Fu, Y., Peng, F., Qu, L., … & Chen, Z. (2017). LRSSL: predict and interpret drug–disease associations based on data integration using sparse subspace learning. Bioinformatics, 33(8), 1187-1196.

Owner

  • Name: RECeSS EU project
  • Login: RECeSS-EU-Project
  • Kind: user
  • Location: Rostock, Germany
  • Company: Universität Rostock

The RECeSS (Robust Explainable Controllable Standard for drug Screening) project is funded by a Marie Skłodowska-Curie Postdoctoral Fellowship 2022.

Citation (CITATION.cff)

cff-version: "1.1.0"
authors:
- family-names: Réda
  given-names: Clémence
  orcid: "https://orcid.org/0000-0003-3238-0258"
- family-names: Vie
  given-names: Jill-Jênn
  orcid: "https://orcid.org/0000-0002-9304-2220"
- family-names: Wolkenhauer
  given-names: Olaf
  orcid: "https://orcid.org/0000-0001-6105-2937"
doi: 10.5281/zenodo.10561760
message: If you use this software, please cite our article in the
  Journal of Open Source Software.
preferred-citation:
  authors:
  - family-names: Réda
    given-names: Clémence
    orcid: "https://orcid.org/0000-0003-3238-0258"
  - family-names: Vie
    given-names: Jill-Jênn
    orcid: "https://orcid.org/0000-0002-9304-2220"
  - family-names: Wolkenhauer
    given-names: Olaf
    orcid: "https://orcid.org/0000-0001-6105-2937"
  date-published: 2024-01-25
  doi: 10.21105/joss.05973
  issn: 2475-9066
  issue: 93
  journal: Journal of Open Source Software
  publisher:
    name: Open Journals
  start: 5973
  title: "stanscofi and benchscofi: a new standard for drug repurposing
    by collaborative filtering"
  type: article
  url: "https://joss.theoj.org/papers/10.21105/joss.05973"
  volume: 9
title: "stanscofi and benchscofi: a new standard for drug repurposing by
  collaborative filtering"

CodeMeta (codemeta.json)

{
  "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
  "@type": "SoftwareSourceCode",
  "license": "https://spdx.org/licenses/MIT",
  "codeRepository": "git+https://github.com/RECeSS-EU-Project/benchscofi.git",
  "dateCreated": "2023-08-11",
  "datePublished": "2023-08-11",
  "dateModified": "2023-08-11",
  "downloadUrl": "https://github.com/RECeSS-EU-Project/benchscofi/archive/refs/heads/master.zip",
  "issueTracker": "https://github.com/RECeSS-EU-Project/benchscofi/issues",
  "name": "benchscofi",
  "version": "1.0.0",
  "identifier": "10.5281/zenodo.8241505",
  "description": "Package which contains implementations of published collaborative filtering-based algorithms for drug repurposing.",
  "applicationCategory": "Drug development",
  "releaseNotes": "",
  "funding": "RECeSS - Robust Explainable Controllable Standard for drug Screening (101102016)",
  "developmentStatus": "active",
  "isPartOf": "https://recess-eu-project.github.io/",
  "readme": "https://github.com/RECeSS-EU-Project/benchscofi/blob/master/README.md",
  "softwareVersion": "1.0.0",
  "funder": {
    "@type": "Organization",
    "name": "European Union's Horizon 2020 research and innovation programme"
  },
  "keywords": [
    "Python",
    "collaborative filtering",
    "benchmark",
    "drug repurposing",
    "science reproducibility"
  ],
  "programmingLanguage": [
    "Python 3"
  ],
  "softwareRequirements": [
    "stanscofi>=1.0.1",
    "numpy>=1.19.4"
  ],
  "author": [
    {
      "@type": "Person",
      "@id": "https://orcid.org/0000-0003-3238-0258",
      "givenName": "Clémence",
      "familyName": "Réda",
      "email": "clemence.reda@uni-rostock.de",
      "affiliation": {
        "@type": "Organization",
        "name": "Systems Biology and Informatics, University of Rostock, Rostock, Germany"
      }
    }
  ],
  "maintainer": [
    {
      "@type": "Person",
      "@id": "https://orcid.org/0000-0003-3238-0258",
      "givenName": "Clémence",
      "familyName": "Réda",
      "email": "clemence.reda@uni-rostock.de",
      "affiliation": {
        "@type": "Organization",
        "name": "Systems Biology and Informatics, University of Rostock, Rostock, Germany"
      }
    }
  ]
}

GitHub Events

Total
  • Release event: 1
  • Push event: 2
  • Create event: 1
Last Year
  • Release event: 1
  • Push event: 2
  • Create event: 1

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 55 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 5
  • Total maintainers: 1
pypi.org: benchscofi

Package which contains implementations of published collaborative filtering-based algorithms for drug repurposing.

  • Versions: 5
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 55 Last month
Rankings
Dependent packages count: 7.5%
Downloads: 18.1%
Forks count: 30.2%
Average: 32.9%
Stargazers count: 39.1%
Dependent repos count: 69.8%
Maintainers (1)
Last synced: 6 months ago