drug-repurposing-datasets - Jupyter notebooks v2.0.0

Drug Repurposing Datasets for Collaborative Filtering Methods. Notebooks and code to generate datasets for collaborative filtering-based drug repurposing.

- Jupyter Notebook
Published by RECeSS-EU-Project about 3 years ago

drug-repurposing-datasets - Jupyter Notebooks v1.0.0 v2.0.0

Drug Repurposing Datasets for Collaborative Filtering Methods. Notebooks and code to generate datasets for collaborative filtering-based drug repurposing.

- Jupyter Notebook
Published by RECeSS-EU-Project about 3 years ago

drug-repurposing-datasets - PREDICT v2.0.0

Version 2.0.0 (05/29/2023)

This is a drug repurposing dataset under the MIT licence, compiled by Dr. Clémence Réda (clemence.reda@uni-rostock.de) at Universität Rostock, comprising a drug-disease association matrix and several drug-drug and disease-disease similarity matrices. It uses the same type of data as the dataset compiled by Gottlieb et al., 2011 [PMID: 21654673]. The sparsity number is the percentage of nonzero values in the association matrix.

| # drugs | # diseases | Sparsity number | # positive associations | # negative associations |
| ------- | ---------- | --------------- | ----------------------- | ----------------------- |
| 1,351 | 1,066 | 0.34% | 5,624 | 152 |

Note that not all drugs (resp., diseases) might have an associated feature vector (that is, all drugs and diseases in the similarity matrix appear in the association matrix, but not necessarily the other way around). Note that there are missing values (NaN) in this dataset (31% of values in the full drug feature matrix, 60% of values in the full disease feature matrix).
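The two caveats above (missing values, and drugs without a feature vector) can be checked with a few lines of pandas. The following is a minimal sketch on a toy matrix: the identifiers and values are made up for illustration and are not taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for a drug feature matrix (hypothetical values and names).
features = pd.DataFrame(
    [[0.2, np.nan, 0.5],
     [np.nan, np.nan, 0.1]],
    index=["feat1", "feat2"],
    columns=["drugA", "drugB", "drugC"],
)

# Hypothetical list of drugs present in the association matrix.
rated_drugs = ["drugA", "drugB", "drugC", "drugD"]


def missing_fraction(df: pd.DataFrame) -> float:
    """Return the fraction of entries in df that are NaN."""
    return float(df.isna().to_numpy().mean())


# Share of missing values in the toy feature matrix.
nan_share = missing_fraction(features)

# Drugs that appear in the association matrix but have no feature vector.
without_features = sorted(set(rated_drugs) - set(features.columns))
```

Running the same two checks on the real matrices is a reasonable first step before choosing an imputation strategy or restricting the analysis to drugs and diseases that have features.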

This dataset consists of five .CSV files:

Drug-Disease Association Matrix

"ratings_mat.csv"

This matrix contains values in {-1,0,1}, where -1 stands for a negative association (i.e., the drug failed for some reason to treat the considered disease: e.g., lack of accrual in the associated clinical trial, or proven toxicity), 1 for a positive association (i.e., the drug was shown to treat the disease), and 0 for an unknown association status. The columns are diseases, identified by their MedGen Concept IDs, whereas rows are drugs, identified by their DrugBank IDs or PubChem CIDs.
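As an illustration of how the summary statistics relate to this matrix, here is a minimal sketch that parses a small CSV with drugs in rows and diseases in columns, then counts positive and negative associations. The toy content and identifiers are invented, and the presence of a header row and an index column in the real file is an assumption.

```python
import io

import pandas as pd

# Toy CSV in the layout described above: rows are drugs, columns are diseases.
# All identifiers are made up for illustration.
toy_csv = io.StringIO(
    ",C0000001,C0000002,C0000003\n"
    "DB00001,1,0,-1\n"
    "DB00002,0,0,1\n"
)
ratings = pd.read_csv(toy_csv, index_col=0)


def summarize(ratings: pd.DataFrame) -> dict:
    """Count drugs, diseases, and known associations in a {-1,0,1} matrix."""
    values = ratings.to_numpy()
    n_pos = int((values == 1).sum())
    n_neg = int((values == -1).sum())
    return {
        "n_drugs": ratings.shape[0],
        "n_diseases": ratings.shape[1],
        "n_positive": n_pos,
        "n_negative": n_neg,
        # Percentage of nonzero entries, following the definition given above.
        "sparsity_percent": 100.0 * (n_pos + n_neg) / values.size,
    }


summary = summarize(ratings)
```

To apply it to the released file, the `io.StringIO` object would be replaced by the path to "ratings_mat.csv".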

Drug-Drug Similarity Matrix

All drug-drug similarity matrices have drugs in their columns and rows, identified by their DrugBank IDs or PubChem CIDs.

"sePREDICTmatrix.csv"

Jaccard score similarity between one-hot encodings of the side effects reported for drugs.
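For readers unfamiliar with the Jaccard score, the sketch below computes it between binary side-effect vectors: the size of the intersection divided by the size of the union. The toy encodings are hypothetical, and returning 0 when both vectors are empty is a convention chosen here, not necessarily the one used to build the released matrix.

```python
import numpy as np


def jaccard(a, b) -> float:
    """Jaccard score between two binary vectors: |a AND b| / |a OR b|."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Assumption: two drugs without any reported side effect score 0.
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


# Toy one-hot encodings: rows are drugs, columns are side effects (made up).
side_effects = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
])

n = side_effects.shape[0]
# Pairwise drug-drug similarity matrix.
sim = np.array([
    [jaccard(side_effects[i], side_effects[j]) for j in range(n)]
    for i in range(n)
])
```

The resulting matrix is symmetric with ones on the diagonal, which matches the drug-by-drug layout described for the similarity files.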

"signaturePREDICTmatrix.csv"

Jaccard score similarity between drug signatures (from the CREEDS or the LINCS L1000 databases), that is, vectors reporting the genewise change in activity due to treatment.

Other types of similarity matrices and further information about the generation of those matrices are available by running the Jupyter notebook PREDICT_dataset.ipynb on the following GitHub repository: https://github.com/RECeSS-EU-Project/drug-repurposing-datasets.

Disease-Disease Similarity Matrix

All disease-disease similarity matrices have diseases in their columns and rows, identified by their MedGen Concept IDs.

"diseasesemanticPREDICT_matrix.csv"

Resnik semantic similarity between ontology nodes associated with diseases (from the HPO database).
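Resnik similarity between two ontology terms is the information content of their most informative common ancestor. The sketch below illustrates that definition on a tiny hypothetical ontology with made-up term probabilities; it works at the term level only, and how diseases are mapped to HPO terms or how term scores are aggregated per disease in the actual dataset is not covered here.

```python
import math

# Hypothetical ontology: each term maps to its direct parents.
parents = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "t1": ["A"],
    "t2": ["A"],
    "t3": ["B"],
}

# Hypothetical annotation probabilities p(t); the root always has p = 1.
prob = {"root": 1.0, "A": 0.5, "B": 0.5, "t1": 0.25, "t2": 0.25, "t3": 0.5}


def ancestors(term: str) -> set:
    """Return the term together with all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def resnik(t1: str, t2: str) -> float:
    """Information content -log p(t) of the most informative common ancestor."""
    common = ancestors(t1) & ancestors(t2)
    return max(-math.log(prob[t]) for t in common)


sibling_score = resnik("t1", "t2")   # common ancestors: A and root
distant_score = resnik("t1", "t3")   # only common ancestor: root
```

Terms that only share the root get a score of 0, since the root carries no information.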

"diseasephenotypePREDICT_matrix.csv"

Jaccard score similarity between disease phenotypes (from the CREEDS database), that is, vectors reporting the genewise change in activity due to the disease.

Other types of similarity matrices and further information about the generation of those matrices are available by running the Jupyter notebook PREDICT_dataset.ipynb on the following GitHub repository: https://github.com/RECeSS-EU-Project/drug-repurposing-datasets.

For any questions, please contact the author at clemence.reda@uni-rostock.de or the RECeSS project contributors at recess-project@proton.me.

- Jupyter Notebook
Published by RECeSS-EU-Project about 3 years ago

drug-repurposing-datasets - TRANSCRIPT v2.0.0

Version 2.0.0 (05/29/2023)

This is a drug repurposing dataset under the MIT licence, compiled by Dr. Clémence Réda (clemence.reda@uni-rostock.de) at Universität Rostock, comprising a drug-disease association matrix, a drug feature matrix, and a disease feature matrix. It only uses transcriptomic data (i.e., gene activity/expression). The sparsity number is the percentage of nonzero values in the association matrix.

| # drugs | # diseases | Sparsity number | # positive associations | # negative associations | # genes |
| ------- | ---------- | --------------- | ----------------------- | ----------------------- | ------- |
| 204 | 116 | 0.44% | 401 | 11 | 12,096 |

All drugs (resp., diseases) are associated with a gene expression feature vector of length 12,096 (that is, all drugs and diseases in the feature matrices appear in the association matrix, and vice versa).

This dataset consists of three .CSV files:

Drug-Disease Association Matrix

"ratings_mat.csv"

This matrix contains values in {-1,0,1}, where -1 stands for a negative association (i.e., the drug failed for some reason to treat the considered disease: e.g., lack of accrual in the associated clinical trial, or proven toxicity), 1 for a positive association (i.e., the drug was shown to treat the disease), and 0 for an unknown association status. The columns are diseases, identified by their MedGen Concept IDs, whereas rows are drugs, identified by their DrugBank IDs or PubChem CIDs.

Drug Feature Matrix

"items.csv"

This matrix has drugs in its columns, identified by their DrugBank IDs or PubChem CIDs, and genes in its rows, identified by their HUGO Gene Symbols. Its entries report the genewise transcriptomic variation induced by drug treatment, taken from the CREEDS or the LINCS L1000 databases.

Disease Feature Matrix

"users.csv"

This matrix has diseases in its columns, identified by their MedGen Concept IDs, and genes in its rows, identified by their HUGO Gene Symbols. Its entries report the genewise transcriptomic variation induced by the disease, taken from the CREEDS database.
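Because both feature matrices store genes in rows, most collaborative filtering code will want them transposed and ordered like the association matrix. The following sketch shows one way to do that on toy frames standing in for "items.csv", "users.csv" and "ratings_mat.csv"; all identifiers and values are hypothetical.

```python
import pandas as pd

genes = ["GENE1", "GENE2", "GENE3"]

# Toy drug feature matrix: genes in rows, drugs in columns.
items = pd.DataFrame(
    [[0.1, -0.2], [0.0, 0.3], [0.5, 0.1]],
    index=genes, columns=["DB00002", "DB00001"],
)
# Toy disease feature matrix: genes in rows, diseases in columns.
users = pd.DataFrame(
    [[1.0, 0.2, -0.1], [0.0, 0.0, 0.4], [-0.3, 0.1, 0.2]],
    index=genes, columns=["C0000001", "C0000002", "C0000003"],
)
# Toy association matrix: drugs in rows, diseases in columns.
ratings = pd.DataFrame(
    [[1, 0, -1], [0, 0, 1]],
    index=["DB00001", "DB00002"],
    columns=["C0000001", "C0000002", "C0000003"],
)

# Check that every drug and disease has features, and vice versa.
aligned = (
    set(items.columns) == set(ratings.index)
    and set(users.columns) == set(ratings.columns)
)

# One feature vector per row, ordered like the association matrix.
drug_vectors = items.T.loc[ratings.index]
disease_vectors = users.T.loc[ratings.columns]
```

Reordering with `.loc` guards against the files listing drugs or diseases in different orders, which is not guaranteed by the description alone.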

Further information about the generation of those matrices is available by running the Jupyter notebook TRANSCRIPT_dataset.ipynb on the following GitHub repository: https://github.com/RECeSS-EU-Project/drug-repurposing-datasets. For any questions, please contact the author at clemence.reda@uni-rostock.de or the RECeSS project contributors at recess-project@proton.me.

- Jupyter Notebook
Published by RECeSS-EU-Project about 3 years ago

drug-repurposing-datasets - PREDICT v1.0.0

Version 1.0.0 (12/28/2022)

This is a drug repurposing dataset under the MIT licence, compiled by Dr. Clémence Réda (clemence.reda@uni-rostock.de), comprising a drug-disease association matrix and several drug-drug and disease-disease similarity matrices. It uses the same type of data as the dataset compiled by Gottlieb et al., 2011 [PMID: 21654673]. The sparsity number is the percentage of nonzero values in the association matrix.

| # drugs | # diseases | Sparsity number | # positive associations | # negative associations |
| ------- | ---------- | --------------- | ----------------------- | ----------------------- |
| 1,395 | 1,501 | 0.38% | 8,240 | 295 |

Note that not all drugs (resp., diseases) might have an associated feature vector (that is, all drugs and diseases in the similarity matrix appear in the association matrix, but not necessarily the other way around). Note that there are missing values (NaN) in this dataset.

This dataset consists of five .CSV files:

Drug-Disease Association Matrix

"ratings_mat.csv"

This matrix contains values in {-1,0,1}, where -1 stands for a negative association (i.e., the drug failed for some reason to treat the considered disease: e.g., lack of accrual in the associated clinical trial, or proven toxicity), 1 for a positive association (i.e., the drug was shown to treat the disease), and 0 for an unknown association status. The columns are diseases, identified by their MedGen Concept IDs, whereas rows are drugs, identified by their DrugBank IDs or PubChem CIDs.

Drug-Drug Similarity Matrix

All drug-drug similarity matrices have drugs in their columns and rows, identified by their DrugBank IDs or PubChem CIDs.

"sePREDICTmatrix.csv"

Jaccard score similarity between one-hot encodings of the side effects reported for drugs.

"signaturePREDICTmatrix.csv"

Jaccard score similarity between drug signatures (from the CREEDS or the LINCS L1000 databases), that is, vectors reporting the genewise change in activity due to treatment.

Other types of similarity matrices and further information about the generation of those matrices are available by running the Jupyter notebook PREDICTdatasetv1.0.0.ipynb on the following GitHub repository: https://github.com/RECeSS-EU-Project/drug-repurposing-datasets.

Disease-Disease Similarity Matrix

All disease-disease similarity matrices have diseases in their columns and rows, identified by their MedGen Concept IDs.

"diseasesemanticPREDICT_matrix.csv"

Resnik semantic similarity between ontology nodes associated with diseases (from the HPO database).

"diseasephenotypePREDICT_matrix.csv"

Jaccard score similarity between disease phenotypes (from the CREEDS database), that is, vectors reporting the genewise change in activity due to the disease.

Other types of similarity matrices and further information about the generation of those matrices are available by running the Jupyter notebook PREDICTdatasetv1.0.0.ipynb on the following GitHub repository: https://github.com/RECeSS-EU-Project/drug-repurposing-datasets.

For any questions, please contact the author at clemence.reda@uni-rostock.de or the RECeSS project contributors at recess-project@proton.me.

- Jupyter Notebook
Published by RECeSS-EU-Project about 3 years ago

drug-repurposing-datasets - TRANSCRIPT v1.0.0

Version 1.0.0 (12/28/2022)

This is a drug repurposing dataset under the MIT licence, compiled by Dr. Clémence Réda (clemence.reda@uni-rostock.de), comprising a drug-disease association matrix, a drug feature matrix, and a disease feature matrix. It only uses transcriptomic data (i.e., gene activity/expression). The sparsity number is the percentage of nonzero values in the association matrix.

| # drugs | # diseases | Sparsity number | # positive associations | # negative associations | # genes |
| ------- | ---------- | --------------- | ----------------------- | ----------------------- | ------- |
| 871 | 144 | 0.76% | 773 | 181 | 10,811 |

All drugs (resp., diseases) are associated with a gene expression feature vector of length 10,811 (that is, all drugs and diseases in the feature matrices appear in the association matrix, and vice versa). However, some diseases/drugs are not necessarily involved in negative or positive associations (meaning that all pairs with those items have an association value of 0).
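Drugs or diseases whose associations are all 0 carry no label information, so it can be useful to list them before training. A minimal sketch on a toy association matrix (names and values are made up):

```python
import pandas as pd

# Toy association matrix: drugs in rows, diseases in columns.
ratings = pd.DataFrame(
    [[1, 0, 0],
     [0, 0, 0],
     [-1, 0, 1]],
    index=["drugA", "drugB", "drugC"],
    columns=["dis1", "dis2", "dis3"],
)

# Drugs (rows) and diseases (columns) with no known association at all.
unlabeled_drugs = ratings.index[(ratings == 0).all(axis=1)].tolist()
unlabeled_diseases = ratings.columns[(ratings == 0).all(axis=0)].tolist()
```

Whether to drop such items or keep them as cold-start cases depends on the evaluation protocol; the dataset description does not prescribe either choice.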

This dataset consists of three .CSV files:

Drug-Disease Association Matrix

"ratings_mat.csv"

This matrix contains values in {-1,0,1}, where -1 stands for a negative association (i.e., the drug failed for some reason to treat the considered disease: e.g., lack of accrual in the associated clinical trial, or proven toxicity), 1 for a positive association (i.e., the drug was shown to treat the disease), and 0 for an unknown association status. The columns are diseases, identified by their MedGen Concept IDs, whereas rows are drugs, identified by their DrugBank IDs or PubChem CIDs.

Drug Feature Matrix

"items.csv"

This matrix has drugs in its columns, identified by their DrugBank IDs or PubChem CIDs, and genes in its rows, identified by their HUGO Gene Symbols. Its entries report the genewise transcriptomic variation induced by drug treatment, taken from the CREEDS or the LINCS L1000 databases.

Disease Feature Matrix

"users.csv"

This matrix has diseases in its columns, identified by their MedGen Concept IDs, and genes in its rows, identified by their HUGO Gene Symbols. Its entries report the genewise transcriptomic variation induced by the disease, taken from the CREEDS database.

Further information about the generation of those matrices is available by running the Jupyter notebook TRANSCRIPT_dataset-v1.0.0.ipynb on the following GitHub repository: https://github.com/RECeSS-EU-Project/drug-repurposing-datasets. For any questions, please contact the author at clemence.reda@uni-rostock.de or the RECeSS project contributors at recess-project@proton.me.

- Jupyter Notebook
Published by RECeSS-EU-Project about 3 years ago
