Recent Releases of drug-repurposing-datasets
drug-repurposing-datasets - Jupyter notebooks v2.0.0
Drug Repurposing Datasets for Collaborative Filtering Methods. Notebooks and code to generate datasets for collaborative filtering-based drug repurposing.
- Jupyter Notebook
Published by RECeSS-EU-Project over 2 years ago
drug-repurposing-datasets - Jupyter Notebooks v1.0.0 v2.0.0
Drug Repurposing Datasets for Collaborative Filtering Methods. Notebooks and code to generate datasets for collaborative filtering-based drug repurposing.
Published by RECeSS-EU-Project over 2 years ago
drug-repurposing-datasets - PREDICT v2.0.0
Version 2.0.0 (05/29/2023)
This is a drug repurposing dataset under MIT licence, compiled by Dr. Clémence Réda (clemence.reda@uni-rostock.de) at Universität Rostock, comprising a drug-disease association matrix and several drug-drug and disease-disease similarity matrices. It uses the same type of data as the dataset compiled by Gottlieb et al., 2011 [PMID: 21654673]. The sparsity number is the percentage of nonzero values in the association matrix.
| # drugs | # diseases | Sparsity number | # positive associations | # negative associations |
| ------- | ---------- | --------------- | ----------------------- | ----------------------- |
| 1,351 | 1,066 | 0.34% | 5,624 | 152 |
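As a minimal sketch of how the quantities in this table are defined, the snippet below counts positive and negative associations and computes the percentage of nonzero entries on a small made-up matrix. The function name `association_stats` and the toy values are illustrative assumptions; they are not taken from the dataset or from the project's notebooks.

```python
def association_stats(matrix):
    """Summarize a drug-by-disease matrix whose entries are in {-1, 0, 1}."""
    values = [v for row in matrix for v in row]
    n_pos = sum(1 for v in values if v == 1)
    n_neg = sum(1 for v in values if v == -1)
    # Sparsity number as defined in the text: percentage of nonzero entries.
    sparsity = 100.0 * (n_pos + n_neg) / len(values)
    return {"positive": n_pos, "negative": n_neg, "sparsity_percent": sparsity}


# Toy example (hypothetical values): 3 drugs (rows) x 4 diseases (columns).
toy = [
    [1, 0, 0, -1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]
stats = association_stats(toy)
print(stats)
```

On the real data, the same function could be applied to the rows of "ratings_mat.csv" once they are parsed into numbers.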
Note that some drugs (resp., diseases) might lack an associated feature vector: all drugs and diseases in the similarity matrices appear in the association matrix, but not necessarily the other way around. This dataset also contains missing values (NaN): 31% of the values in the full drug feature matrix and 60% of the values in the full disease feature matrix.
This dataset consists of five .CSV files:
- Drug-Disease Association Matrix
- "ratings_mat.csv"
This matrix contains values in {-1,0,1}, where -1 stands for a negative association (i.e., the drug failed to treat the considered disease for some reason, e.g., lack of accrual in the associated clinical trial, or proven toxicity), 1 for a positive association (i.e., the drug was shown to treat the disease), and 0 for an unknown association status. The columns are diseases, identified by their MedGen Concept IDs, whereas the rows are drugs, identified by their DrugBank IDs or PubChem CIDs.
- Drug-Drug Similarity Matrix
All drug-drug similarity matrices have drugs in their columns and rows, identified by their DrugBank IDs or PubChem CIDs.
- "sePREDICTmatrix.csv"
Jaccard score similarity between one-hot encodings of the side effects reported for drugs.
- "signaturePREDICTmatrix.csv"
Jaccard score similarity between drug signatures (from the CREEDS or the LINCS L1000 databases), that is, vectors reporting the genewise change in activity due to treatment.
Other types of similarity matrices and further information about the generation of those matrices are available by running the Jupyter notebook PREDICT_dataset.ipynb on the following GitHub repository: https://github.com/RECeSS-EU-Project/drug-repurposing-datasets.
- Disease-Disease Similarity Matrix
All disease-disease similarity matrices have diseases in their columns and rows, identified by their MedGen Concept IDs.
- "diseasesemanticPREDICT_matrix.csv"
Resnik semantic similarity between ontology nodes associated with diseases (from the HPO database).
- "diseasephenotypePREDICT_matrix.csv"
Jaccard score similarity between disease phenotypes (from the CREEDS database), that is, vectors reporting the genewise change in activity due to the disease.
Other types of similarity matrices and further information about the generation of those matrices are available by running the Jupyter notebook PREDICT_dataset.ipynb on the following GitHub repository: https://github.com/RECeSS-EU-Project/drug-repurposing-datasets.
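Several of the similarity matrices above are described as Jaccard scores between binary encodings. The following is a minimal sketch of that score on made-up side-effect vectors; the drug names, the vectors, and the convention of returning 0.0 when both vectors are empty are assumptions for illustration, and the project's notebook may handle these details differently.

```python
def jaccard(u, v):
    """Jaccard score between two binary (one-hot) vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    # Assumption: two all-zero vectors get a score of 0.0.
    return both / either if either else 0.0


# Hypothetical one-hot encodings of reported side effects (4 side effects).
side_effects = {
    "drugA": [1, 1, 0, 0],
    "drugB": [1, 0, 1, 0],
    "drugC": [0, 0, 0, 1],
}
names = sorted(side_effects)
# Square, symmetric drug-drug similarity matrix stored as nested dicts.
similarity = {
    a: {b: jaccard(side_effects[a], side_effects[b]) for b in names}
    for a in names
}
for a in names:
    print(a, [round(similarity[a][b], 3) for b in names])
```

The result has drugs in both rows and columns, which matches the layout described for the drug-drug similarity files.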
For any questions, please contact the author at clemence.reda@uni-rostock.de or the RECeSS project contributors at recess-project@proton.me.
Published by RECeSS-EU-Project over 2 years ago
drug-repurposing-datasets - TRANSCRIPT v2.0.0
Version 2.0.0 (05/29/2023)
This is a drug repurposing dataset under MIT licence, compiled by Dr. Clémence Réda (clemence.reda@uni-rostock.de) at Universität Rostock, comprising a drug-disease association matrix, and several drug-drug and disease-disease similarity matrices. It only uses transcriptomic data (i.e., gene activity/expression). The sparsity number is the percentage of nonzero values in the association matrix.
| # drugs | # diseases | Sparsity number | # positive associations | # negative associations | # genes |
| ------- | ---------- | --------------- | ----------------------- | ----------------------- | ------- |
| 204 | 116 | 0.44% | 401 | 11 | 12,096 |
All drugs (resp., diseases) are associated with a gene expression feature vector of length 12,096 (that is, all drugs and diseases in the feature matrices appear in the association matrix, and vice versa).
This dataset consists of three .CSV files:
- Drug-Disease Association Matrix
- "ratings_mat.csv"
This matrix contains values in {-1,0,1}, where -1 stands for a negative association (i.e., the drug failed to treat the considered disease for some reason, e.g., lack of accrual in the associated clinical trial, or proven toxicity), 1 for a positive association (i.e., the drug was shown to treat the disease), and 0 for an unknown association status. The columns are diseases, identified by their MedGen Concept IDs, whereas the rows are drugs, identified by their DrugBank IDs or PubChem CIDs.
- Drug Feature Matrix
- "items.csv"
This matrix has drugs in its columns, identified by their DrugBank IDs or PubChem CIDs, and genes in its rows, identified by their HUGO Gene Symbols. Each entry is the genewise transcriptomic variation induced by drug treatment, taken from the CREEDS or the LINCS L1000 databases.
- Disease Feature Matrix
- "users.csv"
This matrix has diseases in its columns, identified by their MedGen Concept IDs, and genes in its rows, identified by their HUGO Gene Symbols. Each entry is the genewise transcriptomic variation induced by the disease, taken from the CREEDS database.
Further information about the generation of those matrices is available by running the Jupyter notebook TRANSCRIPT_dataset.ipynb on the following GitHub repository: https://github.com/RECeSS-EU-Project/drug-repurposing-datasets. For any questions, please contact the author at clemence.reda@uni-rostock.de or the RECeSS project contributors at recess-project@proton.me.
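The three files of this release share identifiers: drugs index the rows of the association matrix and the columns of the drug feature matrix, while diseases index the columns of the association matrix and the columns of the disease feature matrix. The sketch below reads small in-memory stand-ins for the three files and checks that the identifiers line up. The CSV layout (column IDs in the first row, row IDs in the first column), the helper names, and all identifiers and values are assumptions made for illustration.

```python
import csv
import io


def read_matrix(handle):
    """Read a CSV with column IDs in the first row and row IDs in the first column."""
    reader = csv.reader(handle)
    col_ids = next(reader)[1:]
    rows = {line[0]: [float(x) for x in line[1:]] for line in reader}
    return col_ids, rows


def columns_as_vectors(col_ids, rows):
    """Turn a genes-by-entities table into one feature vector per entity."""
    return {cid: [vals[j] for vals in rows.values()] for j, cid in enumerate(col_ids)}


# Made-up stand-ins for "ratings_mat.csv", "items.csv" and "users.csv".
ratings_csv = io.StringIO(",DIS1,DIS2\nDRUG1,1,0\nDRUG2,0,-1\n")
items_csv = io.StringIO(",DRUG1,DRUG2\nGENE1,0.5,-0.2\nGENE2,0.0,1.3\nGENE3,-1.1,0.4\n")
users_csv = io.StringIO(",DIS1,DIS2\nGENE1,0.1,0.9\nGENE2,-0.3,0.0\nGENE3,0.7,-0.5\n")

disease_ids, ratings = read_matrix(ratings_csv)   # rows: drugs, columns: diseases
drug_cols, item_rows = read_matrix(items_csv)     # rows: genes, columns: drugs
disease_cols, user_rows = read_matrix(users_csv)  # rows: genes, columns: diseases

drug_features = columns_as_vectors(drug_cols, item_rows)
disease_features = columns_as_vectors(disease_cols, user_rows)

# Every drug and disease should appear in both the association and feature matrices.
ids_match = set(drug_features) == set(ratings) and set(disease_features) == set(disease_ids)
print("identifiers aligned:", ids_match)
print("feature vector for DRUG1:", drug_features["DRUG1"])
```

With the real files, the `io.StringIO` objects would be replaced by opened file handles, and empty cells, if any, would need explicit handling before the conversion to `float`.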
Published by RECeSS-EU-Project over 2 years ago
drug-repurposing-datasets - PREDICT v1.0.0
Version 1.0.0 (12/28/2022)
This is a drug repurposing dataset under MIT licence, compiled by Dr. Clémence Réda (clemence.reda@uni-rostock.de), comprising a drug-disease association matrix and several drug-drug and disease-disease similarity matrices. It uses the same type of data as the dataset compiled by Gottlieb et al., 2011 [PMID: 21654673]. The sparsity number is the percentage of nonzero values in the association matrix.
| # drugs | # diseases | Sparsity number | # positive associations | # negative associations |
| ------- | ---------- | --------------- | ----------------------- | ----------------------- |
| 1,395 | 1,501 | 0.38% | 8,240 | 295 |
Note that some drugs (resp., diseases) might lack an associated feature vector: all drugs and diseases in the similarity matrices appear in the association matrix, but not necessarily the other way around. This dataset also contains missing values (NaN).
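Both caveats can be checked before training a model. The sketch below, on made-up data, measures the fraction of NaN entries in a matrix and lists the identifiers that appear in the association matrix but not in a similarity matrix. The helper names `nan_fraction` and `missing_ids` and all values are hypothetical.

```python
import math


def nan_fraction(matrix):
    """Fraction of entries that are NaN in a list-of-lists matrix."""
    values = [v for row in matrix for v in row]
    return sum(1 for v in values if math.isnan(v)) / len(values)


def missing_ids(association_ids, similarity_ids):
    """IDs present in the association matrix but absent from a similarity matrix."""
    known = set(similarity_ids)
    return [i for i in association_ids if i not in known]


nan = float("nan")
# Toy matrix with 3 missing entries out of 6.
toy_matrix = [
    [0.2, nan, 0.1],
    [nan, nan, 0.4],
]
frac = nan_fraction(toy_matrix)
print(f"{100 * frac:.0f}% of values are missing")

# Hypothetical drug IDs: D2 has associations but no similarity row.
drugs_in_ratings = ["D1", "D2", "D3"]
drugs_in_similarity = ["D1", "D3"]
uncovered = missing_ids(drugs_in_ratings, drugs_in_similarity)
print("drugs without similarity information:", uncovered)
```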
This dataset consists of five .CSV files:
- Drug-Disease Association Matrix
- "ratings_mat.csv"
This matrix contains values in {-1,0,1}, where -1 stands for a negative association (i.e., the drug failed to treat the considered disease for some reason, e.g., lack of accrual in the associated clinical trial, or proven toxicity), 1 for a positive association (i.e., the drug was shown to treat the disease), and 0 for an unknown association status. The columns are diseases, identified by their MedGen Concept IDs, whereas the rows are drugs, identified by their DrugBank IDs or PubChem CIDs.
- Drug-Drug Similarity Matrix
All drug-drug similarity matrices have drugs in their columns and rows, identified by their DrugBank IDs or PubChem CIDs.
- "sePREDICTmatrix.csv"
Jaccard score similarity between one-hot encodings of the side effects reported for drugs.
- "signaturePREDICTmatrix.csv"
Jaccard score similarity between drug signatures (from the CREEDS or the LINCS L1000 databases), that is, vectors reporting the genewise change in activity due to treatment.
Other types of similarity matrices and further information about the generation of those matrices are available by running the Jupyter notebook PREDICTdatasetv1.0.0.ipynb on the following GitHub repository: https://github.com/RECeSS-EU-Project/drug-repurposing-datasets.
- Disease-Disease Similarity Matrix
All disease-disease similarity matrices have diseases in their columns and rows, identified by their MedGen Concept IDs.
- "diseasesemanticPREDICT_matrix.csv"
Resnik semantic similarity between ontology nodes associated with diseases (from the HPO database).
- "diseasephenotypePREDICT_matrix.csv"
Jaccard score similarity between disease phenotypes (from the CREEDS database), that is, vectors reporting the genewise change in activity due to the disease.
Other types of similarity matrices and further information about the generation of those matrices are available by running the Jupyter notebook PREDICTdatasetv1.0.0.ipynb on the following GitHub repository: https://github.com/RECeSS-EU-Project/drug-repurposing-datasets.
For any questions, please contact the author at clemence.reda@uni-rostock.de or the RECeSS project contributors at recess-project@proton.me.
Published by RECeSS-EU-Project over 2 years ago
drug-repurposing-datasets - TRANSCRIPT v1.0.0
Version 1.0.0 (12/28/2022)
This is a drug repurposing dataset under MIT licence, compiled by Dr. Clémence Réda (clemence.reda@uni-rostock.de), comprising a drug-disease association matrix, and several drug-drug and disease-disease similarity matrices. It only uses transcriptomic data (i.e., gene activity/expression). The sparsity number is the percentage of nonzero values in the association matrix.
| # drugs | # diseases | Sparsity number | # positive associations | # negative associations | # genes |
| ------- | ---------- | --------------- | ----------------------- | ----------------------- | ------- |
| 871 | 144 | 0.76% | 773 | 181 | 10,811 |
All drugs (resp., diseases) are associated with a gene expression feature vector of length 10,811 (that is, all drugs and diseases in the feature matrices appear in the association matrix, and vice versa). However, some diseases/drugs are not necessarily involved in negative or positive associations (meaning that all pairs with those items have an association value of 0).
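Drugs or diseases whose associations are all 0 carry no known label, so it can be useful to list them before evaluation. The following is a minimal sketch on a made-up matrix; the helper names and identifiers are assumptions for illustration.

```python
def all_zero_rows(row_ids, matrix):
    """Return the IDs of rows whose entries are all 0."""
    return [rid for rid, row in zip(row_ids, matrix) if all(v == 0 for v in row)]


def transpose(matrix):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*matrix)]


# Toy association matrix: 3 drugs (rows) x 2 diseases (columns).
drug_ids = ["drug1", "drug2", "drug3"]
disease_ids = ["dis1", "dis2"]
ratings = [
    [1, 0],
    [0, 0],
    [-1, 0],
]

unassociated_drugs = all_zero_rows(drug_ids, ratings)
unassociated_diseases = all_zero_rows(disease_ids, transpose(ratings))
print("drugs with no known association:", unassociated_drugs)
print("diseases with no known association:", unassociated_diseases)
```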
This dataset consists of three .CSV files:
- Drug-Disease Association Matrix
- "ratings_mat.csv"
This matrix contains values in {-1,0,1}, where -1 stands for a negative association (i.e., the drug failed to treat the considered disease for some reason, e.g., lack of accrual in the associated clinical trial, or proven toxicity), 1 for a positive association (i.e., the drug was shown to treat the disease), and 0 for an unknown association status. The columns are diseases, identified by their MedGen Concept IDs, whereas the rows are drugs, identified by their DrugBank IDs or PubChem CIDs.
- Drug Feature Matrix
- "items.csv"
This matrix has drugs in its columns, identified by their DrugBank IDs or PubChem CIDs, and genes in its rows, identified by their HUGO Gene Symbols. Each entry is the genewise transcriptomic variation induced by drug treatment, taken from the CREEDS or the LINCS L1000 databases.
- Disease Feature Matrix
- "users.csv"
This matrix has diseases in its columns, identified by their MedGen Concept IDs, and genes in its rows, identified by their HUGO Gene Symbols. Each entry is the genewise transcriptomic variation induced by the disease, taken from the CREEDS database.
Further information about the generation of those matrices is available by running the Jupyter notebook TRANSCRIPT_dataset-v1.0.0.ipynb on the following GitHub repository: https://github.com/RECeSS-EU-Project/drug-repurposing-datasets. For any questions, please contact the author at clemence.reda@uni-rostock.de or the RECeSS project contributors at recess-project@proton.me.
Published by RECeSS-EU-Project over 2 years ago