https://github.com/bachi55/deepccs
CCS prediction using deep neural network
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ○ codemeta.json file
- ○ .zenodo.json file
- ✓ DOI references: found 3 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: low similarity (11.7%) to scientific vocabulary
Last synced: 10 months ago
Repository
Basic Info
- Host: GitHub
- Owner: bachi55
- License: gpl-3.0
- Language: Python
- Default Branch: master
- Size: 52.9 MB
Statistics
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of plpla/DeepCCS
Created over 7 years ago · Last pushed over 7 years ago
https://github.com/bachi55/DeepCCS/blob/master/
CCS prediction from SMILES using a deep neural network.

## For the impatient

After installation, go to the `DeepCCS/interface/` directory.

    DeepCCS predict -i INPUT

- **INPUT** is the input file with at least a SMILES and an Adducts column

The default model and encoders will be used. See the [predict](https://github.com/plpla/DeepCCS#predict) section below for more options.

## Installation

DeepCCS was tested and works under Python 3.6. We recommend using [conda](https://conda.io/docs/user-guide/install/download.html).

Required packages:

* Numpy
* Pandas
* Scikit-learn
* Tensorflow
* Keras

To install, go to the `core` directory and run the following command in a terminal:

    python setup.py install

## Functionalities

*On Windows, the symbolic link `DeepCCS` will not work; users should run the `command_line_tool.py` script instead in order to use DeepCCS.*

### Predict

Predict CCS using a SMILES and an adduct.

    DeepCCS predict -mp MODEL_DIR -ap ADDUCTS_ENCODER_DIR -sp SMILES_ENCODER_DIR -i INPUT_F -o OUTPUT_F

*Required args:*

- i : The input file, with at least a SMILES and an Adducts column

*Optional args:*

- mp : Directory containing the model.h5 file (default="../saved_models/default/")
- ap : Directory containing the adducts_encoder.json file (default="../saved_models/default/")
- sp : Directory containing the smiles_encoder.json file (default="../saved_models/default/")
- o : Desired name for the output file (ex: MyFile.csv); if none is given, stdout will be used.

### Compare

Compare provided CCS values to those contained in every dataset used to train and test DeepCCS (no predictions involved).

    DeepCCS compare -f H5_F -i REFERENCE_F -d S_DATASET1,S_DATASET2,... -o OUTPUT_P

*Required args:*

- i : The reference file, with at least SMILES, Adducts and CCS columns
- f : The hdf5 file containing all the source datasets

*Optional args:*

- d : Names of the source datasets to use for comparison (as a comma-separated list, e.g. `S_DATASET1,S_DATASET2`); if none are given, all are considered.
  - Choices are: `MetCCS_pos`, `MetCCS_neg`, `Agilent_pos`, `Agilent_neg`, `Waters_pos`, `Waters_neg`, `PNL`, `McLean`, `CBM`
- o : Desired prefix for the output files (ex: compare_to_MetCCS_), because there is one output file per compared source dataset

### Evaluate

Perform CCS predictions and evaluate the model using measured values.

    DeepCCS evaluate -mp MODEL_DIR -ap ADDUCTS_ENCODER_DIR -sp SMILES_ENCODER_DIR -i REFERENCE_F -o OUTPUT_F

*Required args:*

- i : Input reference file. Must contain at least SMILES, Adducts and CCS columns

*Optional args:*

- mp : Directory containing the model.h5 file (default="../saved_models/default/")
- ap : Directory containing the adducts_encoder.json file (default="../saved_models/default/")
- sp : Directory containing the smiles_encoder.json file (default="../saved_models/default/")
- o : Desired name for the output file (ex: MyFile.csv); if not specified, stdout will be used.

### Train

Train a new model including your own measurements, with or without the available datasets.

    DeepCCS train -f H5_F -ap ADDUCTS_ENCODER_DIR -sp SMILES_ENCODER_DIR -mtrain -pnnl -cbm -mclean -o OUTPUT_DIR -nd NEW_D1 -nepochs 150

*Required args:*

- f : The hdf5 file containing all the source datasets

*Optional args:*

- ap : Directory containing the adducts_encoder.json file. "d" will make the model train with the default encoder. If the argument is not used, a new encoder will be created (default = None)
- sp : Directory containing the smiles_encoder.json file. "d" will make the model train with the default encoder.
  If the argument is not used, a new encoder will be created (default = None)
- mtrain : Use the MetCCS_pos and MetCCS_neg datasets as training data (default = false)
- mtestA : Use the Agilent_pos and Agilent_neg test datasets from MetCCS as training data (default = false)
- mtestW : Use the Waters_pos and Waters_neg test datasets from MetCCS as training data (default = false)
- pnnl : Use the PNNL dataset as training data (default = false)
- cbm : Use the CBM2018 dataset as training data (default = false)
- mclean : Use the McLean Lab dataset as training data (default = false)
- nd : New datasets to create the model. If there are multiple files, give them as a list separated by "," (default = None)
- test : Proportion of each dataset that must be kept in the testing set (default: 0.2)
- o : Existing directory to output the model and mappers (default = current directory)
- nepochs : Number of epochs to use for model training (default = 150)

At least one dataset is required to train a new model. Selected datasets are split between the training and testing sets according to the `test` argument value, except for `mtrain`, which is always entirely in the training set.

### Additional notes

* The `Adducts` column of the input file must contain adducts written as `M+H`, `M+Na`, `M-H` or `M-2H`.
* The `SMILES` column accepts any SMILES format, but isomeric SMILES are recommended.
* The package includes a `DeepCCSModel` class that can be used directly in Python without the command line tool.

## References

DeepCCS relies heavily on datasets that were previously published by others:

* Zhou Z, Shen X, Tu J, Zhu ZJ. Large-Scale Prediction of Collision Cross-Section Values for Metabolites in Ion Mobility-Mass Spectrometry. Anal Chem. 2016 Nov 15;88(22):11084-11091. Epub 2016 Nov 1. PubMed PMID: 27768289.
* Zheng X, Aly NA, Zhou Y, Dupuis KT, Bilbao A, Paurus VL, Orton DJ, Wilson R, Payne SH, Smith RD, Baker ES.
  A structural examination and collision cross section database for over 500 metabolites and xenobiotics using drift tube ion mobility spectrometry. Chem Sci. 2017 Nov 1;8(11):7724-7736. doi: 10.1039/c7sc03464d. Epub 2017 Sep 28. PubMed PMID: 29568436; PubMed Central PMCID: PMC5853271.
* May JC, Goodwin CR, Lareau NM, Leaptrot KL, Morris CB, Kurulugama RT, Mordehai A, Klein C, Barry W, Darland E, Overney G, Imatani K, Stafford GC, Fjeldsted JC, McLean JA. Conformational ordering of biomolecules in the gas phase: nitrogen collision cross sections measured on a prototype high resolution drift tube ion mobility-mass spectrometer. Anal Chem. 2014 Feb 18;86(4):2107-16. doi: 10.1021/ac4038448. Epub 2014 Feb 4. PubMed PMID: 24446877; PubMed Central PMCID: PMC3931330.
* Mollerup CB, Mardal M, Dalsgaard PW, Linnet K, Barron LP. Prediction of collision cross section and retention time for broad scope screening in gradient reversed-phase liquid chromatography-ion mobility-high resolution accurate mass spectrometry. J Chromatogr A. 2018 Mar 23;1542:82-88. doi: 10.1016/j.chroma.2018.02.025. Epub 2018 Feb 15. PubMed PMID: 29472071.
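
The input requirements stated in the README (a `SMILES` column, an `Adducts` column, and adducts limited to `M+H`, `M+Na`, `M-H` and `M-2H`) can be checked before running `DeepCCS predict`. The following is only a minimal sketch and is not part of DeepCCS. It assumes the input file is a comma-separated table with a header row, which the README does not state explicitly, and `check_input_table` is a hypothetical helper name chosen for illustration.

```python
import csv
import io

# Adduct labels listed in the README's "Additional notes".
ALLOWED_ADDUCTS = {"M+H", "M+Na", "M-H", "M-2H"}
REQUIRED_COLUMNS = ("SMILES", "Adducts")


def check_input_table(handle, required=REQUIRED_COLUMNS):
    """Return a list of problems found in a CSV table (hypothetical helper).

    Assumption: the file is comma-separated and its first line is a header.
    """
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    problems = [f"missing column: {name}" for name in required if name not in header]
    if problems:
        return problems
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if not (row["SMILES"] or "").strip():
            problems.append(f"line {line_no}: empty SMILES")
        adduct = (row["Adducts"] or "").strip()
        if adduct not in ALLOWED_ADDUCTS:
            problems.append(f"line {line_no}: unsupported adduct {adduct!r}")
    return problems


# Small in-memory example; the last row uses an adduct outside the allowed set.
example = io.StringIO(
    "SMILES,Adducts\n"
    "CCO,M+H\n"
    "CC(=O)O,M-H\n"
    "c1ccccc1,M+K\n"
)
for problem in check_input_table(example):
    print(problem)
```

For the reference files used by `evaluate` and `compare`, the same sketch can be called with `required=("SMILES", "Adducts", "CCS")` so that a missing `CCS` column is also reported.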
Owner
- Name: Eric Bach
- Login: bachi55
- Kind: user
- Location: Espoo, Finland
- Company: Aalto University
- Website: https://www.linkedin.com/in/eric-bach-ml/
- Repositories: 10
- Profile: https://github.com/bachi55
Doctoral student in the field of Machine Learning, Bioinformatics and Computational Metabolomics.