Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.3%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: chimie-paristech-CTM
  • License: other
  • Language: Python
  • Default Branch: main
  • Size: 52.7 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 1
  • Open Issues: 0
  • Releases: 0
Created almost 2 years ago · Last pushed 7 months ago
Metadata Files
Readme Contributing License Citation

README.md

thermo_GNN


This is the repository containing the code associated with the paper "Graph-based deep learning models for thermodynamic property prediction: The interplay between target definition, data distribution, featurization, and model architecture". Code is provided "as-is". Minor edits may be required to tailor the scripts for different computational systems.


Features

  • Atom-fingerprint
  • Mol-feature
  • Ringcount feature
  • MLP_Trigonometric
  • KAN_Trigonometric

For more information on the meaning of the individual features, we refer to the associated manuscript.

Requirements

The code runs on CPUs on both x86 and ARM platforms. To use GPUs, you will need:
  • CUDA >= 8.0
  • cuDNN

Installation

To download the code:

    git clone https://github.com/chimie-paristech-CTM/tree/main
    cd thermo_GNN

To set up the conda environment defined in environment.yml:

    conda env create -f environment.yml

To install the thermoGNN package, activate the environment and run the following commands within the thermo_GNN directory:

    conda activate thermo_GNN
    pip install -e .

Updates

  • ☑ add Atom fingerprint.
  • ☑ add Mol-feature.
  • ☑ add Ringcount feature.
  • ☑ add MLP_Trigonometric.
  • ☑ add KAN_Trigonometric.

Quick Start

Folder Structure

    .
    ├── README.md
    ├── dataset/smalldataset
    │   ├── data/
    │   │   ├── *data*.csv
    │   ...
    └── chemprop/

Each *data*.csv file has the following layout:

    column: | smiles  | target_label  |
    data:   | smiles1 | target_label1 |
            | smiles2 | target_label2 |
            | smiles3 | target_label3 |
            ...
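The CSV layout above (a smiles column followed by a target column) can be produced with Python's standard csv module. The sketch below is illustrative only: the helper name write_dataset is hypothetical, the example rows and target values are made up, and "target_label" is the placeholder column name from the layout, to be replaced with your own target name.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_dataset(rows: Iterable[Tuple[str, float]], handle: TextIO) -> None:
    """Write (SMILES, target) pairs as a two-column CSV with a header row."""
    writer = csv.writer(handle)
    # Header row: SMILES column first, then the target column.
    writer.writerow(["smiles", "target_label"])
    writer.writerows(rows)


# Hypothetical placeholder rows; the target values are invented for illustration.
example_rows = [("CCO", -1.23), ("c1ccccc1", 0.45), ("CC(=O)O", 2.1)]

buffer = io.StringIO()
write_dataset(example_rows, buffer)
print(buffer.getvalue())

# To write a real file instead of an in-memory buffer:
# with open("my_data.csv", "w", newline="") as handle:
#     write_dataset(example_rows, handle)
```

Writing to an in-memory buffer keeps the example self-contained; passing a real file handle opened with newline="" produces a file that can be given to --data_path.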

Train

To train a model, run:

    python train.py --data_path <path> --dataset_type <type> --save_dir <dir> --epochs <epoch> --input_features_type <input_features_type> --aggregation <aggregation> --output_fingerprint <output_fingerprint> --model <model>

where:
  1. <path> is the path to the CSV file, not to its directory.
  2. <aggregation> is one of [sum, mean, norm] and controls the aggregation type of the output head.
  3. <input_features_type> is one of [chemprop, jpca, molecule_level_feature] and controls the type of input features.
  4. <output_fingerprint> is one of [atom, mol] and controls the type of output fingerprint.
  5. <model> is one of [dpmnn, kan_trigonometric, mlp_trigonometric] and controls the model architecture.

For example:

    python train.py --data_path ./dataset/singledata/lipo_train.csv --dataset_type regression --output_fingerprint atom --save_dir ./lipo/checkpoint --epochs 2 --input_features_type molecule_level_feature --aggregation norm
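To make the three --aggregation options concrete, here is a minimal pure-Python sketch of pooling per-atom vectors into one molecule-level vector. It assumes the options behave as in chemprop v1, where "norm" means the sum divided by a fixed constant (chemprop's --aggregation_norm, default 100); the function name aggregate and the norm_constant parameter are illustrative, not this repository's API.

```python
from typing import List

Vector = List[float]


def aggregate(atom_vectors: List[Vector], mode: str, norm_constant: float = 100.0) -> Vector:
    """Pool per-atom vectors into a single vector.

    Assumption: "norm" follows chemprop v1 (sum divided by a fixed constant).
    """
    if not atom_vectors:
        raise ValueError("need at least one atom vector")
    dim = len(atom_vectors[0])
    # Element-wise sum over all atoms.
    summed = [sum(vec[i] for vec in atom_vectors) for i in range(dim)]
    if mode == "sum":
        return summed
    if mode == "mean":
        # Divide by the number of atoms.
        return [s / len(atom_vectors) for s in summed]
    if mode == "norm":
        # Divide by a constant that does not depend on molecule size.
        return [s / norm_constant for s in summed]
    raise ValueError(f"unknown aggregation mode: {mode!r}")


atoms = [[1.0, 2.0], [3.0, 4.0]]
print(aggregate(atoms, "sum"))   # [4.0, 6.0]
print(aggregate(atoms, "mean"))  # [2.0, 3.0]
print(aggregate(atoms, "norm"))  # [0.04, 0.06]
```

Under this assumption, "sum" and "norm" scale with molecule size while "mean" does not, which matters for extensive thermodynamic targets.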

A full list of available command-line arguments can be found in chemprop/args.py.
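When launching many runs from a script, it can help to assemble the train.py command programmatically and reject typos in the choice-valued flags before training starts. The sketch below is a hypothetical helper (build_train_command and CHOICES are not part of the repository); the accepted values are copied from the argument list above with underscores restored, so confirm them against chemprop/args.py in your checkout.

```python
import shlex
from typing import Dict, List, Set

# Accepted values as listed in this README (underscores restored);
# confirm them against chemprop/args.py before relying on this check.
CHOICES: Dict[str, Set[str]] = {
    "aggregation": {"sum", "mean", "norm"},
    "input_features_type": {"chemprop", "jpca", "molecule_level_feature"},
    "output_fingerprint": {"atom", "mol"},
    "model": {"dpmnn", "kan_trigonometric", "mlp_trigonometric"},
}


def build_train_command(**options: object) -> List[str]:
    """Return an argv list for train.py; options with a CHOICES entry are validated."""
    cmd = ["python", "train.py"]
    for name, value in options.items():
        allowed = CHOICES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"--{name} must be one of {sorted(allowed)}, got {value!r}")
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_train_command(
    data_path="./dataset/singledata/lipo_train.csv",
    dataset_type="regression",
    output_fingerprint="atom",
    save_dir="./lipo/checkpoint",
    epochs=2,
    input_features_type="molecule_level_feature",
    aggregation="norm",
)
print(shlex.join(cmd))
# To actually launch training: subprocess.run(cmd, check=True)
```

With these arguments the printed line is identical to the example training command; returning an argv list rather than a single string avoids shell-quoting issues when the list is passed to subprocess.run.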

If installed from source, python train.py can be replaced with chemprop_train.

Notes:
  • The default metric for classification is AUC and the default metric for regression is RMSE. Other metrics may be specified with --metric <metric>.
  • --save_dir may be left out if you don't want to save model checkpoints.
  • --quiet can be added to reduce the amount of debugging information printed to the console. Both a quiet and a verbose version of the logs are saved in the save_dir.
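Since RMSE is the default regression metric, it may help to recall how the reported number is computed: RMSE = sqrt((1/N) · Σ (ŷ_i − y_i)²), in the same units as the target. The sketch below is a plain reimplementation for interpreting logged scores, not the code chemprop itself uses.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-square error: square root of the mean squared residual."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Every prediction is off by exactly 1, so the RMSE is 1.0.
print(rmse([1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]))  # 1.0
```

Because errors are squared before averaging, a few large outliers raise RMSE more than they would raise the mean absolute error.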

Script

The folder dataset_preparation contains all scripts used to process the original datasets.

Dataset

The linked datasets include qm9, paton, qmugs, pc9, and qmugs1.1.

Citation

If (parts of) this work are used as part of a publication, please cite the paper:

    @article{***,
      title={Graph-based deep learning models for thermodynamic property prediction: The interplay between target definition, data distribution, featurization, and model architecture},
      author={Bowen Deng and Thijs Stuyver},
      journal={ChemRxiv},
      year={2024}
    }

Furthermore, since the work is based on chemprop, please also cite the paper in which this code was originally presented:

    @article{***,
      title={Chemprop: A Machine Learning Package for Chemical Property Prediction},
      author={Esther Heid and Kevin P. Greenman and Yunsie Chung and Shih-Cheng Li and David E. Graff and Florence H. Vermeire and Haoyang Wu and William H. Green and Charles J. McGill},
      journal={ChemRxiv},
      year={2023}
    }

Acknowledgement

Owner

  • Name: chimie-paristech-CTM
  • Login: chimie-paristech-CTM
  • Kind: organization

Citation (CITATIONS.bib)

# this was downloaded from ACS: https://pubs.acs.org/doi/10.1021/acs.jcim.9b00237
@article{chemprop_theory,
    author = {Yang, Kevin and Swanson, Kyle and Jin, Wengong and Coley, Connor and Eiden, Philipp and Gao, Hua and Guzman-Perez, Angel and Hopper, Timothy and Kelley, Brian and Mathea, Miriam and Palmer, Andrew and Settels, Volker and Jaakkola, Tommi and Jensen, Klavs and Barzilay, Regina},
    title = {Analyzing Learned Molecular Representations for Property Prediction},
    journal = {Journal of Chemical Information and Modeling},
    volume = {59},
    number = {8},
    pages = {3370-3388},
    year = {2019},
    doi = {10.1021/acs.jcim.9b00237},
    note = {PMID: 31361484},
    URL = {https://doi.org/10.1021/acs.jcim.9b00237},
    eprint = {https://doi.org/10.1021/acs.jcim.9b00237}
}

# this was downloaded from ACS: https://pubs.acs.org/doi/10.1021/acs.jcim.3c01250
@article{chemprop_software,
    author = {Heid, Esther and Greenman, Kevin P. and Chung, Yunsie and Li, Shih-Cheng and Graff, David E. and Vermeire, Florence H. and Wu, Haoyang and Green, William H. and McGill, Charles J.},
    title = {Chemprop: A Machine Learning Package for Chemical Property Prediction},
    journal = {Journal of Chemical Information and Modeling},
    volume = {64},
    number = {1},
    pages = {9-17},
    year = {2024},
    doi = {10.1021/acs.jcim.3c01250},
    note = {PMID: 38147829},
    URL = {https://doi.org/10.1021/acs.jcim.3c01250},
    eprint = {https://doi.org/10.1021/acs.jcim.3c01250}
}

GitHub Events

Total
  • Watch event: 5
  • Delete event: 3
  • Push event: 9
  • Public event: 1
  • Fork event: 1
Last Year
  • Watch event: 5
  • Delete event: 3
  • Push event: 9
  • Public event: 1
  • Fork event: 1

Dependencies

Dockerfile docker
  • mambaorg/micromamba 0.23.0 build
setup.py pypi
  • Werkzeug <3
  • descriptastorus >=2.6.1
  • descriptastorus <2.6.1
  • flask >=1.1.2,<=2.1.3
  • hyperopt >=0.2.3
  • matplotlib >=3.1.3
  • numpy >=1.18.1
  • pandas >=1.0.3
  • pandas-flavor >=0.2.0
  • rdkit >=2020.03.1.0
  • scikit-learn >=0.22.2.post1
  • scipy <1.11
  • scipy >=1.9
  • sphinx >=3.1.2
  • sphinx-rtd-theme >=2.0.0
  • tensorboardX >=2.0
  • torch >=1.4.0
  • tqdm >=4.45.0
  • typed-argument-parser >=1.6.1
environment.yml pypi
  • typed-argument-parser >=1.6.1