Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.2%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: mcgillresearchgroup
  • License: mit
  • Language: Python
  • Default Branch: main
  • Size: 122 KB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 7 months ago · Last pushed 7 months ago
Metadata Files
Readme License Citation

README.md

EquiNet: Predicting Vapor-Liquid Equilibrium with Physics-Informed Neural Networks

EquiNet is a deep learning framework based on the Chemprop architecture, designed for predicting vapor-liquid equilibrium (VLE) properties of binary mixtures. It incorporates physicochemical constraints using physics-informed neural networks (PINNs) to enhance thermodynamic consistency and accuracy.


Installation & Setup Instructions

1. Install Anaconda

Download and install Anaconda.
If using a Mac with an M1/M2 chip, go to “Other Installers” to choose the correct architecture.

2. Clone This Repository

Use one of several available methods (GitHub documentation) to clone this repository.

```bash
git clone https://github.com/mcgillresearchgroup/equinet.git
```

3. Create a New Conda Environment

If using Windows, open the Anaconda Prompt (not a regular terminal).

Navigate to the project directory:

```bash
cd path/to/equinet
```

Create the environment:

```bash
conda env create -f environment.yml
```

If the environment solve is slow or hangs, you can use the libmamba solver. This is the default on newer Anaconda installations, but has to be selected manually on older ones:

```bash
conda env create -f environment.yml --solver=libmamba
```

Activate the environment:

```bash
conda activate chemprop
```

Complete the EquiNet setup locally:

```bash
pip install -e .
```

Note the trailing `.` in the command; it is important.

4. Dataset Preparation

The data needs to be split into two .csv files, a targets file and a features file.

For the Targets File (in training):
- The targets file must contain columns in the following order: 'SMILE 1', 'SMILE 2', 'y1', 'y2', 'log10P', 'lngamma1', 'lngamma2', 'log10P1sat', 'log10P2sat'.
- If targets are not needed for all of these columns, trailing columns can be dropped, but columns in the middle cannot be skipped. Typical training associated with the paper uses these columns: 'SMILE 1', 'SMILE 2', 'y1', 'y2', 'log10P'.
- Not all targets are needed for training. If an individual target is not known, its cell should be left blank in the CSV.
- SMILE 1 and SMILE 2 are the SMILES representations of the two components in the binary mixture. They should be valid RDKit-compliant SMILES strings.
- y1 and y2 are the mole fractions of components 1 and 2, respectively, and must be in the range [0, 1]. They must sum to 1.
- log10P is the logarithm (base 10) of the total pressure in Pascals (Pa).
- lngamma1 and lngamma2 are the natural logarithms (base e) of the activity coefficients.
- log10P1sat and log10P2sat are the logarithms (base 10) of the component vapor pressures, which must be in Pascals (Pa).
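The column-order, truncation, and mole-fraction rules above can be checked before training with a short script. This is a minimal sketch using only the Python standard library; `check_targets` is a hypothetical helper and not part of EquiNet, and it does not test whether the SMILES strings are valid (that would need RDKit).

```python
import csv
import io
import math

# Column order taken from the targets-file rules above.
TARGET_COLUMNS = ["SMILE 1", "SMILE 2", "y1", "y2", "log10P",
                  "lngamma1", "lngamma2", "log10P1sat", "log10P2sat"]


def check_targets(csv_text, tol=1e-6):
    """Return a list of problems found in a training targets file (hypothetical helper)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    problems = []
    # The header may be truncated, but it must be a prefix of the full column list.
    if len(header) < 2 or header != TARGET_COLUMNS[:len(header)]:
        problems.append(f"header {header} is not a prefix of the expected columns")
        return problems
    for n, row in enumerate(reader, start=2):  # line 1 is the header
        values = dict(zip(header, row))
        if not values.get("SMILE 1") or not values.get("SMILE 2"):
            problems.append(f"line {n}: missing SMILES")
        y1, y2 = values.get("y1", ""), values.get("y2", "")
        if y1 and y2:  # blank cells mean the target is unknown, so they are skipped
            y1, y2 = float(y1), float(y2)
            if not (0.0 <= y1 <= 1.0 and 0.0 <= y2 <= 1.0):
                problems.append(f"line {n}: mole fraction outside [0, 1]")
            elif not math.isclose(y1 + y2, 1.0, abs_tol=tol):
                problems.append(f"line {n}: y1 + y2 = {y1 + y2}, expected 1")
    return problems


# Illustrative rows only; the values are made up for the example.
example = """SMILE 1,SMILE 2,y1,y2,log10P
CCO,O,0.62,0.38,5.0057
CCO,O,,,4.9000
CCO,O,0.70,0.20,4.8000
"""
print(check_targets(example))
```

The second data row leaves y1 and y2 blank, which is allowed; only the last row is reported, because its mole fractions do not sum to 1.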

For the Targets File (in prediction):
- During prediction, the targets file must contain 'SMILE 1' and 'SMILE 2' as its first two columns. No other columns are necessary; any that are present are ignored.
- SMILE 1 and SMILE 2 are the SMILES representations of the two components in the binary mixture. They should be valid RDKit-compliant SMILES strings.
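Because only the first two columns are read during prediction, an existing training targets file can be used as it is. If a stripped-down input is preferred, for example to share it without measured values, the sketch below keeps just the SMILES columns. `to_prediction_targets` is a hypothetical helper written for this illustration, not an EquiNet function.

```python
import csv
import io


def to_prediction_targets(training_csv_text):
    """Keep only the 'SMILE 1' and 'SMILE 2' columns (hypothetical helper)."""
    reader = csv.reader(io.StringIO(training_csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in reader:
        writer.writerow(row[:2])  # applies to the header row and data rows alike
    return out.getvalue()


# Illustrative rows only; the values are made up for the example.
training = "SMILE 1,SMILE 2,y1,y2,log10P\nCCO,O,0.62,0.38,5.0057\n"
print(to_prediction_targets(training))
```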

For the Features File (both training and prediction):
- The features file must contain the following columns: 'x1', 'x2', 'T(K)', 'log10P1sat', 'log10P2sat'.
- Unlike the targets file, none of these values can be left blank and no column can be omitted.
- x1 and x2 are the mole fractions of components 1 and 2, respectively, and must be in the range [0, 1]. They must sum to 1.
- T(K) is the temperature in Kelvin.
- log10P1sat and log10P2sat are the base-10 logarithms of the pure component saturation pressures, also in Pascals (Pa).
- If internal vapor pressure prediction is being used, the provided values for log10P1sat and log10P2sat are not referenced. They still have to be provided and can be filled with nan if desired.
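A features row can be assembled as sketched below. `feature_row` is a hypothetical helper, not part of EquiNet. It derives x2 from x1 so the pair sums to 1, converts saturation pressures given in Pa to base-10 logarithms, and writes nan when no saturation pressure is supplied (the case where internal vapor pressure prediction is used).

```python
import csv
import io
import math

# Column names taken from the features-file rules above.
FEATURE_COLUMNS = ["x1", "x2", "T(K)", "log10P1sat", "log10P2sat"]


def feature_row(x1, temperature_k, p1sat_pa=None, p2sat_pa=None):
    """Build one features row; pressures are in Pa, None becomes nan (hypothetical helper)."""
    if not 0.0 <= x1 <= 1.0:
        raise ValueError("x1 must lie in [0, 1]")
    if temperature_k <= 0.0:
        raise ValueError("temperature must be a positive value in Kelvin")

    def log10_or_nan(pressure_pa):
        return math.nan if pressure_pa is None else math.log10(pressure_pa)

    # x2 is computed rather than read, so x1 + x2 = 1 by construction.
    return [x1, 1.0 - x1, temperature_k,
            log10_or_nan(p1sat_pa), log10_or_nan(p2sat_pa)]


# Illustrative values only.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(FEATURE_COLUMNS)
writer.writerow(feature_row(0.25, 350.0, p1sat_pa=101325.0, p2sat_pa=None))
print(buffer.getvalue())
```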

Ensure that both files are CSV files and are aligned row-wise, so that each row of the targets file and the row at the same position in the features file describe the same data point for training or prediction.
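Row alignment cannot be fully verified from the files alone, since the two files share no key column, but a mismatch in row counts is a cheap first check. The sketch below uses hypothetical helper names and only the standard library.

```python
import csv
import io


def count_data_rows(csv_text):
    """Count non-empty rows after the header."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    return sum(1 for row in rows[1:] if any(cell.strip() for cell in row))


def check_alignment(targets_text, features_text):
    """Raise if the two files cannot be aligned row by row (hypothetical helper)."""
    n_targets = count_data_rows(targets_text)
    n_features = count_data_rows(features_text)
    if n_targets != n_features:
        raise ValueError(
            f"targets has {n_targets} rows but features has {n_features} rows")
    return n_targets


# Illustrative contents only.
targets = "SMILE 1,SMILE 2,y1,y2,log10P\nCCO,O,0.62,0.38,5.0057\n"
features = "x1,x2,T(K),log10P1sat,log10P2sat\n0.25,0.75,350.0,nan,nan\n"
print(check_alignment(targets, features))
```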

5. Running EquiNet

🧪 Training & Prediction on HPC (Bash Script Setup)

To run training and prediction on an HPC server with SLURM, a typical bash script looks like the following:

```bash
data_dir=/yourpath/to/data
results_dir=/yourpath/to/results
chemprop_path=/yourpath/to/chemprop

# Train the model and save the data splits
python $chemprop_path/train.py \
    --data_path $data_dir/targets.csv \
    --features_path $data_dir/features.csv \
    --dataset_type regression \
    --epochs 30 \
    --save_dir $results_dir \
    --split_type random_binary_pairs \
    --vle activity \
    --vp antoine \
    --binary_equivariant \
    --self_activity_correction \
    --config_path config.json \
    --aggregation norm \
    --save_smiles_splits

# Predict on the saved test split
python $chemprop_path/predict.py \
    --test_path $results_dir/fold_0/test_full.csv \
    --features_path $results_dir/fold_0/test_features.csv \
    --preds_path $results_dir/test_preds.csv \
    --checkpoint_dir $results_dir \
    --number_of_molecules 2 \
    --drop_extra_columns

# Run parameters.py on the same test split (output written to test_params.csv)
python $chemprop_path/parameters.py \
    --test_path $results_dir/fold_0/test_full.csv \
    --features_path $results_dir/fold_0/test_features.csv \
    --preds_path $results_dir/test_params.csv \
    --checkpoint_dir $results_dir \
    --number_of_molecules 2 \
    --drop_extra_columns
```

Switching Between Model Types

EquiNet supports multiple model types for VLE prediction via the --vle and --vp flags:

--vle sets the activity coefficient model. Options include:
- basic – no thermodynamic constraints
- activity – activity-based PINN model
- nrtl – Non-Random Two-Liquid model
- nrtl-wohl – NRTL with Wohl interaction form
- wohl – full Wohl expansion (3rd–5th order depending on config)

--wohl_order sets the order of the Wohl expansion (e.g., 3, 4, or 5) when the wohl or nrtl-wohl method is used.
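For orientation, the target quantities named in the dataset section (y1, y2, log10P, lngamma1, lngamma2, log10P1sat, log10P2sat) are related by modified Raoult's law, y_i P = x_i γ_i P_i^sat, which is the standard low-pressure VLE relation for activity coefficient models. The sketch below evaluates it for a binary mixture. That EquiNet applies exactly this expression internally is an assumption here, and `bubble_point` is a hypothetical helper, not an EquiNet function.

```python
import math


def bubble_point(x1, lngamma1, lngamma2, log10_p1sat, log10_p2sat):
    """Return (y1, y2, log10P) from modified Raoult's law; pressures in Pa."""
    x2 = 1.0 - x1
    # Partial pressure of each component: x_i * gamma_i * P_i^sat
    partial1 = x1 * math.exp(lngamma1) * 10.0 ** log10_p1sat
    partial2 = x2 * math.exp(lngamma2) * 10.0 ** log10_p2sat
    pressure = partial1 + partial2  # total pressure is the sum of partial pressures
    return partial1 / pressure, partial2 / pressure, math.log10(pressure)


# Ideal mixture (gamma = 1) with illustrative saturation pressures of 100 kPa and 50 kPa.
y1, y2, log10_p = bubble_point(0.5, 0.0, 0.0, 5.0, math.log10(5.0e4))
print(y1, y2, 10.0 ** log10_p)
```

In the ideal case shown, the total pressure is the mole-fraction-weighted mean of the two saturation pressures (75 kPa), and the vapor is enriched in the more volatile component (y1 = 2/3).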

--vp sets the vapor pressure prediction method:
- Leave empty (omit --vp) → the tabulated vapor pressure from the features file is used
- Set --vp antoine → the model internally predicts vapor pressure using the Antoine equation
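The Antoine equation has the form log10(P) = A - B / (C + T). The sketch below evaluates it and converts the result to log10 of the pressure in Pa, the unit used in the features file. The coefficients are a commonly tabulated set for water (pressure in mmHg, temperature in °C) and are used only for illustration. The parameterization and units EquiNet uses internally may differ, and `antoine_log10_p_pa` is a hypothetical helper.

```python
import math

MMHG_TO_PA = 133.322  # pascals per millimetre of mercury


def antoine_log10_p_pa(a, b, c, temperature_k):
    """Antoine equation with (mmHg, deg C) coefficients, returned as log10(P / Pa)."""
    t_celsius = temperature_k - 273.15
    log10_p_mmhg = a - b / (c + t_celsius)
    # Adding log10 of the conversion factor converts mmHg to Pa in log space.
    return log10_p_mmhg + math.log10(MMHG_TO_PA)


# Water at its normal boiling point; the result should be close to 101325 Pa.
log10_p_water = antoine_log10_p_pa(8.07131, 1730.63, 233.426, 373.15)
print(log10_p_water, 10.0 ** log10_p_water)
```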

Owner

  • Name: mcgillresearchgroup
  • Login: mcgillresearchgroup
  • Kind: organization

Citation (CITATIONS.bib)

# Original paper for the message passing algorithm used in Chemprop architecture.
# this was downloaded from ACS: https://pubs.acs.org/doi/10.1021/acs.jcim.9b00237
@article{chemprop_theory,
    author = {Yang, Kevin and Swanson, Kyle and Jin, Wengong and Coley, Connor and Eiden, Philipp and Gao, Hua and Guzman-Perez, Angel and Hopper, Timothy and Kelley, Brian and Mathea, Miriam and Palmer, Andrew and Settels, Volker and Jaakkola, Tommi and Jensen, Klavs and Barzilay, Regina},
    title = {Analyzing Learned Molecular Representations for Property Prediction},
    journal = {Journal of Chemical Information and Modeling},
    volume = {59},
    number = {8},
    pages = {3370-3388},
    year = {2019},
    doi = {10.1021/acs.jcim.9b00237},
    note = {PMID: 31361484},
    URL = {https://doi.org/10.1021/acs.jcim.9b00237},
    eprint = {https://doi.org/10.1021/acs.jcim.9b00237}
}

# Paper for Chemprop software.
# this was downloaded from ACS: https://pubs.acs.org/doi/10.1021/acs.jcim.3c01250
@article{chemprop_software,
    author = {Heid, Esther and Greenman, Kevin P. and Chung, Yunsie and Li, Shih-Cheng and Graff, David E. and Vermeire, Florence H. and Wu, Haoyang and Green, William H. and McGill, Charles J.},
    title = {Chemprop: A Machine Learning Package for Chemical Property Prediction},
    journal = {Journal of Chemical Information and Modeling},
    volume = {64},
    number = {1},
    pages = {9-17},
    year = {2024},
    doi = {10.1021/acs.jcim.3c01250},
    note = {PMID: 38147829},
    URL = {https://doi.org/10.1021/acs.jcim.3c01250},
    eprint = {https://doi.org/10.1021/acs.jcim.3c01250}
}

GitHub Events

Total
  • Watch event: 1
  • Push event: 2
  • Create event: 1
Last Year
  • Watch event: 1
  • Push event: 2
  • Create event: 1

Dependencies

environment.yml pypi
  • typed-argument-parser >=1.6.1
setup.py pypi
  • Werkzeug <3
  • flask >=1.1.2,<=2.1.3
  • hyperopt >=0.2.3
  • matplotlib >=3.1.3
  • numpy >=1.18.1
  • pandas >=1.0.3
  • pandas-flavor >=0.2.0
  • rdkit >=2020.03.1.0
  • scikit-learn >=0.22.2.post1
  • scipy >=1.9
  • scipy <1.11
  • sphinx >=3.1.2
  • tensorboardX >=2.0
  • torch >=1.4.0
  • tqdm >=4.45.0
  • typed-argument-parser >=1.6.1