https://github.com/aspuru-guzik-group/janus

Code for the paper "JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design"


Science Score: 10.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.9%) to scientific vocabulary
Last synced: 10 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: aspuru-guzik-group
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 17 MB
Statistics
  • Stars: 84
  • Watchers: 8
  • Forks: 15
  • Open Issues: 1
  • Releases: 0
Created almost 5 years ago · Last pushed almost 4 years ago
Metadata Files
  • Readme
  • License

README.md

JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design

This repository contains code for the paper: JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design.

Originally by: AkshatKumar Nigam, Robert Pollice, Alán Aspuru-Guzik

Updated by: Gary Tom

Prerequisites:

Use Python 3.7 or higher.

You will need to install RDKit version >= 2020.03.1 separately. The easiest way to do this is with conda.

JANUS uses SELFIES version 1.0.3. If you want to use a different version, pip install your desired version; this package will still be compatible. Note that you will have to change your input alphabets to work with other versions of SELFIES.
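Before running JANUS, it can help to confirm which RDKit and SELFIES versions are actually installed. The sketch below is a hypothetical helper, not part of JANUS. It assumes that each package exposes a `__version__` attribute and compares dotted version strings numerically.

```python
import importlib
from typing import Optional, Tuple


def version_tuple(version: str) -> Tuple[int, ...]:
    """Convert a dotted version such as '2020.03.1' into (2020, 3, 1).

    Only the leading digits of each component are kept, so '1b1' -> 1.
    """
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    return tuple(parts)


def installed_version(module_name: str) -> Optional[str]:
    """Return module.__version__, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is the same as or newer than `minimum`."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    rdkit_version = installed_version("rdkit")
    if rdkit_version is None:
        print("RDKit not found: install it (e.g. with conda) before using JANUS")
    elif not meets_minimum(rdkit_version, "2020.03.1"):
        print(f"RDKit {rdkit_version} is older than the required 2020.03.1")
    else:
        print(f"RDKit {rdkit_version} OK")

    # Any SELFIES version works, but the input alphabets must match it.
    print(f"SELFIES version: {installed_version('selfies')}")
```

Tuples are compared element by element, so zero-padded components such as "03" are handled correctly.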

Major changes:

  • Support for any version of SELFIES (please check your installation).
  • Improved multiprocessing. The fitness function is not parallelized, in case the function already spawns multiple processes.
  • GPU acceleration of neural networks.
  • Early stopping for the classifier.
  • Added a SMILES filtering option.
  • Additional hyperparameters for controlling JANUS. The defaults used in the paper are given in the tests directory.

How to run:

Install JANUS using

```bash
pip install janus-ga
```

An example script showing how to use JANUS is in tests/example.py:

```python
from janus import JANUS, utils
from rdkit import Chem, RDLogger
from rdkit.Chem import AllChem, RDConfig, Descriptors
RDLogger.DisableLog("rdApp.*")

import selfies


def fitness_function(smi: str) -> float:
    """User-defined function that takes in an individual SMILES string
    and outputs a fitness value.
    """
    # logP fitness
    return Descriptors.MolLogP(Chem.MolFromSmiles(smi))


def custom_filter(smi: str) -> bool:
    """Function that takes in a SMILES string and returns a boolean.
    True indicates the SMILES PASSES the filter.
    """
    # SMILES length filter
    if len(smi) > 81 or len(smi) == 0:
        return False
    else:
        return True


# all parameters to be set, below are defaults
params_dict = {
    # Number of iterations that JANUS runs for
    "generations": 200,

    # The number of molecules for which fitness calculations are done,
    # exploration and exploitation each have their own population
    "generation_size": 5000,

    # Number of molecules that are exchanged between the exploration and exploitation
    "num_exchanges": 5,

    # Callable filtering function (None defaults to no filtering)
    "custom_filter": custom_filter,

    # Fragments from starting population used to extend alphabet for mutations
    "use_fragments": True,

    # An option to use a classifier as selection bias
    "use_classifier": True,
}

# Set your SELFIES constraints (below used for manuscript)
default_constraints = selfies.get_semantic_constraints()
new_constraints = default_constraints
new_constraints['S'] = 2
new_constraints['P'] = 3
selfies.set_semantic_constraints(new_constraints)  # update constraints

# Create JANUS object.
agent = JANUS(
    work_dir='RESULTS',                                  # where the results are saved
    fitness_function=fitness_function,                   # user-defined fitness for given smiles
    start_population="./DATA/sample_start_smiles.txt",   # file with starting smiles population
    **params_dict
)

# Alternatively, you can get hyperparameters from a yaml file
# Descriptions for all parameters are found in default_params.yml
params_dict = utils.from_yaml(
    work_dir='RESULTS',
    fitness_function=fitness_function,
    start_population="./DATA/sample_start_smiles.txt",
    yaml_file='default_params.yml',    # default yaml file with parameters
    **params_dict                      # overwrite yaml parameters with dictionary
)
agent = JANUS(**params_dict)

# Run according to parameters
agent.run()  # RUN IT!
```
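In the example's fitness_function, `Chem.MolFromSmiles` returns `None` for a SMILES string that RDKit cannot parse, and `Descriptors.MolLogP` then raises an error. A minimal sketch of two hypothetical wrappers (`WithFallback` and `Minimize`, neither part of JANUS) follows. The first turns such failures into a very low score. Because JANUS maximizes fitness, the second negates a property you would rather minimize.

```python
import math
from typing import Callable

FitnessFn = Callable[[str], float]


class WithFallback:
    """Wrap a fitness function so that failures return a very low score.

    JANUS maximizes fitness, so a large negative fallback makes molecules
    whose fitness cannot be computed unattractive to selection.
    """

    def __init__(self, fitness: FitnessFn, fallback: float = -1.0e6) -> None:
        self.fitness = fitness
        self.fallback = fallback

    def __call__(self, smi: str) -> float:
        try:
            value = float(self.fitness(smi))
        except Exception:  # e.g. MolLogP raising on an unparsable SMILES
            return self.fallback
        # NaN or infinite values are treated as failures as well.
        return value if math.isfinite(value) else self.fallback


class Minimize:
    """Negate a property so that minimizing it becomes maximizing fitness."""

    def __init__(self, prop: FitnessFn) -> None:
        self.prop = prop

    def __call__(self, smi: str) -> float:
        return -self.prop(smi)
```

With the example script, you would pass `fitness_function=WithFallback(fitness_function)`. For a property to minimize, you would pass `WithFallback(Minimize(my_property))`, where `my_property` stands for a hypothetical function of your own. The wrappers are small classes rather than closures because instances of module-level classes can be pickled. That matters if a callable is ever sent to worker processes, which is an assumption here rather than something the README states.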

tests/example.py contains examples of:
1. A function for calculating property values (see function fitness_function).
2. Custom filtering of SMILES (see function custom_filter).
3. Initializing JANUS from a dictionary of parameters.
4. Generating hyperparameters from the provided yaml file (see function janus.utils.from_yaml).
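Because custom_filter is any callable that returns True for accepted SMILES, several simple checks can be combined into one. The sketch below uses hypothetical helper classes that are not part of JANUS. `MaxLength` reproduces the length rule from the example. `NoSubstrings` is a crude text-level check, and the banned substrings shown are arbitrary illustrations.

```python
from typing import Callable, Iterable

SmilesFilter = Callable[[str], bool]


class MaxLength:
    """Accept non-empty SMILES no longer than `limit` characters
    (the same rule as custom_filter in tests/example.py when limit=81)."""

    def __init__(self, limit: int = 81) -> None:
        self.limit = limit

    def __call__(self, smi: str) -> bool:
        return 0 < len(smi) <= self.limit


class NoSubstrings:
    """Reject SMILES containing any of the given substrings (plain text match)."""

    def __init__(self, banned: Iterable[str]) -> None:
        self.banned = tuple(banned)

    def __call__(self, smi: str) -> bool:
        return not any(b in smi for b in self.banned)


class AllOf:
    """Combine filters: a SMILES passes only if every filter returns True."""

    def __init__(self, *filters: SmilesFilter) -> None:
        self.filters = filters

    def __call__(self, smi: str) -> bool:
        return all(f(smi) for f in self.filters)


# Illustrative combination; the banned fragments are arbitrary examples.
custom_filter = AllOf(MaxLength(81), NoSubstrings(["[Si]", "[Se]"]))
```

The combined object can be passed as `"custom_filter": custom_filter` in params_dict, just like the function in the example.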

You can run the file with the provided test files:

```bash
cd tests
python ./example.py
```

Important parameters the user should provide:
  • work_dir: directory for outputting results
  • fitness_function: fitness function defined for an input SMILES; this value is maximized
  • start_population: path to a text file of starting SMILES, one per line
  • generations: number of evolution iterations to perform
  • generation_size: number of molecules in the populations per generation
  • custom_filter: filter function checked after mutation and crossover; returns True for accepted molecules
  • use_fragments: toggle adding fragments from the starting population to the mutation alphabet
  • use_classifier: toggle using a classifier for selection bias
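The start_population file is plain text with one SMILES per line. If you build that file from Python, two hypothetical helpers (not part of JANUS) can write and read it. They strip surrounding whitespace and skip blank lines.

```python
from pathlib import Path
from typing import Iterable, List


def write_start_population(path: str, smiles: Iterable[str]) -> int:
    """Write one SMILES per line and return the number of lines written.

    Blank entries are dropped and surrounding whitespace is stripped.
    """
    cleaned = [s.strip() for s in smiles if s.strip()]
    Path(path).write_text("\n".join(cleaned) + "\n", encoding="utf-8")
    return len(cleaned)


def read_start_population(path: str) -> List[str]:
    """Read a one-SMILES-per-line file, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

For example, `write_start_population("./DATA/my_start.txt", ["CCO", "c1ccccc1"])` would produce a file whose path can be passed as start_population. The path shown is only an illustration.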

See tests/default_params.yml for detailed description of adjustable parameters.
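The example script notes that keyword arguments passed to `utils.from_yaml` overwrite the values read from the YAML file. Conceptually, this is a dictionary merge in which the later source wins. The sketch below illustrates only that rule with plain dicts. It does not parse YAML and is not the `from_yaml` implementation, and `merge_params` is a hypothetical name.

```python
from typing import Any, Dict


def merge_params(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which keys from `overrides` replace those in `defaults`.

    This is a shallow merge: nested dicts are replaced, not merged.
    Neither input dict is modified.
    """
    merged = dict(defaults)
    merged.update(overrides)
    return merged


# Defaults taken from the README example; overrides for a quick trial run.
defaults = {"generations": 200, "generation_size": 5000, "num_exchanges": 5}
quick_run = merge_params(defaults, {"generations": 10, "generation_size": 100})
```

Here `quick_run` keeps `num_exchanges` at 5 but uses the smaller `generations` and `generation_size`.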

Outputs:

All results from running JANUS will be stored in the specified work_dir.

The following files will be created:
1. fitness_explore.txt: Fitness values for all molecules from the exploration component of JANUS.
2. fitness_local_search.txt: Fitness values for all molecules from the exploitation component of JANUS.
3. generation_all_best.txt: SMILES and fitness value for the best molecule encountered in every generation (iteration).
4. init_mols.txt: List of molecules used to initiate JANUS.
5. population_explore.txt: SMILES for all molecules from the exploration component of JANUS.
6. population_local_search.txt: SMILES for all molecules from the exploitation component of JANUS.
7. hparams.json: Hyperparameters used for initializing JANUS.
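The README states only that generation_all_best.txt holds the SMILES and fitness value of the best molecule in each generation. It does not specify the exact line layout. The sketch below assumes that the SMILES and the fitness are the last two comma- or whitespace-separated tokens on each line and ignores any leading generation label. Check your own file before relying on this assumption. All function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

Record = Tuple[str, float]


def parse_best_line(line: str) -> Optional[Record]:
    """Extract (smiles, fitness) from one line.

    Assumes the last two comma- or whitespace-separated tokens are the
    SMILES and its fitness; returns None if the line does not fit.
    """
    tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
    if len(tokens) < 2:
        return None
    try:
        fitness = float(tokens[-1])
    except ValueError:
        return None
    return tokens[-2], fitness


def read_best_per_generation(path: str) -> List[Record]:
    """Parse every recognizable line of a best-per-generation file."""
    with open(path, encoding="utf-8") as handle:
        records = [parse_best_line(line) for line in handle]
    return [r for r in records if r is not None]


def overall_best(records: List[Record]) -> Optional[Record]:
    """Record with the highest fitness (JANUS maximizes fitness)."""
    return max(records, key=lambda r: r[1]) if records else None
```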

Paper Results/Reproducibility:

Our code and results for each experiment in the paper can be found here:
  • Experiment 4.1: https://drive.google.com/file/d/1rscIyzpTvtyiEkoP1WsF-XtSHJGQStUU/view?usp=sharing
  • Experiment 4.3: https://drive.google.com/file/d/1tlIdfSWwzVeJ5kZ98l8G6osE9zf9wP1f/view?usp=sharing
  • GuacaMol: https://drive.google.com/file/d/1FqetwNg6VVc-C3eiPoosGZ4-47WpYBAt/view?usp=sharing

Questions, problems?

Make a GitHub issue 😄. Please be as clear and descriptive as possible. Feel free to also reach out directly by email: (akshat[DOT]nigam[AT]mail[DOT]utoronto[DOT]ca, rob[DOT]pollice[AT]utoronto[DOT]ca)

License

Apache License 2.0

Owner

  • Name: Aspuru-Guzik group repo
  • Login: aspuru-guzik-group
  • Kind: organization

GitHub Events

Total
  • Issues event: 1
  • Watch event: 7
  • Pull request event: 1
  • Fork event: 3
Last Year
  • Issues event: 1
  • Watch event: 7
  • Pull request event: 1
  • Fork event: 3