https://github.com/aspuru-guzik-group/janus
Code for the paper "JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design"
Repository
Basic Info
Statistics
- Stars: 84
- Watchers: 8
- Forks: 15
- Open Issues: 1
- Releases: 0
Metadata Files
README.md
JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design
This repository contains code for the paper: JANUS: Parallel Tempered Genetic Algorithm Guided by Deep Neural Networks for Inverse Molecular Design.
Originally by: AkshatKumar Nigam, Robert Pollice, Alán Aspuru-Guzik
Updated by: Gary Tom

Prerequisites:
Use Python 3.7 or later.
You will need to install RDKit (version 2020.03.1 or later) separately. The easiest way to do this is with conda.
JANUS uses SELFIES version 1.0.3. If you want to use a different version, pip install your desired version; this package will still be compatible. Note that you will have to change your input alphabets to work with other versions of SELFIES.
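Because JANUS depends on particular RDKit and SELFIES versions, it can help to confirm what is installed before running. The following is a minimal sketch using only the standard library (it needs Python 3.8 or newer for `importlib.metadata`). The helper name `installed_version` is hypothetical, and the distribution names `rdkit` and `selfies` are assumptions that may not match every environment; a conda-installed RDKit, for instance, may not always register pip metadata.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names are assumptions; adjust them to match your environment.
    for name in ("rdkit", "selfies"):
        found = installed_version(name)
        print(f"{name}: {found if found is not None else 'not installed'}")
```

If the reported SELFIES version differs from 1.0.3, remember that the input alphabets may need to be adapted as noted above.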
Major changes:
- Support the use of any version of SELFIES (please check your installation).
- Improved multiprocessing. The fitness function itself is not parallelized, in case it already spawns multiple processes.
- GPU acceleration of neural networks.
- Early stopping for classifier.
- Included SMILES filtering option.
- Additional hyperparameters for controlling JANUS. Defaults used in the paper are given in the tests directory.
How to run:
Install JANUS using
```bash
pip install janus-ga
```
An example script showing how to use JANUS is found in tests/example.py:
```python
from janus import JANUS, utils
from rdkit import Chem, RDLogger
from rdkit.Chem import AllChem, RDConfig, Descriptors
RDLogger.DisableLog("rdApp.*")

import selfies

def fitness_function(smi: str) -> float:
    """ User-defined function that takes in individual smiles
    and outputs a fitness value.
    """
    # logP fitness
    return Descriptors.MolLogP(Chem.MolFromSmiles(smi))

def custom_filter(smi: str):
    """ Function that takes in a smile and returns a boolean.
    True indicates the smiles PASSES the filter.
    """
    # smiles length filter
    if len(smi) > 81 or len(smi) == 0:
        return False
    else:
        return True

# all parameters to be set, below are defaults
params_dict = {
    # Number of iterations that JANUS runs for
    "generations": 200,

    # The number of molecules for which fitness calculations are done,
    # exploration and exploitation each have their own population
    "generation_size": 5000,

    # Number of molecules that are exchanged between the exploration and exploitation
    "num_exchanges": 5,

    # Callable filtering function (None defaults to no filtering)
    "custom_filter": custom_filter,

    # Fragments from starting population used to extend alphabet for mutations
    "use_fragments": True,

    # An option to use a classifier as selection bias
    "use_classifier": True,
}

# Set your SELFIES constraints (below used for manuscript)
default_constraints = selfies.get_semantic_constraints()
new_constraints = default_constraints
new_constraints['S'] = 2
new_constraints['P'] = 3
selfies.set_semantic_constraints(new_constraints)  # update constraints

# Create JANUS object.
agent = JANUS(
    work_dir = 'RESULTS',                                  # where the results are saved
    fitness_function = fitness_function,                   # user-defined fitness for given smiles
    start_population = "./DATA/sample_start_smiles.txt",   # file with starting smiles population
    **params_dict
)

# Alternatively, you can get hyperparameters from a yaml file
# Descriptions for all parameters are found in default_params.yml
params_dict = utils.from_yaml(
    work_dir = 'RESULTS',
    fitness_function = fitness_function,
    start_population = "./DATA/sample_start_smiles.txt",
    yaml_file = 'default_params.yml',       # default yaml file with parameters
    **params_dict                           # overwrite yaml parameters with dictionary
)
agent = JANUS(**params_dict)

# Run according to parameters
agent.run()     # RUN IT!
```
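The example notes that the parameter dictionary overwrites values read from the yaml file. As a rough illustration of that precedence only (this is not the `janus.utils.from_yaml` implementation, which is not shown here), the sketch below merges two plain dictionaries. The helper `merge_params` and the sample values are hypothetical.

```python
from typing import Any, Dict


def merge_params(file_params: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict in which `overrides` take precedence over `file_params`."""
    merged = dict(file_params)  # copy so the caller's dict is not modified
    merged.update(overrides)    # override values win on key collisions
    return merged


# Hypothetical values standing in for what a yaml file might contain.
file_params = {"generations": 200, "generation_size": 5000, "num_exchanges": 5}
overrides = {"generations": 10}

params = merge_params(file_params, overrides)
print(params)
```

Under this reading, only the keys present in the dictionary change; everything else keeps the value from the file.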
Within this file are examples for:
1. A function for calculating property values (see function fitness_function).
2. Custom filtering of SMILES (see function custom_filter).
3. Initializing JANUS from a dictionary of parameters.
4. Generating hyperparameters from a provided yaml file (see function janus.utils.from_yaml).
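A custom filter only needs to accept a SMILES string and return a boolean, so several simple checks can be combined into one callable. The sketch below is a plain-Python illustration under stated assumptions: the helper `make_filter`, the length limit, and the banned substrings are hypothetical and are not taken from the paper.

```python
from typing import Callable, Iterable


def make_filter(max_len: int, banned: Iterable[str]) -> Callable[[str], bool]:
    """Build a filter that returns True when a SMILES string PASSES all checks."""
    banned_tokens = tuple(banned)

    def smiles_filter(smi: str) -> bool:
        # Reject empty or overly long strings.
        if len(smi) == 0 or len(smi) > max_len:
            return False
        # Reject strings containing any banned substring (purely textual check).
        return not any(token in smi for token in banned_tokens)

    return smiles_filter


# Hypothetical settings: cap the length at 81 characters and reject bracket
# atoms that end with an explicit "+" or "-" charge.
custom_filter = make_filter(max_len=81, banned=("+]", "-]"))
```

The resulting callable can be passed as `custom_filter` in the parameter dictionary, in the same way as the length-only filter in the example script.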
You can run the file with the provided test files:
```bash
cd tests
python ./example.py
```
Important parameters the user should provide:
- work_dir: directory for outputting results
- fitness_function: fitness function defined for an input smiles that will be maximized
- start_population: path to a text file of starting SMILES, one per line
- generations: number of evolution iterations to perform
- generation_size: number of molecules in the populations per generation
- custom_filter: filter function checked after mutation and crossover, returns True for accepted molecules
- use_fragments: toggle adding fragments from starting population to mutation alphabet
- use_classifier: toggle using classifier for selection bias
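Since `start_population` expects a plain text file with one SMILES per line, such a file can be produced with a few lines of standard-library Python. This is only a sketch: the helper name `write_start_population`, the file name, and the example molecules are hypothetical, and the demo writes to a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path
from typing import Iterable


def write_start_population(smiles: Iterable[str], path: str) -> int:
    """Write non-empty, de-duplicated SMILES to `path`, one per line; return the count."""
    seen = set()
    unique = []
    for smi in smiles:
        smi = smi.strip()
        if smi and smi not in seen:  # skip blanks and exact duplicates
            seen.add(smi)
            unique.append(smi)
    Path(path).write_text("".join(s + "\n" for s in unique), encoding="utf-8")
    return len(unique)


# Hypothetical starting molecules: ethanol, benzene, acetic acid.
with tempfile.TemporaryDirectory() as tmp:
    out = Path(tmp) / "my_start_smiles.txt"
    count = write_start_population(["CCO", "c1ccccc1", "CC(=O)O", "CCO", ""], str(out))
    lines = out.read_text(encoding="utf-8").splitlines()

print(count, lines)
```

Note that the duplicate check here is textual; two different SMILES strings for the same molecule would both be kept.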
See tests/default_params.yml for detailed description of adjustable parameters.
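Because JANUS maximizes `fitness_function`, a property that should be minimized can be wrapped so that lower raw values yield higher fitness. The sketch below uses a toy cost function so that it runs without RDKit; `as_maximized` and `toy_cost` are hypothetical names, and negation is just one simple choice of transformation.

```python
from typing import Callable


def as_maximized(cost: Callable[[str], float]) -> Callable[[str], float]:
    """Wrap a cost function so that a lower cost gives a higher fitness."""
    def fitness(smi: str) -> float:
        return -cost(smi)
    return fitness


def toy_cost(smi: str) -> float:
    """Stand-in cost: string length (a real cost would compute a molecular property)."""
    return float(len(smi))


fitness_function = as_maximized(toy_cost)
print(fitness_function("CCO"))  # -3.0
```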
Outputs:
All results from running JANUS will be stored in specified work_dir.
The following files will be created:
1. fitness_explore.txt:
Fitness values for all molecules from the exploration component of JANUS.
2. fitness_local_search.txt:
Fitness values for all molecules from the exploitation component of JANUS.
3. generation_all_best.txt:
SMILES and fitness value for the best molecule encountered in every generation (iteration).
4. init_mols.txt:
List of molecules used to initialize JANUS.
5. population_explore.txt:
SMILES for all molecules from the exploration component of JANUS.
6. population_local_search.txt:
SMILES for all molecules from the exploitation component of JANUS.
7. hparams.json:
Hyperparameters used for initializing JANUS.
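Since `hparams.json` is a JSON file, the settings of a finished run can be inspected with the standard library. A minimal sketch follows; the helper `load_hparams` is hypothetical, and the keys in the demo file are made up for illustration rather than being the exact keys JANUS writes.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Dict


def load_hparams(work_dir: str) -> Dict[str, Any]:
    """Load hparams.json from a JANUS results directory."""
    with open(Path(work_dir) / "hparams.json", encoding="utf-8") as handle:
        return json.load(handle)


# Demo with a stand-in file; after a real run, point `load_hparams` at your work_dir.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "hparams.json").write_text(
        json.dumps({"generations": 200, "generation_size": 5000}), encoding="utf-8"
    )
    hparams = load_hparams(tmp)

print(hparams["generations"])  # 200
```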
Paper Results/Reproducibility:
Our code and results for each experiment in the paper can be found here:
* Experiment 4.1: https://drive.google.com/file/d/1rscIyzpTvtyiEkoP1WsF-XtSHJGQStUU/view?usp=sharing
* Experiment 4.3: https://drive.google.com/file/d/1tlIdfSWwzVeJ5kZ98l8G6osE9zf9wP1f/view?usp=sharing
* GuacaMol: https://drive.google.com/file/d/1FqetwNg6VVc-C3eiPoosGZ4-47WpYBAt/view?usp=sharing
Questions, problems?
Open a GitHub issue 😄. Please be as clear and descriptive as possible. You are also welcome to reach out directly: (akshat[DOT]nigam[AT]mail[DOT]utoronto[DOT]ca, rob[DOT]pollice[AT]utoronto[DOT]ca)
License
Owner
- Name: Aspuru-Guzik group repo
- Login: aspuru-guzik-group
- Kind: organization
- Website: http://aspuru.chem.harvard.edu/
- Repositories: 30
- Profile: https://github.com/aspuru-guzik-group
GitHub Events
Total
- Issues event: 1
- Watch event: 7
- Pull request event: 1
- Fork event: 3
Last Year
- Issues event: 1
- Watch event: 7
- Pull request event: 1
- Fork event: 3