https://github.com/aspuru-guzik-group/stereogeneration
Testing generation of molecules with stereoisomers.
Science Score: 39.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 6 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (13.2%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: aspuru-guzik-group
- License: mit
- Language: Python
- Default Branch: main
- Size: 150 MB
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
Stereogeneration
Studying the effects of including stereoisomeric information in generative models for molecules when optimizing stereochemistry-sensitive properties. We perform optimization on (1) rediscovery of (R)-albuterol and mestranol, (2) protein-ligand docking, and (3) a stereochemistry-specific circular dichroism (CD) spectral peak score.
The preprint, Stereochemistry-aware string-based molecular generation, is available on ChemRxiv. Data files are available on Zenodo.
Getting started
Initialize a Python environment (here we use conda) and install the required packages:
```bash
git clone git@github.com:aspuru-guzik-group/stereogeneration.git
cd stereogeneration

conda create -n stereogeneration python=3.8
conda activate stereogeneration
pip install -r requirements.txt
```
Use of XTB
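XTB is installed through requirements.txt. Alternatively, you can build xtb from source (available from the Grimme Lab) or install it with conda. Set the following environment variables:
```bash
# Run xtb single-threaded: 1 thread at each of the two OpenMP nesting levels
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1,1
# Enlarge the OpenMP thread stack and the shell stack limit to avoid stack overflows in xtb
export OMP_STACKSIZE=4G
ulimit -s unlimited
```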
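These variables only last for the current shell session. One way to persist them, assuming the conda environment from Getting started, is conda's `activate.d` hook directory: every `*.sh` script in `$CONDA_PREFIX/etc/conda/activate.d/` is sourced on `conda activate`. The function name `write_xtb_activate_hook` and the file name `xtb_env.sh` are hypothetical choices for this sketch, not part of the repository.

```bash
# Hypothetical helper: persist the XTB settings in a conda activate hook.
# Scripts in <prefix>/etc/conda/activate.d/*.sh are sourced by `conda activate`.
write_xtb_activate_hook() {
    local prefix="$1"
    local hook_dir="$prefix/etc/conda/activate.d"
    mkdir -p "$hook_dir"
    # Quoted heredoc delimiter: the lines are written verbatim, not expanded now
    cat > "$hook_dir/xtb_env.sh" <<'EOF'
export MKL_NUM_THREADS=1
export OMP_NUM_THREADS=1,1
export OMP_STACKSIZE=4G
ulimit -s unlimited
EOF
}

# Usage (inside the activated environment):
# write_xtb_activate_hook "$CONDA_PREFIX"
```

After the next `conda activate stereogeneration`, the variables are set automatically.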
CD spectra setup
The CD spectra task requires stda and xtb4stda from the Grimme Lab. The binaries are in the stereogeneration/stda directory; make them executable and add the directory to $PATH:
```bash
cd stereogeneration/stda
chmod +x gspec stdav1.6.3 xtb4stda

# set file paths which will be used by stda
export PATH=$PATH:$PWD
export XTB4STDAHOME=$PWD
```
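A quick preflight check can confirm the CD setup before a long optimization run. This is a minimal sketch: `check_cd_setup` is a hypothetical helper, and it only checks that the three tool names from the `chmod` line resolve on `$PATH` and that `XTB4STDAHOME` is set; it does not run stda itself.

```bash
# Hypothetical preflight check for the CD spectra task.
# Returns non-zero (and reports what is missing on stderr) if any stda tool
# is not found on $PATH or XTB4STDAHOME is unset.
check_cd_setup() {
    local status=0 tool
    for tool in gspec stdav1.6.3 xtb4stda; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing on PATH: $tool" >&2
            status=1
        fi
    done
    if [ -z "${XTB4STDAHOME:-}" ]; then
        echo "XTB4STDAHOME is not set" >&2
        status=1
    fi
    return "$status"
}

# Usage, in the same shell after the exports:
# check_cd_setup && echo "CD spectra tools ready"
```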
Docking setup
Docking requires the smina binary to be executable:
```bash
chmod +x stereogeneration/docking/smina.static
```
Running the models
Each model's run script (main.py) is in its respective folder: reinvent, janus, and group-janus. Command-line arguments select the fitness-function task and set some of the model parameters:
```bash
# --target: the fitness-function task, one of
#           1SYH, 1OYT, 6Y2F, cd, fp-albuterol, fp-mestranol
# --stereo: turn on stereo-awareness (omit it to run without)
python main.py \
    --target=1SYH \
    --stereo
```
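To compare stereo-aware and non-stereo runs across every task, the invocation can be wrapped in a loop. This is a minimal sketch, not a script from the repository: the function `run_all_tasks`, the `TARGETS` array, and the `DRY_RUN` switch are hypothetical, and it assumes main.py is launched from inside a model folder. It runs each task once per mode; repeating runs and naming their output folders is left to main.py and is not handled here.

```bash
# Hypothetical sweep: one main.py invocation per task, with and without --stereo.
# Set DRY_RUN=1 to only print the commands instead of running them.
TARGETS=(1SYH 1OYT 6Y2F cd fp-albuterol fp-mestranol)

run_all_tasks() {
    local target flag
    local -a cmd
    for target in "${TARGETS[@]}"; do
        for flag in "" "--stereo"; do
            # ${flag:+"$flag"} expands to nothing when flag is empty
            cmd=(python main.py "--target=$target" ${flag:+"$flag"})
            if [ "${DRY_RUN:-0}" = 1 ]; then
                echo "${cmd[*]}"
            else
                "${cmd[@]}"
            fi
        done
    done
}

# Usage, from inside a model folder (e.g. janus):
# DRY_RUN=1 run_all_tasks    # preview the 12 commands
# run_all_tasks              # run them one after another
```

Building the command as an array keeps each argument intact, so the same list can be printed for a dry run or executed directly.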
Analysis of results
Each experiment was repeated 10 times for every model and task. The result files are available on Zenodo. The individual runs for each task are saved in the folders {i}_stereo and {i}_nonstereo for $i \in \{0, \dots, 9\}$. The figures and statistics were generated with analysis_all.py, which also requires the zinc.csv file (available on Zenodo) to be placed in the repository directory:
```bash
# --target:     one of 1SYH, 1OYT, 6Y2F, cd, fp-albuterol, fp-mestranol
# --root_dir:   directory containing the dataset and the `stereogeneration` package
# --label:      name for the target property label (defaults to 1SYH)
# --horizontal: use horizontal subplots; omit for vertical subplots
python analysis_all.py \
    --target=1SYH \
    --root_dir='.' \
    --label='1SYH' \
    --horizontal
```
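Before running the analysis, it may help to confirm that the downloaded results contain every run folder. This sketch assumes each task's results sit in one directory holding the {i}_stereo and {i}_nonstereo folders for i = 0 to 9; `check_run_folders` is a hypothetical helper, not part of the repository.

```bash
# Hypothetical helper: list which expected run folders
# ({i}_stereo and {i}_nonstereo for i = 0..9) are missing under a results directory.
# Returns 0 only when all 20 folders are present.
check_run_folders() {
    local results_dir="${1:-.}" missing=0 i mode
    for i in {0..9}; do
        for mode in stereo nonstereo; do
            if [ ! -d "$results_dir/${i}_${mode}" ]; then
                echo "missing: $results_dir/${i}_${mode}"
                missing=$((missing + 1))
            fi
        done
    done
    echo "$missing of 20 run folders missing"
    [ "$missing" -eq 0 ]
}

# Usage, pointing at one task's unpacked results (path is an example):
# check_run_folders path/to/task_results
```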
Owner
- Name: Aspuru-Guzik group repo
- Login: aspuru-guzik-group
- Kind: organization
- Website: http://aspuru.chem.harvard.edu/
- Repositories: 30
- Profile: https://github.com/aspuru-guzik-group
GitHub Events
Total
- Watch event: 1
- Push event: 4
- Public event: 1
Last Year
- Watch event: 1
- Push event: 4
- Public event: 1
Dependencies
- Cython ==0.29.27
- Jinja2 ==3.0.3
- Markdown ==3.4.1
- MarkupSafe ==2.1.1
- Pebble ==5.0.6
- Pillow ==9.0.1
- Pint ==0.0.0
- PyNaCl ==1.5.0
- PyYAML ==6.0
- Pygments ==2.11.2
- Send2Trash ==1.8.0
- Werkzeug ==2.2.1
- absl_py ==1.2.0
- aiohttp ==3.8.1
- aiosignal ==1.2.0
- arff ==0.9
- argon2_cffi ==21.3.0
- argon2_cffi_bindings ==21.2.0
- ase ==3.22.1
- async_generator ==1.10
- async_timeout ==4.0.2
- attrs ==21.4.0
- backcall ==0.2.0
- backports.shutil_get_terminal_size ==1.0.0
- backports_abc ==0.5
- bcrypt ==3.2.0
- bitstring ==3.1.9
- bleach ==4.1.0
- cachetools ==5.2.0
- certifi ==2021.10.8
- cffi ==1.15.0
- chardet ==4.0.0
- charset_normalizer ==2.0.11
- cryptography ==36.0.1
- cycler ==0.11.0
- deap ==1.0.1
- debugpy ==1.5.1
- decorator ==5.1.1
- defusedxml ==0.7.1
- dill ==0.3.6
- dnspython ==2.2.0
- ecdsa ==0.17.0
- entrypoints ==0.4
- fire ==0.4.0
- fonttools ==4.29.1
- frozenlist ==1.3.0
- fsspec ==2022.7.1
- funcsigs ==1.0.2
- global-chem ==1.8
- google_auth ==2.9.1
- google_auth_oauthlib ==0.4.6
- grpcio ==1.47.0
- idna ==3.3
- importlib_metadata ==4.10.1
- importlib_resources ==5.4.0
- ipykernel ==6.0.3
- ipython ==7.31.1
- ipython_genutils ==0.2.0
- ipywidgets ==7.6.5
- jedi ==0.18.1
- joblib ==1.1.0
- jsonschema ==4.4.0
- jupyter-client ==6.1.12
- jupyter_core ==4.9.1
- jupyterlab_pygments ==0.1.2
- jupyterlab_widgets ==1.0.2
- kiwisolver ==1.3.2
- lockfile ==0.12.2
- mapchiral ==0.0.5
- matplotlib ==3.5.1
- matplotlib_inline ==0.1.3
- mistune ==0.8.4
- mock ==4.0.3
- morfeus-ml ==0.7.1
- mpmath ==1.2.1
- multidict ==6.0.2
- nbclient ==0.5.10
- nbconvert ==6.4.2
- nbformat ==5.1.3
- nest_asyncio ==1.5.4
- netaddr ==0.8.0
- netifaces ==0.11.0
- networkx ==3.0
- nose ==1.3.7
- notebook ==6.4.8
- numpy ==1.22.2
- oauthlib ==3.2.0
- openbabel ==3.1.1.1
- packaging ==21.3
- pandas ==1.4.0
- pandocfilters ==1.5.0
- paramiko ==2.9.2
- parso ==0.8.3
- path ==16.3.0
- path.py ==12.5.0
- pathlib2 ==2.3.6
- paycheck ==1.0.2
- pbr ==5.8.1
- pexpect ==4.8.0
- pickleshare ==0.7.5
- prometheus_client ==0.13.1
- prompt_toolkit ==3.0.26
- protobuf ==3.19.4
- ptyprocess ==0.7.0
- pyDeprecate ==0.3.2
- pyasn1 ==0.4.8
- pyasn1-modules ==0.2.8
- pycparser ==2.21
- pydantic ==1.9.1
- pyparsing ==3.0.7
- pyrsistent ==0.18.1
- python-dateutil ==2.8.2
- pytorch-lightning ==1.7.0
- pytz ==2021.3
- pyzmq ==22.3.0
- qcelemental ==0.24.0
- rdkit ==2022.3.5
- requests ==2.27.1
- requests_oauthlib ==1.3.1
- rsa ==4.9
- scikit_learn ==1.1.1
- scipy ==1.8.0
- seaborn ==0.11.2
- selfies ==2.1.1
- simplegeneric ==0.8.1
- singledispatch ==3.7.0
- six ==1.16.0
- sympy ==1.9
- tensorboard ==2.9.1
- tensorboard-data-server ==0.6.1
- tensorboard_plugin_wit ==1.8.1
- termcolor ==1.1.0
- terminado ==0.13.1
- testpath ==0.5.0
- threadpoolctl ==3.1.0
- torch ==1.10.0
- torchmetrics ==0.9.3
- tornado ==6.1
- tqdm ==4.64.0
- traitlets ==5.0.5
- typing_extensions ==4.0.1
- urllib3 ==1.26.8
- wcwidth ==0.2.5
- webencodings ==0.5.1
- widgetsnbextension ==3.5.2
- xtb-python ==20.1
- yarl ==1.7.2
- zipp ==3.7.0