bio-diffusion

A geometry-complete diffusion generative model (GCDM) for 3D molecule generation and optimization. (Nature CommsChem)

https://github.com/bioinfomachinelearning/bio-diffusion

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 6 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, nature.com, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.0%) to scientific vocabulary

Keywords

computational-biology computational-chemistry deep-learning generative-model graph-neural-networks machine-learning
Last synced: 6 months ago

Repository

A geometry-complete diffusion generative model (GCDM) for 3D molecule generation and optimization. (Nature CommsChem)

Basic Info
  • Host: GitHub
  • Owner: BioinfoMachineLearning
  • License: other
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 42.8 MB
Statistics
  • Stars: 202
  • Watchers: 3
  • Forks: 27
  • Open Issues: 1
  • Releases: 1
Topics
computational-biology computational-chemistry deep-learning generative-model graph-neural-networks machine-learning
Created about 3 years ago · Last pushed 9 months ago
Metadata Files
Readme License Citation

README.md

# Bio-Diffusion

PyTorch Lightning · Config: Hydra

[![Paper](http://img.shields.io/badge/arXiv-2302.04313-B31B1B.svg)](https://arxiv.org/abs/2302.04313) [![Datasets DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.7881981.svg)](https://doi.org/10.5281/zenodo.7881981) [![Checkpoints DOI](https://zenodo.org/badge/DOI/10.5281/zenodo.13375913.svg)](https://doi.org/10.5281/zenodo.13375913)

![Bio-Diffusion.png](./img/Bio-Diffusion.png)

Description

This is the official codebase of the paper

Geometry-Complete Diffusion for 3D Molecule Generation and Optimization, Nature CommsChem

[arXiv](https://arxiv.org/abs/2302.04313) [Nature CommsChem]

![Animation of diffusion model-generated 3D molecules visualized successively](img/GCDM_Sampled_Molecule_Trajectory.gif)

System requirements

OS requirements

This package supports Linux. It has been tested on the following Linux system: AlmaLinux release 8.9 (Midnight Oncilla).

Python dependencies

This package is developed and tested under Python 3.9.x. The primary Python packages and their versions are as follows; for more details, please refer to the environment.yaml file.

```
hydra-core=1.2.0
matplotlib-base=3.4.3
numpy=1.23.1
pyg=2.2.0=py39_torch_1.12.0_cu116
python=3.9.15
pytorch=1.12.1=py3.9_cuda11.6_cudnn8.3.2_0
pytorch-cluster=1.6.0=py39_torch_1.12.0_cu116
pytorch-scatter=2.1.0=py39_torch_1.12.0_cu116
pytorch-sparse=0.6.16=py39_torch_1.12.0_cu116
pytorch-lightning=1.7.7
scikit-learn=1.1.2
torchmetrics=0.10.2
```

Installation guide

Install mamba (~500 MB: ~1 minute)

```bash
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh  # accept all terms and install to the default location
rm Mambaforge-$(uname)-$(uname -m).sh  # (optionally) remove installer after using it
source ~/.bashrc  # alternatively, one can restart their shell session to achieve the same result
```

Install dependencies (~15 GB: ~10 minutes)

```bash
# clone project
git clone https://github.com/BioinfoMachineLearning/bio-diffusion
cd bio-diffusion

# create conda environment
mamba env create -f environment.yaml
conda activate bio-diffusion  # note: one still needs to use conda to (de)activate environments

# install local project as package
pip3 install -e .
```
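
After installation, a quick sanity check can confirm that the expected package versions from the dependency list above actually resolved in the environment (a minimal sketch that only assumes the packages listed in environment.yaml are importable):

```python
# check_env.py -- sanity-check the resolved bio-diffusion environment (sketch)
import torch
import torch_geometric
import pytorch_lightning as pl

print(f"PyTorch: {torch.__version__}")                      # expected 1.12.1
print(f"PyTorch Geometric: {torch_geometric.__version__}")  # expected 2.2.0
print(f"PyTorch Lightning: {pl.__version__}")               # expected 1.7.7
print(f"CUDA available: {torch.cuda.is_available()}")       # should be True on a GPU machine
```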

Download data (~100 GB extracted: ~4 hours)

```bash
# fetch, extract, and clean up preprocessed data
wget https://zenodo.org/record/7881981/files/EDM.tar.gz
tar -xzf EDM.tar.gz
rm EDM.tar.gz
```

Download checkpoints (~5 GB extracted: ~5 minutes)

Note: Make sure to be located in the project's root directory beforehand (e.g., ~/bio-diffusion/)

```bash
# fetch and extract model checkpoints directory
wget https://zenodo.org/record/13375913/files/GCDMCheckpoints.tar.gz
tar -xzf GCDMCheckpoints.tar.gz
rm GCDMCheckpoints.tar.gz
```

**Note**: EGNN molecular property prediction checkpoints are also included within `GCDMCheckpoints.tar.gz`, where three checkpoints per property were trained with random seeds (18 in total). Also included in this Zenodo model checkpoints record are trained GeoLDM (Xu et al. 2023) checkpoint files used to produce the benchmarking results in the accompanying GCDM manuscript.
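
Before running the demos below, it can help to verify that the checkpoints extracted to the paths the example commands expect (a minimal sketch; the two paths are taken from the demo commands in this README):

```python
# verify_checkpoints.py -- confirm demo checkpoints extracted where expected (sketch)
from pathlib import Path

expected = [
    "checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt",
    "checkpoints/GEOM/Unconditional/36hq94x5_model_1_epoch_76-EMA.ckpt",
]
for path in expected:
    status = "found" if Path(path).is_file() else "MISSING"
    print(f"{status}: {path}")
```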

Demo

Generate new unconditional 3D molecules (QM9)

Unconditionally generate small molecules similar to those contained within the QM9 dataset (~5 minutes)

```bash
python3 src/mol_gen_sample.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] ckpt_path="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" num_samples=250 num_nodes=19 all_frags=true sanitize=false relax=false num_resamplings=1 jump_length=1 num_timesteps=1000 output_dir="./" seed=123
```

NOTE: Output .sdf files will be stored in the current working directory by default. Specify this using output_dir. Run python3 src/mol_gen_sample.py --help to view an exhaustive list of available input arguments.

CONSIDER: Running bust MY_GENERATED_MOLS.sdf to determine which of the generated molecules are valid according to the PoseBusters software suite (~3 minutes).
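
As a lighter-weight first pass before (or alongside) PoseBusters, one can check how many of the generated molecules parse and sanitize cleanly with RDKit (a minimal sketch; `MY_GENERATED_MOLS.sdf` is a placeholder for whichever .sdf file was written to `output_dir`):

```python
# quick_validity_check.py -- first-pass RDKit validity check of generated molecules (sketch)
from rdkit import Chem

supplier = Chem.SDMolSupplier("MY_GENERATED_MOLS.sdf", sanitize=True, removeHs=False)
mols = list(supplier)                       # entries that fail parsing/sanitization come back as None
valid = [m for m in mols if m is not None]
print(f"{len(valid)}/{len(mols)} generated molecules parsed and sanitized successfully")
```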

Generate new property-conditional 3D molecules (QM9)

Property-conditionally generate small molecules similar to those contained within the QM9 dataset (~10 minutes)

```bash
# alpha
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/alpha_model_epoch_1619-EMA.ckpt" property=alpha iterations=100 batch_size=100 sweep_property_values=true num_sweeps=10 output_dir="./" seed=123

# gap
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/gap_model_epoch_1659-EMA.ckpt" property=gap iterations=100 batch_size=100 sweep_property_values=true num_sweeps=10 output_dir="./" seed=123

# homo
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/homo_model_epoch_1879-EMA.ckpt" property=homo iterations=100 batch_size=100 sweep_property_values=true num_sweeps=10 output_dir="./" seed=123

# lumo
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/lumo_model_epoch_1619-EMA.ckpt" property=lumo iterations=100 batch_size=100 sweep_property_values=true num_sweeps=10 output_dir="./" seed=123

# mu
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/mu_model_epoch_1859-EMA.ckpt" property=mu iterations=100 batch_size=100 sweep_property_values=true num_sweeps=10 output_dir="./" seed=123

# Cv
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/Cv_model_epoch_1539-EMA.ckpt" property=Cv iterations=100 batch_size=100 sweep_property_values=true num_sweeps=10 output_dir="./" seed=123
```

NOTE: Output .sdf files will be stored in the current working directory by default. Specify this using output_dir. Run python3 src/mol_gen_eval_conditional_qm9.py --help to view an exhaustive list of available input arguments.

CONSIDER: Running bust MY_GENERATED_MOLS.sdf to determine which of the generated molecules are valid according to the PoseBusters software suite (~3 minutes).

Generate new unconditional 3D molecules (GEOM-Drugs)

Unconditionally generate drug-size molecules similar to those contained within the GEOM-Drugs dataset (~15 minutes)

```bash
python3 src/mol_gen_sample.py datamodule=edm_geom model=geom_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] ckpt_path="checkpoints/GEOM/Unconditional/36hq94x5_model_1_epoch_76-EMA.ckpt" num_samples=250 num_nodes=44 all_frags=true sanitize=false relax=false num_resamplings=1 jump_length=1 num_timesteps=1000 output_dir="./" seed=123
```

NOTE: Output .sdf files will be stored in the current working directory by default. Specify this using output_dir. Run python3 src/mol_gen_sample.py --help to view an exhaustive list of available input arguments.

CONSIDER: Running bust MY_GENERATED_MOLS.sdf to determine which of the generated molecules are valid according to the PoseBusters software suite (~3 minutes).

Optimize 3D molecules for molecular stability and various molecular properties (QM9)

```bash
# e.g., unconditionally generate a batch of samples to property-optimize
# NOTE: alpha is listed here, but it will not be referenced for the (initial) unconditional molecule generation
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/alpha_model_epoch_1619-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_alpha_seed_1" num_samples=1000 sampling_output_dir="./mols_to_optimize/" property=alpha iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=true use_pregenerated_molecules=false

# optimize generated samples for specific molecular properties, where alpha is used in this example
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/alpha_model_epoch_1619-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_alpha_seed_1" num_samples=1000 sampling_output_dir="./mols_to_optimize/" property=alpha iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=false use_pregenerated_molecules=true save_molecules=true
```

NOTE: Output .sdf files will be stored under ./outputs/. Run python3 src/mol_gen_eval_optimization_qm9.py --help to view an exhaustive list of available input arguments.

CONSIDER: Running bust MY_GENERATED_MOLS.sdf to determine which of the generated molecules are valid according to the PoseBusters software suite (~3 minutes).

Instructions for use

How to train new models

Train model with default configuration

```bash
# train on CPU
python src/train.py trainer=cpu

# train on GPU
python src/train.py trainer=gpu
```

Train model with chosen experiment configuration from configs/experiment/

```bash
python src/train.py experiment=experiment_name.yaml
```

Train a model for unconditional small molecule generation with the QM9 dataset (QM9)

```bash
python3 src/train.py experiment=qm9_mol_gen_ddpm.yaml
```

Train a model for property-conditional small molecule generation with the QM9 dataset (QM9)

```bash
# choose a value for model.module_cfg.conditioning from the properties [alpha, gap, homo, lumo, mu, Cv]
python3 src/train.py experiment=qm9_mol_gen_conditional_ddpm.yaml model.module_cfg.conditioning=[alpha]
```

Train a model for unconditional drug-size molecule generation with the GEOM-Drugs dataset (GEOM-Drugs)

```bash
python3 src/train.py experiment=geom_mol_gen_ddpm.yaml
```

Note: You can override any parameter from the command line like this:

```bash
python src/train.py trainer.max_epochs=20 datamodule.dataloader_cfg.batch_size=64
```
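
Because the project is driven by Hydra configs, overrides can also be previewed without launching a run by composing the configuration in Python. This is a sketch using Hydra's compose API; the `configs` directory and `train.yaml` config name are assumptions about this repository's layout and may need adjusting:

```python
# inspect_config.py -- preview how Hydra resolves command-line overrides (sketch)
from hydra import compose, initialize
from omegaconf import OmegaConf

# NOTE: "configs" and "train.yaml" are assumed names for this repo's Hydra layout
with initialize(version_base=None, config_path="configs"):
    cfg = compose(
        config_name="train.yaml",
        overrides=["trainer.max_epochs=20", "datamodule.dataloader_cfg.batch_size=64"],
    )

print(OmegaConf.to_yaml(cfg.trainer))  # show the merged trainer settings after overrides
```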

How to reproduce paper results

Reproduce paper results for unconditional small molecule generation with the QM9 dataset (QM9 Unconditional: ~2 hrs)

```bash
# note: trainer.devices=[0] selects the CUDA device available at index 0 - customize as needed using e.g., nvidia-smi
python3 src/mol_gen_eval.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] ckpt_path="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false num_samples=10000 sampling_batch_size=100 num_test_passes=5 save_molecules=True output_dir=output/QM9/Unconditional/gcdm_model_1/

# ... repeat 5 times in total ...

python3 src/mol_gen_eval.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] ckpt_path="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false num_samples=10000 sampling_batch_size=100 num_test_passes=5 save_molecules=True output_dir=output/QM9/Unconditional/gcdm_model_5/
```

NOTE: Refer to src/analysis/inference_analysis.py and src/analysis/molecule_analysis.py to manually enter and analyze the unconditional results reported by the commands above. Also keep in mind that molecule_analysis.py, in contrast to the rest of the codebase, uses OpenBabel to infer bonds for the XYZ files saved by mol_gen_eval.py. This distinction for bond inference considerably impacts the performance of each method as measured by this script.
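
For reference, the OpenBabel-based bond inference mentioned above can be reproduced on a single saved XYZ file via the `pybel` bindings (a minimal sketch, assuming the OpenBabel Python bindings are installed; `generated_molecule.xyz` is a hypothetical filename):

```python
# infer_bonds.py -- OpenBabel bond inference for an XYZ file, as described for molecule_analysis.py (sketch)
from openbabel import pybel

# reading an XYZ file makes OpenBabel perceive bonds from the 3D coordinates alone
mol = next(pybel.readfile("xyz", "generated_molecule.xyz"))
mol.write("sdf", "generated_molecule_with_bonds.sdf", overwrite=True)
print(f"{len(mol.atoms)} atoms; inferred SMILES: {mol.write('smi').strip()}")
```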

Reproduce paper results for property-conditional small molecule generation with the QM9 dataset (QM9 Conditional: ~12 hrs)

```bash
# alpha (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_alpha_seed_$SEED", where SEED=[1, 64, 83])
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/alpha_model_epoch_1619-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_alpha_seed_N" property=alpha iterations=100 batch_size=100 save_molecules=True output_dir=output/QM9/Conditional/gcdm_model_1_alpha/

# gap (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_gap_$SEED", where SEED=[1, 471, 43149])
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/gap_model_epoch_1659-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_gap_seed_N" property=gap iterations=100 batch_size=100 save_molecules=True output_dir=output/QM9/Conditional/gcdm_model_1_gap/

# homo (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_homo_$SEED", where SEED=[1, 4, 14])
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/homo_model_epoch_1879-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_homo_seed_N" property=homo iterations=100 batch_size=100 save_molecules=True output_dir=output/QM9/Conditional/gcdm_model_1_homo/

# lumo (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_lumo_$SEED", where SEED=[1, 427, 745])
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/lumo_model_epoch_1619-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_lumo_seed_N" property=lumo iterations=100 batch_size=100 save_molecules=True output_dir=output/QM9/Conditional/gcdm_model_1_lumo/

# mu (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_mu_$SEED", where SEED=[1, 39, 86])
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/mu_model_epoch_1859-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_mu_seed_N" property=mu iterations=100 batch_size=100 save_molecules=True output_dir=output/QM9/Conditional/gcdm_model_1_mu/

# Cv (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_Cv_$SEED", where SEED=[1, 8, 89])
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/Cv_model_epoch_1539-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_Cv_seed_N" property=Cv iterations=100 batch_size=100 save_molecules=True output_dir=output/QM9/Conditional/gcdm_model_1_Cv/
```

NOTE: Refer to src/analysis/inference_analysis.py, src/analysis/molecule_analysis.py, and src/analysis/qm_analysis.py to manually enter and analyze the property-conditional results reported by the commands above.

Reproduce paper results for unconditional drug-size molecule generation with the GEOM-Drugs dataset (GEOM-Drugs Unconditional: ~24 hrs)

```bash
python3 src/mol_gen_eval.py datamodule=edm_geom model=geom_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] ckpt_path="checkpoints/GEOM/Unconditional/36hq94x5_model_1_epoch_76-EMA.ckpt" datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false num_samples=10000 sampling_batch_size=100 num_test_passes=5 save_molecules=True output_dir=output/GEOM/Unconditional/gcdm_model_1/

# ... repeat 5 times in total ...

python3 src/mol_gen_eval.py datamodule=edm_geom model=geom_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] ckpt_path="checkpoints/GEOM/Unconditional/36hq94x5_model_1_epoch_76-EMA.ckpt" datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false num_samples=10000 sampling_batch_size=100 num_test_passes=5 save_molecules=True output_dir=output/GEOM/Unconditional/gcdm_model_5/
```

NOTE: Refer to src/analysis/inference_analysis.py, src/analysis/molecule_analysis.py, src/analysis/qm_analysis.py, and src/analysis/bust_analysis.py to manually enter and analyze the unconditional results reported by the commands above.

Reproduce paper results for property-specific small molecule optimization with the QM9 dataset (QM9 Guided: ~12 hrs)

```bash
# unconditionally generate a batch of samples to property-optimize
# NOTE: alpha is listed here, but it will not be referenced for the (initial) unconditional molecule generation
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/alpha_model_epoch_1619-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_alpha_seed_1" num_samples=1000 sampling_output_dir="./optim_mols/" property=alpha iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=true use_pregenerated_molecules=false

# optimize generated samples for specific molecular properties

# alpha (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_alpha_seed_$SEED", where SEED=[1, 64, 83])
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/alpha_model_epoch_1619-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_alpha_seed_N" num_samples=1000 sampling_output_dir="./optim_mols/" property=alpha iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=false use_pregenerated_molecules=true

# gap (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_gap_$SEED", where SEED=[1, 471, 43149])
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/gap_model_epoch_1659-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_gap_seed_N" num_samples=1000 sampling_output_dir="./optim_mols/" property=gap iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=false use_pregenerated_molecules=true

# homo (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_homo_$SEED", where SEED=[1, 4, 14])
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/homo_model_epoch_1879-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_homo_seed_N" num_samples=1000 sampling_output_dir="./optim_mols/" property=homo iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=false use_pregenerated_molecules=true

# lumo (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_lumo_$SEED", where SEED=[1, 427, 745])
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/lumo_model_epoch_1619-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_lumo_seed_N" num_samples=1000 sampling_output_dir="./optim_mols/" property=lumo iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=false use_pregenerated_molecules=true

# mu (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_mu_$SEED", where SEED=[1, 39, 86])
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/mu_model_epoch_1859-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_mu_seed_N" num_samples=1000 sampling_output_dir="./optim_mols/" property=mu iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=false use_pregenerated_molecules=true

# Cv (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_Cv_$SEED", where SEED=[1, 8, 89])
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/Cv_model_epoch_1539-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_Cv_seed_N" num_samples=1000 sampling_output_dir="./optim_mols/" property=Cv iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=false use_pregenerated_molecules=true
```

NOTE: Refer to src/analysis/optimization_analysis.py to manually enter and plot the optimization results reported by the commands above.

Reproduce paper results for protein-conditional small molecule generation with the Binding MOAD and CrossDocked datasets (Binding MOAD & CrossDocked: ~5 days)

Please refer to the following dedicated GitHub repository for further details: https://github.com/BioinfoMachineLearning/GCDM-SBDD.

Docker

To run this project in a Docker container, you can use the following commands:

```bash
# Build the image
docker build -t bio-diffusion .

# Run the container (with GPUs and mounting the current directory)
docker run -it --gpus all -v .:/mnt --name bio-diffusion bio-diffusion
```

__Note:__ You will still need to download the checkpoints and data as described in the installation guide. Then, update the Python commands to point to the desired local location of your files (e.g., `/mnt/checkpoints` and `/mnt/outputs`) once in the container.

Acknowledgements

Bio-Diffusion builds upon the source code and data from the following projects:

We thank all their contributors and maintainers!

License

This project is covered under the MIT License.

Citation

If you use the code or data associated with this package or otherwise find this work useful, please cite:

```bibtex
@article{morehead2024geometry,
  title={Geometry-complete diffusion for 3D molecule generation and optimization},
  author={Morehead, Alex and Cheng, Jianlin},
  journal={Communications Chemistry},
  volume={7},
  number={1},
  pages={150},
  year={2024},
  publisher={Nature Publishing Group UK London}
}
```

Owner

  • Name: BioinfoMachineLearning
  • Login: BioinfoMachineLearning
  • Kind: organization

Citation (citation.bib)

@article{morehead2024geometry,
  title={Geometry-complete diffusion for 3D molecule generation and optimization},
  author={Morehead, Alex and Cheng, Jianlin},
  journal={Communications Chemistry},
  volume={7},
  number={1},
  pages={150},
  year={2024},
  publisher={Nature Publishing Group UK London}
}

GitHub Events

Total
  • Issues event: 7
  • Watch event: 36
  • Issue comment event: 12
  • Push event: 1
  • Fork event: 4
Last Year
  • Issues event: 7
  • Watch event: 36
  • Issue comment event: 12
  • Push event: 1
  • Fork event: 4

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 4
  • Total pull requests: 1
  • Average time to close issues: 1 day
  • Average time to close pull requests: 6 months
  • Total issue authors: 4
  • Total pull request authors: 1
  • Average comments per issue: 0.75
  • Average comments per pull request: 2.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 4
  • Pull requests: 1
  • Average time to close issues: 1 day
  • Average time to close pull requests: 6 months
  • Issue authors: 4
  • Pull request authors: 1
  • Average comments per issue: 0.75
  • Average comments per pull request: 2.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • 18hfliu (4)
  • charlotte0104 (2)
  • cengc13 (1)
  • chengfengke (1)
  • Daisuke239 (1)
  • lfs119 (1)
Pull Request Authors
  • colbyford (2)
  • hotwa (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 3
  • Total downloads: unknown
  • Total dependent packages: 0
    (may contain duplicates)
  • Total dependent repositories: 0
    (may contain duplicates)
  • Total versions: 3
proxy.golang.org: github.com/BioinfoMachineLearning/Bio-Diffusion
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 5.5%
Average: 5.7%
Dependent repos count: 5.9%
Last synced: 6 months ago
proxy.golang.org: github.com/bioinfomachinelearning/bio-diffusion
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.4%
Average: 6.6%
Dependent repos count: 6.9%
Last synced: 6 months ago
proxy.golang.org: github.com/BioinfoMachineLearning/bio-diffusion
  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
Rankings
Dependent packages count: 6.4%
Average: 6.6%
Dependent repos count: 6.9%
Last synced: 6 months ago