bio-diffusion
A geometry-complete diffusion generative model (GCDM) for 3D molecule generation and optimization. (Nature CommsChem)
Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 6 DOI reference(s) in README
- ✓ Academic publication links: Links to arxiv.org, nature.com, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (9.0%) to scientific vocabulary
Keywords
Repository
A geometry-complete diffusion generative model (GCDM) for 3D molecule generation and optimization. (Nature CommsChem)
Basic Info
Statistics
- Stars: 202
- Watchers: 3
- Forks: 27
- Open Issues: 1
- Releases: 1
Topics
Metadata Files
README.md
Description
This is the official codebase of the paper
Geometry-Complete Diffusion for 3D Molecule Generation and Optimization, Nature CommsChem
Contents
- Bio-Diffusion
- Description
- Contents
- System requirements
- OS requirements
- Python dependencies
- Installation guide
- Demo
- Generate new unconditional 3D molecules (QM9)
- Generate new property-conditional 3D molecules (QM9)
- Generate new unconditional 3D molecules (GEOM-Drugs)
- Optimize 3D molecules for molecular stability and various molecular properties (QM9)
- Instructions for use
- How to train new models
- Train model with default configuration
- Train model with chosen experiment configuration from configs/experiment/
- Train a model for unconditional small molecule generation with the QM9 dataset (QM9)
- Train a model for property-conditional small molecule generation with the QM9 dataset (QM9)
- Train a model for unconditional drug-size molecule generation with the GEOM-Drugs dataset (GEOM-Drugs)
- How to reproduce paper results
- Reproduce paper results for unconditional small molecule generation with the QM9 dataset (QM9 Unconditional: ~2 hrs)
- Reproduce paper results for property-conditional small molecule generation with the QM9 dataset (QM9 Conditional: ~12 hrs)
- Reproduce paper results for unconditional drug-size molecule generation with the GEOM-Drugs dataset (GEOM-Drugs Unconditional: ~24 hrs)
- Reproduce paper results for property-specific small molecule optimization with the QM9 dataset (QM9 Guided: ~12 hrs)
- Reproduce paper results for protein-conditional small molecule generation with the Binding MOAD and CrossDocked datasets (Binding MOAD & CrossDocked: ~5 days)
- Docker
- Acknowledgements
- License
- Citation
System requirements
OS requirements
This package supports Linux. The package has been tested on the following Linux system:
Description: AlmaLinux release 8.9 (Midnight Oncilla)
Python dependencies
This package is developed and tested under Python 3.9.x. The primary Python packages and their versions are as follows. For more details, please refer to the environment.yaml file.
```python
hydra-core=1.2.0
matplotlib-base=3.4.3
numpy=1.23.1
pyg=2.2.0=py39_torch_1.12.0_cu116
python=3.9.15
pytorch=1.12.1=py3.9_cuda11.6_cudnn8.3.2_0
pytorch-cluster=1.6.0=py39_torch_1.12.0_cu116
pytorch-scatter=2.1.0=py39_torch_1.12.0_cu116
pytorch-sparse=0.6.16=py39_torch_1.12.0_cu116
pytorch-lightning=1.7.7
scikit-learn=1.1.2
torchmetrics=0.10.2
```
Installation guide
Install mamba (~500 MB: ~1 minute)
```bash
wget "https://github.com/conda-forge/miniforge/releases/latest/download/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh  # accept all terms and install to the default location
rm Mambaforge-$(uname)-$(uname -m).sh  # (optionally) remove installer after using it
source ~/.bashrc  # alternatively, one can restart their shell session to achieve the same result
```
Install dependencies (~15 GB: ~10 minutes)
```bash
# clone project
git clone https://github.com/BioinfoMachineLearning/bio-diffusion
cd bio-diffusion

# create conda environment
mamba env create -f environment.yaml
conda activate bio-diffusion  # note: one still needs to use conda to (de)activate environments

# install local project as package
pip3 install -e .
```
Download data (~100 GB extracted: ~4 hours)
```bash
# fetch, extract, and clean up preprocessed data
wget https://zenodo.org/record/7881981/files/EDM.tar.gz
tar -xzf EDM.tar.gz
rm EDM.tar.gz
```
Download checkpoints (~5 GB extracted: ~5 minutes)
Note: Make sure to be located in the project's root directory beforehand (e.g., `~/bio-diffusion/`).
```bash
# fetch and extract model checkpoints directory
wget https://zenodo.org/record/13375913/files/GCDMCheckpoints.tar.gz
tar -xzf GCDMCheckpoints.tar.gz
rm GCDMCheckpoints.tar.gz
```
**Note**: EGNN molecular property prediction checkpoints are also included within `GCDMCheckpoints.tar.gz`, where three checkpoints per property were trained with random seeds (18 in total). Also included in this Zenodo model checkpoints record are trained GeoLDM (Xu et al. 2023) checkpoint files used to produce the benchmarking results in the accompanying GCDM manuscript.
Demo
Generate new unconditional 3D molecules (QM9)
Unconditionally generate small molecules similar to those contained within the QM9 dataset (~5 minutes)
```bash
python3 src/mol_gen_sample.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] ckpt_path="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" num_samples=250 num_nodes=19 all_frags=true sanitize=false relax=false num_resamplings=1 jump_length=1 num_timesteps=1000 output_dir="./" seed=123
```
NOTE: Output `.sdf` files will be stored in the current working directory by default. Specify this using `output_dir`. Run `python3 src/mol_gen_sample.py --help` to view an exhaustive list of available input arguments.
CONSIDER: Running `bust MY_GENERATED_MOLS.sdf` to determine which of the generated molecules are valid according to the PoseBusters software suite (~3 minutes).
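For a quick programmatic sanity check alongside PoseBusters, a minimal RDKit sketch along the following lines can count how many of the generated molecules parse and sanitize cleanly (this assumes RDKit is available in your environment and uses `MY_GENERATED_MOLS.sdf` as a placeholder file name; it is not part of the project's own evaluation code):

```python
# Rough validity check for generated molecules (not a substitute for PoseBusters).
from rdkit import Chem

SDF_PATH = "MY_GENERATED_MOLS.sdf"  # placeholder: point this at your generated .sdf file

# sanitize=True applies RDKit's standard valence/aromaticity checks while parsing
mols = list(Chem.SDMolSupplier(SDF_PATH, sanitize=True, removeHs=False))
num_parsed = sum(mol is not None for mol in mols)
print(f"{num_parsed}/{len(mols)} molecules parsed and sanitized successfully")
```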
Generate new property-conditional 3D molecules (QM9)
Property-conditionally generate small molecules similar to those contained within the QM9 dataset (~10 minutes)
```bash
# alpha
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/alpha_model_epoch_1619-EMA.ckpt" property=alpha iterations=100 batch_size=100 sweep_property_values=true num_sweeps=10 output_dir="./" seed=123

# gap
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/gap_model_epoch_1659-EMA.ckpt" property=gap iterations=100 batch_size=100 sweep_property_values=true num_sweeps=10 output_dir="./" seed=123

# homo
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/homo_model_epoch_1879-EMA.ckpt" property=homo iterations=100 batch_size=100 sweep_property_values=true num_sweeps=10 output_dir="./" seed=123

# lumo
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/lumo_model_epoch_1619-EMA.ckpt" property=lumo iterations=100 batch_size=100 sweep_property_values=true num_sweeps=10 output_dir="./" seed=123

# mu
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/mu_model_epoch_1859-EMA.ckpt" property=mu iterations=100 batch_size=100 sweep_property_values=true num_sweeps=10 output_dir="./" seed=123

# Cv
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/Cv_model_epoch_1539-EMA.ckpt" property=Cv iterations=100 batch_size=100 sweep_property_values=true num_sweeps=10 output_dir="./" seed=123
```
NOTE: Output `.sdf` files will be stored in the current working directory by default. Specify this using `output_dir`. Run `python3 src/mol_gen_eval_conditional_qm9.py --help` to view an exhaustive list of available input arguments.
CONSIDER: Running `bust MY_GENERATED_MOLS.sdf` to determine which of the generated molecules are valid according to the PoseBusters software suite (~3 minutes).
Generate new unconditional 3D molecules (GEOM-Drugs)
Unconditionally generate drug-size molecules similar to those contained within the GEOM-Drugs dataset (~15 minutes)
```bash
python3 src/mol_gen_sample.py datamodule=edm_geom model=geom_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] ckpt_path="checkpoints/GEOM/Unconditional/36hq94x5_model_1_epoch_76-EMA.ckpt" num_samples=250 num_nodes=44 all_frags=true sanitize=false relax=false num_resamplings=1 jump_length=1 num_timesteps=1000 output_dir="./" seed=123
```
NOTE: Output `.sdf` files will be stored in the current working directory by default. Specify this using `output_dir`. Run `python3 src/mol_gen_sample.py --help` to view an exhaustive list of available input arguments.
CONSIDER: Running `bust MY_GENERATED_MOLS.sdf` to determine which of the generated molecules are valid according to the PoseBusters software suite (~3 minutes).
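Since the command above requests `num_nodes=44`, a short sketch like the following (again assuming RDKit is installed and using a placeholder file name) can confirm that the generated samples are indeed drug-sized by reporting their atom counts, including any explicit hydrogens:

```python
# Report atom counts for generated GEOM-Drugs samples.
from rdkit import Chem

SDF_PATH = "MY_GENERATED_MOLS.sdf"  # placeholder: your generated .sdf file

sizes = [mol.GetNumAtoms()
         for mol in Chem.SDMolSupplier(SDF_PATH, sanitize=False, removeHs=False)
         if mol is not None]
if sizes:
    print(f"parsed {len(sizes)} molecules; atoms per molecule: "
          f"min={min(sizes)}, mean={sum(sizes) / len(sizes):.1f}, max={max(sizes)}")
```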
Optimize 3D molecules for molecular stability and various molecular properties (QM9)
```bash
# e.g., unconditionally generate a batch of samples to property-optimize
# NOTE: alpha is listed here, but it will not be referenced for the (initial) unconditional molecule generation
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/alpha_model_epoch_1619-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_alpha_seed_1" num_samples=1000 sampling_output_dir="./mols_to_optimize/" property=alpha iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=true use_pregenerated_molecules=false

# optimize generated samples for specific molecular properties, where alpha is used in this example
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/alpha_model_epoch_1619-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_alpha_seed_1" num_samples=1000 sampling_output_dir="./mols_to_optimize/" property=alpha iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=false use_pregenerated_molecules=true save_molecules=true
```
NOTE: Output `.sdf` files will be stored under `./outputs/`. Run `python3 src/mol_gen_eval_optimization_qm9.py --help` to view an exhaustive list of available input arguments.
CONSIDER: Running `bust MY_GENERATED_MOLS.sdf` to determine which of the generated molecules are valid according to the PoseBusters software suite (~3 minutes).
Instructions for use
How to train new models
Train model with default configuration
```bash
# train on CPU
python src/train.py trainer=cpu

# train on GPU
python src/train.py trainer=gpu
```
Train model with chosen experiment configuration from configs/experiment/
```bash
python src/train.py experiment=experiment_name.yaml
```
Train a model for unconditional small molecule generation with the QM9 dataset (QM9)
```bash
python3 src/train.py experiment=qm9_mol_gen_ddpm.yaml
```
Train a model for property-conditional small molecule generation with the QM9 dataset (QM9)
```bash
# choose a value for model.module_cfg.conditioning from the properties [alpha, gap, homo, lumo, mu, Cv]
python3 src/train.py experiment=qm9_mol_gen_conditional_ddpm.yaml model.module_cfg.conditioning=[alpha]
```
Train a model for unconditional drug-size molecule generation with the GEOM-Drugs dataset (GEOM-Drugs)
```bash
python3 src/train.py experiment=geom_mol_gen_ddpm.yaml
```
Note: You can override any parameter from the command line like this:
```bash
python src/train.py trainer.max_epochs=20 datamodule.dataloader_cfg.batch_size=64
```
How to reproduce paper results
Reproduce paper results for unconditional small molecule generation with the QM9 dataset (QM9 Unconditional: ~2 hrs)
```bash
# note: trainer.devices=[0] selects the CUDA device available at index 0 - customize as needed using e.g., nvidia-smi
python3 src/mol_gen_eval.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] ckpt_path="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false num_samples=10000 sampling_batch_size=100 num_test_passes=5 save_molecules=True output_dir=output/QM9/Unconditional/gcdm_model_1/

# ... repeat 5 times in total ...

python3 src/mol_gen_eval.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] ckpt_path="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false num_samples=10000 sampling_batch_size=100 num_test_passes=5 save_molecules=True output_dir=output/QM9/Unconditional/gcdm_model_5/
```
NOTE: Refer to `src/analysis/inference_analysis.py` and `src/analysis/molecule_analysis.py` to manually enter and analyze the unconditional results reported by the commands above. Also keep in mind that `molecule_analysis.py`, in contrast to the rest of the codebase, uses OpenBabel to infer bonds for the XYZ files saved by `mol_gen_eval.py`. This difference in bond inference considerably impacts the performance of each method as measured by this script.
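For reference, the OpenBabel-style bond inference described above can be reproduced in a few lines with the `openbabel` Python bindings (a minimal sketch under the assumption that the bindings are installed; the file names are placeholders, and this is not the project's exact analysis code):

```python
# Infer bonds for an XYZ file with OpenBabel and write the result as an SDF.
from openbabel import pybel

XYZ_PATH = "generated_molecule.xyz"  # placeholder: an XYZ file saved during evaluation
SDF_PATH = "generated_molecule_ob.sdf"

# pybel infers connectivity and bond orders from the 3D coordinates when reading XYZ,
# which is what distinguishes this analysis path from the RDKit-based one.
mol = next(pybel.readfile("xyz", XYZ_PATH))
mol.write("sdf", SDF_PATH, overwrite=True)
print(f"wrote {SDF_PATH} with {len(mol.atoms)} atoms")
```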
Reproduce paper results for property-conditional small molecule generation with the QM9 dataset (QM9 Conditional: ~12 hrs)
```bash
# alpha (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_alpha_seed_$SEED", where SEED=[1, 64, 83])
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/alpha_model_epoch_1619-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_alpha_seed_N" property=alpha iterations=100 batch_size=100 save_molecules=True output_dir=output/QM9/Conditional/gcdm_model_1_alpha/

# gap (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_gap_$SEED", where SEED=[1, 471, 43149])
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/gap_model_epoch_1659-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_gap_seed_N" property=gap iterations=100 batch_size=100 save_molecules=True output_dir=output/QM9/Conditional/gcdm_model_1_gap/

# homo (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_homo_$SEED", where SEED=[1, 4, 14])
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/homo_model_epoch_1879-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_homo_seed_N" property=homo iterations=100 batch_size=100 save_molecules=True output_dir=output/QM9/Conditional/gcdm_model_1_homo/

# lumo (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_lumo_$SEED", where SEED=[1, 427, 745])
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/lumo_model_epoch_1619-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_lumo_seed_N" property=lumo iterations=100 batch_size=100 save_molecules=True output_dir=output/QM9/Conditional/gcdm_model_1_lumo/

# mu (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_mu_$SEED", where SEED=[1, 39, 86])
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/mu_model_epoch_1859-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_mu_seed_N" property=mu iterations=100 batch_size=100 save_molecules=True output_dir=output/QM9/Conditional/gcdm_model_1_mu/

# Cv (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_Cv_$SEED", where SEED=[1, 8, 89])
python3 src/mol_gen_eval_conditional_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false generator_model_filepath="checkpoints/QM9/Conditional/Cv_model_epoch_1539-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_Cv_seed_N" property=Cv iterations=100 batch_size=100 save_molecules=True output_dir=output/QM9/Conditional/gcdm_model_1_Cv/
```
NOTE: Refer to `src/analysis/inference_analysis.py`, `src/analysis/molecule_analysis.py`, and `src/analysis/qm_analysis.py` to manually enter and analyze the property-conditional results reported by the commands above.
Reproduce paper results for unconditional drug-size molecule generation with the GEOM-Drugs dataset (GEOM-Drugs Unconditional: ~24 hrs)
```bash
python3 src/mol_gen_eval.py datamodule=edm_geom model=geom_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] ckpt_path="checkpoints/GEOM/Unconditional/36hq94x5_model_1_epoch_76-EMA.ckpt" datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false num_samples=10000 sampling_batch_size=100 num_test_passes=5 save_molecules=True output_dir=output/GEOM/Unconditional/gcdm_model_1/

# ... repeat 5 times in total ...

python3 src/mol_gen_eval.py datamodule=edm_geom model=geom_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] ckpt_path="checkpoints/GEOM/Unconditional/36hq94x5_model_1_epoch_76-EMA.ckpt" datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false num_samples=10000 sampling_batch_size=100 num_test_passes=5 save_molecules=True output_dir=output/GEOM/Unconditional/gcdm_model_5/
```
NOTE: Refer to `src/analysis/inference_analysis.py`, `src/analysis/molecule_analysis.py`, `src/analysis/qm_analysis.py`, and `src/analysis/bust_analysis.py` to manually enter and analyze the unconditional results reported by the commands above.
Reproduce paper results for property-specific small molecule optimization with the QM9 dataset (QM9 Guided: ~12 hrs)
```bash
# unconditionally generate a batch of samples to property-optimize
# NOTE: alpha is listed here, but it will not be referenced for the (initial) unconditional molecule generation
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/alpha_model_epoch_1619-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_alpha_seed_1" num_samples=1000 sampling_output_dir="./optim_mols/" property=alpha iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=true use_pregenerated_molecules=false

# optimize generated samples for specific molecular properties

# alpha (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_alpha_seed_$SEED", where SEED=[1, 64, 83])
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/alpha_model_epoch_1619-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_alpha_seed_N" num_samples=1000 sampling_output_dir="./optim_mols/" property=alpha iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=false use_pregenerated_molecules=true

# gap (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_gap_$SEED", where SEED=[1, 471, 43149])
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/gap_model_epoch_1659-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_gap_seed_N" num_samples=1000 sampling_output_dir="./optim_mols/" property=gap iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=false use_pregenerated_molecules=true

# homo (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_homo_$SEED", where SEED=[1, 4, 14])
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/homo_model_epoch_1879-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_homo_seed_N" num_samples=1000 sampling_output_dir="./optim_mols/" property=homo iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=false use_pregenerated_molecules=true

# lumo (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_lumo_$SEED", where SEED=[1, 427, 745])
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/lumo_model_epoch_1619-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_lumo_seed_N" num_samples=1000 sampling_output_dir="./optim_mols/" property=lumo iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=false use_pregenerated_molecules=true

# mu (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_mu_$SEED", where SEED=[1, 39, 86])
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/mu_model_epoch_1859-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_mu_seed_N" num_samples=1000 sampling_output_dir="./optim_mols/" property=mu iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=false use_pregenerated_molecules=true

# Cv (repeat for classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_Cv_$SEED", where SEED=[1, 8, 89])
python3 src/mol_gen_eval_optimization_qm9.py datamodule=edm_qm9 model=qm9_mol_gen_ddpm logger=csv trainer.accelerator=gpu trainer.devices=[0] datamodule.dataloader_cfg.num_workers=1 model.diffusion_cfg.sample_during_training=false unconditional_generator_model_filepath="checkpoints/QM9/Unconditional/model_1_epoch_979-EMA.ckpt" conditional_generator_model_filepath="checkpoints/QM9/Conditional/Cv_model_epoch_1539-EMA.ckpt" classifier_model_dir="checkpoints/QM9/Property_Classifiers/exp_class_Cv_seed_N" num_samples=1000 sampling_output_dir="./optim_mols/" property=Cv iterations=10 num_optimization_timesteps=100 return_frames=1 generate_molecules_only=false use_pregenerated_molecules=true
```
NOTE: Refer to `src/analysis/optimization_analysis.py` to manually enter and plot the optimization results reported by the commands above.
Reproduce paper results for protein-conditional small molecule generation with the Binding MOAD and CrossDocked datasets (Binding MOAD & CrossDocked: ~5 days)
Please refer to the following dedicated GitHub repository for further details: https://github.com/BioinfoMachineLearning/GCDM-SBDD.
Docker
To run this project in a Docker container, you can use the following commands:
```bash
# Build the image
docker build -t bio-diffusion .

# Run the container (with GPUs and mounting the current directory)
docker run -it --gpus all -v .:/mnt --name bio-diffusion bio-diffusion
```
__Note:__ You will still need to download the checkpoints and data as described in the installation guide. Then, update the Python commands to point to the desired local location of your files (e.g., `/mnt/checkpoints` and `/mnt/outputs`) once in the container.
Acknowledgements
Bio-Diffusion builds upon the source code and data from the following projects:
- ClofNet
- DiffSBDD
- e3_diffusion_for_molecules
- GBPNet
- GCPNet
- gvp-pytorch
- lightning-hydra-template
- PoseBusters
We thank all their contributors and maintainers!
License
This project is covered under the MIT License.
Citation
If you use the code or data associated with this package or otherwise find this work useful, please cite:
```bibtex
@article{morehead2024geometry,
  title={Geometry-complete diffusion for 3D molecule generation and optimization},
  author={Morehead, Alex and Cheng, Jianlin},
  journal={Communications Chemistry},
  volume={7},
  number={1},
  pages={150},
  year={2024},
  publisher={Nature Publishing Group UK London}
}
```
Owner
- Name: BioinfoMachineLearning
- Login: BioinfoMachineLearning
- Kind: organization
- Repositories: 29
- Profile: https://github.com/BioinfoMachineLearning
Citation (citation.bib)
@article{morehead2024geometry,
title={Geometry-complete diffusion for 3D molecule generation and optimization},
author={Morehead, Alex and Cheng, Jianlin},
journal={Communications Chemistry},
volume={7},
number={1},
pages={150},
year={2024},
publisher={Nature Publishing Group UK London}
}
GitHub Events
Total
- Issues event: 7
- Watch event: 36
- Issue comment event: 12
- Push event: 1
- Fork event: 4
Last Year
- Issues event: 7
- Watch event: 36
- Issue comment event: 12
- Push event: 1
- Fork event: 4
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 4
- Total pull requests: 1
- Average time to close issues: 1 day
- Average time to close pull requests: 6 months
- Total issue authors: 4
- Total pull request authors: 1
- Average comments per issue: 0.75
- Average comments per pull request: 2.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 4
- Pull requests: 1
- Average time to close issues: 1 day
- Average time to close pull requests: 6 months
- Issue authors: 4
- Pull request authors: 1
- Average comments per issue: 0.75
- Average comments per pull request: 2.0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- 18hfliu (4)
- charlotte0104 (2)
- cengc13 (1)
- chengfengke (1)
- Daisuke239 (1)
- lfs119 (1)
Pull Request Authors
- colbyford (2)
- hotwa (1)
Top Labels
Issue Labels
Pull Request Labels
Packages
- Total packages: 3
- Total downloads: unknown
- Total dependent packages: 0 (may contain duplicates)
- Total dependent repositories: 0 (may contain duplicates)
- Total versions: 3
proxy.golang.org: github.com/BioinfoMachineLearning/Bio-Diffusion
- Documentation: https://pkg.go.dev/github.com/BioinfoMachineLearning/Bio-Diffusion#section-documentation
- License: other
- Latest release: v0.0.1 (published over 1 year ago)
proxy.golang.org: github.com/bioinfomachinelearning/bio-diffusion
- Documentation: https://pkg.go.dev/github.com/bioinfomachinelearning/bio-diffusion#section-documentation
- License: other
- Latest release: v0.0.1 (published over 1 year ago)
proxy.golang.org: github.com/BioinfoMachineLearning/bio-diffusion
- Documentation: https://pkg.go.dev/github.com/BioinfoMachineLearning/bio-diffusion#section-documentation
- License: other
- Latest release: v0.0.1 (published over 1 year ago)