https://github.com/aspuru-guzik-group/kreed
Code for Reflection-Equivariant Diffusion for 3D Structure Determination from Isotopologue Rotational Spectra in Natural Abundance
Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
○CITATION.cff file
✓codemeta.json file
Found codemeta.json file
✓.zenodo.json file
Found .zenodo.json file
✓DOI references
Found 2 DOI reference(s) in README
✓Academic publication links
Links to: arxiv.org
○Academic email domains
✓Institutional organization owner
Organization aspuru-guzik-group has institutional domain (aspuru.chem.harvard.edu)
○JOSS paper metadata
○Scientific vocabulary similarity
Low similarity (5.7%) to scientific vocabulary
Repository
Statistics
- Stars: 5
- Watchers: 4
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
kreed
KREED: Kraitchman REflection-Equivariant Diffusion
Code for "Determining 3D structure from molecular formula and isotopologue rotational spectra in natural abundance with reflection-equivariant diffusion"
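The "Kraitchman" in KREED refers to Kraitchman's equations, which convert the change in moments of inertia on a single isotopic substitution into the absolute values of the substituted atom's principal-axis coordinates (the "substitution coordinates" used in the evaluation scripts). Because only squares are determined, the signs of the coordinates, and hence reflections, remain ambiguous. The sketch below implements the textbook single-substitution form for an asymmetric top, written symmetrically in the isotopologue's planar moments P'_1, P'_2, P'_3: |a|^2 = (P'_1 - P_a)(P'_2 - P_a)(P'_3 - P_a) / [mu (P_b - P_a)(P_c - P_a)], with mu = M*dm / (M + dm), and likewise for |b| and |c|. It is not taken from the repository; the function name and argument layout are hypothetical.

```python
import math
from typing import Sequence, Tuple


def kraitchman_abs_coords(
    parent_planar: Sequence[float],
    sub_planar: Sequence[float],
    total_mass: float,
    delta_mass: float,
) -> Tuple[float, ...]:
    """Single-substitution Kraitchman equations for an asymmetric top.

    parent_planar: principal planar moments (P_a, P_b, P_c) of the parent,
        where P_x = sum_i m_i * x_i**2 in the principal-axis frame.
    sub_planar: principal planar moments of the singly substituted
        isotopologue (order does not matter; the formula is symmetric in them).
    total_mass: total mass M of the parent molecule.
    delta_mass: mass change of the substituted atom.
    Returns (|a|, |b|, |c|) of the substituted atom in the parent frame.
    """
    mu = total_mass * delta_mass / (total_mass + delta_mass)  # reduced mass of the substitution
    coords = []
    for i, p_i in enumerate(parent_planar):
        numerator = 1.0
        for p_new in sub_planar:
            numerator *= p_new - p_i
        denominator = mu
        for j, p_j in enumerate(parent_planar):
            if j != i:
                denominator *= p_j - p_i  # requires distinct parent planar moments
        # Only the square is determined, so the sign of each coordinate is lost.
        # Clamp small negative values (e.g. from noisy data for atoms near an axis) to zero.
        coords.append(math.sqrt(max(numerator / denominator, 0.0)))
    return tuple(coords)
```

The inputs are assumed to already be planar moments in consistent units (e.g. amu·Å²); with measured spectra they would first be derived from the fitted rotational constants.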
For a ready-to-use demonstration of the trained model, check out the Colab notebook.
Training the model
Follow the instructions in SETUP.md to set up the QM9 and GEOM datasets. Preprocessed datasets and generated samples can be found here.
Setting up the conda environment:
conda env create -f environment.yml
Command for training on QM9:
python -m src.experimental.train --accelerator=gpu --devices=1 --num_workers=12 --dataset=qm9 --enable_wandb --wandb_run_id qm9_run --enable_progress_bar --check_samples_every_n_epoch 50
Command for training on GEOM:
python -m src.experimental.train --accelerator=gpu --devices=1 --num_workers=12 --dataset=geom --enable_wandb --wandb_run_id geom_run --enable_progress_bar --check_samples_every_n_epoch 1 --batch_size 32 --max_epochs=100 --lr=2e-4
Rerunning the same command with the same run ID (--wandb_run_id) resumes training from the last checkpoint saved for that run.
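Resuming by run ID amounts to looking up the most recent checkpoint stored under that ID. A minimal sketch, assuming a hypothetical layout <ckpt_root>/<run_id>/*.ckpt with an optional last.ckpt; the repository's actual checkpoint directory and file names may differ, and find_resume_checkpoint is an illustrative name.

```python
from pathlib import Path
from typing import Optional


def find_resume_checkpoint(ckpt_root: str, run_id: str) -> Optional[Path]:
    """Return the checkpoint to resume from for run_id, or None for a fresh run."""
    run_dir = Path(ckpt_root) / run_id  # hypothetical layout: <ckpt_root>/<run_id>/*.ckpt
    if not run_dir.is_dir():
        return None  # no checkpoints yet: start training from scratch
    last = run_dir / "last.ckpt"
    if last.is_file():
        return last  # prefer an explicit "latest" checkpoint if one exists
    # Otherwise fall back to the most recently modified checkpoint file.
    ckpts = sorted(run_dir.glob("*.ckpt"), key=lambda p: p.stat().st_mtime)
    return ckpts[-1] if ckpts else None
```

If the training script is built on PyTorch Lightning (an assumption suggested by the --accelerator and --devices flags), the returned path could be passed as trainer.fit(model, ckpt_path=...).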
Evaluation
Setting up and running the baseline:
python scripts/eval/make_baseline_jobs.py
sbatch scripts/eval/run_baseline_jobs.py
This prepares and then submits an array of GNU parallel jobs, each of which runs commands like:
python -m src.experimental.evaluate_baseline --save_dir=where_to_save --split=test --enable_save_samples_and_examples --num_chunks=128 --chunk_id=0 --dataset=geom
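One way to picture the preparation step is generating one command line per chunk, which GNU parallel can then read one per line (e.g. parallel -j 8 < jobs.txt). This is a hypothetical sketch that only reproduces the flags shown in the example command; the real scripts/eval/make_baseline_jobs.py may group or schedule jobs differently.

```python
from typing import List


def make_baseline_commands(
    save_dir: str, dataset: str = "geom", split: str = "test", num_chunks: int = 128
) -> List[str]:
    """Build one evaluate_baseline command per chunk (hypothetical helper)."""
    base = [
        "python", "-m", "src.experimental.evaluate_baseline",
        f"--save_dir={save_dir}",
        f"--split={split}",
        "--enable_save_samples_and_examples",
        f"--num_chunks={num_chunks}",
    ]
    # Each chunk differs only in --chunk_id; the flag order mirrors the example above.
    return [" ".join(base + [f"--chunk_id={i}", f"--dataset={dataset}"]) for i in range(num_chunks)]
```

Writing the returned list to a file with "\n".join(...) gives a job list in which every line is an independent command.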
The dataset is evaluated in multiple chunks in parallel. Each job saves checkpoints and, if preempted, can resume by rerunning the same command.
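Dividing a split into --num_chunks contiguous pieces, selecting one with --chunk_id, and checkpointing after every item can be sketched as follows. The contiguous chunking and the JSON progress file are assumptions for illustration; evaluate_baseline's actual chunking scheme and checkpoint format are not documented here, and chunk_bounds and run_chunk are hypothetical names.

```python
import json
from pathlib import Path
from typing import Any, Callable, List, Sequence, Tuple, Union


def chunk_bounds(n_items: int, num_chunks: int, chunk_id: int) -> Tuple[int, int]:
    """Half-open [start, stop) range of chunk_id; chunk sizes differ by at most one."""
    if not 0 <= chunk_id < num_chunks:
        raise ValueError("chunk_id must be in [0, num_chunks)")
    base, extra = divmod(n_items, num_chunks)
    start = chunk_id * base + min(chunk_id, extra)
    stop = start + base + (1 if chunk_id < extra else 0)
    return start, stop


def run_chunk(
    items: Sequence[Any],
    num_chunks: int,
    chunk_id: int,
    progress_file: Union[str, Path],
    process: Callable[[Any], Any],
) -> List[Any]:
    """Process one chunk, persisting each result so a rerun skips finished items."""
    start, stop = chunk_bounds(len(items), num_chunks, chunk_id)
    path = Path(progress_file)
    done = json.loads(path.read_text()) if path.exists() else {}
    for idx in range(start, stop):
        key = str(idx)  # JSON object keys are strings
        if key in done:
            continue  # finished before preemption
        done[key] = process(items[idx])  # results must be JSON-serializable here
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(done))
        tmp.replace(path)  # rename over the old file so a kill mid-write cannot corrupt it
    return [done[str(idx)] for idx in range(start, stop)]
```

Rerunning run_chunk with the same arguments after a preemption only calls process on items that have no saved result yet.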
Scripts for evaluating the diffusion model:
- scripts/eval/eval_p0.sh - with all naturally abundant substitution coordinates
- scripts/eval/eval_p10.sh - with 10% dropout of substitution coordinates (QM9-C, GEOM-C)
- scripts/eval/eval_rot_only.sh - with no substitution coordinates, only moments
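The 10% setting can be pictured as randomly hiding part of the substitution coordinates before they are given to the model. A minimal sketch, assuming each atom's coordinate triple is dropped independently with probability 0.1 and that a missing entry is represented by None; eval_p10.sh may implement the dropout differently (for example, by dropping an exact fraction), and drop_substitution_coords is a hypothetical name.

```python
import random
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]  # (|a|, |b|, |c|) substitution coordinates of one atom


def drop_substitution_coords(
    coords: Sequence[Optional[Coord]],
    p: float = 0.1,
    rng: Optional[random.Random] = None,
) -> List[Optional[Coord]]:
    """Replace each atom's substitution coordinates with None with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    rng = rng if rng is not None else random.Random()
    # rng.random() is uniform on [0, 1), so p=0 keeps everything and p=1 drops everything.
    return [None if rng.random() < p else c for c in coords]
```

Passing a seeded random.Random makes the dropout reproducible across evaluation runs.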
Owner
- Name: Aspuru-Guzik group repo
- Login: aspuru-guzik-group
- Kind: organization
- Website: http://aspuru.chem.harvard.edu/
- Repositories: 30
- Profile: https://github.com/aspuru-guzik-group
GitHub Events
Total
- Watch event: 2
Last Year
- Watch event: 2