https://github.com/aspuru-guzik-group/kreed

Code for Reflection-Equivariant Diffusion for 3D Structure Determination from Isotopologue Rotational Spectra in Natural Abundance

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 2 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
    Organization aspuru-guzik-group has institutional domain (aspuru.chem.harvard.edu)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (5.7%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: aspuru-guzik-group
  • License: mit
  • Language: Jupyter Notebook
  • Default Branch: main
  • Homepage:
  • Size: 409 MB
Statistics
  • Stars: 5
  • Watchers: 4
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 3 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License

README.md

kreed

KREED: Kraitchman REflection-Equivariant Diffusion

Paper and arXiv

Code for "Determining 3D structure from molecular formula and isotopologue rotational spectra in natural abundance with reflection-equivariant diffusion"

For a ready-to-use demonstration of the trained model, check out the Colab notebook.

Training the model

Follow the instructions in SETUP.md to set up the QM9 and GEOM datasets. Preprocessed datasets and generated samples can be found here

Setting up the conda environment:

    conda env create -f environment.yml

Command for training on QM9:

    python -m src.experimental.train --accelerator=gpu --devices=1 --num_workers=12 --dataset=qm9 --enable_wandb --wandb_run_id qm9_run --enable_progress_bar --check_samples_every_n_epoch 50

Command for training on GEOM:

    python -m src.experimental.train --accelerator=gpu --devices=1 --num_workers=12 --dataset=geom --enable_wandb --wandb_run_id geom_run --enable_progress_bar --check_samples_every_n_epoch 1 --batch_size 32 --max_epochs=100 --lr=2e-4

Running the same command with the same run ID (--wandb_run_id) resumes training from the last checkpoint saved for that run ID.
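
The resume behavior can be pictured with a minimal sketch. It assumes, hypothetically, that checkpoints live in a per-run directory named after the run ID and that the most recently modified file is the one to resume from. The directory layout, the `.ckpt` suffix, and the function name `find_last_checkpoint` are illustrative assumptions, not the repository's actual code.

```python
from pathlib import Path
from typing import Optional


def find_last_checkpoint(root: Path, run_id: str) -> Optional[Path]:
    """Return the most recently modified checkpoint for run_id, or None.

    Hypothetical layout (an assumption for illustration): <root>/<run_id>/*.ckpt
    """
    run_dir = root / run_id
    if not run_dir.is_dir():
        # First launch with this run ID: nothing to resume from.
        return None
    # Sort by modification time so the newest checkpoint comes last.
    checkpoints = sorted(run_dir.glob("*.ckpt"), key=lambda p: p.stat().st_mtime)
    return checkpoints[-1] if checkpoints else None
```

A training entry point could call `find_last_checkpoint(Path("checkpoints"), "qm9_run")` at startup and start from scratch when it returns `None`. Keying the lookup on the run ID is what would make rerunning an identical command continue the same run.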

Evaluation

Setup for running the baseline:

    python scripts/eval/make_baseline_jobs.py
    sbatch scripts/eval/run_baseline_jobs.py

This prepares and then submits an array of GNU parallel jobs, each of which runs commands of the form:

    python -m src.experimental.evaluate_baseline --save_dir=where_to_save --split=test --enable_save_samples_and_examples --num_chunks=128 --chunk_id=0 --dataset=geom

The dataset is evaluated in multiple chunks for parallelism. Each job saves checkpoints and can resume after preemption when the same command is rerun.
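
As a rough illustration of how `--num_chunks` and `--chunk_id` could partition an evaluation split, the sketch below assigns contiguous, near-equal index ranges to each chunk. The function name `chunk_indices` and the contiguous splitting rule are assumptions made for this example; the repository may divide the data differently.

```python
from typing import List


def chunk_indices(num_examples: int, num_chunks: int, chunk_id: int) -> List[int]:
    """Indices of the examples assigned to one chunk (contiguous split).

    Hypothetical helper: chunk sizes differ by at most one example.
    """
    if num_chunks <= 0:
        raise ValueError("num_chunks must be positive")
    if not 0 <= chunk_id < num_chunks:
        raise ValueError("chunk_id must be in [0, num_chunks)")
    base, extra = divmod(num_examples, num_chunks)
    # The first `extra` chunks receive one additional example each.
    start = chunk_id * base + min(chunk_id, extra)
    stop = start + base + (1 if chunk_id < extra else 0)
    return list(range(start, stop))
```

With 10 examples and 4 chunks, chunks 0 to 3 receive 3, 3, 2, and 2 examples respectively. Because each chunk depends only on `(num_examples, num_chunks, chunk_id)`, a preempted job that is rerun with the same arguments would be assigned the same examples.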

Scripts for evaluating the diffusion model:

  • scripts/eval/eval_p0.sh: with all naturally abundant substitution coordinates
  • scripts/eval/eval_p10.sh: with 10% dropout of substitution coordinates (QM9-C, GEOM-C)
  • scripts/eval/eval_rot_only.sh: with no substitution coordinates, only moments
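
The three evaluation settings differ in how many substitution coordinates the model gets to see. A minimal sketch of that idea follows, assuming each atom's coordinates are hidden independently with a fixed probability. The function name `drop_substitution_coords`, the use of `None` for hidden entries, and the independent per-atom dropout are assumptions for illustration, not the repository's implementation.

```python
import random
from typing import List, Optional, Sequence, Tuple

Coord = Tuple[float, float, float]


def drop_substitution_coords(
    coords: Sequence[Coord], p_drop: float, rng: random.Random
) -> List[Optional[Coord]]:
    """Hide each atom's substitution coordinates with probability p_drop.

    Hidden entries are returned as None; kept entries are passed through.
    """
    if not 0.0 <= p_drop <= 1.0:
        raise ValueError("p_drop must be in [0, 1]")
    # rng.random() is in [0, 1), so p_drop=0 keeps everything
    # and p_drop=1 hides everything.
    return [None if rng.random() < p_drop else c for c in coords]
```

Under these assumptions, `p_drop=0.0` and `p_drop=0.1` loosely mirror the eval_p0.sh and eval_p10.sh settings, and `p_drop=1.0` leaves only the moments as input, as in eval_rot_only.sh. Passing an explicit `random.Random(seed)` keeps the mask reproducible across reruns.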

Owner

  • Name: Aspuru-Guzik group repo
  • Login: aspuru-guzik-group
  • Kind: organization

GitHub Events

Total
  • Watch event: 2
Last Year
  • Watch event: 2