https://github.com/atomicarchitects/dens
[TMLR 2024] Generalizing Denoising to Non-Equilibrium Structures Improves Equivariant Force Fields
Science Score: 23.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file (found codemeta.json file)
- ○ .zenodo.json file
- ○ DOI references
- ✓ Academic publication links (links to: arxiv.org)
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity (low similarity of 11.8% to scientific vocabulary)
Keywords
Repository
[TMLR 2024] Generalizing Denoising to Non-Equilibrium Structures Improves Equivariant Force Fields
Basic Info
- Host: GitHub
- Owner: atomicarchitects
- License: mit
- Language: Python
- Default Branch: main
- Homepage: https://arxiv.org/abs/2403.09549
- Size: 11.4 MB
Statistics
- Stars: 31
- Watchers: 4
- Forks: 2
- Open Issues: 0
- Releases: 0
Topics
Metadata Files
README.md
Generalizing Denoising to Non-Equilibrium Structures Improves Equivariant Force Fields
This repository contains the official PyTorch implementation of the work "Generalizing Denoising to Non-Equilibrium Structures Improves Equivariant Force Fields" (TMLR 2024). We show that force encoding enables generalizing denoising to non-equilibrium structures and propose to use DeNS (Denoising Non-Equilibrium Structures) as an auxiliary task to improve the performance on energy and force predictions.
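To make the auxiliary task concrete, the sketch below is a minimal NumPy mock-up, not the repository's PyTorch implementation: `predict_noise` stands in for a force-conditioned equivariant network, positions are corrupted with Gaussian noise, and the loss scores how well the network recovers that noise.

```python
import numpy as np

def dens_auxiliary_loss(predict_noise, pos, forces, sigma=0.1, seed=0):
    """Illustrative DeNS-style objective: perturb atomic positions with
    Gaussian noise and ask a force-conditioned model to predict the noise.
    `predict_noise` is a hypothetical stand-in for the real network."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(scale=sigma, size=pos.shape)  # per-atom 3D noise
    noisy_pos = pos + noise
    pred = predict_noise(noisy_pos, forces)          # model sees forces too
    # Mean squared error between predicted and true noise, per structure.
    return float(np.mean(np.sum((pred - noise) ** 2, axis=-1)))
```

In the paper's setting this term is added to the usual energy and force losses; the sketch only shows the denoising piece.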
We provide the code for training EquiformerV2 with DeNS on OC20 and OC22 datasets here and training Equiformer with DeNS on MD17 in this repository.
As demonstrated in the OMat24 paper, EquiformerV2 + DeNS achieves state-of-the-art results on the Matbench Discovery leaderboard as of October 18, 2024.
Content
Environment Setup
Environment
See here for setting up the environment.
OC20
Please first set up the environment and file structures (place this repository under `ocp` and rename it to `experimental`) following the above Environment section.
The OC20 S2EF dataset can be downloaded by following instructions in their GitHub repository.
For example, we can download the OC20 S2EF-2M dataset by running:

```bash
cd ocp
python scripts/download_data.py --task s2ef --split "2M" --num-workers 8 --ref-energy
```
We also need to download the "val_id" data split to run training.
After downloading, the datasets should be under ocp/data.
To train on different splits like All and All+MD, we can follow the same link above to download the datasets.
OC22
Please first set up the environment and file structures (place this repository under `ocp` and rename it to `experimental`) following the above Environment section.
Similar to OC20, the OC22 dataset can be downloaded by following instructions in their GitHub repository.
MD17
Please refer to this repository for training Equiformer with DeNS on MD17.
File Structure
- `configs` contains config files for training with DeNS on different datasets.
- `datasets` contains the LMDB dataset class that can distinguish whether structures in OC20 come from the All split or the MD split.
- `model` contains EquiformerV2 and eSCN models capable of training with DeNS.
- `scripts` contains the scripts for launching training based on config files.
- `trainers` contains the code for training models for S2EF and with DeNS.
Training
OC20
Modify the paths to datasets before launching training. For example, we need to modify the path to the training set as here and the validation set as here before training EquiformerV2 with DeNS on OC20 S2EF-2M dataset for 12 epochs.
We train EquiformerV2 with DeNS on the OC20 S2EF-2M dataset for 12 epochs by running:
```bash
cd ocp/
sh experimental/scripts/train/oc20/s2ef/equiformer_v2/equiformer_dens_v2_N@12_L@6_M@2_epochs@12_splits@2M_g@multi-nodes.sh
```

Note that following the above Environment section, we run the script under `ocp`. This script uses 2 nodes with 8 GPUs on each node.

We can also run training on 8 GPUs on 1 node:
```bash
cd ocp/
sh experimental/scripts/train/oc20/s2ef/equiformer_v2/equiformer_dens_v2_N@12_L@6_M@2_epochs@12_splits@2M_g@8.sh
```

Note that this shows we can train on a single node; the results are not the same as training on 16 GPUs.

Similarly, we train EquiformerV2 with DeNS on the OC20 S2EF-2M dataset for 30 epochs by running:
```bash
cd ocp/
sh experimental/scripts/train/oc20/s2ef/equiformer_v2/equiformer_dens_v2_N@12_L@6_M@2_epochs@30_splits@2M_g@multi-nodes.sh
```

This script uses 4 nodes with 8 GPUs on each node.

We train EquiformerV2 with DeNS on the OC20 S2EF-All+MD dataset by running:
```bash
cd ocp/
sh experimental/scripts/train/oc20/s2ef/equiformer_v2/equiformer_dens_v2_N@20_L@6_M@3_splits@all-md_g@multi-nodes.sh
```

This script uses 16 nodes with 8 GPUs on each node.

We use a slightly different dataset class, `DeNSLmdbDataset`, so that we can differentiate whether a structure is from the All split or the MD split. This corresponds to the code here and requires `relaxations` and `md` to exist in `data_log.*.txt` files under the All+MD data directory. Those `data_log.*.txt` files should look like:

```bash
# for All split
/.../relaxations/.../random1331004.traj,258,365
...
```

After reading the LMDB files, the `DeNSLmdbDataset` dataset will add a new attribute `md` as here.
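For intuition, the split of a structure could be inferred from the trajectory path recorded in `data_log.*.txt` roughly as follows; `tag_split` is a hypothetical helper for illustration, not the actual `DeNSLmdbDataset` code.

```python
def tag_split(traj_path: str) -> str:
    """Label a structure's origin from its trajectory path:
    'all' for relaxation data, 'md' for molecular-dynamics data.
    (Illustrative sketch; path layout is an assumption.)"""
    parts = traj_path.split("/")
    if "relaxations" in parts:
        return "all"
    if "md" in parts:
        return "md"
    raise ValueError(f"cannot infer split from path: {traj_path}")
```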
OC22
Modify the paths to datasets before launching training. Specifically, we need to modify the path to the training set as here and the validation set as here.
In addition, we need to download the linear reference file from here and then add the path to the linear reference file as here and here.
Finally, we download the OC20 reference information file from here and add the path to that file as here and here.
We train EquiformerV2 with DeNS on OC22 dataset by running:
```bash
cd ocp/
sh experimental/scripts/train/oc22/s2ef/equiformer_v2/equiformer_dens_v2_N@18_L@6_M@2_epochs@6_g@multi-nodes.sh
```

This script uses 4 nodes with 8 GPUs on each node.
MD17
Please refer to this repository for training Equiformer with DeNS on MD17.
Checkpoint
We provide the checkpoints of EquiformerV2 trained with DeNS on the OC20 S2EF-2M dataset for 12 and 30 epochs, on the OC20 S2EF-All+MD dataset, and on the OC22 dataset.

| Split | Epochs | Download | val force MAE (meV / Å) | val energy MAE (meV) |
|---|---|---|---|---|
| OC20 S2EF-2M | 12 | checkpoint \| config | 19.09 | 269 |
| OC20 S2EF-2M | 30 | checkpoint \| config | 18.02 | 251 |
| OC20 S2EF-All+MD | 2 | checkpoint \| config | 14.0 | 222 |
| OC22 | 6 | checkpoint \| config | (ID) 20.66 / (OOD) 27.11 | (ID) 391.6 / (OOD) 533.0 |
Evaluation
We provide the evaluation script on OC20 and OC22 datasets. After following the above Environment section and downloading the checkpoints here, we run the script to evaluate the results on validation sets.
For instance, after updating the path to the validation set as here and CHECKPOINT as here, we evaluate the result of EquiformerV2 trained on OC20 S2EF-2M dataset for 12 epochs by running:
```bash
cd ocp/
sh experimental/scripts/evaluate/oc20/s2ef/equiformer_v2/equiformer_dens_v2_N@12_L@6_M@2_epochs@12_splits@2M_g@multi-nodes.sh
```
We can update the path in the config file to evaluate on different validation sub-splits and use different config files to evaluate different models.
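Switching validation sub-splits amounts to substituting one dataset path for another in the config. The helper and the `data/s2ef/...` paths below are hypothetical illustrations of that edit, not the repository's actual layout or tooling.

```python
def set_val_split(config_text: str, old_path: str, new_path: str) -> str:
    """Swap one dataset path for another in raw config text.
    Both paths are supplied by the caller; nothing repo-specific
    is assumed beyond the old path actually appearing in the config."""
    assert old_path in config_text, "expected the old path in the config"
    return config_text.replace(old_path, new_path)
```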
Citation
Please consider citing the works below if this repository is helpful:
DeNS:
```bibtex
@article{DeNS,
    title={Generalizing Denoising to Non-Equilibrium Structures Improves Equivariant Force Fields},
    author={Yi-Lun Liao and Tess Smidt and Muhammed Shuaibi* and Abhishek Das*},
    journal={arXiv preprint arXiv:2403.09549},
    year={2024}
}
```

EquiformerV2:
```bibtex
@inproceedings{equiformer_v2,
    title={{EquiformerV2: Improved Equivariant Transformer for Scaling to Higher-Degree Representations}},
    author={Yi-Lun Liao and Brandon Wood and Abhishek Das* and Tess Smidt*},
    booktitle={International Conference on Learning Representations (ICLR)},
    year={2024},
    url={https://openreview.net/forum?id=mCOBKZmrzD}
}
```

Equiformer:
```bibtex
@inproceedings{equiformer,
    title={{Equiformer: Equivariant Graph Attention Transformer for 3D Atomistic Graphs}},
    author={Yi-Lun Liao and Tess Smidt},
    booktitle={International Conference on Learning Representations (ICLR)},
    year={2023},
    url={https://openreview.net/forum?id=KwmPfARgOTD}
}
```
Please direct questions to Yi-Lun Liao (ylliao@mit.edu).
Acknowledgement
Our implementation is based on PyTorch, PyG, e3nn, timm, ocp, Equiformer, and EquiformerV2.
Owner
- Name: The Atomic Architects
- Login: atomicarchitects
- Kind: organization
- Location: United States of America
- Website: https://atomicarchitects.github.io/
- Twitter: AtomArchitects
- Repositories: 2
- Profile: https://github.com/atomicarchitects
Research Group of Prof. Tess Smidt
GitHub Events
Total
- Watch event: 22
- Push event: 3
- Fork event: 1
Last Year
- Watch event: 22
- Push event: 3
- Fork event: 1