geom3d

Geom3D: Geometric Modeling on 3D Structures, NeurIPS 2023

https://github.com/chao1224/geom3d

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in README
  • Academic publication links
    Links to: arxiv.org, nature.com, acs.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.2%) to scientific vocabulary

Keywords

3d 3d-structures ai4science biology chemistry crystals drugs equivariance geometry group invariance material molecules physics proteins symmetry
Last synced: 6 months ago

Repository

Geom3D: Geometric Modeling on 3D Structures, NeurIPS 2023

Basic Info
Statistics
  • Stars: 123
  • Watchers: 2
  • Forks: 13
  • Open Issues: 4
  • Releases: 0
Created over 2 years ago · Last pushed over 1 year ago
Metadata Files
Readme License

README.md

Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials

Authors: Shengchao Liu, Weitao Du, Yanjing Li, Zhuoxinran Li, Zhiling Zheng, Chenru Duan, Zhiming Ma, Omar Yaghi, Anima Anandkumar, Christian Borgs, Jennifer Chayes, Hongyu Guo, Jian Tang

[ArXiv]

This is Geom3D, a platform for geometric modeling on 3D structures.

Environment

Conda

Set up Anaconda:
```bash
wget https://repo.continuum.io/archive/Anaconda3-2019.10-Linux-x86_64.sh
bash Anaconda3-2019.10-Linux-x86_64.sh -b
export PATH=$PWD/anaconda3/bin:$PATH
```

Packages

Start with some basic packages.
```bash
conda create -n Geom3D python=3.7
conda activate Geom3D
conda install -y -c rdkit rdkit
conda install -y numpy networkx scikit-learn
conda install -y -c conda-forge -c pytorch pytorch=1.9.1
conda install -y -c pyg -c conda-forge pyg=2.0.2
pip install ogb==1.2.1

pip install sympy

pip install ase

pip install lie_learn # for TFN and SE3-Trans

pip install packaging # for SEGNN
pip3 install e3nn # for SEGNN

pip install transformers # for smiles
pip install selfies # for selfies

pip install atom3d # for Atom3D
pip install cffi # for Atom3D
pip install biopython # for Atom3D

pip install cython # for pyximport

conda install -y -c conda-forge py-xgboost-cpu # for XGB
```
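After installation, a quick check can confirm that the packages listed above are importable. The following is a minimal sketch, not part of Geom3D: the helper `missing_modules` and the list `REQUIRED_MODULES` are hypothetical, and the import names (for example `sklearn` for scikit-learn, `Bio` for biopython, `torch_geometric` for pyg) are assumptions about how each package is imported.

```python
from importlib.util import find_spec

# Import names assumed to correspond to the packages installed above
# (e.g. scikit-learn imports as "sklearn", biopython as "Bio").
REQUIRED_MODULES = [
    "rdkit", "numpy", "networkx", "sklearn", "torch", "torch_geometric",
    "ogb", "sympy", "ase", "lie_learn", "packaging", "e3nn",
    "transformers", "selfies", "atom3d", "cffi", "Bio", "Cython", "xgboost",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

Run it inside the activated `Geom3D` environment. It only checks that each module can be located and does not verify versions.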

Datasets

We cover three types of datasets:
- Small Molecules
  - QM9
  - MD17
  - rMD17
  - COLL
- Proteins
  - EC
  - FOLD
- Small Molecules and Proteins
  - LBA
  - LEP
- Materials
  - MatBench
  - QMOF

For dataset acquisition:
- We provide a set of raw and processed datasets on HuggingFace. You can download the data by running `python download_data.py` under `./data`.
- Please refer to the `data` folder for more details.

Overview of Models

Representation Models

Geom3D includes the following representation models:
- SchNet, NeurIPS'18
- TFN, NeurIPS'18 Workshop
- DimeNet, ICLR'20
- SE(3)-Trans, NeurIPS'20
- EGNN, ICML'21
- PaiNN, ICML'21
- GemNet, NeurIPS'21
- SphereNet, ICLR'22
- SEGNN, ICLR'22
- NequIP, Nature Communications'22
- Allegro, Nature Communications'23
- Equiformer, ICLR'23
- GVP-GNN, ICLR'21
- IEConv, ICLR'21
- GearNet, ICLR'23
- ProNet, ICLR'23
- CDConv, ICLR'23

We also include the following 7 1D models and 11 2D models (specifically for small molecules):
- 1D Fingerprints: MLP, RF, XGB
- 1D SMILES: CNN, BERT
- 1D Selfies: CNN, BERT
- 2D topology:
  - GCN, NeurIPS'15
  - ENN-S2S, ICML'17
  - GraphSAGE, NeurIPS'17
  - GAT, ICLR'18
  - GIN, ICLR'19
  - D-MPNN, ACS-JCIM'19
  - N-Gram Graph, NeurIPS'19
  - PNA, NeurIPS'20
  - Graphormer, NeurIPS'21
  - AWARE, TMLR'22
  - GraphGPS, NeurIPS'22

Note that no pretraining is considered at this stage. For geometric pretraining models, please check the following section.

Geometric Pretraining

We include the following 14 geometric pretraining methods:

Scripts

The Python scripts can be found in `examples_3D`. We list the bash scripts (and hyperparameters) in `scripts`. For example, the bash script for SchNet on QM9 is:
```
cd examples_3D

export model_3d=SchNet
export dataset=QM9
export task_list=(mu alpha homo lumo gap r2 zpve u0 u298 h298 g298 cv)

export lr_list=(5e-4)
export lr_scheduler_list=(CosineAnnealingLR)
export split=customized_01
export seed=42
export emb_dim_list=(128 300)
export batch_size_list=(128)

export epochs=1000

# Sweep every combination of task and hyperparameters.
for task in "${task_list[@]}"; do
for lr in "${lr_list[@]}"; do
for lr_scheduler in "${lr_scheduler_list[@]}"; do
for emb_dim in "${emb_dim_list[@]}"; do
for batch_size in "${batch_size_list[@]}"; do

    # One output directory (and log file) per configuration.
    export output_model_dir=output/random/"$model_3d"/"$dataset"/"$task"_"$split"_"$seed"/"$lr"_"$lr_scheduler"_"$emb_dim"_"$batch_size"_"$epochs"
    export output_file="$output_model_dir"/result.out
    mkdir -p "$output_model_dir"

    python finetune_QM9.py \
    --model_3d="$model_3d" --dataset="$dataset" --epochs="$epochs" \
    --task="$task" \
    --split="$split" --seed="$seed" \
    --batch_size="$batch_size" \
    --emb_dim="$emb_dim" \
    --lr="$lr" --lr_scheduler="$lr_scheduler" --no_eval_train --print_every_epoch=1 --num_workers=8 \
    --output_model_dir="$output_model_dir" \
    > "$output_file"

done
done
done
done
done
```
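The nested loops in the script above enumerate a hyperparameter grid and derive one output directory per run. As a rough illustration of how many runs that produces and what the directory names look like, here is a minimal Python sketch. It is not part of Geom3D: the helpers `grid` and `output_dir` are hypothetical, the values are copied from the script, and the split name spelling `customized_01` is an assumption.

```python
from itertools import product

MODEL_3D = "SchNet"
DATASET = "QM9"
SPLIT = "customized_01"  # assumed spelling of the split name
SEED = 42
EPOCHS = 1000

TASKS = ["mu", "alpha", "homo", "lumo", "gap", "r2",
         "zpve", "u0", "u298", "h298", "g298", "cv"]
LRS = ["5e-4"]  # kept as strings so directory names match the shell script
LR_SCHEDULERS = ["CosineAnnealingLR"]
EMB_DIMS = [128, 300]
BATCH_SIZES = [128]


def grid():
    """All (task, lr, lr_scheduler, emb_dim, batch_size) combinations,
    in the same order as the nested shell loops (last item varies fastest)."""
    return list(product(TASKS, LRS, LR_SCHEDULERS, EMB_DIMS, BATCH_SIZES))


def output_dir(task, lr, lr_scheduler, emb_dim, batch_size):
    """Build the output directory path using the script's naming scheme."""
    return (
        f"output/random/{MODEL_3D}/{DATASET}/{task}_{SPLIT}_{SEED}/"
        f"{lr}_{lr_scheduler}_{emb_dim}_{batch_size}_{EPOCHS}"
    )


if __name__ == "__main__":
    configs = grid()
    print(f"{len(configs)} runs")
    for config in configs[:3]:
        print(output_dir(*config))
```

With 12 tasks and two embedding dimensions, the grid contains 24 runs, each launched sequentially by the shell script.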

Currently, only the bash scripts for QM9 are available. We will release the complete version soon, together with a notebook demo. Please stay tuned.

Checkpoints

Checkpoints for all the pretraining and downstream tasks will be released soon.

Cite us

Feel free to cite this work if you find it useful!

```
@article{liu2023symmetry,
    title={Symmetry-Informed Geometric Representation for Molecules, Proteins, and Crystalline Materials},
    author={Liu, Shengchao and Du, Weitao and Li, Yanjing and Li, Zhuoxinran and Zheng, Zhiling and Duan, Chenru and Ma, Zhiming and Yaghi, Omar and Anandkumar, Anima and Borgs, Christian and others},
    journal={arXiv preprint arXiv:2306.09375},
    year={2023}
}
```

Owner

  • Name: Shengchao Liu
  • Login: chao1224
  • Kind: user
  • Location: Montreal, QC, Canada
  • Company: Mila-UdeM

Ph.D. candidate @ Mila-UdeM

GitHub Events

Total
  • Watch event: 15
  • Fork event: 5
Last Year
  • Watch event: 15
  • Fork event: 5

Committers

Last synced: over 1 year ago

All Time
  • Total Commits: 9
  • Total Committers: 1
  • Avg Commits per committer: 9.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Shengchao Liu s****r@g****m 9

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 3
  • Total pull requests: 4
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 minute
  • Total issue authors: 3
  • Total pull request authors: 2
  • Average comments per issue: 1.0
  • Average comments per pull request: 0.25
  • Merged pull requests: 3
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • cheliu-computation (1)
  • newalexander (1)
  • BaruaBee (1)
Pull Request Authors
  • chao1224 (3)
  • YanjingLiLi (1)
Top Labels
Issue Labels
Pull Request Labels

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 27 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 0
  • Total versions: 1
  • Total maintainers: 1
pypi.org: geom3d

Geometric Modeling on 3D Data

  • Versions: 1
  • Dependent Packages: 0
  • Dependent Repositories: 0
  • Downloads: 27 Last month
Rankings
Dependent packages count: 7.6%
Stargazers count: 11.1%
Forks count: 23.0%
Average: 28.3%
Downloads: 30.3%
Dependent repos count: 69.5%
Maintainers (1)
Last synced: 6 months ago