https://github.com/aim-uofa/fadiff

[ICML 2024] Floating Anchor Diffusion Model for Multi-motif Scaffolding

Science Score: 36.0%

This score indicates how likely this project is to be science-related based on various indicators:

○ CITATION.cff file
✓ codemeta.json file (found)
✓ .zenodo.json file (found)
○ DOI references
✓ Academic publication links (links to arxiv.org)
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (16.8%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: aim-uofa
Language: Python
Default Branch: main
Size: 41.9 MB

Statistics

Stars: 29
Watchers: 3
Forks: 2
Open Issues: 2
Releases: 0

Created about 2 years ago · Last pushed almost 2 years ago

Metadata Files

Readme

Floating Anchor Diffusion Model for Multi-motif Scaffolding

This repository contains the source code accompanying the paper:

Floating Anchor Diffusion Model for Multi-motif Scaffolding, ICML 2024.

If you use our work, please cite:

    @article{liu2024floating,
      title={Floating Anchor Diffusion Model for Multi-motif Scaffolding},
      author={Liu, Ke and Mao, Weian and Shen, Shuaike and Jiao, Xiaoran and Sun, Zheng and Chen, Hao and Shen, Chunhua},
      booktitle={Forty-first International Conference on Machine Learning},
      year={2024},
      url={https://openreview.net/forum?id=CtgJUQxmEo}
    }

Installation

We recommend miniconda (or anaconda). Run the following to create a conda environment with the necessary dependencies:

    conda env create -f FADiff.yml

Next, we recommend installing our code as a package. To do this, run the following:

    pip install -e .

Training

Downloading the PDB for training

To get the training dataset, first download the PDB, then preprocess it with our provided scripts. The PDB can be downloaded from RCSB: https://www.wwpdb.org/ftp/pdb-ftp-sites#rcsbpdb. Our scripts assume you download in mmCIF format. Navigate down to "Download Protocols" and follow the instructions for your location.

WARNING: Downloading PDB can take up to 1TB of space.

After downloading, you should have a directory formatted like https://files.rcsb.org/pub/pdb/data/structures/divided/mmCIF/:

    00/
    01/
    02/
    ..
    zz/

In this directory, unzip all the files:

    gzip -d **/*.gz

Then run the following, with <pdb_dir> replaced with the location of PDB:

    python process_pdb_dataset.py --mmcif_dir <pdb_dir>

See the script for more options. Each mmCIF file will be written as a pickle file that we read and process in the data loading pipeline. A metadata.csv will also be saved; it contains the pickle path of each example as well as additional information about each example for faster filtering.
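For readers who prefer to do the unzip step from Python (for example on a system where the shell glob does not expand as expected), the following is a minimal sketch. The function name `decompress_mmcif_tree` is hypothetical and not part of this repository. It walks the directory recursively, so it also handles layouts nested more deeply than the `00/ .. zz/` folders.

```python
import gzip
import shutil
from pathlib import Path


def decompress_mmcif_tree(pdb_dir: str) -> int:
    """Decompress every *.gz file under pdb_dir in place.

    Hypothetical helper: roughly equivalent to running `gzip -d` on each file.
    Returns the number of files decompressed.
    """
    count = 0
    # Materialize the file list first, because files are deleted while iterating.
    for gz_path in sorted(Path(pdb_dir).rglob("*.gz")):
        out_path = gz_path.with_suffix("")  # e.g. 1abc.cif.gz -> 1abc.cif
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_path.unlink()  # like gzip -d, drop the compressed original
        count += 1
    return count
```

Calling `decompress_mmcif_tree("<pdb_dir>")` before running `process_pdb_dataset.py` should leave plain mmCIF files in each subdirectory.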

For PDB files, we provide some starter code in process_pdb_files.py showing how to modify process_pdb_dataset.py to work with PDB files (as we did at an earlier point in the project). This has not been tested. Please make a pull request if you create a PDB file processing script.

Downloading PDB clusters

To use clustered training data, download the clusters at 30% sequence identity from RCSB. This download link also works at the time of writing: https://cdn.rcsb.org/resources/sequence/clusters/clusters-by-entity-30.txt

Place this file in data/processed_pdb or anywhere in your file system. Update your config to point to the clustered data:

    data:
      cluster_path: ./data/processed_pdb/clusters-by-entity-30.txt

To use clustered data, set sample_mode to either cluster_time_batch or cluster_length_batch. See the next section for details.
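To illustrate what the cluster file provides, here is a minimal sketch of reading it into a lookup table. It assumes each line of the file is one cluster, written as whitespace-separated entity IDs of the form `<PDB ID>_<entity number>` (for example `1ABC_1`). The function name `read_clusters` and the choice to key by PDB ID are illustrative assumptions and may differ from what the data loader in this repository does.

```python
from typing import Dict


def read_clusters(cluster_path: str) -> Dict[str, int]:
    """Map each PDB ID (upper case) to the index of the cluster line it appears on.

    Assumes one cluster per line, with whitespace-separated IDs such as 1ABC_1.
    """
    pdb_to_cluster: Dict[str, int] = {}
    with open(cluster_path) as handle:
        for cluster_index, line in enumerate(handle):
            for entity_id in line.split():
                pdb_id = entity_id.split("_")[0].upper()
                # Simplification: keep the first cluster seen for each PDB ID.
                pdb_to_cluster.setdefault(pdb_id, cluster_index)
    return pdb_to_cluster
```

With such a table, a cluster-based sampler can first pick a cluster and then pick one example inside it, so that large families of near-identical sequences do not dominate training.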

Batching modes

```yaml
experiment:
  # Use one of the following.

  # Each batch contains multiple time steps of the same protein.
  sample_mode: time_batch

  # Each batch contains multiple proteins of the same length.
  sample_mode: length_batch

  # Each batch contains multiple time steps of a protein from a cluster.
  sample_mode: cluster_time_batch

  # Each batch contains multiple clusters of the same length.
  sample_mode: cluster_length_batch
```
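As an illustration of the "same length" batching idea, the sketch below groups examples by length so that no padding is needed inside a batch. It is a simplified stand-in, not the sampler used in this repository; the function name `length_batches` and its arguments are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List


def length_batches(
    lengths: Dict[str, int], batch_size: int, seed: int = 0
) -> Iterator[List[str]]:
    """Yield batches of example names in which every example has the same length."""
    by_length: Dict[int, List[str]] = defaultdict(list)
    for name, length in lengths.items():
        by_length[length].append(name)

    rng = random.Random(seed)
    for length in sorted(by_length):
        names = by_length[length]
        rng.shuffle(names)  # randomize order within a length bucket
        for start in range(0, len(names), batch_size):
            yield names[start:start + batch_size]
```

For example, `list(length_batches({"a": 10, "b": 10, "c": 12}, batch_size=2))` yields one batch holding the two length-10 examples and one batch holding the length-12 example. The time-step modes follow the same pattern, except that a batch repeats one protein at several diffusion time steps.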

Launching training

    bash train.sh

which contains

    export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
    export MASTER_PORT=6003
    export NCCL_P2P_DISABLE=1
    python -m torch.distributed.run \
        --nnodes 1 \
        --nproc_per_node=8 \
        --master_port=29504 \
        experiments/train_se3_diffusion.py \
        --config-name=train
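When debugging a multi-GPU launch, it can help to print which worker a process is. `torch.distributed.run` starts one process per GPU and passes each its identity through the environment variables `RANK`, `LOCAL_RANK` and `WORLD_SIZE`. The sketch below only reads those variables. The helper name `describe_worker` is hypothetical and is not part of the training script.

```python
import os
from typing import Mapping, Optional


def describe_worker(env: Optional[Mapping[str, str]] = None) -> str:
    """Summarize this process's place in a torch.distributed.run launch.

    Falls back to single-process defaults when the variables are not set,
    for example when the script is started directly with `python`.
    """
    env = os.environ if env is None else env
    rank = int(env.get("RANK", 0))              # global index of this process
    local_rank = int(env.get("LOCAL_RANK", 0))  # index on this node (GPU slot)
    world_size = int(env.get("WORLD_SIZE", 1))  # total number of processes
    return f"rank {rank}/{world_size} (local rank {local_rank})"
```

With `--nnodes 1 --nproc_per_node=8` as above, the eight workers would report ranks 0 through 7 with a world size of 8.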

License

For non-commercial academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.

Acknowledgements

This code is built upon FrameDiff "SE(3) diffusion model with application to protein backbone generation": https://arxiv.org/abs/2302.02277

Owner

Name: Advanced Intelligent Machines (AIM)
Login: aim-uofa
Kind: organization
Location: China

Repositories: 23
Profile: https://github.com/aim-uofa

A research team at Zhejiang University, focusing on Computer Vision and broad AI research ...

GitHub Events

Total

Watch event: 17
Fork event: 3

Last Year

Watch event: 17
Fork event: 3

Issues and Pull Requests

Last synced: about 1 year ago

All Time

Total issues: 2
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 2
Total pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 2
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 2
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
