https://github.com/aim-uofa/fadiff
[ICML 2024] Floating Anchor Diffusion Model for Multi-motif Scaffolding
Basic Info
- Host: GitHub
- Owner: aim-uofa
- Language: Python
- Default Branch: main
- Size: 41.9 MB
Statistics
- Stars: 29
- Watchers: 3
- Forks: 2
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
Floating Anchor Diffusion Model for Multi-motif Scaffolding
This repository contains the source code accompanying the paper:
Floating Anchor Diffusion Model for Multi-motif Scaffolding, ICML 2024.
If you use our work, please cite:
@inproceedings{liu2024floating,
  title={Floating Anchor Diffusion Model for Multi-motif Scaffolding},
  author={Liu, Ke and Mao, Weian and Shen, Shuaike and Jiao, Xiaoran and Sun, Zheng and Chen, Hao and Shen, Chunhua},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024},
  url={https://openreview.net/forum?id=CtgJUQxmEo}
}
Installation
We recommend miniconda (or anaconda).
Run the following to install a conda environment with the necessary dependencies.
```bash
conda env create -f FADiff.yml
```
Next, we recommend installing our code as a package. To do this, run the following.
```bash
pip install -e .
```
Training
Downloading the PDB for training
To build the training dataset, first download the PDB, then preprocess it with the provided scripts. The PDB can be downloaded from RCSB: https://www.wwpdb.org/ftp/pdb-ftp-sites#rcsbpdb. Our scripts assume the mmCIF format. On that page, navigate to "Download Protocols" and follow the instructions for your location.
WARNING: Downloading the PDB can take up to 1 TB of disk space.
After downloading, you should have a directory laid out like https://files.rcsb.org/pub/pdb/data/structures/divided/mmCIF/, with two-character subdirectories:
00/
01/
02/
..
zz/
In this directory, unzip all the files:
```bash
gzip -d **/*.gz
```
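Note that `**` only recurses in shells with globstar support (zsh, or bash after `shopt -s globstar`). As a shell-independent alternative, the unzip step can be sketched in Python using only the standard library. The function name `decompress_tree` is hypothetical and not part of this repository.

```python
import gzip
import shutil
from pathlib import Path


def decompress_tree(root: Path) -> int:
    """Decompress every *.gz file under root in place and return the count."""
    count = 0
    # Materialize the listing first, since files are added and removed below.
    for gz_path in sorted(root.rglob("*.gz")):
        out_path = gz_path.with_suffix("")  # e.g. 100d.cif.gz -> 100d.cif
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        gz_path.unlink()  # mirror `gzip -d`, which removes the archive
        count += 1
    return count
```

Calling `decompress_tree(Path("<pdb_dir>"))` should leave the same files on disk as the `gzip -d` command above.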
Then run the preprocessing script:
```bash
python process_pdb_dataset.py --mmcif_dir <pdb_dir>
```
See the script for more options. Each mmCIF will be written as a pickle file that
we read and process in the data loading pipeline. A metadata.csv will be saved
that contains the pickle path of each example as well as additional information
about each example for faster filtering.
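As a rough illustration of how metadata.csv can speed up filtering, the sketch below selects pickle paths by sequence length without opening any pickle. The column names `processed_path` and `seq_len` and the sample rows are assumptions made for this example; check the header written by process_pdb_dataset.py for the actual names.

```python
import csv
import io

# Hypothetical sample rows; the real metadata.csv is written by
# process_pdb_dataset.py and its column names may differ.
SAMPLE_CSV = """processed_path,seq_len
./data/processed_pdb/00/100d.pkl,20
./data/processed_pdb/ab/1abc.pkl,256
./data/processed_pdb/zz/9zzz.pkl,900
"""


def filter_by_length(csv_file, min_len, max_len,
                     len_col="seq_len", path_col="processed_path"):
    """Return pickle paths whose length column lies in [min_len, max_len]."""
    reader = csv.DictReader(csv_file)
    return [row[path_col] for row in reader
            if min_len <= int(row[len_col]) <= max_len]


paths = filter_by_length(io.StringIO(SAMPLE_CSV), min_len=60, max_len=512)
print(paths)
```

With a real file, pass `open("metadata.csv", newline="")` instead of the in-memory sample.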
For PDB files, process_pdb_files.py provides starter code showing how to
modify process_pdb_dataset.py to work with PDB files (as we did at an earlier
point in the project). This has not been tested. Please open a pull request
if you create a PDB file processing script.
Downloading PDB clusters
To use clustered training data, download the clusters at 30% sequence identity
from RCSB.
The following direct link works at the time of writing:
https://cdn.rcsb.org/resources/sequence/clusters/clusters-by-entity-30.txt
Place this file in data/processed_pdb or anywhere in your file system.
Update your config to point to the clustered data:
```yaml
data:
  cluster_path: ./data/processed_pdb/clusters-by-entity-30.txt
```
To use clustered data, set sample_mode to either cluster_time_batch or cluster_length_batch.
See the next section for details.
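To make the role of the cluster file concrete, here is a minimal sketch that maps PDB IDs to cluster indices. It assumes each line of the file lists one cluster as whitespace-separated `<PDB ID>_<entity ID>` entries. The function `parse_clusters` and the sample IDs are hypothetical, and the repository's own loader may treat entities differently.

```python
def parse_clusters(lines):
    """Map each lowercase PDB ID to the index of the line (cluster) it appears on."""
    pdb_to_cluster = {}
    for cluster_idx, line in enumerate(lines):
        for member in line.split():
            pdb_id = member.split("_")[0].lower()
            # Simplification: a PDB ID with several entities keeps
            # the first cluster in which it was seen.
            pdb_to_cluster.setdefault(pdb_id, cluster_idx)
    return pdb_to_cluster


# Made-up entries in the assumed format, one cluster per line.
sample = ["1ABC_1 2XYZ_1", "3DEF_2", "1ABC_2 4GHI_1"]
mapping = parse_clusters(sample)
print(mapping)
```

For the downloaded file, the same function can be applied to an open file handle, for example `parse_clusters(open(cluster_path))`.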
Batching modes
```yaml
experiment:
  # Use one of the following.

  # Each batch contains multiple time steps of the same protein.
  sample_mode: time_batch

  # Each batch contains multiple proteins of the same length.
  sample_mode: length_batch

  # Each batch contains multiple time steps of a protein from a cluster.
  sample_mode: cluster_time_batch

  # Each batch contains multiple clusters of the same length.
  sample_mode: cluster_length_batch
```
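The difference between the time-based and length-based modes can be sketched as follows. This is an illustration of the two grouping ideas only, not the repository's sampler: the function names, the example proteins, and the choice of drawing times uniformly from [0, 1) are all assumptions.

```python
import random
from collections import defaultdict


def time_batch(name, batch_size, rng):
    """Pair one protein with batch_size independently drawn times in [0, 1)."""
    return [(name, rng.random()) for _ in range(batch_size)]


def group_by_length(examples):
    """Group (name, length) pairs so each batch holds proteins of one length."""
    batches = defaultdict(list)
    for name, length in examples:
        batches[length].append(name)
    return dict(batches)


# Hypothetical (name, length) examples.
examples = [("a", 100), ("b", 120), ("c", 100), ("d", 120), ("e", 95)]
batches = group_by_length(examples)
print(batches)
print(time_batch("a", batch_size=4, rng=random.Random(0)))
```

Grouping by length means a batch needs no padding. The cluster variants would presumably apply the same two ideas after first drawing a representative from each cluster.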
Launching training
```shell
bash train.sh
```
which contains
```shell
export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
export MASTER_PORT=6003
export NCCL_P2P_DISABLE=1
python -m torch.distributed.run \
    --nnodes 1 \
    --nproc_per_node=8 \
    --master_port=29504 \
    experiments/train_se3_diffusion.py \
    --config-name=train
```
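The launcher starts one process per GPU and passes each process its identity through the environment variables `RANK`, `LOCAL_RANK`, and `WORLD_SIZE`. The sketch below shows one way a script might read them. The helper `read_dist_env` is hypothetical, and whether experiments/train_se3_diffusion.py reads them this way is an assumption.

```python
import os


def read_dist_env(env=os.environ):
    """Read the per-process variables set by torch.distributed.run.

    Falls back to single-process defaults when the variables are absent.
    """
    return {
        "rank": int(env.get("RANK", 0)),
        "local_rank": int(env.get("LOCAL_RANK", 0)),
        "world_size": int(env.get("WORLD_SIZE", 1)),
    }


print(read_dist_env())
```

With the launch command above (one node, eight processes), each process would see a `world_size` of 8 and a distinct `local_rank` from 0 to 7.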
License
For non-commercial academic use, this project is licensed under the 2-clause BSD License. For commercial use, please contact Chunhua Shen.
Acknowledgements
This code builds on FrameDiff, "SE(3) diffusion model with application to protein backbone generation": https://arxiv.org/abs/2302.02277
Owner
- Name: Advanced Intelligent Machines (AIM)
- Login: aim-uofa
- Kind: organization
- Location: China
- Repositories: 23
- Profile: https://github.com/aim-uofa
A research team at Zhejiang University, focusing on Computer Vision and broad AI research ...