https://github.com/aqlaboratory/genie

De Novo Protein Design by Equivariantly Diffusing Oriented Residue Clouds

Science Score: 23.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
    Links to: arxiv.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (13.5%) to scientific vocabulary

Keywords

diffusion-models protein-design
Last synced: 5 months ago

Repository

De Novo Protein Design by Equivariantly Diffusing Oriented Residue Clouds

Basic Info
  • Host: GitHub
  • Owner: aqlaboratory
  • License: apache-2.0
  • Language: Python
  • Default Branch: main
  • Homepage:
  • Size: 124 MB
Statistics
  • Stars: 180
  • Watchers: 4
  • Forks: 23
  • Open Issues: 1
  • Releases: 0
Topics
diffusion-models protein-design
Created about 3 years ago · Last pushed almost 2 years ago
Metadata Files
Readme License

README.md

Genie: De Novo Protein Design by Equivariantly Diffusing Oriented Residue Clouds

This repository provides the implementation code for our ICML paper. The original README also includes an illustration of the sampling process.

Installation

Clone this repository and go into the root directory. Set up the package by running `pip install -e .`. This automatically installs the dependencies the code needs, including logging packages such as tensorboard and wandb.
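
One way to confirm that the install step pulled in the packages named above is to ask the import system whether it can locate them. The following is a minimal sketch; the helper name `missing_packages` is hypothetical and not part of Genie.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of top-level package names the import system cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # tensorboard and wandb are the logging packages named in the README;
    # genie is the package installed by `pip install -e .`.
    missing = missing_packages(["genie", "tensorboard", "wandb"])
    print("missing:", missing or "none")
```

An empty result suggests the editable install completed; any listed name points at a dependency worth reinstalling.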

Data Download

We provide the scripts we used to download and clean the SCOPe dataset. To download it, run `chmod +x scripts/install_dataset.sh` followed by `./scripts/install_dataset.sh`.

Training

To train Genie, create a directory `runs/[RUN_NAME]` and go into the directory. Create a configuration file named `configuration`. An example configuration file is provided in `example_configuration`, and the complete list of configurable parameters can be found in `genie/config.py`. Note that `name` in the configuration file must match RUN_NAME so that logs are written to the correct directory. To start training in the background on GPU 0, for example, run `python genie/train.py -c runs/RUN_NAME/configuration -g0 &`.
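
The directory layout and training command described above can be sketched in Python as follows. The helper names `prepare_run` and `train_command` are hypothetical and not part of Genie; the sketch only creates the run directory and assembles the command from the README, and it assumes the GPU index is appended directly to `-g` as in `-g0`.

```python
from pathlib import Path


def prepare_run(run_name, root="runs"):
    """Create runs/[RUN_NAME] and return the path where the configuration file belongs."""
    run_dir = Path(root) / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir / "configuration"


def train_command(run_name, gpu=0, root="runs"):
    """Build the README's training command as an argument list (without the trailing &)."""
    config_path = Path(root) / run_name / "configuration"
    # Assumption: the GPU index is written immediately after -g, as in -g0.
    return ["python", "genie/train.py", "-c", config_path.as_posix(), f"-g{gpu}"]
```

Passing the result of `train_command("my_run")` to `subprocess.Popen` from the repository root would start training without blocking the caller, which is roughly what the trailing `&` does in a shell.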

Sampling

To sample domains using your own trained Genie model, run `python genie/sample.py -n RUN_NAME -g0`. By default, this uses the checkpoint with the latest version and epoch. You can also specify the version and epoch with the `-v` and `-e` flags, respectively. This samples 10 domains per sequence length between 50 and 128, with a sampling batch size of 5. The outputs are stored in the directory `runs/[RUN_NAME]/version_[VERSION]/samples/epoch_[EPOCH]`.
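
To get a feel for the size of a default sampling run, the arithmetic can be sketched as below. This is an illustration only: the helper names `sampling_plan` and `samples_dir` are hypothetical, and the sketch assumes that both endpoints of the length range (50 and 128) are included, which the README does not state explicitly.

```python
import math
from pathlib import Path


def sampling_plan(min_length=50, max_length=128, per_length=10, batch_size=5):
    """Return (number of lengths, total samples, batches per length)."""
    # Assumption: the range is inclusive at both ends.
    n_lengths = max_length - min_length + 1
    batches_per_length = math.ceil(per_length / batch_size)
    return n_lengths, n_lengths * per_length, batches_per_length


def samples_dir(run_name, version, epoch, root="runs"):
    """Build runs/[RUN_NAME]/version_[VERSION]/samples/epoch_[EPOCH]."""
    return Path(root) / run_name / f"version_{version}" / "samples" / f"epoch_{epoch}"
```

Under the inclusive-range assumption, `sampling_plan()` returns `(79, 790, 2)`: 79 sequence lengths, 790 domains in total, and two batches of 5 per length.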

We also provide the weights for our trained model, which are available under the `weights` directory, together with the corresponding configuration file. To load the model, run

```
from genie.config import Config
from genie.diffusion.genie import Genie

config = Config('weights/configuration')
model = Genie.load_from_checkpoint('weights/genie_l_128_epoch=49999.ckpt', config=config)
```

Evaluation

To evaluate generated samples, we set up an evaluation pipeline based on ProteinMPNN and ESMFold. To set up the evaluation pipeline, run `./scripts/setup_evaluation_pipeline.sh`. To run it, use `python evaluations/pipeline/evaluate.py --input_dir INPUT_DIR --output_dir OUTPUT_DIR`. Here, the input directory must contain a subdirectory named `coords`, which holds the Cα coordinates generated by Genie. The evaluation results are written to the output directory.
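
Because the pipeline expects a `coords` subdirectory inside the input directory, it can help to check the layout before launching a long evaluation. The following is a minimal sketch; the helper names `has_coords` and `evaluate_command` are hypothetical and not part of Genie, and the sketch makes no assumption about the format of the coordinate files themselves.

```python
from pathlib import Path


def has_coords(input_dir):
    """Return True if input_dir contains a non-empty `coords` subdirectory."""
    coords = Path(input_dir) / "coords"
    return coords.is_dir() and any(coords.iterdir())


def evaluate_command(input_dir, output_dir):
    """Build the README's evaluation command as an argument list."""
    return [
        "python", "evaluations/pipeline/evaluate.py",
        "--input_dir", str(input_dir),
        "--output_dir", str(output_dir),
    ]
```

A caller might run `has_coords(INPUT_DIR)` first and only pass `evaluate_command(INPUT_DIR, OUTPUT_DIR)` to `subprocess.run` when the check succeeds.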

Owner

  • Name: AQ Laboratory
  • Login: aqlaboratory
  • Kind: organization
  • Email: m.alquraishi@columbia.edu
  • Location: Columbia University

GitHub Events

Total
  • Watch event: 54
  • Pull request event: 2
  • Fork event: 6
Last Year
  • Watch event: 54
  • Pull request event: 2
  • Fork event: 6

Issues and Pull Requests

Last synced: 10 months ago

All Time
  • Total issues: 4
  • Total pull requests: 1
  • Average time to close issues: 14 days
  • Average time to close pull requests: over 1 year
  • Total issue authors: 3
  • Total pull request authors: 1
  • Average comments per issue: 1.5
  • Average comments per pull request: 0.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: about 11 hours
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 1.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • joelmeyerson (2)
  • LarsDu (1)
  • SuperCarryDFY (1)
Pull Request Authors
  • joelmeyerson (2)
  • ivanmilevtues (1)