https://github.com/bioinfomachinelearning/gate

Graph transformer for estimating protein model accuracy

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: BioinfoMachineLearning
Language: C
Default Branch: main
Size: 187 MB

Statistics

Stars: 4
Watchers: 3
Forks: 1
Open Issues: 1
Releases: 0

Created about 3 years ago · Last pushed 9 months ago

Metadata Files

Readme

GATE: Graph Transformers for Estimating Protein Model Accuracy

Introduction

GATE is a tool designed for estimating protein model accuracy using advanced graph transformers. This repository contains the code, pre-trained models, and instructions for setup and usage.

[Figure: Program workflow]

[Figure: CASP16 result] Overall performance of 23 CASP16 predictors in estimating the accuracy of the structural models of 36 out of 38 CASP16 multimer targets, measured by the z-scores of multiple evaluation metrics (Pearson's correlation, Spearman's correlation, AUC, and ranking loss) in terms of both TM-score and oligomer GDT-TS score. Each kind of z-score is denoted by a colored bar. The predictors are ordered by the weighted sum of all the z-scores.
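The ordering described in the caption can be sketched as follows. This is a minimal illustration, not the CASP16 assessors' code: the equal weights, the sign flip for ranking loss (where lower is better), and the toy values are all assumptions, and the function names are hypothetical.

```python
from statistics import mean, pstdev


def z_scores(values, higher_is_better=True):
    """Standardize one metric across predictors; flip the sign if lower is better."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    sign = 1.0 if higher_is_better else -1.0
    return [sign * (v - mu) / sigma for v in values]


def rank_by_weighted_z(metrics, weights, higher_is_better):
    """metrics maps metric name -> {predictor: value}; returns predictors best-first."""
    totals = {}
    for name, per_predictor in metrics.items():
        predictors = list(per_predictor)
        zs = z_scores([per_predictor[p] for p in predictors], higher_is_better[name])
        for p, z in zip(predictors, zs):
            totals[p] = totals.get(p, 0.0) + weights[name] * z
    return sorted(totals, key=totals.get, reverse=True)


# Toy values for three hypothetical predictors (not CASP16 data).
metrics = {
    "pearson": {"A": 0.70, "B": 0.60, "C": 0.30},
    "ranking_loss": {"A": 0.12, "B": 0.13, "C": 0.20},
}
weights = {"pearson": 1.0, "ranking_loss": 1.0}  # assumed equal weights
higher_is_better = {"pearson": True, "ranking_loss": False}

order = rank_by_weighted_z(metrics, weights, higher_is_better)
print(order)
```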

Table 1. Average per-target evaluation metrics (Pearson's correlation, Spearman's correlation, ranking loss and AUC) of 23 CASP16 predictors in terms of TM-score and Oligo-GDT-TS. The best performance for each metric is marked with [BEST], second-best with [2nd], and third-best with [3rd].

| Predictor Name | Pearson Corr (TM-score) | Spearman Corr (TM-score) | Ranking Loss (TM-score) | AUC (TM-score) | Pearson Corr (Oligo-GDT-TS) | Spearman Corr (Oligo-GDT-TS) | Ranking Loss (Oligo-GDT-TS) | AUC (Oligo-GDT-TS) |
|---|---|---|---|---|---|---|---|---|
| MULTICOMLLM | 0.6836 [2nd] | 0.4808 [BEST] | 0.1230 | 0.6685 [2nd] | 0.6722 [3rd] | 0.4656 [2nd] | 0.1252 [BEST] | 0.6603 [3rd] |
| **MULTICOMGATE** | 0.7076 [BEST] | 0.4514 | 0.1221 [3rd] | 0.6680 [3rd] | 0.7235 [2nd] | 0.4399 [3rd] | 0.1328 [2nd] | 0.6461 |
| AssemblyConsensus | 0.6367 | 0.4661 [2nd] | 0.1824 | 0.6584 | 0.7701 [BEST] | 0.5163 [BEST] | 0.1753 | 0.6702 [BEST] |
| ModFOLDdock2 | 0.6542 [3rd] | 0.4640 [3rd] | 0.1371 | 0.6859 [BEST] | 0.6547 | 0.4143 | 0.1530 | 0.6588 |
| MULTICOM | 0.6156 | 0.4380 | 0.1207 [2nd] | 0.6660 | 0.6413 | 0.4319 | 0.1368 [3rd] | 0.6536 |
| MIEnsembles-Server | 0.6072 | 0.4498 | 0.1325 | 0.6670 | 0.6084 | 0.4091 | 0.1451 | 0.6671 [2nd] |
| GuijunLab-QA | 0.6480 | 0.4149 | 0.1195 [BEST] | 0.6328 | 0.6524 | 0.3972 | 0.1406 | 0.6377 |
| GuijunLab-Human | 0.6327 | 0.4148 | 0.1477 | 0.6368 | 0.6404 | 0.3976 | 0.1499 | 0.6483 |
| MULTICOMhuman | 0.5897 | 0.4260 | 0.1518 | 0.6576 | 0.6149 | 0.4217 | 0.1498 | 0.6572 |
| GuijunLab-PAthreader | 0.5309 | 0.3744 | 0.1331 | 0.6237 | 0.6360 | 0.4353 | 0.1371 | 0.6382 |
| ModFOLDdock2R | 0.5724 | 0.3867 | 0.1375 | 0.6518 | 0.6339 | 0.3724 | 0.1483 | 0.6355 |
| GuijunLab-Assembly | 0.5439 | 0.3280 | 0.1636 | 0.6191 | 0.5809 | 0.3135 | 0.1611 | 0.6182 |
| ChaePred | 0.4548 | 0.3971 | 0.1580 | 0.6534 | 0.4875 | 0.3673 | 0.1563 | 0.6331 |
| ModFOLDdock2S | 0.5285 | 0.3116 | 0.1806 | 0.6084 | 0.5819 | 0.3335 | 0.1648 | 0.6129 |
| MQAserver | 0.4326 | 0.2913 | 0.1468 | 0.6120 | 0.5617 | 0.3708 | 0.1521 | 0.6323 |
| MQAbase | 0.4331 | 0.2897 | 0.1462 | 0.6085 | 0.5533 | 0.3597 | 0.1509 | 0.6281 |
| GuijunLab-Complex | 0.4889 | 0.3019 | 0.1792 | 0.6054 | 0.5693 | 0.3310 | 0.1772 | 0.6077 |
| AFunmasked | 0.4015 | 0.2731 | 0.1595 | 0.6052 | 0.4354 | 0.2875 | 0.1815 | 0.6113 |
| MQA | 0.4410 | 0.2425 | 0.2183 | 0.5858 | 0.4911 | 0.2631 | 0.2499 | 0.5874 |
| COAST | 0.3840 | 0.2297 | 0.2091 | 0.6072 | 0.4484 | 0.2678 | 0.2204 | 0.6078 |
| MULTICOMAI | 0.3281 | 0.2623 | 0.1913 | 0.6057 | 0.3843 | 0.2834 | 0.1963 | 0.6111 |
| VifChartreuse | 0.2921 | 0.2777 | 0.1440 | 0.6149 | 0.2982 | 0.2469 | 0.1641 | 0.5956 |
| VifChartreuseJaune | 0.3421 | 0.1756 | 0.1630 | 0.5951 | 0.3300 | 0.1548 | 0.1915 | 0.5811 |
| PIEFoldhuman | 0.1929 | 0.1451 | 0.2306 | 0.5497 | 0.2599 | 0.1759 | 0.2409 | 0.5541 |
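For readers unfamiliar with these metrics, the sketch below computes three of them for a single target using only the standard library. It assumes the common EMA convention that ranking loss is the true score of the best model in the pool minus the true score of the model ranked first by the predictor; the README does not spell out the definition, and the example scores are made up. The table reports these values averaged over targets.

```python
from math import sqrt


def pearson(x, y):
    """Pearson's correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's correlation: Pearson's correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def ranking_loss(predicted, true):
    """True score of the best model minus true score of the top-predicted model."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(true) - true[top]


# Made-up scores for four models of one target.
predicted = [0.80, 0.75, 0.60, 0.40]
true_tm = [0.70, 0.85, 0.55, 0.30]

print(pearson(predicted, true_tm))
print(spearman(predicted, true_tm))
print(ranking_loss(predicted, true_tm))
```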

Table 2: Comparison of evaluation metrics (Pearson's correlation, Spearman's correlation, ranking loss, and AUC) for different EMA methods applied to in-house structural models generated by MULTICOM4 in the CASP16 blind experiment. The evaluation was conducted using both TM-score and Oligo-GDT-TS. The best performance for each metric is shown in bold, and the second-best is underlined. The values marked with * are statistically significantly worse (p < 0.05) than the GATE-Ensemble baseline based on the one-sided Wilcoxon signed-rank test.

| Method | Pearson Corr (TM-score) | Spearman Corr (TM-score) | Ranking Loss (TM-score) | AUC (TM-score) | Pearson Corr (Oligo-GDT-TS) | Spearman Corr (Oligo-GDT-TS) | Ranking Loss (Oligo-GDT-TS) | AUC (Oligo-GDT-TS) |
|---|---|---|---|---|---|---|---|---|
| PSS | 0.3947 | 0.2523 | 0.1388 | 0.6384 | 0.3385 | 0.2495 | 0.1582 | 0.6282* |
| AlphaFold plDDT<sub>norm</sub> | 0.3806 | 0.2731 | 0.1334 | 0.6557 | 0.3663 | 0.2557 | 0.1206 | 0.6587 |
| DProQA<sub>norm</sub> | -0.0507* | 0.0112* | 0.1942* | 0.5689* | 0.0319* | 0.0709* | 0.2225 | 0.5874 |
| VoroIF-GNN-score<sub>norm</sub> | 0.0648* | 0.1157* | 0.1929* | 0.5995 | 0.1143* | 0.1704 | 0.2066 | 0.6222 |
| Avg-VoroIF-GNN-res-pCAD<sub>norm</sub> | 0.0729* | 0.1046* | 0.1669 | 0.5887* | 0.0744* | 0.1374* | 0.2044 | 0.6155 |
| VoroMQA-dark global<sub>norm</sub> | 0.0385* | 0.1443 | 0.1286 | 0.6094 | -0.0126* | 0.1456 | 0.1626 | 0.6220 |
| GCPNet-EMA<sub>norm</sub> | 0.3597 | 0.2491 | 0.1345 | 0.6431 | 0.3555 | 0.2642 | 0.1691 | 0.6476 |
| GATE-Ensemble | 0.4083 | 0.2774 | 0.1327 | 0.6469 | 0.3801 | 0.2989 | 0.1626 | 0.6475 |
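The significance markers rely on a one-sided Wilcoxon signed-rank test over paired per-target values. The sketch below is one way to compute an exact p-value for small samples with the standard library; it is an illustration under stated assumptions, not the authors' analysis code. It enumerates every sign pattern, so it is limited to about 20 pairs, and the input values are made up. For metrics where lower is better (ranking loss), swap the two arguments.

```python
from itertools import product


def signed_rank_p_less(method, baseline):
    """Exact one-sided p-value for H1: `method` values tend to be lower than `baseline`."""
    diffs = [m - b for m, b in zip(method, baseline) if m != b]  # drop zero differences
    n = len(diffs)
    if n == 0:
        return 1.0
    if n > 20:
        raise ValueError("exact enumeration is only practical for small samples")

    # Rank the absolute differences, averaging ranks over ties.
    abs_d = [abs(d) for d in diffs]
    order = sorted(range(n), key=lambda i: abs_d[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs_d[order[j + 1]] == abs_d[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1

    # W+ is the rank sum of the positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)

    # Under H0 every sign pattern is equally likely; count patterns with W+ <= observed.
    hits = sum(
        1
        for signs in product((0, 1), repeat=n)
        if sum(r for r, s in zip(ranks, signs) if s) <= w_plus
    )
    return hits / 2 ** n


# Made-up per-target correlations for a method and a baseline.
method_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
baseline_scores = [0.3, 0.5, 0.7, 0.9, 1.1]
p_value = signed_rank_p_less(method_scores, baseline_scores)
print(p_value)
```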

Table 3: Comparison of the GATE model, GATE ablation variants, CASP15 EMA predictors, and other methods in terms of Pearson's correlation, Spearman's correlation, ranking loss, and AUC based on TM-score and DockQ on the CASP15 complex structure dataset. The term norm indicates that the quality scores predicted by a method are normalized by the length of the predicted structure relative to the native structure. Only the normalized version of such a method is shown because its unnormalized output does not account for partial structures. Bold font denotes the best result, while the second-best result is underlined. The values marked with * are statistically significantly worse (p < 0.05) than the GATE-Ensemble baseline based on the one-sided Wilcoxon signed-rank test.

| Method | Pearson Corr (TM-score) | Spearman Corr (TM-score) | Ranking Loss (TM-score) | AUC (TM-score) | Pearson Corr (DockQ) | Spearman Corr (DockQ) | Ranking Loss (DockQ) | AUC (DockQ) |
|---|---|---|---|---|---|---|---|---|
| CASP15 EMA Predictors | | | | | | | | |
| VoroMQA-select-2020 | 0.3944* | 0.3692* | 0.1735* | 0.6663* | 0.4322* | 0.4044 | 0.2682 | 0.6741 |
| ModFOLDdock | 0.5161* | 0.4356* | 0.1841 | 0.6721* | 0.5622 | 0.5185 | 0.2181 | 0.7022 |
| ModFOLDdockS | 0.4717* | 0.3614* | 0.2199* | 0.6333* | 0.4068* | 0.4073 | 0.3119* | 0.6632 |
| MULTICOMqa | 0.6678* | 0.5260 | 0.1472 | 0.7059 | 0.5256 | 0.4668 | 0.2661 | 0.6748 |
| MULTICOMegnn | 0.1437* | 0.1179* | 0.2611* | 0.5956* | 0.2158* | 0.2283* | 0.2943* | 0.6302 |
| VoroIF | 0.4645* | 0.3069* | 0.1568* | 0.6472* | 0.5039 | 0.3455* | 0.2297 | 0.6447 |
| ModFOLDdockR | 0.5333* | 0.4040* | 0.2160* | 0.6626* | 0.5357 | 0.4673 | 0.2623 | 0.6787 |
| Bhattacharya | 0.3803* | 0.3438* | 0.2220* | 0.6495* | 0.3581* | 0.3190* | 0.3475* | 0.6392* |
| MUFold2 | 0.5370* | 0.2662* | 0.2374* | 0.6168* | 0.3846* | 0.1839* | 0.3850* | 0.5913* |
| MUFold | 0.5435* | 0.2714* | 0.2267* | 0.6252* | 0.3856* | 0.1356* | 0.3457* | 0.5865* |
| ChaePred | 0.4706* | 0.3507* | 0.2311* | 0.6592* | 0.4381* | 0.3545* | 0.3565* | 0.6615 |
| Venclovas | 0.4677* | 0.3828* | 0.1249 | 0.6756* | 0.5288 | 0.4506 | 0.1828 | 0.6890 |
| Other Methods (normalized if applicable) | | | | | | | | |
| PSS | 0.7292 | 0.5755 | 0.1406 | 0.7137 | 0.5118 | 0.4469 | 0.2648 | 0.6660 |
| AlphaFold plDDT<sub>norm</sub> | 0.2578* | 0.2611* | 0.1793 | 0.6399* | 0.1710* | 0.1886* | 0.2615* | 0.6165* |
| DProQA<sub>norm</sub> | 0.1598* | 0.1174* | 0.2555* | 0.5942* | 0.2109* | 0.2255* | 0.3162* | 0.6248 |
| VoroIF-GNN-score<sub>norm</sub> | 0.1972* | 0.0966* | 0.2092* | 0.5695* | 0.2283* | 0.1335* | 0.2935* | 0.5704* |
| Avg-VoroIF-GNN-res-pCAD<sub>norm</sub> | 0.1335* | -0.0027* | 0.1737 | 0.5525* | 0.1049* | -0.0030* | 0.2284 | 0.5522* |
| VoroMQA-dark global<sub>norm</sub> | 0.0253* | 0.0037* | 0.1265 | 0.5580* | -0.0670* | -0.0316* | 0.2191 | 0.5476* |
| GCPNet-EMA<sub>norm</sub> | 0.3216* | 0.2696* | 0.2052* | 0.6379* | 0.1862* | 0.1803* | 0.2830* | 0.6198* |
| GATE Models | | | | | | | | |
| GATE-Basic | 0.7447 | 0.5722 | 0.1127 | 0.7181 | 0.5330 | 0.4345 | 0.2348* | 0.6703 |
| GATE-GCP | 0.7453 | 0.5788 | 0.1186 | 0.7191 | 0.5358 | 0.4389* | 0.2083 | 0.6715 |
| GATE-Advanced | 0.7224* | 0.5416* | 0.1018 | 0.6981* | 0.5142 | 0.4298 | 0.2112 | 0.6618 |
| GATE-Ensemble | 0.7480 | 0.5754 | 0.1191 | 0.7194 | 0.5353 | 0.4477 | 0.2140 | 0.6756 |
| GATE Ablation Variants | | | | | | | | |
| GATE-Basic (w/o subgraph sampling) | 0.7169 | 0.5478* | 0.1266 | 0.7067 | 0.5063* | 0.4145* | 0.2620* | 0.6528* |
| GATE-GCP (w/o subgraph sampling) | 0.7503 | 0.5771 | 0.1363 | 0.7278 | 0.5253 | 0.4394 | 0.2545* | 0.6773 |
| GATE-Advanced (w/o subgraph sampling) | 0.7158* | 0.5403* | 0.1224 | 0.7043* | 0.4975* | 0.4286 | 0.2478* | 0.6616* |
| GATE-Basic (w/o pairwise loss) | 0.6881* | 0.5534 | 0.1329 | 0.7183 | 0.5226 | 0.4498 | 0.2451 | 0.6796 |
| GATE-GCP (w/o pairwise loss) | 0.6923* | 0.5392* | 0.1516* | 0.7051 | 0.4974* | 0.4062* | 0.2604* | 0.6582* |
| GATE-Advanced (w/o pairwise loss) | 0.6756* | 0.5176* | 0.1588* | 0.6961* | 0.4982 | 0.4170 | 0.2538* | 0.6617 |
| GATE-NoSingleEMA | 0.6570* | 0.4832* | 0.1511* | 0.6927 | 0.4987* | 0.3967* | 0.2986* | 0.6681 |
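The length normalization mentioned in the Table 3 caption can be read as scaling a predicted score by the fraction of the native structure that the model covers. The sketch below shows that reading; the exact formula used in the paper is not given here, so the function name and the cap at 1.0 are assumptions.

```python
def normalize_by_length(score, predicted_len, native_len):
    """Scale a quality score by model coverage of the native structure (assumed formula)."""
    if native_len <= 0:
        raise ValueError("native_len must be positive")
    coverage = min(predicted_len / native_len, 1.0)  # assumption: never scale above 1
    return score * coverage


# A model covering 150 of 200 native residues keeps 75% of its raw score.
print(normalize_by_length(0.8, 150, 200))
```

A full-length model is left unchanged, so the normalization only penalizes partial structures.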

Installation

Clone the Repository

```bash
git clone -b public https://github.com/BioinfoMachineLearning/gate
cd gate
```

Install Mamba

```
wget "https://github.com/conda-forge/miniforge/releases/download/23.1.0-3/Mambaforge-$(uname)-$(uname -m).sh"
bash Mambaforge-$(uname)-$(uname -m).sh
rm Mambaforge-$(uname)-$(uname -m).sh
source ~/.bashrc
```

Install tools

```
cd tools

# Install GCPNet-EMA
git clone https://github.com/BioinfoMachineLearning/GCPNet-EMA
mkdir GCPNet-EMA/checkpoints
wget -P GCPNet-EMA/checkpoints/ https://zenodo.org/record/10719475/files/structureemafinetunedgcpneti2d5t9xhbestepoch_106.ckpt

# Install EnQA
git clone https://github.com/BioinfoMachineLearning/EnQA
chmod -R 755 EnQA/utils

# Install DProQA
git clone https://github.com/jianlin-cheng/DProQA

# Install Venclovas QAs
git clone https://github.com/kliment-olechnovic/ftdmp

# Install CDPred
git clone https://github.com/BioinfoMachineLearning/CDPred

# Install openstructure
docker pull registry.scicore.unibas.ch/schwede/openstructure:latest
# or
singularity pull docker://registry.scicore.unibas.ch/schwede/openstructure:latest
```

Set Up Python Environments

```
# Install Python environment for gate
mamba install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia
mamba install -c dglteam dgl-cuda11.0
mamba install pandas biopython

# Install Python environment for GCPNet-EMA
mamba env create -f tools/GCPNet-EMA/environment.yaml
mamba activate GCPNet-EMA
pip3 install -e tools/GCPNet-EMA
pip3 install prody==2.4.1
pip3 uninstall protobuf
mamba deactivate

# Install Python environment for EnQA
mamba env create -f envs/enqa.yaml

# Install Python environment for DProQA
mamba env create -f envs/dproqa.yaml

# Install Python environment for VoroMQA
mamba env create -f envs/ftdmp.yaml

# Install Python environment for CDPred
mamba env create -f envs/cdpred.yaml
```

Download databases (~2.5T)

```
mkdir databases

# Create virtual links if the databases are stored elsewhere
sh scripts/downloadbfd.sh databases/
sh scripts/downloaduniref90.sh databases/
```
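If the databases already live on shared storage, linking them into `databases/` avoids a second ~2.5T download. The sketch below shows one way to do that with symbolic links. The subdirectory names `bfd` and `uniref90` and the function name are assumptions, not names confirmed by the download scripts, and the demo runs against a temporary directory rather than real data.

```python
import tempfile
from pathlib import Path


def link_databases(storage_root, repo_databases, names):
    """Create repo_databases/<name> symlinks pointing at storage_root/<name>."""
    repo_databases = Path(repo_databases)
    repo_databases.mkdir(parents=True, exist_ok=True)
    linked = []
    for name in names:
        source = Path(storage_root) / name
        target = repo_databases / name
        if not source.is_dir():
            raise FileNotFoundError(source)
        if target.exists() or target.is_symlink():
            continue  # leave anything that is already in place untouched
        target.symlink_to(source.resolve(), target_is_directory=True)
        linked.append(target)
    return linked


# Demo with throwaway directories; the database names are hypothetical.
with tempfile.TemporaryDirectory() as tmp:
    storage = Path(tmp) / "shared"
    (storage / "bfd").mkdir(parents=True)
    (storage / "uniref90").mkdir()
    links = link_databases(storage, Path(tmp) / "gate" / "databases", ["bfd", "uniref90"])
    demo_ok = all(link.is_symlink() for link in links)
    print([link.name for link in links], demo_ok)
```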

Configuration

* Set ROOTDIR in gate/feature/config.py to your installation path.

* Set use_docker to False if using Singularity instead of Docker.
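To illustrate what a Docker/Singularity switch typically controls, the sketch below builds a container command prefix from a boolean flag. It is not GATE's actual code: how GATE invokes OpenStructure is not shown in this README, so the function name, the `.sif` file name, the bind-mount choice, and the placeholder tool arguments are all assumptions. Only the generic `docker run --rm -v` and `singularity run --bind` forms are standard.

```python
def container_command(use_docker, tool_args, workdir,
                      image="registry.scicore.unibas.ch/schwede/openstructure:latest",
                      sif_path="openstructure_latest.sif"):
    """Return an argv list that runs tool_args inside Docker or Singularity."""
    mount = f"{workdir}:{workdir}"  # expose the working directory at the same path
    if use_docker:
        prefix = ["docker", "run", "--rm", "-v", mount, image]
    else:
        # sif_path is assumed to be the file produced by `singularity pull`
        prefix = ["singularity", "run", "--bind", mount, sif_path]
    return prefix + list(tool_args)


# Hypothetical tool arguments, shown only to make the output concrete.
print(container_command(True, ["<tool arguments>"], "/data/gate"))
print(container_command(False, ["<tool arguments>"], "/data/gate"))
```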

Usage

To run the GATE tool for estimating protein multimer structure accuracy, use the inference_multimer.py script with the following arguments:

Required Arguments:

* `--fasta_path FASTA_PATH`: The path to the input FASTA file containing the protein sequences.
* `--input_model_dir INPUT_MODEL_DIR`: The directory containing the input protein models.
* `--output_dir OUTPUT_DIR`: The directory where the output results will be saved.

Optional Arguments:

* `--pkldir PKLDIR`: The directory where intermediate pickle files will be stored.
* `--use_af_feature USE_AF_FEATURE`: Specify whether to use AlphaFold features. Accepts True or False. Default is False.
* `--sampletimes SAMPLETIMES`: Number of times to sample the models. Default is 5.

Example Commands:

Here are examples of how to use the inference_multimer.py script with different settings:

Not using AlphaFold Features (default)

```bash
python inference_multimer.py --fasta_path $FASTA_PATH --input_model_dir $INPUT_MODEL_DIR --output_dir $OUTPUT_DIR
```

Using AlphaFold Features

```bash
python inference_multimer.py --fasta_path $FASTA_PATH --input_model_dir $INPUT_MODEL_DIR --output_dir $OUTPUT_DIR --pkldir $PKLDIR --use_af_feature True
```
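When scoring many targets, the default command can be assembled programmatically. The sketch below only builds and prints the command lines; the target name and directory layout are hypothetical, and it uses just the three required flags shown in the examples.

```python
import shlex
import sys
from pathlib import Path


def build_multimer_command(fasta_path, input_model_dir, output_dir,
                           script="inference_multimer.py"):
    """Return the argv list for one default (no AlphaFold features) run."""
    return [
        sys.executable, script,
        "--fasta_path", str(fasta_path),
        "--input_model_dir", str(input_model_dir),
        "--output_dir", str(output_dir),
    ]


# Hypothetical layout: one FASTA file and one model directory per target.
targets = ["targetA"]
for name in targets:
    cmd = build_multimer_command(
        Path("fasta") / f"{name}.fasta",
        Path("models") / name,
        Path("results") / name,
    )
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Printing the command first makes it easy to check paths before committing to a long run.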

Citing This Work

If you find this work useful, please cite:

Liu, J., Neupane, P., & Cheng, J. (2025). Estimating Protein Complex Model Accuracy Using Graph Transformers and Pairwise Similarity Graphs. bioRxiv, 2025-02 (https://www.biorxiv.org/content/10.1101/2025.02.04.636562v1)

```bibtex
@article {Liu2025.02.04.636562,
  author = {Liu, Jian and Neupane, Pawan and Cheng, Jianlin},
  title = {Estimating Protein Complex Model Accuracy Using Graph Transformers and Pairwise Similarity Graphs},
  elocation-id = {2025.02.04.636562},
  year = {2025},
  doi = {10.1101/2025.02.04.636562},
  publisher = {Cold Spring Harbor Laboratory},
  URL = {https://doi.org/10.1101/2025.02.04.636562},
  journal = {bioRxiv}
}
```

Bonus

Monomer Structure Estimation

To estimate the accuracy of protein tertiary structures with GATE, you need to install an additional dependency: DeepRank3.

```bash
cd tools
git clone https://github.com/jianlin-cheng/DeepRank3/

# Follow the installation instructions in DeepRank3
```

Once DeepRank3 is installed under the tools directory, you can run the inference_monomer.py script to evaluate the quality of a pool of protein tertiary structure models.

Required Arguments:

* `--fasta_path FASTA_PATH`: The path to the input FASTA file containing the protein sequences.
* `--input_model_dir INPUT_MODEL_DIR`: The directory containing the input protein models.
* `--output_dir OUTPUT_DIR`: The directory where the output results will be saved.

Optional Arguments:

* `--sampletimes SAMPLETIMES`: Number of times to sample the models. Default is 5.

Example Command:

```bash
python inference_monomer.py --fasta_path $FASTA_PATH --input_model_dir $INPUT_MODEL_DIR --output_dir $OUTPUT_DIR
```

Owner

Name: BioinfoMachineLearning
Login: BioinfoMachineLearning
Kind: organization

Repositories: 29
Profile: https://github.com/BioinfoMachineLearning

GitHub Events

Total

Issues event: 1
Watch event: 2
Delete event: 1
Issue comment event: 2
Public event: 1
Push event: 13
Pull request event: 2
Fork event: 1

Last Year

Issues event: 1
Watch event: 2
Delete event: 1
Issue comment event: 2
Public event: 1
Push event: 13
Pull request event: 2
Fork event: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 0
Total pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Total issue authors: 0
Total pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 1
Average time to close issues: N/A
Average time to close pull requests: less than a minute
Issue authors: 0
Pull request authors: 1
Average comments per issue: 0
Average comments per pull request: 0.0
Merged pull requests: 1
Bot issues: 0
Bot pull requests: 0

Science Score: 49.0%
