swiftmhc

protein-structure-based peptide structure and affinity prediction

https://github.com/x-lab-3d/swiftmhc

Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
✓ .zenodo.json file: Found .zenodo.json file
✓ DOI references: Found 2 DOI reference(s) in README
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.5%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: X-lab-3D
License: apache-2.0
Language: Python
Default Branch: main
Homepage:
Size: 2.76 MB

Statistics

Stars: 2
Watchers: 3
Forks: 0
Open Issues: 3
Releases: 0

Created about 3 years ago · Last pushed about 1 year ago

Metadata Files

Readme Changelog License Code of conduct Citation

Overview

SwiftMHC: A High-Speed Attention Network for MHC-Bound Peptide Identification and 3D Modeling

SwiftMHC is a deep learning algorithm that predicts pMHC structure and binding affinity simultaneously. It currently works for HLA-A*02:01 9-mers only.

Estimated speed

When running on 1/4 A100 card with batch size 64:
* binding affinity (BA) prediction takes 0.01 seconds per pMHC case.
* 3D structure prediction without OpenMM (disabled) takes 0.9 seconds per pMHC case.
* 3D structure prediction with OpenMM takes 2.2 seconds per pMHC case.
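As a rough planning aid, the per-case timings above can be turned into a wall-clock estimate for a larger screen. The sketch below assumes that runtime scales linearly with the number of pMHC cases on comparable hardware, which the README does not state; the function and dictionary names are hypothetical.

```python
# Per-case timings quoted above (seconds per pMHC case, 1/4 A100, batch size 64).
SECONDS_PER_CASE = {
    "binding_affinity": 0.01,
    "structure_without_openmm": 0.9,
    "structure_with_openmm": 2.2,
}


def estimate_runtime_hours(n_cases: int, mode: str) -> float:
    """Return a rough estimate in hours, assuming linear scaling with n_cases."""
    return n_cases * SECONDS_PER_CASE[mode] / 3600.0


if __name__ == "__main__":
    # Print an estimate for a hypothetical screen of 10,000 peptides.
    for mode in SECONDS_PER_CASE:
        print(f"{mode}: {estimate_runtime_hours(10_000, mode):.2f} h")
```

Actual throughput depends on the GPU, batch size and I/O, so treat the result as an order-of-magnitude figure.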

Dependencies

pip3
python >= 3.11.5
setuptools >= 75.5.0
openfold >= 1.0.0
position-encoding >= 1.0.0 (github.com/X-lab-3D/position-encoding)
PyTorch >= 2.0.1
pandas >= 1.5.3
numpy >= 1.26.4
h5py >= 3.10.0
ml-collections >= 0.1.1
scikit-learn >= 1.4.1
openmm >= 8.1.1 (SwiftMHC needs the CUDA build if you run on a CUDA platform)
blosum >= 2.0.3
modelcif >= 1.0
filelock >= 3.13.1
biopython >= 1.84
PyMOL >= 3.1

CUDA is optional
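To see which of these packages are already present in an environment, a small check along the following lines may help. It is only a sketch: the distribution names are assumed to match the names in the list above (with `torch` standing in for PyTorch), and it reports installed versions without comparing them against the minimum versions.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names assumed from the dependency list above; adjust them if
# your environment installs a package under a different name.
PACKAGES = [
    "torch", "pandas", "numpy", "h5py", "ml-collections", "scikit-learn",
    "openmm", "blosum", "modelcif", "filelock", "biopython",
]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, installed in installed_versions(PACKAGES).items():
        print(f"{name}: {installed or 'not installed'}")
```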

Installation

First install PyTorch. Follow the instructions from https://pytorch.org/get-started/locally/

Then install OpenFold. Clone its repository, https://github.com/aqlaboratory/openfold, and from inside that clone run:

```
pip install -e .

scripts/install_third_party_dependencies.sh
```

PyMOL is required for preprocessing. Download and install it from https://pymol.org

Then clone the SwiftMHC repository (this repo) and, from inside it, run: pip install -e .

SwiftMHC is now installed.

Preprocessing data

Preprocessing creates a file in HDF5 format that contains information on the peptide and the MHC protein. It is only needed if you want to use a new MHC structure or train a new network.

Preprocessing requires a CSV table in IEDB format. See the data directory for an example. This table has the following columns:
- ID (required): the id under which the row's data will be stored in the HDF5 file. This must correspond to the name of a structure in PDB format.
- allele (required): the name of the MHC allele (example: HLA-A*02:01). SwiftMHC will use this to identify MHC structures when predicting unlabeled data.
- peptide (optional): the sequence of the peptide. This is used in training, validation and testing, but not when predicting unlabeled data.
- measurement_value (optional): binding affinity data or classification (BINDING/NONBINDING). This is used in training, validation and testing, but not when predicting unlabeled data.
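The column layout described above can be sketched with Python's standard `csv` module. This is an illustration only: the column names are taken from the list above, while the IDs and row values are placeholders and the comma delimiter is an assumption. The example file in the data directory remains the authoritative reference for the exact format.

```python
import csv
import io

# Column names as described above.
COLUMNS = ["ID", "allele", "peptide", "measurement_value"]
REQUIRED = ("ID", "allele")


def table_to_csv(rows):
    """Serialize rows (dicts) to CSV text, enforcing the two required columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS)
    writer.writeheader()
    for row in rows:
        for column in REQUIRED:
            if not row.get(column):
                raise ValueError(f"missing required column {column!r} in {row!r}")
        # Optional columns are left empty when absent.
        writer.writerow({column: row.get(column, "") for column in COLUMNS})
    return buffer.getvalue()


if __name__ == "__main__":
    # Placeholder rows for illustration; the IDs are hypothetical structure names.
    print(table_to_csv([
        {"ID": "example-0001", "allele": "HLA-A*02:01",
         "peptide": "LLFGYPVYV", "measurement_value": "BINDING"},
        {"ID": "example-mhc-only", "allele": "HLA-A*02:01"},
    ]))
```

The second row shows the allele-only case, where the optional peptide and measurement_value columns are left empty.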

Preprocessing requires a reference structure, to which all MHC molecules are aligned. It also requires a directory containing all the other structures. These may contain a peptide, but must always contain an MHC structure.

Preprocessing also requires two mask files: a G-domain mask and a CROSS mask (pocket residues only). See the data directory for examples. These masks have to be compatible with the reference structure.

To create training, validation and test sets, run:

    swiftmhc_preprocess IEDB_table.csv ref_mhc.pdb mhcp_binder_models/ \
        mhc_self_attention.mask mhc_cross_attention.mask preprocessed_data.hdf5

To preprocess just the MHC allele structures, for predicting unlabeled data, run:

    swiftmhc_preprocess allele_table.csv ref_mhc.pdb mhc_models/ \
        mhc_self_attention.mask mhc_cross_attention.mask preprocessed_mhcs.hdf5

Run swiftmhc_preprocess --help for details.

Preprocessing requires data tables, 3D structures and mask files. Check the data directory in this repo for examples.

Training

This requires preprocessed HDF5 files, containing structures of the MHC protein, peptide and binding affinity or classification data.

Run swiftmhc_run -r example train.hdf5 valid.hdf5 test.hdf5

Run swiftmhc_run --help for details.

This will save the network model to example/best-predictor.pth

Predicting unlabelled data

Do this after training a model (pth format). Alternatively, there are pretrained models in this repository under the directory named trained-models.

Prediction requires preprocessed HDF5 files, containing structures of the MHC protein, for every allele. The data directory contains a preprocessed hdf5 file for the HLA-A*02:01 allele only. Prediction also requires a table, linking the peptides to MHC alleles.

Run swiftmhc_predict -B1 trained-models/8k-trained-model.pth table.csv preprocessed_mhcs.hdf5 results/

The output results directory will contain the BA data and the structures. The file results/results.csv will hold the BA and class values per MHC-peptide combination. Note that the affinities in this file are not IC50 or Kd values. They correspond to 1 - log(IC50) / log(50000) or 1 - log(Kd) / log(50000), that is, one minus the base-50000 logarithm of the measured value.
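The transformation above can be inverted to recover an IC50 or Kd value from a predicted score. The following is a minimal sketch of that arithmetic; the function names are hypothetical, and the unit of the recovered value is whatever unit the training data used, which is not stated here.

```python
import math

# Base of the logarithm in the transformation described above.
BASE = 50000.0


def affinity_to_score(value: float) -> float:
    """Apply score = 1 - log(value) / log(50000) to an IC50 or Kd value."""
    return 1.0 - math.log(value) / math.log(BASE)


def score_to_affinity(score: float) -> float:
    """Invert the transformation: value = 50000 ** (1 - score)."""
    return BASE ** (1.0 - score)


if __name__ == "__main__":
    # A score of 1 maps to a value of 1, a score of 0 to a value of 50000.
    for score in (0.0, 0.5, 1.0):
        print(score, score_to_affinity(score))
```

Higher scores therefore correspond to lower IC50 or Kd values, that is, to stronger binding.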

If the flag --with-energy-minimization is included, SwiftMHC runs OpenMM with an amber99sb/tip3p forcefield to refine the final structure.
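When prediction is scripted from Python, the command line shown above can be assembled as an argument list. This sketch only reuses the arguments and flags that appear in this README (`-B1` is copied verbatim; see `swiftmhc_predict --help` for its meaning). The helper name is hypothetical, and placing the optional flag before the positional arguments is an assumption about the argument parser.

```python
import shlex


def build_predict_command(model, table, mhcs_hdf5, out_dir,
                          energy_minimization=False):
    """Assemble a swiftmhc_predict argument list from the pieces shown above."""
    command = ["swiftmhc_predict", "-B1"]
    if energy_minimization:
        # Flag name taken from the README; its position here is an assumption.
        command.append("--with-energy-minimization")
    command += [model, table, mhcs_hdf5, out_dir]
    return command


if __name__ == "__main__":
    command = build_predict_command(
        "trained-models/8k-trained-model.pth", "table.csv",
        "preprocessed_mhcs.hdf5", "results/", energy_minimization=True,
    )
    # Print the command; pass it to subprocess.run(command, check=True) to execute.
    print(shlex.join(command))
```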

Run swiftmhc_predict --help for details.

Owner

Name: X-lab-3D
Login: X-lab-3D
Kind: organization

Repositories: 3
Profile: https://github.com/X-lab-3D

Citation (CITATION.cff)

cff-version: "1.2.0"
authors:
  - family-names: Baakman
    given-names: Coos
    orcid: "https://orcid.org/0000-0003-4317-1566"
  - family-names: Xue
    given-names: Li C.
    orcid: "https://orcid.org/0000-0002-2613-538X"
  - family-names: Crocioni
    given-names: Giulia
    orcid: "https://orcid.org/0000-0002-0823-0121"
  - family-names: Rademaker
    given-names: Daniel-T.
    orcid: "https://orcid.org/0000-0003-1959-1317"
  - family-names: Marzella
    given-names: Dario F.
    orcid: "https://orcid.org/0000-0002-0043-3055"
contact:
  - family-names: Baakman
    given-names: Coos
    orcid: "https://orcid.org/0000-0003-4317-1566"

GitHub Events

Total

Create event: 4
Issues event: 3
Watch event: 1
Delete event: 45
Issue comment event: 3
Member event: 2
Push event: 30
Public event: 1

Last Year

Create event: 4
Issues event: 3
Watch event: 1
Delete event: 45
Issue comment event: 3
Member event: 2
Push event: 30
Public event: 1

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 3
Total pull requests: 0
Average time to close issues: 22 days
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 0.67
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 3
Pull requests: 0
Average time to close issues: 22 days
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 0.67
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Top Authors

Issue Authors

cbaakman (3)

Pull Request Authors

Top Labels

Issue Labels

bug (1) enhancement (1)

Pull Request Labels

Dependencies

setup.py pypi

biopython >=1.84
blosum >=2.0.3
filelock >=3.13.1
h5py >=3.10.0
ml-collections >=0.1.1
modelcif >=1.0
numpy >=1.26.4
openmm *
pandas >=1.5.3
scikit-learn >=1.4.1
