https://github.com/bioinfomachinelearning/multicom4

The MULTICOM4 protein structure prediction system developed by the Bioinformatics and Machine Learning Lab at the University of Missouri - Columbia

Last synced: 6 months ago · JSON representation

Repository

The MULTICOM4 protein structure prediction system developed by the Bioinformatics and Machine Learning Lab at the University of Missouri - Columbia

Basic Info

Host: GitHub
Owner: BioinfoMachineLearning
License: other
Language: Python
Default Branch: main
Homepage:
Size: 143 MB

Statistics

Stars: 9
Watchers: 4
Forks: 3
Open Issues: 0
Releases: 0

Created almost 3 years ago · Last pushed 6 months ago

Metadata Files

Readme License

MULTICOM4 protein structure prediction system

MULTICOM4 is an advanced protein structure prediction system built on AlphaFold2 and 3. It achieved remarkable success in the 16th world-wide Critical Assessment of Techniques for Protein Structure Prediction (CASP16) concluded in December 2024, ranking 1st in protein complex structure prediction without stoichiometry information (Phase 0), 2nd in protein tertiary structure prediction, 2nd in estimating the global fold accuracy of protein complex structures, 3rd in protein complex structure prediction with stoichiometry information (Phase 1), and 5th in protein-ligand structure and binding affinity prediction.

Some CASP16 Prediction Examples

Colored Chains: Model predicted by MULTICOM4
Brown: True structure

| Target ID | Description | Visualization | | ------------- | ---------------------------------------------- | ------------------------------- | | H0215 | A1B1, mNeonGreen with Bound Nanobody | H0215 | | H0233 | A2B2C2, Antibody Fab 3H4 complex, virus capsid | H0233 | | H0245 | A1B1, FUNComplex, Shallow MSA | H0245 | | T0234o | A3, Better Stoichiometry Prediction | T0234 |

The workflow of the MULTICOM4 protein complex structure prediction system used in CASP16

MULTICOM4

The workflow of the MULTICOM4 protein tertiary structure prediction system used in CASP16

MULTICOM4

Download MULTICOM4 package

git clone --recursive https://github.com/BioinfoMachineLearning/MULTICOM4

Installation (non Docker version)

Install AlphaFold/AlphaFold-Multimer and other required third-party packages (modified from alphafoldnondocker)

Install miniconda

bash wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh

Create a new conda environment and update

bash conda create --name multicom4 python==3.8 conda update -n base conda

Activate conda environment

bash conda activate multicom4

Install dependencies

Change cudatoolkit==11.2.2 version if it is not supported in your system

bash conda install -y -c conda-forge openmm==7.5.1 cudatoolkit==11.2.2 pdbfixer conda install -y -c bioconda hmmer hhsuite==3.3.0 kalign2

Change jaxlib==0.3.25+cuda11.cudnn805 version if this is not supported in your system

``` bash pip install absl-py==1.0.0 biopython==1.79 chex==0.0.7 dm-haiku==0.0.9 dm-tree==0.1.6 immutabledict==2.0.0 jax==0.3.25 ml-collections==0.1.0 numpy==1.21.6 pandas==1.3.4 protobuf==3.20.1 scipy==1.7.0 tensorflow-cpu==2.9.0

pip install --upgrade --no-cache-dir jax==0.3.25 jaxlib==0.3.25+cuda11.cudnn805 -f https://storage.googleapis.com/jax-releases/jaxcudareleases.html ```

Download chemical properties to the common folder

``` bash

Replace $MULTICOM4INSTALLDIR with your MULTICOM4 installation directory

wget -q -P $MULTICOM4INSTALLDIR/tools/alphafold-v2.3.2/alphafold/common/ https://git.scicore.unibas.ch/schwede/openstructure/-/raw/7102c63615b64735c4941278d92b554ec94415f8/modules/mol/alg/src/stereochemicalprops.txt ```

Apply OpenMM patch

``` bash

Replace $MULTICOM4INSTALLDIR with your MULTICOM4 installation directory

cd ~/anaconda3/envs/multicom4/lib/python3.8/site-packages/ && patch -p0 < $MULTICOM4INSTALLDIR/tools/alphafold-v2.3.2/docker/openmm.patch

or

cd ~/miniconda3/envs/multicom4/lib/python3.8/site-packages/ && patch -p0 < $MULTICOM4INSTALLDIR/tools/alphafold-v2.3.2/docker/openmm.patch ```

Install other required third-party packages

``` conda install tqdm conda install -c conda-forge -c bioconda foldseek conda install scikit-learn

if running jackhmmer returns error: libgsl.so.25: cannot open shared object file: No such file or directory

conda install -c conda-forge gsl=2.5

pip install charset_normalizer==3.3.1 ```

Install third-party packages envorinments

```python

DHR

conda env create -f tools/Dense-Homolog-Retrieval/env.yml

ESMFold

conda env create -f envs/esm.yml conda activate esmfold pip install "fair-esm[esmfold]" pip install 'openfold @ git+https://github.com/aqlaboratory/openfold.git@4b41059694619831a7db195b7e0988fc4ff3a307' pip install transformers

AFsample

conda env create -f tools/afsample/afsample.yml ```

Download Genetic databases in AlphaFold2/AlphaFold-Multimer

```

Replace $MULTICOM4INSTALLDIR with your MULTICOM4 installation directory

bash $MULTICOM4INSTALLDIR/tools/alphafold-v2.3.2/scripts/downloadalldata.sh ```

AFsample also uses both v2.1.0 and v2.2.0 AlphaFold-Multimer model weights. Download them using the links below and extract them in the params/ folder in the $YOURALPHAFOLDDB_DIR.

The v2.2.0 AlphaFold-Multimer model weights: https://storage.googleapis.com/alphafold/alphafoldparams2022-03-02.tar

The v2.1.0 AlphaFold-Multimer model weights: https://storage.googleapis.com/alphafold/alphafoldparams2022-01-19.tar

Install the MULTICOM4 addon system and its databases

```python

Note: here the parameters should be the absolute paths

python downloaddatabaseandtools.py --multicom4dbdir

Configure the MULTICOM4 system

Replace $MULTICOM4INSTALLDIR with your MULTICOM4 installation directory

Replace $YOURALPHAFOLDDB_DIR with your downloaded AlphaFold databases directory

python configure.py --envdir ~/miniconda3/envs/multicom4 --multicom4dbdir <YOURMULTICOM4DBDIR> --afdbdir <YOURALPHAFOLDDBDIR>

e.g,

python downloaddatabaseand_tools.py \

--multicom4dbdir /home/multicom4/tools/multicom4db

python configure.py \

--multicom4dbdir /home/multicom4/tools/multicom4db \

--afdbdir /home/multicom4/tools/alphafolddatabases/

``` The configure.py python script will * Copy the alphafoldaddon scripts * Create the configuration file (bin/dboption) for running the system

Genetic databases used by MULTICOM4

Assume the following databases have been installed as a part of the AlphaFold2/AlphaFold-Multimer installation * BFD, * MGnify, * PDB70, * PDB (structures in the mmCIF format), * PDB seqres * UniRef30, * UniProt, * UniRef90.

Additional databases will be installed for the MULTICOM system by setup.py: * AlphaFoldDB: ~53G * ESM Atlas: ~99G * Metaclust: ~114G * STRING: ~129G * pdb_complex v2024: ~38G * pdb_sort90 v2024: ~48G * Uniclust30: ~87G * DHR_DATABASE: 1.5T * JGIclust: ~1.1T

Key Parameters for Running MULTICOM4

All parameters for running AlphaFold2/AlphaFold-Multimer are defined in multicom4/common/config.py. These configurations were used for large-scale sampling during the CASP16 competition, with the exception of the number of models, which should be adjusted according to the size of the target protein. Users may modify the following parameters as needed:

```python

Number of models generated per checkpoint for monomer predictions using AlphaFold2. Default: 100.

Note: Each AlphaFold2 predictor will generate 100 * 5 models.

MONOMERPREDICTIONSPER_MODEL = 100

Configuration for hetero-multimer predictions using AlphaFold-Multimer. Default: 100 models.

Note: Each AlphaFold-Multimer predictor will generate 100 * 5 models.

HETEROMULTIMERPREDICTIONSPERMODEL = 100

Configuration for homo-multimer predictions using AlphaFold-Multimer. Default: 100 models.

Note: Each AlphaFold-Multimer predictor will generate 100 * 5 models.

HOMOMULTIMERPREDICTIONSPERMODEL = 100 ```

Before running the system for non Docker version

Activate your python environment and add the MULTICOM4 system path to PYTHONPATH

```bash conda activate multicom4

Replace $MULTICOM4INSTALLDIR with your MULTICOM4 installation directory (absolute path)

export PYTHONPATH=$MULTICOM4INSTALLDIR

e.g,

conda activate MULTICOM4

export PYTHONPATH=/home/multicom4/MULTICOM4

``` Now MULTICOM4 is ready for you to make predictions.

Running the monomer/tertiary structure prediction pipeline

Say we have a monomer with the sequence <SEQUENCE>. The input sequence file should be in the FASTA format as follows:

```fasta

sequence_name ```

Note: It is recommended that the name of the sequence file in FASTA format should be the same as the sequence name.

Then run the following command:

```bash

Please provide absolute path for the input parameters

sh bin/monomer/runmonomer.sh <optionfile> ```

optionfile (e.g., bin/dboption) is a file in the MULTICOM4 package to store the path of the databases/tools. fastapath is the full path of the file storing the input protein sequence(s) in the FASTA format. outputdir specifies where the prediction results are stored.

Output

$OUTPUT_DIR/ # Your output directory N1_monomer_alignments_generation/ # Working directory for generating monomer MSAs N2_monomer_template_search/ # Working directory for searching monomer templates N3_monomer_structure_generation/ # Working directory for generating monomer structural predictions N4_monomer_structure_evaluation/ # Working directory for evaluating the monomer structural predictions - alphafold_ranking.csv # AlphaFold2 pLDDT ranking

The predictions and ranking files are saved in the N4monomerstructure_evaluation folder. You can check the AlphaFold2 pLDDT score ranking file (alphafold_ranking.csv) to look for the structure with the highest pLDDT score.

Running the multimer/quaternary structure prediction pipeline

Folding a multimer

Say we have a homomer with 4 copies of the same sequence <SEQUENCE>. The input file should be in the format as follows:

```fasta

sequence1 sequence2 sequence3 sequence4 ```

Then run the following command:

```bash

Please provide absolute path for the input parameters

sh bin/multimer/runmultimer.sh <optionfile> ```

Output

``` $OUTPUTDIR/ # Your output directory N1monomeralignmentsgeneration/ # Working directory for generating monomer MSAs - Subunit A - Subunit B - ... N1monomeralignmentsgenerationimg/ # Working directory for generating IMG MSA - Subunit A - Subunit B - ... N2monomertemplatesearch/ # Working directory for searching monomer templates - Subunit A - Subunit B - ... N3monomerstructuregeneration/ # Working directory for generating monomer structural predictions - Subunit A - Subunit B - ... N4monomeralignmentsconcatenation/ # Working directory for concatenating the monomer MSAs N5monomertemplatessearch/ # Working directory for concatenating the monomer templates N6multimerstructuregeneration/ # Working directory for generating multimer structural predictions N7monomeronlystructureevaluation # Working directory for evaluating monomer structural predictions - Subunit A # Rankings for all the predictions - alphafoldranking.csv # AlphaFold2 pLDDT ranking - pairwiseranking.tm # Pairwise (APOLLO) ranking - pairwiseaf_avg.ranking # Average ranking of the two

        # Rankings for the predictions generated by monomer structure prediction
        - alphafold_ranking_monomer.csv    # AlphaFold2 pLDDT ranking 
        - pairwise_af_avg_monomer.ranking  # Average ranking 

        # Rankings for the predictions extracted from multimer predictions
        - alphafold_ranking_multimer.csv   # AlphaFold2 pLDDT ranking 
        - pairwise_af_avg_multimer.ranking # Average ranking 

    - Subunit B
    - ...
N7_multimer_structure_evaluation           # Working directory for evaluating multimer structural predictions
    - alphafold_ranking.csv                # AlphaFold2 pLDDT ranking
    - multieva.csv                         # Pairwise ranking using MMalign
    - pairwise_af_avg.ranking              # Average ranking of the two

```

The predictions and ranking files are saved in N7multimerstructure_evaluation, similarly, you can check the AlphaFold-Multimer confidence score ranking file (alphafoldranking.csv) to look for the structure with the highest predicted confidence score generated by AlphaFold-Multimer. The multieva.csv and *pairwiseaf_avg.ranking* are the other two ranking files.
The monomer structures and ranking files are saved in N7monomeronlystructureevaluation if you want to check the predictions and rankings for the monomer structures.

CASP16 Talks

Our CASP16 talk for protein complex structure prediction:

https://predictioncenter.org/casp16/doc/presentations/Day-2/Day2-05-Cheng-CASP16MULTICOMredacted.pdf

Our CASP16 talk for protein model quality assessment:

https://predictioncenter.org/casp16/doc/presentations/Day-2/Day2-15-Neupane-CASP16MULTICOMEMA.pptx

Our CASP16 talk for protein-ligand binding affinity prediction:

https://predictioncenter.org/casp16/doc/presentations/Day-3/Day3-14-Morehead-MULTICOM_ligand.pptx

Citing this work

``` @article{liu2025improving, title={Improving AlphaFold2-and AlphaFold3-Based Protein Complex Structure Prediction With MULTICOM4 in CASP16}, author={Liu, Jian and Neupane, Pawan and Cheng, Jianlin}, journal={Proteins: Structure, Function, and Bioinformatics}, year={2025}, publisher={Wiley Online Library} }

@article{liu2025boosting, title={Boosting AlphaFold Protein Tertiary Structure Prediction through MSA Engineering and Extensive Model Sampling and Ranking in CASP16}, author={Liu, Jian and Neupane, Pawan and Cheng, Jianlin}, journal={bioRxiv}, pages={2025--06}, year={2025}, publisher={Cold Spring Harbor Laboratory} } ```

Owner

Name: BioinfoMachineLearning
Login: BioinfoMachineLearning
Kind: organization

Repositories: 29
Profile: https://github.com/BioinfoMachineLearning

GitHub Events

Total

Watch event: 8
Delete event: 1
Public event: 1
Push event: 27
Fork event: 2

Last Year

Watch event: 8
Delete event: 1
Public event: 1
Push event: 27
Fork event: 2

https://github.com/bioinfomachinelearning/multicom4

Science Score: 49.0%

Repository

Basic Info

Statistics

Metadata Files

README.md

MULTICOM4 protein structure prediction system

Some CASP16 Prediction Examples

The workflow of the MULTICOM4 protein complex structure prediction system used in CASP16

The workflow of the MULTICOM4 protein tertiary structure prediction system used in CASP16

Download MULTICOM4 package

Installation (non Docker version)

Install AlphaFold/AlphaFold-Multimer and other required third-party packages (modified from alphafoldnondocker)

Install miniconda

Create a new conda environment and update

Activate conda environment

Install dependencies

Download chemical properties to the common folder

Replace $MULTICOM4INSTALLDIR with your MULTICOM4 installation directory

Apply OpenMM patch

Replace $MULTICOM4INSTALLDIR with your MULTICOM4 installation directory

or

Install other required third-party packages

if running jackhmmer returns error: libgsl.so.25: cannot open shared object file: No such file or directory

Install third-party packages envorinments

DHR

ESMFold

AFsample

Download Genetic databases in AlphaFold2/AlphaFold-Multimer

Replace $MULTICOM4INSTALLDIR with your MULTICOM4 installation directory

Install the MULTICOM4 addon system and its databases

Note: here the parameters should be the absolute paths

Configure the MULTICOM4 system

Replace $MULTICOM4INSTALLDIR with your MULTICOM4 installation directory

Replace $YOURALPHAFOLDDB_DIR with your downloaded AlphaFold databases directory

e.g,

python downloaddatabaseand_tools.py \

--multicom4dbdir /home/multicom4/tools/multicom4db

python configure.py \

--multicom4dbdir /home/multicom4/tools/multicom4db \

--afdbdir /home/multicom4/tools/alphafolddatabases/

Genetic databases used by MULTICOM4

Key Parameters for Running MULTICOM4

Number of models generated per checkpoint for monomer predictions using AlphaFold2. Default: 100.

Note: Each AlphaFold2 predictor will generate 100 * 5 models.

Configuration for hetero-multimer predictions using AlphaFold-Multimer. Default: 100 models.

Note: Each AlphaFold-Multimer predictor will generate 100 * 5 models.

Configuration for homo-multimer predictions using AlphaFold-Multimer. Default: 100 models.

Note: Each AlphaFold-Multimer predictor will generate 100 * 5 models.

Before running the system for non Docker version

Activate your python environment and add the MULTICOM4 system path to PYTHONPATH

Replace $MULTICOM4INSTALLDIR with your MULTICOM4 installation directory (absolute path)

e.g,

conda activate MULTICOM4

export PYTHONPATH=/home/multicom4/MULTICOM4

Running the monomer/tertiary structure prediction pipeline

Please provide absolute path for the input parameters

Output

Running the multimer/quaternary structure prediction pipeline

Folding a multimer

Please provide absolute path for the input parameters

Output

CASP16 Talks

Citing this work

Owner

GitHub Events

Total

Last Year

Dependencies