cg2all

Convert coarse-grained protein structure to all-atom model

https://github.com/huhlim/cg2all

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 8 DOI reference(s) in README
  • Academic publication links
    Links to: biorxiv.org, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.7%) to scientific vocabulary
Last synced: 7 months ago

Repository

Basic Info
Statistics
  • Stars: 38
  • Watchers: 2
  • Forks: 10
  • Open Issues: 13
  • Releases: 5
Created over 3 years ago · Last pushed 11 months ago
Metadata Files
Readme License Citation

README.md

cg2all

Convert coarse-grained protein structure to all-atom model

Web server / Google Colab notebook

Hugging Face Spaces
A demo web page for converting a CG model to an all-atom structure is available via Hugging Face Spaces.

Google Colab
A Google Colab notebook is available for the following tasks:
  • Task 1: Conversion of an all-atom structure to a CG model using convert_all2cg
  • Task 2: Conversion of a CG model to an all-atom structure using convert_cg2all
  • Task 3: Conversion of a CG simulation trajectory to an atomistic simulation trajectory using convert_cg2all

Google Colab
A Google Colab notebook is available for local optimization of a protein model structure against a cryo-EM density map using cryo_em_minimizer.py

Installation

These steps install Python libraries including cg2all (this repository), a modified MDTraj, a modified SE3Transformer, and other dependencies. They also place the executables convert_cg2all and convert_all2cg in your Python binary directory.

This package is tested on Linux (CentOS) and macOS (Apple Silicon, M1).

for CPU only

```bash
pip install git+http://github.com/huhlim/cg2all
```

for CUDA (GPU) usage

  1. Install Miniconda
  2. Create an environment with the DGL library with CUDA support
     ```bash
     # This is an example with cudatoolkit=11.3.
     # Set a proper cudatoolkit version that is compatible with your CUDA driver and DGL library.
     # dgl>=1.1 occasionally raises some errors, so please use dgl<=1.0.
     conda create --name cg2all pip cudatoolkit=11.3 dgl=1.0 -c dglteam/label/cu113
     ```
  3. Activate the environment
     ```bash
     conda activate cg2all
     ```
  4. Install this package
     ```bash
     pip install git+http://github.com/huhlim/cg2all
     ```
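
The installation notes say that the executables convert_cg2all and convert_all2cg end up in the Python binary directory. One way to check that they are reachable from the current environment is sketched below. The helper name `find_executables` is hypothetical and not part of cg2all; it only uses the standard-library `shutil.which`.

```python
import shutil

# Executable names taken from the installation notes above.
EXECUTABLES = ("convert_cg2all", "convert_all2cg")


def find_executables(names=EXECUTABLES):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_executables().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If either entry is reported as not found, the conda environment is probably not activated or the pip install went to a different interpreter.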

for cryo_em_minimizer usage

You need an additional Python package, mrcfile, to handle cryo-EM density maps.

```bash
pip install mrcfile
```

Usages

convert_cg2all

Convert a coarse-grained protein structure to an all-atom model.

```bash
usage: convert_cg2all [-h] -p IN_PDB_FN [-d IN_DCD_FN] -o OUT_FN [-opdb OUTPDB_FN]
                      [--cg {supported_cg_models}] [--chain-break-cutoff CHAIN_BREAK_CUTOFF] [-a]
                      [--fix] [--ckpt CKPT_FN] [--time TIME_JSON] [--device DEVICE]
                      [--batch BATCH_SIZE] [--proc N_PROC]

options:
  -h, --help            show this help message and exit
  -p IN_PDB_FN, --pdb IN_PDB_FN
  -d IN_DCD_FN, --dcd IN_DCD_FN
  -o OUT_FN, --out OUT_FN, --output OUT_FN
  -opdb OUTPDB_FN
  --cg {supported_cg_models}
  --chain-break-cutoff CHAIN_BREAK_CUTOFF
  -a, --all, --is_all
  --fix, --fix_atom
  --standard-name
  --ckpt CKPT_FN
  --time TIME_JSON
  --device DEVICE
  --batch BATCH_SIZE
  --proc N_PROC
```

arguments

  • -p/--pdb: Input PDB file (mandatory).
  • -d/--dcd: Input DCD file (optional). If a DCD file is given, the input PDB file will be used to define its topology.
  • -o/--out/--output: Output PDB or DCD file (mandatory). If an input DCD file is given, the output will be a DCD file. Otherwise, a PDB file will be created.
  • -opdb: If an input DCD file is given, the last snapshot is written as a PDB file. (optional)
  • --cg: Coarse-grained representation to use (optional, default=CalphaBasedModel).
    • CalphaBasedModel: CA-trace (atom names should be "CA")
    • ResidueBasedModel: Residue center-of-mass (atom names should be "CA")
    • SidechainModel: Sidechain center-of-mass (atom names should be "SC")
    • CalphaCMModel: CA-trace + Residue center-of-mass (atom names should be "CA" and "CM")
    • CalphaSCModel: CA-trace + Sidechain center-of-mass (atom names should be "CA" and "SC")
    • BackboneModel: Model only with backbone atoms (N, CA, C)
    • MainchainModel: Model only with mainchain atoms (N, CA, C, O)
    • Martini: Martini model
    • Martini3: Martini3 model
    • PRIMO: PRIMO model
  • --chain-break-cutoff: The CA-CA distance cutoff that determines chain breaks. (default=10 Angstroms)
  • --fix/--fix_atom: preserve coordinates in the input CG model. For example, CA coordinates in a CA-trace model will be kept in its cg2all output model.
  • --standard-name: output atom names follow the IUPAC nomenclature. (default=False; output atom names will use CHARMM atom names)
  • --ckpt: Input PyTorch ckpt file (optional). If a ckpt file is given, it will override "--cg" option.
  • --time: Output JSON file for recording timing. (optional)
  • --device: Specify a device to run the model. (optional) You can choose "cpu" or "cuda"; otherwise the script will detect one automatically.
    "cpu" is usually faster than "cuda" unless the input/output system is very large or you provide a DCD file with many frames, because loading a model ckpt file onto a GPU takes a long time.
  • --batch: The number of frames to process at a time. (optional, default=1)
  • --proc: Specify the number of threads for loading input data. It is only used when processing a DCD file. (optional, default=OMP_NUM_THREADS or 1)
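
To make the --chain-break-cutoff option concrete, here is a minimal sketch of a CA-CA distance test. It is not the cg2all implementation: the function name `find_chain_breaks` is hypothetical, and the choices to compare only consecutive residues and to use a strict greater-than comparison are assumptions made for illustration.

```python
import math


def find_chain_breaks(ca_coords, cutoff=10.0):
    """Return indices i where the CA(i)-CA(i+1) distance exceeds `cutoff` (Angstroms)."""
    breaks = []
    for i in range(len(ca_coords) - 1):
        if math.dist(ca_coords[i], ca_coords[i + 1]) > cutoff:
            breaks.append(i)
    return breaks


# Four CA atoms spaced 3.8 A apart along x, then a 20 A gap before the fifth.
coords = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (11.4, 0.0, 0.0),
    (31.4, 0.0, 0.0),
]
print(find_chain_breaks(coords))  # prints [3]
```

With the default cutoff of 10 Angstroms, only the 20 Angstrom gap after the fourth atom is flagged; raising the cutoff above 20 would treat the five atoms as one continuous chain.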

examples

Conversion of a PDB file

```bash
convert_cg2all -p tests/1ab1_A.calpha.pdb -o tests/1ab1_A.calpha.all.pdb --cg CalphaBasedModel
```

Conversion of a DCD trajectory file

```bash
convert_cg2all -p tests/1jni.calpha.pdb -d tests/1jni.calpha.dcd -o tests/1jni.calpha.all.dcd --cg CalphaBasedModel
```

Conversion of a PDB file using a ckpt file

```bash
convert_cg2all -p tests/1ab1_A.calpha.pdb -o tests/1ab1_A.calpha.all.pdb --ckpt CalphaBasedModel-104.ckpt
```
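
When the conversions are driven from a Python script, the same command lines can be assembled programmatically. The sketch below only builds an argument list from flags documented above; `build_convert_cg2all_args` is a hypothetical helper, not part of cg2all, and it does not run anything by itself.

```python
import shlex


def build_convert_cg2all_args(pdb, out, dcd=None, cg="CalphaBasedModel",
                              ckpt=None, device=None, batch=None):
    """Assemble an argument list for convert_cg2all from documented flags."""
    args = ["convert_cg2all", "-p", pdb, "-o", out]
    if dcd is not None:
        # With a DCD input, the PDB file only defines the topology.
        args += ["-d", dcd]
    if ckpt is not None:
        # A ckpt file overrides --cg, so --cg is left out in that case.
        args += ["--ckpt", ckpt]
    else:
        args += ["--cg", cg]
    if device is not None:
        args += ["--device", device]
    if batch is not None:
        args += ["--batch", str(batch)]
    return args


args = build_convert_cg2all_args(
    "tests/1jni.calpha.pdb",
    "tests/1jni.calpha.all.dcd",
    dcd="tests/1jni.calpha.dcd",
    batch=4,
)
print(shlex.join(args))
```

The resulting list can be passed to `subprocess.run(args, check=True)` once the package is installed.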


convert_all2cg

Convert an all-atom protein structure to a coarse-grained model.

```bash
usage: convert_all2cg [-h] -p IN_PDB_FN [-d IN_DCD_FN] -o OUT_FN [--cg {supported_cg_models}]

options:
  -h, --help            show this help message and exit
  -p IN_PDB_FN, --pdb IN_PDB_FN
  -d IN_DCD_FN, --dcd IN_DCD_FN
  -o OUT_FN, --out OUT_FN, --output OUT_FN
  --cg
```

arguments

  • -p/--pdb: Input PDB file (mandatory).
  • -d/--dcd: Input DCD file (optional). If a DCD file is given, the input PDB file will be used to define its topology.
  • -o/--out/--output: Output PDB or DCD file (mandatory). If an input DCD file is given, the output will be a DCD file. Otherwise, a PDB file will be created.
  • --cg: Coarse-grained representation to use (optional, default=CalphaBasedModel).
    • CalphaBasedModel: CA-trace (atom names should be "CA")
    • ResidueBasedModel: Residue center-of-mass (atom names should be "CA")
    • SidechainModel: Sidechain center-of-mass (atom names should be "SC")
    • CalphaCMModel: CA-trace + Residue center-of-mass (atom names should be "CA" and "CM")
    • CalphaSCModel: CA-trace + Sidechain center-of-mass (atom names should be "CA" and "SC")
    • BackboneModel: Model only with backbone atoms (N, CA, C)
    • MainchainModel: Model only with mainchain atoms (N, CA, C, O)
    • Martini: Martini model
    • Martini3: Martini3 model
    • PRIMO: PRIMO model
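
To illustrate what the two simplest representations in the list above contain, the following sketch reduces PDB text to a CA trace and to per-residue centers of mass. It is an illustration only, not how convert_all2cg is implemented: all function names are hypothetical, the mass table is approximate, and treating every ATOM record of a residue (including hydrogens, if present) as part of the center of mass is an assumption.

```python
# Approximate atomic masses (amu) for elements common in proteins.
MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def atom_line(serial, name, resname, chain, resseq, x, y, z, element):
    """Format a minimal fixed-column PDB ATOM record (demo helper)."""
    # Atom names of one-letter elements start in column 14, hence the leading space.
    return (f"ATOM  {serial:5d} {' ' + name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}{1.0:6.2f}{0.0:6.2f}" + " " * 10 + f"{element:>2s}")


def parse_atoms(pdb_text):
    """Yield (chain, resseq, atom_name, element, (x, y, z)) for each ATOM record."""
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        element = line[76:78].strip() or name[0]
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        yield line[21], int(line[22:26]), name, element, xyz


def ca_trace(pdb_text):
    """Coordinates of atoms named "CA", in file order."""
    return [xyz for _, _, name, _, xyz in parse_atoms(pdb_text) if name == "CA"]


def residue_centers_of_mass(pdb_text):
    """Mass-weighted mean position of each residue, in file order."""
    sums = {}  # (chain, resseq) -> [total_mass, sum_mx, sum_my, sum_mz]
    for chain, resseq, _, element, xyz in parse_atoms(pdb_text):
        mass = MASSES[element]
        acc = sums.setdefault((chain, resseq), [0.0, 0.0, 0.0, 0.0])
        acc[0] += mass
        for k in range(3):
            acc[k + 1] += mass * xyz[k]
    return [(a[1] / a[0], a[2] / a[0], a[3] / a[0]) for a in sums.values()]


# A single glycine backbone with made-up coordinates.
demo = "\n".join([
    atom_line(1, "N", "GLY", "A", 1, 0.0, 0.0, 0.0, "N"),
    atom_line(2, "CA", "GLY", "A", 1, 1.458, 0.0, 0.0, "C"),
    atom_line(3, "C", "GLY", "A", 1, 2.0, 1.4, 0.0, "C"),
    atom_line(4, "O", "GLY", "A", 1, 3.2, 1.4, 0.0, "O"),
])
print(ca_trace(demo))
print(residue_centers_of_mass(demo))
```

Both reductions produce one particle per residue, which is why the list specifies that the single atom is named "CA" in either model.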

an example

```bash
convert_all2cg -p tests/1ab1_A.pdb -o tests/1ab1_A.calpha.pdb --cg CalphaBasedModel
```


script/cryo_em_minimizer.py

Local optimization of a protein model structure against a given electron density map. This script is a proof of concept that uses the cg2all network to optimize at CA-level resolution with objective functions at both atomistic and CA-level resolutions. Using a CUDA environment is highly recommended.

```bash
usage: cryo_em_minimizer [-h] -p IN_PDB_FN -m IN_MAP_FN -o OUT_DIR [-a] [-n N_STEP]
                         [--freq OUTPUT_FREQ] [--chain-break-cutoff CHAIN_BREAK_CUTOFF]
                         [--restraint RESTRAINT]
                         [--cg {CalphaBasedModel,CA,ca,ResidueBasedModel,RES,res}]
                         [--standard-name] [--uniform_restraint] [--nonuniform_restraint]
                         [--segment SEGMENTS]

options:
  -h, --help            show this help message and exit
  -p IN_PDB_FN, --pdb IN_PDB_FN
  -m IN_MAP_FN, --map IN_MAP_FN
  -o OUT_DIR, --out OUT_DIR, --output OUT_DIR
  -a, --all, --is_all
  -n N_STEP, --step N_STEP
  --freq OUTPUT_FREQ, --output_freq OUTPUT_FREQ
  --chain-break-cutoff CHAIN_BREAK_CUTOFF
  --restraint RESTRAINT
  --cg {CalphaBasedModel,CA,ca,ResidueBasedModel,RES,res}
  --standard-name
  --uniform_restraint
  --nonuniform_restraint
  --segment SEGMENTS
```

arguments

  • -p/--pdb: Input PDB file (mandatory).
  • -m/--map: Input electron density map file in the MRC or CCP4 format (mandatory).
  • -o/--out/--output: Output directory to save optimized structures (mandatory).
  • -a/--all/--is_all: Whether the input PDB file is an atomistic structure. (optional, default=False)
  • -n/--step: The number of minimization steps. (optional, default=1000)
  • --freq/--output_freq: The interval between saving intermediate outputs. (optional, default=100)
  • --chain-break-cutoff: The CA-CA distance cutoff that determines chain breaks. (default=10 Angstroms)
  • --restraint: The weight of distance restraints. (optional, default=100.0)
  • --cg: Coarse-grained representation to use (default=ResidueBasedModel)
  • --standard-name: output atom names follow the IUPAC nomenclature. (default=False; output atom names will use CHARMM atom names)
  • --uniform_restraint/--nonuniform_restraint: Whether to use uniform restraints. (default=True) If set to False, the restraint weights depend on the pLDDT values recorded in the B-factor column of the PDB file.
  • --segment: The segmentation method for applying rigid-body operations. (default=None)
    • None: Input structure is not segmented, so the same rigid-body operations are applied to the whole structure.
    • chain: Input structure is segmented based on chain IDs. Rigid-body operations are independently applied to each chain.
    • segment: Similar to the "chain" option, but the structure is segmented based on peptide bond connectivity.
    • 0-99,100-199: Explicit segmentation based on 0-based residue indices.
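
The explicit form of --segment can be read as a comma-separated list of residue index ranges. The sketch below shows one way to parse such a string; it is not the parser used by cryo_em_minimizer.py. The names `parse_segments` and `segment_of` are hypothetical, and treating both ends of each range as inclusive is an assumption.

```python
def parse_segments(spec):
    """Parse a spec such as "0-99,100-199" into a list of (start, end) tuples."""
    segments = []
    for part in spec.split(","):
        start_text, end_text = part.strip().split("-")
        start, end = int(start_text), int(end_text)
        if start < 0 or end < start:
            raise ValueError(f"invalid segment: {part!r}")
        segments.append((start, end))
    return segments


def segment_of(residue_index, segments):
    """Return the position of the segment containing residue_index, or None."""
    for k, (start, end) in enumerate(segments):
        if start <= residue_index <= end:  # assumption: both ends inclusive
            return k
    return None


segments = parse_segments("0-99,100-199")
print(segments)                   # prints [(0, 99), (100, 199)]
print(segment_of(150, segments))  # prints 1
```

Under this reading, residues in the same range would share one rigid-body operation, while residues outside every range map to None.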

an example

```bash
./cg2all/script/cryo_em_minimizer.py -p tests/3isr.af2.pdb -m tests/3isr_5.mrc -o 3isr_5+3isr.af2 --all
```
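
With non-uniform restraints, the weights depend on pLDDT values stored in the B-factor column of the input PDB file. The sketch below shows only how such values can be read from the fixed-column PDB format (B-factor in columns 61-66). How cryo_em_minimizer.py turns them into restraint weights is not described here, so no weighting formula is shown. The names `read_plddt` and `ca_line` are hypothetical.

```python
def ca_line(chain, resseq, bfactor):
    """Format a minimal fixed-column CA record at the origin (demo helper)."""
    return (f"ATOM  {resseq:5d}  CA  ALA {chain}{resseq:4d}    "
            f"{0.0:8.3f}{0.0:8.3f}{0.0:8.3f}{1.0:6.2f}{bfactor:6.2f}" + " " * 10 + " C")


def read_plddt(pdb_text):
    """Return {(chain, resseq): B-factor value} taken from CA atoms of ATOM records."""
    plddt = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            plddt[(line[21], int(line[22:26]))] = float(line[60:66])
    return plddt


demo_pdb = "\n".join([ca_line("A", 1, 85.5), ca_line("A", 2, 42.25)])
print(read_plddt(demo_pdb))
```

Inspecting these values before a run gives a quick check that the B-factor column really holds pLDDT scores, as it does for AlphaFold2 models, and not experimental B-factors.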

Datasets

The training/validation/test sets are available at zenodo.

Reference

Lim Heo & Michael Feig, "One particle per residue is sufficient to describe all-atom protein structures", bioRxiv (2023).

Owner

  • Name: Lim Heo
  • Login: huhlim
  • Kind: user
  • Location: Cambridge, MA
  • Company: Bristol Myers Squibb

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Heo
  given-names: Lim
- family-names: Feig
  given-names: Michael
orcid: https://orcid.org/0000-0002-3153-2363
title: One bead per residue can describe all-atom protein structures
version: v1.3.1
date-released: 2023-10-16

GitHub Events

Total
  • Issues event: 7
  • Watch event: 8
  • Delete event: 2
  • Issue comment event: 2
  • Push event: 7
  • Pull request event: 8
  • Fork event: 3
  • Create event: 6
Last Year
  • Issues event: 7
  • Watch event: 8
  • Delete event: 2
  • Issue comment event: 2
  • Push event: 7
  • Pull request event: 8
  • Fork event: 3
  • Create event: 6

Issues and Pull Requests

Last synced: almost 2 years ago

All Time
  • Total issues: 17
  • Total pull requests: 6
  • Average time to close issues: 12 days
  • Average time to close pull requests: less than a minute
  • Total issue authors: 8
  • Total pull request authors: 1
  • Average comments per issue: 2.06
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 17
  • Pull requests: 6
  • Average time to close issues: 12 days
  • Average time to close pull requests: less than a minute
  • Issue authors: 8
  • Pull request authors: 1
  • Average comments per issue: 2.06
  • Average comments per pull request: 0.0
  • Merged pull requests: 6
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • huhlim (3)
  • shadow1229 (2)
  • ntxxt (1)
  • marbesu-instadeep (1)
  • dhq1216dhq (1)
  • lucajovine (1)
  • JMB-Scripts (1)
  • lunatictaco (1)
  • aashuph16221 (1)
  • aminsagar (1)
  • TomMakkink (1)
  • kilbyman15 (1)
Pull Request Authors
  • huhlim (9)
Top Labels
Issue Labels
Pull Request Labels
codex (6)

Dependencies

poetry.lock pypi
  • aiohttp 3.8.4 develop
  • aiosignal 1.3.1 develop
  • async-timeout 4.0.2 develop
  • attrs 22.2.0 develop
  • frozenlist 1.3.3 develop
  • fsspec 2023.3.0 develop
  • lightning-utilities 0.8.0 develop
  • multidict 6.0.4 develop
  • pytorch-lightning 1.9.4 develop
  • torchmetrics 0.11.4 develop
  • yarl 1.8.2 develop
  • absl-py 1.4.0
  • astunparse 1.6.3
  • certifi 2022.12.7
  • charset-normalizer 3.1.0
  • colorama 0.4.6
  • contextlib2 21.6.0
  • dgl 1.0.1
  • e3nn 0.5.1
  • idna 3.4
  • mdtraj 1.9.8.dev0
  • ml-collections 0.1.1
  • mpmath 1.3.0
  • networkx 3.0
  • numpy 1.24.2
  • nvidia-cublas-cu11 11.10.3.66
  • nvidia-cuda-nvrtc-cu11 11.7.99
  • nvidia-cuda-runtime-cu11 11.7.99
  • nvidia-cudnn-cu11 8.5.0.96
  • opt-einsum 3.3.0
  • opt-einsum-fx 0.1.4
  • packaging 23.0
  • psutil 5.9.4
  • pyparsing 3.0.9
  • pyyaml 6.0
  • requests 2.28.2
  • scipy 1.9.3
  • se3-transformer 0.1.0
  • setuptools 67.6.1
  • six 1.16.0
  • sympy 1.11.1
  • torch 1.13.1
  • tqdm 4.65.0
  • typing-extensions 4.5.0
  • urllib3 1.26.15
  • wheel 0.40.0
pyproject.toml pypi