cg2all

Convert coarse-grained protein structure to all-atom model

https://github.com/huhlim/cg2all

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 8 DOI reference(s) in README
✓
Academic publication links
Links to: biorxiv.org, zenodo.org
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (11.7%) to scientific vocabulary

Last synced: 10 months ago

Repository

Convert coarse-grained protein structure to all-atom model

Basic Info

Host: GitHub
Owner: huhlim
License: apache-2.0
Language: Python
Default Branch: main
Homepage: https://huggingface.co/spaces/huhlim/cg2all
Size: 130 MB

Statistics

Stars: 38
Watchers: 2
Forks: 10
Open Issues: 13
Releases: 5

Created over 3 years ago · Last pushed about 1 year ago

Metadata Files

Readme License Citation

cg2all

Convert coarse-grained protein structure to all-atom model

Web server / Google Colab notebook

A demo web page for converting a CG model into an all-atom structure is available as a Hugging Face Space.

A Google Colab notebook is available for the following tasks:
- Task 1: Conversion of an all-atom structure to a CG model using convert_all2cg
- Task 2: Conversion of a CG model to an all-atom structure using convert_cg2all
- Task 3: Conversion of a CG simulation trajectory to an atomistic simulation trajectory using convert_cg2all

A Google Colab notebook is also available for local optimization of a protein model structure against a cryo-EM density map using cryo_em_minimizer.py.

Installation

These steps install cg2all (this repository) along with a modified MDTraj, a modified SE3Transformer, and other dependencies. They also place the executables convert_cg2all and convert_all2cg in your Python binary directory.

This package has been tested on Linux (CentOS) and macOS (Apple Silicon, M1).

for CPU only

```bash
pip install git+http://github.com/huhlim/cg2all
```

for CUDA (GPU) usage

Install Miniconda.

Create an environment with the DGL library with CUDA support:
```bash
# This is an example with cudatoolkit=11.3.
# Choose a cudatoolkit version compatible with your CUDA driver and DGL library.
# dgl>=1.1 occasionally raises some errors, so please use dgl<=1.0.
conda create --name cg2all pip cudatoolkit=11.3 dgl=1.0 -c dglteam/label/cu113
```

Activate the environment:
```bash
conda activate cg2all
```

Install this package:
```bash
pip install git+http://github.com/huhlim/cg2all
```
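The install notes recommend dgl<=1.0 because dgl>=1.1 occasionally raises errors. The short Python check below reports which DGL version is installed in the active environment. It is a sketch under two assumptions: the distribution is registered under the name "dgl", and any 1.0.x release (such as the 1.0.1 pinned in poetry.lock) counts as acceptable.

```python
"""Report whether the installed DGL version is below 1.1 (sketch, not part of cg2all)."""
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple


def parse_major_minor(ver: str) -> Tuple[int, int]:
    """Return (major, minor) from a version string such as '1.0.1+cu113'."""
    match = re.match(r"(\d+)\.(\d+)", ver)
    if match is None:
        raise ValueError(f"unrecognized version string: {ver!r}")
    return int(match.group(1)), int(match.group(2))


def dgl_version_ok(ver: str) -> bool:
    """True for any release below 1.1, so 1.0.x patch releases pass."""
    return parse_major_minor(ver) < (1, 1)


if __name__ == "__main__":
    try:
        installed = version("dgl")  # assumes the distribution name is "dgl"
    except PackageNotFoundError:
        print("dgl is not installed in this environment")
    else:
        status = "OK" if dgl_version_ok(installed) else "consider installing dgl<=1.0"
        print(f"dgl {installed}: {status}")
```

Run it inside the activated cg2all environment, for example with `python check_dgl.py`. The file name is arbitrary.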

for cryo_em_minimizer usage

The additional Python package mrcfile is required to read cryo-EM density maps.
```bash
pip install mrcfile
```

Usages

convert_cg2all

Convert a coarse-grained protein structure to an all-atom model.
```bash
usage: convert_cg2all [-h] -p IN_PDB_FN [-d IN_DCD_FN] -o OUT_FN [-opdb OUTPDB_FN]
                      [--cg {supported_cg_models}] [--chain-break-cutoff CHAIN_BREAK_CUTOFF]
                      [-a] [--fix] [--ckpt CKPT_FN] [--time TIME_JSON] [--device DEVICE]
                      [--batch BATCH_SIZE] [--proc N_PROC]

options:
  -h, --help            show this help message and exit
  -p IN_PDB_FN, --pdb IN_PDB_FN
  -d IN_DCD_FN, --dcd IN_DCD_FN
  -o OUT_FN, --out OUT_FN, --output OUT_FN
  -opdb OUTPDB_FN
  --cg {supported_cg_models}
  --chain-break-cutoff CHAIN_BREAK_CUTOFF
  -a, --all, --is_all
  --fix, --fix_atom
  --standard-name
  --ckpt CKPT_FN
  --time TIME_JSON
  --device DEVICE
  --batch BATCH_SIZE
  --proc N_PROC
```

arguments

-p/--pdb: Input PDB file (mandatory).
-d/--dcd: Input DCD file (optional). If a DCD file is given, the input PDB file will be used to define its topology.
-o/--out/--output: Output PDB or DCD file (mandatory). If an input DCD file is given, the output is written as a DCD file; otherwise, a PDB file is created.
-opdb: If an input DCD file is given, also write the last snapshot as a PDB file. (optional)
--cg: Coarse-grained representation to use (optional, default=CalphaBasedModel).
- CalphaBasedModel: CA-trace (atom names should be "CA")
- ResidueBasedModel: Residue center-of-mass (atom names should be "CA")
- SidechainModel: Sidechain center-of-mass (atom names should be "SC")
- CalphaCMModel: CA-trace + Residue center-of-mass (atom names should be "CA" and "CM")
- CalphaSCModel: CA-trace + Sidechain center-of-mass (atom names should be "CA" and "SC")
- BackboneModel: Model only with backbone atoms (N, CA, C)
- MainchainModel: Model only with mainchain atoms (N, CA, C, O)
- Martini: Martini model
- Martini3: Martini3 model
- PRIMO: PRIMO model
--chain-break-cutoff: The CA-CA distance cutoff that determines chain breaks. (default=10 Angstroms)
--fix/--fix_atom: Preserve the coordinates of the input CG model. For example, with a CA-trace model, the input CA coordinates are kept in the cg2all output model.
--standard-name: output atom names follow the IUPAC nomenclature. (default=False; output atom names will use CHARMM atom names)
--ckpt: Input PyTorch ckpt file (optional). If a ckpt file is given, it overrides the "--cg" option.
--time: Output JSON file for recording timing. (optional)
--device: Device to run the model on. (optional) Choose "cpu" or "cuda"; if omitted, the script detects one automatically.
Loading a model ckpt file onto a GPU takes considerable time, so "cpu" is usually faster than "cuda" unless the input/output system is very large or the input DCD file has many frames.
--batch: The number of frames to process at a time. (optional, default=1)
--proc: The number of threads for loading input data; it is only used when a DCD file is given. (optional, default=OMP_NUM_THREADS or 1)
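The --chain-break-cutoff option splits a structure wherever consecutive CA atoms are farther apart than the cutoff (10 Angstroms by default). The Python sketch below shows that idea on a plain list of CA coordinates. It is an illustration rather than cg2all's own code, and it assumes that a distance strictly greater than the cutoff counts as a break.

```python
"""Detect chain breaks from CA-CA distances (illustrative sketch, not cg2all's implementation)."""
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def find_chain_breaks(ca_coords: Sequence[Coord], cutoff: float = 10.0) -> List[int]:
    """Return each index i whose CA(i)-CA(i+1) distance exceeds the cutoff (Angstroms)."""
    return [
        i
        for i in range(len(ca_coords) - 1)
        if math.dist(ca_coords[i], ca_coords[i + 1]) > cutoff
    ]


def split_into_chains(n_residues: int, breaks: Sequence[int]) -> List[range]:
    """Turn break positions into contiguous ranges of residue indices."""
    chains: List[range] = []
    start = 0
    for i in breaks:
        chains.append(range(start, i + 1))  # residue i ends the current chain
        start = i + 1
    chains.append(range(start, n_residues))
    return chains


if __name__ == "__main__":
    # Hypothetical CA trace: three residues ~3.8 A apart, a 22.4 A gap, then two more.
    trace = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0),
             (30.0, 0.0, 0.0), (33.8, 0.0, 0.0)]
    breaks = find_chain_breaks(trace)
    print(breaks)                                 # [2]
    print(split_into_chains(len(trace), breaks))  # [range(0, 3), range(3, 5)]
```

Raising the cutoff merges segments and lowering it splits more aggressively. The value should sit well above the ~3.8 Angstrom spacing of bonded CA atoms.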

examples

Conversion of a PDB file
```bash
convert_cg2all -p tests/1ab1_A.calpha.pdb -o tests/1ab1_A.calpha.all.pdb --cg CalphaBasedModel
```
Conversion of a DCD trajectory file
```bash
convert_cg2all -p tests/1jni.calpha.pdb -d tests/1jni.calpha.dcd -o tests/1jni.calpha.all.dcd --cg CalphaBasedModel
```
Conversion of a PDB file using a ckpt file
```bash
convert_cg2all -p tests/1ab1_A.calpha.pdb -o tests/1ab1_A.calpha.all.pdb --ckpt CalphaBasedModel-104.ckpt
```
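To convert many CG models, the single-file examples can be driven from Python. The sketch below builds convert_cg2all command lines from the documented -p, -o, --cg, -d, --device, and --batch flags only. The helper names (build_cg2all_command, convert_directory) and the ".all.pdb" output naming are hypothetical choices. By default the function only prints the commands (a dry run).

```python
"""Batch-drive convert_cg2all via subprocess (hypothetical helper, not shipped with cg2all)."""
import shlex
import subprocess
from pathlib import Path
from typing import List, Optional


def build_cg2all_command(pdb: str, out: str, cg: str = "CalphaBasedModel",
                         dcd: Optional[str] = None, device: Optional[str] = None,
                         batch: Optional[int] = None) -> List[str]:
    """Assemble an argument list using only flags documented above."""
    cmd = ["convert_cg2all", "-p", pdb, "-o", out, "--cg", cg]
    if dcd is not None:
        cmd += ["-d", dcd]
    if device is not None:
        cmd += ["--device", device]
    if batch is not None:
        cmd += ["--batch", str(batch)]
    return cmd


def convert_directory(in_dir: str, out_dir: str, cg: str = "CalphaBasedModel",
                      dry_run: bool = True) -> List[List[str]]:
    """Convert every *.pdb in in_dir; with dry_run=True only print the commands."""
    out_path = Path(out_dir)
    commands = []
    for pdb in sorted(Path(in_dir).glob("*.pdb")):
        out = out_path / f"{pdb.stem}.all.pdb"  # e.g. 1ab1_A.calpha.pdb -> 1ab1_A.calpha.all.pdb
        cmd = build_cg2all_command(str(pdb), str(out), cg=cg)
        commands.append(cmd)
        if dry_run:
            print(shlex.join(cmd))
        else:
            out_path.mkdir(parents=True, exist_ok=True)
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

For example, `convert_directory("tests", "converted")` prints one command per PDB file. Pass `dry_run=False` to run them. Running files one at a time keeps failures isolated, while a multi-frame DCD is better handled by a single call with --batch.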

convert_all2cg

Convert an all-atom protein structure to a coarse-grained model.
```bash
usage: convert_all2cg [-h] -p IN_PDB_FN [-d IN_DCD_FN] -o OUT_FN [--cg {supported_cg_models}]

options:
  -h, --help            show this help message and exit
  -p IN_PDB_FN, --pdb IN_PDB_FN
  -d IN_DCD_FN, --dcd IN_DCD_FN
  -o OUT_FN, --out OUT_FN, --output OUT_FN
  --cg {supported_cg_models}
```

arguments

-p/--pdb: Input PDB file (mandatory).
-d/--dcd: Input DCD file (optional). If a DCD file is given, the input PDB file will be used to define its topology.
-o/--out/--output: Output PDB or DCD file (mandatory). If an input DCD file is given, the output is written as a DCD file; otherwise, a PDB file is created.
--cg: Coarse-grained representation to use (optional, default=CalphaBasedModel).
- CalphaBasedModel: CA-trace (atom names should be "CA")
- ResidueBasedModel: Residue center-of-mass (atom names should be "CA")
- SidechainModel: Sidechain center-of-mass (atom names should be "SC")
- CalphaCMModel: CA-trace + Residue center-of-mass (atom names should be "CA" and "CM")
- CalphaSCModel: CA-trace + Sidechain center-of-mass (atom names should be "CA" and "SC")
- BackboneModel: Model only with backbone atoms (N, CA, C)
- MainchainModel: Model only with mainchain atoms (N, CA, C, O)
- Martini: Martini model
- Martini3: Martini3 model
- PRIMO: PRIMO model
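Two of the representations above place one bead per residue: at the CA atom (CalphaBasedModel) or at the residue center of mass (ResidueBasedModel). The Python sketch below computes both from PDB ATOM records using the fixed PDB column layout. It is not convert_all2cg's code. The use of heavy atoms only and the small mass table are assumptions, and cg2all's own center-of-mass definition may differ.

```python
"""CA-trace and residue center-of-mass beads from PDB ATOM lines (illustrative sketch)."""
from typing import Dict, Iterable, List, Tuple

# Approximate atomic masses (Da). Hydrogens and other elements are skipped,
# so the "center of mass" here is a heavy-atom center of mass (assumption).
MASSES = {"C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}

XYZ = Tuple[float, float, float]
ResKey = Tuple[str, int, str]  # (chain ID, residue number, insertion code)
Atom = Tuple[ResKey, str, str, XYZ]


def parse_atoms(pdb_lines: Iterable[str]) -> List[Atom]:
    """Read ATOM records using fixed PDB columns (name 13-16, chain 22, x/y/z 31-54)."""
    atoms: List[Atom] = []
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue
        name = line[12:16].strip()
        res_key = (line[21], int(line[22:26]), line[26])
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        element = line[76:78].strip() or name[0]  # fall back to first letter of the atom name
        atoms.append((res_key, name, element, xyz))
    return atoms


def ca_trace(atoms: List[Atom]) -> List[XYZ]:
    """One bead per residue, placed at the CA atom."""
    return [xyz for _, name, _, xyz in atoms if name == "CA"]


def residue_centers_of_mass(atoms: List[Atom]) -> List[XYZ]:
    """One bead per residue, placed at its mass-weighted heavy-atom center."""
    sums: Dict[ResKey, List[float]] = {}  # dicts keep insertion (residue) order
    for res_key, _, element, xyz in atoms:
        mass = MASSES.get(element)
        if mass is None:  # hydrogens and unlisted elements are ignored
            continue
        acc = sums.setdefault(res_key, [0.0, 0.0, 0.0, 0.0])
        for k in range(3):
            acc[k] += mass * xyz[k]
        acc[3] += mass
    return [(a[0] / a[3], a[1] / a[3], a[2] / a[3]) for a in sums.values()]
```

Usage: `with open("tests/1ab1_A.pdb") as fh: atoms = parse_atoms(fh)`, then call `ca_trace(atoms)` or `residue_centers_of_mass(atoms)`. Writing these beads back out with the atom names listed above (for example "CA") is left out for brevity.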

an example

```bash
convert_all2cg -p tests/1ab1_A.pdb -o tests/1ab1_A.calpha.pdb --cg CalphaBasedModel
```

script/cryo_em_minimizer.py

Local optimization of a protein model structure against a given electron density map. This proof-of-concept script uses the cg2all network to optimize the structure at CA-level resolution, with objective functions evaluated at both atomistic and CA-level resolutions. A CUDA environment is highly recommended.
```bash
usage: cryo_em_minimizer [-h] -p IN_PDB_FN -m IN_MAP_FN -o OUT_DIR [-a] [-n N_STEP]
                         [--freq OUTPUT_FREQ] [--chain-break-cutoff CHAIN_BREAK_CUTOFF]
                         [--restraint RESTRAINT]
                         [--cg {CalphaBasedModel,CA,ca,ResidueBasedModel,RES,res}]
                         [--standard-name] [--uniform_restraint] [--nonuniform_restraint]
                         [--segment SEGMENTS]

options:
  -h, --help            show this help message and exit
  -p IN_PDB_FN, --pdb IN_PDB_FN
  -m IN_MAP_FN, --map IN_MAP_FN
  -o OUT_DIR, --out OUT_DIR, --output OUT_DIR
  -a, --all, --is_all
  -n N_STEP, --step N_STEP
  --freq OUTPUT_FREQ, --output_freq OUTPUT_FREQ
  --chain-break-cutoff CHAIN_BREAK_CUTOFF
  --restraint RESTRAINT
  --cg {CalphaBasedModel,CA,ca,ResidueBasedModel,RES,res}
  --standard-name
  --uniform_restraint
  --nonuniform_restraint
  --segment SEGMENTS
```

arguments

-p/--pdb: Input PDB file (mandatory).
-m/--map: Input electron density map file in the MRC or CCP4 format (mandatory).
-o/--out/--output: Output directory to save optimized structures (mandatory).
-a/--all/--is_all: Whether the input PDB file is an atomistic structure. (optional, default=False)
-n/--step: The number of minimization steps. (optional, default=1000)
--freq/--output_freq: The interval between saving intermediate outputs. (optional, default=100)
--chain-break-cutoff: The CA-CA distance cutoff that determines chain breaks. (default=10 Angstroms)
--restraint: The weight of distance restraints. (optional, default=100.0)
--cg: Coarse-grained representation to use (default=ResidueBasedModel)
--standard-name: output atom names follow the IUPAC nomenclature. (default=False; output atom names will use CHARMM atom names)
--uniform_restraint/--nonuniform_restraint: Whether to use uniform restraint weights. (default=True) If set to False, the restraint weights depend on the pLDDT values stored in the B-factor column of the PDB file.
--segment: The segmentation method for applying rigid-body operations. (default=None)
- None: Input structure is not segmented, so the same rigid-body operations are applied to the whole structure.
- chain: Input structure is segmented based on chain IDs. Rigid-body operations are independently applied to each chain.
- segment: Similar to "chain" option, but the structure is segmented based on peptide bond connectivities.
- 0-99,100-199: Explicit segmentation based on 0-based residue indices.
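The explicit form of --segment lists residue ranges such as 0-99,100-199. The Python sketch below parses such a string into index ranges and finds which segment a residue belongs to. It assumes inclusive range ends, as the 0-99,100-199 example suggests, and rejects overlapping ranges. Both rules are assumptions, and the script's own parser may behave differently. The function names are hypothetical.

```python
"""Parse an explicit --segment specification such as '0-99,100-199' (sketch)."""
from typing import List, Optional


def parse_segments(spec: str) -> List[range]:
    """'0-99,100-199' -> [range(0, 100), range(100, 200)]; ends are treated as inclusive."""
    segments: List[range] = []
    for token in spec.split(","):
        start_str, sep, end_str = token.strip().partition("-")
        if not sep:
            raise ValueError(f"expected START-END, got {token!r}")
        start, end = int(start_str), int(end_str)
        if start < 0 or end < start:
            raise ValueError(f"invalid residue range {token!r}")
        if segments and start <= segments[-1][-1]:
            raise ValueError("segments must be increasing and non-overlapping")
        segments.append(range(start, end + 1))
    return segments


def segment_of(residue_index: int, segments: List[range]) -> Optional[int]:
    """Return the segment number containing a 0-based residue index, or None."""
    for i, seg in enumerate(segments):
        if residue_index in seg:
            return i
    return None
```

Each returned range would then receive its own rigid-body operation, in the same way the chain and segment options split the structure.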

an example

```bash
./cg2all/script/cryo_em_minimizer.py -p tests/3isr.af2.pdb -m tests/3isr_5.mrc -o 3isr_5+3isr.af2 --all
```
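The example input tests/3isr.af2.pdb is an AlphaFold2 model. Such models store per-residue pLDDT in the B-factor column, and the non-uniform restraint option uses that column to weight restraints. The Python sketch below reads pLDDT from CA atoms (B-factor in PDB columns 61-66). It then applies a hypothetical linear mapping to weights, scaled from the default --restraint value of 100.0. The actual weighting inside cryo_em_minimizer.py is not documented here and may differ.

```python
"""Read pLDDT from B-factors and map it to restraint weights (hypothetical weighting)."""
from typing import Iterable, List


def read_ca_plddt(pdb_lines: Iterable[str]) -> List[float]:
    """Per-residue pLDDT taken from the B-factor field (columns 61-66) of CA atoms."""
    return [
        float(line[60:66])
        for line in pdb_lines
        if line.startswith("ATOM") and line[12:16].strip() == "CA"
    ]


def plddt_weights(plddt: List[float], restraint: float = 100.0) -> List[float]:
    """Hypothetical mapping: scale the restraint weight by pLDDT/100, clipped to [0, 1]."""
    return [restraint * min(max(p / 100.0, 0.0), 1.0) for p in plddt]
```

Under this mapping, confidently predicted residues (high pLDDT) stay strongly restrained, while low-confidence regions are freer to move into the density.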

Datasets

The training/validation/test sets are available on Zenodo.

Reference

Lim Heo & Michael Feig, "One particle per residue is sufficient to describe all-atom protein structures", bioRxiv (2023).

Owner

Name: Lim Heo
Login: huhlim
Kind: user
Location: Cambridge, MA
Company: Bristol Myers Squibb

Website: https://scholar.google.com/citations?user=73JdVH0AAAAJ
Twitter: huhlim
Repositories: 19
Profile: https://github.com/huhlim

Citation (CITATION.cff)

cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
- family-names: Heo
  given-names: Lim
- family-names: Feig
  given-names: Michael
orcid: https://orcid.org/0000-0002-3153-2363
title: One bead per residue can describe all-atom protein structures
version: v1.3.1
date-released: 2023-10-16

GitHub Events

Total

Issues event: 7
Watch event: 8
Delete event: 2
Issue comment event: 2
Push event: 7
Pull request event: 8
Fork event: 3
Create event: 6

Last Year

Issues event: 7
Watch event: 8
Delete event: 2
Issue comment event: 2
Push event: 7
Pull request event: 8
Fork event: 3
Create event: 6

Issues and Pull Requests

Last synced: about 2 years ago

All Time

Total issues: 17
Total pull requests: 6
Average time to close issues: 12 days
Average time to close pull requests: less than a minute
Total issue authors: 8
Total pull request authors: 1
Average comments per issue: 2.06
Average comments per pull request: 0.0
Merged pull requests: 6
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 17
Pull requests: 6
Average time to close issues: 12 days
Average time to close pull requests: less than a minute
Issue authors: 8
Pull request authors: 1
Average comments per issue: 2.06
Average comments per pull request: 0.0
Merged pull requests: 6
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

huhlim (3)
shadow1229 (2)
ntxxt (1)
marbesu-instadeep (1)
dhq1216dhq (1)
lucajovine (1)
JMB-Scripts (1)
lunatictaco (1)
aashuph16221 (1)
aminsagar (1)
TomMakkink (1)
kilbyman15 (1)

Pull Request Authors

huhlim (9)

Top Labels

Issue Labels

Pull Request Labels

codex (6)

Dependencies

poetry.lock pypi

aiohttp 3.8.4 develop
aiosignal 1.3.1 develop
async-timeout 4.0.2 develop
attrs 22.2.0 develop
frozenlist 1.3.3 develop
fsspec 2023.3.0 develop
lightning-utilities 0.8.0 develop
multidict 6.0.4 develop
pytorch-lightning 1.9.4 develop
torchmetrics 0.11.4 develop
yarl 1.8.2 develop
absl-py 1.4.0
astunparse 1.6.3
certifi 2022.12.7
charset-normalizer 3.1.0
colorama 0.4.6
contextlib2 21.6.0
dgl 1.0.1
e3nn 0.5.1
idna 3.4
mdtraj 1.9.8.dev0
ml-collections 0.1.1
mpmath 1.3.0
networkx 3.0
numpy 1.24.2
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
opt-einsum 3.3.0
opt-einsum-fx 0.1.4
packaging 23.0
psutil 5.9.4
pyparsing 3.0.9
pyyaml 6.0
requests 2.28.2
scipy 1.9.3
se3-transformer 0.1.0
setuptools 67.6.1
six 1.16.0
sympy 1.11.1
torch 1.13.1
tqdm 4.65.0
typing-extensions 4.5.0
urllib3 1.26.15
wheel 0.40.0

pyproject.toml pypi
