Science Score: 67.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 8 DOI reference(s) in README
- ✓ Academic publication links: Links to biorxiv.org, zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (11.7%) to scientific vocabulary
Repository
Convert coarse-grained protein structure to all-atom model
Basic Info
- Host: GitHub
- Owner: huhlim
- License: apache-2.0
- Language: Python
- Default Branch: main
- Homepage: https://huggingface.co/spaces/huhlim/cg2all
- Size: 130 MB
Statistics
- Stars: 38
- Watchers: 2
- Forks: 10
- Open Issues: 13
- Releases: 5
Metadata Files
README.md
cg2all
Convert coarse-grained protein structure to all-atom model
Web server / Google Colab notebook
A demo web page is available for converting a CG model to an all-atom structure via a Hugging Face Space.
A Google Colab notebook is available for the following tasks:
- Task 1: Conversion of an all-atom structure to a CG model using convert_all2cg
- Task 2: Conversion of a CG model to an all-atom structure using convert_cg2all
- Task 3: Conversion of a CG simulation trajectory to an atomistic simulation trajectory using convert_cg2all
A Google Colab notebook is available for local optimization of a protein model structure against a cryo-EM density map using cryo_em_minimizer.py.
Installation
These steps install Python libraries including cg2all (this repository), a modified MDTraj, a modified SE3Transformer, and other dependencies. They also place the executables convert_cg2all and convert_all2cg in your Python binary directory.
This package has been tested on Linux (CentOS) and macOS (Apple Silicon, M1).
for CPU only
```bash
pip install git+http://github.com/huhlim/cg2all
```
for CUDA (GPU) usage
- Install Miniconda
- Create an environment with DGL library with CUDA support
```bash
# This is an example with cudatoolkit=11.3.
# Set a proper cudatoolkit version that is compatible with your CUDA driver and DGL library.
# dgl>=1.1 occasionally raises some errors, so please use dgl<=1.0.
conda create --name cg2all pip cudatoolkit=11.3 dgl=1.0 -c dglteam/label/cu113
```
- Activate the environment
```bash
conda activate cg2all
```
- Install this package
```bash
pip install git+http://github.com/huhlim/cg2all
```
for cryo_em_minimizer usage
You need an additional Python package, mrcfile, to handle cryo-EM density maps.
```bash
pip install mrcfile
```
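After installation, a quick sanity check can confirm that the two executables named above are on the PATH and that the optional mrcfile package can be imported. The following is a minimal sketch using only the Python standard library; the helper names `find_executables` and `find_modules` are hypothetical and not part of cg2all.

```python
import importlib.util
import shutil

# Names taken from the installation notes above.
EXPECTED_EXECUTABLES = ("convert_cg2all", "convert_all2cg")
EXPECTED_MODULES = ("cg2all", "mrcfile")  # mrcfile is only needed for cryo-EM use


def find_executables(names=EXPECTED_EXECUTABLES):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


def find_modules(names=EXPECTED_MODULES):
    """Map each top-level module name to True if it can be located, else False."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


for name, path in find_executables().items():
    print(f"{name}: {path if path else 'not found on PATH'}")
for name, present in find_modules().items():
    print(f"{name}: {'importable' if present else 'not installed'}")
```

If an executable is reported as missing, the Python binary directory of the active environment is probably not on the PATH.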
Usages
convert_cg2all
convert a coarse-grained protein structure to all-atom model
```bash
usage: convert_cg2all [-h] -p IN_PDB_FN [-d IN_DCD_FN] -o OUT_FN [-opdb OUTPDB_FN]
                      [--cg {supported_cg_models}] [--chain-break-cutoff CHAIN_BREAK_CUTOFF]
                      [-a] [--fix] [--ckpt CKPT_FN] [--time TIME_JSON] [--device DEVICE]
                      [--batch BATCH_SIZE] [--proc N_PROC]

options:
  -h, --help            show this help message and exit
  -p IN_PDB_FN, --pdb IN_PDB_FN
  -d IN_DCD_FN, --dcd IN_DCD_FN
  -o OUT_FN, --out OUT_FN, --output OUT_FN
  -opdb OUTPDB_FN
  --cg {supported_cg_models}
  --chain-break-cutoff CHAIN_BREAK_CUTOFF
  -a, --all, --is_all
  --fix, --fix_atom
  --standard-name
  --ckpt CKPT_FN
  --time TIME_JSON
  --device DEVICE
  --batch BATCH_SIZE
  --proc N_PROC
```
arguments
- -p/--pdb: Input PDB file (mandatory).
- -d/--dcd: Input DCD file (optional). If a DCD file is given, the input PDB file will be used to define its topology.
- -o/--out/--output: Output PDB or DCD file (mandatory). If an input DCD file is given, the output will be a DCD file. Otherwise, a PDB file will be created.
- -opdb: If an input DCD file is given, the last snapshot will also be written as a PDB file. (optional)
- --cg: Coarse-grained representation to use (optional, default=CalphaBasedModel).
  - CalphaBasedModel: CA-trace (atom names should be "CA")
  - ResidueBasedModel: Residue center-of-mass (atom names should be "CA")
  - SidechainModel: Sidechain center-of-mass (atom names should be "SC")
  - CalphaCMModel: CA-trace + Residue center-of-mass (atom names should be "CA" and "CM")
  - CalphaSCModel: CA-trace + Sidechain center-of-mass (atom names should be "CA" and "SC")
  - BackboneModel: Model only with backbone atoms (N, CA, C)
  - MainchainModel: Model only with mainchain atoms (N, CA, C, O)
  - Martini: Martini model
  - Martini3: Martini3 model
  - PRIMO: PRIMO model
- --chain-break-cutoff: The CA-CA distance cutoff that determines chain breaks. (default=10 Angstroms)
- --fix/--fix_atom: preserve coordinates in the input CG model. For example, CA coordinates in a CA-trace model will be kept in its cg2all output model.
- --standard-name: output atom names follow the IUPAC nomenclature. (default=False; output atom names will use CHARMM atom names)
- --ckpt: Input PyTorch ckpt file (optional). If a ckpt file is given, it will override "--cg" option.
- --time: Output JSON file for recording timing. (optional)
- --device: Specify a device to run the model. (optional) You can choose "cpu" or "cuda"; otherwise the script will detect one automatically. "cpu" is usually faster than "cuda" unless the input/output system is very large or a DCD file with many frames is provided, because loading a model ckpt file on a GPU takes considerable time.
- --batch: The number of frames to process at a time. (optional, default=1)
- --proc: Specify the number of threads for loading input data. It is only used when processing a DCD file. (optional, default=OMP_NUM_THREADS or 1)
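To illustrate what the `--chain-break-cutoff` option describes, the sketch below flags a chain break wherever two consecutive CA atoms are farther apart than the cutoff. This is only an illustration of the idea under stated assumptions, not the cg2all implementation: the function name `find_chain_breaks` is hypothetical, and treating a distance exactly equal to the cutoff as "not broken" is an assumption.

```python
import math


def find_chain_breaks(ca_coords, cutoff=10.0):
    """Return indices i where the CA(i)-CA(i+1) distance exceeds cutoff (in Angstroms)."""
    breaks = []
    for i in range(len(ca_coords) - 1):
        if math.dist(ca_coords[i], ca_coords[i + 1]) > cutoff:
            breaks.append(i)
    return breaks


# Three residues spaced 3.8 Angstroms apart, followed by a 15 Angstrom gap.
coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (22.6, 0.0, 0.0)]
print(find_chain_breaks(coords))  # prints [2]
```

Consecutive CA atoms in a continuous chain are typically about 3.8 Angstroms apart, so the default of 10 Angstroms tolerates a few missing residues before declaring a break.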
examples
Conversion of a PDB file
```bash
convert_cg2all -p tests/1ab1_A.calpha.pdb -o tests/1ab1_A.calpha.all.pdb --cg CalphaBasedModel
```
Conversion of a DCD trajectory file
```bash
convert_cg2all -p tests/1jni.calpha.pdb -d tests/1jni.calpha.dcd -o tests/1jni.calpha.all.dcd --cg CalphaBasedModel
```
Conversion of a PDB file using a ckpt file
```bash
convert_cg2all -p tests/1ab1_A.calpha.pdb -o tests/1ab1_A.calpha.all.pdb --ckpt CalphaBasedModel-104.ckpt
```
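When many structures need converting, the invocations above can be assembled from Python and passed to `subprocess`. The sketch below uses only command-line flags documented in this README; the wrapper functions `build_cg2all_command` and `run_cg2all` are hypothetical helpers, not part of the cg2all package. Because `--ckpt` overrides `--cg`, the sketch omits `--cg` when a checkpoint is given.

```python
import subprocess


def build_cg2all_command(pdb, out, dcd=None, cg="CalphaBasedModel",
                         ckpt=None, device=None, batch=None, fix_atom=False):
    """Build the argument list for the convert_cg2all executable."""
    cmd = ["convert_cg2all", "-p", str(pdb), "-o", str(out)]
    if dcd is not None:
        cmd += ["-d", str(dcd)]          # trajectory input; the PDB defines topology
    if ckpt is not None:
        cmd += ["--ckpt", str(ckpt)]     # a checkpoint overrides --cg
    else:
        cmd += ["--cg", cg]
    if device is not None:
        cmd += ["--device", device]      # "cpu" or "cuda"
    if batch is not None:
        cmd += ["--batch", str(batch)]
    if fix_atom:
        cmd.append("--fix")              # keep input CG coordinates in the output
    return cmd


def run_cg2all(**kwargs):
    """Run convert_cg2all; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_cg2all_command(**kwargs), check=True)


# Show the command without executing it.
print(" ".join(build_cg2all_command("tests/1ab1_A.calpha.pdb",
                                    "tests/1ab1_A.calpha.all.pdb")))
```

Keeping command construction separate from execution makes it possible to inspect or log the exact command before running it.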
convert_all2cg
convert an all-atom protein structure to coarse-grained model
```bash
usage: convert_all2cg [-h] -p IN_PDB_FN [-d IN_DCD_FN] -o OUT_FN [--cg {supported_cg_models}]

options:
  -h, --help            show this help message and exit
  -p IN_PDB_FN, --pdb IN_PDB_FN
  -d IN_DCD_FN, --dcd IN_DCD_FN
  -o OUT_FN, --out OUT_FN, --output OUT_FN
  --cg
```
arguments
- -p/--pdb: Input PDB file (mandatory).
- -d/--dcd: Input DCD file (optional). If a DCD file is given, the input PDB file will be used to define its topology.
- -o/--out/--output: Output PDB or DCD file (mandatory). If an input DCD file is given, the output will be a DCD file. Otherwise, a PDB file will be created.
- --cg: Coarse-grained representation to use (optional, default=CalphaBasedModel).
  - CalphaBasedModel: CA-trace (atom names should be "CA")
  - ResidueBasedModel: Residue center-of-mass (atom names should be "CA")
  - SidechainModel: Sidechain center-of-mass (atom names should be "SC")
  - CalphaCMModel: CA-trace + Residue center-of-mass (atom names should be "CA" and "CM")
  - CalphaSCModel: CA-trace + Sidechain center-of-mass (atom names should be "CA" and "SC")
  - BackboneModel: Model only with backbone atoms (N, CA, C)
  - MainchainModel: Model only with mainchain atoms (N, CA, C, O)
  - Martini: Martini model
  - Martini3: Martini3 model
  - PRIMO: PRIMO model
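To make the center-of-mass representations concrete, the sketch below computes a mass-weighted center for the atoms of one residue and labels the resulting bead "CA", as the ResidueBasedModel entry describes. It is an illustration only: the function names are hypothetical, the mass table is approximate, and whether convert_all2cg includes hydrogens or uses exactly this weighting is an assumption not confirmed by this README.

```python
# Approximate atomic masses (Da) for elements commonly found in proteins.
MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06}


def center_of_mass(atoms):
    """Mass-weighted center of (element, (x, y, z)) tuples."""
    total = sx = sy = sz = 0.0
    for element, (x, y, z) in atoms:
        mass = MASSES[element]
        total += mass
        sx += mass * x
        sy += mass * y
        sz += mass * z
    if total == 0.0:
        raise ValueError("no atoms given")
    return (sx / total, sy / total, sz / total)


def residue_bead(atoms):
    """Return one bead named "CA" placed at the residue center of mass."""
    return ("CA", center_of_mass(atoms))


# Toy residue: two carbons on the x-axis; the bead lies midway between them.
print(residue_bead([("C", (0.0, 0.0, 0.0)), ("C", (2.0, 0.0, 0.0))]))
```

The SidechainModel bead would follow the same pattern with only side-chain atoms passed in and the bead named "SC".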
an example
```bash
convert_all2cg -p tests/1ab1_A.pdb -o tests/1ab1_A.calpha.pdb --cg CalphaBasedModel
```
script/cryo_em_minimizer.py
Local optimization of a protein model structure against a given electron density map. This script is a proof of concept that uses the cg2all network to optimize at CA-level resolution with objective functions at both atomistic and CA-level resolutions. Using a CUDA environment is highly recommended.
```bash
usage: cryo_em_minimizer [-h] -p IN_PDB_FN -m IN_MAP_FN -o OUT_DIR [-a] [-n N_STEP]
                         [--freq OUTPUT_FREQ] [--chain-break-cutoff CHAIN_BREAK_CUTOFF]
                         [--restraint RESTRAINT]
                         [--cg {CalphaBasedModel,CA,ca,ResidueBasedModel,RES,res}]
                         [--standard-name] [--uniform_restraint] [--nonuniform_restraint]
                         [--segment SEGMENTS]

options:
  -h, --help            show this help message and exit
  -p IN_PDB_FN, --pdb IN_PDB_FN
  -m IN_MAP_FN, --map IN_MAP_FN
  -o OUT_DIR, --out OUT_DIR, --output OUT_DIR
  -a, --all, --is_all
  -n N_STEP, --step N_STEP
  --freq OUTPUT_FREQ, --output_freq OUTPUT_FREQ
  --chain-break-cutoff CHAIN_BREAK_CUTOFF
  --restraint RESTRAINT
  --cg {CalphaBasedModel,CA,ca,ResidueBasedModel,RES,res}
  --standard-name
  --uniform_restraint
  --nonuniform_restraint
  --segment SEGMENTS
```
arguments
- -p/--pdb: Input PDB file (mandatory).
- -m/--map: Input electron density map file in the MRC or CCP4 format (mandatory).
- -o/--out/--output: Output directory to save optimized structures (mandatory).
- -a/--all/--is_all: Whether the input PDB file is an atomistic structure. (optional, default=False)
- -n/--step: The number of minimization steps. (optional, default=1000)
- --freq/--output_freq: The interval between saving intermediate outputs. (optional, default=100)
- --chain-break-cutoff: The CA-CA distance cutoff that determines chain breaks. (default=10 Angstroms)
- --restraint: The weight of distance restraints. (optional, default=100.0)
- --cg: Coarse-grained representation to use (default=ResidueBasedModel)
- --standard-name: output atom names follow the IUPAC nomenclature. (default=False; output atom names will use CHARMM atom names)
- --uniform_restraint/--nonuniform_restraint: Whether to use uniform restraints. (default=True) If set to False, the restraint weights depend on the pLDDT values recorded in the B-factor column of the PDB file.
- --segment: The segmentation method for applying rigid-body operations. (default=None)
  - None: Input structure is not segmented, so the same rigid-body operations are applied to the whole structure.
  - chain: Input structure is segmented based on chain IDs. Rigid-body operations are independently applied to each chain.
  - segment: Similar to "chain" option, but the structure is segmented based on peptide bond connectivities.
  - 0-99,100-199: Explicit segmentation based on the 0-index based residue numbers.
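The explicit form of `--segment` is a comma-separated list of residue index ranges. The sketch below shows one way such a string could be parsed into index pairs. The function name `parse_segments` is hypothetical, and treating both ends of each range as inclusive is an assumption; the README states only that the indices are 0-based.

```python
def parse_segments(spec):
    """Parse a string such as "0-99,100-199" into [(0, 99), (100, 199)]."""
    segments = []
    for part in spec.split(","):
        start_text, end_text = part.strip().split("-")
        start, end = int(start_text), int(end_text)
        if start > end:
            raise ValueError(f"segment start exceeds end: {part!r}")
        segments.append((start, end))
    return segments


print(parse_segments("0-99,100-199"))  # prints [(0, 99), (100, 199)]
```

Each resulting pair would then identify a group of residues that receives its own rigid-body operations.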
an example
```bash
./cg2all/script/cryo_em_minimizer.py -p tests/3isr.af2.pdb -m tests/3isr_5.mrc -o 3isr_5+3isr.af2 --all
```
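The non-uniform restraint option relies on pLDDT values stored in the B-factor column of the input PDB file. To check what a given file contains before running the minimizer, the sketch below reads that column for CA atoms using the fixed-width layout of PDB ATOM records (atom name in columns 13-16, chain ID in column 22, residue number in columns 23-26, temperature factor in columns 61-66). The function name `read_ca_plddt` and the sample coordinates are hypothetical; how the script converts pLDDT into restraint weights is not described here and is not modeled.

```python
def read_ca_plddt(pdb_text):
    """Return {(chain_id, residue_number): b_factor} for CA atoms in PDB text."""
    plddt = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        if line[12:16].strip() != "CA":
            continue
        chain_id = line[21]
        residue_number = int(line[22:26])
        plddt[(chain_id, residue_number)] = float(line[60:66])
    return plddt


# Two made-up CA records in standard fixed-width PDB layout.
sample = "\n".join([
    "ATOM      1  CA  MET A   1      27.340  24.430   2.614  1.00 91.50",
    "ATOM      2  CA  ALA A   2      30.000  24.430   2.614  1.00 45.25",
])
for key, value in read_ca_plddt(sample).items():
    print(key, value)
```

If the values are ordinary crystallographic B-factors rather than pLDDT scores, uniform restraints are likely the safer choice.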
Datasets
The training, validation, and test sets are available at Zenodo.
Reference
Lim Heo & Michael Feig, "One particle per residue is sufficient to describe all-atom protein structures", bioRxiv (2023). Link
Owner
- Name: Lim Heo
- Login: huhlim
- Kind: user
- Location: Cambridge, MA
- Company: Bristol Myers Squibb
- Website: https://scholar.google.com/citations?user=73JdVH0AAAAJ
- Twitter: huhlim
- Repositories: 19
- Profile: https://github.com/huhlim
Citation (CITATION.cff)
cff-version: 1.1.0
message: "If you use this software, please cite it as below."
authors:
  - family-names: Heo
    given-names: Lim
  - family-names: Feig
    given-names: Michael
    orcid: https://orcid.org/0000-0002-3153-2363
title: One bead per residue can describe all-atom protein structures
version: v1.3.1
date-released: 2023-10-16
GitHub Events
Total
- Issues event: 7
- Watch event: 8
- Delete event: 2
- Issue comment event: 2
- Push event: 7
- Pull request event: 8
- Fork event: 3
- Create event: 6
Last Year
- Issues event: 7
- Watch event: 8
- Delete event: 2
- Issue comment event: 2
- Push event: 7
- Pull request event: 8
- Fork event: 3
- Create event: 6
Issues and Pull Requests
Last synced: almost 2 years ago
All Time
- Total issues: 17
- Total pull requests: 6
- Average time to close issues: 12 days
- Average time to close pull requests: less than a minute
- Total issue authors: 8
- Total pull request authors: 1
- Average comments per issue: 2.06
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 17
- Pull requests: 6
- Average time to close issues: 12 days
- Average time to close pull requests: less than a minute
- Issue authors: 8
- Pull request authors: 1
- Average comments per issue: 2.06
- Average comments per pull request: 0.0
- Merged pull requests: 6
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- huhlim (3)
- shadow1229 (2)
- ntxxt (1)
- marbesu-instadeep (1)
- dhq1216dhq (1)
- lucajovine (1)
- JMB-Scripts (1)
- lunatictaco (1)
- aashuph16221 (1)
- aminsagar (1)
- TomMakkink (1)
- kilbyman15 (1)
Pull Request Authors
- huhlim (9)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- aiohttp 3.8.4 develop
- aiosignal 1.3.1 develop
- async-timeout 4.0.2 develop
- attrs 22.2.0 develop
- frozenlist 1.3.3 develop
- fsspec 2023.3.0 develop
- lightning-utilities 0.8.0 develop
- multidict 6.0.4 develop
- pytorch-lightning 1.9.4 develop
- torchmetrics 0.11.4 develop
- yarl 1.8.2 develop
- absl-py 1.4.0
- astunparse 1.6.3
- certifi 2022.12.7
- charset-normalizer 3.1.0
- colorama 0.4.6
- contextlib2 21.6.0
- dgl 1.0.1
- e3nn 0.5.1
- idna 3.4
- mdtraj 1.9.8.dev0
- ml-collections 0.1.1
- mpmath 1.3.0
- networkx 3.0
- numpy 1.24.2
- nvidia-cublas-cu11 11.10.3.66
- nvidia-cuda-nvrtc-cu11 11.7.99
- nvidia-cuda-runtime-cu11 11.7.99
- nvidia-cudnn-cu11 8.5.0.96
- opt-einsum 3.3.0
- opt-einsum-fx 0.1.4
- packaging 23.0
- psutil 5.9.4
- pyparsing 3.0.9
- pyyaml 6.0
- requests 2.28.2
- scipy 1.9.3
- se3-transformer 0.1.0
- setuptools 67.6.1
- six 1.16.0
- sympy 1.11.1
- torch 1.13.1
- tqdm 4.65.0
- typing-extensions 4.5.0
- urllib3 1.26.15
- wheel 0.40.0