pygen-structures

pygen-structures: A Python package to generate 3D molecular structures for simulations using the CHARMM forcefield - Published in JOSS (2020)

https://github.com/thesketh/pygen-structures

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 1 DOI reference(s) in JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    1 of 4 committers (25.0%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Physics Physical Sciences - 40% confidence
Last synced: 4 months ago

Repository

3D molecular structure generation for MD simulation

Statistics
  • Stars: 9
  • Watchers: 2
  • Forks: 3
  • Open Issues: 6
  • Releases: 4
Created almost 6 years ago · Last pushed over 5 years ago
Metadata Files
Readme Contributing License

README.md

pygen-structures


pygen-structures (pigeon structures) is a Python utility for generating three-dimensional molecular structures which can be used in molecular dynamics or Monte Carlo simulations. Molecules are generated from a list of residues and patches in the format of the CHARMM forcefield, and can be written out as valid PSF and PDB files. The package can be used as a command line utility, or as a Python library.

pygen-structures can be installed using pip (pip install pygen-structures), but relies upon RDKit, which is not pip-installable. RDKit can be installed in many ways, but the easiest way is to use the conda package manager. For full installation instructions, see the 'Installation' section of the readme. Python 3.6 and 3.7 are supported. To run the tests, pytest and OpenMM are required.

In essence, pygen-structures aims (eventually) to be a complete psfgen replacement with more autonomous functionality. At present, structures for small molecules can be generated. This should make it significantly easier to perform combinatorial searches on particular sequence lengths and linkages. This requires no manual intervention provided the molecules of interest are reasonably small (small enough that embedding coordinates is possible, and that the secondary structure is not vitally important) and the residue/patch definitions already exist in the forcefield.

Installation

There are other ways to install the required dependencies, but the easiest way by far is to use conda. Instructions, including the installation of test dependencies, are outlined below:

  1. Install the conda package manager. Make sure the conda executable is in your PATH.
  2. Set up a conda environment with the relevant dependencies (or install them in your base distribution). This can be done with the following command: conda create -n pygen-structures -c rdkit -c omnia 'python>=3.6' 'rdkit>=2018.3' numpy 'openmm>=7.4' pytest.
  3. Activate the conda environment: conda activate pygen-structures
  4. Use pip to install pygen-structures in this environment: pip install pygen-structures.
  5. Installation complete! You will have to activate this environment using conda activate pygen-structures each time you want to use it.
  6. Test the installation: pytest --pyargs pygen_structures

To install only the runtime dependencies, use the following command in step 2: conda create -n pygen-structures -c rdkit 'python>=3.6' 'rdkit>=2018.3' numpy
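
Once the environment is active, a quick import check can confirm that the dependencies are visible to the interpreter. The sketch below is a hypothetical helper, not part of pygen-structures, and it assumes the conda packages are importable as rdkit, numpy and pytest.

```python
from importlib.util import find_spec

# Import names assumed for the conda packages listed above.
RUNTIME_MODULES = ("rdkit", "numpy")
TEST_MODULES = ("pytest",)


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    for label, names in (("runtime", RUNTIME_MODULES), ("test", TEST_MODULES)):
        missing = missing_modules(names)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{label} dependencies: {status}")
```

An empty list from missing_modules means every named module was found in the active environment.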

Command line usage

Command line usage for peptides is simple, and takes the following form:

pygen-structures SEQUENCE -o OUTPUT_PREFIX

Sequences are specified using the one letter protein code by default, and terminal patches can be supplied by using hyphens as delimiters (e.g. NNEU-AFK-CT2, note that both termini must be supplied). D-amino acids need only be preceded by a lowercase 'd'.
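
To make the sequence format concrete, the following is a minimal sketch of how such a string could be split into terminal patches and residues. The function split_sequence is a hypothetical illustration written for this description; it is not the parser used by pygen-structures and it does not validate residue codes against the forcefield.

```python
def split_sequence(sequence):
    """Split 'FIRST-CODES-LAST' into (first_patch, residues, last_patch).

    Residues are returned as (one_letter_code, is_d) tuples.
    Hypothetical helper mirroring the format described above.
    """
    first_patch = last_patch = None
    parts = sequence.split("-")
    if len(parts) == 3:
        first_patch, codes, last_patch = parts
    elif len(parts) == 1:
        codes = parts[0]
    else:
        raise ValueError("supply both terminal patches or neither")

    residues = []
    is_d = False
    for char in codes:
        if char == "d":
            if is_d:
                raise ValueError("two 'd' markers in a row")
            # A lowercase 'd' marks the next residue as a D-amino acid.
            is_d = True
            continue
        if not char.isupper():
            raise ValueError(f"unexpected character {char!r}")
        residues.append((char, is_d))
        is_d = False
    if is_d:
        raise ValueError("trailing 'd' with no residue after it")
    return first_patch, residues, last_patch
```

For example, split_sequence("AdP") yields no terminal patches and two residues, the second of which is flagged as a D-amino acid.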

OUTPUT_PREFIX.psf and OUTPUT_PREFIX.pdb are created. If -o is not specified, the PDB file is written to stdout and no PSF file is generated.

The histidine form used can be set using --histidine, and defaults to HSE.

Patches can be supplied using --patches, the name of the patch, and the 0-based indices the patch is to be applied to (or 'FIRST'/'LAST').

To generate more complex structures, such as sugars, the residue names should be supplied (hyphen delimited) and the -u/--use-charmm-names option selected. Some example usage is given below.

--name and --segid control the names given in the COMPND record and the segment id respectively.

Examples

To produce a simple peptide sequence, the one letter code can be used. To produce the peptide HIS-GLU-TYR, creating HEY.psf and HEY.pdb:

pygen-structures HEY -o HEY

Supposing we think histidine should be protonated, we can change the protonation state of histidine by specifying a different histidine form:

pygen-structures HEY -o HEY --histidine HSP

Or we could use the three letter codes:

pygen-structures -u HSP-GLU-TYR -o HEY
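
Because the one-letter form is so compact, it lends itself to scripted combinatorial generation. The sketch below only builds the argument lists, using the SEQUENCE and -o arguments described in this readme; peptide_commands is a hypothetical helper name, and nothing is executed.

```python
from itertools import product


def peptide_commands(residue_codes, length):
    """Build one pygen-structures argv list per sequence of the given length."""
    commands = []
    for combo in product(residue_codes, repeat=length):
        sequence = "".join(combo)
        # Use the sequence itself as the output prefix: SEQ.psf and SEQ.pdb.
        commands.append(["pygen-structures", sequence, "-o", sequence])
    return commands


if __name__ == "__main__":
    # Print the commands for every dipeptide built from alanine and glycine.
    for command in peptide_commands("AG", 2):
        print(" ".join(command))
```

Each list could then be passed to subprocess.run(command, check=True) in an environment where pygen-structures is installed.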

For the trisaccharide raffinose, we must use the residue codes. The default segid is PROT, so we set a more descriptive segid using the --segid flag, and set the name given in the COMPND record using --name. The following command produces RAFF.psf and RAFF.pdb:

pygen-structures -u AGLC-BFRU-AGAL --patches RAFF 0 1 2 --segid RAFF --name Raffinose -o RAFF

We can also make glycopeptides. To link alpha-glucose to an asparagine residue (in this case, from an ALA-ASN-ALA peptide), we can use the NGLA patch. Note that because the last protein residue is not the last residue in the chain, we have to apply the C-terminus patch manually.

pygen-structures -u ALA-ASN-ALA-AGLC --patches CTER -2 NGLA 1 -1 -o ANA-NAGLC
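
The indices in this command can be read as follows. This is a minimal sketch that assumes negative indices count back from the end of the chain, as the -2 and -1 above suggest, and that 'FIRST' and 'LAST' stand for the first and last residues; resolve_index is a hypothetical helper, not the package's implementation.

```python
def resolve_index(token, n_residues):
    """Turn a patch index token into a 0-based residue position."""
    if token == "FIRST":
        return 0
    if token == "LAST":
        return n_residues - 1
    index = int(token)
    if index < 0:
        # Assumed behaviour: negative indices count from the end of the chain.
        index += n_residues
    if not 0 <= index < n_residues:
        raise IndexError(f"index {token} is out of range for {n_residues} residues")
    return index


residues = ["ALA", "ASN", "ALA", "AGLC"]

# --patches CTER -2 NGLA 1 -1
cter_targets = [residues[resolve_index(t, len(residues))] for t in ["-2"]]
ngla_targets = [residues[resolve_index(t, len(residues))] for t in ["1", "-1"]]
```

Under these assumptions, CTER is applied to the final ALA of the peptide and NGLA links ASN to AGLC.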

By default, if parameters are missing then the files are not created and the missing parameters are written to stdout. Using the -v flag will disable verification.

$ pygen-structures AdP -o AdP
Missing parameters: bonds {('CPD1', 'CC')}
$ pygen-structures -v AdP -o AdP
$

A different CHARMM distribution can be loaded using the -t option, with the path to the folder. pygen-structures ships with the CHARMM distribution that was the latest at the time of writing (July 2019), with some modifications to correct the D-amino acid parameters (these modifications are highlighted in the toppar README). The function which parses the folder will pick the latest versions of the parameter and topology files (36 over 27, 36m over 36). If you plan on using an older version of the forcefield (this is not recommended), you will have to remove the newer versions and change the extensions to match the current conventions (.rtf, .prm).
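
The version preference can be illustrated with a small ranking function. This is only a sketch of the ordering described above (36 over 27, 36m over 36), not the package's folder parser; the file names are illustrative, and the assumption that the version appears as "all" followed by a number and an optional "m" is labelled in the code.

```python
import re

# Assumed naming pattern: '..._all36_...' or '..._all36m_...'.
VERSION_PATTERN = re.compile(r"all(\d+)(m?)")


def version_key(filename):
    """Sort key: a higher number wins, and an 'm' suffix beats no suffix."""
    match = VERSION_PATTERN.search(filename)
    if match is None:
        # Files without a recognisable version sort below everything else.
        return (-1, 0)
    number, suffix = match.groups()
    return (int(number), 1 if suffix else 0)


def latest(filenames):
    """Return the file name with the highest version key."""
    return max(filenames, key=version_key)


# Illustrative file names only.
parameter_files = [
    "par_all27_prot_lipid.prm",
    "par_all36_prot.prm",
    "par_all36m_prot.prm",
]
chosen = latest(parameter_files)
```

With these example names, the 36m parameter file is selected ahead of the 36 and 27 versions.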

Library usage

Information about classes and functions for use as a library can be found on the project's Read the Docs page.

Contributing

Contributions are welcome! Please read our Code of Conduct.

Owner

  • Name: Travis Hesketh
  • Login: thesketh
  • Kind: user
  • Location: London

Senior software developer working in healthcare. Former chem PGR.

JOSS Publication

pygen-structures: A Python package to generate 3D molecular structures for simulations using the CHARMM forcefield
Published
April 13, 2020
Volume 5, Issue 48, Page 2157
Authors
Travis Hesketh
University of Strathclyde, Department of Pure and Applied Chemistry
Editor
Richard Gowers
Tags
python computational chemistry chemistry molecular dynamics

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 69
  • Total Committers: 4
  • Avg Commits per committer: 17.25
  • Development Distribution Score (DDS): 0.493
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Travis Hesketh t****h@s****k 35
Travis Hesketh t****s@h****t 32
avanteijlingen 5****n 1
Arfon Smith a****n 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 9
  • Total pull requests: 2
  • Average time to close issues: 4 days
  • Average time to close pull requests: 26 minutes
  • Total issue authors: 2
  • Total pull request authors: 2
  • Average comments per issue: 0.78
  • Average comments per pull request: 1.0
  • Merged pull requests: 2
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 0
  • Pull request authors: 0
  • Average comments per issue: 0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • thesketh (8)
  • amandadumi (1)
Pull Request Authors
  • arfon (1)
  • avanteijlingen (1)
Top Labels
Issue Labels
  • bug (3)
  • documentation (2)
  • enhancement (1)
  • testing (1)
  • typing (1)

Packages

  • Total packages: 1
  • Total downloads:
    • pypi 15 last-month
  • Total dependent packages: 0
  • Total dependent repositories: 1
  • Total versions: 6
  • Total maintainers: 1
pypi.org: pygen-structures

3D molecular structure generation for MD simulation

  • Versions: 6
  • Dependent Packages: 0
  • Dependent Repositories: 1
  • Downloads: 15 Last month
Rankings
Dependent packages count: 10.1%
Forks count: 16.9%
Stargazers count: 17.7%
Average: 21.5%
Dependent repos count: 21.6%
Downloads: 41.4%
Maintainers (1)
Last synced: 4 months ago

Dependencies

environment.yml conda
  • numpy
  • rdkit >=2018.03
setup.py pypi
  • numpy *