E2EDNA 2.0

E2EDNA 2.0: Python Pipeline for Simulating DNA Aptamers with Ligands - Published in JOSS (2022)

https://github.com/siminegroup/e2edna2

Science Score: 93.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 11 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org, zenodo.org
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software

Scientific Fields

Chemistry Physical Sciences - 60% confidence
Earth and Environmental Sciences Physical Sciences - 40% confidence
Biology Life Sciences - 40% confidence
Last synced: 4 months ago · JSON representation

Repository

Basic Info
  • Host: GitHub
  • Owner: siminegroup
  • License: gpl-3.0
  • Language: Python
  • Default Branch: main
  • Size: 23.9 MB
Statistics
  • Stars: 17
  • Watchers: 3
  • Forks: 7
  • Open Issues: 4
  • Releases: 1
Created almost 4 years ago · Last pushed over 1 year ago
Metadata Files
Readme Contributing License

README.MD


E2EDNA 2.0 - OpenMM Implementation of E2EDNA!

New feature: DeltaGzip [JCIM paper][code]

An automated pipeline for simulating DNA aptamers complexed with target ligands (peptide, DNA, RNA or small molecules).

  • Please note that the main branch is under ongoing development, and its tests may or may not work. For a fully working version, use the released code v2.0.0.

  • To view the Tinker-based version of E2EDNA, refer to its GitHub repo and DOI. <!-- J. Chem. Inf. Model. 2021, 61, 9, 4139–4144 -->

  • Interested in contributing to developing E2EDNA? Check out how to contribute here.

  • Please download the most recent release v2.0.0 here or here

Reference

If you use this code in any future publications, please cite our work using Kilgour et al., (2022). E2EDNA 2.0: Python Pipeline for Simulating DNA Aptamers with Ligands. Journal of Open Source Software, 7(73), 4182

The E2EDNA pipeline makes use of several other open-source software packages, so please cite them as well:

  • NUPACK
  • MMB
  • OpenMM
  • LightDock <!-- - or LightDock_GitHub -->

1. Installation

  1. Download the E2EDNA 2.0 package from this repository.
  2. Locate macos_installation.sh in the downloaded E2EDNA2 codebase directory. From that directory, run $ source macos_installation.sh on the command line to create a conda virtual environment named e2edna and install the required dependencies. The e2edna environment should be active when the installation script finishes, which means the string '(e2edna)' appears at the beginning of the command-line prompt.
    • If the script fails to activate the environment automatically, the likely cause is that the $ conda activate e2edna command in the script gives an error such as Your shell has not been properly configured to use 'conda activate'.
    • If so, manually run $ source activate <path_to_e2edna_conda_environment> to activate the environment. To help find the path, run $ conda info -e to list all conda environments and their paths on your computer.
  3. As the message at the end of the installation process indicates, if you wish to run the E2EDNA pipeline with a DNA aptamer sequence rather than its 3D structure, please register and download MMB from https://simtk.org/projects/rnatoolbox. Then copy or move the downloaded MMB folder to the codebase directory and remember to fill in the MMB-related paths section of the configuration file simu_config.yaml. <!-- N.B.: Our experience recommends macOS users not specify DYLD_LIBRARY_PATH against the MMB installation guide, to avoid interference with the OpenMM module. No longer need to worry, it is taken care of in the interfaces.py-->

2. Usage

The usage and help statements can be accessed with the -h/--help flags:
```
(e2edna)$ ./main.py --help
usage: main.py [-h] -yaml [-ow] [-d] [-os] [-p] [--CUDA_precision] [-w DIR]
               [-mbdir] [-mb] [--quickcheckmode] [-r] [-m] [-a] [-l] [-lt] [-ls]
               [--exampletargetpdb] [--examplepeptideseq] [--skipMMB] [-init]
               [--secondarystructureengine] [--N2Dstructures] [--Mg_conc]
               [--fold_fidelity] [--fold_speed] [--mmbnormaltemplate]
               [--mmbquicktemplate] [--mmbslowtemplate] [--mmb_params] [-pk]
               [--pickupfromfreeAptamerChk] [--pickupfromcomplexChk] [--chk_file]
               [--pickuppdb] [--pressure] [--temperature] [--ionicStrength] [--pH]
               [--auto_sampling] [--autoMDconvergencecutoff]
               [--maxaptamersampling_iter] [--max_walltime] [--skip_smoothing]
               [--equilibration_time] [--smoothing_time] [--aptamersamplingtime]
               [--complexsamplingtime] [--time_step] [--print_step] [--force_field]
               [--hydrogen_mass] [--water_model] [--box_offset] [--constraints]
               [--constraint_tolerance] [--rigid_water] [--nonbonded_method]
               [--nonbonded_cutoff] [--ewalderrortolerance] [--friction]
               [--implicit_solvent] [--implicit_solvent_model] [--soluteDielectric]
               [--solventDielectric] [--implicitsolvent_Kappa] [--leaptemplate]
               [--DNA_force_field] [--dockingsteps] [--Ndocked_structures]

E2EDNA: Simulate DNA aptamers complexed with target ligands

optional arguments:
  -h, --help                  show this help message and exit
  -yaml, --yaml_config        A YAML configuration file that can specify all the arguments (default: simu_config.yaml)
  -ow, --overwrite            Overwrite existing --run_num (default: False)

Compute Platform Configuration:
  -d, --device                Device configuration (default: local)
  -os, --operating_system     Operating system (default: macos)
  -p, --platform              Processing platform (default: CPU)
  --CUDA_precision            Precision of CUDA, if used (default: single)

Directory Settings:
  -w DIR, --workdir DIR       Working directory to store individual output runs (default: ./localruns)
  -mbdir, --mmb_dir           MMB library directory (default: None)
  -mb, --mmb                  Path to MMB executable (default: None)

Run Parameters:
  --quickcheckmode            Rapidly run a certain mode for quick check using default test parameters (default: Yes)
  -r, --run_num               Run number. Output will be written to {--workdir}/run{--run_num} (default: 1)
  -m, --mode                  Run mode (default: None)
  -a, --aptamerseq            DNA Aptamer sequence (5'->3') (default: None)
  -l, --ligand                Name of PDB file for ligand structure; None if not to have ligand (default: None)
  -lt, --ligand_type          Type of ligand molecule (default: None)
  -ls, --ligand_seq           Ligand sequence if peptide, DNA, or RNA (default: None)
  --exampletargetpdb          An example peptide ligand included in E2EDNA package: used when wish to test docking (default: examples/examplepeptideligand.pdb)
  --examplepeptideseq         The sequence of the example peptide ligand (default: YQTQTNSPRRAR)
  --skipMMB                   If Yes: skip both 2D structure analysis and MMB folding, and start with a known --initstructure (default: No)
  -init, --initstructure      Name of PDB file if starting pipeline on a DNA aptamer with known structure (default: None)
  --secondarystructureengine  Pipeline module that is used to predict secondary structures (default: NUPACK)
  --N2Dstructures             Number of predicted secondary structures (default: 1)
  --Mg_conc                   Magnesium molar concentration used in NUPACK: 0, 0.2
  --fold_fidelity             Refold in MMB if score < fold_fidelity unless the fold_speed is quick (default: 0.9)
  --fold_speed                MMB folding speed (default: normal)
  --mmbnormaltemplate         Path to MMB folding protocol of normal speed (default: lib/mmb/commands.template.dat)
  --mmbquicktemplate          Path to MMB folding protocol of quick speed (default: lib/mmb/commands.templatequick.dat)
  --mmbslowtemplate           Path to MMB folding protocol of slow speed (default: lib/mmb/commands.templatelong.dat)
  --mmb_params                Path to parameter file bundled with MMB package (default: lib/mmb/parameters.csv)
  -pk, --pickup               Whether the run is to resume MD sampling of an unfinished run or an old run (default: No)
  --pickupfromfreeAptamerChk  Resume MD sampling of free aptamer: skip everything before it (default: No)
  --pickupfromcomplexChk      Resume MD sampling of aptamer-ligand: skip everything before it (default: No)
  --chk_file                  Name of checkpoint file for resuming MD sampling, format: /.chk (default: None)
  --pickuppdb                 PDB file (topology+coordinates) for resuming MD sampling in explicit solvent, format: /.pdb (default: None)
  --pressure                  Pressure in the unit of atm (default: 1.0)
  --temperature               Temperature in Kelvin (default: 298.0)
  --ionicStrength             Sodium molar concentration (could be used by NUPACK and OpenMM) (default: 0.1)
  --pH                        Could be used by OpenMM (default: 7.4)
  --auto_sampling             If Yes: run MD sampling till convergence, currently only feasible in free aptamer sampling (default: No)
  --autoMDconvergencecutoff   Convergence cutoff if doing autosampling (default: 0.01)
  --maxaptamersampling_iter   Max number of iterations for free aptamer MD sampling if doing autosampling (default: 20)
  --max_walltime              Walltime in hours to check runtime (default: 24.0)
  --skip_smoothing            If Yes: no short MD relaxation before MD sampling (default: Yes)
  --equilibration_time        Equilibration time in nanoseconds after energy minimization and before MD sampling (default: 0.1)
  --smoothing_time            Time in nanoseconds for short MD relaxation before MD sampling, if any (default: None)
  --aptamersamplingtime       MD sampling time in nanoseconds for free aptamer dynamics (default: None)
  --complexsamplingtime       MD sampling time in nanoseconds for aptamer-ligand complex dynamics (default: None)
  --time_step                 time step in femtoseconds in MD sampling (default: 2.0)
  --print_step                Printout step in picoseconds in MD sampling (default: 10.0)
  --force_field               Force field used in OpenMM (default: amber14-all)
  --hydrogen_mass             Unit is amu (default: 1.5)
  --water_model               Explicit water solvent model used in OpenMM (default: amber14/tip3p)
  --box_offset                Buffering offset in nanometers on solvent box, if using explicit solvent (default: 1.0)
  --constraints               Specify which bond angles and/or lengths should be implemented with constraints (default: None)
  --constraint_tolerance      Distance tolerance for constraint in OpenMM integrator (default: 1e-06)
  --rigid_water               Whether to make water molecules completely rigid at bond lengths and angles (default: Yes)
  --nonbonded_method          Type of nonbonded interactions (default: NoCutoff)
  --nonbonded_cutoff          The cutoff distance in nanometers to use for nonbonded interactions (default: 1.0)
  --ewalderrortolerance       Error tolerance if nonbonded_method is Ewald, PME, or LJPME (default: 0.0005)
  --friction                  Friction coefficient in unit of 1/ps, used in Langevin integrator (default: 1.0)
  --implicit_solvent          Whether to use an Amber GB implicit solvent model (default: No)
  --implicit_solvent_model    Specify an Amber GB implicit solvent model if needed (default: None)
  --soluteDielectric          The solute dielectric constant to use in the implicit solvent model (default: 1.0)
  --solventDielectric         The solvent dielectric constant to use in the implicit solvent model (default: 78.5)
  --implicitsolvent_Kappa     Debye screening parameter; If specified by user, OpenMM will ignore {--ionicStrength} in implicit solvent. (default: None)
  --leaptemplate              A script for running LEap program in Ambertools21, provided in E2EDNA package. (default: leaptemplate.in)
  --DNA_force_field           Force field for DNA used by LEap program (default: DNA.OL15)
  --dockingsteps              Number of steps for docking simulations (default: 10)
  --Ndocked_structures        Number of docked structures output from the docker (default: 1)
```

3. Running a job

Using the scripts for automated tests

  • examples/automated_tests/ folder provides .sh and .yaml for automated testing. Check out how_to_run_automated_tests.txt in the folder for instructions. Each set of automated tests takes about 15 minutes to complete on a macbook pro laptop. We chose a simple DNA aptamer system for the "test runs" purpose. <!-- and whereever docking is performed the docking configuration may or may not be found due to the limitations of the chosen system. Please note that a failure to find a docked configuration is not a failure of E2EDNA. -->

Running on your own data with customized input arguments

All the input arguments listed in Usage can be customized on the command line, in a .yaml configuration file, or both. Values in the .yaml configuration file take precedence over command-line inputs: if an argument is specified in both places, the command-line value is ignored.

A single run can be carried out by specifying all necessary input arguments in a configuration file:

    (e2edna)$ ./main.py --yaml_config=simu_config.yaml

Alternatively, some or all of the necessary input arguments can be passed to the pipeline on the command line, provided they are commented out in the .yaml file. See simu_config_automated_tests.yaml and automated_tests.sh in examples/automated_tests/ for an example.
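The precedence rule described above can be illustrated with a short sketch. This is not the pipeline's own code: `merge_settings` and the example dictionaries are hypothetical, and the sketch assumes both sources have already been parsed into plain dictionaries keyed by argument name.

```python
def merge_settings(cli_args, yaml_args):
    """Combine two settings sources so that YAML values win on conflict.

    Hypothetical helper for illustration only; not part of E2EDNA.
    """
    merged = dict(cli_args)    # start from the command-line values
    merged.update(yaml_args)   # entries from the .yaml file overwrite duplicates
    return merged


# An argument that is commented out in the .yaml file is simply absent from
# yaml_args, so its command-line value survives the merge.
cli_args = {"mode": "3d coarse", "fold_speed": "quick"}
yaml_args = {"mode": "2d structure"}

settings = merge_settings(cli_args, yaml_args)
print(settings)
```

Here `mode` comes from the .yaml side because it is set in both places, while `fold_speed` keeps its command-line value because the .yaml side does not define it.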

The parameters from Usage listed below call for particular attention, for example because they accept only a limited set of values.

  • Compute Platform Configuration

    • -d/--device: running device; either local or cluster
    • -os/--operating_system: operating system; macos, linux or WSL
    • -p/--platform: processing platform; either CPU or CUDA
    • --CUDA_precision: precision of CUDA if used; either single or double; default is single

  • Directory Settings

    • -w/--workdir: directory to write results for each run; Default is ./localruns
    • -md/--mmb_dir: path to MMB library directory. Both absolute and relative (to the codebase directory) paths are accepted. <!-- Wildcards can be used to describe location, for instance the default is "Installer-/lib" which will match both "Installer-3.0-Ubuntu18.04/lib" and "Installer.3_0.OSX/lib". -->
    • -mb/--mmb: path to MMB executable. Both absolute and relative (to the codebase directory) paths are accepted. <!-- Wild cards can be used to describe location. -->
  • Run Parameters

    • -m/--mode: mode of operation; must be one of the modes described in Functionality, i.e., '2d structure', '3d coarse', '3d smooth', 'coarse dock', 'smooth dock', 'free aptamer', 'full dock', 'full binding'
    • -a/--aptamerSeq: DNA aptamer sequence (5'->3'). A string made of case sensitive letters from {A, G, C, T} only. <!-- or the name of a readable text file containing the sequence. -->
    • -l/--ligand: PDB filename of target ligand; If no ligand, --ligand and the following two inputs (--ligand_type and --ligand_seq) should be left off.
    • -lt/--ligand_type: peptide, DNA, RNA, or other, assuming other ligand can be described by force field used in MD simulation (default is Amber14).
    • -ls/--ligand_seq: a string of target ligand's sequence if --ligand_type is a peptide or DNA or RNA. If --ligand_type=other, do not use --ligand_seq flag. <!-- or the name of a readable text file containing the sequence. -->
    • --fold_speed: three choices; quick, normal or slow
    • --force_field and --water_model: many choices; see the force field options in OpenMM 7.7.0.
    • --constraints: four choices; HBonds, AllBonds, HAngles or None.
    • --nonbonded_method: at most six choices; Ewald, PME, LJPME, CutoffPeriodic, CutoffNonPeriodic or NoCutoff; Only the last three if using an Amber GB implicit solvent model.
    • --implicit_solvent_model: five choices; HCT, OBC1, OBC2, GBn or GBn2
    • --DNA_force_field: two choices; DNA.OL15 or DNA.bsc1
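Because several of these arguments accept only a fixed set of values, it can help to check a configuration before launching a long run. The following is a minimal sketch of such a check, not something the E2EDNA package provides: `find_invalid_settings`, the dictionary layout, and the key names (taken from the long flag names, with `aptamer_seq` invented for the sequence) are all assumptions, and values are treated as plain strings.

```python
# Allowed values copied from the list above. Storing settings in a dict keyed
# by long flag name is an assumption made for this illustration.
ALLOWED_VALUES = {
    "device": {"local", "cluster"},
    "operating_system": {"macos", "linux", "WSL"},
    "platform": {"CPU", "CUDA"},
    "CUDA_precision": {"single", "double"},
    "mode": {"2d structure", "3d coarse", "3d smooth", "coarse dock",
             "smooth dock", "free aptamer", "full dock", "full binding"},
    "fold_speed": {"quick", "normal", "slow"},
    "constraints": {"HBonds", "AllBonds", "HAngles", "None"},
    "nonbonded_method": {"Ewald", "PME", "LJPME", "CutoffPeriodic",
                         "CutoffNonPeriodic", "NoCutoff"},
    "implicit_solvent_model": {"HCT", "OBC1", "OBC2", "GBn", "GBn2"},
    "DNA_force_field": {"DNA.OL15", "DNA.bsc1"},
}


def find_invalid_settings(settings):
    """Return a list of human-readable problems found in `settings`."""
    problems = []
    for name, value in settings.items():
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    # The aptamer sequence is case sensitive and limited to A, G, C and T.
    seq = settings.get("aptamer_seq")  # hypothetical key name
    if seq is not None and (not seq or set(seq) - set("AGCT")):
        problems.append("aptamer_seq must contain only the letters A, G, C, T")
    return problems


good = {"mode": "free aptamer", "aptamer_seq": "AACGCTTTTTGCGTT",
        "fold_speed": "quick", "platform": "CPU"}
bad = {"mode": "3d rough", "aptamer_seq": "aacg"}

print(find_invalid_settings(good))
for problem in find_invalid_settings(bad):
    print(problem)
```

The sketch only covers the value lists; it does not encode the additional rule that an Amber GB implicit solvent model restricts --nonbonded_method to the last three choices.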

Check results in an output directory

Output of a single run will be written to {--workdir}/run{--run_num} directory.

Only the main outputs remain in the run{--run_num}/ folder, including a log file named run_output_log.txt. All intermediate or temporary files are grouped into subfolders, such as run1/md_aptamer_sampling_runfiles_0, according to the module that generated them. In each run_output_log.txt, the statements reporting generation of the main output files are marked with >>>, so they can be printed selectively:
```
$ cat run_output_log.txt | grep '>>>'

Predicted 2D structure #0 : .(((....))).
MMB folded the aptamer and generated folded structure: foldedAptamer_0.pdb
Generated aptamer-ligand complex structure: complex0_0.pdb
$
```
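The same filtering can be done from Python, for instance when collecting results from many runs. The sketch below is an illustration under assumptions: `run_directory` and `main_output_lines` are hypothetical helpers, and the sample log text is made up to show the idea, so the real log layout may differ.

```python
from pathlib import Path


def run_directory(workdir, run_num):
    """Build the output path {--workdir}/run{--run_num} described above."""
    return Path(workdir) / f"run{run_num}"


def main_output_lines(log_text):
    """Keep only the log lines that carry the '>>>' marker."""
    return [line for line in log_text.splitlines() if ">>>" in line]


# Made-up log content for illustration; not copied from a real run.
sample_log = "\n".join([
    "Starting a run (illustrative line)",
    ">>> Predicted 2D structure #0 : .(((....))).",
    "Intermediate message (illustrative line)",
    ">>> MMB folded the aptamer and generated folded structure: foldedAptamer_0.pdb",
])

log_path = run_directory("./localruns", 1) / "run_output_log.txt"
print(log_path)
for line in main_output_lines(sample_log):
    print(line)
```

For a real run, `log_path.read_text()` would replace `sample_log`.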

4. Functionality of eight different operation modes

The pipeline implements eight distinct operation modes so that users can choose the level of computational cost and accuracy.

  • '2d structure' → returns NUPACK analysis of aptamer secondary structure. Very fast, O(<1s). If using NUPACK, includes probability of observing a certain fold and of suboptimal folds within kT of the minimum.
  • '3d coarse' → returns MMB fold of the best secondary structure. Fast, O(5-30 mins). Results in a strained 3D structure which obeys base pairing rules and certain stacking interactions.
  • '3d smooth' → identical to '3d coarse', with a short MD relaxation in solvent. Less than double the cost of '3d coarse', depending on relaxation time.
  • 'coarse dock' → uses the 3D structure from '3d coarse' as the initial condition for a LightDock simulation, and returns best docking configurations and scores. Depending on docking parameters, adds O(5-30mins) to '3d coarse'.
  • 'smooth dock' → identical to 'coarse dock', instead using the relaxed structure from '3d smooth'. Similar cost to 'coarse dock'.
  • 'free aptamer' → fold the aptamer in MMB and run extended MD sampling to identify a representative, equilibrated 2D and 3D structure. Slow, O(hours).
  • 'full dock' → returns best docking configurations and scores from a LightDock run using the fully equilibrated aptamer structure from 'free aptamer'. Similar cost to 'free aptamer' (LightDock is relatively cheap).
  • 'full binding' → Same steps as 'full dock', with follow-up extended MD simulation of the best binding configuration. Slowest, O(hours).
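One way to read the list above is as a table of which stages each mode runs. The sketch below encodes that reading as a dictionary. The stage labels and the `modes_using` helper are invented for illustration, the ordering is inferred from the descriptions rather than taken from the code, and the optional short relaxation before MD sampling (controlled by --skip_smoothing) is left out of the sampling modes.

```python
# Stage labels are illustrative names, not identifiers from the E2EDNA code.
SECONDARY = "secondary structure prediction"
FOLD = "MMB folding"
RELAX = "short MD relaxation"
SAMPLE_APTAMER = "free aptamer MD sampling"
DOCK = "LightDock docking"
SAMPLE_COMPLEX = "complex MD sampling"

# Stages per mode, as inferred from the descriptions above.
MODE_STAGES = {
    "2d structure": [SECONDARY],
    "3d coarse": [SECONDARY, FOLD],
    "3d smooth": [SECONDARY, FOLD, RELAX],
    "coarse dock": [SECONDARY, FOLD, DOCK],
    "smooth dock": [SECONDARY, FOLD, RELAX, DOCK],
    "free aptamer": [SECONDARY, FOLD, SAMPLE_APTAMER],
    "full dock": [SECONDARY, FOLD, SAMPLE_APTAMER, DOCK],
    "full binding": [SECONDARY, FOLD, SAMPLE_APTAMER, DOCK, SAMPLE_COMPLEX],
}


def modes_using(stage):
    """List the modes whose stage sequence includes `stage`."""
    return [mode for mode, stages in MODE_STAGES.items() if stage in stages]


print(modes_using(DOCK))
print(MODE_STAGES["full binding"])
```

Laid out this way, it is easy to see that only the four docking modes need a ligand PDB file, which matches the key inputs listed for the automated tests.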

5. Automated test runs

Running the automated test scripts mentioned in Running a job will automatically run light tests of all eight modes. Here we explain the inputs, what outputs to look for, and what a successful run should look like for each mode.

  1. --mode='2d structure'
  • Key inputs: DNA aptamer sequence

  • Outputs: predicted secondary structure in run_output_log.txt

  • Success evaluation: observe the dot-bracket notation for secondary structure, such as .(((....))).

  2. --mode='3d coarse'
  • Key inputs: DNA aptamer sequence

  • Outputs: predicted secondary structure in run_output_log.txt; MMB-folded aptamer structure: foldedAptamer_0.pdb

  • Success evaluation: visualize MMB-folded aptamer structure in software like VMD or PyMOL

  3. --mode='3d smooth'
  • Key inputs: DNA aptamer sequence

  • Outputs: predicted secondary structure in run_output_log.txt; MMB-folded aptamer structure: foldedAptamer_0.pdb; Short MD relaxation trajectory of free aptamer: foldedAptamer0processedtrajectory.dcd and cleanfoldedAptamer0processedtrajectory.dcd (without solvent and ions); Relaxed aptamer structure: relaxedAptamer_0.pdb

  • Success evaluation: simulation logfile, MDlog_freeAptamerSmoothing.txt, indicates 100% completion; visualize the relaxation trajectory and relaxed structure in software like VMD or PyMOL

  4. --mode='coarse dock'
  • Key inputs: DNA aptamer sequence; PDB filename of target ligand

  • Outputs: predicted secondary structure in run_output_log.txt; MMB-folded aptamer structure: foldedAptamer_0.pdb; MMB-folded aptamer docked by target ligand: complex0_0.pdb (if docking happened)

  • Success evaluation: visualize the docked structure, if docking happened, in software like VMD or PyMOL

  5. --mode='smooth dock'
  • Key inputs: DNA aptamer sequence; PDB filename of target ligand

  • Outputs: predicted secondary structure in run_output_log.txt; MMB-folded aptamer structure: foldedAptamer_0.pdb; Short MD relaxation trajectory of free aptamer: foldedAptamer0processedtrajectory.dcd and cleanfoldedAptamer0processedtrajectory.dcd (without solvent and ions); Relaxed aptamer structure: relaxedAptamer_0.pdb; Relaxed aptamer docked by target ligand: complex0_0.pdb (if docking happened)

  • Success evaluation: visualize the docked structure, if docking happened, in software like VMD or PyMOL

  6. --mode='free aptamer'
  • Key inputs: DNA aptamer sequence

  • Outputs: predicted secondary structure in run_output_log.txt; MMB-folded aptamer structure: foldedAptamer_0.pdb; Long MD sampling trajectory of free aptamer: foldedAptamer0processedcompletetrajectory.dcd and cleanfoldedAptamer0processedcompletetrajectory.dcd (without solvent and ions); Representative structure of free aptamer: repStructure_0.pdb

  • Success evaluation: simulation logfile, MDlog_freeAptamerSampling.txt, indicates 100% completion; visualize the sampling trajectory and representative structure of free aptamer in software like VMD or PyMOL

  7. --mode='full dock'
  • Key inputs: DNA aptamer sequence; PDB filename of target ligand

  • Outputs: predicted secondary structure in run_output_log.txt; MMB-folded aptamer structure: foldedAptamer_0.pdb; Long MD sampling trajectory of free aptamer: foldedAptamer0processedcompletetrajectory.dcd and cleanfoldedAptamer0processedcompletetrajectory.dcd (without solvent and ions); Representative structure of free aptamer: repStructure_0.pdb; Representative aptamer docked by target ligand: complex0_0.pdb (if docking happened)

  • Success evaluation: visualize the docked structure, if docking happened, in software like VMD or PyMOL

  8. --mode='full binding'
  • Key inputs: DNA aptamer sequence; PDB filename of target ligand

  • Outputs: predicted secondary structure in run_output_log.txt; MMB-folded aptamer structure: foldedAptamer_0.pdb; Long MD sampling trajectory of free aptamer: foldedAptamer0processedcompletetrajectory.dcd and cleanfoldedAptamer0processedcompletetrajectory.dcd (without solvent and ions); Representative structure of free aptamer: repStructure_0.pdb; Representative aptamer docked by target ligand: complex0_0.pdb (if docking happened); Long MD sampling trajectory of aptamer-ligand complex: complex00processedcompletetrajectory.dcd and cleancomplex00processedcompletetrajectory.dcd (without solvent and ions)

  • Success evaluation: simulation logfile, MDlog_complexSampling.txt, indicates 100% completion; visualize the sampling trajectory of aptamer-ligand complex in software like VMD or PyMOL
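For the '2d structure' check, the dot-bracket string can also be inspected programmatically rather than by eye. The sketch below is a generic stack-based parser, not code from E2EDNA: `base_pairs` is a hypothetical helper that returns zero-based index pairs and assumes a simple notation using only '.', '(' and ')'.

```python
def base_pairs(dot_bracket):
    """Return (i, j) index pairs for matched brackets, innermost first.

    Raises ValueError if the string is unbalanced or has other characters.
    """
    open_positions = []  # stack of indices of unmatched '('
    pairs = []
    for index, symbol in enumerate(dot_bracket):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((open_positions.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return pairs


print(base_pairs(".(((....)))."))  # [(3, 8), (2, 9), (1, 10)]
```

For the example structure shown above, the three pairs describe a stem of three base pairs closing a four-nucleotide loop, with one unpaired base at each end.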

Owner

  • Login: siminegroup
  • Kind: user
  • Location: Montreal, QC, Canada
  • Company: McGill University

Computational Chemistry group at McGill University

JOSS Publication

E2EDNA 2.0: Python Pipeline for Simulating DNA Aptamers with Ligands
Published
May 30, 2022
Volume 7, Issue 73, Page 4182
Authors
Michael Kilgour ORCID
Department of Chemistry, McGill University, Montreal, Quebec, Canada
Tao Liu ORCID
Department of Chemistry, McGill University, Montreal, Quebec, Canada
Ilya S. Dementyev ORCID
Department of Chemistry, McGill University, Montreal, Quebec, Canada
Lena Simine ORCID
Department of Chemistry, McGill University, Montreal, Quebec, Canada
Editor
Charlotte Soneson ORCID
Tags
simulation pipeline DNA aptamers

GitHub Events

Total
  • Issues event: 1
  • Issue comment event: 5
Last Year
  • Issues event: 1
  • Issue comment event: 5

Committers

Last synced: 5 months ago

All Time
  • Total Commits: 395
  • Total Committers: 6
  • Avg Commits per committer: 65.833
  • Development Distribution Score (DDS): 0.547
Past Year
  • Commits: 0
  • Committers: 0
  • Avg Commits per committer: 0.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
Tao Liu t****2@o****m 179
ISDementyev 6****v 136
InfluenceFunctional m****r@g****m 37
lenasimine 6****e 31
siminegroup 9****p 11
Owais Ahmad o****4@g****m 1

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 14
  • Total pull requests: 12
  • Average time to close issues: 18 days
  • Average time to close pull requests: 5 days
  • Total issue authors: 7
  • Total pull request authors: 7
  • Average comments per issue: 3.57
  • Average comments per pull request: 1.17
  • Merged pull requests: 8
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 5.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • schackartk (7)
  • Owaiskhan9654 (2)
  • noah-camp (1)
  • snixon2 (1)
  • taoliu032 (1)
  • JoaoRodrigues (1)
  • brianjimenez (1)
Pull Request Authors
  • taoliu032 (4)
  • siminegroup (3)
  • InfluenceFunctional (1)
  • schackartk (1)
  • csoneson (1)
  • danielskatz (1)
  • Owaiskhan9654 (1)
Top Labels
Issue Labels
enhancement (3) bug (3) documentation (1) question (1)
Pull Request Labels
enhancement (1)