E2EDNA 2.0
E2EDNA 2.0: Python Pipeline for Simulating DNA Aptamers with Ligands - Published in JOSS (2022)
Science Score: 93.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: found codemeta.json file
- ✓ .zenodo.json file: found .zenodo.json file
- ✓ DOI references: found 11 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: links to joss.theoj.org, zenodo.org
- ○ Committers with academic emails
- ○ Institutional organization owner
- ✓ JOSS paper metadata: published in Journal of Open Source Software
Repository
Basic Info
- Host: GitHub
- Owner: siminegroup
- License: gpl-3.0
- Language: Python
- Default Branch: main
- Size: 23.9 MB
Statistics
- Stars: 17
- Watchers: 3
- Forks: 7
- Open Issues: 4
- Releases: 1
Metadata Files
README.MD
E2EDNA 2.0 - OpenMM Implementation of E2EDNA !
New feature: DeltaGzip [JCIM paper][code]
An automated pipeline for simulating DNA aptamers complexed with target ligands (peptide, DNA, RNA or small molecules).
Please note that the main branch is under ongoing development, so its tests may or may not pass. For a fully working version, use the released code v2.0.0.
To view the Tinker-based version of E2EDNA, refer to its GitHub repo and DOI. <!-- J. Chem. Inf. Model. 2021, 61, 9, 4139–4144 -->
Interested in contributing to developing E2EDNA? Check out how to contribute here.
Reference
If you use this code in any future publications, please cite our work using Kilgour et al., (2022). E2EDNA 2.0: Python Pipeline for Simulating DNA Aptamers with Ligands. Journal of Open Source Software, 7(73), 4182
The E2EDNA pipeline makes use of several other open-source software packages, so please be mindful of citing them as well:
- NUPACK
- MMB
- OpenMM
- LightDock <!-- - or LightDock_GitHub -->
Table of contents
- Installation
- Usage
- Running a job
- Functionality of eight different operation modes
- Automated test runs
1. Installation
- Download the E2EDNA 2.0 package from this repository.
- Locate `macos_installation.sh` in the downloaded E2EDNA2 codebase directory. Then, in the codebase directory, run `$ source macos_installation.sh` on the command line to create a conda virtual environment named `e2edna` and install the required dependencies. The `e2edna` environment should be activated when the installation script finishes, which means the string '(e2edna)' should show up at the beginning of the command line prompt.
  - If the script fails to activate the environment automatically, this is likely because the `$ conda activate e2edna` command in the script gives an error such as `Your shell has not been properly configured to use 'conda activate'.`
  - If so, manually run `$ source activate <path_to_e2edna_conda_environment>` to activate the environment. To help find the path, run `$ conda info -e` to list all conda environments and their paths on your computer.
- As the message at the end of the installation process indicates, if you wish to run the E2EDNA pipeline with a DNA aptamer sequence rather than its 3D structure, please register and download MMB from https://simtk.org/projects/rnatoolbox. Then copy or move the downloaded MMB folder to the codebase directory and remember to fill in the `MMB-related paths` section of the configuration file `simu_config.yaml`. <!-- N.B.: Our experience recommends macOS users not specify DYLD_LIBRARY_PATH against the MMB installation guide, to avoid interference with the OpenMM module. No longer need to worry, it is taken care of in the interfaces.py -->
2. Usage
The usage and help statements can be accessed with the -h/--help flags:
```
(e2edna)$ ./main.py --help
usage: main.py [-h] -yaml [-ow] [-d] [-os] [-p] [--CUDAprecision] [-w DIR] [-mbdir] [-mb] [--quickcheckmode] [-r] [-m] [-a]
[-l] [-lt] [-ls] [--exampletargetpdb] [--examplepeptideseq] [--skipMMB] [-init] [--secondarystructureengine]
[--N2Dstructures] [--Mg_conc] [--foldfidelity] [--foldspeed] [--mmbnormaltemplate] [--mmbquicktemplate]
[--mmbslowtemplate] [--mmb_params] [-pk] [--pickupfromfreeAptamerChk] [--pickupfromcomplexChk] [--chk_file]
[--pickuppdb] [--pressure] [--temperature] [--ionicStrength] [--pH] [--auto_sampling] [--autoMDconvergencecutoff]
[--maxaptamersampling_iter] [--max_walltime] [--skip_smoothing] [--equilibration_time] [--smoothing_time]
[--aptamersamplingtime] [--complexsamplingtime] [--time_step] [--print_step] [--force_field] [--hydrogen_mass]
[--water_model] [--box_offset] [--constraints] [--constraint_tolerance] [--rigid_water] [--nonbonded_method]
[--nonbonded_cutoff] [--ewalderrortolerance] [--friction] [--implicit_solvent] [--implicitsolventmodel]
[--soluteDielectric] [--solventDielectric] [--implicitsolvent_Kappa] [--leaptemplate] [--DNAforce_field]
[--dockingsteps] [--Ndocked_structures]
E2EDNA: Simulate DNA aptamers complexed with target ligands
optional arguments:
  -h, --help              show this help message and exit
  -yaml, --yaml_config    A YAML configuration file that can specify all the arguments (default: simu_config.yaml)
  -ow, --overwrite        Overwrite existing --run_num (default: False)

Compute Platform Configuration:
  -d, --device            Device configuration (default: local)
  -os, --operating_system Operating system (default: macos)
  -p, --platform          Processing platform (default: CPU)
  --CUDA_precision        Precision of CUDA, if used (default: single)

Directory Settings:
  -w DIR, --workdir DIR   Working directory to store individual output runs (default: ./localruns)
  -mbdir, --mmb_dir       MMB library directory (default: None)
  -mb, --mmb              Path to MMB executable (default: None)

Run Parameters:
--quickcheckmode Rapidly run a certain mode for quick check using default test parameters (default: Yes)
-r, --runnum Run number. Output will be written to {--workdir}/run{--runnum} (default: 1)
-m, --mode Run mode (default: None)
-a, --aptamerseq
DNA Aptamer sequence (5'->3') (default: None)
-l, --ligand Name of PDB file for ligand structure; None if not to have ligand (default: None)
-lt, --ligandtype
Type of ligand molecule (default: None)
-ls, --ligandseq
Ligand sequence if peptide, DNA, or RNA (default: None)
--exampletargetpdb
An example peptide ligand included in E2EDNA package: used when wish to test docking (default:
examples/examplepeptideligand.pdb)
--examplepeptideseq
The sequence of the example peptide ligand (default: YQTQTNSPRRAR)
--skipMMB If Yes: skip both 2D structure analysis and MMB folding, and start with a known --initstructure (default: No)
-init, --initstructure
Name of PDB file if starting pipeline on a DNA aptamer with known structure (default: None)
--secondarystructureengine
Pipeline module that is used to predict secondary structures (default: NUPACK)
--N2Dstructures Number of predicted secondary structures (default: 1)
--Mgconc Magnesium molar concentration used in NUPACK: 0, 0.2
--foldfidelity Refold in MMB if score < fold_fidelity unless the fold_speed is quick (default: 0.9)
--foldspeed MMB folding speed (default: normal)
--mmbnormaltemplate
Path to MMB folding protocol of normal speed (default: lib/mmb/commands.template.dat)
--mmbquicktemplate
Path to MMB folding protocol of quick speed (default: lib/mmb/commands.templatequick.dat)
--mmbslowtemplate
Path to MMB folding protocol of slow speed (default: lib/mmb/commands.templatelong.dat)
--mmbparams Path to parameter file bundled with MMB package (default: lib/mmb/parameters.csv)
-pk, --pickup Whether the run is to resume MD sampling of an unfinished run or an old run (default: No)
--pickupfromfreeAptamerChk
Resume MD sampling of free aptamer: skip everything before it (default: No)
--pickupfromcomplexChk
Resume MD sampling of aptamer-ligand: skip everything before it (default: No)
--chkfile Name of checkpoint file for resuming MD sampling, format: Yes: run MD sampling till convergence, currently only feasible in free aptamer sampling (default: No)
--autoMDconvergencecutoff
Convergence cutoff if doing autosampling (default: 0.01)
--maxaptamersamplingiter
Max number of iterations for free aptamer MD sampling if doing autosampling (default: 20)
--maxwalltime Walltime in hours to check runtime (default: 24.0)
--skipsmoothing If Yes: no short MD relaxation before MD sampling (default: Yes)
--equilibrationtime
Equilibration time in nanoseconds after energy minimization and before MD sampling (default: 0.1)
--smoothingtime Time in nanoseconds for short MD relaxation before MD sampling, if any (default: None)
--aptamersamplingtime
MD sampling time in nanoseconds for free aptamer dynamics (default: None)
--complexsamplingtime
MD sampling time in nanoseconds for aptamer-ligand complex dynamics (default: None)
--timestep time step in femtoseconds in MD sampling (default: 2.0)
--printstep Printout step in picoseconds in MD sampling (default: 10.0)
--forcefield Force field used in OpenMM (default: amber14-all)
--hydrogenmass Unit is amu (default: 1.5)
--watermodel Explicit water solvent model used in OpenMM (default: amber14/tip3p)
--boxoffset Buffering offset in nanometers on solvent box, if using explicit solvent (default: 1.0)
--constraints Specify which bond angles and/or lengths should be implemented with constraints (default: None)
--constrainttolerance
Distance tolerance for constraint in OpenMM integrator (default: 1e-06)
--rigidwater Whether to make water molecules completely rigid at bond lengths and angles (default: Yes)
--nonbondedmethod Type of nonbonded interactions (default: NoCutoff)
--nonbondedcutoff The cutoff distance in nanometers to use for nonbonded interactions (default: 1.0)
--ewalderrortolerance
Error tolerance if nonbonded_method is Ewald, PME, or LJPME (default: 0.0005)
--friction Friction coefficient in unit of 1/ps, used in Langevin integrator (default: 1.0)
--implicitsolvent Whether to use an Amber GB implicit solvent model (default: No)
--implicitsolventmodel
Specify an Amber GB implicit solvent model if needed (default: None)
--soluteDielectric The solute dielectric constant to use in the implicit solvent model (default: 1.0)
--solventDielectric
The solvent dielectric constant to use in the implicit solvent model (default: 78.5)
--implicitsolventKappa
Debye screening parameter; If specified by user, OpenMM will ignore {--ionicStrength} in implicit solvent. (default: None)
--leaptemplate A script for running LEap program in Ambertools21, provided in E2EDNA package. (default: leaptemplate.in)
--DNAforcefield Force field for DNA used by LEap program (default: DNA.OL15)
--dockingsteps Number of steps for docking simulations (default: 10)
--Ndockedstructures
Number of docked structures output from the docker (default: 1)
```
3. Running a job
Using the scripts for automated tests
The `examples/automated_tests/` folder provides `.sh` and `.yaml` files for automated testing. Check out `how_to_run_automated_tests.txt` in the folder for instructions. Each set of automated tests takes about 15 minutes to complete on a MacBook Pro laptop. We chose a simple DNA aptamer system for the test runs. <!-- and whereever docking is performed the docking configuration may or may not be found due to the limitations of the chosen system. Please note that a failure to find a docked configuration is not a failure of E2EDNA. -->
Running on your own data with customized input arguments
All the input arguments listed in Usage can be customized on the command line, in a .yaml configuration file, or both. The .yaml configuration takes precedence over command-line inputs: if an argument is specified in both places, the command-line value is ignored.
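The precedence rule can be illustrated with a minimal sketch. This is not the pipeline's own code: `merge_settings` is a hypothetical helper, the dictionary keys are illustrative, and plain dictionaries stand in for the parsed command line and the loaded .yaml file.

```python
def merge_settings(cli_args: dict, yaml_config: dict) -> dict:
    """Combine two settings sources so that YAML values win on conflict.

    Hypothetical helper for illustration only; not part of E2EDNA.
    """
    merged = dict(cli_args)      # start from the command-line values
    merged.update(yaml_config)   # YAML entries overwrite duplicates
    return merged


# Both sources set `mode`; `run_num` is absent from the YAML dictionary
# (as if commented out in the file), so only the command line supplies it.
cli_args = {"mode": "3d coarse", "run_num": 2}
yaml_config = {"mode": "2d structure", "aptamer_seq": "TAATGTTAATTG"}

settings = merge_settings(cli_args, yaml_config)
print(settings["mode"])     # the YAML value is kept
print(settings["run_num"])  # the command-line value is used
```

In practice this means an argument must be commented out in the .yaml file before a command-line value for it can take effect.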
A single run can be carried out by specifying all necessary input arguments in a configuration file:
(e2edna)$ ./main.py --yaml_config=simu_config.yaml
Alternatively, some or all of the necessary input arguments can be passed to the pipeline on the command line, provided they are commented out in the .yaml file. See simu_config_automated_tests.yaml and automated_tests.sh in examples/automated_tests/ for an example.
The parameters from Usage listed below call for particular attention, for example because their values are restricted to a few choices.
* Compute Platform Configuration
  * `-d/--device`: running device; either `local` or `cluster`
  * `-os/--operating_system`: operating system; `macos`, `linux` or `WSL`
  * `-p/--platform`: processing platform; either `CPU` or `CUDA`
  * `--CUDA_precision`: precision of CUDA if used; either `single` or `double`; default is `single`
* Directory Settings
  * `-w/--workdir`: directory to write results for each run; default is `./localruns`
  * `-mbdir/--mmb_dir`: path to MMB library directory. Both absolute and relative (to the codebase directory) paths are accepted. <!-- Wildcards can be used to describe location, for instance the default is "Installer-/lib" which will match both "Installer-3.0-Ubuntu18.04/lib" and "Installer.3_0.OSX/lib". -->
  * `-mb/--mmb`: path to MMB executable. Both absolute and relative (to the codebase directory) paths are accepted. <!-- Wild cards can be used to describe location. -->
* Run Parameters
  * `-m/--mode`: mode of operation; must be one of the modes described in Functionality, i.e., `'2d structure'`, `'3d coarse'`, `'3d smooth'`, `'coarse dock'`, `'smooth dock'`, `'free aptamer'`, `'full dock'`, `'full binding'`
  * `-a/--aptamerSeq`: DNA aptamer sequence (5'->3'). A string made of case-sensitive letters from {A, G, C, T} only. <!-- or the name of a readable text file containing the sequence. -->
  * `-l/--ligand`: PDB filename of target ligand. If there is no ligand, `--ligand` and the following two inputs (`--ligand_type` and `--ligand_seq`) should be left off.
  * `-lt/--ligand_type`: `peptide`, `DNA`, `RNA`, or `other`, assuming an `other` ligand can be described by the force field used in MD simulation (default is Amber14).
  * `-ls/--ligand_seq`: a string of the target ligand's sequence if `--ligand_type` is `peptide`, `DNA` or `RNA`. If `--ligand_type=other`, do not use the `--ligand_seq` flag. <!-- or the name of a readable text file containing the sequence. -->
  * `--fold_speed`: three choices; `quick`, `normal` or `slow`
  * `--force_field` and `--water_model`: plenty of choices; check out the force field options in OpenMM/7.7.0.
  * `--constraints`: four choices; `HBonds`, `AllBonds`, `HAngles` or `None`.
  * `--nonbonded_method`: at most six choices; `Ewald`, `PME`, `LJPME`, `CutoffPeriodic`, `CutoffNonPeriodic` or `NoCutoff`. Only the last three apply when using an Amber GB implicit solvent model.
  * `--implicit_solvent_model`: five choices; `HCT`, `OBC1`, `OBC2`, `GBn` or `GBn2`
  * `--DNA_force_field`: two choices; `DNA.OL15` or `DNA.bsc1`
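The restricted choices above lend themselves to a quick check before launching a long run. The following is a minimal sketch, assuming the settings have already been loaded into a dictionary; `check_settings`, the dictionary keys, and the `"Yes"` value for the implicit solvent switch are illustrative assumptions rather than the pipeline's own validation code.

```python
# Allowed values copied from the parameter list above.
ALLOWED_CHOICES = {
    "device": {"local", "cluster"},
    "operating_system": {"macos", "linux", "WSL"},
    "platform": {"CPU", "CUDA"},
    "CUDA_precision": {"single", "double"},
    "mode": {"2d structure", "3d coarse", "3d smooth", "coarse dock",
             "smooth dock", "free aptamer", "full dock", "full binding"},
    "ligand_type": {"peptide", "DNA", "RNA", "other"},
    "fold_speed": {"quick", "normal", "slow"},
    "constraints": {"HBonds", "AllBonds", "HAngles", "None"},
    "nonbonded_method": {"Ewald", "PME", "LJPME", "CutoffPeriodic",
                         "CutoffNonPeriodic", "NoCutoff"},
    "implicit_solvent_model": {"HCT", "OBC1", "OBC2", "GBn", "GBn2"},
    "DNA_force_field": {"DNA.OL15", "DNA.bsc1"},
}
# Nonbonded methods listed as usable with an Amber GB implicit solvent model.
GB_COMPATIBLE_NONBONDED = {"CutoffPeriodic", "CutoffNonPeriodic", "NoCutoff"}


def check_settings(settings: dict) -> list:
    """Return a list of human-readable problems (empty if none are found)."""
    problems = []
    for name, allowed in ALLOWED_CHOICES.items():
        if name in settings and settings[name] not in allowed:
            problems.append(f"{name}={settings[name]!r} is not one of {sorted(allowed)}")
    # The aptamer sequence is case sensitive: only A, G, C and T are accepted.
    sequence = settings.get("aptamer_seq")
    if sequence is not None and (not sequence or set(sequence) - set("AGCT")):
        problems.append("aptamer_seq must contain only the letters A, G, C, T")
    # NoCutoff is the documented default for the nonbonded method.
    if (settings.get("implicit_solvent") == "Yes"
            and settings.get("nonbonded_method", "NoCutoff") not in GB_COMPATIBLE_NONBONDED):
        problems.append("nonbonded_method is not compatible with implicit solvent")
    return problems


example = {"mode": "3d coarse", "aptamer_seq": "TAATGTTAATTG",
           "implicit_solvent": "Yes", "nonbonded_method": "PME"}
for problem in check_settings(example):
    print(problem)
```

The example reports a single problem, because `PME` is a valid nonbonded method in general but not one of the three listed for implicit solvent.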
Check results in an output directory
Output of a single run will be written to {--workdir}/run{--run_num} directory.
Only the main outputs remain in the run{--run_num}/ folder, including a log file named run_output_log.txt. All intermediate or temporary files are grouped to subfolders, such as run1/md_aptamer_sampling_runfiles_0, based on which module they were generated from. In each run_output_log.txt, generation statements of those main output files are marked with >>> and one can selectively print out:
```
$ cat run_output_log.txt | grep '>>>'
Predicted 2D structure #0 : .(((....))).
MMB folded the aptamer and generated folded structure: foldedAptamer_0.pdb
Generated aptamer-ligand complex structure: complex_0_0.pdb
$
```
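The same filtering can be done from Python, for instance when collecting results from many runs. This is a minimal sketch: `main_output_lines` is a hypothetical helper, and the log content written in the demo is made up for illustration, since the exact layout of a real log is not shown here.

```python
import tempfile
from pathlib import Path


def main_output_lines(log_path) -> list:
    """Return the lines of a run log that contain the '>>>' marker."""
    text = Path(log_path).read_text()
    return [line for line in text.splitlines() if ">>>" in line]


# Demo on a throwaway log file with illustrative content.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "run_output_log.txt"
    log.write_text(
        "Starting run (illustrative line)\n"
        ">>> Predicted 2D structure #0 : .(((....))).\n"
    )
    marked = main_output_lines(log)

print(marked)
```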
4. Functionality of eight different operation modes
The pipeline implements eight distinct operation modes so that users can choose the level of computational cost and accuracy.
- '2d structure' → returns NUPACK analysis of aptamer secondary structure. Very fast, O(<1s). If using NUPACK, includes probability of observing a certain fold and of suboptimal folds within kT of the minimum.
- '3d coarse' → returns MMB fold of the best secondary structure. Fast, O(5-30 mins). Results in a strained 3D structure which obeys base pairing rules and certain stacking interactions.
- '3d smooth' → identical to '3d coarse', with a short MD relaxation in solvent. Less than double the cost of '3d coarse', depending on relaxation time.
- 'coarse dock' → uses the 3D structure from '3d coarse' as the initial condition for a LightDock simulation, and returns best docking configurations and scores. Depending on docking parameters, adds O(5-30 mins) to '3d coarse'.
- 'smooth dock' → identical to 'coarse dock', but uses the relaxed structure from '3d smooth' instead. Similar cost to 'coarse dock'.
- 'free aptamer' → folds the aptamer in MMB and runs extended MD sampling to identify a representative, equilibrated 2D and 3D structure. Slow, O(hours).
- 'full dock' → returns best docking configurations and scores from a LightDock run using the fully-equilibrated aptamer structure from 'free aptamer'. Similar cost to 'free aptamer' (LightDock is relatively cheap).
- 'full binding' → same steps as 'full dock', with a follow-up extended MD simulation of the best binding configuration. Slowest, O(hours).
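One way to keep the modes straight is to view each as a sequence of stages. The table below is a summary of the descriptions above written as a small Python sketch; the stage labels and the `needs_ligand` helper are illustrative and do not mirror the pipeline's internal structure.

```python
# Stage labels are informal names chosen for this sketch.
MODE_STAGES = {
    "2d structure": ["secondary structure"],
    "3d coarse":    ["secondary structure", "MMB fold"],
    "3d smooth":    ["secondary structure", "MMB fold", "MD relaxation"],
    "coarse dock":  ["secondary structure", "MMB fold", "docking"],
    "smooth dock":  ["secondary structure", "MMB fold", "MD relaxation", "docking"],
    "free aptamer": ["secondary structure", "MMB fold", "aptamer MD sampling"],
    "full dock":    ["secondary structure", "MMB fold", "aptamer MD sampling",
                     "docking"],
    "full binding": ["secondary structure", "MMB fold", "aptamer MD sampling",
                     "docking", "complex MD sampling"],
}


def needs_ligand(mode: str) -> bool:
    """A mode needs a target ligand exactly when it includes a docking stage."""
    return "docking" in MODE_STAGES[mode]


for mode, stages in MODE_STAGES.items():
    ligand_note = " (ligand required)" if needs_ligand(mode) else ""
    print(f"{mode}: {' -> '.join(stages)}{ligand_note}")
```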
5. Automated test runs
Running the scripts of automated tests mentioned in Running a job will automatically run light tests of 8 modes. Here we explain the inputs, what outputs to look for, and what a successful run should look like for each mode.
--mode='2d structure'
- Key inputs: DNA aptamer sequence
- Outputs: predicted secondary structure in run_output_log.txt
- Success evaluation: observe the dot-bracket notation for the secondary structure, such as .(((....))).
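To read a dot-bracket string, match each `(` with its closing `)`; dots are unpaired bases. The sketch below does this with a stack. It is an illustration only, and it assumes the string contains just `.`, `(` and `)`; any other symbol is rejected.

```python
def base_pairs(dot_bracket: str) -> list:
    """Return sorted (i, j) index pairs of matched brackets (0-based)."""
    open_positions = []   # indices of '(' that are still waiting for a partner
    pairs = []
    for index, symbol in enumerate(dot_bracket):
        if symbol == "(":
            open_positions.append(index)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {index}")
            pairs.append((open_positions.pop(), index))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {index}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


print(base_pairs(".(((....)))."))  # [(1, 10), (2, 9), (3, 8)]
```

For the example structure, three base pairs form a stem that closes a four-base loop, with one unpaired base at each end.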
--mode='3d coarse'
- Key inputs: DNA aptamer sequence
- Outputs: predicted secondary structure in run_output_log.txt; MMB-folded aptamer structure: foldedAptamer_0.pdb
- Success evaluation: visualize MMB-folded aptamer structure in software like VMD or PyMOL

--mode='3d smooth'
- Key inputs: DNA aptamer sequence
- Outputs: predicted secondary structure in run_output_log.txt; MMB-folded aptamer structure: foldedAptamer_0.pdb; Short MD relaxation trajectory of free aptamer: foldedAptamer_0_processed_trajectory.dcd and clean_foldedAptamer_0_processed_trajectory.dcd (without solvent and ions); Relaxed aptamer structure: relaxedAptamer_0.pdb
- Success evaluation: simulation logfile, MDlog_freeAptamerSmoothing.txt, indicates 100% completion; visualize the relaxation trajectory and relaxed structure in software like VMD or PyMOL

--mode='coarse dock'
- Key inputs: DNA aptamer sequence; PDB filename of target ligand
- Outputs: predicted secondary structure in run_output_log.txt; MMB-folded aptamer structure: foldedAptamer_0.pdb; MMB-folded aptamer docked by target ligand: complex_0_0.pdb (if docking happened)
- Success evaluation: visualize the docked structure, if docking happened, in software like VMD or PyMOL

--mode='smooth dock'
- Key inputs: DNA aptamer sequence; PDB filename of target ligand
- Outputs: predicted secondary structure in run_output_log.txt; MMB-folded aptamer structure: foldedAptamer_0.pdb; Short MD relaxation trajectory of free aptamer: foldedAptamer_0_processed_trajectory.dcd and clean_foldedAptamer_0_processed_trajectory.dcd (without solvent and ions); Relaxed aptamer structure: relaxedAptamer_0.pdb; Relaxed aptamer docked by target ligand: complex_0_0.pdb (if docking happened)
- Success evaluation: visualize the docked structure, if docking happened, in software like VMD or PyMOL

--mode='free aptamer'
- Key inputs: DNA aptamer sequence
- Outputs: predicted secondary structure in run_output_log.txt; MMB-folded aptamer structure: foldedAptamer_0.pdb; Long MD sampling trajectory of free aptamer: foldedAptamer_0_processed_complete_trajectory.dcd and clean_foldedAptamer_0_processed_complete_trajectory.dcd (without solvent and ions); Representative structure of free aptamer: repStructure_0.pdb
- Success evaluation: simulation logfile, MDlog_freeAptamerSampling.txt, indicates 100% completion; visualize the sampling trajectory and representative structure of free aptamer in software like VMD or PyMOL

--mode='full dock'
- Key inputs: DNA aptamer sequence; PDB filename of target ligand
- Outputs: predicted secondary structure in run_output_log.txt; MMB-folded aptamer structure: foldedAptamer_0.pdb; Long MD sampling trajectory of free aptamer: foldedAptamer_0_processed_complete_trajectory.dcd and clean_foldedAptamer_0_processed_complete_trajectory.dcd (without solvent and ions); Representative structure of free aptamer: repStructure_0.pdb; Representative aptamer docked by target ligand: complex_0_0.pdb (if docking happened)
- Success evaluation: visualize the docked structure, if docking happened, in software like VMD or PyMOL

--mode='full binding'
- Key inputs: DNA aptamer sequence; PDB filename of target ligand
- Outputs: predicted secondary structure in run_output_log.txt; MMB-folded aptamer structure: foldedAptamer_0.pdb; Long MD sampling trajectory of free aptamer: foldedAptamer_0_processed_complete_trajectory.dcd and clean_foldedAptamer_0_processed_complete_trajectory.dcd (without solvent and ions); Representative structure of free aptamer: repStructure_0.pdb; Representative aptamer docked by target ligand: complex_0_0.pdb (if docking happened); Long MD sampling trajectory of aptamer-ligand complex: complex_0_0_processed_complete_trajectory.dcd and clean_complex_0_0_processed_complete_trajectory.dcd (without solvent and ions)
- Success evaluation: simulation logfile, MDlog_complexSampling.txt, indicates 100% completion; visualize the sampling trajectory of aptamer-ligand complex in software like VMD or PyMOL
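A first, coarse check after a test run is whether the expected output files exist at all. The sketch below assumes the caller supplies the file names to look for (here the two outputs listed for '3d coarse'); `missing_outputs` is a hypothetical helper, and file presence alone does not replace the visual inspection recommended above.

```python
import tempfile
from pathlib import Path


def missing_outputs(run_dir, expected_files) -> list:
    """Return the expected file names that are not present in run_dir."""
    run_dir = Path(run_dir)
    return [name for name in expected_files if not (run_dir / name).is_file()]


# Demo on a throwaway directory that mimics an incomplete '3d coarse' run:
# the log file exists, but the folded structure was never written.
expected = ["run_output_log.txt", "foldedAptamer_0.pdb"]
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "run1"
    run_dir.mkdir()
    (run_dir / "run_output_log.txt").write_text("illustrative log content\n")
    missing = missing_outputs(run_dir, expected)

print(missing)  # ['foldedAptamer_0.pdb']
```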
Owner
- Login: siminegroup
- Kind: user
- Location: Montreal, QC, Canada
- Company: McGill University
- Website: www.siminegroup.ca
- Twitter: thesiminegroup
- Repositories: 1
- Profile: https://github.com/siminegroup
Computational Chemistry group at McGill University
JOSS Publication
E2EDNA 2.0: Python Pipeline for Simulating DNA Aptamers with Ligands
Authors
Tags
simulation pipeline DNA aptamers
GitHub Events
Total
- Issues event: 1
- Issue comment event: 5
Last Year
- Issues event: 1
- Issue comment event: 5
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Tao Liu | t****2@o****m | 179 |
| ISDementyev | 6****v | 136 |
| InfluenceFunctional | m****r@g****m | 37 |
| lenasimine | 6****e | 31 |
| siminegroup | 9****p | 11 |
| Owais Ahmad | o****4@g****m | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 14
- Total pull requests: 12
- Average time to close issues: 18 days
- Average time to close pull requests: 5 days
- Total issue authors: 7
- Total pull request authors: 7
- Average comments per issue: 3.57
- Average comments per pull request: 1.17
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 5.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- schackartk (7)
- Owaiskhan9654 (2)
- noah-camp (1)
- snixon2 (1)
- taoliu032 (1)
- JoaoRodrigues (1)
- brianjimenez (1)
Pull Request Authors
- taoliu032 (4)
- siminegroup (3)
- InfluenceFunctional (1)
- schackartk (1)
- csoneson (1)
- danielskatz (1)
- Owaiskhan9654 (1)
