dryad
Dryad is a Nextflow pipeline for examining prokaryote relatedness. Dryad can perform a reference free analysis and/or SNP analysis.
Dryad analyzes FASTA files that have been processed by either Spriggan or PHoeNIx. Dryad is split into two major workflows:
1. An alignment based workflow dedicated to fine scale investigations within a single outbreak. This workflow uses a reference to determine relatedness and SNP distances. The reference can be removed from the alignment based workflow to create a phylogenetic tree that gives a high resolution look at a single outbreak.
2. An alignment free workflow dedicated to identifying historical relatedness across multiple years and multiple outbreaks without the use of a reference. This workflow gives a low resolution look at historical relatedness.
Table of Contents:
Usage
Input
Parameters
Workflow
Output
Credits
Contributions-and-Support
Citations
Usage
[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data. Dryad requires Nextflow version 24.04.2.5914 or later.
To run an alignment free comparison, use:
bash
nextflow run wslh-bio/dryad \
-latest \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR> \
--alignment_free
Alternatively, to run an alignment based comparison, use:
bash
nextflow run wslh-bio/dryad \
-latest \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR> \
--fasta <REFERENCE_FASTA | random> \
--alignment_based
To run both an alignment based and an alignment free comparison, use:
bash
nextflow run wslh-bio/dryad \
-latest \
-profile <docker/singularity/.../institute> \
--input samplesheet.csv \
--outdir <OUTDIR> \
--fasta <REFERENCE_FASTA | random> \
--alignment_based \
--alignment_free
- Nextflow caches previously run pipelines, which can result in an older version of a pipeline being used. To get the most up-to-date version of a pipeline like Dryad, use the -latest option.
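The three invocations above differ only in which workflow flags are present, so they can be assembled from one small helper. The following is a minimal Python sketch, not part of Dryad: the function name `build_dryad_command` is hypothetical, and the choice to raise an error when neither workflow is selected is an assumption of this sketch rather than documented pipeline behavior. Only options that appear in the commands above are used.

```python
import shlex


def build_dryad_command(profile, samplesheet, outdir,
                        alignment_based=False, alignment_free=False,
                        fasta=None):
    """Return the argument list for one of the Dryad invocations shown above.

    Hypothetical helper: it only mirrors the documented command lines.
    """
    # Assumption of this sketch: at least one workflow must be selected.
    if not (alignment_based or alignment_free):
        raise ValueError("select alignment_based, alignment_free, or both")

    cmd = [
        "nextflow", "run", "wslh-bio/dryad",
        "-latest",                 # pull the most recent pipeline revision
        "-profile", profile,
        "--input", samplesheet,
        "--outdir", outdir,
    ]
    if alignment_based:
        # --fasta takes a reference FASTA path or the literal "random".
        if fasta is not None:
            cmd += ["--fasta", fasta]
        cmd.append("--alignment_based")
    if alignment_free:
        cmd.append("--alignment_free")
    return cmd


if __name__ == "__main__":
    command = build_dryad_command(
        "docker", "samplesheet.csv", "results",
        alignment_based=True, alignment_free=True, fasta="random",
    )
    # Print a shell-safe version of the command for inspection.
    print(shlex.join(command))
```

The returned list can be passed to `subprocess.run` unchanged, which avoids shell quoting problems with paths that contain spaces.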
Input
Prepare a samplesheet with your input data with each row representing one fasta file. The samplesheet will look as follows:
samplesheet.csv:
| sample | fasta |
| ------------- | ------------- |
| sample1 | 20241.contigs.fa |
| sample2 | 20242.contigs.fa |
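A samplesheet in this two-column layout can be generated from a directory of assemblies. The sketch below is an illustration in Python, not a Dryad utility: the function name `write_samplesheet`, the list of accepted file extensions, and the rule that derives the sample name from the text before the first dot in the file name are all assumptions. In the example table above, sample names are chosen independently of the file names, so adjust the naming rule as needed.

```python
import csv
from pathlib import Path

# Assumed FASTA extensions; extend this tuple if your assemblies differ.
FASTA_SUFFIXES = (".fa", ".fasta", ".fna")


def write_samplesheet(fasta_dir, out_csv):
    """Write a 'sample,fasta' CSV with one row per FASTA file in fasta_dir."""
    rows = []
    seen = set()
    for path in sorted(Path(fasta_dir).iterdir()):
        if not (path.is_file() and path.name.endswith(FASTA_SUFFIXES)):
            continue
        # Assumed naming rule: everything before the first dot.
        sample = path.name.split(".")[0]
        if sample in seen:
            raise ValueError(f"duplicate sample name: {sample}")
        seen.add(sample)
        rows.append({"sample": sample, "fasta": str(path)})

    with open(out_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["sample", "fasta"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

For example, `write_samplesheet("assemblies", "samplesheet.csv")` writes one row for each FASTA file found in a hypothetical `assemblies` directory.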
Parameters
Dryad's main parameters and their defaults are shown in the table below:
| Parameter | Parameter description and defaults | Example usage |
| ------------- | ------------- | ------------- |
| input | Path to comma-separated file containing information about the samples in the experiment | --input
*If you are running an alignment based workflow on more than 100 samples, it may be beneficial to use a partitioning value higher than the default of 100. More information can be found in the Parsnp 2.0 paper.
Workflow

1. Universal Steps
   - Enter assembled FASTA genomes into a samplesheet.
   - QUAST v5.2.0 is used to determine assembly quality unless --skip_quast is used.
   - QUAST results are summarized with a custom Python script to increase readability.
2. Comparison Steps
   - Historical Comparison
     - Mashtree v1.4.6 generates a phylogenetic tree using Mash distances.
   - Fine scale Comparison
     - Parsnp v2.0.5 is used to perform a core genome alignment.
     - IQ-TREE2 v2.3.4 is used for inferring a phylogenetic tree.
     - Bootstrapping in IQ-TREE2 requires at least 4 genomes. If fewer than 4 genomes are used, IQ-TREE2 will not perform bootstrapping.
     - Snp-dists v0.8.2 is used to calculate the SNP distance matrix.
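To make the last fine scale step concrete, the sketch below shows what a pairwise SNP distance matrix over a core genome alignment looks like. It is a simplified Python illustration, not the snp-dists implementation: the function names are hypothetical, and the rule that a position counts only when both sequences carry A, C, G, or T is an assumption chosen so that gaps and ambiguous bases do not inflate distances.

```python
from itertools import combinations

# Only unambiguous bases contribute to the distance in this sketch.
BASES = set("ACGT")


def snp_distance(seq_a, seq_b):
    """Count aligned positions where both bases are A/C/G/T and differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a in BASES and b in BASES
    )


def distance_matrix(alignment):
    """Return a symmetric nested dict of SNP distances for {name: sequence}."""
    names = list(alignment)
    matrix = {name: {name: 0} for name in names}
    for first, second in combinations(names, 2):
        distance = snp_distance(alignment[first], alignment[second])
        matrix[first][second] = distance
        matrix[second][first] = distance
    return matrix


if __name__ == "__main__":
    # Toy alignment; real input would come from the core genome alignment.
    toy = {
        "sample1": "ACGTACGT",
        "sample2": "ACGTACGA",
        "sample3": "ACNTTCGA",
    }
    for name, row in distance_matrix(toy).items():
        print(name, row)
```

In the toy alignment, the `N` in sample3 is skipped, so sample1 and sample3 differ at two counted positions rather than three.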
Output
An example of Dryad's output directory structure for both alignment based and alignment free steps can be seen below. These directories will not include QUAST if --skip_quast is used:
alignment_based_output/
├── compare
│ └── sample_exclusion_status.csv
├── dryad
│ └── dryad_summary.csv
├── iqtree
│ └── parsnp.snps.mblocks.treefile
├── parse
│ └── aligner_log.tsv
├── parsnp
│ └── parsnp_output
│ ├── config
│ │ ├── all.mumi
│ │ └── all_mumi.ini
│ ├── log
│ │ ├── harvest-mblocks.err
│ │ ├── harvest-mblocks.out
│ │ ├── parsnp-aligner.err
│ │ ├── parsnpAligner.log
│ │ ├── parsnp-aligner.out
│ │ ├── parsnp-mumi.err
│ │ ├── parsnp-mumi.out
│ │ ├── raxml.err
│ │ └── raxml.out
│ ├── parsnpAligner.ini
│ ├── parsnp.ggr
│ ├── parsnp.maf
│ ├── parsnp.snps.mblocks
│ ├── parsnp.tree
│ ├── parsnp.xmfa
│ └── *.fna.ref
├── pipeline_info
│ ├── execution_report_*.html
│ ├── execution_timeline_*.html
│ ├── execution_trace_*.txt
│ ├── pipeline_dag_*.html
│ └── samplesheet.valid.csv
├── quast
│ ├── *.quast.report.tsv
│ ├── *.transposed.quast.tsv
│ └── quast_results.tsv
├── sample
│ └── count.txt
└── snpdists
  └── snp_dists_matrix.tsv
alignment_free_output/
├── mashtree
│ └── mashtree.bootstrap.dnd
├── pipeline_info
│ ├── *.html
│ ├── *.txt
│ └── samplesheet.valid.csv
├── quast
│ ├── *.quast.report.tsv
│ ├── *.transposed.quast.report.tsv
│ └── quast_results.tsv
└── rejected_samples
  └── Empty_samples.csv
Notable output files:
Alignment based
| File | Output |
| ------------- | ------------- |
| quast_results.tsv* | Assembly quality results |
| snp_dists_matrix.tsv | SNP distances between each pair of isolates |
| parsnp.snps.mblocks.treefile | Maximum likelihood phylogenetic tree |
| aligner_log.tsv | Coverage statistics calculated by Parsnp |
| excluded_samples_from_parsnp.txt | Lists samples that were excluded from Parsnp's analysis due to a MUMi distance > 0.01 |
| dryad_summary.csv | Summarizes the QUAST report, if run, and core genome percentages |
| Empty_samples.csv | Lists any samples that are empty and were removed from the pipeline |
*QUAST results will not be present if --skip_quast was used.
Alignment free
| File | Output |
| ------------- | ------------- |
| quast_results.tsv* | Assembly quality results |
| mashtree.bootstrap.dnd | Neighbor joining tree based on Mash distances |
| Empty_samples.csv | Lists any samples that are empty and were removed from the pipeline |
*QUAST results will not be present if --skip_quast was used.
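The SNP distance matrix from the alignment based workflow (snp_dists_matrix.tsv) is often the first file inspected during an outbreak investigation. The Python sketch below lists isolate pairs at or below a SNP threshold. It assumes the file is a square, symmetric, tab-separated matrix whose first row and first column hold sample names; the function name `close_pairs` and the threshold used in the example are hypothetical and are not recommended cutoffs.

```python
import csv
import io


def close_pairs(handle, max_snps):
    """Return (sample_a, sample_b, distance) for pairs within max_snps.

    Assumes a square TSV matrix: header row of sample names, and each
    following row starting with its own sample name.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    columns = header[1:]  # the first header cell is a corner label
    pairs = []
    for row in reader:
        if not row:
            continue
        row_name = row[0]
        for col_name, value in zip(columns, row[1:]):
            # Keep one orientation of each pair and skip the diagonal.
            if row_name < col_name and int(value) <= max_snps:
                pairs.append((row_name, col_name, int(value)))
    return sorted(pairs, key=lambda pair: pair[2])


if __name__ == "__main__":
    # Toy matrix standing in for snpdists/snp_dists_matrix.tsv.
    toy = (
        "snp-dists 0.8.2\tsample1\tsample2\tsample3\n"
        "sample1\t0\t1\t12\n"
        "sample2\t1\t0\t13\n"
        "sample3\t12\t13\t0\n"
    )
    for pair in close_pairs(io.StringIO(toy), max_snps=5):
        print(pair)
```

With a real run, open the matrix file and pass the file handle in place of the `io.StringIO` object.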
[!WARNING] Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the Nextflow -c option, can be used to provide any configuration except for parameters; see docs.
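As a sketch of the -params-file route, the Python snippet below writes the parameters used in the usage commands to a JSON file, which is one of the formats Nextflow accepts for -params-file. The helper name `write_params_file`, the file name params.json, and the example values are assumptions; only parameter names that appear elsewhere in this README are used.

```python
import json


def write_params_file(path, params):
    """Write pipeline parameters to a JSON file for Nextflow's -params-file."""
    with open(path, "w") as handle:
        json.dump(params, handle, indent=2)
        handle.write("\n")
    return path


if __name__ == "__main__":
    # Example values only; replace them with your own paths and choices.
    write_params_file(
        "params.json",
        {
            "input": "samplesheet.csv",
            "outdir": "results",
            "fasta": "random",
            "alignment_based": True,
            "alignment_free": True,
        },
    )
```

The file can then be supplied as `nextflow run wslh-bio/dryad -profile docker -params-file params.json`, keeping parameters out of custom config files.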
Credits
Dryad was written by Dr. Kelsey Florek, Dr. Abigail C. Shockey, and Eva Gunawan.
We thank the bioinformatics group at the Wisconsin State Laboratory of Hygiene for all of their contributions.
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
Citations
If you use Dryad for your analysis, please cite it using the following:
K. Florek, A.C. Shockey, & E. Gunawan (2014). Dryad (Version 4.1.1) [https://github.com/wslh-bio/dryad].
An extensive list of references for the tools used by Dryad can be found in the CITATIONS.md file.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Owner
- Name: WSLH Bioinformatics
- Login: wslh-bio
- Kind: organization
- Repositories: 4
- Profile: https://github.com/wslh-bio
Wisconsin State Laboratory of Hygiene Bioinformatics
Citation (CITATIONS.md)
# wslh-bio/dryad: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

## [Quast](https://quast.sourceforge.net/docs/manual.html)

> A. Mikheenko, A. Prjibelski, V. Saveliev, D. Antipov, A. Gurevich, Versatile genome assembly evaluation with QUAST-LG, Bioinformatics (2018) 34 (13): i142-i150. doi: 10.1093/bioinformatics/bty266

## [Parsnp](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-014-0524-x)

> Kille B, Nute MG, Huang V, Kim E, Phillippy AM, Treangen TJ: Parsnp 2.0: Scalable Core-Genome Alignment for Massive Microbial Datasets. bioRxiv (2024). doi: https://doi.org/10.1101/2024.01.30.577458

## [IQ-TREE](https://doi.org/10.1093/molbev/msaa015)

> D.T. Hoang, O. Chernomor, A. von Haeseler, B.Q. Minh, and L.S. Vinh (2018) UFBoot2: Improving the ultrafast bootstrap approximation. Mol. Biol. Evol., 35:518–522. https://doi.org/10.1093/molbev/msx281

## [snp-dists](https://github.com/tseemann/snp-dists)

> T. Seemann, F. Klotzl, A. Page (2014). Snp-Dists (Version 0.8.2) [https://github.com/tseemann/snp-dists].

## [Mashtree](https://doi.org/10.21105/joss.01762)

> Katz, L. S., Griswold, T., Morrison, S., Caravas, J., Zhang, S., den Bakker, H.C., Deng, X., and Carleton, H. A., (2019). Mashtree: a rapid comparison of whole genome sequence files. Journal of Open Source Software, 4(44), 1762, https://doi.org/10.21105/joss.01762

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.