Mashtree
Mashtree: a rapid comparison of whole genome sequence files - Published in JOSS (2019)
:deciduous_tree: Create a tree using Mash distances
Statistics
- Stars: 170
- Watchers: 12
- Forks: 26
- Open Issues: 18
- Releases: 46
README.md
mashtree
Create a tree using Mash distances.
For simple usage, see mashtree --help. This is an example command:
mashtree *.fastq.gz > tree.dnd
For confidence values, use mashtree_bootstrap.pl or mashtree_jackknife.pl; run either with --help for usage.
Two modes: fast or accurate
Input files: fastq files are interpreted as raw read files. Fasta, GenBank, and EMBL files are interpreted as genome assemblies. Any of these file types can also be compressed with gz, bz2, or zip.
Output files: Newick (.dnd). If --outmatrix is supplied, a distance matrix is written as well.
See the documentation on the algorithms for more information.
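As a quick sanity check on the Newick output, the number of leaves in the tree should match the number of input files. The sketch below counts leaves in a toy Newick string; the sample labels are hypothetical, and the comma-counting shortcut assumes labels are unquoted and contain no commas.

```shell
#!/bin/sh
# Toy Newick string standing in for the contents of tree.dnd (hypothetical labels).
newick='((sample1:0.01,sample2:0.02):0.005,sample3:0.03);'

# In a Newick tree with unquoted labels, the leaf count is the comma count plus one.
commas=$(printf '%s' "$newick" | tr -cd ',' | wc -c | tr -d ' ')
leaves=$((commas + 1))
echo "leaves: $leaves"
```

To check a real tree, replace the toy string with the file contents, for example `newick=$(cat tree.dnd)`.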
Faster
mashtree --numcpus 12 *.fastq.gz [*.fasta] > mashtree.dnd
More accurate
You can get a more accurate tree with the minimum abundance finder. Simply
give --mindepth 0. This step helps ignore very rare kmers, which are
more likely to be read errors.
mashtree --mindepth 0 --numcpus 12 *.fastq.gz [*.fasta] > mashtree.dnd
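The idea behind a minimum depth can be illustrated with a toy example: count how often each kmer occurs and keep only those seen at least a threshold number of times. This is a conceptual sketch only, not how Mash or mashtree implements the filter; the kmers and the threshold of 2 are made up for illustration.

```shell
#!/bin/sh
# Toy kmers, one per line; GGCC appears only once and stands in for a read error.
min_depth=2

# Count occurrences of each kmer, then keep those seen at least min_depth times.
kept=$(printf '%s\n' ACGT TTGA ACGT GGCC TTGA ACGT |
  sort | uniq -c |
  awk -v m="$min_depth" '$1 >= m { print $2 }')

echo "$kept"
```

Raising the threshold discards more low-abundance kmers, at the risk of dropping real sequence from low-coverage samples.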
Adding confidence values
Mashtree can add confidence values using jackknifing. For each
jackknife tree, 50% of hashes are used. Confidence values are calculated from
the jackknife trees using BioPerl. With this method, you can pass
flags through to mashtree after a double dash, as in the example below.
Added in version 0.40.
mashtree_jackknife.pl --reps 100 --numcpus 12 *.fastq.gz -- --min-depth 0 > mashtree.jackknife.dnd
mashtree_jackknife.pl --help # additional usage help
Bootstrapping was added in version 0.55. This runs mashtree itself multiple times, each with a random seed.
mashtree_bootstrap.pl --reps 100 --numcpus 12 *.fastq.gz -- --min-depth 0 > mashtree.bootstrap.dnd
Usage
Usage: mashtree [options] *.fastq *.fasta *.gbk *.msh > tree.dnd
NOTE: fastq files are read as raw reads;
      fasta, gbk, and embl files are read as assemblies;
      Input files can be gzipped.
--tempdir            ''   If specified, this directory will not be
                          removed at the end of the script and can
                          be used to cache results for future
                          analyses.
                          If not specified, a dir will be made for you
                          and then deleted at the end of this script.
--numcpus            1    This script uses Perl threads.
--outmatrix          ''   If specified, will write a distance matrix
                          in tab-delimited format
--file-of-files           If specified, mashtree will try to read
                          filenames from each input file. The file of
                          files format is one filename per line. This
                          file of files cannot be compressed.
--outtree                 If specified, the tree will be written to
                          this file and not to stdout. Log messages
                          will still go to stderr.
--version                 Display the version and exit

TREE OPTIONS
--truncLength        250  How many characters to keep in a filename
--sort-order         ABC  For neighbor-joining, the sort order can
                          make a difference. Options include:
                          ABC (alphabetical), random, input-order

MASH SKETCH OPTIONS
--genomesize         5000000
--mindepth           5    If mindepth is zero, then it will be
                          chosen in a smart but slower method,
                          to discard lower-abundance kmers.
--kmerlength         21
--sketch-size        10000
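Several of the options above can be combined. The following is a minimal sketch, not an official recipe: it writes a file of files, keeps a temporary directory for reuse, and sends the tree and distance matrix to named files. All file and directory names (genomes.fofn, mashtree_cache, tree.dnd, distances.tsv, and the sample inputs) are hypothetical placeholders, and creating the cache directory in advance is an assumption made for safety.

```shell
#!/bin/sh
# Hypothetical names; adjust to your own project layout.
fofn=genomes.fofn
cache=mashtree_cache

# One input filename per line; the list itself must not be compressed.
printf '%s\n' sample1.fastq.gz sample2.fastq.gz reference.fasta > "$fofn"

# A directory passed to --tempdir is kept after the run and can cache results.
mkdir -p "$cache"

# Run only when mashtree is installed, so the sketch is safe to paste anywhere.
if command -v mashtree >/dev/null 2>&1; then
  mashtree --file-of-files --tempdir "$cache" --numcpus 4 \
    --outmatrix distances.tsv --outtree tree.dnd "$fofn" ||
    echo "mashtree run failed; check that the listed files exist" >&2
else
  echo "mashtree not found on PATH; wrote $fofn only" >&2
fi
```

Because --outtree is given, the tree goes to tree.dnd rather than stdout, while log messages still go to stderr.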
Installation
Please see INSTALL.md
Further documentation
For perl library help, run perldoc on a .pm file, e.g., perldoc lib/Mashtree/Db.pm.
For help with an executable, run it with --help, e.g., mashtree_bootstrap.pl --help.
For more information and help, please see the docs folder.
For more information on plugins, see the plugins folder. (in development)
For more information on contributions, please see CONTRIBUTING.md.
References
- Mash: http://mash.readthedocs.io
- BioPerl: http://bioperl.org
Citation
JOSS
Katz, L. S., Griswold, T., Morrison, S., Caravas, J., Zhang, S., den Bakker, H. C., Deng, X., & Carleton, H. A. (2019). Mashtree: a rapid comparison of whole genome sequence files. Journal of Open Source Software, 4(44), 1762. https://doi.org/10.21105/joss.01762
Poster
Katz, L. S., Griswold, T., & Carleton, H. A. (2017, October 8-11). Generating WGS Trees with Mashtree. Poster presented at the American Society for Microbiology Conference on Rapid Applied Microbial Next-Generation Sequencing and Bioinformatic Pipelines, Washington, DC. Poster number 27.
Owner
- Name: Lee Katz
- Login: lskatz
- Kind: user
- Location: Atlanta, GA
- Company: CDC (work) + personal projects
- Website: https://lskatz.github.io
- Twitter: lskatz
- Repositories: 138
- Profile: https://github.com/lskatz
JOSS Publication
Mashtree: a rapid comparison of whole genome sequence files
Authors
Enteric Diseases Laboratory Branch, Centers for Disease Control and Prevention, Atlanta, GA, USA, Center for Food Safety, University of Georgia, Griffin, GA, USA
Enteric Diseases Laboratory Branch, Centers for Disease Control and Prevention, Atlanta, GA, USA
Respiratory Diseases Laboratory Branch, Centers for Disease Control and Prevention, Atlanta, GA, USA
Respiratory Diseases Laboratory Branch, Centers for Disease Control and Prevention, Atlanta, GA, USA
Center for Food Safety, University of Georgia, Griffin, GA, USA
Enteric Diseases Laboratory Branch, Centers for Disease Control and Prevention, Atlanta, GA, USA
Tags
dendrogram mash sketch tree rapid
GitHub Events
Total
- Issues event: 6
- Watch event: 18
- Issue comment event: 5
- Fork event: 2
Last Year
- Issues event: 6
- Watch event: 18
- Issue comment event: 5
- Fork event: 2
Committers
Last synced: 5 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Lee Katz - Aspen | g****2@c****v | 401 |
| Mohammad S Anwar | m****r@y****m | 4 |
| Franklin Bristow | f****w@g****m | 1 |
| Charlotte Soneson | c****n@g****m | 1 |
Issues and Pull Requests
Last synced: 4 months ago
All Time
- Total issues: 85
- Total pull requests: 9
- Average time to close issues: 4 months
- Average time to close pull requests: about 12 hours
- Total issue authors: 49
- Total pull request authors: 4
- Average comments per issue: 3.0
- Average comments per pull request: 0.89
- Merged pull requests: 8
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 6
- Pull requests: 0
- Average time to close issues: 4 months
- Average time to close pull requests: N/A
- Issue authors: 6
- Pull request authors: 0
- Average comments per issue: 1.33
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- tseemann (19)
- lskatz (6)
- mihkelvaher (5)
- mdiricks (3)
- samlipworth (2)
- schultzm (2)
- karel-brinda (2)
- noorshu (2)
- Rob-murphys (2)
- vaofford (2)
- JChristopherEllis (1)
- noaheb98 (1)
- andrewsanchez (1)
- hmontenegro (1)
- caizhangbin (1)
Pull Request Authors
- manwar (4)
- lskatz (3)
- fbristow (1)
- csoneson (1)
Packages
- Total packages: 1
- Total downloads: unknown
- Total dependent packages: 0
- Total dependent repositories: 0
- Total versions: 38
- Total maintainers: 1
metacpan.org: Mashtree
functions for Mashtree databasing
- License: gpl_3
- Latest release: v1.4.6 (published about 2 years ago)