SneakerNet
SneakerNet: A modular quality assurance and quality check workflow for primary genomic and metagenomic read data - Published in JOSS (2021)
Science Score: 95.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 7 DOI reference(s) in README and JOSS metadata
- ✓ Academic publication links: Links to joss.theoj.org
- ✓ Committers with academic emails: 1 of 13 committers (7.7%) from academic institutions
- ○ Institutional organization owner
- ✓ JOSS paper metadata: Published in Journal of Open Source Software
Repository
:feet: QA/QC pipeline for a MiSeq/HiSeq/Ion Torrent/assembly-only run
Basic Info
Statistics
- Stars: 11
- Watchers: 4
- Forks: 4
- Open Issues: 12
- Releases: 31
Metadata Files
README.md
SneakerNet
Synopsis
A pipeline for processing reads from a sequencing run. Currently supports Illumina or Ion Torrent, but it can be expanded to other platforms.
# Run SneakerNet on the example data
SneakerNetPlugins.pl --numcpus 4 t/M00123-18-001-test
Main steps
This is the default workflow in v0.14 but there are other workflows available as described in PLUGINS.md.
- Parse sample entries - create an input file samples.tsv
- Read metrics - get raw read yields and raw read QC summary (CG-Pipeline)
- Assembly - assemble each genome (Shovill/skesa)
- MLST - 7-gene MLST (mlst)
- Run Kraken
- Contamination detection - check that all reads come from one taxon for each genome (Kraken)
- Contamination detection - check that all seven MLST genes have only one instance in the genome as expected (ColorID)
- Base balance - check that the ratio of A/T is approximately 1 and the same for C/G
- Antimicrobial resistance gene prediction - detect genotype and predict phenotype (staramr)
- Pass/fail - list all genomes that have failed QC
- Transfer Files - files are copied to a remote folder
- HTML summary report
- Email the report
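The base-balance step in the list above can be illustrated with a minimal sketch. This is not SneakerNet's own plugin: the function name `base_balance` is hypothetical, and the sketch assumes an uncompressed FASTQ file with exactly four lines per record. It reports the A/T and C/G ratios, which are both expected to be close to 1 for double-stranded genomic reads.

```shell
# Hypothetical helper, not part of SneakerNet: print the A/T and C/G ratios
# (tab-separated) for an uncompressed FASTQ file with four lines per record.
base_balance() {
  awk 'NR % 4 == 2 {
         seq = toupper($0)
         # gsub returns the number of replacements, i.e. the base count
         a += gsub(/A/, "", seq)
         t += gsub(/T/, "", seq)
         c += gsub(/C/, "", seq)
         g += gsub(/G/, "", seq)
       }
       END {
         if (t == 0 || g == 0) { print "NA\tNA"; exit }
         printf "%.2f\t%.2f\n", a / t, c / g
       }' "$1"
}
```

For gzipped reads, one option would be to decompress to a temporary file first and pass that path to the function.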
Quick start
- Install and configure SneakerNet - from source or with a container
- Make an input folder from your MiSeq run (see docs/SneakerNetInput.md)
- Run SneakerNetPlugins.pl on the input folder.
Installation
See docs/INSTALL.md
NOTE: to ensure all dependencies are met, please follow the dependencies section under the installation document.
Container installation
SneakerNet has been containerized and is available on Docker Hub. For more information, please see our containers documentation.
Here is a summary of Docker commands, from the containers documentation.
# Pull image
docker pull lskatz/sneakernet:latest
# Import data directly from the MiSeq machine, where $MISEQ is a raw run folder exported by the MiSeq machine
# and $INDIR is the newly created SneakerNet input folder
docker run --rm -v $PWD:/data -v $KRAKEN_DEFAULT_DB:/kraken-database -u $(id -u):$(id -g) lskatz/sneakernet:latest SneakerNet.roRun.pl /data/$MISEQ -o /data/$INDIR
# Run SneakerNet on the $INDIR (SneakerNet formatted folder)
docker run --rm -v $PWD:/data -v $KRAKEN_DEFAULT_DB:/kraken-database -u $(id -u):$(id -g) lskatz/sneakernet:latest SneakerNetPlugins.pl --numcpus 12 --no email --no transfer --no save /data/$INDIR
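The Docker commands above rely on three shell variables: `$MISEQ`, `$INDIR`, and `$KRAKEN_DEFAULT_DB`. If one is unset, it expands to an empty string and the mounts or paths will silently point at the wrong place. A small pre-flight check, sketched here under the assumption that all three are required, can catch that before the container starts. The function name `check_docker_inputs` is hypothetical and not part of SneakerNet.

```shell
# Hypothetical pre-flight check, not shipped with SneakerNet: verify that the
# variables used by the docker commands are set to non-empty values.
check_docker_inputs() {
  local missing=0 name value
  for name in MISEQ INDIR KRAKEN_DEFAULT_DB; do
    # Look up the variable whose name is stored in $name
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "ERROR: \$$name is not set" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A typical use would be `check_docker_inputs && docker run ...`, so that the container is only started when every variable has a value.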
Workflow
Creating a SneakerNet project directory
For more information on a SneakerNet-style folder, see docs/SneakerNetInput.md
SneakerNet requires a project directory that already follows a specific format.
To create the project, you can use SneakerNet.roRun.pl. For example,
SneakerNet.roRun.pl --createsamplesheet -o M1234-18-001-test miseq/working/directory
In this example, M1234-18-001-test is the project folder name. It is dash-delimited and contains the machine name, year, ordinal, and optionally a name.
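To make the naming scheme concrete, here is a minimal sketch that splits a project folder name into its dash-delimited fields. The function `parse_project_name` is a hypothetical illustration, not a SneakerNet command, and it assumes that only the first three fields are mandatory.

```shell
# Hypothetical helper, not part of SneakerNet: split a project folder name
# into machine, year, ordinal, and an optional trailing name.
parse_project_name() {
  local name machine year ordinal label
  name=$(basename "$1")
  # IFS=- applies only to this read; the last variable keeps any remaining dashes.
  IFS=- read -r machine year ordinal label <<EOF
$name
EOF
  if [ -z "$machine" ] || [ -z "$year" ] || [ -z "$ordinal" ]; then
    echo "ERROR: expected machine-year-ordinal[-name], got '$name'" >&2
    return 1
  fi
  printf 'machine=%s year=%s ordinal=%s name=%s\n' \
    "$machine" "$year" "$ordinal" "$label"
}

parse_project_name M1234-18-001-test
# prints: machine=M1234 year=18 ordinal=001 name=test
```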
For this script to parse the files properly, FASTQ file names must contain _R1_ and _R2_ rather than _1 and _2.
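A quick way to spot files that do not follow this convention is sketched below. The function `check_fastq_names` is hypothetical, and it assumes the reads are gzipped and end in `.fastq.gz`. Adjust the glob if your files use a different extension.

```shell
# Hypothetical check, not part of SneakerNet: flag FASTQ files in a folder
# whose names contain neither _R1_ nor _R2_.
check_fastq_names() {
  local status=0 f
  for f in "$1"/*.fastq.gz; do
    # Skip the literal pattern when the folder has no matching files
    [ -e "$f" ] || continue
    case "$(basename "$f")" in
      *_R1_*|*_R2_*) ;;
      *) echo "Unexpected read-pair naming: $f" >&2; status=1 ;;
    esac
  done
  return "$status"
}
```

The function returns a non-zero status if any file is flagged, so it can gate a later step, for example `check_fastq_names miseq/working/directory && SneakerNet.roRun.pl ...`.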
Running SneakerNet
It is generally a good idea to create or edit a snok.txt file in the project folder to configure the run further.
For more information on the workflow setting, see the configuration section in INSTALL.md.
For example,
echo "emails = example@example.com, blah@example.com" > t/data/M00123-18-001/snok.txt
echo "workflow = default" >> t/data/M00123-18-001/snok.txt
And then run SneakerNet like so (optionally following the log with tail -f):
SneakerNetPlugins.pl --numcpus 8 t/data/M00123-18-001 > t/data/M00123-18-001/SneakerNet.log 2>&1 &
tail -f t/data/M00123-18-001/SneakerNet.log
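Because snok.txt holds simple `key = value` lines, it is easy to confirm what a run is configured to do before starting it. The sketch below reads one key back out of the file. The function `snok_get` is a hypothetical helper, and SneakerNet's own parser may treat whitespace or repeated keys differently.

```shell
# Hypothetical reader for "key = value" lines, not SneakerNet's own parser.
# Usage: snok_get KEY path/to/snok.txt
snok_get() {
  awk -v key="$1" '
    index($0, "=") == 0 { next }            # skip lines without key = value
    {
      pos = index($0, "=")
      k = substr($0, 1, pos - 1)
      v = substr($0, pos + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)        # trim whitespace around the key
      gsub(/^[ \t]+|[ \t]+$/, "", v)        # trim whitespace around the value
      if (k == key) { print v; found = 1; exit }
    }
    END { exit !found }' "$2"
}
```

For the file written above, `snok_get workflow t/data/M00123-18-001/snok.txt` would print the configured workflow name, and the function returns a non-zero status when the key is absent.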
Output
For more information, please see docs/SneakerNetOutput.md
SneakerNet produces a subfolder SneakerNet/ in your run directory.
It also emails a report. To view a sample report, see t/report.html in this repository.
Plugins
SneakerNet is based on plugins. In this context, a plugin is an independent script
that can run an analysis on a run directory, accept standard inputs (e.g., --help),
and create standard output files.
For more details, see the plugins readme.
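As a rough illustration of that shape, here is a skeleton written as a shell function. Everything in it is an assumption made for illustration: the name `example_plugin`, the `--numcpus` option, the output location under SneakerNet/, and the placeholder "analysis" that only counts FASTQ files. The actual plugin contract is defined in the plugins readme.

```shell
# Hypothetical plugin skeleton. Option handling and the output path are
# illustrative assumptions, not SneakerNet's actual plugin contract.
example_plugin() {
  local numcpus=1 dir="" outdir n=0 f
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --help)
        echo "Usage: example_plugin [--numcpus N] run_dir"
        return 0 ;;
      --numcpus)
        [ "$#" -ge 2 ] || { echo "ERROR: --numcpus needs a value" >&2; return 1; }
        numcpus=$2; shift 2 ;;
      --*)
        echo "Unknown option: $1" >&2; return 1 ;;
      *)
        dir=$1; shift ;;
    esac
  done
  [ -d "$dir" ] || { echo "ERROR: run directory not found: $dir" >&2; return 1; }

  outdir="$dir/SneakerNet/example_plugin"   # hypothetical output location
  mkdir -p "$outdir"
  # Placeholder analysis: count the gzipped FASTQ files in the run directory.
  for f in "$dir"/*.fastq.gz; do
    if [ -e "$f" ]; then n=$((n + 1)); fi
  done
  printf 'fastq_files\t%s\nnumcpus\t%s\n' "$n" "$numcpus" > "$outdir/summary.tsv"
}
```

Keeping each plugin self-contained like this means it can be run and debugged on a run directory by itself, independently of the rest of the workflow.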
Plugins for developers
You too can develop for SneakerNet! For more information, please look at the readme for plugins and the contributing doc.
Further reading
Please see the docs subfolder for more specific documentation.
For inline documentation on some of the perl code, run perldoc lib/perl5/SneakerNet.pm.
Citation
Griswold, T., Kapsak, C., Chen, J. C., den Bakker, H. C., Williams, G., Kelley, A., Vidyaprakash, E., & Katz, L. S. (2021). SneakerNet: A modular quality assurance and quality check workflow for primary genomic and metagenomic read data. Journal of Open Source Software, 6(60), 2334. https://doi.org/10.21105/joss.02334
Owner
- Name: Lee Katz
- Login: lskatz
- Kind: user
- Location: Atlanta, GA
- Company: CDC (work) + personal projects
- Website: https://lskatz.github.io
- Twitter: lskatz
- Repositories: 138
- Profile: https://github.com/lskatz
JOSS Publication
SneakerNet: A modular quality assurance and quality check workflow for primary genomic and metagenomic read data
Authors
Enteric Diseases Laboratory Branch (EDLB), Centers for Disease Control and Prevention, Atlanta, GA, USA
Enteric Diseases Laboratory Branch (EDLB), Centers for Disease Control and Prevention, Atlanta, GA, USA, Weems Design Studio, Inc., Suwanee, GA, USA
Enteric Diseases Laboratory Branch (EDLB), Centers for Disease Control and Prevention, Atlanta, GA, USA
Enteric Diseases Laboratory Branch (EDLB), Centers for Disease Control and Prevention, Atlanta, GA, USA
Weems Design Studio, Inc., Suwanee, GA, USA, Waterborne Disease Prevention Branch (WDPB), Centers for Disease Control and Prevention, Atlanta, GA, USA
Enteric Diseases Laboratory Branch (EDLB), Centers for Disease Control and Prevention, Atlanta, GA, USA, IHRC, Atlanta, GA, USA
Tags
QA/QC
GitHub Events
Total
- Release event: 1
- Delete event: 1
- Push event: 1
- Create event: 1
Last Year
- Release event: 1
- Delete event: 1
- Push event: 1
- Create event: 1
Committers
Last synced: 7 months ago
Top Committers
| Name | Email | Commits |
|---|---|---|
| Lee Katz - Aspen | g****2@c****v | 583 |
| edlb-sneakernet | e****t@s****p | 271 |
| sequencermaster | s****r@m****0 | 46 |
| edlb-sneakernet | e****t@m****v | 13 |
| Lee Katz | l****z@m****v | 11 |
| Monolith0 root | r****t@M****v | 8 |
| Curtis Kapsak | k****j@g****m | 3 |
| Lee Katz | l****z@s****4 | 3 |
| Khushbu Patel | k****7@g****u | 2 |
| Root | r****t@s****v | 2 |
| SHRIVATSA HEGDE | 1****e | 1 |
| sequencermaster | s****r@m****v | 1 |
| root | r****t@m****v | 1 |
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 50
- Total pull requests: 21
- Average time to close issues: about 1 month
- Average time to close pull requests: 5 days
- Total issue authors: 3
- Total pull request authors: 4
- Average comments per issue: 1.38
- Average comments per pull request: 0.62
- Merged pull requests: 19
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 1
- Average time to close issues: N/A
- Average time to close pull requests: 27 minutes
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.0
- Merged pull requests: 1
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- lskatz (39)
- lfaller (5)
- kapsakcj (5)
Pull Request Authors
- lskatz (18)
- kapsakcj (4)
- kpatel427 (2)
- ShrivatsaHegde (1)
Dependencies
- actions/checkout v2 composite
- docker/build-push-action v2 composite
- docker/login-action v1 composite
- docker/setup-buildx-action v1 composite
- docker/setup-qemu-action v1 composite
- actions/checkout v2 composite
- docker/build-push-action v1 composite
- docker/build-push-action v2 composite
- docker/login-action v1 composite
- docker/setup-buildx-action v1 composite
- actions/checkout v2 composite
- satackey/action-docker-layer-caching v0.0.8 composite
- actions/checkout v2 composite
- shogo82148/actions-setup-perl v1 composite
- flowcraft/krona 2.7-1 build
- mgibio/samtools 1.9 build
- staphb/kraken 1.1.1-no-db build
- staphb/mash 2.2 build
- staphb/mlst 2.19.0 build
- staphb/prokka 1.14.5 build
- staphb/salmid 0.1.23 build
- staphb/seqtk 1.3 build
- staphb/shovill 1.1.0 build
- staphb/skesa 2.4.0 build
- ubuntu bionic build