SneakerNet

SneakerNet: A modular quality assurance and quality check workflow for primary genomic and metagenomic read data - Published in JOSS (2021)

https://github.com/lskatz/sneakernet

Science Score: 95.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README and JOSS metadata
  • Academic publication links
    Links to: joss.theoj.org
  • Committers with academic emails
    1 of 13 committers (7.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
    Published in Journal of Open Source Software
Last synced: 6 months ago

Repository

👣 QA/QC pipeline for a MiSeq/HiSeq/Ion Torrent/assembly-only run

Basic Info
  • Host: GitHub
  • Owner: lskatz
  • License: apache-2.0
  • Language: Perl
  • Default Branch: master
  • Homepage:
  • Size: 142 MB
Statistics
  • Stars: 11
  • Watchers: 4
  • Forks: 4
  • Open Issues: 12
  • Releases: 31
Created almost 10 years ago · Last pushed about 1 year ago
Metadata Files
Readme Contributing License Code of conduct

README.md

SneakerNet


Synopsis

A pipeline for processing reads from a sequencing run. It currently supports Illumina and Ion Torrent data and can be extended to other platforms.

# Run SneakerNet on the example data
SneakerNetPlugins.pl --numcpus 4 t/M00123-18-001-test

[Figure: SneakerNet workflow diagram]

Main steps

This is the default workflow in v0.14, but other workflows are available, as described in PLUGINS.md.

Quick start

  1. Install and configure SneakerNet, either from source or with a container.
  2. Make an input folder from your MiSeq run (see docs/SneakerNetInput.md).
  3. Run SneakerNetPlugins.pl on the input folder.

Installation

See docs/INSTALL.md

NOTE: To ensure that all dependencies are met, please follow the dependencies section of the installation document.

Container installation

SneakerNet has been containerized and is available on Docker Hub. For more information, please see our containers documentation.

Here is a summary of Docker commands, from the containers documentation.

# Pull the image
docker pull lskatz/sneakernet:latest
# Import data directly from the MiSeq machine, where $MISEQ is a raw run folder exported by the MiSeq machine
# and $INDIR is the newly created SneakerNet input folder
docker run --rm -v "$PWD":/data -v "$KRAKEN_DEFAULT_DB":/kraken-database -u "$(id -u):$(id -g)" \
  lskatz/sneakernet:latest SneakerNet.roRun.pl "/data/$MISEQ" -o "/data/$INDIR"
# Run SneakerNet on $INDIR (a SneakerNet-formatted folder)
docker run --rm -v "$PWD":/data -v "$KRAKEN_DEFAULT_DB":/kraken-database -u "$(id -u):$(id -g)" \
  lskatz/sneakernet:latest SneakerNetPlugins.pl --numcpus 12 --no email --no transfer --no save "/data/$INDIR"
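Because the docker commands above mount only $PWD at /data and then refer to /data/$MISEQ and /data/$INDIR, both variables have to be paths relative to the current directory. The following is a minimal sketch of a pre-flight check for that; check_mountable is a hypothetical helper, not part of SneakerNet.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of SneakerNet).
# Only $PWD is mounted at /data, so a path passed as /data/$X must be
# relative to $PWD and must not climb out of it with "..".
check_mountable() {
  local path="$1"
  case "$path" in
    "")
      echo "error: path is empty" >&2
      return 1 ;;
    /*)
      echo "error: '$path' is absolute; give it relative to \$PWD" >&2
      return 1 ;;
    ..|../*|*/..|*/../*)
      echo "error: '$path' points outside \$PWD" >&2
      return 1 ;;
  esac
  return 0
}
```

For example, running `check_mountable "$MISEQ" && check_mountable "$INDIR"` before the two `docker run` calls would catch an absolute path that does not exist inside the container.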

Workflow

Creating a SneakerNet project directory

For more information on a SneakerNet-style folder, see docs/SneakerNetInput.md

SneakerNet requires a project directory in a specific format. To create one, you can use SneakerNet.roRun.pl. For example,

SneakerNet.roRun.pl --createsamplesheet -o M1234-18-001-test miseq/working/directory

M1234-18-001-test is the project folder name. It is dash-delimited and contains the machine name, year, ordinal, and, optionally, a name (here M1234, 18, 001, and test). For this script to parse the files correctly, FASTQ filenames must use _R1_ and _R2_ rather than _1 and _2.
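The two naming rules above can be checked before running SneakerNet.roRun.pl. This is a minimal sketch; parse_run_name and has_read_tag are hypothetical helpers, not SneakerNet functions, and SneakerNet's own parser may be stricter (for example, about the width of the year or ordinal fields).

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not shipped with SneakerNet) illustrating the naming rules.

# Split a folder name such as M1234-18-001-test into machine, year, ordinal,
# and optional name. Prints the fields separated by spaces.
parse_run_name() {
  local machine year ordinal name
  IFS='-' read -r machine year ordinal name <<< "$1"
  if [ -z "$machine" ] || [ -z "$year" ] || [ -z "$ordinal" ]; then
    echo "error: '$1' is not machine-year-ordinal[-name]" >&2
    return 1
  fi
  # Append the optional name only when it is present.
  echo "$machine $year $ordinal${name:+ $name}"
}

# Succeed if a FASTQ filename marks its read as _R1_ or _R2_.
has_read_tag() {
  case "$(basename "$1")" in
    *_R1_*|*_R2_*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For instance, `parse_run_name M1234-18-001-test` prints the four fields, and `has_read_tag sample_1.fastq.gz` fails because the file uses _1 instead of _R1_.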

Running SneakerNet

We recommend editing a file named snok.txt in the run directory to configure the run further. For more information, see the configuration section in INSTALL.md. For example,

echo "emails = example@example.com, blah@example.com" > t/data/M00123-18-001/snok.txt
echo "workflow = default" >> t/data/M00123-18-001/snok.txt
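As the echo commands above show, snok.txt holds simple `key = value` lines. The sketch below reads one value back, which can be handy for confirming what a run will use; snok_get is a hypothetical helper, and SneakerNet's own parser may handle comments or edge cases differently.

```shell
#!/usr/bin/env bash
# Hypothetical reader (not part of SneakerNet) for "key = value" lines in snok.txt.
# Usage: snok_get FILE KEY   -> prints the value of the first matching key
snok_get() {
  local file="$1" key="$2"
  awk -v k="$key" '
    index($0, "=") {
      # Split on the first "=" and trim surrounding whitespace.
      name = substr($0, 1, index($0, "=") - 1)
      val  = substr($0, index($0, "=") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (name == k) { print val; exit }
    }
  ' "$file"
}
```

For example, `snok_get t/data/M00123-18-001/snok.txt workflow` should print `default` after the two echo commands above.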

Then run SneakerNet in the background, optionally following the log with tail -f:

SneakerNetPlugins.pl --numcpus 8 t/data/M00123-18-001 > t/data/M00123-18-001/SneakerNet.log 2>&1 &
tail -f t/data/M00123-18-001/SneakerNet.log
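The trailing `&` returns control immediately, so the commands above never report whether the run succeeded. A minimal sketch of a generic wrapper that logs all output, waits for the command, and returns its exit status; run_logged is a hypothetical helper, not part of SneakerNet, and the log can still be followed from another terminal with tail -f.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of SneakerNet).
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
  local log="$1"; shift
  # Send both stdout and stderr of the command to the log, in the background.
  "$@" > "$log" 2>&1 &
  local pid=$!
  echo "started PID $pid; follow with: tail -f $log" >&2
  # wait returns the exit status of the background command.
  wait "$pid"
  local status=$?
  echo "finished with exit status $status" >&2
  return "$status"
}

# Example (same paths as above):
# run_logged t/data/M00123-18-001/SneakerNet.log \
#   SneakerNetPlugins.pl --numcpus 8 t/data/M00123-18-001
```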

Output

For more information, please see docs/SneakerNetOutput.md

SneakerNet creates a SneakerNet/ subfolder in your run directory and emails a report. For a sample report, see t/report.html in this repository.
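After a run, a quick sanity check is that the SneakerNet/ subfolder exists and is not empty. The sketch below does only that and lists the files it finds; check_output is a hypothetical helper, and it makes no assumption about the names of the files SneakerNet writes (see docs/SneakerNetOutput.md for those).

```shell
#!/usr/bin/env bash
# Hypothetical post-run check (not part of SneakerNet).
# Usage: check_output RUN_DIR
check_output() {
  local out="$1/SneakerNet"
  if [ ! -d "$out" ]; then
    echo "error: $out not found; did SneakerNet run?" >&2
    return 1
  fi
  if [ -z "$(ls -A "$out")" ]; then
    echo "error: $out is empty" >&2
    return 1
  fi
  # List every output file, sorted for stable output.
  find "$out" -type f | sort
}
```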

Plugins

SneakerNet is based on plugins. In this context, a plugin is an independent script that runs an analysis on a run directory, accepts standard command-line options (e.g., --help), and creates standard output files.
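To make that contract concrete, here is a minimal sketch of a plugin-like script written as a shell function: it answers --help, takes a run directory, and writes one output file under the SneakerNet/ subfolder. It is hypothetical and is not SneakerNet's official plugin interface (real plugins accept more options; see the plugins readme); the output filename example_plugin.txt is made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical plugin skeleton (not SneakerNet's official interface).
# Usage: example_plugin RUN_DIR | --help
example_plugin() {
  case "${1:-}" in
    -h|--help)
      echo "Usage: example_plugin RUN_DIR"
      echo "  Counts FASTQ files in RUN_DIR and writes the count to"
      echo "  RUN_DIR/SneakerNet/example_plugin.txt (hypothetical output)."
      return 0 ;;
    "")
      echo "error: RUN_DIR is required (try --help)" >&2
      return 1 ;;
  esac
  local run="$1"
  if [ ! -d "$run" ]; then
    echo "error: $run is not a directory" >&2
    return 1
  fi
  mkdir -p "$run/SneakerNet"
  # Count .fastq and .fastq.gz files directly inside RUN_DIR.
  find "$run" -maxdepth 1 -type f \( -name '*.fastq' -o -name '*.fastq.gz' \) \
    | wc -l | tr -d ' ' > "$run/SneakerNet/example_plugin.txt"
}
```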

For more details, see the plugins readme.

Plugins for developers

You too can develop for SneakerNet! For more information, please look at the readme for plugins and the contributing doc.

Further reading

Please see the docs subfolder for more specific documentation.

For inline documentation on some of the Perl code, run perldoc lib/perl5/SneakerNet.pm.

Citation

Griswold, T., Kapsak, C., Chen, J. C., den Bakker, H. C., Williams, G., Kelley, A., Vidyaprakash, E., & Katz, L. S. (2021). SneakerNet: A modular quality assurance and quality check workflow for primary genomic and metagenomic read data. Journal of Open Source Software, 6(60), 2334. https://doi.org/10.21105/joss.02334

Owner

  • Name: Lee Katz
  • Login: lskatz
  • Kind: user
  • Location: Atlanta, GA
  • Company: CDC (work) + personal projects

JOSS Publication

SneakerNet: A modular quality assurance and quality check workflow for primary genomic and metagenomic read data
Published
April 16, 2021
Volume 6, Issue 60, Page 2334
Authors
Taylor Griswold
Enteric Diseases Laboratory Branch (EDLB), Centers for Disease Control and Prevention, Atlanta, GA, USA
Curtis Kapsak ORCID
Enteric Diseases Laboratory Branch (EDLB), Centers for Disease Control and Prevention, Atlanta, GA, USA, Weems Design Studio, Inc., Suwanee, GA, USA
Jessica C. Chen
Enteric Diseases Laboratory Branch (EDLB), Centers for Disease Control and Prevention, Atlanta, GA, USA
Henk C. den Bakker ORCID
Center for Food Safety, University of Georgia, Griffin, GA, USA
Grant Williams
Enteric Diseases Laboratory Branch (EDLB), Centers for Disease Control and Prevention, Atlanta, GA, USA
Alyssa Kelley
Weems Design Studio, Inc., Suwanee, GA, USA, Waterborne Disease Prevention Branch (WDPB), Centers for Disease Control and Prevention, Atlanta, GA, USA
Eshaw Vidyaprakash
Enteric Diseases Laboratory Branch (EDLB), Centers for Disease Control and Prevention, Atlanta, GA, USA, IHRC, Atlanta, GA, USA
Lee S. Katz ORCID
Enteric Diseases Laboratory Branch (EDLB), Centers for Disease Control and Prevention, Atlanta, GA, USA, Center for Food Safety, University of Georgia, Griffin, GA, USA
Editor
Lorena Pantano ORCID
Tags
QA/QC

GitHub Events

Total
  • Release event: 1
  • Delete event: 1
  • Push event: 1
  • Create event: 1
Last Year
  • Release event: 1
  • Delete event: 1
  • Push event: 1
  • Create event: 1

Committers

Last synced: 7 months ago

All Time
  • Total Commits: 945
  • Total Committers: 13
  • Avg Commits per committer: 72.692
  • Development Distribution Score (DDS): 0.383
Past Year
  • Commits: 2
  • Committers: 1
  • Avg Commits per committer: 2.0
  • Development Distribution Score (DDS): 0.0
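The committer statistics above can be reproduced with simple arithmetic, assuming the Development Distribution Score is defined as 1 minus the top committer's share of all commits. The top committer's 583 commits come from the Top Committers list; avg_commits and dds are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Worked arithmetic for the committer statistics.
# Assumption: DDS = 1 - (top committer's commits / total commits).
avg_commits() {  # usage: avg_commits TOTAL_COMMITS N_COMMITTERS
  awk -v t="$1" -v n="$2" 'BEGIN { printf "%.3f\n", t / n }'
}
dds() {          # usage: dds TOTAL_COMMITS TOP_COMMITTER_COMMITS
  awk -v t="$1" -v k="$2" 'BEGIN { printf "%.3f\n", 1 - k / t }'
}

avg_commits 945 13   # All Time: prints 72.692
dds 945 583          # All Time: prints 0.383
dds 2 2              # Past Year, one committer made both commits: prints 0.000
```

Under this definition, a project whose commits all come from one person always scores 0, which matches the Past Year figure.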
Top Committers
Name · Email · Commits
Lee Katz - Aspen · g****2@c****v · 583
edlb-sneakernet · e****t@s****p · 271
sequencermaster · s****r@m****0 · 46
edlb-sneakernet · e****t@m****v · 13
Lee Katz · l****z@m****v · 11
Monolith0 root · r****t@M****v · 8
Curtis Kapsak · k****j@g****m · 3
Lee Katz · l****z@s****4 · 3
Khushbu Patel · k****7@g****u · 2
Root · r****t@s****v · 2
SHRIVATSA HEGDE · 1****e · 1
sequencermaster · s****r@m****v · 1
root · r****t@m****v · 1

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 50
  • Total pull requests: 21
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 5 days
  • Total issue authors: 3
  • Total pull request authors: 4
  • Average comments per issue: 1.38
  • Average comments per pull request: 0.62
  • Merged pull requests: 19
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 27 minutes
  • Issue authors: 1
  • Pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 1
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • lskatz (39)
  • lfaller (5)
  • kapsakcj (5)
Pull Request Authors
  • lskatz (18)
  • kapsakcj (4)
  • kpatel427 (2)
  • ShrivatsaHegde (1)
Top Labels
Issue Labels
enhancement (10)

Dependencies

.github/workflows/build-and-push-container.yml actions
  • actions/checkout v2 composite
  • docker/build-push-action v2 composite
  • docker/login-action v1 composite
  • docker/setup-buildx-action v1 composite
  • docker/setup-qemu-action v1 composite
.github/workflows/github-docker.yml.bak actions
  • actions/checkout v2 composite
  • docker/build-push-action v1 composite
  • docker/build-push-action v2 composite
  • docker/login-action v1 composite
  • docker/setup-buildx-action v1 composite
.github/workflows/travis-docker.yml.bak actions
  • actions/checkout v2 composite
  • satackey/action-docker-layer-caching v0.0.8 composite
.github/workflows/unittests.yml actions
  • actions/checkout v2 composite
  • shogo82148/actions-setup-perl v1 composite
Dockerfile docker
  • flowcraft/krona 2.7-1 build
  • mgibio/samtools 1.9 build
  • staphb/kraken 1.1.1-no-db build
  • staphb/mash 2.2 build
  • staphb/mlst 2.19.0 build
  • staphb/prokka 1.14.5 build
  • staphb/salmid 0.1.23 build
  • staphb/seqtk 1.3 build
  • staphb/shovill 1.1.0 build
  • staphb/skesa 2.4.0 build
  • ubuntu bionic build