meta-short

Nf-core based workflow for short read metagenomic data

https://github.com/srusher/meta-short

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README
✓ Academic publication links: Links to ncbi.nlm.nih.gov
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (13.8%) to scientific vocabulary

Last synced: 10 months ago

Repository

Basic Info

Host: GitHub
Owner: srusher
License: mit
Language: Nextflow
Default Branch: main
Size: 220 KB

Statistics

Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed about 1 year ago

Introduction

metashort is a bioinformatics workflow that accepts short reads as input and runs them through the following processes/analyses:

OPTIONAL: Subsampling (BBMap)
Read QC (FastQC)
Adapter Trimming and Quality Filtering (trimmomatic or fastp)
Taxonomic Classification (kraken2)
Taxonomy distribution visualization (Krona)
OPTIONAL: Taxonomic Filtering (KrakenTools)
De Novo Assembly (SPAdes)
Assembly QC (quast)
Binning (Maxbin2)
Contig alignment and identification (blast)
Generate summary report (MultiQC)

Setup

This workflow uses assets and dependencies native to the CDC's SciComp environment. If you do not have access to the SciComp environment, you can request an account here.

Usage

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,fastq_1,fastq_2
SAMPLE_1,/data/reads/sample1-R1.fastq.gz,/data/reads/sample1-R2.fastq.gz
SAMPLE_2,/data/reads/sample2-R1.fastq.gz,/data/reads/sample2-R2.fastq.gz

The top row is the header row ("sample,fastq_1,fastq_2") and should never be altered. Each row below the header represents one pair of paired-end FASTQ files, with a unique identifier in the "sample" column (SAMPLE_1 and SAMPLE_2 in the example above). Each FASTQ file needs to be gzip-compressed to prevent validation errors from occurring when the pipeline initializes.
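The samplesheet rules above can be checked before a run is launched. The following is a minimal sketch, not part of the repository: `check_samplesheet` is a hypothetical helper that verifies the header row, the three-column layout, the `.gz` extension on both FASTQ paths, and the uniqueness of sample identifiers. It does not check that the files exist, and the pipeline's own validation may be stricter.

```bash
# Hypothetical pre-flight check; not part of meta-short itself.
# Usage: check_samplesheet path/to/samplesheet.csv
# Prints one message per problem and returns non-zero if any were found.
check_samplesheet() {
    awk -F',' '
        NR == 1 {
            # The header row must match exactly.
            if ($0 != "sample,fastq_1,fastq_2") {
                print "line 1: unexpected header: " $0
                bad = 1
            }
            next
        }
        /^[ \t]*$/ { next }    # ignore blank lines
        NF != 3 {
            print "line " NR ": expected 3 fields, found " NF
            bad = 1
            next
        }
        $2 !~ /\.gz$/ || $3 !~ /\.gz$/ {
            print "line " NR ": both FASTQ files must be gzipped (.gz)"
            bad = 1
        }
        seen[$1]++ {
            # Post-increment is zero (false) the first time an ID is seen.
            print "line " NR ": duplicate sample ID: " $1
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$1"
}
```

For example, `check_samplesheet assets/samplesheet.csv && echo "samplesheet looks OK"` prints the confirmation only when no problems are reported.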

There is an example samplesheet located under the assets folder (assets/samplesheet.csv) that you can view and edit yourself. NOTE: If you use this samplesheet, please make a backup copy of it, as it will be overwritten each time you pull an updated version of this repository.
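One way to keep such a backup is sketched below. `backup_samplesheet` is a hypothetical helper, not something shipped with the repository; it copies the samplesheet to a timestamped file next to the original and prints the path of the copy.

```bash
# Hypothetical helper; not part of meta-short itself.
# Usage: backup_samplesheet [path/to/samplesheet.csv]
# Defaults to the example samplesheet under assets/.
backup_samplesheet() {
    src="${1:-assets/samplesheet.csv}"
    # e.g. assets/samplesheet.20240101-120000.bak.csv
    dest="${src%.csv}.$(date +%Y%m%d-%H%M%S).bak.csv"
    cp -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

Passing the backup (or any copy kept outside the repository) to `--input` avoids depending on the tracked file at all.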

Once the samplesheet has been formatted, we can run the workflow using one of the 3 methods listed below.

Method 1: Cluster Submission:

The qsub method allows you to submit the job to SciComp's high memory cluster computing nodes for fast performance and load distribution. This is a good "fire and forget" method for new users who are less familiar with SciComp's compute environment.

Format:

bash ./run_qsub.sh --input "/path/to/samplesheet" --outdir "/path/to/output/directory" "<additional-parameters>"

Example:

bash ./run_qsub.sh --input "assets/samplesheet.csv" --outdir "results/test" "--skip_subsample false --num_subsamples 1000 --skip_kraken2 false"

Method 2: Local Execution:

The local method may be a better option if you are experiencing technical issues with the qsub method. qsub adds additional layers of complexity to workflow execution, while local simply runs the workflow on your local machine or the host that you're connected to, provided it has sufficient memory/RAM and CPUs to execute the workflow.

Format:

bash ./run_local.sh --input "/path/to/samplesheet" --outdir "/path/to/output/directory" "<additional-parameters>"

Example:

bash ./run_local.sh --input "./assets/samplesheet.csv" --outdir "./results/test" "--skip_subsample false --num_subsamples 1000 --skip_kraken2 false"

Method 3: Native Nextflow Execution:

If you are familiar with Nextflow and SciComp's computing environment, you can invoke the nextflow command straight from the terminal. NOTE: If you are using this method, you will need to load a Nextflow environment via module load or conda.
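Before invoking Nextflow directly, it can help to confirm that the command is actually available in the current shell. The sketch below uses a hypothetical helper, `require_nextflow`, which is not part of the repository. The exact module or conda environment name on SciComp is not given in this README, so none is assumed here.

```bash
# Hypothetical pre-flight check; not part of meta-short itself.
# Prints the Nextflow version if the command is on PATH, otherwise
# writes a hint to stderr and returns 1.
require_nextflow() {
    if command -v nextflow >/dev/null 2>&1; then
        nextflow -version
    else
        echo "nextflow not found on PATH; load it first (via 'module load' or a conda environment)" >&2
        return 1
    fi
}
```

Chaining the check in front of the run, as in `require_nextflow && nextflow run main.nf ...`, stops early with a readable message rather than a "command not found" error.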

Format:

nextflow run main.nf -profile singularity,local --input "/path/to/samplesheet" --outdir "/path/to/output/directory" <additional flags>

Example:

nextflow run main.nf -profile singularity,local --input "./assets/samplesheet.csv" --outdir "./results/test" --skip_subsample false --num_subsamples 1000 --skip_kraken2 false
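When the list of flags grows, Nextflow's standard `-params-file` option lets the same settings live in a YAML file instead of on the command line. The sketch below is an assumption-laden illustration, not something this README documents: `write_params_file` is a hypothetical helper, and it presumes the pipeline treats parameters read from a file the same way as the flags in the example above.

```bash
# Hypothetical helper; not part of meta-short itself.
# Usage: write_params_file [output.yml]
# Writes the parameters from the example command to a YAML file.
write_params_file() {
    out="${1:-params.yml}"
    cat > "$out" <<'EOF'
input: "./assets/samplesheet.csv"
outdir: "./results/test"
skip_subsample: false
num_subsamples: 1000
skip_kraken2: false
EOF
}
```

After `write_params_file params.yml`, the run would become `nextflow run main.nf -profile singularity,local -params-file params.yml`. Keeping the file alongside the results also records which settings produced them.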

Parameters

See below for all possible input parameters:

Credits

Meta-short was originally written by Sam Rusher (rtq0@cdc.gov).

Citations

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

Name: Samuel Rusher
Login: srusher
Kind: user
Location: Frankfort, KY
Company: Bioinformatics Specialist with Leidos

Website: https://www.linkedin.com/in/samuel-rusher/
Repositories: 1
Profile: https://github.com/srusher

Programs with an emphasis on bioinformatics | Experience with C#, Python, Java, SQL, R, HTML, and CSS

Citation (CITATIONS.md)

# emel/metashort: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total

Push event: 3

Last Year

Push event: 3

Dependencies

modules/nf-core/blast/makeblastdb/meta.yml cpan

modules/nf-core/busco/busco/meta.yml cpan

modules/nf-core/checkm/lineagewf/meta.yml cpan

modules/nf-core/chopper/meta.yml cpan

modules/nf-core/custom/dumpsoftwareversions/meta.yml cpan

modules/nf-core/fastp/meta.yml cpan

modules/nf-core/fastqc/meta.yml cpan

modules/nf-core/flye/meta.yml cpan

modules/nf-core/kraken2/kraken2/meta.yml cpan

modules/nf-core/krakentools/extractkrakenreads/meta.yml cpan

modules/nf-core/krakentools/kreport2krona/meta.yml cpan

modules/nf-core/krona/krona_db/meta.yml cpan

modules/nf-core/krona/ktimporttaxonomy/meta.yml cpan

modules/nf-core/krona/ktimporttext/meta.yml cpan

modules/nf-core/maxbin2/meta.yml cpan

modules/nf-core/medaka/meta.yml cpan

modules/nf-core/megahit/meta.yml cpan

modules/nf-core/metabat2/jgisummarizebamcontigdepths/meta.yml cpan

modules/nf-core/metabat2/metabat2/meta.yml cpan

modules/nf-core/metaphlan/meta.yml cpan

modules/nf-core/minimap2/meta.yml cpan

modules/nf-core/multiqc/meta.yml cpan

modules/nf-core/nanoplot/meta.yml cpan

modules/nf-core/porechop/porechop/meta.yml cpan

modules/nf-core/quast/meta.yml cpan

modules/nf-core/samtools/fastq/meta.yml cpan

modules/nf-core/samtools/index/meta.yml cpan

modules/nf-core/samtools/sort/meta.yml cpan

modules/nf-core/spades/meta.yml cpan

modules/nf-core/trimmomatic/meta.yml cpan

modules/nf-core/blast/makeblastdb/environment.yml pypi

modules/nf-core/busco/busco/environment.yml pypi

modules/nf-core/checkm/lineagewf/environment.yml pypi

modules/nf-core/chopper/environment.yml pypi

modules/nf-core/fastp/environment.yml pypi

modules/nf-core/flye/environment.yml pypi

modules/nf-core/kraken2/kraken2/environment.yml pypi

modules/nf-core/krakentools/extractkrakenreads/environment.yml pypi

modules/nf-core/krakentools/kreport2krona/environment.yml pypi

modules/nf-core/krona/krona_db/environment.yml pypi

modules/nf-core/krona/ktimporttaxonomy/environment.yml pypi

modules/nf-core/krona/ktimporttext/environment.yml pypi

modules/nf-core/maxbin2/environment.yml pypi

modules/nf-core/medaka/environment.yml pypi

modules/nf-core/megahit/environment.yml pypi

modules/nf-core/metabat2/jgisummarizebamcontigdepths/environment.yml pypi

modules/nf-core/metabat2/metabat2/environment.yml pypi

modules/nf-core/metaphlan/environment.yml pypi

modules/nf-core/minimap2/environment.yml pypi

modules/nf-core/nanoplot/environment.yml pypi

modules/nf-core/porechop/porechop/environment.yml pypi

modules/nf-core/samtools/fastq/environment.yml pypi

modules/nf-core/samtools/index/environment.yml pypi

modules/nf-core/samtools/sort/environment.yml pypi

modules/nf-core/spades/environment.yml pypi

pyproject.toml pypi
