meta-ont

Nf-core based workflow for Nanopore metagenomic reads

https://github.com/srusher/meta-ont

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: Found CITATION.cff file
✓ codemeta.json file: Found codemeta.json file
○ .zenodo.json file
✓ DOI references: Found 4 DOI reference(s) in README
✓ Academic publication links: Links to ncbi.nlm.nih.gov
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: Low similarity (8.2%) to scientific vocabulary

Last synced: 10 months ago

Repository


Basic Info

Host: GitHub
Owner: srusher
License: mit
Language: Nextflow
Default Branch: main
Size: 2.29 MB

Statistics

Stars: 1
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created over 1 year ago · Last pushed about 1 year ago

Metadata Files

Readme Changelog Contributing License Citation

Meta-ONT Workflow Diagram

[Figure: Meta-ONT workflow diagram]

Summary

Meta-ONT is a bioinformatics workflow that accepts ONT reads as input and runs them through the following processes:

1. OPTIONAL: Subsampling (BBMap)
2. Long read QC
   - Read Statistics/Summary (NanoPlot)
   - OPTIONAL: Adapter trimming (Porechop)
   - Quality and Length Filtering (Chopper)
3. Read Alignment (minimap2)
4. Alignment-based Classification (samtools) plus custom scripts
   - OPTIONAL: Read filtering by taxonomic ID (custom module)
5. K-mer based Classification (Kraken2)
   - OPTIONAL: Read filtering by taxonomic ID ([`KrakenTools`](https://github.com/jenniferlu717/KrakenTools))
6. OPTIONAL: K-mer based Taxonomic Distribution Visualization (Krona)
7. OPTIONAL: Alignment-based and K-mer Classifier Comparison Barplot (custom module)
8. Assembly with one of 3 tools:
   - [`Flye`](https://github.com/fenderglass/Flye)
   - [`MEGAHIT`](https://github.com/voutcn/megahit)
   - [`SPAdes`](https://github.com/ablab/spades) - NOTE: The SPAdes assembler should only be used with long reads when performing a hybrid assembly
9. Assembly QC (QUAST)
10. OPTIONAL: Assembly Polishing (Medaka)
11. OPTIONAL: Polished Assembly QC (QUAST)
12. Binning (MaxBin2)
13. Contig Alignment and Identification (BLAST)
14. Contig Translation (Nucl -> AA) and Protein Identification (blastx)
15. Workflow Summary (MultiQC)

Setup

This workflow uses assets and dependencies native to the CDC's SciComp environment. If you do not have access to the SciComp environment, you can request an account here.

Usage

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

    sample,fastq_long
    SAMPLE_1,/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/long-reads/sample1-long-read-ont.fastq.gz
    SAMPLE_2,/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/long-reads/sample2-long-read-ont.fastq.gz

The top row is the header row ("sample,fastq_long") and should never be altered. Each row below the header represents a fastq file with a unique identifier in the "sample" column (SAMPLE_1 and SAMPLE_2 in the example above). Each fastq file needs to be gzipped/compressed to prevent validation errors from occurring when the pipeline initializes.
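The samplesheet rules above (fixed header, unique sample identifiers, gzipped inputs) can be checked before launching a run. The following is a minimal sketch, not part of Meta-ONT: `validate_samplesheet` is a hypothetical helper name, and it only inspects the `.gz` file suffix rather than opening the files or confirming that they exist.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not shipped with Meta-ONT).
# Returns 0 if the samplesheet follows the rules described above, 1 otherwise.
validate_samplesheet() {
    local sheet="$1"
    local status=0
    local header

    # Rule 1: the header row must be exactly "sample,fastq_long".
    header=$(head -n 1 "$sheet" | tr -d '\r')
    if [ "$header" != "sample,fastq_long" ]; then
        echo "ERROR: unexpected header: $header" >&2
        status=1
    fi

    # Rule 2: every data row must point at a file with a .gz suffix.
    # Rule 3: every sample identifier must be unique.
    tr -d '\r' < "$sheet" | awk -F',' '
        BEGIN { bad = 0 }
        NR == 1 || $0 == "" { next }    # skip the header and empty lines
        $2 !~ /\.gz$/ { printf "ERROR: line %d: %s is not gzipped\n", NR, $2; bad = 1 }
        seen[$1]++    { printf "ERROR: line %d: duplicate sample %s\n", NR, $1; bad = 1 }
        END { exit (bad ? 1 : 0) }
    ' >&2 || status=1

    return "$status"
}
```

A call such as `validate_samplesheet assets/samplesheet.csv && echo "samplesheet looks OK"` prints the confirmation only when all three checks pass; any problems are reported on standard error.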

There is an example samplesheet located under the assets folder (assets/samplesheet.csv) that you can view and edit yourself. NOTE: If you use this samplesheet, please make a backup copy of it, as it will be overwritten each time you pull an updated version of this repository.
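One way to avoid losing edits when the repository is updated is to keep the samplesheet outside the repository and generate it from a directory of reads. The sketch below is an illustration only: `make_samplesheet` is a hypothetical helper, and deriving the sample identifier from the file name (with `.fastq.gz` stripped) is an assumed convention, not something the workflow requires.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Meta-ONT): write a samplesheet listing
# every *.fastq.gz file found directly inside a directory.
make_samplesheet() {
    local fastq_dir="$1" out_csv="$2"
    local fq sample

    # The header row must match the format the workflow expects.
    echo "sample,fastq_long" > "$out_csv"

    for fq in "$fastq_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue              # no matches: the glob stays literal
        sample=$(basename "$fq" .fastq.gz)    # assumed naming convention
        echo "${sample},${fq}" >> "$out_csv"
    done
}
```

For example, `make_samplesheet /path/to/long-reads "$HOME/my_samplesheet.csv"` writes one row per gzipped fastq file, and the resulting file can then be passed to `--input`.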

Once the samplesheet has been formatted, we can run the workflow using one of the 3 methods listed below.

Method 1: Cluster Submission:

The qsub method submits the job to SciComp's high-memory cluster computing nodes for fast performance and load distribution. This is a good "fire and forget" method for new users who are less familiar with SciComp's compute environment.

Format:

    bash ./run_qsub.sh --input "/path/to/samplesheet" --outdir "/path/to/output/directory" "<additional-parameters>"

Example:

    bash ./run_qsub.sh --input "assets/samplesheet.csv" --outdir "results/test" "--skip_subsample false --num_subsamples 1000 --skip_kraken2 false"

Method 2: Local Execution:

The local method may be a better option if you are experiencing technical issues with the qsub method. qsub adds additional layers of complexity to workflow execution, while the local method simply runs the workflow on your local machine or the host you are connected to, provided it has sufficient memory (RAM) and CPUs to execute the workflow.

Format:

    bash ./run_local.sh --input "/path/to/samplesheet" --outdir "/path/to/output/directory" "<additional-parameters>"

Example:

    bash ./run_local.sh --input "./assets/samplesheet.csv" --outdir "./results/test" "--skip_subsample false --num_subsamples 1000 --skip_kraken2 false"

Method 3: Native Nextflow Execution:

If you are familiar with Nextflow and SciComp's computing environment, you can invoke the nextflow command directly from the terminal. NOTE: If you use this method, you will need to load a Nextflow environment via module load or conda.

Format:

    nextflow run main.nf -profile singularity,local --input "/path/to/samplesheet" --outdir "/path/to/output/directory" <additional flags>

Example:

    nextflow run main.nf -profile singularity,local --input "./assets/samplesheet.csv" --outdir "./results/test" --skip_subsample false --num_subsamples 1000 --skip_kraken2 false
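Because Method 3 depends on a Nextflow environment already being loaded, a small guard can catch a missing executable before the run starts. This is a sketch under assumptions: `require_tool` is a hypothetical helper, and checking for a `singularity` executable is only a guess at what the `singularity` profile needs on your host.

```shell
#!/usr/bin/env bash
# Hypothetical guard (not shipped with Meta-ONT): succeed only if the named
# executable can be found on PATH.
require_tool() {
    local tool="$1"
    if command -v "$tool" >/dev/null 2>&1; then
        return 0
    fi
    echo "ERROR: '$tool' not found on PATH; load it first (e.g. via module load or conda)." >&2
    return 1
}

# Usage sketch: launch the workflow only if both tools are available.
#   require_tool nextflow && require_tool singularity && \
#       nextflow run main.nf -profile singularity,local \
#           --input "./assets/samplesheet.csv" --outdir "./results/test"
```

The guard relies on `command -v`, which is part of POSIX shells, so it does not depend on how the environment was loaded.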

Update: Sample barplot from classification comparison module

The following barplot was generated using my new classification comparison module, which plots the top 10 classification hits from alignment-based classification and Kraken2 classification. The data used to generate this graph came from a 10,000-read subsample of a ZYMO mock community sample:

Classification-comparison-barplot
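To give a rough idea of what "top 10 classification hits" means on the Kraken2 side, the sketch below pulls the most abundant species out of a standard Kraken2 report (tab-separated columns: percentage, clade reads, directly assigned reads, rank code, taxonomy ID, indented name). It is not the comparison module's code: `top_kraken2_species` is a hypothetical helper, and ranking at species level by clade read count is an assumption about how hits are chosen.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the Meta-ONT comparison module): list the N most
# abundant species in a standard Kraken2 report as "clade_reads<TAB>name".
top_kraken2_species() {
    local report="$1" n="${2:-10}"
    local tab
    tab=$(printf '\t')

    # Keep species-level rows (rank code "S"), strip the indentation Kraken2
    # adds to taxon names, then sort by clade read count, largest first.
    awk -F'\t' '$4 == "S" { name = $6; sub(/^ +/, "", name); print $2 "\t" name }' "$report" \
        | sort -t "$tab" -k1,1nr \
        | head -n "$n"
}
```

Calling `top_kraken2_species sample.kreport 10` prints up to ten lines; here `sample.kreport` stands for whatever report file a Kraken2 run produced with `--report`.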

Parameters

See below for all possible input parameters:

Credits

Meta-ONT was originally written by Sam Rusher (rtq0@cdc.gov).

Citations

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

Name: Samuel Rusher
Login: srusher
Kind: user
Location: Frankfort, KY
Company: Bioinformatics Specialist with Leidos

Website: https://www.linkedin.com/in/samuel-rusher/
Repositories: 1
Profile: https://github.com/srusher

Programs with an emphasis on bioinformatics | Experience with C#, Python, Java, SQL, R, HTML, and CSS

Citation (CITATIONS.md)

# scicomp/nfcoreskeleton: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total

Watch event: 1
Push event: 11
Create event: 2

Last Year

Watch event: 1
Push event: 11
Create event: 2

Dependencies

.github/workflows/branch.yml actions

mshick/add-pr-comment v1 composite

.github/workflows/ci.yml actions

actions/checkout v3 composite
nf-core/setup-nextflow v1 composite

.github/workflows/clean-up.yml actions

actions/stale v7 composite

.github/workflows/fix-linting.yml actions

actions/checkout v3 composite
actions/setup-node v3 composite

.github/workflows/linting.yml actions

actions/checkout v3 composite
actions/setup-node v3 composite
actions/setup-python v4 composite
actions/upload-artifact v3 composite
mshick/add-pr-comment v1 composite
nf-core/setup-nextflow v1 composite
psf/black stable composite

.github/workflows/linting_comment.yml actions

dawidd6/action-download-artifact v2 composite
marocchino/sticky-pull-request-comment v2 composite

.github/workflows/release-announcments.yml actions

actions/setup-python v4 composite
rzr/fediverse-action master composite
zentered/bluesky-post-action v0.0.2 composite

modules/nf-core/blast/makeblastdb/meta.yml cpan

modules/nf-core/busco/busco/meta.yml cpan

modules/nf-core/chopper/meta.yml cpan

modules/nf-core/custom/dumpsoftwareversions/meta.yml cpan

modules/nf-core/fastqc/meta.yml cpan

modules/nf-core/flye/meta.yml cpan

modules/nf-core/krakentools/kreport2krona/meta.yml cpan

modules/nf-core/krona/krona_db/meta.yml cpan

modules/nf-core/krona/ktimporttaxonomy/meta.yml cpan

modules/nf-core/krona/ktimporttext/meta.yml cpan

modules/nf-core/maxbin2/meta.yml cpan

modules/nf-core/medaka/meta.yml cpan

modules/nf-core/megahit/meta.yml cpan

modules/nf-core/metabat2/jgisummarizebamcontigdepths/meta.yml cpan

modules/nf-core/metabat2/metabat2/meta.yml cpan

modules/nf-core/multiqc/meta.yml cpan

modules/nf-core/nonpareil/curve/meta.yml cpan

modules/nf-core/nonpareil/nonpareil/meta.yml cpan

modules/nf-core/nonpareil/nonpareilcurvesr/meta.yml cpan

modules/nf-core/porechop/porechop/meta.yml cpan

modules/nf-core/quast/meta.yml cpan

modules/nf-core/samtools/fastq/meta.yml cpan

modules/nf-core/samtools/index/meta.yml cpan

modules/nf-core/samtools/sort/meta.yml cpan

modules/nf-core/spades/meta.yml cpan

pyproject.toml pypi

modules/nf-core/checkm2/predict/meta.yml cpan

modules/nf-core/krakentools/extractkrakenreads/meta.yml cpan

modules/nf-core/samtools/coverage/meta.yml cpan

modules/nf-core/samtools/depth/meta.yml cpan

modules/nf-core/blast/makeblastdb/environment.yml pypi

modules/nf-core/busco/busco/environment.yml pypi

modules/nf-core/checkm2/predict/environment.yml pypi

modules/nf-core/chopper/environment.yml pypi

modules/nf-core/flye/environment.yml pypi

modules/nf-core/krakentools/extractkrakenreads/environment.yml pypi

modules/nf-core/krakentools/kreport2krona/environment.yml pypi

modules/nf-core/krona/krona_db/environment.yml pypi

modules/nf-core/krona/ktimporttaxonomy/environment.yml pypi

modules/nf-core/krona/ktimporttext/environment.yml pypi

modules/nf-core/maxbin2/environment.yml pypi

modules/nf-core/medaka/environment.yml pypi

modules/nf-core/megahit/environment.yml pypi

modules/nf-core/metabat2/jgisummarizebamcontigdepths/environment.yml pypi

modules/nf-core/metabat2/metabat2/environment.yml pypi

modules/nf-core/nonpareil/curve/environment.yml pypi

modules/nf-core/nonpareil/nonpareil/environment.yml pypi

modules/nf-core/nonpareil/nonpareilcurvesr/environment.yml pypi

modules/nf-core/porechop/porechop/environment.yml pypi

modules/nf-core/samtools/coverage/environment.yml pypi

modules/nf-core/samtools/fastq/environment.yml pypi

modules/nf-core/samtools/index/environment.yml pypi

modules/nf-core/samtools/sort/environment.yml pypi

modules/nf-core/spades/environment.yml pypi
