meta-ont

Nf-core based workflow for Nanopore metagenomic reads

https://github.com/srusher/meta-ont

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.2%) to scientific vocabulary
Last synced: 7 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: srusher
  • License: mit
  • Language: Nextflow
  • Default Branch: main
  • Size: 2.29 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

Meta-ONT Workflow Diagram

Summary

Meta-ONT is a bioinformatics workflow that accepts ONT reads as input and runs them through the following processes:

  1. OPTIONAL: Subsampling (BBMap)

  2. Long read QC

  3. Read Alignment (minimap2)

  4. Alignment-based Classification (samtools plus custom scripts)
     - OPTIONAL: Read filtering by taxonomic ID (custom module)

  5. K-mer based Classification (Kraken2)
     - OPTIONAL: Read filtering by taxonomic ID ([`KrakenTools`](https://github.com/jenniferlu717/KrakenTools))

  6. OPTIONAL: K-mer based Taxonomic Distribution Visualization (Krona)

  7. OPTIONAL: Alignment-based and K-mer Classifier Comparison Barplot (custom module)

  8. Assembly with one of 3 tools:
     - [`Flye`](https://github.com/fenderglass/Flye)
     - [`Megahit`](https://github.com/voutcn/megahit)
     - [`Spades`](https://github.com/ablab/spades) - NOTE: Spades should only be used with long reads when performing a hybrid assembly

  9. Assembly QC (quast)

  10. OPTIONAL: Assembly Polishing (Medaka)

  11. OPTIONAL: Polished Assembly QC (quast)

  12. Binning (Maxbin2)

  13. Contig Alignment and Identification (blast)

  14. Contig Translation (Nucl -> AA) and Protein Identification (blastx)

  15. Workflow Summary (MultiQC)

Setup

This workflow uses assets and dependencies native to the CDC's SciComp environment. If you do not have access to the SciComp environment, you can request an account here.

Usage

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

```csv
sample,fastq_long
SAMPLE_1,/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/long-reads/sample1-long-read-ont.fastq.gz
SAMPLE_2,/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/long-reads/sample2-long-read-ont.fastq.gz
```

The top row is the header row ("sample,fastq_long") and should never be altered. Each row below the header represents one fastq file, identified by a unique ID in the "sample" column (SAMPLE_1 and SAMPLE_2 in the example above). Each fastq file must be gzipped (.fastq.gz) to prevent validation errors when the pipeline initializes.
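The three rules above (an exact header, a unique sample ID per row, gzipped fastq paths) can be checked before launching the pipeline. The following is a minimal Python sketch that assumes only those rules. `validate_samplesheet` is a hypothetical helper, not the pipeline's own validation code, and it does not check that the files actually exist.

```python
import csv
import io

REQUIRED_HEADER = ["sample", "fastq_long"]


def validate_samplesheet(text):
    """Return a list of problems found in a samplesheet string (empty list = OK)."""
    problems = []
    rows = list(csv.reader(io.StringIO(text)))
    # The header row must match exactly
    if not rows or [c.strip() for c in rows[0]] != REQUIRED_HEADER:
        return ["header must be exactly 'sample,fastq_long'"]
    seen = set()
    for lineno, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            problems.append(f"line {lineno}: expected 2 columns, got {len(row)}")
            continue
        sample, fastq = (c.strip() for c in row)
        if sample in seen:
            problems.append(f"line {lineno}: duplicate sample ID '{sample}'")
        seen.add(sample)
        # Uncompressed fastq files cause validation errors at pipeline start
        if not fastq.endswith(".gz"):
            problems.append(f"line {lineno}: '{fastq}' is not gzipped (.gz)")
    return problems


sheet = "sample,fastq_long\nSAMPLE_1,/data/sample1.fastq.gz\nSAMPLE_2,/data/sample2.fastq.gz\n"
print(validate_samplesheet(sheet))  # → []
```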

There is an example samplesheet in the assets folder (assets/samplesheet.csv) that you can view and edit yourself. NOTE: If you use this samplesheet, please make a backup copy of it, as it will be overwritten each time you pull an updated version of this repository.

Once the samplesheet has been formatted, you can run the workflow using one of the 3 methods listed below.

Method 1: Cluster Submission:

The qsub method submits the job to SciComp's high-memory cluster computing nodes for fast performance and load distribution. This is a good "fire and forget" method for new users who aren't as familiar with SciComp's compute environment.

Format:

```bash
bash ./run_qsub.sh --input "/path/to/samplesheet" --outdir "/path/to/output/directory" "<additional-parameters>"
```

Example:

```bash
bash ./run_qsub.sh --input "assets/samplesheet.csv" --outdir "results/test" "--skip_subsample false --num_subsamples 1000 --skip_kraken2 false"
```

Note that all additional parameters are wrapped together in a single pair of quotes.

Method 2: Local Execution:

The local method may be a better option if you are experiencing technical issues with the qsub method. qsub adds additional layers of complexity to workflow execution, while the local method simply runs the workflow on your local machine (or the host you're connected to), provided it has sufficient memory/RAM and CPUs to execute the workflow.

Format:

```bash
bash ./run_local.sh --input "/path/to/samplesheet" --outdir "/path/to/output/directory" "<additional-parameters>"
```

Example:

```bash
bash ./run_local.sh --input "./assets/samplesheet.csv" --outdir "./results/test" "--skip_subsample false --num_subsamples 1000 --skip_kraken2 false"
```

Method 3: Native Nextflow Execution:

If you are familiar with Nextflow and SciComp's computing environment, you can invoke the nextflow command directly from the terminal. NOTE: with this method you must first load a Nextflow environment, either via module load or conda.

Format:

```bash
nextflow run main.nf -profile singularity,local --input "/path/to/samplesheet" --outdir "/path/to/output/directory" <additional-flags>
```

Example:

```bash
nextflow run main.nf -profile singularity,local --input "./assets/samplesheet.csv" --outdir "./results/test" --skip_subsample false --num_subsamples 1000 --skip_kraken2 false
```

Unlike the wrapper scripts, the additional flags here are passed directly, without surrounding quotes.
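For scripted or batch runs, the Method 3 command can be assembled programmatically instead of by hand. This is a minimal Python sketch. `build_nextflow_command` is a hypothetical helper, and it assumes boolean parameters are written as `true`/`false`, exactly as in the example above.

```python
import shlex


def build_nextflow_command(input_csv, outdir, extra=None, profile="singularity,local"):
    """Assemble the Method 3 `nextflow run main.nf` call as an argument list."""
    cmd = ["nextflow", "run", "main.nf", "-profile", profile,
           "--input", input_csv, "--outdir", outdir]
    for name, value in (extra or {}).items():
        # Write Python booleans as true/false, matching the example above
        if isinstance(value, bool):
            value = "true" if value else "false"
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_nextflow_command(
    "./assets/samplesheet.csv", "./results/test",
    {"skip_subsample": False, "num_subsamples": 1000, "skip_kraken2": False},
)
print(shlex.join(cmd))  # same command as the Example above
# To launch it: import subprocess; subprocess.run(cmd, check=True)
```

Keeping the command as a list and handing it to `subprocess.run` sidesteps shell-quoting problems with paths that contain spaces.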

Update: Sample barplot from classification comparison module

The following barplot was generated using my new classification comparison module, which plots the top 10 classification hits from both alignment-based classification and Kraken2 classification. The data used to generate this graph came from a 10,000-read subsample of a ZYMO mock community sample:

Classification comparison barplot
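The comparison module itself is not reproduced in this README. Its core idea is to take each classifier's top 10 taxa and line up their read counts side by side, and that idea can be sketched in Python. The input format (one mapping of taxon name to read count per classifier), the function names, and the toy counts are all assumptions. Plotting is left out.

```python
from collections import Counter


def top_hits(counts, n=10):
    """Return the n taxa with the most reads as (taxon, reads) pairs."""
    return Counter(counts).most_common(n)


def comparison_table(alignment_counts, kraken_counts, n=10):
    """Rows of (taxon, alignment reads, Kraken2 reads) covering both classifiers' top-n taxa."""
    taxa = []
    for taxon, _ in top_hits(alignment_counts, n) + top_hits(kraken_counts, n):
        if taxon not in taxa:  # keep first-seen order, drop duplicates
            taxa.append(taxon)
    # A taxon missing from one classifier's output gets a count of 0
    return [(t, alignment_counts.get(t, 0), kraken_counts.get(t, 0)) for t in taxa]


# Toy counts for illustration only
alignment = {"taxon_A": 50, "taxon_B": 30, "taxon_C": 5}
kraken = {"taxon_A": 45, "taxon_D": 28, "taxon_B": 25}
for row in comparison_table(alignment, kraken, n=2):
    print(row)
```

Taking the union of both top-n lists means that a taxon one classifier ranks highly still appears in the plot, with a zero bar, when the other classifier misses it.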

Parameters

See below for all possible input parameters:

Global Variables:

| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --metagenomic_sample | boolean | true |

Workflow processes:

| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --skip_subsample | boolean | true |
| --skip_porechop | boolean | true |
| --skip_metamaps | boolean | true |
| --skip_nonpareil | boolean | true |
| --skip_alignment_based_filtering | boolean | true |
| --skip_fastq_screen | boolean | true |
| --skip_kraken2 | boolean | true |
| --skip_kraken2_parse_reads_by_taxon | boolean | true |
| --skip_kraken2_protein | boolean | true |
| --skip_filter_by_kraken2_protein | boolean | true |
| --skip_assembly | boolean | true |
| --skip_medaka | boolean | true |
| --skip_binning | boolean | true |
| --skip_blast | boolean | true |
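Every `skip_*` flag defaults to true, so each step you want must be switched on explicitly. For Method 3, those overrides can live in a JSON file that is passed to Nextflow's `-params-file` option, which avoids typing each `--skip_<step> false` flag. The helper below is a hypothetical sketch. Its step list is copied from the table above, and it rejects names that are not in that table.

```python
import json

# Step names from the "Workflow processes" table; every skip_* flag defaults to true.
STEPS = [
    "subsample", "porechop", "metamaps", "nonpareil", "alignment_based_filtering",
    "fastq_screen", "kraken2", "kraken2_parse_reads_by_taxon", "kraken2_protein",
    "filter_by_kraken2_protein", "assembly", "medaka", "binning", "blast",
]


def params_enabling(*steps, **overrides):
    """Build a params dict that turns on the given steps (skip_<step> = false)."""
    unknown = [s for s in steps if s not in STEPS]
    if unknown:
        raise ValueError(f"unknown step(s): {', '.join(unknown)}")
    params = {f"skip_{s}": False for s in steps}
    params.update(overrides)  # other parameters, e.g. num_subsamples=1000
    return params


print(json.dumps(params_enabling("subsample", "kraken2", num_subsamples=1000), indent=2))
```

Save the printed JSON as, for example, `params.json`, then run `nextflow run main.nf -profile singularity,local --input "/path/to/samplesheet" --outdir "/path/to/output/directory" -params-file params.json`.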

BBMap subsampling parameters:

| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --num_subsamples | integer | 1000 |
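`--num_subsamples` sets how many reads are kept. BBMap's internal sampling method is not described here. As a stand-in, the sketch below uses reservoir sampling (Algorithm R), a standard way to draw a fixed number of reads uniformly at random in a single pass over a file of unknown length. The function name, the seed argument, and the use of read IDs as records are assumptions.

```python
import random


def reservoir_sample(records, k, seed=None):
    """Draw k records uniformly at random in one pass (Algorithm R)."""
    rng = random.Random(seed)
    reservoir = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)      # fill the reservoir first
        else:
            j = rng.randint(0, i)         # uniform over 0..i inclusive
            if j < k:
                reservoir[j] = record     # replaces an entry with probability k/(i+1)
    return reservoir


read_ids = (f"read_{n}" for n in range(10_000))
subsample = reservoir_sample(read_ids, k=1000, seed=1)
print(len(subsample))  # → 1000
```

If the input contains fewer than `k` reads, every read is returned.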

Chopper quality trimming parameters:

| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --chopper_q | integer | 20 |
| --chopper_min_len | integer | 1000 |
| --chopper_max_len | integer | 2147483647 (the maximum 32-bit signed integer, i.e. effectively no upper limit) |
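The three chopper parameters combine into a single keep/discard decision per read. The sketch below assumes the mean read quality is computed by averaging per-base error probabilities and converting the result back to Phred scale. This is a common convention for ONT reads, and it yields a lower value than the arithmetic mean of Phred scores. Check chopper's own documentation for its exact definition. Function names are hypothetical.

```python
import math


def mean_read_quality(qual, offset=33):
    """Mean Phred quality computed from the average per-base error probability."""
    if not qual:
        return 0.0
    error_probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def keep_read(seq, qual, min_q=20, min_len=1000, max_len=2147483647):
    """Apply length and mean-quality cut-offs (defaults mirror the chopper_* table)."""
    return min_len <= len(seq) <= max_len and mean_read_quality(qual) >= min_q


# Two bases at Q10 and Q30: the arithmetic mean is Q20, but the error-based mean is lower
print(round(mean_read_quality("+?"), 2))  # → 12.97
```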

Minimap2 parameters:

| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --minimap2_index | string | '/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/Projects/LongReadAnalysis/data/minimap2/index/acastellanineff.mmi' |
| --minimap2_mismatch_penalty | integer | 4 |
| --seq2tax_map | string | "/scicomp/home-pure/rtq0/EMEL-GWA/Projects/LongReadAnalysis/data/kraken-db/bactarchvirfungiamoeba-DB41-mer/seqid2taxid.map" |
| --taxids | string | "./assets/tax_ids_acanth-verm-naegleria.txt" |
| --mapping_quality | integer | 10 |
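Alignment-based classification links each minimap2 hit to a taxon through `--seq2tax_map` and keeps only alignments at or above `--mapping_quality`. The custom scripts are not shown in this README, so what follows is a hedged Python sketch rather than the pipeline's code. It assumes a two-column, tab-separated `seqid<TAB>taxid` map and SAM text input (for example, from `samtools view`), and it counts only primary, mapped records. The function names and toy IDs are hypothetical.

```python
from collections import Counter

UNMAPPED, SECONDARY, SUPPLEMENTARY = 0x4, 0x100, 0x800  # SAM FLAG bits


def load_seq2tax(lines):
    """Parse 'seqid<TAB>taxid' lines into a {seqid: taxid} dict."""
    mapping = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def count_reads_per_taxid(sam_lines, seq2tax, min_mapq=10):
    """Count primary, mapped alignments with MAPQ >= min_mapq per taxid."""
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):
            continue  # SAM header line
        fields = line.rstrip("\n").split("\t")
        flag, ref, mapq = int(fields[1]), fields[2], int(fields[4])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY) or mapq < min_mapq:
            continue
        counts[seq2tax.get(ref, "unknown")] += 1
    return counts


seq2tax = load_seq2tax(["contig_1\t1001\n", "contig_2\t2002\n"])
sam = [
    "@HD\tVN:1.6",
    "r1\t0\tcontig_1\t100\t60\t4M\t*\t0\t0\tACGT\t*",   # kept
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*",              # unmapped, skipped
    "r3\t16\tcontig_2\t50\t3\t4M\t*\t0\t0\tACGT\t*",    # MAPQ 3 < 10, skipped
    "r4\t16\tcontig_2\t80\t42\t4M\t*\t0\t0\tACGT\t*",   # kept (reverse strand)
]
print(count_reads_per_taxid(sam, seq2tax))
```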

MetaMaps parameters:

| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --metamaps_db | string | "/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/Projects/LongReadAnalysis/data/metamaps/databases/refseqcomplete" |
| --metamaps_threads | integer | 16 |
| --metamaps_mem | integer | 120 |

Kraken2 parameters:

| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --kraken_db_main | string | "/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/Projects/LongReadAnalysis/data/kraken-db/bactarchvirfungiamoeba-DB41-mer" |
| --kraken_db_protein | string | "/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/Projects/LongReadAnalysis/data/kraken-db/proteinalaninetRNAligase" |
| --kraken_tax_ids | string | "./assets/tax_ids_acanth-verm-naegleria.txt" |
| --kraken_custom_params | string | "" |
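Kraken2's standard per-read output (without `--use-names`) gives a C/U classification status, the read ID, and the assigned taxid in its first three tab-separated columns. Selecting the reads that match the IDs listed in `--kraken_tax_ids` can be sketched as below. The one-taxid-per-line file layout, the function names, and the toy IDs are assumptions. KrakenTools' `extract_kraken_reads.py` can also pull in descendant taxa via `--include-children`; this sketch does not, and matches taxids exactly.

```python
def load_taxids(lines):
    """One taxid per line; blank lines and '#' comments are ignored (assumed layout)."""
    return {line.strip() for line in lines
            if line.strip() and not line.strip().startswith("#")}


def reads_for_taxids(kraken_lines, taxids):
    """IDs of classified ('C') reads whose assigned taxid is in `taxids` (exact match only)."""
    selected = []
    for line in kraken_lines:
        status, read_id, taxid = line.rstrip("\n").split("\t")[:3]
        if status == "C" and taxid in taxids:
            selected.append(read_id)
    return selected


wanted = load_taxids(["1001\n", "# comment\n", "\n", "3003\n"])
kraken_out = [
    "C\tread_1\t1001\t1500\t1001:40",
    "U\tread_2\t0\t900\t0:30",
    "C\tread_3\t2002\t2000\t2002:60",
]
print(reads_for_taxids(kraken_out, wanted))  # → ['read_1']
```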

FastqScreen parameters:

| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --fastq_screen_conf | string | "./assets/fastq_screen.conf" |

Assembler parameters:

| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --assembler | string | 'flye' |

BLAST parameters:

| Parameter | Data Type | Default Value |
|:---------:|:---------:|:-------------:|
| --blast_db | string | "/scicomp/groups-pure/OID/NCEZID/DFWED/WDPB/EMEL/Projects/LongReadAnalysis/data/blast/arch-bact-fung-hum-amoebarefseq/arch-bact-fung-hum-amoebarefseq" |
| --blast_evalue | string | "1e-10" |
| --blast_perc_identity | string | "95" |
| --blast_target_seqs | string | "5" |
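The BLAST defaults correspond closely to standard `blastn` command-line options. The sketch below shows one plausible translation. The flag mapping, the tabular output format (`-outfmt 6`), and the output file name are assumptions and are not taken from the pipeline's module. Values stay as strings because the table lists them with the string type.

```python
import shlex


def blastn_args(query, db, evalue="1e-10", perc_identity="95", target_seqs="5",
                out="contigs_blast.tsv"):
    """Translate the --blast_* defaults into a blastn argument list (assumed mapping)."""
    return [
        "blastn",
        "-query", query,
        "-db", db,
        "-evalue", evalue,                # --blast_evalue
        "-perc_identity", perc_identity,  # --blast_perc_identity
        "-max_target_seqs", target_seqs,  # --blast_target_seqs
        "-outfmt", "6",                   # tabular output (assumption)
        "-out", out,
    ]


print(shlex.join(blastn_args("assembly.fasta", "/path/to/blast_db")))
```

`-max_target_seqs` caps how many database sequences are reported per query. It is applied during the search rather than as a post-hoc "best N by score" filter, so treat it as a limit, not a ranking guarantee.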

Credits

Meta-ONT was originally written by Sam Rusher (rtq0@cdc.gov).

Citations

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

  • Name: Samuel Rusher
  • Login: srusher
  • Kind: user
  • Location: Frankfort, KY
  • Company: Bioinformatics Specialist with Leidos

Programs with an emphasis on bioinformatics | Experience with C#, Python, Java, SQL, R, HTML, and CSS

Citation (CITATIONS.md)

# scicomp/nfcoreskeleton: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Watch event: 1
  • Push event: 11
  • Create event: 2
Last Year
  • Watch event: 1
  • Push event: 11
  • Create event: 2

Dependencies

.github/workflows/branch.yml actions
  • mshick/add-pr-comment v1 composite
.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • nf-core/setup-nextflow v1 composite
.github/workflows/clean-up.yml actions
  • actions/stale v7 composite
.github/workflows/fix-linting.yml actions
  • actions/checkout v3 composite
  • actions/setup-node v3 composite
.github/workflows/linting.yml actions
  • actions/checkout v3 composite
  • actions/setup-node v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • mshick/add-pr-comment v1 composite
  • nf-core/setup-nextflow v1 composite
  • psf/black stable composite
.github/workflows/linting_comment.yml actions
  • dawidd6/action-download-artifact v2 composite
  • marocchino/sticky-pull-request-comment v2 composite
.github/workflows/release-announcments.yml actions
  • actions/setup-python v4 composite
  • rzr/fediverse-action master composite
  • zentered/bluesky-post-action v0.0.2 composite
modules/nf-core/blast/makeblastdb/meta.yml cpan
modules/nf-core/busco/busco/meta.yml cpan
modules/nf-core/chopper/meta.yml cpan
modules/nf-core/custom/dumpsoftwareversions/meta.yml cpan
modules/nf-core/fastqc/meta.yml cpan
modules/nf-core/flye/meta.yml cpan
modules/nf-core/krakentools/kreport2krona/meta.yml cpan
modules/nf-core/krona/krona_db/meta.yml cpan
modules/nf-core/krona/ktimporttaxonomy/meta.yml cpan
modules/nf-core/krona/ktimporttext/meta.yml cpan
modules/nf-core/maxbin2/meta.yml cpan
modules/nf-core/medaka/meta.yml cpan
modules/nf-core/megahit/meta.yml cpan
modules/nf-core/metabat2/jgisummarizebamcontigdepths/meta.yml cpan
modules/nf-core/metabat2/metabat2/meta.yml cpan
modules/nf-core/multiqc/meta.yml cpan
modules/nf-core/nonpareil/curve/meta.yml cpan
modules/nf-core/nonpareil/nonpareil/meta.yml cpan
modules/nf-core/nonpareil/nonpareilcurvesr/meta.yml cpan
modules/nf-core/porechop/porechop/meta.yml cpan
modules/nf-core/quast/meta.yml cpan
modules/nf-core/samtools/fastq/meta.yml cpan
modules/nf-core/samtools/index/meta.yml cpan
modules/nf-core/samtools/sort/meta.yml cpan
modules/nf-core/spades/meta.yml cpan
pyproject.toml pypi
modules/nf-core/checkm2/predict/meta.yml cpan
modules/nf-core/krakentools/extractkrakenreads/meta.yml cpan
modules/nf-core/samtools/coverage/meta.yml cpan
modules/nf-core/samtools/depth/meta.yml cpan
modules/nf-core/blast/makeblastdb/environment.yml pypi
modules/nf-core/busco/busco/environment.yml pypi
modules/nf-core/checkm2/predict/environment.yml pypi
modules/nf-core/chopper/environment.yml pypi
modules/nf-core/flye/environment.yml pypi
modules/nf-core/krakentools/extractkrakenreads/environment.yml pypi
modules/nf-core/krakentools/kreport2krona/environment.yml pypi
modules/nf-core/krona/krona_db/environment.yml pypi
modules/nf-core/krona/ktimporttaxonomy/environment.yml pypi
modules/nf-core/krona/ktimporttext/environment.yml pypi
modules/nf-core/maxbin2/environment.yml pypi
modules/nf-core/medaka/environment.yml pypi
modules/nf-core/megahit/environment.yml pypi
modules/nf-core/metabat2/jgisummarizebamcontigdepths/environment.yml pypi
modules/nf-core/metabat2/metabat2/environment.yml pypi
modules/nf-core/nonpareil/curve/environment.yml pypi
modules/nf-core/nonpareil/nonpareil/environment.yml pypi
modules/nf-core/nonpareil/nonpareilcurvesr/environment.yml pypi
modules/nf-core/porechop/porechop/environment.yml pypi
modules/nf-core/samtools/coverage/environment.yml pypi
modules/nf-core/samtools/fastq/environment.yml pypi
modules/nf-core/samtools/index/environment.yml pypi
modules/nf-core/samtools/sort/environment.yml pypi
modules/nf-core/spades/environment.yml pypi