16s_nbk

A Nextflow pipeline to classify bacteria based on 16S genes sequenced with ONT's native barcoding kit

https://github.com/birgitrijvers/16s_nbk

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 4 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.6%) to scientific vocabulary
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: BirgitRijvers
  • License: mit
  • Language: Nextflow
  • Default Branch: main
  • Size: 188 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed about 1 year ago
Metadata Files
Readme Changelog License Citation

README.md

Introduction

EMC/NBKsixteenS is a bioinformatics pipeline for processing 16S rRNA gene reads sequenced with Oxford Nanopore Technologies (ONT). Because it includes host depletion, it is suitable for experiments that use the Native Barcoding Kit (NBK) instead of ONT's dedicated 16S kit.

As input, it requires a samplesheet with paths to compressed long-read FASTQ files. The pipeline performs quality control and trimming on the reads, filters out reads that map to a specified host reference genome, and taxonomically classifies the remaining reads.

NBKsixteenS includes:
1. Raw read quality control (FastQC)
2. Adapter trimming (Porechop)
3. Filtering by quality (Filtlong)
4. Visualization of QC'ed data (NanoPlot)
5. Filtering out reads that map to the host reference genome (minimap2 and Samtools)
6. Conversion of the SAM file to FASTQ (Samtools)
7. Taxonomic classification (Kraken2)
8. A report with quality metrics and the tools used (MultiQC)
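
For orientation, steps 5 and 6 can be approximated by hand with minimap2 and Samtools. This is a minimal sketch, not the pipeline's own module code: it assumes minimap2's `map-ont` preset and treats reads flagged as unmapped (SAM flag 4) as the host-depleted reads. The function name `deplete_host` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper approximating host depletion (steps 5 and 6).
# Assumption: the pipeline's modules may use different minimap2/Samtools options.
deplete_host() {
    if [ "$#" -ne 3 ]; then
        echo "usage: deplete_host <host.fa|host.mmi> <reads.fastq.gz> <out.fastq>" >&2
        return 2
    fi
    local ref="$1" reads="$2" out="$3"
    # minimap2 -a writes SAM; -x map-ont selects the ONT read preset.
    # samtools fastq -f 4 keeps only records with the "unmapped" flag (0x4),
    # i.e. reads that did not align to the host reference.
    minimap2 -ax map-ont "$ref" "$reads" \
        | samtools fastq -f 4 - > "$out"
}
```

For example, `deplete_host GRCh38.fa CONTROL_1.fastq.gz CONTROL_1.host_depleted.fastq` would leave only non-host reads for classification.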

To be added:
- Summarize mapping statistics (Samtools)
- Visualize Kraken2 output with Krona (KrakenTools and Krona)
- Re-estimation of microbial abundances (Bracken)

Usage

To use NBKsixteenS on your machine, follow the steps below:

  1. Make sure you have correctly set up Nextflow and its dependencies.

[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow.

  2. Clone this GitHub repository.
  3. Prepare a samplesheet like the example below:

     samplesheet.csv:

     ```csv
     sample,fastq_1
     CONTROL_1,BR_PVP_0705.fastq.gz
     ```

     Each row represents a FASTQ file.

[!TIP] You can use `samplesheeter.py`, a small command-line tool from another repository that prepares the samplesheet for you based on a supplied data directory.
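
If `samplesheeter.py` is not at hand, a samplesheet in the `sample,fastq_1` format can also be written with a few lines of shell. This is a minimal sketch and not the samplesheeter tool itself: the function name `make_samplesheet` is hypothetical, it assumes one `.fastq.gz` file per sample, and it uses the file name without `.fastq.gz` as the sample name. Writing absolute paths is a defensive choice, not a documented requirement of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not samplesheeter.py): write a samplesheet with the
# columns sample,fastq_1 for every *.fastq.gz file in a directory.
# Assumes one FASTQ file per sample; sample name = file name minus ".fastq.gz".
make_samplesheet() {
    if [ "$#" -lt 1 ]; then
        echo "usage: make_samplesheet <fastq_dir> [samplesheet.csv]" >&2
        return 2
    fi
    local out="${2:-samplesheet.csv}"
    local abs_dir fq
    # Resolve to an absolute path so the sheet does not depend on the working directory.
    abs_dir="$(cd "$1" && pwd)" || return 1
    echo "sample,fastq_1" > "$out"
    for fq in "$abs_dir"/*.fastq.gz; do
        [ -e "$fq" ] || continue  # the glob matched nothing
        echo "$(basename "$fq" .fastq.gz),$fq" >> "$out"
    done
}
```

For example, `make_samplesheet /path/to/fastq_dir samplesheet.csv` writes one row per FASTQ file found in that directory.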

  4. Download a FASTA file containing the reference genome you want to use for host depletion, for example GRCh38.

Optionally, create a minimap2 index of this reference file and build your preferred Kraken2/Bracken database. If you don't supply these to the pipeline, NBKsixteenS will index your reference genome for you and build the Kraken2/Bracken standard database.

  5. Now, you can run the NBKsixteenS pipeline using:

    ```bash
    nextflow run <path/to/EMC-NBKsixteenS/directory/> \
      -profile <docker/singularity/conda/.../institute> \
      --input samplesheet.csv \
      --outdir <OUTDIR> \
      --fasta <path/to/reference_genome_fasta>
    ```

    If you have a pre-built minimap2 index or Kraken2/Bracken database, use a command like this:

    ```bash
    nextflow run <path/to/EMC-NBKsixteenS/directory/> \
      -profile <docker/singularity/conda/.../institute> \
      --input samplesheet.csv \
      --outdir <OUTDIR> \
      --fasta <path/to/reference_genome_fasta> \
      --minimap2_index <path/to/minimap2_index> \
      --kraken2_db <path/to/kraken2_db> \
      --bracken_db <path/to/bracken_db>
    ```
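
The `--minimap2_index` option takes a pre-built index, which minimap2 can write once with its `-d` option. This is a minimal sketch under two assumptions: that the pipeline maps reads with the `map-ont` preset, and that the index should be built with that same preset, since minimap2 fixes k-mer and window sizes at indexing time. The function name `build_host_index` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: pre-build a minimap2 index of the host reference.
# Assumption: the pipeline maps with the map-ont preset; the preset's
# indexing parameters (k-mer and window size) are stored in the index.
build_host_index() {
    if [ "$#" -ne 2 ]; then
        echo "usage: build_host_index <reference.fa[.gz]> <index.mmi>" >&2
        return 2
    fi
    # -x map-ont: ONT preset; -d: write the index to a file instead of mapping.
    minimap2 -x map-ont -d "$2" "$1"
}
```

For example, `build_host_index GRCh38.fa GRCh38.mmi` produces an index that can then be passed as `--minimap2_index GRCh38.mmi`.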

[!WARNING] Please provide pipeline parameters via the CLI or the Nextflow `-params-file` option; see the docs.
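
As an illustration of the `-params-file` route, the parameters from the run commands can be collected in a YAML file. This is a minimal sketch: the file name `params.yaml`, the `results` output directory, and all paths are placeholders, and only parameter names that appear in the commands above are used.

```shell
#!/usr/bin/env bash
# Write a params file mirroring the run command's parameters.
# All values are placeholders; replace them with your own paths.
cat > params.yaml <<'EOF'
input: samplesheet.csv
outdir: results
fasta: /path/to/GRCh38.fa
# Optional pre-built resources (uncomment to use):
# minimap2_index: /path/to/GRCh38.mmi
# kraken2_db: /path/to/kraken2_db
# bracken_db: /path/to/bracken_db
EOF
```

The pipeline can then be started with, for example, `nextflow run <path/to/EMC-NBKsixteenS/directory/> -profile docker -params-file params.yaml`.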

Credits

EMC/nbksixteens was originally written by BirgitRijvers.

Support

Please open an issue in this repository if you experience problems or have development suggestions.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

  • Login: BirgitRijvers
  • Kind: user

Citation (CITATIONS.md)

# EMC/nbksixteens: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [Porechop](https://github.com/rrwick/Porechop)

- [NanoPlot](https://github.com/wdecoster/NanoPlot)

  > De Coster W, Rademakers R. NanoPack2: population-scale evaluation of long-read sequencing data. Bioinformatics. 2023 May;39(5):btad311. doi: 10.1093/bioinformatics/btad311.

- [Filtlong](https://github.com/rrwick/Filtlong)

- [minimap2](https://github.com/lh3/minimap2)
  
  > Li H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics. 2018;34:3094-3100. doi: 10.1093/bioinformatics/bty191.

- [Samtools](https://www.htslib.org/doc/samtools.html)
  
  > Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. GigaScience. 2021 Feb;10(2):giab008. doi: 10.1093/gigascience/giab008.

- [Kraken2](https://github.com/DerrickWood/kraken2) 
  
  > Wood DE, Lu J, Langmead B. Improved metagenomic analysis with Kraken 2. Genome Biol. 2019;20:257. doi: 10.1186/s13059-019-1891-0.

- [Bracken](https://github.com/jenniferlu717/Bracken)
  
  > Lu J, Breitwieser FP, Thielen P, Salzberg SL. 2017. Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science 3:e104 https://doi.org/10.7717/peerj-cs.104

- [Krona](https://github.com/marbl/Krona)
  
  > Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30;12:385. https://doi.org/10.1186/1471-2105-12-385

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Push event: 4
  • Create event: 2
Last Year
  • Push event: 4
  • Create event: 2

Dependencies

modules/nf-core/fastqc/meta.yml cpan
modules/nf-core/minimap2/align/meta.yml cpan
modules/nf-core/minimap2/index/meta.yml cpan
modules/nf-core/multiqc/meta.yml cpan
subworkflows/nf-core/utils_nextflow_pipeline/meta.yml cpan
subworkflows/nf-core/utils_nfcore_pipeline/meta.yml cpan
subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml cpan
modules/nf-core/fastqc/environment.yml pypi
modules/nf-core/minimap2/align/environment.yml pypi
modules/nf-core/minimap2/index/environment.yml pypi
modules/nf-core/multiqc/environment.yml pypi