EMC-MetaMicrobes

Nextflow pipeline to detect and classify microbial reads in sequencing data from human samples.

https://github.com/ErasmusMC-Bioinformatics/EMC-MetaMicrobes

Science Score: 54.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.7%) to scientific vocabulary

Keywords

bioinformatics-pipeline metatranscriptomics microbiome-analysis-pipelines nextflow-pipeline
Last synced: 6 months ago

Repository


Basic Info
  • Host: GitHub
  • Owner: ErasmusMC-Bioinformatics
  • License: mit
  • Language: Nextflow
  • Default Branch: master
  • Homepage:
  • Size: 10.3 MB
Statistics
  • Stars: 3
  • Watchers: 1
  • Forks: 2
  • Open Issues: 1
  • Releases: 0
Topics
bioinformatics-pipeline metatranscriptomics microbiome-analysis-pipelines nextflow-pipeline
Created almost 2 years ago · Last pushed 6 months ago
Metadata Files
Readme License Citation

README.md

Introduction

EMC/MetaMicrobes is a bioinformatics pipeline that analyzes microbial signatures and AMR genes in metagenomic or metatranscriptomic data.

As input, it requires a samplesheet with paths to paired-end, short-read, compressed FASTQ files. The pipeline performs quality control and trimming on the reads, filters out reads that map to a specified host reference genome, and taxonomically classifies the remaining reads. It also detects antimicrobial resistance (AMR) genes using two different approaches. As output, you receive all intermediate files, a BIOM file with the classifications, and a MultiQC report summarizing the QC metrics and the tools used.
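Host depletion like this is commonly implemented by keeping only read pairs in which neither mate aligned to the host. In SAM terms, these are records that carry both FLAG bits 0x4 (read unmapped) and 0x8 (mate unmapped), which is exactly what `samtools fastq -f 12` selects.

The Python sketch below shows that bit test on hypothetical (name, FLAG) tuples. It assumes MetaMicrobes uses this both-unmapped criterion, which the text above does not state. It is an illustration, not the pipeline's own code.

```python
"""Sketch: select unmapped read pairs from SAM-style records by FLAG bits.

Assumption: host depletion keeps pairs where both mates are unmapped.
"""

from typing import Iterable, List, Tuple

READ_UNMAPPED = 0x4  # SAM FLAG bit: this segment is unmapped
MATE_UNMAPPED = 0x8  # SAM FLAG bit: the next segment (mate) is unmapped
BOTH_UNMAPPED = READ_UNMAPPED | MATE_UNMAPPED  # == 12, as in `samtools fastq -f 12`


def keep_for_classification(flag: int) -> bool:
    """Return True if neither this read nor its mate aligned to the host."""
    return (flag & BOTH_UNMAPPED) == BOTH_UNMAPPED


def host_depleted(records: Iterable[Tuple[str, int]]) -> List[str]:
    """Return the names of records whose FLAG has both unmapped bits set."""
    return [name for name, flag in records if keep_for_classification(flag)]


if __name__ == "__main__":
    # Hypothetical records: (read name, SAM FLAG)
    sam_records = [
        ("read1", 77),   # paired, read unmapped, mate unmapped, first in pair
        ("read1", 141),  # paired, read unmapped, mate unmapped, second in pair
        ("read2", 99),   # properly paired, both mates mapped: a host read
        ("read3", 73),   # mate unmapped but the read itself mapped: dropped
    ]
    print(host_depleted(sam_records))  # ['read1', 'read1']
```

Requiring both bits, rather than only 0x4, keeps mates together so that the FASTQ files handed to the classifier stay properly paired.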

An overview of the steps implemented in MetaMicrobes is shown in the figure below.

[Figure: Metrochart_CanMic_overview-horizontal_mqc_amr_q2 (draw.io)]

The pipeline includes the following steps:

  1. Quality control (Fastp and FastQC)
  2. Filter out reads mapping to a reference genome (BWA-MEM2 and Samtools)
  3. Summarize mapping statistics (Samtools)
  4. Convert SAM file to FASTQ (Samtools)
  5. Detect AMR genes based on Hidden Markov Models (fARGene)
  6. Taxonomic classification (Kraken2)
  7. Visualize Kraken2 output with Krona (KrakenTools and Krona)
  8. Re-estimation of microbial abundances (Bracken)
  9. Convert Kraken2 and Bracken outputs to BIOM (Kraken-biom)
  10. Decontaminate based on a blacklist and whitelist (QIIME2)
  11. Visualize microbial profiles with barcharts and heatmaps (QIIME2)
  12. Assess microbial alpha and beta diversity (QIIME2)
  13. Generate report with quality metrics and used tools (MultiQC)
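Step 12 relies on standard diversity measures. As an illustration only, the sketch below computes two of them from plain taxon count vectors: the Shannon index (alpha diversity) and Bray-Curtis dissimilarity (beta diversity).

This is not QIIME 2's implementation. It differs in two ways:

- It uses the natural logarithm, whereas QIIME 2 may use a different log base.
- It does not rarefy the counts first.

```python
"""Sketch: Shannon alpha diversity and Bray-Curtis beta diversity from taxon counts."""

import math
from typing import Sequence


def shannon(counts: Sequence[int]) -> float:
    """Shannon index H = -sum(p_i * ln p_i), over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no counts")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(x: Sequence[int], y: Sequence[int]) -> float:
    """Bray-Curtis dissimilarity: sum(|x_i - y_i|) / sum(x_i + y_i)."""
    if len(x) != len(y):
        raise ValueError("samples must cover the same taxa in the same order")
    denom = sum(x) + sum(y)
    if denom == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / denom


if __name__ == "__main__":
    # Hypothetical counts for the same three taxa in two samples
    sample_a = [50, 50, 0]
    sample_b = [25, 25, 50]
    print(round(shannon(sample_a), 4))      # 0.6931 (ln 2)
    print(bray_curtis(sample_a, sample_b))  # 0.5
```

Bray-Curtis ranges from 0 (identical profiles) to 1 (no shared taxa), which makes it a convenient input for the distance-based ordinations typically shown alongside beta diversity.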

Usage

To use MetaMicrobes on your machine, follow the steps below:

  1. Make sure you have correctly set up Nextflow and its dependencies.

[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow.

  2. Clone this GitHub repository.
  3. Prepare a samplesheet like the example below:

     samplesheet.csv:
     ```csv
     sample,fastq_1,fastq_2
     CONTROL_1,BR_PVP_0705_R1.fastq.gz,BR_PVP_0705_R2.fastq.gz
     ```

     Each row represents a pair of FASTQ files.

[!TIP] If you don't have data available yet, or you want to test the pipeline on a small dataset first, use the data that comes with this repo. This data is subsampled from 3 RNA-seq samples with varying host content, created by Marques et al.

[!TIP] You can also use "samplesheeter.py", a small command-line tool included in this repo that prepares the samplesheet for you from a supplied data directory.

  4. Download a FASTA file containing the reference genome you want to use for host depletion, for example GRCh38.

Optionally, create a BWA-MEM2 index of this reference file and build your preferred Kraken2/Bracken database. If you don't supply these to the pipeline, MetaMicrobes will index your reference genome for you and build the standard Kraken2/Bracken database.

  5. Now you can run the MetaMicrobes pipeline using:

     ```bash
     nextflow run <path/to/EMC-MetaMicrobes/directory/> \
        -profile <docker/singularity/conda/.../institute> \
        --input samplesheet.csv \
        --outdir <OUTDIR> \
        --fasta <path/to/reference_genome_fasta>
     ```

    [!TIP] Save time by changing the default "null" values in "nextflow.config" to the paths you use most often. Values specified on the command line override the values in this file.
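Step 3 can also be scripted. The sketch below is a hypothetical stand-in for the bundled samplesheeter.py, not its actual code. It assumes files are named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz`, and it does three things:

- pairs the R1 and R2 files for each sample;
- rejects samples with an incomplete pair;
- writes the `sample,fastq_1,fastq_2` header used in the example samplesheet.

Real data may follow other naming conventions. In the example samplesheet, for instance, sample `CONTROL_1` maps to differently named files, so adapt the pattern before use.

```python
"""Hypothetical sketch of building a samplesheet (not the repo's samplesheeter.py)."""

import csv
import io
import re
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Assumed naming convention: <sample>_R1.fastq.gz / <sample>_R2.fastq.gz
PAIR_RE = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def pair_fastqs(filenames: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Group file paths into (sample, fastq_1, fastq_2) rows, sorted by sample."""
    pairs: Dict[str, Dict[str, str]] = {}
    for name in filenames:
        match = PAIR_RE.match(Path(name).name)
        if match is None:
            continue  # skip anything that is not a compressed R1/R2 FASTQ
        pairs.setdefault(match["sample"], {})[match["read"]] = name
    rows = []
    for sample, reads in sorted(pairs.items()):
        if set(reads) != {"1", "2"}:
            raise ValueError(f"sample {sample!r} is missing its R1 or R2 file")
        rows.append((sample, reads["1"], reads["2"]))
    return rows


def to_samplesheet(rows: List[Tuple[str, str, str]]) -> str:
    """Render rows as CSV text with the sample,fastq_1,fastq_2 header."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample", "fastq_1", "fastq_2"])
    writer.writerows(rows)
    return buffer.getvalue()


def samplesheet_from_dir(data_dir: str) -> str:
    """Scan a directory (non-recursively) and return the samplesheet text."""
    files = [str(p) for p in Path(data_dir).iterdir() if p.is_file()]
    return to_samplesheet(pair_fastqs(files))


if __name__ == "__main__":
    demo = ["BR_PVP_0705_R2.fastq.gz", "BR_PVP_0705_R1.fastq.gz", "notes.txt"]
    print(to_samplesheet(pair_fastqs(demo)), end="")
    # sample,fastq_1,fastq_2
    # BR_PVP_0705,BR_PVP_0705_R1.fastq.gz,BR_PVP_0705_R2.fastq.gz
```

To write a samplesheet for a (hypothetical) `data/` directory: `Path("samplesheet.csv").write_text(samplesheet_from_dir("data/"))`. Raising on an unpaired file, rather than silently skipping it, surfaces missing mates before the pipeline starts.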

If you have a pre-built BWA-MEM2 index or Kraken2/Bracken database, use a command like this:

```bash
nextflow run <path/to/EMC-MetaMicrobes/directory/> \
   -profile <docker/singularity/conda/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR> \
   --fasta <path/to/reference_genome_fasta> \
   --bwamem2_index <path/to/bwa_mem2_index> \
   --kraken2_db <path/to/kraken2_db> \
   --bracken_db <path/to/bracken_db>
```

If you want to change anything related to the QIIME2 downstream analysis or fARGene, use a command like this:
```bash
nextflow run <path/to/EMC-MetaMicrobes/directory/> \
   -profile <docker/singularity/conda/.../institute> \
   --input samplesheet.csv \
   --outdir <OUTDIR> \
   --fasta <path/to/reference_genome_fasta> \
   --bwamem2_index <path/to/bwa_mem2_index> \
   --kraken2_db <path/to/kraken2_db> \
   --bracken_db <path/to/bracken_db> \
   --whitelist <path/to/custom_whitelist> \
   --blacklist <path/to/custom_blacklist> \
   --sampling_dept 1000 \
   --metadata <path/to/metadata> \
   --fargene_hmmmodel "class_b_1_2"
```  
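The `--sampling_dept` value is presumably the rarefaction depth passed to QIIME 2. QIIME 2's core diversity metrics subsample every sample to an even depth and exclude samples with fewer counts.

The standard-library sketch below illustrates that rarefaction idea on a single count vector. It is not QIIME 2's implementation, and the mapping of `--sampling_dept` onto it is an assumption.

```python
"""Sketch: rarefy one sample's taxon counts to a fixed depth (subsampling without replacement)."""

import bisect
import itertools
import random
from typing import List, Optional, Sequence


def rarefy(counts: Sequence[int], depth: int, seed: Optional[int] = None) -> Optional[List[int]]:
    """Draw `depth` reads without replacement; return None if the sample has fewer reads."""
    if depth < 0:
        raise ValueError("depth must be non-negative")
    total = sum(counts)
    if total < depth:
        return None  # too shallow: dropped rather than rarefied
    # Cumulative read counts: read index r belongs to the first taxon i with r < bounds[i].
    bounds = list(itertools.accumulate(counts))
    rng = random.Random(seed)
    rarefied = [0] * len(counts)
    for read_index in rng.sample(range(total), depth):
        rarefied[bisect.bisect_right(bounds, read_index)] += 1
    return rarefied


if __name__ == "__main__":
    # Hypothetical counts for four taxa in one sample
    sample = [700, 200, 100, 0]
    subsampled = rarefy(sample, 500, seed=1)
    print(subsampled, sum(subsampled))  # counts vary with the seed; the sum is always 500
    print(rarefy([10, 5], 1000))        # None: sample is shallower than the depth
```

Sampling read indices from `range(total)` avoids materializing one list entry per read, so the sketch stays cheap even for deeply sequenced samples.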

Credits

EMC/metamicrobes was originally written by Birgit Rijvers.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

A list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

  • Name: Erasmus Medical Center
  • Login: ErasmusMC-Bioinformatics
  • Kind: organization
  • Location: Rotterdam, The Netherlands

Bioinformatics Department

Citation (CITATIONS.md)

# EMC/metamicrobes: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools
- [Fastp](https://github.com/OpenGene/fastp)
  
  > Chen S, Zhou Y, Chen Y, Gu J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics. 2018 Sep 1;34(17):i884-i890. doi: 10.1093/bioinformatics/bty560. PMID: 30423086; PMCID: PMC6129281.


- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [BWA-MEM2](https://github.com/bwa-mem2/bwa-mem2)
  
  > Vasimuddin Md, Misra S, Li H, Aluru S. Efficient Architecture-Aware Acceleration of BWA-MEM for Multicore Systems. IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2019. doi: 10.1109/IPDPS.2019.00041.

- [Samtools](https://www.htslib.org/doc/samtools.html)
  
  > Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. GigaScience. 2021 Feb;10(2):giab008. doi: 10.1093/gigascience/giab008.

- [fARGene](https://github.com/fannyhb/fargene)
  
  > Berglund, F., Österlund, T., Boulund, F. et al. Identification and reconstruction of novel antibiotic resistance genes from metagenomes. Microbiome 7, 52 (2019). https://doi.org/10.1186/s40168-019-0670-1

- [Kraken2](https://github.com/DerrickWood/kraken2) 
  
  > Wood, D.E., Lu, J. & Langmead, B. Improved metagenomic analysis with Kraken 2. Genome Biol 20, 257 (2019). https://doi.org/10.1186/s13059-019-1891-0

- [Bracken](https://github.com/jenniferlu717/Bracken)
  
  > Lu J, Breitwieser FP, Thielen P, Salzberg SL. 2017. Bracken: estimating species abundance in metagenomics data. PeerJ Computer Science 3:e104 https://doi.org/10.7717/peerj-cs.104

- [Krona](https://github.com/marbl/Krona)
  
  > Ondov BD, Bergman NH, Phillippy AM. Interactive metagenomic visualization in a Web browser. BMC Bioinformatics. 2011 Sep 30;12:385. https://doi.org/10.1186/1471-2105-12-385

- [kraken-biom](https://github.com/smdabdoub/kraken-biom)
  
  > Dabdoub, SM (2016). kraken-biom: Enabling interoperative format conversion for Kraken results (Version 1.2) [Software]. Available at https://github.com/smdabdoub/kraken-biom.

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [QIIME2](https://qiime2.org/)
  
  > Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, and Caporaso JG. 2019. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nature Biotechnology 37: 852–857. https://doi.org/10.1038/s41587-019-0209-9

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Push event: 2
Last Year
  • Push event: 2

Dependencies

modules/nf-core/fastp/meta.yml cpan
modules/nf-core/multiqc/meta.yml cpan
subworkflows/nf-core/utils_nextflow_pipeline/meta.yml cpan
subworkflows/nf-core/utils_nfcore_pipeline/meta.yml cpan
subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml cpan
pyproject.toml pypi
modules/nf-core/bracken/bracken/meta.yml cpan
modules/nf-core/bracken/build/meta.yml cpan
modules/nf-core/bwamem2/index/meta.yml cpan
modules/nf-core/bwamem2/mem/meta.yml cpan
modules/nf-core/kraken2/kraken2/meta.yml cpan
modules/nf-core/samtools/fastq/meta.yml cpan
modules/local/krakenbiom/krakenbiom_com/meta.yml cpan
modules/nf-core/fastp/environment.yml conda
  • fastp 0.23.4.*
modules/nf-core/multiqc/environment.yml conda
  • multiqc 1.21.*
modules/local/samtools/flagstat/meta.yml cpan
modules/local/krakenbiom/krakenbiom_com/environment.yml pypi
modules/local/samtools/flagstat/environment.yml pypi
modules/nf-core/bracken/bracken/environment.yml pypi
modules/nf-core/bracken/build/environment.yml pypi
modules/nf-core/bwamem2/index/environment.yml pypi
modules/nf-core/bwamem2/mem/environment.yml pypi
modules/nf-core/kraken2/kraken2/environment.yml pypi
modules/nf-core/samtools/fastq/environment.yml pypi