euryale

A pipeline for taxonomic classification and functional annotation of metagenomic reads. Based on MEDUSA

https://github.com/dalmolingroup/euryale

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov
  • Committers with academic emails
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (11.3%) to scientific vocabulary

Keywords

diamond functional-annotation kaiju metagenomics nextflow pipeline snakemake taxonomic-classification
Last synced: 6 months ago

Repository

Basic Info
Statistics
  • Stars: 13
  • Watchers: 1
  • Forks: 0
  • Open Issues: 2
  • Releases: 4
Topics
diamond functional-annotation kaiju metagenomics nextflow pipeline snakemake taxonomic-classification
Created almost 3 years ago · Last pushed 10 months ago
Metadata Files
Readme Changelog License Citation

README.md

Introduction

dalmolingroup/euryale is a pipeline for taxonomic classification and functional annotation of metagenomic reads, based on MEDUSA.

The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It uses Docker/Singularity containers, which makes installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

Pipeline summary

Pre-processing

Assembly

  • (optionally) Read assembly (MEGAHIT)

Taxonomic classification

  • Sequence classification (Kaiju)
  • Sequence classification (Kraken2)
  • Visualization (Krona)

Functional annotation

  • Sequence alignment (DIAMOND)
  • Map alignment matches to functional database (annotate)

Quick Start

  1. Install Nextflow (>=22.10.1)

  2. Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility. You can use Conda both to install Nextflow itself and to manage software within pipelines, but please use it within pipelines only as a last resort; see docs.

  3. Download the pipeline and test it on a minimal dataset with a single command:

```bash
nextflow run dalmolingroup/euryale -profile test,YOURPROFILE --outdir <OUTDIR>
```
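Step 1 asks for Nextflow 22.10.1 or newer. The following is a minimal sketch of how that requirement could be checked before launching the test run. The helper name `version_ge` and the example version string are hypothetical, and the sketch assumes a `sort` that supports `-V` (version sort), as GNU coreutils does.

```bash
# Return success if version $1 is greater than or equal to version $2.
# Assumes `sort -V` (version sort) is available, e.g. GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Compare a version string against the documented minimum.
# Replace the literal with the version reported by `nextflow -version`.
installed="23.04.0"   # hypothetical value, for illustration only
if version_ge "$installed" "22.10.1"; then
    echo "Nextflow $installed satisfies >=22.10.1"
else
    echo "Nextflow $installed is too old; 22.10.1 or newer is required" >&2
fi
```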

Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (YOURPROFILE in the example command above). You can chain multiple config profiles in a comma-separated string.

  • The pipeline comes with config profiles called docker, singularity, podman, shifter, charliecloud and conda which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
  • Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your Institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
  • If you are using singularity, please use the nf-core download command to download images first, before running the pipeline. Setting the NXF_SINGULARITY_CACHEDIR or singularity.cacheDir Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
  • If you are using conda, it is highly recommended to use the NXF_CONDA_CACHEDIR or conda.cacheDir settings to store the environments in a central location for future pipeline runs.
  4. Start running your own analysis!

```bash
nextflow run dalmolingroup/euryale \
    --input samplesheet.csv \
    --outdir <OUTDIR> \
    --kaiju_db kaiju_reference \
    --reference_fasta diamond_fasta \
    --host_fasta host_reference_fasta \
    --id_mapping id_mapping_file \
    -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
```
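The cache settings mentioned in the notes above (`NXF_SINGULARITY_CACHEDIR` and `NXF_CONDA_CACHEDIR`) can be exported once per shell session before launching a run. This is only a sketch: the `cache_root` location is a hypothetical path, so choose one that suits your system.

```bash
# Hypothetical shared location; pick a path that suits your system.
cache_root="${HOME}/nextflow-cache"

# Environment variables named in the notes above.
export NXF_SINGULARITY_CACHEDIR="${cache_root}/singularity"
export NXF_CONDA_CACHEDIR="${cache_root}/conda"

# Create the directories so the first run can write into them.
mkdir -p "$NXF_SINGULARITY_CACHEDIR" "$NXF_CONDA_CACHEDIR"
```

Adding these lines to a shell startup file would let later runs reuse the same images and environments.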

Databases and references

A question that pops up a lot is: Since Euryale requires a lot of reference parameters, where can I find these references?

One option is to execute EURYALE's download entry, which will download the necessary databases for you. This is the recommended way to get started with the pipeline. This uses the same sources as EURYALE's predecessor MEDUSA.

```bash
nextflow run dalmolingroup/euryale \
    --download_functional \
    --download_kaiju \
    --download_host \
    --outdir <output directory> \
    -entry download \
    -profile <docker/singularity/podman/shifter/charliecloud/conda/institute>
```

Check out the full documentation for a full list of EURYALE's download parameters. In case you download the Kraken2 database (--download_kraken), make sure to extract it using the following command before using it in the pipeline:

```bash
tar -xvf kraken2_db.tar.gz
```
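If you prefer to keep the extracted files in their own directory rather than the current one, the extraction step could be wrapped as sketched below. The helper name `extract_db` and the destination directory are hypothetical and not part of EURYALE; `-C` is the standard tar option for choosing the extraction directory.

```bash
# Extract a database archive into its own directory.
# $1: path to the archive; $2: destination directory (created if missing).
extract_db() {
    mkdir -p "$2" && tar -xvf "$1" -C "$2"
}

# Usage (file and directory names are placeholders):
#   extract_db kraken2_db.tar.gz kraken2_db
```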

Below we provide a short list of places where you can find these databases. But, of course, we're not limited to these references: Euryale should be able to process your own databases, should you want to build them yourself.

Alignment

For the alignment you can either provide --diamond_db for a pre-built DIAMOND database, or you can provide --reference_fasta. For reference fasta, by default Euryale expects something like NCBI-nr, but similarly formatted reference databases should also suffice.
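As an illustration of the choice between the two parameters, the sketch below picks the flag from the file name. It assumes that a pre-built DIAMOND database carries the `.dmnd` extension and that any other file is a FASTA reference. The helper name `alignment_flag` is hypothetical.

```bash
# Print the alignment flag that matches a reference file.
# Assumption: pre-built DIAMOND databases end in .dmnd;
# anything else is treated as a FASTA reference.
alignment_flag() {
    case "$1" in
        *.dmnd) printf '%s\n' "--diamond_db" ;;
        *)      printf '%s\n' "--reference_fasta" ;;
    esac
}

# Usage (file name is a placeholder):
#   ref=nr.dmnd
#   nextflow run dalmolingroup/euryale "$(alignment_flag "$ref")" "$ref" ...
```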

Taxonomic classification

In its current version, Euryale doesn't build a reference taxonomic database, but pre-built ones are supported.

  • If you're using Kaiju (the default), you can provide a reference database with --kaiju_db as a .tar.gz file like the ones available on the official Kaiju website. We have extensively tested Euryale with the 2021 version of the nr database and it should work as expected.
  • If you're using Kraken2 (by supplying --run_kraken2), we expect something like the pre-built .tar.gz databases provided by the Kraken2 developers to be provided to --kraken2_db.

Functional annotation

We expect an ID mapping reference to be used within annotate. Since NCBI-nr is already the expected default alignment reference, the ID mapping data file provided by UniProt should work well when provided to --id_mapping.

Host reference

If you're using metagenomic reads that come from a known host's microbiome, you can also provide the host's genome FASTA to the --host_fasta parameter in order to enable our decontamination subworkflow. Ensembl provides easy-to-download genomes that can be used for this purpose. Alternatively, you can provide a pre-built Bowtie2 database directory to the --bowtie2_db parameter.
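Because one option takes a FASTA file and the other a directory, a wrapper script could pick the parameter by checking what the path points to. This is a sketch under that assumption only; the helper name `host_flag` is hypothetical and not part of EURYALE.

```bash
# Print the host-reference flag that matches a path:
# a directory is treated as a pre-built Bowtie2 database,
# a regular file as a host genome FASTA.
host_flag() {
    if [ -d "$1" ]; then
        printf '%s\n' "--bowtie2_db"
    elif [ -f "$1" ]; then
        printf '%s\n' "--host_fasta"
    else
        echo "host reference not found: $1" >&2
        return 1
    fi
}

# Usage (path is a placeholder):
#   host=host_reference_fasta
#   nextflow run dalmolingroup/euryale "$(host_flag "$host")" "$host" ...
```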

Documentation

The dalmolingroup/euryale documentation is split into the following pages:

  • Usage

    • An overview of how the pipeline works, how to run it and a description of all of the different command-line flags.
  • Output

    • An overview of the different results produced by the pipeline and how to interpret them.

Credits

dalmolingroup/euryale was originally written by João Cavalcante.

We thank the following people for their extensive assistance in the development of this pipeline:

  • Diego Morais (for developing the original MEDUSA pipeline)

Citations

J. V. F. Cavalcante, I. Dantas de Souza, D. A. A. Morais and R. J. S. Dalmolin, "EURYALE: A versatile Nextflow pipeline for taxonomic classification and functional annotation of metagenomics data," 2024 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Natal, Brazil, 2024, pp. 1-7, doi: 10.1109/CIBCB58642.2024.10702116.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

  • Name: Dalmolin Systems Biology Group
  • Login: dalmolingroup
  • Kind: organization
  • Location: Natal, RN - Brazil

Research group in Systems Biology at UFRN

Citation (CITATIONS.md)

# dalmolingroup/euryale: Citations

## [EURYALE](https://ieeexplore.ieee.org/document/10702116)

> J. V. F. Cavalcante, I. Dantas de Souza, D. A. A. Morais and R. J. S. Dalmolin, "EURYALE: A versatile Nextflow pipeline for taxonomic classification and functional annotation of metagenomics data," 2024 IEEE Conference on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB), Natal, Brazil, 2024, pp. 1-7, doi: 10.1109/CIBCB58642.2024.10702116.

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)
  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Create event: 6
  • Issues event: 2
  • Release event: 2
  • Watch event: 9
  • Delete event: 1
  • Push event: 28
  • Pull request review event: 1
  • Pull request event: 5
Last Year
  • Create event: 6
  • Issues event: 2
  • Release event: 2
  • Watch event: 9
  • Delete event: 1
  • Push event: 28
  • Pull request review event: 1
  • Pull request event: 5

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 35
  • Total Committers: 1
  • Avg Commits per committer: 35.0
  • Development Distribution Score (DDS): 0.0
Past Year
  • Commits: 35
  • Committers: 1
  • Avg Commits per committer: 35.0
  • Development Distribution Score (DDS): 0.0
Top Committers
Name Email Commits
João Vitor j****v@g****m 35

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 2
  • Total pull requests: 28
  • Average time to close issues: 40 minutes
  • Average time to close pull requests: about 2 hours
  • Total issue authors: 2
  • Total pull request authors: 1
  • Average comments per issue: 0.0
  • Average comments per pull request: 0.0
  • Merged pull requests: 28
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • jvfe (2)
  • kunstner (1)
Pull Request Authors
  • jvfe (40)
  • mribeirodantas (1)
Top Labels
Issue Labels
enhancement (1)
Pull Request Labels

Dependencies

.github/workflows/ci.yml actions
  • actions/checkout v2 composite
  • eWaterCycle/setup-singularity v5 composite
  • nf-core/setup-nextflow v1 composite
.github/workflows/docs.yml actions
  • actions/checkout v2 composite
  • actions/setup-python v2 composite
modules/nf-core/bowtie2/align/meta.yml cpan
modules/nf-core/bowtie2/build/meta.yml cpan
modules/nf-core/custom/dumpsoftwareversions/meta.yml cpan
modules/nf-core/diamond/blastx/meta.yml cpan
modules/nf-core/diamond/makedb/meta.yml cpan
modules/nf-core/fastp/meta.yml cpan
modules/nf-core/fastqc/meta.yml cpan
modules/nf-core/gunzip/meta.yml cpan
modules/nf-core/kaiju/kaiju/meta.yml cpan
modules/nf-core/kaiju/kaiju2krona/meta.yml cpan
modules/nf-core/kaiju/kaiju2table/meta.yml cpan
modules/nf-core/krona/ktimporttext/meta.yml cpan
modules/nf-core/megahit/meta.yml cpan
modules/nf-core/multiqc/meta.yml cpan
modules/nf-core/samtools/bam2fq/meta.yml cpan
modules/nf-core/samtools/sort/meta.yml cpan
modules/nf-core/untar/meta.yml cpan
dockerfiles/annotate/Dockerfile docker
  • python 3.9.16-slim-buster build
dockerfiles/dictionary/Dockerfile docker
  • r-base 4.3.0 build
pyproject.toml pypi
modules/nf-core/kraken2/kraken2/meta.yml cpan
modules/nf-core/krakentools/kreport2krona/meta.yml cpan
modules/nf-core/fastx/collapser/meta.yml cpan
modules/nf-core/kraken2/kraken2/environment.yml conda
  • kraken2 2.1.2.*
  • pigz 2.6.*
modules/nf-core/krakentools/kreport2krona/environment.yml conda
  • krakentools 1.2.*
modules/nf-core/fastx/collapser/environment.yml pypi
modules/nf-core/multiqc/environment.yml pypi