TRANA

A pipeline based on EMU, a taxonomic profiler optimized for long 16S rRNA reads.

https://github.com/genomic-medicine-sweden/TRANA

Last synced: 11 months ago

Repository

Basic Info

Host: GitHub
Owner: genomic-medicine-sweden
License: gpl-3.0
Language: Nextflow
Default Branch: main
Homepage:
Size: 109 MB

Statistics

Stars: 15
Watchers: 10
Forks: 4
Open Issues: 35
Releases: 4

Created over 3 years ago · Last pushed 11 months ago

Metadata Files

Readme Changelog License Citation Codeowners

TRANA

Introduction

TRANA (previously known as gms_16S) is a bioinformatics analysis pipeline built around the EMU tool.

This Nextflow pipeline uses FastQC, Nanoplot, MultiQC, PorechopABI, Longfilt, EMU, and Krona. FastQC and Nanoplot perform quality control, PorechopABI trims adapters (optional), Longfilt filters the fastq files so that only reads close to 1500 bp are kept (optional), EMU performs the taxonomic profiling of the 16S rRNA reads, and Krona visualises the result table from EMU. The pipeline enables microbial community analysis, offering insight into the diversity of samples. It has been tested on Linux and on a Mac with an M1 chip (not recommended, as it is quite slow).

The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, which makes installation trivial and results highly reproducible. The Nextflow DSL2 implementation uses one container per process, which makes it much easier to maintain and update software dependencies.
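The optional length filter mentioned in the introduction can be illustrated with a minimal sketch. This is not the pipeline's implementation (the pipeline delegates filtering to its own tool); the function name `filter_fastq_by_length` is hypothetical, and the sketch assumes uncompressed FASTQ with exactly four lines per record.

```bash
# Keep only FASTQ records whose sequence length lies within [min_len, max_len].
# Assumption: plain four-line FASTQ records on stdin (no wrapped sequences).
filter_fastq_by_length() {
  local min_len="$1" max_len="$2"
  awk -v min="$min_len" -v max="$max_len" '
    NR % 4 == 1 { header = $0 }   # @read_id line
    NR % 4 == 2 { bases = $0 }    # sequence line
    NR % 4 == 3 { plus = $0 }     # separator line
    NR % 4 == 0 {                 # quality line completes the record
      n = length(bases)
      if (n >= min && n <= max) print header "\n" bases "\n" plus "\n" $0
    }'
}

# Toy example with tiny reads: only r1 (4 bases) falls within 3..5.
printf '@r1\nACGT\n+\nIIII\n@r2\nAC\n+\nII\n' | filter_fastq_by_length 3 5
```

For real 16S data the bounds would be something like 1200 and 1800, the values used in the Quick Start command.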

Pipeline summary

Pipeline overview image

Workflows for Nanopore reads and for short reads are available. PacBio data has had only minor testing but appears to work. MultiQC collects information only from FastQC, together with software versions and pipeline information.

Krona plot

Likelihood heatmap per sample

Heatmap generated from likelihood data. Each read has a likelihood that it is derived from a certain taxon. Each row sums to 1.
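The "each row sums to 1" property can be checked with a short sketch. The table layout here is an assumption made for illustration (tab-separated, one header line, a row label in the first column and one likelihood per taxon in the remaining columns); it is not taken from the pipeline's output, and `check_row_sums` is a hypothetical helper.

```bash
# Report, for every data row, whether its likelihoods sum to 1 (within 1e-6).
# Assumption: TSV on stdin, header on line 1, row label in column 1.
check_row_sums() {
  awk -F'\t' '
    NR == 1 { next }                        # skip the header line
    {
      total = 0
      for (i = 2; i <= NF; i++) total += $i
      diff = total - 1
      if (diff < 0) diff = -diff            # absolute deviation from 1
      print $1 "\t" (diff < 1e-6 ? "ok" : "off")
    }'
}

# Toy table: r1 sums to 1, r2 sums to 0.9.
printf 'read\ttaxA\ttaxB\nr1\t0.7\t0.3\nr2\t0.5\t0.4\n' | check_row_sums
```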

Sample-control bar plot comparison: abundance

If your data has one or two controls, e.g. a negative and a positive control or a spike, you can generate bar plots for a quick comparison between each sample and each control. This is supported for both absolute abundance and relative abundance.

Sample-control bar plot comparison: relative abundance

Relative abundance comparison to controls.
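The relationship between the two abundance views is simple arithmetic: a taxon's relative abundance is its count divided by the total count in the sample. The sketch below shows this on a toy table. The two-column TSV layout and the function name `relative_abundance` are assumptions for illustration, not the pipeline's file format.

```bash
# Convert absolute counts to relative abundance (count / total of all counts).
# Assumption: TSV on stdin with a header line, taxon in column 1, count in column 2.
relative_abundance() {
  awk -F'\t' -v OFS='\t' '
    NR == 1 { print $1, "relative_abundance"; next }
    { taxon[NR] = $1; count[NR] = $2; total += $2; last = NR }
    END {
      if (total == 0) exit                  # nothing to normalise
      for (i = 2; i <= last; i++)
        printf "%s\t%.4f\n", taxon[i], count[i] / total
    }'
}

# Toy sample with 40 counts in total.
printf 'taxon\tcount\nA\t30\nB\t10\n' | relative_abundance
```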

Quick Start

1. Install Nextflow (>=22.10.1).
2. Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility. You can use Conda both to install Nextflow itself and to manage software within pipelines, but please use it within pipelines only as a last resort (see docs).
3. Add your samples to an input file, e.g. sample_sheet.csv. See examples.
4. Run make install, which gunzips all gzipped files in the database directory (assets/databases/emu_database) and the Krona taxonomy directory (assets/databases/krona/taxonomy).
5. Run your command:

    nextflow run main.nf \
      --input sample_sheet.csv \
      --outdir [absolute path]/trana/results \
      --db /[absolute path]/trana/assets/databases/emu_database \
      --seqtype map-ont \
      -profile singularity,test \
      --quality_filtering \
      --longread_qc_qualityfilter_minlength 1200 \
      --longread_qc_qualityfilter_maxlength 1800
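For readers who want to see what the database preparation step amounts to, here is a minimal sketch of the behaviour described for make install: decompressing every gzipped file under the given directories. It is an approximation written from that description, not the contents of the Makefile, and `gunzip_databases` is a hypothetical name.

```bash
# Decompress every *.gz file found under each directory passed as an argument.
# Directories that do not exist are skipped silently.
gunzip_databases() {
  local dir
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    find "$dir" -type f -name '*.gz' -exec gunzip {} +
  done
}

# Example call, run from the repository root (paths as given in the steps above):
# gunzip_databases assets/databases/emu_database assets/databases/krona/taxonomy
```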

Runs with Nanopore barcode directories

You can run with or without a barcode sample sheet. If no sample sheet is used, the results are named after the barcodes. If a sample sheet is used, the results are named after the values in its second column. Note that the --input flag is not needed when `--merge_fastq_pass` is defined.

Run without barcode sample sheet:

    nextflow run main.nf \
      --outdir [absolute path]/trana/results \
      --db /[absolute path]/trana/assets/databases/emu_database \
      --seqtype map-ont \
      -profile singularity,test \
      --quality_filtering \
      --longread_qc_qualityfilter_minlength 1200 \
      --longread_qc_qualityfilter_maxlength 1800 \
      --merge_fastq_pass /[absolute path]/trana/fastq_pass/

Run with barcode sample sheet:

    nextflow run main.nf \
      --outdir /[absolute path to]/trana/results \
      --db /[absolute path to database]/trana/assets/databases/emu_database \
      --seqtype map-ont \
      -profile singularity,test \
      --quality_filtering \
      --longread_qc_qualityfilter_minlength 1200 \
      --longread_qc_qualityfilter_maxlength 1800 \
      --merge_fastq_pass /[absolute path to fastq_pass]/fastq_pass/ \
      --barcodes_samplesheet /[absolute path to barcode sample sheet]/sample_sheet_merge.csv
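Conceptually, merging a fastq_pass directory means concatenating the fastq files inside each barcode folder into one file per barcode. The sketch below illustrates that idea only; it is not the pipeline's merge process, the function name `merge_barcode_fastqs` is hypothetical, and it assumes the files end in .fastq.gz.

```bash
# For every barcode folder in fastq_pass, concatenate its *.fastq.gz files
# into <out_dir>/<barcode>.fastq.gz. Concatenated gzip streams form a valid
# gzip file, so nothing needs to be decompressed.
merge_barcode_fastqs() {
  local fastq_pass="$1" out_dir="$2" barcode_dir barcode
  mkdir -p "$out_dir"
  for barcode_dir in "$fastq_pass"/*/; do
    [ -d "$barcode_dir" ] || continue
    set -- "$barcode_dir"*.fastq.gz          # list this barcode's files
    [ -e "$1" ] || continue                  # skip folders without fastq files
    barcode="$(basename "$barcode_dir")"
    cat "$@" > "$out_dir/$barcode.fastq.gz"
  done
}

# Example call (paths are placeholders):
# merge_barcode_fastqs /path/to/fastq_pass /path/to/merged
```

Output files are named after the barcode folders here, which mirrors the naming described for runs without a barcode sample sheet.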

Runs with shortreads

When running TRANA with short reads, the primer sequences are trimmed with Cutadapt by default. The primer sequences can be provided in the sample sheet (columns FW_primer and RV_primer) or passed as arguments (--FW_primer, --RV_primer). Primer trimming with Cutadapt can be skipped with --skip_cutadapt.

    sample,fastq_1,fastq_2,FW_primer,RV_primer
    SAMPLE,/absolute_path/trana/Sample_R1_001.fastq.gz,/absolute_path/trana/Sample_R2_001.fastq.gz,GTGCCAGCMGCCGCGGTAA,GGACTACNVGGGTWTCTAAT

    nextflow run main.nf \
      --input sample_sheet.csv \
      --outdir [absolute path]/trana/results \
      --db /[absolute path]/trana/assets/databases/emu_database \
      --seqtype sr \
      -profile singularity

    nextflow run main.nf \
      --input sample_sheet.csv \
      --outdir [absolute path]/trana/results \
      --db /[absolute path]/trana/assets/databases/emu_database \
      --seqtype sr \
      -profile singularity \
      --FW_primer AGCTGNCCTG \
      --RV_primer TGCATNCTGA
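The primers in these examples contain IUPAC ambiguity codes (M, N, V, W), each standing for a set of bases. Cutadapt interprets such codes itself, so nothing needs to be expanded before running the pipeline. The sketch below only makes the meaning of the codes visible by turning a primer into a regular expression; `iupac_to_regex` is a hypothetical helper and expects upper-case input.

```bash
# Expand IUPAC ambiguity codes in a primer (read from stdin) into a regex.
# The replacements only introduce A, C, G and T, so later rules never
# re-expand text produced by earlier ones.
iupac_to_regex() {
  sed -e 's/R/[AG]/g' -e 's/Y/[CT]/g' -e 's/S/[CG]/g' -e 's/W/[AT]/g' \
      -e 's/K/[GT]/g' -e 's/M/[AC]/g' -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
      -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

# Forward primer from the sample sheet example above.
printf 'GTGCCAGCMGCCGCGGTAA\n' | iupac_to_regex
```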

Sample sheets

There are two types of sample sheets that can be used:

1. If the fastq files are already concatenated/merged, i.e. the fastq files in the Nanopore barcode directories have already been concatenated, use --input. It expects a .csv sample sheet with 3 columns (note the header names). It looks like this (see also the examples directory):

        sample,fastq_1,fastq_2
        SAMPLE_1,/absolute_path/trana/assets/test_assets/medium_Mock_dil_1_2_BC1.fastq.gz,
        SAMPLE_2,/absolute_path/trana/assets/test_assets/medium_Mock_dil_1_2_BC3.fastq.gz,

2. If the fastq files are separated into their respective barcode folders, i.e. you have several fastq files for each sample and they are organized in barcode directories inside a fastq_pass directory:
   a) If you do not want to create a sample sheet for the barcodes, use the flag `--merge_fastq_pass` on its own. The results will be named after the barcode folders.
   b) If you want your own sample names on the results, use `--merge_fastq_pass` in combination with `--barcodes_samplesheet`. This requires a comma-separated barcode sample sheet. See the example file `sample_sheet_merge.csv` in `examples` for a demonstration.
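A sample sheet of the first type can be generated rather than typed by hand. The sketch below writes the three-column header and one row per fastq file, leaving fastq_2 empty as in the long-read example. Deriving the sample name from the file name is an assumption made for illustration, and `make_sample_sheet` is a hypothetical helper.

```bash
# Print a sample sheet (sample,fastq_1,fastq_2) for every *.fastq.gz file in
# a directory. Paths are made absolute; fastq_2 is left empty.
make_sample_sheet() {
  local fastq_dir fastq
  fastq_dir="$(cd "$1" && pwd)" || return 1
  printf 'sample,fastq_1,fastq_2\n'
  for fastq in "$fastq_dir"/*.fastq.gz; do
    [ -e "$fastq" ] || continue              # directory has no fastq files
    # Assumption: the file name without .fastq.gz serves as the sample name.
    printf '%s,%s,\n' "$(basename "$fastq" .fastq.gz)" "$fastq"
  done
}

# Example call (path is a placeholder):
# make_sample_sheet /path/to/merged_fastqs > sample_sheet.csv
```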

Useful env variables

    # NXF_WORK: the Nextflow work directory. Set this to a shared place, e.g.
    # export NXF_WORK=/path/to/your/working/dir
    NXF_WORK
    APPTAINER_TMPDIR
    NXF_SINGULARITY_CACHEDIR
    APPTAINER_CACHEDIR

Multiqc report

FastQC results are shown only for unprocessed reads. For runs using --seqtype map-ont, QC results from Nanoplot are shown for both unprocessed and processed reads.

Useful commands for developers

Note that there is a Makefile available with a few useful commands to use when developing:

- make check to run most of the checks that are also run on CI (pre-commit/prettier, nf-core lint, and nf-test test).
  - Note: It is a good idea to run this command before pushing your changes to a new pull request!
- make precommit to only run pre-commit/prettier.
- make lint to run the nf-core lint checks.
- make test to run the nf-test tests.

Tip: To see which make commands are available, you can always type make and then hit TAB twice.

Credits

TRANA was originally written by @fwa93 and is further developed and maintained by gms-mikro from Genomic Medicine Sweden: @samuell @ryanjameskennedy @sofstam @AnderssonOlivia @kdannenberg @ikarls @bokelund

This pipeline is not a formal nf-core pipeline but it partly uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

In addition, references of tools and data used in this pipeline are as follows:

Nextflow

Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

Pipeline tools

FastQC

MultiQC

Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

Software packaging/containerisation tools

Anaconda

Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

Bioconda

Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

BioContainers

da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

Singularity

Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

EMU

Kristen D. Curry et al., “Emu: Species-Level Microbial Community Profiling of Full-Length 16S rRNA Oxford Nanopore Sequencing Data,” Nature Methods, June 30, 2022, 1–9, https://doi.org/10.1038/s41592-022-01520-4

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

Owner

Name: Genomic Medicine Sweden
Login: genomic-medicine-sweden
Kind: organization
Location: Sweden

Website: https://genomicmedicine.se/en/
Repositories: 16
Profile: https://github.com/genomic-medicine-sweden

Citation (CITATIONS.md)

# TRANA: Citations

This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) community, reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).

> The nf-core framework for community-curated bioinformatics pipelines.
>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
> In addition, references of tools and data used in this pipeline are as follows:

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [Cutadapt](https://journal.embnet.org/index.php/embnetjournal/article/view/200/479)

  > Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17.1 (2011): pp. 10-12. doi: 10.14806/ej.17.1.200.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

- [EMU](https://gitlab.com/treangenlab/emu)
  > Kristen D. Curry et al., “Emu: Species-Level Microbial Community Profiling of Full-Length 16S rRNA Oxford Nanopore Sequencing Data,” Nature Methods, June 30, 2022, 1–9, https://doi.org/10.1038/s41592-022-01520-4

GitHub Events

Total

Create event: 5
Issues event: 6
Delete event: 4
Member event: 1
Issue comment event: 14
Push event: 11
Gollum event: 2
Pull request review event: 5
Pull request review comment event: 4
Pull request event: 17

Last Year

Create event: 5
Issues event: 6
Delete event: 4
Member event: 1
Issue comment event: 14
Push event: 11
Gollum event: 2
Pull request review event: 5
Pull request review comment event: 4
Pull request event: 17

Issues and Pull Requests

Last synced: 11 months ago

All Time

Total issues: 0
Total pull requests: 7
Average time to close issues: N/A
Average time to close pull requests: 22 days
Total issue authors: 0
Total pull request authors: 3
Average comments per issue: 0
Average comments per pull request: 0.29
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 0
Pull requests: 7
Average time to close issues: N/A
Average time to close pull requests: 22 days
Issue authors: 0
Pull request authors: 3
Average comments per issue: 0
Average comments per pull request: 0.29
Merged pull requests: 3
Bot issues: 0
Bot pull requests: 0

TRANA

Science Score: 67.0%
