oceanomics-amplicon-nf

Pipeline to analyse amplicon data from raw reads to ecologically informative phyloseq objects

https://github.com/minderoofoundation/oceanomics-amplicon-nf

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: MinderooFoundation
License: mit
Language: Nextflow
Default Branch: master
Homepage:
Size: 2.57 MB

Statistics

Stars: 6
Watchers: 2
Forks: 2
Open Issues: 0
Releases: 3

Created over 2 years ago · Last pushed 6 months ago

Metadata Files

Readme, Changelog, Contributing, License, Code of conduct, Citation

Introduction

OceanOmics-amplicon-nf creates ASVs and ZOTUs from eDNA amplicon data, assigns taxonomy to those ASVs/ZOTUs, and finally produces phyloseq objects.

The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It uses Docker/Singularity containers, which makes installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies.

Pipeline summary

Read QC (FastQC)
Demultiplex and trim primers with Cutadapt (Cutadapt)
Optionally demultiplex with Obitools3 (Obitools3)
Additional QC with Seqkit Stats (Seqkit)
Optional additional trimming with Seqtk (Seqtk)
Optional additional trimming with Fastp (Fastp)
Create ASVs with DADA2 (DADA2)
Create ZOTUs with VSEARCH (VSEARCH)
Optionally create ZOTUs with USEARCH (USEARCH)
Curate ASVs/ZOTUs with LULU (LULU)
Assign taxonomy with blastn (blastn)
Lowest Common Ancestor (LCA)
Phyloseq object creation (phyloseq)
Download aquamap probabilities (aquamaps)
Filtering of ASV read counts (Nester Filter)
Produce final QC report (MultiQC)

Quick Start

Install Nextflow (>=22.10.1)
Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility. You can use Conda both to install Nextflow itself and to manage software within pipelines, but please only use it within pipelines as a last resort; see docs.
Download the pipeline and test it on a minimal dataset with a single command:

nextflow run MinderooFoundation/OceanOmics-amplicon-nf -profile test,YOURPROFILE --outdir <OUTDIR>
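Before running the test command, it can help to confirm that the installed Nextflow meets the minimum version given in the first step (>=22.10.1). The following is a minimal sketch, not part of the pipeline: the helper names `version_ge` and `check_nextflow` are hypothetical, and it assumes a `sort` that supports version sorting with `-V` (as in GNU coreutils) and that `nextflow -version` prints a banner containing the version number.

```bash
#!/usr/bin/env bash
# Minimum Nextflow version stated in the Quick Start.
MIN_NXF_VERSION="22.10.1"

# version_ge A B: succeeds if version A >= version B.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_nextflow: hypothetical helper that reports whether the
# Nextflow found on PATH satisfies MIN_NXF_VERSION.
check_nextflow() {
    if ! command -v nextflow >/dev/null 2>&1; then
        echo "nextflow not found on PATH" >&2
        return 1
    fi
    local installed
    # Take the first X.Y.Z pattern from the version banner.
    installed="$(nextflow -version 2>/dev/null | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)"
    if version_ge "$installed" "$MIN_NXF_VERSION"; then
        echo "Nextflow $installed satisfies >= $MIN_NXF_VERSION"
    else
        echo "Nextflow '$installed' is older than $MIN_NXF_VERSION" >&2
        return 1
    fi
}

check_nextflow || true
```

The comparison sorts the two version strings and checks that the required minimum comes first, which avoids comparing `22.9` and `22.10` as plain strings.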

Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (YOURPROFILE in the example command above). You can chain multiple config profiles in a comma-separated string.

The pipeline comes with config profiles called docker, singularity, podman, shifter, charliecloud and conda which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.

If you are using conda, it is highly recommended to use the NXF_CONDA_CACHEDIR or conda.cacheDir settings to store the environments in a central location for future pipeline runs.
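As a rough illustration of the Conda cache recommendation, the environment variable can be exported before launching the pipeline. This is a sketch only: the directory name `nxf-conda-cache` is a hypothetical placeholder, so substitute whatever central location suits your system.

```bash
#!/usr/bin/env bash
# Hypothetical central cache location; keeps any value already set.
export NXF_CONDA_CACHEDIR="${NXF_CONDA_CACHEDIR:-${HOME:-/tmp}/nxf-conda-cache}"

# Create the directory so later runs can reuse the environments stored there.
mkdir -p "$NXF_CONDA_CACHEDIR"
echo "Conda environments will be cached in: $NXF_CONDA_CACHEDIR"
```

Adding the export to a shell startup file makes the setting persist across sessions; the `conda.cacheDir` option in a Nextflow config file is the alternative route named above.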

Start running your own analysis!

nextflow run MinderooFoundation/OceanOmics-amplicon-nf --input samplesheet.csv --outdir <OUTDIR> --bind_dir <BINDDIR> --dbfiles "<BLASTDBFILES>" -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --fw_primer <FWPRIMER> --rv_primer <RVPRIMER>

The --fw_primer and --rv_primer parameters aren't needed if using the --assay parameter (supports 16SFish, MiFish, COILeray, 16SMam, and 12SV5)

nextflow run MinderooFoundation/OceanOmics-amplicon-nf --input samplesheet.csv --outdir <OUTDIR> --bind_dir <BINDDIR> --dbfiles "<BLASTDBFILES>" -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --assay <ASSAY>
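The two invocations differ only in how the primers are specified: either explicitly, or through a named assay. A small wrapper can make that choice explicit. The sketch below is illustrative and not part of the pipeline: the function name `build_run_cmd`, its argument order, and the example values (`results`, `/data`, `/db/blast/*`) are all hypothetical placeholders. It only prints the command rather than running it.

```bash
#!/usr/bin/env bash
# build_run_cmd: print a nextflow command line for the pipeline.
# Usage: build_run_cmd PROFILE OUTDIR BINDDIR DBFILES assay ASSAY
#        build_run_cmd PROFILE OUTDIR BINDDIR DBFILES primers FWPRIMER RVPRIMER
build_run_cmd() {
    local profile="$1" outdir="$2" bind_dir="$3" dbfiles="$4" mode="$5"
    local -a cmd=(nextflow run MinderooFoundation/OceanOmics-amplicon-nf
        --input samplesheet.csv --outdir "$outdir" --bind_dir "$bind_dir"
        --dbfiles "$dbfiles" -profile "$profile")
    case "$mode" in
        # A named assay replaces the two primer parameters.
        assay)   cmd+=(--assay "$6") ;;
        primers) cmd+=(--fw_primer "$6" --rv_primer "$7") ;;
        *)       echo "mode must be 'assay' or 'primers'" >&2; return 1 ;;
    esac
    # %q shell-quotes each argument so the printed line can be pasted safely.
    printf '%q ' "${cmd[@]}"
    printf '\n'
}

# Placeholder values, not real paths or databases.
build_run_cmd docker results /data "/db/blast/*" assay MiFish
```

Printing the command first gives a chance to inspect it before running; quoting the database pattern keeps the shell from expanding it prematurely, as in the original commands.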

Documentation

The OceanOmics-amplicon-nf pipeline comes with documentation about the pipeline usage, parameters and output.

Credits

This pipeline incorporates aspects of eDNAFlow, which was written by Mahsa Mousavi. OceanOmics-amplicon-nf was written by Adam Bennett. Other people who have contributed to this pipeline include Sebastian Rauschert (conceptualisation), Philipp Bayer, and Jessica Pearce. This pipeline was built using the nf-core template.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

Owner

Name: Minderoo Foundation
Login: MinderooFoundation
Kind: organization
Location: Australia

Website: https://www.minderoo.org/
Repositories: 1
Profile: https://github.com/MinderooFoundation

Citation (CITATIONS.md)

# OceanOmics-amplicon-nf: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) initiative, and reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).

> The nf-core framework for community-curated bioinformatics pipelines.
>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

In addition, references of tools and data used in this pipeline are as follows:

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [AdapterRemoval](https://pubmed.ncbi.nlm.nih.gov/26868221/)

  > Schubert M, Lindgreen S, Orlando L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res Notes. 2016 Feb 12;9:88. doi: 10.1186/s13104-016-1900-2. PubMed PMID: 26868221; PubMed Central PMCID: PMC4751634.

- [Biostrings](https://bioconductor.org/packages/release/bioc/html/Biostrings.html)

  > Pagès H, Aboyoun P, Gentleman R, DebRoy S (2023). Biostrings: Efficient manipulation of biological strings. R package version 2.66.0, https://bioconductor.org/packages/Biostrings.

- [Blast](https://www.ncbi.nlm.nih.gov/books/NBK279690/)

- [Csvtk](https://bioinf.shenwei.me/csvtk/usage/)

- [Cutadapt](https://journal.embnet.org/index.php/embnetjournal/article/view/200/479)

  > Martin, M (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), pp. 10-12. doi:https://doi.org/10.14806/ej.17.1.200

- [DADA2](https://www.bioconductor.org/packages/release/bioc/html/dada2.html)

  > Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP (2016). “DADA2: High-resolution sample inference from Illumina amplicon data.” Nature Methods, 13, 581-583. doi:10.1038/nmeth.3869.

- [DECIPHER](https://bioconductor.org/packages/release/bioc/html/DECIPHER.html)

  > Wright ES (2016). “Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R.” The R Journal, 8(1), 352-359.

- [Fastp](https://github.com/OpenGene/fastp)

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

- [joblib](https://joblib.readthedocs.io/en/stable/)

- [LCA](https://github.com/mahsa-mousavi/eDNAFlow/tree/master/LCA_taxonomyAssignment_scripts)

  > Mousavi-Derazmahalleh M, Stott A, Lines R, Peverley G, Nester G, Simpson T, Zawierta M, De La Pierre M, Bunce M, Christophersen CT. eDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA sequences exploiting Nextflow and Singularity. Mol Ecol Resour. 2021 Jul;21(5):1697-1704. doi: 10.1111/1755-0998.13356. Epub 2021 Mar 9. PMID: 33580619.

- [LULU](https://www.nature.com/articles/s41467-017-01312-x)

  > Frøslev, T. G., Kjøller, R., Bruun, H. H., Ejrnæs, R., Brunbjerg, A. K., Pietroni, C., & Hansen, A. J. (2017). Algorithm for post-clustering curation of DNA amplicon data yields reliable biodiversity estimates. Nature Communications, 8(1), 1188.

- [Mmv](https://ss64.com/bash/mmv.html)

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [Obitools3](https://git.metabarcoding.org/obitools/obitools3)

- [Pandas](https://pandas.pydata.org/)

- [PEAR](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3933873/)

  > Zhang, J., Kobert, K., Flouri, T., & Stamatakis, A. (2014). PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics (Oxford, England), 30(5), 614–620. https://doi.org/10.1093/bioinformatics/btt593

- [phangorn](https://cran.r-project.org/web/packages/phangorn/index.html)

- [phyloseq](https://www.bioconductor.org/packages/release/bioc/html/phyloseq.html)

  > McMurdie PJ, Holmes S (2013). “phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data.” PLoS ONE, 8(4), e61217. http://dx.plos.org/10.1371/journal.pone.0061217.

- [Seqkit](https://bioinf.shenwei.me/seqkit/)

- [Seqtk](https://github.com/lh3/seqtk)

- [Usearch](https://www.drive5.com/usearch/)

- [Vsearch](https://pubmed.ncbi.nlm.nih.gov/27781170/)

  > Rognes T, Flouri T, Nichols B, Quince C, Mahé F. VSEARCH: a versatile open source tool for metagenomics. PeerJ. 2016 Oct 18;4:e2584. doi: 10.7717/peerj.2584. PMID: 27781170; PMCID: PMC5075697.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total

Release event: 1
Watch event: 3
Delete event: 1
Push event: 36
Pull request event: 1
Create event: 2

Last Year

Release event: 1
Watch event: 3
Delete event: 1
Push event: 36
Pull request event: 1
Create event: 2
