oceanomics-amplicon-nf
Pipeline to analyse amplicon data from raw reads to ecologically informative phyloseq objects
https://github.com/minderoofoundation/oceanomics-amplicon-nf
Repository
Statistics
- Stars: 6
- Watchers: 2
- Forks: 2
- Open Issues: 0
- Releases: 3
Metadata Files
README.md
Introduction
This pipeline is used to create ASVs and ZOTUs from eDNA amplicon data, assign taxonomy to those ASVs/ZOTUs and finally produce phyloseq objects.
OceanOmics-amplicon-nf creates a phyloseq object from eDNA amplicon data.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies.
Pipeline summary
- Read QC (FastQC)
- Demultiplex and trim primers with Cutadapt (Cutadapt)
- Optionally demultiplex with Obitools3 (Obitools3)
- Additional QC with Seqkit Stats (Seqkit)
- Optional additional trimming (Seqtk)
- Optional additional trimming (Fastp)
- Create ASVs with DADA2 (DADA2)
- Create ZOTUs with VSEARCH (VSEARCH)
- Optionally create ZOTUs with USEARCH (USEARCH)
- Curate ASVs/ZOTUs with LULU (LULU)
- Assign taxonomy with blastn (blastn)
- Lowest Common Ancestor (LCA)
- Phyloseq object creation (phyloseq)
- Download aquamap probabilities (aquamaps)
- Filtering of ASV read counts (Nester Filter)
- Produce final QC report (MultiQC)
Quick Start
- Install Nextflow (>=22.10.1).
- Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility (you can use Conda both to install Nextflow itself and also to manage software within pipelines. Please only use it within pipelines as a last resort; see docs).
- Download the pipeline and test it on a minimal dataset with a single command:
```bash
nextflow run MinderooFoundation/OceanOmics-amplicon-nf -profile test,YOURPROFILE --outdir <OUTDIR>
```
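Before running the test command, it can help to confirm that the installed Nextflow meets the minimum version given in the install step (>=22.10.1). The sketch below is one possible check, not part of the pipeline: the helper name `version_ge` is hypothetical, it relies on `sort -V` from GNU coreutils, and it assumes the first `x.y.z` string printed by `nextflow -version` is the installed version.

```bash
#!/usr/bin/env bash
# Return success when version $1 is greater than or equal to version $2.
# Uses `sort -V` (version sort): if $2 sorts first, then $1 >= $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

REQUIRED="22.10.1"   # minimum version stated in the install step

if command -v nextflow >/dev/null 2>&1; then
    # Assumption: the first x.y.z string in the output is the version.
    INSTALLED="$(nextflow -version 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n 1)"
    if version_ge "$INSTALLED" "$REQUIRED"; then
        echo "Nextflow $INSTALLED is new enough"
    else
        echo "Nextflow ${INSTALLED:-unknown} is older than $REQUIRED" >&2
    fi
else
    echo "nextflow not found on PATH" >&2
fi
```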
Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (YOURPROFILE in the example command above). You can chain multiple config profiles in a comma-separated string.
- The pipeline comes with config profiles called docker, singularity, podman, shifter, charliecloud and conda which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
- If you are using conda, it is highly recommended to use the NXF_CONDA_CACHEDIR or conda.cacheDir settings to store the environments in a central location for future pipeline runs.
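As a minimal sketch of the conda recommendation above, the environment variable can be exported before launching the pipeline. The directory path here is a hypothetical choice; any location that persists between runs should serve the same purpose.

```bash
# Hypothetical central location for conda environments built by Nextflow.
export NXF_CONDA_CACHEDIR="${HOME:-/tmp}/.nextflow_conda_cache"

# Create the directory up front so later runs can reuse cached environments.
mkdir -p "$NXF_CONDA_CACHEDIR"
```

Adding the `export` line to a shell startup file would make the setting apply to every future run.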
- Start running your own analysis!
```bash
nextflow run MinderooFoundation/OceanOmics-amplicon-nf --input samplesheet.csv --outdir <OUTDIR> --bind_dir <BINDDIR> --dbfiles "<BLASTDBFILES>" -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --fw_primer <FWPRIMER> --rv_primer <RVPRIMER>
```
- The --fw_primer and --rv_primer parameters aren't needed if using the --assay parameter (supports 16SFish, MiFish, COILeray, 16SMam, and 12SV5):
```bash
nextflow run MinderooFoundation/OceanOmics-amplicon-nf --input samplesheet.csv --outdir <OUTDIR> --bind_dir <BINDDIR> --dbfiles "<BLASTDBFILES>" -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --assay <ASSAY>
```
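The two invocations above differ only in how the primers are supplied. A small wrapper can make that choice explicit; the sketch below is an illustration under stated assumptions, not part of the pipeline. The function name `primer_args` is hypothetical, and preferring the assay when both an assay and primers are given is a design choice made here for simplicity.

```bash
#!/usr/bin/env bash
# Print the primer-related flags for the pipeline command line.
# $1 = assay name (may be empty), $2 = forward primer, $3 = reverse primer.
# Hypothetical helper: an assay takes precedence over explicit primers.
primer_args() {
    local assay="$1" fw="$2" rv="$3"
    if [ -n "$assay" ]; then
        echo "--assay $assay"
    elif [ -n "$fw" ] && [ -n "$rv" ]; then
        echo "--fw_primer $fw --rv_primer $rv"
    else
        echo "Provide either an assay or both primers" >&2
        return 1
    fi
}

# Dry run: print the command instead of executing it.
# The command substitution is left unquoted on purpose so the flags
# split into separate words.
echo nextflow run MinderooFoundation/OceanOmics-amplicon-nf \
    --input samplesheet.csv --outdir "<OUTDIR>" --bind_dir "<BINDDIR>" \
    --dbfiles "<BLASTDBFILES>" -profile docker \
    $(primer_args "MiFish" "" "")
```

Removing the leading `echo` and filling in the placeholders would turn the dry run into a real invocation.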
Documentation
The OceanOmics-amplicon-nf pipeline comes with documentation about the pipeline usage, parameters and output.
Credits
This pipeline incorporates aspects of eDNAFlow, which was written by Mahsa Mousavi. OceanOmics-amplicon-nf was written by Adam Bennett. Other people who have contributed to this pipeline include Sebastian Rauschert (conceptualisation), Philipp Bayer, and Jessica Pearce. This pipeline was built using the nf-core template.
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
Owner
- Name: Minderoo Foundation
- Login: MinderooFoundation
- Kind: organization
- Location: Australia
- Website: https://www.minderoo.org/
- Repositories: 1
- Profile: https://github.com/MinderooFoundation
Citation (CITATIONS.md)
# OceanOmics-amplicon-nf: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

This pipeline uses code and infrastructure developed and maintained by the [nf-core](https://nf-co.re) initiative, and reused here under the [MIT license](https://github.com/nf-core/tools/blob/master/LICENSE).

> The nf-core framework for community-curated bioinformatics pipelines.
>
> Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
>
> Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

In addition, references of tools and data used in this pipeline are as follows:

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [AdapterRemoval](https://pubmed.ncbi.nlm.nih.gov/26868221/)

  > Schubert M, Lindgreen S, Orlando L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res Notes. 2016 Feb 12;9:88. doi: 10.1186/s13104-016-1900-2. PubMed PMID: 26868221; PubMed Central PMCID: PMC4751634.

- [Biostrings](https://bioconductor.org/packages/release/bioc/html/Biostrings.html)

  > Pagès H, Aboyoun P, Gentleman R, DebRoy S (2023). Biostrings: Efficient manipulation of biological strings. R package version 2.66.0, https://bioconductor.org/packages/Biostrings.

- [Blast](https://www.ncbi.nlm.nih.gov/books/NBK279690/)

- [Csvtk](https://bioinf.shenwei.me/csvtk/usage/)

- [Cutadapt](https://journal.embnet.org/index.php/embnetjournal/article/view/200/479)

  > Martin, M (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal, 17(1), pp. 10-12. doi:https://doi.org/10.14806/ej.17.1.200

- [DADA2](https://www.bioconductor.org/packages/release/bioc/html/dada2.html)

  > Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJA, Holmes SP (2016). “DADA2: High-resolution sample inference from Illumina amplicon data.” Nature Methods, 13, 581-583. doi:10.1038/nmeth.3869.

- [DECIPHER](https://bioconductor.org/packages/release/bioc/html/DECIPHER.html)

  > Wright ES (2016). “Using DECIPHER v2.0 to Analyze Big Biological Sequence Data in R.” The R Journal, 8(1), 352-359.

- [Fastp](https://github.com/OpenGene/fastp)

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

- [joblib](https://joblib.readthedocs.io/en/stable/)

- [LCA](https://github.com/mahsa-mousavi/eDNAFlow/tree/master/LCA_taxonomyAssignment_scripts)

  > Mousavi-Derazmahalleh M, Stott A, Lines R, Peverley G, Nester G, Simpson T, Zawierta M, De La Pierre M, Bunce M, Christophersen CT. eDNAFlow, an automated, reproducible and scalable workflow for analysis of environmental DNA sequences exploiting Nextflow and Singularity. Mol Ecol Resour. 2021 Jul;21(5):1697-1704. doi: 10.1111/1755-0998.13356. Epub 2021 Mar 9. PMID: 33580619.

- [LULU](https://www.nature.com/articles/s41467-017-01312-x)

  > Frøslev, T. G., Kjøller, R., Bruun, H. H., Ejrnæs, R., Brunbjerg, A. K., Pietroni, C., & Hansen, A. J. (2017). Algorithm for post-clustering curation of DNA amplicon data yields reliable biodiversity estimates. Nature Communications, 8(1), 1188.

- [Mmv](https://ss64.com/bash/mmv.html)

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [Obitools3](https://git.metabarcoding.org/obitools/obitools3)

- [Pandas](https://pandas.pydata.org/)

- [PEAR](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3933873/)

  > Zhang, J., Kobert, K., Flouri, T., & Stamatakis, A. (2014). PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics (Oxford, England), 30(5), 614–620. https://doi.org/10.1093/bioinformatics/btt593

- [phangorn](https://cran.r-project.org/web/packages/phangorn/index.html)

- [phyloseq](https://www.bioconductor.org/packages/release/bioc/html/phyloseq.html)

  > McMurdie PJ, Holmes S (2013). “phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data.” PLoS ONE, 8(4), e61217. http://dx.plos.org/10.1371/journal.pone.0061217.

- [Seqkit](https://bioinf.shenwei.me/seqkit/)

- [Seqtk](https://github.com/lh3/seqtk)

- [Usearch](https://www.drive5.com/usearch/)

- [Vsearch](https://pubmed.ncbi.nlm.nih.gov/27781170/)

  > Rognes T, Flouri T, Nichols B, Quince C, Mahé F. VSEARCH: a versatile open source tool for metagenomics. PeerJ. 2016 Oct 18;4:e2584. doi: 10.7717/peerj.2584. PMID: 27781170; PMCID: PMC5075697.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
GitHub Events
Total
- Release event: 1
- Watch event: 3
- Delete event: 1
- Push event: 36
- Pull request event: 1
- Create event: 2
Last Year
- Release event: 1
- Watch event: 3
- Delete event: 1
- Push event: 36
- Pull request event: 1
- Create event: 2