ampliseq

Amplicon sequencing analysis workflow using DADA2 and QIIME2

https://github.com/nf-core/ampliseq

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 23 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov, nature.com, zenodo.org
  • Committers with academic emails
    3 of 39 committers (7.7%) from academic institutions
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (9.6%) to scientific vocabulary

Keywords

16s 18s amplicon-sequencing edna illumina iontorrent its metabarcoding metagenomics microbiome nextflow nf-core pacbio pipeline qiime2 rrna taxonomic-classification taxonomic-profiling workflow

Keywords from Contributors

workflows nf-test dsl2 pipelines atac-seq chromatin-accessibiity qc nanopore demultiplexing alignment
Last synced: 4 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: nf-core
  • License: mit
  • Language: Nextflow
  • Default Branch: master
  • Homepage: https://nf-co.re/ampliseq
  • Size: 17.5 MB
Statistics
  • Stars: 213
  • Watchers: 165
  • Forks: 141
  • Open Issues: 21
  • Releases: 27
Created over 7 years ago · Last pushed 5 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation

README.md

nf-core/ampliseq

Introduction

nf-core/ampliseq is a bioinformatics analysis pipeline for amplicon sequencing. It supports denoising of any amplicon and a variety of taxonomic databases for taxonomic assignment, covering 16S, ITS, CO1 and 18S. Phylogenetic placement is also possible, and multiple region analysis such as 5R is implemented. Supported input is paired-end Illumina data or single-end Illumina, PacBio and IonTorrent data. By default, the pipeline analyses 16S rRNA gene amplicons sequenced paired-end with Illumina.

A video about the relevance, usage and output of the pipeline (version 2.1.0; 26th Oct. 2021) can be found on YouTube and bilibili; the slides are deposited at figshare.

nf-core/ampliseq workflow overview

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.

Pipeline summary

By default, the pipeline currently performs the following:

  • Sequencing quality control (FastQC)
  • Trimming of reads (Cutadapt)
  • Infer Amplicon Sequence Variants (ASVs) (DADA2)
  • Optional post-clustering with VSEARCH
  • Predict whether ASVs are ribosomal RNA sequences (Barrnap)
  • Phylogenetic placement (EPA-NG)
  • Taxonomical classification using DADA2; alternatives are SINTAX, Kraken2, and QIIME2
  • Excludes unwanted taxa, produces absolute and relative feature/taxa count tables and plots, plots alpha rarefaction curves, computes alpha and beta diversity indices and plots thereof (QIIME2)
  • Creates phyloseq R objects (Phyloseq and TreeSE)
  • Pipeline QC summaries (MultiQC)
  • Pipeline summary report (R Markdown)

Usage

[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set-up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, you need to know whether the sequencing files at hand are expected to contain primer sequences (usually yes) and, if so, which primer sequences. In the example below, the paired-end sequencing data was produced with the 515f (GTGYCAGCMGCCGCGGTAA) and 806r (GGACTACNVGGGTWTCTAAT) primers for the V4 region of the 16S rRNA gene. Please note that these sequences should not contain any sequencing adapter sequences, only the sequence that matches the biological amplicon.
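
Before launching, it can help to check that the primer strings contain only nucleotide codes. The sketch below is not part of the pipeline: `is_iupac_primer` is a hypothetical helper that tests whether a string consists solely of IUPAC nucleotide letters (A, C, G, T and the ambiguity codes). It cannot detect whether adapter sequence has been included, so that still needs to be checked by hand.

```shell
# Hypothetical helper (not provided by nf-core/ampliseq):
# succeeds only if the argument is a non-empty string of IUPAC nucleotide codes.
is_iupac_primer() {
    # Normalise to upper case so lower-case input is accepted as well.
    seq_upper=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    case "$seq_upper" in
        ''|*[!ACGTRYSWKMBDHVN]*) return 1 ;;   # empty, or contains another character
        *) return 0 ;;
    esac
}

# The two primers from the example above.
for p in "GTGYCAGCMGCCGCGGTAA" "GGACTACNVGGGTWTCTAAT"; do
    if is_iupac_primer "$p"; then
        echo "$p: only IUPAC nucleotide codes (${#p} nt)"
    else
        echo "$p: contains characters that are not IUPAC nucleotide codes" >&2
    fi
done
```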

Next, the data needs to be organized in a folder, here data, or detailed in a samplesheet (see input documentation).
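
If a samplesheet is preferred over a folder, it can be generated from the file names. The following is a minimal sketch under stated assumptions, not the pipeline's own tooling: `make_samplesheet` is a hypothetical helper, the files are assumed to be named `<sample>_R1_001.fastq.gz` and `<sample>_R2_001.fastq.gz`, and the header columns `sampleID`, `forwardReads` and `reverseReads` are assumed here and should be checked against the input documentation.

```shell
# Hypothetical helper: print a tab-separated samplesheet for paired-end files.
# Assumed naming scheme: <sample>_R1_001.fastq.gz / <sample>_R2_001.fastq.gz
# Assumed header columns: sampleID, forwardReads, reverseReads
make_samplesheet() {
    dir=$1
    printf 'sampleID\tforwardReads\treverseReads\n'
    for fw in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$fw" ] || continue                     # the glob matched nothing
        rv=${fw%_R1_001.fastq.gz}_R2_001.fastq.gz    # expected reverse read file
        if [ ! -e "$rv" ]; then
            echo "warning: no reverse read file for $fw" >&2
            continue
        fi
        sample=$(basename "$fw" _R1_001.fastq.gz)
        printf '%s\t%s\t%s\n' "$sample" "$fw" "$rv"
    done
}

# Write the sheet for the folder "data" if it exists in the working directory.
if [ -d data ]; then
    make_samplesheet data > samplesheet.tsv
fi
```

Forward files without a matching reverse file are skipped with a warning rather than written as incomplete rows.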

Now, you can run the pipeline using:

```bash
nextflow run nf-core/ampliseq \
    -profile <docker/singularity/.../institute> \
    --input "data" \
    --FW_primer "GTGYCAGCMGCCGCGGTAA" \
    --RV_primer "GGACTACNVGGGTWTCTAAT" \
    --outdir <OUTDIR>
```

[!NOTE] Adding metadata will considerably increase the output, see metadata documentation.

[!TIP] By default, taxonomic assignment is performed with DADA2 on the SILVA database, but various other tools and databases are readily available; see the taxonomic classification documentation. Differential abundance testing with ANCOM or ANCOM-BC is available when opting in.

[!WARNING] Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
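
As an illustration of the `-params-file` route, the pipeline parameters from the example command could be written to a YAML file and passed in one go. This is a sketch only: the file name `params.yaml`, the output folder `results` and the `docker` profile are placeholders chosen for the example, and the final line merely prints the command instead of running it.

```shell
# Write the parameters of the example command into a YAML params file.
# "results" is a placeholder for <OUTDIR>; adjust all values to your data.
cat > params.yaml <<'EOF'
input: "data"
FW_primer: "GTGYCAGCMGCCGCGGTAA"
RV_primer: "GGACTACNVGGGTWTCTAAT"
outdir: "results"
EOF

# Print the corresponding command; remove "echo" to launch the run.
# The profile stays on the command line because it is a Nextflow option,
# not a pipeline parameter.
echo nextflow run nf-core/ampliseq -profile docker -params-file params.yaml
```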

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/ampliseq was originally written by Daniel Straub (@d4straub) and Alexander Peltzer (@apeltzer) for use at the Quantitative Biology Center (QBiC) and Microbial Ecology, Center for Applied Geosciences, part of Eberhard Karls Universität Tübingen (Germany). Daniel Lundin (@erikrikarddaniel; Linnaeus University, Sweden) joined before pipeline release 2.0.0 and helped to improve the pipeline considerably.

We thank the following people for their extensive assistance in the development of this pipeline (in alphabetical order):

Adam Bennett, Diego Brambilla, Emelie Nilsson, Jeanette Tångrot, Lokeshwaran Manoharan, Marissa Dubbelaar, Sabrina Krakau, Sam Minot, Till Englert

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #ampliseq channel (you can join with this invite).

Citations

If you use nf-core/ampliseq for your analysis, please cite the ampliseq article as follows:

Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline

Daniel Straub, Nia Blackwell, Adrian Langarica-Fuentes, Alexander Peltzer, Sven Nahnsen, Sara Kleindienst

Frontiers in Microbiology 2020, 11:2652 doi: 10.3389/fmicb.2020.550420.

You can cite the nf-core/ampliseq zenodo record for a specific version using the following doi: 10.5281/zenodo.1493841

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

  • Name: nf-core
  • Login: nf-core
  • Kind: organization
  • Email: core@nf-co.re

A community effort to collect a curated set of analysis pipelines built using Nextflow.

Citation (CITATIONS.md)

# nf-core/ampliseq: Citations

## [nf-core/ampliseq](https://pubmed.ncbi.nlm.nih.gov/33193131/)

> Straub D, Blackwell N, Langarica-Fuentes A, Peltzer A, Nahnsen S, Kleindienst S. Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline. Front Microbiol. 2020 Oct 23;11:550420. doi: 10.3389/fmicb.2020.550420. PMID: 33193131; PMCID: PMC7645116.

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

### Core tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [Cutadapt](https://journal.embnet.org/index.php/embnetjournal/article/view/200/479)

  > Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.journal 17.1 (2011): 10-12. doi: 10.14806/ej.17.1.200.

- [Barrnap](https://github.com/tseemann/barrnap)

  > Seemann T. barrnap 0.9 : rapid ribosomal RNA prediction.

- [DADA2](https://pubmed.ncbi.nlm.nih.gov/27214047/)
  > Callahan BJ, McMurdie PJ, Rosen MJ, Han AW, Johnson AJ, Holmes SP. DADA2: High-resolution sample inference from Illumina amplicon data. Nat Methods. 2016 Jul;13(7):581-3. doi: 10.1038/nmeth.3869. Epub 2016 May 23. PMID: 27214047; PMCID: PMC4927377.

### Taxonomic classification and databases

- Classification by [QIIME2 classifier](https://pubmed.ncbi.nlm.nih.gov/29773078/)

  > Bokulich NA, Kaehler BD, Rideout JR, Dillon M, Bolyen E, Knight R, Huttley GA, Gregory Caporaso J. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin. Microbiome. 2018 May 17;6(1):90. doi: 10.1186/s40168-018-0470-z. PMID: 29773078; PMCID: PMC5956843.

- default: [SILVA](https://pubmed.ncbi.nlm.nih.gov/23193283/)

  > Quast C, Pruesse E, Yilmaz P, Gerken J, Schweer T, Yarza P, Peplies J, Glöckner FO. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 2013 Jan;41(Database issue):D590-6. doi: 10.1093/nar/gks1219. Epub 2012 Nov 28. PMID: 23193283; PMCID: PMC3531112.

- [Greengenes2](https://doi.org/10.1038/s41587-023-01845-1)

  > McDonald, D., Jiang, Y., Balaban, M. et al. Greengenes2 unifies microbial data in a single reference tree. Nat Biotechnol (2023). https://doi.org/10.1038/s41587-023-01845-1

- [PR2 - Protist Reference Ribosomal Database](https://pubmed.ncbi.nlm.nih.gov/23193267/)

  > Guillou L, Bachar D, Audic S, Bass D, Berney C, Bittner L, Boutte C, Burgaud G, de Vargas C, Decelle J, Del Campo J, Dolan JR, Dunthorn M, Edvardsen B, Holzmann M, Kooistra WH, Lara E, Le Bescot N, Logares R, Mahé F, Massana R, Montresor M, Morard R, Not F, Pawlowski J, Probert I, Sauvadet AL, Siano R, Stoeck T, Vaulot D, Zimmermann P, Christen R. The Protist Ribosomal Reference database (PR2): a catalog of unicellular eukaryote small sub-unit rRNA sequences with curated taxonomy. Nucleic Acids Res. 2013 Jan;41(Database issue):D597-604. doi: 10.1093/nar/gks1160. Epub 2012 Nov 27. PMID: 23193267; PMCID: PMC3531120.

- [GTDB - Genome Taxonomy Database](https://pubmed.ncbi.nlm.nih.gov/30148503/)

  > Parks DH, Chuvochina M, Waite DW, Rinke C, Skarshewski A, Chaumeil PA, Hugenholtz P. A standardized bacterial taxonomy based on genome phylogeny substantially revises the tree of life. Nat Biotechnol. 2018 Nov;36(10):996-1004. doi: 10.1038/nbt.4229. Epub 2018 Aug 27. PMID: 30148503.

- [SBDI-GTDB](https://scilifelab.figshare.com/articles/dataset/SBDI_Sativa_curated_16S_GTDB_database/14869077)

  > Lundin D, Andersson A. SBDI Sativa curated 16S GTDB database. FigShare. doi: 10.17044/scilifelab.14869077.v1

- [RDP - Ribosomal Database Project](https://pubmed.ncbi.nlm.nih.gov/24288368/)

  > Cole JR, Wang Q, Fish JA, Chai B, McGarrell DM, Sun Y, Brown CT, Porras-Alfaro A, Kuske CR, Tiedje JM. Ribosomal Database Project: data and tools for high throughput rRNA analysis. Nucleic Acids Res. 2014 Jan;42(Database issue):D633-42. doi: 10.1093/nar/gkt1244. Epub 2013 Nov 27. PMID: 24288368; PMCID: PMC3965039.

- [UNITE - eukaryotic nuclear ribosomal ITS region](https://pubmed.ncbi.nlm.nih.gov/15869663/)

  > Kõljalg U, Larsson KH, Abarenkov K, Nilsson RH, Alexander IJ, Eberhardt U, Erland S, Høiland K, Kjøller R, Larsson E, Pennanen T, Sen R, Taylor AF, Tedersoo L, Vrålstad T, Ursing BM. UNITE: a database providing web-based methods for the molecular identification of ectomycorrhizal fungi. New Phytol. 2005 Jun;166(3):1063-8. doi: 10.1111/j.1469-8137.2005.01376.x. PMID: 15869663.

- [MIDORI2 - a collection of reference databases](https://doi.org/10.1002/edn3.303/)

  > Leray, M., Knowlton, N., & Machida, R. J. (2022). MIDORI2: A collection of quality controlled, preformatted, and regularly updated reference databases for taxonomic assignment of eukaryotic mitochondrial sequences. Environmental DNA, 4, 894–907. doi: https://doi.org/10.1002/edn3.303.

- [COIDB - CO1 Taxonomy Database](https://doi.org/10.17044/scilifelab.20514192.v2)

  > Sundh J, Manoharan L, Iwaszkiewicz-Eggebrecht E, Miraldo A, Andersson A, Ronquist F. COI reference sequences from BOLD DB. doi: https://doi.org/10.17044/scilifelab.20514192.v2.

- [PhytoRef plastid 16S rRNA database for photosynthetic eukaryotes](https://pubmed.ncbi.nlm.nih.gov/25740460/)

  > Decelle J, Romac S, Stern RF, Bendif el M, Zingone A, Audic S, Guiry MD, Guillou L, Tessier D, Le Gall F, Gourvil P, Dos Santos AL, Probert I, Vaulot D, de Vargas C, Christen R. PhytoREF: a reference database of the plastidial 16S rRNA gene of photosynthetic eukaryotes with curated taxonomy. Mol Ecol Resour. 2015 Nov;15(6):1435-45. doi: 10.1111/1755-0998.12401. Epub 2015 Apr 6. PMID: 25740460.

- [Zehr lab nifH database](http://doi.org/10.5281/zenodo.7996213)

  > M. A. Moynihan & C. Furbo Reeder 2023. nifHdada2 GitHub repository, v2.0.5. Zenodo. doi: http://doi.org/10.5281/zenodo.7996213

- [BOLD Plantae](https://boldsystems.org/)

  > Kesisoglou, G., Keisaris, S., & Pechlivanis, N. (2025). BOLD (Plantae - ITS1, ITS2, trnL) training data formatted for DADA2 [Data set]. Zenodo. doi: https://doi.org/10.5281/zenodo.15089110

### Phylogenetic placement

- [nf-core/phyloplace](https://nf-co.re/phyloplace)

  > Daniel Lundin. (2023). nf-core/phyloplace: First release (1.0.0). Zenodo. doi: https://doi.org/10.5281/zenodo.7643948

- [HMMER](https://pubmed.ncbi.nlm.nih.gov/22039361/)

  > Eddy, Sean R. “Accelerated Profile HMM Searches.” PLoS Comput Biol 7, no. 10 (October 20, 2011): e1002195. doi: https://doi.org/10.1371/journal.pcbi.1002195.

- [MAFFT](https://pubmed.ncbi.nlm.nih.gov/12136088/)

  > Katoh, Kazutaka, Kazuharu Misawa, Kei‐ichi Kuma, and Takashi Miyata. “MAFFT: A Novel Method for Rapid Multiple Sequence Alignment Based on Fast Fourier Transform.” Nucleic Acids Research 30, no. 14 (July 15, 2002): 3059–66. doi: https://doi.org/10.1093/nar/gkf436.

- [EPA-NG](https://pubmed.ncbi.nlm.nih.gov/30165689/)

  > Barbera, Pierre, Alexey M Kozlov, Lucas Czech, Benoit Morel, Diego Darriba, Tomáš Flouri, and Alexandros Stamatakis. “EPA-Ng: Massively Parallel Evolutionary Placement of Genetic Sequences.” Systematic Biology 68, no. 2 (March 1, 2019): 365–69. doi: https://doi.org/10.1093/sysbio/syy054.

- [Gappa](https://pubmed.ncbi.nlm.nih.gov/32016344/)

  > Czech, Lucas, Pierre Barbera, and Alexandros Stamatakis. “Genesis and Gappa: Processing, Analyzing and Visualizing Phylogenetic (Placement) Data.” Bioinformatics 36, no. 10 (May 1, 2020): 3263–65. doi: https://doi.org/10.1093/bioinformatics/btaa070.

### Multi region analysis (also include Greengenes 13_8 or SILVA 128)

- [q2-sidle](https://doi.org/10.1101/2021.03.23.436606)

  > Debelius, J.W.; Robeson, M.; Hugerth, L.W.; Boulund, F.; Ye, W.; Engstrand, L. "A comparison of approaches to scaffolding multiple regions along the 16S rRNA gene for improved resolution." Preprint in BioRxiv. doi: 10.1101/2021.03.23.436606

- [SMURF](https://doi.org/10.1186/s40168-017-0396-x)

  > Fuks, G.; Elgart, M.; Amir, A.; Zeisel, A.; Turnbaugh, P.J., Soen, Y.; and Shental, N. (2018). "Combining 16S rRNA gene variable regions enables high-resolution microbial community profiling." Microbiome. 6: 17. doi: 10.1186/s40168-017-0396-x

- [RESCRIPt](https://doi.org/10.1371/journal.pcbi.1009581)

  > Robeson MS 2nd, O'Rourke DR, Kaehler BD, Ziemski M, Dillon MR, Foster JT, Bokulich NA. RESCRIPt: Reproducible sequence taxonomy reference database management. PLoS Comput Biol. 2021 Nov 8;17(11):e1009581. doi: 10.1371/journal.pcbi.1009581. PMID: 34748542; PMCID: PMC8601625.

- [SEPP](https://doi.org/10.1128/msystems.00021-18)

  > Janssen S, McDonald D, Gonzalez A, Navas-Molina JA, Jiang L, Xu ZZ, Winker K, Kado DM, Orwoll E, Manary M, Mirarab S, Knight R. Phylogenetic Placement of Exact Amplicon Sequences Improves Associations with Clinical Information. mSystems. 2018 Apr 17;3(3):e00021-18. doi: 10.1128/mSystems.00021-18. PMID: 29719869; PMCID: PMC5904434.

### Downstream analysis

- [QIIME2](https://pubmed.ncbi.nlm.nih.gov/31341288/)

  > Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, Alexander H, Alm EJ, Arumugam M, Asnicar F, Bai Y, Bisanz JE, Bittinger K, Brejnrod A, Brislawn CJ, Brown CT, Callahan BJ, Caraballo-Rodríguez AM, Chase J, Cope EK, Da Silva R, Diener C, Dorrestein PC, Douglas GM, Durall DM, Duvallet C, Edwardson CF, Ernst M, Estaki M, Fouquier J, Gauglitz JM, Gibbons SM, Gibson DL, Gonzalez A, Gorlick K, Guo J, Hillmann B, Holmes S, Holste H, Huttenhower C, Huttley GA, Janssen S, Jarmusch AK, Jiang L, Kaehler BD, Kang KB, Keefe CR, Keim P, Kelley ST, Knights D, Koester I, Kosciolek T, Kreps J, Langille MGI, Lee J, Ley R, Liu YX, Loftfield E, Lozupone C, Maher M, Marotz C, Martin BD, McDonald D, McIver LJ, Melnik AV, Metcalf JL, Morgan SC, Morton JT, Naimey AT, Navas-Molina JA, Nothias LF, Orchanian SB, Pearson T, Peoples SL, Petras D, Preuss ML, Pruesse E, Rasmussen LB, Rivers A, Robeson MS 2nd, Rosenthal P, Segata N, Shaffer M, Shiffer A, Sinha R, Song SJ, Spear JR, Swafford AD, Thompson LR, Torres PJ, Trinh P, Tripathi A, Turnbaugh PJ, Ul-Hasan S, van der Hooft JJJ, Vargas F, Vázquez-Baeza Y, Vogtmann E, von Hippel M, Walters W, Wan Y, Wang M, Warren J, Weber KC, Williamson CHD, Willis AD, Xu ZZ, Zaneveld JR, Zhang Y, Zhu Q, Knight R, Caporaso JG. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol. 2019 Aug;37(8):852-857. doi: 10.1038/s41587-019-0209-9. Erratum in: Nat Biotechnol. 2019 Sep;37(9):1091. PMID: 31341288; PMCID: PMC7015180.

- [MAFFT](https://pubmed.ncbi.nlm.nih.gov/23329690/)

  > Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013 Apr;30(4):772-80. doi: 10.1093/molbev/mst010. Epub 2013 Jan 16. PMID: 23329690; PMCID: PMC3603318.

- [ANCOM](https://pubmed.ncbi.nlm.nih.gov/26028277/)

  > Mandal S, Van Treuren W, White RA, Eggesbø M, Knight R, Peddada SD. Analysis of composition of microbiomes: a novel method for studying microbial composition. Microb Ecol Health Dis. 2015 May 29;26:27663. doi: 10.3402/mehd.v26.27663. PMID: 26028277; PMCID: PMC4450248.

- [ANCOM-BC](https://pubmed.ncbi.nlm.nih.gov/32665548/)

  > Lin H, Peddada SD. Analysis of compositions of microbiomes with bias correction. Nat Commun. 2020 Jul 14;11(1):3514. doi: 10.1038/s41467-020-17041-7. PMID: 32665548; PMCID: PMC7360769.

- [Adonis](https://doi.org/10.1111/j.1442-9993.2001.01070.pp.x) and [VEGAN](https://CRAN.R-project.org/package=vegan)

  > Marti J Anderson. A new method for non-parametric multivariate analysis of variance. Austral ecology, 26(1):32–46, 2001.

  > Jari Oksanen, F. Guillaume Blanchet, Michael Friendly, Roeland Kindt, Pierre Legendre, Dan McGlinn, Peter R. Minchin, R. B. O’Hara, Gavin L. Simpson, Peter Solymos, M. Henry H. Stevens, Eduard Szoecs, and Helene Wagner. vegan: Community Ecology Package. 2018. R package version 2.5-3.

- [Phyloseq](https://doi.org/10.1371/journal.pone.0061217)

  > McMurdie PJ, Holmes S (2013). “phyloseq: An R package for reproducible interactive analysis and graphics of microbiome census data.” PLoS ONE, 8(4), e61217.

- [TreeSummarizedExperiment](https://doi.org/10.12688/f1000research.26669.2)

  > Huang R, Soneson C, Ernst FGM et al. TreeSummarizedExperiment: a S4 class for data with hierarchical structure [version 2; peer review: 3 approved]. F1000Research 2021, 9:1246.

### Non-default tools

- [ITSx](https://besjournals.onlinelibrary.wiley.com/doi/10.1111/2041-210X.12073)

  > Bengtsson-Palme, J., Ryberg, M., Hartmann, M., Branco, S., Wang, Z., Godhe, A., De Wit, P., Sánchez-García, M., Ebersberger, I., de Sousa, F., Amend, A., Jumpponen, A., Unterseher, M., Kristiansson, E., Abarenkov, K., Bertrand, Y.J.K., Sanli, K., Eriksson, K.M., Vik, U., Veldre, V. and Nilsson, R.H.. Improved software detection and extraction of ITS1 and ITS2 from ribosomal ITS sequences of fungi and other eukaryotes for analysis of environmental sequencing data. Methods Ecol Evol 2013, 4: 914-919. doi: 10.1111/2041-210X.12073.

- [PICRUSt2](https://pubmed.ncbi.nlm.nih.gov/32483366/)

  > Douglas GM, Maffei VJ, Zaneveld JR, Yurgel SN, Brown JR, Taylor CM, Huttenhower C, Langille MGI. PICRUSt2 for prediction of metagenome functions. Nat Biotechnol. 2020 Jun;38(6):685-688. doi: 10.1038/s41587-020-0548-6. PMID: 32483366; PMCID: PMC7365738.

- PICRUSt2 is by default using [EPA-ng](https://pubmed.ncbi.nlm.nih.gov/30165689/)

  > Barbera P, Kozlov AM, Czech L, Morel B, Darriba D, Flouri T, Stamatakis A. EPA-ng: Massively Parallel Evolutionary Placement of Genetic Sequences. Syst Biol. 2019 Mar 1;68(2):365-369. doi: 10.1093/sysbio/syy054. PMID: 30165689; PMCID: PMC6368480.

- PICRUSt2 is by default using [MinPath](https://pubmed.ncbi.nlm.nih.gov/19680427/)

  > Ye Y, Doak TG. A parsimony approach to biological pathway reconstruction/inference for genomes and metagenomes. PLoS Comput Biol. 2009 Aug;5(8):e1000465. doi: 10.1371/journal.pcbi.1000465. Epub 2009 Aug 14. PMID: 19680427; PMCID: PMC2714467.

- [VSEARCH](https://peerj.com/articles/2584/)

  > Rognes T, Flouri T, Nichols B, Quince C, Mahé F. VSEARCH: a versatile open source tool for metagenomics. PeerJ. 2016 4:e2584. doi: 10.7717/peerj.2584

- VSEARCH option usearch_global implements the [USEARCH](https://doi.org/10.1093/bioinformatics/btq461) algorithm

  > Edgar RC. Search and clustering orders of magnitude faster than BLAST. Bioinformatics. 2010 26(19) 2460-2461

- VSEARCH option sintax implements the [SINTAX](https://doi.org/10.1101/074161) algorithm

  > Edgar RC. (2016) SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS sequences, BioRxiv, 074161. Preprint.

- [Kraken2](https://pubmed.ncbi.nlm.nih.gov/31779668/)

  > Wood, D. E., Lu, J., & Langmead, B. (2019). Improved metagenomic analysis with Kraken 2. Genome biology, 20(1), 257. https://doi.org/10.1186/s13059-019-1891-0

### Summarizing software

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

## Data

- [Full-size test data](https://doi.org/10.3389/fmicb.2020.550420)
  > Straub D, Blackwell N, Langarica-Fuentes A, Peltzer A, Nahnsen S, Kleindienst S. Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline. Front Microbiol. 2020 Oct 23;11:550420. doi: 10.3389/fmicb.2020.550420. PMID: 33193131; PMCID: PMC7645116.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Create event: 14
  • Release event: 3
  • Issues event: 86
  • Watch event: 26
  • Delete event: 16
  • Issue comment event: 286
  • Push event: 64
  • Pull request review event: 139
  • Pull request review comment event: 86
  • Pull request event: 124
  • Fork event: 28
Last Year
  • Create event: 14
  • Release event: 3
  • Issues event: 86
  • Watch event: 26
  • Delete event: 16
  • Issue comment event: 286
  • Push event: 64
  • Pull request review event: 139
  • Pull request review comment event: 86
  • Pull request event: 124
  • Fork event: 28

Committers

Last synced: about 2 years ago

All Time
  • Total Commits: 1,634
  • Total Committers: 39
  • Avg Commits per committer: 41.897
  • Development Distribution Score (DDS): 0.618
Past Year
  • Commits: 509
  • Committers: 15
  • Avg Commits per committer: 33.933
  • Development Distribution Score (DDS): 0.534
Top Committers
Name Email Commits
d4straub d****b@u****e 624
daniel d****b@g****m 237
jtangrot j****t@u****e 223
Alexander Peltzer a****r@g****m 110
Daniel Straub 4****b 83
Daniel Lundin m****s@g****m 57
Alexander Peltzer a****r@u****e 36
DiegoBrambilla d****a@l****e 32
Emelie Nilsso e****n@l****e 26
Daniel Lundin e****l@g****m 24
Sateesh Peri p****h@g****m 24
nf-core-bot c****e@n****e 24
johnne j****h@s****e 14
DiegoBrambilla 3****a 13
lokeshbio l****t@g****m 13
Till Englert t****6@g****m 13
Adam Bennett a****t@m****g 12
maxulysse m****a@g****m 12
Asaf Peer a****r@g****m 10
Alexander Peltzer a****r 8
Adam Bennett 4****0 7
Sam Minot s****t@f****g 5
Daniel Lundin d****n@l****e 4
Venkat Malladi v****i@m****m 3
PhilPalmer p****l@g****m 2
drpatelh d****l@g****m 2
Colin Davenport c****n@h****m 2
Asaf Peer a****p@m****m 2
Phil Ewels p****s@s****e 2
dariader d****b@g****m 1
and 9 more...

Issues and Pull Requests

Last synced: 4 months ago

All Time
  • Total issues: 101
  • Total pull requests: 146
  • Average time to close issues: 5 months
  • Average time to close pull requests: 8 days
  • Total issue authors: 55
  • Total pull request authors: 25
  • Average comments per issue: 4.17
  • Average comments per pull request: 2.12
  • Merged pull requests: 118
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 30
  • Pull requests: 55
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 11 days
  • Issue authors: 17
  • Pull request authors: 18
  • Average comments per issue: 2.27
  • Average comments per pull request: 1.42
  • Merged pull requests: 34
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • d4straub (24)
  • erikrikarddaniel (15)
  • a4000 (5)
  • marwa38 (4)
  • andand (3)
  • jtangrot (3)
  • agrier-wcm (3)
  • Sofokli5 (3)
  • skose82 (3)
  • ulrichmarkus (2)
  • Dedaniya08 (2)
  • pragermh (2)
  • sgaleraalq (2)
  • lch14forever (2)
  • enriiquee (2)
Pull Request Authors
  • d4straub (132)
  • nf-core-bot (17)
  • jtangrot (14)
  • erikrikarddaniel (7)
  • lokeshbio (3)
  • danilodileo (3)
  • dariader (2)
  • weber8thomas (2)
  • nhenry50 (2)
  • Sofokli5 (2)
  • vaulot (2)
  • sateeshperi (2)
  • a4000 (2)
  • sminot (2)
  • tillenglert (2)
Top Labels
Issue Labels
bug (67) enhancement (56) documentation (2) unconfirmed (1) not urgent (1)
Pull Request Labels
WIP (1) enhancement (1) bug (1) documentation (1)

Dependencies

.github/workflows/awsfulltest.yml actions
  • actions/upload-artifact v3 composite
  • nf-core/tower-action v3 composite
.github/workflows/awstest.yml actions
  • actions/upload-artifact v3 composite
  • nf-core/tower-action v3 composite
.github/workflows/branch.yml actions
  • mshick/add-pr-comment v1 composite
.github/workflows/ci.yml actions
  • actions/checkout v3 composite
  • mikepenz/action-junit-report v3 composite
  • nf-core/setup-nextflow v1 composite
.github/workflows/clean-up.yml actions
  • actions/stale v7 composite
.github/workflows/fix-linting.yml actions
  • actions/checkout v3 composite
  • actions/setup-node v3 composite
.github/workflows/linting.yml actions
  • actions/checkout v3 composite
  • actions/setup-node v3 composite
  • actions/setup-python v4 composite
  • actions/upload-artifact v3 composite
  • mshick/add-pr-comment v1 composite
  • nf-core/setup-nextflow v1 composite
  • psf/black stable composite
.github/workflows/linting_comment.yml actions
  • dawidd6/action-download-artifact v2 composite
  • marocchino/sticky-pull-request-comment v2 composite
modules/nf-core/custom/dumpsoftwareversions/meta.yml cpan
modules/nf-core/cutadapt/meta.yml cpan
modules/nf-core/epang/place/meta.yml cpan
modules/nf-core/epang/split/meta.yml cpan
modules/nf-core/fastqc/meta.yml cpan
modules/nf-core/gappa/examineassign/meta.yml cpan
modules/nf-core/gappa/examinegraft/meta.yml cpan
modules/nf-core/gappa/examineheattree/meta.yml cpan
modules/nf-core/hmmer/eslalimask/meta.yml cpan
modules/nf-core/hmmer/eslreformat/meta.yml cpan
modules/nf-core/hmmer/hmmalign/meta.yml cpan
modules/nf-core/hmmer/hmmbuild/meta.yml cpan
modules/nf-core/mafft/meta.yml cpan
modules/nf-core/multiqc/meta.yml cpan
modules/nf-core/vsearch/sintax/meta.yml cpan
modules/nf-core/vsearch/usearchglobal/meta.yml cpan
subworkflows/nf-core/fasta_newick_epang_gappa/meta.yml cpan
pyproject.toml pypi