tada

TADA - Targeted Amplicon Diversity Analysis - a DADA2-focused Nextflow workflow for any targeted amplicon region

https://github.com/h3abionet/tada

Science Score: 77.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
✓
.zenodo.json file
Found .zenodo.json file
✓
DOI references
Found 3 DOI reference(s) in README
✓
Academic publication links
Links to: ncbi.nlm.nih.gov, zenodo.org
✓
Committers with academic emails
6 of 13 committers (46.2%) from academic institutions
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (10.9%) to scientific vocabulary

Last synced: 6 months ago

Repository

Basic Info

Host: GitHub
Owner: h3abionet
License: mit
Language: Nextflow
Default Branch: main
Homepage:
Size: 2.37 MB

Statistics

Stars: 22
Watchers: 9
Forks: 14
Open Issues: 33
Releases: 1

Created over 6 years ago · Last pushed 6 months ago

Metadata Files

Readme License Citation

TADA - Targeted Amplicon Diversity Analysis using DADA2, implemented in Nextflow

NOTE: We are working on a DSL2 implementation using the nf-core tools on separate branches. Because of some differences in focus, we don't currently anticipate combining this with the nf-core ampliseq workflow, though we may revisit this in the future. In the meantime, we will continue to address critical bugs on this branch, but the majority of effort will go into converting the workflow to DSL2.

A dada2-based workflow using the Nextflow workflow manager for Targeted Amplicon Diversity Analysis.

Badges

| fair-software.nl recommendations | |
| ------------------------------------------------------- | --------------------------- |
| (1/5) code repository | |
| (2/5) license | |
| (3/5) community registry | bio.tools Registry |
| (4/5) citation | |
| (5/5) checklist | |
| overall | |
| GitHub Actions | |
| Docker build | |
| Continuous integration | |

Basic usage:

The latest help menu can be accessed using `nextflow run h3abionet/TADA --help`.

```
Usage:

This pipeline can be run specifying parameters in a config file or with command line flags. The typical example for running the pipeline with command line flags is as follows:

nextflow run h3abionet/TADA --reads '*_R{1,2}.fastq.gz' --trimFor 24 --trimRev 25 \
  --reference 'gg_13_8_train_set_97.fa.gz' -profile uct_hex

The typical command for running the pipeline with your own config (instead of command line flags) is as follows:

nextflow run h3abionet/TADA -c dada2_user_input.config -profile uct_hex

where 'dada2_user_input.config' is the configuration file (see example 'dada2_user_input.config')

NB: '-profile uct_hex' still needs to be specified from the command line

Parameters

Mandatory arguments:
  -profile                      Hardware config to use. Currently profile available for UCT's HPC 'uct_hex' and UIUC's 'uiuc_singularity' - create your own if necessary
                                NB -profile should always be specified on the command line, not in the config file

Input (mandatory): Additionally, only one of the following must be specified:
  --reads                       Path to FASTQ read input data. If the data are single-end, set '--single-end' to true.
  --input                       Path to a sample sheet (CSV); the sample sheet must have a header row with the columns 'id,fastq1,fastq2'.
  --seqTables                   Path to input R/dada2 sequence tables. Only sequence tables with the original ASV sequences as the identifier are supported

Output location:
  --outdir                      The output directory where the results will be saved

Read preparation parameters:
  --trimFor                     integer. Headcrop of read1 (set 0 if no trimming is needed)
  --trimRev                     integer. Headcrop of read2 (set 0 if no trimming is needed)
  --truncFor                    integer. Truncate read1 here (i.e. if you want to trim 10bp off the end of a 250bp R1, truncFor should be set to 240). Enforced before trimFor/trimRev
  --truncRev                    integer. Truncate read2 here (i.e. if you want to trim 10bp off the end of a 250bp R2, truncRev should be set to 240). Enforced before trimFor/trimRev
  --maxEEFor                    integer. After truncation, R1 reads with higher than maxEE "expected errors" will be discarded. EE = sum(10^(-Q/10)), default=2
  --maxEERev                    integer. After truncation, R2 reads with higher than maxEE "expected errors" will be discarded. EE = sum(10^(-Q/10)), default=2
  --truncQ                      integer. Truncate reads at the first instance of a quality score less than or equal to truncQ; default=2
  --maxN                        integer. Discard reads with more than maxN number of Ns in read; default=0
  --maxLen                      integer. Maximum length of trimmed sequence; maxLen is enforced before trimming and truncation; default=Inf (no maximum)
  --minLen                      integer. Minimum length enforced after trimming and truncation; default=50
  --rmPhiX                      {"T","F"}. remove PhiX from read

In addition, due to modifications needed for variable-length sequences (ITS), the following are also supported. Note that if these are set,
one should leave '--trimFor/--trimRev' set to 0.

  --fwdprimer                   Provided when sequence-specific trimming is required (e.g. ITS sequences using cutadapt).  Experimental
  --revprimer                   Provided when sequence-specific trimming is required (e.g. ITS sequences using cutadapt).  Experimental

Read merging:
  --minOverlap                  integer. minimum length of the overlap required for merging R1 and R2; default=20 (dada2 package default=12)
  --maxMismatch                 integer. The maximum mismatches allowed in the overlap region; default=0
  --trimOverhang                {"T","F"}. If "T" (true), "overhangs" in the alignment between R1 and R2 are trimmed off.
                                "Overhangs" are when R2 extends past the start of R1, and vice-versa, as can happen when reads are longer than the amplicon and read into the other-direction primer region. Default="F" (false)

Error models:
  --qualityBinning              Binned quality correction (e.g. NovaSeq/NextSeq). default: false
  --errorModel                  NYI. Error model to use (one of 'illumina', 'illumina-binned', 'pacbio-ccs', 'custom'). This will replace '--qualityBinning'

Denoising using dada:
  --dadaOpt.XXX                 Set as e.g. --dadaOpt.HOMOPOLYMER_GAP_PENALTY=-1
                                Global defaults for the dada function, see ?setDadaOpt in R for available options and their defaults
  --pool                        Should sample pooling be used to aid identification of low-abundance ASVs? Options are pseudo pooling: "pseudo", true: "T", false: "F"

Merging arguments (optional):
  --minOverlap                  The minimum length of the overlap required for merging R1 and R2; default=20 (dada2 package default=12)
  --maxMismatch                 The maximum mismatches allowed in the overlap region; default=0.
  --trimOverhang                If "T" (true), "overhangs" in the alignment between R1 and R2 are trimmed off.
                                "Overhangs" are when R2 extends past the start of R1, and vice-versa, as can happen when reads are longer than the amplicon and read into the other-direction primer region. Default="F" (false)
  --minMergedLen                Minimum length of fragment after merging; default = 0 (no minimum)
  --maxMergedLen                Maximum length of fragment after merging; default = 0 (no maximum)

ASV identifiers:
  --idType                      The ASV IDs are renamed to simplify downstream analysis, in particular with downstream tools. The default is "md5", which will run MD5 on the sequence and generate a QIIME2-like unique hash.
                                Alternatively, this can be set to "ASV", which simply renames the sequences in sequential order.

Taxonomic arguments. If unset, taxonomic assignment is skipped
  --taxassignment               Taxonomic assignment method. default = 'rdp'
  --reference                   Path to taxonomic database to be used for annotation (e.g. gg_13_8_train_set_97.fa.gz). default = false
  --species                     Specify path to fasta file. See dada2 addSpecies() for more detail. default = false
  --minBoot                     Minimum bootstrap value. default = 50
  --taxLevels                   Listing of taxonomic levels for 'assignTaxonomy'. Experimental.

Chimera detection:
  --skipChimeraDetection        Skip chimera detection/removal; default = false
  --removeBimeraDenovoOpts      Additional removeBimeraDenovo options; default = ''

ASV multiple sequence alignment:
  --skipAlignment               Skip alignment step; note this also skips ML phylogenetic analysis. default = false
  --aligner                     Aligner to use, options are 'DECIPHER' or 'infernal'. default = 'DECIPHER'
  --infernalCM                  Covariance model (Rfam-compliant) to use. default = false.

Phylogenetic analysis:
  --runTree                     Tool for ML phylogenetic analysis. Options are 'phangorn' and 'fasttree'. default = 'phangorn'

Additional output:
  --toBIOM                      Generate a BIOM v1 compliant output. default = true
  --toQIIME2                    Generate QZA artifacts for all data for use in QIIME2. default = false

Sample names:
  --sampleRegex                 Modify sample names based on a regular expression. default = false. Note this option is deprecated in favor of using a sample sheet.

Additional options:
  --email                       Set this parameter to your e-mail address to get a summary e-mail with details of the run sent to you when the workflow exits
  -name                         Name for the pipeline run. If not specified, Nextflow will automatically generate a random mnemonic.

Help:
  --help                        Will print out summary above when executing nextflow run uct-cbio/16S-rDNA-dada2-pipeline
```
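Two of the calculations mentioned in the help text can be checked by hand. The sketch below is not code from the workflow: it is a minimal Python illustration of the "expected errors" formula quoted for `--maxEEFor`/`--maxEERev`, EE = sum(10^(-Q/10)), and of an MD5-based identifier like the one described for `--idType`. The function names are hypothetical, and the assumption that the sequence is hashed as uppercase ASCII text is ours, not something the help menu states.

```python
import hashlib


def expected_errors(phred_scores):
    """EE = sum(10^(-Q/10)) over the Phred quality scores of one read."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_max_ee(phred_scores, max_ee=2):
    """Keep a read unless its expected errors are higher than max_ee."""
    return expected_errors(phred_scores) <= max_ee


def md5_asv_id(sequence):
    """Hex MD5 digest of an ASV sequence.

    Assumption for this sketch: the sequence is uppercased and encoded
    as ASCII before hashing.
    """
    return hashlib.md5(sequence.upper().encode("ascii")).hexdigest()


if __name__ == "__main__":
    good_read = [30] * 240  # 240 bases at Q30, about 0.24 expected errors
    poor_read = [10] * 240  # 240 bases at Q10, about 24 expected errors
    for name, quals in (("good", good_read), ("poor", poor_read)):
        print(name, round(expected_errors(quals), 3), passes_max_ee(quals))
    print(md5_asv_id("ACGTACGT"))
```

With the default of 2, a 240 bp read at a uniform Q30 is kept, while the same read at Q10 is discarded, which is why truncating low-quality tails with `--truncFor`/`--truncRev` before the EE filter usually retains more reads.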

Prerequisites

Nextflow (>=20.11.0) with either Singularity (>3.4.1) or Docker (>20.10.1, though we recommend the latest based on security updates). We don't directly support non-containerized options (locally installed tools, bioconda) though these could work based on configuration.
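The version bounds above mix inclusive (`>=`) and strict (`>`) comparisons, which is easy to get wrong when checking an environment by eye. As a rough sketch only, the Python below compares version strings against those bounds. The helper names and the sample version strings are hypothetical, and the workflow itself does not ship such a check.

```python
import re


def parse_version(text):
    """Extract the first dotted numeric version in text as a tuple of ints."""
    match = re.search(r"\d+(?:\.\d+)+", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(found, minimum, inclusive=True):
    """Compare two version strings; inclusive=False means strictly newer."""
    found_v, min_v = parse_version(found), parse_version(minimum)
    return found_v >= min_v if inclusive else found_v > min_v


if __name__ == "__main__":
    # Illustrative strings only; substitute the output of each tool's
    # own version command on your system.
    checks = [
        ("nextflow version 21.04.3.5560", "20.11.0", True),        # >=20.11.0
        ("singularity version 3.7.1", "3.4.1", False),             # >3.4.1
        ("Docker version 20.10.7, build f0df350", "20.10.1", False),  # >20.10.1
    ]
    for found, minimum, inclusive in checks:
        print(found, "->", meets_minimum(found, minimum, inclusive))
```

Only one of Singularity or Docker needs to satisfy its bound, since the workflow requires Nextflow plus either container runtime.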

Documentation

The h3abionet/TADA pipeline comes with documentation about the pipeline, found in the docs/ directory:

Built With

Credits

The initial implementation of the DADA2 pipeline as a Nextflow workflow (https://github.com/HPCBio/16S-rDNA-dada2-pipeline) was done by Chris Fields from the High Performance Computing in Biology group at the University of Illinois (http://www.hpcbio.illinois.edu). Please remember to cite the authors of DADA2 when using this pipeline. Further development of the Nextflow workflow, and containerisation in Docker and Singularity for implementation on UCT's HPC, was done by Dr Katie Lennard and Gerrit Botha, with inspiration and code snippets from Phil Ewels (http://nf-co.re/).

Contributors

The following have contributed to the development, testing, and deployment of this workflow. For the most up-to-date listing see the Contributors link.

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Owner

Name: H3ABioNet
Login: h3abionet
Kind: organization

Website: https://www.h3abionet.org/
Twitter: H3ABioNet
Repositories: 37
Profile: https://github.com/h3abionet

Citation (CITATION.bib)

@article{10.1371/journal.pcbi.1008640,
    doi = {10.1371/journal.pcbi.1008640},
    author = {Ras, Verena AND Botha, Gerrit AND Aron, Shaun AND Lennard, Katie AND Allali, Imane AND Claassen-Weitz, Shantelle AND Mwaikono, Kilaza Samson AND Kennedy, Dane AND Holmes, Jessica R. AND Rendon, Gloria AND Panji, Sumir AND Fields, Christopher J AND Mulder, Nicola},
    journal = {PLOS Computational Biology},
    publisher = {Public Library of Science},
    title = {Using a multiple-delivery-mode training approach to develop local capacity and infrastructure for advanced bioinformatics in Africa},
    year = {2021},
    month = {02},
    volume = {17},
    url = {https://doi.org/10.1371/journal.pcbi.1008640},
    pages = {1-11},
    abstract = {With more microbiome studies being conducted by African-based research groups, there is an increasing demand for knowledge and skills in the design and analysis of microbiome studies and data. However, high-quality bioinformatics courses are often impeded by differences in computational environments, complicated software stacks, numerous dependencies, and versions of bioinformatics tools along with a lack of local computational infrastructure and expertise. To address this, H3ABioNet developed a 16S rRNA Microbiome Intermediate Bioinformatics Training course, extending its remote classroom model. The course was developed alongside experienced microbiome researchers, bioinformaticians, and systems administrators, who identified key topics to address. Development of containerised workflows has previously been undertaken by H3ABioNet, and Singularity containers were used here to enable the deployment of a standard replicable software stack across different hosting sites. The pilot ran successfully in 2019 across 23 sites registered in 11 African countries, with more than 200 participants formally enrolled and 106 volunteer staff for onsite support. The pulling, running, and testing of the containers, software, and analyses on various clusters were performed prior to the start of the course by hosting classrooms. The containers allowed the replication of analyses and results across all participating classrooms running a cluster and remained available posttraining ensuring analyses could be repeated on real data. Participants thus received the opportunity to analyse their own data, while local staff were trained and supported by experienced experts, increasing local capacity for ongoing research support. 
This provides a model for delivering topic-specific bioinformatics courses across Africa and other remote/low-resourced regions which overcomes barriers such as inadequate infrastructures, geographical distance, and access to expertise and educational materials.},
    number = {2},

}

GitHub Events

Total

Issues event: 27
Watch event: 2
Issue comment event: 19
Push event: 97
Create event: 5

Last Year

Issues event: 27
Watch event: 2
Issue comment event: 19
Push event: 97
Create event: 5

Committers

Last synced: over 2 years ago

All Time

Total Commits: 594
Total Committers: 13
Avg Commits per committer: 45.692
Development Distribution Score (DDS): 0.556

Past Year

Commits: 35
Committers: 3
Avg Commits per committer: 11.667
Development Distribution Score (DDS): 0.114

Top Committers

Name	Email	Commits
Chris Fields	c**s@i**u	264
Katie Lennard	k**n@g**m	229
Gerrit Botha	g**a@g**m	59
Chris Fields	c****s	15
Jessica Holmes	k**7@g**m	11
Wojtek Bazant	w**t@v**u	4
Gerrit Botha	g**a@u**a	2
Gloria Rendon	g**n@i**u	2
Paolo Di Tommaso	p**o@g**m	2
Wojtek Bażant	w**i@g**m	2
Ziyaad Parker	z**r@y**k	2
]	k**e@d**a	1
Lindsay Clark	l**k@i**u	1

Committer Domains (Top 20 + Academic)

illinois.edu: 3
dev-igisoro.cbio.uct.ac.za: 1
yahoo.co.uk: 1
uct.ac.za: 1
vet.upenn.edu: 1

Issues and Pull Requests

Last synced: 6 months ago

All Time

Total issues: 21
Total pull requests: 1
Average time to close issues: over 1 year
Average time to close pull requests: N/A
Total issue authors: 2
Total pull request authors: 1
Average comments per issue: 0.33
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 17
Pull requests: 1
Average time to close issues: about 22 hours
Average time to close pull requests: N/A
Issue authors: 2
Pull request authors: 1
Average comments per issue: 0.12
Average comments per pull request: 0.0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0


Top Authors

Issue Authors

cjfields (20)
NeginValizadegan (2)

Pull Request Authors

cjfields (1)
Charmy0619 (1)

Top Labels

Issue Labels

DSL2 (5)
enhancement (2)
good first issue (2)
bug (1)
high priority (1)
documentation (1)
wontfix (1)
needs documentation (1)

Pull Request Labels

DSL2 (1)

Dependencies

.github/workflows/ci.yml actions

actions/checkout v2 composite
actions/setup-java v2 composite

.github/workflows/docker.yml actions

actions/checkout v2 composite
docker/build-push-action ad44023a93711e3deb337508980b4b5e9bcdc5dc composite
docker/login-action 28218f9b04b4f3f62068d7b6ce6ca5b26e35336c composite
docker/metadata-action 98669ae865ea3cffbcbaa878cf57c20bbf1c6c38 composite

dockerfiles/R/Dockerfile docker

rocker/verse 4.1.1 build

dockerfiles/qiime2/Dockerfile docker

qiime2/core latest build
