assemblebac

avantonder/assembleBAC is a bioinformatics best-practice analysis pipeline for assembling and annotating bacterial genomes.

https://github.com/avantonder/assemblebac

Science Score: 67.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 5 DOI reference(s) in README
  • Academic publication links
    Links to: zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.1%) to scientific vocabulary

Keywords

assembly bacteria bacterial-genomes genomics illumina nextflow nextflow-pipeline sequencing

Repository


Basic Info
  • Host: GitHub
  • Owner: avantonder
  • License: mit
  • Language: Nextflow
  • Default Branch: main
  • Homepage:
  • Size: 4.08 MB
Statistics
  • Stars: 1
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 7
Created over 4 years ago · Last pushed 5 months ago
Metadata Files
Readme Changelog License Code of conduct Citation

README.md

avantonder/assembleBAC

avantonder/assembleBAC-ONT


Introduction

avantonder/assembleBAC is a bioinformatics best-practice analysis pipeline for assembling and annotating bacterial genomes. It also predicts the Sequence Type (ST) and provides QC metrics with Quast and CheckM2.

  1. de novo genome assembly (Shovill).
  2. Sequence Type assignment (mlst)
  3. Annotation (Bakta)
  4. Assembly metrics (Quast)
  5. Assembly completeness (CheckM2)
  6. Assembly metrics, annotation and pipeline information (MultiQC)

Usage

> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

You will need to download the Bakta light database (Bakta version 1.10.4 is required to run the amrfinder_update command):

```bash
wget https://zenodo.org/record/7669534/files/db-light.tar.gz
tar -xzf db-light.tar.gz
rm db-light.tar.gz
amrfinder_update --force_update --database db-light/amrfinderplus-db/
```
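Before pointing the pipeline at the extracted database, it can help to confirm that the directory layout is what the commands above produce. The following is a minimal sketch, not part of the pipeline: `check_bakta_db` is a hypothetical helper name, and the only assumption is that the extracted directory contains the `amrfinderplus-db` subdirectory used in the `amrfinder_update` call.

```bash
# Hypothetical helper (not shipped with assembleBAC): check that a Bakta
# database directory contains the amrfinderplus-db subdirectory that the
# amrfinder_update command above writes to.
check_bakta_db() {
    local db_dir="$1"
    if [ ! -d "$db_dir" ]; then
        echo "missing database directory: $db_dir" >&2
        return 1
    fi
    if [ ! -d "$db_dir/amrfinderplus-db" ]; then
        echo "missing AMRFinderPlus directory: $db_dir/amrfinderplus-db" >&2
        return 1
    fi
    echo "ok: $db_dir"
}

# Example: check_bakta_db db-light
```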

Additionally, you will need to download the CheckM2 database (CheckM2 is required):

```bash
checkm2 database --download --path path/to/checkm2db
```
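The run command further down passes a `.dmnd` file to `--checkm2db` rather than a directory, so it is useful to locate that file once the download finishes. A minimal sketch, assuming the DIAMOND database sits somewhere under the download path; `find_checkm2_dmnd` is a hypothetical helper name, not a CheckM2 command.

```bash
# Hypothetical helper: print the first DIAMOND (.dmnd) database file found
# under a CheckM2 database directory, for use as the --checkm2db value.
find_checkm2_dmnd() {
    local db_dir="$1"
    local hit
    hit=$(find "$db_dir" -type f -name '*.dmnd' 2>/dev/null | sort | head -n 1)
    if [ -z "$hit" ]; then
        echo "no .dmnd file found under $db_dir" >&2
        return 1
    fi
    printf '%s\n' "$hit"
}

# Example: find_checkm2_dmnd path/to/checkm2db
```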

An executable Python script called fastq_dir_to_samplesheet.py is provided to create an input samplesheet automatically from a directory of FastQ files before you run the pipeline (requires Python 3 installed locally), e.g.

```bash
wget -L https://github.com/avantonder/assembleBAC/blob/main/assets/fastq_dir_to_samplesheet.py

python fastq_dir_to_samplesheet.py <FASTQ_DIR> \
    samplesheet.csv \
    -r1 \
    -r2
```

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2
SAMPLE_PAIRED_END,/path/to/fastq/files/sample1_1.fastq.gz,/path/to/fastq/files/sample1_2.fastq.gz
SAMPLE_SINGLE_END,/path/to/fastq/files/sample2.fastq.gz,
```
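To make the samplesheet layout concrete, here is a minimal shell sketch that writes the same three columns from a directory of reads. It is not the provided Python script and may differ from it in detail: `make_samplesheet` is a hypothetical name, and the sketch assumes files are named `<sample>_1.fastq.gz` and `<sample>_2.fastq.gz`, as in the example paths above. A `_1` file without a matching `_2` file is written as a single-end row with an empty `fastq_2` field.

```bash
# Hypothetical sketch (not the pipeline's fastq_dir_to_samplesheet.py):
# print a samplesheet for the FastQ files in a directory.
# Assumed naming: <sample>_1.fastq.gz and <sample>_2.fastq.gz.
make_samplesheet() {
    local fastq_dir="$1"
    local r1 r2 sample
    echo "sample,fastq_1,fastq_2"
    for r1 in "$fastq_dir"/*_1.fastq.gz; do
        [ -e "$r1" ] || continue              # no matches: skip the literal glob
        sample=$(basename "$r1" _1.fastq.gz)  # strip the read-1 suffix
        r2="$fastq_dir/${sample}_2.fastq.gz"
        if [ -e "$r2" ]; then
            echo "$sample,$r1,$r2"
        else
            echo "$sample,$r1,"               # single-end: empty fastq_2
        fi
    done
}

# Example: make_samplesheet /path/to/fastq/files > samplesheet.csv
```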

Alternatively, the samplesheet.csv file created by nf-core/fetchngs can also be used.
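Whichever way the samplesheet is produced, a quick header check can catch a mismatched file before the pipeline starts. The sketch below only confirms that the three column names from the example samplesheet are present, and it allows extra columns. `has_required_columns` is a hypothetical helper; stripping double quotes from the header is an assumption made so that quoted CSV headers are also accepted.

```bash
# Hypothetical helper: confirm that a samplesheet header contains the
# sample, fastq_1 and fastq_2 columns. Extra columns are allowed.
has_required_columns() {
    local sheet="$1"
    local header col
    # Drop carriage returns and double quotes so quoted headers also match.
    header=$(head -n 1 "$sheet" | tr -d '\r"')
    for col in sample fastq_1 fastq_2; do
        case ",$header," in
            *",$col,"*) ;;
            *) echo "missing column: $col" >&2; return 1 ;;
        esac
    done
    echo "ok"
}

# Example: has_required_columns samplesheet.csv
```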

Now you can run the pipeline using:

```bash
nextflow run avantonder/assembleBAC \
    -profile singularity \
    -c <INSTITUTION>.config \
    --input samplesheet.csv \
    --genome_size <ESTIMATED GENOME SIZE e.g. 4M> \
    --outdir <OUTDIR> \
    --baktadb path/to/baktadb/dir \
    --checkm2db path/to/checkm2db/dir/uniref100.KO.1.dmnd \
    -resume
```
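When the same command is reused across projects, it can be convenient to assemble it from variables and print it for review before running it. The following is a sketch under stated assumptions: `build_assemblebac_cmd` is a hypothetical wrapper, it uses only the options shown in the command above, and it leaves out the institution-specific `-c` config for brevity.

```bash
# Hypothetical wrapper: print the nextflow command for review (dry run).
# Arguments: samplesheet, genome size, output dir, Bakta db dir, CheckM2 .dmnd
build_assemblebac_cmd() {
    local input="$1" genome_size="$2" outdir="$3" baktadb="$4" checkm2db="$5"
    local cmd=(
        nextflow run avantonder/assembleBAC
        -profile singularity
        --input "$input"
        --genome_size "$genome_size"
        --outdir "$outdir"
        --baktadb "$baktadb"
        --checkm2db "$checkm2db"
        -resume
    )
    # %q shell-quotes each argument so the printed line can be pasted back.
    printf '%q ' "${cmd[@]}"
    printf '\n'
}

# Example:
# build_assemblebac_cmd samplesheet.csv 4M results db-light path/to/file.dmnd
```

Printing the command instead of executing it keeps the sketch free of side effects; the printed line can be checked and then run by hand.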

See usage docs for all of the available options when running the pipeline.

Documentation

The avantonder/assembleBAC pipeline comes with documentation about the pipeline usage, parameters and output.

Credits

avantonder/assembleBAC was originally written by Andries van Tonder. I wouldn't have been able to write this pipeline without the tools, documentation, pipelines and modules made available by the fantastic nf-core community.

Feedback

If you have any issues, questions or suggestions for improving assembleBAC, please submit them to the Issue Tracker.

Citations

If you use the avantonder/assembleBAC pipeline, please cite it using the following DOI: 10.5281/zenodo.15046190

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

Owner

  • Login: avantonder
  • Kind: user

Citation (CITATIONS.md)

# avantonder/assemblebac: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [Bakta](https://pubmed.ncbi.nlm.nih.gov/34739369/)
  > Schwengers O, Jelonek L, Dieckmann MA, Beyvers S, Blom J, Goesmann A. Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microb Genom. 2021 Nov;7(11):000685. doi: 10.1099/mgen.0.000685. PMID: 34739369; PMCID: PMC8743544.

- [CheckM2](https://pubmed.ncbi.nlm.nih.gov/37500759/)
  > Chklovski A, Parks DH, Woodcroft BJ, Tyson GW. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nat Methods. 2023 Aug;20(8):1203-1212. doi: 10.1038/s41592-023-01940-w. Epub 2023 Jul 27. Erratum in: Nat Methods. 2024 Apr;21(4):735. doi: 10.1038/s41592-024-02248-z. PMID: 37500759.

- [mlst](https://github.com/tseemann/mlst)

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)
  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [Quast](https://pubmed.ncbi.nlm.nih.gov/29949969/)
  > Mikheenko A, Prjibelski A, Saveliev V, Antipov D, Gurevich A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics. 2018 Jul 1;34(13):i142-i150. doi: 10.1093/bioinformatics/bty266. PMID: 29949969; PMCID: PMC6022658.

- [Shovill](https://github.com/tseemann/shovill)

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)
  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Create event: 2
  • Release event: 1
  • Issues event: 1
  • Watch event: 1
  • Issue comment event: 2
  • Push event: 30
  • Pull request event: 10
  • Fork event: 1
Last Year
  • Create event: 2
  • Release event: 1
  • Issues event: 1
  • Watch event: 1
  • Issue comment event: 2
  • Push event: 30
  • Pull request event: 10
  • Fork event: 1