assemblebac
avantonder/assembleBAC is a bioinformatics best-practice analysis pipeline for assembling and annotating bacterial genomes.
Statistics
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 7
Metadata Files
README.md
avantonder/assembleBAC
Introduction
avantonder/assembleBAC is a bioinformatics best-practice analysis pipeline for assembling and annotating bacterial genomes. It also predicts the Sequence Type (ST) and provides QC metrics with Quast and CheckM2. The pipeline runs the following steps:
- De novo genome assembly (Shovill)
- Sequence Type assignment (mlst)
- Annotation (Bakta)
- Assembly metrics (Quast)
- Assembly completeness (CheckM2)
- Assembly metrics, annotation and pipeline information (MultiQC)
Usage
> [!NOTE]
> If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with `-profile test` before running the workflow on actual data.
You will need to download the Bakta light database (Bakta version 1.10.4 is required to run the `amrfinder_update` command):
```bash
wget https://zenodo.org/record/7669534/files/db-light.tar.gz
tar -xzf db-light.tar.gz
rm db-light.tar.gz
amrfinder_update --force_update --database db-light/amrfinderplus-db/
```
Additionally, you will need to download the CheckM2 database (CheckM2 is required):
```bash
checkm2 database --download --path path/to/checkm2db
```
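The run command further down passes a `.dmnd` file to `--checkm2db`, so it can help to locate that file once the download has finished. The following is a minimal Python sketch, assuming the database is a single file ending in `.dmnd` somewhere under the download directory; `find_checkm2_db` is a hypothetical helper name and not part of the pipeline or of CheckM2.

```python
from pathlib import Path


def find_checkm2_db(download_dir):
    """Return the single .dmnd file found under download_dir.

    Hypothetical helper: assumes exactly one DIAMOND database file
    (*.dmnd) exists somewhere below the download directory.
    """
    matches = sorted(Path(download_dir).rglob("*.dmnd"))
    if not matches:
        raise FileNotFoundError(f"no .dmnd file found under {download_dir}")
    if len(matches) > 1:
        raise ValueError(f"expected one .dmnd file, found {len(matches)}")
    return matches[0]
```

Calling `find_checkm2_db("path/to/checkm2db")` would return the path to hand to `--checkm2db`, or raise an error if the download did not leave exactly one `.dmnd` file behind.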
An executable Python script called `fastq_dir_to_samplesheet.py` is provided to auto-create an input samplesheet from a directory containing FastQ files before you run the pipeline (requires Python 3 installed locally), e.g.

```bash
wget -L https://raw.githubusercontent.com/avantonder/assembleBAC/main/assets/fastq_dir_to_samplesheet.py

python fastq_dir_to_samplesheet.py <FASTQDIR> \
    samplesheet.csv \
    -r1
```

The samplesheet has the following format:

```csv title="samplesheet.csv"
sample,fastq_1,fastq_2
SAMPLE_PAIRED_END,/path/to/fastq/files/sample1_1.fastq.gz,/path/to/fastq/files/sample1_2.fastq.gz
SAMPLE_SINGLE_END,/path/to/fastq/files/sample2.fastq.gz,
```
Alternatively the samplesheet.csv file created by nf-core/fetchngs can also be used.
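However the samplesheet was produced, a quick sanity check before launching can catch formatting mistakes early. The sketch below is an illustration only and not the pipeline's own validation. It assumes the three-column header shown above, that `fastq_2` may be left empty for single-end samples, and that reads are gzipped FASTQ files (`.fastq.gz` or `.fq.gz`). `check_samplesheet` is a hypothetical name.

```python
import csv
import io

EXPECTED_HEADER = ["sample", "fastq_1", "fastq_2"]
# Assumption: reads are gzipped FASTQ files, as in the example rows above.
FASTQ_SUFFIXES = (".fastq.gz", ".fq.gz")


def check_samplesheet(handle):
    """Return a list of problems found in a samplesheet (empty list: none found)."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if header != EXPECTED_HEADER:
        return ["header should be " + ",".join(EXPECTED_HEADER)]
    problems = []
    for row_no, row in enumerate(reader, start=2):
        if not any(cell.strip() for cell in row):
            continue  # ignore blank rows
        if len(row) != 3:
            problems.append(f"row {row_no}: expected 3 columns, got {len(row)}")
            continue
        sample, fastq_1, fastq_2 = (cell.strip() for cell in row)
        if not sample:
            problems.append(f"row {row_no}: sample name is empty")
        if not fastq_1.endswith(FASTQ_SUFFIXES):
            problems.append(f"row {row_no}: fastq_1 is missing or not gzipped FASTQ")
        # fastq_2 is allowed to be empty for single-end samples.
        if fastq_2 and not fastq_2.endswith(FASTQ_SUFFIXES):
            problems.append(f"row {row_no}: fastq_2 is not gzipped FASTQ")
    return problems


EXAMPLE = """sample,fastq_1,fastq_2
SAMPLE_PAIRED_END,/path/to/fastq/files/sample1_1.fastq.gz,/path/to/fastq/files/sample1_2.fastq.gz
SAMPLE_SINGLE_END,/path/to/fastq/files/sample2.fastq.gz,
"""
print(check_samplesheet(io.StringIO(EXAMPLE)))  # prints []
```

For a file on disk, the same function can be called as `check_samplesheet(open("samplesheet.csv", newline=""))`. The check only looks at the text of each row and does not test whether the FASTQ files exist.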
Now you can run the pipeline using:
```bash
nextflow run avantonder/assembleBAC \
    -profile singularity \
    -c <INSTITUTION>.config \
    --input samplesheet.csv \
    --genome_size <ESTIMATED GENOME SIZE e.g. 4M> \
    --outdir <OUTDIR> \
    --baktadb path/to/baktadb/dir \
    --checkm2db path/to/checkm2db/dir/uniref100.KO.1.dmnd \
    -resume
```
See usage docs for all of the available options when running the pipeline.
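When the pipeline is launched from a script, for example once per batch of samples, it can be convenient to assemble the command as an argument list instead of a hand-edited shell string. The sketch below uses only the options shown in the command above. `build_command` is a hypothetical helper, and the values in the example call (`results`, `4M`, the database paths) are placeholders.

```python
import shlex


def build_command(samplesheet, genome_size, outdir, baktadb, checkm2db,
                  profile="singularity", config=None, resume=True):
    """Assemble a `nextflow run` argument list for assembleBAC (hypothetical helper)."""
    cmd = ["nextflow", "run", "avantonder/assembleBAC", "-profile", profile]
    if config:
        # Optional institutional config file, passed with -c.
        cmd += ["-c", config]
    cmd += [
        "--input", samplesheet,
        "--genome_size", genome_size,
        "--outdir", outdir,
        "--baktadb", baktadb,
        "--checkm2db", checkm2db,
    ]
    if resume:
        cmd.append("-resume")
    return cmd


# Placeholder values for illustration only.
cmd = build_command(
    "samplesheet.csv",
    "4M",
    "results",
    "path/to/baktadb/dir",
    "path/to/checkm2db/dir/uniref100.KO.1.dmnd",
)
print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems when paths contain spaces.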
Documentation
The avantonder/assembleBAC pipeline comes with documentation about the pipeline usage, parameters and output.
Credits
avantonder/assembleBAC was originally written by Andries van Tonder. I wouldn't have been able to write this pipeline without the tools, documentation, pipelines and modules made available by the fantastic nf-core community.
Feedback
If you have any issues, questions or suggestions for improving assembleBAC, please submit them to the Issue Tracker.
Citations
If you use the avantonder/assembleBAC pipeline, please cite it using the following DOI: 10.5281/zenodo.15046190
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
Owner
- Login: avantonder
- Kind: user
- Repositories: 6
- Profile: https://github.com/avantonder
Citation (CITATIONS.md)
# avantonder/assemblebac: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [Bakta](https://pubmed.ncbi.nlm.nih.gov/34739369/)

  > Schwengers O, Jelonek L, Dieckmann MA, Beyvers S, Blom J, Goesmann A. Bakta: rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microb Genom. 2021 Nov;7(11):000685. doi: 10.1099/mgen.0.000685. PMID: 34739369; PMCID: PMC8743544.

- [CheckM2](https://pubmed.ncbi.nlm.nih.gov/37500759/)

  > Chklovski A, Parks DH, Woodcroft BJ, Tyson GW. CheckM2: a rapid, scalable and accurate tool for assessing microbial genome quality using machine learning. Nat Methods. 2023 Aug;20(8):1203-1212. doi: 10.1038/s41592-023-01940-w. Epub 2023 Jul 27. Erratum in: Nat Methods. 2024 Apr;21(4):735. doi: 10.1038/s41592-024-02248-z. PMID: 37500759.

- [mlst](https://github.com/tseemann/mlst)

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [Quast](https://pubmed.ncbi.nlm.nih.gov/29949969/)

  > Mikheenko A, Prjibelski A, Saveliev V, Antipov D, Gurevich A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics. 2018 Jul 1;34(13):i142-i150. doi: 10.1093/bioinformatics/bty266. PMID: 29949969; PMCID: PMC6022658.

- [Shovill](https://github.com/tseemann/shovill)

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
