Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (12.4%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: PHemarajata
  • License: apache-2.0
  • Language: HTML
  • Default Branch: main
  • Size: 23.5 MB
Statistics
  • Stars: 0
  • Watchers: 0
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created 7 months ago · Last pushed 7 months ago
Metadata Files
Readme Changelog License Code of conduct Citation

README.md

bacterial-genomics/wf-assembly-snps

General schematic of the steps in the workflow

Quick Start: Test

Run the built-in test set to confirm that all parts are working as expected. It also downloads all dependencies, which makes subsequent runs much faster.

Pull workflow from GitHub

```bash
nextflow pull bacterial-genomics/wf-assembly-snps -r main
```

Run test workflow

```bash
nextflow run \
  bacterial-genomics/wf-assembly-snps \
  -r main \
  -profile docker,test \
  --outdir results
```
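Before launching the test profile, it can help to confirm that the tools it relies on are available. The following is a minimal sketch, not part of the workflow: `require_cmd` is a hypothetical helper name, and it only checks that each command is on `PATH`.

```bash
# Sketch: fail early if a required command is missing from PATH.
# `require_cmd` is a hypothetical helper, not something the workflow provides.
require_cmd() {
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "Missing required command: $cmd" >&2
      return 1
    fi
  done
}
```

For example, `require_cmd nextflow docker` returns a non-zero status and names the first missing tool, so it can be chained in front of the `nextflow run` command with `&&`.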

Quick Start: Run

Example command to run ParSNP with Docker on the FastA files in the "new-fasta-dir" directory:

Run workflow

```bash
nextflow run \
  bacterial-genomics/wf-assembly-snps \
  -r main \
  -profile docker \
  --input new-fasta-dir \
  --outdir my-results \
  --snp_package parsnp
```

Run with PopPIPE-bp integration

```bash
# Using all_clusters.txt with TSV file (recommended)
nextflow run \
  main.nf \
  -profile docker \
  --poppipe_output /path/to/poppipe/results \
  --input /path/to/poppipe/rfiles.txt \
  --outdir my-results \
  --snp_package parsnp

# Using all_clusters.txt with assembly directory
nextflow run \
  main.nf \
  -profile docker \
  --poppipe_output /path/to/poppipe/results \
  --input /path/to/assemblies \
  --outdir my-results \
  --snp_package parsnp
```

Introduction

This workflow performs SNP analysis on assembled and/or annotated files (FastA/Genbank). The pipeline now includes integration with PopPIPE-bp outputs, allowing for cluster-based SNP analysis and tree grafting to generate comprehensive phylogenetic results.

Features

  • Standard SNP Analysis: Core genome alignment and phylogenetic reconstruction
  • PopPIPE-bp Integration: Full support for all_clusters.txt format with hierarchical clustering
  • Strain-Based Analysis: Parse PopPIPE-bp strain outputs and perform SNP analysis per strain
  • Tree Grafting: Combine strain-specific phylogenies using the PopPIPE algorithm
  • Recombination Detection: Identify and mask recombinant regions
  • Flexible Input: Supports both TSV files (rfiles.txt) and assembly directories
  • Relative Path Support: TSV files can use relative paths for better project organization
  • Comprehensive Outputs: Distance matrices, phylogenetic trees, and summary reports

Installation

Usage

```bash
nextflow run main.nf \
  -profile docker \
  --input <input directory> \
  --ref <optional reference file> \
  --outdir <directory for results> \
  --snp_package <parsnp>
```

Please see the usage documentation for further information on using this workflow.

TSV Files with Relative Paths

The pipeline supports relative paths in TSV files (such as rfiles.txt or combined_rfile.txt). This allows for better project organization:

```
my_project/
├── data/
│   └── rfiles.txt          # TSV file with relative paths
├── assemblies/
│   ├── sample1.fasta
│   └── sample2.fasta
└── poppipe_output/
    └── all_clusters.txt
```

Your data/rfiles.txt can reference files using relative paths:

```tsv
sample1	../assemblies/sample1.fasta
sample2	../assemblies/sample2.fasta
```

See docs/relative_paths.md for detailed information and examples.
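A TSV like the one above could be generated rather than written by hand. The following is a minimal sketch under stated assumptions: the helper name `make_rfiles` is hypothetical, the two-column layout (sample name, tab, relative path) is inferred from the example above, and only files ending in `.fasta` are picked up.

```bash
# Sketch: write a two-column TSV (sample name, relative path) for every
# *.fasta file in an assemblies directory.
# Assumptions: `make_rfiles` is a hypothetical helper; the column layout
# follows the rfiles.txt example shown above.
make_rfiles() {
  local assemblies_dir="$1"   # e.g. my_project/assemblies
  local tsv_path="$2"         # e.g. my_project/data/rfiles.txt
  local rel_prefix="$3"       # path from the TSV's directory to the assemblies, e.g. ../assemblies
  local fasta sample

  mkdir -p "$(dirname "$tsv_path")"
  : > "$tsv_path"             # start from an empty file
  for fasta in "$assemblies_dir"/*.fasta; do
    [ -e "$fasta" ] || continue            # skip the literal pattern when nothing matches
    sample="$(basename "$fasta" .fasta)"   # sample1.fasta -> sample1
    printf '%s\t%s\n' "$sample" "$rel_prefix/$(basename "$fasta")" >> "$tsv_path"
  done
}
```

With the project layout shown earlier, `make_rfiles my_project/assemblies my_project/data/rfiles.txt ../assemblies` would write one line per assembly, with paths relative to the TSV's directory.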

Parameters

Note that double-dash ("--") arguments (e.g., --help, --input, --outdir) are generally specific to this workflow, whereas single-dash ("-") options (e.g., -help, -latest, -profile) are general Nextflow options.

These are the most pertinent options for this workflow:

Required parameters

```console
============================================
        Input/Output
============================================
--input                 Path to input data directory containing FastA/Genbank assemblies or samplesheet.
                        Recognized extensions are: {fasta,fas,fna,fsa,fa} with optional gzip compression.

--ref                   Path to reference file in FastA format.
                        Recognized extensions are: {fasta,fas,fna,fsa,fa} with optional gzip compression.
                        [Default: NaN]

--outdir                The output directory where the results will be saved.

============================================
        Container platforms
============================================
-profile singularity    Use Singularity images to run the workflow.
                        Will pull and convert Docker images from Dockerhub if not locally available.

-profile docker         Use Docker images to run the workflow.
                        Will pull images from Dockerhub if not locally available.

============================================
        Optional alignment tools
============================================
--snp_package           Specify what algorithm should be used to compare input files.
                        Recognized arguments are: parsnp.
                        [Default: parsnp]
```
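To preview which files in a directory match the extensions listed for `--input`, a small check like the one below can be run beforehand. This is a sketch only: `list_assemblies` is a hypothetical helper, and it assumes that only the top level of the directory is searched, which may differ from how the workflow itself collects inputs.

```bash
# Sketch: list files whose names end in one of the recognized extensions
# ({fasta,fas,fna,fsa,fa}), optionally gzip-compressed.
# Assumptions: `list_assemblies` is a hypothetical helper; only the top
# level of the directory is searched.
list_assemblies() {
  local input_dir="$1"
  find "$input_dir" -maxdepth 1 -type f \
    | grep -E '\.(fasta|fas|fna|fsa|fa)(\.gz)?$' \
    | sort
}
```

Running `list_assemblies new-fasta-dir` prints one matching path per line and nothing when no file has a recognized extension.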

Additional parameters

View help menu of all workflow options:

```bash
nextflow run \
  bacterial-genomics/wf-assembly-snps \
  -r main \
  --help \
  --show_hidden_params
```

Resource Managers

The most well-tested and supported configuration is a Univa Grid Engine (UGE) job scheduler with Singularity for dependency handling.

  1. UGE/SGE
    • Additional tips for UGE processing are here.
  2. No Scheduler
    • The workflow has also been confirmed to work on desktop and laptop environments without a job scheduler, using Docker; more tips are here.

Output

Please see the output documentation for a table of all outputs created by this workflow.

Troubleshooting

Q: It failed, how do I find out what went wrong?

A: View file contents in the <outdir>/pipeline_info directory.
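As a starting point for that inspection, the sketch below lists the files in the `pipeline_info` directory with the most recently modified first. `list_pipeline_info` is a hypothetical helper name, and no particular file names inside the directory are assumed.

```bash
# Sketch: print the files in <outdir>/pipeline_info, newest first.
# Assumptions: `list_pipeline_info` is a hypothetical helper; the names of
# the files inside the directory are not assumed.
list_pipeline_info() {
  local outdir="$1"
  local info_dir="$outdir/pipeline_info"
  if [ ! -d "$info_dir" ]; then
    echo "No pipeline_info directory under: $outdir" >&2
    return 1
  fi
  ls -t "$info_dir"
}
```

For example, `list_pipeline_info my-results` shows what is available to open, and returns a non-zero status if the run never created the directory.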

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

Owner

  • Login: PHemarajata
  • Kind: user

Citation (CITATIONS 2.md)

# wf-assembly-snps: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [BioPython](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2682512/)

  > Cock PJ, Antao T, Chang JT, et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics. Jun 1 2009;25(11):1422-3. doi:10.1093/bioinformatics/btp163

- [ClonalFrameML](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4326465/)

  > Didelot X, Wilson DJ. ClonalFrameML: efficient inference of recombination in whole bacterial genomes. PLoS Comput Biol. Feb 2015;11(2):e1004041. doi:10.1371/journal.pcbi.1004041

- [Gubbins](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4330336/)

  > Croucher NJ, Page AJ, Connor TR, et al. Rapid phylogenetic analysis of large samples of recombinant bacterial whole genome sequences using Gubbins. Nucleic Acids Res. Feb 18 2015;43(3):e15. doi:10.1093/nar/gku1196

- [MUSCLE](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC390337/)

  > Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004;32(5):1792-7. doi:10.1093/nar/gkh340

- [RAxML-NG](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6821337/)

  > Kozlov AM, Darriba D, Flouri T, Morel B, Stamatakis A. RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference. Bioinformatics. Nov 01 2019;35(21):4453-4455. doi:10.1093/bioinformatics/btz305

- [Parsnp 2.0](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10862825/)

  > Kille B, Nute MG, Huang V, Kim E, Phillippy AM, Treangen TJ. Parsnp 2.0: Scalable Core-Genome Alignment for Massive Microbial Datasets. bioRxiv. Jan 31 2024;doi:10.1101/2024.01.30.577458

- [snp-dists](https://github.com/tseemann/snp-dists)

  > Seemann T. snp-dists. <https://github.com/tseemann/snp-dists>

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel D. Docker: lightweight Linux containers for consistent development and deployment. Linux Journal. 2014;2014(239):2. doi:10.5555/2600239.2600241

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

### Citation Information

Citations were created with the help of EndNote and were exported in AMA format.

GitHub Events

Total
  • Push event: 4
  • Create event: 2
Last Year
  • Push event: 4
  • Create event: 2

Dependencies

modules/nf-core/custom/dumpsoftwareversions/meta.yml cpan
modules/nf-core/custom/dumpsoftwareversions/environment.yml pypi