automated-window-sliding

Bioinformatics pipeline using a sliding window approach to create a sequence of trees from multiple sequence alignments.

https://github.com/ggruber193/automated-window-sliding

Science Score: 18.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.8%) to scientific vocabulary

Keywords

nextflow phylogenetics pipeline sliding-window tree-inference workflow
Last synced: 9 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: ggruber193
  • Language: Nextflow
  • Default Branch: master
  • Homepage:
  • Size: 2.01 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Topics
nextflow phylogenetics pipeline sliding-window tree-inference workflow
Created over 2 years ago · Last pushed over 2 years ago
Metadata Files
Readme Citation

README.md

Introduction

Automated-Window-Sliding is a bioinformatics pipeline that serves as a starting point for sliding-window-based phylogenetic analysis. The input alignment is split into several subalignments, either with a sliding window or with custom alignment ranges provided in a CSV file. A tree is reconstructed for each subalignment window, and all trees are then collected in a single Nexus/Newick file. This file can be used to study effects that change the phylogenetic signal along the alignment, e.g. recombination, reassortment and selection.
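To make the sliding window idea concrete, here is a minimal sketch in plain Python. It is not the pipeline's own splitting script. The function names `window_ranges` and `slice_alignment`, the parameters `window_size` and `step_size`, the toy alignment, and the choice to keep a shorter final window are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def window_ranges(aln_len: int, window_size: int, step_size: int) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) column ranges along the alignment."""
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    ranges = []
    start = 0
    while start < aln_len:
        # Assumption: the last window is truncated at the alignment end.
        end = min(start + window_size, aln_len)
        ranges.append((start, end))
        if end == aln_len:
            break
        start += step_size
    return ranges


def slice_alignment(alignment: Dict[str, str], start: int, end: int) -> Dict[str, str]:
    """Cut the same column range out of every sequence of the alignment."""
    return {name: seq[start:end] for name, seq in alignment.items()}


# Hypothetical toy alignment: three taxa, ten columns.
alignment = {
    "taxon_a": "ACGTACGTAC",
    "taxon_b": "ACGTTCGTAA",
    "taxon_c": "ACCTACGTAC",
}

for start, end in window_ranges(10, window_size=4, step_size=3):
    sub = slice_alignment(alignment, start, end)
    print(start, end, sub["taxon_a"])
```

With a step size smaller than the window size, neighbouring windows overlap, so each subalignment shares some columns with the next one.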

[Figures: Pipeline Diagram 1, Pipeline Diagram 2]

  1. Split alignment into subalignments (Python Script)
  2. Find best-fit evolutionary model for whole alignment or subalignments (IQ-TREE ModelFinder)
  3. Run tree inference on each subalignment
    1. IQ-TREE
    2. RAxML-ng
  4. Collect reconstructed trees in a single file (Python Script)
    1. Nexus
    2. Newick
    3. Both
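The final collection step can be sketched as follows. This is an illustration only, not the pipeline's collection script (which, per the requirements, relies on DendroPy and Biopython). The function names and the tree labels such as `window_1_4` are hypothetical; the example writes a plain TREES block in Nexus format and a one-tree-per-line Newick file from Newick strings held in memory.

```python
from typing import Dict


def normalise(newick: str) -> str:
    """Strip whitespace and make sure the tree ends with exactly one semicolon."""
    return newick.strip().rstrip(";") + ";"


def trees_to_newick(trees: Dict[str, str]) -> str:
    """Concatenate the trees, one Newick string per line."""
    return "\n".join(normalise(nwk) for nwk in trees.values()) + "\n"


def trees_to_nexus(trees: Dict[str, str]) -> str:
    """Wrap the trees in a minimal Nexus TREES block, one named tree per line."""
    lines = ["#NEXUS", "BEGIN TREES;"]
    for name, nwk in trees.items():
        lines.append(f"    TREE {name} = {normalise(nwk)}")
    lines.append("END;")
    return "\n".join(lines) + "\n"


# Hypothetical per-window results, keyed by a label for the window.
window_trees = {
    "window_1_4": "(taxon_a,(taxon_b,taxon_c));",
    "window_4_7": "((taxon_a,taxon_b),taxon_c)",
}

print(trees_to_nexus(window_trees))
print(trees_to_newick(window_trees))
```

Keeping the window label as the tree name in the Nexus file preserves the link between each tree and its position in the alignment, which the plain Newick output loses.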

Installation

This pipeline runs on the Nextflow Workflow System. For the installation of Nextflow, please refer to this page.

To run the pipeline the following programs are required:

  • Python (tested with 3.11)
  • Python packages: DendroPy, Biopython
  • IQ-TREE (RAxML-ng is optional and only needed if it is used for tree reconstruction)
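Before running with locally installed programs, it can help to check what is available. The following is a small sketch using only the Python standard library; it is not part of the pipeline. The executable names `iqtree2`, `iqtree` and `raxml-ng` are assumptions about how the tools are installed on your system, and the check only looks for their presence, not their versions.

```python
import importlib.util
import shutil
from typing import Dict


def check_dependencies() -> Dict[str, bool]:
    """Report which of the listed dependencies can be found locally."""
    return {
        "Nextflow": shutil.which("nextflow") is not None,
        # Assumption: IQ-TREE is on the PATH as "iqtree2" or "iqtree".
        "IQ-TREE": any(shutil.which(name) for name in ("iqtree2", "iqtree")),
        # Assumption: RAxML-ng is on the PATH as "raxml-ng".
        "RAxML-ng (optional)": shutil.which("raxml-ng") is not None,
        # Biopython is imported as "Bio", DendroPy as "dendropy".
        "Biopython": importlib.util.find_spec("Bio") is not None,
        "DendroPy": importlib.util.find_spec("dendropy") is not None,
    }


if __name__ == "__main__":
    for name, found in check_dependencies().items():
        print(f"{name}: {'found' if found else 'missing'}")
```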

If you do not want to install these dependencies manually, you can run the pipeline with a container engine or Conda by using -profile <docker/singularity/podman/apptainer/conda/mamba>. To use your locally installed programs instead, omit this parameter.

The pipeline can be downloaded with the nextflow pull command:

    nextflow pull ggruber193/automated-window-sliding

This automatically pulls the latest version of the pipeline into the folder $HOME/.nextflow/assets on your computer. The same command is also used to update the pipeline to the latest version.

Alternatively, the nextflow run command can be used to pull the pipeline and run it immediately:

    nextflow run ggruber193/automated-window-sliding <additional options>

You can also clone the repository and run the pipeline from the local copy. In this case, use nextflow run and provide the path to the main.nf file:

    git clone https://github.com/ggruber193/automated-window-sliding.git

    nextflow run <path/to/cloned/repository>/main.nf <additional options>

Usage

To check that everything works correctly, run the pipeline on a minimal test case with -profile test:

    nextflow run ggruber193/automated-window-sliding -profile test --outdir <OUTDIR>

You can use multiple profile options in a single run, for example -profile test,docker to run the minimal test case with docker.

To run the pipeline on your own data, provide a multiple sequence alignment with --input (accepted formats: FASTA, PHYLIP, NEXUS, MSF, CLUSTAL):

    nextflow run ggruber193/automated-window-sliding --input <ALIGNMENT> --outdir <OUTDIR>

To view the available pipeline parameters, use:

    nextflow run ggruber193/automated-window-sliding --help

For more information about the usage and output of the pipeline refer to the full Documentation of this project.

In addition to the pipeline-specific parameters, which take a double dash, Nextflow provides several parameters of its own. These are invoked with a single dash, e.g. -resume to resume a previously failed pipeline run or -qs <int> to limit the number of parallel processes. For a full overview of Nextflow CLI parameters, please refer to this page or use nextflow run -h.
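If you launch the pipeline from a script, the split between single-dash Nextflow options and double-dash pipeline parameters can be kept explicit. The helper below is a hypothetical sketch, not something shipped with the pipeline; it only assembles the options mentioned in this README, and `alignment.fasta` and `results` are placeholder names.

```python
import shlex
from typing import List, Optional, Sequence


def build_nextflow_command(
    pipeline: str,
    input_path: str,
    outdir: str,
    profiles: Sequence[str] = (),
    resume: bool = False,
    queue_size: Optional[int] = None,
) -> List[str]:
    """Assemble the argument list for a `nextflow run` call."""
    cmd = ["nextflow", "run", pipeline]
    # Nextflow's own options take a single dash.
    if profiles:
        cmd += ["-profile", ",".join(profiles)]
    if resume:
        cmd.append("-resume")
    if queue_size is not None:
        cmd += ["-qs", str(queue_size)]
    # Pipeline parameters take a double dash.
    cmd += ["--input", input_path, "--outdir", outdir]
    return cmd


cmd = build_nextflow_command(
    "ggruber193/automated-window-sliding",
    input_path="alignment.fasta",  # placeholder file name
    outdir="results",              # placeholder output directory
    profiles=["docker"],
    resume=True,
    queue_size=4,
)
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with paths that contain spaces.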

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

Owner

  • Login: ggruber193
  • Kind: user

Citation (CITATIONS.md)

# ggruber193/automated-window-sliding: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [IQ-TREE](https://doi.org/10.1093/molbev/msaa015)

  > Bui Quang Minh, Heiko A Schmidt, Olga Chernomor, Dominik Schrempf, Michael D Woodhams, Arndt von Haeseler, Robert Lanfear, IQ-TREE 2: New Models and Efficient Methods for Phylogenetic Inference in the Genomic Era, Molecular Biology and Evolution, Volume 37, Issue 5, May 2020, Pages 1530–1534, https://doi.org/10.1093/molbev/msaa015

- [IQ-TREE ModelFinder](https://doi.org/10.1038/nmeth.4285)
  > Kalyaanamoorthy, S., Minh, B., Wong, T. et al. ModelFinder: fast model selection for accurate phylogenetic estimates. Nat Methods 14, 587–589 (2017). https://doi.org/10.1038/nmeth.4285

- [RAxML-ng](https://doi.org/10.1093/bioinformatics/btz305)
  > Alexey M Kozlov, Diego Darriba, Tomáš Flouri, Benoit Morel, Alexandros Stamatakis, RAxML-NG: a fast, scalable and user-friendly tool for maximum likelihood phylogenetic inference, Bioinformatics, Volume 35, Issue 21, November 2019, Pages 4453–4455, https://doi.org/10.1093/bioinformatics/btz305

- [Biopython](https://doi.org/10.1093/bioinformatics/btp163)
  > Peter J. A. Cock, Tiago Antao, Jeffrey T. Chang, Brad A. Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, Michiel J. L. de Hoon, Biopython: freely available Python tools for computational molecular biology and bioinformatics, Bioinformatics, Volume 25, Issue 11, June 2009, Pages 1422–1423, https://doi.org/10.1093/bioinformatics/btp163

- [DendroPy](https://doi.org/10.1093/bioinformatics/btq228)
  > Sukumaran J, Holder MT. DendroPy: a Python library for phylogenetic computing. Bioinformatics. 2010 Jun 15;26(12):1569-71. doi: 10.1093/bioinformatics/btq228. Epub 2010 Apr 25. PMID: 20421198.

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Feb. 2024. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Issues event: 1
  • Watch event: 4
  • Fork event: 2
Last Year
  • Issues event: 1
  • Watch event: 4
  • Fork event: 2

Dependencies

base_image/Dockerfile docker
  • debian bookworm-slim build