treeval

Pipelines for the production of Treeval data

https://github.com/sanger-tol/treeval

Science Score: 65.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 10 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
    Organization sanger-tol has institutional domain (www.sanger.ac.uk)
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.5%) to scientific vocabulary

Keywords

curation genome-alignment genome-assembly genomics nextflow pipeline quality-control synteny
Last synced: 6 months ago

Repository

Pipelines for the production of Treeval data

Basic Info
Statistics
  • Stars: 26
  • Watchers: 7
  • Forks: 5
  • Open Issues: 38
  • Releases: 8
Topics
curation genome-alignment genome-assembly genomics nextflow pipeline quality-control synteny
Created over 3 years ago · Last pushed 6 months ago
Metadata Files
Readme Changelog Contributing License Code of conduct Citation

README.md

sanger-tol/treeval


Introduction

sanger-tol/treeval [1.2.0 - Ancient Destiny-] is a bioinformatics best-practice analysis pipeline for the generation of data supplemental to the curation of reference quality genomes. This pipeline has been written to generate flat files compatible with JBrowse2 as well as HiC maps for use in Juicebox, PretextView and HiGlass.

The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!

You can also set up and attempt to run the pipeline here: https://gitpod.io/#https://github.com/BGAcademy23/treeval-curation This is a Gitpod setup for BGA23 with a version of TreeVal. For now, however, Gitpod will not run a Nextflow pipeline due to issues with using Singularity. We will be replacing this with an AWS instance soon.

The treeval pipeline has a sister pipeline, currently named curationpretext, which regenerates the pretext maps and accessory files during genomic curation in order to confirm interventions. It is sufficiently different from the treeval implementation that it is written as its own pipeline.

The pipeline runs the following steps:

  1. Parse input yaml ( YAML_INPUT )
  2. Generate my.genome file ( GENERATE_GENOME )
  3. Generate insilico digests of the input assembly ( INSILICO_DIGEST )
  4. Generate gene alignments with high quality data against the input assembly ( GENE_ALIGNMENT )
  5. Generate a repeat density graph ( REPEAT_DENSITY )
  6. Generate a gap track ( GAP_FINDER )
  7. Generate a map of self complementary sequence ( SELFCOMP )
  8. Generate syntenic alignments with a closely related high quality assembly ( SYNTENY )
  9. Generate a coverage track using PacBio data ( LONGREAD_COVERAGE )
  10. Generate HiC maps, pretext and higlass using HiC cram files ( HIC_MAPPING )
  11. Generate a telomere track based on input motif ( TELO_FINDER )
  12. Run Busco and convert results into bed format ( BUSCO_ANNOTATION )
  13. Ancestral Busco linkage if available for clade ( BUSCO_ANNOTATION:ANCESTRAL_GENE )
  14. Count KMERs with FastK and plot the spectra using MerquryFK ( KMER )
  15. Generate a coverage track using KMER data ( KMER_READ_COVERAGE )
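Two of the simpler tracks in this list can be illustrated with a short sketch. This is not the pipeline's own code. It assumes that my.genome is a two-column, tab-separated list of sequence names and lengths, and that a gap is any run of N bases reported as a BED interval. The function names `genome_sizes` and `gap_bed` and the file `demo.fa` are hypothetical.

```bash
# Hypothetical helpers for illustration; the pipeline uses dedicated modules instead.

# Print "<sequence name><TAB><length>" for every record in a FASTA file.
genome_sizes() {
  awk '
    /^>/ { if (name != "") print name "\t" len; name = substr($1, 2); len = 0; next }
         { len += length($0) }
    END  { if (name != "") print name "\t" len }
  ' "$1"
}

# Print runs of N/n as BED intervals (0-based start, end-exclusive).
# The gap state persists across lines, so gaps that span a line wrap stay whole.
gap_bed() {
  awk '
    function emit_gap() { if (in_gap) { print name "\t" start "\t" pos; in_gap = 0 } }
    /^>/ { emit_gap(); name = substr($1, 2); pos = 0; next }
    {
      n = length($0)
      for (i = 1; i <= n; i++) {
        c = substr($0, i, 1)
        if (c == "N" || c == "n") { if (!in_gap) { start = pos; in_gap = 1 } } else { emit_gap() }
        pos++
      }
    }
    END { emit_gap() }
  ' "$1"
}

# Tiny demo assembly: one gap spans a line wrap, one starts a sequence.
printf '>s1 demo\nACGTNN\nNNACGT\n>s2\nNNAC\n' > demo.fa
genome_sizes demo.fa > my.genome   # s1 12 / s2 4
gap_bed demo.fa > gaps.bed         # s1 4 8 / s2 0 2
```

The per-character loop is far too slow for real assemblies; it is only meant to show what kind of flat file each step produces.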

Usage

Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

It is currently advised to run the pipeline with docker or singularity, as a small number of major modules do not yet have a conda env associated with them.

Now, you can run the pipeline using:

```bash
# For the FULL pipeline
nextflow run main.nf -profile singularity --input treeval.yaml --outdir {OUTDIR}

# For the RAPID subset
nextflow run main.nf -profile singularity --input treeval.yaml -entry RAPID --outdir {OUTDIR}
```

An example treeval.yaml can be found here.

Further documentation about the pipeline can be found in the following files: usage, parameters and output.

Warning: Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
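As a minimal sketch of the -params-file route, the two parameters passed on the command line earlier (`--input` and `--outdir`) can be written to a YAML file instead. The helper names `write_params` and `run_full`, the file name `params.yaml` and the output directory `results` are placeholders chosen for this example, not names defined by the pipeline.

```bash
# Hypothetical helper: write the --input and --outdir values to a YAML params file.
write_params() {
  # $1 = params file to create, $2 = input YAML, $3 = output directory
  printf 'input: "%s"\noutdir: "%s"\n' "$2" "$3" > "$1"
}

# Launch the full pipeline with that params file.
# Defined but not called here, since it needs Nextflow and the pipeline's main.nf.
run_full() {
  nextflow run main.nf -profile singularity -params-file "$1"
}

write_params params.yaml treeval.yaml results
cat params.yaml
```

Calling `run_full params.yaml` from the pipeline directory would then be equivalent to passing `--input treeval.yaml --outdir results` on the command line.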

Credits

sanger-tol/treeval has been written by Damon-Lee Pointon (@DLBPointon), Yumi Sims (@yumisims) and William Eagles (@weaglesBio).

We thank the following people for their extensive assistance in the development of this pipeline:

  • @gq1 - For building the infrastructure around TreeVal and helping with code review
  • @ksenia-krasheninnikova - For help with C code implementation and YAML parsing
  • @mcshane - For guidance on algorithms
  • @muffato - For code reviews and code support
  • @priyanka-surana - For help with the majority of code reviews and code support

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

If you use sanger-tol/treeval for your analysis, please cite it using the following doi: 10.5281/zenodo.10047653.

Tools

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license. You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

  • Name: Tree of Life programme
  • Login: sanger-tol
  • Kind: organization
  • Location: United Kingdom

The Tree of Life Programme investigates the diversity of complex organisms (eukaryotes) through sequencing and cellular technology

Citation (CITATIONS.md)

# sanger-tol/treeval: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels, P. et al. 2020. ‘The NF-core framework for community-curated bioinformatics pipelines’, Nature Biotechnology, 38(3), pp. 276–278. doi:10.1038/s41587-020-0439-x.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso, P. et al. 2017. ‘Nextflow enables reproducible computational workflows’, Nature Biotechnology, 35(4), pp. 316–319. doi:10.1038/nbt.3820.

## Pipeline tools

- [Bedtools](https://bedtools.readthedocs.io/en/latest/)

  > Quinlan, A.R. and Hall, I.M. 2010. ‘BEDTools: A flexible suite of utilities for comparing genomic features’, Bioinformatics, 26(6), pp. 841–842. doi:10.1093/bioinformatics/btq033.

- [BUSCO](https://busco.ezlab.org)

  > Manni, M. et al. 2021. ‘BUSCO update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes’, Molecular Biology and Evolution. Available at: https://pubmed.ncbi.nlm.nih.gov/34320186/ (Accessed: 22 June 2023).

- [bwa-mem2](https://ieeexplore.ieee.org/document/8820962)

  > Vasimuddin, Md. et al. 2019. ‘Efficient architecture-aware acceleration of BWA-mem for multicore systems’, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) [Preprint]. doi:10.1109/ipdps.2019.00041.

- [Cooler](https://github.com/open2c/cooler)

  > Abdennur, N. and Mirny, L.A. 2019. ‘Cooler: Scalable storage for hi-C data and other genomically labeled arrays’, Bioinformatics, 36(1), pp. 311–316. doi:10.1093/bioinformatics/btz540.

- [Find Telomere](https://github.com/VGP/vgp-assembly/tree/master/pipeline/telomere)

  > VGP. 2022. vgp-assembly telomere [online]. https://github.com/VGP/vgp-assembly/tree/master/pipeline/telomere. (Accessed on 28th February 2023).

- [Juicer](https://github.com/aidenlab/juicer)

  > Durand, N.C. et al. 2016. ‘Juicer provides a one-click system for analyzing loop-resolution hi-C experiments’, Cell Systems, 3(1), pp. 95–98. doi:10.1016/j.cels.2016.07.002.

- [Minimap2](https://pubmed.ncbi.nlm.nih.gov/34623391/)

  > Li, H. 2021. ‘New strategies to improve MINIMAP2 alignment accuracy’, Bioinformatics, 37(23), pp. 4572–4574. doi:10.1093/bioinformatics/btab705.

- [Miniprot](https://arxiv.org/abs/2210.08052)

  > Li, H. 2022. Protein-to-genome alignment with miniprot (v2). Arxiv [e-journal]. (Accessed on 25th January 2023). doi: 10.48550/arXiv.2210.08052.

- [Mummer](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005944)

  > Marçais, G. et al. 2018. ‘Mummer4: A fast and versatile genome alignment system’, PLOS Computational Biology, 14(1). doi:10.1371/journal.pcbi.1005944.

- [Pandas](https://pandas.pydata.org/)

  > Reback, J. 2022. pandas-dev/pandas: Pandas 1.4.3 [online]. Zenodo. doi: 10.5281/zenodo.6702671. (Accessed on 28th February 2023).

- [Perl](https://perldoc.perl.org/perl)

  > Perl Organisation. 2023. Perl Language Reference v5.36.0. https://perldoc.perl.org/perl. (Accessed 28th February 2023).

- [PretextMap](https://github.com/wtsi-hpag/PretextMap)

  > Harry, E. 2022. PretextMap [online]. https://github.com/wtsi-hpag/PretextMap. (Accessed on 7th June 2023).

- [Pybedtools](https://github.com/daler/pybedtools)

  > Daler. 2022. pybedtools [online]. https://github.com/daler/pybedtools. (Accessed on 7th June 2023).

- [Python: 3.10](https://docs.python.org/3.10/whatsnew/3.10.html)

  > Python Software Foundation. 2023. Python Language Reference v3.10. https://docs.python.org/3.10/whatsnew/3.10.html. (Accessed 28th February 2023).

- [PyFasta](https://github.com/brentp/pyfasta/)

  > Brentp. 2018. pyfasta [online]. http://github.com/brentp/pyfasta/. (Accessed on 7th June 2023).

- [Samtools](https://pubmed.ncbi.nlm.nih.gov/33590861/)

  > Danecek, P. et al. 2021. ‘Twelve years of SAMtools and BCFtools’, GigaScience, 10(2). doi:10.1093/gigascience/giab008.

- [SeqTK](https://github.com/lh3/seqtk)

  > Li, Heng. 2023. seqtk [online]. https://github.com/lh3/seqtk. (Accessed on 7th June 2023).

- [staden_io_lib / iolib](https://github.com/jkbonfield/io_lib)

  > Bonfield JK. 2023. io_lib [online]. https://github.com/jkbonfield/io_lib. (Accessed on 7th June 2023).

- [Tabix](http://www.htslib.org/doc/tabix.html)

  > Li, Heng. 2023. tabix [online]. http://www.htslib.org/doc/tabix.html. (Accessed on 7th June 2023).

- [UCSC tools](https://github.com/ucscGenomeBrowser/kent/tree/master)

  > UCSC Genome Browser Group. 2023. kent [online]. https://github.com/ucscGenomeBrowser/kent/tree/master. (Accessed on 7th June 2023).

- [WindowMasker](https://pubmed.ncbi.nlm.nih.gov/16287941/)

  > Morgulis, A., et al. 2006. WindowMasker: window-based masker for sequenced genomes. Bioinformatics. 22(2). pp.134–141. doi: 10.1093/bioinformatics/bti774.

- [lep_busco_painter](https://www.biorxiv.org/content/10.1101/2023.05.12.540473v1.full.pdf)

  > Wright, C. et al. 2023. Chromosome evolution in Lepidoptera. bioRxiv. 540473. https://doi.org/10.1101/2023.05.12.540473

- [Java](https://docs.oracle.com/javase/8/docs/api/overview-summary.html)

  > Oracle. 2023. Java Documentation. https://docs.oracle.com/javase/8/docs/index.html. (Accessed on 25th September 2023).

- [coreutils](https://github.com/coreutils/coreutils)

  > GNU Coreutils. 2023. coreutils [online]. https://github.com/coreutils/coreutils/releases/tag/v9.4. (Accessed on 25th September 2023).

## Software packaging/containerisation tools

- [Conda](https://conda.org/)

  > conda contributors. conda: A system-level, binary package and environment manager running on all major operating systems and platforms. Computer software. https://github.com/conda/conda

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning, Björn, et al. 2018. “Bioconda: sustainable and comprehensive software distribution for the life sciences.” Nature Methods, 15(7), pp. 475–76, https://doi.org/10.1038/s41592-018-0046-7.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost, Felipe, et al. 2017. “BioContainers: an open-source and community-driven framework for software standardization.” Bioinformatics, 33(16), pp. 2580–82, https://doi.org/10.1093/bioinformatics/btx192.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, Dirk. 2014. “Docker: Lightweight Linux Containers for Consistent Development and Deployment.” Linux Journal, 2014(239).

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer, Gregory M., et al. 2017. “Singularity: Scientific containers for mobility of compute.” PLOS ONE, 12(5), pp. e0177459, https://doi.org/10.1371/journal.pone.0177459.

GitHub Events

Total
  • Create event: 57
  • Release event: 8
  • Issues event: 49
  • Watch event: 3
  • Delete event: 46
  • Member event: 1
  • Issue comment event: 162
  • Push event: 288
  • Pull request review comment event: 79
  • Pull request review event: 111
  • Pull request event: 131
  • Fork event: 2
Last Year
  • Create event: 57
  • Release event: 8
  • Issues event: 49
  • Watch event: 3
  • Delete event: 46
  • Member event: 1
  • Issue comment event: 162
  • Push event: 288
  • Pull request review comment event: 79
  • Pull request review event: 111
  • Pull request event: 131
  • Fork event: 2

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 23
  • Total pull requests: 54
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 3 days
  • Total issue authors: 7
  • Total pull request authors: 5
  • Average comments per issue: 0.78
  • Average comments per pull request: 1.19
  • Merged pull requests: 35
  • Bot issues: 0
  • Bot pull requests: 1
Past Year
  • Issues: 23
  • Pull requests: 54
  • Average time to close issues: about 1 month
  • Average time to close pull requests: 3 days
  • Issue authors: 7
  • Pull request authors: 5
  • Average comments per issue: 0.78
  • Average comments per pull request: 1.19
  • Merged pull requests: 35
  • Bot issues: 0
  • Bot pull requests: 1
Top Authors
Issue Authors
  • DLBPointon (38)
  • yumisims (20)
  • weaglesBio (10)
  • muffato (2)
  • gitforp (2)
  • Rathnayaka1988 (1)
  • gq1 (1)
  • mahesh-panchal (1)
  • Surbhigrewal (1)
  • JWrighty97 (1)
Pull Request Authors
  • DLBPointon (84)
  • yumisims (35)
  • weaglesBio (25)
  • gq1 (2)
  • mahesh-panchal (1)
  • Surbhigrewal (1)
  • dependabot[bot] (1)
  • tkchafin (1)
Top Labels
Issue Labels
enhancement (38) bug (28) Release 1.1 (3) Release 2 (2) adjust resources (2) Module (1) documentation (1)
Pull Request Labels
enhancement (31) bug (20) documentation (7) adjust resources (5) READY TO REVIEW (3) Release 1.1 (3) dependencies (1) github_actions (1) Release 2 (1)

Dependencies

.github/workflows/branch.yml actions
  • mshick/add-pr-comment v1 composite
.github/workflows/ci.yml actions
  • actions/checkout v2 composite
.github/workflows/fix-linting.yml actions
  • actions/checkout v3 composite
  • actions/setup-node v2 composite
.github/workflows/linting.yml actions
  • actions/checkout v2 composite
  • actions/setup-node v2 composite
  • actions/setup-python v3 composite
  • actions/upload-artifact v2 composite
.github/workflows/linting_comment.yml actions
  • dawidd6/action-download-artifact v2 composite
  • marocchino/sticky-pull-request-comment v2 composite
.github/workflows/clean-up.yml actions
  • actions/stale v7 composite
.github/workflows/sanger_test_full.yml actions
  • actions/upload-artifact v3 composite
  • seqeralabs/action-tower-launch v2 composite
modules/nf-core/bedtools/bamtobed/meta.yml cpan
modules/nf-core/bedtools/genomecov/meta.yml cpan
modules/nf-core/bedtools/intersect/meta.yml cpan
modules/nf-core/bedtools/makewindows/meta.yml cpan
modules/nf-core/bedtools/map/meta.yml cpan
modules/nf-core/bedtools/merge/meta.yml cpan
modules/nf-core/bedtools/sort/meta.yml cpan
modules/nf-core/busco/meta.yml cpan
modules/nf-core/bwamem2/index/meta.yml cpan
modules/nf-core/cat/cat/meta.yml cpan
modules/nf-core/cooler/cload/meta.yml cpan
modules/nf-core/cooler/zoomify/meta.yml cpan
modules/nf-core/custom/dumpsoftwareversions/meta.yml cpan
modules/nf-core/custom/getchromsizes/meta.yml cpan
modules/nf-core/fastk/fastk/meta.yml cpan
modules/nf-core/gnu/sort/meta.yml cpan
modules/nf-core/merquryfk/merquryfk/meta.yml cpan
modules/nf-core/minimap2/align/meta.yml cpan
modules/nf-core/minimap2/index/meta.yml cpan
modules/nf-core/miniprot/align/meta.yml cpan
modules/nf-core/miniprot/index/meta.yml cpan
modules/nf-core/mummer/meta.yml cpan
modules/nf-core/paftools/sam2paf/meta.yml cpan
modules/nf-core/pretextmap/meta.yml cpan
modules/nf-core/pretextsnapshot/meta.yml cpan
modules/nf-core/samtools/faidx/meta.yml cpan
modules/nf-core/samtools/markdup/meta.yml cpan
modules/nf-core/samtools/merge/meta.yml cpan
modules/nf-core/samtools/sort/meta.yml cpan
modules/nf-core/samtools/view/meta.yml cpan
modules/nf-core/seqtk/cutn/meta.yml cpan
modules/nf-core/tabix/bgziptabix/meta.yml cpan
modules/nf-core/ucsc/bedgraphtobigwig/meta.yml cpan
modules/nf-core/ucsc/bedtobigbed/meta.yml cpan
modules/nf-core/windowmasker/ustat/meta.yml cpan
pyproject.toml pypi
modules/nf-core/samtools/index/meta.yml cpan
modules/nf-core/windowmasker/mkcounts/meta.yml cpan
modules/nf-core/bedtools/bamtobed/environment.yml pypi
modules/nf-core/bedtools/genomecov/environment.yml pypi
modules/nf-core/bedtools/intersect/environment.yml pypi
modules/nf-core/bedtools/makewindows/environment.yml pypi
modules/nf-core/bedtools/map/environment.yml pypi
modules/nf-core/bedtools/merge/environment.yml pypi
modules/nf-core/bedtools/sort/environment.yml pypi
modules/nf-core/busco/environment.yml pypi
modules/nf-core/bwamem2/index/environment.yml pypi
modules/nf-core/cat/cat/environment.yml pypi
modules/nf-core/cooler/cload/environment.yml pypi
modules/nf-core/cooler/zoomify/environment.yml pypi
modules/nf-core/custom/dumpsoftwareversions/environment.yml pypi
modules/nf-core/custom/getchromsizes/environment.yml pypi
modules/nf-core/fastk/fastk/environment.yml pypi
modules/nf-core/gnu/sort/environment.yml pypi
modules/nf-core/merquryfk/merquryfk/environment.yml pypi
modules/nf-core/minimap2/index/environment.yml pypi
modules/nf-core/miniprot/align/environment.yml pypi
modules/nf-core/paftools/sam2paf/environment.yml pypi
modules/nf-core/samtools/faidx/environment.yml pypi
modules/nf-core/samtools/index/environment.yml pypi
modules/nf-core/samtools/markdup/environment.yml pypi
modules/nf-core/samtools/merge/environment.yml pypi
modules/nf-core/samtools/sort/environment.yml pypi
modules/nf-core/samtools/view/environment.yml pypi
modules/nf-core/seqtk/cutn/environment.yml pypi
modules/nf-core/tabix/bgziptabix/environment.yml pypi
modules/nf-core/ucsc/bedgraphtobigwig/environment.yml pypi
modules/nf-core/windowmasker/mkcounts/environment.yml pypi
modules/nf-core/windowmasker/ustat/environment.yml pypi