Science Score: 65.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 10 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ✓ Institutional organization owner: Organization sanger-tol has institutional domain (www.sanger.ac.uk)
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.5%) to scientific vocabulary
Repository
Pipelines for the production of Treeval data
Basic Info
- Host: GitHub
- Owner: sanger-tol
- License: other
- Language: Nextflow
- Default Branch: main
- Homepage: https://pipelines.tol.sanger.ac.uk/treeval
- Size: 56.6 MB
Statistics
- Stars: 26
- Watchers: 7
- Forks: 5
- Open Issues: 38
- Releases: 8
Metadata Files
README.md
sanger-tol/treeval
Introduction
sanger-tol/treeval [1.2.0 - Ancient Destiny-] is a bioinformatics best-practice analysis pipeline for the generation of data supplemental to the curation of reference quality genomes. This pipeline has been written to generate flat files compatible with JBrowse2 as well as HiC maps for use in Juicebox, PretextView and HiGlass.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
You can also set up and attempt to run the pipeline here: https://gitpod.io/#https://github.com/BGAcademy23/treeval-curation (a Gitpod setup for BGA23 with a version of TreeVal). For now, however, Gitpod will not run a Nextflow pipeline due to issues with using Singularity. We will be replacing this with an AWS instance soon.
The treeval pipeline has a sister pipeline, currently named curationpretext, which regenerates the pretext maps and accessory files during genomic curation in order to confirm interventions. It is sufficiently different from the treeval implementation that it is written as its own pipeline.
- Parse input yaml ( YAML_INPUT )
- Generate my.genome file ( GENERATE_GENOME )
- Generate insilico digests of the input assembly ( INSILICO_DIGEST )
- Generate gene alignments with high quality data against the input assembly ( GENE_ALIGNMENT )
- Generate a repeat density graph ( REPEAT_DENSITY )
- Generate a gap track ( GAP_FINDER )
- Generate a map of self complementary sequence ( SELFCOMP )
- Generate syntenic alignments with a closely related high quality assembly ( SYNTENY )
- Generate a coverage track using PacBio data ( LONGREAD_COVERAGE )
- Generate HiC maps, pretext and higlass using HiC cram files ( HIC_MAPPING )
- Generate a telomere track based on input motif ( TELO_FINDER )
- Run Busco and convert results into bed format ( BUSCO_ANNOTATION )
- Ancestral Busco linkage if available for clade ( BUSCO_ANNOTATION:ANCESTRAL_GENE )
- Count KMERs with FastK and plot the spectra using MerquryFK ( KMER )
- Generate a coverage track using KMER data ( KMER_READ_COVERAGE )
Usage
[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with `-profile test` before running the workflow on actual data.
It is currently advised to run the pipeline with Docker or Singularity, as a small number of major modules do not yet have a conda env associated with them.
Now, you can run the pipeline using:
```bash
# For the FULL pipeline
nextflow run main.nf -profile singularity --input treeval.yaml --outdir {OUTDIR}

# For the RAPID subset
nextflow run main.nf -profile singularity --input treeval.yaml -entry RAPID --outdir {OUTDIR}
```
An example treeval.yaml can be found here.
Further documentation about the pipeline can be found in the following files: usage, parameters and output.
Warning: Please provide pipeline parameters via the CLI or the Nextflow `-params-file` option. Custom config files, including those provided by the `-c` Nextflow option, can be used to provide any configuration except for parameters; see docs.
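As a minimal sketch of the `-params-file` route, the snippet below writes a small YAML parameters file and assembles the matching launch command. The file name `params.yaml` and the `results` output directory are assumptions chosen for illustration; `input` and `outdir` mirror the `--input` and `--outdir` options shown above. The command is only printed, not executed.

```bash
# Write a hypothetical parameters file (file name and values are placeholders).
cat > params.yaml <<'EOF'
input: treeval.yaml
outdir: results
EOF

# Assemble the launch command: parameters come from -params-file,
# while -profile still selects the container engine.
cmd="nextflow run main.nf -profile singularity -params-file params.yaml"

# Print the command for review; run it yourself once the paths are correct.
echo "$cmd"
```

Keeping parameters in a file like this leaves `-c` free for non-parameter configuration such as resource limits.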
Credits
sanger-tol/treeval has been written by Damon-Lee Pointon (@DLBPointon), Yumi Sims (@yumisims) and William Eagles (@weaglesBio).
We thank the following people for their extensive assistance in the development of this pipeline:
- @gq1 - For building the infrastructure around TreeVal and helping with code review
- @ksenia-krasheninnikova - For help with C code implementation and YAML parsing
- @mcshane - For guidance on algorithms
- @muffato - For code reviews and code support
- @priyanka-surana - For help with the majority of code reviews and code support
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
Citations
If you use sanger-tol/treeval for your analysis, please cite it using the following doi: 10.5281/zenodo.10047653.
Tools
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Owner
- Name: Tree of Life programme
- Login: sanger-tol
- Kind: organization
- Location: United Kingdom
- Website: https://www.sanger.ac.uk/programme/tree-of-life/
- Twitter: sangertol
- Repositories: 15
- Profile: https://github.com/sanger-tol
The Tree of Life Programme investigates the diversity of complex organisms (eukaryotes) through sequencing and cellular technology
Citation (CITATIONS.md)
# sanger-tol/treeval: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels, P. et al. 2020. ‘The NF-core framework for community-curated bioinformatics pipelines’, Nature Biotechnology, 38(3), pp. 276–278. doi:10.1038/s41587-020-0439-x.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso, P. et al. 2017. ‘Nextflow enables reproducible computational workflows’, Nature Biotechnology, 35(4), pp. 316–319. doi:10.1038/nbt.3820.

## Pipeline tools

- [Bedtools](https://bedtools.readthedocs.io/en/latest/)

  > Quinlan, A.R. and Hall, I.M. 2010. ‘BEDTools: A flexible suite of utilities for comparing genomic features’, Bioinformatics, 26(6), pp. 841–842. doi:10.1093/bioinformatics/btq033.

- [BUSCO](https://busco.ezlab.org)

  > Manni, M. et al. 2021. Busco update: Novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes, Molecular biology and evolution. Available at: https://pubmed.ncbi.nlm.nih.gov/34320186/ (Accessed: 22 June 2023).

- [bwa-mem2](https://ieeexplore.ieee.org/document/8820962)

  > Vasimuddin, Md. et al. 2019. ‘Efficient architecture-aware acceleration of BWA-mem for multicore systems’, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS) [Preprint]. doi:10.1109/ipdps.2019.00041.

- [Cooler](https://github.com/open2c/cooler)

  > Abdennur, N. and Mirny, L.A. 2019. ‘Cooler: Scalable storage for hi-C data and other genomically labeled arrays’, Bioinformatics, 36(1), pp. 311–316. doi:10.1093/bioinformatics/btz540.

- [Find Telomere](https://github.com/VGP/vgp-assembly/tree/master/pipeline/telomere)

  > VGP. 2022. vgp-assembly telomere [online]. https://github.com/VGP/vgp-assembly/tree/master/pipeline/telomere. (Accessed on 28th February 2023).

- [Juicer](https://github.com/aidenlab/juicer)

  > Durand, N.C. et al. 2016. ‘Juicer provides a one-click system for analyzing loop-resolution hi-C experiments’, Cell Systems, 3(1), pp. 95–98. doi:10.1016/j.cels.2016.07.002.

- [Minimap2](https://pubmed.ncbi.nlm.nih.gov/34623391/)

  > Li, H. 2021. ‘New strategies to improve MINIMAP2 alignment accuracy’, Bioinformatics, 37(23), pp. 4572–4574. doi:10.1093/bioinformatics/btab705.

- [Miniprot](https://arxiv.org/abs/2210.08052)

  > Li, H. 2022. Protein-to-genome alignment with miniprot (v2). Arxiv [e-journal]. (Accessed on 25th January 2023). doi: 10.48550/arXiv.2210.08052.

- [Mummer](https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1005944)

  > Marçais, G. et al. 2018. ‘Mummer4: A fast and versatile genome alignment system’, PLOS Computational Biology, 14(1). doi:10.1371/journal.pcbi.1005944.

- [Pandas](https://pandas.pydata.org/)

  > Redback, J. 2022. pandas-dev/pandas: Pandas 1.4.3 [online]. Zenodo. doi: 10.5281/zenodo.6702671. (Accessed on 28th February 2023).

- [Perl](https://perldoc.perl.org/perl)

  > Perl Organisation. 2023. Perl Language Reference v5.36.0. https://perldoc.perl.org/perl. (Accessed 28th February 2023).

- [PretextMap](https://github.com/wtsi-hpag/PretextMap)

  > Harry, E. 2022. PretextView [online]. https://github.com/wtsi-hpag/PretextView. (Accessed on 7th June 2023).

- [Pybedtools](https://github.com/daler/pybedtools)

  > Daler. 2022. pybedtools [online]. https://github.com/daler/pybedtools. (Accessed on 7th June 2023).

- [Python: 3.10](https://docs.python.org/3.10/whatsnew/3.10.html)

  > Python Software Foundation. 2023. Python Language Reference v3.10. https://docs.python.org/3.10/whatsnew/3.10.html. (Accessed 28th February 2023).

- [PyFasta](https://github.com/brentp/pyfasta/)

  > Brentp. 2018. pyfasta [online]. http://github.com/brentp/pyfasta/. (Accessed on 7th June 2023).

- [Samtools](https://pubmed.ncbi.nlm.nih.gov/33590861/)

  > Di Tommaso, Paolo, et al. 2017. “Nextflow Enables Reproducible Computational Workflows.” Nature Biotechnology, 35(4), pp. 316–19, https://doi.org/10.1038/nbt.3820.

- [SeqTK](https://github.com/lh3/seqtk)

  > Li, Heng. 2023. seqtk [online]. https://github.com/lh3/seqtk. (Accessed on 7th June 2023).

- [staden_io_lib / iolib](https://github.com/jkbonfield/io_lib)

  > Bonfield JK. 2023. io_lib [online]. https://github.com/jkbonfield/io_lib. (Accessed on 7th June 2023).

- [Tabix](http://www.htslib.org/doc/tabix.html)

  > Li, Heng. 2023. tabix [online]. http://www.htslib.org/doc/tabix.html. (Accessed on 7th June 2023).

- [UCSC tools](https://github.com/ucscGenomeBrowser/kent/tree/master)

  > UCSC Genome Browser Group. 2023. kent [online]. https://github.com/ucscGenomeBrowser/kent/tree/master. (Accessed on 7th June 2023).

- [WindowMasker](https://pubmed.ncbi.nlm.nih.gov/16287941/)

  > Morgulis, A., et al. 2006. WindowMasker: window-based masker for sequenced genomes. Bioinformatics. 22(2). pp.134–141. doi: 10.1093/bioinformatics/bti774.

- [lep_busco_painter](https://www.biorxiv.org/content/10.1101/2023.05.12.540473v1.full.pdf)

  > Wright, C. et al. 2023. Chromosome evolution in Lepidoptera. bioRxiv. 540473. https://doi.org/10.1101/2023.05.12.540473

- [Java](https://docs.oracle.com/javase/8/docs/api/overview-summary.html)

  > Oracle. 2023. Java Documentation. https://docs.oracle.com/javase/8/docs/index.html. (Accessed on 25th September 2023).

- [coreutils](https://github.com/coreutils/coreutils)

  > GNU Coreutils. 2023. coreutils [online]. https://github.com/coreutils/coreutils/releases/tag/v9.4. (Accessed on 25th September 2023).

## Software packaging/containerisation tools

- [Conda](https://conda.org/)

  > conda contributors. conda: A system-level, binary package and environment manager running on all major operating systems and platforms. Computer software. https://github.com/conda/conda

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Di Tommaso, Paolo, et al. 2017. “Nextflow Enables Reproducible Computational Workflows.” Nature Biotechnology, 35(4), pp. 316–19, https://doi.org/10.1038/nbt.3820.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > Grüning, Björn, et al. 2018. “Bioconda: sustainable and comprehensive software distribution for the life sciences”. Nature Methods, 15, pp. 475-6, https://doi.org/10.1038/s41592-018-0046-7.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, Dirk, et al. 2014. “Docker: Lightweight Linux Containers for Consistent Development and Deployment.” Association for Computing Machinery. 2014(239)

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer, Gregory M., et al. 2017. “Singularity: Scientific containers for mobility of compute.”, PLOS ONE, 12(5), pp. e0177459, https://doi.org/10.1371/journal.pone.0177459.
GitHub Events
Total
- Create event: 57
- Release event: 8
- Issues event: 49
- Watch event: 3
- Delete event: 46
- Member event: 1
- Issue comment event: 162
- Push event: 288
- Pull request review comment event: 79
- Pull request review event: 111
- Pull request event: 131
- Fork event: 2
Last Year
- Create event: 57
- Release event: 8
- Issues event: 49
- Watch event: 3
- Delete event: 46
- Member event: 1
- Issue comment event: 162
- Push event: 288
- Pull request review comment event: 79
- Pull request review event: 111
- Pull request event: 131
- Fork event: 2
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 23
- Total pull requests: 54
- Average time to close issues: about 1 month
- Average time to close pull requests: 3 days
- Total issue authors: 7
- Total pull request authors: 5
- Average comments per issue: 0.78
- Average comments per pull request: 1.19
- Merged pull requests: 35
- Bot issues: 0
- Bot pull requests: 1
Past Year
- Issues: 23
- Pull requests: 54
- Average time to close issues: about 1 month
- Average time to close pull requests: 3 days
- Issue authors: 7
- Pull request authors: 5
- Average comments per issue: 0.78
- Average comments per pull request: 1.19
- Merged pull requests: 35
- Bot issues: 0
- Bot pull requests: 1
Top Authors
Issue Authors
- DLBPointon (38)
- yumisims (20)
- weaglesBio (10)
- muffato (2)
- gitforp (2)
- Rathnayaka1988 (1)
- gq1 (1)
- mahesh-panchal (1)
- Surbhigrewal (1)
- JWrighty97 (1)
Pull Request Authors
- DLBPointon (84)
- yumisims (35)
- weaglesBio (25)
- gq1 (2)
- mahesh-panchal (1)
- Surbhigrewal (1)
- dependabot[bot] (1)
- tkchafin (1)
Dependencies
- mshick/add-pr-comment v1 composite
- actions/checkout v2 composite
- actions/checkout v3 composite
- actions/setup-node v2 composite
- actions/checkout v2 composite
- actions/setup-node v2 composite
- actions/setup-python v3 composite
- actions/upload-artifact v2 composite
- dawidd6/action-download-artifact v2 composite
- marocchino/sticky-pull-request-comment v2 composite
- actions/stale v7 composite
- actions/upload-artifact v3 composite
- seqeralabs/action-tower-launch v2 composite