minion-qcbench
Repository
Basic Info
- Host: GitHub
- Owner: ththng
- License: mit
- Language: Nextflow
- Default Branch: main
- Size: 11.3 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 2
- Releases: 0
Metadata Files
README.md
Introduction
minion-qcbench is a bioinformatics pipeline that benchmarks different quality control tools on long-read sequencing data. It takes a samplesheet and sequencing data (FASTQ files) as input, pre-processes them with different quality control tools, assembles these pre-processed reads using Flye and compares the resulting assemblies using QUAST, which computes various quality metrics and summarises them in reports.
- Filter reads using Chopper or PRINSEQ++ by a minimum average Phred score, or leave the reads unfiltered
- Assemble the preprocessed sequencing data using Flye
- Calculate quality metrics for the assemblies using QUAST
Usage
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow.
First, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
```csv
sample,fastq,subsampling
sample1,sample1.fastq.gz,
sample1,sample1_80.fastq.gz,80
sample2,sample2.fastq.gz,
```
Each row represents a sample with the sample ID, the path to the respective FASTQ file and how it was subsampled.
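Before launching a run, it can help to sanity-check the samplesheet. The sketch below is not part of the pipeline. It is a minimal, hypothetical helper (`check_samplesheet`) that assumes the three-column header shown above and assumes FASTQ paths end in `.fastq`, `.fq`, `.fastq.gz` or `.fq.gz`. The pipeline's own validation may be stricter or may differ.

```python
import csv
import io

# Column names taken from the example samplesheet above
REQUIRED_COLUMNS = ["sample", "fastq", "subsampling"]

# Assumption: reads are plain or gzip-compressed FASTQ files
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def check_samplesheet(text):
    """Return a list of problems found in samplesheet CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != REQUIRED_COLUMNS:
        return [f"unexpected header: {reader.fieldnames}"]
    problems = []
    # Line 1 is the header, so data rows start at line 2
    for line_no, row in enumerate(reader, start=2):
        if not row["sample"]:
            problems.append(f"line {line_no}: missing sample ID")
        fastq = row["fastq"] or ""  # short rows yield None for missing fields
        if not fastq.endswith(FASTQ_SUFFIXES):
            problems.append(f"line {line_no}: not a FASTQ path: {fastq!r}")
    return problems


example = """sample,fastq,subsampling
sample1,sample1.fastq.gz,
sample1,sample1_80.fastq.gz,80
sample2,sample2.fastq.gz,
"""

print(check_samplesheet(example))
```

The `subsampling` column is deliberately left unchecked, since an empty value is valid in the example and the meaning of the number is not specified here.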
Assuming the following folder structure:
```
.
├── data                  # Data folder containing the samplesheet
│   ├── samplesheet.csv   # Samplesheet referencing the FASTQ files
│   └── ...
└── minion-qcbench        # This project
    └── ...
```
After navigating to the parent directory of the minion-qcbench project, you can run the pipeline using the minimal example command below, which includes the essential parameters.
```bash
# --quality_scores: minimum average Phred quality scores
# --flye_modes:     Flye modes used for assembly
nextflow run minion-qcbench \
    -profile singularity \
    --input data/samplesheet.csv \
    --outdir results \
    --quality_scores 13,15 \
    --flye_modes nano-corr,nano-hq
```
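To get a feel for how many assemblies such a run could produce per input file, the following sketch enumerates the parameter grid. It rests on an assumption this README does not state explicitly: that every filter setting (each tool at each quality score, plus an unfiltered baseline) is combined with every Flye mode. The labels `chopper`, `prinseq` and `unfiltered` and the function `benchmark_grid` are illustrative names, not identifiers from the pipeline.

```python
from itertools import product


def benchmark_grid(quality_scores, flye_modes, tools=("chopper", "prinseq")):
    """List (tool, min_quality, flye_mode) combinations under the stated assumption."""
    # One filter setting per (tool, score) pair, plus the unfiltered baseline
    filters = [("unfiltered", None)] + list(product(tools, quality_scores))
    # Assumption: each filter setting is assembled once per Flye mode
    return [(tool, score, mode) for (tool, score), mode in product(filters, flye_modes)]


grid = benchmark_grid([13, 15], ["nano-corr", "nano-hq"])
for tool, score, mode in grid:
    print(tool, score, mode)
print(len(grid))  # 10 = (1 unfiltered + 2 tools * 2 scores) * 2 modes
```

If the pipeline combines its parameters differently, the count changes accordingly. The sketch is only meant to make the scaling with `--quality_scores` and `--flye_modes` visible.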
Output
The final step of the pipeline is the execution of QUAST, which evaluates the quality of the assembled genome. QUAST generates a comprehensive report that provides insights into the accuracy and completeness of the assembly. This report includes various metrics such as contig counts, N50, GC content, and alignment statistics against the reference genome (if provided). For more information about QUAST reports, see https://quast.sourceforge.net/docs/manual.html.
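As a reminder of what one of these metrics means, N50 is the contig length such that contigs of that length or longer together cover at least half of the total assembly length. The sketch below computes it from a plain list of contig lengths. It illustrates the definition only and is not QUAST's implementation, which applies its own options for which contigs are counted.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if it is empty)."""
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down until half the assembly is covered
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


# Total length is 290; 80 + 70 = 150 is the first running sum >= 145
print(n50([80, 70, 50, 40, 30, 20]))  # prints 70
```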
Upon completion of the pipeline, the QUAST reports can be found in the directory <OUTDIR>/quast. This directory contains one subdirectory per sample, each holding the QUAST report for that sample.
Example
```
.
├── data              # Data folder containing the samplesheet
├── minion-qcbench    # This project
└── results           # --outdir is set to "results"
    ├── ...
    └── quast
        ├── sample1
        └── sample2
```
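To compare assemblies across samples programmatically, the per-sample reports can be gathered into one structure. The sketch below assumes each sample subdirectory contains a tab-separated QUAST report named `report.tsv`, whose first row lists the assembly names and whose remaining rows each hold one metric. The exact file name and location inside `<OUTDIR>/quast/<sample>` are assumptions and may differ in this pipeline, which is why `report_name` is a parameter. `read_quast_reports` is a hypothetical helper.

```python
import csv
from pathlib import Path


def read_quast_reports(outdir, report_name="report.tsv"):
    """Map sample name -> {assembly name -> {metric name -> value as string}}."""
    results = {}
    # Assumption: one report file directly inside each <outdir>/quast/<sample>/
    for report in sorted(Path(outdir, "quast").glob(f"*/{report_name}")):
        with report.open(newline="") as handle:
            rows = [row for row in csv.reader(handle, delimiter="\t") if row]
        if not rows:
            continue
        # Header row: a label cell followed by one column per assembly
        assemblies = rows[0][1:]
        per_assembly = {name: {} for name in assemblies}
        for metric, *values in rows[1:]:
            for name, value in zip(assemblies, values):
                per_assembly[name][metric] = value
        results[report.parent.name] = per_assembly
    return results
```

With the layout above, `read_quast_reports("results")` would return one entry per sample directory. Values are kept as strings because QUAST reports mix integers, floats and text.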
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
In addition, references of tools and data used in this pipeline are as follows:
Owner
- Login: ththng
- Kind: user
- Repositories: 1
- Profile: https://github.com/ththng
Citation (CITATIONS.md)
# minion-qcbench: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## [nf-test](https://www.biorxiv.org/content/10.1101/2024.05.25.595877v1)

> Forer, L., & Schönherr, S. (2024). Improving the Reliability and Quality of Nextflow Pipelines with nf-test. bioRxiv. https://doi.org/10.1101/2024.05.25.595877

## Pipeline tools

## [Chopper](https://academic.oup.com/bioinformatics/article/39/5/btad311/7160911)

> Wouter De Coster, Rosa Rademakers, NanoPack2: population-scale evaluation of long-read sequencing data, Bioinformatics, Volume 39, Issue 5, May 2023, btad311, https://doi.org/10.1093/bioinformatics/btad311

## [PRINSEQ++](https://peerj.com/preprints/27553v1/)

> Cantu VA, Sadural J, Edwards R. 2019. PRINSEQ++, a multi-threaded tool for fast and efficient quality control and preprocessing of sequencing datasets. PeerJ Preprints 7:e27553v1

## [Flye](https://www.nature.com/articles/s41587-019-0072-8)

> Mikhail Kolmogorov, Jeffrey Yuan, Yu Lin and Pavel Pevzner, "Assembly of Long Error-Prone Reads Using Repeat Graphs", Nature Biotechnology, 2019 doi:10.1038/s41587-019-0072-8

## [QUAST](https://academic.oup.com/bioinformatics/article/34/13/i142/5045727)

> Alla Mikheenko, Andrey Prjibelski, Vladislav Saveliev, Dmitry Antipov, Alexey Gurevich, Versatile genome assembly evaluation with QUAST-LG, Bioinformatics (2018) 34 (13): i142-i150. doi: 10.1093/bioinformatics/bty266\
> First published online: June 27, 2018

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
GitHub Events
Total
- Delete event: 1
- Push event: 16
- Public event: 1
- Pull request event: 3
- Create event: 3
Last Year
- Delete event: 1
- Push event: 16
- Public event: 1
- Pull request event: 3
- Create event: 3