Science Score: 57.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 7 DOI reference(s) in README
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.4%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: ncezid-biome
- License: mit
- Language: Nextflow
- Default Branch: main
- Size: 496 KB
Statistics
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 2
Metadata Files
README.md
Introduction
ncezid-biome/stylo is a bioinformatics pipeline for filtering, downsampling, assembling, and quality-controlling ONT long reads. It takes a samplesheet and FASTQ files as input, then performs read filtering, downsampling to a specified coverage, assembly, and quality control (QC).

- Filters low quality reads (nanoq)
- Downsamples reads to specific coverage (rasusa)
- Assembles reads (Flye)
- Reorients assembly (Dnaapler)
- Error correction (Medaka)
- QCs assembly (busco)
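Downsampling to a coverage target comes down to simple arithmetic: depth of coverage is the total number of sequenced bases divided by the genome size. The Python sketch below illustrates only that relationship. It is not how rasusa is implemented, and the function names and numbers are hypothetical.

```python
def target_bases(genome_size_bp, coverage):
    """Number of bases needed to cover the genome at the requested depth."""
    return int(genome_size_bp * coverage)


def keep_fraction(total_bases, genome_size_bp, coverage):
    """Fraction of sequenced bases to keep; 1.0 if the data is at or below target."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return min(1.0, target_bases(genome_size_bp, coverage) / total_bases)


# Hypothetical example: a 5 Mb genome, 1.5 Gb of reads, 100x target coverage.
genome_size = 5_000_000
print(target_bases(genome_size, 100))                  # 500000000
print(keep_fraction(1_500_000_000, genome_size, 100))  # roughly one third
```

A sample with fewer bases than the target is left untouched in this sketch, which is why the fraction is capped at 1.0.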
Usage
[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set-up Nextflow.
[!NOTE] This pipeline was tested with the following versions; other versions may work but are currently untested:
nf-core v2.14.1
nextflow v24.04.2
singularity v3.8.7
First, clone the nf-core-dev branch into your preferred directory:
bash
cd /path/to/dir/
git clone -b nf-core-dev git@github.com:ncezid-biome/stylo.git
Second, prepare a samplesheet with your input data that looks as follows:
samplesheet.csv:
csv
sample,fastq,genus,species
sample1,/path/to/sample1.fastq.gz,Salmonella,enterica
sample2,/path/to/sample2.fastq.gz,Campylobacter,coli
sample3,/path/to/sample3.fastq.gz,Campylobacter,jejuni
sample4,/path/to/sample4.fastq.gz,Vibrio,-
sample5,/path/to/sample5.fastq.gz,Salmonella,enterica
Each row represents a fastq file (single-end) with the known genus and species.
[!NOTE] Use a hyphen (-) where the species is unknown.
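Before launching a run, it can help to sanity-check the samplesheet. The following is a minimal Python sketch, not part of the pipeline: it assumes only the four columns shown above and that every field must be non-empty, with a hyphen standing in for an unknown species. The function name and the `sample6` row are hypothetical.

```python
import csv
import io

REQUIRED_COLUMNS = ["sample", "fastq", "genus", "species"]


def check_samplesheet(text):
    """Return a list of problems found in the CSV text; empty means none found."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return ["missing column(s): " + ", ".join(missing)]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        for column in REQUIRED_COLUMNS:
            value = (row[column] or "").strip()
            if not value:
                hint = " (use '-' if unknown)" if column == "species" else ""
                problems.append(f"line {line_no}: empty {column}{hint}")
    return problems


example = """sample,fastq,genus,species
sample1,/path/to/sample1.fastq.gz,Salmonella,enterica
sample4,/path/to/sample4.fastq.gz,Vibrio,-
sample6,/path/to/sample6.fastq.gz,Vibrio,
"""
for problem in check_samplesheet(example):
    print(problem)  # line 4: empty species (use '-' if unknown)
```

The sketch does not check that the FASTQ paths exist or that each genus appears in the lookup table; those checks depend on your environment.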
Third, check the lookup table to make sure that each genus listed in your samplesheet is present. To add a row or edit the lookup table, see Advanced Usage.
Now, you can run the pipeline using:
bash
nextflow run /path/to/stylo/main.nf \
-profile singularity \
--input samplesheet.csv \
--outdir <OUTDIR>
If you're on CDC servers, use -profile rosalind.
[!WARNING] Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
For more details about generic usage see the Usage Page
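As an alternative to passing parameters on the command line, the same values can go in a params file. The Python sketch below writes a JSON params file containing the two parameters shown above (`input` and `outdir`) and prints the matching `nextflow run` command without executing it. The helper names and the `results` output directory are hypothetical.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_params(path, input_csv, outdir):
    """Write pipeline parameters as JSON, a format accepted by -params-file."""
    params = {"input": input_csv, "outdir": outdir}
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return params


def build_command(main_nf, profile, params_file):
    """Assemble the nextflow invocation as an argument list (not executed here)."""
    return ["nextflow", "run", main_nf,
            "-profile", profile,
            "-params-file", str(params_file)]


# A temporary directory keeps the example from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    params_file = Path(tmp) / "params.json"
    write_params(params_file, "samplesheet.csv", "results")
    print(params_file.read_text())
    print(shlex.join(build_command("/path/to/stylo/main.nf", "singularity", params_file)))
```

Note that `-profile` and `-params-file` are Nextflow options (single hyphen), while pipeline parameters such as `--input` use a double hyphen on the command line.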
Advanced Usage
Editing the Lookup Table
If a genus is missing, then you'll need to add a row to the lookup table prior to running the pipeline. In order to add a row to the lookup table you'll need the following information:
- genus (required)
- species (optional, use - if you want the lookup table to accept all species within that genus)
- genome size (required, must follow the same format as the other rows in MBs)
model parameter
If the model parameter is left blank, the pipeline will choose the bacterial methylation model r1041_e82_400bps_bacterial_methylation.
It's best to let the pipeline use the bacterial methylation model, but if you must specify the model parameter, make sure to use one of the models from the following list:
r103_sup_g507
r1041_e82_260bps_fast_g632
r1041_e82_260bps_hac_g632
r1041_e82_260bps_hac_v4.0.0
r1041_e82_260bps_hac_v4.1.0
r1041_e82_260bps_joint_apk_ulk_v5.0.0
r1041_e82_260bps_sup_g632
r1041_e82_260bps_sup_v4.0.0
r1041_e82_260bps_sup_v4.1.0
r1041_e82_400bps_bacterial_methylation
r1041_e82_400bps_fast_g615
r1041_e82_400bps_fast_g632
r1041_e82_400bps_hac_g615
r1041_e82_400bps_hac_g632
r1041_e82_400bps_hac_v4.0.0
r1041_e82_400bps_hac_v4.1.0
r1041_e82_400bps_hac_v4.2.0
r1041_e82_400bps_hac_v4.3.0
r1041_e82_400bps_hac_v5.0.0
r1041_e82_400bps_hac_v5.0.0_rl_lstm384_dwells
r1041_e82_400bps_hac_v5.0.0_rl_lstm384_no_dwells
r1041_e82_400bps_hac_v5.2.0
r1041_e82_400bps_hac_v5.2.0_rl_lstm384_dwells
r1041_e82_400bps_hac_v5.2.0_rl_lstm384_no_dwells
r1041_e82_400bps_sup_g615
r1041_e82_400bps_sup_v4.0.0
r1041_e82_400bps_sup_v4.1.0
r1041_e82_400bps_sup_v4.2.0
r1041_e82_400bps_sup_v4.3.0
r1041_e82_400bps_sup_v5.0.0
r1041_e82_400bps_sup_v5.0.0_rl_lstm384_dwells
r1041_e82_400bps_sup_v5.0.0_rl_lstm384_no_dwells
r1041_e82_400bps_sup_v5.2.0
r1041_e82_400bps_sup_v5.2.0_rl_lstm384_dwells
r1041_e82_400bps_sup_v5.2.0_rl_lstm384_no_dwells
r104_e81_fast_g5015
r104_e81_hac_g5015
r104_e81_sup_g5015
r104_e81_sup_g610
r941_e81_fast_g514
r941_e81_hac_g514
r941_e81_sup_g514
r941_min_fast_g507
r941_min_hac_g507
r941_min_sup_g507
r941_prom_fast_g507
r941_prom_hac_g507
r941_prom_sup_g507
For more details about model selection in medaka, see the medaka model documentation.
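The lookup rules described in this section (a hyphen in the species column accepts every species in the genus, and a blank model falls back to the bacterial methylation model) can be sketched in Python as follows. This is an illustration under assumptions, not the pipeline's code: the in-memory table layout, the genome size values, and the first-match behavior are all hypothetical.

```python
DEFAULT_MODEL = "r1041_e82_400bps_bacterial_methylation"

# Hypothetical rows; the real lookup table's file format and values may differ.
LOOKUP = [
    {"genus": "Salmonella", "species": "-", "genome_size": "5.0", "model": ""},
    {"genus": "Campylobacter", "species": "jejuni", "genome_size": "1.6", "model": ""},
]


def find_row(table, genus, species):
    """Return the first row matching the genus whose species is equal or '-'."""
    for row in table:
        if row["genus"] != genus:
            continue
        # A '-' in the table accepts any species within that genus.
        if row["species"] in ("-", species):
            return row
    return None  # genus/species not covered: a row would need to be added


def resolve_model(row):
    """Use the row's model if set, otherwise the default bacterial model."""
    return row["model"].strip() or DEFAULT_MODEL


row = find_row(LOOKUP, "Salmonella", "enterica")
print(row["genome_size"], resolve_model(row))
print(find_row(LOOKUP, "Vibrio", "-"))  # None
```

In this sketch a missing genus returns `None`, which corresponds to the case where a row must be added before running the pipeline.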
Credits
ncezid-biome/stylo was originally written by Arzoo Patel and Mohit Thakur.
We thank the following people for their extensive assistance in the development of this pipeline:
Justin Kim, Jessica Chen, Peyton Smith, Lee S. Katz, Joe Wirth, Curtis Kapsak
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Owner
- Name: ncezid-biome
- Login: ncezid-biome
- Kind: organization
- Repositories: 3
- Profile: https://github.com/ncezid-biome
JOSS Publication
stylo: a lightweight nanopore assembly pipeline optimized for enteric bacteria
Authors
Enteric Diseases Laboratory Branch, Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America, ASRT Inc., Contractor for National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Enteric Diseases Laboratory Branch, Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Enteric Diseases Laboratory Branch, Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America, ASRT Inc., Contractor for National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Enteric Diseases Laboratory Branch, Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Enteric Diseases Laboratory Branch, Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Enteric Diseases Laboratory Branch, Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America, Theiagen Genomics, Highlands Ranch, Colorado, United States of America
Tags
nextflow bioinformatics genomics bacteria assembly nanopore long-reads quality control filtering downsampling ont nf-core style
Citation (CITATIONS.md)
# ncezid-biome/stylo: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [nanoq](https://joss.theoj.org/papers/10.21105/joss.02991)

  > Steinig et al., (2022). Nanoq: ultra-fast quality control for nanopore reads. Journal of Open Source Software, 7(69), 2991, https://doi.org/10.21105/joss.02991

- [Rasusa](https://joss.theoj.org/papers/10.21105/joss.03941)

  > Hall, M. B., (2022). Rasusa: Randomly subsample sequencing reads to a specified coverage. Journal of Open Source Software, 7(69), 3941, https://doi.org/10.21105/joss.03941

- [Flye](https://www.nature.com/articles/s41587-019-0072-8)

  > Kolmogorov, M., Yuan, J., Lin, Y. et al. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol 37, 540–546 (2019). https://doi.org/10.1038/s41587-019-0072-8

- [Dnaapler](https://joss.theoj.org/papers/10.21105/joss.05968)

  > George Bouras, Susanna R. Grigson, Bhavya Papudeshi, Vijini Mallawaarachchi, Michael J. Roach (2024). Dnaapler: A tool to reorient circular microbial genomes. Journal of Open Source Software, 9(93), 5968, https://doi.org/10.21105/joss.05968

- [Medaka](https://github.com/nanoporetech/medaka)

  > Medaka. Github. (2024). https://github.com/nanoporetech/medaka

- [BUSCO](https://www.nature.com/articles/nbt.3820)

  > BUSCO. Github. (2021). https://github.com/metashot/busco

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
GitHub Events
Total
- Release event: 1
- Push event: 6
- Public event: 1
- Pull request event: 5
- Create event: 1
Last Year
- Release event: 1
- Push event: 6
- Public event: 1
- Pull request event: 5
- Create event: 1
Dependencies
- actions/checkout v4 composite
- actions/upload-artifact v4 composite
- openjournals/openjournals-draft-action master composite
- flye 2.9.5.*
- medaka 2.0.1.*
- busco 5.8.2.*
- nanoq 0.10.0.*
- rasusa 0.3.0.*