Science Score: 57.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 7 DOI reference(s) in README
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (14.4%) to scientific vocabulary

Repository

Basic Info
  • Host: GitHub
  • Owner: ncezid-biome
  • License: mit
  • Language: Nextflow
  • Default Branch: main
  • Size: 496 KB
Statistics
  • Stars: 0
  • Watchers: 3
  • Forks: 0
  • Open Issues: 0
  • Releases: 2
Created about 1 year ago · Last pushed 10 months ago
Metadata Files
Readme Changelog Contributing License Citation

README.md

Introduction

ncezid-biome/stylo is a bioinformatics pipeline for filtering, downsampling, assembling, and quality-checking Oxford Nanopore Technologies (ONT) long reads. It takes a samplesheet and FASTQ files as input, then performs read filtering, downsampling to a specified coverage, assembly, and quality control (QC).

Diagram of stylo steps

  1. Filters low-quality reads (nanoq)
  2. Downsamples reads to a specified coverage (Rasusa)
  3. Assembles reads (Flye)
  4. Reorients the assembly (Dnaapler)
  5. Corrects errors in the assembly (Medaka)
  6. QCs the assembly (BUSCO)
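Step 2 can be made concrete with a little arithmetic. Rasusa's documented approach is to keep randomly chosen reads until their combined length reaches coverage × genome size. The sketch below works through that calculation; the function names, the 5 Mb genome, and the 100x target are illustrative assumptions, not values taken from stylo.

```python
def target_bases(coverage, genome_size_bp):
    """Number of bases to keep so the reads cover the genome `coverage` times."""
    return int(coverage * genome_size_bp)


def estimated_coverage(total_bases, genome_size_bp):
    """Coverage implied by a read set before any downsampling."""
    return total_bases / genome_size_bp


def fraction_kept(total_bases, coverage, genome_size_bp):
    """Approximate share of bases retained; 1.0 means nothing is discarded."""
    if total_bases <= 0:
        raise ValueError("total_bases must be positive")
    return min(1.0, target_bases(coverage, genome_size_bp) / total_bases)


if __name__ == "__main__":
    genome = 5_000_000      # hypothetical 5 Mb genome
    reads = 1_000_000_000   # hypothetical 1 Gb of sequenced bases
    print(estimated_coverage(reads, genome))  # coverage before downsampling
    print(target_bases(100, genome))          # bases kept for a 100x target
    print(fraction_kept(reads, 100, genome))  # share of the read set retained
```

When a sample already has less than the requested coverage, `fraction_kept` returns 1.0, which mirrors the fact that downsampling cannot add reads.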

Usage

[!NOTE] If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow.

[!NOTE] This pipeline was tested using the following versions; other versions may work but are currently untested:

nf-core v2.14.1

nextflow v24.04.2

singularity v3.8.7

First, download this branch to your preferred directory:

```bash
cd /path/to/dir/
git clone -b nf-core-dev git@github.com:ncezid-biome/stylo.git
```

Second, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

```csv
sample,fastq,genus,species
sample1,/path/to/sample1.fastq.gz,Salmonella,enterica
sample2,/path/to/sample2.fastq.gz,Campylobacter,coli
sample3,/path/to/sample3.fastq.gz,Campylobacter,jejuni
sample4,/path/to/sample4.fastq.gz,Vibrio,-
sample5,/path/to/sample5.fastq.gz,Salmonella,enterica
```

Each row represents one single-end FASTQ file together with its known genus and species.

[!NOTE] Use - in the species column when the species is unknown.
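A samplesheet in this layout can be sanity-checked before launching a run. The following is a minimal sketch, not the pipeline's own validation: the function name is hypothetical, and the duplicate-sample check is an assumption about what a sensible samplesheet looks like rather than a documented stylo rule.

```python
import csv
import io

EXPECTED_COLUMNS = ["sample", "fastq", "genus", "species"]


def check_samplesheet(text):
    """Return a list of problems found in samplesheet CSV text (empty if none)."""
    reader = csv.DictReader(io.StringIO(text))
    if reader.fieldnames != EXPECTED_COLUMNS:
        return ["header must be " + ",".join(EXPECTED_COLUMNS)]
    problems = []
    seen = set()
    for row in reader:
        line_no = reader.line_num
        # DictReader marks short rows with None values and long rows with a None key.
        if None in row or any(value is None for value in row.values()):
            problems.append(f"line {line_no}: expected {len(EXPECTED_COLUMNS)} columns")
            continue
        sample = row["sample"].strip()
        if not sample:
            problems.append(f"line {line_no}: sample is empty")
        elif sample in seen:  # assumption: sample names should be unique
            problems.append(f"line {line_no}: duplicate sample {sample}")
        seen.add(sample)
        if not row["fastq"].strip():
            problems.append(f"line {line_no}: fastq is empty")
        if not row["genus"].strip():
            problems.append(f"line {line_no}: genus is empty")
        if not row["species"].strip():
            problems.append(f"line {line_no}: species is empty; use '-' if unknown")
    return problems


if __name__ == "__main__":
    sheet = (
        "sample,fastq,genus,species\n"
        "sample1,/path/to/sample1.fastq.gz,Salmonella,enterica\n"
        "sample4,/path/to/sample4.fastq.gz,Vibrio,-\n"
    )
    print(check_samplesheet(sheet))
```

An empty list means the sheet passed these checks; it does not guarantee that the FASTQ paths exist or that the pipeline will accept the file.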

Third, check the lookup table to make sure that each genus listed in your samplesheet is present. To add a row or edit the lookup table, see Advanced Usage.

Now, you can run the pipeline using:

```bash
nextflow run /path/to/stylo/main.nf \
   -profile singularity --input samplesheet.csv \
   --outdir <OUTDIR>
```

If you're on CDC servers, use -profile rosalind.

[!WARNING] Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters; see docs.
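As one way to follow the warning above, the parameters can be written to a JSON file and passed with Nextflow's -params-file option (which accepts JSON or YAML). This is a minimal sketch: the helper names, the `results` output directory, and the `params.json` filename are hypothetical; `input` and `outdir` are the two parameters shown in the run command.

```python
import json
import shlex


def params_json(input_csv, outdir):
    """Serialize pipeline parameters for Nextflow's -params-file option."""
    return json.dumps({"input": input_csv, "outdir": outdir}, indent=2)


def build_command(main_nf, profile, params_file):
    """Build the nextflow command; parameters come only from the params file."""
    return ["nextflow", "run", main_nf, "-profile", profile, "-params-file", params_file]


if __name__ == "__main__":
    # Save this text as params.json (hypothetical filename) next to the samplesheet.
    print(params_json("samplesheet.csv", "results"))
    # The matching command line, quoted safely for a shell.
    print(shlex.join(build_command("/path/to/stylo/main.nf", "singularity", "params.json")))
```

Keeping parameters in a file makes a run easier to repeat, while configuration such as profiles stays on the command line or in config files.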

For more details about generic usage, see the Usage Page.

Advanced Usage

Editing the Lookup Table

If a genus is missing, then you'll need to add a row to the lookup table prior to running the pipeline. In order to add a row to the lookup table you'll need the following information:

  1. genus (required)
  2. species (optional, use - if you want the lookup table to accept all species within that genus)
  3. genome size (required; in MB, following the same format as the other rows)
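The role of the - wildcard can be illustrated with a small lookup. This is a sketch under stated assumptions: the real lookup table's file format and matching order are not described here, so the tuple layout, the "exact species first, then genus-wide row" rule, and the genome sizes below are all hypothetical.

```python
# Hypothetical rows: (genus, species, genome size in MB). Sizes are illustrative only.
LOOKUP = [
    ("Salmonella", "-", 5.0),
    ("Campylobacter", "coli", 1.7),
    ("Campylobacter", "jejuni", 1.6),
]


def find_genome_size(genus, species, table=LOOKUP):
    """Return the genome size for a sample, or None if no row applies.

    Assumed rule: an exact genus+species row wins; otherwise a row whose
    species is '-' matches every species in that genus.
    """
    wildcard = None
    for row_genus, row_species, size in table:
        if row_genus != genus:
            continue
        if row_species == species:
            return size
        if row_species == "-":
            wildcard = size
    return wildcard


def missing_genera(sample_genera, table=LOOKUP):
    """List genera from the samplesheet that have no row in the lookup table."""
    known = {row[0] for row in table}
    return sorted(set(sample_genera) - known)


if __name__ == "__main__":
    print(find_genome_size("Salmonella", "enterica"))  # matched by the '-' row
    print(missing_genera(["Salmonella", "Campylobacter", "Vibrio"]))
```

Running `missing_genera` over the genus column of a samplesheet gives the rows that would need to be added before the pipeline is started.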

model parameter

If the model parameter is left blank, the pipeline will choose the bacterial methylation model r1041_e82_400bps_bacterial_methylation. It's best to let the pipeline use the bacterial methylation model, but if you must specify the model parameter, make sure to use one of the models from the following list:

```
r103_sup_g507
r1041_e82_260bps_fast_g632
r1041_e82_260bps_hac_g632
r1041_e82_260bps_hac_v4.0.0
r1041_e82_260bps_hac_v4.1.0
r1041_e82_260bps_joint_apk_ulk_v5.0.0
r1041_e82_260bps_sup_g632
r1041_e82_260bps_sup_v4.0.0
r1041_e82_260bps_sup_v4.1.0
r1041_e82_400bps_bacterial_methylation
r1041_e82_400bps_fast_g615
r1041_e82_400bps_fast_g632
r1041_e82_400bps_hac_g615
r1041_e82_400bps_hac_g632
r1041_e82_400bps_hac_v4.0.0
r1041_e82_400bps_hac_v4.1.0
r1041_e82_400bps_hac_v4.2.0
r1041_e82_400bps_hac_v4.3.0
r1041_e82_400bps_hac_v5.0.0
r1041_e82_400bps_hac_v5.0.0_rl_lstm384_dwells
r1041_e82_400bps_hac_v5.0.0_rl_lstm384_no_dwells
r1041_e82_400bps_hac_v5.2.0
r1041_e82_400bps_hac_v5.2.0_rl_lstm384_dwells
r1041_e82_400bps_hac_v5.2.0_rl_lstm384_no_dwells
r1041_e82_400bps_sup_g615
r1041_e82_400bps_sup_v4.0.0
r1041_e82_400bps_sup_v4.1.0
r1041_e82_400bps_sup_v4.2.0
r1041_e82_400bps_sup_v4.3.0
r1041_e82_400bps_sup_v5.0.0
r1041_e82_400bps_sup_v5.0.0_rl_lstm384_dwells
r1041_e82_400bps_sup_v5.0.0_rl_lstm384_no_dwells
r1041_e82_400bps_sup_v5.2.0
r1041_e82_400bps_sup_v5.2.0_rl_lstm384_dwells
r1041_e82_400bps_sup_v5.2.0_rl_lstm384_no_dwells
r104_e81_fast_g5015
r104_e81_hac_g5015
r104_e81_sup_g5015
r104_e81_sup_g610
r941_e81_fast_g514
r941_e81_hac_g514
r941_e81_sup_g514
r941_min_fast_g507
r941_min_hac_g507
r941_min_sup_g507
r941_prom_fast_g507
r941_prom_hac_g507
r941_prom_sup_g507
```

For more details about model selection in Medaka, see the Medaka model documentation.
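The fallback behaviour described in this section can be sketched as a small check to run before a model name is passed to the pipeline. The function name is hypothetical and `KNOWN_MODELS` holds only an excerpt of the list above; this is not how stylo itself resolves the parameter.

```python
DEFAULT_MODEL = "r1041_e82_400bps_bacterial_methylation"

# Excerpt of the supported models listed above; extend with the full list as needed.
KNOWN_MODELS = {
    DEFAULT_MODEL,
    "r1041_e82_400bps_hac_v5.0.0",
    "r1041_e82_400bps_sup_v5.0.0",
    "r941_min_sup_g507",
}


def resolve_model(model):
    """Return the model to use, falling back to the default when left blank."""
    if model is None or not model.strip():
        return DEFAULT_MODEL
    name = model.strip()
    if name not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {name}")
    return name


if __name__ == "__main__":
    print(resolve_model(""))                             # falls back to the default
    print(resolve_model("r1041_e82_400bps_sup_v5.0.0"))  # accepted as given
```

Failing early on a misspelled model name is cheaper than discovering it after filtering and assembly have already run.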

Credits

ncezid-biome/stylo was originally written by Arzoo Patel and Mohit Thakur.

We thank the following people for their extensive assistance in the development of this pipeline:

Justin Kim, Jessica Chen, Peyton Smith, Lee S. Katz, Joe Wirth, Curtis Kapsak

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

This pipeline uses code and infrastructure developed and maintained by the nf-core community, reused here under the MIT license.

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Owner

  • Name: ncezid-biome
  • Login: ncezid-biome
  • Kind: organization

JOSS Publication

stylo: a lightweight nanopore assembly pipeline optimized for enteric bacteria
Published
June 05, 2026
Volume 11, Issue 122, Page 9695
Authors
Arzoo Patel
Enteric Diseases Laboratory Branch, Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America, ASRT Inc., Contractor for National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Mohit Thakur
Enteric Diseases Laboratory Branch, Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Justin Kim
Enteric Diseases Laboratory Branch, Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America, ASRT Inc., Contractor for National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Peyton Smith
Enteric Diseases Laboratory Branch, Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Lee S. Katz
Enteric Diseases Laboratory Branch, Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Curtis Kapsak
Enteric Diseases Laboratory Branch, Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America, Theiagen Genomics, Highlands Ranch, Colorado, United States of America
Jessica Chen
Enteric Diseases Laboratory Branch, Division of Foodborne, Waterborne, and Environmental Diseases, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention, Atlanta, Georgia, United States of America
Editor
Claudia Solis-Lemus
Tags
nextflow bioinformatics genomics bacteria assembly nanopore long-reads quality control filtering downsampling ont nf-core style

Citation (CITATIONS.md)

# ncezid-biome/stylo: Citations

## [nf-core](https://pubmed.ncbi.nlm.nih.gov/32055031/)

> Ewels PA, Peltzer A, Fillinger S, Patel H, Alneberg J, Wilm A, Garcia MU, Di Tommaso P, Nahnsen S. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol. 2020 Mar;38(3):276-278. doi: 10.1038/s41587-020-0439-x. PubMed PMID: 32055031.

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline tools

- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/)

  > Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online].

- [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/)

  > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924.

- [nanoq](https://joss.theoj.org/papers/10.21105/joss.02991)

  > Steinig et al., (2022). Nanoq: ultra-fast quality control for nanopore reads. Journal of Open Source Software, 7(69), 2991, https://doi.org/10.21105/joss.02991

- [Rasusa](https://joss.theoj.org/papers/10.21105/joss.03941)

  > Hall, M. B., (2022). Rasusa: Randomly subsample sequencing reads to a specified coverage. Journal of Open Source Software, 7(69), 3941, https://doi.org/10.21105/joss.03941

- [Flye](https://www.nature.com/articles/s41587-019-0072-8)

  > Kolmogorov, M., Yuan, J., Lin, Y. et al. Assembly of long, error-prone reads using repeat graphs. Nat Biotechnol 37, 540–546 (2019). https://doi.org/10.1038/s41587-019-0072-8

- [Dnaapler](https://joss.theoj.org/papers/10.21105/joss.05968)

  > George Bouras, Susanna R. Grigson, Bhavya Papudeshi, Vijini Mallawaarachchi, Michael J. Roach (2024). Dnaapler: A tool to reorient circular microbial genomes. Journal of Open Source Software, 9(93), 5968, https://doi.org/10.21105/joss.05968

- [Medaka](https://github.com/nanoporetech/medaka)

  > Medaka. Github. (2024). https://github.com/nanoporetech/medaka

- [BUSCO](https://github.com/metashot/busco)

  > BUSCO. Github. (2021). https://github.com/metashot/busco

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.

- [BioContainers](https://pubmed.ncbi.nlm.nih.gov/28379341/)

  > da Veiga Leprevost F, Grüning B, Aflitos SA, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Alvarez RV, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341; PubMed Central PMCID: PMC5870671.

- [Docker](https://dl.acm.org/doi/10.5555/2600239.2600241)

  > Merkel, D. (2014). Docker: lightweight linux containers for consistent development and deployment. Linux Journal, 2014(239), 2. doi: 10.5555/2600239.2600241.

- [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/)

  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.

GitHub Events

Total
  • Release event: 1
  • Push event: 6
  • Public event: 1
  • Pull request event: 5
  • Create event: 1
Last Year
  • Release event: 1
  • Push event: 6
  • Public event: 1
  • Pull request event: 5
  • Create event: 1

Dependencies

.github/workflows/draft-pdf.yml actions
  • actions/checkout v4 composite
  • actions/upload-artifact v4 composite
  • openjournals/openjournals-draft-action master composite
modules/local/flye/meta.yml cpan
modules/local/medaka/meta.yml cpan
modules/nf-core/busco/busco/meta.yml cpan
modules/nf-core/nanoq/meta.yml cpan
modules/nf-core/rasusa/meta.yml cpan
subworkflows/nf-core/utils_nextflow_pipeline/meta.yml cpan
subworkflows/nf-core/utils_nfcore_pipeline/meta.yml cpan
subworkflows/nf-core/utils_nfvalidation_plugin/meta.yml cpan
modules/local/flye/environment.yml conda
  • flye 2.9.5.*
modules/local/medaka/environment.yml conda
  • medaka 2.0.1.*
modules/nf-core/busco/busco/environment.yml conda
  • busco 5.8.2.*
modules/nf-core/nanoq/environment.yml conda
  • nanoq 0.10.0.*
modules/nf-core/rasusa/environment.yml conda
  • rasusa 0.3.0.*