Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (8.5%) to scientific vocabulary
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: xiaoli-dong
  • License: mit
  • Language: Nextflow
  • Default Branch: main
  • Size: 2.84 MB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 1
  • Open Issues: 1
  • Releases: 2
Created almost 2 years ago · Last pushed 9 months ago
Metadata Files
Readme Changelog License Code of conduct Citation

README.md

nf-fluAB Pipeline

🕒 Last updated: April 14, 2025

Introduction

nf-fluAB is a bioinformatics pipeline for the assembly, typing, and lineage assignment of influenza NGS data generated on Illumina or Nanopore platforms. Built with Nextflow, it runs portably and scalably across a range of computing environments. Docker and Singularity containers make installation easy and results highly reproducible.

Pipeline Summary

The pipeline takes a samplesheet and corresponding FASTQ files as input. It performs quality control (QC), identifies the closest publicly available reference(s), maps reads, calls variants, and generates a consensus contig for each segment. In cases of co-infection, multiple references are selected, and multiple consensus contigs are generated per segment. Segments are typed against a local database, and lineage is determined using Nextclade. At the end of the workflow, a comprehensive master summary report is generated for each sample, along with a consolidated analysis overview.
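The reference-seeking step screens the reads against the reference Mash sketch and then filters the mash screen output, but the README does not state the filtering criteria. The Python sketch below illustrates one possible rule set and is not the pipeline's code. It parses the tab-separated mash screen columns (identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment), keeps hits above hypothetical identity and shared-hash thresholds, and ranks the survivors per segment. Allowing more than one reference per segment is one way a co-infection could yield several references. The "<accession>|<segment>" ID layout assumed by segment_from_id is hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class MashHit:
    identity: float
    shared_hashes: int  # numerator of the "shared-hashes" column, e.g. 980 in "980/1000"
    total_hashes: int   # denominator of the "shared-hashes" column
    multiplicity: int
    p_value: float
    query_id: str


def parse_mash_screen(lines: Iterable[str]) -> list[MashHit]:
    """Parse tab-separated mash screen output lines into MashHit records."""
    hits = []
    for line in lines:
        if not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        shared, total = fields[1].split("/")
        hits.append(MashHit(
            identity=float(fields[0]),
            shared_hashes=int(shared),
            total_hashes=int(total),
            multiplicity=int(fields[2]),
            p_value=float(fields[3]),
            query_id=fields[4],
        ))
    return hits


def segment_from_id(query_id: str) -> str:
    # Hypothetical ID layout "<accession>|<segment>"; adapt to the real database headers.
    return query_id.rsplit("|", 1)[-1]


def select_references(
    hits: list[MashHit],
    min_identity: float = 0.95,        # hypothetical threshold
    min_shared_fraction: float = 0.5,  # hypothetical threshold
    max_per_segment: int = 2,          # >1 allows several references per segment (co-infection)
    segment_of: Callable[[str], str] = segment_from_id,
) -> dict[str, list[str]]:
    """Keep passing hits, grouped by segment and ranked by identity (best first)."""
    by_segment: dict[str, list[MashHit]] = defaultdict(list)
    for hit in hits:
        if (hit.identity >= min_identity
                and hit.shared_hashes / hit.total_hashes >= min_shared_fraction):
            by_segment[segment_of(hit.query_id)].append(hit)
    return {
        segment: [h.query_id for h in
                  sorted(seg_hits, key=lambda h: h.identity, reverse=True)[:max_per_segment]]
        for segment, seg_hits in by_segment.items()
    }
```

Setting max_per_segment=1 keeps only the single best hit per segment; how the pipeline actually decides that a sample is co-infected is not described in this README.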

Pipeline Diagram

  • QC
  • dehost (hostile -> seqkit stats)
  • seek references (mash screen -> filter mash screen output)
  • mapping and post-processing of BAM files
  • variant calling and post-processing of VCF files
    • Illumina data variant calling and post-processing (freebayes or bcftools -> bcftools sort, index -> bcftools norm -> bcftools filter (low quality, low depth) -> snpeff -> bcftools filter (frameshift))
    • Nanopore data variant calling and post-processing (clair3 -> bcftools sort, index -> bcftools filter (low quality, low depth) -> snpeff -> bcftools filter (frameshift))
  • consensus calling (bcftools consensus -> seqkit fx2tab (stats))
  • consensus typing (blastn against typing database)
  • lineage determination (nextclade)
  • summary report
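The variant post-processing steps name a "low quality, low depth" filter but not its thresholds. As a rough illustration of what such a filter does, the Python sketch below keeps VCF records whose QUAL and INFO/DP meet hypothetical minimums (20 and 10). In the pipeline this step uses bcftools filter; a comparable bcftools exclusion expression would be -e 'QUAL<20 || INFO/DP<10'. Clair3 typically reports read depth in FORMAT/DP rather than INFO/DP, so a Nanopore version would need a different depth lookup.

```python
from typing import Iterable, Iterator


def info_field(info: str) -> dict[str, str]:
    """Split a VCF INFO column ("DP=30;AF=1") into a dict; flag keys map to ""."""
    parsed = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        parsed[key] = value
    return parsed


def passes_filters(vcf_line: str, min_qual: float = 20.0, min_depth: int = 10) -> bool:
    """True if a VCF data line meets hypothetical QUAL and INFO/DP thresholds."""
    fields = vcf_line.rstrip("\n").split("\t")
    qual, info = fields[5], fields[7]  # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
    if qual == "." or float(qual) < min_qual:
        return False  # missing or low quality
    depth = info_field(info).get("DP")
    return depth is not None and int(depth) >= min_depth  # missing or low depth fails


def filter_vcf(lines: Iterable[str], min_qual: float = 20.0, min_depth: int = 10) -> Iterator[str]:
    """Yield header lines unchanged and only the data lines that pass both filters."""
    for line in lines:
        if line.startswith("#") or passes_filters(line, min_qual, min_depth):
            yield line
```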

Reference sequences and databases required by the pipeline

  1. Influenza A and B primer sequences: used in the sequencing data quality control step
  2. Flu typing database: used for typing the assembled segments
  3. Nextclade dataset: used for influenza A and B lineage determination
  4. Influenza A and B reference databases: used to find the closest related publicly available sequences, which serve as references for the assembly. See "Build Influenza A and B reference databases" below.
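Segment typing compares each consensus contig against the flu typing database with blastn. The README does not describe the database headers or the decision rule, so this Python sketch only shows a common best-hit approach: it reads blastn tabular output produced with -outfmt 6 (the 12 default columns), discards hits below a hypothetical percent-identity cutoff, and keeps the highest-bitscore subject per query. Subject IDs such as H3_ref_a are hypothetical; turning a subject ID into a type or subtype depends on how the typing database is labelled.

```python
from typing import Iterable


def best_typing_hits(
    lines: Iterable[str],
    min_pident: float = 90.0,  # hypothetical percent-identity cutoff
    min_length: int = 0,       # hypothetical minimum alignment length
) -> dict[str, str]:
    """Return the highest-bitscore subject (typing database entry) for each query contig."""
    best: dict[str, tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        f = line.rstrip("\n").split("\t")
        # default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
        # qstart qend sstart send evalue bitscore
        qseqid, sseqid = f[0], f[1]
        pident, length, bitscore = float(f[2]), int(f[3]), float(f[11])
        if pident < min_pident or length < min_length:
            continue
        if qseqid not in best or bitscore > best[qseqid][1]:
            best[qseqid] = (sseqid, bitscore)
    return {query: subject for query, (subject, _) in best.items()}
```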

Build Influenza A and B reference databases

The following procedure builds the influenza A and B reference databases. From the resulting reference FASTA sequences, the script also generates the Mash sketch database (.msh file) and the snpEff flu database.

```
# create a conda environment for building the databases
# (replace $env_name with a name of your choice; packages come from bioconda and conda-forge)
mamba create -n "$env_name" -c conda-forge -c bioconda mash=2.3 snpeff=5.2 vadr=1.6.4 biopython=1.84 entrez-direct=22.4 diamond=2.1.11 cd-hit=4.8.1 -y
conda activate "$env_name"

# run the script
path_to_bin_directory/makedb.sh -i path_to/sequences.fasta -o outdir -c number_of_cpus -g path_to/BVBRCgenome.csv -d output_database_prefix
```


Quick Start

If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

Check pipeline command line options

You can download nf-fluAB from GitHub to your local computer, or run the pipeline directly from GitHub. The following example shows how to check the command-line options without downloading the pipeline locally:

```
# run directly from GitHub without downloading or cloning
nextflow run xiaoli-dong/nf-fluab -r 7f72d6c --help
```

To run the analysis on your data, prepare a samplesheet in CSV format that lists the sequence files for each sample, and pass it as input. A samplesheet can contain only Illumina data or only Nanopore data; Illumina and Nanopore data cannot be combined in one analysis. The samplesheet looks like this:

Illumina data analysis samplesheet example:

```
sample,fastq1,fastq2,long_fastq
# comment lines will be ignored
sample1,sample1R1.fastq.gz,sample1R2.fastq.gz,NA
sample2,sample2R1.fastq.gz,sample2R2.fastq.gz,NA
sample3,sample3R1.fastq.gz,sample3R2.fastq.gz,NA
```

Nanopore data analysis samplesheet example:

```
sample,fastq1,fastq2,long_fastq
# comment lines will be ignored
sample1,NA,NA,sample1.fastq.gz
sample2,NA,NA,sample2.fastq.gz
sample3,NA,NA,sample3.fastq.gz
```
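Because a run accepts only one platform, it can help to check a samplesheet before launching. The following Python sketch is a hypothetical pre-flight check, not the pipeline's own validation. It assumes comment lines start with "#", that "NA" marks an absent file (as in the examples), and that Illumina samples are paired-end; it rejects sheets that mix Illumina and Nanopore rows and reports which platform the sheet describes.

```python
import csv
from typing import Iterable

COLUMNS = ["sample", "fastq1", "fastq2", "long_fastq"]
MISSING = "NA"  # placeholder used in the examples for an absent file


def read_samplesheet(lines: Iterable[str]) -> tuple[str, list[dict[str, str]]]:
    """Parse a samplesheet and return (platform, rows); raise ValueError on problems."""
    # Drop blank lines and comment lines (assumed to start with "#").
    content = [line for line in lines if line.strip() and not line.lstrip().startswith("#")]
    reader = csv.reader(content)
    header = [h.strip() for h in next(reader, [])]
    if header != COLUMNS:
        raise ValueError(f"expected header {','.join(COLUMNS)}, got {','.join(header)}")

    rows, platforms = [], set()
    for fields in reader:
        if len(fields) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns: {fields}")
        row = dict(zip(COLUMNS, (f.strip() for f in fields)))
        paired = row["fastq1"] != MISSING and row["fastq2"] != MISSING
        long_read = row["long_fastq"] != MISSING
        if paired == long_read:
            raise ValueError(f"{row['sample']}: give either fastq1 and fastq2, or long_fastq")
        platforms.add("illumina" if paired else "nanopore")
        rows.append(row)

    if len(platforms) != 1:
        raise ValueError("samplesheet must list samples from exactly one platform")
    return platforms.pop(), rows
```

For example, open the file with open("samplesheet.csv", newline="") and pass the handle to read_samplesheet; the returned platform string is the value to give --platform.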

Now, you can run the pipeline using:

```bash
nextflow run xiaoli-dong/nf-fluAB --input samplesheet.csv --outdir <OUTDIR> -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --platform <illumina or nanopore>
```
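When launching runs from a script, the command above can be assembled as an argument list. This Python helper is hypothetical; it only mirrors the options shown in this README (--input, --outdir, -profile, --platform, and the -r revision option from the --help example) and checks that --platform is illumina or nanopore.

```python
import shlex
from typing import Optional

PLATFORMS = ("illumina", "nanopore")


def build_command(
    samplesheet: str,
    outdir: str,
    profile: str,
    platform: str,
    revision: Optional[str] = None,
) -> list[str]:
    """Assemble the nextflow run arguments shown in Quick Start (hypothetical helper)."""
    if platform not in PLATFORMS:
        raise ValueError(f"--platform must be one of {PLATFORMS}, got {platform!r}")
    cmd = ["nextflow", "run", "xiaoli-dong/nf-fluAB"]
    if revision:
        cmd += ["-r", revision]  # pin a release or commit, as in the --help example
    cmd += [
        "--input", samplesheet,
        "--outdir", outdir,
        "-profile", profile,  # e.g. docker, singularity, conda, or an institutional profile
        "--platform", platform,
    ]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_command("samplesheet.csv", "results", "docker", "nanopore")))
```

Passing the list to subprocess.run(cmd, check=True) launches Nextflow without going through a shell, so paths containing spaces need no extra quoting.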

Credits

  • nf-fluAB was written by Xiaoli Dong.
  • The Illumina part of the pipeline was primarily based on Dr. Matthew Croxen's flu pipeline.
  • Extensive support was provided by the ProvLab Research Team in Calgary for generating sequencing data and testing the pipeline:
    • Linda Lee
    • Johanna M Thayer
    • Fitsum Getachew
    • Petya Kolva
    • Anita Wong
    • Kanti Pabbaraju
  • Dr. Tarah Lynch and Dr. Matthew Croxen provided extensive technical input.

Owner

  • Name: Xiaoli Dong
  • Login: xiaoli-dong
  • Kind: user
  • Location: Calgary
  • Company: Public Health Laboratory - South, Alberta Precision Laboratories

Bioinformatician

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use `nf-core tools` in your work, please cite the `nf-core` publication"
authors:
  - family-names: Ewels
    given-names: Philip
  - family-names: Peltzer
    given-names: Alexander
  - family-names: Fillinger
    given-names: Sven
  - family-names: Patel
    given-names: Harshil
  - family-names: Alneberg
    given-names: Johannes
  - family-names: Wilm
    given-names: Andreas
  - family-names: Garcia
    given-names: Maxime Ulysse
  - family-names: Di Tommaso
    given-names: Paolo
  - family-names: Nahnsen
    given-names: Sven
title: "The nf-core framework for community-curated bioinformatics pipelines."
version: 2.4.1
doi: 10.1038/s41587-020-0439-x
date-released: 2022-05-16
url: https://github.com/nf-core/tools
preferred-citation:
  type: article
  authors:
    - family-names: Ewels
      given-names: Philip
    - family-names: Peltzer
      given-names: Alexander
    - family-names: Fillinger
      given-names: Sven
    - family-names: Patel
      given-names: Harshil
    - family-names: Alneberg
      given-names: Johannes
    - family-names: Wilm
      given-names: Andreas
    - family-names: Garcia
      given-names: Maxime Ulysse
    - family-names: Di Tommaso
      given-names: Paolo
    - family-names: Nahnsen
      given-names: Sven
  doi: 10.1038/s41587-020-0439-x
  journal: nature biotechnology
  start: 276
  end: 278
  title: "The nf-core framework for community-curated bioinformatics pipelines."
  issue: 3
  volume: 38
  year: 2020
  url: https://dx.doi.org/10.1038/s41587-020-0439-x

GitHub Events

Total
  • Create event: 3
  • Issues event: 1
  • Release event: 3
  • Push event: 69
Last Year
  • Create event: 3
  • Issues event: 1
  • Release event: 3
  • Push event: 69

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 1
  • Total pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Total issue authors: 1
  • Total pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 1
  • Pull requests: 0
  • Average time to close issues: N/A
  • Average time to close pull requests: N/A
  • Issue authors: 1
  • Pull request authors: 0
  • Average comments per issue: 0.0
  • Average comments per pull request: 0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • idph-wgs (1)