nf-fluab

https://github.com/xiaoli-dong/nf-fluab

Science Score: 44.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓ CITATION.cff file: found CITATION.cff file
✓ codemeta.json file: found codemeta.json file
✓ .zenodo.json file: found .zenodo.json file
○ DOI references
○ Academic publication links
○ Academic email domains
○ Institutional organization owner
○ JOSS paper metadata
○ Scientific vocabulary similarity: low similarity (8.5%) to scientific vocabulary

Last synced: 9 months ago

Repository

Basic Info

Host: GitHub
Owner: xiaoli-dong
License: MIT
Language: Nextflow
Default Branch: main
Size: 2.84 MB

Statistics

Stars: 0
Watchers: 1
Forks: 1
Open Issues: 1
Releases: 2

Created about 2 years ago · Last pushed about 1 year ago

Metadata Files

Readme, Changelog, License, Code of conduct, Citation

nf-fluAB Pipeline

🕒 Last updated: April 14, 2025

Introduction

nf-fluAB is a bioinformatics pipeline for the assembly, typing, and lineage assignment of influenza NGS data generated on Illumina or Nanopore platforms. Built with Nextflow, it runs portably and scalably across a range of computing environments, and its Docker and Singularity containers make installation easy and results highly reproducible.

Pipeline Summary

The pipeline takes a samplesheet and corresponding FASTQ files as input. It performs quality control (QC), identifies the closest publicly available reference(s), maps reads, calls variants, and generates a consensus contig for each segment. In cases of co-infection, multiple references are selected, and multiple consensus contigs are generated per segment. Segments are typed against a local database, and lineage is determined using Nextclade. At the end of the workflow, a comprehensive master summary report is generated for each sample, along with a consolidated analysis overview.

Pipeline Diagram

QC
- Illumina QC (fastp or BBDuk -> seqkit stats)
- Nanopore QC (Porechop -> chopper -> seqkit stats)
Dehosting (hostile -> seqkit stats)
Reference search (mash screen -> filter mash screen output)
Mapping and BAM post-processing
- Illumina mapping (bwa or minimap2 -> samtools sort, index -> Picard MarkDuplicates -> samtools coverage -> bedtools genomecov)
- Nanopore mapping (minimap2 -> samtools sort, index -> samtools coverage -> bedtools genomecov)
Variant calling and VCF post-processing
- Illumina data (freebayes or bcftools -> bcftools sort, index -> bcftools norm -> bcftools filter (low quality, low depth) -> SnpEff -> bcftools filter (frameshift))
- Nanopore data (Clair3 -> bcftools sort, index -> bcftools filter (low quality, low depth) -> SnpEff -> bcftools filter (frameshift))
Consensus calling (bcftools consensus -> seqkit fx2tab (stats))
Consensus typing (blastn against the typing database)
Lineage determination (Nextclade)
Summary report
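
The reference search step (mash screen followed by a filter) can be illustrated with a small shell sketch. It assumes mash screen has already been run against a reference sketch and simply keeps the highest-identity hits. The function name filter_mash_screen, the file names, the 0.90 identity cutoff and the hit limit are all hypothetical; they are not the pipeline's actual filtering criteria.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, not the pipeline's own filter.
# `mash screen` prints tab-separated lines with these columns:
#   1 identity, 2 shared-hashes, 3 median-multiplicity,
#   4 p-value, 5 query-ID, 6 query-comment
filter_mash_screen() {
  local screen_tsv="$1"           # e.g. output of: mash screen refs.msh sample.fastq.gz
  local min_identity="${2:-0.90}" # assumed identity cutoff
  local max_hits="${3:-5}"        # assumed number of references to keep

  # Keep hits at or above the cutoff, sort by identity (highest first),
  # keep the top N and print their reference IDs (column 5).
  awk -F'\t' -v min="$min_identity" '$1 >= min' "$screen_tsv" \
    | sort -t $'\t' -k1,1gr \
    | head -n "$max_hits" \
    | cut -f5
}

# Example (file names are placeholders):
# mash screen refs.msh sample.fastq.gz > screen.tsv
# filter_mash_screen screen.tsv 0.95 3
```

Keeping several hits rather than only the best one is what allows more than one reference per segment in a co-infection; a real influenza filter would likely also group hits by segment, which this sketch does not do.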

Required reference sequences and databases

- Influenza A and B primer sequences: used in the sequence data quality control process
- Flu typing database: used to type the assembled segments
- Nextclade dataset: used for influenza A and B lineage determination
- Influenza A and B reference databases: used to find the closest related publicly available sequences, which then serve as references for the assembly. See "Build Influenza A and B reference databases" below.

Build Influenza A and B reference databases

The following procedure builds the influenza A and B reference database. From the resulting reference FASTA sequences, the script also generates the Mash sketch database (.msh file) and the SnpEff influenza database.

```
# create a conda environment for building the database ($env_name is a name of your choice)
mamba create -n "$env_name" -c conda-forge -c bioconda mash=2.3 snpeff=5.2 vadr=1.6.4 biopython=1.84 entrez-direct=22.4 diamond=2.1.11 cd-hit=4.8.1 -y
conda activate "$env_name"

# run the script
pathtobindirectory/makedb.sh -i pathto/sequences.fasta -o outdir -c numberofcpus -g pathto/BVBRCgenome.csv -d outputdatabase_prefix
```
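
Before running makedb.sh, it can help to confirm that the activated environment actually exposes the required tools. The helper below (check_db_tools, a hypothetical name) only checks that each named executable is on PATH. The executable names in the example call (mash, snpEff, v-annotate.pl, efetch, diamond, cd-hit) are assumptions about what these conda packages install.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report any executables missing from PATH.
# Returns 0 if all are found, 1 otherwise (missing names go to stderr).
check_db_tools() {
  local tool missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example, after `conda activate "$env_name"` (executable names are assumptions):
# check_db_tools mash snpEff v-annotate.pl efetch diamond cd-hit \
#   && python -c 'import Bio' \
#   && echo "database tools look ready"
```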

Quick Start

If you are new to Nextflow and nf-core, please refer to the nf-core installation documentation on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

Check pipeline command line options

You can either download nf-fluab from GitHub to your local computer or run the pipeline directly from GitHub. The following example shows how to check the command-line options without downloading the pipeline locally:

```
# run directly from GitHub without downloading or cloning
nextflow run xiaoli-dong/nf-fluab -r 7f72d6c --help
```

To analyze your own data, prepare a samplesheet in CSV format that lists the sequencing reads for each sample, and pass it as input. A samplesheet may contain only Illumina data or only Nanopore data; Illumina and Nanopore samples cannot be combined in one analysis. The samplesheet looks like this:

Illumina samplesheet example:

```
sample,fastq1,fastq2,long_fastq
# comment lines will be ignored
sample1,sample1R1.fastq.gz,sample1R2.fastq.gz,NA
sample2,sample2R1.fastq.gz,sample2R2.fastq.gz,NA
sample3,sample3R1.fastq.gz,sample3R2.fastq.gz,NA
```

Nanopore samplesheet example:

```
sample,fastq1,fastq2,long_fastq
# comment lines will be ignored
sample1,NA,NA,sample1.fastq.gz
sample2,NA,NA,sample2.fastq.gz
sample3,NA,NA,sample3.fastq.gz
```
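
The samplesheet rules above (four columns, NA for unused columns, comment lines ignored, no mixing of Illumina and Nanopore rows) can be checked before launching a run. The shell function below, detect_platform, is a hypothetical pre-flight check and not the pipeline's own validation. It additionally assumes that comment lines start with "#" and that an Illumina row must set both fastq1 and fastq2; it prints the value to pass to --platform.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check, not the pipeline's own validation.
# Assumes: 4 comma-separated columns (sample,fastq1,fastq2,long_fastq),
# "NA" marks an unused column, lines starting with "#" are comments,
# and the first non-comment line is the header.
# Prints "illumina" or "nanopore"; on error, writes a message to stderr
# and returns a non-zero status.
detect_platform() {
  awk -F',' '
    { sub(/\r$/, "") }                      # tolerate Windows line endings
    /^#/ || /^[ \t]*$/ { next }             # skip comment and blank lines
    !seen_header { seen_header = 1; next }  # skip the header row
    {
      if (NF != 4) {
        printf "line %d: expected 4 columns\n", NR > "/dev/stderr"; bad = 1; exit
      }
      if (($2 != "NA") != ($3 != "NA")) {
        printf "line %d: set both fastq1 and fastq2, or neither\n", NR > "/dev/stderr"; bad = 1; exit
      }
      has_pair = ($2 != "NA")               # paired-end Illumina reads
      has_long = ($4 != "NA")               # Nanopore reads
      if (has_pair == has_long) {
        printf "line %d: need either fastq1+fastq2 or long_fastq\n", NR > "/dev/stderr"; bad = 1; exit
      }
      if (has_pair) n_pair++; else n_long++
    }
    END {
      if (bad) exit 1
      if (n_pair && n_long) { print "mixed Illumina and Nanopore rows" > "/dev/stderr"; exit 1 }
      if (n_pair) print "illumina"
      else if (n_long) print "nanopore"
      else { print "no samples found" > "/dev/stderr"; exit 1 }
    }
  ' "$1"
}
```

For example, platform="$(detect_platform samplesheet.csv)" stops early on a mixed or malformed sheet, and on success "$platform" can be passed to --platform in the run command.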

Now you can run the pipeline using:

```bash
nextflow run xiaoli-dong/nf-fluAB --input samplesheet.csv --outdir <OUTDIR> -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --platform <illumina or nanopore>
```

Credits

nf-fluAB was written by Xiaoli Dong.
The Illumina part of the pipeline was primarily based on Dr. Matthew Croxen's flu pipeline.
Extensive support was provided by the ProvLab Research Team in Calgary for generating sequencing data and testing the pipeline:
- Linda Lee
- Johanna M Thayer
- Fitsum Getachew
- Petya Kolva
- Anita Wong
- Kanti Pabbaraju
Dr. Tarah Lynch and Dr. Matthew Croxen provided extensive technical input.

Owner

Name: Xiaoli Dong
Login: xiaoli-dong
Kind: user
Location: Calgary
Company: Public Health Laboratory - South, Alberta Precision Laboratories

Website: http://people.ucalgary.ca/~xdong
Twitter: xiaoliidong
Repositories: 4
Profile: https://github.com/xiaoli-dong

Bioinformatician

Citation (CITATION.cff)

cff-version: 1.2.0
message: "If you use `nf-core tools` in your work, please cite the `nf-core` publication"
authors:
  - family-names: Ewels
    given-names: Philip
  - family-names: Peltzer
    given-names: Alexander
  - family-names: Fillinger
    given-names: Sven
  - family-names: Patel
    given-names: Harshil
  - family-names: Alneberg
    given-names: Johannes
  - family-names: Wilm
    given-names: Andreas
  - family-names: Garcia
    given-names: Maxime Ulysse
  - family-names: Di Tommaso
    given-names: Paolo
  - family-names: Nahnsen
    given-names: Sven
title: "The nf-core framework for community-curated bioinformatics pipelines."
version: 2.4.1
doi: 10.1038/s41587-020-0439-x
date-released: 2022-05-16
url: https://github.com/nf-core/tools
preferred-citation:
  type: article
  authors:
    - family-names: Ewels
      given-names: Philip
    - family-names: Peltzer
      given-names: Alexander
    - family-names: Fillinger
      given-names: Sven
    - family-names: Patel
      given-names: Harshil
    - family-names: Alneberg
      given-names: Johannes
    - family-names: Wilm
      given-names: Andreas
    - family-names: Garcia
      given-names: Maxime Ulysse
    - family-names: Di Tommaso
      given-names: Paolo
    - family-names: Nahnsen
      given-names: Sven
  doi: 10.1038/s41587-020-0439-x
  journal: nature biotechnology
  start: 276
  end: 278
  title: "The nf-core framework for community-curated bioinformatics pipelines."
  issue: 3
  volume: 38
  year: 2020
  url: https://dx.doi.org/10.1038/s41587-020-0439-x

GitHub Events

Total

Create event: 3
Issues event: 1
Release event: 3
Push event: 69

Last Year

Create event: 3
Issues event: 1
Release event: 3
Push event: 69

Issues and Pull Requests

Last synced: 9 months ago

All Time

Total issues: 1
Total pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Total issue authors: 1
Total pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0

Past Year

Issues: 1
Pull requests: 0
Average time to close issues: N/A
Average time to close pull requests: N/A
Issue authors: 1
Pull request authors: 0
Average comments per issue: 0.0
Average comments per pull request: 0
Merged pull requests: 0
Bot issues: 0
Bot pull requests: 0
