Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (8.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: xiaoli-dong
- License: mit
- Language: Nextflow
- Default Branch: main
- Size: 2.84 MB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 1
- Releases: 2
Metadata Files
README.md
nf-fluAB Pipeline
🕒 Last updated: April 14, 2025
Introduction
nf-fluAB is a bioinformatics pipeline for the assembly, typing, and lineage assignment of influenza NGS data generated on Illumina or Nanopore platforms. Built with Nextflow, it enables portable and scalable execution across a range of computing environments. The use of Docker and Singularity containers ensures easy installation and highly reproducible results.
Pipeline Summary
The pipeline takes a samplesheet and corresponding FASTQ files as input. It performs quality control (QC), identifies the closest publicly available reference(s), maps reads, calls variants, and generates a consensus contig for each segment. In cases of co-infection, multiple references are selected, and multiple consensus contigs are generated per segment. Segments are typed against a local database, and lineage is determined using Nextclade. At the end of the workflow, a comprehensive master summary report is generated for each sample, along with a consolidated analysis overview.
- QC
  - Illumina QC (fastp or BBDuk -> seqkit stats)
  - Nanopore QC (Porechop -> chopper -> seqkit stats)
- Dehost (hostile -> seqkit stats)
- Seek references (mash screen -> filter mash screen output)
- Mapping and post-processing of BAM files
  - Illumina mapping (bwa or minimap2 -> samtools sort, index -> picard MarkDuplicates -> samtools coverage -> bedtools genomecov)
  - Nanopore mapping (minimap2 -> samtools sort, index -> samtools coverage -> bedtools genomecov)
- Variant calling and post-processing of VCF files
  - Illumina variant calling and post-processing (freebayes or bcftools -> bcftools sort, index -> bcftools norm -> bcftools filter (low quality, low depth) -> snpEff -> bcftools filter (frameshift))
  - Nanopore variant calling and post-processing (clair3 -> bcftools sort, index -> bcftools filter (low quality, low depth) -> snpEff -> bcftools filter (frameshift))
- Consensus calling (bcftools consensus -> seqkit fx2tab (stats))
- Consensus typing (blastn against the typing database)
- Lineage determination (Nextclade)
- Summary report
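To make the "seek references" step more concrete, here is a minimal bash/awk sketch of filtering `mash screen` output. `mash screen` writes six tab-separated columns: identity, shared-hashes, median-multiplicity, p-value, query-ID and query-comment. The function name `filter_mash_screen` and its default cutoffs (identity ≥ 0.95, p-value ≤ 1e-10) are hypothetical and are not the thresholds nf-fluAB actually applies. The sketch keeps every hit that passes instead of only the best one, which is one way more than one reference per segment could be retained in a co-infection.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: filter `mash screen` output and rank the surviving references.
# mash screen columns: identity, shared-hashes, median-multiplicity, p-value, query-ID, query-comment
# The default cutoffs are illustrative assumptions, not nf-fluAB settings.
filter_mash_screen() {
    local screen_tsv=$1
    local min_identity=${2:-0.95}
    local max_pvalue=${3:-1e-10}

    # Keep hits at or above the identity cutoff and at or below the p-value cutoff,
    # print "identity<TAB>query-ID", then sort by identity, highest first.
    awk -F'\t' -v min_id="$min_identity" -v max_p="$max_pvalue" \
        '($1 + 0) >= (min_id + 0) && ($4 + 0) <= (max_p + 0) { print $1 "\t" $5 }' "$screen_tsv" |
        sort -k1,1gr
}
```

For example, with hypothetical file names, `mash screen refs.msh reads.fastq.gz > screen.tsv` followed by `filter_mash_screen screen.tsv 0.97` would list the passing references, best identity first.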
Reference sequences and databases required by the pipeline
- Influenza A and B primer sequences: used in the sequence data quality control process
- Flu typing database: used for typing the assembled segments
- Nextclade dataset: used for influenza A and B lineage determination
- Influenza A and B reference databases: used to find the closest related publicly available sequences, which then serve as the references for the assembly. See the reference database building guide below.
Build Influenza A and B reference databases
The following procedure builds the influenza A and B reference database. Using the generated reference FASTA sequences, the script also builds the mash sketch database (.msh file) and the snpEff flu database.
```bash
# create a conda environment for building the databases
# (set env_name to an environment name of your choice)
mamba create -n "$env_name" -c conda-forge -c bioconda mash=2.3 snpeff=5.2 vadr=1.6.4 biopython=1.84 entrez-direct=22.4 diamond=2.1.11 cd-hit=4.8.1 -y
conda activate "$env_name"

# run the script
pathtobindirectory/makedb.sh -i pathto/sequences.fasta -o outdir -c numberofcpus -g pathto/BVBRCgenome.csv -d outputdatabase_prefix
```
Quick Start
If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with `-profile test` before running the workflow on actual data.
Check pipeline command-line options
You can download nf-fluAB from GitHub to your local computer, or run the pipeline directly from GitHub. The following example shows how to check the command-line options without downloading the pipeline locally:
```bash
# run directly from GitHub without downloading or cloning
nextflow run xiaoli-dong/nf-fluab -r 7f72d6c --help
```
To run the analysis with your data, prepare a samplesheet in CSV format that lists the sequence files for each of your samples. A samplesheet can contain either Illumina data or Nanopore data; Illumina and Nanopore data cannot be mixed in one analysis. See below for what the samplesheet looks like:
Illumina data analysis samplesheet example:
```
sample,fastq1,fastq2,long_fastq
# comment lines will be ignored
sample1,sample1R1.fastq.gz,sample1R2.fastq.gz,NA
sample2,sample2R1.fastq.gz,sample2R2.fastq.gz,NA
sample3,sample3R1.fastq.gz,sample3R2.fastq.gz,NA
```
Nanopore data analysis samplesheet example:
```
sample,fastq1,fastq2,long_fastq
# comment lines will be ignored
sample1,NA,NA,sample1.fastq.gz
sample2,NA,NA,sample2.fastq.gz
sample3,NA,NA,sample3.fastq.gz
```
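Writing the samplesheet by hand becomes tedious for large runs. The following bash function, `make_illumina_samplesheet`, is a hypothetical helper (not part of nf-fluAB) that builds an Illumina samplesheet in the four-column layout shown above. It assumes paired files named `<sample>_R1.fastq.gz` and `<sample>_R2.fastq.gz` in a single directory; adjust the patterns if your files are named differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: build an Illumina samplesheet (sample,fastq1,fastq2,long_fastq)
# from a directory of paired files named <sample>_R1.fastq.gz / <sample>_R2.fastq.gz.
# The file-naming convention is an assumption; adjust the patterns to match your run.
make_illumina_samplesheet() {
    local fastq_dir=$1
    local r1 r2 sample

    echo "sample,fastq1,fastq2,long_fastq"
    for r1 in "$fastq_dir"/*_R1.fastq.gz; do
        [ -e "$r1" ] || continue                 # no matches: the glob stays literal
        r2=${r1%_R1.fastq.gz}_R2.fastq.gz        # expected mate file
        if [ ! -e "$r2" ]; then
            echo "warning: no R2 mate for $r1, skipping" >&2
            continue
        fi
        sample=$(basename "$r1" _R1.fastq.gz)
        # the long_fastq column is NA for Illumina-only analyses
        echo "$sample,$r1,$r2,NA"
    done
}
```

For example, `make_illumina_samplesheet /path/to/fastqs > samplesheet.csv`. Because the sheet is comma-separated, file paths that contain commas would break it.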
Now, you can run the pipeline using:
```bash
nextflow run xiaoli-dong/nf-fluAB --input samplesheet.csv --outdir <OUTDIR> -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --platform <illumina or nanopore>
```
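Because `--platform` has to match the data in the samplesheet, and the two platforms cannot be mixed, a quick pre-flight check can catch mistakes before a run is launched. This is a minimal sketch under stated assumptions: the function `infer_platform` is hypothetical, it assumes the four-column layout shown above with the header on the first line, and it treats `NA` or an empty field as "no file".

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: infer the --platform value from a samplesheet and
# reject sheets that mix Illumina (fastq1/fastq2) and Nanopore (long_fastq) rows.
# Assumes the columns sample,fastq1,fastq2,long_fastq; lines starting with # are skipped.
infer_platform() {
    local sheet=$1
    awk -F',' '
        NR == 1 || /^#/ || NF == 0 { next }   # skip header, comment and blank lines
        {
            sub(/\r$/, "")                    # tolerate Windows line endings
            has_short = ($2 != "" && $2 != "NA")
            has_long  = ($4 != "" && $4 != "NA")
            if (has_short && has_long) {
                print "error: line " NR " lists both short and long reads" > "/dev/stderr"
                bad = 1
                exit
            }
            if (has_short) n_illumina++
            if (has_long)  n_nanopore++
        }
        END {
            if (bad) exit 1
            if (n_illumina && n_nanopore) {
                print "error: Illumina and Nanopore rows cannot be mixed" > "/dev/stderr"
                exit 1
            }
            if (n_illumina)      print "illumina"
            else if (n_nanopore) print "nanopore"
            else {
                print "error: no rows with read files found" > "/dev/stderr"
                exit 1
            }
        }' "$sheet"
}
```

One possible use is `platform=$(infer_platform samplesheet.csv)`, then passing `--platform "$platform"` to the `nextflow run` command above. This check is only a convenience and does not replace whatever validation the pipeline itself performs.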
Credits
- nf-fluAB was written by Xiaoli Dong.
- The Illumina part of the pipeline was primarily based on Dr. Matthew Croxen's flu pipeline.
- Extensive support was provided by the ProvLab Research Team in Calgary for generating sequencing data and conducting pipeline testing:
  - Linda Lee
  - Johanna M Thayer
  - Fitsum Getachew
  - Petya Kolva
  - Anita Wong
  - Kanti Pabbaraju
- Dr. Tarah Lynch and Dr. Matthew Croxen provided extensive technical input.
Owner
- Name: Xiaoli Dong
- Login: xiaoli-dong
- Kind: user
- Location: Calgary
- Company: Public Health Laboratory - South, Alberta Precision Laboratories
- Website: http://people.ucalgary.ca/~xdong
- Twitter: xiaoliidong
- Repositories: 4
- Profile: https://github.com/xiaoli-dong
Bioinformatician
Citation (CITATION.cff)
```yaml
cff-version: 1.2.0
message: "If you use `nf-core tools` in your work, please cite the `nf-core` publication"
authors:
  - family-names: Ewels
    given-names: Philip
  - family-names: Peltzer
    given-names: Alexander
  - family-names: Fillinger
    given-names: Sven
  - family-names: Patel
    given-names: Harshil
  - family-names: Alneberg
    given-names: Johannes
  - family-names: Wilm
    given-names: Andreas
  - family-names: Garcia
    given-names: Maxime Ulysse
  - family-names: Di Tommaso
    given-names: Paolo
  - family-names: Nahnsen
    given-names: Sven
title: "The nf-core framework for community-curated bioinformatics pipelines."
version: 2.4.1
doi: 10.1038/s41587-020-0439-x
date-released: 2022-05-16
url: https://github.com/nf-core/tools
preferred-citation:
  type: article
  authors:
    - family-names: Ewels
      given-names: Philip
    - family-names: Peltzer
      given-names: Alexander
    - family-names: Fillinger
      given-names: Sven
    - family-names: Patel
      given-names: Harshil
    - family-names: Alneberg
      given-names: Johannes
    - family-names: Wilm
      given-names: Andreas
    - family-names: Garcia
      given-names: Maxime Ulysse
    - family-names: Di Tommaso
      given-names: Paolo
    - family-names: Nahnsen
      given-names: Sven
  doi: 10.1038/s41587-020-0439-x
  journal: nature biotechnology
  start: 276
  end: 278
  title: "The nf-core framework for community-curated bioinformatics pipelines."
  issue: 3
  volume: 38
  year: 2020
  url: https://dx.doi.org/10.1038/s41587-020-0439-x
```
GitHub Events
Total
- Create event: 3
- Issues event: 1
- Release event: 3
- Push event: 69
Last Year
- Create event: 3
- Issues event: 1
- Release event: 3
- Push event: 69
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 1
- Total pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Total issue authors: 1
- Total pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 1
- Pull requests: 0
- Average time to close issues: N/A
- Average time to close pull requests: N/A
- Issue authors: 1
- Pull request authors: 0
- Average comments per issue: 0.0
- Average comments per pull request: 0
- Merged pull requests: 0
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- idph-wgs (1)