nf-flu

Influenza genome analysis Nextflow workflow

https://github.com/peterk87/nf-flu

Science Score: 49.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
    Found .zenodo.json file
  • DOI references
    Found 38 DOI reference(s) in README
  • Academic publication links
    Links to: ncbi.nlm.nih.gov, zenodo.org
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (10.4%) to scientific vocabulary

Scientific Fields

Biochemistry, Genetics and Molecular Biology Life Sciences - 30% confidence
Last synced: 6 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: peterk87
  • License: mit
  • Language: Nextflow
  • Default Branch: master
  • Size: 3.41 MB
Statistics
  • Stars: 28
  • Watchers: 7
  • Forks: 22
  • Open Issues: 4
  • Releases: 2
Created almost 7 years ago · Last pushed 7 months ago
Metadata Files
Readme Changelog License Zenodo

README.md

CFIA-NCFAD/nf-flu - Influenza A and B Virus Genome Assembly Nextflow Workflow

Introduction

nf-flu is a Nextflow bioinformatics pipeline for the assembly and analysis of Influenza A and B viruses from Illumina or Nanopore sequencing data, or from previously assembled FASTA sequences. Because the Influenza genome is segmented into 8 gene segments, the pipeline automatically selects the top-matching NCBI reference sequence for each segment, based on IRMA assembly and a nucleotide BLAST search against all NCBI Influenza sequences. Users can also provide their own reference sequences to include in the top reference selection process. After reference selection, the pipeline maps reads to each reference sequence, calls variants and generates depth-masked consensus sequences.
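The per-segment reference selection described above can be illustrated with a minimal shell sketch. It assumes BLAST+ tabular output (`-outfmt 6`, whose 12 default columns start with the query ID and end with the bit score) in which each query is one assembled segment, and it simply keeps the highest-scoring subject per query. nf-flu's actual selection logic may apply additional criteria; the function name `top_hits` is hypothetical.

```bash
# Hypothetical helper: keep the best BLAST hit (highest bit score) per query.
# Input on stdin: BLAST+ tabular output (-outfmt 6), tab-separated, where
# column 1 = query ID, column 2 = subject ID, column 12 = bit score.
top_hits() {
  # Sort by query ID, then by bit score in descending general-numeric order
  # (GNU/BSD sort), and print only the first (best) line seen per query ID.
  sort -k1,1 -k12,12gr |
    awk 'BEGIN { FS = OFS = "\t" } !seen[$1]++ { print $1, $2, $12 }'
}
```

For example, `top_hits < blastn_results.tsv` (an illustrative file name) prints one `query, subject, bit score` line per segment. Sorting first and then deduplicating with `!seen[$1]++` keeps the sketch to a single pass over the sorted hits.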

Note: The officially supported version of the pipeline is CFIA-NCFAD/nf-flu. If you have issues using the pipeline, please create an issue on the CFIA-NCFAD/nf-flu repository.

Pipeline summary

  1. Download the latest NCBI Influenza virus sequences and metadata (see docs for more details).
  2. Merge reads of re-sequenced samples with cat (if needed).
  3. Assembly of Influenza gene segments with IRMA using the built-in FLU module.
  4. Nucleotide BLAST search against NCBI Influenza DB sequences.
  5. H/N subtype prediction and Excel XLSX report generation based on BLAST results.
  6. Automatic selection of the top-matching reference sequence for each segment.
  7. Read mapping, variant calling and consensus sequence generation for each segment against its top reference sequence based on BLAST results.
  8. Annotation of consensus sequences with VADR.
  9. FluMut detection of molecular markers and mutations in Influenza A(H5N1) viruses.
  10. GenoFLU genotyping of North American H5 viruses and Genin2 genotyping of clade 2.3.4.4b H5Nx viruses collected in Europe since October 2020.
  11. HA cleavage site prediction and classification.
  12. Nextclade clade assignment, mutation calling and sequence quality checks.
  13. MultiQC report generation.
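The depth-masking part of consensus generation can be sketched in the same way. The sketch below assumes per-position depth in the three-column format printed by `samtools depth -a` (reference name, 1-based position, depth) and writes BED intervals (0-based, half-open) for runs of positions below a minimum depth. The function name `low_depth_bed` and the threshold value are hypothetical, and nf-flu's own masking step may differ.

```bash
# Hypothetical helper: turn per-position depth (samtools depth -a output)
# into BED intervals covering consecutive positions with depth < $1.
low_depth_bed() {
  min_depth="$1"
  awk -v min="$min_depth" 'BEGIN { OFS = "\t" }
    {
      low = ($3 < min)
      # Close the open interval if coverage recovers, the reference
      # changes, or the positions are not contiguous.
      if (open && (!low || $1 != chrom || $2 != end + 1)) {
        print chrom, start, end
        open = 0
      }
      if (low) {
        # BED start is 0-based: 1-based position minus one.
        if (!open) { chrom = $1; start = $2 - 1; open = 1 }
        # BED end is exclusive, i.e. the 1-based position of the last base.
        end = $2
      }
    }
    END { if (open) print chrom, start, end }'
}
```

For example, `samtools depth -a sample.bam | low_depth_bed 10 > low_depth.bed` (file names illustrative) produces a mask that `bcftools consensus -f reference.fasta -m low_depth.bed sample.vcf.gz` can use; its `-m`/`--mask` option replaces masked regions with `N` by default.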

[Figure: nf-flu workflow diagram]

Quick Start

  1. Install Nextflow (>=22.10.1; latest stable release recommended!).
  2. Install any of Docker, Apptainer, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (please use Conda only as a last resort).
  3. Download the pipeline and test it on a minimal dataset with a single command:

    For Illumina workflow test:

    ```bash
    nextflow run CFIA-NCFAD/nf-flu \
      -profile test_illumina,<docker/apptainer/singularity/podman/shifter/charliecloud/conda> \
      --max_cpus $(nproc) # use all available CPUs; default is 2
    ```

    For Nanopore workflow test:

    ```bash
    nextflow run CFIA-NCFAD/nf-flu \
      -profile test_nanopore,<docker/apptainer/singularity/podman/shifter/charliecloud/conda> \
      --max_cpus $(nproc) # use all available CPUs; default is 2
    ```

    • If you are using Apptainer/Singularity, the pipeline will auto-detect this and attempt to download the Apptainer/Singularity images directly rather than converting them from Docker images. If you persistently observe issues downloading Apptainer/Singularity images directly due to timeouts or network issues, use the --singularity_pull_docker_container parameter to pull and convert the Docker image instead. Alternatively, it is highly recommended to use the nf-core download command to pre-download all of the required containers before running the pipeline, and to set the NXF_SINGULARITY_CACHEDIR environment variable or the singularity.cacheDir Nextflow option so that images are stored in, and re-used from, a central location in future pipeline runs.
    • If you are using conda, it is highly recommended to use the NXF_CONDA_CACHEDIR or conda.cacheDir settings to store the environments in a central location for future pipeline runs.
  4. Run your own analysis

* [Optional] Generate an input samplesheet from a directory containing Illumina FASTQ files (e.g. `/path/to/illumina_run/Data/Intensities/Basecalls/`) with the included Python script [`fastq_dir_to_samplesheet.py`](bin/fastq_dir_to_samplesheet.py) **before** you run the pipeline (requires a local Python 3 installation), e.g.:

    ```bash
    python ~/.nextflow/assets/CFIA-NCFAD/nf-flu/bin/fastq_dir_to_samplesheet.py \
      -i /path/to/illumina_run/Data/Intensities/Basecalls/ \
      -o samplesheet.csv
    ```

* Typical command for Illumina sequencing data

    ```bash
    nextflow run CFIA-NCFAD/nf-flu \
      --input samplesheet.csv \
      --platform illumina \
      -profile <docker/apptainer/singularity/podman/shifter/charliecloud/conda>
    ```

* Typical command for Nanopore sequencing data

  ```bash
  nextflow run CFIA-NCFAD/nf-flu \
    --input samplesheet.csv \
    --platform nanopore \
    -profile <docker/apptainer/singularity/conda>
  ```

* Run analysis on FASTA files within a directory

  ```bash
  nextflow run CFIA-NCFAD/nf-flu \
    --input /path/to/fasta_files/ \
    --platform assemblies \
    -profile <docker/apptainer/singularity/conda>
  ```
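The central cache locations recommended in the Quick Start notes can be set through the NXF_SINGULARITY_CACHEDIR and NXF_CONDA_CACHEDIR environment variables before launching any of the commands above. A minimal sketch follows; the directory paths are assumptions, and any shared, writable location will do.

```bash
# Assumed cache locations; adjust to a shared, writable path on your system.
export NXF_SINGULARITY_CACHEDIR="${HOME}/.nextflow/singularity_cache"
export NXF_CONDA_CACHEDIR="${HOME}/.nextflow/conda_cache"

# Create both directories so they exist before the first pipeline run.
mkdir -p "$NXF_SINGULARITY_CACHEDIR" "$NXF_CONDA_CACHEDIR"
```

Adding these lines to a shell profile (e.g. `~/.bashrc`) makes the same caches available to every later `nextflow run`, so container images and Conda environments do not have to be fetched or rebuilt for each run.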

Documentation

The nf-flu pipeline comes with:

Resources and References

BCFtools and SAMtools

Danecek, P., Bonfield, J.K., Liddle, J., Marshall, J., Ohan, V., Pollard, M.O., Whitwham, A., Keane, T., McCarthy, S.A., Davies, R.M., Li, H., 2021. Twelve years of SAMtools and BCFtools. Gigascience 10, giab008. https://doi.org/10.1093/gigascience/giab008

BLAST (Basic Local Alignment Search Tool)

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., Lipman, D.J., 1990. Basic local alignment search tool. J. Mol. Biol. 215, 403–410. https://doi.org/10.1016/S0022-2836(05)80360-2

Camacho, C., Coulouris, G., Avagyan, V., Ma, N., Papadopoulos, J., Bealer, K., Madden, T.L., 2009. BLAST+: architecture and applications. BMC Bioinformatics 10, 421. https://doi.org/10.1186/1471-2105-10-421

Clair3

Zheng, Z., Li, S., Su, J., Leung, A.W.-S., Lam, T.-W., Luo, R., 2022. Symphonizing pileup and full-alignment for deep learning-based long-read variant calling. Nat Comput Sci 2, 797–803. https://doi.org/10.1038/s43588-022-00387-x

FluMut

FluMut is used to "search for molecular markers with potential impact on the biological characteristics of Influenza A viruses of the A(H5N1) subtype, starting from complete or partial nucleotide genome sequences".

Edoardo Giussani, Alessandro Sartori, Angela Salomoni, Lara Cavicchio, Cristian de Battisti, Ambra Pastori, Maria Varotto, Bianca Zecchin, Joseph Hughes, Isabella Monne, Alice Fusaro. FluMut: a tool for mutation surveillance in highly pathogenic H5N1 genomes. Virus Evolution, Volume 11, Issue 1, 2025, veaf011. https://doi.org/10.1093/ve/veaf011

Freebayes

Freebayes is used for variant calling.

Garrison, E., Marth, G., 2012. Haplotype-based variant detection from short-read sequencing. arXiv:1207.3907 [q-bio]. https://doi.org/10.48550/arXiv.1207.3907

Genin2

Genin2 "predicts genotypes for clade 2.3.4.4b H5Nx viruses collected in Europe since October 2020".

https://github.com/izsvenezie-virology/genin2

NOTE: The authors currently suggest citing the GitHub repository.

GenoFLU

GenoFLU "identifies the genotype of North American H5 2.3.4.4b viruses as well as providing information on individual segments when a sequence does not belong to a defined genotype".

Youk S, Torchetti MK, Lantz K, Lenoch JB, Killian ML, Leyson C, Bevins SN, Dilione K, Ip HS, Stallknecht DE, Poulson RL, Suarez DL, Swayne DE, Pantin-Jackwood MJ. H5N1 highly pathogenic avian influenza clade 2.3.4.4b in wild and domestic birds: Introductions into the United States and reassortments, December 2021-April 2022. Virology. 2023 Oct;587:109860. doi: 10.1016/j.virol.2023.109860. Epub 2023 Aug 2. PMID: 37572517.

NOTE: The authors recommend citing the Youk et al. paper, per GenoFLU issue #10.

IRMA (Iterative Refinement Meta-Assembler)

Shepard, S.S., Meno, S., Bahl, J., Wilson, M.M., Barnes, J., Neuhaus, E., 2016. Viral deep sequencing needs an adaptive approach: IRMA, the iterative refinement meta-assembler. BMC Genomics 17, 708. https://doi.org/10.1186/s12864-016-3030-6

Medaka

Medaka is deprecated in favour of Clair3 for variant calling of Nanopore data.

Minimap2

Minimap2 is used for rapid and accurate read alignment to reference sequences.

Li, H., 2018. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100. https://doi.org/10.1093/bioinformatics/bty191

Mosdepth

Mosdepth is used for rapid sequencing coverage calculation and summary statistics.

Pedersen, B.S., Quinlan, A.R., 2017. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34, 867–868. https://doi.org/10.1093/bioinformatics/btx699

MultiQC

MultiQC is used to generate a single report summarizing the results of multiple tools.

Ewels, P., Magnusson, M., Lundin, S., Käller, M., 2016. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics 32, 3047–3048. https://doi.org/10.1093/bioinformatics/btw354

NCBI Influenza Virus Resource

nf-flu relies on publicly available Influenza sequence data from the NCBI Influenza Virus Resource, which is downloaded from the NCBI FTP site.

NCBI Influenza Virus Resource:

Bao, Y., Bolotov, P., Dernovoy, D., Kiryutin, B., Zaslavsky, L., Tatusova, T., Ostell, J., Lipman, D., 2008. The influenza virus resource at the National Center for Biotechnology Information. J Virol 82, 596–601. https://doi.org/10.1128/JVI.02005-07

NCBI Influenza Virus Sequence Annotation Tool:

Bao, Y., Bolotov, P., Dernovoy, D., Kiryutin, B., Tatusova, T., 2007. FLAN: a web server for influenza virus genome annotation. Nucleic Acids Res 35, W280-284. https://doi.org/10.1093/nar/gkm354

Nextclade

nf-flu performs Nextclade clade assignment, mutation calling and sequence quality checks of assembled Influenza sequences against 30 Nextclade datasets for different subtypes and lineages of Influenza A and B virus.

Aksamentov, I., Roemer, C., Hodcroft, E. B., & Neher, R. A., (2021). Nextclade: clade assignment, mutation calling and quality control for viral genomes. Journal of Open Source Software, 6(67), 3773, https://doi.org/10.21105/joss.03773

Nextflow

nf-flu is implemented in Nextflow.

Di Tommaso, P., Chatzou, M., Floden, E.W., Barja, P.P., Palumbo, E., Notredame, C., 2017. Nextflow enables reproducible computational workflows. Nat Biotechnol 35, 316–319. https://doi.org/10.1038/nbt.3820

nf-core

nf-core is a great resource for building robust and reproducible bioinformatics pipelines.

Ewels, P.A., Peltzer, A., Fillinger, S., Patel, H., Alneberg, J., Wilm, A., Garcia, M.U., Di Tommaso, P., Nahnsen, S., 2020. The nf-core framework for community-curated bioinformatics pipelines. Nat Biotechnol 38, 276–278. https://doi.org/10.1038/s41587-020-0439-x

seqtk

seqtk is used for rapid manipulation of FASTA/Q files. It is available from GitHub at lh3/seqtk.

VADR

VADR is used for annotation of Influenza virus sequences.

Alejandro A Schäffer, Eneida L Hatcher, Linda Yankie, Lara Shonkwiler, J Rodney Brister, Ilene Karsch-Mizrachi, Eric P Nawrocki; VADR: validation and annotation of virus sequence submissions to GenBank. BMC Bioinformatics 21, 211 (2020). https://doi.org/10.1186/s12859-020-3537-3

table2asn

table2asn is used to convert the VADR feature table output to GenBank format, which facilitates conversion to other formats such as FASTA and GFF.

Contributors

Credits

Owner

  • Name: Peter Kruczkiewicz
  • Login: peterk87
  • Kind: user
  • Location: Canada
  • Company: Canadian Food Inspection Agency

GitHub Events

Total
  • Issues event: 1
  • Watch event: 4
  • Delete event: 8
  • Issue comment event: 14
  • Push event: 27
  • Pull request event: 4
  • Fork event: 1
  • Create event: 8
Last Year
  • Issues event: 1
  • Watch event: 4
  • Delete event: 8
  • Issue comment event: 14
  • Push event: 27
  • Pull request event: 4
  • Fork event: 1
  • Create event: 8

Issues and Pull Requests

Last synced: 6 months ago

All Time
  • Total issues: 0
  • Total pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 day
  • Total issue authors: 0
  • Total pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Past Year
  • Issues: 0
  • Pull requests: 1
  • Average time to close issues: N/A
  • Average time to close pull requests: 1 day
  • Issue authors: 0
  • Pull request authors: 1
  • Average comments per issue: 0
  • Average comments per pull request: 1.0
  • Merged pull requests: 0
  • Bot issues: 0
  • Bot pull requests: 0
Top Authors
Issue Authors
  • kdl480 (1)
  • marchoeppner (1)
  • peterk87 (1)
  • EricFournier3 (1)
  • chendongo (1)
  • fanninpm (1)
Pull Request Authors
  • hkunerth (1)
  • peterk87 (1)
  • cerdelyan (1)
Top Labels
Issue Labels
bug (3) question (1) enhancement (1)
Pull Request Labels
question (1)

Dependencies

.github/workflows/ci.yml actions
  • actions/cache v3 composite
  • actions/checkout v2 composite
  • actions/upload-artifact v1.0.0 composite
.github/workflows/linting.yml actions
  • actions/checkout v2 composite
  • actions/checkout v1 composite
  • actions/setup-node v1 composite
  • mshick/add-pr-comment v1 composite
.github/workflows/branch.yml actions
  • mshick/add-pr-comment v1 composite
modules/nf-core/modules/custom/dumpsoftwareversions/meta.yml cpan