viruswarn-flu
A mutation-based alert system to prioritize concerning Influenza variants from sequencing data.
Science Score: 31.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ✓ CITATION.cff file: Found CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ○ .zenodo.json file
- ○ DOI references
- ○ Academic publication links
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (12.8%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: rki-mf1
- License: mit
- Language: HTML
- Default Branch: main
- Homepage: https://github.com/rki-mf1/VirusWarn-Flu
- Size: 3.58 MB
Statistics
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
VirusWarn-Flu
Mutation-Based Early Warning System to Prioritize Concerning Influenza Variants from Sequencing Data
The goal of VirusWarn-Flu is to detect concerning Influenza variants from sequencing data. It parses Influenza genomes and detects amino acid mutations in the spike proteins that can be associated with a phenotypic change. The phenotypic changes are annotated according to the knowledge accumulated on previous variants. The tool is based on VirusWarn-SC2, which was developed for SARS-CoV-2.
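To make the idea of "detecting amino acid mutations" concrete, here is a minimal Python sketch. It is not the pipeline's implementation (according to CITATIONS.md, mutation calling is done with Nextclade); the function name `call_substitutions` and the assumption that both sequences are already aligned to equal length are purely illustrative.

```python
def call_substitutions(ref_aa: str, query_aa: str) -> list[str]:
    """Return substitutions such as 'T3A' between two aligned protein sequences.

    Assumption: both sequences come from the same alignment and have equal
    length. Positions are numbered along the reference (1-based). Alignment
    gaps ('-') and unknown query residues ('X') are skipped.
    """
    if len(ref_aa) != len(query_aa):
        raise ValueError("sequences must be aligned to the same length")

    substitutions = []
    ref_pos = 0
    for ref_res, query_res in zip(ref_aa.upper(), query_aa.upper()):
        if ref_res != "-":
            ref_pos += 1  # only reference residues advance the numbering
        if ref_res == "-" or query_res in ("-", "X") or ref_res == query_res:
            continue
        substitutions.append(f"{ref_res}{ref_pos}{query_res}")
    return substitutions


print(call_substitutions("MKTII", "MKAIV"))  # ['T3A', 'I5V']
```

Substitutions found this way would then be looked up in annotation tables such as the ones described in the Data section.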
Documentation
VirusWarn-Flu is part of VirusWarn
Check out the Video Tutorial
Getting Started
⚠️ Note: 🔌 Right now, VirusWarn-Flu is tested on Linux and Mac systems only 💻
Quick Installation
To run the pipeline, you need to have Nextflow and either conda, Docker or Singularity.
If you want to install Nextflow directly, you can use the following one-liner.
```bash
wget -qO- https://get.nextflow.io | bash
```
If you want to set up conda to run the pipeline and install all other dependencies through it, you can use the following steps.
Use the following bash commands if you are working on **Linux**:
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh
```
Use the following bash commands if you are working on **Mac**:
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
bash Miniconda3-latest-MacOSX-arm64.sh
```
Then, `Nextflow` can be installed via `conda`:
```bash
conda create -n nextflow -c bioconda nextflow
conda activate nextflow
```
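Before pulling the pipeline, it can help to confirm that the requirements are actually on the `PATH`. The following Python sketch checks for `nextflow` and for at least one of `conda`, `docker` or `singularity`. The executable names and the helper `missing_requirements` are assumptions made for illustration and are not part of VirusWarn-Flu.

```python
import shutil

# Assumed executable names for the three supported backends.
BACKENDS = ("conda", "docker", "singularity")


def missing_requirements(which=shutil.which) -> list[str]:
    """Return a list of human-readable descriptions of missing tools."""
    missing = []
    if which("nextflow") is None:
        missing.append("nextflow")
    # One backend is enough; the pipeline needs conda OR Docker OR Singularity.
    if not any(which(tool) for tool in BACKENDS):
        missing.append("one of: " + ", ".join(BACKENDS))
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing: " + "; ".join(problems))
    else:
        print("All requirements found.")
```

The `which` argument exists only so the check can be exercised without touching the real `PATH`.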
Get / Update VirusWarn-Flu
```bash
nextflow pull rki-mf1/VirusWarn-Flu
```
Call help
```bash
nextflow run rki-mf1/VirusWarn-Flu -r <version> --help
```
Running VirusWarn-Flu
Once Nextflow is installed, you are good to go! VirusWarn-Flu is run with a single command, using either conda (or mamba), Docker or Singularity.
With conda, please run:
```bash
nextflow run rki-mf1/VirusWarn-Flu -r <version> \
    -profile conda,local \
    --fasta 'test/openflu_h1n1.fasta' \
    --metadata 'test/metadata_h1n1.xlsx'
```
With Docker, please run:
```bash
nextflow run rki-mf1/VirusWarn-Flu -r <version> \
    -profile docker,local \
    --fasta 'test/openflu_h1n1.fasta' \
    --metadata 'test/metadata_h1n1.xlsx'
```
With Singularity, please run:
```bash
nextflow run rki-mf1/VirusWarn-Flu -r <version> \
    -profile singularity,local \
    --fasta 'test/openflu_h1n1.fasta' \
    --metadata 'test/metadata_h1n1.xlsx'
```
Running VirusWarn-Flu with splitting (Input from OpenFlu)
```bash
nextflow run rki-mf1/VirusWarn-Flu -r <version> \
    -profile conda,local \
    --fasta 'test/openflu_combi.fasta' \
    --metadata 'test/metadata_combi.xlsx' \
    --split 'OpenFlu'
```
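Conceptually, splitting means sorting the records of a mixed FASTA file into one group per subtype. The Python sketch below illustrates that idea only. How the pipeline actually recognises subtypes in FluPipe, GISAID or OpenFlu input is not documented here, so the header pattern (a literal `H1N1`, `H3N2` or `Victoria` somewhere in the header) and the example headers are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical rule: the subtype label appears literally in the FASTA header.
SUBTYPE_PATTERN = re.compile(r"H1N1|H3N2|Victoria", re.IGNORECASE)


def read_records(fasta_text: str) -> list[tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs."""
    records: list[list[str]] = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            records[-1][1] += line  # join wrapped sequence lines
    return [(header, seq) for header, seq in records]


def split_by_subtype(fasta_text: str) -> dict[str, list[tuple[str, str]]]:
    """Group records by the subtype found in the header ('unknown' otherwise)."""
    groups = defaultdict(list)
    for header, seq in read_records(fasta_text):
        match = SUBTYPE_PATTERN.search(header)
        key = match.group(0).lower() if match else "unknown"
        groups[key].append((header, seq))
    return dict(groups)


# Made-up headers, for illustration only.
example = ">A/Example/1/2023|H1N1\nATGAAG\nGCAATA\n>A/Example/2/2023|H3N2\nATGAAGACT\n"
groups = split_by_subtype(example)
print(sorted(groups))  # ['h1n1', 'h3n2']
```

Each group could then be written to its own file and analysed with the matching references and tables.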
Parameter list
```
fasta       REQUIRED! The path to the fasta file with the sequences for VirusWarn-Flu.
            [ default: '' ]

ref         If you want to use the recent references from Nextclade, choose ''.
                H1N1: A/Wisconsin/588/2019 (MW626065)
                H3N2: A/Darwin/6/2021 (EPI1857216)
            If you want to use the older references for H1N1 and H3N2, choose 'old'.
                H1N1: A/California/7/2009 (CY121680)
                H3N2: A/Wisconsin/67/2005 (CY163680)
            For Victoria, only B/Brisbane/60/2008 (KX058884) is available.
            [ default: '' ]

metadata    The path to a metadata file for the sequences with collection dates.
            Required to generate a heatmap in the report.
            [ default: '' ]

subtype     If the input fasta file only contains sequences of one subtype,
            define the subtype to choose the right references and tables.
            The options are H1N1 and H3N2 for Influenza A,
            Victoria for Influenza B.
            [ default: 'h1n1' ]

split       If the input fasta file contains sequences of more than one subtype,
            enable the split parameter to write them into one file per subtype and
            ensure the use of the right references and tables.
            The options are FluPipe, GISAID and OpenFlu.
            [ default: '' ]

qc          If set to true, a QC report will be generated from the Nextclade output.
            [ default: true ]

strict      Run process with strict alert levels (without orange). Choose 'y'.
            [ default: 'n' ]

season      The Influenza season from which the input sequences are.
            Important for checking on substitutions that are fixed in the population.
            [ default: '23/24' ]
```
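When several parameters are combined, it can be handy to assemble the command line programmatically and catch typos early. The following Python sketch only builds the command string and does not run it. The helper `build_command`, the case-insensitive handling of `subtype`, and the use of `main` as the `-r` value in the example are assumptions for illustration; the accepted option values are taken from the parameter list above.

```python
import shlex

# Option values as listed in the parameter list (subtype compared case-insensitively,
# which is an assumption of this sketch).
SUBTYPES = {"h1n1", "h3n2", "victoria"}
SPLIT_SOURCES = {"FluPipe", "GISAID", "OpenFlu"}


def build_command(version: str, profile: str, fasta: str, **params: str) -> str:
    """Return a shell-quoted 'nextflow run' command for VirusWarn-Flu."""
    if not fasta:
        raise ValueError("fasta is required")
    subtype = params.get("subtype")
    if subtype is not None and subtype.lower() not in SUBTYPES:
        raise ValueError(f"unknown subtype: {subtype!r}")
    split = params.get("split")
    if split and split not in SPLIT_SOURCES:
        raise ValueError(f"unknown split source: {split!r}")

    args = [
        "nextflow", "run", "rki-mf1/VirusWarn-Flu",
        "-r", version,
        "-profile", profile,
        "--fasta", fasta,
    ]
    for name, value in params.items():
        args += [f"--{name}", str(value)]
    return shlex.join(args)  # quotes arguments only where the shell needs it


print(build_command("main", "conda,local", "test/openflu_h1n1.fasta",
                    metadata="test/metadata_h1n1.xlsx", subtype="h1n1"))
```

The resulting string can be pasted into a terminal or passed to a job scheduler.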
Data
For further information on the tables that are used for the ranking, please take a look at the subfolders A(H1N1)pdm09, A(H3N2) and B(Victoria) of the folder data depending on the subtype you are interested in and the READMEs that are provided there.
For further instructions for test runs and information on the used data, please take a look at the folder test and the README that is provided there.
An example of the HTML report can be found in the folder example.
How to interpret the results
VirusWarn-Flu outputs an alert level in one of five colours, which fall into three impact ratings.

| Alert color | Description | Impact |
| ----------- | ----------- | ----------- |
| Pink | Variants with fixed MOCs and ROIs from the previous season. | HIGH |
| Red | Variants with a high number of MOCs and ROIs that can be dangerous. | HIGH |
| Orange | Variants with fewer MOCs and ROIs than those in the red level; they are considered less dangerous but still concerning. | MODERATE |
| Yellow | Variants that accumulate a high number of ROIs or PMs are sorted into the yellow level for further inspection. | MODERATE |
| Grey | The remaining variants are assigned to the grey category. | LOW |
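For downstream summaries, the colour-to-impact mapping in the table can be written down as a small lookup. The Python sketch below assumes the alert colours are available as plain strings; the actual format of the pipeline's output is not described here, and `impact_summary` is a hypothetical helper.

```python
from collections import Counter

# Mapping taken from the alert level table.
ALERT_IMPACT = {
    "pink": "HIGH",
    "red": "HIGH",
    "orange": "MODERATE",
    "yellow": "MODERATE",
    "grey": "LOW",
}


def impact_summary(alert_colours: list[str]) -> Counter:
    """Count how many variants fall into each impact rating.

    Raises KeyError for a colour that is not in the table.
    """
    return Counter(ALERT_IMPACT[colour.strip().lower()] for colour in alert_colours)


summary = impact_summary(["Pink", "grey", "Orange", "red", "Grey"])
print(dict(summary))  # {'HIGH': 2, 'LOW': 2, 'MODERATE': 1}
```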
Contact
Did you find a bug? 🐛 Suggestion/Feedback/Feature request? 👨‍💻 Please visit GitHub Issues
For business inquiries or professional support requests 🍺 Please feel free to contact us!
Acknowledgments
- Original tool: VirusWarn-SC2 (formerly VOCAL - Variant Of Concern ALert and prioritization)
- Original idea: SC2 Evolution Working Group at the Robert Koch Institute in Berlin
- Funding: Supported by the European Centre for Disease Prevention and Control (ECDC) [grant number ECDC GRANT/2021/008 ECD.12222].
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
Owner
- Name: RKI MF1 Bioinformatics
- Login: rki-mf1
- Kind: organization
- Location: Germany
- Repositories: 9
- Profile: https://github.com/rki-mf1
Bioinformatics code of MF1
Citation (CITATIONS.md)
# FluWarnSystem: Citations

## [VOCAL: Variant Of Concern ALert and prioritization](https://github.com/rki-mf1/vocal)

## [Nextflow](https://pubmed.ncbi.nlm.nih.gov/28398311/)

> Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.

## Pipeline

- [Nextclade](https://clades.nextstrain.org)

  > Aksamentov, I., Roemer, C., Hodcroft, E. B., & Neher, R. A., (2021). Nextclade: clade assignment, mutation calling and quality control for viral genomes. Journal of Open Source Software, 6(67), 3773, https://doi.org/10.21105/joss.03773

**The customized scripts in [`bin`](bin) use the following languages and packages:**

### [Python](https://www.python.org/)

- [BioPython](https://biopython.org)

  > Cock PA, Antao T, Chang JT, Chapman BA, Cox CJ, Dalke A, Friedberg I, Hamelryck T, Kauff F, Wilczynski B and de Hoon MJL (2009) Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25, 1422-1423

- [NumPy](https://numpy.org)

  > Harris CR, Millman KJ, van der Walt SJ et al. Array programming with NumPy. Nature 585, 357–362 (2020). DOI: 10.1038/s41586-020-2649-2.

- [Pandas](https://pandas.pydata.org)
- [argparse](https://docs.python.org/3/library/argparse.html)
- [itertools](https://docs.python.org/3/library/itertools.html)
- [operator](https://docs.python.org/3/library/operator.html)
- [sys](https://docs.python.org/3/library/sys.html)
- [time](https://docs.python.org/3/library/time.html)
- [typing](https://docs.python.org/3/library/typing.html)
- [warnings](https://docs.python.org/3/library/warnings.html)

### [R](https://www.R-project.org/)

> R Core Team (2017). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.

- [rmarkdown](https://CRAN.R-project.org/package=rmarkdown)

  > Xie Y, Allaire J, Grolemund G (2018). R Markdown: The Definitive Guide. Chapman and Hall/CRC, Boca Raton, Florida. ISBN 9781138359338, https://bookdown.org/yihui/rmarkdown.

  > Xie Y, Dervieux C, Riederer E (2020). R Markdown Cookbook. Chapman and Hall/CRC, Boca Raton, Florida. ISBN 9780367563837, https://bookdown.org/yihui/rmarkdown-cookbook.

  > Allaire J, Xie Y, Dervieux C, McPherson J, Luraschi J, Ushey K, Atkins A, Wickham H, Cheng J, Chang W, Iannone R (2023). rmarkdown: Dynamic Documents for R. R package version 2.25, https://github.com/rstudio/rmarkdown.

- [optparse](https://CRAN.R-project.org/package=optparse)
- [dplyr](https://CRAN.R-project.org/package=dplyr)
- [tidyr](https://CRAN.R-project.org/package=tidyr)
- [readr](https://CRAN.R-project.org/package=readr)
- [stringr](https://CRAN.R-project.org/package=stringr)
- [purrr](https://CRAN.R-project.org/package=purrr)
- [igraph](https://CRAN.R-project.org/package=igraph)

  > Csardi G, Nepusz T (2006). “The igraph software package for complex network research.” InterJournal, Complex Systems, 1695. https://igraph.org.

  > Csárdi G, Nepusz T, Traag V, Horvát S, Zanini F, Noom D, Müller K (2023). igraph: Network Analysis and Visualization in R. doi:10.5281/zenodo.7682609, R package version 1.6.0.

- [forcats](https://CRAN.R-project.org/package=forcats)
- [logger](https://CRAN.R-project.org/package=logger)
- [tidyverse](https://CRAN.R-project.org/package=tidyverse)

  > Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” Journal of Open Source Software, 4(43), 1686. doi:10.21105/joss.01686.

- [stringi](https://CRAN.R-project.org/package=stringi)

  > Gagolewski M (2022). “stringi: Fast and portable character string processing in R.” Journal of Statistical Software, 103(2), 1–59. doi:10.18637/jss.v103.i02.

- [readxl](https://CRAN.R-project.org/package=readxl)
- [seqinr](https://CRAN.R-project.org/package=seqinr)

  > Charif D, Lobry J (2007). “SeqinR 1.0-2: a contributed package to the R project for statistical computing devoted to biological sequences retrieval and analysis.” In Bastolla U, Porto M, Roman H, Vendruscolo M (eds.), Structural approaches to sequence evolution: Molecules, networks, populations, series Biological and Medical Physics, Biomedical Engineering, 207-232. Springer Verlag, New York. ISBN: 978-3-540-35305-8.

- [ggplot2](https://CRAN.R-project.org/package=ggplot2)

  > Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. ISBN 978-3-319-24277-4, https://ggplot2.tidyverse.org.

- [plotly](https://CRAN.R-project.org/package=plotly)

  > Sievert C (2020). Interactive Web-Based Data Visualization with R, plotly, and shiny. Chapman and Hall/CRC. ISBN 9781138331457, https://plotly-r.com.

- [DT](https://CRAN.R-project.org/package=DT)

## Software packaging/containerisation tools

- [Anaconda](https://anaconda.com)

  > Anaconda Software Distribution. Computer software. Vers. 2-2.4.0. Anaconda, Nov. 2016. Web.

- [Bioconda](https://pubmed.ncbi.nlm.nih.gov/29967506/)

  > Grüning B, Dale R, Sjödin A, Chapman BA, Rowe J, Tomkins-Tinch CH, Valieris R, Köster J; Bioconda Team. Bioconda: sustainable and comprehensive software distribution for the life sciences. Nat Methods. 2018 Jul;15(7):475-476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.
GitHub Events
Total
- Release event: 1
- Watch event: 2
- Delete event: 3
- Push event: 5
- Public event: 1
- Pull request event: 5
- Create event: 4
Last Year
- Release event: 1
- Watch event: 2
- Delete event: 3
- Push event: 5
- Public event: 1
- Pull request event: 5
- Create event: 4
Issues and Pull Requests
Last synced: 10 months ago
All Time
- Total issues: 0
- Total pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 1 day
- Total issue authors: 0
- Total pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 0
- Pull requests: 3
- Average time to close issues: N/A
- Average time to close pull requests: 1 day
- Issue authors: 0
- Pull request authors: 2
- Average comments per issue: 0
- Average comments per pull request: 0.0
- Merged pull requests: 2
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
Pull Request Authors
- chkirschbaum (2)
- huguesrichard (1)