https://github.com/cidgoh/virus-mvp
VirusMVP is an interactive heatmap-centric app that integrates viral genomic mutations, lineage information and curated functional impact to study the spread and evolution of viruses in Canada and globally.
Science Score: 49.0%
This score indicates how likely this project is to be science-related based on various indicators:
- ○ CITATION.cff file
- ✓ codemeta.json file: Found codemeta.json file
- ✓ .zenodo.json file: Found .zenodo.json file
- ✓ DOI references: Found 2 DOI reference(s) in README
- ✓ Academic publication links: Links to zenodo.org
- ○ Academic email domains
- ○ Institutional organization owner
- ○ JOSS paper metadata
- ○ Scientific vocabulary similarity: Low similarity (14.5%) to scientific vocabulary
Repository
Basic Info
- Host: GitHub
- Owner: cidgoh
- License: mit
- Language: Python
- Default Branch: development
- Homepage: https://virusmvp.org/
- Size: 88.3 MB
Statistics
- Stars: 15
- Watchers: 2
- Forks: 2
- Open Issues: 35
- Releases: 37
Metadata Files
README.md
VIRUS-MVP
VIRUS-MVP is a heatmap-centric visualization web application that encodes mutational information across viral populations (e.g., SARS-CoV-2 and MPOX). You can find deployed versions of VIRUS-MVP (without user upload functionality) at https://virusmvp.org/.

The data visualized by VIRUS-MVP is generated and annotated by two upstream components of this project:
- nf-ncov-voc - genomics workflow for variant calling and annotation
- pokay - mutation function annotation repository

Native vs Docker installation
VIRUS-MVP can be installed natively, or built through Docker.
A native installation will provide optimal performance on Windows and Mac machines, as there are varying performance costs associated with the Linux virtualization layer used to build Docker containers. VIRUS-MVP docker containers also use port mapping, which incurs additional performance costs.
However, Docker installations are highly portable. Installing VIRUS-MVP through Docker maintains a consistent and reproducible environment across systems, with minimal dependency errors. This consistency is especially useful when deploying the application, as variability between testing and production environments will be reduced.
Native installation steps
0. (If uploading your own data) Install Nextflow + Docker
File uploads trigger the nf-ncov-voc workflow written in Nextflow.
1. Clone the repository and its submodules
$ git clone git@github.com:cidgoh/VIRUS-MVP.git --recurse-submodules
2. Set up a venv environment
This does not incur the performance overhead of a Docker container, as
venv only creates an isolated folder for the dependencies you
install natively on your operating system.
$ cd VIRUS-MVP
$ python3 -m venv myenv
$ source myenv/bin/activate
(myenv) $ pip install -r requirements.txt
3. Run the application
(myenv) $ python app.py
Go to http://0.0.0.0:8050/.
Note: Run the app from the root project directory to ensure all assets (e.g., JavaScript) load correctly.
Docker installation steps
It is a relatively simple setup. Just make sure you have Docker installed.
$ docker-compose build
$ docker-compose up
Warning: our docker setup bind mounts the host socket to the container. You should use a socket proxy prior to deployment.
One currently unresolved issue: If you upload a file while the application
is deployed through Docker, and then later attempt to upload a file while the
application is deployed natively, the application will likely run into
permission issues related to the nf-ncov-voc cache. You can fix this by
removing all cache files in the nf-ncov-voc/ directory:
$ rm -r results work .nextflow .nextflow.log* capsule framework plugins secrets tmp
You may have to use sudo.
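The same cleanup can be scripted. The following is a minimal sketch, not part of VIRUS-MVP: the function name `clear_workflow_cache` is hypothetical, and the list of entries is copied from the rm command above. It does not elevate privileges, so permission errors still require sudo.

```python
import shutil
from pathlib import Path

# Cache entries named in the rm command above; ".nextflow.log*" is a glob.
CACHE_PATTERNS = [
    "results", "work", ".nextflow", ".nextflow.log*",
    "capsule", "framework", "plugins", "secrets", "tmp",
]


def clear_workflow_cache(workflow_dir):
    """Delete matching cache files and folders; return the removed names, sorted."""
    root = Path(workflow_dir)
    removed = []
    for pattern in CACHE_PATTERNS:
        # list() so the directory is not modified while it is being scanned.
        for path in list(root.glob(pattern)):
            if path.is_dir() and not path.is_symlink():
                shutil.rmtree(path)  # remove a folder and its contents
            else:
                path.unlink()  # remove a single file or symlink
            removed.append(path.name)
    return sorted(removed)
```

Calling `clear_workflow_cache("nf-ncov-voc")` from the project root would target the same directory as the command above. Entries that do not exist are skipped.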
Usage
A navbar at the top of the application has links to both this repository and the underlying genomics workflow. Clicking the "TUTORIAL" link will display this README in-app.

A legend at the top of the application provides a detailed explanation of the heatmap view.

Heatmap view
The left axis encodes viral lineages. Lineages belonging to variants of concern (VOC) are in bold, and lineages belonging to variants of interest (VOI) are in italics. Actively circulating lineages are denoted with ⚠️.
The right axis encodes the number of genomic sequences analyzed for each lineage.
The top axis encodes the nucleotide position of lineage mutations, with respect to the reference genome.
The bottom axis encodes the amino acid position of lineage mutations, in the following format:
Genic mutations: {GENE}.{AMINO ACID POSITION WITHIN THAT GENE}
Intergenic mutations: {NEAREST DOWNSTREAM GENE}.{NUMBER OF NUCLEOTIDES UPSTREAM}
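To illustrate the two label formats, here is a minimal sketch. It is not the application's code: the gene names and coordinates are a hypothetical toy annotation, and the amino acid position is assumed to be counted in codons from the first base of the gene, which may differ from how the upstream workflow derives it.

```python
# Hypothetical toy annotation: gene name -> (start, end), 1-based inclusive.
GENES = {"geneA": (101, 400), "geneB": (451, 900)}


def bottom_axis_label(nt_pos, genes=GENES):
    """Return a bottom-axis style label for a 1-based nucleotide position."""
    for name, (start, end) in genes.items():
        if start <= nt_pos <= end:
            # Assumption: codons are counted from the first base of the gene.
            aa_pos = (nt_pos - start) // 3 + 1
            return f"{name}.{aa_pos}"
    # Intergenic: pick the closest gene that starts after this position.
    downstream = [(start, name) for name, (start, _) in genes.items() if start > nt_pos]
    if not downstream:
        return None  # no downstream gene; this case is outside the sketch
    start, name = min(downstream)
    return f"{name}.{start - nt_pos}"


print(bottom_axis_label(104))  # genic position inside geneA
print(bottom_axis_label(440))  # intergenic position upstream of geneB
```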
The heatmap cells encode the presence of mutations. The color of these cells encodes mutation frequency. Insertions, deletions, functional mutations, and lineages with a sample size of one are encoded as follows:

Hovering over cells displays detailed mutation information. Clicking cells opens a modal with detailed mutation function descriptions, and their citations.

Histogram
The histogram bars encode the total number of mutations across all visualized lineages every 100 nucleotide positions. The small black bar at the bottom of the histogram view indicates which section of the genome you have currently scrolled to in the heatmap.
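The binning behind the histogram bars can be sketched as follows. This is an illustration under assumptions, not the application's implementation: positions are taken to be 1-based, bins are assumed to cover positions 1-100, 101-200, and so on, and `histogram_counts` is a hypothetical name.

```python
from collections import Counter

BIN_WIDTH = 100  # nucleotides per histogram bar, as described above


def histogram_counts(mutation_positions, bin_width=BIN_WIDTH):
    """Map each bin's first position to the number of mutations falling in it.

    mutation_positions pools the 1-based positions of mutations from all
    visualized lineages; a position seen in two lineages appears twice.
    """
    counts = Counter(
        (pos - 1) // bin_width * bin_width + 1 for pos in mutation_positions
    )
    return dict(sorted(counts.items()))


print(histogram_counts([5, 99, 100, 101, 250, 250]))
```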

To navigate the heatmap more quickly, you can click on the genes in the histogram view. Clicking a gene in the histogram will automatically scroll the heatmap to the left-most mutation in that gene.
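Finding the scroll target for a clicked gene amounts to locating the left-most mutation inside that gene's coordinates. A minimal sketch, assuming a sorted list of 1-based mutation positions and inclusive gene coordinates (the function name and inputs are hypothetical):

```python
from bisect import bisect_left


def leftmost_mutation_in_gene(sorted_positions, gene_start, gene_end):
    """Return the smallest mutation position within [gene_start, gene_end].

    sorted_positions must be sorted in ascending order. Returns None when
    the gene contains no mutations.
    """
    i = bisect_left(sorted_positions, gene_start)  # first position >= gene_start
    if i < len(sorted_positions) and sorted_positions[i] <= gene_end:
        return sorted_positions[i]
    return None


positions = [50, 120, 300, 460]
print(leftmost_mutation_in_gene(positions, 101, 400))
```

A binary search keeps the lookup fast even when many mutation columns are rendered; a plain `min()` over a filtered list would give the same result.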
Toolbar
There are several tools at the top of the interface that can be used to edit the visualization.

Clicking the select groups button opens a modal that allows you to rearrange and hide variants.

Clicking the upload button allows you to upload a FASTA or VCF file, which will then be processed by nf-ncov-voc to generate a new GVF file, and then rendered onto the heatmap. You can find examples of files users can upload in test_data/.
You must have Nextflow and Conda installed to upload files.
Your first upload will take a while. Subsequent uploads will be faster.

Clicking the download dropdown menu allows you to download surveillance reports generated by nf-ncov-voc for the lineages visible in the heatmap. You can also download a mutation index JSON file, which provides parsable data on all the information rendered in the heatmap.

Clicking the "search for mutations" button allows you to search for specific mutations by name, and automatically scroll the heatmap to that mutation.

Typing a nucleotide position in the "jump to nucleotide position" textbox will automatically scroll the heatmap to that specific nucleotide position.

The mutation frequency slider allows you to filter heatmap cells by mutation frequency.

The clade defining switch allows you to show or hide heatmap cells corresponding to non-clade defining mutations.
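The combined effect of the frequency slider and the clade defining switch can be sketched as a simple filter. This is an assumption-laden illustration: the cell dictionaries, their keys (`frequency`, `clade_defining`), and the idea that the slider defines a lower and upper bound are all hypothetical and not taken from the application's code.

```python
def visible_cells(cells, min_freq=0.0, max_freq=1.0, clade_defining_only=False):
    """Return the cells that pass both filters.

    cells is a list of dicts with hypothetical keys:
      "frequency"      -> float between 0 and 1
      "clade_defining" -> bool
    """
    kept = []
    for cell in cells:
        if not min_freq <= cell["frequency"] <= max_freq:
            continue  # outside the assumed slider range
        if clade_defining_only and not cell["clade_defining"]:
            continue  # switch hides non-clade defining mutations
        kept.append(cell)
    return kept


cells = [
    {"mutation": "m1", "frequency": 0.95, "clade_defining": True},
    {"mutation": "m2", "frequency": 0.40, "clade_defining": False},
    {"mutation": "m3", "frequency": 0.10, "clade_defining": True},
]
print([c["mutation"] for c in visible_cells(cells, min_freq=0.3)])
```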
Adapting the workflow to a new virus
To adapt this workflow for a new virus, users must provide a defined set of input files. These are processed through two modular components:
- 🧪 Genomics Workflow
- 🧬 Functional Annotation
The figure below summarizes all required inputs (🟩), outputs generated by the workflow (🟦), and hybrid files that require both manual curation and automated generation (🟪).
TODO commit asset

✅ Required user-provided files (🟩):
TODO fill out example links
- Reference genome files
  - GFF: Reference genome annotations (e.g., from NCBI RefSeq or Ensembl)
  - FASTA: Reference genome sequence
- Ontology file
  - GenEpiO or GGO: Ontologies for mapping gene functions
- Viral genome sequences
- Functional annotations (optional)
  - TSV: Functional annotation file in Pokay or custom format
🟪 Most Important File: genome_config.JSON
⚠️ Critical for customization
genome_config.JSON is the most important file for adapting the workflow to a new virus. It acts as the central resource file connecting reference features, ontologies, and annotations.
- Partially auto-generated using provided inputs and helper scripts
- Manual curation is required for accurate ontology term mapping, virus-specific details, and layout configuration
- This file controls how genomic features and annotations are interpreted and visualized across all components of the framework
⚙️ Script-generated files (🟦):
- GVF, Functional Annotation TSV, metadata TSVs, and summary PDFs: These are produced automatically by the workflow after running the Genomics Workflow and Functional Annotation pipelines
Future directions
We plan to build an API, which users can call to retrieve a text-based representation of the information rendered in VIRUS-MVP. We have begun this process by introducing the ability to download a mutation index JSON file, as mentioned above.
We also plan to modify the interface for a more intuitive display of segmented
genomes, such as RSV or Influenza. You can track our progress on the
segmented_demo branch. Basically, we will provide a dropdown that allows
users to render discrete segments of the genome, one at a time.
Support
We encourage you to report any problems with the application as an issue in this repository, but you can also email us at contact@cidgoh.ca.
Authors and acknowledgement
@ivansg44: Visualization development
@anwarMZ: Genomic analysis
@Anoosha-Sehar: Functional annotation
@miseminger: Functional annotation and data standardization
William Hsiao, Gary Van Domselaar, and Paul Gordon
The results here are in whole or in part based upon data hosted at the Canadian VirusSeq Data Portal: https://virusseq-dataportal.ca/. We wish to acknowledge the following organisations/laboratories for contributing data to the Portal: Canadian Public Health Laboratory Network (CPHLN), CanCOGeN VirusSeq, Saskatchewan - Roy Romanow Provincial Laboratory (RRPL), Nova Scotia Health Authority, Alberta ProvLab North (APLN), Queen's University / Kingston Health Sciences Centre, National Microbiology Laboratory (NML), BCCDC Public Health Laboratory, Public Health Ontario (PHO), Newfoundland and Labrador - Eastern Health, Unity Health Toronto, Ontario Institute for Cancer Research (OICR), and Manitoba Cadham Provincial Laboratory.
License
Owner
- Name: Centre for Infectious Disease and One Health
- Login: cidgoh
- Kind: organization
- Email: contact@cidgoh.ca
- Location: Canada
- Website: www.cidgoh.ca
- Twitter: cidgoh
- Repositories: 14
- Profile: https://github.com/cidgoh
Hsiao Laboratory at Simon Fraser University
GitHub Events
Total
- Create event: 14
- Release event: 5
- Issues event: 4
- Watch event: 8
- Delete event: 7
- Issue comment event: 2
- Push event: 59
- Pull request event: 14
Last Year
- Create event: 14
- Release event: 5
- Issues event: 4
- Watch event: 8
- Delete event: 7
- Issue comment event: 2
- Push event: 59
- Pull request event: 14
Issues and Pull Requests
Last synced: 6 months ago
All Time
- Total issues: 3
- Total pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 19 days
- Total issue authors: 1
- Total pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.2
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Past Year
- Issues: 3
- Pull requests: 5
- Average time to close issues: N/A
- Average time to close pull requests: 19 days
- Issue authors: 1
- Pull request authors: 1
- Average comments per issue: 0.0
- Average comments per pull request: 0.2
- Merged pull requests: 5
- Bot issues: 0
- Bot pull requests: 0
Top Authors
Issue Authors
- ivansg44 (14)
- anwarMZ (3)
Pull Request Authors
- ivansg44 (13)
Top Labels
Issue Labels
Pull Request Labels
Dependencies
- Flask-Caching ==1.10.1
- dash ==1.20.0
- dash-bootstrap-components ==0.11.3
- gunicorn ==20.1.0
- pandas ==1.2.0
- appleboy/ssh-action master composite
- actions/checkout v2 composite
- continuumio/anaconda3 latest build
- jonasal/nginx-certbot 2.4.0-alpine build