https://github.com/apetkau/comp7944-project
Project on visualizing association rules extracted from covid-19 data.
COMP 7944 Project
This project applies data mining techniques to COVID-19 data to extract association rules and visualize these rules as a network. This repository includes code and supplementary materials for the project and is available online at https://github.com/apetkau/comp7944-project.
Authors
Prepared for COMP 7944 at the University of Manitoba on April 23, 2020 by:
- Aaron Petkau
- Hosne Al Walid Shaiket
- Rahatul Amin Ananto
Interactive visualizations
Listed below are the interactive versions of our association rule visualizations, each drawn as a network. For each dataset we produced two networks: one where nodes are colored by confidence and one where nodes are colored by lift. You can zoom and pan with the mouse, and network nodes can be dragged and dropped.
- Symptoms rules network
- Geographic date network
- Geographic age network
- SNV/Genomics network
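As a rough illustration of how a set of association rules can be turned into a network with colored nodes, the sketch below builds a plain node/edge structure in Python. It is not the project's plotting code: the rule dictionaries, the `rules_to_network` and `metric_to_colour` helpers, the white-to-red color scale, and the choice to draw each rule as its own node linked to its items are all assumptions made for this example.

```python
def metric_to_colour(value, low, high):
    """Map a value in [low, high] to a hex colour from white (low) to red (high)."""
    span = high - low
    fraction = 0.0 if span == 0 else (value - low) / span
    fraction = min(1.0, max(0.0, fraction))
    fade = round(255 * (1 - fraction))
    return f"#ff{fade:02x}{fade:02x}"


def rules_to_network(rules, metric="lift"):
    """Build (nodes, edges): one node per rule and per item, edges item -> rule -> item."""
    if not rules:
        return {}, []
    values = [rule[metric] for rule in rules]
    low, high = min(values), max(values)
    nodes, edges = {}, []
    for index, rule in enumerate(rules):
        rule_id = f"rule-{index}"
        # Only rule nodes carry a colour, taken from the chosen metric.
        nodes[rule_id] = {
            "kind": "rule",
            "colour": metric_to_colour(rule[metric], low, high),
        }
        for item in rule["antecedent"]:
            nodes.setdefault(item, {"kind": "item"})
            edges.append((item, rule_id))
        for item in rule["consequent"]:
            nodes.setdefault(item, {"kind": "item"})
            edges.append((rule_id, item))
    return nodes, edges


# Hypothetical rules, for illustration only (not results from the project).
rules = [
    {"antecedent": ["fatigue"], "consequent": ["cough"], "confidence": 1.0, "lift": 1.33},
    {"antecedent": ["fever"], "consequent": ["cough"], "confidence": 0.67, "lift": 0.89},
]
nodes, edges = rules_to_network(rules, metric="lift")
```

Passing `metric="confidence"` instead gives the second coloring of the same network. The resulting `nodes` and `edges` could then be handed to whichever graph-drawing library is preferred.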
Data sources
Copies of the two datasets we are using can be found at:
- Epidemiological dataset (data) - (alternative link)
- Genomics (SNV) dataset (data) - (alternative link)
Defining transaction datasets for mining
We processed the above datasets to define sets of items (transactions) for use with the Apriori algorithm, which finds frequent itemsets and association rules. We constructed four separate transaction datasets for further processing. The Jupyter notebooks for processing this data are listed below.
- Epidemiological dataset (code)
- Used to define the Symptoms, Geographic date, and Geographic age transactional itemsets for processing.
- Genomics/SNV dataset (code)
- Used to define the SNV/genomics dataset transactions.
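To make the idea of a transaction concrete, here is a minimal Python sketch of converting tabular records into sets of items. The field names, values, the `to_transaction` helper, and the `field=value` item encoding are hypothetical and chosen only for illustration; the notebooks above define the actual encoding used in the project.

```python
# Hypothetical records; the field names and values are illustrative only
# and are not taken from the project's datasets.
records = [
    {"country": "A", "age_group": "60-69", "symptoms": "fever; cough"},
    {"country": "B", "age_group": "30-39", "symptoms": "fever"},
    {"country": "A", "age_group": "", "symptoms": "cough; fatigue"},
]


def to_transaction(record, multi_valued=("symptoms",), sep=";"):
    """Convert one record into a set of 'field=value' items, skipping blanks."""
    items = set()
    for field, raw in record.items():
        # Multi-valued fields are split into one item per value.
        values = raw.split(sep) if field in multi_valued else [raw]
        for value in values:
            value = value.strip()
            if value:
                items.add(f"{field}={value}")
    return frozenset(items)


transactions = [to_transaction(record) for record in records]
```

Prefixing each value with its field name keeps items from different columns distinct, so a value that appears in two columns is not mistaken for the same item.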
Association rule mining and visualization
We next applied data mining techniques to find association rules in the above datasets and visualize the rules. Jupyter notebooks for this process are given below.
- Symptoms dataset (code)
- Geographic date dataset (code)
- Geographic age dataset (code)
- Genomics/SNV dataset (code)
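For readers unfamiliar with the mining step, the following is a small self-contained Python sketch of Apriori-style frequent itemset mining followed by rule generation with support, confidence, and lift. It is a teaching example under stated assumptions, not the code in the notebooks (which may rely on R packages): the `apriori` and `association_rules` functions, the thresholds, and the toy transactions are all invented here.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    n = len(transactions)
    # Level 1: candidate itemsets are the individual items.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Support is the fraction of transactions containing the candidate.
        level = {}
        for candidate in candidates:
            support = sum(1 for t in transactions if candidate <= t) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        # Join frequent k-itemsets into (k+1)-candidates, keeping only those
        # whose k-item subsets are all frequent (the Apriori pruning step).
        k += 1
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            if len(union) == k and all(
                frozenset(subset) in level for subset in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent


def association_rules(frequent, min_confidence):
    """Return a list of (antecedent, consequent, support, confidence, lift)."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, size)):
                consequent = itemset - antecedent
                # confidence = P(consequent | antecedent)
                confidence = support / frequent[antecedent]
                if confidence >= min_confidence:
                    # lift = confidence / P(consequent)
                    lift = confidence / frequent[consequent]
                    rules.append((antecedent, consequent, support, confidence, lift))
    return rules


# Toy transactions, for illustration only (not the project's data).
transactions = [
    frozenset({"fever", "cough"}),
    frozenset({"fever", "cough", "fatigue"}),
    frozenset({"fever"}),
    frozenset({"cough", "fatigue"}),
]
frequent = apriori(transactions, min_support=0.5)
rules = association_rules(frequent, min_confidence=0.6)
```

A lift above 1 suggests the antecedent and consequent occur together more often than expected if they were independent, while a lift below 1 suggests the opposite.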
Software
To reproduce this analysis you can use the following instructions to install dependencies using conda (though we note some additional R packages may need to be installed manually).
- Install Miniconda, which is used for software dependency management.
- Install dependencies (from the `dependencies.conda` file) using the command:

```bash
conda create --name datamining --file dependencies.conda
```

- Activate the conda environment with the installed software:

```bash
conda activate datamining
```

- Run JupyterLab:

```bash
jupyter lab
```

You should now be able to load the Jupyter notebooks and work with them.
License
The source data for this project (under the data/ directory) is redistributed under the respective licenses of the original providers. The code in this project is distributed under the Apache 2.0 license.
Owner
- Name: Aaron Petkau
- Login: apetkau
- Kind: user
- Company: Public Health Agency of Canada
- Repositories: 70
- Profile: https://github.com/apetkau
Bioinformatician with the Public Health Agency of Canada.