swat

SWAT : Sliding Window Association Test

https://github.com/taehojo/swat

Science Score: 31.0%

This score indicates how likely this project is to be science-related based on various indicators:

✓
CITATION.cff file
Found CITATION.cff file
✓
codemeta.json file
Found codemeta.json file
○
.zenodo.json file
○
DOI references
○
Academic publication links
○
Academic email domains
○
Institutional organization owner
○
JOSS paper metadata
○
Scientific vocabulary similarity
Low similarity (15.9%) to scientific vocabulary

Last synced: 10 months ago · JSON representation ·

Repository

SWAT : Sliding Window Association Test

Basic Info

Host: GitHub
Owner: taehojo
Language: Python
Default Branch: master
Size: 1.6 MB

Statistics

Stars: 2
Watchers: 1
Forks: 0
Open Issues: 0
Releases: 0

Created almost 3 years ago · Last pushed about 1 year ago

Metadata Files

Readme Citation

SWAT (Sliding Window Association Test)

Introduction

SWAT (Sliding Window Association Test) is a tool for Whole Genome Sequencing (WGS) analysis using machine learning. It's a newly developed Python-based tool that aims to provide a robust and efficient way to analyze high-dimensional genomic data.

SWAT is designed to identify phenotype-related single nucleotide polymorphisms (SNPs), making it particularly useful for developing accurate disease classification models. The tool also includes a sophisticated imputer which is capable of automatically filling in missing data, hence improving the quality of the analysis.

You can also access the web implementation of this tool at SWAT-web.

Installation

To install and run this tool, follow these steps:

Clone this repository to your local machine:

bash git clone https://github.com/taehojo/SWAT.git

Navigate to the project directory:

bash cd SWAT

Install the required Python packages. It's recommended to do this in a virtual environment:

bash pip install -r requirements.txt

Requirements

Python 3.8 or higher

Usage

The tool can be run from the command line with the following syntax:

```bash python main.py [inputfile] --win [windowsize] --imputation [imputationmethod] --numresults [numtopresults] --numjobs [numparalleljobs] --classifier [classifier] --name [outputfilename] --WGSmerge [mergedfilepath] --WGSselect --fastrun --noplots --noapi

```

where:

[input_file] is the path to the input data file. This parameter is required.
[window_size] is the window size for analysis. The default size is 200.
[imputation_method] is the method employed to handle missing data. The options include "simple", "1nn", "5nn", or "10nn". "simple" stands for mean imputation, and "1nn", "5nn", "10nn" denote k-Nearest Neighbors method with k being 1, 5, and 10 respectively. The default method is "5nn".
[num_top_results] determines the number of top results to output. The default is 20.
[num_parallel_jobs] specifies the number of jobs to run in parallel. -1 means utilizing all processors. The default is to use all processors.
[classifier] indicates the classifier to use. Choose "rf" for RandomForest and "dl" for Deep Learning. The default is "rf".
[output_file_name] allows to choose a name for the output files instead of the timestamp.
[merged_file_path] is the path to a CSV file from which the script can load top accuracies and continue the analysis.
--WGS_select is an option to have the script save top accuracies to a CSV file for later use.
--fast_run is an option to execute the script only with the RandomForest classifier without creating plot images.
--no_plots is an option to prevent the creation of plot images.
--no_api is an option to prevent the script from making API calls to get SNP details.

Execution example: bash python main.py sample/APOE_LD_Block.csv

This command initiates the SNP analysis and stores the results in the 'results' directory. The outcomes include CSV files with the top N features and accuracy results, and if not suppressed, PNG files depicting accuracies and feature importances. Here N refers to the number of top results specified.

To execute the script for WGS files, you can use the provided bash script as follows: bash run_swat.sh [input_file] [chunk_size] For example: bash ./run_swat.sh sample/APOE_LD_Block.csv 1000

This will handle large WGS files by breaking them into smaller chunks, running the SNP analysis on each chunk, and then merging the results.

:bookmark: Example of SWAT application:

Jo, Taeho, et al. "Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification." Briefings in Bioinformatics 23.2 (2022)

Owner

Name: Taeho Jo
Login: taehojo
Kind: user
Location: Indiana, USA
Company: Indiana University School of Medicine

Website: https://taehojo.github.io
Repositories: 4
Profile: https://github.com/taehojo

Computational Biologist, Ph.D

Citation (citation.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Jo"
  given-names: "Taeho"
  orcid: "https://orcid.org/0000-0003-1765-5735"
title: "SWAT:Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification"
version: 1.0.0
date-released: 2022-2-20
url: " https://www.github.com/taehojo/SWAT"

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Open Source Science