swat

SWAT : Sliding Window Association Test

https://github.com/taehojo/swat

Science Score: 31.0%

This score indicates how likely this project is to be science-related based on various indicators:

  • CITATION.cff file
    Found CITATION.cff file
  • codemeta.json file
    Found codemeta.json file
  • .zenodo.json file
  • DOI references
  • Academic publication links
  • Academic email domains
  • Institutional organization owner
  • JOSS paper metadata
  • Scientific vocabulary similarity
    Low similarity (15.9%) to scientific vocabulary
Last synced: 10 months ago

Repository

Basic Info
  • Host: GitHub
  • Owner: taehojo
  • Language: Python
  • Default Branch: master
  • Size: 1.6 MB
Statistics
  • Stars: 2
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created almost 3 years ago · Last pushed about 1 year ago
Metadata Files
Readme Citation

README.md

SWAT (Sliding Window Association Test)

Introduction

SWAT (Sliding Window Association Test) is a Python tool that applies machine learning to Whole Genome Sequencing (WGS) analysis. It aims to provide a robust and efficient way to analyze high-dimensional genomic data.

SWAT is designed to identify phenotype-related single nucleotide polymorphisms (SNPs), which makes it useful for building disease classification models. It also includes an imputer that fills in missing data automatically, using either mean imputation or k-Nearest Neighbors imputation.

You can also access the web implementation of this tool at SWAT-web.

Installation

To install and run this tool, follow these steps:

  1. Clone this repository to your local machine:

```bash
git clone https://github.com/taehojo/SWAT.git
```

  2. Navigate to the project directory:

```bash
cd SWAT
```

  3. Install the required Python packages. It's recommended to do this in a virtual environment:

```bash
pip install -r requirements.txt
```

Requirements

  • Python 3.8 or higher

Usage

The tool can be run from the command line with the following syntax:

```bash
python main.py [input_file] --win [window_size] --imputation [imputation_method] --num_results [num_top_results] --num_jobs [num_parallel_jobs] --classifier [classifier] --name [output_file_name] --WGS_merge [merged_file_path] --WGS_select --fast_run --no_plots --no_api
```

where:

  • [input_file] is the path to the input data file. This parameter is required.
  • [window_size] is the window size for analysis. The default size is 200.
  • [imputation_method] is the method employed to handle missing data. The options include "simple", "1nn", "5nn", or "10nn". "simple" stands for mean imputation, and "1nn", "5nn", "10nn" denote k-Nearest Neighbors method with k being 1, 5, and 10 respectively. The default method is "5nn".
  • [num_top_results] determines the number of top results to output. The default is 20.
  • [num_parallel_jobs] specifies the number of jobs to run in parallel. A value of -1 uses all processors, which is also the default.
  • [classifier] indicates the classifier to use. Choose "rf" for RandomForest and "dl" for Deep Learning. The default is "rf".
  • [output_file_name] sets a name for the output files, replacing the default timestamp.
  • [merged_file_path] is the path to a CSV file from which the script can load top accuracies and continue the analysis.
  • --WGS_select is an option to have the script save top accuracies to a CSV file for later use.
  • --fast_run is an option to execute the script only with the RandomForest classifier without creating plot images.
  • --no_plots is an option to prevent the creation of plot images.
  • --no_api is an option to prevent the script from making API calls to get SNP details.
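The imputation options above can be illustrated with a small, dependency-free sketch. It is not SWAT's own code: the function names (`mean_impute`, `knn_impute`, `impute`), the use of `None` for missing genotypes, and the distance measure are all assumptions made for illustration. In practice, libraries such as scikit-learn provide `SimpleImputer` and `KNNImputer` for the same purpose.

```python
import math


def mean_impute(rows):
    """Replace None with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def _distance(a, b):
    """Root-mean-square difference over columns observed in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no overlap, so the rows cannot be compared
    return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))


def knn_impute(rows, k):
    """Replace None with the column mean over the k nearest donor rows."""
    fallback = mean_impute(rows)  # used when no comparable donor exists
    result = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donors are other rows in which column j is observed.
            donors = [(_distance(row, other), other[j])
                      for m, other in enumerate(rows)
                      if m != i and other[j] is not None]
            donors = [d for d in donors if d[0] != math.inf]
            donors.sort(key=lambda d: d[0])
            nearest = [v for _, v in donors[:k]]
            result[i][j] = sum(nearest) / len(nearest) if nearest else fallback[i][j]
    return result


def impute(rows, method="5nn"):
    """Dispatch on the option strings listed above (default "5nn")."""
    if method == "simple":
        return mean_impute(rows)
    if method in ("1nn", "5nn", "10nn"):
        return knn_impute(rows, int(method[:-2]))
    raise ValueError(f"unknown imputation method: {method!r}")
```

With few samples, a large k simply averages over every available donor, so "5nn" and "10nn" can coincide with mean imputation on small inputs.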

Execution example:

```bash
python main.py sample/APOE_LD_Block.csv
```

This command initiates the SNP analysis and stores the results in the 'results' directory. The outcomes include CSV files with the top N features and accuracy results, and if not suppressed, PNG files depicting accuracies and feature importances. Here N refers to the number of top results specified.
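As a rough illustration of how a window size and a top-N cutoff interact, the sketch below splits the SNP columns into windows, scores each window, and keeps the N best. It is an assumption-laden sketch, not SWAT's implementation: the `step` parameter, the function names, and the `mean_difference_score` scorer are hypothetical. SWAT reports classifier accuracies (RandomForest or Deep Learning), so the scorer here is only a stand-in that keeps the example free of dependencies.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Row = Sequence[float]


def sliding_windows(n_features: int, window_size: int = 200,
                    step: int = 200) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) column ranges; the last window may be shorter."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    start = 0
    while start < n_features:
        end = min(start + window_size, n_features)
        yield start, end
        if end == n_features:
            break
        start += step


def mean_difference_score(window_rows: List[Row], labels: Sequence[int]) -> float:
    """Stand-in scorer: largest gap between case and control column means.

    Assumes labels are 1 (case) or 0 (control) and both groups are non-empty.
    """
    cases = [r for r, lab in zip(window_rows, labels) if lab == 1]
    controls = [r for r, lab in zip(window_rows, labels) if lab == 0]
    gaps = []
    for j in range(len(window_rows[0])):
        case_mean = sum(r[j] for r in cases) / len(cases)
        control_mean = sum(r[j] for r in controls) / len(controls)
        gaps.append(abs(case_mean - control_mean))
    return max(gaps)


def rank_windows(rows: List[Row], labels: Sequence[int],
                 score_fn: Callable[[List[Row], Sequence[int]], float] = mean_difference_score,
                 window_size: int = 200, step: int = 200,
                 top_n: int = 20) -> List[Tuple[float, int, int]]:
    """Score every window and return the top_n as (score, start, end)."""
    scored = []
    for start, end in sliding_windows(len(rows[0]), window_size, step):
        window_rows = [row[start:end] for row in rows]
        scored.append((score_fn(window_rows, labels), start, end))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_n]
```

Passing a different `score_fn`, for example one that returns the cross-validated accuracy of a classifier trained on the window, would bring the sketch closer to the accuracy-based output described above.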

To run the script on WGS files, use the provided bash script:

```bash
./run_swat.sh [input_file] [chunk_size]
```

For example:

```bash
./run_swat.sh sample/APOE_LD_Block.csv 1000
```

This will handle large WGS files by breaking them into smaller chunks, running the SNP analysis on each chunk, and then merging the results.
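The chunk-then-merge pattern described here can be sketched in a few lines of Python. This is not a translation of `run_swat.sh`, whose internals are not shown in this README. The sketch assumes the file is split by columns, and the names `chunk_ranges`, `analyze_in_chunks`, and the `(name, accuracy)` result shape are hypothetical.

```python
from typing import Callable, List, Tuple

Result = Tuple[str, float]  # (feature name, accuracy): a hypothetical shape


def chunk_ranges(n_columns: int, chunk_size: int) -> List[Tuple[int, int]]:
    """Split column indices 0..n_columns into consecutive (start, end) chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [(start, min(start + chunk_size, n_columns))
            for start in range(0, n_columns, chunk_size)]


def analyze_in_chunks(n_columns: int, chunk_size: int,
                      analyze_chunk: Callable[[int, int], List[Result]],
                      top_n: int = 20) -> List[Result]:
    """Run analyze_chunk on each chunk, then merge and keep the best top_n."""
    merged: List[Result] = []
    for start, end in chunk_ranges(n_columns, chunk_size):
        merged.extend(analyze_chunk(start, end))
    # Merge step: rank all per-chunk results together by accuracy.
    merged.sort(key=lambda result: result[1], reverse=True)
    return merged[:top_n]
```

Because each chunk is analyzed independently, memory use is bounded by the chunk size rather than by the full width of the WGS file.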

Example of SWAT application:

Jo, Taeho, et al. "Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification." Briefings in Bioinformatics 23.2 (2022).

Owner

  • Name: Taeho Jo
  • Login: taehojo
  • Kind: user
  • Location: Indiana, USA
  • Company: Indiana University School of Medicine

Computational Biologist, Ph.D

Citation (citation.cff)

cff-version: 1.2.0
message: "If you use this software, please cite it as below."
authors:
- family-names: "Jo"
  given-names: "Taeho"
  orcid: "https://orcid.org/0000-0003-1765-5735"
title: "SWAT: Deep learning-based identification of genetic variants: application to Alzheimer’s disease classification"
version: 1.0.0
date-released: 2022-02-20
url: "https://www.github.com/taehojo/SWAT"

GitHub Events

Total
  • Issues event: 5
  • Watch event: 1
  • Issue comment event: 2
  • Push event: 1
Last Year
  • Issues event: 5
  • Watch event: 1
  • Issue comment event: 2
  • Push event: 1