https://github.com/asadprodhan/gwas_training

Beginner training course for genome-wide association mapping using R and publicly available datasets

Repository

Basic Info
  • Host: GitHub
  • Owner: asadprodhan
  • License: gpl-3.0
  • Default Branch: main
  • Size: 16.6 KB
Statistics
  • Stars: 0
  • Watchers: 1
  • Forks: 0
  • Open Issues: 0
  • Releases: 0
Created over 1 year ago · Last pushed over 1 year ago

README.md

3-Day Beginner Training Course: R for Rice Genome-Wide Association Mapping

📚 Overview

This repository contains materials for a 3-day beginner-level training course on performing Genome-Wide Association Studies (GWAS) in rice using R. The course covers essential data preprocessing, visualization, and analysis workflows, utilizing publicly available datasets.


📅 Course Outline

Day 1: Introduction to R and GWAS Datasets

  • Objective: Become familiar with R basics, load datasets, and understand how GWAS datasets are structured.
  • Topics Covered:
    • R basics and syntax
    • Loading and exploring phenotype and genotype datasets
  • Files Used:
    • phenotype_data.csv
    • genotype_data.csv
    • day1_intro.R

Script Highlights:
- Load and explore phenotype data.
- Load and explore genotype data.
- Understand data structure and contents.


Day 2: Data Analysis and Visualization

  • Objective: Learn data visualization techniques with R and create a Manhattan Plot.
  • Topics Covered:
    • Data exploration and preprocessing
    • Visualization using ggplot2
  • Files Used:
    • manhattan_data.csv
    • day2_visualization.R

Script Highlights:
- Create a Manhattan Plot to visualize SNP associations.
- Interpret GWAS visualization outputs.


Day 3: Performing GWAS with GAPIT

  • Objective: Conduct GWAS using the GAPIT (Genome Association and Prediction Integrated Tool) package.
  • Topics Covered:
    • Introduction to GAPIT
    • Running GWAS analysis
    • Interpreting GWAS outputs
  • Files Used:
    • phenotype_data.csv
    • genotype_data.csv
    • day3_gwas_pipeline.R

Script Highlights:
- Run a GWAS pipeline using GAPIT.
- Generate and interpret GWAS summary statistics.


🛠️ Installation and Setup

Prerequisites

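Ensure you have the following installed:

- R (≥ 4.0)
- RStudio (recommended)
- Required R packages:

```r
install.packages(c("ggplot2", "devtools"))

# GAPIT is not on CRAN, so install.packages("GAPIT") will not find it.
# The GAPIT project documents installation from GitHub:
devtools::install_github("jiabowang/GAPIT", force = TRUE)
```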

Automated Sample Scripts and Datasets for Rice GWAS Training

This repository contains automated sample scripts and dataset recommendations for a 3-day training program on rice Genome-Wide Association Studies (GWAS) using R.

📅 Day 1: Introduction to R and GWAS Concepts

Session 1: Getting Started with R

Script 1: R Basics & Data Import

```r
# Basic R operations
x <- 5
y <- 10
total <- x + y  # named `total` to avoid masking base::sum()
print(total)

# Install necessary packages (only needed once)
install.packages(c("readr", "dplyr", "ggplot2"))

# Load dataset (example dataset from Rice SNP-Seek Database)
library(readr)
phenotype_data <- read_csv("phenotype_data.csv")
head(phenotype_data)
```

Dataset: Example phenotype data (e.g., plant height, grain yield) in CSV format.
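If the example file is not at hand, a small simulated table can stand in for it. The sketch below is only a placeholder: the column names (`Taxa`, `Group`, `Height`, `Yield`, `Trait1`) are assumptions borrowed from the later scripts in this course, the file name `toy_phenotype_data.csv` is hypothetical, and the values are random numbers rather than rice measurements.

```r
# Simulate a toy phenotype table (hypothetical values, not real rice data)
set.seed(42)
n <- 50
toy_phenotype <- data.frame(
  Taxa   = paste0("Line", seq_len(n)),
  Group  = sample(c("indica", "japonica"), n, replace = TRUE),
  Height = round(rnorm(n, mean = 100, sd = 10), 1),
  Yield  = round(rnorm(n, mean = 5, sd = 1), 2),
  Trait1 = round(rnorm(n, mean = 20, sd = 3), 1)
)

# Introduce two missing values so later cleaning steps have something to do
toy_phenotype$Trait1[c(3, 17)] <- NA

# Write to a temporary CSV and read it back, mirroring the import step
toy_path <- file.path(tempdir(), "toy_phenotype_data.csv")
write.csv(toy_phenotype, toy_path, row.names = FALSE)
toy_loaded <- read.csv(toy_path)

str(toy_loaded)
head(toy_loaded)
```

Base R's `read.csv()` is used here so the sketch runs without extra packages; `readr::read_csv()` reads the same file.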


Session 2: Introduction to GWAS

Script 2: Understanding Phenotypic and Genotypic Data

```r
# Explore phenotype data
summary(phenotype_data)
str(phenotype_data)

# Sample genotypic data (example SNP data)
genotype_data <- read_csv("genotype_data.csv")
head(genotype_data)
```

Datasets:
- phenotype_data.csv: Sample phenotype data (e.g., Plant Height, Yield).
- genotype_data.csv: Example SNP dataset (e.g., marker names, genotypes).
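Genotype files come in many layouts, so it helps to look at a tiny example before opening a real one. The sketch below assumes one row per rice line and one column per SNP, with calls coded as 0, 1, or 2 copies of the alternate allele. That coding, the `Taxa` column, and all values are assumptions made for illustration; the course's genotype_data.csv may be organised differently.

```r
# Build a toy genotype table (hypothetical layout: lines in rows, SNPs in columns)
set.seed(1)
n_lines <- 20
n_snps  <- 5
calls <- matrix(
  sample(c(0, 1, 2, NA), n_lines * n_snps, replace = TRUE,
         prob = c(0.45, 0.10, 0.40, 0.05)),
  nrow = n_lines,
  dimnames = list(NULL, paste0("SNP", seq_len(n_snps)))
)
toy_genotype <- data.frame(Taxa = paste0("Line", seq_len(n_lines)), calls)

# Structure and contents
dim(toy_genotype)                          # number of lines x (1 + number of SNPs)
table(toy_genotype$SNP1, useNA = "ifany")  # genotype counts for one marker
colMeans(is.na(toy_genotype[, -1]))        # missing-call rate per SNP
```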


Session 3: Exploring Rice GWAS Data

Script 3: Data Cleaning and Preprocessing

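```r
# Clean phenotype data: drop rows with a missing trait value, store Group as a factor
library(dplyr)
phenotype_data <- phenotype_data %>%
  filter(!is.na(Trait1)) %>%
  mutate(Group = as.factor(Group))

# Clean genotype data: drop rows with a missing call at SNP1
genotype_data <- genotype_data %>%
  filter(!is.na(SNP1))
```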
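Beyond dropping rows with missing values, genotype cleaning often filters whole markers by missing-call rate and minor allele frequency (MAF). The following base R sketch shows one way this could look, assuming a numeric 0/1/2 matrix with SNPs in columns. The thresholds (20% missing, MAF 0.05) and the simulated data are illustrative assumptions, not values prescribed by this course.

```r
# Simulate 200 lines x 4 SNPs coded 0/1/2 (hypothetical data)
set.seed(7)
geno <- matrix(sample(c(0, 1, 2), 200 * 4, replace = TRUE), nrow = 200)
colnames(geno) <- paste0("SNP", 1:4)
geno[sample(200, 60), "SNP2"] <- NA  # SNP2: 30% missing calls
geno[, "SNP3"] <- 0                  # SNP3: monomorphic, carries no information

# Per-SNP missing-call rate
missing_rate <- colMeans(is.na(geno))

# Alternate allele frequency (each line carries two alleles), then MAF
alt_freq <- colMeans(geno, na.rm = TRUE) / 2
maf <- pmin(alt_freq, 1 - alt_freq)

# Keep SNPs that pass both (assumed) thresholds
keep <- missing_rate <= 0.20 & maf >= 0.05
geno_filtered <- geno[, keep, drop = FALSE]

data.frame(SNP = colnames(geno), missing_rate, maf, keep)
```

In this toy example SNP2 fails the missingness filter and SNP3 fails the MAF filter, so only SNP1 and SNP4 remain.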


📅 Day 2: Data Analysis and Visualization in R

Session 1: Data Manipulation in R

Script 4: Data Wrangling

```r
# Summarize phenotype data: mean height per group
library(dplyr)
summary_stats <- phenotype_data %>%
  group_by(Group) %>%
  summarise(mean_height = mean(Height, na.rm = TRUE))

print(summary_stats)
```


Session 2: Phenotypic Data Analysis

Script 5: Descriptive Statistics and Correlation

```r
# Basic statistics
summary(phenotype_data$Height)

# Correlation between height and yield, using complete observations only
cor_matrix <- cor(phenotype_data[, c("Height", "Yield")], use = "complete.obs")
print(cor_matrix)
```


Session 3: Data Visualization

Script 6: Manhattan and QQ Plots

```r
library(ggplot2)

# Sample Manhattan plot data (simulated)
manhattan_data <- data.frame(
  SNP = paste0("SNP", 1:1000),
  Chromosome = sample(1:12, 1000, replace = TRUE),
  Position = runif(1000, 1, 1e6),
  Pvalue = runif(1000, 0, 0.05)
)

# Manhattan plot
ggplot(manhattan_data, aes(x = Position, y = -log10(Pvalue), color = as.factor(Chromosome))) +
  geom_point() +
  theme_minimal() +
  labs(title = "Manhattan Plot", x = "Genomic Position", y = "-log10(P-value)")
```
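Plotting the raw within-chromosome position puts all twelve chromosomes on the same x range, so their points overlap. Manhattan plots conventionally lay chromosomes end to end instead. The base R sketch below shows one way to compute such a cumulative position. The data frame `toy` and the column `CumPos` are hypothetical names, and chromosome length is approximated by the largest observed position, which is an assumption made only for this illustration.

```r
# Simulated SNP table (hypothetical data)
set.seed(123)
toy <- data.frame(
  Chromosome = sample(1:12, 1000, replace = TRUE),
  Position   = runif(1000, 1, 1e6),
  Pvalue     = runif(1000)
)

# Approximate each chromosome's length by its largest observed position
chr_len <- tapply(toy$Position, toy$Chromosome, max)

# Offset of a chromosome = summed length of all chromosomes before it
chr_offset <- c(0, head(cumsum(as.numeric(chr_len)), -1))
names(chr_offset) <- names(chr_len)

# Shift every SNP by the offset of its chromosome
toy$CumPos <- unname(toy$Position + chr_offset[as.character(toy$Chromosome)])

# Alternate two colours so neighbouring chromosomes are easy to tell apart
plot(toy$CumPos, -log10(toy$Pvalue),
     col = c("blue4", "orange3")[toy$Chromosome %% 2 + 1],
     pch = 20,
     xlab = "Cumulative genomic position", ylab = "-log10(P-value)")
```

The same `CumPos` column can be passed to `ggplot()` as the x aesthetic in place of `Position`.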

Dataset: manhattan_data.csv (or generated dynamically in the script).
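This session's title also mentions QQ plots. A QQ plot compares the observed p-values with those expected if no SNP were associated with the trait. The base R sketch below is one minimal way to draw it and to compute the genomic inflation factor (lambda), using simulated null p-values. All variable names and data are assumptions for illustration.

```r
# Simulated p-values under the null hypothesis (hypothetical data)
set.seed(2024)
p_values <- runif(1000)
n <- length(p_values)

# Observed vs expected -log10(p), both ordered from most to least significant
observed <- -log10(sort(p_values))
expected <- -log10(ppoints(n))

# Genomic inflation factor: median observed chi-square statistic (1 df)
# divided by the median expected under the null
chisq  <- qchisq(p_values, df = 1, lower.tail = FALSE)
lambda <- median(chisq) / qchisq(0.5, df = 1)
print(lambda)

plot(expected, observed, pch = 20,
     xlab = "Expected -log10(P-value)", ylab = "Observed -log10(P-value)",
     main = "QQ Plot")
abline(0, 1, col = "red")  # points on this line match the null expectation
```

A lambda close to 1 and points that follow the diagonal suggest well-calibrated p-values; a systematic departure along the whole line often points to population structure or another confounder.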


📅 Day 3: Performing GWAS in R

Session 1: GWAS Analysis Using R Packages

Script 7: GWAS with GAPIT

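```r
# Install GAPIT
# GAPIT is not distributed through CRAN or Bioconductor; the GAPIT project
# documents installation from GitHub via devtools.
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("jiabowang/GAPIT", force = TRUE)

library(GAPIT)

# Example GWAS run
gwas_result <- GAPIT(
  Y = phenotype_data,  # Phenotype data (first column: taxa names)
  G = genotype_data,   # Genotype data
  PCA.total = 3        # Number of principal components
)
```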
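An association analysis only makes sense for lines that have both a phenotype and a genotype, so it is worth checking that the two tables refer to the same lines before running the model. The base R sketch below aligns two toy tables on a shared identifier column. The column name `Taxa` and all values are assumptions for illustration; this is not a GAPIT function, and GAPIT's own input handling may differ.

```r
# Toy tables with partially overlapping lines (hypothetical data)
pheno <- data.frame(
  Taxa   = c("Line1", "Line2", "Line3", "Line5"),
  Height = c(98.2, 105.1, 110.4, 91.7)
)
geno <- data.frame(
  Taxa = c("Line3", "Line1", "Line4", "Line2"),
  SNP1 = c(0, 2, 2, 0),
  SNP2 = c(2, 2, 0, 1)
)

# Lines present in both tables, in phenotype order
shared <- intersect(pheno$Taxa, geno$Taxa)

# Reorder both tables so row i describes the same line in each
pheno_aligned <- pheno[match(shared, pheno$Taxa), ]
geno_aligned  <- geno[match(shared, geno$Taxa), ]

# Report lines that will be left out of the analysis
setdiff(pheno$Taxa, geno$Taxa)  # phenotyped but not genotyped
setdiff(geno$Taxa, pheno$Taxa)  # genotyped but not phenotyped

stopifnot(identical(pheno_aligned$Taxa, geno_aligned$Taxa))
```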


Session 2: Interpreting GWAS Results

Script 8: SNP Annotation and Result Interpretation

```r
library(dplyr)

# Filter significant SNPs
# (0.05 is a nominal threshold with no multiple-testing correction)
significant_snps <- gwas_result$GWAS %>%
  filter(P.value < 0.05)

# Display top SNPs
head(significant_snps)
```
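A GWAS tests thousands of SNPs at once, so an uncorrected cutoff of 0.05 is expected to flag about 5% of markers even when none is truly associated. The base R sketch below contrasts that nominal cutoff with a Bonferroni threshold and a Benjamini-Hochberg false discovery rate, using simulated p-values with a few planted signals. The data and variable names are assumptions for illustration.

```r
# Simulated p-values: mostly null, with five planted small values (hypothetical data)
set.seed(99)
n_snps <- 10000
p_values <- runif(n_snps)
p_values[1:5] <- c(1e-9, 3e-8, 2e-7, 4e-6, 1e-4)

# Nominal cutoff: many hits arise by chance alone
nominal_hits <- sum(p_values < 0.05)

# Bonferroni: divide the significance level by the number of tests
bonferroni_threshold <- 0.05 / n_snps
bonferroni_hits <- which(p_values < bonferroni_threshold)

# Benjamini-Hochberg: controls the false discovery rate, less conservative
fdr <- p.adjust(p_values, method = "BH")
fdr_hits <- which(fdr < 0.05)

nominal_hits
bonferroni_threshold
bonferroni_hits
fdr_hits
```

The same `p.adjust()` call can be applied to the `P.value` column of a results table to add an adjusted p-value column before filtering.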


Session 3: Practical GWAS Workflow

Script 9: Full GWAS Pipeline Automation

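```r
# Load necessary libraries
library(GAPIT)
library(dplyr)
library(qqman)

# Run GWAS
gwas_result <- GAPIT(
  Y = phenotype_data,
  G = genotype_data,
  PCA.total = 3
)

# Visualize results
# qqman expects columns named CHR, BP, P and SNP by default. GAPIT's column
# names differ between versions, so check names(gwas) and adjust the mapping
# below; the names used here are an assumption.
gwas <- gwas_result$GWAS
manhattan(gwas, chr = "Chr", bp = "Pos", p = "P.value", snp = "SNP",
          col = c("blue4", "orange3"))
qq(gwas$P.value)
```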

Datasets:
- phenotype_data.csv
- genotype_data.csv


📂 Dataset Files

  • phenotype_data.csv: Example phenotype data.
  • genotype_data.csv: Example genotype SNP data.
  • manhattan_data.csv: Example GWAS plot data.

🛠️ Dependencies

  • R version >= 4.0
  • R Packages: readr, dplyr, ggplot2, GAPIT, qqman

🤝 Contributing

Contributions are welcome! Please open an issue or pull request.

📜 License

This project is licensed under the GPL-3.0 License.

Happy analyzing! 🌾📊✨

Owner

  • Name: Asad Prodhan
  • Login: asadprodhan
  • Kind: user
  • Location: Perth, Australia
  • Company: Department of Primary Industries and Regional Development

Laboratory Scientist at DPIRD. My work involves Oxford Nanopore Sequencing and Bioinformatics for pest and pathogen diagnosis.

GitHub Events

Total
  • Push event: 2
  • Create event: 2
Last Year
  • Push event: 2
  • Create event: 2