https://github.com/asadprodhan/gwas_training
Beginner training course for genome-wide association mapping using R and publicly available datasets
Repository
Basic Info
- Host: GitHub
- Owner: asadprodhan
- License: gpl-3.0
- Default Branch: main
- Size: 16.6 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
README.md
3-Day Beginner Training Course: R for Rice Genome-Wide Association Mapping
📚 Overview
This repository contains materials for a 3-day beginner-level training course on performing Genome-Wide Association Studies (GWAS) in rice using R. The course covers essential data preprocessing, visualization, and analysis workflows, utilizing publicly available datasets.
📅 Course Outline
Day 1: Introduction to R and GWAS Datasets
- Objective: Become familiar with R basics, load datasets, and understand how GWAS datasets are structured.
- Topics Covered:
  - R basics and syntax
  - Loading and exploring phenotype and genotype datasets
- Files Used:
  - phenotype_data.csv
  - genotype_data.csv
  - day1_intro.R
Script Highlights:
- Load and explore phenotype data.
- Load and explore genotype data.
- Understand data structure and contents.
Day 2: Data Analysis and Visualization
- Objective: Learn data visualization techniques with R and create a Manhattan Plot.
- Topics Covered:
  - Data exploration and preprocessing
  - Visualization using ggplot2
- Files Used:
  - manhattan_data.csv
  - day2_visualization.R
Script Highlights:
- Create a Manhattan Plot to visualize SNP associations.
- Interpret GWAS visualization outputs.
Day 3: Performing GWAS with GAPIT
- Objective: Conduct GWAS using the GAPIT (Genome Association and Prediction Integrated Tool) package.
- Topics Covered:
  - Introduction to GAPIT
  - Running GWAS analysis
  - Interpreting GWAS outputs
- Files Used:
  - phenotype_data.csv
  - genotype_data.csv
  - day3_gwas_pipeline.R
Script Highlights:
- Run a GWAS pipeline using GAPIT.
- Generate and interpret GWAS summary statistics.
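As a conceptual illustration of what a GWAS summary table contains, the sketch below tests each marker separately with an ordinary linear regression of the trait on allele dosage. This is a simplification and not the model GAPIT fits. The simulated genotypes, sample size, effect size, and the names `geno`, `trait`, and `snp_pvalue` are all assumptions made for the example.

```r
# Conceptual single-marker scan on simulated data (not the GAPIT model)
set.seed(42)

n_samples <- 200   # hypothetical number of rice accessions
n_snps    <- 50    # hypothetical number of markers

# Genotypes coded as allele dosage 0/1/2; rows are samples, columns are SNPs
geno <- matrix(
  sample(0:2, n_samples * n_snps, replace = TRUE),
  nrow = n_samples,
  dimnames = list(NULL, paste0("SNP", seq_len(n_snps)))
)

# Simulated trait: SNP10 has a true additive effect, the rest is noise
trait <- 100 + 3 * geno[, "SNP10"] + rnorm(n_samples, sd = 2)

# Regress the trait on one SNP and return the p-value of the slope
snp_pvalue <- function(dosage, y) {
  fit <- lm(y ~ dosage)
  summary(fit)$coefficients["dosage", "Pr(>|t|)"]
}

p_values <- apply(geno, 2, snp_pvalue, y = trait)

# Summary table sorted by significance
scan <- data.frame(SNP = names(p_values), P.value = unname(p_values))
scan <- scan[order(scan$P.value), ]
head(scan, 3)
```

The marker with the simulated effect should appear at the top of `scan`. Real analyses add covariates and a kinship matrix to account for population structure, which is what tools such as GAPIT handle.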
🛠️ Installation and Setup
Prerequisites
Ensure you have the following installed:
- R (≥4.0)
- RStudio (Recommended)
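- Required R Packages:
```r
# ggplot2 is on CRAN; GAPIT is distributed via GitHub rather than CRAN
install.packages(c("ggplot2", "devtools"))
devtools::install_github("jiabowang/GAPIT", force = TRUE)
```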
Automated Sample Scripts and Datasets for Rice GWAS Training
This repository contains automated sample scripts and dataset recommendations for a 3-day training program on rice Genome-Wide Association Studies (GWAS) using R.
📅 Day 1: Introduction to R and GWAS Concepts
Session 1: Getting Started with R
Script 1: R Basics & Data Import
```r
# Basic R operations
x <- 5
y <- 10
total <- x + y   # avoid naming a variable `sum`, which masks base::sum
print(total)

# Install necessary packages
install.packages(c("readr", "dplyr", "ggplot2"))

# Load dataset (example dataset from Rice SNP-Seek Database)
library(readr)
phenotype_data <- read_csv("phenotype_data.csv")
head(phenotype_data)
```
Dataset: Example phenotype data (e.g., plant height, grain yield) in CSV format.
Session 2: Introduction to GWAS
Script 2: Understanding Phenotypic and Genotypic Data
```r
# Explore phenotype data
summary(phenotype_data)
str(phenotype_data)

# Sample genotypic data (example SNP data)
library(readr)
genotype_data <- read_csv("genotype_data.csv")
head(genotype_data)
```
Datasets:
- phenotype_data.csv: Sample phenotype data (e.g., Plant Height, Yield).
- genotype_data.csv: Example SNP dataset (e.g., marker names, genotypes).
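Before any analysis, the phenotype and genotype tables need to describe the same samples in the same order. The sketch below shows one way to align them in base R. The `Taxa` ID column, the sample names, and all values are hypothetical, since the layout of the course files is not described here.

```r
# Toy data; the `Taxa` ID column and all values are hypothetical
pheno_toy <- data.frame(
  Taxa   = c("ACC001", "ACC002", "ACC003", "ACC004"),
  Height = c(98.5, 110.2, NA, 105.0)
)
geno_toy <- data.frame(
  Taxa = c("ACC004", "ACC002", "ACC005", "ACC001"),
  SNP1 = c(0, 2, 1, 1),
  SNP2 = c(2, 2, 0, 1)
)

# Samples present in both tables
shared <- intersect(pheno_toy$Taxa, geno_toy$Taxa)

# Subset both tables and put them in the same sample order
pheno_matched <- pheno_toy[match(shared, pheno_toy$Taxa), ]
geno_matched  <- geno_toy[match(shared, geno_toy$Taxa), ]

# Sanity check: rows now refer to the same samples
stopifnot(identical(pheno_matched$Taxa, geno_matched$Taxa))

# Samples dropped from each side, worth reporting before analysis
setdiff(pheno_toy$Taxa, shared)  # phenotyped but not genotyped
setdiff(geno_toy$Taxa, shared)   # genotyped but not phenotyped
```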
Session 3: Exploring Rice GWAS Data
Script 3: Data Cleaning and Preprocessing
```r
# Clean phenotype data: drop samples without a trait value, make Group a factor
library(dplyr)
phenotype_data <- phenotype_data %>%
  filter(!is.na(Trait1)) %>%
  mutate(Group = as.factor(Group))

# Clean genotype data: drop rows with a missing call at SNP1
genotype_data <- genotype_data %>%
  filter(!is.na(SNP1))
```
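Filtering on a single column handles one marker at a time. With many markers, a common alternative is to filter the markers themselves by missing rate and minor allele frequency (MAF). The sketch below is a minimal illustration on a simulated dosage matrix. The 0/1/2 coding, the matrix layout (samples in rows), and both thresholds are assumptions rather than settings from the course scripts.

```r
set.seed(1)

# Hypothetical dosage matrix: 100 samples x 20 SNPs, coded 0/1/2
geno <- matrix(sample(0:2, 100 * 20, replace = TRUE), nrow = 100,
               dimnames = list(NULL, paste0("SNP", 1:20)))
geno[sample(length(geno), 100)] <- NA   # sprinkle some missing calls
geno[, "SNP5"] <- 0                     # a monomorphic marker
geno[1:40, "SNP7"] <- NA                # a poorly genotyped marker

# Per-SNP missing rate
missing_rate <- colMeans(is.na(geno))

# Mean dosage / 2 is the frequency of the counted allele;
# the MAF is the smaller of that frequency and its complement
allele_freq <- colMeans(geno, na.rm = TRUE) / 2
maf <- pmin(allele_freq, 1 - allele_freq)

# Thresholds assumed for illustration only
max_missing <- 0.20
min_maf     <- 0.05

keep <- missing_rate <= max_missing & maf >= min_maf
geno_qc <- geno[, keep, drop = FALSE]

# Markers removed by the filter
colnames(geno)[!keep]
```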
📅 Day 2: Data Analysis and Visualization in R
Session 1: Data Manipulation in R
Script 4: Data Wrangling
```r
# Summarize phenotype data: mean height per group
library(dplyr)
summary_stats <- phenotype_data %>%
  group_by(Group) %>%
  summarise(mean_height = mean(Height, na.rm = TRUE))

print(summary_stats)
```
Session 2: Phenotypic Data Analysis
Script 5: Descriptive Statistics and Correlation
```r
# Basic statistics
summary(phenotype_data$Height)

# Correlation between traits, using only samples with both values present
cor_matrix <- cor(phenotype_data[, c("Height", "Yield")], use = "complete.obs")
print(cor_matrix)
```
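A correlation coefficient on its own does not say whether the relationship could be due to chance. As a small sketch, `cor.test()` from base R reports the estimate together with a confidence interval and a p-value. The simulated table `pheno_toy` and its values are hypothetical stand-ins for the course phenotype file.

```r
set.seed(7)

# Hypothetical phenotype table standing in for phenotype_data
pheno_toy <- data.frame(Height = rnorm(60, mean = 110, sd = 10))
pheno_toy$Yield <- 2 + 0.05 * pheno_toy$Height + rnorm(60, sd = 0.5)
pheno_toy$Yield[c(3, 17)] <- NA   # a few missing records

# Pearson correlation with a significance test; incomplete pairs are dropped
ct <- cor.test(pheno_toy$Height, pheno_toy$Yield, method = "pearson")

ct$estimate   # correlation coefficient
ct$conf.int   # 95% confidence interval
ct$p.value    # test of the null hypothesis of zero correlation
```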
Session 3: Data Visualization
Script 6: Manhattan and QQ Plots
```r
library(ggplot2)

# Sample Manhattan plot data (simulated)
manhattan_data <- data.frame(
  SNP = paste0("SNP", 1:1000),
  Chromosome = sample(1:12, 1000, replace = TRUE),
  Position = runif(1000, 1, 1e6),
  Pvalue = runif(1000, 0, 0.05)
)

# Manhattan plot: -log10(p) against position, colored by chromosome
ggplot(manhattan_data, aes(x = Position, y = -log10(Pvalue), color = as.factor(Chromosome))) +
  geom_point() +
  theme_minimal() +
  labs(title = "Manhattan Plot", x = "Genomic Position", y = "-log10(P-value)")
```
Dataset: manhattan_data.csv (or generated dynamically in the script).
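Plotting raw positions puts every chromosome on the same x range, so the points overlap. A conventional Manhattan plot lays the chromosomes side by side using cumulative positions. The sketch below shows one way to compute them in base R, using simulated data with the same columns as the sample Manhattan data. Approximating chromosome length by the largest observed position and the names `gwas_toy` and `CumPos` are assumptions made for illustration.

```r
set.seed(123)

# Simulated GWAS results with the same columns as the sample Manhattan data
gwas_toy <- data.frame(
  SNP        = paste0("SNP", 1:1000),
  Chromosome = sample(1:12, 1000, replace = TRUE),
  Position   = runif(1000, 1, 1e6),
  Pvalue     = runif(1000)
)

# Chromosome lengths approximated by the largest observed position
chr_len <- tapply(gwas_toy$Position, gwas_toy$Chromosome, max)

# Offset of each chromosome = summed length of all preceding chromosomes
chr_offset <- c(0, cumsum(chr_len)[-length(chr_len)])
names(chr_offset) <- names(chr_len)

# Genome-wide x coordinate
gwas_toy$CumPos <- gwas_toy$Position +
  unname(chr_offset[as.character(gwas_toy$Chromosome)])

# Axis tick at the midpoint of each chromosome
chr_mid <- tapply(gwas_toy$CumPos, gwas_toy$Chromosome,
                  function(x) mean(range(x)))

# Plot only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(gwas_toy, aes(x = CumPos, y = -log10(Pvalue),
                            color = as.factor(Chromosome %% 2))) +
    geom_point(size = 0.8, show.legend = FALSE) +
    scale_x_continuous(breaks = as.numeric(chr_mid), labels = names(chr_mid)) +
    theme_minimal() +
    labs(x = "Chromosome", y = "-log10(P-value)")
  print(p)
}
```

Alternating two colors by chromosome parity keeps neighboring chromosomes visually distinct without a twelve-entry legend.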
📅 Day 3: Performing GWAS in R
Session 1: GWAS Analysis Using R Packages
Script 7: GWAS with GAPIT
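```r
# Install GAPIT (distributed via GitHub, not CRAN or Bioconductor)
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")
devtools::install_github("jiabowang/GAPIT", force = TRUE)

library(GAPIT)

# Example GWAS run
# GAPIT expects Y as a data frame with sample (taxa) names in the first column
# and G in HapMap format read without a header; see the GAPIT documentation
gwas_result <- GAPIT(
  Y = phenotype_data,  # Phenotype data
  G = genotype_data,   # Genotype data
  PCA.total = 3        # Number of principal components
)
```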
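The `PCA.total = 3` argument asks for the first three principal components of the genotypes to be included as covariates, which helps account for population structure. To show what such components look like, the sketch below runs a PCA with base R's `prcomp()` on a simulated dosage matrix. It is not GAPIT's internal code, and the two simulated subpopulations, their allele frequencies, and the helper `sim_group` are hypothetical.

```r
set.seed(2024)
n_per_group <- 50
n_snps <- 200

# Two hypothetical subpopulations with different allele frequencies
freq_a <- runif(n_snps, 0.1, 0.9)
freq_b <- runif(n_snps, 0.1, 0.9)

# One column per SNP, dosage drawn as Binomial(2, allele frequency)
sim_group <- function(n, freq) {
  sapply(freq, function(f) rbinom(n, size = 2, prob = f))
}

geno  <- rbind(sim_group(n_per_group, freq_a), sim_group(n_per_group, freq_b))
group <- rep(c("A", "B"), each = n_per_group)

# Drop monomorphic markers, then center and scale before PCA
geno <- geno[, apply(geno, 2, var) > 0, drop = FALSE]
pca  <- prcomp(geno, center = TRUE, scale. = TRUE)

# First three PCs, the kind of covariates requested by PCA.total = 3
pcs <- pca$x[, 1:3]

# Proportion of variance explained by each of the three PCs
var_explained <- (pca$sdev^2 / sum(pca$sdev^2))[1:3]
round(var_explained, 3)

# PC1 should separate the two simulated groups
pc1_means <- tapply(pcs[, "PC1"], group, mean)
pc1_means
```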
Session 2: Interpreting GWAS Results
Script 8: SNP Annotation and Result Interpretation
```r
# Filter significant SNPs (nominal threshold, not corrected for multiple testing)
library(dplyr)
significant_snps <- gwas_result$GWAS %>%
  filter(P.value < 0.05)

# Display top SNPs
head(significant_snps)
```
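A nominal cutoff of 0.05 lets through many false positives when thousands of markers are tested. The sketch below compares it with a Bonferroni threshold and a Benjamini-Hochberg false discovery rate, both from base R. The simulated p-values and the table `gwas_toy` are hypothetical; only the `P.value` column name follows the script above.

```r
set.seed(99)

# Hypothetical p-values: 995 null markers plus 5 strong signals
p_values <- c(runif(995), c(1e-9, 5e-8, 2e-7, 1e-6, 3e-6))
gwas_toy <- data.frame(
  SNP     = paste0("SNP", seq_along(p_values)),
  P.value = p_values
)

alpha   <- 0.05
n_tests <- nrow(gwas_toy)

# Bonferroni: controls the family-wise error rate
bonferroni_threshold <- alpha / n_tests
gwas_toy$bonferroni <- gwas_toy$P.value < bonferroni_threshold

# Benjamini-Hochberg: controls the false discovery rate
gwas_toy$fdr <- p.adjust(gwas_toy$P.value, method = "BH")

# Number of markers passing each criterion
counts <- c(
  nominal    = sum(gwas_toy$P.value < alpha),
  bonferroni = sum(gwas_toy$bonferroni),
  fdr        = sum(gwas_toy$fdr < alpha)
)
counts
```

With uniform null p-values, roughly 5% of markers pass the nominal cutoff by chance alone, while the corrected criteria retain far fewer.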
Session 3: Practical GWAS Workflow
Script 9: Full GWAS Pipeline Automation
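```r
# Load necessary libraries
library(GAPIT)
library(dplyr)
library(qqman)

# Run GWAS
gwas_result <- GAPIT(
  Y = phenotype_data,
  G = genotype_data,
  PCA.total = 3
)

# Visualize results
# qqman looks for columns named CHR, BP, P and SNP by default, so map them to
# the GAPIT column names (assumed here; check with names(gwas_result$GWAS))
manhattan(gwas_result$GWAS, chr = "Chr", bp = "Pos", p = "P.value", snp = "SNP",
          col = c("blue4", "orange3"))
qq(gwas_result$GWAS$P.value)
```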
Datasets:
- phenotype_data.csv
- genotype_data.csv
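When reading a QQ plot, it helps to know how it is built and how inflation is quantified. The sketch below constructs the expected and observed quantiles by hand in base R and computes the genomic inflation factor. The uniform p-values are a hypothetical stand-in for real GWAS output, and `lambda_gc` is a name chosen for this example.

```r
set.seed(11)

# Hypothetical p-values standing in for real GWAS output
p_values <- runif(5000)

# Observed and expected -log10(p) for a QQ plot
observed <- -log10(sort(p_values))
expected <- -log10(ppoints(length(p_values)))

plot(expected, observed,
     xlab = "Expected -log10(p)", ylab = "Observed -log10(p)",
     pch = 20, main = "QQ plot")
abline(0, 1, col = "red")   # line expected under the null

# Genomic inflation factor: median observed chi-square statistic (1 df)
# divided by the median of the null chi-square distribution
chisq_obs <- qchisq(p_values, df = 1, lower.tail = FALSE)
lambda_gc <- median(chisq_obs) / qchisq(0.5, df = 1)
lambda_gc
```

A value near 1 suggests the test statistics follow the null distribution. Values well above 1 often point to uncorrected population structure or relatedness.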
📂 Dataset Files
- phenotype_data.csv: Example phenotype data.
- genotype_data.csv: Example genotype SNP data.
- manhattan_data.csv: Example GWAS plot data.
🛠️ Dependencies
- R version >= 4.0
- R Packages: readr, dplyr, ggplot2, GAPIT, qqman
📖 References
- GAPIT documentation: https://zzlab.net/GAPIT/
- SNP-Seek Database: https://snp-seek.irri.org/
🤝 Contributing
Contributions are welcome! Please open an issue or pull request.
📜 License
This project is licensed under the MIT License.
Happy analyzing! 🌾📊✨
Owner
- Name: Asad Prodhan
- Login: asadprodhan
- Kind: user
- Location: Perth, Australia
- Company: Department of Primary Industries and Regional Development
- Website: www.linkedin.com/in/asadprodhan
- Twitter: Asad_Prodhan
- Repositories: 2
- Profile: https://github.com/asadprodhan
Laboratory Scientist at DPIRD. My work involves Oxford Nanopore Sequencing and Bioinformatics for pest and pathogen diagnosis.
GitHub Events
Total
- Push event: 2
- Create event: 2
Last Year
- Push event: 2
- Create event: 2