https://github.com/animesh/deseq2-multibatch
Batch Correction and Adaptive Contrast Calling in Complex RNA-seq Experimental Designs Using DESeq2
Science Score: 13.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
○CITATION.cff file
-
○codemeta.json file
-
○.zenodo.json file
-
✓DOI references
Found 2 DOI reference(s) in README -
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (11.7%) to scientific vocabulary
Last synced: 6 months ago
·
JSON representation
Repository
Batch Correction and Adaptive Contrast Calling in Complex RNA-seq Experimental Designs Using DESeq2
Basic Info
- Host: GitHub
- Owner: animesh
- Language: R
- Default Branch: main
- Size: 5.62 MB
Statistics
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
- Releases: 0
Fork of julienroyulaval/DESeq2-MultiBatch
Created 10 months ago
· Last pushed 6 months ago
https://github.com/animesh/DESeq2-MultiBatch/blob/main/
# DESeq2-MultiBatch
Batch Correction for Multi-Factorial RNA-seq Experiments
# Context
Here, you will find the main steps to reproduce the correction method implemented in "DESeq2-MultiBatch: Batch Correction for Multi-Factorial RNA-seq Experiments".
If used in research, please cite:
Roy, J., Monthony, A. S., Torkamaneh, D. (2025). DESeq2-MultiBatch: Batch Correction for Multi-Factorial RNA-seq Experiments. https://doi.org/10.1101/2025.04.20.649392
For detailed steps and comparisons with other batch correction tools, please refer to the HTML file provided in this repository.
It also explains methods of contrast calling to extract meaningful results in DESeq2 using various designs applied to the experiment outlined below, with examples and figures.
Readers will also find various ressources to better understand design matrices and apply DESeq2 to their data at the end of the HTML.
Note : The HTML is being updated and will be reuploaded shortly.
For more information about DESeq2 and the standard RNA-seq analysis steps, refer to:
- [Analyzing RNA-seq data with DESeq2](https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html) (Love & al., 2025)
- [RNA-seq workflow: gene-level exploratory analysis and differential expression](https://bioconductor.org/packages/release/workflows/vignettes/rnaseqGene/inst/doc/rnaseqGene.html) (Love & al., 2019)
## Experimental design
The simulated dataset used in this study consists of 1,000 genes across 48 samples. The dataset is designed to be computationally efficient, with confirmed scalability to larger real-life datasets.
The experimental design involves two batches, modeling gene expression in dioecious plants (e.g., XX/XY system) undergoing sex-specific treatments at two distinct time points: Day 0 (vegetative) and Day 1 (early flowering). Due to sex-specific treatments to alter flower sex, male and female plants were separated across batches:
Batch A: Female controls (n=6) and treated males (n=6)
Batch B: Male controls (n=6) and treated females (n=6)
The treatments were applied following Day 0 measurements but before Day 1 measurements, allowing Day 0 data to serve as a baseline for controlling batch effects. Samples were collected across two genotypes, with three biological replicates per combination of sex, treatment, genotype, and time point.
# Single and double batch correction
## Load data
```
# load metadata
coldata <- read.csv(file = "coldata.csv")
coldata[,2:6] <- lapply(coldata[,2:6], as.factor)
# load counts
counts <- read.csv(file = "counts.csv", row.names = 1)
```
Samples are rows in the ColData, with factors as columns. In the count file, rows are genes with samples as columns.
## Run DESeq2
### Single batch design
```
library(DESeq2)
design_single <- ~ Day + Batch + Genotype + Sex + Sex:Day + Treatment
dds_single <- DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = design_single)
dds_single <- DESeq(dds_single)
```
### Double batch design (interaction with sex)
```
design_double <- ~ Day + Batch + Genotype + Sex + Sex:Batch + Sex:Day + Treatment
dds_double <- DESeqDataSetFromMatrix(countData = counts, colData = coldata, design = design_double)
dds_double <- DESeq(dds_double)
```
## Call for contrasts and extract scaling factors
```
resultsNames(dds_single)
resultsNames(dds_double)
# single
batch <- results(dds_single, name = "Batch_b_vs_a", independentFiltering = FALSE)
batch <- as.data.frame(batch)
batch$Scaling_Factor_Batch_A <- sqrt(2^(batch$log2FoldChange))
batch$Scaling_Factor_Batch_B <- 1 / batch$Scaling_Factor_Batch_A
# double
batch_xx <- results(dds_double, name = "Batch_b_vs_a", independentFiltering = FALSE)
batch_xx <- as.data.frame(batch_xx)
batch_xy <- results(dds_double, contrast=list(c("Batch_b_vs_a","Batchb.Sexxy")), independentFiltering = FALSE)
batch_xy <- as.data.frame(batch_xy)
batch_xx$Scaling_Factor_Batch_A <- sqrt(2^(batch_xx$log2FoldChange))
batch_xx$Scaling_Factor_Batch_B <- 1 / batch_xx$Scaling_Factor_Batch_A
batch_xy$Scaling_Factor_Batch_A <- sqrt(2^(batch_xy$log2FoldChange))
batch_xy$Scaling_Factor_Batch_B <- 1 / batch_xy$Scaling_Factor_Batch_A
```
## Apply corrections
```
# single
normalized_counts_df <- as.data.frame(counts(dds_single, normalized = TRUE))
scaled_counts_single <- normalized_counts_df
for (Sample_ID in colnames(normalized_counts_df)) {
Batch <- coldata[Sample_ID, "Batch"]
if (Batch == "a") {
scaling_factors <- batch$Scaling_Factor_Batch_A
} else if (Batch == "b") {
scaling_factors <- batch$Scaling_Factor_Batch_B
}
scaled_counts_single[, Sample_ID] <- normalized_counts_df[, Sample_ID] * scaling_factors
}
# double
normalized_counts_df <- as.data.frame(counts(dds_double, normalized = TRUE))
scaled_counts_double <- normalized_counts_df
for (Sample_ID in colnames(normalized_counts_df)) {
Sex <- coldata[Sample_ID, "Sex"]
Batch <- coldata[Sample_ID, "Batch"]
if (Sex == "xx" && Batch == "a") {
scaling_factors <- batch_xx$Scaling_Factor_Batch_A
} else if (Sex == "xx" && Batch == "b") {
scaling_factors <- batch_xx$Scaling_Factor_Batch_B
} else if (Sex == "xy" && Batch == "a") {
scaling_factors <- batch_xy$Scaling_Factor_Batch_A
} else if (Sex == "xy" && Batch == "b") {
scaling_factors <- batch_xy$Scaling_Factor_Batch_B
}
scaled_counts_double[, Sample_ID] <- normalized_counts_df[, Sample_ID] * scaling_factors
}
```
Scaled counts can be rounded if needed, but they are not meant to be reused for differential expression analysis with DESeq2 or other tools.
While the contrasts obtained by using DESeq2 on the scaled counts with a design that excludes the *batch* factor will provide the same log2 fold changes as the original DESeq2 analysis, the *p* values and the adjusted *p* values will be skewed, leading to an increase in false positives.
For more detail, please refer to the HTML file.
Owner
- Name: Ani
- Login: animesh
- Kind: user
- Location: Norway
- Company: Norwegian University of Science and Technology
- Website: https://www.fuzzylife.org
- Twitter: animesh1977
- Repositories: 749
- Profile: https://github.com/animesh
A medical graduate from Delhi University with post-graduation in bioinformatics from Jawaharlal Nehru University, India.
GitHub Events
Total
- Push event: 2
Last Year
- Push event: 2