This activity demonstrates the Wilcoxon Rank-Sum Test for identifying differential gene expression (DEG). We’ll analyze expression data for two genes: one differentially expressed gene (DEG) and one non-differentially expressed gene (non-DEG). We will also adjust for multiple testing using the False Discovery Rate (FDR).
We start with a dataset containing expression levels for two genes across two conditions (Control and Treated), with 5 replicates per condition.
# Provided data
group <- rep(c("Control", "Treated"), each = 5) # Conditions
gene_deg <- c(5.2, 5.5, 5.1, 5.3, 5.4, 7.2, 7.1, 7.3, 7.4, 7.5) # DEG
gene_non_deg <- c(5.1, 5.2, 5.3, 5.4, 5.5, 5.1, 5.2, 5.3, 5.4, 5.5) # Non-DEG
# Create a data frame
expression_data <- data.frame(
Group = group,
Gene_DEG = gene_deg,
Gene_Non_DEG = gene_non_deg
)
head(expression_data)
## Group Gene_DEG Gene_Non_DEG
## 1 Control 5.2 5.1
## 2 Control 5.5 5.2
## 3 Control 5.1 5.3
## 4 Control 5.3 5.4
## 5 Control 5.4 5.5
## 6 Treated 7.2 5.1
We perform the Wilcoxon Rank-Sum Test to compare expression levels between the two groups for each gene.
# Wilcoxon test for DEG
test_deg <- wilcox.test(Gene_DEG ~ Group, data = expression_data)
test_deg
##
## Wilcoxon rank sum exact test
##
## data: Gene_DEG by Group
## W = 0, p-value = 0.007937
## alternative hypothesis: true location shift is not equal to 0
# Wilcoxon test for Non-DEG
test_non_deg <- wilcox.test(Gene_Non_DEG ~ Group, data = expression_data)
## Warning in wilcox.test.default(x = DATA[[1L]], y = DATA[[2L]], ...): cannot
## compute exact p-value with ties
test_non_deg
##
## Wilcoxon rank sum test with continuity correction
##
## data: Gene_Non_DEG by Group
## W = 12.5, p-value = 1
## alternative hypothesis: true location shift is not equal to 0
When testing multiple genes, we need to adjust for multiple testing to control the False Discovery Rate (FDR). Here, we demonstrate FDR correction for the two genes.
# Collect p-values
p_values <- c(test_deg$p.value, test_non_deg$p.value)
names(p_values) <- c("Gene_DEG", "Gene_Non_DEG")
# Apply FDR correction
adjusted_p_values <- p.adjust(p_values, method = "BH")
adjusted_p_values
## Gene_DEG Gene_Non_DEG
## 0.01587302 1.00000000
library(ggplot2)
# Boxplot for the DEG
ggplot(expression_data, aes(x = Group, y = Gene_DEG, fill = Group)) +
geom_boxplot() +
labs(
title = "A Differentially Expressed Gene",
x = "Condition",
y = "Expression Level"
) +
theme_classic()
# Boxplot for the Non-DEG
ggplot(expression_data, aes(x = Group, y = Gene_Non_DEG, fill = Group)) +
geom_boxplot() +
labs(
title = "A Non-Differentially Expressed Gene",
x = "Condition",
y = "Expression Level"
) +
theme_classic()