Introduction

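This document guides students through an activity that applies the Wilcoxon Rank-Sum Test to differential gene expression analysis. The test is a non-parametric method for comparing expression levels between two independent groups, such as control and treated samples. Because it works on the ranks of the values rather than the values themselves, it does not assume normally distributed data.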


Activity: Manual Calculation of the Wilcoxon Rank-Sum Test

Simulated Gene Expression Data

We will use a small simulated dataset with expression levels for one gene, measured in four control and four treated samples:

# Simulated data
gene1 <- c(5.2, 6.3, 4.8, 5.6, 7.0, 10.2, 11.1, 8.3)  # Expression levels
group <- rep(c("Control", "Treated"), each = 4)
data <- data.frame(Condition = group, Expression = gene1)
data
##   Condition Expression
## 1   Control        5.2
## 2   Control        6.3
## 3   Control        4.8
## 4   Control        5.6
## 5   Treated        7.0
## 6   Treated       10.2
## 7   Treated       11.1
## 8   Treated        8.3

Steps

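  1. Combine the data from both groups into a single set of values.
  2. Rank the expression levels across the combined set, from smallest (rank 1) to largest.
  3. Sum the ranks within each group.
  4. Perform the Wilcoxon Rank-Sum Test and compare its result with the rank sums.

# Perform the Wilcoxon Rank-Sum Test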
wilcox_test <- wilcox.test(Expression ~ Condition, data = data)
wilcox_test
## 
##  Wilcoxon rank sum exact test
## 
## data:  Expression by Condition
## W = 0, p-value = 0.02857
## alternative hypothesis: true location shift is not equal to 0
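
Steps 1 to 3 can also be worked through by hand in R before relying on wilcox.test(). The following is a minimal sketch that recreates the vectors from the simulated data. The names ranks, rank_sums, n_control, n_treated, W and p_exact are illustrative and not part of the original activity. It assumes there are no tied values, and the p-value line uses only the lower tail, which is appropriate here because W falls below its midpoint of n_control * n_treated / 2 = 8.

```r
# Recreate the simulated data so this block runs on its own
gene1 <- c(5.2, 6.3, 4.8, 5.6, 7.0, 10.2, 11.1, 8.3)
group <- rep(c("Control", "Treated"), each = 4)

# Steps 1 and 2: rank all eight values together (rank 1 = smallest)
ranks <- rank(gene1)

# Step 3: sum the ranks within each group
rank_sums <- tapply(ranks, group, sum)
rank_sums  # Control = 10, Treated = 26

# W as reported by wilcox.test(): the rank sum of the first group (Control)
# minus the smallest rank sum that group could possibly have
n_control <- sum(group == "Control")
n_treated <- sum(group == "Treated")
W <- rank_sums[["Control"]] - n_control * (n_control + 1) / 2
W  # 0

# Exact two-sided p-value (assumes no ties and W below its midpoint)
p_exact <- min(1, 2 * pwilcox(W, n_control, n_treated))
round(p_exact, 5)  # 0.02857
```

W = 0 means every control value ranks below every treated value. Only 1 of the choose(8, 4) = 70 possible ways to assign four ranks to the control group gives that result, and doubling 1/70 for the two-sided test gives the reported p-value of 0.02857.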

Reflection Questions

  1. What does the p-value tell us about the difference in expression between the two conditions?
  2. If the same test were run on many genes, how would the Benjamini-Hochberg correction help with multiple testing?
  3. What biological insights could you derive from the genes that remain significant?
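
For the second question, it may help to see the correction applied. The sketch below uses base R's p.adjust() with method = "BH". Only the first p-value comes from the test in this activity. The other four, along with the names geneA to geneD, are hypothetical placeholders standing in for other genes and are not results from this dataset.

```r
# p-value for gene1 from the activity, plus HYPOTHETICAL p-values
# for four other genes (illustration only, not real results)
p_values <- c(gene1 = 0.02857, geneA = 0.001, geneB = 0.04,
              geneC = 0.20, geneD = 0.75)

# Benjamini-Hochberg adjustment, which controls the false discovery rate
p_adjusted <- p.adjust(p_values, method = "BH")

# Compare raw and adjusted p-values side by side
data.frame(raw = p_values, BH = round(p_adjusted, 4))

# Genes still below 0.05 after adjustment
names(p_adjusted)[p_adjusted < 0.05]
```

Adjusted p-values are never smaller than the raw ones, so a gene that looks significant when tested alone may no longer pass the threshold once many genes are tested together.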