Mann-Whitney U Test: Theory and Application

1. Introduction

The Mann-Whitney U Test (also called the Wilcoxon Rank-Sum Test) is a nonparametric test that compares two independent samples to determine whether one tends to have larger values than the other. It is an alternative to the independent two-sample t-test when:

  • The data are not normally distributed.
  • The sample size is small.
  • The data are ordinal (ranked) rather than continuous.

The test was developed by Henry Mann and Donald Whitney (1947) as an extension of Frank Wilcoxon’s rank-sum test (1945).


2. Motivation

Why Not Use the t-Test?

The independent two-sample t-test assumes:

  1. Normality: The two groups come from normally distributed populations.
  2. Equal Variances: The variance in both groups is the same.
  3. Interval Data: The data are measured on a numerical scale.

However, in many real-world cases:

  • The data are skewed or non-normally distributed.
  • The sample size is too small to check normality.
  • The data are ordinal (e.g., Likert-scale responses from 1 to 5).

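The Mann-Whitney U Test is a distribution-free alternative: it does not assume normality or any other specific distribution. If the two distributions are additionally assumed to have a similar shape, a significant result can be read as a shift in location (for example, a difference in medians).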


3. Hypothesis Formulation

Suppose we have two independent groups:

\[ X_1, X_2, \dots, X_m \quad \text{(Sample 1, size } m\text{)} \]
\[ Y_1, Y_2, \dots, Y_n \quad \text{(Sample 2, size } n\text{)} \]

The test examines whether one group tends to have higher values than the other.

  • Null Hypothesis (\(H_0\)): The two distributions are identical. For continuous data this implies \[ P(X > Y) = P(Y > X) = 0.5 \]
  • Alternative Hypothesis (\(H_A\)): One distribution tends to produce larger values than the other (for example, it is shifted higher), so \[ P(X > Y) \neq 0.5 \]
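The probability in these hypotheses can be estimated directly by comparing every observation in Sample 1 with every observation in Sample 2. Below is a minimal sketch in R, using the meditation data from Section 7 as illustrative input; the variable names are illustrative, and tied pairs (if any) are counted as one half.

```r
# Illustrative data: stress levels from the meditation example in Section 7
x <- c(42, 50, 48, 39, 45)   # Sample 1 (meditation)
y <- c(65, 70, 72, 80, 78)   # Sample 2 (no meditation)

# outer() builds the m x n matrix of all pairwise comparisons X_i vs. Y_j
greater <- outer(x, y, ">")
tied    <- outer(x, y, "==")

# Estimate of P(X > Y), counting each tied pair as 1/2
p_hat <- mean(greater) + 0.5 * mean(tied)
p_hat
```

Here the estimate is 0: no meditation score exceeds any no-meditation score, which is far from the value 0.5 expected under \(H_0\).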

Because the test uses only the ranks of the pooled observations, not their raw values, its result is unchanged by any strictly increasing transformation of the data, such as taking logarithms.


4. Test Statistic

Step 1: Combine and Rank the Data

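  1. Combine all \(m + n\) observations from both samples and assign ranks \(1, 2, \dots, m + n\) from smallest to largest.
  2. If there are ties, give each tied observation the average of the ranks the tied values would otherwise occupy.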
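R's built-in rank() function follows this rule: its default ties.method = "average" gives tied values the mean of the ranks they span. A small illustration with made-up values:

```r
# Made-up pooled observations containing one tie (the value 1 appears twice)
pooled <- c(3, 1, 4, 1, 5)

# Default ties.method = "average": the two 1s share ranks 1 and 2, so each gets 1.5
r <- rank(pooled)
print(r)   # 3.0 1.5 4.0 1.5 5.0
```

Averaging keeps the total of all ranks equal to \(N(N+1)/2\) (here 15), which the rank-sum formulas below rely on.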

Step 2: Compute Rank Sums

Let:

  • \(R_X\) = sum of ranks for Sample 1.
  • \(R_Y\) = sum of ranks for Sample 2.

Step 3: Compute the U Statistic

The Mann-Whitney U statistic is calculated as:

\[ U_X = R_X - \frac{m(m+1)}{2} \]

\[ U_Y = R_Y - \frac{n(n+1)}{2} \]

where \(U_X\) counts the pairs \((X_i, Y_j)\) with \(X_i > Y_j\) and \(U_Y\) counts the pairs with \(Y_j > X_i\), each tied pair contributing 1/2 to both. Since there are \(mn\) pairs in total, \(U_X + U_Y = mn\).

The test statistic is:

\[ U = \min(U_X, U_Y) \]
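Steps 1 to 3 can be sketched in R as follows. mann_whitney_u() is a hypothetical helper written for illustration, not a function from any package; the data are the exam scores from the hands-on exercise later in this document, chosen because they contain ties. The sketch also checks the pair-counting interpretation of \(U_X\) and the identity \(U_X + U_Y = mn\). Note that R's wilcox.test() reports W = \(U_X\), the statistic for the first sample passed to it, rather than \(\min(U_X, U_Y)\).

```r
# Hypothetical helper: Mann-Whitney U statistics from pooled ranks
mann_whitney_u <- function(x, y) {
  m <- length(x)
  n <- length(y)
  r <- rank(c(x, y))                 # Step 1: pooled ranks, ties get the average rank
  R_X <- sum(r[seq_len(m)])          # Step 2: rank sum of Sample 1
  R_Y <- sum(r[m + seq_len(n)])      #         rank sum of Sample 2
  U_X <- R_X - m * (m + 1) / 2       # Step 3: U for each sample
  U_Y <- R_Y - n * (n + 1) / 2
  c(R_X = R_X, R_Y = R_Y, U_X = U_X, U_Y = U_Y, U = min(U_X, U_Y))
}

traditional <- c(78, 75, 80, 72, 77, 83, 79, 76)
modern      <- c(85, 80, 87, 78, 83, 89, 84, 81)

res <- mann_whitney_u(traditional, modern)
print(res)   # R_X = 42.5, R_Y = 93.5, U_X = 6.5, U_Y = 57.5, U = 6.5

# Pair-counting check: U_X = #{X_i > Y_j} + 0.5 * #{X_i = Y_j}
pairs_U_X <- sum(outer(traditional, modern, ">")) +
  0.5 * sum(outer(traditional, modern, "=="))

# Both checks should pass: pair count matches, and U_X + U_Y = m * n = 64
stopifnot(res[["U_X"]] == pairs_U_X,
          res[["U_X"]] + res[["U_Y"]] == length(traditional) * length(modern))
```

The resulting \(U_X = 6.5\) matches the W = 6.5 that wilcox.test(traditional, modern) reports in the exercise.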


5. Distribution Under \(H_0\)

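  • For small samples (as a rule of thumb, \(m, n \leq 20\)), compare \(U\) with critical values from Mann-Whitney tables, or compute the exact null distribution.
  • For large samples (\(m, n > 20\)), \(U\) is approximately normally distributed under \(H_0\), and the standardized statistic is: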

\[ Z = \frac{U - \frac{mn}{2}}{\sqrt{\frac{mn(m+n+1)}{12}}} \]

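where \(Z\) approximately follows the standard normal distribution \(N(0,1)\). When observations are tied, the variance in the denominator is reduced by a tie-correction term, and software often also applies a continuity correction of 1/2 to the numerator.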
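As a check on the formula, the sketch below computes \(Z\) by hand for the simulated reaction-time data used later in Section 7 (15 observations per group, no ties) and compares the two-sided p-value 2 * pnorm(-abs(Z)) with wilcox.test() run with exact = FALSE and correct = FALSE, which asks R for the normal approximation without a continuity correction. The variable names are illustrative.

```r
# Simulated reaction times, generated exactly as in the example in Section 7
set.seed(42)
groupA <- rnorm(15, mean = 5.2, sd = 0.8)
groupB <- rnorm(15, mean = 6.0, sd = 0.9)

m <- length(groupA)
n <- length(groupB)

# U statistic from pooled ranks (continuous data, so no ties)
r   <- rank(c(groupA, groupB))
U_X <- sum(r[seq_len(m)]) - m * (m + 1) / 2
U   <- min(U_X, m * n - U_X)        # uses U_X + U_Y = mn

# Large-sample normal approximation (no tie or continuity correction)
Z <- (U - m * n / 2) / sqrt(m * n * (m + n + 1) / 12)
p_manual <- 2 * pnorm(-abs(Z))

# R's normal approximation with the same settings
p_r <- wilcox.test(groupA, groupB, exact = FALSE, correct = FALSE)$p.value
all.equal(p_manual, p_r)   # TRUE
```

With only 15 observations per group, this approximate p-value differs somewhat from the exact one that wilcox.test() reports by default for these data.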


6. Assumptions of the Mann-Whitney U Test

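  1. The observations are independent, both within and between the two samples.
  2. The response variable is ordinal or continuous.
  3. The distributions of the two populations have a similar shape (required only when a significant result is to be interpreted as a shift in location, such as a difference in medians).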

7. Example: Application in R

Scenario

A psychologist wants to compare the stress levels of two groups of students:

  1. Group A: Students who meditate before exams.
  2. Group B: Students who do not meditate.

The measured stress levels (on a scale of 1 to 100) are:

\[ \begin{array}{|c|c|} \hline \textbf{Meditation Group} & \textbf{No Meditation Group} \\ \hline 42 & 65 \\ 50 & 70 \\ 48 & 72 \\ 39 & 80 \\ 45 & 78 \\ \hline \end{array} \]


R Code for Mann-Whitney U Test

# Sample Data: Stress Levels
meditation <- c(42, 50, 48, 39, 45)
no_meditation <- c(65, 70, 72, 80, 78)

# Perform Mann-Whitney U Test in R
wilcox.test(meditation, no_meditation, alternative = "two.sided")
## 
##  Wilcoxon rank sum exact test
## 
## data:  meditation and no_meditation
## W = 0, p-value = 0.007937
## alternative hypothesis: true location shift is not equal to 0
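With W = 0, no meditation score exceeds any no-meditation score. Because both samples are small and there are no ties, R reports an exact p-value. The sketch below reproduces it in two ways: by enumerating all equally likely assignments of the ranks 1 to 10 to the meditation group under \(H_0\), and with R's pwilcox(), the distribution function of the Wilcoxon rank-sum statistic. Doubling the lower tail is valid here because the null distribution of \(U_X\) is symmetric about \(mn/2\).

```r
meditation    <- c(42, 50, 48, 39, 45)
no_meditation <- c(65, 70, 72, 80, 78)
m <- length(meditation)
n <- length(no_meditation)

# Observed U_X (R's W) from the pooled ranks
U_obs <- sum(rank(c(meditation, no_meditation))[seq_len(m)]) - m * (m + 1) / 2

# Under H0 every choice of m ranks out of 1..(m + n) for Sample 1 is equally likely
rank_sets <- combn(m + n, m)                        # 5 x 252 matrix, one column per choice
U_null    <- colSums(rank_sets) - m * (m + 1) / 2   # null distribution of U_X

p_enum <- 2 * mean(U_null <= U_obs)                 # two-sided: double the lower tail

# Same tail probability from R's built-in distribution function
p_pwilcox <- 2 * pwilcox(U_obs, m, n)

c(enumeration = p_enum, pwilcox = p_pwilcox)        # both 2/252, about 0.007937
```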

1. Example: Simulated Data (Two Independent Groups)

Scenario

A scientist wants to compare reaction times between two groups:

  • Group A (with caffeine)
  • Group B (without caffeine)
# knitr provides kable() for formatted tables
library(knitr)

# Simulated reaction time data (in seconds)
set.seed(42)
groupA <- rnorm(15, mean = 5.2, sd = 0.8)  # Caffeine group
groupB <- rnorm(15, mean = 6.0, sd = 0.9)  # Non-Caffeine group

# Create a dataframe
df_mannwhitney <- data.frame(
  Group = rep(c("Caffeine", "No Caffeine"), each = 15),
  ReactionTime = c(groupA, groupB)
)

# Display first few rows
kable(head(df_mannwhitney), caption = "First Few Rows of Reaction Time Data")
First Few Rows of Reaction Time Data
Group ReactionTime
Caffeine 6.296767
Caffeine 4.748241
Caffeine 5.490503
Caffeine 5.706290
Caffeine 5.523415
Caffeine 5.115100

Performing the Mann-Whitney U Test

# Perform Mann-Whitney U Test
mannwhitney_test <- wilcox.test(groupA, groupB, alternative = "two.sided")

# Print test results
mannwhitney_test
## 
##  Wilcoxon rank sum exact test
## 
## data:  groupA and groupB
## W = 96, p-value = 0.5125
## alternative hypothesis: true location shift is not equal to 0

Interpretation

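If the p-value is less than 0.05, we reject H0 and conclude that the two groups have significantly different reaction times.

If the p-value is greater than or equal to 0.05, we fail to reject H0: there is no statistically significant difference in reaction times between the two groups.

Here p = 0.5125 > 0.05, so we fail to reject H0. Because the data were simulated from populations with different means (5.2 vs. 6.0 seconds), this also illustrates that failing to reject H0 is not evidence that the groups are identical.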
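The decision rule above can be applied programmatically to the object returned by wilcox.test(), whose p.value element holds the p-value. decide_h0() is a hypothetical helper written for this illustration, and alpha = 0.05 is the conventional level used in the text.

```r
# Hypothetical helper: apply the reject / fail-to-reject rule at level alpha
decide_h0 <- function(test, alpha = 0.05) {
  if (test$p.value < alpha) "reject H0" else "fail to reject H0"
}

# Reaction-time data simulated as in this example
set.seed(42)
groupA <- rnorm(15, mean = 5.2, sd = 0.8)
groupB <- rnorm(15, mean = 6.0, sd = 0.9)
mannwhitney_test <- wilcox.test(groupA, groupB, alternative = "two.sided")

decide_h0(mannwhitney_test)   # "fail to reject H0", since p = 0.5125
```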

# ggplot2 is needed for the boxplots in this and later examples
library(ggplot2)

ggplot(df_mannwhitney, aes(x = Group, y = ReactionTime, fill = Group)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Reaction Times: Caffeine vs. No Caffeine",
       x = "Group",
       y = "Reaction Time (seconds)") +
  theme_minimal()

2. Real Dataset: PlantGrowth in R

The built-in PlantGrowth dataset in R records the weight of plants under three conditions:

  • ctrl: Control group
  • trt1: Treatment 1
  • trt2: Treatment 2

We will compare the control group (ctrl) vs. Treatment 1 (trt1) using the Mann-Whitney U test (also known as the Wilcoxon rank-sum test).

# dplyr provides %>% and filter(); knitr provides kable()
library(dplyr)
library(knitr)

# Load dataset
data("PlantGrowth")

# Filter for Control and Treatment 1
plant_data <- PlantGrowth %>% filter(group != "trt2")

# Perform Mann-Whitney U Test
mannwhitney_real <- wilcox.test(weight ~ group, data = plant_data)

# Display first few rows
kable(head(plant_data), caption = "First Few Rows of Plant Growth Data")
First Few Rows of Plant Growth Data
weight group
4.17 ctrl
5.58 ctrl
5.18 ctrl
6.11 ctrl
4.50 ctrl
4.61 ctrl
# Print test result
mannwhitney_real
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  weight by group
## W = 67.5, p-value = 0.1986
## alternative hypothesis: true location shift is not equal to 0
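The heading "with continuity correction" signals that R used the normal approximation: the value 4.17 occurs in both groups, and by default R computes an exact p-value only when there are no ties and each sample has fewer than 50 values. The sketch below reproduces the reported p-value by hand in base R, assuming the standard tie-corrected variance and a continuity correction of 1/2 toward the null mean; the variable names are illustrative.

```r
data("PlantGrowth")
ctrl <- PlantGrowth$weight[PlantGrowth$group == "ctrl"]
trt1 <- PlantGrowth$weight[PlantGrowth$group == "trt1"]

m <- length(ctrl)
n <- length(trt1)
N <- m + n

r   <- rank(c(ctrl, trt1))            # 4.17 appears twice, so it gets an average rank
U_X <- sum(r[seq_len(m)]) - m * (m + 1) / 2

# Tie-corrected standard deviation of U under H0
t_j   <- table(r)                     # size of each group of tied ranks
sigma <- sqrt(m * n / 12 * ((N + 1) - sum(t_j^3 - t_j) / (N * (N - 1))))

# Continuity correction: move U_X half a unit toward its null mean mn/2
d <- U_X - m * n / 2
Z <- (d - sign(d) * 0.5) / sigma
p_manual <- 2 * pnorm(-abs(Z))

# exact = FALSE requests the normal approximation directly,
# avoiding R's warning that an exact p-value cannot be computed with ties
p_r <- wilcox.test(ctrl, trt1, exact = FALSE)$p.value
c(manual = p_manual, wilcox = p_r)    # both about 0.1986
```

Here W = 67.5 exceeds \(mn/2 = 50\), so the reported W is \(U_X\) for the control group, not \(\min(U_X, U_Y) = 32.5\).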

Interpretation

If the p-value is less than 0.05, we reject H0 and conclude that there is a statistically significant difference in plant weight between the Control group and Treatment 1.

If the p-value is greater than or equal to 0.05, we fail to reject H0: there is no statistically significant difference in plant weight between the Control group and Treatment 1.

Here p = 0.1986 > 0.05, so we fail to reject H0.

ggplot(plant_data, aes(x = group, y = weight, fill = group)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Plant Growth: Control vs. Treatment 1",
       x = "Group",
       y = "Plant Weight") +
  theme_minimal()

3. Hands-On Exercise: Exam Scores

Scenario

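A teacher records exam scores for two different teaching methods. We treat the two columns as independent groups of eight students each; if the same students were tested under both methods, the samples would be paired and the Wilcoxon signed-rank test would be more appropriate.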

Student Traditional Modern
A 78 85
B 75 80
C 80 87
D 72 78
E 77 83
F 83 89
G 79 84
H 76 81

Task for Students

  1. Create vectors in R for Traditional and Modern scores:
# Exam scores for each teaching method
traditional <- c(78, 75, 80, 72, 77, 83, 79, 76)
modern      <- c(85, 80, 87, 78, 83, 89, 84, 81)

# Perform Mann-Whitney U Test
mannwhitney_exercise <- wilcox.test(traditional, modern, alternative = "two.sided")

# Print test result
mannwhitney_exercise
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  traditional and modern
## W = 6.5, p-value = 0.008505
## alternative hypothesis: true location shift is not equal to 0
df_exercise <- data.frame(
  Method = rep(c("Traditional", "Modern"), each = 8),
  Score = c(traditional, modern)
)

ggplot(df_exercise, aes(x = Method, y = Score, fill = Method)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Exam Scores: Traditional vs. Modern Teaching",
       x = "Teaching Method",
       y = "Exam Score") +
  theme_minimal()

8. Conclusion

The Mann-Whitney U test (also known as the Wilcoxon rank-sum test) is a robust nonparametric alternative to the independent two-sample t-test for comparing two independent groups, especially when:

  • The data are not normally distributed.
  • The data are ordinal.
  • The focus is on ranks rather than absolute values.

It is widely used in fields like medicine, psychology, and education, where the assumption of normality is often not met.