Mann-Whitney U Test: Theory and Application

1. Introduction

The Mann-Whitney U Test (also called the Wilcoxon Rank-Sum Test) is a nonparametric test that compares two independent samples to determine whether one tends to have larger values than the other. It is an alternative to the independent two-sample t-test when:
- The data are not normally distributed.
- The sample size is small.
- The data are ordinal (ranked) rather than continuous.

The test was developed by Henry Mann and Donald Whitney (1947) as an extension of Wilcoxon’s rank-sum test (1945) to samples of unequal size.

2. Motivation

Why Not Use the t-Test?

The independent two-sample t-test assumes:
1. Normality: The two groups come from normally distributed populations.
2. Equal Variances: The variance in both groups is the same.
3. Interval Data: The data are measured on a numerical scale.

However, in many real-world cases:
- The data are skewed or non-normally distributed.
- The sample size is too small to check normality.
- The data are ordinal (e.g., Likert scale responses: 1-5).

The Mann-Whitney U Test is a distribution-free alternative: it makes no normality assumption, and a significant result can be read as a shift in location provided the two distributions have a similar shape.

3. Hypothesis Formulation

Suppose we have two independent groups:

\[ X_1, X_2, \dots, X_m \quad \text{(Sample 1, size \( m \))} \] \[ Y_1, Y_2, \dots, Y_n \quad \text{(Sample 2, size \( n \))} \]

The test examines whether one group tends to have higher values than the other.

Null Hypothesis (\(H_0\)): The two distributions are identical. \[ P(X > Y) = P(Y > X) = 0.5 \]
Alternative Hypothesis (\(H_A\)): One distribution is shifted higher than the other. \[ P(X > Y) \neq 0.5 \]

Because the test uses only the ranks of the observations, it is unaffected by monotone transformations of the data and is less sensitive to outliers than the t-test.
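The probability \(P(X > Y)\) that appears in the hypotheses can be estimated from two samples by counting pairs. The following is a minimal sketch in R; the data values and the helper name `prob_superiority` are made up for illustration and do not come from the examples in this document.

```r
# Illustrative data (hypothetical values)
x <- c(3.1, 4.7, 5.2, 6.0)
y <- c(2.8, 3.9, 4.1, 4.4, 5.0)

# Hypothetical helper: proportion of (x_i, y_j) pairs with x_i > y_j,
# counting each tied pair as one half
prob_superiority <- function(x, y) {
  greater <- outer(x, y, ">")
  equal   <- outer(x, y, "==")
  mean(greater + 0.5 * equal)
}

p_hat <- prob_superiority(x, y)
p_hat
```

An estimate close to 0.5 is what \(H_0\) predicts; here 15 of the 20 pairs have the x value larger, so the estimate is 0.75.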

4. Test Statistic

Step 1: Combine and Rank the Data

Combine all observations from both samples and assign ranks from smallest to largest.
If there are ties, assign the average rank.
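As a small illustration of Step 1, base R's `rank()` assigns average ranks to ties by default. The values below are hypothetical and chosen so that one value occurs three times.

```r
# Illustrative values with a three-way tie at 15 (hypothetical data)
x <- c(12, 15, 15, 20)
y <- c(9, 15, 22)

# Combine both samples, then rank from smallest to largest
combined <- c(x, y)
ranks <- rank(combined, ties.method = "average")  # "average" is the default
ranks
```

The three tied values would occupy positions 3, 4 and 5 in the sorted sample, so each receives the average rank 4.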

Step 2: Compute Rank Sums

Let:
- \(R_X\) = sum of ranks for Sample 1.
- \(R_Y\) = sum of ranks for Sample 2.

Step 3: Compute the U Statistic

The Mann-Whitney U statistic is calculated as:

\[ U_X = R_X - \frac{m(m+1)}{2} \]

\[ U_Y = R_Y - \frac{n(n+1)}{2} \]

where \(U_X\) is the number of pairs \((X_i, Y_j)\) in which \(X_i > Y_j\), and \(U_Y\) is the number of pairs in which \(Y_j > X_i\); a tied pair contributes one half to each.
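A useful consistency check, added here as a supplementary derivation rather than taken from the original text: the ranks \(1, \dots, m+n\) sum to \((m+n)(m+n+1)/2\), so the two statistics always add up to the number of pairs.

```latex
\[
U_X + U_Y
  = R_X + R_Y - \frac{m(m+1)}{2} - \frac{n(n+1)}{2}
  = \frac{(m+n)(m+n+1)}{2} - \frac{m(m+1)}{2} - \frac{n(n+1)}{2}
  = mn
\]
```

In practice this means only one of \(U_X\), \(U_Y\) needs to be computed from the ranks; the other follows as \(mn\) minus the first.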

The test statistic is:

\[ U = \min(U_X, U_Y) \]
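The three steps above can be sketched in a few lines of R. The data are hypothetical (no ties, for simplicity), and the comparison with `wilcox.test()` relies on the fact that R reports the U statistic of its first argument as `W`.

```r
# Illustrative data without ties (hypothetical values)
x <- c(1.2, 3.4, 2.2, 5.1)        # Sample 1, size m
y <- c(2.9, 4.8, 6.3, 7.0, 5.5)   # Sample 2, size n
m <- length(x)
n <- length(y)

# Step 1: rank the combined sample
r <- rank(c(x, y))

# Step 2: rank sums for each sample
R_X <- sum(r[seq_len(m)])
R_Y <- sum(r[m + seq_len(n)])

# Step 3: U statistics and the test statistic
U_X <- R_X - m * (m + 1) / 2
U_Y <- R_Y - n * (n + 1) / 2
U   <- min(U_X, U_Y)
c(U_X = U_X, U_Y = U_Y, U = U)

# wilcox.test() reports U_X (for its first argument) as W
W <- unname(wilcox.test(x, y)$statistic)
W
```

For these values the rank sums are 13 and 32, giving \(U_X = 3\), \(U_Y = 17\) and \(U = 3\).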

5. Distribution Under \(H_0\)

For small samples (\(m, n \leq 20\)), use Mann-Whitney tables for critical values.
For large samples (\(m, n > 20\)), the test statistic follows an approximate normal distribution:

\[ Z = \frac{U - \frac{mn}{2}}{\sqrt{\frac{mn(m+n+1)}{12}}} \]

where \(Z\) follows the standard normal distribution \(N(0,1)\).
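The large-sample formula can be evaluated directly. The sketch below uses a hypothetical helper, `mw_normal_approx`, and made-up inputs; it implements the formula exactly as written above, without tie or continuity corrections.

```r
# Hypothetical helper: Z and two-sided p-value from the normal approximation
# (no tie correction and no continuity correction)
mw_normal_approx <- function(U, m, n) {
  mu    <- m * n / 2                          # mean of U under H0
  sigma <- sqrt(m * n * (m + n + 1) / 12)     # standard deviation of U under H0
  z     <- (U - mu) / sigma
  list(z = z, p_value = 2 * pnorm(-abs(z)))
}

# Illustrative inputs (hypothetical): U = 150 with m = n = 25
res <- mw_normal_approx(U = 150, m = 25, n = 25)
res
```

Note that `wilcox.test()` applies a continuity correction by default (`correct = TRUE`) and adjusts the variance for ties, so its large-sample p-values can differ slightly from this plain version.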

6. Assumptions of the Mann-Whitney U Test

The two samples are independent.
The response variable is ordinal or continuous.
The distributions of the two populations have similar shape.

7. Example: Application in R

Scenario

A psychologist wants to compare the stress levels of two groups of students:
1. Group A: Students who meditate before exams.
2. Group B: Students who do not meditate.

The measured stress levels (on a scale of 1 to 100) are:

\[ \begin{array}{|c|c|} \hline \textbf{Meditation Group} & \textbf{No Meditation Group} \\ \hline 42 & 65 \\ 50 & 70 \\ 48 & 72 \\ 39 & 80 \\ 45 & 78 \\ \hline \end{array} \]

R Code for Mann-Whitney U Test

# Sample Data: Stress Levels
meditation <- c(42, 50, 48, 39, 45)
no_meditation <- c(65, 70, 72, 80, 78)

# Perform Mann-Whitney U Test in R
wilcox.test(meditation, no_meditation, alternative = "two.sided")

## 
##  Wilcoxon rank sum exact test
## 
## data:  meditation and no_meditation
## W = 0, p-value = 0.007937
## alternative hypothesis: true location shift is not equal to 0
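The reported values can be reproduced by hand, which may help connect the output to the theory. This sketch assumes only the data above; the variable names `W_by_hand` and `p_by_hand` are introduced for illustration.

```r
meditation    <- c(42, 50, 48, 39, 45)
no_meditation <- c(65, 70, 72, 80, 78)

# W counts the pairs in which a meditation score exceeds a no-meditation score
W_by_hand <- sum(outer(meditation, no_meditation, ">"))

# Under H0 every way of assigning 5 of the 10 ranks to one group is equally
# likely; only the two most extreme assignments (W = 0 or W = 25) are as
# extreme as the observed one
p_by_hand <- 2 / choose(10, 5)

c(W = W_by_hand, p = p_by_hand)
```

Every meditation score lies below every no-meditation score, so W = 0, and the exact two-sided p-value is 2/252, which rounds to the 0.007937 shown in the output.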

Example: Simulated Data (Independent Two Groups)

Scenario

A scientist wants to compare reaction times between two groups:

Group A (with caffeine)
Group B (without caffeine)

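# Packages used in the examples that follow (kable, ggplot, %>% and filter)
library(knitr)
library(ggplot2)
library(dplyr)

# Simulated reaction time data (in seconds)
set.seed(42)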
groupA <- rnorm(15, mean = 5.2, sd = 0.8)  # Caffeine group
groupB <- rnorm(15, mean = 6.0, sd = 0.9)  # Non-Caffeine group

# Create a dataframe
df_mannwhitney <- data.frame(
  Group = rep(c("Caffeine", "No Caffeine"), each = 15),
  ReactionTime = c(groupA, groupB)
)

# Display first few rows
kable(head(df_mannwhitney), caption = "First Few Rows of Reaction Time Data")

First Few Rows of Reaction Time Data
Group	ReactionTime
Caffeine	6.296767
Caffeine	4.748241
Caffeine	5.490503
Caffeine	5.706290
Caffeine	5.523415
Caffeine	5.115100

Performing the Mann-Whitney U Test

# Perform Mann-Whitney U Test
mannwhitney_test <- wilcox.test(groupA, groupB, alternative = "two.sided")

# Print test results
mannwhitney_test

## 
##  Wilcoxon rank sum exact test
## 
## data:  groupA and groupB
## W = 96, p-value = 0.5125
## alternative hypothesis: true location shift is not equal to 0

Interpretation

If the p-value is less than 0.05, we reject H₀ and conclude that the two groups have significantly different reaction times.

If the p-value is greater than or equal to 0.05, we fail to reject H₀: there is no statistically significant difference in reaction times between the two groups.

Here the p-value is 0.5125, so we fail to reject H₀.
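The decision rule can also be applied programmatically by reading the `p.value` element of the object returned by `wilcox.test()`. This is a minimal sketch; the helper name `decide` and the default level of 0.05 are assumptions made for illustration.

```r
# Hypothetical helper: turn a test result into a decision at level alpha
decide <- function(test, alpha = 0.05) {
  if (test$p.value < alpha) "reject H0" else "fail to reject H0"
}

# Re-create the simulated groups from this example
set.seed(42)
groupA <- rnorm(15, mean = 5.2, sd = 0.8)  # Caffeine group
groupB <- rnorm(15, mean = 6.0, sd = 0.9)  # Non-Caffeine group

decision <- decide(wilcox.test(groupA, groupB, alternative = "two.sided"))
decision
```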

ggplot(df_mannwhitney, aes(x = Group, y = ReactionTime, fill = Group)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Reaction Times: Caffeine vs. No Caffeine",
       x = "Group",
       y = "Reaction Time (seconds)") +
  theme_minimal()

2. Real Dataset: `PlantGrowth` in R

The built-in PlantGrowth dataset in R records the weight of plants under three conditions:

ctrl: Control group
trt1: Treatment 1
trt2: Treatment 2

We will compare the control group (ctrl) vs. Treatment 1 (trt1) using the Mann-Whitney U test (also known as the Wilcoxon rank-sum test).

# Load dataset
data("PlantGrowth")

# Filter for Control and Treatment 1
plant_data <- PlantGrowth %>% filter(group != "trt2")

# Perform Mann-Whitney U Test
mannwhitney_real <- wilcox.test(weight ~ group, data = plant_data)

# Display first few rows
kable(head(plant_data), caption = "First Few Rows of Plant Growth Data")

First Few Rows of Plant Growth Data
weight	group
4.17	ctrl
5.58	ctrl
5.18	ctrl
6.11	ctrl
4.50	ctrl
4.61	ctrl

# Print test result
mannwhitney_real

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  weight by group
## W = 67.5, p-value = 0.1986
## alternative hypothesis: true location shift is not equal to 0
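The output title reads "with continuity correction" rather than "exact" because `wilcox.test()` can compute an exact p-value only when there are no ties; otherwise it falls back to the normal approximation. The sketch below checks for ties and requests the approximation explicitly. The variable names `two_groups` and `has_ties` are introduced for illustration.

```r
data("PlantGrowth")

# Keep only ctrl and trt1, dropping the unused factor level
two_groups <- droplevels(subset(PlantGrowth, group != "trt2"))

# Tied weights rule out the exact p-value
has_ties <- any(duplicated(two_groups$weight))
has_ties

# Requesting the normal approximation explicitly avoids the tie warning
res <- wilcox.test(weight ~ group, data = two_groups, exact = FALSE)
res$p.value
```

A non-integer W such as 67.5 is itself a sign of ties, since each tied pair contributes one half to the statistic.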

Interpretation

If the p-value is less than 0.05, we reject H₀ and conclude that there is a statistically significant difference in plant weight between the Control group and Treatment 1.

If the p-value is greater than or equal to 0.05, we fail to reject H₀: there is no statistically significant difference in plant weight between the Control group and Treatment 1.

Here the p-value is 0.1986, so we fail to reject H₀.

ggplot(plant_data, aes(x = group, y = weight, fill = group)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Plant Growth: Control vs. Treatment 1",
       x = "Group",
       y = "Plant Weight") +
  theme_minimal()

3. Hands-On Exercise: Exam Scores

Scenario

A teacher records exam scores for two different teaching methods.

Student	Traditional	Modern
A	78	85
B	75	80
C	80	87
D	72	78
E	77	83
F	83	89
G	79	84
H	76	81

Task for Students

Create vectors in R for the Traditional and Modern scores, then run the test:

traditional <- c(78, 75, 80, 72, 77, 83, 79, 76)
modern      <- c(85, 80, 87, 78, 83, 89, 84, 81)

# Perform Mann-Whitney U Test
mannwhitney_exercise <- wilcox.test(traditional, modern, alternative = "two.sided")

# Print test result
mannwhitney_exercise

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  traditional and modern
## W = 6.5, p-value = 0.008505
## alternative hypothesis: true location shift is not equal to 0
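As a check on the exercise output, W can be reproduced by counting pairs across the two groups. This sketch uses only the exercise data; the names `wins`, `ties` and `W_by_hand` are introduced for illustration.

```r
traditional <- c(78, 75, 80, 72, 77, 83, 79, 76)
modern      <- c(85, 80, 87, 78, 83, 89, 84, 81)

# Pairs in which the traditional score is strictly higher
wins <- sum(outer(traditional, modern, ">"))

# Pairs in which the two scores are equal
ties <- sum(outer(traditional, modern, "=="))

# Each tied pair counts one half
W_by_hand <- wins + 0.5 * ties
W_by_hand
```

There are 5 strict wins and 3 tied pairs (at 78, 80 and 83), giving W = 5 + 1.5 = 6.5. With a p-value of 0.008505, below 0.05, H₀ is rejected for these data.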

df_exercise <- data.frame(
  Method = rep(c("Traditional", "Modern"), each = 8),
  Score = c(traditional, modern)
)

ggplot(df_exercise, aes(x = Method, y = Score, fill = Method)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Exam Scores: Traditional vs. Modern Teaching",
       x = "Teaching Method",
       y = "Exam Score") +
  theme_minimal()

Conclusion

The Mann-Whitney U test (also known as the Wilcoxon rank-sum test) is a robust non-parametric alternative to the independent samples t-test, especially when:

Data are not normally distributed.
You are comparing independent groups.
The focus is on ranks rather than absolute values.

It is widely used in fields like medicine, psychology, and education, where the assumption of normality is often not met.

Non-Parametric Hypothesis Test: Mann-Whitney U Test

Dr. Debashis Chatterjee

2025-02-19
