Introduction

The Wilcoxon Signed-Rank Test is a non-parametric alternative to the paired t-test. It is used when:

  1. The data is paired or matched (e.g., pre-treatment vs post-treatment).
  2. The differences are not normally distributed (normality of the differences is a key assumption of the paired t-test).
  3. You want to test if the median difference is zero rather than the mean difference.

Unlike the Sign Test, which only considers signs (+/-), the Wilcoxon Signed-Rank Test also incorporates the magnitude of differences.

1. Wilcoxon Signed-Rank Test for a Single Sample

One-Sample Wilcoxon Signed-Rank Test: Theory and Application

1. Introduction

The One-Sample Wilcoxon Signed-Rank Test is a nonparametric alternative to the one-sample t-test. It is used when:

  • The data are not normally distributed.
  • The sample size is small.
  • The goal is to test whether the median of a single population is equal to a specified value.

The test was introduced by Frank Wilcoxon (1945) as an extension of the Sign Test, incorporating both signs and magnitudes of differences.

Motivation: The one-sample t-test assumes normality, but many real-world datasets do not satisfy this assumption. The Wilcoxon Signed-Rank Test provides a robust, distribution-free alternative.


2. Hypothesis Formulation

Given a sample:

\[ \{X_1, X_2, \dots, X_n\} \]

we test whether the median (\(m\)) is equal to a hypothesized value \(m_0\).

  • Null Hypothesis (\(H_0\)): The true median equals \(m_0\). \[ \text{median}(X) = m_0 \]
  • Alternative Hypothesis (\(H_A\)): The median is not equal to \(m_0\) (two-sided) or is greater/less than \(m_0\) (one-sided). \[ \text{median}(X) \neq m_0 \quad \text{(Two-sided test)} \]

3. Test Procedure

Step 1: Compute Differences from \(m_0\)

For each observation \(X_i\), compute the difference:

\[ D_i = X_i - m_0 \]

Step 2: Remove Zero Differences

Observations where \(D_i = 0\) are discarded.

Step 3: Rank Absolute Differences

Compute the absolute differences:

\[ |D_i| = |X_i - m_0| \]

Assign ranks \(R_i\) from smallest to largest. If there are ties, assign the average rank.

Step 4: Assign Signs to Ranks

Each rank retains the sign of \(D_i\):

\[ R_i^+ = R_i \quad \text{if } D_i > 0, \quad R_i^- = R_i \quad \text{if } D_i < 0 \]

Step 5: Compute Test Statistic

Define:

  • \(W^+ =\) Sum of positive signed ranks.
  • \(W^- =\) Sum of negative signed ranks.

The Wilcoxon Signed-Rank Test Statistic is:

\[ W = \min(W^+, W^-) \]
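Steps 1 to 5 can be traced in a few lines of R. The following is only a minimal sketch: the helper name `signed_rank_stats` and the sample values are hypothetical and chosen for illustration, and `rank()` is used because its default assigns average ranks to ties.

```r
# Minimal sketch of Steps 1-5 for the one-sample signed-rank statistic.
# `signed_rank_stats` and the sample below are hypothetical (illustration only).
signed_rank_stats <- function(x, m0) {
  d <- x - m0                # Step 1: differences from the hypothesized median
  d <- d[d != 0]             # Step 2: drop zero differences
  r <- rank(abs(d))          # Step 3: rank |D_i|; ties receive the average rank
  w_plus  <- sum(r[d > 0])   # Steps 4-5: sum of ranks with a positive sign
  w_minus <- sum(r[d < 0])   #            sum of ranks with a negative sign
  list(n = length(d), W_plus = w_plus, W_minus = w_minus,
       W = min(w_plus, w_minus))
}

x <- c(12, 15, 9, 20, 10, 17, 8)   # hypothetical sample
res <- signed_rank_stats(x, m0 = 10)
print(res)
```

For this sample one observation equals \(m_0\) and is dropped, leaving \(n = 6\), \(W^+ = 17.5\), \(W^- = 3.5\), and \(W = 3.5\).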


4. Distribution Under \(H_0\)

  • If the sample size \(n\) (the number of nonzero differences) is small (\(n \leq 20\)), use Wilcoxon critical values (table lookup).
  • If the sample size is large (\(n > 20\)), the test statistic \(W\) is approximately normally distributed and can be standardized as:

\[ Z = \frac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}} \]

where \(Z\) follows the standard normal distribution \(N(0,1)\).
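The standardization above can be sketched in R as follows. The helper `signed_rank_z` and the input values are hypothetical. This plain version applies neither a tie correction nor a continuity correction, so it will not match `wilcox.test()` exactly when R applies those adjustments.

```r
# Sketch of the large-sample normal approximation (no tie or continuity correction).
# `signed_rank_z` is a hypothetical helper; n counts the nonzero differences.
signed_rank_z <- function(w, n) {
  mu    <- n * (n + 1) / 4                         # mean of W under H0
  sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24)    # standard deviation under H0
  (w - mu) / sigma
}

z <- signed_rank_z(w = 100, n = 25)   # hypothetical values
# Two-sided p-value; |z| is the same whether w is W+, W-, or their minimum
p_value <- 2 * pnorm(-abs(z))
print(c(z = z, p_value = p_value))
```

Because \(W^+\) and \(W^-\) are mirror images around \(n(n+1)/4\), the two-sided p-value does not depend on which of them is plugged in.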


5. Assumptions of the One-Sample Wilcoxon Signed-Rank Test

  1. The sample is randomly selected.
  2. The differences \(D_i = X_i - m_0\) are independent.
  3. The differences are symmetrically distributed around the median.
  4. The data are at least ordinal (i.e., can be ranked meaningfully).

6. Example: Application in R

Scenario

A nutritionist believes that the median daily caloric intake of a certain population is 2000 kcal. To test this claim, they collect a random sample of 10 individuals.


Dataset

The observed caloric intake values (in kcal) are:

\[ \begin{array}{|c|c|} \hline \textbf{Participant} & \textbf{Caloric Intake} \\ \hline 1 & 1950 \\ 2 & 2020 \\ 3 & 2100 \\ 4 & 1980 \\ 5 & 2050 \\ 6 & 1990 \\ 7 & 2070 \\ 8 & 2000 \\ 9 & 1955 \\ 10 & 2080 \\ \hline \end{array} \]


R Code for One-Sample Wilcoxon Signed-Rank Test

# Sample Data: Daily Caloric Intake
caloric_intake <- c(1950, 2020, 2100, 1980, 2050, 1990, 2070, 2000, 1955, 2080)

# Perform Wilcoxon Signed-Rank Test in R
wilcox.test(caloric_intake, mu = 2000, alternative = "two.sided")
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  caloric_intake
## V = 32, p-value = 0.2855
## alternative hypothesis: true location is not equal to 2000
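
The statistic in this output can be checked by hand. The sketch below repeats the procedure on the same data; it suggests that the V printed by `wilcox.test()` is the sum of the positive ranks \(W^+\), not \(\min(W^+, W^-)\).

```r
# Reproduce V from the output above by hand (same data as the example)
caloric_intake <- c(1950, 2020, 2100, 1980, 2050, 1990, 2070, 2000, 1955, 2080)
d <- caloric_intake - 2000
d <- d[d != 0]               # participant 8 (exactly 2000) is dropped, so n = 9
r <- rank(abs(d))            # average ranks for the tied |D_i| values
w_plus  <- sum(r[d > 0])     # this matches the V that wilcox.test() prints
w_minus <- sum(r[d < 0])
print(c(n = length(d), W_plus = w_plus, W_minus = w_minus,
        W = min(w_plus, w_minus)))
```

Here \(W^+ = 32\) and \(W^- = 13\), so \(W = 13\) in the notation of the procedure section.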

Interpretation of Wilcoxon Signed-Rank Test Output

The Wilcoxon Signed-Rank Test with continuity correction was conducted to assess whether the median caloric intake significantly differs from 2000 kcal.

Test Output Summary

Key Interpretation Points

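  • The test statistic is \(V = 32\). R reports \(V = W^+\), the sum of the ranks of the positive differences, rather than \(\min(W^+, W^-)\); here \(W^- = 13\), so \(W = 13\).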
  • The p-value is 0.2855.
  • The alternative hypothesis states that the true median caloric intake is not equal to 2000 kcal.

Statistical Decision

The p-value (0.2855) is greater than 0.05, meaning we fail to reject the null hypothesis \(H_0\).

Conclusion

Since \(p > 0.05\), there is insufficient evidence to conclude that the median caloric intake significantly differs from 2000 kcal. In other words, the observed differences in caloric intake could have occurred due to random chance.

Final Report

  • If p-value < 0.05: We would reject \(H_0\) and conclude a significant difference.
  • If p-value > 0.05 (as in this case): We fail to reject \(H_0\), suggesting no statistically significant deviation from the hypothesized median.

This analysis shows that the Wilcoxon Signed-Rank Test does not provide strong evidence against the hypothesis that the median caloric intake is 2000 kcal.

Theory

Hypothesis (One-Sample Case)

We have a sample {X1, X2, …, Xn} and we want to test if the true median m equals a hypothesized value m0:

  • H0: m = m0
  • HA: m ≠ m0 (two-sided test)

Test Statistic

  1. Compute the differences: Di = Xi - m0
  2. Remove zero differences.
  3. Rank the absolute values of Di (smallest = rank 1).
  4. Compute the sum of ranks for positive and negative differences separately.

The test statistic is:

W = min(W+, W-)

where W+ is the sum of ranks for positive differences and W- is the sum of ranks for negative differences.

Under H0, W follows a Wilcoxon distribution (approximately normal for large n).

Example: Simulated Data (One-Sample Case)

Scenario

A company wants to test whether the median employee satisfaction score is equal to 75.

# Employee satisfaction scores (fixed values, so no random seed is needed)
satisfaction_scores <- c(78, 74, 80, 72, 76, 79, 75, 81, 77, 74)

# Hypothesized median satisfaction score
hypothesized_median <- 75

# Perform Wilcoxon Signed-Rank Test
wilcox_one_sample <- wilcox.test(satisfaction_scores, mu = hypothesized_median, paired = FALSE)

# Display test results
wilcox_one_sample
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  satisfaction_scores
## V = 35.5, p-value = 0.1369
## alternative hypothesis: true location is not equal to 75

Interpretation

If the p-value is less than 0.05, we reject H0:

Conclusion → The median satisfaction score differs significantly from 75.

If the p-value is greater than or equal to 0.05, we fail to reject H0:

Conclusion → There is no statistically significant difference between the median satisfaction score and 75.

Here the p-value is 0.1369, so we fail to reject H0.
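
The decision rule can also be applied programmatically. This is a minimal sketch, assuming a significance level of 0.05 stored in a variable named `alpha` (a name introduced here). The data contain ties and a zero difference, and R reported the continuity-corrected approximation in the output above; `exact = FALSE` requests that approximation explicitly.

```r
satisfaction_scores <- c(78, 74, 80, 72, 76, 79, 75, 81, 77, 74)
alpha <- 0.05   # significance level assumed in the text

# exact = FALSE asks for the normal approximation with continuity correction
res <- wilcox.test(satisfaction_scores, mu = 75, exact = FALSE)

# wilcox.test() returns an "htest" list; extract what the decision needs
v_stat   <- unname(res$statistic)
p_value  <- res$p.value
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"
cat("V =", v_stat, " p =", round(p_value, 4), " ->", decision, "\n")
```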

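library(ggplot2)  # required for ggplot()

df_satisfaction <- data.frame(Scores = satisfaction_scores)

# Histogram of the scores with the hypothesized median marked in red
# (linewidth requires ggplot2 >= 3.4.0; older versions use size instead)
ggplot(df_satisfaction, aes(x = Scores)) +
  geom_histogram(bins = 6, fill = "steelblue", alpha = 0.8, color = "black") +
  geom_vline(xintercept = hypothesized_median, color = "red", linewidth = 1.2) +
  labs(title = "Histogram of Employee Satisfaction Scores", x = "Satisfaction Score", y = "Frequency") +
  theme_minimal()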

Example: Simulated Data (One-Sample Case)

Scenario

A company wants to test whether the median employee satisfaction score is equal to 75.

Generating Simulated Data

set.seed(123)  # For reproducibility
satisfaction_scores <- rnorm(20, 70, 10) # Simulate scores from a normal distribution (mean 70, SD 10)
# Add uniform noise on [-5, 5]; the noise is symmetric, so it perturbs the scores without adding skewness
satisfaction_scores <- satisfaction_scores + runif(20, -5, 5)

# Ensure no scores below zero (satisfaction can't be negative)
satisfaction_scores[satisfaction_scores < 0] <- 0
satisfaction_scores <- round(satisfaction_scores)

# Create a data frame (optional but good practice)
df <- data.frame(score = satisfaction_scores)

print(df)
##    score
## 1     61
## 2     67
## 3     85
## 4     69
## 5     68
## 6     84
## 7     72
## 8     57
## 9     61
## 10    69
## 11    78
## 12    73
## 13    77
## 14    67
## 15    65
## 16    85
## 17    71
## 18    53
## 19    81
## 20    64
wilcox.test(df$score, mu = 75, alternative = "two.sided")
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  df$score
## V = 52, p-value = 0.04976
## alternative hypothesis: true location is not equal to 75

Interpretation

If the p-value is less than 0.05, we reject H0:

Conclusion → The median satisfaction score differs significantly from 75.

If the p-value is greater than or equal to 0.05, we fail to reject H0:

Conclusion → There is no statistically significant difference between the median satisfaction score and 75.

Here the p-value is 0.04976, just below 0.05, so we reject H0.
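
The p-value in this output can be reproduced by hand from the normal approximation. The sketch below uses the scores as printed in the data frame and assumes the usual tie correction to the variance and a continuity correction of 0.5. The variable names are introduced here for illustration.

```r
# Scores as printed in the data frame above
scores <- c(61, 67, 85, 69, 68, 84, 72, 57, 61, 69,
            78, 73, 77, 67, 65, 85, 71, 53, 81, 64)
d <- scores - 75
d <- d[d != 0]                 # no zero differences here, so n stays 20
n <- length(d)
r <- rank(abs(d))
v <- sum(r[d > 0])             # V = W+

ties  <- table(r)              # sizes of the tied groups of ranks
mu    <- n * (n + 1) / 4
sigma <- sqrt(n * (n + 1) * (2 * n + 1) / 24 - sum(ties^3 - ties) / 48)
z     <- (v - mu - sign(v - mu) * 0.5) / sigma   # continuity correction
p_value <- 2 * pnorm(-abs(z))
print(c(V = v, z = z, p_value = p_value))
```

With these corrections the hand calculation gives \(V = 52\) and a p-value of about 0.0498, in line with the reported 0.04976.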

2. Wilcoxon Signed-Rank Test for Paired Data

Wilcoxon Signed-Rank Test: Theory and Application

1. Introduction

The Wilcoxon Signed-Rank Test is a nonparametric statistical test that is used to compare the median of a single sample or the median difference between paired observations. It is particularly useful when:

  • The data are not normally distributed.
  • The sample size is small, making the t-test unreliable.
  • The data are paired (dependent observations).

This test was introduced by Frank Wilcoxon (1945) and is an extension of the Sign Test, which only considers the signs of the differences but not their magnitude.

Motivation: The parametric paired t-test assumes that the differences follow a normal distribution. However, real-world data often violate this assumption (e.g., skewed data or ordinal measurements). The Wilcoxon Signed-Rank Test provides a distribution-free alternative.


2. Hypothesis Formulation

Given paired observations:

\[ \{(X_1, Y_1), (X_2, Y_2), \dots, (X_n, Y_n)\} \]

we define the differences:

\[ D_i = X_i - Y_i \]

Null and Alternative Hypothesis

  • Null Hypothesis (\(H_0\)): The median difference is zero. \[ \text{median}(D) = 0 \]
  • Alternative Hypothesis (\(H_A\)): The median difference is not zero (two-sided) or is greater/less than zero (one-sided). \[ \text{median}(D) \neq 0 \quad \text{(Two-sided test)} \]

3. Test Procedure

Step 1: Remove Zero Differences

If \(D_i = 0\), discard the observation as it does not contribute to ranking.

Step 2: Rank Absolute Differences

Compute the absolute differences:

\[ |D_i| = |X_i - Y_i| \]

Assign ranks \(R_i\) to the nonzero absolute differences, from smallest to largest. If there are ties, assign the average rank.

Step 3: Assign Signs to Ranks

Each rank \(R_i\) retains the sign of \(D_i\):

\[ R_i^+ = R_i \quad \text{if } D_i > 0, \quad R_i^- = R_i \quad \text{if } D_i < 0 \]

Step 4: Compute Test Statistic

Define:

  • \(W^+ =\) Sum of positive signed ranks.
  • \(W^- =\) Sum of negative signed ranks.

The Wilcoxon Signed-Rank Test Statistic is:

\[ W = \min(W^+, W^-) \]
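The paired procedure can be followed step by step in R. The pairs below are hypothetical and serve only to illustrate the mechanics. The check at the end relies on the fact that the ranks \(1, \dots, n\) are split between the two sums, so \(W^+ + W^- = n(n+1)/2\).

```r
# Hypothetical paired measurements, for illustration only
x <- c(10, 12, 9, 14, 11, 13)
y <- c( 8, 13, 9, 10, 12,  7)

d <- x - y                   # D_i = X_i - Y_i
d <- d[d != 0]               # Step 1: drop zero differences
r <- rank(abs(d))            # Step 2: rank |D_i| with average ranks for ties
w_plus  <- sum(r[d > 0])     # Steps 3-4: sum of ranks of positive differences
w_minus <- sum(r[d < 0])     #            sum of ranks of negative differences
n <- length(d)

# The two sums always add up to n(n+1)/2, a handy arithmetic check
stopifnot(w_plus + w_minus == n * (n + 1) / 2)
print(c(n = n, W_plus = w_plus, W_minus = w_minus, W = min(w_plus, w_minus)))
```

For these pairs \(n = 5\), \(W^+ = 12\), \(W^- = 3\), and \(W = 3\).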


4. Distribution Under \(H_0\)

  • If the sample size \(n\) (the number of nonzero differences) is small (\(n \leq 20\)), we use exact Wilcoxon critical values (table lookup).
  • If the sample size is large (\(n > 20\)), the test statistic \(W\) is approximately normally distributed and can be standardized as:

\[ Z = \frac{W - \frac{n(n+1)}{4}}{\sqrt{\frac{n(n+1)(2n+1)}{24}}} \]

where \(Z\) follows the standard normal distribution \(N(0,1)\).


5. Assumptions of the Wilcoxon Signed-Rank Test

  1. The differences \(D_i = X_i - Y_i\) are independent.
  2. The differences are symmetrically distributed around the median.
  3. The data are at least ordinal (i.e., can be ranked meaningfully).

6. Example: Application in R

Scenario

A researcher wants to determine if a new training program improves student test scores. The scores before and after the program are:

\[ \begin{array}{|c|c|c|} \hline \textbf{Student} & \textbf{Before} & \textbf{After} \\ \hline 1 & 65 & 70 \\ 2 & 78 & 80 \\ 3 & 75 & 78 \\ 4 & 60 & 65 \\ 5 & 80 & 82 \\ 6 & 70 & 74 \\ 7 & 72 & 76 \\ 8 & 68 & 72 \\ \hline \end{array} \]

R Code for Wilcoxon Signed-Rank Test

# Sample Data: Before and After Scores
before <- c(65, 78, 75, 60, 80, 70, 72, 68)
after  <- c(70, 80, 78, 65, 82, 74, 76, 72)

# Perform Wilcoxon Signed-Rank Test in R
wilcox.test(before, after, paired = TRUE, alternative = "two.sided")
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  before and after
## V = 0, p-value = 0.01356
## alternative hypothesis: true location shift is not equal to 0
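
Since the scenario asks whether the program improves scores, a one-sided alternative is a natural companion to the two-sided test above. The following sketch compares the two. It passes `exact = FALSE` explicitly because the differences contain ties and the output above already reports the approximation; the object names are introduced here.

```r
before <- c(65, 78, 75, 60, 80, 70, 72, 68)
after  <- c(70, 80, 78, 65, 82, 74, 76, 72)

# Two-sided test, as in the example above
two_sided <- wilcox.test(before, after, paired = TRUE, exact = FALSE)

# "less" asks whether before - after is shifted below 0, i.e. scores improved
one_sided <- wilcox.test(before, after, paired = TRUE,
                         alternative = "less", exact = FALSE)

print(c(two_sided = two_sided$p.value, one_sided = one_sided$p.value))
```

Every After score exceeds its Before score, so \(V = W^+ = 0\) and the one-sided p-value is half of the two-sided one.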

Theory

Hypothesis (Paired Case)

For paired data {(X1, Y1), (X2, Y2), …, (Xn, Yn)}, define Di = Xi - Yi. We test:

  • H0: median(D) = 0 (no difference)
  • HA: median(D) ≠ 0

The test follows the same procedure as the single-sample case, but using the paired before-after differences (Di).

Example: Simulated Paired Data

Scenario

A fitness trainer wants to evaluate whether a new workout program significantly changes participants’ weight.

set.seed(2023)
# Simulate paired data (Before & After weights)
before_weight <- c(82, 78, 85, 79, 77, 80, 83, 75, 81, 78)
after_weight  <- before_weight - rnorm(10, mean = 1.5, sd = 0.5)  # Weight reduction by ~1.5 kg

# Display paired data
df_weights <- data.frame(Participant = 1:10, Before = before_weight, After = after_weight)
knitr::kable(df_weights, caption = "Before and After Weight (kg)")  # kable() comes from the knitr package
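
Before and After Weight (kg)

\[ \begin{array}{|c|c|c|} \hline \textbf{Participant} & \textbf{Before} & \textbf{After} \\ \hline 1 & 82 & 80.54189 \\ 2 & 78 & 76.99147 \\ 3 & 85 & 84.43753 \\ 4 & 79 & 77.59307 \\ 5 & 77 & 75.81674 \\ 6 & 80 & 77.95460 \\ 7 & 83 & 81.95686 \\ 8 & 75 & 72.99918 \\ 9 & 81 & 79.69963 \\ 10 & 78 & 76.73406 \\ \hline \end{array} \]
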
# Perform Wilcoxon Signed-Rank Test for Paired Data
wilcox_paired <- wilcox.test(before_weight, after_weight, paired = TRUE)

# Print test result
wilcox_paired
## 
##  Wilcoxon signed rank exact test
## 
## data:  before_weight and after_weight
## V = 55, p-value = 0.001953
## alternative hypothesis: true location shift is not equal to 0
library(ggplot2)  # required for ggplot()

df_weights$Difference <- df_weights$Before - df_weights$After

# Bar chart of Before - After per participant; blue bars are weight losses
# (linewidth requires ggplot2 >= 3.4.0; older versions use size instead)
ggplot(df_weights, aes(x = Participant, y = Difference, fill = Difference > 0)) +
  geom_col() +
  geom_hline(yintercept = 0, color = "red", linewidth = 1) +
  labs(title = "Weight Differences: Before - After", y = "Weight Change (kg)") +
  scale_fill_manual(values = c("TRUE" = "steelblue", "FALSE" = "tomato")) +
  theme_minimal()

Interpretation

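If the p-value is less than 0.05, we reject H0:

Conclusion → There is a statistically significant change in weight after the workout program.

If the p-value is greater than or equal to 0.05, we fail to reject H0:

Conclusion → There is no statistically significant change in weight after the workout program.

Here the p-value is 0.001953, so we reject H0. Every Before - After difference is positive, so the change is a reduction in weight.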
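
The exact p-value reported for the weight data can be recovered from the signed-rank distribution that ships with R (`psignrank`). This sketch assumes the values from the output above: \(n = 10\) nonzero differences with no ties, and \(V = 55\), the largest possible value because every difference is positive.

```r
n <- 10          # number of nonzero differences in the weight example
v <- 55          # V from the output; equals n(n+1)/2 since all differences are positive

# P(V >= 55) under H0, from the exact signed-rank distribution
p_upper <- psignrank(v - 1, n, lower.tail = FALSE)

# Two-sided exact p-value: double the tail probability, capped at 1
p_two_sided <- min(1, 2 * p_upper)
print(p_two_sided)
```

Only one of the \(2^{10} = 1024\) equally likely sign patterns gives \(V = 55\), so the two-sided p-value is \(2/1024 \approx 0.001953\).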

3. Hands-On Exercise: Exam Scores

Scenario

A professor records students’ exam scores before and after introducing a new teaching method.

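\[ \begin{array}{|c|c|c|} \hline \textbf{Student} & \textbf{Before} & \textbf{After} \\ \hline A & 78 & 82 \\ B & 75 & 77 \\ C & 80 & 84 \\ D & 72 & 74 \\ E & 77 & 79 \\ F & 83 & 85 \\ G & 79 & 81 \\ H & 76 & 80 \\ \hline \end{array} \]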

Task for Students

  1. Create vectors in R for Before & After scores:
before_scores <- c(78, 75, 80, 72, 77, 83, 79, 76)
after_scores <- c(82, 77, 84, 74, 79, 85, 81, 80)
# alternative = "less" tests whether before - after is shifted below 0,
# i.e. whether scores *increased* after the new teaching method
wilcox.test(before_scores, after_scores, paired = TRUE, alternative = "less")
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  before_scores and after_scores
## V = 0, p-value = 0.00577
## alternative hypothesis: true location shift is less than 0
# Equivalently, run a one-sample test on the paired differences
differences <- before_scores - after_scores
wilcox.test(differences, mu = 0, alternative = "less")
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  differences
## V = 0, p-value = 0.00577
## alternative hypothesis: true location is less than 0
before_scores <- c(78, 75, 80, 72, 77, 83, 79, 76)
after_scores  <- c(82, 77, 84, 74, 79, 85, 81, 80)

# Two-sided Wilcoxon signed-rank test on the same data, for comparison with the one-sided result
wilcox_exercise <- wilcox.test(before_scores, after_scores, paired = TRUE)

# Print result
wilcox_exercise
## 
##  Wilcoxon signed rank test with continuity correction
## 
## data:  before_scores and after_scores
## V = 0, p-value = 0.01154
## alternative hypothesis: true location shift is not equal to 0

If the p-value from the one-sided wilcox.test output (alternative = "less") is less than 0.05, we reject H0:

H0: The new teaching method has not increased the median exam score (the median of After - Before is zero or negative).

We conclude that the new teaching method has significantly increased the median exam score.

If the p-value is greater than or equal to 0.05, we fail to reject H0:

H0: The new teaching method has not increased the median exam score.

We conclude that there is not enough evidence to say that the new teaching method has significantly increased the median exam score.

Here the one-sided p-value is 0.00577, so we reject H0.