Kruskal-Wallis Test: Theory and Application

1. Introduction

The Kruskal-Wallis Test is a nonparametric alternative to one-way ANOVA used to compare three or more independent groups. It determines whether at least one of the groups has a different median from the others.

This test is useful when:

  • The assumptions of ANOVA (normality and homogeneity of variance) are violated.
  • The sample size is small, making parametric tests unreliable.
  • The data are ordinal or non-normally distributed.

The test was developed by William Kruskal and W. Allen Wallis in 1952 as an extension of the Wilcoxon rank-sum test to multiple groups.


2. Motivation: Why Not Use ANOVA?

One-way ANOVA assumes:

  1. Normality: Data within each group follow a normal distribution.
  2. Homogeneity of Variance: All groups have equal variance.
  3. Interval Data: The data are measured on a numerical scale.

However, in many real-world situations:

  • The data are skewed or non-normally distributed.
  • The sample sizes are small and unequal.
  • The data are ordinal (e.g., satisfaction ratings: 1-5).

Example

Imagine a study comparing customer satisfaction across three different stores. The satisfaction ratings (on a 1-10 scale) may not be normally distributed. The Kruskal-Wallis Test provides a way to compare the stores without assuming normality.


3. Hypothesis Formulation

We have \(k\) independent groups with sample sizes:

\[ X_{11}, X_{12}, \dots, X_{1n_1} \quad \text{(Group 1, size \( n_1 \))} \]
\[ X_{21}, X_{22}, \dots, X_{2n_2} \quad \text{(Group 2, size \( n_2 \))} \]
\[ \vdots \]
\[ X_{k1}, X_{k2}, \dots, X_{kn_k} \quad \text{(Group \( k \), size \( n_k \))} \]

  • Null Hypothesis (\(H_0\)): The population medians are equal. \[ H_0: M_1 = M_2 = \dots = M_k \]
  • Alternative Hypothesis (\(H_A\)): At least one group has a different median. \[ H_A: M_i \neq M_j \quad \text{for some } i, j \]

4. Test Procedure

Step 1: Rank the Data

  1. Combine all data points across all groups.
  2. Assign ranks from smallest to largest.
  3. If there are ties, assign the average rank (illustrated in the sketch below).
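
A minimal base-R illustration of this ranking step, using a few hypothetical pooled values (rank() assigns average ranks to ties by default):

# Hypothetical pooled observations from all groups
x <- c(4.2, 5.1, 5.1, 3.8, 6.0)
rank(x)  # the two 5.1 values would occupy ranks 3 and 4, so each gets 3.5
## [1] 2.0 3.5 3.5 1.0 5.0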

Step 2: Compute Rank Sums

For each group \(i\), compute the sum of ranks:

\[ R_i = \sum \text{(Ranks in Group \( i \))} \]

Step 3: Compute the Kruskal-Wallis Test Statistic

The test statistic is given by:

\[ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) \]

where:

  • \(N\) is the total sample size (\(N = n_1 + n_2 + \dots + n_k\)),
  • \(n_i\) is the sample size of group \(i\),
  • \(R_i\) is the sum of ranks for group \(i\).
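
As a quick sanity check, \(H\) can be computed by hand in R and compared with kruskal.test(). The three tiny groups below are hypothetical and tie-free; note that kruskal.test() additionally applies a tie correction when ties are present, which this bare formula omits.

# Hand computation of H for three small illustrative groups (no ties)
g1 <- c(3, 5, 8); g2 <- c(6, 9, 12); g3 <- c(1, 2, 4)
scores <- c(g1, g2, g3)
groups <- rep(1:3, each = 3)

r   <- rank(scores)               # joint ranks across all groups
N   <- length(scores)             # total sample size
R_i <- tapply(r, groups, sum)     # rank sum per group
n_i <- tapply(r, groups, length)  # group sizes

H <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)
H  # 5.6889, matching kruskal.test(scores, groups)$statistic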


5. Distribution Under \(H_0\)

  • If sample sizes are small (some \(n_i < 5\)), use exact Kruskal-Wallis tables.
  • If sample sizes are large (every \(n_i \geq 5\)), the test statistic approximately follows a chi-square distribution:

\[ H \sim \chi^2_{k-1} \]

with \(k - 1\) degrees of freedom, where \(k\) is the number of groups.
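
The p-value is the upper-tail probability of this chi-square distribution evaluated at the observed statistic. For example, with the \(H \approx 5.69\) from the worked sketch above and \(k = 3\) groups:

# Upper-tail chi-square probability: p-value for H = 5.6889 with df = 2
pchisq(5.6889, df = 2, lower.tail = FALSE)  # ~0.058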


6. Assumptions of the Kruskal-Wallis Test

  1. The samples are independent.
  2. The response variable is ordinal or continuous.
  3. The distributions have similar shape (only location shift is tested).

7. Example: Application in R

Scenario

A company wants to compare employee job satisfaction levels across three departments: HR, IT, and Marketing. The satisfaction scores (out of 100) are:

\[ \begin{array}{|c|c|c|} \hline \textbf{HR} & \textbf{IT} & \textbf{Marketing} \\ \hline 70 & 85 & 78 \\ 75 & 80 & 82 \\ 72 & 88 & 79 \\ 68 & 82 & 81 \\ 77 & 87 & 80 \\ \hline \end{array} \]


R Code for Kruskal-Wallis Test

# Sample Data: Job Satisfaction Scores
hr <- c(70, 75, 72, 68, 77)
it <- c(85, 80, 88, 82, 87)
marketing <- c(78, 82, 79, 81, 80)

# Combine into a Data Frame
satisfaction_data <- data.frame(
  score = c(hr, it, marketing),
  department = rep(c("HR", "IT", "Marketing"), each = 5)
)

# Perform Kruskal-Wallis Test in R
kruskal.test(score ~ department, data = satisfaction_data)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  score by department
## Kruskal-Wallis chi-squared = 11.22, df = 2, p-value = 0.003661

8. Interpretation of Results

If the p-value is less than 0.05, we reject H0:

H0 → All groups have the same median job satisfaction score.

We conclude that at least one department has a significantly different median job satisfaction score.

If the p-value is greater than or equal to 0.05, we fail to reject H0:

H0 → All groups have the same median job satisfaction score.

We conclude that there is not enough evidence to say that the median job satisfaction scores differ significantly between departments.
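
This decision rule is straightforward to apply programmatically; a small sketch, reusing satisfaction_data from the example above:

# Extract the p-value and apply the 0.05 decision rule
res <- kruskal.test(score ~ department, data = satisfaction_data)
if (res$p.value < 0.05) {
  message("Reject H0: at least one department's median differs.")
} else {
  message("Fail to reject H0: no significant difference detected.")
}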

The Kruskal-Wallis test is useful when comparing non-normally distributed data across multiple groups.

9. Advantages of the Kruskal-Wallis Test

  • Non-parametric: No assumption of normality is required.
  • Applicable to Ordinal Data: Works with ranked variables.
  • Robust to Outliers: Less sensitive to extreme values than ANOVA.

10. References

  • Kruskal, W. H., & Wallis, W. A. (1952). Use of ranks in one-criterion variance analysis. Journal of the American Statistical Association, 47(260), 583-621.
  • Conover, W. J. (1999). Practical Nonparametric Statistics. 3rd Edition, Wiley.
  • Gibbons, J. D., & Chakraborti, S. (2010). Nonparametric Statistical Inference. 5th Edition, CRC Press.

1. Kruskal-Wallis Test for Multiple Independent Groups

Theory

Hypothesis

We compare \(k\) independent groups with sample sizes \(n_1, n_2, \dots, n_k\).

  • Null Hypothesis H0: The distributions of all groups are identical.
  • Alternative Hypothesis HA: At least one group differs in distribution.

Test Statistic

  1. Rank all observations from smallest to largest.
  2. Compute the sum of ranks for each group.
  3. Compute the test statistic:

\[ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) \]

where:

  • \(N\) = total number of observations
  • \(R_i\) = sum of ranks for group \(i\)
  • \(n_i\) = sample size of group \(i\)

Under H0, H follows an approximate chi-square distribution with k - 1 degrees of freedom.
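
Equivalently, at the 5% level we reject H0 whenever H exceeds the chi-square critical value; a one-line check in R for k = 3 groups:

# Critical value at alpha = 0.05 with k - 1 = 2 degrees of freedom
qchisq(0.95, df = 2)  # ~5.99; reject H0 when H exceeds this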

Example: Simulated Data (Three Independent Groups)

Scenario

A researcher wants to compare test scores among three teaching methods:

  • Traditional
  • Online
  • Hybrid
# Simulated test scores for three groups
set.seed(42)
traditional <- rnorm(15, mean = 70, sd = 10)
online <- rnorm(15, mean = 75, sd = 12)
hybrid <- rnorm(15, mean = 78, sd = 9)

# Create a dataframe
df_kruskal <- data.frame(
  Method = rep(c("Traditional", "Online", "Hybrid"), each = 15),
  Score = c(traditional, online, hybrid)
)

# Display first few rows (kable() comes from the knitr package)
library(knitr)
kable(head(df_kruskal), caption = "First Few Rows of Test Scores Data")
First Few Rows of Test Scores Data

Method        Score
Traditional   83.70958
Traditional   64.35302
Traditional   73.63128
Traditional   76.32863
Traditional   74.04268
Traditional   68.93875
# Perform Kruskal-Wallis Test
kruskal_test <- kruskal.test(Score ~ Method, data = df_kruskal)

# Print test results
kruskal_test
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Score by Method
## Kruskal-Wallis chi-squared = 0.49237, df = 2, p-value = 0.7818

Interpretation

If the p-value is less than 0.05, we reject H0:

H0 → All groups have the same distribution. (Or, more formally: The distributions of all groups are identical.)

We conclude that there is a statistically significant difference in distribution between at least one pair of groups. (It’s important to note that this doesn’t tell us which groups are different, just that at least one pair is.)

If the p-value is greater than or equal to 0.05, we fail to reject H0:

H0 → All groups have the same distribution.

We conclude that there is not enough evidence to say that the distributions differ significantly between the groups.

# Visualize the groups with boxplots (ggplot2 assumed installed)
library(ggplot2)
ggplot(df_kruskal, aes(x = Method, y = Score, fill = Method)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Test Scores by Teaching Method",
       x = "Teaching Method",
       y = "Test Score") +
  theme_minimal()

2. Post-Hoc Analysis (Pairwise Comparisons)

If the Kruskal-Wallis test finds a statistically significant difference, we need to perform post-hoc pairwise comparisons to determine which groups are different.

Dunn’s test (or a similar post-hoc test like the Conover-Iman test) is commonly used for these pairwise comparisons after a Kruskal-Wallis test.
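
As a sketch, Dunn’s test is available in the PMCMRplus package as kwAllPairsDunnTest(); the Holm adjustment shown here is one common choice for controlling the familywise error rate. (The worked example below uses the related Nemenyi test instead.)

# Dunn's test with Holm-adjusted p-values (assumes PMCMRplus is installed)
library(PMCMRplus)
kwAllPairsDunnTest(Score ~ Method, data = df_kruskal,
                   p.adjust.method = "holm")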

# Ensure PMCMRplus is installed
if (!requireNamespace("PMCMRplus", quietly = TRUE)) {
  install.packages("PMCMRplus")
}

# Load necessary libraries
library(PMCMRplus)
library(dplyr)

# Check for missing values in Method and Score columns
df_kruskal <- df_kruskal %>% 
  filter(!is.na(Method) & !is.na(Score)) # Remove NAs if any

# Ensure that 'Method' is a factor with valid levels
df_kruskal$Method <- as.factor(df_kruskal$Method)

# Perform the Kruskal-Wallis test
kruskal_test <- kruskal.test(Score ~ Method, data = df_kruskal)

# Check the structure of the data
str(df_kruskal)  # Make sure 'Method' is a factor and 'Score' is numeric
## 'data.frame':    45 obs. of  2 variables:
##  $ Method: Factor w/ 3 levels "Hybrid","Online",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ Score : num  83.7 64.4 73.6 76.3 74 ...
# Perform post-hoc pairwise comparisons using Nemenyi Test
posthoc_test <- kwAllPairsNemenyiTest(Score ~ Method, data = df_kruskal)

# Print post-hoc test results
posthoc_test
##             Hybrid Online
## Online      0.77   -     
## Traditional 0.98   0.88

Kruskal-Wallis Test Results Interpretation

The Kruskal-Wallis test evaluates whether there is a significant difference in the distribution of scores across multiple independent groups.

Test 1: Scores by Department

\[ \text{H}_{0}: \text{All departments have the same score distribution.} \] \[ \text{H}_{A}: \text{At least one department has a different score distribution.} \]

\[ \chi^2 = 11.22, \quad df = 2, \quad p = 0.003661 \]

  • Since the p-value = 0.003661 is less than 0.05, we reject \(H_0\).
  • This means there is a statistically significant difference in scores between at least one pair of departments.

Test 2: Scores by Teaching Method

\[ \text{H}_{0}: \text{All teaching methods have the same score distribution.} \] \[ \text{H}_{A}: \text{At least one method has a different score distribution.} \]

\[ \chi^2 = 0.49237, \quad df = 2, \quad p = 0.7818 \]

  • Since the p-value = 0.7818 is greater than 0.05, we fail to reject \(H_0\).
  • This means there is no statistically significant difference in scores between teaching methods.

Post-Hoc Pairwise Comparisons

The Nemenyi post-hoc test is used to identify which specific groups differ significantly.

\[ \begin{array}{c|ccc} & \text{Hybrid} & \text{Online} & \text{Traditional} \\ \hline \text{Online} & 0.77 & - & - \\ \text{Traditional} & 0.98 & 0.88 & - \\ \end{array} \]

  • The table entries are p-values for each pairwise comparison.
  • Since all p-values are greater than 0.05, none of the pairwise differences between teaching methods are statistically significant.
  • Note that this post-hoc test was applied to the teaching-method data, where the overall Kruskal-Wallis test was already non-significant, so the lack of significant pairwise differences is expected; a post-hoc test is normally reserved for a significant overall result, such as the department comparison above.

Conclusion

  1. Scores differ significantly between departments (\(p = 0.003661\)); a post-hoc test on the department data would be needed to identify which pairs differ.
  2. Scores do not differ significantly between teaching methods (\(p = 0.7818\)), meaning all teaching methods perform similarly.
  3. The Nemenyi post-hoc test on the teaching-method data confirms that none of those pairwise differences are statistically significant.
  4. Further investigation with larger samples may help clarify the results.


3. Real Dataset: PlantGrowth in R

The PlantGrowth dataset records the weights of plants under three different conditions:

  • ctrl (Control)
  • trt1 (Treatment 1)
  • trt2 (Treatment 2)

We will compare plant weights across these three conditions using the Kruskal-Wallis test.

# Load dataset
data("PlantGrowth")

# Perform Kruskal-Wallis Test
kruskal_real <- kruskal.test(weight ~ group, data = PlantGrowth)

# Display first few rows
kable(head(PlantGrowth), caption = "First Few Rows of Plant Growth Data")
First Few Rows of Plant Growth Data

weight   group
4.17     ctrl
5.58     ctrl
5.18     ctrl
6.11     ctrl
4.50     ctrl
4.61     ctrl
# Print test result
kruskal_real
## 
##  Kruskal-Wallis rank sum test
## 
## data:  weight by group
## Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842

Interpretation

If the p-value is less than 0.05, we reject H0:

H0 → The plant weights are identically distributed across all three conditions.

We conclude that there is a statistically significant difference in plant weight between at least one pair of the three conditions (ctrl, trt1, and trt2).

If the p-value is greater than or equal to 0.05, we fail to reject H0:

H0 → The plant weights are identically distributed across all three conditions.

We conclude that there is not enough evidence to say that the plant weights differ significantly across the three conditions.
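
Because the PlantGrowth result is significant (p = 0.01842), a post-hoc test is warranted here; a sketch reusing the Nemenyi test from Section 2:

# Post-hoc pairwise comparisons for the significant PlantGrowth result
library(PMCMRplus)
kwAllPairsNemenyiTest(weight ~ group, data = PlantGrowth)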

ggplot(PlantGrowth, aes(x = group, y = weight, fill = group)) +
  geom_boxplot(alpha = 0.7) +
  labs(title = "Plant Growth by Treatment Group",
       x = "Treatment Group",
       y = "Plant Weight") +
  theme_minimal()

4. Hands-On Exercise: Employee Productivity

Scenario

A company evaluates employee productivity under three different shift schedules.

\[ \begin{array}{|c|c|c|c|} \hline \textbf{Employee} & \textbf{Morning} & \textbf{Afternoon} & \textbf{Night} \\ \hline 1 & 80 & 85 & 78 \\ 2 & 75 & 82 & 76 \\ 3 & 78 & 88 & 80 \\ 4 & 82 & 90 & 85 \\ 5 & 79 & 87 & 79 \\ 6 & 84 & 92 & 83 \\ 7 & 81 & 89 & 81 \\ 8 & 77 & 86 & 77 \\ \hline \end{array} \]

Task for Students

  1. Create vectors in R for Morning, Afternoon, and Night shift productivity:
morning <- c(80, 75, 78, 82, 79, 84, 81, 77)
afternoon <- c(85, 82, 88, 90, 87, 92, 89, 86)
night <- c(78, 76, 80, 85, 79, 83, 81, 77)

# Create dataframe
df_productivity <- data.frame(
  Shift = rep(c("Morning", "Afternoon", "Night"), each = 8),
  Productivity = c(morning, afternoon, night)
)

# Kruskal-Wallis Test
kruskal_exercise <- kruskal.test(Productivity ~ Shift, data = df_productivity)

# Print results
kruskal_exercise
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Productivity by Shift
## Kruskal-Wallis chi-squared = 13.561, df = 2, p-value = 0.001136
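
Since this result is significant (p = 0.001136), a natural follow-up is to ask which shifts differ. One base-R option (a sketch using Holm-adjusted pairwise Wilcoxon rank-sum tests rather than a dedicated Kruskal-Wallis post-hoc test):

# Pairwise Wilcoxon rank-sum tests with Holm adjustment (base R)
pairwise.wilcox.test(df_productivity$Productivity, df_productivity$Shift,
                     p.adjust.method = "holm")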

Conclusion

The Kruskal-Wallis test is a robust non-parametric alternative to ANOVA, especially when:

  • Data are not normally distributed.
  • You are comparing more than two independent groups.
  • The focus is on ranks rather than absolute values.

It is widely used in fields like medicine, psychology, and social sciences, where the assumption of normality is often not met.

References

  • Hollander, M., Wolfe, D. A., & Chicken, E. (2015). Nonparametric Statistical Methods. Wiley.
  • Gibbons, J. D., & Chakraborti, S. (2010). Nonparametric Statistical Inference. CRC Press.
  • Wickham, H., & Grolemund, G. (2017). R for Data Science. O’Reilly Media.