The Kruskal-Wallis Test is a nonparametric alternative to one-way ANOVA used to compare three or more independent groups. It tests whether at least one group tends to yield larger or smaller values than the others; when the group distributions have similar shapes, this can be read as a difference in medians.
This test is useful when:
- The assumptions of ANOVA (normality and homogeneity of variance) are violated.
- The sample size is small, making parametric tests unreliable.
- The data are ordinal or non-normally distributed.
The test was developed by William Kruskal and W. Allen Wallis in 1952 as an extension of the Wilcoxon rank-sum test to multiple groups.
One-way ANOVA assumes:
1. Normality: Data within each group follow a normal distribution.
2. Homogeneity of Variance: All groups have equal variance.
3. Interval Data: The data are measured on a numerical scale.
However, in many real-world situations:
- The data are skewed or non-normally distributed.
- The sample sizes are small and unequal.
- The data are ordinal (e.g., satisfaction ratings: 1-5).
Imagine a study comparing customer satisfaction across three different stores. The satisfaction ratings (on a 1-10 scale) may not be normally distributed. The Kruskal-Wallis Test provides a way to compare the stores without assuming normality.
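Whether the ANOVA assumptions listed above hold can be checked informally before choosing a test. The sketch below is one possible approach and is not part of the original analysis: it uses R's built-in PlantGrowth data, the Shapiro-Wilk test on the ANOVA residuals for normality, and Bartlett's test for equal variances. The choice of these two diagnostics is an assumption made for illustration.

```r
# Illustrative assumption checks on the built-in PlantGrowth data
data("PlantGrowth")

# Fit the one-way ANOVA model so that its residuals can be inspected
anova_fit <- aov(weight ~ group, data = PlantGrowth)

# Normality: Shapiro-Wilk test on the model residuals
normality_check <- shapiro.test(residuals(anova_fit))

# Homogeneity of variance: Bartlett's test across the groups
variance_check <- bartlett.test(weight ~ group, data = PlantGrowth)

print(normality_check$p.value)
print(variance_check$p.value)
```

A small p-value in either check would suggest that the corresponding assumption is doubtful, which is one reason to prefer the Kruskal-Wallis test.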
We have \(k\) independent groups with sample sizes:
\[ X_{11}, X_{12}, \dots, X_{1n_1} \quad \text{(Group 1, size \( n_1 \))} \] \[ X_{21}, X_{22}, \dots, X_{2n_2} \quad \text{(Group 2, size \( n_2 \))} \] \[ \vdots \] \[ X_{k1}, X_{k2}, \dots, X_{kn_k} \quad \text{(Group \( k \), size \( n_k \))} \]
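Pool the observations from all \(k\) groups and rank them from smallest to largest, giving tied values the average of their ranks. Then, for each group \(i\), compute the sum of ranks: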
\[ R_i = \sum \text{(Ranks in Group \( i \))} \]
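The ranking step can be sketched in R as follows. The two tiny groups `A` and `B` and their values are hypothetical, chosen only so that a tie appears; `rank()` assigns average ranks to ties by default.

```r
# Two tiny hypothetical groups (values chosen only to illustrate a tie)
values <- c(3, 7, 7, 1, 5, 9)
groups <- factor(c("A", "A", "A", "B", "B", "B"))

# Rank all observations together; tied values share the average rank
pooled_ranks <- rank(values)

# Sum of ranks within each group (the R_i in the formula)
rank_sums <- tapply(pooled_ranks, groups, sum)

print(pooled_ranks)
print(rank_sums)
```

The two 7s occupy positions 4 and 5 in the sorted data, so each receives rank 4.5, and the rank sums are 11 for group A and 10 for group B.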
The test statistic is given by:
\[ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) \]
where:
- \(N\) is the total sample size (\(N = n_1 + n_2 + \dots + n_k\)).
- \(n_i\) is the sample size of group \(i\).
- \(R_i\) is the sum of ranks for group \(i\).
\[ H \sim \chi^2_{k-1} \]
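Under H0, and when the group sizes are not too small, \(H\) approximately follows this chi-square distribution, where \(k-1\) is the degrees of freedom.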
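The chi-square approximation turns an observed \(H\) into a decision. The sketch below assumes three groups and uses a hypothetical statistic `H_observed <- 7.2`; neither value comes from the examples in this tutorial.

```r
k     <- 3      # number of groups (hypothetical)
alpha <- 0.05   # significance level

# Critical value of the chi-square distribution with k - 1 degrees of freedom
critical_value <- qchisq(1 - alpha, df = k - 1)

# Hypothetical observed statistic and its upper-tail p-value
H_observed <- 7.2
p_value    <- pchisq(H_observed, df = k - 1, lower.tail = FALSE)

# Reject H0 when H exceeds the critical value (equivalently, p_value < alpha)
reject_h0 <- H_observed > critical_value

print(critical_value)
print(p_value)
print(reject_h0)
```

With two degrees of freedom the critical value is about 5.99, so any \(H\) above that threshold leads to rejection at the 5% level.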
A company wants to compare employee job satisfaction levels across three departments: HR, IT, and Marketing. The satisfaction scores (out of 100) are:
\[ \begin{array}{|c|c|c|} \hline \textbf{HR} & \textbf{IT} & \textbf{Marketing} \\ \hline 70 & 85 & 78 \\ 75 & 80 & 82 \\ 72 & 88 & 79 \\ 68 & 82 & 81 \\ 77 & 87 & 80 \\ \hline \end{array} \]
# Sample Data: Job Satisfaction Scores
hr <- c(70, 75, 72, 68, 77)
it <- c(85, 80, 88, 82, 87)
marketing <- c(78, 82, 79, 81, 80)
# Combine into a Data Frame
satisfaction_data <- data.frame(
score = c(hr, it, marketing),
department = rep(c("HR", "IT", "Marketing"), each = 5)
)
# Perform Kruskal-Wallis Test in R
kruskal.test(score ~ department, data = satisfaction_data)
##
## Kruskal-Wallis rank sum test
##
## data: score by department
## Kruskal-Wallis chi-squared = 11.22, df = 2, p-value = 0.003661
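The reported statistic can be reproduced by hand from the formula for \(H\). The sketch below recomputes the rank sums for the job satisfaction data and then applies the tie-correction factor that `kruskal.test()` uses internally; that correction is not part of the formula given earlier and is needed here because the scores 80 and 82 each occur twice. Variable names such as `H_raw` and `tie_factor` are chosen for illustration.

```r
# Job satisfaction scores from the example
hr        <- c(70, 75, 72, 68, 77)
it        <- c(85, 80, 88, 82, 87)
marketing <- c(78, 82, 79, 81, 80)

score <- c(hr, it, marketing)
group <- factor(rep(c("HR", "IT", "Marketing"), each = 5))

N <- length(score)
r <- rank(score)                  # pooled ranks, average ranks for ties
R <- tapply(r, group, sum)        # rank sums R_i
n <- tapply(r, group, length)     # group sizes n_i

# Statistic from the formula, before any tie correction
H_raw <- 12 / (N * (N + 1)) * sum(R^2 / n) - 3 * (N + 1)

# Tie correction: 1 - sum(t^3 - t) / (N^3 - N), where t is the size of each tie group
tie_counts <- table(score)
tie_factor <- 1 - sum(tie_counts^3 - tie_counts) / (N^3 - N)
H <- H_raw / tie_factor

p_value <- pchisq(H, df = nlevels(group) - 1, lower.tail = FALSE)

print(R)
print(c(H_raw = H_raw, H = H, p_value = p_value))
```

The rank sums are 15 (HR), 62 (IT), and 43 (Marketing), giving an uncorrected value of 11.18; dividing by the tie factor yields the 11.22 shown in the output.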
If the p-value is less than 0.05, we reject H0:
H0 → All groups have the same median job satisfaction score.
We conclude that at least one department has a significantly different median job satisfaction score. In this example p = 0.003661, which is below 0.05, so H0 is rejected.
If the p-value is greater than or equal to 0.05, we fail to reject H0:
H0 → All groups have the same median job satisfaction score.
We conclude that there is not enough evidence to say that the median job satisfaction scores differ significantly between departments.
The Kruskal-Wallis test is useful when comparing non-normally distributed data across multiple groups.
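We compare \(k\) independent groups with sample sizes \(n_1, n_2, \dots, n_k\).
\[ H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1) \]
where:
- \(N\) is the total sample size (\(N = n_1 + n_2 + \dots + n_k\)).
- \(n_i\) is the sample size of group \(i\).
- \(R_i\) is the sum of ranks for group \(i\).

Under H0, H follows an approximate chi-square distribution with \(k - 1\) degrees of freedom.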
A researcher wants to compare test scores among three teaching methods:
# Simulated test scores for three groups
set.seed(42)
traditional <- rnorm(15, mean = 70, sd = 10)
online <- rnorm(15, mean = 75, sd = 12)
hybrid <- rnorm(15, mean = 78, sd = 9)
# Create a dataframe
df_kruskal <- data.frame(
Method = rep(c("Traditional", "Online", "Hybrid"), each = 15),
Score = c(traditional, online, hybrid)
)
# Display first few rows (kable() is provided by the knitr package)
library(knitr)
kable(head(df_kruskal), caption = "First Few Rows of Test Scores Data")
Method | Score |
---|---|
Traditional | 83.70958 |
Traditional | 64.35302 |
Traditional | 73.63128 |
Traditional | 76.32863 |
Traditional | 74.04268 |
Traditional | 68.93875 |
# Perform Kruskal-Wallis Test
kruskal_test <- kruskal.test(Score ~ Method, data = df_kruskal)
# Print test results
kruskal_test
##
## Kruskal-Wallis rank sum test
##
## data: Score by Method
## Kruskal-Wallis chi-squared = 0.49237, df = 2, p-value = 0.7818
If the p-value is less than 0.05, we reject H0:
H0 → All groups have the same distribution. (Or, more formally: The distributions of all groups are identical.)
We conclude that at least one group's distribution differs significantly from the others. (Note that this does not tell us which groups differ, only that at least one pair does.)
If the p-value is greater than or equal to 0.05, we fail to reject H0:
H0 → All groups have the same distribution.
We conclude that there is not enough evidence to say that the distributions differ significantly between the groups. In this example p = 0.7818, which is well above 0.05, so we fail to reject H0.
library(ggplot2)  # provides ggplot(), geom_boxplot(), labs(), theme_minimal()
ggplot(df_kruskal, aes(x = Method, y = Score, fill = Method)) +
geom_boxplot(alpha = 0.7) +
labs(title = "Test Scores by Teaching Method",
x = "Teaching Method",
y = "Test Score") +
theme_minimal()
## 2. Post-Hoc Analysis (Pairwise Comparisons)
If the Kruskal-Wallis test finds a statistically significant difference, we need to perform post-hoc pairwise comparisons to determine which groups are different.
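Dunn’s test (or a similar post-hoc test such as the Conover-Iman test) is commonly used for these pairwise comparisons after a Kruskal-Wallis test. The code below uses the Nemenyi test from the PMCMRplus package instead. Because the Kruskal-Wallis test for the teaching-method data was not significant (p = 0.7818), the post-hoc step is shown here only to illustrate the workflow.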
# Ensure PMCMRplus is installed
if (!requireNamespace("PMCMRplus", quietly = TRUE)) {
install.packages("PMCMRplus")
}
# Load necessary libraries
library(PMCMRplus)
library(dplyr)
# Check for missing values in Method and Score columns
df_kruskal <- df_kruskal %>%
filter(!is.na(Method) & !is.na(Score)) # Remove NAs if any
# Ensure that 'Method' is a factor with valid levels
df_kruskal$Method <- as.factor(df_kruskal$Method)
# Perform the Kruskal-Wallis test
kruskal_test <- kruskal.test(Score ~ Method, data = df_kruskal)
# Check the structure of the data
str(df_kruskal) # Make sure 'Method' is a factor and 'Score' is numeric
## 'data.frame': 45 obs. of 2 variables:
## $ Method: Factor w/ 3 levels "Hybrid","Online",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ Score : num 83.7 64.4 73.6 76.3 74 ...
# Perform post-hoc pairwise comparisons using Nemenyi Test
posthoc_test <- kwAllPairsNemenyiTest(Score ~ Method, data = df_kruskal)
# Print post-hoc test results
posthoc_test
## Hybrid Online
## Online 0.77 -
## Traditional 0.98 0.88
The Kruskal-Wallis test evaluates whether there is a significant difference in the distribution of scores across multiple independent groups.
\[ \text{H}_{0}: \text{All departments have the same score distribution.} \] \[ \text{H}_{A}: \text{At least one department has a different score distribution.} \]
\[ \chi^2 = 11.22, \quad df = 2, \quad p = 0.003661 \]
\[ \text{H}_{0}: \text{All teaching methods have the same score distribution.} \] \[ \text{H}_{A}: \text{At least one method has a different score distribution.} \]
\[ \chi^2 = 0.49237, \quad df = 2, \quad p = 0.7818 \]
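The Nemenyi post-hoc test is used to identify which specific groups differ significantly. In the p-value matrix for the teaching-method example, every entry is well above 0.05, so no pair of methods differs significantly, consistent with the non-significant Kruskal-Wallis result.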
\[ \begin{array}{c|ccc} & \text{Hybrid} & \text{Online} & \text{Traditional} \\ \hline \text{Online} & 0.77 & - & - \\ \text{Traditional} & 0.98 & 0.88 & - \\ \end{array} \]
PlantGrowth in R
The PlantGrowth dataset records the weights of plants under three different conditions:
- ctrl (Control)
- trt1 (Treatment 1)
- trt2 (Treatment 2)

We will compare plant weights across these three conditions using the Kruskal-Wallis test.
# Load dataset
data("PlantGrowth")
# Perform Kruskal-Wallis Test
kruskal_real <- kruskal.test(weight ~ group, data = PlantGrowth)
# Display first few rows
kable(head(PlantGrowth), caption = "First Few Rows of Plant Growth Data")
weight | group |
---|---|
4.17 | ctrl |
5.58 | ctrl |
5.18 | ctrl |
6.11 | ctrl |
4.50 | ctrl |
4.61 | ctrl |
# Print test result
kruskal_real
##
## Kruskal-Wallis rank sum test
##
## data: weight by group
## Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842
If the p-value is less than 0.05, we reject H0:
H0 → The plant weights are identically distributed across all three conditions.
We conclude that there is a statistically significant difference in plant weight between at least one pair of the three conditions (ctrl, trt1, and trt2). In this example p = 0.01842, which is below 0.05, so H0 is rejected.
If the p-value is greater than or equal to 0.05, we fail to reject H0:
H0 → The plant weights are identically distributed across all three conditions.
We conclude that there is not enough evidence to say that the plant weights differ significantly across the three conditions.
ggplot(PlantGrowth, aes(x = group, y = weight, fill = group)) +
geom_boxplot(alpha = 0.7) +
labs(title = "Plant Growth by Treatment Group",
x = "Treatment Group",
y = "Plant Weight") +
theme_minimal()
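Because the Kruskal-Wallis test on PlantGrowth is significant, a follow-up on which conditions differ is natural. The sketch below uses pairwise Wilcoxon rank-sum tests with Holm-adjusted p-values from base R. This is a swapped-in technique, not Dunn's test or the Nemenyi test discussed earlier; it was chosen here only because it needs no extra packages. The argument `exact = FALSE` requests the normal approximation, since the data contain a tied value.

```r
# Pairwise follow-up on the built-in PlantGrowth data (base R only)
data("PlantGrowth")

posthoc_plants <- pairwise.wilcox.test(
  PlantGrowth$weight,          # response
  PlantGrowth$group,           # grouping factor
  p.adjust.method = "holm",    # control the family-wise error rate
  exact = FALSE                # normal approximation (ties are present)
)

# Lower-triangular matrix of adjusted p-values, one entry per pair of groups
print(posthoc_plants$p.value)
```

Pairs whose adjusted p-value falls below 0.05 would be the ones reported as differing.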
A company evaluates employee productivity under three different shift schedules.
Employee | Morning | Afternoon | Night |
---|---|---|---|
1 | 80 | 85 | 78 |
2 | 75 | 82 | 76 |
3 | 78 | 88 | 80 |
4 | 82 | 90 | 85 |
5 | 79 | 87 | 79 |
… | … | … | … |
morning <- c(80, 75, 78, 82, 79, 84, 81, 77)
afternoon <- c(85, 82, 88, 90, 87, 92, 89, 86)
night <- c(78, 76, 80, 85, 79, 83, 81, 77)
# Create dataframe
df_productivity <- data.frame(
Shift = rep(c("Morning", "Afternoon", "Night"), each = 8),
Productivity = c(morning, afternoon, night)
)
# Kruskal-Wallis Test
kruskal_exercise <- kruskal.test(Productivity ~ Shift, data = df_productivity)
# Print results
kruskal_exercise
##
## Kruskal-Wallis rank sum test
##
## data: Productivity by Shift
## Kruskal-Wallis chi-squared = 13.561, df = 2, p-value = 0.001136
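To interpret the exercise result, the p-value can be compared with the significance level, and the mean rank per shift indicates the direction of the difference. The sketch below repeats the exercise data so that it runs on its own; the names `mean_ranks`, `alpha`, and `decision` are illustrative.

```r
# Exercise data: productivity under three shift schedules
morning   <- c(80, 75, 78, 82, 79, 84, 81, 77)
afternoon <- c(85, 82, 88, 90, 87, 92, 89, 86)
night     <- c(78, 76, 80, 85, 79, 83, 81, 77)

df_productivity <- data.frame(
  Shift        = rep(c("Morning", "Afternoon", "Night"), each = 8),
  Productivity = c(morning, afternoon, night)
)

kruskal_exercise <- kruskal.test(Productivity ~ Shift, data = df_productivity)

# Mean pooled rank per shift: higher values indicate higher productivity
mean_ranks <- tapply(rank(df_productivity$Productivity),
                     df_productivity$Shift, mean)

# Decision at the 5% level
alpha    <- 0.05
decision <- if (kruskal_exercise$p.value < alpha) "reject H0" else "fail to reject H0"

print(mean_ranks)
print(decision)
```

The mean ranks are 20 for the afternoon shift against 8.5 for morning and 9 for night, so the significant result is driven by higher afternoon productivity.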
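The Kruskal-Wallis test is a robust non-parametric alternative to ANOVA, especially when the normality or equal-variance assumptions are violated, the samples are small, or the data are ordinal.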
It is widely used in fields like medicine, psychology, and social sciences, where the assumption of normality is often not met.