Kruskal-Wallis Test
A collection of data samples is independent if the samples come from unrelated populations and do not affect each other. Using the Kruskal-Wallis test, we can decide whether the population distributions are identical without assuming that they follow a normal distribution.
The Kruskal-Wallis test by rank is a non-parametric alternative to the one-way ANOVA test. It extends the two-sample Wilcoxon test to situations with more than two groups, and it is recommended when the assumptions of one-way ANOVA are not met.
PlantGrowth Data
Here, we’ll use the built-in R data set named PlantGrowth. It contains the weight of plants obtained under a control and two different treatment conditions.
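The rows printed below look like the result of a `head()` call. The original command is not shown, so the following is a minimal sketch of how the data set can be loaded and inspected:

```r
# PlantGrowth ships with base R (datasets package), so nothing needs installing
data("PlantGrowth")

# Show the first six rows: plant weight and the experimental group
head(PlantGrowth)
```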
##   weight group
## 1   4.17  ctrl
## 2   5.58  ctrl
## 3   5.18  ctrl
## 4   6.11  ctrl
## 5   4.50  ctrl
## 6   4.61  ctrl
The column “group” is called a factor, and its categories (“ctrl”, “trt1”, “trt2”) are called factor levels. By default, the levels are ordered alphabetically.
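The levels can be listed with `levels()`. The call below is a sketch (the variable name `group_levels` is only illustrative) and should reproduce the output that follows:

```r
# List the factor levels of the grouping column
group_levels <- levels(PlantGrowth$group)
group_levels
```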
## [1] "ctrl" "trt1" "trt2"
If the levels are not automatically in the correct order, re-order them as follows:
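The re-ordering command itself is not shown. One way to do it, sketched here on the assumption that `ordered()` was used (the `<ord>` column type in the summary output further down suggests so), is:

```r
# Work on a local copy of the built-in data set and fix the level order.
# ordered() is an assumption; factor(..., levels = ...) would also set the order.
PlantGrowth$group <- ordered(PlantGrowth$group,
                             levels = c("ctrl", "trt1", "trt2"))

# Confirm the new order
levels(PlantGrowth$group)
```

For PlantGrowth the alphabetical order already matches the desired order, so this step only changes the column type from a plain factor to an ordered factor.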
Compute summary statistics by group:
library(dplyr)  # provides %>%, group_by(), summarise() and n()
PlantGrowth %>%
  group_by(group) %>%
  summarise(
    count  = n(),
    mean   = mean(weight, na.rm = TRUE),
    sd     = sd(weight, na.rm = TRUE),
    median = median(weight, na.rm = TRUE),
    IQR    = IQR(weight, na.rm = TRUE)
  )
## # A tibble: 3 x 6
##   group count  mean    sd median   IQR
##   <ord> <int> <dbl> <dbl>  <dbl> <dbl>
## 1 ctrl     10  5.03 0.583   5.15 0.743
## 2 trt1     10  4.66 0.794   4.55 0.662
## 3 trt2     10  5.53 0.443   5.44 0.467
Visualize the data using box plots
Here we will use the ggpubr R package for an easy ggplot2-based data visualization.
library("ggpubr")
# Box plots of weight by group
ggboxplot(PlantGrowth, x = "group", y = "weight",
          color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
          order = c("ctrl", "trt1", "trt2"),
          ylab = "Weight", xlab = "Treatment") +
  ggtitle("Kruskal-Wallis Test in R")
# Mean plot with standard-error bars and jittered data points
ggline(PlantGrowth, x = "group", y = "weight",
       add = c("mean_se", "jitter"),
       order = c("ctrl", "trt1", "trt2"),
       ylab = "Weight", xlab = "Treatment") +
  ggtitle("Kruskal-Wallis Test in R")
Compute Kruskal-Wallis test
We want to know whether the distribution of plant weight differs significantly among the 3 experimental conditions.
The test can be performed using the function kruskal.test() as follows:
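The call is not shown in the text. Judging from the `data: weight by group` line in the output below, it was most likely the formula interface, sketched here (the name `kw_res` is only illustrative):

```r
# Kruskal-Wallis rank sum test of weight across the three groups
kw_res <- kruskal.test(weight ~ group, data = PlantGrowth)
kw_res
```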
##
## Kruskal-Wallis rank sum test
##
## data: weight by group
## Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842
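To see where the chi-squared value comes from, the statistic can be rebuilt by hand from the pooled ranks. The sketch below is illustrative only and uses hypothetical variable names. It applies the usual tie-corrected formula, H = [12 / (N(N+1)) · Σ R_i² / n_i − 3(N+1)] / [1 − Σ(t³ − t) / (N³ − N)], where R_i is the rank sum of group i, n_i its size, and t the size of each set of tied values:

```r
# Reproduce the Kruskal-Wallis statistic from the ranks (illustrative only)
w <- PlantGrowth$weight
g <- PlantGrowth$group
N <- length(w)

r         <- rank(w)               # mid-ranks: tied values share the average rank
rank_sums <- tapply(r, g, sum)     # R_i: sum of ranks in each group
n_i       <- tapply(r, g, length)  # n_i: number of observations in each group

# H before the tie correction
H_raw <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

# Tie correction, where each table entry is the size t of one set of equal values
ties <- table(w)
H    <- H_raw / (1 - sum(ties^3 - ties) / (N^3 - N))

# Compare H with a chi-squared distribution on (number of groups - 1) df
dof     <- nlevels(g) - 1
p_value <- pchisq(H, dof, lower.tail = FALSE)
c(H = H, df = dof, p.value = p_value)
```

The values of `H` and `p_value` should agree with the statistic and p-value reported by kruskal.test().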
Interpretation
As the p-value (0.01842) is less than the significance level 0.05, we can conclude that there are significant differences between the treatment groups, that is, at least one group differs from the others.
Multiple pairwise comparisons between groups
From the output of the Kruskal-Wallis test, we know that there is a significant difference between groups, but we don’t know which pairs of groups are different.
It’s possible to use the function pairwise.wilcox.test() to calculate pairwise comparisons between group levels with corrections for multiple testing.
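The call is not shown in the text. Based on the `data:` line and the `BH` adjustment reported in the output below, it can be sketched as follows (the name `pw_res` is only illustrative):

```r
# Pairwise Wilcoxon rank sum tests with Benjamini-Hochberg (BH) adjustment
pw_res <- pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                               p.adjust.method = "BH")
pw_res
```

The warning about ties is expected here: some weights occur more than once, so R falls back to a normal approximation instead of an exact p-value.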
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
##
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data: PlantGrowth$weight and PlantGrowth$group
##
##      ctrl  trt1
## trt1 0.199 -
## trt2 0.095 0.027
##
## P value adjustment method: BH
The pairwise comparison shows that only trt1 and trt2 are significantly different (adjusted p = 0.027 < 0.05).
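With more groups, reading the p-value matrix by eye gets tedious. As a sketch, the significant pairs can be pulled out of the `p.value` matrix returned by pairwise.wilcox.test(). The names `pw`, `p_mat` and `sig_pairs` are illustrative, and the 0.05 cutoff is the significance level used above:

```r
# Re-run the pairwise tests, silencing the expected "ties" warning
pw <- suppressWarnings(
  pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
                       p.adjust.method = "BH")
)

# Lower-triangular matrix of adjusted p-values (unused cells are NA)
p_mat <- pw$p.value

# Positions of adjusted p-values below 0.05; NA cells are skipped by which()
sig <- which(p_mat < 0.05, arr.ind = TRUE)

# One row per significant pair
sig_pairs <- data.frame(
  group1 = colnames(p_mat)[sig[, "col"]],
  group2 = rownames(p_mat)[sig[, "row"]],
  p_adj  = p_mat[sig]
)
sig_pairs
```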