Name: Sabiena Glen
Date: March 24, 2023
Kruskal-Wallis test by rank is a non-parametric alternative to one-way ANOVA test, which extends the two-samples Wilcoxon test in the situation where there are more than two groups. It’s recommended when the assumptions of one-way ANOVA test are not met.
We’ll use the built-in R data set named PlantGrowth. It contains the weight of plants obtained under a control and two different treatment conditions.
The dataset PlantGrowth is attached and is displayed below.
attach(PlantGrowth)
df <- PlantGrowth
table(df)
## group
## weight ctrl trt1 trt2
## 3.59 0 1 0
## 3.83 0 1 0
## 4.17 1 1 0
## 4.32 0 1 0
## 4.41 0 1 0
## 4.5 1 0 0
## 4.53 1 0 0
## 4.61 1 0 0
## 4.69 0 1 0
## 4.81 0 1 0
## 4.89 0 1 0
## 4.92 0 0 1
## 5.12 0 0 1
## 5.14 1 0 0
## 5.17 1 0 0
## 5.18 1 0 0
## 5.26 0 0 1
## 5.29 0 0 1
## 5.33 1 0 0
## 5.37 0 0 1
## 5.5 0 0 1
## 5.54 0 0 1
## 5.58 1 0 0
## 5.8 0 0 1
## 5.87 0 1 0
## 6.03 0 1 0
## 6.11 1 0 0
## 6.15 0 0 1
## 6.31 0 0 1
The dataset PlantGrowth is checked below.
# load dataset
attach(PlantGrowth)
## The following objects are masked from PlantGrowth (pos = 3):
##
## group, weight
# print the head of the dataset
head(PlantGrowth)
## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
## 6 4.61 ctrl
# print the dimensions of the dataset
dim(PlantGrowth)
## [1] 30 2
# print the names of the columns in the dataset
names(PlantGrowth)
## [1] "weight" "group"
In R terminology, the column “group” is called factor and the different categories (“ctr”, “trt1”, “trt2”) are named factor levels. The levels are ordered alphabetically.
The group levels are shown below.
# Show the group levels
attach(PlantGrowth)
## The following objects are masked from PlantGrowth (pos = 3):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 4):
##
## group, weight
my_data <- PlantGrowth
levels(my_data$group)
## [1] "ctrl" "trt1" "trt2"
If the levels are not automatically in the correct order, re-order them as follows:
attach(PlantGrowth)
## The following objects are masked from PlantGrowth (pos = 3):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 4):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 5):
##
## group, weight
my_data <- PlantGrowth
my_data$group <- factor(my_data$group,
levels = c('ctrl', 'trt1', 'trt2'))
levels(my_data$group)
## [1] "ctrl" "trt1" "trt2"
It’s possible to compute summary statistics by groups. The dplyr package can be used.
#Compute summary statistics by groups:
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.2.3
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
attach(PlantGrowth)
## The following objects are masked from PlantGrowth (pos = 4):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 5):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 6):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 7):
##
## group, weight
my_data <- PlantGrowth
group_by(my_data, group) %>%
summarise(
count = n(),
mean = mean(weight, na.rm = TRUE),
sd = sd(weight, na.rm = TRUE),
median = median(weight, na.rm = TRUE),
IQR = IQR(weight, na.rm = TRUE)
)
## # A tibble: 3 × 6
## group count mean sd median IQR
## <fct> <int> <dbl> <dbl> <dbl> <dbl>
## 1 ctrl 10 5.03 0.583 5.15 0.743
## 2 trt1 10 4.66 0.794 4.55 0.662
## 3 trt2 10 5.53 0.443 5.44 0.467
The boxplot of the weight column of the dataset PlantGrowth is shown below.
# Boxplot of weight
boxplot(PlantGrowth$weight)
The boxplot of the group column of the dataset PlantGrowth is shown below.
# Boxplot of group
boxplot(PlantGrowth$group)
A boxplot that plots weight by group and colour by group.
attach(PlantGrowth)
## The following objects are masked from PlantGrowth (pos = 3):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 5):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 6):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 7):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 8):
##
## group, weight
my_data <- PlantGrowth
library(ggpubr)
## Warning: package 'ggpubr' was built under R version 4.2.3
## Loading required package: ggplot2
ggboxplot(my_data, x = "group", y = "weight",
color = "group", palette = c("#00AFBB", "#E7B800", "#FC4E07"),
order = c("ctrl", "trt1", "trt2"),
ylab = "Weight", xlab = "Treatment")
library("ggpubr")
ggline(my_data, x = "group", y = "weight",
add = c("mean_se", "jitter"),
order = c("ctrl", "trt1", "trt2"),
ylab = "Weight", xlab = "Treatment")
The Kruskal-Wallis test is done and the results are shown below.
attach(PlantGrowth)
## The following objects are masked from PlantGrowth (pos = 5):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 6):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 8):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 9):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 10):
##
## group, weight
## The following objects are masked from PlantGrowth (pos = 11):
##
## group, weight
my_data <- PlantGrowth
kruskal.test(weight ~ group, data = my_data)
##
## Kruskal-Wallis rank sum test
##
## data: weight by group
## Kruskal-Wallis chi-squared = 7.9882, df = 2, p-value = 0.01842
As the p-value is less than the significance level 0.05, we can conclude that there are significant differences between the treatment groups.
From the output of the Kruskal-Wallis test, we know that there is a significant difference between groups, but we don’t know which pairs of groups are different.
It’s possible to use the function pairwise.wilcox.test() to calculate pairwise comparisons between group levels with corrections for multiple testing.
pairwise.wilcox.test(PlantGrowth$weight, PlantGrowth$group,
p.adjust.method = "BH")
## Warning in wilcox.test.default(xi, xj, paired = paired, ...): cannot compute
## exact p-value with ties
##
## Pairwise comparisons using Wilcoxon rank sum test with continuity correction
##
## data: PlantGrowth$weight and PlantGrowth$group
##
## ctrl trt1
## trt1 0.199 -
## trt2 0.095 0.027
##
## P value adjustment method: BH
The pairwise comparison shows that, only trt1 and trt2 are significantly different (p < 0.05).