Friedman Test in R

Data Preparation

We’ll use the selfesteem dataset from the datarium package, which contains the self-esteem scores of 10 individuals measured at three time points during a specific diet.

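# Packages: tidyverse (pipes, gather), ggpubr (plots), rstatix (tests), rmarkdown (paged_table)
library(tidyverse); library(ggpubr); library(rstatix); library(rmarkdown)
data("selfesteem", package = "datarium")
paged_table(head(selfesteem, 3))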

Gather columns t1, t2 and t3 into long format, then convert the id and time variables into factor (grouping) variables:

selfesteem <- selfesteem %>%
  gather(key = "time", value = "score", t1, t2, t3) %>%
  convert_as_factor(id, time)
paged_table(head(selfesteem, 3))
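
For comparison, the same wide-to-long reshaping can be sketched in base R with reshape(), for readers who prefer not to load tidyr. The small wide table below is hypothetical toy data, not the selfesteem dataset:

```r
# Hypothetical wide data: one row per subject, one column per time point
wide <- data.frame(id = 1:3, t1 = c(4, 5, 3), t2 = c(6, 7, 5), t3 = c(8, 9, 7))

# Stack t1, t2, t3 into a single "score" column, labelled by "time"
long <- reshape(
  wide,
  direction = "long",
  varying   = c("t1", "t2", "t3"),
  v.names   = "score",
  timevar   = "time",
  times     = c("t1", "t2", "t3"),
  idvar     = "id"
)
rownames(long) <- NULL

# Treat id and time as grouping factors, as convert_as_factor() does above
long$id   <- factor(long$id)
long$time <- factor(long$time)
print(long)
```

Rows come out grouped by time (all t1 rows, then t2, then t3), the same ordering that gather() produces.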

Summary Statistics

Compute some summary statistics of the self-esteem score by groups (time):

a <- selfesteem %>%
  group_by(time) %>%
  get_summary_stats(score, type = "common")
paged_table(a)
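
A base-R sketch of a few of the same per-group statistics (n, mean, sd, median, IQR) using split() and sapply(). It runs on hypothetical toy data and does not reproduce every column that get_summary_stats(type = "common") returns:

```r
# Hypothetical long-format toy data (not the selfesteem dataset)
long <- data.frame(
  time  = factor(rep(c("t1", "t2", "t3"), each = 5)),
  score = c(4, 5, 3, 6, 4,   # t1
            6, 7, 5, 5, 8,   # t2
            8, 9, 7, 9, 7)   # t3
)

# Statistics computed for one group's scores
summarise_scores <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x), median = median(x), iqr = IQR(x))
}

# split() gives one numeric vector per time point; sapply() returns one column
# per group, so t() puts the groups on the rows
summary_tbl <- t(sapply(split(long$score, long$time), summarise_scores))
print(round(summary_tbl, 3))
```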

Visualization

Create a box plot and add jittered points corresponding to the individual values:

ggboxplot(selfesteem, x = "time", y = "score", add = "jitter")
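
If ggpubr is not available, a similar plot can be sketched with base graphics: boxplot() draws the boxes and stripchart() overlays the jittered individual points. This is a swapped-in base-R technique, not the ggpubr API, and the data are hypothetical toy values:

```r
# Hypothetical long-format toy data (not the selfesteem dataset)
long <- data.frame(
  time  = factor(rep(c("t1", "t2", "t3"), each = 5)),
  score = c(4, 5, 3, 6, 4, 6, 7, 5, 5, 8, 8, 9, 7, 9, 7)
)

# outline = FALSE hides outlier symbols, since every point is drawn below anyway
bp <- boxplot(score ~ time, data = long, outline = FALSE,
              xlab = "time", ylab = "score")

# Overlay individual values, jittered horizontally, at the same x positions
stripchart(score ~ time, data = long, vertical = TRUE,
           method = "jitter", add = TRUE, pch = 19)
```

boxplot() also returns its five-number summaries invisibly, so bp$stats[3, ] holds the per-group medians.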

Computation

We’ll use the pipe-friendly friedman_test() function [rstatix package], a wrapper around the R base function friedman.test().

res.fried <- selfesteem %>% friedman_test(score ~ time | id)
paged_table(res.fried)

The self-esteem score was statistically significantly different at the different time points during the diet, χ²(2) = 18.2, p = 0.00011.
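
Because friedman_test() wraps the base function friedman.test(), the base function can also be called directly. The sketch below uses a small hypothetical matrix of 5 subjects by 3 time points (not the selfesteem data) and shows both the matrix interface and the formula interface used above:

```r
# Hypothetical toy data: rows are subjects (blocks), columns are time points
scores <- matrix(
  c(4, 6, 8,
    5, 7, 9,
    3, 5, 7,
    6, 5, 9,
    4, 8, 7),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("s", 1:5), c("t1", "t2", "t3"))
)

# Matrix interface: each row is a block, each column a treatment
res <- friedman.test(scores)
print(res)

# Formula interface on long-format data: score ~ time | id
long <- data.frame(
  id    = factor(rep(rownames(scores), times = ncol(scores))),
  time  = factor(rep(colnames(scores), each = nrow(scores))),
  score = as.vector(scores)   # column-major: all t1, then t2, then t3
)
res2 <- friedman.test(score ~ time | id, data = long)
```

Ranking within each subject gives rank sums of 6, 10 and 14, so the statistic is 12 × ((6 - 10)² + (10 - 10)² + (14 - 10)²) / (5 × 3 × 4) = 6.4 on 2 degrees of freedom.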

Effect Size

Kendall’s W can be used as a measure of the Friedman test effect size. It is calculated as follows: W = χ² / (N(k - 1)), where W is Kendall’s W value, χ² is the Friedman test statistic, N is the sample size (number of subjects) and k is the number of measurements per subject (M. T. Tomczak and Tomczak 2014). For the self-esteem data, with N = 10 subjects and k = 3 time points, W = 18.2 / (10 × 2) = 0.91.
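
The formula can be checked numerically. This sketch computes W from the Friedman statistic on a hypothetical toy matrix, cross-checks it against Kendall’s classic rank-sum definition W = 12S / (N²(k³ - k)) (without tie correction; the toy rows contain no ties), and reproduces the 0.91 reported for selfesteem from χ² = 18.2, N = 10 and k = 3:

```r
# Hypothetical toy data: rows are subjects, columns are time points
scores <- matrix(
  c(4, 6, 8, 5, 7, 9, 3, 5, 7, 6, 5, 9, 4, 8, 7),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("s", 1:5), c("t1", "t2", "t3"))
)
N <- nrow(scores)  # number of subjects
k <- ncol(scores)  # measurements per subject

# W from the Friedman chi-squared statistic
chi2 <- unname(friedman.test(scores)$statistic)
w_from_chi2 <- chi2 / (N * (k - 1))

# W from within-subject rank sums: S is the squared deviation of the rank sums
ranks <- t(apply(scores, 1, rank))
R <- colSums(ranks)
S <- sum((R - mean(R))^2)
w_from_ranks <- 12 * S / (N^2 * (k^3 - k))

# The document's own numbers
w_doc <- 18.2 / (10 * (3 - 1))
print(c(w_from_chi2 = w_from_chi2, w_from_ranks = w_from_ranks, w_doc = w_doc))
```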

Kendall’s W takes values from 0 (indicating no relationship) to 1 (indicating a perfect relationship).

Kendall’s W is interpreted using Cohen’s guidelines: 0.1 to < 0.3 (small effect), 0.3 to < 0.5 (moderate effect) and >= 0.5 (large effect). In friedman_effsize(), confidence intervals are calculated by bootstrap when requested with ci = TRUE.

b <- selfesteem %>% friedman_effsize(score ~ time | id)
paged_table(b)

A large effect size is detected, W = 0.91.
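
The interpretation thresholds and the bootstrap confidence interval can be sketched in base R. This is not rstatix’s implementation: it is a plain percentile bootstrap that resamples subjects (rows) with replacement on hypothetical toy data. Values below 0.1 are not named by the guidelines above, so the "negligible" label is an assumption:

```r
# Map W onto the thresholds; right = FALSE makes each interval [lower, upper)
interpret_w <- function(w) {
  cut(w, breaks = c(-Inf, 0.1, 0.3, 0.5, Inf), right = FALSE,
      labels = c("negligible", "small", "moderate", "large"))
}

# Kendall's W from the Friedman statistic: chi2 / (N * (k - 1))
kendall_w <- function(m) {
  unname(friedman.test(m)$statistic) / (nrow(m) * (ncol(m) - 1))
}

# Percentile bootstrap CI: resample subjects with replacement, recompute W
boot_ci_w <- function(m, nboot = 1000, conf = 0.95) {
  ws <- replicate(nboot, kendall_w(m[sample(nrow(m), replace = TRUE), , drop = FALSE]))
  alpha <- 1 - conf
  quantile(ws, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

# Hypothetical toy data: rows are subjects, columns are time points
scores <- matrix(
  c(4, 6, 8, 5, 7, 9, 3, 5, 7, 6, 5, 9, 4, 8, 7),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("s", 1:5), c("t1", "t2", "t3"))
)

set.seed(123)  # make the bootstrap reproducible
w  <- kendall_w(scores)
ci <- boot_ci_w(scores)
print(list(W = w, magnitude = interpret_w(w), ci = ci))
```

With only five toy subjects the interval is wide; the sketch only illustrates the mechanics.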

Multiple Pairwise-Comparisons

From the output of the Friedman test, we know that there is a significant difference between groups, but we don’t know which pairs of groups are different.

A significant Friedman test can be followed up by pairwise Wilcoxon signed-rank tests for identifying which groups are different.

Pairwise comparisons using the paired Wilcoxon signed-rank test, with p-values adjusted using the Bonferroni multiple testing correction:
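# paired = TRUE matches observations by position, so every time point must list
# subjects in the same id order (which the gather() output above does)
pwc <- selfesteem %>%
  wilcox_test(score ~ time, paired = TRUE, p.adjust.method = "bonferroni")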
paged_table(pwc)

All the pairwise differences are statistically significant.
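
The same procedure can be sketched in base R: run a paired wilcox.test() for every pair of time points, then adjust the p-values with p.adjust(method = "bonferroni"), which multiplies each by the number of comparisons and caps at 1. The toy data are hypothetical, and exact = FALSE (normal approximation) is an assumption chosen because the toy differences contain ties:

```r
# Hypothetical toy data: rows are subjects, columns are time points
scores <- matrix(
  c(4, 6, 8, 5, 7, 9, 3, 5, 7, 6, 5, 9, 4, 8, 7),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("s", 1:5), c("t1", "t2", "t3"))
)

# Each column of 'pairs' is one pair of time points: t1-t2, t1-t3, t2-t3
pairs <- combn(colnames(scores), 2)

# Paired Wilcoxon signed-rank test on each pair (same subjects in each row)
raw_p <- apply(pairs, 2, function(pr) {
  wilcox.test(scores[, pr[2]], scores[, pr[1]], paired = TRUE, exact = FALSE)$p.value
})
names(raw_p) <- apply(pairs, 2, paste, collapse = "-")

# Bonferroni correction over the three comparisons
adj_p <- p.adjust(raw_p, method = "bonferroni")
print(round(cbind(raw_p, adj_p), 4))
```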

Pairwise comparisons using the sign test, again with Bonferroni-adjusted p-values:
pwc2 <- selfesteem %>%
  sign_test(score ~ time, p.adjust.method = "bonferroni")
paged_table(pwc2)
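
The sign test only looks at the direction of each paired difference: zero differences are dropped, and the number of positive differences is compared with a Binomial(n, 0.5) distribution. A base-R sketch using binom.test() on hypothetical toy data, with sign_test_p() as a hypothetical helper name:

```r
# Hypothetical toy data: rows are subjects, columns are time points
scores <- matrix(
  c(4, 6, 8, 5, 7, 9, 3, 5, 7, 6, 5, 9, 4, 8, 7),
  ncol = 3, byrow = TRUE,
  dimnames = list(paste0("s", 1:5), c("t1", "t2", "t3"))
)

# Hypothetical helper: two-sided sign test for paired vectors x and y
sign_test_p <- function(x, y) {
  d <- y - x
  d <- d[d != 0]                       # the sign test ignores ties (zero differences)
  binom.test(sum(d > 0), length(d), p = 0.5)$p.value
}

pairs <- combn(colnames(scores), 2)
raw_p <- apply(pairs, 2, function(pr) sign_test_p(scores[, pr[1]], scores[, pr[2]]))
names(raw_p) <- apply(pairs, 2, paste, collapse = "-")
adj_p <- p.adjust(raw_p, method = "bonferroni")
print(round(cbind(raw_p, adj_p), 4))
```

Because it discards the size of the differences, the sign test has less power than the Wilcoxon signed-rank test: here t1 vs t3 has all five differences positive, yet the exact two-sided p-value is only 2 × 0.5⁵ = 0.0625 before correction.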

Report

The self-esteem score was statistically significantly different at the different time points using the Friedman test, χ²(2) = 18.2, p = 0.00011.

Pairwise Wilcoxon signed-rank tests revealed statistically significant differences in self-esteem score between t1 and t2 (p = 0.006), t1 and t3 (p = 0.006), and t2 and t3 (p = 0.012).

pwc <- pwc %>% add_xy_position(x = "time")
ggboxplot(selfesteem, x = "time", y = "score", add = "point") +
  stat_pvalue_manual(pwc, hide.ns = TRUE) +
  labs(
    subtitle = get_test_label(res.fried,  detailed = TRUE),
    caption = get_pwc_label(pwc)
  )