Non-parametric ANOVA

Daria Skarbek, 184869

Published on 23.01.2022

Non-Parametric Alternatives to ANOVA

A presentation of two alternatives to One-way ANOVA based on exercises from 7.01.

Kruskal-Wallis ANOVA

First, we will perform a non-parametric alternative to One-Way ANOVA for independent samples.
The Kruskal-Wallis test extends the Wilcoxon rank-sum test, which compares two independent samples, to three or more groups.

com_science <- c(6.4, 6.8, 7.2, 8.3, 8.4, 9.1, 9.4, 9.7)
economics <- c(2.5, 3.7, 4.9, 5.4, 5.9, 8.1, 8.2)
statistics <- c(1.3, 4.1, 4.9, 5.2, 5.5, 8.2)
Computer Science   Economics   Statistics
6.4                2.5         1.3
6.8                3.7         4.1
7.2                4.9         4.9
8.3                5.4         5.2
8.4                5.9         5.5
9.1                8.1         8.2
9.4                8.2         NaN
9.7                NaN         NaN

The data are presented above. The three groups (computer science, economics and statistics) contain 8, 7 and 6 observations, respectively.
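The later calls refer to a data frame named data1, whose construction is not shown. A minimal sketch of how it could be assembled, assuming a long format with one row per observation and the column names score and study (inferred from the calls that follow, not confirmed by the source):

```r
# Scores per field of study, copied from the exercise data
com_science <- c(6.4, 6.8, 7.2, 8.3, 8.4, 9.1, 9.4, 9.7)
economics   <- c(2.5, 3.7, 4.9, 5.4, 5.9, 8.1, 8.2)
statistics  <- c(1.3, 4.1, 4.9, 5.2, 5.5, 8.2)

# Assumed layout: one row per observation, group label repeated per score
data1 <- data.frame(
  score = c(com_science, economics, statistics),
  study = factor(rep(
    c("Computer Science", "Economics", "Statistics"),
    times = c(length(com_science), length(economics), length(statistics))
  ))
)

# Number of observations per group
table(data1$study)
```

Stacking the vectors this way avoids padding the shorter groups with missing values, which is the shape the formula interface score ~ study expects.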

We can visualize the data using a boxplot:

library(ggplot2)

ggplot(data = data1, aes(x = score, y = study, fill = study)) + geom_boxplot() + 
  theme_minimal() + theme(legend.position = 'none') + xlab('Score') + 
  ylab('Type of Studies') + scale_fill_brewer(name = "", palette="Greens") + 
  ggtitle("Distribution of scores based on type of studies")

For this study we can state the following hypotheses:
H0: The scores of students do not depend on the type of studies.
H1: The scores of students depend on the type of studies.

To check this, we will perform Kruskal-Wallis ANOVA using the kruskal.test function.

kruskal.test(score ~ study, data = data1)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  score by study
## Kruskal-Wallis chi-squared = 9.8491, df = 2, p-value = 0.007266

The p-value (about 0.0073) is well below the significance level of 0.05. Hence, we reject the null hypothesis and conclude that the scores do depend on the type of studies.
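To make the reported statistic less of a black box, it can be reproduced by hand from the ranks. The sketch below is an illustration under the usual textbook formula with a tie correction, not the author's method; the variable names (scores, groups, h_corrected and so on) are introduced here only for the example.

```r
# Exercise data, repeated so the example is self-contained
com_science <- c(6.4, 6.8, 7.2, 8.3, 8.4, 9.1, 9.4, 9.7)
economics   <- c(2.5, 3.7, 4.9, 5.4, 5.9, 8.1, 8.2)
statistics  <- c(1.3, 4.1, 4.9, 5.2, 5.5, 8.2)

scores <- c(com_science, economics, statistics)
groups <- factor(rep(c("cs", "econ", "stat"), times = c(8, 7, 6)))

# Rank all observations together; tied values receive the average rank
r <- rank(scores)
n <- length(scores)

# Rank sum and size of each group
rank_sums   <- tapply(r, groups, sum)
group_sizes <- tapply(r, groups, length)

# Uncorrected Kruskal-Wallis H statistic
h <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_sizes) - 3 * (n + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n),
# where t runs over the sizes of the groups of tied values
ties <- table(scores)
h_corrected <- h / (1 - sum(ties^3 - ties) / (n^3 - n))

# Chi-squared approximation with (number of groups - 1) degrees of freedom
p_value <- pchisq(h_corrected, df = nlevels(groups) - 1, lower.tail = FALSE)

c(statistic = h_corrected, p.value = p_value)
```

The two pairs of tied values (4.9 and 8.2) are why the correction factor is needed; without it the statistic would come out slightly smaller than the value printed by kruskal.test.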

Friedman Test

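The second exercise asks us to compare the results of dependent samples. The parametric test for this design is Repeated Measures ANOVA. Its non-parametric alternative for more than 2 dependent samples, as in this example, is the Friedman Test; for exactly 2 dependent samples the Wilcoxon signed-rank test would be used instead.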

We will start with creating data from the exercise.

danonek <- c(4.6, 6.8, 6.6, 5.8, 5.4, 6.2, 7.0, 5.4)
activia <- rep(5.0, times = 8)
zott <- c(3.4, 5, 4.6, 4.2, 4.6, 5.0, 6.6, 5.4)
id <- c(1:8)
ID Danonek Activia Zott
1 4.6 5 3.4
2 6.8 5 5.0
3 6.6 5 4.6
4 5.8 5 4.2
5 5.4 5 4.6
6 6.2 5 5.0
7 7.0 5 6.6
8 5.4 5 5.4

The data show the grades that 8 participants gave to three different yogurts.
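The plots and the test further on use a data frame named data3, which is not constructed in the text. One possible way to build it, assuming a long format with the columns id, yogurt and score (names inferred from the later calls; the exact layout used by the author is not shown):

```r
# Scores from the exercise, one vector per yogurt
danonek <- c(4.6, 6.8, 6.6, 5.8, 5.4, 6.2, 7.0, 5.4)
activia <- rep(5.0, times = 8)
zott    <- c(3.4, 5, 4.6, 4.2, 4.6, 5.0, 6.6, 5.4)
id      <- 1:8

# Assumed layout: one row per (participant, yogurt) pair
data3 <- data.frame(
  id     = factor(rep(id, times = 3)),
  yogurt = factor(rep(c("Danonek", "Activia", "Zott"), each = length(id))),
  score  = c(danonek, activia, zott)
)

head(data3)
```

Each participant appears exactly once per yogurt, which is the complete block design that friedman.test requires.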

We can visualize the data using a boxplot:

ggplot(data = data3, aes(x = score, y = yogurt, fill = yogurt)) + geom_boxplot() + 
  geom_jitter() + theme_minimal() + theme(legend.position = 'none') + xlab('Score') + 
  ylab('Type of yogurt') + scale_fill_brewer(name = "", palette="Greens") + 
  ggtitle("Distribution of scores for each type of yogurt")

ggplot(data = data3, aes(x = score, y = yogurt, color = yogurt)) + geom_point() + 
  facet_wrap(~id) + theme(legend.position = 'none') + xlab('') + ylab('') + 
  ggtitle('Distribution of scores for each type of yogurt for each participant')

The first plot shows the overall scores across all participants. On average, the highest scores were given to Danonek and the lowest to Zott.

The second plot shows the scores given by each individual participant, that is, the values for Danonek, Activia and Zott for each respondent.

We state the null hypothesis for the exercise as:

H0: There is no difference between scores for different yogurts for the participants.

We will test it using the Friedman Test.

friedman.test(score ~ yogurt | id, data = data3)
## 
##  Friedman rank sum test
## 
## data:  score and yogurt and id
## Friedman chi-squared = 9.1724, df = 2, p-value = 0.01019

Here we get a low p-value of about 0.01, which is below the significance level of 0.05. Hence, we can reject the null hypothesis and state that there is a significant difference between the scores given to the yogurts.
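As a cross-check, the Friedman statistic can also be computed by hand: scores are ranked within each participant, the rank sums per yogurt are compared with their expected value, and a correction is applied for ties within a participant. The sketch below illustrates the standard formula; the matrix layout and variable names are assumptions made for this example.

```r
# Exercise data as a participants-by-yogurts matrix
danonek <- c(4.6, 6.8, 6.6, 5.8, 5.4, 6.2, 7.0, 5.4)
activia <- rep(5.0, times = 8)
zott    <- c(3.4, 5, 4.6, 4.2, 4.6, 5.0, 6.6, 5.4)
scores  <- cbind(Danonek = danonek, Activia = activia, Zott = zott)

n <- nrow(scores)  # number of participants (blocks)
k <- ncol(scores)  # number of yogurts (treatments)

# Rank the yogurts within each participant; ties get average ranks
r <- t(apply(scores, 1, rank))
rank_sums <- colSums(r)

# Tie term: sum of (t^3 - t) over groups of tied scores within each row
tie_term <- sum(apply(scores, 1, function(row) {
  counts <- table(row)
  sum(counts^3 - counts)
}))

# Friedman statistic with tie correction
q <- 12 * sum((rank_sums - n * (k + 1) / 2)^2) /
  (n * k * (k + 1) - tie_term / (k - 1))

# Chi-squared approximation with k - 1 degrees of freedom
p_value <- pchisq(q, df = k - 1, lower.tail = FALSE)

rank_sums
c(statistic = q, p.value = p_value)
```

The rank sums (22.5 for Danonek, 14 for Activia, 11.5 for Zott) agree with the ordering seen in the first plot.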