1 Kruskal-Wallis test

The Kruskal-Wallis test by ranks is a non-parametric alternative to the one-way ANOVA test. It extends the two-sample Wilcoxon test to situations with more than two groups, and it is recommended when the assumptions of the one-way ANOVA test are not met. This tutorial describes how to compute the Kruskal-Wallis test in R.

2 The Kruskal-Wallis chi-squared statistic

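The "Kruskal-Wallis chi-squared" value reported by the R function is equal to the statistic H that is computed in the test. If there are no ties then

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{g} n_i \left(\bar{r}_i - \bar{r}\right)^2$$

where $N$ is the total number of observations, $g$ is the number of groups, $n_i$ is the size of the i-th sample, $\bar{r}_i$ is the mean of the ranks in the i-th sample and $\bar{r} = (N+1)/2$ is the mean of all ranks.

It is named like this because the statistic approximately follows a chi-squared distribution with $g - 1$ degrees of freedom. Under the hood you can see it as the means $\bar{r}_i$ being approximated as normal distributions with variance $\frac{(N+1)(N-n_i)}{12\,n_i}$ under the null hypothesis.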
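To make the no-ties formula concrete, the following minimal sketch computes H by hand and compares it with the value returned by kruskal.test. The data and the names values, group, H_manual and H_r are made up for illustration and are not part of the park data used later.

```r
# Hypothetical data without ties: three groups of sizes 4, 5 and 3
values <- c(1.2, 3.4, 2.2, 5.1,
            4.8, 6.0, 7.3, 5.5, 6.6,
            0.9, 2.8, 3.9)
group <- factor(rep(c("A", "B", "C"), times = c(4, 5, 3)))

r      <- rank(values)             # ranks over the pooled sample
N      <- length(values)           # total number of observations
n_i    <- tapply(r, group, length) # group sizes
rbar_i <- tapply(r, group, mean)   # mean rank per group
rbar   <- (N + 1) / 2              # mean of all ranks

# No-ties formula: H = 12 / (N (N + 1)) * sum_i n_i (rbar_i - rbar)^2
H_manual <- 12 / (N * (N + 1)) * sum(n_i * (rbar_i - rbar)^2)

# Statistic reported by R as "Kruskal-Wallis chi-squared"
H_r <- unname(kruskal.test(values, group)$statistic)

all.equal(H_manual, H_r)
```

With tied observations R additionally divides by a tie-correction factor, so the two values would then differ unless that factor is applied by hand.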

3 The p-value

For the Kruskal-Wallis test, the p-value indicates how extreme a particular observed value $H_{\text{observed}}$ is: it is the probability that the statistic from an experiment in which the null hypothesis is true, $H_{H_0\ \text{true}}$, would be equal to or higher than $H_{\text{observed}}$.

If the null hypothesis is false, you are more likely to get such high (extreme) values. Thus, when you observe an extreme value of H that would be unlikely under the null hypothesis (i.e. a low p-value), this indicates that the null (no-effect) hypothesis may be false, or at least is not supported by the data.
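The definition of the p-value can be illustrated by simulating the null hypothesis directly. This is a sketch under stated assumptions: the data are simulated, the seed and the 2000 shuffles are arbitrary choices, and shuffling the group labels stands in for "an experiment in which the null hypothesis is true". It is not the method kruskal.test itself uses, which relies on the chi-squared approximation.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical data: three groups of 8 observations with shifted locations
values <- c(rnorm(8, mean = 0), rnorm(8, mean = 1), rnorm(8, mean = 2))
group  <- factor(rep(c("A", "B", "C"), each = 8))

# Observed statistic
H_obs <- unname(kruskal.test(values, group)$statistic)

# Statistic under the null: shuffle the labels so group membership is random
H_null <- replicate(2000, {
  unname(kruskal.test(values, sample(group))$statistic)
})

# p-value as the share of null statistics at least as large as the observed one
p_perm <- mean(H_null >= H_obs)

# Chi-squared approximation with (number of groups - 1) degrees of freedom
p_chisq <- pchisq(H_obs, df = nlevels(group) - 1, lower.tail = FALSE)

c(permutation = p_perm, chi_squared = p_chisq)
```

The two values should be of similar magnitude. With so few shuffles the permutation estimate cannot resolve probabilities much below 1/2000.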

library(readr)
Madrid <- read_delim("la Universidad Carlos III de Madrid completo.csv", ";", escape_double = FALSE, trim_ws = TRUE)
## Parsed with column specification:
## cols(
##   dimensoes = col_character(),
##   variaveis = col_character(),
##   valor = col_double()
## )
Tecnopuc <- read_delim("Tecnopuc.csv", ";", escape_double = FALSE, trim_ws = TRUE)
## Parsed with column specification:
## cols(
##   Dimensoes = col_character(),
##   variaveis = col_character(),
##   valor = col_double()
## )

4 Data Summary of la Universidad Carlos III de Madrid Tecnopark

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
summary(Madrid$valor)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -50.00    0.00   20.00   17.26   30.00   50.00

5 Data Summary of TECNOPUC - RS / Brazil

summary(Tecnopuc$valor)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -10.00   10.00   30.00   24.58   40.00   50.00

6 Kruskal-Wallis test applied to la Universidad Carlos III de Madrid Tecnopark data and TECNOPUC data

kruskal.test(Madrid$valor,Tecnopuc$valor)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Madrid$valor and Tecnopuc$valor
## Kruskal-Wallis chi-squared = 31.944, df = 6, p-value = 1.673e-05
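
In the default method kruskal.test(x, g), the second argument g is read as a grouping vector for x. To compare two samples directly, the samples can instead be passed as a list, or stacked into one vector with a two-level factor. A minimal sketch with made-up vectors (a and b are hypothetical stand-ins, not the park data):

```r
# Hypothetical stand-in samples (not the Madrid/Tecnopuc data)
a <- c(-50, 0, 10, 20, 20, 30, 30, 50)
b <- c(-10, 10, 10, 30, 30, 40, 40, 50)

# Two-sample comparison: each list element is one group
res_list <- kruskal.test(list(a, b))

# Equivalent form: stacked values plus an explicit two-level factor
sample_id  <- factor(rep(c("a", "b"), times = c(length(a), length(b))))
res_factor <- kruskal.test(c(a, b), sample_id)

# For contrast: here the distinct values of b become the groups of a
res_xg <- kruskal.test(a, b)

res_list$parameter  # degrees of freedom for two groups
res_xg$parameter    # degrees of freedom = distinct values of b, minus 1
```

The list form and the factor form give the same statistic, while the x, g form answers a different question.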

7 Conclusions

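The value of the test statistic is 31.944, with 6 degrees of freedom. This value already contains the correction for ties (repetitions). The p-value, 1.673e-05, is less than 0.05; equivalently, the value of the test statistic is higher than the tabulated chi-square critical value for 6 degrees of freedom at the 5% level, 12.59.

The conclusion is therefore that we reject the null hypothesis H0 that the groups come from the same distribution. Note that, with 6 degrees of freedom, the groups in this call are the 7 distinct values of Tecnopuc$valor, because kruskal.test(x, g) treats its second argument as a grouping variable; the result should be interpreted with that in mind.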
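
Both comparisons can be checked from the reported statistic and degrees of freedom with the chi-square functions in base R. A minimal sketch, assuming a 5% significance level for the tabulated value (the variable names are illustrative):

```r
H     <- 31.944  # statistic from the kruskal.test output
df    <- 6       # degrees of freedom from the same output
alpha <- 0.05    # assumed significance level

# Upper-tail probability of a chi-squared variable with df degrees of freedom
p_value <- pchisq(H, df = df, lower.tail = FALSE)

# Tabulated chi-square value: the (1 - alpha) quantile
critical <- qchisq(1 - alpha, df = df)

p_value < alpha  # is the p-value below the significance level?
H > critical     # is the statistic above the critical value?
```

The two checks are equivalent by construction, so they always agree.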

8 References

Kruskal, William H., and W. Allen Wallis. “Use of ranks in one-criterion variance analysis.” Journal of the American Statistical Association 47.260 (1952): 583-621. https://doi.org/10.1080/01621459.1952.10483441

Sextus Empiricus (https://stats.stackexchange.com/users/164061/sextus-empiricus), Meaning of chi-squared in R Kruskal-Wallis test, URL (version: 2018-12-14): https://stats.stackexchange.com/q/381887