library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(agricolae)
type1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
type2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
type3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
type4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
fluid <- data.frame(type1,type2,type3,type4)
fluid <- pivot_longer(data = fluid, c(type1,type2,type3,type4))
To test if the fluids differ, let’s test if the means are different by testing the following hypothesis:
\[ H_o: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu \\ H_a: At \; least \; one \; \mu_k \; is \; different \]
To check if we can reject \(H_o\) or not, let’s perform an ANOVA test:
aov.fluid <- aov(value~name, data=fluid)
summary(aov.fluid)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 3 30.16 10.05 3.047 0.0525 .
## Residuals 20 65.99 3.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p-value (0.0525) is not smaller than \(\alpha\) (\(\alpha\) = 0.05), we cannot reject \(H_o\).
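As a sanity check, the F statistic and p-value in the table can be recomputed from its sums of squares. The sketch below assumes the rounded values printed above, so it agrees with the table only to about three significant digits; the variable names are illustrative.

```r
# Values copied from the ANOVA table above (rounded as printed)
ms_treatment <- 30.16 / 3    # Sum Sq / Df for `name`
ms_error     <- 65.99 / 20   # Sum Sq / Df for Residuals
f_stat <- ms_treatment / ms_error

# Critical value and p-value from the F distribution with (3, 20) df
f_crit  <- qf(0.95, df1 = 3, df2 = 20)
p_value <- pf(f_stat, df1 = 3, df2 = 20, lower.tail = FALSE)

c(F = f_stat, F_crit = f_crit, p = p_value)
```

The observed F falls just below the 5% critical value, which is why the p-value lands slightly above 0.05.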
Even though the null hypothesis is not rejected, the p-value is very close to \(\alpha\), which suggests that one level might still differ from the others.
Therefore, let’s perform Fisher's LSD test to check if there is at least one level that is different from the rest:
print(LSD.test(aov.fluid,"name",alpha=0.05))
## $statistics
## MSerror Df Mean CV t.value LSD
## 3.299667 20 19.09167 9.514614 2.085963 2.187666
##
## $parameters
## test p.ajusted name.t ntr alpha
## Fisher-LSD none name 4 0.05
##
## $means
## value std r LCL UCL Min Max Q25 Q50 Q75
## type1 18.65000 1.952178 6 17.10309 20.19691 16.3 21.6 17.450 18.25 19.800
## type2 17.95000 1.854454 6 16.40309 19.49691 15.3 20.3 16.950 17.85 19.275
## type3 20.95000 1.879096 6 19.40309 22.49691 18.5 23.6 19.675 20.95 22.075
## type4 18.81667 1.554885 6 17.26975 20.36358 16.9 21.1 17.700 18.80 19.675
##
## $comparison
## NULL
##
## $groups
## value groups
## type3 20.95000 a
## type4 18.81667 ab
## type1 18.65000 b
## type2 17.95000 b
##
## attr(,"class")
## [1] "group"
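Conclusion: from the Fisher LSD test, the Type 3 fluid has the highest mean and differs significantly from Types 1 and 2, but not from Type 4 (their groups, a and ab, overlap). Because the overall F-test was not significant, this comparison should be read with caution. Still, if I were to choose one of the groups, I would choose the third one.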
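The LSD value and the grouping letters in the output can be reproduced by hand. The worked step below assumes the standard balanced-design formula for Fisher's LSD, with \(MS_E\), the error degrees of freedom and \(n = 6\) taken from the `$statistics` and `$means` tables above:

\[ \mathrm{LSD} = t_{\alpha/2,\,20}\,\sqrt{\frac{2\,MS_E}{n}} = 2.085963\,\sqrt{\frac{2 \times 3.299667}{6}} \approx 2.188 \]

\[ \bar{y}_3-\bar{y}_2 = 3.00, \quad \bar{y}_3-\bar{y}_1 = 2.30, \quad \bar{y}_3-\bar{y}_4 \approx 2.13, \quad \bar{y}_4-\bar{y}_2 \approx 0.87 \]

Only the first two differences exceed the LSD, which matches the letters a, ab, b, b reported under `$groups`.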
plot(aov.fluid,1)
From the plot shown above, the first column of residuals appears to have a larger variance, because its spread is slightly wider than that of the others.
However, given the scale of the plot, the difference in spread is small, so the assumption of constant variance across the levels can be considered satisfied.
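The visual judgment above can be complemented with a formal check. The sketch below uses Bartlett's test from base R, which is a swapped-in technique not used in the original analysis and is itself sensitive to non-normality; `fluid_check` is a hypothetical name for a rebuilt copy of the data.

```r
# Rebuild the fluid data in long format (same values as above)
fluid_check <- data.frame(
  name  = rep(c("type1", "type2", "type3", "type4"), each = 6),
  value = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
            16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
            21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
            19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)

# Per-group standard deviations (these match the `std` column of LSD.test)
group_sd <- tapply(fluid_check$value, fluid_check$name, sd)
group_sd

# Bartlett's test: H0 is that all group variances are equal
bt <- bartlett.test(value ~ name, data = fluid_check)
bt$p.value
```

A large p-value here would be consistent with the reading of the residual plot.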
library(tidyr)
library(dplyr)
type1 <- c(110, 157, 194, 178)
type2 <- c(1, 2, 4, 18)
type3 <- c(880, 1256, 5276, 4355)
type4 <- c(495, 7040, 5307, 10050)
type5 <- c(7, 5, 29, 2)
mat <- data.frame(type1,type2,type3,type4,type5)
mat <- pivot_longer(data = mat, c(type1,type2,type3,type4,type5))
To perform the test of means, let’s formulate the following hypothesis:
\[ H_o: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu \\ H_a: At\; least\; one\; mean\; \mu_k\; is\; different \]
To test the formulated hypothesis, we can perform an ANOVA test. Therefore, considering an \(\alpha=0.05\),
aov.mat <- aov(value~name, data=mat)
summary(aov.mat)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 4 103191489 25797872 6.191 0.00379 **
## Residuals 15 62505657 4167044
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since \(p = 0.0038 < \alpha = 0.05\), we can reject \(H_o\) and conclude that at least one \(\mu_k\) differs from the others.
plot(aov.mat,1)
From the plot above, it is clear that the assumption of constant variance is violated. Therefore, ANOVA is not an appropriate test for this hypothesis.
plot(aov.mat,2)
From the normal Q-Q plot, we can conclude that the residuals are not normally distributed, which also indicates that it would be better to test the hypothesis with a non-parametric test.
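The reading of the two diagnostic plots can be backed up numerically. The sketch below is an addition to the original analysis: it applies the Shapiro-Wilk test to the residuals and compares the residual spread per group; `mat_check`, `fit`, `sw` and `res_sd` are hypothetical names.

```r
# Rebuild the data in long format (same values as above)
mat_check <- data.frame(
  name  = rep(paste0("type", 1:5), each = 4),
  value = c(110, 157, 194, 178,
            1, 2, 4, 18,
            880, 1256, 5276, 4355,
            495, 7040, 5307, 10050,
            7, 5, 29, 2)
)
fit <- aov(value ~ name, data = mat_check)

# Shapiro-Wilk test on the residuals: H0 is that they are normally distributed
sw <- shapiro.test(residuals(fit))
sw

# Residual standard deviation per group: large gaps indicate unequal variances
res_sd <- tapply(residuals(fit), mat_check$name, sd)
res_sd
```

The per-group spreads differ by orders of magnitude between the small-valued and large-valued types, which is the pattern visible in the residuals-versus-fitted plot.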
Since the residuals are neither normal nor of constant variance, we have two options: transform the response, or use a non-parametric test.
For this problem, let’s perform a non-parametric test using the Kruskal-Wallis test:
kruskal.test(value~name, data=mat)
##
## Kruskal-Wallis rank sum test
##
## data: value by name
## Kruskal-Wallis chi-squared = 16.873, df = 4, p-value = 0.002046
Since \(p = 0.002 < \alpha = 0.05\), we can reject \(H_o\).
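To make the test less of a black box, the statistic can be rebuilt from the ranks. The sketch below assumes the usual Kruskal-Wallis formula, \(H = \frac{12}{N(N+1)}\sum_i R_i^2/n_i - 3(N+1)\), divided by the standard tie correction (the value 2 appears twice in the data); all variable names are illustrative.

```r
# Same values as above, in long format
value <- c(110, 157, 194, 178,        # type1
           1, 2, 4, 18,               # type2
           880, 1256, 5276, 4355,     # type3
           495, 7040, 5307, 10050,    # type4
           7, 5, 29, 2)               # type5
name <- rep(paste0("type", 1:5), each = 4)

N <- length(value)
r <- rank(value)                      # tied values receive average ranks
rank_sums <- tapply(r, name, sum)     # R_i, the rank sum of each group
n_i <- tapply(r, name, length)        # group sizes

# Uncorrected Kruskal-Wallis statistic
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

# Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N)
ties <- as.numeric(table(value))
H_corrected <- H / (1 - sum(ties^3 - ties) / (N^3 - N))

# Chi-squared approximation with (number of groups - 1) degrees of freedom
p_value <- pchisq(H_corrected, df = length(rank_sums) - 1, lower.tail = FALSE)
c(H = H_corrected, p = p_value)
```

The result agrees with the statistic and p-value printed by `kruskal.test()` above.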
library(tidyr)
library(dplyr)
type1 <- c(31, 10, 21, 4, 1)
type2 <- c(62, 40, 24, 30, 35)
type3 <- c(53, 27, 120, 97, 68)
method <- data.frame(type1,type2,type3)
method <- pivot_longer(data = method, c(type1,type2,type3))
To test if the methods differ, let’s test if the means are different by testing the following hypothesis:
\[ H_o: \mu_1 = \mu_2 = \mu_3 = \mu \\ H_a: At \; least \; one \; \mu_k \; is \; different \]
To check if we can reject \(H_o\) or not, let’s perform an ANOVA test (assuming \(\alpha=0.05\)):
aov.method <- aov(value~name, data=method)
summary(aov.method)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 2 8964 4482 7.914 0.00643 **
## Residuals 12 6796 566
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since \(p = 0.006 < \alpha = 0.05\), we can reject \(H_o\).
plot(aov.method,1)
plot(aov.method,2)
According to the plots above, the assumption of constant variance is clearly violated. Therefore, ANOVA on the raw data is not the correct statistical model for this situation.
Since the residuals look normal and only the variance differs, let’s try to stabilize the variance with a Box-Cox transformation.
library(MASS)
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
boxcox(value~name, data=method)
From the plot, \(\lambda=0.5\) is a reasonable choice for the transformation.
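Instead of reading \(\lambda\) off the plot, the maximizing value can be pulled from the object returned by `boxcox()`. The sketch below assumes a finer grid than the default and uses the usual likelihood-ratio cutoff for an approximate 95% interval; `method_check`, `bc`, `lambda_hat` and `in_ci` are hypothetical names.

```r
library(MASS)

# Same values as above, in long format
method_check <- data.frame(
  name  = rep(c("type1", "type2", "type3"), each = 5),
  value = c(31, 10, 21, 4, 1,
            62, 40, 24, 30, 35,
            53, 27, 120, 97, 68)
)

# plotit = FALSE returns the lambda grid (x) and the profile log-likelihood (y)
bc <- boxcox(value ~ name, data = method_check,
             lambda = seq(-2, 2, by = 0.01), plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the maximum
in_ci <- bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2
c(lambda_hat = lambda_hat, lower = min(bc$x[in_ci]), upper = max(bc$x[in_ci]))
```

A convenient value such as 0.5 is usually preferred over the exact maximizer when it lies inside this interval.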
Therefore, we get:
lambda <- 0.5
method_transf <- method
method_transf$value <- method_transf$value^lambda
aov.method.transformed <- aov(value~name,data=method_transf)
plot(aov.method.transformed,1)
Since the residuals still show unequal variances, we can conclude that the transformation failed. Therefore, let’s perform a non-parametric test even though the data look normal, since the Kruskal-Wallis test does not depend on the normality and constant-variance assumptions of ANOVA:
kruskal.test(value~name, data=method)
##
## Kruskal-Wallis rank sum test
##
## data: value by name
## Kruskal-Wallis chi-squared = 8.54, df = 2, p-value = 0.01398
Conclusion: Since \(p = 0.014 < \alpha = 0.05\) (assumed), we can reject \(H_o\).
summary(aov.method.transformed)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 2 63.90 31.95 9.84 0.00295 **
## Residuals 12 38.96 3.25
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA on the transformed data also rejects Ho (p-value = 0.003). Since both tests reach the same conclusion, the remaining difference in variance does not appear to affect the result.
library(tidyr)
library(dplyr)
library(agricolae)
type1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
type2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
type3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
type4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
fluid <- data.frame(type1,type2,type3,type4)
fluid <- pivot_longer(data = fluid, c(type1,type2,type3,type4))
\[ H_o: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu \\ H_a: At \; least \; one \; \mu_k \; is \; different \]
kruskal.test(value~name, data=fluid)
##
## Kruskal-Wallis rank sum test
##
## data: value by name
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015
From the results of the Kruskal-Wallis test, we cannot reject Ho at \(\alpha = 0.05\), since the p-value is 0.1015.
In problem 3.23, the ANOVA test also failed to reject Ho (p-value = 0.0525), although the follow-up Fisher LSD test suggested that Type 3 differs from Types 1 and 2.
Therefore, we can conclude that the results are comparable: neither test rejects Ho at the 5% level, and the Kruskal-Wallis test, with its larger p-value, gives weaker evidence of a difference than the ANOVA.