Homework - Week 6

Question 3.23

Reading Data

library(tidyr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(agricolae)

type1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
type2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
type3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
type4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)

fluid <- data.frame(type1,type2,type3,type4)

fluid <- pivot_longer(data = fluid, c(type1,type2,type3,type4))

Item a

To test whether the fluids differ, let’s test whether the means are different using the following hypotheses:

\[ H_o: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu \\ H_a: At \; least \; one \; \mu_k \; is \; different \]

To check if we can reject \(H_o\) or not, let’s perform an ANOVA test:

aov.fluid <- aov(value~name, data=fluid)
summary(aov.fluid)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## name         3  30.16   10.05   3.047 0.0525 .
## Residuals   20  65.99    3.30                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the p-value (p = 0.0525) is not smaller than \(\alpha\) (\(\alpha\) = 0.05), we cannot reject \(H_o\).
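To make the origin of the F value explicit, the ANOVA table above can be reproduced by hand. The following is a minimal sketch in base R; the names `fluid_long`, `ss_trt` and `ss_err` are introduced here for illustration and are not part of the original analysis.

```r
# Fluid data from Question 3.23, built directly in long format (base R only)
fluid_long <- data.frame(
  name  = rep(c("type1", "type2", "type3", "type4"), each = 6),
  value = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
            16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
            21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
            19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)

grand_mean  <- mean(fluid_long$value)
group_means <- tapply(fluid_long$value, fluid_long$name, mean)
group_sizes <- tapply(fluid_long$value, fluid_long$name, length)

# Between-group (treatment) and within-group (error) sums of squares
ss_trt <- sum(group_sizes * (group_means - grand_mean)^2)
ss_err <- sum((fluid_long$value - group_means[fluid_long$name])^2)

# Degrees of freedom: (number of groups - 1) and (N - number of groups)
df_trt <- length(group_means) - 1
df_err <- nrow(fluid_long) - length(group_means)

# F statistic is the ratio of the two mean squares
f_stat  <- (ss_trt / df_trt) / (ss_err / df_err)
p_value <- pf(f_stat, df_trt, df_err, lower.tail = FALSE)

print(c(SS_trt = ss_trt, SS_err = ss_err, F = f_stat, p = p_value))
```

The printed values should agree with the `Sum Sq`, `F value` and `Pr(>F)` columns of the `summary(aov.fluid)` output up to rounding.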

Item b

Even though the null hypothesis was not rejected, the p-value is very close to \(\alpha\), which suggests that one level might actually differ from the others.

Therefore, let’s perform a Fisher LSD test to check whether at least one level differs from the rest:

print(LSD.test(aov.fluid,"name",alpha=0.05))

## $statistics
##    MSerror Df     Mean       CV  t.value      LSD
##   3.299667 20 19.09167 9.514614 2.085963 2.187666
## 
## $parameters
##         test p.ajusted name.t ntr alpha
##   Fisher-LSD      none   name   4  0.05
## 
## $means
##          value      std r      LCL      UCL  Min  Max    Q25   Q50    Q75
## type1 18.65000 1.952178 6 17.10309 20.19691 16.3 21.6 17.450 18.25 19.800
## type2 17.95000 1.854454 6 16.40309 19.49691 15.3 20.3 16.950 17.85 19.275
## type3 20.95000 1.879096 6 19.40309 22.49691 18.5 23.6 19.675 20.95 22.075
## type4 18.81667 1.554885 6 17.26975 20.36358 16.9 21.1 17.700 18.80 19.675
## 
## $comparison
## NULL
## 
## $groups
##          value groups
## type3 20.95000      a
## type4 18.81667     ab
## type1 18.65000      b
## type2 17.95000      b
## 
## attr(,"class")
## [1] "group"

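Conclusion: the Fisher LSD test shows that fluid type 3 differs significantly from types 1 and 2, but not from type 4, which shares group "a" with it (the difference 20.95 - 18.82 = 2.13 is below the LSD of 2.19). Since the overall ANOVA did not reject \(H_o\) at \(\alpha = 0.05\), this pairwise result should be read with caution. If I were to choose one of the fluids, I would choose type 3, which has the largest mean (20.95).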
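The LSD value and the grouping letters reported by `LSD.test` can be checked by hand. The sketch below uses only the `MSerror`, `Df` and group means printed in the output above; the variable names are illustrative.

```r
mse    <- 3.299667   # MSerror from the LSD.test output
df_err <- 20         # residual degrees of freedom
n      <- 6          # replicates per fluid type
alpha  <- 0.05

# Least significant difference for a balanced design:
# LSD = t(alpha/2, df_err) * sqrt(2 * MSE / n)
t_crit <- qt(1 - alpha / 2, df_err)
lsd    <- t_crit * sqrt(2 * mse / n)

# Group means copied from the $means table
means <- c(type1 = 18.65, type2 = 17.95, type3 = 20.95, type4 = 18.81667)

# Absolute pairwise differences between group means
diffs <- abs(outer(means, means, "-"))

# TRUE where a pair of means differs by more than the LSD
significant <- diffs > lsd

print(round(lsd, 4))
print(significant)
```

Only the pairs (type3, type1) and (type3, type2) exceed the LSD, which is why type 4 shares a letter with both type 3 and the remaining fluids.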

Item c

plot(aov.fluid,1)

From the residuals versus fitted values plot shown above, the first column of points appears to have a slightly larger spread than the others.

However, given the scale of the plot, the spreads are not different enough to be a concern, so the assumption of constant variance across the levels can be considered satisfied.
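As a complement to the visual check, a formal test of equal variances could be run. The sketch below uses Bartlett's test from base R (`bartlett.test`); this test is not part of the original analysis and is sensitive to non-normality, so it is offered only as one possible cross-check.

```r
# Fluid data from Question 3.23 in long format (base R only)
fluid_long <- data.frame(
  name  = rep(c("type1", "type2", "type3", "type4"), each = 6),
  value = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
            16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
            21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
            19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)

# H0: all groups share the same variance
bt <- bartlett.test(value ~ name, data = fluid_long)
print(bt)

# Sample standard deviation per group, for comparison with the $means table
group_sd <- tapply(fluid_long$value, fluid_long$name, sd)
print(round(group_sd, 3))
```

A p-value above \(\alpha\) would be consistent with the reading of the residual plot.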

Question 3.28

Reading Data

library(tidyr)
library(dplyr)

type1 <- c(110, 157, 194, 178)
type2 <- c(1, 2, 4, 18)
type3 <- c(880, 1256, 5276, 4355)
type4 <- c(495, 7040, 5307, 10050)
type5 <- c(7, 5, 29, 2)

mat <- data.frame(type1,type2,type3,type4,type5)

mat <- pivot_longer(data = mat, c(type1,type2,type3,type4,type5))

Item a

To perform the test of means, let’s formulate the following hypotheses:

\[ H_o: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu \\ H_a: At \; least \; one \; \mu_k \; is \; different \]

To test the formulated hypothesis, we can perform an ANOVA test. Therefore, considering an \(\alpha=0.05\),

aov.mat <- aov(value~name, data=mat)
summary(aov.mat)

##             Df    Sum Sq  Mean Sq F value  Pr(>F)   
## name         4 103191489 25797872   6.191 0.00379 **
## Residuals   15  62505657  4167044                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since \(p\text{-value} = 0.0038 < \alpha = 0.05\), we can reject \(H_o\) and conclude that at least one \(\mu_k\) differs from the others.

Item b

plot(aov.mat,1)

From the plot above, it is clear that the assumption of constant variance is not satisfied. Therefore, the ANOVA is not an appropriate model to test this hypothesis.

plot(aov.mat,2)

From the normal probability plot, we can conclude that the residuals are not normally distributed, which also indicates that it would be better to test the hypothesis with a non-parametric test.
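The visual impression from the normal plot could also be checked numerically. The sketch below applies the Shapiro-Wilk test (`shapiro.test`, base R) to the ANOVA residuals; this step is an addition for illustration and was not part of the original analysis.

```r
# Data from Question 3.28 in long format (base R only)
mat_long <- data.frame(
  name  = rep(c("type1", "type2", "type3", "type4", "type5"), each = 4),
  value = c(110, 157, 194, 178,
            1, 2, 4, 18,
            880, 1256, 5276, 4355,
            495, 7040, 5307, 10050,
            7, 5, 29, 2)
)

aov_mat <- aov(value ~ name, data = mat_long)

# H0: the residuals come from a normal distribution
sw <- shapiro.test(residuals(aov_mat))
print(sw)
```

A small p-value here would support the conclusion drawn from the plot.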

Item c

Since the data is neither normal nor has a constant variance, we have two options:

Transform the data;
Perform a non-parametric test.

For this problem, let’s perform a non-parametric test using the Kruskal-Wallis test:

kruskal.test(value~name, data=mat)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  value by name
## Kruskal-Wallis chi-squared = 16.873, df = 4, p-value = 0.002046

Since \(p\text{-value} = 0.002 < \alpha = 0.05\), we can successfully reject \(H_o\).
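The other option listed above, transforming the data, could be sketched as follows. A natural-log transformation is assumed here purely for illustration (the values span several orders of magnitude and are all positive); the original analysis does not specify which transformation it would use.

```r
# Data from Question 3.28 in long format (base R only)
mat_long <- data.frame(
  name  = rep(c("type1", "type2", "type3", "type4", "type5"), each = 4),
  value = c(110, 157, 194, 178,
            1, 2, 4, 18,
            880, 1256, 5276, 4355,
            495, 7040, 5307, 10050,
            7, 5, 29, 2)
)

# Assumed transformation: natural log (all values are >= 1)
mat_long$log_value <- log(mat_long$value)

# Compare the spread of the groups before and after the transformation
sd_raw <- tapply(mat_long$value, mat_long$name, sd)
sd_log <- tapply(mat_long$log_value, mat_long$name, sd)
ratio_raw <- max(sd_raw) / min(sd_raw)
ratio_log <- max(sd_log) / min(sd_log)
print(c(ratio_raw = ratio_raw, ratio_log = ratio_log))

# ANOVA on the transformed response
aov_log <- aov(log_value ~ name, data = mat_long)
print(summary(aov_log))
```

The ratio of the largest to the smallest group standard deviation gives a rough indication of how much the transformation stabilizes the variance; `plot(aov_log, 1)` and `plot(aov_log, 2)` can then be used to inspect the residuals as before.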

Question 3.29

Reading Data

library(tidyr)
library(dplyr)

type1 <- c(31, 10, 21, 4, 1)
type2 <- c(62, 40, 24, 30, 35)
type3 <- c(53, 27, 120, 97, 68)

method <- data.frame(type1,type2,type3)

method <- pivot_longer(data = method, c(type1,type2,type3))

Item a

To test whether the methods differ, let’s test whether the means are different using the following hypotheses:

\[ H_o: \mu_1 = \mu_2 = \mu_3 = \mu \\ H_a: At \; least \; one \; \mu_k \; is \; different \]

To check if we can reject \(H_o\) or not, let’s perform an ANOVA test (assuming \(\alpha=0.05\)):

aov.method <- aov(value~name, data=method)
summary(aov.method)

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## name         2   8964    4482   7.914 0.00643 **
## Residuals   12   6796     566                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since \(p\text{-value} = 0.006 < \alpha = 0.05\), we can successfully reject \(H_o\).

Item b

plot(aov.method,1)

plot(aov.method,2)

According to the plots above, the assumption of constant variance is clearly not obeyed. Therefore, the ANOVA test is not the correct statistical model to be applied in this situation.

Item c

Since the residuals look normal and only the variance differs between groups, let’s try to transform the data with a Box-Cox transformation.

library(MASS)

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

boxcox(value~name, data=method)

From the plot it is possible to assume a \(\lambda=0.5\) for the transformation.
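Instead of reading \(\lambda\) off the plot, the maximizing value and an approximate 95% interval can be extracted from the object returned by `boxcox`. This is a minimal sketch; the grid step of 0.05 and the names `method_long`, `lambda_hat` and `ci` are choices made here for illustration.

```r
library(MASS)

# Data from Question 3.29 in long format (base R only)
method_long <- data.frame(
  name  = rep(c("type1", "type2", "type3"), each = 5),
  value = c(31, 10, 21, 4, 1,
            62, 40, 24, 30, 35,
            53, 27, 120, 97, 68)
)

# Keep y and qr in the fit so boxcox() does not need to refit the model
fit <- lm(value ~ name, data = method_long, y = TRUE, qr = TRUE)

# Profile log-likelihood over a grid of lambda values, without plotting
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

# Lambda that maximizes the profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas within qchisq(0.95, 1) / 2 of the maximum
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2])

print(lambda_hat)
print(ci)
```

If a convenient value such as 0.5 (square root) lies inside the interval, rounding \(\lambda\) to it is the usual practice.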

Therefore, we get:

lambda <- 0.5

method_transf <- method
method_transf$value <- method_transf$value^lambda

aov.method.transformed <- aov(value~name,data=method_transf)
plot(aov.method.transformed,1)

Since the residuals still show unequal variances, we can conclude that the transformation failed. Therefore, let’s perform a non-parametric test even though the data look normal, since the Kruskal-Wallis test does not rely on the normality and constant-variance assumptions of the ANOVA:

kruskal.test(value~name, data=method)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  value by name
## Kruskal-Wallis chi-squared = 8.54, df = 2, p-value = 0.01398

Conclusion: Since \(p\text{-value} = 0.014 < \alpha = 0.05\) (assumed), we can reject \(H_o\).

Extra: Comparison between the Kruskal-Wallis test and the ANOVA on the transformed data

summary(aov.method.transformed)

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## name         2  63.90   31.95    9.84 0.00295 **
## Residuals   12  38.96    3.25                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA on the transformed data also rejects Ho (p-value = 0.003), which matches the Kruskal-Wallis result. This suggests that the difference in the variances is not large enough to change the conclusion.
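Another way to cross-check the conclusion when the variances differ is Welch's one-way test, which does not assume equal variances. The sketch below uses `oneway.test` from base R with `var.equal = FALSE`; it is an additional check suggested here, not part of the original comparison.

```r
# Data from Question 3.29 in long format (base R only)
method_long <- data.frame(
  name  = rep(c("type1", "type2", "type3"), each = 5),
  value = c(31, 10, 21, 4, 1,
            62, 40, 24, 30, 35,
            53, 27, 120, 97, 68)
)

# Welch's one-way test: compares group means without pooling the variances
welch <- oneway.test(value ~ name, data = method_long, var.equal = FALSE)
print(welch)

# Decision at the significance level assumed in the text
alpha <- 0.05
reject_h0 <- welch$p.value < alpha
print(reject_h0)
```

If this test agrees with the Kruskal-Wallis and transformed-ANOVA results, the rejection of Ho does not hinge on the equal-variance assumption.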

Question 3.51 and 3.52

Reading the data for question 3.23

library(tidyr)
library(dplyr)
library(agricolae)

type1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
type2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
type3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
type4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)

fluid <- data.frame(type1,type2,type3,type4)

fluid <- pivot_longer(data = fluid, c(type1,type2,type3,type4))

Hypothesis from question 3.23

\[ H_o: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu \\ H_a: At \; least \; one \; \mu_k \; is \; different \]

Performing the Kruskal-Wallis test:

kruskal.test(value~name, data=fluid)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  value by name
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015

Conclusion

3.51

From the results of the Kruskal-Wallis test (p-value = 0.1015 > \(\alpha\) = 0.05), we cannot reject Ho.

3.52

In problem 3.23, the ANOVA test did not reject Ho either (p-value = 0.0525), although the result was borderline and the follow-up Fisher LSD test indicated that type 3 differs from types 1 and 2.

The Kruskal-Wallis test leads to the same decision: Ho is not rejected at \(\alpha\) = 0.05.

Therefore, the results are comparable. Both tests fail to reject Ho, and the Kruskal-Wallis p-value (0.1015) is larger than the ANOVA one (0.0525), so the rank-based test gives even less evidence of a difference between the fluids.
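The two tests on the fluid data can be put side by side with a short script that extracts both p-values and compares each with \(\alpha\). This is a minimal sketch; `fluid_long`, `p_anova` and `p_kw` are names introduced here for illustration.

```r
# Fluid data from Question 3.23 in long format (base R only)
fluid_long <- data.frame(
  name  = rep(c("type1", "type2", "type3", "type4"), each = 6),
  value = c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
            16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
            21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
            19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
)

alpha <- 0.05

# Parametric test: one-way ANOVA; the p-value sits in the first row of the table
aov_fit <- aov(value ~ name, data = fluid_long)
p_anova <- summary(aov_fit)[[1]][["Pr(>F)"]][1]

# Non-parametric test: Kruskal-Wallis rank sum test
p_kw <- kruskal.test(value ~ name, data = fluid_long)$p.value

# One row per test: p-value and the decision at the chosen alpha
comparison <- data.frame(
  test      = c("ANOVA", "Kruskal-Wallis"),
  p_value   = c(p_anova, p_kw),
  reject_H0 = c(p_anova, p_kw) < alpha
)
print(comparison)
```

The `reject_H0` column makes the decision for each test explicit, so the two results can be compared directly.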

2022-10-08