library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
A<-read.csv("C:/R Activities/Week 6/Homework/assigment 10.csv")
colnames(A)<-c("Method","Obs")

(a):

Means model: \(x_{ij}=\mu_{i}+e_{ij}\)

Linear effects model: \(x_{ij}=\mu+\tau_{i}+e_{ij}\), where \(\mu_{i}=\mu+\tau_{i}\)

null hypothesis: \(H_0: \mu_{1}=\mu_{2}=\mu_{3}=\mu_{4}=\mu\)

alternative hypothesis: \(H_1:\) at least one \(\mu_{i}\) differs
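The two forms of the model can be related numerically. The following is a minimal sketch on simulated data (the data frame `demo`, its method labels, and the chosen means are assumptions for illustration, not the homework data). It estimates \(\mu\), \(\mu_{i}\), and \(\tau_{i}\) and shows that the estimated effects sum to zero in a balanced design.

```r
# Hypothetical balanced data: 4 methods, 6 observations each (NOT the homework data)
set.seed(1)
demo <- data.frame(
  Method = factor(rep(paste0("T", 1:4), each = 6)),
  Obs    = rnorm(24, mean = rep(c(10, 15, 20, 25), each = 6), sd = 2)
)

mu_hat  <- mean(demo$Obs)                       # grand mean, estimate of mu
mu_i    <- tapply(demo$Obs, demo$Method, mean)  # group means, estimates of mu_i
tau_hat <- mu_i - mu_hat                        # treatment effects, estimates of tau_i

tau_hat
sum(tau_hat)  # zero up to rounding error because the design is balanced
```

Under \(H_0\) every \(\tau_{i}\) is zero, so the F test in part (b) asks whether the estimated effects are larger than the residual noise would explain.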

(b):

model<-aov(Obs~Method, data = A)
summary(model)
##             Df Sum Sq Mean Sq F value Pr(>F)    
## Method       3  708.7   236.2   76.29  4e-11 ***
## Residuals   20   61.9     3.1                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model)

## hat values (leverages) are all = 0.1666667
##  and there are no factor predictors; no plot no. 5

boxplot(Obs~Method, data = A, main="Boxplot of observations")

Dr. Matis said we can ignore the normality question because we have insufficient data: each method has only 6 observations. Even so, the normal Q-Q plot passes the fat pencil test, so the residuals look roughly normally distributed. However, the boxplot shows that the variances are not equal, so we need to transform the data.
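The visual checks above can be supplemented with numbers. This is a sketch only, run on simulated data with deliberately unequal spreads (the data frame `demo` and its parameters are assumptions, not the homework data). It uses the base R functions `tapply()`, `bartlett.test()`, and `shapiro.test()`. With 6 observations per group these tests have little power, and Bartlett's test is itself sensitive to non-normality, so they should be read alongside the plots rather than instead of them.

```r
# Hypothetical data with increasing spread across methods (NOT the homework data)
set.seed(2)
demo <- data.frame(
  Method = factor(rep(paste0("T", 1:4), each = 6)),
  Obs    = rnorm(24, mean = rep(c(10, 15, 20, 25), each = 6),
                 sd = rep(c(1, 2, 3, 4), each = 6))
)

# Sample variance of each method and the ratio of largest to smallest
group_var <- tapply(demo$Obs, demo$Method, var)
group_var
max(group_var) / min(group_var)

# Bartlett's test of equal variances across methods
bt <- bartlett.test(Obs ~ Method, data = demo)
bt$p.value

# Shapiro-Wilk test on the ANOVA residuals
sw <- shapiro.test(residuals(aov(Obs ~ Method, data = demo)))
sw$p.value
```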

(c):

kruskal.test(Obs~Method, data = A)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  Obs by Method
## Kruskal-Wallis chi-squared = 21.156, df = 3, p-value = 9.771e-05

Ans: Since p-value = 9.771e-05 < 0.05, we reject the null hypothesis (i.e., at least one method differs from the others).
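To see what `kruskal.test()` computes, the statistic can be rebuilt from the ranks. The sketch below uses simulated data without ties (the data frame `demo` is an assumption, not the homework data) and the untied form \(H=\frac{12}{N(N+1)}\sum_{i}\frac{R_{i}^{2}}{n_{i}}-3(N+1)\), where \(R_{i}\) is the rank sum of method \(i\). With tied observations `kruskal.test()` applies an extra correction, so the two values would no longer agree exactly.

```r
# Hypothetical data without ties (NOT the homework data)
set.seed(3)
demo <- data.frame(
  Method = factor(rep(paste0("T", 1:4), each = 6)),
  Obs    = rnorm(24, mean = rep(c(10, 15, 20, 25), each = 6), sd = 2)
)

N   <- nrow(demo)
r   <- rank(demo$Obs)                  # ranks of all observations pooled together
R_i <- tapply(r, demo$Method, sum)     # rank sum per method
n_i <- tapply(r, demo$Method, length)  # sample size per method

# Kruskal-Wallis statistic (no tie correction) and its chi-squared p-value
H <- 12 / (N * (N + 1)) * sum(R_i^2 / n_i) - 3 * (N + 1)
p <- pchisq(H, df = nlevels(demo$Method) - 1, lower.tail = FALSE)

# Compare with the built-in test
kw <- kruskal.test(Obs ~ Method, data = demo)
c(by_hand = H, kruskal_test = unname(kw$statistic))
c(by_hand = p, kruskal_test = kw$p.value)
```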

(d):

library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
## 
##     select
boxcox(Obs~Method, data = A)

lambda <- 0.6
# Power transformation with lambda = 0.6 (close to, but not exactly, the square root, lambda = 0.5)
A$Obs <- A$Obs^lambda
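The value of lambda is presumably read off the Box-Cox plot by eye. As a sketch of how it could be extracted numerically, `boxcox()` can be called with `plotit = FALSE`, which returns the lambda grid (`x`) and the profile log-likelihood (`y`). The data frame `demo` is simulated and is an assumption, not the homework data, so the lambda it yields says nothing about the value chosen above.

```r
library(MASS)

# Hypothetical positive-valued data (NOT the homework data)
set.seed(4)
demo <- data.frame(
  Method = factor(rep(paste0("T", 1:4), each = 6)),
  Obs    = rnorm(24, mean = rep(c(10, 15, 20, 25), each = 6), sd = 2)
)
stopifnot(all(demo$Obs > 0))  # Box-Cox requires a strictly positive response

# Profile log-likelihood over a grid of lambda values, without drawing the plot
bc <- boxcox(Obs ~ Method, data = demo,
             lambda = seq(-2, 2, by = 0.1), plotit = FALSE)

# Lambda that maximizes the log-likelihood on the grid
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat

# Approximate 95% interval: grid values within qchisq(0.95, 1) / 2 of the maximum
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, 1) / 2])
ci
```

Any convenient value inside the interval, such as 0.5 for a square root, is usually treated as an acceptable choice.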

Verifying data after transformation

boxcox(Obs~Method, data = A)

boxplot(Obs~Method, data = A, main="Boxplot of transformed observations")

model<-aov(Obs~Method, data = A)
summary(model)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Method       3  63.71  21.236   85.76 1.36e-11 ***
## Residuals   20   4.95   0.248                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Ans: After plotting the transformed data, we can see that the transformation corrected the spread in T1, but its effect on T2 is limited. Overall, we fail to equalize the variances, even though p = 1.36e-11 < 0.05 (we reject the null hypothesis). We only have 6 observations per method, which is very few for a parametric test, so the non-parametric test should always be run as well.