Problem 3.23

(a):

r1 <- c(17.6,18.9,16.3,17.4,20.1,21.6)
r2 <- c(16.9,15.3,18.6,17.1,19.5,20.3)
r3 <- c(21.4,23.6,19.4,18.5,20.5,22.3)
r4 <- c(19.3,21.1,16.9,17.5,18.3,19.8)
obs <- c(r1,r2,r3,r4)
type <- c(rep(1,6),rep(2,6),rep(3,6),rep(4,6))
dat1 <- cbind(obs,type)
str(dat1)
##  num [1:24, 1:2] 17.6 18.9 16.3 17.4 20.1 21.6 16.9 15.3 18.6 17.1 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "obs" "type"
dat1 <- as.data.frame(dat1)
str(dat1)
## 'data.frame':    24 obs. of  2 variables:
##  $ obs : num  17.6 18.9 16.3 17.4 20.1 21.6 16.9 15.3 18.6 17.1 ...
##  $ type: num  1 1 1 1 1 1 2 2 2 2 ...
dat1$type <- as.factor(dat1$type)
str(dat1)
## 'data.frame':    24 obs. of  2 variables:
##  $ obs : num  17.6 18.9 16.3 17.4 20.1 21.6 16.9 15.3 18.6 17.1 ...
##  $ type: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 2 2 2 2 ...
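
The same data frame can also be built in one step. The following is a minimal sketch rather than the code used above; the name `dat_alt` is introduced here only for illustration.

```r
# Observations for the four fluid types, as entered above
r1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
r2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
r3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
r4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)

# data.frame() keeps obs numeric and stores type as a factor directly,
# so the cbind() / as.data.frame() / as.factor() steps are not needed
dat_alt <- data.frame(
  obs  = c(r1, r2, r3, r4),
  type = factor(rep(1:4, each = 6))
)
str(dat_alt)
```

Building the factor with `rep(1:4, each = 6)` also avoids typing the group sizes by hand.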

null hypothesis: \(H_0: \mu_{1}=\mu_{2}=\mu_{3}=\mu_{4}\)

alternative hypothesis: \(H_1:\) at least one \(\mu_{i}\) differs

With only six observations per fluid type, there is too little data to verify normality and equal variance reliably, so we first run a non-parametric test:

kruskal.test(obs~type,data=dat1)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  obs by type
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015

Since p-value = 0.1015 > 0.05, we do not reject the null hypothesis (i.e., there is no significant evidence that the fluid types differ).

Parametric test (one-way ANOVA):

aov.model<-aov(obs~type, data = dat1)
summary(aov.model)
##             Df Sum Sq Mean Sq F value Pr(>F)  
## type         3  30.17   10.05   3.047 0.0525 .
## Residuals   20  65.99    3.30                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)

Since p-value = 0.0525 > 0.05, we do not reject the null hypothesis (i.e., there is no significant evidence that the means differ).

Overall, the parametric and non-parametric tests agree: neither rejects the null hypothesis at the 0.05 level. We therefore conclude that there is no significant evidence that the fluids differ, although the ANOVA p-value (0.0525) is very close to the significance level of 0.05.
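
To make the ANOVA table less of a black box, the sums of squares and the F statistic can be recomputed from the group means. This is a sketch for checking the output, not part of the original solution; the names `ss_treat`, `ss_error`, `f_value` and `p_value` are chosen here for illustration.

```r
obs <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
         16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
         21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
         19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
type <- factor(rep(1:4, each = 6))

a <- nlevels(type)   # number of treatments
N <- length(obs)     # total number of observations

group_means <- tapply(obs, type, mean)
n_i <- tapply(obs, type, length)

# Between-treatment and within-treatment sums of squares
ss_treat <- sum(n_i * (group_means - mean(obs))^2)
ss_error <- sum((obs - ave(obs, type))^2)   # ave() repeats each group mean

# F statistic and its upper-tail p-value on (a - 1, N - a) degrees of freedom
f_value <- (ss_treat / (a - 1)) / (ss_error / (N - a))
p_value <- pf(f_value, a - 1, N - a, lower.tail = FALSE)

c(ss_treat = ss_treat, ss_error = ss_error, F = f_value, p = p_value)
```

The values should agree with the `summary(aov.model)` table up to rounding.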

Check assumptions:

boxplot(obs~type,xlab="type/method",ylab="observation",main="Boxplot of Observations")

meanx<-c(ave(r1),ave(r2),ave(r3),ave(r4))
res <- obs-meanx
qqnorm(res)

plot(meanx,res,xlab="population average",ylab="residual", main="constant variance check")

Comments: The normal Q-Q plot passes the fat pencil test, so the residuals appear roughly normal. However, with only 6 observations per fluid type the plot gives limited evidence of normality, so a non-parametric test is a reasonable alternative to the parametric one.

The box plot and the residual plot both show roughly equal variances across fluid types, so no transformation is needed for the parametric test.
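
The graphical checks can be supplemented with formal tests. The sketch below is an addition, not part of the original solution: it confirms that the hand-computed residuals match `residuals()` from the fitted model, then applies `shapiro.test()` to the residuals and `bartlett.test()` to the groups. With only 6 observations per group both tests have low power, so they should be read alongside the plots rather than instead of them.

```r
obs <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
         16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
         21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
         19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
type <- factor(rep(1:4, each = 6))

fit <- aov(obs ~ type)

# Residual = observation minus its own group mean;
# this is what obs - meanx computes above
res_manual <- obs - ave(obs, type)
max_gap <- max(abs(residuals(fit) - res_manual))

# Shapiro-Wilk test of normality on the residuals
sw <- shapiro.test(residuals(fit))

# Bartlett test of equal variances across the four fluid types
bt <- bartlett.test(obs ~ type)

c(max_gap = max_gap, shapiro_p = sw$p.value, bartlett_p = bt$p.value)
```

A large p-value in either test means the data give no evidence against the corresponding assumption; it does not prove the assumption holds.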

————————————————————————————————

(b):

From the box plot, the variances are roughly equal, so we would choose the fluid with the highest mean life, which is Fluid Type 3. However, because the objective is long life and the ANOVA p-value is close to the significance level of 0.05, we run the LSD test to check this choice.

library(agricolae)
LSD.test(aov.model,"type",p.adj = "none",console=TRUE)
## 
## Study: aov.model ~ "type"
## 
## LSD t Test for obs 
## 
## Mean Square Error:  3.299667 
## 
## type,  means and individual ( 95 %) CI
## 
##        obs      std r      LCL      UCL  Min  Max
## 1 18.65000 1.952178 6 17.10309 20.19691 16.3 21.6
## 2 17.95000 1.854454 6 16.40309 19.49691 15.3 20.3
## 3 20.95000 1.879096 6 19.40309 22.49691 18.5 23.6
## 4 18.81667 1.554885 6 17.26975 20.36358 16.9 21.1
## 
## Alpha: 0.05 ; DF Error: 20
## Critical Value of t: 2.085963 
## 
## least Significant Difference: 2.187666 
## 
## Treatments with the same letter are not significantly different.
## 
##        obs groups
## 3 20.95000      a
## 4 18.81667     ab
## 1 18.65000      b
## 2 17.95000      b
abs(mean(r1)-mean(r2))
## [1] 0.7
abs(mean(r1)-mean(r3))
## [1] 2.3
abs(mean(r1)-mean(r4))
## [1] 0.1666667
abs(mean(r2)-mean(r3))
## [1] 3
abs(mean(r2)-mean(r4))
## [1] 0.8666667
abs(mean(r3)-mean(r4))
## [1] 2.133333

Using a confidence level of 0.95, we obtain Mean Square Error = 3.299667, Critical Value of t = 2.085963, and Least Significant Difference = 2.187666. Two means differ significantly only if their absolute difference exceeds the LSD (not the critical value of t). The absolute differences are 0.7 (1 vs 2), 2.3 (1 vs 3), 0.167 (1 vs 4), 3 (2 vs 3), 0.867 (2 vs 4), and 2.133 (3 vs 4). Only 1 vs 3 and 2 vs 3 exceed 2.187666; 3 vs 4 (2.133) falls just short, which is why the output places Type 4 in group "ab". Fluid Type 3 therefore has the highest mean life and is significantly better than Types 1 and 2, but not significantly better than Type 4. Since the overall F test in (a) was not significant at 0.05, these unadjusted comparisons should be read with caution.
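
The LSD reported by `LSD.test` can be reproduced from the Mean Square Error, the error degrees of freedom, and the common group size. This is a sketch that takes its inputs from the printed output above; the names `lsd`, `diffs` and `significant` are chosen here for illustration.

```r
mse      <- 3.299667   # Mean Square Error from the LSD.test output
df_error <- 20         # DF Error
n        <- 6          # observations per fluid type
means    <- c("1" = 18.65, "2" = 17.95, "3" = 20.95, "4" = 18.81667)

# LSD = t_{alpha/2, df} * sqrt(2 * MSE / n) for equal group sizes
lsd <- qt(1 - 0.05 / 2, df_error) * sqrt(2 * mse / n)

# Absolute pairwise differences between the group means
diffs <- abs(outer(means, means, "-"))

# A pair differs significantly when its difference exceeds the LSD
significant <- diffs > lsd

lsd
significant
```

Only the pairs (1, 3) and (2, 3) are flagged, which matches the letter grouping a, ab, b, b in the output.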

————————————————————————————————

(c):

As in (a), the residual plot confirms what the box plot indicated: the variances are roughly equal, because the spread of the residuals within each group reflects that group's variance. Together with the normal Q-Q plot, this suggests that the basic analysis of variance assumptions are satisfied.

————————————————————————————————

Problem 3.28

(a):

r1 <- c(110,157,194,178)
r2 <- c(1,2,4,18)
r3 <- c(880,1256,5276,4355)
r4 <- c(495,7040,5307,10050)
r5 <- c(7,5,29,2)
obs <- c(r1,r2,r3,r4,r5)
type <- c(rep(1,4),rep(2,4),rep(3,4),rep(4,4),rep(5,4))
dat1 <- cbind(obs,type)
str(dat1)
##  num [1:20, 1:2] 110 157 194 178 1 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "obs" "type"
dat1 <- as.data.frame(dat1)
str(dat1)
## 'data.frame':    20 obs. of  2 variables:
##  $ obs : num  110 157 194 178 1 ...
##  $ type: num  1 1 1 1 2 2 2 2 3 3 ...
dat1$type <- as.factor(dat1$type)
str(dat1)
## 'data.frame':    20 obs. of  2 variables:
##  $ obs : num  110 157 194 178 1 ...
##  $ type: Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 2 2 2 2 3 3 ...

null hypothesis: \(H_0: \mu_{1}=\mu_{2}=\mu_{3}=\mu_{4}=\mu_{5}\)

alternative hypothesis: \(H_1:\) at least one \(\mu_{i}\) differs

With only four observations per group, there is too little data to verify normality and equal variance reliably, so we first run a non-parametric test:

kruskal.test(obs~type,data=dat1)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  obs by type
## Kruskal-Wallis chi-squared = 16.873, df = 4, p-value = 0.002046

Ans: Since p-value = 0.002046 < 0.05, we reject the null hypothesis (i.e., at least one group differs).

Parametric test (one-way ANOVA):

aov.model<-aov(obs~type, data = dat1)
summary(aov.model)
##             Df    Sum Sq  Mean Sq F value  Pr(>F)   
## type         4 103191489 25797872   6.191 0.00379 **
## Residuals   15  62505657  4167044                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)

Ans: Since p-value = 0.00379 < 0.05, we reject the null hypothesis (i.e., at least one mean differs).

————————————————————————————————

(b):

Check assumptions:

boxplot(obs~type,xlab="type/method",ylab="observation",main="Boxplot of Observations")

meanx<-c(ave(r1),ave(r2),ave(r3),ave(r4),ave(r5))
res <- obs-meanx
qqnorm(res)

plot(meanx,res,xlab="population average",ylab="residual", main="constant variance check")

Comments: The normal Q-Q plot does not pass the fat pencil test, so the residuals do not appear to follow a normal distribution. ANOVA is fairly robust to moderate departures from normality, so a parametric test may still be usable. On the other hand, with only 4 observations per group there is little evidence to support normality, so a non-parametric test is also a reasonable choice.

The box plot and the residual plot also show that the variances are clearly not equal. A transformation is needed if we want to use the parametric test.
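
One empirical way to pick a variance-stabilizing transformation is to regress the log of each group's standard deviation on the log of its mean: if the standard deviation is proportional to the mean raised to a power \(\alpha\), then the power transformation with \(\lambda = 1 - \alpha\) roughly equalizes the variances. The sketch below is a supplementary check, not part of the original solution; the names `alpha_hat` and `lambda_hat` are chosen here for illustration.

```r
obs <- c(110, 157, 194, 178,
         1, 2, 4, 18,
         880, 1256, 5276, 4355,
         495, 7040, 5307, 10050,
         7, 5, 29, 2)
type <- factor(rep(1:5, each = 4))

group_mean <- tapply(obs, type, mean)
group_sd   <- tapply(obs, type, sd)

# Slope of log(sd) against log(mean) estimates alpha
fit <- lm(log(group_sd) ~ log(group_mean))
alpha_hat <- unname(coef(fit)[2])

# Suggested power for the transformation y^lambda (lambda near 0 means log)
lambda_hat <- 1 - alpha_hat
lambda_hat
```

A slope close to 1 gives a suggested power close to 0, which points toward a log-type transformation.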

————————————————————————————————

(c):

Transformation and Anova:

library(MASS)

boxcox(obs~type)

lambda = 0.01
obs<-obs^(lambda)
dat1$obs <- (dat1$obs)^(lambda)
boxcox(obs~type)

boxplot(obs~type,xlab="type/method",ylab="observation",main="Boxplot of Observations")

aov.model<-aov(obs~type, data = dat1)
summary(aov.model)
##             Df   Sum Sq  Mean Sq F value   Pr(>F)    
## type         4 0.018151 0.004538   37.67 1.17e-07 ***
## Residuals   15 0.001807 0.000120                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)

Ans: In the Box-Cox plot of the transformed data, 1 lies within the confidence interval for lambda, which means no further transformation is needed and the transformation worked well. Normality appears improved. Although the effect of the transformation on Type 2 is more limited than on the other types, overall the variances appear to be stabilized. Since p = 1.17e-07 < 0.05 in the ANOVA, we reject the null hypothesis (i.e., at least one mean differs). As noted before, we have only 4 observations per group, which is few for a parametric test, so this analysis serves as a reference and the non-parametric test remains a sensible choice here.
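
Instead of reading lambda off the plot, the grid value that maximizes the Box-Cox log-likelihood can be extracted with `plotit = FALSE`. Because the chosen power 0.01 is so close to 0, the power transformation behaves almost like a log transformation, and the sketch below compares the two F statistics. This is an illustration on the untransformed data, not part of the original solution; `obs_raw`, `lambda_hat`, `f_pow` and `f_log` are names chosen here.

```r
library(MASS)

# Untransformed observations for Problem 3.28
obs_raw <- c(110, 157, 194, 178,
             1, 2, 4, 18,
             880, 1256, 5276, 4355,
             495, 7040, 5307, 10050,
             7, 5, 29, 2)
type <- factor(rep(1:5, each = 4))

# Profile log-likelihood over the default lambda grid, without plotting
bc <- boxcox(obs_raw ~ type, plotit = FALSE)
lambda_hat <- bc$x[which.max(bc$y)]

# F statistic under the power 0.01 used above, and under log()
f_pow <- summary(aov(obs_raw^0.01 ~ type))[[1]][["F value"]][1]
f_log <- summary(aov(log(obs_raw) ~ type))[[1]][["F value"]][1]

c(lambda_hat = lambda_hat, f_pow = f_pow, f_log = f_log)
```

If the two F statistics are close, reporting the analysis on the log scale would be an equivalent and easier-to-interpret choice.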

————————————————————————————————

Problem 3.29

(a):

r1 <- c(31,10,21,4,1)
r2 <- c(62,40,24,30,35)
r3 <- c(53,27,120,97,68)
obs <- c(r1,r2,r3)
type <- c(rep(1,5),rep(2,5),rep(3,5))
dat1 <- cbind(obs,type)
str(dat1)
##  num [1:15, 1:2] 31 10 21 4 1 62 40 24 30 35 ...
##  - attr(*, "dimnames")=List of 2
##   ..$ : NULL
##   ..$ : chr [1:2] "obs" "type"
dat1 <- as.data.frame(dat1)
str(dat1)
## 'data.frame':    15 obs. of  2 variables:
##  $ obs : num  31 10 21 4 1 62 40 24 30 35 ...
##  $ type: num  1 1 1 1 1 2 2 2 2 2 ...
dat1$type <- as.factor(dat1$type)
str(dat1)
## 'data.frame':    15 obs. of  2 variables:
##  $ obs : num  31 10 21 4 1 62 40 24 30 35 ...
##  $ type: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 2 2 2 2 2 ...

null hypothesis: \(H_0: \mu_{1}=\mu_{2}=\mu_{3}\)

alternative hypothesis: \(H_1:\) at least one \(\mu_{i}\) differs

With only five observations per group, there is too little data to verify normality and equal variance reliably, so we first run a non-parametric test:

kruskal.test(obs~type,data=dat1)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  obs by type
## Kruskal-Wallis chi-squared = 8.54, df = 2, p-value = 0.01398

Ans: Since p-value = 0.01398 < 0.05, we reject the null hypothesis (i.e., at least one group differs).

Parametric test (one-way ANOVA):

aov.model<-aov(obs~type, data = dat1)
summary(aov.model)
##             Df Sum Sq Mean Sq F value  Pr(>F)   
## type         2   8964    4482   7.914 0.00643 **
## Residuals   12   6796     566                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)

Ans: Since p-value = 0.00643 < 0.05, we reject the null hypothesis (i.e., at least one mean differs).

————————————————————————————————

(b):

Check assumptions:

boxplot(obs~type,xlab="type/method",ylab="observation",main="Boxplot of Observations")

meanx<-c(ave(r1),ave(r2),ave(r3))
res <- obs-meanx
qqnorm(res)

plot(meanx,res,xlab="population average",ylab="residual", main="constant variance check")

Comments: The normal Q-Q plot passes the fat pencil test, so the residuals appear to follow a normal distribution. ANOVA is also fairly robust to moderate departures from normality, so a parametric test may be used. On the other hand, with only 5 observations per group there is little evidence to support normality, so a non-parametric test is also a reasonable choice.

The box plot and the residual plot also show that the variances are not roughly equal. A transformation is needed if we want to use the parametric test.

————————————————————————————————

(c):

Transformation and Anova:

library(MASS)

boxcox(obs~type)

lambda = 0.45
obs<-obs^(lambda)
dat1$obs <- (dat1$obs)^(lambda)
boxcox(obs~type)

boxplot(obs~type,xlab="type/method",ylab="observation",main="Boxplot of Observations")

aov.model<-aov(obs~type, data = dat1)
summary(aov.model)
##             Df Sum Sq Mean Sq F value Pr(>F)   
## type         2  37.17   18.59   9.887 0.0029 **
## Residuals   12  22.56    1.88                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)

Ans: In the Box-Cox plot of the transformed data, 1 lies within the confidence interval for lambda, which means no further transformation is needed and the transformation worked well. Although the effect of the transformation on Type/Method 2 is more limited than on the others, overall the variances appear to be stabilized. Since p = 0.0029 < 0.05 in the ANOVA, we reject the null hypothesis (i.e., at least one mean differs). As noted before, we have only 5 observations per group, which is few for a parametric test, so this analysis serves as a reference and the non-parametric test remains a sensible choice here.
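
A simple numerical check of whether the transformation evened out the spread is to compare the ratio of the largest to the smallest group standard deviation before and after applying it. The sketch below is an addition to the original solution and uses the same lambda = 0.45; `sd_ratio` is a helper defined here for illustration.

```r
obs_raw <- c(31, 10, 21, 4, 1,
             62, 40, 24, 30, 35,
             53, 27, 120, 97, 68)
type <- factor(rep(1:3, each = 5))
lambda <- 0.45

# Ratio of largest to smallest group standard deviation
sd_ratio <- function(y, g) {
  s <- tapply(y, g, sd)
  max(s) / min(s)
}

ratio_before <- sd_ratio(obs_raw, type)
ratio_after  <- sd_ratio(obs_raw^lambda, type)

c(before = ratio_before, after = ratio_after)
```

A ratio closer to 1 after the transformation indicates that the group variances have become more alike.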

————————————————————————————————

Problem 3.51

Question: Use the Kruskal–Wallis test for the experiment in Problem 3.23. Compare the conclusions obtained with those from the usual analysis of variance.

Ans: This was already done in Problem 3.23 (a).

Code for non-parametric test therein: kruskal.test(obs~type,data=dat1)

Since p-value = 0.1015 > 0.05, we do not reject the null hypothesis (i.e., there is no significant evidence that the fluid types differ).

Overall, the parametric and non-parametric tests agree: neither rejects the null hypothesis at the 0.05 level. We therefore conclude that there is no significant evidence that the fluids differ, although the ANOVA p-value (0.0525) is close to the significance level of 0.05.

Regarding the comparison: the Kruskal-Wallis test does not assume normality, so it imposes fewer constraints, and its non-rejection supports the ANOVA conclusion. However, if normality and equal variance can be assumed, the parametric test (ANOVA) is preferable because it is more powerful than the non-parametric test.
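
For reference, the Kruskal-Wallis statistic can be computed directly from the ranks. The sketch below is an addition to the original solution: it ranks all 24 observations together, applies the usual formula with the correction for ties (the value 16.9 occurs twice), and compares the result with `kruskal.test()`. The names `H` and `p_value` are chosen here for illustration.

```r
obs <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6,
         16.9, 15.3, 18.6, 17.1, 19.5, 20.3,
         21.4, 23.6, 19.4, 18.5, 20.5, 22.3,
         19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
type <- factor(rep(1:4, each = 6))

N <- length(obs)
r <- rank(obs)                      # mid-ranks are assigned to ties

rank_sum <- tapply(r, type, sum)    # rank sum per fluid type
n_i      <- tapply(r, type, length)

# Uncorrected statistic
H <- 12 / (N * (N + 1)) * sum(rank_sum^2 / n_i) - 3 * (N + 1)

# Correction for ties
ties <- table(obs)
H <- H / (1 - sum(ties^3 - ties) / (N^3 - N))

# Chi-squared approximation with (number of groups - 1) degrees of freedom
p_value <- pchisq(H, df = nlevels(type) - 1, lower.tail = FALSE)

c(H = H, p = p_value)
kruskal.test(obs ~ type)$statistic
```

Because the test uses only ranks, it is unaffected by any monotone transformation of the observations, which is one reason it is a natural companion to the ANOVA here.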