Problem 3.23 The hypothesis that we are testing is:
\[ H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu \]
\[ H_1: \text{At least one mean is different} \]
#a
f1<-c(17.6,18.9,16.3,17.4,20.1,21.6)
f2<-c(16.9,15.3,18.6,17.1,19.5,20.3)
f3<-c(21.4,23.6,19.4,18.5,20.5,22.3)
f4<-c(19.3,21.1,16.9,17.5,18.3,19.8)
dat<-data.frame(f1,f2,f3,f4)
library(tidyr)
dat2<-pivot_longer(dat,c(f1,f2,f3,f4))
aov.model<-aov(value~name,data=dat2)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 3 30.17 10.05 3.047 0.0525 .
## Residuals 20 65.99 3.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the P-value (0.0525) > alpha (0.05), we do not reject the hypothesis that the means are equal, although the result is borderline.
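The decision above can also be checked directly against the F distribution. The following is a minimal sketch that uses only the F value and degrees of freedom reported in the ANOVA table; the variable names are illustrative and not part of the original solution.

```r
# Values copied from the ANOVA table above
f_obs <- 3.047   # F value for the fluid factor
df1   <- 3       # treatment degrees of freedom
df2   <- 20      # residual degrees of freedom
alpha <- 0.05    # significance level used in the text

# Upper-tail probability of the F distribution gives the P-value
p_value <- pf(f_obs, df1, df2, lower.tail = FALSE)

# Equivalent decision rule: reject H0 only if F exceeds the critical value
f_crit <- qf(1 - alpha, df1, df2)
reject <- f_obs > f_crit

print(c(p_value = p_value, f_crit = f_crit))
print(reject)
```

Both rules lead to the same decision, because comparing F with the critical value is equivalent to comparing the P-value with alpha.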
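#b I would choose fluid 3, since it has the highest sample mean (20.95), although the ANOVA above does not find the differences between fluids significant at the 0.05 level.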
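The choice in part b rests on the group averages, which can be computed as follows. This is a small sketch; `fluid_means` and `best` are names introduced here for illustration.

```r
# Observations for the four fluids, as entered in part a
f1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
f2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
f3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
f4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)

# Sample mean of each fluid
fluid_means <- sapply(list(f1 = f1, f2 = f2, f3 = f3, f4 = f4), mean)
print(round(fluid_means, 2))

# Name of the fluid with the largest sample mean
best <- names(which.max(fluid_means))
print(best)
```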
#c
pop<-c(f1,f2,f3,f4)
meanx<-c(rep(mean(f1),6),rep(mean(f2),6),rep(mean(f3),6),rep(mean(f4),6))
res<-pop-meanx
qqnorm(res)
qqline(res)
plot(meanx,res,xlab="population average", ylab="residual",main="constant variance")
The residuals look approximately normal in the Q-Q plot, and their spread
appears roughly constant across the group means.
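The residuals built by hand in part c are the same as the residuals stored in the fitted model. The sketch below checks this in base R, using `stack()` instead of `pivot_longer()` so that it has no package dependencies. The two formal tests at the end are an optional supplement and were not part of the original solution.

```r
f1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
f2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
f3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
f4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)

# Long format: column "values" holds the data, "ind" the fluid label
long <- stack(data.frame(f1, f2, f3, f4))
fit  <- aov(values ~ ind, data = long)

# Residuals from the model object versus observation minus group mean
res_model  <- unname(residuals(fit))
res_manual <- long$values - ave(long$values, long$ind)
print(all.equal(res_model, res_manual))

# Optional formal checks of the two assumptions
print(shapiro.test(res_model))                    # normality of residuals
print(bartlett.test(values ~ ind, data = long))   # equal variances
```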
Problem 3.28
#a
The hypothesis that we are testing is:
\[ H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 = \mu \]
\[ H_1: \text{At least one mean is different} \]
m1<-c(110, 157, 194, 178)
m2<-c(1, 2, 4, 18)
m3<-c(880, 1256, 5276, 4355)
m4<-c(495, 7040, 5307, 10050)
m5<-c(7, 5, 29, 2)
dat3<-data.frame(m1,m2,m3,m4,m5)
library(tidyr)
dat4<-pivot_longer(dat3,c(m1,m2,m3,m4,m5))
aov.model<-aov(value~name,data=dat4)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 4 103191489 25797872 6.191 0.00379 **
## Residuals 15 62505657 4167044
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the P-value (0.00379) < alpha (0.01), we reject the hypothesis that the means are equal.
#b
pop2<-c(m1,m2,m3,m4,m5)
meanx2<-c(rep(mean(m1),4),rep(mean(m2),4),rep(mean(m3),4),rep(mean(m4),4),rep(mean(m5),4))
res2<-pop2-meanx2
qqnorm(res2)
qqline(res2)
plot(meanx2,res2,xlab="population average", ylab="residual",main="variance")
The residuals-versus-group-mean plot shows a clear funnel shape: the
spread of the residuals grows with the group mean, so the variance is
not constant. In addition, the normal Q-Q plot shows that the residuals
are not normally distributed.
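The funnel shape can be backed up with numbers by comparing the group standard deviations and, as an optional supplement not in the original solution, running Bartlett's test of equal variances. Bartlett's test is itself sensitive to non-normality, so treat the result as a rough indication only.

```r
m1 <- c(110, 157, 194, 178)
m2 <- c(1, 2, 4, 18)
m3 <- c(880, 1256, 5276, 4355)
m4 <- c(495, 7040, 5307, 10050)
m5 <- c(7, 5, 29, 2)

# Long format: "values" holds the data, "ind" the group label
long <- stack(data.frame(m1, m2, m3, m4, m5))

# Standard deviation within each group
group_sd <- tapply(long$values, long$ind, sd)
print(round(group_sd, 1))

# Bartlett's test: H0 is that all group variances are equal
bt <- bartlett.test(values ~ ind, data = long)
print(bt)
```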
#c
dat5<-log(dat3)
dat6<-pivot_longer(dat5,c(m1,m2,m3,m4,m5))
aov.model<-aov(value~name,data=dat6)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 4 165.06 41.26 37.66 1.18e-07 ***
## Residuals 15 16.44 1.10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
I applied a log transformation to the data. As before, the hypothesis that the means are equal is rejected (P-value = 1.18e-07).
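One way to see what the log transformation does is to compare how unequal the group standard deviations are before and after it. The helper `sd_ratio()` below is a hypothetical name introduced for this sketch. It returns the largest group standard deviation divided by the smallest.

```r
raw <- data.frame(
  m1 = c(110, 157, 194, 178),
  m2 = c(1, 2, 4, 18),
  m3 = c(880, 1256, 5276, 4355),
  m4 = c(495, 7040, 5307, 10050),
  m5 = c(7, 5, 29, 2)
)

# Ratio of the largest to the smallest group standard deviation
sd_ratio <- function(d) {
  s <- sapply(d, sd)
  max(s) / min(s)
}

ratio_raw <- sd_ratio(raw)        # original scale
ratio_log <- sd_ratio(log(raw))   # after the log transformation
print(c(raw = ratio_raw, log = ratio_log))
```

A ratio closer to 1 means the groups have more similar spread, which is what the constant-variance assumption of ANOVA requires.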
Problem 3.29
#a
The hypothesis that we are testing is:
\[ H_0: \mu_1 = \mu_2 = \mu_3 = \mu \]
\[ H_1: \text{At least one mean is different} \]
w1<-c(31, 10, 21, 4, 1)
w2<-c(62, 40, 24, 30, 35)
w3<-c(53, 27, 120, 97, 68)
dat7<-data.frame(w1,w2,w3)
library(tidyr)
dat8<-pivot_longer(dat7,c(w1,w2,w3))
aov.model<-aov(value~name,data=dat8)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 2 8964 4482 7.914 0.00643 **
## Residuals 12 6796 566
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion:
Since the P-value (0.00643) < 0.05, we reject the hypothesis that the means are equal.
#b
pop3<-c(w1,w2,w3)
meanx3<-c(rep(mean(w1),5),rep(mean(w2),5),rep(mean(w3),5))
res3<-pop3-meanx3
qqnorm(res3)
qqline(res3)
plot(meanx3,res3,xlab="population average", ylab="residual",main="variance")
As in the previous problem, the residuals-versus-mean plot has a funnel shape, indicating non-constant variance. In the normal Q-Q plot the residuals do not look normal: there is a slight "S" shape around the Q-Q line.
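The funnel shape corresponds to a group variance that grows with the group mean. The sketch below tabulates both quantities; the names `group_mean` and `group_var` are illustrative. This pattern is the situation in which a variance-stabilizing transformation is often considered.

```r
w1 <- c(31, 10, 21, 4, 1)
w2 <- c(62, 40, 24, 30, 35)
w3 <- c(53, 27, 120, 97, 68)

counts <- list(w1 = w1, w2 = w2, w3 = w3)

# Mean and variance within each group
group_mean <- sapply(counts, mean)
group_var  <- sapply(counts, var)
print(rbind(mean = group_mean, variance = group_var))
```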
#c
w1_1<-sqrt(w1)
w2_2<-sqrt(w2)
w3_3<-sqrt(w3)
dat9<-data.frame(w1_1,w2_2,w3_3)
library(tidyr)
dat10<-pivot_longer(dat9,c(w1_1,w2_2,w3_3))
aov.model<-aov(value~name,data=dat10)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 2 63.90 31.95 9.84 0.00295 **
## Residuals 12 38.96 3.25
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
After applying a square root transformation, the difference between the means is detected more clearly: the P-value is 0.00295, which is smaller than the 0.00643 obtained before the transformation.
Problem 3.51 & 3.52
f1<-c(17.6,18.9,16.3,17.4,20.1,21.6)
f2<-c(16.9,15.3,18.6,17.1,19.5,20.3)
f3<-c(21.4,23.6,19.4,18.5,20.5,22.3)
f4<-c(19.3,21.1,16.9,17.5,18.3,19.8)
dat<-data.frame(f1,f2,f3,f4)
library(tidyr)
dat2<-pivot_longer(dat,c(f1,f2,f3,f4))
kruskal.test(value~name,data=dat2)
##
## Kruskal-Wallis rank sum test
##
## data: value by name
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015
Since the P-value (0.1015) > 0.05, we do not reject the hypothesis that the four fluids are the same (no difference in location).
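The Kruskal-Wallis statistic is computed from the ranks of the pooled observations. The sketch below reproduces it by hand, including the correction for ties (the value 16.9 occurs twice), and compares it with a chi-squared distribution with k - 1 degrees of freedom. Variable names are illustrative.

```r
f1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
f2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
f3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
f4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)

x <- c(f1, f2, f3, f4)
g <- factor(rep(c("f1", "f2", "f3", "f4"), each = 6))
N <- length(x)
k <- nlevels(g)

# Rank the pooled data (tied values receive the average rank)
r <- rank(x)
rank_sums <- tapply(r, g, sum)
n_i <- tapply(r, g, length)

# Uncorrected statistic, then the tie correction
H <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)
ties <- table(x)
H_adj <- H / (1 - sum(ties^3 - ties) / (N^3 - N))

# Upper-tail chi-squared probability gives the P-value
p_value <- pchisq(H_adj, df = k - 1, lower.tail = FALSE)
print(c(H = H_adj, p = p_value))
```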
Problem 4.3
We want to test the hypothesis of equal treatment means against the alternative that at least one is different:
\[ H_0: \tau_i = 0 \text{ for all } i \]
\[ H_a: \tau_i \neq 0 \text{ for at least one } i \]
The linear effects model is shown by the equation below:
\[ y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij}\]
where \(\mu\), \(\tau_i\), \(\beta_j\), and \(\epsilon_{ij}\) represent, respectively, the overall mean, the effect of the ith treatment (chemical), the additive effect of the jth block (bolt), and the normal random error.
library(GAD)
chemical<-c(rep(1,5),rep(2,5),rep(3,5),rep(4,5))
bolt <- c(seq(1,5),seq(1,5),seq(1,5),seq(1,5))
obs<- c(73,68,74,71,67,
73,67,75,72,70,
75,68,78,73,68,
73,71,75,75,69)
chemical<-as.fixed(chemical)
bolt <- as.fixed(bolt)
model<-lm(obs~chemical+bolt)
gad(model)
## $anova
## Analysis of Variance Table
##
## Response: obs
## Df Sum Sq Mean Sq F value Pr(>F)
## chemical 3 12.95 4.317 2.3761 0.1211
## bolt 4 157.00 39.250 21.6055 2.059e-05 ***
## Residuals 12 21.80 1.817
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
With an \(\alpha\) = 0.05, the null hypothesis is not rejected (P-value = 0.1211).
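The sums of squares in the table can be reproduced from the estimated effects of the additive model with an overall mean, a treatment (chemical) effect, and a block (bolt) effect. This is a base R sketch that does not need the GAD package; names such as `tau_hat` and `beta_hat` are introduced here for illustration.

```r
obs <- c(73, 68, 74, 71, 67,
         73, 67, 75, 72, 70,
         75, 68, 78, 73, 68,
         73, 71, 75, 75, 69)
chemical <- factor(rep(1:4, each = 5))   # treatments
bolt     <- factor(rep(1:5, times = 4))  # blocks
a <- nlevels(chemical)
b <- nlevels(bolt)

# Estimated overall mean, treatment effects, and block effects
mu_hat   <- mean(obs)
tau_hat  <- tapply(obs, chemical, mean) - mu_hat
beta_hat <- tapply(obs, bolt, mean) - mu_hat

# Sums of squares for treatments, blocks, and error
ss_chemical <- b * sum(tau_hat^2)
ss_bolt     <- a * sum(beta_hat^2)
ss_error    <- sum((obs - mu_hat)^2) - ss_chemical - ss_bolt

# F statistic for the chemical effect
f_chemical <- (ss_chemical / (a - 1)) / (ss_error / ((a - 1) * (b - 1)))
print(c(ss_chemical = ss_chemical, ss_bolt = ss_bolt,
        ss_error = ss_error, f_chemical = f_chemical))
```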
Problem 4.22
\[ y_{ijk}=\mu+\tau_i+\beta_j+\alpha_k+\epsilon_{ijk} \]
\[ i, j, k = 1, 2, \ldots, 5 \]
\(\mu\) = Baseline mean
\(\tau\) = Treatments (Ingredients)
\(\beta\) = Batch of materials
\(\alpha\) = Day of production
\(\epsilon\) = Random error
library(GAD)
batch<-c(rep(1,5),rep(2,5),rep(3,5),rep(4,5),rep(5,5))
day<-c(seq(1,5),seq(1,5),seq(1,5),seq(1,5),seq(1,5))
variable<-c("A","B","D","C","E","C","E","A","D","B","B","A","C","E","D","D","C","E","B","A","E","D","B","A","C")
value<-c(8,7,1,7,3,11,2,7,3,8,4,9,10,1,5,6,8,6,6,10,4,2,3,8,8)
dat<-data.frame(batch,day,variable,value)
dat$batch<-as.fixed(dat$batch)
dat$day<-as.fixed(dat$day)
dat$variable<-as.fixed(dat$variable)
model<-lm(value~batch+day+variable,data = dat)
anova(model)
## Analysis of Variance Table
##
## Response: value
## Df Sum Sq Mean Sq F value Pr(>F)
## batch 4 15.44 3.860 1.2345 0.3476182
## day 4 12.24 3.060 0.9787 0.4550143
## variable 4 141.44 35.360 11.3092 0.0004877 ***
## Residuals 12 37.52 3.127
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the P-value for the treatments (0.0004877) is much smaller than 0.05, we reject the null hypothesis that the treatment means are equal at the 5% significance level.
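The model assumes a Latin square layout, in which every ingredient appears exactly once in each batch and once on each day. A quick sketch to confirm that the data as entered satisfy this; `is_latin` is an illustrative name.

```r
batch <- rep(1:5, each = 5)
day   <- rep(1:5, times = 5)
variable <- c("A","B","D","C","E","C","E","A","D","B","B","A","C","E","D",
              "D","C","E","B","A","E","D","B","A","C")

# Each treatment must occur once per batch (row) and once per day (column)
once_per_batch <- all(table(batch, variable) == 1)
once_per_day   <- all(table(day, variable) == 1)
is_latin <- once_per_batch && once_per_day
print(is_latin)
```

If either check failed, the batch, day, and treatment effects would no longer be orthogonal and the ANOVA table would depend on the order of the terms in the model.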