Homework-module4

Problem 3.23

The hypothesis that we are testing is:

\[ H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 \]

\[ H_1: \text{at least one } \mu_i \text{ is different} \]

#a
f1<-c(17.6,18.9,16.3,17.4,20.1,21.6)
f2<-c(16.9,15.3,18.6,17.1,19.5,20.3)
f3<-c(21.4,23.6,19.4,18.5,20.5,22.3)
f4<-c(19.3,21.1,16.9,17.5,18.3,19.8)
dat<-data.frame(f1,f2,f3,f4)
library(tidyr)
dat2<-pivot_longer(dat,c(f1,f2,f3,f4))
aov.model<-aov(value~name,data=dat2)
summary(aov.model)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## name         3  30.17   10.05   3.047 0.0525 .
## Residuals   20  65.99    3.30                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the P-value (0.0525) > α (0.05), we fail to reject the hypothesis that the means are equal. The evidence that the fluids differ is only marginal.
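The F statistic and P-value in the table above follow from the one-way ANOVA sums of squares. The base-R sketch below reproduces them. The helper name `one_way_anova` is a hypothetical label introduced here, not a function from any package.

```r
# Fluid life data from Problem 3.23
f1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
f2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
f3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
f4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)

# Hypothetical helper: one-way ANOVA computed directly from a list of group vectors
one_way_anova <- function(groups) {
  y <- unlist(groups)
  n_i <- lengths(groups)                       # observations per treatment
  a <- length(groups)                          # number of treatments
  N <- length(y)                               # total observations
  grand_mean <- mean(y)
  group_means <- vapply(groups, mean, numeric(1))
  ss_treat <- sum(n_i * (group_means - grand_mean)^2)
  ss_error <- sum((y - rep(group_means, n_i))^2)
  df_treat <- a - 1
  df_error <- N - a
  f_value <- (ss_treat / df_treat) / (ss_error / df_error)
  p_value <- pf(f_value, df_treat, df_error, lower.tail = FALSE)
  list(ss_treat = ss_treat, ss_error = ss_error,
       f_value = f_value, p_value = p_value)
}

fit <- one_way_anova(list(f1, f2, f3, f4))
fit$f_value
fit$p_value
```

The same function works for any number of groups, since the degrees of freedom are taken from the list length and the group sizes.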

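#b I would choose fluid 3, since it has the highest sample mean (20.95), although the ANOVA in part (a) did not show a significant difference between the fluids at α = 0.05.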
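The choice in part (b) rests on ranking the treatment means. The sketch below assumes that larger values are better, as the choice of fluid 3 implies. The list labels `fluid1` to `fluid4` are names chosen here.

```r
f1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
f2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
f3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
f4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)

# Sample mean of each fluid; the largest mean is the preferred fluid
lives <- list(fluid1 = f1, fluid2 = f2, fluid3 = f3, fluid4 = f4)
mean_life <- vapply(lives, mean, numeric(1))
round(mean_life, 2)
best <- names(which.max(mean_life))
best
```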

#c
pop<-c(f1,f2,f3,f4)
meanx<-c(rep(mean(f1),6),rep(mean(f2),6),rep(mean(f3),6),rep(mean(f4),6))
res<-pop-meanx
qqnorm(res)
qqline(res)

plot(meanx,res,xlab="treatment mean (fitted value)", ylab="residual",main="constant variance")

The residuals look approximately normal on the Q-Q plot, and their spread is roughly the same across the four fluids, so the normality and constant-variance assumptions appear reasonable.
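The manually computed residuals `pop - meanx` are exactly the residuals of the fitted one-way model. The sketch below confirms this with base R's `stack()` and `lm()` (used here in place of `tidyr`) and prints the within-group standard deviations as a numeric companion to the plot.

```r
f1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
f2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
f3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
f4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)
dat <- data.frame(f1, f2, f3, f4)

# stack() gives columns 'values' and 'ind', stacked column by column (f1, then f2, ...)
long <- stack(dat)

# Manual residuals: observation minus its treatment mean
meanx <- rep(colMeans(dat), each = 6)
res <- long$values - meanx

# Residuals from the fitted one-way model should be identical
fit <- lm(values ~ ind, data = long)
all.equal(unname(residuals(fit)), unname(res))

# Spread of the residuals within each fluid
tapply(res, long$ind, sd)
```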

Problem 3.28

The hypothesis that we are testing is:

\[ H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5 \]

\[ H_1: \text{at least one } \mu_i \text{ is different} \]

m1<-c(110, 157, 194, 178)
m2<-c(1, 2, 4, 18)
m3<-c(880, 1256, 5276, 4355)
m4<-c(495, 7040, 5307, 10050)
m5<-c(7, 5, 29, 2)
dat3<-data.frame(m1,m2,m3,m4,m5)
library(tidyr)
dat4<-pivot_longer(dat3,c(m1,m2,m3,m4,m5))
aov.model<-aov(value~name,data=dat4)
summary(aov.model)

##             Df    Sum Sq  Mean Sq F value  Pr(>F)   
## name         4 103191489 25797872   6.191 0.00379 **
## Residuals   15  62505657  4167044                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the P-value (0.00379) is below α = 0.05 (and even below 0.01), we reject the hypothesis that the means are equal.

pop2<-c(m1,m2,m3,m4,m5)
meanx2<-c(rep(mean(m1),4),rep(mean(m2),4),rep(mean(m3),4),rep(mean(m4),4),rep(mean(m5),4))
res2<-pop2-meanx2
qqnorm(res2)
qqline(res2)

plot(meanx2,res2,xlab="treatment mean (fitted value)", ylab="residual",main="variance")

The residuals-versus-fitted plot shows a clear funnel shape: the spread of the residuals grows with the treatment mean, so the variance is not constant across treatments. In addition, the normal Q-Q plot indicates that the residuals are not normally distributed.
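The funnel shape can be quantified with a common variance-stabilizing heuristic. Assume the standard deviation grows as a power of the mean, \(\sigma_i \propto \mu_i^{\alpha}\). Estimate \(\alpha\) as the slope of \(\log s_i\) on \(\log \bar{y}_i\), then transform the data to \(y^{1-\alpha}\). A value of \(1 - \alpha \approx 0\) corresponds to the log transformation. The base-R sketch below works under that assumption; the names `group_sd` and `alpha_hat` are introduced here.

```r
m1 <- c(110, 157, 194, 178)
m2 <- c(1, 2, 4, 18)
m3 <- c(880, 1256, 5276, 4355)
m4 <- c(495, 7040, 5307, 10050)
m5 <- c(7, 5, 29, 2)
groups <- list(m1 = m1, m2 = m2, m3 = m3, m4 = m4, m5 = m5)

group_mean <- vapply(groups, mean, numeric(1))
group_sd <- vapply(groups, sd, numeric(1))

# Assumption: sd_i is proportional to mean_i^alpha, so log(sd) is linear in log(mean)
slope_fit <- lm(log(group_sd) ~ log(group_mean))
alpha_hat <- unname(coef(slope_fit)[2])
alpha_hat          # estimated power
1 - alpha_hat      # suggested transformation power; near 0 points to the log
```

With only five groups this slope is a rough guide, not a precise estimate. An \(\hat{\alpha}\) close to 1 is consistent with the choice of a log transformation.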

dat5<-log(dat3)
dat6<-pivot_longer(dat5,c(m1,m2,m3,m4,m5))
aov.model<-aov(value~name,data=dat6)
summary(aov.model)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## name         4 165.06   41.26   37.66 1.18e-07 ***
## Residuals   15  16.44    1.10                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

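After applying a log transformation to the data, the hypothesis that the means are equal is again rejected, now much more strongly (F = 37.66, P-value = 1.18e-07, compared with F = 6.191, P-value = 0.00379 on the original scale).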
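One quick way to see what the log does to the funnel shape is to compare the ratio of the largest to the smallest group standard deviation before and after transforming. This max/min ratio is a rough heuristic chosen here, not a formal test, and `spread_ratio` is a hypothetical helper name.

```r
m1 <- c(110, 157, 194, 178)
m2 <- c(1, 2, 4, 18)
m3 <- c(880, 1256, 5276, 4355)
m4 <- c(495, 7040, 5307, 10050)
m5 <- c(7, 5, 29, 2)
groups <- list(m1 = m1, m2 = m2, m3 = m3, m4 = m4, m5 = m5)

# Hypothetical helper: largest group sd divided by smallest group sd
spread_ratio <- function(g) {
  s <- vapply(g, sd, numeric(1))
  max(s) / min(s)
}

ratio_raw <- spread_ratio(groups)
ratio_log <- spread_ratio(lapply(groups, log))
c(raw = ratio_raw, log = ratio_log)
```

On the original scale the ratio is in the hundreds. After the log it drops to single digits, which is why the transformed ANOVA is more trustworthy.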

Problem 3.29

The hypothesis that we are testing is:

\[ H_0: \mu_1 = \mu_2 = \mu_3 \]

\[ H_1: \text{at least one } \mu_i \text{ is different} \]

w1<-c(31, 10, 21, 4, 1)
w2<-c(62, 40, 24, 30, 35)
w3<-c(53, 27, 120, 97, 68)
dat7<-data.frame(w1,w2,w3)
library(tidyr)
dat8<-pivot_longer(dat7,c(w1,w2,w3))
aov.model<-aov(value~name,data=dat8)
summary(aov.model)

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## name         2   8964    4482   7.914 0.00643 **
## Residuals   12   6796     566                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion:

Since the P-value (0.00643) < α (0.05), we reject the hypothesis that the three treatment means are equal.

pop3<-c(w1,w2,w3)
meanx3<-c(rep(mean(w1),5),rep(mean(w2),5),rep(mean(w3),5))
res3<-pop3-meanx3
qqnorm(res3)
qqline(res3)

plot(meanx3,res3,xlab="treatment mean (fitted value)", ylab="residual",main="variance")

As in the previous problem, the residuals-versus-fitted plot has a funnel shape, indicating non-constant variance. The normal Q-Q plot also suggests a departure from normality, since the points follow a slight “S” shape around the reference line.
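The choice of transformation can be guided by a variance-stabilizing heuristic. Assume \(\sigma_i \propto \mu_i^{\alpha}\) and estimate \(\alpha\) as the slope of \(\log s_i\) on \(\log \bar{y}_i\). The suggested power is then \(1 - \alpha\), where a value near 1/2 points to the square root (typical for count-like data). The sketch below works under that assumption, with names introduced here.

```r
w1 <- c(31, 10, 21, 4, 1)
w2 <- c(62, 40, 24, 30, 35)
w3 <- c(53, 27, 120, 97, 68)
groups <- list(w1 = w1, w2 = w2, w3 = w3)

group_mean <- vapply(groups, mean, numeric(1))
group_sd <- vapply(groups, sd, numeric(1))

# Assumption: sd_i is proportional to mean_i^alpha
slope_fit <- lm(log(group_sd) ~ log(group_mean))
alpha_hat <- unname(coef(slope_fit)[2])
lambda_hat <- 1 - alpha_hat   # suggested power; near 0.5 points to sqrt
c(alpha = alpha_hat, lambda = lambda_hat)
```

With only three groups the estimate is crude. Its value roughly midway between 0 and 1 is consistent with using a square root rather than a log.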

w1_1<-sqrt(w1)
w2_2<-sqrt(w2)
w3_3<-sqrt(w3)
dat9<-data.frame(w1_1,w2_2,w3_3)
library(tidyr)
dat10<-pivot_longer(dat9,c(w1_1,w2_2,w3_3))
aov.model<-aov(value~name,data=dat10)
summary(aov.model)

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## name         2  63.90   31.95    9.84 0.00295 **
## Residuals   12  38.96    3.25                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

After applying a square root transformation, the F statistic increases from 7.914 to 9.84 and the P-value drops from 0.00643 to 0.00295, so the difference between the means is even clearer than before the transformation.

Problem 3.51 & 3.52

f1<-c(17.6,18.9,16.3,17.4,20.1,21.6)
f2<-c(16.9,15.3,18.6,17.1,19.5,20.3)
f3<-c(21.4,23.6,19.4,18.5,20.5,22.3)
f4<-c(19.3,21.1,16.9,17.5,18.3,19.8)
dat<-data.frame(f1,f2,f3,f4)
library(tidyr)
dat2<-pivot_longer(dat,c(f1,f2,f3,f4))

kruskal.test(value~name,data=dat2)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  value by name
## Kruskal-Wallis chi-squared = 6.2177, df = 3, p-value = 0.1015

Since the P-value (0.1015) > 0.05, we do not reject the null hypothesis that the four fluids come from the same distribution. This agrees with the ANOVA conclusion in Problem 3.23.
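The Kruskal-Wallis statistic is built from the rank sums of the pooled data: \(H = \frac{12}{N(N+1)} \sum_i \frac{R_i^2}{n_i} - 3(N+1)\). Because there are ties, \(H\) is divided by \(1 - \sum (t^3 - t)/(N^3 - N)\). A base-R sketch of that calculation, checked against `kruskal.test()`:

```r
f1 <- c(17.6, 18.9, 16.3, 17.4, 20.1, 21.6)
f2 <- c(16.9, 15.3, 18.6, 17.1, 19.5, 20.3)
f3 <- c(21.4, 23.6, 19.4, 18.5, 20.5, 22.3)
f4 <- c(19.3, 21.1, 16.9, 17.5, 18.3, 19.8)

y <- c(f1, f2, f3, f4)
g <- factor(rep(c("f1", "f2", "f3", "f4"), each = 6))
N <- length(y)

r <- rank(y)                               # average ranks for ties (16.9 occurs twice)
rank_sums <- tapply(r, g, sum)
n_i <- tapply(r, g, length)
h <- 12 / (N * (N + 1)) * sum(rank_sums^2 / n_i) - 3 * (N + 1)

ties <- table(y)                           # tie-group sizes t
h_adj <- h / (1 - sum(ties^3 - ties) / (N^3 - N))
p_value <- pchisq(h_adj, df = nlevels(g) - 1, lower.tail = FALSE)

kt <- kruskal.test(y, g)
c(manual = h_adj, kruskal.test = unname(kt$statistic))
```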

Problem 4.3

We test the hypothesis of equal treatment (chemical) means against the alternative that at least one differs:

\[ H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0 \]

\[ H_a: \tau_i \neq 0 \text{ for at least one } i \]

The effects model for this randomized complete block design, with bolts as blocks, is:

\[ y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij} \]

where \(\mu\) is the overall mean, \(\tau_i\) is the effect of the \(i\)th chemical (treatment, \(i = 1, \dots, 4\)), \(\beta_j\) is the effect of the \(j\)th bolt (block, \(j = 1, \dots, 5\)), and \(\epsilon_{ij} \sim NID(0, \sigma^2)\) is the random error.

library(GAD)
chemical<-c(rep(1,5),rep(2,5),rep(3,5),rep(4,5))
bolt <- c(seq(1,5),seq(1,5),seq(1,5),seq(1,5))
obs<- c(73,68,74,71,67,
        73,67,75,72,70,
        75,68,78,73,68,
        73,71,75,75,69)

chemical<-as.fixed(chemical)
bolt <- as.fixed(bolt)
model<-lm(obs~chemical+bolt)
gad(model)

## $anova
## Analysis of Variance Table
## 
## Response: obs
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## chemical   3  12.95   4.317  2.3761    0.1211    
## bolt       4 157.00  39.250 21.6055 2.059e-05 ***
## Residuals 12  21.80   1.817                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

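At \(\alpha\) = 0.05, the null hypothesis of equal chemical means is not rejected (P-value = 0.1211). The bolt (block) effect, on the other hand, is highly significant (P-value = 2.059e-05), which indicates that blocking on bolts was worthwhile.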
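Under the additive block model, the chemical, bolt, and error sums of squares in the `gad()` output can be recovered from row and column means: \(SS_{\text{treat}} = b \sum_i (\bar{y}_{i\cdot} - \bar{y}_{\cdot\cdot})^2\), \(SS_{\text{block}} = a \sum_j (\bar{y}_{\cdot j} - \bar{y}_{\cdot\cdot})^2\), and \(SS_E\) is what remains of the total. The base-R sketch below assumes the observations are arranged as a 4 x 5 matrix with chemicals as rows and bolts as columns, in the same order as `obs`.

```r
obs <- c(73, 68, 74, 71, 67,
         73, 67, 75, 72, 70,
         75, 68, 78, 73, 68,
         73, 71, 75, 75, 69)
y <- matrix(obs, nrow = 4, byrow = TRUE)   # rows: chemicals 1-4, columns: bolts 1-5
a <- nrow(y)
b <- ncol(y)
grand_mean <- mean(y)

ss_treat <- b * sum((rowMeans(y) - grand_mean)^2)   # chemicals
ss_block <- a * sum((colMeans(y) - grand_mean)^2)   # bolts
ss_total <- sum((y - grand_mean)^2)
ss_error <- ss_total - ss_treat - ss_block

df_error <- (a - 1) * (b - 1)
f_treat <- (ss_treat / (a - 1)) / (ss_error / df_error)
p_treat <- pf(f_treat, a - 1, df_error, lower.tail = FALSE)
c(ss_treat = ss_treat, ss_block = ss_block, ss_error = ss_error,
  f_treat = f_treat, p_treat = p_treat)
```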

Problem 4.22

\[ y_{ijk}=\mu+\tau_i+\beta_j+\alpha_k+\epsilon_{ijk} \]

\[ i, j, k = 1, 2, \dots, 5 \]

\(\mu\) = overall mean

\(\tau_i\) = effect of the \(i\)th treatment (ingredient)

\(\beta_j\) = effect of the \(j\)th batch of material

\(\alpha_k\) = effect of the \(k\)th day of production

\(\epsilon_{ijk}\) = random error
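Although each index runs from 1 to 5, only 25 of the 125 possible \((i, j, k)\) combinations are observed in a Latin square. Each ingredient appears exactly once in every batch and exactly once on every day. The sketch below checks this property for the design in this problem by arranging `variable` as a batch-by-day matrix. The layout and the names `design` and `is_latin` are choices made here.

```r
variable <- c("A","B","D","C","E","C","E","A","D","B","B","A","C","E","D",
              "D","C","E","B","A","E","D","B","A","C")

# Rows: batches 1-5, columns: days 1-5 (same order as the batch/day vectors)
design <- matrix(variable, nrow = 5, byrow = TRUE,
                 dimnames = list(paste0("batch", 1:5), paste0("day", 1:5)))
design

# Latin square: every ingredient appears exactly once in each row and each column
has_all <- function(v) setequal(v, LETTERS[1:5]) && length(v) == 5
is_latin <- all(apply(design, 1, has_all)) && all(apply(design, 2, has_all))
is_latin
```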

library(GAD)
batch<-c(rep(1,5),rep(2,5),rep(3,5),rep(4,5),rep(5,5))
day<-c(seq(1,5),seq(1,5),seq(1,5),seq(1,5),seq(1,5))
variable<-c("A","B","D","C","E","C","E","A","D","B","B","A","C","E","D","D","C","E","B","A","E","D","B","A","C")
value<-c(8,7,1,7,3,11,2,7,3,8,4,9,10,1,5,6,8,6,6,10,4,2,3,8,8)
dat<-data.frame(batch,day,variable,value)

dat$batch<-as.fixed(dat$batch)
dat$day<-as.fixed(dat$day)
dat$variable<-as.fixed(dat$variable)
model<-lm(value~batch+day+variable,data = dat)
anova(model)

## Analysis of Variance Table
## 
## Response: value
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## batch      4  15.44   3.860  1.2345 0.3476182    
## day        4  12.24   3.060  0.9787 0.4550143    
## variable   4 141.44  35.360 11.3092 0.0004877 ***
## Residuals 12  37.52   3.127                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Since the P-value for the ingredients (0.0004877) is far below 0.05, we reject the null hypothesis that the five ingredients have equal mean responses at α = 0.05. Neither the batch effect (P-value = 0.348) nor the day effect (P-value = 0.455) is significant.
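Because every ingredient, batch, and day appears exactly \(p = 5\) times, each Latin-square sum of squares can be computed from marginal totals as \(\frac{1}{p}\sum_{\ell} y_{\ell}^2 - y_{\cdot\cdot\cdot}^2/N\), where \(y_{\ell}\) is the total for level \(\ell\). Each factor has \(p - 1\) degrees of freedom and the error has \((p-2)(p-1)\). The base-R sketch below reproduces the `anova(model)` table above. The vectors mirror the original code, and `ss_of` is a hypothetical helper.

```r
batch <- rep(1:5, each = 5)
day <- rep(1:5, times = 5)
variable <- c("A","B","D","C","E","C","E","A","D","B","B","A","C","E","D",
              "D","C","E","B","A","E","D","B","A","C")
value <- c(8,7,1,7,3,11,2,7,3,8,4,9,10,1,5,6,8,6,6,10,4,2,3,8,8)

p <- 5
N <- length(value)
correction <- sum(value)^2 / N
ss_total <- sum(value^2) - correction

# Hypothetical helper: SS for a factor whose levels each appear p times
ss_of <- function(f) sum(tapply(value, f, sum)^2) / p - correction

ss_trt <- ss_of(variable)
ss_batch <- ss_of(batch)
ss_day <- ss_of(day)
ss_error <- ss_total - ss_trt - ss_batch - ss_day

df_error <- (p - 2) * (p - 1)
f_trt <- (ss_trt / (p - 1)) / (ss_error / df_error)
c(ss_trt = ss_trt, ss_batch = ss_batch, ss_day = ss_day,
  ss_error = ss_error, f_trt = f_trt)
```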


Kaiser Hamid

2024-10-11