1 Question 3.7:

The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. A completely randomized experiment was conducted and the following data were collected:

Observations of Tensile Strength

Mixing Technique   Obs. 1   Obs. 2   Obs. 3   Obs. 4
1                  3129     3000     2865     2890
2                  3200     3300     2975     3150
3                  2800     2900     2985     3050
4                  2600     2700     2600     2765

  1. Use the Fisher LSD method with \(\alpha = 0.05\) to make comparisons between pairs of means.

  2. Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?

  3. Plot the residuals versus the predicted tensile strength. Comment on the plot.

  4. Prepare a scatter plot of the results to aid the interpretation of the results of this experiment.

1.1 Solution:

PART C:

Reading the data in wide format (one column per mixing technique):

Mixingtech1<-c(3129,3000,2865,2890)
Mixingtech2<-c(3200,3300,2975,3150)
Mixingtech3<-c(2800,2900,2985,3050)
Mixingtech4<-c(2600,2700,2600,2765)
dat<-data.frame(Mixingtech1,Mixingtech2,Mixingtech3,Mixingtech4)

Now using the pivot_longer() function from tidyr to reshape the data into tidy (long) format:

library(tidyr)
dat<-pivot_longer(dat,c(Mixingtech1,Mixingtech2,Mixingtech3,Mixingtech4))
print(dat)
## # A tibble: 16 × 2
##    name        value
##    <chr>       <dbl>
##  1 Mixingtech1  3129
##  2 Mixingtech2  3200
##  3 Mixingtech3  2800
##  4 Mixingtech4  2600
##  5 Mixingtech1  3000
##  6 Mixingtech2  3300
##  7 Mixingtech3  2900
##  8 Mixingtech4  2700
##  9 Mixingtech1  2865
## 10 Mixingtech2  2975
## 11 Mixingtech3  2985
## 12 Mixingtech4  2600
## 13 Mixingtech1  2890
## 14 Mixingtech2  3150
## 15 Mixingtech3  3050
## 16 Mixingtech4  2765

Now performing a one-way ANOVA:

aov.model<-aov(value~name,data=dat)
summary(aov.model)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## name         3 489740  163247   12.73 0.000489 ***
## Residuals   12 153908   12826                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)

To conduct the LSD test, we first define the hypotheses for each pair of means:

\[H_0:\ \mu_{i} - \mu_{j} = 0\]\[H_a:\ \mu_{i} - \mu_{j} \neq 0\] where i and j (i ≠ j) correspond to mixing techniques.

The least significant difference is calculated using the equation below:

\[ LSD = t_{\alpha /2,\,N-a} \sqrt{\frac{2\,MSE}{n}} \]

where N = 16 is the total number of observations, a = 4 is the number of treatments, and n = 4 is the number of observations per treatment. We can use this form of the equation because our data are balanced (n1=n2=n3=n4).

From the t-table, the critical value is \(t_{0.025,12} = 2.179\); the MSE is taken from the ANOVA results above:

t<-2.179
MSE<-12826
n=4

Using the LSD Equation:

LSD = t*sqrt(2*MSE/n)
print(LSD)
## [1] 174.497
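
Rather than reading the critical value from a table, the same quantity can be computed from the fitted model. The following is a minimal sketch in base R, assuming the same 16 observations; the names `cement`, `fit`, `t_crit` and `lsd` are introduced here for illustration and are not part of the original solution.

```r
# Rebuild the data in long format using base R only
cement <- data.frame(
  technique = factor(rep(1:4, each = 4)),
  strength  = c(3129, 3000, 2865, 2890,
                3200, 3300, 2975, 3150,
                2800, 2900, 2985, 3050,
                2600, 2700, 2600, 2765)
)
fit <- aov(strength ~ technique, data = cement)

alpha  <- 0.05
df_err <- df.residual(fit)            # error degrees of freedom, N - a = 12
mse    <- deviance(fit) / df_err      # residual sum of squares / error df
n      <- 4                           # observations per technique
t_crit <- qt(1 - alpha / 2, df_err)   # two-sided critical value
lsd    <- t_crit * sqrt(2 * mse / n)
print(lsd)
```

Because `qt()` and the unrounded MSE are used, the result (about 174.48) differs slightly from the 174.497 obtained with the rounded table values; the conclusions are unaffected.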

Any pair of treatment averages whose absolute difference exceeds 174.5 is significantly different:

abs(mean(Mixingtech1)-mean(Mixingtech2))
## [1] 185.25
abs(mean(Mixingtech1)-mean(Mixingtech3))
## [1] 37.25
abs(mean(Mixingtech1)-mean(Mixingtech4))
## [1] 304.75
abs(mean(Mixingtech2)-mean(Mixingtech3))
## [1] 222.5
abs(mean(Mixingtech2)-mean(Mixingtech4))
## [1] 490
abs(mean(Mixingtech3)-mean(Mixingtech4))
## [1] 267.5

Conclusion:

---> The only pair of means for which we fail to reject the null hypothesis is \(\mu_{1}\) & \(\mu_{3}\), because the absolute difference in their sample means is less than the LSD, i.e.,

\[|\bar{y}_{1.} - \bar{y}_{3.}| = 37.25 < LSD = 174.5\]
For all other pairs of means, we reject the null hypothesis and conclude that there is a significant difference between the population means, because their differences exceed the LSD value, as shown above.
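
As a cross-check on the manual comparison, pairwise t-tests that use the pooled standard deviation and no multiplicity adjustment are equivalent to comparing each difference with the LSD. This is a sketch under that assumption, using base R's `pairwise.t.test()`; the object names `strength`, `technique` and `lsd_tests` are illustrative.

```r
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(paste0("Mixingtech", 1:4), each = 4))

# Pooled-SD pairwise t-tests, unadjusted p-values (error df = N - a = 12)
lsd_tests <- pairwise.t.test(strength, technique,
                             p.adjust.method = "none", pool.sd = TRUE)
print(lsd_tests$p.value)

# Row/column indices of the pairs whose p-value is below 0.05
print(which(lsd_tests$p.value < 0.05, arr.ind = TRUE))
```

Only the Mixingtech1 vs. Mixingtech3 entry should be above 0.05, matching the conclusion above.
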
PART D:

Since we already fitted the one-way ANOVA in Part C, the model-checking call plot(aov.model) from that part also produced the normal probability (Q-Q) plot of the residuals.

Conclusion:

---> The residuals in the normal probability plot fall close to a straight line, so the normality assumption appears to hold.
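
To draw only the normal probability plot instead of paging through all the diagnostic plots, the residuals can be plotted directly. This is a minimal base R sketch; `fit` and `res` are illustrative names, and the data are re-entered so the block runs on its own.

```r
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(1:4, each = 4))
fit <- aov(strength ~ technique)

res <- residuals(fit)
qqnorm(res, main = "Normal Q-Q Plot of Residuals")
qqline(res)   # reference line through the first and third quartiles
```

Points that follow the reference line support the normality assumption; with only 16 residuals, moderate wiggles around the line are expected.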

PART E:

Conclusion:

---> In the “Residuals vs Fitted” plot, the vertical spread of the residuals is roughly similar across the fitted values, so the assumption of constant variance appears to hold.
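
The plot referred to here can be reproduced on its own, and the visual impression can be backed by the residual standard deviation within each technique. This is a sketch in base R with illustrative names (`fit`, `spread`); it is a supplement, not part of the original solution.

```r
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(1:4, each = 4))
fit <- aov(strength ~ technique)

# Residuals against predicted (fitted) tensile strength
plot(fitted(fit), residuals(fit),
     xlab = "Predicted Tensile Strength", ylab = "Residual",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)

# Spread of the residuals within each mixing technique
spread <- tapply(residuals(fit), technique, sd)
print(spread)
```

Standard deviations of similar magnitude across the four techniques are consistent with the constant-variance assumption.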

PART F:

Strength <- c(3129, 3000, 2865, 2890, 3200, 3300, 2975, 3150, 2800, 2900, 2985, 3050, 2600, 2700, 2600, 2765)
Type <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)
library(car)
library(carData)
# Strength and Type are not columns of the tidy dat, so collect them in their own data frame
scatter.dat <- data.frame(Type, Strength)
scatterplot(Strength ~ Type, data=scatter.dat,
            xlab="Mixing Technique", ylab="Tensile Strength",
            main="Scatter Plot")

Conclusion:

---> Mixing techniques 1 and 3 appear to have similar mean strength, which agrees with Part C, where this was the only pair that did not differ significantly. Graphically, the remaining means appear to differ from one another.

---> The plot above also shows the sample average for each treatment and the 95 percent confidence interval on the treatment mean.

---> The “Residuals vs Factor Levels” plot produced by plot(aov.model) shows the same grouping of points and leads to similar conclusions.
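
If the car package is not available, a comparable display can be drawn in base R. This sketch uses `stripchart()` to plot the individual observations by technique and overlays the treatment averages; the names `strength`, `technique` and `means` are illustrative.

```r
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(1:4, each = 4))

# Individual observations for each mixing technique
stripchart(strength ~ technique, vertical = TRUE, pch = 1,
           xlab = "Mixing Technique", ylab = "Tensile Strength",
           main = "Tensile Strength by Mixing Technique")

# Overlay the treatment averages as filled points
means <- tapply(strength, technique, mean)
points(seq_along(means), means, pch = 19)
print(means)
```

The filled points make it easy to see that the averages for techniques 1 and 3 are close together while technique 4 sits lowest.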

2 Question 3.10:

A product developer is investigating the tensile strength of a new synthetic fiber that will be used to make cloth for men’s shirts. Strength is usually affected by the percentage of cotton used in the blend of materials for the fiber. The engineer conducts a completely randomized experiment with five levels of cotton content and replicates the experiment five times. The data are shown in the following table.

Observations of Tensile Strength of Cloth Fibre

Cotton Weight %   Obs. 1   Obs. 2   Obs. 3   Obs. 4   Obs. 5
15                7        7        15       11       9
20                12       17       12       18       18
25                14       19       19       18       18
30                19       25       22       19       23
35                7        10       11       15       11

  1. Use the Fisher LSD method to make comparisons between the pairs of means. What conclusions can you draw?

  2. Analyze the residuals from this experiment and comment on model adequacy.

2.1 Solution:

PART B:

Reading the data in wide format (one column per cotton weight percentage):

CW15<-c(7,7,15,11,9)
CW20<-c(12,17,12,18,18)
CW25<-c(14,19,19,18,18)
CW30<-c(19,25,22,19,23)
CW35<-c(7,10,11,15,11)
dat<-data.frame(CW15,CW20,CW25,CW30,CW35)

Now using the pivot_longer() function from tidyr to reshape the data into tidy (long) format:

library(tidyr)
dat<-pivot_longer(dat,c(CW15,CW20,CW25,CW30,CW35))
print(dat)
## # A tibble: 25 × 2
##    name  value
##    <chr> <dbl>
##  1 CW15      7
##  2 CW20     12
##  3 CW25     14
##  4 CW30     19
##  5 CW35      7
##  6 CW15      7
##  7 CW20     17
##  8 CW25     19
##  9 CW30     25
## 10 CW35     10
## # … with 15 more rows

Now performing a one-way ANOVA:

aov.model<-aov(value~name,data=dat)
summary(aov.model)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## name         4  475.8  118.94   14.76 9.13e-06 ***
## Residuals   20  161.2    8.06                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)

To conduct the LSD test, we first define the hypotheses for each pair of means:

\[H_0:\ \mu_{i} - \mu_{j} = 0\]\[H_a:\ \mu_{i} - \mu_{j} \neq 0\] where i and j (i ≠ j) correspond to cotton weight percentages.

The least significant difference is calculated using the equation below:

\[ LSD = t_{\alpha /2,\,N-a} \sqrt{\frac{2\,MSE}{n}} \]

where N = 25 is the total number of observations, a = 5 is the number of treatments, and n = 5 is the number of observations per treatment. We can use this form of the equation because our data are balanced (n1=n2=n3=n4=n5).

From the t-table, the critical value is \(t_{0.025,20} = 2.086\); the MSE is taken from the ANOVA results above:

t<-2.086
MSE<-8.06
n=5

Using the LSD Equation:

LSD = t*sqrt(2*MSE/n)
print(LSD)
## [1] 3.745517

Any pair of treatment averages whose absolute difference exceeds 3.745 is significantly different:

abs(mean(CW15)-mean(CW20))
## [1] 5.6
abs(mean(CW15)-mean(CW25))
## [1] 7.8
abs(mean(CW15)-mean(CW30))
## [1] 11.8
abs(mean(CW15)-mean(CW35))
## [1] 1
abs(mean(CW20)-mean(CW25))
## [1] 2.2
abs(mean(CW20)-mean(CW30))
## [1] 6.2
abs(mean(CW20)-mean(CW35))
## [1] 4.6
abs(mean(CW25)-mean(CW30))
## [1] 4
abs(mean(CW25)-mean(CW35))
## [1] 6.8
abs(mean(CW30)-mean(CW35))
## [1] 10.8

Conclusion:

---> The only pairs of means for which we fail to reject the null hypothesis are \(\mu_{1}\) & \(\mu_{5}\) (15% and 35% cotton) and \(\mu_{2}\) & \(\mu_{3}\) (20% and 25% cotton), because the absolute differences in their sample means are less than the LSD = 3.745, i.e.,

\[|\bar{y}_{1.} - \bar{y}_{5.}| = 1 < LSD\]
\[|\bar{y}_{2.} - \bar{y}_{3.}| = 2.2 < LSD\]
For all other pairs of means, we reject the null hypothesis and conclude that there is a significant difference between the population means, because their differences exceed the LSD value, as shown above.
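
The ten comparisons can also be tabulated in one pass instead of being typed out one by one. The following base R sketch assumes the LSD value of 3.745517 computed earlier; the names `cotton`, `pairs`, `diffs` and `result` are illustrative.

```r
cotton <- list(CW15 = c(7, 7, 15, 11, 9),
               CW20 = c(12, 17, 12, 18, 18),
               CW25 = c(14, 19, 19, 18, 18),
               CW30 = c(19, 25, 22, 19, 23),
               CW35 = c(7, 10, 11, 15, 11))
means <- sapply(cotton, mean)
lsd   <- 3.745517                      # LSD value from the calculation above

# Every unordered pair of treatments and its absolute mean difference
pairs <- combn(names(means), 2)
diffs <- abs(means[pairs[1, ]] - means[pairs[2, ]])

result <- data.frame(pair        = paste(pairs[1, ], pairs[2, ], sep = " - "),
                     abs_diff    = unname(diffs),
                     significant = unname(diffs) > lsd)
print(result)
```

The two rows flagged as not significant should be CW15 - CW35 and CW20 - CW25, in agreement with the conclusion above.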

PART C:

Since we already fitted the one-way ANOVA in Part B, the model-checking call plot(aov.model) from that part also produced the residual plots:

Conclusion:

---> The residuals in the normal probability plot fall close to a straight line, so the normality assumption appears to hold.

---> In the residuals versus fitted values plot, the points fall in a roughly rectangular band, which supports the assumption of constant variance.
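
The two plots discussed here can be requested explicitly and shown side by side through the `which` argument of the plot method for fitted models. This is a base R sketch with illustrative names (`strength`, `cotton_pct`, `fit`); the data are re-entered so that it runs on its own.

```r
strength <- c(7, 7, 15, 11, 9,
              12, 17, 12, 18, 18,
              14, 19, 19, 18, 18,
              19, 25, 22, 19, 23,
              7, 10, 11, 15, 11)
cotton_pct <- factor(rep(c(15, 20, 25, 30, 35), each = 5))
fit <- aov(strength ~ cotton_pct)

op <- par(mfrow = c(1, 2))   # two panels side by side
plot(fit, which = 1)         # residuals vs fitted values
plot(fit, which = 2)         # normal Q-Q plot of the residuals
par(op)                      # restore the previous layout

mse <- deviance(fit) / df.residual(fit)
print(mse)                   # should match the Mean Sq of the Residuals row
```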

3 Question 3.44:

Suppose that four normal populations have means of \(\mu_{1}=50\), \(\mu_{2}=60\), \(\mu_{3}=50\), and \(\mu_{4}=60\). How many observations should be taken from each population so that the probability of rejecting the null hypothesis of equal population means is at least 0.90? Assume that \(\alpha = 0.05\) and that a reasonable estimate of the error variance is \(\sigma^2=25\).

3.1 Solution:

DATA:

Groups=4
Within Variance=25
\(\alpha=0.05\)
Power=90%

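Half of the means sit at the minimum (50) and half at the maximum (60), which is the maximum-variability pattern. Since the error variance is given, we can use power.anova.test directly, passing the variance of the four population means as between.var: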

power.anova.test(groups = 4, n=NULL, between.var = var(c(50,50,60,60)), within.var = 25, sig.level = 0.05, power = 0.90)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##          groups = 4
##               n = 4.658128
##     between.var = 33.33333
##      within.var = 25
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group
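
Since n must be a whole number, it can help to confirm the power actually achieved at the neighbouring integers. The sketch below reuses `power.anova.test()` with `n` supplied and `power` left unspecified; the helper `power_at()` is a hypothetical name introduced for illustration.

```r
group_means <- c(50, 50, 60, 60)

# Power of the one-way ANOVA F-test for a given per-group sample size
power_at <- function(n) {
  power.anova.test(groups = 4, n = n,
                   between.var = var(group_means), within.var = 25,
                   sig.level = 0.05)$power
}

p4 <- power_at(4)
p5 <- power_at(5)
print(c(n4 = p4, n5 = p5))
```

With 4 observations per group the power falls short of 0.90, while 5 observations per group meet the target, which is why the fractional n is rounded up.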

Alternatively, we can use Cohen’s effect size f:

Since the number of groups is even (k = 4) and the means follow the maximum-variability pattern, the effect size is:

\(f=\frac{d}{2}\)
where, \(d=\frac{\mu_{max}-\mu_{min}}{\sigma}\)
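
The shortcut f = d/2 can be checked against the general definition of the effect size f, the standard deviation of the population means (with divisor k) divided by the error standard deviation. This is a small base R sketch; `mu`, `f_direct` and `f_shortcut` are illustrative names.

```r
mu    <- c(50, 60, 50, 60)   # population means from the problem statement
sigma <- 5                   # error standard deviation, sqrt(25)

# SD of the means around their average, using divisor k rather than k - 1
f_direct <- sqrt(mean((mu - mean(mu))^2)) / sigma

# Shortcut for an even number of groups split between the two extremes
d <- (max(mu) - min(mu)) / sigma
f_shortcut <- d / 2

print(c(f_direct = f_direct, f_shortcut = f_shortcut))
```

Both routes give f = 1 for this problem.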

sigma=5
d=(60-50)/sigma
print(d)
## [1] 2
f=d/2
print(f)
## [1] 1
library(pwr)
pwr.anova.test(k=4,n=NULL,f=1, sig.level = 0.05, power = 0.90)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 4.658119
##               f = 1
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

Conclusion:

---> Both methods give n ≈ 4.66, which rounds up to 5 observations from each population.

4 Question 3.45:

Refer to Problem 3.44.

  1. How would your answer change if a reasonable estimate of the experimental error variance were 36?

  2. How would your answer change if a reasonable estimate of the experimental error variance were 49?

  3. Can you draw any conclusions about the sensitivity of your answer in this particular situation about how your estimate of \(\sigma^2\) affects the decision about sample size?

  4. Can you make any recommendations about how we should use this general approach to choosing n in practice?

4.1 Solution:

PART A:

Variance=36
sigma=6
d=(60-50)/sigma
print(d)
## [1] 1.666667
f=d/2
print(f)
## [1] 0.8333333
pwr.anova.test(k=4,n=NULL,f=0.8333333, sig.level = 0.05, power = 0.90)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 6.180858
##               f = 0.8333333
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

Conclusion:

---> Rounding n = 6.18 up to the next whole number, 7 observations should be taken from each population.

PART B:

Variance=49
sigma=7
d=(60-50)/sigma
print(d)
## [1] 1.428571
f=d/2
print(f)
## [1] 0.7142857
pwr.anova.test(k=4,n=NULL,f=0.7142857, sig.level = 0.05, power = 0.90)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 7.998751
##               f = 0.7142857
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

Conclusion:

---> Rounding n = 7.9988 up to the next whole number, 8 observations should be taken from each population.

PART C:

Answer: As the error variance increases, the number of observations needed from each population also increases to keep the same power: n rises from 5 to 7 to 8 as the variance goes from 25 to 36 to 49.
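
The sensitivity can be seen in a single run by sweeping the error variance. This sketch uses base R's `power.anova.test()` (so the pwr package is not required) and rounds each solution up; `error_vars` and `n_required` are illustrative names.

```r
group_means <- c(50, 50, 60, 60)
error_vars  <- c(25, 36, 49)

# Required observations per group for each candidate error variance
n_required <- sapply(error_vars, function(v) {
  n <- power.anova.test(groups = 4, between.var = var(group_means),
                        within.var = v, sig.level = 0.05, power = 0.90)$n
  ceiling(n)   # round up to a whole number of observations
})

print(data.frame(error_variance = error_vars, n_per_group = n_required))
```

Extending `error_vars` to a wider grid gives a quick picture of how much the sample size depends on the variance estimate.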

PART D:

Answer: The error variance is only an estimate when the experiment is being planned, so it is better to determine plausible lower and upper limits for the variance, compute the required n across that range, and then choose the number of observations based on those results.

5 Source Code:

getwd()

#QUESTION 3.7 PART C:

#Reading the Data:
Mixingtech1<-c(3129,3000,2865,2890)
Mixingtech2<-c(3200,3300,2975,3150)
Mixingtech3<-c(2800,2900,2985,3050)
Mixingtech4<-c(2600,2700,2600,2765)
dat<-data.frame(Mixingtech1,Mixingtech2,Mixingtech3,Mixingtech4)
#Up until now dat is NOT tidy
library(tidyr)
dat<-pivot_longer(dat,c(Mixingtech1,Mixingtech2,Mixingtech3,Mixingtech4))
print(dat)
aov.model<-aov(value~name,data=dat)
summary(aov.model)
plot(aov.model)

#To conduct the LSD test, we calculate the least significant difference. We can use the balanced-data equation because n1=n2=n3=n4
#From the t-table, t-statistic for alpha/2=0.025 and 12 error degrees of freedom:
t<-2.179
MSE<-12826
n=4
LSD = t*sqrt(2*MSE/n)
print(LSD)
#Any pair of treatment averages whose absolute difference exceeds 174.5 differs significantly
abs(mean(Mixingtech1)-mean(Mixingtech2))
abs(mean(Mixingtech1)-mean(Mixingtech3))
abs(mean(Mixingtech1)-mean(Mixingtech4))
abs(mean(Mixingtech2)-mean(Mixingtech3))
abs(mean(Mixingtech2)-mean(Mixingtech4))
abs(mean(Mixingtech3)-mean(Mixingtech4))


#QUESTION 3.7 PART D:
#Already Done

#QUESTION 3.7 PART E:
#Already Done

#QUESTION 3.7 PART F:
Strength <- c(3129, 3000, 2865, 2890, 3200, 3300, 2975, 3150, 2800, 2900, 2985, 3050, 2600, 2700, 2600, 2765)
Type <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)
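if (!requireNamespace("car", quietly = TRUE)) install.packages("car")  #install car only if it is missing
library(car)
library(carData)
#Strength and Type are not columns of the tidy dat, so collect them in their own data frame
scatter.dat <- data.frame(Type, Strength)
scatterplot(Strength ~ Type, data=scatter.dat,
            xlab="Type", ylab="Strength",
            main="Scatter Plot")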

#QUESTION 3.10 PART B:
CW15<-c(7,7,15,11,9)
CW20<-c(12,17,12,18,18)
CW25<-c(14,19,19,18,18)
CW30<-c(19,25,22,19,23)
CW35<-c(7,10,11,15,11)
dat<-data.frame(CW15,CW20,CW25,CW30,CW35)
library(tidyr)
dat<-pivot_longer(dat,c(CW15,CW20,CW25,CW30,CW35))
print(dat)
aov.model<-aov(value~name,data=dat)
summary(aov.model)
plot(aov.model)

t<-2.086
MSE<-8.06
n=5
LSD = t*sqrt(2*MSE/n)
print(LSD)

abs(mean(CW15)-mean(CW20))
abs(mean(CW15)-mean(CW25))
abs(mean(CW15)-mean(CW30))
abs(mean(CW15)-mean(CW35))
abs(mean(CW20)-mean(CW25))
abs(mean(CW20)-mean(CW30))
abs(mean(CW20)-mean(CW35))
abs(mean(CW25)-mean(CW30))
abs(mean(CW25)-mean(CW35))
abs(mean(CW30)-mean(CW35))

#QUESTION 3.10 PART C:
#Already Done

#QUESTION 3.44 :
library(pwr)
power.anova.test(groups = 4, n=NULL, between.var = var(c(50,50,60,60)), within.var = 25, sig.level = 0.05, power = 0.90)
#OR
sigma=5
d=(60-50)/sigma
print(d)
f=d/2
print(f)
pwr.anova.test(k=4,n=NULL,f=1, sig.level = 0.05, power = 0.90)

#QUESTION 3.45 :
#Part A:
Variance=36
sigma=6
d=(60-50)/sigma
print(d)
f=d/2
print(f)
pwr.anova.test(k=4,n=NULL,f=0.8333333, sig.level = 0.05, power = 0.90)

#Part B:
Variance=49
sigma=7
d=(60-50)/sigma
print(d)
f=d/2
print(f)
pwr.anova.test(k=4,n=NULL,f=0.7142857, sig.level = 0.05, power = 0.90)