Setup

Load Libraries Into Session

# setup Libraries
library(dplyr)
library(knitr)
library(agricolae)
library(lawstat)
library(BSDA)
library(kableExtra)
library(tidyr)
library(pwr)
library(car)
library(ggplot2)

Problem 3.7 (c,d,e,f)

The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. A completely randomized experiment was conducted and the following data were recorded:

MixMethod 1 2 3 4
1 3129 3000 2865 2890
2 3200 3300 2975 3150
3 2800 2900 2985 3050
4 2600 2700 2600 2765

Part (c)

Use the Fisher LSD method with \(\alpha=0.05\) to make comparisons between pairs of means.

Setting up Data Frame for Part (c)

MixTeq1 <- c(3129,3000,2865,2890)
MixTeq2 <- c(3200,3300,2975,3150)
MixTeq3 <- c(2800,2900,2985,3050)
MixTeq4 <- c(2600,2700,2600,2765)

CementTestTable2 <- data.frame(MixTeq1,MixTeq2,MixTeq3,MixTeq4)
CementTestTableLong <- pivot_longer(CementTestTable2,c(MixTeq1,MixTeq2,MixTeq3,MixTeq4))
Tidied Cement Test Table
name value
MixTeq1 3129
MixTeq2 3200
MixTeq3 2800
MixTeq4 2600
MixTeq1 3000
MixTeq2 3300
MixTeq3 2900
MixTeq4 2700
MixTeq1 2865
MixTeq2 2975
MixTeq3 2985
MixTeq4 2600
MixTeq1 2890
MixTeq2 3150
MixTeq3 3050
MixTeq4 2765

Creating AOV Model and Running LSD Test

CementTestaovModel<-aov(value~name,data=CementTestTableLong)
summary(CementTestaovModel)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## name         3 489740  163247   12.73 0.000489 ***
## Residuals   12 153908   12826                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R calculated \(MSE=12826\), and from Appendix II in the book, \(t=2.179\) at \(\alpha/2=0.025\) and \(df=12\).

tCement <- 2.179
CementMSE <- 12826
nCement <- 4
CementLSD <- tCement*sqrt(2*CementMSE/nCement)

The LSD value is calculated as 174.4969539.
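As a cross-check on the table lookup, the critical value and the MSE can also be taken directly from R. The following is a minimal sketch in base R, assuming the same cement data as above; the names `cement`, `fit`, `dfError`, `mse`, `tCrit`, and `lsd` are illustrative and not part of the original analysis. Because `qt()` and the unrounded MSE carry more digits than the table value 2.179 and the rounded 12826, the result differs slightly from the hand calculation, at roughly 174.48.

```r
# Same cement data as above, in long format (base R only).
cement <- data.frame(
  name  = factor(rep(c("MixTeq1", "MixTeq2", "MixTeq3", "MixTeq4"), each = 4)),
  value = c(3129, 3000, 2865, 2890,
            3200, 3300, 2975, 3150,
            2800, 2900, 2985, 3050,
            2600, 2700, 2600, 2765)
)
fit <- aov(value ~ name, data = cement)

dfError <- df.residual(fit)            # error degrees of freedom (12)
mse     <- deviance(fit) / dfError     # residual sum of squares / error df
tCrit   <- qt(1 - 0.05 / 2, dfError)   # two-sided critical t at alpha = 0.05
n       <- 4                           # observations per technique

lsd <- tCrit * sqrt(2 * mse / n)
print(lsd)
```

The conclusions drawn from the pairwise comparisons do not change with this slightly smaller LSD.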

Now we will compare the absolute difference in means for each pair of Mixing Techniques to the LSD.

MixTeq1to2 <- abs(mean(MixTeq1)-mean(MixTeq2))
MixTeq1to3 <- abs(mean(MixTeq1)-mean(MixTeq3))
MixTeq1to4 <- abs(mean(MixTeq1)-mean(MixTeq4))
MixTeq2to3 <- abs(mean(MixTeq2)-mean(MixTeq3))
MixTeq2to4 <- abs(mean(MixTeq2)-mean(MixTeq4))
MixTeq3to4 <- abs(mean(MixTeq3)-mean(MixTeq4))

Mean difference between pairs of Mixing Techniques

Technique 1 to Technique 2: 185.25

Technique 1 to Technique 3: 37.25

Technique 1 to Technique 4: 304.75

Technique 2 to Technique 3: 222.5

Technique 2 to Technique 4: 490

Technique 3 to Technique 4: 267.5

When compared to the LSD value of 174.4969539, the only pair of Mixing Techniques that does not differ significantly in mean is Mixing Technique 1 and Mixing Technique 3, because its difference in means of 37.25 is less than the LSD. Every other pair has a difference in means greater than the LSD and therefore differs significantly.
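The manual comparison can be cross-checked with a built-in test. Unadjusted pairwise t tests that use the pooled standard deviation are equivalent to Fisher's LSD, so base R's `pairwise.t.test()` should flag the same pairs. This is a minimal sketch under that assumption; the names `cement` and `lsdTests` are illustrative. (The agricolae package loaded in the setup also provides an `LSD.test()` function, which is not used here.)

```r
# Same cement data as above, in long format (base R only).
cement <- data.frame(
  name  = factor(rep(c("MixTeq1", "MixTeq2", "MixTeq3", "MixTeq4"), each = 4)),
  value = c(3129, 3000, 2865, 2890,
            3200, 3300, 2975, 3150,
            2800, 2900, 2985, 3050,
            2600, 2700, 2600, 2765)
)

# No p-value adjustment + pooled SD = Fisher's LSD comparisons.
lsdTests <- pairwise.t.test(cement$value, cement$name,
                            p.adjust.method = "none", pool.sd = TRUE)

print(lsdTests$p.value)          # lower-triangular matrix of p-values
print(lsdTests$p.value < 0.05)   # TRUE where a pair differs at alpha = 0.05
```

Only the Technique 1 versus Technique 3 entry should come out above 0.05, in agreement with the LSD comparison.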

Part (d)

Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?

Answer: Based on the normal probability plot, the residuals fall along a fairly straight line, so there is no strong evidence against the normality assumption.
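A normal probability plot of the residuals could be produced along the following lines. This is a minimal base R sketch, assuming the cement data from Part (c); the names `cement`, `fit`, and `res` are illustrative.

```r
# Same cement data as in Part (c), in long format.
cement <- data.frame(
  name  = factor(rep(c("MixTeq1", "MixTeq2", "MixTeq3", "MixTeq4"), each = 4)),
  value = c(3129, 3000, 2865, 2890,
            3200, 3300, 2975, 3150,
            2800, 2900, 2985, 3050,
            2600, 2700, 2600, 2765)
)
fit <- aov(value ~ name, data = cement)
res <- residuals(fit)

# Normal Q-Q plot of the residuals with a reference line.
qqnorm(res, main = "Normal Probability Plot of Residuals")
qqline(res)
```

Points that stay close to the reference line support the normality assumption.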

Part (e)

Plot the residuals versus the predicted tensile strength. Comment on the plot.

The plot shows that the residuals have a fairly constant spread across the predicted values, so the constant variance assumption appears to be met.
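A residuals-versus-predicted plot could be drawn as follows. This is a minimal base R sketch, assuming the cement data from Part (c); the names `cement` and `fit` are illustrative. In a one-way model the predicted values are the four technique means, so the points fall in four vertical strips.

```r
# Same cement data as in Part (c), in long format.
cement <- data.frame(
  name  = factor(rep(c("MixTeq1", "MixTeq2", "MixTeq3", "MixTeq4"), each = 4)),
  value = c(3129, 3000, 2865, 2890,
            3200, 3300, 2975, 3150,
            2800, 2900, 2985, 3050,
            2600, 2700, 2600, 2765)
)
fit <- aov(value ~ name, data = cement)

# Residuals against predicted (fitted) tensile strength.
plot(fitted(fit), residuals(fit),
     xlab = "Predicted tensile strength", ylab = "Residual",
     main = "Residuals vs Predicted")
abline(h = 0, lty = 2)   # reference line at zero
```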

Part (f)

Prepare a scatter plot of the results to aid the interpretation of this experiment.

For some reason, R is not displaying the fourth plot in the AOV diagnostic plot series, which I believe is what the question is asking for.
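Since the question asks for a scatter plot of the results themselves, one option that does not depend on the AOV plot series is to plot tensile strength against mixing technique directly. This is a minimal base R sketch, assuming the cement data from Part (c); the names `cement` and `groupMeans` and the axis labels are illustrative.

```r
# Same cement data as in Part (c), in long format.
cement <- data.frame(
  name  = factor(rep(c("MixTeq1", "MixTeq2", "MixTeq3", "MixTeq4"), each = 4)),
  value = c(3129, 3000, 2865, 2890,
            3200, 3300, 2975, 3150,
            2800, 2900, 2985, 3050,
            2600, 2700, 2600, 2765)
)

# One vertical strip of observations per mixing technique.
stripchart(value ~ name, data = cement, vertical = TRUE, pch = 16,
           xlab = "Mixing technique", ylab = "Tensile strength")

# Overlay the technique means as crosses.
groupMeans <- tapply(cement$value, cement$name, mean)
points(seq_along(groupMeans), groupMeans, pch = 4, cex = 1.5)
```

A plot like this makes the ordering of the technique means visible alongside the spread within each technique.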

Problem 3.10 (b,c)

A product developer is investigating the tensile strength of a new synthetic fiber that will be used to make cloth for men’s shirts. Strength is usually affected by the percentage of cotton used in the blend of materials for the fiber. The engineer conducts a completely randomized experiment with five levels of cotton and replicates the experiment five times. The data are shown in the following table.

PercentCotton 1 2 3 4 5
15 7 7 15 11 9
20 12 17 12 18 18
25 14 19 19 18 18
30 19 25 22 19 23
35 7 10 11 15 11

Part (b)

Use the Fisher LSD method to make comparisons between the pairs of means. What conclusions can you draw?

Setting up Data Frame for Part (b)

CottonWt15 <- c(7,7,15,11,9)
CottonWt20 <- c(12,17,12,18,18)
CottonWt25 <- c(14,19,19,18,18)
CottonWt30 <- c(19,25,22,19,23)
CottonWt35 <- c(7,10,11,15,11)

CottonTableNew <- data.frame(CottonWt15,CottonWt20,CottonWt25,CottonWt30,CottonWt35)
CottonTableLong <- pivot_longer(CottonTableNew,c(CottonWt15,CottonWt20,CottonWt25,CottonWt30,CottonWt35))
Tidied Cotton Table
name value
CottonWt15 7
CottonWt20 12
CottonWt25 14
CottonWt30 19
CottonWt35 7
CottonWt15 7
CottonWt20 17
CottonWt25 19
CottonWt30 25
CottonWt35 10
CottonWt15 15
CottonWt20 12
CottonWt25 19
CottonWt30 22
CottonWt35 11
CottonWt15 11
CottonWt20 18
CottonWt25 18
CottonWt30 19
CottonWt35 15
CottonWt15 9
CottonWt20 18
CottonWt25 18
CottonWt30 23
CottonWt35 11

Creating AOV Model and Running LSD Test

CottonTestaovModel<-aov(value~name,data=CottonTableLong)
summary(CottonTestaovModel)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## name         4  475.8  118.94   14.76 9.13e-06 ***
## Residuals   20  161.2    8.06                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

R calculated \(MSE=8.06\), and from Appendix II in the book, \(t=2.086\) at \(\alpha/2=0.025\) and \(df=20\).

tCotton <- 2.086
CottonMSE <- 8.06
nCotton <- 5
CottonLSD <- tCotton*sqrt(2*CottonMSE/nCotton)

The LSD value is calculated as 3.7455174.

Now we will compare the absolute difference in means for each pair of Cotton Weights to the LSD.

CottonWt15to20 <- abs(mean(CottonWt15)-mean(CottonWt20))
CottonWt15to25 <- abs(mean(CottonWt15)-mean(CottonWt25))
CottonWt15to30 <- abs(mean(CottonWt15)-mean(CottonWt30))
CottonWt15to35 <- abs(mean(CottonWt15)-mean(CottonWt35))
CottonWt20to25 <- abs(mean(CottonWt20)-mean(CottonWt25))
CottonWt20to30 <- abs(mean(CottonWt20)-mean(CottonWt30))
CottonWt20to35 <- abs(mean(CottonWt20)-mean(CottonWt35))
CottonWt25to30 <- abs(mean(CottonWt25)-mean(CottonWt30))
CottonWt25to35 <- abs(mean(CottonWt25)-mean(CottonWt35))
CottonWt30to35 <- abs(mean(CottonWt30)-mean(CottonWt35))

Mean difference between pairs of Cotton Weights

Cotton Wt 15 to Cotton Wt 20: 5.6

Cotton Wt 15 to Cotton Wt 25: 7.8

Cotton Wt 15 to Cotton Wt 30: 11.8

Cotton Wt 15 to Cotton Wt 35: 1

Cotton Wt 20 to Cotton Wt 25: 2.2

Cotton Wt 20 to Cotton Wt 30: 6.2

Cotton Wt 20 to Cotton Wt 35: 4.6

Cotton Wt 25 to Cotton Wt 30: 4

Cotton Wt 25 to Cotton Wt 35: 6.8

Cotton Wt 30 to Cotton Wt 35: 10.8

When compared to the LSD value of 3.7455174, the 15% Cotton Weight to 35% Cotton Weight mean difference of 1 and the 20% Cotton Weight to 25% Cotton Weight mean difference of 2.2 are both less than the LSD. This means that those two pairs do not differ significantly in mean, while the other eight pairs do.
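The ten pairwise comparisons can also be done in one step rather than one line per pair. This is a minimal base R sketch, assuming the MSE of 8.06 and 20 error degrees of freedom from the ANOVA table above; the names `cotton`, `means`, `diffs`, and `significant` are illustrative. It uses `qt()` for the critical value, so the LSD is marginally different from the table-based 3.7455174.

```r
# Cotton data, one vector of five replicates per cotton percentage.
cotton <- list(
  CottonWt15 = c(7, 7, 15, 11, 9),
  CottonWt20 = c(12, 17, 12, 18, 18),
  CottonWt25 = c(14, 19, 19, 18, 18),
  CottonWt30 = c(19, 25, 22, 19, 23),
  CottonWt35 = c(7, 10, 11, 15, 11)
)
means <- sapply(cotton, mean)

mse <- 8.06   # mean square error from the ANOVA table above
n   <- 5      # replicates per cotton percentage
lsd <- qt(0.975, df = 20) * sqrt(2 * mse / n)

# Matrix of absolute differences between every pair of means.
diffs <- abs(outer(means, means, "-"))
significant <- diffs > lsd   # TRUE where a pair differs at alpha = 0.05

print(round(diffs, 2))
print(significant)
```

The matrix is symmetric, so each pair appears twice; reading only the upper triangle gives the ten comparisons listed above.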

Part (c)

Analyze the residuals from this experiment and comment on model adequacy.

Answer: This is a judgment call. The Residuals vs Fitted plot shows fairly constant variance across the fitted values. The normal probability plot shows the residuals in a fairly straight line, with a slight tail at the low end. Taken together, I will say that the model is adequate: the two plots suggest that the residuals are approximately normal with constant variance.

Problem 3.44

Suppose that four normal populations have means of \(\mu_1=50\), \(\mu_2=60\), \(\mu_3=50\), and \(\mu_4=60\). How many observations should be taken from each population so that the probability of rejecting the null hypothesis of equal population means is at least 0.90? Assume that \(\alpha=0.05\) and that a reasonable estimate of the error variance is \(\sigma^2=25\).

Hypotheses

\(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4\)

\(H_a\): at least one \(\mu_i\) differs from the others.

Running Power Test

mu1 <- 50
mu2 <- 60
mu3 <- 50
mu4 <- 60
k3.44 <- 4
var3.44 <- 25
alpha3.44 <- 0.05
power3.44 <- 0.9
diffmeans3.44=abs((mu1-mu2)+(mu3-mu4))/k3.44
pwr.anova.test(k=k3.44,n=NULL,f=sqrt((diffmeans3.44)^2/var3.44),sig.level=alpha3.44,power=power3.44)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 4.658119
##               f = 1
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

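R calculates \(n=4.66\), which rounds up to 5 observations per population. The effect size \(f=1\) used here agrees with Cohen's \(f=\sqrt{\sum_i(\mu_i-\bar{\mu})^2/k}\,/\sigma=\sqrt{100/4}/5=1\), where \(\bar{\mu}=55\) is the grand mean. However, the expression used for diffmeans3.44 gives this value for these particular means but is not a general formula for the effect size, so it should not be reused as the input to pwr.anova.test for other sets of means.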
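The effect size can be computed in a way that works for any set of means, and the sample size can be cross-checked without the pwr package. This is a minimal sketch, assuming Cohen's definition of \(f\) as the standard deviation of the population means about their grand mean divided by \(\sigma\); the names `mu`, `sigma`, `sigmaMeans`, `f`, and `check` are illustrative. The cross-check uses `power.anova.test()` from base R's stats package, which is a different function from `pwr.anova.test()` and takes the variance of the means rather than \(f\).

```r
mu    <- c(50, 60, 50, 60)   # hypothesized population means
sigma <- sqrt(25)            # error standard deviation

# Cohen's f: SD of the means about the grand mean (divisor k), over sigma.
sigmaMeans <- sqrt(mean((mu - mean(mu))^2))
f <- sigmaMeans / sigma
print(f)

# Base R cross-check. power.anova.test() expects the sample variance of the
# means (divisor k - 1), which is what var() returns.
check <- power.anova.test(groups = length(mu), between.var = var(mu),
                          within.var = sigma^2, sig.level = 0.05,
                          power = 0.90)
print(check$n)            # required n per group, before rounding
print(ceiling(check$n))   # rounded up to whole observations
```

Both routes use the same noncentrality parameter, so the required \(n\) should agree with the pwr.anova.test output above up to numerical tolerance.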

Problem 3.45

Refer to Problem 3.44.

Part (a)

How would your answer change if a reasonable estimate of the experimental error variance were \(\sigma^2=36\)?

mu1 <- 50
mu2 <- 60
mu3 <- 50
mu4 <- 60
k3.44 <- 4
var3.44 <- 36
alpha3.44 <- 0.05
power3.44 <- 0.9
diffmeans3.44=abs((mu1-mu2)+(mu3-mu4))/k3.44
pwr.anova.test(k=k3.44,n=NULL,f=sqrt((diffmeans3.44)^2/var3.44),sig.level=alpha3.44,power=power3.44)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 6.180857
##               f = 0.8333333
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

R calculates \(n=6.18\), which rounds up to 7 observations per population. With \(\sigma^2=36\), the effect size is \(f=5/6\approx0.833\), which matches Cohen's \(f\) computed from the deviations of the means about the grand mean of 55. The expression used for diffmeans3.44 gives this value for these particular means but is not a general formula for the effect size.

Part (b)

How would your answer change if a reasonable estimate of the experimental error variance were \(\sigma^2=49\)?

mu1 <- 50
mu2 <- 60
mu3 <- 50
mu4 <- 60
k3.44 <- 4
var3.44 <- 49
alpha3.44 <- 0.05
power3.44 <- 0.9
diffmeans3.44=abs((mu1-mu2)+(mu3-mu4))/k3.44
pwr.anova.test(k=k3.44,n=NULL,f=sqrt((diffmeans3.44)^2/var3.44),sig.level=alpha3.44,power=power3.44)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 7.998751
##               f = 0.7142857
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

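R calculates \(n=7.999\), which rounds up to 8 observations per population. With \(\sigma^2=49\), the effect size is \(f=5/7\approx0.714\), which matches Cohen's \(f\) computed from the deviations of the means about the grand mean of 55. The expression used for diffmeans3.44 gives this value for these particular means but is not a general formula for the effect size.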

Part (c)

Can you draw any conclusions about the sensitivity of your answer in this particular situation about how your estimate of \(\sigma\) affects the decision about sample size?

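Answer: Yes. The required sample size is sensitive to the estimate of \(\sigma\): as \(\sigma^2\) increases from 25 to 36 to 49, the sample size needed per population increases from 5 to 7 to 8.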
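The sensitivity can be tabulated in one pass over the three variance estimates. This is a minimal base R sketch using `power.anova.test()` from the stats package instead of pwr; the names `mu`, `errorVars`, `nRequired`, and `result` are illustrative.

```r
mu        <- c(50, 60, 50, 60)   # hypothesized population means
errorVars <- c(25, 36, 49)       # candidate error variance estimates

# Required n per group for each error variance at alpha = 0.05, power = 0.90.
nRequired <- sapply(errorVars, function(v) {
  power.anova.test(groups = length(mu), between.var = var(mu),
                   within.var = v, sig.level = 0.05, power = 0.90)$n
})

result <- data.frame(variance   = errorVars,
                     n          = nRequired,
                     nRoundedUp = ceiling(nRequired))
print(result)
```

Laying the three cases side by side shows how quickly the required sample size grows with the assumed variance.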

Part (d)

Can you make any recommendations about how we should use this general approach to choosing \(n\) in practice?

Answer: Yes. This general approach should only be used when the variance is known or can be reasonably estimated. As seen previously, when the variance changes, the required sample size also changes.