# setup Libraries
library(dplyr)
library(knitr)
library(agricolae)
library(lawstat)
library(BSDA)
library(kableExtra)
library(tidyr)
library(pwr)
library(car)
library(ggplot2)
The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. A completely randomized experiment was conducted and the following data were recorded:
| Mixing Technique | Obs. 1 | Obs. 2 | Obs. 3 | Obs. 4 |
|---|---|---|---|---|
| 1 | 3129 | 3000 | 2865 | 2890 |
| 2 | 3200 | 3300 | 2975 | 3150 |
| 3 | 2800 | 2900 | 2985 | 3050 |
| 4 | 2600 | 2700 | 2600 | 2765 |
Use the Fisher LSD method with \(\alpha=0.05\) to make comparisons between pairs of means.
MixTeq1 <- c(3129,3000,2865,2890)
MixTeq2 <- c(3200,3300,2975,3150)
MixTeq3 <- c(2800,2900,2985,3050)
MixTeq4 <- c(2600,2700,2600,2765)
CementTestTable2 <- data.frame(MixTeq1,MixTeq2,MixTeq3,MixTeq4)
CementTestTableLong <- pivot_longer(CementTestTable2,c(MixTeq1,MixTeq2,MixTeq3,MixTeq4))
| name | value |
|---|---|
| MixTeq1 | 3129 |
| MixTeq2 | 3200 |
| MixTeq3 | 2800 |
| MixTeq4 | 2600 |
| MixTeq1 | 3000 |
| MixTeq2 | 3300 |
| MixTeq3 | 2900 |
| MixTeq4 | 2700 |
| MixTeq1 | 2865 |
| MixTeq2 | 2975 |
| MixTeq3 | 2985 |
| MixTeq4 | 2600 |
| MixTeq1 | 2890 |
| MixTeq2 | 3150 |
| MixTeq3 | 3050 |
| MixTeq4 | 2765 |
CementTestaovModel<-aov(value~name,data=CementTestTableLong)
summary(CementTestaovModel)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 3 489740 163247 12.73 0.000489 ***
## Residuals 12 153908 12826
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the ANOVA table, \(MSE=12826\) with \(df=12\) error degrees of freedom. From Appendix II in the book, the critical value at \(\alpha/2=0.025\) and \(df=12\) is \(t=2.179\).
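tCement <- 2.179 # critical t value for alpha/2 = 0.025, df = 12
CementMSE <- 12826 # mean square error from the ANOVA table
nCement <- 4 # observations per mixing technique
CementLSD <- tCement*sqrt(2*CementMSE/nCement) # LSD = t*sqrt(2*MSE/n)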
The LSD value is calculated as 174.4969539.
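The table lookup can be cross-checked in R. The sketch below is an alternative to the hand calculation, not the method used above: it rebuilds the data in base R (the names `strength`, `technique`, and `fit` are illustrative assumptions), pulls the MSE and error degrees of freedom from the fitted model, and takes the critical value from `qt()` instead of Appendix II.

```r
# Tensile strength data, four observations per mixing technique
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(paste0("MixTeq", 1:4), each = 4))
fit <- aov(strength ~ technique)

# Error degrees of freedom and mean square error from the fitted model
df_error <- df.residual(fit)
mse <- deviance(fit) / df_error

# Two-sided critical value at alpha = 0.05, then LSD = t * sqrt(2 * MSE / n)
alpha <- 0.05
n_per_group <- 4
t_crit <- qt(1 - alpha / 2, df_error)
lsd <- t_crit * sqrt(2 * mse / n_per_group)
lsd
```

Because `qt()` and the unrounded MSE carry more digits than the table values, the result can differ slightly from the hand calculation in the decimal places.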
Now we will compare the difference in means between each set of Mixing Technique results to the LSD.
MixTeq1to2 <- abs(mean(MixTeq1)-mean(MixTeq2))
MixTeq1to3 <- abs(mean(MixTeq1)-mean(MixTeq3))
MixTeq1to4 <- abs(mean(MixTeq1)-mean(MixTeq4))
MixTeq2to3 <- abs(mean(MixTeq2)-mean(MixTeq3))
MixTeq2to4 <- abs(mean(MixTeq2)-mean(MixTeq4))
MixTeq3to4 <- abs(mean(MixTeq3)-mean(MixTeq4))
Mean difference between pairs of Mixing Techniques
Technique 1 to Technique 2: 185.25
Technique 1 to Technique 3: 37.25
Technique 1 to Technique 4: 304.75
Technique 2 to Technique 3: 222.5
Technique 2 to Technique 4: 490
Technique 3 to Technique 4: 267.5
Comparing each difference with the LSD of about 174.50, the only pair of Mixing Techniques whose means do not differ significantly is Mixing Technique 1 and Mixing Technique 3, because their difference in means of 37.25 is \(<\) the LSD. Every other pair exceeds the LSD, so those means differ significantly at \(\alpha=0.05\). Note that the Technique 1 to Technique 2 difference of 185.25 exceeds the LSD only narrowly.
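The same comparisons can be expressed as p-values. Fisher's LSD corresponds to unadjusted pairwise t-tests that use the pooled standard deviation from all groups, which base R provides through `pairwise.t.test()`. This is a minimal sketch; the object names `strength`, `technique`, `lsd_tests`, and `not_significant` are assumptions introduced for illustration.

```r
# Tensile strength data, four observations per mixing technique
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(paste0("MixTeq", 1:4), each = 4))

# Unadjusted pairwise t-tests with a pooled standard deviation
lsd_tests <- pairwise.t.test(strength, technique,
                             p.adjust.method = "none", pool.sd = TRUE)
lsd_tests$p.value

# Pairs whose unadjusted p-value is at or above alpha = 0.05
not_significant <- which(lsd_tests$p.value >= 0.05, arr.ind = TRUE)
not_significant
```

A pair is declared different when its p-value falls below 0.05, which is the same decision as comparing the absolute mean difference with the LSD.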
Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?
Answer: In the normal probability plot the residuals fall along a fairly straight line, so there is no evidence against the normality assumption.
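One way to draw the normal probability plot is with base R's `qqnorm()` and `qqline()`. This is a sketch under the assumption that the model is refit from scratch; `strength`, `technique`, `fit`, and `res` are illustrative names.

```r
# Tensile strength data, four observations per mixing technique
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(paste0("MixTeq", 1:4), each = 4))
fit <- aov(strength ~ technique)
res <- residuals(fit)

# Normal probability (Q-Q) plot of the residuals with a reference line
qqnorm(res, main = "Normal probability plot of residuals")
qqline(res)
```

Points that stay close to the reference line support the normality assumption; systematic curvature or isolated points far from the line would argue against it.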
Plot the residuals versus the predicted tensile strength. Comment on the plot.
The plot shows that the residuals have a fairly constant spread across the predicted values, so the constant-variance assumption appears to be met.
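The residuals-versus-predicted plot can be produced directly from the fitted model. A minimal base R sketch follows; the object names and axis labels are assumptions chosen for illustration.

```r
# Tensile strength data, four observations per mixing technique
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(paste0("MixTeq", 1:4), each = 4))
fit <- aov(strength ~ technique)

# Residuals against fitted values; in a one-way layout the fitted
# values are the four treatment means
plot(fitted(fit), residuals(fit),
     xlab = "Predicted tensile strength", ylab = "Residual",
     main = "Residuals vs predicted")
abline(h = 0, lty = 2)
```

A roughly equal vertical spread at each of the four predicted values is what the constant-variance assumption looks like on this plot.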
Prepare a scatter plot of the results to aid the interpretation of this experiment.
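R is not displaying the 4th plot in the AOV plot series here. That diagnostic plots residuals against the factor levels, however, whereas the question asks for a scatter plot of the observed tensile strength against mixing technique.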
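A scatter plot of the raw results does not need the AOV plot series. The sketch below uses base R's `stripchart()` to plot each observation by mixing technique and overlays the treatment means; the object names and labels are assumptions introduced here.

```r
# Tensile strength data, four observations per mixing technique
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(paste0("MixTeq", 1:4), each = 4))

# Open circles: individual observations for each technique
stripchart(strength ~ technique, vertical = TRUE, pch = 1,
           xlab = "Mixing technique", ylab = "Tensile strength")

# Filled circles: treatment means
group_means <- tapply(strength, technique, mean)
points(seq_along(group_means), group_means, pch = 19)
```

Seeing the means alongside the spread of the observations makes the LSD conclusions easier to read off: Techniques 1 and 3 overlap heavily, while Technique 4 sits clearly below the rest.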
A product developer is investigating the tensile strength of a new synthetic fiber that will be used to make cloth for men’s shirts. Strength is usually affected by the percentage of cotton used in the blend of materials for the fiber. The engineer conducts a completely randomized experiment with five levels of cotton and replicates the experiment five times. The data are shown in the following table.
| Percent Cotton | Obs. 1 | Obs. 2 | Obs. 3 | Obs. 4 | Obs. 5 |
|---|---|---|---|---|---|
| 15 | 7 | 7 | 15 | 11 | 9 |
| 20 | 12 | 17 | 12 | 18 | 18 |
| 25 | 14 | 19 | 19 | 18 | 18 |
| 30 | 19 | 25 | 22 | 19 | 23 |
| 35 | 7 | 10 | 11 | 15 | 11 |
Use the Fisher LSD method to make comparisons between the pairs of means. What conclusions can you draw?
CottonWt15 <- c(7,7,15,11,9)
CottonWt20 <- c(12,17,12,18,18)
CottonWt25 <- c(14,19,19,18,18)
CottonWt30 <- c(19,25,22,19,23)
CottonWt35 <- c(7,10,11,15,11)
CottonTableNew <- data.frame(CottonWt15,CottonWt20,CottonWt25,CottonWt30,CottonWt35)
CottonTableLong <- pivot_longer(CottonTableNew,c(CottonWt15,CottonWt20,CottonWt25,CottonWt30,CottonWt35))
| name | value |
|---|---|
| CottonWt15 | 7 |
| CottonWt20 | 12 |
| CottonWt25 | 14 |
| CottonWt30 | 19 |
| CottonWt35 | 7 |
| CottonWt15 | 7 |
| CottonWt20 | 17 |
| CottonWt25 | 19 |
| CottonWt30 | 25 |
| CottonWt35 | 10 |
| CottonWt15 | 15 |
| CottonWt20 | 12 |
| CottonWt25 | 19 |
| CottonWt30 | 22 |
| CottonWt35 | 11 |
| CottonWt15 | 11 |
| CottonWt20 | 18 |
| CottonWt25 | 18 |
| CottonWt30 | 19 |
| CottonWt35 | 15 |
| CottonWt15 | 9 |
| CottonWt20 | 18 |
| CottonWt25 | 18 |
| CottonWt30 | 23 |
| CottonWt35 | 11 |
CottonTestaovModel<-aov(value~name,data=CottonTableLong)
summary(CottonTestaovModel)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 4 475.8 118.94 14.76 9.13e-06 ***
## Residuals 20 161.2 8.06
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From the ANOVA table, \(MSE=8.06\) with \(df=20\) error degrees of freedom. From Appendix II in the book, the critical value at \(\alpha/2=0.025\) and \(df=20\) is \(t=2.086\).
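tCotton <- 2.086 # critical t value for alpha/2 = 0.025, df = 20
CottonMSE <- 8.06 # mean square error from the ANOVA table
nCotton <- 5 # observations per cotton percentage
CottonLSD <- tCotton*sqrt(2*CottonMSE/nCotton) # LSD = t*sqrt(2*MSE/n)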
The LSD value is calculated as 3.7455174.
Now we will compare the difference in means between each pair of Cotton Weight results to the LSD.
CottonWt15to20 <- abs(mean(CottonWt15)-mean(CottonWt20))
CottonWt15to25 <- abs(mean(CottonWt15)-mean(CottonWt25))
CottonWt15to30 <- abs(mean(CottonWt15)-mean(CottonWt30))
CottonWt15to35 <- abs(mean(CottonWt15)-mean(CottonWt35))
CottonWt20to25 <- abs(mean(CottonWt20)-mean(CottonWt25))
CottonWt20to30 <- abs(mean(CottonWt20)-mean(CottonWt30))
CottonWt20to35 <- abs(mean(CottonWt20)-mean(CottonWt35))
CottonWt25to30 <- abs(mean(CottonWt25)-mean(CottonWt30))
CottonWt25to35 <- abs(mean(CottonWt25)-mean(CottonWt35))
CottonWt30to35 <- abs(mean(CottonWt30)-mean(CottonWt35))
Mean difference between pairs of Cotton Weights
Cotton Wt 15 to Cotton Wt 20: 5.6
Cotton Wt 15 to Cotton Wt 25: 7.8
Cotton Wt 15 to Cotton Wt 30: 11.8
Cotton Wt 15 to Cotton Wt 35: 1
Cotton Wt 20 to Cotton Wt 25: 2.2
Cotton Wt 20 to Cotton Wt 30: 6.2
Cotton Wt 20 to Cotton Wt 35: 4.6
Cotton Wt 25 to Cotton Wt 30: 4
Cotton Wt 25 to Cotton Wt 35: 6.8
Cotton Wt 30 to Cotton Wt 35: 10.8
Comparing each difference with the LSD of about 3.75, two pairs fall below it: 15% and 35% Cotton Weight (mean difference of 1) and 20% and 25% Cotton Weight (mean difference of 2.2). Those two pairs do not differ significantly in mean; the other eight pairs do. In particular, the 30% Cotton Weight has the highest mean strength (21.6) and differs significantly from every other level.
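The ten comparisons can also be generated programmatically instead of one variable per pair. The sketch below uses `combn()` to enumerate the pairs and reuses the table values \(t=2.086\) and \(MSE=8.06\) from above; the names `cotton`, `pairs`, and `result` are illustrative assumptions.

```r
# Tensile strength by percent cotton, five observations per level
cotton <- list(`15` = c(7, 7, 15, 11, 9),
               `20` = c(12, 17, 12, 18, 18),
               `25` = c(14, 19, 19, 18, 18),
               `30` = c(19, 25, 22, 19, 23),
               `35` = c(7, 10, 11, 15, 11))
means <- sapply(cotton, mean)

# LSD from the table values used above: t = 2.086, MSE = 8.06, n = 5
lsd <- 2.086 * sqrt(2 * 8.06 / 5)

# Every pair of levels, its absolute mean difference, and the LSD decision
pairs <- t(combn(names(means), 2))
diffs <- unname(abs(means[pairs[, 1]] - means[pairs[, 2]]))
result <- data.frame(level_a = pairs[, 1], level_b = pairs[, 2],
                     diff = diffs, significant = diffs > lsd)
result
```

This scales to any number of treatment levels without adding a new line of code per pair.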
Analyze the residuals from this experiment and comment on model adequacy.
Answer: The plots are not clear-cut. The Residuals vs Fitted plot shows fairly constant variance across the fitted values. The normal probability plot shows the residuals in a fairly straight line, with a slight tail at the lower end. Neither plot shows a serious violation, so I will say that the model is adequate: the residuals are approximately normal with constant variance.
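The two plots discussed in this answer are the first two diagnostics of `plot()` on a fitted AOV model. A minimal sketch for reproducing them side by side follows, assuming the data are re-entered in base R; `strength`, `cotton`, and `fit` are illustrative names.

```r
# Tensile strength by percent cotton, five observations per level
strength <- c(7, 7, 15, 11, 9,
              12, 17, 12, 18, 18,
              14, 19, 19, 18, 18,
              19, 25, 22, 19, 23,
              7, 10, 11, 15, 11)
cotton <- factor(rep(c(15, 20, 25, 30, 35), each = 5))
fit <- aov(strength ~ cotton)

# which = 1: residuals vs fitted; which = 2: normal Q-Q plot
par(mfrow = c(1, 2))
plot(fit, which = 1)
plot(fit, which = 2)
par(mfrow = c(1, 1))
```

Requesting the plots by number with `which` avoids depending on the default sequence of diagnostics.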
Suppose that four normal populations have means of \(\mu_1=50\), \(\mu_2=60\), \(\mu_3=50\), and \(\mu_4=60\). How many observations should be taken from each population so that the probability of rejecting the null hypothesis of equal population means is at least 0.90? Assume that \(\alpha=0.05\) and that a reasonable estimate of the error variance is \(\sigma^2=25\).
\(H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4\)
\(H_a\): at least one \(\mu_i\) differs from the others.
mu1 <- 50
mu2 <- 60
mu3 <- 50
mu4 <- 60
k3.44 <- 4
var3.44 <- 25
alpha3.44 <- 0.05
power3.44 <- 0.9
# Cohen's f: SD of the population means about their grand mean, divided by sigma
mus3.44 <- c(mu1,mu2,mu3,mu4)
f3.44 <- sqrt(mean((mus3.44-mean(mus3.44))^2)/var3.44)
pwr.anova.test(k=k3.44,n=NULL,f=f3.44,sig.level=alpha3.44,power=power3.44)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 4.658119
## f = 1
## sig.level = 0.05
## power = 0.9
##
## NOTE: n is number in each group
R calculates \(n=4.66\), which rounds up to 5 observations per population. The effect size \(f=1\) reported in the output is the standard deviation of the four population means about their grand mean (5) divided by \(\sigma=5\), which is the quantity `pwr.anova.test` expects.
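The effect size passed to `pwr.anova.test` can be worked out by hand. This is a sketch assuming Cohen's definition of \(f\), the standard deviation of the population means about their grand mean divided by \(\sigma\); the symbols \(\bar{\mu}\) (grand mean) and \(k\) (number of populations) are notation introduced here.

\[
\bar{\mu}=\frac{50+60+50+60}{4}=55,\qquad
f=\frac{1}{\sigma}\sqrt{\frac{1}{k}\sum_{i=1}^{k}\left(\mu_i-\bar{\mu}\right)^2}
=\frac{1}{5}\sqrt{\frac{4\cdot 5^2}{4}}=1
\]

Every mean lies 5 units from the grand mean, so the numerator is 5 regardless of \(\sigma\); only the denominator changes when the variance estimate changes.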
Refer to Problem 3.44.
How would your answer change if a reasonable estimate of the experimental error variance were \(\sigma^2=36\)?
mu1 <- 50
mu2 <- 60
mu3 <- 50
mu4 <- 60
k3.44 <- 4
var3.44 <- 36
alpha3.44 <- 0.05
power3.44 <- 0.9
# Cohen's f: SD of the population means about their grand mean, divided by sigma
mus3.44 <- c(mu1,mu2,mu3,mu4)
f3.44 <- sqrt(mean((mus3.44-mean(mus3.44))^2)/var3.44)
pwr.anova.test(k=k3.44,n=NULL,f=f3.44,sig.level=alpha3.44,power=power3.44)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 6.180857
## f = 0.8333333
## sig.level = 0.05
## power = 0.9
##
## NOTE: n is number in each group
R calculates \(n=6.18\), which rounds up to 7 observations per population. The effect size drops to \(f=5/6\approx0.83\) because \(\sigma\) rises from 5 to 6 while the spread of the population means is unchanged.
How would your answer change if a reasonable estimate of the experimental error variance were \(\sigma^2=49\)?
mu1 <- 50
mu2 <- 60
mu3 <- 50
mu4 <- 60
k3.44 <- 4
var3.44 <- 49
alpha3.44 <- 0.05
power3.44 <- 0.9
# Cohen's f: SD of the population means about their grand mean, divided by sigma
mus3.44 <- c(mu1,mu2,mu3,mu4)
f3.44 <- sqrt(mean((mus3.44-mean(mus3.44))^2)/var3.44)
pwr.anova.test(k=k3.44,n=NULL,f=f3.44,sig.level=alpha3.44,power=power3.44)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 7.998751
## f = 0.7142857
## sig.level = 0.05
## power = 0.9
##
## NOTE: n is number in each group
R calculates \(n=7.9988\), which rounds up to 8 observations per population. The effect size drops further to \(f=5/7\approx0.71\) because \(\sigma\) is now 7.
Can you draw any conclusions about the sensitivity of your answer in this particular situation about how your estimate of \(\sigma\) affects the decision about sample size?
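Answer: Yes. As \(\sigma\) increases from 5 to 6 to 7, the sample size needed per population increases from 5 to 7 to 8, so the decision about sample size is sensitive to the estimate of \(\sigma\).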
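The sensitivity can be checked in one pass without the `pwr` package. The sketch below computes the power of the one-way ANOVA F test from the noncentral F distribution and searches for the smallest whole number of observations per population that reaches the target power. The helper names `anova_power` and `smallest_n` are hypothetical, introduced only for this illustration.

```r
mu <- c(50, 60, 50, 60)   # population means from the problem
k <- length(mu)
alpha <- 0.05
target_power <- 0.90

# Power of the one-way ANOVA F test via the noncentral F distribution
anova_power <- function(n, sigma2) {
  lambda <- n * sum((mu - mean(mu))^2) / sigma2  # noncentrality parameter
  df1 <- k - 1
  df2 <- k * (n - 1)
  f_crit <- qf(1 - alpha, df1, df2)
  pf(f_crit, df1, df2, ncp = lambda, lower.tail = FALSE)
}

# Smallest integer n per group that reaches the target power
smallest_n <- function(sigma2) {
  n <- 2
  while (anova_power(n, sigma2) < target_power) n <- n + 1
  n
}

variances <- c(25, 36, 49)
data.frame(sigma2 = variances,
           n_per_group = sapply(variances, smallest_n))
```

Searching over whole numbers returns the rounded-up sample sizes directly, so no manual rounding of a fractional \(n\) is needed.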
Can you make any recommendations about how we should use this general approach to choosing \(n\) in practice?
Answer: Yes. This general approach should only be used when the variance is known or can be reasonably estimated. As seen previously, when the variance changes, the required sample size also changes.