The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. A completely randomized experiment was conducted and the following data were collected:
| Mixing Technique | Obs. 1 | Obs. 2 | Obs. 3 | Obs. 4 |
|---|---|---|---|---|
| 1 | 3129 | 3000 | 2865 | 2890 |
| 2 | 3200 | 3300 | 2975 | 3150 |
| 3 | 2800 | 2900 | 2985 | 3050 |
| 4 | 2600 | 2700 | 2600 | 2765 |
Use the Fisher LSD method with \(\alpha = 0.05\) to make comparisons between pairs of means.
Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?
Plot the residuals versus the predicted tensile strength. Comment on the plot.
Prepare a scatter plot of the results to aid the interpretation of the results of this experiment.
PART C:
Reading the data in wide format (one column per mixing technique), ready for tidyr:
Mixingtech1<-c(3129,3000,2865,2890)
Mixingtech2<-c(3200,3300,2975,3150)
Mixingtech3<-c(2800,2900,2985,3050)
Mixingtech4<-c(2600,2700,2600,2765)
dat<-data.frame(Mixingtech1,Mixingtech2,Mixingtech3,Mixingtech4)
Now use pivot_longer() to convert the data to tidy (long) format:
library(tidyr)
dat<-pivot_longer(dat,c(Mixingtech1,Mixingtech2,Mixingtech3,Mixingtech4))
print(dat)
## # A tibble: 16 × 2
## name value
## <chr> <dbl>
## 1 Mixingtech1 3129
## 2 Mixingtech2 3200
## 3 Mixingtech3 2800
## 4 Mixingtech4 2600
## 5 Mixingtech1 3000
## 6 Mixingtech2 3300
## 7 Mixingtech3 2900
## 8 Mixingtech4 2700
## 9 Mixingtech1 2865
## 10 Mixingtech2 2975
## 11 Mixingtech3 2985
## 12 Mixingtech4 2600
## 13 Mixingtech1 2890
## 14 Mixingtech2 3150
## 15 Mixingtech3 3050
## 16 Mixingtech4 2765
Now perform the one-way ANOVA:
aov.model<-aov(value~name,data=dat)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 3 489740 163247 12.73 0.000489 ***
## Residuals 12 153908 12826
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)
To conduct the LSD test, first define the hypotheses:
\[H_0: \mu_{i} - \mu_{j} = 0\]
\[H_a: \mu_{i} - \mu_{j} \neq 0\]
where i and j index the mixing techniques (i ≠ j).
The least significant difference is calculated from the equation below:
\[ LSD = t_{\alpha/2,\,N-a} \sqrt{\frac{2\,MSE}{n}} \]
where N = 16 is the total number of observations, a = 4 is the number of treatments, and n = 4 is the number of replicates per treatment. This form of the equation applies because the design is balanced (n1=n2=n3=n4).
From the t-table, the critical value is \(t_{0.025,12} = 2.179\) (error degrees of freedom N - a = 12); the MSE is taken from the ANOVA results:
t<-2.179
MSE<-12826
n=4
Using the LSD Equation:
LSD = t*sqrt(2*MSE/n)
print(LSD)
## [1] 174.497
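As a cross-check on the table lookup, the critical value and MSE can be pulled from the fitted model itself. The following is a minimal sketch in base R, not part of the original solution; the names strength, technique, fit, df_err, mse, t_crit and lsd are assumptions introduced here for illustration.

```r
# Sketch: compute the LSD from the fitted model instead of a printed t-table.
strength <- c(3129, 3000, 2865, 2890,   # technique 1
              3200, 3300, 2975, 3150,   # technique 2
              2800, 2900, 2985, 3050,   # technique 3
              2600, 2700, 2600, 2765)   # technique 4
technique <- factor(rep(1:4, each = 4))
fit <- aov(strength ~ technique)

alpha  <- 0.05
n_rep  <- 4                             # replicates per technique (balanced design)
df_err <- df.residual(fit)              # error degrees of freedom, N - a
mse    <- deviance(fit) / df_err        # residual sum of squares / error df
t_crit <- qt(1 - alpha / 2, df_err)     # upper alpha/2 point of the t distribution

lsd <- t_crit * sqrt(2 * mse / n_rep)
print(c(df_error = df_err, t_crit = t_crit, MSE = mse, LSD = lsd))
```

Because qt() and the unrounded MSE are used, the result may differ from the hand calculation in the last few digits.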
Now, if the absolute difference between any two treatment averages exceeds 174.5, that pair of means differs significantly:
abs(mean(Mixingtech1)-mean(Mixingtech2))
## [1] 185.25
abs(mean(Mixingtech1)-mean(Mixingtech3))
## [1] 37.25
abs(mean(Mixingtech1)-mean(Mixingtech4))
## [1] 304.75
abs(mean(Mixingtech2)-mean(Mixingtech3))
## [1] 222.5
abs(mean(Mixingtech2)-mean(Mixingtech4))
## [1] 490
abs(mean(Mixingtech3)-mean(Mixingtech4))
## [1] 267.5
Conclusion:
---> The only pair for which we fail to reject the null hypothesis is \(\mu_{1}\) & \(\mu_{3}\), because the absolute difference in sample means is less than the LSD, i.e.,
\[|\bar{y}_{1.} - \bar{y}_{3.}| = 37.25 < LSD = 174.5\]
For all other pairs of means, we reject the null hypothesis and conclude that there is a significant difference between the population means, because their absolute differences exceed the LSD value, as shown above.
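The same comparisons can also be expressed as unadjusted pairwise t-tests that use the pooled standard deviation, which is one common way to run Fisher's LSD in base R. This is a sketch under that assumption rather than the method used above; strength, technique and lsd_tests are illustrative names.

```r
# Sketch: Fisher LSD as unadjusted pairwise t-tests with a pooled SD.
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(1:4, each = 4))

# p.adjust.method = "none" keeps the per-comparison error rate at alpha,
# and pool.sd = TRUE uses the ANOVA error variance with N - a degrees of freedom.
lsd_tests <- pairwise.t.test(strength, technique,
                             p.adjust.method = "none", pool.sd = TRUE)
p_mat <- lsd_tests$p.value              # lower-triangular matrix of p-values
print(round(p_mat, 4))

# Flag the pairs that differ at alpha = 0.05
print(p_mat < 0.05)
```

A p-value below 0.05 corresponds to an absolute difference larger than the LSD, so the flags should agree with the manual comparisons.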
PART D:
Since we already ran the one-way ANOVA in Part C, the plot(aov.model) call there also produced the model-checking plots, including the normal probability plot.
Conclusion:
---> The points on the normal probability plot of the residuals fall close to a straight line, so there is no evidence against the normality assumption.
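To draw only the normal probability plot of the residuals, rather than the full set from plot(aov.model), something like the following could be used. It is a minimal sketch with illustrative names (strength, technique, fit, res); the Shapiro-Wilk test is a supplementary check added here and is not part of the original analysis.

```r
# Sketch: normal Q-Q plot of the ANOVA residuals for the cement data.
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(1:4, each = 4))
fit <- aov(strength ~ technique)

res <- residuals(fit)                   # observation minus its treatment mean
qqnorm(res, main = "Normal probability plot of residuals")
qqline(res)                             # reference line through the quartiles

# Supplementary numeric check (assumption: not used in the original write-up)
print(shapiro.test(res))
```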
PART E:
Conclusion:
---> In the “Residuals vs Fitted” plot, the spread of the residuals is roughly the same at each fitted value, so the constant-variance assumption appears reasonable.
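The residuals-versus-predicted plot can also be drawn directly, with the within-technique standard deviations as a rough numeric companion. This is a sketch only; the variable names are illustrative and Bartlett's test is an extra check that the original solution does not use.

```r
# Sketch: residuals vs predicted tensile strength, plus a look at the spread.
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(1:4, each = 4))
fit <- aov(strength ~ technique)

plot(fitted(fit), residuals(fit),
     xlab = "Predicted tensile strength", ylab = "Residual",
     main = "Residuals vs predicted values")
abline(h = 0, lty = 2)                  # residuals should scatter evenly around 0

# Standard deviation of the observations within each technique
group_sd <- tapply(strength, technique, sd)
print(round(group_sd, 1))

# Supplementary check (assumption): Bartlett's test of equal variances
print(bartlett.test(strength ~ technique))
```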
PART F:
Strength <- c(3129, 3000, 2865, 2890, 3200, 3300, 2975, 3150, 2800, 2900, 2985, 3050, 2600, 2700, 2600, 2765)
Type <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)
library(car)
library(carData)
scatterplot(Strength ~ Type,
xlab="Mixing Technique", ylab="Tensile Strength",
main="Scatter Plot")
Conclusion:
---> Mixing techniques 1 and 3 appear to have similar mean strengths. This agrees with the LSD result in Part C, where they were the only pair that was not significantly different. Graphically, all other means appear to differ.
---> The plot above also shows the sample average for each treatment and the 95 percent confidence interval on the treatment mean.
---> The “Residuals vs Factor Levels” plot from plot(aov.model) shows a similar picture.
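A base-R alternative to the car scatterplot is a strip chart with the treatment means overlaid, which keeps mixing technique as a categorical axis. This is a minimal sketch; strength, technique and tech_means are names assumed here for illustration.

```r
# Sketch: tensile strength by mixing technique, with treatment means marked.
strength <- c(3129, 3000, 2865, 2890,
              3200, 3300, 2975, 3150,
              2800, 2900, 2985, 3050,
              2600, 2700, 2600, 2765)
technique <- factor(rep(1:4, each = 4))

stripchart(strength ~ technique, vertical = TRUE, pch = 16,
           xlab = "Mixing Technique", ylab = "Tensile Strength",
           main = "Tensile strength by mixing technique")

# Overlay the sample mean of each technique as a red cross
tech_means <- tapply(strength, technique, mean)
points(seq_along(tech_means), tech_means, pch = 4, cex = 2, col = "red")
print(tech_means)
```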
A product developer is investigating the tensile strength of a new synthetic fiber that will be used to make cloth for men’s shirts. Strength is usually affected by the percentage of cotton used in the blend of materials for the fiber. The engineer conducts a completely randomized experiment with five levels of cotton content and replicates the experiment five times. The data are shown in the following table.
| Cotton Weight % | Obs. 1 | Obs. 2 | Obs. 3 | Obs. 4 | Obs. 5 |
|---|---|---|---|---|---|
| 15 | 7 | 7 | 15 | 11 | 9 |
| 20 | 12 | 17 | 12 | 18 | 18 |
| 25 | 14 | 19 | 19 | 18 | 18 |
| 30 | 19 | 25 | 22 | 19 | 23 |
| 35 | 7 | 10 | 11 | 15 | 11 |
Use the Fisher LSD method to make comparisons between the pairs of means. What conclusions can you draw?
Analyze the residuals from this experiment and comment on model adequacy.
PART B:
Reading the data in wide format (one column per cotton weight percentage), ready for tidyr:
CW15<-c(7,7,15,11,9)
CW20<-c(12,17,12,18,18)
CW25<-c(14,19,19,18,18)
CW30<-c(19,25,22,19,23)
CW35<-c(7,10,11,15,11)
dat<-data.frame(CW15,CW20,CW25,CW30,CW35)
Now use pivot_longer() to convert the data to tidy (long) format:
library(tidyr)
dat<-pivot_longer(dat,c(CW15,CW20,CW25,CW30,CW35))
print(dat)
## # A tibble: 25 × 2
## name value
## <chr> <dbl>
## 1 CW15 7
## 2 CW20 12
## 3 CW25 14
## 4 CW30 19
## 5 CW35 7
## 6 CW15 7
## 7 CW20 17
## 8 CW25 19
## 9 CW30 25
## 10 CW35 10
## # … with 15 more rows
Now perform the one-way ANOVA:
aov.model<-aov(value~name,data=dat)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## name 4 475.8 118.94 14.76 9.13e-06 ***
## Residuals 20 161.2 8.06
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)
To conduct the LSD test, first define the hypotheses:
\[H_0: \mu_{i} - \mu_{j} = 0\]
\[H_a: \mu_{i} - \mu_{j} \neq 0\]
where i and j index the cotton weight percentages (i ≠ j).
The least significant difference is calculated from the equation below:
\[ LSD = t_{\alpha/2,\,N-a} \sqrt{\frac{2\,MSE}{n}} \]
where N = 25 is the total number of observations, a = 5 is the number of treatments, and n = 5 is the number of replicates per treatment. This form of the equation applies because the design is balanced (n1=n2=n3=n4=n5).
From the t-table, the critical value is \(t_{0.025,20} = 2.086\) (error degrees of freedom N - a = 20); the MSE is taken from the ANOVA results:
t<-2.086
MSE<-8.06
n=5
Using the LSD Equation:
LSD = t*sqrt(2*MSE/n)
print(LSD)
## [1] 3.745517
Now, if the absolute difference between any two treatment averages exceeds 3.745, that pair of means differs significantly:
abs(mean(CW15)-mean(CW20))
## [1] 5.6
abs(mean(CW15)-mean(CW25))
## [1] 7.8
abs(mean(CW15)-mean(CW30))
## [1] 11.8
abs(mean(CW15)-mean(CW35))
## [1] 1
abs(mean(CW20)-mean(CW25))
## [1] 2.2
abs(mean(CW20)-mean(CW30))
## [1] 6.2
abs(mean(CW20)-mean(CW35))
## [1] 4.6
abs(mean(CW25)-mean(CW30))
## [1] 4
abs(mean(CW25)-mean(CW35))
## [1] 6.8
abs(mean(CW30)-mean(CW35))
## [1] 10.8
Conclusion:
---> The only pairs for which we fail to reject the null hypothesis are \(\mu_{1}\) & \(\mu_{5}\) (15% and 35% cotton) and \(\mu_{2}\) & \(\mu_{3}\) (20% and 25% cotton), because the absolute difference in sample means is less than the LSD = 3.745, i.e.,
\[|\bar{y}_{1.} - \bar{y}_{5.}| = 1 < LSD\]
\[|\bar{y}_{2.} - \bar{y}_{3.}| = 2.2 < LSD\]
For all other pairs of means, we reject the null hypothesis and conclude that there is a significant difference between the population means, because their absolute differences exceed the LSD value, as shown above.
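The ten comparisons above can be tabulated in one pass instead of being typed out individually. The sketch below is one possible way to do this in base R and is not the original solution's code; cotton, fit, trt_means, pairs and result are illustrative names.

```r
# Sketch: all pairwise |differences| for the cotton data compared with the LSD.
cotton <- data.frame(
  strength = c( 7,  7, 15, 11,  9,     # 15% cotton
               12, 17, 12, 18, 18,     # 20%
               14, 19, 19, 18, 18,     # 25%
               19, 25, 22, 19, 23,     # 30%
                7, 10, 11, 15, 11),    # 35%
  pct = factor(rep(c(15, 20, 25, 30, 35), each = 5))
)
fit <- aov(strength ~ pct, data = cotton)

n_rep <- 5                                        # replicates per level
mse   <- deviance(fit) / df.residual(fit)         # error mean square
lsd   <- qt(0.975, df.residual(fit)) * sqrt(2 * mse / n_rep)

trt_means <- tapply(cotton$strength, cotton$pct, mean)
pairs <- t(combn(names(trt_means), 2))            # one row per pair of levels
abs_diff <- abs(as.vector(trt_means[pairs[, 1]]) - as.vector(trt_means[pairs[, 2]]))

result <- data.frame(pair = paste(pairs[, 1], pairs[, 2], sep = " vs "),
                     abs_diff = abs_diff,
                     significant = abs_diff > lsd)
print(result)
```

The rows flagged FALSE are the pairs for which the null hypothesis is not rejected.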
PART C:
Since we already ran the one-way ANOVA in Part B, the plot(aov.model) call there also produced the residual plots:
Conclusion:
---> The points on the normal probability plot of the residuals fall close to a straight line, so there is no evidence against the normality assumption.
---> In the residuals versus fitted values plot, the points fall in a roughly rectangular band, which supports the assumption of constant variance.
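The two residual plots discussed here can be requested individually through the which argument of plot(), which avoids paging through all four diagnostics. A minimal sketch follows; cotton_strength, cotton_pct and fit_cotton are names assumed for illustration.

```r
# Sketch: only the residual plots needed for the model adequacy check.
cotton_strength <- c( 7,  7, 15, 11,  9,
                     12, 17, 12, 18, 18,
                     14, 19, 19, 18, 18,
                     19, 25, 22, 19, 23,
                      7, 10, 11, 15, 11)
cotton_pct <- factor(rep(c(15, 20, 25, 30, 35), each = 5))
fit_cotton <- aov(cotton_strength ~ cotton_pct)

op <- par(mfrow = c(1, 2))              # two panels side by side
plot(fit_cotton, which = 1)             # residuals vs fitted values
plot(fit_cotton, which = 2)             # normal Q-Q plot of residuals
par(op)                                 # restore the previous layout

# Spread of the residuals within each cotton percentage
res <- residuals(fit_cotton)
print(round(tapply(res, cotton_pct, sd), 2))
```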
Suppose that four normal populations have means of \(\mu_{1}=50\), \(\mu_{2}=60\), \(\mu_{3}=50\), and \(\mu_{4}=60\). How many observations should be taken from each population so that the probability of rejecting the null hypothesis of equal population means is at least 0.90? Assume that \(\alpha = 0.05\) and that a reasonable estimate of the error variance is \(\sigma^2=25\).
DATA:
Groups=4
Within Variance=25
\(\alpha=0.05\)
Power=90%
The means sit at the two extremes (two at 50 and two at 60), which is the maximum variability case. Since the error variance is given, we can use power.anova.test directly:
power.anova.test(groups = 4, n=NULL, between.var = var(c(50,50,60,60)), within.var = 25, sig.level = 0.05, power = 0.90)
##
## Balanced one-way analysis of variance power calculation
##
## groups = 4
## n = 4.658128
## between.var = 33.33333
## within.var = 25
## sig.level = 0.05
## power = 0.9
##
## NOTE: n is number in each group
Alternatively, we can use Cohen’s effect size:
Since the number of groups is even (k = 4) and the means follow the maximum variability pattern, we use the effect size below:
\(f=\frac{d}{2}\)
where \(d=\frac{\mu_{max}-\mu_{min}}{\sigma}\)
sigma=5
d=(60-50)/sigma
print(d)
## [1] 2
f=d/2
print(f)
## [1] 1
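The shortcut f = d/2 can be checked against the general definition of Cohen's f, the standard deviation of the population means divided by the error standard deviation. The sketch below uses illustrative names (mu, sigma_m, f_direct, f_shortcut) that are not in the original solution.

```r
# Sketch: Cohen's f computed directly from the four hypothesised means.
mu    <- c(50, 60, 50, 60)              # population means from the problem
sigma <- 5                              # error standard deviation (variance 25)

# SD of the means around their grand mean, dividing by k (not k - 1)
sigma_m  <- sqrt(mean((mu - mean(mu))^2))
f_direct <- sigma_m / sigma

# Shortcut for the maximum variability pattern with an even number of groups
d          <- (max(mu) - min(mu)) / sigma
f_shortcut <- d / 2

print(c(f_direct = f_direct, f_shortcut = f_shortcut))
```

Both routes give the same value here, which is why the d/2 form can be passed to pwr.anova.test for this configuration of means.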
library(pwr)
pwr.anova.test(k=4,n=NULL,f=1, sig.level = 0.05, power = 0.90)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 4.658119
## f = 1
## sig.level = 0.05
## power = 0.9
##
## NOTE: n is number in each group
Conclusion:
---> Both methods give n ≈ 4.66, which rounds up to 5 observations from each population.
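Because n must be a whole number, it can be worth confirming the power actually achieved on either side of the fractional answer. The following sketch does this with power.anova.test; between_var, power_n4 and power_n5 are illustrative names.

```r
# Sketch: achieved power at n = 4 and n = 5 observations per population.
between_var <- var(c(50, 50, 60, 60))   # variance of the group means (k - 1 divisor)

power_n4 <- power.anova.test(groups = 4, n = 4, between.var = between_var,
                             within.var = 25, sig.level = 0.05)$power
power_n5 <- power.anova.test(groups = 4, n = 5, between.var = between_var,
                             within.var = 25, sig.level = 0.05)$power

print(c(n4 = power_n4, n5 = power_n5))
```

With the required n between 4 and 5, the power for n = 4 should fall short of 0.90 and the power for n = 5 should exceed it.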
Refer to Problem 3.44.
How would your answer change if a reasonable estimate of the experimental error variance were 36?
How would your answer change if a reasonable estimate of the experimental error variance were 49?
Can you draw any conclusions about the sensitivity of your answer in this particular situation about how your estimate of \(\sigma^2\) affects the decision about sample size?
Can you make any recommendations about how we should use this general approach to choosing n in practice?
PART A:
Variance=36
sigma=6
d=(60-50)/sigma
print(d)
## [1] 1.666667
f=d/2
print(f)
## [1] 0.8333333
pwr.anova.test(k=4,n=NULL,f=0.8333333, sig.level = 0.05, power = 0.90)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 6.180858
## f = 0.8333333
## sig.level = 0.05
## power = 0.9
##
## NOTE: n is number in each group
Conclusion:
---> n = 6.18 rounds up to 7 observations from each population.
PART B:
Variance=49
sigma=7
d=(60-50)/sigma
print(d)
## [1] 1.428571
f=d/2
print(f)
## [1] 0.7142857
pwr.anova.test(k=4,n=NULL,f=0.7142857, sig.level = 0.05, power = 0.90)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 7.998751
## f = 0.7142857
## sig.level = 0.05
## power = 0.9
##
## NOTE: n is number in each group
Conclusion:
---> n = 7.9988 rounds up to 8 observations from each population.
PART C:
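Answer: As the error variance increases, the number of observations needed from each population also increases to keep the same power. Here the requirement rises from 5 per group at a variance of 25 to 7 at 36 and 8 at 49, so the sample size is noticeably sensitive to the variance estimate.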
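The sensitivity can be summarised by running the same power calculation over the three candidate variances in one loop. This is a sketch using power.anova.test, with illustrative names (within_vars, n_exact, sensitivity) that are not part of the original solution.

```r
# Sketch: required sample size per group as the error variance changes.
within_vars <- c(25, 36, 49)            # candidate error variances
between_var <- var(c(50, 50, 60, 60))   # variance of the hypothesised means

n_exact <- sapply(within_vars, function(v) {
  power.anova.test(groups = 4, between.var = between_var, within.var = v,
                   sig.level = 0.05, power = 0.90)$n
})

sensitivity <- data.frame(within_var  = within_vars,
                          n_exact     = round(n_exact, 3),
                          n_per_group = ceiling(n_exact))  # round up to whole units
print(sensitivity)
```

The rounded-up column should reproduce the 5, 7 and 8 observations per group found in Problem 3.44 and Parts A and B.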
PART D:
Answer: When an experiment is being planned, a better approach is to determine plausible lower and upper limits for the variance, compute the sample size across that range, and then choose the number of observations based on the most reasonable (or most conservative) estimate.
getwd()
#QUESTION 3.7 PART C:
#Reading the Data:
Mixingtech1<-c(3129,3000,2865,2890)
Mixingtech2<-c(3200,3300,2975,3150)
Mixingtech3<-c(2800,2900,2985,3050)
Mixingtech4<-c(2600,2700,2600,2765)
dat<-data.frame(Mixingtech1,Mixingtech2,Mixingtech3,Mixingtech4)
#Up until now dat in NOT tidy
library(tidyr)
dat<-pivot_longer(dat,c(Mixingtech1,Mixingtech2,Mixingtech3,Mixingtech4))
print(dat)
aov.model<-aov(value~name,data=dat)
summary(aov.model)
plot(aov.model)
#To conduct the LSD test, calculate the least significant difference. The balanced-data formula applies because n1=n2=n3=n4
#From t-table, critical value t(0.025, 12):
t<-2.179
MSE<-12826
n=4
LSD = t*sqrt(2*MSE/n)
print(LSD)
#If the absolute difference between any two treatment averages exceeds 174.5, that pair of means differs significantly
abs(mean(Mixingtech1)-mean(Mixingtech2))
abs(mean(Mixingtech1)-mean(Mixingtech3))
abs(mean(Mixingtech1)-mean(Mixingtech4))
abs(mean(Mixingtech2)-mean(Mixingtech3))
abs(mean(Mixingtech2)-mean(Mixingtech4))
abs(mean(Mixingtech3)-mean(Mixingtech4))
#QUESTION 3.7 PART D:
#Already Done
#QUESTION 3.7 PART E:
#Already Done
#QUESTION 3.7 PART F:
Strength <- c(3129, 3000, 2865, 2890, 3200, 3300, 2975, 3150, 2800, 2900, 2985, 3050, 2600, 2700, 2600, 2765)
Type <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)
install.packages("car")
library(car)
library(carData)
scatterplot(Strength ~ Type,
xlab="Type", ylab="Strength",
main="Scatter Plot")
#QUESTION 3.10 PART B:
CW15<-c(7,7,15,11,9)
CW20<-c(12,17,12,18,18)
CW25<-c(14,19,19,18,18)
CW30<-c(19,25,22,19,23)
CW35<-c(7,10,11,15,11)
dat<-data.frame(CW15,CW20,CW25,CW30,CW35)
library(tidyr)
dat<-pivot_longer(dat,c(CW15,CW20,CW25,CW30,CW35))
print(dat)
aov.model<-aov(value~name,data=dat)
summary(aov.model)
plot(aov.model)
t<-2.086
MSE<-8.06
n=5
LSD = t*sqrt(2*MSE/n)
print(LSD)
abs(mean(CW15)-mean(CW20))
abs(mean(CW15)-mean(CW25))
abs(mean(CW15)-mean(CW30))
abs(mean(CW15)-mean(CW35))
abs(mean(CW20)-mean(CW25))
abs(mean(CW20)-mean(CW30))
abs(mean(CW20)-mean(CW35))
abs(mean(CW25)-mean(CW30))
abs(mean(CW25)-mean(CW35))
abs(mean(CW30)-mean(CW35))
#QUESTION 3.10 PART C:
#Already Done
#QUESTION 3.44 :
library(pwr)
power.anova.test(groups = 4, n=NULL, between.var = var(c(50,50,60,60)), within.var = 25, sig.level = 0.05, power = 0.90)
#OR
sigma=5
d=(60-50)/sigma
print(d)
f=d/2
print(f)
pwr.anova.test(k=4,n=NULL,f=1, sig.level = 0.05, power = 0.90)
#QUESTION 3.45 :
#Part A:
Variance=36
sigma=6
d=(60-50)/sigma
print(d)
f=d/2
print(f)
pwr.anova.test(k=4,n=NULL,f=0.8333333, sig.level = 0.05, power = 0.90)
#Part B:
Variance=49
sigma=7
d=(60-50)/sigma
print(d)
f=d/2
print(f)
pwr.anova.test(k=4,n=NULL,f=0.7142857, sig.level = 0.05, power = 0.90)