Question 3.7
The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. A completely randomized experiment was conducted and the data were collected.
Part C. Use the Fisher LSD method with α = 0.05 to make comparisons between pairs of means.
Reading Data
Pop1 <- c(3129, 3000, 2865, 2890)
Pop2 <- c(3200, 3300, 2975, 3150)
Pop3 <- c(2800, 2900, 2985, 3050)
Pop4 <- c(2600, 2700, 2600, 2765)
Pop <- rbind(Pop1, Pop2, Pop3, Pop4)    # all 16 observations in one matrix
GA <- c(mean(Pop))                      # grand average
a <- mean(Pop1)                         # treatment (mixing-technique) means
b <- mean(Pop2)
c <- mean(Pop3)
d <- mean(Pop4)
SSE1 <- (3129-a)^2 + (3000-a)^2 + (2865-a)^2 + (2890-a)^2   # within-group sums of squares
SSE2 <- (3200-b)^2 + (3300-b)^2 + (2975-b)^2 + (3150-b)^2
SSE3 <- (2800-c)^2 + (2900-c)^2 + (2985-c)^2 + (3050-c)^2
SSE4 <- (2600-d)^2 + (2700-d)^2 + (2600-d)^2 + (2765-d)^2
SSE <- SSE1 + SSE2 + SSE3 + SSE4        # error sum of squares
MSE <- SSE / (12)                       # error mean square, 16 - 4 = 12 df
SSTr <- 4*((a - GA)^2 + (b - GA)^2 + (c - GA)^2 + (d - GA)^2)   # treatment sum of squares
MSTr <- SSTr / (3)                      # treatment mean square, 4 - 1 = 3 df
SST <- SSE + SSTr                       # total sum of squares
Statistic <- MSTr / MSE                 # F statistic
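As a quick check on the hand computation (a small sketch, not part of the original solution), the p-value of this F statistic can be read off the F distribution with 3 and 12 degrees of freedom; it should agree with the aov() output shown further below (F ≈ 12.73, p ≈ 0.0005).
# Sketch: p-value for the hand-computed F statistic with 3 and 12 df
pf(Statistic, df1 = 3, df2 = 12, lower.tail = FALSE)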
Answer Part C: For the Fisher LSD method, we first compute the LSD value. The critical t value used in the LSD is determined by α/2 = 0.025 with 12 error degrees of freedom, giving t(0.025, 12) = 2.179.
T <- c(2.179)
Since the number of observations is the same in every population, n_i = n_j = n = 4, and LSD = t(0.025, 12) * sqrt(2*MSE/n).
LSD <- c(T*sqrt(2*MSE/4))
str(LSD)
## num 174
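As a cross-check (a small sketch, not part of the original hand calculation), the critical value can be obtained from qt() rather than a t table, using the 12 error degrees of freedom.
# Sketch: critical t value and LSD without a t table
T.crit <- qt(1 - 0.05/2, df = 12)   # about 2.179
T.crit * sqrt(2 * MSE / 4)          # should reproduce the LSD of about 174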
The variables a, b, c, d hold the sample means of populations 1, 2, 3, 4, respectively. The pairwise comparisons considered here are: Population 2 vs 1, 2 vs 3, 2 vs 4, 1 vs 4, and 3 vs 4.
D21 <- b - a
str(D21)
## num 185
D23 <- b - c
str(D23)
## num 222
D24 <- b - d
str(D24)
## num 490
D14 <- a - d
str(D14)
## num 305
D34 <- c - d
str(D34)
## num 268
Since D21 is greater than the LSD value, populations 2 & 1 differ.
Since D23 is greater than the LSD value, populations 2 & 3 differ.
Since D24 is greater than the LSD value, populations 2 & 4 differ.
Since D14 is greater than the LSD value, populations 1 & 4 differ.
Since D34 is greater than the LSD value, populations 3 & 4 differ.
The only pair not tested above is population 1 vs population 3; its difference, a − c = 2971 − 2933.75 = 37.25, is well below the LSD of about 174, so those two means do not differ significantly. All other pairs of means differ.
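To avoid picking pairs by eye, every pairwise comparison can be checked at once; a minimal sketch using only base R and the group means already computed (a, b, c, d):
# Sketch: compare every pair of treatment means against the LSD
means <- c(Pop1 = a, Pop2 = b, Pop3 = c, Pop4 = d)   # base::c() is still found even though a variable named c exists
diffs <- abs(outer(means, means, "-"))               # matrix of |mean_i - mean_j|
diffs > LSD                                          # TRUE marks pairs whose means differ significantly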
Part D: Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?
Part E: Plot the residuals versus the predicted (fitted) tensile strength and comment on the plot.
Part F: Prepare a scatter plot of the results to aid the interpretation of the results of this experiment.
Strength <- c(3129, 3000, 2865, 2890, 3200, 3300, 2975, 3150, 2800, 2900, 2985, 3050, 2600, 2700, 2600, 2765)
Type <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)
Data <- cbind(Strength,Type)
Data <- data.frame(Data)
Data$Type <- as.factor(Data$Type)
aov.model<-aov(Strength~Type,data=Data)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Type 3 489740 163247 12.73 0.000489 ***
## Residuals 12 153908 12826
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)
(Diagnostic plots from plot(aov.model): the Normal Q-Q, Residuals vs Fitted, and Residuals vs Factor Levels panels discussed below.)
Answer Part D: The normal probability plot of the residuals is very close to a straight line, with almost all points falling on it, so the normality assumption appears to hold.
Answer Part E: In the Residuals vs Fitted plot, the spread of the residuals is roughly the same across the fitted values, so the assumption of constant variance appears to hold.
Answer Part F: The Residuals vs Factor Levels plot gives one view of the data by mixing technique; a plain scatter plot of strength against technique (see the sketch below) shows more directly that technique 2 yields the highest strengths and technique 4 the lowest.
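A minimal sketch of such a scatter plot, assuming the Data frame built above (with Strength and Type as a factor) is available:
# Sketch: scatter plot of tensile strength by mixing technique (Part F)
stripchart(Strength ~ Type, data = Data, vertical = TRUE, method = "jitter",
           pch = 16, xlab = "Mixing technique", ylab = "Tensile strength")
points(1:4, tapply(Data$Strength, Data$Type, mean), pch = 4, col = "red")   # mark the group means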
Question 3.10
A product developer is investigating the tensile strength of a new synthetic fiber that will be used to make cloth for men’s shirts. Strength is usually affected by the percentage of cotton used in the blend of materials for the fiber. The engineer conducts a completely randomized experiment with five levels of cotton content and replicates the experiment five times. The data are shown in the following table.
Part B. Use the Fisher LSD method to make comparisons between the pairs of means. What conclusions can you draw?
Reading Data
Pop1 <- c(7, 7, 15, 11, 9)
Pop2 <- c(12, 17, 12, 18, 18)
Pop3 <- c(14, 19, 19, 18, 18)
Pop4 <- c(19, 25, 22, 19, 23)
Pop5 <- c(7, 10, 11, 15, 11)
PopT <- rbind(Pop1, Pop2, Pop3, Pop4, Pop5)
GA2 <- c(mean(PopT))
e <- mean(Pop1)
f <- mean(Pop2)
g <- mean(Pop3)
h <- mean(Pop4)
k <- mean(Pop5)
SSEA <- (7-e)^2 + (7-e)^2 + (15-e)^2 + (11-e)^2 + (9-e)^2
SSEB <- (12-f)^2 + (17-f)^2 + (12-f)^2 + (18-f)^2 + (18-f)^2
SSEC <- (14-g)^2 + (19-g)^2 + (19-g)^2 + (18-g)^2 + (18-g)^2
SSED <- (19-h)^2 + (25-h)^2 + (22-h)^2 + (19-h)^2 + (23-h)^2
SSEE <- (7-k)^2 + (10-k)^2 + (11-k)^2 + (15-k)^2 + (11-k)^2
SSE <- SSEA + SSEB + SSEC + SSED + SSEE
MSE <- SSE / (20)
SSTr <- 5*((e - GA2)^2 + (f - GA2)^2 + (g - GA2)^2 + (h - GA2)^2 + (k - GA2)^2)
MSTr <- SSTr / (4)
SST <- SSE + SSTr
Statistic <- MSTr / MSE
Answer Part B: For the Fisher LSD method, we first compute the LSD value. The critical t value used in the LSD is determined by α/2 = 0.025 with 20 error degrees of freedom, giving t(0.025, 20) = 2.086.
T <- c(2.086)
Since the number of observations is the same in every population, n_i = n_j = n = 5, and LSD = t(0.025, 20) * sqrt(2*MSE/n).
LSD <- c(T*sqrt(2*MSE/5))
str(LSD)
## num 3.75
The variables e, f, g, h, k hold the sample means of populations 1, 2, 3, 4, 5, respectively. The pairwise comparisons considered here are: Population 2 vs 1, 3 vs 1, 4 vs 1, 4 vs 2, 3 vs 5, 4 vs 5, 4 vs 3, and 2 vs 5.
D21 <- f - e
str(D21)
## num 5.6
D31 <- g - e
str(D31)
## num 7.8
D41 <- h - e
str(D41)
## num 11.8
D42 <- h - f
str(D42)
## num 6.2
D35 <- g - k
str(D35)
## num 6.8
D45 <- h - k
str(D45)
## num 10.8
D43 <- h - g
str(D43)
## num 4
D25 <- f - k
str(D25)
## num 4.6
Since D21 is greater than the LSD value, populations 2 & 1 differ.
Since D31 is greater than the LSD value, populations 3 & 1 differ.
Since D41 is greater than the LSD value, populations 4 & 1 differ.
Since D42 is greater than the LSD value, populations 4 & 2 differ.
Since D35 is greater than the LSD value, populations 3 & 5 differ.
Since D45 is greater than the LSD value, populations 4 & 5 differ.
Since D43 is greater than the LSD value, populations 4 & 3 differ.
Since D25 is greater than the LSD value, populations 2 & 5 differ.
The only pairs not tested above are population 3 vs population 2 (g − f = 17.6 − 15.4 = 2.2) and population 5 vs population 1 (k − e = 10.8 − 9.8 = 1.0); both differences are below the LSD of about 3.75, so those pairs do not differ significantly. All other pairs of means differ.
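An equivalent way to run all ten Fisher LSD comparisons at once is to use unadjusted pairwise t tests with a pooled standard deviation (base R's pairwise.t.test); a small sketch reusing the Pop vectors read in above:
# Sketch: Fisher LSD as unadjusted pairwise t tests with a pooled SD
Strength <- c(Pop1, Pop2, Pop3, Pop4, Pop5)
Cotton <- factor(rep(1:5, each = 5))
pairwise.t.test(Strength, Cotton, p.adjust.method = "none", pool.sd = TRUE)
# Pairs with p-values below 0.05 are the ones declared different by the LSD method.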
Part C. Analyze the residuals from this experiment and comment on model adequacy.
Strength <- c(7, 7, 15, 11, 9, 12, 17, 12, 18, 18, 14, 19, 19, 18, 18, 19, 25, 22, 19, 23, 7, 10, 11, 15, 11)
Cotton <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5)
Data <- cbind(Strength,Cotton)
Data <- data.frame(Data)
Data$Cotton <- as.factor(Data$Cotton)
aov.model<-aov(Strength~Cotton,data=Data)
summary(aov.model)
## Df Sum Sq Mean Sq F value Pr(>F)
## Cotton 4 475.8 118.94 14.76 9.13e-06 ***
## Residuals 20 161.2 8.06
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)
(Diagnostic plots from plot(aov.model): the Normal Q-Q and Residuals vs Fitted panels discussed below.)
Answer Part C: The normal probability plot of the residuals is very close to a straight line, with almost all points falling on it, so the normality assumption appears to hold. Further, in the Residuals vs Fitted plot the spread of the residuals is roughly the same across the fitted values, so the assumption of constant variance also appears to hold; overall the model appears adequate.
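If only the two diagnostics discussed above are wanted, they can be drawn individually via the which argument of plot() on the fitted model (a small sketch):
# Sketch: draw only the diagnostics discussed above
plot(aov.model, which = 2)   # Normal Q-Q plot of the residuals
plot(aov.model, which = 1)   # Residuals vs Fitted values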
Question 3.45
Part A: How would your answer change if a reasonable estimate of the experimental error variance were Sigma^2 = 36?
Variance = 36
library(pwr)
pwr.anova.test(k=4,n=NULL,f=sqrt((5)^2/36),sig.level=0.05,power=0.90)   # effect size f = sqrt((sum((mu_i - mu_bar)^2)/k) / sigma^2)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 6.180857
## f = 0.8333333
## sig.level = 0.05
## power = 0.9
##
## NOTE: n is number in each group
Answer Part A: The number of observations to be taken from each population is 7 (n = 6.18, rounded up to the next whole observation).
Part B: How would your answer change if a reasonable estimate of the experimental error variance were Sigma^2 = 49?
Variance = 49
library(pwr)
pwr.anova.test(k=4,n=NULL,f=sqrt((5)^2/49),sig.level=0.05,power=0.90)
##
## Balanced one-way analysis of variance power calculation
##
## k = 4
## n = 7.998751
## f = 0.7142857
## sig.level = 0.05
## power = 0.9
##
## NOTE: n is number in each group
Answer Part B: The number of observations to be taken from each population is 8 (n = 8.0 after rounding up).
Part C: Can you draw any conclusions about the sensitivity of your answer in this particular situation about how your estimate of sigma affects the decision about sample size?
Answer Part C: As the estimate of the error variance increases, the number of observations to be collected from each population must also increase to achieve the same power at the same significance level.
Part D: Can you make any recommendations about how we should use this general approach to choosing n in practice?
Answer Part D: When an experiment has to be set up and samples collected, a better approach is to determine reasonable upper and lower limits for the error variance, compute the required sample size over that range, and then choose the number of samples per group accordingly (see the sketch below).
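A small sketch of that idea, assuming the same effect-size numerator as in Parts A and B (5^2 = 25) and a hypothetical range of variance estimates; only pwr.anova.test() from the pwr package is used.
# Sketch: required n per group over a plausible range of error-variance estimates
library(pwr)
sigma2 <- c(25, 36, 49, 64)   # hypothetical lower-to-upper range for sigma^2
n.req <- sapply(sigma2, function(s2)
  ceiling(pwr.anova.test(k = 4, f = sqrt(25 / s2),
                         sig.level = 0.05, power = 0.90)$n))
data.frame(sigma2, n.req)     # n rises as the assumed variance rises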