Question 3.7

The tensile strength of Portland cement is being studied. Four different mixing techniques can be used economically. A completely randomized experiment was conducted and the data were collected.

Part C. Use the Fisher LSD method with α = 0.05 to make comparisons between pairs of means.

Reading Data

Pop1 <- c(3129, 3000, 2865, 2890)
Pop2 <- c(3200, 3300, 2975, 3150)
Pop3 <- c(2800, 2900, 2985, 3050)
Pop4 <- c(2600, 2700, 2600, 2765)
Pop <- rbind(Pop1, Pop2, Pop3, Pop4)

GA <- c(mean(Pop))

a <- mean(Pop1)
b <- mean(Pop2)
c <- mean(Pop3)
d <- mean(Pop4)

# Within-technique sums of squared deviations from each technique mean
SSE1 <- sum((Pop1 - a)^2)
SSE2 <- sum((Pop2 - b)^2)
SSE3 <- sum((Pop3 - c)^2)
SSE4 <- sum((Pop4 - d)^2)

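SSE <- SSE1 + SSE2 + SSE3 + SSE4  # error (within-technique) sum of squares

MSE <- SSE / 12  # error df = 16 observations - 4 techniques

SSTr <- 4*((a - GA)^2 + (b - GA)^2 + (c - GA)^2 + (d - GA)^2)  # treatment sum of squares, n = 4 per technique

MSTr <- SSTr / 3  # treatment df = 4 techniques - 1

SST <- SSE + SSTr  # total sum of squares

Statistic <- MSTr / MSE  # F statistic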
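
The hand calculation above can be cross-checked in vectorized form, and the F statistic converted to a p-value with pf(). This is a minimal sketch, not part of the original solution; the names strength, technique, f_stat and p_value are illustrative assumptions.

```r
# Data from Question 3.7; 'strength' and 'technique' are illustrative names.
strength  <- c(3129, 3000, 2865, 2890,
               3200, 3300, 2975, 3150,
               2800, 2900, 2985, 3050,
               2600, 2700, 2600, 2765)
technique <- factor(rep(1:4, each = 4))

# Per-technique means as a plain named vector, and the grand mean
group_means <- sapply(split(strength, technique), mean)
grand_mean  <- mean(strength)

# Error (within) and treatment (between) sums of squares
sse  <- sum((strength - group_means[as.integer(technique)])^2)
sstr <- sum(4 * (group_means - grand_mean)^2)

df_tr  <- nlevels(technique) - 1                  # 3
df_err <- length(strength) - nlevels(technique)   # 12

f_stat  <- (sstr / df_tr) / (sse / df_err)
p_value <- pf(f_stat, df_tr, df_err, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```

The printed F statistic and p-value should agree with the aov() summary table shown later for this question.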

Answer Part C: For the Fisher LSD method, we first compute the LSD value. With α = 0.05, the critical t value is taken at α/2 = 0.025 with 16 - 4 = 12 error degrees of freedom.

t_crit <- qt(1 - 0.05/2, df = 12)  # t(0.025, 12), approximately 2.179; avoids masking T (TRUE)

Since every population has the same number of observations, ni = nj = n = 4.

LSD <- t_crit * sqrt(2 * MSE / 4)

str(LSD)
##  num 174

The variables a, b, c, d hold the sample means of populations 1, 2, 3, 4 respectively.

The following pairs are compared: Population 2 vs Population 1, Population 2 vs Population 3, Population 2 vs Population 4, Population 1 vs Population 4, Population 3 vs Population 4.

D21 <- b - a

str(D21)
##  num 185
D23 <- b - c

str(D23)
##  num 222
D24 <- b - d

str(D24)
##  num 490
D14 <- a - d

str(D14)
##  num 305
D34 <- c - d

str(D34)
##  num 268

Since D21 is greater than the LSD value, populations 2 & 1 differ.

Since D23 is greater than the LSD value, populations 2 & 3 differ.

Since D24 is greater than the LSD value, populations 2 & 4 differ.

Since D14 is greater than the LSD value, populations 1 & 4 differ.

Since D34 is greater than the LSD value, populations 3 & 4 differ.

These are all the pairs that differ. Only the pairs whose sample means are clearly far apart were tested above. The one remaining pair, populations 1 & 3, has a difference in means of a - c = 37.25, which is less than the LSD value, so populations 1 & 3 do not differ significantly.
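
Rather than choosing pairs by eye, all six pairs can be compared against the LSD value in one pass. The following is a sketch under the same assumptions as above (α = 0.05, n = 4 per technique); the names strength, technique and lsd_table are illustrative and do not appear in the original solution.

```r
# Data from Question 3.7; variable names are illustrative.
strength  <- c(3129, 3000, 2865, 2890,
               3200, 3300, 2975, 3150,
               2800, 2900, 2985, 3050,
               2600, 2700, 2600, 2765)
technique <- factor(rep(1:4, each = 4))

fit <- aov(strength ~ technique)
mse <- deviance(fit) / df.residual(fit)   # error mean square
n   <- 4                                  # replicates per technique

# Fisher LSD = t(alpha/2, error df) * sqrt(2 * MSE / n)
lsd <- qt(1 - 0.05 / 2, df.residual(fit)) * sqrt(2 * mse / n)

# Absolute difference in means for every pair of techniques
means <- sapply(split(strength, technique), mean)
pairs <- t(combn(names(means), 2))
diffs <- abs(means[pairs[, 1]] - means[pairs[, 2]])

lsd_table <- data.frame(pair     = paste(pairs[, 1], "vs", pairs[, 2]),
                        abs_diff = as.numeric(diffs),
                        differs  = as.numeric(diffs) > lsd)
print(lsd_table)
```

Using absolute differences means the order of subtraction within a pair does not matter, so no pair has to be pre-sorted by its mean.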

Part D: Construct a normal probability plot of the residuals. What conclusion would you draw about the validity of the normality assumption?

Part E: Plot the residuals versus the predicted tensile strength. Comment on the plot.

Part F: Prepare a scatter plot of the results to aid the interpretation of the results of this experiment.

Strength <- c(3129, 3000, 2865, 2890, 3200, 3300, 2975, 3150, 2800, 2900, 2985, 3050, 2600, 2700, 2600, 2765)
Type <- c(1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4)
Data <- data.frame(Strength, Type = as.factor(Type))  # Type must be a factor for one-way ANOVA
aov.model <- aov(Strength ~ Type, data = Data)
summary(aov.model)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Type         3 489740  163247   12.73 0.000489 ***
## Residuals   12 153908   12826                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)

Answer Part D: In the normal probability plot of the residuals, almost all points lie close to a straight line, so the normality assumption appears to hold.

Answer Part E: In the “Residuals vs Fitted” plot, the spread of the residuals is similar at each fitted value, so the constant variance assumption appears to hold.
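
The two diagnostic plots discussed in Parts D and E can also be drawn explicitly from the residuals and fitted values, instead of relying on the default panels of plot(aov.model). This is only a sketch; the names strength, technique, res and fitted_vals are illustrative.

```r
# Data from Question 3.7; variable names are illustrative.
strength  <- c(3129, 3000, 2865, 2890,
               3200, 3300, 2975, 3150,
               2800, 2900, 2985, 3050,
               2600, 2700, 2600, 2765)
technique <- factor(rep(1:4, each = 4))

fit         <- aov(strength ~ technique)
res         <- residuals(fit)   # observed minus technique mean
fitted_vals <- fitted(fit)      # technique means (predicted strength)

op <- par(mfrow = c(1, 2))      # two panels side by side

# Part D: normal probability plot of the residuals
qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)

# Part E: residuals versus predicted tensile strength
plot(fitted_vals, res,
     xlab = "Predicted tensile strength", ylab = "Residual",
     main = "Residuals vs fitted")
abline(h = 0, lty = 2)

par(op)                         # restore previous plot settings
```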

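Answer Part F: The “Residuals vs Factor Levels” plot shows the residuals grouped by mixing technique. Plotting the strengths themselves against technique shows the same picture as the LSD comparisons: technique 2 has the highest mean strength and technique 4 the lowest, while techniques 1 and 3 overlap.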
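
A scatter plot of the raw results by technique, as Part F asks for, can be sketched with stripchart() from base R. The variable names and the choice to overlay the technique means are assumptions made for illustration.

```r
# Data from Question 3.7; variable names are illustrative.
strength  <- c(3129, 3000, 2865, 2890,
               3200, 3300, 2975, 3150,
               2800, 2900, 2985, 3050,
               2600, 2700, 2600, 2765)
technique <- factor(rep(1:4, each = 4))

# One column of points per mixing technique
stripchart(strength ~ technique, vertical = TRUE, pch = 16,
           xlab = "Mixing technique", ylab = "Tensile strength",
           main = "Tensile strength by mixing technique")

# Overlay the technique means as crosses
means <- sapply(split(strength, technique), mean)
points(seq_along(means), means, pch = 4, cex = 1.8, lwd = 2)
```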

Question 3.10

A product developer is investigating the tensile strength of a new synthetic fiber that will be used to make cloth for men’s shirts. Strength is usually affected by the percentage of cotton used in the blend of materials for the fiber. The engineer conducts a completely randomized experiment with five levels of cotton content and replicates the experiment five times. The data are shown in the following table.

Part B. Use the Fisher LSD method to make comparisons between the pairs of means. What conclusions can you draw?

Reading Data

Pop1 <- c(7, 7, 15, 11, 9)
Pop2 <- c(12, 17, 12, 18, 18)
Pop3 <- c(14, 19, 19, 18, 18)
Pop4 <- c(19, 25, 22, 19, 23)
Pop5 <- c(7, 10, 11, 15, 11)

PopT <- rbind(Pop1, Pop2, Pop3, Pop4, Pop5)

GA2 <- c(mean(PopT))

e <- mean(Pop1)
f <- mean(Pop2)
g <- mean(Pop3)
h <- mean(Pop4)
k <- mean(Pop5)
# Within-level sums of squared deviations from each cotton-level mean
SSEA <- sum((Pop1 - e)^2)
SSEB <- sum((Pop2 - f)^2)
SSEC <- sum((Pop3 - g)^2)
SSED <- sum((Pop4 - h)^2)
SSEE <- sum((Pop5 - k)^2)

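SSE <- SSEA + SSEB + SSEC + SSED + SSEE  # error (within-level) sum of squares

MSE <- SSE / 20  # error df = 25 observations - 5 levels

SSTr <- 5*((e - GA2)^2 + (f - GA2)^2 + (g - GA2)^2 + (h - GA2)^2 + (k - GA2)^2)  # treatment sum of squares, n = 5 per level

MSTr <- SSTr / 4  # treatment df = 5 levels - 1

SST <- SSE + SSTr  # total sum of squares

Statistic <- MSTr / MSE  # F statistic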

Answer Part B: For the Fisher LSD method, we first compute the LSD value. With α = 0.05, the critical t value is taken at α/2 = 0.025 with 25 - 5 = 20 error degrees of freedom.

t_crit <- qt(1 - 0.05/2, df = 20)  # t(0.025, 20), approximately 2.086; avoids masking T (TRUE)

Since every population has the same number of observations, ni = nj = n = 5.

LSD <- t_crit * sqrt(2 * MSE / 5)

str(LSD)
##  num 3.75

The variables e, f, g, h, k hold the sample means of populations 1, 2, 3, 4, 5 respectively.

The following pairs are compared: Population 2 vs Population 1, Population 3 vs Population 1, Population 4 vs Population 1, Population 4 vs Population 2, Population 3 vs Population 5, Population 4 vs Population 5, Population 4 vs Population 3, Population 2 vs Population 5.

D21 <- f - e

str(D21)
##  num 5.6
D31 <- g - e

str(D31)
##  num 7.8
D41 <- h - e

str(D41)
##  num 11.8
D42 <- h - f

str(D42)
##  num 6.2
D35 <- g - k

str(D35)
##  num 6.8
D45 <- h - k

str(D45)
##  num 10.8
D43 <- h - g

str(D43)
##  num 4
D25 <- f - k

str(D25)
##  num 4.6

Since D21 is greater than the LSD value, populations 2 & 1 differ.

Since D31 is greater than the LSD value, populations 3 & 1 differ.

Since D41 is greater than the LSD value, populations 4 & 1 differ.

Since D42 is greater than the LSD value, populations 4 & 2 differ.

Since D35 is greater than the LSD value, populations 3 & 5 differ.

Since D45 is greater than the LSD value, populations 4 & 5 differ.

Since D43 is greater than the LSD value, populations 4 & 3 differ.

Since D25 is greater than the LSD value, populations 2 & 5 differ.

These are all the pairs that differ. Only the pairs whose sample means are clearly far apart were tested above. The two remaining pairs have differences below the LSD value: g - f = 2.2 for populations 3 & 2, and k - e = 1.0 for populations 5 & 1, so these pairs do not differ significantly.
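
The same conclusions can be reached without computing the LSD by hand: unadjusted pairwise t tests with a pooled standard deviation are equivalent to the Fisher LSD comparisons, and pairwise.t.test() from base R covers all ten pairs at once. This is a sketch, with strength, cotton and lsd_p as illustrative names.

```r
# Data from Question 3.10; levels 1..5 are the five cotton contents.
strength <- c( 7,  7, 15, 11,  9,
              12, 17, 12, 18, 18,
              14, 19, 19, 18, 18,
              19, 25, 22, 19, 23,
               7, 10, 11, 15, 11)
cotton <- factor(rep(1:5, each = 5))

# No p-value adjustment + pooled SD = Fisher LSD comparisons
lsd_p <- pairwise.t.test(strength, cotton,
                         p.adjust.method = "none", pool.sd = TRUE)
print(lsd_p)

# Pairs that do NOT differ at alpha = 0.05 (row level vs column level)
not_sig <- which(lsd_p$p.value >= 0.05, arr.ind = TRUE)
print(data.frame(level_a = rownames(lsd_p$p.value)[not_sig[, "row"]],
                 level_b = colnames(lsd_p$p.value)[not_sig[, "col"]]))
```

A pair has p < 0.05 here exactly when its absolute difference in means exceeds the LSD value, so the table of p-values and the hand comparisons should agree.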

Part C. Analyze the residuals from this experiment and comment on model adequacy.

Strength <- c(7, 7, 15, 11, 9, 12, 17, 12, 18, 18, 14, 19, 19, 18, 18, 19, 25, 22, 19, 23, 7, 10, 11, 15, 11)
Cotton <- c(1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4,5,5,5,5,5)
Data <- data.frame(Strength, Cotton = as.factor(Cotton))  # Cotton must be a factor for one-way ANOVA
aov.model <- aov(Strength ~ Cotton, data = Data)
summary(aov.model)
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## Cotton       4  475.8  118.94   14.76 9.13e-06 ***
## Residuals   20  161.2    8.06                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov.model)

Answer Part C: In the normal probability plot of the residuals, almost all points lie close to a straight line, so the normality assumption appears to hold.

Further, in the “Residuals vs Fitted” plot, the spread of the residuals is similar at each fitted value, so the constant variance assumption appears to hold.
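
The visual judgment about constant variance can be backed by a quick numeric summary of the residual spread within each cotton level. This is an informal check added for illustration, not a formal test and not part of the original solution; the names below are assumptions.

```r
# Data from Question 3.10; variable names are illustrative.
strength <- c( 7,  7, 15, 11,  9,
              12, 17, 12, 18, 18,
              14, 19, 19, 18, 18,
              19, 25, 22, 19, 23,
               7, 10, 11, 15, 11)
cotton <- factor(rep(1:5, each = 5))

fit <- aov(strength ~ cotton)
res <- residuals(fit)

# Sample variance and standard deviation of the residuals per cotton level
group_var <- sapply(split(res, cotton), var)
group_sd  <- sqrt(group_var)
print(round(group_sd, 2))

# Ratio of largest to smallest within-level variance (rough spread check)
var_ratio <- max(group_var) / min(group_var)
print(var_ratio)
```

With only five observations per level, the within-level variances are noisy, so this ratio is best read alongside the residual plot rather than on its own.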

Question 3.44

How many observations should be taken from each population so that the probability of rejecting the null hypothesis of equal population means is at least 0.90?

Answer: The effect size passed to pwr.anova.test is f = sqrt(5^2 / Variance), where 5 is taken as the typical deviation of a population mean from the grand mean.

Significance level α = 0.05

Variance = 25

Power = 90%

Number of populations = k = 4

library(pwr)
pwr.anova.test(k=4,n=NULL,f=sqrt((5)^2/25),sig.level=0.05,power=0.90)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 4.658119
##               f = 1
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

Answer: Rounding n = 4.658 up to the next whole number, 5 observations should be taken from each population.
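
The rounding step can be checked by computing the power directly at n = 4 and n = 5 with the noncentral F distribution in base R. The helper anova_power below is a hypothetical function written for this sketch; it assumes a balanced one-way design with noncentrality parameter k * n * f^2.

```r
# Power of the balanced one-way ANOVA F test (hypothetical helper).
# n: observations per group, k: number of groups, f: effect size
anova_power <- function(n, k, f, sig.level = 0.05) {
  df1    <- k - 1              # treatment degrees of freedom
  df2    <- k * (n - 1)        # error degrees of freedom
  ncp    <- k * n * f^2        # noncentrality parameter
  f_crit <- qf(1 - sig.level, df1, df2)
  pf(f_crit, df1, df2, ncp = ncp, lower.tail = FALSE)
}

f_effect <- sqrt(5^2 / 25)     # same effect size as in the answer above

power_n4 <- anova_power(4, k = 4, f = f_effect)
power_n5 <- anova_power(5, k = 4, f = f_effect)
print(c(n4 = power_n4, n5 = power_n5))
```

The power at n = 4 should fall short of 0.90 and the power at n = 5 should exceed it, which is why the fractional n is rounded up rather than to the nearest integer.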

Question 3.45

Part A: How would your answer change if a reasonable estimate of the experimental error variance were σ² = 36?

Variance = 36

library(pwr)
pwr.anova.test(k=4,n=NULL,f=sqrt((5)^2/36),sig.level=0.05,power=0.90)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 6.180857
##               f = 0.8333333
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

Answer Part A: Rounding n = 6.181 up, 7 observations should be taken from each population.

Part B: How would your answer change if a reasonable estimate of the experimental error variance were σ² = 49?

Variance = 49

library(pwr)
pwr.anova.test(k=4,n=NULL,f=sqrt((5)^2/49),sig.level=0.05,power=0.90)
## 
##      Balanced one-way analysis of variance power calculation 
## 
##               k = 4
##               n = 7.998751
##               f = 0.7142857
##       sig.level = 0.05
##           power = 0.9
## 
## NOTE: n is number in each group

Answer Part B: Rounding n = 7.999 up, 8 observations should be taken from each population.

Part C: Can you draw any conclusions about the sensitivity of your answer in this particular situation about how your estimate of sigma affects the decision about sample size?

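Answer Part C: As the variance increases, the number of observations needed from each population also increases to keep the same power. Here the required n rises from 5 to 7 to 8 as the variance estimate goes from 25 to 36 to 49, so the sample size is noticeably sensitive to the estimate of sigma.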

Part D: Can you make any recommendations about how we should use this general approach to choosing n in practice?

Answer Part D: When planning an experiment, it is better to establish plausible lower and upper limits for the variance than to rely on a single estimate. The sample size can then be computed at both limits, and choosing n from the upper limit protects the desired power if the variance turns out larger than expected.
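
The recommendation can be sketched as a small sensitivity table: compute the smallest whole-number n that reaches the target power for each candidate variance. The helpers anova_power and min_n are hypothetical, written in base R for this illustration, and assume the same effect-size definition used in the answers above (a deviation of 5 divided by sigma).

```r
# Power of the balanced one-way ANOVA F test (hypothetical helper).
anova_power <- function(n, k, f, sig.level = 0.05) {
  df1 <- k - 1
  df2 <- k * (n - 1)
  pf(qf(1 - sig.level, df1, df2), df1, df2,
     ncp = k * n * f^2, lower.tail = FALSE)
}

# Smallest integer n per group that reaches the target power
# (hypothetical helper; 'deviation' is the assumed spread of the means).
min_n <- function(sigma2, k = 4, deviation = 5, target = 0.90) {
  f <- deviation / sqrt(sigma2)
  n <- 2
  while (anova_power(n, k, f) < target) {
    n <- n + 1
  }
  n
}

variances  <- c(25, 36, 49)        # lower, middle and upper variance estimates
required_n <- sapply(variances, min_n)
print(data.frame(variance = variances, n_per_group = required_n))
```

This should reproduce the sample sizes 5, 7 and 8 reported in Questions 3.44 and 3.45. Planning with the value at the upper variance limit is the conservative choice.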