6.8

A bacteriologist is interested in the effects of two different culture media and two different times on the growth of a particular virus. He or she performs six replicates of a \(2^2\) design, making the runs in random order. Analyze the bacterial growth data that follow and draw appropriate conclusions. Analyze the residuals and comment on the model's adequacy.

time <- c(rep(12, 12), rep(18, 12))
medium <- rep(c(1, 1, 2, 2), 6)
results <- c(21, 22, 25, 26,
             23, 28, 24, 25,
             20, 26, 29, 27,
             37, 39, 31, 34,
             38, 38, 29, 33,
             35, 36, 30, 35)
summary(aov(results~as.factor(medium)*as.factor(time)))
##                                   Df Sum Sq Mean Sq F value   Pr(>F)    
## as.factor(medium)                  1    9.4     9.4   1.835 0.190617    
## as.factor(time)                    1  590.0   590.0 115.506 9.29e-10 ***
## as.factor(medium):as.factor(time)  1   92.0    92.0  18.018 0.000397 ***
## Residuals                         20  102.2     5.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Time is highly significant, and the medium × time interaction is significant as well. Although the main effect of medium is not significant on its own, the significant interaction means we cannot ignore the medium: its effect on growth depends on the incubation time.
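Since the interaction is significant, an interaction plot (a quick base-R sketch) helps interpret it: from the cell means, medium 2 gives slightly higher growth at time 12, while medium 1 is clearly better at time 18.

# Plot mean growth for each medium at each time to visualize the interaction
interaction.plot(as.factor(time), as.factor(medium), results,
                 xlab = "Time", ylab = "Mean growth", trace.label = "Medium")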

We then look at the residual plots to see how well this model fits the data.

par(mfrow = c(2, 2)) 
plot(aov(results~as.factor(medium)*as.factor(time)))

The residual spread appears roughly constant across groups, and the points on the normal probability plot fall close to a straight line (implying normality). Therefore, our model is adequate and our ANOVA results are valid.

6.12

An article in the AT&T Technical Journal (March/April 1986, Vol. 65, pp. 39–50) describes the application of two-level factorial designs to integrated circuit manufacturing. A basic processing step is to grow an epitaxial layer on polished silicon wafers. The wafers mounted on a susceptor are positioned inside a bell jar, and chemical vapors are introduced. The susceptor is rotated, and heat is applied until the epitaxial layer is thick enough. An experiment was run using two factors: arsenic flow rate (A) and deposition time (B). Four replicates were run, and the epitaxial layer thickness was measured (μm). The data are shown:

results <- c(14.037, 16.165, 13.972, 13.907,
            13.880, 13.860, 14.032, 13.914,
            14.821, 14.757, 14.843, 14.878,
            14.888, 14.921, 14.415, 14.932)
replication <- c(rep(seq(1,4),4))
A <- c(rep(-1, 4), rep(1, 4), rep(-1, 4), rep(1, 4))
B <- c(rep(-1, 8), rep(1, 8))
dafr <- data.frame(A,B,results)

(a)

Estimate the factor effects.

model <- aov(results ~ A * B)
coef(model)
## (Intercept)           A           B         A:B 
##   14.513875   -0.158625    0.293000    0.140750
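Note that with the ±1 coding these are regression coefficients, which are half the usual factor effect estimates; doubling them (simple arithmetic) gives the effects themselves:

# Factor effects are twice the regression coefficients under +/-1 coding
2 * coef(model)[-1]
##        A        B      A:B 
## -0.31725  0.58600  0.28150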

(b)

Conduct an analysis of variance. Which factors are important?

A <- as.factor(A)
B <- as.factor(B)
summary(aov(results~A*B))
##             Df Sum Sq Mean Sq F value Pr(>F)  
## A            1  0.403  0.4026   1.262 0.2833  
## B            1  1.374  1.3736   4.305 0.0602 .
## A:B          1  0.317  0.3170   0.994 0.3386  
## Residuals   12  3.828  0.3190                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

At the 0.05 level, none of the factors are significant; deposition time (B) comes closest (p ≈ 0.06) but still falls short.

(c)

Write down a regression equation that could be used to predict epitaxial layer thickness over the region of arsenic flow rate and deposition time used in this experiment.

Using the coefficients from part (a): \(Thickness=14.513875-0.158625A+0.293000B+0.140750AB\)

(d)

Analyze the residuals. Are there any residuals that should cause concern?

par(mfrow = c(2, 2))
plot(model)
## hat values (leverages) are all = 0.25
##  and there are no factor predictors; no plot no. 5

The residual plots show one clear outlier; it corresponds to the unusually large observation of 16.165.

(e)

Discuss how you might deal with the potential outlier found in part (d).

We could replace the outlier with an estimate of an appropriate value (for example, the mean of the other observations at the same factor combination) or remove the value from the analysis entirely; in either case, the analysis should be repeated to see whether the conclusions change.
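As a sensitivity check (a sketch, assuming the 16.165 observation is the outlier flagged above), we could refit without that run and see whether the conclusions change:

# Hypothetical: drop the suspect run (16.165) and refit the two-factor model
summary(aov(results ~ A * B, data = dafr[dafr$results != 16.165, ]))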

6.21

I am always interested in improving my golf scores. Since a typical golfer uses the putter for about 35–45 percent of his or her strokes, it seems reasonable that improving one's putting is a logical and perhaps simple way to improve a golf score ("The man who can putt is a match for any man." — Willie Parks, 1864–1925, two-time winner of the British Open). An experiment was conducted to study the effects of four factors on putting accuracy. The design factors are length of putt, type of putter, breaking putt versus straight putt, and level versus downhill putt. The response variable is distance from the ball to the center of the cup after the ball comes to rest. One golfer performed the experiment; a \(2^4\) factorial design with seven replicates was used, and all putts were made in random order. The results are shown:

dafr <- read.csv('https://raw.githubusercontent.com/vernonkat/Coursework/main/New%20Microsoft%20Excel%20Worksheet.csv',stringsAsFactors = FALSE)
head(dafr)
##   Length   Type    Break Slope  name value
## 1     10 Mallet Straight Level   One  10.0
## 2     10 Mallet Straight Level   Two  18.0
## 3     10 Mallet Straight Level Three  14.0
## 4     10 Mallet Straight Level  Four  12.5
## 5     10 Mallet Straight Level  Five  19.0
## 6     10 Mallet Straight Level   Six  16.0

(a)

Analyze the data from this experiment. Which factors significantly affect putting performance?

dafr["Length"][dafr["Length"] == 10] <- -1
dafr["Length"][dafr["Length"] == 30] <- 1

dafr["Type"][dafr["Type"] == "Cavity"] <- -1
dafr["Type"][dafr["Type"] == "Mallet"] <- 1

dafr["Break"][dafr["Break"] == "Breaking"] <- -1
dafr["Break"][dafr["Break"] == "Straight"] <- 1

dafr["Slope"][dafr["Slope"] == "Downhill"] <- -1
dafr["Slope"][dafr["Slope"] == "Level"] <- 1

dafr$Type <- as.numeric(dafr$Type)
dafr$Break <- as.numeric(dafr$Break)
dafr$Slope <- as.numeric(dafr$Slope)
dafr$Length <- as.numeric(dafr$Length)
summary(aov(value~Length*Type*Break*Slope,dafr))
##                         Df Sum Sq Mean Sq F value  Pr(>F)   
## Length                   1    917   917.1  10.588 0.00157 **
## Type                     1    388   388.1   4.481 0.03686 * 
## Break                    1    145   145.1   1.676 0.19862   
## Slope                    1      1     1.4   0.016 0.89928   
## Length:Type              1    219   218.7   2.525 0.11538   
## Length:Break             1     12    11.9   0.137 0.71178   
## Type:Break               1    115   115.0   1.328 0.25205   
## Length:Slope             1     94    93.8   1.083 0.30066   
## Type:Slope               1     56    56.4   0.651 0.42159   
## Break:Slope              1      2     1.6   0.019 0.89127   
## Length:Type:Break        1      7     7.3   0.084 0.77294   
## Length:Type:Slope        1    113   113.0   1.305 0.25623   
## Length:Break:Slope       1     39    39.5   0.456 0.50121   
## Type:Break:Slope         1     34    33.8   0.390 0.53386   
## Length:Type:Break:Slope  1     96    95.6   1.104 0.29599   
## Residuals               96   8316    86.6                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It appears that only Length and Type are significant (at the 0.05 level).

library(DoE.base)
halfnormal(aov(value~Length*Type*Break*Slope,dafr))

The half-normal plot broadly supports this, with only Length falling clearly away from the line of points.
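As a follow-up, a reduced model containing only the apparently active factors could be fit, pooling the remaining terms into error (a sketch; output not shown):

# Refit with only Length and Type; all other effects are pooled into the error term
summary(aov(value ~ Length * Type, dafr))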

(b)

Analyze the residuals from this experiment. Are there any indications of model inadequacy?

dafr$Type <- as.factor(dafr$Type)
dafr$Break <- as.factor(dafr$Break)
dafr$Slope <- as.factor(dafr$Slope)
dafr$Length <- as.factor(dafr$Length)
par(mfrow = c(2, 2))
plot(aov(value~Length*Type*Break*Slope,dafr))

The normal Q-Q plot suggests a possible departure from normality, and the spread of the residuals varies noticeably across the fitted values. This model may be inadequate.
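One possible remedy (a sketch, not pursued further here) is a variance-stabilizing transformation; since the response is a distance, a square-root transformation is a natural candidate:

# Check whether a square-root transformation improves the residual behavior
par(mfrow = c(2, 2))
plot(aov(sqrt(value) ~ Length * Type * Break * Slope, dafr))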

6.36

Resistivity on a silicon wafer is influenced by several factors. The results of a \(2^4\) factorial experiment performed during a critical processing step are shown:

A <- rep(c(-1,1),8)
B <- rep(c(-1,-1,1,1),4)
C <- c(rep(-1,4),rep(1,4),rep(-1,4),rep(1,4))
D <- c(rep(-1,8),rep(1,8))
results <- c(1.92, 11.28, 1.09, 5.75, 2.13, 9.53, 1.03, 5.35,
             1.60, 11.73, 1.16, 4.68, 2.16, 9.11, 1.07, 5.30)

(a)

Estimate the factor effects. Plot the effect estimates on a normal probability plot and select a tentative model.

halfnormal(aov(results~A*B*C*D))
## 
## Significant effects (alpha=0.05, Lenth method):
## [1] A     B     A:B   A:B:C

The Lenth method flags A, B, A:B, and the three-factor interaction A:B:C. Since neither C itself nor any two-factor interaction involving C is active, the A:B:C effect is most likely spurious, so our tentative model contains A, B, and A:B.

coef(aov(results~A*B))
## (Intercept)           A           B         A:B 
##    4.680625    3.160625   -1.501875   -1.069375

Estimated model:

\(Result=4.680625+3.160625A-1.501875B-1.069375AB\)

(b)

Fit the model identified in part (a) and analyze the residuals. Is there any indication of model inadequacy?

Afac <- as.factor(A)
Bfac <- as.factor(B)

par(mfrow = c(2, 2))
plot(aov(results~Afac*Bfac))

The residual plots show strongly unequal variance across the fitted values and a clear departure from normality, suggesting that a transformation of the response is needed.
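A Box–Cox analysis is one way to choose a transformation (a sketch, assuming the MASS package is installed); a log-likelihood peak near \(\lambda = 0\) would support the ln transformation used in part (c) below:

library(MASS)                # provides boxcox()
boxcox(lm(results ~ A * B))  # plot the profile log-likelihood over lambda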

(c)

Repeat the analysis from parts (a) and (b) using ln (y) as the response variable. Is there an indication that the transformation has been useful?

results2 <- log(results)

halfnormal(aov(results2~A*B*C*D))
## 
## Significant effects (alpha=0.05, Lenth method):
## [1] A     B     A:B:C

We still see factors A and B as significant, though the interaction between them is no longer flagged. (A:B:C appears again but, as before, is most likely spurious.)

par(mfrow = c(2, 2))
plot(aov(results2~Afac+Bfac))

The ln transformation does appear to have been useful: the residual spread is far more uniform and the normal plot has improved, so the model now seems adequate.

(d)

Fit a model in terms of the coded variables that can be used to predict the resistivity.

From part (c), only the main effects A and B are significant, so the model should contain just those terms.

coef(aov(results2~A*B))
## (Intercept)           A           B         A:B 
##  1.18541712  0.81287034 -0.31427755 -0.02468457

Our new model, dropping the negligible A:B term, is:

\(\ln(Resistivity)=1.18541712+0.81287034A-0.31427755B\)
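To predict resistivity on the original scale, we exponentiate the fitted log model (a sketch; A and B are the numeric ±1 coded variables):

# Fit the reduced model on the log scale and back-transform a prediction,
# e.g. at the high level of A and the low level of B
log_model <- lm(results2 ~ A + B)
exp(predict(log_model, newdata = data.frame(A = 1, B = -1)))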

6.39

results <- c(8.11, 5.56, 5.77, 5.82, 9.17, 7.80, 3.23, 5.69,
             8.82, 14.23, 9.20, 8.94, 8.68, 11.49, 6.25, 9.12,
             7.93, 5.00, 7.47, 12.00, 9.86, 3.65, 6.40, 11.61,
             12.43, 17.55, 8.87, 25.38, 13.06, 18.85, 11.78, 26.05)
A <- rep(c(-1,1),16)
B <- rep(c(-1,-1,1,1),8)
C <- rep(c(rep(-1,4),rep(1,4)),4)
D <- rep(c(rep(-1,8),rep(1,8)),2)
E <- c(rep(-1,16),rep(1,16))

(a)

Analyze the data from this experiment. Identify the significant factors and interactions.

halfnormal(aov(results~A*B*C*D*E))
## 
## Significant effects (alpha=0.05, Lenth method):
##  [1] D     E     A:D   A     D:E   B:E   A:B   A:B:E A:E   A:D:E

Every effect flagged as significant involves only factors A, B, D, and E; nothing involving C appears, so factor C seems to be inactive.

(b)

Analyze the residuals from this experiment. Are there any indications of model inadequacy or violations of the assumptions?

Afac <- as.factor(A)
Bfac <- as.factor(B)
Efac <- as.factor(E)
Dfac <- as.factor(D)
par(mfrow = c(2, 2))
plot(aov(results~Afac*Bfac*Efac*Dfac))

The residual plots look reasonable, with no apparent violations of normality or constant variance.

(c)

One of the factors from this experiment does not seem to be important. If you drop this factor, what type of design remains? Analyze the data using the full factorial model for only the four active factors. Compare your results with those obtained in part (a).

Dropping the inactive factor C leaves two full replicates of a \(2^4\) factorial design in A, B, D, and E. We now run the ANOVA using the full factorial model in those four factors to see which effects are significant.

summary(aov(results~A*B*E*D))
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## A            1  83.56   83.56  57.233 1.14e-06 ***
## B            1   0.06    0.06   0.041 0.841418    
## E            1 153.17  153.17 104.910 1.97e-08 ***
## D            1 285.78  285.78 195.742 2.16e-10 ***
## A:B          1  48.93   48.93  33.514 2.77e-05 ***
## A:E          1  33.76   33.76  23.126 0.000193 ***
## B:E          1  52.71   52.71  36.103 1.82e-05 ***
## A:D          1  88.88   88.88  60.875 7.66e-07 ***
## B:D          1   0.01    0.01   0.004 0.950618    
## E:D          1  61.80   61.80  42.328 7.24e-06 ***
## A:B:E        1  44.96   44.96  30.794 4.40e-05 ***
## A:B:D        1   3.82    3.82   2.613 0.125501    
## A:E:D        1  26.01   26.01  17.815 0.000650 ***
## B:E:D        1   0.05    0.05   0.035 0.854935    
## A:B:E:D      1   5.31    5.31   3.634 0.074735 .  
## Residuals   16  23.36    1.46                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The significant factors and interactions (at the 0.05 level) are A, E, D, A:B, A:E, B:E, A:D, E:D, A:B:E, and A:E:D. These are exactly the effects flagged by the Lenth method in part (a).

(d)

Find settings of the active factors that maximize the predicted response.

coef(aov(results~A*B*E*D))
## (Intercept)           A           B           E           D         A:B 
##  10.1803125   1.6159375   0.0434375   2.1878125   2.9884375   1.2365625 
##         A:E         B:E         A:D         B:D         E:D       A:B:E 
##   1.0271875   1.2834375   1.6665625  -0.0134375   1.3896875   1.1853125 
##       A:B:D       A:E:D       B:E:D     A:B:E:D 
##  -0.3453125   0.9015625  -0.0396875   0.4071875

Keeping the significant terms (plus the B main effect, to preserve hierarchy since several interactions involve B), the prediction equation is:

\(results=10.180+1.616A+0.043B+2.188E+2.988D+1.237AB+1.027AE+1.283BE+1.667AD+1.390ED+1.185ABE+0.902AED\)

Since every coefficient in this equation is positive, the predicted response is maximized by setting all four active factors (A, B, D, and E) to their high (+1) levels.
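As a quick check (a sketch using the full fitted model), the prediction at the all-high setting recovers the mean of the two runs made there, about 25.7:

# Predicted response at A = B = E = D = +1
fit <- aov(results ~ A * B * E * D)
predict(fit, newdata = data.frame(A = 1, B = 1, E = 1, D = 1))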