Question 6.8

A bacteriologist is interested in the effects of two different culture media and two different times on the growth of a particular virus. He or she performs six replicates of a 2^2 design, making the runs in random order. Analyze the bacterial growth data that follow and draw appropriate conclusions. Analyze the residuals and comment on the model’s adequacy.

This is a 2^2 factorial design. Factor A (time) has low level a0 = 12 and high level a1 = 18; factor B (culture medium) has low level b0 = 1 and high level b1 = 2.

Model Equation:

\(Y_{i,j,k}\) = \(\mu\) + \(\alpha_i\) + \(\beta_j\) + \(\alpha\beta_{i,j}\) + \(\epsilon_{i,j,k}\)

Loading Data

library(DoE.base)
A <- c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1)
B <- c(-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1)
Obs <- c(21,22,23,28,20,26,25,26,24,25,29,27,37,39,38,38,35,36,31,34,29,33,30,35)
A <- as.factor(A)
B <- as.factor(B)
Data <- data.frame(A,B,Obs)

Model <- aov(Obs~A*B,data = Data)
summary(Model)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## A            1  590.0   590.0 115.506 9.29e-10 ***
## B            1    9.4     9.4   1.835 0.190617    
## A:B          1   92.0    92.0  18.018 0.000397 ***
## Residuals   20  102.2     5.1                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Answer: The ANOVA shows that the A:B interaction is significant (F = 18.018, p = 0.000397). Because the effect of one factor depends on the level of the other, we do not interpret the main effects on their own and instead examine the interaction plot.

Interaction Plot:

interaction.plot(A,B,Obs)
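
The interaction plot is drawn from the four cell means, so printing them gives a numeric check on what the plot shows. A minimal sketch, assuming the same data as above (the names `time`, `medium`, `growth`, `cell_means`, and `b_effect` are illustrative and not part of the original analysis):

```r
# Data from Question 6.8, re-entered so the block runs on its own
time   <- factor(rep(c(-1, 1), each = 12))             # factor A
medium <- factor(rep(rep(c(-1, 1), each = 6), 2))      # factor B
growth <- c(21,22,23,28,20,26, 25,26,24,25,29,27,
            37,39,38,38,35,36, 31,34,29,33,30,35)

# Mean response in each of the four treatment combinations
cell_means <- tapply(growth, list(A = time, B = medium), mean)
print(round(cell_means, 2))

# Simple effect of B (high minus low) at each level of A
b_effect <- cell_means[, "1"] - cell_means[, "-1"]
print(round(b_effect, 2))
```

If the simple effect of B has a different sign or size at the two levels of A, the lines in the interaction plot are not parallel, which is the pattern a significant A:B term implies.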

Residual Analysis:

plot(Model)

Looking at the “Normal Q-Q” and “Residuals vs Fitted” plots, the residuals appear approximately normally distributed. The constant-variance assumption is less well supported: the spread of the residuals is slightly uneven across the fitted values, so we consider the model inadequate.
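
The visual judgment about constant variance can be supplemented with the within-cell variances and a formal test. The sketch below is one possible check, not part of the original analysis; it assumes Bartlett's test is acceptable here (it is sensitive to non-normality), and the names `cell`, `cell_var`, `pooled_ss`, and `bt` are illustrative:

```r
# Data from Question 6.8, re-entered so the block runs on its own
A   <- factor(rep(c(-1, 1), each = 12))
B   <- factor(rep(rep(c(-1, 1), each = 6), 2))
Obs <- c(21,22,23,28,20,26, 25,26,24,25,29,27,
         37,39,38,38,35,36, 31,34,29,33,30,35)

# One label per treatment combination
cell <- interaction(A, B)

# Sample variance of the six replicates in each cell
cell_var <- tapply(Obs, cell, var)
print(round(cell_var, 2))

# Pooled within-cell sum of squares (6 replicates, so 5 df per cell);
# this is the residual sum of squares of the full factorial model
pooled_ss <- sum(cell_var * (6 - 1))
print(pooled_ss)

# Bartlett's test of equal variances across the four cells
bt <- bartlett.test(Obs ~ cell)
print(bt$p.value)
```

A small p-value would support the concern about unequal spread; a large one would suggest the differences between the cell variances are within what six replicates per cell can produce by chance.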

Question 6.12

An article in the AT&T Technical Journal (March/April 1986, Vol. 65, pp. 39–50) describes the application of two-level factorial designs to integrated circuit manufacturing. A basic processing step is to grow an epitaxial layer on polished silicon wafers. The wafers mounted on a susceptor are positioned inside a bell jar, and chemical vapors are introduced. The susceptor is rotated, and heat is applied until the epitaxial layer is thick enough. An experiment was run using two factors: arsenic flow rate (A) and deposition time (B). Four replicates were run, and the epitaxial layer thickness was measured (μm). The data are shown in Table P6.1.

Model Equation:

\(Y_{i,j,k}\) = \(\mu\) + \(\alpha_i\) + \(\beta_j\) + \(\alpha\beta_{i,j}\) + \(\epsilon_{i,j,k}\)

Loading Data

library(DoE.base)
A <- c(-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1)
B <- c(-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1)
Obs <- c(14.037,13.880,14.821,14.888,16.165,13.860,14.757,14.921,13.972,14.032,14.843,14.415,13.907,13.914,14.878,14.932)
A <- as.factor(A)
B <- as.factor(B)
Data <- data.frame(A,B,Obs)

Part A

Estimate the factor effects.

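# Responses at each treatment combination (four replicates each).
# Separate names are used so the factor vectors A and B are not overwritten.
y1  <- c(14.037,16.165,13.972,13.907)   # (1): A low,  B low
ya  <- c(13.88,13.86,14.032,13.914)     # a:   A high, B low
yb  <- c(14.821,14.757,14.843,14.878)   # b:   A low,  B high
yab <- c(14.888,14.921,14.415,14.932)   # ab:  A high, B high

# Treatment totals
S1 <- sum(y1)
SA <- sum(ya)
SB <- sum(yb)
SAB <- sum(yab)

# Effect = contrast / (2n), with n = 4 replicates
n <- 4
EffectA <- (SA+SAB-S1-SB)/(2*n)
EffectB <- (SB+SAB-S1-SA)/(2*n)
EffectAB <- (S1+SAB-SA-SB)/(2*n)

Factor Effects:

EffectA

## [1] -0.31725

EffectB

## [1] 0.586

EffectAB

## [1] 0.2815

Effect of A: -0.31725

Effect of B: 0.586

Effect of AB: 0.2815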

Part B

Conduct an analysis of variance. Which factors are important?

Model <- aov(Obs~A*B,data = Data)
summary(Model)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## A            1  0.403  0.4026   1.262 0.2833  
## B            1  1.374  1.3736   4.305 0.0602 .
## A:B          1  0.317  0.3170   0.994 0.3386  
## Residuals   12  3.828  0.3190                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the ANOVA, the interaction between factors A and B is not significant (p = 0.3386). We therefore remove the interaction term and test the main effects.

Model <- aov(Obs~A+B,data = Data)
summary(Model)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## A            1  0.403  0.4026   1.263 0.2815  
## B            1  1.374  1.3736   4.308 0.0584 .
## Residuals   13  4.145  0.3189                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Neither main effect is significant at the 0.05 level, although B (deposition time) is marginal (p = 0.0584).

Part C

Write down a regression equation that could be used to predict epitaxial layer thickness over the region of arsenic flow rate and deposition time used in this experiment.

Model <- lm(Obs~A*B,data = Data)
coef(Model)

## (Intercept)          A1          B1       A1:B1 
##    14.52025    -0.59875     0.30450     0.56300

summary(Model)

## 
## Call:
## lm.default(formula = Obs ~ A * B, data = Data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.61325 -0.14431 -0.00563  0.10188  1.64475 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  14.5202     0.2824  51.414 1.93e-15 ***
## A1           -0.5987     0.3994  -1.499    0.160    
## B1            0.3045     0.3994   0.762    0.461    
## A1:B1         0.5630     0.5648   0.997    0.339    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5648 on 12 degrees of freedom
## Multiple R-squared:  0.3535, Adjusted R-squared:  0.1918 
## F-statistic: 2.187 on 3 and 12 DF,  p-value: 0.1425

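\(\hat{Y}\) = 14.52025 - 0.59875\(x_A\) + 0.30450\(x_B\) + 0.56300\(x_A x_B\)

Because A and B were converted to factors, R fits this model with treatment coding: \(x_A\) and \(x_B\) equal 1 at the high level of the corresponding factor and 0 at the low level, so the intercept is the mean response with both factors at their low levels.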
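
If the prediction equation is wanted in the coded -1/+1 units instead, the factors can be left numeric when the model is fitted. This is a minimal sketch under that assumption; `xA`, `xB`, `coded_fit`, `b`, and `effects` are illustrative names, not objects from the original analysis:

```r
# Coded design columns for Question 6.12, kept numeric (not factors)
xA  <- rep(c(-1, 1), times = 8)
xB  <- rep(rep(c(-1, 1), each = 2), times = 4)
Obs <- c(14.037,13.880,14.821,14.888,16.165,13.860,14.757,14.921,
         13.972,14.032,14.843,14.415,13.907,13.914,14.878,14.932)

# Regression on the coded variables and their product
coded_fit <- lm(Obs ~ xA * xB)
b <- coef(coded_fit)
print(round(b, 6))

# With -1/+1 coding, each factor effect is twice its coefficient
effects <- 2 * b[-1]
print(round(effects, 5))
```

In this coding the intercept is the grand mean of the 16 observations, and doubling the coefficients recovers the factor effects estimated in Part A.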

Part D

Analyze the residuals. Are there any residuals that should cause concern?

plot(Model)

The residual plots indicate that the model is inadequate. Observation 5 (16.165, with a residual of 1.645) stands out as an outlier and is a point of concern for the analysis.

Part E

Discuss how you might deal with the potential outlier found in part (d).

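We can perform a Box-Cox transformation on the data to find an appropriate value of lambda, and then repeat the ANOVA on the transformed response. It is also worth checking whether observation 5 was recorded incorrectly, and repeating the analysis without it to see how much the conclusions depend on that single value.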
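
A Box-Cox search of this kind could be sketched as follows. It assumes the MASS package is installed and uses its `boxcox()` function on the full factorial model; the grid of lambda values and the names `d`, `fit`, `bc`, and `lambda_hat` are illustrative choices:

```r
library(MASS)  # provides boxcox()

# Data from Question 6.12, re-entered so the block runs on its own
d <- data.frame(
  A   = factor(rep(c(-1, 1), times = 8)),
  B   = factor(rep(rep(c(-1, 1), each = 2), times = 4)),
  Obs = c(14.037,13.880,14.821,14.888,16.165,13.860,14.757,14.921,
          13.972,14.032,14.843,14.415,13.907,13.914,14.878,14.932)
)

# y = TRUE keeps the response in the fit so boxcox() does not need to refit
fit <- lm(Obs ~ A * B, data = d, y = TRUE)

# Profile log-likelihood over a grid of lambda values, without plotting
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)

# Lambda with the highest log-likelihood on the grid
lambda_hat <- bc$x[which.max(bc$y)]
print(lambda_hat)
```

Calling `boxcox(fit)` with the default `plotit = TRUE` also draws the likelihood curve with an approximate 95% interval for lambda. If that interval is wide or contains 1, the transformation is unlikely to resolve a single outlying observation.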

Question 6.21

Loading Data

library(DoE.base)
A <- c(-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1)
B <- c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
C <- c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
D <- c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
Obs <- c(10,18,14,12.5,19,16,18.5,0,16.5,4.5,17.5,20.5,17.5,33,4,6,1,14.5,12,14,5,0,10,34,11,25.5,21.5,0,0,0,18.5,19.5,16,15,11,5,20.5,18,20,29.5,19,10,6.5,18.5,7.5,6,0,10,0,16.5,4.5,0,23.5,8,8,8,4.5,18,14.5,10,0,17.5,6,19.5,18,16,5.5,10,7,36,15,16,8.5,0,0.5,9,3,41.5,39,6.5,3.5,7,8.5,36,8,4.5,6.5,10,13,41,14,21.5,10.5,6.5,0,15.5,24,16,0,0,0,4.5,1,4,6.5,18,5,7,10,32.5,18.5,8)
A <- as.factor(A)
B <- as.factor(B)
C <- as.factor(C)
D <- as.factor(D)
Data <- data.frame(A,B,C,D,Obs)

Part A

Analyze the data from this experiment. Which factors significantly affect putting performance?

Model <- aov(Obs~A*B*C*D,data = Data)
summary(Model)

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## A            1    917   917.1  10.588 0.00157 **
## B            1    388   388.1   4.481 0.03686 * 
## C            1    145   145.1   1.676 0.19862   
## D            1      1     1.4   0.016 0.89928   
## A:B          1    219   218.7   2.525 0.11538   
## A:C          1     12    11.9   0.137 0.71178   
## B:C          1    115   115.0   1.328 0.25205   
## A:D          1     94    93.8   1.083 0.30066   
## B:D          1     56    56.4   0.651 0.42159   
## C:D          1      2     1.6   0.019 0.89127   
## A:B:C        1      7     7.3   0.084 0.77294   
## A:B:D        1    113   113.0   1.305 0.25623   
## A:C:D        1     39    39.5   0.456 0.50121   
## B:C:D        1     34    33.8   0.390 0.53386   
## A:B:C:D      1     96    95.6   1.104 0.29599   
## Residuals   96   8316    86.6                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

At a significance level of 0.05, only the main effects of A (length of putt) and B (type of putter) significantly affect putting performance.

Part B

Analyze the residuals from this experiment. Are there any indications of model inadequacy?

plot(Model)

Looking at the “Normal Q-Q” and “Residuals vs Fitted” plots, we can see that the model is inadequate. Although the residuals are consistent with the normality assumption, their spread varies widely, so the constant-variance assumption does not hold.

Question 6.36

Resistivity on a silicon wafer is influenced by several factors. The results of a 2^4 factorial experiment performed during a critical processing step are shown in Table P6.10.

Loading Data

library(DoE.base)
A <- c(-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1)
B <- c(-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1)
C <- c(-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,1,1,1,1)
D <- c(-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1)
Obs <- c(1.92,11.28,1.09,5.75,2.13,9.53,1.03,5.35,1.6,11.73,1.16,4.68,2.16,9.11,1.07,5.3)
Data <- data.frame(A,B,C,D,Obs)

Part A

Estimate the factor effects. Plot the effect estimates on a normal probability plot and select a tentative model.

Model <- lm(Obs~A*B*C*D,data = Data)
coef(Model)

## (Intercept)           A           B           C           D         A:B 
##    4.680625    3.160625   -1.501875   -0.220625   -0.079375   -1.069375 
##         A:C         B:C         A:D         B:D         C:D       A:B:C 
##   -0.298125    0.229375   -0.056875   -0.046875    0.029375    0.344375 
##       A:B:D       A:C:D       B:C:D     A:B:C:D 
##   -0.096875   -0.010625    0.094375    0.141875

halfnormal(Model)

## 
## Significant effects (alpha=0.05, Lenth method):

## [1] A     B     A:B   A:B:C

summary(Model)

## 
## Call:
## lm.default(formula = Obs ~ A * B * C * D, data = Data)
## 
## Residuals:
## ALL 16 residuals are 0: no residual degrees of freedom!
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  4.68062        NaN     NaN      NaN
## A            3.16062        NaN     NaN      NaN
## B           -1.50187        NaN     NaN      NaN
## C           -0.22062        NaN     NaN      NaN
## D           -0.07937        NaN     NaN      NaN
## A:B         -1.06938        NaN     NaN      NaN
## A:C         -0.29812        NaN     NaN      NaN
## B:C          0.22937        NaN     NaN      NaN
## A:D         -0.05687        NaN     NaN      NaN
## B:D         -0.04688        NaN     NaN      NaN
## C:D          0.02937        NaN     NaN      NaN
## A:B:C        0.34437        NaN     NaN      NaN
## A:B:D       -0.09688        NaN     NaN      NaN
## A:C:D       -0.01063        NaN     NaN      NaN
## B:C:D        0.09438        NaN     NaN      NaN
## A:B:C:D      0.14188        NaN     NaN      NaN
## 
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:    NaN 
## F-statistic:   NaN on 15 and 0 DF,  p-value: NA

The coefficient estimates are listed in the model summary; because the factors are coded as -1/+1, each factor effect is twice the corresponding coefficient. From the half-normal plot, A, B, and AB are significant. Lenth's method also flags ABC, but we ignore it because it does not depart much from the line formed by the remaining effects. We therefore select a tentative model containing A, B, and AB, and drop the other terms.
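
The effect estimates themselves can be obtained by doubling the coefficients and ranking them by magnitude, which is also the ordering a half-normal plot uses. A minimal sketch, with `fit`, `effects`, and `ranked` as illustrative names:

```r
# Coded design and response for Question 6.36
A <- rep(c(-1, 1), times = 8)
B <- rep(rep(c(-1, 1), each = 2), times = 4)
C <- rep(rep(c(-1, 1), each = 4), times = 2)
D <- rep(c(-1, 1), each = 8)
Obs <- c(1.92,11.28,1.09,5.75,2.13,9.53,1.03,5.35,
         1.6,11.73,1.16,4.68,2.16,9.11,1.07,5.3)

# Saturated model: one coefficient per effect
fit <- lm(Obs ~ A * B * C * D)

# Effect = 2 * coefficient for -1/+1 coding (intercept excluded)
effects <- 2 * coef(fit)[-1]

# Largest effects first
ranked <- effects[order(abs(effects), decreasing = TRUE)]
print(round(head(ranked, 5), 5))
```

The three largest effects in absolute value are A, B, and A:B, which matches the tentative model selected from the plot.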

\(Y_{i,j,k}\) = 4.680625 + 3.160625\(\alpha_i\) - 1.501875\(\beta_j\) - 1.069375\(\alpha\beta_{i,j}\) + \(\epsilon_{i,j,k}\)

Part B

Fit the model identified in part (a) and analyze the residuals. Is there any indication of model inadequacy?

# Full factorial in A, B and C (the same terms as A+B+C+A*B+A*B*C)
Model2 <- aov(Obs~A*B*C,data = Data)
summary(Model2)

##             Df Sum Sq Mean Sq  F value   Pr(>F)    
## A            1 159.83  159.83 1563.061 1.84e-10 ***
## B            1  36.09   36.09  352.937 6.66e-08 ***
## C            1   0.78    0.78    7.616  0.02468 *  
## A:B          1  18.30   18.30  178.933 9.33e-07 ***
## A:C          1   1.42    1.42   13.907  0.00579 ** 
## B:C          1   0.84    0.84    8.232  0.02085 *  
## A:B:C        1   1.90    1.90   18.556  0.00259 ** 
## Residuals    8   0.82    0.10                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(Model2)

## hat values (leverages) are all = 0.5
##  and there are no factor predictors; no plot no. 5

Looking at the “Normal Q-Q” and “Residuals vs Fitted” plots, we can see that the model is inadequate: the residuals are not normally distributed, and their variance is not constant.

Furthermore, the ANOVA shows that all terms (A, B, C, AB, AC, BC, and ABC) are significant at the 0.05 level.

Part C

Repeat the analysis from parts (a) and (b) using ln (y) as the response variable. Is there an indication that the transformation has been useful?

logobs <- log(Obs)

Data2 <- data.frame(A,B,C,D,logobs)

Model3 <- lm(logobs~A*B*C*D,data = Data2)
coef(Model3)

##  (Intercept)            A            B            C            D          A:B 
##  1.185417116  0.812870345 -0.314277554 -0.006408558 -0.018077390 -0.024684570 
##          A:C          B:C          A:D          B:D          C:D        A:B:C 
## -0.039723700 -0.004225796 -0.009578245  0.003708723  0.017780432  0.063434408 
##        A:B:D        A:C:D        B:C:D      A:B:C:D 
## -0.029875960 -0.003740235  0.003765760  0.031322043

halfnormal(Model3)

## 
## Significant effects (alpha=0.05, Lenth method):

## [1] A     B     A:B:C

summary(Model3)

## 
## Call:
## lm.default(formula = logobs ~ A * B * C * D, data = Data2)
## 
## Residuals:
## ALL 16 residuals are 0: no residual degrees of freedom!
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  1.185417        NaN     NaN      NaN
## A            0.812870        NaN     NaN      NaN
## B           -0.314278        NaN     NaN      NaN
## C           -0.006409        NaN     NaN      NaN
## D           -0.018077        NaN     NaN      NaN
## A:B         -0.024685        NaN     NaN      NaN
## A:C         -0.039724        NaN     NaN      NaN
## B:C         -0.004226        NaN     NaN      NaN
## A:D         -0.009578        NaN     NaN      NaN
## B:D          0.003709        NaN     NaN      NaN
## C:D          0.017780        NaN     NaN      NaN
## A:B:C        0.063434        NaN     NaN      NaN
## A:B:D       -0.029876        NaN     NaN      NaN
## A:C:D       -0.003740        NaN     NaN      NaN
## B:C:D        0.003766        NaN     NaN      NaN
## A:B:C:D      0.031322        NaN     NaN      NaN
## 
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:    NaN 
## F-statistic:   NaN on 15 and 0 DF,  p-value: NA

# Full factorial in A, B and C (the same terms as A+B+C+A*B*C)
Model4 <- aov(logobs~A*B*C,data = Data2)
summary(Model4)

##             Df Sum Sq Mean Sq  F value   Pr(>F)    
## A            1 10.572  10.572 1994.556 6.98e-11 ***
## B            1  1.580   1.580  298.147 1.29e-07 ***
## C            1  0.001   0.001    0.124  0.73386    
## A:B          1  0.010   0.010    1.839  0.21207    
## A:C          1  0.025   0.025    4.763  0.06063 .  
## B:C          1  0.000   0.000    0.054  0.82223    
## A:B:C        1  0.064   0.064   12.147  0.00826 ** 
## Residuals    8  0.042   0.005                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(Model4)

## hat values (leverages) are all = 0.5
##  and there are no factor predictors; no plot no. 5

After the log transformation, the half-normal plot identifies only A, B, and ABC as significant; the AB interaction is no longer flagged. The ANOVA likewise shows A, B, and ABC as significant at the 0.05 level.

After the log transformation, the residuals fall almost on a straight line in the Q-Q plot, and their spread is more even. The improvement is not large, but the variance has been stabilized to some extent, so we conclude that the transformation has been useful.

Part D

Fit a model in terms of the coded variables that can be used to predict the resistivity

Based on the transformed data:

\(\ln(Y_{i,j,k,l})\) = 1.185417 + 0.812870345\(\alpha_i\) - 0.314277554\(\beta_j\) + \(\epsilon_{i,j,k,l}\)
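
Since the model is fitted to ln(y), a prediction of resistivity itself requires exponentiating the fitted value. The sketch below hard-codes the coefficients reported above; the function name `predict_resistivity` and its arguments `xA` and `xB` (the coded -1/+1 levels of A and B) are hypothetical helpers introduced for illustration:

```r
# Predicted resistivity from the log-scale model in A and B
predict_resistivity <- function(xA, xB) {
  log_y <- 1.185417 + 0.812870345 * xA - 0.314277554 * xB
  exp(log_y)  # back-transform from ln(y) to the original scale
}

# Predictions at the four corners of the coded design region
grid <- expand.grid(xA = c(-1, 1), xB = c(-1, 1))
grid$pred <- predict_resistivity(grid$xA, grid$xB)
print(grid)
```

Because the coefficient of A is positive and the coefficient of B is negative, the largest predicted resistivity in the region occurs at A high and B low.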

Question 6.39

An article in Quality and Reliability Engineering International (2010, Vol. 26, pp. 223–233) presents a 2^5 factorial design. The experiment is shown in Table P6.12.

Loading Data

library(DoE.base)
A <- c(-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1)
B <- c(-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1)
C <- c(-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,1,1,1,1)
D <- c(-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1)
E <- c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
Obs <- c(8.11,5.56,5.77,5.82,9.17,7.8,3.23,5.69,8.82,14.23,9.2,8.94,8.68,11.49,6.25,9.12,7.93,5,7.47,12,9.86,3.65,6.4,11.61,12.43,17.55,8.87,25.38,13.06,18.85,11.78,26.05)
Data <- data.frame(A,B,C,D,E,Obs)

Part A

Model <- lm(Obs~A*B*C*D*E,data = Data)
coef(Model)

## (Intercept)           A           B           C           D           E 
##  10.1803125   1.6159375   0.0434375  -0.0121875   2.9884375   2.1878125 
##         A:B         A:C         B:C         A:D         B:D         C:D 
##   1.2365625  -0.0015625  -0.1953125   1.6665625  -0.0134375   0.0034375 
##         A:E         B:E         C:E         D:E       A:B:C       A:B:D 
##   1.0271875   1.2834375   0.3015625   1.3896875   0.2503125  -0.3453125 
##       A:C:D       B:C:D       A:B:E       A:C:E       B:C:E       A:D:E 
##  -0.0634375   0.3053125   1.1853125  -0.2590625   0.1709375   0.9015625 
##       B:D:E       C:D:E     A:B:C:D     A:B:C:E     A:B:D:E     A:C:D:E 
##  -0.0396875   0.3959375  -0.0740625  -0.1846875   0.4071875   0.1278125 
##     B:C:D:E   A:B:C:D:E 
##  -0.0746875  -0.3553125

halfnormal(Model)

## 
## Significant effects (alpha=0.05, Lenth method):

##  [1] D     E     A:D   A     D:E   B:E   A:B   A:B:E A:E   A:D:E

summary(Model)

## 
## Call:
## lm.default(formula = Obs ~ A * B * C * D * E, data = Data)
## 
## Residuals:
## ALL 32 residuals are 0: no residual degrees of freedom!
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.180312        NaN     NaN      NaN
## A            1.615938        NaN     NaN      NaN
## B            0.043438        NaN     NaN      NaN
## C           -0.012187        NaN     NaN      NaN
## D            2.988437        NaN     NaN      NaN
## E            2.187813        NaN     NaN      NaN
## A:B          1.236562        NaN     NaN      NaN
## A:C         -0.001563        NaN     NaN      NaN
## B:C         -0.195313        NaN     NaN      NaN
## A:D          1.666563        NaN     NaN      NaN
## B:D         -0.013438        NaN     NaN      NaN
## C:D          0.003437        NaN     NaN      NaN
## A:E          1.027188        NaN     NaN      NaN
## B:E          1.283437        NaN     NaN      NaN
## C:E          0.301563        NaN     NaN      NaN
## D:E          1.389687        NaN     NaN      NaN
## A:B:C        0.250313        NaN     NaN      NaN
## A:B:D       -0.345312        NaN     NaN      NaN
## A:C:D       -0.063437        NaN     NaN      NaN
## B:C:D        0.305312        NaN     NaN      NaN
## A:B:E        1.185313        NaN     NaN      NaN
## A:C:E       -0.259062        NaN     NaN      NaN
## B:C:E        0.170938        NaN     NaN      NaN
## A:D:E        0.901563        NaN     NaN      NaN
## B:D:E       -0.039687        NaN     NaN      NaN
## C:D:E        0.395938        NaN     NaN      NaN
## A:B:C:D     -0.074063        NaN     NaN      NaN
## A:B:C:E     -0.184688        NaN     NaN      NaN
## A:B:D:E      0.407187        NaN     NaN      NaN
## A:C:D:E      0.127812        NaN     NaN      NaN
## B:C:D:E     -0.074688        NaN     NaN      NaN
## A:B:C:D:E   -0.355312        NaN     NaN      NaN
## 
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:    NaN 
## F-statistic:   NaN on 31 and 0 DF,  p-value: NA

Model2 <- aov(Obs~A+B+D+E+A*B+A*D+A*E+B*E+D*E+A*B*E+A*D*E,data = Data)
summary(Model2)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## A            1  83.56   83.56  51.362 6.10e-07 ***
## B            1   0.06    0.06   0.037 0.849178    
## D            1 285.78  285.78 175.664 2.30e-11 ***
## E            1 153.17  153.17  94.149 5.24e-09 ***
## A:B          1  48.93   48.93  30.076 2.28e-05 ***
## A:D          1  88.88   88.88  54.631 3.87e-07 ***
## A:E          1  33.76   33.76  20.754 0.000192 ***
## B:E          1  52.71   52.71  32.400 1.43e-05 ***
## D:E          1  61.80   61.80  37.986 5.07e-06 ***
## A:B:E        1  44.96   44.96  27.635 3.82e-05 ***
## A:D:E        1  26.01   26.01  15.988 0.000706 ***
## Residuals   20  32.54    1.63                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The half-normal plot identifies A, D, E, AD, DE, BE, AB, AE, ABE, and ADE as significant. The ANOVA agrees: A, D, E, AB, AD, AE, BE, DE, ABE, and ADE are all significant.

Part B

Analyze the residuals from this experiment. Are there any indications of model inadequacy or violations of the assumptions?

plot(Model2)

## hat values (leverages) are all = 0.375
##  and there are no factor predictors; no plot no. 5

Looking at the “Normal Q-Q” and “Residuals vs Fitted” plots, we can conclude that the model is inadequate. Although the residuals are consistent with the normality assumption, their spread varies widely, so the constant-variance assumption does not hold.

Part C

One of the factors from this experiment does not seem to be important. If you drop this factor, what type of design remains? Analyze the data using the full factorial model for only the four active factors. Compare your results with those obtained in part (a).

A <- c(-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1)
B <- c(-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1)
D <- c(-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1)
E <- c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
Obs <- c(8.11,5.56,5.77,5.82,9.17,7.8,3.23,5.69,8.82,14.23,9.2,8.94,8.68,11.49,6.25,9.12,7.93,5,7.47,12,9.86,3.65,6.4,11.61,12.43,17.55,8.87,25.38,13.06,18.85,11.78,26.05)
Data <- data.frame(A,B,D,E,Obs)

Model <- lm(Obs~A*B*D*E,data = Data)
coef(Model)

## (Intercept)           A           B           D           E         A:B 
##  10.1803125   1.6159375   0.0434375   2.9884375   2.1878125   1.2365625 
##         A:D         B:D         A:E         B:E         D:E       A:B:D 
##   1.6665625  -0.0134375   1.0271875   1.2834375   1.3896875  -0.3453125 
##       A:B:E       A:D:E       B:D:E     A:B:D:E 
##   1.1853125   0.9015625  -0.0396875   0.4071875

halfnormal(Model)

## 
## Significant effects (alpha=0.05, Lenth method):

##  [1] D     E     A:D   A     D:E   B:E   A:B   A:B:E A:E   A:D:E e10

summary(Model)

## 
## Call:
## lm.default(formula = Obs ~ A * B * D * E, data = Data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.4750 -0.5637  0.0000  0.5637  1.4750 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 10.18031    0.21360  47.661  < 2e-16 ***
## A            1.61594    0.21360   7.565 1.14e-06 ***
## B            0.04344    0.21360   0.203 0.841418    
## D            2.98844    0.21360  13.991 2.16e-10 ***
## E            2.18781    0.21360  10.243 1.97e-08 ***
## A:B          1.23656    0.21360   5.789 2.77e-05 ***
## A:D          1.66656    0.21360   7.802 7.66e-07 ***
## B:D         -0.01344    0.21360  -0.063 0.950618    
## A:E          1.02719    0.21360   4.809 0.000193 ***
## B:E          1.28344    0.21360   6.009 1.82e-05 ***
## D:E          1.38969    0.21360   6.506 7.24e-06 ***
## A:B:D       -0.34531    0.21360  -1.617 0.125501    
## A:B:E        1.18531    0.21360   5.549 4.40e-05 ***
## A:D:E        0.90156    0.21360   4.221 0.000650 ***
## B:D:E       -0.03969    0.21360  -0.186 0.854935    
## A:B:D:E      0.40719    0.21360   1.906 0.074735 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.208 on 16 degrees of freedom
## Multiple R-squared:  0.9744, Adjusted R-squared:  0.9504 
## F-statistic: 40.58 on 15 and 16 DF,  p-value: 7.07e-10

Model2 <- aov(Obs~A+B+D+E+A*B+A*D+A*E+B*E+D*E+A*B*E+A*D*E,data = Data)
summary(Model2)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## A            1  83.56   83.56  51.362 6.10e-07 ***
## B            1   0.06    0.06   0.037 0.849178    
## D            1 285.78  285.78 175.664 2.30e-11 ***
## E            1 153.17  153.17  94.149 5.24e-09 ***
## A:B          1  48.93   48.93  30.076 2.28e-05 ***
## A:D          1  88.88   88.88  54.631 3.87e-07 ***
## A:E          1  33.76   33.76  20.754 0.000192 ***
## B:E          1  52.71   52.71  32.400 1.43e-05 ***
## D:E          1  61.80   61.80  37.986 5.07e-06 ***
## A:B:E        1  44.96   44.96  27.635 3.82e-05 ***
## A:D:E        1  26.01   26.01  15.988 0.000706 ***
## Residuals   20  32.54    1.63                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

plot(Model2)

## hat values (leverages) are all = 0.375
##  and there are no factor predictors; no plot no. 5

Factor C is not significant, so we drop it. What remains is a 2^4 full factorial design in A, B, D, and E with two replicates. The results with the four active factors are the same as in part A: the ANOVA again shows A, D, E, AB, AD, AE, BE, DE, ABE, and ADE as significant.
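
The structure of the design that remains after C is dropped can be checked by counting the runs at each combination of the other four factors. A minimal sketch, with `runs_per_cell` as an illustrative name:

```r
# Coded columns of the 32-run design with factor C left out
A <- rep(c(-1, 1), times = 16)
B <- rep(rep(c(-1, 1), each = 2), times = 8)
D <- rep(rep(c(-1, 1), each = 8), times = 2)
E <- rep(c(-1, 1), each = 16)

# Number of runs at each of the combinations of A, B, D and E
runs_per_cell <- table(interaction(A, B, D, E))
print(length(runs_per_cell))   # distinct treatment combinations
print(unique(as.vector(runs_per_cell)))  # runs at each combination
```

Every combination of A, B, D, and E appears the same number of times, which is why the four-factor model has 16 residual degrees of freedom (32 runs minus 16 model terms).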

Homework Week 11

Muneeb

11/10/2021

Part D

Find settings of the active factors that maximize the predicted response.

Based on the fitted model:

Denoting factors A, B, D, and E by alpha, beta, gamma, and delta:

\(Y_{i,j,k,l}\) = 10.1803125 + 1.6159375\(\alpha_i\) + 0.0434375\(\beta_j\) + 2.9884375\(\gamma_k\) + 2.1878125\(\delta_l\) + \(\epsilon_{i,j,k,l}\)

All the main-effect coefficients are positive, and the significant interactions (AB, AD, AE, BE, DE, ABE, and ADE) also have positive coefficients, so every active factor should be set to its high (+1) level to maximize the predicted response.
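
The choice of settings can be confirmed by predicting the response at all 16 combinations of the active factors and taking the largest. This sketch assumes the reduced model with the terms found significant in the ANOVA; `fit`, `grid`, and `best` are illustrative names:

```r
# Coded design (factor C omitted) and response for Question 6.39
A <- rep(c(-1, 1), times = 16)
B <- rep(rep(c(-1, 1), each = 2), times = 8)
D <- rep(rep(c(-1, 1), each = 8), times = 2)
E <- rep(c(-1, 1), each = 16)
Obs <- c(8.11,5.56,5.77,5.82,9.17,7.8,3.23,5.69,8.82,14.23,9.2,8.94,
         8.68,11.49,6.25,9.12,7.93,5,7.47,12,9.86,3.65,6.4,11.61,
         12.43,17.55,8.87,25.38,13.06,18.85,11.78,26.05)

# Reduced model: main effects plus the significant interactions
fit <- lm(Obs ~ A + B + D + E + A:B + A:D + A:E + B:E + D:E +
            A:B:E + A:D:E)

# Predicted response at every combination of coded levels
grid <- expand.grid(A = c(-1, 1), B = c(-1, 1), D = c(-1, 1), E = c(-1, 1))
grid$pred <- predict(fit, newdata = grid)

# Setting with the highest predicted response
best <- grid[which.max(grid$pred), ]
print(best)
```

Including the interaction terms matters here: a search over main effects alone would reach the same corner for these data, but only because the interaction coefficients happen to share the sign of the main effects.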