library(dplyr)
library(tidyr)
library(GAD)
library(DoE.base)
A bacteriologist is interested in the effects of two different culture media and two different times on the growth of a particular virus. He or she performs six replicates of a \(2^2\) design, making the runs in random order. Analyze the bacterial growth data that follow and draw appropriate conclusions. Analyze the residuals and comment on the model’s adequacy.
BacteriaData <- read.csv("~/Grad School/IE 5342/Homework/6.8Data.csv")
\(y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}\)
\(H_0: \alpha_i=0\) for all i
\(H_a: \alpha_i\neq0\) for some i
\(H_0: \beta_j=0\) for all j
\(H_a: \beta_j\neq0\) for some j
\(H_0: (\alpha\beta)_{ij}=0\) for all ij
\(H_a: (\alpha\beta)_{ij}\neq0\) for some ij
# The time column is named Hours (as in the model formula and ANOVA table below)
BacteriaData$Hours <- as.factor(BacteriaData$Hours)
BacteriaData$Medium <- as.factor(BacteriaData$Medium)
BacteriaDataModel <- aov(Response~Medium*Hours,data=BacteriaData)
summary(BacteriaDataModel)
## Df Sum Sq Mean Sq F value Pr(>F)
## Medium 1 9.4 9.4 1.835 0.190617
## Hours 1 590.0 590.0 115.506 9.29e-10 ***
## Medium:Hours 1 92.0 92.0 18.018 0.000397 ***
## Residuals 20 102.2 5.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA gives a p-value for the Medium effect of \(0.1906 > 0.05\), so the Medium main effect is not significant and we fail to reject \(H_0\).
The ANOVA gives a p-value for the Hours effect of \(9.29*10^{-10} < 0.05\), so the Hours effect is significant and we reject \(H_0\).
The ANOVA gives a p-value for the Medium-Hours interaction of \(0.000397 < 0.05\), so the interaction is significant and we reject \(H_0\). Because the interaction is significant, the effect of Medium depends on Hours and should be read from the interaction rather than from the main effect alone.
plot(BacteriaDataModel,2)
plot(BacteriaDataModel,1)
with(BacteriaData, interaction.plot(Medium,Hours,Response))
The residuals-versus-fitted and normal probability plots suggest that the residuals are approximately normal, with little variation in spread across the fitted values. The interaction plot indicates an interaction, since the two lines are not parallel.
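As a rough check on the ANOVA table above, the F statistics and p-values can be recomputed from the printed sums of squares. This is only an illustrative sketch: it uses the rounded values from the table, so it matches the printed results only approximately, and the variable names are made up for this example.

```r
# Rounded sums of squares copied from the printed ANOVA table (1 df each).
ss <- c(Medium = 9.4, Hours = 590.0, "Medium:Hours" = 92.0)
ss_resid <- 102.2
df_resid <- 20

# Mean square error, then F = MS_effect / MS_error (each effect has 1 df,
# so its mean square equals its sum of squares).
ms_resid <- ss_resid / df_resid
f_stat <- ss / ms_resid

# Upper-tail probability of the F(1, 20) distribution.
p_val <- pf(f_stat, df1 = 1, df2 = df_resid, lower.tail = FALSE)
print(round(cbind(F = f_stat, p = p_val), 4))
```

Each effect is then compared with \(\alpha = 0.05\) exactly as in the text above.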
An article in the AT&T Technical Journal…..
WaferData <- read.csv("~/Grad School/IE 5342/Homework/6.12Data.csv",header=TRUE)
\(y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \epsilon_{ijk}\)
\(H_0: \alpha_i=0\) for all i
\(H_a: \alpha_i\neq0\) for some i
\(H_0: \beta_j=0\) for all j
\(H_a: \beta_j\neq0\) for some j
\(H_0: (\alpha\beta)_{ij}=0\) for all ij
\(H_a: (\alpha\beta)_{ij}\neq0\) for some ij
Estimate the Factor Effects
n <- 4
Trt1Sum <- 58.081
Trt2Sum <- 55.686
Trt3Sum <- 59.299
Trt4Sum <- 59.156
EffectA <- 1/(2*n)*(Trt4Sum+Trt2Sum-Trt3Sum-Trt1Sum)
EffectB <- 1/(2*n)*(Trt4Sum+Trt3Sum-Trt2Sum-Trt1Sum)
EffectAB <- 1/(2*n)*(Trt1Sum+Trt4Sum-Trt3Sum-Trt2Sum)
mean(WaferData$Response)
## [1] 14.51388
EffectA
## [1] -0.31725
EffectB
## [1] 0.586
EffectAB
## [1] 0.2815
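The same effect estimates can be obtained from a table of contrast signs, which also gives the sums of squares used in the ANOVA. The sketch below assumes, based on the formulas above, that treatment sums 1 to 4 correspond to the standard-order combinations (1), a, b, and ab; the object names are illustrative only.

```r
# Treatment totals in standard order; the (1), a, b, ab labels are an
# assumption inferred from the effect formulas in the text.
totals <- c("(1)" = 58.081, a = 55.686, b = 59.299, ab = 59.156)
n <- 4  # replicates per treatment combination

# Contrast signs for A and B in standard order; AB is their product.
signs <- cbind(A = c(-1, 1, -1, 1), B = c(-1, -1, 1, 1))
signs <- cbind(signs, AB = signs[, "A"] * signs[, "B"])

# Contrast = sum of signed totals; effect = contrast / (2n);
# sum of squares = contrast^2 / (4n).
contrasts <- drop(t(signs) %*% totals)
effects <- contrasts / (2 * n)
sum_sq <- contrasts^2 / (4 * n)

print(rbind(effect = effects, sum_sq = sum_sq))
```

The sums of squares computed this way can be compared with the `Sum Sq` column that `anova()` reports for the fitted model.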
Conduct an Analysis of Variance. Which factors are important?
WaferDataModel <- lm(Response~FlowRate*Time,data=WaferData)
anova(WaferDataModel)
## Analysis of Variance Table
##
## Response: Response
## Df Sum Sq Mean Sq F value Pr(>F)
## FlowRate 1 0.4026 0.40259 1.2619 0.28327
## Time 1 1.3736 1.37358 4.3054 0.06016 .
## FlowRate:Time 1 0.3170 0.31697 0.9935 0.33856
## Residuals 12 3.8285 0.31904
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The ANOVA gives a p-value for the FlowRate effect of \(0.283 > 0.05\), so the Flow Rate effect is not significant and we fail to reject \(H_0\).
The ANOVA gives a p-value for the Time effect of \(0.060 > 0.05\), so the Time effect is not significant at the 0.05 level and we fail to reject \(H_0\).
The ANOVA gives a p-value for the FlowRate-Time interaction of \(0.339 > 0.05\), so the interaction is not significant and we fail to reject \(H_0\).
\(\hat{y} = 14.514 - \frac{0.317}{2}A + \frac{0.586}{2}B + \frac{0.282}{2}AB\)
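To see how the regression equation is used, the sketch below evaluates it at the four design corners. It assumes A (flow rate) and B (time) are coded as \(-1\) and \(+1\), and it takes the grand mean and effect estimates from the output above; `predict_coded` is a hypothetical helper written for this example.

```r
# Grand mean and effect estimates from the output above.
grand_mean <- 14.51388
effects <- c(A = -0.31725, B = 0.586, AB = 0.2815)

# Regression coefficients in coded units are half the effect estimates.
coefs <- effects / 2

# Hypothetical helper: fitted response at coded levels A and B (each -1 or +1).
predict_coded <- function(A, B) {
  unname(grand_mean + coefs["A"] * A + coefs["B"] * B + coefs["AB"] * A * B)
}

corners <- expand.grid(A = c(-1, 1), B = c(-1, 1))
corners$fit <- predict_coded(corners$A, corners$B)
print(corners)
```

Because the model contains A, B, and AB, the fitted values at the four corners reproduce the four treatment means (each treatment sum divided by 4), up to rounding.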
Analyze the residuals. Are there any residuals that should cause concern?
plot(WaferDataModel,2)
plot(WaferDataModel,1)
The normal probability plot (NPP) and residuals-versus-fitted (RVF) plot show that data point 2 is an outlier. Other than data point 2, the residuals look normal and have constant variance.
We could replace the outlier’s value with an average of other observations.
PuttData <- read.csv("~/Grad School/IE 5342/Homework/6.21Data3.csv",header=TRUE)
Analyze the data from this experiment. Which factors significantly affect putting performance?
PuttModel1 <- lm(Response~PuttLength*PutterType*PuttBreak*PuttSlope,data=PuttData)
halfnormal(PuttModel1)
## Warning in halfnormal.lm(PuttModel1): halfnormal not recommended for models with
## more residual df than model df
##
## Significant effects (alpha=0.05, Lenth method):
## [1] PuttLength e95 e28 e44 e49 PutterType e84
##
## [8] e32 e78
The half normal plot tells us that Putt Length and Putter Type are significant factors, while the other two (Putt Break and Putt Slope) are not. With this, we can narrow the ANOVA down to those two significant factors.
PuttModel2 <- lm(Response~PuttLength*PutterType,data=PuttData)
anova(PuttModel2)
## Analysis of Variance Table
##
## Response: Response
## Df Sum Sq Mean Sq F value Pr(>F)
## PuttLength 1 917.1 917.15 10.9689 0.001261 **
## PutterType 1 388.1 388.15 4.6421 0.033418 *
## PuttLength:PutterType 1 218.7 218.68 2.6154 0.108750
## Residuals 108 9030.3 83.61
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Running ANOVA on the reduced model confirms what the half normal plot showed: Putt Length is significant with a p-value of \(0.0013 \ll 0.05\), and Putter Type is significant with a p-value of \(0.0334 < 0.05\). The Putt Length-Putter Type interaction is not significant, with a p-value of \(0.1088 > 0.05\).
Analyze the residuals from this experiment. Are there any indications of model inadequacy?
plot(PuttModel1,2)
plot(PuttModel1,1)
The residuals follow the normal line up to about theoretical quantile 1 and then depart from it. The variance looks constant within different parts of the fitted range, but not overall. So yes, there are indications that the model is not adequate.
Resistivity on a silicon wafer is influenced by several factors. The results of a \(2^4\) factorial experiment performed during a critical processing step are shown in Table P6.10.
ResistivityDataa <- read.csv("~/Grad School/IE 5342/Homework/6.36Dataa.csv",header=TRUE)
Estimate the factor effects. Plot the effect estimates on a normal probability plot and select a tentative model.
ResistivityModel1 <- lm(Response~A*B*C*D, data=ResistivityDataa)
halfnormal(ResistivityModel1)
##
## Significant effects (alpha=0.05, Lenth method):
## [1] A B A:B A:B:C
The half normal plot shows that Factor A, Factor B, and the A:B interaction are significant (the Lenth method also flags A:B:C). As a tentative model, we keep A, B, and A:B for the ANOVA.
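The output above says effects are flagged with the Lenth method. A minimal sketch of the textbook version of that method follows. It uses made-up effect estimates, not the resistivity data, and the t-based margin of error shown here is the classical approximation; the critical values that `halfnormal` uses internally may differ.

```r
# Hypothetical effect estimates (NOT from the resistivity data),
# used only to illustrate the calculation.
effects <- c(A = 8.0, B = -4.0, AB = 3.0, C = 0.4, AC = -0.2, BC = 0.6, ABC = -0.5)
abs_eff <- abs(effects)

# Initial robust scale estimate from all effects.
s0 <- 1.5 * median(abs_eff)

# Pseudo standard error: re-estimate the scale after trimming
# effects that look active (larger than 2.5 * s0).
pse <- 1.5 * median(abs_eff[abs_eff < 2.5 * s0])

# Margin of error from a t quantile with m / 3 degrees of freedom.
m <- length(effects)
me <- qt(0.975, df = m / 3) * pse

# Effects whose magnitude exceeds the margin of error are flagged.
active <- names(effects)[abs_eff > me]
print(active)
```

The idea is that most effects in a screening design are inactive, so the median of their magnitudes gives a noise estimate even without replication.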
Fit the model identified in part (a) and analyze the residuals. Is there any indication of model inadequacy?
ResistivityModel2 <- lm(Response~A+B+A*B,data=ResistivityDataa)
anova(ResistivityModel2)
## Analysis of Variance Table
##
## Response: Response
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 159.833 159.833 333.088 4.049e-10 ***
## B 1 36.090 36.090 75.211 1.630e-06 ***
## A:B 1 18.297 18.297 38.130 4.763e-05 ***
## Residuals 12 5.758 0.480
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(ResistivityModel2,2)
plot(ResistivityModel2,1)
The normal probability plot shows that the residuals are not normally distributed, and the Residuals vs Fitted plot shows that the variance is not constant. This means that the model is inadequate.
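Reading the two diagnostic plots is somewhat subjective, so a numerical check can be used alongside them. The sketch below runs on simulated toy data, not on the resistivity data, and all object names are hypothetical. It applies a Shapiro-Wilk test to the residuals and compares the residual spread across treatment cells.

```r
# Simulated toy data for a replicated 2^2 design (NOT the homework data).
set.seed(1)
toy <- expand.grid(A = c(-1, 1), B = c(-1, 1), rep = 1:4)
toy$Response <- 10 + 2 * toy$A - toy$B + rnorm(nrow(toy), sd = 0.5)

toy_model <- lm(Response ~ A * B, data = toy)
res <- residuals(toy_model)

# Shapiro-Wilk test: a small p-value suggests non-normal residuals.
sw <- shapiro.test(res)
print(sw$p.value)

# Residual standard deviation within each treatment cell:
# very unequal values suggest non-constant variance.
cell_sd <- tapply(res, interaction(toy$A, toy$B), sd)
print(cell_sd)
```

The same two checks could be applied to any of the fitted models in this report by swapping in that model's residuals and factor columns.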
Repeat the analysis from parts (a) and (b) using \(ln(y)\) as the response variable. Is there an indication that the transformation has been useful?
ResistivityDatab <- read.csv("~/Grad School/IE 5342/Homework/6.36Datab.csv",header=TRUE)
ResistivityModel3 <- lm(Response~A*B*C*D, data=ResistivityDatab)
halfnormal(ResistivityModel3)
##
## Significant effects (alpha=0.05, Lenth method):
## [1] A B A:B:C
The half normal plot again shows that Factor A and Factor B are significant, but the A:B interaction is no longer flagged. The three-factor interaction A:B:C is flagged, however, so we fit the full model in A, B, and C for the ANOVA.
ResistivityModel4 <- lm(Response~A+B+A*B*C,data=ResistivityDatab)
anova(ResistivityModel4)
## Analysis of Variance Table
##
## Response: Response
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 10.5721 10.5721 1994.5559 6.975e-11 ***
## B 1 1.5803 1.5803 298.1470 1.289e-07 ***
## C 1 0.0007 0.0007 0.1240 0.733861
## A:B 1 0.0097 0.0097 1.8393 0.212066
## A:C 1 0.0252 0.0252 4.7632 0.060627 .
## B:C 1 0.0003 0.0003 0.0539 0.822233
## A:B:C 1 0.0644 0.0644 12.1466 0.008256 **
## Residuals 8 0.0424 0.0053
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(ResistivityModel4,2)
plot(ResistivityModel4,1)
The \(\ln\) transformation improves normality, but the residuals are still not normal and still do not show constant variance. On that basis, the transformation has not been useful.
Fit a model in terms of the coded variables that can be used to predict the resistivity.
coef(ResistivityModel4)
## (Intercept) A B C A:B A:C
## 1.185417116 0.812870345 -0.314277554 -0.006408558 -0.024684570 -0.039723700
## B:C A:B:C
## -0.004225796 0.063434408
From the coefficients, we get the following model equation:
\(\widehat{\ln y} = 1.185+0.813A-0.314B-0.006C-0.025AB-0.040AC-0.004BC+0.063ABC\)
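To illustrate how the fitted model is used for prediction, the sketch below evaluates it at a coded design point with the coefficients printed by `coef(ResistivityModel4)`. It assumes A, B, and C are coded as \(-1\) and \(+1\) and that the response in this model is \(\ln(y)\), so the prediction is exponentiated to return to the original resistivity scale. `predict_ln` is a hypothetical helper.

```r
# Coefficients copied from the coef() output above (response is ln(y)).
b <- c("(Intercept)" = 1.185417116, A = 0.812870345, B = -0.314277554,
       C = -0.006408558, "A:B" = -0.024684570, "A:C" = -0.039723700,
       "B:C" = -0.004225796, "A:B:C" = 0.063434408)

# Hypothetical helper: predicted ln(y) at coded levels A, B, C (each -1 or +1).
predict_ln <- function(A, B, C) {
  x <- c(1, A, B, C, A * B, A * C, B * C, A * B * C)
  sum(b * x)
}

ln_hat <- predict_ln(A = 1, B = 1, C = 1)
print(ln_hat)       # prediction on the ln scale
print(exp(ln_hat))  # back-transformed to the original scale
```

Note that exponentiating a prediction made on the log scale gives a median-type estimate on the original scale, not the mean.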
An article……..
Analyze the data from this experiment. Identify the significant factors and interactions.
dat639 <- read.csv("~/Grad School/IE 5342/Homework/6.39Data.csv",header=TRUE)
dat639model1 <- lm(Response~A*B*C*D*E, data = dat639)
halfnormal(dat639model1)
##
## Significant effects (alpha=0.05, Lenth method):
## [1] D E A:D A D:E B:E A:B A:B:E A:E A:D:E
The half normal plot shows that A, D, and E are significant factors and A:B, A:D, A:E, B:E, D:E, A:B:E, and A:D:E are significant interactions.
Analyze the residuals from this experiment. Are there any indications of model inadequacy or violations of the assumptions?
dat639model2 <- aov(Response~A*B*D*E,data=dat639)
plot(dat639model2,2)
plot(dat639model2,1)
The plots show a fairly normal distribution of data and that the data has constant variance, so yes the model is adequate.
One of the factors does not seem to be important. If you drop this factor, what type of design remains? Analyze the data using the full factorial model for only the four active factors. Compare your results with those obtained in part (a).
Factor C does not appear to be significant based on the previous results. If we drop Factor C, the design becomes a \(2^4\) design in A, B, D, and E; if the original \(2^5\) was run unreplicated, this projection has two replicates.
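The projection can be checked directly by building the design matrix. The sketch below assumes the original experiment was an unreplicated \(2^5\) with 32 runs; it drops column C and counts how often each remaining treatment combination appears.

```r
# Full 2^5 design in coded units (assumed unreplicated, 32 runs).
design <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1),
                      D = c(-1, 1), E = c(-1, 1))

# Drop factor C and count the runs at each (A, B, D, E) combination.
projected <- design[, c("A", "B", "D", "E")]
reps <- table(do.call(paste, projected))

print(nrow(design))   # number of runs in the original design
print(length(reps))   # number of distinct combinations after projection
print(unique(as.vector(reps)))  # runs per combination
```

Each of the 16 combinations of A, B, D, and E appears twice, which is what provides residual degrees of freedom in the four-factor model.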
dat639c <- read.csv("~/Grad School/IE 5342/Homework/6.39Datac.csv",header=TRUE)
dat639model3 <- lm(Response~A*B*D*E, data = dat639c)
halfnormal(dat639model3)
##
## Significant effects (alpha=0.05, Lenth method):
## [1] D E A:D A D:E B:E A:B A:B:E A:E A:D:E e10
The half normal plot again shows that A, D, and E are significant factors and A:B, A:D, A:E, B:E, D:E, A:B:E, and A:D:E are significant interactions. Thus dropping Factor C did not change any of the results.