The cheapskates at your firm have run a study using no replicants and a 25 design and have not told you what each effect is. The data is below. Find the meaningful effects, analyze the data, check your assumptions (particularly model fit), and provide a succinct write up of the final results
library(moments)
library(kableExtra)cData <- read.csv("WeeklyLab5.csv")
kable(cData, "html") %>%
kable_styling(bootstrap_options = "striped")| Effect.A | Effect.B | Effect.C | Effect.D | Effect.E | Y |
|---|---|---|---|---|---|
| 1 | 0 | 0 | 0 | 0 | 24.04330 |
| 0 | 1 | 0 | 1 | 0 | 89.39603 |
| 0 | 1 | 1 | 0 | 1 | 80.69679 |
| 1 | 0 | 1 | 0 | 0 | 39.48919 |
| 0 | 1 | 1 | 0 | 0 | 85.00000 |
| 1 | 1 | 1 | 0 | 1 | 98.88348 |
| 1 | 1 | 0 | 0 | 0 | 69.38102 |
| 0 | 0 | 0 | 0 | 1 | 85.84027 |
| 0 | 1 | 0 | 1 | 1 | 96.17769 |
| 0 | 1 | 1 | 1 | 0 | 29.07845 |
| 0 | 0 | 1 | 1 | 0 | 104.28087 |
| 1 | 0 | 0 | 1 | 1 | 93.00000 |
| 0 | 0 | 0 | 0 | 0 | 57.00000 |
| 0 | 0 | 0 | 1 | 0 | 52.10862 |
| 1 | 0 | 0 | 1 | 0 | 78.14386 |
| 1 | 0 | 1 | 0 | 1 | 77.80883 |
| 1 | 1 | 1 | 1 | 1 | 105.58464 |
| 0 | 1 | 0 | 0 | 1 | 112.74031 |
| 1 | 1 | 0 | 0 | 1 | 70.22562 |
| 1 | 1 | 1 | 1 | 0 | 101.52252 |
| 0 | 0 | 1 | 0 | 1 | 106.23952 |
| 0 | 1 | 1 | 1 | 1 | 51.08884 |
| 1 | 1 | 1 | 0 | 0 | 62.42781 |
| 0 | 0 | 1 | 0 | 0 | 39.00000 |
| 0 | 0 | 1 | 1 | 1 | 40.76413 |
| 1 | 1 | 0 | 1 | 0 | 27.01754 |
| 1 | 0 | 0 | 0 | 1 | 57.64639 |
| 0 | 0 | 0 | 1 | 1 | 27.78301 |
| 1 | 1 | 0 | 1 | 1 | 81.87806 |
| 1 | 0 | 1 | 1 | 0 | 88.96438 |
| 0 | 1 | 0 | 0 | 0 | 20.90311 |
| 1 | 0 | 1 | 1 | 1 | 71.62483 |
summary(cData)## Effect.A Effect.B Effect.C Effect.D Effect.E
## Min. :0.0 Min. :0.0 Min. :0.0 Min. :0.0 Min. :0.0
## 1st Qu.:0.0 1st Qu.:0.0 1st Qu.:0.0 1st Qu.:0.0 1st Qu.:0.0
## Median :0.5 Median :0.5 Median :0.5 Median :0.5 Median :0.5
## Mean :0.5 Mean :0.5 Mean :0.5 Mean :0.5 Mean :0.5
## 3rd Qu.:1.0 3rd Qu.:1.0 3rd Qu.:1.0 3rd Qu.:1.0 3rd Qu.:1.0
## Max. :1.0 Max. :1.0 Max. :1.0 Max. :1.0 Max. :1.0
## Y
## Min. : 20.90
## 1st Qu.: 48.51
## Median : 74.72
## Mean : 69.55
## 3rd Qu.: 90.30
## Max. :112.74
plot(density(cData$Y))From the above plot, it seems that there is a bit of negative skewness in the data. The level of skewness can be identified by using D’Agostino skewness test
agostino.test(cData$Y)##
## D'Agostino skewness test
##
## data: cData$Y
## skew = -0.28417, z = -0.75491, p-value = 0.4503
## alternative hypothesis: data have a skewness
The skewness value is close to zero so we can assume that the Y values are normally distributed.
Now this data follows a 2^k factorial design with k being 5 so there could be multiple levels of interactions between these effect variables. These interactions are pretty complex to model
Converting the variables from int to factors
cData$Effect.A <- factor(cData$Effect.A)
cData$Effect.B <- factor(cData$Effect.B)
cData$Effect.C <- factor(cData$Effect.C)
cData$Effect.D <- factor(cData$Effect.D)
cData$Effect.E <- factor(cData$Effect.E)Start by plotting the response data several ways to see if any trends or anomalies appear that would not be accounted for by the standard linear response models.
par(bg=rgb(1,1,0.8), mfrow=c(2,2))
qqnorm(cData$Y)
qqline(cData$Y, col = 2)
boxplot(cData$Y, horizontal=TRUE, main="Box Plot", xlab="Experiment Response")
hist(cData$Y, main="Histogram", xlab="Experiment Response")
par(mfrow=c(1,1))par(bg=rgb(1,1,0.8),mfrow=c(2,3))
boxplot(Y~Effect.A, data=cData, main="Response by Effect A",
xlab="Effect A",ylab="Response")
boxplot(Y~Effect.B, data=cData, main="Response by Effect B",
xlab="Effect B",ylab="Response")
boxplot(Y~Effect.C, data=cData, main="Response by Effect C",
xlab="Effect C",ylab="Response")
boxplot(Y~Effect.D, data=cData, main="Response by Effect D",
xlab="Effect D",ylab="Response")
boxplot(Y~Effect.E, data=cData, main="Response by Effect E",
xlab="Effect E",ylab="Response")
par(mfrow=c(1,1))From the above, it seems that Effect.E followed by Effect.B appear to change the response variable Y as compared to the other effects. However, in the model building process, we will try all 5 effects and their interaction effects.
Let’s build the first model. For a 25 full factorial experiment we can fit a model containing a mean term, Five main effect terms, Ten two-factor interaction terms, Ten three-factor interaction terms, Five four-factor interaction terms, and a Five-factor interaction term (32 parameters). However, we start by assuming all four-factor and higher interaction terms are non-existent. It’s very rare for such high-order interactions to be significant, and they are very difficult to interpret from an engineering viewpoint.
Model1 <- aov(Y~(Effect.A + Effect.B + Effect.C + Effect.D + Effect.E)^3, data = cData)
summary(Model1)## Df Sum Sq Mean Sq F value Pr(>F)
## Effect.A 1 151 151.1 0.135 0.726
## Effect.B 1 597 597.4 0.534 0.492
## Effect.C 1 605 605.3 0.541 0.490
## Effect.D 1 82 81.6 0.073 0.796
## Effect.E 1 2632 2632.2 2.355 0.176
## Effect.A:Effect.B 1 36 36.4 0.033 0.863
## Effect.A:Effect.C 1 710 710.4 0.635 0.456
## Effect.A:Effect.D 1 1869 1869.2 1.672 0.244
## Effect.A:Effect.E 1 53 52.8 0.047 0.835
## Effect.B:Effect.C 1 66 66.2 0.059 0.816
## Effect.B:Effect.D 1 243 242.6 0.217 0.658
## Effect.B:Effect.E 1 568 568.5 0.509 0.503
## Effect.C:Effect.D 1 61 61.5 0.055 0.822
## Effect.C:Effect.E 1 483 483.4 0.432 0.535
## Effect.D:Effect.E 1 2728 2727.8 2.440 0.169
## Effect.A:Effect.B:Effect.C 1 1737 1737.1 1.554 0.259
## Effect.A:Effect.B:Effect.D 1 677 677.2 0.606 0.466
## Effect.A:Effect.B:Effect.E 1 207 206.6 0.185 0.682
## Effect.A:Effect.C:Effect.D 1 345 344.9 0.308 0.599
## Effect.A:Effect.C:Effect.E 1 48 47.6 0.043 0.843
## Effect.A:Effect.D:Effect.E 1 1127 1126.7 1.008 0.354
## Effect.B:Effect.C:Effect.D 1 188 187.9 0.168 0.696
## Effect.B:Effect.C:Effect.E 1 144 143.8 0.129 0.732
## Effect.B:Effect.D:Effect.E 1 1529 1529.2 1.368 0.287
## Effect.C:Effect.D:Effect.E 1 251 250.6 0.224 0.653
## Residuals 6 6707 1117.9
From the above Anova model, it seems that none of the variables including the interactions effects came out to be significant. Let’s try 2 order interactions instead of 3.
Model2 <- aov(Y~(Effect.A + Effect.B + Effect.C + Effect.D + Effect.E)^2, data = cData)
summary(Model2)## Df Sum Sq Mean Sq F value Pr(>F)
## Effect.A 1 151 151.1 0.187 0.6715
## Effect.B 1 597 597.4 0.738 0.4031
## Effect.C 1 605 605.3 0.747 0.4001
## Effect.D 1 82 81.6 0.101 0.7551
## Effect.E 1 2632 2632.2 3.250 0.0903 .
## Effect.A:Effect.B 1 36 36.4 0.045 0.8348
## Effect.A:Effect.C 1 710 710.4 0.877 0.3629
## Effect.A:Effect.D 1 1869 1869.2 2.308 0.1482
## Effect.A:Effect.E 1 53 52.8 0.065 0.8018
## Effect.B:Effect.C 1 66 66.2 0.082 0.7785
## Effect.B:Effect.D 1 243 242.6 0.300 0.5917
## Effect.B:Effect.E 1 568 568.5 0.702 0.4145
## Effect.C:Effect.D 1 61 61.5 0.076 0.7864
## Effect.C:Effect.E 1 483 483.4 0.597 0.4511
## Effect.D:Effect.E 1 2728 2727.8 3.368 0.0851 .
## Residuals 16 12959 809.9
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ModelStepWise = step(Model2,direction="backward")## Start: AIC=224.12
## Y ~ (Effect.A + Effect.B + Effect.C + Effect.D + Effect.E)^2
##
## Df Sum of Sq RSS AIC
## - Effect.A:Effect.B 1 36.41 12995 222.21
## - Effect.A:Effect.E 1 52.78 13012 222.25
## - Effect.C:Effect.D 1 61.50 13020 222.27
## - Effect.B:Effect.C 1 66.25 13025 222.28
## - Effect.B:Effect.D 1 242.64 13201 222.72
## - Effect.C:Effect.E 1 483.37 13442 223.29
## - Effect.B:Effect.E 1 568.45 13527 223.50
## - Effect.A:Effect.C 1 710.37 13669 223.83
## <none> 12959 224.12
## - Effect.A:Effect.D 1 1869.24 14828 226.43
## - Effect.D:Effect.E 1 2727.79 15687 228.23
##
## Step: AIC=222.21
## Y ~ Effect.A + Effect.B + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C +
## Effect.A:Effect.D + Effect.A:Effect.E + Effect.B:Effect.C +
## Effect.B:Effect.D + Effect.B:Effect.E + Effect.C:Effect.D +
## Effect.C:Effect.E + Effect.D:Effect.E
##
## Df Sum of Sq RSS AIC
## - Effect.A:Effect.E 1 52.78 13048 220.34
## - Effect.C:Effect.D 1 61.50 13057 220.36
## - Effect.B:Effect.C 1 66.25 13061 220.37
## - Effect.B:Effect.D 1 242.64 13238 220.80
## - Effect.C:Effect.E 1 483.37 13479 221.38
## - Effect.B:Effect.E 1 568.45 13564 221.58
## - Effect.A:Effect.C 1 710.37 13706 221.91
## <none> 12995 222.21
## - Effect.A:Effect.D 1 1869.24 14864 224.51
## - Effect.D:Effect.E 1 2727.79 15723 226.31
##
## Step: AIC=220.34
## Y ~ Effect.A + Effect.B + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C +
## Effect.A:Effect.D + Effect.B:Effect.C + Effect.B:Effect.D +
## Effect.B:Effect.E + Effect.C:Effect.D + Effect.C:Effect.E +
## Effect.D:Effect.E
##
## Df Sum of Sq RSS AIC
## - Effect.C:Effect.D 1 61.50 13110 218.49
## - Effect.B:Effect.C 1 66.25 13114 218.50
## - Effect.B:Effect.D 1 242.64 13291 218.93
## - Effect.C:Effect.E 1 483.37 13531 219.50
## - Effect.B:Effect.E 1 568.45 13616 219.71
## - Effect.A:Effect.C 1 710.37 13758 220.04
## <none> 13048 220.34
## - Effect.A:Effect.D 1 1869.24 14917 222.62
## - Effect.D:Effect.E 1 2727.79 15776 224.42
##
## Step: AIC=218.49
## Y ~ Effect.A + Effect.B + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C +
## Effect.A:Effect.D + Effect.B:Effect.C + Effect.B:Effect.D +
## Effect.B:Effect.E + Effect.C:Effect.E + Effect.D:Effect.E
##
## Df Sum of Sq RSS AIC
## - Effect.B:Effect.C 1 66.25 13176 216.65
## - Effect.B:Effect.D 1 242.64 13352 217.08
## - Effect.C:Effect.E 1 483.37 13593 217.65
## - Effect.B:Effect.E 1 568.45 13678 217.85
## - Effect.A:Effect.C 1 710.37 13820 218.18
## <none> 13110 218.49
## - Effect.A:Effect.D 1 1869.24 14979 220.76
## - Effect.D:Effect.E 1 2727.79 15837 222.54
##
## Step: AIC=216.65
## Y ~ Effect.A + Effect.B + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C +
## Effect.A:Effect.D + Effect.B:Effect.D + Effect.B:Effect.E +
## Effect.C:Effect.E + Effect.D:Effect.E
##
## Df Sum of Sq RSS AIC
## - Effect.B:Effect.D 1 242.64 13418 215.24
## - Effect.C:Effect.E 1 483.37 13659 215.81
## - Effect.B:Effect.E 1 568.45 13744 216.00
## - Effect.A:Effect.C 1 710.37 13886 216.33
## <none> 13176 216.65
## - Effect.A:Effect.D 1 1869.24 15045 218.90
## - Effect.D:Effect.E 1 2727.79 15904 220.67
##
## Step: AIC=215.24
## Y ~ Effect.A + Effect.B + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C +
## Effect.A:Effect.D + Effect.B:Effect.E + Effect.C:Effect.E +
## Effect.D:Effect.E
##
## Df Sum of Sq RSS AIC
## - Effect.C:Effect.E 1 483.37 13902 214.37
## - Effect.B:Effect.E 1 568.45 13987 214.56
## - Effect.A:Effect.C 1 710.37 14129 214.89
## <none> 13418 215.24
## - Effect.A:Effect.D 1 1869.24 15288 217.41
## - Effect.D:Effect.E 1 2727.79 16146 219.16
##
## Step: AIC=214.37
## Y ~ Effect.A + Effect.B + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C +
## Effect.A:Effect.D + Effect.B:Effect.E + Effect.D:Effect.E
##
## Df Sum of Sq RSS AIC
## - Effect.B:Effect.E 1 568.45 14470 213.65
## - Effect.A:Effect.C 1 710.37 14612 213.96
## <none> 13902 214.37
## - Effect.A:Effect.D 1 1869.24 15771 216.41
## - Effect.D:Effect.E 1 2727.79 16630 218.10
##
## Step: AIC=213.65
## Y ~ Effect.A + Effect.B + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C +
## Effect.A:Effect.D + Effect.D:Effect.E
##
## Df Sum of Sq RSS AIC
## - Effect.B 1 597.41 15068 212.95
## - Effect.A:Effect.C 1 710.37 15180 213.19
## <none> 14470 213.65
## - Effect.A:Effect.D 1 1869.24 16339 215.54
## - Effect.D:Effect.E 1 2727.79 17198 217.18
##
## Step: AIC=212.95
## Y ~ Effect.A + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C +
## Effect.A:Effect.D + Effect.D:Effect.E
##
## Df Sum of Sq RSS AIC
## - Effect.A:Effect.C 1 710.37 15778 212.42
## <none> 15068 212.95
## - Effect.A:Effect.D 1 1869.24 16937 214.69
## - Effect.D:Effect.E 1 2727.79 17795 216.27
##
## Step: AIC=212.42
## Y ~ Effect.A + Effect.C + Effect.D + Effect.E + Effect.A:Effect.D +
## Effect.D:Effect.E
##
## Df Sum of Sq RSS AIC
## - Effect.C 1 605.25 16383 211.62
## <none> 15778 212.42
## - Effect.A:Effect.D 1 1869.24 17647 214.00
## - Effect.D:Effect.E 1 2727.79 18506 215.52
##
## Step: AIC=211.62
## Y ~ Effect.A + Effect.D + Effect.E + Effect.A:Effect.D + Effect.D:Effect.E
##
## Df Sum of Sq RSS AIC
## <none> 16383 211.62
## - Effect.A:Effect.D 1 1869.2 18252 213.08
## - Effect.D:Effect.E 1 2727.8 19111 214.55
summary(ModelStepWise)## Df Sum Sq Mean Sq F value Pr(>F)
## Effect.A 1 151 151.1 0.240 0.6284
## Effect.D 1 82 81.6 0.129 0.7219
## Effect.E 1 2632 2632.2 4.177 0.0512 .
## Effect.A:Effect.D 1 1869 1869.2 2.966 0.0969 .
## Effect.D:Effect.E 1 2728 2727.8 4.329 0.0475 *
## Residuals 26 16383 630.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Building a model without insignificant terms
Model3 <- aov(Y~ Effect.E + Effect.A:Effect.D + Effect.E:Effect.D, data = cData)
summary(Model3)## Df Sum Sq Mean Sq F value Pr(>F)
## Effect.E 1 2632 2632.2 4.177 0.0512 .
## Effect.A:Effect.D 3 2102 700.6 1.112 0.3622
## Effect.E:Effect.D 1 2728 2727.8 4.329 0.0475 *
## Residuals 26 16383 630.1
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary.lm(Model3)##
## Call:
## aov(formula = Y ~ Effect.E + Effect.A:Effect.D + Effect.E:Effect.D,
## data = cData)
##
## Residuals:
## Min 1Q Median 3Q Max
## -54.113 -12.306 -2.984 18.780 42.783
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 81.13 10.87 7.464 6.33e-08 ***
## Effect.E1 36.60 12.55 2.916 0.0072 **
## Effect.A0:Effect.D0 -26.00 15.37 -1.692 0.1027
## Effect.A1:Effect.D0 -36.94 15.37 -2.403 0.0237 *
## Effect.A0:Effect.D1 -19.63 12.55 -1.564 0.1299
## Effect.A1:Effect.D1 NA NA NA NA
## Effect.E1:Effect.D1 -36.93 17.75 -2.081 0.0475 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 25.1 on 26 degrees of freedom
## Multiple R-squared: 0.3129, Adjusted R-squared: 0.1808
## F-statistic: 2.368 on 5 and 26 DF, p-value: 0.06737
Fit a model with all the interaction effects and save those effects
Model4 <- aov(Y~(Effect.A + Effect.B + Effect.C + Effect.D + Effect.E)^5, data = cData)
ModelEffects <- Model4$effects
ModelEffects <- ModelEffects[-1]
SortModelEffects <- ModelEffects[order(ModelEffects)]
ModelEffectsNames = names(SortModelEffects)ip = ppoints(length(SortModelEffects))
zp = qnorm(ip)
par(bg=rgb(1,1,0.8))
plot(zp,SortModelEffects, ylim=c(-120,70), xlim=c(-2,3),
ylab="Effect", xlab="Theoretical Quantiles",
main="Normal Probability Plot of Model Effects")
qqline(SortModelEffects, col=2)
abline(h=0, col=4)
small = c(1:2)
small2 = c((length(SortModelEffects)-1):(length(SortModelEffects)))
text(zp[small],SortModelEffects[small],label=ModelEffectsNames[small],pos=2,cex=0.8)
text(zp[small2],SortModelEffects[small2],label=ModelEffectsNames[small2],pos=2,cex=0.8)par(mfrow=c(1,1))Based on the above plot, we can see that most of the effects aren’t significant as they follow the fitted normal model straight line. Only a couple of effects - Effect E and its interaction with Effect D - are away from the straight line and are significant (as suggested by our Anova Model3)
We have developed an anova model that is significant with an adjusted R-square of 0.196. However, the adjusted R-square is pretty low because out of the 5 effects only a couple (including interaction) have a significant effect on the response variable Y. Multiple interaction effects of different orders were considered but none of them came out to be significant. This analysis basically shows that Effect E is the main drive of the response along with its interaction with Effect D. To develop a more robust model, further transformations such as square-root, log, etc. can be done on the response variable. More replicants of the experiment design could also be obtained to get better results.