ANLY 510 - Weekly Lab 5

Saurav Chopra

2018-06-04


Objectives

The cheapskates at your firm have run a study using no replicants and a 25 design and have not told you what each effect is. The data is below. Find the meaningful effects, analyze the data, check your assumptions (particularly model fit), and provide a succinct write up of the final results

Loading libraries

library(moments)
library(kableExtra)

Loading the dataset

cData <- read.csv("WeeklyLab5.csv")
kable(cData, "html") %>%
  kable_styling(bootstrap_options = "striped")
Effect.A Effect.B Effect.C Effect.D Effect.E Y
1 0 0 0 0 24.04330
0 1 0 1 0 89.39603
0 1 1 0 1 80.69679
1 0 1 0 0 39.48919
0 1 1 0 0 85.00000
1 1 1 0 1 98.88348
1 1 0 0 0 69.38102
0 0 0 0 1 85.84027
0 1 0 1 1 96.17769
0 1 1 1 0 29.07845
0 0 1 1 0 104.28087
1 0 0 1 1 93.00000
0 0 0 0 0 57.00000
0 0 0 1 0 52.10862
1 0 0 1 0 78.14386
1 0 1 0 1 77.80883
1 1 1 1 1 105.58464
0 1 0 0 1 112.74031
1 1 0 0 1 70.22562
1 1 1 1 0 101.52252
0 0 1 0 1 106.23952
0 1 1 1 1 51.08884
1 1 1 0 0 62.42781
0 0 1 0 0 39.00000
0 0 1 1 1 40.76413
1 1 0 1 0 27.01754
1 0 0 0 1 57.64639
0 0 0 1 1 27.78301
1 1 0 1 1 81.87806
1 0 1 1 0 88.96438
0 1 0 0 0 20.90311
1 0 1 1 1 71.62483
summary(cData)
##     Effect.A      Effect.B      Effect.C      Effect.D      Effect.E  
##  Min.   :0.0   Min.   :0.0   Min.   :0.0   Min.   :0.0   Min.   :0.0  
##  1st Qu.:0.0   1st Qu.:0.0   1st Qu.:0.0   1st Qu.:0.0   1st Qu.:0.0  
##  Median :0.5   Median :0.5   Median :0.5   Median :0.5   Median :0.5  
##  Mean   :0.5   Mean   :0.5   Mean   :0.5   Mean   :0.5   Mean   :0.5  
##  3rd Qu.:1.0   3rd Qu.:1.0   3rd Qu.:1.0   3rd Qu.:1.0   3rd Qu.:1.0  
##  Max.   :1.0   Max.   :1.0   Max.   :1.0   Max.   :1.0   Max.   :1.0  
##        Y         
##  Min.   : 20.90  
##  1st Qu.: 48.51  
##  Median : 74.72  
##  Mean   : 69.55  
##  3rd Qu.: 90.30  
##  Max.   :112.74

Plotting the distribution to see if there is any skewness in the data or not

plot(density(cData$Y))

From the above plot, it seems that there is a bit of negative skewness in the data. The level of skewness can be identified by using D’Agostino skewness test

agostino.test(cData$Y)
## 
##  D'Agostino skewness test
## 
## data:  cData$Y
## skew = -0.28417, z = -0.75491, p-value = 0.4503
## alternative hypothesis: data have a skewness

The skewness value is close to zero so we can assume that the Y values are normally distributed.

Now this data follows a 2^k factorial design with k being 5 so there could be multiple levels of interactions between these effect variables. These interactions are pretty complex to model

Converting the variables from int to factors

cData$Effect.A <- factor(cData$Effect.A)
cData$Effect.B <- factor(cData$Effect.B)
cData$Effect.C <- factor(cData$Effect.C)
cData$Effect.D <- factor(cData$Effect.D)
cData$Effect.E <- factor(cData$Effect.E)

Data exploration:

Start by plotting the response data several ways to see if any trends or anomalies appear that would not be accounted for by the standard linear response models.

par(bg=rgb(1,1,0.8), mfrow=c(2,2))
qqnorm(cData$Y)
qqline(cData$Y, col = 2)
boxplot(cData$Y, horizontal=TRUE, main="Box Plot", xlab="Experiment Response")
hist(cData$Y, main="Histogram", xlab="Experiment Response")
par(mfrow=c(1,1))

par(bg=rgb(1,1,0.8),mfrow=c(2,3))
boxplot(Y~Effect.A, data=cData, main="Response by Effect A",
        xlab="Effect A",ylab="Response")

boxplot(Y~Effect.B, data=cData, main="Response by Effect B",
        xlab="Effect B",ylab="Response")

boxplot(Y~Effect.C, data=cData, main="Response by Effect C",
        xlab="Effect C",ylab="Response")

boxplot(Y~Effect.D, data=cData, main="Response by Effect D",
        xlab="Effect D",ylab="Response")

boxplot(Y~Effect.E, data=cData, main="Response by Effect E",
        xlab="Effect E",ylab="Response")
par(mfrow=c(1,1))

From the above, it seems that Effect.E followed by Effect.B appear to change the response variable Y as compared to the other effects. However, in the model building process, we will try all 5 effects and their interaction effects.

Model building:

Let’s build the first model. For a 25 full factorial experiment we can fit a model containing a mean term, Five main effect terms, Ten two-factor interaction terms, Ten three-factor interaction terms, Five four-factor interaction terms, and a Five-factor interaction term (32 parameters). However, we start by assuming all four-factor and higher interaction terms are non-existent. It’s very rare for such high-order interactions to be significant, and they are very difficult to interpret from an engineering viewpoint.

Model1 <- aov(Y~(Effect.A + Effect.B + Effect.C + Effect.D + Effect.E)^3, data = cData)
summary(Model1)
##                            Df Sum Sq Mean Sq F value Pr(>F)
## Effect.A                    1    151   151.1   0.135  0.726
## Effect.B                    1    597   597.4   0.534  0.492
## Effect.C                    1    605   605.3   0.541  0.490
## Effect.D                    1     82    81.6   0.073  0.796
## Effect.E                    1   2632  2632.2   2.355  0.176
## Effect.A:Effect.B           1     36    36.4   0.033  0.863
## Effect.A:Effect.C           1    710   710.4   0.635  0.456
## Effect.A:Effect.D           1   1869  1869.2   1.672  0.244
## Effect.A:Effect.E           1     53    52.8   0.047  0.835
## Effect.B:Effect.C           1     66    66.2   0.059  0.816
## Effect.B:Effect.D           1    243   242.6   0.217  0.658
## Effect.B:Effect.E           1    568   568.5   0.509  0.503
## Effect.C:Effect.D           1     61    61.5   0.055  0.822
## Effect.C:Effect.E           1    483   483.4   0.432  0.535
## Effect.D:Effect.E           1   2728  2727.8   2.440  0.169
## Effect.A:Effect.B:Effect.C  1   1737  1737.1   1.554  0.259
## Effect.A:Effect.B:Effect.D  1    677   677.2   0.606  0.466
## Effect.A:Effect.B:Effect.E  1    207   206.6   0.185  0.682
## Effect.A:Effect.C:Effect.D  1    345   344.9   0.308  0.599
## Effect.A:Effect.C:Effect.E  1     48    47.6   0.043  0.843
## Effect.A:Effect.D:Effect.E  1   1127  1126.7   1.008  0.354
## Effect.B:Effect.C:Effect.D  1    188   187.9   0.168  0.696
## Effect.B:Effect.C:Effect.E  1    144   143.8   0.129  0.732
## Effect.B:Effect.D:Effect.E  1   1529  1529.2   1.368  0.287
## Effect.C:Effect.D:Effect.E  1    251   250.6   0.224  0.653
## Residuals                   6   6707  1117.9

From the above Anova model, it seems that none of the variables including the interactions effects came out to be significant. Let’s try 2 order interactions instead of 3.

Model2 <- aov(Y~(Effect.A + Effect.B + Effect.C + Effect.D + Effect.E)^2, data = cData)
summary(Model2)
##                   Df Sum Sq Mean Sq F value Pr(>F)  
## Effect.A           1    151   151.1   0.187 0.6715  
## Effect.B           1    597   597.4   0.738 0.4031  
## Effect.C           1    605   605.3   0.747 0.4001  
## Effect.D           1     82    81.6   0.101 0.7551  
## Effect.E           1   2632  2632.2   3.250 0.0903 .
## Effect.A:Effect.B  1     36    36.4   0.045 0.8348  
## Effect.A:Effect.C  1    710   710.4   0.877 0.3629  
## Effect.A:Effect.D  1   1869  1869.2   2.308 0.1482  
## Effect.A:Effect.E  1     53    52.8   0.065 0.8018  
## Effect.B:Effect.C  1     66    66.2   0.082 0.7785  
## Effect.B:Effect.D  1    243   242.6   0.300 0.5917  
## Effect.B:Effect.E  1    568   568.5   0.702 0.4145  
## Effect.C:Effect.D  1     61    61.5   0.076 0.7864  
## Effect.C:Effect.E  1    483   483.4   0.597 0.4511  
## Effect.D:Effect.E  1   2728  2727.8   3.368 0.0851 .
## Residuals         16  12959   809.9                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
ModelStepWise = step(Model2,direction="backward")
## Start:  AIC=224.12
## Y ~ (Effect.A + Effect.B + Effect.C + Effect.D + Effect.E)^2
## 
##                     Df Sum of Sq   RSS    AIC
## - Effect.A:Effect.B  1     36.41 12995 222.21
## - Effect.A:Effect.E  1     52.78 13012 222.25
## - Effect.C:Effect.D  1     61.50 13020 222.27
## - Effect.B:Effect.C  1     66.25 13025 222.28
## - Effect.B:Effect.D  1    242.64 13201 222.72
## - Effect.C:Effect.E  1    483.37 13442 223.29
## - Effect.B:Effect.E  1    568.45 13527 223.50
## - Effect.A:Effect.C  1    710.37 13669 223.83
## <none>                           12959 224.12
## - Effect.A:Effect.D  1   1869.24 14828 226.43
## - Effect.D:Effect.E  1   2727.79 15687 228.23
## 
## Step:  AIC=222.21
## Y ~ Effect.A + Effect.B + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C + 
##     Effect.A:Effect.D + Effect.A:Effect.E + Effect.B:Effect.C + 
##     Effect.B:Effect.D + Effect.B:Effect.E + Effect.C:Effect.D + 
##     Effect.C:Effect.E + Effect.D:Effect.E
## 
##                     Df Sum of Sq   RSS    AIC
## - Effect.A:Effect.E  1     52.78 13048 220.34
## - Effect.C:Effect.D  1     61.50 13057 220.36
## - Effect.B:Effect.C  1     66.25 13061 220.37
## - Effect.B:Effect.D  1    242.64 13238 220.80
## - Effect.C:Effect.E  1    483.37 13479 221.38
## - Effect.B:Effect.E  1    568.45 13564 221.58
## - Effect.A:Effect.C  1    710.37 13706 221.91
## <none>                           12995 222.21
## - Effect.A:Effect.D  1   1869.24 14864 224.51
## - Effect.D:Effect.E  1   2727.79 15723 226.31
## 
## Step:  AIC=220.34
## Y ~ Effect.A + Effect.B + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C + 
##     Effect.A:Effect.D + Effect.B:Effect.C + Effect.B:Effect.D + 
##     Effect.B:Effect.E + Effect.C:Effect.D + Effect.C:Effect.E + 
##     Effect.D:Effect.E
## 
##                     Df Sum of Sq   RSS    AIC
## - Effect.C:Effect.D  1     61.50 13110 218.49
## - Effect.B:Effect.C  1     66.25 13114 218.50
## - Effect.B:Effect.D  1    242.64 13291 218.93
## - Effect.C:Effect.E  1    483.37 13531 219.50
## - Effect.B:Effect.E  1    568.45 13616 219.71
## - Effect.A:Effect.C  1    710.37 13758 220.04
## <none>                           13048 220.34
## - Effect.A:Effect.D  1   1869.24 14917 222.62
## - Effect.D:Effect.E  1   2727.79 15776 224.42
## 
## Step:  AIC=218.49
## Y ~ Effect.A + Effect.B + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C + 
##     Effect.A:Effect.D + Effect.B:Effect.C + Effect.B:Effect.D + 
##     Effect.B:Effect.E + Effect.C:Effect.E + Effect.D:Effect.E
## 
##                     Df Sum of Sq   RSS    AIC
## - Effect.B:Effect.C  1     66.25 13176 216.65
## - Effect.B:Effect.D  1    242.64 13352 217.08
## - Effect.C:Effect.E  1    483.37 13593 217.65
## - Effect.B:Effect.E  1    568.45 13678 217.85
## - Effect.A:Effect.C  1    710.37 13820 218.18
## <none>                           13110 218.49
## - Effect.A:Effect.D  1   1869.24 14979 220.76
## - Effect.D:Effect.E  1   2727.79 15837 222.54
## 
## Step:  AIC=216.65
## Y ~ Effect.A + Effect.B + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C + 
##     Effect.A:Effect.D + Effect.B:Effect.D + Effect.B:Effect.E + 
##     Effect.C:Effect.E + Effect.D:Effect.E
## 
##                     Df Sum of Sq   RSS    AIC
## - Effect.B:Effect.D  1    242.64 13418 215.24
## - Effect.C:Effect.E  1    483.37 13659 215.81
## - Effect.B:Effect.E  1    568.45 13744 216.00
## - Effect.A:Effect.C  1    710.37 13886 216.33
## <none>                           13176 216.65
## - Effect.A:Effect.D  1   1869.24 15045 218.90
## - Effect.D:Effect.E  1   2727.79 15904 220.67
## 
## Step:  AIC=215.24
## Y ~ Effect.A + Effect.B + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C + 
##     Effect.A:Effect.D + Effect.B:Effect.E + Effect.C:Effect.E + 
##     Effect.D:Effect.E
## 
##                     Df Sum of Sq   RSS    AIC
## - Effect.C:Effect.E  1    483.37 13902 214.37
## - Effect.B:Effect.E  1    568.45 13987 214.56
## - Effect.A:Effect.C  1    710.37 14129 214.89
## <none>                           13418 215.24
## - Effect.A:Effect.D  1   1869.24 15288 217.41
## - Effect.D:Effect.E  1   2727.79 16146 219.16
## 
## Step:  AIC=214.37
## Y ~ Effect.A + Effect.B + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C + 
##     Effect.A:Effect.D + Effect.B:Effect.E + Effect.D:Effect.E
## 
##                     Df Sum of Sq   RSS    AIC
## - Effect.B:Effect.E  1    568.45 14470 213.65
## - Effect.A:Effect.C  1    710.37 14612 213.96
## <none>                           13902 214.37
## - Effect.A:Effect.D  1   1869.24 15771 216.41
## - Effect.D:Effect.E  1   2727.79 16630 218.10
## 
## Step:  AIC=213.65
## Y ~ Effect.A + Effect.B + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C + 
##     Effect.A:Effect.D + Effect.D:Effect.E
## 
##                     Df Sum of Sq   RSS    AIC
## - Effect.B           1    597.41 15068 212.95
## - Effect.A:Effect.C  1    710.37 15180 213.19
## <none>                           14470 213.65
## - Effect.A:Effect.D  1   1869.24 16339 215.54
## - Effect.D:Effect.E  1   2727.79 17198 217.18
## 
## Step:  AIC=212.95
## Y ~ Effect.A + Effect.C + Effect.D + Effect.E + Effect.A:Effect.C + 
##     Effect.A:Effect.D + Effect.D:Effect.E
## 
##                     Df Sum of Sq   RSS    AIC
## - Effect.A:Effect.C  1    710.37 15778 212.42
## <none>                           15068 212.95
## - Effect.A:Effect.D  1   1869.24 16937 214.69
## - Effect.D:Effect.E  1   2727.79 17795 216.27
## 
## Step:  AIC=212.42
## Y ~ Effect.A + Effect.C + Effect.D + Effect.E + Effect.A:Effect.D + 
##     Effect.D:Effect.E
## 
##                     Df Sum of Sq   RSS    AIC
## - Effect.C           1    605.25 16383 211.62
## <none>                           15778 212.42
## - Effect.A:Effect.D  1   1869.24 17647 214.00
## - Effect.D:Effect.E  1   2727.79 18506 215.52
## 
## Step:  AIC=211.62
## Y ~ Effect.A + Effect.D + Effect.E + Effect.A:Effect.D + Effect.D:Effect.E
## 
##                     Df Sum of Sq   RSS    AIC
## <none>                           16383 211.62
## - Effect.A:Effect.D  1    1869.2 18252 213.08
## - Effect.D:Effect.E  1    2727.8 19111 214.55
summary(ModelStepWise)
##                   Df Sum Sq Mean Sq F value Pr(>F)  
## Effect.A           1    151   151.1   0.240 0.6284  
## Effect.D           1     82    81.6   0.129 0.7219  
## Effect.E           1   2632  2632.2   4.177 0.0512 .
## Effect.A:Effect.D  1   1869  1869.2   2.966 0.0969 .
## Effect.D:Effect.E  1   2728  2727.8   4.329 0.0475 *
## Residuals         26  16383   630.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Using regression and anova models, we have got a few variables that are coming out to be significant/close to being significant

Building a model without insignificant terms

Model3 <- aov(Y~ Effect.E + Effect.A:Effect.D + Effect.E:Effect.D, data = cData)
summary(Model3)
##                   Df Sum Sq Mean Sq F value Pr(>F)  
## Effect.E           1   2632  2632.2   4.177 0.0512 .
## Effect.A:Effect.D  3   2102   700.6   1.112 0.3622  
## Effect.E:Effect.D  1   2728  2727.8   4.329 0.0475 *
## Residuals         26  16383   630.1                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary.lm(Model3)
## 
## Call:
## aov(formula = Y ~ Effect.E + Effect.A:Effect.D + Effect.E:Effect.D, 
##     data = cData)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -54.113 -12.306  -2.984  18.780  42.783 
## 
## Coefficients: (1 not defined because of singularities)
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            81.13      10.87   7.464 6.33e-08 ***
## Effect.E1              36.60      12.55   2.916   0.0072 ** 
## Effect.A0:Effect.D0   -26.00      15.37  -1.692   0.1027    
## Effect.A1:Effect.D0   -36.94      15.37  -2.403   0.0237 *  
## Effect.A0:Effect.D1   -19.63      12.55  -1.564   0.1299    
## Effect.A1:Effect.D1       NA         NA      NA       NA    
## Effect.E1:Effect.D1   -36.93      17.75  -2.081   0.0475 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 25.1 on 26 degrees of freedom
## Multiple R-squared:  0.3129, Adjusted R-squared:  0.1808 
## F-statistic: 2.368 on 5 and 26 DF,  p-value: 0.06737

Fit a model with all the interaction effects and save those effects

Model4 <- aov(Y~(Effect.A + Effect.B + Effect.C + Effect.D + Effect.E)^5, data = cData)
ModelEffects <- Model4$effects
ModelEffects <- ModelEffects[-1] 
SortModelEffects <- ModelEffects[order(ModelEffects)]
ModelEffectsNames = names(SortModelEffects)

Quantiles and normal probability plot of all effects

ip = ppoints(length(SortModelEffects))
zp = qnorm(ip)
par(bg=rgb(1,1,0.8))
plot(zp,SortModelEffects, ylim=c(-120,70), xlim=c(-2,3),
     ylab="Effect", xlab="Theoretical Quantiles",
     main="Normal Probability Plot of Model Effects")
qqline(SortModelEffects, col=2)
abline(h=0, col=4)
small = c(1:2)
small2 = c((length(SortModelEffects)-1):(length(SortModelEffects)))
text(zp[small],SortModelEffects[small],label=ModelEffectsNames[small],pos=2,cex=0.8)
text(zp[small2],SortModelEffects[small2],label=ModelEffectsNames[small2],pos=2,cex=0.8)

par(mfrow=c(1,1))

Based on the above plot, we can see that most of the effects aren’t significant as they follow the fitted normal model straight line. Only a couple of effects - Effect E and its interaction with Effect D - are away from the straight line and are significant (as suggested by our Anova Model3)

Conclusion:

We have developed an anova model that is significant with an adjusted R-square of 0.196. However, the adjusted R-square is pretty low because out of the 5 effects only a couple (including interaction) have a significant effect on the response variable Y. Multiple interaction effects of different orders were considered but none of them came out to be significant. This analysis basically shows that Effect E is the main drive of the response along with its interaction with Effect D. To develop a more robust model, further transformations such as square-root, log, etc. can be done on the response variable. More replicants of the experiment design could also be obtained to get better results.