Getting in data
library(DoE.base)
A<- c(-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1)
B<- c(-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1)
C<- c(-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,1,1,1,1)
D<- c(-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1)
obs<- c(1.92,11.28,1.09,5.75,2.13,9.53,1.03,5.35,1.6,11.73,1.16,4.68,2.16,9.11,1.07,5.3)
dat<- data.frame(A,B,C,D,obs)
model<-lm(obs~A*B*C*D,data=dat)
coef(model)
## (Intercept) A B C D A:B
## 4.680625 3.160625 -1.501875 -0.220625 -0.079375 -1.069375
## A:C B:C A:D B:D C:D A:B:C
## -0.298125 0.229375 -0.056875 -0.046875 0.029375 0.344375
## A:B:D A:C:D B:C:D A:B:C:D
## -0.096875 -0.010625 0.094375 0.141875
halfnormal(model)
##
## Significant effects (alpha=0.05, Lenth method):
## [1] A B A:B A:B:C
From the Half-normal plot,we can see that Factors A,B, A:B , and A:B:C are significant at alpha=0.5
summary(model)
##
## Call:
## lm.default(formula = obs ~ A * B * C * D, data = dat)
##
## Residuals:
## ALL 16 residuals are 0: no residual degrees of freedom!
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.68062 NaN NaN NaN
## A 3.16062 NaN NaN NaN
## B -1.50187 NaN NaN NaN
## C -0.22062 NaN NaN NaN
## D -0.07937 NaN NaN NaN
## A:B -1.06938 NaN NaN NaN
## A:C -0.29812 NaN NaN NaN
## B:C 0.22937 NaN NaN NaN
## A:D -0.05687 NaN NaN NaN
## B:D -0.04688 NaN NaN NaN
## C:D 0.02937 NaN NaN NaN
## A:B:C 0.34437 NaN NaN NaN
## A:B:D -0.09688 NaN NaN NaN
## A:C:D -0.01063 NaN NaN NaN
## B:C:D 0.09438 NaN NaN NaN
## A:B:C:D 0.14188 NaN NaN NaN
##
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 15 and 0 DF, p-value: NA
To select our tentative model, we would consider the significant factors from our half-normal plots,which are factors A,B,A:B,A:B:C but we would neglect the interaction effect A:B:C because they do not differ in such a large amount from our model distribution.
So therefore our tentative model can be written as
\(Y_{i,j,k}=4.68062+3.16062\alpha _{i}-1.50187\beta_{j}-1.06938\alpha\beta_{ij}+\epsilon_{ijk}\)
model2<- aov(obs~A+B+C+A*B+A*B*C,data =dat)
summary(model2)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 159.83 159.83 1563.061 1.84e-10 ***
## B 1 36.09 36.09 352.937 6.66e-08 ***
## C 1 0.78 0.78 7.616 0.02468 *
## A:B 1 18.30 18.30 178.933 9.33e-07 ***
## A:C 1 1.42 1.42 13.907 0.00579 **
## B:C 1 0.84 0.84 8.232 0.02085 *
## A:B:C 1 1.90 1.90 18.556 0.00259 **
## Residuals 8 0.82 0.10
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model2)
## hat values (leverages) are all = 0.5
## and there are no factor predictors; no plot no. 5
From the Normal Q-Q plot, we can’t assume Normality in the data since the data points in the plot doesn’t appears to fall on a straight line
Also, from the Residuals vs fitted plot, we can’t assume constant variance since all the residuals have varying spread, depleting that the variances are not equal.
Using a Log transformation on our data below
logobs <- log(obs)
dat2 <- data.frame(A,B,C,D,logobs)
model3<- lm(logobs~A*B*C*D,data = dat2)
coef(model3)
## (Intercept) A B C D A:B
## 1.185417116 0.812870345 -0.314277554 -0.006408558 -0.018077390 -0.024684570
## A:C B:C A:D B:D C:D A:B:C
## -0.039723700 -0.004225796 -0.009578245 0.003708723 0.017780432 0.063434408
## A:B:D A:C:D B:C:D A:B:C:D
## -0.029875960 -0.003740235 0.003765760 0.031322043
halfnormal(model3)
##
## Significant effects (alpha=0.05, Lenth method):
## [1] A B A:B:C
summary(model3)
##
## Call:
## lm.default(formula = logobs ~ A * B * C * D, data = dat2)
##
## Residuals:
## ALL 16 residuals are 0: no residual degrees of freedom!
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.185417 NaN NaN NaN
## A 0.812870 NaN NaN NaN
## B -0.314278 NaN NaN NaN
## C -0.006409 NaN NaN NaN
## D -0.018077 NaN NaN NaN
## A:B -0.024685 NaN NaN NaN
## A:C -0.039724 NaN NaN NaN
## B:C -0.004226 NaN NaN NaN
## A:D -0.009578 NaN NaN NaN
## B:D 0.003709 NaN NaN NaN
## C:D 0.017780 NaN NaN NaN
## A:B:C 0.063434 NaN NaN NaN
## A:B:D -0.029876 NaN NaN NaN
## A:C:D -0.003740 NaN NaN NaN
## B:C:D 0.003766 NaN NaN NaN
## A:B:C:D 0.031322 NaN NaN NaN
##
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 15 and 0 DF, p-value: NA
model4<-aov(logobs~A+B+C+A*B*C,data=dat2)
summary(model4)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 10.572 10.572 1994.556 6.98e-11 ***
## B 1 1.580 1.580 298.147 1.29e-07 ***
## C 1 0.001 0.001 0.124 0.73386
## A:B 1 0.010 0.010 1.839 0.21207
## A:C 1 0.025 0.025 4.763 0.06063 .
## B:C 1 0.000 0.000 0.054 0.82223
## A:B:C 1 0.064 0.064 12.147 0.00826 **
## Residuals 8 0.042 0.005
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model4)
After the Log transformation, we can observe that from the half-normal plots that only factors A, B, and A:B:C are significant and interactions of A:B seems insignificant as compared to pre-log transformation.
The confirmation of the half normal plot was double checked using ANOVA analysis that showed factors A,B, and A:B:C to be significant as shown above.
Based on the residuals plots, after the log transformation was performed we observed that
From the Normal Q-Q plot, we can assume Normality in the data since the data points in the plot does appears to fall on a straight line
Also, from the Residuals vs fitted plot, we can assume constant variance since all the residuals have same spread, depleting that the variances are equal.
Fitting a model of the coded variables we have
\(Y_{i,j,k,l}=1.185417+0.812870\alpha _{i}-0.314278\beta_{j}+\epsilon_{ijkl}\)
Getting in the data we have
library(DoE.base)
A<-c(-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1)
B<-c(-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1)
C<-c(-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,1,1,1,1,-1,-1,-1,-1,1,1,1,1)
D<-c(-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1)
E<- c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
obs<- c(8.11,5.56,5.77,5.82,9.17,7.8,3.23,5.69,8.82,14.23,9.2,8.94,8.68,11.49,6.25,9.12,7.93,5,7.47,12,9.86,3.65,6.4,11.61,12.43,17.55,8.87,25.38,13.06,18.85,11.78,26.05)
dat<- data.frame(A,B,C,D,E,obs)
model<-lm(obs~A*B*C*D*E,data=dat)
coef(model)
## (Intercept) A B C D E
## 10.1803125 1.6159375 0.0434375 -0.0121875 2.9884375 2.1878125
## A:B A:C B:C A:D B:D C:D
## 1.2365625 -0.0015625 -0.1953125 1.6665625 -0.0134375 0.0034375
## A:E B:E C:E D:E A:B:C A:B:D
## 1.0271875 1.2834375 0.3015625 1.3896875 0.2503125 -0.3453125
## A:C:D B:C:D A:B:E A:C:E B:C:E A:D:E
## -0.0634375 0.3053125 1.1853125 -0.2590625 0.1709375 0.9015625
## B:D:E C:D:E A:B:C:D A:B:C:E A:B:D:E A:C:D:E
## -0.0396875 0.3959375 -0.0740625 -0.1846875 0.4071875 0.1278125
## B:C:D:E A:B:C:D:E
## -0.0746875 -0.3553125
halfnormal(model)
##
## Significant effects (alpha=0.05, Lenth method):
## [1] D E A:D A D:E B:E A:B A:B:E A:E A:D:E
summary(model)
##
## Call:
## lm.default(formula = obs ~ A * B * C * D * E, data = dat)
##
## Residuals:
## ALL 32 residuals are 0: no residual degrees of freedom!
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.180312 NaN NaN NaN
## A 1.615938 NaN NaN NaN
## B 0.043438 NaN NaN NaN
## C -0.012187 NaN NaN NaN
## D 2.988437 NaN NaN NaN
## E 2.187813 NaN NaN NaN
## A:B 1.236562 NaN NaN NaN
## A:C -0.001563 NaN NaN NaN
## B:C -0.195313 NaN NaN NaN
## A:D 1.666563 NaN NaN NaN
## B:D -0.013438 NaN NaN NaN
## C:D 0.003437 NaN NaN NaN
## A:E 1.027188 NaN NaN NaN
## B:E 1.283437 NaN NaN NaN
## C:E 0.301563 NaN NaN NaN
## D:E 1.389687 NaN NaN NaN
## A:B:C 0.250313 NaN NaN NaN
## A:B:D -0.345312 NaN NaN NaN
## A:C:D -0.063437 NaN NaN NaN
## B:C:D 0.305312 NaN NaN NaN
## A:B:E 1.185313 NaN NaN NaN
## A:C:E -0.259062 NaN NaN NaN
## B:C:E 0.170938 NaN NaN NaN
## A:D:E 0.901563 NaN NaN NaN
## B:D:E -0.039687 NaN NaN NaN
## C:D:E 0.395938 NaN NaN NaN
## A:B:C:D -0.074063 NaN NaN NaN
## A:B:C:E -0.184688 NaN NaN NaN
## A:B:D:E 0.407187 NaN NaN NaN
## A:C:D:E 0.127812 NaN NaN NaN
## B:C:D:E -0.074688 NaN NaN NaN
## A:B:C:D:E -0.355312 NaN NaN NaN
##
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 31 and 0 DF, p-value: NA
summary(model)
##
## Call:
## lm.default(formula = obs ~ A * B * C * D * E, data = dat)
##
## Residuals:
## ALL 32 residuals are 0: no residual degrees of freedom!
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.180312 NaN NaN NaN
## A 1.615938 NaN NaN NaN
## B 0.043438 NaN NaN NaN
## C -0.012187 NaN NaN NaN
## D 2.988437 NaN NaN NaN
## E 2.187813 NaN NaN NaN
## A:B 1.236562 NaN NaN NaN
## A:C -0.001563 NaN NaN NaN
## B:C -0.195313 NaN NaN NaN
## A:D 1.666563 NaN NaN NaN
## B:D -0.013438 NaN NaN NaN
## C:D 0.003437 NaN NaN NaN
## A:E 1.027188 NaN NaN NaN
## B:E 1.283437 NaN NaN NaN
## C:E 0.301563 NaN NaN NaN
## D:E 1.389687 NaN NaN NaN
## A:B:C 0.250313 NaN NaN NaN
## A:B:D -0.345312 NaN NaN NaN
## A:C:D -0.063437 NaN NaN NaN
## B:C:D 0.305312 NaN NaN NaN
## A:B:E 1.185313 NaN NaN NaN
## A:C:E -0.259062 NaN NaN NaN
## B:C:E 0.170938 NaN NaN NaN
## A:D:E 0.901563 NaN NaN NaN
## B:D:E -0.039687 NaN NaN NaN
## C:D:E 0.395938 NaN NaN NaN
## A:B:C:D -0.074063 NaN NaN NaN
## A:B:C:E -0.184688 NaN NaN NaN
## A:B:D:E 0.407187 NaN NaN NaN
## A:C:D:E 0.127812 NaN NaN NaN
## B:C:D:E -0.074688 NaN NaN NaN
## A:B:C:D:E -0.355312 NaN NaN NaN
##
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 31 and 0 DF, p-value: NA
model2<- aov(obs~A+B+D+E+A*B+A*D+A*E+B*E+D*E+A*B*E+A*D*E,data=dat)
summary(model2)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 83.56 83.56 51.362 6.10e-07 ***
## B 1 0.06 0.06 0.037 0.849178
## D 1 285.78 285.78 175.664 2.30e-11 ***
## E 1 153.17 153.17 94.149 5.24e-09 ***
## A:B 1 48.93 48.93 30.076 2.28e-05 ***
## A:D 1 88.88 88.88 54.631 3.87e-07 ***
## A:E 1 33.76 33.76 20.754 0.000192 ***
## B:E 1 52.71 52.71 32.400 1.43e-05 ***
## D:E 1 61.80 61.80 37.986 5.07e-06 ***
## A:B:E 1 44.96 44.96 27.635 3.82e-05 ***
## A:D:E 1 26.01 26.01 15.988 0.000706 ***
## Residuals 20 32.54 1.63
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We can observe that from the half-normal plots that A,D,E,A:D,D:E,B:E,A:B,A:E,A:B:E,A:D:E are significant
The confirmation of the half normal plot was double checked using ANOVA analysis that showed that the above factors mentioned were significant
plot(model2)
## hat values (leverages) are all = 0.375
## and there are no factor predictors; no plot no. 5
From the Normal Q-Q plot, we can’t assume Normality in the data since the data points in the plot doesn’t appears to fall on a straight line
Also, from the Residuals vs fitted plot, we can’t assume constant variance since all the residuals have varying spread, depleting that the variances are not equal.
A<- c(-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1,-1,1)
B<- c(-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1,-1,-1,1,1)
D<- c(-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1)
E<- c(-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1)
obs<- c(8.11,5.56,5.77,5.82,9.17,7.8,3.23,5.69,8.82,14.23,9.2,8.94,8.68,11.49,6.25,9.12,7.93,5,7.47,12,9.86,3.65,6.4,11.61,12.43,17.55,8.87,25.38,13.06,18.85,11.78,26.05)
dat<- data.frame(A,B,D,E,obs)
model4<- lm(obs~A*B*D*E,data =dat)
coef(model4)
## (Intercept) A B D E A:B
## 10.1803125 1.6159375 0.0434375 2.9884375 2.1878125 1.2365625
## A:D B:D A:E B:E D:E A:B:D
## 1.6665625 -0.0134375 1.0271875 1.2834375 1.3896875 -0.3453125
## A:B:E A:D:E B:D:E A:B:D:E
## 1.1853125 0.9015625 -0.0396875 0.4071875
halfnormal(model4)
##
## Significant effects (alpha=0.05, Lenth method):
## [1] D E A:D A D:E B:E A:B A:B:E A:E A:D:E e10
summary(model4)
##
## Call:
## lm.default(formula = obs ~ A * B * D * E, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4750 -0.5637 0.0000 0.5637 1.4750
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 10.18031 0.21360 47.661 < 2e-16 ***
## A 1.61594 0.21360 7.565 1.14e-06 ***
## B 0.04344 0.21360 0.203 0.841418
## D 2.98844 0.21360 13.991 2.16e-10 ***
## E 2.18781 0.21360 10.243 1.97e-08 ***
## A:B 1.23656 0.21360 5.789 2.77e-05 ***
## A:D 1.66656 0.21360 7.802 7.66e-07 ***
## B:D -0.01344 0.21360 -0.063 0.950618
## A:E 1.02719 0.21360 4.809 0.000193 ***
## B:E 1.28344 0.21360 6.009 1.82e-05 ***
## D:E 1.38969 0.21360 6.506 7.24e-06 ***
## A:B:D -0.34531 0.21360 -1.617 0.125501
## A:B:E 1.18531 0.21360 5.549 4.40e-05 ***
## A:D:E 0.90156 0.21360 4.221 0.000650 ***
## B:D:E -0.03969 0.21360 -0.186 0.854935
## A:B:D:E 0.40719 0.21360 1.906 0.074735 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.208 on 16 degrees of freedom
## Multiple R-squared: 0.9744, Adjusted R-squared: 0.9504
## F-statistic: 40.58 on 15 and 16 DF, p-value: 7.07e-10
model5<- aov(obs~A+B+D+E+A*B+A*D+A*E+B*E+D*E+A*B*E+A*D*E,data=dat)
summary(model5)
## Df Sum Sq Mean Sq F value Pr(>F)
## A 1 83.56 83.56 51.362 6.10e-07 ***
## B 1 0.06 0.06 0.037 0.849178
## D 1 285.78 285.78 175.664 2.30e-11 ***
## E 1 153.17 153.17 94.149 5.24e-09 ***
## A:B 1 48.93 48.93 30.076 2.28e-05 ***
## A:D 1 88.88 88.88 54.631 3.87e-07 ***
## A:E 1 33.76 33.76 20.754 0.000192 ***
## B:E 1 52.71 52.71 32.400 1.43e-05 ***
## D:E 1 61.80 61.80 37.986 5.07e-06 ***
## A:B:E 1 44.96 44.96 27.635 3.82e-05 ***
## A:D:E 1 26.01 26.01 15.988 0.000706 ***
## Residuals 20 32.54 1.63
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(model5)
## hat values (leverages) are all = 0.375
## and there are no factor predictors; no plot no. 5
Factor C was dropped totally from our model initially because it seemed insignificant. After dropping factor C we realized that from the half-normal plots that A,D,E,A:D,D:E,B:E,A:B,A:E,A:B:E,A:D:E are significant
The confirmation of the half normal plot was double checked using ANOVA analysis that showed that the above factors mentioned were significant
Which was similar to the results gotten when factor C was included.
Fitting our model in a linear equation we can see that
\(Y_{i,j,k,l}=10.18031+1.61594\alpha _{i}+0.04344\beta_{j}+2.98844\gamma_{k}+2.18781\delta_{l}+ \epsilon_{ijkl}\)
To maximize the predicted response, the above linear model should be follows
Note
Because all the factors are of positive coefficient, therefore they should be at a +1 level to produce maximum response.