library(tidyverse)
library(kableExtra)
library(ggplot2)
library(plotly)
library(plot3D)
Design of experiment (DoE) is a method of planning and conducting experiments in a systematic and efficient way. It allows you to explore the effects of multiple factors on a response variable, and to find the optimal settings for the factors. DoE can help you to improve the quality and performance of a product or a process, and to reduce the costs and time of experimentation. DoE can be applied to various fields, such as engineering, food science, business, and education. On this page there are some examples on how to set some of the most popular DoE models.
We have \(3^2\) experimental conditions for two factors \(x_1\) and \(x_2\)
factors = 2
levels = c(-1,0,1)
x1 = c()
x2 = c()
for (j in 1:length(levels)) {
for (i in 1:length(levels)) {
x1 = c(x1, levels[i])
x2 = c(x2, levels[j])
}
}
twodf = data.frame(row.names = c(1:length(x1)), x1,x2)
kable(twodf, row.names = T) %>%
kable_styling('striped',fixed_thead = T, full_width = F) %>%
column_spec(2,color= 'white',
background = spec_color(twodf$x1,end = 0.7, option = 'A' )) %>%
column_spec(3,color= 'white',
background = spec_color(twodf$x2 ,end = 0.7, option = 'A' ))
| x1 | x2 | |
|---|---|---|
| 1 | -1 | -1 |
| 2 | 0 | -1 |
| 3 | 1 | -1 |
| 4 | -1 | 0 |
| 5 | 0 | 0 |
| 6 | 1 | 0 |
| 7 | -1 | 1 |
| 8 | 0 | 1 |
| 9 | 1 | 1 |
ggplot(twodf, aes(x1,x2))+geom_point(color = 'cornflowerblue')+
theme_bw()+
geom_text(label = rownames(twodf), nudge_x = 0.05)+
theme(aspect.ratio = 1)
To calculate the effects you can use the following postulated model:
\[ Y = b_0+b_1x_1+b_2x_2+b_{11}x_1^2+b_{22}x_2^2+b_{12}x_1x_2 \]
factors = 3
levels = c(-1,0,1)
x1 = c()
x2 = c()
x3 = c()
for (k in 1:length(levels))
for (j in 1:length(levels)) {
for (i in 1:length(levels)) {
x1 = c(x1, levels[i])
x2 = c(x2, levels[j])
x3 = c(x3, levels[k])
}
}
threedf = data.frame(row.names = c(1:length(x1)), x1,x2, x3)
kable(threedf, row.names = T) %>%
kable_styling('striped',fixed_thead = T, full_width = F) %>%
column_spec(2,color= 'white',
background = spec_color(threedf$x1,end = 0.7, option = 'A' )) %>%
column_spec(3,color= 'white',
background = spec_color(threedf$x2 ,end = 0.7, option = 'A' )) %>%
column_spec(4,color= 'white',
background = spec_color(threedf$x3 ,end = 0.7, option = 'A' )) %>%
scroll_box(threedf, height = '400px')
| x1 | x2 | x3 | |
|---|---|---|---|
| 1 | -1 | -1 | -1 |
| 2 | 0 | -1 | -1 |
| 3 | 1 | -1 | -1 |
| 4 | -1 | 0 | -1 |
| 5 | 0 | 0 | -1 |
| 6 | 1 | 0 | -1 |
| 7 | -1 | 1 | -1 |
| 8 | 0 | 1 | -1 |
| 9 | 1 | 1 | -1 |
| 10 | -1 | -1 | 0 |
| 11 | 0 | -1 | 0 |
| 12 | 1 | -1 | 0 |
| 13 | -1 | 0 | 0 |
| 14 | 0 | 0 | 0 |
| 15 | 1 | 0 | 0 |
| 16 | -1 | 1 | 0 |
| 17 | 0 | 1 | 0 |
| 18 | 1 | 1 | 0 |
| 19 | -1 | -1 | 1 |
| 20 | 0 | -1 | 1 |
| 21 | 1 | -1 | 1 |
| 22 | -1 | 0 | 1 |
| 23 | 0 | 0 | 1 |
| 24 | 1 | 0 | 1 |
| 25 | -1 | 1 | 1 |
| 26 | 0 | 1 | 1 |
| 27 | 1 | 1 | 1 |
plot_ly(threedf, x = ~x1, y = ~x2, z = ~x3) %>%
add_markers(size= 0.5)
The postulated model will be:
\[
Y =
b_0+b_1x_1+b_2x_2+b_3x_3+b_{11}x_1^2+b_{22}x_2^2+b_{33}x_3^2+b_{12}x_1x_2+b_{13}x_1x_3+b_{23}x_2x_3+b_{123}x_1x_2x_3
\]
To decrease the number of experiments with respect to a \(3^n\) full FD, a Central Composite Design (CCD) can be adopted, which consists in three parts:
The total number of design points needed \((N)\) is determined by the formula \(N = 2^k + 2k + C_0\), where \(k\) is the number of variables (factors)
and \(C_0\) is the number of center
points.
The distance from the center of the points of the star design, \(\alpha\), can be for \(f\) factors:
So, for example, for \(2\) factors
\(\alpha\) will be \(2^{2/4} = 2^{1/2} = \sqrt{2}\)
and the postulted model:
\[
Y = b_0+b_1x_1+b_2x_2+b_{11}x_1^2+b_{22}x_2^2+b_{12}x_1x_2
\]
Suppose we want to find the optimal yield of a generic reaction to
the change of time and temperature
We can collect the data following the central composite design then we apply the postulated model.
factor = 2
leveltime = c(81.5,88.5)
leveltemp = c(171.5,178.5)
# initialize two vectors
time = c()
temp = c()
# 2^2 factorial design
for (j in 1:length(leveltime)) {
for (i in 1:length(leveltemp)) {
time = c(time, leveltime[i])
temp = c(temp, leveltemp[j])
}
}
# star design
time = c(time, 80,90,85,85)
temp = c(temp,175,175,170,180)
# adding 5 replicas of point 0
time = c(time,85,85,85,85,85)
temp = c(temp,175,175,175,175,175)
# experimental response
Y = c(76,78,77,79.5,75.6,78.4,77,78.5,79.9,80.3,80,79.7,79.8)
# printing dataframe
ccddf = data.frame(row.names = c(1:length(time)),time,temp, Y)
kable(ccddf, row.names = T) %>%
kable_styling('striped',fixed_thead = T, full_width = F) %>%
column_spec(2,color= 'white',
background = spec_color(ccddf$time,end = 0.7, option = 'A' )) %>%
column_spec(3,color= 'white',
background = spec_color(ccddf$temp ,end = 0.7, option = 'A' )) %>%
column_spec(4,color= 'black', bold = T)
| time | temp | Y | |
|---|---|---|---|
| 1 | 81.5 | 171.5 | 76.0 |
| 2 | 88.5 | 171.5 | 78.0 |
| 3 | 81.5 | 178.5 | 77.0 |
| 4 | 88.5 | 178.5 | 79.5 |
| 5 | 80.0 | 175.0 | 75.6 |
| 6 | 90.0 | 175.0 | 78.4 |
| 7 | 85.0 | 170.0 | 77.0 |
| 8 | 85.0 | 180.0 | 78.5 |
| 9 | 85.0 | 175.0 | 79.9 |
| 10 | 85.0 | 175.0 | 80.3 |
| 11 | 85.0 | 175.0 | 80.0 |
| 12 | 85.0 | 175.0 | 79.7 |
| 13 | 85.0 | 175.0 | 79.8 |
factor = 2
levels = c(-1,1)
# initialize two vectors
x1= c()
x2= c()
# 2^2 factorial design
for (j in 1:length(levels)) {
for (i in 1:length(levels)) {
x1 = c(x1, levels[i])
x2 = c(x2, levels[j])
}
}
# star design
x1 = c(x1, -sqrt(factor), sqrt(factor), 0,0) %>% round(digits = 2)
x2 = c(x2,0,0, -sqrt(factor), sqrt(factor)) %>% round(digits = 2)
# adding 5 replicas of point 0
x1 = c(x1,numeric(5))
x2 = c(x2,numeric(5))
# printing dataframe
ccddf = data.frame(row.names = c(1:length(x1)), x1,x2, Y)
kable(ccddf, row.names = T) %>%
kable_styling('striped',fixed_thead = T, full_width = F) %>%
column_spec(2,color= 'white',
background = spec_color(ccddf$x1,end = 0.7, option = 'A' )) %>%
column_spec(3,color= 'white',
background = spec_color(ccddf$x2 ,end = 0.7, option = 'A' ))
| x1 | x2 | Y | |
|---|---|---|---|
| 1 | -1.00 | -1.00 | 76.0 |
| 2 | 1.00 | -1.00 | 78.0 |
| 3 | -1.00 | 1.00 | 77.0 |
| 4 | 1.00 | 1.00 | 79.5 |
| 5 | -1.41 | 0.00 | 75.6 |
| 6 | 1.41 | 0.00 | 78.4 |
| 7 | 0.00 | -1.41 | 77.0 |
| 8 | 0.00 | 1.41 | 78.5 |
| 9 | 0.00 | 0.00 | 79.9 |
| 10 | 0.00 | 0.00 | 80.3 |
| 11 | 0.00 | 0.00 | 80.0 |
| 12 | 0.00 | 0.00 | 79.7 |
| 13 | 0.00 | 0.00 | 79.8 |
ccddf1 = ccddf[-3]
names(ccddf1)[1] = '$x_1$'
names(ccddf1)[2] = '$x_2$'
ccddf1['$x_1^2$'] = x1^2 %>% round(digits = 2)
ccddf1['$x_2^2$'] = x2^2 %>% round(digits = 2)
ccddf1['$x_1x_2$'] = x1*x2
ccddf1['$Y$'] = Y
kable(ccddf1, row.names = T) %>%
kable_styling('striped',fixed_thead = T, full_width = T) %>%
column_spec(2,color= 'white',
background = spec_color(ccddf1$`$x_1$`,end = 0.7, option = 'A' )) %>%
column_spec(3,color= 'white',
background = spec_color(ccddf1$`$x_2$` ,end = 0.7, option = 'A' )) %>%
column_spec(4,color= 'white',
background = spec_color(ccddf1$`$x_1^2$` ,end = 0.7, option = 'A' )) %>%
column_spec(5,color= 'white',
background = spec_color(ccddf1$`$x_2^2$` ,end = 0.7, option = 'A' )) %>%
column_spec(6,color= 'white',
background = spec_color(ccddf1$`$x_1x_2$` ,end = 0.7, option = 'A' )) %>%
column_spec(7,color= 'black', bold = T)
| \(x_1\) | \(x_2\) | \(x_1^2\) | \(x_2^2\) | \(x_1x_2\) | \(Y\) | |
|---|---|---|---|---|---|---|
| 1 | -1.00 | -1.00 | 1.00 | 1.00 | 1 | 76.0 |
| 2 | 1.00 | -1.00 | 1.00 | 1.00 | -1 | 78.0 |
| 3 | -1.00 | 1.00 | 1.00 | 1.00 | -1 | 77.0 |
| 4 | 1.00 | 1.00 | 1.00 | 1.00 | 1 | 79.5 |
| 5 | -1.41 | 0.00 | 1.99 | 0.00 | 0 | 75.6 |
| 6 | 1.41 | 0.00 | 1.99 | 0.00 | 0 | 78.4 |
| 7 | 0.00 | -1.41 | 0.00 | 1.99 | 0 | 77.0 |
| 8 | 0.00 | 1.41 | 0.00 | 1.99 | 0 | 78.5 |
| 9 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 79.9 |
| 10 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 80.3 |
| 11 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 80.0 |
| 12 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 79.7 |
| 13 | 0.00 | 0.00 | 0.00 | 0.00 | 0 | 79.8 |
Now perform a regression
model = lm(`$Y$` ~ `$x_1$`+ `$x_2$` + `$x_1^2$` + `$x_2^2$`+ `$x_1x_2$`, ccddf1)
model %>% summary()
##
## Call:
## lm(formula = `$Y$` ~ `$x_1$` + `$x_2$` + `$x_1^2$` + `$x_2^2$` +
## `$x_1x_2$`, data = ccddf1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.23947 -0.13947 -0.03804 0.11134 0.36053
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 79.93947 0.10712 746.280 < 2e-16 ***
## `$x_1$` 1.05915 0.08481 12.488 4.86e-06 ***
## `$x_2$` 0.57860 0.08481 6.822 0.000248 ***
## `$x_1^2$` -1.41107 0.09114 -15.482 1.13e-06 ***
## `$x_2^2$` -1.03419 0.09114 -11.347 9.25e-06 ***
## `$x_1x_2$` 0.12500 0.11976 1.044 0.331304
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2395 on 7 degrees of freedom
## Multiple R-squared: 0.987, Adjusted R-squared: 0.9778
## F-statistic: 106.5 on 5 and 7 DF, p-value: 1.902e-06
The significant coefficients of which are:
modelcoeff = coefficients(model)
modelcoeff[-6]
## (Intercept) `$x_1$` `$x_2$` `$x_1^2$` `$x_2^2$`
## 79.9394742 1.0591510 0.5785963 -1.4110716 -1.0341872
x = seq(from = -1.4, to = 1.4, by = 0.1)
y = seq(from = -1.4, to = 1.4, by = 0.1)
z = c()
for (j in 1:length(x)){
for (i in 1:length(y)){
z = c(z, modelcoeff[1]+modelcoeff[2]*x[i]+modelcoeff[3]*y[j]^2+modelcoeff[4]*x[i]^2+modelcoeff[5]*y[j]^2 )
}
}
time = seq(from = min(time), to = max(time), by= 0.1)
temp = seq(from = min(temp), to = max(temp), by= 0.1)
z = matrix(z, nrow = length(x), byrow = T)
plot_ly(x=time, y=temp, z=z) %>% add_surface() %>%
layout(scene = list(
xaxis = list(title = 'Time'),
yaxis = list(title = 'Temperature'),
zaxis = list(title = 'Yield'),
aspectmode = 'cube'))
plot_ly(x=time, y=temp, z=z,contours = list(showlabels = TRUE)) %>% add_contour() %>%
layout( xaxis = list(title = 'Time'),
yaxis = list(title = 'Temperature')) %>%
config(modeBarButtonsToAdd = list('drawline',
'drawopenpath',
'drawclosedpath',
'drawcircle',
'drawrect',
'eraseshape')) %>%
colorbar(title = "Yield")
An alternative and very useful experimental design for second-order
models is the uniform shell design proposed by Doehlert in 1970 (Doehlert 1970). Doehlert designs are easily
applied to optimize variables and offer advantages in relation to
central composite designs. They need fewer experiments, which are more
efficient and can move through the experimental domain.
The Doehlert design describes a spherical experimental domain and it stresses uniformity in space filling. Although this matrix is neither orthogonal nor rotatable, it does not significantly diverge from the required quality for effective use .
For two variables, the Doehlert design consists of one central point
and six points forming a regular hexagon, and therefore situated on a
circle.
The number of experiments required is given by \(N = k^2 + k + C_0\)
A generic Doehlert matrix for two variables will be:
X1 = c(0,1,0.5,-0.5,-1,-0.5,0.5)
X2 = c(0,0,0.866,0.866,0,-0.866,-0.866)
doedf = data.frame(X1, X2, row.names = c(1:7))
doedf %>% kable(row.names = T) %>% kable_styling('striped',fixed_thead = T, full_width = F)
| X1 | X2 | |
|---|---|---|
| 1 | 0.0 | 0.000 |
| 2 | 1.0 | 0.000 |
| 3 | 0.5 | 0.866 |
| 4 | -0.5 | 0.866 |
| 5 | -1.0 | 0.000 |
| 6 | -0.5 | -0.866 |
| 7 | 0.5 | -0.866 |
Fractional Factorial Design (FFD) is based on the assumption that [significance of interaction terms decreases with increasing the order of interaction]
Take for example a \(2^3\) factorial
design (3 factors and 2 levels)
factors = 3
levels = c(-1,1)
x1 = c()
x2 = c()
x3 = c()
for (k in 1:length(levels)) {
for (j in 1:length(levels)) {
for (i in 1:length(levels)) {
x1 = c(x1, levels[i])
x2 = c(x2, levels[j])
x3 = c(x3, levels[k])
}
}
}
twotwodf = data.frame(row.names = c(1:length(x1)), x1,x2, x3)
p = kable(twotwodf, row.names = T) %>%
kable_styling('striped',fixed_thead = T, full_width = F)
for (m in 1:ncol(twotwodf)) {
p = column_spec(p, m+1,color= 'white',
background = spec_color(twotwodf[,m],end = 0.7, option = 'A' )
)
}
p
| x1 | x2 | x3 | |
|---|---|---|---|
| 1 | -1 | -1 | -1 |
| 2 | 1 | -1 | -1 |
| 3 | -1 | 1 | -1 |
| 4 | 1 | 1 | -1 |
| 5 | -1 | -1 | 1 |
| 6 | 1 | -1 | 1 |
| 7 | -1 | 1 | 1 |
| 8 | 1 | 1 | 1 |
And expand it according to the postulated model
twotwodf$x1x2 = x1*x2
twotwodf$x1x3 = x1*x3
twotwodf$x2x3 = x2*x3
twotwodf$x1x2x3 = x1*x2*x3
p = kable(twotwodf, row.names = T) %>%
kable_styling('striped',fixed_thead = T, full_width = T)
for (m in 1:ncol(twotwodf)) {
p = column_spec(p, m+1,color= 'white',
background = spec_color(twotwodf[,m],end = 0.7, option = 'A' )
)
}
p
| x1 | x2 | x3 | x1x2 | x1x3 | x2x3 | x1x2x3 | |
|---|---|---|---|---|---|---|---|
| 1 | -1 | -1 | -1 | 1 | 1 | 1 | -1 |
| 2 | 1 | -1 | -1 | -1 | -1 | 1 | 1 |
| 3 | -1 | 1 | -1 | -1 | 1 | -1 | 1 |
| 4 | 1 | 1 | -1 | 1 | -1 | -1 | -1 |
| 5 | -1 | -1 | 1 | 1 | -1 | -1 | 1 |
| 6 | 1 | -1 | 1 | -1 | 1 | -1 | -1 |
| 7 | -1 | 1 | 1 | -1 | -1 | 1 | -1 |
| 8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Now eliminate all the rows of the column x1x2x3 (maximum
order of interaction) with the value -1
twotwodf2 = twotwodf[!(twotwodf$x1x2x3 == -1),]
p = kable(twotwodf2, row.names = T) %>%
kable_styling('striped',fixed_thead = T, full_width = T)
for (m in 1:ncol(twotwodf2)) {
p = column_spec(p, m+1,color= 'white',
background = spec_color(twotwodf2[,m],end = 0.7, option = 'A' )
)
}
p
| x1 | x2 | x3 | x1x2 | x1x3 | x2x3 | x1x2x3 | |
|---|---|---|---|---|---|---|---|
| 2 | 1 | -1 | -1 | -1 | -1 | 1 | 1 |
| 3 | -1 | 1 | -1 | -1 | 1 | -1 | 1 |
| 5 | -1 | -1 | 1 | 1 | -1 | -1 | 1 |
| 8 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Plackett–Burman designs are experimental designs presented in 1946 by
Robin L. Plackett and J. P. Burman while working in the British Ministry
of Supply. (PLACKETT and BURMAN
1946)
When the number of factors is fairly large, the constraint that the
number of experiments must equal a power of 2 can be rather restrictive.
For example, a 10 factors and 2 levels
factorial design it would require \(2^{10}\) experiments.
With Plackett-Burman design the number of experiments required is the
first multiple of 4 greater than the number of factors. In this case
therefore you would have only 12 experiments!
The number of total factors will be \(N-1\), with \(N\) the number of experiments.
The additional factors are considered as dummy variables, useful to
estimate the significance of the real factors.
In numerical values:
# ----- First row from literature ------------
r = c(1, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1)
plackdf = data.frame(t(r))
for (i in 0:9) {
plackdf[2+i,] = c(r[(11-i):11],r[-c((11-i):11)])
}
plackdf[12,] = rep(-1,11)
fig = plackdf %>% kable(row.names = T) %>% kable_styling('striped',fixed_thead = T, full_width = F)
for (m in 1:ncol(plackdf)) {
fig = column_spec(fig, m+1,color= 'white',
background = spec_color(plackdf[,m],begin = 0.15, end = 0.6, option = 'B' )
)
}
fig
| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | -1 | 1 | 1 | 1 | -1 | -1 | -1 | 1 | -1 |
| 2 | -1 | 1 | 1 | -1 | 1 | 1 | 1 | -1 | -1 | -1 | 1 |
| 3 | 1 | -1 | 1 | 1 | -1 | 1 | 1 | 1 | -1 | -1 | -1 |
| 4 | -1 | 1 | -1 | 1 | 1 | -1 | 1 | 1 | 1 | -1 | -1 |
| 5 | -1 | -1 | 1 | -1 | 1 | 1 | -1 | 1 | 1 | 1 | -1 |
| 6 | -1 | -1 | -1 | 1 | -1 | 1 | 1 | -1 | 1 | 1 | 1 |
| 7 | 1 | -1 | -1 | -1 | 1 | -1 | 1 | 1 | -1 | 1 | 1 |
| 8 | 1 | 1 | -1 | -1 | -1 | 1 | -1 | 1 | 1 | -1 | 1 |
| 9 | 1 | 1 | 1 | -1 | -1 | -1 | 1 | -1 | 1 | 1 | -1 |
| 10 | -1 | 1 | 1 | 1 | -1 | -1 | -1 | 1 | -1 | 1 | 1 |
| 11 | 1 | -1 | 1 | 1 | 1 | -1 | -1 | -1 | 1 | -1 | 1 |
| 12 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 |
Suppose you have measured an experimental response for all the experiments
y = c(51, 31, 44, 23, 80, 45, 31, 55, 23, 26, 29, 28)
plackdf2 = plackdf
plackdf2 = cbind(plackdf2, 'Y' = y)
fig = plackdf2 %>% kable(row.names = T) %>% kable_styling('striped',fixed_thead = T, full_width = F)
for (m in 1:ncol(plackdf)) {
fig = column_spec(fig, m+1,color= 'white',
background = spec_color(plackdf[,m],begin = 0.15, end = 0.6, option = 'B' )
)
}
fig %>% column_spec(13, color= 'white', background = 'darkgreen')
| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | X11 | Y | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1 | 1 | -1 | 1 | 1 | 1 | -1 | -1 | -1 | 1 | -1 | 51 |
| 2 | -1 | 1 | 1 | -1 | 1 | 1 | 1 | -1 | -1 | -1 | 1 | 31 |
| 3 | 1 | -1 | 1 | 1 | -1 | 1 | 1 | 1 | -1 | -1 | -1 | 44 |
| 4 | -1 | 1 | -1 | 1 | 1 | -1 | 1 | 1 | 1 | -1 | -1 | 23 |
| 5 | -1 | -1 | 1 | -1 | 1 | 1 | -1 | 1 | 1 | 1 | -1 | 80 |
| 6 | -1 | -1 | -1 | 1 | -1 | 1 | 1 | -1 | 1 | 1 | 1 | 45 |
| 7 | 1 | -1 | -1 | -1 | 1 | -1 | 1 | 1 | -1 | 1 | 1 | 31 |
| 8 | 1 | 1 | -1 | -1 | -1 | 1 | -1 | 1 | 1 | -1 | 1 | 55 |
| 9 | 1 | 1 | 1 | -1 | -1 | -1 | 1 | -1 | 1 | 1 | -1 | 23 |
| 10 | -1 | 1 | 1 | 1 | -1 | -1 | -1 | 1 | -1 | 1 | 1 | 26 |
| 11 | 1 | -1 | 1 | 1 | 1 | -1 | -1 | -1 | 1 | -1 | 1 | 29 |
| 12 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | -1 | 28 |
The postulated model will be
\[
Y = b_0 + b_1x_1+b_2x_2+
b_{3}x_{3}+ b_{4}x_{4}+ b_{5}x_{5}+ b_{6}x_{6}+ b_{7}x_{7}+ b_{8}x_{8}+ b_{9}x_{9}+ b_{10}x_{10}+b_{11}x_{11}
\]
Now we’re able to perform a regression
colnames(plackdf2)[11] = 'Dummy'
model = lm(Y ~ X1+X2+X3+X4+X5+X6+X7+X8+X9+X10+Dummy, plackdf2)
model %>% summary()
##
## Call:
## lm(formula = Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7 + X8 + X9 +
## X10 + Dummy, data = plackdf2)
##
## Residuals:
## ALL 12 residuals are 0: no residual degrees of freedom!
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.883e+01 NaN NaN NaN
## X1 2.324e-15 NaN NaN NaN
## X2 -4.000e+00 NaN NaN NaN
## X3 -8.547e-16 NaN NaN NaN
## X4 -2.500e+00 NaN NaN NaN
## X5 2.000e+00 NaN NaN NaN
## X6 1.217e+01 NaN NaN NaN
## X7 -6.000e+00 NaN NaN NaN
## X8 4.333e+00 NaN NaN NaN
## X9 3.667e+00 NaN NaN NaN
## X10 3.833e+00 NaN NaN NaN
## Dummy -2.667e+00 NaN NaN NaN
##
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 11 and 0 DF, p-value: NA
Since there are no residual degrees of freedom we cannot estimate the
significance of the coefficients as usual
We need another approach to understand which of them are actually
significant.
df = coefficients(model)[-1] %>% data.frame()
colnames(df) = 'coeff'
df$X = rownames(df)
df$X = factor(df$X, levels = df$X)
ggplot(df, aes(x= X, coeff, fill =ifelse(X == 'Dummy', 'Dummy', 'normal')))+
geom_col(show.legend = F)+
scale_fill_manual( values = c( "Dummy"="darkred", "normal"="cornflowerblue" ))+
theme_bw()+
xlab('')+
ylab('')+
geom_hline(yintercept = coefficients(model)[12], linetype = 'longdash', color = 'red')+
geom_hline(yintercept = -coefficients(model)[12], linetype = 'longdash', color = 'red')+
geom_hline(yintercept = 0,alpha = 0.5)
Coefficients of similar size to the Dummy are no particularly
significant!
In this case only factors X6 and X7 are
relevant and maybe used later for further experimental design.