Matis 13, 2^k Factorial Designs

2^k designs are extremely efficient and can lead directly to an understanding of effects. A 2^k design includes k factors, each of which has 2 levels, high and low. Of course, following a 2^k design one can create another 2^k design with different values and focus in on better and better values. This is essentially what is done in Response Surface Methodology (RSM). In this lecture we talk about simple 2^k designs and a little about how to deal with experiments when no replicates are available. At the end of this lecture you should be comfortable designing an experiment. So, what is this thing?

The factors can be quantitative or qualitative, as long as we identify one level as low and one as high. We’ll see why this is so effective: it fully utilizes the sum of squares analysis. Recall that the total sum of squares can be broken up into the explainable bit, due to the treatment, and the unexplainable bit, which can be viewed as noise. The other thing is that with 2^k designs the experiment can be broken up into contrasts from each factor and their interactions individually. We’ll start by looking at a 2^3 design:

Each of the populations we’ll be looking at is characterized by the factors assuming either a high or low level, so we have 8 populations to consider for main effects.

Very soon we’ll be looking at these populations with Anova and testing to see if any of them is different from the rest. The next thing to do is write the model equation.

We also need the interactions, A and B, A and C, B and C, and a possible three way interaction ABC, and finally the error:
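Written out in full, and assuming the usual effects-model notation (the symbols below are an assumption, since the original equation is not reproduced here), the model for the 2^3 design reads:

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + \gamma_k
        + (\alpha\beta)_{ij} + (\alpha\gamma)_{ik} + (\beta\gamma)_{jk}
        + (\alpha\beta\gamma)_{ijk} + \varepsilon_{ijk},
\qquad i, j, k \in \{\text{low}, \text{high}\}
```

Here alpha, beta and gamma are the main effects of A, B and C, the parenthesized terms are the two-way and three-way interactions, and epsilon is the error.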

We might have multiple replicates so there might need to be a fourth index, l say, to keep track of that. The beauty of the 2^k design is that we can break up a complex interaction in a very straightforward way. We’ll be using Anova to understand significances, so we have to assume the error is normal with mean zero and variance sigma squared, N(0, sigma^2), which means the observable is also normal with variance sigma squared about its population mean. This is an important assumption. If this isn’t true life becomes more complicated and we have to use other tools. What happens next is to build the anova table and look at the sums of squares.

In 2 and 3 dimensions this is easily visualized. Higher dimensions are not accessible.

Each corner point denotes one of the 8 populations we’re interested in, based on the high and low values of each of the 3 factors in the 2^3 design. Building the anova table we have in general
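As a sketch of the bookkeeping behind that table, assuming n replicates at each of the 2^k corner points (n is notation assumed here), the degrees of freedom split as:

```latex
\underbrace{2^k n - 1}_{\text{total}}
  = \underbrace{(2^k - 1)}_{\text{effects, 1 df each}}
  + \underbrace{2^k (n - 1)}_{\text{error}}
```

For a 2^3 design with n = 2 this is 15 = 7 + 8.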

The beauty of the 2^k design is that we can be very specific about isolating all of the effects, main and interaction. This results in 2^k - 1 contrasts (7 for the 2^3 design) with a single degree of freedom for each. Here’s a better look at the cube:

This shows the contrasts: Compare the means when A is high to the means when A is low, repeat for all the main effects, and for the interactions.

We’re going to create the table of values for a 2^3 design and use Yates’ method to calculate the sums of squares and effects. Then we can determine which effects are significant.

So far this table looks like this:

We get the signs for the interactions by multiplying the signs of the main effects involved in the interactions:
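A minimal sketch of that sign table in R, assuming the runs are listed in standard (Yates) order; the object name `signs` and the row labels are illustrative:

```r
# Main-effect sign columns for a 2^3 design in standard (Yates) order.
signs <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Each interaction column is the elementwise product of its parent columns.
signs$AB  <- signs$A * signs$B
signs$AC  <- signs$A * signs$C
signs$BC  <- signs$B * signs$C
signs$ABC <- signs$A * signs$B * signs$C

# Label the rows with the usual treatment-combination names.
rownames(signs) <- c("(1)", "a", "b", "ab", "c", "ac", "bc", "abc")
print(signs)
```

Every column sums to zero and any two columns are orthogonal, which is why each effect gets its own single-degree-of-freedom contrast.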

We add a final column which is the sum of the observations over however many replicates we have:

Now, the sum of squares for any given factor, main or interaction, is found by Yates’ method. We take the sum over all replicates of every measurement where that factor’s sign column is high (+) and subtract the sum over all replicates where it is low (-). This is the contrast. Similarly we can find the effect from the contrasts. Below is the SS for factor A.
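In symbols, assuming n replicates and writing the treatment totals in the usual lowercase notation ((1), a, b, ab, ...; this notation is an assumption, not taken from the missing figure), the contrast, effect, and sum of squares for A in a 2^3 design are:

```latex
\text{Contrast}_A = a + ab + ac + abc - (1) - b - c - bc
\\[4pt]
\text{Effect}_A = \frac{\text{Contrast}_A}{2^{k-1} n} = \frac{\text{Contrast}_A}{4n},
\qquad
SS_A = \frac{(\text{Contrast}_A)^2}{2^k n} = \frac{(\text{Contrast}_A)^2}{8n}
```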

The coefficients in the model equation come from the effects, calculated in a similar way (with -1/+1 coding each coefficient is half the corresponding effect). For the 2^3 design with 3 replicates, the divisor 2^3 n is 24.
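As a worked check, the sketch below computes the contrast, effect, and sum of squares for A from the etch-rate data used in the R example later in this lecture. That example has 2 replicates, so the divisor is 2^3 x 2 = 16 rather than 24. The variable names are illustrative.

```r
# Etch-rate observations in standard order (1), a, b, ab, c, ac, bc, abc.
rep1 <- c(550, 669, 633, 642, 1037, 749, 1075, 729)
rep2 <- c(604, 650, 601, 635, 1052, 868, 1063, 860)
n <- 2                      # number of replicates
k <- 3                      # number of factors
totals <- rep1 + rep2       # treatment totals over replicates

# Sign column for A in standard order: low, high, low, high, ...
signA <- rep(c(-1, 1), times = 4)

contrastA <- sum(signA * totals)          # high total minus low total
effectA   <- contrastA / (2^(k - 1) * n)  # change in mean response, low to high
ssA       <- contrastA^2 / (2^k * n)      # sum of squares, 1 df
coefA     <- effectA / 2                  # coefficient under -1/+1 coding

c(contrast = contrastA, effect = effectA, SS = ssA, coef = coefA)
```

The sum of squares, 41310.56, agrees with the rounded A row of the ANOVA table in the R example.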

Now we use the SS’s to fill out the next 2 columns in the anova table. Since each effect has a single degree of freedom, the sum of squares and the mean square are identical:

Now we can look at the totals and generate F statistics, then examine the F (or, equivalently here, t) distribution and get p values:

We look up the F or t distributions and get p values. The machine does all this for you, but this is exactly what it’s doing. Those p values represent the 7 different hypothesis tests, one per effect, each of the form H0: the effect of A is zero versus H1: the effect of A is not zero. Ok, now an example in R:

This will be a 2^3 design with electrode gap, gas flow, and RF power as the 3 factors, each with a high and low value. The response is etch rate for SiN.

Here are the results of the experiment, in a table and on the cube:

Here’s the table of the sums of squares:

Just off the table, it looks like gap and RF power have the largest effects and they seem to be interacting as well. From a technical standpoint that makes sense: together they determine the power density. Higher RF power and smaller gap both contribute to more energy in the plasma at the wafer. The rest of the table looks like this:

Using p < 0.05 as the reference, A, C and AC are significant. If we wanted to pursue this further to optimize the parameters in the etch we could use response surface methodology. We’re going to do this analysis in R, but jumping ahead, here’s the interaction plot for A and C.

At small gap, RF power makes a big difference. As the gap increases, the effect of power is not as dramatic. In R,

##### Generate a 2^3 design ####
library(agricolae)

#?design.ab
design.ab(trt=c(2,2,2),r=2,seed=85933,design="crd")

## $parameters
## $parameters$design
## [1] "factorial"
## 
## $parameters$trt
## [1] "1 1 1" "1 1 2" "1 2 1" "1 2 2" "2 1 1" "2 1 2" "2 2 1" "2 2 2"
## 
## $parameters$r
## [1] 2 2 2 2 2 2 2 2
## 
## $parameters$serie
## [1] 2
## 
## $parameters$seed
## [1] 85933
## 
## $parameters$kinds
## [1] "Super-Duper"
## 
## $parameters[[7]]
## [1] TRUE
## 
## $parameters$applied
## [1] "crd"
## 
## 
## $book
##    plots r A B C
## 1    101 1 1 2 2
## 2    102 1 2 1 2
## 3    103 1 1 1 2
## 4    104 1 2 1 1
## 5    105 1 1 2 1
## 6    106 1 2 2 1
## 7    107 1 2 2 2
## 8    108 1 1 1 1
## 9    109 2 2 2 1
## 10   110 2 1 1 1
## 11   111 2 1 1 2
## 12   112 2 1 2 1
## 13   113 2 2 2 2
## 14   114 2 1 2 2
## 15   115 2 2 1 1
## 16   116 2 2 1 2

#write.csv
#read.csv

##### Replicated 2^3 design ####
design <- expand.grid(
  A=c(-1,1), 
  B=c(-1,1), 
  C=c(-1,1))
design<-rbind(design,design)  # stack two copies: 2 replicates, 16 runs
design

##     A  B  C
## 1  -1 -1 -1
## 2   1 -1 -1
## 3  -1  1 -1
## 4   1  1 -1
## 5  -1 -1  1
## 6   1 -1  1
## 7  -1  1  1
## 8   1  1  1
## 9  -1 -1 -1
## 10  1 -1 -1
## 11 -1  1 -1
## 12  1  1 -1
## 13 -1 -1  1
## 14  1 -1  1
## 15 -1  1  1
## 16  1  1  1

str(design)

## 'data.frame':    16 obs. of  3 variables:
##  $ A: num  -1 1 -1 1 -1 1 -1 1 -1 1 ...
##  $ B: num  -1 -1 1 1 -1 -1 1 1 -1 -1 ...
##  $ C: num  -1 -1 -1 -1 1 1 1 1 -1 -1 ...
##  - attr(*, "out.attrs")=List of 2
##   ..$ dim     : Named int [1:3] 2 2 2
##   .. ..- attr(*, "names")= chr [1:3] "A" "B" "C"
##   ..$ dimnames:List of 3
##   .. ..$ A: chr [1:2] "A=-1" "A= 1"
##   .. ..$ B: chr [1:2] "B=-1" "B= 1"
##   .. ..$ C: chr [1:2] "C=-1" "C= 1"

design$A<-as.factor(design$A)
design$B<-as.factor(design$B)
design$C<-as.factor(design$C)
str(design)

## 'data.frame':    16 obs. of  3 variables:
##  $ A: Factor w/ 2 levels "-1","1": 1 2 1 2 1 2 1 2 1 2 ...
##  $ B: Factor w/ 2 levels "-1","1": 1 1 2 2 1 1 2 2 1 1 ...
##  $ C: Factor w/ 2 levels "-1","1": 1 1 1 1 2 2 2 2 1 1 ...
##  - attr(*, "out.attrs")=List of 2
##   ..$ dim     : Named int [1:3] 2 2 2
##   .. ..- attr(*, "names")= chr [1:3] "A" "B" "C"
##   ..$ dimnames:List of 3
##   .. ..$ A: chr [1:2] "A=-1" "A= 1"
##   .. ..$ B: chr [1:2] "B=-1" "B= 1"
##   .. ..$ C: chr [1:2] "C=-1" "C= 1"

This much (above) generates the design and sets it up with +1 and -1.

Here are the data:

rep1<-c(550,669,633,642,1037,749,1075,729)
rep2<-c(604,650,601,635,1052,868,1063,860)
rep<-c(rep1,rep2)
rep

##  [1]  550  669  633  642 1037  749 1075  729  604  650  601  635 1052  868 1063
## [16]  860

Now create a data frame with the factors and the data:

design$rep<-rep
design

##     A  B  C  rep
## 1  -1 -1 -1  550
## 2   1 -1 -1  669
## 3  -1  1 -1  633
## 4   1  1 -1  642
## 5  -1 -1  1 1037
## 6   1 -1  1  749
## 7  -1  1  1 1075
## 8   1  1  1  729
## 9  -1 -1 -1  604
## 10  1 -1 -1  650
## 11 -1  1 -1  601
## 12  1  1 -1  635
## 13 -1 -1  1 1052
## 14  1 -1  1  868
## 15 -1  1  1 1063
## 16  1  1  1  860

Apply the anova:

model<-aov(rep~A+B+C+A:B+A:C+B:C+A:B:C,data=design)
plot(model)

summary(model)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## A            1  41311   41311  18.339 0.002679 ** 
## B            1    218     218   0.097 0.763911    
## C            1 374850  374850 166.411 1.23e-06 ***
## A:B          1   2475    2475   1.099 0.325168    
## A:C          1  94403   94403  41.909 0.000193 ***
## B:C          1     18      18   0.008 0.930849    
## A:B:C        1    127     127   0.056 0.818586    
## Residuals    8  18020    2253                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
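Any row of this table can be reproduced by hand. As a minimal sketch, the F statistic for A is its mean square divided by the residual mean square, and the p value is the upper tail of the F distribution with 1 and 8 degrees of freedom. The numbers are copied (rounded) from the table above; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom copied from the summary table.
ss_A   <- 41311
ss_res <- 18020
df_A   <- 1
df_res <- 8

ms_A   <- ss_A / df_A          # mean square for A
ms_res <- ss_res / df_res      # residual mean square, about 2253

f_A <- ms_A / ms_res                              # F statistic for A
p_A <- pf(f_A, df_A, df_res, lower.tail = FALSE)  # upper-tail p value

# With 1 numerator df, the same p value comes from a two-sided t test
# on sqrt(F) with the residual degrees of freedom.
p_A_t <- 2 * pt(sqrt(f_A), df_res, lower.tail = FALSE)

c(F = f_A, p_F = p_A, p_t = p_A_t)
```

Because the inputs are rounded, the result matches the table only to about three significant figures.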

What do the interactions look like?

interaction.plot(design$A,design$C,design$rep)
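The plot draws the mean response at each combination of A and C. Here is a sketch of the same numbers, rebuilt from the data above so it runs on its own. The names `d` and `etch` are illustrative (the lecture's data frame calls the response `rep`), and A is taken to be gap and C to be RF power, following the order in which the factors were listed.

```r
# Rebuild the replicated 2^3 design and the etch-rate data.
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
d <- rbind(d, d)
d$etch <- c(550, 669, 633, 642, 1037, 749, 1075, 729,
            604, 650, 601, 635, 1052, 868, 1063, 860)

# Mean etch rate in each (A, C) cell: rows are A (gap), columns are C (power).
cell_means <- tapply(d$etch, list(A = d$A, C = d$C), mean)
print(cell_means)

# Effect of C (power) at each level of A (gap).
power_effect <- cell_means[, "1"] - cell_means[, "-1"]
print(power_effect)
```

The power effect is about 460 at the small gap and about 150 at the large gap, which is the non-parallel pattern in the plot.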

Here’s an unreplicated 2^5 design:

design <- expand.grid(
  A=c(-1,1), 
  B=c(-1,1), 
  C=c(-1,1),
  D=c(-1,1),
  E=c(-1,1))
design

##     A  B  C  D  E
## 1  -1 -1 -1 -1 -1
## 2   1 -1 -1 -1 -1
## 3  -1  1 -1 -1 -1
## 4   1  1 -1 -1 -1
## 5  -1 -1  1 -1 -1
## 6   1 -1  1 -1 -1
## 7  -1  1  1 -1 -1
## 8   1  1  1 -1 -1
## 9  -1 -1 -1  1 -1
## 10  1 -1 -1  1 -1
## 11 -1  1 -1  1 -1
## 12  1  1 -1  1 -1
## 13 -1 -1  1  1 -1
## 14  1 -1  1  1 -1
## 15 -1  1  1  1 -1
## 16  1  1  1  1 -1
## 17 -1 -1 -1 -1  1
## 18  1 -1 -1 -1  1
## 19 -1  1 -1 -1  1
## 20  1  1 -1 -1  1
## 21 -1 -1  1 -1  1
## 22  1 -1  1 -1  1
## 23 -1  1  1 -1  1
## 24  1  1  1 -1  1
## 25 -1 -1 -1  1  1
## 26  1 -1 -1  1  1
## 27 -1  1 -1  1  1
## 28  1  1 -1  1  1
## 29 -1 -1  1  1  1
## 30  1 -1  1  1  1
## 31 -1  1  1  1  1
## 32  1  1  1  1  1

Now add some data:

design$rep<-c(7,9,34,55,16,20,40,60,8,10,32,50,18,21,44,61,
              8,12,35,52,15,22,45,65,6,10,30,53,15,20,41,63)
design

##     A  B  C  D  E rep
## 1  -1 -1 -1 -1 -1   7
## 2   1 -1 -1 -1 -1   9
## 3  -1  1 -1 -1 -1  34
## 4   1  1 -1 -1 -1  55
## 5  -1 -1  1 -1 -1  16
## 6   1 -1  1 -1 -1  20
## 7  -1  1  1 -1 -1  40
## 8   1  1  1 -1 -1  60
## 9  -1 -1 -1  1 -1   8
## 10  1 -1 -1  1 -1  10
## 11 -1  1 -1  1 -1  32
## 12  1  1 -1  1 -1  50
## 13 -1 -1  1  1 -1  18
## 14  1 -1  1  1 -1  21
## 15 -1  1  1  1 -1  44
## 16  1  1  1  1 -1  61
## 17 -1 -1 -1 -1  1   8
## 18  1 -1 -1 -1  1  12
## 19 -1  1 -1 -1  1  35
## 20  1  1 -1 -1  1  52
## 21 -1 -1  1 -1  1  15
## 22  1 -1  1 -1  1  22
## 23 -1  1  1 -1  1  45
## 24  1  1  1 -1  1  65
## 25 -1 -1 -1  1  1   6
## 26  1 -1 -1  1  1  10
## 27 -1  1 -1  1  1  30
## 28  1  1 -1  1  1  53
## 29 -1 -1  1  1  1  15
## 30  1 -1  1  1  1  20
## 31 -1  1  1  1  1  41
## 32  1  1  1  1  1  63

model<-lm(rep~A*B*C*D*E,data=design)
coef(model)

## (Intercept)           A           B           C           D           E 
##    30.53125     5.90625    16.96875     4.84375    -0.40625     0.21875 
##         A:B         A:C         B:C         A:D         B:D         C:D 
##     3.96875     0.21875     0.03125    -0.03125    -0.34375     0.40625 
##         A:E         B:E         C:E         D:E       A:B:C       A:B:D 
##     0.46875     0.28125     0.15625    -0.59375    -0.21875     0.15625 
##       A:C:D       B:C:D       A:B:E       A:C:E       B:C:E       A:D:E 
##    -0.21875     0.21875    -0.09375     0.15625     0.46875     0.40625 
##       B:D:E       C:D:E     A:B:C:D     A:B:C:E     A:B:D:E     A:C:D:E 
##     0.09375    -0.40625    -0.03125     0.09375     0.46875    -0.15625 
##     B:C:D:E   A:B:C:D:E 
##    -0.46875    -0.09375
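With a single replicate the saturated model leaves no residual degrees of freedom, so the F tests used earlier are not available. One common way forward, sketched here as an assumption about where the analysis goes next rather than as part of the original output, is to convert the coefficients to effects and look at a normal probability plot. Effects that are only noise should fall near a straight line, and real effects stand off it. Because the factors are coded -1/+1 and left numeric, each effect is twice its coefficient. The names `d`, `y`, `fit` and `effects` are illustrative.

```r
# Rebuild the unreplicated 2^5 design and response from above.
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1),
                 D = c(-1, 1), E = c(-1, 1))
d$y <- c(7, 9, 34, 55, 16, 20, 40, 60, 8, 10, 32, 50, 18, 21, 44, 61,
         8, 12, 35, 52, 15, 22, 45, 65, 6, 10, 30, 53, 15, 20, 41, 63)

# Saturated model: 32 runs, 32 coefficients, nothing left for error.
fit <- lm(y ~ A * B * C * D * E, data = d)

# Effects are twice the coefficients under -1/+1 coding; drop the intercept.
effects <- 2 * coef(fit)[-1]

# Normal probability plot of the 31 effects, labelled by term.
qq <- qqnorm(effects, main = "Normal plot of effects")
qqline(effects)
text(qq$x, qq$y, labels = names(effects), pos = 4, cex = 0.6)

# The largest effects in absolute value are the candidates to keep.
head(sort(abs(effects), decreasing = TRUE), 4)
```

Here B, A, C and A:B stand well clear of the rest, so a reasonable next step would be to refit with only those terms and let the dropped terms supply the error estimate.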

Richard Gale

2024-06-27