Design of Experiments Project 3

Fractional Factorial Designs

Felipe Ortiz

RPI

December 10th 2016 V1

1. Setting

For this example we will be using data presented in “Evaluation of Surfactant Detergency using Statistical Analysis”, Textile Research Journal.
Description: Color change results for 3 varieties of fabric (cotton, silk, and wool) exposed to 4 soils (tea, coffee, wine, and charcoal) and washed at 2 temperatures (40C and 60C) with 2 surfactants (anionic and non-ionic), analyzed in a 4-way ANOVA.

System under test

The data provides the amount of color change based on 4 distinct factors.

Because we are interested in having 2 2-level factors and 2 3-level factors, we will be removing one of the soil levels (charcoal). The variables in the data that will be used are defined as follows:

  • Soil: Type of contaminant (tea, coffee, wine)

  • Fabric: Type of fabric (cotton, silk, wool)

  • Temp: Temperature of wash (40C and 60C)

  • Surfactant: Type of surfactant in detergent (anionic and non-ionic)

Next we look at the structure of the data.

Data <- read.table("C:/Users/T420ortizf2/Documents/CollegeWork/Fall 2016/DesignOfExperiments/Project3/Data.txt", quote="\"")
names(Data) <- c("Soil", "Fabric", "Temp", "Surfactant", "Result")

str(Data)
## 'data.frame':    36 obs. of  5 variables:
##  $ Soil      : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ Fabric    : int  1 1 1 1 2 2 2 2 3 3 ...
##  $ Temp      : int  1 1 2 2 1 1 2 2 1 1 ...
##  $ Surfactant: int  1 2 1 2 1 2 1 2 1 2 ...
##  $ Result    : num  9.56 6.99 10.81 7.6 6.68 ...

We see that R has read the four factor columns as integers rather than factors, so we will tell it which ones are factors.

Data[,1] <- as.factor(Data[,1])

Data[,2] <- as.factor(Data[,2])

Data[,3] <- as.factor(Data[,3])

Data[,4] <- as.factor(Data[,4])

str(Data)
## 'data.frame':    36 obs. of  5 variables:
##  $ Soil      : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Fabric    : Factor w/ 3 levels "1","2","3": 1 1 1 1 2 2 2 2 3 3 ...
##  $ Temp      : Factor w/ 2 levels "1","2": 1 1 2 2 1 1 2 2 1 1 ...
##  $ Surfactant: Factor w/ 2 levels "1","2": 1 2 1 2 1 2 1 2 1 2 ...
##  $ Result    : num  9.56 6.99 10.81 7.6 6.68 ...
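The four conversions above can also be written as one step. The following is a minimal sketch on a small stand-in data frame (toy and factor_cols are names introduced only for this illustration; the four rows are copied from the data listing in this report):

```r
# Small stand-in for Data: four rows copied from the full listing
toy <- data.frame(
  Soil       = c(1, 1, 2, 3),
  Fabric     = c(1, 2, 3, 1),
  Temp       = c(1, 2, 1, 2),
  Surfactant = c(1, 2, 1, 2),
  Result     = c(9.56, 8.76, 9.22, 25.49)
)

# Convert every design column to a factor in one call; Result stays numeric
factor_cols <- c("Soil", "Fabric", "Temp", "Surfactant")
toy[factor_cols] <- lapply(toy[factor_cols], as.factor)

str(toy)
```

Selecting the columns by name rather than by position keeps the conversion correct if the column order of the file ever changes.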

Factors and Levels

The factors present in the data set and their levels are listed below.

levels(Data$Soil)
## [1] "1" "2" "3"
levels(Data$Fabric)
## [1] "1" "2" "3"
levels(Data$Temp)
## [1] "1" "2"
levels(Data$Surfactant)
## [1] "1" "2"

Continuous variables

The only continuous variable present is Result, summarized below.

summary(Data$Result)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    4.70    8.66   10.05   12.05   13.62   26.48

Response variables

The response variable that will be used is Result, the color change measured in each experimental run.

2. Experimental Design

Full Factorial

As can be seen, the experiment has been run as a full factorial design, with 2 2-level factors and 2 3-level factors. This results in 2 x 2 x 3 x 3 = 36 individual experiments.

Data
##    Soil Fabric Temp Surfactant Result
## 1     1      1    1          1   9.56
## 2     1      1    1          2   6.99
## 3     1      1    2          1  10.81
## 4     1      1    2          2   7.60
## 5     1      2    1          1   6.68
## 6     1      2    1          2   9.30
## 7     1      2    2          1   8.36
## 8     1      2    2          2   8.76
## 9     1      3    1          1   6.16
## 10    1      3    1          2   5.46
## 11    1      3    2          1   6.24
## 12    1      3    2          2   4.70
## 13    2      1    1          1  11.44
## 14    2      1    1          2  12.05
## 15    2      1    2          1  12.34
## 16    2      1    2          2  12.92
## 17    2      2    1          1  10.46
## 18    2      2    1          2  11.96
## 19    2      2    2          1  10.15
## 20    2      2    2          2   9.95
## 21    2      3    1          1   9.22
## 22    2      3    1          2   9.15
## 23    2      3    2          1   9.77
## 24    2      3    2          2   8.90
## 25    3      1    1          1  25.47
## 26    3      1    1          2  24.08
## 27    3      1    2          1  26.48
## 28    3      1    2          2  25.49
## 29    3      2    1          1   9.87
## 30    3      2    1          2  14.58
## 31    3      2    2          1   7.91
## 32    3      2    2          2  13.30
## 33    3      3    1          1  17.96
## 34    3      3    1          2  17.08
## 35    3      3    2          1  16.82
## 36    3      3    2          2  15.88
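The run count of the full factorial can be checked directly from the number of levels of each factor. A minimal sketch (levels_list, grid and n_runs are names introduced only for this illustration):

```r
# One entry per factor, holding its level codes
levels_list <- list(Soil = 1:3, Fabric = 1:3, Temp = 1:2, Surfactant = 1:2)

# Every combination of levels is one run of the full factorial
grid <- expand.grid(levels_list)
n_runs <- nrow(grid)

# The same count as the product of the numbers of levels
n_runs == prod(lengths(levels_list))
n_runs
```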

As we want to convert the 3-level factors into 2-level factors, they must be represented by 2 2-level factors each. This is visualized in the table below.

3-level Factor   Dummy1   Dummy2
0 (Low)            -1       -1
1 (Med)            -1        1
1 (Med)             1       -1
2 (High)            1        1
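The table can be reproduced in code. The sketch below assumes the 0/1/2 level coding shown in the table, and the formula (Dummy1 + Dummy2) / 2 + 1 is simply one convenient way to express that mapping, not something taken from the original analysis:

```r
# All four combinations of the two dummy factors
dummies <- expand.grid(Dummy1 = c(-1, 1), Dummy2 = c(-1, 1))

# Both dummies low -> 0, mixed -> 1 (two ways), both high -> 2
dummies$Level <- (dummies$Dummy1 + dummies$Dummy2) / 2 + 1

dummies
```

Note that the middle level is produced by two of the four dummy combinations, so it is run twice as often as the low and high levels.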

With the two 3-level factors each replaced by two 2-level factors, the experiment will now have 6 2-level factors. The full factorial experiment is a 2^6 experiment requiring 64 runs. The 2^6, or 2^m, design is shown below.

library(knitr)

Bin <- c(0,1)
Full <- expand.grid(Bin, Bin, Bin, Bin, Bin, Bin)
names(Full)[1] <- "SoilA"
names(Full)[2] <- "SoilB"
names(Full)[3] <- "FabricA"
names(Full)[4] <- "FabricB"
names(Full)[5] <- "Temp"
names(Full)[6] <- "Surfactant"

kable(Full, caption = "Full Factorial Experiment", row.names = T)
Full Factorial Experiment
SoilA SoilB FabricA FabricB Temp Surfactant
1 0 0 0 0 0 0
2 1 0 0 0 0 0
3 0 1 0 0 0 0
4 1 1 0 0 0 0
5 0 0 1 0 0 0
6 1 0 1 0 0 0
7 0 1 1 0 0 0
8 1 1 1 0 0 0
9 0 0 0 1 0 0
10 1 0 0 1 0 0
11 0 1 0 1 0 0
12 1 1 0 1 0 0
13 0 0 1 1 0 0
14 1 0 1 1 0 0
15 0 1 1 1 0 0
16 1 1 1 1 0 0
17 0 0 0 0 1 0
18 1 0 0 0 1 0
19 0 1 0 0 1 0
20 1 1 0 0 1 0
21 0 0 1 0 1 0
22 1 0 1 0 1 0
23 0 1 1 0 1 0
24 1 1 1 0 1 0
25 0 0 0 1 1 0
26 1 0 0 1 1 0
27 0 1 0 1 1 0
28 1 1 0 1 1 0
29 0 0 1 1 1 0
30 1 0 1 1 1 0
31 0 1 1 1 1 0
32 1 1 1 1 1 0
33 0 0 0 0 0 1
34 1 0 0 0 0 1
35 0 1 0 0 0 1
36 1 1 0 0 0 1
37 0 0 1 0 0 1
38 1 0 1 0 0 1
39 0 1 1 0 0 1
40 1 1 1 0 0 1
41 0 0 0 1 0 1
42 1 0 0 1 0 1
43 0 1 0 1 0 1
44 1 1 0 1 0 1
45 0 0 1 1 0 1
46 1 0 1 1 0 1
47 0 1 1 1 0 1
48 1 1 1 1 0 1
49 0 0 0 0 1 1
50 1 0 0 0 1 1
51 0 1 0 0 1 1
52 1 1 0 0 1 1
53 0 0 1 0 1 1
54 1 0 1 0 1 1
55 0 1 1 0 1 1
56 1 1 1 0 1 1
57 0 0 0 1 1 1
58 1 0 0 1 1 1
59 0 1 0 1 1 1
60 1 1 0 1 1 1
61 0 0 1 1 1 1
62 1 0 1 1 1 1
63 0 1 1 1 1 1
64 1 1 1 1 1 1

If we were going to conduct the experiments and collect the data, we would want to randomize the order in which these 64 experiments are done.
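A randomized run order could be generated as in the following sketch. The seed value is arbitrary and is only there to make the order reproducible; design, run_order and randomized are names introduced for this illustration:

```r
set.seed(2016)  # arbitrary seed, assumed only for reproducibility

# Rebuild the 64-run design so the block is self-contained
design <- expand.grid(rep(list(c(0, 1)), 6))

# Draw a random permutation of the run numbers and reorder the design
run_order <- sample(nrow(design))
randomized <- design[run_order, ]

head(randomized)
```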

2^(m-3) Factorial Design

If a full factorial experiment is not possible, there are ways to reduce the number of runs while still obtaining usable data. In this case we want the 2^(m-3) design with the highest resolution. The new experiment will have 6 factors, but only 2^3, or 8, runs. The highest resolution that can be obtained with this experiment is III.

A resolution III design means that no main effect is aliased with any other main effect. The main effects are, however, aliased with 2-factor interactions.

library(FrF2)
Partial <- FrF2(8,6)
names(Partial)[1] <- "SoilA"
names(Partial)[2] <- "SoilB"
names(Partial)[3] <- "FabricA"
names(Partial)[4] <- "FabricB"
names(Partial)[5] <- "Temp"
names(Partial)[6] <- "Surfactant"

summary(Partial)
## Call:
## FrF2(8, 6)
## 
## Experimental design of type  FrF2 
## 8  runs
## 
## Factor settings (scale ends):
##    A  B  C  D  E  F
## 1 -1 -1 -1 -1 -1 -1
## 2  1  1  1  1  1  1
## 
## Design generating information:
## $legend
## [1] A=A B=B C=C D=D E=E F=F
## 
## $generators
## [1] D=AB E=AC F=BC
## 
## 
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD
## 
## 
## The design itself:
##   SoilA SoilB FabricA FabricB Temp Surfactant
## 1    -1    -1      -1       1    1          1
## 2     1     1       1       1    1          1
## 3     1     1      -1       1   -1         -1
## 4     1    -1      -1      -1   -1          1
## 5     1    -1       1      -1    1         -1
## 6    -1    -1       1       1   -1         -1
## 7    -1     1      -1      -1    1         -1
## 8    -1     1       1      -1   -1          1
## class=design, type= FrF2

The FrF2 function also provides us with the Aliasing structure of the experiment.

Alias structure:

$main [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE

$fi2 [1] AF=BE=CD
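The same design can be built by hand from the generators D=AB, E=AC, F=BC reported above, which makes the aliasing easy to inspect. This is a sketch in base R, not the way FrF2 constructs the design internally; base and design are names introduced for this illustration:

```r
# Full 2^3 factorial in the three basic factors
base <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# The remaining three columns come from the generators D=AB, E=AC, F=BC
design <- transform(base, D = A * B, E = A * C, F = B * C)

# A is aliased with BD and CE: the product columns are identical to A
all(design$A == design$B * design$D)
all(design$A == design$C * design$E)

# No main effect is aliased with another: all six columns are orthogonal
all(crossprod(as.matrix(design)) == 8 * diag(6))
```

The run order differs from the FrF2 output, which is randomized, but the set of 8 runs is the same.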

Using this we can find the defining relation I for the experimental design.

Multiplying a column by itself will result in a column of all 1’s, the identity I. This can be used to find the defining relation of the design: for example, multiplying both sides of D = AB by D gives I = ABD.

This is done with all the aliasing relations, and repeats are eliminated. The resulting I is:

I = ABD = ACE = BCF = DEF = ABEF = ACDF = BCDE
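A word belongs to the defining relation exactly when the product of its columns is 1 in every run. The sketch below checks this for the seven words implied by the generators D=AB, E=AC, F=BC (three generator words, their three pairwise products, and the product of all three); d, words and is_identity are names introduced for this illustration:

```r
# Rebuild the 8-run design from its generators
base <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
d <- transform(base, D = A * B, E = A * C, F = B * C)

words <- c("ABD", "ACE", "BCF", "DEF", "ABEF", "ACDF", "BCDE")

# For each word, multiply its columns together and test for all 1's
is_identity <- sapply(words, function(w) {
  cols <- strsplit(w, "")[[1]]
  all(Reduce(`*`, d[cols]) == 1)
})

is_identity
```

The shortest words have three letters, which is what makes the design resolution III.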

3. Statistical Analysis

Now that we have the partial factorial design, we will add the results that were obtained from the full factorial experiment.

Partial
##   SoilA SoilB FabricA FabricB Temp Surfactant
## 1    -1    -1      -1       1    1          1
## 2     1     1       1       1    1          1
## 3     1     1      -1       1   -1         -1
## 4     1    -1      -1      -1   -1          1
## 5     1    -1       1      -1    1         -1
## 6    -1    -1       1       1   -1         -1
## 7    -1     1      -1      -1    1         -1
## 8    -1     1       1      -1   -1          1
## class=design, type= FrF2
R_Soil <- rep(NA, 8)
R_Fabric <- rep(NA, 8)

for(i in 1:8){R_Soil[i] <- as.numeric(Partial[i,1]) * as.numeric(Partial[i,2])}
R_Soil
## [1] 1 4 4 2 2 1 2 2
for(i in 1:8){R_Fabric[i] <- as.numeric(Partial[i,3]) * as.numeric(Partial[i,4])}
R_Fabric
## [1] 2 4 2 1 2 4 1 2
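The loops above can be replaced by a vectorized product. In the sketch below, part is a stand-in that copies the SoilA and SoilB columns of the printed design as factors with levels -1 and 1. Mapping the products 1, 2 and 4 onto three levels with match() is an assumption about how the codes are meant to be read:

```r
# SoilA and SoilB columns copied from the printed 8-run design
part <- data.frame(
  SoilA = factor(c(-1, 1, 1, 1, 1, -1, -1, -1), levels = c(-1, 1)),
  SoilB = factor(c(-1, 1, 1, -1, -1, -1, 1, 1), levels = c(-1, 1))
)

# as.numeric() on a factor returns the level codes (1 or 2), not -1 and 1
codes <- as.numeric(part$SoilA) * as.numeric(part$SoilB)
codes

# Assumed reading: product 1 -> level 1, 2 -> level 2, 4 -> level 3
soil_level <- match(codes, c(1, 2, 4))
soil_level
```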

We can now see which runs of the original full factorial experiment correspond to the new experiments.

Check_Soil <- c(2, 3, 2, 2, 1, 2, 3, 1)
Check_Fabric <- c(2, 3, 2, 1, 3, 1, 2, 2)
Check_Temp <-c(1, 2, 2, 2, 1, 1, 1, 2)
Check_Suractant <- c(2, 2, 1, 1, 1, 2, 1, 2)
Check_Results <- rep(NA,8)

for(i in 1:8){
  for(j in 1:36){
    if((Check_Soil[i] == Data[j,1]) && (Check_Fabric[i] == Data[j,2]) 
       && (Check_Temp[i] == Data[j,3]) && (Check_Suractant[i] == Data[j,4])){
      Check_Results[i] <- Data[j,5]
       }    
  } 
}
Check_Results
## [1] 11.96 15.88 10.15 12.34  6.16 12.05  9.87  8.76

The nested for loop looks through the full factorial experiment for the correct experiments, and records the results of the correct experiment into the Check_Results vector.
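The same lookup can be done without loops by building one key per run and matching the keys. This is a sketch on a small stand-in: full holds five rows copied from the data listing, wanted holds three of the requested factor combinations, and key() is a hypothetical helper introduced here:

```r
# Five rows copied from the full factorial data
full <- data.frame(
  Soil       = c(1, 1, 2, 2, 3),
  Fabric     = c(2, 3, 1, 2, 3),
  Temp       = c(2, 1, 2, 1, 2),
  Surfactant = c(2, 1, 1, 2, 2),
  Result     = c(8.76, 6.16, 12.34, 11.96, 15.88)
)

# Three factor combinations we want results for
wanted <- data.frame(
  Soil       = c(2, 3, 1),
  Fabric     = c(2, 3, 3),
  Temp       = c(1, 2, 1),
  Surfactant = c(2, 2, 1)
)

# Hypothetical helper: one string key per row, e.g. "2-2-1-2"
key <- function(df) paste(df$Soil, df$Fabric, df$Temp, df$Surfactant, sep = "-")

# match() returns, for each wanted run, the row of full with the same key
found <- full$Result[match(key(wanted), key(full))]
found
```

Unlike merge(), match() keeps the results in the order of the requested runs, which matters when they are attached to the design afterwards.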

Now we have the results for our experiment.

Comp_Partial <- add.response(Partial, Check_Results)
summary(Comp_Partial)
## Call:
## FrF2(8, 6)
## 
## Experimental design of type  FrF2 
## 8  runs
## 
## Factor settings (scale ends):
##    A  B  C  D  E  F
## 1 -1 -1 -1 -1 -1 -1
## 2  1  1  1  1  1  1
## 
## Responses:
## [1] Check_Results
## 
## Design generating information:
## $legend
## [1] A=A B=B C=C D=D E=E F=F
## 
## $generators
## [1] D=AB E=AC F=BC
## 
## 
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD
## 
## 
## The design itself:
##   SoilA SoilB FabricA FabricB Temp Surfactant Check_Results
## 1    -1    -1      -1       1    1          1         11.96
## 2     1     1       1       1    1          1         15.88
## 3     1     1      -1       1   -1         -1         10.15
## 4     1    -1      -1      -1   -1          1         12.34
## 5     1    -1       1      -1    1         -1          6.16
## 6    -1    -1       1       1   -1         -1         12.05
## 7    -1     1      -1      -1    1         -1          9.87
## 8    -1     1       1      -1   -1          1          8.76
## class=design, type= FrF2

Exploratory BoxPlots

boxplot(Comp_Partial$Check_Results~Comp_Partial$SoilA, main = "Color Change based on SoilA")

We can visually see that there is a difference between the two levels.

boxplot(Comp_Partial$Check_Results~Comp_Partial$SoilB, main = "Color Change based on SoilB")

We can visually see that there is a difference between the two levels.

boxplot(Comp_Partial$Check_Results~Comp_Partial$FabricA, main = "Color Change based on FabricA")

We can visually see that there is probably not a difference between the two levels, and that the variances of the two levels are not equal.

boxplot(Comp_Partial$Check_Results~Comp_Partial$FabricB, main = "Color Change based on FabricB")

We can visually see that there is a difference between the two levels, and that the variances of the two levels are not equal.

boxplot(Comp_Partial$Check_Results~Comp_Partial$Temp, main = "Color Change based on Temp")

We can visually see that there may be a difference between the two levels.

boxplot(Comp_Partial$Check_Results~Comp_Partial$Surfactant, main = "Color Change based on Surfactant")

We can visually see that there is a difference between the two levels.
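What the boxplots show can also be put in numbers: the main effect of a 2-level factor is the mean response at its high level minus the mean response at its low level. A minimal sketch for FabricB, with the column and the responses copied from the design printed above (y, level_means and effect are names introduced for this illustration):

```r
# Responses and FabricB settings of the 8 runs, in printed order
y       <- c(11.96, 15.88, 10.15, 12.34, 6.16, 12.05, 9.87, 8.76)
FabricB <- c(1, 1, 1, -1, -1, 1, -1, -1)

# Mean color change at each level of FabricB
level_means <- tapply(y, FabricB, mean)
level_means

# Main effect: high-level mean minus low-level mean
effect <- level_means[["1"]] - level_means[["-1"]]
effect
```

With only four observations per level, a difference of this size should be read with caution.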

ANOVA

We can now conduct an ANOVA test, and build a linear regression from just the main effects.

M1<- lm(Comp_Partial$Check_Results~ Comp_Partial$SoilA + Comp_Partial$SoilB + Comp_Partial$FabricA 
         + Comp_Partial$FabricB + Comp_Partial$Temp + Comp_Partial$Surfactant)
summary(M1)
## 
## Call:
## lm.default(formula = Comp_Partial$Check_Results ~ Comp_Partial$SoilA + 
##     Comp_Partial$SoilB + Comp_Partial$FabricA + Comp_Partial$FabricB + 
##     Comp_Partial$Temp + Comp_Partial$Surfactant)
## 
## Residuals:
##      1      2      3      4      5      6      7      8 
## -1.639  1.639 -1.639  1.639 -1.639  1.639  1.639 -1.639 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)  
## (Intercept)              10.89625    1.63875   6.649    0.095 .
## Comp_Partial$SoilA1       0.23625    1.63875   0.144    0.909  
## Comp_Partial$SoilB1       0.26875    1.63875   0.164    0.897  
## Comp_Partial$FabricA1    -0.18375    1.63875  -0.112    0.929  
## Comp_Partial$FabricB1     1.61375    1.63875   0.985    0.505  
## Comp_Partial$Temp1        0.07125    1.63875   0.043    0.972  
## Comp_Partial$Surfactant1  1.33875    1.63875   0.817    0.564  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.635 on 1 degrees of freedom
## Multiple R-squared:  0.6295, Adjusted R-squared:  -1.593 
## F-statistic: 0.2832 on 6 and 1 DF,  p-value: 0.8907

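Looking at the R-sq (0.63) alone, the fit may appear acceptable. However, the Adj R-sq is negative (-1.593), none of the P-values are significant, and only 1 degree of freedom is left for the residuals, so we should be hesitant about the results. It would be advisable to run more experiments.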
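The single residual degree of freedom follows from fitting 7 parameters (the intercept and 6 main effects) to 8 runs. The sketch below refits the model on a plain data frame, with the design columns entered as numeric -1/1 values copied from the printed design; runs and fit are names introduced for this illustration:

```r
# The 8 runs and their responses, copied from the printed design
runs <- data.frame(
  SoilA      = c(-1,  1,  1,  1,  1, -1, -1, -1),
  SoilB      = c(-1,  1,  1, -1, -1, -1,  1,  1),
  FabricA    = c(-1,  1, -1, -1,  1,  1, -1,  1),
  FabricB    = c( 1,  1,  1, -1, -1,  1, -1, -1),
  Temp       = c( 1,  1, -1, -1,  1, -1,  1, -1),
  Surfactant = c( 1,  1, -1,  1, -1, -1, -1,  1),
  y          = c(11.96, 15.88, 10.15, 12.34, 6.16, 12.05, 9.87, 8.76)
)

# Main-effects model: intercept plus one coefficient per factor
fit <- lm(y ~ ., data = runs)

length(coef(fit))   # number of estimated parameters
df.residual(fit)    # runs minus parameters
```

With the -1/1 coding, each coefficient is half of the corresponding main effect, and the intercept is the mean of the 8 responses.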

M2<- lm(Data$Result~ Data$Soil + Data$Fabric + Data$Temp + Data$Surfactant)
summary(M2)
## 
## Call:
## lm.default(formula = Data$Result ~ Data$Soil + Data$Fabric + 
##     Data$Temp + Data$Surfactant)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.9569 -1.4582  0.0454  1.5831  5.2839 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      10.89833    1.36884   7.962 8.83e-09 ***
## Data$Soil2        3.14083    1.26730   2.478 0.019263 *  
## Data$Soil3       10.35833    1.26730   8.174 5.18e-09 ***
## Data$Fabric2     -5.32917    1.26730  -4.205 0.000229 ***
## Data$Fabric3     -4.82417    1.26730  -3.807 0.000675 ***
## Data$Temp2       -0.06056    1.03475  -0.059 0.953734    
## Data$Surfactant2  0.13611    1.03475   0.132 0.896256    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.104 on 29 degrees of freedom
## Multiple R-squared:   0.76,  Adjusted R-squared:  0.7103 
## F-statistic: 15.31 on 6 and 29 DF,  p-value: 7.929e-08

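If we apply the same test with the full data, we can see that we obtain much better results: the Adj R-sq is 0.71, and Soil and Fabric are significant. This is because the main effects are not confounded with 2-factor interactions, and because 29 degrees of freedom are available for the residuals.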
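The full-data model can also be summarized as an ANOVA table, which tests each factor as a whole instead of level by level. The sketch below rebuilds the 36 runs from the data listing so that it is self-contained; full and fit_full are names introduced for this illustration:

```r
# Same run order as the data listing: Surfactant changes fastest, Soil slowest
full <- expand.grid(
  Surfactant = factor(1:2), Temp = factor(1:2),
  Fabric = factor(1:3), Soil = factor(1:3)
)

# The 36 color-change results, copied from the listing
full$Result <- c(
   9.56,  6.99, 10.81,  7.60,  6.68,  9.30,  8.36,  8.76,  6.16,  5.46,  6.24,  4.70,
  11.44, 12.05, 12.34, 12.92, 10.46, 11.96, 10.15,  9.95,  9.22,  9.15,  9.77,  8.90,
  25.47, 24.08, 26.48, 25.49,  9.87, 14.58,  7.91, 13.30, 17.96, 17.08, 16.82, 15.88
)

# Main-effects model on the full factorial data
fit_full <- lm(Result ~ Soil + Fabric + Temp + Surfactant, data = full)

# One row per factor, plus the residuals
anova(fit_full)
```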

4. References to the literature

Design and Analysis of Experiments, 8th Edition Douglas C. Montgomery

http://www.itl.nist.gov/div898/handbook/pri/section3/pri3343.htm#generating

5. Appendices