# Fractional Factorial Designs

## 1. Setting

From the Ecdat package, VietNamH dataset was used for this project. This dataset has a total of 5999 observations describing the household expenses of households in Vietnam. The dataset can be loaded as follows:

#install.packages("Ecdat")
library("Ecdat")
## Loading required package: Ecfun
##
## Attaching package: 'Ecfun'
## The following object is masked from 'package:base':
##
##     sign
##
## Attaching package: 'Ecdat'
## The following object is masked from 'package:datasets':
##
##     Orange
library(pid)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.3.2
## Loading required package: png
## Loading required package: FrF2
## Loading required package: DoE.base
## Loading required package: grid
## Loading required package: conf.design
##
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
##
##     aov, lm
## The following object is masked from 'package:graphics':
##
##     plot.design
## The following object is masked from 'package:base':
##
##     lengths
library(FrF2)
library(knitr)
library(tidyr)
library(kimisc)
df <- VietNamH

### System Under Test

As described above, the VietNamH dataset has data about the household expenses of households in Vietnam. Specifically, we have the following variables in the data set:

• sex: Gender of household head (male, female)
• age: Age of household head
• educyr: Schooling year of household head
• farm: yes if farm household
• urban: yes if urban household
• hhsize: Size of the household
• lntotal: Natural logarighm of total expenditure of the household
• lnmed: Natural logarithm of medical expenditure of the household
• lnrlfood: Natural logarithm of food expenditure of the household
• lnexp12m: Natural logarithm of total household health care expenditure for 12 months
• commune: Commune

This dataset has been taken from the Vietnam World Bank Livings Standards Survey.

head(df)
##      sex age educyr farm urban hhsize  lntotal     lnmed  lnrlfood
## 1 female  68      4   no   yes      6 10.13649 11.233210  8.639339
## 2 female  57      8   no   yes      6 10.25206  8.505120  9.345752
## 3   male  42     14   no   yes      6 10.93231  8.713418 10.226330
## 4 female  72      9   no   yes      6 10.26749  9.291736  9.263722
## 5 female  73      1   no   yes      8 10.48811  7.555382  9.592890
## 6 female  66     13   no   yes      7 10.52660  9.789702  9.372034
##    lnexp12m commune
## 1 11.233210       1
## 2  8.505120       1
## 3  8.713418       1
## 4  9.291736       1
## 5  7.555382       1
## 6  9.789702       1

### Factors and Levels

First of all, we must keep only those variables that will be needed in our analysis. For our analysis, we have two 2-level IVs (sex and urban) and two 3-level IVs (age and hhsize). The response variable is lntotal. Thus, our modified dataset becomes:

df = df[c(1,5,2,6,7)]
for (i in 1:nrow(df))
{
if (df$age[i] <= 35) { df$age_factor[i] <- "Young"
}
else if (df$age[i] <= 60) { df$age_factor[i] <- "Adult"
}
else
{
df$age_factor[i] <- "Old" } if (df$hhsize[i] <= 4)
{
df$hhsize_factor[i] <- "Small" } else if (df$hhsize[i] <= 8)
{
df$hhsize_factor[i] <- "Medium" } else { df$hhsize_factor[i] <- "Large"
}
}

df$sex <- as.factor(df$sex)
df$urban <- as.factor(df$urban)
df$age <- as.factor(df$age_factor)
df$hhsize <- as.factor(df$hhsize_factor)
df <- df[-c(6,7)]

head(df)
##      sex urban   age hhsize  lntotal
## 1 female   yes   Old Medium 10.13649
## 2 female   yes Adult Medium 10.25206
## 3   male   yes Adult Medium 10.93231
## 4 female   yes   Old Medium 10.26749
## 5 female   yes   Old Medium 10.48811
## 6 female   yes   Old Medium 10.52660
str(df)
## 'data.frame':    5999 obs. of  5 variables:
##  $sex : Factor w/ 2 levels "male","female": 2 2 1 2 2 2 2 1 1 1 ... ##$ urban  : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $age : Factor w/ 3 levels "Adult","Old",..: 2 1 1 2 2 2 2 1 1 1 ... ##$ hhsize : Factor w/ 3 levels "Large","Medium",..: 2 2 2 2 2 2 1 3 2 3 ...
##  lntotal: num 10.1 10.3 10.9 10.3 10.5 ... summary(df) ## sex urban age hhsize lntotal ## male :4375 no :4269 Adult:3497 Large : 233 Min. : 6.543 ## female:1624 yes:1730 Old :1283 Medium:2920 1st Qu.: 8.920 ## Young:1219 Small :2846 Median : 9.311 ## Mean : 9.342 ## 3rd Qu.: 9.759 ## Max. :12.202 As can be seen above, we have two 2-level IVs (sex and urban) and two 3-level IVs (age and hhsize). ### Continuous Variables and Response Variables The response variable in our study (lntotal) is the only continuous variable. We converted the continuous factors into categorical variables. ## 2. Experimental Design ### How will the experiment be organized and conducted to test the hypothesis? If we were to use $$2^2 \cdot 3^2$$ design, we would end up with 36 different configurations of input variables. However we plan to do a $$2^k$$ design by converting the 3-level variables into two 2-level variables. As a result the mixed design that we currently have would be converted to a $$2^k$$ design with $$k = 6$$. The 3-level factors can be converted into two 2-level factors by assigning $$(-1, -1), (-1, +1), (+1, -1), (+1, +1)$$ to $$small, medium, medium$$ and $$large$$ respectively. We can now begin the process of modifying the current dataset. The first 3-level variable (age) can be converted as follows. C <- c(-1, 1, -1, 1) D <- c(-1, -1, 1, 1) X <- c("Young", "Adult", "Adult", "Old") FactX <- as.data.frame(cbind(C,D,X)) kable(FactX, align='c') C D X -1 -1 Young 1 -1 Adult -1 1 Adult 1 1 Old The second 3-level variable (hhsize) can be converted to two 2-level variables as follows. E <- C F <- D Y <- c("Small", "Medium", "Medium", "Large") FactY <- as.data.frame(cbind(E, F, Y)) kable(FactY, align='c') E F Y -1 -1 Small 1 -1 Medium -1 1 Medium 1 1 Large Now that we have converted the 3-level factors into two 2-level factors each, we can construct a $$2^6$$ design as follows. K <- as.data.frame(FrF2(64, 6, randomize = FALSE)) ## creating full factorial with 64 runs ... A <- c(as.integer(K[,1])) B <- c(as.integer(K[,2])) C <- c(as.integer(K[,3])) D <- c(as.integer(K[,4])) CD <- C*D E <- c(as.integer(K[,5])) F <- c(as.integer(K[,6])) EF <- E*F FullDesignMatrix <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF)) for (i in 1:nrow(FullDesignMatrix)) { if (FullDesignMatrixA[i] == 1)
{
FullDesignMatrix$Sex[i] <- "Male" } else { FullDesignMatrix$Sex[i] <- "Female"
}

if (FullDesignMatrix$B[i] == 1) { FullDesignMatrix$Urban[i] <- "No"
}
else
{
FullDesignMatrix$Urban[i] <- "Yes" } if (FullDesignMatrix$CD[i] == 1)
{
FullDesignMatrix$Age[i] <- "Adult" } else if (FullDesignMatrix$CD[i] == 2)
{
FullDesignMatrix$Age[i] <- "Old" } else { FullDesignMatrix$Age[i] <- "Young"
}

if (FullDesignMatrix$EF[i] == 1) { FullDesignMatrix$Hhsize[i] <- "Large"
}
else if (FullDesignMatrix$EF[i] == 2) { FullDesignMatrix$Hhsize[i] <- "Medium"
}
else
{
FullDesignMatrix$Hhsize[i] <- "Small" } } FullDesign <- cbind(K, FullDesignMatrix$Sex, FullDesignMatrix$Urban, FullDesignMatrix$Age, FullDesignMatrixHhsize) kable(FullDesign, align='c') A B C D E F FullDesignMatrixSex FullDesignMatrix$Urban FullDesignMatrix$Age FullDesignMatrix$Hhsize -1 -1 -1 -1 -1 -1 Male No Adult Large 1 -1 -1 -1 -1 -1 Female No Adult Large -1 1 -1 -1 -1 -1 Male Yes Adult Large 1 1 -1 -1 -1 -1 Female Yes Adult Large -1 -1 1 -1 -1 -1 Male No Old Large 1 -1 1 -1 -1 -1 Female No Old Large -1 1 1 -1 -1 -1 Male Yes Old Large 1 1 1 -1 -1 -1 Female Yes Old Large -1 -1 -1 1 -1 -1 Male No Old Large 1 -1 -1 1 -1 -1 Female No Old Large -1 1 -1 1 -1 -1 Male Yes Old Large 1 1 -1 1 -1 -1 Female Yes Old Large -1 -1 1 1 -1 -1 Male No Young Large 1 -1 1 1 -1 -1 Female No Young Large -1 1 1 1 -1 -1 Male Yes Young Large 1 1 1 1 -1 -1 Female Yes Young Large -1 -1 -1 -1 1 -1 Male No Adult Medium 1 -1 -1 -1 1 -1 Female No Adult Medium -1 1 -1 -1 1 -1 Male Yes Adult Medium 1 1 -1 -1 1 -1 Female Yes Adult Medium -1 -1 1 -1 1 -1 Male No Old Medium 1 -1 1 -1 1 -1 Female No Old Medium -1 1 1 -1 1 -1 Male Yes Old Medium 1 1 1 -1 1 -1 Female Yes Old Medium -1 -1 -1 1 1 -1 Male No Old Medium 1 -1 -1 1 1 -1 Female No Old Medium -1 1 -1 1 1 -1 Male Yes Old Medium 1 1 -1 1 1 -1 Female Yes Old Medium -1 -1 1 1 1 -1 Male No Young Medium 1 -1 1 1 1 -1 Female No Young Medium -1 1 1 1 1 -1 Male Yes Young Medium 1 1 1 1 1 -1 Female Yes Young Medium -1 -1 -1 -1 -1 1 Male No Adult Medium 1 -1 -1 -1 -1 1 Female No Adult Medium -1 1 -1 -1 -1 1 Male Yes Adult Medium 1 1 -1 -1 -1 1 Female Yes Adult Medium -1 -1 1 -1 -1 1 Male No Old Medium 1 -1 1 -1 -1 1 Female No Old Medium -1 1 1 -1 -1 1 Male Yes Old Medium 1 1 1 -1 -1 1 Female Yes Old Medium -1 -1 -1 1 -1 1 Male No Old Medium 1 -1 -1 1 -1 1 Female No Old Medium -1 1 -1 1 -1 1 Male Yes Old Medium 1 1 -1 1 -1 1 Female Yes Old Medium -1 -1 1 1 -1 1 Male No Young Medium 1 -1 1 1 -1 1 Female No Young Medium -1 1 1 1 -1 1 Male Yes Young Medium 1 1 1 1 -1 1 Female Yes Young Medium -1 -1 -1 -1 1 1 Male No Adult Small 1 -1 -1 -1 1 1 Female No Adult Small -1 1 -1 -1 1 1 Male Yes Adult Small 1 1 -1 -1 1 1 Female Yes Adult Small -1 -1 1 -1 1 1 Male No Old Small 1 -1 1 -1 1 1 Female No Old Small -1 1 1 -1 1 1 Male Yes Old Small 1 1 1 -1 1 1 Female Yes Old Small -1 -1 -1 1 1 1 Male No Old Small 1 -1 -1 1 1 1 Female No Old Small -1 1 -1 1 1 1 Male Yes Old Small 1 1 -1 1 1 1 Female Yes Old Small -1 -1 1 1 1 1 Male No Young Small 1 -1 1 1 1 1 Female No Young Small -1 1 1 1 1 1 Male Yes Young Small 1 1 1 1 1 1 Female Yes Young Small To construct the $$2^{6 - 3}$$ design, we follow the following steps. First of all, it is important to note that $$2^{6 - 3}$$ design is essentially a design having $$\frac{1}{8}$$ number of configurations as compared to $$2^6$$ design. Note that with the given set of input variables, it is not possible to achieve Resolution IV and V because for Resolution IV, we would require $$2^{6 - 2}$$ i.e. 16 runs and for Resolution V, we would require $$2^{6 - 1}$$ i.e. 32 runs. We know that for a Resolution III design, no main effect is aliased with any other main effect, but main effects are aliased with two factor interactions. Now, we can construct our $$2^{6 - 3}_{III}$$ comprising of 8 runs as follows. K <- as.data.frame(FrF2(8,6,randomize = FALSE)) A <- c(as.integer(K[,1])) B <- c(as.integer(K[,2])) C <- c(as.integer(K[,3])) D <- c(as.integer(K[,4])) CD <- C*D E <- c(as.integer(K[,5])) F <- c(as.integer(K[,6])) EF <- E*F FracDesignMatrix <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF)) for (i in 1:nrow(FracDesignMatrix)) { if (FracDesignMatrix$A[i] == 1)
{
FracDesignMatrix$Sex[i] <- "Male" } else { FracDesignMatrix$Sex[i] <- "Female"
}

if (FracDesignMatrix$B[i] == 1) { FracDesignMatrix$Urban[i] <- "No"
}
else
{
FracDesignMatrix$Urban[i] <- "Yes" } if (FracDesignMatrix$CD[i] == 1)
{
FracDesignMatrix$Age[i] <- "Adult" } else if (FracDesignMatrix$CD[i] == 2)
{
FracDesignMatrix$Age[i] <- "Old" } else { FracDesignMatrix$Age[i] <- "Young"
}

if (FracDesignMatrix$EF[i] == 1) { FracDesignMatrix$Hhsize[i] <- "Large"
}
else if (FracDesignMatrix$EF[i] == 2) { FracDesignMatrix$Hhsize[i] <- "Medium"
}
else
{
FracDesignMatrix$Hhsize[i] <- "Small" } } FracDesign <- cbind(K, FracDesignMatrix$Sex, FracDesignMatrix$Urban, FracDesignMatrix$Age, FracDesignMatrixHhsize) kable(FracDesign, align='c') A B C D E F FracDesignMatrixSex FracDesignMatrix$Urban FracDesignMatrix$Age FracDesignMatrix$Hhsize -1 -1 -1 1 1 1 Male No Old Small 1 -1 -1 -1 -1 1 Female No Adult Medium -1 1 -1 -1 1 -1 Male Yes Adult Medium 1 1 -1 1 -1 -1 Female Yes Old Large -1 -1 1 1 -1 -1 Male No Young Large 1 -1 1 -1 1 -1 Female No Old Medium -1 1 1 -1 -1 1 Male Yes Old Medium 1 1 1 1 1 1 Female Yes Young Small We can now construct our $$2^3 = 2^{6 - 3}$$ table as follows. A <- c(-1, 1) B <- c(-1, 1) C <- c(-1, 1) des <- expand.grid(A=A, B=B, C=C) des <- as.data.frame(des) des ## A B C ## 1 -1 -1 -1 ## 2 1 -1 -1 ## 3 -1 1 -1 ## 4 1 1 -1 ## 5 -1 -1 1 ## 6 1 -1 1 ## 7 -1 1 1 ## 8 1 1 1 Since we have six factors, we need to look up the trade off table for $$2^k$$ fractional factorial analysis. tradeOffTable() As we can see, for $$2^{6 - 3}_{III}$$ fractional factorial design, we can add additional generators as follows. des$D <- des$A * des$B
des$E <- des$A * des$C des$F <- des$B * des$C

des
##    A  B  C  D  E  F
## 1 -1 -1 -1  1  1  1
## 2  1 -1 -1 -1 -1  1
## 3 -1  1 -1 -1  1 -1
## 4  1  1 -1  1 -1 -1
## 5 -1 -1  1  1 -1 -1
## 6  1 -1  1 -1  1 -1
## 7 -1  1  1 -1 -1  1
## 8  1  1  1  1  1  1

Now, we have the $$2^{6 - 3}_{III}$$ design matrix. We can determine the aliasing structure by using the following commands.

K <- FrF2(8,6,randomize = FALSE)
summary(K)
## Call:
## FrF2(8, 6, randomize = FALSE)
##
## Experimental design of type  FrF2
## 8  runs
##
## Factor settings (scale ends):
##    A  B  C  D  E  F
## 1 -1 -1 -1 -1 -1 -1
## 2  1  1  1  1  1  1
##
## Design generating information:
## $legend ## [1] A=A B=B C=C D=D E=E F=F ## ##$generators
## [1] D=AB E=AC F=BC
##
##
## Alias structure:
## $main ## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE ## ##$fi2
## [1] AF=BE=CD
##
##
## The design itself:
##    A  B  C  D  E  F
## 1 -1 -1 -1  1  1  1
## 2  1 -1 -1 -1 -1  1
## 3 -1  1 -1 -1  1 -1
## 4  1  1 -1  1 -1 -1
## 5 -1 -1  1  1 -1 -1
## 6  1 -1  1 -1  1 -1
## 7 -1  1  1 -1 -1  1
## 8  1  1  1  1  1  1
## class=design, type= FrF2

This aliasing structure is in sync with the description of $$2^{6 - 3}_{III}$$ fractional factorial designs in Montgomery textbook.

Before we proceed further, let’s summarize the input variables once again. $$A$$ represents the gender of household head, which is a 2-level factor. $$B$$ represents if the household is situated in urban area or not, which is also a 2-level factor. $$C$$ and $$D$$ together represent the 3-level factor age. $$E$$ and $$F$$ together represent the 3-level factor household size.

### What is the rationale for this design?

The rationale for this design is to be able to draw as much information using as less number of experimental runs as possible. Furthermore, in order to simulate the experiment, we are using $$2^{6 - 3}_{III}$$ design. We wish to draw maximum possible information from the 8 runs.

### Randomization, Blocking and Replication

For ANOVA, Randomization must occur at three levels: Random Selection, Random Assignment and Random Execution. Random Selection refers to selecting a random set of sample from the population. Random Assignment refers to assigning those samples to different groups. Random Execution refers to ordering the experimental trials randomly [1]. The data comes from the Vietnam World Bank Livings Standards Survey. There doesn’t seem to be any information pertaining to random selection. However, random assignment and random execution is ensured in the analysis.

Blocking refers to arranging the experimental runs in blocks that are similar to each other [2]. In this experiment, as mentioned above, we would be blocking on the categorical IV sex in order to eliminate any variablity caused by the levels of this factor.

Replication is the repeatition of an experimental condition so that the variablity associated with the phenomenon under study can be estimated [3]. Repeated measures refers to using the same subjects with every branch of research, including the control group [4]. In this dataset, there are several replications of possible configurations of the input IVs. There is no evidence of repeated measures.

## 3. Statistical Analysis

### Exploratory Data Analysis (EDA)

It is important to note that the EDA performed here is only done to obtain a feel for the descriptive statistics of the problem because we are trying to use eight runs for our fractional factorial designs.

summary(df)
##      sex       urban         age          hhsize        lntotal
##  male  :4375   no :4269   Adult:3497   Large : 233   Min.   : 6.543
##  female:1624   yes:1730   Old  :1283   Medium:2920   1st Qu.: 8.920
##                           Young:1219   Small :2846   Median : 9.311
##                                                      Mean   : 9.342
##                                                      3rd Qu.: 9.759
##                                                      Max.   :12.202
str(df)
## 'data.frame':    5999 obs. of  5 variables:
##  $sex : Factor w/ 2 levels "male","female": 2 2 1 2 2 2 2 1 1 1 ... ##$ urban  : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $age : Factor w/ 3 levels "Adult","Old",..: 2 1 1 2 2 2 2 1 1 1 ... ##$ hhsize : Factor w/ 3 levels "Large","Medium",..: 2 2 2 2 2 2 1 3 2 3 ...
##  $lntotal: num 10.1 10.3 10.9 10.3 10.5 ... head(df) ## sex urban age hhsize lntotal ## 1 female yes Old Medium 10.13649 ## 2 female yes Adult Medium 10.25206 ## 3 male yes Adult Medium 10.93231 ## 4 female yes Old Medium 10.26749 ## 5 female yes Old Medium 10.48811 ## 6 female yes Old Medium 10.52660 Boxplots can be generated for the four input variables as follows. boxplot(df$lntotal~df\$sex, xlab="Sex", ylab="Log of Total Expenditure", main="Boxplot of the IV Sex")