Fractional Factorial Designs

Prasanna Date

RPI

December 9, 2016 V.1

1. Setting

From the Ecdat package, VietNamH dataset was used for this project. This dataset has a total of 5999 observations describing the household expenses of households in Vietnam. The dataset can be loaded as follows:

#install.packages("Ecdat")
library("Ecdat")
## Loading required package: Ecfun
## 
## Attaching package: 'Ecfun'
## The following object is masked from 'package:base':
## 
##     sign
## 
## Attaching package: 'Ecdat'
## The following object is masked from 'package:datasets':
## 
##     Orange
library(pid)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.3.2
## Loading required package: png
## Loading required package: FrF2
## Loading required package: DoE.base
## Loading required package: grid
## Loading required package: conf.design
## 
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## The following object is masked from 'package:graphics':
## 
##     plot.design
## The following object is masked from 'package:base':
## 
##     lengths
library(FrF2)
library(knitr)
library(tidyr)
library(kimisc)
df <- VietNamH

System Under Test

As described above, the VietNamH dataset has data about the household expenses of households in Vietnam. Specifically, we have the following variables in the data set:

  • sex: Gender of household head (male, female)
  • age: Age of household head
  • educyr: Schooling year of household head
  • farm: yes if farm household
  • urban: yes if urban household
  • hhsize: Size of the household
  • lntotal: Natural logarighm of total expenditure of the household
  • lnmed: Natural logarithm of medical expenditure of the household
  • lnrlfood: Natural logarithm of food expenditure of the household
  • lnexp12m: Natural logarithm of total household health care expenditure for 12 months
  • commune: Commune

This dataset has been taken from the Vietnam World Bank Livings Standards Survey.

head(df)
##      sex age educyr farm urban hhsize  lntotal     lnmed  lnrlfood
## 1 female  68      4   no   yes      6 10.13649 11.233210  8.639339
## 2 female  57      8   no   yes      6 10.25206  8.505120  9.345752
## 3   male  42     14   no   yes      6 10.93231  8.713418 10.226330
## 4 female  72      9   no   yes      6 10.26749  9.291736  9.263722
## 5 female  73      1   no   yes      8 10.48811  7.555382  9.592890
## 6 female  66     13   no   yes      7 10.52660  9.789702  9.372034
##    lnexp12m commune
## 1 11.233210       1
## 2  8.505120       1
## 3  8.713418       1
## 4  9.291736       1
## 5  7.555382       1
## 6  9.789702       1

Factors and Levels

First of all, we must keep only those variables that will be needed in our analysis. For our analysis, we have two 2-level IVs (sex and urban) and two 3-level IVs (age and hhsize). The response variable is lntotal. Thus, our modified dataset becomes:

df = df[c(1,5,2,6,7)]
for (i in 1:nrow(df))
{
  if (df$age[i] <= 35) 
  {
    df$age_factor[i] <- "Young"
  }
  else if (df$age[i] <= 60) 
  {
    df$age_factor[i] <- "Adult"
  }
  else
  {
    df$age_factor[i] <- "Old"
  }
  
  if (df$hhsize[i] <= 4) 
  {
    df$hhsize_factor[i] <- "Small"
  }
  else if (df$hhsize[i] <= 8)
  {
    df$hhsize_factor[i] <- "Medium"
  }
  else
  {
    df$hhsize_factor[i] <- "Large"
  }
}

df$sex <- as.factor(df$sex)
df$urban <- as.factor(df$urban)
df$age <- as.factor(df$age_factor)
df$hhsize <- as.factor(df$hhsize_factor)
df <- df[-c(6,7)]

head(df)
##      sex urban   age hhsize  lntotal
## 1 female   yes   Old Medium 10.13649
## 2 female   yes Adult Medium 10.25206
## 3   male   yes Adult Medium 10.93231
## 4 female   yes   Old Medium 10.26749
## 5 female   yes   Old Medium 10.48811
## 6 female   yes   Old Medium 10.52660
str(df)
## 'data.frame':    5999 obs. of  5 variables:
##  $ sex    : Factor w/ 2 levels "male","female": 2 2 1 2 2 2 2 1 1 1 ...
##  $ urban  : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ age    : Factor w/ 3 levels "Adult","Old",..: 2 1 1 2 2 2 2 1 1 1 ...
##  $ hhsize : Factor w/ 3 levels "Large","Medium",..: 2 2 2 2 2 2 1 3 2 3 ...
##  $ lntotal: num  10.1 10.3 10.9 10.3 10.5 ...
summary(df)
##      sex       urban         age          hhsize        lntotal      
##  male  :4375   no :4269   Adult:3497   Large : 233   Min.   : 6.543  
##  female:1624   yes:1730   Old  :1283   Medium:2920   1st Qu.: 8.920  
##                           Young:1219   Small :2846   Median : 9.311  
##                                                      Mean   : 9.342  
##                                                      3rd Qu.: 9.759  
##                                                      Max.   :12.202

As can be seen above, we have two 2-level IVs (sex and urban) and two 3-level IVs (age and hhsize).

Continuous Variables and Response Variables

The response variable in our study (lntotal) is the only continuous variable. We converted the continuous factors into categorical variables.

2. Experimental Design

How will the experiment be organized and conducted to test the hypothesis?

If we were to use \(2^2 \cdot 3^2\) design, we would end up with 36 different configurations of input variables. However we plan to do a \(2^k\) design by converting the 3-level variables into two 2-level variables. As a result the mixed design that we currently have would be converted to a \(2^k\) design with \(k = 6\).

The 3-level factors can be converted into two 2-level factors by assigning \((-1, -1), (-1, +1), (+1, -1), (+1, +1)\) to \(small, medium, medium\) and \(large\) respectively.

We can now begin the process of modifying the current dataset. The first 3-level variable (age) can be converted as follows.

C <- c(-1, 1, -1, 1)
D <- c(-1, -1, 1, 1)
X <- c("Young", "Adult", "Adult", "Old")
FactX <- as.data.frame(cbind(C,D,X))
kable(FactX, align='c')
C D X
-1 -1 Young
1 -1 Adult
-1 1 Adult
1 1 Old

The second 3-level variable (hhsize) can be converted to two 2-level variables as follows.

E <- C
F <- D
Y <- c("Small", "Medium", "Medium", "Large")
FactY <- as.data.frame(cbind(E, F, Y))
kable(FactY, align='c')
E F Y
-1 -1 Small
1 -1 Medium
-1 1 Medium
1 1 Large

Now that we have converted the 3-level factors into two 2-level factors each, we can construct a \(2^6\) design as follows.

K <- as.data.frame(FrF2(64, 6, randomize = FALSE))
## creating full factorial with 64 runs ...
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F

FullDesignMatrix <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

for (i in 1:nrow(FullDesignMatrix))
{
  if (FullDesignMatrix$A[i] == 1)
  {
    FullDesignMatrix$Sex[i] <- "Male"
  }
  else
  {
    FullDesignMatrix$Sex[i] <- "Female"
  }
  
  if (FullDesignMatrix$B[i] == 1)
  {
    FullDesignMatrix$Urban[i] <- "No"
  }
  else
  {
    FullDesignMatrix$Urban[i] <- "Yes"
  }
  
  if (FullDesignMatrix$CD[i] == 1)
  {
    FullDesignMatrix$Age[i] <- "Adult"
  }
  else if (FullDesignMatrix$CD[i] == 2)
  {
    FullDesignMatrix$Age[i] <- "Old"
  }
  else
  {
    FullDesignMatrix$Age[i] <- "Young"
  }
  
  if (FullDesignMatrix$EF[i] == 1)
  {
    FullDesignMatrix$Hhsize[i] <- "Large"
  }
  else if (FullDesignMatrix$EF[i] == 2)
  {
    FullDesignMatrix$Hhsize[i] <- "Medium"
  }
  else
  {
    FullDesignMatrix$Hhsize[i] <- "Small"
  }
}

FullDesign <- cbind(K, FullDesignMatrix$Sex, FullDesignMatrix$Urban, FullDesignMatrix$Age, FullDesignMatrix$Hhsize)
kable(FullDesign, align='c')
A B C D E F FullDesignMatrix$Sex FullDesignMatrix$Urban FullDesignMatrix$Age FullDesignMatrix$Hhsize
-1 -1 -1 -1 -1 -1 Male No Adult Large
1 -1 -1 -1 -1 -1 Female No Adult Large
-1 1 -1 -1 -1 -1 Male Yes Adult Large
1 1 -1 -1 -1 -1 Female Yes Adult Large
-1 -1 1 -1 -1 -1 Male No Old Large
1 -1 1 -1 -1 -1 Female No Old Large
-1 1 1 -1 -1 -1 Male Yes Old Large
1 1 1 -1 -1 -1 Female Yes Old Large
-1 -1 -1 1 -1 -1 Male No Old Large
1 -1 -1 1 -1 -1 Female No Old Large
-1 1 -1 1 -1 -1 Male Yes Old Large
1 1 -1 1 -1 -1 Female Yes Old Large
-1 -1 1 1 -1 -1 Male No Young Large
1 -1 1 1 -1 -1 Female No Young Large
-1 1 1 1 -1 -1 Male Yes Young Large
1 1 1 1 -1 -1 Female Yes Young Large
-1 -1 -1 -1 1 -1 Male No Adult Medium
1 -1 -1 -1 1 -1 Female No Adult Medium
-1 1 -1 -1 1 -1 Male Yes Adult Medium
1 1 -1 -1 1 -1 Female Yes Adult Medium
-1 -1 1 -1 1 -1 Male No Old Medium
1 -1 1 -1 1 -1 Female No Old Medium
-1 1 1 -1 1 -1 Male Yes Old Medium
1 1 1 -1 1 -1 Female Yes Old Medium
-1 -1 -1 1 1 -1 Male No Old Medium
1 -1 -1 1 1 -1 Female No Old Medium
-1 1 -1 1 1 -1 Male Yes Old Medium
1 1 -1 1 1 -1 Female Yes Old Medium
-1 -1 1 1 1 -1 Male No Young Medium
1 -1 1 1 1 -1 Female No Young Medium
-1 1 1 1 1 -1 Male Yes Young Medium
1 1 1 1 1 -1 Female Yes Young Medium
-1 -1 -1 -1 -1 1 Male No Adult Medium
1 -1 -1 -1 -1 1 Female No Adult Medium
-1 1 -1 -1 -1 1 Male Yes Adult Medium
1 1 -1 -1 -1 1 Female Yes Adult Medium
-1 -1 1 -1 -1 1 Male No Old Medium
1 -1 1 -1 -1 1 Female No Old Medium
-1 1 1 -1 -1 1 Male Yes Old Medium
1 1 1 -1 -1 1 Female Yes Old Medium
-1 -1 -1 1 -1 1 Male No Old Medium
1 -1 -1 1 -1 1 Female No Old Medium
-1 1 -1 1 -1 1 Male Yes Old Medium
1 1 -1 1 -1 1 Female Yes Old Medium
-1 -1 1 1 -1 1 Male No Young Medium
1 -1 1 1 -1 1 Female No Young Medium
-1 1 1 1 -1 1 Male Yes Young Medium
1 1 1 1 -1 1 Female Yes Young Medium
-1 -1 -1 -1 1 1 Male No Adult Small
1 -1 -1 -1 1 1 Female No Adult Small
-1 1 -1 -1 1 1 Male Yes Adult Small
1 1 -1 -1 1 1 Female Yes Adult Small
-1 -1 1 -1 1 1 Male No Old Small
1 -1 1 -1 1 1 Female No Old Small
-1 1 1 -1 1 1 Male Yes Old Small
1 1 1 -1 1 1 Female Yes Old Small
-1 -1 -1 1 1 1 Male No Old Small
1 -1 -1 1 1 1 Female No Old Small
-1 1 -1 1 1 1 Male Yes Old Small
1 1 -1 1 1 1 Female Yes Old Small
-1 -1 1 1 1 1 Male No Young Small
1 -1 1 1 1 1 Female No Young Small
-1 1 1 1 1 1 Male Yes Young Small
1 1 1 1 1 1 Female Yes Young Small

To construct the \(2^{6 - 3}\) design, we follow the following steps. First of all, it is important to note that \(2^{6 - 3}\) design is essentially a design having \(\frac{1}{8}\) number of configurations as compared to \(2^6\) design. Note that with the given set of input variables, it is not possible to achieve Resolution IV and V because for Resolution IV, we would require \(2^{6 - 2}\) i.e. 16 runs and for Resolution V, we would require \(2^{6 - 1}\) i.e. 32 runs. We know that for a Resolution III design, no main effect is aliased with any other main effect, but main effects are aliased with two factor interactions. Now, we can construct our \(2^{6 - 3}_{III}\) comprising of 8 runs as follows.

K <- as.data.frame(FrF2(8,6,randomize = FALSE))
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F

FracDesignMatrix <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

for (i in 1:nrow(FracDesignMatrix))
{
  if (FracDesignMatrix$A[i] == 1)
  {
    FracDesignMatrix$Sex[i] <- "Male"
  }
  else
  {
    FracDesignMatrix$Sex[i] <- "Female"
  }
  
  if (FracDesignMatrix$B[i] == 1)
  {
    FracDesignMatrix$Urban[i] <- "No"
  }
  else
  {
    FracDesignMatrix$Urban[i] <- "Yes"
  }
  
  if (FracDesignMatrix$CD[i] == 1)
  {
    FracDesignMatrix$Age[i] <- "Adult"
  }
  else if (FracDesignMatrix$CD[i] == 2)
  {
    FracDesignMatrix$Age[i] <- "Old"
  }
  else
  {
    FracDesignMatrix$Age[i] <- "Young"
  }
  
  if (FracDesignMatrix$EF[i] == 1)
  {
    FracDesignMatrix$Hhsize[i] <- "Large"
  }
  else if (FracDesignMatrix$EF[i] == 2)
  {
    FracDesignMatrix$Hhsize[i] <- "Medium"
  }
  else
  {
    FracDesignMatrix$Hhsize[i] <- "Small"
  }
}

FracDesign <- cbind(K, FracDesignMatrix$Sex, FracDesignMatrix$Urban, FracDesignMatrix$Age, FracDesignMatrix$Hhsize)

kable(FracDesign, align='c')
A B C D E F FracDesignMatrix$Sex FracDesignMatrix$Urban FracDesignMatrix$Age FracDesignMatrix$Hhsize
-1 -1 -1 1 1 1 Male No Old Small
1 -1 -1 -1 -1 1 Female No Adult Medium
-1 1 -1 -1 1 -1 Male Yes Adult Medium
1 1 -1 1 -1 -1 Female Yes Old Large
-1 -1 1 1 -1 -1 Male No Young Large
1 -1 1 -1 1 -1 Female No Old Medium
-1 1 1 -1 -1 1 Male Yes Old Medium
1 1 1 1 1 1 Female Yes Young Small

We can now construct our \(2^3 = 2^{6 - 3}\) table as follows.

A <- c(-1, 1)
B <- c(-1, 1)
C <- c(-1, 1)
des <- expand.grid(A=A, B=B, C=C)
des <- as.data.frame(des)
des
##    A  B  C
## 1 -1 -1 -1
## 2  1 -1 -1
## 3 -1  1 -1
## 4  1  1 -1
## 5 -1 -1  1
## 6  1 -1  1
## 7 -1  1  1
## 8  1  1  1

Since we have six factors, we need to look up the trade off table for \(2^k\) fractional factorial analysis.

tradeOffTable()

As we can see, for \(2^{6 - 3}_{III}\) fractional factorial design, we can add additional generators as follows.

des$D <- des$A * des$B
des$E <- des$A * des$C
des$F <- des$B * des$C

des
##    A  B  C  D  E  F
## 1 -1 -1 -1  1  1  1
## 2  1 -1 -1 -1 -1  1
## 3 -1  1 -1 -1  1 -1
## 4  1  1 -1  1 -1 -1
## 5 -1 -1  1  1 -1 -1
## 6  1 -1  1 -1  1 -1
## 7 -1  1  1 -1 -1  1
## 8  1  1  1  1  1  1

Now, we have the \(2^{6 - 3}_{III}\) design matrix. We can determine the aliasing structure by using the following commands.

K <- FrF2(8,6,randomize = FALSE)
summary(K)
## Call:
## FrF2(8, 6, randomize = FALSE)
## 
## Experimental design of type  FrF2 
## 8  runs
## 
## Factor settings (scale ends):
##    A  B  C  D  E  F
## 1 -1 -1 -1 -1 -1 -1
## 2  1  1  1  1  1  1
## 
## Design generating information:
## $legend
## [1] A=A B=B C=C D=D E=E F=F
## 
## $generators
## [1] D=AB E=AC F=BC
## 
## 
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD
## 
## 
## The design itself:
##    A  B  C  D  E  F
## 1 -1 -1 -1  1  1  1
## 2  1 -1 -1 -1 -1  1
## 3 -1  1 -1 -1  1 -1
## 4  1  1 -1  1 -1 -1
## 5 -1 -1  1  1 -1 -1
## 6  1 -1  1 -1  1 -1
## 7 -1  1  1 -1 -1  1
## 8  1  1  1  1  1  1
## class=design, type= FrF2

This aliasing structure is in sync with the description of \(2^{6 - 3}_{III}\) fractional factorial designs in Montgomery textbook.

Before we proceed further, let’s summarize the input variables once again. \(A\) represents the gender of household head, which is a 2-level factor. \(B\) represents if the household is situated in urban area or not, which is also a 2-level factor. \(C\) and \(D\) together represent the 3-level factor age. \(E\) and \(F\) together represent the 3-level factor household size.

What is the rationale for this design?

The rationale for this design is to be able to draw as much information using as less number of experimental runs as possible. Furthermore, in order to simulate the experiment, we are using \(2^{6 - 3}_{III}\) design. We wish to draw maximum possible information from the 8 runs.

Randomization, Blocking and Replication

For ANOVA, Randomization must occur at three levels: Random Selection, Random Assignment and Random Execution. Random Selection refers to selecting a random set of sample from the population. Random Assignment refers to assigning those samples to different groups. Random Execution refers to ordering the experimental trials randomly [1]. The data comes from the Vietnam World Bank Livings Standards Survey. There doesn’t seem to be any information pertaining to random selection. However, random assignment and random execution is ensured in the analysis.

Blocking refers to arranging the experimental runs in blocks that are similar to each other [2]. In this experiment, as mentioned above, we would be blocking on the categorical IV sex in order to eliminate any variablity caused by the levels of this factor.

Replication is the repeatition of an experimental condition so that the variablity associated with the phenomenon under study can be estimated [3]. Repeated measures refers to using the same subjects with every branch of research, including the control group [4]. In this dataset, there are several replications of possible configurations of the input IVs. There is no evidence of repeated measures.

3. Statistical Analysis

Exploratory Data Analysis (EDA)

It is important to note that the EDA performed here is only done to obtain a feel for the descriptive statistics of the problem because we are trying to use eight runs for our fractional factorial designs.

summary(df)
##      sex       urban         age          hhsize        lntotal      
##  male  :4375   no :4269   Adult:3497   Large : 233   Min.   : 6.543  
##  female:1624   yes:1730   Old  :1283   Medium:2920   1st Qu.: 8.920  
##                           Young:1219   Small :2846   Median : 9.311  
##                                                      Mean   : 9.342  
##                                                      3rd Qu.: 9.759  
##                                                      Max.   :12.202
str(df)
## 'data.frame':    5999 obs. of  5 variables:
##  $ sex    : Factor w/ 2 levels "male","female": 2 2 1 2 2 2 2 1 1 1 ...
##  $ urban  : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
##  $ age    : Factor w/ 3 levels "Adult","Old",..: 2 1 1 2 2 2 2 1 1 1 ...
##  $ hhsize : Factor w/ 3 levels "Large","Medium",..: 2 2 2 2 2 2 1 3 2 3 ...
##  $ lntotal: num  10.1 10.3 10.9 10.3 10.5 ...
head(df)
##      sex urban   age hhsize  lntotal
## 1 female   yes   Old Medium 10.13649
## 2 female   yes Adult Medium 10.25206
## 3   male   yes Adult Medium 10.93231
## 4 female   yes   Old Medium 10.26749
## 5 female   yes   Old Medium 10.48811
## 6 female   yes   Old Medium 10.52660

Boxplots can be generated for the four input variables as follows.

boxplot(df$lntotal~df$sex, xlab="Sex", ylab="Log of Total Expenditure", main="Boxplot of the IV Sex")

boxplot(df$lntotal~df$urban, xlab="Urban?", ylab="Log of Total Expenditure", main="Boxplot of the IV Urban")

boxplot(df$lntotal~df$age, xlab="Age", ylab="Log of Total Expenditure", main="Boxplot of the IV Age")

boxplot(df$lntotal~df$hhsize, xlab="Household Income", ylab="Log of Total Expenditure", main="Boxplot of the IV Household Size")

Testing

We can form the ordered \(2^{6 - 3}_{III}\) fractional factorial design matrix as follows.

K <- as.data.frame(FrF2(8,6, randomize = FALSE))
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F

Design <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

for (i in 1:nrow(K))
{
  if (Design$A[i] == 1)
  {
    Design$Sex[i] <- "Male"
  }
  else
  {
    Design$Sex[i] <- "Female"
  }
  
  if (Design$B[i] == 1)
  {
    Design$Urban[i] <- "No"
  }
  else
  {
    Design$Urban[i] <- "Yes"
  }
  
  if (Design$CD[i] == 1)
  {
    Design$Age[i] <- "Adult"
  }
  else if (Design$CD[i] == 2)
  {
    Design$Age[i] <- "Old"
  }
  else
  {
    Design$Age[i] <- "Young"
  }
  
  if (Design$EF[i] == 1)
  {
    Design$Hhsize[i] <- "Large"
  }
  else if (Design$EF[i] == 2)
  {
    Design$Hhsize[i] <- "Medium"
  }
  else
  {
    Design$Hhsize[i] <- "Small"
  }
}

dfDesign <- cbind(K, Design$Sex, Design$Urban, Design$Age, Design$Hhsize)
kable(dfDesign, align='c')
A B C D E F Design$Sex Design$Urban Design$Age Design$Hhsize
-1 -1 -1 1 1 1 Male No Old Small
1 -1 -1 -1 -1 1 Female No Adult Medium
-1 1 -1 -1 1 -1 Male Yes Adult Medium
1 1 -1 1 -1 -1 Female Yes Old Large
-1 -1 1 1 -1 -1 Male No Young Large
1 -1 1 -1 1 -1 Female No Old Medium
-1 1 1 -1 -1 1 Male Yes Old Medium
1 1 1 1 1 1 Female Yes Young Small

The code for a randomized design matrix is as follows.

plan.design <- FrF2(8,6, randomize=TRUE)
K <- as.data.frame(plan.design)
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F

Design <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

for (i in 1:nrow(K))
{
  if (Design$A[i] == 1)
  {
    Design$Sex[i] <- "male"
  }
  else
  {
    Design$Sex[i] <- "female"
  }
  
  if (Design$B[i] == 1)
  {
    Design$Urban[i] <- "no"
  }
  else
  {
    Design$Urban[i] <- "yes"
  }
  
  if (Design$CD[i] == 1)
  {
    Design$Age[i] <- "Adult"
  }
  else if (Design$CD[i] == 2)
  {
    Design$Age[i] <- "Old"
  }
  else
  {
    Design$Age[i] <- "Young"
  }
  
  if (Design$EF[i] == 1)
  {
    Design$Hhsize[i] <- "Large"
  }
  else if (Design$EF[i] == 2)
  {
    Design$Hhsize[i] <- "Medium"
  }
  else
  {
    Design$Hhsize[i] <- "Small"
  }
}

dfDesign <- cbind(K, Design$Sex, Design$Urban, Design$Age, Design$Hhsize)
kable(dfDesign, align='c')
A B C D E F Design$Sex Design$Urban Design$Age Design$Hhsize
-1 -1 1 1 -1 -1 male no Young Large
-1 1 -1 -1 1 -1 male yes Adult Medium
1 1 1 1 1 1 female yes Young Small
1 -1 1 -1 1 -1 female no Old Medium
-1 -1 -1 1 1 1 male no Old Small
-1 1 1 -1 -1 1 male yes Old Medium
1 -1 -1 -1 -1 1 female no Adult Medium
1 1 -1 1 -1 -1 female yes Old Large
summary(dfDesign)
##   A      B      C      D      E      F      Design$Sex Design$Urban
##  -1:4   -1:4   -1:4   -1:4   -1:4   -1:4   female:4    no :4       
##  1 :4   1 :4   1 :4   1 :4   1 :4   1 :4   male  :4    yes:4       
##                                                                    
##  Design$Age Design$Hhsize
##  Adult:2    Large :2     
##  Old  :4    Medium:4     
##  Young:2    Small :2
design_with_ref <- unite(dfDesign, refcol, c(7,8,9,10), remove=FALSE)
kable(design_with_ref, align='c')
A B C D E F refcol Design$Sex Design$Urban Design$Age Design$Hhsize
-1 -1 1 1 -1 -1 male_no_Young_Large male no Young Large
-1 1 -1 -1 1 -1 male_yes_Adult_Medium male yes Adult Medium
1 1 1 1 1 1 female_yes_Young_Small female yes Young Small
1 -1 1 -1 1 -1 female_no_Old_Medium female no Old Medium
-1 -1 -1 1 1 1 male_no_Old_Small male no Old Small
-1 1 1 -1 -1 1 male_yes_Old_Medium male yes Old Medium
1 -1 -1 -1 -1 1 female_no_Adult_Medium female no Adult Medium
1 1 -1 1 -1 -1 female_yes_Old_Large female yes Old Large
Design.refcol <- c(dfDesign$refcol)
df <- unite(df, refcol, c(1,2,3,4), remove=FALSE)
head(df)
##                    refcol    sex urban   age hhsize  lntotal
## 1   female_yes_Old_Medium female   yes   Old Medium 10.13649
## 2 female_yes_Adult_Medium female   yes Adult Medium 10.25206
## 3   male_yes_Adult_Medium   male   yes Adult Medium 10.93231
## 4   female_yes_Old_Medium female   yes   Old Medium 10.26749
## 5   female_yes_Old_Medium female   yes   Old Medium 10.48811
## 6   female_yes_Old_Medium female   yes   Old Medium 10.52660
head(design_with_ref)
##    A  B  C  D  E  F                 refcol Design$Sex Design$Urban
## 1 -1 -1  1  1 -1 -1    male_no_Young_Large       male           no
## 2 -1  1 -1 -1  1 -1  male_yes_Adult_Medium       male          yes
## 3  1  1  1  1  1  1 female_yes_Young_Small     female          yes
## 4  1 -1  1 -1  1 -1   female_no_Old_Medium     female           no
## 5 -1 -1 -1  1  1  1      male_no_Old_Small       male           no
## 6 -1  1  1 -1 -1  1    male_yes_Old_Medium       male          yes
##   Design$Age Design$Hhsize
## 1      Young         Large
## 2      Adult        Medium
## 3      Young         Small
## 4        Old        Medium
## 5        Old         Small
## 6        Old        Medium
for (j in 1:nrow(df))
{
  if (design_with_ref$refcol[1] == df$refcol[j])
  {
    df$R1[j] <- df$lntotal[j]
  }
  else
  {
    df$R1[j] <- 0
  }
}

for (j in 1:nrow(df))
{
  if (design_with_ref$refcol[2] == df$refcol[j])
  {
    df$R2[j] <- df$lntotal[j]
  }
  else
  {
    df$R2[j] <- 0
  }
}

for (j in 1:nrow(df))
{
  if (design_with_ref$refcol[3] == df$refcol[j])
  {
    df$R3[j] <- df$lntotal[j]
  }
  else
  {
    df$R3[j] <- 0
  }
}
 
for (j in 1:nrow(df))
{ 
  if (design_with_ref$refcol[4] == df$refcol[j])
  {
    df$R4[j] <- df$lntotal[j]
  }
  else
  {
    df$R4[j] <- 0
  }
}  

for (j in 1:nrow(df))
{
  if (design_with_ref$refcol[5] == df$refcol[j])
  {
    df$R5[j] <- df$lntotal[j]
  }
  else
  {
    df$R5[j] <- 0
  }
}

for (j in 1:nrow(df))
{
  if (design_with_ref$refcol[6] == df$refcol[j])
  {
    df$R6[j] <- df$lntotal[j]
  }
  else
  {
    df$R6[j] <- 0
  }
}

for (j in 1:nrow(df))
{
  if (design_with_ref$refcol[7] == df$refcol[j])
  {
    df$R7[j] <- df$lntotal[j]
  }
  else
  {
    df$R7[j] <- 0
  }
}  

for (j in 1:nrow(df))
{
  if (design_with_ref$refcol[8] == df$refcol[j])
  {
    df$R8[j] <- df$lntotal[j]
  }
  else
  {
    df$R8[j] <- 0
  }
}


R_1 <- sample.rows(subset(df, R1 > 0), 1)
R_2 <- sample.rows(subset(df, R2 > 0), 1)
R_3 <- sample.rows(subset(df, R3 > 0), 1)
R_4 <- sample.rows(subset(df, R4 > 0), 1)
R_5 <- sample.rows(subset(df, R5 > 0), 1)
R_6 <- sample.rows(subset(df, R6 > 0), 1)
R_7 <- sample.rows(subset(df, R7 > 0), 1)
R_8 <- sample.rows(subset(df, R8 > 0), 1)

RV <- c(R_1$R1, R_2$R2, R_3$R3, R_4$R4, R_5$R5, R_6$R6, R_7$R7, R_8$R8)

RV
## [1]  9.536133  9.683457  9.905867  9.441408  9.153513  9.816130  9.547972
## [8] 10.229960

The above eight response variable values can be added to our design.

plan.response <- add.response(plan.design, RV)
summary(plan.response)
## Call:
## FrF2(8, 6, randomize = TRUE)
## 
## Experimental design of type  FrF2 
## 8  runs
## 
## Factor settings (scale ends):
##    A  B  C  D  E  F
## 1 -1 -1 -1 -1 -1 -1
## 2  1  1  1  1  1  1
## 
## Responses:
## [1] RV
## 
## Design generating information:
## $legend
## [1] A=A B=B C=C D=D E=E F=F
## 
## $generators
## [1] D=AB E=AC F=BC
## 
## 
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD
## 
## 
## The design itself:
##    A  B  C  D  E  F        RV
## 1 -1 -1  1  1 -1 -1  9.536133
## 2 -1  1 -1 -1  1 -1  9.683457
## 3  1  1  1  1  1  1  9.905867
## 4  1 -1  1 -1  1 -1  9.441408
## 5 -1 -1 -1  1  1  1  9.153513
## 6 -1  1  1 -1 -1  1  9.816130
## 7  1 -1 -1 -1 -1  1  9.547972
## 8  1  1 -1  1 -1 -1 10.229960
## class=design, type= FrF2

Now that we have our design matrix along with the response variable values ready, we can begin the analysis.

summary(lm(plan.response))
## Number of observations used: 8 
## Formula:
## RV ~ (A + B + C + D + E + F)^2
## 
## Call:
## lm.default(formula = fo, data = model.frame(fo, data = formula))
## 
## Residuals:
## ALL 8 residuals are 0: no residual degrees of freedom!
## 
## Coefficients: (14 not defined because of singularities)
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)  9.664305         NA      NA       NA
## A1           0.116997         NA      NA       NA
## B1           0.244549         NA      NA       NA
## C1           0.010579         NA      NA       NA
## D1           0.042063         NA      NA       NA
## E1          -0.118244         NA      NA       NA
## F1          -0.058434         NA      NA       NA
## A1:F1        0.004052         NA      NA       NA
## 
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared:      1,  Adjusted R-squared:    NaN 
## F-statistic:   NaN on 7 and 0 DF,  p-value: NA

To compute the main effects, we can use the following command.

MEPlot(plan.response, abbrev=5, cex.xax=1.6, cex.main=2)

The interaction effects can be computed as follows.

IAPlot(plan.response, abbrev=5, show.alias=TRUE, lwd=2, cex=2, cex.xax=1.2, cex.lab=1.5)

According to the above plots, the main effects of \(B\), \(C\), \(D\) and \(F\) seem significant. These correspond to the urban (i.e. whether house is in urban area or not), age (i.e. age of head of the household) and hhsize (i.e. size of the household). It seems that the urban households have a higher total expenditure. Furthermore, age of the household owner negatively impacts the household expenses. Also, the larger the household, the higher is the expenditure (note that the hhsize was encoded as Large, Medium and Small respectively). Finally, it doesn’t look like the sex of the head of the household affects the total household expenditure - but we need to quantify the main effect through ANOVA.

Thus, we can formulate our null hypothesis as follows: The total household expenditure is not affected by sex, urban, age and household size.

AOVObject = aov(df$lntotal~df$sex+df$urban+df$age+df$hhsize, data = df)
anova(AOVObject)
## Analysis of Variance Table
## 
## Response: df$lntotal
##             Df  Sum Sq Mean Sq  F value    Pr(>F)    
## df$sex       1   18.87   18.87   62.744 2.785e-15 ***
## df$urban     1  635.53  635.53 2113.101 < 2.2e-16 ***
## df$age       2  113.70   56.85  189.030 < 2.2e-16 ***
## df$hhsize    2  266.78  133.39  443.519 < 2.2e-16 ***
## Residuals 5992 1802.13    0.30                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It is apparent from ANOVA reults that we can safely discard the null hypothesis - all the main effects are statistically significant.

Model Adequacy Checking

First of all, we can plot the histogram of the response variable

hist(df$lntotal, breaks=10, main="Histogram of logarithm of total household expenditure")

The distribution seems normal from the histogram.

qqnorm(residuals(AOVObject))
qqline(residuals(AOVObject))

plot(fitted(AOVObject), residuals(AOVObject))

The QQ plot and residuals vs fitted plots indeed confirm that the data was normally distributed and randomly selected.

References