From the Ecdat package, VietNamH dataset was used for this project. This dataset has a total of 5999 observations describing the household expenses of households in Vietnam. The dataset can be loaded as follows:
#install.packages("Ecdat")
library("Ecdat")
## Loading required package: Ecfun
##
## Attaching package: 'Ecfun'
## The following object is masked from 'package:base':
##
## sign
##
## Attaching package: 'Ecdat'
## The following object is masked from 'package:datasets':
##
## Orange
library(pid)
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 3.3.2
## Loading required package: png
## Loading required package: FrF2
## Loading required package: DoE.base
## Loading required package: grid
## Loading required package: conf.design
##
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
##
## aov, lm
## The following object is masked from 'package:graphics':
##
## plot.design
## The following object is masked from 'package:base':
##
## lengths
library(FrF2)
library(knitr)
library(tidyr)
library(kimisc)
df <- VietNamH
As described above, the VietNamH dataset has data about the household expenses of households in Vietnam. Specifically, we have the following variables in the data set:
sex: Gender of household head (male, female)age: Age of household headeducyr: Schooling year of household headfarm: yes if farm householdurban: yes if urban householdhhsize: Size of the householdlntotal: Natural logarighm of total expenditure of the householdlnmed: Natural logarithm of medical expenditure of the householdlnrlfood: Natural logarithm of food expenditure of the householdlnexp12m: Natural logarithm of total household health care expenditure for 12 monthscommune: CommuneThis dataset has been taken from the Vietnam World Bank Livings Standards Survey.
head(df)
## sex age educyr farm urban hhsize lntotal lnmed lnrlfood
## 1 female 68 4 no yes 6 10.13649 11.233210 8.639339
## 2 female 57 8 no yes 6 10.25206 8.505120 9.345752
## 3 male 42 14 no yes 6 10.93231 8.713418 10.226330
## 4 female 72 9 no yes 6 10.26749 9.291736 9.263722
## 5 female 73 1 no yes 8 10.48811 7.555382 9.592890
## 6 female 66 13 no yes 7 10.52660 9.789702 9.372034
## lnexp12m commune
## 1 11.233210 1
## 2 8.505120 1
## 3 8.713418 1
## 4 9.291736 1
## 5 7.555382 1
## 6 9.789702 1
First of all, we must keep only those variables that will be needed in our analysis. For our analysis, we have two 2-level IVs (sex and urban) and two 3-level IVs (age and hhsize). The response variable is lntotal. Thus, our modified dataset becomes:
df = df[c(1,5,2,6,7)]
for (i in 1:nrow(df))
{
if (df$age[i] <= 35)
{
df$age_factor[i] <- "Young"
}
else if (df$age[i] <= 60)
{
df$age_factor[i] <- "Adult"
}
else
{
df$age_factor[i] <- "Old"
}
if (df$hhsize[i] <= 4)
{
df$hhsize_factor[i] <- "Small"
}
else if (df$hhsize[i] <= 8)
{
df$hhsize_factor[i] <- "Medium"
}
else
{
df$hhsize_factor[i] <- "Large"
}
}
df$sex <- as.factor(df$sex)
df$urban <- as.factor(df$urban)
df$age <- as.factor(df$age_factor)
df$hhsize <- as.factor(df$hhsize_factor)
df <- df[-c(6,7)]
head(df)
## sex urban age hhsize lntotal
## 1 female yes Old Medium 10.13649
## 2 female yes Adult Medium 10.25206
## 3 male yes Adult Medium 10.93231
## 4 female yes Old Medium 10.26749
## 5 female yes Old Medium 10.48811
## 6 female yes Old Medium 10.52660
str(df)
## 'data.frame': 5999 obs. of 5 variables:
## $ sex : Factor w/ 2 levels "male","female": 2 2 1 2 2 2 2 1 1 1 ...
## $ urban : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ age : Factor w/ 3 levels "Adult","Old",..: 2 1 1 2 2 2 2 1 1 1 ...
## $ hhsize : Factor w/ 3 levels "Large","Medium",..: 2 2 2 2 2 2 1 3 2 3 ...
## $ lntotal: num 10.1 10.3 10.9 10.3 10.5 ...
summary(df)
## sex urban age hhsize lntotal
## male :4375 no :4269 Adult:3497 Large : 233 Min. : 6.543
## female:1624 yes:1730 Old :1283 Medium:2920 1st Qu.: 8.920
## Young:1219 Small :2846 Median : 9.311
## Mean : 9.342
## 3rd Qu.: 9.759
## Max. :12.202
As can be seen above, we have two 2-level IVs (sex and urban) and two 3-level IVs (age and hhsize).
The response variable in our study (lntotal) is the only continuous variable. We converted the continuous factors into categorical variables.
If we were to use \(2^2 \cdot 3^2\) design, we would end up with 36 different configurations of input variables. However we plan to do a \(2^k\) design by converting the 3-level variables into two 2-level variables. As a result the mixed design that we currently have would be converted to a \(2^k\) design with \(k = 6\).
The 3-level factors can be converted into two 2-level factors by assigning \((-1, -1), (-1, +1), (+1, -1), (+1, +1)\) to \(small, medium, medium\) and \(large\) respectively.
We can now begin the process of modifying the current dataset. The first 3-level variable (age) can be converted as follows.
C <- c(-1, 1, -1, 1)
D <- c(-1, -1, 1, 1)
X <- c("Young", "Adult", "Adult", "Old")
FactX <- as.data.frame(cbind(C,D,X))
kable(FactX, align='c')
| C | D | X |
|---|---|---|
| -1 | -1 | Young |
| 1 | -1 | Adult |
| -1 | 1 | Adult |
| 1 | 1 | Old |
The second 3-level variable (hhsize) can be converted to two 2-level variables as follows.
E <- C
F <- D
Y <- c("Small", "Medium", "Medium", "Large")
FactY <- as.data.frame(cbind(E, F, Y))
kable(FactY, align='c')
| E | F | Y |
|---|---|---|
| -1 | -1 | Small |
| 1 | -1 | Medium |
| -1 | 1 | Medium |
| 1 | 1 | Large |
Now that we have converted the 3-level factors into two 2-level factors each, we can construct a \(2^6\) design as follows.
K <- as.data.frame(FrF2(64, 6, randomize = FALSE))
## creating full factorial with 64 runs ...
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F
FullDesignMatrix <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
for (i in 1:nrow(FullDesignMatrix))
{
if (FullDesignMatrix$A[i] == 1)
{
FullDesignMatrix$Sex[i] <- "Male"
}
else
{
FullDesignMatrix$Sex[i] <- "Female"
}
if (FullDesignMatrix$B[i] == 1)
{
FullDesignMatrix$Urban[i] <- "No"
}
else
{
FullDesignMatrix$Urban[i] <- "Yes"
}
if (FullDesignMatrix$CD[i] == 1)
{
FullDesignMatrix$Age[i] <- "Adult"
}
else if (FullDesignMatrix$CD[i] == 2)
{
FullDesignMatrix$Age[i] <- "Old"
}
else
{
FullDesignMatrix$Age[i] <- "Young"
}
if (FullDesignMatrix$EF[i] == 1)
{
FullDesignMatrix$Hhsize[i] <- "Large"
}
else if (FullDesignMatrix$EF[i] == 2)
{
FullDesignMatrix$Hhsize[i] <- "Medium"
}
else
{
FullDesignMatrix$Hhsize[i] <- "Small"
}
}
FullDesign <- cbind(K, FullDesignMatrix$Sex, FullDesignMatrix$Urban, FullDesignMatrix$Age, FullDesignMatrix$Hhsize)
kable(FullDesign, align='c')
| A | B | C | D | E | F | FullDesignMatrix$Sex | FullDesignMatrix$Urban | FullDesignMatrix$Age | FullDesignMatrix$Hhsize |
|---|---|---|---|---|---|---|---|---|---|
| -1 | -1 | -1 | -1 | -1 | -1 | Male | No | Adult | Large |
| 1 | -1 | -1 | -1 | -1 | -1 | Female | No | Adult | Large |
| -1 | 1 | -1 | -1 | -1 | -1 | Male | Yes | Adult | Large |
| 1 | 1 | -1 | -1 | -1 | -1 | Female | Yes | Adult | Large |
| -1 | -1 | 1 | -1 | -1 | -1 | Male | No | Old | Large |
| 1 | -1 | 1 | -1 | -1 | -1 | Female | No | Old | Large |
| -1 | 1 | 1 | -1 | -1 | -1 | Male | Yes | Old | Large |
| 1 | 1 | 1 | -1 | -1 | -1 | Female | Yes | Old | Large |
| -1 | -1 | -1 | 1 | -1 | -1 | Male | No | Old | Large |
| 1 | -1 | -1 | 1 | -1 | -1 | Female | No | Old | Large |
| -1 | 1 | -1 | 1 | -1 | -1 | Male | Yes | Old | Large |
| 1 | 1 | -1 | 1 | -1 | -1 | Female | Yes | Old | Large |
| -1 | -1 | 1 | 1 | -1 | -1 | Male | No | Young | Large |
| 1 | -1 | 1 | 1 | -1 | -1 | Female | No | Young | Large |
| -1 | 1 | 1 | 1 | -1 | -1 | Male | Yes | Young | Large |
| 1 | 1 | 1 | 1 | -1 | -1 | Female | Yes | Young | Large |
| -1 | -1 | -1 | -1 | 1 | -1 | Male | No | Adult | Medium |
| 1 | -1 | -1 | -1 | 1 | -1 | Female | No | Adult | Medium |
| -1 | 1 | -1 | -1 | 1 | -1 | Male | Yes | Adult | Medium |
| 1 | 1 | -1 | -1 | 1 | -1 | Female | Yes | Adult | Medium |
| -1 | -1 | 1 | -1 | 1 | -1 | Male | No | Old | Medium |
| 1 | -1 | 1 | -1 | 1 | -1 | Female | No | Old | Medium |
| -1 | 1 | 1 | -1 | 1 | -1 | Male | Yes | Old | Medium |
| 1 | 1 | 1 | -1 | 1 | -1 | Female | Yes | Old | Medium |
| -1 | -1 | -1 | 1 | 1 | -1 | Male | No | Old | Medium |
| 1 | -1 | -1 | 1 | 1 | -1 | Female | No | Old | Medium |
| -1 | 1 | -1 | 1 | 1 | -1 | Male | Yes | Old | Medium |
| 1 | 1 | -1 | 1 | 1 | -1 | Female | Yes | Old | Medium |
| -1 | -1 | 1 | 1 | 1 | -1 | Male | No | Young | Medium |
| 1 | -1 | 1 | 1 | 1 | -1 | Female | No | Young | Medium |
| -1 | 1 | 1 | 1 | 1 | -1 | Male | Yes | Young | Medium |
| 1 | 1 | 1 | 1 | 1 | -1 | Female | Yes | Young | Medium |
| -1 | -1 | -1 | -1 | -1 | 1 | Male | No | Adult | Medium |
| 1 | -1 | -1 | -1 | -1 | 1 | Female | No | Adult | Medium |
| -1 | 1 | -1 | -1 | -1 | 1 | Male | Yes | Adult | Medium |
| 1 | 1 | -1 | -1 | -1 | 1 | Female | Yes | Adult | Medium |
| -1 | -1 | 1 | -1 | -1 | 1 | Male | No | Old | Medium |
| 1 | -1 | 1 | -1 | -1 | 1 | Female | No | Old | Medium |
| -1 | 1 | 1 | -1 | -1 | 1 | Male | Yes | Old | Medium |
| 1 | 1 | 1 | -1 | -1 | 1 | Female | Yes | Old | Medium |
| -1 | -1 | -1 | 1 | -1 | 1 | Male | No | Old | Medium |
| 1 | -1 | -1 | 1 | -1 | 1 | Female | No | Old | Medium |
| -1 | 1 | -1 | 1 | -1 | 1 | Male | Yes | Old | Medium |
| 1 | 1 | -1 | 1 | -1 | 1 | Female | Yes | Old | Medium |
| -1 | -1 | 1 | 1 | -1 | 1 | Male | No | Young | Medium |
| 1 | -1 | 1 | 1 | -1 | 1 | Female | No | Young | Medium |
| -1 | 1 | 1 | 1 | -1 | 1 | Male | Yes | Young | Medium |
| 1 | 1 | 1 | 1 | -1 | 1 | Female | Yes | Young | Medium |
| -1 | -1 | -1 | -1 | 1 | 1 | Male | No | Adult | Small |
| 1 | -1 | -1 | -1 | 1 | 1 | Female | No | Adult | Small |
| -1 | 1 | -1 | -1 | 1 | 1 | Male | Yes | Adult | Small |
| 1 | 1 | -1 | -1 | 1 | 1 | Female | Yes | Adult | Small |
| -1 | -1 | 1 | -1 | 1 | 1 | Male | No | Old | Small |
| 1 | -1 | 1 | -1 | 1 | 1 | Female | No | Old | Small |
| -1 | 1 | 1 | -1 | 1 | 1 | Male | Yes | Old | Small |
| 1 | 1 | 1 | -1 | 1 | 1 | Female | Yes | Old | Small |
| -1 | -1 | -1 | 1 | 1 | 1 | Male | No | Old | Small |
| 1 | -1 | -1 | 1 | 1 | 1 | Female | No | Old | Small |
| -1 | 1 | -1 | 1 | 1 | 1 | Male | Yes | Old | Small |
| 1 | 1 | -1 | 1 | 1 | 1 | Female | Yes | Old | Small |
| -1 | -1 | 1 | 1 | 1 | 1 | Male | No | Young | Small |
| 1 | -1 | 1 | 1 | 1 | 1 | Female | No | Young | Small |
| -1 | 1 | 1 | 1 | 1 | 1 | Male | Yes | Young | Small |
| 1 | 1 | 1 | 1 | 1 | 1 | Female | Yes | Young | Small |
To construct the \(2^{6 - 3}\) design, we follow the following steps. First of all, it is important to note that \(2^{6 - 3}\) design is essentially a design having \(\frac{1}{8}\) number of configurations as compared to \(2^6\) design. Note that with the given set of input variables, it is not possible to achieve Resolution IV and V because for Resolution IV, we would require \(2^{6 - 2}\) i.e. 16 runs and for Resolution V, we would require \(2^{6 - 1}\) i.e. 32 runs. We know that for a Resolution III design, no main effect is aliased with any other main effect, but main effects are aliased with two factor interactions. Now, we can construct our \(2^{6 - 3}_{III}\) comprising of 8 runs as follows.
K <- as.data.frame(FrF2(8,6,randomize = FALSE))
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F
FracDesignMatrix <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
for (i in 1:nrow(FracDesignMatrix))
{
if (FracDesignMatrix$A[i] == 1)
{
FracDesignMatrix$Sex[i] <- "Male"
}
else
{
FracDesignMatrix$Sex[i] <- "Female"
}
if (FracDesignMatrix$B[i] == 1)
{
FracDesignMatrix$Urban[i] <- "No"
}
else
{
FracDesignMatrix$Urban[i] <- "Yes"
}
if (FracDesignMatrix$CD[i] == 1)
{
FracDesignMatrix$Age[i] <- "Adult"
}
else if (FracDesignMatrix$CD[i] == 2)
{
FracDesignMatrix$Age[i] <- "Old"
}
else
{
FracDesignMatrix$Age[i] <- "Young"
}
if (FracDesignMatrix$EF[i] == 1)
{
FracDesignMatrix$Hhsize[i] <- "Large"
}
else if (FracDesignMatrix$EF[i] == 2)
{
FracDesignMatrix$Hhsize[i] <- "Medium"
}
else
{
FracDesignMatrix$Hhsize[i] <- "Small"
}
}
FracDesign <- cbind(K, FracDesignMatrix$Sex, FracDesignMatrix$Urban, FracDesignMatrix$Age, FracDesignMatrix$Hhsize)
kable(FracDesign, align='c')
| A | B | C | D | E | F | FracDesignMatrix$Sex | FracDesignMatrix$Urban | FracDesignMatrix$Age | FracDesignMatrix$Hhsize |
|---|---|---|---|---|---|---|---|---|---|
| -1 | -1 | -1 | 1 | 1 | 1 | Male | No | Old | Small |
| 1 | -1 | -1 | -1 | -1 | 1 | Female | No | Adult | Medium |
| -1 | 1 | -1 | -1 | 1 | -1 | Male | Yes | Adult | Medium |
| 1 | 1 | -1 | 1 | -1 | -1 | Female | Yes | Old | Large |
| -1 | -1 | 1 | 1 | -1 | -1 | Male | No | Young | Large |
| 1 | -1 | 1 | -1 | 1 | -1 | Female | No | Old | Medium |
| -1 | 1 | 1 | -1 | -1 | 1 | Male | Yes | Old | Medium |
| 1 | 1 | 1 | 1 | 1 | 1 | Female | Yes | Young | Small |
We can now construct our \(2^3 = 2^{6 - 3}\) table as follows.
A <- c(-1, 1)
B <- c(-1, 1)
C <- c(-1, 1)
des <- expand.grid(A=A, B=B, C=C)
des <- as.data.frame(des)
des
## A B C
## 1 -1 -1 -1
## 2 1 -1 -1
## 3 -1 1 -1
## 4 1 1 -1
## 5 -1 -1 1
## 6 1 -1 1
## 7 -1 1 1
## 8 1 1 1
Since we have six factors, we need to look up the trade off table for \(2^k\) fractional factorial analysis.
tradeOffTable()
As we can see, for \(2^{6 - 3}_{III}\) fractional factorial design, we can add additional generators as follows.
des$D <- des$A * des$B
des$E <- des$A * des$C
des$F <- des$B * des$C
des
## A B C D E F
## 1 -1 -1 -1 1 1 1
## 2 1 -1 -1 -1 -1 1
## 3 -1 1 -1 -1 1 -1
## 4 1 1 -1 1 -1 -1
## 5 -1 -1 1 1 -1 -1
## 6 1 -1 1 -1 1 -1
## 7 -1 1 1 -1 -1 1
## 8 1 1 1 1 1 1
Now, we have the \(2^{6 - 3}_{III}\) design matrix. We can determine the aliasing structure by using the following commands.
K <- FrF2(8,6,randomize = FALSE)
summary(K)
## Call:
## FrF2(8, 6, randomize = FALSE)
##
## Experimental design of type FrF2
## 8 runs
##
## Factor settings (scale ends):
## A B C D E F
## 1 -1 -1 -1 -1 -1 -1
## 2 1 1 1 1 1 1
##
## Design generating information:
## $legend
## [1] A=A B=B C=C D=D E=E F=F
##
## $generators
## [1] D=AB E=AC F=BC
##
##
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
##
## $fi2
## [1] AF=BE=CD
##
##
## The design itself:
## A B C D E F
## 1 -1 -1 -1 1 1 1
## 2 1 -1 -1 -1 -1 1
## 3 -1 1 -1 -1 1 -1
## 4 1 1 -1 1 -1 -1
## 5 -1 -1 1 1 -1 -1
## 6 1 -1 1 -1 1 -1
## 7 -1 1 1 -1 -1 1
## 8 1 1 1 1 1 1
## class=design, type= FrF2
This aliasing structure is in sync with the description of \(2^{6 - 3}_{III}\) fractional factorial designs in Montgomery textbook.
Before we proceed further, let’s summarize the input variables once again. \(A\) represents the gender of household head, which is a 2-level factor. \(B\) represents if the household is situated in urban area or not, which is also a 2-level factor. \(C\) and \(D\) together represent the 3-level factor age. \(E\) and \(F\) together represent the 3-level factor household size.
The rationale for this design is to be able to draw as much information using as less number of experimental runs as possible. Furthermore, in order to simulate the experiment, we are using \(2^{6 - 3}_{III}\) design. We wish to draw maximum possible information from the 8 runs.
For ANOVA, Randomization must occur at three levels: Random Selection, Random Assignment and Random Execution. Random Selection refers to selecting a random set of sample from the population. Random Assignment refers to assigning those samples to different groups. Random Execution refers to ordering the experimental trials randomly [1]. The data comes from the Vietnam World Bank Livings Standards Survey. There doesn’t seem to be any information pertaining to random selection. However, random assignment and random execution is ensured in the analysis.
Blocking refers to arranging the experimental runs in blocks that are similar to each other [2]. In this experiment, as mentioned above, we would be blocking on the categorical IV sex in order to eliminate any variablity caused by the levels of this factor.
Replication is the repeatition of an experimental condition so that the variablity associated with the phenomenon under study can be estimated [3]. Repeated measures refers to using the same subjects with every branch of research, including the control group [4]. In this dataset, there are several replications of possible configurations of the input IVs. There is no evidence of repeated measures.
It is important to note that the EDA performed here is only done to obtain a feel for the descriptive statistics of the problem because we are trying to use eight runs for our fractional factorial designs.
summary(df)
## sex urban age hhsize lntotal
## male :4375 no :4269 Adult:3497 Large : 233 Min. : 6.543
## female:1624 yes:1730 Old :1283 Medium:2920 1st Qu.: 8.920
## Young:1219 Small :2846 Median : 9.311
## Mean : 9.342
## 3rd Qu.: 9.759
## Max. :12.202
str(df)
## 'data.frame': 5999 obs. of 5 variables:
## $ sex : Factor w/ 2 levels "male","female": 2 2 1 2 2 2 2 1 1 1 ...
## $ urban : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
## $ age : Factor w/ 3 levels "Adult","Old",..: 2 1 1 2 2 2 2 1 1 1 ...
## $ hhsize : Factor w/ 3 levels "Large","Medium",..: 2 2 2 2 2 2 1 3 2 3 ...
## $ lntotal: num 10.1 10.3 10.9 10.3 10.5 ...
head(df)
## sex urban age hhsize lntotal
## 1 female yes Old Medium 10.13649
## 2 female yes Adult Medium 10.25206
## 3 male yes Adult Medium 10.93231
## 4 female yes Old Medium 10.26749
## 5 female yes Old Medium 10.48811
## 6 female yes Old Medium 10.52660
Boxplots can be generated for the four input variables as follows.
boxplot(df$lntotal~df$sex, xlab="Sex", ylab="Log of Total Expenditure", main="Boxplot of the IV Sex")
boxplot(df$lntotal~df$urban, xlab="Urban?", ylab="Log of Total Expenditure", main="Boxplot of the IV Urban")
boxplot(df$lntotal~df$age, xlab="Age", ylab="Log of Total Expenditure", main="Boxplot of the IV Age")
boxplot(df$lntotal~df$hhsize, xlab="Household Income", ylab="Log of Total Expenditure", main="Boxplot of the IV Household Size")
We can form the ordered \(2^{6 - 3}_{III}\) fractional factorial design matrix as follows.
K <- as.data.frame(FrF2(8,6, randomize = FALSE))
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F
Design <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
for (i in 1:nrow(K))
{
if (Design$A[i] == 1)
{
Design$Sex[i] <- "Male"
}
else
{
Design$Sex[i] <- "Female"
}
if (Design$B[i] == 1)
{
Design$Urban[i] <- "No"
}
else
{
Design$Urban[i] <- "Yes"
}
if (Design$CD[i] == 1)
{
Design$Age[i] <- "Adult"
}
else if (Design$CD[i] == 2)
{
Design$Age[i] <- "Old"
}
else
{
Design$Age[i] <- "Young"
}
if (Design$EF[i] == 1)
{
Design$Hhsize[i] <- "Large"
}
else if (Design$EF[i] == 2)
{
Design$Hhsize[i] <- "Medium"
}
else
{
Design$Hhsize[i] <- "Small"
}
}
dfDesign <- cbind(K, Design$Sex, Design$Urban, Design$Age, Design$Hhsize)
kable(dfDesign, align='c')
| A | B | C | D | E | F | Design$Sex | Design$Urban | Design$Age | Design$Hhsize |
|---|---|---|---|---|---|---|---|---|---|
| -1 | -1 | -1 | 1 | 1 | 1 | Male | No | Old | Small |
| 1 | -1 | -1 | -1 | -1 | 1 | Female | No | Adult | Medium |
| -1 | 1 | -1 | -1 | 1 | -1 | Male | Yes | Adult | Medium |
| 1 | 1 | -1 | 1 | -1 | -1 | Female | Yes | Old | Large |
| -1 | -1 | 1 | 1 | -1 | -1 | Male | No | Young | Large |
| 1 | -1 | 1 | -1 | 1 | -1 | Female | No | Old | Medium |
| -1 | 1 | 1 | -1 | -1 | 1 | Male | Yes | Old | Medium |
| 1 | 1 | 1 | 1 | 1 | 1 | Female | Yes | Young | Small |
The code for a randomized design matrix is as follows.
plan.design <- FrF2(8,6, randomize=TRUE)
K <- as.data.frame(plan.design)
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F
Design <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
for (i in 1:nrow(K))
{
if (Design$A[i] == 1)
{
Design$Sex[i] <- "male"
}
else
{
Design$Sex[i] <- "female"
}
if (Design$B[i] == 1)
{
Design$Urban[i] <- "no"
}
else
{
Design$Urban[i] <- "yes"
}
if (Design$CD[i] == 1)
{
Design$Age[i] <- "Adult"
}
else if (Design$CD[i] == 2)
{
Design$Age[i] <- "Old"
}
else
{
Design$Age[i] <- "Young"
}
if (Design$EF[i] == 1)
{
Design$Hhsize[i] <- "Large"
}
else if (Design$EF[i] == 2)
{
Design$Hhsize[i] <- "Medium"
}
else
{
Design$Hhsize[i] <- "Small"
}
}
dfDesign <- cbind(K, Design$Sex, Design$Urban, Design$Age, Design$Hhsize)
kable(dfDesign, align='c')
| A | B | C | D | E | F | Design$Sex | Design$Urban | Design$Age | Design$Hhsize |
|---|---|---|---|---|---|---|---|---|---|
| -1 | -1 | 1 | 1 | -1 | -1 | male | no | Young | Large |
| -1 | 1 | -1 | -1 | 1 | -1 | male | yes | Adult | Medium |
| 1 | 1 | 1 | 1 | 1 | 1 | female | yes | Young | Small |
| 1 | -1 | 1 | -1 | 1 | -1 | female | no | Old | Medium |
| -1 | -1 | -1 | 1 | 1 | 1 | male | no | Old | Small |
| -1 | 1 | 1 | -1 | -1 | 1 | male | yes | Old | Medium |
| 1 | -1 | -1 | -1 | -1 | 1 | female | no | Adult | Medium |
| 1 | 1 | -1 | 1 | -1 | -1 | female | yes | Old | Large |
summary(dfDesign)
## A B C D E F Design$Sex Design$Urban
## -1:4 -1:4 -1:4 -1:4 -1:4 -1:4 female:4 no :4
## 1 :4 1 :4 1 :4 1 :4 1 :4 1 :4 male :4 yes:4
##
## Design$Age Design$Hhsize
## Adult:2 Large :2
## Old :4 Medium:4
## Young:2 Small :2
design_with_ref <- unite(dfDesign, refcol, c(7,8,9,10), remove=FALSE)
kable(design_with_ref, align='c')
| A | B | C | D | E | F | refcol | Design$Sex | Design$Urban | Design$Age | Design$Hhsize |
|---|---|---|---|---|---|---|---|---|---|---|
| -1 | -1 | 1 | 1 | -1 | -1 | male_no_Young_Large | male | no | Young | Large |
| -1 | 1 | -1 | -1 | 1 | -1 | male_yes_Adult_Medium | male | yes | Adult | Medium |
| 1 | 1 | 1 | 1 | 1 | 1 | female_yes_Young_Small | female | yes | Young | Small |
| 1 | -1 | 1 | -1 | 1 | -1 | female_no_Old_Medium | female | no | Old | Medium |
| -1 | -1 | -1 | 1 | 1 | 1 | male_no_Old_Small | male | no | Old | Small |
| -1 | 1 | 1 | -1 | -1 | 1 | male_yes_Old_Medium | male | yes | Old | Medium |
| 1 | -1 | -1 | -1 | -1 | 1 | female_no_Adult_Medium | female | no | Adult | Medium |
| 1 | 1 | -1 | 1 | -1 | -1 | female_yes_Old_Large | female | yes | Old | Large |
Design.refcol <- c(dfDesign$refcol)
df <- unite(df, refcol, c(1,2,3,4), remove=FALSE)
head(df)
## refcol sex urban age hhsize lntotal
## 1 female_yes_Old_Medium female yes Old Medium 10.13649
## 2 female_yes_Adult_Medium female yes Adult Medium 10.25206
## 3 male_yes_Adult_Medium male yes Adult Medium 10.93231
## 4 female_yes_Old_Medium female yes Old Medium 10.26749
## 5 female_yes_Old_Medium female yes Old Medium 10.48811
## 6 female_yes_Old_Medium female yes Old Medium 10.52660
head(design_with_ref)
## A B C D E F refcol Design$Sex Design$Urban
## 1 -1 -1 1 1 -1 -1 male_no_Young_Large male no
## 2 -1 1 -1 -1 1 -1 male_yes_Adult_Medium male yes
## 3 1 1 1 1 1 1 female_yes_Young_Small female yes
## 4 1 -1 1 -1 1 -1 female_no_Old_Medium female no
## 5 -1 -1 -1 1 1 1 male_no_Old_Small male no
## 6 -1 1 1 -1 -1 1 male_yes_Old_Medium male yes
## Design$Age Design$Hhsize
## 1 Young Large
## 2 Adult Medium
## 3 Young Small
## 4 Old Medium
## 5 Old Small
## 6 Old Medium
for (j in 1:nrow(df))
{
if (design_with_ref$refcol[1] == df$refcol[j])
{
df$R1[j] <- df$lntotal[j]
}
else
{
df$R1[j] <- 0
}
}
for (j in 1:nrow(df))
{
if (design_with_ref$refcol[2] == df$refcol[j])
{
df$R2[j] <- df$lntotal[j]
}
else
{
df$R2[j] <- 0
}
}
for (j in 1:nrow(df))
{
if (design_with_ref$refcol[3] == df$refcol[j])
{
df$R3[j] <- df$lntotal[j]
}
else
{
df$R3[j] <- 0
}
}
for (j in 1:nrow(df))
{
if (design_with_ref$refcol[4] == df$refcol[j])
{
df$R4[j] <- df$lntotal[j]
}
else
{
df$R4[j] <- 0
}
}
for (j in 1:nrow(df))
{
if (design_with_ref$refcol[5] == df$refcol[j])
{
df$R5[j] <- df$lntotal[j]
}
else
{
df$R5[j] <- 0
}
}
for (j in 1:nrow(df))
{
if (design_with_ref$refcol[6] == df$refcol[j])
{
df$R6[j] <- df$lntotal[j]
}
else
{
df$R6[j] <- 0
}
}
for (j in 1:nrow(df))
{
if (design_with_ref$refcol[7] == df$refcol[j])
{
df$R7[j] <- df$lntotal[j]
}
else
{
df$R7[j] <- 0
}
}
for (j in 1:nrow(df))
{
if (design_with_ref$refcol[8] == df$refcol[j])
{
df$R8[j] <- df$lntotal[j]
}
else
{
df$R8[j] <- 0
}
}
R_1 <- sample.rows(subset(df, R1 > 0), 1)
R_2 <- sample.rows(subset(df, R2 > 0), 1)
R_3 <- sample.rows(subset(df, R3 > 0), 1)
R_4 <- sample.rows(subset(df, R4 > 0), 1)
R_5 <- sample.rows(subset(df, R5 > 0), 1)
R_6 <- sample.rows(subset(df, R6 > 0), 1)
R_7 <- sample.rows(subset(df, R7 > 0), 1)
R_8 <- sample.rows(subset(df, R8 > 0), 1)
RV <- c(R_1$R1, R_2$R2, R_3$R3, R_4$R4, R_5$R5, R_6$R6, R_7$R7, R_8$R8)
RV
## [1] 9.536133 9.683457 9.905867 9.441408 9.153513 9.816130 9.547972
## [8] 10.229960
The above eight response variable values can be added to our design.
plan.response <- add.response(plan.design, RV)
summary(plan.response)
## Call:
## FrF2(8, 6, randomize = TRUE)
##
## Experimental design of type FrF2
## 8 runs
##
## Factor settings (scale ends):
## A B C D E F
## 1 -1 -1 -1 -1 -1 -1
## 2 1 1 1 1 1 1
##
## Responses:
## [1] RV
##
## Design generating information:
## $legend
## [1] A=A B=B C=C D=D E=E F=F
##
## $generators
## [1] D=AB E=AC F=BC
##
##
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
##
## $fi2
## [1] AF=BE=CD
##
##
## The design itself:
## A B C D E F RV
## 1 -1 -1 1 1 -1 -1 9.536133
## 2 -1 1 -1 -1 1 -1 9.683457
## 3 1 1 1 1 1 1 9.905867
## 4 1 -1 1 -1 1 -1 9.441408
## 5 -1 -1 -1 1 1 1 9.153513
## 6 -1 1 1 -1 -1 1 9.816130
## 7 1 -1 -1 -1 -1 1 9.547972
## 8 1 1 -1 1 -1 -1 10.229960
## class=design, type= FrF2
Now that we have our design matrix along with the response variable values ready, we can begin the analysis.
summary(lm(plan.response))
## Number of observations used: 8
## Formula:
## RV ~ (A + B + C + D + E + F)^2
##
## Call:
## lm.default(formula = fo, data = model.frame(fo, data = formula))
##
## Residuals:
## ALL 8 residuals are 0: no residual degrees of freedom!
##
## Coefficients: (14 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.664305 NA NA NA
## A1 0.116997 NA NA NA
## B1 0.244549 NA NA NA
## C1 0.010579 NA NA NA
## D1 0.042063 NA NA NA
## E1 -0.118244 NA NA NA
## F1 -0.058434 NA NA NA
## A1:F1 0.004052 NA NA NA
##
## Residual standard error: NaN on 0 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: NaN
## F-statistic: NaN on 7 and 0 DF, p-value: NA
To compute the main effects, we can use the following command.
MEPlot(plan.response, abbrev=5, cex.xax=1.6, cex.main=2)
The interaction effects can be computed as follows.
IAPlot(plan.response, abbrev=5, show.alias=TRUE, lwd=2, cex=2, cex.xax=1.2, cex.lab=1.5)
According to the above plots, the main effects of \(B\), \(C\), \(D\) and \(F\) seem significant. These correspond to the urban (i.e. whether house is in urban area or not), age (i.e. age of head of the household) and hhsize (i.e. size of the household). It seems that the urban households have a higher total expenditure. Furthermore, age of the household owner negatively impacts the household expenses. Also, the larger the household, the higher is the expenditure (note that the hhsize was encoded as Large, Medium and Small respectively). Finally, it doesn’t look like the sex of the head of the household affects the total household expenditure - but we need to quantify the main effect through ANOVA.
Thus, we can formulate our null hypothesis as follows: The total household expenditure is not affected by sex, urban, age and household size.
AOVObject = aov(df$lntotal~df$sex+df$urban+df$age+df$hhsize, data = df)
anova(AOVObject)
## Analysis of Variance Table
##
## Response: df$lntotal
## Df Sum Sq Mean Sq F value Pr(>F)
## df$sex 1 18.87 18.87 62.744 2.785e-15 ***
## df$urban 1 635.53 635.53 2113.101 < 2.2e-16 ***
## df$age 2 113.70 56.85 189.030 < 2.2e-16 ***
## df$hhsize 2 266.78 133.39 443.519 < 2.2e-16 ***
## Residuals 5992 1802.13 0.30
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
It is apparent from ANOVA reults that we can safely discard the null hypothesis - all the main effects are statistically significant.
First of all, we can plot the histogram of the response variable
hist(df$lntotal, breaks=10, main="Histogram of logarithm of total household expenditure")
The distribution seems normal from the histogram.
qqnorm(residuals(AOVObject))
qqline(residuals(AOVObject))
plot(fitted(AOVObject), residuals(AOVObject))
The QQ plot and residuals vs fitted plots indeed confirm that the data was normally distributed and randomly selected.