Project

Fractional Factorial Designs

Kaan UNNU

RPI

12.04.2016 V.1

1. Setting

After examining the available data sets in Ecdat, Housing data set has been selected.

Installation of the necessary packages and selection of the data-set is shown below:

#install.packages("pid")
#install.packages("Ecdat")
library(pid)
library(Ecdat)
library(FrF2)
library(knitr)
library(tidyr)
library(kimisc)

Warning: package 'kimisc' was built under R version 3.3.2

Prj <- Housing

System under test

Housing data set was collected for Fractional Factorial Design Project and consist of 12 variables and 546 observations. It represents Sales Prices of Houses in the City of Windsor in 1987 and the main source of the data is Journal of Applied Econometrics.

Definitions of the variables in data set are as below:

price : sale price of a house
lotsize : the lot size of a property in square feet
bedrooms : number of bedrooms
bathrms : number of full bathrooms
stories : number of stories excluding basement
driveway : does the house has a driveway ?
recroom : does the house has a recreational room ?
fullbase : does the house has a full finished basement ?
gashw : does the house uses gas for hot water heating ?
airco : does the house has central air conditioning ?
garagepl : number of garage places
prefarea : is the house located in the preferred neighborhood of the city ?

head(Prj)

  price lotsize bedrooms bathrms stories driveway recroom fullbase gashw
1 42000    5850        3       1       2      yes      no      yes    no
2 38500    4000        2       1       1      yes      no       no    no
3 49500    3060        3       1       1      yes      no       no    no
4 60500    6650        3       1       2      yes     yes       no    no
5 61000    6360        2       1       1      yes      no       no    no
6 66000    4160        3       1       1      yes     yes      yes    no
  airco garagepl prefarea
1    no        1       no
2    no        0       no
3    no        0       no
4    no        0       no
5    no        0       no
6   yes        0       no

Factors and Levels

Factors and levels of each factor are listed as below:

str(Prj)

'data.frame':   546 obs. of  12 variables:
 $ price   : num  42000 38500 49500 60500 61000 66000 66000 69000 83800 88500 ...
 $ lotsize : num  5850 4000 3060 6650 6360 4160 3880 4160 4800 5500 ...
 $ bedrooms: num  3 2 3 3 2 3 3 3 3 3 ...
 $ bathrms : num  1 1 1 1 1 1 2 1 1 2 ...
 $ stories : num  2 1 1 2 1 1 2 3 1 4 ...
 $ driveway: Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
 $ recroom : Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 2 2 ...
 $ fullbase: Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 1 ...
 $ gashw   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ airco   : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 1 1 2 ...
 $ garagepl: num  1 0 0 0 0 0 2 0 0 1 ...
 $ prefarea: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...

However, for our project we should select two 2-level factors and two 3-level factors in the given data and re-organize some of the variables according to this constrain.

2-level factors selected are:

fullbase : does the house has a full finished basement ? > already a 2 level factor (yes / no)
prefarea : is the house located in the preferred neighborhood of the city ? > already a 2 level factor (yes / no)

3-level factors selected and needs categorization are:

lotsize : the lot size of a property in square feet > numeric variable we will group it in three levels. Smaller than 3600 sqfeet will be defined as “Low”, 3600-6360 range will be defined as “Med” and finally lotsize bigger than 6360 as “High”.
bedrooms : number of bedrooms > numeric variable we will group it in three levels similar to the lotsize factor definitions will be 1-2 bedrooms as “Low” , 3 bedrooms “Med” and anyone bigger will be “High”.

Our new data set is now :

for (i in 1:546){
if(Prj$lotsize[i] <= 3600){Prj$lotsize2[i] <- "Low"}
if(Prj$lotsize[i] > 3600 & Prj$lotsize[i] <= 6360){Prj$lotsize2[i] <- "Med"}
if(Prj$lotsize[i] > 6360){Prj$lotsize2[i] <- "High"}

if(Prj$bedrooms[i] <= 2){Prj$bedrooms2[i] <- "Low"}
if(Prj$bedrooms[i] == 3){Prj$bedrooms2[i] <- "Med"}
if(Prj$bedrooms[i] > 3){Prj$bedrooms2[i] <- "High"}
} 

Prj$lotsize2 <- as.factor(Prj$lotsize2)
Prj$lotsize2 <- factor(Prj$lotsize2, levels= c("Low","Med","High"))
Prj$bedrooms2 <- as.factor(Prj$bedrooms2)
Prj$bedrooms2 <- factor(Prj$bedrooms2, levels= c("Low","Med","High"))

Prj <- Prj[,c(1,8,12,13,14)]
Prj.Orj <- Prj
str(Prj)

'data.frame':   546 obs. of  5 variables:
 $ price    : num  42000 38500 49500 60500 61000 66000 66000 69000 83800 88500 ...
 $ fullbase : Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 1 ...
 $ prefarea : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ lotsize2 : Factor w/ 3 levels "Low","Med","High": 2 2 1 3 2 2 2 2 2 2 ...
 $ bedrooms2: Factor w/ 3 levels "Low","Med","High": 2 1 2 2 1 2 2 2 2 2 ...

Continuous variables & Response variables

In the data set we have one continuous variable which is also our Response Variable.

price : Sale price of a house

2. (Experimental) Design

How will the experiment be organized and conducted to test the hypothesis?

By using the given factors 2 two-levels and 2 three-levels for a full factorial analysis normally we should use 36 measurements. However we are going to implement a ( $2^{k}$ ), so our mixed level design should be converted to a ( $2^{6}$ ) design.

According to the Montgomery : “Designs in which some factors have two levels and other factors have three levels can be derived from the table of plus and minus signs for the usual ( $2^{k}$ ) design,” Since we have 2 three-level factors, we need 4 two-level factors for constructing our design as represented in below tables:

Three level factor 1 (X) :

C <- c(-1,1,-1,1)
D <- c(-1,-1,1,1)
X <- c("Low","Med","Med","High")
Factor_X <- as.data.frame(cbind(C,D,X))
kable(Factor_X,align = 'c')

C	D	X
-1	-1	Low
1	-1	Med
-1	1	Med
1	1	High

Three level factor 2 (Y) :

E <- c(-1,1,-1,1)
F <- c(-1,-1,1,1)
Y <- c("Low","Med","Med","High")
Factor_Y <- as.data.frame(cbind(E,F,Y))
kable(Factor_Y,align = 'c')

E	F	Y
-1	-1	Low
1	-1	Med
-1	1	Med
1	1	High

If we combine these new factors without existing two level factors a full factorial design ( $2^{6}$ ) for 6 factors with 64 experimental runs can be represented with below table. Also actual treatment combinations are shown on the right side of the table. This given table is ordered and should be randomized during experiment which we will show in the randomization section.

K <- as.data.frame(FrF2(64,6, randomize = FALSE))

creating full factorial with 64 runs ...

A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F
Full <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(K)
for (i in 1:n){
  if (Full$A[i] == 1){Full$Treatment_A[i] <- "Low"}
  if (Full$A[i] == 2){Full$Treatment_A[i] <- "High"}
 
  if (Full$B[i] == 1){Full$Treatment_B[i] <- "Low"}
  if (Full$B[i] == 2){Full$Treatment_B[i] <- "High"}
  
  if (Full$CD[i] == 1){Full$Treatment_X[i] <- "Low"}
  if (Full$CD[i] == 2){Full$Treatment_X[i] <- "Med"}
  if (Full$CD[i] == 4){Full$Treatment_X[i] <- "High"}
  
  if (Full$EF[i] == 1){Full$Treatment_Y[i] <- "Low"}
  if (Full$EF[i] == 2){Full$Treatment_Y[i] <- "Med"}
  if (Full$EF[i] == 4){Full$Treatment_Y[i] <- "High"}
}
Full_Factorial <- cbind(K,Full$Treatment_A,Full$Treatment_B,Full$Treatment_X,Full$Treatment_Y)
kable(Full_Factorial, align = 'c')

A	B	C	D	E	F	Full$Treatment_A	Full$Treatment_B	Full$Treatment_X	Full$Treatment_Y
-1	-1	-1	-1	-1	-1	Low	Low	Low	Low
1	-1	-1	-1	-1	-1	High	Low	Low	Low
-1	1	-1	-1	-1	-1	Low	High	Low	Low
1	1	-1	-1	-1	-1	High	High	Low	Low
-1	-1	1	-1	-1	-1	Low	Low	Med	Low
1	-1	1	-1	-1	-1	High	Low	Med	Low
-1	1	1	-1	-1	-1	Low	High	Med	Low
1	1	1	-1	-1	-1	High	High	Med	Low
-1	-1	-1	1	-1	-1	Low	Low	Med	Low
1	-1	-1	1	-1	-1	High	Low	Med	Low
-1	1	-1	1	-1	-1	Low	High	Med	Low
1	1	-1	1	-1	-1	High	High	Med	Low
-1	-1	1	1	-1	-1	Low	Low	High	Low
1	-1	1	1	-1	-1	High	Low	High	Low
-1	1	1	1	-1	-1	Low	High	High	Low
1	1	1	1	-1	-1	High	High	High	Low
-1	-1	-1	-1	1	-1	Low	Low	Low	Med
1	-1	-1	-1	1	-1	High	Low	Low	Med
-1	1	-1	-1	1	-1	Low	High	Low	Med
1	1	-1	-1	1	-1	High	High	Low	Med
-1	-1	1	-1	1	-1	Low	Low	Med	Med
1	-1	1	-1	1	-1	High	Low	Med	Med
-1	1	1	-1	1	-1	Low	High	Med	Med
1	1	1	-1	1	-1	High	High	Med	Med
-1	-1	-1	1	1	-1	Low	Low	Med	Med
1	-1	-1	1	1	-1	High	Low	Med	Med
-1	1	-1	1	1	-1	Low	High	Med	Med
1	1	-1	1	1	-1	High	High	Med	Med
-1	-1	1	1	1	-1	Low	Low	High	Med
1	-1	1	1	1	-1	High	Low	High	Med
-1	1	1	1	1	-1	Low	High	High	Med
1	1	1	1	1	-1	High	High	High	Med
-1	-1	-1	-1	-1	1	Low	Low	Low	Med
1	-1	-1	-1	-1	1	High	Low	Low	Med
-1	1	-1	-1	-1	1	Low	High	Low	Med
1	1	-1	-1	-1	1	High	High	Low	Med
-1	-1	1	-1	-1	1	Low	Low	Med	Med
1	-1	1	-1	-1	1	High	Low	Med	Med
-1	1	1	-1	-1	1	Low	High	Med	Med
1	1	1	-1	-1	1	High	High	Med	Med
-1	-1	-1	1	-1	1	Low	Low	Med	Med
1	-1	-1	1	-1	1	High	Low	Med	Med
-1	1	-1	1	-1	1	Low	High	Med	Med
1	1	-1	1	-1	1	High	High	Med	Med
-1	-1	1	1	-1	1	Low	Low	High	Med
1	-1	1	1	-1	1	High	Low	High	Med
-1	1	1	1	-1	1	Low	High	High	Med
1	1	1	1	-1	1	High	High	High	Med
-1	-1	-1	-1	1	1	Low	Low	Low	High
1	-1	-1	-1	1	1	High	Low	Low	High
-1	1	-1	-1	1	1	Low	High	Low	High
1	1	-1	-1	1	1	High	High	Low	High
-1	-1	1	-1	1	1	Low	Low	Med	High
1	-1	1	-1	1	1	High	Low	Med	High
-1	1	1	-1	1	1	Low	High	Med	High
1	1	1	-1	1	1	High	High	Med	High
-1	-1	-1	1	1	1	Low	Low	Med	High
1	-1	-1	1	1	1	High	Low	Med	High
-1	1	-1	1	1	1	Low	High	Med	High
1	1	-1	1	1	1	High	High	Med	High
-1	-1	1	1	1	1	Low	Low	High	High
1	-1	1	1	1	1	High	Low	High	High
-1	1	1	1	1	1	Low	High	High	High
1	1	1	1	1	1	High	High	High	High

Additionally, for Project-3 a $2^{6-3}$ fractional factorial design was requested. This is a 1/8 fraction of $2^{6}$ design with a Resolution III. Unfortunately, with the given design parameters, it is not possible to construct a higher resolution. For higher resolutions like Resolution IV we need 16 runs ( $2^{6-2}$ ) and for Resolution VI we need 32 runs ( $2^{6-1}$ ).

Alias effect of this resolution can be defined as (cited from our course presentations) >> Resolution III : “No main effect is aliased with any other main effect (me). Main effects are aliased with two factor interactions (2fi).”

If we construct our experiment as a fractional factorial design for $2^{6-3}_{III}$ as 1/8 fraction of 6 factors in 8 experimental runs will be:

K <- as.data.frame(FrF2(8,6, randomize = FALSE))
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F
Frac <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(K)
for (i in 1:n){
  if (Frac$A[i] == 1){Frac$Treatment_A[i] <- "Low"}
  if (Frac$A[i] == 2){Frac$Treatment_A[i] <- "High"}
 
  if (Frac$B[i] == 1){Frac$Treatment_B[i] <- "Low"}
  if (Frac$B[i] == 2){Frac$Treatment_B[i] <- "High"}
  
  if (Frac$CD[i] == 1){Frac$Treatment_X[i] <- "Low"}
  if (Frac$CD[i] == 2){Frac$Treatment_X[i] <- "Med"}
  if (Frac$CD[i] == 4){Frac$Treatment_X[i] <- "High"}
  
  if (Frac$EF[i] == 1){Frac$Treatment_Y[i] <- "Low"}
  if (Frac$EF[i] == 2){Frac$Treatment_Y[i] <- "Med"}
  if (Frac$EF[i] == 4){Frac$Treatment_Y[i] <- "High"}
}
Fractional_Factorial <- cbind(K,Frac$Treatment_A,Frac$Treatment_B,Frac$Treatment_X,Frac$Treatment_Y)
kable(Fractional_Factorial, align = 'c')

A	B	C	D	E	F	Frac$Treatment_A	Frac$Treatment_B	Frac$Treatment_X	Frac$Treatment_Y
-1	-1	-1	1	1	1	Low	Low	Med	High
1	-1	-1	-1	-1	1	High	Low	Low	Med
-1	1	-1	-1	1	-1	Low	High	Low	Med
1	1	-1	1	-1	-1	High	High	Med	Low
-1	-1	1	1	-1	-1	Low	Low	High	Low
1	-1	1	-1	1	-1	High	Low	Med	Med
-1	1	1	-1	-1	1	Low	High	Med	Med
1	1	1	1	1	1	High	High	High	High
### IS	YE 602	0 How	to det	ermine	the g	enerator I

Since we know that we are going to have 8 experiments we should start to construct our table as $2^{3}$ :

A <- B <- C <- c(-1, +1)
gen <- expand.grid(A=A, B=B, C=C)
gen <- as.data.frame(gen)
gen

In this initial plan we have 3 factors and 8 experiments, but, we should increase our factor number to 6. From pid package we can check the trade off table that represents $2^{k}$ fractional factorial analysis:

tradeOffTable()

Design generators for $2^{6-3}$ are D=AB, E=AC, F=BC, and if we include them to our design:

gen$D <- gen$A*gen$B
gen$E <- gen$A*gen$C
gen$F <- gen$B*gen$C
gen

   A  B  C  D  E  F
1 -1 -1 -1  1  1  1
2  1 -1 -1 -1 -1  1
3 -1  1 -1 -1  1 -1
4  1  1 -1  1 -1 -1
5 -1 -1  1  1 -1 -1
6  1 -1  1 -1  1 -1
7 -1  1  1 -1 -1  1
8  1  1  1  1  1  1

We mathematically know that if we multiply the factor by itself it will generate the intercept:

gen$A*gen$A

[1] 1 1 1 1 1 1 1 1

gen$B*gen$B

[1] 1 1 1 1 1 1 1 1

gen$C*gen$C

[1] 1 1 1 1 1 1 1 1

gen$D*gen$D

[1] 1 1 1 1 1 1 1 1

gen$E*gen$E

[1] 1 1 1 1 1 1 1 1

gen$F*gen$F

[1] 1 1 1 1 1 1 1 1

If we make this calculation with the given generators by multiplying both sides of the equation: * DD=ABD can be transformed to I=ABD * EE=ACE can be transformed to I=ACE * FF=BCF can be transformed to I=ACE

Also we can find aliases by this method: D=AB >>> multiply by A >>> AD=AAB >>> AD=B D=AB >>> multiply by C >>> CD=ABC

We can control B=AD as below:

gen$A*gen$D

[1] -1 -1  1  1 -1 -1  1  1

gen$B

[1] -1 -1  1  1 -1 -1  1  1

If we continue this method all 2 and 3 way interactions, finally we will find below equation for Intercept (I): I=ABD=ACE=BCDE=BCF=ACDF=ABEF=DEF

Also all these relationships for $2^{6-3}_{III}$ Fractional Factorial Designs can be represented as below :

Montgomery, Douglas C.. Design and Analysis of Experiments, 8th Edition (Page 707). Wiley. Kindle Edition.

And additionally with FrF2 package same alias relations can be found :

K <- FrF2(8,6, randomize = FALSE)
summary(K)

Call:
FrF2(8, 6, randomize = FALSE)

Experimental design of type  FrF2 
8  runs

Factor settings (scale ends):
   A  B  C  D  E  F
1 -1 -1 -1 -1 -1 -1
2  1  1  1  1  1  1

Design generating information:
$legend
[1] A=A B=B C=C D=D E=E F=F

$generators
[1] D=AB E=AC F=BC


Alias structure:
$main
[1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE

$fi2
[1] AF=BE=CD


The design itself:
   A  B  C  D  E  F
1 -1 -1 -1  1  1  1
2  1 -1 -1 -1 -1  1
3 -1  1 -1 -1  1 -1
4  1  1 -1  1 -1 -1
5 -1 -1  1  1 -1 -1
6  1 -1  1 -1  1 -1
7 -1  1  1 -1 -1  1
8  1  1  1  1  1  1
class=design, type= FrF2

During the test section We will not change these aliasing structures and we will use our data set with this structure :

A represents Two level factor
B represents Two level factor
c & D represents Three level factor
E & F represents Three level factor

Next steps of our experiment will follow below outline:

First general descriptive statistics will be studied for all given data
Data will be organized according to $2^{6-3}_{III}$ Fractional Factorial Design
8 random observations will be selected (matching with the design)
A model will be constructed with main effects
This model is going to be tested with ANOVA

What is the rationale for this design?

Data set has been retrieved from Ecdat, and factors had been already chosen and we have more than 500 observations. However in this project I assume we are simulating the very first start of an experimental research and we will try to gather as much as possible information with only 8 runs.

Randomization,Blocking and Replication

Replication, local control(blocking) and randomization are fundamentals of a designing an experiment.Replication and blocking increases the precision in the experiment and randomization helps reducing bias.

Randomize: What is the Randomization Scheme?

Randomization is possible with below three conditions:

Random selection : Selection of your sample set from the population.
Random assignment : Assigning the samples to different groups or treatments
Random execution : Order of the experiments

If all these are realized in a random behavior we can assume the analyze is randomized. In our analyze we do not know the initial collection of the data but we will : * Randomly order our 8 experiments and * Randomly assign our matching measurements form our complete set to the experiment

Replicate: Are there replicates and/or repeated measures?

In this dataset we are not going to use a replication. Because for such case we would need 16 experiments (2x8). On the other hand with 16 experiments a $2^{6-2}$ design would be constructed. Since we are limited with 8 observations I assume the cost, duration or other resources are limited.

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

Since we are going to use only 8 treatments in our fractional factorial analysis, these initial descriptive analysis will only be for our interest and will not affect our further analysis . Normally the researcher would not have these data beforehand. However, for comparing our findings, factors can be represented with below boxplots and also our response variable histogram will follow:

summary(Prj)

     price        fullbase  prefarea  lotsize2   bedrooms2 
 Min.   : 25000   no :355   no :418   Low :144   Low :138  
 1st Qu.: 49125   yes:191   yes:128   Med :267   Med :301  
 Median : 62000                       High:135   High:107  
 Mean   : 68122                                            
 3rd Qu.: 82000                                            
 Max.   :190000

str(Prj)

'data.frame':   546 obs. of  5 variables:
 $ price    : num  42000 38500 49500 60500 61000 66000 66000 69000 83800 88500 ...
 $ fullbase : Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 1 ...
 $ prefarea : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
 $ lotsize2 : Factor w/ 3 levels "Low","Med","High": 2 2 1 3 2 2 2 2 2 2 ...
 $ bedrooms2: Factor w/ 3 levels "Low","Med","High": 2 1 2 2 1 2 2 2 2 2 ...

head(Prj)

  price fullbase prefarea lotsize2 bedrooms2
1 42000      yes       no      Med       Med
2 38500       no       no      Med       Low
3 49500       no       no      Low       Med
4 60500       no       no     High       Med
5 61000       no       no      Med       Low
6 66000      yes       no      Med       Med

boxplot(Prj$price~Prj$lotsize2, xlab="Lot Size", ylab="House Price")
title("Lot Size(sqf)")
means1 <- by(Prj$price, Prj$lotsize2, mean)
points(1:3, means1, pch = 23, cex = 2, bg = "red")
text(1:3 - 0.1, means1,labels = format(means1, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")

boxplot(Prj$price~Prj$bedrooms2, xlab="Bedrooms", ylab="House Price")
title("Quantitiy Of Bedrooms")
means1 <- by(Prj$price, Prj$bedrooms2, mean)
points(1:3, means1, pch = 23, cex = 2, bg = "red")
text(1:3 - 0.1, means1,labels = format(means1, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")

boxplot(Prj$price~Prj$fullbase, xlab="Full Basement", ylab="House Price")
title("Full Basement")
means1 <- by(Prj$price, Prj$fullbase, mean)
points(1:2, means1, pch = 23, cex = 2, bg = "red")
text(1:2 - 0.1, means1,labels = format(means1, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")

boxplot(Prj$price~Prj$prefarea, xlab="Preferred Area", ylab="House Price")
title("Preferred Area")
means1 <- by(Prj$price, Prj$prefarea, mean)
points(1:2, means1, pch = 23, cex = 2, bg = "red")
text(1:2 - 0.1, means1,labels = format(means1, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")

Testing

Ordered $2^{6-3}_{III}$ Fractional Factorial Design:

K <- as.data.frame(FrF2(8,6, randomize = FALSE))
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F
Plan <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(K)
for (i in 1:n){
  if (Plan$A[i] == 1){Plan$fullbase[i] <- "no"}
  if (Plan$A[i] == 2){Plan$fullbase[i] <- "yes"}
 
  if (Plan$B[i] == 1){Plan$prefarea[i] <- "no"}
  if (Plan$B[i] == 2){Plan$prefarea[i] <- "yes"}
  
  if (Plan$CD[i] == 1){Plan$lotsize2[i] <- "Low"}
  if (Plan$CD[i] == 2){Plan$lotsize2[i] <- "Med"}
  if (Plan$CD[i] == 4){Plan$lotsize2[i] <- "High"}
  
  if (Plan$EF[i] == 1){Plan$bedrooms2[i] <- "Low"}
  if (Plan$EF[i] == 2){Plan$bedrooms2[i] <- "Med"}
  if (Plan$EF[i] == 4){Plan$bedrooms2[i] <- "High"}
}
Prjplan <- cbind(K,Plan$fullbase,Plan$prefarea,Plan$lotsize2,Plan$bedrooms2)
kable(Prjplan, align = 'c')

A	B	C	D	E	F	Plan$fullbase	Plan$prefarea	Plan$lotsize2	Plan$bedrooms2
-1	-1	-1	1	1	1	no	no	Med	High
1	-1	-1	-1	-1	1	yes	no	Low	Med
-1	1	-1	-1	1	-1	no	yes	Low	Med
1	1	-1	1	-1	-1	yes	yes	Med	Low
-1	-1	1	1	-1	-1	no	no	High	Low
1	-1	1	-1	1	-1	yes	no	Med	Med
-1	1	1	-1	-1	1	no	yes	Med	Med
1	1	1	1	1	1	yes	yes	High	High

Randomized

Randomized $2^{6-3}_{III}$ Fractional Factorial Design :

plan.design <- FrF2(8,6, randomize = TRUE)
K <- as.data.frame(plan.design)
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F
Plan <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(K)
for (i in 1:n){
  if (Plan$A[i] == 1){Plan$fullbase[i] <- "no"}
  if (Plan$A[i] == 2){Plan$fullbase[i] <- "yes"}
 
  if (Plan$B[i] == 1){Plan$prefarea[i] <- "no"}
  if (Plan$B[i] == 2){Plan$prefarea[i] <- "yes"}
  
  if (Plan$CD[i] == 1){Plan$lotsize2[i] <- "Low"}
  if (Plan$CD[i] == 2){Plan$lotsize2[i] <- "Med"}
  if (Plan$CD[i] == 4){Plan$lotsize2[i] <- "High"}
  
  if (Plan$EF[i] == 1){Plan$bedrooms2[i] <- "Low"}
  if (Plan$EF[i] == 2){Plan$bedrooms2[i] <- "Med"}
  if (Plan$EF[i] == 4){Plan$bedrooms2[i] <- "High"}
}
Prjplan <- cbind(K,Plan$fullbase,Plan$prefarea,Plan$lotsize2,Plan$bedrooms2)
summary(Prjplan)

  A      B      C      D      E      F     Plan$fullbase Plan$prefarea
 -1:4   -1:4   -1:4   -1:4   -1:4   -1:4   no :4         no :4        
 1 :4   1 :4   1 :4   1 :4   1 :4   1 :4   yes:4         yes:4        
                                                                      
 Plan$lotsize2 Plan$bedrooms2
 High:2        High:2        
 Low :2        Low :2        
 Med :4        Med :4

kable(Prjplan, align = 'c')

A	B	C	D	E	F	Plan$fullbase	Plan$prefarea	Plan$lotsize2	Plan$bedrooms2
1	-1	1	-1	1	-1	yes	no	Med	Med
1	-1	-1	-1	-1	1	yes	no	Low	Med
1	1	-1	1	-1	-1	yes	yes	Med	Low
-1	-1	-1	1	1	1	no	no	Med	High
-1	-1	1	1	-1	-1	no	no	High	Low
-1	1	-1	-1	1	-1	no	yes	Low	Med
-1	1	1	-1	-1	1	no	yes	Med	Med
1	1	1	1	1	1	yes	yes	High	High

We will create reference columns both for our complete data and 8 experiments. These refcol variables are going to be used for matching experiments and data set and randomly assign a result to the experiment run.

Prjplan2<-unite(Prjplan, refcol , c(7,8,9,10), remove=FALSE)
kable(Prjplan2, align = 'c')

A	B	C	D	E	F	refcol	Plan$fullbase	Plan$prefarea	Plan$lotsize2	Plan$bedrooms2
1	-1	1	-1	1	-1	yes_no_Med_Med	yes	no	Med	Med
1	-1	-1	-1	-1	1	yes_no_Low_Med	yes	no	Low	Med
1	1	-1	1	-1	-1	yes_yes_Med_Low	yes	yes	Med	Low
-1	-1	-1	1	1	1	no_no_Med_High	no	no	Med	High
-1	-1	1	1	-1	-1	no_no_High_Low	no	no	High	Low
-1	1	-1	-1	1	-1	no_yes_Low_Med	no	yes	Low	Med
-1	1	1	-1	-1	1	no_yes_Med_Med	no	yes	Med	Med
1	1	1	1	1	1	yes_yes_High_High	yes	yes	High	High

Plan.refcol <- c(Prjplan2$refcol)


Prj<-unite(Prj, refcol , c(2,3,4,5), remove=FALSE)
head(Prj)

  price         refcol fullbase prefarea lotsize2 bedrooms2
1 42000 yes_no_Med_Med      yes       no      Med       Med
2 38500  no_no_Med_Low       no       no      Med       Low
3 49500  no_no_Low_Med       no       no      Low       Med
4 60500 no_no_High_Med       no       no     High       Med
5 61000  no_no_Med_Low       no       no      Med       Low
6 66000 yes_no_Med_Med      yes       no      Med       Med

tail(Prj)

     price         refcol fullbase prefarea lotsize2 bedrooms2
541  85000 no_no_High_Med       no       no     High       Med
542  91500  no_no_Med_Med       no       no      Med       Med
543  94000  no_no_Med_Med       no       no      Med       Med
544 103000  no_no_Med_Med       no       no      Med       Med
545 105000  no_no_Med_Med       no       no      Med       Med
546 105000  no_no_Med_Med       no       no      Med       Med

for (j in 1:546){if (Prjplan2$refcol[1] == Prj$refcol[j]){Prj$RO1[j] <- Prj$price[j]}else{Prj$RO1[j] <- 0}}
for (j in 1:546){if (Prjplan2$refcol[2] == Prj$refcol[j]){Prj$RO2[j] <- Prj$price[j]}else{Prj$RO2[j] <- 0}}
for (j in 1:546){if (Prjplan2$refcol[3] == Prj$refcol[j]){Prj$RO3[j] <- Prj$price[j]}else{Prj$RO3[j] <- 0}}
for (j in 1:546){if (Prjplan2$refcol[4] == Prj$refcol[j]){Prj$RO4[j] <- Prj$price[j]}else{Prj$RO4[j] <- 0}}
for (j in 1:546){if (Prjplan2$refcol[5] == Prj$refcol[j]){Prj$RO5[j] <- Prj$price[j]}else{Prj$RO5[j] <- 0}}
for (j in 1:546){if (Prjplan2$refcol[6] == Prj$refcol[j]){Prj$RO6[j] <- Prj$price[j]}else{Prj$RO6[j] <- 0}}
for (j in 1:546){if (Prjplan2$refcol[7] == Prj$refcol[j]){Prj$RO7[j] <- Prj$price[j]}else{Prj$RO7[j] <- 0}}
for (j in 1:546){if (Prjplan2$refcol[8] == Prj$refcol[j]){Prj$RO8[j] <- Prj$price[j]}else{Prj$RO8[j] <- 0}}
R1 <- sample.rows(subset(Prj, RO1 > 0), 1)
R2 <- sample.rows(subset(Prj, RO2 > 0), 1)
R3 <- sample.rows(subset(Prj, RO3 > 0), 1)
R4 <- sample.rows(subset(Prj, RO4 > 0), 1)
R5 <- sample.rows(subset(Prj, RO5 > 0), 1)
R6 <- sample.rows(subset(Prj, RO6 > 0), 1)
R7 <- sample.rows(subset(Prj, RO7 > 0), 1)
R8 <- sample.rows(subset(Prj, RO8 > 0), 1)

RV <- c(R1$RO1,R2$RO2,R3$RO3,R4$RO4,R5$RO5,R6$RO6,R7$RO7,R8$RO8)

Our response variables randomly selected for our experiment are:

RV

[1]  66000  35000  57000  48000  45000  52000  78000 174500

We will add these Response Variables to our design:

plan.resp <- add.response(plan.design, RV)
summary(plan.resp)

Call:
FrF2(8, 6, randomize = TRUE)

Experimental design of type  FrF2 
8  runs

Factor settings (scale ends):
   A  B  C  D  E  F
1 -1 -1 -1 -1 -1 -1
2  1  1  1  1  1  1

Responses:
[1] RV

Design generating information:
$legend
[1] A=A B=B C=C D=D E=E F=F

$generators
[1] D=AB E=AC F=BC


Alias structure:
$main
[1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE

$fi2
[1] AF=BE=CD


The design itself:
   A  B  C  D  E  F     RV
1  1 -1  1 -1  1 -1  66000
2  1 -1 -1 -1 -1  1  35000
3  1  1 -1  1 -1 -1  57000
4 -1 -1 -1  1  1  1  48000
5 -1 -1  1  1 -1 -1  45000
6 -1  1 -1 -1  1 -1  52000
7 -1  1  1 -1 -1  1  78000
8  1  1  1  1  1  1 174500
class=design, type= FrF2

Now we can analyze our $2^{6-3}_{III}$ fractional factorial experiment with FrF2 package :

summary(lm(plan.resp))

Number of observations used: 8 
Formula:
RV ~ (A + B + C + D + E + F)^2

Call:
lm.default(formula = fo, data = model.frame(fo, data = formula))

Residuals:
ALL 8 residuals are 0: no residual degrees of freedom!

Coefficients: (14 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)
(Intercept)    69438         NA      NA       NA
A1             13688         NA      NA       NA
B1             20938         NA      NA       NA
C1             21438         NA      NA       NA
D1             11688         NA      NA       NA
E1             15688         NA      NA       NA
F1             14438         NA      NA       NA
A1:F1           7188         NA      NA       NA

Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared:      1, Adjusted R-squared:    NaN 
F-statistic:   NaN on 7 and 0 DF,  p-value: NA

Main effects can be represented as below :

MEPlot(plan.resp, abbrev = 5, cex.xax = 1.6, cex.main = 2)

Interaction effects can be represented as below with alias numbering, also second plot shows selected (C,D,E,F) interactions :

IAPlot(plan.resp, abbrev = 5, show.alias = TRUE, lwd = 2, cex = 2, cex.xax = 1.2, cex.lab = 1.5)

IAPlot(plan.resp, abbrev = 5, select = 4:6, lwd = 2, cex = 2, cex.xax = 1.2, cex.lab = 1.5)

According to these findings our screening design is representing that all main effects are large and should be included in the model. Now we can compare our finding with the full data set :

In this case our null hypothesis can be set as house price does not affected by full basement, house being in a preferred ares, lot size of the house, number of bedrooms.

DNM2=lm(price~fullbase+prefarea+lotsize2+bedrooms2,data=Prj.Orj)
anova(DNM2)

Analysis of Variance Table

Response: price
           Df     Sum Sq    Mean Sq F value    Pr(>F)    
fullbase    1 1.3476e+10 1.3476e+10  31.412 3.334e-08 ***
prefarea    1 3.3656e+10 3.3656e+10  78.454 < 2.2e-16 ***
lotsize2    2 7.7017e+10 3.8508e+10  89.765 < 2.2e-16 ***
bedrooms2   2 3.3228e+10 1.6614e+10  38.728 < 2.2e-16 ***
Residuals 539 2.3123e+11 4.2899e+08                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Also these findings represents a statistically significant effect and we can reject our null hypothesis, but lastly we should conduct model adequacy checking :

hist(Prj$price, breaks = 20)

qqnorm(residuals(DNM2))
qqline(residuals(DNM2))

par(mfrow=c(2,2))
plot(DNM2)

The results of ANOVA are based on the assumption that the data is normally distributed and randomly selected. However, q-q plots and scatter plots show these assumptions had not been fulfilled. Also the histogram of the response variable shows a right skewed distribution. With these findings we conclude that further analysis should be conducted.

4. References to the literature

http://www2.stat.duke.edu/~banks/218-lectures.dir/spec-lect3.pdf

Montgomery, Douglas C.. Design and Analysis of Experiments, 8th Edition (Page 707). Wiley. Kindle Edition.

Ulrike Gromping (2014). R Package FrF2 for Creating and Analyzing Fractional Factorial 2-Level Designs. Journal of Statistical Software, January 2014, Volume 56, Issue 1.

Kevin DUNN (2015. Experiments 4G - Fractional factorials: using aliasing notation to plan experiments. https://www.youtube.com/watch?v=zrZS-zovKSc

http://www2.uaem.mx/r-mirror/web/packages/planor/vignettes/planorVignette.pdf

5. Appendices

https://cran.r-project.org/web/packages/Ecdat/Ecdat.pdf

Project_3