After examining the available data sets in Ecdat
, Housing
data set has been selected.
Installation of the necessary packages and selection of the data-set is shown below:
#install.packages("pid")
#install.packages("Ecdat")
library(pid)
library(Ecdat)
library(FrF2)
library(knitr)
library(tidyr)
library(kimisc)
Warning: package 'kimisc' was built under R version 3.3.2
Prj <- Housing
Housing
data set was collected for Fractional Factorial Design Project and consist of 12 variables and 546 observations. It represents Sales Prices of Houses in the City of Windsor in 1987 and the main source of the data is Journal of Applied Econometrics.
Definitions of the variables in data set are as below:
head(Prj)
price lotsize bedrooms bathrms stories driveway recroom fullbase gashw
1 42000 5850 3 1 2 yes no yes no
2 38500 4000 2 1 1 yes no no no
3 49500 3060 3 1 1 yes no no no
4 60500 6650 3 1 2 yes yes no no
5 61000 6360 2 1 1 yes no no no
6 66000 4160 3 1 1 yes yes yes no
airco garagepl prefarea
1 no 1 no
2 no 0 no
3 no 0 no
4 no 0 no
5 no 0 no
6 yes 0 no
Factors and levels of each factor are listed as below:
str(Prj)
'data.frame': 546 obs. of 12 variables:
$ price : num 42000 38500 49500 60500 61000 66000 66000 69000 83800 88500 ...
$ lotsize : num 5850 4000 3060 6650 6360 4160 3880 4160 4800 5500 ...
$ bedrooms: num 3 2 3 3 2 3 3 3 3 3 ...
$ bathrms : num 1 1 1 1 1 1 2 1 1 2 ...
$ stories : num 2 1 1 2 1 1 2 3 1 4 ...
$ driveway: Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
$ recroom : Factor w/ 2 levels "no","yes": 1 1 1 2 1 2 1 1 2 2 ...
$ fullbase: Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 1 ...
$ gashw : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ airco : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 1 1 1 2 ...
$ garagepl: num 1 0 0 0 0 0 2 0 0 1 ...
$ prefarea: Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
However, for our project we should select two 2-level factors and two 3-level factors in the given data and re-organize some of the variables according to this constrain.
2-level factors selected are:
3-level factors selected and needs categorization are:
Our new data set is now :
for (i in 1:546){
if(Prj$lotsize[i] <= 3600){Prj$lotsize2[i] <- "Low"}
if(Prj$lotsize[i] > 3600 & Prj$lotsize[i] <= 6360){Prj$lotsize2[i] <- "Med"}
if(Prj$lotsize[i] > 6360){Prj$lotsize2[i] <- "High"}
if(Prj$bedrooms[i] <= 2){Prj$bedrooms2[i] <- "Low"}
if(Prj$bedrooms[i] == 3){Prj$bedrooms2[i] <- "Med"}
if(Prj$bedrooms[i] > 3){Prj$bedrooms2[i] <- "High"}
}
Prj$lotsize2 <- as.factor(Prj$lotsize2)
Prj$lotsize2 <- factor(Prj$lotsize2, levels= c("Low","Med","High"))
Prj$bedrooms2 <- as.factor(Prj$bedrooms2)
Prj$bedrooms2 <- factor(Prj$bedrooms2, levels= c("Low","Med","High"))
Prj <- Prj[,c(1,8,12,13,14)]
Prj.Orj <- Prj
str(Prj)
'data.frame': 546 obs. of 5 variables:
$ price : num 42000 38500 49500 60500 61000 66000 66000 69000 83800 88500 ...
$ fullbase : Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 1 ...
$ prefarea : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ lotsize2 : Factor w/ 3 levels "Low","Med","High": 2 2 1 3 2 2 2 2 2 2 ...
$ bedrooms2: Factor w/ 3 levels "Low","Med","High": 2 1 2 2 1 2 2 2 2 2 ...
In the data set we have one continuous variable which is also our Response Variable.
By using the given factors 2 two-levels and 2 three-levels for a full factorial analysis normally we should use 36 measurements. However we are going to implement a (2k), so our mixed level design should be converted to a (26) design.
According to the Montgomery : “Designs in which some factors have two levels and other factors have three levels can be derived from the table of plus and minus signs for the usual (2k) design,” Since we have 2 three-level factors, we need 4 two-level factors for constructing our design as represented in below tables:
Three level factor 1 (X) :
C <- c(-1,1,-1,1)
D <- c(-1,-1,1,1)
X <- c("Low","Med","Med","High")
Factor_X <- as.data.frame(cbind(C,D,X))
kable(Factor_X,align = 'c')
C | D | X |
---|---|---|
-1 | -1 | Low |
1 | -1 | Med |
-1 | 1 | Med |
1 | 1 | High |
Three level factor 2 (Y) :
E <- c(-1,1,-1,1)
F <- c(-1,-1,1,1)
Y <- c("Low","Med","Med","High")
Factor_Y <- as.data.frame(cbind(E,F,Y))
kable(Factor_Y,align = 'c')
E | F | Y |
---|---|---|
-1 | -1 | Low |
1 | -1 | Med |
-1 | 1 | Med |
1 | 1 | High |
If we combine these new factors without existing two level factors a full factorial design (26) for 6 factors with 64 experimental runs can be represented with below table. Also actual treatment combinations are shown on the right side of the table. This given table is ordered and should be randomized during experiment which we will show in the randomization section.
K <- as.data.frame(FrF2(64,6, randomize = FALSE))
creating full factorial with 64 runs ...
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F
Full <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(K)
for (i in 1:n){
if (Full$A[i] == 1){Full$Treatment_A[i] <- "Low"}
if (Full$A[i] == 2){Full$Treatment_A[i] <- "High"}
if (Full$B[i] == 1){Full$Treatment_B[i] <- "Low"}
if (Full$B[i] == 2){Full$Treatment_B[i] <- "High"}
if (Full$CD[i] == 1){Full$Treatment_X[i] <- "Low"}
if (Full$CD[i] == 2){Full$Treatment_X[i] <- "Med"}
if (Full$CD[i] == 4){Full$Treatment_X[i] <- "High"}
if (Full$EF[i] == 1){Full$Treatment_Y[i] <- "Low"}
if (Full$EF[i] == 2){Full$Treatment_Y[i] <- "Med"}
if (Full$EF[i] == 4){Full$Treatment_Y[i] <- "High"}
}
Full_Factorial <- cbind(K,Full$Treatment_A,Full$Treatment_B,Full$Treatment_X,Full$Treatment_Y)
kable(Full_Factorial, align = 'c')
A | B | C | D | E | F | Full$Treatment_A | Full$Treatment_B | Full$Treatment_X | Full$Treatment_Y |
---|---|---|---|---|---|---|---|---|---|
-1 | -1 | -1 | -1 | -1 | -1 | Low | Low | Low | Low |
1 | -1 | -1 | -1 | -1 | -1 | High | Low | Low | Low |
-1 | 1 | -1 | -1 | -1 | -1 | Low | High | Low | Low |
1 | 1 | -1 | -1 | -1 | -1 | High | High | Low | Low |
-1 | -1 | 1 | -1 | -1 | -1 | Low | Low | Med | Low |
1 | -1 | 1 | -1 | -1 | -1 | High | Low | Med | Low |
-1 | 1 | 1 | -1 | -1 | -1 | Low | High | Med | Low |
1 | 1 | 1 | -1 | -1 | -1 | High | High | Med | Low |
-1 | -1 | -1 | 1 | -1 | -1 | Low | Low | Med | Low |
1 | -1 | -1 | 1 | -1 | -1 | High | Low | Med | Low |
-1 | 1 | -1 | 1 | -1 | -1 | Low | High | Med | Low |
1 | 1 | -1 | 1 | -1 | -1 | High | High | Med | Low |
-1 | -1 | 1 | 1 | -1 | -1 | Low | Low | High | Low |
1 | -1 | 1 | 1 | -1 | -1 | High | Low | High | Low |
-1 | 1 | 1 | 1 | -1 | -1 | Low | High | High | Low |
1 | 1 | 1 | 1 | -1 | -1 | High | High | High | Low |
-1 | -1 | -1 | -1 | 1 | -1 | Low | Low | Low | Med |
1 | -1 | -1 | -1 | 1 | -1 | High | Low | Low | Med |
-1 | 1 | -1 | -1 | 1 | -1 | Low | High | Low | Med |
1 | 1 | -1 | -1 | 1 | -1 | High | High | Low | Med |
-1 | -1 | 1 | -1 | 1 | -1 | Low | Low | Med | Med |
1 | -1 | 1 | -1 | 1 | -1 | High | Low | Med | Med |
-1 | 1 | 1 | -1 | 1 | -1 | Low | High | Med | Med |
1 | 1 | 1 | -1 | 1 | -1 | High | High | Med | Med |
-1 | -1 | -1 | 1 | 1 | -1 | Low | Low | Med | Med |
1 | -1 | -1 | 1 | 1 | -1 | High | Low | Med | Med |
-1 | 1 | -1 | 1 | 1 | -1 | Low | High | Med | Med |
1 | 1 | -1 | 1 | 1 | -1 | High | High | Med | Med |
-1 | -1 | 1 | 1 | 1 | -1 | Low | Low | High | Med |
1 | -1 | 1 | 1 | 1 | -1 | High | Low | High | Med |
-1 | 1 | 1 | 1 | 1 | -1 | Low | High | High | Med |
1 | 1 | 1 | 1 | 1 | -1 | High | High | High | Med |
-1 | -1 | -1 | -1 | -1 | 1 | Low | Low | Low | Med |
1 | -1 | -1 | -1 | -1 | 1 | High | Low | Low | Med |
-1 | 1 | -1 | -1 | -1 | 1 | Low | High | Low | Med |
1 | 1 | -1 | -1 | -1 | 1 | High | High | Low | Med |
-1 | -1 | 1 | -1 | -1 | 1 | Low | Low | Med | Med |
1 | -1 | 1 | -1 | -1 | 1 | High | Low | Med | Med |
-1 | 1 | 1 | -1 | -1 | 1 | Low | High | Med | Med |
1 | 1 | 1 | -1 | -1 | 1 | High | High | Med | Med |
-1 | -1 | -1 | 1 | -1 | 1 | Low | Low | Med | Med |
1 | -1 | -1 | 1 | -1 | 1 | High | Low | Med | Med |
-1 | 1 | -1 | 1 | -1 | 1 | Low | High | Med | Med |
1 | 1 | -1 | 1 | -1 | 1 | High | High | Med | Med |
-1 | -1 | 1 | 1 | -1 | 1 | Low | Low | High | Med |
1 | -1 | 1 | 1 | -1 | 1 | High | Low | High | Med |
-1 | 1 | 1 | 1 | -1 | 1 | Low | High | High | Med |
1 | 1 | 1 | 1 | -1 | 1 | High | High | High | Med |
-1 | -1 | -1 | -1 | 1 | 1 | Low | Low | Low | High |
1 | -1 | -1 | -1 | 1 | 1 | High | Low | Low | High |
-1 | 1 | -1 | -1 | 1 | 1 | Low | High | Low | High |
1 | 1 | -1 | -1 | 1 | 1 | High | High | Low | High |
-1 | -1 | 1 | -1 | 1 | 1 | Low | Low | Med | High |
1 | -1 | 1 | -1 | 1 | 1 | High | Low | Med | High |
-1 | 1 | 1 | -1 | 1 | 1 | Low | High | Med | High |
1 | 1 | 1 | -1 | 1 | 1 | High | High | Med | High |
-1 | -1 | -1 | 1 | 1 | 1 | Low | Low | Med | High |
1 | -1 | -1 | 1 | 1 | 1 | High | Low | Med | High |
-1 | 1 | -1 | 1 | 1 | 1 | Low | High | Med | High |
1 | 1 | -1 | 1 | 1 | 1 | High | High | Med | High |
-1 | -1 | 1 | 1 | 1 | 1 | Low | Low | High | High |
1 | -1 | 1 | 1 | 1 | 1 | High | Low | High | High |
-1 | 1 | 1 | 1 | 1 | 1 | Low | High | High | High |
1 | 1 | 1 | 1 | 1 | 1 | High | High | High | High |
Additionally, for Project-3 a 26−3 fractional factorial design was requested. This is a 1/8 fraction of 26 design with a Resolution III. Unfortunately, with the given design parameters, it is not possible to construct a higher resolution. For higher resolutions like Resolution IV we need 16 runs (26−2) and for Resolution VI we need 32 runs (26−1).
Alias effect of this resolution can be defined as (cited from our course presentations) >> Resolution III : “No main effect is aliased with any other main effect (me). Main effects are aliased with two factor interactions (2fi).”
If we construct our experiment as a fractional factorial design for 26−3III as 1/8 fraction of 6 factors in 8 experimental runs will be:
K <- as.data.frame(FrF2(8,6, randomize = FALSE))
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F
Frac <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(K)
for (i in 1:n){
if (Frac$A[i] == 1){Frac$Treatment_A[i] <- "Low"}
if (Frac$A[i] == 2){Frac$Treatment_A[i] <- "High"}
if (Frac$B[i] == 1){Frac$Treatment_B[i] <- "Low"}
if (Frac$B[i] == 2){Frac$Treatment_B[i] <- "High"}
if (Frac$CD[i] == 1){Frac$Treatment_X[i] <- "Low"}
if (Frac$CD[i] == 2){Frac$Treatment_X[i] <- "Med"}
if (Frac$CD[i] == 4){Frac$Treatment_X[i] <- "High"}
if (Frac$EF[i] == 1){Frac$Treatment_Y[i] <- "Low"}
if (Frac$EF[i] == 2){Frac$Treatment_Y[i] <- "Med"}
if (Frac$EF[i] == 4){Frac$Treatment_Y[i] <- "High"}
}
Fractional_Factorial <- cbind(K,Frac$Treatment_A,Frac$Treatment_B,Frac$Treatment_X,Frac$Treatment_Y)
kable(Fractional_Factorial, align = 'c')
A | B | C | D | E | F | Frac$Treatment_A | Frac$Treatment_B | Frac$Treatment_X | Frac$Treatment_Y |
---|---|---|---|---|---|---|---|---|---|
-1 | -1 | -1 | 1 | 1 | 1 | Low | Low | Med | High |
1 | -1 | -1 | -1 | -1 | 1 | High | Low | Low | Med |
-1 | 1 | -1 | -1 | 1 | -1 | Low | High | Low | Med |
1 | 1 | -1 | 1 | -1 | -1 | High | High | Med | Low |
-1 | -1 | 1 | 1 | -1 | -1 | Low | Low | High | Low |
1 | -1 | 1 | -1 | 1 | -1 | High | Low | Med | Med |
-1 | 1 | 1 | -1 | -1 | 1 | Low | High | Med | Med |
1 | 1 | 1 | 1 | 1 | 1 | High | High | High | High |
### IS | YE 602 | 0 How | to det | ermine | the g | enerator I |
Since we know that we are going to have 8 experiments we should start to construct our table as 23 :
A <- B <- C <- c(-1, +1)
gen <- expand.grid(A=A, B=B, C=C)
gen <- as.data.frame(gen)
gen
A B C
1 -1 -1 -1
2 1 -1 -1
3 -1 1 -1
4 1 1 -1
5 -1 -1 1
6 1 -1 1
7 -1 1 1
8 1 1 1
In this initial plan we have 3 factors and 8 experiments, but, we should increase our factor number to 6. From pid
package we can check the trade off table that represents 2k fractional factorial analysis:
tradeOffTable()
Design generators for 26−3 are D=AB, E=AC, F=BC, and if we include them to our design:
gen$D <- gen$A*gen$B
gen$E <- gen$A*gen$C
gen$F <- gen$B*gen$C
gen
A B C D E F
1 -1 -1 -1 1 1 1
2 1 -1 -1 -1 -1 1
3 -1 1 -1 -1 1 -1
4 1 1 -1 1 -1 -1
5 -1 -1 1 1 -1 -1
6 1 -1 1 -1 1 -1
7 -1 1 1 -1 -1 1
8 1 1 1 1 1 1
We mathematically know that if we multiply the factor by itself it will generate the intercept:
gen$A*gen$A
[1] 1 1 1 1 1 1 1 1
gen$B*gen$B
[1] 1 1 1 1 1 1 1 1
gen$C*gen$C
[1] 1 1 1 1 1 1 1 1
gen$D*gen$D
[1] 1 1 1 1 1 1 1 1
gen$E*gen$E
[1] 1 1 1 1 1 1 1 1
gen$F*gen$F
[1] 1 1 1 1 1 1 1 1
If we make this calculation with the given generators by multiplying both sides of the equation: * DD=ABD can be transformed to I=ABD * EE=ACE can be transformed to I=ACE * FF=BCF can be transformed to I=ACE
Also we can find aliases by this method: D=AB >>> multiply by A >>> AD=AAB >>> AD=B D=AB >>> multiply by C >>> CD=ABC
We can control B=AD as below:
gen$A*gen$D
[1] -1 -1 1 1 -1 -1 1 1
gen$B
[1] -1 -1 1 1 -1 -1 1 1
If we continue this method all 2 and 3 way interactions, finally we will find below equation for Intercept (I): I=ABD=ACE=BCDE=BCF=ACDF=ABEF=DEF
Also all these relationships for 26−3III Fractional Factorial Designs can be represented as below :
Montgomery, Douglas C.. Design and Analysis of Experiments, 8th Edition (Page 707). Wiley. Kindle Edition.
And additionally with FrF2 package same alias relations can be found :
K <- FrF2(8,6, randomize = FALSE)
summary(K)
Call:
FrF2(8, 6, randomize = FALSE)
Experimental design of type FrF2
8 runs
Factor settings (scale ends):
A B C D E F
1 -1 -1 -1 -1 -1 -1
2 1 1 1 1 1 1
Design generating information:
$legend
[1] A=A B=B C=C D=D E=E F=F
$generators
[1] D=AB E=AC F=BC
Alias structure:
$main
[1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
$fi2
[1] AF=BE=CD
The design itself:
A B C D E F
1 -1 -1 -1 1 1 1
2 1 -1 -1 -1 -1 1
3 -1 1 -1 -1 1 -1
4 1 1 -1 1 -1 -1
5 -1 -1 1 1 -1 -1
6 1 -1 1 -1 1 -1
7 -1 1 1 -1 -1 1
8 1 1 1 1 1 1
class=design, type= FrF2
During the test section We will not change these aliasing structures and we will use our data set with this structure :
Next steps of our experiment will follow below outline:
Data set has been retrieved from Ecdat, and factors had been already chosen and we have more than 500 observations. However in this project I assume we are simulating the very first start of an experimental research and we will try to gather as much as possible information with only 8 runs.
Replication, local control(blocking) and randomization are fundamentals of a designing an experiment.Replication and blocking increases the precision in the experiment and randomization helps reducing bias.
Randomization is possible with below three conditions:
If all these are realized in a random behavior we can assume the analyze is randomized. In our analyze we do not know the initial collection of the data but we will : * Randomly order our 8 experiments and * Randomly assign our matching measurements form our complete set to the experiment
In this dataset we are not going to use a replication. Because for such case we would need 16 experiments (2x8). On the other hand with 16 experiments a 26−2 design would be constructed. Since we are limited with 8 observations I assume the cost, duration or other resources are limited.
Since we are going to use only 8 treatments in our fractional factorial analysis, these initial descriptive analysis will only be for our interest and will not affect our further analysis . Normally the researcher would not have these data beforehand. However, for comparing our findings, factors can be represented with below boxplots and also our response variable histogram will follow:
summary(Prj)
price fullbase prefarea lotsize2 bedrooms2
Min. : 25000 no :355 no :418 Low :144 Low :138
1st Qu.: 49125 yes:191 yes:128 Med :267 Med :301
Median : 62000 High:135 High:107
Mean : 68122
3rd Qu.: 82000
Max. :190000
str(Prj)
'data.frame': 546 obs. of 5 variables:
$ price : num 42000 38500 49500 60500 61000 66000 66000 69000 83800 88500 ...
$ fullbase : Factor w/ 2 levels "no","yes": 2 1 1 1 1 2 2 1 2 1 ...
$ prefarea : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
$ lotsize2 : Factor w/ 3 levels "Low","Med","High": 2 2 1 3 2 2 2 2 2 2 ...
$ bedrooms2: Factor w/ 3 levels "Low","Med","High": 2 1 2 2 1 2 2 2 2 2 ...
head(Prj)
price fullbase prefarea lotsize2 bedrooms2
1 42000 yes no Med Med
2 38500 no no Med Low
3 49500 no no Low Med
4 60500 no no High Med
5 61000 no no Med Low
6 66000 yes no Med Med
boxplot(Prj$price~Prj$lotsize2, xlab="Lot Size", ylab="House Price")
title("Lot Size(sqf)")
means1 <- by(Prj$price, Prj$lotsize2, mean)
points(1:3, means1, pch = 23, cex = 2, bg = "red")
text(1:3 - 0.1, means1,labels = format(means1, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")
boxplot(Prj$price~Prj$bedrooms2, xlab="Bedrooms", ylab="House Price")
title("Quantitiy Of Bedrooms")
means1 <- by(Prj$price, Prj$bedrooms2, mean)
points(1:3, means1, pch = 23, cex = 2, bg = "red")
text(1:3 - 0.1, means1,labels = format(means1, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")
boxplot(Prj$price~Prj$fullbase, xlab="Full Basement", ylab="House Price")
title("Full Basement")
means1 <- by(Prj$price, Prj$fullbase, mean)
points(1:2, means1, pch = 23, cex = 2, bg = "red")
text(1:2 - 0.1, means1,labels = format(means1, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")
boxplot(Prj$price~Prj$prefarea, xlab="Preferred Area", ylab="House Price")
title("Preferred Area")
means1 <- by(Prj$price, Prj$prefarea, mean)
points(1:2, means1, pch = 23, cex = 2, bg = "red")
text(1:2 - 0.1, means1,labels = format(means1, format = "f", digits = 2),pos = 3, cex = 0.9, col = "red")
Ordered 26−3III Fractional Factorial Design:
K <- as.data.frame(FrF2(8,6, randomize = FALSE))
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F
Plan <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(K)
for (i in 1:n){
if (Plan$A[i] == 1){Plan$fullbase[i] <- "no"}
if (Plan$A[i] == 2){Plan$fullbase[i] <- "yes"}
if (Plan$B[i] == 1){Plan$prefarea[i] <- "no"}
if (Plan$B[i] == 2){Plan$prefarea[i] <- "yes"}
if (Plan$CD[i] == 1){Plan$lotsize2[i] <- "Low"}
if (Plan$CD[i] == 2){Plan$lotsize2[i] <- "Med"}
if (Plan$CD[i] == 4){Plan$lotsize2[i] <- "High"}
if (Plan$EF[i] == 1){Plan$bedrooms2[i] <- "Low"}
if (Plan$EF[i] == 2){Plan$bedrooms2[i] <- "Med"}
if (Plan$EF[i] == 4){Plan$bedrooms2[i] <- "High"}
}
Prjplan <- cbind(K,Plan$fullbase,Plan$prefarea,Plan$lotsize2,Plan$bedrooms2)
kable(Prjplan, align = 'c')
A | B | C | D | E | F | Plan$fullbase | Plan$prefarea | Plan$lotsize2 | Plan$bedrooms2 |
---|---|---|---|---|---|---|---|---|---|
-1 | -1 | -1 | 1 | 1 | 1 | no | no | Med | High |
1 | -1 | -1 | -1 | -1 | 1 | yes | no | Low | Med |
-1 | 1 | -1 | -1 | 1 | -1 | no | yes | Low | Med |
1 | 1 | -1 | 1 | -1 | -1 | yes | yes | Med | Low |
-1 | -1 | 1 | 1 | -1 | -1 | no | no | High | Low |
1 | -1 | 1 | -1 | 1 | -1 | yes | no | Med | Med |
-1 | 1 | 1 | -1 | -1 | 1 | no | yes | Med | Med |
1 | 1 | 1 | 1 | 1 | 1 | yes | yes | High | High |
Randomized 26−3III Fractional Factorial Design :
plan.design <- FrF2(8,6, randomize = TRUE)
K <- as.data.frame(plan.design)
A <- c(as.integer(K[,1]))
B <- c(as.integer(K[,2]))
C <- c(as.integer(K[,3]))
D <- c(as.integer(K[,4]))
CD <- C*D
E <- c(as.integer(K[,5]))
F <- c(as.integer(K[,6]))
EF <- E*F
Plan <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(K)
for (i in 1:n){
if (Plan$A[i] == 1){Plan$fullbase[i] <- "no"}
if (Plan$A[i] == 2){Plan$fullbase[i] <- "yes"}
if (Plan$B[i] == 1){Plan$prefarea[i] <- "no"}
if (Plan$B[i] == 2){Plan$prefarea[i] <- "yes"}
if (Plan$CD[i] == 1){Plan$lotsize2[i] <- "Low"}
if (Plan$CD[i] == 2){Plan$lotsize2[i] <- "Med"}
if (Plan$CD[i] == 4){Plan$lotsize2[i] <- "High"}
if (Plan$EF[i] == 1){Plan$bedrooms2[i] <- "Low"}
if (Plan$EF[i] == 2){Plan$bedrooms2[i] <- "Med"}
if (Plan$EF[i] == 4){Plan$bedrooms2[i] <- "High"}
}
Prjplan <- cbind(K,Plan$fullbase,Plan$prefarea,Plan$lotsize2,Plan$bedrooms2)
summary(Prjplan)
A B C D E F Plan$fullbase Plan$prefarea
-1:4 -1:4 -1:4 -1:4 -1:4 -1:4 no :4 no :4
1 :4 1 :4 1 :4 1 :4 1 :4 1 :4 yes:4 yes:4
Plan$lotsize2 Plan$bedrooms2
High:2 High:2
Low :2 Low :2
Med :4 Med :4
kable(Prjplan, align = 'c')
A | B | C | D | E | F | Plan$fullbase | Plan$prefarea | Plan$lotsize2 | Plan$bedrooms2 |
---|---|---|---|---|---|---|---|---|---|
1 | -1 | 1 | -1 | 1 | -1 | yes | no | Med | Med |
1 | -1 | -1 | -1 | -1 | 1 | yes | no | Low | Med |
1 | 1 | -1 | 1 | -1 | -1 | yes | yes | Med | Low |
-1 | -1 | -1 | 1 | 1 | 1 | no | no | Med | High |
-1 | -1 | 1 | 1 | -1 | -1 | no | no | High | Low |
-1 | 1 | -1 | -1 | 1 | -1 | no | yes | Low | Med |
-1 | 1 | 1 | -1 | -1 | 1 | no | yes | Med | Med |
1 | 1 | 1 | 1 | 1 | 1 | yes | yes | High | High |
We will create reference columns both for our complete data and 8 experiments. These refcol variables are going to be used for matching experiments and data set and randomly assign a result to the experiment run.
Prjplan2<-unite(Prjplan, refcol , c(7,8,9,10), remove=FALSE)
kable(Prjplan2, align = 'c')
A | B | C | D | E | F | refcol | Plan$fullbase | Plan$prefarea | Plan$lotsize2 | Plan$bedrooms2 |
---|---|---|---|---|---|---|---|---|---|---|
1 | -1 | 1 | -1 | 1 | -1 | yes_no_Med_Med | yes | no | Med | Med |
1 | -1 | -1 | -1 | -1 | 1 | yes_no_Low_Med | yes | no | Low | Med |
1 | 1 | -1 | 1 | -1 | -1 | yes_yes_Med_Low | yes | yes | Med | Low |
-1 | -1 | -1 | 1 | 1 | 1 | no_no_Med_High | no | no | Med | High |
-1 | -1 | 1 | 1 | -1 | -1 | no_no_High_Low | no | no | High | Low |
-1 | 1 | -1 | -1 | 1 | -1 | no_yes_Low_Med | no | yes | Low | Med |
-1 | 1 | 1 | -1 | -1 | 1 | no_yes_Med_Med | no | yes | Med | Med |
1 | 1 | 1 | 1 | 1 | 1 | yes_yes_High_High | yes | yes | High | High |
Plan.refcol <- c(Prjplan2$refcol)
Prj<-unite(Prj, refcol , c(2,3,4,5), remove=FALSE)
head(Prj)
price refcol fullbase prefarea lotsize2 bedrooms2
1 42000 yes_no_Med_Med yes no Med Med
2 38500 no_no_Med_Low no no Med Low
3 49500 no_no_Low_Med no no Low Med
4 60500 no_no_High_Med no no High Med
5 61000 no_no_Med_Low no no Med Low
6 66000 yes_no_Med_Med yes no Med Med
tail(Prj)
price refcol fullbase prefarea lotsize2 bedrooms2
541 85000 no_no_High_Med no no High Med
542 91500 no_no_Med_Med no no Med Med
543 94000 no_no_Med_Med no no Med Med
544 103000 no_no_Med_Med no no Med Med
545 105000 no_no_Med_Med no no Med Med
546 105000 no_no_Med_Med no no Med Med
for (j in 1:546){if (Prjplan2$refcol[1] == Prj$refcol[j]){Prj$RO1[j] <- Prj$price[j]}else{Prj$RO1[j] <- 0}}
for (j in 1:546){if (Prjplan2$refcol[2] == Prj$refcol[j]){Prj$RO2[j] <- Prj$price[j]}else{Prj$RO2[j] <- 0}}
for (j in 1:546){if (Prjplan2$refcol[3] == Prj$refcol[j]){Prj$RO3[j] <- Prj$price[j]}else{Prj$RO3[j] <- 0}}
for (j in 1:546){if (Prjplan2$refcol[4] == Prj$refcol[j]){Prj$RO4[j] <- Prj$price[j]}else{Prj$RO4[j] <- 0}}
for (j in 1:546){if (Prjplan2$refcol[5] == Prj$refcol[j]){Prj$RO5[j] <- Prj$price[j]}else{Prj$RO5[j] <- 0}}
for (j in 1:546){if (Prjplan2$refcol[6] == Prj$refcol[j]){Prj$RO6[j] <- Prj$price[j]}else{Prj$RO6[j] <- 0}}
for (j in 1:546){if (Prjplan2$refcol[7] == Prj$refcol[j]){Prj$RO7[j] <- Prj$price[j]}else{Prj$RO7[j] <- 0}}
for (j in 1:546){if (Prjplan2$refcol[8] == Prj$refcol[j]){Prj$RO8[j] <- Prj$price[j]}else{Prj$RO8[j] <- 0}}
R1 <- sample.rows(subset(Prj, RO1 > 0), 1)
R2 <- sample.rows(subset(Prj, RO2 > 0), 1)
R3 <- sample.rows(subset(Prj, RO3 > 0), 1)
R4 <- sample.rows(subset(Prj, RO4 > 0), 1)
R5 <- sample.rows(subset(Prj, RO5 > 0), 1)
R6 <- sample.rows(subset(Prj, RO6 > 0), 1)
R7 <- sample.rows(subset(Prj, RO7 > 0), 1)
R8 <- sample.rows(subset(Prj, RO8 > 0), 1)
RV <- c(R1$RO1,R2$RO2,R3$RO3,R4$RO4,R5$RO5,R6$RO6,R7$RO7,R8$RO8)
Our response variables randomly selected for our experiment are:
RV
[1] 66000 35000 57000 48000 45000 52000 78000 174500
We will add these Response Variables to our design:
plan.resp <- add.response(plan.design, RV)
summary(plan.resp)
Call:
FrF2(8, 6, randomize = TRUE)
Experimental design of type FrF2
8 runs
Factor settings (scale ends):
A B C D E F
1 -1 -1 -1 -1 -1 -1
2 1 1 1 1 1 1
Responses:
[1] RV
Design generating information:
$legend
[1] A=A B=B C=C D=D E=E F=F
$generators
[1] D=AB E=AC F=BC
Alias structure:
$main
[1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
$fi2
[1] AF=BE=CD
The design itself:
A B C D E F RV
1 1 -1 1 -1 1 -1 66000
2 1 -1 -1 -1 -1 1 35000
3 1 1 -1 1 -1 -1 57000
4 -1 -1 -1 1 1 1 48000
5 -1 -1 1 1 -1 -1 45000
6 -1 1 -1 -1 1 -1 52000
7 -1 1 1 -1 -1 1 78000
8 1 1 1 1 1 1 174500
class=design, type= FrF2
Now we can analyze our 26−3III fractional factorial experiment with FrF2 package :
summary(lm(plan.resp))
Number of observations used: 8
Formula:
RV ~ (A + B + C + D + E + F)^2
Call:
lm.default(formula = fo, data = model.frame(fo, data = formula))
Residuals:
ALL 8 residuals are 0: no residual degrees of freedom!
Coefficients: (14 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 69438 NA NA NA
A1 13688 NA NA NA
B1 20938 NA NA NA
C1 21438 NA NA NA
D1 11688 NA NA NA
E1 15688 NA NA NA
F1 14438 NA NA NA
A1:F1 7188 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 7 and 0 DF, p-value: NA
Main effects can be represented as below :
MEPlot(plan.resp, abbrev = 5, cex.xax = 1.6, cex.main = 2)
Interaction effects can be represented as below with alias numbering, also second plot shows selected (C,D,E,F) interactions :
IAPlot(plan.resp, abbrev = 5, show.alias = TRUE, lwd = 2, cex = 2, cex.xax = 1.2, cex.lab = 1.5)
IAPlot(plan.resp, abbrev = 5, select = 4:6, lwd = 2, cex = 2, cex.xax = 1.2, cex.lab = 1.5)
According to these findings our screening design is representing that all main effects are large and should be included in the model. Now we can compare our finding with the full data set :
In this case our null hypothesis can be set as house price does not affected by full basement, house being in a preferred ares, lot size of the house, number of bedrooms.
DNM2=lm(price~fullbase+prefarea+lotsize2+bedrooms2,data=Prj.Orj)
anova(DNM2)
Analysis of Variance Table
Response: price
Df Sum Sq Mean Sq F value Pr(>F)
fullbase 1 1.3476e+10 1.3476e+10 31.412 3.334e-08 ***
prefarea 1 3.3656e+10 3.3656e+10 78.454 < 2.2e-16 ***
lotsize2 2 7.7017e+10 3.8508e+10 89.765 < 2.2e-16 ***
bedrooms2 2 3.3228e+10 1.6614e+10 38.728 < 2.2e-16 ***
Residuals 539 2.3123e+11 4.2899e+08
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Also these findings represents a statistically significant effect and we can reject our null hypothesis, but lastly we should conduct model adequacy checking :
hist(Prj$price, breaks = 20)
qqnorm(residuals(DNM2))
qqline(residuals(DNM2))
par(mfrow=c(2,2))
plot(DNM2)
The results of ANOVA are based on the assumption that the data is normally distributed and randomly selected. However, q-q plots and scatter plots show these assumptions had not been fulfilled. Also the histogram of the response variable shows a right skewed distribution. With these findings we conclude that further analysis should be conducted.
http://www2.stat.duke.edu/~banks/218-lectures.dir/spec-lect3.pdf
Montgomery, Douglas C.. Design and Analysis of Experiments, 8th Edition (Page 707). Wiley. Kindle Edition.
Ulrike Gromping (2014). R Package FrF2 for Creating and Analyzing Fractional Factorial 2-Level Designs. Journal of Statistical Software, January 2014, Volume 56, Issue 1.
Kevin DUNN (2015. Experiments 4G - Fractional factorials: using aliasing notation to plan experiments. https://www.youtube.com/watch?v=zrZS-zovKSc
http://www2.uaem.mx/r-mirror/web/packages/planor/vignettes/planorVignette.pdf