DOE Project3
Benjamin Byeon
Rensselaer Polytechnic Institute - Troy, NY
V1.12.10.16
1. Setting
From the Ecdat Package, I selected a dataset that contained N=93 observations. The data looks into different factors that affect the pricing of cars. The study contains one response variable and two 2 level factors and two 3 level factors, which would be decomposed into 2 two level factors. Further analysis were made regarding the study.
The data was copied and pasted into Excel, and then saved as a .csv. Shown below is the loaded data into R.
#Load the package
library("FrF2")
## Warning: package 'FrF2' was built under R version 3.3.2
## Loading required package: DoE.base
## Warning: package 'DoE.base' was built under R version 3.3.2
## Loading required package: grid
## Loading required package: conf.design
## Warning: package 'conf.design' was built under R version 3.3.2
##
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
##
## aov, lm
## The following object is masked from 'package:graphics':
##
## plot.design
## The following object is masked from 'package:base':
##
## lengths
#Load the data
cars = read.csv("cars.csv", header = TRUE, sep=",")
cars
## airbag drive man.trans origin price
## 1 None Front Yes non-USA 15.9
## 2 Driver & Passenger Front Yes non-USA 33.9
## 3 Driver only Front Yes non-USA 29.1
## 4 Driver & Passenger Front Yes non-USA 37.7
## 5 Driver only Rear Yes non-USA 30.0
## 6 Driver only Front No USA 15.7
## 7 Driver only Front No USA 20.8
## 8 Driver only Rear No USA 23.7
## 9 Driver only Front No USA 26.3
## 10 Driver only Front No USA 34.7
## 11 Driver & Passenger Front No USA 40.1
## 12 None Front Yes USA 13.4
## 13 Driver only Front Yes USA 11.4
## 14 Driver & Passenger Rear Yes USA 15.1
## 15 None Front No USA 15.9
## 16 None Front No USA 16.3
## 17 None 4WD No USA 16.6
## 18 Driver only Rear No USA 18.8
## 19 Driver only Rear Yes USA 38.0
## 20 Driver & Passenger Front No USA 18.4
## 21 Driver & Passenger Front No USA 15.8
## 22 Driver only Front No USA 29.5
## 23 None Front Yes USA 9.2
## 24 Driver only Front Yes USA 11.3
## 25 Driver only Front Yes USA 13.3
## 26 Driver only 4WD No USA 19.0
## 27 Driver only Front No USA 15.6
## 28 Driver only 4WD Yes USA 25.8
## 29 None Front Yes USA 12.2
## 30 Driver & Passenger Front No USA 19.3
## 31 None Front Yes USA 7.4
## 32 None Front Yes USA 10.1
## 33 None Front Yes USA 11.3
## 34 Driver only Rear Yes USA 15.9
## 35 Driver only Front Yes USA 14.0
## 36 Driver only 4WD Yes USA 19.9
## 37 Driver only Front No USA 20.2
## 38 Driver only Rear No USA 20.9
## 39 None Front Yes non-USA 8.4
## 40 Driver only Front Yes non-USA 12.5
## 41 Driver & Passenger Front Yes non-USA 19.8
## 42 Driver only Front Yes non-USA 12.1
## 43 Driver & Passenger Front Yes non-USA 17.5
## 44 None Front Yes non-USA 8.0
## 45 None Front Yes non-USA 10.0
## 46 None Front Yes non-USA 10.0
## 47 None Front Yes non-USA 13.9
## 48 Driver only Rear No non-USA 47.9
## 49 Driver only Front Yes non-USA 28.0
## 50 Driver & Passenger Rear Yes non-USA 35.2
## 51 Driver & Passenger Front No USA 34.3
## 52 Driver & Passenger Rear No USA 36.1
## 53 None Front Yes non-USA 8.3
## 54 None Front Yes non-USA 11.6
## 55 Driver only Front Yes non-USA 16.5
## 56 None 4WD No non-USA 19.1
## 57 Driver only Rear Yes non-USA 32.5
## 58 Driver only Rear Yes non-USA 31.9
## 59 Driver & Passenger Rear No non-USA 61.9
## 60 Driver only Front Yes USA 14.1
## 61 None Rear No USA 14.9
## 62 None Front Yes non-USA 10.3
## 63 Driver only Front No non-USA 26.1
## 64 Driver only Front Yes non-USA 11.8
## 65 Driver only Front Yes non-USA 15.7
## 66 None Front No non-USA 19.1
## 67 Driver only Front No non-USA 21.5
## 68 None Front No USA 13.5
## 69 Driver only Front No USA 16.3
## 70 None Front No USA 19.5
## 71 Driver only Front No USA 20.7
## 72 None 4WD Yes USA 14.4
## 73 None Front Yes USA 9.0
## 74 None Front Yes USA 11.1
## 75 Driver & Passenger Rear Yes USA 17.7
## 76 None Front Yes USA 18.5
## 77 Driver & Passenger Front No USA 24.4
## 78 Driver only Front Yes non-USA 28.7
## 79 Driver only Front Yes USA 11.1
## 80 None 4WD Yes non-USA 8.4
## 81 None 4WD Yes non-USA 10.9
## 82 Driver only 4WD Yes non-USA 19.5
## 83 None Front Yes non-USA 8.6
## 84 Driver only Front Yes non-USA 9.8
## 85 Driver only Front Yes non-USA 18.4
## 86 Driver only Front Yes non-USA 18.2
## 87 Driver only 4WD Yes non-USA 22.7
## 88 None Front Yes non-USA 9.1
## 89 None Front Yes non-USA 19.7
## 90 None Front Yes non-USA 20.0
## 91 None Front Yes non-USA 23.3
## 92 Driver only Rear Yes non-USA 22.7
## 93 Driver & Passenger Front Yes non-USA 26.7
System under test
The dataset helps determine the what effects a price of a car in 1993, with factors the presence of airbags, driving train type, avaiability of manual transmission, and the origin of the machine parts. The factors were chosen in order to match the Fractional Factorial Design criteria. Within the choices, Best Guess was used to select specific factors due to my personal interests and experience with knowledge of cars.
Below are the definitions of the variables:
price: minimum price of the cars
airbag: air bags: none, driver only, or driver and passenger
drive: rear wheel, front wheel, or 4WD
man.trans: manual transmission: yes or no
origin: product origin made in the USA: yes or no
#Display the descriptive summaery
summary(cars)
## airbag drive man.trans origin price
## Driver & Passenger:16 4WD :10 No :32 non-USA:45 Min. : 7.40
## Driver only :43 Front:67 Yes:61 USA :48 1st Qu.:12.20
## None :34 Rear :16 Median :17.70
## Mean :19.51
## 3rd Qu.:23.30
## Max. :61.90
head(cars)
## airbag drive man.trans origin price
## 1 None Front Yes non-USA 15.9
## 2 Driver & Passenger Front Yes non-USA 33.9
## 3 Driver only Front Yes non-USA 29.1
## 4 Driver & Passenger Front Yes non-USA 37.7
## 5 Driver only Rear Yes non-USA 30.0
## 6 Driver only Front No USA 15.7
tail(cars)
## airbag drive man.trans origin price
## 88 None Front Yes non-USA 9.1
## 89 None Front Yes non-USA 19.7
## 90 None Front Yes non-USA 20.0
## 91 None Front Yes non-USA 23.3
## 92 Driver only Rear Yes non-USA 22.7
## 93 Driver & Passenger Front Yes non-USA 26.7
names(cars)
## [1] "airbag" "drive" "man.trans" "origin" "price"
#Display the structure
names(cars)
## [1] "airbag" "drive" "man.trans" "origin" "price"
str(cars)
## 'data.frame': 93 obs. of 5 variables:
## $ airbag : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
## $ drive : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
## $ man.trans: Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
## $ origin : Factor w/ 2 levels "non-USA","USA": 1 1 1 1 1 2 2 2 2 2 ...
## $ price : num 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
Factors and Levels
The data contains 5 factors, two 2 level factors and two 3 level factors. The as.factors was not necessary because they were categorical.
#Check the levels
levels(cars$price)
## NULL
levels(cars$airbag)
## [1] "Driver & Passenger" "Driver only" "None"
levels(cars$drive)
## [1] "4WD" "Front" "Rear"
levels(cars$man.trans)
## [1] "No" "Yes"
levels(cars$origin)
## [1] "non-USA" "USA"
The two 2 level factors:
man.trans: Does the car have manual transmission yes or no?
origin: Are product parts of the car made from origin in the USA yes or no?
The two 3 level factors:
airbag: Do the cars have air bags none, driver only, or driver and passenger?
drive: Is the car rear wheel, front wheel, or 4WD?
Continuous Variables (if any)
The data had one continuous variable that was also the dependent variable. It was the price. The price was in USD. Note, the may not reflect the USD worth today because the data was collected in 1993.
Response Variable(s)
The main objective of the experiment was to determine the effects of car’s price Therefore, the price variable also represents the response variable as well.
The Data: How is it organized and what does it look like?
The data consists of N=93 obversations with 6 variables. The data was collected in the year 1993 and the country the data came from was the USA.
In order to satisfy the factorial design with levels of high and low, I needed to make changes to the data. I replaced the factors with binary integer terms of 0,1,and 2.
#Numerize the factors into 0 and 1, instead of categories as is
a = nrow(cars)
airbag = data.frame(a)
drive = data.frame(a)
man.trans = data.frame(a)
origin = data.frame(a)
#Automatically replace the categories as 0,1,2
for (i in 1:a){
#For airbags
if (cars$airbag[i] =="Driver & Passenger"){
airbag[i,1] = 0
}
if (cars$airbag[i] == "Driver only"){
airbag[i,1] = 1
}
if (cars$airbag[i] == "None"){
airbag[i,1] = 2
}
#For drive
if (cars$drive[i] =="4WD"){
drive[i,1] = 0
}
if (cars$drive[i] == "Front"){
drive[i,1] = 1
}
if (cars$drive[i] == "Rear"){
drive[i,1] = 2
}
#For man.trans
if (cars$man.trans[i] == "No"){
man.trans[i,1] = 0
} else{
man.trans[i,1] = 1
}
#For origin
if (cars$origin[i] == "non-USA"){
origin[i,1] = 0
} else{
origin[i,1] = 1
}
}
Here is the new breakdown of the levels:
airbag: driver and passenger = 0, driver only = 1, none = 2
drive: 4WD = 0, front = 1, rear = 2
man.trans: no = 0, yes = 1
origin: no = 0, yes = 1
Here is the view of the new organization:
#Generate the table from the function used above
cars2 = cbind(airbag, drive, man.trans, origin, cars$price)
colnames(cars2) = c("airbag", "drive", "man.trans", "origin", "price")
head(cars2,10)
## airbag drive man.trans origin price
## 1 2 1 1 0 15.9
## 2 0 1 1 0 33.9
## 3 1 1 1 0 29.1
## 4 0 1 1 0 37.7
## 5 1 2 1 0 30.0
## 6 1 1 0 1 15.7
## 7 1 1 0 1 20.8
## 8 1 2 0 1 23.7
## 9 1 1 0 1 26.3
## 10 1 1 0 1 34.7
tail(cars2, 20)
## airbag drive man.trans origin price
## 74 2 1 1 1 11.1
## 75 0 2 1 1 17.7
## 76 2 1 1 1 18.5
## 77 0 1 0 1 24.4
## 78 1 1 1 0 28.7
## 79 1 1 1 1 11.1
## 80 2 0 1 0 8.4
## 81 2 0 1 0 10.9
## 82 1 0 1 0 19.5
## 83 2 1 1 0 8.6
## 84 1 1 1 0 9.8
## 85 1 1 1 0 18.4
## 86 1 1 1 0 18.2
## 87 1 0 1 0 22.7
## 88 2 1 1 0 9.1
## 89 2 1 1 0 19.7
## 90 2 1 1 0 20.0
## 91 2 1 1 0 23.3
## 92 1 2 1 0 22.7
## 93 0 1 1 0 26.7
str(cars2)
## 'data.frame': 93 obs. of 5 variables:
## $ airbag : num 2 0 1 0 1 1 1 1 1 1 ...
## $ drive : num 1 1 1 1 2 1 1 2 1 1 ...
## $ man.trans: num 1 1 1 1 1 0 0 0 0 0 ...
## $ origin : num 0 0 0 0 0 1 1 1 1 1 ...
## $ price : num 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
2. (Experimental) Design
The experimental design for the fractional factorial design had many procedurals steps that needed to taken.
A fractional factorial design helps reduce the calculations needed.
How will the experiment be organized and conducted to test the hypothesis?
The first step was to make sure the I appropriately construct the correct 2^k design by representing each 3-level factor in terms of 2-level factors. That is what I precisely did by : Here is the decomposition of the 3-level factors:
#Decomposition Outline
C = c(-1,1,-1,1)
D = c(-1,-1,1,1)
X1 = c("Low","Med","Med","High")
X1 = as.data.frame(cbind(C,D,X1))
knitr::kable(X1,align='c')
| -1 |
-1 |
Low |
| 1 |
-1 |
Med |
| -1 |
1 |
Med |
| 1 |
1 |
High |
E = c(-1,1,-1,1)
F = c(-1,-1,1,1)
X2 = c("Low","Med","Med","High")
X2 = as.data.frame(cbind(E,F,X2))
knitr::kable(X2, align ='c')
| -1 |
-1 |
Low |
| 1 |
-1 |
Med |
| -1 |
1 |
Med |
| 1 |
1 |
High |
This decomposition of the 3 level factors was a primary step of the project. To match later, I also changed the factors to integers for the corresponding:
Get match corresponding
for (i in 1:a){
#For A
if (test$A[i] == "low"){
A[i,1] = 0
} else{
A[i,1] = 1
}
#For B
if (test$B[i] == "low"){
B[i,1] = 0
} else{
B[i,1] = 1
}
#For X1
if (test$X1[i] =="low"){
X1[i,1] = 0
}
if (test$X1[i] == "med"){
X1[i,1] = 1
}
if (test$X1[i] == "high"){
X1[i,1] = 2
}
#For X2
if (test$X2[i] =="low"){
X2[i,1] = 0
}
if (test$X2[i] == "med"){
X2[i,1] = 1
}
if (test$X2[i] == "high"){
X2[i,1] = 2
}
}
The design is a 2^m. Technically, we should have 2^4 = 32 observational runs. However, we are going to reduce later as well to make it more efficient and resourceful.
With 64 measurements now, here is the fractional factorial design table on Excel as an image. Only a snippet was shown because it will be coded into R as well.
#FFD_t = FrF2(8, nfactors = 6, res3=T, estimable = formula("~airbag+drive+man.trans+origin+airbag:(drive+man.trans+origin)"))
f1 = FrF2(64, nfactors = 6, estimable = formula("~A+B+C+D+A:(B+C+D)"))
## creating full factorial with 64 runs ...
f1
## A B C D E F
## 1 1 1 -1 -1 1 -1
## 2 1 -1 -1 -1 1 1
## 3 -1 -1 1 1 -1 1
## 4 1 -1 1 1 -1 1
## 5 -1 1 -1 -1 1 -1
## 6 1 1 -1 -1 1 1
## 7 -1 -1 -1 1 1 1
## 8 1 -1 -1 1 -1 1
## 9 1 -1 -1 -1 -1 -1
## 10 -1 -1 -1 -1 1 1
## 11 1 -1 1 1 1 1
## 12 1 -1 1 1 1 -1
## 13 1 1 1 -1 1 1
## 14 -1 1 1 1 -1 1
## 15 1 1 1 1 -1 -1
## 16 -1 -1 -1 -1 -1 -1
## 17 1 1 -1 1 -1 -1
## 18 -1 1 1 1 -1 -1
## 19 1 -1 -1 1 1 -1
## 20 -1 -1 1 1 1 -1
## 21 1 1 1 1 -1 1
## 22 1 -1 -1 1 1 1
## 23 -1 1 1 -1 -1 -1
## 24 -1 -1 1 -1 -1 1
## 25 -1 1 1 -1 1 -1
## 26 -1 -1 -1 -1 1 -1
## 27 -1 -1 1 -1 1 1
## 28 -1 -1 -1 -1 -1 1
## 29 1 1 -1 1 -1 1
## 30 -1 -1 -1 1 1 -1
## 31 -1 -1 1 -1 -1 -1
## 32 1 1 1 -1 1 -1
## 33 -1 1 -1 1 1 1
## 34 1 1 -1 1 1 -1
## 35 1 -1 1 -1 -1 -1
## 36 1 1 1 1 1 -1
## 37 1 -1 1 -1 1 -1
## 38 1 1 -1 -1 -1 1
## 39 -1 1 1 1 1 1
## 40 -1 -1 -1 1 -1 -1
## 41 -1 1 1 -1 -1 1
## 42 -1 1 -1 1 1 -1
## 43 1 -1 -1 -1 1 -1
## 44 1 -1 -1 1 -1 -1
## 45 1 -1 1 1 -1 -1
## 46 -1 -1 1 -1 1 -1
## 47 1 -1 -1 -1 -1 1
## 48 -1 1 -1 -1 -1 -1
## 49 -1 1 1 1 1 -1
## 50 1 1 1 -1 -1 1
## 51 -1 1 -1 1 -1 -1
## 52 -1 1 1 -1 1 1
## 53 1 1 1 -1 -1 -1
## 54 1 -1 1 -1 -1 1
## 55 -1 -1 1 1 1 1
## 56 -1 -1 -1 1 -1 1
## 57 -1 1 -1 -1 1 1
## 58 -1 1 -1 -1 -1 1
## 59 -1 1 -1 1 -1 1
## 60 1 1 -1 -1 -1 -1
## 61 1 1 -1 1 1 1
## 62 1 -1 1 -1 1 1
## 63 1 1 1 1 1 1
## 64 -1 -1 1 1 -1 -1
## class=design, type= full factorial
#EXCEL
ffd = as.data.frame(FrF2(64,6, randomize = FALSE))
## creating full factorial with 64 runs ...
A = c(as.integer(ffd[,1]))
B = c(as.integer(ffd[,2]))
C = c(as.integer(ffd[,3]))
D = c(as.integer(ffd[,4]))
CD = C*D
E = c(as.integer(ffd[,5]))
F = c(as.integer(ffd[,6]))
EF = E*F
excel = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(ffd)
for (i in 1:n){
if (excel$A[i] == 1){excel$TreatA[i] = "Low"}
if (excel$A[i] == 2){excel$TreatA[i] = "High"}
if (excel$B[i] == 1){excel$TreatB[i] = "Low"}
if (excel$B[i] == 2){excel$TreatB[i] = "High"}
if (excel$CD[i] == 1){excel$TreatX1[i] = "Low"}
if (excel$CD[i] == 2){excel$TreatX1[i] = "Med"}
if (excel$CD[i] == 4){excel$TreatX1[i] = "High"}
if (excel$EF[i] == 1){excel$TreatX2[i] = "Low"}
if (excel$EF[i] == 2){excel$TreatX2[i] = "Med"}
if (excel$EF[i] == 4){excel$TreatX2[i] = "High"}
}
ffd12 = cbind(ffd,excel$TreatA, excel$TreatB, excel$TreatX1, excel$TreatX2)
knitr::kable(ffd12, align = 'c')
| -1 |
-1 |
-1 |
-1 |
-1 |
-1 |
Low |
Low |
Low |
Low |
| 1 |
-1 |
-1 |
-1 |
-1 |
-1 |
High |
Low |
Low |
Low |
| -1 |
1 |
-1 |
-1 |
-1 |
-1 |
Low |
High |
Low |
Low |
| 1 |
1 |
-1 |
-1 |
-1 |
-1 |
High |
High |
Low |
Low |
| -1 |
-1 |
1 |
-1 |
-1 |
-1 |
Low |
Low |
Med |
Low |
| 1 |
-1 |
1 |
-1 |
-1 |
-1 |
High |
Low |
Med |
Low |
| -1 |
1 |
1 |
-1 |
-1 |
-1 |
Low |
High |
Med |
Low |
| 1 |
1 |
1 |
-1 |
-1 |
-1 |
High |
High |
Med |
Low |
| -1 |
-1 |
-1 |
1 |
-1 |
-1 |
Low |
Low |
Med |
Low |
| 1 |
-1 |
-1 |
1 |
-1 |
-1 |
High |
Low |
Med |
Low |
| -1 |
1 |
-1 |
1 |
-1 |
-1 |
Low |
High |
Med |
Low |
| 1 |
1 |
-1 |
1 |
-1 |
-1 |
High |
High |
Med |
Low |
| -1 |
-1 |
1 |
1 |
-1 |
-1 |
Low |
Low |
High |
Low |
| 1 |
-1 |
1 |
1 |
-1 |
-1 |
High |
Low |
High |
Low |
| -1 |
1 |
1 |
1 |
-1 |
-1 |
Low |
High |
High |
Low |
| 1 |
1 |
1 |
1 |
-1 |
-1 |
High |
High |
High |
Low |
| -1 |
-1 |
-1 |
-1 |
1 |
-1 |
Low |
Low |
Low |
Med |
| 1 |
-1 |
-1 |
-1 |
1 |
-1 |
High |
Low |
Low |
Med |
| -1 |
1 |
-1 |
-1 |
1 |
-1 |
Low |
High |
Low |
Med |
| 1 |
1 |
-1 |
-1 |
1 |
-1 |
High |
High |
Low |
Med |
| -1 |
-1 |
1 |
-1 |
1 |
-1 |
Low |
Low |
Med |
Med |
| 1 |
-1 |
1 |
-1 |
1 |
-1 |
High |
Low |
Med |
Med |
| -1 |
1 |
1 |
-1 |
1 |
-1 |
Low |
High |
Med |
Med |
| 1 |
1 |
1 |
-1 |
1 |
-1 |
High |
High |
Med |
Med |
| -1 |
-1 |
-1 |
1 |
1 |
-1 |
Low |
Low |
Med |
Med |
| 1 |
-1 |
-1 |
1 |
1 |
-1 |
High |
Low |
Med |
Med |
| -1 |
1 |
-1 |
1 |
1 |
-1 |
Low |
High |
Med |
Med |
| 1 |
1 |
-1 |
1 |
1 |
-1 |
High |
High |
Med |
Med |
| -1 |
-1 |
1 |
1 |
1 |
-1 |
Low |
Low |
High |
Med |
| 1 |
-1 |
1 |
1 |
1 |
-1 |
High |
Low |
High |
Med |
| -1 |
1 |
1 |
1 |
1 |
-1 |
Low |
High |
High |
Med |
| 1 |
1 |
1 |
1 |
1 |
-1 |
High |
High |
High |
Med |
| -1 |
-1 |
-1 |
-1 |
-1 |
1 |
Low |
Low |
Low |
Med |
| 1 |
-1 |
-1 |
-1 |
-1 |
1 |
High |
Low |
Low |
Med |
| -1 |
1 |
-1 |
-1 |
-1 |
1 |
Low |
High |
Low |
Med |
| 1 |
1 |
-1 |
-1 |
-1 |
1 |
High |
High |
Low |
Med |
| -1 |
-1 |
1 |
-1 |
-1 |
1 |
Low |
Low |
Med |
Med |
| 1 |
-1 |
1 |
-1 |
-1 |
1 |
High |
Low |
Med |
Med |
| -1 |
1 |
1 |
-1 |
-1 |
1 |
Low |
High |
Med |
Med |
| 1 |
1 |
1 |
-1 |
-1 |
1 |
High |
High |
Med |
Med |
| -1 |
-1 |
-1 |
1 |
-1 |
1 |
Low |
Low |
Med |
Med |
| 1 |
-1 |
-1 |
1 |
-1 |
1 |
High |
Low |
Med |
Med |
| -1 |
1 |
-1 |
1 |
-1 |
1 |
Low |
High |
Med |
Med |
| 1 |
1 |
-1 |
1 |
-1 |
1 |
High |
High |
Med |
Med |
| -1 |
-1 |
1 |
1 |
-1 |
1 |
Low |
Low |
High |
Med |
| 1 |
-1 |
1 |
1 |
-1 |
1 |
High |
Low |
High |
Med |
| -1 |
1 |
1 |
1 |
-1 |
1 |
Low |
High |
High |
Med |
| 1 |
1 |
1 |
1 |
-1 |
1 |
High |
High |
High |
Med |
| -1 |
-1 |
-1 |
-1 |
1 |
1 |
Low |
Low |
Low |
High |
| 1 |
-1 |
-1 |
-1 |
1 |
1 |
High |
Low |
Low |
High |
| -1 |
1 |
-1 |
-1 |
1 |
1 |
Low |
High |
Low |
High |
| 1 |
1 |
-1 |
-1 |
1 |
1 |
High |
High |
Low |
High |
| -1 |
-1 |
1 |
-1 |
1 |
1 |
Low |
Low |
Med |
High |
| 1 |
-1 |
1 |
-1 |
1 |
1 |
High |
Low |
Med |
High |
| -1 |
1 |
1 |
-1 |
1 |
1 |
Low |
High |
Med |
High |
| 1 |
1 |
1 |
-1 |
1 |
1 |
High |
High |
Med |
High |
| -1 |
-1 |
-1 |
1 |
1 |
1 |
Low |
Low |
Med |
High |
| 1 |
-1 |
-1 |
1 |
1 |
1 |
High |
Low |
Med |
High |
| -1 |
1 |
-1 |
1 |
1 |
1 |
Low |
High |
Med |
High |
| 1 |
1 |
-1 |
1 |
1 |
1 |
High |
High |
Med |
High |
| -1 |
-1 |
1 |
1 |
1 |
1 |
Low |
Low |
High |
High |
| 1 |
-1 |
1 |
1 |
1 |
1 |
High |
Low |
High |
High |
| -1 |
1 |
1 |
1 |
1 |
1 |
Low |
High |
High |
High |
| 1 |
1 |
1 |
1 |
1 |
1 |
High |
High |
High |
High |
For the project, I also did a temporary 2^m-3 design, where m=6. Based on the calculations, this is a 1/8 fractional factorial design. The design would be utilized to compute the resolution once after matching and collecting the data. Then, using the matched and collected data, I would get the me and ie’s that exist. I will also later assess the aliasing to ensure and determine structure as well in order to cope for the design itself as it is limited and reduced.
Before actually even testing, I was able to predict that the Resolution would be Resolution III. Since it is a 1/8 fraction design, it will be Resolution III. We would also most likley be unable to get higher Resolutions due to the data itself and the parameters.
To construct a temporary 1/8 fractional factorial design, I did the same using FrF2.
#This is the fractional factorial design of 2^6-3
t = as.data.frame(FrF2(8,6, randomize = FALSE))
A = c(as.integer(t[,1]))
B = c(as.integer(t[,2]))
C = c(as.integer(t[,3]))
D = c(as.integer(t[,4]))
CD = C*D
E = c(as.integer(t[,5]))
F = c(as.integer(t[,6]))
EF = E*F
ffdp = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(t)
for (i in 1:n){
if (ffdp$A[i] == 1){ffdp$TreatA[i] = "Low"}
if (ffdp$A[i] == 2){ffdp$TreatA[i] = "High"}
if (ffdp$B[i] == 1){ffdp$TreatB[i] = "Low"}
if (ffdp$B[i] == 2){ffdp$TreatB[i] = "High"}
if (ffdp$CD[i] == 1){ffdp$TreatX1[i] = "Low"}
if (ffdp$CD[i] == 2){ffdp$TreatX1[i] = "Med"}
if (ffdp$CD[i] == 4){ffdp$TreatX1[i] = "High"}
if (ffdp$EF[i] == 1){ffdp$TreatX2[i] = "Low"}
if (ffdp$EF[i] == 2){ffdp$TreatX2[i] = "Med"}
if (ffdp$EF[i] == 4){ffdp$TreatX2[i] = "High"}
}
ffdMain = cbind(t,ffdp$TreatA,ffdp$TreatB,ffdp$TreatX1,ffdp$TreatX2)
knitr::kable(ffdMain, align = 'c')
| -1 |
-1 |
-1 |
1 |
1 |
1 |
Low |
Low |
Med |
High |
| 1 |
-1 |
-1 |
-1 |
-1 |
1 |
High |
Low |
Low |
Med |
| -1 |
1 |
-1 |
-1 |
1 |
-1 |
Low |
High |
Low |
Med |
| 1 |
1 |
-1 |
1 |
-1 |
-1 |
High |
High |
Med |
Low |
| -1 |
-1 |
1 |
1 |
-1 |
-1 |
Low |
Low |
High |
Low |
| 1 |
-1 |
1 |
-1 |
1 |
-1 |
High |
Low |
Med |
Med |
| -1 |
1 |
1 |
-1 |
-1 |
1 |
Low |
High |
Med |
Med |
| 1 |
1 |
1 |
1 |
1 |
1 |
High |
High |
High |
High |
As an overview of the design, I first read in the .csv datafile that had the categorical variables. Then, I converted them into integers. For the 2 level factors there was no decomposition, and so the integers were simply 0 and 1 (low and high). For the 3 level factors, I needed to decompose them into two 2 level factors. In other words, I map the 3 level factors to a 2 level factor. However, I needed to accomodate the middle, repeating case. To do so, I repeated the middle case. For the 3 level factors, I decomposed them as 0,1,and 2 (low, med, high). In practice, I implemented the 3 level factor as 2 level factors and the middle combination (med) was repeated as 0/1 and again as 1/0. While searching for the middle combination, I looked for 1/0 or 0/1. The low was 0/0 and the high was 1/1, which would correspond to 2. It was important to realize that I should not search on the 0 and 2 cases, but include the runs that have the equivalent physical value associated with the 1 as well. It is not mapping to 0,2 or 2,0, but more like 1/0 and 0/1 for the med, 0/0 for the low, and 1/1 for high. By having such design and getting the medium combination, I have now treated and mapped the 3 level as 2 level. In the Excel, the A and B are 2 levels, while the X1 and X2 are decomposed as C and D and E and F, respectively.
What is the rationale for this design?
The rationale for this design was limited as I adapted a pre-existing experiment design dataset. The original factors and procedures to collect the data were done by the experimentor. Therefore, I had less flexibility and control over some aspects.
However, at least for the fractional factorial design, I used it to reduce the number of measurements and would use alias to determine the confounding variables and resolution. The aliasing would help check for accuracy. Later, in the future, I would validate this by using the Response Surface Methods. For now, let’s just focus on the factorial design.
Randomize: What is the Randomization Scheme?
First of all, I assume the data collected was random because the cars were chosen randomly and at a random time frame of the year 1993, including day and time.
Also, to ensure further randomization, I randomized the order of the data:
#Randomize the data
index = sample(1:nrow(cars), 93, replace=FALSE)
rdata = cars[index,]
rdata
## airbag drive man.trans origin price
## 19 Driver only Rear Yes USA 38.0
## 36 Driver only 4WD Yes USA 19.9
## 5 Driver only Rear Yes non-USA 30.0
## 18 Driver only Rear No USA 18.8
## 26 Driver only 4WD No USA 19.0
## 3 Driver only Front Yes non-USA 29.1
## 38 Driver only Rear No USA 20.9
## 46 None Front Yes non-USA 10.0
## 42 Driver only Front Yes non-USA 12.1
## 89 None Front Yes non-USA 19.7
## 83 None Front Yes non-USA 8.6
## 84 Driver only Front Yes non-USA 9.8
## 85 Driver only Front Yes non-USA 18.4
## 92 Driver only Rear Yes non-USA 22.7
## 39 None Front Yes non-USA 8.4
## 71 Driver only Front No USA 20.7
## 16 None Front No USA 16.3
## 90 None Front Yes non-USA 20.0
## 40 Driver only Front Yes non-USA 12.5
## 29 None Front Yes USA 12.2
## 21 Driver & Passenger Front No USA 15.8
## 81 None 4WD Yes non-USA 10.9
## 6 Driver only Front No USA 15.7
## 15 None Front No USA 15.9
## 37 Driver only Front No USA 20.2
## 24 Driver only Front Yes USA 11.3
## 75 Driver & Passenger Rear Yes USA 17.7
## 35 Driver only Front Yes USA 14.0
## 61 None Rear No USA 14.9
## 27 Driver only Front No USA 15.6
## 65 Driver only Front Yes non-USA 15.7
## 8 Driver only Rear No USA 23.7
## 54 None Front Yes non-USA 11.6
## 93 Driver & Passenger Front Yes non-USA 26.7
## 74 None Front Yes USA 11.1
## 47 None Front Yes non-USA 13.9
## 86 Driver only Front Yes non-USA 18.2
## 66 None Front No non-USA 19.1
## 11 Driver & Passenger Front No USA 40.1
## 32 None Front Yes USA 10.1
## 51 Driver & Passenger Front No USA 34.3
## 10 Driver only Front No USA 34.7
## 30 Driver & Passenger Front No USA 19.3
## 78 Driver only Front Yes non-USA 28.7
## 68 None Front No USA 13.5
## 20 Driver & Passenger Front No USA 18.4
## 34 Driver only Rear Yes USA 15.9
## 44 None Front Yes non-USA 8.0
## 1 None Front Yes non-USA 15.9
## 82 Driver only 4WD Yes non-USA 19.5
## 64 Driver only Front Yes non-USA 11.8
## 13 Driver only Front Yes USA 11.4
## 55 Driver only Front Yes non-USA 16.5
## 45 None Front Yes non-USA 10.0
## 58 Driver only Rear Yes non-USA 31.9
## 56 None 4WD No non-USA 19.1
## 88 None Front Yes non-USA 9.1
## 9 Driver only Front No USA 26.3
## 52 Driver & Passenger Rear No USA 36.1
## 87 Driver only 4WD Yes non-USA 22.7
## 2 Driver & Passenger Front Yes non-USA 33.9
## 22 Driver only Front No USA 29.5
## 67 Driver only Front No non-USA 21.5
## 70 None Front No USA 19.5
## 49 Driver only Front Yes non-USA 28.0
## 59 Driver & Passenger Rear No non-USA 61.9
## 79 Driver only Front Yes USA 11.1
## 31 None Front Yes USA 7.4
## 41 Driver & Passenger Front Yes non-USA 19.8
## 48 Driver only Rear No non-USA 47.9
## 76 None Front Yes USA 18.5
## 43 Driver & Passenger Front Yes non-USA 17.5
## 53 None Front Yes non-USA 8.3
## 28 Driver only 4WD Yes USA 25.8
## 62 None Front Yes non-USA 10.3
## 33 None Front Yes USA 11.3
## 14 Driver & Passenger Rear Yes USA 15.1
## 50 Driver & Passenger Rear Yes non-USA 35.2
## 23 None Front Yes USA 9.2
## 4 Driver & Passenger Front Yes non-USA 37.7
## 80 None 4WD Yes non-USA 8.4
## 72 None 4WD Yes USA 14.4
## 25 Driver only Front Yes USA 13.3
## 73 None Front Yes USA 9.0
## 12 None Front Yes USA 13.4
## 63 Driver only Front No non-USA 26.1
## 7 Driver only Front No USA 20.8
## 69 Driver only Front No USA 16.3
## 57 Driver only Rear Yes non-USA 32.5
## 91 None Front Yes non-USA 23.3
## 17 None 4WD No USA 16.6
## 77 Driver & Passenger Front No USA 24.4
## 60 Driver only Front Yes USA 14.1
In addition, further randomization would occur below whie matching and subsetting (selection and assignment). In essence, I randomized many times. First, the data itself. Then, I randomized the order in which the experiment would run. Finally, the last randomization component was the random subset groups I would place them after Matching.
Therefore, there were 3 levels of randomization: measurement, assignment, and performing.
Replicate: Are there replicates and/or repeated measures?
Due to the simplification of the dataset, which was during 1993, I cannot verify if there were any replicates and/or repeated measures. At least for the project, I have not implemented such.
Block: Did you use blocking in the design?
There is no indication that blocking was used in the original experiment. Personally, I did not use blocking in my design as well.
The next steps left for the study was as follows:
First design the table
Assign factor levels to low and high
Match the table design of low and high to the actual data set
“Create” a new dataset that corresponds to the table design
Use this “new dataset” to continue the next procedures of me, ie, and ANOVA
Confirm the Resolution.
3. (Statistical) Analysis
(Exploratory Data Analysis) Graphics and descriptive summary
Below is the descriptive summary of the data.
summary(cars2)
## airbag drive man.trans origin
## Min. :0.000 Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.000 Median :1.000 Median :1.0000 Median :1.0000
## Mean :1.194 Mean :1.065 Mean :0.6559 Mean :0.5161
## 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :2.000 Max. :2.000 Max. :1.0000 Max. :1.0000
## price
## Min. : 7.40
## 1st Qu.:12.20
## Median :17.70
## Mean :19.51
## 3rd Qu.:23.30
## Max. :61.90
head(cars2,10)
## airbag drive man.trans origin price
## 1 2 1 1 0 15.9
## 2 0 1 1 0 33.9
## 3 1 1 1 0 29.1
## 4 0 1 1 0 37.7
## 5 1 2 1 0 30.0
## 6 1 1 0 1 15.7
## 7 1 1 0 1 20.8
## 8 1 2 0 1 23.7
## 9 1 1 0 1 26.3
## 10 1 1 0 1 34.7
Here are some graphical representations:
#Histogram
hist(cars$price, main = "Histogram of Car Prices")

Boxplots for each factors:
#Boxplots for each factors
boxplot(cars$price~cars$airbag, main ="Price vs Airbag")

boxplot(cars$price~cars$drive, main ="Price vs Drive")

boxplot(cars$price~cars$man.trans, main ="Price vs Man.Trans")

boxplot(cars$price~cars$origin, main ="Price vs Origin")

# Boxplot of overall factors with labeled axes
boxplot(cars2$price ~ cars2$airbag + cars2$drive+cars2$man.trans+cars2$origin+cars2$price, main="New Boxplot of All Variables")

Testing
In this experiment, the null hypothesis states that there is no difference between the price of the car because of the airbags, drive, man.trans, and origin.
On the contrary, the alternative hypothesis states that there is a difference between the price of the car because of the airbags, drive, man.trans, and origin.
Fractional Factorial Design
I created the basic fractional factorial design of 6 factors (decomposed) and then the 2^6-3 design.
#Fractional Factorial Design
ffd1a = FrF2(8, nfactors = 6, factor.names = c('airbagA', 'airbagB', 'driveA', 'driveB', 'man.trans', 'origin'))
ffd1a
## airbagA airbagB driveA driveB man.trans origin
## 1 -1 1 -1 -1 1 -1
## 2 -1 -1 1 1 -1 -1
## 3 -1 1 1 -1 -1 1
## 4 1 1 1 1 1 1
## 5 1 -1 -1 -1 -1 1
## 6 -1 -1 -1 1 1 1
## 7 1 1 -1 1 -1 -1
## 8 1 -1 1 -1 1 -1
## class=design, type= FrF2
ffd2a = FrF2(8, nfactors = 4, res3=T, factor.names = c("airbag", "drive", "man.trans", "origin"))
ffd2a
## airbag drive man.trans origin
## 1 -1 -1 1 1
## 2 1 1 -1 -1
## 3 -1 1 1 -1
## 4 -1 -1 -1 -1
## 5 -1 1 -1 1
## 6 1 -1 1 -1
## 7 1 -1 -1 1
## 8 1 1 1 1
## class=design, type= FrF2
I implemented the same design into R. However, first I did an unrandomized design.
Here is the Excel table I made.
Here is the R version.
t1 = as.data.frame(FrF2(8,6, res3=TRUE, randomize = FALSE))
A = c(as.integer(t1[,1]))
B = c(as.integer(t1[,2]))
C = c(as.integer(t1[,3]))
D = c(as.integer(t1[,4]))
CD = C*D
E = c(as.integer(t1[,5]))
F = c(as.integer(t1[,6]))
EF = E*F
temp = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(t1)
for (i in 1:n){
if (temp$CD[i] == 1){temp$airbag[i] = "Low"}
if (temp$CD[i] == 2){temp$airbag[i] = "Med"}
if (temp$CD[i] == 4){temp$airbag[i] = "High"}
if (temp$EF[i] == 1){temp$drive[i] = "Low"}
if (temp$EF[i] == 2){temp$drive[i] = "Med"}
if (temp$EF[i] == 4){temp$drive[i] = "High"}
if (temp$A[i] == 1){temp$man.trans[i] = "no"}
if (temp$A[i] == 2){temp$man.trans[i] = "yes"}
if (temp$B[i] == 1){temp$origin[i] = "no"}
if (temp$B[i] == 2){temp$origin[i] = "yes"}
}
project = cbind(t1,temp$airbag, temp$drive, temp$man.trans, temp$origin)
knitr::kable(project, align = 'c')
| -1 |
-1 |
-1 |
1 |
1 |
1 |
Med |
High |
no |
no |
| 1 |
-1 |
-1 |
-1 |
-1 |
1 |
Low |
Med |
yes |
no |
| -1 |
1 |
-1 |
-1 |
1 |
-1 |
Low |
Med |
no |
yes |
| 1 |
1 |
-1 |
1 |
-1 |
-1 |
Med |
Low |
yes |
yes |
| -1 |
-1 |
1 |
1 |
-1 |
-1 |
High |
Low |
no |
no |
| 1 |
-1 |
1 |
-1 |
1 |
-1 |
Med |
Med |
yes |
no |
| -1 |
1 |
1 |
-1 |
-1 |
1 |
Med |
Med |
no |
yes |
| 1 |
1 |
1 |
1 |
1 |
1 |
High |
High |
yes |
yes |
Then, using this, I randomized the order of the runs.
This is essentially the same, except I did randomize=TRUE.
#Randomized
FFD = (FrF2(8,6, res3=TRUE, randomize = TRUE))
t2 = as.data.frame(FFD)
A = c(as.integer(t2[,1]))
B = c(as.integer(t2[,2]))
C = c(as.integer(t2[,3]))
D = c(as.integer(t2[,4]))
CD = C*D
E = c(as.integer(t2[,5]))
F = c(as.integer(t2[,6]))
EF = E*F
design = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(t2)
for (i in 1:n){
if (design$CD[i] == 1){design$airbag[i] <- "Low"}
if (design$CD[i] == 2){design$airbag[i] <- "Med"}
if (design$CD[i] == 4){design$airbag[i] <- "High"}
if (design$EF[i] == 1){design$drive[i] <- "Low"}
if (design$EF[i] == 2){design$drive[i] <- "Med"}
if (design$EF[i] == 4){design$drive[i] <- "High"}
if (design$A[i] == 1){design$man.trans[i] <- "no"}
if (design$A[i] == 2){design$man.trans[i] <- "yes"}
if (design$B[i] == 1){design$origin[i] <- "no"}
if (design$B[i] == 2){design$origin[i] <- "yes"}
}
fractionalfactorialdesign = cbind(t2,design$airbag, design$drive, design$man.trans, design$origin)
summary(fractionalfactorialdesign)
## A B C D E F design$airbag design$drive
## -1:4 -1:4 -1:4 -1:4 -1:4 -1:4 High:2 High:2
## 1 :4 1 :4 1 :4 1 :4 1 :4 1 :4 Low :2 Low :2
## Med :4 Med :4
## design$man.trans design$origin
## no :4 no :4
## yes:4 yes:4
##
knitr::kable(fractionalfactorialdesign, align = 'c')
| -1 |
1 |
1 |
-1 |
-1 |
1 |
Med |
Med |
no |
yes |
| 1 |
1 |
-1 |
1 |
-1 |
-1 |
Med |
Low |
yes |
yes |
| 1 |
1 |
1 |
1 |
1 |
1 |
High |
High |
yes |
yes |
| -1 |
1 |
-1 |
-1 |
1 |
-1 |
Low |
Med |
no |
yes |
| 1 |
-1 |
-1 |
-1 |
-1 |
1 |
Low |
Med |
yes |
no |
| -1 |
-1 |
-1 |
1 |
1 |
1 |
Med |
High |
no |
no |
| 1 |
-1 |
1 |
-1 |
1 |
-1 |
Med |
Med |
yes |
no |
| -1 |
-1 |
1 |
1 |
-1 |
-1 |
High |
Low |
no |
no |
The idea behind this design is the matching component. I first generated the dataset and then the fractional factorial design using FrF2, which I used to compare. The design consisted of the decomposition process. The explanation was described before. The matching process was using those designs. Once having the two dataset (the car design and the FrF2 design), I was able to compare them. I would see which ones corresponds. For example, if I had a high, med, low, high in the cars dataset and a high, med, low, high in the FrF2 design, then they matched. The matching process was the selection method to collect the data. There was only 8 chosen because of the design of the 1/8th we chose to do.
FFD_main = FrF2(8, nfactors = 6, res3=T)
FFD_main
## A B C D E F
## 1 1 1 -1 1 -1 -1
## 2 -1 -1 -1 1 1 1
## 3 1 -1 1 -1 1 -1
## 4 -1 1 1 -1 -1 1
## 5 -1 -1 1 1 -1 -1
## 6 -1 1 -1 -1 1 -1
## 7 1 -1 -1 -1 -1 1
## 8 1 1 1 1 1 1
## class=design, type= FrF2
aliasprint(FFD_main)
## $legend
## [1] A=A B=B C=C D=D E=E F=F
##
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
##
## $fi2
## [1] AF=BE=CD
I looked into the aliasing more later after the me and ie.
Main Effects and Interaction Effects
First, I computed the me and ie without matching. So, this would be before the collecting data.
#me and ie in general and without matching
m1 = lm(price~(airbag+ drive+man.trans+origin)^2,data=cars2)
anova(m1)
## Analysis of Variance Table
##
## Response: price
## Df Sum Sq Mean Sq F value Pr(>F)
## airbag 1 2742.6 2742.56 65.7071 4.308e-12 ***
## drive 1 360.4 360.39 8.6343 0.0042829 **
## man.trans 1 390.9 390.87 9.3645 0.0029895 **
## origin 1 576.9 576.92 13.8219 0.0003663 ***
## airbag:drive 1 99.5 99.52 2.3843 0.1264075
## airbag:man.trans 1 10.5 10.50 0.2517 0.6172519
## airbag:origin 1 574.2 574.17 13.7562 0.0003774 ***
## drive:man.trans 1 7.4 7.40 0.1772 0.6748971
## drive:origin 1 307.8 307.80 7.3743 0.0080660 **
## man.trans:origin 1 91.3 91.28 2.1870 0.1430149
## Residuals 82 3422.6 41.74
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Then, I computed the me and ie after the data was collected. To do so, I made subsets that contained the matched ones in the Excel (or the one above).
The me and 2fi were calculated the same way it was always calculated.
meie = matrix(c(mea, meb,mec, med, mee,mef,ieab, ieac, iead, ieae, ieaf,iebc,iebd,iebe,iebf,iecd,iecf,iede,iedf,ieef), ncol=1)
meie
## [,1]
## [1,] -9.100
## [2,] 0.450
## [3,] 1.700
## [4,] 5.250
## [5,] 2.775
## [6,] 12.550
## [7,] -4.750
## [8,] 2.950
## [9,] 7.600
## [10,] 2.950
## [11,] -11.450
## [12,] 4.450
## [13,] -4.750
## [14,] 4.450
## [15,] 9.100
## [16,] 14.300
## [17,] 9.100
## [18,] -4.750
## [19,] -0.100
## [20,] -9.950
NOTE: IMPORTANT When I ran it the first time, this is the meie generated. The RMarkdown was generated using my second computer. Therefore, meie values may differ. Please refer to the meie table below:
The response variable was added into the design as well.
#Read price matched:
cars3 = read.csv("cars3.csv", header = TRUE, sep=",")
des= add.response(FFD, cars3$price)
MEPlot(des, abbrev = 5, cex.xax = 1.6, cex.main = 2)

IAPlot(des, abbrev = 5, show.alias = TRUE, lwd = 2, cex = 2, cex.xax = 1.2, cex.lab = 1.5, main="Interaction Plot")

#Showing it in perspective
FFD2 = FrF2(8, nfactors = 6, res3= TRUE, factor.names = c('airbagA', 'airbagB', 'driveA', 'driveB', 'man.trans', 'origin'))
FFD2
## airbagA airbagB driveA driveB man.trans origin
## 1 1 1 1 1 1 1
## 2 -1 -1 1 1 -1 -1
## 3 1 1 -1 1 -1 -1
## 4 -1 1 1 -1 -1 1
## 5 1 -1 -1 -1 -1 1
## 6 1 -1 1 -1 1 -1
## 7 -1 1 -1 -1 1 -1
## 8 -1 -1 -1 1 1 1
## class=design, type= FrF2
summary(FFD2)
## Call:
## FrF2(8, nfactors = 6, res3 = TRUE, factor.names = c("airbagA",
## "airbagB", "driveA", "driveB", "man.trans", "origin"))
##
## Experimental design of type FrF2
## 8 runs
##
## Factor settings (scale ends):
## airbagA airbagB driveA driveB man.trans origin
## 1 -1 -1 -1 -1 -1 -1
## 2 1 1 1 1 1 1
##
## Design generating information:
## $legend
## [1] A=airbagA B=airbagB C=driveA D=driveB E=man.trans F=origin
##
## $generators
## [1] D=AB E=AC F=BC
##
##
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
##
## $fi2
## [1] AF=BE=CD
##
##
## The design itself:
## airbagA airbagB driveA driveB man.trans origin
## 1 1 1 1 1 1 1
## 2 -1 -1 1 1 -1 -1
## 3 1 1 -1 1 -1 -1
## 4 -1 1 1 -1 -1 1
## 5 1 -1 -1 -1 -1 1
## 6 1 -1 1 -1 1 -1
## 7 -1 1 -1 -1 1 -1
## 8 -1 -1 -1 1 1 1
## class=design, type= FrF2
aliasprint(FFD2)
## $legend
## [1] A=airbagA B=airbagB C=driveA D=driveB E=man.trans F=origin
##
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
##
## $fi2
## [1] AF=BE=CD
According to the meie table and the aliasprint of the FFD2, I could see that the model was highly aliased. As expected the Resolution was Resolution III because it was an 1/8th 2^6-3 design. The definition of Resolution III is “No main effect is aliased with any other main effect (me). Main effects are aliased with two factor interactions (2fi)”. The definition matched to what really happened because the me were not aliased with any other me, but with only the 2fis. The fi’s should be looked upon with caution because of the confounding variable and not many factors can be looked at indepdently. The me’s seem to be the only ones that look reliable and accurate. Therefore, to create my linear model, I decided to factor in both the me’s only and the ANOVA.
Constructing the Linear Model
To create the linear model, I first check the ANOVA results for all the factors togther. Then, I came up with the factors to select based off this and the me’s.
fit = aov(price~airbag + drive + man.trans + origin + price:airbag + price:drive + price:man.trans + price:origin, data = cars2)
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## airbag 1 2742.6 2742.6 643.69 < 2e-16 ***
## drive 1 360.4 360.4 84.58 2.38e-14 ***
## man.trans 1 390.9 390.9 91.74 4.08e-15 ***
## origin 1 576.9 576.9 135.40 < 2e-16 ***
## price:airbag 1 1655.5 1655.5 388.56 < 2e-16 ***
## price:drive 1 2351.3 2351.3 551.86 < 2e-16 ***
## price:man.trans 1 85.2 85.2 20.00 2.41e-05 ***
## price:origin 1 63.3 63.3 14.87 0.000225 ***
## Residuals 84 357.9 4.3
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As you can see, there was significance for all the factors. This made sense because the price of cars, especially back then, would matter a lot depending if the car was manual transmission or not or had 4WD. Even today, 4WD drive cars tend to be more expensive than a front or rear drive car. Also, origin is another factor that makes sense because ‘Made in the USA’ is definitely more expensive than a foreign part due to many factors such as labor costs and production/manufacturing costs. Therefore, ‘Made in the USA’ parts would increase the car’s price. Airbags costs should also be included in the car. Making airbags are not free. There is a cost to it, therefore increasing the price of the car. Using good judgement and the meie table, I knew not to consider the 2fi because they were highly aliased. Then, I wanted to differentiate the “more significant ones”. To do so, the best logical way was to refer to the meie table again. Airbags had a relatively small me of -2. Drive, man.trans, and origin had a bigger me of -11.3 and -11.65. Therefore, I decided to include drive, man.trans, and origin into the linear model. Intuitively and using prior knowledge this made sense. Drive is based on the car’s engine and bottom layout design, requiring more time, hence more costs. So, the results made sense.
Therefore, I generated another ANOVA using those factors.
fit2 = aov(price~man.trans +drive + origin, data = cars2)
summary(fit2)
## Df Sum Sq Mean Sq F value Pr(>F)
## man.trans 1 915 915.1 13.382 0.000430 ***
## drive 1 970 969.9 14.184 0.000297 ***
## origin 1 613 613.1 8.966 0.003561 **
## Residuals 89 6086 68.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results from the ANOVA, my Best Guess experiment, and the me all matched together and were consistent. The ANOVA factors that were significant had a high me. This also matched with my prior background knowledge of cars in general.
The linear model is as follows:
fit.lm = lm(price~ drive+man.trans + origin, data = cars2)
summary(fit.lm)
##
## Call:
## lm.default(formula = price ~ drive + man.trans + origin, data = cars2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.283 -5.997 -2.005 4.731 28.028
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.509 2.813 7.645 2.31e-11 ***
## drive 6.182 1.640 3.770 0.000293 ***
## man.trans -8.604 2.004 -4.292 4.49e-05 ***
## origin -5.690 1.900 -2.994 0.003561 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.269 on 89 degrees of freedom
## Multiple R-squared: 0.291, Adjusted R-squared: 0.2671
## F-statistic: 12.18 on 3 and 89 DF, p-value: 9.477e-07
Estimation
confint(fit2)
## 2.5 % 97.5 %
## (Intercept) 15.918817 27.099343
## man.trans -12.586332 -4.621029
## drive 2.923599 9.439590
## origin -9.464954 -1.914121
coef(fit2)
## (Intercept) man.trans drive origin
## 21.509080 -8.603681 6.181595 -5.689537
Diagnostics / Model Adequacy Checking
To check if the normality assumption holds, I used the qqnorm and qqlines.
#Check for Normality
qqnorm(residuals(fit2))
qqline(residuals(fit2))

#Residual Plot
plot(fitted(fit2),residuals(fit2))

Based on the qqnorm and qqline, the qqnorm line looks fairly linear and closely fits the qqline drawn. Also, the residual plot showed an evenly distributed residual as a function of fitted values. Also, there was a good scatter and not close to 0. The pattern seemed random and evenly distributed, two important conditions that satisfy the normality check. Therefore, the normality assumption was satisfied because the residuals were evenly distributed and the qqnorm line is linear enough. The normality check passed. There is no need to do transformation because the normality assumption was passed.
Summary: Overall, the experiment finding was an interesting one. The null hypothesis was rejected because the price of cars depended on the factors, like drive, man.trans, and origin. The fractional factorial design was implemented and I looked into the me and ie as well. The aliasing structure was examined and the Resolution was Resolution III. Resolution III was the highest resolution possible because of the parameters and data. To get a higher Resolution, I would need a new dataset and have more runs. For Resolution IV, I need 16 runs (2^4) and for Resolution VI I need 32 runs. For now, Resolution III would be satisfying. In general, the drive, man.trans, and origin had a great impact, while the 2fi were not reliable. The results make sense intuitvely and meets the expectations I had.
For the findings, it was interesting to explore the design procedure.
The analysis made sense and the fractional factorial design was well executed.
5. Appendices
#Benjamin Byeon
#11/24/16
#Section 1
#Load the package
library("FrF2")
#Load the data
cars = read.csv("cars.csv", header = TRUE, sep=",")
cars
#System under test
summary(cars)
head(cars)
tail(cars)
names(cars)
#Display the structure
names(cars)
str(cars)
#Check the levels
levels(cars$price)
levels(cars$airbag)
levels(cars$drive)
levels(cars$man.trans)
levels(cars$origin)
#Decomposition Outline
C = c(-1,1,-1,1)
D = c(-1,-1,1,1)
X1 = c("Low","Med","Med","High")
X1 = as.data.frame(cbind(C,D,X1))
kable(X1,align='c')
E = c(-1,1,-1,1)
F = c(-1,-1,1,1)
X2 = c("Low","Med","Med","High")
X2 = as.data.frame(cbind(C,D,X2))
kable(X2, align ='c')
#Randomize the data
index = sample(1:nrow(cars), 93, replace=FALSE)
rdata = cars[index,]
rdata
#Display the descriptive statistics
summary(cars)
#Histogram
hist(cars$price, main = "Histogram of Car Prices")
#Boxplots for each factors
boxplot(cars$price~cars$airbag, main ="Price vs Airbag")
boxplot(cars$price~cars$drive, main ="Price vs Drive")
boxplot(cars$price~cars$man.trans, main ="Price vs Man.Trans")
boxplot(cars$price~cars$origin, main ="Price vs Origin")
#FFD
#FFD
#
#FFD_t = FrF2(8, nfactors = 6, res3=T, estimable = formula("~airbag+drive+man.trans+origin+airbag:(drive+man.trans+origin)"))
f1 = FrF2(64, nfactors = 6, estimable = formula("~A+B+C+D+A:(B+C+D)"))
f1
#EXCEL
ffd <- as.data.frame(FrF2(64,6, randomize = FALSE))
A <- c(as.integer(ffd[,1]))
B <- c(as.integer(ffd[,2]))
C <- c(as.integer(ffd[,3]))
D <- c(as.integer(ffd[,4]))
CD <- C*D
E <- c(as.integer(ffd[,5]))
F <- c(as.integer(ffd[,6]))
EF <- E*F
excel <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(ffd)
for (i in 1:n){
if (excel$A[i] == 1){excel$TreatA[i] <- "Low"}
if (excel$A[i] == 2){excel$TreatA[i] <- "High"}
if (excel$B[i] == 1){excel$TreatB[i] <- "Low"}
if (excel$B[i] == 2){excel$TreatB[i] <- "High"}
if (excel$CD[i] == 1){excel$TreatX1[i] <- "Low"}
if (excel$CD[i] == 2){excel$TreatX1[i] <- "Med"}
if (excel$CD[i] == 4){excel$TreatX1[i] <- "High"}
if (excel$EF[i] == 1){excel$TreatX2[i] <- "Low"}
if (excel$EF[i] == 2){excel$TreatX2[i] <- "Med"}
if (excel$EF[i] == 4){excel$TreatX2[i] <- "High"}
}
ffd12 = cbind(ffd,excel$TreatA, excel$TreatB, excel$TreatX1, excel$TreatX2)
kable(ffd12, align = 'c')
#This is the fractional factorial design of 2^6-3
t = as.data.frame(FrF2(8,6, randomize = FALSE))
A = c(as.integer(t[,1]))
B = c(as.integer(t[,2]))
C = c(as.integer(t[,3]))
D = c(as.integer(t[,4]))
CD = C*D
E = c(as.integer(t[,5]))
F = c(as.integer(t[,6]))
EF = E*F
ffdp = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(t)
for (i in 1:n){
if (ffdp$A[i] == 1){ffdp$TreatA[i] <- "Low"}
if (ffdp$A[i] == 2){ffdp$TreatA[i] <- "High"}
if (ffdp$B[i] == 1){ffdp$TreatB[i] <- "Low"}
if (ffdp$B[i] == 2){ffdp$TreatB[i] <- "High"}
if (ffdp$CD[i] == 1){ffdp$TreatX1[i] <- "Low"}
if (ffdp$CD[i] == 2){ffdp$TreatX1[i] <- "Med"}
if (ffdp$CD[i] == 4){ffdp$TreatX1[i] <- "High"}
if (ffdp$EF[i] == 1){ffdp$TreatX2[i] <- "Low"}
if (ffdp$EF[i] == 2){ffdp$TreatX2[i] <- "Med"}
if (ffdp$EF[i] == 4){ffdp$TreatX2[i] <- "High"}
}
ffdMain = cbind(t,ffdp$TreatA,ffdp$TreatB,ffdp$TreatX1,ffdp$TreatX2)
kable(Fractional_Factorial, align = 'c')
#Numerize the factors into 0 and 1, instead of categories as is
a = nrow(cars)
airbag = data.frame(a)
drive = data.frame(a)
man.trans = data.frame(a)
origin = data.frame(a)
#Automatically replace the categories as 0,1,2
for (i in 1:a){
#For airbags
if (cars$airbag[i] =="Driver & Passenger"){
airbag[i,1] = 0
}
if (cars$airbag[i] == "Driver only"){
airbag[i,1] = 1
}
if (cars$airbag[i] == "None"){
airbag[i,1] = 2
}
#For drive
if (cars$drive[i] =="4WD"){
drive[i,1] = 0
}
if (cars$drive[i] == "Front"){
drive[i,1] = 1
}
if (cars$drive[i] == "Rear"){
drive[i,1] = 2
}
#For man.trans
if (cars$man.trans[i] == "No"){
man.trans[i,1] = 0
} else{
man.trans[i,1] = 1
}
#For origin
if (cars$origin[i] == "non-USA"){
origin[i,1] = 0
} else{
origin[i,1] = 1
}
}
#Get match corresponding
for (i in 1:a){
#For A
if (test$A[i] == "low"){
A[i,1] = 0
} else{
A[i,1] = 1
}
#For B
if (test$B[i] == "low"){
B[i,1] = 0
} else{
B[i,1] = 1
}
#For X1
if (test$X1[i] =="low"){
X1[i,1] = 0
}
if (test$X1[i] == "med"){
X1[i,1] = 1
}
if (test$X1[i] == "high"){
X1[i,1] = 2
}
#For X2
if (test$X2[i] =="low"){
X2[i,1] = 0
}
if (test$X2[i] == "med"){
X2[i,1] = 1
}
if (test$X2[i] == "high"){
X2[i,1] = 2
}
}
#Generate the table from the function used above
cars2 = cbind(airbag, drive, man.trans, origin, cars$price)
colnames(cars2) = c("airbag", "drive", "man.trans", "origin", "price")
head(cars2,10)
tail(cars2, 20)
str(cars2)
# Boxplot of overall factors with labeled axes
boxplot(cars2$price ~ cars2$airbag + cars2$drive+cars2$man.trans+cars2$origin+cars2$price, main="New Boxplot of All Variables")
#Fractional Factorial Design
ffd1a = FrF2(8, nfactors = 4, res3=T, factor.names = c("airbag", "drive", "man.trans", "origin"))
ffd1a
ffd2a = FrF2(8, nfactors = 6, factor.names = c('airbagA', 'airbagB', 'driveA', 'driveB', 'man.trans', 'origin'))
ffd2a
t1 = as.data.frame(FrF2(8,6, randomize = FALSE))
A = c(as.integer(t1[,1]))
B = c(as.integer(t1[,2]))
C = c(as.integer(t1[,3]))
D = c(as.integer(t1[,4]))
CD = C*D
E = c(as.integer(t1[,5]))
F = c(as.integer(t1[,6]))
EF = E*F
temp = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(t1)
for (i in 1:n){
if (temp$CD[i] == 1){temp$airbag[i] <- "Low"}
if (temp$CD[i] == 2){temp$airbag[i] <- "Med"}
if (temp$CD[i] == 4){temp$airbag[i] <- "High"}
if (temp$EF[i] == 1){temp$drive[i] <- "Low"}
if (temp$EF[i] == 2){temp$drive[i] <- "Med"}
if (temp$EF[i] == 4){temp$drive[i] <- "High"}
if (temp$A[i] == 1){temp$man.trans[i] <- "no"}
if (temp$A[i] == 2){temp$man.trans[i] <- "yes"}
if (temp$B[i] == 1){temp$origin[i] <- "no"}
if (temp$B[i] == 2){temp$origin[i] <- "yes"}
}
project <- cbind(t1,temp$airbag, temp$drive, temp$man.trans, temp$origin)
kable(project, align = 'c')
#Randomized
FFD = (FrF2(8,6, randomize = TRUE))
t2 = as.data.frame(FFD)
A = c(as.integer(t2[,1]))
B = c(as.integer(t2[,2]))
C = c(as.integer(t2[,3]))
D = c(as.integer(t2[,4]))
CD = C*D
E = c(as.integer(t2[,5]))
F = c(as.integer(t2[,6]))
EF = E*F
design = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(t2)
for (i in 1:n){
if (design$CD[i] == 1){design$airbag[i] <- "Low"}
if (design$CD[i] == 2){design$airbag[i] <- "Med"}
if (design$CD[i] == 4){design$airbag[i] <- "High"}
if (design$EF[i] == 1){design$drive[i] <- "Low"}
if (design$EF[i] == 2){design$drive[i] <- "Med"}
if (design$EF[i] == 4){design$drive[i] <- "High"}
if (design$A[i] == 1){design$man.trans[i] <- "no"}
if (design$A[i] == 2){design$man.trans[i] <- "yes"}
if (design$B[i] == 1){design$origin[i] <- "no"}
if (design$B[i] == 2){design$origin[i] <- "yes"}
}
fractionalfactorialdesign <- cbind(t2,design$airbag, design$drive, design$man.trans, design$origin)
summary(fractionalfactorialdesign)
kable(fractionalfactorialdesign, align = 'c')
FFD_main = FrF2(8, nfactors = 6, res3=T)
aliasprint(FFD_main)
#Showing it in perspective
FFD2 = FrF2(8, nfactors = 6, factor.names = c('airbagA', 'airbagB', 'driveA', 'driveB', 'man.trans', 'origin'))
FFD2
summary(FFD2)
aliasprint(FFD2)
#me and ie in general and without matching
m1 = lm(price~(airbag+ drive+man.trans+origin)^2,data=cars2)
anova(m1)
# Subset creation for factorial design pulled randomly from the table
seta = subset(cars2, airbag == "1" & drive == "1" & man.trans == "1" & origin == "1")
setb = subset(cars2, airbag == "1" & drive == "2" & man.trans == "1" & origin == "0")
setc = subset(cars2, airbag == "1" & drive == "2" & man.trans == "0" & origin == "0")
setd = subset(cars2, airbag == "1" & drive == "1" & man.trans == "0" & origin == "1")
sete = subset(cars2,airbag == "1" & drive == "1" & man.trans == "1" & origin == "0")
setf = subset(cars2,airbag == "2" & drive == "1" & man.trans == "0" & origin == "0")
setg = subset(cars2,airbag == "2" & drive == "0" & man.trans == "1" & origin == "1")
seth = subset(cars2,airbag == "1" & drive == "0" & man.trans == "1" & origin == "0")
#Function to get a sample of row to use later for me
mec = function (cars){
a = sample(nrow(cars))
b = a[1]
return(cars$price[b])
}
# Use functions to get group samples
meana = mec(seta)
meanb = mec(setb)
meanc = mec(setc)
meand = mec(setd)
meane = mec(sete)
meanf = mec(setf)
meang = mec(setg)
meanh = mec(seth)
#me and ie
mea = 0.25* ((meane[1]+meang[1]+meanh[1]+meand[1])-(meanc[1]+meanf[1]+meanb[1]+meana[1]))
meb = 0.25* ((meang[1]+meand[1]+meanf[1]+meanb[1])-(meanc[1]+meane[1]+meana[1]+meanh[1]))
mec = 0.25* ((meanb[1]+meana[1]+meanh[1]+meand[1])-(meanc[1]+meane[1]+meanf[1]+meang[1]))
med = 0.25* ((meanc[1]+meana[1]+meang[1]+meand[1])-(meanf[1]+meanh[1]+meanb[1]+meane[1]))
mee = 0.25* ((meanc[1]+meanf[1]+meanh[1]+meand[1])-(meana[1]+meanb[1]+meanc[1]+meang[1]))
mef = 0.25* ((meanc[1]+meanb[1]+meane[1]+meand[1])-(meana[1]+meanf[1]+meang[1]+meanh[1]))
ieab = 0.25 * ((meana[1] + meanb[1] + meand[1] + meane[1] - (meanf[1] + meanc[1] + meang[1] + meanh[1])))
ieac = 0.25 * ((meana[1] + meanc[1] + meand[1] + meane[1] - (meanf[1] + meanb[1] + meang[1] + meanh[1])))
iead = 0.25 * ((meana[1] + meand[1] + meanc[1] + meanf[1] - (meane[1] + meanb[1] + meang[1] + meanh[1])))
ieae = 0.25 * ((meana[1] + meand[1] + meanc[1] + meane[1] - (meanf[1] + meanb[1] + meang[1] + meanh[1])))
ieaf = 0.25 * ((meanf[1] + meana[1] + meand[1] + meane[1] - (meanb[1] + meanc[1] + meang[1] + meanh[1])))
iebc = 0.25 * ((meanb[1] + meanc[1] + meana[1] + meane[1] - (meanf[1] + meand[1] + meang[1] + meanh[1])))
iebd = 0.25 * ((meanb[1] + meand[1] + meana[1] + meane[1] - (meanf[1] + meanc[1] + meang[1] + meanh[1])))
iebe = 0.25 * ((meanb[1] + meane[1] + meana[1] + meanc[1] - (meanf[1] + meand[1] + meang[1] + meanh[1])))
iebf = 0.25 * ((meanb[1] + meanf[1] + meana[1] + meanc[1] - (meane[1] + meand[1] + meang[1] + meanh[1])))
iecd = 0.25 * ((meanc[1] + meand[1] + meana[1] + meanb[1] - (meane[1] + meanf[1] + meang[1] + meanh[1])))
iece = 0.25 * ((meanc[1] + meane[1] + meana[1] + meanb[1] - (meand[1] + meanf[1] + meang[1] + meanh[1])))
iecf = 0.25 * ((meanc[1] + meanf[1] + meana[1] + meanb[1] - (meand[1] + meane[1] + meang[1] + meanh[1])))
iede = 0.25 * ((meand[1] + meane[1] + meana[1] + meanb[1] - (meanc[1] + meanf[1] + meang[1] + meanh[1])))
iedf = 0.25 * ((meand[1] + meanf[1] + meana[1] + meanb[1] - (meanc[1] + meane[1] + meang[1] + meanh[1])))
ieef = 0.25 * ((meane[1] + meanf[1] + meana[1] + meanb[1] - (meanc[1] + meand[1] + meang[1] + meanh[1])))
meie = matrix(c(me1, me2,me3, me4, me5,me6,ieab, ieac, iead, ieae, ieaf,iebc,iebd,iebe,iebf,iecd,iecf,iede,iedf,ieef), ncol=1)
meie
#Read price matched:
cars3 = read.csv("cars3.csv", header = TRUE, sep=",")
des= add.response(FFD, cars3$price)
MEPlot(des, abbrev = 5, cex.xax = 1.6, cex.main = 2)
IAPlot(des, abbrev = 5, show.alias = TRUE, lwd = 2, cex = 2, cex.xax = 1.2, cex.lab = 1.5, main="Interaction Plot")
fit = aov(price~airbag + drive + man.trans + origin + price:airbag + price:drive + price:man.trans + price:origin, data = cars2)
summary(fit)
fit2 <- aov(price~drive + man.trans + origin, data = cars2)# ANOVA for final model
summary(fit2)
confint(fit2)
coef(fit2)
#Check for Normality
qqnorm(residuals(fit2))
qqline(residuals(fit2))
#Residual Plot
plot(fitted(fit2),residuals(fit2))