DOE Project3

Benjamin Byeon

Rensselaer Polytechnic Institute - Troy, NY

V1.12.10.16

1. Setting

From the Ecdat Package, I selected a dataset that contained N=93 observations. The data looks into different factors that affect the pricing of cars. The study contains one response variable and two 2 level factors and two 3 level factors, which would be decomposed into 2 two level factors. Further analysis were made regarding the study.

The data was copied and pasted into Excel, and then saved as a .csv. Shown below is the loaded data into R.

#Load the package
library("FrF2")

## Warning: package 'FrF2' was built under R version 3.3.2

## Loading required package: DoE.base

## Warning: package 'DoE.base' was built under R version 3.3.2

## Loading required package: grid

## Loading required package: conf.design

## Warning: package 'conf.design' was built under R version 3.3.2

## 
## Attaching package: 'DoE.base'

## The following objects are masked from 'package:stats':
## 
##     aov, lm

## The following object is masked from 'package:graphics':
## 
##     plot.design

## The following object is masked from 'package:base':
## 
##     lengths

#Load the data
cars = read.csv("cars.csv", header = TRUE, sep=",")
cars

##                airbag drive man.trans  origin price
## 1                None Front       Yes non-USA  15.9
## 2  Driver & Passenger Front       Yes non-USA  33.9
## 3         Driver only Front       Yes non-USA  29.1
## 4  Driver & Passenger Front       Yes non-USA  37.7
## 5         Driver only  Rear       Yes non-USA  30.0
## 6         Driver only Front        No     USA  15.7
## 7         Driver only Front        No     USA  20.8
## 8         Driver only  Rear        No     USA  23.7
## 9         Driver only Front        No     USA  26.3
## 10        Driver only Front        No     USA  34.7
## 11 Driver & Passenger Front        No     USA  40.1
## 12               None Front       Yes     USA  13.4
## 13        Driver only Front       Yes     USA  11.4
## 14 Driver & Passenger  Rear       Yes     USA  15.1
## 15               None Front        No     USA  15.9
## 16               None Front        No     USA  16.3
## 17               None   4WD        No     USA  16.6
## 18        Driver only  Rear        No     USA  18.8
## 19        Driver only  Rear       Yes     USA  38.0
## 20 Driver & Passenger Front        No     USA  18.4
## 21 Driver & Passenger Front        No     USA  15.8
## 22        Driver only Front        No     USA  29.5
## 23               None Front       Yes     USA   9.2
## 24        Driver only Front       Yes     USA  11.3
## 25        Driver only Front       Yes     USA  13.3
## 26        Driver only   4WD        No     USA  19.0
## 27        Driver only Front        No     USA  15.6
## 28        Driver only   4WD       Yes     USA  25.8
## 29               None Front       Yes     USA  12.2
## 30 Driver & Passenger Front        No     USA  19.3
## 31               None Front       Yes     USA   7.4
## 32               None Front       Yes     USA  10.1
## 33               None Front       Yes     USA  11.3
## 34        Driver only  Rear       Yes     USA  15.9
## 35        Driver only Front       Yes     USA  14.0
## 36        Driver only   4WD       Yes     USA  19.9
## 37        Driver only Front        No     USA  20.2
## 38        Driver only  Rear        No     USA  20.9
## 39               None Front       Yes non-USA   8.4
## 40        Driver only Front       Yes non-USA  12.5
## 41 Driver & Passenger Front       Yes non-USA  19.8
## 42        Driver only Front       Yes non-USA  12.1
## 43 Driver & Passenger Front       Yes non-USA  17.5
## 44               None Front       Yes non-USA   8.0
## 45               None Front       Yes non-USA  10.0
## 46               None Front       Yes non-USA  10.0
## 47               None Front       Yes non-USA  13.9
## 48        Driver only  Rear        No non-USA  47.9
## 49        Driver only Front       Yes non-USA  28.0
## 50 Driver & Passenger  Rear       Yes non-USA  35.2
## 51 Driver & Passenger Front        No     USA  34.3
## 52 Driver & Passenger  Rear        No     USA  36.1
## 53               None Front       Yes non-USA   8.3
## 54               None Front       Yes non-USA  11.6
## 55        Driver only Front       Yes non-USA  16.5
## 56               None   4WD        No non-USA  19.1
## 57        Driver only  Rear       Yes non-USA  32.5
## 58        Driver only  Rear       Yes non-USA  31.9
## 59 Driver & Passenger  Rear        No non-USA  61.9
## 60        Driver only Front       Yes     USA  14.1
## 61               None  Rear        No     USA  14.9
## 62               None Front       Yes non-USA  10.3
## 63        Driver only Front        No non-USA  26.1
## 64        Driver only Front       Yes non-USA  11.8
## 65        Driver only Front       Yes non-USA  15.7
## 66               None Front        No non-USA  19.1
## 67        Driver only Front        No non-USA  21.5
## 68               None Front        No     USA  13.5
## 69        Driver only Front        No     USA  16.3
## 70               None Front        No     USA  19.5
## 71        Driver only Front        No     USA  20.7
## 72               None   4WD       Yes     USA  14.4
## 73               None Front       Yes     USA   9.0
## 74               None Front       Yes     USA  11.1
## 75 Driver & Passenger  Rear       Yes     USA  17.7
## 76               None Front       Yes     USA  18.5
## 77 Driver & Passenger Front        No     USA  24.4
## 78        Driver only Front       Yes non-USA  28.7
## 79        Driver only Front       Yes     USA  11.1
## 80               None   4WD       Yes non-USA   8.4
## 81               None   4WD       Yes non-USA  10.9
## 82        Driver only   4WD       Yes non-USA  19.5
## 83               None Front       Yes non-USA   8.6
## 84        Driver only Front       Yes non-USA   9.8
## 85        Driver only Front       Yes non-USA  18.4
## 86        Driver only Front       Yes non-USA  18.2
## 87        Driver only   4WD       Yes non-USA  22.7
## 88               None Front       Yes non-USA   9.1
## 89               None Front       Yes non-USA  19.7
## 90               None Front       Yes non-USA  20.0
## 91               None Front       Yes non-USA  23.3
## 92        Driver only  Rear       Yes non-USA  22.7
## 93 Driver & Passenger Front       Yes non-USA  26.7

System under test

The dataset helps determine the what effects a price of a car in 1993, with factors the presence of airbags, driving train type, avaiability of manual transmission, and the origin of the machine parts. The factors were chosen in order to match the Fractional Factorial Design criteria. Within the choices, Best Guess was used to select specific factors due to my personal interests and experience with knowledge of cars.

Below are the definitions of the variables:

price: minimum price of the cars

airbag: air bags: none, driver only, or driver and passenger

drive: rear wheel, front wheel, or 4WD

man.trans: manual transmission: yes or no

origin: product origin made in the USA: yes or no

#Display the descriptive summaery 
summary(cars)

##                 airbag     drive    man.trans     origin       price      
##  Driver & Passenger:16   4WD  :10   No :32    non-USA:45   Min.   : 7.40  
##  Driver only       :43   Front:67   Yes:61    USA    :48   1st Qu.:12.20  
##  None              :34   Rear :16                          Median :17.70  
##                                                            Mean   :19.51  
##                                                            3rd Qu.:23.30  
##                                                            Max.   :61.90

head(cars)

##               airbag drive man.trans  origin price
## 1               None Front       Yes non-USA  15.9
## 2 Driver & Passenger Front       Yes non-USA  33.9
## 3        Driver only Front       Yes non-USA  29.1
## 4 Driver & Passenger Front       Yes non-USA  37.7
## 5        Driver only  Rear       Yes non-USA  30.0
## 6        Driver only Front        No     USA  15.7

tail(cars)

##                airbag drive man.trans  origin price
## 88               None Front       Yes non-USA   9.1
## 89               None Front       Yes non-USA  19.7
## 90               None Front       Yes non-USA  20.0
## 91               None Front       Yes non-USA  23.3
## 92        Driver only  Rear       Yes non-USA  22.7
## 93 Driver & Passenger Front       Yes non-USA  26.7

names(cars)

## [1] "airbag"    "drive"     "man.trans" "origin"    "price"

#Display the structure
names(cars)

## [1] "airbag"    "drive"     "man.trans" "origin"    "price"

str(cars)

## 'data.frame':    93 obs. of  5 variables:
##  $ airbag   : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
##  $ drive    : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
##  $ man.trans: Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
##  $ origin   : Factor w/ 2 levels "non-USA","USA": 1 1 1 1 1 2 2 2 2 2 ...
##  $ price    : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...

Factors and Levels

The data contains 5 factors, two 2 level factors and two 3 level factors. The as.factors was not necessary because they were categorical.

#Check the levels
levels(cars$price)

## NULL

levels(cars$airbag)

## [1] "Driver & Passenger" "Driver only"        "None"

levels(cars$drive)

## [1] "4WD"   "Front" "Rear"

levels(cars$man.trans)

## [1] "No"  "Yes"

levels(cars$origin)

## [1] "non-USA" "USA"

The two 2 level factors:

man.trans: Does the car have manual transmission yes or no?

origin: Are product parts of the car made from origin in the USA yes or no?

The two 3 level factors:

airbag: Do the cars have air bags none, driver only, or driver and passenger?

drive: Is the car rear wheel, front wheel, or 4WD?

Continuous Variables (if any)

The data had one continuous variable that was also the dependent variable. It was the price. The price was in USD. Note, the may not reflect the USD worth today because the data was collected in 1993.

Response Variable(s)

The main objective of the experiment was to determine the effects of car’s price Therefore, the price variable also represents the response variable as well.

The Data: How is it organized and what does it look like?

The data consists of N=93 obversations with 6 variables. The data was collected in the year 1993 and the country the data came from was the USA.

In order to satisfy the factorial design with levels of high and low, I needed to make changes to the data. I replaced the factors with binary integer terms of 0,1,and 2.

#Numerize the factors into 0 and 1, instead of categories as is
a = nrow(cars)
airbag = data.frame(a)
drive = data.frame(a)
man.trans = data.frame(a)
origin = data.frame(a)

#Automatically replace the categories as 0,1,2 
for (i in 1:a){
  #For airbags
  if (cars$airbag[i] =="Driver & Passenger"){
    airbag[i,1] = 0
  }
  if (cars$airbag[i] == "Driver only"){
    airbag[i,1] = 1
  }
  if (cars$airbag[i] == "None"){
    airbag[i,1] = 2
  }
  
  #For drive
  if (cars$drive[i] =="4WD"){
    drive[i,1] = 0
  }
  if (cars$drive[i] == "Front"){
    drive[i,1] = 1
  }
  if (cars$drive[i] == "Rear"){
    drive[i,1] = 2
  }
  

  #For man.trans
  if (cars$man.trans[i] == "No"){
    man.trans[i,1] = 0
  } else{
    man.trans[i,1] = 1
  }
  
  #For origin
  if (cars$origin[i] == "non-USA"){
    origin[i,1] = 0
  } else{
    origin[i,1] = 1
  }
}

Here is the new breakdown of the levels:

airbag: driver and passenger = 0, driver only = 1, none = 2

drive: 4WD = 0, front = 1, rear = 2

man.trans: no = 0, yes = 1

origin: no = 0, yes = 1

Here is the view of the new organization:

#Generate the table from the function used above
cars2 = cbind(airbag, drive, man.trans, origin, cars$price)
colnames(cars2) = c("airbag", "drive", "man.trans", "origin", "price")

head(cars2,10)

##    airbag drive man.trans origin price
## 1       2     1         1      0  15.9
## 2       0     1         1      0  33.9
## 3       1     1         1      0  29.1
## 4       0     1         1      0  37.7
## 5       1     2         1      0  30.0
## 6       1     1         0      1  15.7
## 7       1     1         0      1  20.8
## 8       1     2         0      1  23.7
## 9       1     1         0      1  26.3
## 10      1     1         0      1  34.7

tail(cars2, 20)

##    airbag drive man.trans origin price
## 74      2     1         1      1  11.1
## 75      0     2         1      1  17.7
## 76      2     1         1      1  18.5
## 77      0     1         0      1  24.4
## 78      1     1         1      0  28.7
## 79      1     1         1      1  11.1
## 80      2     0         1      0   8.4
## 81      2     0         1      0  10.9
## 82      1     0         1      0  19.5
## 83      2     1         1      0   8.6
## 84      1     1         1      0   9.8
## 85      1     1         1      0  18.4
## 86      1     1         1      0  18.2
## 87      1     0         1      0  22.7
## 88      2     1         1      0   9.1
## 89      2     1         1      0  19.7
## 90      2     1         1      0  20.0
## 91      2     1         1      0  23.3
## 92      1     2         1      0  22.7
## 93      0     1         1      0  26.7

str(cars2)

## 'data.frame':    93 obs. of  5 variables:
##  $ airbag   : num  2 0 1 0 1 1 1 1 1 1 ...
##  $ drive    : num  1 1 1 1 2 1 1 2 1 1 ...
##  $ man.trans: num  1 1 1 1 1 0 0 0 0 0 ...
##  $ origin   : num  0 0 0 0 0 1 1 1 1 1 ...
##  $ price    : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...

2. (Experimental) Design

The experimental design for the fractional factorial design had many procedurals steps that needed to taken.

A fractional factorial design helps reduce the calculations needed.

How will the experiment be organized and conducted to test the hypothesis?

The first step was to make sure the I appropriately construct the correct 2^k design by representing each 3-level factor in terms of 2-level factors. That is what I precisely did by : Here is the decomposition of the 3-level factors:

#Decomposition Outline
C = c(-1,1,-1,1)
D = c(-1,-1,1,1)
X1 = c("Low","Med","Med","High")
X1 = as.data.frame(cbind(C,D,X1))
knitr::kable(X1,align='c')

C	D	X1
-1	-1	Low
1	-1	Med
-1	1	Med
1	1	High

E = c(-1,1,-1,1)
F = c(-1,-1,1,1)
X2 = c("Low","Med","Med","High")
X2 = as.data.frame(cbind(E,F,X2))
knitr::kable(X2, align ='c')

E	F	X2
-1	-1	Low
1	-1	Med
-1	1	Med
1	1	High

This decomposition of the 3 level factors was a primary step of the project. To match later, I also changed the factors to integers for the corresponding:

Get match corresponding

for (i in 1:a){
  #For A
  if (test$A[i] == "low"){
    A[i,1] = 0
  } else{
    A[i,1] = 1
  }
  
  #For B
  if (test$B[i] == "low"){
    B[i,1] = 0
  } else{
    B[i,1] = 1
  }
  
  #For X1
  if (test$X1[i] =="low"){
    X1[i,1] = 0
  }
  if (test$X1[i] == "med"){
    X1[i,1] = 1
  }
  if (test$X1[i] == "high"){
    X1[i,1] = 2
  }
  
  #For X2
  if (test$X2[i] =="low"){
    X2[i,1] = 0
  }
  if (test$X2[i] == "med"){
    X2[i,1] = 1
  }
  if (test$X2[i] == "high"){
    X2[i,1] = 2
  }
  
}

The design is a 2^m. Technically, we should have 2^4 = 32 observational runs. However, we are going to reduce later as well to make it more efficient and resourceful.

With 64 measurements now, here is the fractional factorial design table on Excel as an image. Only a snippet was shown because it will be coded into R as well.

Excel

#FFD_t = FrF2(8, nfactors = 6, res3=T, estimable = formula("~airbag+drive+man.trans+origin+airbag:(drive+man.trans+origin)"))
f1 = FrF2(64, nfactors = 6, estimable = formula("~A+B+C+D+A:(B+C+D)"))

## creating full factorial with 64 runs ...

f1

##     A  B  C  D  E  F
## 1   1  1 -1 -1  1 -1
## 2   1 -1 -1 -1  1  1
## 3  -1 -1  1  1 -1  1
## 4   1 -1  1  1 -1  1
## 5  -1  1 -1 -1  1 -1
## 6   1  1 -1 -1  1  1
## 7  -1 -1 -1  1  1  1
## 8   1 -1 -1  1 -1  1
## 9   1 -1 -1 -1 -1 -1
## 10 -1 -1 -1 -1  1  1
## 11  1 -1  1  1  1  1
## 12  1 -1  1  1  1 -1
## 13  1  1  1 -1  1  1
## 14 -1  1  1  1 -1  1
## 15  1  1  1  1 -1 -1
## 16 -1 -1 -1 -1 -1 -1
## 17  1  1 -1  1 -1 -1
## 18 -1  1  1  1 -1 -1
## 19  1 -1 -1  1  1 -1
## 20 -1 -1  1  1  1 -1
## 21  1  1  1  1 -1  1
## 22  1 -1 -1  1  1  1
## 23 -1  1  1 -1 -1 -1
## 24 -1 -1  1 -1 -1  1
## 25 -1  1  1 -1  1 -1
## 26 -1 -1 -1 -1  1 -1
## 27 -1 -1  1 -1  1  1
## 28 -1 -1 -1 -1 -1  1
## 29  1  1 -1  1 -1  1
## 30 -1 -1 -1  1  1 -1
## 31 -1 -1  1 -1 -1 -1
## 32  1  1  1 -1  1 -1
## 33 -1  1 -1  1  1  1
## 34  1  1 -1  1  1 -1
## 35  1 -1  1 -1 -1 -1
## 36  1  1  1  1  1 -1
## 37  1 -1  1 -1  1 -1
## 38  1  1 -1 -1 -1  1
## 39 -1  1  1  1  1  1
## 40 -1 -1 -1  1 -1 -1
## 41 -1  1  1 -1 -1  1
## 42 -1  1 -1  1  1 -1
## 43  1 -1 -1 -1  1 -1
## 44  1 -1 -1  1 -1 -1
## 45  1 -1  1  1 -1 -1
## 46 -1 -1  1 -1  1 -1
## 47  1 -1 -1 -1 -1  1
## 48 -1  1 -1 -1 -1 -1
## 49 -1  1  1  1  1 -1
## 50  1  1  1 -1 -1  1
## 51 -1  1 -1  1 -1 -1
## 52 -1  1  1 -1  1  1
## 53  1  1  1 -1 -1 -1
## 54  1 -1  1 -1 -1  1
## 55 -1 -1  1  1  1  1
## 56 -1 -1 -1  1 -1  1
## 57 -1  1 -1 -1  1  1
## 58 -1  1 -1 -1 -1  1
## 59 -1  1 -1  1 -1  1
## 60  1  1 -1 -1 -1 -1
## 61  1  1 -1  1  1  1
## 62  1 -1  1 -1  1  1
## 63  1  1  1  1  1  1
## 64 -1 -1  1  1 -1 -1
## class=design, type= full factorial

#EXCEL
ffd = as.data.frame(FrF2(64,6, randomize = FALSE))

## creating full factorial with 64 runs ...

A = c(as.integer(ffd[,1]))
B = c(as.integer(ffd[,2]))
C = c(as.integer(ffd[,3]))
D = c(as.integer(ffd[,4]))
CD = C*D
E = c(as.integer(ffd[,5]))
F = c(as.integer(ffd[,6]))
EF = E*F
excel = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(ffd)
for (i in 1:n){
  if (excel$A[i] == 1){excel$TreatA[i] = "Low"}
  if (excel$A[i] == 2){excel$TreatA[i] = "High"}
  
  if (excel$B[i] == 1){excel$TreatB[i] = "Low"}
  if (excel$B[i] == 2){excel$TreatB[i] = "High"}
  
  if (excel$CD[i] == 1){excel$TreatX1[i] = "Low"}
  if (excel$CD[i] == 2){excel$TreatX1[i] = "Med"}
  if (excel$CD[i] == 4){excel$TreatX1[i] = "High"}
  
  if (excel$EF[i] == 1){excel$TreatX2[i] = "Low"}
  if (excel$EF[i] == 2){excel$TreatX2[i] = "Med"}
  if (excel$EF[i] == 4){excel$TreatX2[i] = "High"}
}
ffd12 = cbind(ffd,excel$TreatA, excel$TreatB, excel$TreatX1, excel$TreatX2)
knitr::kable(ffd12, align = 'c')

A	B	C	D	E	F	excel$TreatA	excel$TreatB	excel$TreatX1	excel$TreatX2
-1	-1	-1	-1	-1	-1	Low	Low	Low	Low
1	-1	-1	-1	-1	-1	High	Low	Low	Low
-1	1	-1	-1	-1	-1	Low	High	Low	Low
1	1	-1	-1	-1	-1	High	High	Low	Low
-1	-1	1	-1	-1	-1	Low	Low	Med	Low
1	-1	1	-1	-1	-1	High	Low	Med	Low
-1	1	1	-1	-1	-1	Low	High	Med	Low
1	1	1	-1	-1	-1	High	High	Med	Low
-1	-1	-1	1	-1	-1	Low	Low	Med	Low
1	-1	-1	1	-1	-1	High	Low	Med	Low
-1	1	-1	1	-1	-1	Low	High	Med	Low
1	1	-1	1	-1	-1	High	High	Med	Low
-1	-1	1	1	-1	-1	Low	Low	High	Low
1	-1	1	1	-1	-1	High	Low	High	Low
-1	1	1	1	-1	-1	Low	High	High	Low
1	1	1	1	-1	-1	High	High	High	Low
-1	-1	-1	-1	1	-1	Low	Low	Low	Med
1	-1	-1	-1	1	-1	High	Low	Low	Med
-1	1	-1	-1	1	-1	Low	High	Low	Med
1	1	-1	-1	1	-1	High	High	Low	Med
-1	-1	1	-1	1	-1	Low	Low	Med	Med
1	-1	1	-1	1	-1	High	Low	Med	Med
-1	1	1	-1	1	-1	Low	High	Med	Med
1	1	1	-1	1	-1	High	High	Med	Med
-1	-1	-1	1	1	-1	Low	Low	Med	Med
1	-1	-1	1	1	-1	High	Low	Med	Med
-1	1	-1	1	1	-1	Low	High	Med	Med
1	1	-1	1	1	-1	High	High	Med	Med
-1	-1	1	1	1	-1	Low	Low	High	Med
1	-1	1	1	1	-1	High	Low	High	Med
-1	1	1	1	1	-1	Low	High	High	Med
1	1	1	1	1	-1	High	High	High	Med
-1	-1	-1	-1	-1	1	Low	Low	Low	Med
1	-1	-1	-1	-1	1	High	Low	Low	Med
-1	1	-1	-1	-1	1	Low	High	Low	Med
1	1	-1	-1	-1	1	High	High	Low	Med
-1	-1	1	-1	-1	1	Low	Low	Med	Med
1	-1	1	-1	-1	1	High	Low	Med	Med
-1	1	1	-1	-1	1	Low	High	Med	Med
1	1	1	-1	-1	1	High	High	Med	Med
-1	-1	-1	1	-1	1	Low	Low	Med	Med
1	-1	-1	1	-1	1	High	Low	Med	Med
-1	1	-1	1	-1	1	Low	High	Med	Med
1	1	-1	1	-1	1	High	High	Med	Med
-1	-1	1	1	-1	1	Low	Low	High	Med
1	-1	1	1	-1	1	High	Low	High	Med
-1	1	1	1	-1	1	Low	High	High	Med
1	1	1	1	-1	1	High	High	High	Med
-1	-1	-1	-1	1	1	Low	Low	Low	High
1	-1	-1	-1	1	1	High	Low	Low	High
-1	1	-1	-1	1	1	Low	High	Low	High
1	1	-1	-1	1	1	High	High	Low	High
-1	-1	1	-1	1	1	Low	Low	Med	High
1	-1	1	-1	1	1	High	Low	Med	High
-1	1	1	-1	1	1	Low	High	Med	High
1	1	1	-1	1	1	High	High	Med	High
-1	-1	-1	1	1	1	Low	Low	Med	High
1	-1	-1	1	1	1	High	Low	Med	High
-1	1	-1	1	1	1	Low	High	Med	High
1	1	-1	1	1	1	High	High	Med	High
-1	-1	1	1	1	1	Low	Low	High	High
1	-1	1	1	1	1	High	Low	High	High
-1	1	1	1	1	1	Low	High	High	High
1	1	1	1	1	1	High	High	High	High

For the project, I also did a temporary 2^m-3 design, where m=6. Based on the calculations, this is a 1/8 fractional factorial design. The design would be utilized to compute the resolution once after matching and collecting the data. Then, using the matched and collected data, I would get the me and ie’s that exist. I will also later assess the aliasing to ensure and determine structure as well in order to cope for the design itself as it is limited and reduced.

Before actually even testing, I was able to predict that the Resolution would be Resolution III. Since it is a 1/8 fraction design, it will be Resolution III. We would also most likley be unable to get higher Resolutions due to the data itself and the parameters.

To construct a temporary 1/8 fractional factorial design, I did the same using FrF2.

#This is the fractional factorial design of 2^6-3
t = as.data.frame(FrF2(8,6, randomize = FALSE))
A = c(as.integer(t[,1]))
B = c(as.integer(t[,2]))
C = c(as.integer(t[,3]))
D = c(as.integer(t[,4]))
CD = C*D
E = c(as.integer(t[,5]))
F = c(as.integer(t[,6]))
EF = E*F
ffdp = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(t)
for (i in 1:n){
  if (ffdp$A[i] == 1){ffdp$TreatA[i] = "Low"}
  if (ffdp$A[i] == 2){ffdp$TreatA[i] = "High"}
  
  if (ffdp$B[i] == 1){ffdp$TreatB[i] = "Low"}
  if (ffdp$B[i] == 2){ffdp$TreatB[i] = "High"}
  
  if (ffdp$CD[i] == 1){ffdp$TreatX1[i] = "Low"}
  if (ffdp$CD[i] == 2){ffdp$TreatX1[i] = "Med"}
  if (ffdp$CD[i] == 4){ffdp$TreatX1[i] = "High"}
  
  if (ffdp$EF[i] == 1){ffdp$TreatX2[i] = "Low"}
  if (ffdp$EF[i] == 2){ffdp$TreatX2[i] = "Med"}
  if (ffdp$EF[i] == 4){ffdp$TreatX2[i] = "High"}
}
ffdMain = cbind(t,ffdp$TreatA,ffdp$TreatB,ffdp$TreatX1,ffdp$TreatX2)
knitr::kable(ffdMain, align = 'c')

A	B	C	D	E	F	ffdp$TreatA	ffdp$TreatB	ffdp$TreatX1	ffdp$TreatX2
-1	-1	-1	1	1	1	Low	Low	Med	High
1	-1	-1	-1	-1	1	High	Low	Low	Med
-1	1	-1	-1	1	-1	Low	High	Low	Med
1	1	-1	1	-1	-1	High	High	Med	Low
-1	-1	1	1	-1	-1	Low	Low	High	Low
1	-1	1	-1	1	-1	High	Low	Med	Med
-1	1	1	-1	-1	1	Low	High	Med	Med
1	1	1	1	1	1	High	High	High	High

As an overview of the design, I first read in the .csv datafile that had the categorical variables. Then, I converted them into integers. For the 2 level factors there was no decomposition, and so the integers were simply 0 and 1 (low and high). For the 3 level factors, I needed to decompose them into two 2 level factors. In other words, I map the 3 level factors to a 2 level factor. However, I needed to accomodate the middle, repeating case. To do so, I repeated the middle case. For the 3 level factors, I decomposed them as 0,1,and 2 (low, med, high). In practice, I implemented the 3 level factor as 2 level factors and the middle combination (med) was repeated as 0/1 and again as 1/0. While searching for the middle combination, I looked for 1/0 or 0/1. The low was 0/0 and the high was 1/1, which would correspond to 2. It was important to realize that I should not search on the 0 and 2 cases, but include the runs that have the equivalent physical value associated with the 1 as well. It is not mapping to 0,2 or 2,0, but more like 1/0 and 0/1 for the med, 0/0 for the low, and 1/1 for high. By having such design and getting the medium combination, I have now treated and mapped the 3 level as 2 level. In the Excel, the A and B are 2 levels, while the X1 and X2 are decomposed as C and D and E and F, respectively.

What is the rationale for this design?

The rationale for this design was limited as I adapted a pre-existing experiment design dataset. The original factors and procedures to collect the data were done by the experimentor. Therefore, I had less flexibility and control over some aspects.

However, at least for the fractional factorial design, I used it to reduce the number of measurements and would use alias to determine the confounding variables and resolution. The aliasing would help check for accuracy. Later, in the future, I would validate this by using the Response Surface Methods. For now, let’s just focus on the factorial design.

Randomize: What is the Randomization Scheme?

First of all, I assume the data collected was random because the cars were chosen randomly and at a random time frame of the year 1993, including day and time.

Also, to ensure further randomization, I randomized the order of the data:

#Randomize the data
index = sample(1:nrow(cars), 93, replace=FALSE)
rdata = cars[index,]
rdata

##                airbag drive man.trans  origin price
## 19        Driver only  Rear       Yes     USA  38.0
## 36        Driver only   4WD       Yes     USA  19.9
## 5         Driver only  Rear       Yes non-USA  30.0
## 18        Driver only  Rear        No     USA  18.8
## 26        Driver only   4WD        No     USA  19.0
## 3         Driver only Front       Yes non-USA  29.1
## 38        Driver only  Rear        No     USA  20.9
## 46               None Front       Yes non-USA  10.0
## 42        Driver only Front       Yes non-USA  12.1
## 89               None Front       Yes non-USA  19.7
## 83               None Front       Yes non-USA   8.6
## 84        Driver only Front       Yes non-USA   9.8
## 85        Driver only Front       Yes non-USA  18.4
## 92        Driver only  Rear       Yes non-USA  22.7
## 39               None Front       Yes non-USA   8.4
## 71        Driver only Front        No     USA  20.7
## 16               None Front        No     USA  16.3
## 90               None Front       Yes non-USA  20.0
## 40        Driver only Front       Yes non-USA  12.5
## 29               None Front       Yes     USA  12.2
## 21 Driver & Passenger Front        No     USA  15.8
## 81               None   4WD       Yes non-USA  10.9
## 6         Driver only Front        No     USA  15.7
## 15               None Front        No     USA  15.9
## 37        Driver only Front        No     USA  20.2
## 24        Driver only Front       Yes     USA  11.3
## 75 Driver & Passenger  Rear       Yes     USA  17.7
## 35        Driver only Front       Yes     USA  14.0
## 61               None  Rear        No     USA  14.9
## 27        Driver only Front        No     USA  15.6
## 65        Driver only Front       Yes non-USA  15.7
## 8         Driver only  Rear        No     USA  23.7
## 54               None Front       Yes non-USA  11.6
## 93 Driver & Passenger Front       Yes non-USA  26.7
## 74               None Front       Yes     USA  11.1
## 47               None Front       Yes non-USA  13.9
## 86        Driver only Front       Yes non-USA  18.2
## 66               None Front        No non-USA  19.1
## 11 Driver & Passenger Front        No     USA  40.1
## 32               None Front       Yes     USA  10.1
## 51 Driver & Passenger Front        No     USA  34.3
## 10        Driver only Front        No     USA  34.7
## 30 Driver & Passenger Front        No     USA  19.3
## 78        Driver only Front       Yes non-USA  28.7
## 68               None Front        No     USA  13.5
## 20 Driver & Passenger Front        No     USA  18.4
## 34        Driver only  Rear       Yes     USA  15.9
## 44               None Front       Yes non-USA   8.0
## 1                None Front       Yes non-USA  15.9
## 82        Driver only   4WD       Yes non-USA  19.5
## 64        Driver only Front       Yes non-USA  11.8
## 13        Driver only Front       Yes     USA  11.4
## 55        Driver only Front       Yes non-USA  16.5
## 45               None Front       Yes non-USA  10.0
## 58        Driver only  Rear       Yes non-USA  31.9
## 56               None   4WD        No non-USA  19.1
## 88               None Front       Yes non-USA   9.1
## 9         Driver only Front        No     USA  26.3
## 52 Driver & Passenger  Rear        No     USA  36.1
## 87        Driver only   4WD       Yes non-USA  22.7
## 2  Driver & Passenger Front       Yes non-USA  33.9
## 22        Driver only Front        No     USA  29.5
## 67        Driver only Front        No non-USA  21.5
## 70               None Front        No     USA  19.5
## 49        Driver only Front       Yes non-USA  28.0
## 59 Driver & Passenger  Rear        No non-USA  61.9
## 79        Driver only Front       Yes     USA  11.1
## 31               None Front       Yes     USA   7.4
## 41 Driver & Passenger Front       Yes non-USA  19.8
## 48        Driver only  Rear        No non-USA  47.9
## 76               None Front       Yes     USA  18.5
## 43 Driver & Passenger Front       Yes non-USA  17.5
## 53               None Front       Yes non-USA   8.3
## 28        Driver only   4WD       Yes     USA  25.8
## 62               None Front       Yes non-USA  10.3
## 33               None Front       Yes     USA  11.3
## 14 Driver & Passenger  Rear       Yes     USA  15.1
## 50 Driver & Passenger  Rear       Yes non-USA  35.2
## 23               None Front       Yes     USA   9.2
## 4  Driver & Passenger Front       Yes non-USA  37.7
## 80               None   4WD       Yes non-USA   8.4
## 72               None   4WD       Yes     USA  14.4
## 25        Driver only Front       Yes     USA  13.3
## 73               None Front       Yes     USA   9.0
## 12               None Front       Yes     USA  13.4
## 63        Driver only Front        No non-USA  26.1
## 7         Driver only Front        No     USA  20.8
## 69        Driver only Front        No     USA  16.3
## 57        Driver only  Rear       Yes non-USA  32.5
## 91               None Front       Yes non-USA  23.3
## 17               None   4WD        No     USA  16.6
## 77 Driver & Passenger Front        No     USA  24.4
## 60        Driver only Front       Yes     USA  14.1

In addition, further randomization would occur below whie matching and subsetting (selection and assignment). In essence, I randomized many times. First, the data itself. Then, I randomized the order in which the experiment would run. Finally, the last randomization component was the random subset groups I would place them after Matching.

Therefore, there were 3 levels of randomization: measurement, assignment, and performing.

Replicate: Are there replicates and/or repeated measures?

Due to the simplification of the dataset, which was during 1993, I cannot verify if there were any replicates and/or repeated measures. At least for the project, I have not implemented such.

Block: Did you use blocking in the design?

There is no indication that blocking was used in the original experiment. Personally, I did not use blocking in my design as well.

The next steps left for the study was as follows:

First design the table
Assign factor levels to low and high
Match the table design of low and high to the actual data set
“Create” a new dataset that corresponds to the table design
Use this “new dataset” to continue the next procedures of me, ie, and ANOVA
Confirm the Resolution.

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

Below is the descriptive summary of the data.

summary(cars2)

##      airbag          drive         man.trans          origin      
##  Min.   :0.000   Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.000   Median :1.000   Median :1.0000   Median :1.0000  
##  Mean   :1.194   Mean   :1.065   Mean   :0.6559   Mean   :0.5161  
##  3rd Qu.:2.000   3rd Qu.:1.000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :2.000   Max.   :2.000   Max.   :1.0000   Max.   :1.0000  
##      price      
##  Min.   : 7.40  
##  1st Qu.:12.20  
##  Median :17.70  
##  Mean   :19.51  
##  3rd Qu.:23.30  
##  Max.   :61.90

head(cars2,10)

##    airbag drive man.trans origin price
## 1       2     1         1      0  15.9
## 2       0     1         1      0  33.9
## 3       1     1         1      0  29.1
## 4       0     1         1      0  37.7
## 5       1     2         1      0  30.0
## 6       1     1         0      1  15.7
## 7       1     1         0      1  20.8
## 8       1     2         0      1  23.7
## 9       1     1         0      1  26.3
## 10      1     1         0      1  34.7

Here are some graphical representations:

#Histogram
hist(cars$price, main = "Histogram of Car Prices")

Boxplots for each factors:

#Boxplots for each factors
boxplot(cars$price~cars$airbag, main ="Price vs Airbag")

boxplot(cars$price~cars$drive, main ="Price vs Drive")

boxplot(cars$price~cars$man.trans, main ="Price vs Man.Trans")

boxplot(cars$price~cars$origin, main ="Price vs Origin")

# Boxplot of overall factors with labeled axes 
boxplot(cars2$price ~ cars2$airbag + cars2$drive+cars2$man.trans+cars2$origin+cars2$price, main="New Boxplot of All Variables")

Testing

In this experiment, the null hypothesis states that there is no difference between the price of the car because of the airbags, drive, man.trans, and origin.

On the contrary, the alternative hypothesis states that there is a difference between the price of the car because of the airbags, drive, man.trans, and origin.

Fractional Factorial Design

I created the basic fractional factorial design of 6 factors (decomposed) and then the 2^6-3 design.

#Fractional Factorial Design
ffd1a = FrF2(8, nfactors = 6, factor.names = c('airbagA', 'airbagB', 'driveA', 'driveB', 'man.trans', 'origin'))
ffd1a

##   airbagA airbagB driveA driveB man.trans origin
## 1      -1       1     -1     -1         1     -1
## 2      -1      -1      1      1        -1     -1
## 3      -1       1      1     -1        -1      1
## 4       1       1      1      1         1      1
## 5       1      -1     -1     -1        -1      1
## 6      -1      -1     -1      1         1      1
## 7       1       1     -1      1        -1     -1
## 8       1      -1      1     -1         1     -1
## class=design, type= FrF2

ffd2a = FrF2(8, nfactors = 4, res3=T, factor.names = c("airbag", "drive", "man.trans", "origin"))
ffd2a

##   airbag drive man.trans origin
## 1     -1    -1         1      1
## 2      1     1        -1     -1
## 3     -1     1         1     -1
## 4     -1    -1        -1     -1
## 5     -1     1        -1      1
## 6      1    -1         1     -1
## 7      1    -1        -1      1
## 8      1     1         1      1
## class=design, type= FrF2

I implemented the same design into R. However, first I did an unrandomized design.

Here is the Excel table I made.

Excel2

Here is the R version.

t1 = as.data.frame(FrF2(8,6, res3=TRUE, randomize = FALSE))
A = c(as.integer(t1[,1]))
B = c(as.integer(t1[,2]))
C = c(as.integer(t1[,3]))
D = c(as.integer(t1[,4]))
CD = C*D
E = c(as.integer(t1[,5]))
F = c(as.integer(t1[,6]))
EF = E*F
temp = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(t1)
for (i in 1:n){
  if (temp$CD[i] == 1){temp$airbag[i] = "Low"}
  if (temp$CD[i] == 2){temp$airbag[i] = "Med"}
  if (temp$CD[i] == 4){temp$airbag[i] = "High"}
  
  if (temp$EF[i] == 1){temp$drive[i] = "Low"}
  if (temp$EF[i] == 2){temp$drive[i] = "Med"}
  if (temp$EF[i] == 4){temp$drive[i] = "High"}
  
  if (temp$A[i] == 1){temp$man.trans[i] = "no"}
  if (temp$A[i] == 2){temp$man.trans[i] = "yes"}
  
  if (temp$B[i] == 1){temp$origin[i] = "no"}
  if (temp$B[i] == 2){temp$origin[i] = "yes"}
  
}
project = cbind(t1,temp$airbag, temp$drive, temp$man.trans, temp$origin)

knitr::kable(project, align = 'c')

A	B	C	D	E	F	temp$airbag	temp$drive	temp$man.trans	temp$origin
-1	-1	-1	1	1	1	Med	High	no	no
1	-1	-1	-1	-1	1	Low	Med	yes	no
-1	1	-1	-1	1	-1	Low	Med	no	yes
1	1	-1	1	-1	-1	Med	Low	yes	yes
-1	-1	1	1	-1	-1	High	Low	no	no
1	-1	1	-1	1	-1	Med	Med	yes	no
-1	1	1	-1	-1	1	Med	Med	no	yes
1	1	1	1	1	1	High	High	yes	yes

Then, using this, I randomized the order of the runs.

This is essentially the same, except I did randomize=TRUE.

#Randomized
FFD = (FrF2(8,6, res3=TRUE, randomize = TRUE))
t2 = as.data.frame(FFD)
A = c(as.integer(t2[,1]))
B = c(as.integer(t2[,2]))
C = c(as.integer(t2[,3]))
D = c(as.integer(t2[,4]))
CD = C*D
E = c(as.integer(t2[,5]))
F = c(as.integer(t2[,6]))
EF = E*F
design = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(t2)
for (i in 1:n){
  if (design$CD[i] == 1){design$airbag[i] <- "Low"}
  if (design$CD[i] == 2){design$airbag[i] <- "Med"}
  if (design$CD[i] == 4){design$airbag[i] <- "High"}
  
  if (design$EF[i] == 1){design$drive[i] <- "Low"}
  if (design$EF[i] == 2){design$drive[i] <- "Med"}
  if (design$EF[i] == 4){design$drive[i] <- "High"}
  
  if (design$A[i] == 1){design$man.trans[i] <- "no"}
  if (design$A[i] == 2){design$man.trans[i] <- "yes"}
  
  if (design$B[i] == 1){design$origin[i] <- "no"}
  if (design$B[i] == 2){design$origin[i] <- "yes"}
  
}
fractionalfactorialdesign = cbind(t2,design$airbag, design$drive, design$man.trans, design$origin)
summary(fractionalfactorialdesign)

##   A      B      C      D      E      F     design$airbag design$drive
##  -1:4   -1:4   -1:4   -1:4   -1:4   -1:4   High:2        High:2      
##  1 :4   1 :4   1 :4   1 :4   1 :4   1 :4   Low :2        Low :2      
##                                            Med :4        Med :4      
##  design$man.trans design$origin
##  no :4            no :4        
##  yes:4            yes:4        
##

knitr::kable(fractionalfactorialdesign, align = 'c')

A	B	C	D	E	F	design$airbag	design$drive	design$man.trans	design$origin
-1	1	1	-1	-1	1	Med	Med	no	yes
1	1	-1	1	-1	-1	Med	Low	yes	yes
1	1	1	1	1	1	High	High	yes	yes
-1	1	-1	-1	1	-1	Low	Med	no	yes
1	-1	-1	-1	-1	1	Low	Med	yes	no
-1	-1	-1	1	1	1	Med	High	no	no
1	-1	1	-1	1	-1	Med	Med	yes	no
-1	-1	1	1	-1	-1	High	Low	no	no

The idea behind this design is the matching component. I first generated the dataset and then the fractional factorial design using FrF2, which I used to compare. The design consisted of the decomposition process. The explanation was described before. The matching process was using those designs. Once having the two dataset (the car design and the FrF2 design), I was able to compare them. I would see which ones corresponds. For example, if I had a high, med, low, high in the cars dataset and a high, med, low, high in the FrF2 design, then they matched. The matching process was the selection method to collect the data. There was only 8 chosen because of the design of the 1/8th we chose to do.

FFD_main = FrF2(8, nfactors = 6, res3=T)
FFD_main

##    A  B  C  D  E  F
## 1  1  1 -1  1 -1 -1
## 2 -1 -1 -1  1  1  1
## 3  1 -1  1 -1  1 -1
## 4 -1  1  1 -1 -1  1
## 5 -1 -1  1  1 -1 -1
## 6 -1  1 -1 -1  1 -1
## 7  1 -1 -1 -1 -1  1
## 8  1  1  1  1  1  1
## class=design, type= FrF2

aliasprint(FFD_main)

## $legend
## [1] A=A B=B C=C D=D E=E F=F
## 
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD

I looked into the aliasing more later after the me and ie.

Main Effects and Interaction Effects

First, I computed the me and ie without matching. So, this would be before the collecting data.

#me and ie in general and without matching
m1 = lm(price~(airbag+ drive+man.trans+origin)^2,data=cars2)
anova(m1)

## Analysis of Variance Table
## 
## Response: price
##                  Df Sum Sq Mean Sq F value    Pr(>F)    
## airbag            1 2742.6 2742.56 65.7071 4.308e-12 ***
## drive             1  360.4  360.39  8.6343 0.0042829 ** 
## man.trans         1  390.9  390.87  9.3645 0.0029895 ** 
## origin            1  576.9  576.92 13.8219 0.0003663 ***
## airbag:drive      1   99.5   99.52  2.3843 0.1264075    
## airbag:man.trans  1   10.5   10.50  0.2517 0.6172519    
## airbag:origin     1  574.2  574.17 13.7562 0.0003774 ***
## drive:man.trans   1    7.4    7.40  0.1772 0.6748971    
## drive:origin      1  307.8  307.80  7.3743 0.0080660 ** 
## man.trans:origin  1   91.3   91.28  2.1870 0.1430149    
## Residuals        82 3422.6   41.74                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Then, I computed the me and ie after the data was collected. To do so, I made subsets that contained the matched ones in the Excel (or the one above).

The me and 2fi were calculated the same way it was always calculated.

meie = matrix(c(mea, meb,mec, med, mee,mef,ieab, ieac, iead, ieae, ieaf,iebc,iebd,iebe,iebf,iecd,iecf,iede,iedf,ieef), ncol=1)
meie

##          [,1]
##  [1,]  -9.100
##  [2,]   0.450
##  [3,]   1.700
##  [4,]   5.250
##  [5,]   2.775
##  [6,]  12.550
##  [7,]  -4.750
##  [8,]   2.950
##  [9,]   7.600
## [10,]   2.950
## [11,] -11.450
## [12,]   4.450
## [13,]  -4.750
## [14,]   4.450
## [15,]   9.100
## [16,]  14.300
## [17,]   9.100
## [18,]  -4.750
## [19,]  -0.100
## [20,]  -9.950

NOTE: IMPORTANT When I ran it the first time, this is the meie generated. The RMarkdown was generated using my second computer. Therefore, meie values may differ. Please refer to the meie table below:

The response variable was added into the design as well.

#Read price matched:
cars3 = read.csv("cars3.csv", header = TRUE, sep=",")

des= add.response(FFD, cars3$price)
MEPlot(des, abbrev = 5, cex.xax = 1.6, cex.main = 2)

IAPlot(des, abbrev = 5, show.alias = TRUE, lwd = 2, cex = 2, cex.xax = 1.2, cex.lab = 1.5, main="Interaction Plot")

#Showing it in perspective 
FFD2 = FrF2(8, nfactors = 6, res3= TRUE, factor.names = c('airbagA', 'airbagB', 'driveA', 'driveB', 'man.trans', 'origin'))
FFD2

##   airbagA airbagB driveA driveB man.trans origin
## 1       1       1      1      1         1      1
## 2      -1      -1      1      1        -1     -1
## 3       1       1     -1      1        -1     -1
## 4      -1       1      1     -1        -1      1
## 5       1      -1     -1     -1        -1      1
## 6       1      -1      1     -1         1     -1
## 7      -1       1     -1     -1         1     -1
## 8      -1      -1     -1      1         1      1
## class=design, type= FrF2

summary(FFD2)

## Call:
## FrF2(8, nfactors = 6, res3 = TRUE, factor.names = c("airbagA", 
##     "airbagB", "driveA", "driveB", "man.trans", "origin"))
## 
## Experimental design of type  FrF2 
## 8  runs
## 
## Factor settings (scale ends):
##   airbagA airbagB driveA driveB man.trans origin
## 1      -1      -1     -1     -1        -1     -1
## 2       1       1      1      1         1      1
## 
## Design generating information:
## $legend
## [1] A=airbagA   B=airbagB   C=driveA    D=driveB    E=man.trans F=origin   
## 
## $generators
## [1] D=AB E=AC F=BC
## 
## 
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD
## 
## 
## The design itself:
##   airbagA airbagB driveA driveB man.trans origin
## 1       1       1      1      1         1      1
## 2      -1      -1      1      1        -1     -1
## 3       1       1     -1      1        -1     -1
## 4      -1       1      1     -1        -1      1
## 5       1      -1     -1     -1        -1      1
## 6       1      -1      1     -1         1     -1
## 7      -1       1     -1     -1         1     -1
## 8      -1      -1     -1      1         1      1
## class=design, type= FrF2

aliasprint(FFD2)

## $legend
## [1] A=airbagA   B=airbagB   C=driveA    D=driveB    E=man.trans F=origin   
## 
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD

According to the meie table and the aliasprint of the FFD2, I could see that the model was highly aliased. As expected the Resolution was Resolution III because it was an 1/8th 2^6-3 design. The definition of Resolution III is “No main effect is aliased with any other main effect (me). Main effects are aliased with two factor interactions (2fi)”. The definition matched to what really happened because the me were not aliased with any other me, but with only the 2fis. The fi’s should be looked upon with caution because of the confounding variable and not many factors can be looked at indepdently. The me’s seem to be the only ones that look reliable and accurate. Therefore, to create my linear model, I decided to factor in both the me’s only and the ANOVA.

Constructing the Linear Model

To create the linear model, I first check the ANOVA results for all the factors togther. Then, I came up with the factors to select based off this and the me’s.

fit = aov(price~airbag + drive + man.trans + origin + price:airbag + price:drive + price:man.trans + price:origin, data = cars2)
summary(fit)

##                 Df Sum Sq Mean Sq F value   Pr(>F)    
## airbag           1 2742.6  2742.6  643.69  < 2e-16 ***
## drive            1  360.4   360.4   84.58 2.38e-14 ***
## man.trans        1  390.9   390.9   91.74 4.08e-15 ***
## origin           1  576.9   576.9  135.40  < 2e-16 ***
## price:airbag     1 1655.5  1655.5  388.56  < 2e-16 ***
## price:drive      1 2351.3  2351.3  551.86  < 2e-16 ***
## price:man.trans  1   85.2    85.2   20.00 2.41e-05 ***
## price:origin     1   63.3    63.3   14.87 0.000225 ***
## Residuals       84  357.9     4.3                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As you can see, there was significance for all the factors. This made sense because the price of cars, especially back then, would matter a lot depending if the car was manual transmission or not or had 4WD. Even today, 4WD drive cars tend to be more expensive than a front or rear drive car. Also, origin is another factor that makes sense because ‘Made in the USA’ is definitely more expensive than a foreign part due to many factors such as labor costs and production/manufacturing costs. Therefore, ‘Made in the USA’ parts would increase the car’s price. Airbags costs should also be included in the car. Making airbags are not free. There is a cost to it, therefore increasing the price of the car. Using good judgement and the meie table, I knew not to consider the 2fi because they were highly aliased. Then, I wanted to differentiate the “more significant ones”. To do so, the best logical way was to refer to the meie table again. Airbags had a relatively small me of -2. Drive, man.trans, and origin had a bigger me of -11.3 and -11.65. Therefore, I decided to include drive, man.trans, and origin into the linear model. Intuitively and using prior knowledge this made sense. Drive is based on the car’s engine and bottom layout design, requiring more time, hence more costs. So, the results made sense.

Therefore, I generated another ANOVA using those factors.

fit2 = aov(price~man.trans +drive + origin, data = cars2)
summary(fit2)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## man.trans    1    915   915.1  13.382 0.000430 ***
## drive        1    970   969.9  14.184 0.000297 ***
## origin       1    613   613.1   8.966 0.003561 ** 
## Residuals   89   6086    68.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results from the ANOVA, my Best Guess experiment, and the me all matched together and were consistent. The ANOVA factors that were significant had a high me. This also matched with my prior background knowledge of cars in general.

The linear model is as follows:

fit.lm = lm(price~ drive+man.trans + origin, data = cars2)
summary(fit.lm)

## 
## Call:
## lm.default(formula = price ~ drive + man.trans + origin, data = cars2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.283  -5.997  -2.005   4.731  28.028 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   21.509      2.813   7.645 2.31e-11 ***
## drive          6.182      1.640   3.770 0.000293 ***
## man.trans     -8.604      2.004  -4.292 4.49e-05 ***
## origin        -5.690      1.900  -2.994 0.003561 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.269 on 89 degrees of freedom
## Multiple R-squared:  0.291,  Adjusted R-squared:  0.2671 
## F-statistic: 12.18 on 3 and 89 DF,  p-value: 9.477e-07

Equation

Estimation

confint(fit2)

##                  2.5 %    97.5 %
## (Intercept)  15.918817 27.099343
## man.trans   -12.586332 -4.621029
## drive         2.923599  9.439590
## origin       -9.464954 -1.914121

coef(fit2)

## (Intercept)   man.trans       drive      origin 
##   21.509080   -8.603681    6.181595   -5.689537

Diagnostics / Model Adequacy Checking

To check if the normality assumption holds, I used the qqnorm and qqlines.

#Check for Normality
qqnorm(residuals(fit2))
qqline(residuals(fit2))

#Residual Plot
plot(fitted(fit2),residuals(fit2))

Based on the qqnorm and qqline, the qqnorm line looks fairly linear and closely fits the qqline drawn. Also, the residual plot showed an evenly distributed residual as a function of fitted values. Also, there was a good scatter and not close to 0. The pattern seemed random and evenly distributed, two important conditions that satisfy the normality check. Therefore, the normality assumption was satisfied because the residuals were evenly distributed and the qqnorm line is linear enough. The normality check passed. There is no need to do transformation because the normality assumption was passed.

Summary: Overall, the experiment finding was an interesting one. The null hypothesis was rejected because the price of cars depended on the factors, like drive, man.trans, and origin. The fractional factorial design was implemented and I looked into the me and ie as well. The aliasing structure was examined and the Resolution was Resolution III. Resolution III was the highest resolution possible because of the parameters and data. To get a higher Resolution, I would need a new dataset and have more runs. For Resolution IV, I need 16 runs (2^4) and for Resolution VI I need 32 runs. For now, Resolution III would be satisfying. In general, the drive, man.trans, and origin had a great impact, while the 2fi were not reliable. The results make sense intuitvely and meets the expectations I had.

For the findings, it was interesting to explore the design procedure.

The analysis made sense and the fractional factorial design was well executed.

4. References to the literature

1.https://vincentarelbundock.github.io/Rdatasets/doc/MASS/Cars93.html

2.https://onlinecourses.science.psu.edu/stat503/book/export/html/48

5. Appendices

#Benjamin Byeon
#11/24/16
#Section 1

#Load the package
library("FrF2")

#Load the data
cars = read.csv("cars.csv", header = TRUE, sep=",")
cars

#System under test
summary(cars)
head(cars)
tail(cars)
names(cars)

#Display the structure
names(cars)
str(cars)

#Check the levels
levels(cars$price)
levels(cars$airbag)
levels(cars$drive)
levels(cars$man.trans)
levels(cars$origin)

#Decomposition Outline
C = c(-1,1,-1,1)
D = c(-1,-1,1,1)
X1 = c("Low","Med","Med","High")
X1 = as.data.frame(cbind(C,D,X1))
kable(X1,align='c')

E = c(-1,1,-1,1)
F = c(-1,-1,1,1)
X2 = c("Low","Med","Med","High")
X2 = as.data.frame(cbind(C,D,X2))
kable(X2, align ='c')

#Randomize the data
index = sample(1:nrow(cars), 93, replace=FALSE)
rdata = cars[index,]
rdata

#Display the descriptive statistics 
summary(cars)

#Histogram
hist(cars$price, main = "Histogram of Car Prices")

#Boxplots for each factors
boxplot(cars$price~cars$airbag, main ="Price vs Airbag")
boxplot(cars$price~cars$drive, main ="Price vs Drive")
boxplot(cars$price~cars$man.trans, main ="Price vs Man.Trans")
boxplot(cars$price~cars$origin, main ="Price vs Origin")

#FFD
#FFD
#![Effect Size](figure2.png)

#FFD_t = FrF2(8, nfactors = 6, res3=T, estimable = formula("~airbag+drive+man.trans+origin+airbag:(drive+man.trans+origin)"))
f1 = FrF2(64, nfactors = 6, estimable = formula("~A+B+C+D+A:(B+C+D)"))
f1 

#EXCEL
ffd <- as.data.frame(FrF2(64,6, randomize = FALSE))
A <- c(as.integer(ffd[,1]))
B <- c(as.integer(ffd[,2]))
C <- c(as.integer(ffd[,3]))
D <- c(as.integer(ffd[,4]))
CD <- C*D
E <- c(as.integer(ffd[,5]))
F <- c(as.integer(ffd[,6]))
EF <- E*F
excel <- as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(ffd)
for (i in 1:n){
  if (excel$A[i] == 1){excel$TreatA[i] <- "Low"}
  if (excel$A[i] == 2){excel$TreatA[i] <- "High"}
  
  if (excel$B[i] == 1){excel$TreatB[i] <- "Low"}
  if (excel$B[i] == 2){excel$TreatB[i] <- "High"}
  
  if (excel$CD[i] == 1){excel$TreatX1[i] <- "Low"}
  if (excel$CD[i] == 2){excel$TreatX1[i] <- "Med"}
  if (excel$CD[i] == 4){excel$TreatX1[i] <- "High"}
  
  if (excel$EF[i] == 1){excel$TreatX2[i] <- "Low"}
  if (excel$EF[i] == 2){excel$TreatX2[i] <- "Med"}
  if (excel$EF[i] == 4){excel$TreatX2[i] <- "High"}
}
ffd12 = cbind(ffd,excel$TreatA, excel$TreatB, excel$TreatX1, excel$TreatX2)
kable(ffd12, align = 'c')

#This is the fractional factorial design of 2^6-3
t = as.data.frame(FrF2(8,6, randomize = FALSE))
A = c(as.integer(t[,1]))
B = c(as.integer(t[,2]))
C = c(as.integer(t[,3]))
D = c(as.integer(t[,4]))
CD = C*D
E = c(as.integer(t[,5]))
F = c(as.integer(t[,6]))
EF = E*F
ffdp = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(t)
for (i in 1:n){
  if (ffdp$A[i] == 1){ffdp$TreatA[i] <- "Low"}
  if (ffdp$A[i] == 2){ffdp$TreatA[i] <- "High"}
  
  if (ffdp$B[i] == 1){ffdp$TreatB[i] <- "Low"}
  if (ffdp$B[i] == 2){ffdp$TreatB[i] <- "High"}
  
  if (ffdp$CD[i] == 1){ffdp$TreatX1[i] <- "Low"}
  if (ffdp$CD[i] == 2){ffdp$TreatX1[i] <- "Med"}
  if (ffdp$CD[i] == 4){ffdp$TreatX1[i] <- "High"}
  
  if (ffdp$EF[i] == 1){ffdp$TreatX2[i] <- "Low"}
  if (ffdp$EF[i] == 2){ffdp$TreatX2[i] <- "Med"}
  if (ffdp$EF[i] == 4){ffdp$TreatX2[i] <- "High"}
}
ffdMain = cbind(t,ffdp$TreatA,ffdp$TreatB,ffdp$TreatX1,ffdp$TreatX2)
kable(Fractional_Factorial, align = 'c')

#Numerize the factors into 0 and 1, instead of categories as is
a = nrow(cars)
airbag = data.frame(a)
drive = data.frame(a)
man.trans = data.frame(a)
origin = data.frame(a)

#Automatically replace the categories as 0,1,2 
for (i in 1:a){
  #For airbags
  if (cars$airbag[i] =="Driver & Passenger"){
    airbag[i,1] = 0
  }
  if (cars$airbag[i] == "Driver only"){
    airbag[i,1] = 1
  }
  if (cars$airbag[i] == "None"){
    airbag[i,1] = 2
  }
  
  #For drive
  if (cars$drive[i] =="4WD"){
    drive[i,1] = 0
  }
  if (cars$drive[i] == "Front"){
    drive[i,1] = 1
  }
  if (cars$drive[i] == "Rear"){
    drive[i,1] = 2
  }
  

  #For man.trans
  if (cars$man.trans[i] == "No"){
    man.trans[i,1] = 0
  } else{
    man.trans[i,1] = 1
  }
  
  #For origin
  if (cars$origin[i] == "non-USA"){
    origin[i,1] = 0
  } else{
    origin[i,1] = 1
  }
}


#Get match corresponding
for (i in 1:a){
  #For A
  if (test$A[i] == "low"){
    A[i,1] = 0
  } else{
    A[i,1] = 1
  }
  
  #For B
  if (test$B[i] == "low"){
    B[i,1] = 0
  } else{
    B[i,1] = 1
  }
  
  #For X1
  if (test$X1[i] =="low"){
    X1[i,1] = 0
  }
  if (test$X1[i] == "med"){
    X1[i,1] = 1
  }
  if (test$X1[i] == "high"){
    X1[i,1] = 2
  }
  
  #For X2
  if (test$X2[i] =="low"){
    X2[i,1] = 0
  }
  if (test$X2[i] == "med"){
    X2[i,1] = 1
  }
  if (test$X2[i] == "high"){
    X2[i,1] = 2
  }
  
}

#Generate the table from the function used above
cars2 = cbind(airbag, drive, man.trans, origin, cars$price)
colnames(cars2) = c("airbag", "drive", "man.trans", "origin", "price")

head(cars2,10)
tail(cars2, 20)
str(cars2)

# Boxplot of overall factors with labeled axes 
boxplot(cars2$price ~ cars2$airbag + cars2$drive+cars2$man.trans+cars2$origin+cars2$price, main="New Boxplot of All Variables")

#Fractional Factorial Design
ffd1a = FrF2(8, nfactors = 4, res3=T, factor.names = c("airbag", "drive", "man.trans", "origin"))
ffd1a

ffd2a = FrF2(8, nfactors = 6, factor.names = c('airbagA', 'airbagB', 'driveA', 'driveB', 'man.trans', 'origin'))
ffd2a

t1 = as.data.frame(FrF2(8,6, randomize = FALSE))
A = c(as.integer(t1[,1]))
B = c(as.integer(t1[,2]))
C = c(as.integer(t1[,3]))
D = c(as.integer(t1[,4]))
CD = C*D
E = c(as.integer(t1[,5]))
F = c(as.integer(t1[,6]))
EF = E*F
temp = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(t1)
for (i in 1:n){
  if (temp$CD[i] == 1){temp$airbag[i] <- "Low"}
  if (temp$CD[i] == 2){temp$airbag[i] <- "Med"}
  if (temp$CD[i] == 4){temp$airbag[i] <- "High"}
  
  if (temp$EF[i] == 1){temp$drive[i] <- "Low"}
  if (temp$EF[i] == 2){temp$drive[i] <- "Med"}
  if (temp$EF[i] == 4){temp$drive[i] <- "High"}
  
  if (temp$A[i] == 1){temp$man.trans[i] <- "no"}
  if (temp$A[i] == 2){temp$man.trans[i] <- "yes"}
  
  if (temp$B[i] == 1){temp$origin[i] <- "no"}
  if (temp$B[i] == 2){temp$origin[i] <- "yes"}
  
}
project <- cbind(t1,temp$airbag, temp$drive, temp$man.trans, temp$origin)
kable(project, align = 'c')

#Randomized
FFD = (FrF2(8,6, randomize = TRUE))
t2 = as.data.frame(FFD)
A = c(as.integer(t2[,1]))
B = c(as.integer(t2[,2]))
C = c(as.integer(t2[,3]))
D = c(as.integer(t2[,4]))
CD = C*D
E = c(as.integer(t2[,5]))
F = c(as.integer(t2[,6]))
EF = E*F
design = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(t2)
for (i in 1:n){
  if (design$CD[i] == 1){design$airbag[i] <- "Low"}
  if (design$CD[i] == 2){design$airbag[i] <- "Med"}
  if (design$CD[i] == 4){design$airbag[i] <- "High"}
  
  if (design$EF[i] == 1){design$drive[i] <- "Low"}
  if (design$EF[i] == 2){design$drive[i] <- "Med"}
  if (design$EF[i] == 4){design$drive[i] <- "High"}
  
  if (design$A[i] == 1){design$man.trans[i] <- "no"}
  if (design$A[i] == 2){design$man.trans[i] <- "yes"}
  
  if (design$B[i] == 1){design$origin[i] <- "no"}
  if (design$B[i] == 2){design$origin[i] <- "yes"}
  
}

fractionalfactorialdesign <- cbind(t2,design$airbag, design$drive, design$man.trans, design$origin)
summary(fractionalfactorialdesign)
kable(fractionalfactorialdesign, align = 'c')

FFD_main = FrF2(8, nfactors = 6, res3=T)
aliasprint(FFD_main)

#Showing it in perspective 
FFD2 = FrF2(8, nfactors = 6, factor.names = c('airbagA', 'airbagB', 'driveA', 'driveB', 'man.trans', 'origin'))
FFD2
summary(FFD2)
aliasprint(FFD2)

#me and ie in general and without matching
m1 = lm(price~(airbag+ drive+man.trans+origin)^2,data=cars2)
anova(m1)

# Subset creation for factorial design pulled randomly from the table
seta = subset(cars2, airbag == "1" & drive == "1" & man.trans == "1" & origin == "1")
setb = subset(cars2, airbag == "1" & drive == "2" & man.trans == "1" & origin == "0")
setc = subset(cars2, airbag == "1" & drive == "2" & man.trans == "0" & origin == "0")
setd = subset(cars2, airbag == "1" & drive == "1" & man.trans == "0" & origin == "1")
sete = subset(cars2,airbag == "1" & drive == "1" & man.trans == "1" & origin == "0")
setf = subset(cars2,airbag == "2" & drive == "1" & man.trans == "0" & origin == "0")
setg = subset(cars2,airbag == "2" & drive == "0" & man.trans == "1" & origin == "1")
seth = subset(cars2,airbag == "1" & drive == "0" & man.trans == "1" & origin == "0")
 

#Function to get a sample of row to use later for me
mec = function (cars){
  a = sample(nrow(cars))
  b = a[1]
  
  return(cars$price[b])
}

# Use  functions to get group samples
meana = mec(seta)
meanb = mec(setb)
meanc = mec(setc)
meand = mec(setd)
meane = mec(sete)
meanf = mec(setf)
meang = mec(setg)
meanh = mec(seth)


#me and ie
mea = 0.25* ((meane[1]+meang[1]+meanh[1]+meand[1])-(meanc[1]+meanf[1]+meanb[1]+meana[1]))

meb = 0.25* ((meang[1]+meand[1]+meanf[1]+meanb[1])-(meanc[1]+meane[1]+meana[1]+meanh[1]))

mec = 0.25* ((meanb[1]+meana[1]+meanh[1]+meand[1])-(meanc[1]+meane[1]+meanf[1]+meang[1]))

med = 0.25* ((meanc[1]+meana[1]+meang[1]+meand[1])-(meanf[1]+meanh[1]+meanb[1]+meane[1]))

mee = 0.25* ((meanc[1]+meanf[1]+meanh[1]+meand[1])-(meana[1]+meanb[1]+meanc[1]+meang[1]))

mef = 0.25* ((meanc[1]+meanb[1]+meane[1]+meand[1])-(meana[1]+meanf[1]+meang[1]+meanh[1]))

ieab = 0.25 * ((meana[1] + meanb[1] + meand[1] + meane[1] - (meanf[1] + meanc[1] + meang[1] + meanh[1])))
ieac = 0.25 * ((meana[1] + meanc[1] + meand[1] + meane[1] - (meanf[1] + meanb[1] + meang[1] + meanh[1])))
iead = 0.25 * ((meana[1] + meand[1] + meanc[1] + meanf[1] - (meane[1] + meanb[1] + meang[1] + meanh[1])))
ieae = 0.25 * ((meana[1] + meand[1] + meanc[1] + meane[1] - (meanf[1] + meanb[1] + meang[1] + meanh[1])))
ieaf = 0.25 * ((meanf[1] + meana[1] + meand[1] + meane[1] - (meanb[1] + meanc[1] + meang[1] + meanh[1])))
iebc = 0.25 * ((meanb[1] + meanc[1] + meana[1] + meane[1] - (meanf[1] + meand[1] + meang[1] + meanh[1])))
iebd = 0.25 * ((meanb[1] + meand[1] + meana[1] + meane[1] - (meanf[1] + meanc[1] + meang[1] + meanh[1])))
iebe = 0.25 * ((meanb[1] + meane[1] + meana[1] + meanc[1] - (meanf[1] + meand[1] + meang[1] + meanh[1])))
iebf = 0.25 * ((meanb[1] + meanf[1] + meana[1] + meanc[1] - (meane[1] + meand[1] + meang[1] + meanh[1])))
iecd = 0.25 * ((meanc[1] + meand[1] + meana[1] + meanb[1] - (meane[1] + meanf[1] + meang[1] + meanh[1])))
iece = 0.25 * ((meanc[1] + meane[1] + meana[1] + meanb[1] - (meand[1] + meanf[1] + meang[1] + meanh[1])))
iecf = 0.25 * ((meanc[1] + meanf[1] + meana[1] + meanb[1] - (meand[1] + meane[1] + meang[1] + meanh[1])))
iede = 0.25 * ((meand[1] + meane[1] + meana[1] + meanb[1] - (meanc[1] + meanf[1] + meang[1] + meanh[1])))
iedf = 0.25 * ((meand[1] + meanf[1] + meana[1] + meanb[1] - (meanc[1] + meane[1] + meang[1] + meanh[1])))
ieef = 0.25 * ((meane[1] + meanf[1] + meana[1] + meanb[1] - (meanc[1] + meand[1] + meang[1] + meanh[1])))

meie = matrix(c(me1, me2,me3, me4, me5,me6,ieab, ieac, iead, ieae, ieaf,iebc,iebd,iebe,iebf,iecd,iecf,iede,iedf,ieef), ncol=1)
meie

#Read price matched:
cars3 = read.csv("cars3.csv", header = TRUE, sep=",")

des= add.response(FFD, cars3$price)
MEPlot(des, abbrev = 5, cex.xax = 1.6, cex.main = 2)
IAPlot(des, abbrev = 5, show.alias = TRUE, lwd = 2, cex = 2, cex.xax = 1.2, cex.lab = 1.5, main="Interaction Plot")

fit = aov(price~airbag + drive + man.trans + origin + price:airbag + price:drive + price:man.trans + price:origin, data = cars2)
summary(fit)

fit2 <- aov(price~drive + man.trans + origin, data = cars2)# ANOVA for final model
summary(fit2)
confint(fit2)
coef(fit2)

#Check for Normality
qqnorm(residuals(fit2))
qqline(residuals(fit2))

#Residual Plot
plot(fitted(fit2),residuals(fit2))

Car Prices

Benjamin Byeon

December 10, 2016

DOE Project3

Benjamin Byeon

Rensselaer Polytechnic Institute - Troy, NY

V1.12.10.16

1. Setting

System under test

Factors and Levels

Continuous Variables (if any)

Response Variable(s)

The Data: How is it organized and what does it look like?

2. (Experimental) Design

How will the experiment be organized and conducted to test the hypothesis?

What is the rationale for this design?

Randomize: What is the Randomization Scheme?

Replicate: Are there replicates and/or repeated measures?

Block: Did you use blocking in the design?

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

Testing

Fractional Factorial Design

Main Effects and Interaction Effects

Constructing the Linear Model

Estimation

Diagnostics / Model Adequacy Checking

4. References to the literature

5. Appendices