DOE Project4

Benjamin Byeon

Rensselaer Polytechnic Institute - Troy, NY

V1.12.16.16

1. Setting

From the Ecdat Package, I selected a dataset that contained N=93 observations. The data looks into different factors that affect the pricing of cars. The study contains one response variable and two 2 level factors and two 3 level factors, which would be decomposed into 2 two level factors. Further analysis were made regarding the study.

The study is an extension and continuation of Project3. I looked at the Fractional Factorial Design, and now using that I apply Response Surface Methods.

The data was copied and pasted into Excel, and then saved as a .csv. Shown below is the loaded data into R.

#Load the package
library("FrF2")

## Warning: package 'FrF2' was built under R version 3.3.2

## Loading required package: DoE.base

## Warning: package 'DoE.base' was built under R version 3.3.2

## Loading required package: grid

## Loading required package: conf.design

## Warning: package 'conf.design' was built under R version 3.3.2

## 
## Attaching package: 'DoE.base'

## The following objects are masked from 'package:stats':
## 
##     aov, lm

## The following object is masked from 'package:graphics':
## 
##     plot.design

## The following object is masked from 'package:base':
## 
##     lengths

library("rsm")

## Warning: package 'rsm' was built under R version 3.3.2

library("rgl")

## Warning: package 'rgl' was built under R version 3.3.2

#Load the data
cars = read.csv("cars.csv", header = TRUE, sep=",")
cars

##                airbag drive man.trans  origin price
## 1                None Front       Yes non-USA  15.9
## 2  Driver & Passenger Front       Yes non-USA  33.9
## 3         Driver only Front       Yes non-USA  29.1
## 4  Driver & Passenger Front       Yes non-USA  37.7
## 5         Driver only  Rear       Yes non-USA  30.0
## 6         Driver only Front        No     USA  15.7
## 7         Driver only Front        No     USA  20.8
## 8         Driver only  Rear        No     USA  23.7
## 9         Driver only Front        No     USA  26.3
## 10        Driver only Front        No     USA  34.7
## 11 Driver & Passenger Front        No     USA  40.1
## 12               None Front       Yes     USA  13.4
## 13        Driver only Front       Yes     USA  11.4
## 14 Driver & Passenger  Rear       Yes     USA  15.1
## 15               None Front        No     USA  15.9
## 16               None Front        No     USA  16.3
## 17               None   4WD        No     USA  16.6
## 18        Driver only  Rear        No     USA  18.8
## 19        Driver only  Rear       Yes     USA  38.0
## 20 Driver & Passenger Front        No     USA  18.4
## 21 Driver & Passenger Front        No     USA  15.8
## 22        Driver only Front        No     USA  29.5
## 23               None Front       Yes     USA   9.2
## 24        Driver only Front       Yes     USA  11.3
## 25        Driver only Front       Yes     USA  13.3
## 26        Driver only   4WD        No     USA  19.0
## 27        Driver only Front        No     USA  15.6
## 28        Driver only   4WD       Yes     USA  25.8
## 29               None Front       Yes     USA  12.2
## 30 Driver & Passenger Front        No     USA  19.3
## 31               None Front       Yes     USA   7.4
## 32               None Front       Yes     USA  10.1
## 33               None Front       Yes     USA  11.3
## 34        Driver only  Rear       Yes     USA  15.9
## 35        Driver only Front       Yes     USA  14.0
## 36        Driver only   4WD       Yes     USA  19.9
## 37        Driver only Front        No     USA  20.2
## 38        Driver only  Rear        No     USA  20.9
## 39               None Front       Yes non-USA   8.4
## 40        Driver only Front       Yes non-USA  12.5
## 41 Driver & Passenger Front       Yes non-USA  19.8
## 42        Driver only Front       Yes non-USA  12.1
## 43 Driver & Passenger Front       Yes non-USA  17.5
## 44               None Front       Yes non-USA   8.0
## 45               None Front       Yes non-USA  10.0
## 46               None Front       Yes non-USA  10.0
## 47               None Front       Yes non-USA  13.9
## 48        Driver only  Rear        No non-USA  47.9
## 49        Driver only Front       Yes non-USA  28.0
## 50 Driver & Passenger  Rear       Yes non-USA  35.2
## 51 Driver & Passenger Front        No     USA  34.3
## 52 Driver & Passenger  Rear        No     USA  36.1
## 53               None Front       Yes non-USA   8.3
## 54               None Front       Yes non-USA  11.6
## 55        Driver only Front       Yes non-USA  16.5
## 56               None   4WD        No non-USA  19.1
## 57        Driver only  Rear       Yes non-USA  32.5
## 58        Driver only  Rear       Yes non-USA  31.9
## 59 Driver & Passenger  Rear        No non-USA  61.9
## 60        Driver only Front       Yes     USA  14.1
## 61               None  Rear        No     USA  14.9
## 62               None Front       Yes non-USA  10.3
## 63        Driver only Front        No non-USA  26.1
## 64        Driver only Front       Yes non-USA  11.8
## 65        Driver only Front       Yes non-USA  15.7
## 66               None Front        No non-USA  19.1
## 67        Driver only Front        No non-USA  21.5
## 68               None Front        No     USA  13.5
## 69        Driver only Front        No     USA  16.3
## 70               None Front        No     USA  19.5
## 71        Driver only Front        No     USA  20.7
## 72               None   4WD       Yes     USA  14.4
## 73               None Front       Yes     USA   9.0
## 74               None Front       Yes     USA  11.1
## 75 Driver & Passenger  Rear       Yes     USA  17.7
## 76               None Front       Yes     USA  18.5
## 77 Driver & Passenger Front        No     USA  24.4
## 78        Driver only Front       Yes non-USA  28.7
## 79        Driver only Front       Yes     USA  11.1
## 80               None   4WD       Yes non-USA   8.4
## 81               None   4WD       Yes non-USA  10.9
## 82        Driver only   4WD       Yes non-USA  19.5
## 83               None Front       Yes non-USA   8.6
## 84        Driver only Front       Yes non-USA   9.8
## 85        Driver only Front       Yes non-USA  18.4
## 86        Driver only Front       Yes non-USA  18.2
## 87        Driver only   4WD       Yes non-USA  22.7
## 88               None Front       Yes non-USA   9.1
## 89               None Front       Yes non-USA  19.7
## 90               None Front       Yes non-USA  20.0
## 91               None Front       Yes non-USA  23.3
## 92        Driver only  Rear       Yes non-USA  22.7
## 93 Driver & Passenger Front       Yes non-USA  26.7

System under test

The dataset helps determine the what effects a price of a car in 1993, with factors the presence of airbags, driving train type, avaiability of manual transmission, and the origin of the machine parts. The factors were chosen in order to match the Fractional Factorial Design criteria. Within the choices, Best Guess was used to select specific factors due to my personal interests and experience with knowledge of cars.

Below are the definitions of the variables:

price: minimum price of the cars

airbag: air bags: none, driver only, or driver and passenger

drive: rear wheel, front wheel, or 4WD

man.trans: manual transmission: yes or no

origin: product origin made in the USA: yes or no

#Display the descriptive summaery 
summary(cars)

##                 airbag     drive    man.trans     origin       price      
##  Driver & Passenger:16   4WD  :10   No :32    non-USA:45   Min.   : 7.40  
##  Driver only       :43   Front:67   Yes:61    USA    :48   1st Qu.:12.20  
##  None              :34   Rear :16                          Median :17.70  
##                                                            Mean   :19.51  
##                                                            3rd Qu.:23.30  
##                                                            Max.   :61.90

head(cars)

##               airbag drive man.trans  origin price
## 1               None Front       Yes non-USA  15.9
## 2 Driver & Passenger Front       Yes non-USA  33.9
## 3        Driver only Front       Yes non-USA  29.1
## 4 Driver & Passenger Front       Yes non-USA  37.7
## 5        Driver only  Rear       Yes non-USA  30.0
## 6        Driver only Front        No     USA  15.7

tail(cars)

##                airbag drive man.trans  origin price
## 88               None Front       Yes non-USA   9.1
## 89               None Front       Yes non-USA  19.7
## 90               None Front       Yes non-USA  20.0
## 91               None Front       Yes non-USA  23.3
## 92        Driver only  Rear       Yes non-USA  22.7
## 93 Driver & Passenger Front       Yes non-USA  26.7

names(cars)

## [1] "airbag"    "drive"     "man.trans" "origin"    "price"

#Display the structure
names(cars)

## [1] "airbag"    "drive"     "man.trans" "origin"    "price"

str(cars)

## 'data.frame':    93 obs. of  5 variables:
##  $ airbag   : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
##  $ drive    : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
##  $ man.trans: Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
##  $ origin   : Factor w/ 2 levels "non-USA","USA": 1 1 1 1 1 2 2 2 2 2 ...
##  $ price    : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...

Factors and Levels

The data contains 5 factors, two 2 level factors and two 3 level factors. The as.factors was not necessary because they were categorical.

#Check the levels
levels(cars$price)

## NULL

levels(cars$airbag)

## [1] "Driver & Passenger" "Driver only"        "None"

levels(cars$drive)

## [1] "4WD"   "Front" "Rear"

levels(cars$man.trans)

## [1] "No"  "Yes"

levels(cars$origin)

## [1] "non-USA" "USA"

The two 2 level factors:

man.trans: Does the car have manual transmission yes or no?

origin: Are product parts of the car made from origin in the USA yes or no?

The two 3 level factors:

airbag: Do the cars have air bags none, driver only, or driver and passenger?

drive: Is the car rear wheel, front wheel, or 4WD?

Continuous Variables (if any)

The data had one continuous variable that was also the dependent variable. It was the price. The price was in USD. Note, the may not reflect the USD worth today because the data was collected in 1993.

Response Variable(s)

The main objective of the experiment was to determine the effects of car’s price Therefore, the price variable also represents the response variable as well.

The Data: How is it organized and what does it look like?

The data consists of N=93 obversations with 6 variables. The data was collected in the year 1993 and the country the data came from was the USA.

In order to satisfy the factorial design with levels of high and low, I needed to make changes to the data. I replaced the factors with binary integer terms of 0,1,and 2.

#Numerize the factors into 0 and 1, instead of categories as is
a = nrow(cars)
airbag = data.frame(a)
drive = data.frame(a)
man.trans = data.frame(a)
origin = data.frame(a)

#Automatically replace the categories as 0,1,2 
for (i in 1:a){
  #For airbags
  if (cars$airbag[i] =="Driver & Passenger"){
    airbag[i,1] = 0
  }
  if (cars$airbag[i] == "Driver only"){
    airbag[i,1] = 1
  }
  if (cars$airbag[i] == "None"){
    airbag[i,1] = 2
  }
  
  #For drive
  if (cars$drive[i] =="4WD"){
    drive[i,1] = 0
  }
  if (cars$drive[i] == "Front"){
    drive[i,1] = 1
  }
  if (cars$drive[i] == "Rear"){
    drive[i,1] = 2
  }
  

  #For man.trans
  if (cars$man.trans[i] == "No"){
    man.trans[i,1] = 0
  } else{
    man.trans[i,1] = 1
  }
  
  #For origin
  if (cars$origin[i] == "non-USA"){
    origin[i,1] = 0
  } else{
    origin[i,1] = 1
  }
}

Here is the new breakdown of the levels:

airbag: driver and passenger = 0, driver only = 1, none = 2

drive: 4WD = 0, front = 1, rear = 2

man.trans: no = 0, yes = 1

origin: no = 0, yes = 1

This was done to be suitable to the FFD and rsm designs.

Here is the view of the new organization:

#Generate the table from the function used above
cars2 = cbind(airbag, drive, man.trans, origin, cars$price)
colnames(cars2) = c("airbag", "drive", "man.trans", "origin", "price")

head(cars2,10)

##    airbag drive man.trans origin price
## 1       2     1         1      0  15.9
## 2       0     1         1      0  33.9
## 3       1     1         1      0  29.1
## 4       0     1         1      0  37.7
## 5       1     2         1      0  30.0
## 6       1     1         0      1  15.7
## 7       1     1         0      1  20.8
## 8       1     2         0      1  23.7
## 9       1     1         0      1  26.3
## 10      1     1         0      1  34.7

tail(cars2, 20)

##    airbag drive man.trans origin price
## 74      2     1         1      1  11.1
## 75      0     2         1      1  17.7
## 76      2     1         1      1  18.5
## 77      0     1         0      1  24.4
## 78      1     1         1      0  28.7
## 79      1     1         1      1  11.1
## 80      2     0         1      0   8.4
## 81      2     0         1      0  10.9
## 82      1     0         1      0  19.5
## 83      2     1         1      0   8.6
## 84      1     1         1      0   9.8
## 85      1     1         1      0  18.4
## 86      1     1         1      0  18.2
## 87      1     0         1      0  22.7
## 88      2     1         1      0   9.1
## 89      2     1         1      0  19.7
## 90      2     1         1      0  20.0
## 91      2     1         1      0  23.3
## 92      1     2         1      0  22.7
## 93      0     1         1      0  26.7

str(cars2)

## 'data.frame':    93 obs. of  5 variables:
##  $ airbag   : num  2 0 1 0 1 1 1 1 1 1 ...
##  $ drive    : num  1 1 1 1 2 1 1 2 1 1 ...
##  $ man.trans: num  1 1 1 1 1 0 0 0 0 0 ...
##  $ origin   : num  0 0 0 0 0 1 1 1 1 1 ...
##  $ price    : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...

2. (Experimental) Design

The experimental design for the fractional factorial design had many procedurals steps that needed to taken.

A fractional factorial design helps reduce the calculations needed.

How will the experiment be organized and conducted to test the hypothesis?

The first step was to make sure the I appropriately construct the correct 2^k design by representing each 3-level factor in terms of 2-level factors. That is what I precisely did by : Here is the decomposition of the 3-level factors:

#Decomposition Outline
C = c(-1,1,-1,1)
D = c(-1,-1,1,1)
X1 = c("Low","Med","Med","High")
X1 = as.data.frame(cbind(C,D,X1))
knitr::kable(X1,align='c')

C	D	X1
-1	-1	Low
1	-1	Med
-1	1	Med
1	1	High

E = c(-1,1,-1,1)
F = c(-1,-1,1,1)
X2 = c("Low","Med","Med","High")
X2 = as.data.frame(cbind(E,F,X2))
knitr::kable(X2, align ='c')

E	F	X2
-1	-1	Low
1	-1	Med
-1	1	Med
1	1	High

For simiplicity, I have included only the key steps in generating the FFD model. The project builds on the previous one.

FFD = (FrF2(8,6, randomize = TRUE))
t2 = as.data.frame(FFD)
A = c(as.integer(t2[,1]))
B = c(as.integer(t2[,2]))
C = c(as.integer(t2[,3]))
D = c(as.integer(t2[,4]))
CD = C*D
E = c(as.integer(t2[,5]))
F = c(as.integer(t2[,6]))
EF = E*F
design = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(t2)
for (i in 1:n){
  if (design$CD[i] == 1){design$airbag[i] <- "Low"}
  if (design$CD[i] == 2){design$airbag[i] <- "Med"}
  if (design$CD[i] == 4){design$airbag[i] <- "High"}
  
  if (design$EF[i] == 1){design$drive[i] <- "Low"}
  if (design$EF[i] == 2){design$drive[i] <- "Med"}
  if (design$EF[i] == 4){design$drive[i] <- "High"}
  
  if (design$A[i] == 1){design$man.trans[i] <- "no"}
  if (design$A[i] == 2){design$man.trans[i] <- "yes"}
  
  if (design$B[i] == 1){design$origin[i] <- "no"}
  if (design$B[i] == 2){design$origin[i] <- "yes"}
  
}

fractionalfactorialdesign <- cbind(t2,design$airbag, design$drive, design$man.trans, design$origin)
summary(fractionalfactorialdesign)

##   A      B      C      D      E      F     design$airbag design$drive
##  -1:4   -1:4   -1:4   -1:4   -1:4   -1:4   High:2        High:2      
##  1 :4   1 :4   1 :4   1 :4   1 :4   1 :4   Low :2        Low :2      
##                                            Med :4        Med :4      
##  design$man.trans design$origin
##  no :4            no :4        
##  yes:4            yes:4        
##

knitr::kable(fractionalfactorialdesign, align = 'c')

A	B	C	D	E	F	design$airbag	design$drive	design$man.trans	design$origin
1	-1	-1	-1	-1	1	Low	Med	yes	no
-1	1	-1	-1	1	-1	Low	Med	no	yes
1	1	1	1	1	1	High	High	yes	yes
-1	-1	1	1	-1	-1	High	Low	no	no
1	-1	1	-1	1	-1	Med	Med	yes	no
-1	-1	-1	1	1	1	Med	High	no	no
-1	1	1	-1	-1	1	Med	Med	no	yes
1	1	-1	1	-1	-1	Med	Low	yes	yes

FFD_main = FrF2(8, nfactors = 6, res3=T)
aliasprint(FFD_main)

## $legend
## [1] A=A B=B C=C D=D E=E F=F
## 
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD

Once the FFD was created, I looked at the aliased structure, me (see below), and the ANOVA results to generate a model (equation).

The FFD design had Resolution III, which meant the me and 2fi are confounded with the 2fi. The me of Factor A are aliased with the 2fi’s.

The equation that will be used in the rsm and formulated previously is shown again below for reference:

# ANOVA from ffd
fit2 <- aov(price~drive + man.trans + origin, data = cars2)
summary(fit2)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## drive        1   1126  1126.2  16.470 0.000106 ***
## man.trans    1    759   758.7  11.096 0.001261 ** 
## origin       1    613   613.1   8.966 0.003561 ** 
## Residuals   89   6086    68.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

model

To ensure that all factors are integers for the rsm, I made sure to do as.integers:

cars2$airbag = as.integer(cars2$airbag)
cars2$drive = as.integer(cars2$drive)
cars2$man.trans = as.integer(cars2$man.trans)
cars2$origin = as.integer(cars2$origin)

What is the rationale for this design?

The rationale for this design was limited as I adapted a pre-existing experiment design dataset. The original factors and procedures to collect the data were done by the experimentor. Therefore, I had less flexibility and control over some aspects.

However, at least for the fractional factorial design, I used it to reduce the number of measurements and used alias to determine the confounding variables and resolution. The aliasing would help check for accuracy. Then, I will validate this by using the Response Surface Methods.

Randomize: What is the Randomization Scheme?

First of all, I assume the data collected was random because the cars were chosen randomly and at a random time frame of the year 1993, including day and time.

Also, to ensure further randomization, I randomized the order of the data:

#Randomize the data
index = sample(1:nrow(cars), 93, replace=FALSE)
rdata = cars[index,]
rdata

##                airbag drive man.trans  origin price
## 46               None Front       Yes non-USA  10.0
## 43 Driver & Passenger Front       Yes non-USA  17.5
## 42        Driver only Front       Yes non-USA  12.1
## 37        Driver only Front        No     USA  20.2
## 39               None Front       Yes non-USA   8.4
## 89               None Front       Yes non-USA  19.7
## 15               None Front        No     USA  15.9
## 32               None Front       Yes     USA  10.1
## 68               None Front        No     USA  13.5
## 85        Driver only Front       Yes non-USA  18.4
## 29               None Front       Yes     USA  12.2
## 57        Driver only  Rear       Yes non-USA  32.5
## 91               None Front       Yes non-USA  23.3
## 34        Driver only  Rear       Yes     USA  15.9
## 67        Driver only Front        No non-USA  21.5
## 7         Driver only Front        No     USA  20.8
## 20 Driver & Passenger Front        No     USA  18.4
## 74               None Front       Yes     USA  11.1
## 11 Driver & Passenger Front        No     USA  40.1
## 28        Driver only   4WD       Yes     USA  25.8
## 63        Driver only Front        No non-USA  26.1
## 40        Driver only Front       Yes non-USA  12.5
## 14 Driver & Passenger  Rear       Yes     USA  15.1
## 55        Driver only Front       Yes non-USA  16.5
## 86        Driver only Front       Yes non-USA  18.2
## 51 Driver & Passenger Front        No     USA  34.3
## 73               None Front       Yes     USA   9.0
## 21 Driver & Passenger Front        No     USA  15.8
## 10        Driver only Front        No     USA  34.7
## 64        Driver only Front       Yes non-USA  11.8
## 2  Driver & Passenger Front       Yes non-USA  33.9
## 52 Driver & Passenger  Rear        No     USA  36.1
## 17               None   4WD        No     USA  16.6
## 80               None   4WD       Yes non-USA   8.4
## 27        Driver only Front        No     USA  15.6
## 70               None Front        No     USA  19.5
## 23               None Front       Yes     USA   9.2
## 53               None Front       Yes non-USA   8.3
## 56               None   4WD        No non-USA  19.1
## 44               None Front       Yes non-USA   8.0
## 90               None Front       Yes non-USA  20.0
## 79        Driver only Front       Yes     USA  11.1
## 49        Driver only Front       Yes non-USA  28.0
## 16               None Front        No     USA  16.3
## 3         Driver only Front       Yes non-USA  29.1
## 61               None  Rear        No     USA  14.9
## 81               None   4WD       Yes non-USA  10.9
## 72               None   4WD       Yes     USA  14.4
## 78        Driver only Front       Yes non-USA  28.7
## 13        Driver only Front       Yes     USA  11.4
## 36        Driver only   4WD       Yes     USA  19.9
## 69        Driver only Front        No     USA  16.3
## 54               None Front       Yes non-USA  11.6
## 65        Driver only Front       Yes non-USA  15.7
## 45               None Front       Yes non-USA  10.0
## 9         Driver only Front        No     USA  26.3
## 83               None Front       Yes non-USA   8.6
## 84        Driver only Front       Yes non-USA   9.8
## 77 Driver & Passenger Front        No     USA  24.4
## 71        Driver only Front        No     USA  20.7
## 75 Driver & Passenger  Rear       Yes     USA  17.7
## 76               None Front       Yes     USA  18.5
## 30 Driver & Passenger Front        No     USA  19.3
## 35        Driver only Front       Yes     USA  14.0
## 48        Driver only  Rear        No non-USA  47.9
## 47               None Front       Yes non-USA  13.9
## 58        Driver only  Rear       Yes non-USA  31.9
## 33               None Front       Yes     USA  11.3
## 92        Driver only  Rear       Yes non-USA  22.7
## 24        Driver only Front       Yes     USA  11.3
## 87        Driver only   4WD       Yes non-USA  22.7
## 88               None Front       Yes non-USA   9.1
## 93 Driver & Passenger Front       Yes non-USA  26.7
## 4  Driver & Passenger Front       Yes non-USA  37.7
## 82        Driver only   4WD       Yes non-USA  19.5
## 5         Driver only  Rear       Yes non-USA  30.0
## 26        Driver only   4WD        No     USA  19.0
## 22        Driver only Front        No     USA  29.5
## 25        Driver only Front       Yes     USA  13.3
## 41 Driver & Passenger Front       Yes non-USA  19.8
## 59 Driver & Passenger  Rear        No non-USA  61.9
## 6         Driver only Front        No     USA  15.7
## 12               None Front       Yes     USA  13.4
## 31               None Front       Yes     USA   7.4
## 8         Driver only  Rear        No     USA  23.7
## 60        Driver only Front       Yes     USA  14.1
## 62               None Front       Yes non-USA  10.3
## 66               None Front        No non-USA  19.1
## 19        Driver only  Rear       Yes     USA  38.0
## 50 Driver & Passenger  Rear       Yes non-USA  35.2
## 18        Driver only  Rear        No     USA  18.8
## 1                None Front       Yes non-USA  15.9
## 38        Driver only  Rear        No     USA  20.9

In addition, further randomization had occurred whie matching and subsetting (selection and assignment). In essence, I randomized many times. First, the data itself. Then, I randomized the order in which the experiment would run. Finally, the last randomization component was the random subset groups I would place them after Matching.

Therefore, there were 3 levels of randomization: measurement, assignment, and performing.

Replicate: Are there replicates and/or repeated measures?

Due to the simplification of the dataset, which was during 1993, I cannot verify if there were any replicates and/or repeated measures. At least for the project, I have not implemented such.

Block: Did you use blocking in the design?

There is no indication that blocking was used in the original experiment. Personally, I did not use blocking in my design as well.

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

Below is the descriptive summary of the data.

summary(cars2)

##      airbag          drive         man.trans          origin      
##  Min.   :0.000   Min.   :0.000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:1.000   1st Qu.:1.000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :1.000   Median :1.000   Median :1.0000   Median :1.0000  
##  Mean   :1.194   Mean   :1.065   Mean   :0.6559   Mean   :0.5161  
##  3rd Qu.:2.000   3rd Qu.:1.000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :2.000   Max.   :2.000   Max.   :1.0000   Max.   :1.0000  
##      price      
##  Min.   : 7.40  
##  1st Qu.:12.20  
##  Median :17.70  
##  Mean   :19.51  
##  3rd Qu.:23.30  
##  Max.   :61.90

head(cars2,10)

##    airbag drive man.trans origin price
## 1       2     1         1      0  15.9
## 2       0     1         1      0  33.9
## 3       1     1         1      0  29.1
## 4       0     1         1      0  37.7
## 5       1     2         1      0  30.0
## 6       1     1         0      1  15.7
## 7       1     1         0      1  20.8
## 8       1     2         0      1  23.7
## 9       1     1         0      1  26.3
## 10      1     1         0      1  34.7

Here are some graphical representations:

#Histogram
hist(cars$price, main = "Histogram of Car Prices")

Boxplots for each factors:

#Boxplots for each factors
boxplot(cars$price~cars$airbag, main ="Price vs Airbag")

boxplot(cars$price~cars$drive, main ="Price vs Drive")

boxplot(cars$price~cars$man.trans, main ="Price vs Man.Trans")

boxplot(cars$price~cars$origin, main ="Price vs Origin")

# Boxplot of overall factors with labeled axes 
boxplot(cars2$price ~ cars2$airbag + cars2$drive+cars2$man.trans+cars2$origin+cars2$price, main="New Boxplot of All Variables")

#Pairs plot

pairs(price~airbag, cars2)

pairs(price~drive, cars2)

pairs(price~man.trans, cars2)

pairs(price~origin, cars2)

pairs(cars2)

Testing

In this experiment, the null hypothesis states that there is no difference between the price of the car because of the airbags, drive, man.trans, and origin.

On the contrary, the alternative hypothesis states that there is a difference between the price of the car because of the airbags, drive, man.trans, and origin.

I kept the results from Project3 using FFD and built on it using rsm.

Using the model from Project3:

Initial ANOVA on all:

fit = aov(price~airbag*drive*man.trans*origin, cars2)
summary(fit)

##                               Df Sum Sq Mean Sq F value   Pr(>F)    
## airbag                         1 2742.6  2742.6  67.570 3.83e-12 ***
## drive                          1  360.4   360.4   8.879 0.003858 ** 
## man.trans                      1  390.9   390.9   9.630 0.002677 ** 
## origin                         1  576.9   576.9  14.214 0.000318 ***
## airbag:drive                   1   99.5    99.5   2.452 0.121482    
## airbag:man.trans               1   10.5    10.5   0.259 0.612404    
## drive:man.trans                1   58.8    58.8   1.448 0.232611    
## airbag:origin                  1  522.8   522.8  12.881 0.000582 ***
## drive:origin                   1  307.8   307.8   7.583 0.007346 ** 
## man.trans:origin               1   91.3    91.3   2.249 0.137794    
## airbag:drive:man.trans         1  100.6   100.6   2.478 0.119569    
## airbag:drive:origin            1   71.9    71.9   1.772 0.187062    
## airbag:man.trans:origin        1    9.2     9.2   0.227 0.634936    
## drive:man.trans:origin         1   90.2    90.2   2.222 0.140120    
## airbag:drive:man.trans:origin  1   25.4    25.4   0.625 0.431608    
## Residuals                     77 3125.3    40.6                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As you can see, there was significance for all the factors. This made sense because the price of cars, especially back then, would matter a lot depending if the car was manual transmission or not or had 4WD. Even today, 4WD drive cars tend to be more expensive than a front or rear drive car. Also, origin is another factor that makes sense because ‘Made in the USA’ is definitely more expensive than a foreign part due to many factors such as labor costs and production/manufacturing costs. Therefore, ‘Made in the USA’ parts would increase the car’s price. Airbags costs should also be included in the car. Making airbags are not free. There is a cost to it, therefore increasing the price of the car. Using good judgement and the meie table, I knew not to consider the 2fi because they were highly aliased. Then, I wanted to differentiate the “more significant ones”. To do so, the best logical way was to refer to the meie table again. Airbags had a relatively small me of -2. Drive, man.trans, and origin had a bigger me of -11.3 and -11.65. Therefore, I decided to include drive, man.trans, and origin into the linear model. Intuitively and using prior knowledge this made sense. Drive is based on the car’s engine and bottom layout design, requiring more time, hence more costs. So, the results made sense.

Therefore, I generated another ANOVA using those factors.

fit2 = aov(price~man.trans +drive + origin, data = cars2)
summary(fit2)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## man.trans    1    915   915.1  13.382 0.000430 ***
## drive        1    970   969.9  14.184 0.000297 ***
## origin       1    613   613.1   8.966 0.003561 ** 
## Residuals   89   6086    68.4                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The results from the ANOVA, my Best Guess experiment, and the me all matched together and were consistent. The ANOVA factors that were significant had a high me. This also matched with my prior background knowledge of cars in general.

The linear model is as follows:

fit.lm = lm(price~ drive+man.trans + origin, data = cars2)
summary(fit.lm)

## 
## Call:
## lm.default(formula = price ~ drive + man.trans + origin, data = cars2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -13.283  -5.997  -2.005   4.731  28.028 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   21.509      2.813   7.645 2.31e-11 ***
## drive          6.182      1.640   3.770 0.000293 ***
## man.trans     -8.604      2.004  -4.292 4.49e-05 ***
## origin        -5.690      1.900  -2.994 0.003561 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.269 on 89 degrees of freedom
## Multiple R-squared:  0.291,  Adjusted R-squared:  0.2671 
## F-statistic: 12.18 on 3 and 89 DF,  p-value: 9.477e-07

This is the new ANOVA fit using the me and aliased information (the same from Project3). Now, I expanded the testing with this model with rsm.

Response Surface Methods (rsm)

I first tested using the 3 factors that I found from the model in Project3. I tried to the find the model that did not give me an “ALIAS” error. I only included the models that gave me some reasonable outputs and no “ALAIS” error. The 2fi’s were not included due to the Resolution III design given from the FFD design.

#Aliased
model=rsm(price~SO(airbag,drive, man.trans, origin), data=cars2)

## Warning in rsm(price ~ SO(airbag, drive, man.trans, origin), data = cars2): Some coefficients are aliased - cannot use 'rsm' methods.
##   Returning an 'lm' object.

summary(model)

## 
## Call:
## rsm(formula = price ~ SO(airbag, drive, man.trans, origin), data = cars2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -10.4869  -3.7063  -0.8791   2.3283  19.9749 
## 
## Coefficients: (2 not defined because of singularities)
##                                                        Estimate Std. Error
## (Intercept)                                            39.99750    9.09023
## FO(airbag, drive, man.trans, origin)airbag            -11.83856    5.72803
## FO(airbag, drive, man.trans, origin)drive              -2.78542    7.62714
## FO(airbag, drive, man.trans, origin)man.trans         -14.48849    6.45127
## FO(airbag, drive, man.trans, origin)origin             -9.32015    6.15339
## TWI(airbag, drive, man.trans, origin)airbag:drive      -0.03507    2.70484
## TWI(airbag, drive, man.trans, origin)airbag:man.trans   4.55544    2.71398
## TWI(airbag, drive, man.trans, origin)airbag:origin      5.55241    2.51099
## TWI(airbag, drive, man.trans, origin)drive:man.trans   -1.61661    3.12726
## TWI(airbag, drive, man.trans, origin)drive:origin      -6.81350    2.95664
## TWI(airbag, drive, man.trans, origin)man.trans:origin   5.15235    3.47151
## PQ(airbag, drive, man.trans, origin)airbag^2            0.08181    1.53921
## PQ(airbag, drive, man.trans, origin)drive^2             5.20849    1.89470
## PQ(airbag, drive, man.trans, origin)man.trans^2              NA         NA
## PQ(airbag, drive, man.trans, origin)origin^2                 NA         NA
##                                                       t value Pr(>|t|)    
## (Intercept)                                             4.400 3.31e-05 ***
## FO(airbag, drive, man.trans, origin)airbag             -2.067  0.04199 *  
## FO(airbag, drive, man.trans, origin)drive              -0.365  0.71593    
## FO(airbag, drive, man.trans, origin)man.trans          -2.246  0.02747 *  
## FO(airbag, drive, man.trans, origin)origin             -1.515  0.13381    
## TWI(airbag, drive, man.trans, origin)airbag:drive      -0.013  0.98969    
## TWI(airbag, drive, man.trans, origin)airbag:man.trans   1.679  0.09715 .  
## TWI(airbag, drive, man.trans, origin)airbag:origin      2.211  0.02987 *  
## TWI(airbag, drive, man.trans, origin)drive:man.trans   -0.517  0.60662    
## TWI(airbag, drive, man.trans, origin)drive:origin      -2.304  0.02379 *  
## TWI(airbag, drive, man.trans, origin)man.trans:origin   1.484  0.14169    
## PQ(airbag, drive, man.trans, origin)airbag^2            0.053  0.95774    
## PQ(airbag, drive, man.trans, origin)drive^2             2.749  0.00739 ** 
## PQ(airbag, drive, man.trans, origin)man.trans^2            NA       NA    
## PQ(airbag, drive, man.trans, origin)origin^2               NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 6.233 on 80 degrees of freedom
## Multiple R-squared:  0.6379, Adjusted R-squared:  0.5836 
## F-statistic: 11.74 on 12 and 80 DF,  p-value: 3.121e-13

aov(model)

## Call:
##    aov.default(formula = model)
## 
## Terms:
##                 FO(airbag, drive, man.trans, origin)
## Sum of Squares                              4070.732
## Deg. of Freedom                                    4
##                 TWI(airbag, drive, man.trans, origin)
## Sum of Squares                               1090.676
## Deg. of Freedom                                     6
##                 PQ(airbag, drive, man.trans, origin) Residuals
## Sum of Squares                               314.386  3108.228
## Deg. of Freedom                                    2        80
## 
## Residual standard error: 6.233205
## 2 out of 15 effects not estimable
## Estimated effects may be unbalanced

The above model gave an alias error, indicating that I try a different order. So, I tried a first order model.

#Non-Aliased
model=rsm(price~FO(airbag,drive, man.trans, origin), data=cars2)
summary(model)

## 
## Call:
## rsm(formula = price ~ FO(airbag, drive, man.trans, origin), data = cars2)
## 
##             Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  30.1856     2.8969 10.4200 < 2.2e-16 ***
## airbag       -6.1889     1.1176 -5.5374 3.139e-07 ***
## drive         3.8479     1.4812  2.5977 0.0109999 *  
## man.trans    -6.9160     1.7624 -3.9242 0.0001723 ***
## origin       -5.5199     1.6458 -3.3539 0.0011769 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.4742, Adjusted R-squared:  0.4503 
## F-statistic: 19.84 on 4 and 88 DF,  p-value: 1.135e-11
## 
## Analysis of Variance Table
## 
## Response: price
##                                      Df Sum Sq Mean Sq F value    Pr(>F)
## FO(airbag, drive, man.trans, origin)  4 4070.7 1017.68 19.8428 1.135e-11
## Residuals                            88 4513.3   51.29                  
## Lack of fit                          21 2057.5   97.98  2.6731  0.001232
## Pure error                           67 2455.7   36.65                  
## 
## Direction of steepest ascent (at radius 1):
##     airbag      drive  man.trans     origin 
## -0.5398817  0.3356648 -0.6033139 -0.4815281 
## 
## Corresponding increment in original units:
##     airbag      drive  man.trans     origin 
## -0.5398817  0.3356648 -0.6033139 -0.4815281

aov(model)

## Call:
##    aov.default(formula = model)
## 
## Terms:
##                 FO(airbag, drive, man.trans, origin) Residuals
## Sum of Squares                              4070.732  4513.289
## Deg. of Freedom                                    4        88
## 
## Residual standard error: 7.161521
## Estimated effects may be unbalanced

NOTE: Result dicussion to follow below

Now, I tested a second model, changing the TWI and PQ terms of the first model. I tested different combinations and tried 8 models in total.

#Test other models 
model3=rsm(price~FO(airbag*drive, man.trans), data=cars2)
summary(model3)

## 
## Call:
## rsm(formula = price ~ FO(airbag * drive, man.trans), data = cars2)
## 
##                           Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)                26.8679     1.9109 14.0606 < 2.2e-16 ***
## c("*", "airbag", "drive")  -2.9355     1.0610 -2.7666  0.006874 ** 
## man.trans                  -6.0209     1.9457 -3.0945  0.002627 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.1766, Adjusted R-squared:  0.1583 
## F-statistic: 9.653 on 2 and 90 DF,  p-value: 0.0001591
## 
## Analysis of Variance Table
## 
## Response: price
##                               Df Sum Sq Mean Sq F value    Pr(>F)
## FO(airbag * drive, man.trans)  2 1516.2  758.10  9.6535 0.0001591
## Residuals                     90 7067.8   78.53                  
## Lack of fit                    4  102.9   25.73  0.3177 0.8654438
## Pure error                    86 6964.9   80.99                  
## 
## Direction of steepest ascent (at radius 1):
## c("*", "airbag", "drive")                 man.trans 
##                -0.4382394                -0.8988583 
## 
## Corresponding increment in original units:
## c("*", "airbag", "drive")                 man.trans 
##                -0.4382394                -0.8988583

model5=rsm(price~FO(airbag*origin), data=cars2)
summary(model5)

## 
## Call:
## rsm(formula = price ~ FO(airbag * origin), data = cars2)
## 
##                            Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)                 21.5663     1.2230 17.6345 < 2.2e-16 ***
## c("*", "airbag", "origin")  -3.4775     1.2644 -2.7503  0.007183 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.07674,    Adjusted R-squared:  0.0666 
## F-statistic: 7.564 on 1 and 91 DF,  p-value: 0.007183
## 
## Analysis of Variance Table
## 
## Response: price
##                     Df Sum Sq Mean Sq F value   Pr(>F)
## FO(airbag * origin)  1  658.8  658.76  7.5640 0.007183
## Residuals           91 7925.3   87.09                 
## Lack of fit          1  106.9  106.92  1.2308 0.270208
## Pure error          90 7818.3   86.87                 
## 
## Direction of steepest ascent (at radius 1):
## c("*", "airbag", "origin") 
##                         -1 
## 
## Corresponding increment in original units:
## c("*", "airbag", "origin") 
##                         -1

model6=rsm(price~FO(airbag^2, drive), data=cars2)
summary(model6)

## 
## Call:
## rsm(formula = price ~ FO(airbag^2, drive), data = cars2)
## 
##                       Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)           20.65901    2.39543  8.6243 2.080e-13 ***
## c("^", "airbag", "2") -2.86813    0.52992 -5.4124 5.107e-07 ***
## drive                  4.10614    1.63047  2.5184   0.01356 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.3445, Adjusted R-squared:   0.33 
## F-statistic: 23.65 on 2 and 90 DF,  p-value: 5.552e-09
## 
## Analysis of Variance Table
## 
## Response: price
##                     Df Sum Sq Mean Sq F value    Pr(>F)
## FO(airbag^2, drive)  2 2957.6 1478.79 23.6545 5.552e-09
## Residuals           90 5626.4   62.52                  
## Lack of fit          5  636.0  127.19  2.1664   0.06534
## Pure error          85 4990.5   58.71                  
## 
## Direction of steepest ascent (at radius 1):
## c("^", "airbag", "2")                 drive 
##            -0.5726362             0.8198096 
## 
## Corresponding increment in original units:
## c("^", "airbag", "2")                 drive 
##            -0.5726362             0.8198096

model7=rsm(price~FO(airbag, drive^2), data=cars2)
summary(model7)

## 
## Call:
## rsm(formula = price ~ FO(airbag, drive^2), data = cars2)
## 
##                      Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)          24.06576    2.07065 11.6223 < 2.2e-16 ***
## airbag               -6.49191    1.16917 -5.5526  2.82e-07 ***
## c("^", "drive", "2")  2.26631    0.67764  3.3444  0.001203 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.3947, Adjusted R-squared:  0.3813 
## F-statistic: 29.35 on 2 and 90 DF,  p-value: 1.542e-10
## 
## Analysis of Variance Table
## 
## Response: price
##                     Df Sum Sq Mean Sq F value    Pr(>F)
## FO(airbag, drive^2)  2 3388.3 1694.14 29.3458 1.542e-10
## Residuals           90 5195.7   57.73                  
## Lack of fit          5  205.3   41.05  0.6992    0.6255
## Pure error          85 4990.5   58.71                  
## 
## Direction of steepest ascent (at radius 1):
##               airbag c("^", "drive", "2") 
##           -0.9441238            0.3295910 
## 
## Corresponding increment in original units:
##               airbag c("^", "drive", "2") 
##           -0.9441238            0.3295910

All 4 models have a p-value less than 0.05 of 0.0001591, 0.0001591, 0.007183, and 5.552e-09 in the order it was executed. This suggests that the TWI and PQ does significantly contribute to the model.

Deeper discussions would follow for the latter cases. From the second models, I wanted to focus on model2, model4, and model8. I excluded these 3 models above to make it easier to look at.

Model2

model2=rsm(price~SO(airbag, drive), data=cars2)
summary(model2)

## 
## Call:
## rsm(formula = price ~ SO(airbag, drive), data = cars2)
## 
##               Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)  27.699582   6.419830  4.3147 4.209e-05 ***
## airbag       -6.951925   6.271269 -1.1085   0.27069    
## drive        -6.810102   7.371851 -0.9238   0.35815    
## airbag:drive  0.082981   3.116621  0.0266   0.97882    
## airbag^2      0.143322   1.828783  0.0784   0.93771    
## drive^2       4.958672   2.198004  2.2560   0.02658 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.4117, Adjusted R-squared:  0.3779 
## F-statistic: 12.18 on 5 and 87 DF,  p-value: 5.91e-09
## 
## Analysis of Variance Table
## 
## Response: price
##                    Df Sum Sq Mean Sq F value    Pr(>F)
## FO(airbag, drive)   2 3103.0 1551.48 26.7301 8.923e-10
## TWI(airbag, drive)  1  107.6  107.60  1.8539   0.17685
## PQ(airbag, drive)   2  323.8  161.90  2.7893   0.06697
## Residuals          87 5049.7   58.04                  
## Lack of fit         2   59.2   29.60  0.5042   0.60581
## Pure error         85 4990.5   58.71                  
## 
## Stationary point of response surface:
##     airbag      drive 
## 24.1124672  0.4849304 
## 
## Eigenanalysis:
## $values
## [1] 4.9590290 0.1429643
## 
## $vectors
##               [,1]         [,2]
## airbag 0.008615357 -0.999962887
## drive  0.999962887  0.008615357

Model4

model4=rsm(price~FO(airbag*man.trans), data=cars2)
summary(model4)

## 
## Call:
## rsm(formula = price ~ FO(airbag * man.trans), data = cars2)
## 
##                               Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)                    24.5667     1.2374 19.8542 < 2.2e-16 ***
## c("*", "airbag", "man.trans")  -5.9532     1.0426 -5.7102 1.403e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.2638, Adjusted R-squared:  0.2557 
## F-statistic: 32.61 on 1 and 91 DF,  p-value: 1.403e-07
## 
## Analysis of Variance Table
## 
## Response: price
##                        Df Sum Sq Mean Sq F value    Pr(>F)
## FO(airbag * man.trans)  1 2264.4 2264.39 32.6062 1.403e-07
## Residuals              91 6319.6   69.45                  
## Lack of fit             1   55.3   55.31  0.7947    0.3751
## Pure error             90 6264.3   69.60                  
## 
## Direction of steepest ascent (at radius 1):
## c("*", "airbag", "man.trans") 
##                            -1 
## 
## Corresponding increment in original units:
## c("*", "airbag", "man.trans") 
##                            -1

Model8:

model8=rsm(price~FO(airbag^2, drive^2, origin^2), data=cars2)
summary(model8)

## 
## Call:
## rsm(formula = price ~ FO(airbag^2, drive^2, origin^2), data = cars2)
## 
##                       Estimate Std. Error t value  Pr(>|t|)    
## (Intercept)           23.01509    1.96063 11.7386 < 2.2e-16 ***
## c("^", "airbag", "2") -2.77252    0.51526 -5.3809 5.944e-07 ***
## c("^", "drive", "2")   2.33267    0.68067  3.4270 0.0009257 ***
## c("^", "origin", "2") -2.81879    1.58617 -1.7771 0.0789679 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Multiple R-squared:  0.3975, Adjusted R-squared:  0.3772 
## F-statistic: 19.58 on 3 and 89 DF,  p-value: 7.834e-10
## 
## Analysis of Variance Table
## 
## Response: price
##                                 Df Sum Sq Mean Sq F value    Pr(>F)
## FO(airbag^2, drive^2, origin^2)  3 3412.4 1137.48 19.5754 7.834e-10
## Residuals                       89 5171.6   58.11                  
## Lack of fit                     11 1215.7  110.52  2.1792   0.02387
## Pure error                      78 3955.8   50.72                  
## 
## Direction of steepest ascent (at radius 1):
## c("^", "airbag", "2")  c("^", "drive", "2") c("^", "origin", "2") 
##            -0.6039533             0.5081385            -0.6140323 
## 
## Corresponding increment in original units:
## c("^", "airbag", "2")  c("^", "drive", "2") c("^", "origin", "2") 
##            -0.6039533             0.5081385            -0.6140323

NOTE: Result dicussion to follow below

Result Discussion:Do either of the optimal designs (in natural units) make physical sense for the problem?

First of all, the results. For the first model, that is testing the FO of the model from Project3, the p-value was 1.135e-11, which was significantly less than 0.05. This indicated the FO model contributed significantly to the model. The same applied for Models2 and 8 because the p-values were significantly less than 0.05.

For all 4 cases considered so far, except Model2, I was able to reject the null hypothesis because the factors of airbag, drive, man.trans, and origin had a significant effect on the price of car. Note, the reason why airbag was added to this model was because if I used the model developed in Project3 strictly, then I got an alias error. Therefore, I had to make small modifications by adding ‘airbag’. This was justified because initially in Project3, airbag was still highly significant, but I excluded as it was “comparetively” less significant to the other 3 factors. Technically, I could have added “airbag” in Project3’s model. Thus, I would proceed with the modification. The results were consistent with the FFD design.

In terms of Model2, there was more to talk about. Model2 was a second order model. Examining the p-values of the factors, only drive^2 was significant. The other factors were actually near 0.5 or near 1. This was inconsistent to the previous models. However, when looking at the p-value of the model itself, the SO model was insignificant. The stationary points of the response surface were airbag: 24.1124672 and drive: 0.4849304.

Let’s look at the Canonical Analysis. Looking at the Eignevalues, point eigenvalues were positive, “4.9590290 0.1429643”. Since both eigenvalues were positive, the stationary point is of maximum response. The maximum point can be inferred to be at airbag.

In terms of analzing whether the optimal designs (in natural units) make physical sense for the problem, I was able to take an attempt by comparing the dimensionless quantities I specified, which would be translated to physical units, to a Central Composite Design (CCD). A Central Composite Design has points at +-1,0, and +-1.414. Mapping these quantities to the Central Composite Design, it makes sense because the vectors are roughly mapped to 1,0, and -1. This implies that in this specific case of the model, the levels of the factors are at points 1 and -1 of the “CCD cube”. Perhaps, in this case because the factors are mapped at -1, 0, and 1, maybe the levels of the factors are all within the design cube space. None of the points are outside of the cube. This makes sense because it is difficult to test the effects of those outside of the cube. This is not a Box-Behnken because there were no points that were in between 0 and 1. The values are very close to -1, 0, and 1. Therefore, the levels of the factors that were tested overall could include: 0 - Center Point, (0,-1.414), (1,-1) - high/low, (1,1) - high/high, (-1,1) - low/high, and (-1,-1) - low/low. After close examination of the SO case, the levels of the factors for airbag was tested as (0,-0.99999), suggesting that this could be a point outside the box (depending on R’s rounding rule), and the drive factor was tested as (1.4,0), also outside the box. As you can see, by looking at the values mapped in this space, they make sense to the physically if comparing to the CCD. The values can be logically mapped to the box points. In our case for the SO model, there were no points in the Center and no points tested as low/low or high/high. Therefore, I am confident that the optimal designs in natural units make some physical sense for the problem because the points can be mapped to the real physical value.

The response surfaces can be visually interpreted by studying the contour plots below:

par(mfrow=c(2,3))
contour(model,~ airbag+drive+man.trans+origin, image=TRUE, at=summary(model2$canoncial$xs))

par(mfrow=c(1,1))
contour(model,~ airbag+drive, image=TRUE, at=summary(model2$canoncial$xs))

contour(model,~ airbag+man.trans, image=TRUE, at=summary(model2$canoncial$xs))

contour(model,~ airbag+origin, image=TRUE, at=summary(model2$canoncial$xs))

contour(model,~ drive+origin, image=TRUE, at=summary(model2$canoncial$xs))

contour(model,~ drive+man.trans, image=TRUE, at=summary(model2$canoncial$xs))

contour(model,~ man.trans+origin, image=TRUE, at=summary(model2$canoncial$xs))

persp(model,~airbag+drive,image=TRUE,contour="colors",zlab="Price",theta=30)

## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter

## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter

## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

persp(model,~airbag+man.trans,image=TRUE,contour="colors",zlab="Price",theta=30)

## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter

## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter

## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

persp(model,~airbag+origin,image=TRUE,contour="colors",zlab="Price",theta=30)

## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter

## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter

## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

persp(model,~drive+origin,image=TRUE,contour="colors",zlab="Price",theta=30)

## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter

## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter

## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

persp(model,~drive+man.trans,image=TRUE,contour="colors",zlab="Price",theta=30)

## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter

## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter

## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

persp(model,~man.trans+origin,image=TRUE,contour="colors",zlab="Price",theta=30)

## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter

## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter

## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter

Estimation

confint(fit)

##                                     2.5 %    97.5 %
## (Intercept)                   -53.4077848 28.807785
## airbag                         -7.3762505 37.296250
## drive                          15.5037495 60.176250
## man.trans                     -10.7883286 74.678802
## origin                        -11.2860669 74.654884
## airbag:drive                  -29.3861378 -4.013862
## airbag:man.trans              -43.9039425  3.836701
## drive:man.trans               -54.3021648 -5.519062
## airbag:origin                 -39.3929001  8.871018
## drive:origin                  -55.7455438 -6.476929
## man.trans:origin              -75.3952600 19.385139
## airbag:drive:man.trans         -0.4831718 29.311586
## airbag:drive:origin            -2.7706219 26.962826
## airbag:man.trans:origin       -14.3652170 41.819390
## drive:man.trans:origin         -7.5805897 49.164888
## airbag:drive:man.trans:origin -26.4456237 11.414042

coef(fit)

##                   (Intercept)                        airbag 
##                    -12.300000                     14.960000 
##                         drive                     man.trans 
##                     37.840000                     31.945237 
##                        origin                  airbag:drive 
##                     31.684409                    -16.700000 
##              airbag:man.trans               drive:man.trans 
##                    -20.033621                    -29.910613 
##                 airbag:origin                  drive:origin 
##                    -15.260941                    -31.111237 
##              man.trans:origin        airbag:drive:man.trans 
##                    -28.005061                     14.414207 
##           airbag:drive:origin       airbag:man.trans:origin 
##                     12.096102                     13.727087 
##        drive:man.trans:origin airbag:drive:man.trans:origin 
##                     20.792149                     -7.515791

Diagnostics / Model Adequacy Checking

To check if the normality assumption holds, I used the qqnorm and qqlines. I also did a Shapiro-test.

shapiro.test(cars2$price)

## 
##  Shapiro-Wilk normality test
## 
## data:  cars2$price
## W = 0.88051, p-value = 4.235e-07

shapiro.test(1/(cars2$price))

## 
##  Shapiro-Wilk normality test
## 
## data:  1/(cars2$price)
## W = 0.95218, p-value = 0.00186

shapiro.test((cars2$price)^2)

## 
##  Shapiro-Wilk normality test
## 
## data:  (cars2$price)^2
## W = 0.6641, p-value = 2.839e-13

shapiro.test(log(cars2$price))

## 
##  Shapiro-Wilk normality test
## 
## data:  log(cars2$price)
## W = 0.9841, p-value = 0.32

shapiro.test(sqrt(cars2$price))

## 
##  Shapiro-Wilk normality test
## 
## data:  sqrt(cars2$price)
## W = 0.95076, p-value = 0.001511

Normality was valid as is. In fact, transformations like log and sqrt, made the normality assumption worse.

#Check for Normality
qqnorm(residuals(fit2))
qqline(residuals(fit2))

#Residual Plot
plot(fitted(fit2),residuals(fit2))

Based on the qqnorm and qqline, the qqnorm line looks fairly linear and closely fits the qqline drawn. Also, the residual plot showed an evenly distributed residual as a function of fitted values. Also, there was a good scatter and not close to 0. The pattern seemed random and evenly distributed, two important conditions that satisfy the normality check. Therefore, the normality assumption was satisfied because the residuals were evenly distributed and the qqnorm line is linear enough. The normality check passed. There is no need to do transformation because the normality assumption was passed.

Summary: Overall, the experiment finding was an interesting one. The null hypothesis was rejected because the price of cars depended on the factors, like drive, man.trans, and origin. The fractional factorial design was implemented and I looked into the me and ie as well. The aliasing structure was examined and the Resolution was Resolution III. Then, as an extension, I looked at a response surface method by testing different models. The model’s results in the response surface methods were fairly consistent with that of the fractional factorial design. In general, the airbag, drive, man.trans, and origin had a great impact, while the 2fi were not reliable.

For the findings, it was interesting to explore the design procedure.

The analysis made sense and the fractional factorial design and response surface methods were well executed.

4. References to the literature

1.https://vincentarelbundock.github.io/Rdatasets/doc/MASS/Cars93.html

http://support.minitab.com/en-us/minitab/17/topic-library/modeling-statistics/doe/response-surface-designs/what-is-a-central-composite-design/

5. Appendices

#Benjamin Byeon
#12/16/16
#Section 1

#Load the package
library("FrF2")
library("rsm")
library("rgl")

#Load the data
cars = read.csv("cars.csv", header = TRUE, sep=",")
cars

#System under test
summary(cars)
head(cars)
tail(cars)
names(cars)

#Display the structure
names(cars)
str(cars)

#Check the levels
levels(cars$price)
levels(cars$airbag)
levels(cars$drive)
levels(cars$man.trans)
levels(cars$origin)


#Decomposition Outline
C = c(-1,1,-1,1)
D = c(-1,-1,1,1)
X1 = c("Low","Med","Med","High")
X1 = as.data.frame(cbind(C,D,X1))
kable(X1,align='c')

E = c(-1,1,-1,1)
F = c(-1,-1,1,1)
X2 = c("Low","Med","Med","High")
X2 = as.data.frame(cbind(C,D,X2))
kable(X2, align ='c')

#Numerize the factors into 0 and 1, instead of categories as is
a = nrow(cars)
airbag = data.frame(a)
drive = data.frame(a)
man.trans = data.frame(a)
origin = data.frame(a)

#Automatically replace the categories as 0,1,2 
for (i in 1:a){
  #For airbags
  if (cars$airbag[i] =="Driver & Passenger"){
    airbag[i,1] = 0
  }
  if (cars$airbag[i] == "Driver only"){
    airbag[i,1] = 1
  }
  if (cars$airbag[i] == "None"){
    airbag[i,1] = 2
  }
  
  #For drive
  if (cars$drive[i] =="4WD"){
    drive[i,1] = 0
  }
  if (cars$drive[i] == "Front"){
    drive[i,1] = 1
  }
  if (cars$drive[i] == "Rear"){
    drive[i,1] = 2
  }
  
  
  #For man.trans
  if (cars$man.trans[i] == "No"){
    man.trans[i,1] = 0
  } else{
    man.trans[i,1] = 1
  }
  
  #For origin
  if (cars$origin[i] == "non-USA"){
    origin[i,1] = 0
  } else{
    origin[i,1] = 1
  }
}
cars2 = cbind(airbag, drive, man.trans, origin, cars$price)
colnames(cars2) = c("airbag", "drive", "man.trans", "origin", "price")

#Randomize the data
index = sample(1:nrow(cars), 93, replace=FALSE)
rdata = cars[index,]
rdata

#Display the descriptive statistics 
summary(cars)

#Histogram
hist(cars$price, main = "Histogram of Car Prices")

#Boxplots for each factors
boxplot(cars$price~cars$airbag, main ="Price vs Airbag")
boxplot(cars$price~cars$drive, main ="Price vs Drive")
boxplot(cars$price~cars$man.trans, main ="Price vs Man.Trans")
boxplot(cars$price~cars$origin, main ="Price vs Origin")

#Pairs plot

pairs(price~airbag, cars2)
pairs(price~drive, cars2)
pairs(price~man.trans, cars2)
pairs(price~origin, cars2)

pairs(cars2)

#New levels
head(cars2,10)
tail(cars2, 20)
str(cars2)

# Boxplot of overall factors with labeled axes 
boxplot(cars2$price ~ cars2$airbag + cars2$drive+cars2$man.trans+cars2$origin+cars2$price, main="New Boxplot of All Variables")

#FFD
FFD = (FrF2(8,6, randomize = TRUE))
t2 = as.data.frame(FFD)
A = c(as.integer(t2[,1]))
B = c(as.integer(t2[,2]))
C = c(as.integer(t2[,3]))
D = c(as.integer(t2[,4]))
CD = C*D
E = c(as.integer(t2[,5]))
F = c(as.integer(t2[,6]))
EF = E*F
design = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))

n = nrow(t2)
for (i in 1:n){
  if (design$CD[i] == 1){design$airbag[i] <- "Low"}
  if (design$CD[i] == 2){design$airbag[i] <- "Med"}
  if (design$CD[i] == 4){design$airbag[i] <- "High"}
  
  if (design$EF[i] == 1){design$drive[i] <- "Low"}
  if (design$EF[i] == 2){design$drive[i] <- "Med"}
  if (design$EF[i] == 4){design$drive[i] <- "High"}
  
  if (design$A[i] == 1){design$man.trans[i] <- "no"}
  if (design$A[i] == 2){design$man.trans[i] <- "yes"}
  
  if (design$B[i] == 1){design$origin[i] <- "no"}
  if (design$B[i] == 2){design$origin[i] <- "yes"}
  
}

fractionalfactorialdesign <- cbind(t2,design$airbag, design$drive, design$man.trans, design$origin)
summary(fractionalfactorialdesign)
kable(fractionalfactorialdesign, align = 'c')

FFD_main = FrF2(8, nfactors = 6, res3=T)
aliasprint(FFD_main)


# ANOVA from ffd
fit2 <- aov(price~drive + man.trans + origin, data = cars2)
summary(fit2)

#as.integer settings for rsm
cars2$airbag = as.integer(cars2$airbag)
cars2$drive = as.integer(cars2$drive)
cars2$man.trans = as.integer(cars2$man.trans)
cars2$origin = as.integer(cars2$origin)

#Finding the one without alias
model=rsm(price~FO(airbag,drive, man.trans, origin), data=cars2)
summary(model)
aov(model)

#ANOVA again
fit = aov(price~airbag*drive*man.trans*origin, cars2)
summary(fit)

#Test other models 
model2=rsm(price~SO(airbag, drive), data=cars2)
summary(model2)

model3=rsm(price~FO(airbag*drive, man.trans), data=cars2)
summary(model3)

model4=rsm(price~FO(airbag*man.trans), data=cars2)
summary(model4)

model5=rsm(price~FO(airbag*origin), data=cars2)
summary(model5)

model6=rsm(price~FO(airbag^2, drive), data=cars2)
summary(model6)

model7=rsm(price~FO(airbag, drive^2), data=cars2)
summary(model7)

model8=rsm(price~FO(airbag^2, drive^2, origin^2), data=cars2)
summary(model8)

#Contour plots
par(mfrow=c(2,3))
contour(model,~ airbag+drive+man.trans+origin, image=TRUE, at=summary(model2$canoncial$xs))

par(mfrow=c(1,1))
contour(model,~ airbag+drive, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ airbag+man.trans, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ airbag+origin, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ drive+origin, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ drive+man.trans, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ man.trans+origin, image=TRUE, at=summary(model2$canoncial$xs))

persp(model,~airbag+drive,image=TRUE,contour="colors",zlab="Price",theta=30)
persp(model,~airbag+man.trans,image=TRUE,contour="colors",zlab="Price",theta=30)
persp(model,~airbag+origin,image=TRUE,contour="colors",zlab="Price",theta=30)
persp(model,~drive+origin,image=TRUE,contour="colors",zlab="Price",theta=30)
persp(model,~drive+man.trans,image=TRUE,contour="colors",zlab="Price",theta=30)
persp(model,~man.trans+origin,image=TRUE,contour="colors",zlab="Price",theta=30)

#Estimation
confint(fit)
coef(fit)

#Check for Normality
shapiro.test(cars2$price)
shapiro.test(1/(cars2$price))
shapiro.test((cars2$price)^2)
shapiro.test(log(cars2$price))
shapiro.test(sqrt(cars2$price))

qqnorm(residuals(fit))
qqline(residuals(fit))

#Residual Plot
plot(fitted(fit),residuals(fit))

Car Prices

Benjamin Byeon

December 16, 2016

DOE Project4

Benjamin Byeon

Rensselaer Polytechnic Institute - Troy, NY

V1.12.16.16

1. Setting

System under test

Factors and Levels

Continuous Variables (if any)

Response Variable(s)

The Data: How is it organized and what does it look like?

2. (Experimental) Design

How will the experiment be organized and conducted to test the hypothesis?

What is the rationale for this design?

Randomize: What is the Randomization Scheme?

Replicate: Are there replicates and/or repeated measures?

Block: Did you use blocking in the design?

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

Testing

Response Surface Methods (rsm)

Result Discussion:Do either of the optimal designs (in natural units) make physical sense for the problem?

Estimation

Diagnostics / Model Adequacy Checking

4. References to the literature

5. Appendices