From the Ecdat Package, I selected a dataset that contained N=93 observations. The data looks into different factors that affect the pricing of cars. The study contains one response variable and two 2 level factors and two 3 level factors, which would be decomposed into 2 two level factors. Further analysis were made regarding the study.
The study is an extension and continuation of Project3. I looked at the Fractional Factorial Design, and now using that I apply Response Surface Methods.
The data was copied and pasted into Excel, and then saved as a .csv. Shown below is the loaded data into R.
#Load the package
library("FrF2")
## Warning: package 'FrF2' was built under R version 3.3.2
## Loading required package: DoE.base
## Warning: package 'DoE.base' was built under R version 3.3.2
## Loading required package: grid
## Loading required package: conf.design
## Warning: package 'conf.design' was built under R version 3.3.2
##
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
##
## aov, lm
## The following object is masked from 'package:graphics':
##
## plot.design
## The following object is masked from 'package:base':
##
## lengths
library("rsm")
## Warning: package 'rsm' was built under R version 3.3.2
library("rgl")
## Warning: package 'rgl' was built under R version 3.3.2
#Load the data
cars = read.csv("cars.csv", header = TRUE, sep=",")
cars
## airbag drive man.trans origin price
## 1 None Front Yes non-USA 15.9
## 2 Driver & Passenger Front Yes non-USA 33.9
## 3 Driver only Front Yes non-USA 29.1
## 4 Driver & Passenger Front Yes non-USA 37.7
## 5 Driver only Rear Yes non-USA 30.0
## 6 Driver only Front No USA 15.7
## 7 Driver only Front No USA 20.8
## 8 Driver only Rear No USA 23.7
## 9 Driver only Front No USA 26.3
## 10 Driver only Front No USA 34.7
## 11 Driver & Passenger Front No USA 40.1
## 12 None Front Yes USA 13.4
## 13 Driver only Front Yes USA 11.4
## 14 Driver & Passenger Rear Yes USA 15.1
## 15 None Front No USA 15.9
## 16 None Front No USA 16.3
## 17 None 4WD No USA 16.6
## 18 Driver only Rear No USA 18.8
## 19 Driver only Rear Yes USA 38.0
## 20 Driver & Passenger Front No USA 18.4
## 21 Driver & Passenger Front No USA 15.8
## 22 Driver only Front No USA 29.5
## 23 None Front Yes USA 9.2
## 24 Driver only Front Yes USA 11.3
## 25 Driver only Front Yes USA 13.3
## 26 Driver only 4WD No USA 19.0
## 27 Driver only Front No USA 15.6
## 28 Driver only 4WD Yes USA 25.8
## 29 None Front Yes USA 12.2
## 30 Driver & Passenger Front No USA 19.3
## 31 None Front Yes USA 7.4
## 32 None Front Yes USA 10.1
## 33 None Front Yes USA 11.3
## 34 Driver only Rear Yes USA 15.9
## 35 Driver only Front Yes USA 14.0
## 36 Driver only 4WD Yes USA 19.9
## 37 Driver only Front No USA 20.2
## 38 Driver only Rear No USA 20.9
## 39 None Front Yes non-USA 8.4
## 40 Driver only Front Yes non-USA 12.5
## 41 Driver & Passenger Front Yes non-USA 19.8
## 42 Driver only Front Yes non-USA 12.1
## 43 Driver & Passenger Front Yes non-USA 17.5
## 44 None Front Yes non-USA 8.0
## 45 None Front Yes non-USA 10.0
## 46 None Front Yes non-USA 10.0
## 47 None Front Yes non-USA 13.9
## 48 Driver only Rear No non-USA 47.9
## 49 Driver only Front Yes non-USA 28.0
## 50 Driver & Passenger Rear Yes non-USA 35.2
## 51 Driver & Passenger Front No USA 34.3
## 52 Driver & Passenger Rear No USA 36.1
## 53 None Front Yes non-USA 8.3
## 54 None Front Yes non-USA 11.6
## 55 Driver only Front Yes non-USA 16.5
## 56 None 4WD No non-USA 19.1
## 57 Driver only Rear Yes non-USA 32.5
## 58 Driver only Rear Yes non-USA 31.9
## 59 Driver & Passenger Rear No non-USA 61.9
## 60 Driver only Front Yes USA 14.1
## 61 None Rear No USA 14.9
## 62 None Front Yes non-USA 10.3
## 63 Driver only Front No non-USA 26.1
## 64 Driver only Front Yes non-USA 11.8
## 65 Driver only Front Yes non-USA 15.7
## 66 None Front No non-USA 19.1
## 67 Driver only Front No non-USA 21.5
## 68 None Front No USA 13.5
## 69 Driver only Front No USA 16.3
## 70 None Front No USA 19.5
## 71 Driver only Front No USA 20.7
## 72 None 4WD Yes USA 14.4
## 73 None Front Yes USA 9.0
## 74 None Front Yes USA 11.1
## 75 Driver & Passenger Rear Yes USA 17.7
## 76 None Front Yes USA 18.5
## 77 Driver & Passenger Front No USA 24.4
## 78 Driver only Front Yes non-USA 28.7
## 79 Driver only Front Yes USA 11.1
## 80 None 4WD Yes non-USA 8.4
## 81 None 4WD Yes non-USA 10.9
## 82 Driver only 4WD Yes non-USA 19.5
## 83 None Front Yes non-USA 8.6
## 84 Driver only Front Yes non-USA 9.8
## 85 Driver only Front Yes non-USA 18.4
## 86 Driver only Front Yes non-USA 18.2
## 87 Driver only 4WD Yes non-USA 22.7
## 88 None Front Yes non-USA 9.1
## 89 None Front Yes non-USA 19.7
## 90 None Front Yes non-USA 20.0
## 91 None Front Yes non-USA 23.3
## 92 Driver only Rear Yes non-USA 22.7
## 93 Driver & Passenger Front Yes non-USA 26.7
The dataset helps determine the what effects a price of a car in 1993, with factors the presence of airbags, driving train type, avaiability of manual transmission, and the origin of the machine parts. The factors were chosen in order to match the Fractional Factorial Design criteria. Within the choices, Best Guess was used to select specific factors due to my personal interests and experience with knowledge of cars.
Below are the definitions of the variables:
price: minimum price of the cars
airbag: air bags: none, driver only, or driver and passenger
drive: rear wheel, front wheel, or 4WD
man.trans: manual transmission: yes or no
origin: product origin made in the USA: yes or no
#Display the descriptive summaery
summary(cars)
## airbag drive man.trans origin price
## Driver & Passenger:16 4WD :10 No :32 non-USA:45 Min. : 7.40
## Driver only :43 Front:67 Yes:61 USA :48 1st Qu.:12.20
## None :34 Rear :16 Median :17.70
## Mean :19.51
## 3rd Qu.:23.30
## Max. :61.90
head(cars)
## airbag drive man.trans origin price
## 1 None Front Yes non-USA 15.9
## 2 Driver & Passenger Front Yes non-USA 33.9
## 3 Driver only Front Yes non-USA 29.1
## 4 Driver & Passenger Front Yes non-USA 37.7
## 5 Driver only Rear Yes non-USA 30.0
## 6 Driver only Front No USA 15.7
tail(cars)
## airbag drive man.trans origin price
## 88 None Front Yes non-USA 9.1
## 89 None Front Yes non-USA 19.7
## 90 None Front Yes non-USA 20.0
## 91 None Front Yes non-USA 23.3
## 92 Driver only Rear Yes non-USA 22.7
## 93 Driver & Passenger Front Yes non-USA 26.7
names(cars)
## [1] "airbag" "drive" "man.trans" "origin" "price"
#Display the structure
names(cars)
## [1] "airbag" "drive" "man.trans" "origin" "price"
str(cars)
## 'data.frame': 93 obs. of 5 variables:
## $ airbag : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
## $ drive : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
## $ man.trans: Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
## $ origin : Factor w/ 2 levels "non-USA","USA": 1 1 1 1 1 2 2 2 2 2 ...
## $ price : num 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
The data contains 5 factors, two 2 level factors and two 3 level factors. The as.factors was not necessary because they were categorical.
#Check the levels
levels(cars$price)
## NULL
levels(cars$airbag)
## [1] "Driver & Passenger" "Driver only" "None"
levels(cars$drive)
## [1] "4WD" "Front" "Rear"
levels(cars$man.trans)
## [1] "No" "Yes"
levels(cars$origin)
## [1] "non-USA" "USA"
The two 2 level factors:
man.trans: Does the car have manual transmission yes or no?
origin: Are product parts of the car made from origin in the USA yes or no?
The two 3 level factors:
airbag: Do the cars have air bags none, driver only, or driver and passenger?
drive: Is the car rear wheel, front wheel, or 4WD?
The data had one continuous variable that was also the dependent variable. It was the price. The price was in USD. Note, the may not reflect the USD worth today because the data was collected in 1993.
The main objective of the experiment was to determine the effects of car’s price Therefore, the price variable also represents the response variable as well.
The data consists of N=93 obversations with 6 variables. The data was collected in the year 1993 and the country the data came from was the USA.
In order to satisfy the factorial design with levels of high and low, I needed to make changes to the data. I replaced the factors with binary integer terms of 0,1,and 2.
#Numerize the factors into 0 and 1, instead of categories as is
a = nrow(cars)
airbag = data.frame(a)
drive = data.frame(a)
man.trans = data.frame(a)
origin = data.frame(a)
#Automatically replace the categories as 0,1,2
for (i in 1:a){
#For airbags
if (cars$airbag[i] =="Driver & Passenger"){
airbag[i,1] = 0
}
if (cars$airbag[i] == "Driver only"){
airbag[i,1] = 1
}
if (cars$airbag[i] == "None"){
airbag[i,1] = 2
}
#For drive
if (cars$drive[i] =="4WD"){
drive[i,1] = 0
}
if (cars$drive[i] == "Front"){
drive[i,1] = 1
}
if (cars$drive[i] == "Rear"){
drive[i,1] = 2
}
#For man.trans
if (cars$man.trans[i] == "No"){
man.trans[i,1] = 0
} else{
man.trans[i,1] = 1
}
#For origin
if (cars$origin[i] == "non-USA"){
origin[i,1] = 0
} else{
origin[i,1] = 1
}
}
Here is the new breakdown of the levels:
airbag: driver and passenger = 0, driver only = 1, none = 2
drive: 4WD = 0, front = 1, rear = 2
man.trans: no = 0, yes = 1
origin: no = 0, yes = 1
This was done to be suitable to the FFD and rsm designs.
Here is the view of the new organization:
#Generate the table from the function used above
cars2 = cbind(airbag, drive, man.trans, origin, cars$price)
colnames(cars2) = c("airbag", "drive", "man.trans", "origin", "price")
head(cars2,10)
## airbag drive man.trans origin price
## 1 2 1 1 0 15.9
## 2 0 1 1 0 33.9
## 3 1 1 1 0 29.1
## 4 0 1 1 0 37.7
## 5 1 2 1 0 30.0
## 6 1 1 0 1 15.7
## 7 1 1 0 1 20.8
## 8 1 2 0 1 23.7
## 9 1 1 0 1 26.3
## 10 1 1 0 1 34.7
tail(cars2, 20)
## airbag drive man.trans origin price
## 74 2 1 1 1 11.1
## 75 0 2 1 1 17.7
## 76 2 1 1 1 18.5
## 77 0 1 0 1 24.4
## 78 1 1 1 0 28.7
## 79 1 1 1 1 11.1
## 80 2 0 1 0 8.4
## 81 2 0 1 0 10.9
## 82 1 0 1 0 19.5
## 83 2 1 1 0 8.6
## 84 1 1 1 0 9.8
## 85 1 1 1 0 18.4
## 86 1 1 1 0 18.2
## 87 1 0 1 0 22.7
## 88 2 1 1 0 9.1
## 89 2 1 1 0 19.7
## 90 2 1 1 0 20.0
## 91 2 1 1 0 23.3
## 92 1 2 1 0 22.7
## 93 0 1 1 0 26.7
str(cars2)
## 'data.frame': 93 obs. of 5 variables:
## $ airbag : num 2 0 1 0 1 1 1 1 1 1 ...
## $ drive : num 1 1 1 1 2 1 1 2 1 1 ...
## $ man.trans: num 1 1 1 1 1 0 0 0 0 0 ...
## $ origin : num 0 0 0 0 0 1 1 1 1 1 ...
## $ price : num 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
The experimental design for the fractional factorial design had many procedurals steps that needed to taken.
A fractional factorial design helps reduce the calculations needed.
The first step was to make sure the I appropriately construct the correct 2^k design by representing each 3-level factor in terms of 2-level factors. That is what I precisely did by : Here is the decomposition of the 3-level factors:
#Decomposition Outline
C = c(-1,1,-1,1)
D = c(-1,-1,1,1)
X1 = c("Low","Med","Med","High")
X1 = as.data.frame(cbind(C,D,X1))
knitr::kable(X1,align='c')
| C | D | X1 |
|---|---|---|
| -1 | -1 | Low |
| 1 | -1 | Med |
| -1 | 1 | Med |
| 1 | 1 | High |
E = c(-1,1,-1,1)
F = c(-1,-1,1,1)
X2 = c("Low","Med","Med","High")
X2 = as.data.frame(cbind(E,F,X2))
knitr::kable(X2, align ='c')
| E | F | X2 |
|---|---|---|
| -1 | -1 | Low |
| 1 | -1 | Med |
| -1 | 1 | Med |
| 1 | 1 | High |
For simiplicity, I have included only the key steps in generating the FFD model. The project builds on the previous one.
FFD = (FrF2(8,6, randomize = TRUE))
t2 = as.data.frame(FFD)
A = c(as.integer(t2[,1]))
B = c(as.integer(t2[,2]))
C = c(as.integer(t2[,3]))
D = c(as.integer(t2[,4]))
CD = C*D
E = c(as.integer(t2[,5]))
F = c(as.integer(t2[,6]))
EF = E*F
design = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(t2)
for (i in 1:n){
if (design$CD[i] == 1){design$airbag[i] <- "Low"}
if (design$CD[i] == 2){design$airbag[i] <- "Med"}
if (design$CD[i] == 4){design$airbag[i] <- "High"}
if (design$EF[i] == 1){design$drive[i] <- "Low"}
if (design$EF[i] == 2){design$drive[i] <- "Med"}
if (design$EF[i] == 4){design$drive[i] <- "High"}
if (design$A[i] == 1){design$man.trans[i] <- "no"}
if (design$A[i] == 2){design$man.trans[i] <- "yes"}
if (design$B[i] == 1){design$origin[i] <- "no"}
if (design$B[i] == 2){design$origin[i] <- "yes"}
}
fractionalfactorialdesign <- cbind(t2,design$airbag, design$drive, design$man.trans, design$origin)
summary(fractionalfactorialdesign)
## A B C D E F design$airbag design$drive
## -1:4 -1:4 -1:4 -1:4 -1:4 -1:4 High:2 High:2
## 1 :4 1 :4 1 :4 1 :4 1 :4 1 :4 Low :2 Low :2
## Med :4 Med :4
## design$man.trans design$origin
## no :4 no :4
## yes:4 yes:4
##
knitr::kable(fractionalfactorialdesign, align = 'c')
| A | B | C | D | E | F | design$airbag | design$drive | design$man.trans | design$origin |
|---|---|---|---|---|---|---|---|---|---|
| 1 | -1 | -1 | -1 | -1 | 1 | Low | Med | yes | no |
| -1 | 1 | -1 | -1 | 1 | -1 | Low | Med | no | yes |
| 1 | 1 | 1 | 1 | 1 | 1 | High | High | yes | yes |
| -1 | -1 | 1 | 1 | -1 | -1 | High | Low | no | no |
| 1 | -1 | 1 | -1 | 1 | -1 | Med | Med | yes | no |
| -1 | -1 | -1 | 1 | 1 | 1 | Med | High | no | no |
| -1 | 1 | 1 | -1 | -1 | 1 | Med | Med | no | yes |
| 1 | 1 | -1 | 1 | -1 | -1 | Med | Low | yes | yes |
FFD_main = FrF2(8, nfactors = 6, res3=T)
aliasprint(FFD_main)
## $legend
## [1] A=A B=B C=C D=D E=E F=F
##
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
##
## $fi2
## [1] AF=BE=CD
Once the FFD was created, I looked at the aliased structure, me (see below), and the ANOVA results to generate a model (equation).
me
The FFD design had Resolution III, which meant the me and 2fi are confounded with the 2fi. The me of Factor A are aliased with the 2fi’s.
The equation that will be used in the rsm and formulated previously is shown again below for reference:
# ANOVA from ffd
fit2 <- aov(price~drive + man.trans + origin, data = cars2)
summary(fit2)
## Df Sum Sq Mean Sq F value Pr(>F)
## drive 1 1126 1126.2 16.470 0.000106 ***
## man.trans 1 759 758.7 11.096 0.001261 **
## origin 1 613 613.1 8.966 0.003561 **
## Residuals 89 6086 68.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model
To ensure that all factors are integers for the rsm, I made sure to do as.integers:
cars2$airbag = as.integer(cars2$airbag)
cars2$drive = as.integer(cars2$drive)
cars2$man.trans = as.integer(cars2$man.trans)
cars2$origin = as.integer(cars2$origin)
The rationale for this design was limited as I adapted a pre-existing experiment design dataset. The original factors and procedures to collect the data were done by the experimentor. Therefore, I had less flexibility and control over some aspects.
However, at least for the fractional factorial design, I used it to reduce the number of measurements and used alias to determine the confounding variables and resolution. The aliasing would help check for accuracy. Then, I will validate this by using the Response Surface Methods.
First of all, I assume the data collected was random because the cars were chosen randomly and at a random time frame of the year 1993, including day and time.
Also, to ensure further randomization, I randomized the order of the data:
#Randomize the data
index = sample(1:nrow(cars), 93, replace=FALSE)
rdata = cars[index,]
rdata
## airbag drive man.trans origin price
## 46 None Front Yes non-USA 10.0
## 43 Driver & Passenger Front Yes non-USA 17.5
## 42 Driver only Front Yes non-USA 12.1
## 37 Driver only Front No USA 20.2
## 39 None Front Yes non-USA 8.4
## 89 None Front Yes non-USA 19.7
## 15 None Front No USA 15.9
## 32 None Front Yes USA 10.1
## 68 None Front No USA 13.5
## 85 Driver only Front Yes non-USA 18.4
## 29 None Front Yes USA 12.2
## 57 Driver only Rear Yes non-USA 32.5
## 91 None Front Yes non-USA 23.3
## 34 Driver only Rear Yes USA 15.9
## 67 Driver only Front No non-USA 21.5
## 7 Driver only Front No USA 20.8
## 20 Driver & Passenger Front No USA 18.4
## 74 None Front Yes USA 11.1
## 11 Driver & Passenger Front No USA 40.1
## 28 Driver only 4WD Yes USA 25.8
## 63 Driver only Front No non-USA 26.1
## 40 Driver only Front Yes non-USA 12.5
## 14 Driver & Passenger Rear Yes USA 15.1
## 55 Driver only Front Yes non-USA 16.5
## 86 Driver only Front Yes non-USA 18.2
## 51 Driver & Passenger Front No USA 34.3
## 73 None Front Yes USA 9.0
## 21 Driver & Passenger Front No USA 15.8
## 10 Driver only Front No USA 34.7
## 64 Driver only Front Yes non-USA 11.8
## 2 Driver & Passenger Front Yes non-USA 33.9
## 52 Driver & Passenger Rear No USA 36.1
## 17 None 4WD No USA 16.6
## 80 None 4WD Yes non-USA 8.4
## 27 Driver only Front No USA 15.6
## 70 None Front No USA 19.5
## 23 None Front Yes USA 9.2
## 53 None Front Yes non-USA 8.3
## 56 None 4WD No non-USA 19.1
## 44 None Front Yes non-USA 8.0
## 90 None Front Yes non-USA 20.0
## 79 Driver only Front Yes USA 11.1
## 49 Driver only Front Yes non-USA 28.0
## 16 None Front No USA 16.3
## 3 Driver only Front Yes non-USA 29.1
## 61 None Rear No USA 14.9
## 81 None 4WD Yes non-USA 10.9
## 72 None 4WD Yes USA 14.4
## 78 Driver only Front Yes non-USA 28.7
## 13 Driver only Front Yes USA 11.4
## 36 Driver only 4WD Yes USA 19.9
## 69 Driver only Front No USA 16.3
## 54 None Front Yes non-USA 11.6
## 65 Driver only Front Yes non-USA 15.7
## 45 None Front Yes non-USA 10.0
## 9 Driver only Front No USA 26.3
## 83 None Front Yes non-USA 8.6
## 84 Driver only Front Yes non-USA 9.8
## 77 Driver & Passenger Front No USA 24.4
## 71 Driver only Front No USA 20.7
## 75 Driver & Passenger Rear Yes USA 17.7
## 76 None Front Yes USA 18.5
## 30 Driver & Passenger Front No USA 19.3
## 35 Driver only Front Yes USA 14.0
## 48 Driver only Rear No non-USA 47.9
## 47 None Front Yes non-USA 13.9
## 58 Driver only Rear Yes non-USA 31.9
## 33 None Front Yes USA 11.3
## 92 Driver only Rear Yes non-USA 22.7
## 24 Driver only Front Yes USA 11.3
## 87 Driver only 4WD Yes non-USA 22.7
## 88 None Front Yes non-USA 9.1
## 93 Driver & Passenger Front Yes non-USA 26.7
## 4 Driver & Passenger Front Yes non-USA 37.7
## 82 Driver only 4WD Yes non-USA 19.5
## 5 Driver only Rear Yes non-USA 30.0
## 26 Driver only 4WD No USA 19.0
## 22 Driver only Front No USA 29.5
## 25 Driver only Front Yes USA 13.3
## 41 Driver & Passenger Front Yes non-USA 19.8
## 59 Driver & Passenger Rear No non-USA 61.9
## 6 Driver only Front No USA 15.7
## 12 None Front Yes USA 13.4
## 31 None Front Yes USA 7.4
## 8 Driver only Rear No USA 23.7
## 60 Driver only Front Yes USA 14.1
## 62 None Front Yes non-USA 10.3
## 66 None Front No non-USA 19.1
## 19 Driver only Rear Yes USA 38.0
## 50 Driver & Passenger Rear Yes non-USA 35.2
## 18 Driver only Rear No USA 18.8
## 1 None Front Yes non-USA 15.9
## 38 Driver only Rear No USA 20.9
In addition, further randomization had occurred whie matching and subsetting (selection and assignment). In essence, I randomized many times. First, the data itself. Then, I randomized the order in which the experiment would run. Finally, the last randomization component was the random subset groups I would place them after Matching.
Therefore, there were 3 levels of randomization: measurement, assignment, and performing.
Due to the simplification of the dataset, which was during 1993, I cannot verify if there were any replicates and/or repeated measures. At least for the project, I have not implemented such.
There is no indication that blocking was used in the original experiment. Personally, I did not use blocking in my design as well.
Below is the descriptive summary of the data.
summary(cars2)
## airbag drive man.trans origin
## Min. :0.000 Min. :0.000 Min. :0.0000 Min. :0.0000
## 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :1.000 Median :1.000 Median :1.0000 Median :1.0000
## Mean :1.194 Mean :1.065 Mean :0.6559 Mean :0.5161
## 3rd Qu.:2.000 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :2.000 Max. :2.000 Max. :1.0000 Max. :1.0000
## price
## Min. : 7.40
## 1st Qu.:12.20
## Median :17.70
## Mean :19.51
## 3rd Qu.:23.30
## Max. :61.90
head(cars2,10)
## airbag drive man.trans origin price
## 1 2 1 1 0 15.9
## 2 0 1 1 0 33.9
## 3 1 1 1 0 29.1
## 4 0 1 1 0 37.7
## 5 1 2 1 0 30.0
## 6 1 1 0 1 15.7
## 7 1 1 0 1 20.8
## 8 1 2 0 1 23.7
## 9 1 1 0 1 26.3
## 10 1 1 0 1 34.7
Here are some graphical representations:
#Histogram
hist(cars$price, main = "Histogram of Car Prices")
Boxplots for each factors:
#Boxplots for each factors
boxplot(cars$price~cars$airbag, main ="Price vs Airbag")
boxplot(cars$price~cars$drive, main ="Price vs Drive")
boxplot(cars$price~cars$man.trans, main ="Price vs Man.Trans")
boxplot(cars$price~cars$origin, main ="Price vs Origin")
# Boxplot of overall factors with labeled axes
boxplot(cars2$price ~ cars2$airbag + cars2$drive+cars2$man.trans+cars2$origin+cars2$price, main="New Boxplot of All Variables")
#Pairs plot
pairs(price~airbag, cars2)
pairs(price~drive, cars2)
pairs(price~man.trans, cars2)
pairs(price~origin, cars2)
pairs(cars2)
In this experiment, the null hypothesis states that there is no difference between the price of the car because of the airbags, drive, man.trans, and origin.
On the contrary, the alternative hypothesis states that there is a difference between the price of the car because of the airbags, drive, man.trans, and origin.
I kept the results from Project3 using FFD and built on it using rsm.
Using the model from Project3:
Initial ANOVA on all:
fit = aov(price~airbag*drive*man.trans*origin, cars2)
summary(fit)
## Df Sum Sq Mean Sq F value Pr(>F)
## airbag 1 2742.6 2742.6 67.570 3.83e-12 ***
## drive 1 360.4 360.4 8.879 0.003858 **
## man.trans 1 390.9 390.9 9.630 0.002677 **
## origin 1 576.9 576.9 14.214 0.000318 ***
## airbag:drive 1 99.5 99.5 2.452 0.121482
## airbag:man.trans 1 10.5 10.5 0.259 0.612404
## drive:man.trans 1 58.8 58.8 1.448 0.232611
## airbag:origin 1 522.8 522.8 12.881 0.000582 ***
## drive:origin 1 307.8 307.8 7.583 0.007346 **
## man.trans:origin 1 91.3 91.3 2.249 0.137794
## airbag:drive:man.trans 1 100.6 100.6 2.478 0.119569
## airbag:drive:origin 1 71.9 71.9 1.772 0.187062
## airbag:man.trans:origin 1 9.2 9.2 0.227 0.634936
## drive:man.trans:origin 1 90.2 90.2 2.222 0.140120
## airbag:drive:man.trans:origin 1 25.4 25.4 0.625 0.431608
## Residuals 77 3125.3 40.6
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
As you can see, there was significance for all the factors. This made sense because the price of cars, especially back then, would matter a lot depending if the car was manual transmission or not or had 4WD. Even today, 4WD drive cars tend to be more expensive than a front or rear drive car. Also, origin is another factor that makes sense because ‘Made in the USA’ is definitely more expensive than a foreign part due to many factors such as labor costs and production/manufacturing costs. Therefore, ‘Made in the USA’ parts would increase the car’s price. Airbags costs should also be included in the car. Making airbags are not free. There is a cost to it, therefore increasing the price of the car. Using good judgement and the meie table, I knew not to consider the 2fi because they were highly aliased. Then, I wanted to differentiate the “more significant ones”. To do so, the best logical way was to refer to the meie table again. Airbags had a relatively small me of -2. Drive, man.trans, and origin had a bigger me of -11.3 and -11.65. Therefore, I decided to include drive, man.trans, and origin into the linear model. Intuitively and using prior knowledge this made sense. Drive is based on the car’s engine and bottom layout design, requiring more time, hence more costs. So, the results made sense.
Therefore, I generated another ANOVA using those factors.
fit2 = aov(price~man.trans +drive + origin, data = cars2)
summary(fit2)
## Df Sum Sq Mean Sq F value Pr(>F)
## man.trans 1 915 915.1 13.382 0.000430 ***
## drive 1 970 969.9 14.184 0.000297 ***
## origin 1 613 613.1 8.966 0.003561 **
## Residuals 89 6086 68.4
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The results from the ANOVA, my Best Guess experiment, and the me all matched together and were consistent. The ANOVA factors that were significant had a high me. This also matched with my prior background knowledge of cars in general.
The linear model is as follows:
fit.lm = lm(price~ drive+man.trans + origin, data = cars2)
summary(fit.lm)
##
## Call:
## lm.default(formula = price ~ drive + man.trans + origin, data = cars2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13.283 -5.997 -2.005 4.731 28.028
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.509 2.813 7.645 2.31e-11 ***
## drive 6.182 1.640 3.770 0.000293 ***
## man.trans -8.604 2.004 -4.292 4.49e-05 ***
## origin -5.690 1.900 -2.994 0.003561 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.269 on 89 degrees of freedom
## Multiple R-squared: 0.291, Adjusted R-squared: 0.2671
## F-statistic: 12.18 on 3 and 89 DF, p-value: 9.477e-07
This is the new ANOVA fit using the me and aliased information (the same from Project3). Now, I expanded the testing with this model with rsm.
I first tested using the 3 factors that I found from the model in Project3. I tried to the find the model that did not give me an “ALIAS” error. I only included the models that gave me some reasonable outputs and no “ALAIS” error. The 2fi’s were not included due to the Resolution III design given from the FFD design.
#Aliased
model=rsm(price~SO(airbag,drive, man.trans, origin), data=cars2)
## Warning in rsm(price ~ SO(airbag, drive, man.trans, origin), data = cars2): Some coefficients are aliased - cannot use 'rsm' methods.
## Returning an 'lm' object.
summary(model)
##
## Call:
## rsm(formula = price ~ SO(airbag, drive, man.trans, origin), data = cars2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -10.4869 -3.7063 -0.8791 2.3283 19.9749
##
## Coefficients: (2 not defined because of singularities)
## Estimate Std. Error
## (Intercept) 39.99750 9.09023
## FO(airbag, drive, man.trans, origin)airbag -11.83856 5.72803
## FO(airbag, drive, man.trans, origin)drive -2.78542 7.62714
## FO(airbag, drive, man.trans, origin)man.trans -14.48849 6.45127
## FO(airbag, drive, man.trans, origin)origin -9.32015 6.15339
## TWI(airbag, drive, man.trans, origin)airbag:drive -0.03507 2.70484
## TWI(airbag, drive, man.trans, origin)airbag:man.trans 4.55544 2.71398
## TWI(airbag, drive, man.trans, origin)airbag:origin 5.55241 2.51099
## TWI(airbag, drive, man.trans, origin)drive:man.trans -1.61661 3.12726
## TWI(airbag, drive, man.trans, origin)drive:origin -6.81350 2.95664
## TWI(airbag, drive, man.trans, origin)man.trans:origin 5.15235 3.47151
## PQ(airbag, drive, man.trans, origin)airbag^2 0.08181 1.53921
## PQ(airbag, drive, man.trans, origin)drive^2 5.20849 1.89470
## PQ(airbag, drive, man.trans, origin)man.trans^2 NA NA
## PQ(airbag, drive, man.trans, origin)origin^2 NA NA
## t value Pr(>|t|)
## (Intercept) 4.400 3.31e-05 ***
## FO(airbag, drive, man.trans, origin)airbag -2.067 0.04199 *
## FO(airbag, drive, man.trans, origin)drive -0.365 0.71593
## FO(airbag, drive, man.trans, origin)man.trans -2.246 0.02747 *
## FO(airbag, drive, man.trans, origin)origin -1.515 0.13381
## TWI(airbag, drive, man.trans, origin)airbag:drive -0.013 0.98969
## TWI(airbag, drive, man.trans, origin)airbag:man.trans 1.679 0.09715 .
## TWI(airbag, drive, man.trans, origin)airbag:origin 2.211 0.02987 *
## TWI(airbag, drive, man.trans, origin)drive:man.trans -0.517 0.60662
## TWI(airbag, drive, man.trans, origin)drive:origin -2.304 0.02379 *
## TWI(airbag, drive, man.trans, origin)man.trans:origin 1.484 0.14169
## PQ(airbag, drive, man.trans, origin)airbag^2 0.053 0.95774
## PQ(airbag, drive, man.trans, origin)drive^2 2.749 0.00739 **
## PQ(airbag, drive, man.trans, origin)man.trans^2 NA NA
## PQ(airbag, drive, man.trans, origin)origin^2 NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 6.233 on 80 degrees of freedom
## Multiple R-squared: 0.6379, Adjusted R-squared: 0.5836
## F-statistic: 11.74 on 12 and 80 DF, p-value: 3.121e-13
aov(model)
## Call:
## aov.default(formula = model)
##
## Terms:
## FO(airbag, drive, man.trans, origin)
## Sum of Squares 4070.732
## Deg. of Freedom 4
## TWI(airbag, drive, man.trans, origin)
## Sum of Squares 1090.676
## Deg. of Freedom 6
## PQ(airbag, drive, man.trans, origin) Residuals
## Sum of Squares 314.386 3108.228
## Deg. of Freedom 2 80
##
## Residual standard error: 6.233205
## 2 out of 15 effects not estimable
## Estimated effects may be unbalanced
The above model gave an alias error, indicating that I try a different order. So, I tried a first order model.
#Non-Aliased
model=rsm(price~FO(airbag,drive, man.trans, origin), data=cars2)
summary(model)
##
## Call:
## rsm(formula = price ~ FO(airbag, drive, man.trans, origin), data = cars2)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.1856 2.8969 10.4200 < 2.2e-16 ***
## airbag -6.1889 1.1176 -5.5374 3.139e-07 ***
## drive 3.8479 1.4812 2.5977 0.0109999 *
## man.trans -6.9160 1.7624 -3.9242 0.0001723 ***
## origin -5.5199 1.6458 -3.3539 0.0011769 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.4742, Adjusted R-squared: 0.4503
## F-statistic: 19.84 on 4 and 88 DF, p-value: 1.135e-11
##
## Analysis of Variance Table
##
## Response: price
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(airbag, drive, man.trans, origin) 4 4070.7 1017.68 19.8428 1.135e-11
## Residuals 88 4513.3 51.29
## Lack of fit 21 2057.5 97.98 2.6731 0.001232
## Pure error 67 2455.7 36.65
##
## Direction of steepest ascent (at radius 1):
## airbag drive man.trans origin
## -0.5398817 0.3356648 -0.6033139 -0.4815281
##
## Corresponding increment in original units:
## airbag drive man.trans origin
## -0.5398817 0.3356648 -0.6033139 -0.4815281
aov(model)
## Call:
## aov.default(formula = model)
##
## Terms:
## FO(airbag, drive, man.trans, origin) Residuals
## Sum of Squares 4070.732 4513.289
## Deg. of Freedom 4 88
##
## Residual standard error: 7.161521
## Estimated effects may be unbalanced
NOTE: Result dicussion to follow below
Now, I tested a second model, changing the TWI and PQ terms of the first model. I tested different combinations and tried 8 models in total.
#Test other models
model3=rsm(price~FO(airbag*drive, man.trans), data=cars2)
summary(model3)
##
## Call:
## rsm(formula = price ~ FO(airbag * drive, man.trans), data = cars2)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.8679 1.9109 14.0606 < 2.2e-16 ***
## c("*", "airbag", "drive") -2.9355 1.0610 -2.7666 0.006874 **
## man.trans -6.0209 1.9457 -3.0945 0.002627 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.1766, Adjusted R-squared: 0.1583
## F-statistic: 9.653 on 2 and 90 DF, p-value: 0.0001591
##
## Analysis of Variance Table
##
## Response: price
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(airbag * drive, man.trans) 2 1516.2 758.10 9.6535 0.0001591
## Residuals 90 7067.8 78.53
## Lack of fit 4 102.9 25.73 0.3177 0.8654438
## Pure error 86 6964.9 80.99
##
## Direction of steepest ascent (at radius 1):
## c("*", "airbag", "drive") man.trans
## -0.4382394 -0.8988583
##
## Corresponding increment in original units:
## c("*", "airbag", "drive") man.trans
## -0.4382394 -0.8988583
model5=rsm(price~FO(airbag*origin), data=cars2)
summary(model5)
##
## Call:
## rsm(formula = price ~ FO(airbag * origin), data = cars2)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.5663 1.2230 17.6345 < 2.2e-16 ***
## c("*", "airbag", "origin") -3.4775 1.2644 -2.7503 0.007183 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.07674, Adjusted R-squared: 0.0666
## F-statistic: 7.564 on 1 and 91 DF, p-value: 0.007183
##
## Analysis of Variance Table
##
## Response: price
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(airbag * origin) 1 658.8 658.76 7.5640 0.007183
## Residuals 91 7925.3 87.09
## Lack of fit 1 106.9 106.92 1.2308 0.270208
## Pure error 90 7818.3 86.87
##
## Direction of steepest ascent (at radius 1):
## c("*", "airbag", "origin")
## -1
##
## Corresponding increment in original units:
## c("*", "airbag", "origin")
## -1
model6=rsm(price~FO(airbag^2, drive), data=cars2)
summary(model6)
##
## Call:
## rsm(formula = price ~ FO(airbag^2, drive), data = cars2)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 20.65901 2.39543 8.6243 2.080e-13 ***
## c("^", "airbag", "2") -2.86813 0.52992 -5.4124 5.107e-07 ***
## drive 4.10614 1.63047 2.5184 0.01356 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.3445, Adjusted R-squared: 0.33
## F-statistic: 23.65 on 2 and 90 DF, p-value: 5.552e-09
##
## Analysis of Variance Table
##
## Response: price
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(airbag^2, drive) 2 2957.6 1478.79 23.6545 5.552e-09
## Residuals 90 5626.4 62.52
## Lack of fit 5 636.0 127.19 2.1664 0.06534
## Pure error 85 4990.5 58.71
##
## Direction of steepest ascent (at radius 1):
## c("^", "airbag", "2") drive
## -0.5726362 0.8198096
##
## Corresponding increment in original units:
## c("^", "airbag", "2") drive
## -0.5726362 0.8198096
model7=rsm(price~FO(airbag, drive^2), data=cars2)
summary(model7)
##
## Call:
## rsm(formula = price ~ FO(airbag, drive^2), data = cars2)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.06576 2.07065 11.6223 < 2.2e-16 ***
## airbag -6.49191 1.16917 -5.5526 2.82e-07 ***
## c("^", "drive", "2") 2.26631 0.67764 3.3444 0.001203 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.3947, Adjusted R-squared: 0.3813
## F-statistic: 29.35 on 2 and 90 DF, p-value: 1.542e-10
##
## Analysis of Variance Table
##
## Response: price
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(airbag, drive^2) 2 3388.3 1694.14 29.3458 1.542e-10
## Residuals 90 5195.7 57.73
## Lack of fit 5 205.3 41.05 0.6992 0.6255
## Pure error 85 4990.5 58.71
##
## Direction of steepest ascent (at radius 1):
## airbag c("^", "drive", "2")
## -0.9441238 0.3295910
##
## Corresponding increment in original units:
## airbag c("^", "drive", "2")
## -0.9441238 0.3295910
All 4 models have a p-value less than 0.05 of 0.0001591, 0.0001591, 0.007183, and 5.552e-09 in the order it was executed. This suggests that the TWI and PQ does significantly contribute to the model.
Deeper discussions would follow for the latter cases. From the second models, I wanted to focus on model2, model4, and model8. I excluded these 3 models above to make it easier to look at.
Model2
model2=rsm(price~SO(airbag, drive), data=cars2)
summary(model2)
##
## Call:
## rsm(formula = price ~ SO(airbag, drive), data = cars2)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 27.699582 6.419830 4.3147 4.209e-05 ***
## airbag -6.951925 6.271269 -1.1085 0.27069
## drive -6.810102 7.371851 -0.9238 0.35815
## airbag:drive 0.082981 3.116621 0.0266 0.97882
## airbag^2 0.143322 1.828783 0.0784 0.93771
## drive^2 4.958672 2.198004 2.2560 0.02658 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.4117, Adjusted R-squared: 0.3779
## F-statistic: 12.18 on 5 and 87 DF, p-value: 5.91e-09
##
## Analysis of Variance Table
##
## Response: price
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(airbag, drive) 2 3103.0 1551.48 26.7301 8.923e-10
## TWI(airbag, drive) 1 107.6 107.60 1.8539 0.17685
## PQ(airbag, drive) 2 323.8 161.90 2.7893 0.06697
## Residuals 87 5049.7 58.04
## Lack of fit 2 59.2 29.60 0.5042 0.60581
## Pure error 85 4990.5 58.71
##
## Stationary point of response surface:
## airbag drive
## 24.1124672 0.4849304
##
## Eigenanalysis:
## $values
## [1] 4.9590290 0.1429643
##
## $vectors
## [,1] [,2]
## airbag 0.008615357 -0.999962887
## drive 0.999962887 0.008615357
Model4
model4=rsm(price~FO(airbag*man.trans), data=cars2)
summary(model4)
##
## Call:
## rsm(formula = price ~ FO(airbag * man.trans), data = cars2)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.5667 1.2374 19.8542 < 2.2e-16 ***
## c("*", "airbag", "man.trans") -5.9532 1.0426 -5.7102 1.403e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.2638, Adjusted R-squared: 0.2557
## F-statistic: 32.61 on 1 and 91 DF, p-value: 1.403e-07
##
## Analysis of Variance Table
##
## Response: price
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(airbag * man.trans) 1 2264.4 2264.39 32.6062 1.403e-07
## Residuals 91 6319.6 69.45
## Lack of fit 1 55.3 55.31 0.7947 0.3751
## Pure error 90 6264.3 69.60
##
## Direction of steepest ascent (at radius 1):
## c("*", "airbag", "man.trans")
## -1
##
## Corresponding increment in original units:
## c("*", "airbag", "man.trans")
## -1
Model8:
model8=rsm(price~FO(airbag^2, drive^2, origin^2), data=cars2)
summary(model8)
##
## Call:
## rsm(formula = price ~ FO(airbag^2, drive^2, origin^2), data = cars2)
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.01509 1.96063 11.7386 < 2.2e-16 ***
## c("^", "airbag", "2") -2.77252 0.51526 -5.3809 5.944e-07 ***
## c("^", "drive", "2") 2.33267 0.68067 3.4270 0.0009257 ***
## c("^", "origin", "2") -2.81879 1.58617 -1.7771 0.0789679 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Multiple R-squared: 0.3975, Adjusted R-squared: 0.3772
## F-statistic: 19.58 on 3 and 89 DF, p-value: 7.834e-10
##
## Analysis of Variance Table
##
## Response: price
## Df Sum Sq Mean Sq F value Pr(>F)
## FO(airbag^2, drive^2, origin^2) 3 3412.4 1137.48 19.5754 7.834e-10
## Residuals 89 5171.6 58.11
## Lack of fit 11 1215.7 110.52 2.1792 0.02387
## Pure error 78 3955.8 50.72
##
## Direction of steepest ascent (at radius 1):
## c("^", "airbag", "2") c("^", "drive", "2") c("^", "origin", "2")
## -0.6039533 0.5081385 -0.6140323
##
## Corresponding increment in original units:
## c("^", "airbag", "2") c("^", "drive", "2") c("^", "origin", "2")
## -0.6039533 0.5081385 -0.6140323
NOTE: Result dicussion to follow below
First of all, the results. For the first model, that is testing the FO of the model from Project3, the p-value was 1.135e-11, which was significantly less than 0.05. This indicated the FO model contributed significantly to the model. The same applied for Models2 and 8 because the p-values were significantly less than 0.05.
For all 4 cases considered so far, except Model2, I was able to reject the null hypothesis because the factors of airbag, drive, man.trans, and origin had a significant effect on the price of car. Note, the reason why airbag was added to this model was because if I used the model developed in Project3 strictly, then I got an alias error. Therefore, I had to make small modifications by adding ‘airbag’. This was justified because initially in Project3, airbag was still highly significant, but I excluded as it was “comparetively” less significant to the other 3 factors. Technically, I could have added “airbag” in Project3’s model. Thus, I would proceed with the modification. The results were consistent with the FFD design.
In terms of Model2, there was more to talk about. Model2 was a second order model. Examining the p-values of the factors, only drive^2 was significant. The other factors were actually near 0.5 or near 1. This was inconsistent to the previous models. However, when looking at the p-value of the model itself, the SO model was insignificant. The stationary points of the response surface were airbag: 24.1124672 and drive: 0.4849304.
Let’s look at the Canonical Analysis. Looking at the Eignevalues, point eigenvalues were positive, “4.9590290 0.1429643”. Since both eigenvalues were positive, the stationary point is of maximum response. The maximum point can be inferred to be at airbag.
In terms of analzing whether the optimal designs (in natural units) make physical sense for the problem, I was able to take an attempt by comparing the dimensionless quantities I specified, which would be translated to physical units, to a Central Composite Design (CCD). A Central Composite Design has points at +-1,0, and +-1.414. Mapping these quantities to the Central Composite Design, it makes sense because the vectors are roughly mapped to 1,0, and -1. This implies that in this specific case of the model, the levels of the factors are at points 1 and -1 of the “CCD cube”. Perhaps, in this case because the factors are mapped at -1, 0, and 1, maybe the levels of the factors are all within the design cube space. None of the points are outside of the cube. This makes sense because it is difficult to test the effects of those outside of the cube. This is not a Box-Behnken because there were no points that were in between 0 and 1. The values are very close to -1, 0, and 1. Therefore, the levels of the factors that were tested overall could include: 0 - Center Point, (0,-1.414), (1,-1) - high/low, (1,1) - high/high, (-1,1) - low/high, and (-1,-1) - low/low. After close examination of the SO case, the levels of the factors for airbag was tested as (0,-0.99999), suggesting that this could be a point outside the box (depending on R’s rounding rule), and the drive factor was tested as (1.4,0), also outside the box. As you can see, by looking at the values mapped in this space, they make sense to the physically if comparing to the CCD. The values can be logically mapped to the box points. In our case for the SO model, there were no points in the Center and no points tested as low/low or high/high. Therefore, I am confident that the optimal designs in natural units make some physical sense for the problem because the points can be mapped to the real physical value.
The response surfaces can be visually interpreted by studying the contour plots below:
par(mfrow=c(2,3))
contour(model,~ airbag+drive+man.trans+origin, image=TRUE, at=summary(model2$canoncial$xs))
par(mfrow=c(1,1))
contour(model,~ airbag+drive, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ airbag+man.trans, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ airbag+origin, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ drive+origin, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ drive+man.trans, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ man.trans+origin, image=TRUE, at=summary(model2$canoncial$xs))
persp(model,~airbag+drive,image=TRUE,contour="colors",zlab="Price",theta=30)
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter
persp(model,~airbag+man.trans,image=TRUE,contour="colors",zlab="Price",theta=30)
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter
persp(model,~airbag+origin,image=TRUE,contour="colors",zlab="Price",theta=30)
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter
persp(model,~drive+origin,image=TRUE,contour="colors",zlab="Price",theta=30)
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter
persp(model,~drive+man.trans,image=TRUE,contour="colors",zlab="Price",theta=30)
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter
persp(model,~man.trans+origin,image=TRUE,contour="colors",zlab="Price",theta=30)
## Warning in persp.default(dat$x, dat$y, dat$z, zlim = dat$zlim, theta =
## theta, : "image" is not a graphical parameter
## Warning in persp.default(dat$x, dat$y, dat$z, xlab = dat$labs[1], ylab =
## dat$labs[2], : "image" is not a graphical parameter
## Warning in title(sub = dat$labs[5], ...): "image" is not a graphical
## parameter
confint(fit)
## 2.5 % 97.5 %
## (Intercept) -53.4077848 28.807785
## airbag -7.3762505 37.296250
## drive 15.5037495 60.176250
## man.trans -10.7883286 74.678802
## origin -11.2860669 74.654884
## airbag:drive -29.3861378 -4.013862
## airbag:man.trans -43.9039425 3.836701
## drive:man.trans -54.3021648 -5.519062
## airbag:origin -39.3929001 8.871018
## drive:origin -55.7455438 -6.476929
## man.trans:origin -75.3952600 19.385139
## airbag:drive:man.trans -0.4831718 29.311586
## airbag:drive:origin -2.7706219 26.962826
## airbag:man.trans:origin -14.3652170 41.819390
## drive:man.trans:origin -7.5805897 49.164888
## airbag:drive:man.trans:origin -26.4456237 11.414042
coef(fit)
## (Intercept) airbag
## -12.300000 14.960000
## drive man.trans
## 37.840000 31.945237
## origin airbag:drive
## 31.684409 -16.700000
## airbag:man.trans drive:man.trans
## -20.033621 -29.910613
## airbag:origin drive:origin
## -15.260941 -31.111237
## man.trans:origin airbag:drive:man.trans
## -28.005061 14.414207
## airbag:drive:origin airbag:man.trans:origin
## 12.096102 13.727087
## drive:man.trans:origin airbag:drive:man.trans:origin
## 20.792149 -7.515791
To check if the normality assumption holds, I used the qqnorm and qqlines. I also did a Shapiro-test.
shapiro.test(cars2$price)
##
## Shapiro-Wilk normality test
##
## data: cars2$price
## W = 0.88051, p-value = 4.235e-07
shapiro.test(1/(cars2$price))
##
## Shapiro-Wilk normality test
##
## data: 1/(cars2$price)
## W = 0.95218, p-value = 0.00186
shapiro.test((cars2$price)^2)
##
## Shapiro-Wilk normality test
##
## data: (cars2$price)^2
## W = 0.6641, p-value = 2.839e-13
shapiro.test(log(cars2$price))
##
## Shapiro-Wilk normality test
##
## data: log(cars2$price)
## W = 0.9841, p-value = 0.32
shapiro.test(sqrt(cars2$price))
##
## Shapiro-Wilk normality test
##
## data: sqrt(cars2$price)
## W = 0.95076, p-value = 0.001511
Normality was valid as is. In fact, transformations like log and sqrt, made the normality assumption worse.
#Check for Normality
qqnorm(residuals(fit2))
qqline(residuals(fit2))
#Residual Plot
plot(fitted(fit2),residuals(fit2))
Based on the qqnorm and qqline, the qqnorm line looks fairly linear and closely fits the qqline drawn. Also, the residual plot showed an evenly distributed residual as a function of fitted values. Also, there was a good scatter and not close to 0. The pattern seemed random and evenly distributed, two important conditions that satisfy the normality check. Therefore, the normality assumption was satisfied because the residuals were evenly distributed and the qqnorm line is linear enough. The normality check passed. There is no need to do transformation because the normality assumption was passed.
Summary: Overall, the experiment finding was an interesting one. The null hypothesis was rejected because the price of cars depended on the factors, like drive, man.trans, and origin. The fractional factorial design was implemented and I looked into the me and ie as well. The aliasing structure was examined and the Resolution was Resolution III. Then, as an extension, I looked at a response surface method by testing different models. The model’s results in the response surface methods were fairly consistent with that of the fractional factorial design. In general, the airbag, drive, man.trans, and origin had a great impact, while the 2fi were not reliable.
For the findings, it was interesting to explore the design procedure.
The analysis made sense and the fractional factorial design and response surface methods were well executed.
1.https://vincentarelbundock.github.io/Rdatasets/doc/MASS/Cars93.html
#Benjamin Byeon
#12/16/16
#Section 1
#Load the package
library("FrF2")
library("rsm")
library("rgl")
#Load the data
cars = read.csv("cars.csv", header = TRUE, sep=",")
cars
#System under test
summary(cars)
head(cars)
tail(cars)
names(cars)
#Display the structure
names(cars)
str(cars)
#Check the levels
levels(cars$price)
levels(cars$airbag)
levels(cars$drive)
levels(cars$man.trans)
levels(cars$origin)
#Decomposition Outline
C = c(-1,1,-1,1)
D = c(-1,-1,1,1)
X1 = c("Low","Med","Med","High")
X1 = as.data.frame(cbind(C,D,X1))
kable(X1,align='c')
E = c(-1,1,-1,1)
F = c(-1,-1,1,1)
X2 = c("Low","Med","Med","High")
X2 = as.data.frame(cbind(C,D,X2))
kable(X2, align ='c')
#Numerize the factors into 0 and 1, instead of categories as is
a = nrow(cars)
airbag = data.frame(a)
drive = data.frame(a)
man.trans = data.frame(a)
origin = data.frame(a)
#Automatically replace the categories as 0,1,2
for (i in 1:a){
#For airbags
if (cars$airbag[i] =="Driver & Passenger"){
airbag[i,1] = 0
}
if (cars$airbag[i] == "Driver only"){
airbag[i,1] = 1
}
if (cars$airbag[i] == "None"){
airbag[i,1] = 2
}
#For drive
if (cars$drive[i] =="4WD"){
drive[i,1] = 0
}
if (cars$drive[i] == "Front"){
drive[i,1] = 1
}
if (cars$drive[i] == "Rear"){
drive[i,1] = 2
}
#For man.trans
if (cars$man.trans[i] == "No"){
man.trans[i,1] = 0
} else{
man.trans[i,1] = 1
}
#For origin
if (cars$origin[i] == "non-USA"){
origin[i,1] = 0
} else{
origin[i,1] = 1
}
}
cars2 = cbind(airbag, drive, man.trans, origin, cars$price)
colnames(cars2) = c("airbag", "drive", "man.trans", "origin", "price")
#Randomize the data
index = sample(1:nrow(cars), 93, replace=FALSE)
rdata = cars[index,]
rdata
#Display the descriptive statistics
summary(cars)
#Histogram
hist(cars$price, main = "Histogram of Car Prices")
#Boxplots for each factors
boxplot(cars$price~cars$airbag, main ="Price vs Airbag")
boxplot(cars$price~cars$drive, main ="Price vs Drive")
boxplot(cars$price~cars$man.trans, main ="Price vs Man.Trans")
boxplot(cars$price~cars$origin, main ="Price vs Origin")
#Pairs plot
pairs(price~airbag, cars2)
pairs(price~drive, cars2)
pairs(price~man.trans, cars2)
pairs(price~origin, cars2)
pairs(cars2)
#New levels
head(cars2,10)
tail(cars2, 20)
str(cars2)
# Boxplot of overall factors with labeled axes
boxplot(cars2$price ~ cars2$airbag + cars2$drive+cars2$man.trans+cars2$origin+cars2$price, main="New Boxplot of All Variables")
#FFD
FFD = (FrF2(8,6, randomize = TRUE))
t2 = as.data.frame(FFD)
A = c(as.integer(t2[,1]))
B = c(as.integer(t2[,2]))
C = c(as.integer(t2[,3]))
D = c(as.integer(t2[,4]))
CD = C*D
E = c(as.integer(t2[,5]))
F = c(as.integer(t2[,6]))
EF = E*F
design = as.data.frame(cbind(A,B,C,D,E,F,CD,EF))
n = nrow(t2)
for (i in 1:n){
if (design$CD[i] == 1){design$airbag[i] <- "Low"}
if (design$CD[i] == 2){design$airbag[i] <- "Med"}
if (design$CD[i] == 4){design$airbag[i] <- "High"}
if (design$EF[i] == 1){design$drive[i] <- "Low"}
if (design$EF[i] == 2){design$drive[i] <- "Med"}
if (design$EF[i] == 4){design$drive[i] <- "High"}
if (design$A[i] == 1){design$man.trans[i] <- "no"}
if (design$A[i] == 2){design$man.trans[i] <- "yes"}
if (design$B[i] == 1){design$origin[i] <- "no"}
if (design$B[i] == 2){design$origin[i] <- "yes"}
}
fractionalfactorialdesign <- cbind(t2,design$airbag, design$drive, design$man.trans, design$origin)
summary(fractionalfactorialdesign)
kable(fractionalfactorialdesign, align = 'c')
FFD_main = FrF2(8, nfactors = 6, res3=T)
aliasprint(FFD_main)
# ANOVA from ffd
fit2 <- aov(price~drive + man.trans + origin, data = cars2)
summary(fit2)
#as.integer settings for rsm
cars2$airbag = as.integer(cars2$airbag)
cars2$drive = as.integer(cars2$drive)
cars2$man.trans = as.integer(cars2$man.trans)
cars2$origin = as.integer(cars2$origin)
#Finding the one without alias
model=rsm(price~FO(airbag,drive, man.trans, origin), data=cars2)
summary(model)
aov(model)
#ANOVA again
fit = aov(price~airbag*drive*man.trans*origin, cars2)
summary(fit)
#Test other models
model2=rsm(price~SO(airbag, drive), data=cars2)
summary(model2)
model3=rsm(price~FO(airbag*drive, man.trans), data=cars2)
summary(model3)
model4=rsm(price~FO(airbag*man.trans), data=cars2)
summary(model4)
model5=rsm(price~FO(airbag*origin), data=cars2)
summary(model5)
model6=rsm(price~FO(airbag^2, drive), data=cars2)
summary(model6)
model7=rsm(price~FO(airbag, drive^2), data=cars2)
summary(model7)
model8=rsm(price~FO(airbag^2, drive^2, origin^2), data=cars2)
summary(model8)
#Contour plots
par(mfrow=c(2,3))
contour(model,~ airbag+drive+man.trans+origin, image=TRUE, at=summary(model2$canoncial$xs))
par(mfrow=c(1,1))
contour(model,~ airbag+drive, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ airbag+man.trans, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ airbag+origin, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ drive+origin, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ drive+man.trans, image=TRUE, at=summary(model2$canoncial$xs))
contour(model,~ man.trans+origin, image=TRUE, at=summary(model2$canoncial$xs))
persp(model,~airbag+drive,image=TRUE,contour="colors",zlab="Price",theta=30)
persp(model,~airbag+man.trans,image=TRUE,contour="colors",zlab="Price",theta=30)
persp(model,~airbag+origin,image=TRUE,contour="colors",zlab="Price",theta=30)
persp(model,~drive+origin,image=TRUE,contour="colors",zlab="Price",theta=30)
persp(model,~drive+man.trans,image=TRUE,contour="colors",zlab="Price",theta=30)
persp(model,~man.trans+origin,image=TRUE,contour="colors",zlab="Price",theta=30)
#Estimation
confint(fit)
coef(fit)
#Check for Normality
shapiro.test(cars2$price)
shapiro.test(1/(cars2$price))
shapiro.test((cars2$price)^2)
shapiro.test(log(cars2$price))
shapiro.test(sqrt(cars2$price))
qqnorm(residuals(fit))
qqline(residuals(fit))
#Residual Plot
plot(fitted(fit),residuals(fit))