The data used in this experiment are the Cars93 dataset from the Ecdat package. The dataset contains 93 observations of 23 variables describing attributes of vehicles that were for sale in the United States in 1993.
For this experiment, we are interested in the effect of four factors on the price of a vehicle. As per the design requirements, we examine two 2-level factors and two 3-level factors. The analysis is conducted by expressing each of the two 3-level factors as a combination of two 2-level factors. A fractional factorial design is used to reduce the number of experimental runs required to obtain results. The effects are then calculated and used to build a model of the vehicle attributes that impact vehicle price.
Cars93 <- read.delim("C:/Users/wheels/Desktop/Design of Experiments/Project #3/Cars93.txt")
airbag is a 3-level categorical variable that lists the type of airbags that the car has. The three levels are None, Driver only, and Driver & Passenger.
drive is a 3-level categorical variable that lists the type of drive train that the car has. The three levels are Rear, Front, and 4WD.
mantrans is a 2-level categorical variable that states whether or not a car is available with a manual transmission. The two levels are No and Yes.
origin is a 2-level categorical variable that states whether or not a car was produced in the United States. The two levels are non-USA and USA.
Since this experiment is interested in the effect of certain vehicle features on the price of the vehicle, the response variable is Price. Price is a continuous dependent variable.
head(Cars93)
## airbags drive mantrans origin price
## 1 None Front Yes non-USA 15.9
## 2 Driver & Passenger Front Yes non-USA 33.9
## 3 Driver only Front Yes non-USA 29.1
## 4 Driver & Passenger Front Yes non-USA 37.7
## 5 Driver only Rear Yes non-USA 30.0
## 6 Driver only Front No USA 15.7
str(Cars93)
## 'data.frame': 93 obs. of 5 variables:
## $ airbags : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
## $ drive : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
## $ mantrans: Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
## $ origin : Factor w/ 2 levels "non-USA","USA": 1 1 1 1 1 2 2 2 2 2 ...
## $ price : num 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
summary(Cars93)
## airbags drive mantrans origin price
## Driver & Passenger:16 4WD :10 No :32 non-USA:45 Min. : 7.40
## Driver only :43 Front:67 Yes:61 USA :48 1st Qu.:12.20
## None :34 Rear :16 Median :17.70
## Mean :19.51
## 3rd Qu.:23.30
## Max. :61.90
To carry out a fractional factorial design, the levels of the categorical variables were replaced with numeric codes. The codes were assigned to each factor as follows:
airbag: Driver & Passenger= 0, Driver Only= 1, None= 2
drive: 4WD= 0, Front= 1, Rear= 2
mantrans: No= 0, Yes= 1
origin: Non-USA= 0, USA= 1
This manipulated dataset can be seen here:
## airbag drive mantrans origin price
## 1 2 1 1 0 15.9
## 2 0 1 1 0 33.9
## 3 1 1 1 0 29.1
## 4 0 1 1 0 37.7
## 5 1 2 1 0 30.0
## 6 1 1 0 1 15.7
## 7 1 1 0 1 20.8
## 8 1 2 0 1 23.7
## 9 1 1 0 1 26.3
## 10 1 1 0 1 34.7
## 'data.frame': 93 obs. of 5 variables:
## $ airbag : num 2 0 1 0 1 1 1 1 1 1 ...
## $ drive : num 1 1 1 1 2 1 1 2 1 1 ...
## $ mantrans: num 1 1 1 1 1 0 0 0 0 0 ...
## $ origin : num 0 0 0 0 0 1 1 1 1 1 ...
## $ price : num 15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
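The recoding shown above can also be sketched without an explicit loop. The snippet below is only an illustration on a small stand-in data frame (`toy`, built from rows 1, 2, 5, and 6 of the head() output); the helper `recode()` is a hypothetical name and is not part of the report's source code.

```r
# Stand-in for Cars93: rows 1, 2, 5 and 6 of the head() output above
toy <- data.frame(
  airbags  = c("None", "Driver & Passenger", "Driver only", "Driver only"),
  drive    = c("Front", "Front", "Rear", "Front"),
  mantrans = c("Yes", "Yes", "Yes", "No"),
  origin   = c("non-USA", "non-USA", "non-USA", "USA"),
  price    = c(15.9, 33.9, 30.0, 15.7),
  stringsAsFactors = FALSE
)

# match() gives the 1-based position of each label; subtract 1 for 0-based codes
recode <- function(x, labels) match(as.character(x), labels) - 1

car_toy <- data.frame(
  airbag   = recode(toy$airbags,  c("Driver & Passenger", "Driver only", "None")),
  drive    = recode(toy$drive,    c("4WD", "Front", "Rear")),
  mantrans = recode(toy$mantrans, c("No", "Yes")),
  origin   = recode(toy$origin,   c("non-USA", "USA")),
  price    = toy$price
)
car_toy
```

A label that is not in the lookup vector becomes NA, which makes unexpected levels easy to spot.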
This experiment uses a \(2^{m-3}\) fractional factorial design (here \(m = 6\), giving \(2^{6-3} = 8\) runs), which is a more efficient way to examine the main effects than a full factorial design. Each 3-level factor is decomposed into two 2-level factors; the sum of the two 2-level factors reproduces the level that was stored in the 3-level factor. The experiment yields the main and interaction effects, as well as the resolution of the design. Aliasing is described so that the reader knows the limitations of this fractional factorial design. Finally, ANOVA is conducted and the final model is presented.
A fractional factorial design reduces the resources necessary to conduct an analysis compared with a full factorial design. While it can be less accurate, it is a good way to do exploratory data analysis before conducting a large experiment. This project discusses the aliasing that occurs in the design, so that the reader knows what can reasonably be concluded from it.
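The decomposition of a 3-level factor can be sketched as follows. This is a minimal illustration that assumes 0/1 coding for the two pseudo-factors (named `A` and `B` here purely for illustration) and the airbag coding used in this report; it is not taken from the source code.

```r
# All four combinations of two 0/1 pseudo-factors
pseudo <- expand.grid(A = 0:1, B = 0:1)

# Their sum recovers the 3-level code: 0, 1 (reached twice), or 2
pseudo$level <- pseudo$A + pseudo$B

# Map the code back to the airbag labels used in this report
airbag_labels <- c("Driver & Passenger", "Driver only", "None")
pseudo$airbag <- airbag_labels[pseudo$level + 1]
pseudo
```

Note that the middle level is reached by two different pseudo-factor combinations, so it is represented twice as often as the outer levels.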
Randomization was utilized in this experiment. While we can’t comment on the data collection method, we can use randomization by randomly ordering the 8 experiments and randomly selecting a sample. This can be seen in the source code in the user-defined function func(). This experiment is not going to use replication. Since this would double the number of experiments we would have to conduct, replication would make the fractional factorial design less efficient. There is also no blocking in this experiment.
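The two randomization steps can be sketched as below. This is an illustration rather than the report's func(): the helper name `draw_one()`, the seed, and the one-cell data frame are assumptions, with the prices taken from rows 6, 7, 9, and 10 of the recoded table (which share the same factor codes).

```r
set.seed(1)  # arbitrary seed, only so that the sketch is reproducible

# One treatment combination (airbag = 1, drive = 1, mantrans = 0, origin = 1)
cell <- data.frame(price = c(15.7, 20.8, 26.3, 34.7))

# Draw one observation at random from a cell; return NA if the cell is empty
draw_one <- function(df) {
  if (nrow(df) == 0) return(NA_real_)
  df$price[sample.int(nrow(df), 1)]
}

y <- draw_one(cell)

# Random order in which the 8 runs are processed
run_order <- sample(8)
```

Returning NA for an empty cell makes it obvious when a treatment combination has no matching cars in the data.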
A boxplot showing the combinations of the independent factors and their effect on the response variable price is shown below.
As you can see from this boxplot, there are some outliers, but overall, there are no large issues that would stop us from moving forward with the experiment.
The hypothesis for this test will be:
Null Hypothesis - There is no statistically significant difference between the prices of the vehicles due to the changing factor levels of the independent variables.
Alternate Hypothesis - There is a statistically significant difference between the prices of the vehicles due to the changing factor levels of the independent variables.
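With each of the two 3-level factors expressed as two 2-level factors, there are six 2-level factors in total, so a full factorial design for this experiment would have \(2^6 = 64\) runs. The design for such an experiment would look like this: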
## Loading required package: DoE.base
## Loading required package: grid
## Loading required package: conf.design
##
## Attaching package: 'DoE.base'
## The following objects are masked from 'package:stats':
##
## aov, lm
## The following object is masked from 'package:graphics':
##
## plot.design
## The following object is masked from 'package:base':
##
## lengths
## creating full factorial with 64 runs ...
## A B C D E F
## 1 -1 1 -1 1 -1 -1
## 2 1 -1 1 -1 -1 -1
## 3 -1 1 -1 -1 1 -1
## 4 1 -1 1 -1 -1 1
## 5 -1 1 1 1 1 1
## 6 1 1 -1 -1 -1 -1
## 7 1 -1 1 -1 1 1
## 8 -1 -1 1 1 -1 -1
## 9 -1 -1 -1 -1 1 -1
## 10 -1 1 -1 1 1 -1
## 11 1 1 -1 -1 -1 1
## 12 1 -1 -1 1 -1 1
## 13 1 1 1 -1 1 -1
## 14 -1 -1 -1 1 1 -1
## 15 1 1 1 -1 -1 -1
## 16 1 1 -1 1 1 -1
## 17 -1 -1 1 -1 1 -1
## 18 -1 1 1 -1 -1 1
## 19 -1 -1 -1 1 1 1
## 20 -1 -1 -1 -1 -1 -1
## 21 1 1 1 1 1 -1
## 22 1 1 1 1 -1 1
## 23 1 -1 1 1 1 -1
## 24 1 -1 -1 -1 1 1
## 25 1 -1 1 -1 1 -1
## 26 -1 -1 -1 -1 -1 1
## 27 -1 1 1 -1 1 1
## 28 1 -1 1 1 -1 -1
## 29 -1 -1 1 1 1 1
## 30 -1 -1 -1 1 -1 1
## 31 -1 -1 -1 -1 1 1
## 32 -1 -1 1 1 1 -1
## 33 1 -1 -1 1 -1 -1
## 34 -1 1 -1 1 -1 1
## 35 -1 1 -1 -1 -1 -1
## 36 -1 1 1 1 -1 1
## 37 -1 1 -1 -1 -1 1
## 38 -1 -1 1 -1 1 1
## 39 -1 1 1 -1 -1 -1
## 40 1 -1 -1 1 1 1
## 41 1 1 -1 1 -1 -1
## 42 1 -1 -1 -1 -1 1
## 43 1 -1 -1 -1 1 -1
## 44 -1 1 1 1 1 -1
## 45 -1 1 -1 1 1 1
## 46 1 -1 1 1 -1 1
## 47 1 -1 -1 1 1 -1
## 48 -1 -1 1 -1 -1 -1
## 49 1 1 -1 -1 1 -1
## 50 1 1 -1 -1 1 1
## 51 1 1 -1 1 1 1
## 52 1 -1 -1 -1 -1 -1
## 53 1 1 1 -1 -1 1
## 54 1 1 -1 1 -1 1
## 55 1 1 1 1 -1 -1
## 56 1 1 1 -1 1 1
## 57 -1 1 1 -1 1 -1
## 58 1 -1 1 1 1 1
## 59 1 1 1 1 1 1
## 60 -1 -1 1 1 -1 1
## 61 -1 -1 1 -1 -1 1
## 62 -1 -1 -1 1 -1 -1
## 63 -1 1 1 1 -1 -1
## 64 -1 1 -1 -1 1 1
## class=design, type= full factorial
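For comparison, the same 64 combinations can be enumerated in base R without DoE.base. This is a sketch in standard (unrandomized) order; the object name `full_grid` is illustrative.

```r
lv <- c(-1, 1)  # low and high level of every 2-level factor

# Every combination of six 2-level factors
full_grid <- expand.grid(A = lv, B = lv, C = lv, D = lv, E = lv, F = lv)

nrow(full_grid)          # number of runs in the full factorial
head(full_grid, 4)
```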
A fractional factorial design can be made in R using the FrF2() function. Below is a \(2^{6-3}\) design with 8 experimental runs.
## airbagA airbagB driveA driveB mantrans origin
## 1 1 -1 -1 -1 -1 1
## 2 -1 1 -1 -1 1 -1
## 3 1 1 1 1 1 1
## 4 -1 -1 1 1 -1 -1
## 5 -1 -1 -1 1 1 1
## 6 -1 1 1 -1 -1 1
## 7 1 1 -1 1 -1 -1
## 8 1 -1 1 -1 1 -1
## class=design, type= FrF2
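The function aliasprint() was used to examine the aliasing that occurs with this design. Because the design is resolution III, there is substantial aliasing: every main effect is aliased with two-factor interactions. Only the main effects can be estimated from this experiment, and only under the assumption that the interactions aliased with them are negligible.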
## Call:
## FrF2(8, nfactors = 6, factor.names = c("airbagA", "airbagB",
## "driveA", "driveB", "mantrans", "origin"))
##
## Experimental design of type FrF2
## 8 runs
##
## Factor settings (scale ends):
## airbagA airbagB driveA driveB mantrans origin
## 1 -1 -1 -1 -1 -1 -1
## 2 1 1 1 1 1 1
##
## Design generating information:
## $legend
## [1] A=airbagA B=airbagB C=driveA D=driveB E=mantrans F=origin
##
## $generators
## [1] D=AB E=AC F=BC
##
##
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
##
## $fi2
## [1] AF=BE=CD
##
##
## The design itself:
## airbagA airbagB driveA driveB mantrans origin
## 1 1 -1 -1 -1 -1 1
## 2 -1 1 -1 -1 1 -1
## 3 1 1 1 1 1 1
## 4 -1 -1 1 1 -1 -1
## 5 -1 -1 -1 1 1 1
## 6 -1 1 1 -1 -1 1
## 7 1 1 -1 1 -1 -1
## 8 1 -1 1 -1 1 -1
## class=design, type= FrF2
## $legend
## [1] A=airbagA B=airbagB C=driveA D=driveB E=mantrans F=origin
##
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
##
## $fi2
## [1] AF=BE=CD
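The alias structure printed above can be reproduced by hand from the generators. The sketch below multiplies effect "words", using the rule that a letter appearing twice cancels; the helper names (`multiply_words()`, `aliases_of()`, `two_factor_aliases()`) are hypothetical and are not FrF2 functions.

```r
# Multiply two effect words: letters present in both cancel (X * X = I)
multiply_words <- function(w1, w2) {
  l1 <- strsplit(w1, "")[[1]]
  l2 <- strsplit(w2, "")[[1]]
  keep <- sort(c(setdiff(l1, l2), setdiff(l2, l1)))
  if (length(keep) == 0) "I" else paste(keep, collapse = "")
}

# Generating words from the FrF2 output above (D=AB, E=AC, F=BC)
gen_words <- c("ABD", "ACE", "BCF")

# Full defining relation: the generators plus all of their products
defining <- c(
  gen_words,
  multiply_words("ABD", "ACE"),
  multiply_words("ABD", "BCF"),
  multiply_words("ACE", "BCF"),
  multiply_words(multiply_words("ABD", "ACE"), "BCF")
)

# Aliases of an effect: multiply it by every word of the defining relation
aliases_of <- function(effect) {
  sapply(defining, multiply_words, w2 = effect, USE.NAMES = FALSE)
}

# Keep only the two-factor interactions among the aliases
two_factor_aliases <- function(effect) {
  a <- aliases_of(effect)
  a[nchar(a) == 2]
}

two_factor_aliases("A")
two_factor_aliases("AF")
min(nchar(defining))  # length of the shortest word = resolution of the design
```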
To derive the generators, we can start with a \(2^3\) full factorial design in the factors A, B, and C. With k = 6 and p = 3, this is a \(2^{6-3}\) design. From here, the remaining columns (D, E, and F) can be filled in using the two-factor interactions AB, AC, and BC.
## A B C
## A - - -
## B + - -
## C - + -
## D + + -
## E - - +
## F + - +
## G - + +
## H + + +
## A B C D E F
## A - - - + + +
## B + - - - - +
## C - + - - + -
## D + + - + - -
## E - - + + - -
## F + - + - + -
## G - + + - - +
## H + + + + + +
## A B C D E F
## 1 - - - + + +
## 2 + - - - - +
## 3 - + - - + -
## 4 + + - + - -
## 5 - - + + - -
## 6 + - + - + -
## 7 - + + - - +
## 8 + + + + + +
## class=design, type= FrF2
This can be compared to a fractional factorial design created with FrF2, showing that it is correctly calculated. From here, we know that D = AB, E = AC, and F = BC. Therefore, I = ABD = ACE = BCF.
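The construction can also be checked numerically. The following sketch builds the three added columns from the generators with -1/+1 coding and confirms that each generating word multiplies out to the identity column; the object names `base3` and `design` are illustrative.

```r
# 2^3 full factorial in A, B, C (A changes fastest, as in the tables above)
base3 <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))

# Added columns defined by the generators D = AB, E = AC, F = BC
design <- transform(base3, D = A * B, E = A * C, F = B * C)
design

# I = ABD = ACE = BCF: each product is +1 in every run
with(design, all(A * B * D == 1 & A * C * E == 1 & B * C * F == 1))
```

In run 7 (A = -1, B = +1, C = +1) this gives D = -1, E = -1, and F = +1, which agrees with the FrF2 design.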
Because of the aliasing mentioned previously, only the main effects and the interaction effect AF could be calculated. All of the other two-factor interactions are aliased with one of these effects, as are the higher-order interactions, which is why aliased effects share the same value in the table. Below is a summary table of the effects.
## A
## A -10.925
## B -2.375
## C -2.725
## D 3.675
## E 0.275
## F 11.075
## AB 3.675
## AC 0.275
## AD -2.375
## AE -2.725
## AF -9.225
## BC 11.075
## BD -10.925
## BE -9.225
## BF -2.725
## CD -9.225
## CE -10.925
## CF -2.375
## DE 11.075
## DF 0.275
## EF 3.675
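The repeated values in the table follow directly from how an effect is estimated: the mean response at the high level minus the mean response at the low level. The sketch below uses made-up responses (`y` is hypothetical and is not the set of prices sampled in this report) to show that an aliased interaction returns exactly the same estimate as its main effect.

```r
# 2^(6-3) design built from the generators D = AB, E = AC, F = BC
base3 <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
design <- transform(base3, D = A * B, E = A * C, F = B * C)

# Hypothetical responses, one per run (NOT the prices used in this report)
y <- c(20, 12, 25, 15, 30, 18, 28, 40)

# Effect = mean(y at +1) - mean(y at -1) = sum(sign * y) / (n / 2)
effect <- function(sign_col, y) sum(sign_col * y) / (length(y) / 2)

main_effects <- sapply(design, effect, y = y)
main_effects

# The AB interaction column is identical to column D, so the estimates coincide
ab_effect <- effect(design$A * design$B, y)
ab_effect == main_effects[["D"]]
```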
From this table, we can see that there appear to be some large effects. The 2-level factors that combine to form the 3-level airbag factor, and the factor for origin, stand out in particular. Conducting ANOVA will give us further insight.
ANOVA was conducted to observe the significance that each of the factors has on price. Below is a summary of ANOVA.
## Df Sum Sq Mean Sq F value Pr(>F)
## airbag 1 2743 2742.6 65.707 4.31e-12 ***
## drive 1 360 360.4 8.634 0.004283 **
## mantrans 1 391 390.9 9.364 0.002989 **
## origin 1 577 576.9 13.822 0.000366 ***
## airbag:drive 1 100 99.5 2.384 0.126408
## airbag:mantrans 1 11 10.5 0.252 0.617252
## airbag:origin 1 574 574.2 13.756 0.000377 ***
## drive:mantrans 1 7 7.4 0.177 0.674897
## drive:origin 1 308 307.8 7.374 0.008066 **
## mantrans:origin 1 91 91.3 2.187 0.143015
## Residuals 82 3423 41.7
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
From this summary, airbag and origin have the largest effects on price, which is in agreement with the main effect analysis. However, the ANOVA shows that all four main effects are significant at the 0.01 level. While some interaction effects (airbag:origin and drive:origin) are also significant, they were excluded from the model because of aliasing. The model can be represented by the following:
\(\text{price} = 3.848 X_1 - 6.189 X_2 - 6.916 X_3 - 5.520 X_4 + 30.186\)
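To make the use of the equation concrete, the sketch below evaluates it for given factor codes. The coefficients are copied from the equation above; the function name `predict_price()` is hypothetical, and the report does not state which factor each of X1 to X4 refers to, so the arguments are left generic.

```r
# Coefficients copied from the model equation above
coefs <- c(intercept = 30.186, X1 = 3.848, X2 = -6.189, X3 = -6.916, X4 = -5.520)

# Evaluate the linear model for one set of coded factor values
predict_price <- function(x1, x2, x3, x4) {
  unname(coefs["intercept"] + coefs["X1"] * x1 + coefs["X2"] * x2 +
           coefs["X3"] * x3 + coefs["X4"] * x4)
}

predict_price(0, 0, 0, 0)  # all codes zero: reduces to the intercept
predict_price(1, 1, 1, 1)
```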
Model adequacy checking was done to assess the fit of the model. The residuals vs. fitted plot shows that the residuals are not spread as uniformly as we would like to see. This might be due to the small number of observations collected, or just the nature of this type of data. The Q-Q plot shows very little deviation from the reference line, with the exception of a few outliers, which suggests that the residuals are approximately normally distributed.
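The quantities behind these two plots can be sketched on simulated data. Everything in the block below is an assumption made for illustration (the seed, the simulated data frame `sim`, and the effect sizes); it does not use the Cars93 data or reproduce the report's plots.

```r
set.seed(42)  # arbitrary seed for a reproducible illustration

# Simulated data: only x1 truly affects y
sim <- expand.grid(x1 = 0:1, x2 = 0:1, rep = 1:20)
sim$y <- 10 + 5 * sim$x1 + rnorm(nrow(sim), sd = 1)

# stats::lm is called explicitly because DoE.base masks lm
fit_sim <- stats::lm(y ~ x1 + x2, data = sim)

res <- residuals(fit_sim)     # vertical axis of the residuals vs. fitted plot
fit_vals <- fitted(fit_sim)   # horizontal axis of the residuals vs. fitted plot

# Coordinates of the normal Q-Q plot, computed without drawing it
qq <- qqnorm(res, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)     # close to 1 when the residuals look normal

# Uncomment to draw both diagnostic plots
# plot(fit_sim, which = 1:2)
```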
D. C. Montgomery, Design and Analysis of Experiments, 8th ed. Hoboken, NJ: John Wiley & Sons, Inc., 2013.
https://vincentarelbundock.github.io/Rdatasets/doc/MASS/Cars93.html
ISYE 6020 class resources
The Cars93 data set was used from the Ecdat package in R. More information on this data set can be found at: https://vincentarelbundock.github.io/Rdatasets/doc/MASS/Cars93.html
#Shamus Wheeler
#Project 3
#Load the package
library("FrF2")
#show first observations in dataset
head(Cars93)
#show structure of dataset
str(Cars93)
#show summary of dataset
summary(Cars93)
#create dataframes that represent factor levels with numbers
a <- nrow(Cars93)
airbag <- data.frame(a)
drive <- data.frame(a)
mantrans = data.frame(a)
origin <- data.frame(a)
#for loop to replace factor levels with numbers
for (i in 1:a){
  # airbags: driver & passenger = 0, driver only = 1, none = 2
  if (Cars93$airbags[i] == "Driver & Passenger"){
    airbag[i,1] = 0
  }
  if (Cars93$airbags[i] == "Driver only"){
    airbag[i,1] = 1
  }
  if (Cars93$airbags[i] == "None"){
    airbag[i,1] = 2
  }
  # drive: 4WD = 0, Front = 1, Rear = 2
  if (Cars93$drive[i] == "4WD"){
    drive[i,1] = 0
  }
  if (Cars93$drive[i] == "Front"){
    drive[i,1] = 1
  }
  if (Cars93$drive[i] == "Rear"){
    drive[i,1] = 2
  }
  # mantrans: No = 0, Yes = 1
  if (Cars93$mantrans[i] == "No"){
    mantrans[i,1] = 0
  } else{
    mantrans[i,1] = 1
  }
  # origin: Non-USA = 0, USA = 1
  if (Cars93$origin[i] == "non-USA"){
    origin[i,1] = 0
  } else{
    origin[i,1] = 1
  }
}
#dataframe of column vectors with response variable
car <- cbind(airbag, drive, mantrans, origin, Cars93$price)
colnames(car) <- c("airbag", "drive", "mantrans", "origin", "price")
head(car,10)
str(car)
#boxplot of factor
boxplot(price ~ airbag + drive + mantrans + origin, data = car, xlab = "airbag.drive.mantrans.origin", ylab = "Price", main = "Analysis of Factors")
#show full factorial design
full <- FrF2(64, nfactors = 6, estimable = formula("~A+B+C+D+A:(B+C+D)"))
print (full)
#create fractional factorial design
FFD <- FrF2(8, nfactors = 6, factor.names = c('airbagA', 'airbagB', 'driveA', 'driveB', 'mantrans', 'origin'))
FFD
summary(FFD)
#show aliasing in dataset
aliasprint(FFD)
#2^3 matrix to start with
k <- matrix(c("-","-","-","+","-","-","-","+","-","+","+","-","-","-","+","+","-","+","-","+","+","+","+","+"),nrow = 8,byrow = T)
# display as table
k<- as.table(k)
k
#full matrix
#one row per run: columns A, B, C, then D = AB, E = AC, F = BC
o <- matrix(c("-","-","-","+","+","+",
              "+","-","-","-","-","+",
              "-","+","-","-","+","-",
              "+","+","-","+","-","-",
              "-","-","+","+","-","-",
              "+","-","+","-","+","-",
              "-","+","+","-","-","+",
              "+","+","+","+","+","+"), nrow = 8, byrow = T)
# display as table
o = as.table(o)
o
## Fractional design for comparison, unrandomized
p<- FrF2(8,6, res3 = T, default.levels = c("-","+"), randomize = F)
p
# Subset creation for factorial design pulled randomly from the table
subseta <- subset(car, airbag == "1" & drive == "1" & mantrans == "1" & origin == "1")
subsetb <- subset(car, airbag == "1" & drive == "2" & mantrans == "1" & origin == "0")
subsetc <- subset(car, airbag == "1" & drive == "2" & mantrans == "0" & origin == "0")
subsetd <- subset(car, airbag == "1" & drive == "1" & mantrans == "0" & origin == "1")
subsete <- subset(car,airbag == "1" & drive == "1" & mantrans == "1" & origin == "0")
subsetf <- subset(car,airbag == "2" & drive == "1" & mantrans == "0" & origin == "0")
subsetg <- subset(car,airbag == "2" & drive == "0" & mantrans == "1" & origin == "1")
subseth <- subset(car,airbag == "1" & drive == "0" & mantrans == "1" & origin == "0")
#Function to get a sample of row
func <- function (Cars93){
  #shuffle the row indices and keep the first one
  a <- sample(nrow(Cars93))
  b <- a[1]
  return(Cars93$price[b])
}
# Use function to get group samples
m_a <- func(subseta)
m_b <- func(subsetb)
m_c <- func(subsetc)
m_d <- func(subsetd)
m_e <- func(subsete)
m_f <- func(subsetf)
m_g <- func(subsetg)
m_h <- func(subseth)
#calculate main and interaction effects
mea <- 1/4 * ((m_e[1]+m_g[1]+m_h[1]+m_d[1])-(m_c[1]+m_f[1]+m_b[1]+m_a[1]))
meb <- 1/4 * ((m_g[1]+m_d[1]+m_f[1]+m_b[1])-(m_c[1]+m_e[1]+m_a[1]+m_h[1]))
mec <- 1/4 * ((m_b[1]+m_a[1]+m_h[1]+m_d[1])-(m_c[1]+m_e[1]+m_f[1]+m_g[1]))
med <- 1/4 * ((m_c[1]+m_a[1]+m_g[1]+m_d[1])-(m_f[1]+m_h[1]+m_b[1]+m_e[1]))
mee <- 1/4 * ((m_c[1]+m_f[1]+m_h[1]+m_d[1])-(m_a[1]+m_b[1]+m_e[1]+m_g[1]))
mef <- 1/4 * ((m_c[1]+m_b[1]+m_e[1]+m_d[1])-(m_a[1]+m_f[1]+m_g[1]+m_h[1]))
#AF interaction: each of the 8 runs appears exactly once in the contrast
ie <- 1/4 * ((m_f[1]+m_a[1]+m_d[1]+m_e[1])-(m_b[1]+m_c[1]+m_g[1]+m_h[1]))
#create table of main effects
me_table <- matrix(c(mea,meb,mec,med,mee,mef,med,mee,meb,mec,ie, mef, mea, ie, mec, ie, mea, meb, mef, mee, med), ncol = 1)
#convert to table
me_table <- as.table(me_table)
#add names to rows
rownames(me_table) <- c("A","B","C","D","E","F","AB","AC","AD","AE","AF","BC","BD","BE","BF","CD","CE","CF","DE","DF","EF")
#print table
me_table
#perform ANOVA
anova <- aov(price~airbag + drive + mantrans + origin + airbag:drive + airbag:mantrans + airbag:origin + drive:mantrans + drive:origin + mantrans:origin, data=car)
#display summary of ANOVA
summary(anova)
#find coefficients for model
fit <- lm(price~airbag+drive+mantrans+origin, data=car)
fit
#Anova for model
anova2 <- aov(price~airbag + drive + mantrans + origin + airbag:origin +drive:origin, data=car)
#Residuals and QQ
plot(anova2, which = c(1:2))