1. Setting

The data used in this experiment is the Cars93 dataset from the MASS package. The full dataset contains 93 observations of 27 variables describing vehicles that were for sale in the United States in 1993; this experiment uses five of those variables.

For this experiment, we are interested in the effect of four factors on the price of the vehicle. As per the design requirements, we examine two 2-level factors and two 3-level factors. The analysis is conducted by expressing each 3-level factor as a combination of two 2-level factors. A fractional factorial design is used to reduce the number of experimental runs required to achieve results. The effects are then calculated and used to build a model of the vehicle attributes that impact vehicle price.

Cars93 <- read.delim("C:/Users/wheels/Desktop/Design of Experiments/Project #3/Cars93.txt")

Factors and Levels

airbags is a 3-level categorical variable that lists the type of airbags that the car has. The three levels are None, Driver only, and Driver & Passenger.

drive is a 3-level categorical variable that lists the type of drive train that the car has. The three levels are Rear, Front, and 4WD.

mantrans is a 2-level categorical variable that states whether or not a car can have a manual transmission. The two levels are No and Yes.

origin is a 2-level categorical variable that states whether or not a car was produced in the United States. The two levels are non-USA and USA.

Continuous and Response Variable

Since this experiment is interested in the effect of certain vehicle features on the price of the vehicle, the response variable is Price. Price is a continuous dependent variable.

Overview of Data

head(Cars93)

##              airbags drive mantrans  origin price
## 1               None Front      Yes non-USA  15.9
## 2 Driver & Passenger Front      Yes non-USA  33.9
## 3        Driver only Front      Yes non-USA  29.1
## 4 Driver & Passenger Front      Yes non-USA  37.7
## 5        Driver only  Rear      Yes non-USA  30.0
## 6        Driver only Front       No     USA  15.7

str(Cars93)

## 'data.frame':    93 obs. of  5 variables:
##  $ airbags : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
##  $ drive   : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
##  $ mantrans: Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
##  $ origin  : Factor w/ 2 levels "non-USA","USA": 1 1 1 1 1 2 2 2 2 2 ...
##  $ price   : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...

summary(Cars93)

##                airbags     drive    mantrans     origin       price      
##  Driver & Passenger:16   4WD  :10   No :32   non-USA:45   Min.   : 7.40  
##  Driver only       :43   Front:67   Yes:61   USA    :48   1st Qu.:12.20  
##  None              :34   Rear :16                         Median :17.70  
##                                                           Mean   :19.51  
##                                                           3rd Qu.:23.30  
##                                                           Max.   :61.90

To carry out a fractional factorial design, the categorical variables need to be replaced with numeric codes that represent the factor levels. The codes were assigned to each factor as follows:

airbag: Driver & Passenger= 0, Driver Only= 1, None= 2

drive: 4WD= 0, Front= 1, Rear= 2

mantrans: No= 0, Yes= 1

origin: Non-USA= 0, USA= 1

This manipulated dataset can be seen here:

##    airbag drive mantrans origin price
## 1       2     1        1      0  15.9
## 2       0     1        1      0  33.9
## 3       1     1        1      0  29.1
## 4       0     1        1      0  37.7
## 5       1     2        1      0  30.0
## 6       1     1        0      1  15.7
## 7       1     1        0      1  20.8
## 8       1     2        0      1  23.7
## 9       1     1        0      1  26.3
## 10      1     1        0      1  34.7

## 'data.frame':    93 obs. of  5 variables:
##  $ airbag  : num  2 0 1 0 1 1 1 1 1 1 ...
##  $ drive   : num  1 1 1 1 2 1 1 2 1 1 ...
##  $ mantrans: num  1 1 1 1 1 0 0 0 0 0 ...
##  $ origin  : num  0 0 0 0 0 1 1 1 1 1 ...
##  $ price   : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
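The recoding shown above can also be written without a loop. The following is a minimal sketch, assuming a small stand-in data frame `demo` whose six rows are copied from the head(Cars93) output; `demo`, `car_demo`, and the helper `recode_levels` are illustrative names and are not part of the original analysis.

```r
# Stand-in for the first six rows of Cars93 (values copied from head(Cars93) above)
demo <- data.frame(
  airbags  = c("None", "Driver & Passenger", "Driver only",
               "Driver & Passenger", "Driver only", "Driver only"),
  drive    = c("Front", "Front", "Front", "Front", "Rear", "Front"),
  mantrans = c("Yes", "Yes", "Yes", "Yes", "Yes", "No"),
  origin   = c("non-USA", "non-USA", "non-USA", "non-USA", "non-USA", "USA"),
  price    = c(15.9, 33.9, 29.1, 37.7, 30.0, 15.7),
  stringsAsFactors = FALSE
)

# Illustrative helper: position of each value in `levels`, shifted to start at 0
recode_levels <- function(x, levels) match(as.character(x), levels) - 1

car_demo <- data.frame(
  airbag   = recode_levels(demo$airbags,  c("Driver & Passenger", "Driver only", "None")),
  drive    = recode_levels(demo$drive,    c("4WD", "Front", "Rear")),
  mantrans = recode_levels(demo$mantrans, c("No", "Yes")),
  origin   = recode_levels(demo$origin,   c("non-USA", "USA")),
  price    = demo$price
)
print(car_demo)
```

Listing the levels explicitly keeps the code tied to the stated mapping and does not depend on the order in which R happens to store the factor levels.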

2. Experimental Design

How will the experiment be organized and conducted?

This experiment will be a 2^(6-3) fractional factorial design, which is a more efficient way to look at the main effects than a full factorial design. Each 3-level factor will be deconstructed into two 2-level factors whose sum represents the level that was stored in the 3-level factor. The analysis will yield the main and interaction effects, along with the resolution of the design. Aliasing will be described so that the reader knows the limitations of this fractional factorial design. Finally, ANOVA will be conducted and the final model will be presented.
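The decomposition of a 3-level factor into two 2-level factors can be sketched as follows. This is only an illustration under one assumption: the two pseudo-factors are coded 0/1 (rather than the -1/+1 coding used in the design output), so that their sum gives the 0/1/2 level code. The names `airbagA` and `airbagB` follow the factor names used in the fractional design.

```r
# All combinations of two 0/1 pseudo-factors
pseudo <- expand.grid(airbagA = c(0, 1), airbagB = c(0, 1))

# Assumed rule from the text: the 3-level code is the sum of the two pseudo-factors
pseudo$airbag <- pseudo$airbagA + pseudo$airbagB
print(pseudo)

# How often each 3-level code is produced by the four combinations
level_counts <- table(pseudo$airbag)
print(level_counts)
```

Under this coding, the middle level is reached by two of the four combinations, while the outer levels are reached by one each.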

What is the rationale for this design?

This fractional factorial design is used to reduce the resources necessary to conduct an analysis, compared to a full factorial design. While it can be less accurate, it is a good way to do exploratory data analysis before conducting a large experiment. This project will discuss the aliasing that occurs in the design, so that the reader knows the limits of what can reasonably be concluded from it.

Randomization, Replication, Blocking

Randomization was utilized in this experiment. While we can’t comment on the data collection method, we can apply randomization by randomly ordering the 8 experimental runs and randomly selecting one observation for each run. This can be seen in the source code in the user-defined function func(). This experiment does not use replication: replication would double the number of runs and make the fractional factorial design less efficient. There is also no blocking in this experiment.
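The two randomization steps can be sketched in a few lines. This is a minimal illustration, not the report's func(): the seed is arbitrary, and `cell_prices` is a hypothetical cell holding four prices taken from the first ten rows of the recoded data.

```r
set.seed(2016)  # arbitrary seed, used only to make the sketch reproducible

# Random order in which the 8 design runs would be carried out
run_order <- sample(8)

# Hypothetical cell: prices of cars that share one factor combination
cell_prices <- c(15.7, 20.8, 26.3, 34.7)

# Draw one observation from the cell; sampling an index avoids the sample()
# pitfall where a single number x is interpreted as 1:x
picked <- cell_prices[sample(length(cell_prices), 1)]

print(run_order)
print(picked)
```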

3. Statistical Analysis

Exploratory Data Analysis

A boxplot showing the combinations of the independent factors and their effect on the response variable price is shown below.

As you can see from this boxplot, there are some outliers, but overall, there are no large issues that would stop us from moving forward with the experiment.

Testing

The hypothesis for this test will be:

Null Hypothesis - There is no statistically significant difference between the prices of the vehicles due to the changing factor levels of the independent variables.

Alternate Hypothesis - There is a statistically significant difference between the prices of the vehicles due to the changing factor levels of the independent variables.

Treatment Structure

A full factorial design for this experiment would have 64 runs. The design for such an experiment would look like this:

## creating full factorial with 64 runs ...

##     A  B  C  D  E  F
## 1  -1  1 -1  1 -1 -1
## 2   1 -1  1 -1 -1 -1
## 3  -1  1 -1 -1  1 -1
## 4   1 -1  1 -1 -1  1
## 5  -1  1  1  1  1  1
## 6   1  1 -1 -1 -1 -1
## 7   1 -1  1 -1  1  1
## 8  -1 -1  1  1 -1 -1
## 9  -1 -1 -1 -1  1 -1
## 10 -1  1 -1  1  1 -1
## 11  1  1 -1 -1 -1  1
## 12  1 -1 -1  1 -1  1
## 13  1  1  1 -1  1 -1
## 14 -1 -1 -1  1  1 -1
## 15  1  1  1 -1 -1 -1
## 16  1  1 -1  1  1 -1
## 17 -1 -1  1 -1  1 -1
## 18 -1  1  1 -1 -1  1
## 19 -1 -1 -1  1  1  1
## 20 -1 -1 -1 -1 -1 -1
## 21  1  1  1  1  1 -1
## 22  1  1  1  1 -1  1
## 23  1 -1  1  1  1 -1
## 24  1 -1 -1 -1  1  1
## 25  1 -1  1 -1  1 -1
## 26 -1 -1 -1 -1 -1  1
## 27 -1  1  1 -1  1  1
## 28  1 -1  1  1 -1 -1
## 29 -1 -1  1  1  1  1
## 30 -1 -1 -1  1 -1  1
## 31 -1 -1 -1 -1  1  1
## 32 -1 -1  1  1  1 -1
## 33  1 -1 -1  1 -1 -1
## 34 -1  1 -1  1 -1  1
## 35 -1  1 -1 -1 -1 -1
## 36 -1  1  1  1 -1  1
## 37 -1  1 -1 -1 -1  1
## 38 -1 -1  1 -1  1  1
## 39 -1  1  1 -1 -1 -1
## 40  1 -1 -1  1  1  1
## 41  1  1 -1  1 -1 -1
## 42  1 -1 -1 -1 -1  1
## 43  1 -1 -1 -1  1 -1
## 44 -1  1  1  1  1 -1
## 45 -1  1 -1  1  1  1
## 46  1 -1  1  1 -1  1
## 47  1 -1 -1  1  1 -1
## 48 -1 -1  1 -1 -1 -1
## 49  1  1 -1 -1  1 -1
## 50  1  1 -1 -1  1  1
## 51  1  1 -1  1  1  1
## 52  1 -1 -1 -1 -1 -1
## 53  1  1  1 -1 -1  1
## 54  1  1 -1  1 -1  1
## 55  1  1  1  1 -1 -1
## 56  1  1  1 -1  1  1
## 57 -1  1  1 -1  1 -1
## 58  1 -1  1  1  1  1
## 59  1  1  1  1  1  1
## 60 -1 -1  1  1 -1  1
## 61 -1 -1  1 -1 -1  1
## 62 -1 -1 -1  1 -1 -1
## 63 -1  1  1  1 -1 -1
## 64 -1  1 -1 -1  1  1
## class=design, type= full factorial
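The run count follows from six 2-level pseudo-factors: 2^6 = 64. A base R sketch that does not rely on FrF2 is shown here; `levels_pm` and `full_grid` are illustrative names, and the rows come out in standard order rather than the randomized order printed above.

```r
levels_pm <- c(-1, 1)

# Every combination of six 2-level factors
full_grid <- expand.grid(A = levels_pm, B = levels_pm, C = levels_pm,
                         D = levels_pm, E = levels_pm, F = levels_pm)

n_full <- nrow(full_grid)   # number of runs in the full factorial
n_frac <- n_full / 2^3      # a 2^(6-3) fraction keeps one eighth of them
print(c(full = n_full, fraction = n_frac))
```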

A fractional factorial design can be made in R using the FrF2() function. Below is a 2^(6-3) design with 8 experimental runs.

##   airbagA airbagB driveA driveB mantrans origin
## 1       1      -1     -1     -1       -1      1
## 2      -1       1     -1     -1        1     -1
## 3       1       1      1      1        1      1
## 4      -1      -1      1      1       -1     -1
## 5      -1      -1     -1      1        1      1
## 6      -1       1      1     -1       -1      1
## 7       1       1     -1      1       -1     -1
## 8       1      -1      1     -1        1     -1
## class=design, type= FrF2

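The aliasprint() function was used to observe the aliasing that occurs with this design. As the output shows, the aliasing is substantial because the design is resolution III: every main effect is aliased with two two-factor interactions. The main effects can therefore be interpreted only under the assumption that two-factor interactions are negligible.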

## Call:
## FrF2(8, nfactors = 6, factor.names = c("airbagA", "airbagB", 
##     "driveA", "driveB", "mantrans", "origin"))
## 
## Experimental design of type  FrF2 
## 8  runs
## 
## Factor settings (scale ends):
##   airbagA airbagB driveA driveB mantrans origin
## 1      -1      -1     -1     -1       -1     -1
## 2       1       1      1      1        1      1
## 
## Design generating information:
## $legend
## [1] A=airbagA  B=airbagB  C=driveA   D=driveB   E=mantrans F=origin  
## 
## $generators
## [1] D=AB E=AC F=BC
## 
## 
## Alias structure:
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD
## 
## 
## The design itself:
##   airbagA airbagB driveA driveB mantrans origin
## 1       1      -1     -1     -1       -1      1
## 2      -1       1     -1     -1        1     -1
## 3       1       1      1      1        1      1
## 4      -1      -1      1      1       -1     -1
## 5      -1      -1     -1      1        1      1
## 6      -1       1      1     -1       -1      1
## 7       1       1     -1      1       -1     -1
## 8       1      -1      1     -1        1     -1
## class=design, type= FrF2

## $legend
## [1] A=airbagA  B=airbagB  C=driveA   D=driveB   E=mantrans F=origin  
## 
## $main
## [1] A=BD=CE B=AD=CF C=AE=BF D=AB=EF E=AC=DF F=BC=DE
## 
## $fi2
## [1] AF=BE=CD
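The alias structure reported above can be checked by hand: two effects are aliased when their contrast columns are identical in every run. The sketch below rebuilds the design in base R from the generators D=AB, E=AC, F=BC; the data frame `d` and the helper `same` are illustrative names.

```r
# Base 2^3 design in standard order, then the generated columns
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
d$D <- d$A * d$B
d$E <- d$A * d$C
d$F <- d$B * d$C

# Two effects are aliased when their contrast columns match in every run
same <- function(x, y) all(x == y)

checks <- c(
  "A=BD"  = same(d$A, d$B * d$D),
  "A=CE"  = same(d$A, d$C * d$E),
  "AF=BE" = same(d$A * d$F, d$B * d$E),
  "AF=CD" = same(d$A * d$F, d$C * d$D)
)
print(checks)
```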

ISYE 6020: Calculating Generator I

To calculate the generator, we can start with a 2³ full factorial design in the factors A, B, and C. With k=6 and p=3, this is a 2^(6-3) design. From here, the remaining columns D, E, and F can be filled in using the two-factor interactions AB, AC, and BC.

##   A B C
## A - - -
## B + - -
## C - + -
## D + + -
## E - - +
## F + - +
## G - + +
## H + + +

##   A B C D E F
## A - - - + + +
## B + - - - - +
## C - + - - + -
## D + + - + - -
## E - - + + - -
## F + - + - + -
## G - + + - - +
## H + + + + + +

##   A B C D E F
## 1 - - - + + +
## 2 + - - - - +
## 3 - + - - + -
## 4 + + - + - -
## 5 - - + + - -
## 6 + - + - + -
## 7 - + + - - +
## 8 + + + + + +
## class=design, type= FrF2

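This can be compared to a fractional factorial design created with FrF2, showing that it is correctly calculated. From here, we know that D = AB, E = AC, and F = BC. Therefore, I = ABD = ACE = BCF. Multiplying these words together gives the rest of the defining relation: I = ABD = ACE = BCF = BCDE = ACDF = ABEF = DEF. The shortest words have three letters, which is why the design is resolution III.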
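The products of the generator words can be worked out mechanically, since any letter that appears twice cancels (A × A = I). The following sketch uses a hypothetical helper `mult` to multiply two words and then reads the resolution off the shortest word.

```r
# Multiply two effect "words": letters that appear in both words cancel
mult <- function(w1, w2) {
  l1 <- strsplit(w1, "")[[1]]
  l2 <- strsplit(w2, "")[[1]]
  keep <- sort(c(setdiff(l1, l2), setdiff(l2, l1)))
  paste(keep, collapse = "")
}

gens <- c("ABD", "ACE", "BCF")   # from D = AB, E = AC, F = BC

# Generator words plus all of their pairwise and three-way products
words <- c(
  gens,
  mult(gens[1], gens[2]),
  mult(gens[1], gens[3]),
  mult(gens[2], gens[3]),
  mult(mult(gens[1], gens[2]), gens[3])
)
print(words)

# Resolution = length of the shortest word in the defining relation
resolution <- min(nchar(words))
print(resolution)
```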

Calculating Main and Interaction Effects

Because of the aliasing described previously, only seven independent effects can be estimated from the 8 runs: the six main effects and the interaction AF. Each estimate is shared by its aliases (for example, the AF estimate is also the estimate for BE and CD), which is why values repeat in the summary table below.

##          A
## A  -10.925
## B   -2.375
## C   -2.725
## D    3.675
## E    0.275
## F   11.075
## AB   3.675
## AC   0.275
## AD  -2.375
## AE  -2.725
## AF  -9.225
## BC  11.075
## BD -10.925
## BE  -9.225
## BF  -2.725
## CD  -9.225
## CE -10.925
## CF  -2.375
## DE  11.075
## DF   0.275
## EF   3.675

From this table, we can see that there appear to be some large effects. The pseudo-factors that combine to form the 3-level airbag factor and the factor for origin stand out in particular. Conducting ANOVA will give us further insight.
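Each value in the effects table is the mean response at the high level of a contrast minus the mean response at the low level, which for 8 runs equals the signed sum of the responses divided by 4. The sketch below shows the calculation on the design built from the generators; the response vector `y` holds made-up illustrative prices, not the values sampled in this report.

```r
# Design in standard order with the generated columns D=AB, E=AC, F=BC
d <- expand.grid(A = c(-1, 1), B = c(-1, 1), C = c(-1, 1))
d$D <- d$A * d$B
d$E <- d$A * d$C
d$F <- d$B * d$C

# Hypothetical prices for the 8 runs (illustrative values only)
y <- c(20, 12, 18, 15, 25, 11, 30, 14)

# effect = mean(y at +1) - mean(y at -1) = sum(sign * y) / 4
eff <- colSums(as.matrix(d) * y) / 4
print(eff)
```

Because the six columns are mutually orthogonal, each estimate uses all 8 responses.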

ANOVA and Model Construction

ANOVA was conducted to observe the significance that each of the factors has on price. Below is a summary of ANOVA.

##                 Df Sum Sq Mean Sq F value   Pr(>F)    
## airbag           1   2743  2742.6  65.707 4.31e-12 ***
## drive            1    360   360.4   8.634 0.004283 ** 
## mantrans         1    391   390.9   9.364 0.002989 ** 
## origin           1    577   576.9  13.822 0.000366 ***
## airbag:drive     1    100    99.5   2.384 0.126408    
## airbag:mantrans  1     11    10.5   0.252 0.617252    
## airbag:origin    1    574   574.2  13.756 0.000377 ***
## drive:mantrans   1      7     7.4   0.177 0.674897    
## drive:origin     1    308   307.8   7.374 0.008066 ** 
## mantrans:origin  1     91    91.3   2.187 0.143015    
## Residuals       82   3423    41.7                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From this summary, airbag and origin have the largest effects on price, which agrees with the main effect analysis. The ANOVA also shows that all four main effects are significant at the 0.01 level. Two interaction effects (airbag:origin and drive:origin) are also significant, but they were excluded from the model because of aliasing. The model can be represented by the following:

\(price = 3.848 X_1 - 6.189 X_2 - 6.916 X_3 - 5.520 X_4 + 30.186\)
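As a cross-check on the ANOVA table, each F value is the factor's mean square divided by the residual mean square, and the p-value is the upper tail of the corresponding F distribution. The sketch below redoes this for the airbag row using the rounded numbers printed in the table, so a small rounding difference from the table is expected; the variable names are illustrative.

```r
# Values read from the ANOVA table (airbag row and Residuals row)
ms_airbag   <- 2742.6
ss_residual <- 3423
df_residual <- 82

# Residual mean square and the resulting F ratio
ms_residual <- ss_residual / df_residual
f_airbag    <- ms_airbag / ms_residual

# Upper-tail probability of the F distribution with (1, 82) degrees of freedom
p_airbag <- pf(f_airbag, df1 = 1, df2 = df_residual, lower.tail = FALSE)

print(c(F = f_airbag, p = p_airbag))
```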

Model Adequacy Checking

Model adequacy checking was done to assess the fit of the model. The residuals vs. fitted plot shows that the residuals are not spread as uniformly as we would like. This might be due to the small number of observations, or simply to the nature of this type of data. The Q-Q plot shows very little deviation from the reference line, apart from a few outliers, which suggests that the residuals are approximately normal.

5. References

D. C. Montgomery, Design and Analysis of Experiments, 8th ed. Hoboken, NJ: John Wiley & Sons, Inc., 2013.
https://vincentarelbundock.github.io/Rdatasets/doc/MASS/Cars93.html
ISYE 6020 class resources

6. Appendices

Appendix A: Raw Data

The Cars93 data set from the MASS package in R was used. More information on this data set can be found at: https://vincentarelbundock.github.io/Rdatasets/doc/MASS/Cars93.html

Appendix B: Complete R Code

#Shamus Wheeler
#Project 3

#Load the package
library("FrF2")

#show first observations in dataset
head(Cars93)

#show structure of dataset
str(Cars93)

#show summary of dataset
summary(Cars93)

#create dataframes that represent factor levels with numbers
a <- nrow(Cars93)
airbag <- data.frame(a)
drive <- data.frame(a)
mantrans = data.frame(a)
origin <- data.frame(a)

#for loop to replace factor levels with numbers 
for (i in 1:a){
  # airbags: driver & passenger = 0, driver only = 1, none = 2
  if (Cars93$airbags[i] =="Driver & Passenger"){
    airbag[i,1] = 0
  }
  if (Cars93$airbags[i] == "Driver only"){
    airbag[i,1] = 1
  }
  if (Cars93$airbags[i] == "None"){
    airbag[i,1] = 2
  }
  
  # drive: 4WD = 0, Front = 1, Rear = 2
  if (Cars93$drive[i] =="4WD"){
    drive[i,1] = 0
  }
  if (Cars93$drive[i] == "Front"){
    drive[i,1] = 1
  }
  if (Cars93$drive[i] == "Rear"){
    drive[i,1] = 2
  }
  
  
  #mantrans: No = 0, Yes = 1
  if (Cars93$mantrans[i] == "No"){
    mantrans[i,1] = 0
  } else{
    mantrans[i,1] = 1
  }
  
  # origin: Non-USA = 0, USA = 1
  if (Cars93$origin[i] == "non-USA"){
    origin[i,1] = 0
  } else{
    origin[i,1] = 1
  }
}


#dataframe of column vectors with response variable 
car <- cbind(airbag, drive, mantrans, origin, Cars93$price)
colnames(car) <- c("airbag", "drive", "mantrans", "origin", "price")

head(car,10)
str(car)

#boxplot of factor 
boxplot(price ~ airbag + drive + mantrans + origin, data = car, xlab="airbag.drive.mantrans.origin", ylab="Price", main="Analysis of Factors")

#show full factorial design
full <- FrF2(64, nfactors = 6, estimable = formula("~A+B+C+D+A:(B+C+D)"))
print (full)

#create fractional factorial design 
FFD <- FrF2(8, nfactors = 6, factor.names = c('airbagA', 'airbagB', 'driveA', 'driveB', 'mantrans', 'origin'))
FFD
summary(FFD)
#show aliasing in dataset
aliasprint(FFD)

#2^3 matrix to start with
k <- matrix(c("-","-","-","+","-","-","-","+","-","+","+","-","-","-","+","+","-","+","-","+","+","+","+","+"),nrow = 8,byrow = T) 

# display as table
k<- as.table(k) 
k 

#full matrix
o <- matrix(c("-","-","-","+","+","+","+","-","-","-","-","+","-","+","-","-","+","-","+","+","-","+","-","-","-","-","+","+","-","-","+","-","+","-","+","-","-","+","+","-","-","+","+","+","+","+","+","+"),nrow = 8,byrow = T) 

# display as table
o = as.table(o) 
o 

## Fractional design for comparison, unrandomized 
p <- FrF2(8, 6, res3 = TRUE, default.levels = c("-","+"), randomize = FALSE)
p

# Subset creation for factorial design pulled randomly from the table
subseta <- subset(car, airbag == "1" & drive == "1" & mantrans == "1" & origin == "1")
subsetb <- subset(car, airbag == "1" & drive == "2" & mantrans == "1" & origin == "0")
subsetc <- subset(car, airbag == "1" & drive == "2" & mantrans == "0" & origin == "0")
subsetd <- subset(car, airbag == "1" & drive == "1" & mantrans == "0" & origin == "1")
subsete <- subset(car,airbag == "1" & drive == "1" & mantrans == "1" & origin == "0")
subsetf <- subset(car,airbag == "2" & drive == "1" & mantrans == "0" & origin == "0")
subsetg <- subset(car,airbag == "2" & drive == "0" & mantrans == "1" & origin == "1")
subseth <- subset(car,airbag == "1" & drive == "0" & mantrans == "1" & origin == "0")


#Function to get a sample of row 
func <- function (Cars93){
  a <- sample(nrow(Cars93))
  b <- a[1]
  
  return(Cars93$price[b])
}

# Use  function to get group samples
m_a <- func(subseta)
m_b <- func(subsetb)
m_c <- func(subsetc)
m_d <- func(subsetd)
m_e <- func(subsete)
m_f <- func(subsetf)
m_g <- func(subsetg)
m_h <- func(subseth)


#calculate main and interaction effects
mea <- 1/4 * ((m_e[1]+m_g[1]+m_h[1]+m_d[1])-(m_c[1]+m_f[1]+m_b[1]+m_a[1]))

meb <- 1/4 * ((m_g[1]+m_d[1]+m_f[1]+m_b[1])-(m_c[1]+m_e[1]+m_a[1]+m_h[1]))

mec <- 1/4 * ((m_b[1]+m_a[1]+m_h[1]+m_d[1])-(m_c[1]+m_e[1]+m_f[1]+m_g[1]))

med <- 1/4 * ((m_c[1]+m_a[1]+m_g[1]+m_d[1])-(m_f[1]+m_h[1]+m_b[1]+m_e[1]))

#E = AC contrast: each of the 8 runs appears exactly once (m_e belongs in the low group)
mee <- 1/4 * ((m_c[1]+m_f[1]+m_h[1]+m_d[1])-(m_a[1]+m_b[1]+m_e[1]+m_g[1]))

mef <- 1/4 * ((m_c[1]+m_b[1]+m_e[1]+m_d[1])-(m_a[1]+m_f[1]+m_g[1]+m_h[1]))

#AF contrast: each of the 8 runs appears exactly once (m_b belongs in the low group)
ie <- 1/4 * ((m_f[1]+m_a[1]+m_d[1]+m_e[1])-(m_c[1]+m_b[1]+m_g[1]+m_h[1]))


#create table of main effects
me_table <- matrix(c(mea,meb,mec,med,mee,mef,med,mee,meb,mec,ie, mef, mea, ie, mec, ie, mea, meb, mef, mee, med), ncol = 1)

#convert to table
me_table <- as.table(me_table)

#add names to rows
rownames(me_table) <- c("A","B","C","D","E","F","AB","AC","AD","AE","AF","BC","BD","BE","BF","CD","CE","CF","DE","DF","EF")

#print table
me_table

#perform ANOVA
anova <- aov(price~airbag + drive + mantrans + origin + airbag:drive + airbag:mantrans + airbag:origin + drive:mantrans + drive:origin + mantrans:origin, data=car)

#display summary of ANOVA
summary(anova)

#find coefficients for model
fit <- lm(price~airbag+drive+mantrans+origin, data=car)
fit

#Anova for model
anova2 <- aov(price~airbag + drive + mantrans + origin + airbag:origin +drive:origin, data=car)

#Residuals and QQ
plot(anova2, which = c(1:2))

Project #3: Effect of Car Features on Price

Shamus Wheeler | RPI

December 10, 2016 | V1