1. Setting

The data used in this experiment is the Cars93 dataset from the MASS package. The full dataset contains 93 observations of 27 variables describing vehicles that were for sale in the United States in 1993; the extract used here keeps the five variables shown below.

This experiment is a continuation of Project #3. We are interested in the effect of four factors on the price of the vehicle. As per the design requirements, we examine two 2-level factors and two 3-level factors. The analysis is conducted by expressing each 3-level factor as a combination of two 2-level factors, which gives six 2-level factors in total. A Taguchi design will be created and carried out to analyze the main effects. The results obtained from the Taguchi design will be compared to the results obtained from the fractional factorial design in Project #3.

Cars93 <- read.delim("C:/Users/wheels/Desktop/Design of Experiments/Project #3/Cars93.txt")

Factors and Levels

mantrans is a 2-level categorical variable that states whether or not a car can have a manual transmission. The two levels are No and Yes.

origin is a 2-level categorical variable that states whether or not a car was produced in the United States. The two levels are non-USA and USA.

airbag (airbags in the raw data) is a 3-level categorical variable that lists the type of airbags that the car has. The three levels are None, Driver only, and Driver & Passenger.

drive is a 3-level categorical variable that lists the type of drive train that the car has. The three levels are Rear, Front, and 4WD.

Continuous and Response Variable

Since this experiment is interested in the effect of certain vehicle features on the price of the vehicle, the response variable is Price. Price is a continuous dependent variable.

Overview of Data

head(Cars93)
##   mantrans  origin            airbags drive price
## 1      Yes non-USA               None Front  15.9
## 2      Yes non-USA Driver & Passenger Front  33.9
## 3      Yes non-USA        Driver only Front  29.1
## 4      Yes non-USA Driver & Passenger Front  37.7
## 5      Yes non-USA        Driver only  Rear  30.0
## 6       No     USA        Driver only Front  15.7
str(Cars93)
## 'data.frame':    93 obs. of  5 variables:
##  $ mantrans: Factor w/ 2 levels "No","Yes": 2 2 2 2 2 1 1 1 1 1 ...
##  $ origin  : Factor w/ 2 levels "non-USA","USA": 1 1 1 1 1 2 2 2 2 2 ...
##  $ airbags : Factor w/ 3 levels "Driver & Passenger",..: 3 1 2 1 2 2 2 2 2 2 ...
##  $ drive   : Factor w/ 3 levels "4WD","Front",..: 2 2 2 2 3 2 2 3 2 2 ...
##  $ price   : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
summary(Cars93)
##  mantrans     origin                 airbags     drive        price      
##  No :32   non-USA:45   Driver & Passenger:16   4WD  :10   Min.   : 7.40  
##  Yes:61   USA    :48   Driver only       :43   Front:67   1st Qu.:12.20  
##                        None              :34   Rear :16   Median :17.70  
##                                                           Mean   :19.51  
##                                                           3rd Qu.:23.30  
##                                                           Max.   :61.90

To carry out a Taguchi design, the labels of the categorical variables need to be replaced with numeric codes that represent the factor levels. The codes were assigned to each factor as follows:

mantrans: No= 0, Yes= 1

origin: non-USA= 0, USA= 1

airbag: Driver & Passenger= 0, Driver only= 1, None= 2

drive: 4WD= 0, Front= 1, Rear= 2

This manipulated dataset can be seen here:

##    mantrans origin airbag drive price
## 1         1      0      2     1  15.9
## 2         1      0      0     1  33.9
## 3         1      0      1     1  29.1
## 4         1      0      0     1  37.7
## 5         1      0      1     2  30.0
## 6         0      1      1     1  15.7
## 7         0      1      1     1  20.8
## 8         0      1      1     2  23.7
## 9         0      1      1     1  26.3
## 10        0      1      1     1  34.7
## 'data.frame':    93 obs. of  5 variables:
##  $ mantrans: num  1 1 1 1 1 0 0 0 0 0 ...
##  $ origin  : num  0 0 0 0 0 1 1 1 1 1 ...
##  $ airbag  : num  2 0 1 0 1 1 1 1 1 1 ...
##  $ drive   : num  1 1 1 1 2 1 1 2 1 1 ...
##  $ price   : num  15.9 33.9 29.1 37.7 30 15.7 20.8 23.7 26.3 34.7 ...
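
The recoding described above can be sketched with named lookup vectors. This is a minimal illustration on three made-up rows that use the level labels from the str() output; the names toy, coded, and the *_codes vectors are assumptions introduced here, not objects from the report.

```r
# Toy rows using the level labels shown in str(Cars93) above (not real data)
toy <- data.frame(
  mantrans = c("Yes", "No", "Yes"),
  origin   = c("non-USA", "USA", "USA"),
  airbags  = c("None", "Driver only", "Driver & Passenger"),
  drive    = c("Front", "Rear", "4WD"),
  stringsAsFactors = FALSE
)

# Named lookup vectors: label -> numeric code (same coding as listed above)
mantrans_codes <- c("No" = 0, "Yes" = 1)
origin_codes   <- c("non-USA" = 0, "USA" = 1)
airbag_codes   <- c("Driver & Passenger" = 0, "Driver only" = 1, "None" = 2)
drive_codes    <- c("4WD" = 0, "Front" = 1, "Rear" = 2)

# Index each lookup vector by the label to obtain the numeric code
coded <- data.frame(
  mantrans = unname(mantrans_codes[toy$mantrans]),
  origin   = unname(origin_codes[toy$origin]),
  airbag   = unname(airbag_codes[toy$airbags]),
  drive    = unname(drive_codes[toy$drive])
)
print(coded)
```

When the columns are stored as factors, wrapping them in as.character() before the lookup avoids indexing by the internal integer codes.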

2. Experimental Design

This experiment was conducted to observe the effects of several vehicle features on the price of the vehicle. A Taguchi design was used to conduct this experiment, which allows for the reduction of experimental runs, while still calculating the main effects.

How will the experiment be organized and conducted?

This experiment will use a Taguchi design, which is a more efficient way to look at the main effects than a full factorial design. Each 3-level factor will be decomposed into two 2-level factors whose sum reproduces the value that was stored in the original 3-level factor. Finally, ANOVA will be conducted and the final model will be presented.
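
One way to picture the decomposition is with two 0/1 indicators whose sum returns the original 0/1/2 code. The report does not state which split was used, so the particular indicators below (x_a, x_b) are an assumption chosen only to illustrate the idea.

```r
# 3-level codes as used for airbag and drive (0, 1, 2)
x3 <- c(0, 1, 2, 1, 0, 2)

# Assumed split: two 0/1 indicators whose sum recovers the original code
x_a <- as.integer(x3 >= 1)  # 1 when the level is 1 or 2
x_b <- as.integer(x3 >= 2)  # 1 only when the level is 2

decomposed <- data.frame(x3, x_a, x_b, sum = x_a + x_b)
print(decomposed)
```

With this split, the combination x_a = 0, x_b = 1 never occurs, which is one reason a decomposed design has more nominal level combinations than the original factors do.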

What is the rationale for this design?

This Taguchi design is used to reduce the resources necessary to conduct an analysis, compared to a full factorial design. With a full factorial design, this experiment would take 64 runs (2^6) with the factors decomposed as they are now, or 36 runs (2 x 2 x 3 x 3) without decomposition. A Taguchi design reduces this to 8 runs while still providing an estimate of the main effects.
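
The run counts quoted above can be checked by enumerating the level combinations. This is a small sketch in base R; the variable names are illustrative.

```r
# Full factorial on the original factors: 2 x 2 x 3 x 3 level combinations
runs_original <- nrow(expand.grid(mantrans = 0:1, origin = 0:1,
                                  airbag = 0:2, drive = 0:2))

# Full factorial after decomposition: six 2-level factors
runs_decomposed <- nrow(expand.grid(rep(list(0:1), 6)))

# Number of rows in an L8 orthogonal array
runs_l8 <- 8

c(original = runs_original, decomposed = runs_decomposed, L8 = runs_l8)
```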

Randomization, Replication, Blocking

Randomization was utilized in this experiment. While we can’t comment on the data collection method, we can apply randomization by randomly ordering the 8 runs and randomly selecting an observation from the data for each run. This experiment does not use replication, and there is no blocking.
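
Both randomization steps can be sketched with sample(). The seed value matches the one set in Appendix B; the four prices are the rows among the first ten of the recoded data that share mantrans = 0, origin = 1, airbag = 1, drive = 1, so they are only part of that subset and serve purely as an illustration.

```r
set.seed(1)             # seed value used in Appendix B

# Random permutation of the 8 runs
run_order <- sample(8)

# Prices from rows 6, 7, 9 and 10 of the recoded data (partial subset, for illustration)
prices_in_subset <- c(15.7, 20.8, 26.3, 34.7)

# Draw one observation at random to serve as the response for that run
picked <- prices_in_subset[sample(length(prices_in_subset), 1)]

run_order
picked
```

Because only one observation is drawn per run, the response for each run depends on the seed, which is worth keeping in mind when comparing results across projects.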

3. Statistical Analysis

Exploratory Data Analysis

A boxplot showing the combinations of the independent factors and their effect on the response variable price is shown below.

As you can see from this boxplot, there are some outliers, but overall, there are no large issues that would stop us from moving forward with the experiment.

Testing

The hypothesis for this test will be:

Null Hypothesis - There is no statistically significant difference between the prices of the vehicles due to the changing factor levels of the independent variables.

Alternate Hypothesis - There is a statistically significant difference between the prices of the vehicles due to the changing factor levels of the independent variables.

A Taguchi design will be used to analyze the main effects, and then ANOVA will be used to further analyze the main effects and build a model.

Treatment Structure

As previously mentioned, a full factorial design would take 64 runs with the decomposed factors and 36 runs with the original factors. The decomposed factors were chosen for the Taguchi design because this helps to reduce the number of runs needed. taguchiChoose() was used to list the possible designs for six 2-level factors. Since L8_2 has the fewest runs, it was used for this experiment.

## Warning: package 'qualityTools' was built under R version 3.3.2
## Loading required package: Rsolnp
## Warning: package 'Rsolnp' was built under R version 3.3.2
## Loading required package: MASS
## 
## Attaching package: 'MASS'
## The following object is masked _by_ '.GlobalEnv':
## 
##     Cars93
## 
## Attaching package: 'qualityTools'
## The following object is masked from 'package:stats':
## 
##     sigma
## 6 factors on 2 levels and 0 factors on 0 levels with 0 desired interactions to be estimated
## 
## Possible Designs:
## 
## L8_2 L12_2 L16_2 L32_2
## 
## Use taguchiDesign("L8_2") or different to create a taguchi design object
## [1] "L8_2"  "L12_2" "L16_2" "L32_2"

The chosen design was input into the taguchiDesign function to create a design matrix, as shown below:

## Warning in `[<-`(`*tmp*`, i, value = <S4 object of class
## structure("taguchiFactor", package = "qualityTools")>): implicit list
## embedding of S4 objects is deprecated

##   StandOrder RunOrder Replicate A B C D E F G  y
## 1          5        1         1 2 1 2 1 2 1 2 NA
## 2          7        2         1 2 2 1 1 2 2 1 NA
## 3          1        3         1 1 1 1 1 1 1 1 NA
## 4          3        4         1 1 2 2 1 1 2 2 NA
## 5          4        5         1 1 2 2 2 2 1 1 NA
## 6          8        6         1 2 2 1 2 1 1 2 NA
## 7          6        7         1 2 1 2 2 1 2 1 NA
## 8          2        8         1 1 1 1 2 2 2 2 NA

Since there are only 6 factors being tested, the last column will be dropped. A subset of the dataset will be created for each of the treatment levels shown.
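
The property that lets eight runs estimate six main effects is that the array is balanced and orthogonal. The sketch below re-enters the design printed above by hand (it does not call qualityTools) and checks both properties in base R; the object names are illustrative.

```r
# L8 array as printed above (rows in RunOrder), columns A to G
l8 <- matrix(c(
  2, 1, 2, 1, 2, 1, 2,
  2, 2, 1, 1, 2, 2, 1,
  1, 1, 1, 1, 1, 1, 1,
  1, 2, 2, 1, 1, 2, 2,
  1, 2, 2, 2, 2, 1, 1,
  2, 2, 1, 2, 1, 1, 2,
  2, 1, 2, 2, 1, 2, 1,
  1, 1, 1, 2, 2, 2, 2
), ncol = 7, byrow = TRUE, dimnames = list(NULL, LETTERS[1:7]))

# Drop the unused seventh column, leaving the six factors A to F
l8 <- l8[, LETTERS[1:6]]

# Balance: every column holds each level four times
balanced <- all(apply(l8, 2, function(col) all(table(col) == 4)))

# Orthogonality: every pair of columns shows each level pair exactly twice
col_pairs <- combn(colnames(l8), 2)
orthogonal <- all(apply(col_pairs, 2, function(p) {
  all(table(l8[, p[1]], l8[, p[2]]) == 2)
}))

c(balanced = balanced, orthogonal = orthogonal)
```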

Calculating Main Effects

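A response is obtained for each run by drawing one price at random from the subset of the data that matches the level combination assigned to that run, and the eight values are put into an array (labeled means in the output below). This array is entered into the Taguchi design as the response, and the main effects are calculated.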

##   StandOrder RunOrder Replicate A B C D E F G means
## 1          5        1         1 2 1 2 1 2 1 2  15.7
## 2          7        2         1 2 2 1 1 2 2 1   9.1
## 3          1        3         1 1 1 1 1 1 1 1  20.8
## 4          3        4         1 1 2 2 1 1 2 2  47.9
## 5          4        5         1 1 2 2 2 2 1 1  17.7
## 6          8        6         1 2 2 1 2 1 1 2  19.9
## 7          6        7         1 2 1 2 2 1 2 1  16.3
## 8          2        8         1 1 1 1 2 2 2 2  19.0
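
The main-effect calculation can be sketched from the eight responses in the table above. The sketch mirrors the contrasts in Appendix B, which pair the responses m_a to m_h with the rows of the standard-order L8 array; that pairing is an assumption carried over from the appendix, and pairing the same responses with the randomized RunOrder rows printed above would give different values.

```r
# Standard-order L8 columns A to F (level 1 = low, level 2 = high)
l8_std <- cbind(
  A = c(1, 1, 1, 1, 2, 2, 2, 2),
  B = c(1, 1, 2, 2, 1, 1, 2, 2),
  C = c(1, 1, 2, 2, 2, 2, 1, 1),
  D = c(1, 2, 1, 2, 1, 2, 1, 2),
  E = c(1, 2, 1, 2, 2, 1, 2, 1),
  F = c(1, 2, 2, 1, 1, 2, 2, 1)
)

# Responses from the table above, read top to bottom (m_a ... m_h in Appendix B)
y <- c(15.7, 9.1, 20.8, 47.9, 17.7, 19.9, 16.3, 19.0)

# Main effect = mean response at level 2 minus mean response at level 1
main_effects <- apply(l8_std, 2, function(col) {
  mean(y[col == 2]) - mean(y[col == 1])
})
round(main_effects, 2)
```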

The results of the main effects calculations are shown below in both a table and effect plots.

##       A
## A -5.15
## B 10.40
## C 11.55
## D  6.35
## E  3.90
## F -8.55

This allows us to easily see the main effects of the factors on the response variable. These main effects can be compared to the main effects that were calculated in Project #3:

##   Taguchi     FFD
## A   -5.15   0.275
## B   10.40  11.075
## C   11.55 -10.925
## D    6.35  -2.375
## E    3.90  -2.725
## F   -8.55   3.675

The main effects from the Taguchi design differ from those calculated with the fractional factorial design. Effect B is similar in both designs (10.40 vs. 11.075), while the other five effects have opposite signs, and A, D, and F also differ noticeably in magnitude.
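
The comparison can be made explicit by checking the sign and the absolute gap for each factor. The values are copied from the comparison table; the data frame and column names are illustrative.

```r
# Main effects copied from the comparison table above
effects <- data.frame(
  factor  = c("A", "B", "C", "D", "E", "F"),
  taguchi = c(-5.15, 10.40, 11.55, 6.35, 3.90, -8.55),
  ffd     = c(0.275, 11.075, -10.925, -2.375, -2.725, 3.675),
  stringsAsFactors = FALSE
)

# TRUE where both designs agree on the direction of the effect
effects$same_sign <- sign(effects$taguchi) == sign(effects$ffd)

# Absolute gap between the two estimates
effects$abs_diff <- abs(effects$taguchi - effects$ffd)

print(effects)
```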

ANOVA and Model Construction

ANOVA was conducted to observe the significance that each of the factors has on price. Below is a summary of ANOVA.

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## mantrans     1    915   915.1  17.843 5.83e-05 ***
## origin       1    611   611.1  11.915 0.000858 ***
## airbag       1   2198  2198.4  42.865 3.79e-09 ***
## drive        1    346   346.1   6.748 0.011000 *  
## Residuals   88   4513    51.3                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
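
The F values and p-values in the table follow from the mean squares and degrees of freedom. The sketch below recomputes them from the rounded figures printed above, so small differences in the last digits are expected; the variable names are illustrative.

```r
# Mean squares for each term, copied from the ANOVA table above (1 Df each)
ms_term <- c(mantrans = 915.1, origin = 611.1, airbag = 2198.4, drive = 346.1)
df_term <- 1

# Residual sum of squares and degrees of freedom from the same table
ss_resid <- 4513
df_resid <- 88
ms_resid <- ss_resid / df_resid

# F statistic: term mean square divided by residual mean square
f_values <- ms_term / ms_resid

# Upper-tail probability of the F distribution gives the p-value
p_values <- pf(f_values, df_term, df_resid, lower.tail = FALSE)

round(f_values, 2)
signif(p_values, 3)
```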

From ANOVA, all four main effects are significant at the 0.05 level. This is in agreement with the fractional factorial design from Project #3 and indicates that we should reject the null hypothesis for each variable. Taking \(X_1\) to \(X_4\) as the coded mantrans, origin, airbag, and drive (the order in which they enter the model formula), the model can be represented by the following:

\(price = 30.186 + 3.848 X_1 - 6.189 X_2 - 6.916 X_3 - 5.520 X_4\)

It is difficult to interpret this model because its coefficients differ in sign and magnitude from the main effects that were calculated in Project #3. However, we can still check the adequacy of this model.
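
As a worked example, the fitted equation can be evaluated for the first row of the recoded data. The mapping of the four coefficients to mantrans, origin, airbag, and drive follows the order of the model formula in Appendix B and is an assumption; predict_price is a hypothetical helper, not part of the original code.

```r
# Coefficients from the fitted model above (intercept first)
coefs <- c(intercept = 30.186, mantrans = 3.848, origin = -6.189,
           airbag = -6.916, drive = -5.520)

# Hypothetical helper: evaluate the linear model for one set of coded levels
predict_price <- function(mantrans, origin, airbag, drive) {
  unname(coefs["intercept"] +
           coefs["mantrans"] * mantrans +
           coefs["origin"]   * origin +
           coefs["airbag"]   * airbag +
           coefs["drive"]    * drive)
}

# First row of the recoded data: mantrans = 1, origin = 0, airbag = 2, drive = 1
predicted <- predict_price(1, 0, 2, 1)
observed  <- 15.9

c(predicted = predicted, observed = observed, residual = observed - predicted)
```

Because the 3-level factors enter as single numeric codes, the model assumes that moving from level 0 to 1 changes the price by the same amount as moving from level 1 to 2.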

Model Adequacy Checking

Model adequacy checking was done to assess the fit of the model. The residuals vs. fitted plot shows that the residuals are not spread as uniformly as we would like to see. This might be due to the small number of observations, or just the nature of this type of data. The Q-Q plot shows very little deviation from the reference line, with the exception of a few outliers, which suggests that the normality assumption is reasonable. Overall, there isn’t much change from Project #3 in the model fit.

4. Conclusions

This Taguchi design produced different main effect estimates than the fractional factorial design, even though both designs use the same number of runs. The results seem reasonably accurate, and it would probably help if the dataset had more observations and replicates in each subset. We can’t determine which design is more accurate without repeating the experiment with a full factorial design. This Taguchi design seems to be a decent approximation of the main effects.

5. References

  1. D. C. Montgomery, Design and Analysis of Experiments, 8th ed. Hoboken, NJ: John Wiley & Sons, Inc., 2013.

  2. https://vincentarelbundock.github.io/Rdatasets/doc/MASS/Cars93.html

  3. ISYE 6020 class resources

6. Appendices

Appendix A: Raw Data

The Cars93 data set from the MASS package in R was used. More information on this data set can be found at: https://vincentarelbundock.github.io/Rdatasets/doc/MASS/Cars93.html

Appendix B: Complete R Code

#Shamus Wheeler
#Project 4

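#load data (path as used in Section 1)
Cars93 <- read.delim("C:/Users/wheels/Desktop/Design of Experiments/Project #3/Cars93.txt")

#load qualityTools package
library(qualityTools)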

#show first observations in dataset
head(Cars93)

#show structure of dataset
str(Cars93)

#show summary of dataset
summary(Cars93)

#recode factor levels as numbers (vectorized lookup instead of a row-by-row loop)

#mantrans: No = 0, Yes = 1
mantrans <- ifelse(Cars93$mantrans == "No", 0, 1)

#origin: non-USA = 0, USA = 1
origin <- ifelse(Cars93$origin == "non-USA", 0, 1)

#airbags: Driver & Passenger = 0, Driver only = 1, None = 2
airbag_codes <- c("Driver & Passenger" = 0, "Driver only" = 1, "None" = 2)
airbag <- unname(airbag_codes[as.character(Cars93$airbags)])

#drive: 4WD = 0, Front = 1, Rear = 2
drive_codes <- c("4WD" = 0, "Front" = 1, "Rear" = 2)
drive <- unname(drive_codes[as.character(Cars93$drive)])

#dataframe of recoded factors with response variable
car <- data.frame(mantrans, origin, airbag, drive, price = Cars93$price)

#show head of new dataset
head(car,10)

#show structure of new dataset
str(car)

#boxplot of price for each combination of factor levels (price is the response, not a grouping variable)
boxplot(price ~ mantrans + origin + airbag + drive, data = car, xlab="mantrans.origin.airbag.drive", ylab="Price", main="Analysis of Factors")

#set seed for reproducible project results
set.seed(1)
# find correct Taguchi design for data set
t <- taguchiChoose(6,0,2,0)
print(t)

#show  structure of Taguchi design matrix
t <- taguchiDesign("L8_2")
print(t)

# Create one subset of the data per run; a response is drawn randomly from each subset below
subseta <- subset(car, mantrans == "1" & origin == "0" & airbag == "1" & drive == "1")
subsetb <- subset(car, mantrans == "1" & origin == "0" & airbag == "2" & drive == "1")
subsetc <- subset(car, mantrans == "0" & origin == "1" & airbag == "1" & drive == "1")
subsetd <- subset(car, mantrans == "0" & origin == "0" & airbag == "1" & drive == "2")
subsete <- subset(car, mantrans == "1" & origin == "1" & airbag == "0" & drive == "2")
subsetf <- subset(car, mantrans == "1" & origin == "1" & airbag == "1" & drive == "0")
subsetg <- subset(car, mantrans == "0" & origin == "1" & airbag == "2" & drive == "1")
subseth <- subset(car, mantrans == "0" & origin == "1" & airbag == "1" & drive == "0")

#Function to return the price of one randomly chosen row of a subset
func <- function (df){
  a <- sample(nrow(df))   #random permutation of the row indices
  b <- a[1]               #keep the first index

  return(df$price[b])
}

# Use  function to get group samples
m_a <- func(subseta)
m_b <- func(subsetb)
m_c <- func(subsetc)
m_d <- func(subsetd)
m_e <- func(subsete)
m_f <- func(subsetf)
m_g <- func(subsetg)
m_h <- func(subseth)

#create vector
means_vec <- c(m_a[1], m_b[1], m_c[1], m_d[1], m_e[1], m_f[1], m_g[1], m_h[1])

#convert to matrix
means <- as.matrix(means_vec)

#insert response into Taguchi matrix
response(t) <- means

#show matrix with responses
print(t)

#calculate main effects (contrasts follow the standard-order L8 sign pattern, with m_a to m_h taken as rows 1 to 8)
mea <- 1/4 * ((m_e[1]+m_f[1]+m_g[1]+m_h[1])-(m_b[1]+m_a[1]+m_c[1]+m_d[1]))

meb <- 1/4 * ((m_c[1]+m_d[1]+m_g[1]+m_h[1])-(m_a[1]+m_e[1]+m_f[1]+m_b[1]))

mec <- 1/4 * ((m_c[1]+m_d[1]+m_e[1]+m_f[1])-(m_b[1]+m_a[1]+m_g[1]+m_h[1]))

med <- 1/4 * ((m_b[1]+m_d[1]+m_f[1]+m_h[1])-(m_a[1]+m_c[1]+m_e[1]+m_g[1]))

mee <- 1/4 * ((m_b[1]+m_d[1]+m_e[1]+m_g[1])-(m_a[1]+m_c[1]+m_f[1]+m_h[1]))

mef <- 1/4 * ((m_b[1]+m_c[1]+m_f[1]+m_g[1])-(m_a[1]+m_d[1]+m_h[1]+m_e[1]))

#create main effect vector
me_vec <- matrix(c(mea,meb,mec,med,mee,mef),ncol=1)

#convert to table
me_table <- as.table(me_vec)

#add names to rows
rownames(me_table) <- c("A","B","C","D","E","F")

#display table
me_table

#create main effect plots
effectPlot(t, factors = c("A","B","C","D","E","F"))

#create compare table
me_compare_table <- matrix(c(mea,meb,mec,med,mee,mef,0.275,11.075,-10.925,-2.375,-2.725,3.675), ncol = 2)

#add names to columns
colnames(me_compare_table) <- c("Taguchi", "FFD")

#add names to rows
rownames(me_compare_table) <- c("A","B","C","D","E","F")

#display table
me_compare_table

#perform ANOVA (factors enter as numeric codes, so each term has 1 Df)
car_aov <- aov(price ~ mantrans + origin + airbag + drive, data=car)

#display summary of ANOVA
summary(car_aov)

#find coefficients for model
fit <- lm(price ~ mantrans + origin + airbag + drive, data=car)
fit

#model diagnostics (reuses the ANOVA fit above)
plot(car_aov)