Taguchi Design

Hongyu Chen

RPI, RIN:661405156

Dec 4th

1. Setting

System under test

Data are obtained from the ‘fueleconomy’ dataset. Five factors that might influence city fuel economy (in mpg) are analyzed using a Taguchi design. A quick view of the dataset follows.

remove(list=ls())
#Loading raw data

library("fueleconomy")
## Warning: package 'fueleconomy' was built under R version 3.1.2
#Quick view and data summary
head(vehicles)
##      id       make               model year                       class
## 1 27550 AM General   DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 2 28426 AM General   DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 3 27549 AM General    FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 4 28425 AM General    FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 5  1032 AM General Post Office DJ5 2WD 1985 Special Purpose Vehicle 2WD
## 6  1033 AM General Post Office DJ8 2WD 1985 Special Purpose Vehicle 2WD
##             trans            drive cyl displ    fuel hwy cty
## 1 Automatic 3-spd    2-Wheel Drive   4   2.5 Regular  17  18
## 2 Automatic 3-spd    2-Wheel Drive   4   2.5 Regular  17  18
## 3 Automatic 3-spd    2-Wheel Drive   6   4.2 Regular  13  13
## 4 Automatic 3-spd    2-Wheel Drive   6   4.2 Regular  13  13
## 5 Automatic 3-spd Rear-Wheel Drive   4   2.5 Regular  17  16
## 6 Automatic 3-spd Rear-Wheel Drive   6   4.2 Regular  13  13
tail(vehicles)
##          id  make                             model year       class
## 33437 31064 smart   fortwo electric drive cabriolet 2011 Two Seaters
## 33438 33305 smart fortwo electric drive convertible 2013 Two Seaters
## 33439 34393 smart fortwo electric drive convertible 2014 Two Seaters
## 33440 31065 smart       fortwo electric drive coupe 2011 Two Seaters
## 33441 33306 smart       fortwo electric drive coupe 2013 Two Seaters
## 33442 34394 smart       fortwo electric drive coupe 2014 Two Seaters
##                trans            drive cyl displ        fuel hwy cty
## 33437 Automatic (A1) Rear-Wheel Drive  NA    NA Electricity  79  94
## 33438 Automatic (A1) Rear-Wheel Drive  NA    NA Electricity  93 122
## 33439 Automatic (A1) Rear-Wheel Drive  NA    NA Electricity  93 122
## 33440 Automatic (A1) Rear-Wheel Drive  NA    NA Electricity  79  94
## 33441 Automatic (A1) Rear-Wheel Drive  NA    NA Electricity  93 122
## 33442 Automatic (A1) Rear-Wheel Drive  NA    NA Electricity  93 122
summary(vehicles)
##        id            make              model                year     
##  Min.   :    1   Length:33442       Length:33442       Min.   :1984  
##  1st Qu.: 8361   Class :character   Class :character   1st Qu.:1991  
##  Median :16724   Mode  :character   Mode  :character   Median :1999  
##  Mean   :17038                                         Mean   :1999  
##  3rd Qu.:25265                                         3rd Qu.:2008  
##  Max.   :34932                                         Max.   :2015  
##                                                                      
##     class              trans              drive                cyl       
##  Length:33442       Length:33442       Length:33442       Min.   : 2.00  
##  Class :character   Class :character   Class :character   1st Qu.: 4.00  
##  Mode  :character   Mode  :character   Mode  :character   Median : 6.00  
##                                                           Mean   : 5.77  
##                                                           3rd Qu.: 6.00  
##                                                           Max.   :16.00  
##                                                           NA's   :58     
##      displ          fuel                hwy             cty       
##  Min.   :0.00   Length:33442       Min.   :  9.0   Min.   :  6.0  
##  1st Qu.:2.30   Class :character   1st Qu.: 19.0   1st Qu.: 15.0  
##  Median :3.00   Mode  :character   Median : 23.0   Median : 17.0  
##  Mean   :3.35                      Mean   : 23.6   Mean   : 17.5  
##  3rd Qu.:4.30                      3rd Qu.: 27.0   3rd Qu.: 20.0  
##  Max.   :8.40                      Max.   :109.0   Max.   :138.0  
##  NA's   :57

Factors and Levels

Five factors that may influence vehicle city fuel economy are considered in this analysis: model year (year), drive train (drive), vehicle size class (class), fuel type (fuel) and number of cylinders (cyl).

unique(vehicles$year)
##  [1] 1984 1985 1987 1997 1998 1999 1995 1996 2001 2002 2003 2000 2004 2013
## [15] 2014 1986 1988 1989 1990 1991 1992 1993 1994 2005 2006 2007 2008 2009
## [29] 2010 2011 2012 2015
unique(vehicles$drive)
## [1] "2-Wheel Drive"              "Rear-Wheel Drive"          
## [3] "Front-Wheel Drive"          "4-Wheel or All-Wheel Drive"
## [5] "All-Wheel Drive"            "4-Wheel Drive"             
## [7] "Part-time 4-Wheel Drive"
unique(vehicles$class)
##  [1] "Special Purpose Vehicle 2WD"       
##  [2] "Midsize Cars"                      
##  [3] "Subcompact Cars"                   
##  [4] "Compact Cars"                      
##  [5] "Sport Utility Vehicle - 4WD"       
##  [6] "Small Sport Utility Vehicle 2WD"   
##  [7] "Small Sport Utility Vehicle 4WD"   
##  [8] "Two Seaters"                       
##  [9] "Sport Utility Vehicle - 2WD"       
## [10] "Special Purpose Vehicles"          
## [11] "Special Purpose Vehicle 4WD"       
## [12] "Small Station Wagons"              
## [13] "Minicompact Cars"                  
## [14] "Midsize-Large Station Wagons"      
## [15] "Midsize Station Wagons"            
## [16] "Large Cars"                        
## [17] "Standard Sport Utility Vehicle 4WD"
## [18] "Standard Sport Utility Vehicle 2WD"
## [19] "Minivan - 4WD"                     
## [20] "Minivan - 2WD"                     
## [21] "Vans"                              
## [22] "Vans, Cargo Type"                  
## [23] "Vans, Passenger Type"              
## [24] "Standard Pickup Trucks 2WD"        
## [25] "Standard Pickup Trucks"            
## [26] "Standard Pickup Trucks/2wd"        
## [27] "Small Pickup Trucks 2WD"           
## [28] "Standard Pickup Trucks 4WD"        
## [29] "Small Pickup Trucks 4WD"           
## [30] "Small Pickup Trucks"               
## [31] "Vans Passenger"                    
## [32] "Special Purpose Vehicle"           
## [33] "Special Purpose Vehicles/2wd"      
## [34] "Special Purpose Vehicles/4wd"
unique(vehicles$fuel)
##  [1] "Regular"                     "Premium"                    
##  [3] "Diesel"                      "Premium or E85"             
##  [5] "Electricity"                 "Gasoline or E85"            
##  [7] "Premium Gas or Electricity"  "Gasoline or natural gas"    
##  [9] "CNG"                         "Midgrade"                   
## [11] "Regular Gas and Electricity" "Gasoline or propane"        
## [13] "Premium and Electricity"
unique(vehicles$cyl)
##  [1]  4  6  5  8 12 10 NA 16  3  2

Originally there are too many levels in each factor. For convenience, and to allow a Taguchi design, we group the levels of each factor into a small number of categories.

Years are grouped into three periods (1984 to 1999, 2000 to 2009 and 2010 to 2015), roughly by decade, which is a typical way to describe when a vehicle was manufactured.

#Categories of years
vehicles$year[as.numeric(vehicles$year) >= 1984 & as.numeric(vehicles$year) <= 1999] = "1"
vehicles$year[as.numeric(vehicles$year) >= 2000 & as.numeric(vehicles$year) <= 2009] = "2"
vehicles$year[as.numeric(vehicles$year) >= 2010 & as.numeric(vehicles$year) <= 2015] = "3"

unique(vehicles$year)
## [1] "1" "2" "3"
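The same binning can be sketched in a single pass with base R's cut(), which avoids relying on the column being silently coerced to character between assignments. The toy vector below is illustrative only and is not taken from the dataset; the break points are the period boundaries used above.

```r
# Toy vector of model years (illustrative values, not the full dataset)
year <- c(1984, 1999, 2000, 2009, 2010, 2015)

# Bin into the three periods used above; intervals are right-closed:
# (1983, 1999] -> "1", (1999, 2009] -> "2", (2009, 2015] -> "3"
year_cat <- cut(year,
                breaks = c(1983, 1999, 2009, 2015),
                labels = c("1", "2", "3"))

print(table(year_cat))
```

cut() returns a factor, which lm() and boxplot() accept directly. The same approach would work for the numeric cylinder counts.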

Drive train is grouped into 2-wheel drive, 4-wheel drive and all-wheel drive as below. This is also the usual distinction made when buying a vehicle.

#Categories of drive train
vehicles$drive[ (vehicles$drive) == "2-Wheel Drive" |  (vehicles$drive) == "Rear-Wheel Drive" |  (vehicles$drive) == "Front-Wheel Drive"] = "1"
vehicles$drive[ (vehicles$drive) == "Part-time 4-Wheel Drive" |  (vehicles$drive) == "4-Wheel Drive"] = "2"
vehicles$drive[ (vehicles$drive) == "All-Wheel Drive" | (vehicles$drive) == "4-Wheel or All-Wheel Drive"] = "3"

unique(vehicles$drive)
## [1] "1" "3" "2"

Vehicle size class is grouped into 3 categories: commonly used classes, passenger and cargo vehicles, and special purpose vehicles.

#Categories of class
#Commonly used classes - "1"
vehicles$class[ (vehicles$class) == "Compact Cars" | (vehicles$class) == "Subcompact Cars" | (vehicles$class) == "Two Seaters" | (vehicles$class) == "Small Station Wagons" |  (vehicles$class) == "Minicompact Cars" | (vehicles$class) == "Sport Utility Vehicle - 4WD" |  (vehicles$class) == "Small Sport Utility Vehicle 2WD" |  (vehicles$class) == "Small Sport Utility Vehicle 4WD"|  (vehicles$class) == "Sport Utility Vehicle - 2WD"|  (vehicles$class) == "Standard Sport Utility Vehicle 4WD"|  (vehicles$class) == "Standard Sport Utility Vehicle 2WD" | (vehicles$class) == "Midsize Cars"] = "1"

#For passenger and cargo - "2"
vehicles$class[(vehicles$class) == "Vans, Cargo Type"|  (vehicles$class) == "Standard Pickup Trucks 2WD"|  (vehicles$class) == "Standard Pickup Trucks"|  (vehicles$class) == "Standard Pickup Trucks/2wd"|  (vehicles$class) == "Standard Pickup Trucks 4WD" | (vehicles$class) == "Midsize-Large Station Wagons"|  (vehicles$class) == "Midsize Station Wagons"|  (vehicles$class) == "Large Cars"|  (vehicles$class) == "Minivan - 4WD"|  (vehicles$class) == "Minivan - 2WD"|  (vehicles$class) == "Vans"|  (vehicles$class) == "Vans, Passenger Type"|  (vehicles$class) == "Small Pickup Trucks 2WD"|  (vehicles$class) == "Small Pickup Trucks 4WD"|  (vehicles$class) == "Small Pickup Trucks"| (vehicles$class) == "Vans Passenger"] = "2"

#Special purpose - "3"
vehicles$class[(vehicles$class) == "Special Purpose Vehicle 2WD" | (vehicles$class) == "Special Purpose Vehicles" | (vehicles$class) == "Special Purpose Vehicle 4WD" | (vehicles$class) == "Special Purpose Vehicle" | (vehicles$class) == "Special Purpose Vehicles/2wd" | (vehicles$class) == "Special Purpose Vehicles/4wd"] = "3"

unique(vehicles$class)
## [1] "3" "1" "2"
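The long chains of == comparisons used for drive and class can be written more compactly with %in% and a named list of groups. This is only a sketch: recode_levels is a hypothetical helper introduced here for illustration, shown with the drive groups defined earlier and a toy input vector.

```r
# Hypothetical helper: map raw labels to level codes via a named list
recode_levels <- function(x, groups) {
  out <- rep(NA_character_, length(x))
  for (code in names(groups)) {
    out[x %in% groups[[code]]] <- code
  }
  out
}

# Same drive-train groups as in the categorization above
drive_groups <- list(
  "1" = c("2-Wheel Drive", "Rear-Wheel Drive", "Front-Wheel Drive"),
  "2" = c("Part-time 4-Wheel Drive", "4-Wheel Drive"),
  "3" = c("All-Wheel Drive", "4-Wheel or All-Wheel Drive")
)

# Toy input; "Unknown" is not in any group and therefore becomes NA
drive_raw <- c("Front-Wheel Drive", "4-Wheel Drive", "All-Wheel Drive", "Unknown")
drive_cat <- recode_levels(drive_raw, drive_groups)
print(drive_cat)
```

Labels that are missing from every group come back as NA, which makes a forgotten category easy to spot.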

Fuel types are divided into 4 categories. Note that some fuel types overlap categories; for example, ‘Gasoline or natural gas’ is assigned to ‘1’ instead of ‘4’. These assignments are arbitrary, and a different categorization could give different results.

#Categories of fuel types
#Common gasoline - "1"
vehicles$fuel[(vehicles$fuel) == "Gasoline or E85" | (vehicles$fuel) == "Gasoline or natural gas" | (vehicles$fuel) == "Gasoline or propane" | (vehicles$fuel) == "Midgrade" | (vehicles$fuel) == "Regular" | (vehicles$fuel) == "Regular Gas and Electricity"] = "1"

#Premium gasoline - "2"
vehicles$fuel[(vehicles$fuel) == "Premium" | (vehicles$fuel) == "Premium or E85"] = "2"

#Electricity - "3"
vehicles$fuel[(vehicles$fuel) == "Premium and Electricity" | (vehicles$fuel) == "Premium Gas or Electricity" | (vehicles$fuel) == "Electricity"] = "3"

#Others - '4'
vehicles$fuel[(vehicles$fuel) == "CNG" | (vehicles$fuel) == "Diesel" ] = "4"

unique(vehicles$fuel)
## [1] "1" "2" "4" "3"

The number of cylinders has 3 categories as below, which could reflect the power of a vehicle.

#Categories of numbers of cylinder

vehicles$cyl[as.numeric(vehicles$cyl) >= 2 & as.numeric(vehicles$cyl) <= 4] = "1"
vehicles$cyl[as.numeric(vehicles$cyl) > 4 & as.numeric(vehicles$cyl) < 8] = "2"
vehicles$cyl[as.numeric(vehicles$cyl) >=8 & as.numeric(vehicles$cyl) <= 16] = "3"

unique(vehicles$cyl)
## [1] "1" "2" "3" NA
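The NA level comes from the 58 vehicles with no cylinder count in the summary above (the electric vehicles shown in the tail of the data have NA here). With R's default na.action, lm() drops such rows, which is consistent with the ANOVA table later in this report: its degrees of freedom sum to 33383, i.e. 33442 - 58 = 33384 rows used. A minimal sketch on toy data (values are illustrative, not from the dataset):

```r
# Toy data: one row has a missing cylinder category
toy <- data.frame(
  cty = c(18, 13, 16, 94, 20, 22),
  cyl = c("1", "2", "1", NA, "1", "2")
)

# stats::lm is called explicitly because DoE.base masks lm when loaded
fit <- stats::lm(cty ~ cyl, data = toy)

n_missing <- sum(is.na(toy$cyl))   # rows lacking a cylinder category
n_used <- stats::nobs(fit)         # rows actually used in the fit
print(c(missing = n_missing, used = n_used))
```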

Continuous variables (if any)

All factors are categorical variables. The response variable ‘cty’, city fuel economy in mpg (miles per gallon), is recorded as an integer from 6 to 138 and is treated as a continuous variable.

The Data: How is it organized and what does it look like?

Detailed information is provided in the ‘fueleconomy’ dataset, including almost everything needed to evaluate an automobile. Five factors are selected in this analysis: model year (year), drive train (drive), vehicle size class (class), fuel type (fuel) and number of cylinders (cyl). We convert these into categorical factors, each with 3 levels except for fuel, which has 4 levels.

str(vehicles)
## Classes 'tbl_df', 'tbl' and 'data.frame':    33442 obs. of  12 variables:
##  $ id   : int  27550 28426 27549 28425 1032 1033 3347 13309 13310 13311 ...
##  $ make : chr  "AM General" "AM General" "AM General" "AM General" ...
##  $ model: chr  "DJ Po Vehicle 2WD" "DJ Po Vehicle 2WD" "FJ8c Post Office" "FJ8c Post Office" ...
##  $ year : chr  "1" "1" "1" "1" ...
##  $ class: chr  "3" "3" "3" "3" ...
##  $ trans: chr  "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" ...
##  $ drive: chr  "1" "1" "1" "1" ...
##  $ cyl  : chr  "1" "1" "2" "2" ...
##  $ displ: num  2.5 2.5 4.2 4.2 2.5 4.2 3.8 2.2 2.2 3 ...
##  $ fuel : chr  "1" "1" "1" "1" ...
##  $ hwy  : int  17 17 13 13 17 13 21 26 28 26 ...
##  $ cty  : int  18 18 13 13 16 13 14 20 22 18 ...

Randomization

Fuel economy data are the result of vehicle testing done at the Environmental Protection Agency’s National Vehicle and Fuel Emissions Laboratory in Ann Arbor, Michigan, and by vehicle manufacturers with oversight by EPA. Therefore it is reasonable to assume these data are completely randomized.

Replication/repeated measures

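No repeated measures are used. Although the raw data contain many vehicles for some factor combinations, the Taguchi analysis below keeps only one observation per combination, so the reduced design is unreplicated.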

Block

No separate blocking factor is introduced; the grouping is done within the five factors themselves through the categorization above.

2. (Experimental) Design

How will the experiment be organized and conducted to test the hypothesis?

Data are obtained from the ‘fueleconomy’ dataset and the selected factors are categorized. Exploratory data analysis and ANOVA are then used to test the effect of the individual factors and levels on vehicle fuel economy. Next, an orthogonal array for the Taguchi design is created using oa.design in the DoE.base package. Finally, the optimized combination is identified and model adequacy is checked. The null hypothesis is H0: the variance in vehicle city fuel economy can be explained by randomization alone. The alternative hypothesis is Ha: the variance in vehicle city fuel economy can be explained by something other than randomization (one or more of these 5 factors).

What is the rationale for this design?

The Taguchi method is a structured approach for determining the best combination of inputs to produce a product or service. Through a Taguchi design it is possible to find the most optimized combination using only a small number of runs. The aim of this design is to find the combination of factor levels that gives the best city fuel economy.

3. (Statistical) Analysis

Exploratory Data Analysis

#Histogram of response variable
hist(vehicles$cty, xlab="City Fuel Economy (mpg)")

[Figure: histogram of city fuel economy (cty)]

#Boxplot
boxplot(cty~year, data=vehicles,xlab='year',ylab='city fuel economy',main='Made year-city fuel economy boxplot')

[Figure: boxplot of city fuel economy by year]

boxplot(cty~drive, data=vehicles,xlab='drive',ylab='city fuel economy',main='Drive train-city fuel economy boxplot')

[Figure: boxplot of city fuel economy by drive train]

boxplot(cty~class, data=vehicles,xlab='size class',ylab='city fuel economy',main='Size class-city fuel economy boxplot')

[Figure: boxplot of city fuel economy by size class]

boxplot(cty~fuel, data=vehicles, xlab='fuel type',ylab='city fuel economy',main='Fuel type-city fuel economy boxplot')

[Figure: boxplot of city fuel economy by fuel type]

boxplot(cty~cyl, data=vehicles, xlab='number of cylinders',ylab='city fuel economy',main='Number of cylinders-city fuel economy boxplot')

[Figure: boxplot of city fuel economy by number of cylinders]

From the plots above, the influence of most factors is not obvious, except for fuel type and number of cylinders.

ANOVA for raw data

#Create linear model and conduct ANOVA
model1=lm(cty~year+drive+class+fuel+cyl,data=vehicles)
anova(model1)
## Analysis of Variance Table
## 
## Response: cty
##              Df Sum Sq Mean Sq F value Pr(>F)    
## year          2  21887   10943    1462 <2e-16 ***
## drive         2  46327   23164    3095 <2e-16 ***
## class         2  78989   39494    5277 <2e-16 ***
## fuel          3  54626   18209    2433 <2e-16 ***
## cyl           2 249119  124560   16644 <2e-16 ***
## Residuals 33372 249742       7                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The ANOVA result differs from the impression given by the boxplots: all 5 factors return p-values below 2e-16. With more than 33,000 observations, even modest differences between levels are highly significant, although they are hard to see in the boxplots. From the ANOVA result we reject the null hypothesis in favor of the alternative, namely that the variance of city fuel economy can be explained by something other than randomization.
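Because every p-value is tiny at this sample size, the share of the total sum of squares attributed to each term can be a more informative summary. The sketch below only re-uses the Sum Sq column of the table above; note that anova() reports sequential sums of squares, so the shares depend on the order of the terms in the model.

```r
# Sums of squares copied from the ANOVA table above
ss <- c(year = 21887, drive = 46327, class = 78989,
        fuel = 54626, cyl = 249119, Residuals = 249742)

# Share of the total sum of squares attributed to each term
# (sequential sums of squares, so the shares depend on term order)
share <- ss / sum(ss)
print(round(share, 3))
```

On these numbers cyl accounts for roughly 36% of the total and year for about 3%, with about 36% left in the residuals.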

Taguchi design

#Loading packages
library(qualityTools)
## Warning: package 'qualityTools' was built under R version 3.1.2
library(DoE.base)
## Loading required package: grid
## Loading required package: conf.design
## 
## Attaching package: 'DoE.base'
## 
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## 
## The following object is masked from 'package:graphics':
## 
##     plot.design
#Create orthogonal array for Taguchi design
oa = oa.design(factor.names=c("year","drive","class","fuel","cyl"), nlevels=c(3,3,3,4,3), columns="min3")
#With columns='min3', aliasing between main effects and 2-factor interactions is kept to a minimal degree
oa
##    year drive class fuel cyl
## 1     1     1     2    3   2
## 2     1     3     1    4   2
## 3     1     3     2    3   1
## 4     1     3     3    3   3
## 5     3     1     1    2   1
## 6     3     2     2    3   3
## 7     2     1     3    1   2
## 8     3     3     2    2   2
## 9     3     1     2    1   3
## 10    3     2     1    1   2
## 11    2     3     1    1   3
## 12    2     1     1    3   3
## 13    2     1     3    3   1
## 14    1     3     2    1   2
## 15    1     1     1    2   3
## 16    3     1     3    4   3
## 17    2     3     2    4   3
## 18    1     2     2    2   1
## 19    2     2     3    3   2
## 20    3     3     1    3   2
## 21    3     2     3    4   2
## 22    2     2     2    1   1
## 23    2     1     2    4   2
## 24    1     1     3    2   2
## 25    2     2     1    2   2
## 26    1     2     3    1   3
## 27    2     3     3    2   1
## 28    1     1     1    1   1
## 29    2     2     2    2   3
## 30    2     3     1    4   1
## 31    3     2     1    3   1
## 32    1     2     1    4   3
## 33    3     1     2    4   1
## 34    3     3     3    2   3
## 35    1     2     3    4   1
## 36    3     3     3    1   1
## class=design, type= oa

There are several functions for creating an orthogonal array for a Taguchi design, including param.design, oa.design and taguchiDesign. At first I wanted to use taguchiDesign; however, it can only create certain types of array, and an array with four three-level factors and one four-level factor is not included.

Pick up data according to orthogonal array

#Create new dataset according to orthogonal array 
vehicles1 = merge(oa, vehicles, by=c("year","drive","class","fuel","cyl"), all=FALSE)
head(vehicles1)
##   year drive class fuel cyl    id       make        model           trans
## 1    1     1     1    1   1 11739 Mitsubishi       Mirage    Manual 5-spd
## 2    1     1     1    1   1  6484     Nissan Sentra Coupe    Manual 5-spd
## 3    1     1     1    1   1 15022 Oldsmobile        Alero Automatic 4-spd
## 4    1     1     1    1   1  6526      Honda        Civic    Manual 4-spd
## 5    1     1     1    1   1  1958  Chevrolet     Cavalier Automatic 3-spd
## 6    1     1     1    1   1 29090 Mitsubishi       Mirage    Manual 5-spd
##   displ hwy cty
## 1   1.8  30  23
## 2   1.6  33  24
## 3   2.4  28  19
## 4   1.5  33  28
## 5   2.0  26  20
## 6   1.8  27  22
tail(vehicles1)
##      year drive class fuel cyl    id     make                    model
## 6357    3     3     2    2   2 33852 Cadillac                  XTS AWD
## 6358    3     3     2    2   2 28701     Audi         A6 Avant quattro
## 6359    3     3     2    2   2 33491      BMW             X1 xDrive35i
## 6360    3     3     2    2   2 33096      BMW             740Li xDrive
## 6361    3     3     2    2   2 33091      BMW 535i xDrive Gran Turismo
## 6362    3     3     2    2   2 34355  Porsche               Panamera 4
##               trans displ hwy cty
## 6357 Automatic (S6)   3.6  24  16
## 6358 Automatic (S6)   3.0  26  18
## 6359 Automatic (S6)   3.0  27  18
## 6360 Automatic (S8)   3.0  28  19
## 6361 Automatic (S8)   3.0  26  18
## 6362    Auto(AM-S7)   3.6  27  18

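Then remove rows that share the same factor combination, keeping only one row per combination. Only 12 of the 36 runs in the orthogonal array have matching vehicles in the dataset.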

#Create unique combination
removal=unique(vehicles1[,1:5])
removal
##      year drive class fuel cyl
## 1       1     1     1    1   1
## 4089    1     1     1    2   3
## 4603    1     1     3    2   2
## 4611    1     3     2    1   2
## 5223    2     3     1    1   3
## 5561    2     3     1    4   1
## 5563    2     3     2    4   3
## 5565    3     1     1    2   1
## 5982    3     1     2    1   3
## 6231    3     1     3    4   3
## 6233    3     2     1    1   2
## 6336    3     3     2    2   2
rownames(removal)
##  [1] "1"    "4089" "4603" "4611" "5223" "5561" "5563" "5565" "5982" "6231"
## [11] "6233" "6336"
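Only 12 distinct combinations survive the merge, so most of the 36 designed runs have no matching vehicle. A minimal sketch of how matched and unmatched runs could be listed, using toy data frames (runs, obs and their columns A, B, y are illustrative names, not objects from this analysis):

```r
# Toy orthogonal-array runs and toy observations (illustrative only)
runs <- data.frame(A = c("1", "1", "2", "2"),
                   B = c("1", "2", "1", "2"))
obs <- data.frame(A = c("1", "1", "2"),
                  B = c("1", "1", "2"),
                  y = c(10, 12, 15))

# Key each row by its factor combination
run_key <- paste(runs$A, runs$B, sep = "-")
obs_key <- paste(obs$A, obs$B, sep = "-")

matched <- runs[run_key %in% obs_key, ]       # runs with at least one observation
unmatched <- runs[!(run_key %in% obs_key), ]  # runs with no observation at all

print(nrow(matched))
print(unmatched)
```

Listing the unmatched runs makes it clear which factor levels are missing from the reduced design before any model is fitted.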

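Create a new array from the selected row names. Because fuel level 3 does not appear and drive level 2 appears only once, this reduced array is no longer balanced.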

#Take cty from the first row of each unique combination
newcty=vehicles1$cty[as.numeric(rownames(removal))]
newoa=cbind(removal,newcty)
newoa
##      year drive class fuel cyl newcty
## 1       1     1     1    1   1     23
## 4089    1     1     1    2   3     12
## 4603    1     1     3    2   2     15
## 4611    1     3     2    1   2     15
## 5223    2     3     1    1   3     12
## 5561    2     3     1    4   1     19
## 5563    2     3     2    4   3      8
## 5565    3     1     1    2   1     25
## 5982    3     1     2    1   3     12
## 6231    3     1     3    4   3     11
## 6233    3     2     1    1   2     16
## 6336    3     3     2    2   2     19

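The signal-to-noise (S/N) ratio is calculated as below. Because a higher city fuel economy is better, the larger-the-better formula is used: S/N = -10*log10((1/n)*sum(1/Yi^2)), where Yi are the n observations of the response for a run. With one observation per run (n = 1) this reduces to -10*log10(1/Y^2).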

#Calculate S/N
SN=-10*log10(1/newoa$newcty^2)
SN
##  [1] 27.23 21.58 23.52 23.52 21.58 25.58 18.06 27.96 21.58 20.83 24.08
## [12] 25.58
#Find optimized combination
optimal=which(SN==max(SN))
optimal
## [1] 8
newoa[optimal,]
##      year drive class fuel cyl newcty
## 5565    3     1     1    2   1     25

According to the result above, the highest S/N occurs at the 8th combination (row 5565), which is therefore the optimized one. It represents vehicles made in 2010 to 2015 with 2-wheel (front- or rear-wheel) drive, in a commonly used size class, running on premium gasoline, with 2-4 cylinders.
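A common complement in Taguchi analysis is a response table: the mean S/N at each level of each factor. The sketch below is not part of the original analysis; it rebuilds the 12-run table printed above and uses a hypothetical helper, sn_larger, that also accepts several replicates per run. Since the reduced array is unbalanced, the level means are confounded and should be read with caution.

```r
# Reduced design copied from the newoa table above
newoa <- data.frame(
  year   = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3),
  drive  = c(1, 1, 1, 3, 3, 3, 3, 1, 1, 1, 2, 3),
  class  = c(1, 1, 3, 2, 1, 1, 2, 1, 2, 3, 1, 2),
  fuel   = c(1, 2, 2, 1, 1, 4, 4, 2, 1, 4, 1, 2),
  cyl    = c(1, 3, 2, 2, 3, 1, 3, 1, 3, 3, 2, 2),
  newcty = c(23, 12, 15, 15, 12, 19, 8, 25, 12, 11, 16, 19)
)

# Larger-the-better S/N ratio for one or more replicates (hypothetical helper)
sn_larger <- function(y) -10 * log10(mean(1 / y^2))

# One observation per run here, so this matches -10*log10(1/Y^2)
newoa$SN <- sapply(newoa$newcty, sn_larger)

# Response table: mean S/N at each level of each factor
factors <- c("year", "drive", "class", "fuel", "cyl")
response_table <- lapply(factors, function(f) tapply(newoa$SN, newoa[[f]], mean))
names(response_table) <- factors
print(response_table)
```

With a single observation per run, S/N is a monotone function of cty, so the run with the highest S/N is simply the run with the highest cty.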

ANOVA to test the main effect of each factor in the Taguchi design.

model2=lm(newcty~year+drive+class+fuel+cyl, data=newoa)
anova(model2)
## Analysis of Variance Table
## 
## Response: newcty
##           Df Sum Sq Mean Sq F value Pr(>F)  
## year       2   27.0    13.5   13.48  0.189  
## drive      2    1.1     0.6    0.56  0.687  
## class      2  131.3    65.7   65.66  0.087 .
## fuel       2    0.8     0.4    0.40  0.747  
## cyl        2  123.7    61.9   61.86  0.090 .
## Residuals  1    1.0     1.0                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

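No factor returns a p-value smaller than 0.05. Class (p = 0.087) and number of cylinders (p = 0.090) are significant only at the 0.10 level, and year (p = 0.189), drive and fuel are not significant. With just 12 runs and 1 residual degree of freedom the test has very little power, which may explain why this does not correspond to the ANOVA result from the raw data.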
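The degrees of freedom explain why even large F values are not significant here. With 12 runs and five 2-df terms, only 1 degree of freedom is left for the residuals. The sketch below uses base R's qf() and pf() with the F value for class taken from the table above.

```r
# Degrees of freedom in the reduced design
n_runs <- 12
df_terms <- c(year = 2, drive = 2, class = 2, fuel = 2, cyl = 2)
df_resid <- n_runs - 1 - sum(df_terms)

# 5% critical value for a 2-df term tested against the residual
f_crit <- qf(0.95, df1 = 2, df2 = df_resid)

# p-value for the observed F statistic of 'class' in the table above
p_class <- pf(65.66, df1 = 2, df2 = df_resid, lower.tail = FALSE)

print(c(df_resid = df_resid, f_crit = f_crit, p_class = p_class))
```

An F statistic would need to exceed about 199.5 to be significant at the 0.05 level, which is far above the 65.66 observed for class.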

Parameter estimation

#shapiro test
shapiro.test(newoa$newcty)
## 
##  Shapiro-Wilk normality test
## 
## data:  newoa$newcty
## W = 0.9464, p-value = 0.5849

The Shapiro-Wilk test returns a p-value of 0.5849, so its null hypothesis cannot be rejected; there is no evidence that the response variable departs from a normal distribution.

Diagnostics/model adequacy checking

#qqplot
qqnorm(residuals(model2))
qqline(residuals(model2))

[Figure: normal Q-Q plot of residuals of model2]

#fitted-residual plot
plot(fitted(model2),residuals(model2))

[Figure: residuals versus fitted values of model2]

The Q-Q plot of the residuals follows the Q-Q line closely, especially in the middle part, which suggests the normality assumption for the new model is reasonable. Note, however, that the model has only 1 residual degree of freedom, so these diagnostics carry limited information.

The residuals-versus-fitted plot also looks good, with most points at or near zero, indicating that the model explains most of the variance of cty in the 12 selected runs.

4. References to the literature

None

5. Appendices

Note:

1. In this analysis, the optimized combination also agrees with common sense, so the result is practically plausible, although the ANOVA of the reduced design does not show significance at the 0.05 level.

2. According to the result, the Taguchi method can be a good guide. However, the ANOVA of the Taguchi design returns a different result from the ANOVA of the raw data. The way levels were categorized and the way duplicate combinations were removed when the new dataset was obtained from the orthogonal array may both influence the result. After all, a perfectly fitted model will not appear under practical conditions.

A summary of, or pointer to, the raw data

http://www.fueleconomy.gov/feg/download.shtml