Recipe 10: Taguchi Design for Vehicles Dataset

Design of Experiments

ISYE 6020

Trevor Manzanares

Rensselaer Polytechnic Institute

12/8/14

The dataset under analysis contains EPA fuel-economy data on vehicles. This experiment seeks to determine which factors affect highway fuel efficiency (hwy, in miles per gallon).

rm(list = ls())        # start from an empty workspace
library(fueleconomy)   # provides the 'vehicles' data frame

#view first few lines
head(vehicles)
##      id       make               model year                       class
## 1 27550 AM General   DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 2 28426 AM General   DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 3 27549 AM General    FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 4 28425 AM General    FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 5  1032 AM General Post Office DJ5 2WD 1985 Special Purpose Vehicle 2WD
## 6  1033 AM General Post Office DJ8 2WD 1985 Special Purpose Vehicle 2WD
##             trans            drive cyl displ    fuel hwy cty
## 1 Automatic 3-spd    2-Wheel Drive   4   2.5 Regular  17  18
## 2 Automatic 3-spd    2-Wheel Drive   4   2.5 Regular  17  18
## 3 Automatic 3-spd    2-Wheel Drive   6   4.2 Regular  13  13
## 4 Automatic 3-spd    2-Wheel Drive   6   4.2 Regular  13  13
## 5 Automatic 3-spd Rear-Wheel Drive   4   2.5 Regular  17  16
## 6 Automatic 3-spd Rear-Wheel Drive   6   4.2 Regular  13  13
#observe the structure of the data
str(vehicles)
## Classes 'tbl_df', 'tbl' and 'data.frame':    33442 obs. of  12 variables:
##  $ id   : int  27550 28426 27549 28425 1032 1033 3347 13309 13310 13311 ...
##  $ make : chr  "AM General" "AM General" "AM General" "AM General" ...
##  $ model: chr  "DJ Po Vehicle 2WD" "DJ Po Vehicle 2WD" "FJ8c Post Office" "FJ8c Post Office" ...
##  $ year : int  1984 1984 1984 1984 1985 1985 1987 1997 1997 1997 ...
##  $ class: chr  "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" ...
##  $ trans: chr  "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" ...
##  $ drive: chr  "2-Wheel Drive" "2-Wheel Drive" "2-Wheel Drive" "2-Wheel Drive" ...
##  $ cyl  : int  4 4 6 6 4 6 6 4 4 6 ...
##  $ displ: num  2.5 2.5 4.2 4.2 2.5 4.2 3.8 2.2 2.2 3 ...
##  $ fuel : chr  "Regular" "Regular" "Regular" "Regular" ...
##  $ hwy  : int  17 17 13 13 17 13 21 26 28 26 ...
##  $ cty  : int  18 18 13 13 16 13 14 20 22 18 ...
#33442 observations of 12 variables

The data are tabulated in 12 columns: vehicle id, make, model, year, class, transmission, drive, number of cylinders, engine displacement (litres), fuel type, and highway and city mileage (miles per gallon).

summary(vehicles)
##        id            make              model                year     
##  Min.   :    1   Length:33442       Length:33442       Min.   :1984  
##  1st Qu.: 8361   Class :character   Class :character   1st Qu.:1991  
##  Median :16724   Mode  :character   Mode  :character   Median :1999  
##  Mean   :17038                                         Mean   :1999  
##  3rd Qu.:25265                                         3rd Qu.:2008  
##  Max.   :34932                                         Max.   :2015  
##                                                                      
##     class              trans              drive                cyl       
##  Length:33442       Length:33442       Length:33442       Min.   : 2.00  
##  Class :character   Class :character   Class :character   1st Qu.: 4.00  
##  Mode  :character   Mode  :character   Mode  :character   Median : 6.00  
##                                                           Mean   : 5.77  
##                                                           3rd Qu.: 6.00  
##                                                           Max.   :16.00  
##                                                           NA's   :58     
##      displ          fuel                hwy             cty       
##  Min.   :0.00   Length:33442       Min.   :  9.0   Min.   :  6.0  
##  1st Qu.:2.30   Class :character   1st Qu.: 19.0   1st Qu.: 15.0  
##  Median :3.00   Mode  :character   Median : 23.0   Median : 17.0  
##  Mean   :3.35                      Mean   : 23.6   Mean   : 17.5  
##  3rd Qu.:4.30                      3rd Qu.: 27.0   3rd Qu.: 20.0  
##  Max.   :8.40                      Max.   :109.0   Max.   :138.0  
##  NA's   :57

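Randomization is not applicable because the data are observational: they were collected by the Environmental Protection Agency (EPA) for vehicles from 1984 to 2015 and contain the variables listed above. The data are nearly complete; the only missing values are in cyl (58) and displ (57), and rows with missing values are dropped by lm() when the models are fitted.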

The null hypothesis is that each of the 5 predictor variables has no effect on the response, highway gas mileage (hwy). Analysis of variance (ANOVA) will be used to study only the main effects of year (32 levels), class (34 levels), drive (7 levels), cyl (9 levels), and displacement (64 levels) on the response. To fit a Taguchi orthogonal array, each of these variables was condensed to 3 levels, except “year”, which was condensed to 5 levels.

For year, vehicles were categorized as follows: 1984:1990 - “1”; 1991:1996 - “2”; 1997:2002 - “3”; 2003:2008 - “4”; 2009:2015 - “5”.

For vehicle class, vehicles were categorized subjectively as “small”, “midsize”, and “fullsize”, coded “1”, “2”, and “3”, respectively. For drive, vehicles were categorized as “2WD”, “4WD”, and “AWD”, coded “1”, “2”, and “3”, respectively.

For number of cylinders, vehicles with 2 or 3 cylinders were coded “1”, vehicles with 4 to 6 cylinders (inclusive) were coded “2”, and vehicles with 8 to 16 cylinders (inclusive) were coded “3”. For engine displacement, engines of 0 to 2.3 litres (inclusive) were coded “1”, engines larger than 2.3 but smaller than 4.3 litres were coded “2”, and engines of 4.3 litres or more were coded “3”. The displacement cut points of 2.3 and 4.3 litres are the first and third quartiles (25th and 75th percentiles) shown in the summary above.

“Year” was condensed to 5 levels instead of 3 because, with only 3 levels, the design was an L18 orthogonal array: too few of its level combinations had matching rows in the vehicles dataset, and the resulting model was essentially a perfect fit, which made the ANOVA unreliable.
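The binning rules above can also be written with base R's cut() and ifelse() instead of a sequence of subset assignments. This is a minimal sketch, not the code used in this recipe: the helpers recode_year(), recode_cyl(), and recode_displ() are hypothetical, the input vectors are made-up boundary values rather than rows of vehicles, and recode_cyl() assumes there are no 7-cylinder engines (none occur in this dataset).

```r
# Hypothetical helpers reproducing the binning rules described above.
recode_year <- function(year) {
  # (-Inf,1990] -> "1", (1990,1996] -> "2", (1996,2002] -> "3",
  # (2002,2008] -> "4", (2008,Inf] -> "5"
  as.character(cut(year, breaks = c(-Inf, 1990, 1996, 2002, 2008, Inf),
                   labels = c("1", "2", "3", "4", "5")))
}

recode_cyl <- function(cyl) {
  # 2-3 cylinders -> "1", 4-6 -> "2", 8-16 -> "3"; NA stays NA
  ifelse(cyl < 4, "1", ifelse(cyl <= 6, "2", "3"))
}

recode_displ <- function(displ) {
  # 0 to 2.3 litres (inclusive) -> "1", (2.3, 4.3) -> "2", 4.3 or more -> "3"
  ifelse(displ <= 2.3, "1", ifelse(displ < 4.3, "2", "3"))
}

# Made-up boundary values to check the cut points
years  <- c(1984, 1990, 1991, 2002, 2003, 2009, 2015)
cyls   <- c(2, 3, 4, 6, 8, 16, NA)
displs <- c(0, 2.3, 2.4, 4.2, 4.3, 8.4, NA)

print(recode_year(years))
print(recode_cyl(cyls))
print(recode_displ(displs))
```

Because cut() uses right-closed intervals by default, each year break is the last year of its group; displacement needs a mix of closed and open bounds, which nested ifelse() expresses directly.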

This dataset contains replicates but not repeated measures. Many vehicles share the same make and model (and, after recoding, the same combination of factor levels), which provides replication. There are no repeated measures because each record describes one vehicle with a single fixed set of attributes and one measured highway mileage.

Blocking will not be utilized here for simplicity’s sake and because the objective is to focus on the design of the experiment.

Condense variables of interest into 3 levels apiece (5 levels for year):

vehicles$year[ as.numeric(vehicles$year)>= 1984 &  as.numeric(vehicles$year) <= 1990 ] = "1"
vehicles$year[ as.numeric(vehicles$year)>= 1991 &  as.numeric(vehicles$year) <= 1996 ] = "2"
vehicles$year[ as.numeric(vehicles$year)>= 1997 &  as.numeric(vehicles$year) <= 2002 ] = "3"
vehicles$year[ as.numeric(vehicles$year)>= 2003 &  as.numeric(vehicles$year) <= 2008 ] = "4"
vehicles$year[ as.numeric(vehicles$year)>= 2009] = "5"
unique(vehicles$year)
## [1] "1" "3" "2" "4" "5"
unique(vehicles$class)
##  [1] "Special Purpose Vehicle 2WD"       
##  [2] "Midsize Cars"                      
##  [3] "Subcompact Cars"                   
##  [4] "Compact Cars"                      
##  [5] "Sport Utility Vehicle - 4WD"       
##  [6] "Small Sport Utility Vehicle 2WD"   
##  [7] "Small Sport Utility Vehicle 4WD"   
##  [8] "Two Seaters"                       
##  [9] "Sport Utility Vehicle - 2WD"       
## [10] "Special Purpose Vehicles"          
## [11] "Special Purpose Vehicle 4WD"       
## [12] "Small Station Wagons"              
## [13] "Minicompact Cars"                  
## [14] "Midsize-Large Station Wagons"      
## [15] "Midsize Station Wagons"            
## [16] "Large Cars"                        
## [17] "Standard Sport Utility Vehicle 4WD"
## [18] "Standard Sport Utility Vehicle 2WD"
## [19] "Minivan - 4WD"                     
## [20] "Minivan - 2WD"                     
## [21] "Vans"                              
## [22] "Vans, Cargo Type"                  
## [23] "Vans, Passenger Type"              
## [24] "Standard Pickup Trucks 2WD"        
## [25] "Standard Pickup Trucks"            
## [26] "Standard Pickup Trucks/2wd"        
## [27] "Small Pickup Trucks 2WD"           
## [28] "Standard Pickup Trucks 4WD"        
## [29] "Small Pickup Trucks 4WD"           
## [30] "Small Pickup Trucks"               
## [31] "Vans Passenger"                    
## [32] "Special Purpose Vehicle"           
## [33] "Special Purpose Vehicles/2wd"      
## [34] "Special Purpose Vehicles/4wd"
# class "1" = small cars
vehicles$class[vehicles$class %in% c("Compact Cars", "Subcompact Cars", "Two Seaters", "Small Station Wagons", "Minicompact Cars")] = "1"

# class "2" = midsize (as assigned here): midsize and large cars, midsize wagons, minivans, passenger vans, small pickups, special purpose vehicles
vehicles$class[vehicles$class %in% c("Special Purpose Vehicle 2WD", "Midsize Cars", "Special Purpose Vehicles", "Special Purpose Vehicle 4WD", "Midsize-Large Station Wagons", "Midsize Station Wagons", "Large Cars", "Minivan - 4WD", "Minivan - 2WD", "Vans", "Vans, Passenger Type", "Small Pickup Trucks 2WD", "Small Pickup Trucks 4WD", "Small Pickup Trucks", "Vans Passenger", "Special Purpose Vehicle", "Special Purpose Vehicles/2wd", "Special Purpose Vehicles/4wd")] = "2"

# class "3" = fullsize (as assigned here): sport utility vehicles, cargo vans, standard pickup trucks
vehicles$class[vehicles$class %in% c("Sport Utility Vehicle - 4WD", "Small Sport Utility Vehicle 2WD", "Small Sport Utility Vehicle 4WD", "Sport Utility Vehicle - 2WD", "Standard Sport Utility Vehicle 4WD", "Standard Sport Utility Vehicle 2WD", "Vans, Cargo Type", "Standard Pickup Trucks 2WD", "Standard Pickup Trucks", "Standard Pickup Trucks/2wd", "Standard Pickup Trucks 4WD")] = "3"
unique(vehicles$class)
## [1] "2" "1" "3"
unique(vehicles$drive)
## [1] "2-Wheel Drive"              "Rear-Wheel Drive"          
## [3] "Front-Wheel Drive"          "4-Wheel or All-Wheel Drive"
## [5] "All-Wheel Drive"            "4-Wheel Drive"             
## [7] "Part-time 4-Wheel Drive"
vehicles$drive[ (vehicles$drive) == "2-Wheel Drive" |  (vehicles$drive) == "Rear-Wheel Drive" |  (vehicles$drive) == "Front-Wheel Drive"] = "1"

vehicles$drive[ (vehicles$drive) == "Part-time 4-Wheel Drive" |  (vehicles$drive) == "4-Wheel Drive"] = "2"

vehicles$drive[ (vehicles$drive) == "All-Wheel Drive" | (vehicles$drive) == "4-Wheel or All-Wheel Drive"] = "3"
unique(vehicles$drive)
## [1] "1" "3" "2"
unique(vehicles$cyl)
##  [1]  4  6  5  8 12 10 NA 16  3  2
# 2-3 cylinders -> "1"; the 58 missing cyl values stay NA and are dropped later by lm()
vehicles$cyl[ as.numeric(vehicles$cyl) >= 2 & as.numeric(vehicles$cyl) < 4] = "1"

vehicles$cyl[ as.numeric(vehicles$cyl) >= 4 & as.numeric(vehicles$cyl) <= 6] = "2"

vehicles$cyl[ as.numeric(vehicles$cyl) >= 8 & as.numeric(vehicles$cyl) <= 16] = "3"
unique(vehicles$cyl)
## [1] "2" "3" NA  "1"
unique(vehicles$displ)
##  [1] 2.5 4.2 3.8 2.2 3.0 2.3 3.2 3.5 2.0 2.4 1.5 1.6 1.8 1.7 2.7 3.7 4.7
## [18] 5.9 5.3 4.3 2.8 2.1 3.1 4.0 6.0 6.3 3.6 5.2 4.9 5.0  NA 1.9 3.4 4.4
## [35] 4.8 5.4 5.6 4.6 6.7 6.8 3.9 8.0 3.3 5.7 4.1 1.4 4.5 6.2 6.5 7.4 7.0
## [52] 2.9 1.0 1.3 1.2 6.4 6.1 2.6 8.3 8.4 5.5 5.8 0.0 1.1 6.6
summary(vehicles$displ)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##    0.00    2.30    3.00    3.35    4.30    8.40      57
vehicles$displ[as.numeric(vehicles$displ) >= 0 & as.numeric(vehicles$displ) <= 2.3] = "1"

vehicles$displ[as.numeric(vehicles$displ) > 2.3 & as.numeric(vehicles$displ) < 4.3] = "2"

vehicles$displ[as.numeric(vehicles$displ) >= 4.3] = "3"
unique(vehicles$displ)
## [1] "2" "1" "3" NA

Convert variables of interest to factors:

vehicles$year=as.factor(vehicles$year)
vehicles$class=as.factor(vehicles$class)
vehicles$drive=as.factor(vehicles$drive)
vehicles$cyl=as.factor(vehicles$cyl)
vehicles$displ=as.factor(vehicles$displ)
str(vehicles)
## Classes 'tbl_df', 'tbl' and 'data.frame':    33442 obs. of  12 variables:
##  $ id   : int  27550 28426 27549 28425 1032 1033 3347 13309 13310 13311 ...
##  $ make : chr  "AM General" "AM General" "AM General" "AM General" ...
##  $ model: chr  "DJ Po Vehicle 2WD" "DJ Po Vehicle 2WD" "FJ8c Post Office" "FJ8c Post Office" ...
##  $ year : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 3 3 3 ...
##  $ class: Factor w/ 3 levels "1","2","3": 2 2 2 2 2 2 2 1 1 1 ...
##  $ trans: chr  "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" ...
##  $ drive: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
##  $ cyl  : Factor w/ 3 levels "1","2","3": 2 2 2 2 2 2 2 2 2 2 ...
##  $ displ: Factor w/ 3 levels "1","2","3": 2 2 2 2 2 2 2 1 1 2 ...
##  $ fuel : chr  "Regular" "Regular" "Regular" "Regular" ...
##  $ hwy  : int  17 17 13 13 17 13 21 26 28 26 ...
##  $ cty  : int  18 18 13 13 16 13 14 20 22 18 ...

Box plots of highway mileage for each level of the recoded factors:

par(mfrow=c(1,1))
plot(vehicles$hwy~vehicles$year,xlab="year",ylab="Highway Miles Per Gallon")

Figure: box plot of highway MPG by year group

plot(vehicles$hwy~vehicles$class,xlab="class",ylab="Highway Miles Per Gallon")

Figure: box plot of highway MPG by vehicle class group

plot(vehicles$hwy~vehicles$drive,xlab="drive",ylab="Highway Miles Per Gallon")

Figure: box plot of highway MPG by drive group

plot(vehicles$hwy~vehicles$cyl,xlab="# of cylinders",ylab="Highway Miles Per Gallon")

Figure: box plot of highway MPG by number-of-cylinders group

plot(vehicles$hwy~vehicles$displ,xlab="engine displacement",ylab="Highway Miles Per Gallon")

Figure: box plot of highway MPG by engine displacement group

This project seeks to determine whether year, class, drive, number of cylinders, and engine displacement each have an individual (main) effect on highway gas mileage. Before applying Taguchi design methods, a linear model is fitted to the full dataset and an analysis of variance is conducted to determine whether the main effects are statistically significant. This provides a baseline for judging whether the Taguchi design changes the results.

Create Initial ANOVA model:

# formula interface: variables are looked up in 'vehicles'; rows with NA are dropped
model1 = lm(hwy ~ year + class + drive + cyl + displ, data = vehicles)
anova(model1)
## Analysis of Variance Table
## 
## Response: hwy
##              Df Sum Sq Mean Sq F value Pr(>F)    
## year          4  58426   14606    1270 <2e-16 ***
## class         2 292150  146075   12700 <2e-16 ***
## drive         2  32538   16269    1414 <2e-16 ***
## cyl           2 215337  107669    9361 <2e-16 ***
## displ         2 111917   55959    4865 <2e-16 ***
## Residuals 33371 383824      12                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

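In this model, which uses the full dataset (effectively a full factorial layout, although the data are observational and unbalanced), we reject H0 for every factor (all p < 2e-16). Variation in the response can therefore be attributed to the main effects of year, class, drive, cyl, and displ. Their percentage contributions (each factor's sum of squares divided by the total sum of squares) are 5.3%, 26.7%, 3.0%, 19.7%, and 10.2%, respectively. Because the factors are correlated in these unbalanced data, these sequential (Type I) sums of squares depend on the order in which the factors enter the model. Residual error accounts for the remaining 35% of the total sum of squares, which corresponds to a modest R^2 of about 65%.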
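The percentage contributions quoted above can be reproduced from the Sum Sq column of the ANOVA table. This is a minimal sketch that copies the printed sums of squares by hand; because print() rounds them to whole numbers, the results are approximate, but accurate to one decimal place.

```r
# Sums of squares copied from the printed anova(model1) table
# (rounded to whole numbers by print, so percentages are approximate).
ss <- c(year = 58426, class = 292150, drive = 32538,
        cyl = 215337, displ = 111917, Residuals = 383824)

# Percentage contribution = term SS / total SS * 100
pct <- 100 * ss / sum(ss)
print(round(pct, 1))

# R^2 = 1 - residual SS / total SS
r2 <- 1 - ss[["Residuals"]] / sum(ss)
print(round(r2, 3))
```

The residual share and R^2 are two views of the same quantity, which is why they add up to 100%.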

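Next we construct a Taguchi design using an orthogonal array large enough to hold all five factors. We have condensed each factor to 3 levels, except “year” (5 levels), so the orthogonal array is an L45. Forty-five is the smallest possible run count for this combination of levels: in a strength-2 orthogonal array every pair of columns must contain each combination of levels equally often, so the number of runs must be divisible by 5 × 3 = 15 and by 3 × 3 = 9, and therefore by 45. (A standard L25 array cannot be used because all of its columns have 5 levels.) The argument columns = "min3" tells oa.design() to assign the factors to columns of the array so that the number of generalized words of length 3 is minimized, whereas the default (columns = "order") assigns them in the order the columns appear. With “min3”, aliasing between main effects and 2-factor interactions is therefore kept as small as this array allows. Note that oa.design() randomizes the run order by default, so the row order of the array below will differ between sessions unless a seed is supplied.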

library(qualityTools)
## Warning: package 'qualityTools' was built under R version 3.1.2
library(DoE.base)
## Loading required package: grid
## Loading required package: conf.design
## 
## Attaching package: 'DoE.base'
## 
## The following objects are masked from 'package:stats':
## 
##     aov, lm
## 
## The following object is masked from 'package:graphics':
## 
##     plot.design
array = oa.design(factor.names=c("year","class","drive","cyl","displ"), nlevels=c(5,3,3,3,3),columns="min3")
array
##    year class drive cyl displ
## 1     1     1     3   3     2
## 2     1     2     2   2     1
## 3     4     3     1   3     2
## 4     4     2     2   3     3
## 5     2     3     2   3     2
## 6     2     1     1   3     1
## 7     2     2     1   2     2
## 8     3     1     3   3     3
## 9     3     3     3   1     2
## 10    3     3     2   2     3
## 11    5     3     3   2     3
## 12    4     1     1   2     3
## 13    5     2     3   3     2
## 14    1     3     2   2     2
## 15    2     2     1   3     3
## 16    4     3     3   1     3
## 17    4     3     1   2     1
## 18    5     1     2   2     2
## 19    2     3     2   1     3
## 20    4     2     3   2     2
## 21    4     2     3   1     1
## 22    3     1     1   2     2
## 23    3     1     2   2     1
## 24    3     2     3   3     1
## 25    2     3     3   2     1
## 26    1     1     1   1     1
## 27    1     3     1   3     3
## 28    3     2     1   1     3
## 29    5     3     2   3     1
## 30    1     2     1   1     2
## 31    4     1     2   1     2
## 32    5     1     3   1     1
## 33    5     2     1   2     1
## 34    1     1     2   1     3
## 35    4     1     2   3     1
## 36    2     1     3   1     2
## 37    3     3     1   1     1
## 38    3     2     2   3     2
## 39    1     2     3   2     3
## 40    1     3     3   3     1
## 41    2     1     3   2     3
## 42    5     2     2   1     3
## 43    2     2     2   1     1
## 44    5     1     1   3     3
## 45    5     3     1   1     2
## class=design, type= oa
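The 45-run size above follows from divisibility arguments. The sketch below computes the smallest run count that satisfies two necessary conditions for a strength-2 orthogonal array: pairwise balance, and enough runs to estimate the mean plus all main effects. The helper names gcd(), lcm(), and min_oa_runs() are hypothetical and are not DoE.base functions. The sketch does not check that an array of that size actually exists; oa.design() settles that by looking the design up in its catalogue of arrays.

```r
# Greatest common divisor and least common multiple (hypothetical helpers)
gcd <- function(a, b) if (b == 0) a else gcd(b, a %% b)
lcm <- function(a, b) a / gcd(a, b) * b

min_oa_runs <- function(levels_per_factor) {
  # 1) every pair of factors must show each level combination equally often,
  #    so the run count must be a multiple of s_i * s_j for all pairs i < j
  pairs <- combn(levels_per_factor, 2)
  step <- Reduce(lcm, pairs[1, ] * pairs[2, ])
  # 2) enough runs for the mean and all main effects: N >= 1 + sum(s_i - 1)
  lower <- 1 + sum(levels_per_factor - 1)
  # smallest multiple of 'step' that meets the lower bound
  step * ceiling(lower / step)
}

print(min_oa_runs(c(5, 3, 3, 3, 3)))  # year with 5 levels
print(min_oa_runs(c(3, 3, 3, 3, 3)))  # year with 3 levels
```

With year at 3 levels the same rule gives 18 runs, which matches the L18 array mentioned earlier.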

Here we use the orthogonal array to select only the experimental runs from the vehicles dataset that correspond with the exact row value combinations found in the orthogonal array:

newdataset = merge(array, vehicles, by=c("year","class","drive","cyl","displ"), all = FALSE)
head(newdataset)
##   year class drive cyl displ   id    make     model           trans
## 1    1     1     1   1     1 5276   Mazda      RX-7    Manual 5-spd
## 2    1     1     1   1     1 5275   Mazda      RX-7 Automatic 4-spd
## 3    1     1     1   1     1   22   Mazda      RX-7    Manual 5-spd
## 4    1     1     1   1     1 6583 Pontiac   Firefly    Manual 5-spd
## 5    1     1     1   1     1 2921   Mazda      RX-7    Manual 5-spd
## 6    1     1     1   1     1 5382     Geo Metro LSI Automatic 3-spd
##      fuel hwy cty
## 1 Regular  23  15
## 2 Regular  21  15
## 3 Regular  21  15
## 4 Regular  45  38
## 5 Regular  21  15
## 6 Regular  36  32
#remove duplicate rows with same value combinations existing in range of columns 1:5
unique = unique(newdataset[ , 1:5])
unique
##      year class drive cyl displ
## 1       1     1     1   1     1
## 92      1     2     3   2     3
## 134     1     3     1   3     3
## 603     2     2     1   2     2
## 1615    2     2     1   3     3
## 2050    2     3     3   2     1
## 2057    3     1     1   2     2
## 2641    3     1     3   3     3
## 2647    4     2     3   2     2
## 2833    4     3     1   2     1
## 2940    5     1     1   3     3
## 3318    5     1     2   2     2
## 3394    5     2     1   2     1
## 3667    5     2     3   3     2
## 3693    5     3     3   2     3
#find values of response variable corresponding to the indices of unique row values
rownames(unique)
##  [1] "1"    "92"   "134"  "603"  "1615" "2050" "2057" "2641" "2647" "2833"
## [11] "2940" "3318" "3394" "3667" "3693"
hwy = newdataset$hwy[as.integer(rownames(unique))]   # hwy of the first matching vehicle in each run
hwy
##  [1] 23 17 18 23 16 24 27 13 24 27 24 28 47 27 18
#append values of response variables to appropriate unique rows
neworthogonalarray = cbind(unique,hwy)
neworthogonalarray
##      year class drive cyl displ hwy
## 1       1     1     1   1     1  23
## 92      1     2     3   2     3  17
## 134     1     3     1   3     3  18
## 603     2     2     1   2     2  23
## 1615    2     2     1   3     3  16
## 2050    2     3     3   2     1  24
## 2057    3     1     1   2     2  27
## 2641    3     1     3   3     3  13
## 2647    4     2     3   2     2  24
## 2833    4     3     1   2     1  27
## 2940    5     1     1   3     3  24
## 3318    5     1     2   2     2  28
## 3394    5     2     1   2     1  47
## 3667    5     2     3   3     2  27
## 3693    5     3     3   2     3  18
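The merge() call above performs an inner join (all = FALSE). Any run of the array for which no vehicle has that exact combination of levels therefore disappears without warning. The toy example below uses hypothetical factors a and b and made-up hwy values to show this behaviour. It also shows one way to list the runs that were lost, a left join with all.x = TRUE.

```r
# Hypothetical toy design: three runs of two 3-level factors a and b
design <- data.frame(a = c("1", "2", "3"), b = c("1", "3", "2"))

# Hypothetical toy observations with a response column
obs <- data.frame(a   = c("1", "1", "3", "2"),
                  b   = c("1", "1", "2", "2"),
                  hwy = c(23, 21, 27, 30))

# Inner join: keeps only observations whose (a, b) combination is in the design,
# and silently drops design runs that have no match
matched <- merge(design, obs, by = c("a", "b"), all = FALSE)
print(matched)

# Left join: keeps every design run; runs without a match get hwy = NA
left <- merge(design, obs, by = c("a", "b"), all.x = TRUE)
unmatched <- left[is.na(left$hwy), c("a", "b")]
print(unmatched)
```

Applied to array and vehicles, the same all.x = TRUE check would list the runs with no matching vehicle. In this recipe only 15 of the 45 runs survive the inner join.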

Next we compute the signal-to-noise (S/N) ratio for each experimental run in the orthogonal array using a “higher is better” criterion (Taguchi's larger-the-better ratio), since higher gas mileage is desirable. With one observation per run, the ratio is $SN_i = -10\log_{10}(1/y_i^2) = 20\log_{10}(y_i)$, where $y_i$ is the response value (hwy) of the i-th run.

sn = -10*log10(1/neworthogonalarray$hwy^2)
sn
##  [1] 27.23 24.61 25.11 27.23 24.08 27.60 28.63 22.28 27.60 28.63 27.60
## [12] 28.94 33.44 28.63 25.11
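The formula above uses one observation per run. Taguchi's larger-the-better ratio is usually written for n replicate observations as $SN = -10\log_{10}\left(\frac{1}{n}\sum_{j=1}^{n} 1/y_j^2\right)$, which reduces to the single-observation form when n = 1. The sketch below checks the single-value case against the values printed above. It then shows how several matching vehicles could be pooled into one S/N value instead of keeping only the first vehicle. The helper sn_larger_better() is hypothetical, and this pooling is an illustration rather than the analysis performed in this recipe.

```r
# Larger-the-better S/N ratio for one run with n >= 1 observations:
#   SN = -10 * log10( mean(1 / y^2) )
# With a single observation this reduces to 20 * log10(y).
sn_larger_better <- function(y) {
  stopifnot(all(y > 0))            # the formula requires positive responses
  -10 * log10(mean(1 / y^2))
}

# One observation per run, as in the analysis: run 1 (23 mpg) and run 13 (47 mpg)
print(round(sn_larger_better(23), 2))
print(round(sn_larger_better(47), 2))

# Several observations per run: the six hwy values printed by head(newdataset)
# for the first run (only the first 6 of that run's matching vehicles)
first_six <- c(23, 21, 21, 45, 21, 36)
print(round(sn_larger_better(first_six), 2))
```

Pooling the six values gives a different S/N (about 27.8) from the single first vehicle (about 27.2), so the choice of which vehicle represents a run can move the ranking.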

Then we find the “optimal” factor assignments by choosing the experimental run from the orthogonal array with the highest Signal to Noise ratio.

index = which(sn == max(sn))   # run(s) with the largest S/N ratio
index
## [1] 13
neworthogonalarray[index, ]
##      year class drive cyl displ hwy
## 3394    5     2     1   2     1  47

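The highest signal-to-noise ratio corresponds to run 13 of the reduced orthogonal array (47 mpg), whose factor levels are year “5”, class “2”, drive “1”, cyl “2”, and displ “1”. Under the coding above, the best observed run is a vehicle manufactured in 2009 or later, in the midsize class, with 2-wheel drive, 4 to 6 cylinders, and an engine displacement of 2.3 litres or less. Because this choice rests on a single vehicle (the first match for that run), it identifies the best run in this sample rather than showing that these settings are required for high highway mileage.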
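Choosing the single run with the largest S/N ratio is one option. A common alternative in Taguchi analysis is a response table, which averages the S/N ratio over all runs at each level of a factor and picks the level with the highest mean. The sketch below applies that alternative to the 15 runs printed in neworthogonalarray. The data are copied from that printout, and the object names are hypothetical.

```r
# The 15 matched runs and their hwy values, copied from neworthogonalarray
runs <- data.frame(
  year  = c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 5, 5),
  class = c(1, 2, 3, 2, 2, 3, 1, 1, 2, 3, 1, 1, 2, 2, 3),
  drive = c(1, 3, 1, 1, 1, 3, 1, 3, 3, 1, 1, 2, 1, 3, 3),
  cyl   = c(1, 2, 3, 2, 3, 2, 2, 3, 2, 2, 3, 2, 2, 3, 2),
  displ = c(1, 3, 3, 2, 3, 1, 2, 3, 2, 1, 3, 2, 1, 2, 3),
  hwy   = c(23, 17, 18, 23, 16, 24, 27, 13, 24, 27, 24, 28, 47, 27, 18)
)
runs$sn <- 20 * log10(runs$hwy)   # same as -10*log10(1/hwy^2)

factors <- c("year", "class", "drive", "cyl", "displ")

# Response table: mean S/N for every level of every factor
response <- lapply(factors, function(f) tapply(runs$sn, runs[[f]], mean))
names(response) <- factors
print(lapply(response, round, 2))

# Level with the largest mean S/N for each factor
best <- sapply(response, function(r) names(r)[which.max(r)])
print(best)
```

With these 15 runs the level-wise choice agrees with the best single run for year, class, cyl, and displ, but it picks drive level “2”. That level rests on a single run (run 12). Because the reduced array is no longer balanced, these level means are partly confounded with the other factors.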

Create Secondary ANOVA model testing main factor effects of each of the variables of interest:

model2 = lm(hwy ~ year + class + drive + cyl + displ, data = neworthogonalarray)
anova(model2)
## Analysis of Variance Table
## 
## Response: hwy
##           Df Sum Sq Mean Sq F value Pr(>F)
## year       4  243.0    60.7    3.75   0.22
## class      2   32.0    16.0    0.99   0.50
## drive      2  164.9    82.5    5.09   0.16
## cyl        2  165.0    82.5    5.09   0.16
## displ      2  241.6   120.8    7.45   0.12
## Residuals  2   32.4    16.2

In this model, based on the Taguchi L45 orthogonal array, we fail to reject H0 for “year”, “class”, “drive”, “cyl”, and “displ” (all p > 0.1). In other words, these data provide no evidence that any factor affects the response, although this does not show that the variation is due to chance alone. Only 15 of the 45 runs in the array matched at least one vehicle, and the remaining runs are no longer balanced (drive level “2” and cyl level “1” each appear only once). Fitting 13 parameters to 15 observations also leaves just 2 residual degrees of freedom, so the tests have very little power. These results differ sharply from the model fitted to the full vehicles dataset, which suggests that a Taguchi design with the chosen number of levels per factor may be too highly fractionated for this dataset. A good direction for future research would be to choose different variables and adjust the number of levels in an effort to reproduce the results of the full-data model.
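The lack of power is visible from degree-of-freedom bookkeeping. A main-effects model with these factors estimates 1 + 4 + 2 + 2 + 2 + 2 = 13 parameters. The sketch below reproduces the Residuals Df of both ANOVA tables; resid_df() is a hypothetical helper. The complete-row count for the full data (33,384) is inferred from model1's 33371 residual degrees of freedom rather than counted directly.

```r
# Residual degrees of freedom for a main-effects ANOVA with categorical factors:
#   df_resid = n - (1 + sum(levels - 1))
resid_df <- function(n, nlevels) n - (1 + sum(nlevels - 1))

levels_used <- c(year = 5, class = 3, drive = 3, cyl = 3, displ = 3)

# Reduced Taguchi array: 15 matched runs
print(resid_df(15, levels_used))

# Full data: 33,384 complete rows (33,442 minus the rows with missing cyl/displ)
print(resid_df(33384, levels_used))
```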

Test normality using the Shapiro-Wilk test:

shapiro.test(neworthogonalarray$hwy)
## 
##  Shapiro-Wilk normality test
## 
## data:  neworthogonalarray$hwy
## W = 0.8301, p-value = 0.009201

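Because p = 0.0092 < 0.05, we reject the H0 that the response variable, highway miles per gallon, is normally distributed across these 15 runs. Note that the ANOVA assumption concerns the model residuals rather than the raw response, which also contains the factor effects.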

Diagnostics and Model Adequacy Checking:

par(mfrow=c(2,2))
plot(model2)
## Warning: not plotting observations with leverage one:
##   1, 11, 12, 15
## Warning: not plotting observations with leverage one:
##   1, 11, 12, 15

Figure: diagnostic plots for model2 (residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage)

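The plotted points appear evenly dispersed in the residuals-versus-fitted plot, but this should not be read as evidence of a very good fit. With 15 observations and 13 estimated parameters, only 2 residual degrees of freedom remain. The four observations with leverage one (1, 11, 12, and 15) are fitted exactly and omitted from the plots, so the residuals carry very little information about model adequacy.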
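Both checks above have limits here. The Shapiro-Wilk test was applied to the raw response, and the diagnostic plots drop the leverage-one observations. The sketch below rebuilds the 15-run table from the printed values (object names are hypothetical) and refits the model2 ANOVA. It then confirms which observations have leverage one and runs shapiro.test() on the residuals, which is the quantity the ANOVA assumes to be normal. With only 2 residual degrees of freedom and four residuals forced to zero, this residual check is also weak evidence either way.

```r
# 15 matched runs copied from the neworthogonalarray printout (codes as factor levels)
runs <- data.frame(
  year  = factor(c(1, 1, 1, 2, 2, 2, 3, 3, 4, 4, 5, 5, 5, 5, 5)),
  class = factor(c(1, 2, 3, 2, 2, 3, 1, 1, 2, 3, 1, 1, 2, 2, 3)),
  drive = factor(c(1, 3, 1, 1, 1, 3, 1, 3, 3, 1, 1, 2, 1, 3, 3)),
  cyl   = factor(c(1, 2, 3, 2, 3, 2, 2, 3, 2, 2, 3, 2, 2, 3, 2)),
  displ = factor(c(1, 3, 3, 2, 3, 1, 2, 3, 2, 1, 3, 2, 1, 2, 3)),
  hwy   = c(23, 17, 18, 23, 16, 24, 27, 13, 24, 27, 24, 28, 47, 27, 18)
)

# Same main-effects model as model2
fit <- lm(hwy ~ year + class + drive + cyl + displ, data = runs)
print(anova(fit))

# Observations with leverage one are fitted exactly, so their residuals are zero
lev_one <- which(abs(hatvalues(fit) - 1) < 1e-6)
print(lev_one)

# Shapiro-Wilk test on the residuals rather than on the raw response
print(shapiro.test(residuals(fit)))
```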

References:

These data come from Hadley Wickham's fueleconomy R package, which packages fuel economy data compiled by the US EPA.

https://github.com/hadley/fueleconomy