The dataset under analysis includes data on vehicles from the EPA. This experiment will seek to understand which factors have an effect on highway fuel efficiency (miles per gallon).
remove(list=ls())
library("fueleconomy", lib.loc="~/R/win-library/3.1")
#view first few lines
head(vehicles)
## id make model year class
## 1 27550 AM General DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 2 28426 AM General DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 3 27549 AM General FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 4 28425 AM General FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 5 1032 AM General Post Office DJ5 2WD 1985 Special Purpose Vehicle 2WD
## 6 1033 AM General Post Office DJ8 2WD 1985 Special Purpose Vehicle 2WD
## trans drive cyl displ fuel hwy cty
## 1 Automatic 3-spd 2-Wheel Drive 4 2.5 Regular 17 18
## 2 Automatic 3-spd 2-Wheel Drive 4 2.5 Regular 17 18
## 3 Automatic 3-spd 2-Wheel Drive 6 4.2 Regular 13 13
## 4 Automatic 3-spd 2-Wheel Drive 6 4.2 Regular 13 13
## 5 Automatic 3-spd Rear-Wheel Drive 4 2.5 Regular 17 16
## 6 Automatic 3-spd Rear-Wheel Drive 6 4.2 Regular 13 13
#observe the structure of the data
str(vehicles)
## Classes 'tbl_df', 'tbl' and 'data.frame': 33442 obs. of 12 variables:
## $ id : int 27550 28426 27549 28425 1032 1033 3347 13309 13310 13311 ...
## $ make : chr "AM General" "AM General" "AM General" "AM General" ...
## $ model: chr "DJ Po Vehicle 2WD" "DJ Po Vehicle 2WD" "FJ8c Post Office" "FJ8c Post Office" ...
## $ year : int 1984 1984 1984 1984 1985 1985 1987 1997 1997 1997 ...
## $ class: chr "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" ...
## $ trans: chr "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" ...
## $ drive: chr "2-Wheel Drive" "2-Wheel Drive" "2-Wheel Drive" "2-Wheel Drive" ...
## $ cyl : int 4 4 6 6 4 6 6 4 4 6 ...
## $ displ: num 2.5 2.5 4.2 4.2 2.5 4.2 3.8 2.2 2.2 3 ...
## $ fuel : chr "Regular" "Regular" "Regular" "Regular" ...
## $ hwy : int 17 17 13 13 17 13 21 26 28 26 ...
## $ cty : int 18 18 13 13 16 13 14 20 22 18 ...
#33442 observations of 12 variables
The data are tabulated into 12 columns: vehicle id, make, model, year, class, transmission, drive, number of cylinders, engine displacement, fuel type, and highway and city mileage.
summary(vehicles)
## id make model year
## Min. : 1 Length:33442 Length:33442 Min. :1984
## 1st Qu.: 8361 Class :character Class :character 1st Qu.:1991
## Median :16724 Mode :character Mode :character Median :1999
## Mean :17038 Mean :1999
## 3rd Qu.:25265 3rd Qu.:2008
## Max. :34932 Max. :2015
##
## class trans drive cyl
## Length:33442 Length:33442 Length:33442 Min. : 2.00
## Class :character Class :character Class :character 1st Qu.: 4.00
## Mode :character Mode :character Mode :character Median : 6.00
## Mean : 5.77
## 3rd Qu.: 6.00
## Max. :16.00
## NA's :58
## displ fuel hwy cty
## Min. :0.00 Length:33442 Min. : 9.0 Min. : 6.0
## 1st Qu.:2.30 Class :character 1st Qu.: 19.0 1st Qu.: 15.0
## Median :3.00 Mode :character Median : 23.0 Median : 17.0
## Mean :3.35 Mean : 23.6 Mean : 17.5
## 3rd Qu.:4.30 3rd Qu.: 27.0 3rd Qu.: 20.0
## Max. :8.40 Max. :109.0 Max. :138.0
## NA's :57
Randomization is not applicable because the data are observational: they were compiled by the Environmental Protection Agency (EPA) for model years 1984 to 2015 and contain the variables listed above. Most vehicles have complete data; the summary shows 58 missing values for cyl and 57 for displ.
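Completeness can be checked directly rather than assumed. The following is a minimal sketch on a small made-up data frame (the `toy` object and its values are hypothetical, not taken from the vehicles data); it counts missing values per column and counts the rows with no missing values.

```r
# Hypothetical toy data standing in for a few rows of a vehicles-like table
toy <- data.frame(
  cyl   = c(4, 6, NA, 8),
  displ = c(2.5, NA, NA, 5.0),
  hwy   = c(30, 24, 90, 18)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))

# Logical vector: TRUE for rows with no missing value in any column
complete_rows <- complete.cases(toy)

print(na_counts)
print(sum(complete_rows))
```

Applied to the real data, `colSums(is.na(vehicles))` should report the same NA counts that `summary(vehicles)` lists for cyl and displ.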
The null hypothesis is that each of the 5 predictor variables has no effect on the response, highway gas mileage (hwy). Analysis of Variance (ANOVA) will be used to examine only the main effects of year (32 levels), class (34 levels), drive (7 levels), cyl (9 levels), and displacement (64 levels) on the response. To fit a Taguchi orthogonal array, each variable of interest was compressed to 3 levels, except “year”, which was compressed to 5 levels.
For year, vehicles were categorized as follows: 1984:1990 - “1”; 1991:1996 - “2”; 1997:2002 - “3”; 2003:2008 - “4”; 2009:2015 - “5”. For vehicle class, vehicles were categorized subjectively as “small”, “midsize”, or “fullsize”, coded “1”, “2”, and “3”, respectively. For vehicle drive, vehicles were categorized as “2WD”, “4WD”, or “AWD”, coded “1”, “2”, and “3”, respectively. For number of cylinders, vehicles with 2 or 3 cylinders were categorized as “1”, vehicles with 4 to 6 cylinders (inclusive) as “2”, and vehicles with 8 to 16 cylinders (inclusive) as “3”. For engine displacement, vehicles with engines of 0 to 2.3 litres (inclusive) were categorized as “1”, vehicles with engines between 2.3 and 4.3 litres (exclusive) as “2”, and vehicles with engines of 4.3 litres or more as “3”. These thresholds were chosen because 2.3 and 4.3 are the first and third quartiles (25th and 75th percentiles) of displacement.
“Year” was condensed to 5 levels instead of 3 like the other variables because, with only 3 levels for “year”, the design was an L18 orthogonal array, and too few of its rows matched level combinations present in the vehicles dataset. The resulting ANOVA was unreliable because the model was essentially a perfect fit.
For this experiment, there are replicates but not repeated measures. There are multiple observations for the same make and model, but each vehicle has a single predetermined set of static attributes, so there are no repeated measures.
Blocking will not be utilized here for simplicity’s sake and because the objective is to focus on the design of the experiment.
Condense variables of interest into 3 levels apiece (5 levels for year):
vehicles$year[ as.numeric(vehicles$year)>= 1984 & as.numeric(vehicles$year) <= 1990 ] = "1"
vehicles$year[ as.numeric(vehicles$year)>= 1991 & as.numeric(vehicles$year) <= 1996 ] = "2"
vehicles$year[ as.numeric(vehicles$year)>= 1997 & as.numeric(vehicles$year) <= 2002 ] = "3"
vehicles$year[ as.numeric(vehicles$year)>= 2003 & as.numeric(vehicles$year) <= 2008 ] = "4"
vehicles$year[ as.numeric(vehicles$year)>= 2009] = "5"
unique(vehicles$year)
## [1] "1" "3" "2" "4" "5"
unique(vehicles$class)
## [1] "Special Purpose Vehicle 2WD"
## [2] "Midsize Cars"
## [3] "Subcompact Cars"
## [4] "Compact Cars"
## [5] "Sport Utility Vehicle - 4WD"
## [6] "Small Sport Utility Vehicle 2WD"
## [7] "Small Sport Utility Vehicle 4WD"
## [8] "Two Seaters"
## [9] "Sport Utility Vehicle - 2WD"
## [10] "Special Purpose Vehicles"
## [11] "Special Purpose Vehicle 4WD"
## [12] "Small Station Wagons"
## [13] "Minicompact Cars"
## [14] "Midsize-Large Station Wagons"
## [15] "Midsize Station Wagons"
## [16] "Large Cars"
## [17] "Standard Sport Utility Vehicle 4WD"
## [18] "Standard Sport Utility Vehicle 2WD"
## [19] "Minivan - 4WD"
## [20] "Minivan - 2WD"
## [21] "Vans"
## [22] "Vans, Cargo Type"
## [23] "Vans, Passenger Type"
## [24] "Standard Pickup Trucks 2WD"
## [25] "Standard Pickup Trucks"
## [26] "Standard Pickup Trucks/2wd"
## [27] "Small Pickup Trucks 2WD"
## [28] "Standard Pickup Trucks 4WD"
## [29] "Small Pickup Trucks 4WD"
## [30] "Small Pickup Trucks"
## [31] "Vans Passenger"
## [32] "Special Purpose Vehicle"
## [33] "Special Purpose Vehicles/2wd"
## [34] "Special Purpose Vehicles/4wd"
#group the 34 original classes into small ("1"), midsize ("2"), and fullsize ("3")
small = c("Compact Cars", "Subcompact Cars", "Two Seaters", "Small Station Wagons", "Minicompact Cars")
midsize = c("Special Purpose Vehicle 2WD", "Midsize Cars", "Special Purpose Vehicles", "Special Purpose Vehicle 4WD",
  "Midsize-Large Station Wagons", "Midsize Station Wagons", "Large Cars", "Minivan - 4WD", "Minivan - 2WD",
  "Vans", "Vans, Passenger Type", "Small Pickup Trucks 2WD", "Small Pickup Trucks 4WD", "Small Pickup Trucks",
  "Vans Passenger", "Special Purpose Vehicle", "Special Purpose Vehicles/2wd", "Special Purpose Vehicles/4wd")
fullsize = c("Sport Utility Vehicle - 4WD", "Small Sport Utility Vehicle 2WD", "Small Sport Utility Vehicle 4WD",
  "Sport Utility Vehicle - 2WD", "Standard Sport Utility Vehicle 4WD", "Standard Sport Utility Vehicle 2WD",
  "Vans, Cargo Type", "Standard Pickup Trucks 2WD", "Standard Pickup Trucks", "Standard Pickup Trucks/2wd",
  "Standard Pickup Trucks 4WD")
vehicles$class[ vehicles$class %in% small ] = "1"
vehicles$class[ vehicles$class %in% midsize ] = "2"
vehicles$class[ vehicles$class %in% fullsize ] = "3"
unique(vehicles$class)
## [1] "2" "1" "3"
unique(vehicles$drive)
## [1] "2-Wheel Drive" "Rear-Wheel Drive"
## [3] "Front-Wheel Drive" "4-Wheel or All-Wheel Drive"
## [5] "All-Wheel Drive" "4-Wheel Drive"
## [7] "Part-time 4-Wheel Drive"
vehicles$drive[ (vehicles$drive) == "2-Wheel Drive" | (vehicles$drive) == "Rear-Wheel Drive" | (vehicles$drive) == "Front-Wheel Drive"] = "1"
vehicles$drive[ (vehicles$drive) == "Part-time 4-Wheel Drive" | (vehicles$drive) == "4-Wheel Drive"] = "2"
vehicles$drive[ (vehicles$drive) == "All-Wheel Drive" | (vehicles$drive) == "4-Wheel or All-Wheel Drive"] = "3"
unique(vehicles$drive)
## [1] "1" "3" "2"
unique(vehicles$cyl)
## [1] 4 6 5 8 12 10 NA 16 3 2
#vehicles with a missing cylinder count stay NA (comparing with the string "NA" never matches a missing value)
vehicles$cyl[ as.numeric(vehicles$cyl) >= 2 & as.numeric(vehicles$cyl) < 4 ] = "1"
vehicles$cyl[ as.numeric(vehicles$cyl) >= 4 & as.numeric(vehicles$cyl) <= 6] = "2"
vehicles$cyl[ as.numeric(vehicles$cyl) >= 8 & as.numeric(vehicles$cyl) <= 16] = "3"
unique(vehicles$cyl)
## [1] "2" "3" NA "1"
unique(vehicles$displ)
## [1] 2.5 4.2 3.8 2.2 3.0 2.3 3.2 3.5 2.0 2.4 1.5 1.6 1.8 1.7 2.7 3.7 4.7
## [18] 5.9 5.3 4.3 2.8 2.1 3.1 4.0 6.0 6.3 3.6 5.2 4.9 5.0 NA 1.9 3.4 4.4
## [35] 4.8 5.4 5.6 4.6 6.7 6.8 3.9 8.0 3.3 5.7 4.1 1.4 4.5 6.2 6.5 7.4 7.0
## [52] 2.9 1.0 1.3 1.2 6.4 6.1 2.6 8.3 8.4 5.5 5.8 0.0 1.1 6.6
summary(vehicles$displ)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.00 2.30 3.00 3.35 4.30 8.40 57
vehicles$displ[as.numeric(vehicles$displ) >= 0 & as.numeric(vehicles$displ) <= 2.3] = "1"
vehicles$displ[as.numeric(vehicles$displ) > 2.3 & as.numeric(vehicles$displ) < 4.3] = "2"
vehicles$displ[as.numeric(vehicles$displ) >= 4.3] = "3"
unique(vehicles$displ)
## [1] "2" "1" "3" NA
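The in-place recoding above relies on each column silently changing from numeric to character after its first assignment. As an alternative sketch (not the code used for the results in this report), the same bins could be built without overwriting the original values. The helper names `bin_year` and `bin_displ` are hypothetical.

```r
# Bin model year into the five periods used in this analysis.
# cut() uses right-closed intervals by default: (1983,1990], (1990,1996], ...
bin_year <- function(year) {
  as.character(cut(year,
                   breaks = c(1983, 1990, 1996, 2002, 2008, 2015),
                   labels = 1:5))
}

# Bin displacement: <= 2.3 is "1", strictly between 2.3 and 4.3 is "2",
# and >= 4.3 is "3". Missing values stay NA.
bin_displ <- function(displ) {
  ifelse(displ <= 2.3, "1", ifelse(displ < 4.3, "2", "3"))
}

year_bins  <- bin_year(c(1984, 1990, 1991, 2002, 2003, 2009, 2015))
displ_bins <- bin_displ(c(0, 2.3, 2.4, 4.2, 4.3, 8.4, NA))

print(year_bins)
print(displ_bins)
```

Writing the result to a new column (for example a hypothetical `vehicles$year_bin`) would keep the raw values available for later checks.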
Convert variables of interest to factors:
vehicles$year=as.factor(vehicles$year)
vehicles$class=as.factor(vehicles$class)
vehicles$drive=as.factor(vehicles$drive)
vehicles$cyl=as.factor(vehicles$cyl)
vehicles$displ=as.factor(vehicles$displ)
str(vehicles)
## Classes 'tbl_df', 'tbl' and 'data.frame': 33442 obs. of 12 variables:
## $ id : int 27550 28426 27549 28425 1032 1033 3347 13309 13310 13311 ...
## $ make : chr "AM General" "AM General" "AM General" "AM General" ...
## $ model: chr "DJ Po Vehicle 2WD" "DJ Po Vehicle 2WD" "FJ8c Post Office" "FJ8c Post Office" ...
## $ year : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 3 3 3 ...
## $ class: Factor w/ 3 levels "1","2","3": 2 2 2 2 2 2 2 1 1 1 ...
## $ trans: chr "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" ...
## $ drive: Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
## $ cyl : Factor w/ 3 levels "1","2","3": 2 2 2 2 2 2 2 2 2 2 ...
## $ displ: Factor w/ 3 levels "1","2","3": 2 2 2 2 2 2 2 1 1 2 ...
## $ fuel : chr "Regular" "Regular" "Regular" "Regular" ...
## $ hwy : int 17 17 13 13 17 13 21 26 28 26 ...
## $ cty : int 18 18 13 13 16 13 14 20 22 18 ...
Boxplots showing trends in highway mileage across the levels of each condensed factor:
par(mfrow=c(1,1))
plot(vehicles$hwy~vehicles$year,xlab="year",ylab="Highway Miles Per Gallon")
plot(vehicles$hwy~vehicles$class,xlab="class",ylab="Highway Miles Per Gallon")
plot(vehicles$hwy~vehicles$drive,xlab="drive",ylab="Highway Miles Per Gallon")
plot(vehicles$hwy~vehicles$cyl,xlab="# of cylinders",ylab="Highway Miles Per Gallon")
plot(vehicles$hwy~vehicles$displ,xlab="engine displacement",ylab="Highway Miles Per Gallon")
This project seeks to discover if year, class, drive, number of cylinders, and engine displacement may have an individual effect on highway gas mileage. Prior to using Taguchi design methods, a linear model will be designed and analysis of variance conducted to determine if the individual effects are statistically significant. This will give us a baseline to determine if the Taguchi Design significantly alters the results.
Create Initial ANOVA model:
model1=lm(vehicles$hwy~vehicles$year+vehicles$class+vehicles$drive+vehicles$cyl+vehicles$displ, data=vehicles)
anova(model1)
## Analysis of Variance Table
##
## Response: vehicles$hwy
## Df Sum Sq Mean Sq F value Pr(>F)
## vehicles$year 4 58426 14606 1270 <2e-16 ***
## vehicles$class 2 292150 146075 12700 <2e-16 ***
## vehicles$drive 2 32538 16269 1414 <2e-16 ***
## vehicles$cyl 2 215337 107669 9361 <2e-16 ***
## vehicles$displ 2 111917 55959 4865 <2e-16 ***
## Residuals 33371 383824 12
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this particular model based on a full factorial design, we reject the H0 that the factors have no effect on the response: every factor is significant (p < 2e-16). Thus, variation in the response can be attributed to the first-order effects of year, class, drive, cyl, and displ. Their percentage contributions to the total sum of squares are 5.3%, 26.7%, 3.0%, 19.7%, and 10.2%, respectively. Residual error accounts for the remaining 35%, which corresponds to an R^2 of about 65%.
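The percentage contributions can be reproduced from the sums of squares in the ANOVA table. The sketch below simply copies those values into a named vector (the object names `ss`, `pct`, and `r_squared` are illustrative) and divides each by the total.

```r
# Sums of squares copied from the ANOVA table above
ss <- c(year = 58426, class = 292150, drive = 32538,
        cyl = 215337, displ = 111917, Residuals = 383824)

# Percentage contribution of each term to the total sum of squares
pct <- round(100 * ss / sum(ss), 1)

# R^2 is the share of variation not left in the residuals
r_squared <- 1 - ss[["Residuals"]] / sum(ss)

print(pct)
print(round(r_squared, 3))
```

Note that `anova()` on an `lm` fit reports sequential sums of squares, so with unbalanced data such as these the contributions depend on the order in which the factors enter the formula.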
Next we construct a Taguchi design using an orthogonal array. Each factor has been condensed to 3 levels, with the exception of “year” (condensed to 5 levels), so the Taguchi orthogonal array is L45: a balanced array for one 5-level factor and four 3-level factors needs a number of runs divisible by both 15 (5 x 3) and 9 (3 x 3), and 45 is the smallest such number. The argument “min3” is used because optimum column allocation is not switched on by default. With the “min3” option, aliasing between main effects and 2-factor interactions is kept to a minimal degree.
library(qualityTools)
## Warning: package 'qualityTools' was built under R version 3.1.2
library(DoE.base)
## Loading required package: grid
## Loading required package: conf.design
##
## Attaching package: 'DoE.base'
##
## The following objects are masked from 'package:stats':
##
## aov, lm
##
## The following object is masked from 'package:graphics':
##
## plot.design
array = oa.design(factor.names=c("year","class","drive","cyl","displ"), nlevels=c(5,3,3,3,3),columns="min3")
array
## year class drive cyl displ
## 1 1 1 3 3 2
## 2 1 2 2 2 1
## 3 4 3 1 3 2
## 4 4 2 2 3 3
## 5 2 3 2 3 2
## 6 2 1 1 3 1
## 7 2 2 1 2 2
## 8 3 1 3 3 3
## 9 3 3 3 1 2
## 10 3 3 2 2 3
## 11 5 3 3 2 3
## 12 4 1 1 2 3
## 13 5 2 3 3 2
## 14 1 3 2 2 2
## 15 2 2 1 3 3
## 16 4 3 3 1 3
## 17 4 3 1 2 1
## 18 5 1 2 2 2
## 19 2 3 2 1 3
## 20 4 2 3 2 2
## 21 4 2 3 1 1
## 22 3 1 1 2 2
## 23 3 1 2 2 1
## 24 3 2 3 3 1
## 25 2 3 3 2 1
## 26 1 1 1 1 1
## 27 1 3 1 3 3
## 28 3 2 1 1 3
## 29 5 3 2 3 1
## 30 1 2 1 1 2
## 31 4 1 2 1 2
## 32 5 1 3 1 1
## 33 5 2 1 2 1
## 34 1 1 2 1 3
## 35 4 1 2 3 1
## 36 2 1 3 1 2
## 37 3 3 1 1 1
## 38 3 2 2 3 2
## 39 1 2 3 2 3
## 40 1 3 3 3 1
## 41 2 1 3 2 3
## 42 5 2 2 1 3
## 43 2 2 2 1 1
## 44 5 1 1 3 3
## 45 5 3 1 1 2
## class=design, type= oa
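The defining property of an orthogonal array like the one above is pairwise balance: for any two columns, every combination of levels appears equally often. The sketch below checks that property in base R on a small hand-built L9 array (four 3-level factors in 9 runs); the helper `is_balanced` and the object `l9` are hypothetical names introduced for illustration.

```r
# Build a standard L9 array: C and D are modular combinations of A and B
g  <- expand.grid(A = 0:2, B = 0:2)
l9 <- data.frame(A = g$A, B = g$B,
                 C = (g$A + g$B) %% 3,
                 D = (g$A + 2 * g$B) %% 3)

# TRUE if every pair of columns shows all level combinations equally often
is_balanced <- function(d) {
  pairs <- combn(names(d), 2, simplify = FALSE)
  all(vapply(pairs, function(p) {
    counts <- table(d[[p[1]]], d[[p[2]]])
    length(unique(as.vector(counts))) == 1
  }, logical(1)))
}

print(is_balanced(l9))        # full array
print(is_balanced(l9[-1, ]))  # same array with one run removed
```

The same check applies in principle to any data frame of factor columns, which makes it a quick way to see how much balance is lost when only some of the designed runs can be matched to observed data.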
Here we use the orthogonal array to select only the experimental runs from the vehicles dataset that correspond with the exact row value combinations found in the orthogonal array:
newdataset = merge(array, vehicles, by=c("year","class","drive","cyl","displ"), all = FALSE)
head(newdataset)
## year class drive cyl displ id make model trans
## 1 1 1 1 1 1 5276 Mazda RX-7 Manual 5-spd
## 2 1 1 1 1 1 5275 Mazda RX-7 Automatic 4-spd
## 3 1 1 1 1 1 22 Mazda RX-7 Manual 5-spd
## 4 1 1 1 1 1 6583 Pontiac Firefly Manual 5-spd
## 5 1 1 1 1 1 2921 Mazda RX-7 Manual 5-spd
## 6 1 1 1 1 1 5382 Geo Metro LSI Automatic 3-spd
## fuel hwy cty
## 1 Regular 23 15
## 2 Regular 21 15
## 3 Regular 21 15
## 4 Regular 45 38
## 5 Regular 21 15
## 6 Regular 36 32
#remove duplicate rows with same value combinations existing in range of columns 1:5
unique = unique(newdataset[ , 1:5])
unique
## year class drive cyl displ
## 1 1 1 1 1 1
## 92 1 2 3 2 3
## 134 1 3 1 3 3
## 603 2 2 1 2 2
## 1615 2 2 1 3 3
## 2050 2 3 3 2 1
## 2057 3 1 1 2 2
## 2641 3 1 3 3 3
## 2647 4 2 3 2 2
## 2833 4 3 1 2 1
## 2940 5 1 1 3 3
## 3318 5 1 2 2 2
## 3394 5 2 1 2 1
## 3667 5 2 3 3 2
## 3693 5 3 3 2 3
#find values of response variable corresponding to the indices of unique row values
rownames(unique)
## [1] "1" "92" "134" "603" "1615" "2050" "2057" "2641" "2647" "2833"
## [11] "2940" "3318" "3394" "3667" "3693"
hwy = newdataset$hwy[as.integer(rownames(unique))]
hwy
## [1] 23 17 18 23 16 24 27 13 24 27 24 28 47 27 18
#append values of response variables to appropriate unique rows
neworthogonalarray = cbind(unique,hwy)
neworthogonalarray
## year class drive cyl displ hwy
## 1 1 1 1 1 1 23
## 92 1 2 3 2 3 17
## 134 1 3 1 3 3 18
## 603 2 2 1 2 2 23
## 1615 2 2 1 3 3 16
## 2050 2 3 3 2 1 24
## 2057 3 1 1 2 2 27
## 2641 3 1 3 3 3 13
## 2647 4 2 3 2 2 24
## 2833 4 3 1 2 1 27
## 2940 5 1 1 3 3 24
## 3318 5 1 2 2 2 28
## 3394 5 2 1 2 1 47
## 3667 5 2 3 3 2 27
## 3693 5 3 3 2 3 18
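Only 15 of the 45 designed runs appear in the table above, and each one carries the response of the first matching vehicle. The sketch below shows the same matching step on made-up data (the `design` and `obs` data frames and their values are hypothetical), using `duplicated()` instead of hard-coded row indices and reporting what share of the design was matched.

```r
# Hypothetical 4-run design and a few observed units
design <- data.frame(a = c("1", "1", "2", "2"),
                     b = c("1", "2", "1", "2"))
obs <- data.frame(a = c("1", "1", "2", "1"),
                  b = c("1", "1", "1", "1"),
                  y = c(10, 12, 20, 11))

# Keep only observations whose factor levels match a designed run
matched <- merge(design, obs, by = c("a", "b"))

# Keep the first matching observation for each designed run
first_per_run <- matched[!duplicated(matched[, c("a", "b")]), ]

# Share of designed runs that were found in the data
coverage <- nrow(first_per_run) / nrow(design)

print(first_per_run)
print(coverage)
```

In this toy case half of the design is covered; for the analysis in this report the corresponding figure is 15/45.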
Next we compute the signal-to-noise (S/N) ratio for each experimental run in the orthogonal array using the “larger is better” criterion (higher gas mileage is better). With a single response per run, the ratio is SN_i = -10*log10(1/y_i^2), where y_i is the response value of the i-th run.
sn = -10*log10(1/neworthogonalarray$hwy^2)
sn
## [1] 27.23 24.61 25.11 27.23 24.08 27.60 28.63 22.28 27.60 28.63 27.60
## [12] 28.94 33.44 28.63 25.11
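The ratio computed above is the single-observation case of the general larger-the-better formula, -10*log10(mean(1/y^2)), where the mean is taken over the replicates of one run. A minimal sketch follows; the function name `sn_larger_better` and the two-replicate example are hypothetical, while the `hwy` values are copied from the table above.

```r
# Larger-the-better S/N ratio for one run; y holds that run's replicate responses
sn_larger_better <- function(y) {
  -10 * log10(mean(1 / y^2))
}

# Responses of the 15 matched runs, copied from the table above
hwy <- c(23, 17, 18, 23, 16, 24, 27, 13, 24, 27, 24, 28, 47, 27, 18)

# One observation per run, so each ratio reduces to 20 * log10(y)
sn <- vapply(hwy, sn_larger_better, numeric(1))
print(round(sn, 2))

# A run with two hypothetical replicates
sn_two <- sn_larger_better(c(20, 30))
print(sn_two)
```

Because the ratio is a monotone function of y when there is one observation per run, ranking runs by S/N here is equivalent to ranking them by hwy; the ratio only adds information when replicates are available.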
Then we find the “optimal” factor assignments by choosing the experimental run from the orthogonal array with the highest Signal to Noise ratio.
index = which.max(sn)
index
## [1] 13
neworthogonalarray[index, ]
## year class drive cyl displ hwy
## 3394 5 2 1 2 1 47
The highest signal-to-noise ratio corresponds to run 13 of the matched orthogonal array, with factor levels year = 5, class = 2, drive = 1, cyl = 2, and displ = 1. Among the sampled runs, the highest highway gas mileage therefore belongs to a vehicle manufactured in 2009 or later that is midsize, 2-wheel drive, has 4 to 6 cylinders, and has an engine of 2.3 litres or less.
Create Secondary ANOVA model testing main factor effects of each of the variables of interest:
model2=lm(neworthogonalarray$hwy~neworthogonalarray$year+neworthogonalarray$class+neworthogonalarray$drive+neworthogonalarray$cyl+neworthogonalarray$displ, data=neworthogonalarray)
anova(model2)
## Analysis of Variance Table
##
## Response: neworthogonalarray$hwy
## Df Sum Sq Mean Sq F value Pr(>F)
## neworthogonalarray$year 4 243.0 60.7 3.75 0.22
## neworthogonalarray$class 2 32.0 16.0 0.99 0.50
## neworthogonalarray$drive 2 164.9 82.5 5.09 0.16
## neworthogonalarray$cyl 2 165.0 82.5 5.09 0.16
## neworthogonalarray$displ 2 241.6 120.8 7.45 0.12
## Residuals 2 32.4 16.2
In this particular model based on a Taguchi L45 orthogonal array, of which only 15 of the 45 runs were found in the vehicles dataset, we fail to reject the H0 that “year”, “class”, “drive”, “cyl”, and “displ” have no effect on the response (all p-values exceed 0.05). Thus, variation in the response cannot be distinguished from random variation. With only 2 residual degrees of freedom, the test has very little power. These results are much different from those of the original full factorial design of the vehicles dataset, which leads us to conclude that a Taguchi design with the chosen number of levels per factor may be too highly fractionated for this experiment. A good direction for future research would be to choose different variables and manipulate the number of levels in an effort to achieve results similar to those of the full factorial design.
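The residual degrees of freedom in the table above follow from simple counting, and they explain why even fairly large F values are not significant. The sketch below redoes that arithmetic; the object names are illustrative, and the run count and F value are taken from the output above.

```r
n_runs   <- 15   # matched runs used in the second model
n_levels <- c(year = 5, class = 3, drive = 3, cyl = 3, displ = 3)

# Each factor uses (levels - 1) degrees of freedom for its main effect
model_df <- sum(n_levels - 1)
resid_df <- n_runs - 1 - model_df

# Critical F value at the 5% level for a 2-df factor tested against the residuals
f_crit <- qf(0.95, df1 = 2, df2 = resid_df)

# p-value for the observed F = 5.09 reported for drive and cyl
p_drive <- pf(5.09, df1 = 2, df2 = resid_df, lower.tail = FALSE)

print(c(model_df = model_df, resid_df = resid_df))
print(f_crit)
print(round(p_drive, 2))
```

With 2 residual degrees of freedom, a 3-level factor needs an F value of about 19 to reach significance at the 5% level, far above any F value in the table.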
Test normality of the response using the Shapiro-Wilk test:
shapiro.test(neworthogonalarray$hwy)
##
## Shapiro-Wilk normality test
##
## data: neworthogonalarray$hwy
## W = 0.8301, p-value = 0.009201
We reject the H0 that the response variable, highway miles per gallon, is normally distributed (W = 0.8301, p = 0.0092).
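The normality assumption in ANOVA concerns the model residuals rather than the raw response, so the same test is often applied to the residuals of the fitted model. The sketch below illustrates this on the built-in `PlantGrowth` dataset, which is used purely as a stand-in and has no connection to the vehicles data.

```r
# PlantGrowth ships with base R (datasets package); used here only as a stand-in
fit <- lm(weight ~ group, data = PlantGrowth)

# Test the residuals, not the raw response, for normality
res_test <- shapiro.test(residuals(fit))

print(res_test$p.value)
```

A small p-value would indicate a departure from normality in the residuals; with very few residual degrees of freedom, as in the second model here, such a test carries little information either way.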
Diagnostics and Model Adequacy Checking:
par(mfrow=c(2,2))
plot(model2)
## Warning: not plotting observations with leverage one:
## 1, 11, 12, 15
## Warning: not plotting observations with leverage one:
## 1, 11, 12, 15
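The points appear evenly dispersed in the plot of residuals against fitted values, but this is weak evidence of a good fit: the model has only 2 residual degrees of freedom, and four observations (1, 11, 12, 15) have leverage one, meaning they are fitted exactly and are omitted from the plots.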
References:
The data come from the fueleconomy R package by Hadley Wickham, which packages vehicle fuel economy data compiled by the US EPA.