This experiment uses a Taguchi Design to study the effect of 5 factors/independent variables on a vehicle’s city mileage.It is a 2 phase experiment in which we first analyze the effect of the factors on the response using multivariate ANOVA. In the second phase we generate an orthogonal array based on the new dataset that we create (of limited number of levels) and then perform ANOVA on the new dataset/array.
The Research Question: Study the effect of 5 factors (Transmission,drive,year,fuel and make) on the city mileage of a vehicle (response variable). In order to study this problem we can formulate the null hypothesis as:
‘The variance in the type of transmission, the type of drive, the year of manufacture,the fuel type and the make of vehicle (taken separately-main effect) has no significant impact on the variance in city mileage’
The alternate hypothesis is therefore stated as:‘The variance in the 5 factors have a significant effect on the variance in city mileage’
Extracting data using the ‘fueleconomy’ package and reading the .csv file.In this section we perform an exploratory data analysis on the original data set. Also, we conduct this experiment on a subset of the original data set since the dataset consists of around 36000 records.
#vehicles<- (read.csv(file.choose(), header=T))
install.packages("fueleconomy", repos='http://cran.us.r-project.org')
## Installing package into 'C:/Users/uzma/Documents/R/win-library/3.1'
## (as 'lib' is unspecified)
## package 'fueleconomy' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\uzma\AppData\Local\Temp\Rtmpuuk2E2\downloaded_packages
library("fueleconomy", lib.loc="C:\\Users\\uzma\\Documents\\R\\win-library\\3.1")
## Warning: package 'fueleconomy' was built under R version 3.1.2
x<-vehicles
head(x)
## id make model year class
## 1 27550 AM General DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 2 28426 AM General DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 3 27549 AM General FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 4 28425 AM General FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 5 1032 AM General Post Office DJ5 2WD 1985 Special Purpose Vehicle 2WD
## 6 1033 AM General Post Office DJ8 2WD 1985 Special Purpose Vehicle 2WD
## trans drive cyl displ fuel hwy cty
## 1 Automatic 3-spd 2-Wheel Drive 4 2.5 Regular 17 18
## 2 Automatic 3-spd 2-Wheel Drive 4 2.5 Regular 17 18
## 3 Automatic 3-spd 2-Wheel Drive 6 4.2 Regular 13 13
## 4 Automatic 3-spd 2-Wheel Drive 6 4.2 Regular 13 13
## 5 Automatic 3-spd Rear-Wheel Drive 4 2.5 Regular 17 16
## 6 Automatic 3-spd Rear-Wheel Drive 6 4.2 Regular 13 13
# Taking a subset of the original data due to limitations of system requirements
L<-x[1:2000,]
Here is a sumary of the data set used:
str(L)
## Classes 'tbl_df', 'tbl' and 'data.frame': 2000 obs. of 12 variables:
## $ id : int 27550 28426 27549 28425 1032 1033 3347 13309 13310 13311 ...
## $ make : chr "AM General" "AM General" "AM General" "AM General" ...
## $ model: chr "DJ Po Vehicle 2WD" "DJ Po Vehicle 2WD" "FJ8c Post Office" "FJ8c Post Office" ...
## $ year : int 1984 1984 1984 1984 1985 1985 1987 1997 1997 1997 ...
## $ class: chr "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" ...
## $ trans: chr "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" "Automatic 3-spd" ...
## $ drive: chr "2-Wheel Drive" "2-Wheel Drive" "2-Wheel Drive" "2-Wheel Drive" ...
## $ cyl : int 4 4 6 6 4 6 6 4 4 6 ...
## $ displ: num 2.5 2.5 4.2 4.2 2.5 4.2 3.8 2.2 2.2 3 ...
## $ fuel : chr "Regular" "Regular" "Regular" "Regular" ...
## $ hwy : int 17 17 13 13 17 13 21 26 28 26 ...
## $ cty : int 18 18 13 13 16 13 14 20 22 18 ...
head(L)
## id make model year class
## 1 27550 AM General DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 2 28426 AM General DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 3 27549 AM General FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 4 28425 AM General FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 5 1032 AM General Post Office DJ5 2WD 1985 Special Purpose Vehicle 2WD
## 6 1033 AM General Post Office DJ8 2WD 1985 Special Purpose Vehicle 2WD
## trans drive cyl displ fuel hwy cty
## 1 Automatic 3-spd 2-Wheel Drive 4 2.5 Regular 17 18
## 2 Automatic 3-spd 2-Wheel Drive 4 2.5 Regular 17 18
## 3 Automatic 3-spd 2-Wheel Drive 6 4.2 Regular 13 13
## 4 Automatic 3-spd 2-Wheel Drive 6 4.2 Regular 13 13
## 5 Automatic 3-spd Rear-Wheel Drive 4 2.5 Regular 17 16
## 6 Automatic 3-spd Rear-Wheel Drive 6 4.2 Regular 13 13
tail(L)
## id make model year class trans
## 1995 32968 BMW 550i 2013 Midsize Cars Automatic (S8)
## 1996 32969 BMW 550i 2013 Midsize Cars Manual 6-spd
## 1997 34174 BMW 550i 2014 Midsize Cars Automatic (S8)
## 1998 30677 BMW 550i Gran Turismo 2011 Large Cars Automatic (S8)
## 1999 32096 BMW 550i Gran Turismo 2012 Large Cars Automatic (S8)
## 2000 33092 BMW 550i Gran Turismo 2013 Large Cars Automatic (S8)
## drive cyl displ fuel hwy cty
## 1995 Rear-Wheel Drive 8 4.4 Premium 25 17
## 1996 Rear-Wheel Drive 8 4.4 Premium 22 15
## 1997 Rear-Wheel Drive 8 4.4 Premium 25 17
## 1998 Rear-Wheel Drive 8 4.4 Premium 22 15
## 1999 Rear-Wheel Drive 8 4.4 Premium 22 15
## 2000 Rear-Wheel Drive 8 4.4 Premium 24 16
summary(L)
## id make model year
## Min. : 1 Length:2000 Length:2000 Min. :1984
## 1st Qu.:14051 Class :character Class :character 1st Qu.:1997
## Median :20839 Mode :character Mode :character Median :2005
## Mean :20538 Mean :2003
## 3rd Qu.:29300 3rd Qu.:2010
## Max. :34931 Max. :2015
##
## class trans drive cyl
## Length:2000 Length:2000 Length:2000 Min. : 4.000
## Class :character Class :character Class :character 1st Qu.: 5.000
## Mode :character Mode :character Mode :character Median : 6.000
## Mean : 6.015
## 3rd Qu.: 6.000
## Max. :12.000
## NA's :2
## displ fuel hwy cty
## Min. :1.500 Length:2000 Min. : 9.00 Min. : 7.0
## 1st Qu.:2.400 Class :character 1st Qu.:22.00 1st Qu.:16.0
## Median :3.000 Mode :character Median :25.00 Median :17.0
## Mean :3.015 Mean :24.67 Mean :17.2
## 3rd Qu.:3.200 3rd Qu.:27.00 3rd Qu.:18.0
## Max. :6.300 Max. :62.00 Max. :62.0
## NA's :2
The response variable i.e. city mileage is a continuous variable. City mileage (cty) measured in MPG (miles per gallon) is the only response variable.
The given data set is the fuel economy data from the EPA. It ranges from the year 1985 to 2015 for various car models and each row has a detailed specification of the vehicle. There are 74 variables and 34632 records in the original data set.
We can safely assume the data to be randomized because it is a result of vehicle testing done at the Environmental Protection Agency National Vehicle and Fuel Emissions Laboratory. Since almost every vehicle needs to clear this testing therefore data is a true representative of the population as a whole.
If we are able to determine the within group and between group variability on the performance of a vehicle based on certain factors, then that can be a useful study for performance enhancement.There are no repeated measures since we are using data from a study conducted by EPA.
#Create a histogram
par(mfrow=c(1,1))
hist(L$cty, xlim=c(0,100), ylab = "City Mileage (MPG)")
# Boxplots for the 3 factors
boxplot(cty~trans,data=L,xlab="Transmission", ylab="city mileage (MPG)")
title("Boxplot 'Transmission'")
boxplot(cty~drive,data=L,xlab="Type of drive", ylab="city mileage (MPG)")
title("Boxplot 'Drive type'")
boxplot(cty~fuel,data=L,xlab="Fuel Type", ylab="city mileage (MPG)")
title("Boxplot 'Fuel Type'")
boxplot(cty~make,data=L,xlab="Make", ylab="city mileage (MPG)")
title("Boxplot 'Make'")
boxplot(cty~year,data=L,xlab="Year", ylab="city mileage (MPG)")
title("Boxplot 'Year'")
Since the factors/independent variables in the experiment have multiple levels, in this section we reduce the number of levels used in this experiment to the following number:
Year: 5 levels Make: 3 levels Transmission: 4 levels Drive: 3 levels Fuel: 3 levels
Because of this complex structure of this experiment we will have an L50 Orthogonal array in the Taguchi design.Next we define the new created variables as factors in order to test the null hypothesis.
L$year[ as.numeric(L$year)>= 1984 & as.numeric(L$year) <= 1988 ] = "1"
L$year[ as.numeric(L$year)>= 1989 & as.numeric(L$year) <= 1993 ] = "2"
L$year[ as.numeric(L$year)>= 1994 & as.numeric(L$year) <= 1998 ] = "3"
L$year[ as.numeric(L$year)>= 1999 & as.numeric(L$year) <= 2003 ] = "4"
L$year[ as.numeric(L$year)>= 2004 & as.numeric(L$year) <= 2016 ] = "5"
unique(L$year)
## [1] "1" "3" "4" "5" "2"
unique(L$make)
## [1] "AM General" "ASC Incorporated"
## [3] "Acura" "Alfa Romeo"
## [5] "American Motors Corporation" "Aston Martin"
## [7] "Audi" "Aurora Cars Ltd"
## [9] "Autokraft Limited" "Azure Dynamics"
## [11] "BMW"
L$make[ as.character(L$make) == "AM General" | as.character(L$make) == "Acura" | as.character(L$make ) == "American Motors Corporation" ] = "1"
L$make[as.character(L$make) == "Audi" | as.character(L$make) == "Autokraft Limited" | as.character(L$make) == "BMW" | as.character(L$make) == "ASC Incorporated" ] = "2"
L$make[as.character(L$make) == "Alfa Romeo" | as.character(L$make) == "Aston Martin" | as.character(L$make) == "Aurora Cars Ltd" | as.character(L$make) == "Azure Dynamics" ] = "3"
unique(L$make)
## [1] "1" "2" "3"
unique(L$trans)
## [1] "Automatic 3-spd" "Automatic 4-spd"
## [3] "Manual 5-spd" "Automatic (S5)"
## [5] "Manual 6-spd" "Automatic 5-spd"
## [7] "Auto(AV-S7)" "Automatic (S6)"
## [9] "Automatic (S4)" "Automatic (S7)"
## [11] "Automatic 6-spd" "Manual 4-spd"
## [13] "Auto(AM6)" "Auto(AM7)"
## [15] "Auto(AM-S6)" "Automatic (variable gear ratios)"
## [17] "Automatic (AV)" "Auto(AV-S8)"
## [19] "Automatic (S8)" "Automatic (AM6)"
## [21] "Auto(AM-S7)" "Automatic (A1)"
## [23] "Automatic 8-spd"
L$trans[as.character(L$trans) == "Automatic 3-spd" | as.character(L$trans) == "Manual 5-spd" | as.character(L$trans) == "Manual 6-spd" | as.character(L$trans) == "Auto(AV-S7)" | as.character(L$trans) == "Automatic (S4)" | as.character(L$trans) == "Automatic 6-spd"] = "1"
L$trans[as.character(L$trans) == "Auto(AM6)" | as.character(L$trans) == "Auto(AM-S6)" | as.character(L$trans) == "Automatic (AV)" | as.character(L$trans) == "Automatic (S8)" | as.character(L$trans) == "Auto(AM-S7)" | as.character(L$trans) == "Automatic 8-spd"] = "2"
L$trans[as.character(L$trans) == "Automatic 4-spd" | as.character(L$trans) == "Automatic (S5)" | as.character(L$trans) == "Automatic 5-spd" | as.character(L$trans) == "Automatic (S6)" | as.character(L$trans) == "Automatic (S7)" | as.character(L$trans) == "Manual 4-spd"] = "3"
L$trans[as.character(L$trans) == "Auto(AM7)" | as.character(L$trans) == "Automatic (variable gear ratios)" | as.character(L$trans) == "Auto(AV-S8)" | as.character(L$trans) == "Automatic (AM6)" | as.character(L$trans) == "Automatic (A1)"] = "4"
unique(L$trans)
## [1] "1" "3" "2" "4"
unique(L$drive)
## [1] "2-Wheel Drive" "Rear-Wheel Drive"
## [3] "Front-Wheel Drive" "4-Wheel or All-Wheel Drive"
## [5] "All-Wheel Drive"
L$drive[as.character(L$drive) == "2-Wheel Drive" | as.character(L$drive) == "Front-Wheel Drive" ] ="1"
L$drive[as.character(L$drive) == "All-Wheel Drive"] = "2"
L$drive[as.character(L$drive) == "Rear-Wheel Drive" | as.character(L$drive) == "4-Wheel or All-Wheel Drive"] = "3"
unique(L$drive)
## [1] "1" "3" "2"
unique(L$fuel)
## [1] "Regular" "Premium" "Diesel" "Premium or E85"
## [5] "Electricity"
L$fuel[ as.character(L$fuel) == "Regular" | as.character(L$fuel) == "Premium"] = "1"
L$fuel[ as.character(L$fuel) == "Diesel" | as.character(L$fuel) == "Premium or E85"] = "2"
L$fuel[ as.character(L$fuel) == "Electricity"] = "3"
unique(L$fuel)
## [1] "1" "2" "3"
L$year=as.factor(L$year)
L$make=as.factor(L$make)
L$drive=as.factor(L$drive)
L$trans=as.factor(L$trans)
L$fuel=as.factor(L$fuel)
str(L)
## Classes 'tbl_df', 'tbl' and 'data.frame': 2000 obs. of 12 variables:
## $ id : int 27550 28426 27549 28425 1032 1033 3347 13309 13310 13311 ...
## $ make : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 2 1 1 1 ...
## $ model: chr "DJ Po Vehicle 2WD" "DJ Po Vehicle 2WD" "FJ8c Post Office" "FJ8c Post Office" ...
## $ year : Factor w/ 5 levels "1","2","3","4",..: 1 1 1 1 1 1 1 3 3 3 ...
## $ class: chr "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" "Special Purpose Vehicle 2WD" ...
## $ trans: Factor w/ 4 levels "1","2","3","4": 1 1 1 1 1 1 3 3 1 3 ...
## $ drive: Factor w/ 3 levels "1","2","3": 1 1 1 1 3 3 3 1 1 1 ...
## $ cyl : int 4 4 6 6 4 6 6 4 4 6 ...
## $ displ: num 2.5 2.5 4.2 4.2 2.5 4.2 3.8 2.2 2.2 3 ...
## $ fuel : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
## $ hwy : int 17 17 13 13 17 13 21 26 28 26 ...
## $ cty : int 18 18 13 13 16 13 14 20 22 18 ...
This model tests the initial dataframe for our earlier defined null hypothesis.We are trying to find the main effect of all the 5 factors on the response. As is evident from the results, there is a significant main effect of all the factors on the city mileage (measured in MPG) and so we rejuct the null hypothesis.
model1=lm(L$cty~L$year+L$make+L$drive+L$trans+L$fuel, data=L)
anova(model1)
## Analysis of Variance Table
##
## Response: L$cty
## Df Sum Sq Mean Sq F value Pr(>F)
## L$year 4 816.8 204.20 37.929 < 2.2e-16 ***
## L$make 2 2794.1 1397.07 259.503 < 2.2e-16 ***
## L$drive 2 1998.4 999.22 185.603 < 2.2e-16 ***
## L$trans 3 949.6 316.54 58.798 < 2.2e-16 ***
## L$fuel 2 4699.5 2349.77 436.466 < 2.2e-16 ***
## Residuals 1986 10691.9 5.38
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here we construct a Taguchi design in order to analyze the main effects of the factors under study. Basically this design strategy considers the interaction effects as confounding and therefore the results show the main effects of the factors under study.The goal of this experiment is to make a process less variable (more robust) in the face of variation over which we have little or no control. Essentially,instead of testing all possible combinations of variables, we can test all pairs of combinations in some more efficient way. We choose our orthogonal array design based on the factor with the highest number of levels i.e. year (with 5 levels). Next we try to find the experiment with the maximum signal to noise ratio and consider it as the ‘optimal’ factor assignment. Based on this test, we can find the ‘best’ values for each factor of interest. Further, we perform ANOVA on the new array (with each unique record mapped to the corresponding response variable value). The results of ANOVA indicate that the effect of make and transmission type used in vehicles have a somewhat lower impact on the city mileage of the vehicle while the impact of year, drive and fuel is highly significant. Based on these findings we again reject the null hypothesis.
library(qualityTools)
## Warning: package 'qualityTools' was built under R version 3.1.2
library(DoE.base)
## Warning: package 'DoE.base' was built under R version 3.1.2
## Loading required package: grid
## Loading required package: conf.design
## Warning: package 'conf.design' was built under R version 3.1.2
##
## Attaching package: 'DoE.base'
##
## The following objects are masked from 'package:stats':
##
## aov, lm
##
## The following object is masked from 'package:graphics':
##
## plot.design
array = oa.design(factor.names=c("year","make","drive","trans","fuel"), nlevels=c(5,3,3,4,3),columns="min3")
## creating full factorial with 540 runs ...
array
## year make drive trans fuel
## 1 1 3 2 2 2
## 2 5 1 3 1 3
## 3 1 2 1 1 2
## 4 3 2 1 2 3
## 5 5 2 2 1 1
## 6 5 1 2 2 3
## 7 3 2 3 4 1
## 8 3 3 1 3 1
## 9 2 2 2 1 3
## 10 3 2 2 3 1
## 11 5 1 2 4 1
## 12 3 2 2 1 3
## 13 2 2 3 2 1
## 14 3 2 1 1 3
## 15 4 3 1 4 1
## 16 1 3 1 1 1
## 17 3 2 1 3 1
## 18 2 2 1 2 1
## 19 1 1 2 2 2
## 20 2 1 1 4 3
## 21 3 2 1 4 3
## 22 5 3 2 4 2
## 23 5 3 3 2 1
## 24 2 1 3 1 3
## 25 5 2 1 4 2
## 26 2 2 2 4 2
## 27 3 2 2 3 3
## 28 2 1 1 1 2
## 29 4 2 1 3 3
## 30 4 2 2 1 2
## 31 5 2 2 4 3
## 32 5 3 1 3 3
## 33 3 3 3 3 3
## 34 1 1 3 2 3
## 35 1 1 1 4 2
## 36 2 3 3 4 1
## 37 1 1 1 2 2
## 38 5 2 3 4 2
## 39 4 2 1 4 1
## 40 1 2 2 1 1
## 41 2 2 1 4 2
## 42 5 3 2 3 1
## 43 1 1 3 3 2
## 44 1 2 1 4 3
## 45 1 1 2 1 3
## 46 2 3 1 2 2
## 47 4 1 2 4 3
## 48 5 2 3 1 1
## 49 1 1 2 3 1
## 50 5 3 3 4 2
## 51 1 2 2 3 3
## 52 3 2 1 1 1
## 53 4 1 2 2 2
## 54 5 2 3 3 1
## 55 5 1 1 4 3
## 56 4 3 2 3 1
## 57 5 3 3 4 3
## 58 1 2 1 1 1
## 59 4 3 2 1 1
## 60 3 3 3 3 1
## 61 4 3 3 1 3
## 62 3 2 2 1 2
## 63 4 3 1 2 3
## 64 5 2 1 2 1
## 65 3 2 2 4 1
## 66 1 2 3 2 1
## 67 4 1 2 4 2
## 68 5 3 1 4 2
## 69 5 3 1 1 3
## 70 1 2 3 1 2
## 71 3 1 2 3 3
## 72 5 3 2 1 1
## 73 5 2 3 2 2
## 74 3 2 2 4 2
## 75 1 3 3 4 3
## 76 3 3 2 4 1
## 77 1 3 3 3 1
## 78 3 2 2 2 1
## 79 3 3 3 2 3
## 80 2 1 1 4 1
## 81 2 2 2 2 3
## 82 1 1 3 1 2
## 83 3 2 3 4 3
## 84 4 1 2 1 3
## 85 1 2 2 3 1
## 86 3 3 2 1 1
## 87 4 3 1 3 1
## 88 1 2 3 4 1
## 89 3 1 1 2 2
## 90 4 1 2 1 2
## 91 3 3 2 4 3
## 92 5 3 2 2 2
## 93 2 2 1 2 2
## 94 1 3 1 3 2
## 95 2 2 3 2 3
## 96 2 3 3 4 2
## 97 4 1 3 4 2
## 98 1 2 2 4 2
## 99 4 3 2 2 3
## 100 2 2 2 3 2
## 101 3 3 1 3 2
## 102 5 3 1 1 2
## 103 4 2 1 2 2
## 104 4 2 2 2 2
## 105 2 2 3 4 3
## 106 4 2 3 2 2
## 107 4 2 1 2 3
## 108 4 3 1 4 3
## 109 5 2 1 1 2
## 110 5 2 2 4 2
## 111 5 2 3 1 2
## 112 4 1 1 2 2
## 113 5 2 2 2 3
## 114 3 3 3 1 3
## 115 5 2 3 1 3
## 116 5 1 1 3 3
## 117 4 2 3 4 3
## 118 1 1 2 2 3
## 119 2 1 1 3 1
## 120 3 1 3 4 1
## 121 5 3 1 4 1
## 122 4 2 1 2 1
## 123 1 1 2 1 2
## 124 4 3 1 1 3
## 125 1 1 1 1 1
## 126 3 1 3 3 3
## 127 3 3 3 4 2
## 128 1 3 1 4 2
## 129 4 2 2 4 1
## 130 5 2 3 2 3
## 131 2 3 1 3 2
## 132 2 3 2 3 3
## 133 4 2 3 2 3
## 134 1 2 3 2 2
## 135 3 2 1 3 3
## 136 4 1 2 1 1
## 137 2 2 2 2 2
## 138 4 2 1 4 2
## 139 3 1 2 4 3
## 140 4 2 2 3 3
## 141 4 1 3 1 2
## 142 3 2 3 1 2
## 143 4 3 3 3 2
## 144 3 1 2 2 1
## 145 4 1 1 4 2
## 146 1 3 1 2 2
## 147 4 1 3 3 2
## 148 4 3 2 4 3
## 149 4 1 2 4 1
## 150 3 2 3 3 3
## 151 1 3 1 3 1
## 152 2 1 3 4 2
## 153 2 1 3 3 3
## 154 3 1 3 1 1
## 155 2 3 3 1 3
## 156 1 3 2 3 3
## 157 3 1 1 1 2
## 158 4 3 2 2 2
## 159 3 3 3 1 2
## 160 5 1 3 4 2
## 161 2 1 3 4 1
## 162 1 3 2 1 3
## 163 2 2 2 4 1
## 164 4 2 1 1 3
## 165 2 3 1 2 1
## 166 1 1 2 4 3
## 167 1 3 2 2 3
## 168 2 1 2 4 3
## 169 1 3 3 1 1
## 170 3 1 2 1 1
## 171 1 3 2 1 2
## 172 3 3 1 4 1
## 173 3 3 1 4 2
## 174 5 2 2 3 3
## 175 3 1 1 1 1
## 176 5 3 2 2 3
## 177 3 3 1 2 3
## 178 1 3 3 4 1
## 179 5 2 2 1 2
## 180 4 1 2 3 3
## 181 5 1 3 4 1
## 182 4 3 1 3 3
## 183 1 2 2 4 1
## 184 1 2 2 2 1
## 185 5 3 3 1 3
## 186 3 2 3 2 1
## 187 4 1 1 1 2
## 188 3 2 1 1 2
## 189 4 1 1 3 2
## 190 3 1 1 4 1
## 191 4 3 2 4 2
## 192 2 1 2 4 2
## 193 3 1 2 3 1
## 194 3 2 3 3 2
## 195 4 2 3 1 1
## 196 2 1 2 2 3
## 197 5 1 2 4 2
## 198 4 3 1 2 2
## 199 3 3 3 4 3
## 200 5 1 3 3 2
## 201 4 1 2 2 3
## 202 2 1 2 1 3
## 203 1 3 2 3 2
## 204 1 3 1 3 3
## 205 3 1 2 1 2
## 206 2 3 1 3 3
## 207 4 3 2 2 1
## 208 3 3 2 1 3
## 209 4 3 3 4 2
## 210 5 1 3 1 2
## 211 4 1 3 3 3
## 212 5 2 3 3 2
## 213 3 1 3 1 3
## 214 3 1 3 4 3
## 215 4 3 2 3 3
## 216 4 2 1 3 1
## 217 1 2 1 4 2
## 218 1 1 3 4 2
## 219 5 3 3 3 3
## 220 2 1 2 1 2
## 221 4 1 1 3 1
## 222 4 2 3 4 2
## 223 5 1 3 2 3
## 224 5 1 1 3 1
## 225 3 3 1 1 3
## 226 5 3 1 2 3
## 227 1 3 3 2 1
## 228 3 3 3 1 1
## 229 2 3 3 4 3
## 230 5 1 3 3 1
## 231 4 1 2 3 1
## 232 3 1 2 2 3
## 233 1 3 3 2 2
## 234 5 3 2 2 1
## 235 2 3 3 2 3
## 236 3 3 3 3 2
## 237 5 2 1 2 3
## 238 4 2 3 1 3
## 239 2 1 3 4 3
## 240 1 3 1 2 3
## 241 2 3 2 1 2
## 242 2 2 1 4 3
## 243 2 1 2 3 1
## 244 5 1 1 2 2
## 245 2 3 3 2 2
## 246 2 2 3 3 2
## 247 1 3 2 4 3
## 248 5 1 2 3 2
## 249 1 1 1 3 1
## 250 1 1 3 4 3
## 251 3 1 1 3 1
## 252 1 2 3 3 1
## 253 2 2 3 4 1
## 254 2 1 2 2 1
## 255 3 3 3 2 1
## 256 5 2 1 3 3
## 257 3 3 1 4 3
## 258 2 1 3 2 1
## 259 3 1 2 4 1
## 260 5 3 3 1 2
## 261 4 1 3 4 1
## 262 5 3 1 1 1
## 263 5 2 3 4 3
## 264 5 2 1 3 1
## 265 1 2 1 3 1
## 266 5 2 2 4 1
## 267 5 2 1 1 1
## 268 3 2 1 2 1
## 269 2 3 1 1 1
## 270 4 2 3 3 1
## 271 2 1 1 1 1
## 272 2 2 1 1 2
## 273 4 1 1 2 3
## 274 5 2 2 3 2
## 275 1 2 1 2 2
## 276 3 2 2 2 3
## 277 3 1 2 1 3
## 278 5 1 2 2 2
## 279 2 1 3 2 2
## 280 2 1 2 2 2
## 281 2 1 1 3 3
## 282 2 3 1 2 3
## 283 5 1 3 4 3
## 284 2 3 1 4 1
## 285 5 1 1 1 2
## 286 5 2 3 3 3
## 287 2 1 1 3 2
## 288 2 2 1 3 3
## 289 3 1 1 4 2
## 290 2 3 1 1 3
## 291 4 3 1 3 2
## 292 2 2 2 1 2
## 293 1 2 2 1 2
## 294 5 3 3 2 3
## 295 2 3 3 3 2
## 296 2 3 1 4 2
## 297 1 2 1 2 3
## 298 1 1 3 3 1
## 299 2 1 2 1 1
## 300 3 1 3 4 2
## 301 4 2 1 3 2
## 302 4 3 3 4 3
## 303 1 2 3 3 2
## 304 1 2 2 3 2
## 305 1 1 1 1 3
## 306 3 2 2 4 3
## 307 1 1 1 4 1
## 308 1 2 1 1 3
## 309 1 2 2 4 3
## 310 5 2 1 2 2
## 311 5 2 3 2 1
## 312 5 3 2 3 2
## 313 5 1 2 1 3
## 314 2 1 1 2 3
## 315 5 1 2 2 1
## 316 2 3 1 3 1
## 317 3 3 2 1 2
## 318 4 2 3 3 3
## 319 4 3 3 3 1
## 320 1 1 1 1 2
## 321 2 3 1 1 2
## 322 5 3 3 4 1
## 323 5 3 1 2 1
## 324 2 2 3 3 1
## 325 2 1 3 3 1
## 326 4 1 3 2 2
## 327 1 3 3 2 3
## 328 1 1 2 2 1
## 329 3 1 1 2 1
## 330 2 2 2 2 1
## 331 3 1 3 1 2
## 332 2 2 3 1 2
## 333 2 2 2 3 1
## 334 1 3 3 3 3
## 335 3 1 3 3 2
## 336 5 3 3 3 1
## 337 3 1 1 3 2
## 338 2 3 2 1 1
## 339 2 2 3 1 1
## 340 5 2 1 4 3
## 341 2 2 3 4 2
## 342 4 2 3 1 2
## 343 4 2 1 4 3
## 344 2 3 2 1 3
## 345 2 2 1 3 1
## 346 3 3 3 2 2
## 347 3 2 2 1 1
## 348 4 1 3 2 1
## 349 1 3 3 4 2
## 350 2 1 1 1 3
## 351 3 2 3 3 1
## 352 3 2 3 2 2
## 353 4 3 2 4 1
## 354 2 2 2 3 3
## 355 2 1 1 2 2
## 356 1 3 2 3 1
## 357 2 3 2 2 2
## 358 2 2 3 2 2
## 359 3 2 3 1 1
## 360 3 3 1 3 3
## 361 4 2 2 2 3
## 362 3 1 3 2 1
## 363 3 3 1 1 2
## 364 2 3 2 4 2
## 365 3 2 3 2 3
## 366 2 3 2 4 1
## 367 4 2 1 1 2
## 368 4 3 3 2 2
## 369 5 1 3 1 1
## 370 4 3 3 1 1
## 371 2 3 2 2 3
## 372 4 2 2 3 1
## 373 1 3 1 1 2
## 374 1 2 2 2 3
## 375 1 1 1 3 3
## 376 1 3 3 1 3
## 377 5 2 2 3 1
## 378 3 3 2 3 1
## 379 5 2 2 2 1
## 380 1 1 3 1 3
## 381 5 3 2 1 3
## 382 2 2 1 2 3
## 383 1 1 3 3 3
## 384 3 3 3 4 1
## 385 1 3 1 1 3
## 386 4 3 3 3 3
## 387 1 1 3 2 1
## 388 5 1 2 1 1
## 389 5 3 2 3 3
## 390 5 1 1 1 3
## 391 3 1 2 4 2
## 392 4 2 2 1 3
## 393 4 2 2 3 2
## 394 4 3 2 3 2
## 395 3 1 3 3 1
## 396 2 2 3 1 3
## 397 1 1 2 3 3
## 398 3 2 1 2 2
## 399 1 3 2 4 1
## 400 1 1 1 2 1
## 401 4 2 2 1 1
## 402 5 3 2 1 2
## 403 1 2 1 2 1
## 404 4 3 2 1 3
## 405 3 1 1 3 3
## 406 4 3 1 1 2
## 407 1 2 3 4 2
## 408 1 2 1 4 1
## 409 1 1 2 3 2
## 410 5 1 1 3 2
## 411 1 1 2 4 1
## 412 1 1 1 3 2
## 413 2 1 2 3 2
## 414 2 3 1 4 3
## 415 1 2 3 4 3
## 416 4 2 2 4 3
## 417 4 1 3 4 3
## 418 5 3 3 3 2
## 419 4 1 2 3 2
## 420 3 3 2 2 2
## 421 2 3 3 1 1
## 422 1 2 1 3 3
## 423 1 3 1 4 3
## 424 5 2 1 1 3
## 425 1 2 2 1 3
## 426 2 1 3 1 2
## 427 5 1 2 3 3
## 428 5 1 1 4 2
## 429 3 1 1 1 3
## 430 2 3 2 3 2
## 431 1 1 3 4 1
## 432 5 1 1 4 1
## 433 3 2 1 4 2
## 434 3 3 2 3 2
## 435 3 1 3 2 3
## 436 4 1 1 4 1
## 437 2 1 3 3 2
## 438 4 1 3 1 3
## 439 1 1 3 2 2
## 440 2 2 2 4 3
## 441 5 3 3 2 2
## 442 3 2 1 4 1
## 443 3 1 2 2 2
## 444 4 1 3 1 1
## 445 5 2 2 1 3
## 446 5 2 1 3 2
## 447 4 3 1 4 2
## 448 3 1 1 4 3
## 449 1 2 1 3 2
## 450 1 1 2 1 1
## 451 1 2 3 3 3
## 452 3 2 3 4 2
## 453 2 1 2 3 3
## 454 4 3 2 1 2
## 455 2 2 3 3 3
## 456 3 2 3 1 3
## 457 5 3 1 3 1
## 458 2 3 2 3 1
## 459 1 3 1 4 1
## 460 4 3 3 2 3
## 461 4 3 3 4 1
## 462 5 1 2 1 2
## 463 1 1 1 4 3
## 464 2 3 3 3 1
## 465 5 2 2 2 2
## 466 2 3 3 3 3
## 467 5 3 2 4 1
## 468 2 3 2 4 3
## 469 2 2 1 1 1
## 470 2 1 1 2 1
## 471 1 2 3 2 3
## 472 4 3 1 1 1
## 473 4 2 1 1 1
## 474 1 3 2 4 2
## 475 5 3 1 2 2
## 476 3 3 1 1 1
## 477 4 3 1 2 1
## 478 1 2 2 2 2
## 479 3 2 2 3 2
## 480 4 3 3 2 1
## 481 5 1 3 2 1
## 482 3 2 1 3 2
## 483 4 1 3 3 1
## 484 5 1 2 4 3
## 485 4 1 3 2 3
## 486 3 3 2 4 2
## 487 3 3 2 3 3
## 488 3 3 1 2 1
## 489 2 1 3 1 1
## 490 4 3 3 1 2
## 491 1 3 3 1 2
## 492 5 2 1 4 1
## 493 2 2 2 1 1
## 494 2 3 2 2 1
## 495 3 3 1 2 2
## 496 1 2 3 1 1
## 497 2 3 3 2 1
## 498 4 1 1 4 3
## 499 4 1 1 1 3
## 500 1 3 2 1 1
## 501 2 1 1 4 2
## 502 5 1 1 1 1
## 503 5 2 3 4 1
## 504 3 1 1 2 3
## 505 3 2 2 2 2
## 506 2 3 3 1 2
## 507 4 1 1 3 3
## 508 1 3 2 2 1
## 509 1 1 2 4 2
## 510 2 2 1 4 1
## 511 3 3 2 2 3
## 512 2 1 3 2 3
## 513 5 3 1 4 3
## 514 5 1 1 2 1
## 515 1 3 3 3 2
## 516 4 2 2 2 1
## 517 4 2 2 4 2
## 518 5 1 1 2 3
## 519 4 2 3 3 2
## 520 4 2 3 4 1
## 521 5 1 3 2 2
## 522 5 3 3 1 1
## 523 2 2 1 3 2
## 524 4 1 2 2 1
## 525 3 1 3 2 2
## 526 5 1 3 3 3
## 527 2 2 1 1 3
## 528 1 1 3 1 1
## 529 5 1 2 3 1
## 530 1 2 3 1 3
## 531 4 1 1 2 1
## 532 4 1 1 1 1
## 533 1 3 1 2 1
## 534 3 3 2 2 1
## 535 3 1 2 3 2
## 536 4 2 3 2 1
## 537 1 1 1 2 3
## 538 5 3 1 3 2
## 539 2 1 2 4 1
## 540 5 3 2 4 3
## class=design, type= full factorial
finaldata = merge(array, L, by=c("year","make","drive","trans", "fuel"), all = FALSE)
head(finaldata)
## year make drive trans fuel id model
## 1 1 1 1 1 1 1834 Integra
## 2 1 1 1 1 1 4361 Legend
## 3 1 1 1 1 1 1992 Legend
## 4 1 1 1 1 1 28425 FJ8c Post Office
## 5 1 1 1 1 1 4184 Integra
## 6 1 1 1 1 1 28426 DJ Po Vehicle 2WD
## class cyl displ hwy cty
## 1 Subcompact Cars 4 1.6 28 23
## 2 Compact Cars 6 2.7 22 17
## 3 Compact Cars 6 2.5 23 18
## 4 Special Purpose Vehicle 2WD 6 4.2 13 13
## 5 Subcompact Cars 4 1.6 28 23
## 6 Special Purpose Vehicle 2WD 4 2.5 17 18
unique = unique(finaldata[ , 1:5])
unique
## year make drive trans fuel
## 1 1 1 1 1 1
## 12 1 1 1 3 1
## 19 1 1 3 1 1
## 40 1 1 3 3 1
## 48 1 2 1 1 1
## 93 1 2 3 1 1
## 128 1 2 3 3 1
## 145 1 2 3 3 2
## 147 1 3 3 1 1
## 173 2 1 1 1 1
## 188 2 1 1 3 1
## 201 2 1 3 1 1
## 204 2 1 3 3 1
## 207 2 2 1 1 1
## 229 2 2 1 3 1
## 237 2 2 3 1 1
## 294 2 2 3 3 1
## 332 2 3 1 1 1
## 338 2 3 1 3 1
## 341 2 3 3 1 1
## 360 2 3 3 3 1
## 361 3 1 1 1 1
## 378 3 1 1 3 1
## 402 3 1 3 1 1
## 408 3 1 3 3 1
## 415 3 2 1 1 1
## 423 3 2 1 3 1
## 449 3 2 3 1 1
## 500 3 2 3 3 1
## 563 3 3 1 1 1
## 565 3 3 1 3 1
## 567 3 3 3 1 1
## 572 3 3 3 3 1
## 576 4 1 1 1 1
## 588 4 1 1 3 1
## 608 4 1 3 1 1
## 616 4 1 3 3 1
## 622 4 2 1 1 1
## 634 4 2 1 3 1
## 647 4 2 1 4 1
## 655 4 2 3 1 1
## 760 4 2 3 3 1
## 896 4 3 3 1 1
## 903 4 3 3 3 1
## 910 5 1 1 1 1
## 936 5 1 1 3 1
## 983 5 1 2 1 1
## 988 5 1 2 3 1
## 1012 5 1 3 1 1
## 1016 5 1 3 3 1
## 1031 5 2 1 1 1
## 1045 5 2 1 2 1
## 1051 5 2 1 2 2
## 1052 5 2 1 3 1
## 1069 5 2 1 3 2
## 1072 5 2 1 4 1
## 1108 5 2 2 1 1
## 1166 5 2 2 2 1
## 1280 5 2 2 2 2
## 1302 5 2 2 3 1
## 1361 5 2 2 3 2
## 1363 5 2 2 4 1
## 1364 5 2 3 1 1
## 1593 5 2 3 2 1
## 1635 5 2 3 2 2
## 1637 5 2 3 3 1
## 1926 5 2 3 3 2
## 1929 5 3 1 4 3
## 1931 5 3 3 1 1
## 1961 5 3 3 2 1
## 1964 5 3 3 3 1
## 1994 5 3 3 4 1
rownames(unique)
## [1] "1" "12" "19" "40" "48" "93" "128" "145" "147" "173"
## [11] "188" "201" "204" "207" "229" "237" "294" "332" "338" "341"
## [21] "360" "361" "378" "402" "408" "415" "423" "449" "500" "563"
## [31] "565" "567" "572" "576" "588" "608" "616" "622" "634" "647"
## [41] "655" "760" "896" "903" "910" "936" "983" "988" "1012" "1016"
## [51] "1031" "1045" "1051" "1052" "1069" "1072" "1108" "1166" "1280" "1302"
## [61] "1361" "1363" "1364" "1593" "1635" "1637" "1926" "1929" "1931" "1961"
## [71] "1964" "1994"
cty = finaldata$cty[index=c(1,12,19,40,48,93,128,145,147,173,188,201,204,207,229,237,294,332,338,341,360,361,378,402,408,415,423,449,500,563,565,567,572,576,588,608,616,622,634,647,655,760,896,903,910,936,983,988,1012,1016,1031,1045,1051,1052,1069,1072,1108,1166,1280,1302,1361,1363,1364,1593,1635,1637,1926,1929,1931,1961,1964,1994)]
cty
## [1] 23 16 15 19 21 16 14 21 7 22 18 17 16 20 15 15 15 15 16 19 10 22 18
## [24] 16 16 17 17 18 16 15 14 12 13 22 21 15 16 21 16 20 18 15 10 11 21 18
## [47] 17 17 16 17 20 25 30 22 30 24 17 17 20 16 17 13 17 23 26 17 23 62 11
## [70] 14 13 14
finalorthoarray = cbind(unique,cty)
finalorthoarray
## year make drive trans fuel cty
## 1 1 1 1 1 1 23
## 12 1 1 1 3 1 16
## 19 1 1 3 1 1 15
## 40 1 1 3 3 1 19
## 48 1 2 1 1 1 21
## 93 1 2 3 1 1 16
## 128 1 2 3 3 1 14
## 145 1 2 3 3 2 21
## 147 1 3 3 1 1 7
## 173 2 1 1 1 1 22
## 188 2 1 1 3 1 18
## 201 2 1 3 1 1 17
## 204 2 1 3 3 1 16
## 207 2 2 1 1 1 20
## 229 2 2 1 3 1 15
## 237 2 2 3 1 1 15
## 294 2 2 3 3 1 15
## 332 2 3 1 1 1 15
## 338 2 3 1 3 1 16
## 341 2 3 3 1 1 19
## 360 2 3 3 3 1 10
## 361 3 1 1 1 1 22
## 378 3 1 1 3 1 18
## 402 3 1 3 1 1 16
## 408 3 1 3 3 1 16
## 415 3 2 1 1 1 17
## 423 3 2 1 3 1 17
## 449 3 2 3 1 1 18
## 500 3 2 3 3 1 16
## 563 3 3 1 1 1 15
## 565 3 3 1 3 1 14
## 567 3 3 3 1 1 12
## 572 3 3 3 3 1 13
## 576 4 1 1 1 1 22
## 588 4 1 1 3 1 21
## 608 4 1 3 1 1 15
## 616 4 1 3 3 1 16
## 622 4 2 1 1 1 21
## 634 4 2 1 3 1 16
## 647 4 2 1 4 1 20
## 655 4 2 3 1 1 18
## 760 4 2 3 3 1 15
## 896 4 3 3 1 1 10
## 903 4 3 3 3 1 11
## 910 5 1 1 1 1 21
## 936 5 1 1 3 1 18
## 983 5 1 2 1 1 17
## 988 5 1 2 3 1 17
## 1012 5 1 3 1 1 16
## 1016 5 1 3 3 1 17
## 1031 5 2 1 1 1 20
## 1045 5 2 1 2 1 25
## 1051 5 2 1 2 2 30
## 1052 5 2 1 3 1 22
## 1069 5 2 1 3 2 30
## 1072 5 2 1 4 1 24
## 1108 5 2 2 1 1 17
## 1166 5 2 2 2 1 17
## 1280 5 2 2 2 2 20
## 1302 5 2 2 3 1 16
## 1361 5 2 2 3 2 17
## 1363 5 2 2 4 1 13
## 1364 5 2 3 1 1 17
## 1593 5 2 3 2 1 23
## 1635 5 2 3 2 2 26
## 1637 5 2 3 3 1 17
## 1926 5 2 3 3 2 23
## 1929 5 3 1 4 3 62
## 1931 5 3 3 1 1 11
## 1961 5 3 3 2 1 14
## 1964 5 3 3 3 1 13
## 1994 5 3 3 4 1 14
signoise = -10*log10(1/finalorthoarray$cty^2)
signoise
## [1] 27.23456 24.08240 23.52183 25.57507 26.44439 24.08240 22.92256
## [8] 26.44439 16.90196 26.84845 25.10545 24.60898 24.08240 26.02060
## [15] 23.52183 23.52183 23.52183 23.52183 24.08240 25.57507 20.00000
## [22] 26.84845 25.10545 24.08240 24.08240 24.60898 24.60898 25.10545
## [29] 24.08240 23.52183 22.92256 21.58362 22.27887 26.84845 26.44439
## [36] 23.52183 24.08240 26.44439 24.08240 26.02060 25.10545 23.52183
## [43] 20.00000 20.82785 26.44439 25.10545 24.60898 24.60898 24.08240
## [50] 24.60898 26.02060 27.95880 29.54243 26.84845 29.54243 27.60422
## [57] 24.60898 24.60898 26.02060 24.08240 24.60898 22.27887 24.60898
## [64] 27.23456 28.29947 24.60898 27.23456 35.84783 20.82785 22.92256
## [71] 22.27887 22.92256
index = which(signoise==max(-10*log10(1/finalorthoarray$cty^2)))
index
## [1] 68
finalorthoarray[index, ]
## year make drive trans fuel cty
## 1929 5 3 1 4 3 62
model2=lm(finalorthoarray$cty~finalorthoarray$year+finalorthoarray$make+finalorthoarray$drive+finalorthoarray$fuel+finalorthoarray$trans, data=finalorthoarray)
anova(model2)
## Analysis of Variance Table
##
## Response: finalorthoarray$cty
## Df Sum Sq Mean Sq F value Pr(>F)
## finalorthoarray$year 4 282.74 70.69 14.7259 2.322e-08 ***
## finalorthoarray$make 2 67.17 33.58 6.9966 0.001897 **
## finalorthoarray$drive 2 669.30 334.65 69.7177 3.733e-16 ***
## finalorthoarray$fuel 2 1867.53 933.76 194.5311 < 2.2e-16 ***
## finalorthoarray$trans 3 57.47 19.16 3.9912 0.011852 *
## Residuals 58 278.40 4.80
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
In this section we check the adequacy of the ANOVA model.The assumptions that undelie the analysis of variance are: normality,homogeniety of variances,homogeneity of regression slopes, linearity of regression, Independence of error terms.
Model adequacy checking is done to test the validity of ANOVA given the numerous assumptions it is based on.
The shapiro test evaluates the Null hypothesis such that “the samples come from a Normal distribution” against the alternative hypothesis “the samples do not come from a Normal distribution”.
The Q-Q Norm plots also test the normality of the sample. The scatter plots of the residuals and the fitted model will check the distribution of the model over the entire dynamic range.
We also fit a model with different slopes for each factor level.
# Shapiro Test
shapiro.test(finalorthoarray$cty)
##
## Shapiro-Wilk normality test
##
## data: finalorthoarray$cty
## W = 0.6759, p-value = 2.526e-11
# Plot the new model
#Plots for checking normality
par(mfrow=c(2,2))
plot(model2)
## Warning: not plotting observations with leverage one:
## 68
## Warning: not plotting observations with leverage one:
## 68
Shapiro test gives a value of p which is less than 0.1. This shows that the data is normally distributed.
Normal Q-Q Plots:All the Normal Q-Q plots confirm to the assumption of normality. There are a few deviations from the normality (for the factor-Number of cylinders). However, for most other factors as well as their interaction plots are normal for most of the data.
Scatter plots: Since the scatter plots for the residuals show some skewedness we can say that this model could be biased.The residuals for the interaction model do show some patterns (clustering) so we cannot say that it is a very good fit. The plots do depict the presence of some outliers but the data can be considered appropriate for Annova.
The data from the fueleconomy data set is available at https://github.com/hadley/fueleconomy.