Recipe 5

Matthew Macchi

Rensselaer Polytechnic Institute

10/23/14 Version 1

1. Setting

System under test

This recipe conducts an experiment on the fueleconomy dataset. The experiment investigates subsets of the data for three specific vehicle makes. It is intended to introduce the concept of blocked designs with multiple explanatory and nuisance factors.

options(rpubs.upload.method = "internal")
options(RCurlOptions = list(verbose = FALSE, capath = system.file("CurlSSL", "cacert.pem", package = "RCurl"), ssl.verifypeer = FALSE))
install.packages("fueleconomy", repos='http://cran.us.r-project.org')
## 
## The downloaded binary packages are in
##  /var/folders/55/ql66yz5j3jzgkn6dmnb9sk1c0000gn/T//RtmpIoiPN3/downloaded_packages
library("fueleconomy")
x<-vehicles

Factors and Levels

A factor of an experiment is a controlled independent variable; a variable whose levels are set by the experimenter. In this instance, I am conducting an analysis of a blocked design with multiple explanatory and nuisance factors.

The term level is also used for categorical variables. In this case, this is a multi-level analysis.

The first factor that this experiment will examine is the amount of engine displacement in the vehicle.

The second factor that I will consider is the number of cylinders, which describes how many cylinders are in the car’s engine.

The third and final factor that I will consider is the transmission type, which describes which kind of transmission the car has.

head(x)
##      id       make               model year                       class
## 1 27550 AM General   DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 2 28426 AM General   DJ Po Vehicle 2WD 1984 Special Purpose Vehicle 2WD
## 3 27549 AM General    FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 4 28425 AM General    FJ8c Post Office 1984 Special Purpose Vehicle 2WD
## 5  1032 AM General Post Office DJ5 2WD 1985 Special Purpose Vehicle 2WD
## 6  1033 AM General Post Office DJ8 2WD 1985 Special Purpose Vehicle 2WD
##             trans            drive cyl displ    fuel hwy cty
## 1 Automatic 3-spd    2-Wheel Drive   4   2.5 Regular  17  18
## 2 Automatic 3-spd    2-Wheel Drive   4   2.5 Regular  17  18
## 3 Automatic 3-spd    2-Wheel Drive   6   4.2 Regular  13  13
## 4 Automatic 3-spd    2-Wheel Drive   6   4.2 Regular  13  13
## 5 Automatic 3-spd Rear-Wheel Drive   4   2.5 Regular  17  16
## 6 Automatic 3-spd Rear-Wheel Drive   6   4.2 Regular  13  13
tail(x)
##          id  make                             model year       class
## 33437 31064 smart   fortwo electric drive cabriolet 2011 Two Seaters
## 33438 33305 smart fortwo electric drive convertible 2013 Two Seaters
## 33439 34393 smart fortwo electric drive convertible 2014 Two Seaters
## 33440 31065 smart       fortwo electric drive coupe 2011 Two Seaters
## 33441 33306 smart       fortwo electric drive coupe 2013 Two Seaters
## 33442 34394 smart       fortwo electric drive coupe 2014 Two Seaters
##                trans            drive cyl displ        fuel hwy cty
## 33437 Automatic (A1) Rear-Wheel Drive  NA    NA Electricity  79  94
## 33438 Automatic (A1) Rear-Wheel Drive  NA    NA Electricity  93 122
## 33439 Automatic (A1) Rear-Wheel Drive  NA    NA Electricity  93 122
## 33440 Automatic (A1) Rear-Wheel Drive  NA    NA Electricity  79  94
## 33441 Automatic (A1) Rear-Wheel Drive  NA    NA Electricity  93 122
## 33442 Automatic (A1) Rear-Wheel Drive  NA    NA Electricity  93 122
summary(x)
##        id            make              model                year     
##  Min.   :    1   Length:33442       Length:33442       Min.   :1984  
##  1st Qu.: 8361   Class :character   Class :character   1st Qu.:1991  
##  Median :16724   Mode  :character   Mode  :character   Median :1999  
##  Mean   :17038                                         Mean   :1999  
##  3rd Qu.:25265                                         3rd Qu.:2008  
##  Max.   :34932                                         Max.   :2015  
##                                                                      
##     class              trans              drive                cyl       
##  Length:33442       Length:33442       Length:33442       Min.   : 2.00  
##  Class :character   Class :character   Class :character   1st Qu.: 4.00  
##  Mode  :character   Mode  :character   Mode  :character   Median : 6.00  
##                                                           Mean   : 5.77  
##                                                           3rd Qu.: 6.00  
##                                                           Max.   :16.00  
##                                                           NA's   :58     
##      displ          fuel                hwy             cty       
##  Min.   :0.00   Length:33442       Min.   :  9.0   Min.   :  6.0  
##  1st Qu.:2.30   Class :character   1st Qu.: 19.0   1st Qu.: 15.0  
##  Median :3.00   Mode  :character   Median : 23.0   Median : 17.0  
##  Mean   :3.35                      Mean   : 23.6   Mean   : 17.5  
##  3rd Qu.:4.30                      3rd Qu.: 27.0   3rd Qu.: 20.0  
##  Max.   :8.40                      Max.   :109.0   Max.   :138.0  
##  NA's   :57

Continuous variables (if any)

If a variable can take on any value between its minimum value and its maximum value, it is called a continuous variable; otherwise, it is called a discrete variable.

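In this analysis, only the response variable, city mpg (cty), is treated as continuous. Engine displacement is measured on a continuous scale, but it is converted to a factor below, as are the other explanatory and blocking variables.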

Response variables

A response variable is defined as the outcome of a study. It is a variable you would be interested in predicting or forecasting. It is often called a dependent variable or predicted variable. In this instance, the response variable is city gas mileage (cty), which is used to describe the differences between the levels of the three factors of interest.

The Data: How is it organized and what does it look like?

The data is initially organized into a 12-column table. The columns are titled as follows: id, make, model, year, class, trans, drive, cyl, displ, fuel, hwy, cty. All columns are numeric except make, model, class, trans, drive, and fuel, which are text. Since the experiment focuses on three vehicle makes and uses year as a blocking variable, the data has been subset to model years 2005 and 2010 and to the makes Aston Martin, Jeep, and Audi.

y<-subset(x, x$year==2005|x$year==2010)
z<-subset(y, y$make=="Aston Martin" | y$make=="Jeep" | y$make=="Audi")
head(z)
##        id         make              model year            class
## 363 20510 Aston Martin                DB9 2005 Minicompact Cars
## 366 29686 Aston Martin                DB9 2010 Minicompact Cars
## 367 29687 Aston Martin                DB9 2010 Minicompact Cars
## 378 21613 Aston Martin   DB9 Coupe Manual 2005 Minicompact Cars
## 381 20511 Aston Martin        DB9 Volante 2005 Minicompact Cars
## 386 21614 Aston Martin DB9 Volante Manual 2005 Minicompact Cars
##              trans            drive cyl displ    fuel hwy cty
## 363 Automatic (S6) Rear-Wheel Drive  12   5.9 Premium  18  11
## 366   Manual 6-spd Rear-Wheel Drive  12   5.9 Premium  17  11
## 367 Automatic (S6) Rear-Wheel Drive  12   5.9 Premium  20  13
## 378   Manual 6-spd Rear-Wheel Drive  12   5.9 Premium  16  10
## 381 Automatic (S6) Rear-Wheel Drive  12   5.9 Premium  17  11
## 386   Manual 6-spd Rear-Wheel Drive  12   5.9 Premium  16  10
z$make=as.factor(z$make)
nlevels(z$make)
## [1] 3
z$year=as.factor(z$year)
nlevels(z$year)
## [1] 2
z$trans=as.factor(z$trans)
nlevels(z$trans)
## [1] 10
z$displ=as.factor(z$displ)
nlevels(z$displ)
## [1] 19
z$cyl=as.factor(z$cyl)
nlevels(z$cyl)
## [1] 5
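
The subsetting and factor conversion above can also be written more compactly. The following is a minimal sketch on a small made-up data frame (the `toy` table and its values are assumptions for illustration, not rows from the vehicles data); it uses `%in%` in place of chained `|` comparisons and `lapply` to convert several columns at once.

```r
# Toy stand-in for the vehicles table (values are made up for illustration).
toy <- data.frame(
  make = c("Audi", "Jeep", "Aston Martin", "Ford", "Audi", "Jeep"),
  year = c(2005, 2010, 2005, 2010, 2007, 2005),
  cyl  = c(4, 6, 12, 8, 6, 6),
  cty  = c(20, 15, 11, 14, 18, 15),
  stringsAsFactors = FALSE
)

# %in% keeps rows whose value matches any element of the given vector.
keep <- toy$year %in% c(2005, 2010) &
  toy$make %in% c("Aston Martin", "Jeep", "Audi")
sub <- toy[keep, ]

# Convert several columns to factors in one step.
factor_cols <- c("make", "year", "cyl")
sub[factor_cols] <- lapply(sub[factor_cols], factor)

# Count the levels of each factor column.
level_counts <- sapply(sub[factor_cols], nlevels)
print(level_counts)
```

Indexing with a logical vector also avoids repeating the data frame name inside `subset()`, which is a matter of style rather than correctness.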

Randomization

This data comes from tests conducted at the EPA’s National Vehicle and Fuel Emissions Laboratory in Ann Arbor, Michigan. Since this is the only background information available about the data collection, it is entirely possible that the data was not collected under a completely randomized design.

However, the experiment this analysis will examine is a completely randomized block “pseudo design” with two blocking variables and 3 factors.

2. (Experimental) Design

How will the experiment be organized and conducted to test the hypothesis?

In order to conduct this experiment, I will perform a three-factor analysis of variance (ANOVA). It will use displacement, number of cylinders, and transmission type as factors, with year and make as blocking variables, and will measure the variation in city gas mileage among these groups.

What is the rationale for this design?

I have chosen to use this type of experimental design to demonstrate proper experimentation with a data set with at least two factors and at least two levels of each factor. The ANOVA was set up to include interaction between the factors, with make and year as the blocking variables. Therefore, we want to explore the variation within and between groups.
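
As a small illustration of how such a model is specified in R, the sketch below inspects the terms that a formula expands to. It only uses the column names from this recipe inside formulas, so no data is needed. In R's formula syntax, `*` already expands to all main effects plus all interactions, so listing the main effects separately in front of it is harmless but redundant. Blocking variables are added as plain additive terms.

```r
# Formula with main effects listed explicitly, then the full crossing.
f_long  <- cty ~ displ + cyl + trans + displ * cyl * trans
# Formula with the crossing only.
f_short <- cty ~ displ * cyl * trans

# terms() expands a formula without needing any data.
labels_long  <- attr(terms(f_long), "term.labels")
labels_short <- attr(terms(f_short), "term.labels")
print(labels_short)

# Both formulas describe the same set of model terms.
same_terms <- identical(labels_long, labels_short)
print(same_terms)

# Blocking variables enter additively, without interactions.
f_blocked <- cty ~ displ * cyl * trans + year + make
labels_blocked <- attr(terms(f_blocked), "term.labels")
print(labels_blocked)
```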

Randomize: What is the Randomization Scheme?

The experiment was completed as a completely randomized design.

Replicate: Are there replicates and/or repeated measures?

There are no replicates, but repeated measures do occur between the factors and levels.

Block: Did you use blocking in the design?

The only blocking that I performed in this experimental data analysis is seen in the blocking of vehicles by year and make.

par(mfrow=c(1,1))
boxplot(cty~year,data=z, main="Boxplot of City Fuel Economy at Different Years", xlab="Year", ylab="City Fuel Economy (mpg)")

plot of chunk unnamed-chunk-4

boxplot(cty~make,data=z, main="Boxplot of City Fuel Economy for Different Vehicle Makes", xlab="Make", ylab="City Fuel Economy (mpg)")

plot of chunk unnamed-chunk-4

3. (Statistical) Analysis

(Exploratory Data Analysis) Graphics and descriptive summary

boxplot(cty~displ,data=z, main="Boxplot of City Fuel Economy for Different Engine Displacements", xlab="Engine Displacement (L)", ylab="City Fuel Economy (mpg)")

plot of chunk unnamed-chunk-5

boxplot(cty~cyl,data=z, main="Boxplot of City Fuel Economy for Different Numbers of Cylinders", xlab="Number of Cylinders", ylab="City Fuel Economy (mpg)")

plot of chunk unnamed-chunk-5

boxplot(cty~trans,data=z, main="Boxplot of City Fuel Economy for Different Transmission Types", xlab="Transmission Type", ylab="City Fuel Economy (mpg)")

plot of chunk unnamed-chunk-5

Testing

At this point, I am introducing the Analysis of Variance (ANOVA) test. The ANOVA test is used to analyze the differences in mean city gas mileage across the levels of engine displacement, number of cylinders, and transmission type. The models also include interaction terms between the three factors. This is a 3-factor ANOVA with 2 blocking variables.

modelA <- aov(cty~ displ+cyl+trans+displ*cyl*trans,data=z)
summary(modelA)
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## displ       18   1501    83.4   52.95 < 2e-16 ***
## trans        9     86     9.6    6.07 1.5e-06 ***
## displ:trans 18     25     1.4    0.89    0.59    
## Residuals   85    134     1.6                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
modelB <- aov(cty~ displ+cyl+trans+displ*cyl*drive+year, data=z)
summary(modelB)
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## displ       18   1501    83.4   64.55 < 2e-16 ***
## trans        9     86     9.6    7.40 6.7e-08 ***
## drive        4      9     2.2    1.73 0.15137    
## year         1     17    16.8   13.04 0.00051 ***
## displ:drive 13     24     1.8    1.41 0.17327    
## Residuals   85    110     1.3                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
modelC <- aov(cty~ displ+cyl+trans+displ*cyl*trans+year+make, data=z)
summary(modelC)
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## displ       18   1501    83.4   61.35 < 2e-16 ***
## trans        9     86     9.6    7.04 1.7e-07 ***
## year         1     17    16.9   12.46 0.00068 ***
## make         2      9     4.7    3.49 0.03502 *  
## displ:trans 17     20     1.2    0.86 0.61574    
## Residuals   83    113     1.4                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
model5 <- aov(cty~displ, data=z)
model6 <- aov(cty~cyl, data=z)
model7 <- aov(cty~trans, data=z)
anova(modelA)
## Analysis of Variance Table
## 
## Response: cty
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## displ       18   1501    83.4   52.95 < 2e-16 ***
## trans        9     86     9.6    6.07 1.5e-06 ***
## displ:trans 18     25     1.4    0.89    0.59    
## Residuals   85    134     1.6                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(modelB)
## Analysis of Variance Table
## 
## Response: cty
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## displ       18   1501    83.4   64.55 < 2e-16 ***
## trans        9     86     9.6    7.40 6.7e-08 ***
## drive        4      9     2.2    1.73 0.15137    
## year         1     17    16.8   13.04 0.00051 ***
## displ:drive 13     24     1.8    1.41 0.17327    
## Residuals   85    110     1.3                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(modelC)
## Analysis of Variance Table
## 
## Response: cty
##             Df Sum Sq Mean Sq F value  Pr(>F)    
## displ       18   1501    83.4   61.35 < 2e-16 ***
## trans        9     86     9.6    7.04 1.7e-07 ***
## year         1     17    16.9   12.46 0.00068 ***
## make         2      9     4.7    3.49 0.03502 *  
## displ:trans 17     20     1.2    0.86 0.61574    
## Residuals   83    113     1.4                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(model5)
## Analysis of Variance Table
## 
## Response: cty
##            Df Sum Sq Mean Sq F value Pr(>F)    
## displ      18   1501    83.4    38.1 <2e-16 ***
## Residuals 112    245     2.2                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(model6)
## Analysis of Variance Table
## 
## Response: cty
##            Df Sum Sq Mean Sq F value Pr(>F)    
## cyl         4   1352     338     108 <2e-16 ***
## Residuals 126    394       3                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(model7)
## Analysis of Variance Table
## 
## Response: cty
##            Df Sum Sq Mean Sq F value  Pr(>F)    
## trans       9    670    74.4    8.37 1.2e-09 ***
## Residuals 121   1076     8.9                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

ANOVA Results

The summary of the ANOVA gives the p-values for each individual factor, along with the p-values for the interactions between the factors. The null hypothesis states that the variation in the response variable, city gas mileage, cannot be explained by anything other than random variation. The p-values in this ANOVA suggest that this is not the case. The p-values for displacement and transmission are less than an alpha of 0.05 in all three models, so it is likely that part of the variation in city gas mileage can be explained by these factors. Number of cylinders does not appear as a separate row in the multi-factor tables, but it is significant when analyzed on its own (model6). The blocking variables year (p = 0.00068) and make (p = 0.035) are both significant in modelC. However, the p-values for the displ:trans interaction (0.59 in modelA) and the displ:drive interaction (0.17 in modelB) are not significant, and no three-way interaction term appears in the output. Therefore, interaction between these factors may not be a cause of variation.
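
In the multi-factor tables above, cyl has no row of its own. One plausible explanation, offered here as an assumption rather than something verified on the vehicles data, is that every displacement level in the subset corresponds to exactly one cylinder count, so cyl adds no information once displ is in the model. The sketch below shows how that could be checked, and how p-values can be pulled out of an aov summary, using a small made-up data frame (`toy` and its values are hypothetical).

```r
set.seed(1)

# Made-up data in which each displacement maps to exactly one cylinder count.
toy <- data.frame(
  displ = factor(rep(c("2.0", "3.0", "4.2", "5.9"), each = 6)),
  cyl   = factor(rep(c("4", "6", "8", "12"), each = 6)),
  trans = factor(rep(c("Auto", "Manual"), times = 12))
)
toy$cty <- 25 - 3 * as.integer(toy$displ) + rnorm(nrow(toy))

# cyl is nested in displ if every displ level occurs with one cyl level only.
tab <- table(toy$displ, toy$cyl)
nested <- all(rowSums(tab > 0) == 1)
print(nested)

# Fit the model and extract the ANOVA table as a data frame.
fit <- aov(cty ~ displ + cyl + trans, data = toy)
tbl <- summary(fit)[[1]]
term_names <- trimws(rownames(tbl))
p_values <- setNames(tbl[["Pr(>F)"]], term_names)

# Terms that carry no degrees of freedom of their own are left out.
print(term_names)
print(p_values)
```

Running the same `table()` check on `z$displ` and `z$cyl` would show whether this explanation applies to the subset used in this recipe.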

Tukey’s HSD test is a post-hoc test, meaning that it is performed after an analysis of variance (ANOVA) test. This means that to maintain integrity, a statistician should not perform Tukey’s HSD test unless she has first performed an ANOVA analysis. In statistics, post-hoc tests are used only for further data analysis; these types of tests are not pre-planned. In other words, you should have no plans to use Tukey’s HSD test before you collect and analyze the data once.

The purpose of Tukey’s HSD test is to determine which groups in the sample differ. While ANOVA can tell the researcher whether groups in the sample differ, it cannot tell the researcher which groups differ. That is, if the results of ANOVA are positive in the sense that they state there is a significant difference among the groups, the obvious question becomes: Which groups in this sample differ significantly? It is not likely that all groups differ when compared to each other, only that a handful have significant differences. Tukey’s HSD can clarify to the researcher which groups among the sample in specific have significant differences.

tukey1 <-TukeyHSD(aov(cty~ displ, data=z))
tukey1
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = cty ~ displ, data = z)
## 
## $displ
##               diff      lwr      upr  p adj
## 2-1.8    2.939e+00   0.9985  4.88028 0.0000
## 2.4-1.8  1.939e+00  -0.2857  4.16448 0.1713
## 2.7-1.8 -4.227e+00  -8.3249 -0.12967 0.0355
## 2.8-1.8  2.727e-01  -5.2948  5.84027 1.0000
## 3-1.8   -1.573e+00  -3.7572  0.61034 0.4912
## 3.1-1.8 -1.727e+00  -5.1992  1.74470 0.9512
## 3.2-1.8 -1.127e+00  -4.0023  1.74779 0.9958
## 3.6-1.8 -4.727e+00 -10.2948  0.84027 0.2063
## 3.7-1.8 -3.477e+00  -5.7024 -1.25219 0.0000
## 3.8-1.8 -3.727e+00  -7.1992 -0.25530 0.0220
## 4-1.8   -5.727e+00  -9.8249 -1.62967 0.0003
## 4.2-1.8 -4.433e+00  -6.4958 -2.37050 0.0000
## 4.7-1.8 -5.227e+00  -8.3396 -2.11492 0.0000
## 5.2-1.8 -5.727e+00  -9.1992 -2.25530 0.0000
## 5.7-1.8 -5.394e+00  -8.0993 -2.68860 0.0000
## 5.9-1.8 -7.527e+00  -9.8563 -5.19820 0.0000
## 6-1.8   -5.727e+00 -11.2948 -0.15973 0.0367
## 6.1-1.8 -6.727e+00 -12.2948 -1.15973 0.0041
## 2.4-2   -1.000e+00  -2.8846  0.88462 0.9157
## 2.7-2   -7.167e+00 -11.0898 -3.24351 0.0000
## 2.8-2   -2.667e+00  -8.1071  2.77377 0.9573
## 3-2     -4.513e+00  -6.3485 -2.67716 0.0000
## 3.1-2   -4.667e+00  -7.9309 -1.40241 0.0002
## 3.2-2   -4.067e+00  -6.6871 -1.44621 0.0000
## 3.6-2   -7.667e+00 -13.1071 -2.22623 0.0002
## 3.7-2   -6.417e+00  -8.3013 -4.53205 0.0000
## 3.8-2   -6.667e+00  -9.9309 -3.40241 0.0000
## 4-2     -8.667e+00 -12.5898 -4.74351 0.0000
## 4.2-2   -7.373e+00  -9.0623 -5.68277 0.0000
## 4.7-2   -8.167e+00 -11.0455 -5.28786 0.0000
## 5.2-2   -8.667e+00 -11.9309 -5.40241 0.0000
## 5.7-2   -8.333e+00 -10.7664 -5.90030 0.0000
## 5.9-2   -1.047e+01 -12.4730 -8.46033 0.0000
## 6-2     -8.667e+00 -14.1071 -3.22623 0.0000
## 6.1-2   -9.667e+00 -15.1071 -4.22623 0.0000
## 2.7-2.4 -6.167e+00 -10.2379 -2.09542 0.0000
## 2.8-2.4 -1.667e+00  -7.2148  3.88151 0.9999
## 3-2.4   -3.513e+00  -5.6467 -1.37891 0.0000
## 3.1-2.4 -3.667e+00  -7.1075 -0.22583 0.0242
## 3.2-2.4 -3.067e+00  -5.9040 -0.22928 0.0202
## 3.6-2.4 -6.667e+00 -12.2148 -1.11849 0.0045
## 3.7-2.4 -5.417e+00  -7.5928 -3.24049 0.0000
## 3.8-2.4 -5.667e+00  -9.1075 -2.22583 0.0000
## 4-2.4   -7.667e+00 -11.7379 -3.59542 0.0000
## 4.2-2.4 -6.373e+00  -8.3823 -4.36275 0.0000
## 4.7-2.4 -7.167e+00 -10.2442 -4.08909 0.0000
## 5.2-2.4 -7.667e+00 -11.1075 -4.22583 0.0000
## 5.7-2.4 -7.333e+00  -9.9986 -4.66808 0.0000
## 5.9-2.4 -9.467e+00 -11.7491 -7.18428 0.0000
## 6-2.4   -7.667e+00 -13.2148 -2.11849 0.0004
## 6.1-2.4 -8.667e+00 -14.2148 -3.11849 0.0000
## 2.8-2.7  4.500e+00  -2.0285 11.02852 0.5744
## 3-2.7    2.654e+00  -1.3950  6.70266 0.6638
## 3.1-2.7  2.500e+00  -2.3661  7.36607 0.9354
## 3.2-2.7  3.100e+00  -1.3598  7.55983 0.5588
## 3.6-2.7 -5.000e-01  -7.0285  6.02852 1.0000
## 3.7-2.7  7.500e-01  -3.3212  4.82125 1.0000
## 3.8-2.7  5.000e-01  -4.3661  5.36607 1.0000
## 4-2.7   -1.500e+00  -6.8305  3.83051 0.9999
## 4.2-2.7 -2.059e-01  -4.1907  3.77892 1.0000
## 4.7-2.7 -1.000e+00  -5.6164  3.61636 1.0000
## 5.2-2.7 -1.500e+00  -6.3661  3.36607 0.9998
## 5.7-2.7 -1.167e+00  -5.5190  3.18568 1.0000
## 5.9-2.7 -3.300e+00  -7.4290  0.82900 0.3009
## 6-2.7   -1.500e+00  -8.0285  5.02852 1.0000
## 6.1-2.7 -2.500e+00  -9.0285  4.02852 0.9969
## 3-2.8   -1.846e+00  -7.3779  3.68558 0.9994
## 3.1-2.8 -2.000e+00  -8.1551  4.15515 0.9996
## 3.2-2.8 -1.400e+00  -7.2393  4.43929 1.0000
## 3.6-2.8 -5.000e+00 -12.5385  2.53848 0.6435
## 3.7-2.8 -3.750e+00  -9.2982  1.79817 0.6101
## 3.8-2.8 -4.000e+00 -10.1551  2.15515 0.6782
## 4-2.8   -6.000e+00 -12.5285  0.52852 0.1122
## 4.2-2.8 -4.706e+00 -10.1909  0.77917 0.1920
## 4.7-2.8 -5.500e+00 -11.4597  0.45970 0.1083
## 5.2-2.8 -6.000e+00 -12.1551  0.15515 0.0650
## 5.7-2.8 -5.667e+00 -11.4243  0.09095 0.0590
## 5.9-2.8 -7.800e+00 -13.3907 -2.20931 0.0003
## 6-2.8   -6.000e+00 -13.5385  1.53848 0.3080
## 6.1-2.8 -7.000e+00 -14.5385  0.53848 0.1026
## 3.1-3   -1.538e-01  -3.5681  3.26042 1.0000
## 3.2-3    4.462e-01  -2.3589  3.25126 1.0000
## 3.6-3   -3.154e+00  -8.6856  2.37789 0.8552
## 3.7-3   -1.904e+00  -4.0378  0.23007 0.1433
## 3.8-3   -2.154e+00  -5.5681  1.26042 0.7257
## 4-3     -4.154e+00  -8.2027 -0.10503 0.0378
## 4.2-3   -2.860e+00  -4.8237 -0.89576 0.0001
## 4.7-3   -3.654e+00  -6.7017 -0.60601 0.0047
## 5.2-3   -4.154e+00  -7.5681 -0.73958 0.0037
## 5.7-3   -3.821e+00  -6.4514 -1.18965 0.0001
## 5.9-3   -5.954e+00  -8.1960 -3.71171 0.0000
## 6-3     -4.154e+00  -9.6856  1.37789 0.4130
## 6.1-3   -5.154e+00 -10.6856  0.37789 0.0996
## 3.2-3.1  6.000e-01  -3.2929  4.49286 1.0000
## 3.6-3.1 -3.000e+00  -9.1551  3.15515 0.9594
## 3.7-3.1 -1.750e+00  -5.1908  1.69083 0.9407
## 3.8-3.1 -2.000e+00  -6.3523  2.35235 0.9768
## 4-3.1   -4.000e+00  -8.8661  0.86607 0.2548
## 4.2-3.1 -2.706e+00  -6.0440  0.63221 0.2773
## 4.7-3.1 -3.500e+00  -7.5712  0.57125 0.1892
## 5.2-3.1 -4.000e+00  -8.3523  0.35235 0.1122
## 5.7-3.1 -3.667e+00  -7.4359  0.10258 0.0664
## 5.9-3.1 -5.800e+00  -9.3090 -2.29103 0.0000
## 6-3.1   -4.000e+00 -10.1551  2.15515 0.6782
## 6.1-3.1 -5.000e+00 -11.1551  1.15515 0.2738
## 3.6-3.2 -3.600e+00  -9.4393  2.23929 0.7597
## 3.7-3.2 -2.350e+00  -5.1874  0.48738 0.2431
## 3.8-3.2 -2.600e+00  -6.4929  1.29286 0.6313
## 4-3.2   -4.600e+00  -9.0598 -0.14017 0.0356
## 4.2-3.2 -3.306e+00  -6.0178 -0.59400 0.0036
## 4.7-3.2 -4.100e+00  -7.6758 -0.52418 0.0091
## 5.2-3.2 -4.600e+00  -8.4929 -0.70714 0.0058
## 5.7-3.2 -4.267e+00  -7.4945 -1.03888 0.0009
## 5.9-3.2 -6.400e+00  -9.3196 -3.48036 0.0000
## 6-3.2   -4.600e+00 -10.4393  1.23929 0.3258
## 6.1-3.2 -5.600e+00 -11.4393  0.23929 0.0763
## 3.7-3.6  1.250e+00  -4.2982  6.79817 1.0000
## 3.8-3.6  1.000e+00  -5.1551  7.15515 1.0000
## 4-3.6   -1.000e+00  -7.5285  5.52852 1.0000
## 4.2-3.6  2.941e-01  -5.1909  5.77917 1.0000
## 4.7-3.6 -5.000e-01  -6.4597  5.45970 1.0000
## 5.2-3.6 -1.000e+00  -7.1551  5.15515 1.0000
## 5.7-3.6 -6.667e-01  -6.4243  5.09095 1.0000
## 5.9-3.6 -2.800e+00  -8.3907  2.79069 0.9482
## 6-3.6   -1.000e+00  -8.5385  6.53848 1.0000
## 6.1-3.6 -2.000e+00  -9.5385  5.53848 1.0000
## 3.8-3.7 -2.500e-01  -3.6908  3.19083 1.0000
## 4-3.7   -2.250e+00  -6.3212  1.82125 0.8844
## 4.2-3.7 -9.559e-01  -2.9657  1.05392 0.9677
## 4.7-3.7 -1.750e+00  -4.8276  1.32757 0.8578
## 5.2-3.7 -2.250e+00  -5.6908  1.19083 0.6678
## 5.7-3.7 -1.917e+00  -4.5819  0.74859 0.4948
## 5.9-3.7 -4.050e+00  -6.3324 -1.76761 0.0000
## 6-3.7   -2.250e+00  -7.7982  3.29817 0.9939
## 6.1-3.7 -3.250e+00  -8.7982  2.29817 0.8257
## 4-3.8   -2.000e+00  -6.8661  2.86607 0.9929
## 4.2-3.8 -7.059e-01  -4.0440  2.63221 1.0000
## 4.7-3.8 -1.500e+00  -5.5712  2.57125 0.9980
## 5.2-3.8 -2.000e+00  -6.3523  2.35235 0.9768
## 5.7-3.8 -1.667e+00  -5.4359  2.10258 0.9843
## 5.9-3.8 -3.800e+00  -7.3090 -0.29103 0.0197
## 6-3.8   -2.000e+00  -8.1551  4.15515 0.9996
## 6.1-3.8 -3.000e+00  -9.1551  3.15515 0.9594
## 4.2-4    1.294e+00  -2.6907  5.27892 0.9996
## 4.7-4    5.000e-01  -4.1164  5.11636 1.0000
## 5.2-4    0.000e+00  -4.8661  4.86607 1.0000
## 5.7-4    3.333e-01  -4.0190  4.68568 1.0000
## 5.9-4   -1.800e+00  -5.9290  2.32900 0.9865
## 6-4     -7.105e-15  -6.5285  6.52852 1.0000
## 6.1-4   -1.000e+00  -7.5285  5.52852 1.0000
## 4.7-4.2 -7.941e-01  -3.7564  2.16815 1.0000
## 5.2-4.2 -1.294e+00  -4.6322  2.04398 0.9964
## 5.7-4.2 -9.608e-01  -3.4920  1.57045 0.9972
## 5.9-4.2 -3.094e+00  -5.2185 -0.96977 0.0001
## 6-4.2   -1.294e+00  -6.7792  4.19094 1.0000
## 6.1-4.2 -2.294e+00  -7.7792  3.19094 0.9913
## 5.2-4.7 -5.000e-01  -4.5712  3.57125 1.0000
## 5.7-4.7 -1.667e-01  -3.6075  3.27417 1.0000
## 5.9-4.7 -2.300e+00  -5.4536  0.85357 0.4681
## 6-4.7   -5.000e-01  -6.4597  5.45970 1.0000
## 6.1-4.7 -1.500e+00  -7.4597  4.45970 1.0000
## 5.7-5.2  3.333e-01  -3.4359  4.10258 1.0000
## 5.9-5.2 -1.800e+00  -5.3090  1.70897 0.9362
## 6-5.2   -7.105e-15  -6.1551  6.15515 1.0000
## 6.1-5.2 -1.000e+00  -7.1551  5.15515 1.0000
## 5.9-5.7 -2.133e+00  -4.8860  0.61933 0.3549
## 6-5.7   -3.333e-01  -6.0909  5.42428 1.0000
## 6.1-5.7 -1.333e+00  -7.0909  4.42428 1.0000
## 6-5.9    1.800e+00  -3.7907  7.39069 0.9997
## 6.1-5.9  8.000e-01  -4.7907  6.39069 1.0000
## 6.1-6   -1.000e+00  -8.5385  6.53848 1.0000
tukey2 <-TukeyHSD(aov(cty~ cyl, data=z))
tukey2
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = cty ~ cyl, data = z)
## 
## $cyl
##          diff     lwr     upr  p adj
## 6-4   -4.6143  -5.656 -3.5731 0.0000
## 8-4   -6.7946  -7.959 -5.6304 0.0000
## 10-4  -7.6875 -10.601 -4.7737 0.0000
## 12-4  -9.3239 -10.961 -7.6872 0.0000
## 8-6   -2.1803  -3.381 -0.9800 0.0000
## 10-6  -3.0732  -6.002 -0.1448 0.0346
## 12-6  -4.7095  -6.372 -3.0470 0.0000
## 10-8  -0.8929  -3.867  2.0815 0.9206
## 12-8  -2.5292  -4.271 -0.7870 0.0009
## 12-10 -1.6364  -4.825  1.5527 0.6159
tukey3 <-TukeyHSD(aov(cty~ trans, data=z))
tukey3
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = cty ~ trans, data = z)
## 
## $trans
##                                                     diff       lwr     upr
## Automatic (AM6)-Auto(AM6)                        -0.5000 -12.27482 11.2748
## Automatic (S5)-Auto(AM6)                          2.6429  -5.06557 10.3513
## Automatic (S6)-Auto(AM6)                          3.3571  -3.60103 10.3153
## Automatic (S7)-Auto(AM6)                          4.0000  -5.61410 13.6141
## Automatic (variable gear ratios)-Auto(AM6)        7.3000   0.06277 14.5372
## Automatic 4-spd-Auto(AM6)                         1.3571  -6.35129  9.0656
## Automatic 5-spd-Auto(AM6)                         0.8125  -6.39808  8.0231
## Manual 5-spd-Auto(AM6)                            9.0714   1.36300 16.7799
## Manual 6-spd-Auto(AM6)                            2.3125  -4.69492  9.3199
## Automatic (S5)-Automatic (AM6)                    3.1429  -7.13505 13.4208
## Automatic (S6)-Automatic (AM6)                    3.8571  -5.87074 13.5850
## Automatic (S7)-Automatic (AM6)                    4.5000  -7.27482 16.2748
## Automatic (variable gear ratios)-Automatic (AM6)  7.8000  -2.12940 17.7294
## Automatic 4-spd-Automatic (AM6)                   1.8571  -8.42077 12.1351
## Automatic 5-spd-Automatic (AM6)                   1.3125  -8.59749 11.2225
## Manual 5-spd-Automatic (AM6)                      9.5714  -0.70648 19.8493
## Manual 6-spd-Automatic (AM6)                      2.8125  -6.95067 12.5757
## Automatic (S6)-Automatic (S5)                     0.7143  -3.21066  4.6392
## Automatic (S7)-Automatic (S5)                     1.3571  -6.35129  9.0656
## Automatic (variable gear ratios)-Automatic (S5)   4.6571   0.25640  9.0579
## Automatic 4-spd-Automatic (S5)                   -1.2857  -6.42467  3.8532
## Automatic 5-spd-Automatic (S5)                   -1.8304  -6.18712  2.5264
## Manual 5-spd-Automatic (S5)                       6.4286   1.28962 11.5675
## Manual 6-spd-Automatic (S5)                      -0.3304  -4.34195  3.6812
## Automatic (S7)-Automatic (S6)                     0.6429  -6.31532  7.6010
## Automatic (variable gear ratios)-Automatic (S6)   3.9429   1.05101  6.8347
## Automatic 4-spd-Automatic (S6)                   -2.0000  -5.92494  1.9249
## Automatic 5-spd-Automatic (S6)                   -2.5446  -5.36912  0.2798
## Manual 5-spd-Automatic (S6)                       5.7143   1.78934  9.6392
## Manual 6-spd-Automatic (S6)                      -1.0446  -3.30057  1.2113
## Automatic (variable gear ratios)-Automatic (S7)   3.3000  -3.93723 10.5372
## Automatic 4-spd-Automatic (S7)                   -2.6429 -10.35129  5.0656
## Automatic 5-spd-Automatic (S7)                   -3.1875 -10.39808  4.0231
## Manual 5-spd-Automatic (S7)                       5.0714  -2.63700 12.7799
## Manual 6-spd-Automatic (S7)                      -1.6875  -8.69492  5.3199
## Automatic 4-spd-Automatic (variable gear ratios) -5.9429 -10.34360 -1.5421
## Automatic 5-spd-Automatic (variable gear ratios) -6.4875  -9.94279 -3.0322
## Manual 5-spd-Automatic (variable gear ratios)     1.7714  -2.62931  6.1722
## Manual 6-spd-Automatic (variable gear ratios)    -4.9875  -7.99591 -1.9791
## Automatic 5-spd-Automatic 4-spd                  -0.5446  -4.90140  3.8121
## Manual 5-spd-Automatic 4-spd                      7.7143   2.57533 12.8532
## Manual 6-spd-Automatic 4-spd                      0.9554  -3.05624  4.9670
## Manual 5-spd-Automatic 5-spd                      8.2589   3.90217 12.6157
## Manual 6-spd-Automatic 5-spd                      1.5000  -1.44371  4.4437
## Manual 6-spd-Manual 5-spd                        -6.7589 -10.77052 -2.7473
##                                                   p adj
## Automatic (AM6)-Auto(AM6)                        1.0000
## Automatic (S5)-Auto(AM6)                         0.9833
## Automatic (S6)-Auto(AM6)                         0.8664
## Automatic (S7)-Auto(AM6)                         0.9418
## Automatic (variable gear ratios)-Auto(AM6)       0.0462
## Automatic 4-spd-Auto(AM6)                        0.9999
## Automatic 5-spd-Auto(AM6)                        1.0000
## Manual 5-spd-Auto(AM6)                           0.0085
## Manual 6-spd-Auto(AM6)                           0.9872
## Automatic (S5)-Automatic (AM6)                   0.9926
## Automatic (S6)-Automatic (AM6)                   0.9567
## Automatic (S7)-Automatic (AM6)                   0.9657
## Automatic (variable gear ratios)-Automatic (AM6) 0.2623
## Automatic 4-spd-Automatic (AM6)                  0.9999
## Automatic 5-spd-Automatic (AM6)                  1.0000
## Manual 5-spd-Automatic (AM6)                     0.0907
## Manual 6-spd-Automatic (AM6)                     0.9952
## Automatic (S6)-Automatic (S5)                    0.9999
## Automatic (S7)-Automatic (S5)                    0.9999
## Automatic (variable gear ratios)-Automatic (S5)  0.0289
## Automatic 4-spd-Automatic (S5)                   0.9984
## Automatic 5-spd-Automatic (S5)                   0.9383
## Manual 5-spd-Automatic (S5)                      0.0037
## Manual 6-spd-Automatic (S5)                      1.0000
## Automatic (S7)-Automatic (S6)                    1.0000
## Automatic (variable gear ratios)-Automatic (S6)  0.0010
## Automatic 4-spd-Automatic (S6)                   0.8244
## Automatic 5-spd-Automatic (S6)                   0.1158
## Manual 5-spd-Automatic (S6)                      0.0003
## Manual 6-spd-Automatic (S6)                      0.8926
## Automatic (variable gear ratios)-Automatic (S7)  0.9013
## Automatic 4-spd-Automatic (S7)                   0.9833
## Automatic 5-spd-Automatic (S7)                   0.9170
## Manual 5-spd-Automatic (S7)                      0.5168
## Manual 6-spd-Automatic (S7)                      0.9988
## Automatic 4-spd-Automatic (variable gear ratios) 0.0011
## Automatic 5-spd-Automatic (variable gear ratios) 0.0000
## Manual 5-spd-Automatic (variable gear ratios)    0.9524
## Manual 6-spd-Automatic (variable gear ratios)    0.0000
## Automatic 5-spd-Automatic 4-spd                  1.0000
## Manual 5-spd-Automatic 4-spd                     0.0002
## Manual 6-spd-Automatic 4-spd                     0.9989
## Manual 5-spd-Automatic 5-spd                     0.0000
## Manual 6-spd-Automatic 5-spd                     0.8244
## Manual 6-spd-Manual 5-spd                        0.0000
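
Because the full Tukey tables are long, it can help to keep only the pairs with an adjusted p-value below 0.05. The following is a minimal sketch on made-up data (the `toy` data frame is an assumption for illustration); the same filtering could be applied to `tukey1$displ`, `tukey2$cyl`, or `tukey3$trans`.

```r
set.seed(42)

# Made-up data: three cylinder groups with different mean city mpg.
toy <- data.frame(
  cyl = factor(rep(c("4", "6", "8"), each = 10), levels = c("4", "6", "8")),
  cty = c(rnorm(10, mean = 22), rnorm(10, mean = 17), rnorm(10, mean = 16.8))
)

fit <- aov(cty ~ cyl, data = toy)

# TukeyHSD returns one matrix per factor, with columns diff, lwr, upr, p adj.
res <- TukeyHSD(fit)$cyl

# Keep only the pairs whose adjusted p-value is below 0.05.
sig <- res[res[, "p adj"] < 0.05, , drop = FALSE]

# Sort the remaining pairs from most to least significant.
sig <- sig[order(sig[, "p adj"]), , drop = FALSE]
print(sig)
```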

Diagnostics/Model Adequacy Checking

To check the adequacy of the ANOVA as a means of analyzing this data set, I created Quantile-Quantile (Q-Q) plots of the residuals to determine whether the residuals follow a normal distribution. I also created interaction plots to see whether there are interaction effects between the three factors.

The non-linear pattern of the residuals in the first Q-Q plot (modelA) is an indication that the model may not be adequate for this analysis.

The non-linear pattern of the residuals in the second Q-Q plot (modelB) is an indication that the model may not be adequate for this analysis.

The non-linear pattern of the residuals in the third Q-Q plot (modelC) is an indication that the model may not be adequate for this analysis.

The interaction plots following the Q-Q plots suggest that two factors interact with each other in their effect on the response variable wherever the curves on the plot intersect.

The third type of plot is a Residuals vs. Fits plot, which is used to check for patterns in the residuals, such as non-constant variance, and to determine whether there are any outlying values.
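
The visual checks can be complemented with formal tests. The sketch below is not part of the original analysis: it applies the Shapiro-Wilk test to the residuals and Bartlett's test to the group variances, on made-up data (`toy` is hypothetical). A small p-value from either test would point the same way as a curved Q-Q plot or a fanned Residuals vs. Fits plot.

```r
set.seed(7)

# Made-up data: three cylinder groups with different mean city mpg.
toy <- data.frame(
  cyl = factor(rep(c("4", "6", "8"), each = 12), levels = c("4", "6", "8")),
  cty = c(rnorm(12, mean = 22), rnorm(12, mean = 17), rnorm(12, mean = 15))
)

fit <- aov(cty ~ cyl, data = toy)

# Shapiro-Wilk test: null hypothesis is that the residuals are normal.
normality <- shapiro.test(residuals(fit))

# Bartlett's test: null hypothesis is that all groups share one variance.
homogeneity <- bartlett.test(cty ~ cyl, data = toy)

check_p <- c(shapiro_p = normality$p.value, bartlett_p = homogeneity$p.value)
print(check_p)
```

For the models in this recipe, the same calls could be made with `residuals(modelA)`, `residuals(modelB)`, and `residuals(modelC)`.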

qqnorm(residuals(modelA))
qqline(residuals(modelA))

plot of chunk unnamed-chunk-9

qqnorm(residuals(modelB))
qqline(residuals(modelB))

plot of chunk unnamed-chunk-10

qqnorm(residuals(modelC))
qqline(residuals(modelC))

plot of chunk unnamed-chunk-11

interaction.plot(z$displ, z$cyl, z$cty)

plot of chunk unnamed-chunk-12

interaction.plot(z$displ, z$trans, z$cty)

plot of chunk unnamed-chunk-12

interaction.plot(z$trans, z$cyl, z$cty)

plot of chunk unnamed-chunk-12

plot(fitted(modelA),residuals(modelA))

plot of chunk unnamed-chunk-12

plot(fitted(modelB),residuals(modelB))

plot of chunk unnamed-chunk-12

plot(fitted(modelC),residuals(modelC))

plot of chunk unnamed-chunk-12

4. References to the literature

See course canvas site. Also http://www.fueleconomy.gov/feg/download.shtml

5. Appendices

The data from the fueleconomy data set can be accessed at https://github.com/hadley/fueleconomy.

A summary of, or pointer to, the raw data

complete and documented R code