Recipe 5: Blocked Designs with multiple factors

Recipes for the Design of Experiments

Zoe Konrad

Rensselaer Polytechnic Institute

Fall 2014 v1

1. Setting

System under test

The purpose of this statistical analysis is to observe the effects of three different factors (drive of the car, fuel type, and number of cylinders) on a single response variable (city gas mileage) while blocking for two different nuisance factors (year and class of the car) using EPA’s fuel economy dataset.

library("fueleconomy")
str(y)
## Classes 'tbl_df', 'tbl' and 'data.frame':    9210 obs. of  12 variables:
##  $ id   : int  12674 13422 14164 16573 17489 18458 18459 15069 15871 16734 ...
##  $ make : chr  "Acura" "Acura" "Acura" "Acura" ...
##  $ model: chr  "2.5TL/3.2TL" "2.5TL/3.2TL" "2.5TL/3.2TL" "3.2CL" ...
##  $ year : int  1996 1997 1998 2001 2002 2003 2003 1999 2000 2001 ...
##  $ class: chr  "Compact Cars" "Compact Cars" "Compact Cars" "Compact Cars" ...
##  $ trans: chr  "Automatic 4-spd" "Automatic 4-spd" "Automatic 4-spd" "Automatic (S5)" ...
##  $ drive: chr  "Front-Wheel" "Front-Wheel" "Front-Wheel" "Front-Wheel" ...
##  $ cyl  : int  6 6 6 6 6 6 6 6 6 6 ...
##  $ displ: num  3.2 3.2 3.2 3.2 3.2 3.2 3.2 3.2 3.2 3.2 ...
##  $ fuel : chr  "Premium" "Premium" "Premium" "Premium" ...
##  $ hwy  : int  22 22 22 27 27 26 27 25 27 27 ...
##  $ cty  : int  17 17 17 17 17 17 17 17 17 17 ...

Factors and Levels

For this analysis we are interested in three explanatory variables: drive of the car (drive), number of cylinders (cyl), and fuel type (fuel).

summary(as.factor(drive))
##   All-Wheel Front-Wheel  Rear-Wheel 
##        2781        4299        2130
summary(as.factor(cyl))
##    4    6    8 
## 3530 3748 1932
summary(as.factor(fuel))
## Premium Regular 
##    3020    6190

We will also be blocking on two variables: year and class because they are both known and controllable nuisance variables that are not of interest to us in the analysis. We will combine them into one blocking factor in the form of class.year.

summary(as.factor(year))
## 1995-2000 2000-2005 2005-2010 2010-2015 
##      1373      2496      2841      2500
summary(as.factor(class))
## Compact   Large Midsize     SUV 
##    2677     989    2225    3319
block <- paste(class, year, sep=".")

Response variables

The response variale of this analysis is city gas milage in MPG (cty).

summary(cty)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     9.0    15.0    17.0    18.2    20.0    53.0

2. (Experimental) Design

This is a completely randomized blocked designs with multiple factors. The purpose of the experiment is to monitor the effects of fuel type, car drive, and and number of cylinders on the response variable of city gas mileage. The blocking factor class/year is used to reduce noise in the fitted model.

3. (Statistical) Analysis

Exploratory Data Analysis Graphics

boxplot(cty~drive)

plot of chunk unnamed-chunk-6

boxplot(cty~cyl)

plot of chunk unnamed-chunk-6

boxplot(cty~fuel)

plot of chunk unnamed-chunk-6

boxplot(cty~year)

plot of chunk unnamed-chunk-6

boxplot(cty~class)

plot of chunk unnamed-chunk-6

interaction.plot(drive,cyl,cty)

plot of chunk unnamed-chunk-6

interaction.plot(drive,fuel,cty)

plot of chunk unnamed-chunk-6

interaction.plot(cyl,fuel,cty)

plot of chunk unnamed-chunk-6

All of the factors appear to have a significant difference in means accross levels and there do not seem to be any major interactions between the three factors.

Testing

model <- aov(cty~drive*fuel*cyl+block) #include interaction terms just for illustration
anova(model)
## Analysis of Variance Table
## 
## Response: cty
##                  Df Sum Sq Mean Sq F value  Pr(>F)    
## drive             2  52185   26092  3486.0 < 2e-16 ***
## fuel              1   1010    1010   135.0 < 2e-16 ***
## cyl               1  45243   45243  6044.6 < 2e-16 ***
## block            15  18803    1254   167.5 < 2e-16 ***
## drive:fuel        2    370     185    24.7 2.0e-11 ***
## drive:cyl         2   1105     553    73.8 < 2e-16 ***
## fuel:cyl          1    491     491    65.6 6.1e-16 ***
## drive:fuel:cyl    2   1162     581    77.6 < 2e-16 ***
## Residuals      9183  68734       7                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From the ANOVA results we see that each of our three factors are found to have a significant difference in means, thus we can reject the null hypothesis that randomization alone can account for the variation in city gas milage.

Drive of the car, fuel type, and number of cylinders are all explanatory factors in the variation of city gas milage.

Estimation (of Parameters)

TukeyHSD(aov(cty~as.factor(drive)))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = cty ~ as.factor(drive))
## 
## $`as.factor(drive)`
##                           diff     lwr     upr p adj
## Front-Wheel-All-Wheel   4.4818  4.2619  4.7018     0
## Rear-Wheel-All-Wheel   -0.6185 -0.8788 -0.3583     0
## Rear-Wheel-Front-Wheel -5.1004 -5.3399 -4.8609     0
aggregate(cty, by=list(drive), FUN=mean)
##       Group.1     x
## 1   All-Wheel 16.22
## 2 Front-Wheel 20.70
## 3  Rear-Wheel 15.60

Front-wheel drive gets the best gas milage (mean 20.70 MPG) while Rear-wheel and All-Wheel get worse, more similar gas milage (mean 15.60 and 16.22 MPG)

TukeyHSD(aov(cty~as.factor(fuel)))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = cty ~ as.factor(fuel))
## 
## $`as.factor(fuel)`
##                  diff   lwr   upr p adj
## Regular-Premium 2.334 2.142 2.525     0
aggregate(cty, by=list(fuel), FUN=mean)
##   Group.1     x
## 1 Premium 16.60
## 2 Regular 18.93

Surprisingly, regular gas gets better gas milage than premium gas by a difference of 2.33 MPG.

TukeyHSD(aov(cty~as.factor(cyl)))
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = cty ~ as.factor(cyl))
## 
## $`as.factor(cyl)`
##       diff    lwr    upr p adj
## 6-4 -5.254 -5.430 -5.078     0
## 8-4 -8.106 -8.318 -7.893     0
## 8-6 -2.852 -3.062 -2.641     0
aggregate(cty, by=list(cyl), FUN=mean)
##   Group.1     x
## 1       4 22.01
## 2       6 16.75
## 3       8 13.90

Number of cylinders has the biggest differences in means with 4 cylinder cars getting over 5 MPG more than 6 cylinder cars, which get almost 3 MPG more than 8 cylinder cars.

Diagnostics/Model Adequacy Checking

qqnorm(residuals(model)); qqline(residuals(model))

plot of chunk unnamed-chunk-11

Unfortunately, the residuals of the model are not normally distributed on the high tail because city gas milage is not normally distributed. This indicates that the model may not be the most appropriate fit.

4. References to the literature