Summary

This report examines the mtcars data set and explores the relationship between miles per gallon (MPG) and transmission type. Specifically, it addresses two questions: 1) is an automatic or manual transmission better for MPG; and 2) how large is the MPG difference between automatic and manual transmissions?

Results indicate that, on average, vehicles with automatic transmissions have significantly lower fuel mileage than vehicles with manual transmissions (17.1 vs. 24.4 MPG). Regression analysis shows that the MPG of a vehicle can be predicted from its weight, horsepower, number of cylinders and transmission type. Under the best-fit regression model, manual transmission vehicles get an estimated 1.8 MPG more than automatic transmission vehicles with the other variables held constant, although this adjusted difference is not statistically significant (p = 0.21).

Exploratory Data Analysis

1. Format Data

mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
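
Because factor() assigns labels in the sorted order of the underlying values, it is worth confirming that 0 maps to Automatic and 1 to Manual. The following is a minimal sketch on a fresh copy of the data; the names cars, am_labelled and label_check are illustrative and not part of the original analysis.

```r
# Work on a fresh copy so the check is independent of the conversions above.
cars <- datasets::mtcars

# Cross-tabulate the raw 0/1 coding against the labelled factor.
am_labelled <- factor(cars$am, labels = c("Automatic", "Manual"))
label_check <- table(raw = cars$am, labelled = am_labelled)
label_check

# Off-diagonal cells should be zero if 0 -> Automatic and 1 -> Manual.
stopifnot(label_check["0", "Manual"] == 0,
          label_check["1", "Automatic"] == 0)
```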

2. View Data

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
##  $ am  : Factor w/ 2 levels "Automatic","Manual": 2 2 2 1 1 1 1 1 1 1 ...
##  $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
##  $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...

3. Plot Data (MPG vs. Transmission Type)

boxplot(mpg ~ am, data = mtcars, col = c("green", "orange"), 
        ylab = "Miles per Gallon", xlab = "Transmission Type", 
        main = "MPG vs. Transmission Type")

4. Inference: Automatic vs. Manual Transmission and MPG

aggregate(mpg~am, data = mtcars, mean)
##          am      mpg
## 1 Automatic 17.14737
## 2    Manual 24.39231
auto <- mtcars[mtcars$am == "Automatic",]
man <- mtcars[mtcars$am == "Manual",]
t.test(auto$mpg, man$mpg)
## 
##  Welch Two Sample t-test
## 
## data:  auto$mpg and man$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean of x mean of y 
##  17.14737  24.39231

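For the given data set, manual transmission vehicles had a mean MPG of 24.4 and automatic transmission vehicles a mean MPG of 17.1, a difference of about 7.2 MPG. The difference is statistically significant (Welch two-sample t-test, p = 0.0014; 95% confidence interval for the manual advantage: 3.2 to 11.3 MPG).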
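
The same comparison can be written with the formula interface of t.test, which avoids subsetting the data by hand and makes the size of the difference easy to extract. This is a sketch only; the names cars, welch, group_means and mean_diff are illustrative.

```r
# Fresh copy of the built-in data so the sketch does not depend on earlier steps.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Welch two-sample t-test via the formula interface
# (the same test as t.test(auto$mpg, man$mpg)).
welch <- t.test(mpg ~ am, data = cars)

# Difference in group means (Manual minus Automatic), in miles per gallon.
group_means <- tapply(cars$mpg, cars$am, mean)
mean_diff <- unname(group_means["Manual"] - group_means["Automatic"])

mean_diff        # size of the raw difference in MPG
welch$conf.int   # 95% interval for Automatic minus Manual
welch$p.value
```

The interval is reported for Automatic minus Manual, so both bounds are negative when manual vehicles have the higher mean.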

Regression Analysis

base_model <- lm(mpg~am, data = mtcars)
summary(base_model)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

Our simple “base_model” regression indicates that transmission type alone explains 36% of the variation in MPG, leaving 64% unexplained. This suggests that other variables need to be accounted for using multiple linear regression. Examination of the pairs plot supports this conclusion, with MPG visibly correlated with several other variables (see Appendix).
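
With a single two-level factor as the predictor, the intercept of the regression is the mean of the reference group and the slope is the difference between the group means. The sketch below checks this and extracts R-squared; the names cars, fit, group_means and r2 are illustrative.

```r
# Fresh copy of the data with the same transmission labels as above.
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ am, data = cars)
group_means <- tapply(cars$mpg, cars$am, mean)

# Intercept = mean MPG of the reference level (Automatic).
all.equal(coef(fit)[["(Intercept)"]], group_means[["Automatic"]])

# Slope = mean(Manual) - mean(Automatic).
all.equal(coef(fit)[["amManual"]],
          group_means[["Manual"]] - group_means[["Automatic"]])

# Share of the variation in MPG explained by transmission type alone.
r2 <- summary(fit)$r.squared
r2
```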

initial_model <- lm(mpg~., data = mtcars)
best_model <- step(initial_model, direction = "both")
summary(best_model)
## 
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9387 -1.2560 -0.4013  1.1253  5.0513 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 33.70832    2.60489  12.940 7.73e-13 ***
## cyl6        -3.03134    1.40728  -2.154  0.04068 *  
## cyl8        -2.16368    2.28425  -0.947  0.35225    
## hp          -0.03211    0.01369  -2.345  0.02693 *  
## wt          -2.49683    0.88559  -2.819  0.00908 ** 
## amManual     1.80921    1.39630   1.296  0.20646    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared:  0.8659, Adjusted R-squared:  0.8401 
## F-statistic: 33.57 on 5 and 26 DF,  p-value: 1.506e-10
anova(base_model, best_model)
## Analysis of Variance Table
## 
## Model 1: mpg ~ am
## Model 2: mpg ~ cyl + hp + wt + am
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     30 720.90                                  
## 2     26 151.03  4    569.87 24.527 1.688e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Using stepwise linear regression (AIC-based selection with step), our best regression model explains 87% of the variation in MPG (adjusted R-squared 0.84) using the variables cylinders, horsepower, weight and transmission type. Examination of the model diagnostics (see Residuals plot in the Appendix) indicates no discernible pattern in the Residuals vs. Fitted plot and that the residuals are approximately normally distributed (Normal Q-Q plot, Appendix) with roughly constant variance (Scale-Location plot, Appendix). Analysis of variance of our simple “base” and multivariable “best” models indicates that the reduction in residual variation is statistically significant (p = 1.7e-08), which supports preferring the larger model because it explains more of the variation in MPG.
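
The visual check of the Normal Q-Q plot can be complemented with a formal test on the residuals. The sketch below uses the Shapiro-Wilk test, which is an addition and not part of the original analysis; the names cars, fit and normality are illustrative, and the model formula is the one selected by step above.

```r
# Fresh copy of the data with the factor conversions the selected model needs.
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Refit the model chosen by the stepwise procedure.
fit <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# Shapiro-Wilk test on the residuals: a small p-value would indicate
# a departure from normality.
normality <- shapiro.test(residuals(fit))
normality$p.value

# R-squared of the refitted model, for comparison with the summary output.
summary(fit)$r.squared
```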
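
1. Summary

Under the best-fit model, manual transmission vehicles get an estimated 1.8 MPG more than automatic transmission vehicles when cylinders, horsepower and weight are held constant. This adjusted difference is not statistically significant (p = 0.21), so most of the 7.2 MPG gap in the raw means is accounted for by the other variables.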
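
The size and precision of the transmission effect in the best-fit model can be read from the amManual coefficient and its confidence interval. The sketch below refits the selected model on a fresh copy of the data; the names cars, fit and am_ci are illustrative. Given the p-value of 0.206 reported for amManual above, the 95% interval is expected to include zero.

```r
# Fresh copy of the data with the factor conversions the selected model needs.
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

# Refit the model chosen by the stepwise procedure.
fit <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# Estimated MPG difference (Manual minus Automatic) with cylinders,
# horsepower and weight held constant.
coef(fit)[["amManual"]]

# 95% confidence interval for that difference, in MPG.
am_ci <- confint(fit, "amManual", level = 0.95)
am_ci
```

The coefficient is an additive difference in miles per gallon, not a ratio of fuel efficiencies.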

Appendix

pairs(mpg~., data = mtcars)

par(mfrow = c(2,2))
plot(best_model)