Executive Summary

This report explores the relationship between the fuel efficiency of automobiles and specific characteristics. After presenting an overview of a range of possible relationships, the report turns to a focused exploration of the relationship between transmission type and fuel efficiency. The report demonstrates that a model that estimates that relationship should also include the number of carburetors and the type of engine (V-shaped or straight).

Introduction

You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, the magazine is interested in exploring the relationship between a set of variables and miles per gallon (MPG), the outcome. It is particularly interested in the following two questions:

“Is an automatic or manual transmission better for MPG?”

“Quantify the MPG difference between automatic and manual transmissions”

Correlation Analysis

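First, let’s explore how mpg correlates with the other variables. All variables can be reviewed by typing ?mtcars at the prompt. The first listing below shows the correlation coefficients and the second shows their squares, both sorted in increasing order.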

##         wt        cyl       disp         hp       carb       qsec 
## -0.8676594 -0.8521620 -0.8475514 -0.7761684 -0.5509251  0.4186840 
##       gear         am         vs       drat        mpg 
##  0.4802848  0.5998324  0.6640389  0.6811719  1.0000000
##      qsec      gear      carb        am        vs      drat        hp 
## 0.1752963 0.2306734 0.3035184 0.3597989 0.4409477 0.4639952 0.6024373 
##      disp       cyl        wt       mpg 
## 0.7183433 0.7261800 0.7528328 1.0000000

The strongest correlation is -0.87 with weight. The weakest correlation is 0.42 with 1/4 mile time. The strongest positive correlation is 0.68 with “drat” (rear axle ratio). Using the coefficients of determination (squared correlation coefficients), the predictors can be ranked by strength. Overall, the top four “mpg” predictors are weight, cylinders, displacement, and horsepower (in decreasing order). The plots in Appendix 1 illustrate the relationships between “mpg” and the top 4 predictors.
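The two listings above can be reproduced along the following lines. This is a minimal sketch that assumes the stock mtcars data set with all columns numeric; the object names (mpg_cor, mpg_r2, top4) are illustrative and not taken from the original analysis.

```r
# Correlation of every mtcars variable with mpg, sorted from most negative
# to most positive (mpg itself appears last with a correlation of 1).
data(mtcars)
mpg_cor <- sort(cor(mtcars)[, "mpg"])
print(mpg_cor)

# Squared correlations (coefficients of determination), sorted ascending.
mpg_r2 <- sort(mpg_cor^2)
print(mpg_r2)

# The four strongest predictors by squared correlation, excluding mpg itself.
predictors_r2 <- mpg_r2[names(mpg_r2) != "mpg"]
top4 <- names(sort(predictors_r2, decreasing = TRUE))[1:4]
print(top4)
```

Ranking by the squared coefficient puts negative and positive correlations on the same scale, which is why weight comes out ahead of “drat” even though its correlation is negative.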

Given this report’s focus, it may make sense to combine “am” with weight in a model that includes at least one more variable. Before doing that, let’s look at the relationship between “mpg” and weight in more detail.

Exploring the Relationship Between MPG and Weight

As illustrated in the plots from Appendix 2, the linear model reflects the pattern of the relationship between “mpg” and weight. Its R-squared is 0.75 (adjusted 0.74), which indicates that the model fits the data well. At the same time, the residuals highlight that for some car models the discrepancy between actual “mpg” and “mpg” estimated based on weight can be close to 7 miles per gallon (the largest residual in Appendix 2 is 6.87).
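A minimal sketch of how these figures can be obtained, assuming the stock mtcars data set. Appendix 2 fits the model as y ~ x; the sketch uses the column names directly, and the object names are illustrative.

```r
# Simple linear regression of fuel efficiency on weight (wt, in 1000 lbs).
data(mtcars)
fit_wt <- lm(mpg ~ wt, data = mtcars)

# Goodness of fit: multiple and adjusted R-squared.
r2 <- summary(fit_wt)$r.squared
adj_r2 <- summary(fit_wt)$adj.r.squared
print(round(c(r2 = r2, adj_r2 = adj_r2), 4))

# Largest absolute gap between observed and fitted mpg, and the car it belongs to.
largest_miss <- max(abs(resid(fit_wt)))
worst_car <- names(which.max(abs(resid(fit_wt))))
print(largest_miss)
print(worst_car)
```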

Exploring the Relationship Between MPG and Transmission Type

Even though the transmission type is not among the highest correlating variables for mpg, its correlation coefficient of 0.6 suggests a moderate correlation and still warrants an exploration of the relationship between “am” and “mpg”. Let’s start by plotting the two variables.

The simple linear model using “mpg” as the dependent and “am” as the independent variable suggests that both the intercept and the coefficient are significant: the estimated mean is 17.1 mpg for automatic cars and 7.2 mpg higher for manual cars. At the same time, the R-squared of 0.36 suggests a limited goodness of fit for the model. The box plot from Appendix 3 illustrates that by showing that there is some overlap between “mpg” values for manual and automatic transmission. Again, developing a multivariate model is warranted.
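The single-predictor model and the box plot can be sketched as follows. The factor labels "Automatic" and "Manual" are an assumption inferred from the coefficient name amManual in Appendix 3; the data frame name mt is illustrative.

```r
# Recode am (0 = automatic, 1 = manual in the stock mtcars data) as a labelled factor.
data(mtcars)
mt <- transform(mtcars,
                am = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual")))

# Regress mpg on transmission type alone.
fit_am <- lm(mpg ~ am, data = mt)
print(coef(fit_am))
r2_am <- summary(fit_am)$r.squared
print(r2_am)

# With one binary predictor, the intercept is the automatic group mean and
# the slope is the difference between the two group means.
group_means <- tapply(mt$mpg, mt$am, mean)
print(group_means)

# Box plot of mpg by transmission type, showing the overlap between groups.
boxplot(mpg ~ am, data = mt,
        xlab = "Transmission type", ylab = "Miles per gallon")
```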

Estimating MPG Based on Transmission Type And Weight

A linear model that incorporates both weight and “am” is dominated by weight: the coefficient for “am” is not significant (p = 0.988), which means that an alternative approach should be implemented. Let’s try a model that centers around “am”, but adds the variable that is the least strongly correlated with “am”, “carb” (number of carburetors). Both coefficients and the intercept are significant, and the model has an R-squared of about 0.70 (see Appendix 4).

Adding “vs” (V-shaped or straight engine), the variable with the next weakest correlation with “am”, also produces a strong linear model with significant coefficients (R-squared of 0.78).

However, including the next candidate variable, “qsec” (1/4 mile time), renders the coefficients for “vs” and “qsec” not significant (p = 0.10 and p = 0.45), so “qsec” is left out of the model.
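The selection procedure described above can be sketched as follows. This assumes the stock mtcars data set with “am” coded 0/1; the helper p_values() and the other object names are hypothetical and only serve the illustration.

```r
# Rank the remaining variables by the strength of their correlation with am,
# weakest first (am itself and the outcome mpg are dropped from the list).
data(mtcars)
am_cor <- sort(abs(cor(mtcars)[, "am"]))
candidates <- setdiff(names(am_cor), c("am", "mpg"))
print(head(candidates, 3))

# Add the candidates one at a time, starting from the transmission type.
fit2 <- lm(mpg ~ am + carb, data = mtcars)
fit3 <- lm(mpg ~ am + carb + vs, data = mtcars)
fit4 <- lm(mpg ~ am + carb + vs + qsec, data = mtcars)

# Hypothetical helper: p-values of all coefficients in a fitted model.
p_values <- function(fit) summary(fit)$coefficients[, "Pr(>|t|)"]

print(round(p_values(fit2), 4))
print(round(p_values(fit3), 4))
print(round(p_values(fit4), 4))  # vs and qsec are no longer significant here
```

Stopping at the last model in which every coefficient is significant is the rule this report follows; other criteria, such as a nested-model F test with anova(), could lead to a different choice.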

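Overall, a manual transmission has a positive effect on fuel efficiency: for the same number of carburetors and the same engine type, the model estimates about 6.8 mpg more for a manual car. With “am” coded as 0 for automatic and 1 for manual, the formula for the model is as follows: MPG = 19.5 + 6.8 * am - 1.4 * carb + 4.2 * vs. The intercept printed in Appendix 4 is 12.7 rather than 19.5; the difference equals the “am” coefficient (12.7 + 6.8 = 19.5), which is consistent with “am” having been coded as 1 and 2 in that run.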

The model can be used to estimate a confidence interval for the expected MPG of any vehicle (see Appendix 5). Let’s look at a fictional case, Vehicle Zen, with typical values for each variable. These values can be calculated using the median() function; for a dichotomous variable, the median is the more common of its two values, that is, the mode.

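Based on our prediction, Vehicle Zen’s estimated fuel efficiency is 16.7 mpg, but can be as high as 18.7 mpg and as low as 14.6 mpg. The output in Appendix 5 reports a lower fit (9.9 mpg), which corresponds to the intercept of 12.7 printed in Appendix 4 rather than the 19.5 used in the formula.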
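A minimal sketch of the Vehicle Zen estimate, assuming the stock mtcars data set (where “am” is 0 for automatic and 1 for manual) and the default 95% level of predict(). The object names zen and zen_ci are illustrative.

```r
# Fit the final model on the stock data (am coded 0/1).
data(mtcars)
fit3 <- lm(mpg ~ am + carb + vs, data = mtcars)

# Vehicle Zen: the median of each predictor. For the 0/1 variables the
# median is the more common of the two values.
zen <- data.frame(am   = median(mtcars$am),
                  carb = median(mtcars$carb),
                  vs   = median(mtcars$vs))
print(zen)

# Point estimate with a confidence interval for the expected mpg.
zen_ci <- predict(fit3, newdata = zen, interval = "confidence", level = 0.95)
print(zen_ci)
```

Note that interval = "confidence" describes the uncertainty of the average MPG for cars like Vehicle Zen; interval = "prediction" would give a wider range for a single car.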

Therefore, the final recommendation would be to estimate fuel efficiency based on a model that combines transmission type, number of carburetors and engine type.

Appendix 1: Plotting the Relationships Between MPG And Its Potential Predictors

All four relationships can be plotted in a single figure using the function multiplot.
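The multiplot function is not part of base R, so the sketch below uses base graphics with par(mfrow = ...) instead. It is an alternative under that assumption, not the code that produced the original figure, and the axis labels are illustrative.

```r
# Scatter plots of mpg against its four strongest predictors in a 2 x 2 grid.
data(mtcars)
top4 <- c(wt   = "Weight (1000 lbs)",
          cyl  = "Number of cylinders",
          disp = "Displacement (cu. in.)",
          hp   = "Gross horsepower")

old_par <- par(mfrow = c(2, 2))  # remember the previous layout
for (v in names(top4)) {
  plot(mtcars[[v]], mtcars$mpg,
       xlab = top4[[v]], ylab = "Miles per gallon",
       main = paste("mpg vs", v))
  # Overlay the least-squares line for this single predictor.
  abline(lm(mtcars$mpg ~ mtcars[[v]]), col = "red")
}
par(old_par)  # restore the previous layout
```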

Appendix 2: The Relationship Between MPG And Weight

## 
## Call:
## lm(formula = y ~ x)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## x            -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
## [1] 7.238654e-14

Appendix 3: Plotting the Relationship Between MPG and Transmission Type

## (Intercept)    amManual 
##   17.147368    7.244939
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

Appendix 4: Models for Estimating MPG

## (Intercept)          wt    amManual 
## 37.32155131 -5.35281145 -0.02361522
## 
## Call:
## lm(formula = mpg ~ wt + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5295 -2.3619 -0.1317  1.4025  6.8782 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.32155    3.05464  12.218 5.84e-13 ***
## wt          -5.35281    0.78824  -6.791 1.87e-07 ***
## amManual    -0.02362    1.54565  -0.015    0.988    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.098 on 29 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7358 
## F-statistic: 44.17 on 2 and 29 DF,  p-value: 1.579e-09
## (Intercept)    amManual        carb 
##   23.145836    7.653119   -2.191748
## 
## Call:
## lm(formula = mpg ~ am + carb, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2320 -1.7415 -0.0706  2.3939  5.6377 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  23.1458     1.2941  17.885  < 2e-16 ***
## amManual      7.6531     1.2230   6.258 7.87e-07 ***
## carb         -2.1917     0.3778  -5.801 2.75e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.392 on 29 degrees of freedom
## Multiple R-squared:  0.7037, Adjusted R-squared:  0.6832 
## F-statistic: 34.43 on 2 and 29 DF,  p-value: 2.191e-08
## (Intercept)          am        carb          vs 
##   12.719443    6.797956   -1.430783    4.195736
## 
## Call:
## lm(formula = mpg ~ am + carb + vs, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2803 -1.2308  0.4078  2.0519  4.8197 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.7194     1.9991   6.363 6.94e-07 ***
## am            6.7980     1.1015   6.172 1.15e-06 ***
## carb         -1.4308     0.4081  -3.506  0.00155 ** 
## vs            4.1957     1.3246   3.168  0.00370 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.962 on 28 degrees of freedom
## Multiple R-squared:  0.7818, Adjusted R-squared:  0.7585 
## F-statistic: 33.45 on 3 and 28 DF,  p-value: 2.138e-09
## (Intercept)          am        carb          vs        qsec 
##   4.1748928   7.3094433  -1.2984840   3.1842796   0.4423666
## 
## Call:
## lm(formula = mpg ~ am + carb + vs + qsec, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.2091 -0.9726  0.1641  2.0150  4.4173 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.1749    11.3666   0.367  0.71626    
## am            7.3094     1.2962   5.639  5.5e-06 ***
## carb         -1.2985     0.4462  -2.910  0.00715 ** 
## vs            3.1843     1.8801   1.694  0.10183    
## qsec          0.4424     0.5792   0.764  0.45160    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.984 on 27 degrees of freedom
## Multiple R-squared:  0.7865, Adjusted R-squared:  0.7548 
## F-statistic: 24.86 on 4 and 27 DF,  p-value: 1.031e-08

Appendix 5: Estimating MPG values

## [1] 1
## [1] 2
## [1] 0
##        fit      lwr      upr
## 1 9.857876 6.485195 13.23056