This report explores the relationship between the fuel efficiency of automobiles and specific characteristics. After presenting an overview of a range of possible relationships, the report turns to a focused exploration of the relationship between transmission type and fuel efficiency. The report demonstrates that a model that estimates that relationship should also include the number of carburetors and the type of engine (V8 or straight).
You work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, they are interested in exploring the relationship between a set of variables and miles per gallon (MPG) (outcome). They are particularly interested in the following two questions:
“Is an automatic or manual transmission better for MPG”
“Quantify the MPG difference between automatic and manual transmissions”
First, let’s explore how mpg correlates with the other variables. All variables can be reviewed by typing ?mtcars at the prompt.
##         wt        cyl       disp         hp       carb       qsec
## -0.8676594 -0.8521620 -0.8475514 -0.7761684 -0.5509251  0.4186840
##       gear         am         vs       drat        mpg
##  0.4802848  0.5998324  0.6640389  0.6811719  1.0000000
##      qsec      gear      carb        am        vs      drat        hp
## 0.1752963 0.2306734 0.3035184 0.3597989 0.4409477 0.4639952 0.6024373
##      disp       cyl        wt       mpg
## 0.7183433 0.7261800 0.7528328 1.0000000
The strongest correlation is -0.87, with weight. The weakest is 0.42, with 1/4 mile time. The strongest positive correlation is 0.68, with “drat” (rear axle ratio). Using the coefficients of determination (squared correlation coefficients), the predictors can be ranked by strength. Overall, the top four “mpg” predictors are weight, cylinders, displacement, and horsepower (in decreasing order). The plots in Appendix 1 illustrate the relationships between “mpg” and these four predictors.
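The ranking described above could be reproduced along the following lines. This is a minimal sketch, not the report's original code: it assumes the built-in `mtcars` data set, and the object names (`cors`, `r2`, `ranking`) are illustrative.

```r
# Correlation of mpg with every variable in the built-in mtcars data set
cors <- cor(mtcars)[, "mpg"]
print(sort(cors))                 # signed correlations, ascending

# Squared correlations (coefficients of determination)
r2 <- cors^2
print(sort(r2))

# Rank the predictors by squared correlation, leaving out mpg itself
predictors <- r2[names(r2) != "mpg"]
ranking <- names(sort(predictors, decreasing = TRUE))
print(head(ranking, 4))           # the four strongest predictors
```

Squaring the correlations removes the sign, so negatively and positively correlated variables can be compared on one scale.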
Given this report’s focus, it may make sense to combine “am” with weight in a model that includes at least one more variable. Before doing that, let’s look at the relationship between “mpg” and weight in more detail.
As illustrated in the plots from Appendix 2, the linear model reflects the pattern of the relationship between “mpg” and weight. Its R-squared is 0.75 (adjusted: 0.74), which indicates that the model fits the data well. At the same time, the residual plot highlights that for some car models the discrepancy between actual “mpg” and “mpg” estimated from weight can be close to 7 miles per gallon.
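A sketch of how the weight-only model and its residuals could be inspected follows. It assumes the built-in `mtcars` data; the name `fit_wt` is illustrative and does not come from the original analysis.

```r
# Simple linear regression of mpg on weight (wt is in 1000 lbs)
fit_wt <- lm(mpg ~ wt, data = mtcars)

# Goodness of fit: multiple and adjusted R-squared
wt_r2     <- summary(fit_wt)$r.squared
wt_adj_r2 <- summary(fit_wt)$adj.r.squared
print(round(c(r.squared = wt_r2, adj.r.squared = wt_adj_r2), 4))

# Residuals: actual mpg minus mpg estimated from weight
wt_resid <- resid(fit_wt)
print(range(wt_resid))                      # smallest and largest residual
print(names(wt_resid)[which.max(wt_resid)]) # car with the largest underestimate
```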
Even though the transmission type is not among the highest correlating variables for mpg, its correlation coefficient of 0.6 suggests a moderate correlation and still warrants an exploration of the relationship between “am” and “mpg”. Let’s start by plotting the two variables.
The simple linear model using “mpg” as the dependent and “am” as the independent variable suggests that both the intercept and the coefficient are significant. At the same time, the R-squared of 0.36 suggests a limited goodness of fit for the model. The box plot from Appendix 3 illustrates this by showing that there is some overlap between “mpg” values for manual and automatic transmissions. Again, developing a multivariate model is warranted.
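The transmission-only model can be sketched as follows. The factor labels "Automatic" and "Manual" follow the `amManual` term visible in the regression output; the object names (`cars`, `fit_am`, `mean_diff`) are assumptions made for this example.

```r
# Recode am (0 = automatic, 1 = manual in mtcars) as a labelled factor
cars <- transform(
  mtcars,
  am = factor(am, levels = c(0, 1), labels = c("Automatic", "Manual"))
)

# Regress mpg on transmission type alone
fit_am <- lm(mpg ~ am, data = cars)
print(coef(fit_am))                  # intercept and amManual
print(summary(fit_am)$r.squared)     # share of mpg variance explained

# With a single two-level factor, the intercept is the mean mpg of the
# reference group (automatic) and amManual is the difference in group means
mean_diff <- mean(cars$mpg[cars$am == "Manual"]) -
  mean(cars$mpg[cars$am == "Automatic"])
print(mean_diff)
```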
However, a linear model that incorporates weight and “am” is dominated by weight. The coefficient for “am” is not significant (p = 0.99), which means that an alternative approach is needed. Let’s try a model that centers on “am” but adds the variable that is least strongly correlated with “am”: “carb” (number of carburetors). Both coefficients and the intercept are significant, and the model’s R-squared is about 0.70 (see Appendix 4).
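The two candidate models in this step can be compared with a short sketch like the one below. It assumes the raw 0/1 coding of `am` in `mtcars`, so the transmission term is labelled `am` rather than `amManual`; all object names are illustrative.

```r
# Model 1: weight plus transmission type
fit_wt_am <- lm(mpg ~ wt + am, data = mtcars)
p_wt_am <- summary(fit_wt_am)$coefficients[, "Pr(>|t|)"]
print(round(p_wt_am, 4))          # the p-value for am is far above 0.05

# Which variables are least correlated with am? (mpg and am excluded)
am_cors <- cor(mtcars)[, "am"]
am_cors <- am_cors[!names(am_cors) %in% c("am", "mpg")]
am_order <- names(sort(abs(am_cors)))
print(am_order)

# Model 2: transmission type plus number of carburetors
fit_am_carb <- lm(mpg ~ am + carb, data = mtcars)
p_am_carb <- summary(fit_am_carb)$coefficients[, "Pr(>|t|)"]
print(signif(p_am_carb, 3))
print(summary(fit_am_carb)$r.squared)
```

Picking a covariate that is only weakly correlated with “am” reduces collinearity, which is one plausible reason the “am” coefficient stays significant in the second model.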
Adding the variable with the next-weakest correlation with “am”, “vs” (V-shaped or straight engine), also produces a strong linear model with significant coefficients (R-squared of 0.78).
However, including the next candidate variable, “qsec” (1/4 mile time), renders the coefficients for “vs” and “qsec” not significant.
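The effect of adding “qsec” can be checked with a sketch along these lines. The report relies on the coefficient p-values; the `anova()` comparison at the end is an additional check that is not part of the original analysis. Object names are illustrative, and `am` is used in its raw 0/1 coding.

```r
# Three-predictor model, then the same model with qsec added
fit3 <- lm(mpg ~ am + carb + vs, data = mtcars)
fit4 <- update(fit3, . ~ . + qsec)

# Coefficient p-values for each model
p3 <- summary(fit3)$coefficients[, "Pr(>|t|)"]
p4 <- summary(fit4)$coefficients[, "Pr(>|t|)"]
print(signif(p3, 3))
print(signif(p4, 3))

# Extra check (not in the report): F-test for the nested models
cmp <- anova(fit3, fit4)
print(cmp)
```

Because only one term is added, the F-test p-value equals the p-value of the “qsec” coefficient in the larger model.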
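Overall, a manual transmission has a positive effect on fuel efficiency: holding the number of carburetors and the engine type constant, a manual car is estimated to get about 6.8 more miles per gallon than an automatic one. The formula for the model is as follows: MPG = 19.5 + 6.8*am - 1.4*carb + 4.2*vs. The intercept of 19.5 applies when “am” is coded 0 for automatic and 1 for manual; it equals the intercept shown in the regression output (12.72) plus the “am” coefficient (6.80), which is consistent with “am” having been coded 1 and 2 in that fit.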
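To make the formula concrete, the sketch below refits the model with `am` in its raw 0/1 coding and evaluates it by hand for a hypothetical car (manual transmission, four carburetors, straight engine). The car and the object names are assumptions made for illustration only.

```r
# Refit the final model; am and vs are 0/1 in mtcars
fit3 <- lm(mpg ~ am + carb + vs, data = mtcars)
b <- coef(fit3)
print(round(b, 1))   # rounded coefficients, as used in the formula

# Hypothetical car: manual (am = 1), 4 carburetors, straight engine (vs = 1)
mpg_hat <- b[["(Intercept)"]] + b[["am"]] * 1 + b[["carb"]] * 4 + b[["vs"]] * 1
print(mpg_hat)

# The same estimate via predict(), as a cross-check
mpg_pred <- predict(fit3, newdata = data.frame(am = 1, carb = 4, vs = 1))
print(unname(mpg_pred))
```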
The model can be used to estimate the confidence intervals for any vehicle (see Appendix 5). Let’s look at a fictional case, Vehicle Zen, with the most common values for each variable. These values can be calculated using the median() function taking into account that for dichotomous variables that value will be the mode (the most common value in the data).
Based on our prediction, Vehicle Zen’s estimated fuel efficiency is 16.7 mpg, but can be as high as 18.7 mpg and as low as 14.6 mpg.
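A prediction of this kind could be computed as sketched below. The sketch assumes `am` and `vs` in their raw 0/1 coding and a 95% confidence interval for the mean response; the names `zen` and `zen_ci` are illustrative.

```r
# Final model with am and vs as 0/1 indicators
fit3 <- lm(mpg ~ am + carb + vs, data = mtcars)

# "Vehicle Zen": median value of each predictor; for the 0/1 variables
# the median coincides with the most common value in this data set
zen <- data.frame(
  am   = median(mtcars$am),
  carb = median(mtcars$carb),
  vs   = median(mtcars$vs)
)
print(zen)

# Point estimate with a confidence interval for the mean mpg of such cars
zen_ci <- predict(fit3, newdata = zen, interval = "confidence", level = 0.95)
print(round(zen_ci, 1))
```

Using `interval = "prediction"` instead would give a wider interval for a single new car rather than for the average car with these characteristics.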
Therefore, the final recommendation would be to estimate fuel efficiency based on a model that combines transmission type, number of carburetors and engine type.
All four relationships can be plotted using the function multiplot:
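`multiplot` is not part of base R; it is presumably a helper for arranging several plots on one page. As an alternative under that assumption, the sketch below draws the same four relationships with base graphics only. The function name `plot_top_predictors` is hypothetical.

```r
# Draw mpg against each of the four strongest predictors in a 2 x 2 grid
plot_top_predictors <- function(data = mtcars,
                                vars = c("wt", "cyl", "disp", "hp")) {
  old <- par(mfrow = c(2, 2))   # split the device into four panels
  on.exit(par(old))             # restore the previous layout afterwards
  for (v in vars) {
    plot(data[[v]], data$mpg,
         xlab = v, ylab = "mpg", main = paste("mpg vs", v))
    # Dashed least-squares line for the single predictor
    abline(lm(reformulate(v, response = "mpg"), data = data), lty = 2)
  }
  invisible(vars)
}

plotted <- plot_top_predictors()
```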
##
## Call:
## lm(formula = y ~ x)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.5432 -2.3647 -0.1252  1.4096  6.8727
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## x            -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
## [1] 7.238654e-14
## (Intercept)    amManual
##   17.147368    7.244939
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.3923 -3.0923 -0.2974  3.2439  9.5077
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
## (Intercept)          wt    amManual
## 37.32155131 -5.35281145 -0.02361522
##
## Call:
## lm(formula = mpg ~ wt + am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.5295 -2.3619 -0.1317  1.4025  6.8782
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.32155    3.05464  12.218 5.84e-13 ***
## wt          -5.35281    0.78824  -6.791 1.87e-07 ***
## amManual    -0.02362    1.54565  -0.015    0.988
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.098 on 29 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7358
## F-statistic: 44.17 on 2 and 29 DF, p-value: 1.579e-09
## (Intercept)    amManual        carb
##   23.145836    7.653119   -2.191748
##
## Call:
## lm(formula = mpg ~ am + carb, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.2320 -1.7415 -0.0706  2.3939  5.6377
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  23.1458     1.2941  17.885  < 2e-16 ***
## amManual      7.6531     1.2230   6.258 7.87e-07 ***
## carb         -2.1917     0.3778  -5.801 2.75e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.392 on 29 degrees of freedom
## Multiple R-squared: 0.7037, Adjusted R-squared: 0.6832
## F-statistic: 34.43 on 2 and 29 DF, p-value: 2.191e-08
## (Intercept)          am        carb          vs
##   12.719443    6.797956   -1.430783    4.195736
##
## Call:
## lm(formula = mpg ~ am + carb + vs, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.2803 -1.2308  0.4078  2.0519  4.8197
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  12.7194     1.9991   6.363 6.94e-07 ***
## am            6.7980     1.1015   6.172 1.15e-06 ***
## carb         -1.4308     0.4081  -3.506  0.00155 **
## vs            4.1957     1.3246   3.168  0.00370 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.962 on 28 degrees of freedom
## Multiple R-squared: 0.7818, Adjusted R-squared: 0.7585
## F-statistic: 33.45 on 3 and 28 DF, p-value: 2.138e-09
## (Intercept)          am        carb          vs        qsec
##   4.1748928   7.3094433  -1.2984840   3.1842796   0.4423666
##
## Call:
## lm(formula = mpg ~ am + carb + vs + qsec, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -6.2091 -0.9726  0.1641  2.0150  4.4173
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   4.1749    11.3666   0.367  0.71626
## am            7.3094     1.2962   5.639  5.5e-06 ***
## carb         -1.2985     0.4462  -2.910  0.00715 **
## vs            3.1843     1.8801   1.694  0.10183
## qsec          0.4424     0.5792   0.764  0.45160
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.984 on 27 degrees of freedom
## Multiple R-squared: 0.7865, Adjusted R-squared: 0.7548
## F-statistic: 24.86 on 4 and 27 DF, p-value: 1.031e-08
## [1] 1
## [1] 2
## [1] 0
##        fit      lwr      upr
## 1 9.857876 6.485195 13.23056