Executive Summary

Fuel economy is arguably the most important criterion for car owners. The purpose of this project is to use linear regression to analyze the mtcars dataset in order to understand the relationship between a set of predictor variables (regressors) and miles per gallon (the response). A brief exploratory data analysis is followed by model selection, coefficient interpretation, and an analysis of residuals and diagnostics. The analysis concludes by answering two questions: “Is an automatic or manual transmission better for MPG?” and “What is the quantitative difference in MPG between automatic and manual transmissions?”

Exploratory Data Analysis

The mtcars dataset, available in the default R environment, is a data frame of 32 observations on 11 variables; its documentation can be viewed via ?mtcars. The 11 variables are: mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb. Since the analysis looks at MPG differences between the two transmission types (am = 0 for automatic, am = 1 for manual), it makes sense to split the dataset into two subsets, one for automatic and one for manual transmission.

split.data <- split(mtcars, as.factor(mtcars$am))
auto.trans <- split.data$`0`
manual.trans <- split.data$`1`

MPG ranges from 10.4 to 24.4 for cars with automatic transmission and from 15 to 33.9 for cars with manual transmission, so at first glance manual transmission appears to offer better fuel economy. The scatterplots in Figure 1 show a downward trend in MPG against weight for both automatic and manual transmissions. For automatic transmission, the least-squares line appears to have a shallower slope, but the loess curve is more skewed. Applying linear regression techniques will help confirm or refute this preliminary interpretation.
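The group ranges quoted above can be reproduced directly from mtcars. The following is a minimal sketch; the names mpg.range and mpg.mean are illustrative and not part of the original analysis, and the group means are included only for additional context.

```r
# mtcars ships with base R (datasets package); am: 0 = automatic, 1 = manual.
# mpg.range and mpg.mean are illustrative names, not from the original report.
mpg.by.am <- split(mtcars$mpg, mtcars$am)   # list with elements "0" and "1"

mpg.range <- lapply(mpg.by.am, range)       # min and max MPG per group
mpg.mean  <- sapply(mpg.by.am, mean)        # mean MPG per group

print(mpg.range)
print(round(mpg.mean, 2))
```

A raw comparison of ranges or means ignores every other variable, which is why the sections that follow condition on weight.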

Model Selection and Fitting

Choosing a model depends heavily on selecting the most meaningful variables, so as to avoid over- or under-fitting. As a first step, simple linear regression is applied using wt (weight, in units of 1,000 lb) as the regressor, since vehicle weight is a key contributor to fuel consumption: lm(mpg ~ wt, data = auto.trans) and lm(mpg ~ wt, data = manual.trans). These formulae yield a model of the form \(Y = \beta_0 + \beta_1X\).
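As a sketch of this first step, the two simple regressions can be fit as follows. The subsets are rebuilt here so the block runs on its own; fit.auto and fit.manual are illustrative names that do not appear in the original report.

```r
# Rebuild the two subsets (am: 0 = automatic, 1 = manual)
auto.trans   <- mtcars[mtcars$am == 0, ]
manual.trans <- mtcars[mtcars$am == 1, ]

# Simple linear regression of mpg on weight (wt is in units of 1,000 lb)
fit.auto   <- lm(mpg ~ wt, data = auto.trans)
fit.manual <- lm(mpg ~ wt, data = manual.trans)

# Intercept (beta_0) and slope (beta_1) for each transmission type
print(round(coef(fit.auto), 3))
print(round(coef(fit.manual), 3))
```

The printed values should correspond to the entries in Table 1.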

Of the remaining variables, the next best choice would be hp (horsepower). The formula is modified accordingly for both datasets: mpg ~ wt + hp, resulting in a first-order model of the form \(Y = \beta_0 + \beta_1X_1 + \beta_2X_2\).

Lastly, an interaction formula is used: mpg ~ wt + hp + wt:hp, resulting in a first-order interaction model of the form \(Y = \beta_0 + \beta_1X_1 + \beta_2X_2 + \beta_3X_1X_2\).
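One way to fit all three formulae to both subsets in a single pass is sketched below. The names subsets, formulas, fits and coef.tables are illustrative assumptions; the original report may have fit each model individually.

```r
# Split by transmission and give the groups readable names
subsets <- split(mtcars, mtcars$am)          # am: 0 = automatic, 1 = manual
names(subsets) <- c("Automatic", "Manual")

# The three candidate formulae described in the text
formulas <- list(mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + hp + wt:hp)
names(formulas) <- sapply(formulas, deparse)

# Fit every formula to every subset: fits[[formula]][[transmission]]
fits <- lapply(formulas, function(f) {
  lapply(subsets, function(d) lm(f, data = d))
})

# One coefficient table per formula (rows = transmission type)
coef.tables <- lapply(fits, function(m) round(t(sapply(m, coef)), 3))
print(coef.tables)
```

Note that mpg ~ wt + hp + wt:hp is equivalent to the shorthand mpg ~ wt * hp in R's formula syntax.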

Coefficient Interpretation

The findings from the graphs are supported by the model coefficients for the selected regressors. In all cases, the coefficients for weight are negative, indicating an inverse relationship between weight and fuel economy (see Tables 1 - 3). In the simple model, each additional 1,000 lb of weight changes the expected MPG by \(\beta_1\), which is a decrease. Adding horsepower as a second regressor reduces the magnitude of the weight coefficient (Table 2). In the interaction model (Table 3) the weight coefficient is larger in magnitude, but there it describes the effect of weight at zero horsepower, so it is not directly comparable with the other two. In all three models, the decrease in MPG per unit of weight is steeper for vehicles with manual transmissions.

Having fit several models, which of them provides the best fit and, consequently, the best predictor? To answer that question, the \(R^{2}\) (coefficient of determination) statistics for all three models are provided in Table 4. With the exception of mpg ~ wt for automatic transmissions (59%), the models explain between 77% and 87% of the total variation in MPG, with the interaction model having the highest \(R^{2}\); this is expected, since \(R^{2}\) never decreases when terms are added to a model. However, an examination of the coefficient p-values for all three models led to the determination that only the formula mpg ~ wt had any significant regressors (\(p_A\) = 0.0001246 and \(p_M\) = 0.0000169). That model will be used to evaluate the residuals.
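The \(R^{2}\) values and coefficient p-values can be extracted from summary() of each fit. The sketch below refits the models so that it is self-contained; r.squared and p.values are illustrative names, and the p-values are printed for inspection rather than interpreted here.

```r
subsets <- split(mtcars, mtcars$am)          # am: 0 = automatic, 1 = manual
names(subsets) <- c("Automatic", "Manual")

formulas <- list(mpg ~ wt, mpg ~ wt + hp, mpg ~ wt + hp + wt:hp)
names(formulas) <- sapply(formulas, deparse)

# R-squared matrix: rows = transmission type, columns = model formula
r.squared <- sapply(formulas, function(f) {
  sapply(subsets, function(d) summary(lm(f, data = d))$r.squared)
})
print(round(r.squared, 3))

# p-values of every coefficient: p.values[[formula]][[transmission]]
p.values <- lapply(formulas, function(f) {
  lapply(subsets, function(d) coef(summary(lm(f, data = d)))[, "Pr(>|t|)"])
})
print(p.values)
```

Because \(R^{2}\) cannot decrease as regressors are added, summary(...)$adj.r.squared is a common alternative when comparing models of different size.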

Residuals and Diagnostics

A plot of the residuals, shown in Figure 2, is a graphical, rather than tabular, evaluation of model fit. In the residual plot for automatic transmission, there is a pronounced central cluster with outliers at either end; the residual plot for manual transmission shows a more even distribution. The plots of residuals against weight show that both models, with a few exceptions, fit the observed values fairly well; the residuals are also fairly symmetric about zero, consistent with the assumption \(E[e_i] = 0\).
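A residual check along these lines might look like the following sketch. The names res.auto and res.manual are illustrative, and the plotting code is an assumption about how Figure 2 could be drawn with base graphics, not the original figure code.

```r
auto.trans   <- mtcars[mtcars$am == 0, ]     # am: 0 = automatic
manual.trans <- mtcars[mtcars$am == 1, ]     # am: 1 = manual
fit.auto   <- lm(mpg ~ wt, data = auto.trans)
fit.manual <- lm(mpg ~ wt, data = manual.trans)

res.auto   <- resid(fit.auto)
res.manual <- resid(fit.manual)

# For least squares with an intercept, residuals average to zero (up to rounding)
print(c(Automatic = mean(res.auto), Manual = mean(res.manual)))
print(summary(res.auto))
print(summary(res.manual))

# Residuals against weight, side by side; drawn only in an interactive session
if (interactive()) {
  op <- par(mfrow = c(1, 2))
  plot(auto.trans$wt, res.auto, main = "Automatic",
       xlab = "Weight (1,000 lb)", ylab = "Residual")
  abline(h = 0, lty = 2)
  plot(manual.trans$wt, res.manual, main = "Manual",
       xlab = "Weight (1,000 lb)", ylab = "Residual")
  abline(h = 0, lty = 2)
  par(op)
}
```

Since the sample mean of the residuals is zero by construction, the informative part of the plot is the pattern: curvature or a funnel shape would suggest a misspecified model or non-constant variance.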

The uncertainty in the fitted slopes can be assessed with confidence intervals, computed with the R function confint(). The results are displayed in Table 5. Based on the models, one can be 95% confident that the interval [-5.40, -2.17] contains the true change in MPG for a 1,000 lb. change in car weight for automatic transmissions, and [-11.85, -6.32] for manual. Neither interval contains zero, and the two intervals do not overlap.
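The intervals can be obtained with confint() as in this short sketch; fit.auto, fit.manual, ci.auto and ci.manual are illustrative names.

```r
auto.trans   <- mtcars[mtcars$am == 0, ]     # am: 0 = automatic
manual.trans <- mtcars[mtcars$am == 1, ]     # am: 1 = manual
fit.auto   <- lm(mpg ~ wt, data = auto.trans)
fit.manual <- lm(mpg ~ wt, data = manual.trans)

# 95% confidence interval for the weight slope only
ci.auto   <- confint(fit.auto,   parm = "wt", level = 0.95)
ci.manual <- confint(fit.manual, parm = "wt", level = 0.95)

print(round(rbind(Automatic = ci.auto[1, ], Manual = ci.manual[1, ]), 2))
```

Omitting parm returns intervals for the intercept as well.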

Conclusions

The purpose of this analysis was to answer two questions regarding fuel economy. The answer to the first, “Is an automatic or manual transmission better for MPG?”, is that the fitted mpg ~ wt models favor automatic transmissions.

Some calculations will provide an answer to the second question, “What is the quantitative difference in MPG between automatic and manual transmissions?” and also support the conclusion drawn for the first.

How much gas mileage can one expect from a new 2013 Ford Taurus? With a weight of 4,037 pounds (wt = 4.037), the mpg ~ wt models in Table 1 estimate 16.13 MPG for an automatic transmission and 9.62 MPG for a manual one, a difference of 6.51 MPG. The models predict better gas mileage for an automatic transmission at this weight, and thus the analysis concludes that automatic transmissions, on the whole, provide better fuel economy than manual ones.
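The two estimates can be reproduced with predict(), as sketched below. The names new.car, pred.auto and pred.manual are illustrative; the weight of 4,037 lb is the figure quoted in the text, expressed in the dataset's units of 1,000 lb.

```r
auto.trans   <- mtcars[mtcars$am == 0, ]     # am: 0 = automatic
manual.trans <- mtcars[mtcars$am == 1, ]     # am: 1 = manual
fit.auto   <- lm(mpg ~ wt, data = auto.trans)
fit.manual <- lm(mpg ~ wt, data = manual.trans)

# Hypothetical new observation: 4,037 lb expressed in units of 1,000 lb
new.car <- data.frame(wt = 4.037)

pred.auto   <- unname(predict(fit.auto,   newdata = new.car))
pred.manual <- unname(predict(fit.manual, newdata = new.car))

print(round(c(Automatic = pred.auto, Manual = pred.manual,
              Difference = pred.auto - pred.manual), 2))

# Weight range actually observed in each subset, for context
print(range(auto.trans$wt))
print(range(manual.trans$wt))
```

The heaviest manual car in mtcars weighs 3,570 lb, so the manual estimate at 4,037 lb is an extrapolation beyond the observed data and should be read with that in mind.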


Appendix of Figures and Tables

Figure 1: MPG comparison for automatic and manual transmissions, conditioned by weight


Figure 2: Residuals for lm(mpg ~ wt)


Table 1: Coefficients for mpg regressed on weight (mpg ~ wt)

| Transmission | Intercept | Weight |
|:-------------|----------:|-------:|
| Automatic    |    31.416 | -3.786 |
| Manual       |    46.294 | -9.084 |

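Table 2: Coefficients for mpg regressed on weight and horsepower (mpg ~ wt + hp)

| Transmission | Intercept | Weight | Horsepower |
|:-------------|----------:|-------:|-----------:|
| Automatic    |    30.704 | -1.856 |     -0.041 |
| Manual       |    44.444 | -7.625 |     -0.013 |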

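Table 3: Coefficients for mpg regressed on weight and horsepower, with interaction (mpg ~ wt + hp + wt:hp)

| Transmission | Intercept | Weight  | Horsepower | Weight:Horsepower |
|:-------------|----------:|--------:|-----------:|------------------:|
| Automatic    |    40.327 |  -4.797 |     -0.089 |             0.014 |
| Manual       |    53.164 | -10.159 |     -0.121 |             0.032 |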

Table 4: \(R^{2}\) statistics for all models

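| Transmission | mpg ~ wt | mpg ~ wt + hp | mpg ~ wt + hp + wt:hp |
|:-------------|---------:|--------------:|----------------------:|
| Automatic    |    0.589 |         0.768 |                 0.778 |
| Manual       |    0.826 |         0.837 |                 0.872 |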

Table 5: 95% confidence intervals for the weight coefficient (mpg ~ wt), automatic and manual transmissions

| Transmission |   2.5% | 97.5% |
|:-------------|-------:|------:|
| Automatic    |  -5.40 | -2.17 |
| Manual       | -11.85 | -6.32 |