Factors that affect motor car fuel consumption

An analysis in R using the mtcars dataset

Executive summary

Fuel efficiency of cars, as measured by mpg (miles per gallon), was studied using the mtcars data set. Manual transmission confers only a small, non-significant benefit of 0.75 mpg over automatic transmission when the confounding effects of car weight, number of cylinders, number of gears and engine displacement are accounted for. Weight and number of cylinders together accounted for 84% of the variation in mpg between models.

Introduction

Motor car fuel consumption, measured in miles per gallon (mpg), depends, inter alia, on the car’s weight, engine displacement, transmission (manual or automatic) and possibly on the number of gears. Can we estimate the size of the effect of these variables? Specifically, to what extent does the type of transmission affect fuel efficiency?

The data set

The mtcars data set provides a useful starting point. It comes with the R statistical package and is taken from a 1974 edition of Motor Trend magazine. It comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles, all of them 1973-74 models.

Analytical methods

The data are analysed here using the R statistical package. Specifically, I used the following analytical functions provided by R: plot() and pairs() for exploratory analysis, lm() for regression analysis, and plot() applied to residuals for regression diagnostics.

Results

Figure 1 shows a panel of histograms of selected variables from the data set. Note that the following variables take only specific values and are best treated as factors rather than continuous variables: cyl (number of cylinders: 4, 6 or 8); am (transmission: 0 for automatic and 1 for manual); gear (number of gears: 3, 4 or 5); carb (number of carburettors: 1, 2, 3, 4, 6 or 8).
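As a minimal sketch of the factor treatment described above, the discrete variables can be converted with factor() and tabulated. The copy name `mt` and the level labels are illustrative choices, not part of the original analysis; the am coding (0 = automatic, 1 = manual) matches the one used in the regression sections of this report.

```r
# mtcars ships with base R (package 'datasets'), so nothing needs installing.
data(mtcars)

# Work on a copy so the original data frame stays numeric.
# 'mt' is an illustrative name, not one used in the original report.
mt <- mtcars
mt$cyl  <- factor(mt$cyl)
mt$gear <- factor(mt$gear)
mt$carb <- factor(mt$carb)
mt$am   <- factor(mt$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Count how many cars fall at each level of every discrete variable.
level_counts <- lapply(mt[c("cyl", "am", "gear", "carb")], table)
print(level_counts)
```

Keeping a separate copy means later calls that need the numeric columns, such as cor(), can still use mtcars directly.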

Figure 2 is produced by the pairs() function in R. It shows that mpg, the outcome variable of interest, is most strongly correlated with wt (|r| = 0.87), cyl (|r| = 0.85) and disp (|r| = 0.85), in each case negatively, and less so with am (r = 0.60).

But this figure also shows that there is a high degree of correlation between the explanatory variables. For instance, the correlation coefficient between wt and cyl is 0.78; between wt and disp it is 0.89; and between cyl and disp it is 0.90.

Single variable regression may therefore hide important effects of confounding between potential explanatory variables that might be used as regressors in a multiple regression model.
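The correlations quoted above can be recomputed directly with cor(). This is a sketch only: the selection of five variables and the object names `vars` and `cor_matrix` are assumptions made for illustration. cor() reports signed Pearson coefficients, so the correlations of mpg with wt, cyl and disp come out negative.

```r
data(mtcars)

# Variables discussed in the text; the selection is illustrative.
vars <- c("mpg", "wt", "cyl", "disp", "am")

# Signed Pearson correlation matrix (the default method of cor()).
cor_matrix <- cor(mtcars[, vars])
print(round(cor_matrix, 2))

# Correlations among the explanatory variables singled out in the text.
between_regressors <- c(
  wt_cyl   = cor_matrix["wt", "cyl"],
  wt_disp  = cor_matrix["wt", "disp"],
  cyl_disp = cor_matrix["cyl", "disp"]
)
print(round(between_regressors, 2))

# Draw the scatter-plot matrix only in an interactive session.
if (interactive()) pairs(mtcars[, vars])
```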

Simple regression with transmission as regressor

Figure 3 is a simple box plot showing how mpg differs between manual and automatic cars. The regression coefficients from a simple linear regression of mpg on am are shown below it. These show that, before any account is taken of other potential confounding factors:

  • Automatic cars give, on average, a fuel efficiency of 17.15 mpg
  • Manual cars give, on average, an additional 7.25 mpg (total 24.39)
  • This difference is statistically significant (p = 0.0003)
  • R-squared is 0.36, i.e. 36% of the variance of mpg in our sample of cars is ‘explained’ by the mode of transmission.
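The figures in the list above can be reproduced with a one-regressor call to lm(). The name r1 and the as.factor(am) term follow the output shown in the appendix; the helper names `s1`, `p_am` and `r_squared_1` are illustrative.

```r
data(mtcars)

# Simple linear regression of mpg on transmission (am: 0 = automatic, 1 = manual).
r1 <- lm(mpg ~ as.factor(am), data = mtcars)

# The intercept is the automatic-car mean; the am coefficient is the manual increment.
print(coef(r1))

# Pull the p-value for the transmission effect and the R-squared from the summary.
s1 <- summary(r1)
p_am <- s1$coefficients["as.factor(am)1", "Pr(>|t|)"]
r_squared_1 <- s1$r.squared
print(c(p_value = p_am, r_squared = r_squared_1))

# Cross-check: with a single binary regressor, the fit reproduces the group means.
print(tapply(mtcars$mpg, mtcars$am, mean))
```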

Multiple regression model

It would appear that a model with transmission as the single regressor hides the confounding effect of other variables. As shown in Figure 2, am is negatively correlated with weight, number of cylinders and displacement; that is, the manual cars in this sample tend to be lighter and to have fewer cylinders and smaller engines.

Figure 4 shows the output from a multiple regression model with mpg as the outcome, and with am (transmission as a factor variable), wt (weight in 1000 pounds), cyl (number of cylinders as a factor variable), disp (displacement) and gear (number of gears as a factor variable) as regressors.

Inspecting the p-values of this output, we can conclude that, with this model, weight and the number of cylinders are statistically significant determinants of fuel efficiency. In particular, the coefficient for weight is -3.58; i.e. for every 1000 pound increase in weight, fuel efficiency falls by 3.58 mpg when the other regressors are held fixed. Likewise, moving from 4 to 6 cylinders reduces fuel efficiency by 4.12 mpg (on average), and moving from 4 to 8 cylinders reduces it by 6.04 mpg.

A change from automatic (am = 0) to manual transmission (am = 1) appears to increase fuel efficiency by 0.75 mpg, but this is not statistically significant (p = 0.70). Likewise, a change from 3 to 4 or 5 gears has a statistically non-significant effect on fuel efficiency.

In this model, the total R-squared is 0.846; i.e. 85% of the variance in mpg is ‘explained’ by the regressors selected.
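A sketch of how the model discussed above could be fitted is given here. The term names mirror the coefficient labels in the Figure 4 output; the object names `fit_full`, `coef_table` and `r_squared_full` are hypothetical, since the original report does not name this model.

```r
data(mtcars)

# Multiple regression of mpg on transmission, weight, cylinders,
# displacement and gears; cyl, am and gear enter as factors.
fit_full <- lm(
  mpg ~ as.factor(am) + wt + as.factor(cyl) + disp + as.factor(gear),
  data = mtcars
)

# Coefficient table: estimate, standard error, t value and p-value per term.
coef_table <- summary(fit_full)$coefficients
print(round(coef_table, 4))

# Proportion of variance in mpg accounted for by the regressors.
r_squared_full <- summary(fit_full)$r.squared
print(r_squared_full)
```

Because cyl and gear are factors, each of their coefficients is a contrast against the baseline level (4 cylinders, 3 gears) rather than a slope.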

Does this model stand up to the assumptions underlying regression analysis?

This is tested in Figure 5, where the residuals are plotted against wt, cyl and am (the regressors). In all three cases, the residuals show no identifiable pattern and appear to have constant variance at all levels of the regressor variable.
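The residual check described above might be sketched as follows. The model is refitted so the block stands alone; `fit_full` and `res` are hypothetical names, and the per-group standard deviations are an added numeric companion to the plots rather than something taken from the original report. The plotting calls are guarded so that they run only in an interactive session.

```r
data(mtcars)

# Refit the multiple regression model so this block is self-contained.
fit_full <- lm(
  mpg ~ as.factor(am) + wt + as.factor(cyl) + disp + as.factor(gear),
  data = mtcars
)
res <- resid(fit_full)

# Numeric companion to the plots: residual spread within each group.
print(tapply(res, mtcars$cyl, sd))
print(tapply(res, mtcars$am, sd))

# Residuals against each regressor; drop the guard when knitting a report.
if (interactive()) {
  op <- par(mfrow = c(1, 3))
  plot(mtcars$wt, res, xlab = "wt (1000 lb)", ylab = "Residual")
  abline(h = 0, lty = 2)
  boxplot(res ~ mtcars$cyl, xlab = "cyl", ylab = "Residual")
  boxplot(res ~ mtcars$am, xlab = "am (0 = automatic, 1 = manual)", ylab = "Residual")
  par(op)
}
```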

Can we fit a more parsimonious model?

Fitting a model with just wt and cyl as regressors is a reasonable step, since these were the only variables to show a significant relationship with fuel efficiency.

This is shown in Figure 6. The coefficients for wt and for 6 or 8 cylinders are only marginally different from those in the previous model, and the R-squared is 0.837, just marginally lower than the earlier value (0.846).

This model may therefore be the optimal model to explain the variation seen in fuel efficiency in this sample of cars.
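The comparison between the two models can be sketched as below. The names `fit_full`, `fit_small` and `r_squared` are hypothetical. The final anova() call, an F-test of the two nested models, is not part of the original analysis; it is included only as one possible way to check whether the dropped terms add anything.

```r
data(mtcars)

# Full model, as in Figure 4.
fit_full <- lm(
  mpg ~ as.factor(am) + wt + as.factor(cyl) + disp + as.factor(gear),
  data = mtcars
)

# Parsimonious model with weight and cylinders only, as in Figure 6.
fit_small <- lm(mpg ~ wt + as.factor(cyl), data = mtcars)
print(round(summary(fit_small)$coefficients, 4))

# Side-by-side R-squared values for the two models.
r_squared <- c(
  full  = summary(fit_full)$r.squared,
  small = summary(fit_small)$r.squared
)
print(round(r_squared, 4))

# Assumption, not in the original report: F-test comparing the nested models.
print(anova(fit_small, fit_full))
```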

Appendix

Figure 1. Histograms of selected variables from mtcars data set

Figure 2. Pair-wise correlation coefficients and scatter plots of selected variables

Figure 3. Boxplot of mpg by transmission, followed by output from coef(r1) where r1 is the output from linear model of mpg vs am.

##    (Intercept) as.factor(am)1 
##         17.147          7.245

Figure 4. Multiple regression model of mpg on am, wt, cyl, disp and gear.

##                   Estimate Std. Error t value  Pr(>|t|)
## (Intercept)      33.732197    2.97897 11.3234 4.115e-11
## as.factor(am)1    0.752130    1.91743  0.3923 6.983e-01
## wt               -3.581358    1.39064 -2.5753 1.660e-02
## as.factor(cyl)6  -4.120541    1.52403 -2.7037 1.240e-02
## as.factor(cyl)8  -6.042471    2.72488 -2.2175 3.631e-02
## disp              0.005206    0.01524  0.3416 7.356e-01
## as.factor(gear)4  0.404053    2.07484  0.1947 8.472e-01
## as.factor(gear)5 -1.489014    2.26406 -0.6577 5.170e-01
## [1] "R-Squared is: "
## [1] 0.8455

Figure 5. Residuals plotted against the regressors

Figure 6. A parsimonious model using only cyl and wt as regressors

##                 Estimate Std. Error t value  Pr(>|t|)
## (Intercept)       33.991     1.8878  18.006 6.257e-17
## wt                -3.206     0.7539  -4.252 2.130e-04
## as.factor(cyl)6   -4.256     1.3861  -3.070 4.718e-03
## as.factor(cyl)8   -6.071     1.6523  -3.674 9.992e-04
## [1] "R-Squared is: "
## [1] 0.8374