Executive Summary

We examine the mtcars data set, taken from the 1974 Motor Trend magazine, to investigate whether automatic or manual transmission gives better fuel economy (mpg). We explore the data and then apply statistical inference and regression modeling. We conclude that manual transmission gives better mileage, but that other factors also affect mileage. We present our final results in the Conclusion section and follow up with a discussion of some practical aspects. Exploratory graphs are presented in the Appendix.

Data and Visual Summary

The data set consists of 32 observations on 11 variables: (1) mpg, miles per (US) gallon; (2) cyl, number of cylinders; (3) disp, displacement (cu. in.); (4) hp, gross horsepower; (5) drat, rear axle ratio; (6) wt, weight (1000 lbs); (7) qsec, 1/4 mile time (seconds); (8) vs, engine shape (V-shaped or straight); (9) am, transmission type (automatic or manual); (10) gear, number of forward gears; and (11) carb, number of carburetors. For our analysis, we convert cyl, vs, am and gear to factor variables, and convert qsec (the quarter-mile time) into a speed in miles per hour.
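The preparation step can be sketched in base R. The text does not say exactly how qsec was turned into a speed. This sketch assumes it is the average speed over the quarter mile, 0.25 mile / (qsec / 3600 h) = 900 / qsec mph. The result is stored in a column named `speed`, the name used in the model formulas later in this report. The factor labels for am are chosen to match the printed R output.

```r
data(mtcars)

# Treat the coded variables as categorical factors
mtcars$cyl  <- factor(mtcars$cyl)
mtcars$vs   <- factor(mtcars$vs)                 # 0 = V-shaped, 1 = straight
mtcars$am   <- factor(mtcars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))
mtcars$gear <- factor(mtcars$gear)

# Assumed conversion: average speed (mph) over the 1/4 mile,
# 0.25 mile / (qsec / 3600 h) = 900 / qsec
mtcars$speed <- 900 / mtcars$qsec

str(mtcars[, c("cyl", "vs", "am", "gear", "speed")])
```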

A violin plot of mpg segregated into automatic and manual transmission ‘violins’ (Fig 1) suggests that manual transmission gives higher mileage. On average, manual cars achieve 7.25 mpg more than automatic cars. A Welch two-sample t-test confirms that this difference is statistically significant:

## t = -3.7671, df = 18.332, p-value = 0.001374
## mean in group Automatic mean in group Manual
##          17.14737           24.39231
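The group means and the test above can be reproduced with base R's `t.test`. It uses the Welch (unequal-variance) form by default, which explains the non-integer degrees of freedom. A minimal sketch, assuming `am` is recoded as a factor with the labels shown in the output:

```r
data(mtcars)
mtcars$am <- factor(mtcars$am, levels = c(0, 1),
                    labels = c("Automatic", "Manual"))

# Group means of mpg by transmission type
tapply(mtcars$mpg, mtcars$am, mean)

# Welch two-sample t-test (t.test defaults to var.equal = FALSE)
tt <- t.test(mpg ~ am, data = mtcars)
tt

# Difference in means, Manual minus Automatic
unname(diff(tt$estimate))
```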

Other variables may also affect mpg, so we examine them visually as well. A box-and-whisker plot (Fig 2) groups the cars by transmission type, engine shape (V/S), number of cylinders, gears and carburetors. It suggests that mpg may also be related to the number of cylinders, gears and carburetors.
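Each box summarises mpg by its median and quartiles within one group. The following base-R sketch prints the group medians for each grouping variable and draws one panel (mpg by cylinders). It is not necessarily how Fig 2 was produced.

```r
data(mtcars)

# Median mpg within each level of the grouping variables used in Fig 2
for (v in c("am", "vs", "cyl", "gear", "carb")) {
  cat(v, ":\n")
  print(tapply(mtcars$mpg, mtcars[[v]], median))
}

# One panel of Fig 2: mpg by number of cylinders
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Number of cylinders", ylab = "mpg")
```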

A scatter plot of mpg against the continuous variables (Fig 3) shows that all of them may affect mpg. For example, weight, displacement, horsepower, axle ratio and speed have correlation coefficients with mpg of -0.87, -0.85, -0.78, 0.68 and -0.42 respectively. However, these variables are also strongly correlated with one another, which means that not all of them may be needed to explain mpg. We therefore need a formal regression technique to identify the variables that matter most.
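The correlations quoted above, and those among the predictors themselves, come from a Pearson correlation matrix. This is a sketch using base R's `cor`. It again assumes speed = 900 / qsec, so the speed figures depend on that assumption.

```r
data(mtcars)
mtcars$speed <- 900 / mtcars$qsec   # assumed quarter-mile speed in mph

vars <- c("wt", "disp", "hp", "drat", "speed")

# Correlation of each continuous predictor with mpg
round(cor(mtcars[, vars], mtcars$mpg), 2)

# Correlations among the predictors themselves (multicollinearity check)
round(cor(mtcars[, vars]), 2)
```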

Regression Modeling

Simple model: mpg with transmission type

Here is the output of fitting a simple linear regression, mpg ~ am, followed by its adjusted R^2.

##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## amManual     7.244939   1.764422  4.106127 2.850207e-04
## [1] 0.3384589

The amManual coefficient is highly significant (p = 0.00029): on average, manual transmission cars get 7.245 mpg more than automatic transmission cars. However, the adjusted R^2 is only 0.338, so transmission type alone explains only about a third of the variance in mpg. We therefore need to look for a better model.
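With a single two-level factor as the only predictor, the intercept is the mean mpg of automatic cars and the amManual coefficient is the difference in group means. That is why it equals the t-test estimate. Its t value (4.11) differs in magnitude from the Welch statistic (3.77) because ordinary least squares assumes equal variances in both groups. A minimal sketch reproducing the table:

```r
data(mtcars)
mtcars$am <- factor(mtcars$am, levels = c(0, 1),
                    labels = c("Automatic", "Manual"))

fit_am <- lm(mpg ~ am, data = mtcars)

# Intercept = automatic mean; amManual = manual minus automatic
summary(fit_am)$coefficients
summary(fit_am)$adj.r.squared
```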

Seeking the best model with multiple variables

Based on the discussion in the earlier section and Figures 2 and 3 in the Appendix, we start by fitting a linear regression of mpg on all variables except the number of gears. We then simplify the model by backward elimination. At each step, we inspect the regression output (p-values of the individual coefficients and the adjusted R^2) and remove the one variable that is least significant. Due to space constraints, we show only the variable removed at each step and the resulting adjusted R^2, not the full intermediate output.
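The elimination loop can be automated. The following is a sketch under stated assumptions, not the exact procedure used here. The text judges each variable by the p-values of its individual coefficients. This sketch instead uses one `drop1()` F-test p-value per term, which treats a multi-level factor such as cyl as a single unit. It may therefore remove variables in a different order. It also assumes speed = 900 / qsec and leaves carb numeric, following the list of converted variables. The helper `backward_eliminate` and the 0.05 threshold are illustrative choices.

```r
data(mtcars)
mtcars$cyl   <- factor(mtcars$cyl)
mtcars$vs    <- factor(mtcars$vs)
mtcars$am    <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))
mtcars$speed <- 900 / mtcars$qsec   # assumed quarter-mile speed (mph)

# Hypothetical helper: repeatedly drop the term with the largest
# F-test p-value until every remaining term has p < alpha.
backward_eliminate <- function(fit, alpha = 0.05) {
  repeat {
    tests <- drop1(fit, test = "F")
    pvals <- tests[["Pr(>F)"]][-1]          # first row is "<none>"
    names(pvals) <- rownames(tests)[-1]
    if (length(pvals) == 0) return(fit)     # nothing left to remove
    worst <- names(which.max(pvals))
    if (pvals[[worst]] < alpha) return(fit)
    cat(sprintf("Removing %s (p = %.3f)\n", worst, pvals[[worst]]))
    fit <- update(fit, as.formula(paste(". ~ . -", worst)))
  }
}

full <- lm(mpg ~ cyl + disp + hp + drat + wt + vs + am + carb + speed,
           data = mtcars)
final <- backward_eliminate(full)
formula(final)
summary(final)$adj.r.squared
```

Using one test per term avoids the ambiguity of a factor whose levels have very different p-values. The trade-off is that the path can differ from a coefficient-by-coefficient judgement.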

Step 1: All variables except gear

lm(mpg ~ cyl + disp + hp + drat + wt + vs + am + carb + speed, data=mtcars)

Adjusted R^2 is 0.799; the p-values for 2 and 4 carburetors are 0.92 and 0.68 respectively.

Step 2: Remove carb. Adjusted R^2 is 0.827; the p-values for 8 and 6 cylinders are 0.95 and 0.5 respectively.

Step 3: Remove cyl. Adjusted R^2 is 0.834; the p-value for vs1 is 0.94.

Step 4: Remove vs. Adjusted R^2 is 0.84; the p-value for hp is 0.444.

Step 5: Remove hp. Adjusted R^2 is 0.843; the p-value for drat (axle ratio) is 0.387.

Step 6: Remove drat. Adjusted R^2 is 0.844; the p-value for disp is 0.323.

Step 7: Remove disp. Adjusted R^2 is 0.844, all coefficients are significant at the 5% level, and the overall F-test p-value is very small. We therefore take lm(mpg ~ wt + am + speed, data=mtcars) as our best model.

##               Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 54.3239200 4.38164097 12.398076 6.871523e-13
## wt          -3.8497340 0.68835903 -5.592625 5.506921e-06
## amManual     3.1391425 1.37267875  2.286873 2.996742e-02
## speed       -0.4541925 0.09883842 -4.595303 8.379994e-05
## [1] 0.8440021
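Refitting the final model is a one-liner in base R. This sketch assumes speed = 900 / qsec. The coefficients above should be reproduced only if that matches the conversion actually used. The `confint` call is an optional addition that gives 95% confidence intervals for each coefficient.

```r
data(mtcars)
mtcars$am    <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))
mtcars$speed <- 900 / mtcars$qsec   # assumed quarter-mile speed (mph)

fit <- lm(mpg ~ wt + am + speed, data = mtcars)
summary(fit)$coefficients
summary(fit)$adj.r.squared

# 95% confidence intervals for the coefficients
confint(fit)
```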

A residual plot of this model (Fig 4) shows that:

  • The residuals are scattered around zero with no visible pattern and are approximately normally distributed: they follow the quantiles of a normal distribution closely.

  • The residuals vs. leverage plot shows low Cook’s distances, indicating that there are no points with both high leverage and a large residual.

  • The scale-location plot, however, is not flat but slopes gently upward, which may indicate some heteroskedasticity.
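The four diagnostic panels described above are what base R's `plot()` draws for an `lm` fit by default. A sketch, again assuming speed = 900 / qsec. The Shapiro-Wilk test and the Cook's distance ranking are optional numeric companions to the plots, not part of the original analysis.

```r
data(mtcars)
mtcars$am    <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))
mtcars$speed <- 900 / mtcars$qsec   # assumed quarter-mile speed (mph)
fit <- lm(mpg ~ wt + am + speed, data = mtcars)

# Residuals vs fitted, normal Q-Q, scale-location,
# residuals vs leverage (with Cook's distance contours)
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Optional numeric checks
shapiro.test(residuals(fit))                          # normality of residuals
sort(cooks.distance(fit), decreasing = TRUE)[1:3]     # most influential cars
```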

Conclusion

According to our best fitting model, manual transmission is better: holding weight and speed constant, a manual car achieves on average 3.14 mpg more than an automatic one. Weight and speed also affect mileage. With the other variables held constant, every 1000 lb increase in weight reduces mpg by 3.85, and every 1 mph increase in speed reduces mpg by 0.45.
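Each coefficient is the change in predicted mpg when one variable changes by one unit and the others are held fixed. The sketch below checks this with `predict` on a few hypothetical cars. The car values are invented for illustration, and speed is again assumed to be 900 / qsec.

```r
data(mtcars)
mtcars$am    <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))
mtcars$speed <- 900 / mtcars$qsec   # assumed quarter-mile speed (mph)
fit <- lm(mpg ~ wt + am + speed, data = mtcars)

# Hypothetical cars: baseline, manual version, 1000 lb heavier, 1 mph faster
new_cars <- data.frame(
  wt    = c(3, 3, 4, 3),                    # weight in 1000 lbs
  am    = factor(c("Automatic", "Manual", "Automatic", "Automatic"),
                 levels = levels(mtcars$am)),
  speed = c(50, 50, 50, 51)                 # mph
)
pred <- predict(fit, newdata = new_cars)

# Differences from the baseline equal the amManual, wt and speed coefficients
pred[2:4] - pred[1]
coef(fit)[c("amManual", "wt", "speed")]
```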

Further Discussion

The results also show that transmission type is the least significant of the three variables in the model (p ≈ 0.03, against p < 0.0001 for weight and speed). It is interesting to compare this with the known real-world effects of the variables we omitted from the regression model.

  • Fig 2 shows that mpg decreases with the number of cylinders. In practice, cars with more cylinders tend to be heavier, and the model retains weight instead of the number of cylinders.

  • Fig 3 shows a high positive correlation (0.744) between horsepower and speed, which agrees with what we know: more powerful cars are faster. The model retains speed and discards hp.

  • Fig 2 shows that straight engines have a higher mpg than V engines. In practice, V engines are known to deliver more horsepower and hence higher speed. Again, our model retains speed in preference to hp and vs.
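The three relationships above can be checked numerically in the data set itself. A sketch using base R. As elsewhere, the speed column is assumed to be 900 / qsec, so the exact hp-speed correlation depends on that assumption.

```r
data(mtcars)
mtcars$speed <- 900 / mtcars$qsec   # assumed quarter-mile speed (mph)

# Mean weight (1000 lbs) by number of cylinders
tapply(mtcars$wt, mtcars$cyl, mean)

# Correlation between horsepower and quarter-mile speed
cor(mtcars$hp, mtcars$speed)

# Mean horsepower by engine shape (0 = V-shaped, 1 = straight)
tapply(mtcars$hp, mtcars$vs, mean)
```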

The takeaway from the above discussion is that several of the variables are correlated with each other, and we chose the model that fits our data best statistically. In practice this model may not be the most useful guide, and domain knowledge is needed to make the best judgement.

We also saw that, in the 1974 data, transmission type is the weakest of the three explanatory variables (the other two being weight and speed). Modern manufacturers, however, claim that many automatic cars now surpass manual ones in fuel efficiency. It would be interesting to test whether our model still holds on 2015 data.

Appendix

Fig 1: Automatic vs. manual transmission

Fig 2: Effect of different factor variables on mpg

Fig 3: Effect of different continuous variables on mpg

Fig 4: Residual plot for mpg ~ wt + am + speed
