Overview

In this paper we examine whether the type of transmission (automatic or manual) is related to the miles traveled per gallon of fuel. We will see that, although manual cars seem to travel more miles per gallon than automatic ones, that effect disappears once the weight of the car is taken into account.

Analysis

Exploring dataset

The mtcars dataset has 32 observations (car models from 1973 and 1974) and 11 variables, which include fuel consumption and other aspects of automobile design and performance. A full description is available in the dataset's R documentation.
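As a minimal sketch, assuming a standard R installation (where mtcars ships with the built-in datasets package), the dimensions and variable names described above can be checked like this:

```r
# mtcars is bundled with base R, so nothing has to be downloaded.
data(mtcars)

dim(mtcars)          # number of observations and number of variables
names(mtcars)        # variable names (mpg, cyl, disp, hp, ...)
summary(mtcars$mpg)  # quick look at the response variable
```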

As shown in Figure 1 of the Appendix, the rear axle ratio (drat), the time to travel 1/4 mile (qsec) and the number of forward gears (gear) are positively related to the miles traveled per gallon of fuel (mpg): as the value of these variables increases, the miles per gallon increase too. Also, cars with manual transmission (am, value 1) or with straight engines (vs, value 1) travel more miles per gallon.

On the other hand, the greater the number of cylinders (cyl) or carburetors (carb), the horsepower (hp), the weight (wt) or the engine displacement (disp), the fewer miles traveled per gallon.
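The direction of these relationships can also be checked numerically. The sketch below uses Pearson correlations with mpg as a stand-in for the plots of Figure 1, which is an assumption on our part: the figure itself may have been built differently. The name cors is illustrative.

```r
data(mtcars)

# Pearson correlation of mpg with every other variable, sorted from
# the most negative to the most positive.
cors <- cor(mtcars)["mpg", ]
cors <- sort(cors[names(cors) != "mpg"])
round(cors, 2)

# Split the variables by the sign of their correlation with mpg.
names(cors)[cors < 0]
names(cors)[cors > 0]
```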

Regression analysis

We are going to use regression techniques, which allow us to study the effect of the type of transmission on the miles traveled per gallon while taking into account the rest of the characteristics of the cars.

The Shapiro-Wilk normality test on mpg gives a statistic of 0.948, with an associated p-value of 0.123, which does not allow us to reject the null hypothesis that the sample comes from a normal population. So let's assume that the mpg observations are independent and normally distributed, and run a simple linear regression of mpg on am.
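A minimal sketch of this check, using shapiro.test from base R (the object name sw is illustrative):

```r
data(mtcars)

# Shapiro-Wilk test of the null hypothesis that mpg comes from a
# normal population.
sw <- shapiro.test(mtcars$mpg)

round(unname(sw$statistic), 3)  # W statistic
round(sw$p.value, 3)            # p-value
```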

Our model indicates that, on average, automatic cars travel 17.1 miles per gallon of fuel, while manual cars travel 7.2 miles more, that is, an average of 24.4 miles per gallon. This difference can be interpreted in the same way as with an anova or a t-test. In fact, as shown in the following table, the statistics are equivalent (the F statistic is the square of t) and the p-value is exactly the same in all three tests.

Comparison between simple regression, anova and t-test.

| Test | Call | Statistic | Statistic value | p-value |
|---|---|---|---|---|
| Regression | lm(mpg ~ am, data = mtcars) | t | 4.106 | 0.000285 |
| Anova | aov(mpg ~ am, data = mtcars) | F | 16.86 | 0.000285 |
| T-test | t.test(mpg ~ am, data = mtcars, var.equal = TRUE) | t | -4.106 | 0.000285 |
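The equivalence of the three tests can be reproduced with the calls listed in the table. This is only a sketch; the object names (fit1, aov1, tt and the extracted statistics) are illustrative and not part of the original analysis.

```r
data(mtcars)

fit1 <- lm(mpg ~ am, data = mtcars)
aov1 <- aov(mpg ~ am, data = mtcars)
tt   <- t.test(mpg ~ am, data = mtcars, var.equal = TRUE)

# Extract the test statistic and p-value for am from each result.
t_lm  <- summary(fit1)$coefficients["am", "t value"]
p_lm  <- summary(fit1)$coefficients["am", "Pr(>|t|)"]
f_aov <- summary(aov1)[[1]][["F value"]][1]
p_aov <- summary(aov1)[[1]][["Pr(>F)"]][1]
t_tt  <- unname(tt$statistic)
p_tt  <- tt$p.value

c(t_lm = t_lm, t_lm_squared = t_lm^2, f_aov = f_aov, t_tt = t_tt)
c(p_lm = p_lm, p_aov = p_aov, p_tt = p_tt)
```

The t statistic of t.test has the opposite sign because it compares automatic minus manual, while the regression slope measures manual minus automatic.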

We can also build a confidence interval for the slope of our model: with 95% confidence, we estimate that manual transmission results in an increase of 3.6 to 10.8 miles/(US) gallon.
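A sketch of how the coefficients and this interval can be obtained with coef and confint from base R (fit1 and ci are illustrative names):

```r
data(mtcars)

fit1 <- lm(mpg ~ am, data = mtcars)

# Intercept: mean mpg of automatic cars; am: extra mpg of manual cars.
round(coef(fit1), 1)

# 95% confidence interval for the slope (the am coefficient).
ci <- confint(fit1, "am", level = 0.95)
round(ci, 1)
```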

Let’s extend our model by adding more variables. Figure 2 of the Appendix shows a high correlation between some of the variables, which can lead to multicollinearity problems and inflate the variance of the estimated coefficients of the regressors included in our model, so we need to be careful on this point. We will first add gross horsepower (hp), because it is closely related to consumption but has no apparent relation to the type of transmission.
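One simple way to look at this, sketched below, is to compare how strongly each candidate regressor correlates with am. Using plain Pearson correlations here is our assumption; other collinearity measures could be used instead.

```r
data(mtcars)

# Correlation of the transmission indicator with two candidate regressors.
cor_hp_am <- cor(mtcars$hp, mtcars$am)
cor_wt_am <- cor(mtcars$wt, mtcars$am)

round(c(hp = cor_hp_am, wt = cor_wt_am), 2)
```

In this data hp is only weakly correlated with am, while wt is much more strongly correlated with it, which is why hp is the safer first addition.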

This model explains 78% of the total variance, compared to the 36% explained by the model with a single variable.
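The two percentages correspond to the R-squared of each model. A minimal sketch, with illustrative object names:

```r
data(mtcars)

fit1 <- lm(mpg ~ am, data = mtcars)       # single-variable model
fit2 <- lm(mpg ~ am + hp, data = mtcars)  # model with horsepower added

# Share of the total variance explained by each model, in percent.
r2 <- c(model1 = summary(fit1)$r.squared,
        model2 = summary(fit2)$r.squared)
round(100 * r2)
```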

Figure 3 in the Appendix shows this model graphically. The decrease in miles/(US) gallon as the power of the car increases is the same for automatic and manual cars, since we have not included an interaction between am and hp in our model.
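The parallel lines can be verified from the fitted model: without an interaction term, the predicted gap between manual and automatic cars is the same at any horsepower. The sketch below uses three arbitrary horsepower values (hp_grid) chosen only for illustration.

```r
data(mtcars)

fit2 <- lm(mpg ~ am + hp, data = mtcars)

hp_grid <- c(100, 150, 200)  # illustrative horsepower values

auto   <- predict(fit2, newdata = data.frame(am = 0, hp = hp_grid))
manual <- predict(fit2, newdata = data.frame(am = 1, hp = hp_grid))

# The gap is constant and equal to the am coefficient.
manual - auto
coef(fit2)["am"]
```

Fitting mpg ~ am * hp instead would let the slope differ between the two groups.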

The model estimates an expected decrease of 0.06 miles/(US) gallon for every unit increase in gross horsepower, with a 95% confidence interval between 0.04 and 0.07, regardless of the type of transmission of the car. In the same way, holding the horsepower fixed, manual transmission results in an increase of 5.3 miles/(US) gallon, with a 95% confidence interval between 3.1 and 7.5.
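These estimates and intervals can be read from the fitted model as sketched below (fit2 and ci2 are illustrative names). Note that the hp interval is negative because it describes a decrease.

```r
data(mtcars)

fit2 <- lm(mpg ~ am + hp, data = mtcars)

# Point estimates and 95% confidence intervals for both coefficients.
round(coef(fit2)[c("am", "hp")], 2)
ci2 <- confint(fit2, c("am", "hp"), level = 0.95)
round(ci2, 2)
```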

An inspection of the diagnostic plots in Figure 4 of the Appendix shows no apparent patterns in the residuals, which fit a normal distribution well, so the model works well.
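Diagnostic plots of this kind can be produced with the default plot method for lm objects. This is a sketch under the assumption that Figure 4 uses the four standard panels; the original figure may have been drawn differently.

```r
data(mtcars)

fit2 <- lm(mpg ~ am + hp, data = mtcars)

# Draw the four standard diagnostic panels in a 2 x 2 grid:
# residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage.
op <- par(mfrow = c(2, 2))
plot(fit2)
par(op)  # restore the previous graphics settings
```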

The weight of the vehicle is a variable strongly related to consumption, so let’s introduce it into the model.

When we add the weight of the car to the model, the type of transmission is no longer a significant element. In fact, different models have been tested and in most cases the weight cancels out the effect of the type of transmission.
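A sketch of the model with weight added, assuming the formula mpg ~ am + hp + wt (the exact specification is not shown in the text, and fit3 is an illustrative name):

```r
data(mtcars)

fit3 <- lm(mpg ~ am + hp + wt, data = mtcars)

# Coefficient table: estimates, standard errors, t values and p-values.
coefs3 <- summary(fit3)$coefficients
round(coefs3, 4)

# p-value of the transmission effect once hp and wt are in the model.
coefs3["am", "Pr(>|t|)"]
```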

The model seems good and explains 84% of the total variance, but the am variable does not contribute anything and complicates its interpretation. A simpler model, with only wt and hp, would be more parsimonious and equally explanatory, as the anova comparison of the two models shows.
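The comparison between the two nested models can be sketched with anova from base R. The object names are illustrative, and the formulas are our reading of the models described in the text.

```r
data(mtcars)

fit_small <- lm(mpg ~ wt + hp, data = mtcars)       # without transmission
fit_full  <- lm(mpg ~ wt + hp + am, data = mtcars)  # with transmission

# F test of whether adding am significantly improves the fit.
cmp <- anova(fit_small, fit_full)
cmp

# Variance explained by each model, in percent.
round(100 * c(small = summary(fit_small)$r.squared,
              full  = summary(fit_full)$r.squared))
```

A large p-value in the second row of the comparison means that the data do not support keeping am once wt and hp are in the model.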

Therefore we can conclude that, without taking into account any other aspect of the car, manual cars travel more miles per gallon of fuel than automatic cars, but when other aspects are included, this effect is softened. When the weight of the car is taken into account, the effect virtually disappears.

Appendix

_Figure 1. This plot shows the type of relationship between mpg and each of the other variables._

_Figure 2. This plot shows the Pearson correlation between all variables of the mtcars dataset._

_Figure 3. The regression line of manual cars has the same slope as that of automatic cars, but a higher intercept._

_Figure 4. Diagnostic plots of Model 2 (mpg ~ am + hp)._