An Analysis of Gas Mileage and Transmission: Is There a Relationship?

Executive summary

Do vehicles with automatic or manual transmissions get better gas mileage? This report attempts to answer that question and, if a difference exists, to quantify it. To this end, we perform a t-test to determine whether a difference exists, and then a regression analysis to identify the factors involved. Without controlling for other variables, the results indicate that manual cars are more fuel-efficient: on average they get about 7.24 more miles per gallon. However, once the other vehicle characteristics are taken into account, the estimated advantage of a manual transmission falls to about 2.52 mpg and is no longer statistically significant. The relationship between miles per gallon and transmission type therefore appears to be spurious: the better fuel efficiency of manual cars is attributable to other, confounding variables rather than to the transmission type itself.

Exploratory Analysis

The mtcars data set is a data frame with 32 observations of 11 variables. The data “was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models).” It contains the following variables:

  1. mpg: Miles/(US) gallon
  2. cyl: Number of cylinders
  3. disp: Displacement (cu.in.)
  4. hp: Gross horsepower
  5. drat: Rear axle ratio
  6. wt: Weight (lb/1000)
  7. qsec: ¼ mile time
  8. vs: Engine shape (0 = V-shaped, 1 = straight)
  9. am: Transmission (0 = automatic, 1 = manual)
  10. gear: Number of forward gears
  11. carb: Number of carburetors
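The dimensions and the coding of am can be checked directly in R. This is a minimal sketch using only the built-in data set; the name transmission is illustrative and not part of the original analysis.

```r
# Load the built-in Motor Trend data set
data(mtcars)

# 32 cars (rows) and 11 variables (columns)
dim(mtcars)

# am is stored as numeric 0/1; attach readable labels
transmission <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("automatic", "manual"))

# Number of cars in each transmission group
table(transmission)
```

Note that the two groups are of unequal size (19 automatic, 13 manual).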

After loading the data set, an elementary statistical analysis was performed to discern the differences, if any, between transmission types with respect to miles per gallon. First, assuming that mpg is approximately normally distributed within each group, a Welch two-sample t-test was performed; the null hypothesis is that the mean mpg of automatic and manual cars is equal.

data(mtcars)
t.test(mpg ~ factor(am), data = mtcars)
## 
##  Welch Two Sample t-test
## 
## data:  mpg by factor(am)
## t = -3.767, df = 18.33, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.28  -3.21
## sample estimates:
## mean in group 0 mean in group 1 
##           17.15           24.39
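The Welch statistic in the output above can be reproduced by hand, which makes explicit that the test does not pool the two group variances. A minimal sketch in base R; the variable names are illustrative.

```r
data(mtcars)

# Split mpg by transmission type (0 = automatic, 1 = manual)
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Squared standard error of each group mean (variances are not pooled)
v_auto   <- var(auto) / length(auto)
v_manual <- var(manual) / length(manual)

# Welch t statistic: difference in means over its unpooled standard error
t_stat <- (mean(auto) - mean(manual)) / sqrt(v_auto + v_manual)

# Welch-Satterthwaite approximation to the degrees of freedom
welch_df <- (v_auto + v_manual)^2 /
  (v_auto^2 / (length(auto) - 1) + v_manual^2 / (length(manual) - 1))

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), welch_df)

c(t = t_stat, df = welch_df, p = p_value)
```

The sign of t is negative because the automatic group (level 0) comes first, matching the ordering t.test() uses for factor(am).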

Given these results (p = 0.0014), the null hypothesis was rejected: manual transmission cars are more gas-efficient than automatic ones, getting about 7.24 more miles per gallon on average (24.39 versus 17.15 mpg; 95% confidence interval for the difference: 3.21 to 11.28 mpg). To better illustrate the difference between the two groups, a bar plot was generated (see Plot #1 in the appendix) showing the mean mpg for each transmission type, with the standard deviation as the error bar. Again, vehicles with manual transmissions, on average, got more miles out of every gallon of gas they burned.
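The code behind the appendix plot is not shown in this report. The following is one way such a chart could be drawn in base R, assuming the bars show group means and the error bars span plus or minus one standard deviation; it is a sketch, not necessarily the code used for Plot #1.

```r
data(mtcars)

# Mean and standard deviation of mpg per transmission type (names "0", "1")
mpg_mean <- tapply(mtcars$mpg, mtcars$am, mean)
mpg_sd   <- tapply(mtcars$mpg, mtcars$am, sd)

# barplot() returns the x positions of the bar midpoints
mids <- barplot(mpg_mean,
                names.arg = c("Automatic", "Manual"),
                ylab = "Mean miles per gallon",
                ylim = c(0, max(mpg_mean + mpg_sd) * 1.1))

# Error bars from mean - sd to mean + sd, with flat caps at both ends
arrows(mids, mpg_mean - mpg_sd, mids, mpg_mean + mpg_sd,
       angle = 90, code = 3, length = 0.05)
```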

Regression Analysis

Although the t-test shows a difference in mpg between vehicles with automatic and manual transmissions, it cannot tell us whether transmission type alone is responsible or whether some combination of confounding variables is producing a spurious relationship. To control for the influence of other variables, a multivariate regression is fitted to the data set, using all ten other variables as predictors. This model is then compared, using ANOVA, with a univariate model that includes only transmission type as a predictor of mpg.

# Step 1: Create the univariate model (transmission type only)
uni <- lm(mpg ~ factor(am), data = mtcars)
# Step 2: Create the multivariate model (transmission type plus all other variables)
multi <- lm(mpg ~ factor(am) + cyl + disp + hp + drat + wt + qsec + vs + gear + carb, data = mtcars)
# Step 3: Display a summary of both models
summary(uni)
## 
## Call:
## lm(formula = mpg ~ factor(am), data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -9.392 -3.092 -0.297  3.244  9.508 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    17.15       1.12   15.25  1.1e-15 ***
## factor(am)1     7.24       1.76    4.11  0.00029 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.9 on 30 degrees of freedom
## Multiple R-squared:  0.36,   Adjusted R-squared:  0.338 
## F-statistic: 16.9 on 1 and 30 DF,  p-value: 0.000285

According to this basic model, there is a statistically significant relationship between transmission type and mpg (p = 0.00029): manual transmission vehicles get approximately 7.24 more miles per gallon than automatic vehicles. However, the multivariate model, which controls for all other variables, gives different results.
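With a single binary predictor, least squares simply reproduces the group means: the intercept is the automatic mean (17.15) and the factor(am)1 coefficient is the manual-minus-automatic difference (24.39 - 17.15, about 7.24). The regression t value (4.11) is larger in magnitude than the Welch value (3.767) because lm() assumes a common error variance, so it matches the pooled-variance t-test rather than Welch's. A short check in base R:

```r
data(mtcars)
uni <- lm(mpg ~ factor(am), data = mtcars)

# Group means of mpg (names "0" = automatic, "1" = manual)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

# Intercept = automatic mean; factor(am)1 = manual mean minus automatic mean
coef(uni)
group_means["1"] - group_means["0"]

# Pooled-variance t-test: its |t| equals the regression t value for factor(am)1
pooled <- t.test(mpg ~ factor(am), data = mtcars, var.equal = TRUE)
pooled$statistic
```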

summary(multi)
## 
## Call:
## lm(formula = mpg ~ factor(am) + cyl + disp + hp + drat + wt + 
##     qsec + vs + gear + carb, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
##  -3.45  -1.60  -0.12   1.22   4.63 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  12.3034    18.7179    0.66    0.518  
## factor(am)1   2.5202     2.0567    1.23    0.234  
## cyl          -0.1114     1.0450   -0.11    0.916  
## disp          0.0133     0.0179    0.75    0.463  
## hp           -0.0215     0.0218   -0.99    0.335  
## drat          0.7871     1.6354    0.48    0.635  
## wt           -3.7153     1.8944   -1.96    0.063 .
## qsec          0.8210     0.7308    1.12    0.274  
## vs            0.3178     2.1045    0.15    0.881  
## gear          0.6554     1.4933    0.44    0.665  
## carb         -0.1994     0.8288   -0.24    0.812  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared:  0.869,  Adjusted R-squared:  0.807 
## F-statistic: 13.9 on 10 and 21 DF,  p-value: 3.79e-07

Based on the multivariate model, manual transmission vehicles get only about 2.52 more miles per gallon than automatic ones, and this difference is no longer statistically significant (p = 0.234). Moreover, an ANOVA comparison of the two models shows that the additional predictors significantly reduce the residual sum of squares (p = 1.8e-05), so the multivariate model is preferred and the univariate model can be set aside.
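A p-value of 0.234 for the transmission coefficient is equivalent to its 95% confidence interval containing zero. With 21 residual degrees of freedom, the interval runs from roughly -1.76 to 6.80 mpg. A minimal sketch that computes it both with confint() and from the estimate and standard error:

```r
data(mtcars)
multi <- lm(mpg ~ factor(am) + cyl + disp + hp + drat + wt + qsec + vs + gear + carb,
            data = mtcars)

# 95% confidence interval for the transmission coefficient only
ci <- confint(multi, parm = "factor(am)1", level = 0.95)
ci

# Same interval by hand: estimate +/- t(0.975, residual df) * standard error
est <- coef(summary(multi))["factor(am)1", ]
est["Estimate"] + c(-1, 1) * qt(0.975, df.residual(multi)) * est["Std. Error"]
```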

# Step 4: Test whether the multivariate model fits significantly better than the univariate model
anova(uni,multi)
## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(am)
## Model 2: mpg ~ factor(am) + cyl + disp + hp + drat + wt + qsec + vs + 
##     gear + carb
##   Res.Df RSS Df Sum of Sq    F  Pr(>F)    
## 1     30 721                              
## 2     21 147  9       573 9.07 1.8e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
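The F statistic compares the drop in residual sum of squares per added parameter with the residual variance of the larger model: F = ((721 - 147) / (30 - 21)) / (147 / 21), about 9.1 with the rounded table entries (9.07 with the unrounded ones). A sketch reproducing it from the fitted models:

```r
data(mtcars)
uni   <- lm(mpg ~ factor(am), data = mtcars)
multi <- lm(mpg ~ factor(am) + cyl + disp + hp + drat + wt + qsec + vs + gear + carb,
            data = mtcars)

# Residual sums of squares and residual degrees of freedom of each model
rss_uni   <- deviance(uni)
rss_multi <- deviance(multi)
df_uni    <- df.residual(uni)     # 30
df_multi  <- df.residual(multi)   # 21

# Extra-sum-of-squares F test for the 9 added predictors
f_stat  <- ((rss_uni - rss_multi) / (df_uni - df_multi)) / (rss_multi / df_multi)
p_value <- pf(f_stat, df_uni - df_multi, df_multi, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```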

Residual Analysis

To validate this model we must examine its residuals, the differences between the observed mpg values and the values fitted by the regression. Residual plots let us assess whether the residuals are consistent with stochastic (random) error. If the residuals are random, the model's assumptions are plausible; if they form a systematic pattern, the model is missing structure and can still be improved.

To this end, we plot the residuals (see Plot #2 and Plot #3) to check their normality and spread. The Q-Q plot shows the residuals lying mostly near the reference line, implying that they are approximately normally distributed. The Residuals vs. Fitted plot shows points scattered randomly above and below the 0 line, so the residuals appear consistent with stochastic error.
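Both diagnostic panels can be produced with base graphics. This is a minimal sketch that may differ in styling from the appendix plots; residuals are computed as observed minus fitted mpg.

```r
data(mtcars)
multi <- lm(mpg ~ factor(am) + cyl + disp + hp + drat + wt + qsec + vs + gear + carb,
            data = mtcars)

# Residuals e_i = y_i - yhat_i and fitted values yhat_i
res <- resid(multi)
fit <- fitted(multi)

op <- par(mfrow = c(1, 2))    # two panels side by side
plot(fit, res, xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted")
abline(h = 0, lty = 2)        # residuals should scatter evenly around 0
qqnorm(res, main = "Normal Q-Q")
qqline(res)                   # points near this line suggest normal residuals
par(op)

# Built-in equivalent of these two panels: plot(multi, which = 1:2)
```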

Diagnostics

First we look at leverage, selecting the observations with the largest hat values (the diagonal elements of the hat matrix).

leverage <- hatvalues(multi)
head(sort(leverage, decreasing = TRUE), 3)
##       Merc 230 Ford Pantera L  Maserati Bora 
##         0.7423         0.6633         0.6428
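The hat value h_ii is the i-th diagonal entry of the hat matrix H = X(X'X)^{-1}X', which maps observed mpg onto fitted mpg; it depends only on the predictors, not on mpg itself. The values sum to the number of coefficients p (11 here), so the average leverage is p/n = 11/32, about 0.34. The sketch below rebuilds what hatvalues() returns using an explicit inverse for readability (R itself uses a QR decomposition); the 2p/n cutoff is a common rule of thumb, not something used in the original analysis.

```r
data(mtcars)
multi <- lm(mpg ~ factor(am) + cyl + disp + hp + drat + wt + qsec + vs + gear + carb,
            data = mtcars)

X <- model.matrix(multi)                    # 32 x 11 design matrix
H <- X %*% solve(crossprod(X)) %*% t(X)     # hat matrix X (X'X)^{-1} X'
h <- setNames(diag(H), rownames(X))         # leverages h_ii, named by car

p <- ncol(X)   # number of estimated coefficients (11)
n <- nrow(X)   # number of cars (32)

# Leverages sum to p
sum(h)

# Cars above the 2p/n rule-of-thumb cutoff (22/32, about 0.69)
names(h)[h > 2 * p / n]
```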

Next, we examine influential points by finding the observations with the largest DFBETAS values for the transmission coefficient, factor(am)1.

influential <- dfbetas(multi)

head(sort(influential[, "factor(am)1"], decreasing = TRUE), 3)
##       Fiat 128 Toyota Corolla       Merc 230 
##         0.5721         0.5630         0.4026
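DFBETAS measures how much one coefficient changes when a single car is dropped, in standard-error units: DFBETAS_ij = (b_j - b_j(i)) / (s_(i) sqrt([(X'X)^{-1}]_jj)), where b_j(i) and s_(i) come from the fit without car i. A positive value for factor(am)1 means that removing the car lowers the estimated transmission effect. The sketch below reproduces the Fiat 128 row by an explicit leave-one-out refit. A common rule of thumb (an assumption here, not part of the original analysis) flags |DFBETAS| > 2/sqrt(n), about 0.35 for n = 32; all three values above exceed it.

```r
data(mtcars)
multi <- lm(mpg ~ factor(am) + cyl + disp + hp + drat + wt + qsec + vs + gear + carb,
            data = mtcars)

car <- "Fiat 128"   # observation to drop

# Refit the same model without that car
fit_drop <- update(multi, data = mtcars[rownames(mtcars) != car, ])

# (X'X)^{-1} from the full-data design matrix
xtx_inv <- solve(crossprod(model.matrix(multi)))

# Coefficient change, scaled by the leave-one-out residual standard error
manual_dfbetas <- (coef(multi) - coef(fit_drop)) /
  (summary(fit_drop)$sigma * sqrt(diag(xtx_inv)))

round(manual_dfbetas["factor(am)1"], 4)

# Rule-of-thumb cutoff for |DFBETAS|
2 / sqrt(nrow(mtcars))
```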

Please see Plot #3 for a visual comparison.

Conclusion

Vehicles with manual transmissions do get better gas mileage on average; however, the evidence suggests that this is not by virtue of their transmission type. Once the other variables are controlled for, the estimated transmission effect shrinks to about 2.52 mpg and is no longer statistically significant, indicating that other variables play a larger role in determining a car's mpg (weight, with p = 0.063, comes closest to significance). The relationship between miles per gallon and transmission type therefore appears to be spurious, and further analysis is required to determine which factors have the greatest influence on vehicle gas mileage.

Appendix

Plot #1

Bar plot of mean mpg by transmission type (automatic vs. manual), with one standard deviation as the error bar.

Plot #2


Plot #3
