Executive Summary

In this project, prepared for Motor Trend magazine, we use a dataset of cars to examine the relationship between a set of variables and miles per gallon (MPG). In particular, we are interested in the following two questions:

  1. “Is an automatic or manual transmission better for MPG?”
  2. “Quantify the MPG difference between automatic and manual transmissions”

The analysis uses the mtcars dataset that ships with R. A univariate analysis of the data can lead to the misleading conclusion that cars with manual transmission obtain better mileage than cars with automatic transmission in all cases, because other variables, specifically weight and horsepower, confound the result.

However, after fitting a multivariate linear regression model that includes these variables, we found that cars with automatic transmission are expected to obtain better mileage than cars with manual transmission only when the car weighs more than approx. 3,000 lb (wt > 3). Below that threshold, cars with manual transmission are expected to obtain better mileage.

Analysis

Overview of the Dataset

The data is extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models). The mtcars dataset is a data frame with 32 observations on 11 variables (see Fig. 1).

aggregate(mtcars$mpg,by=list(mtcars$am),FUN=mean)
##     Group.1        x
## 1 Automatic 17.14737
## 2    Manual 24.39231
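The group labels in the output above imply that am was recoded from its raw 0/1 values into a labelled factor before the comparison. A minimal sketch of that step, assuming the labels Automatic and Manual from Fig. 1 and a working copy named cars (a name chosen here only for illustration), might look like this:

```r
# Work on a copy (hypothetical name `cars`) so the built-in mtcars stays untouched.
cars <- mtcars

# Assumption: 0 = automatic, 1 = manual, as listed in Fig. 1.
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Mean mpg per transmission type.
group_means <- aggregate(mpg ~ am, data = cars, FUN = mean)
print(group_means)
```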

A quick comparison of the mean mileage (mpg) by transmission type (am) indicates that cars with manual transmission obtain better mileage (more miles per gallon) than cars with automatic transmission.

However, Fig. 2 shows that am has a relatively large Pr(>|t|) value once other factors such as weight (wt) and horsepower (hp) are also in the model. We therefore cannot reject the null hypothesis that transmission type has no effect on mileage after adjusting for these factors, and its relationship to mpg warrants further investigation.
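The report does not state the formula behind Fig. 2. Judging from the row names of that table (amManual, wt, hp, gear4, gear5, drat), it may have been a model along the following lines. This is a sketch under that assumption; the names cars and fit_general are illustrative.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Assumption: gear was treated as categorical, which would explain rows gear4 and gear5.
cars$gear <- factor(cars$gear)

# Assumed formula, inferred from the row names in Fig. 2 rather than stated in the report.
fit_general <- lm(mpg ~ am + wt + hp + gear + drat, data = cars)

# Estimate, standard error, t value and Pr(>|t|) for each term.
coef_table <- coef(summary(fit_general))
print(coef_table)
```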

Relationship Between Transmission Type and Mileage

An initial analysis of the difference in means (Welch two-sample t-test, p = 0.001; see Fig. 4) found, at the 5% significance level, that cars with manual transmission obtain better mileage (mpg) than cars with automatic transmission.
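The test reported in Fig. 4 can be sketched as follows, assuming am has been recoded as a labelled factor on a working copy (cars is an illustrative name). R's t.test uses the Welch variant by default.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Welch two-sample t-test (var.equal = FALSE is the default).
welch <- t.test(mpg ~ am, data = cars)
print(welch)

# The p-value and confidence interval can also be read off directly.
welch$p.value
welch$conf.int
```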

However, further investigation revealed that this conclusion may be incorrect, because the cars with automatic transmission in the dataset tend to be heavier and to have engines with higher horsepower (Fig. 5). Fig. 2 shows that weight (wt) and horsepower (hp) are more strongly associated with mpg, and these variables confound the relationship between transmission type and mileage.
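A quick numeric check of this imbalance, as a sketch that complements Fig. 5 rather than reproducing it, is to compare mean weight and horsepower per transmission type (cars and confounders are illustrative names):

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Mean weight (1000 lb) and gross horsepower per transmission type.
confounders <- aggregate(cbind(wt, hp) ~ am, data = cars, FUN = mean)
print(confounders)
```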

Quantifying the MPG Difference Between Manual & Automatic Transmissions

Appealing to how cars work in practice, we can reduce the number of variables in the dataset to just mpg, wt, hp, and am, with the aim of building the most parsimonious model of our data (i.e. using the fewest variables to explain the most variation).

  • cyl, disp, carb and vs are the properties of an engine, and the interactions of these variables predict hp, which is a measure of the engine’s performance. Therefore it is simpler to consider only hp for our model.

  • qsec is the time a car takes to cover a quarter mile. It depends largely on wt, hp, gear and drat, but does not itself influence mpg. Hence it is excluded from the model.

  • gear and drat are weak regressors for mpg based on their Pr(>|t|) values in Fig. 2, so these variables are excluded from the model.

  • It is reasonable to assume that as wt increases hp also increases, as more power is needed to move a heavier car.
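The assumption in the last bullet can be checked against the data with a simple correlation. This is only a sketch of such a check; wt_hp_cor is an illustrative name.

```r
# Pearson correlation between weight and horsepower in mtcars.
wt_hp_cor <- cor(mtcars$wt, mtcars$hp)
print(wt_hp_cor)
```

A positive value is consistent with heavier cars tending to have more powerful engines.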

After comparing several models using ANOVA (Fig. 6) and performing diagnostics (Fig. 7), we selected mpg ~ (hp + wt) * am as the best-fit model.

The model's R^2 indicates that it explains 87.9% of the variance in mpg, and the p-value of the overall F-test (3.9e-11, Fig. 6) shows that the fit is highly significant. We use this model to support our conclusions.
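These two quantities can be read from the fitted model's summary. The sketch below assumes am has been recoded as a labelled factor; cars, best_fit and the other variable names are illustrative.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

best_fit <- lm(mpg ~ (hp + wt) * am, data = cars)
fit_summary <- summary(best_fit)

# Share of the variance in mpg explained by the model.
r_squared <- fit_summary$r.squared

# p-value of the overall F-test, computed from the stored F statistic.
f <- fit_summary$fstatistic
f_p_value <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

cat(sprintf("R^2 = %.4f, F-test p-value = %.3g\n", r_squared, f_p_value))
```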

Conclusion

Our chosen model shows that, for automatic cars, mpg is expected to decrease by 0.0409 for every additional unit of horsepower (hp) and by 1.856 for every additional 1,000 lb of weight (wt), holding the other variable constant. For manual cars, mpg is expected to decrease by 0.0132 for every additional unit of horsepower and by 7.625 for every additional 1,000 lb of weight. Clearly, mileage is strongly associated with weight and is also influenced by the transmission type.
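The slopes for manual cars are not printed directly in Fig. 6; they are the sum of the baseline (automatic) slope and the corresponding interaction term. A minimal sketch of that arithmetic, with illustrative variable names:

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

best_fit <- lm(mpg ~ (hp + wt) * am, data = cars)
b <- coef(best_fit)

# Automatic is the reference level, so its slopes are the main effects.
# Manual slopes add the interaction terms to the main effects.
slopes <- data.frame(
  transmission = c("Automatic", "Manual"),
  hp = unname(c(b["hp"], b["hp"] + b["hp:amManual"])),
  wt = unname(c(b["wt"], b["wt"] + b["wt:amManual"]))
)
print(slopes)
```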

The relationship between transmission type and mileage is easier to visualise with a simpler model that drops the hp regressor, mpg ~ wt * am (R^2 = 0.833) (Fig. 8). It shows that manual cars obtain better mileage among cars weighing less than approx. 3,000 lb, while automatic cars obtain better mileage as the cars get heavier. This may explain why car manufacturers use automatic transmission for heavier cars.
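In the simpler model, the predicted difference between manual and automatic cars at a given weight is the amManual coefficient plus the wt:amManual coefficient times the weight. Setting that difference to zero gives the weight at which the two fitted lines cross. The following sketch computes it; simple_fit and crossover_wt are illustrative names.

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

simple_fit <- lm(mpg ~ wt * am, data = cars)
b <- coef(simple_fit)

# Manual minus automatic prediction at weight w:
#   b["amManual"] + b["wt:amManual"] * w
# Solving for zero gives the crossover weight (in units of 1000 lb).
crossover_wt <- unname(-b["amManual"] / b["wt:amManual"])

cat(sprintf("R^2 = %.4f, crossover weight = %.2f (1000 lb)\n",
            summary(simple_fit)$r.squared, crossover_wt))
```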

Appendix

Figure 1. Dataset Variables

Variable   Description
mpg        Miles/(US) gallon
cyl        Number of cylinders
disp       Displacement (cu.in.)
hp         Gross horsepower
drat       Rear axle ratio
wt         Weight (1000 lb)
qsec       1/4 mile time
vs         V/S (V- or Straight configuration engine)
am         Transmission (0 = automatic, 1 = manual)
gear       Number of forward gears
carb       Number of carburetors

Figure 2. Estimating General Relationships

##                Estimate Std. Error    t value     Pr(>|t|)
## (Intercept) 29.19860367 7.22133946  4.0433778 0.0004431796
## amManual     1.58034169 1.86302859  0.8482649 0.4043394546
## wt          -2.55931331 1.07054018 -2.3906747 0.0246683636
## hp          -0.03962988 0.01342864 -2.9511468 0.0067880475
## gear4       -0.44504280 2.06503339 -0.2155136 0.8311155036
## gear5        0.27686650 2.35548118  0.1175414 0.9073703173
## drat         1.22917279 1.78243270  0.6896040 0.4967949659

Figure 3. Estimating General Relationships

Figure 4. Mileage by Transmission Type

## 
##  Welch Two Sample t-test
## 
## data:  mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231

Figure 5. Weight & Horsepower vs Transmission Type

Figure 6. Model Selection using ANOVA

model.0 <- lm(mpg~wt,data=mtcars)
model.1 <- lm(mpg~wt+hp,data=mtcars)
model.2 <- lm(mpg~(wt+hp)*am,data=mtcars)

anova(model.0,model.1,model.2)
## Analysis of Variance Table
## 
## Model 1: mpg ~ wt
## Model 2: mpg ~ wt + hp
## Model 3: mpg ~ (wt + hp) * am
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)    
## 1     30 278.32                                 
## 2     29 195.05  1    83.274 15.932 0.000478 ***
## 3     26 135.90  3    59.148  3.772 0.022631 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# model.2 was chosen as best fit
summary(model.2)
## 
## Call:
## lm(formula = mpg ~ (wt + hp) * am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9873 -1.4467 -0.5355  1.2614  5.5987 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.70393    2.67515  11.477 1.12e-11 ***
## wt          -1.85591    0.94511  -1.964  0.06034 .  
## hp          -0.04094    0.01363  -3.004  0.00583 ** 
## amManual    13.74000    4.22337   3.253  0.00316 ** 
## wt:amManual -5.76895    2.07201  -2.784  0.00987 ** 
## hp:amManual  0.02779    0.01921   1.447  0.15983    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.286 on 26 degrees of freedom
## Multiple R-squared:  0.8793, Adjusted R-squared:  0.8561 
## F-statistic: 37.89 on 5 and 26 DF,  p-value: 3.901e-11

Figure 7. Diagnostics of best fit model: mpg ~ (hp + wt) * am
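The diagnostic plots themselves are not reproduced here. A common way to produce such a panel in base R, shown as a sketch and not necessarily the method used for this figure, is to plot the fitted model on a 2 x 2 grid:

```r
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

best_fit <- lm(mpg ~ (hp + wt) * am, data = cars)

# Four standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, and residuals vs leverage.
op <- par(mfrow = c(2, 2))
plot(best_fit)
par(op)  # restore the previous plotting layout
```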

Figure 8. Visualization of MPG vs Weight by Transmission Type

Model: mpg ~ wt * am