Executive Summary

This analysis studies how aspects of automobile design relate to the fuel efficiency of 32 automobiles. Using exploratory data analysis, statistical inference and regression modeling, it asks whether automatic or manual transmission is better for mpg (fuel efficiency) and quantifies the difference. Manual transmission is associated with significantly better fuel efficiency in the initial statistical analysis, but regression modeling shows that other factors explain the variability in mpg better.

Exploratory Data Analysis

The cars are 1973-74 models, and the data were extracted from the 1974 Motor Trend US magazine. Fuel efficiency is measured in miles per gallon (mpg): the higher the mpg, the better the fuel efficiency. The 10 aspects of automobile design collected in the dataset are listed in Appendix 1.

Appendix 2 shows the box plot of the cars' mpg categorised by transmission type. Next to the box plot is the correlation matrix of all 11 variables in the dataset (mpg plus the 10 design aspects).

The box plot suggests that manual transmission cars have better fuel efficiency than automatic transmission cars: they achieve a higher median mpg.
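The comparison can be reproduced with a short sketch. It assumes that `mycars` is the built-in `mtcars` data with `am` converted to a factor (the report does not show how `mycars` was built); the factor labels and axis labels are illustrative choices.

```r
# Assumption: mycars is mtcars with am stored as a factor
mycars <- transform(mtcars,
                    am = factor(am, labels = c("automatic", "manual")))

# Box plot of mpg by transmission type
boxplot(mpg ~ am, data = mycars,
        xlab = "Transmission", ylab = "Miles per gallon (mpg)")

# Median mpg within each transmission group
med <- tapply(mycars$mpg, mycars$am, median)
print(med)
```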

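# Welch two-sample t test (unequal variances); 'paired' is omitted because
# recent R versions reject it in the formula interface and it defaults to FALSE
t.test(mpg ~ am, var.equal=FALSE, data=mycars)$p.value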
## [1] 0.001373638

A t test is conducted to check whether the difference between the mean mpg of the two transmission types is significantly different from zero. Since the p-value of 0.001 is less than \(\alpha = 0.05\), we reject the null hypothesis. At the 5% level of significance, the data provide sufficient evidence that the mean mpg of manual and automatic transmission cars differ.
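Beyond the p-value, the same test object also carries the group means and a confidence interval for their difference. The sketch below is one way to inspect them, assuming `mycars` is `mtcars` with `am` converted to a factor.

```r
# Assumption: mycars is mtcars with am stored as a factor (0 = automatic, 1 = manual)
mycars <- transform(mtcars, am = factor(am))

# Welch two-sample t test of mpg by transmission type
tt <- t.test(mpg ~ am, data = mycars, var.equal = FALSE)

print(tt$estimate)  # mean mpg in group 0 and in group 1
print(tt$conf.int)  # 95% CI for mean(group 0) - mean(group 1)
print(tt$p.value)   # p-value of the test
```

An interval that excludes zero leads to the same conclusion as the p-value, and it also indicates how large the difference plausibly is.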

However, the observations made on the correlation matrix suggest that other aspects of the car may contribute more to fuel efficiency than transmission type does.

Regression Modeling

To begin, a linear regression on mpg vs am is performed and set as a base model.

fit1 <- lm(mpg ~ am, mycars)
summary(fit1)$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## am1          7.244939   1.764422  4.106127 2.850207e-04
summary(fit1)$r.squared
## [1] 0.3597989

Note that the base model has an \(R^2\) of 0.36, which means it explains only 36% of the total variability in mpg. The model returns a regression coefficient of 7.24 for am1. With 0 and 1 referring to automatic and manual transmission respectively, this coefficient shows that the mean mpg of manual transmission cars is 7.24 higher than that of automatic transmission cars. Hence, in this model, manual transmission cars have better fuel efficiency.
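With a single two-level factor as the regressor, the intercept is the mean mpg of the reference group and the am1 coefficient is the difference between the two group means. A minimal check, assuming `mycars` is `mtcars` with `am` converted to a factor:

```r
# Assumption: mycars is mtcars with am stored as a factor (0 = automatic, 1 = manual)
mycars <- transform(mtcars, am = factor(am))
fit1 <- lm(mpg ~ am, mycars)

# Mean mpg within each transmission group
group_means <- tapply(mycars$mpg, mycars$am, mean)

# Intercept = mean of automatic cars; am1 = mean(manual) - mean(automatic)
intercept_matches <- isTRUE(all.equal(unname(coef(fit1)["(Intercept)"]),
                                      unname(group_means["0"])))
slope_matches <- isTRUE(all.equal(unname(coef(fit1)["am1"]),
                                  unname(group_means["1"] - group_means["0"])))
print(c(intercept = intercept_matches, am1 = slope_matches))
```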

To achieve a better \(R^2\), and since mpg is highly negatively correlated with cyl, disp, hp and wt, these factors are modeled to determine whether they help explain fuel efficiency. A total of 5 nested linear regression models were compared using anova(). The test results can be seen in Appendix 3. Model 3 and Model 5 did not significantly reduce the RSS (p = 0.92 and p = 0.22), which implies that disp and am contribute little to the model fit. Adding cyl (Model 4) is borderline (p = 0.074); it is retained in the final model.

fitfinal <- lm(mpg ~ wt+hp+cyl, mycars)
summary(fitfinal)$coef
##                Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 35.84599532  2.0410191 17.562793 2.670320e-16
## wt          -3.18140405  0.7196010 -4.421067 1.441756e-04
## hp          -0.02311981  0.0119522 -1.934357 6.361269e-02
## cyl6        -3.35902490  1.4016697 -2.396445 2.374718e-02
## cyl8        -3.18588444  2.1704753 -1.467828 1.537047e-01
summary(fitfinal)$r.squared
## [1] 0.8572195

Using wt, hp and cyl as the regressors, the model has an \(R^2\) of 0.86, so it explains 86% of the total variability in mpg. The cyl coefficients are relative to cars with 4 cylinders: holding wt and hp fixed, 6- and 8-cylinder cars have an expected mpg 3.36 and 3.19 lower respectively, although the cyl8 estimate is not significant at the 5% level (p = 0.15). A 1000 lb increase in wt is associated with a decrease in mpg of 3.18, and a one unit increase in hp with a slight decrease of 0.02 (p = 0.064).
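To make the coefficients concrete, the sketch below predicts mpg for a hypothetical car (3000 lb, 110 hp, 6 cylinders; these values are invented for illustration) and repeats the calculation by hand from the coefficients. It assumes `mycars` is `mtcars` with `cyl` and `am` converted to factors. From the coefficients printed above, the prediction is roughly 35.85 - 3.18 × 3 - 0.023 × 110 - 3.36 ≈ 20.4 mpg.

```r
# Assumption: mycars is mtcars with cyl and am stored as factors
mycars <- transform(mtcars, am = factor(am), cyl = factor(cyl))
fitfinal <- lm(mpg ~ wt + hp + cyl, mycars)

# Hypothetical car: wt = 3 (i.e. 3000 lb), 110 hp, 6 cylinders
newcar <- data.frame(wt = 3, hp = 110,
                     cyl = factor("6", levels = levels(mycars$cyl)))
pred <- unname(predict(fitfinal, newdata = newcar))

# The same prediction assembled by hand from the coefficients
b <- coef(fitfinal)
by_hand <- b[["(Intercept)"]] + b[["wt"]] * 3 + b[["hp"]] * 110 + b[["cyl6"]]

print(c(predict = pred, by_hand = by_hand))
```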

Appendix 4 shows the 4 diagnostic plots of the final model. No patterns such as heteroskedasticity are detected in the residuals. The residuals look normally distributed, and no points have substantial influence on the regression model.

Conclusion

It is determined that neither automatic nor manual transmission is necessarily better for mpg. Other aspects of a car, such as lower weight and fewer cylinders, contribute more to fuel efficiency. When transmission type is added to the final regression model (see results in Appendix 5), the \(R^2\) increases only from 0.857 to 0.866, and the am1 coefficient is not significant (p = 0.21). This suggests that the more complex model with am1 is not necessary. Moreover, comparing the am1 coefficient in this model with the base model, the estimated difference in mpg between manual and automatic transmission drops from 7.24 in the base model to just 1.81.

Finally, this study was done on an old and small sample dataset, so the results may not reflect cars manufactured now. For a more thorough study, more data on different cars should be collected, and the comparison should be made between cars with similar specifications, such as similar weight and the same number of cylinders. Only then can a stronger conclusion be drawn on whether manual transmission brings better fuel efficiency.


Appendix

Appendix 1 - 10 aspects of automobile design collected in the data set

Column Name   Class     Description
wt            numeric   Weight (1000 lbs)
cyl           factor    Number of cylinders (4, 6 or 8)
disp          numeric   Displacement (cu.in.)
hp            numeric   Gross horsepower
gear          factor    Number of forward gears (3, 4 or 5)
carb          factor    Number of carburetors (1, 2, 3, 4, 6 or 8)
drat          numeric   Rear axle ratio
vs            factor    V/S - Type of Engine (0 = V-engine, 1 = Straight-engine)
am            factor    Transmission (0 = automatic, 1 = manual)
qsec          numeric   1/4 mile time
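The report does not show how `mycars` was created. One plausible construction, consistent with the classes listed above, starts from the built-in `mtcars` data and converts the five categorical columns to factors. This is an assumption, not the author's confirmed preprocessing.

```r
# Assumption: mycars is the built-in mtcars data with categorical columns as factors
mycars <- mtcars
factor_cols <- c("cyl", "gear", "carb", "vs", "am")
mycars[factor_cols] <- lapply(mycars[factor_cols], factor)

# Inspect the resulting column classes
str(mycars)
```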

Appendix 2 - Exploratory Data Analysis

Observations made on the correlation matrix:

  • weight (wt) is highly negatively correlated with mpg. The higher the wt, the lower the fuel efficiency of the car.
  • wt is highly correlated with number of cylinders (cyl) and displacement (disp). This is expected: the more cylinders a car has, the heavier it is, and the larger the displacement (capacity) of the cylinders, the heavier the car.
  • with more cyl and disp in the car, one also expects the car's horsepower (hp) to increase. Hence the high correlation observed among the 3 factors cyl, disp and hp. This leads to hp also being slightly correlated with wt.
  • qsec is the time taken for a stationary car to cover a 1/4 mile distance. It is not an aspect of the car that affects fuel efficiency, so it is excluded from the analysis. Note that qsec is highly negatively correlated with hp. This makes sense: the higher the hp, the less time the car takes to cover 1/4 mile.
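The correlations discussed above can be looked up with a short sketch. It assumes the matrix was computed on the all-numeric built-in `mtcars` data (before any factor conversion), which the report does not state explicitly.

```r
# Assumption: correlations are computed on the numeric mtcars columns
cm <- cor(mtcars)

# mpg against the four strongly related design aspects
print(round(cm["mpg", c("wt", "cyl", "disp", "hp")], 2))

# wt against cyl, disp and hp
print(round(cm["wt", c("cyl", "disp", "hp")], 2))

# qsec against hp
print(round(cm["qsec", "hp"], 2))
```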

Appendix 3 - Anova results

fit2 <- lm(mpg ~ wt, mycars)
fit3 <- lm(mpg ~ wt+hp, mycars)
fit4 <- lm(mpg ~ wt+hp+disp, mycars)
fit5 <- lm(mpg ~ wt+hp+disp+cyl, mycars)
fit6 <- lm(mpg ~ wt+hp+disp+cyl+am, mycars)

anova(fit2, fit3, fit4, fit5, fit6)
## Analysis of Variance Table
## 
## Model 1: mpg ~ wt
## Model 2: mpg ~ wt + hp
## Model 3: mpg ~ wt + hp + disp
## Model 4: mpg ~ wt + hp + disp + cyl
## Model 5: mpg ~ wt + hp + disp + cyl + am
##   Res.Df    RSS Df Sum of Sq       F   Pr(>F)   
## 1     30 278.32                                 
## 2     29 195.05  1    83.274 13.8413 0.001012 **
## 3     28 194.99  1     0.057  0.0095 0.923183   
## 4     26 160.13  2    34.864  2.8974 0.073837 . 
## 5     25 150.41  1     9.718  1.6153 0.215451   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
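The F column can be checked by hand. When anova() compares several nested models, each F statistic divides the drop in RSS per degree of freedom by the residual mean square of the largest model (here Model 5, 150.41 / 25). The sketch below recomputes the Model 2 row, assuming `mycars` is `mtcars` with `cyl` and `am` converted to factors; the helper `rss` is introduced only for this illustration.

```r
# Assumption: mycars is mtcars with cyl and am stored as factors
mycars <- transform(mtcars, am = factor(am), cyl = factor(cyl))
fit2 <- lm(mpg ~ wt, mycars)
fit3 <- lm(mpg ~ wt + hp, mycars)
fit6 <- lm(mpg ~ wt + hp + disp + cyl + am, mycars)

# Hypothetical helper: residual sum of squares of a fitted model
rss <- function(fit) sum(residuals(fit)^2)

# Residual mean square of the largest model (denominator of every F)
sigma2_full <- rss(fit6) / df.residual(fit6)

# F statistic and p-value for adding hp to the wt-only model
df_hp <- df.residual(fit2) - df.residual(fit3)
f_hp <- ((rss(fit2) - rss(fit3)) / df_hp) / sigma2_full
p_hp <- pf(f_hp, df_hp, df.residual(fit6), lower.tail = FALSE)
print(c(F = f_hp, p = p_hp))
```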

Appendix 4 - Final Model Diagnostic Plots
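The four diagnostic plots can be regenerated with base R. The sketch assumes `mycars` is `mtcars` with `cyl` and `am` converted to factors and that the final model is `mpg ~ wt + hp + cyl`; the Cook's distance listing is an added numeric companion to the plots, not part of the original analysis.

```r
# Assumption: mycars is mtcars with cyl and am stored as factors
mycars <- transform(mtcars, am = factor(am), cyl = factor(cyl))
fitfinal <- lm(mpg ~ wt + hp + cyl, mycars)

# 2 x 2 grid: residuals vs fitted, normal Q-Q, scale-location,
# residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(fitfinal)
par(op)

# Cars with the largest Cook's distance (most influence on the fit)
cd <- cooks.distance(fitfinal)
print(head(sort(cd, decreasing = TRUE), 3))
```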

Appendix 5 - Final model with additional am as regressor

fitfinal <- lm(mpg ~ wt+hp+cyl+am, mycars)
summary(fitfinal)$coef
##                Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 33.70832390 2.60488618 12.940421 7.733392e-13
## wt          -2.49682942 0.88558779 -2.819404 9.081408e-03
## hp          -0.03210943 0.01369257 -2.345025 2.693461e-02
## cyl6        -3.03134449 1.40728351 -2.154040 4.068272e-02
## cyl8        -2.16367532 2.28425172 -0.947214 3.522509e-01
## am1          1.80921138 1.39630450  1.295714 2.064597e-01
summary(fitfinal)$r.squared
## [1] 0.8658799
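Whether am adds anything once wt, hp and cyl are in the model can also be tested with a partial F test, alongside the adjusted \(R^2\), which penalises the extra regressor. The sketch assumes `mycars` is `mtcars` with `cyl` and `am` converted to factors; the names `fit_base` and `fit_am` are illustrative.

```r
# Assumption: mycars is mtcars with cyl and am stored as factors
mycars <- transform(mtcars, am = factor(am), cyl = factor(cyl))
fit_base <- lm(mpg ~ wt + hp + cyl, mycars)
fit_am <- lm(mpg ~ wt + hp + cyl + am, mycars)

# Partial F test for adding am to the final model
cmp <- anova(fit_base, fit_am)
p_am <- cmp[["Pr(>F)"]][2]

# Adjusted R-squared with and without am
adj <- c(without_am = summary(fit_base)$adj.r.squared,
         with_am = summary(fit_am)$adj.r.squared)

print(p_am)
print(adj)
```

Because am adds a single coefficient, this p-value is the same as the one reported for am1 in the coefficient table above.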