This analysis studies how aspects of automobile design relate to the fuel efficiency of 32 automobiles. Using exploratory data analysis, statistical inference and regression modeling, it asks whether automatic or manual transmission is better for mpg (fuel efficiency) and quantifies the difference. The conclusion is that, although manual transmission is associated with significantly better fuel efficiency in the initial statistical analysis, regression modeling shows that other factors explain the variability in mpg better.
The cars are 1973-74 models, and the data were extracted from the 1974 Motor Trend US magazine. Fuel efficiency is measured in miles per gallon (mpg); the higher the mpg, the better the fuel efficiency. The 10 aspects of automobile design collected in the dataset are listed in Appendix 1.
Appendix 2 shows the box plot of cars’ mpg categorised by their respective transmission type. Next to the box plot is the correlation matrix of all 11 factors in the dataset.
The box plot suggests that manual transmission cars tend to have better fuel efficiency than automatic transmission cars: their median mpg is higher.
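The medians behind the box plot can be checked numerically. The following is a minimal sketch, assuming that `mycars` is R's built-in `mtcars` data with `am` converted to a factor; the construction of `mycars` is not shown in this report, so that step is an assumption.

```r
# Assumption: mycars is the built-in mtcars data with am stored as a factor
mycars <- mtcars
mycars$am <- factor(mycars$am, levels = c(0, 1))

# Median mpg per transmission type (0 = automatic, 1 = manual)
med <- tapply(mycars$mpg, mycars$am, median)
print(med)

# Box plot of mpg grouped by transmission type
boxplot(mpg ~ am, data = mycars,
        names = c("Automatic", "Manual"),
        xlab = "Transmission", ylab = "Miles per gallon")
```

Printing the medians alongside the plot makes the comparison explicit rather than leaving it to be read off the figure.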
t.test(mpg ~ am, var.equal=FALSE, data=mycars)$p.value
## [1] 0.001373638
A Welch two-sample t test is conducted to check whether the difference between the mean mpg of the two transmission types is significantly different from zero. Since the p-value of 0.001 is less than α = 0.05, we reject the null hypothesis of equal means. At the 5% level of significance, the data provide sufficient evidence that the mean mpg of manual and automatic transmission cars differ.
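To see what the test is comparing, the group means and the confidence interval for their difference can be inspected alongside the p-value. This is a sketch under the assumption that `mycars` is the built-in `mtcars` data with `am` as a factor.

```r
# Assumption: mycars is the built-in mtcars data with am stored as a factor
mycars <- mtcars
mycars$am <- factor(mycars$am, levels = c(0, 1))

# Welch two-sample t test (unequal variances)
res <- t.test(mpg ~ am, data = mycars, var.equal = FALSE)

res$estimate   # mean mpg in each group (0 = automatic, 1 = manual)
res$conf.int   # 95% confidence interval for (automatic mean - manual mean)
res$p.value    # compared against alpha = 0.05
```

A confidence interval that excludes zero leads to the same decision as the p-value, and it also indicates the plausible size of the difference.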
However, the observations made on the correlation matrix suggest that other aspects of a car may contribute more to fuel efficiency than transmission type.
To begin, a linear regression of mpg on am is fitted and used as the base model.
fit1 <- lm(mpg ~ am, mycars)
summary(fit1)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147368 1.124603 15.247492 1.133983e-15
## am1 7.244939 1.764422 4.106127 2.850207e-04
summary(fit1)$r.squared
## [1] 0.3597989
Note that the base model has an \(R^2\) of 0.36, meaning that it explains only 36% of the total variability in mpg. The model returns a regression coefficient of 7.24 for am1. With 0 and 1 referring to automatic and manual transmission respectively, this coefficient shows that the mean mpg of manual cars is 7.24 higher than that of automatic cars, whose mean is the intercept, 17.15. Hence, manual transmission cars show better fuel efficiency in this model.
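With a single two-level factor as the regressor, the intercept is the mean of the reference group and the am1 coefficient is the difference between the two group means. A short sketch illustrating this, assuming `mycars` is the built-in `mtcars` data with `am` as a factor:

```r
# Assumption: mycars is the built-in mtcars data with am stored as a factor
mycars <- mtcars
mycars$am <- factor(mycars$am, levels = c(0, 1))

fit1 <- lm(mpg ~ am, data = mycars)
group_means <- tapply(mycars$mpg, mycars$am, mean)

coef(fit1)     # intercept and am1 coefficient
group_means    # mean mpg for automatic (0) and manual (1)

# Intercept = automatic mean; am1 = manual mean - automatic mean
intercept_ok <- all.equal(unname(coef(fit1)["(Intercept)"]),
                          unname(group_means["0"]))
slope_ok <- all.equal(unname(coef(fit1)["am1"]),
                      unname(group_means["1"] - group_means["0"]))
c(intercept_ok, slope_ok)
```

This is why the base model reaches the same conclusion as the t test: both compare the two group means and nothing else.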
In an attempt to achieve a better \(R^2\), and since mpg is strongly negatively correlated with cyl, disp, hp and wt, these factors are modeled to determine whether they help explain fuel efficiency. A total of 5 nested linear regression models were compared using ANOVA. The test results can be seen in Appendix 3. Model 3 and Model 5 did not significantly reduce the RSS, which implies that disp and am contribute little to the model fit once the other regressors are included. Adding cyl (Model 4) is only marginally significant (p = 0.074), but cyl is kept in the final model.
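The negative correlations mentioned above can be computed directly. This is a minimal sketch that assumes the analysis data derive from R's built-in `mtcars`; it uses the numeric `cyl` column from `mtcars` rather than the factor version, since a correlation needs numeric input.

```r
# Assumption: the report's data come from the built-in mtcars data set
vars <- c("cyl", "disp", "hp", "wt")

# Pearson correlation of mpg with each candidate regressor
cors <- sapply(vars, function(v) cor(mtcars$mpg, mtcars[[v]]))
round(cors, 2)
```

Strong pairwise correlation with mpg motivates trying these variables, but because they are also correlated with each other, the nested-model comparison is still needed to decide which ones add explanatory power.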
fitfinal <- lm(mpg ~ wt+hp+cyl, mycars)
summary(fitfinal)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 35.84599532 2.0410191 17.562793 2.670320e-16
## wt -3.18140405 0.7196010 -4.421067 1.441756e-04
## hp -0.02311981 0.0119522 -1.934357 6.361269e-02
## cyl6 -3.35902490 1.4016697 -2.396445 2.374718e-02
## cyl8 -3.18588444 2.1704753 -1.467828 1.537047e-01
summary(fitfinal)$r.squared
## [1] 0.8572195
With wt, hp and cyl as the regressors, the model has an \(R^2\) of 0.86, meaning that it explains 86% of the total variability in mpg. The cyl coefficients are relative to cars with 4 cylinders: holding wt and hp constant, cars with 6 and 8 cylinders have a lower mpg by 3.36 and 3.19 respectively. Holding the other regressors constant, a 1000 lb increase in wt decreases mpg by 3.18, and a one-unit increase in hp decreases mpg slightly, by 0.02.
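The coefficient interpretations can be illustrated with predictions for a few hypothetical cars. In this sketch the cars in `newcars` are made up for illustration, and `mycars` is assumed to be the built-in `mtcars` data with `cyl` as a factor.

```r
# Assumption: mycars is the built-in mtcars data with cyl stored as a factor
mycars <- mtcars
mycars$cyl <- factor(mycars$cyl)
fitfinal <- lm(mpg ~ wt + hp + cyl, data = mycars)

# Hypothetical cars: a baseline, the same car with 6 cylinders,
# and the baseline made 1000 lb heavier
newcars <- data.frame(
  wt  = c(3, 3, 4),
  hp  = c(110, 110, 110),
  cyl = factor(c(4, 6, 4), levels = c(4, 6, 8))
)
pred <- predict(fitfinal, newdata = newcars)

diff_cyl6 <- unname(pred[2] - pred[1])  # matches the cyl6 coefficient
diff_wt   <- unname(pred[3] - pred[1])  # matches the wt coefficient
c(diff_cyl6 = diff_cyl6, diff_wt = diff_wt)
```

Each difference reproduces one coefficient because only one regressor changes between the compared cars, which is what "holding the others constant" means in practice.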
Appendix 4 shows the 4 diagnostic plots of the final model. No patterns such as heteroskedasticity are detected in the residuals. The residuals look approximately normally distributed, and no points have substantial influence on the regression model.
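The diagnostic plots can be reproduced with base R, and a couple of numeric checks can complement the visual reading. This is a sketch only: the Shapiro-Wilk test and Cook's distance are additions for illustration and are not part of the original analysis, and `mycars` is assumed to be the built-in `mtcars` data with `cyl` as a factor.

```r
# Assumption: mycars is the built-in mtcars data with cyl stored as a factor
mycars <- mtcars
mycars$cyl <- factor(mycars$cyl)
fitfinal <- lm(mpg ~ wt + hp + cyl, data = mycars)

# The four standard diagnostic plots for an lm fit
op <- par(mfrow = c(2, 2))
plot(fitfinal)
par(op)

# Numeric companions to the plots (illustrative additions)
shapiro.test(residuals(fitfinal))$p.value  # normality of residuals
infl <- cooks.distance(fitfinal)           # influence of each car
head(sort(infl, decreasing = TRUE), 3)     # the three most influential cars
```

With only 32 observations, both the plots and the tests have limited power, so they are best read together rather than in isolation.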
It is determined that neither automatic nor manual transmission is necessarily better for mpg. Other aspects of a car, such as lower weight and fewer cylinders, contribute more significantly to fuel efficiency. When transmission type is added to the final regression model (see results in Appendix 5), the \(R^2\) of the model does not increase significantly (from 0.857 to 0.866). This suggests that a more complex model with am1 may not be necessary. In addition, comparing the am1 coefficient in this model with the base model, the difference in mpg between automatic and manual transmission drops from 7.24 in the base model to just 1.81.
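The effect of adding transmission type to the final model can be tested with a nested-model F test. A minimal sketch, assuming `mycars` is the built-in `mtcars` data with `cyl` and `am` as factors; the names `fit_no_am` and `fit_am` are introduced here for illustration.

```r
# Assumption: mycars is the built-in mtcars data with cyl and am as factors
mycars <- mtcars
mycars$cyl <- factor(mycars$cyl)
mycars$am  <- factor(mycars$am, levels = c(0, 1))

fit_no_am <- lm(mpg ~ wt + hp + cyl, data = mycars)       # final model
fit_am    <- lm(mpg ~ wt + hp + cyl + am, data = mycars)  # final model plus am

# F test for the single added term
cmp <- anova(fit_no_am, fit_am)
p_am <- cmp$`Pr(>F)`[2]

# Gain in R^2 from adding am
r2_gain <- summary(fit_am)$r.squared - summary(fit_no_am)$r.squared
c(p_am = p_am, r2_gain = r2_gain)
```

Because only one coefficient is added, this F test gives the same p-value as the t test on am1 in the larger model.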
Finally, this study was done on an old and small dataset. The results may not reflect cars manufactured now. For a more thorough study, more data on different cars should be collected. Analysis should be done on cars with similar specifications, such as similar weight and the same number of cylinders. Only then can stronger conclusions be drawn about whether manual transmission brings better fuel efficiency.
| Column Name | Class | Description |
|---|---|---|
| wt | numeric | Weight (1000 lbs) |
| cyl | factor | Number of cylinders (4, 6 or 8) |
| disp | numeric | Displacement (cu.in.) |
| hp | numeric | Gross horsepower |
| gear | factor | Number of forward gears (3, 4 or 5) |
| carb | factor | Number of carburetors (1, 2, 3, 4, 6 or 8) |
| drat | numeric | Rear axle ratio |
| vs | factor | V/S - Type of Engine (0 = V-engine, 1 = Straight-engine) |
| am | factor | Transmission (0 = automatic, 1 = manual) |
| qsec | numeric | 1/4 mile time (seconds) |
Observations made on correlation matrix:
fit2 <- lm(mpg ~ wt, mycars)
fit3 <- lm(mpg ~ wt+hp, mycars)
fit4 <- lm(mpg ~ wt+hp+disp, mycars)
fit5 <- lm(mpg ~ wt+hp+disp+cyl, mycars)
fit6 <- lm(mpg ~ wt+hp+disp+cyl+am, mycars)
anova(fit2, fit3, fit4, fit5, fit6)
## Analysis of Variance Table
##
## Model 1: mpg ~ wt
## Model 2: mpg ~ wt + hp
## Model 3: mpg ~ wt + hp + disp
## Model 4: mpg ~ wt + hp + disp + cyl
## Model 5: mpg ~ wt + hp + disp + cyl + am
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 278.32
## 2 29 195.05 1 83.274 13.8413 0.001012 **
## 3 28 194.99 1 0.057 0.0095 0.923183
## 4 26 160.13 2 34.864 2.8974 0.073837 .
## 5 25 150.41 1 9.718 1.6153 0.215451
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fitfinal <- lm(mpg ~ wt+hp+cyl+am, mycars)
summary(fitfinal)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.70832390 2.60488618 12.940421 7.733392e-13
## wt -2.49682942 0.88558779 -2.819404 9.081408e-03
## hp -0.03210943 0.01369257 -2.345025 2.693461e-02
## cyl6 -3.03134449 1.40728351 -2.154040 4.068272e-02
## cyl8 -2.16367532 2.28425172 -0.947214 3.522509e-01
## am1 1.80921138 1.39630450 1.295714 2.064597e-01
summary(fitfinal)$r.squared
## [1] 0.8658799