Author: Stephanie Roark
When deciding whether to purchase an automobile, one might consider several factors, including transmission type (automatic or manual), horsepower, and weight. Data published in the 1974 Motor Trend US magazine comprise fuel consumption and ten aspects of automobile design and performance for thirty-two automobiles (1973–74 models). Thirteen cars had manual transmissions and the other nineteen had automatic transmissions. To explore the relationship between fuel consumption (measured in miles per gallon (MPG)) and transmission type, I also examine the other variables in the dataset: horsepower, weight of the car (in thousands of pounds (lbs)), displacement (in cubic inches), number of carburetors, number of cylinders, quarter mile time, rear axle ratio, engine shape (V-shaped or straight, V/S), and number of forward gears. As we might expect from physics, the greatest factors in determining the miles per gallon of a vehicle are the weight of the car and the horsepower of the engine. Once weight and horsepower are accounted for, transmission type is not a significant factor in the fuel economy of the vehicles considered.
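A quick way to confirm the shape of the data and the 19/13 split of transmissions is sketched below. It assumes R's built-in copy of `mtcars` (package `datasets`), in which `am` is coded 0 = automatic and 1 = manual; the `cars` copy and the `transmission` factor are illustrative names, not part of the original analysis.

```r
# mtcars ships with base R (package "datasets"), so nothing needs installing.
cars <- mtcars  # work on a copy so the built-in data frame stays unchanged

# 32 cars (rows) by 11 columns: mpg plus ten design/performance variables
print(dim(cars))

# am is coded 0 = automatic, 1 = manual; a labelled factor reads better in tables
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("automatic", "manual"))
print(table(cars$transmission))
```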
In Figure 1, the boxplot shows that, when comparing only transmission type against miles per gallon, manual transmissions get better gas mileage than automatic transmissions.
In Figure 2, we see that miles per gallon decreases as car weight increases. There also appears to be a separation between automatic and manual transmissions: the manual transmission vehicles generally weigh less than the automatic transmission cars. This plot suggests that we need to examine the effect of transmission type while controlling for the weight of the car.
In Figure 3, miles per gallon decreases as horsepower increases. Again we see a separation between automatic and manual transmissions: manual transmissions tend to appear in the lower-horsepower automobiles. This plot suggests a relationship between mpg and hp that should also be analyzed further.
In Figure 4, fewer cylinders correspond to higher miles per gallon in both automatic and manual transmission cars, but a larger share of the manual transmission cars have fewer cylinders. Transmission type and number of cylinders may therefore be linked as design choices, so their joint relationship with mpg should be analyzed.
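The original plotting code is not shown. As a minimal base-R sketch (not the author's figure code; the colours, the `cars` copy and the `transmission` factor are illustrative assumptions), the four figures and the group means behind the "separation" described above could be produced like this:

```r
cars <- mtcars
cars$transmission <- factor(cars$am, levels = c(0, 1),
                            labels = c("automatic", "manual"))
cols <- c(automatic = "firebrick", manual = "steelblue")

# Figure 1: mpg by transmission type
boxplot(mpg ~ transmission, data = cars, ylab = "Miles per gallon")

# Figures 2-4: mpg against wt, hp and cyl, coloured by transmission
for (v in c("wt", "hp", "cyl")) {
  plot(cars[[v]], cars$mpg, col = cols[as.character(cars$transmission)],
       pch = 19, xlab = v, ylab = "Miles per gallon")
  legend("topright", legend = names(cols), col = cols, pch = 19)
}

# Mean weight, horsepower and cylinder count for each transmission type
group_means <- aggregate(cbind(wt, hp, cyl) ~ transmission, data = cars, FUN = mean)
print(group_means)
```

When run non-interactively (for example with `Rscript`), the plots are written to the default graphics device file rather than shown on screen.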
From a Welch two-sample t-test of mpg ~ am, the mean MPG is 17.15 for automatic transmissions and 24.39 for manual transmissions, a difference of 7.24 mpg. The p-value is 0.0013736, well below 0.05, so we reject the null hypothesis that there is no difference in mean MPG between automatic and manual transmissions.
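The test above can be reproduced with base R's `t.test`, whose default (`var.equal = FALSE`) is the Welch test. This is a sketch of one way to obtain the reported numbers, not necessarily the exact call used in the original analysis.

```r
# Welch two-sample t-test of mpg by transmission (am = 0 automatic, 1 manual)
tt <- t.test(mpg ~ am, data = mtcars)
print(tt)

# Group means and their difference (manual minus automatic)
means <- unname(tt$estimate)
diff_manual_minus_auto <- means[2] - means[1]
print(round(c(automatic = means[1], manual = means[2],
              difference = diff_manual_minus_auto), 2))
```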
When modeling the variables' effects on miles per gallon, we must distinguish design factors from properties that result from the design. Like MPG, the quarter mile time (qsec) is assumed to be a result of the design and therefore is not used in modeling MPG. We also assume that the model residuals have constant variance and follow a normal distribution with mean 0.
We begin with the linear model fitted with all of the variables, including qsec, to see how the full set of predictors behaves.
##
## Call:
## lm(formula = mpg ~ ., data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4506 -1.6044 -0.1196 1.2193 4.6271
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337 18.71788 0.657 0.5181
## cyl -0.11144 1.04502 -0.107 0.9161
## disp 0.01334 0.01786 0.747 0.4635
## hp -0.02148 0.02177 -0.987 0.3350
## drat 0.78711 1.63537 0.481 0.6353
## wt -3.71530 1.89441 -1.961 0.0633 .
## qsec 0.82104 0.73084 1.123 0.2739
## vs 0.31776 2.10451 0.151 0.8814
## am 2.52023 2.05665 1.225 0.2340
## gear 0.65541 1.49326 0.439 0.6652
## carb -0.19942 0.82875 -0.241 0.8122
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
## F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
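The model including all ten predictors does not single out any significant variables: no coefficient has a p-value below 0.05 (the smallest is wt, at 0.0633), even though the overall F-test is highly significant (p-value 3.793e-07). This pattern is typical when the predictors are correlated with one another.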
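One way to check that correlated predictors are behind this pattern is to compute variance inflation factors (VIFs). The sketch below computes them by hand as 1 / (1 - R²) from regressing each predictor on all the others; VIFs are an addition for illustration and were not part of the original analysis, and the threshold of 10 mentioned in the comment is a common rule of thumb rather than a fixed cutoff.

```r
full <- lm(mpg ~ ., data = mtcars)

# Coefficient p-values without the intercept row
pvals <- summary(full)$coefficients[-1, "Pr(>|t|)"]
print(round(pvals, 4))

# VIF for each predictor: 1 / (1 - R^2) of that predictor regressed on the rest.
# Values above about 10 are commonly read as a sign of strong collinearity.
predictors <- setdiff(names(mtcars), "mpg")
vifs <- sapply(predictors, function(p) {
  r2 <- summary(lm(reformulate(setdiff(predictors, p), response = p),
                   data = mtcars))$r.squared
  1 / (1 - r2)
})
print(round(vifs, 1))
```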
We therefore need to identify which variables predict mpg. Using stepwise variable selection that minimizes AIC, we run the search both backward (starting from a model with am, wt, hp, cyl, disp and gear) and forward (starting from the intercept-only model) to find a model that fits the data without overfitting.
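The two searches whose output follows can be sketched with base R's `step()`. The six-term starting set is read off the printed output; the exact arguments the author used are not shown, so `direction` and `scope` here are assumptions that reproduce the same selected models.

```r
start <- lm(mpg ~ am + wt + hp + cyl + disp + gear, data = mtcars)

# Backward elimination: drop one term at a time while AIC keeps falling
back <- step(start, direction = "backward", trace = 0)

# Forward selection from the intercept-only model over the same six terms
fwd <- step(lm(mpg ~ 1, data = mtcars),
            scope = ~ am + wt + hp + cyl + disp + gear,
            direction = "forward", trace = 0)

print(formula(back))
print(formula(fwd))
print(extractAIC(back))  # c(equivalent df, AIC) on the scale step() reports
```

Setting `trace = 1` prints each step, as in the output below.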
## Start: AIC=66.11
## mpg ~ am + wt + hp + cyl + disp + gear
##
## Df Sum of Sq RSS AIC
## - gear 1 0.062 163.12 64.120
## - am 1 4.610 167.67 65.000
## - disp 1 6.668 169.73 65.390
## <none> 163.06 66.108
## - cyl 1 14.872 177.93 66.901
## - hp 1 19.347 182.41 67.696
## - wt 1 53.250 216.31 73.151
##
## Step: AIC=64.12
## mpg ~ am + wt + hp + cyl + disp
##
## Df Sum of Sq RSS AIC
## - disp 1 6.878 170.00 63.442
## - am 1 7.325 170.44 63.526
## <none> 163.12 64.120
## - cyl 1 16.788 179.91 65.255
## - hp 1 25.306 188.43 66.735
## - wt 1 53.247 216.37 71.160
##
## Step: AIC=63.44
## mpg ~ am + wt + hp + cyl
##
## Df Sum of Sq RSS AIC
## - am 1 6.623 176.62 62.665
## - cyl 1 10.293 180.29 63.323
## <none> 170.00 63.442
## - hp 1 21.049 191.05 65.177
## - wt 1 50.555 220.55 69.773
##
## Step: AIC=62.66
## mpg ~ wt + hp + cyl
##
## Df Sum of Sq RSS AIC
## <none> 176.62 62.665
## - hp 1 14.551 191.17 63.198
## - cyl 1 18.427 195.05 63.840
## - wt 1 115.354 291.98 76.750
##
## Call:
## lm(formula = mpg ~ wt + hp + cyl, data = mtcars)
##
## Coefficients:
## (Intercept) wt hp cyl
## 38.75179 -3.16697 -0.01804 -0.94162
## Start: AIC=115.94
## mpg ~ 1
##
## Df Sum of Sq RSS AIC
## + wt 1 847.73 278.32 73.217
## + cyl 1 817.71 308.33 76.494
## + disp 1 808.89 317.16 77.397
## + hp 1 678.37 447.67 88.427
## + am 1 405.15 720.90 103.672
## + gear 1 259.75 866.30 109.552
## <none> 1126.05 115.943
##
## Step: AIC=73.22
## mpg ~ wt
##
## Df Sum of Sq RSS AIC
## + cyl 1 87.150 191.17 63.198
## + hp 1 83.274 195.05 63.840
## + disp 1 31.639 246.68 71.356
## <none> 278.32 73.217
## + gear 1 1.137 277.19 75.086
## + am 1 0.002 278.32 75.217
##
## Step: AIC=63.2
## mpg ~ wt + cyl
##
## Df Sum of Sq RSS AIC
## + hp 1 14.5514 176.62 62.665
## <none> 191.17 63.198
## + gear 1 3.0281 188.14 64.687
## + disp 1 2.6796 188.49 64.746
## + am 1 0.1249 191.05 65.177
##
## Step: AIC=62.66
## mpg ~ wt + cyl + hp
##
## Df Sum of Sq RSS AIC
## <none> 176.62 62.665
## + am 1 6.6228 170.00 63.442
## + disp 1 6.1762 170.44 63.526
## + gear 1 0.8558 175.76 64.509
##
## Call:
## lm(formula = mpg ~ wt + cyl + hp, data = mtcars)
##
## Coefficients:
## (Intercept) wt cyl hp
## 38.75179 -3.16697 -0.94162 -0.01804
Both the backward and forward searches select wt, hp and cyl (AIC = 62.66), so these are the variables the AIC criterion retains for predicting mpg. Transmission type was not selected, so as a check we force the stepwise selection to start from a model that contains transmission type.
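A sketch of the forced search that produces the output below, assuming forward selection from `mpg ~ am` with am kept in the lower scope; the upper scope reuses the six candidate terms from the earlier search.

```r
forced <- step(lm(mpg ~ am, data = mtcars),
               scope = list(lower = ~ am,
                            upper = ~ am + wt + hp + cyl + disp + gear),
               direction = "forward", trace = 0)
print(formula(forced))

# Compare with the unforced selection on the same AIC scale (lower is better)
unforced <- lm(mpg ~ wt + hp + cyl, data = mtcars)
print(c(forced = extractAIC(forced)[2], unforced = extractAIC(unforced)[2]))
```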
## Start: AIC=103.67
## mpg ~ am
##
## Df Sum of Sq RSS AIC
## + hp 1 475.46 245.44 71.194
## + cyl 1 449.53 271.36 74.407
## + wt 1 442.58 278.32 75.217
## + disp 1 420.62 300.28 77.647
## <none> 720.90 103.672
## + gear 1 0.05 720.85 105.670
##
## Step: AIC=71.19
## mpg ~ am + hp
##
## Df Sum of Sq RSS AIC
## + wt 1 65.148 180.29 63.323
## + cyl 1 24.886 220.55 69.773
## + disp 1 19.336 226.10 70.568
## <none> 245.44 71.194
## + gear 1 7.458 237.98 72.207
##
## Step: AIC=63.32
## mpg ~ am + hp + wt
##
## Df Sum of Sq RSS AIC
## <none> 180.29 63.323
## + cyl 1 10.2933 170.00 63.442
## + gear 1 0.9507 179.34 65.154
## + disp 1 0.3835 179.91 65.255
##
## Call:
## lm(formula = mpg ~ am + hp + wt, data = mtcars)
##
## Coefficients:
## (Intercept) am hp wt
## 34.00288 2.08371 -0.03748 -2.87858
When am is forced into the starting model, the search adds hp and then wt and stops at mpg ~ am + hp + wt with AIC = 63.32, higher than the 62.66 of the wt + hp + cyl model, so keeping transmission type does not improve the fit. Based on the results of the stepwise process, we build models to quantify the dependencies of mpg. We begin by once again examining whether transmission type predicts mpg, using three nested models that add wt, hp and cyl alongside am in turn.
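# Nested models: keep am and add wt, then hp, then cyl
fit1 <- lm(mpg ~ am + wt, data = mtcars)             # transmission + weight
summary(fit1)
fit2 <- lm(mpg ~ am + wt + hp, data = mtcars)        # ... + horsepower
summary(fit2)
fit3 <- lm(mpg ~ am + hp + wt + cyl, data = mtcars)  # ... + cylinders
summary(fit3)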
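Rather than reading three full summaries, the p-value of the am coefficient can be pulled out of each nested model in one step. This is a small convenience sketch; the list names are illustrative.

```r
fits <- list(
  am_wt        = lm(mpg ~ am + wt, data = mtcars),
  am_wt_hp     = lm(mpg ~ am + wt + hp, data = mtcars),
  am_hp_wt_cyl = lm(mpg ~ am + hp + wt + cyl, data = mtcars)
)

# p-value of the am (transmission) coefficient in each model
am_p <- sapply(fits, function(f) summary(f)$coefficients["am", "Pr(>|t|)"])
print(round(am_p, 3))
```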
Once again, transmission type is not significant in any of the three models, that is, once we control for wt alone or together with hp and cyl. Next we build models that exclude am and examine mpg with wt and hp, and then with wt, hp and cyl, as predictors.
##
## Call:
## lm(formula = mpg ~ hp + wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.941 -1.600 -0.182 1.050 5.854
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.22727 1.59879 23.285 < 2e-16 ***
## hp -0.03177 0.00903 -3.519 0.00145 **
## wt -3.87783 0.63273 -6.129 1.12e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared: 0.8268, Adjusted R-squared: 0.8148
## F-statistic: 69.21 on 2 and 29 DF, p-value: 9.109e-12
##
## Call:
## lm(formula = mpg ~ wt + hp + cyl, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9290 -1.5598 -0.5311 1.1850 5.8986
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 38.75179 1.78686 21.687 < 2e-16 ***
## wt -3.16697 0.74058 -4.276 0.000199 ***
## hp -0.01804 0.01188 -1.519 0.140015
## cyl -0.94162 0.55092 -1.709 0.098480 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.512 on 28 degrees of freedom
## Multiple R-squared: 0.8431, Adjusted R-squared: 0.8263
## F-statistic: 50.17 on 3 and 28 DF, p-value: 2.184e-11
In the model with wt and hp as predictors of mpg, both variables are significant, with very small p-values. When the model also includes cyl, hp and cyl have p-values of 0.1400 and 0.0985, both greater than 0.05; in this model only wt is significant.
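Because the two models are nested, a partial F-test with `anova()` asks directly whether adding cyl to wt and hp is worthwhile. This comparison is an illustrative addition; with a single added term, its p-value equals the t-test p-value for cyl in the larger model.

```r
fit_wt_hp     <- lm(mpg ~ hp + wt, data = mtcars)
fit_wt_hp_cyl <- lm(mpg ~ wt + hp + cyl, data = mtcars)

# Partial F-test: does adding cyl reduce the residual sum of squares enough?
cmp <- anova(fit_wt_hp, fit_wt_hp_cyl)
print(cmp)
```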
The t-test indicated a difference between manual and automatic transmissions, with manual transmissions averaging 7.24 mpg more than automatics. However, stepwise variable selection based on AIC selected wt, hp and cyl for the model of the mtcars dataset (recall that we excluded qsec because it is a result of the design, not an input). Models that included transmission type together with these variables also showed that transmission type is not a predictor of mpg when controlling for wt, hp and cyl: in every model that includes weight, the p-value for transmission type is greater than 0.05.
The models of mpg on wt, hp and cyl show that when all three variables are included, hp and cyl are not significant, possibly because they are collinear. The model we prefer describes mpg as a function of wt and hp: both coefficients have very low p-values, and its Multiple \(R^2\) is 0.8268, or 82.68%. This coefficient of determination is the proportion of the variance in mpg explained by the model. The Adjusted R-squared is 0.8148; it is slightly lower because it penalizes the number of predictors in the model (the three-variable model's Adjusted R-squared of 0.8263 is only marginally higher). The p-value of the overall F-test for the model is 9.109e-12.
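The collinearity explanation and the two R-squared figures can be checked directly. The sketch below prints the pairwise correlations among wt, hp and cyl and recomputes the Adjusted R-squared from the standard formula 1 - (1 - R²)(n - 1)/(n - p - 1), with n = 32 cars and p = 2 predictors.

```r
# Pairwise correlations among the three candidate predictors
print(round(cor(mtcars[, c("wt", "hp", "cyl")]), 3))

best <- summary(lm(mpg ~ wt + hp, data = mtcars))
print(c(r.squared = best$r.squared, adj.r.squared = best$adj.r.squared))

# Adjusted R^2 by hand: penalizes R^2 for the number of predictors p
n <- nrow(mtcars)
p <- 2
adj_by_hand <- 1 - (1 - best$r.squared) * (n - 1) / (n - p - 1)
print(adj_by_hand)
```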
In the chosen model, which expresses mpg as a linear combination of wt and hp, mpg decreases by 3.88 for every thousand-pound increase in weight, holding horsepower constant. Likewise, mpg decreases by about 0.03 for each additional unit of horsepower (about 3.2 mpg per 100 hp), holding weight constant. The regression equation is

\[ \widehat{\text{mpg}} = 37.22727 - 3.87783\,\text{wt} - 0.03177\,\text{hp}, \]

with standard errors of 0.63273 for wt and 0.00903 for hp. The model's residual standard error is 2.593 on 29 degrees of freedom.
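To show how the equation is used, the sketch below predicts mpg for a hypothetical car (3,000 lbs, so wt = 3, and 150 hp; these values are made up for illustration and are not in the data), once with `predict()` and once by plugging into the fitted coefficients by hand.

```r
fit <- lm(mpg ~ hp + wt, data = mtcars)
b <- coef(fit)
print(round(b, 5))

# Hypothetical car (not in mtcars): wt = 3 (3,000 lbs), hp = 150
new_car <- data.frame(wt = 3, hp = 150)
print(predict(fit, newdata = new_car, interval = "confidence"))

# Same point estimate from the regression equation
by_hand <- b[["(Intercept)"]] + b[["wt"]] * 3 + b[["hp"]] * 150
print(by_hand)
```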
We can conclude, then, that once you control for horsepower and weight, transmission type is not a significant predictor of fuel efficiency in miles per gallon. The most significant predictors of mpg are the weight of the car and the horsepower of the engine. Mean mpg does differ between the two transmission types, but in these data the difference is accounted for by weight and horsepower (manual cars tend to be lighter and less powerful) rather than by the transmission itself.
Plotting the residuals:
The residuals appear approximately normally distributed and homoskedastic.
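The residual plots can be drawn with the `plot()` method for `lm` objects. This sketch (panel choice is an assumption; the original plotting code is not shown) draws residuals vs fitted values, the normal Q-Q plot and the scale-location plot for the wt + hp model.

```r
fit <- lm(mpg ~ hp + wt, data = mtcars)

# Panel 1: residuals vs fitted (look for curvature or a funnel shape)
# Panel 2: normal Q-Q (points near the line suggest normal residuals)
# Panel 3: scale-location (a flat trend suggests constant variance)
par(mfrow = c(1, 3))
plot(fit, which = 1:3)
par(mfrow = c(1, 1))

# Least-squares residuals from a model with an intercept average to zero
print(mean(residuals(fit)))
```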
Figure 1: Motor Trend Cars by MPG and Transmission
Figure 2: MPG vs. Weight by Transmission
Figure 3: MPG vs. HP by Transmission
Figure 4: MPG vs. CYL by Transmission