The purpose of this analysis is to develop a parsimonious and useful linear model to address two questions about the 1974 Motor Trend fuel consumption and automotive design data for 32 automobiles: 1) Is an automatic or manual transmission better for miles per gallon (mpg)? 2) How large is the mpg difference between automatic and manual transmissions?
The data indicate that transmission is a poor predictor of mpg. While the correlation is strong enough to show a mild relationship, transmission alone does not explain much of the variation in mpg. Taken alone, a manual transmission is associated with an improvement of between 3.6 and 10.8 miles per gallon (95% confidence interval). However, transmission explains only 36% of the variation in mileage.
Instead, weight and cylinders are strong predictors of mpg, together explaining more than 80% of the variance in mpg. With all other variables being equal, the 95% confidence interval for the decrease in mpg from an increase of 1000 lbs in weight is 1.7 to 4.8 miles per gallon. The 95% confidence interval for the effect of going from 4 to 6 cylinders is a decline of 1.4 to 7.1 miles per gallon, and for going from 4 to 8 cylinders it is a decline of 2.7 to 9.5 miles per gallon.
This analysis is built with R version 3.3.1 to evaluate the mtcars data set from the UsingR package, and uses ggplot2, dplyr, and GGally packages as well.
Exploratory data analysis looks at the correlation between all pairs of variables. It suggests that transmission may not be the most powerful predictor of mpg in the 1974 Motor Trend data set. The correlation matrix indicates that weight, cylinders, displacement and horsepower are all more highly correlated with mpg, and with each other, than transmission is.
round(cor(data),2)
##        mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
## mpg   1.00 -0.85 -0.85 -0.78  0.68 -0.87  0.42  0.66  0.60  0.48 -0.55
## cyl  -0.85  1.00  0.90  0.83 -0.70  0.78 -0.59 -0.81 -0.52 -0.49  0.53
## disp -0.85  0.90  1.00  0.79 -0.71  0.89 -0.43 -0.71 -0.59 -0.56  0.39
## hp   -0.78  0.83  0.79  1.00 -0.45  0.66 -0.71 -0.72 -0.24 -0.13  0.75
## drat  0.68 -0.70 -0.71 -0.45  1.00 -0.71  0.09  0.44  0.71  0.70 -0.09
## wt   -0.87  0.78  0.89  0.66 -0.71  1.00 -0.17 -0.55 -0.69 -0.58  0.43
## qsec  0.42 -0.59 -0.43 -0.71  0.09 -0.17  1.00  0.74 -0.23 -0.21 -0.66
## vs    0.66 -0.81 -0.71 -0.72  0.44 -0.55  0.74  1.00  0.17  0.21 -0.57
## am    0.60 -0.52 -0.59 -0.24  0.71 -0.69 -0.23  0.17  1.00  0.79  0.06
## gear  0.48 -0.49 -0.56 -0.13  0.70 -0.58 -0.21  0.21  0.79  1.00  0.27
## carb -0.55  0.53  0.39  0.75 -0.09  0.43 -0.66 -0.57  0.06  0.27  1.00
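To make the ranking explicit, the mpg column of the correlation matrix can be sorted by absolute value. This is a minimal sketch that assumes the report's `data` object is the `mtcars` data frame that ships with R; the names `mpg_cor` and `ranked` are illustrative only.

```r
# Assumption: `data` in this report is the built-in mtcars data frame.
data(mtcars)

# Correlation of every variable with mpg, dropping mpg's correlation with itself.
mpg_cor <- cor(mtcars)[, "mpg"]
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]

# Rank predictors by the strength (absolute value) of their correlation with mpg.
ranked <- sort(abs(mpg_cor), decreasing = TRUE)
print(round(ranked, 2))
```

Sorting on the absolute value puts negatively correlated predictors such as weight on the same scale as positively correlated ones such as transmission.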
data6 <- data %>% dplyr::select(mpg, cyl, disp, hp, wt, am)
pairs(data6, panel=panel.smooth, main = "MT Cars Data Extract",
col = 3 + (data$am>0))
The first round of models looks at the predictive power of each of five variables: cylinders, displacement, horsepower, weight and transmission. Although the p-value for the transmission-only model is small enough to reject the null hypothesis that beta = 0, the R-squared is a modest 0.36.
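The five single-predictor fits can be sketched as follows. This assumes the built-in `mtcars` data frame stands in for the report's `data`, and the object names (`single_fits`, `fit_summary`) are hypothetical rather than the ones used in the original analysis.

```r
# Assumption: `data` in this report is the built-in mtcars data frame.
data(mtcars)

# One simple regression per candidate predictor.
single_fits <- list(
  am   = lm(mpg ~ factor(am),  data = mtcars),
  cyl  = lm(mpg ~ factor(cyl), data = mtcars),
  disp = lm(mpg ~ disp,        data = mtcars),
  hp   = lm(mpg ~ hp,          data = mtcars),
  wt   = lm(mpg ~ wt,          data = mtcars)
)

# R-squared and residual sum of squares (deviance) for each model.
fit_summary <- data.frame(
  r_squared = sapply(single_fits, function(f) summary(f)$r.squared),
  rss       = sapply(single_fits, deviance)
)
print(round(fit_summary, 2))
```

Putting R-squared and the residual sum of squares side by side keeps the two quantities distinct: the first is a proportion between 0 and 1, the second is in squared mpg units.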
The intercept and slope for the regression of mpg against transmission are:
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## factor(am)1  7.244939   1.764422  4.106127 2.850207e-04
The confidence interval for the slope is:
confint(fit1am, 'factor(am)1') # the coefficient is named 'factor(am)1', not 'am'
##               2.5 %   97.5 %
## factor(am)1 3.64151 10.84837
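The interval can also be reproduced by hand from the estimate and standard error in the coefficient table. The sketch below assumes the transmission-only model was fit as `mpg ~ factor(am)` on `mtcars`; `fit_am` and the other names are illustrative. Because the model uses `factor(am)`, the coefficient must be requested as `factor(am)1`; asking `confint` for a name that is not in the model returns `NA`.

```r
# Assumption: the transmission-only model is mpg ~ factor(am) on mtcars.
data(mtcars)
fit_am <- lm(mpg ~ factor(am), data = mtcars)

# Pull the estimate and standard error for the manual-transmission coefficient.
coefs <- summary(fit_am)$coefficients
est <- coefs["factor(am)1", "Estimate"]
se  <- coefs["factor(am)1", "Std. Error"]

# 95% interval: estimate +/- t quantile (30 residual df) * standard error.
t_crit <- qt(0.975, df = df.residual(fit_am))
manual_ci <- est + c(-1, 1) * t_crit * se

# The built-in helper should agree when given the full coefficient name.
builtin_ci <- confint(fit_am, "factor(am)1")
print(manual_ci)
print(builtin_ci)
```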
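An ANOVA table comparing the five single-predictor models shows that weight is the best single predictor because it has the lowest residual sum of squares (278.32). Transmission has the highest (720.90). Because these models are not nested, only the RSS column is informative here; the F-tests in the table should not be read as formal comparisons.
At this point displacement and horsepower lose relevance: both have a larger residual sum of squares than weight, and the Shapiro-Wilk test indicates that their residuals are not normally distributed (p < 0.05).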
## Analysis of Variance Table
##
## Model 1: mpg ~ factor(am)
## Model 2: mpg ~ factor(cyl)
## Model 3: mpg ~ disp
## Model 4: mpg ~ hp
## Model 5: mpg ~ wt
##   Res.Df    RSS Df Sum of Sq       F    Pr(>F)
## 1     30 720.90
## 2     29 301.26  1    419.63 40.3946 6.032e-07 ***
## 3     30 317.16 -1    -15.90  1.5302     0.226
## 4     30 447.67  0   -130.52
## 5     30 278.32  0    169.35
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##         Variable       Shapiro_Wilks_P_Value
## p.value "Transmission" 0.8573442
## p.value "Cylinder"     0.517665
## p.value "Displacement" 0.03254543
## p.value "Horse_Power"  0.02568169
## p.value "Weight"       0.1043878
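A table like the one above can be produced by running `shapiro.test` on the residuals of each single-predictor model. This is a sketch under the assumption that the test was applied to model residuals and that `data` is `mtcars`; the original code that built the table is not shown, so the names here are hypothetical.

```r
# Assumption: the Shapiro-Wilk test is applied to each model's residuals.
data(mtcars)

formulas <- list(
  Transmission = mpg ~ factor(am),
  Cylinder     = mpg ~ factor(cyl),
  Displacement = mpg ~ disp,
  Horse_Power  = mpg ~ hp,
  Weight       = mpg ~ wt
)

# Fit each model and keep only the p-value of the normality test.
shapiro_p <- sapply(formulas, function(f) {
  shapiro.test(resid(lm(f, data = mtcars)))$p.value
})
print(round(shapiro_p, 4))
```

A small p-value is evidence against normally distributed residuals, so models with p below 0.05 are the ones the text sets aside.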
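The final model selection identifies the combination of weight and cylinders as the strongest model. Its residual sum of squares (183.06) is far below that of either two-variable model that includes transmission, adding transmission to it lowers the RSS only to 182.97 (p = 0.91), and its residuals are the most evenly distributed (see appendix).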
## Analysis of Variance Table
##
## Model 1: mpg ~ factor(am) + wt
## Model 2: mpg ~ factor(am) + factor(cyl)
## Model 3: mpg ~ wt + factor(cyl)
## Model 4: mpg ~ factor(am) + wt + factor(cyl)
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     29 278.32
## 2     28 264.50  1    13.824 2.0400 0.1647
## 3     28 183.06  0    81.437
## 4     27 182.97  1     0.090 0.0133 0.9089
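The one properly nested comparison in this table, weight + cylinders against the same model with transmission added, can be isolated as below. This is a sketch assuming `mtcars` as the data; `fit_wtcyl` and `fit_full` are illustrative names, not necessarily those used in the analysis.

```r
# Assumption: `data6` in this report is a column subset of mtcars.
data(mtcars)

# Reduced model and the same model with transmission added.
fit_wtcyl <- lm(mpg ~ wt + factor(cyl), data = mtcars)
fit_full  <- lm(mpg ~ factor(am) + wt + factor(cyl), data = mtcars)

# F-test for whether adding transmission reduces the RSS significantly.
nested_test <- anova(fit_wtcyl, fit_full)
print(nested_test)

# p-value for the added transmission term (second row of the table).
p_add_am <- nested_test[["Pr(>F)"]][2]
```

Because the smaller model is a special case of the larger one, this F-test is a valid check that transmission adds essentially nothing once weight and cylinders are in the model.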
In the model that predicts mpg from weight and cylinders, the 95% confidence intervals for the coefficients are:
##                  2.5 %    97.5 %
## (Intercept)  30.123824 37.857764
## wt           -4.749898 -1.661328
## factor(cyl)6 -7.094824 -1.416341
## factor(cyl)8 -9.455418 -2.686301
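These intervals, along with the share of variance the model explains, can be recomputed with a few lines. The sketch assumes the final model is `mpg ~ wt + factor(cyl)` fit on `mtcars`; the object names are hypothetical.

```r
# Assumption: the final model is mpg ~ wt + factor(cyl) on mtcars.
data(mtcars)
fit_wtcyl <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# 95% confidence intervals for every coefficient.
# wt is recorded in units of 1000 lbs, so its interval is per 1000 lbs.
ci <- confint(fit_wtcyl, level = 0.95)
print(round(ci, 2))

# Proportion of the variance in mpg explained by the model.
r2 <- summary(fit_wtcyl)$r.squared
print(round(r2, 3))
```

The cylinder rows are differences from the 4-cylinder baseline, which is why they read as the change in mpg when moving from 4 to 6 or from 4 to 8 cylinders at a fixed weight.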
Residuals for the weight + cylinder model (in purple) are closer to zero than those of the models that include the transmission variable.
Finally, analysis of influence measures indicates that 4 vehicles have a large influence on this final model. The Cadillac Fleetwood, Lincoln Continental and Toyota Corolla are flagged by influence.measures, and the Chrysler Imperial has the largest Cook's distance (0.23). The Cadillac Fleetwood and Lincoln Continental are very heavy and have very low mpg.
The boxplot of mpg by transmission suggests that there might be a relationship.
Checking residual patterns
par(mfrow=c(2,2))
plot(fit2wtcyl)
par(mfrow=c(2,2))
plot(dffits(fit2wtcyl))
plot(dfbetas(fit2wtcyl)[,4]) # coefficient for factor(cyl)8
plot(dfbetas(fit2wtcyl)[,3]) # coefficient for factor(cyl)6
plot(dfbetas(fit2wtcyl)[,2]) # coefficient for wt
Influence
influence.measures(fit2wtcyl)
## Influence measures of
## lm(formula = mpg ~ wt + factor(cyl), data = data6) :
##
## dfb.1_ dfb.wt dfb.f..6 dfb.f..8 dffit cov.r
## Mazda RX4 -0.0207 0.0227 -0.05103 -0.01773 -0.0627 1.380
## Mazda RX4 Wag 0.0141 -0.0155 0.06417 0.01211 0.0834 1.349
## Datsun 710 -0.1888 -0.0168 0.28558 0.24631 -0.5001 0.893
## Hornet 4 Drive -0.0236 0.0258 0.22420 -0.02019 0.3393 1.222
## Hornet Sportabout 0.1171 -0.1282 0.05799 0.18624 0.2443 1.184
## Valiant 0.0228 -0.0250 -0.05376 0.01951 -0.0966 1.355
## Duster 360 -0.1073 0.1176 -0.05316 -0.19460 -0.2747 1.130
## Merc 240D -0.0350 0.0777 -0.08402 -0.10175 0.1173 1.366
## Merc 230 0.0547 -0.1272 0.14126 0.16970 -0.1971 1.328
## Merc 280 -0.0194 0.0212 0.04917 -0.01660 0.0869 1.355
## Merc 280C 0.0358 -0.0392 -0.09089 0.03068 -0.1607 1.335
## Merc 450SE -0.0121 0.0133 -0.00600 0.05994 0.1705 1.179
## Merc 450SL 0.0406 -0.0444 0.02009 0.09660 0.1561 1.202
## Merc 450SLC -0.0148 0.0162 -0.00733 -0.04038 -0.0689 1.240
## Cadillac Fleetwood 0.1128 -0.1236 0.05589 0.05958 -0.1526 1.440
## Lincoln Continental 0.0260 -0.0285 0.01287 0.01476 -0.0337 1.537
## Chrysler Imperial -0.7478 0.8192 -0.37047 -0.41228 0.9877 0.947
## Fiat 128 0.3726 -0.0644 -0.39830 -0.30819 0.7712 0.577
## Honda Civic 0.2143 -0.1396 -0.05525 0.00983 0.2545 1.247
## Toyota Corolla 0.6843 -0.3720 -0.30122 -0.10296 0.9224 0.517
## Toyota Corona -0.1558 -0.1100 0.39876 0.37879 -0.6371 0.741
## Dodge Challenger -0.0622 0.0681 -0.03080 -0.10655 -0.1457 1.234
## AMC Javelin -0.1116 0.1223 -0.05529 -0.17682 -0.2314 1.195
## Camaro Z28 -0.0417 0.0457 -0.02066 -0.14330 -0.2641 1.099
## Pontiac Firebird 0.0646 -0.0707 0.03198 0.22722 0.4217 0.907
## Fiat X1-9 -0.0454 0.0216 0.02524 0.01249 -0.0665 1.280
## Porsche 914-2 -0.0780 0.0206 0.07125 0.05146 -0.1463 1.236
## Lotus Europa 0.1878 -0.1292 -0.03667 0.02123 0.2143 1.296
## Ford Pantera L -0.1955 0.2142 -0.09688 -0.26434 -0.3174 1.207
## Ferrari Dino -0.0492 0.0539 -0.16311 -0.04213 -0.2062 1.320
## Maserati Bora -0.0722 0.0791 -0.03578 -0.13097 -0.1849 1.203
## Volvo 142E 0.0164 -0.2422 0.38830 0.42323 -0.5566 0.920
## cook.d hat inf
## Mazda RX4 0.001019 0.1643
## Mazda RX4 Wag 0.001802 0.1480
## Datsun 710 0.059350 0.0910
## Hornet 4 Drive 0.029105 0.1437
## Hornet Sportabout 0.015161 0.0986
## Valiant 0.002413 0.1531
## Duster 360 0.019011 0.0874
## Merc 240D 0.003558 0.1620
## Merc 230 0.009993 0.1558
## Merc 280 0.001956 0.1519
## Merc 280C 0.006658 0.1519
## Merc 450SE 0.007437 0.0719
## Merc 450SL 0.006247 0.0777
## Merc 450SLC 0.001229 0.0756
## Cadillac Fleetwood 0.006020 0.2074 *
## Lincoln Continental 0.000295 0.2479 *
## Chrysler Imperial 0.225488 0.2289
## Fiat 128 0.126528 0.0915
## Honda Civic 0.016529 0.1300
## Toyota Corolla 0.175238 0.1086 *
## Toyota Corona 0.091865 0.0937
## Dodge Challenger 0.005465 0.0914
## AMC Javelin 0.013632 0.0991
## Camaro Z28 0.017514 0.0736
## Pontiac Firebird 0.042564 0.0735
## Fiat X1-9 0.001146 0.1016
## Porsche 914-2 0.005510 0.0928
## Lotus Europa 0.011787 0.1428
## Ford Pantera L 0.025486 0.1312
## Ferrari Dino 0.010924 0.1533
## Maserati Bora 0.008746 0.0874
## Volvo 142E 0.073641 0.1121
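Rather than scanning the full table for asterisks, the flagged observations and the largest Cook's distances can be extracted directly. This is a minimal sketch assuming the final model is refit on `mtcars`; `infl`, `flagged` and `top_cook` are illustrative names.

```r
# Assumption: the final model is mpg ~ wt + factor(cyl) on mtcars.
data(mtcars)
fit_wtcyl <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# influence.measures() returns a logical matrix (is.inf) marking which
# diagnostics exceed their cutoffs; a row with any TRUE gets a "*" when printed.
infl <- influence.measures(fit_wtcyl)
flagged <- rownames(infl$is.inf)[apply(infl$is.inf, 1, any)]
print(flagged)

# The four observations with the largest Cook's distance.
top_cook <- head(sort(cooks.distance(fit_wtcyl), decreasing = TRUE), 4)
print(round(top_cook, 3))
```

The two views answer different questions: the flags mark observations that cross any single diagnostic cutoff (often high leverage), while Cook's distance ranks how much each observation shifts the fitted values overall.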