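I have explored the “mtcars” data set to investigate the influence of a car’s transmission type (am) on its fuel consumption, measured in miles per gallon (MPG). Two questions were of interest: is an automatic or a manual transmission better for MPG, and how large is the MPG difference between the two transmission types?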
If the car’s weight (wt) is not taken into account, manual transmission is better for MPG. Remark: Only one car with automatic transmission weighs less than 2500 lbs (the Toyota Corona, 2465 lbs), and no car with manual transmission weighs more than 3800 lbs.
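The weight ranges behind this remark can be checked directly. A minimal sketch in base R, assuming the built-in `datasets::mtcars` (wt in 1000 lbs; am coded 0 = automatic, 1 = manual); the names `mt`, `trans`, and `wt_range` are illustrative:

```r
# Fresh copy of the built-in data (wt in 1000 lbs; am: 0 = automatic, 1 = manual)
mt <- datasets::mtcars
trans <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Lightest and heaviest car per transmission type (1000 lbs)
wt_range <- sapply(split(mt$wt, trans), range)
rownames(wt_range) <- c("min", "max")
print(wt_range)

# Automatic cars lighter than 2500 lbs and manual cars heavier than 3800 lbs
print(rownames(mt)[trans == "Automatic" & mt$wt < 2.5])
print(rownames(mt)[trans == "Manual" & mt$wt > 3.8])
```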
MPG depends strongly on weight (wt), horsepower (hp), and transmission type, and the MPG difference between the two transmission types changes with weight. The table below gives an overview of the expected MPG for given values of wt and hp and each transmission type, where 1 indicates manual and 0 automatic transmission. The values of wt and hp are the 25% quantile, the mean, and the 75% quantile; the columns labeled Q50% are evaluated at the means (wt = 3.22, hp = 146.69), not at the medians. Manual transmission is better for cars with lower (Q25%) wt and hp, whereas automatic transmission is better for cars with higher (Q75%) wt and hp.
## Q25%(1) Q25%(0) Q50%(1) Q50%(0) Q75%(1) Q75%(0) Dif25 Dif50 Dif75
## wt 2.58 2.58 3.22 3.22 3.61 3.61 0.00 0.00 0.00
## hp 96.50 96.50 146.69 146.69 180.00 180.00 0.00 0.00 0.00
## am 1.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 1.00
## E[MPG] 24.17 21.85 18.94 18.90 15.65 17.02 2.32 0.04 -1.36
## lwr 19.09 16.65 13.48 13.92 9.78 12.07 2.45 -0.44 -2.29
## upr 29.25 27.06 24.41 23.88 21.52 21.96 2.19 0.53 -0.44
summary(lm(mpg ~ am, data = mtcars))$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147368 1.124603 15.247492 1.133983e-15
## am 7.244939 1.764422 4.106127 2.850207e-04
Conclusion: The model with only “am” estimates a statistically significant increase of 7.245 MPG when switching from automatic (0) to manual (1) transmission. On average, cars with automatic transmission achieve 17.15 MPG (the intercept) and cars with manual transmission 24.39 MPG (17.15 + 7.24).
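With a single 0/1 regressor, the intercept is the mean MPG of the automatic cars and intercept + slope is the mean MPG of the manual cars. A minimal sketch that checks this, working on a fresh copy of the built-in data (the names `mt`, `fit_am`, and `comparison` are illustrative):

```r
mt <- datasets::mtcars  # fresh copy; am: 0 = automatic, 1 = manual

fit_am <- lm(mpg ~ am, data = mt)
b <- coef(fit_am)

# Mean MPG per transmission type, computed directly from the data
group_means <- tapply(mt$mpg, mt$am, mean)

# Row 1: group means implied by the model; row 2: group means from the data
comparison <- rbind(
  from_model = c(b[["(Intercept)"]], b[["(Intercept)"]] + b[["am"]]),
  group_mean = as.vector(group_means)
)
colnames(comparison) <- c("Automatic", "Manual")
print(round(comparison, 2))
```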
summary(lm(mpg ~., data = mtcars))$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337416 18.71788443 0.6573058 0.51812440
## cyl -0.11144048 1.04502336 -0.1066392 0.91608738
## disp 0.01333524 0.01785750 0.7467585 0.46348865
## hp -0.02148212 0.02176858 -0.9868407 0.33495531
## drat 0.78711097 1.63537307 0.4813036 0.63527790
## wt -3.71530393 1.89441430 -1.9611887 0.06325215
## qsec 0.82104075 0.73084480 1.1234133 0.27394127
## vs 0.31776281 2.10450861 0.1509915 0.88142347
## am 2.52022689 2.05665055 1.2254035 0.23398971
## gear 0.65541302 1.49325996 0.4389142 0.66520643
## carb -0.19941925 0.82875250 -0.2406258 0.81217871
Conclusion: The model with all variables estimates an increase of 2.52 MPG when switching from automatic (0) to manual (1) transmission while holding all other variables constant. This estimate has a p-value of 0.234 and is not statistically significant. The point estimates further suggest that weight (wt), number of cylinders (cyl), horsepower (hp), and number of carburetors (carb) have a negative effect on MPG, whereas the transmission type (switching from automatic to manual) and the number of gears (gear) have a positive effect. However, none of these coefficients is significant at the 5% level; wt comes closest (p = 0.063).
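Reading only the signs of the estimates can be misleading when none of them is significant. A minimal sketch that sorts the predictors of the full model by p-value and lists those below 0.05 (the names `fit_all`, `slopes`, and `signif_5` are illustrative):

```r
mt <- datasets::mtcars
fit_all <- lm(mpg ~ ., data = mt)
coefs <- summary(fit_all)$coefficients

# Drop the intercept row and sort the predictors by p-value
slopes <- coefs[-1, , drop = FALSE]
slopes <- slopes[order(slopes[, "Pr(>|t|)"]), , drop = FALSE]
print(round(slopes[, c("Estimate", "Pr(>|t|)")], 3))

# Predictors that are significant at the 5% level
signif_5 <- rownames(slopes)[slopes[, "Pr(>|t|)"] < 0.05]
print(signif_5)
```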
# Nested models: each one adds predictors to the previous model
fit1 <- lm(mpg ~ am, data = mtcars)
fit3 <- update(fit1, mpg ~ am + wt)
fit5 <- update(fit3, mpg ~ am + wt + hp)
fit7 <- update(fit5, mpg ~ am + wt + hp + factor(cyl))
fit9 <- update(fit7, mpg ~ am + wt + hp + factor(cyl) + factor(gear))
fit11 <- update(fit9, mpg ~ am + wt + hp + factor(cyl) + factor(gear) + disp + factor(vs) + qsec + factor(carb))
# Sequential F-tests: each model against the previous one
anova(fit1, fit3, fit5, fit7, fit9, fit11)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ am + wt
## Model 3: mpg ~ am + wt + hp
## Model 4: mpg ~ am + wt + hp + factor(cyl)
## Model 5: mpg ~ am + wt + hp + factor(cyl) + factor(gear)
## Model 6: mpg ~ am + wt + hp + factor(cyl) + factor(gear) + disp + factor(vs) +
## qsec + factor(carb)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 720.90
## 2 29 278.32 1 442.58 57.9367 1.051e-06 ***
## 3 28 180.29 1 98.03 12.8327 0.002491 **
## 4 26 151.03 2 29.27 1.9155 0.179552
## 5 24 149.67 2 1.36 0.0891 0.915249
## 6 16 122.22 8 27.44 0.4490 0.873734
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion: Based on the analysis of variance table from the nested model tests, I chose model 3, lm(mpg ~ am + wt + hp). Remark: Each p-value in the table tests the hypothesis that the coefficients of the variables added in that step are all zero (i.e., whether or not those variables are necessary). It remains unclear whether there are interactions between the variables.
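To make the nested-model F-test concrete, the F statistic for adding wt to the am-only model can be recomputed by hand. When anova() receives several models, it divides by the residual mean square of the largest model (here fit11), which is why the denominator is not taken from fit3. A minimal sketch, using a fresh copy of the built-in data (the helper `rss` and the other names are illustrative):

```r
mt <- datasets::mtcars

fit1  <- lm(mpg ~ am, data = mt)
fit3  <- lm(mpg ~ am + wt, data = mt)
fit11 <- lm(mpg ~ am + wt + hp + factor(cyl) + factor(gear) + disp +
              factor(vs) + qsec + factor(carb), data = mt)

rss <- function(fit) sum(residuals(fit)^2)

# Denominator: residual mean square of the largest model in the list
sigma2_big <- rss(fit11) / df.residual(fit11)

# Numerator: extra sum of squares from adding wt, per extra parameter
df_extra <- df.residual(fit1) - df.residual(fit3)
extra_ss <- (rss(fit1) - rss(fit3)) / df_extra

f_manual <- extra_ss / sigma2_big
p_manual <- pf(f_manual, df1 = df_extra, df2 = df.residual(fit11), lower.tail = FALSE)

tab <- anova(fit1, fit3, fit11)
print(c(manual_F = f_manual, anova_F = tab$F[2]))
print(c(manual_p = p_manual, anova_p = tab[["Pr(>F)"]][2]))
```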
# Recode transmission as a labeled factor (0 = Automatic, 1 = Manual)
mtcars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
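library(plotly)  # provides plot_ly() and re-exports the %>% pipe
# 3D scatter plot of weight, horsepower and MPG, colored by transmission type
p <- plot_ly(mtcars, x = ~wt, y = ~hp, z = ~mpg, color = ~am, colors = c('salmon', 'lightblue')) %>%
  add_markers() %>%
  layout(scene = list(xaxis = list(title = 'Weight (1000 lbs)'),
                      yaxis = list(title = 'Gross horsepower'),
                      zaxis = list(title = 'Miles Per Gallon')))
p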
The pairs plot below illustrates the associations between MPG, weight, horsepower, and transmission type.
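library(ggplot2)
library(GGally)  # provides ggpairs()
# Lower-panel function: scatter plot with a smoothed trend line (loess by default)
my_fn <- function(data, mapping, method = "loess", ...) {
  ggplot(data = data, mapping = mapping) +
    geom_point() +
    geom_smooth(method = method, ...)
}
g <- ggpairs(mtcars[, c("mpg", "wt", "hp", "am")], lower = list(continuous = my_fn))
g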
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
The plot shows a dependency between weight and transmission type. Green dots indicate manual and red dots automatic transmission; horsepower (hp) is not considered.
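The associations in the pairs plot can also be summarized numerically. A minimal sketch with base R's cor(), keeping am as a 0/1 number (so its correlations are point-biserial) rather than the labeled factor used in the plot; `mt` and `cm` are illustrative names:

```r
mt <- datasets::mtcars  # am kept numeric (0 = automatic, 1 = manual)

# Pearson correlations; for the 0/1 variable am these are point-biserial correlations
cm <- cor(mt[, c("mpg", "wt", "hp", "am")])
print(round(cm, 2))
```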
library(ggplot2)
fitint <- lm(mpg ~ wt * factor(am), data = mtcars)
summary(fitint)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 31.416055 3.0201093 10.402291 4.001043e-11
## wt -3.785908 0.7856478 -4.818836 4.551182e-05
## factor(am)Manual 14.878423 4.2640422 3.489277 1.621034e-03
## wt:factor(am)Manual -5.298360 1.4446993 -3.667449 1.017148e-03
# Intercept and slope of the automatic (i11, s11) and manual (i12, s12) regression lines
s11 <- coef(fitint)[2]; i11 <- coef(fitint)[1]
s12 <- s11 + coef(fitint)[4]; i12 <- i11 + coef(fitint)[3]
g = ggplot(mtcars, aes(x = wt, y = mpg, color = factor(am)))
g = g + geom_point(size = 3, colour = "black") + geom_point(size = 4)
g = g + xlab("Weight (1000 lbs)") + ylab("Miles per Gallon")
g = g + geom_abline(slope = s11, intercept = i11, colour = "salmon")
g = g + geom_abline(slope = s12, intercept = i12, colour = "lightblue")
g
Finding: Cars with automatic and manual transmission fall into two largely separate weight groups: lighter cars tend to be equipped with manual transmission and heavier cars with automatic transmission. The plot indicates an interaction between weight and transmission type, i.e. the influence of weight on MPG changes with the transmission type. This effect is described by two different slopes and intercepts.
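The two slopes and intercepts of the interaction model are exactly the lines one gets by fitting a separate simple regression of mpg on wt within each transmission group. A minimal sketch that checks this equivalence (the names `fit_int`, `from_interaction`, and `from_split` are illustrative):

```r
mt <- datasets::mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Interaction model: separate intercept and slope per transmission type
fit_int <- lm(mpg ~ wt * am, data = mt)
b <- coef(fit_int)
from_interaction <- rbind(
  Automatic = c(intercept = b[["(Intercept)"]], slope = b[["wt"]]),
  Manual    = c(intercept = b[["(Intercept)"]] + b[["amManual"]],
                slope     = b[["wt"]] + b[["wt:amManual"]])
)

# The same lines from two separate simple regressions, one per group
from_split <- t(sapply(split(mt, mt$am), function(d) coef(lm(mpg ~ wt, data = d))))
colnames(from_split) <- c("intercept", "slope")

print(round(from_interaction, 3))
print(round(from_split, 3))
```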
\[ E[Y_{i}\mid x_{1},x_{2},x_{3}]=\beta_{0}+\beta_{1}x_{1}+\beta_{2}x_{2}+\beta_{3}x_{3} \\ E[Y_{i}\mid x_{1}=hp,x_{2}=wt,x_{3}=am]=\beta_{0}+\beta_{1}hp+\beta_{2}wt+\beta_{3}am \]
Interaction between wt and am:
\[ E[Y_{i}\mid x_{1}=hp,x_{2}=wt,x_{3}=am]=\beta_{0}+\beta_{1}hp+\beta_{2}wt+\beta_{3}am+\beta_{4}\,wt\times am \\ E[Y_{i}\mid x_{1}=hp,x_{2}=wt,x_{3}=am=1]=(\beta_{0}+\beta_{3})+\beta_{1}hp+(\beta_{2}+\beta_{4})\,wt \\ E[Y_{i}\mid x_{1}=hp,x_{2}=wt,x_{3}=am=0]=\beta_{0}+\beta_{1}hp+\beta_{2}wt \]
library(ggplot2)
fitint <- lm(mpg ~ hp * factor(am), data = mtcars)
summary(fitint)$coef
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 26.6248478696 2.18294320 12.19676624 1.014017e-12
## hp -0.0591369818 0.01294486 -4.56837583 9.018508e-05
## factor(am)Manual 5.2176533777 2.66509311 1.95777527 6.028998e-02
## hp:factor(am)Manual 0.0004028907 0.01646022 0.02447662 9.806460e-01
# Intercept and slope of the automatic (i11, s11) and manual (i12, s12) regression lines
s11 <- coef(fitint)[2]; i11 <- coef(fitint)[1]
s12 <- s11 + coef(fitint)[4]; i12 <- i11 + coef(fitint)[3]
g = ggplot(mtcars, aes(x = hp, y = mpg, color = factor(am)))
g = g + geom_point(size = 3, colour = "black") + geom_point(size = 4)
g = g + xlab("Horsepower") + ylab("Miles per Gallon")
g = g + geom_abline(slope = s11, intercept = i11, colour = "salmon")
g = g + geom_abline(slope = s12, intercept = i12, colour = "lightblue")
# Dashed lines: mean MPG per transmission type (am is a labeled factor at this point,
# so it must be compared with "Automatic"/"Manual", not with 0/1)
g = g + geom_abline(slope = 0, intercept = mean(mtcars$mpg[mtcars$am == "Automatic"]), colour = "salmon", lty = 2)
g = g + geom_abline(slope = 0, intercept = mean(mtcars$mpg[mtcars$am == "Manual"]), colour = "lightblue", lty = 2)
g
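Finding: The difference between MPG[am=1] and MPG[am=0] does not depend on hp: the two regression lines are practically parallel, and the interaction term hp:factor(am)Manual is negligible (estimate 0.0004, p = 0.98). There is no interaction between hp and am.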
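Because the interaction term adds a single parameter, its t-test in the coefficient table is equivalent to a nested-model F-test (F = t²) of the model with separate hp slopes against the model with parallel lines. A minimal sketch comparing the two p-values (the names `fit_add` and `fit_int` are illustrative):

```r
mt <- datasets::mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

fit_add <- lm(mpg ~ hp + am, data = mt)   # parallel lines (common hp slope)
fit_int <- lm(mpg ~ hp * am, data = mt)   # separate hp slope per transmission type

# p-value of the interaction term from the coefficient t-test
p_t <- summary(fit_int)$coefficients["hp:amManual", "Pr(>|t|)"]

# The same hypothesis as a nested-model F-test (one extra parameter, so F = t^2)
p_f <- anova(fit_add, fit_int)[["Pr(>F)"]][2]

print(c(t_test = p_t, F_test = p_f))
```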
Based on the results derived above, the model is not adjusted further: it includes the interaction between wt and am, but not between hp and am. \[ E[Y_{i}\mid am=1]=(\beta_{0}+\beta_{3})+\beta_{1}hp+(\beta_{2}+\beta_{4})\,wt \\ E[Y_{i}\mid am=0]=\beta_{0}+\beta_{1}hp+\beta_{2}wt \]
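Subtracting the two conditional expectations shows why the MPG difference between the transmission types depends on weight but not on horsepower: the β1 hp term cancels. Plugging in the estimates β3 ≈ 11.555 (factor(am)Manual) and β4 ≈ -3.578 (wt:factor(am)Manual) from the model summary below gives the weight, in units of 1000 lbs, at which both transmissions have the same expected MPG (about 3230 lbs):

\[ \Delta(wt)=E[Y_{i}\mid am=1]-E[Y_{i}\mid am=0]=\beta_{3}+\beta_{4}\,wt \\ \Delta(wt^{*})=0 \iff wt^{*}=-\frac{\beta_{3}}{\beta_{4}}\approx\frac{11.555}{3.578}\approx 3.23 \]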
fit<-lm(mpg ~ wt * factor(am) + hp, data = mtcars)
summary(fit)
##
## Call:
## lm(formula = mpg ~ wt * factor(am) + hp, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.0639 -1.3315 -0.9347 1.2180 5.0822
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.947333 2.723411 11.363 8.55e-12 ***
## wt -2.515586 0.844497 -2.979 0.00605 **
## factor(am)Manual 11.554813 4.023277 2.872 0.00784 **
## hp -0.026949 0.009796 -2.751 0.01048 *
## wt:factor(am)Manual -3.577910 1.442796 -2.480 0.01968 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.332 on 27 degrees of freedom
## Multiple R-squared: 0.8696, Adjusted R-squared: 0.8503
## F-statistic: 45.01 on 4 and 27 DF, p-value: 1.451e-11
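Conclusion: For am = 0 (automatic), wt = 0, and hp = 0, the intercept is 30.95 MPG, and the slope indicates a (significant, p = 0.006) decrease of 2.52 MPG per 1000 lbs, holding all other variables constant. For am = 1 (manual), wt = 0, and hp = 0, the intercept is 30.95 + 11.55 = 42.50 MPG, and the slope indicates a decrease of 2.52 + 3.58 = 6.09 MPG per 1000 lbs, holding all other variables constant; the difference between the two slopes (the wt:factor(am)Manual term) is significant (p = 0.020).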
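The group-specific intercepts and slopes follow from summing the relevant coefficients. The summary does not test the manual slope (β2 + β4) directly; its standard error needs the covariance of the two estimates. A minimal sketch that computes both group lines and a 95% confidence interval for the manual slope (the names `lines_by_am`, `k`, and `ci_manual` are illustrative):

```r
mt <- datasets::mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ wt * am + hp, data = mt)
b <- coef(fit)

# Intercept (wt = 0, hp = 0) and weight slope for each transmission type
lines_by_am <- rbind(
  Automatic = c(intercept = b[["(Intercept)"]], wt_slope = b[["wt"]]),
  Manual    = c(intercept = b[["(Intercept)"]] + b[["amManual"]],
                wt_slope  = b[["wt"]] + b[["wt:amManual"]])
)
print(round(lines_by_am, 2))

# Manual slope as the linear combination k'b with k selecting wt and wt:amManual
k <- setNames(rep(0, length(b)), names(b))
k[c("wt", "wt:amManual")] <- 1
slope_manual <- sum(k * b)
se_manual <- sqrt(drop(t(k) %*% vcov(fit) %*% k))

# 95% confidence interval based on the t distribution with the residual df
ci_manual <- slope_manual + c(-1, 1) * qt(0.975, df.residual(fit)) * se_manual
print(round(ci_manual, 2))
```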
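data(mtcars)          # reload the original data (am back to 0/1)
par(mfrow = c(2, 2))  # show the four diagnostic plots in a 2 x 2 grid
fitint <- lm(mpg ~ wt * factor(am) + hp, data = mtcars)
plot(fitint)          # residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage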
Residuals vs. fitted plot (patterns): No patterns are identifiable, which indicates a proper model fit.
Normal Q-Q plot (normality of residuals): The residuals appear approximately normally distributed, which indicates a proper model fit.
Scale-location plot (constant variance): The spread of the standardized residuals shows no pattern across the fitted values, which indicates a proper model fit.
Residuals vs. leverage plot (influential points): This plot shows whether specific points (cars) distort the model results. No outliers with high leverage that would unduly influence the fit are identified.
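The visual checks can be complemented by a few numeric diagnostics. A minimal sketch in base R; the Shapiro-Wilk test and the 2p/n leverage cutoff are common choices added here for illustration, not part of the original analysis (`sw`, `lev`, and `cd` are illustrative names):

```r
mt <- datasets::mtcars
fit <- lm(mpg ~ wt * factor(am) + hp, data = mt)

# Shapiro-Wilk test on the residuals (H0: residuals are normally distributed)
sw <- shapiro.test(residuals(fit))
print(sw)

# Leverage: flag cars above the 2p/n rule of thumb (p = number of coefficients)
lev <- hatvalues(fit)
p <- length(coef(fit))
print(round(lev[lev > 2 * p / nrow(mt)], 3))

# Influence: the three largest Cook's distances
cd <- cooks.distance(fit)
print(round(sort(cd, decreasing = TRUE)[1:3], 3))
```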
To compare the MPG of automatic and manual transmission, I calculated the expected MPG for a set of different weight and horsepower values.
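The table below gives an overview of the expected MPG (E[MPG], with prediction interval bounds lwr and upr) for given values of weight and horsepower (25% quantile, mean, and 75% quantile) and for manual (1) or automatic (0) transmission. The columns Dif25, Dif50, and Dif75 give the differences between manual and automatic transmission; in these columns, the lwr and upr rows are differences of the interval bounds, not an interval for the difference itself. Above a weight of about 3230 lbs (11.555/3.578 ≈ 3.23 in units of 1000 lbs, from the fitted coefficients), the expected MPG is higher for automatic transmission; under this model, this crossover does not depend on horsepower.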
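The table can be reproduced with predict(). A minimal sketch, assuming the evaluation points are the 25% quantile, mean, and 75% quantile of wt and hp and that the lwr/upr rows come from `interval = "prediction"` (the names `pts`, `new_man`, `new_auto`, and `wt_cross` are illustrative). It also confirms that the manual-minus-automatic difference equals β3 + β4 wt and computes the crossover weight:

```r
mt <- datasets::mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
fit <- lm(mpg ~ wt * am + hp, data = mt)

# Evaluation points: 25% quantile, mean and 75% quantile of wt and hp
at <- function(x) c(unname(quantile(x, 0.25)), mean(x), unname(quantile(x, 0.75)))
pts <- data.frame(level = c("Q25", "Mean", "Q75"), wt = at(mt$wt), hp = at(mt$hp))

new_man <- pts
new_man$am <- factor("Manual", levels = levels(mt$am))
new_auto <- pts
new_auto$am <- factor("Automatic", levels = levels(mt$am))

# Prediction intervals (for a single new car) and, for comparison,
# confidence intervals (for the mean MPG) for manual transmission
pi_man  <- predict(fit, newdata = new_man,  interval = "prediction")
pi_auto <- predict(fit, newdata = new_auto, interval = "prediction")
ci_man  <- predict(fit, newdata = new_man,  interval = "confidence")

# Difference in expected MPG (manual minus automatic) at each point
diff_fit <- pi_man[, "fit"] - pi_auto[, "fit"]

# Under this model the difference is beta_am + beta_wt:am * wt, so hp cancels out
b <- coef(fit)
diff_formula <- b[["amManual"]] + b[["wt:amManual"]] * pts$wt

# Weight (1000 lbs) at which both transmissions have the same expected MPG
wt_cross <- -b[["amManual"]] / b[["wt:amManual"]]

print(data.frame(pts, manual = pi_man[, "fit"], automatic = pi_auto[, "fit"],
                 diff = diff_fit), digits = 3)
print(round(wt_cross, 2))
```

Comparing `pi_man` with `ci_man` shows how much wider the interval for a single new car is than the interval for the mean MPG at the same point.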
## Q25%(1) Q25%(0) Q50%(1) Q50%(0) Q75%(1) Q75%(0) Dif25 Dif50 Dif75
## wt 2.58 2.58 3.22 3.22 3.61 3.61 0.00 0.00 0.00
## hp 96.50 96.50 146.69 146.69 180.00 180.00 0.00 0.00 0.00
## am 1.00 0.00 1.00 0.00 1.00 0.00 1.00 1.00 1.00
## E[MPG] 24.17 21.85 18.94 18.90 15.65 17.02 2.32 0.04 -1.36
## lwr 19.09 16.65 13.48 13.92 9.78 12.07 2.45 -0.44 -2.29
## upr 29.25 27.06 24.41 23.88 21.52 21.96 2.19 0.53 -0.44