library(ggplot2)
This report examines the influence of transmission type (automatic or manual) on miles per gallon (MPG) for 32 cars. The data come from the standard mtcars dataset. Exploratory analysis shows that manual-transmission cars typically have higher MPG than automatic ones. A linear model with an adjusted R squared of 88% was obtained. Under this model, the difference in MPG between manual and automatic transmission at equal weight and qsec is 14.08 - 4.14*weight, with weight in units of 1000 lbs. Analysis of Cook’s distances showed no highly influential data points.
First, here are the dimensions of the data set and its first 4 rows.
## [1] 32 11
##                 mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4      21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag  21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710     22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
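The code that produced the output above is not shown in the report. A minimal sketch that should reproduce it, assuming the standard `mtcars` data set bundled with base R:

```r
# mtcars ships with base R (package 'datasets'), so no download is needed.
data(mtcars)

# Dimensions: number of rows (cars) and number of columns (variables).
dim(mtcars)

# First four rows of the data set.
head(mtcars, 4)
```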
Second, we estimate the average mpg for each transmission type, automatic and manual respectively.
## average mpg for automatic and manual transmission
round(c(mean(mtcars$mpg[mtcars$am == 0]), mean(mtcars$mpg[mtcars$am == 1])), 2)
## [1] 17.15 24.39
Fig. 1 in the Appendix visualizes the median mpg and its quartiles for each transmission type. Fig. 2 in the Appendix shows regression lines of mpg on hp fitted separately for each transmission type. The slopes are nearly equal for both types, so the gap between the two lines is what the am term in a model such as mpg~hp+am would capture. Together, these plots suggest that miles per gallon depends strongly on transmission type.
Let’s look at linear regression using all features:
coef(summary(lm(mpg ~ ., data = mtcars)))
##                Estimate  Std. Error    t value   Pr(>|t|)
## (Intercept) 12.30337416 18.71788443  0.6573058 0.51812440
## cyl         -0.11144048  1.04502336 -0.1066392 0.91608738
## disp         0.01333524  0.01785750  0.7467585 0.46348865
## hp          -0.02148212  0.02176858 -0.9868407 0.33495531
## drat         0.78711097  1.63537307  0.4813036 0.63527790
## wt          -3.71530393  1.89441430 -1.9611887 0.06325215
## qsec         0.82104075  0.73084480  1.1234133 0.27394127
## vs           0.31776281  2.10450861  0.1509915 0.88142347
## am           2.52022689  2.05665055  1.2254035 0.23398971
## gear         0.65541302  1.49325996  0.4389142 0.66520643
## carb        -0.19941925  0.82875250 -0.2406258 0.81217871
No feature is significant at the 5% level. We follow the advice to keep the features whose t value has modulus greater than 1: wt, am and qsec. Weight and transmission type are obvious choices. The intuition for including qsec is that we would like horsepower, engine type, number of cylinders and displacement to enter as a single feature, and qsec is a good candidate because it reflects the combined effect of all of these.
That strategy leads to two models: mpg~qsec+wt+am and mpg~qsec+wt*am. Comparing adjusted R squared values (83% vs 88%), we chose the model with the interaction between weight and transmission type, which explains 88% of the variance of mpg.
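The selection rule described above can be sketched in a few lines. The variable names `full_coefs`, `t_values` and `selected` are illustrative and do not come from the original analysis:

```r
# Fit the full model and extract its coefficient table.
full_coefs <- coef(summary(lm(mpg ~ ., data = mtcars)))

# Keep the predictors whose t value exceeds 1 in absolute value.
# The intercept is dropped because it is not a candidate feature.
t_values <- full_coefs[, "t value"]
selected <- setdiff(names(t_values)[abs(t_values) > 1], "(Intercept)")
selected
```

With the t values printed in the table, this should return wt, qsec and am, the three features used in the models that follow.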
fit_alt <- lm(mpg ~ qsec + wt + am, data = mtcars)  # additive model; inspect with summary(fit_alt)
fit <- lm(mpg ~ qsec + wt * am, data = mtcars)      # model with wt:am interaction; inspect with summary(fit)
round(c(summary(fit_alt)$adj.r.squared, summary(fit)$adj.r.squared), 2)
## [1] 0.83 0.88
coef(fit)
## (Intercept)        qsec          wt          am       wt:am
##    9.723053    1.016974   -2.936531   14.079428   -4.141376
The coefficients mean that each additional second of qsec is associated with about one additional mile per gallon. With qsec and weight held constant, switching from automatic to manual transmission changes mpg by 14.08 - 4.14*weight, where weight is in units of 1000 lbs. Interestingly, there is a weight, weight0, at which cars with manual and automatic transmission have equal predicted mpg. We find it by solving the equation am + wt:am*weight0 = 0, which gives weight0 = 3.40, that is, about 3400 lbs. Heavier cars with manual transmission have lower predicted mpg than automatic ones (at equal qsec).
The top two plots in Fig. 3 show that the residuals are approximately normally distributed (Normal Q-Q) and that the model fits well (Residuals vs Fitted). The maximal Cook’s distance of the fit is:
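For the interaction model, the predicted manual-minus-automatic difference at fixed qsec and weight is the am coefficient plus the wt:am coefficient times weight, because the qsec and wt terms are the same for both transmission types and cancel. A minimal sketch of that calculation follows; the helper `am_effect` and the name `wt_equal` are illustrative and not part of the original analysis:

```r
fit <- lm(mpg ~ qsec + wt * am, data = mtcars)
b <- coef(fit)

# Predicted manual-minus-automatic difference in mpg at a given weight
# (wt is recorded in units of 1000 lbs), with qsec held fixed.
am_effect <- function(wt) unname(b["am"] + b["wt:am"] * wt)

# Weight at which both transmission types have equal predicted mpg:
# solve b_am + b_wt:am * wt = 0 for wt.
wt_equal <- unname(-b["am"] / b["wt:am"])
wt_equal

# The sign of the difference for a light, a medium and a heavy car.
am_effect(c(2, 3, 4))
```

With the coefficients printed above, `wt_equal` works out to roughly 3.4, so the predicted advantage of a manual transmission is positive for lighter cars and negative for heavier ones.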
max(cooks.distance(fit))
## [1] 0.225106
As shown in Fig. 4, all values of Cook’s distance are of similar magnitude, so there are no highly influential points.
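The visual check can be complemented with a simple numeric one. The sketch below assumes the common rule of thumb that a Cook’s distance above 1 marks a point worth investigating. That cutoff is a convention, not something taken from the original analysis, and the names `cd` and `flagged` are illustrative:

```r
fit <- lm(mpg ~ qsec + wt * am, data = mtcars)
cd <- cooks.distance(fit)

# Cars whose Cook's distance exceeds the conventional cutoff of 1.
flagged <- names(cd)[cd > 1]
flagged

# The three largest Cook's distances, for closer inspection.
head(sort(cd, decreasing = TRUE), 3)
```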
# Fig. 1: am is converted to a factor so the x axis is discrete
ggplot(mtcars, aes(x = factor(am), y = mpg, colour = factor(am))) +
  geom_boxplot() +
  # labs(title = "Fig.1: boxplot of average mpg by transmission type") +
  scale_colour_discrete(labels = c("automatic", "manual"),
                        name = "Transmission type")
ggplot(mtcars) +
  geom_jitter(aes(hp, mpg, colour = factor(am)), size = 3, alpha = .7) +
  geom_smooth(aes(hp, mpg, colour = factor(am)), method = lm) +
  # labs(title = "Fig.2: Regression lines of mpg~hp by transmission type") +
  scale_colour_discrete(labels = c("automatic", "manual"),
                        name = "Transmission type")
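par(mfrow = c(2, 2))       # Fig. 3: the four standard diagnostic plots in a 2x2 grid
plot(fit)
par(mfrow = c(1, 1))       # Fig. 4: reset the layout so the plot fills the device
plot(cooks.distance(fit))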