Suppose I work for Motor Trend, a magazine about the automobile industry. Looking at a data set of a collection of cars, the editors are interested in exploring the relationship between a set of variables and miles per gallon (MPG), the outcome. They are particularly interested in the following two questions:
- "Is an automatic or manual transmission better for MPG"
- "Quantify the MPG difference between automatic and manual transmissions"
1. Loading Data
We load the dataset and preview its first rows:
data(mtcars)
head(mtcars)
Motor Trend Car Road Tests
Description
The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
Format
A data frame with 32 observations on 11 (numeric) variables.
| Column | Variable | Description |
|--------|----------|-------------|
| [, 1] | mpg | Miles/(US) gallon |
| [, 2] | cyl | Number of cylinders |
| [, 3] | disp | Displacement (cu.in.) |
| [, 4] | hp | Gross horsepower |
| [, 5] | drat | Rear axle ratio |
| [, 6] | wt | Weight (1000 lbs) |
| [, 7] | qsec | 1/4 mile time |
| [, 8] | vs | Engine (0 = V-shaped, 1 = straight) |
| [, 9] | am | Transmission (0 = automatic, 1 = manual) |
| [,10] | gear | Number of forward gears |
| [,11] | carb | Number of carburetors |
For convenience, we convert the variable "am" to a factor with the clearer labels "Automatic" and "Manual", and then briefly summarize both variables (mpg and am):
mtcars$am = as.factor(mtcars$am)
levels(mtcars$am) = c("Automatic", "Manual")
summary(mtcars$mpg); summary(mtcars$am)
Min. 1st Qu. Median Mean 3rd Qu. Max.
10.40 15.43 19.20 20.09 22.80 33.90
Automatic Manual
19 13
2. EDA
library(car)  # provides scatterplotMatrix()
scatterplotMatrix(~mpg+disp+drat+wt+hp|am, data=mtcars, col = c("skyblue4", "indianred4"), main="Type of Transmission")

boxplot(mpg~am,data = mtcars,xlab = "Transmission Automatic,Manual", ylab = "MPG", main="MPG by Transmission Type", col=c("skyblue4", "indianred4"))

3. t-test
hist(mtcars$mpg, breaks=10, xlab="MPG", main="MPG histogram", col = "skyblue3")

plot(density(mtcars$mpg), main="kernel density", xlab="MPG", col="lightpink4" )
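
The section heading refers to a t-test, but no test call appears in the text. A minimal sketch of how the comparison could be run is shown below, assuming a Welch two-sample t-test (the default for `t.test` in R) is what was intended. The names `cars` and `tt` are illustrative and do not come from the original analysis.

```r
# Work on a fresh copy so the sketch does not depend on earlier steps
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Welch two-sample t-test (unequal variances; assumed, not confirmed by the source)
tt <- t.test(mpg ~ am, data = cars)

tt$estimate   # mean mpg in each transmission group
tt$conf.int   # 95% CI for mean(Automatic) - mean(Manual)
tt$p.value    # two-sided p-value
```

A small p-value here would only indicate that the unadjusted mean MPG differs between the two groups; it does not account for weight or other variables, which is what the regression in section 4 addresses.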

library(GGally)  # provides ggpairs(); built on ggplot2
ggpairs(mtcars,lower=list(continuous="smooth"))

Interpretation: In this plot, many of the predictors are strongly correlated with one another (multicollinearity), which suggests that we should NOT use all the variables as predictors; otherwise the model is likely to overfit.
4. Quantify the MPG difference between automatic and manual transmissions
Consider all the other variables as possible predictors and MPG as the outcome. We use R's step function to find the best-fitting model.
The pairs plot above gives a first glimpse of the relationships between the variables.
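
The multicollinearity noted above can also be checked numerically. The following is a rough sketch using base R's `cor`; the 0.8 cutoff and the names `cars`, `predictors`, `cm`, `idx`, and `high` are illustrative assumptions, not part of the original analysis.

```r
# Fresh copy of the data (all 11 columns are numeric in the original data set)
cars <- datasets::mtcars

# Pairwise Pearson correlations among the candidate predictors
predictors <- setdiff(names(cars), "mpg")
cm <- cor(cars[, predictors])

# List each predictor pair once if |r| exceeds an illustrative threshold of 0.8
idx <- which(abs(cm) > 0.8 & upper.tri(cm), arr.ind = TRUE)
high <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = round(cm[idx], 2)
)
high[order(-abs(high$r)), ]
```

Pairs that appear in this listing carry largely overlapping information, which is one reason to prefer a reduced model over one with all ten predictors.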
Finding best model
# Backward stepwise selection by AIC, starting from the model with all predictors
best_model <- step(lm(mpg ~ ., data = mtcars), trace = 0)
summary(best_model)
Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.4811 -1.5555 -0.7257 1.4110 4.6610
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.6178 6.9596 1.382 0.177915
wt -3.9165 0.7112 -5.507 6.95e-06 ***
qsec 1.2259 0.2887 4.247 0.000216 ***
amManual 2.9358 1.4109 2.081 0.046716 *
---
Signif. codes:
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
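
To make explicit what `step` is doing, the sketch below compares the full model with the selected one by AIC, the criterion `step` minimizes by default. It runs on an untouched copy of the data; the names `cars`, `full`, and `selected` are illustrative.

```r
cars <- datasets::mtcars

full     <- lm(mpg ~ ., data = cars)   # all ten predictors
selected <- step(full, trace = 0)      # backward elimination by AIC

formula(selected)                      # predictors retained by step()
AIC(full, selected)                    # lower AIC is preferred
```

"Best" in this report therefore means best by AIC among the models visited by the stepwise search, not best among all possible subsets of predictors.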
#par(mfrow=c(2,2))
plot(best_model)




We can conclude that the best model found by step uses wt, qsec, and am as predictors. Its R-squared is 0.8497, so it explains about 85% of the variance in mpg, which is a good fit. Holding weight and quarter-mile time constant, manual cars get an estimated 2.9358 mpg more than automatic cars (p = 0.047).
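
Because the second question asks to quantify the difference, a confidence interval is a useful complement to the point estimate. The sketch below refits the reported model on a fresh copy of the data and also computes the unadjusted difference for comparison; the names `cars`, `fit`, `ci_adjusted`, and `unadjusted` are illustrative.

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Model reported above: weight, quarter-mile time, and transmission
fit <- lm(mpg ~ wt + qsec + am, data = cars)

# 95% confidence interval for the adjusted manual-vs-automatic difference
ci_adjusted <- confint(fit)["amManual", ]

# Unadjusted difference for comparison (transmission as the only predictor)
unadjusted <- coef(lm(mpg ~ am, data = cars))["amManual"]

ci_adjusted
unadjusted
```

With a standard error of about 1.41 on an estimate of about 2.94, the interval is wide, so the adjusted difference should be read as an estimate with considerable uncertainty from only 32 cars.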