Suppose I work for Motor Trend, a magazine about the automobile industry. Given a data set covering a collection of cars, the magazine is interested in exploring the relationship between a set of variables and miles per gallon (MPG), the outcome. It is particularly interested in the following two questions:

- "Is an automatic or manual transmission better for MPG?"
- "Quantify the MPG difference between automatic and manual transmissions."

1. Loading Data

We load the built-in mtcars dataset and look at its first rows:

data(mtcars)
head(mtcars)

Motor Trend Car Road Tests

Description

The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).

Format

A data frame with 32 observations on 11 (numeric) variables.

[, 1] mpg Miles/(US) gallon
[, 2] cyl Number of cylinders
[, 3] disp Displacement (cu.in.)
[, 4] hp Gross horsepower
[, 5] drat Rear axle ratio
[, 6] wt Weight (1000 lbs)
[, 7] qsec 1/4 mile time
[, 8] vs Engine (0 = V-shaped, 1 = straight)
[, 9] am Transmission (0 = automatic, 1 = manual)
[,10] gear Number of forward gears
[,11] carb Number of carburetors

For convenience, we convert the variable "am" to a factor with the clearer labels "Automatic" and "Manual", and then briefly summarize both mpg and am:

# Recode am (0/1) as a labelled factor
mtcars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
summary(mtcars$mpg)
summary(mtcars$am)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  10.40   15.43   19.20   20.09   22.80   33.90 
Automatic    Manual 
       19        13 

2. EDA

library(car)  # provides scatterplotMatrix()
scatterplotMatrix(~ mpg + disp + drat + wt + hp | am, data = mtcars,
                  col = c("skyblue4", "indianred4"), main = "Type of Transmission")

boxplot(mpg ~ am, data = mtcars, xlab = "Transmission type", ylab = "MPG",
        main = "MPG by Transmission Type", col = c("skyblue4", "indianred4"))
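To put numbers on the boxplot, a minimal sketch that tabulates the mean, standard deviation and count of MPG for each transmission type with base R's tapply() and table(). The names group_means, group_sds and group_n are illustrative only; the block re-creates the factor so it runs on its own.

```r
# Sketch: per-group MPG summary behind the boxplot (base R only).
data(mtcars)
mtcars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

group_means <- tapply(mtcars$mpg, mtcars$am, mean)  # mean MPG per transmission type
group_sds   <- tapply(mtcars$mpg, mtcars$am, sd)    # spread within each group
group_n     <- table(mtcars$am)                     # number of cars per group

print(round(group_means, 2))
print(round(group_sds, 2))
print(group_n)
```

These are raw, unadjusted group differences; they do not account for weight or any other variable.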

3. Distribution of MPG (checking t-test assumptions)

hist(mtcars$mpg, breaks=10, xlab="MPG", main="MPG histogram", col = "skyblue3")

plot(density(mtcars$mpg), main = "Kernel density of MPG", xlab = "MPG", col = "lightpink4")
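With the MPG distribution checked, a minimal sketch of a two-sample t-test comparing mean MPG between automatic and manual cars, using base R's t.test(). It assumes a Welch test (the t.test() default, var.equal = FALSE) because the groups differ in size and possibly in spread; diff_means is an illustrative name.

```r
# Sketch: Welch two-sample t-test of mean MPG, automatic vs manual.
data(mtcars)
mtcars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Formula interface: outcome ~ two-level grouping factor
mpg_test <- t.test(mpg ~ am, data = mtcars)
print(mpg_test)

# Difference in group means, Manual minus Automatic
diff_means <- unname(diff(mpg_test$estimate))
print(diff_means)
```

This answers the first question only in an unadjusted sense: the test ignores confounders such as weight, which the regression in section 4 takes into account.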

library(GGally)  # provides ggpairs()
ggpairs(mtcars, lower = list(continuous = "smooth"))

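Interpretation: the pairs plot shows strong correlations among several of the predictors (multicollinearity). This suggests we should NOT use all the variables as predictors: with only 32 cars, a model containing all ten would risk overfitting and would have unstable, hard-to-interpret coefficients.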
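One common way to quantify the multicollinearity seen in the pairs plot is the variance inflation factor, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the other predictors. A minimal base-R sketch follows; vif_by_hand is a hypothetical helper name, and car::vif() reports the same quantity when every model term has one degree of freedom. A rule of thumb often treats values above roughly 5 to 10 as a warning sign.

```r
# Sketch: variance inflation factors computed by hand in base R.
data(mtcars)
predictors <- setdiff(names(mtcars), "mpg")  # mpg is the outcome, not a predictor

vif_by_hand <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  # Regress predictor p on every other predictor
  fit <- lm(reformulate(others, response = p), data = mtcars)
  1 / (1 - summary(fit)$r.squared)
})

print(round(sort(vif_by_hand, decreasing = TRUE), 2))
```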

4. Quantify the MPG difference between automatic and manual transmissions

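We treat all the other variables as candidate predictors and MPG as the outcome. Having already looked at the pairwise relationships between the variables in the pairs plot above, we use R's step() function, which performs stepwise selection by AIC starting from the full model, to find the best-fitting model.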
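To make the selection criterion concrete, a minimal sketch comparing the full model with the three-predictor model (wt + qsec + am) that step() selects in the output below. It assumes that extractAIC() is the criterion step() minimises for lm fits (its second element is the AIC value), and adds a nested F-test via anova() as a complementary check; full_model, reduced_model and nested are illustrative names.

```r
# Sketch: AIC of the full model vs the selected model, plus a nested F-test.
data(mtcars)
mtcars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

full_model    <- lm(mpg ~ ., data = mtcars)
reduced_model <- lm(mpg ~ wt + qsec + am, data = mtcars)

# extractAIC() returns c(equivalent df, AIC); lower AIC is preferred
aic_full    <- extractAIC(full_model)[2]
aic_reduced <- extractAIC(reduced_model)[2]
print(c(full = aic_full, reduced = aic_reduced))

# Do the seven dropped predictors jointly improve the fit?
nested <- anova(reduced_model, full_model)
print(nested)
```

A large p-value in the F-test would indicate that dropping the other seven predictors costs little explanatory power.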

Finding the best model

# Stepwise AIC selection starting from the model with all predictors
best_model <- step(lm(mpg ~ ., data = mtcars), trace = 0)
summary(best_model)

Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4811 -1.5555 -0.7257  1.4110  4.6610 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   9.6178     6.9596   1.382 0.177915    
wt           -3.9165     0.7112  -5.507 6.95e-06 ***
qsec          1.2259     0.2887   4.247 0.000216 ***
amManual      2.9358     1.4109   2.081 0.046716 *  
---
Signif. codes:  
0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared:  0.8497,    Adjusted R-squared:  0.8336 
F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11
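
par(mfrow = c(2, 2))  # show the four diagnostic plots in one 2x2 grid
plot(best_model)
par(mfrow = c(1, 1))  # restore the default single-panel layout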

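We conclude that the best model uses wt, qsec and am as predictors. Its multiple R-squared is 0.8497 (adjusted R-squared 0.8336), so it explains about 85% of the variance in MPG. Holding weight and quarter-mile time constant, manual cars average about 2.94 mpg more than automatic cars. This effect is only marginally significant (p = 0.047) and has a large standard error (1.41), so the estimate is imprecise.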
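To show how imprecise the transmission effect is, a minimal sketch that refits the selected model and computes a 95% confidence interval for the amManual coefficient with base R's confint(). The name am_ci is illustrative.

```r
# Sketch: 95% confidence interval for the manual-vs-automatic effect.
data(mtcars)
mtcars$am <- factor(mtcars$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

# Same formula as the model selected by step()
best_model <- lm(mpg ~ wt + qsec + am, data = mtcars)
am_ci <- confint(best_model, parm = "amManual", level = 0.95)
print(round(am_ci, 3))
```

Consistent with p = 0.047, the interval is wide and its lower end sits just above zero, so the data are compatible with anything from a negligible to a fairly large MPG advantage for manual cars.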
