Motor Trend is interested in exploring the relationship between a set of variables and miles per gallon (MPG). They are particularly interested in the following two questions:
- “Is an automatic or manual transmission better for MPG”
- “Quantify the MPG difference between automatic and manual transmissions”
This analysis uses the mtcars dataset to explore these relationships, first with statistical inference and then by building a linear regression model and interpreting the results.
The analysis shows that in a linear model of mpg on am alone, manual transmission cars average about 7.24494 mpg more than automatic transmission cars. However, this model explains only about 36% of the variance in mpg (R-squared 0.3598). The best model (found using stepwise regression) shows that mpg depends on wt, qsec and am. With wt and qsec in the model, the expected difference in mpg between manual and automatic transmissions drops to 2.9358.
data(mtcars)
head(mtcars)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
## Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1
Let’s convert am into a categorical variable: 0 = AT (automatic transmission), 1 = MT (manual transmission).
input<-mtcars
input$am <- as.factor(input$am)
levels(input$am) <-c("AT", "MT")
The boxplot of mpg by transmission type (drawn with ggplot at the end of this report) suggests that manual transmission is better than automatic transmission for miles per gallon. Let’s test this with a t-test.
set.seed(12345)
t.test(input$mpg~input$am)
##
## Welch Two Sample t-test
##
## data: input$mpg by input$am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group AT mean in group MT
## 17.14737 24.39231
The mean mpg is 17.14737 in group AT and 24.39231 in group MT. The p-value of 0.001374 is below 0.05, so the difference is statistically significant.
We can reject the null hypothesis of equal means and conclude that, in this sample, manual transmission cars get more miles per gallon than automatic transmission cars. This test does not adjust for any other variable.
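To make the test above less of a black box, here is a minimal sketch that recomputes the Welch statistic by hand from the two group means and variances. The variable names (`at`, `mt`, `se_diff`, `df_welch` and so on) are illustrative and not part of the original analysis; the formulas are the standard Welch two-sample t statistic and the Welch-Satterthwaite degrees of freedom.

```r
# Illustrative sketch: recompute the Welch two-sample t-test by hand.
# All object names below are hypothetical helpers for this example.
data(mtcars)
at <- mtcars$mpg[mtcars$am == 0]  # automatic transmission
mt <- mtcars$mpg[mtcars$am == 1]  # manual transmission

# Per-group variance of the sample mean
v_at <- var(at) / length(at)
v_mt <- var(mt) / length(mt)

# Standard error of the difference in means (unequal variances)
se_diff <- sqrt(v_at + v_mt)
t_stat <- (mean(at) - mean(mt)) / se_diff

# Welch-Satterthwaite approximation for the degrees of freedom
df_welch <- (v_at + v_mt)^2 /
  (v_at^2 / (length(at) - 1) + v_mt^2 / (length(mt) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
round(c(t = t_stat, df = df_welch, p = p_value), 4)
```

The values should agree with the t, df and p-value reported by t.test above, up to rounding.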
Let’s build a regression model with mpg as the outcome and am as the only predictor.
fit<-lm(mpg~am,data=input)
summary(fit)
##
## Call:
## lm(formula = mpg ~ am, data = input)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## amMT 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
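With a single two-level factor as predictor, the intercept is the mean of the reference group (AT) and the amMT coefficient is the difference between the two group means. The sketch below checks this and adds a 95% confidence interval for the difference. It rebuilds `input` and `fit` so that it runs on its own; `mean_at` and `mean_mt` are names introduced only for this illustration.

```r
# Illustrative sketch: relate the lm coefficients to the group means.
data(mtcars)
input <- mtcars
input$am <- factor(input$am, levels = c(0, 1), labels = c("AT", "MT"))
fit <- lm(mpg ~ am, data = input)

# Group means computed directly (hypothetical helper names)
mean_at <- mean(input$mpg[input$am == "AT"])
mean_mt <- mean(input$mpg[input$am == "MT"])

# Intercept equals the AT mean; amMT equals the MT minus AT difference
coef(fit)[["(Intercept)"]] - mean_at
coef(fit)[["amMT"]] - (mean_mt - mean_at)

# 95% confidence interval for the MT vs AT difference in mpg
confint(fit, "amMT", level = 0.95)
```

Both differences should be zero up to floating-point error. This interval assumes equal variances in the two groups, so it will differ slightly from the Welch interval reported by t.test.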
Let’s try to build a model using mpg as outcome and all remaining variables as predictors.
fit_all<-lm(mpg~.,data=input)
Let’s find the best model using stepwise regression, which chooses the model with the lowest AIC. We start from the fit_all model and eliminate predictors backwards until we reach the best-fitting model.
fit_best<-step(fit_all,direction = "backward",trace = 0)
summary(fit_best)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = input)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## amMT 2.9358 1.4109 2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
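The selected model is mpg ~ wt + qsec + am. Holding wt and qsec constant, manual transmission cars are expected to get about 2.9358 mpg more than automatic transmission cars (p = 0.046716), and the model explains about 85% of the variance in mpg (R-squared 0.8497).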
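One way to check that adding wt and qsec is a real improvement over the am-only model is a nested-model F test with anova. This is a sketch added for illustration rather than part of the original analysis. It refits both models so that it runs on its own, and `cmp` and `p_nested` are hypothetical names.

```r
# Illustrative sketch: compare the am-only model with the stepwise-selected
# model using a nested-model F test.
data(mtcars)
input <- mtcars
input$am <- factor(input$am, levels = c(0, 1), labels = c("AT", "MT"))

fit <- lm(mpg ~ am, data = input)                   # simple model
fit_best <- lm(mpg ~ wt + qsec + am, data = input)  # selected model

# The F test asks whether wt and qsec jointly reduce the residual sum of squares
cmp <- anova(fit, fit_best)
cmp

# p-value of the comparison (second row of the anova table)
p_nested <- cmp[["Pr(>F)"]][2]

# 95% confidence interval for the transmission effect after adjustment
confint(fit_best, "amMT", level = 0.95)
```

A small p-value here indicates that the am-only model omits predictors that matter, which is consistent with the jump in R-squared from 0.3598 to 0.8497.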
hist(mtcars$mpg,breaks = 10,xlab="MPG")
library(ggplot2)
library(caret)
ggplot(input, aes(x=am, y=mpg)) + geom_boxplot()
pairs(mpg ~ ., data = mtcars)
par(mfrow = c(2,2))
plot(fit_best)