Using a data set describing a collection of cars, we explore the relationship between a set of variables and the outcome, miles per gallon (MPG). We are particularly interested in the following two questions:
1. “Is an automatic or manual transmission better for MPG?”
2. “Quantifying the MPG difference between automatic and manual transmissions”
Let us load the data and examine its structure.
data(mtcars)
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
library(ggplot2)
g <- ggplot(data = mtcars, aes(x = am, y = mpg))
g <- g + geom_point(aes(fill = vs, col = vs)) + labs(title = "Miles Per Gallon vs Transmission Type")
##Figure In Appendix
Let’s see which variables correlate with miles per gallon (mpg).
##Pairs Plot In Appendix
cor(mtcars)[,1]
## mpg cyl disp hp drat wt
## 1.0000000 -0.8521620 -0.8475514 -0.7761684 0.6811719 -0.8676594
## qsec vs am gear carb
## 0.4186840 0.6640389 0.5998324 0.4802848 -0.5509251
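We see that wt, disp, hp, drat, vs, and am correlate fairly strongly with mpg (absolute correlation of about 0.6 or more), as does cyl. The correlation for qsec is weaker (0.42), but we keep it as a candidate predictor.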
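To make the ranking easier to read, the correlations with mpg can be sorted by absolute value. This is only a small sketch using base R; the names `r` and `r_sorted` are illustrative and do not appear in the original analysis.

```r
data(mtcars)

# Correlation of every variable with mpg, dropping mpg's correlation with itself
r <- cor(mtcars)[, "mpg"]
r <- r[names(r) != "mpg"]

# Order by strength of association, ignoring the sign
r_sorted <- r[order(abs(r), decreasing = TRUE)]
round(r_sorted, 3)
```

Sorting on the absolute value keeps strongly negative predictors such as wt next to strongly positive ones, which a plain sort would separate.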
Let us now run a t-test to see whether mean MPG differs between automatic and manual transmissions.
Automatic<-mtcars$mpg[mtcars$am==0]
Manual<-mtcars$mpg[mtcars$am==1]
t.test(Automatic,Manual)
##
## Welch Two Sample t-test
##
## data: Automatic and Manual
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
Since the p-value (0.001374) is well below 0.05, we reject the null hypothesis and conclude that the true difference in means is not equal to 0. In other words, mean MPG for cars with automatic transmission (17.15) is significantly lower than for cars with manual transmission (24.39).
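As a cross-check, the Welch t statistic can be recomputed by hand from the group means and variances. The sketch below assumes the same split of `mtcars$mpg` by `am` as above; the lowercase variable names are illustrative.

```r
data(mtcars)
automatic <- mtcars$mpg[mtcars$am == 0]
manual    <- mtcars$mpg[mtcars$am == 1]

# Welch standard error: the two group variances are not pooled
se <- sqrt(var(automatic) / length(automatic) + var(manual) / length(manual))
t_by_hand <- (mean(automatic) - mean(manual)) / se

# Same test through t.test(), which uses the Welch correction by default
tt <- t.test(automatic, manual)
c(by_hand = t_by_hand, t_test = unname(tt$statistic))

# The pieces used in the interpretation
tt$conf.int
tt$p.value
```

The two statistics should agree, and `tt$conf.int` gives the interval for the automatic-minus-manual difference in mean MPG.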
In the first model, we fit all variables against the outcome mpg. None of the variables are significant, and therefore we won’t go with this model even though its adjusted R-squared is high. In the second model, adjusted R-squared has improved, but a few variables have insignificant p-values, so we keep trying different models. The third model shows a further improvement in adjusted R-squared. Lastly, we try another model that includes the interaction between transmission type (am) and weight (wt).
fit1<-lm(mpg~.,mtcars)
fit2<-lm(mpg~factor(am)+wt+qsec+hp+disp+drat+vs,mtcars)
fit3<-lm(mpg~factor(am)+wt+qsec,data=mtcars)
fit4<-lm(mpg~factor(am)*wt+qsec,data=mtcars)
summary(fit4)
##
## Call:
## lm(formula = mpg ~ factor(am) * wt + qsec, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.5076 -1.3801 -0.5588 1.0630 4.3684
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.723 5.899 1.648 0.110893
## factor(am)1 14.079 3.435 4.099 0.000341 ***
## wt -2.937 0.666 -4.409 0.000149 ***
## qsec 1.017 0.252 4.035 0.000403 ***
## factor(am)1:wt -4.141 1.197 -3.460 0.001809 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.084 on 27 degrees of freedom
## Multiple R-squared: 0.8959, Adjusted R-squared: 0.8804
## F-statistic: 58.06 on 4 and 27 DF, p-value: 7.168e-13
Here the adjusted R-squared is 0.8804, telling us that the model explains about 88% of the variability in mpg. The residual standard error is 2.084 on 27 degrees of freedom, which is the lowest among all the models. We conclude that this is the model with the best fit.
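The comparison across the four models can be tabulated directly rather than read off four separate summaries. This is a sketch using the same model formulas as above; the `fits` list and `comparison` data frame are illustrative names, and the nested F-test at the end is an additional check that is not part of the original analysis.

```r
data(mtcars)

# Refit the four candidate models with the formulas used above
fits <- list(
  fit1 = lm(mpg ~ ., data = mtcars),
  fit2 = lm(mpg ~ factor(am) + wt + qsec + hp + disp + drat + vs, data = mtcars),
  fit3 = lm(mpg ~ factor(am) + wt + qsec, data = mtcars),
  fit4 = lm(mpg ~ factor(am) * wt + qsec, data = mtcars)
)

# Adjusted R-squared and residual standard error, one row per model
comparison <- data.frame(
  adj_r_squared = sapply(fits, function(f) summary(f)$adj.r.squared),
  residual_se   = sapply(fits, function(f) summary(f)$sigma)
)
round(comparison, 4)

# fit3 is nested in fit4, so an F-test can assess the added interaction term
anova(fits$fit3, fits$fit4)
```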
##Residual Plot In Appendix
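The appendix plot is not reproduced here. One way to draw a residuals-versus-fitted plot and to check the constant-variance assumption numerically is sketched below. The numeric check is a hand-rolled studentized Breusch-Pagan test (auxiliary regression of the squared residuals on the model's predictors); it is an assumption about how such a check could be done, not necessarily the diagnostic used in the original analysis.

```r
data(mtcars)
fit4 <- lm(mpg ~ factor(am) * wt + qsec, data = mtcars)

# Residuals vs fitted values: look for a funnel shape or a curved trend
plot(fit4, which = 1)

# Studentized Breusch-Pagan test by hand:
# regress squared residuals on the same predictors, then use n * R^2
aux     <- lm(resid(fit4)^2 ~ factor(am) * wt + qsec, data = mtcars)
bp_stat <- nrow(mtcars) * summary(aux)$r.squared
bp_df   <- length(coef(aux)) - 1          # number of regressors, intercept excluded
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
c(statistic = bp_stat, df = bp_df, p_value = bp_p)
```

A large p-value here would be consistent with homoskedastic residuals, though with only 32 observations the test has limited power.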
Here we run some more diagnostics to look for overfitting. There appears to be none, and the model shows no signs of heteroskedasticity. The result shows that when wt (weight in 1000 lbs) and qsec (1/4 mile time) are held constant, cars with manual transmission get 14.079 + (-4.141)*wt more MPG (miles per gallon) on average than cars with automatic transmission. That is, at the same weight (2000 lbs, so wt = 2) and the same 1/4 mile time, a car with manual transmission will get 14.079 - 4.141*2 = 5.797 more MPG than a car with automatic transmission. Because the interaction coefficient is negative, this advantage shrinks as weight increases.
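The arithmetic above can be reproduced from the fitted coefficients, which also shows how the manual-transmission advantage changes with weight. This is a sketch; `manual_advantage` and `break_even_wt` are hypothetical helper names introduced for illustration. With the rounded coefficients from the summary, the advantage reaches zero at a weight of about 14.079 / 4.141 ≈ 3.4, that is, roughly 3400 lbs.

```r
data(mtcars)
fit4 <- lm(mpg ~ factor(am) * wt + qsec, data = mtcars)
b <- coef(fit4)

# Expected MPG difference (manual minus automatic) at a given weight,
# holding qsec constant; wt is in units of 1000 lbs
manual_advantage <- function(wt) {
  b[["factor(am)1"]] + b[["factor(am)1:wt"]] * wt
}

# Advantage at 2000, 3000 and 4000 lbs
manual_advantage(c(2, 3, 4))

# Weight at which the estimated advantage is zero
break_even_wt <- -b[["factor(am)1"]] / b[["factor(am)1:wt"]]
break_even_wt
```

Above the break-even weight the estimated difference turns negative, so under this model the answer to "which transmission is better" depends on how heavy the car is.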