For this project we use the mtcars data set, a collection of cars, to explore the relationship between a set of variables and miles per gallon (MPG), the outcome. The two key questions that we want to answer are:
Is an automatic or manual transmission better for MPG?
Quantify the MPG difference between automatic and manual transmissions
Before doing any deep analysis, let’s take a quick look into the data.
We want to analyze how Transmission predicts MPG.
The variables we want to keep a close eye on are “am” (Transmission: 0 = automatic, 1 = manual) and “mpg”.
We’ll start with a linear regression of MPG on Transmission alone:
# Assumption: mydata is a working copy of the built-in mtcars data set
mydata <- mtcars
fit <- lm(mpg ~ am, data = mydata)
coef(fit)
## (Intercept)          am
##   17.147368    7.244939
The results show that cars with an automatic transmission average 17.15 mpg, and cars with a manual transmission average 7.24 mpg more. This initial analysis ignores the rest of the variables.
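With a single 0/1 predictor, the intercept and slope are simply the automatic-group mean and the difference between the two group means. A minimal sketch of that check, assuming `mydata` is an unmodified copy of `mtcars` (the names `group_means`, `intercept_check` and `slope_check` are illustrative, not from the report):

```r
# Assumption: the report's mydata is an unmodified copy of mtcars
mydata <- mtcars

# Mean mpg per transmission type (0 = automatic, 1 = manual)
group_means <- tapply(mydata$mpg, mydata$am, mean)

fit <- lm(mpg ~ am, data = mydata)

# Intercept = automatic mean; slope = manual mean minus automatic mean
intercept_check <- unname(coef(fit)["(Intercept)"] - group_means["0"])
slope_check <- unname(coef(fit)["am"] - (group_means["1"] - group_means["0"]))

print(group_means)
print(c(intercept_check, slope_check))  # both should be numerically zero
```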
Before moving forward, let’s see how the correlation between am and mpg compares with that of the other variables. Appendix 1 reports the correlations as percentages and shows that Transmission and MPG have a correlation of only 60%, weaker in magnitude than variables like Weight (-87%), Cyl (-85%) and Displacement (-85%).
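The percentages quoted from Appendix 1 can be reproduced along these lines. This is a sketch that assumes the plain numeric `mtcars` data, before any factor conversion; `cor_pct` and `cor_sorted` are illustrative names:

```r
# Correlation of every variable with mpg, as rounded percentages
# (assumes the plain numeric mtcars data, before any factor conversion)
cor_pct <- round(100 * cor(mtcars)[, "mpg"])
cor_pct <- cor_pct[names(cor_pct) != "mpg"]

# Sort by absolute strength so the strongest relationships come first
cor_sorted <- cor_pct[order(abs(cor_pct), decreasing = TRUE)]
print(cor_sorted)
```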
We’ll turn the applicable variables into factors and run another regression that takes all the variables into consideration, so that we better understand the impact of each on MPG.
The quick exploratory analysis uncovered other variables that correlate more strongly with MPG than Transmission does, so we now need to find the model that best fits our analysis.
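The factor conversion and the all-variables regression are not shown in the text. One plausible sketch follows: the output further down confirms that `cyl` and `am` were treated as factors (`cyl6`, `cyl8`, `am1`), while converting `vs`, `gear` and `carb` as well is an assumption, as is the helper name `factor_cols`.

```r
# Assumption: which columns were converted is not stated in the text.
# cyl and am must be factors (the output shows cyl6, cyl8, am1);
# vs, gear and carb are a guess.
mydata <- mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
mydata[factor_cols] <- lapply(mydata[factor_cols], factor)

# Full model: mpg regressed on every other variable
allvar <- lm(mpg ~ ., data = mydata)
print(names(coef(allvar)))
```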
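# allvar is the regression on all variables described above;
# step() searches in both directions and compares models by AIC
bestMod <- step(allvar, direction = "both", trace = 0)
summary(bestMod)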
##
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mydata)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.9387 -1.2560 -0.4013  1.1253  5.0513
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.70832    2.60489  12.940 7.73e-13 ***
## cyl6        -3.03134    1.40728  -2.154  0.04068 *
## cyl8        -2.16368    2.28425  -0.947  0.35225
## hp          -0.03211    0.01369  -2.345  0.02693 *
## wt          -2.49683    0.88559  -2.819  0.00908 **
## am1          1.80921    1.39630   1.296  0.20646
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared: 0.8659, Adjusted R-squared: 0.8401
## F-statistic: 33.57 on 5 and 26 DF, p-value: 1.506e-10
Based on these results, the stepwise search selects mpg ~ cyl + hp + wt + am as the best model. Adjusted R^2 is 0.84, which means that this model explains about 84% of the variability in MPG after adjusting for the number of predictors. Now, let’s compare both models: the one using only “am” and the one using “cyl”, “hp”, “wt” and “am”.
anova(bestMod, fit)
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + hp + wt + am
## Model 2: mpg ~ am
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)
## 1     26 151.03
## 2     30 720.90 -4   -569.87 24.527 1.688e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value (1.688e-08) demonstrates that the larger model fits significantly better than the model with “am” alone. We reject the null hypothesis that the variables “cyl”, “hp” and “wt” do not contribute to the model.
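The F statistic in the ANOVA table can be recomputed by hand from the two residual sums of squares. The sketch below uses only the rounded values printed in the table, so the result agrees with the reported 24.527 only up to rounding; the variable names are illustrative.

```r
# Values copied from the ANOVA table above
rss_full <- 151.03   # mpg ~ cyl + hp + wt + am
rss_small <- 720.90  # mpg ~ am
df_full <- 26
df_small <- 30

# F = (drop in RSS per extra parameter) / (residual variance of the full model)
f_stat <- ((rss_small - rss_full) / (df_small - df_full)) / (rss_full / df_full)

# Upper-tail probability of the F distribution with (4, 26) degrees of freedom
p_value <- pf(f_stat, df_small - df_full, df_full, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))
```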
Please refer to Appendix 8 for the residual plots. From these plots we can see that:
The independence condition holds
The residuals are approximately normally distributed
The variance is roughly constant
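Residual plots like those in Appendix 8 can be produced roughly as follows. This sketch refits the selected model on a copy of `mtcars` with `cyl` and `am` as factors, which is an assumption about how `mydata` was prepared; `d`, `best` and `sw` are illustrative names, and the Shapiro-Wilk test is an addition that does not appear in the report.

```r
# Refit the selected model on a copy of mtcars with cyl and am as factors
# (assumption: this mirrors the report's mydata for these predictors)
d <- mtcars
d$cyl <- factor(d$cyl)
d$am <- factor(d$am)
best <- lm(mpg ~ cyl + hp + wt + am, data = d)

# Standard diagnostic plots: residuals vs fitted, normal Q-Q,
# scale-location, residuals vs leverage
op <- par(mfrow = c(2, 2))
plot(best)
par(op)

# Numeric companion to the Q-Q plot (not part of the original report)
sw <- shapiro.test(residuals(best))
print(sw$p.value)
```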
Appendix 4 shows that mpg is approximately normally distributed. The t-test shows that the difference in MPG between manual and automatic transmissions is statistically significant.
t <- t.test(mpg ~ am, data = mydata)
t$p.value
## [1] 0.001373638
hist(mydata$mpg, freq = FALSE, breaks = 15)
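Beyond the p-value, the same test object also carries the two group means and a confidence interval for their difference. A short sketch, assuming the `mpg` and `am` columns match those of `mtcars` (`tt` is an illustrative name, chosen to avoid masking the base function `t`):

```r
# Welch two-sample t-test (the t.test default) on the plain mtcars data
# Assumption: mydata$mpg and mydata$am match mtcars
tt <- t.test(mpg ~ am, data = mtcars)

print(tt$estimate)  # group means: automatic (am = 0), then manual (am = 1)
print(tt$conf.int)  # 95% CI for mean(automatic) - mean(manual)
print(tt$p.value)
```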
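Going back to the original questions, we conclude the following:
Is an automatic or manual transmission better for MPG? Manual transmission is better for MPG: taken on its own, a manual transmission is associated with 7.24 more mpg on average, and the t-test shows this difference is significant (p = 0.0014).
Quantify the MPG difference between automatic and manual transmissions: holding cyl, hp and wt constant, mpg is an estimated 1.809 higher in cars with manual transmission than in cars with automatic transmission. This adjusted difference is not statistically significant (p = 0.206), so most of the unadjusted 7.24 mpg gap is accounted for by cylinders, horsepower and weight.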
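To put an uncertainty range around the 1.809 estimate, a confidence interval for the `am1` coefficient can be computed as sketched here. The refit again assumes `cyl` and `am` are factors on a copy of `mtcars`; `d`, `best`, `am_ci` and `spans_zero` are illustrative names.

```r
# Refit the selected model (assumption: cyl and am as factors, as in the output)
d <- mtcars
d$cyl <- factor(d$cyl)
d$am <- factor(d$am)
best <- lm(mpg ~ cyl + hp + wt + am, data = d)

# 95% confidence interval for the manual-vs-automatic coefficient
am_ci <- confint(best, "am1", level = 0.95)
print(am_ci)

# TRUE when the interval contains zero, consistent with p = 0.206
spans_zero <- am_ci[1, 1] < 0 && am_ci[1, 2] > 0
print(spans_zero)
```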
##    Estimate Std. Error  t value     Pr(>|t|)
## am 24.39231   3.956183 6.165616 7.666189e-07
##       Mazda RX4 Wag   Chrysler Imperial       Toyota Corona
##           0.2496110           0.2611168           0.2777872
## Lincoln Continental       Maserati Bora
##           0.2936819           0.4713671