**Regression Models Course Project**
In this regression models course project, I am interested in exploring the relationship between a set of variables and miles per gallon (MPG), the outcome. I will use linear regression to investigate the mtcars dataset in R and answer the following questions:
* Is an automatic or manual transmission better for MPG?
* Quantify the MPG difference between automatic and manual transmissions.
Exploratory data analysis
Blue indicates a positive correlation and red a negative correlation. The mpg variable is positively correlated with gear, am, drat, vs, and qsec, and negatively correlated with wt, disp, cyl, hp, and carb.
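The same signs can be checked numerically. The following is a minimal sketch, assuming that a plain Pearson correlation is what the plot displays (the plotting code is not shown in this report); `mpg_cor`, `positive`, and `negative` are illustrative names.

```r
# Pearson correlation of mpg with every other variable in mtcars
data(mtcars)
mpg_cor <- cor(mtcars)["mpg", ]
mpg_cor <- mpg_cor[names(mpg_cor) != "mpg"]

# Sort from most negative to most positive to read off each direction
sort(mpg_cor)

# Split the variable names by the sign of their correlation with mpg
positive <- names(mpg_cor[mpg_cor > 0])
negative <- names(mpg_cor[mpg_cor < 0])
```

The two name vectors should match the positive and negative groups listed above.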
Model building and selection
The selected model uses wt, qsec, and am as predictors. Its adjusted R-squared is 0.83, so about 83% of the variability in mpg is explained by the model.
Next, we fit a model with am as the only predictor of mpg and compare it with the model obtained earlier to see whether the two models differ significantly.
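The selection code is not shown in this report. As a rough sketch, one way to arrive at such a model is backward stepwise selection by AIC with `step()`; this is an assumption about the procedure, not necessarily the one used here. The name `model2` and its formula are taken from the ANOVA output in this report, while `full_model`, `step_model`, and `adj_r2` are illustrative.

```r
data(mtcars)

# Assumed selection procedure: start from all predictors and drop terms by AIC
full_model <- lm(mpg ~ ., data = mtcars)
step_model <- step(full_model, trace = 0)
formula(step_model)

# The three-predictor model discussed in the text
model2 <- lm(mpg ~ wt + qsec + am, data = mtcars)
adj_r2 <- summary(model2)$adj.r.squared
round(adj_r2, 2)
```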
model3 <- lm(mpg ~ am, data = mtcars)
anova(model2, model3)
## Analysis of Variance Table
##
## Model 1: mpg ~ wt + qsec + am
## Model 2: mpg ~ am
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)
## 1     28 169.29
## 2     30 720.90 -2   -551.61 45.618 1.55e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p-value is below 0.05, we reject the null hypothesis that wt and qsec do not contribute to the fit of the model.
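To make the test concrete, the F statistic in the table can be reproduced from the two residual sums of squares. This is a sketch of the standard F test for nested linear models; both models are refit so that the block is self-contained, and `rss_full`, `rss_reduced`, `f_stat`, and `p_value` are illustrative names.

```r
data(mtcars)
model2 <- lm(mpg ~ wt + qsec + am, data = mtcars)  # full model
model3 <- lm(mpg ~ am, data = mtcars)              # reduced model

rss_full    <- deviance(model2)                    # residual sum of squares, full model
rss_reduced <- deviance(model3)                    # residual sum of squares, reduced model
df_full     <- df.residual(model2)
df_diff     <- df.residual(model3) - df_full       # extra parameters in the full model

# F = (drop in RSS per extra parameter) / (residual variance of the full model)
f_stat  <- ((rss_reduced - rss_full) / df_diff) / (rss_full / df_full)
p_value <- pf(f_stat, df_diff, df_full, lower.tail = FALSE)
c(F = f_stat, p = p_value)
```

With the rounded values from the table, this is (551.61 / 2) / (169.29 / 28), which is approximately 45.6.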
Residuals and diagnostics
We examine the residuals and diagnostics of the model with the largest R-squared, model2.
par(mfrow=c(2,2))
plot(model2)

The residuals vs leverage plot flags several potential outliers or influential points, such as the Fiat 128 and the Chrysler Imperial. These cars appear to have a large influence on the model and may affect its accuracy.
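The quantities behind the residuals vs leverage plot can also be inspected directly. The following is a minimal sketch using base R influence measures; `influence_tbl` is an illustrative name, and the model is refit so that the block runs on its own.

```r
data(mtcars)
model2 <- lm(mpg ~ wt + qsec + am, data = mtcars)

# Leverage, Cook's distance, and standardized residual for every car
influence_tbl <- data.frame(
  leverage = hatvalues(model2),
  cooks_d  = cooks.distance(model2),
  std_res  = rstandard(model2)
)

# Cars with the largest Cook's distance have the most influence on the fit
head(influence_tbl[order(-influence_tbl$cooks_d), ], 5)
```

Sorting by Cook's distance gives a ranked list to compare against the cars labeled in the plot.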
Conclusion
We use t.test to check whether mean MPG differs between automatic and manual transmissions.
t.test(mpg~am,data=mtcars)
##
## Welch Two Sample t-test
##
## data: mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group 0 mean in group 1
##        17.14737        24.39231
Since the p-value is 0.001374, we conclude that mean MPG differs significantly between manual and automatic transmissions.
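The size of the difference can be read from the test object as well as from the printed output. This is a small sketch; `tt`, `group_means`, `mean_diff`, and `ci` are illustrative names, and am is assumed to still be coded as 0 (automatic) and 1 (manual).

```r
data(mtcars)  # fresh copy, so am is numeric 0/1
tt <- t.test(mpg ~ am, data = mtcars)

# Group means: am = 0 (automatic) first, am = 1 (manual) second
group_means <- unname(tt$estimate)
mean_diff   <- group_means[2] - group_means[1]   # manual minus automatic

# 95% confidence interval for (automatic - manual), as printed by t.test
ci <- as.numeric(tt$conf.int)
round(c(difference = mean_diff, ci_low = ci[1], ci_high = ci[2]), 3)
```

The interval is for automatic minus manual, which is why both bounds are negative while the manual mean is higher.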
# Label the transmission type for plotting (0 = automatic, 1 = manual)
mtcars$am <- factor(mtcars$am, labels = c('Automatic', 'Manual'))
library(ggplot2)
ggplot(mtcars, aes(am, mpg, color = am)) + geom_boxplot()

The boxplots show that, on average, cars with manual transmissions get more miles per gallon than cars with automatic transmissions.
summary(model3)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.3923 -3.0923 -0.2974  3.2439  9.5077
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
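The coefficients show that cars with automatic transmissions average 17.147 miles per gallon, and that cars with manual transmissions average 17.147 + 7.245 = 24.39 miles per gallon. In this model, which does not adjust for any other variable, manual transmissions are therefore associated with 7.245 more miles per gallon.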
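The arithmetic above can be reproduced from the fitted coefficients. The following is a minimal sketch; `b`, `automatic_mean`, and `manual_mean` are illustrative names, and the data are reloaded so that am is numeric 0/1 as it was when model3 was fit.

```r
data(mtcars)  # fresh copy, so am is numeric 0/1
model3 <- lm(mpg ~ am, data = mtcars)
b <- coef(model3)

automatic_mean <- unname(b["(Intercept)"])            # fitted mean when am = 0
manual_mean    <- unname(b["(Intercept)"] + b["am"])  # fitted mean when am = 1

# The same two values obtained through predict()
predict(model3, newdata = data.frame(am = c(0, 1)))

# 95% confidence interval for the am coefficient (the MPG difference)
confint(model3)["am", ]
```

Because am is the only predictor, the two fitted values equal the group means reported by the t-test.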
Appendix
pairs(mtcars)
