Effects of different transmissions on MPG

author: liuyubobobo
date: Sunday, April 26, 2015

Synopsis

Load Data and Exploratory Analysis

First of all, we need to load the mtcars data.

library(datasets)
data(mtcars)

Then, we do some exploratory analysis using head , str functions and plot the variables’ relationship using pairs function. The result plot is shown on Appendix, Figure 1.

head(mtcars)
str(mtcars)
pairs(mtcars, panel=panel.smooth, main="pairs plot for mtcars data set", col=4+(mtcars$am==0))

Regression models

Create a regression model using lm function for only mpg and am,which we are interested in. We call this regression model fit0

fit0 <- lm(mpg ~ factor(am), data=mtcars)

Then, create mutiple models which are added a second regressor to see wheter other variables effect the result. Let’s call these models from fit1 to fit9

fit1 <- lm(mpg ~ factor(am)+factor(cyl), data=mtcars); 
fit2 <- lm(mpg ~ factor(am)+disp, data=mtcars)
fit3 <- lm(mpg ~ factor(am)+hp, data=mtcars); 
fit4 <- lm(mpg ~ factor(am)+drat, data=mtcars)
fit5 <- lm(mpg ~ factor(am)+wt, data=mtcars); 
fit6 <- lm(mpg ~ factor(am)+qsec, data=mtcars)
fit7 <- lm(mpg ~ factor(am)+factor(vs), data=mtcars)
fit8 <- lm(mpg ~ factor(am)+factor(gear), data=mtcars)
fit9 <- lm(mpg ~ factor(am)+factor(carb), data=mtcars)

Then we can use summary to see the different coefficients of am in each model, as follows:

##     Model 0     Model 1     Model 2     Model 3     Model 4     Model 5 
##  7.24493927  2.55995370  1.83345825  5.27708531  2.80706095 -0.02361522 
##     Model 6     Model 7     Model 8     Model 9 
##  8.87633094  6.06666667  5.22500000  7.08988506

From the above results, we can see that the coefficients of am in fit5 model are totally different from fit0, the sign of the coefficent is changed. Besides, disp, cyl and drat also make the cofficient of am changed a lot, which means these variables may be included in the best model. We can use nest models and nested likelihood ratio tests to determine which model should be selected.

fit10 <- update(fit0, mpg ~ factor(am)+wt)
fit11 <- update(fit0, mpg ~ factor(am)+wt+disp)
fit12 <- update(fit0, mpg ~ factor(am)+wt+disp+factor(cyl))
fit13 <- update(fit0, mpg ~ factor(am)+wt+disp+factor(cyl)+drat)
anova(fit10,fit11,fit12,fit13)
## Analysis of Variance Table
## 
## Model 1: mpg ~ factor(am) + wt
## Model 2: mpg ~ factor(am) + wt + disp
## Model 3: mpg ~ factor(am) + wt + disp + factor(cyl)
## Model 4: mpg ~ factor(am) + wt + disp + factor(cyl) + drat
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     29 278.32                              
## 2     28 246.56  1    31.763 4.3470 0.04744 *
## 3     26 182.87  2    63.687 4.3579 0.02379 *
## 4     25 182.68  1     0.195 0.0266 0.87166  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The above result shows that ther’s a significant difference between fit11 and fit12. fit12 is selected for this research.

we examine our model by doing residual plots, which is shown on Appendix, Figure 2, and regression diagnostics. The results tell us our model is good enough for this research.

par(mfrow = c(2,2))
plot(fit12)
hatvalues(fit12)
dfbetas(fit12)

Conclusion

We use summary on our final selected model fit12 for analysis.

summary(fit12)

The detail is shown on Appendix, Figure3, From the result, We can clearly see the following coclusions:
1. the mpg for Manual transmission cars is larger than the Automatic transmission cars;
2. the mpg difference between automatic transmission cars and manual transmission cars is not significant;
3. 80.64% of total variation can be described by our model.

Appendix

Figure 1 shows the variables’ relashionship using pais function.

Figure 2 shows the residual plots of our final selected model fit12

Figure 3 shows the summary of our final selected model fit12

## 
## Call:
## lm(formula = mpg ~ factor(am) + wt + disp + factor(cyl), data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5029 -1.2829 -0.4825  1.4954  5.7889 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  33.816067   2.914272  11.604 8.79e-12 ***
## factor(am)1   0.141212   1.326751   0.106  0.91605    
## wt           -3.249176   1.249098  -2.601  0.01513 *  
## disp          0.001632   0.013757   0.119  0.90647    
## factor(cyl)6 -4.304782   1.492355  -2.885  0.00777 ** 
## factor(cyl)8 -6.318406   2.647658  -2.386  0.02458 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.652 on 26 degrees of freedom
## Multiple R-squared:  0.8376, Adjusted R-squared:  0.8064 
## F-statistic: 26.82 on 5 and 26 DF,  p-value: 1.73e-09