In our Regression Models course project, I’m interested in exploring the relationship between a set of variables and miles per gallon (MPG), the outcome. I will use linear regression to investigate the mtcars dataset in R and answer the following questions.
*Is an automatic or manual transmission better for MPG?
*What is the MPG difference between automatic and manual transmissions?
Let’s first see how the variables correlate with one another.
data(mtcars)
library(corrgram)
corrgram(mtcars, order=TRUE, lower.panel=panel.shade, upper.panel=panel.pie, text.panel=panel.txt, main="Correlogram of mtcars")
In the correlogram, blue indicates a positive correlation and red a negative one. The mpg variable is positively correlated with gear, am, drat, vs, and qsec, and negatively correlated with wt, disp, cyl, hp, and carb.
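The signs read off the plot can also be checked numerically. The following is a minimal sketch using base R only; the names `d` and `mpg_cor` are illustrative and not part of the original analysis.

```r
# Local copy of the built-in dataset, so the global mtcars is left untouched
d <- datasets::mtcars

# Pearson correlation of every other variable with mpg,
# sorted from most negative to most positive
mpg_cor <- sort(cor(d)[, "mpg"][names(d) != "mpg"])

print(round(mpg_cor, 2))
```

Sorting the vector puts the negatively correlated variables first and the positively correlated ones last, which makes the two groups easy to compare with the colors in the correlogram.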
Since the response variable mpg is correlated with the other variables, we use linear regression to explore these relationships. We build a model with all the variables as predictors, then select a subset of them by backward elimination using AIC.
library(MASS)
model1<-lm(mpg~.,data=mtcars)
stepAIC(model1,direction="backward")
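Rather than reading the selected formula off the printed trace, the result of the selection can be stored and inspected. This is a sketch under the assumption that the MASS package is installed; `d`, `full_fit`, `selected`, and `kept` are illustrative names.

```r
library(MASS)  # provides stepAIC()

d <- datasets::mtcars
full_fit <- lm(mpg ~ ., data = d)

# trace = FALSE suppresses the step-by-step printout
selected <- stepAIC(full_fit, direction = "backward", trace = FALSE)

# Formula and predictor names of the model that backward elimination ends on
print(formula(selected))
kept <- attr(terms(selected), "term.labels")

# Side-by-side AIC of the full and the selected model (lower is better)
print(AIC(full_fit, selected))
```

Keeping the fitted object also makes it possible to pass the selected model directly to `summary()` instead of retyping its formula.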
model2<-lm(mpg~wt + qsec + am, data = mtcars)
summary(model2)
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -3.481 -1.556 -0.726  1.411  4.661
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    9.618      6.960    1.38  0.17792
## wt            -3.917      0.711   -5.51    7e-06 ***
## qsec           1.226      0.289    4.25  0.00022 ***
## am             2.936      1.411    2.08  0.04672 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.46 on 28 degrees of freedom
## Multiple R-squared: 0.85, Adjusted R-squared: 0.834
## F-statistic: 52.7 on 3 and 28 DF, p-value: 1.21e-11
Backward elimination retains wt, qsec, and am as predictors. This model’s adjusted R-squared is 0.834, so we can conclude that about 83% of the variability in mpg is explained by the model.
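The adjusted R-squared and the uncertainty around each coefficient can be pulled out of the fitted model directly. The sketch below refits the same formula on a local copy of the data; `d`, `fit`, `adj_r2`, and `ci` are illustrative names.

```r
d <- datasets::mtcars
fit <- lm(mpg ~ wt + qsec + am, data = d)

# Adjusted R-squared as stored in the model summary
adj_r2 <- summary(fit)$adj.r.squared

# 95% confidence intervals for the intercept and the three slopes
ci <- confint(fit, level = 0.95)

print(round(adj_r2, 3))
print(round(ci, 2))
```

The interval for am is worth a look: the p-value for am in the summary is only just below 0.05, so the lower bound of its interval sits close to zero.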
Next, we build a model with am as the only predictor of mpg and compare it with the model obtained earlier to see whether there is a significant difference between the two.
model3<-lm(mpg~ am, data = mtcars)
anova(model2,model3)
## Analysis of Variance Table
##
## Model 1: mpg ~ wt + qsec + am
## Model 2: mpg ~ am
##   Res.Df RSS Df Sum of Sq    F  Pr(>F)
## 1     28 169
## 2     30 721 -2      -552 45.6 1.6e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Since the p-value (1.6e-09) is well below 0.05, we reject the null hypothesis that wt and qsec do not contribute to the fit of the model.
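The F statistic in the table comes from comparing the residual sums of squares of the two nested models. As a sketch of that arithmetic, the code below recomputes it by hand; all variable names are illustrative.

```r
d <- datasets::mtcars
full    <- lm(mpg ~ wt + qsec + am, data = d)
reduced <- lm(mpg ~ am, data = d)

# Residual sums of squares and residual degrees of freedom
rss_full    <- sum(resid(full)^2)
rss_reduced <- sum(resid(reduced)^2)
df_full     <- df.residual(full)
df_extra    <- df.residual(reduced) - df_full  # number of terms dropped

# F = (drop in RSS per dropped term) / (residual variance of the full model)
f_stat <- ((rss_reduced - rss_full) / df_extra) / (rss_full / df_full)
p_val  <- pf(f_stat, df_extra, df_full, lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
```

The result does not depend on the order in which the models are passed to `anova()`; listing the larger model first only flips the sign of the Df and Sum of Sq columns.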
We choose the model with the larger R-squared (model2) to explore the residuals and diagnostics.
par(mfrow=c(2,2))
plot(model2)
In the residuals vs leverage plot, we can see several potential outliers or influential points, such as the Fiat 128 and the Chrysler Imperial. These cars appear to have a large influence on the fit and may affect the accuracy of the model.
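The visual impression can be backed up with numeric influence measures. This sketch ranks the cars by Cook's distance using base R functions; `d`, `fit`, `infl`, and `top` are illustrative names, and the choice of showing five rows is arbitrary.

```r
d <- datasets::mtcars
fit <- lm(mpg ~ wt + qsec + am, data = d)

# One row per car: leverage, Cook's distance, and standardized residual
infl <- data.frame(
  leverage  = hatvalues(fit),
  cooks_d   = cooks.distance(fit),
  std_resid = rstandard(fit)
)

# The five cars with the largest Cook's distance
top <- head(infl[order(-infl$cooks_d), ], 5)
print(round(top, 3))
```

Cook's distance combines leverage and residual size, so a car can rank high either because it is unusual in its predictors or because the model fits it poorly.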
We use a two-sample t-test to find out whether mean MPG differs between automatic and manual transmissions.
t.test(mpg~am,data=mtcars)
##
## Welch Two Sample t-test
##
## data: mpg by am
## t = -3.767, df = 18.33, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.28 -3.21
## sample estimates:
## mean in group 0 mean in group 1
##           17.15           24.39
Since the p-value is 0.001374, we conclude that mean MPG differs significantly between manual and automatic transmissions.
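The pieces of the test result can be extracted from the object that `t.test()` returns. In this sketch, `d`, `tt`, and `mean_diff` are illustrative names; note that group 0 is automatic and group 1 is manual, so the reported interval is for automatic minus manual and is therefore negative.

```r
d <- datasets::mtcars

# Welch two-sample t-test (unequal variances is the default)
tt <- t.test(mpg ~ am, data = d)

# Group means, and the manual-minus-automatic difference
print(tt$estimate)
mean_diff <- unname(diff(tt$estimate))
print(mean_diff)

# 95% confidence interval for mean(group 0) - mean(group 1)
print(tt$conf.int)
```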
mtcars$am <- factor(mtcars$am,labels=c('Automatic','Manual'))
library(ggplot2)
ggplot(mtcars, aes(x=am, y=mpg, color=am)) + geom_boxplot()
The boxplots show that cars with manual transmissions tend to get more miles per gallon than cars with automatic transmissions.
summary(model3)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -9.392 -3.092 -0.297  3.244  9.508
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    17.15       1.12   15.25  1.1e-15 ***
## am              7.24       1.76    4.11  0.00029 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.9 on 30 degrees of freedom
## Multiple R-squared: 0.36, Adjusted R-squared: 0.338
## F-statistic: 16.9 on 1 and 30 DF, p-value: 0.000285
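The coefficients show that cars with automatic transmissions achieve 17.147 miles per gallon on average, and that cars with manual transmissions achieve 17.147 + 7.245 = 24.39 miles per gallon on average. This 7.245 MPG difference is unadjusted; in model2, which also accounts for wt and qsec, the estimated manual advantage is 2.936 MPG.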
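With a single 0/1 predictor, the regression coefficients are just the group means in another form. The sketch below checks this on a local copy of the data in which am is still numeric; `d`, `group_means`, `fit`, and `b` are illustrative names.

```r
d <- datasets::mtcars

# Mean mpg per transmission type: "0" = automatic, "1" = manual
group_means <- tapply(d$mpg, d$am, mean)

fit <- lm(mpg ~ am, data = d)
b <- coef(fit)

# Intercept = automatic mean; intercept + slope = manual mean
print(round(group_means, 3))
print(round(c(automatic = unname(b[1]), manual = unname(b[1] + b[2])), 3))
```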
pairs(mtcars)