This analysis investigates a data set from Motor Trend, a magazine about the automobile industry. Using a data set that describes a collection of cars, Motor Trend (MT) is interested in exploring the relationship between a set of variables and miles per gallon (MPG), the outcome. MT is particularly interested in the following two questions:
Is an automatic or manual transmission better for MPG?
Quantify the MPG difference between automatic and manual transmissions.
The approach used throughout this analysis is based on linear models. More specifically, multiple linear regression will be used to determine the extent to which the independent variables contribute to MPG. I will also test whether there is a statistically significant difference in MPG between automatic and manual transmissions.
First, load the dataset and convert the categorical variables (cyl, vs, gear, carb, and am) to factors.
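data(mtcars)
# Treat the categorical variables as factors rather than numbers
mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
# am is coded 0 = automatic, 1 = manual
mtcars$am <- factor(mtcars$am, labels = c("Auto", "Manual"))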
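The labels passed to `factor()` are assigned in the sorted order of the original codes, so 0 becomes "Auto" and 1 becomes "Manual". The check below is a minimal sketch that cross-tabulates the raw codes against the new labels to confirm this mapping. It assumes a fresh R session with the built-in datasets package. The name `mt` is a hypothetical working copy, used so the block does not overwrite `mtcars`.

```r
# Sketch: confirm the factor labels line up with the original 0/1 coding of am
# (0 = automatic, 1 = manual). `mt` is a hypothetical working copy of mtcars.
mt <- datasets::mtcars
raw_am <- mt$am                                   # numeric 0/1 before conversion
mt$am <- factor(mt$am, labels = c("Auto", "Manual"))

# Every 0 should map to "Auto" and every 1 to "Manual",
# so only the diagonal cells of this table should be non-zero.
check <- table(raw = raw_am, label = mt$am)
print(check)

# Number of cars in each transmission group
print(table(mt$am))
```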
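Next, I will build an initial model that uses all of the other variables as predictors of MPG. I will then let step() search in both directions, dropping and re-adding terms, for the model with the lowest AIC.
model <- lm(mpg ~ ., data=mtcars)  # full model: all other variables as predictors
bestfit_model <- step(model, direction="both")  # stepwise selection by AIC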
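step() only accepts a change to the model if it lowers AIC, and it stops when no single addition or removal does. The selected model should therefore have a lower AIC than the full model it started from. The sketch below checks this. It assumes the same factor conversions as above, applied to a hypothetical copy `mt`, and passes `trace = 0` to silence step()'s per-step log. Both `AIC()` and the `extractAIC()` criterion used inside step() are computed on the same data, and for `lm` fits they differ only by a constant, so they rank the models the same way.

```r
# Sketch: compare AIC of the full model and the stepwise-selected model.
# `mt` is a hypothetical working copy of mtcars with the same factor coding.
mt <- datasets::mtcars
for (v in c("cyl", "vs", "gear", "carb")) mt[[v]] <- factor(mt[[v]])
mt$am <- factor(mt$am, labels = c("Auto", "Manual"))

full <- lm(mpg ~ ., data = mt)
# trace = 0 suppresses the step-by-step log that step() prints by default
selected <- step(full, direction = "both", trace = 0)

# Lower AIC is better; step() should have moved to a lower value
aic <- c(full = AIC(full), selected = AIC(selected))
print(aic)
print(formula(selected))
```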
Here is a summary of the best model.
summary(bestfit_model)
##
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9387 -1.2560 -0.4013 1.1253 5.0513
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.70832 2.60489 12.940 7.73e-13 ***
## cyl6 -3.03134 1.40728 -2.154 0.04068 *
## cyl8 -2.16368 2.28425 -0.947 0.35225
## hp -0.03211 0.01369 -2.345 0.02693 *
## wt -2.49683 0.88559 -2.819 0.00908 **
## amManual 1.80921 1.39630 1.296 0.20646
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared: 0.8659, Adjusted R-squared: 0.8401
## F-statistic: 33.57 on 5 and 26 DF, p-value: 1.506e-10
The multiple R-squared of 0.8659 means that this model, with predictors cyl, hp, wt, and am, explains about 87% of the variance in MPG. The adjusted R-squared, which penalizes the number of predictors, is 0.8401, or about 84%.
Next, we can compare the best-fit model to a simpler model that uses only am (automatic vs. manual) to predict MPG.
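The adjusted R-squared is computed as 1 - (1 - R²)(n - 1)/(n - p - 1). Here n is the number of cars (32) and p is the number of non-intercept coefficients (5: cyl6, cyl8, hp, wt, amManual). The sketch below recomputes it by hand and compares it with the value reported by `summary()`. It refits the selected terms directly on a hypothetical copy `mt`. Only cyl and am need to be factors for this model.

```r
# Sketch: reproduce the adjusted R-squared of the selected model by hand.
mt <- datasets::mtcars
mt$cyl <- factor(mt$cyl)
mt$am <- factor(mt$am, labels = c("Auto", "Manual"))

fit <- lm(mpg ~ cyl + hp + wt + am, data = mt)  # same terms as bestfit_model
s <- summary(fit)

n <- nobs(fit)               # number of cars
p <- length(coef(fit)) - 1   # non-intercept coefficients
r2 <- s$r.squared
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(c(r2 = r2, adj_by_hand = adj_r2, adj_from_summary = s$adj.r.squared))
```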
model2 <- lm(mpg ~ am, data=mtcars)
summary(model2)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## amManual 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
In this model, I fit a simple linear regression with am as the predictor and MPG as the outcome. This isolates the relationship between transmission type and MPG.
The results indicate that transmission type has a statistically significant relationship with MPG (p = 0.000285). Automatic cars average 17.147 MPG, and manual cars average 7.245 MPG more. However, transmission type alone explains only about 36% of the variability in MPG (multiple R-squared 0.3598; adjusted R-squared 0.3385, or about 34%). This tells us that other factors also contribute to MPG.
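With a single two-level factor as the predictor, the intercept is the mean MPG of the reference group (Auto), and the amManual coefficient is the difference between the two group means. The regression t-test on amManual is also equivalent to a pooled-variance two-sample t-test. The sketch below checks both facts on a hypothetical copy `mt` of the data.

```r
# Sketch: the one-predictor model reproduces group means and a pooled t-test.
mt <- datasets::mtcars
mt$am <- factor(mt$am, labels = c("Auto", "Manual"))

fit_am <- lm(mpg ~ am, data = mt)
group_means <- tapply(mt$mpg, mt$am, mean)
print(group_means)   # mean MPG for Auto and Manual cars
print(coef(fit_am))  # intercept = Auto mean; amManual = Manual mean - Auto mean

# Pooled-variance two-sample t-test: same |t| and p-value as the amManual row
tt <- t.test(mpg ~ am, data = mt, var.equal = TRUE)
print(tt)
```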
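Both models are statistically significant overall, but the best-fit model shows that factors other than transmission type are important in determining MPG. Because model2 is nested within bestfit_model, an F-test via anova() tells us whether the additional predictors (cyl, hp, and wt) significantly improve the fit.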
a <- anova(bestfit_model, model2)
a
## Analysis of Variance Table
##
## Model 1: mpg ~ cyl + hp + wt + am
## Model 2: mpg ~ am
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 26 151.03
## 2 30 720.90 -4 -569.87 24.527 1.688e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
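The F-test is highly significant (F = 24.53 on 4 and 26 degrees of freedom, p < .001). Adding cyl, hp, and wt reduces the residual sum of squares from 720.90 to 151.03, so the best-fit model fits the data significantly better than the transmission-only model.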
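The nested-model F statistic compares the drop in residual sum of squares per extra parameter with the residual variance of the larger model: F = ((RSS_reduced - RSS_full) / (df_reduced - df_full)) / (RSS_full / df_full). The sketch below computes it by hand and checks it against `anova()`. It refits both models on a hypothetical copy `mt`. The names `full_fit` and `reduced_fit` are illustrative only.

```r
# Sketch: the nested-model F-test from anova(), computed by hand.
mt <- datasets::mtcars
mt$cyl <- factor(mt$cyl)
mt$am <- factor(mt$am, labels = c("Auto", "Manual"))

full_fit    <- lm(mpg ~ cyl + hp + wt + am, data = mt)  # larger model
reduced_fit <- lm(mpg ~ am, data = mt)                   # nested, smaller model

rss_f <- deviance(full_fit);    df_f <- df.residual(full_fit)     # residual SS and df
rss_r <- deviance(reduced_fit); df_r <- df.residual(reduced_fit)

F_stat <- ((rss_r - rss_f) / (df_r - df_f)) / (rss_f / df_f)
p_val  <- pf(F_stat, df_r - df_f, df_f, lower.tail = FALSE)
print(c(F = F_stat, p = p_val))

# Same comparison using anova(), for reference
tab <- anova(full_fit, reduced_fit)
print(tab)
```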
par(mfrow = c(2,2))  # arrange the four diagnostic plots in a 2 x 2 grid
plot(bestfit_model)
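The diagnostic plots suggest that the model assumptions are reasonable. In the Normal Q-Q plot, the residuals are approximately normally distributed. In the Residuals vs Fitted plot, they are scattered fairly randomly around zero, with no obvious pattern that would indicate nonlinearity or non-constant variance. Residual plots do not address multicollinearity, however; that requires a separate check such as variance inflation factors.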
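Multicollinearity can be checked with variance inflation factors. The VIF of a design-matrix column j is 1 / (1 - R²_j), where R²_j comes from regressing that column on all the other predictor columns. The sketch below computes this by hand for each column of the selected model and adds a Shapiro-Wilk test as a numeric complement to the Q-Q plot. It uses a hypothetical copy `mt`. For the two cyl dummies, per-column VIFs are only a rough guide. The generalized VIF reported by `vif()` in the car package is the usual choice for multi-column factor terms.

```r
# Sketch: per-column VIFs by hand, plus a normality test of the residuals.
mt <- datasets::mtcars
mt$cyl <- factor(mt$cyl)
mt$am <- factor(mt$am, labels = c("Auto", "Manual"))
fit <- lm(mpg ~ cyl + hp + wt + am, data = mt)

# Design matrix without the intercept column: cyl6, cyl8, hp, wt, amManual
X <- model.matrix(fit)[, -1]

# VIF_j = 1 / (1 - R^2_j), regressing column j on the other columns
vif_manual <- sapply(colnames(X), function(j) {
  r2_j <- summary(lm(X[, j] ~ X[, colnames(X) != j]))$r.squared
  1 / (1 - r2_j)
})
print(round(vif_manual, 2))

# Shapiro-Wilk test of residual normality, complementing the Q-Q plot
print(shapiro.test(residuals(fit)))
```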
From the above analysis, the answers to the initial questions are as follows.
Manual transmissions tend to get better MPG than automatic transmissions. Without adjusting for other variables, manual cars average about 7.2 MPG more.
After taking into consideration other factors such as the number of cylinders, horsepower, and weight of the car, manual transmissions get approximately 1.8 MPG more than automatic transmissions. This adjusted difference, however, is not statistically significant (p = 0.206).
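Confidence intervals make the contrast between the adjusted and unadjusted estimates concrete. A 95% interval that contains zero corresponds to a two-sided p-value above 0.05. The sketch below compares the amManual interval from the selected model with the one from the transmission-only model, using a hypothetical copy `mt` of the data.

```r
# Sketch: 95% confidence intervals for the manual-vs-automatic MPG difference.
mt <- datasets::mtcars
mt$cyl <- factor(mt$cyl)
mt$am <- factor(mt$am, labels = c("Auto", "Manual"))

adjusted   <- lm(mpg ~ cyl + hp + wt + am, data = mt)  # controls for cyl, hp, wt
unadjusted <- lm(mpg ~ am, data = mt)                   # transmission only

ci <- rbind(
  adjusted   = confint(adjusted)["amManual", ],
  unadjusted = confint(unadjusted)["amManual", ]
)
print(round(ci, 2))
```

The adjusted interval spans zero, consistent with p = 0.206. The unadjusted interval lies entirely above zero.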
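par(mfrow = c(1, 1))  # reset the 2 x 2 layout used for the diagnostic plots
boxplot(mpg ~ am, data=mtcars, xlab = "Transmission", ylab = "MPG")
mdata <- mtcars[, c("mpg", "wt", "qsec", "am")]  # MPG, WT, QSEC, and AM
par(mar=c(1,1,1,1))
pairs(mdata, panel = panel.smooth, col = 9 + mtcars$wt)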