In this project we analyze the variables that influence the fuel economy (MPG) of cars. We explore the mtcars dataset using exploratory data analysis and use linear regression to propose models of which variables best predict mileage. In particular, we attempt to answer the following two questions:
- “Is an automatic or manual transmission better for MPG?”
- “Quantify the MPG difference between automatic and manual transmissions”
First, let's explore the mtcars dataset to get a feel for it and for some potential relationships. Please refer to Appendix 1 for the exploratory plots. Visually, manual transmission cars appear to have better mpg than automatic transmission cars, but the difference seems to shrink as the number of cylinders increases from 4 to 6 to 8. Weight and horsepower also appear to be correlated with mpg. The bottom line is that there are many variables, and each could affect mpg, and the other variables, to some degree.
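The visual impressions above can be cross-checked numerically. The following is a minimal sketch, not the code behind Appendix 1. It works on a local copy of the built-in dataset called `cars` (a name chosen here for illustration) and assumes the raw 0/1 coding of `am`, where 0 is automatic and 1 is manual.

```r
# Illustrative sketch: numeric counterparts to the exploratory plots.
# `cars` is a local copy, so any recoding of mtcars elsewhere is untouched.
cars <- datasets::mtcars

# Mean mpg for each combination of transmission (rows) and cylinders (columns).
mpg_by_am_cyl <- with(cars, tapply(mpg, list(am = am, cyl = cyl), mean))
print(round(mpg_by_am_cyl, 2))

# Manual-minus-automatic gap within each cylinder group.
gap_by_cyl <- mpg_by_am_cyl["1", ] - mpg_by_am_cyl["0", ]
print(round(gap_by_cyl, 2))

# Correlation of mpg with weight and horsepower.
mpg_cor <- cor(cars$mpg, cars[, c("wt", "hp")])
print(round(mpg_cor, 3))
```

If the gap shrinks from left to right in `gap_by_cyl`, that matches the pattern described for the plots.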
There are of course many potential relationships in the data, but let's start with the transmission type “am” as the only predictor in our base model for mpg.
base_model <- lm(mpg ~ am, data = mtcars)
summary(base_model)
Please refer to Appendix 2 for the output of the model. In the base model, a car with automatic transmission averages 17.147 mpg, while a manual transmission car averages an additional 7.245 mpg. The adjusted R-squared value is 0.3385, which indicates that the model explains only about 34% of the variance in mpg. This suggests that other variables should be added to the model.
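With a single two-level predictor, the regression coefficients are just the group means in disguise. The sketch below illustrates this on a local copy named `cars` (a hypothetical name). It assumes `am` was recoded as a factor with labels "A" and "M", which is inferred from the `amM` coefficient name in Appendix 2 and is not shown in the text.

```r
# Illustrative sketch: the base model reproduces the two group means.
cars <- datasets::mtcars
# Assumption: am recoded as a factor, "A" = automatic (0), "M" = manual (1).
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("A", "M"))

fit <- lm(mpg ~ am, data = cars)
group_means <- tapply(cars$mpg, cars$am, mean)

# The intercept is the automatic mean; intercept + slope is the manual mean.
intercept <- unname(coef(fit)["(Intercept)"])
slope <- unname(coef(fit)["amM"])
print(c(automatic = intercept, manual = intercept + slope))
print(group_means)
```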
The step function searches for a better model by repeatedly adding and dropping predictors, refitting the model each time and keeping the change that most lowers the AIC, until no further change improves it.
best_model <- step(lm(mpg ~ ., data = mtcars), direction = "both")
summary(best_model)
The step function selects wt, qsec and am as the predictors of mpg. Please see Appendix 2 for the output of this model. The model estimates that manual transmission cars get 2.9358 more mpg than automatic transmission cars when weight and qsec are held constant. The adjusted R-squared value is 0.8336, which indicates that the model explains 83.4% of the variance in mpg, a substantial improvement over the base model. All three predictor coefficients are significant at the 0.05 level, although am only narrowly (p = 0.0467); the intercept is not (p = 0.178).
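To make the selection criterion concrete, the sketch below compares the AIC values that `step` works with, using `extractAIC` on the base, full and selected models. The names `cars`, `base`, `full` and `best` are chosen here for illustration, and the recoding of `am` as an "A"/"M" factor is an assumption based on the coefficient names in Appendix 2.

```r
# Illustrative sketch: AIC of the base, full and stepwise-selected models.
cars <- datasets::mtcars
# Assumption: am recoded as a factor, "A" = automatic, "M" = manual.
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("A", "M"))

base <- lm(mpg ~ am, data = cars)
full <- lm(mpg ~ ., data = cars)
# trace = 0 suppresses the step-by-step log.
best <- step(full, direction = "both", trace = 0)

# extractAIC returns c(equivalent degrees of freedom, AIC); keep the AIC.
aic <- c(base = extractAIC(base)[2],
         full = extractAIC(full)[2],
         best = extractAIC(best)[2])
print(round(aic, 2))
print(formula(best))
```

A lower AIC is better, so the selected model should show the smallest value of the three.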
anova(base_model, best_model)
The anova F-test is highly significant (p = 1.55e-09; see Appendix 2), so we reject the null hypothesis that wt and qsec do not improve the fit of the model. In other words, our second model, which has wt, qsec and am as predictors, is a better model than our base model, which has only am.
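The F statistic in the anova table can be reproduced from the residual sums of squares of the two nested models. The sketch below is an illustration under the same assumption as before (`am` recoded as an "A"/"M" factor); `cars`, `base` and `best` are names chosen here, and the second model is written out explicitly instead of being rerun through `step`.

```r
# Illustrative sketch: the nested-model F-test computed by hand.
cars <- datasets::mtcars
# Assumption: am recoded as a factor, "A" = automatic, "M" = manual.
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("A", "M"))

base <- lm(mpg ~ am, data = cars)
best <- lm(mpg ~ wt + qsec + am, data = cars)

rss_base <- sum(resid(base)^2)
rss_best <- sum(resid(best)^2)
df_base <- df.residual(base)
df_best <- df.residual(best)

# F = (drop in RSS per added parameter) / (residual variance of larger model)
f_stat <- ((rss_base - rss_best) / (df_base - df_best)) / (rss_best / df_best)
p_value <- pf(f_stat, df_base - df_best, df_best, lower.tail = FALSE)
print(c(F = f_stat, p = p_value))

# Same numbers as reported by anova().
tab <- anova(base, best)
print(tab)
```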
Please refer to Appendix 3 for the residual diagnostic plots of the second model. From the plots we can see that:
1. The Residuals vs Fitted plot shows points scattered randomly around zero without any discernible pattern, which supports the linearity assumption and gives no sign of dependence among the residuals.
2. The points in the Normal Q-Q plot lie close to a straight line, indicating that the residuals are approximately normally distributed.
3. The Scale-Location plot shows no consistent pattern, which supports the constant variance assumption.
4. The Residuals vs Leverage plot indicates that no observation is unduly influential, as all points fall within the 0.5 Cook's distance bands.
We conclude that the analysis is consistent with the basic assumptions of linear regression, which supports the validity of our answers.
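The visual checks above can be supplemented with a few numbers. This is a hedged sketch and not part of the original analysis: it refits the second model on a local copy named `cars` (hypothetical), assumes `am` was recoded as an "A"/"M" factor, and uses a Shapiro-Wilk test, which is a choice made here and not a method the report applies.

```r
# Illustrative sketch: numeric companions to the diagnostic plots.
cars <- datasets::mtcars
# Assumption: am recoded as a factor, "A" = automatic, "M" = manual.
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("A", "M"))
fit <- lm(mpg ~ wt + qsec + am, data = cars)

# Shapiro-Wilk test on the residuals: a large p-value gives no evidence
# against normality (it does not prove normality).
sw <- shapiro.test(resid(fit))
print(sw$p.value)

# Cook's distance: the three most influential cars.
cooks <- cooks.distance(fit)
print(round(head(sort(cooks, decreasing = TRUE), 3), 3))

# Leverage (hat values): the three cars with the most unusual predictors.
lev <- hatvalues(fit)
print(round(head(sort(lev, decreasing = TRUE), 3), 3))
```

Cook's distances well below 0.5 would agree with the reading of the Residuals vs Leverage plot.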
Assuming mpg is approximately normally distributed within each transmission group, we perform a Welch two-sample t-test. The difference in mean mpg between automatic and manual transmissions is significant (p = 0.001374).
t.test(mpg ~ am, data = mtcars)
##
## Welch Two Sample t-test
##
## data: mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group 0 mean in group 1
## 17.14737 24.39231
Based on the results above, we can answer the two primary questions of this study. Manual transmission is associated with better mpg, and the size of the difference depends on the model:
- For the base model, where mpg is predicted by transmission type only, manual transmission cars get 7.245 mpg more than automatic transmission cars.
- For the better model, where mpg is predicted by transmission type, car weight and qsec time, manual transmission cars get 2.936 mpg more than automatic transmission cars with weight and qsec held constant. The other two predictors, wt and qsec, also contribute to explaining mpg.
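Because the second question asks us to quantify the difference, a confidence interval for the transmission coefficient is a natural complement to the point estimates. The sketch below is an illustration only: `cars`, `base` and `best` are names chosen here, and the "A"/"M" recoding of `am` is assumed from the `amM` label in Appendix 2.

```r
# Illustrative sketch: 95% confidence intervals for the manual-vs-automatic
# difference under both models.
cars <- datasets::mtcars
# Assumption: am recoded as a factor, "A" = automatic, "M" = manual.
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("A", "M"))

base <- lm(mpg ~ am, data = cars)
best <- lm(mpg ~ wt + qsec + am, data = cars)

ci_base <- confint(base, "amM", level = 0.95)
ci_best <- confint(best, "amM", level = 0.95)
print(ci_base)
print(ci_best)
```

An interval that excludes zero agrees with a coefficient that is significant at the 0.05 level, and its width shows how precisely the difference is estimated.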
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## amM 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
##
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4811 -1.5555 -0.7257 1.4110 4.6610
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9.6178 6.9596 1.382 0.177915
## wt -3.9165 0.7112 -5.507 6.95e-06 ***
## qsec 1.2259 0.2887 4.247 0.000216 ***
## amM 2.9358 1.4109 2.081 0.046716 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ wt + qsec + am
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 720.90
## 2 28 169.29 2 551.61 45.618 1.55e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
par(mfrow = c(2,2))
plot(best_model)