In this report we aim to determine whether manual or automatic transmission is better for fuel efficiency, measured in miles per gallon (MPG), and to quantify the MPG difference between the two. A linear regression model with MPG as the outcome is fitted with transmission, weight and horsepower (together with the interactions of weight and horsepower with transmission) as predictor variables. We hypothesized that car weight is a significant confounder in our analysis, since the choice of manual or automatic transmission is associated with it. We conclude that cars with a manual transmission are better for MPG than cars with an automatic transmission.
This assignment uses the Motor Trend Car Road Tests (mtcars) dataset. The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models).
The data contain 19 automatic cars with a mean MPG of 17.15, compared with a mean of 24.39 MPG for the 13 manual cars. Table 1.0 summarizes MPG for each transmission, including the standard deviation.
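The summary in Table 1.0 can be reproduced from the copy of mtcars that ships with R. This is a minimal sketch: the name `dataCars` and its construction (recoding `am` as a factor with levels "Automatic" and "Manual") are assumptions chosen to match the coefficient labels in the model output, since the report does not show how `dataCars` was built.

```r
# Assumption: dataCars is mtcars with am recoded as a labelled factor
# (am = 0 is automatic, am = 1 is manual, as documented in ?mtcars).
dataCars <- mtcars
dataCars$am <- factor(dataCars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))

# Split MPG by transmission and compute count, mean and standard deviation.
groups <- split(dataCars$mpg, dataCars$am)
mpg_summary <- data.frame(
  am    = names(groups),
  count = sapply(groups, length),
  mean  = sapply(groups, mean),
  sd    = sapply(groups, sd),
  row.names = NULL
)
print(mpg_summary)
```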
Plot 1 in the appendix shows that vehicles with more cylinders have a lower mean MPG for both transmissions. The plots also suggest that weight is a confounder: manual cars tend to be lighter and have a higher MPG, whereas automatic cars tend to be heavier and have a lower MPG.
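The patterns described by the appendix plots can also be checked numerically. The sketch below (again assuming `dataCars` is mtcars with `am` recoded as a factor) tabulates mean MPG for each cylinder count within each transmission, and mean weight by transmission.

```r
# Assumption: same construction of dataCars as elsewhere in this report.
dataCars <- mtcars
dataCars$am <- factor(dataCars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))

# Mean MPG for every combination of cylinder count and transmission.
mpg_by_cyl <- aggregate(mpg ~ cyl + am, data = dataCars, FUN = mean)
print(mpg_by_cyl)

# Mean weight (in 1000 lbs) by transmission.
wt_by_am <- tapply(dataCars$wt, dataCars$am, mean)
print(round(wt_by_am, 3))
```

If weight acts as a confounder, the automatic group should be noticeably heavier on average than the manual group.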
Analyzing only the transmission factor, we would conclude that a manual car is better than an automatic one, as the summary of the transmission-only model (`fit`) in the appendix shows. However, other factors may also influence fuel efficiency and act as confounders in the results obtained. To preview which variables have the most influence on the outcome of our regression model, we then built a model with all the variables included and selected those with a significant contribution, i.e. a small p-value or a large reduction in the residual standard deviation.
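A minimal sketch of the transmission-only model, assuming `dataCars` is built as below: with `am` as a two-level factor, the intercept is the mean MPG of automatic cars and the `amManual` coefficient is the difference in mean MPG between manual and automatic cars (about 7.24 MPG, matching the gap in Table 1.0).

```r
# Assumption: dataCars is mtcars with am recoded as a labelled factor.
dataCars <- mtcars
dataCars$am <- factor(dataCars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))

# Transmission-only model: equivalent to comparing the two group means.
fit <- lm(mpg ~ am, data = dataCars)
print(summary(fit)$coefficients)
```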
Using all the variables as predictors, we obtained a high R² of 0.89, meaning that this model explains 89.31% of the variance in MPG [1]. We could have selected this model, but many of its predictors are strongly linearly dependent on each other (multicollinear), which increases the standard errors, and hence the variances, of the coefficient estimates; this effect is called variance inflation.
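Variance inflation can be quantified with the variance inflation factor, VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all the other predictors. The sketch below computes it by hand with base R only; `vif_manual` is a hypothetical helper, not part of the report. As an assumption, every mtcars column is treated as numeric here, whereas the report's full model (15 rather than 21 residual degrees of freedom) appears to code some variables as factors, so the exact values will differ.

```r
# Hypothetical helper: VIF for each predictor via auxiliary regressions.
vif_manual <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    # Regress predictor p on all remaining predictors.
    r2 <- summary(lm(reformulate(others, response = p), data = data))$r.squared
    1 / (1 - r2)
  })
}

# Assumption: all ten non-MPG columns are used as numeric predictors.
predictors <- setdiff(names(mtcars), "mpg")
vifs <- vif_manual(mtcars, predictors)
print(round(sort(vifs, decreasing = TRUE), 2))
```

A common rule of thumb treats VIF values above about 5 to 10 as a sign of problematic collinearity.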
We used stepwise selection to choose the model and used anova to check whether the best-fitting model differs significantly from the others. The variables we selected from the model with all the variables are transmission (am), weight (wt) and horsepower (hp).
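One way to run a stepwise search in base R is `step()`, which adds or drops terms to minimize AIC. This is a sketch, not necessarily the procedure the report used: it starts from the all-numeric full model on mtcars, and the search may retain a different subset of variables than the transmission, weight and horsepower kept in this report.

```r
# Full model with every mtcars column as a (numeric) predictor of MPG.
full <- lm(mpg ~ ., data = mtcars)

# Stepwise search in both directions by AIC; trace = 0 suppresses the log.
best <- step(full, direction = "both", trace = 0)
print(formula(best))
print(summary(best)$adj.r.squared)
```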
To validate the model, we plotted the standardized residuals and their normal Q-Q plot (plot 3 in the appendix). These suggest that the residuals are approximately normally distributed.
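A sketch of these residual checks for the interaction model, assuming `dataCars` is built as below. The Shapiro-Wilk test is an addition not mentioned in the report; it gives a formal (if low-powered, with 32 cars) complement to eyeballing the Q-Q plot.

```r
# Assumption: dataCars is mtcars with am recoded as a labelled factor.
dataCars <- mtcars
dataCars$am <- factor(dataCars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))
model2 <- lm(mpg ~ am + wt + hp + wt:am + hp:am, data = dataCars)

# Standardized residuals: raw residuals scaled by their estimated SD.
std_res <- rstandard(model2)

op <- par(mfrow = c(1, 2))
plot(fitted(model2), std_res,
     xlab = "Fitted MPG", ylab = "Standardized residual")
abline(h = 0, lty = 2)
qqnorm(std_res)
qqline(std_res)
par(op)

# Formal normality test; a large p-value gives no evidence against normality.
sw <- shapiro.test(std_res)
print(sw)
```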
# Fit the key variables and their interactions with transmission
model2 <- lm(mpg ~ am + wt + hp + wt:am + hp:am, data = dataCars)
sModel2 <-summary(model2)
sModel2
##
## Call:
## lm(formula = mpg ~ am + wt + hp + wt:am + hp:am, data = dataCars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.9873 -1.4467 -0.5355 1.2614 5.5987
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.70393 2.67515 11.477 1.12e-11 ***
## amManual 13.74000 4.22337 3.253 0.00316 **
## wt -1.85591 0.94511 -1.964 0.06034 .
## hp -0.04094 0.01363 -3.004 0.00583 **
## amManual:wt -5.76895 2.07201 -2.784 0.00987 **
## amManual:hp 0.02779 0.01921 1.447 0.15983
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.286 on 26 degrees of freedom
## Multiple R-squared: 0.8793, Adjusted R-squared: 0.8561
## F-statistic: 37.89 on 5 and 26 DF, p-value: 3.901e-11
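From the model above, the transmission term, horsepower and the weight-by-transmission interaction are significant at the 5% level; weight on its own is borderline (p = 0.060) and the horsepower-by-transmission interaction is not significant (p = 0.160). The model has an R-squared of 0.88 (adjusted 0.86) and an overall F-test p-value of 3.901e-11.
The intercept (30.70) is the predicted MPG of an automatic car with zero weight and zero horsepower, so it serves only as a reference point; the manual term shifts that baseline up by 13.74 MPG. Both weight and horsepower have a negative effect on MPG. Holding the other terms constant, for automatic cars each additional 1000 lbs reduces MPG by about 1.86 and each additional horsepower by about 0.04. For manual cars the weight effect is much steeper (-1.86 - 5.77 ≈ -7.62 MPG per 1000 lbs) and the horsepower effect is weaker (-0.041 + 0.028 ≈ -0.013 MPG per horsepower).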
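In an interaction model, the slope for the manual group is the base slope plus the interaction coefficient. The sketch below derives the group-specific slopes directly from `coef(model2)`, assuming `dataCars` is built as shown; the coefficient names `amManual:wt` and `amManual:hp` are those printed in the model summary.

```r
# Assumption: dataCars is mtcars with am recoded as a labelled factor.
dataCars <- mtcars
dataCars$am <- factor(dataCars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))
model2 <- lm(mpg ~ am + wt + hp + wt:am + hp:am, data = dataCars)

b <- coef(model2)
# Automatic is the reference level, so its slopes are the base coefficients;
# manual slopes add the corresponding interaction terms.
slopes <- rbind(
  Automatic = c(wt = b[["wt"]], hp = b[["hp"]]),
  Manual    = c(wt = b[["wt"]] + b[["amManual:wt"]],
                hp = b[["hp"]] + b[["amManual:hp"]])
)
print(round(slopes, 3))
```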
# Common intercept, separate weight and horsepower slopes per transmission
model3 <- lm(mpg ~ wt:factor(am) + hp:factor(am), data = dataCars)
sModel3 <-summary(model3)
sModel3
##
## Call:
## lm(formula = mpg ~ wt:factor(am) + hp:factor(am), data = dataCars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6786 -2.0664 -0.3051 0.9406 5.8775
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 36.21665 2.40962 15.030 1.23e-14 ***
## wt:factor(am)Automatic -3.36820 0.95787 -3.516 0.00157 **
## wt:factor(am)Manual -3.29638 1.48606 -2.218 0.03514 *
## factor(am)Automatic:hp -0.03846 0.01584 -2.428 0.02214 *
## factor(am)Manual:hp -0.03300 0.01406 -2.347 0.02650 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.661 on 27 degrees of freedom
## Multiple R-squared: 0.8302, Adjusted R-squared: 0.805
## F-statistic: 33 on 4 and 27 DF, p-value: 4.913e-10
Looking at weight, if we hold horsepower constant, each additional unit of weight (1000 lbs) decreases MPG by 3.30 for manual cars and by 3.37 for automatic cars. Regarding horsepower, if we hold weight constant, each additional horsepower decreases MPG by 0.033 for manual cars and by 0.038 for automatic cars.
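To make these slopes concrete, the sketch below uses `predict()` on `model3` for a hypothetical 3000 lb, 150 hp car with each transmission. The car and the `dataCars` construction are illustrative assumptions; the predictions should come out at roughly 20.34 MPG (automatic) and 21.38 MPG (manual).

```r
# Assumption: dataCars is mtcars with am recoded as a labelled factor.
dataCars <- mtcars
dataCars$am <- factor(dataCars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))
model3 <- lm(mpg ~ wt:factor(am) + hp:factor(am), data = dataCars)

# Hypothetical car: 3000 lbs (wt = 3) and 150 hp, with each transmission.
new_cars <- data.frame(
  wt = c(3, 3),
  hp = c(150, 150),
  am = factor(c("Automatic", "Manual"), levels = levels(dataCars$am))
)
pred <- predict(model3, newdata = new_cars)
print(round(pred, 2))
```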
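Finally, we used anova to compare the models. It shows that model1 (all variables) fits significantly better than fit (transmission only; p = 0.0018), while model2 is not significantly different from model1 (p = 0.997). Note that model2 is not strictly nested in model1, because its interaction terms are absent from the full model, so this last F-test is only a rough guide. Because model1 also suffers from variance inflation, we settled on model2.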
# Compare the models with anova (F-tests on residual sums of squares)
anova(fit, model1, model2)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## Model 3: mpg ~ am + wt + hp + wt:am + hp:am
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 720.9
## 2 15 120.4 15 600.49 4.9874 0.001759 **
## 3 26 135.9 -11 -15.50 0.1755 0.996991
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
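Because anova's F-test assumes nested models, a cleaner comparison is between model3 (common intercept) and model2 (separate intercepts), since model3 is a restricted version of model2. The sketch below runs that test and also ranks the models by AIC, which does not require nesting. This is an illustration under the assumed `dataCars` construction, not an analysis from the report; the nested F-test p-value should equal that of the `amManual` t-test in model2 (about 0.0032).

```r
# Assumption: dataCars is mtcars with am recoded as a labelled factor.
dataCars <- mtcars
dataCars$am <- factor(dataCars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))
fit    <- lm(mpg ~ am, data = dataCars)
model2 <- lm(mpg ~ am + wt + hp + wt:am + hp:am, data = dataCars)
model3 <- lm(mpg ~ wt:factor(am) + hp:factor(am), data = dataCars)

# Nested F-test: does a separate intercept for manual cars improve the fit?
nested <- anova(model3, model2)
print(nested)

# AIC ranks non-nested models too; lower values indicate a better trade-off.
aics <- AIC(fit, model2, model3)
print(aics)
```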
| Transmission (am) | Count | Mean MPG | SD of MPG |
|---|---|---|---|
| Automatic | 19 | 17.14737 | 3.833966 |
| Manual | 13 | 24.39231 | 6.166504 |