Executive Summary

As the data science expert of Motor Trend magazine, I was asked to deliver a report that could support a “rule of thumb” for our readers who ask us whether they should buy a car with automatic or manual transmission if they want a vehicle with high fuel efficiency (miles per gallon). After an exploratory analysis, I concluded that a model with only the transmission type as predictor was not satisfactory, and I proceeded to look for a more suitable one with the help of the all subsets regression technique. A suitable model was found and diagnosed. The answer to our readers is: “Look for a manual transmission car with the lowest horsepower available”.

Exploratory Analysis

There are 32 observations of 11 variables. The two principal variables of interest are mpg (numeric, Miles/(US) gallon), our dependent variable, and am (numeric, Transmission: 0 = automatic, 1 = manual), our predictor. The Mpg vs Transmission plot (appendix) shows that, on average, mpg is higher for manual transmission cars, but their mpg values also vary more and overlap those of automatic cars. This means that if a reader of Motor Trend wants to buy a high-mpg car, recommending a manual transmission is not a sufficient criterion on its own; other variables must be taken into account.
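The comparison read off the Mpg vs Transmission plot can also be checked numerically with a per-group summary. The following is a minimal Python sketch, assuming mpg and am are available as plain numeric arrays; the function names summarize_by_group and ranges_overlap are hypothetical and not part of the original analysis.

```python
import numpy as np


def summarize_by_group(y, group):
    """Return n, mean, sample variance and (min, max) of y for each group label."""
    y = np.asarray(y, dtype=float)
    group = np.asarray(group)
    summary = {}
    for g in np.unique(group):
        vals = y[group == g]
        summary[int(g)] = {
            "n": vals.size,
            "mean": vals.mean(),
            "var": vals.var(ddof=1),  # sample variance (n - 1 denominator)
            "range": (vals.min(), vals.max()),
        }
    return summary


def ranges_overlap(summary, a=0, b=1):
    """True if the observed value ranges of groups a and b overlap."""
    lo_a, hi_a = summary[a]["range"]
    lo_b, hi_b = summary[b]["range"]
    return max(lo_a, lo_b) <= min(hi_a, hi_b)
```

Calling ranges_overlap(summarize_by_group(mpg, am)) returns True when the observed mpg ranges of the two transmission types overlap, which is the situation that makes transmission type alone an insufficient buying criterion.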

Model Building

First, I fitted a model (model1) with all the other variables as predictors. Holding all other variables constant, model1 estimates that manual transmission cars cover, on average, 2.90 more miles per gallon than automatic transmission cars. Unfortunately, this coefficient, like all the others in this model, has an unacceptably high p-value, although the adjusted R-squared (82%) is good.

I then searched for a better model using all subsets regression from the leaps package, with Mallows' Cp as the selection criterion. The literature (note 1) suggests choosing the combination of variables whose Cp is closest to the number of parameters p in that combination (see Cp plot). I therefore chose a model that includes the intercept, am, cyl and hp (Cp = 4.4, p = 4). In this model the cyl8 coefficient was not significant, so I dropped the cyl variable from the model altogether. In the final model (model2), with am and hp as predictors, all p-values are acceptably low and the adjusted R-squared indicates that about 77% of the variance in mpg is explained, which is acceptable. So, holding hp constant, manual transmission cars cover, on average, 5.3 (CI = 3.1 to 7.5) more miles per gallon than automatic transmission cars. Because the hp coefficient is negative, readers who want a car with the highest possible mpg should choose a manual transmission car with the lowest horsepower available (see model2 plot).
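The all subsets search with Mallows' Cp can be illustrated with a short sketch. This is not the leaps implementation, only a minimal numpy version under stated assumptions: the predictors are numeric columns (a factor such as cyl would first need dummy columns), p counts the intercept, and Cp = SSE_p / s² - n + 2p, with s² estimated from the full model. The helper names sse and all_subsets_cp are hypothetical.

```python
import itertools

import numpy as np


def sse(X, y):
    """Residual sum of squares of an OLS fit of y on X (X includes the intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def all_subsets_cp(X, y, names):
    """Fit every non-empty subset of the columns of X and return (names, p, Cp) tuples.

    p counts the intercept; the error variance is estimated from the full model.
    Results are sorted by model size, then by Cp.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = X.shape
    ones = np.ones((n, 1))
    # Error variance estimate from the model with all k predictors.
    sigma2 = sse(np.hstack([ones, X]), y) / (n - k - 1)
    results = []
    for size in range(1, k + 1):
        for cols in itertools.combinations(range(k), size):
            Xs = np.hstack([ones, X[:, list(cols)]])
            p = size + 1  # predictors plus intercept
            cp = sse(Xs, y) / sigma2 - n + 2 * p
            results.append(([names[c] for c in cols], p, cp))
    return sorted(results, key=lambda r: (r[1], r[2]))
```

Note that the full model always has Cp = p by construction, so in practice one looks for the smallest subset whose Cp is already close to p, which is what reading the Cp plot amounts to.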

Diagnostic

Finally, I checked the final model for linearity, normality of the residuals, constant variability of the residuals and influential observations. From what we know about how this data set was collected, I will assume that the values of the dependent variable are independent of each other (e.g. the mpg of one car does not influence the mpg of another car), although cars from the same maker form families that may share characteristics. In the Residuals vs Fitted plot (upper left), linearity looks acceptable, as the points are scattered roughly at random above and below the reference line. In the Normal Q-Q plot (upper right), the points follow the diagonal line reasonably closely, so the distribution of the residuals is nearly normal. The Scale-Location plot (bottom left) checks the constant variability of the residuals; here the points seem to follow a curved path, so the variability is probably not entirely constant. Finally, the Residuals vs Leverage plot (bottom right) shows that although some outliers, such as the Toyota Corolla, have substantial leverage, they do not appear to be highly influential, because their Cook's distance is small (< 1) (note 2). In sum, I think this model meets its assumptions acceptably, with some reservation about the constant variance of the residuals.
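The influence check can be made concrete with the usual closed-form Cook's distance, D_i = e_i² h_ii / (p s² (1 - h_ii)²), where e_i is the residual, h_ii the leverage, p the number of parameters and s² the residual mean square. The sketch below is a minimal numpy illustration, assuming a full-rank design matrix that already contains the intercept column; the function name cooks_distance is hypothetical and this is not the code behind the diagnostic plots.

```python
import numpy as np


def cooks_distance(X, y):
    """Return (Cook's distance, leverage) for each observation of an OLS fit.

    X must already include the intercept column and have full column rank.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Leverages are the diagonal of the hat matrix H = Q Q^T from a reduced QR of X.
    Q, _ = np.linalg.qr(X)
    h = np.sum(Q**2, axis=1)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    mse = float(resid @ resid) / (n - p)  # residual mean square s^2
    d = resid**2 / (p * mse) * h / (1 - h) ** 2
    return d, h
```

Observations with d greater than 1 would be flagged by the rule of thumb used above; a high leverage h on its own does not make a point influential unless its residual is also large.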

Conclusion

Manual cars tend to have better miles per gallon performance than automatic cars with similar horsepower. For a future study, I would suggest adding one more variable to the final model: the weight (wt) of the cars.

1 http://en.m.wikipedia.org/wiki/Mallow's_Cp
2 http://en.m.wikipedia.org/wiki/Cook's_distance

Appendix