Executive Summary

Motor Trend was interested in the relationship between MPG and the transmission type for the “mtcars” dataset, specifically: “Is an automatic or manual transmission better for MPG?,” and, “Quantify the MPG difference between automatic and manual transmissions.”

A t-test and regression study showed, at the 95% confidence level, that manual transmissions are better for MPG. The average MPG for manual transmissions is 7.2 MPG higher (manual: 24.4, automatic: 17.1). However, a car’s transmission type alone accounted for just 34% of MPG variation (adjusted R-squared). Other factors were more important.

The best-fit regression model included weight (wt), quarter-mile time (qsec, a proxy for power) and transmission type (am). In this model the effect of transmission type was only 2.9 MPG, significant at the 95% confidence level. Consumers can find cars with the best MPG by selecting a low-weight car with low power and a manual transmission. High power corresponds to a low value of “qsec”, the 1/4 mile time in seconds.

Consumers can easily categorize cars by discrete features. The best-fit model using only discrete features included transmission type, number of cylinders and number of carburetors. Here the effect of transmission type was 4.2 MPG, significant at the 99% confidence level. Consumers can find the best mileage in cars with manual transmissions and the lowest numbers of cylinders (4) and carburetors (1).

Exact quantification of transmission type effect on mileage is not feasible with the data provided.

Data Processing

Sources include the R mtcars dataset (?mtcars) and “Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411.” The variables “am” and “vs” were converted to factors because they are categorical (class) variables rather than numeric quantities.
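The conversion code is not shown in this report. A minimal sketch of how it might look follows; the labels "Automatic" and "Manual" are inferred from the t-test call and the "amManual" coefficient later in the report, while the labels chosen for "vs" are purely an assumption.

```r
# Sketch of the factor conversion (not the author's original code).
data(mtcars)

# Labels inferred from later output: mtcars$am == "Automatic", coefficient "amManual".
mtcars$am <- factor(mtcars$am, levels = c(0, 1),
                    labels = c("Automatic", "Manual"))

# Labels for vs are an assumption; the data dictionary only gives 0 = V, 1 = Straight/Inline.
mtcars$vs <- factor(mtcars$vs, levels = c(0, 1),
                    labels = c("V", "Straight"))

# Check the result: both columns should now be factors with two levels.
str(mtcars[, c("am", "vs")])
table(mtcars$am)
```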

Data Dictionary

  • [, 1] mpg Miles/(US) gallon
  • [, 2] cyl Number of cylinders
  • [, 3] disp Displacement (cu.in.)
  • [, 4] hp Gross horsepower
  • [, 5] drat Rear axle ratio
  • [, 6] wt Weight (lb/1000)
  • [, 7] qsec 1/4 mile time in seconds
  • [, 8] vs Engine cylinder arrangement (0=V, 1=Straight/Inline) - factor
  • [, 9] am Transmission (0 = automatic, 1 = manual) - factor
  • [,10] gear Number of forward gears
  • [,11] carb Number of carburetors

Exploratory Data Analysis

Appendix A contains a histogram of the MPG variable, which shows that its distribution is not significantly skewed, and a pairs plot of all variables to visually assess relationships. Below, boxplots of MPG against the discrete predictor variables illustrate the presence or absence of correlations.
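The boxplot code is not reproduced here. One way such plots could be drawn is sketched below, assuming the same "Automatic"/"Manual" labels for am; the choice of three panels (am, cyl, carb) and the local copy named mt are illustrative only.

```r
# Sketch: boxplots of MPG against three discrete predictors (illustrative choice).
mt <- mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

op <- par(mfrow = c(1, 3))            # three panels side by side
boxplot(mpg ~ am,   data = mt, xlab = "Transmission", ylab = "MPG")
boxplot(mpg ~ cyl,  data = mt, xlab = "Cylinders",    ylab = "MPG")
boxplot(mpg ~ carb, data = mt, xlab = "Carburetors",  ylab = "MPG")
par(op)                               # restore the previous plot layout

# Group medians, i.e. the centre lines of the first panel.
med <- tapply(mt$mpg, mt$am, median)
med
```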

Hypothesis Testing

This Welch two-sample t-test allows us to conclude that the difference in mean MPG is significant, because the 95% confidence interval does not include 0 and the p-value (0.0014) for the two-tailed test is very small.

t1<-t.test(mtcars[mtcars$am == "Automatic",]$mpg,mtcars[mtcars$am == "Manual",]$mpg) ; t1
## 
##  Welch Two Sample t-test
## 
## data:  mtcars[mtcars$am == "Automatic", ]$mpg and mtcars[mtcars$am == "Manual", ]$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean of x mean of y 
##  17.14737  24.39231
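The same test can also be written with the formula interface, which makes it easy to pull out the difference in means quoted in the summary. This is a sketch on a local copy of the data (mt is a name introduced here), assuming the same factor labels as above.

```r
# Sketch: the Welch t-test via the formula interface.
mt <- mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

tt <- t.test(mpg ~ am, data = mt)     # var.equal = FALSE by default, i.e. Welch

# Estimates are ordered by factor level: Automatic first, Manual second.
diff_means <- unname(tt$estimate[2] - tt$estimate[1])
round(diff_means, 2)                  # manual minus automatic, in MPG
tt$conf.int                           # 95% interval for automatic minus manual
```

The difference matches the two sample means printed above (24.39 and 17.15), which is where the figure of about 7.2 MPG comes from.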

Building the simplest model

A linear model of mpg ~ am shows that manual transmissions average 7.2 MPG more than automatics. However, the adjusted R-squared is only 0.34, so transmission type alone leaves about two thirds of the MPG variation unexplained, and its estimate may be confounded with other variables. We therefore need to include additional predictors.

fit1 <- lm(mpg~am, data = mtcars);  summary(fit1)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

Building the Best Fit Models

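Stepwise regression by AIC searches for a good model by adding or removing one variable at a time and keeping the change that lowers the Akaike information criterion (AIC) the most. In a bidirectional search, variables added earlier are re-checked and removed if they no longer improve the fit. The process is iterated until no single change lowers the AIC further. Because the AIC penalizes each extra parameter, it acts as an estimate of relative “out of sample” error and guards against overfitting. Note that step() called without a scope argument, as it is below, defaults to backward elimination from the full model. See http://en.wikipedia.org/wiki/Stepwise_regression and R “?step”.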
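To make the search direction explicit rather than relying on the default, the direction argument of step() can be set by hand. The sketch below runs both variants from the full model on a local copy of the data; the object names mt, full, back and both are introduced here for illustration.

```r
# Sketch: stepwise selection with the search direction stated explicitly.
mt <- mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
mt$vs <- factor(mt$vs)

full <- lm(mpg ~ ., data = mt)        # start from all ten predictors

# Backward elimination only: terms are dropped, never re-added.
back <- step(full, direction = "backward", trace = 0)

# Bidirectional: a dropped term may be added back at a later step.
both <- step(full, direction = "both", trace = 0)

formula(back)
formula(both)
c(backward = AIC(back), both = AIC(both))   # lower is better
```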

The selected model includes weight (wt), quarter-mile time (qsec, where a high qsec means low power) and transmission type (amManual). All three coefficients are significant at the 5% level (two-tailed), and the adjusted R-squared shows that 83% of the MPG variation is explained. Residual plots are in Appendix B.

fitall <- lm(mpg~., data = mtcars);  #summary(fitall)
fitbest <- step(fitall, trace=0);  summary(fitbest)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## amManual      2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11
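Because the p-value for amManual (0.047) is close to the 0.05 threshold, it can help to look at the confidence interval for that coefficient directly. The sketch below refits the selected formula on a local copy of the data (mt) and calls confint(); it assumes the same factor labels as above.

```r
# Sketch: 95% confidence interval for the transmission effect in the selected model.
mt <- mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

fitbest <- lm(mpg ~ wt + qsec + am, data = mt)

# Interval for the manual-vs-automatic coefficient, holding wt and qsec fixed.
ci <- confint(fitbest, "amManual", level = 0.95)
ci
```

An interval whose lower bound sits just above zero would indicate that the adjusted transmission effect is estimated imprecisely, even though it passes the 5% test.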

Another best-fit model was built using only discrete predictors that consumers can quickly use to select cars that are predicted to have high MPG. This search settles on a model with transmission type, number of cylinders and number of carburetors as predictors. It explains 79% of the MPG variance (adjusted R-squared), and all of its coefficients are significant at the 1% level. In addition, the residual plots in Appendix C seem randomly distributed, indicating that most of the reducible variance has been modeled. This model has the advantage of being simple for consumers to use when shopping.

fitD <- lm(mpg~am+cyl+gear+carb+vs, data = mtcars)
fitDbest <- step(fitD, trace=0);  summary(fitDbest)
## 
## Call:
## lm(formula = mpg ~ am + cyl + carb, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.8853 -1.1581  0.2646  1.4885  5.4843 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  32.1731     2.4914  12.914 2.59e-13 ***
## amManual      4.2430     1.3094   3.240 0.003074 ** 
## cyl          -1.7175     0.4298  -3.996 0.000424 ***
## carb         -1.1304     0.4058  -2.785 0.009481 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.755 on 28 degrees of freedom
## Multiple R-squared:  0.8113, Adjusted R-squared:  0.7911 
## F-statistic: 40.13 on 3 and 28 DF,  p-value: 2.855e-10
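As an illustration of how a shopper might use this model, the sketch below predicts MPG for two hypothetical cars that differ only in transmission type. The cars (4 cylinders, 1 carburetor) are made up for the example, and the model is refit on a local copy of the data (mt).

```r
# Sketch: predicted MPG from the discrete model for two hypothetical cars.
mt <- mtcars
mt$am <- factor(mt$am, levels = c(0, 1), labels = c("Automatic", "Manual"))

fitDbest <- lm(mpg ~ am + cyl + carb, data = mt)

# Hypothetical cars: identical except for transmission type.
new_cars <- data.frame(
  am   = factor(c("Manual", "Automatic"), levels = levels(mt$am)),
  cyl  = c(4, 4),
  carb = c(1, 1)
)

pred <- predict(fitDbest, newdata = new_cars, interval = "confidence")
cbind(new_cars, round(pred, 1))
```

The gap between the two fitted values equals the amManual coefficient (about 4.2 MPG), since the model has no interaction terms.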

Conclusion

Regression analysis has shown that multiple features affect MPG. Models can be built that use different sets of predictors to estimate MPG. In every model fitted here, transmission type is statistically significant, but its coefficient varies widely with the other predictors included (2.9, 4.2 and 7.2 MPG). A firm conclusion is difficult because car manufacturers may preferentially offer manual transmissions on high-MPG models. Thus the sample of cars may be unbalanced, with transmission type confounded with other features such as weight. It would be interesting to compare the exact same vehicle model with each transmission type. We estimate that in such a comparison the coefficient should be less than or equal to 2.9.

Appendix A

par(mfrow = c(1,1),oma = c(3, 2, 2, 2));  hist(mtcars$mpg,breaks=20);  pairs(mtcars)

Appendix B

Residual Plot for the Best Fit Model

par(mfrow = c(2,2));  plot(fitbest)

Appendix C

Residual Plot for the Discrete Variable Best Fit Model

par(mfrow = c(2,2));  plot(fitDbest)