Summary

This report evaluates the effect of transmission type (automatic or manual) on motor vehicle fuel efficiency in miles per gallon (MPG), using the mtcars dataset from the R package datasets. Exploratory analysis suggested a difference in MPG between automatic and manual transmissions. An unpaired t-test rejected the null hypothesis (p = 0.001) that the mean MPG of automatic and manual transmissions is equal, with manual transmissions averaging about 7 MPG more. In a simple linear regression, transmission type accounted for 34% of the variability in MPG. Using multivariate linear regression with backward stepwise selection by the Akaike Information Criterion (AIC), 84% of the MPG variability is explained when transmission type is modeled together with number of cylinders, horsepower, and weight; in that model, a manual transmission yields an estimated 1.8 MPG increase over an automatic. From this analysis, we conclude that transmission type does influence MPG, but that this effect is further influenced by other vehicle variables such as the number of cylinders, horsepower, and weight.

Data Analysis Code

The complete code for this data analysis is available at my GitHub repository.

Variables in the dataset

  1. mpg: miles per gallon (numerical)
  2. cyl: number of cylinders (factor, 4,6,8)
  3. disp: displacement (cu.in.) (numerical)
  4. hp: gross horsepower (numerical)
  5. drat: rear axle ratio (numerical)
  6. wt: weight (1000 pounds) (numerical)
  7. qsec: 1/4 mile time (numerical)
  8. vs: V/S, V-engine or Straight engine (factor, V,S)
  9. am: transmission type (factor, automatic, manual)
  10. gear: number of forward gears (factor, 3,4,5)
  11. carb: number of carburetors (factor, 1,2,3,4,6,8)

Loading and Processing the Dataset

data(mtcars)
str(mtcars)

Inspecting the dimensions of the dataset and the classes of its variables showed that cyl, vs, am, gear, and carb are not continuous. These variables were therefore converted to factors; the transmission variable am appears as transm (levels Automatic and Manual) in the output that follows.
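The conversion code itself is not shown in this report. A minimal base R sketch of what it might look like follows. The data frame name cars and the exact level labels are assumptions, inferred from names such as transmManual and cyl6cyl in the model output further down.

```r
# Sketch only: the report's own preprocessing code is not shown.
data(mtcars)
cars <- mtcars  # hypothetical working copy

# Convert the discrete variables to factors.
# Labels are assumed from the coefficient names in the report's output.
cars$cyl    <- factor(cars$cyl, levels = c(4, 6, 8),
                      labels = c("4cyl", "6cyl", "8cyl"))
cars$vs     <- factor(cars$vs, levels = c(0, 1), labels = c("V", "S"))
cars$gear   <- factor(cars$gear)
cars$carb   <- factor(cars$carb)

# In mtcars, am = 0 is automatic and am = 1 is manual.
cars$transm <- factor(cars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))
cars$am     <- NULL  # drop the original column so it is not duplicated

str(cars)
```

Dropping am after creating transm keeps the two perfectly collinear columns from both entering a model that is fit on all variables.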


Exploratory analysis

In the boxplot of MPG by transmission type, the median and the 25th and 75th percentiles of MPG are all higher for manual transmissions.
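The plotting code is not included in this report. One way such a boxplot and the quartiles behind it could be produced in base R is sketched below; the axis labels and the object names transm and quartiles are illustrative assumptions.

```r
# Sketch only: the report's own plotting code is not shown.
data(mtcars)
transm <- factor(mtcars$am, levels = c(0, 1),
                 labels = c("Automatic", "Manual"))

# Boxplot of MPG for each transmission type
boxplot(mtcars$mpg ~ transm,
        xlab = "Transmission type",
        ylab = "Miles per gallon (MPG)",
        main = "MPG by transmission type")

# Quartiles per group: rows are 25%, 50%, 75%; columns are the two groups
quartiles <- sapply(split(mtcars$mpg, transm), quantile,
                    probs = c(0.25, 0.5, 0.75))
quartiles
```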

Inferential analysis

To test whether the mean MPG of automatic and manual transmissions differs, an unpaired (Welch) two-sample t-test was performed.

## 
##  Welch Two Sample t-test
## 
## data:  mpg by transm
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean in group Automatic    mean in group Manual 
##                17.14737                24.39231

Based on the two-sample t-test, we reject (p = 0.001) the null hypothesis that the mean MPG of automatic and manual transmissions is equal. In addition, manual transmissions average about 7.2 MPG more than automatic transmissions (95% CI 3.2 to 11.3).
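The call that produced this output is not shown. A sketch that should reproduce it is given below, assuming the transmission factor is built from am with the labels seen in the output; the names cars, welch, and mean_diff are illustrative.

```r
# Sketch only: reconstructs the Welch test from the printed output.
data(mtcars)
cars <- transform(mtcars,
                  transm = factor(am, levels = c(0, 1),
                                  labels = c("Automatic", "Manual")))

# t.test() uses the Welch (unequal variance) test by default
welch <- t.test(mpg ~ transm, data = cars)
welch

# Difference in group means, manual minus automatic
mean_diff <- unname(diff(welch$estimate))
mean_diff
```

The confidence interval printed by t.test() is for automatic minus manual, which is why both of its bounds are negative.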

Strategy for Model Selection

Because the initial question was whether MPG differs by type of transmission, a simple linear regression was performed with mpg as the outcome and transmission type as the predictor.

library(pander)
# Simple linear regression of MPG on transmission type
fit <- lm(mpg ~ transm, data = mtcars)
pander(summary(fit))

                Estimate  Std. Error  t value   Pr(>|t|)
transmManual       7.245       1.764    4.106   0.000285
(Intercept)        17.15       1.125    15.25  1.134e-15

Fitting linear model: mpg ~ transm

Observations  Residual Std. Error     R^2  Adjusted R^2
          32                4.902  0.3598        0.3385

In this simple regression model (p < 0.001), manual transmissions yield 7.2 MPG more than automatic transmissions. Nevertheless, the model explains only 34% of the variance in MPG (adjusted R-squared = 0.3385). This result suggests that variables other than transmission type also influence MPG, as the pairs plot (refer to the Appendix) also indicates.
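With a single two-level factor as predictor, the regression slope is simply the difference between the two group means, which is why it matches the t-test estimate. The sketch below checks this. The name cars and the factor construction are assumptions, since the preprocessing code is not shown.

```r
# Sketch only: illustrates that the slope equals the difference in means.
data(mtcars)
cars <- transform(mtcars,
                  transm = factor(am, levels = c(0, 1),
                                  labels = c("Automatic", "Manual")))

fit <- lm(mpg ~ transm, data = cars)

# Slope for the Manual level (Automatic is the reference level)
slope <- coef(fit)[["transmManual"]]

# Group means computed directly from the data
group_means <- tapply(cars$mpg, cars$transm, mean)
diff_means  <- group_means[["Manual"]] - group_means[["Automatic"]]

all.equal(slope, diff_means)

# Share of MPG variance explained, adjusted for the number of predictors
adj_r2 <- summary(fit)$adj.r.squared
adj_r2
```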

To further evaluate whether variables other than transmission type influence MPG, backward stepwise selection based on the Akaike Information Criterion (AIC) was performed. The initial model for this algorithm used MPG as the outcome and all other variables in the mtcars dataset as predictors.

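# Start from the full model with every other variable as a predictor
full.fit <- lm(mpg ~ ., data = mtcars)
# Backward elimination by AIC
best.fit <- step(full.fit, direction = "backward")
# step() output was hidden to limit the report's page length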

                Estimate  Std. Error  t value   Pr(>|t|)
cyl6cyl           -3.031       1.407   -2.154    0.04068
cyl8cyl           -2.164       2.284  -0.9472     0.3523
hp              -0.03211     0.01369   -2.345    0.02693
wt                -2.497      0.8856   -2.819   0.009081
transmManual       1.809       1.396    1.296     0.2065
(Intercept)        33.71       2.605    12.94  7.733e-13

Fitting linear model: mpg ~ cyl + hp + wt + transm

Observations  Residual Std. Error     R^2  Adjusted R^2
          32                 2.41  0.8659        0.8401

In the AIC-selected model, number of cylinders (p = 0.04 for 6 cylinders), horsepower (p = 0.03), and weight (p < 0.01) are associated with MPG, while the estimated effect of a manual transmission shrinks to 1.8 MPG (p = 0.21). These results suggest that, within this dataset, cylinders, horsepower, and weight confound the effect of transmission type on MPG. According to the adjusted R-squared, this model explains 84% of the variance in MPG.
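The selection can be sketched end to end as follows. The factor conversion and the names cars, full_fit, best_fit, and reported_fit are assumptions, because the preprocessing code is not shown. The sketch also fits the reported formula directly, so the two can be compared without relying on the path step() takes.

```r
# Sketch only: preprocessing is assumed, not taken from the report.
data(mtcars)
cars <- mtcars
cars$cyl    <- factor(cars$cyl, levels = c(4, 6, 8),
                      labels = c("4cyl", "6cyl", "8cyl"))
cars$vs     <- factor(cars$vs)
cars$gear   <- factor(cars$gear)
cars$carb   <- factor(cars$carb)
cars$transm <- factor(cars$am, levels = c(0, 1),
                      labels = c("Automatic", "Manual"))
cars$am     <- NULL

# Full model, then backward elimination by AIC (trace = 0 silences the log)
full_fit <- lm(mpg ~ ., data = cars)
best_fit <- step(full_fit, direction = "backward", trace = 0)
formula(best_fit)

# The model reported above, fit directly for comparison
reported_fit <- lm(mpg ~ cyl + hp + wt + transm, data = cars)
summary(reported_fit)$adj.r.squared
coef(reported_fit)[["transmManual"]]

# AIC as used by step(); lower is better
extractAIC(full_fit)[2]
extractAIC(best_fit)[2]
```

Backward elimination only removes a term when doing so lowers the AIC, so the selected model can never have a higher AIC than the full model it started from.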

Finally, diagnostic plots were constructed to check the residuals for patterns and non-normality.

par(mfrow=c(2,2))
plot(best.fit)

In the Residuals vs Fitted plot, the residuals show no clear heteroskedasticity, as their variance appears similar across the range of fitted values. In addition, in the Normal Q-Q plot, the residuals appear approximately normally distributed.
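The visual check of normality could be complemented with a formal test. The sketch below applies the Shapiro-Wilk test to the residuals of the selected model. This test is not part of the original analysis, and the name cars and the factor construction are assumptions.

```r
# Sketch only: Shapiro-Wilk is an addition, not part of the original report.
data(mtcars)
cars <- transform(mtcars,
                  cyl    = factor(cyl),
                  transm = factor(am, levels = c(0, 1),
                                  labels = c("Automatic", "Manual")))

best_fit <- lm(mpg ~ cyl + hp + wt + transm, data = cars)
res <- residuals(best_fit)

# Null hypothesis: the residuals come from a normal distribution.
# A small p-value would be evidence against normality.
sw <- shapiro.test(res)
sw
```

With only 32 observations the test has limited power, so it supplements the Q-Q plot and does not replace it.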

Conclusion

The purpose of this data analysis report was to evaluate the effect of transmission type on motor vehicle fuel efficiency in miles per gallon (MPG). Exploratory analysis supported further evaluation of the question “Is an automatic or manual transmission better for MPG?”. An unpaired t-test provided evidence (p = 0.001) that the mean MPG of automatic and manual transmissions is not equal, with manual transmissions having a mean about 7 MPG higher (95% CI 3.2 to 11.3) than automatic transmissions. Nevertheless, a simple linear regression showed that transmission type accounted for only 34% of the variability in MPG. A multivariate approach was then applied to the other variables in the dataset (refer to the section Variables in the dataset) to determine whether the effect of transmission type had confounders. As shown in the section Strategy for Model Selection, number of cylinders, horsepower, and weight confounded the influence of transmission type on MPG. Therefore, manual transmissions do yield better fuel efficiency than automatic transmissions (an estimated 1.8 MPG on average in the adjusted model), but this difference in MPG is also influenced by the number of cylinders, horsepower, and weight of the vehicle.

Appendix