Motor Trend Car Road Tests: Automatic vs Manual Transmission on Miles Per Gallon

Synopsis

The main goal of this project is to explore the relationship between a set of variables and miles per gallon (MPG), the outcome. We are particularly interested in the following two questions:
1. Is an automatic or manual transmission better for MPG?
2. How large is the MPG difference between automatic and manual transmissions?
The questions are addressed using exploratory data analysis and regression models in the following sections.

Data Processing

library(datasets)
data(mtcars)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))

Exploratory Data Analysis

Pairwise relationships between the variables of interest are plotted in Figure 1 (see Appendix). The boxplot of mpg by transmission type shows that manual transmissions have a higher mpg than automatic transmissions on average (Figure 2, see Appendix).
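
The difference visible in the boxplot can also be summarised numerically. The following is a minimal sketch, not part of the original analysis; it works on a local copy named dat (a hypothetical name) so that the mtcars object used elsewhere in the report is left untouched.

```r
# Local copy of the data so the report's own mtcars object is not modified
dat <- datasets::mtcars
dat$am <- factor(dat$am, labels = c("Automatic", "Manual"))

# Number of cars and mean mpg for each transmission type
group_sizes <- table(dat$am)
group_means <- tapply(dat$mpg, dat$am, mean)

print(group_sizes)
print(round(group_means, 3))
```

The two means printed here are the quantities that the simple regression model compares.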

Simple Regression Model

rm_single <- lm(mpg ~ am, data = mtcars)
summary(rm_single)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## amManual       7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285
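
As an aid to reading the coefficients in the output above, the sketch below checks how they relate to the group means. It is an illustration under the assumption that am is coded as a factor with "Automatic" as the reference level; the names dat and fit are hypothetical.

```r
# Local copy with am as a factor; "Automatic" is the reference level
dat <- datasets::mtcars
dat$am <- factor(dat$am, labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ am, data = dat)
group_means <- tapply(dat$mpg, dat$am, mean)

# The intercept is the mean mpg of the reference level (Automatic)
intercept_matches <- isTRUE(all.equal(
  coef(fit)[["(Intercept)"]], group_means[["Automatic"]]
))

# The amManual coefficient is the difference between the two group means
slope_matches <- isTRUE(all.equal(
  coef(fit)[["amManual"]],
  group_means[["Manual"]] - group_means[["Automatic"]]
))

print(c(intercept = intercept_matches, slope = slope_matches))
```

With a single two-level factor as the only predictor, least squares reproduces the group means exactly, which is why both checks hold.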

The model shows a substantial difference in mpg between the two transmission types. The intercept (beta0 = 17.147) is the mean mpg for automatic transmissions, and the slope (beta1 = 7.245) is the difference in mean mpg between manual and automatic transmissions, so manual cars average 17.147 + 7.245 = 24.392 mpg. Since the p-value (0.000285) is well below 0.05, we reject the null hypothesis that the mean mpg is the same for both transmission types. However, the adjusted R-squared shows that only 33.85% of the variance in mpg is explained by this model, which suggests that other predictors matter. Let us pick candidate confounding variables from a correlation analysis. The correlations between mpg and the other variables are given by

# Reload the data so that am is numeric again; cor() requires numeric columns
data(mtcars)
cor(mtcars)[1,]
##        mpg        cyl       disp         hp       drat         wt 
##  1.0000000 -0.8521620 -0.8475514 -0.7761684  0.6811719 -0.8676594 
##       qsec         vs         am       gear       carb 
##  0.4186840  0.6640389  0.5998324  0.4802848 -0.5509251

The variables cylinders (cyl), weight (wt), displacement (disp), and gross horsepower (hp) are strongly correlated with mpg (absolute correlation above 0.77). Hence, we consider these as potential confounders in the regression analysis.
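
The selection can be made explicit by ranking the correlations by absolute value. This is a sketch only: the cutoff of 0.75 is an assumption chosen for illustration and is not stated in the original analysis, and the names dat, ranked, and strong are hypothetical.

```r
# Local copy with all columns numeric, as cor() requires
dat <- datasets::mtcars

# Correlation of every other variable with mpg
cors <- cor(dat)["mpg", ]
cors <- cors[names(cors) != "mpg"]

# Rank by absolute correlation, strongest first
ranked <- cors[order(abs(cors), decreasing = TRUE)]
print(round(ranked, 3))

# Keep variables above an illustrative cutoff (0.75 is an assumption)
strong <- names(ranked)[abs(ranked) > 0.75]
print(strong)
```

A different cutoff would select a different set of variables, so the threshold should be treated as a judgment call rather than a rule.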

Multivariate Regression Model

rm_multi <- lm(mpg ~ cyl + disp + hp + wt + am, data = mtcars)
summary(rm_multi)
## 
## Call:
## lm(formula = mpg ~ cyl + disp + hp + wt + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.5952 -1.5864 -0.7157  1.2821  5.5725 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 38.20280    3.66910  10.412 9.08e-11 ***
## cyl         -1.10638    0.67636  -1.636  0.11393    
## disp         0.01226    0.01171   1.047  0.30472    
## hp          -0.02796    0.01392  -2.008  0.05510 .  
## wt          -3.30262    1.13364  -2.913  0.00726 ** 
## am           1.55649    1.44054   1.080  0.28984    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.505 on 26 degrees of freedom
## Multiple R-squared:  0.8551, Adjusted R-squared:  0.8273 
## F-statistic:  30.7 on 5 and 26 DF,  p-value: 4.029e-10

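The adjusted R-squared shows that 82.73% of the variance in mpg is explained by this model. After adjusting for the other predictors, the am coefficient (1.556) is no longer statistically significant (p = 0.29); weight (wt) is the only predictor significant at the 0.05 level.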
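
One way to compare the two models directly is a nested-model F-test alongside the adjusted R-squared values. The sketch below is an addition, not part of the original analysis; it refits both models on a local copy (dat, fit_single, and fit_multi are hypothetical names) with am left as a 0/1 numeric variable.

```r
# Refit both models on the same data so that they are properly nested
dat <- datasets::mtcars
fit_single <- lm(mpg ~ am, data = dat)
fit_multi  <- lm(mpg ~ cyl + disp + hp + wt + am, data = dat)

# Adjusted R-squared for each model
adj_r2 <- c(
  single = summary(fit_single)$adj.r.squared,
  multi  = summary(fit_multi)$adj.r.squared
)
print(round(adj_r2, 4))

# F-test: do the four extra predictors reduce the residual sum of squares
# by more than would be expected by chance?
comparison <- anova(fit_single, fit_multi)
print(comparison)
```

A small p-value in the second row of the anova table would indicate that the additional predictors improve the fit beyond what am alone provides.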

Residual and Diagnostics

Figure 3 shows the plot for residuals and diagnostics obtained by plotting the object returned by the multivariate regression model (See Appendix).

  1. The Residuals vs Fitted plot shows randomly scattered points, which supports the independence condition.
  2. The Normal Q-Q plot suggests that the residuals are approximately normally distributed.
  3. The Scale-Location plot shows points scattered in a roughly constant band, which suggests constant variance.
  4. The Residuals vs Leverage plot highlights outlying observations with increased leverage.
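
The visual checks above can be complemented with numeric ones. The following sketch is an addition to the original analysis: it uses the Shapiro-Wilk test for normality of the residuals and hat values for leverage. The 2p/n leverage cutoff is a common rule of thumb and an assumption here, and the names dat, fit, res, and lev are hypothetical.

```r
# Refit the multivariate model on a local copy of the data
dat <- datasets::mtcars
fit <- lm(mpg ~ cyl + disp + hp + wt + am, data = dat)

# Normality of residuals: Shapiro-Wilk test
res <- resid(fit)
sw <- shapiro.test(res)
print(sw)

# Leverage: hat values, flagged with the 2p/n rule of thumb (an assumption)
lev <- hatvalues(fit)
p <- length(coef(fit))
n <- nrow(dat)
high_leverage <- lev[lev > 2 * p / n]
print(sort(high_leverage, decreasing = TRUE))

# Influence: the three observations with the largest Cook's distance
cd <- cooks.distance(fit)
print(head(sort(cd, decreasing = TRUE), 3))
```

These numbers do not replace the plots in Figure 3; they give a second view of the same residuals and help name the specific cars that stand out.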

Summary

The boxplot from the exploratory data analysis and the simple regression both show that manual transmissions have higher MPG than automatic transmissions.

After adjusting for cylinders (cyl), weight (wt), displacement (disp), and gross horsepower (hp) as confounding variables, manual transmissions give on average 1.56 MPG more than automatic transmissions, but this difference is not statistically significant (p = 0.29).
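
The uncertainty around the adjusted estimate can be expressed as a confidence interval. This is a minimal sketch, not part of the original report; dat and fit are hypothetical names, and the 95% level is an assumption.

```r
# Refit the multivariate model on a local copy of the data
dat <- datasets::mtcars
fit <- lm(mpg ~ cyl + disp + hp + wt + am, data = dat)

# Point estimate and 95% confidence interval for the transmission effect
am_estimate <- coef(fit)[["am"]]
ci <- confint(fit, "am", level = 0.95)

print(round(am_estimate, 3))
print(round(ci, 2))
```

The interval spans zero, which agrees with the p-value of 0.29 for am in the multivariate model: the data are compatible with no adjusted difference between the transmission types.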

Appendix

pairs(mpg ~., data = mtcars, main = "Figure 1: Motor Trend Car Road Tests")

boxplot(mpg ~ am, data = mtcars, col = "bisque", 
        xlab = "Transmission type", ylab = "Miles Per Gallon (mpg)", 
        main = "Figure 2: Miles per gallon by transmission type")

par(mfrow = c(2,2))
plot(rm_multi, main = "Figure 3: Residuals and Diagnostics")