Executive Summary

Motor Trend wants to determine which transmission type, manual or automatic, is better for maximizing miles per gallon (MPG). We also quantify the MPG difference between the two so that the better option is clear from the values. Throughout this summary, regression models are the basis for judging which one is better.

Exploratory Data Analysis

First, let's explore the data.

data(mtcars)
str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...

The dataset was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973-74 models). The source of the dataset can be found here: https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html

library(dplyr)  # provides select() and starts_with()
cor(mtcars$mpg, select(mtcars, -starts_with('mpg')))
##            cyl       disp         hp      drat         wt     qsec
## [1,] -0.852162 -0.8475514 -0.7761684 0.6811719 -0.8676594 0.418684
##             vs        am      gear       carb
## [1,] 0.6640389 0.5998324 0.4802848 -0.5509251

We can see that cyl, disp, hp, wt, and carb are negatively correlated with mpg, with wt (-0.87), cyl (-0.85), and disp (-0.85) showing the strongest relationships. This helps in deciding which variables to use in the regression model.
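To make the ranking easier to read, the correlations can be sorted by absolute value. The following is a minimal sketch in base R; the names `predictors`, `mpg_cor`, and `ranked` are illustrative and not part of the original analysis.

```r
# Rank predictors by the strength of their correlation with mpg (base R only).
data(mtcars)  # reload a clean copy of the dataset

# All columns except the response
predictors <- setdiff(names(mtcars), "mpg")

# Pearson correlation of mpg with each predictor
mpg_cor <- sapply(mtcars[predictors], function(x) cor(mtcars$mpg, x))

# Sort by absolute value so strong negative and positive
# correlations can be compared directly
ranked <- mpg_cor[order(abs(mpg_cor), decreasing = TRUE)]
print(round(ranked, 3))
```

Sorting on the absolute value keeps the sign visible while putting the strongest relationships first.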

We also drew a boxplot of mpg by am (transmission type) to see their relationship (Appendix 1). The boxplot suggests that manual transmission cars tend to have higher mpg than automatic ones. We also inspected the relationship between weight and mpg with a scatter plot, because wt is the variable most strongly correlated with mpg (Appendix 2). There is a clear trend: the heavier the car, the fewer miles per gallon.
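The gap visible in the boxplot can also be put into numbers by comparing the group means directly. This is a rough sketch, relying on the documented coding of mtcars (am = 0 for automatic, am = 1 for manual); `group_means` and `mpg_gap` are illustrative names.

```r
# Compare average mpg between the two transmission types.
data(mtcars)

# In mtcars, am = 0 is automatic and am = 1 is manual
group_means <- c(
  automatic = mean(mtcars$mpg[mtcars$am == 0]),
  manual    = mean(mtcars$mpg[mtcars$am == 1])
)
print(round(group_means, 2))

# Raw difference in average mpg (manual minus automatic)
mpg_gap <- unname(group_means["manual"] - group_means["automatic"])
print(round(mpg_gap, 2))
```

This raw difference ignores every other characteristic of the cars, so it should be read as a starting point rather than the effect of the transmission itself.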

Regression Models

basicLM <- lm(mpg~am,data=mtcars)
summary(basicLM)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285

The model shows that transmission type alone explains only about 36% of the variability in mpg (R-squared = 0.3598), which is less than the boxplot suggested. The am coefficient means that manual cars average 7.245 mpg more than automatic cars, whose mean is the intercept (17.147 mpg). Transmission type is clearly not the only factor influencing mpg, so we next attempt multiple linear regression.
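The quantities discussed above can be pulled out of the fitted model rather than read off the printed summary. The sketch below refits the same single-variable model and adds a 95% confidence interval for the am coefficient; `r2` and `am_ci` are illustrative names.

```r
# Extract R-squared and a confidence interval from the single-variable model.
data(mtcars)
basicLM <- lm(mpg ~ am, data = mtcars)

# Share of mpg variability explained by transmission type alone
r2 <- summary(basicLM)$r.squared
print(round(r2, 4))

# With one predictor, R-squared equals the squared correlation
print(isTRUE(all.equal(r2, cor(mtcars$mpg, mtcars$am)^2)))

# 95% confidence interval for the manual-vs-automatic difference
am_ci <- confint(basicLM, "am", level = 0.95)
print(round(am_ci, 2))
```

An interval that excludes zero agrees with the small p-value reported for am in the summary output.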

compLM <- lm(mpg~.,data=mtcars)
summary(compLM)
## 
## Call:
## lm(formula = mpg ~ ., data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4506 -1.6044 -0.1196  1.2193  4.6271 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 12.30337   18.71788   0.657   0.5181  
## cyl         -0.11144    1.04502  -0.107   0.9161  
## disp         0.01334    0.01786   0.747   0.4635  
## hp          -0.02148    0.02177  -0.987   0.3350  
## drat         0.78711    1.63537   0.481   0.6353  
## wt          -3.71530    1.89441  -1.961   0.0633 .
## qsec         0.82104    0.73084   1.123   0.2739  
## vs           0.31776    2.10451   0.151   0.8814  
## am           2.52023    2.05665   1.225   0.2340  
## gear         0.65541    1.49326   0.439   0.6652  
## carb        -0.19942    0.82875  -0.241   0.8122  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared:  0.869,  Adjusted R-squared:  0.8066 
## F-statistic: 13.93 on 10 and 21 DF,  p-value: 3.793e-07

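The full model achieves an R-squared of about 87% (adjusted R-squared about 81%). No single coefficient is significant at the 0.05 level; weight (wt) comes closest (p = 0.0633). With the other variables held constant, the estimated advantage of manual transmission (am) drops to 2.52 mpg and is not statistically significant (p = 0.234), so the transmission effect is smaller and less certain than the single-variable model suggested.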
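One way to check whether the extra variables are worth including is to compare nested models with an F-test. The sketch below is not part of the original analysis: the intermediate model `wtLM` (transmission plus weight) is a hypothetical choice made only for illustration, as are the names `nested_cmp` and `adj_r2`.

```r
# Compare nested models: am only, am + wt (hypothetical), and all predictors.
data(mtcars)
basicLM <- lm(mpg ~ am, data = mtcars)
wtLM    <- lm(mpg ~ am + wt, data = mtcars)  # assumed intermediate model
compLM  <- lm(mpg ~ ., data = mtcars)

# Sequential F-tests: does each larger model improve on the previous one?
nested_cmp <- anova(basicLM, wtLM, compLM)
print(nested_cmp)

# Adjusted R-squared penalizes additional predictors
adj_r2 <- sapply(
  list(basic = basicLM, am_wt = wtLM, full = compLM),
  function(m) summary(m)$adj.r.squared
)
print(round(adj_r2, 4))
```

Adjusted R-squared is used here instead of plain R-squared because the latter never decreases when predictors are added.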

Residuals Analysis

Now let us check the residuals and diagnose our models. The residuals-versus-fitted plots show only a few points that the models do not explain well, so those points should not be a problem.

plot(compLM,which=1)

plot(basicLM,which=1)
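The visual check can be complemented with a few numeric diagnostics. This is a minimal sketch rather than a full diagnostic workflow: the Shapiro-Wilk test and the 2p/n leverage cutoff are common rules of thumb chosen here as assumptions, and `res`, `sw`, `lev`, and `cutoff` are illustrative names.

```r
# Numeric residual diagnostics for the full model.
data(mtcars)
compLM <- lm(mpg ~ ., data = mtcars)

# Residuals of a model with an intercept sum to (numerically) zero
res <- resid(compLM)
print(sum(res))

# Shapiro-Wilk test: a small p-value would indicate non-normal residuals
sw <- shapiro.test(res)
print(sw$p.value)

# Flag high-leverage cars using the 2 * p / n rule of thumb,
# where p is the number of coefficients and n the number of cars
lev <- hatvalues(compLM)
cutoff <- 2 * length(coef(compLM)) / nrow(mtcars)
print(names(lev)[lev > cutoff])
```

A normal Q-Q plot of the same residuals is available through `plot(compLM, which = 2)`.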

Conclusion

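From our regression models we were able to see that manual transmission gets more miles per gallon than automatic transmission: about 7.2 mpg more on average when transmission type is the only predictor, and about 2.5 mpg more when all other variables are held constant, although the latter estimate is not statistically significant.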

Appendix

Appendix 1: Boxplot of mpg & am

# Add a labelled factor column for interpretation (0 = 'at' automatic, 1 = 'mt' manual)
mtcars$tm <- factor(mtcars$am, labels = c('at', 'mt'))

boxplot(mpg ~ tm, data = mtcars,
        ylab="Miles per Gallon",
        xlab="Transmission Type",
        main="Miles per Gallon vs Transmission Type", 
        col="red")

Appendix 2: Scatterplot of mpg and wt (most strongly correlated variable)

plot(mtcars$wt, mtcars$mpg,
     ylab="Miles per Gallon",
     xlab="Weight",
     main="Miles per Gallon and Weight")