Motor Trend magazine would like some data analyzed. They are mainly interested in these two points:
- “Is an automatic or manual transmission better for MPG?”
- “Quantify the MPG difference between automatic and manual transmissions”
It is our job to take care of this data analysis for them.

Loading The Libraries and Data

library(datasets)
data(mtcars)
head(mtcars)
summary(mtcars)
dim(mtcars)

Here we see that the dataset “mtcars” includes 32 observations of 11 variables. This is data from Motor Trend magazine on 11 aspects of 32 cars.

Five of the variables (cyl, vs, am, gear, and carb) must be changed from “numeric” to “factor” variables because they are qualitative, not quantitative, values.

Tidying The Data

data(mtcars)
mtcars$cyl  <- factor(mtcars$cyl)
mtcars$vs   <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am   <- factor(mtcars$am,labels=c("Automatic","Manual"))
summary(mtcars)

Several of the variables in the dataset are “factor” (categorical) variables rather than “numeric” variables, so they are converted with factor() before analysis. Note the difference in the way these variables are now summarized: summary() reports a count per level instead of quartiles and a mean.
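
The conversion can also be written in one step and checked afterwards. The following is a minimal sketch, not the report's own code: the helper names factor_cols and col_classes are illustrative, and it assumes the same five columns are the ones to convert.

```r
library(datasets)
data(mtcars)

# Columns treated as categorical in this report
factor_cols <- c("cyl", "vs", "am", "gear", "carb")

# Convert every listed column to a factor in one step
mtcars[factor_cols] <- lapply(mtcars[factor_cols], factor)

# Relabel transmission: 0 becomes "Automatic", 1 becomes "Manual"
levels(mtcars$am) <- c("Automatic", "Manual")

# Check the resulting classes and the number of cars per transmission type
col_classes <- sapply(mtcars, class)
print(col_classes)
print(table(mtcars$am))
```

Checking the classes right after the conversion makes it harder to fit a model on a column that was accidentally left numeric.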

Executive Summary

This is a report analyzing the effect of transmission type (automatic or manual) on the mpg of 32 cars. The data is the mtcars dataset from the R datasets package, and it comes from Motor Trend magazine. Here we seek to determine the difference in mpg attributable to transmission type once the effects of other contributing factors are removed. To do this we explore the data, create multiple regression models, and interpret the quantitative results of these models. Ultimately we determined that when the other selected variables (cyl, hp, and wt) are accounted for, a manual transmission is associated with a 1.8 mpg increase over an automatic transmission.

*For brevity, most of the code in this report will not be evaluated. Please see the appendix for any code outputs, or feel free to run the code on your own machine to test its functionality.

Exploratory Data Analysis

Let’s explore the data.

summary(mtcars)

This is a general summary of the data, but it does not help us get an idea of which transmission type gets better gas mileage, so let’s create an exploratory plot.

boxplot(mpg ~ am, data = mtcars, xlab = "Transmission", ylab="MPG", main="Boxplot of MPG vs. Transmission")

This box-and-whisker plot suggests an answer to our question, “Is an automatic or manual transmission better for MPG?”: manual cars sit higher. Now let’s do a t-test to check that the difference is significant. Please view this plot in the appendix.

auto_tran <- mtcars[mtcars$am == "Automatic",]
manu_tran <- mtcars[mtcars$am == "Manual",]
t.test(auto_tran$mpg, manu_tran$mpg)

Based on the p-value of this t-test (0.0014), we can state that the difference in mean mpg between automatic transmission vehicles and manual transmission vehicles is statistically significant. That alone does not show that transmission type causes the difference, so we must use regression to see whether other factors contribute to it. From our regression models we will be able to estimate how much of the difference remains attributable to transmission type. Please view the results of this test in the appendix.
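
The same test can be run with the formula interface, which avoids building the two subsets by hand and returns an object whose pieces can be pulled out directly. This is a sketch under the assumption that am has been relabelled as in the Tidying The Data section; the object name tt is illustrative.

```r
library(datasets)
data(mtcars)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))

# Welch two-sample t-test of mpg by transmission type
# (the default, var.equal = FALSE, does not assume equal variances)
tt <- t.test(mpg ~ am, data = mtcars)

tt$p.value    # p-value for the null hypothesis of equal mean mpg
tt$conf.int   # 95% interval for mean(Automatic) - mean(Manual)
tt$estimate   # the two group means
```

The confidence interval is for the automatic mean minus the manual mean, so an interval that lies entirely below zero corresponds to manual cars having the higher mean mpg.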

Regression Models

Let’s perform simple linear regression.

fitam <- lm(mpg ~ am, data = mtcars)
summary(fitam)

First we look at a model with one regressor: the transmission type. The results of our regression analysis show that the average mpg for automatic transmissions is 17.147, and for manual transmissions it is 17.147 + 7.245, which equals 24.392. This does not take into account any other aspects of the vehicles which may be affecting average mpg.
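
With a single two-level factor as the regressor, the intercept is the mean of the reference group and the slope is the difference between the group means. A short sketch of that check follows; the names group_means and b are illustrative, and am is assumed to be relabelled as in the Tidying The Data section.

```r
library(datasets)
data(mtcars)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))

fitam <- lm(mpg ~ am, data = mtcars)
b <- coef(fitam)

# Mean mpg within each transmission type
group_means <- tapply(mtcars$mpg, mtcars$am, mean)

# Intercept = automatic mean; intercept + slope = manual mean
rbind(
  from_model = c(Automatic = b[["(Intercept)"]],
                 Manual    = b[["(Intercept)"]] + b[["amManual"]]),
  from_data  = c(group_means[["Automatic"]], group_means[["Manual"]])
)
```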

Now let’s perform multiple regression.

fitall <- lm(mpg ~ ., data = mtcars)
summary(fitall)

Now we look at a regression model that includes all of the variables. This model includes variables that do not need to be there, which inflates the standard errors of the coefficient estimates.

How do we separate the effects of transmission type from the effect of the 10 other variables? We can use the step() function. It does not fit every combination of variables. Starting from the full model, it drops or re-adds one variable at a time, keeps the change that lowers the AIC the most, and stops when no single change lowers it further. Let’s try that now.
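
To make the criterion concrete: for a linear model, the AIC value that step() works with can be reproduced as n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The sketch below checks this for the all-variables model. It assumes the untransformed, numeric mtcars columns, which is the coding that produced the trace in the appendix (where the first line reads AIC=70.9); the variable names are illustrative.

```r
library(datasets)
data(mtcars)  # original numeric coding, no factor conversion

fit_full <- lm(mpg ~ ., data = mtcars)

n   <- nrow(mtcars)                # number of cars
rss <- sum(residuals(fit_full)^2)  # residual sum of squares
edf <- n - df.residual(fit_full)   # number of estimated coefficients

# AIC on the scale step() uses for linear models
aic_by_hand <- n * log(rss / n) + 2 * edf

aic_by_hand
extractAIC(fit_full)[2]  # the value step() compares between candidate models
```

Each extra coefficient adds 2 to the criterion, so a variable is kept only if it reduces the residual sum of squares enough to offset that penalty.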

fitperfect <- step(fitall, direction = "both")
summary(fitperfect)

From the summary of the step function we can see that the confounding variables are cyl, wt and hp. Therefore our best fitted model is lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars). The adjusted R-squared for this model is 0.8401, so this model accounts for roughly 84% of the total variability of mpg.
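
One way to check that the extra regressors earn their place is to compare the transmission-only model with the selected model. The sketch below is an illustration rather than part of the original analysis: it reruns the selection quietly with trace = 0, compares AIC values, and runs a nested-model F-test with anova(). The name model_aic is illustrative.

```r
library(datasets)
data(mtcars)
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
mtcars[factor_cols] <- lapply(mtcars[factor_cols], factor)
levels(mtcars$am) <- c("Automatic", "Manual")

fitam      <- lm(mpg ~ am, data = mtcars)  # transmission only
fitall     <- lm(mpg ~ ., data = mtcars)   # every variable
fitperfect <- step(fitall, direction = "both", trace = 0)  # trace = 0 hides the log

formula(fitperfect)  # which regressors survived the selection

# AIC on the scale step() uses; lower is better
model_aic <- c(am_only  = extractAIC(fitam)[2],
               all      = extractAIC(fitall)[2],
               selected = extractAIC(fitperfect)[2])
model_aic

# F-test: do the added regressors improve on the am-only model?
anova(fitam, fitperfect)
```

The F-test is only meaningful because the smaller model is nested in the larger one, which holds here as long as am remains in the selected model.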

Interpretation of Coefficients

To explain the coefficients in this summary: the intercept is the predicted mpg for the reference case, an automatic four-cylinder car with hp and wt set to zero, so it is not meaningful on its own. The cyl6 and cyl8 coefficients represent decreases in mpg of 3.0 mpg and 5.2 mpg for six- and eight-cylinder cars respectively. The hp coefficient represents a decrease of 0.032 mpg for each additional horsepower. The wt coefficient represents a decrease of 2.497 mpg per 1,000 lb of weight. The amManual coefficient represents a 1.81 mpg increase from switching from an automatic transmission to a manual transmission, with cyl, hp, and wt held fixed.
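
A point estimate is easier to interpret alongside its uncertainty. The sketch below refits the selected model directly and pulls out the row for amManual together with a 95% confidence interval. It assumes the factor coding from the Tidying The Data section; the object names fit, am_row, and am_ci are illustrative.

```r
library(datasets)
data(mtcars)
mtcars$cyl <- factor(mtcars$cyl)
mtcars$am  <- factor(mtcars$am, labels = c("Automatic", "Manual"))

# The model selected by step() in the Regression Models section
fit <- lm(mpg ~ cyl + hp + wt + am, data = mtcars)

# Estimate, standard error, t value, and p-value for the transmission effect
am_row <- coef(summary(fit))["amManual", ]
am_row

# 95% confidence interval for the same coefficient
am_ci <- confint(fit, "amManual", level = 0.95)
am_ci

summary(fit)$adj.r.squared  # adjusted R-squared quoted in the text
```

If the interval includes zero, the adjusted transmission effect cannot be distinguished from no effect at the 5% level, which is worth reporting next to the 1.81 mpg figure.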

Appendix

The code and output in this appendix were run on mtcars with its original numeric coding (am equal to 0 or 1, with cyl, vs, gear, and carb left numeric). With that coding, step() selects mpg ~ wt + qsec + am, with an am coefficient of 2.94, rather than the factor-coded model mpg ~ cyl + hp + wt + am discussed in the Regression Models section.

boxplot(mpg ~ am, data = mtcars, xlab = "Transmission (Automatic = 0, Manual = 1)", ylab="MPG", main="Boxplot of MPG vs. Transmission")

auto_tran <- mtcars[mtcars$am == 0,]
manu_tran <- mtcars[mtcars$am == 1,]
t.test(auto_tran$mpg, manu_tran$mpg)
## 
##  Welch Two Sample t-test
## 
## data:  auto_tran$mpg and manu_tran$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -11.280194  -3.209684
## sample estimates:
## mean of x mean of y 
##  17.14737  24.39231
fitam <- lm(mpg ~ am, data = mtcars)
summary(fitam)
## 
## Call:
## lm(formula = mpg ~ am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.3923 -3.0923 -0.2974  3.2439  9.5077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared:  0.3598, Adjusted R-squared:  0.3385 
## F-statistic: 16.86 on 1 and 30 DF,  p-value: 0.000285
fitall <- lm(mpg ~ ., data = mtcars)
summary(fitall)
## 
## Call:
## lm(formula = mpg ~ ., data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4506 -1.6044 -0.1196  1.2193  4.6271 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept) 12.30337   18.71788   0.657   0.5181  
## cyl         -0.11144    1.04502  -0.107   0.9161  
## disp         0.01334    0.01786   0.747   0.4635  
## hp          -0.02148    0.02177  -0.987   0.3350  
## drat         0.78711    1.63537   0.481   0.6353  
## wt          -3.71530    1.89441  -1.961   0.0633 .
## qsec         0.82104    0.73084   1.123   0.2739  
## vs           0.31776    2.10451   0.151   0.8814  
## am           2.52023    2.05665   1.225   0.2340  
## gear         0.65541    1.49326   0.439   0.6652  
## carb        -0.19942    0.82875  -0.241   0.8122  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared:  0.869,  Adjusted R-squared:  0.8066 
## F-statistic: 13.93 on 10 and 21 DF,  p-value: 3.793e-07
fitperfect <- step(fitall, direction = "both")
## Start:  AIC=70.9
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - cyl   1    0.0799 147.57 68.915
## - vs    1    0.1601 147.66 68.932
## - carb  1    0.4067 147.90 68.986
## - gear  1    1.3531 148.85 69.190
## - drat  1    1.6270 149.12 69.249
## - disp  1    3.9167 151.41 69.736
## - hp    1    6.8399 154.33 70.348
## - qsec  1    8.8641 156.36 70.765
## <none>              147.49 70.898
## - am    1   10.5467 158.04 71.108
## - wt    1   27.0144 174.51 74.280
## 
## Step:  AIC=68.92
## mpg ~ disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - vs    1    0.2685 147.84 66.973
## - carb  1    0.5201 148.09 67.028
## - gear  1    1.8211 149.40 67.308
## - drat  1    1.9826 149.56 67.342
## - disp  1    3.9009 151.47 67.750
## - hp    1    7.3632 154.94 68.473
## <none>              147.57 68.915
## - qsec  1   10.0933 157.67 69.032
## - am    1   11.8359 159.41 69.384
## + cyl   1    0.0799 147.49 70.898
## - wt    1   27.0280 174.60 72.297
## 
## Step:  AIC=66.97
## mpg ~ disp + hp + drat + wt + qsec + am + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - carb  1    0.6855 148.53 65.121
## - gear  1    2.1437 149.99 65.434
## - drat  1    2.2139 150.06 65.449
## - disp  1    3.6467 151.49 65.753
## - hp    1    7.1060 154.95 66.475
## <none>              147.84 66.973
## - am    1   11.5694 159.41 67.384
## - qsec  1   15.6830 163.53 68.200
## + vs    1    0.2685 147.57 68.915
## + cyl   1    0.1883 147.66 68.932
## - wt    1   27.3799 175.22 70.410
## 
## Step:  AIC=65.12
## mpg ~ disp + hp + drat + wt + qsec + am + gear
## 
##        Df Sum of Sq    RSS    AIC
## - gear  1     1.565 150.09 63.457
## - drat  1     1.932 150.46 63.535
## <none>              148.53 65.121
## - disp  1    10.110 158.64 65.229
## - am    1    12.323 160.85 65.672
## - hp    1    14.826 163.35 66.166
## + carb  1     0.685 147.84 66.973
## + vs    1     0.434 148.09 67.028
## + cyl   1     0.414 148.11 67.032
## - qsec  1    26.408 174.94 68.358
## - wt    1    69.127 217.66 75.350
## 
## Step:  AIC=63.46
## mpg ~ disp + hp + drat + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - drat  1     3.345 153.44 62.162
## - disp  1     8.545 158.64 63.229
## <none>              150.09 63.457
## - hp    1    13.285 163.38 64.171
## + gear  1     1.565 148.53 65.121
## + cyl   1     1.003 149.09 65.242
## + vs    1     0.645 149.45 65.319
## + carb  1     0.107 149.99 65.434
## - am    1    20.036 170.13 65.466
## - qsec  1    25.574 175.67 66.491
## - wt    1    67.572 217.66 73.351
## 
## Step:  AIC=62.16
## mpg ~ disp + hp + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - disp  1     6.629 160.07 61.515
## <none>              153.44 62.162
## - hp    1    12.572 166.01 62.682
## + drat  1     3.345 150.09 63.457
## + gear  1     2.977 150.46 63.535
## + cyl   1     2.447 150.99 63.648
## + vs    1     1.121 152.32 63.927
## + carb  1     0.011 153.43 64.160
## - qsec  1    26.470 179.91 65.255
## - am    1    32.198 185.63 66.258
## - wt    1    69.043 222.48 72.051
## 
## Step:  AIC=61.52
## mpg ~ hp + wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## - hp    1     9.219 169.29 61.307
## <none>              160.07 61.515
## + disp  1     6.629 153.44 62.162
## + carb  1     3.227 156.84 62.864
## + drat  1     1.428 158.64 63.229
## - qsec  1    20.225 180.29 63.323
## + cyl   1     0.249 159.82 63.465
## + vs    1     0.249 159.82 63.466
## + gear  1     0.171 159.90 63.481
## - am    1    25.993 186.06 64.331
## - wt    1    78.494 238.56 72.284
## 
## Step:  AIC=61.31
## mpg ~ wt + qsec + am
## 
##        Df Sum of Sq    RSS    AIC
## <none>              169.29 61.307
## + hp    1     9.219 160.07 61.515
## + carb  1     8.036 161.25 61.751
## + disp  1     3.276 166.01 62.682
## + cyl   1     1.501 167.78 63.022
## + drat  1     1.400 167.89 63.042
## + gear  1     0.123 169.16 63.284
## + vs    1     0.000 169.29 63.307
## - am    1    26.178 195.46 63.908
## - qsec  1   109.034 278.32 75.217
## - wt    1   183.347 352.63 82.790
summary(fitperfect)
## 
## Call:
## lm(formula = mpg ~ wt + qsec + am, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4811 -1.5555 -0.7257  1.4110  4.6610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   9.6178     6.9596   1.382 0.177915    
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## am            2.9358     1.4109   2.081 0.046716 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared:  0.8497, Adjusted R-squared:  0.8336 
## F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11