Motor Trend

Exploring the relationship of MPG with other variables

Executive Summary

As an employee of Motor Trend, a magazine about the automobile industry, I attempt to answer the following questions:

  1. Is an automatic or manual transmission better for MPG?
  2. Quantify the MPG difference between automatic and manual transmissions

This analysis explores the relationship between the variable transmission type (manual or automatic) and the variable MPG (miles per gallon). This analysis uses the mtcars data set.

A Welch t-test shows that manual-transmission vehicles average about 7.2 MPG more than automatic-transmission vehicles. After adjusting for other variables with multiple linear regression, the estimated effect of a manual transmission shrinks to 1.81 MPG and is no longer statistically significant (p = 0.21). Weight, horsepower, and number of cylinders explain more of the variation in MPG.

Loading the required libraries and data set

library(ggplot2)
data(mtcars)

Taking a look at the data and converting categorical variables into factors:

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
dim(mtcars)
## [1] 32 11
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- factor(mtcars$am)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
attach(mtcars)
## The following object is masked from package:ggplot2:
## 
##     mpg

Exploratory Analysis

See Appendix, Fig. 1.

The box plot compares MPG for automatic and manual transmissions. It shows that vehicles with manual transmissions tend to have noticeably higher MPG than vehicles with automatic transmissions.

The following Welch two-sample t-test (the default for t.test) compares MPG between the two transmission types:

ttestRes <- t.test(mpg ~ am)
ttestRes$p.value
## [1] 0.001373638

With a p-value of about 0.0014, the t-test rejects the null hypothesis that the mean MPG is the same for both transmission types.

ttestRes$estimate
## mean in group 0 mean in group 1 
##        17.14737        24.39231

The estimated difference between the two group means is 7.24494 MPG: manual-transmission cars (group 1) average 7.24494 MPG more than automatic-transmission cars (group 0).
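The size of the difference and its uncertainty can be read directly from the test object. The following is a minimal sketch; it works on a fresh copy named `cars` (a name introduced here for illustration) rather than on the attached data.

```r
# Work on a fresh copy so the example does not depend on attach().
cars <- datasets::mtcars

# Welch two-sample t-test (the t.test default, var.equal = FALSE).
tt <- t.test(mpg ~ am, data = cars)

# Manual mean minus automatic mean (group 1 minus group 0).
mean_diff <- unname(tt$estimate[2] - tt$estimate[1])
print(round(mean_diff, 5))

# 95% confidence interval; t.test reports it for group 0 minus group 1,
# so both bounds are negative when manual cars have the higher mean.
print(tt$conf.int)
```

An interval that excludes zero is consistent with the small p-value reported above.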

Regression Analysis

Fitting the model

fit <- lm(mpg ~ ., data = mtcars)
summary(fit)$coef
##                Estimate  Std. Error     t value   Pr(>|t|)
## (Intercept) 23.87913244 20.06582026  1.19004018 0.25252548
## cyl6        -2.64869528  3.04089041 -0.87102622 0.39746642
## cyl8        -0.33616298  7.15953951 -0.04695316 0.96317000
## disp         0.03554632  0.03189920  1.11433290 0.28267339
## hp          -0.07050683  0.03942556 -1.78835344 0.09393155
## drat         1.18283018  2.48348458  0.47627845 0.64073922
## wt          -4.52977584  2.53874584 -1.78425732 0.09461859
## qsec         0.36784482  0.93539569  0.39325050 0.69966720
## vs1          1.93085054  2.87125777  0.67247551 0.51150791
## am1          1.21211570  3.21354514  0.37718957 0.71131573
## gear4        1.11435494  3.79951726  0.29328856 0.77332027
## gear5        2.52839599  3.73635801  0.67670068 0.50889747
## carb2       -0.97935432  2.31797446 -0.42250436 0.67865093
## carb3        2.99963875  4.29354611  0.69863900 0.49546781
## carb4        1.09142288  4.44961992  0.24528452 0.80956031
## carb6        4.47756921  6.38406242  0.70136677 0.49381268
## carb8        7.25041126  8.36056638  0.86721532 0.39948495

None of the coefficients has a p-value below 0.05, so the full model does not tell us which variables matter most.
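One possible explanation, offered here as an assumption rather than a result of the analysis above, is that several predictors carry overlapping information, which inflates the standard errors in the full model. A minimal sketch that inspects pairwise correlations among a few numeric predictors, using a fresh copy named `cars` (an illustrative name):

```r
# Original numeric columns, before any factor conversion.
cars <- datasets::mtcars

# Pairwise Pearson correlations among four engine/size predictors.
pred_cor <- cor(cars[, c("cyl", "disp", "hp", "wt")])
print(round(pred_cor, 2))
```

Correlations close to 1 among these columns would be consistent with that explanation.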

Backward stepwise selection by AIC (the default behaviour of step() when no scope is given) can help identify a smaller set of useful predictors:

stepf <- step(fit)
## Start:  AIC=76.4
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb
## 
##        Df Sum of Sq    RSS    AIC
## - carb  5   13.5989 134.00 69.828
## - gear  2    3.9729 124.38 73.442
## - am    1    1.1420 121.55 74.705
## - qsec  1    1.2413 121.64 74.732
## - drat  1    1.8208 122.22 74.884
## - cyl   2   10.9314 131.33 75.184
## - vs    1    3.6299 124.03 75.354
## <none>              120.40 76.403
## - disp  1    9.9672 130.37 76.948
## - wt    1   25.5541 145.96 80.562
## - hp    1   25.6715 146.07 80.588
## 
## Step:  AIC=69.83
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am + gear
## 
##        Df Sum of Sq    RSS    AIC
## - gear  2    5.0215 139.02 67.005
## - disp  1    0.9934 135.00 68.064
## - drat  1    1.1854 135.19 68.110
## - vs    1    3.6763 137.68 68.694
## - cyl   2   12.5642 146.57 68.696
## - qsec  1    5.2634 139.26 69.061
## <none>              134.00 69.828
## - am    1   11.9255 145.93 70.556
## - wt    1   19.7963 153.80 72.237
## - hp    1   22.7935 156.79 72.855
## 
## Step:  AIC=67
## mpg ~ cyl + disp + hp + drat + wt + qsec + vs + am
## 
##        Df Sum of Sq    RSS    AIC
## - drat  1    0.9672 139.99 65.227
## - cyl   2   10.4247 149.45 65.319
## - disp  1    1.5483 140.57 65.359
## - vs    1    2.1829 141.21 65.503
## - qsec  1    3.6324 142.66 65.830
## <none>              139.02 67.005
## - am    1   16.5665 155.59 68.608
## - hp    1   18.1768 157.20 68.937
## - wt    1   31.1896 170.21 71.482
## 
## Step:  AIC=65.23
## mpg ~ cyl + disp + hp + wt + qsec + vs + am
## 
##        Df Sum of Sq    RSS    AIC
## - disp  1    1.2474 141.24 63.511
## - vs    1    2.3403 142.33 63.757
## - cyl   2   12.3267 152.32 63.927
## - qsec  1    3.1000 143.09 63.928
## <none>              139.99 65.227
## - hp    1   17.7382 157.73 67.044
## - am    1   19.4660 159.46 67.393
## - wt    1   30.7151 170.71 69.574
## 
## Step:  AIC=63.51
## mpg ~ cyl + hp + wt + qsec + vs + am
## 
##        Df Sum of Sq    RSS    AIC
## - qsec  1     2.442 143.68 62.059
## - vs    1     2.744 143.98 62.126
## - cyl   2    18.580 159.82 63.466
## <none>              141.24 63.511
## - hp    1    18.184 159.42 65.386
## - am    1    18.885 160.12 65.527
## - wt    1    39.645 180.88 69.428
## 
## Step:  AIC=62.06
## mpg ~ cyl + hp + wt + vs + am
## 
##        Df Sum of Sq    RSS    AIC
## - vs    1     7.346 151.03 61.655
## <none>              143.68 62.059
## - cyl   2    25.284 168.96 63.246
## - am    1    16.443 160.12 63.527
## - hp    1    36.344 180.02 67.275
## - wt    1    41.088 184.77 68.108
## 
## Step:  AIC=61.65
## mpg ~ cyl + hp + wt + am
## 
##        Df Sum of Sq    RSS    AIC
## <none>              151.03 61.655
## - am    1     9.752 160.78 61.657
## - cyl   2    29.265 180.29 63.323
## - hp    1    31.943 182.97 65.794
## - wt    1    46.173 197.20 68.191
summary(stepf)$coef
##                Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 33.70832390 2.60488618 12.940421 7.733392e-13
## cyl6        -3.03134449 1.40728351 -2.154040 4.068272e-02
## cyl8        -2.16367532 2.28425172 -0.947214 3.522509e-01
## hp          -0.03210943 0.01369257 -2.345025 2.693461e-02
## wt          -2.49682942 0.88558779 -2.819404 9.081408e-03
## am1          1.80921138 1.39630450  1.295714 2.064597e-01
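The AIC values in the selection trace can be reproduced by hand. As a sketch, `step()` ranks models with `extractAIC()`, which for a linear model computes n * log(RSS / n) + 2 * edf, where edf is the number of estimated coefficients. The names `cars` and `final` below are illustrative and not part of the original analysis.

```r
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am)

# Final model chosen by the backward selection above.
final <- lm(mpg ~ cyl + hp + wt + am, data = cars)

n   <- nrow(cars)                 # 32 observations
rss <- sum(residuals(final)^2)    # residual sum of squares
edf <- length(coef(final))        # 6 estimated coefficients

aic_by_hand <- n * log(rss / n) + 2 * edf
print(aic_by_hand)
print(extractAIC(final))          # c(edf, AIC) as used by step()
```

The hand-computed value should agree with the AIC shown in the last step of the trace (61.65).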

This model has 4 predictors: cylinders, horsepower, weight, and transmission. The R-squared value (0.8659) indicates that the model explains about 87% of the variance in MPG. Not all coefficients are significant at the 0.05 level: cyl6, hp, and wt have p-values below 0.05, while cyl8 (p = 0.35) and am1 (p = 0.21) do not. Holding the other variables constant, a 6-cylinder car is estimated to get 3.03 MPG less than a 4-cylinder car.

Likewise, an 8-cylinder car is estimated to get 2.16 MPG less than a 4-cylinder car. MPG decreases by an estimated 3.21 for every additional 100 HP and by 2.5 for every additional 1000 lbs of weight. A manual transmission is associated with an improvement of 1.81 MPG, although this estimate is not statistically distinguishable from zero.
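The figures quoted above come from rescaling the fitted coefficients. A minimal sketch, refitting the selected model on a fresh copy (`cars` and `final` are illustrative names), that converts the horsepower slope to a per-100-HP effect and reports a confidence interval for the transmission effect:

```r
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am)

final <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# Rescale the horsepower slope from "per 1 HP" to "per 100 HP".
hp_per_100 <- unname(coef(final)["hp"] * 100)
print(round(hp_per_100, 2))

# 95% confidence interval for the manual-transmission effect.
am_ci <- confint(final)["am1", ]
print(am_ci)

# Share of variance explained by the model.
r2 <- summary(final)$r.squared
print(round(r2, 4))
```

An interval for `am1` that includes zero matches the p-value of 0.21 in the coefficient table.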

Residual Plots

See Appendix, Fig. 2.

The following inferences can be drawn from the plots:

  1. The points of the Residuals vs. Fitted plot are scattered randomly with no clear pattern, supporting the assumptions of linearity and independent errors.
  2. Most points of the Normal Q-Q plot lie on or very close to the line, suggesting that the residuals are approximately normally distributed.
  3. The randomly scattered points of the Scale-Location plot suggest that the variance is roughly constant.
  4. The Residuals vs. Leverage plot suggests that there are no unduly influential points, since all points fall within the 0.5 Cook's distance contour.

As a further check, the following counts the DFBETAS values whose absolute value exceeds 1:

sum((abs(dfbetas(stepf)))>1)
## [1] 0

Thus, the diagnostics give no evidence that the regression assumptions are violated.
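The reading of the leverage panel can be cross-checked numerically. A minimal sketch, assuming the same selected model (refitted here as `final` on a copy named `cars`, both illustrative names), that compares each observation's Cook's distance with 0.5, the first contour drawn by `plot.lm` by default:

```r
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am)
final <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# One Cook's distance per car.
cd <- cooks.distance(final)

# Three largest values, to see how close any car comes to the contour.
print(round(sort(cd, decreasing = TRUE)[1:3], 3))

# Number of cars beyond the 0.5 contour.
n_flagged <- sum(cd > 0.5)
print(n_flagged)
```

A count of zero would agree with the visual reading of the plot.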

Conclusion

Manual transmission gives a slight boost to MPG (about 1.81 MPG after adjustment), but this effect is not statistically significant once weight, horsepower, and number of cylinders are accounted for. Those three variables are more important in determining MPG.

Appendix

Fig. 1

  boxplot(mpg ~ am, 
          xlab="Transmission Type (0 = Automatic, 1 = Manual)", 
          ylab="MPG",
          main="MPG by Transmission Type")

Fig. 2

par(mfrow = c(2, 2))
plot(stepf)