Effect of Transmission Type on MPG

Alexander Kuznetsov

06/19/2018

Executive Summary

In this project we explore how a car's fuel efficiency is affected by its transmission type. The mtcars dataset covers 32 passenger vehicles described by 11 characteristics, with 19 automatic-transmission cars and 13 manual-transmission cars. First, we explore the dataset and show that transmission type is associated with fuel consumption: mean fuel efficiency, measured in miles per gallon (MPG), differs significantly between cars with automatic and manual transmissions. On average, vehicles with automatic transmission consume more fuel, that is, they have lower MPG than cars with manual transmission. Next, we examine the relationships between MPG and other car characteristics such as horsepower, weight and number of cylinders. Many of these characteristics are correlated with one another, so some of them can be excluded from the regression model.

Exploratory Data Analysis

Commands such as dim, head, summary and str are used to gain a high-level overview of the dataset. Calls to these commands are shown in Appendix 1.

library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=75),tidy=TRUE)

The mtcars dataset has 32 rows and 11 columns of numeric variables. Transmission type is stored in column am, with 0 corresponding to automatic transmission and 1 to manual transmission. A call to the table function on the am column gives the number of vehicles with each type of transmission:

table(mtcars$am)

 0  1 
19 13 

To conclude the exploratory analysis, MPG is visualized by transmission type using the boxplot function. Results are presented in Appendix 1.
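As a numeric companion to the boxplot, the group sizes, means and medians can be tabulated directly. This is a minimal sketch: the labels "Automatic" and "Manual" and the names am_label and mpg_by_am are chosen here for illustration, following the 0/1 coding described above.

```r
# Numeric summary of MPG by transmission type (0 = automatic, 1 = manual).
# as.character() lets this work whether am is still numeric or already a factor.
am_label <- factor(as.character(mtcars$am), levels = c("0", "1"),
                   labels = c("Automatic", "Manual"))

mpg_by_am <- data.frame(
  n      = as.vector(table(am_label)),
  mean   = as.vector(tapply(mtcars$mpg, am_label, mean)),
  median = as.vector(tapply(mtcars$mpg, am_label, median)),
  row.names = levels(am_label)
)
print(round(mpg_by_am, 2))
```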

Statistical Inference

In the sample, cars with automatic transmission have lower MPG than those with manual transmission. A t-test can be used to determine whether this difference is statistically significant (t.test runs Welch's version by default, which does not assume equal variances). The null hypothesis is that mean MPG is the same for both transmission types. The alternative hypothesis is that the means of the two subsets are different.

t.test(mtcars$mpg ~ mtcars$am)

    Welch Two Sample t-test

data:  mtcars$mpg by mtcars$am
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
 -11.280194  -3.209684
sample estimates:
mean in group 0 mean in group 1 
       17.14737        24.39231 

The output of t.test shows a p-value of 0.0014, well below 0.05. The null hypothesis can therefore be rejected: mean MPG differs significantly between vehicles with manual and automatic transmissions. Average MPG for cars with automatic and manual transmissions was found to be about 17.1 and 24.4 respectively.
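The quantities discussed above can also be pulled out of the object returned by t.test rather than read off the printout. The sketch below assumes the standard mtcars data; the names tt, mean_diff and ci are illustrative.

```r
# Welch two-sample t-test of MPG by transmission type (same test as above).
tt <- t.test(mpg ~ am, data = mtcars)

# Difference in sample means, manual (am = 1) minus automatic (am = 0).
mean_diff <- unname(diff(tt$estimate))

# 95% confidence interval; t.test reports it for automatic minus manual,
# so its sign is opposite to that of mean_diff.
ci <- as.numeric(tt$conf.int)

cat("Difference in means (manual - automatic):", round(mean_diff, 2), "\n")
cat("95% CI (automatic - manual):", round(ci, 2), "\n")
cat("p-value:", signif(tt$p.value, 4), "\n")
```

The interval excludes zero, which is consistent with rejecting the null hypothesis at the 5% level.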

Regression Analysis

A regression model helps to better understand the relationship between MPG and transmission type, since some, if not most, of the variance in fuel consumption can be explained by other characteristics such as vehicle weight or engine power. At the same time, many variables recorded in the mtcars dataset are highly correlated. For example, a car with more cylinders is also expected to have a more powerful engine, faster acceleration and larger displacement. It is also reasonable to assume that heavier vehicles require larger and more powerful engines. Therefore, we need to determine which independent variables affect MPG and how much of that effect can be attributed to transmission type. Next, we use two approaches to build regression models that explain the impact of transmission type on MPG. In the first approach, the step function is used to select a regression model. In the second approach, nested models are used to find an appropriate linear model.
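The correlation among predictors mentioned above can be inspected with a correlation matrix. This is a small sketch; the subset of columns (cyl, disp, hp, wt) is chosen here as an illustration and is not taken from the original analysis.

```r
# Pairwise Pearson correlations among engine- and size-related predictors.
predictors <- c("cyl", "disp", "hp", "wt")
cor_matrix <- cor(mtcars[, predictors])
print(round(cor_matrix, 2))
```

Off-diagonal values close to 1 indicate pairs of variables that carry largely overlapping information, which is the motivation for dropping some of them from the model.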

Approach 1: step Function

mtcars$am <- as.factor(mtcars$am)
model <- step(lm(mpg ~ ., data = mtcars), trace = 0)
summary(model)

Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4811 -1.5555 -0.7257  1.4110  4.6610 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   9.6178     6.9596   1.382 0.177915    
wt           -3.9165     0.7112  -5.507 6.95e-06 ***
qsec          1.2259     0.2887   4.247 0.000216 ***
am1           2.9358     1.4109   2.081 0.046716 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared:  0.8497,    Adjusted R-squared:  0.8336 
F-statistic: 52.75 on 3 and 28 DF,  p-value: 1.21e-11

According to the summary, weight, quarter-mile time and transmission type (wt, qsec and am) are the variables retained by step, which selects by AIC by default. The coefficient am1, corresponding to manual transmission, indicates that MPG is on average 2.9 higher for vehicles with manual transmission than for vehicles with automatic transmission, holding weight and quarter-mile time constant. As expected, MPG decreases with the weight of the vehicle (-3.9 per 1000 lbs) and increases as qsec (time to travel 1/4 mile, in seconds) increases. Analysis of residuals is shown in Appendix 2-1. The distribution of residuals is close to normal (Normal Q-Q plot), with no patterns (Residuals vs Fitted) or significant leverage points.
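To see how precisely the transmission effect is estimated, confidence intervals for the coefficients can be computed with confint. The sketch below refits the formula selected by step on a copy of the data; the names cars, step_fit and am_ci and the factor labels are assumptions made for this example.

```r
# Work on a copy so the original mtcars data frame is left untouched.
cars <- mtcars
cars$am <- factor(as.character(cars$am), levels = c("0", "1"),
                  labels = c("automatic", "manual"))

# Same formula that step selected above.
step_fit <- lm(mpg ~ wt + qsec + am, data = cars)

# 95% confidence intervals for all coefficients.
ci <- confint(step_fit, level = 0.95)
print(round(ci, 2))

# Interval for the manual-versus-automatic effect only.
am_ci <- ci["ammanual", ]
```

Because the p-value for the transmission coefficient is only just below 0.05, the lower bound of its interval lies only slightly above zero, so the adjusted effect is estimated with considerable uncertainty.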

Approach 2: Nested Models

Because of space limitations for this project report, code for this section is given in Appendix 2-2. The approach in this section is to build nested models, starting with the transmission variable (am) and adding the remaining variables one at a time. After all 10 models are fitted, ANOVA is used to identify the variables with the most significant impact on MPG. These variables are cyl, hp and wt, in addition to the originally selected am. In the next step, ANCOVA is used to eliminate redundant variables by building models with interaction terms. It can be shown that hp has significant interaction terms with cyl and wt. Thus, the final model in this approach includes only two variables, hp and am. This simplified model has residuals with a distribution close to normal (Appendix 2-2), and the residuals do not show any pattern. The model suggests that MPG is 5.3 miles per gallon higher for cars with manual transmission than for cars with automatic transmission at the same engine power. As expected, MPG decreases with increasing engine power, measured in horsepower.

Approach 1 vs Approach 2

The F-tests for the two approaches give similar F-statistics (52.75 and 52.02), indicating that both models improve on the corresponding intercept-only models to a similar degree. The low p-values of the F-tests suggest that the null hypotheses are to be rejected in both approaches. \(R^2\), though, is higher in approach 1, where the step function was used (0.85 versus 0.78). \(R^2\) for approach 2 could be improved by adding another regressor. However, determining which additional independent variable to include could be a separate project.
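The comparison above can be collected into a single table. This sketch refits both final models and gathers their fit statistics; the helper compare_fit and the inclusion of AIC are additions made here for illustration and are not part of the original analysis.

```r
# Refit both final models on a copy of the data.
cars <- mtcars
cars$am <- factor(as.character(cars$am), levels = c("0", "1"))

fit_step   <- lm(mpg ~ wt + qsec + am, data = cars)  # approach 1
fit_nested <- lm(mpg ~ am + hp, data = cars)         # approach 2

# Hypothetical helper: collect the fit statistics of one model.
compare_fit <- function(fit) {
  s <- summary(fit)
  c(r.squared     = s$r.squared,
    adj.r.squared = s$adj.r.squared,
    sigma         = s$sigma,
    f.statistic   = unname(s$fstatistic["value"]),
    aic           = AIC(fit))
}

comparison <- rbind(step = compare_fit(fit_step),
                    nested = compare_fit(fit_nested))
print(round(comparison, 3))
```

Adjusted \(R^2\) and AIC penalize extra regressors, so they give a fairer comparison than \(R^2\) alone between a three-regressor and a two-regressor model.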

Conclusions

Differences in fuel efficiency between vehicles with automatic and manual transmission are statistically significant. On average, manual transmission cars travel about 7 more miles per gallon than cars with automatic transmission. However, once other variables are taken into account, transmission type accounts for a difference of 2.9 to 5.3 MPG, depending on the model selected to fit the data.

Appendix 1: Exploratory Data Analysis

dim(mtcars)
[1] 32 11
head(mtcars)
summary(mtcars)
str(mtcars)
boxplot(mpg ~ am, data = mtcars, names = c("Automatic", "Manual"), xlab = "Transmission", 
    ylab = "Miles per Gallon", main = "MPG by Transmission Type")

Appendix 2-1: Regression Analysis. Approach 1

par(mfrow = c(2, 2))
plot(model)

Appendix 2-2: Regression Analysis. Approach 2

Building nested models and performing ANOVA:

mtcars$am <- as.factor(mtcars$am)
fit1 <- lm(mpg ~ am, mtcars)
fit2 <- lm(mpg ~ am + cyl, mtcars)
fit3 <- lm(mpg ~ am + cyl + disp, mtcars)
fit4 <- lm(mpg ~ am + cyl + disp + hp, mtcars)
fit5 <- lm(mpg ~ am + cyl + disp + hp + drat, mtcars)
fit6 <- lm(mpg ~ am + cyl + disp + hp + drat + wt, mtcars)
fit7 <- lm(mpg ~ am + cyl + disp + hp + drat + wt + qsec, mtcars)
fit8 <- lm(mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs, mtcars)
fit9 <- lm(mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear, mtcars)
fit10 <- lm(mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear + carb, 
    mtcars)
anova(fit1, fit2, fit3, fit4, fit5, fit6, fit7, fit8, fit9, fit10)
Analysis of Variance Table

Model  1: mpg ~ am
Model  2: mpg ~ am + cyl
Model  3: mpg ~ am + cyl + disp
Model  4: mpg ~ am + cyl + disp + hp
Model  5: mpg ~ am + cyl + disp + hp + drat
Model  6: mpg ~ am + cyl + disp + hp + drat + wt
Model  7: mpg ~ am + cyl + disp + hp + drat + wt + qsec
Model  8: mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs
Model  9: mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear
Model 10: mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear + carb
   Res.Df    RSS Df Sum of Sq       F    Pr(>F)    
1      30 720.90                                   
2      29 271.36  1    449.53 64.0039 8.231e-08 ***
3      28 252.08  1     19.28  2.7452   0.11241    
4      27 216.37  1     35.71  5.0849   0.03493 *  
5      26 214.50  1      1.87  0.2663   0.61121    
6      25 162.43  1     52.06  7.4127   0.01275 *  
7      24 149.09  1     13.34  1.8999   0.18260    
8      23 148.87  1      0.22  0.0309   0.86214    
9      22 147.90  1      0.97  0.1384   0.71365    
10     21 147.49  1      0.41  0.0579   0.81218    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Model which includes only regressors with significant impact on mpg:

model1 <- lm(mpg ~ am + cyl + hp + wt, mtcars)

Studying interaction between hp and cyl (performing ANCOVA):

model2 <- lm(mpg ~ am + cyl + hp + wt + cyl * hp, mtcars)
anova(model1, model2)
Analysis of Variance Table

Model 1: mpg ~ am + cyl + hp + wt
Model 2: mpg ~ am + cyl + hp + wt + cyl * hp
  Res.Df    RSS Df Sum of Sq     F  Pr(>F)   
1     27 170.00                              
2     26 130.44  1    39.558 7.885 0.00933 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The interaction term \(cyl*hp\) has a significant impact on the model (based on the F-statistic and the corresponding p-value), which is taken here as evidence that engine power and number of cylinders do not act independently on MPG. The simplified model can therefore include only hp. Similarly, it can be shown that wt and hp interact. The final model includes only engine power hp and transmission type am.
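The source states the result for wt and hp without showing the call. One way it might be checked, mirroring the cyl and hp comparison above, is sketched below; the names base_fit, interaction_fit and wt_hp_test are chosen here, and this is not necessarily the exact code used by the author.

```r
# Refit the four-regressor model on a copy of the data.
cars <- mtcars
cars$am <- factor(as.character(cars$am), levels = c("0", "1"))

base_fit        <- lm(mpg ~ am + cyl + hp + wt, data = cars)
# Same model plus the wt-by-hp interaction term.
interaction_fit <- lm(mpg ~ am + cyl + hp + wt + wt:hp, data = cars)

# F-test for whether the interaction term improves the fit.
wt_hp_test <- anova(base_fit, interaction_fit)
print(wt_hp_test)
```

A small value in the Pr(>F) column would support keeping the interaction in mind when deciding which of the two variables to retain.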

final.model <- lm(mpg ~ am + hp, mtcars)
summary(final.model)

Call:
lm(formula = mpg ~ am + hp, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.3843 -2.2642  0.1366  1.6968  5.8657 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 26.584914   1.425094  18.655  < 2e-16 ***
am1          5.277085   1.079541   4.888 3.46e-05 ***
hp          -0.058888   0.007857  -7.495 2.92e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.909 on 29 degrees of freedom
Multiple R-squared:  0.782, Adjusted R-squared:  0.767 
F-statistic: 52.02 on 2 and 29 DF,  p-value: 2.55e-10
par(mfrow = c(2, 2))
plot(final.model)