Alexander Kuznetsov
06/19/2018
In this project we explore how the fuel efficiency of a car is affected by its transmission type. The dataset covers 32 passenger vehicles described by 11 characteristics and holds records of 19 cars with automatic transmission and 13 cars with manual transmission. First, we explore the dataset and test whether transmission type is associated with fuel consumption. The difference in fuel consumption, measured in miles per gallon (MPG), between cars with automatic and manual transmission is statistically significant. On average, vehicles with automatic transmission consume more fuel, that is, they have lower MPG than cars with manual transmission. Next, we determine the relationships between MPG and other car characteristics such as horsepower, weight, number of cylinders, etc. Many of these characteristics are correlated with one another, so some of them can be excluded from the regression model.
Commands such as dim, head, summary and str are used to gain a high-level overview of the dataset. Calls to these commands are shown in Appendix 1.
library(knitr)
opts_chunk$set(tidy.opts=list(width.cutoff=75),tidy=TRUE)
The mtcars dataset has 32 rows and 11 columns of numeric variables. Transmission type is stored in column am, with 0 corresponding to automatic transmission and 1 to manual transmission. A call to the table function on the am column gives the number of vehicles with each type of transmission:
table(mtcars$am)
0 1
19 13
To conclude the exploratory analysis, MPG data are visualized by transmission type using the boxplot function. Results are presented in Appendix 1.
Cars with automatic transmission clearly have lower MPG than those with manual transmission. A two-sample t-test can be used to determine whether this difference is statistically significant; R's t.test applies Welch's version by default, which does not assume equal variances in the two groups. The null hypothesis is that transmission type has no effect on mean MPG. The alternative hypothesis is that the means of the two subsets are different.
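The boxplot comparison can be complemented with a small numeric summary. The following is a minimal sketch, not part of the original analysis; the names group_n, group_mean, group_sd and summary_by_am are introduced here only for illustration.

```r
# Group sizes, means and standard deviations of mpg by transmission type.
# In mtcars, am = 0 is automatic and am = 1 is manual.
group_n    <- tapply(mtcars$mpg, mtcars$am, length)
group_mean <- tapply(mtcars$mpg, mtcars$am, mean)
group_sd   <- tapply(mtcars$mpg, mtcars$am, sd)

# Collect the results in one table (illustrative name, not from the report).
summary_by_am <- data.frame(
  transmission = c("Automatic", "Manual"),
  n            = as.vector(group_n),
  mean_mpg     = as.vector(group_mean),
  sd_mpg       = as.vector(group_sd)
)
print(summary_by_am)
```

The group sizes should match the table(mtcars$am) output above, and the group means are the same quantities that t.test reports as sample estimates.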
t.test(mtcars$mpg ~ mtcars$am)
Welch Two Sample t-test
data: mtcars$mpg by mtcars$am
t = -3.7671, df = 18.332, p-value = 0.001374
alternative hypothesis: true difference in means is not equal to 0
95 percent confidence interval:
-11.280194 -3.209684
sample estimates:
mean in group 0 mean in group 1
17.14737 24.39231
The output of t.test shows a p-value of 0.0014, well below 0.05. Therefore, the null hypothesis can be rejected: mean MPG differs significantly between vehicles with manual and automatic transmission. Average MPG for cars with automatic and manual transmissions is around 17 and 24, respectively.
A regression model helps to better understand the relationship between MPG and transmission type, since some, if not most, of the variance in fuel consumption can be explained by other characteristics such as vehicle weight or engine power. At the same time, many variables recorded in the mtcars dataset are highly correlated. For example, a car with more cylinders is also expected to have a more powerful engine, faster acceleration and larger displacement. It is also reasonable to assume that heavier vehicles require larger and more powerful engines. Therefore, we need to determine which independent variables impact MPG and understand how much of this impact can be attributed to transmission type. Next, I will use two approaches to build regression models that explain the impact of transmission type on MPG. In the first approach, the step function is used to identify the best regression model. In the second approach, nested models are used to find an appropriate linear model.
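To make the t.test output less of a black box, the statistic, degrees of freedom and p-value can be recomputed by hand. This is a sketch under the assumption that t.test used its default Welch (unequal variance) form, which the output header indicates; the variable names are illustrative.

```r
# Split mpg by transmission type (0 = automatic, 1 = manual).
auto   <- mtcars$mpg[mtcars$am == 0]
manual <- mtcars$mpg[mtcars$am == 1]

# Squared standard error of each group mean.
se2_auto   <- var(auto) / length(auto)
se2_manual <- var(manual) / length(manual)

# Welch t statistic: difference in means over the combined standard error.
t_stat <- (mean(auto) - mean(manual)) / sqrt(se2_auto + se2_manual)

# Welch-Satterthwaite approximation for the degrees of freedom.
welch_df <- (se2_auto + se2_manual)^2 /
  (se2_auto^2 / (length(auto) - 1) + se2_manual^2 / (length(manual) - 1))

# Two-sided p-value from the t distribution.
p_value <- 2 * pt(-abs(t_stat), df = welch_df)

print(c(t = t_stat, df = welch_df, p = p_value))
```

The three values should agree with the t, df and p-value lines of the t.test output above.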
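The claim that many regressors are highly correlated can be checked directly with a correlation matrix. The sketch below is an illustration rather than part of the original report; the selection of columns in engine_vars is an assumption made here to keep the output small.

```r
# Pairwise correlations among engine size, power and weight variables.
# The choice of columns is illustrative; any numeric columns can be used.
engine_vars <- c("cyl", "disp", "hp", "wt")
cor_matrix  <- round(cor(mtcars[, engine_vars]), 2)
print(cor_matrix)

# Correlation of each of these variables with mpg, for comparison.
cor_with_mpg <- round(cor(mtcars[, engine_vars], mtcars$mpg)[, 1], 2)
print(cor_with_mpg)
```

Large off-diagonal values in cor_matrix would indicate that these regressors carry overlapping information, which is the motivation for reducing the model.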
mtcars$am <- as.factor(mtcars$am)
model <- step(lm(mpg ~ ., data = mtcars), trace = 0)
summary(model)
Call:
lm(formula = mpg ~ wt + qsec + am, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-3.4811 -1.5555 -0.7257 1.4110 4.6610
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.6178 6.9596 1.382 0.177915
wt -3.9165 0.7112 -5.507 6.95e-06 ***
qsec 1.2259 0.2887 4.247 0.000216 ***
am1 2.9358 1.4109 2.081 0.046716 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.459 on 28 degrees of freedom
Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
According to the summary, weight, quarter-mile time and transmission type (wt, qsec and am) best describe MPG. The coefficient am1, corresponding to manual transmission, indicates that, with weight and quarter-mile time held constant, MPG is on average 2.9 higher for vehicles with manual transmission than for vehicles with automatic transmission (p = 0.047, only just below the 0.05 threshold). As expected, MPG decreases with the weight of the vehicle (-3.9 per 1000 lbs) and increases with qsec, the time to travel 1/4 mile (1.2 per second), so cars with slower acceleration tend to be more fuel efficient. Analysis of residuals is shown in Appendix 2-1. The distribution of residuals is close to normal (Normal Q-Q plot), with no patterns (Residuals vs Fitted) or significant leverage points.
Because of space limitations for this project report, code for this section is outlined in Appendix 2-1. The approach in this section is to build nested models, starting with the transmission variable (am) and adding variables one by one. After all 10 models are fitted, ANOVA is used to identify the variables that have the most significant impact on MPG. These variables are cyl, hp and wt, in addition to the originally selected am. In the next step, ANCOVA is used to eliminate regressors that are not independent of one another by building models with interaction terms. It can be shown that hp has significant interaction terms with cyl and wt. Thus, the final model in this approach includes only two variables, hp and am. This simplified model has residuals with a distribution close to normal (Appendix 2-2). The residuals do not appear to have any patterns in their distribution. The model suggests that, at the same engine power, MPG is 5.3 miles per gallon higher for cars with manual transmission than for cars with automatic transmission. As expected, MPG decreases with increasing engine power measured in horsepower.
The F-tests for both approaches give similar F-statistics (52.75 and 52.02), indicating that both models improve on the corresponding intercept-only models to a similar degree. The low p-values of the F-tests suggest that the null hypotheses in both approaches are to be rejected. \(R^2\), though, is higher in approach 1, where the step function was used (0.85 versus 0.78). \(R^2\) for approach 2 can be improved by adding an additional regressor. However, determining which additional independent variable to include in the model could be a separate project.
Differences in fuel efficiency between vehicles with automatic and manual transmission are statistically significant. On average, manual transmission cars travel about 7 more miles per gallon than cars with automatic transmission. However, once other variables are taken into account, transmission type accounts for a difference of 2.9 to 5.3 MPG, depending on the model selected to fit the data.
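The interpretation of the am1 coefficient can be illustrated by predicting MPG for two otherwise identical cars. This is a sketch only: the model formula is taken from the step output above, while the names cars, step_model and new_cars, and the choice of average weight and quarter-mile time, are assumptions made for the example.

```r
# Work on a copy so the original mtcars object is left untouched.
cars    <- mtcars
cars$am <- factor(cars$am)

# Refit the model selected by step() in the report.
step_model <- lm(mpg ~ wt + qsec + am, data = cars)

# Two hypothetical cars with average weight and quarter-mile time,
# differing only in transmission type.
new_cars <- data.frame(
  wt   = mean(cars$wt),
  qsec = mean(cars$qsec),
  am   = factor(c("0", "1"), levels = levels(cars$am))
)
pred <- predict(step_model, newdata = new_cars)

# The gap between the two predictions equals the am1 coefficient.
manual_gain <- unname(diff(pred))
print(manual_gain)

# 95% confidence interval for the transmission effect.
print(confint(step_model, "am1"))
```

Because the model has no interaction terms, the predicted gap does not depend on the weight and quarter-mile time chosen for the two cars.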
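The comparison of the two approaches can be put side by side in one table. The sketch below refits both models from the formulas reported in this document; the object names and the inclusion of AIC as an extra criterion are assumptions added for illustration.

```r
# Work on a copy so the original mtcars object is left untouched.
cars    <- mtcars
cars$am <- factor(cars$am)

# Approach 1: model chosen by step(); approach 2: nested-model result.
step_model   <- lm(mpg ~ wt + qsec + am, data = cars)
nested_model <- lm(mpg ~ am + hp, data = cars)

# Helper that extracts the fit statistics discussed in the text.
fit_stats <- function(m) {
  s <- summary(m)
  c(r_squared     = s$r.squared,
    adj_r_squared = s$adj.r.squared,
    f_statistic   = unname(s$fstatistic["value"]),
    aic           = AIC(m))
}

comparison <- rbind(step = fit_stats(step_model),
                    nested = fit_stats(nested_model))
print(round(comparison, 3))
```

Adjusted R-squared and AIC both penalise the number of regressors, so they give a fairer comparison than raw R-squared when the models differ in size.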
dim(mtcars)
[1] 32 11
head(mtcars)
summary(mtcars)
str(mtcars)
boxplot(mpg ~ am, data = mtcars, names = c("Automatic", "Manual"), xlab = "Transmission",
ylab = "Miles per Gallon", main = "MPG by Transmission Type")
par(mfrow = c(2, 2))
plot(model)
Building nested models and performing ANOVA:
mtcars$am <- as.factor(mtcars$am)
fit1 <- lm(mpg ~ am, mtcars)
fit2 <- lm(mpg ~ am + cyl, mtcars)
fit3 <- lm(mpg ~ am + cyl + disp, mtcars)
fit4 <- lm(mpg ~ am + cyl + disp + hp, mtcars)
fit5 <- lm(mpg ~ am + cyl + disp + hp + drat, mtcars)
fit6 <- lm(mpg ~ am + cyl + disp + hp + drat + wt, mtcars)
fit7 <- lm(mpg ~ am + cyl + disp + hp + drat + wt + qsec, mtcars)
fit8 <- lm(mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs, mtcars)
fit9 <- lm(mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear, mtcars)
fit10 <- lm(mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear + carb,
mtcars)
anova(fit1, fit2, fit3, fit4, fit5, fit6, fit7, fit8, fit9, fit10)
Analysis of Variance Table
Model 1: mpg ~ am
Model 2: mpg ~ am + cyl
Model 3: mpg ~ am + cyl + disp
Model 4: mpg ~ am + cyl + disp + hp
Model 5: mpg ~ am + cyl + disp + hp + drat
Model 6: mpg ~ am + cyl + disp + hp + drat + wt
Model 7: mpg ~ am + cyl + disp + hp + drat + wt + qsec
Model 8: mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs
Model 9: mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear
Model 10: mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear + carb
Res.Df RSS Df Sum of Sq F Pr(>F)
1 30 720.90
2 29 271.36 1 449.53 64.0039 8.231e-08 ***
3 28 252.08 1 19.28 2.7452 0.11241
4 27 216.37 1 35.71 5.0849 0.03493 *
5 26 214.50 1 1.87 0.2663 0.61121
6 25 162.43 1 52.06 7.4127 0.01275 *
7 24 149.09 1 13.34 1.8999 0.18260
8 23 148.87 1 0.22 0.0309 0.86214
9 22 147.90 1 0.97 0.1384 0.71365
10 21 147.49 1 0.41 0.0579 0.81218
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
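Each F value in the table above can be reproduced by hand, which clarifies what the sequential test measures. The sketch below recomputes the row for cyl; it relies on the fact that anova, when given several lm fits, scales every row by the residual mean square of the largest model. Variable names are illustrative.

```r
# Work on a copy so the original mtcars object is left untouched.
cars    <- mtcars
cars$am <- factor(cars$am)

fit1  <- lm(mpg ~ am, data = cars)
fit2  <- lm(mpg ~ am + cyl, data = cars)
fit10 <- lm(mpg ~ am + cyl + disp + hp + drat + wt + qsec + vs + gear + carb,
            data = cars)

# Drop in residual sum of squares when cyl is added to the am-only model.
ss_cyl <- deviance(fit1) - deviance(fit2)

# Residual mean square of the largest model, used as the common denominator.
residual_ms <- deviance(fit10) / df.residual(fit10)

# F statistic (1 numerator degree of freedom) and its p-value.
f_cyl <- (ss_cyl / 1) / residual_ms
p_cyl <- pf(f_cyl, df1 = 1, df2 = df.residual(fit10), lower.tail = FALSE)

print(c(sum_of_sq = ss_cyl, F = f_cyl, p = p_cyl))
```

Note that sequential tests of this kind depend on the order in which regressors are added, so a different ordering could flag a different set of variables.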
Model which includes only regressors with significant impact on mpg:
model1 <- lm(mpg ~ am + cyl + hp + wt, mtcars)
Studying interaction between hp and cyl (performing ANCOVA):
model2 <- lm(mpg ~ am + cyl + hp + wt + cyl * hp, mtcars)
anova(model1, model2)
Analysis of Variance Table
Model 1: mpg ~ am + cyl + hp + wt
Model 2: mpg ~ am + cyl + hp + wt + cyl * hp
Res.Df RSS Df Sum of Sq F Pr(>F)
1 27 170.00
2 26 130.44 1 39.558 7.885 0.00933 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The interaction term \(cyl*hp\) is significant (F = 7.885, p = 0.009), so the effect of engine power on MPG depends on the number of cylinders and the two regressors do not act independently. Since cyl and hp carry overlapping information, the simplified model keeps only hp. Similarly, it can be shown that wt and hp interact. The final model therefore includes only engine power hp and transmission type am.
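The report states the result for wt and hp without showing the code. A sketch of that check, assuming it mirrors the cyl and hp comparison above, could look as follows; model3 is a hypothetical name not used in the original, and no particular outcome is asserted here.

```r
# Work on a copy so the original mtcars object is left untouched.
cars    <- mtcars
cars$am <- factor(cars$am)

# Baseline model with the four regressors flagged by the sequential ANOVA.
model1 <- lm(mpg ~ am + cyl + hp + wt, data = cars)

# Same model plus an hp-by-wt interaction term (hypothetical model3).
model3 <- lm(mpg ~ am + cyl + hp + wt + hp:wt, data = cars)

# F-test for whether the interaction term improves the fit.
wt_hp_test <- anova(model1, model3)
print(wt_hp_test)

# Plain correlation between the two regressors, for reference.
print(cor(cars$hp, cars$wt))
```

A small p-value in the second row of wt_hp_test would support the statement in the text; the correlation is printed separately because an interaction and a correlation are different properties.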
final.model <- lm(mpg ~ am + hp, mtcars)
summary(final.model)
Call:
lm(formula = mpg ~ am + hp, data = mtcars)
Residuals:
Min 1Q Median 3Q Max
-4.3843 -2.2642 0.1366 1.6968 5.8657
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 26.584914 1.425094 18.655 < 2e-16 ***
am1 5.277085 1.079541 4.888 3.46e-05 ***
hp -0.058888 0.007857 -7.495 2.92e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 2.909 on 29 degrees of freedom
Multiple R-squared: 0.782, Adjusted R-squared: 0.767
F-statistic: 52.02 on 2 and 29 DF, p-value: 2.55e-10
par(mfrow = c(2, 2))
plot(final.model)