Is automatic or manual transmission better for MPG?
This report analyzes the mtcars data set to explore the relationship between a set of variables and miles per gallon (MPG). The data were extracted from the 1974 Motor Trend US magazine and comprise fuel consumption and 10 aspects of design and performance for 32 automobiles (1973-74 models). We use exploratory data analysis and regression models mainly to explore how automatic (am = 0) and manual (am = 1) transmissions affect MPG. A t-test shows a significant difference between cars with automatic and manual transmissions: on average, manual cars achieve about 7 MPG more than automatic cars. Among the linear regression models fitted, we select the one with the highest adjusted R-squared value. Under that model, holding weight and 1/4 mile time constant, manual cars achieve 14.079 + (-4.141) * weight more MPG on average than automatic cars, where weight is measured in units of 1000 lb. Because this difference shrinks as weight increases and eventually turns negative, we may conclude that lighter cars are better off with a manual transmission, but heavier cars are better off with an automatic one.
First, load the data set mtcars and change some variables from numeric class to factor class.
library(ggplot2)
data(mtcars)
mtcars[1:3, ] # Sample Data
##                mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4     21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag 21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710    22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
dim(mtcars)
## [1] 32 11
mtcars$cyl <- as.factor(mtcars$cyl)
mtcars$vs <- as.factor(mtcars$vs)
mtcars$am <- factor(mtcars$am)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
attach(mtcars)
## The following object is masked from package:ggplot2:
##
## mpg
Please see the Appendix: Figures section for the plots. The box plot shows that manual transmission generally yields higher MPG values. The pair graph shows relatively strong correlations among variables such as “wt”, “disp”, “cyl” and “hp”.
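The correlations read off the pair graph can also be checked numerically. The following is a minimal sketch, not part of the original analysis: it works on a fresh copy of the data (the name `cars` is hypothetical) so that `cyl` is still numeric, since the report converts it to a factor.

```r
# Fresh copy of the data; `cars` is a hypothetical name used only in this sketch.
cars <- datasets::mtcars

# The four variables singled out in the pair graph.
vars <- c("wt", "disp", "cyl", "hp")

# Pairwise Pearson correlations among the four variables.
cor_mat <- cor(cars[, vars])
print(round(cor_mat, 2))

# Correlation of each of these variables with mpg.
cor_mpg <- cor(cars[, vars], cars$mpg)
print(round(cor_mpg, 2))
```

Strong correlations among predictors mean that their individual effects on MPG are hard to separate, which is worth keeping in mind when reading the regression coefficients.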
The null hypothesis is that mean MPG does not differ between automatic and manual transmissions, i.e. both groups come from the same population (assuming that MPG is normally distributed). We test it with a two-sample t-test; by default, R's t.test applies the Welch correction for unequal variances.
result <- t.test(mpg ~ am)
result$p.value
## [1] 0.001373638
result$estimate
## mean in group 0 mean in group 1
##        17.14737        24.39231
Since the p-value is 0.00137 (< 0.05), we reject the null hypothesis at the 5% significance level. Automatic and manual cars therefore differ in mean MPG: the mean MPG of manual cars is about 7.2 higher than that of automatic cars.
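As a sketch of how the size of the difference can be read from the same test object, the snippet below recomputes the test on a fresh copy of the data (the names `cars`, `welch`, `pooled` and `mean_diff` are hypothetical). It also runs the pooled-variance version, purely as an assumption-labelled comparison, to see whether the conclusion depends on the Welch correction.

```r
# Fresh copy of the data; all object names here are illustrative.
cars <- datasets::mtcars

# Welch two-sample t-test (the default, var.equal = FALSE).
welch <- t.test(mpg ~ am, data = cars)

# Classical pooled-variance t-test, for comparison only.
pooled <- t.test(mpg ~ am, data = cars, var.equal = TRUE)

# Difference in group means: manual (am = 1) minus automatic (am = 0).
mean_diff <- unname(welch$estimate[2] - welch$estimate[1])
print(mean_diff)

# 95% confidence interval for (automatic mean - manual mean).
print(welch$conf.int)

# p-values of both versions of the test.
print(c(welch = welch$p.value, pooled = pooled$p.value))
```

Note that the confidence interval is reported for the automatic mean minus the manual mean, so an interval that lies entirely below zero corresponds to higher MPG for manual cars.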
First, we fit the full model, which uses all other variables as predictors of MPG.
fullModel <- lm(mpg ~ ., data=mtcars)
summary(fullModel) # results hidden
The full model has a residual standard error of 2.833 on 15 degrees of freedom. The adjusted R-squared value is 0.779, which means that the model explains about 78% of the variance in MPG. However, none of the coefficients is significant at the 5% significance level.
Then, we use backward stepwise selection to reduce the model. Setting k = log(n) in step() penalizes model size by the BIC rather than by the default AIC.
stepModel <- step(fullModel, k=log(nrow(mtcars)))
summary(stepModel) # results hidden
The selected model is mpg ~ wt + qsec + am. It has a residual standard error of 2.459 on 28 degrees of freedom. The adjusted R-squared value is 0.8336, which means that the model explains about 83% of the variance in MPG. All three predictor coefficients are significant at the 0.05 significance level (see Appendix: Figures for the plots).
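The selection step can be sketched in a self-contained form as follows. This is an illustration under stated assumptions rather than the report's exact code: it rebuilds the factor conversions on a fresh copy (`cars`, `full_fit` and `bic_fit` are hypothetical names), makes the backward direction explicit, and silences the step-by-step trace.

```r
# Fresh copy with the same factor conversions as in the report.
cars <- datasets::mtcars
for (v in c("cyl", "vs", "am", "gear", "carb")) {
  cars[[v]] <- factor(cars[[v]])
}

full_fit <- lm(mpg ~ ., data = cars)
n <- nrow(cars)

# Backward elimination with the BIC penalty k = log(n); trace = 0 hides the log.
bic_fit <- step(full_fit, direction = "backward", k = log(n), trace = 0)
print(formula(bic_fit))

# The criterion that step() minimised, before and after selection.
# extractAIC() returns c(equivalent degrees of freedom, criterion value).
crit_full <- extractAIC(full_fit, k = log(n))
crit_bic  <- extractAIC(bic_fit,  k = log(n))
print(rbind(full = crit_full, selected = crit_bic))
```

Because the BIC penalty grows with log(n), it tends to favor smaller models than the default AIC penalty of k = 2.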
The scatter plot suggests an interaction between the “wt” and “am” variables, since automatic cars tend to be heavier than manual cars. We therefore fit the following model, which includes the interaction term:
amIntWtModel<-lm(mpg ~ wt + qsec + am + wt:am, data=mtcars)
summary(amIntWtModel) # results hidden
This regression model has a residual standard error of 2.084 on 27 degrees of freedom. The adjusted R-squared value is 0.8804, which means that the model explains about 88% of the variance in MPG. All of the predictor coefficients are significant at the 0.05 significance level; only the intercept is not (p = 0.111).
For comparison, we also fit a simple model with MPG as the outcome variable and transmission as the only predictor.
amModel<-lm(mpg ~ am, data=mtcars)
summary(amModel) # results hidden
It shows that, on average, a car with automatic transmission achieves 17.147 MPG, and a car with manual transmission achieves 7.245 MPG more. This model has a residual standard error of 4.902 on 30 degrees of freedom. The adjusted R-squared value is 0.3385, which means that the model explains only about 34% of the total variance in MPG. The low adjusted R-squared value indicates that other variables need to be added to the model to improve its predictive power.
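With a single two-level factor as the predictor, the regression coefficients are just the group means in another form. The sketch below illustrates this on a fresh copy of the data (the names `cars`, `am_fit` and `group_means` are hypothetical).

```r
# Fresh copy with am as a factor, as in the report.
cars <- datasets::mtcars
cars$am <- factor(cars$am)

am_fit <- lm(mpg ~ am, data = cars)

# Mean MPG per transmission type ("0" = automatic, "1" = manual).
group_means <- tapply(cars$mpg, cars$am, mean)

# The intercept is the automatic mean; the am1 coefficient is the
# manual mean minus the automatic mean.
intercept <- coef(am_fit)[["(Intercept)"]]
am_effect <- coef(am_fit)[["am1"]]
print(c(intercept = intercept, automatic_mean = group_means[["0"]]))
print(c(am_effect = am_effect,
        mean_difference = group_means[["1"]] - group_means[["0"]]))
```

This is why the simple model reproduces the group means reported by the t-test.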
The final regression model.
anova(amModel, stepModel, fullModel, amIntWtModel)
confint(amIntWtModel) # results hidden
The model with the highest Adjusted R-squared value is: “mpg ~ wt + qsec + am + wt:am”.
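The adjusted R-squared values of the four candidate models can be collected side by side. The following is a minimal sketch on a fresh copy of the data; the list `fits` and its element names are hypothetical, and the stepwise model is written out by its formula rather than re-derived with step().

```r
# Fresh copy with the same factor conversions as in the report.
cars <- datasets::mtcars
for (v in c("cyl", "vs", "am", "gear", "carb")) {
  cars[[v]] <- factor(cars[[v]])
}

# The four candidate models discussed in the report.
fits <- list(
  am_only     = lm(mpg ~ am, data = cars),
  stepwise    = lm(mpg ~ wt + qsec + am, data = cars),
  full        = lm(mpg ~ ., data = cars),
  interaction = lm(mpg ~ wt + qsec + am + wt:am, data = cars)
)

# Adjusted R-squared of each model.
adj_r2 <- sapply(fits, function(f) summary(f)$adj.r.squared)
print(round(adj_r2, 4))

# Name of the model with the highest adjusted R-squared.
best <- names(which.max(adj_r2))
print(best)
```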
summary(amIntWtModel)$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept)  9.723053  5.8990407  1.648243 0.1108925394
## wt          -2.936531  0.6660253 -4.409038 0.0001488947
## qsec         1.016974  0.2520152  4.035366 0.0004030165
## am1         14.079428  3.4352512  4.098515 0.0003408693
## wt:am1      -4.141376  1.1968119 -3.460340 0.0018085763
The results show that when “wt” (weight in 1000 lb) and “qsec” (1/4 mile time) are held constant, cars with manual transmission achieve 14.079 + (-4.141) * wt more MPG on average than automatic cars. For example, a manual car that weighs 2000 lb (wt = 2) achieves 14.079 - 4.141 * 2 = 5.797 more MPG than an automatic car with the same weight and 1/4 mile time.
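The weight-dependent difference can be written as a small function of the fitted coefficients. The sketch below refits the model on a fresh copy of the data; `manual_gain` and `crossover_wt` are hypothetical names introduced for illustration. The crossover weight is simply the value of wt at which 14.079 + (-4.141) * wt equals zero.

```r
# Fresh copy with am as a factor, as in the report.
cars <- datasets::mtcars
cars$am <- factor(cars$am)

fit <- lm(mpg ~ wt + qsec + am + wt:am, data = cars)
b <- coef(fit)

# Expected MPG difference (manual minus automatic) at a given weight,
# holding qsec fixed. wt is in units of 1000 lb.
manual_gain <- function(wt) unname(b["am1"] + b["wt:am1"] * wt)

# Difference at 2000, 3000 and 4000 lb.
print(manual_gain(c(2, 3, 4)))

# Weight at which the difference is zero: -b_am1 / b_wt:am1.
crossover_wt <- unname(-b["am1"] / b["wt:am1"])
print(crossover_wt)
```

Under this model the crossover lies at roughly 3.4, i.e. about 3400 lb: below that weight the manual transmission is estimated to give higher MPG, above it the automatic one.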
According to the residual plots, we can verify the following underlying assumptions:
1. The Residuals vs. Fitted plot shows no consistent pattern, supporting the independence assumption.
2. The Normal Q-Q plot indicates that the residuals are approximately normally distributed, because the points lie close to the line.
3. The Scale-Location plot confirms the constant variance assumption, as the points are randomly distributed.
4. The Residuals vs. Leverage plot shows no outliers, as all values fall well within the 0.5 bands.
The dfbetas measure how much each observation has affected the estimate of each regression coefficient. We count the values that exceed 1 in absolute value.
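The visual checks listed above can be complemented with numerical ones. The sketch below is an addition, not part of the original analysis: it applies a Shapiro-Wilk test to the residuals and computes Cook's distance, the quantity whose contours are drawn as bands in the Residuals vs. Leverage plot. The names `cars`, `fit`, `sw` and `cd` are hypothetical.

```r
# Fresh copy with am as a factor, as in the report.
cars <- datasets::mtcars
cars$am <- factor(cars$am)

fit <- lm(mpg ~ wt + qsec + am + wt:am, data = cars)

# Shapiro-Wilk test of normality on the residuals.
# A large p-value is consistent with normality but does not prove it,
# especially with only 32 observations.
sw <- shapiro.test(residuals(fit))
print(sw$p.value)

# Cook's distance per observation; the largest values identify the
# points closest to the bands in the Residuals vs. Leverage plot.
cd <- cooks.distance(fit)
print(head(sort(cd, decreasing = TRUE), 3))
```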
sum((abs(dfbetas(amIntWtModel)))>1)
## [1] 0
No observation has an absolute dfbetas value above 1, so the analysis above appears to meet the basic assumptions of linear regression.
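To see which observation comes closest to the threshold, the dfbetas matrix can be inspected directly. This is a minimal sketch on a fresh copy of the data with hypothetical names (`infl`, `worst`). The size-adjusted cutoff 2 / sqrt(n) is a commonly used stricter alternative to 1 and is included here only as an assumption-labelled comparison.

```r
# Fresh copy with am as a factor, as in the report.
cars <- datasets::mtcars
cars$am <- factor(cars$am)

fit <- lm(mpg ~ wt + qsec + am + wt:am, data = cars)

# One row per car, one column per coefficient.
infl <- dfbetas(fit)

# Count of values above the cutoff of 1 used in the report.
n_above_1 <- sum(abs(infl) > 1)
print(n_above_1)

# Count under the stricter, size-adjusted cutoff 2 / sqrt(n).
cutoff <- 2 / sqrt(nrow(cars))
n_above_cutoff <- sum(abs(infl) > cutoff)
print(n_above_cutoff)

# Car and coefficient with the largest absolute dfbetas value.
worst <- which(abs(infl) == max(abs(infl)), arr.ind = TRUE)
print(rownames(infl)[worst[1, "row"]])
print(colnames(infl)[worst[1, "col"]])
```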
Appendix: Figures
boxplot(mpg ~ am, data=mtcars, xlab="Transmission (0 = Automatic, 1 = Manual)", ylab="MPG",
        main="Boxplot of MPG vs. Transmission")
pairs(mtcars, panel=panel.smooth, main="Pair Graph of Motor Trend Car Road Tests")
# height and width are not point aesthetics, so they are dropped from aes()
ggplot(mtcars, aes(x=wt, y=mpg, group=am, color=am)) + geom_point() +
  scale_colour_discrete(labels=c("Automatic", "Manual")) +
  xlab("Weight (1000 lb)") + ggtitle("Scatter Plot of MPG vs. Weight by Transmission")
par(mfrow = c(2, 2))
plot(amIntWtModel)