This report examines the mtcars data set and explores the relationship between miles per gallon (MPG) and transmission type. Specifically, it addresses two questions: 1) Is an automatic or manual transmission better for MPG? and 2) How large is the MPG difference between automatic and manual transmissions?
Results indicate that, on average, vehicles with automatic transmissions have significantly lower fuel mileage than vehicles with manual transmissions (17.1 vs. 24.4 MPG). Regression analysis shows that the MPG of a vehicle can be predicted from its weight, horsepower, number of cylinders and transmission type. Under the best-fit regression model, manual transmissions are associated with about 1.8 more MPG than automatic transmissions when weight, horsepower and number of cylinders are held constant, although this adjusted difference is not statistically significant (p = 0.21).
mtcars$cyl <- factor(mtcars$cyl)
mtcars$vs <- factor(mtcars$vs)
mtcars$gear <- factor(mtcars$gear)
mtcars$carb <- factor(mtcars$carb)
mtcars$am <- factor(mtcars$am, labels = c("Automatic", "Manual"))
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
## $ am : Factor w/ 2 levels "Automatic","Manual": 2 2 2 1 1 1 1 1 1 1 ...
## $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
## $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
boxplot(mpg ~ am, data = mtcars, col = c("green", "orange"),
ylab = "Miles per Gallon", xlab = "Transmission Type",
main = "MPG vs. Transmission Type")
#### 4. Inference: Automatic vs. Manual Transmission and MPG
aggregate(mpg~am, data = mtcars, mean)
## am mpg
## 1 Automatic 17.14737
## 2 Manual 24.39231
auto <- mtcars[mtcars$am == "Automatic",]
man <- mtcars[mtcars$am == "Manual",]
t.test(auto$mpg, man$mpg)
##
## Welch Two Sample t-test
##
## data: auto$mpg and man$mpg
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean of x mean of y
## 17.14737 24.39231
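For the given data set, manual transmission vehicles had a mean MPG of 24.4 and automatic transmission vehicles a mean MPG of 17.1. A Welch two-sample t-test shows that this 7.2 MPG difference is statistically significant (p = 0.0014; 95% confidence interval for automatic minus manual: -11.3 to -3.2 MPG).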
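As a cross-check on the test above, the Welch statistic can be rebuilt from the group means, variances and sizes. This is a minimal sketch that assumes the unmodified `datasets::mtcars` copy (where `am` is coded 0 = automatic, 1 = manual); the variable names are illustrative only.

```r
# Work from the pristine data set so the sketch does not depend on earlier recoding.
cars <- datasets::mtcars
auto_mpg <- cars$mpg[cars$am == 0]   # automatic transmissions
man_mpg  <- cars$mpg[cars$am == 1]   # manual transmissions

# Variance of each group mean (s^2 / n).
v_auto <- var(auto_mpg) / length(auto_mpg)
v_man  <- var(man_mpg)  / length(man_mpg)

# Welch t statistic and Welch-Satterthwaite degrees of freedom.
t_welch  <- (mean(auto_mpg) - mean(man_mpg)) / sqrt(v_auto + v_man)
df_welch <- (v_auto + v_man)^2 /
  (v_auto^2 / (length(auto_mpg) - 1) + v_man^2 / (length(man_mpg) - 1))

# Two-sided p-value from the t distribution.
p_welch <- 2 * pt(-abs(t_welch), df = df_welch)

print(c(t = t_welch, df = df_welch, p = p_welch))
```

These values should agree with the `t.test` output shown earlier (t = -3.7671, df = 18.332, p-value = 0.001374).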
base_model <- lm(mpg~am, data = mtcars)
summary(base_model)
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3923 -3.0923 -0.2974 3.2439 9.5077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 17.147 1.125 15.247 1.13e-15 ***
## amManual 7.245 1.764 4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
Our simple “base_model” regression indicates that transmission type alone explains only 36% of the variation in MPG (R-squared = 0.36). The large share of unexplained variation suggests that other variables need to be accounted for using multivariate linear regression. The pairs plot supports this conclusion, with correlation evident among a number of variables (see Appendix).
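With a single two-level factor as the only predictor, the regression is a re-expression of the group means: the intercept is the automatic mean and the `amManual` coefficient is the manual-minus-automatic difference. A minimal sketch of that equivalence, assuming a fresh copy of `datasets::mtcars` with `am` recoded as in this report (the names `cars` and `fit` are illustrative):

```r
cars <- datasets::mtcars
cars$am <- factor(cars$am, labels = c("Automatic", "Manual"))

fit <- lm(mpg ~ am, data = cars)
group_means <- tapply(cars$mpg, cars$am, mean)

# Intercept equals the automatic mean; slope equals the difference in means.
intercept_matches <- isTRUE(all.equal(
  unname(coef(fit)["(Intercept)"]),
  unname(group_means["Automatic"])))
slope_matches <- isTRUE(all.equal(
  unname(coef(fit)["amManual"]),
  unname(group_means["Manual"] - group_means["Automatic"])))

# R-squared: share of the total variation in MPG explained by transmission type.
r_squared <- 1 - sum(residuals(fit)^2) / sum((cars$mpg - mean(cars$mpg))^2)

print(c(intercept = intercept_matches, slope = slope_matches))
print(round(r_squared, 4))
```

The computed R-squared should match the "Multiple R-squared" line of the model summary (0.3598).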
initial_model <- lm(mpg~., data = mtcars)
best_model <- step(initial_model, direction = "both")
summary(best_model)
##
## Call:
## lm(formula = mpg ~ cyl + hp + wt + am, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9387 -1.2560 -0.4013 1.1253 5.0513
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 33.70832 2.60489 12.940 7.73e-13 ***
## cyl6 -3.03134 1.40728 -2.154 0.04068 *
## cyl8 -2.16368 2.28425 -0.947 0.35225
## hp -0.03211 0.01369 -2.345 0.02693 *
## wt -2.49683 0.88559 -2.819 0.00908 **
## amManual 1.80921 1.39630 1.296 0.20646
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.41 on 26 degrees of freedom
## Multiple R-squared: 0.8659, Adjusted R-squared: 0.8401
## F-statistic: 33.57 on 5 and 26 DF, p-value: 1.506e-10
anova(base_model, best_model)
## Analysis of Variance Table
##
## Model 1: mpg ~ am
## Model 2: mpg ~ cyl + hp + wt + am
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 30 720.90
## 2 26 151.03 4 569.87 24.527 1.688e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Using step-wise multivariate linear regression, our best regression model explains 87% of the variation in MPG (adjusted R-squared 0.84) using the variables cylinders, horsepower, weight and transmission type. In this model the transmission coefficient (1.81 MPG in favor of manual) is not statistically significant (p = 0.21). Examination of the model diagnostics (see Residuals plot in the Appendix) indicates no discernible pattern in the Residuals vs. Fitted plot, and the residuals appear approximately normally distributed (Normal Q-Q plot, Appendix) with roughly constant variance (Scale-Location plot, Appendix). Analysis of variance comparing our simple “base” and multivariate “best” models indicates that the additional variables significantly reduce the residual sum of squares (F = 24.5, p = 1.7e-08), which supports preferring the multivariate model over the transmission-only model.
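The F statistic in the analysis of variance table can be reconstructed from the two residual sums of squares. The sketch below is illustrative only: it refits both models on a fresh copy of `datasets::mtcars`, assuming the selected formula `mpg ~ cyl + hp + wt + am` with `cyl` and `am` treated as factors, and the names `small` and `large` are not from the original analysis.

```r
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))

small <- lm(mpg ~ am, data = cars)
large <- lm(mpg ~ cyl + hp + wt + am, data = cars)

rss_small <- sum(residuals(small)^2)
rss_large <- sum(residuals(large)^2)

# Number of extra parameters in the larger model.
df_extra <- df.residual(small) - df.residual(large)

# F = (drop in RSS per extra parameter) / (residual variance of the larger model).
f_stat <- ((rss_small - rss_large) / df_extra) / (rss_large / df.residual(large))
p_val  <- pf(f_stat, df_extra, df.residual(large), lower.tail = FALSE)

print(c(F = f_stat, p = p_val))
```

The result should agree with the `anova` table (F = 24.527 on 4 and 26 degrees of freedom).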
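#### 1. Summary

Under the best-fit model, manual transmission vehicles average about 1.8 MPG more than automatic transmission vehicles with the same weight, horsepower and number of cylinders. This adjusted difference is not statistically significant (p = 0.21), in contrast to the unadjusted difference of 7.2 MPG.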
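The `amManual` coefficient reported in the model summary (1.81) is a difference in MPG, and its uncertainty can be made explicit with a confidence interval. A minimal sketch, assuming the selected model is refit on a fresh copy of `datasets::mtcars` (the names `cars` and `adjusted` are illustrative):

```r
cars <- datasets::mtcars
cars$cyl <- factor(cars$cyl)
cars$am  <- factor(cars$am, labels = c("Automatic", "Manual"))

adjusted <- lm(mpg ~ cyl + hp + wt + am, data = cars)

# 95% confidence interval for the manual-vs-automatic difference in MPG,
# holding cylinders, horsepower and weight fixed.
am_ci <- confint(adjusted, "amManual", level = 0.95)
print(am_ci)
```

Because the reported p-value for `amManual` is 0.206, this interval includes zero.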
pairs(mpg~., data = mtcars)
par(mfrow = c(2,2))
plot(best_model)