This report, prepared for Motor Trend, a magazine about the automobile industry, analyses data on a collection of cars and explores the relationship between a set of variables and miles per gallon (MPG).
The report answers two specific questions:
1. Is an automatic or manual transmission better for MPG?
2. How large is the MPG difference between automatic and manual transmissions?
The report shows qualitatively, using a box plot, that manual transmissions have higher mpg in this dataset, and then quantifies the difference using linear regression.
Load the libraries required for the analysis.
library(car)
## Loading required package: carData
Read the data from the mtcars dataset. The first few rows can be viewed with head(mtcars); they are not shown here because of space constraints.
data("mtcars")
In this data, the “am” column gives the transmission type, where 0 = automatic and 1 = manual. This is documented in the mtcars help file, available via ?mtcars.
First, we draw a box plot of mpg for the two transmission types (0 = automatic, 1 = manual). The box plot can be seen in Appendix Plot 1.
The box plot shows that transmission type 1 (manual) has higher mpg. The median mpg for am = 1 (manual) is approximately 23, whereas the median for am = 0 (automatic) is approximately 17. We use linear regression to quantify this difference.
We start with a simple linear regression of mpg on am, treated as a factor predictor.
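The 0/1 coding is easy to misread, so a labelled copy of the column can help. The following is a minimal sketch; the name `transmission` is an illustrative assumption and is not used elsewhere in the report.

```r
data("mtcars")

# Labelled copy of the am column (0 = automatic, 1 = manual, per ?mtcars).
# `transmission` is a hypothetical helper name used only for this illustration.
transmission <- factor(mtcars$am, levels = c(0, 1),
                       labels = c("Automatic", "Manual"))

# Number of cars in each transmission group
counts <- table(transmission)
print(counts)
```

The group sizes are unequal, which is worth keeping in mind when comparing the two groups.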
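The medians read off the box plot can also be checked numerically. A minimal sketch using base R only:

```r
data("mtcars")

# Median mpg within each transmission group (names are "0" and "1")
medians <- tapply(mtcars$mpg, mtcars$am, median)
print(medians)
```

The box plot's centre line is the median, so it differs slightly from the group means reported by the regression.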
fit <- lm(mpg ~ factor(am), mtcars)
summary(fit)$coef
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## factor(am)1  7.244939   1.764422  4.106127 2.850207e-04
The coefficient summary shows that the average mpg for an automatic transmission (am = 0) is 17.15, which is the intercept, and that the average mpg for a manual transmission (am = 1) is 17.15 + 7.24 = 24.39. The “am” coefficient is statistically significant, since its Pr(>|t|) value is below 0.05. This model has an R-squared of 0.3598, which means that it explains 35.98% of the variation in mpg.
We can also run a t-test to check whether the difference between the two group means is significant.
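With a single 0/1 factor predictor, the intercept is the mean of the reference group and the intercept plus the slope is the mean of the other group. A short sketch of that arithmetic; the names `fit_am`, `mean_automatic` and `mean_manual` are illustrative only.

```r
data("mtcars")

# Same model as in the text, refitted here so the block is self-contained
fit_am <- lm(mpg ~ factor(am), data = mtcars)
b <- coef(fit_am)

# Intercept = mean mpg of the reference level (am = 0, automatic)
mean_automatic <- unname(b["(Intercept)"])
# Intercept + am coefficient = mean mpg of am = 1 (manual)
mean_manual <- unname(b["(Intercept)"] + b["factor(am)1"])

# Share of the variation in mpg explained by transmission type alone
r_squared <- summary(fit_am)$r.squared

print(c(automatic = mean_automatic, manual = mean_manual, r2 = r_squared))
```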
t.test(mtcars[mtcars$am == 0,]$mpg, mtcars[mtcars$am == 1,]$mpg)$estimate
## mean of x mean of y
##  17.14737  24.39231
t.test(mtcars[mtcars$am == 0,]$mpg, mtcars[mtcars$am == 1,]$mpg)$p.value
## [1] 0.001373638
The t-test shows that the difference between the group means is significant (p = 0.0014) and gives a mean mpg of 17.15 for automatic and 24.39 for manual transmissions.
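The same Welch two-sample test can be written with the formula interface, which also makes it easy to pull out the confidence interval for the difference in means. This is a sketch of an equivalent call, not a change to the analysis; `welch` is an illustrative name.

```r
data("mtcars")

# Welch two-sample t-test of mpg by transmission type (var.equal = FALSE is the default)
welch <- t.test(mpg ~ am, data = mtcars)

# 95% confidence interval for mean(automatic) - mean(manual)
print(welch$conf.int)
print(welch$p.value)
```

Because the interval is for automatic minus manual, a manual advantage shows up as a negative interval.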
fitall <- lm(mpg ~ factor(cyl) + disp + hp + drat + wt + qsec + factor(vs) + factor(am) + factor(gear) + factor(carb), mtcars)
The summary of the linear regression with all variables can be seen in Appendix C.
The ANOVA table and VIF table in Appendix A were constructed to check whether all the variables are significant. The Pr(>F) values show that most of the variables are not significant, and the VIF table shows that several variables have high VIFs, which indicates multicollinearity. After trying different models, I settled on a model that adds the number of cylinders and weight to transmission type.
fitmany <- lm(mpg ~ factor(am) + factor(cyl) + wt, mtcars)
This model has an R-squared of 0.8375, which means it explains 83.75% of the variation in mpg. The cylinder and weight terms are significant, with Pr(>|t|) values below 0.05, but the “am” coefficient is not (Pr(>|t|) = 0.909). The model as a whole is significant (F-test p-value = 2.73e-10). In this model the automatic (am = 0) has an mpg that is 0.15 lower than the manual (am = 1), holding cylinders and weight fixed. The summary of the linear regression can be seen in Appendix C.
We draw residual plots for the first model; they can be seen in Appendix B. The residuals-versus-fitted plot shows residuals spread evenly around zero. The Normal Q-Q plot shows that the residuals are approximately normally distributed, as they lie more or less on the diagonal line. The residual plots of the other two regression models are also shown in Appendix B.
We can conclude that manual transmission is associated with better MPG in this dataset. The box plot shows this visually, and the simple linear regression with one predictor estimates the average MPG of a manual transmission to be approximately 7 higher than that of an automatic transmission. In the multiple linear regression that also includes cylinders and weight, the average MPG of a manual transmission is approximately 0.15 higher than that of an automatic transmission, and this difference is not statistically significant.
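One way to check whether each added term improves the model is a nested F-test. The sketch below is an assumption about how such a comparison could be done, not necessarily the procedure followed when the models were tried out; `m1`, `m2` and `m3` are illustrative names.

```r
data("mtcars")

# Build the model up one term at a time
m1 <- lm(mpg ~ factor(am), data = mtcars)
m2 <- update(m1, . ~ . + factor(cyl))
m3 <- update(m2, . ~ . + wt)

# Sequential F-tests: does each added term reduce the residual sum of squares?
nested <- anova(m1, m2, m3)
print(nested)
```

A small Pr(>F) on a row indicates that the term added at that step explains variation the previous model did not.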
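The visual reading of the Q-Q plot can be supplemented with a formal normality test. A minimal sketch using the Shapiro-Wilk test from base R; this is an added check under the assumption that such a test is wanted, and with only 32 observations it has limited power.

```r
data("mtcars")

# Refit the first model so the block is self-contained
fit_am <- lm(mpg ~ factor(am), data = mtcars)

# Shapiro-Wilk test on the residuals; a large p-value is consistent with normality
sw <- shapiro.test(residuals(fit_am))
print(sw)
```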
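The two estimates of the manual-versus-automatic difference can be reported with 95% confidence intervals, which makes the uncertainty around each explicit. A sketch, with `fit_am`, `fit_adj`, `ci_unadjusted` and `ci_adjusted` as illustrative names:

```r
data("mtcars")

# Transmission only, and transmission adjusted for cylinders and weight
fit_am  <- lm(mpg ~ factor(am), data = mtcars)
fit_adj <- lm(mpg ~ factor(am) + factor(cyl) + wt, data = mtcars)

# 95% confidence intervals for the manual-minus-automatic difference in mpg
ci_unadjusted <- confint(fit_am)["factor(am)1", ]
ci_adjusted   <- confint(fit_adj)["factor(am)1", ]

print(rbind(unadjusted = ci_unadjusted, adjusted = ci_adjusted))
```

An interval that contains zero means the data are compatible with no difference between the two transmission types once the other predictors are held fixed.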
Appendix Plot 1: Box plot of mpg by transmission type
boxplot(mpg ~ am, data = mtcars, col = "green", main = "Comparing mpg in 0 = Automatic and 1 = Manual", xlab = "Transmission Type", ylab = "mpg")
Appendix A: ANOVA table and VIF table of the regression with all variables.
anova(fitall)
## Analysis of Variance Table
##
## Response: mpg
##              Df Sum Sq Mean Sq F value    Pr(>F)
## factor(cyl)   2 824.78  412.39 51.3766 1.943e-07 ***
## disp          1  57.64   57.64  7.1813   0.01714 *
## hp            1  18.50   18.50  2.3050   0.14975
## drat          1  11.91   11.91  1.4843   0.24191
## wt            1  55.79   55.79  6.9500   0.01870 *
## qsec          1   1.52    1.52  0.1899   0.66918
## factor(vs)    1   0.30    0.30  0.0376   0.84878
## factor(am)    1  16.57   16.57  2.0639   0.17135
## factor(gear)  2   5.02    2.51  0.3128   0.73606
## factor(carb)  5  13.60    2.72  0.3388   0.88144
## Residuals    15 120.40    8.03
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
vif(fitall)
##                    GVIF Df GVIF^(1/(2*Df))
## factor(cyl)  128.120962  2        3.364380
## disp          60.365687  1        7.769536
## hp            28.219577  1        5.312210
## drat           6.809663  1        2.609533
## wt            23.830830  1        4.881683
## qsec          10.790189  1        3.284842
## factor(vs)     8.088166  1        2.843970
## factor(am)     9.930495  1        3.151269
## factor(gear)  50.852311  2        2.670408
## factor(carb) 503.211851  5        1.862838
ANOVA table and VIF table of the final regression model.
anova(fitmany)
## Analysis of Variance Table
##
## Response: mpg
##             Df Sum Sq Mean Sq F value    Pr(>F)
## factor(am)   1 405.15  405.15  59.787 2.576e-08 ***
## factor(cyl)  2 456.40  228.20  33.675 4.618e-08 ***
## wt           1  81.53   81.53  12.031  0.001771 **
## Residuals   27 182.97    6.78
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
vif(fitmany)
##                 GVIF Df GVIF^(1/(2*Df))
## factor(am)  1.925620  1        1.387667
## factor(cyl) 2.585745  2        1.268079
## wt          3.611208  1        1.900318
Appendix B: Residual plots of the model with the “am” variable only
par(mfrow = c(2,2))
plot(fit)
Residual plots of the model with all variables
par(mfrow = c(2,2))
plot(fitall)
## Warning: not plotting observations with leverage one:
## 30, 31
## Warning: not plotting observations with leverage one:
## 30, 31
Residual plots of the final model with many variables
par(mfrow = c(2,2))
plot(fitmany)
Appendix C: Summary of the fitted linear regressions.
Linear Regression 1
summary(fit)
##
## Call:
## lm(formula = mpg ~ factor(am), data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.3923 -3.0923 -0.2974  3.2439  9.5077
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## factor(am)1    7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
Linear Regression 2 (all variables)
summary(fitall)
##
## Call:
## lm(formula = mpg ~ factor(cyl) + disp + hp + drat + wt + qsec +
##     factor(vs) + factor(am) + factor(gear) + factor(carb), data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.5087 -1.3584 -0.0948  0.7745  4.6251
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)   23.87913   20.06582   1.190   0.2525
## factor(cyl)6  -2.64870    3.04089  -0.871   0.3975
## factor(cyl)8  -0.33616    7.15954  -0.047   0.9632
## disp           0.03555    0.03190   1.114   0.2827
## hp            -0.07051    0.03943  -1.788   0.0939 .
## drat           1.18283    2.48348   0.476   0.6407
## wt            -4.52978    2.53875  -1.784   0.0946 .
## qsec           0.36784    0.93540   0.393   0.6997
## factor(vs)1    1.93085    2.87126   0.672   0.5115
## factor(am)1    1.21212    3.21355   0.377   0.7113
## factor(gear)4  1.11435    3.79952   0.293   0.7733
## factor(gear)5  2.52840    3.73636   0.677   0.5089
## factor(carb)2 -0.97935    2.31797  -0.423   0.6787
## factor(carb)3  2.99964    4.29355   0.699   0.4955
## factor(carb)4  1.09142    4.44962   0.245   0.8096
## factor(carb)6  4.47757    6.38406   0.701   0.4938
## factor(carb)8  7.25041    8.36057   0.867   0.3995
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.833 on 15 degrees of freedom
## Multiple R-squared: 0.8931, Adjusted R-squared: 0.779
## F-statistic: 7.83 on 16 and 15 DF, p-value: 0.000124
Linear Regression 3 (final model)
summary(fitmany)
##
## Call:
## lm(formula = mpg ~ factor(am) + factor(cyl) + wt, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.4898 -1.3116 -0.5039  1.4162  5.7758
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   33.7536     2.8135  11.997  2.5e-12 ***
## factor(am)1    0.1501     1.3002   0.115  0.90895
## factor(cyl)6  -4.2573     1.4112  -3.017  0.00551 **
## factor(cyl)8  -6.0791     1.6837  -3.611  0.00123 **
## wt            -3.1496     0.9080  -3.469  0.00177 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.603 on 27 degrees of freedom
## Multiple R-squared: 0.8375, Adjusted R-squared: 0.8134
## F-statistic: 34.79 on 4 and 27 DF, p-value: 2.73e-10