This work analyses the “mtcars” dataset, mainly to answer the following questions:
Is an automatic or a manual transmission better for MPG?
How large is the MPG difference between automatic and manual transmissions?
The mtcars dataset describes 32 car models using 11 variables.
Preliminary data analysis indicates that cars with manual transmission achieve much better mpg than cars with automatic transmission. The average mpg of manual cars is 7.245 mpg higher than that of automatic cars, which is 42.25% higher. Further analysis shows that other variables, such as the weight of the car, also strongly affect fuel efficiency.
Here, we first import the dataset.
data(mtcars)
Next, we view the first few rows to get a general feel for the data.
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1
Each row corresponds to a car model and each column to one of the 11 variables. Next, we look at summary statistics for every variable.
summary(mtcars)
##       mpg             cyl             disp             hp
##  Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0
##  1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5
##  Median :19.20   Median :6.000   Median :196.3   Median :123.0
##  Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7
##  3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0
##  Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0
##       drat             wt             qsec             vs
##  Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000
##  1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000
##  Median :3.695   Median :3.325   Median :17.71   Median :0.0000
##  Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375
##  3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000
##  Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000
##        am              gear            carb
##  Min.   :0.0000   Min.   :3.000   Min.   :1.000
##  1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000
##  Median :0.0000   Median :4.000   Median :2.000
##  Mean   :0.4062   Mean   :3.688   Mean   :2.812
##  3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000
##  Max.   :1.0000   Max.   :5.000   Max.   :8.000
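Because am is coded 0 for automatic and 1 for manual, its mean in the summary (0.4062) is simply the share of manual cars in the sample. A minimal base-R check follows. The names `counts` and `share_manual` are introduced only for illustration:

```r
data(mtcars)

# am is coded 0 = automatic, 1 = manual; count the cars in each group
counts <- table(mtcars$am)
print(counts)   # 19 automatic (0) and 13 manual (1) cars

# The mean of a 0/1 variable is the proportion of 1s, i.e. the share of manual cars
share_manual <- mean(mtcars$am)
print(share_manual)   # 13 / 32 = 0.40625
```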
We first compare the average mpg of cars with automatic (am = 0) and manual (am = 1) transmission.
Avg_MPG <- aggregate(mpg ~ factor(mtcars$am,labels=c('Automatic','Manual')),
data = mtcars, mean)
Avg_MPG
## factor(mtcars$am, labels = c("Automatic", "Manual")) mpg
## 1 Automatic 17.14737
## 2 Manual 24.39231
Next, we calculate the difference between the two means and express it as a percentage of the automatic-transmission mean.
Overall_Increase <- Avg_MPG[2, 2] - Avg_MPG[1, 2]
print(Overall_Increase)
## [1] 7.244939
Percentage_Increase <- ((Avg_MPG[2, 2] - Avg_MPG[1, 2])/Avg_MPG[1, 2])*100
Percentage_Increase
## [1] 42.25103
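Written out, the percentage increase is the difference in group means divided by the automatic mean. Here $\bar{x}$ denotes the mean mpg of a group:

$$
\text{Percentage increase} = \frac{\bar{x}_{\text{manual}} - \bar{x}_{\text{automatic}}}{\bar{x}_{\text{automatic}}} \times 100 = \frac{24.392 - 17.147}{17.147} \times 100 = \frac{7.245}{17.147} \times 100 \approx 42.25\%
$$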
We can see that, in this sample, cars with manual transmission average considerably higher mpg than cars with automatic transmission.
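A two-sample t-test is one quick way to check whether a difference this large could plausibly be due to chance. This check is swapped in here and is not part of the original analysis. The sketch uses base R's `t.test()`, whose formula interface performs Welch's test (unequal variances) by default. The name `tt` is illustrative:

```r
data(mtcars)

# Welch two-sample t-test: does mean mpg differ between am = 0 (automatic)
# and am = 1 (manual)? The formula method uses var.equal = FALSE by default.
tt <- t.test(mpg ~ am, data = mtcars)

print(tt$estimate)   # group means, matching the aggregate() output above
print(tt$p.value)
# Confidence interval for mean(am = 0) - mean(am = 1), i.e. automatic minus manual
print(tt$conf.int)
```

Because the interval is for automatic minus manual, both of its limits are negative when manual cars have the higher mean.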
From the above analysis, we can hypothesize that automatic transmission is associated with lower mpg. This can be explored further by fitting a linear regression of mpg on am.
summary(lm(mpg ~ am, data = mtcars))
##
## Call:
## lm(formula = mpg ~ am, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -9.3923 -3.0923 -0.2974  3.2439  9.5077
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   17.147      1.125  15.247 1.13e-15 ***
## am             7.245      1.764   4.106 0.000285 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.902 on 30 degrees of freedom
## Multiple R-squared: 0.3598, Adjusted R-squared: 0.3385
## F-statistic: 16.86 on 1 and 30 DF, p-value: 0.000285
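With a single 0/1 predictor, the estimates above have a direct interpretation. The intercept (17.147) is the mean mpg of automatic cars, and the am slope (7.245) is the manual-minus-automatic difference computed earlier. A small base-R sketch confirms this. `fit_am` and `group_means` are illustrative names:

```r
data(mtcars)

fit_am <- lm(mpg ~ am, data = mtcars)
group_means <- tapply(mtcars$mpg, mtcars$am, mean)   # mean mpg for am = 0 and am = 1

# With a single 0/1 predictor, the intercept equals the automatic (am = 0) mean
# and the am slope equals the manual-minus-automatic difference in means.
print(coef(fit_am))
print(group_means)
print(as.numeric(group_means["1"] - group_means["0"]))
```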
The R-squared value is approximately 0.36, meaning transmission type alone explains only about 36% of the variance in mpg. This suggests that factors other than am also influence mpg.
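For a regression on a single predictor, R-squared equals the squared Pearson correlation between the response and that predictor. The sketch below checks this and computes the unexplained share. `r2` is an illustrative name:

```r
data(mtcars)

fit_am <- lm(mpg ~ am, data = mtcars)
r2 <- summary(fit_am)$r.squared
print(r2)   # about 0.36, as reported above

# With a single predictor, R-squared equals the squared Pearson correlation
# between the response and that predictor
print(cor(mtcars$mpg, mtcars$am)^2)

# Share of the variance in mpg that am alone leaves unexplained
print(1 - r2)
```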
Next, we fit a multiple regression of mpg on all the other variables in the dataset.
summary(lm(mpg~., data = mtcars))$coefficients
##                Estimate  Std. Error    t value   Pr(>|t|)
## (Intercept) 12.30337416 18.71788443  0.6573058 0.51812440
## cyl         -0.11144048  1.04502336 -0.1066392 0.91608738
## disp         0.01333524  0.01785750  0.7467585 0.46348865
## hp          -0.02148212  0.02176858 -0.9868407 0.33495531
## drat         0.78711097  1.63537307  0.4813036 0.63527790
## wt          -3.71530393  1.89441430 -1.9611887 0.06325215
## qsec         0.82104075  0.73084480  1.1234133 0.27394127
## vs           0.31776281  2.10450861  0.1509915 0.88142347
## am           2.52022689  2.05665055  1.2254035 0.23398971
## gear         0.65541302  1.49325996  0.4389142 0.66520643
## carb        -0.19941925  0.82875250 -0.2406258 0.81217871
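Variables other than am also appear to affect mpg substantially. For example, wt has a coefficient of about -3.72. Holding the other variables constant, each additional 1000 lb of weight is therefore associated with a drop of about 3.7 mpg. In other words, fuel efficiency decreases as weight increases. Note, however, that none of the coefficients in this full model is significant at the 5% level; the smallest p-value, for wt, is 0.063. This can happen when many correlated predictors are included together.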
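One way to gauge how strongly the predictors overlap is the variance inflation factor (VIF). This diagnostic is swapped in here and is not part of the original analysis. The `car` package provides a `vif()` function, but this sketch computes the same quantity by hand with base R. `vif_by_hand` is an illustrative name:

```r
data(mtcars)

# Variance inflation factor (VIF) for each predictor of the full model,
# computed by hand: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all the other predictors.
predictors <- setdiff(names(mtcars), "mpg")

vif_by_hand <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2_j <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2_j)
})

# Larger values mean the predictor is more strongly explained by the others;
# values above about 5 or 10 are often taken as a sign of problematic collinearity.
print(round(sort(vif_by_hand, decreasing = TRUE), 1))
```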
Let us refit the model using only the three variables with the smallest p-values in the full model: am, wt and qsec.
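The selection can be reproduced from the full-model output, assuming "most effective" means "smallest p-value". This is a simple heuristic rather than a formal variable-selection procedure. `top3` is an illustrative name:

```r
data(mtcars)

full_model <- lm(mpg ~ ., data = mtcars)

# p-values of the slope coefficients (drop the intercept row)
p_values <- summary(full_model)$coefficients[-1, "Pr(>|t|)"]

# The three predictors with the smallest p-values
top3 <- names(sort(p_values))[1:3]
print(top3)   # "wt" "am" "qsec"
```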
summary(lm(mpg ~ am + wt + qsec, data = mtcars))
##
## Call:
## lm(formula = mpg ~ am + wt + qsec, data = mtcars)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -3.4811 -1.5555 -0.7257  1.4110  4.6610
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   9.6178     6.9596   1.382 0.177915
## am            2.9358     1.4109   2.081 0.046716 *
## wt           -3.9165     0.7112  -5.507 6.95e-06 ***
## qsec          1.2259     0.2887   4.247 0.000216 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.459 on 28 degrees of freedom
## Multiple R-squared: 0.8497, Adjusted R-squared: 0.8336
## F-statistic: 52.75 on 3 and 28 DF, p-value: 1.21e-11
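The R-squared value is now about 0.85 (adjusted R-squared 0.83). These three variables together therefore explain about 85% of the variance in mpg, compared with about 36% for am alone.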
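From the above analysis, it can be inferred that automatic transmission is associated with lower mpg. Holding wt and qsec constant, manual cars are estimated to achieve about 2.94 mpg more than automatic cars (p = 0.047). Transmission type is one factor, but other variables such as the weight of the car and the 1/4 mile time also strongly affect mpg.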
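To attach an uncertainty range to the adjusted difference, one can ask the fitted model for a 95% confidence interval on the am coefficient. Because its p-value (0.047) is just below 0.05, the interval should exclude zero with a lower end close to it. The example cars below, with wt = 3.2 and qsec = 17.8, are hypothetical values chosen near the sample means:

```r
data(mtcars)

final_model <- lm(mpg ~ am + wt + qsec, data = mtcars)

# 95% confidence interval for the adjusted manual-minus-automatic difference
ci <- confint(final_model, level = 0.95)
print(ci["am", ])

# Same coefficient seen through predictions: two hypothetical cars that differ
# only in transmission (wt = 3.2, i.e. 3200 lb, and qsec = 17.8 s are
# illustrative values close to the sample means)
new_cars <- data.frame(am = c(0, 1), wt = 3.2, qsec = 17.8)
pred <- predict(final_model, newdata = new_cars)
print(pred)
print(as.numeric(diff(pred)))   # equals coef(final_model)["am"], about 2.94 mpg
```

Because the model is additive, the predicted gap between the two cars is the am coefficient regardless of the wt and qsec values chosen.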
It is fairly intuitive that a heavier car puts more load on the engine and therefore has lower mileage. More interesting is that slower 1/4 mile times (larger qsec) are associated with higher mpg. Holding am and wt constant, each additional second adds about 1.23 mpg. This may be related to the power (hp) of the car or to its torque (which is not included in the data), but analysing qsec together with the other variables is outside the scope of this project.
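As a first pointer only, and not a full analysis, one can check whether slower cars in this dataset also tend to have less horsepower. `r_qsec_hp` is an illustrative name:

```r
data(mtcars)

# Correlation between 1/4 mile time (qsec) and gross horsepower (hp).
# A negative value means cars with more power tend to have shorter (faster) times.
r_qsec_hp <- cor(mtcars$qsec, mtcars$hp)
print(r_qsec_hp)
```

A negative correlation would be consistent with the hp explanation but does not establish it.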
Figure 2 in the Appendix shows the residual diagnostics. In the Residuals vs Fitted plot, the points are scattered randomly around zero with no obvious pattern. This suggests that the linear model is adequate and that the residuals are independent of the fitted values. The standardized residuals lie within [-2, 2] and all Cook's distances are below 1, so no single car appears to be overly influential. The model therefore appears to fit well, provided the residuals are approximately normal. This can be checked in the Normal Q-Q plot (the second panel), where most of the points fall on the line.
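The same checks can be read off numerically instead of from the plots. The sketch below fits the final model (same terms as in the Appendix, listed in a different order). It then reports the range of the standardized residuals and the largest Cook's distance. It also runs a Shapiro-Wilk test as a formal complement to the Q-Q plot; that test is an addition, not part of the original analysis:

```r
data(mtcars)

final_model <- lm(mpg ~ am + wt + qsec, data = mtcars)

# Standardized residuals: compare their range with the [-2, 2] band
std_res <- rstandard(final_model)
print(range(std_res))

# Cook's distances: values above 1 are a common rule of thumb for influential points
cooks <- cooks.distance(final_model)
print(max(cooks))

# Shapiro-Wilk test on the residuals; a small p-value would suggest non-normality
print(shapiro.test(residuals(final_model))$p.value)
```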
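On average, cars with manual transmission achieve about 42% higher mpg than cars with automatic transmission, a difference of 7.24 mpg. After adjusting for weight and 1/4 mile time, the estimated advantage of manual transmission is about 2.94 mpg.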
# Boxplot of mpg by transmission type (am = 0 automatic, am = 1 manual)
boxplot(mpg ~ factor(am, labels = c("Automatic", "Manual")),
        data = mtcars,
        ylab = "Miles Per Gallon",
        xlab = "Transmission Type")
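# Diagnostic plots for the final model (same terms as mpg ~ am + wt + qsec)
residual_analysis <- lm(mpg ~ wt + am + qsec, data = mtcars)
par(mfrow = c(2, 2))                   # arrange the four plots in a 2 x 2 grid
plot(residual_analysis, which = 1:4)   # residuals vs fitted, normal Q-Q, scale-location, Cook's distance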