This analysis uses the mtcars dataset to identify which transmission type is better for mpg. Using linear regression, I found that cars with manual transmissions have higher mpg on average than cars with automatic transmissions: an estimated mean of 24.39 mpg versus 17.15 mpg. At the 95% confidence level, the mean for manual cars lies between about 21.62 and 27.17 mpg, compared to 14.85 to 19.44 mpg for automatic cars, and the difference between the two lies between 3.64 and 10.85 mpg. Because the dataset is small, more data is needed to validate these results.
The mtcars dataset consists of 32 observations of 11 variables. We’re only interested in two of them: mpg, the miles per US gallon, and am, the type of transmission, which is either automatic or manual.
One factor that might affect the findings of this analysis is the small number of records in this dataset. One recommendation we can already make is to gather more data to get better estimates of these measures.
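As a minimal sketch of the setup, assuming the built-in copy of mtcars that ships with R, the two variables can be pulled out and the 0/1 transmission code given readable labels. The object name `dat` is an illustrative choice, not something taken from the original analysis.

```r
# Load the built-in dataset; am is stored as 0 (automatic) / 1 (manual).
data(mtcars)

# Keep only the two variables of interest and label the transmission codes.
# `dat` is a hypothetical name used for illustration.
dat <- mtcars[, c("mpg", "am")]
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("automatic", "manual"))

dim(mtcars)     # number of observations and variables in the full dataset
table(dat$am)   # number of cars per transmission type
```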
The average fuel economy of cars with manual transmission is 24.39 mpg, which is higher than the 17.15 mpg average of cars with automatic transmission. This is evident in figure 1, a boxplot of mpg by transmission type.
Figure 2 shows similar features of the distribution: the range of mpg for cars with manual transmission is wider than for automatic cars, and it sits further to the right, which indicates higher values.
To check whether the difference between the average mpg of automatic and manual cars is statistically significant, we run a two-sample t-test.
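The group means can be reproduced with a short sketch like the one below. The boxplot call is only an assumed approximation of figure 1, since the original plotting code is not shown; the axis labels are illustrative.

```r
data(mtcars)

# Label the transmission codes (0 = automatic, 1 = manual).
am <- factor(mtcars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Mean mpg within each transmission type.
group_means <- tapply(mtcars$mpg, am, mean)
round(group_means, 2)

# Side-by-side boxplots of mpg by transmission type.
boxplot(mtcars$mpg ~ am, xlab = "Transmission", ylab = "Miles per gallon")
```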
##
## Welch Two Sample t-test
##
## data: mpg by am
## t = -3.7671, df = 18.332, p-value = 0.001374
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -11.280194 -3.209684
## sample estimates:
## mean in group automatic    mean in group manual
##                17.14737                24.39231
With a p-value of 0.001374, which is below 0.05, and a t-statistic of -3.77, we reject the null hypothesis of equal means: the average mpg of cars with automatic and manual transmissions differs significantly. The sample estimates also give us a clue about which transmission is better for mpg, since the manual group has the higher mean.
We’ll use three regression methods, simple linear, logistic, and Poisson, to model the relationship between transmission and miles per gallon. Each model has mpg as the outcome and am as the predictor. In simple linear regression, we use the actual mpg values. In logistic regression, we turn mpg into a dichotomous outcome that indicates whether a car’s mpg is higher than the mean mpg of the whole dataset. Lastly, in Poisson regression, we round mpg to the nearest whole number so that it can be treated as count data.
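A test of this kind can be run as sketched below. This assumes am has been recoded as a factor with the labels automatic and manual, which matches the group names in the output; `dat` and `tt` are illustrative names.

```r
data(mtcars)
dat <- transform(mtcars,
                 am = factor(am, levels = c(0, 1),
                             labels = c("automatic", "manual")))

# t.test() defaults to the Welch test (var.equal = FALSE), so the two
# groups are not assumed to share a common variance.
tt <- t.test(mpg ~ am, data = dat)

tt$statistic   # t statistic (automatic minus manual)
tt$p.value     # two-sided p-value
tt$conf.int    # 95% confidence interval for the difference in means
```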
##              Estimate Std. Error   t value     Pr(>|t|)
## (Intercept) 17.147368   1.124603 15.247492 1.133983e-15
## ammanual     7.244939   1.764422  4.106127 2.850207e-04
The output above shows the coefficient summary of the linear regression model that was fitted to our data. Both coefficients are statistically significant, since their p-values are below 0.05. (Intercept), our \(\beta_0\), is equal to 17.15 and is interpreted as the expected mpg when the transmission is automatic. As expected for a model with a single binary predictor, it equals the average mpg of automatic cars that we got earlier. ammanual, our \(\beta_1\), is approximately 7.24 and is interpreted as the increase in mpg on top of the intercept when the transmission is manual. The sum \(\beta_0 + \beta_1\) is approximately 24.39, which is the average mpg of cars with a manual transmission that we calculated earlier. The R-squared is low (about 0.36), which is expected: with a single dichotomous predictor, the model can only predict the two group means.
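The linear model and the quantities discussed here can be sketched as follows, again assuming am is recoded as a factor so that the coefficient is named ammanual. The object names are illustrative.

```r
data(mtcars)
dat <- transform(mtcars,
                 am = factor(am, levels = c(0, 1),
                             labels = c("automatic", "manual")))

# Simple linear regression of mpg on transmission type.
fit_lm <- lm(mpg ~ am, data = dat)
coef(summary(fit_lm))   # estimates, standard errors, t values, p-values

b <- coef(fit_lm)
mean_automatic <- unname(b["(Intercept)"])                 # beta_0
mean_manual    <- unname(b["(Intercept)"] + b["ammanual"]) # beta_0 + beta_1
r_squared      <- summary(fit_lm)$r.squared

c(automatic = mean_automatic, manual = mean_manual, r.squared = r_squared)
```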
##              Estimate Std. Error   z value    Pr(>|z|)
## (Intercept) -1.321756  0.5627312 -2.348823 0.018832869
## ammanual     2.525729  0.8660249  2.916462 0.003540258
The output above shows the coefficient summary of the logistic regression we fitted. Note that our response variable has been transformed into a dichotomous outcome by checking whether each car’s mpg is higher than the mean mpg of the entire dataset, which is 20.09. Both coefficients are significant. (Intercept), our \(\beta_0\) with a value of -1.32, is the log odds of having an mpg above the mean when the transmission is automatic. Exponentiating \(\beta_0\) gives the actual odds, which are approximately 0.27. ammanual, our \(\beta_1\) with an approximate value of 2.53, is the log odds ratio of having an mpg above the mean for manual versus automatic cars. Exponentiating it gives the actual odds ratio, which is approximately 12.50. In other words, the odds that a car’s mpg exceeds the dataset mean are about 12.5 times higher for a manual transmission than for an automatic one.
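A sketch of this logistic model, under the assumption that the outcome was built by comparing each mpg to the overall mean. The column name `high_mpg` is hypothetical.

```r
data(mtcars)
dat <- transform(mtcars,
                 am = factor(am, levels = c(0, 1),
                             labels = c("automatic", "manual")))

# Dichotomous outcome: 1 if the car's mpg is above the overall mean, else 0.
# `high_mpg` is an illustrative name.
dat$high_mpg <- as.integer(dat$mpg > mean(dat$mpg))

fit_logit <- glm(high_mpg ~ am, data = dat, family = binomial)
coef(summary(fit_logit))   # coefficients on the log-odds scale

# Exponentiate to get the odds (intercept) and the odds ratio (ammanual).
odds <- exp(coef(fit_logit))
odds
```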
##              Estimate Std. Error   z value     Pr(>|z|)
## (Intercept) 2.8363045 0.05555555 51.053483 0.000000e+00
## ammanual    0.3544883 0.07906312  4.483612 7.339012e-06
The numbers above are the Poisson regression results. Note that the response variable has been rounded to the nearest whole number so that it can be treated as count data. The p-values indicate that both coefficients are statistically significant. (Intercept), our \(\beta_0\), is the log of the estimated mean mpg when the transmission is automatic. Exponentiating it gives 17.05, which is fairly close to the mean we got earlier and to the \(\beta_0\) of our first model. ammanual, our \(\beta_1\), is the log of the ratio between the estimated mean mpg of manual and automatic cars; exponentiating it gives about 1.43, that is, an estimated mean roughly 43% higher for manual cars. \(\exp(\beta_0 + \beta_1)\) gives 24.31, which is very close to the mean mpg of cars with manual transmission and to the estimated \(\beta_0 + \beta_1\) of our first model.
After presenting these methods, I would still use linear regression for this problem for one obvious reason: the outcome we’re working with is continuous numerical data. Logistic and Poisson regression are meant for dichotomous and count data respectively. Although we managed to fit these models to our data, they come with restrictions. For instance, in logistic regression, the dichotomous outcome would have to be recomputed for a larger dataset, since the mean mpg used as the cutoff may change. Poisson regression may work on this simple regression problem, but extending it may cause issues since the model is intended for count data.
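The Poisson fit can be sketched in the same way, assuming the rounding was done with `round()`. The column name `mpg_count` is hypothetical.

```r
data(mtcars)
dat <- transform(mtcars,
                 am = factor(am, levels = c(0, 1),
                             labels = c("automatic", "manual")))

# Treat mpg as a count by rounding it to the nearest whole number.
# `mpg_count` is an illustrative name.
dat$mpg_count <- round(dat$mpg)

fit_pois <- glm(mpg_count ~ am, data = dat, family = poisson)
coef(summary(fit_pois))   # coefficients on the log scale

bp <- coef(fit_pois)
est_automatic <- exp(bp[["(Intercept)"]])                     # estimated mean, automatic
est_manual    <- exp(bp[["(Intercept)"]] + bp[["ammanual"]])  # estimated mean, manual
mean_ratio    <- exp(bp[["ammanual"]])                        # manual / automatic

c(automatic = est_automatic, manual = est_manual, ratio = mean_ratio)
```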
Figure 3 displays the residual plots of the selected model. The scatter plot of the residuals doesn’t show any observable pattern. The histogram of the residuals is also presented to show their distribution, which is not far from a normal distribution.
In this analysis, we aimed to identify which type of transmission gives better mpg and to quantify that difference. First, we saw during our exploratory analysis that there is a significant difference in mean mpg between automatic and manual transmissions. We then fit three models, simple linear, logistic, and Poisson regression, to our data. The coefficients we obtained are all statistically significant, but we chose linear regression because of the type of data that we’re modelling.
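Residual plots of this kind can be produced with a sketch like the following. The panel layout and axis labels are assumptions and may not match figure 3 exactly.

```r
data(mtcars)
dat <- transform(mtcars,
                 am = factor(am, levels = c(0, 1),
                             labels = c("automatic", "manual")))
fit_lm <- lm(mpg ~ am, data = dat)
res <- resid(fit_lm)

op <- par(mfrow = c(1, 2))   # two panels side by side

# Residuals against fitted values: with one binary predictor there are only
# two fitted values, so the points fall in two vertical strips.
plot(fitted(fit_lm), res, xlab = "Fitted mpg", ylab = "Residual")
abline(h = 0, lty = 2)

# Histogram of the residuals to judge how close they are to normal.
hist(res, main = "", xlab = "Residual")

par(op)                      # restore the previous plotting layout
```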
Based on the linear regression coefficients, the estimated mean mpg of a car with automatic transmission is 17.15, while that of a car with manual transmission is 24.39. At the 95% confidence level, the mean mpg of cars with automatic transmission ranges from 14.85 to 19.44, while the mean mpg of cars with manual transmission ranges from about 21.62 to 27.17. This tells us that manual transmission gives better mpg, with a difference of 7.24 mpg on average and a 95% confidence interval for that difference of 3.64 to 10.85.
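A hedged sketch of how intervals like these can be computed from the linear model: `confint()` gives intervals for the coefficients (the automatic mean and the manual-minus-automatic difference), while `predict()` with `interval = "confidence"` gives an interval for each group mean. An interval for the manual mean comes from `predict()`, not from adding the endpoints of the two coefficient intervals. The object names are illustrative.

```r
data(mtcars)
dat <- transform(mtcars,
                 am = factor(am, levels = c(0, 1),
                             labels = c("automatic", "manual")))
fit_lm <- lm(mpg ~ am, data = dat)

# 95% intervals for beta_0 (automatic mean) and beta_1 (the difference).
ci_coef <- confint(fit_lm, level = 0.95)
ci_coef

# 95% intervals for the mean mpg of each transmission type.
new_cars <- data.frame(am = factor(levels(dat$am), levels = levels(dat$am)))
ci_means <- predict(fit_lm, newdata = new_cars,
                    interval = "confidence", level = 0.95)
rownames(ci_means) <- levels(dat$am)
ci_means
```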
Although the model that we fit to the data is statistically significant, mpg can be affected by many factors. An article on the www.fueleconomy.gov website describes other factors that affect mpg. So, the model that we build should also include other relevant variables, not just transmission.
The second point, which I’ve already mentioned, is that we had only a small amount of data. The mean mpg per transmission type may change with a larger sample. So, we should also consider gathering more data.