Course Project Regression Models
The data analysis in this document finds that manual vehicles perform
better than automatics in terms of miles per gallon (mpg). Furthermore,
based on this study’s linear regression model, evaluated at the average
weights of automatic and manual cars, the difference is 7.24 miles per
gallon (see Figure 4 in APPENDIX Section A4), which is a significant
difference.
The Motor Trend Car Road Tests dataset covers 32 automobiles from the 1973-74 model years, and it comprises fuel consumption, horsepower, weight, and other aspects of each car. Based on it, the present publication aims to answer two questions:
If you are interested in reproducing this study, please visit the GitHub repository to access the raw document.
The adjusted data frame has 32 observations with no NA
values, divided into 6 (six) numeric variables (none of them standardized
or scaled) and 5 (five) categorical variables. For more exploratory
details, please see APPENDIX Section A1. For the variable descriptions,
please see the R Documentation website.
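As a rough sketch, the adjusted data frame could be built from the built-in `mtcars` dataset along these lines. The name `cars` and the choice of which five columns become factors (`cyl`, `vs`, `am`, `gear`, `carb`) are assumptions made for illustration; the labels for `am` are inferred from the `ammanual` coefficient name reported later in this document.

```r
# Sketch only: `cars` and the choice of factor columns are assumptions.
cars <- mtcars

# Assumed categorical variables (5); the remaining 6 columns stay numeric.
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
cars[factor_cols] <- lapply(cars[factor_cols], factor)

# Label transmission: 0 = automatic, 1 = manual (per the mtcars documentation).
levels(cars$am) <- c("automatic", "manual")

dim(cars)                    # number of observations and variables
sum(is.na(cars))             # count of missing values
table(sapply(cars, class))   # how many factor vs numeric columns
```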
Motivated by Figure 2 in APPENDIX Section A2, I tested the hypothesis that the average consumption of automatic and manual vehicles is equal, in other words \(H_0: \mu_{automatic} = \mu_{manual}\). The p-value is 0.14%, which is far below alpha (5%). For this reason, \(H_0\) was rejected, which means the mean mpg of automatic and manual vehicles differs.
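A test of this kind can be sketched with `t.test()`. The text does not say which variant was used; the version below assumes the default Welch test (unequal variances), and the variable names `tt` and `p_percent` are illustrative.

```r
# Two-sample t-test of mpg by transmission (am: 0 = automatic, 1 = manual).
# Assumption: Welch test with unequal variances, which is t.test()'s default.
tt <- t.test(mpg ~ am, data = mtcars)

tt$estimate                     # group means of mpg
p_percent <- 100 * tt$p.value   # p-value expressed as a percentage
p_percent
p_percent < 5                   # TRUE means H_0 is rejected at alpha = 5%
```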
The model selection approach used for this project is based on the Week 3 videos and the chapter "Multiple variables and model selection" from the book Regression Models for Data Science in R.
\[lm(formula = mpg \sim am, data = mtcars)\]
The baseline model is an ordinary linear regression that uses only
transmission (am) to explain the consumption in miles per gallon (mpg).
In line with the hypothesis tested in Section 3, this baseline model
shows that manual transmissions perform better than automatic ones. On
average, manual transmissions yield 24.39 miles per gallon, whereas
automatic transmissions yield 17.15 miles per gallon. The difference is
7.24 miles per gallon.
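The baseline fit can be reproduced roughly as follows. The recoding of `am` into a factor with "automatic" as the reference level is an assumption, chosen to match the `ammanual` coefficient name used in this document; `cars` and `baseline` are illustrative names.

```r
# Baseline model: mpg explained by transmission only.
# Assumption: am is a factor with "automatic" as the reference level.
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
baseline <- lm(mpg ~ am, data = cars)

coef(baseline)                     # intercept = automatic mean; ammanual = manual minus automatic
tapply(cars$mpg, cars$am, mean)    # group means, for comparison with the coefficients
summary(baseline)$adj.r.squared    # adjusted share of variance explained
```

With a single two-level factor as predictor, the intercept is the mean of the reference group and the slope is the difference between the two group means.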
Using the anova() function, it was possible to compare
several combinations of predictors, reaching the final model using am, wt,
and the interaction between am and wt.
\[lm(formula = mpg \sim am + wt + am \cdot wt, data = mtcars)\]
I have decided to use a simpler model for the sake of parsimony. The adjusted R² reaches 81.51%, and all p-values of the model are below alpha (5%).
The linear model coefficients:
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 31.416055 | 3.0201093 | 10.402291 | 0.0000000 |
| ammanual | 14.878422 | 4.2640422 | 3.489276 | 0.0016210 |
| wt | -3.785907 | 0.7856478 | -4.818836 | 0.0000455 |
| ammanual:wt | -5.298361 | 1.4446993 | -3.667449 | 0.0010171 |
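A nested-model comparison and the coefficient table above can be sketched as follows. The intermediate model `fit_wt` is an assumption (the text only mentions "several combinations"), and all object names are illustrative.

```r
# Assumption: am is a factor with "automatic" as the reference level.
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))

fit_am  <- lm(mpg ~ am, data = cars)                # baseline
fit_wt  <- lm(mpg ~ am + wt, data = cars)           # assumed intermediate step
fit_int <- lm(mpg ~ am + wt + am:wt, data = cars)   # final model with interaction

# F-tests for whether each added term significantly reduces the residual sum of squares.
anova(fit_am, fit_wt, fit_int)

coef(summary(fit_int))            # estimates, standard errors, t values, p-values
summary(fit_int)$adj.r.squared    # adjusted R-squared of the final model
```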
Because the number of observations is low (below 50), I used the Shapiro-Wilk test to check the normality of the residuals. The p-value obtained from this test was 8.72%, which is above alpha (5%), so the null hypothesis of normality cannot be rejected. This does not prove normality, but the residuals are consistent with a normal distribution.
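A minimal sketch of this check, assuming the final model is refit as `mpg ~ am * wt` on the built-in `mtcars` data (object names are illustrative):

```r
# Assumption: am is a factor with "automatic" as the reference level.
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
fit_int <- lm(mpg ~ am * wt, data = cars)   # am * wt expands to am + wt + am:wt

# Shapiro-Wilk test; the null hypothesis is that the residuals are normally distributed.
sw <- shapiro.test(residuals(fit_int))
sw$p.value
sw$p.value > 0.05   # TRUE means normality is not rejected at alpha = 5%
```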
The residual analysis is based on Figure 3 in APPENDIX Section A3. This figure aims to corroborate the following explanations:
Considering the baseline model, the am variable explains 33.85% of the
variance in miles per gallon. In the final model, however, the
percentage of variance explained rises to 81.51% with the inclusion of
two more terms (wt and the interaction of wt and am). Both figures are
adjusted R² values.
\[mpg= \begin{cases} \text{Automatic vehicle (am = 0)} \implies \beta_0 + \beta_2 \cdot wt = 31.42 -3.79 \cdot wt \\ \text{Manual vehicle (am = 1)} \implies (\beta_0 + \beta_1) + (\beta_2 + \beta_3) \cdot wt = 46.29-9.08 \cdot wt \end{cases}\]
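To connect the two fitted lines to the 7.24 mpg figure, the following sketch evaluates each line at the average weight of its own transmission group. The object names (`mean_wt`, `newdata`, `pred`) are illustrative, and the factor coding of `am` is an assumption.

```r
# Assumption: am is a factor with "automatic" as the reference level.
cars <- transform(mtcars, am = factor(am, labels = c("automatic", "manual")))
fit_int <- lm(mpg ~ am * wt, data = cars)

# Average weight (in 1000 lbs) within each transmission group.
mean_wt <- tapply(cars$wt, cars$am, mean)

# Predict mpg for an average-weight automatic and an average-weight manual car.
newdata <- data.frame(
  am = factor(names(mean_wt), levels = levels(cars$am)),
  wt = as.numeric(mean_wt)
)
pred <- predict(fit_int, newdata = newdata)
names(pred) <- names(mean_wt)

pred                                      # predicted mpg per group
pred[["manual"]] - pred[["automatic"]]    # difference at the group-average weights
```

Because the model has its own intercept and slope for each transmission type, each fitted line passes through its group's mean weight and mean mpg, so these predictions coincide with the group averages from the baseline model.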
Figure 1 – Exploratory Data Visualization: mpg vs Transmission (am).
Figure 2 – Fuel Consumption divided into Transmission and number of Gears.
Figure 3 – Residuals.
Figure 4 – On average, a manual car yields 7.24 more miles per gallon than automatics.