Author: Russ Robbins
Affiliated Code Repository
As a driver moves from an engine with more cylinders to one with fewer cylinders, they can expect MPG to increase, and vice versa.
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.88458 2.0738436 18.267808 8.369155e-18
## x -2.87579 0.3224089 -8.919699 6.112687e-10
To predict MPG, multiply the number of cylinders by -2.88 and add the result to 37.88; that is, MPG ≈ 37.88 - 2.88 × cylinders. Each additional cylinder lowers expected MPG by about 2.88.
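A 95% confidence interval for the change in MPG that accompanies a two-cylinder increase (for example, from 6 to 8 cylinders):
## [1] -7.068474 -4.434687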
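The prediction rule and the interval printed above can be checked with a short sketch. It is written in Python rather than the R used for the analysis, and it copies the coefficients from the output above. It assumes that the interval is a 95% interval for a two-cylinder change, computed with the t distribution on 30 residual degrees of freedom (32 cars minus two estimated parameters). The critical value 2.042272 is hardcoded rather than computed.

```python
# Coefficients copied from the simple regression output above (mpg ~ cyl).
INTERCEPT = 37.88458
SLOPE = -2.87579       # estimated change in MPG per additional cylinder
SLOPE_SE = 0.3224089   # standard error of the slope
T_CRIT = 2.042272      # assumed: 97.5th percentile of t with 30 df (32 cars, 2 parameters)


def predicted_mpg(cylinders):
    """Point prediction: intercept plus slope times the number of cylinders."""
    return INTERCEPT + SLOPE * cylinders


def change_ci(delta_cyl, t_crit=T_CRIT):
    """95% confidence interval for the MPG change when cylinders change by delta_cyl."""
    lo = (SLOPE - t_crit * SLOPE_SE) * delta_cyl
    hi = (SLOPE + t_crit * SLOPE_SE) * delta_cyl
    # A negative delta_cyl flips the endpoints, so sort them.
    return (min(lo, hi), max(lo, hi))


for cyl in (4, 6, 8):
    print(cyl, round(predicted_mpg(cyl), 2))
print(change_ci(2))
```

Under those assumptions, `change_ci(2)` reproduces the printed interval to about six decimal places. That agreement is the reason for reading the interval as the effect of two extra cylinders.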
Used multiple linear regression while keeping the resulting model easily interpretable: no explanatory variables were transformed, and no interaction terms were created by combining explanatory variables.
Explored the data, including summary statistics for each variable and pairwise plots of every variable against every other variable.
Eliminated variables that do not directly explain miles per gallon, using all-subsets regression.
Considered sets of explanatory variables and their joint effect on miles per gallon, seeking the best model by reducing the explanatory variables from eight to as few as possible. Took into account the uncertainty in the competing models' estimates. Used hypothesis testing to find the model that maximizes R-squared while its coefficients remain statistically significant, both for the model as a whole (F statistic) and for each individual coefficient (t statistics). See Figure 4.
Checked, by running diagnostic procedures on every model that appeared explanatory, that the assumptions made about the explanatory variables hold. See Figures 5 through 7.
Quantified the uncertainty in the model's predictions using confidence intervals.
Documented the results of the analysis in an easy-to-understand report.
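The all-subsets step above can be illustrated with a minimal sketch of best-subsets search. It is written in Python with NumPy rather than in R, and it is not the author's code. It runs on synthetic data because the mtcars values are not reproduced here. For each subset size, it keeps the predictor set with the highest R-squared, in the spirit of R's best-subsets tools. Choosing among sizes is then left to the F and t tests described above. The predictor names a through d and the data-generating coefficients are hypothetical.

```python
from itertools import combinations

import numpy as np


def fit_r2(X, y):
    """Fit OLS with an intercept and return (R-squared, adjusted R-squared)."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    rss = float(resid @ resid)
    tss = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - rss / tss
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj


def best_subsets(X, y, names):
    """For each subset size, keep the predictor set with the highest R-squared."""
    best = {}
    p = X.shape[1]
    for size in range(1, p + 1):
        for cols in combinations(range(p), size):
            r2, adj = fit_r2(X[:, list(cols)], y)
            if size not in best or r2 > best[size][1]:
                best[size] = (tuple(names[c] for c in cols), r2, adj)
    return best


# Hypothetical data: y depends on "a" and "b" only; "c" and "d" are pure noise.
rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=n)

best = best_subsets(X, y, ["a", "b", "c", "d"])
for size, (cols, r2, adj) in sorted(best.items()):
    print(size, cols, round(r2, 4), round(adj, 4))
```

Raw R-squared never decreases as predictors are added, so the largest subset always "wins" on R-squared alone. This is why the sketch also reports adjusted R-squared, and why the selection above also relies on significance tests.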
The additional figures below show the kinds of diagnostics I ran to check whether particular relationships were actually linear. Each figure is one example of the many results I analyzed.
##
## Call:
## lm(formula = mpg ~ cyl + wt + carb, data = m)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.6692 -1.5668 -0.4254 1.2567 5.7404
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.6021 1.6823 23.541 < 2e-16 ***
## cyl -1.2898 0.4326 -2.981 0.005880 **
## wt -3.1595 0.7423 -4.256 0.000211 ***
## carb -0.4858 0.3295 -1.474 0.151536
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.517 on 28 degrees of freedom
## Multiple R-squared: 0.8425, Adjusted R-squared: 0.8256
## F-statistic: 49.91 on 3 and 28 DF, p-value: 2.322e-11
##
## Call:
## lm(formula = mpg ~ cyl + wt, data = m)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.2893 -1.5512 -0.4684 1.5743 6.1004
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 39.6863 1.7150 23.141 < 2e-16 ***
## cyl -1.5078 0.4147 -3.636 0.001064 **
## wt -3.1910 0.7569 -4.216 0.000222 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.568 on 29 degrees of freedom
## Multiple R-squared: 0.8302, Adjusted R-squared: 0.8185
## F-statistic: 70.91 on 2 and 29 DF, p-value: 6.809e-12
Both forward and backward stepwise selection were used, and the F and t statistics were considered at each step.
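The choice between the two models above can be made concrete with a partial F test for dropping carb. This Python sketch rebuilds each model's residual sum of squares from its printed residual standard error (RSS = RSE² × residual df). Because the printed RSEs are rounded, the result is only approximate. When a single term is dropped, the partial F should roughly equal the square of carb's t value (-1.474).

```python
def rss_from_rse(rse, df):
    """Residual sum of squares recovered from a residual standard error: RSE^2 * df."""
    return rse ** 2 * df


def partial_f(rss_reduced, df_reduced, rss_full, df_full):
    """F statistic for dropping (df_reduced - df_full) terms from the full model."""
    q = df_reduced - df_full
    return ((rss_reduced - rss_full) / q) / (rss_full / df_full)


rss_full = rss_from_rse(2.517, 28)     # mpg ~ cyl + wt + carb
rss_reduced = rss_from_rse(2.568, 29)  # mpg ~ cyl + wt
f_carb = partial_f(rss_reduced, 29, rss_full, 28)

# Compare the partial F with the squared t value reported for carb.
print(round(f_carb, 3), round((-1.474) ** 2, 3))
```

The result, about 2.19, is close to 1.474² ≈ 2.17. It is well below the 5% critical value of roughly 4.2 for F on 1 and 28 degrees of freedom. This is consistent with carb's p-value of 0.15 and with dropping carb from the model.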
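Is multicollinearity between cyl and wt a serious problem? Probably not; their variance inflation factors are shown below.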
## cyl wt
## 2.579312 2.579312
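The two identical numbers above are consistent with variance inflation factors for mpg ~ cyl + wt, for example as reported by `car::vif` in R. With exactly two predictors, each VIF is 1 / (1 - r²), where r is the correlation between cyl and wt. The two VIFs must therefore be equal. This Python sketch assumes that reading and inverts the formula to recover the implied correlation.

```python
import math


def vif_two_predictors(r):
    """VIF when the model has exactly two predictors correlated at r: 1 / (1 - r^2)."""
    return 1.0 / (1.0 - r ** 2)


def correlation_from_vif(vif):
    """Invert the two-predictor formula to recover |r| from a reported VIF."""
    return math.sqrt(1.0 - 1.0 / vif)


r = correlation_from_vif(2.579312)
print(round(r, 4))
```

An implied correlation of about 0.78 supports the statement that cyl and wt are highly correlated. Even so, a VIF near 2.6 sits below the rule-of-thumb cutoffs of 5 or 10 that are often used to flag severe multicollinearity.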
This diagnostic is not encouraging: wt does not appear to have a linear relationship with MPG. If we are going to use a linear model with no variable transformations (again, for ease of interpretation), we should keep cyl as the predictor of MPG. At this point, therefore, I changed the proposed model to one in which cyl predicts MPG. This is not a major loss: cyl and wt are highly correlated, so they explain much of the same variance in MPG. It also makes the resulting model (MPG as a function of cyl) very easy to interpret. The cyl-only model had large R-squared, F, and t statistics, so it is a reasonable final model, provided the other diagnostics are supportive. From this point on, I ran several other diagnostics and checked whether the variables I had excluded (qsec and hp) explained any additional variance. They did not. I ended up with a model in which cyl predicts MPG well and simply.
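The cyl-only model's R-squared and F statistic are not printed above, but they follow from its t value (-8.919699). For simple regression, F = t², and R-squared = F / (F + residual df). This Python sketch assumes 30 residual degrees of freedom (32 cars minus two estimated parameters).

```python
T_CYL = -8.919699  # t value for the cylinder slope in mpg ~ cyl
DF_RESID = 30      # assumed: 32 cars minus 2 estimated parameters

# With one predictor, the overall F statistic is the square of the slope's t value.
f_stat = T_CYL ** 2

# Rearranging F = R^2 / ((1 - R^2) / df) gives R^2 = F / (F + df).
r_squared = f_stat / (f_stat + DF_RESID)

print(round(f_stat, 2), round(r_squared, 4))
```

This gives F ≈ 79.6 on 1 and 30 degrees of freedom and R-squared ≈ 0.73 for the model in which cyl alone predicts MPG.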