library(readxl)
mpg <- read_excel("C:/Users/Lynx/Documents/MSDA/621/mpg.xlsx")
mpg <- as.data.frame(mpg)
model <- lm(mpg ~ ., data = mpg)
summary(model)
##
## Call:
## lm(formula = mpg ~ ., data = mpg)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -18.007  -5.636  -1.242   4.758  23.192
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)    4.9698     2.0432   2.432   0.0154 *
## acceleration   1.1912     0.1292   9.217   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.101 on 396 degrees of freedom
## Multiple R-squared: 0.1766, Adjusted R-squared: 0.1746
## F-statistic: 84.96 on 1 and 396 DF, p-value: < 2.2e-16
The adjusted R-squared is 0.1746.
plot(model$fitted.values, model$residuals)
abline(h = 0)
The residuals are not evenly distributed around zero across the fitted values, so a Box-Cox transformation of the response would be beneficial.
library(MASS)
boxcox(model)
Because the maximum of the Box-Cox curve is closest to λ = 0, a log transformation will be applied.
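As a rough sketch of how the λ at the top of the curve could be read off numerically rather than by eye, the snippet below uses the built-in `mtcars` data as a stand-in (an assumption, since the spreadsheet is not reproduced here), with `qsec` playing the role of acceleration. The name `lambda_hat` is illustrative only.

```r
library(MASS)

# Stand-in data (assumption): mtcars ships with R; qsec is used in place of acceleration.
fit <- lm(mpg ~ qsec, data = mtcars)

# plotit = FALSE returns the grid of lambda values (x) and the
# profile log-likelihood at each one (y) without drawing the plot.
bc <- boxcox(fit, lambda = seq(-2, 2, by = 0.1), plotit = FALSE)

# Lambda on the grid with the highest profile log-likelihood.
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat
```

A value near 0 corresponds to the log transformation, a value near 0.5 to a square root, and a value near 1 to leaving the response untransformed.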
model2 <- lm(I(log(mpg)) ~ ., data = mpg)
summary(model2)
##
## Call:
## lm(formula = I(log(mpg)) ~ ., data = mpg)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.06515 -0.23641 -0.00943  0.23576  0.79343
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   2.24656    0.08759  25.648   <2e-16 ***
## acceleration  0.05491    0.00554   9.911   <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3044 on 396 degrees of freedom
## Multiple R-squared: 0.1987, Adjusted R-squared: 0.1967
## F-statistic: 98.23 on 1 and 396 DF, p-value: < 2.2e-16
The new adjusted R-squared is 0.1967, which is greater than 0.1746. This suggests that the model improved after applying the transformation.
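The two adjusted R-squared values are computed on different response scales (mpg versus log(mpg)), so one optional cross-check is to back-transform the log model's fitted values and compare errors in mpg units. The sketch below does this on the built-in `mtcars` data as a stand-in (an assumption), with `qsec` in place of acceleration. The helper `rmse` is a hypothetical name introduced here.

```r
# Stand-in data (assumption): mtcars, with qsec in place of acceleration.
fit_raw <- lm(mpg ~ qsec, data = mtcars)
fit_log <- lm(log(mpg) ~ qsec, data = mtcars)

# Root mean squared error, in mpg units (hypothetical helper).
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))

rmse_raw <- rmse(mtcars$mpg, fitted(fit_raw))
# exp() undoes the log; this is a simple back-transform without bias correction.
rmse_log <- rmse(mtcars$mpg, exp(fitted(fit_log)))

c(raw = rmse_raw, log = rmse_log)
```

A smaller error for the log model on the original scale would support the same conclusion as the adjusted R-squared comparison.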
plot(mpg ~ ., data = mpg)
There appears to be some slight nonlinearity in the relationship between the mpg and acceleration variables.
model3 <- lm(I(log(mpg)) ~ acceleration + I(acceleration^2), data = mpg)
summary(model3)
##
## Call:
## lm(formula = I(log(mpg)) ~ acceleration + I(acceleration^2),
##     data = mpg)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.07126 -0.22527 -0.00066  0.21838  0.77803
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)        1.023320   0.331575   3.086 0.002170 **
## acceleration       0.213095   0.041764   5.102 5.22e-07 ***
## I(acceleration^2) -0.004959   0.001298  -3.820 0.000155 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2993 on 395 degrees of freedom
## Multiple R-squared: 0.2273, Adjusted R-squared: 0.2234
## F-statistic: 58.1 on 2 and 395 DF, p-value: < 2.2e-16
model4 <- lm(I(log(mpg)) ~ acceleration + I(1/acceleration), data = mpg)
summary(model4)
##
## Call:
## lm(formula = I(log(mpg)) ~ acceleration + I(1/acceleration),
##     data = mpg)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.05749 -0.22920  0.00108  0.22127  0.76895
##
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)         4.26294    0.58605   7.274 1.89e-12 ***
## acceleration       -0.01068    0.01963  -0.544  0.58682
## I(1/acceleration) -14.99800    4.31148  -3.479  0.00056 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3002 on 395 degrees of freedom
## Multiple R-squared: 0.2226, Adjusted R-squared: 0.2186
## F-statistic: 56.54 on 2 and 395 DF, p-value: < 2.2e-16
model5 <- lm(I(log(mpg)) ~ acceleration + I(log(acceleration)), data = mpg)
summary(model5)
##
## Call:
## lm(formula = I(log(mpg)) ~ acceleration + I(log(acceleration)),
##     data = mpg)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.06111 -0.22515  0.00151  0.21794  0.77069
##
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)
## (Intercept)          -1.59298    1.04365  -1.526 0.127724
## acceleration         -0.09011    0.03966  -2.272 0.023624 *
## I(log(acceleration))  2.23400    0.60516   3.692 0.000254 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2997 on 395 degrees of freedom
## Multiple R-squared: 0.2255, Adjusted R-squared: 0.2215
## F-statistic: 57.49 on 2 and 395 DF, p-value: < 2.2e-16
model6 <- lm(I(log(mpg)) ~ acceleration + I(sqrt(acceleration)), data = mpg)
summary(model6)
##
## Call:
## lm(formula = I(log(mpg)) ~ acceleration + I(sqrt(acceleration)),
##     data = mpg)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.06339 -0.22731  0.00077  0.21655  0.77214
##
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)           -2.38791    1.23506  -1.933 0.053897 .
## acceleration          -0.24561    0.08008  -3.067 0.002310 **
## I(sqrt(acceleration))  2.36968    0.62997   3.762 0.000194 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2995 on 395 degrees of freedom
## Multiple R-squared: 0.2265, Adjusted R-squared: 0.2225
## F-statistic: 57.82 on 2 and 395 DF, p-value: < 2.2e-16
The quadratic model (adding acceleration^2 as a predictor) yields the highest adjusted R-squared of the candidate transformations, at 0.2234.
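The comparison above was done by reading each summary in turn. As a minimal sketch, the same candidates could be fitted in a loop and ranked by adjusted R-squared. The snippet uses the built-in `mtcars` data as a stand-in (an assumption), with `qsec` in place of acceleration, so its numbers will not match the output above. The names `candidates` and `adj_r2` are illustrative.

```r
# Stand-in data (assumption): mtcars, with qsec in place of acceleration.
candidates <- list(
  linear     = log(mpg) ~ qsec,
  quadratic  = log(mpg) ~ qsec + I(qsec^2),
  reciprocal = log(mpg) ~ qsec + I(1 / qsec),
  log        = log(mpg) ~ qsec + log(qsec),
  sqrt       = log(mpg) ~ qsec + sqrt(qsec)
)

# Fit each formula and pull the adjusted R-squared from its summary.
adj_r2 <- sapply(candidates, function(f) {
  summary(lm(f, data = mtcars))$adj.r.squared
})

# Highest adjusted R-squared first.
sort(adj_r2, decreasing = TRUE)
```

Keeping the formulas in a named list makes it easy to add or drop a candidate transformation without repeating the `lm()` and `summary()` calls.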
Using the “data.frame” function, create a new data frame with the following three variables:
The Box-Cox transformation of mpg
acceleration
The transformation of acceleration that yielded the best adjusted R-squared in the preceding question
mpg2 <- data.frame(boxcox = I(log(mpg$mpg)), acceleration = mpg$acceleration, accelsquared = (mpg$acceleration)^2)
mpg2_normal <- lm(boxcox ~ ., data = mpg2)
summary(mpg2_normal)
##
## Call:
## lm(formula = boxcox ~ ., data = mpg2)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -1.07126 -0.22527 -0.00066  0.21838  0.77803
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)   1.023320   0.331575   3.086 0.002170 **
## acceleration  0.213095   0.041764   5.102 5.22e-07 ***
## accelsquared -0.004959   0.001298  -3.820 0.000155 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2993 on 395 degrees of freedom
## Multiple R-squared: 0.2273, Adjusted R-squared: 0.2234
## F-statistic: 58.1 on 2 and 395 DF, p-value: < 2.2e-16
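Acceleration appears more influential than its square in predicting the Box-Cox transformation of mpg: the absolute value of its estimate (0.213095) is larger than that of the squared term (0.004959), and so is the absolute value of its t-statistic (5.102 versus 3.820). Note that acceleration and acceleration squared are measured on different scales, so the magnitudes of the raw estimates alone are not directly comparable.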
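Raw estimates depend on the units of each predictor, so one common way to put acceleration and its square on a comparable footing is to standardize every column before fitting. The sketch below illustrates this on the built-in `mtcars` data as a stand-in (an assumption), with `qsec` in place of acceleration. The names `d`, `d_std`, and `fit_std` are illustrative.

```r
# Stand-in data (assumption): mtcars, with qsec in place of acceleration.
d <- data.frame(
  boxcox       = log(mtcars$mpg),
  acceleration = mtcars$qsec,
  accelsquared = mtcars$qsec^2
)

# scale() centers each column to mean 0 and rescales it to standard deviation 1.
d_std <- as.data.frame(scale(d))

fit_std <- lm(boxcox ~ ., data = d_std)

# Standardized slopes: change in the response (in SDs) per 1 SD change in a predictor.
coef(fit_std)[-1]
```

Because every variable is centered, the intercept of the standardized fit is zero up to rounding, and the remaining coefficients can be compared by absolute size.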