library(faraway)
library(ggplot2)
lm <- lm(O3 ~ temp + humidity + ibh, ozone)
summary(lm)
##
## Call:
## lm(formula = O3 ~ temp + humidity + ibh, data = ozone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -11.5291 -3.0137 -0.2249 2.8239 13.9303
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.049e+01 1.616e+00 -6.492 3.16e-10 ***
## temp 3.296e-01 2.109e-02 15.626 < 2e-16 ***
## humidity 7.738e-02 1.339e-02 5.777 1.77e-08 ***
## ibh -1.004e-03 1.639e-04 -6.130 2.54e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 4.524 on 326 degrees of freedom
## Multiple R-squared: 0.684, Adjusted R-squared: 0.6811
## F-statistic: 235.2 on 3 and 326 DF, p-value: < 2.2e-16
summary(ozone$O3)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 5.00 10.00 11.78 17.00 38.00
sd(ozone$O3)
## [1] 8.011277
plot(ozone$O3)
hist(ozone$O3)
O3 has a mean of 11.78 and a standard deviation of 8.01. It is an integer-valued variable ranging over [1, 38].
summary(ozone$temp)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 25.00 51.00 62.00 61.75 72.00 93.00
sd(ozone$temp)
## [1] 14.45874
plot(ozone$temp)
hist(ozone$temp)
temp has a mean of 61.75 and a standard deviation of 14.46. It is an integer-valued variable ranging over [25, 93].
summary(ozone$humidity)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 19.00 47.00 64.00 58.13 73.00 93.00
sd(ozone$humidity)
## [1] 19.865
plot(ozone$humidity)
hist(ozone$humidity)
humidity has a mean of 58.13 and a standard deviation of 19.865. It is an integer-valued variable ranging over [19, 93].
summary(ozone$ibh)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 111.0 877.5 2112.5 2572.9 5000.0 5000.0
sd(ozone$ibh)
## [1] 1803.886
plot(ozone$ibh)
hist(ozone$ibh)
ibh has a mean of 2572.9 and a standard deviation of 1803.886. It is an integer-valued variable ranging over [111, 5000].
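The four per-variable summaries above repeat the same steps. A minimal sketch that collects them into one table, assuming the faraway package is installed; the helper name describe_var is hypothetical and not part of any package:

```r
library(faraway)  # provides the ozone data frame

# Hypothetical helper: mean, SD, and range of one numeric vector
describe_var <- function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}

vars <- c("O3", "temp", "humidity", "ibh")

# sapply() returns the statistics in rows; transpose to get one row per variable
desc <- t(sapply(ozone[vars], describe_var))
round(desc, 2)
```

Keeping the statistics in one matrix makes it easier to compare the scales of the predictors, which matters later when the predictors are standardized.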
plot(fitted(lm), residuals(lm), xlab="Fitted", ylab="Residuals")
abline(h=0)
The plot suggests some nonlinearity, which calls for a change in the structural form of the model, such as a transformation of the response.
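Curvature in a residual plot can be easier to judge with a smooth trend drawn through the points. The sketch below refits the same model under the illustrative name fit and overlays a lowess curve; the choice of lowess as the smoother is an assumption, not something the analysis above used:

```r
library(faraway)

# Same model as above; `fit` is an illustrative name
fit <- lm(O3 ~ temp + humidity + ibh, data = ozone)

plot(fitted(fit), residuals(fit), xlab = "Fitted", ylab = "Residuals")
abline(h = 0)

# Lowess smooth of residuals against fitted values;
# a clearly curved line points to nonlinearity in the mean structure
smooth <- lowess(fitted(fit), residuals(fit))
lines(smooth, col = "red")
```

If the red line stays close to the horizontal zero line, the linear structure is probably adequate; a pronounced bend supports transforming the response or adding nonlinear terms.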
library(MASS)
boxcox(lm)
boxcox(lm, lambda=seq(0,0.5,by=0.1))
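The Box-Cox plots are read by eye, but the same information can be extracted numerically. A sketch, assuming a grid step of 0.01 on the same interval [0, 0.5] (the step and the names fit, bc, lambda_hat, and lambda_ci are illustrative):

```r
library(faraway)
library(MASS)

fit <- lm(O3 ~ temp + humidity + ibh, data = ozone)

# plotit = FALSE returns the lambda grid (x) and the profile log-likelihood (y)
bc <- boxcox(fit, lambda = seq(0, 0.5, by = 0.01), plotit = FALSE)

# Grid value with the highest profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum (may be cut off by the grid limits)
in_ci <- bc$y > max(bc$y) - qchisq(0.95, df = 1) / 2
lambda_ci <- range(bc$x[in_ci])

lambda_hat
lambda_ci
```

A convenient power inside the interval is usually preferred over the exact maximizer, since it gives a model that is easier to describe.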
lm_n <- lm((O3) ^ (0.3) ~ temp + humidity + ibh, ozone)
summary(lm_n)
##
## Call:
## lm(formula = (O3)^(0.3) ~ temp + humidity + ibh, data = ozone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.70821 -0.14410 0.01145 0.16554 0.66129
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 8.338e-01 8.288e-02 10.061 < 2e-16 ***
## temp 1.762e-02 1.082e-03 16.287 < 2e-16 ***
## humidity 4.044e-03 6.867e-04 5.888 9.71e-09 ***
## ibh -6.456e-05 8.402e-06 -7.685 1.82e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.232 on 326 degrees of freedom
## Multiple R-squared: 0.7168, Adjusted R-squared: 0.7142
## F-statistic: 275 on 3 and 326 DF, p-value: < 2.2e-16
plot(fitted(lm_n),residuals(lm_n),xlab="Fitted",ylab="Residuals")
abline(h=0)
The residuals now appear to have roughly constant variance across the fitted values.
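The visual impression of constant variance can be backed by a rough numerical check. One common informal approach, sketched here under the illustrative names fit_t and spread, regresses the square root of the absolute residuals on the fitted values:

```r
library(faraway)

# Transformed-response model, refitted under an illustrative name
fit_t <- lm(O3^0.3 ~ temp + humidity + ibh, data = ozone)

# Regress sqrt(|residual|) on the fitted values;
# a slope near zero is consistent with constant variance
spread <- lm(sqrt(abs(residuals(fit_t))) ~ fitted(fit_t))
summary(spread)$coefficients
```

This is a diagnostic aid rather than a formal test, so the slope and its p-value should be read alongside the residual plot.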
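The adjusted R-squared of the transformed model (0.7142) is larger than that of the original model (0.6811), which suggests that the new model fits better. Because the two models have responses on different scales, the two values are not strictly comparable, so the improved residual plot is the stronger evidence.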
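One way to put the two models on the same footing is to back-transform the fitted values of the power model and compare both sets of predictions with the observed O3. The sketch below uses the squared correlation as the comparison measure; that choice, and the names fit_raw, fit_pow, pred_raw, and pred_pow, are assumptions made for illustration:

```r
library(faraway)

fit_raw <- lm(O3 ~ temp + humidity + ibh, data = ozone)
fit_pow <- lm(O3^0.3 ~ temp + humidity + ibh, data = ozone)

pred_raw <- fitted(fit_raw)
# Back-transform to the O3 scale (negative fitted values would give NaN)
pred_pow <- fitted(fit_pow)^(1 / 0.3)

# Squared correlation between observed O3 and each model's predictions
r2_raw <- cor(ozone$O3, pred_raw)^2
r2_pow <- cor(ozone$O3, pred_pow)^2
c(raw = r2_raw, power = r2_pow)
```

For the untransformed model this quantity equals the usual multiple R-squared, so the comparison stays anchored to the value reported in its summary.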
scozone <- data.frame(O = ozone$O3, scale(ozone))
lmod <- lm((O)^(0.3) ~ temp + humidity + ibh, scozone)
summary(lmod)
##
## Call:
## lm(formula = (O)^(0.3) ~ temp + humidity + ibh, data = scozone)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.70821 -0.14410 0.01145 0.16554 0.66129
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.99060 0.01277 155.886 < 2e-16 ***
## temp 0.25470 0.01564 16.287 < 2e-16 ***
## humidity 0.08032 0.01364 5.888 9.71e-09 ***
## ibh -0.11647 0.01516 -7.685 1.82e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.232 on 326 degrees of freedom
## Multiple R-squared: 0.7168, Adjusted R-squared: 0.7142
## F-statistic: 275 on 3 and 326 DF, p-value: < 2.2e-16
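Standardizing the predictors changes the coefficient estimates and their standard errors. The intercept changes as well, because with centered predictors it equals the mean of the transformed response. The t statistics and p-values of the predictors, the residual standard error, and R-squared are unchanged.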
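The relationship between the two sets of coefficients can be checked directly: each standardized slope should equal the raw slope multiplied by that predictor's standard deviation. A sketch that standardizes only the three predictors (a simplification of the scale(ozone) call above; the names fit_pow, fit_sc, sc, and sds are illustrative):

```r
library(faraway)

fit_pow <- lm(O3^0.3 ~ temp + humidity + ibh, data = ozone)

preds <- c("temp", "humidity", "ibh")

# Standardize only the predictors; keep the response on its original scale
sc <- data.frame(O3 = ozone$O3, scale(ozone[preds]))
fit_sc <- lm(O3^0.3 ~ temp + humidity + ibh, data = sc)

# Each standardized slope should match the raw slope times the predictor's SD
sds <- sapply(ozone[preds], sd)
cbind(raw_times_sd = coef(fit_pow)[preds] * sds,
      scaled       = coef(fit_sc)[preds])

# t statistics of the predictors are unchanged by the rescaling
cbind(raw    = summary(fit_pow)$coefficients[preds, "t value"],
      scaled = summary(fit_sc)$coefficients[preds, "t value"])
```

Because the standardized slopes are all in units of "change in the response per one standard deviation of the predictor", they can be compared with one another, which the raw slopes do not allow.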