Exercise 9.3

Using the ozone data, fit a model with O3 as the response and temp, humidity and ibh as predictors. Use the Box–Cox method to determine the best transformation on the response.

library(faraway)
library(ggplot2)
lm <- lm(O3 ~ temp + humidity + ibh, ozone)
summary(lm)
## 
## Call:
## lm(formula = O3 ~ temp + humidity + ibh, data = ozone)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.5291  -3.0137  -0.2249   2.8239  13.9303 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.049e+01  1.616e+00  -6.492 3.16e-10 ***
## temp         3.296e-01  2.109e-02  15.626  < 2e-16 ***
## humidity     7.738e-02  1.339e-02   5.777 1.77e-08 ***
## ibh         -1.004e-03  1.639e-04  -6.130 2.54e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.524 on 326 degrees of freedom
## Multiple R-squared:  0.684,  Adjusted R-squared:  0.6811 
## F-statistic: 235.2 on 3 and 326 DF,  p-value: < 2.2e-16

1. Univariate analysis. Present statistical summaries and graphical displays of each variable. Write a couple of sentences that summarize the variables.

summary(ozone$O3)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    5.00   10.00   11.78   17.00   38.00
sd(ozone$O3)
## [1] 8.011277
plot(ozone$O3)

hist(ozone$O3)

O3 has a mean of 11.78 and a standard deviation of 8.01. It is integer-valued and ranges from 1 to 38; the mean exceeds the median of 10, which suggests a right-skewed distribution.

summary(ozone$temp)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   25.00   51.00   62.00   61.75   72.00   93.00
sd(ozone$temp)
## [1] 14.45874
plot(ozone$temp)

hist(ozone$temp)

temp has a mean of 61.75 and a standard deviation of 14.46. It is integer-valued and ranges from 25 to 93; the mean and the median (62) nearly coincide, so the distribution looks roughly symmetric.

summary(ozone$humidity)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   19.00   47.00   64.00   58.13   73.00   93.00
sd(ozone$humidity)
## [1] 19.865
plot(ozone$humidity)

hist(ozone$humidity)

humidity has a mean of 58.13 and a standard deviation of 19.865. It is integer-valued and ranges from 19 to 93; the mean lies below the median of 64, which suggests a left-skewed distribution.

summary(ozone$ibh)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   111.0   877.5  2112.5  2572.9  5000.0  5000.0
sd(ozone$ibh)
## [1] 1803.886
plot(ozone$ibh)

hist(ozone$ibh)

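ibh has a mean of 2572.9 and a standard deviation of 1803.9. It is integer-valued and ranges from 111 to 5000. The third quartile equals the maximum, so at least a quarter of the observations sit at the upper limit of 5000.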
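
The four summaries above can also be collected into a single table, which makes the difference in scale between ibh and the other variables easy to see. This is a minimal sketch; the names `vars` and `tab` are illustrative and do not appear in the original solution.

```r
library(faraway)  # provides the ozone data

vars <- c("O3", "temp", "humidity", "ibh")

# One row per variable: mean, standard deviation, minimum and maximum
tab <- t(sapply(ozone[vars], function(x) {
  c(mean = mean(x), sd = sd(x), min = min(x), max = max(x))
}))

round(tab, 2)
```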

2. Examine residuals vs. fitted values for a model without any transformation. Write a sentence or two about what you see.

plot(fitted(lm), residuals(lm), xlab="Fitted", ylab="Residuals")
abline(h=0)

The plot suggests some nonlinearity, which motivates changing the structural form of the model, for example by transforming the response.
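
A smooth curve drawn through the residuals can make any curvature easier to judge than the raw scatter. The sketch below refits the same model under the illustrative name `fit` and overlays a lowess smooth; the smoother and its default settings are an addition here, not part of the original analysis.

```r
library(faraway)  # provides the ozone data

fit <- lm(O3 ~ temp + humidity + ibh, data = ozone)

plot(fitted(fit), residuals(fit), xlab = "Fitted", ylab = "Residuals")
abline(h = 0)

# Lowess smooth of the residuals; a clearly curved line hints at nonlinearity
sm <- lowess(fitted(fit), residuals(fit))
lines(sm, col = "red")
```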

3.Conduct a Box-Cox assessment of the model and estimate a model based on the results of the assessment. Present the model as an equation.

library(MASS)
boxcox(lm)

boxcox(lm, lambda=seq(0,0.5,by=0.1))
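
Rather than reading the transformation parameter off the plot, the grid returned by `boxcox()` can be inspected directly. The sketch below assumes a finer grid than the one plotted above and uses illustrative names (`fit`, `bc`, `lambda_hat`, `ci`). The interval is the usual likelihood-based approximation and is cut off at the grid limits if it extends beyond them.

```r
library(faraway)  # provides the ozone data
library(MASS)     # provides boxcox()

fit <- lm(O3 ~ temp + humidity + ibh, data = ozone)

# Evaluate the profile log-likelihood on a fine grid without drawing the plot
bc <- boxcox(fit, lambda = seq(0, 0.5, by = 0.01), plotit = FALSE)

# Grid value of lambda that maximises the log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]

# Approximate 95% interval: lambdas whose log-likelihood lies within
# qchisq(0.95, 1) / 2 of the maximum
ci <- range(bc$x[bc$y > max(bc$y) - qchisq(0.95, 1) / 2])

lambda_hat
ci
```

A rounded value inside this interval is normally preferred over the exact maximiser because it is easier to interpret.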

# Refit with the response raised to the power 0.3, the value read from the zoomed Box-Cox plot
lm_n <- lm((O3)^(0.3) ~ temp + humidity + ibh, ozone)
summary(lm_n)
## 
## Call:
## lm(formula = (O3)^(0.3) ~ temp + humidity + ibh, data = ozone)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.70821 -0.14410  0.01145  0.16554  0.66129 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.338e-01  8.288e-02  10.061  < 2e-16 ***
## temp         1.762e-02  1.082e-03  16.287  < 2e-16 ***
## humidity     4.044e-03  6.867e-04   5.888 9.71e-09 ***
## ibh         -6.456e-05  8.402e-06  -7.685 1.82e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.232 on 326 degrees of freedom
## Multiple R-squared:  0.7168, Adjusted R-squared:  0.7142 
## F-statistic:   275 on 3 and 326 DF,  p-value: < 2.2e-16
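
Item 3 asks for the model as an equation. Taking the coefficient estimates from the summary above, rounded as printed, the fitted model can be written as:

```latex
\widehat{\mathrm{O3}^{0.3}} = 0.8338 + 0.01762\,\mathrm{temp} + 0.004044\,\mathrm{humidity} - 0.00006456\,\mathrm{ibh}
```

A prediction on the original ozone scale is obtained by raising the right-hand side to the power 1/0.3. This back-transformed value is not an unbiased estimate of the mean of O3, so it should be read as an approximation.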

4.Examine residuals vs. fitted values for a model where the outcome O3 has been transformed according to the Box-Cox analysis results. Write a sentence or two about what you see.

plot(fitted(lm_n),residuals(lm_n),xlab="Fitted",ylab="Residuals")
abline(h=0)

The residuals now appear to have roughly constant variance across the range of fitted values.

5. Compare the adjusted R-squared values of the two models.

summary(lm)
## 
## Call:
## lm(formula = O3 ~ temp + humidity + ibh, data = ozone)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -11.5291  -3.0137  -0.2249   2.8239  13.9303 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.049e+01  1.616e+00  -6.492 3.16e-10 ***
## temp         3.296e-01  2.109e-02  15.626  < 2e-16 ***
## humidity     7.738e-02  1.339e-02   5.777 1.77e-08 ***
## ibh         -1.004e-03  1.639e-04  -6.130 2.54e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.524 on 326 degrees of freedom
## Multiple R-squared:  0.684,  Adjusted R-squared:  0.6811 
## F-statistic: 235.2 on 3 and 326 DF,  p-value: < 2.2e-16
summary(lm_n)
## 
## Call:
## lm(formula = (O3)^(0.3) ~ temp + humidity + ibh, data = ozone)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.70821 -0.14410  0.01145  0.16554  0.66129 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  8.338e-01  8.288e-02  10.061  < 2e-16 ***
## temp         1.762e-02  1.082e-03  16.287  < 2e-16 ***
## humidity     4.044e-03  6.867e-04   5.888 9.71e-09 ***
## ibh         -6.456e-05  8.402e-06  -7.685 1.82e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.232 on 326 degrees of freedom
## Multiple R-squared:  0.7168, Adjusted R-squared:  0.7142 
## F-statistic:   275 on 3 and 326 DF,  p-value: < 2.2e-16

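The adjusted R-squared of the transformed model (0.7142) is larger than that of the original model (0.6811). Because the two responses are on different scales, the values are not strictly comparable, so this is suggestive rather than conclusive evidence that the transformed model fits better.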
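
One way to put both models on the same footing is to back-transform the fitted values of the power model and compare squared correlations with the observed O3. This is a rough sketch, not part of the original solution: the names `fit_raw`, `fit_pow` and `pred_pow` are illustrative, and the back-transformed fit is only an approximate prediction of O3.

```r
library(faraway)  # provides the ozone data

fit_raw <- lm(O3 ~ temp + humidity + ibh, data = ozone)
fit_pow <- lm((O3)^(0.3) ~ temp + humidity + ibh, data = ozone)

# Back-transform the fitted values to the original O3 scale.
# Negative fitted values (if any) would give NaN, so clamp them at zero first.
pred_pow <- pmax(fitted(fit_pow), 0)^(1 / 0.3)

# Squared correlation between observed and predicted O3 for each model
r2_raw <- cor(ozone$O3, fitted(fit_raw))^2
r2_pow <- cor(ozone$O3, pred_pow)^2

c(untransformed = r2_raw, back_transformed = r2_pow)
```

For the untransformed model this squared correlation equals the multiple R-squared reported by `summary()`.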

6. Note that one predictor variable has a range that is much larger than the other predictor variables. Re-estimate the model with the Box-Cox transformed outcome variable using standardized versions of the predictor variables. Comment on any differences from the model based on the untransformed predictor variables.

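# scale() centres every column and divides it by its standard deviation;
# O keeps the response on its original scale for the power transformation
scozone <- data.frame(O = ozone$O3, scale(ozone))
lmod <- lm((O)^(0.3) ~ temp + humidity + ibh, scozone)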
summary(lmod)
## 
## Call:
## lm(formula = (O)^(0.3) ~ temp + humidity + ibh, data = scozone)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.70821 -0.14410  0.01145  0.16554  0.66129 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.99060    0.01277 155.886  < 2e-16 ***
## temp         0.25470    0.01564  16.287  < 2e-16 ***
## humidity     0.08032    0.01364   5.888 9.71e-09 ***
## ibh         -0.11647    0.01516  -7.685 1.82e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.232 on 326 degrees of freedom
## Multiple R-squared:  0.7168, Adjusted R-squared:  0.7142 
## F-statistic:   275 on 3 and 326 DF,  p-value: < 2.2e-16

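The coefficient estimates and their standard errors change, including the intercept, which becomes the mean of the transformed response (1.99) because the predictors are now centred. The t values, p-values, residuals, residual standard error and R-squared are identical, so the fit itself is unchanged. On the standardized scale the coefficients are directly comparable: temp has the largest effect (0.255 per standard deviation), followed by ibh (-0.116) and humidity (0.080).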
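
The link between the two sets of coefficients can be checked numerically: each standardized slope should equal the raw slope multiplied by the standard deviation of its predictor. The sketch below uses illustrative names (`preds`, `oz_std`, `fit_raw_x`, `fit_std_x`) and standardizes only the three predictors, which is a slight simplification of the `scale(ozone)` call above.

```r
library(faraway)  # provides the ozone data

preds <- c("temp", "humidity", "ibh")

fit_raw_x <- lm((O3)^(0.3) ~ temp + humidity + ibh, data = ozone)

# Standardise only the predictors; the response stays on its own scale
oz_std <- data.frame(O3 = ozone$O3, scale(ozone[preds]))
fit_std_x <- lm((O3)^(0.3) ~ temp + humidity + ibh, data = oz_std)

# Slope on the standardised scale = raw slope * standard deviation of the predictor
rescaled <- coef(fit_raw_x)[preds] * sapply(ozone[preds], sd)
all.equal(rescaled, coef(fit_std_x)[preds])

# With centred predictors the intercept is the mean of the transformed response
all.equal(unname(coef(fit_std_x)[1]), mean(ozone$O3^0.3))
```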