CHAPTER 3 OUTPUT

This shows the outputs from Chapter 3 using R. The textbook is “A Modern Approach to Regression with R” by Simon J. Sheather (2008). The R code provided with the book has been updated.

Anscombe’s Quartet (Four data sets)

anscombe <- data.frame(read_excel("MARData.xlsx", sheet = "anscombe"))
attach(anscombe)
kable(anscombe, caption = "Anscombe's dataset")
Anscombe’s dataset
case x1 x2 x3 x4 y1 y2 y3 y4
1 10 10 10 8 8.04 9.14 7.46 6.58
2 8 8 8 8 6.95 8.14 6.77 5.76
3 13 13 13 8 7.58 8.74 12.74 7.71
4 9 9 9 8 8.81 8.77 7.11 8.84
5 11 11 11 8 8.33 9.26 7.81 8.47
6 14 14 14 8 9.96 8.10 8.84 7.04
7 6 6 6 8 7.24 6.13 6.08 5.25
8 4 4 4 19 4.26 3.10 5.39 12.50
9 12 12 12 8 10.84 9.13 8.15 5.56
10 7 7 7 8 4.82 7.26 6.42 7.91
11 5 5 5 8 5.68 4.74 5.73 6.89

Scatter plots for four datasets

#Figure 3.1 on page 46
par(mfrow = c(2,2))
plot(x1, y1, xlim = c(4,20), ylim = c(3,14), main = "Data Set 1")
abline(lsfit(x1, y1))
plot(x2, y2, xlim = c(4,20), ylim = c(3,14), main = "Data Set 2")
abline(lsfit(x2, y2))
plot(x3,y3,xlim = c(4,20), ylim = c(3,14), main = "Data Set 3")
abline(lsfit(x3, y3))
plot(x4, y4, xlim = c(4,20), ylim = c(3,14), main = "Data Set 4")
abline(lsfit(x4, y4))

Regression outputs

#Regression output on page 47
m1 <- lm(y1 ~ x1)
summary(m1)
## 
## Call:
## lm(formula = y1 ~ x1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.92127 -0.45577 -0.04136  0.70941  1.83882 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   3.0001     1.1247   2.667  0.02573 * 
## x1            0.5001     0.1179   4.241  0.00217 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.237 on 9 degrees of freedom
## Multiple R-squared:  0.6665, Adjusted R-squared:  0.6295 
## F-statistic: 17.99 on 1 and 9 DF,  p-value: 0.00217
m2 <- lm(y2 ~ x2)
summary(m2)
## 
## Call:
## lm(formula = y2 ~ x2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9009 -0.7609  0.1291  0.9491  1.2691 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)    3.001      1.125   2.667  0.02576 * 
## x2             0.500      0.118   4.239  0.00218 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.237 on 9 degrees of freedom
## Multiple R-squared:  0.6662, Adjusted R-squared:  0.6292 
## F-statistic: 17.97 on 1 and 9 DF,  p-value: 0.002179
m3 <- lm(y3 ~ x3)
summary(m3)
## 
## Call:
## lm(formula = y3 ~ x3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.1586 -0.6146 -0.2303  0.1540  3.2411 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   3.0025     1.1245   2.670  0.02562 * 
## x3            0.4997     0.1179   4.239  0.00218 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.236 on 9 degrees of freedom
## Multiple R-squared:  0.6663, Adjusted R-squared:  0.6292 
## F-statistic: 17.97 on 1 and 9 DF,  p-value: 0.002176
m4 <- lm(y4 ~ x4)
summary(m4)
## 
## Call:
## lm(formula = y4 ~ x4)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -1.751 -0.831  0.000  0.809  1.839 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   3.0017     1.1239   2.671  0.02559 * 
## x4            0.4999     0.1178   4.243  0.00216 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.236 on 9 degrees of freedom
## Multiple R-squared:  0.6667, Adjusted R-squared:  0.6297 
## F-statistic:    18 on 1 and 9 DF,  p-value: 0.002165

Residual plots for diagnostics

#Figure 3.2 on page 48
par(mfrow = c(2,2))
plot(x1,m1$residuals,ylab = "Residuals",xlim = c(4,20),ylim = c(-3.5,3.5),main = "Data Set 1")
plot(x2,m2$residuals,ylab = "Residuals",xlim = c(4,20),ylim = c(-3.5,3.5),main = "Data Set 2")
plot(x3,m3$residuals,ylab = "Residuals",xlim = c(4,20),ylim = c(-3.5,3.5),main = "Data Set 3")
plot(x4,m4$residuals,ylab = "Residuals",xlim = c(4,20),ylim = c(-3.5,3.5),main = "Data Set 4")

Regression diagnostics

1. Determine whether the proposed regression model is a valid model (i.e., determine whether it provides an adequate fit to the data). The main tools we will use to validate regression assumptions are plots of standardized residuals. 2. Determine which of the data points have x-values that have an unusually large effect on the estimated regression model (such points are called leverage points.) 3. Determine which of the data points are outliers, that is, points which do not follow the pattern set by the bulk of the data, when takes into account the given model. 4. If leverage points exist, determine whether it is a bad leverage point. If a bad leverage point exists, we should assess its influence on the fitted model. 5. Examine whether the assumption of a constant variance of the errors is reasonable. If not, we shall look at how to overcome this problem. 6. If the data are collected over time, examine whether the data are correlated over time. 7. If the sample size is small or prediction intervals are of interest, examine whether the assumption that the errors are normally distributed is reasonable.

#Figure 3.3 on page 50
par(mfrow = c(1,2))
plot(x2,y2,ylim = c(3,10))
abline(lsfit(x2,y2))
plot(x2,m2$residuals,ylab = "Residuals", ylim = c(-2,2),main = "Data Set 2")

detach(anscombe)

Leverage Points

For theory, see (https://rpubs.com/gokul108/reg32)

huber <- read_excel("MARData.xlsx",sheet  = "huber")
attach(huber)

#Regression output on page 54
mBad <- lm(YBad ~ x)
summary(mBad)
## 
## Call:
## lm(formula = YBad ~ x)
## 
## Residuals:
##       1       2       3       4       5       6 
##  2.0858  0.4173 -0.2713 -1.5898 -1.3883  0.7463 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  0.06833    0.63279   0.108    0.919
## x           -0.08146    0.13595  -0.599    0.581
## 
## Residual standard error: 1.55 on 4 degrees of freedom
## Multiple R-squared:  0.08237,    Adjusted R-squared:  -0.147 
## F-statistic: 0.3591 on 1 and 4 DF,  p-value: 0.5813
mGood <- lm(YGood ~ x)
summary(mGood)
## 
## Call:
## lm(formula = YGood ~ x)
## 
## Residuals:
##        1        2        3        4        5        6 
##  0.47813 -0.31349 -0.12510 -0.56672  0.51167  0.01551 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.83167    0.19640  -9.326 0.000736 ***
## x           -0.95838    0.04219 -22.714 2.23e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4811 on 4 degrees of freedom
## Multiple R-squared:  0.9923, Adjusted R-squared:  0.9904 
## F-statistic: 515.9 on 1 and 4 DF,  p-value: 2.225e-05
#Figure 3.7 on page 55
par(mfrow = c(1,2))
plot(x,YBad,ylim = c(-12,3))
abline(lsfit(x,YBad))
plot(x,YGood,ylim = c(-12,3))
abline(lsfit(x,YGood))

#Leverage values in Table 3.3 on page 57
lm.influence(mBad)$hat
##         1         2         3         4         5         6 
## 0.2897436 0.2358974 0.1974359 0.1743590 0.1666667 0.9358974
lm.influence(mGood)$hat
##         1         2         3         4         5         6 
## 0.2897436 0.2358974 0.1974359 0.1743590 0.1666667 0.9358974
#Regression output and Figure 3.8 on page 58
xq <- x^2
mBadq <- lm(YBad ~ x + I(x^2))
summary(mBadq)
## 
## Call:
## lm(formula = YBad ~ x + I(x^2))
## 
## Residuals:
##        1        2        3        4        5        6 
##  0.24695 -0.25918  0.04771 -0.44237  0.42057 -0.01367 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -1.74057    0.29702  -5.860  0.00991 **
## x           -0.65945    0.08627  -7.644  0.00465 **
## I(x^2)       0.08349    0.01133   7.369  0.00517 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4096 on 3 degrees of freedom
## Multiple R-squared:  0.952,  Adjusted R-squared:  0.9199 
## F-statistic: 29.72 on 2 and 3 DF,  p-value: 0.01053
xx <- c(-4:10)
yy <- summary(mBadq)$coef[1] + summary(mBadq)$coef[2]*xx + summary(mBadq)$coef[3]*xx^2
par(mfrow = c(1,1))
plot(xx,yy,ylim = c(-3,3),type = "l",ylab = "YBad",xlab = "x")
points(x,YBad)

detach(huber)
bonds <- read_excel("MARData.xlsx", sheet = "bonds")
attach(bonds)

#Figure 3.9 on page 63
par(mfrow = c(1,1))
plot(CouponRate, BidPrice, xlab = "Coupon Rate (%)", ylab = "Bid Price ($)", ylim = c(85,120), xlim = c(2,14))
abline(lsfit(CouponRate, BidPrice))

#Regression output on page 63 
m1 <- lm(BidPrice ~ CouponRate)
summary(m1)
## 
## Call:
## lm(formula = BidPrice ~ CouponRate)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.249 -2.470 -0.838  2.550 10.515 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  74.7866     2.8267  26.458  < 2e-16 ***
## CouponRate    3.0661     0.3068   9.994 1.64e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.175 on 33 degrees of freedom
## Multiple R-squared:  0.7516, Adjusted R-squared:  0.7441 
## F-statistic: 99.87 on 1 and 33 DF,  p-value: 1.645e-11
#95% confidence intervals on page 63
round(confint(m1,level = 0.95), 3)
##              2.5 % 97.5 %
## (Intercept) 69.036 80.537
## CouponRate   2.442  3.690
#Table 3.4 on page 62
leverage1 <- hatvalues(m1)
StanRes1 <- rstandard(m1)
residual1 <- m1$residuals
cbind(Case, CouponRate, BidPrice, round(leverage1,3), round(residual1,3), round(StanRes1,3))
##    Case CouponRate BidPrice                    
## 1     1      7.000    92.94 0.049 -3.309 -0.812
## 2     2      9.000   101.44 0.029 -0.941 -0.229
## 3     3      7.000    92.66 0.049 -3.589 -0.881
## 4     4      4.125    94.50 0.153  7.066  1.838
## 5     5     13.125   118.94 0.124  3.911  1.001
## 6     6      8.000    96.75 0.033 -2.565 -0.625
## 7     7      8.750   100.88 0.029 -0.735 -0.179
## 8     8     12.625   117.25 0.103  3.754  0.949
## 9     9      9.500   103.34 0.030 -0.575 -0.140
## 10   10     10.125   106.25 0.036  0.419  0.102
## 11   11     11.625   113.19 0.068  2.760  0.685
## 12   12      8.625    99.44 0.029 -1.792 -0.435
## 13   13      3.000    94.50 0.218 10.515  2.848
## 14   14     10.500   108.31 0.042  1.329  0.325
## 15   15     11.250   111.69 0.058  2.410  0.595
## 16   16      8.375    98.09 0.030 -2.375 -0.578
## 17   17     10.375   107.91 0.040  1.313  0.321
## 18   18     11.250   111.97 0.058  2.690  0.664
## 19   19     12.625   119.06 0.103  5.564  1.407
## 20   20      8.875   100.38 0.029 -1.618 -0.393
## 21   21     10.500   108.50 0.042  1.519  0.372
## 22   22      8.625    99.25 0.029 -1.982 -0.482
## 23   23      9.500   103.63 0.030 -0.285 -0.069
## 24   24     11.500   114.03 0.064  3.983  0.986
## 25   25      8.875   100.38 0.029 -1.618 -0.393
## 26   26      7.375    92.06 0.041 -5.339 -1.306
## 27   27      7.250    90.88 0.044 -6.136 -1.503
## 28   28      8.625    98.41 0.029 -2.822 -0.686
## 29   29      8.500    97.75 0.030 -3.098 -0.753
## 30   30      8.875    99.88 0.029 -2.118 -0.515
## 31   31      8.125    95.16 0.032 -4.539 -1.105
## 32   32      9.000   100.66 0.029 -1.721 -0.418
## 33   33      9.250   102.31 0.029 -0.838 -0.204
## 34   34      7.000    88.00 0.049 -8.249 -2.025
## 35   35      3.500    94.53 0.187  9.012  2.394
#Figure 3.10 on page 64
plot(CouponRate, StanRes1, xlab = "Coupon Rate (%)", ylab = "Standardized Residuals", xlim = c(2,14))
abline(h = 2,lty = 2)
abline(h = -2,lty = 2)

identify(CouponRate, StanRes1, Case)

## integer(0)
# Click near a point to identify its Case. 
# This continues until you select "Stop" after clicking the right mouse button.

#Regression output on page 66
m2 <- update(m1, subset = (1:35)[-c(4,13,35)])
summary(m2)
## 
## Call:
## lm(formula = BidPrice ~ CouponRate, subset = (1:35)[-c(4, 13, 
##     35)])
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1301 -0.3789  0.2240  0.4576  1.8099 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  57.2932     1.0358   55.31   <2e-16 ***
## CouponRate    4.8338     0.1082   44.67   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.024 on 30 degrees of freedom
## Multiple R-squared:  0.9852, Adjusted R-squared:  0.9847 
## F-statistic:  1996 on 1 and 30 DF,  p-value: < 2.2e-16
#Figure 3.11 on page 65
plot(CouponRate[-c(4,13,35)], BidPrice[-c(4,13,35)], xlab = "Coupon Rate (%)", ylab = "Bid Price ($)", ylim = c(85,120), xlim = c(2,14), main = "Regular Bonds")
abline(m2)

#Figure 3.12 on page 67
StanRes2 <- rstandard(m2)
plot(CouponRate[-c(4,13,35)], StanRes2, xlab = "Coupon Rate (%)", ylab = "Standardized Residuals", xlim = c(2,14), main = "Regular Bonds")
abline(h = 2,lty = 2)
abline(h = -2,lty = 2)

#Figure 3.13 on page 68
cd1 <- cooks.distance(m1)
plot(CouponRate,cd1,xlab = "Coupon Rate (%)", ylab = "Cook's Distance")
abline(h = 4/(35-2),lty = 2)

identify(CouponRate,cd1,Case)

## integer(0)
# Click near a point to identify its Case. 
# This continues until you select "Stop" after clicking the right mouse button.

detach(bonds)
production <- read_excel("MARData.xlsx", sheet = "production")
attach(production)

m1 <- lm(RunTime ~ RunSize)

#Figure 3.14 on page 70
par(mfrow = c(2,2))
plot(m1)

detach(production)
cleaning <- read_excel("MARData.xlsx", sheet = "cleaning")
attach(cleaning)

#Figure 3.15 on page 71
par(mfrow = c(1,1))
plot(Crews,Rooms, xlab = "Number of Crews", ylab = "Number of Rooms Cleaned")
abline(lsfit(Crews,Rooms))

#Regression output on pages 72 and 73
m1 <- lm(Rooms ~ Crews)
summary(m1)
## 
## Call:
## lm(formula = Rooms ~ Crews)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.9990  -4.9901   0.8046   4.0010  17.0010 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   1.7847     2.0965   0.851    0.399    
## Crews         3.7009     0.2118  17.472   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.336 on 51 degrees of freedom
## Multiple R-squared:  0.8569, Adjusted R-squared:  0.854 
## F-statistic: 305.3 on 1 and 51 DF,  p-value: < 2.2e-16
predict(m1,newdata = data.frame(Crews = c(4,16)),interval = "prediction",level = 0.95)
##        fit      lwr      upr
## 1 16.58827  1.58941 31.58713
## 2 60.99899 45.81025 76.18773
#Figure 3.16 on page 73
StanRes1 <- rstandard(m1)
plot(Crews,StanRes1, xlab = "Number of Crews", ylab = "Standardized Residuals")

#Figure 3.17 on page 74
sabs <- sqrt(abs(StanRes1))
plot(Crews, sabs,xlab = "Number of Crews", ylab = "Square Root(|Standardized Residuals|)")
abline(lsfit(Crews,sabs))

#Figure 3.18 on page 75
par(mfrow = c(2,2))
plot(m1)

#Figure 3.19 on page 75
sqrt(tapply(Rooms,Crews,var))
##         2         4         6         8        10        12        16 
##  3.000000  4.966555  4.690416  6.642665  7.927123  7.289915 12.000463
sds <- c(3.000000,4.966555,4.690416,6.642665,7.927123,7.28991,12.000463)
xx <- c(2,4,6,8,10,12,16)
par(mfrow = c(1,1))
plot(xx, sds, xlab = "Number of Crews", ylab = "Standard deviation(Rooms Cleaned)")
abline(lsfit(xx,sds))

#Regression output on page 77
sqrtcrews <- sqrt(Crews)
sqrtrooms <- sqrt(Rooms)
m2 <- lm(sqrtrooms~sqrtcrews)
summary(m2)
## 
## Call:
## lm(formula = sqrtrooms ~ sqrtcrews)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.09825 -0.43988  0.06826  0.42726  1.20275 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   0.2001     0.2757   0.726    0.471    
## sqrtcrews     1.9016     0.0936  20.316   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.594 on 51 degrees of freedom
## Multiple R-squared:   0.89,  Adjusted R-squared:  0.8879 
## F-statistic: 412.7 on 1 and 51 DF,  p-value: < 2.2e-16
predict(m2,newdata = data.frame(sqrtcrews = c(2,4)),interval = "prediction", level = 0.95)
##        fit      lwr      upr
## 1 4.003286 2.789926 5.216646
## 2 7.806449 6.582320 9.030578
#Figure 3.20 on page 78
par(mfrow = c(1,2))
plot(sqrt(Crews),sqrt(Rooms),xlab = "Square Root(Number of Crews)",ylab = "Square Root(Number of Rooms Cleaned)")
abline(lsfit(sqrt(Crews), sqrt(Rooms)))
StanRes2 <- rstandard(m2)
plot(sqrtcrews,StanRes2, xlab = "Square Root(Number of Crews)", ylab = "Standardized Residuals")

#Figure 3.21 on page 78
par(mfrow = c(2,2))
plot(m2)

detach(cleaning)
confood1 <- read_excel("MARData.xlsx", sheet = "confood1")
attach(confood1)

#Figure 3.22 on page 80
par(mfrow = c(1,1))
plot(Price, Sales)
abline(lsfit(Price, Sales))

#Figure 3.23 on page 81
plot(log(Price),log(Sales),xlab = "log(Price)", ylab = "log(Sales)")
abline(lsfit(log(Price),log(Sales)))

#Regression output on page 82
m1 <- lm(log(Sales) ~ log(Price))
summary(m1)
## 
## Call:
## lm(formula = log(Sales) ~ log(Price))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.88973 -0.18188  0.04025  0.22087  1.31026 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   4.8029     0.1744   27.53  < 2e-16 ***
## log(Price)   -5.1477     0.5098  -10.10 1.16e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4013 on 50 degrees of freedom
## Multiple R-squared:  0.671,  Adjusted R-squared:  0.6644 
## F-statistic:   102 on 1 and 50 DF,  p-value: 1.159e-13
#Figure 3.24 on page 82
StanRes1 <- rstandard(m1)
plot(log(Price), StanRes1, xlab = "log(Price)", ylab = "Standardized Residuals")

detach(confood1)
responsetransformation <- read_excel("MARData.xlsx", sheet = "responsetransformation")
attach(responsetransformation)

#Figure 3.25 on page 84
plot(x,y)

#Figure 3.26 on page 85
plot(x,y)

m1 <- lm(y~x)
par(mfrow = c(1,2))
StanRes1 <- rstandard(m1)
absrtsr1 <- sqrt(abs(StanRes1))
plot(x,StanRes1,ylab = "Standardized Residuals")
plot(x,absrtsr1,ylab = "Square Root(|Standardized Residuals|)")

#Figure 3.27 on page 86
par(mfrow = c(3,2))
plot(density(y, bw = "SJ", kern = "gaussian"), type = "l",
main = "Gaussian kernel density estimate", xlab = "y")
rug(y)
boxplot(y, ylab = "Y")
qqnorm(y, ylab = "Y")
qqline(y, lty = 2, col = 2)
sj <- bw.SJ(x,lower = 0.05, upper = 100)
plot(density(x,bw = sj, kern = "gaussian"), type = "l",
main = "Gaussian kernel density estimate", xlab = "x")
rug(x)
boxplot(x, ylab = "x")
qqnorm(x, ylab = "x")
qqline(x, lty = 2, col = 2)

#Figure 3.28 on page 87
#install.packages("alr3")
#You will be asked to 
#--- Please select a CRAN mirror for use in this session ---
library(alr3)
## Loading required package: car
## Loading required package: carData
par(mfrow = c(1,1))
inverse.response.plot(m1, key = TRUE)
## Warning: 'inverse.response.plot' is deprecated.
## Use 'inverseResponsePlot' instead.
## See help("Deprecated") and help("alr3-deprecated").

##       lambda        RSS
## 1  0.3321126   265.8749
## 2 -1.0000000 46673.8798
## 3  0.0000000  3583.8067
## 4  1.0000000  7136.8828
# Click on the section of the plot that you wish to put the figure legend

#Figure 3.29 on page 88
inverse.response.plot(m1,lam = c(-1,-0.5, -0.33, -0.25, 0, 0.25, 0.33, 0.5,1))
## Warning: 'inverse.response.plot' is deprecated.
## Use 'inverseResponsePlot' instead.
## See help("Deprecated") and help("alr3-deprecated").

##        lambda        RSS
## 1   0.3321126   265.8749
## 2  -1.0000000 46673.8798
## 3  -0.5000000 24090.7391
## 4  -0.3300000 15264.2435
## 5  -0.2500000 11637.0650
## 6   0.0000000  3583.8067
## 7   0.2500000   440.0384
## 8   0.3300000   265.9846
## 9   0.5000000   880.1573
## 10  1.0000000  7136.8828
lambda <- c(-1,-0.5, -0.33, -0.25, 0, 0.25, 0.33, 0.5,1)
RSS <- c(46673.9,24090.7,15264.2,11637.1,3583.8,440,266,880.2,7136.9)
plot(lambda,RSS,type = "l", ylab = expression(RSS(lambda)), xlab = expression(lambda))

#Figure 3.30 on page 92
library(MASS)
## 
## Attaching package: 'MASS'
## The following object is masked from 'package:alr3':
## 
##     forbes
par(mfrow = c(1,2))
boxcox(m1,lambda = seq(0.28,0.39,length = 20))
boxcox(m1,lambda = seq(0.325,0.34,length = 20))

#Regression output & Figure 3.31 on page 93
ty <- y^(1/3)
par(mfrow = c(2,2))
sj <- bw.SJ(ty,lower = 0.05, upper = 100)
plot(density(ty, bw = sj, kern = "gaussian"), type = "l",
main = "Gaussian kernel density estimate", xlab = expression(Y^(1/3)))
rug(ty)
boxplot(ty,ylab = expression(Y^(1/3)))
qqnorm(ty, ylab = expression(Y^(1/3)))
qqline(ty, lty = 2, col = 2)
m2 <- lm(ty ~ x)
plot(x,ty,ylab = expression(Y^(1/3)))
abline(m2)

summary(m2)
## 
## Call:
## lm(formula = ty ~ x)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.144265 -0.035128 -0.002067  0.035897  0.161090 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.008947   0.011152   0.802    0.423    
## x           0.996451   0.004186 238.058   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.05168 on 248 degrees of freedom
## Multiple R-squared:  0.9956, Adjusted R-squared:  0.9956 
## F-statistic: 5.667e+04 on 1 and 248 DF,  p-value: < 2.2e-16
detach(responsetransformation)
library(alr3)
data(salarygov)
attach(salarygov)

#Figure 3.32 on page 96
m1 <- lm(MaxSalary ~ Score)
par(mfrow = c(2,2))
plot(Score, MaxSalary)
abline(m1,lty = 2, col = 2)
StanRes1 <- rstandard(m1)
absrtsr1 <- sqrt(abs(StanRes1))
plot(Score, StanRes1, ylab = "Standardized Residuals")
plot(Score,absrtsr1, ylab = "Square Root(|Standardized Residuals|)")
abline(lsfit(Score,absrtsr1), lty = 2, col = 2)

#Output from R on page 96
summary(tranxy <- powerTransform(MaxSalary ~ Score))
## bcPower Transformation to Normality 
##    Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
## Y1    0.1292        0.13       0.0025        0.256
## 
## Likelihood ratio test that transformation parameter is equal to 0
##  (log transformation)
##                            LRT df     pval
## LR test, lambda = (0) 3.927686  1 0.047497
## 
## Likelihood ratio test that no transformation is needed
##                            LRT df       pval
## LR test, lambda = (1) 179.8725  1 < 2.22e-16
#Figure 3.33 on page 97
par(mfrow = c(3,2))

plot(density(MaxSalary, bw = "SJ", kern = "gaussian"), type = "l",
main = "Gaussian kernel density estimate", xlab = "MaxSalary")
rug(MaxSalary)
boxplot(MaxSalary, ylab = "MaxSalary")
qqnorm(MaxSalary, ylab = "MaxSalary")
qqline(MaxSalary, lty = 2, col = 2)
plot(density(Score, bw = "SJ", kern = "gaussian"), type = "l",
main = "Gaussian kernel density estimate", xlab = "Score")
rug(Score)
boxplot(Score, ylab = "Score")
qqnorm(Score, ylab = "Score")
qqline(Score, lty = 2, col = 2)

#Figure 3.34 on page 97
par(mfrow = c(1,1))
plot(sqrt(Score),log(MaxSalary), xlab = expression(sqrt(Score)))
abline(lsfit(sqrt(Score),log(MaxSalary)),lty = 2, col = 2)

#Figure 3.35 on page 98
par(mfrow = c(3,2))
plot(density(log(MaxSalary), bw = "SJ", kern = "gaussian"), type = "l",
main = "Gaussian kernel density estimate", xlab = "log(MaxSalary)")
rug(log(MaxSalary))
boxplot(log(MaxSalary), ylab = "log(MaxSalary)")
qqnorm(log(MaxSalary), ylab = "log(MaxSalary)")
qqline(log(MaxSalary), lty = 2, col = 2)
sj <- bw.SJ(sqrt(Score),lower = 0.05, upper = 100)
plot(density(sqrt(Score),bw = sj, kern = "gaussian"), type = "l",
main = "Gaussian kernel density estimate", xlab = expression(sqrt(Score)))
rug(sqrt(Score))
boxplot(sqrt(Score), ylab = expression(sqrt(Score)))
qqnorm(sqrt(Score), ylab = expression(sqrt(Score)))
qqline(sqrt(Score), lty = 2, col = 2)

#Figure 3.36 on page 99
m2 <- lm(log(MaxSalary) ~ sqrt(Score))
par(mfrow = c(1,2))
StanRes2 <- rstandard(m2)
absrtsr2 <- sqrt(abs(StanRes2))
plot(sqrt(Score),StanRes2,ylab = "Standardized Residuals",xlab = expression(sqrt(Score)))
plot(sqrt(Score),absrtsr2,ylab = "Square Root(|Standardized Residuals|)", xlab = expression(sqrt(Score)))
abline(lsfit(sqrt(Score),absrtsr2),lty = 2,col = 2)

#R output on page 99
summary(tranx <- powerTransform(Score))
## bcPower Transformation to Normality 
##       Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
## Score    0.5481         0.5       0.3606       0.7357
## 
## Likelihood ratio test that transformation parameter is equal to 0
##  (log transformation)
##                            LRT df      pval
## LR test, lambda = (0) 35.16895  1 3.023e-09
## 
## Likelihood ratio test that no transformation is needed
##                            LRT df       pval
## LR test, lambda = (1) 21.09339  1 4.3743e-06
#Figure 3.37 on page 100
m3 <- lm(MaxSalary ~ sqrt(Score))
par(mfrow = c(1,1))
inverse.response.plot(m3,key = TRUE)
## Warning: 'inverse.response.plot' is deprecated.
## Use 'inverseResponsePlot' instead.
## See help("Deprecated") and help("alr3-deprecated").

##       lambda       RSS
## 1 -0.1919139  83404509
## 2 -1.0000000  97846135
## 3  0.0000000  84316473
## 4  1.0000000 119006606
#Click on the plot where you want to put the legend

#Figure 3.38 on page 101
par(mfrow = c(2,2))
plot(density(MaxSalary^-0.25,bw = "SJ",kern = "gaussian"),type = "l",
main = "Gaussian kernel density estimate",xlab = expression(MaxSalary^-0.25))
rug(MaxSalary^-0.25)
boxplot(MaxSalary^-0.25,ylab = expression(MaxSalary^-0.25))
qqnorm(MaxSalary^-0.25,ylab = expression(MaxSalary^-0.25))
qqline(MaxSalary^-0.25,lty = 2, col = 2)

#Figure 3.39 on page 102
par(mfrow = c(2,2))

plot(sqrt(Score),MaxSalary^-0.25,xlab = expression(sqrt(Score)),ylab = expression(MaxSalary^-0.25))
abline(lsfit(sqrt(Score),MaxSalary^-0.25),lty = 2,col = 2)
m3 <- lm(MaxSalary^-0.25~sqrt(Score))
StanRes3 <- rstandard(m3)
absrtsr3 <- sqrt(abs(StanRes3))
plot(sqrt(Score),StanRes3,ylab = "Standardized Residuals",xlab = expression(sqrt(Score)))
plot(sqrt(Score),absrtsr3,ylab = "Square Root(|Standardized Residuals|)",xlab = expression(sqrt(Score)))
abline(lsfit(sqrt(Score),absrtsr3),lty = 2,col = 2)

detach(salarygov)

#################EXERCISES

#Exercise 3.5.1
airfares <- read_excel("MARData.xlsx", sheet = "airfares")
attach(airfares)

#R output on page 104
m1 <- lm(Fare~Distance)
summary(m1)
## 
## Call:
## lm(formula = Fare ~ Distance)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.265  -4.475   1.024   2.745  26.440 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 48.971770   4.405493   11.12 1.22e-08 ***
## Distance     0.219687   0.004421   49.69  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.41 on 15 degrees of freedom
## Multiple R-squared:  0.994,  Adjusted R-squared:  0.9936 
## F-statistic:  2469 on 1 and 15 DF,  p-value: < 2.2e-16
#Figure 3.41 on page 104
par(mfrow = c(1,2))
plot(Distance,Fare)
abline(lsfit(Distance,Fare))
leverage1 <- hatvalues(m1)
StanRes1 <- rstandard(m1)
residual1 <- m1$residuals
plot(Distance,StanRes1, ylab = "Standardized Residuals")
abline(h = 2,lty = 2)
abline(h = -2,lty = 2)

detach(airfares)
#Exercise 3.5.4
glakes <- read_excel("MARData.xlsx", sheet = "glakes")
attach(glakes)

#R output on page 107
m1 <- lm(Time ~ Tonnage)
summary(m1)
## 
## Call:
## lm(formula = Time ~ Tonnage)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -23.882  -6.397  -1.261   5.931  21.850 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 12.344707   2.642633   4.671 6.32e-05 ***
## Tonnage      0.006518   0.000531  12.275 5.22e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.7 on 29 degrees of freedom
## Multiple R-squared:  0.8386, Adjusted R-squared:  0.833 
## F-statistic: 150.7 on 1 and 29 DF,  p-value: 5.218e-13
#Figure 3.42 on page 106
par(mfrow = c(2,2))
plot(Tonnage, Time)
abline(lsfit(Tonnage,Time))
leverage1 <- hatvalues(m1)
StanRes1 <- rstandard(m1)
absrtsr1 <- sqrt(abs(StanRes1))
residual1 <- m1$residuals
plot(Tonnage,StanRes1, ylab = "Standardized Residuals")
abline(h = 2,lty = 2)
abline(h = -2,lty = 2)
plot(Tonnage,absrtsr1,ylab = "Square Root(|Standardized Residuals|)")
abline(lsfit(Tonnage,absrtsr1),lty = 2,col = 1)
qqnorm(StanRes1, ylab = "Standardized Residuals")
qqline(StanRes1, lty = 2, col = 1)

#Figure 3.43 on page 107
par(mfrow = c(3,2))
plot(density(Time,bw = "SJ",kern = "gaussian"),type = "l",
main = "Gaussian kernel density estimate",xlab = "Time")
rug(Time)
boxplot(Time,ylab = "Time")
qqnorm(Time, ylab = "Time")
qqline(Time, lty = 2, col = 1)
sj <- bw.SJ(Tonnage,lower = 0.1, upper = 1000)
plot(density(Tonnage,bw = sj,kern = "gaussian"),type = "l",
main = "Gaussian kernel density estimate",xlab = "Tonnage")
rug(Tonnage)
boxplot(Tonnage,ylab = "Tonnage")
qqnorm(Tonnage, ylab  = "Tonnage")
qqline(Tonnage, lty = 2, col = 1)

#R output on page 108
library(alr3)
summary(tranxy <- powerTransform(Time ~ Tonnage))
## bcPower Transformation to Normality 
##    Est Power Rounded Pwr Wald Lwr Bnd Wald Upr Bnd
## Y1    0.5101         0.5       0.1286       0.8916
## 
## Likelihood ratio test that transformation parameter is equal to 0
##  (log transformation)
##                            LRT df     pval
## LR test, lambda = (0) 5.087807  1 0.024095
## 
## Likelihood ratio test that no transformation is needed
##                           LRT df      pval
## LR test, lambda = (1) 8.06233  1 0.0045195
#R output on page 108
m2 <- lm(log(Time) ~ I(Tonnage^0.25))
summary(m2)
## 
## Call:
## lm(formula = log(Time) ~ I(Tonnage^0.25))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6607 -0.2410 -0.0044  0.2203  0.4956 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      1.18842    0.19468   6.105  1.2e-06 ***
## I(Tonnage^0.25)  0.30910    0.02728  11.332  3.6e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3034 on 29 degrees of freedom
## Multiple R-squared:  0.8158, Adjusted R-squared:  0.8094 
## F-statistic: 128.4 on 1 and 29 DF,  p-value: 3.599e-12
#Figure 3.44 on page 108
tTime <- log(Time)
tTonnage <- Tonnage^0.25
par(mfrow = c(2,2))
plot(tTonnage,tTime,xlab = expression(Tonnage^0.25),ylab = "log(Time)")
abline(lsfit(tTonnage,tTime))
leverage2 <- hatvalues(m2)
StanRes2 <- rstandard(m2)
absrtsr2 <- sqrt(abs(StanRes2))
residual2 <- m2$residuals
plot(tTonnage,StanRes2, ylab = "Standardized Residuals",xlab = expression(Tonnage^0.25))
abline(h = 2,lty = 2)
abline(h = -2,lty = 2)
plot(tTonnage,absrtsr2,ylab = "Square Root(|Standardized Residuals|)",xlab = expression(Tonnage^0.25))
abline(lsfit(tTonnage,absrtsr2),lty = 2,col = 1)
qqnorm(StanRes2, ylab = "Standardized Residuals")
qqline(StanRes2, lty = 2, col = 1)

#Figure 3.45 on page 109
par(mfrow = c(3,2))
plot(density(tTime,bw = "SJ",kern = "gaussian"),type = "l",
main = "Gaussian kernel density estimate",xlab = "log(Time)")
rug(tTime)
boxplot(tTime,ylab = "log(Time)")
qqnorm(tTime, ylab = "log(Time)")
qqline(tTime, lty = 2, col = 1)
sj <- bw.SJ(tTonnage,lower = 0.1, upper = 1000)
plot(density(tTonnage,bw = sj,kern = "gaussian"),type = "l",
main = "Gaussian kernel density estimate",xlab = expression(Tonnage^0.25))
rug(tTonnage)
boxplot(tTonnage,ylab = expression(Tonnage^0.25))
qqnorm(tTonnage, ylab = expression(Tonnage^0.25))
qqline(tTonnage, lty = 2, col = 1)

detach(glakes)
#Exercise 3.5.5
cars04 <- read_excel("MARData.xlsx", sheet = "cars04")
attach(cars04)

#Output from R on pages 110 and 111
m1 <- lm(SuggestedRetailPrice~DealerCost)
summary(m1)
## 
## Call:
## lm(formula = SuggestedRetailPrice ~ DealerCost)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1743.52  -262.59    74.92   265.98  2912.72 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -61.904248  81.801381  -0.757     0.45    
## DealerCost    1.088841   0.002638 412.768   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 587 on 232 degrees of freedom
## Multiple R-squared:  0.9986, Adjusted R-squared:  0.9986 
## F-statistic: 1.704e+05 on 1 and 232 DF,  p-value: < 2.2e-16
#Figure 3.46 on page 110
par(mfrow = c(2,2))
plot(DealerCost,SuggestedRetailPrice)
abline(lsfit(DealerCost,SuggestedRetailPrice),lty = 2, col = 1)
leverage1 <- hatvalues(m1)
StanRes1 <- rstandard(m1)
absrtsr1 <- sqrt(abs(StanRes1))
residual1 <- m1$residuals
plot(DealerCost,StanRes1, ylab = "Standardized Residuals")
abline(h = 2,lty = 2)
abline(h = -2,lty = 2)
plot(DealerCost,absrtsr1,ylab = "Square Root(|Standardized Residuals|)")
abline(lsfit(DealerCost,absrtsr1),lty = 2,col = 1)
qqnorm(StanRes1, ylab = "Standardized Residuals")
qqline(StanRes1, lty = 2, col = 1)

#Output from R on page 111
m2 <- lm(log(SuggestedRetailPrice)~log(DealerCost))
summary(m2)
## 
## Call:
## lm(formula = log(SuggestedRetailPrice) ~ log(DealerCost))
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.062920 -0.008694  0.000624  0.010621  0.048798 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     -0.069459   0.026459  -2.625  0.00924 ** 
## log(DealerCost)  1.014836   0.002616 387.942  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.01865 on 232 degrees of freedom
## Multiple R-squared:  0.9985, Adjusted R-squared:  0.9985 
## F-statistic: 1.505e+05 on 1 and 232 DF,  p-value: < 2.2e-16
#Figure 3.47 on page 111
par(mfrow = c(2,2))
plot(log(DealerCost),log(SuggestedRetailPrice))
abline(lsfit(log(DealerCost),log(SuggestedRetailPrice)),lty = 1, col = 1)
leverage2 <- hatvalues(m2)
StanRes2 <- rstandard(m2)
absrtsr2 <- sqrt(abs(StanRes2))
residual2 <- m2$residuals
plot(log(DealerCost),StanRes2, ylab = "Standardized Residuals")
abline(h = 2,lty = 2)
abline(h = -2,lty = 2)
plot(log(DealerCost),absrtsr2,ylab = "Square Root(|Standardized Residuals|)")
abline(lsfit(log(DealerCost),absrtsr2),lty = 2,col = 1)
qqnorm(StanRes2, ylab = "Standardized Residuals")
qqline(StanRes2, lty = 2, col = 1)

detach(cars04)
#Exercise 3.5.6 based on a different set of generated data
n <-500
x <- runif(n,0,1)^3
e <- rnorm(n,0,0.1)
y <- exp(+2.5 + 1*x + e)

#Figure 3.48 on page 112
m1 <- lm(y~x)
library(alr3)
par(mfrow = c(1,1))
inverse.response.plot(m1,key = TRUE)
## Warning: 'inverse.response.plot' is deprecated.
## Use 'inverseResponsePlot' instead.
## See help("Deprecated") and help("alr3-deprecated").

##       lambda      RSS
## 1  0.5894914 1484.639
## 2 -1.0000000 2452.983
## 3  0.0000000 1624.527
## 4  1.0000000 1552.160