Author: Jay Liao (ID: RE6094028)

Exercise 3.14

This problem focuses on the collinearity problem.

Exercise 3.14 - (a)

Perform the following commands in R:

set.seed(1)
x1=runif(100)
x2=0.5*x1+rnorm(100)/10
y=2+2*x1+0.3*x2+rnorm(100)

The last line corresponds to creating a linear model in which y is a function of x1 and x2. Write out the form of the linear model. What are the regression coefficients?

set.seed(1)
x1=runif(100)
x2=0.5*x1+rnorm(100)/10
y=2+2*x1+0.3*x2+rnorm(100)

The model is \(Y = 2 + 2X_1 + 0.3X_2 + \epsilon\) with \(\epsilon \sim N(0,1)\), so the true regression coefficients are \(\beta_0=2,\ \beta_1=2\), and \(\beta_2=0.3\).

Exercise 3.14 - (b)

What is the correlation between x1 and x2? Create a scatter plot displaying the relationship between the variables.

cor(x1, x2)
[1] 0.8351212
plot(x1, x2, pch=19, main='Scatter Plot of x1 and x2')

[ANS] The correlation between x1 and x2 is 0.8351212. The scatter plot shows quite a strong positive linear relationship between the two variables.

Exercise 3.14 - (c)

Using this data, fit a least squares regression to predict y using x1 and x2. Describe the results obtained. What are \(\beta_0\), \(\beta_1\), and \(\beta_2\)? How do these relate to the true \(\beta_0\), \(\beta_1\), and \(\beta_2\)? Can you reject the null hypothesis \(H_0:\ \beta_1 = 0\)? How about the null hypothesis \(H_0:\ \beta_2 = 0\)?

fit_c <- lm(y ~ x1 + x2)
summary(fit_c)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8311 -0.7273 -0.0537  0.6338  2.3359 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.1305     0.2319   9.188 7.61e-15 ***
x1            1.4396     0.7212   1.996   0.0487 *  
x2            1.0097     1.1337   0.891   0.3754    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.056 on 97 degrees of freedom
Multiple R-squared:  0.2088,    Adjusted R-squared:  0.1925 
F-statistic:  12.8 on 2 and 97 DF,  p-value: 1.164e-05
confint(fit_c, 2:3, .95)
          2.5 %   97.5 %
x1  0.008213776 2.870897
x2 -1.240451256 3.259800

[ANS] According to the regression results, x1 significantly (\(\alpha = .05\)) predicts y while x2 does not: we can reject \(H_0:\ \beta_1 = 0\) since the \(95\%\) CI of \(\beta_1\) does not cover 0, but we cannot reject \(H_0:\ \beta_2 = 0\) since the \(95\%\) CI of \(\beta_2\) covers 0. Both estimates also differ noticeably from the true values (\(\hat\beta_1 = 1.44\) vs. 2, and \(\hat\beta_2 = 1.01\) vs. 0.3).
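The same point can be made with a partial F-test of whether x2 adds explanatory power once x1 is already in the model; a minimal sketch (with a single added term this is equivalent to the t-test on x2 above):

# Compare the reduced model (x1 only) against the full model (x1 and x2)
anova(lm(y ~ x1), fit_c)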

Exercise 3.14 - (d)

Now fit a least squares regression to predict y using only x1. Comment on your results. Can you reject the null hypothesis \(H_0:\ \beta_1 = 0\)?

fit_d <- lm(y ~ x1)
summary(fit_d)

Call:
lm(formula = y ~ x1)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.89495 -0.66874 -0.07785  0.59221  2.45560 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.1124     0.2307   9.155 8.27e-15 ***
x1            1.9759     0.3963   4.986 2.66e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.055 on 98 degrees of freedom
Multiple R-squared:  0.2024,    Adjusted R-squared:  0.1942 
F-statistic: 24.86 on 1 and 98 DF,  p-value: 2.661e-06
confint(fit_d, 2, .95)
      2.5 %   97.5 %
x1 1.189529 2.762329

[ANS] According to the regression results, x1 significantly (\(\alpha = .05\)) predicts y; we can reject \(H_0:\ \beta_1 = 0\) since the \(95\%\) CI of \(\beta_1\) does not cover 0. Note that \(\hat\beta_1 = 1.98\) now absorbs part of x2's effect: since \(x_2 \approx 0.5x_1\), the expected slope is roughly \(2 + 0.3 \times 0.5 = 2.15\).

Exercise 3.14 - (e)

Now fit a least squares regression to predict y using only x2. Comment on your results. Can you reject the null hypothesis \(H_0:\ \beta_1 = 0\)?

fit_e <- lm(y ~ x2)
summary(fit_e)

Call:
lm(formula = y ~ x2)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.62687 -0.75156 -0.03598  0.72383  2.44890 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.3899     0.1949   12.26  < 2e-16 ***
x2            2.8996     0.6330    4.58 1.37e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.072 on 98 degrees of freedom
Multiple R-squared:  0.1763,    Adjusted R-squared:  0.1679 
F-statistic: 20.98 on 1 and 98 DF,  p-value: 1.366e-05
confint(fit_e, 2, .95)
      2.5 %   97.5 %
x2 1.643324 4.155846

[ANS] According to the regression results, x2 significantly (\(\alpha = .05\)) predicts y; we can reject the null hypothesis for its slope since the \(95\%\) CI does not cover 0. Here x2 acts largely as a proxy for x1, which is why its marginal slope (\(\approx 2.9\)) is far larger than the true \(\beta_2 = 0.3\).

Exercise 3.14 - (f)

Do the results obtained in (c)–(e) contradict each other? Explain your answer.


[ANS] No, the results do not contradict each other. Compared with \(X_2\), \(X_1\) has the stronger explanatory power for \(Y\) and is the more useful predictor, so when \(X_1\) and \(X_2\) enter the model together, the effect of \(X_1\) is significant while that of \(X_2\) is not; this happens because \(X_2\) is highly positively correlated with \(X_1\) (\(r_{X_1,X_2}=\) 0.8351212). For the same reason, when \(X_1\) and \(X_2\) are each fit on their own, the effect of \(X_1\) remains more pronounced than that of \(X_2\) (smaller p-value), and the coefficient estimate for \(X_1\) is closer to its true value (the estimate for \(X_2\) is inaccurate; even its \(95\%\) confidence interval fails to cover the true value). This is the collinearity problem.
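A quick way to quantify this collinearity is the variance inflation factor (VIF); a minimal sketch computed by hand (with only two predictors the VIF is the same for both):

# VIF_j = 1 / (1 - R^2 from regressing predictor j on the other predictors)
vif_x1 <- 1 / (1 - summary(lm(x1 ~ x2))$r.squared)
vif_x1  # roughly 1 / (1 - 0.835^2), i.e. about 3.3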

Exercise 3.14 - (g)

Now suppose we obtain one additional observation, which was unfortunately mismeasured.

x1=c(x1, 0.1)
x2=c(x2, 0.8)
y=c(y,6)

Re-fit the linear models from (c) to (e) using this new data. What effect does this new observation have on each of the models? In each model, is this observation an outlier? A high-leverage point? Both? Explain your answers.

Re-fit the models and draw the scatter plots

fit_c2 <- lm(y ~ x1 + x2)
summary(fit_c2)

Call:
lm(formula = y ~ x1 + x2)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.73348 -0.69318 -0.05263  0.66385  2.30619 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.2267     0.2314   9.624 7.91e-16 ***
x1            0.5394     0.5922   0.911  0.36458    
x2            2.5146     0.8977   2.801  0.00614 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.075 on 98 degrees of freedom
Multiple R-squared:  0.2188,    Adjusted R-squared:  0.2029 
F-statistic: 13.72 on 2 and 98 DF,  p-value: 5.564e-06
confint(fit_c2, 2:3, .95)
        2.5 %   97.5 %
x1 -0.6357561 1.714635
x2  0.7331298 4.296009
fit_d2 <- lm(y ~ x1)
summary(fit_d2)

Call:
lm(formula = y ~ x1)

Residuals:
    Min      1Q  Median      3Q     Max 
-2.8897 -0.6556 -0.0909  0.5682  3.5665 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.2569     0.2390   9.445 1.78e-15 ***
x1            1.7657     0.4124   4.282 4.29e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.111 on 99 degrees of freedom
Multiple R-squared:  0.1562,    Adjusted R-squared:  0.1477 
F-statistic: 18.33 on 1 and 99 DF,  p-value: 4.295e-05
confint(fit_d2, 2, .95)
       2.5 %   97.5 %
x1 0.9474479 2.583943
plot(x1, y, pch=19)
abline(fit_d2, lwd = 2.5, col = 'red')

fit_e2 <- lm(y ~ x2)
summary(fit_e2)

Call:
lm(formula = y ~ x2)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.64729 -0.71021 -0.06899  0.72699  2.38074 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   2.3451     0.1912  12.264  < 2e-16 ***
x2            3.1190     0.6040   5.164 1.25e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.074 on 99 degrees of freedom
Multiple R-squared:  0.2122,    Adjusted R-squared:  0.2042 
F-statistic: 26.66 on 1 and 99 DF,  p-value: 1.253e-06
confint(fit_e2, 2, .95)
      2.5 %   97.5 %
x2 1.920513 4.317586
plot(x2, y, pch=19)
abline(fit_e2, lwd = 2.5, col = 'red')

Plots with the new observation labeled

dta_g <- data.frame(x1, x2, y, new_point = FALSE)
dta_g$new_point[nrow(dta_g)] <- TRUE
library(ggplot2)
qplot(x1, x2, col = factor(new_point),
      data = dta_g, geom = c('point', 'smooth')) +
  scale_color_manual(values = 1:2) +
  theme_bw() + theme(legend.position = 'top')

qplot(x1, y, col = factor(new_point),
      data = dta_g, geom = c('point', 'smooth')) +
  scale_color_manual(values = 1:2) +
  theme_bw() + theme(legend.position = 'top')

qplot(x2, y, col = factor(new_point),
      data = dta_g, geom = c('point', 'smooth')) +
  scale_color_manual(values = 1:2) +
  theme_bw() + theme(legend.position = 'top')

Plots of standardized residuals against each point's leverage

library(broom)
library(dplyr)

augment(lm(y ~ x1 + x2)) %>%
  cbind(new = dta_g$new_point) %>%
  ggplot(aes(x = .hat, y = .std.resid, col = factor(new))) + 
  geom_point() + 
  geom_hline(yintercept = -2, col = "deepskyblue", size = 1, alpha = 0.5) + 
  geom_hline(yintercept = 2, col = "deepskyblue", size = 1, alpha = 0.5) + 
  geom_vline(xintercept = 3 / nrow(dta_g), col = "mediumseagreen", size = 1, alpha = 0.5) +
  scale_color_manual(values = 1:2) + 
  theme_light() + theme(legend.position = "top") +
  labs(x = "Leverage",y = "Standardized Residual", 
       title = "Residuals vs Leverage: y ~ x1 + x2")

augment(lm(y ~ x1)) %>%
  cbind(new = dta_g$new_point) %>%
  ggplot(aes(x = .hat, y = .std.resid, col = factor(new))) + 
  geom_point() + 
  geom_hline(yintercept = -2, col = "deepskyblue", size = 1, alpha = 0.5) + 
  geom_hline(yintercept = 2, col = "deepskyblue", size = 1, alpha = 0.5) + 
  geom_vline(xintercept = 2 / nrow(dta_g), col = "mediumseagreen", size = 1, alpha = 0.5) +
  scale_color_manual(values = 1:2) + 
  theme_light() + theme(legend.position = "top") +
  labs(x = "Leverage",y = "Standardized Residual", 
       title = "Residuals vs Leverage: y ~ x1")

augment(lm(y ~ x2)) %>%
  cbind(new = dta_g$new_point) %>%
  ggplot(aes(x = .hat, y = .std.resid, col = factor(new))) + 
  geom_point() + 
  geom_hline(yintercept = -2, col = "deepskyblue", size = 1, alpha = 0.5) + 
  geom_hline(yintercept = 2, col = "deepskyblue", size = 1, alpha = 0.5) + 
  geom_vline(xintercept = 2 / nrow(dta_g), col = "mediumseagreen", size = 1, alpha = 0.5) +
  scale_color_manual(values = 1:2) + 
  theme_light() + theme(legend.position = "top") +
  labs(x = "Leverage",y = "Standardized Residual", 
       title = "Residuals vs Leverage: y ~ x2")

[ANS] After adding the observation \((X_1,X_2,Y)=(0.1,0.8,6)\), each model's conclusions flip relative to before: compared with \(X_1\), \(X_2\) now has the stronger explanatory power for \(Y\) and is the more useful predictor, so with both variables in the model the effect of \(X_2\) is significant while that of \(X_1\) is not. Likewise, when \(X_1\) and \(X_2\) are each fit on their own, the effect of \(X_2\) is now the more pronounced one (smaller p-value). The residuals-vs-leverage plots above show the status of the new point in each model: in y ~ x1 + x2 it is a high-leverage point (its x2 value is far from what its x1 value would predict) while its standardized residual stays within \(\pm 2\); in y ~ x1 it is an outlier (standardized residual well beyond 2) but not high-leverage, since x1 = 0.1 lies within the range of the other x1 values; and in y ~ x2 it is high-leverage but not an outlier.
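These claims can be checked numerically; a small sketch that prints the leverage and standardized residual of the new observation (number 101) in each re-fitted model:

# The new point is the last observation in each model's data
n <- length(y)
for (m in list(fit_c2, fit_d2, fit_e2)) {
  p <- length(coef(m)) - 1  # number of predictors
  cat(deparse(formula(m)),
      "| leverage:", round(hatvalues(m)[n], 3),
      "(average leverage:", round((p + 1) / n, 3), ")",
      "| std. resid:", round(rstandard(m)[n], 2), "\n")
}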


Exercise 3.15

This problem involves the Boston data set, which we saw in the lab for this chapter. We will now try to predict per capita crime rate using the other variables in this data set. In other words, per capita crime rate is the response, and the other variables are the predictors.

library(MASS)
library(corrplot)
data(Boston)

Exercise 3.15 - (a)

For each predictor, fit a simple linear regression model to predict the response. Describe your results. In which of the models is there a statistically significant association between the predictor and the response? Create some plots to back up your assertions.

fit_a <- list(NA)
for (k in 1:13) {
  fit_a[[k]] <- lm(Boston$crim ~ Boston[,k+1])
  print(colnames(Boston)[k+1])
  print(summary(fit_a[[k]]))
  print(round(confint(fit_a[[k]], 2, .95), 4))
}
[1] "zn"

Call:
lm(formula = Boston$crim ~ Boston[, k + 1])

Residuals:
   Min     1Q Median     3Q    Max 
-4.429 -4.222 -2.620  1.250 84.523 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      4.45369    0.41722  10.675  < 2e-16 ***
Boston[, k + 1] -0.07393    0.01609  -4.594 5.51e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.435 on 504 degrees of freedom
Multiple R-squared:  0.04019,   Adjusted R-squared:  0.03828 
F-statistic:  21.1 on 1 and 504 DF,  p-value: 5.506e-06

                2.5 % 97.5 %
Boston[, k + 1]     0      0
[1] "indus"

Call:
lm(formula = Boston$crim ~ Boston[, k + 1])

Residuals:
    Min      1Q  Median      3Q     Max 
-11.972  -2.698  -0.736   0.712  81.813 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -2.06374    0.66723  -3.093  0.00209 ** 
Boston[, k + 1]  0.50978    0.05102   9.991  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.866 on 504 degrees of freedom
Multiple R-squared:  0.1653,    Adjusted R-squared:  0.1637 
F-statistic: 99.82 on 1 and 504 DF,  p-value: < 2.2e-16

                2.5 % 97.5 %
Boston[, k + 1]     0      1
[1] "chas"

Call:
lm(formula = Boston$crim ~ Boston[, k + 1])

Residuals:
   Min     1Q Median     3Q    Max 
-3.738 -3.661 -3.435  0.018 85.232 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       3.7444     0.3961   9.453   <2e-16 ***
Boston[, k + 1]  -1.8928     1.5061  -1.257    0.209    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.597 on 504 degrees of freedom
Multiple R-squared:  0.003124,  Adjusted R-squared:  0.001146 
F-statistic: 1.579 on 1 and 504 DF,  p-value: 0.2094

                2.5 % 97.5 %
Boston[, k + 1]    -5      1
[1] "nox"

Call:
lm(formula = Boston$crim ~ Boston[, k + 1])

Residuals:
    Min      1Q  Median      3Q     Max 
-12.371  -2.738  -0.974   0.559  81.728 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)      -13.720      1.699  -8.073 5.08e-15 ***
Boston[, k + 1]   31.249      2.999  10.419  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.81 on 504 degrees of freedom
Multiple R-squared:  0.1772,    Adjusted R-squared:  0.1756 
F-statistic: 108.6 on 1 and 504 DF,  p-value: < 2.2e-16

                2.5 % 97.5 %
Boston[, k + 1]    25     37
[1] "rm"

Call:
lm(formula = Boston$crim ~ Boston[, k + 1])

Residuals:
   Min     1Q Median     3Q    Max 
-6.604 -3.952 -2.654  0.989 87.197 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       20.482      3.365   6.088 2.27e-09 ***
Boston[, k + 1]   -2.684      0.532  -5.045 6.35e-07 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.401 on 504 degrees of freedom
Multiple R-squared:  0.04807,   Adjusted R-squared:  0.04618 
F-statistic: 25.45 on 1 and 504 DF,  p-value: 6.347e-07

                2.5 % 97.5 %
Boston[, k + 1]    -4     -2
[1] "age"

Call:
lm(formula = Boston$crim ~ Boston[, k + 1])

Residuals:
   Min     1Q Median     3Q    Max 
-6.789 -4.257 -1.230  1.527 82.849 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -3.77791    0.94398  -4.002 7.22e-05 ***
Boston[, k + 1]  0.10779    0.01274   8.463 2.85e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.057 on 504 degrees of freedom
Multiple R-squared:  0.1244,    Adjusted R-squared:  0.1227 
F-statistic: 71.62 on 1 and 504 DF,  p-value: 2.855e-16

                2.5 % 97.5 %
Boston[, k + 1]     0      0
[1] "dis"

Call:
lm(formula = Boston$crim ~ Boston[, k + 1])

Residuals:
   Min     1Q Median     3Q    Max 
-6.708 -4.134 -1.527  1.516 81.674 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)       9.4993     0.7304  13.006   <2e-16 ***
Boston[, k + 1]  -1.5509     0.1683  -9.213   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.965 on 504 degrees of freedom
Multiple R-squared:  0.1441,    Adjusted R-squared:  0.1425 
F-statistic: 84.89 on 1 and 504 DF,  p-value: < 2.2e-16

                2.5 % 97.5 %
Boston[, k + 1]    -2     -1
[1] "rad"

Call:
lm(formula = Boston$crim ~ Boston[, k + 1])

Residuals:
    Min      1Q  Median      3Q     Max 
-10.164  -1.381  -0.141   0.660  76.433 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -2.28716    0.44348  -5.157 3.61e-07 ***
Boston[, k + 1]  0.61791    0.03433  17.998  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.718 on 504 degrees of freedom
Multiple R-squared:  0.3913,    Adjusted R-squared:   0.39 
F-statistic: 323.9 on 1 and 504 DF,  p-value: < 2.2e-16

                2.5 % 97.5 %
Boston[, k + 1]     1      1
[1] "tax"

Call:
lm(formula = Boston$crim ~ Boston[, k + 1])

Residuals:
    Min      1Q  Median      3Q     Max 
-12.513  -2.738  -0.194   1.065  77.696 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -8.528369   0.815809  -10.45   <2e-16 ***
Boston[, k + 1]  0.029742   0.001847   16.10   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.997 on 504 degrees of freedom
Multiple R-squared:  0.3396,    Adjusted R-squared:  0.3383 
F-statistic: 259.2 on 1 and 504 DF,  p-value: < 2.2e-16

                2.5 % 97.5 %
Boston[, k + 1]     0      0
[1] "ptratio"

Call:
lm(formula = Boston$crim ~ Boston[, k + 1])

Residuals:
   Min     1Q Median     3Q    Max 
-7.654 -3.985 -1.912  1.825 83.353 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -17.6469     3.1473  -5.607 3.40e-08 ***
Boston[, k + 1]   1.1520     0.1694   6.801 2.94e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.24 on 504 degrees of freedom
Multiple R-squared:  0.08407,   Adjusted R-squared:  0.08225 
F-statistic: 46.26 on 1 and 504 DF,  p-value: 2.943e-11

                2.5 % 97.5 %
Boston[, k + 1]     1      1
[1] "black"

Call:
lm(formula = Boston$crim ~ Boston[, k + 1])

Residuals:
    Min      1Q  Median      3Q     Max 
-13.756  -2.299  -2.095  -1.296  86.822 

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)    
(Intercept)     16.553529   1.425903  11.609   <2e-16 ***
Boston[, k + 1] -0.036280   0.003873  -9.367   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.946 on 504 degrees of freedom
Multiple R-squared:  0.1483,    Adjusted R-squared:  0.1466 
F-statistic: 87.74 on 1 and 504 DF,  p-value: < 2.2e-16

                2.5 % 97.5 %
Boston[, k + 1]     0      0
[1] "lstat"

Call:
lm(formula = Boston$crim ~ Boston[, k + 1])

Residuals:
    Min      1Q  Median      3Q     Max 
-13.925  -2.822  -0.664   1.079  82.862 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     -3.33054    0.69376  -4.801 2.09e-06 ***
Boston[, k + 1]  0.54880    0.04776  11.491  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.664 on 504 degrees of freedom
Multiple R-squared:  0.2076,    Adjusted R-squared:  0.206 
F-statistic:   132 on 1 and 504 DF,  p-value: < 2.2e-16

                2.5 % 97.5 %
Boston[, k + 1]     0      1
[1] "medv"

Call:
lm(formula = Boston$crim ~ Boston[, k + 1])

Residuals:
   Min     1Q Median     3Q    Max 
-9.071 -4.022 -2.343  1.298 80.957 

Coefficients:
                Estimate Std. Error t value Pr(>|t|)    
(Intercept)     11.79654    0.93419   12.63   <2e-16 ***
Boston[, k + 1] -0.36316    0.03839   -9.46   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.934 on 504 degrees of freedom
Multiple R-squared:  0.1508,    Adjusted R-squared:  0.1491 
F-statistic: 89.49 on 1 and 504 DF,  p-value: < 2.2e-16

                2.5 % 97.5 %
Boston[, k + 1]     0      0

According to these 13 simple regressions with crim as the response, only the main effect of chas fails to reach statistical significance; the other 12 predictors are all statistically significant, indicating that they should be useful for predicting crim. In particular, when the predictor is rad or tax, the model's \(R^2\) exceeds \(.3\), much higher than for the other models, suggesting that rad and tax are the most strongly correlated with crim.
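To back this up compactly, the \(R^2\) values of all 13 simple regressions can be collected from the fit_a list built above; a short sketch:

r2 <- sapply(fit_a, function(m) summary(m)$r.squared)
names(r2) <- colnames(Boston)[-1]
round(sort(r2, decreasing = TRUE), 3)  # rad (~.39) and tax (~.34) lead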

The following plots display the relationship between each of these variables and crim.

par(mfrow=c(2,2))
for (k in c(1,2,4:13)) {
  plot(Boston[,k+1], Boston$crim, xlab=colnames(Boston)[k+1], ylab='crim',
       main=paste0('Scatter Plot of crim and ', colnames(Boston)[k+1]))
}

par(mfrow=c(1,1))
corrplot(cor(Boston[,-4]))

This plot shows that rad and tax are more strongly correlated with crim than the other variables are.

Exercise 3.15 - (b)

Fit a multiple regression model to predict the response using all of the predictors. Describe your results. For which predictors can we reject the null hypothesis \(H_0:\ \beta_j = 0\)?

fit_b <- lm(crim ~ . -crim, data=Boston)
summary(fit_b)

Call:
lm(formula = crim ~ . - crim, data = Boston)

Residuals:
   Min     1Q Median     3Q    Max 
-9.924 -2.120 -0.353  1.019 75.051 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  17.033228   7.234903   2.354 0.018949 *  
zn            0.044855   0.018734   2.394 0.017025 *  
indus        -0.063855   0.083407  -0.766 0.444294    
chas         -0.749134   1.180147  -0.635 0.525867    
nox         -10.313535   5.275536  -1.955 0.051152 .  
rm            0.430131   0.612830   0.702 0.483089    
age           0.001452   0.017925   0.081 0.935488    
dis          -0.987176   0.281817  -3.503 0.000502 ***
rad           0.588209   0.088049   6.680 6.46e-11 ***
tax          -0.003780   0.005156  -0.733 0.463793    
ptratio      -0.271081   0.186450  -1.454 0.146611    
black        -0.007538   0.003673  -2.052 0.040702 *  
lstat         0.126211   0.075725   1.667 0.096208 .  
medv         -0.198887   0.060516  -3.287 0.001087 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.439 on 492 degrees of freedom
Multiple R-squared:  0.454, Adjusted R-squared:  0.4396 
F-statistic: 31.47 on 13 and 492 DF,  p-value: < 2.2e-16
round(confint(fit_b, 2:14, .95), 4)
           2.5 %  97.5 %
zn        0.0080  0.0817
indus    -0.2277  0.1000
chas     -3.0679  1.5696
nox     -20.6789  0.0518
rm       -0.7740  1.6342
age      -0.0338  0.0367
dis      -1.5409 -0.4335
rad       0.4152  0.7612
tax      -0.0139  0.0063
ptratio  -0.6374  0.0953
black    -0.0148 -0.0003
lstat    -0.0226  0.2750
medv     -0.3178 -0.0800

In this multiple regression model, five predictors (zn, dis, rad, black, and medv) are statistically significant (\(\alpha=.05\)). For these we can reject \(H_0:\beta_j=0\); equivalently, the \(1-\alpha=95\%\) confidence intervals of their coefficients \(\beta_j\) do not contain 0.
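The significant predictors can also be extracted programmatically; a short sketch that reads the p-value column of the coefficient table:

pvals <- summary(fit_b)$coefficients[-1, 4]  # drop the intercept row
names(pvals)[pvals < .05]                    # zn, dis, rad, black, medv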

Exercise 3.15 - (c)

How do your results from (a) compare to your results from (b)? Create a plot displaying the univariate regression coefficients from (a) on the x-axis, and the multiple regression coefficients from (b) on the y-axis. That is, each predictor is displayed as a single point in the plot. Its coefficient in a simple linear regression model is shown on the x-axis, and its coefficient estimate in the multiple linear regression model is shown on the y-axis.

coef_a <- NA
for (k in 1:13) {coef_a <- c(coef_a, fit_a[[k]]$coefficients[2])}
coef_b <- fit_b$coefficients[-1]
coef_a
                Boston[, k + 1] Boston[, k + 1] Boston[, k + 1] Boston[, k + 1] 
             NA     -0.07393498      0.50977633     -1.89277655     31.24853120 
Boston[, k + 1] Boston[, k + 1] Boston[, k + 1] Boston[, k + 1] Boston[, k + 1] 
    -2.68405122      0.10778623     -1.55090168      0.61791093      0.02974225 
Boston[, k + 1] Boston[, k + 1] Boston[, k + 1] Boston[, k + 1] 
     1.15198279     -0.03627964      0.54880478     -0.36315992 
coef_b
           zn         indus          chas           nox            rm 
  0.044855215  -0.063854824  -0.749133611 -10.313534912   0.430130506 
          age           dis           rad           tax       ptratio 
  0.001451643  -0.987175726   0.588208591  -0.003780016  -0.271080558 
        black         lstat          medv 
 -0.007537505   0.126211376  -0.198886821 

Several predictors that were significant in the simple regressions are no longer significant here. This is most likely because the predictors are themselves correlated: the variation in crim that one predictor could explain is absorbed by another, correlated predictor, so its own effect no longer reaches statistical significance.

plot(coef_a[-1], coef_b,
     xlab='the univariate regression coefficients',
     ylab='the multiple regression coefficients')

This plot shows one outlying point, the coefficient pair for nox (31.25 in the simple regression vs. \(-10.31\) in the multiple regression), which makes the overall pattern of simple vs. multiple coefficients look strongly negatively correlated.
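Labeling each point makes the outlier easy to identify; a small sketch extending the plot above:

plot(coef_a[-1], coef_b,
     xlab='the univariate regression coefficients',
     ylab='the multiple regression coefficients')
text(coef_a[-1], coef_b, labels=names(coef_b), pos=3, cex=0.7)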

Exercise 3.15 - (d)

Is there evidence of non-linear association between any of the predictors and the response? To answer this question, for each predictor \(X\), fit a model of the form \(Y = \beta_0 + \beta_1X + \beta_2X^2 + \beta_3X^3 + \epsilon\).

fit_d <- list(NA)
for (k in 1:13) {
  X <- Boston[,k+1]
  fit_d[[k]] <- lm(Boston$crim ~ I(X) + I(X^2) + I(X^3))
  print(colnames(Boston)[k+1])
  print(summary(fit_d[[k]]))
  print(round(confint(fit_d[[k]], 2:4, .95),4))
}
[1] "zn"

Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))

Residuals:
   Min     1Q Median     3Q    Max 
-4.821 -4.614 -1.294  0.473 84.130 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.846e+00  4.330e-01  11.192  < 2e-16 ***
I(X)        -3.322e-01  1.098e-01  -3.025  0.00261 ** 
I(X^2)       6.483e-03  3.861e-03   1.679  0.09375 .  
I(X^3)      -3.776e-05  3.139e-05  -1.203  0.22954    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.372 on 502 degrees of freedom
Multiple R-squared:  0.05824,   Adjusted R-squared:  0.05261 
F-statistic: 10.35 on 3 and 502 DF,  p-value: 1.281e-06

         2.5 %  97.5 %
I(X)   -0.5479 -0.1164
I(X^2) -0.0011  0.0141
I(X^3) -0.0001  0.0000
[1] "indus"

Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))

Residuals:
   Min     1Q Median     3Q    Max 
-8.278 -2.514  0.054  0.764 79.713 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  3.6625683  1.5739833   2.327   0.0204 *  
I(X)        -1.9652129  0.4819901  -4.077 5.30e-05 ***
I(X^2)       0.2519373  0.0393221   6.407 3.42e-10 ***
I(X^3)      -0.0069760  0.0009567  -7.292 1.20e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.423 on 502 degrees of freedom
Multiple R-squared:  0.2597,    Adjusted R-squared:  0.2552 
F-statistic: 58.69 on 3 and 502 DF,  p-value: < 2.2e-16

         2.5 %  97.5 %
I(X)   -2.9122 -1.0182
I(X^2)  0.1747  0.3292
I(X^3) -0.0089 -0.0051
[1] "chas"

Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))

Residuals:
   Min     1Q Median     3Q    Max 
-3.738 -3.661 -3.435  0.018 85.232 

Coefficients: (2 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   3.7444     0.3961   9.453   <2e-16 ***
I(X)         -1.8928     1.5061  -1.257    0.209    
I(X^2)            NA         NA      NA       NA    
I(X^3)            NA         NA      NA       NA    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.597 on 504 degrees of freedom
Multiple R-squared:  0.003124,  Adjusted R-squared:  0.001146 
F-statistic: 1.579 on 1 and 504 DF,  p-value: 0.2094

         2.5 % 97.5 %
I(X)   -4.8518 1.0663
I(X^2)      NA     NA
I(X^3)      NA     NA
[1] "nox"

Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))

Residuals:
   Min     1Q Median     3Q    Max 
-9.110 -2.068 -0.255  0.739 78.302 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   233.09      33.64   6.928 1.31e-11 ***
I(X)        -1279.37     170.40  -7.508 2.76e-13 ***
I(X^2)       2248.54     279.90   8.033 6.81e-15 ***
I(X^3)      -1245.70     149.28  -8.345 6.96e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.234 on 502 degrees of freedom
Multiple R-squared:  0.297, Adjusted R-squared:  0.2928 
F-statistic: 70.69 on 3 and 502 DF,  p-value: < 2.2e-16

           2.5 %    97.5 %
I(X)   -1614.151 -944.5912
I(X^2)  1698.626 2798.4624
I(X^3) -1538.997 -952.4091
[1] "rm"

Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))

Residuals:
    Min      1Q  Median      3Q     Max 
-18.485  -3.468  -2.221  -0.015  87.219 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)  
(Intercept) 112.6246    64.5172   1.746   0.0815 .
I(X)        -39.1501    31.3115  -1.250   0.2118  
I(X^2)        4.5509     5.0099   0.908   0.3641  
I(X^3)       -0.1745     0.2637  -0.662   0.5086  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.33 on 502 degrees of freedom
Multiple R-squared:  0.06779,   Adjusted R-squared:  0.06222 
F-statistic: 12.17 on 3 and 502 DF,  p-value: 1.067e-07

           2.5 %  97.5 %
I(X)   -100.6679 22.3676
I(X^2)   -5.2920 14.3938
I(X^3)   -0.6927  0.3437
[1] "age"

Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))

Residuals:
   Min     1Q Median     3Q    Max 
-9.762 -2.673 -0.516  0.019 82.842 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)   
(Intercept) -2.549e+00  2.769e+00  -0.920  0.35780   
I(X)         2.737e-01  1.864e-01   1.468  0.14266   
I(X^2)      -7.230e-03  3.637e-03  -1.988  0.04738 * 
I(X^3)       5.745e-05  2.109e-05   2.724  0.00668 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.84 on 502 degrees of freedom
Multiple R-squared:  0.1742,    Adjusted R-squared:  0.1693 
F-statistic: 35.31 on 3 and 502 DF,  p-value: < 2.2e-16

         2.5 %  97.5 %
I(X)   -0.0925  0.6398
I(X^2) -0.0144 -0.0001
I(X^3)  0.0000  0.0001
[1] "dis"

Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))

Residuals:
    Min      1Q  Median      3Q     Max 
-10.757  -2.588   0.031   1.267  76.378 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  30.0476     2.4459  12.285  < 2e-16 ***
I(X)        -15.5543     1.7360  -8.960  < 2e-16 ***
I(X^2)        2.4521     0.3464   7.078 4.94e-12 ***
I(X^3)       -0.1186     0.0204  -5.814 1.09e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.331 on 502 degrees of freedom
Multiple R-squared:  0.2778,    Adjusted R-squared:  0.2735 
F-statistic: 64.37 on 3 and 502 DF,  p-value: < 2.2e-16

          2.5 %   97.5 %
I(X)   -18.9650 -12.1437
I(X^2)   1.7715   3.1327
I(X^3)  -0.1587  -0.0785
[1] "rad"

Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))

Residuals:
    Min      1Q  Median      3Q     Max 
-10.381  -0.412  -0.269   0.179  76.217 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.605545   2.050108  -0.295    0.768
I(X)         0.512736   1.043597   0.491    0.623
I(X^2)      -0.075177   0.148543  -0.506    0.613
I(X^3)       0.003209   0.004564   0.703    0.482

Residual standard error: 6.682 on 502 degrees of freedom
Multiple R-squared:    0.4, Adjusted R-squared:  0.3965 
F-statistic: 111.6 on 3 and 502 DF,  p-value: < 2.2e-16

         2.5 % 97.5 %
I(X)   -1.5376 2.5631
I(X^2) -0.3670 0.2167
I(X^3) -0.0058 0.0122
[1] "tax"

Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))

Residuals:
    Min      1Q  Median      3Q     Max 
-13.273  -1.389   0.046   0.536  76.950 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  1.918e+01  1.180e+01   1.626    0.105
I(X)        -1.533e-01  9.568e-02  -1.602    0.110
I(X^2)       3.608e-04  2.425e-04   1.488    0.137
I(X^3)      -2.204e-07  1.889e-07  -1.167    0.244

Residual standard error: 6.854 on 502 degrees of freedom
Multiple R-squared:  0.3689,    Adjusted R-squared:  0.3651 
F-statistic:  97.8 on 3 and 502 DF,  p-value: < 2.2e-16

         2.5 % 97.5 %
I(X)   -0.3413 0.0347
I(X^2) -0.0001 0.0008
I(X^3)  0.0000 0.0000
[1] "ptratio"

Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))

Residuals:
   Min     1Q Median     3Q    Max 
-6.833 -4.146 -1.655  1.408 82.697 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)   
(Intercept) 477.18405  156.79498   3.043  0.00246 **
I(X)        -82.36054   27.64394  -2.979  0.00303 **
I(X^2)        4.63535    1.60832   2.882  0.00412 **
I(X^3)       -0.08476    0.03090  -2.743  0.00630 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.122 on 502 degrees of freedom
Multiple R-squared:  0.1138,    Adjusted R-squared:  0.1085 
F-statistic: 21.48 on 3 and 502 DF,  p-value: 4.171e-13

           2.5 %   97.5 %
I(X)   -136.6726 -28.0485
I(X^2)    1.4755   7.7952
I(X^3)   -0.1455  -0.0241
[1] "black"

Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))

Residuals:
    Min      1Q  Median      3Q     Max 
-13.096  -2.343  -2.128  -1.439  86.790 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept)  1.826e+01  2.305e+00   7.924  1.5e-14 ***
I(X)        -8.356e-02  5.633e-02  -1.483    0.139    
I(X^2)       2.137e-04  2.984e-04   0.716    0.474    
I(X^3)      -2.652e-07  4.364e-07  -0.608    0.544    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.955 on 502 degrees of freedom
Multiple R-squared:  0.1498,    Adjusted R-squared:  0.1448 
F-statistic: 29.49 on 3 and 502 DF,  p-value: < 2.2e-16

         2.5 % 97.5 %
I(X)   -0.1942 0.0271
I(X^2) -0.0004 0.0008
I(X^3)  0.0000 0.0000
[1] "lstat"

Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))

Residuals:
    Min      1Q  Median      3Q     Max 
-15.234  -2.151  -0.486   0.066  83.353 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)  
(Intercept)  1.2009656  2.0286452   0.592   0.5541  
I(X)        -0.4490656  0.4648911  -0.966   0.3345  
I(X^2)       0.0557794  0.0301156   1.852   0.0646 .
I(X^3)      -0.0008574  0.0005652  -1.517   0.1299  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 7.629 on 502 degrees of freedom
Multiple R-squared:  0.2179,    Adjusted R-squared:  0.2133 
F-statistic: 46.63 on 3 and 502 DF,  p-value: < 2.2e-16

         2.5 % 97.5 %
I(X)   -1.3624 0.4643
I(X^2) -0.0034 0.1149
I(X^3) -0.0020 0.0003
[1] "medv"

Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))

Residuals:
    Min      1Q  Median      3Q     Max 
-24.427  -1.976  -0.437   0.439  73.655 

Coefficients:
              Estimate Std. Error t value Pr(>|t|)    
(Intercept) 53.1655381  3.3563105  15.840  < 2e-16 ***
I(X)        -5.0948305  0.4338321 -11.744  < 2e-16 ***
I(X^2)       0.1554965  0.0171904   9.046  < 2e-16 ***
I(X^3)      -0.0014901  0.0002038  -7.312 1.05e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 6.569 on 502 degrees of freedom
Multiple R-squared:  0.4202,    Adjusted R-squared:  0.4167 
F-statistic: 121.3 on 3 and 502 DF,  p-value: < 2.2e-16

         2.5 %  97.5 %
I(X)   -5.9472 -4.2425
I(X^2)  0.1217  0.1893
I(X^3) -0.0019 -0.0011

Among these predictors, the quadratic and cubic terms of indus, nox, age, dis, ptratio, and medv are all statistically significant, so there is evidence of a non-linear association between each of these predictors and the response.
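As a side note, the same cubic fits can be written more idiomatically with poly(); a brief sketch for nox, where raw = TRUE reproduces the I(X) + I(X^2) + I(X^3) parameterization used above:

fit_nox3 <- lm(crim ~ poly(nox, 3, raw = TRUE), data = Boston)
summary(fit_nox3)  # same coefficient estimates and t-tests as the cubic fit for nox above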