Machine Learning - HW3
Author: Jay Liao (ID: RE6094028)
Exercise 3.14
This problem focuses on the collinearity problem.
Exercise 3.14 - (a)
Perform the following commands in R:
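The code chunk itself did not render; the commands, as given in the textbook for this exercise, are:

set.seed(1)
x1 <- runif(100)
x2 <- 0.5 * x1 + rnorm(100) / 10
y <- 2 + 2 * x1 + 0.3 * x2 + rnorm(100)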
The last line corresponds to creating a linear model in which y is a function of x1 and x2. Write out the form of the linear model. What are the regression coefficients?
[ANS] The linear model has the form \(Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon\), and the regression coefficients are \(\beta_0=2,\ \beta_1=2\), and \(\beta_2=0.3\).
Exercise 3.14 - (b)
What is the correlation between x1 and x2? Create a scatter plot displaying the relationship between the variables.
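The correlation and scatter plot can be obtained as follows (a sketch; the original chunk was not rendered):

cor(x1, x2)
plot(x1, x2, xlab = 'x1', ylab = 'x2')  # scatter plot of the two predictors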
[1] 0.8351212
[ANS] The correlation between x1 and x2 is 0.8351212. The scatter plot shows a strong positive linear relationship between the two variables.
Exercise 3.14 - (c)
Using this data, fit a least squares regression to predict y using x1 and x2. Describe the results obtained. What are \(\beta_0\), \(\beta_1\), and \(\beta_2\)? How do these relate to the true \(\beta_0\), \(\beta_1\), and \(\beta_2\)? Can you reject the null hypothesis \(H_0:\ \beta_1 = 0\)? How about the null hypothesis \(H_0:\ \beta_2 = 0\)?
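A sketch of the fit that produces the output below (the object name is assumed; the original chunk was not rendered):

fit_c <- lm(y ~ x1 + x2)
summary(fit_c)
confint(fit_c, 2:3, level = .95)  # 95% CIs for the two slopes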
Call:
lm(formula = y ~ x1 + x2)
Residuals:
Min 1Q Median 3Q Max
-2.8311 -0.7273 -0.0537 0.6338 2.3359
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.1305 0.2319 9.188 7.61e-15 ***
x1 1.4396 0.7212 1.996 0.0487 *
x2 1.0097 1.1337 0.891 0.3754
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.056 on 97 degrees of freedom
Multiple R-squared: 0.2088, Adjusted R-squared: 0.1925
F-statistic: 12.8 on 2 and 97 DF, p-value: 1.164e-05
2.5 % 97.5 %
x1 0.008213776 2.870897
x2 -1.240451256 3.259800
[ANS] According to the results of the regression analysis, x1 significantly (\(\alpha = .05\)) predicts y while x2 does not. That is, we can reject \(H_0:\ \beta_1 = 0\) since the \(95\%\) CI of \(\beta_1\) does not cover 0, while we cannot reject \(H_0:\ \beta_2 = 0\) since the \(95\%\) CI of \(\beta_2\) covers 0. The estimates \(\hat\beta_0 = 2.13\), \(\hat\beta_1 = 1.44\), and \(\hat\beta_2 = 1.01\) deviate from the true values \((2,\ 2,\ 0.3)\): \(\hat\beta_1\) falls well below \(\beta_1\) and \(\hat\beta_2\) well above \(\beta_2\), a consequence of the collinearity between x1 and x2.
Exercise 3.14 - (d)
Now fit a least squares regression to predict y using only x1. Comment on your results. Can you reject the null hypothesis \(H_0:\ \beta_1 = 0\)?
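As before, a sketch of the fit (object name assumed):

fit_x1 <- lm(y ~ x1)
summary(fit_x1)
confint(fit_x1, 2, level = .95)  # 95% CI for the slope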
Call:
lm(formula = y ~ x1)
Residuals:
Min 1Q Median 3Q Max
-2.89495 -0.66874 -0.07785 0.59221 2.45560
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.1124 0.2307 9.155 8.27e-15 ***
x1 1.9759 0.3963 4.986 2.66e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.055 on 98 degrees of freedom
Multiple R-squared: 0.2024, Adjusted R-squared: 0.1942
F-statistic: 24.86 on 1 and 98 DF, p-value: 2.661e-06
2.5 % 97.5 %
x1 1.189529 2.762329
[ANS] According to the results of regression analysis, x1 can significantly (\(\alpha = .05\)) predict y. That is, we can reject \(H_0:\ \beta_1 = 0\) since the \(95\%\) CI of \(\beta_1\) does not cover 0.
Exercise 3.14 - (e)
Now fit a least squares regression to predict y using only x2. Comment on your results. Can you reject the null hypothesis \(H_0:\ \beta_1 = 0\)?
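Again, a sketch of the fit (object name assumed):

fit_x2 <- lm(y ~ x2)
summary(fit_x2)
confint(fit_x2, 2, level = .95)  # 95% CI for the slope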
Call:
lm(formula = y ~ x2)
Residuals:
Min 1Q Median 3Q Max
-2.62687 -0.75156 -0.03598 0.72383 2.44890
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.3899 0.1949 12.26 < 2e-16 ***
x2 2.8996 0.6330 4.58 1.37e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.072 on 98 degrees of freedom
Multiple R-squared: 0.1763, Adjusted R-squared: 0.1679
F-statistic: 20.98 on 1 and 98 DF, p-value: 1.366e-05
2.5 % 97.5 %
x2 1.643324 4.155846
[ANS] According to the results of the regression analysis, x2 significantly (\(\alpha = .05\)) predicts y. That is, we can reject \(H_0:\ \beta_1 = 0\) since the \(95\%\) CI of \(\beta_1\) does not cover 0.
Exercise 3.14 - (f)
Do the results obtained in (c)–(e) contradict each other? Explain your answer.
[ANS] No, they do not contradict one another. Compared with \(X_2\), \(X_1\) has the stronger explanatory power for \(Y\) and is the more useful predictor, so when \(X_1\) and \(X_2\) are both included in the model, the effect of \(X_1\) is significant while that of \(X_2\) is not; this happens because \(X_2\) is highly positively correlated with \(X_1\) (\(r_{X_1,X_2}=0.8351212\)). For the same reason, when \(X_1\) and \(X_2\) are each fitted alone, the effect of \(X_1\) remains the more pronounced one (smaller p-value), and the coefficient estimate for \(X_1\) is also closer to its true value (the estimate for \(X_2\) is inaccurate; even its \(95\%\) confidence interval fails to cover the true value). This is the so-called collinearity problem.
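As a further check on the collinearity (not part of the original analysis), one can compute variance inflation factors; a VIF well above 1 indicates that a predictor is largely explained by the other predictors.

library(car)  # assumed available; provides vif()
vif(lm(y ~ x1 + x2))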
Exercise 3.14 - (g)
Now suppose we obtain one additional observation, which was unfortunately mismeasured.
Re-fit the linear models from (c) to (e) using this new data. What effect does this new observation have on each of the models? In each model, is this observation an outlier? A high-leverage point? Both? Explain your answers.
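The mismeasured observation is \((X_1, X_2, Y) = (0.1, 0.8, 6)\), per the values referenced in the answer below; a sketch of how it would be appended (the original chunk was not shown):

x1 <- c(x1, 0.1)
x2 <- c(x2, 0.8)
y <- c(y, 6)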
Re-fit and plot the scatter plots
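A sketch of the refits that produce the output below (object names are hypothetical):

fit_g1 <- lm(y ~ x1 + x2); summary(fit_g1); confint(fit_g1, 2:3)
fit_g2 <- lm(y ~ x1); summary(fit_g2); confint(fit_g2, 2)
fit_g3 <- lm(y ~ x2); summary(fit_g3); confint(fit_g3, 2)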
Call:
lm(formula = y ~ x1 + x2)
Residuals:
Min 1Q Median 3Q Max
-2.73348 -0.69318 -0.05263 0.66385 2.30619
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.2267 0.2314 9.624 7.91e-16 ***
x1 0.5394 0.5922 0.911 0.36458
x2 2.5146 0.8977 2.801 0.00614 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.075 on 98 degrees of freedom
Multiple R-squared: 0.2188, Adjusted R-squared: 0.2029
F-statistic: 13.72 on 2 and 98 DF, p-value: 5.564e-06
2.5 % 97.5 %
x1 -0.6357561 1.714635
x2 0.7331298 4.296009
Call:
lm(formula = y ~ x1)
Residuals:
Min 1Q Median 3Q Max
-2.8897 -0.6556 -0.0909 0.5682 3.5665
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.2569 0.2390 9.445 1.78e-15 ***
x1 1.7657 0.4124 4.282 4.29e-05 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.111 on 99 degrees of freedom
Multiple R-squared: 0.1562, Adjusted R-squared: 0.1477
F-statistic: 18.33 on 1 and 99 DF, p-value: 4.295e-05
2.5 % 97.5 %
x1 0.9474479 2.583943
Call:
lm(formula = y ~ x2)
Residuals:
Min 1Q Median 3Q Max
-2.64729 -0.71021 -0.06899 0.72699 2.38074
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.3451 0.1912 12.264 < 2e-16 ***
x2 3.1190 0.6040 5.164 1.25e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1.074 on 99 degrees of freedom
Multiple R-squared: 0.2122, Adjusted R-squared: 0.2042
F-statistic: 26.66 on 1 and 99 DF, p-value: 1.253e-06
2.5 % 97.5 %
x2 1.920513 4.317586
Plot with outlier labels
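The plotting code below references dta_g and new_point, which come from a chunk that did not render; a plausible reconstruction (the column names follow the later code, but the construction itself is assumed):

dta_g <- data.frame(x1 = x1, x2 = x2, y = y,
                    new_point = c(rep(0, 100), 1))  # 1 flags the added 101st observation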
library(ggplot2)
qplot(x1, x2, col = factor(new_point),
      data = dta_g, geom = c('point', 'smooth')) +
  scale_color_manual(values = 1:2) +
  theme_bw() + theme(legend.position = 'top')
qplot(x1, y, col = factor(new_point),
      data = dta_g, geom = c('point', 'smooth')) +
  scale_color_manual(values = 1:2) +
  theme_bw() + theme(legend.position = 'top')
qplot(x2, y, col = factor(new_point),
      data = dta_g, geom = c('point', 'smooth')) +
  scale_color_manual(values = 1:2) +
  theme_bw() + theme(legend.position = 'top')
Plot with the leverage of a point
library(broom)
library(dplyr)
augment(lm(y ~ x1 + x2)) %>%
  cbind(new = dta_g$new_point) %>%
  ggplot(aes(x = .hat, y = .std.resid, col = factor(new))) +
  geom_point() +
  geom_hline(yintercept = -2, col = "deepskyblue", size = 1, alpha = 0.5) +
  geom_hline(yintercept = 2, col = "deepskyblue", size = 1, alpha = 0.5) +
  geom_vline(xintercept = 3 / nrow(dta_g), col = "mediumseagreen", size = 1, alpha = 0.5) +  # mean leverage (p + 1)/n with p = 2
  scale_color_manual(values = 1:2) +
  theme_light() + theme(legend.position = "top") +
  labs(x = "Leverage", y = "Standardized Residual",
       title = "Residuals vs Leverage: y ~ x1 + x2")
augment(lm(y ~ x1)) %>%
  cbind(new = dta_g$new_point) %>%
  ggplot(aes(x = .hat, y = .std.resid, col = factor(new))) +
  geom_point() +
  geom_hline(yintercept = -2, col = "deepskyblue", size = 1, alpha = 0.5) +
  geom_hline(yintercept = 2, col = "deepskyblue", size = 1, alpha = 0.5) +
  geom_vline(xintercept = 2 / nrow(dta_g), col = "mediumseagreen", size = 1, alpha = 0.5) +  # mean leverage (p + 1)/n with p = 1
  scale_color_manual(values = 1:2) +
  theme_light() + theme(legend.position = "top") +
  labs(x = "Leverage", y = "Standardized Residual",
       title = "Residuals vs Leverage: y ~ x1")
augment(lm(y ~ x2)) %>%
  cbind(new = dta_g$new_point) %>%
  ggplot(aes(x = .hat, y = .std.resid, col = factor(new))) +
  geom_point() +
  geom_hline(yintercept = -2, col = "deepskyblue", size = 1, alpha = 0.5) +
  geom_hline(yintercept = 2, col = "deepskyblue", size = 1, alpha = 0.5) +
  geom_vline(xintercept = 2 / nrow(dta_g), col = "mediumseagreen", size = 1, alpha = 0.5) +  # mean leverage (p + 1)/n with p = 1
  scale_color_manual(values = 1:2) +
  theme_light() + theme(legend.position = "top") +
  labs(x = "Leverage", y = "Standardized Residual",
       title = "Residuals vs Leverage: y ~ x2")
[ANS] After adding the observation \((X_1,X_2,Y)=(0.1,0.8,6)\), each model's results are the reverse of those obtained before: compared with \(X_1\), \(X_2\) now has the stronger explanatory power for \(Y\) and is the more useful predictor, so when both \(X_1\) and \(X_2\) are included in the model, the effect of \(X_2\) is significant while that of \(X_1\) is not. Likewise, when \(X_1\) and \(X_2\) are each fitted alone, the effect of \(X_2\) is now the more pronounced one (smaller p-value).
Exercise 3.15
This problem involves the Boston data set, which we saw in the lab for this chapter. We will now try to predict per capita crime rate using the other variables in this data set. In other words, per capita crime rate is the response, and the other variables are the predictors.
Exercise 3.15 - (a)
For each predictor, fit a simple linear regression model to predict the response. Describe your results. In which of the models is there a statistically significant association between the predictor and the response? Create some plots to back up your assertions.
library(MASS)  # provides the Boston data set (assumed; the original load chunk was not shown)

fit_a <- list(NA)
for (k in 1:13) {
  fit_a[[k]] <- lm(Boston$crim ~ Boston[, k + 1])       # simple regression on the k-th predictor
  print(colnames(Boston)[k + 1])
  print(summary(fit_a[[k]]))
  print(round(confint(fit_a[[k]], 2, level = .95), 4))  # 95% CI for the slope
}
[1] "zn"
Call:
lm(formula = Boston$crim ~ Boston[, k + 1])
Residuals:
Min 1Q Median 3Q Max
-4.429 -4.222 -2.620 1.250 84.523
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.45369 0.41722 10.675 < 2e-16 ***
Boston[, k + 1] -0.07393 0.01609 -4.594 5.51e-06 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.435 on 504 degrees of freedom
Multiple R-squared: 0.04019, Adjusted R-squared: 0.03828
F-statistic: 21.1 on 1 and 504 DF, p-value: 5.506e-06
2.5 % 97.5 %
Boston[, k + 1] 0 0
[1] "indus"
Call:
lm(formula = Boston$crim ~ Boston[, k + 1])
Residuals:
Min 1Q Median 3Q Max
-11.972 -2.698 -0.736 0.712 81.813
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.06374 0.66723 -3.093 0.00209 **
Boston[, k + 1] 0.50978 0.05102 9.991 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.866 on 504 degrees of freedom
Multiple R-squared: 0.1653, Adjusted R-squared: 0.1637
F-statistic: 99.82 on 1 and 504 DF, p-value: < 2.2e-16
2.5 % 97.5 %
Boston[, k + 1] 0 1
[1] "chas"
Call:
lm(formula = Boston$crim ~ Boston[, k + 1])
Residuals:
Min 1Q Median 3Q Max
-3.738 -3.661 -3.435 0.018 85.232
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.7444 0.3961 9.453 <2e-16 ***
Boston[, k + 1] -1.8928 1.5061 -1.257 0.209
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.597 on 504 degrees of freedom
Multiple R-squared: 0.003124, Adjusted R-squared: 0.001146
F-statistic: 1.579 on 1 and 504 DF, p-value: 0.2094
2.5 % 97.5 %
Boston[, k + 1] -5 1
[1] "nox"
Call:
lm(formula = Boston$crim ~ Boston[, k + 1])
Residuals:
Min 1Q Median 3Q Max
-12.371 -2.738 -0.974 0.559 81.728
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -13.720 1.699 -8.073 5.08e-15 ***
Boston[, k + 1] 31.249 2.999 10.419 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.81 on 504 degrees of freedom
Multiple R-squared: 0.1772, Adjusted R-squared: 0.1756
F-statistic: 108.6 on 1 and 504 DF, p-value: < 2.2e-16
2.5 % 97.5 %
Boston[, k + 1] 25 37
[1] "rm"
Call:
lm(formula = Boston$crim ~ Boston[, k + 1])
Residuals:
Min 1Q Median 3Q Max
-6.604 -3.952 -2.654 0.989 87.197
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 20.482 3.365 6.088 2.27e-09 ***
Boston[, k + 1] -2.684 0.532 -5.045 6.35e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.401 on 504 degrees of freedom
Multiple R-squared: 0.04807, Adjusted R-squared: 0.04618
F-statistic: 25.45 on 1 and 504 DF, p-value: 6.347e-07
2.5 % 97.5 %
Boston[, k + 1] -4 -2
[1] "age"
Call:
lm(formula = Boston$crim ~ Boston[, k + 1])
Residuals:
Min 1Q Median 3Q Max
-6.789 -4.257 -1.230 1.527 82.849
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.77791 0.94398 -4.002 7.22e-05 ***
Boston[, k + 1] 0.10779 0.01274 8.463 2.85e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.057 on 504 degrees of freedom
Multiple R-squared: 0.1244, Adjusted R-squared: 0.1227
F-statistic: 71.62 on 1 and 504 DF, p-value: 2.855e-16
2.5 % 97.5 %
Boston[, k + 1] 0 0
[1] "dis"
Call:
lm(formula = Boston$crim ~ Boston[, k + 1])
Residuals:
Min 1Q Median 3Q Max
-6.708 -4.134 -1.527 1.516 81.674
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.4993 0.7304 13.006 <2e-16 ***
Boston[, k + 1] -1.5509 0.1683 -9.213 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.965 on 504 degrees of freedom
Multiple R-squared: 0.1441, Adjusted R-squared: 0.1425
F-statistic: 84.89 on 1 and 504 DF, p-value: < 2.2e-16
2.5 % 97.5 %
Boston[, k + 1] -2 -1
[1] "rad"
Call:
lm(formula = Boston$crim ~ Boston[, k + 1])
Residuals:
Min 1Q Median 3Q Max
-10.164 -1.381 -0.141 0.660 76.433
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.28716 0.44348 -5.157 3.61e-07 ***
Boston[, k + 1] 0.61791 0.03433 17.998 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.718 on 504 degrees of freedom
Multiple R-squared: 0.3913, Adjusted R-squared: 0.39
F-statistic: 323.9 on 1 and 504 DF, p-value: < 2.2e-16
2.5 % 97.5 %
Boston[, k + 1] 1 1
[1] "tax"
Call:
lm(formula = Boston$crim ~ Boston[, k + 1])
Residuals:
Min 1Q Median 3Q Max
-12.513 -2.738 -0.194 1.065 77.696
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -8.528369 0.815809 -10.45 <2e-16 ***
Boston[, k + 1] 0.029742 0.001847 16.10 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.997 on 504 degrees of freedom
Multiple R-squared: 0.3396, Adjusted R-squared: 0.3383
F-statistic: 259.2 on 1 and 504 DF, p-value: < 2.2e-16
2.5 % 97.5 %
Boston[, k + 1] 0 0
[1] "ptratio"
Call:
lm(formula = Boston$crim ~ Boston[, k + 1])
Residuals:
Min 1Q Median 3Q Max
-7.654 -3.985 -1.912 1.825 83.353
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.6469 3.1473 -5.607 3.40e-08 ***
Boston[, k + 1] 1.1520 0.1694 6.801 2.94e-11 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.24 on 504 degrees of freedom
Multiple R-squared: 0.08407, Adjusted R-squared: 0.08225
F-statistic: 46.26 on 1 and 504 DF, p-value: 2.943e-11
2.5 % 97.5 %
Boston[, k + 1] 1 1
[1] "black"
Call:
lm(formula = Boston$crim ~ Boston[, k + 1])
Residuals:
Min 1Q Median 3Q Max
-13.756 -2.299 -2.095 -1.296 86.822
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.553529 1.425903 11.609 <2e-16 ***
Boston[, k + 1] -0.036280 0.003873 -9.367 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.946 on 504 degrees of freedom
Multiple R-squared: 0.1483, Adjusted R-squared: 0.1466
F-statistic: 87.74 on 1 and 504 DF, p-value: < 2.2e-16
2.5 % 97.5 %
Boston[, k + 1] 0 0
[1] "lstat"
Call:
lm(formula = Boston$crim ~ Boston[, k + 1])
Residuals:
Min 1Q Median 3Q Max
-13.925 -2.822 -0.664 1.079 82.862
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -3.33054 0.69376 -4.801 2.09e-06 ***
Boston[, k + 1] 0.54880 0.04776 11.491 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.664 on 504 degrees of freedom
Multiple R-squared: 0.2076, Adjusted R-squared: 0.206
F-statistic: 132 on 1 and 504 DF, p-value: < 2.2e-16
2.5 % 97.5 %
Boston[, k + 1] 0 1
[1] "medv"
Call:
lm(formula = Boston$crim ~ Boston[, k + 1])
Residuals:
Min 1Q Median 3Q Max
-9.071 -4.022 -2.343 1.298 80.957
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 11.79654 0.93419 12.63 <2e-16 ***
Boston[, k + 1] -0.36316 0.03839 -9.46 <2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.934 on 504 degrees of freedom
Multiple R-squared: 0.1508, Adjusted R-squared: 0.1491
F-statistic: 89.49 on 1 and 504 DF, p-value: < 2.2e-16
2.5 % 97.5 %
Boston[, k + 1] 0 0
[ANS] According to these 13 simple regression models with crim as the response, only the main effect of chas fails to reach statistical significance; the main effects of the other 12 predictors are all statistically significant, indicating that those variables should be helpful for predicting crim. Among them, the models with rad and tax as predictors have \(R^2\) greater than \(.3\), much higher than the other models, suggesting that rad and tax are more strongly associated with crim.
The following plots display the association between these variables and crim.
par(mfrow = c(2, 2))
for (k in c(1, 2, 4:13)) {  # skip k = 3 (chas, a dummy variable)
  plot(Boston[, k + 1], Boston$crim,
       xlab = colnames(Boston)[k + 1], ylab = 'crim',
       main = paste0('Scatter Plot of crim and ', colnames(Boston)[k + 1]))
}
These plots show that rad and tax are more strongly associated with crim than the other variables are.
Exercise 3.15 - (b)
Fit a multiple regression model to predict the response using all of the predictors. Describe your results. For which predictors can we reject the null hypothesis \(H_0:\ \beta_j = 0\)?
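A sketch of the fit producing the output below; the object is named fit_b here because part (c) references fit_b, and the formula matches the Call echoed below:

fit_b <- lm(crim ~ . - crim, data = Boston)  # all remaining variables as predictors
summary(fit_b)
round(confint(fit_b)[-1, ], 4)  # 95% CIs for the slopes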
Call:
lm(formula = crim ~ . - crim, data = Boston)
Residuals:
Min 1Q Median 3Q Max
-9.924 -2.120 -0.353 1.019 75.051
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 17.033228 7.234903 2.354 0.018949 *
zn 0.044855 0.018734 2.394 0.017025 *
indus -0.063855 0.083407 -0.766 0.444294
chas -0.749134 1.180147 -0.635 0.525867
nox -10.313535 5.275536 -1.955 0.051152 .
rm 0.430131 0.612830 0.702 0.483089
age 0.001452 0.017925 0.081 0.935488
dis -0.987176 0.281817 -3.503 0.000502 ***
rad 0.588209 0.088049 6.680 6.46e-11 ***
tax -0.003780 0.005156 -0.733 0.463793
ptratio -0.271081 0.186450 -1.454 0.146611
black -0.007538 0.003673 -2.052 0.040702 *
lstat 0.126211 0.075725 1.667 0.096208 .
medv -0.198887 0.060516 -3.287 0.001087 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.439 on 492 degrees of freedom
Multiple R-squared: 0.454, Adjusted R-squared: 0.4396
F-statistic: 31.47 on 13 and 492 DF, p-value: < 2.2e-16
2.5 % 97.5 %
zn 0.0080 0.0817
indus -0.2277 0.1000
chas -3.0679 1.5696
nox -20.6789 0.0518
rm -0.7740 1.6342
age -0.0338 0.0367
dis -1.5409 -0.4335
rad 0.4152 0.7612
tax -0.0139 0.0063
ptratio -0.6374 0.0953
black -0.0148 -0.0003
lstat -0.0226 0.2750
medv -0.3178 -0.0800
[ANS] In this multiple regression model, the effects of five predictors (zn, dis, rad, black, and medv) reach statistical significance (\(\alpha=.05\)). For the coefficients of these variables we can reject \(H_0:\beta_j=0\); that is, the \(1-\alpha=95\%\) confidence intervals of their coefficients \(\beta_j\) do not contain 0.
Exercise 3.15 - (c)
How do your results from (a) compare to your results from (b)? Create a plot displaying the univariate regression coefficients from (a) on the x-axis, and the multiple regression coefficients from (b) on the y-axis. That is, each predictor is displayed as a single point in the plot. Its coefficient in a simple linear regression model is shown on the x-axis, and its coefficient estimate in the multiple linear regression model is shown on the y-axis.
coef_a <- NA  # placeholder; the leading NA is dropped below via coef_a[-1]
for (k in 1:13) {
  coef_a <- c(coef_a, fit_a[[k]]$coefficients[2])  # slopes from the simple regressions in (a)
}
coef_b <- fit_b$coefficients[-1]  # slopes from the multiple regression in (b)
coef_a
                Boston[, k + 1] Boston[, k + 1] Boston[, k + 1] Boston[, k + 1]
             NA     -0.07393498      0.50977633     -1.89277655     31.24853120
Boston[, k + 1] Boston[, k + 1] Boston[, k + 1] Boston[, k + 1] Boston[, k + 1]
    -2.68405122      0.10778623     -1.55090168      0.61791093      0.02974225
Boston[, k + 1] Boston[, k + 1] Boston[, k + 1] Boston[, k + 1]
     1.15198279     -0.03627964      0.54880478     -0.36315992
coef_b
           zn         indus          chas           nox            rm
  0.044855215  -0.063854824  -0.749133611 -10.313534912   0.430130506
          age           dis           rad           tax       ptratio
  0.001451643  -0.987175726   0.588208591  -0.003780016  -0.271080558
        black         lstat          medv
 -0.007537505   0.126211376  -0.198886821
[ANS] Variables whose effects were significant in the simple regressions are no longer significant here, most likely because the predictors are correlated with one another to some degree: the variation that one variable could explain is instead explained by another variable correlated with it, so its effect no longer reaches statistical significance.
plot(coef_a[-1], coef_b,
     xlab = 'the univariate regression coefficients',
     ylab = 'the multiple regression coefficients')
[ANS] This plot shows that, among the pairs of simple and multiple regression coefficients, there is one outlier (nox, judging from the values above: 31.25 in the simple regression versus -10.31 in the multiple regression) that makes the coefficients as a whole appear strongly negatively correlated.
Exercise 3.15 - (d)
Is there evidence of non-linear association between any of the predictors and the response? To answer this question, for each predictor \(X\), fit a model of the form \(Y = \beta_0 + \beta_1X + \beta_2X^2 + \beta_3X^3 + \epsilon\).
fit_d <- list(NA)
for (k in 1:13) {
  X <- Boston[, k + 1]
  fit_d[[k]] <- lm(Boston$crim ~ I(X) + I(X^2) + I(X^3))  # cubic polynomial in the k-th predictor
  print(colnames(Boston)[k + 1])
  print(summary(fit_d[[k]]))
  print(round(confint(fit_d[[k]], 2:4, level = .95), 4))  # 95% CIs for the three slopes
}
[1] "zn"
Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))
Residuals:
Min 1Q Median 3Q Max
-4.821 -4.614 -1.294 0.473 84.130
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.846e+00 4.330e-01 11.192 < 2e-16 ***
I(X) -3.322e-01 1.098e-01 -3.025 0.00261 **
I(X^2) 6.483e-03 3.861e-03 1.679 0.09375 .
I(X^3) -3.776e-05 3.139e-05 -1.203 0.22954
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.372 on 502 degrees of freedom
Multiple R-squared: 0.05824, Adjusted R-squared: 0.05261
F-statistic: 10.35 on 3 and 502 DF, p-value: 1.281e-06
2.5 % 97.5 %
I(X) -0.5479 -0.1164
I(X^2) -0.0011 0.0141
I(X^3) -0.0001 0.0000
[1] "indus"
Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))
Residuals:
Min 1Q Median 3Q Max
-8.278 -2.514 0.054 0.764 79.713
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.6625683 1.5739833 2.327 0.0204 *
I(X) -1.9652129 0.4819901 -4.077 5.30e-05 ***
I(X^2) 0.2519373 0.0393221 6.407 3.42e-10 ***
I(X^3) -0.0069760 0.0009567 -7.292 1.20e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.423 on 502 degrees of freedom
Multiple R-squared: 0.2597, Adjusted R-squared: 0.2552
F-statistic: 58.69 on 3 and 502 DF, p-value: < 2.2e-16
2.5 % 97.5 %
I(X) -2.9122 -1.0182
I(X^2) 0.1747 0.3292
I(X^3) -0.0089 -0.0051
[1] "chas"
Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))
Residuals:
Min 1Q Median 3Q Max
-3.738 -3.661 -3.435 0.018 85.232
Coefficients: (2 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.7444 0.3961 9.453 <2e-16 ***
I(X) -1.8928 1.5061 -1.257 0.209
I(X^2) NA NA NA NA
I(X^3) NA NA NA NA
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.597 on 504 degrees of freedom
Multiple R-squared: 0.003124, Adjusted R-squared: 0.001146
F-statistic: 1.579 on 1 and 504 DF, p-value: 0.2094
2.5 % 97.5 %
I(X) -4.8518 1.0663
I(X^2) NA NA
I(X^3) NA NA
[1] "nox"
Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))
Residuals:
Min 1Q Median 3Q Max
-9.110 -2.068 -0.255 0.739 78.302
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 233.09 33.64 6.928 1.31e-11 ***
I(X) -1279.37 170.40 -7.508 2.76e-13 ***
I(X^2) 2248.54 279.90 8.033 6.81e-15 ***
I(X^3) -1245.70 149.28 -8.345 6.96e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.234 on 502 degrees of freedom
Multiple R-squared: 0.297, Adjusted R-squared: 0.2928
F-statistic: 70.69 on 3 and 502 DF, p-value: < 2.2e-16
2.5 % 97.5 %
I(X) -1614.151 -944.5912
I(X^2) 1698.626 2798.4624
I(X^3) -1538.997 -952.4091
[1] "rm"
Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))
Residuals:
Min 1Q Median 3Q Max
-18.485 -3.468 -2.221 -0.015 87.219
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 112.6246 64.5172 1.746 0.0815 .
I(X) -39.1501 31.3115 -1.250 0.2118
I(X^2) 4.5509 5.0099 0.908 0.3641
I(X^3) -0.1745 0.2637 -0.662 0.5086
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.33 on 502 degrees of freedom
Multiple R-squared: 0.06779, Adjusted R-squared: 0.06222
F-statistic: 12.17 on 3 and 502 DF, p-value: 1.067e-07
2.5 % 97.5 %
I(X) -100.6679 22.3676
I(X^2) -5.2920 14.3938
I(X^3) -0.6927 0.3437
[1] "age"
Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))
Residuals:
Min 1Q Median 3Q Max
-9.762 -2.673 -0.516 0.019 82.842
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.549e+00 2.769e+00 -0.920 0.35780
I(X) 2.737e-01 1.864e-01 1.468 0.14266
I(X^2) -7.230e-03 3.637e-03 -1.988 0.04738 *
I(X^3) 5.745e-05 2.109e-05 2.724 0.00668 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.84 on 502 degrees of freedom
Multiple R-squared: 0.1742, Adjusted R-squared: 0.1693
F-statistic: 35.31 on 3 and 502 DF, p-value: < 2.2e-16
2.5 % 97.5 %
I(X) -0.0925 0.6398
I(X^2) -0.0144 -0.0001
I(X^3) 0.0000 0.0001
[1] "dis"
Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))
Residuals:
Min 1Q Median 3Q Max
-10.757 -2.588 0.031 1.267 76.378
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 30.0476 2.4459 12.285 < 2e-16 ***
I(X) -15.5543 1.7360 -8.960 < 2e-16 ***
I(X^2) 2.4521 0.3464 7.078 4.94e-12 ***
I(X^3) -0.1186 0.0204 -5.814 1.09e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.331 on 502 degrees of freedom
Multiple R-squared: 0.2778, Adjusted R-squared: 0.2735
F-statistic: 64.37 on 3 and 502 DF, p-value: < 2.2e-16
2.5 % 97.5 %
I(X) -18.9650 -12.1437
I(X^2) 1.7715 3.1327
I(X^3) -0.1587 -0.0785
[1] "rad"
Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))
Residuals:
Min 1Q Median 3Q Max
-10.381 -0.412 -0.269 0.179 76.217
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.605545 2.050108 -0.295 0.768
I(X) 0.512736 1.043597 0.491 0.623
I(X^2) -0.075177 0.148543 -0.506 0.613
I(X^3) 0.003209 0.004564 0.703 0.482
Residual standard error: 6.682 on 502 degrees of freedom
Multiple R-squared: 0.4, Adjusted R-squared: 0.3965
F-statistic: 111.6 on 3 and 502 DF, p-value: < 2.2e-16
2.5 % 97.5 %
I(X) -1.5376 2.5631
I(X^2) -0.3670 0.2167
I(X^3) -0.0058 0.0122
[1] "tax"
Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))
Residuals:
Min 1Q Median 3Q Max
-13.273 -1.389 0.046 0.536 76.950
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.918e+01 1.180e+01 1.626 0.105
I(X) -1.533e-01 9.568e-02 -1.602 0.110
I(X^2) 3.608e-04 2.425e-04 1.488 0.137
I(X^3) -2.204e-07 1.889e-07 -1.167 0.244
Residual standard error: 6.854 on 502 degrees of freedom
Multiple R-squared: 0.3689, Adjusted R-squared: 0.3651
F-statistic: 97.8 on 3 and 502 DF, p-value: < 2.2e-16
2.5 % 97.5 %
I(X) -0.3413 0.0347
I(X^2) -0.0001 0.0008
I(X^3) 0.0000 0.0000
[1] "ptratio"
Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))
Residuals:
Min 1Q Median 3Q Max
-6.833 -4.146 -1.655 1.408 82.697
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 477.18405 156.79498 3.043 0.00246 **
I(X) -82.36054 27.64394 -2.979 0.00303 **
I(X^2) 4.63535 1.60832 2.882 0.00412 **
I(X^3) -0.08476 0.03090 -2.743 0.00630 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 8.122 on 502 degrees of freedom
Multiple R-squared: 0.1138, Adjusted R-squared: 0.1085
F-statistic: 21.48 on 3 and 502 DF, p-value: 4.171e-13
2.5 % 97.5 %
I(X) -136.6726 -28.0485
I(X^2) 1.4755 7.7952
I(X^3) -0.1455 -0.0241
[1] "black"
Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))
Residuals:
Min 1Q Median 3Q Max
-13.096 -2.343 -2.128 -1.439 86.790
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.826e+01 2.305e+00 7.924 1.5e-14 ***
I(X) -8.356e-02 5.633e-02 -1.483 0.139
I(X^2) 2.137e-04 2.984e-04 0.716 0.474
I(X^3) -2.652e-07 4.364e-07 -0.608 0.544
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.955 on 502 degrees of freedom
Multiple R-squared: 0.1498, Adjusted R-squared: 0.1448
F-statistic: 29.49 on 3 and 502 DF, p-value: < 2.2e-16
2.5 % 97.5 %
I(X) -0.1942 0.0271
I(X^2) -0.0004 0.0008
I(X^3) 0.0000 0.0000
[1] "lstat"
Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))
Residuals:
Min 1Q Median 3Q Max
-15.234 -2.151 -0.486 0.066 83.353
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.2009656 2.0286452 0.592 0.5541
I(X) -0.4490656 0.4648911 -0.966 0.3345
I(X^2) 0.0557794 0.0301156 1.852 0.0646 .
I(X^3) -0.0008574 0.0005652 -1.517 0.1299
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 7.629 on 502 degrees of freedom
Multiple R-squared: 0.2179, Adjusted R-squared: 0.2133
F-statistic: 46.63 on 3 and 502 DF, p-value: < 2.2e-16
2.5 % 97.5 %
I(X) -1.3624 0.4643
I(X^2) -0.0034 0.1149
I(X^3) -0.0020 0.0003
[1] "medv"
Call:
lm(formula = Boston$crim ~ I(X) + I(X^2) + I(X^3))
Residuals:
Min 1Q Median 3Q Max
-24.427 -1.976 -0.437 0.439 73.655
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 53.1655381 3.3563105 15.840 < 2e-16 ***
I(X) -5.0948305 0.4338321 -11.744 < 2e-16 ***
I(X^2) 0.1554965 0.0171904 9.046 < 2e-16 ***
I(X^3) -0.0014901 0.0002038 -7.312 1.05e-12 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 6.569 on 502 degrees of freedom
Multiple R-squared: 0.4202, Adjusted R-squared: 0.4167
F-statistic: 121.3 on 3 and 502 DF, p-value: < 2.2e-16
2.5 % 97.5 %
I(X) -5.9472 -4.2425
I(X^2) 0.1217 0.1893
I(X^3) -0.0019 -0.0011
[ANS] Among these variables, the quadratic and cubic terms of indus, nox, age, dis, ptratio, and medv are all statistically significant, providing evidence of a non-linear association between these predictors and the response.