Problem #7.9.9 a)
##
## Call:
## lm(formula = nox ~ poly(dis, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.121130 -0.040619 -0.009738 0.023385 0.194904
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.554695 0.002759 201.021 < 2e-16 ***
## poly(dis, 3)1 -2.003096 0.062071 -32.271 < 2e-16 ***
## poly(dis, 3)2 0.856330 0.062071 13.796 < 2e-16 ***
## poly(dis, 3)3 -0.318049 0.062071 -5.124 4.27e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06207 on 502 degrees of freedom
## Multiple R-squared: 0.7148, Adjusted R-squared: 0.7131
## F-statistic: 419.3 on 3 and 502 DF, p-value: < 2.2e-16
The summary shows that all of the polynomial terms are significant when predicting nox from dis. The plot shows a smooth curve that fits the data quite well.
I fit and plotted polynomials of degrees 1 to 10 and saved the training RSS for each.
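The code behind the summary and plot is not shown in the output, so the following is only a minimal sketch of how they could be produced. It assumes the Boston data come from the MASS package; the names `fit_poly3`, `dis_grid`, and `pred` are illustrative.

```r
library(MASS)  # assumption: Boston is taken from MASS

# Cubic polynomial regression of nox on dis (orthogonal polynomial basis)
fit_poly3 <- lm(nox ~ poly(dis, 3), data = Boston)
summary(fit_poly3)

# Evaluate the fit on an evenly spaced grid of dis values to draw the curve
dis_grid <- seq(min(Boston$dis), max(Boston$dis), length.out = 200)
pred <- predict(fit_poly3, newdata = data.frame(dis = dis_grid))

# Scatter plot of the data with the fitted cubic overlaid
plot(nox ~ dis, data = Boston, col = "darkgrey")
lines(dis_grid, pred, col = "red", lwd = 2)
```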
## [1] 2.768563 2.035262 1.934107 1.932981 1.915290 1.878257 1.849484
## [8] 1.835630 1.833331 1.832171
As expected, the training RSS decreases with each increase in polynomial degree.
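One way the RSS vector above could be computed is sketched here. It assumes the Boston data from MASS; `degrees` and `train_rss` are illustrative names, not taken from the original code.

```r
library(MASS)  # assumption: Boston is taken from MASS

degrees <- 1:10

# Fit one polynomial per degree and record its training RSS
train_rss <- sapply(degrees, function(d) {
  fit <- lm(nox ~ poly(dis, d), data = Boston)
  sum(resid(fit)^2)
})
train_rss

plot(degrees, train_rss, type = "b",
     xlab = "Polynomial degree", ylab = "Training RSS")
```

Because each lower-degree model is nested in the next, the training RSS cannot go up as the degree grows, which is why it is not a useful criterion for choosing the degree.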
I used a 10-fold cross validation to pick the best polynomial degree.
The 10-fold CV shows that the CV error decreases from degree 1 to 3, stays roughly constant through degree 5, and starts to increase for higher degrees. I pick degree 4 as the best polynomial degree.
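A sketch of the cross-validation step, assuming it was done with `cv.glm` from the boot package (the output does not show which function was used). The seed and the name `cv_err` are arbitrary choices for illustration; because the folds are random, the selected degree can vary between runs.

```r
library(MASS)  # assumption: Boston is taken from MASS
library(boot)  # provides cv.glm

set.seed(1)  # arbitrary seed; fold assignment is random

cv_err <- rep(NA_real_, 10)
for (d in 1:10) {
  # glm with the default gaussian family is equivalent to lm here
  fit <- glm(nox ~ poly(dis, d), data = Boston)
  # delta[1] is the raw K-fold estimate of prediction error
  cv_err[d] <- cv.glm(Boston, fit, K = 10)$delta[1]
}

plot(1:10, cv_err, type = "b",
     xlab = "Polynomial degree", ylab = "10-fold CV error")
which.min(cv_err)
```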
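dis ranges from about 1 to 13. I split this range into roughly 4 equal intervals and placed the knots at 4, 7, and 11. Because the knots are given explicitly, bs() ignores the df = 4 argument, and the cubic spline has 3 + 3 = 6 basis functions, which matches the six spline coefficients in the summary.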
##
## Call:
## lm(formula = nox ~ bs(dis, df = 4, knots = c(4, 7, 11)), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.124567 -0.040355 -0.008702 0.024740 0.192920
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.73926 0.01331 55.537 < 2e-16
## bs(dis, df = 4, knots = c(4, 7, 11))1 -0.08861 0.02504 -3.539 0.00044
## bs(dis, df = 4, knots = c(4, 7, 11))2 -0.31341 0.01680 -18.658 < 2e-16
## bs(dis, df = 4, knots = c(4, 7, 11))3 -0.26618 0.03147 -8.459 3.00e-16
## bs(dis, df = 4, knots = c(4, 7, 11))4 -0.39802 0.04647 -8.565 < 2e-16
## bs(dis, df = 4, knots = c(4, 7, 11))5 -0.25681 0.09001 -2.853 0.00451
## bs(dis, df = 4, knots = c(4, 7, 11))6 -0.32926 0.06327 -5.204 2.85e-07
##
## (Intercept) ***
## bs(dis, df = 4, knots = c(4, 7, 11))1 ***
## bs(dis, df = 4, knots = c(4, 7, 11))2 ***
## bs(dis, df = 4, knots = c(4, 7, 11))3 ***
## bs(dis, df = 4, knots = c(4, 7, 11))4 ***
## bs(dis, df = 4, knots = c(4, 7, 11))5 **
## bs(dis, df = 4, knots = c(4, 7, 11))6 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06185 on 499 degrees of freedom
## Multiple R-squared: 0.7185, Adjusted R-squared: 0.7151
## F-statistic: 212.3 on 6 and 499 DF, p-value: < 2.2e-16
The summary shows that all terms in the spline fit are significant. The plot shows that the spline fits the data well except at extreme values of dis.
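The spline fit and plot could be reproduced along these lines. This is a sketch, assuming the Boston data from MASS; it drops the `df = 4` argument seen in the call above, since `bs()` does not use `df` when `knots` is supplied. The names `fit_bs`, `dis_grid`, and `pred` are illustrative.

```r
library(MASS)     # assumption: Boston is taken from MASS
library(splines)  # provides bs()

# Cubic regression spline with three interior knots
fit_bs <- lm(nox ~ bs(dis, knots = c(4, 7, 11)), data = Boston)
summary(fit_bs)

# Evaluate the fit on a grid of dis values to draw the curve
dis_grid <- seq(min(Boston$dis), max(Boston$dis), length.out = 200)
pred <- predict(fit_bs, newdata = data.frame(dis = dis_grid))

plot(nox ~ dis, data = Boston, col = "darkgrey")
lines(dis_grid, pred, col = "red", lwd = 2)
```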
I fit regression splines with dfs between 3 and 16.
## [1] 1.934107 1.922775 1.840173 1.833966 1.829884 1.816995 1.825653
## [8] 1.792535 1.796992 1.788999 1.782350 1.781838 1.782798 1.783546
The training RSS decreases overall until df=14, with small upticks at df=9 and df=11, and then increases slightly for df=15 and df=16.
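A sketch of how the RSS values for df = 3 to 16 could be computed, assuming the Boston data from MASS and letting `bs()` place the knots at quantiles of dis. The names `dfs` and `spline_rss` are illustrative.

```r
library(MASS)     # assumption: Boston is taken from MASS
library(splines)  # provides bs()

dfs <- 3:16

# With df given and no knots, bs() puts (df - 3) interior knots at quantiles of dis
spline_rss <- sapply(dfs, function(k) {
  fit <- lm(nox ~ bs(dis, df = k), data = Boston)
  sum(resid(fit)^2)
})
spline_rss

plot(dfs, spline_rss, type = "b",
     xlab = "Degrees of freedom", ylab = "Training RSS")
```

Unlike the polynomial case, these models are not nested, because the knot locations move as df changes. That is why the training RSS can rise slightly from one df to the next.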
I used a 10-fold cross validation to find the best df. I tried all integer values of df between 3 and 16.
## Warning in bs(dis, degree = 3L, knots = numeric(0), Boundary.knots =
## c(1.1296, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
The CV error is much more jumpy in this case, but the minimum is attained at df=10, so I pick 10 as the optimal degrees of freedom.
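The cross-validation over df could be sketched as follows, again assuming `cv.glm` from the boot package and the Boston data from MASS. The seed and names are arbitrary, so the df that minimises the CV error may differ from run to run.

```r
library(MASS)     # assumption: Boston is taken from MASS
library(splines)  # provides bs()
library(boot)     # provides cv.glm

set.seed(1)  # arbitrary seed; fold assignment is random

dfs <- 3:16
cv_err <- rep(NA_real_, length(dfs))
for (i in seq_along(dfs)) {
  k <- dfs[i]
  fit <- glm(nox ~ bs(dis, df = k), data = Boston)
  # Warnings here come from predicting at dis values outside the
  # boundary knots that bs() derived from the training fold
  cv_err[i] <- cv.glm(Boston, fit, K = 10)$delta[1]
}

plot(dfs, cv_err, type = "b",
     xlab = "Degrees of freedom", ylab = "10-fold CV error")
dfs[which.min(cv_err)]
```

The "beyond boundary knots" warnings arise because a training fold can cover a narrower range of dis than the points it is asked to predict. One way to avoid them, if desired, is to pass `Boundary.knots = range(Boston$dis)` to `bs()` so every fold uses the full range.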
Additional Problem
## Warning in bs(dis, degree = 3L, knots = structure(c(1.57935,
## 1.82203333333333, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## Warning in bs(dis, degree = 3L, knots = structure(c(1.59125,
## 1.86156666666667, : some 'x' values beyond boundary knots may cause
## ill-conditioned bases
## [1] 1.788753
df=14 gives a low RSS. It is not the lowest, but as df increases the bias falls while the variance rises, so pushing df higher risks overfitting. I therefore take df=14 as the best model.