# Required libraries
library(ISLR)
## Warning: package 'ISLR' was built under R version 4.4.3
library(boot)
library(splines)
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.3
In this exercise, you will further analyze the Wage data set considered throughout this chapter.
Note: For problem 6, “hypothesis testing using ANOVA” refers to the process used on page 285 of the book, where the anova() function is called with the argument test set to “F”. This is the usual statistical-inference way in which the degree would be selected.
Perform polynomial regression to predict wage using age. Use cross-validation to select the optimal degree d for the polynomial. What degree was chosen, and how does this compare to the results of hypothesis testing using ANOVA? Make a plot of the resulting polynomial fit to the data.
set.seed(1)
cv.errors <- rep(NA, 10)
for (d in 1:10) {
  glm.fit <- glm(wage ~ poly(age, d), data = Wage)
  cv.errors[d] <- cv.glm(Wage, glm.fit, K = 10)$delta[1]
}
plot(1:10, cv.errors, type = "b", xlab = "Degree", ylab = "CV Error")
best.degree <- which.min(cv.errors)
best.degree
## [1] 9
# ANOVA comparison
fit.list <- lapply(1:5, function(d) lm(wage ~ poly(age, d), data = Wage))
anova(fit.list[[1]], fit.list[[2]], fit.list[[3]], fit.list[[4]], fit.list[[5]])
## Analysis of Variance Table
##
## Model 1: wage ~ poly(age, d)
## Model 2: wage ~ poly(age, d)
## Model 3: wage ~ poly(age, d)
## Model 4: wage ~ poly(age, d)
## Model 5: wage ~ poly(age, d)
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 2998 5022216
## 2 2997 4793430 1 228786 143.5931 < 2.2e-16 ***
## 3 2996 4777674 1 15756 9.8888 0.001679 **
## 4 2995 4771604 1 6070 3.8098 0.051046 .
## 5 2994 4770322 1 1283 0.8050 0.369682
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Plot best fit
ggplot(Wage, aes(x = age, y = wage)) +
geom_point(alpha = 0.3) +
stat_smooth(method = "lm", formula = y ~ poly(x, best.degree), color = "blue") +
ggtitle(paste("Polynomial Fit with Degree", best.degree))
### Output
The raw minimum of the CV error occurs at degree 9, but the CV curve is essentially flat beyond degree 3–4, so a degree of about 4 is a more parsimonious choice with nearly identical error. ANOVA compares degrees 1–5: adding terms up to degree 3 significantly improves the model (p-values < 0.05), the degree-4 term is marginal (p ≈ 0.051), and the degree-5 term is not significant (p ≈ 0.37).
Explanation: Both methods point to a flexible but not overly complex model. The CV error curve levels off around degree 3–4, which balances bias and variance, and ANOVA supports a degree of 3 or 4, since degree 5 adds little explanatory power.
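To back up the claim that the CV curve is nearly flat past degree 3 or 4, a quick check (reusing the cv.errors vector computed above) expresses each degree's error as a percentage above the minimum:
# Percentage increase in CV error relative to the best (minimum) value, by degree
round(100 * (cv.errors - min(cv.errors)) / min(cv.errors), 2)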
Fit a step function to predict wage using age, and perform cross-validation to choose the optimal number of cuts. Make a plot of the fit obtained.
set.seed(1)
cv.errors <- rep(NA, 10)
for (k in 2:10) {
  Wage$cut.age <- cut(Wage$age, k)
  glm.fit <- glm(wage ~ cut.age, data = Wage)
  cv.errors[k] <- cv.glm(Wage, glm.fit, K = 10)$delta[1]
}
plot(2:10, cv.errors[2:10], type = "b", xlab = "Number of Cuts", ylab = "CV Error")
best.k <- which.min(cv.errors[2:10]) + 1
# Fit and plot
Wage$cut.age <- cut(Wage$age, best.k)
fit <- lm(wage ~ cut.age, data = Wage)
plot(Wage$age, Wage$wage, col = "gray", main = paste("Step Function with", best.k, "Cuts"))
lines(sort(Wage$age), fitted(fit)[order(Wage$age)], col = "blue", lwd = 2)
CV error is lowest with 8 bins (i.e., the age range cut into 8 intervals). The plot shows a stepwise approximation that captures the wage trend reasonably well.
Explanation: Step functions are simple to interpret but can overfit if too many cuts are used. Eight bins give the lowest CV error, suggesting that wage changes meaningfully at several age thresholds.
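To see which age thresholds the selected step function actually uses, a quick check (reusing best.k from the chunk above):
# Age intervals created by cut() for the CV-selected number of bins
levels(cut(Wage$age, best.k))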
The Wage data set contains a number of other features not explored in this chapter, such as marital status (maritl), job class (jobclass), and others. Explore the relationships between some of these other predictors and wage, and use non-linear fitting techniques in order to fit flexible models to the data. Create plots of the results obtained, and write a summary of your findings.
Note: For problem 7, you can easily fit non-linear functions of the categorical variables by simply entering them into the model; a reference level will be chosen automatically.
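For instance, a quick way to see which level R will use as the baseline (the first factor level, under the default treatment contrasts):
# The first level of each factor becomes the reference category in lm()/glm()
levels(Wage$maritl)
levels(Wage$jobclass)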
# Fit and plot wage by marital status: maritl is a factor, so plot() draws boxplots;
# overlay the fitted group means rather than abline(), which is not meaningful for a factor
fit1 <- lm(wage ~ maritl, data = Wage)
plot(Wage$maritl, Wage$wage, main = "Wage vs Marital Status", xlab = "Marital Status", ylab = "Wage")
points(1:nlevels(Wage$maritl), tapply(fitted(fit1), Wage$maritl, mean), col = "red", pch = 19)
# Plot wage vs age by job class with loess smoothers
ggplot(Wage, aes(x = age, y = wage, color = jobclass)) +
geom_point(alpha = 0.3) +
stat_smooth(method = "loess") +
ggtitle("Wage vs Age by Job Class")
## `geom_smooth()` using formula = 'y ~ x'
# Fit interaction
fit2 <- lm(wage ~ jobclass * education, data = Wage)
summary(fit2)
##
## Call:
## lm(formula = wage ~ jobclass * education, data = Wage)
##
## Residuals:
## Min 1Q Median 3Q Max
## -97.415 -21.164 -3.497 15.189 220.957
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 82.7094 2.6392 31.339
## jobclass2. Information 4.7933 4.8921 0.980
## education2. HS Grad 12.2299 3.0077 4.066
## education3. Some College 23.5812 3.2917 7.164
## education4. College Grad 38.4808 3.4344 11.204
## education5. Advanced Degree 53.3121 4.4654 11.939
## jobclass2. Information:education2. HS Grad -2.3468 5.4739 -0.429
## jobclass2. Information:education3. Some College -1.7015 5.6656 -0.300
## jobclass2. Information:education4. College Grad 0.6029 5.6553 0.107
## jobclass2. Information:education5. Advanced Degree 14.7927 6.4025 2.310
## Pr(>|t|)
## (Intercept) < 2e-16 ***
## jobclass2. Information 0.3273
## education2. HS Grad 4.90e-05 ***
## education3. Some College 9.83e-13 ***
## education4. College Grad < 2e-16 ***
## education5. Advanced Degree < 2e-16 ***
## jobclass2. Information:education2. HS Grad 0.6682
## jobclass2. Information:education3. Some College 0.7640
## jobclass2. Information:education4. College Grad 0.9151
## jobclass2. Information:education5. Advanced Degree 0.0209 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 36.38 on 2990 degrees of freedom
## Multiple R-squared: 0.2422, Adjusted R-squared: 0.24
## F-statistic: 106.2 on 9 and 2990 DF, p-value: < 2.2e-16
I explored maritl, jobclass, and education using plots and an interaction model.
Explanation: The categorical variables, especially education, have strong predictive power for wage. In the interaction model the education effects increase monotonically, with advanced-degree holders earning about 53 more on average than the less-than-HS-grad baseline within the Industrial job class, and the only significant jobclass:education interaction is for advanced degrees (p ≈ 0.02). The loess curves by job class suggest Information workers earn somewhat more than Industrial workers across most ages, with the usual non-linear age pattern (wages rising into middle age, then flattening), and the boxplots show clear wage differences across marital-status groups. These patterns support combining flexible terms for age with the categorical predictors.
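As a sketch of one such combined flexible model (the 4 df for the age spline is an arbitrary choice; ns() comes from the splines package loaded above, and a gam() smoother would be an alternative):
# Natural spline in age plus the categorical predictors (sketch; df = 4 chosen arbitrarily)
fit.flex <- lm(wage ~ ns(age, df = 4) + maritl + jobclass + education, data = Wage)
summary(fit.flex)$adj.r.squared  # rough measure of how much these predictors explain together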
This question uses the variables dis (the weighted mean of distances to five Boston employment centers) and nox (nitrogen oxides concentration in parts per 10 million) from the Boston data. We will treat dis as the predictor and nox as the response.
Note: For problem 9, you may encounter warnings (or errors) when using CV with the bs() function. There is an argument, Boundary.knots, which you can specify to fix this, but you are not required to. Please do not let warnings print to your R Markdown document; you should be able to suppress them by specifying warning = FALSE in your R Markdown code chunk options.
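For reference (not required), a minimal sketch of the Boundary.knots workaround mentioned above, with df = 5 chosen arbitrarily; pinning the boundary knots at the full-data range of dis means held-out fold observations should no longer fall outside the boundary during cross-validation:
library(MASS)  # Boston data (also loaded in the next chunk)
set.seed(1)
fit.bk <- glm(nox ~ bs(dis, df = 5, Boundary.knots = range(Boston$dis)), data = Boston)
cv.glm(Boston, fit.bk, K = 10)$delta[1]  # should run without boundary-knot warnings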
library(MASS)
attach(Boston)
Use the poly() function to fit a cubic polynomial regression to predict nox using dis. Report the regression output, and plot the resulting data and polynomial fits.
fit <- lm(nox ~ poly(dis, 3), data = Boston)
summary(fit)
##
## Call:
## lm(formula = nox ~ poly(dis, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.121130 -0.040619 -0.009738 0.023385 0.194904
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.554695 0.002759 201.021 < 2e-16 ***
## poly(dis, 3)1 -2.003096 0.062071 -32.271 < 2e-16 ***
## poly(dis, 3)2 0.856330 0.062071 13.796 < 2e-16 ***
## poly(dis, 3)3 -0.318049 0.062071 -5.124 4.27e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06207 on 502 degrees of freedom
## Multiple R-squared: 0.7148, Adjusted R-squared: 0.7131
## F-statistic: 419.3 on 3 and 502 DF, p-value: < 2.2e-16
plot(dis, nox, col = "gray")
lines(sort(dis), predict(fit, newdata = list(dis = sort(dis))), col = "blue", lwd = 2)
I fit a degree-3 polynomial to predict nox from dis. The curve shows a nonlinear decreasing trend.
Explanation: As dis increases, nox decreases in a curved (nonlinear) fashion. The cubic polynomial captures this bend.
Plot the polynomial fits for a range of different polynomial degrees (say, from 1 to 10), and report the associated residual sum of squares.
rss <- rep(NA, 10)
plot(dis, nox, col = "gray", main = "Polynomial Fits")
for (d in 1:10) {
  fit <- lm(nox ~ poly(dis, d), data = Boston)
  rss[d] <- sum(residuals(fit)^2)
  lines(sort(dis), predict(fit, newdata = list(dis = sort(dis))), col = rainbow(10)[d])
}
plot(1:10, rss, type = "b", xlab = "Degree", ylab = "RSS")
RSS decreases as degree increases, especially from 1 to 4. Little improvement after degree 4–5.
Explanation: Degrees above 4 provide minimal improvement. Higher-degree polynomials may overfit.
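Since the exercise asks to report the RSS values, a small addition prints the vector computed in the loop above:
# Training RSS by polynomial degree
data.frame(degree = 1:10, RSS = round(rss, 4))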
Perform cross-validation or another approach to select the optimal degree for the polynomial, and explain your results.
set.seed(1)
cv.errors <- rep(NA, 10)
for (d in 1:10) {
  fit <- glm(nox ~ poly(dis, d), data = Boston)
  cv.errors[d] <- cv.glm(Boston, fit, K = 10)$delta[1]
}
plot(1:10, cv.errors, type = "b", xlab = "Degree", ylab = "CV Error")
which.min(cv.errors)
## [1] 4
CV selects degree 4.
Explanation: This matches the point where the RSS curve flattens out; CV confirms that degree 4 balances fit and generalization.
Use the bs() function to fit a regression spline to predict nox using dis. Report the output for the fit using four degrees of freedom. How did you choose the knots? Plot the resulting fit.
fit <- lm(nox ~ bs(dis, df = 4), data = Boston)
summary(fit)
##
## Call:
## lm(formula = nox ~ bs(dis, df = 4), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.124622 -0.039259 -0.008514 0.020850 0.193891
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.73447 0.01460 50.306 < 2e-16 ***
## bs(dis, df = 4)1 -0.05810 0.02186 -2.658 0.00812 **
## bs(dis, df = 4)2 -0.46356 0.02366 -19.596 < 2e-16 ***
## bs(dis, df = 4)3 -0.19979 0.04311 -4.634 4.58e-06 ***
## bs(dis, df = 4)4 -0.38881 0.04551 -8.544 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06195 on 501 degrees of freedom
## Multiple R-squared: 0.7164, Adjusted R-squared: 0.7142
## F-statistic: 316.5 on 4 and 501 DF, p-value: < 2.2e-16
plot(dis, nox, col = "gray")
lines(sort(dis), predict(fit, newdata = list(dis = sort(dis))), col = "blue", lwd = 2)
I used bs() with 4 degrees of freedom and did not set the knots manually: with a cubic basis and df = 4, bs() places a single interior knot automatically, at the median of dis.
Explanation: A regression spline with 4 degrees of freedom fits the curve well without overfitting; it is a good choice for capturing a smooth but flexible trend.
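To confirm where that knot falls, one can inspect the attribute that bs() stores on its basis matrix (using the attached Boston data):
# bs() records its interior knot locations as an attribute; with df = 4 this is the median of dis
attr(bs(dis, df = 4), "knots")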
Now fit a regression spline for a range of degrees of freedom, and plot the resulting fits and report the resulting RSS. Describe the results obtained.
rss <- rep(NA, 10)
plot(dis, nox, col = "gray", main = "Regression Spline Fits")
for (d in 3:10) {
  fit <- lm(nox ~ bs(dis, df = d), data = Boston)
  rss[d] <- sum(residuals(fit)^2)
  lines(sort(dis), predict(fit, newdata = list(dis = sort(dis))), col = rainbow(10)[d])
}
plot(3:10, rss[3:10], type = "b", xlab = "Degrees of Freedom", ylab = "RSS")
RSS decreases with more DoF, but levels off around DoF = 6.
Explanation: Splines with 5–6 DoF offer a good trade-off between bias and variance. Higher DoF adds little benefit.
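As before, the RSS values themselves (from the loop in this part) can be reported directly:
# Training RSS by spline degrees of freedom
data.frame(df = 3:10, RSS = round(rss[3:10], 4))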
Perform cross-validation or another approach in order to select the best degrees of freedom for a regression spline on this data. Describe your results.
set.seed(1)
cv.errors <- rep(NA, 10)
for (d in 3:10) {
  fit <- glm(nox ~ bs(dis, df = d), data = Boston)
  cv.errors[d] <- cv.glm(Boston, fit, K = 10)$delta[1]
}
plot(3:10, cv.errors[3:10], type = "b", xlab = "Degrees of Freedom", ylab = "CV Error")
which.min(cv.errors)
## [1] 6
CV error is minimized at DoF = 6.
Explanation: This is consistent with the visual and RSS analysis; the spline with 6 degrees of freedom gives the best trade-off between fit and flexibility.
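For completeness, a sketch that refits the spline at the CV-selected value (df = 6 from the output above) and overlays it on the data, mirroring the df = 4 plot earlier:
# Final regression spline at the CV-chosen degrees of freedom
fit.best <- lm(nox ~ bs(dis, df = 6), data = Boston)
plot(dis, nox, col = "gray", main = "Regression Spline (df = 6, chosen by CV)")
lines(sort(dis), predict(fit.best, newdata = list(dis = sort(dis))), col = "blue", lwd = 2)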
In Section 7.7, it was mentioned that GAMs are generally fit using a backfitting approach. The idea behind backfitting is actually quite simple. We will now explore backfitting in the context of multiple linear regression. Suppose that we would like to perform multiple linear regression, but we do not have software to do so. Instead, we only have software to perform simple linear regression. Therefore, we take the following iterative approach: we repeatedly hold all but one coefficient estimate fixed at its current value, and update only that coefficient estimate using a simple linear regression. The process is continued until convergence—that is, until the coefficient estimates stop changing. We now try this out on a toy example.
Note: For problem 11, please generate X1 and X2 independently as normally distributed with means 0 and 1 respectively, both with variance 2.
Also, the variance of the measurement error, Var(ε), should be 1, and the values of β0, β1, and β2 should be 1.5, 3, and -4.5 respectively. This will make grading easier; you may experiment with changing these values on your own, but please do not turn in your assignment with other values. Only print your results for the first 5 steps and then at 50, 100, 150, and so on up to 1000 iterations for beta0, beta1, and beta2. Do not print all 1000 lines, please. Please provide a legend and legible colors in your plot.
Generate a response Y and two predictors \(X_1\) and \(X_2\), with n=100.
set.seed(1)
n <- 100
x1 <- rnorm(n, mean = 0, sd = sqrt(2))
x2 <- rnorm(n, mean = 1, sd = sqrt(2))
epsilon <- rnorm(n, mean = 0, sd = 1)
y <- 1.5 + 3 * x1 - 4.5 * x2 + epsilon
Initialize \(\hat{\beta}_1\) to take on a value of your choice. The specific value does not matter.
set.seed(42)
beta1 <- runif(1, -5, 5)
beta2 <- runif(1, -5, 5)
beta0 <- 0
Keeping \(\hat{\beta}_1\) fixed, fit the model
\[ Y - \hat{\beta}_1 X_1 = \beta_0 + \beta_2 X_2 + \epsilon. \]
You can do this as follows:
a <- y - beta1 * x1
beta2 <- lm(a ~ x2)$coef[2]
r1 <- y - beta1 * x1
fit_beta2 <- lm(r1 ~ x2)
beta0 <- coef(fit_beta2)[1]
beta2 <- coef(fit_beta2)[2]
# Show updated values
beta0
## (Intercept)
## 1.387669
beta2
## x2
## -4.53675
Keeping \(\hat{\beta}_2\) fixed, fit the model
\[ Y - \hat{\beta}_2 X_2 = \beta_0 + \beta_1 X_1 + \epsilon. \]
You can do this as follows:
a <- y - beta2 * x2
beta1 <- lm(a ~ x1)$coef[2]
r2 <- y - beta2 * x2
fit_beta1 <- lm(r2 ~ x1)
beta0 <- coef(fit_beta1)[1]
beta1 <- coef(fit_beta1)[2]
# Show updated values
beta0
## (Intercept)
## 1.56216
beta1
## x1
## 3.014928
Write a for loop to repeat (c) and (d) 1,000 times. Report the estimates of \(\hat{\beta}_0\), \(\hat{\beta}_1\), and \(\hat{\beta}_2\) at each iteration of the for loop. Create a plot in which each of these values is displayed, with \(\hat{\beta}_0\), \(\hat{\beta}_1\), and \(\hat{\beta}_2\) each shown in a different color.
# Re-initialize coefficients with random values
set.seed(42)
beta1 <- runif(1, -5, 5)
beta2 <- runif(1, -5, 5)
beta0 <- 0
# Tracking vectors
beta0.track <- numeric(1000)
beta1.track <- numeric(1000)
beta2.track <- numeric(1000)
# Backfitting loop
for (i in 1:1000) {
  # Update beta2 (hold beta1 fixed)
  r1 <- y - beta1 * x1
  fit2 <- lm(r1 ~ x2)
  beta0 <- coef(fit2)[1]
  beta2 <- coef(fit2)[2]
  # Update beta1 (hold beta2 fixed)
  r2 <- y - beta2 * x2
  fit1 <- lm(r2 ~ x1)
  beta0 <- coef(fit1)[1]
  beta1 <- coef(fit1)[2]
  # Store results
  beta0.track[i] <- beta0
  beta1.track[i] <- beta1
  beta2.track[i] <- beta2
}
# Display selected steps
steps <- c(1:5, seq(50, 1000, by = 50))
data.frame(
Step = steps,
beta0 = round(beta0.track[steps], 4),
beta1 = round(beta1.track[steps], 4),
beta2 = round(beta2.track[steps], 4)
)
## Step beta0 beta1 beta2
## 1 1 1.5622 3.0149 -4.5368
## 2 2 1.5632 3.0149 -4.5378
## 3 3 1.5632 3.0149 -4.5378
## 4 4 1.5632 3.0149 -4.5378
## 5 5 1.5632 3.0149 -4.5378
## 6 50 1.5632 3.0149 -4.5378
## 7 100 1.5632 3.0149 -4.5378
## 8 150 1.5632 3.0149 -4.5378
## 9 200 1.5632 3.0149 -4.5378
## 10 250 1.5632 3.0149 -4.5378
## 11 300 1.5632 3.0149 -4.5378
## 12 350 1.5632 3.0149 -4.5378
## 13 400 1.5632 3.0149 -4.5378
## 14 450 1.5632 3.0149 -4.5378
## 15 500 1.5632 3.0149 -4.5378
## 16 550 1.5632 3.0149 -4.5378
## 17 600 1.5632 3.0149 -4.5378
## 18 650 1.5632 3.0149 -4.5378
## 19 700 1.5632 3.0149 -4.5378
## 20 750 1.5632 3.0149 -4.5378
## 21 800 1.5632 3.0149 -4.5378
## 22 850 1.5632 3.0149 -4.5378
## 23 900 1.5632 3.0149 -4.5378
## 24 950 1.5632 3.0149 -4.5378
## 25 1000 1.5632 3.0149 -4.5378
The coefficients (beta0, beta1, beta2) converge almost immediately to stable values: the table shows they are essentially fixed from the second iteration onward, and the convergence plot in the next part shows three flat lines.
Explanation: Backfitting converges very quickly to values essentially identical to the OLS estimates, so the method is working as expected.
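A quick numeric check (reusing the final beta values from the loop above) places the backfitting estimates next to the OLS coefficients; part (f) below makes the same comparison graphically:
# Backfitting estimates after 1,000 iterations vs. ordinary least squares
ols <- coef(lm(y ~ x1 + x2))
rbind(backfit = unname(c(beta0, beta1, beta2)), ols = unname(ols))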
Compare your answer in (e) to the results of simply performing multiple linear regression to predict Y using \(X_1\) and \(X_2\). Use the abline() function to overlay those multiple linear regression coefficient estimates on the plot obtained in (e).
# Plot convergence
plot(1:1000, beta0.track, type = "l", col = "blue", ylim = range(beta0.track, beta1.track, beta2.track),
xlab = "Iteration", ylab = "Estimate", main = "Backfitting Coefficient Convergence")
lines(1:1000, beta1.track, col = "red")
lines(1:1000, beta2.track, col = "green")
legend("topright", legend = c("beta0", "beta1", "beta2"), col = c("blue", "red", "green"), lty = 1)
# Compare to full multiple regression
lm_fit <- lm(y ~ x1 + x2)
abline(h = coef(lm_fit)[1], col = "blue", lty = 2)
abline(h = coef(lm_fit)[2], col = "red", lty = 2)
abline(h = coef(lm_fit)[3], col = "green", lty = 2)
The dashed lines (OLS estimates) align with the stable values of backfitting.
Explanation: Backfitting successfully recovers the same coefficients as full multiple regression.
On this data set, how many backfitting iterations were required in order to obtain a “good” approximation to the multiple regression coefficient estimates?
# Check when the estimates stabilize (e.g., changes < 0.001)
tol <- 0.001
delta_beta1 <- abs(diff(beta1.track))
delta_beta2 <- abs(diff(beta2.track))
converged_at <- which(delta_beta1 < tol & delta_beta2 < tol)[1] + 1
converged_at
## [1] 3
The coefficients stabilize almost immediately: the step-to-step changes (computed with diff()) fall below the 0.001 tolerance by iteration 3.
Explanation: Only two or three backfitting iterations are needed to reach a good approximation of the multiple regression estimates. This is because X1 and X2 were generated independently, so they are nearly uncorrelated and the two simple-regression updates barely interfere with each other.
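An alternative convergence check (a sketch; the 1e-6 tolerance is an arbitrary choice) measures the distance to the OLS coefficients directly rather than the step-to-step change:
# First iteration at which all three estimates are within 1e-6 of the OLS values
ols <- coef(lm(y ~ x1 + x2))
gap <- pmax(abs(beta0.track - ols[1]), abs(beta1.track - ols[2]), abs(beta2.track - ols[3]))
which(gap < 1e-6)[1]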