library(ISLR2)
library(boot)
set.seed(100)
cv_errors <- numeric(20)
for (d in 1:20) {
  # 10-fold CV error for a degree-d polynomial fit of wage on age
  glm_model <- glm(wage ~ poly(age, d), data = Wage)
  cv_errors[d] <- cv.glm(Wage, glm_model, K = 10)$delta[1]
}
plot(cv_errors, type = "l", xlab = "Degree of Polynomial", ylab = "Cross-Validation Error", main = "CV Error vs Degree")
From the plot, degree 3 appears optimal: the CV error drops sharply up to degree 3 and stabilizes thereafter.
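As a quick check on the visual read, which.min() reports the degree with the smallest CV error directly (a minimal sketch using the cv_errors vector computed above):
# Degree with the smallest 10-fold CV error, and its error value
which.min(cv_errors)
cv_errors[which.min(cv_errors)]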
lm_models <- list()
for (d in 1:5) {
  lm_models[[d]] <- lm(wage ~ poly(age, d), data = Wage)
}
anova(lm_models[[1]], lm_models[[2]], lm_models[[3]], lm_models[[4]], lm_models[[5]], test = "F")
The quadratic (degree 2) and cubic (degree 3) terms significantly improve the fit.
Beyond degree 3, higher-order polynomials are not statistically justified (p-values > 0.05).
Cross-validation and the ANOVA F-tests therefore both point to a third-degree polynomial.
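To visualize the selected model, the degree-3 polynomial can be refit and its predictions overlaid on the data (a sketch; fit3 and age_grid are illustrative names):
# Refit the degree-3 polynomial chosen by CV and ANOVA, and overlay its fit
fit3 <- lm(wage ~ poly(age, 3), data = Wage)
age_grid <- seq(min(Wage$age), max(Wage$age))
plot(Wage$age, Wage$wage, col = "gray", xlab = "Age", ylab = "Wage", main = "Degree-3 Polynomial Fit")
lines(age_grid, predict(fit3, newdata = data.frame(age = age_grid)), col = "blue", lwd = 2)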
cv_step_errors <- numeric(20)
for (cuts in 1:20) {
  if (cuts == 1) {
    glm_fit <- glm(wage ~ 1, data = Wage) # No age effect
  } else {
    Wage$cut_age <- cut(Wage$age, cuts)
    glm_fit <- glm(wage ~ cut_age, data = Wage)
  }
  set.seed(100)
  cv_step_errors[cuts] <- cv.glm(Wage, glm_fit, K = 10)$delta[1]
}
plot(cv_step_errors, type = "l", xlab = "Number of Cuts", ylab = "Cross-Validation Error", main = "CV Error for Step Functions")
The CV error plot suggests that 8 cuts gives the best balance between bias and variance.
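For reference, the 8-cut step function suggested above can be fit and drawn over the data (a sketch; fitted values are plotted in age order to avoid re-deriving the cut points at prediction time):
# Fit the step function with 8 cuts and overlay the fitted step means
step_fit <- lm(wage ~ cut(age, 8), data = Wage)
ord <- order(Wage$age)
plot(Wage$age, Wage$wage, col = "gray", xlab = "Age", ylab = "Wage", main = "Step Function with 8 Cuts")
lines(Wage$age[ord], fitted(step_fit)[ord], col = "red", lwd = 2)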
library(gam)
## Loading required package: splines
## Loading required package: foreach
## Loaded gam 1.22-5
# jobclass and maritl are factors, so plot() produces boxplots of wage by level
plot(Wage$jobclass, Wage$wage, col = "skyblue", main = "Wage by Job Class")
plot(Wage$maritl, Wage$wage, col = "orange", main = "Wage by Marital Status")
gam_fit <- gam(wage ~ s(age, 4) + education + jobclass + maritl, data = Wage)
summary(gam_fit)
##
## Call: gam(formula = wage ~ s(age, 4) + education + jobclass + maritl,
## data = Wage)
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -112.806 -19.773 -2.918 14.360 209.875
##
## (Dispersion Parameter for gaussian family taken to be 1208.038)
##
## Null Deviance: 5222086 on 2999 degrees of freedom
## Residual Deviance: 3607202 on 2986 degrees of freedom
## AIC: 29819.86
##
## Number of Local Scoring Iterations: NA
##
## Anova for Parametric Effects
## Df Sum Sq Mean Sq F value Pr(>F)
## s(age, 4) 1 199870 199870 165.450 < 2.2e-16 ***
## education 4 1097688 274422 227.163 < 2.2e-16 ***
## jobclass 1 12811 12811 10.605 0.001141 **
## maritl 4 106390 26597 22.017 < 2.2e-16 ***
## Residuals 2986 3607202 1208
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Anova for Nonparametric Effects
## Npar Df Npar F Pr(F)
## (Intercept)
## s(age, 4) 3 23.054 9.437e-15 ***
## education
## jobclass
## maritl
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(gam_fit, se = TRUE, col = "darkgreen")
Job class and marital status show wage differences. GAM shows that age has a non-linear effect; education and job class are strong predictors. The plot of s(age, 4) shows that wage increases with age until around 50, then stabilizes and slowly declines beyond 60. The widening confidence bands at younger and older ages suggest more variability due to fewer observations.
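One way to back up the non-linearity claim is to compare a GAM that enters age linearly against the smooth-age fit above, using the anova() comparison style from the ISLR lab (gam_linear is an illustrative name):
# F-test comparing a linear age term against the smoothing spline s(age, 4)
gam_linear <- gam(wage ~ age + education + jobclass + maritl, data = Wage)
anova(gam_linear, gam_fit, test = "F")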
library(ISLR2)
library(splines)
fit_poly3 <- lm(nox ~ poly(dis, 3), data = Boston)
summary(fit_poly3)
##
## Call:
## lm(formula = nox ~ poly(dis, 3), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.121130 -0.040619 -0.009738 0.023385 0.194904
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.554695 0.002759 201.021 < 2e-16 ***
## poly(dis, 3)1 -2.003096 0.062071 -32.271 < 2e-16 ***
## poly(dis, 3)2 0.856330 0.062071 13.796 < 2e-16 ***
## poly(dis, 3)3 -0.318049 0.062071 -5.124 4.27e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06207 on 502 degrees of freedom
## Multiple R-squared: 0.7148, Adjusted R-squared: 0.7131
## F-statistic: 419.3 on 3 and 502 DF, p-value: < 2.2e-16
plot(Boston$dis, Boston$nox, col = "gray", main = "Cubic Polynomial Fit")
lines(sort(Boston$dis), predict(fit_poly3, newdata = data.frame(dis = sort(Boston$dis))), col = "blue", lwd = 2)
The significant higher-order terms (positive quadratic, negative cubic) confirm non-linear curvature in the relationship, while the large negative first-order coefficient captures the overall decline of nox with dis. The very low residual standard error (0.06207) suggests the model's predictions are close to the observed values on average, and the large F-statistic with an extremely small p-value (< 2.2e-16) confirms that the overall model is statistically significant.
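To show the uncertainty around the cubic fit, predict() with se.fit = TRUE gives pointwise standard errors; the band of plus or minus 2 SE below is an illustrative choice:
# Cubic fit with approximate pointwise +/- 2 SE bands
dis_grid <- seq(min(Boston$dis), max(Boston$dis), length.out = 100)
pred <- predict(fit_poly3, newdata = data.frame(dis = dis_grid), se.fit = TRUE)
plot(Boston$dis, Boston$nox, col = "gray", main = "Cubic Polynomial Fit with SE Bands")
lines(dis_grid, pred$fit, col = "blue", lwd = 2)
lines(dis_grid, pred$fit + 2 * pred$se.fit, col = "blue", lty = 2)
lines(dis_grid, pred$fit - 2 * pred$se.fit, col = "blue", lty = 2)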
rss <- numeric(10)
for (d in 1:10) {
  fit <- lm(nox ~ poly(dis, d), data = Boston)
  rss[d] <- sum(residuals(fit)^2)
}
plot(1:10, rss, type = "b", xlab = "Degree", ylab = "RSS", main = "Polynomial Degree vs RSS")
The training RSS decreases rapidly from degree 1 to degree 3 and only marginally thereafter; because training RSS always falls as the degree grows, it indicates improved in-sample fit but cannot by itself choose the degree.
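Printing the RSS values and their successive reductions makes the diminishing returns explicit (a small sketch using the rss vector computed above):
# Training RSS by degree and the drop achieved by each additional degree
round(rss, 3)
round(-diff(rss), 3)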
library(boot)
cv_errors_poly <- numeric(10)
for (d in 1:10) {
  glm_fit <- glm(nox ~ poly(dis, d), data = Boston)
  cv_errors_poly[d] <- cv.glm(Boston, glm_fit, K = 10)$delta[1]
}
plot(1:10, cv_errors_poly, type = "b", xlab = "Degree", ylab = "CV Error", main = "CV Error by Degree")
The cross-validation (CV) error is lowest at degree 3, suggesting that a cubic polynomial generalizes best to unseen data. Degrees beyond 4 do not improve performance and risk overfitting.
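The minimizing degree can also be read off programmatically rather than from the plot (note the exact minimum can shift between runs because the CV folds are random):
# Degree with the smallest 10-fold CV error
which.min(cv_errors_poly)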
fit_spline4 <- lm(nox ~ bs(dis, df = 4), data = Boston)
summary(fit_spline4)
##
## Call:
## lm(formula = nox ~ bs(dis, df = 4), data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.124622 -0.039259 -0.008514 0.020850 0.193891
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.73447 0.01460 50.306 < 2e-16 ***
## bs(dis, df = 4)1 -0.05810 0.02186 -2.658 0.00812 **
## bs(dis, df = 4)2 -0.46356 0.02366 -19.596 < 2e-16 ***
## bs(dis, df = 4)3 -0.19979 0.04311 -4.634 4.58e-06 ***
## bs(dis, df = 4)4 -0.38881 0.04551 -8.544 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.06195 on 501 degrees of freedom
## Multiple R-squared: 0.7164, Adjusted R-squared: 0.7142
## F-statistic: 316.5 on 4 and 501 DF, p-value: < 2.2e-16
plot(Boston$dis, Boston$nox, col = "gray", main = "Regression Spline (df=4)")
lines(sort(Boston$dis), predict(fit_spline4, newdata = data.frame(dis = sort(Boston$dis))), col = "red", lwd = 2)
Using a spline with 4 degrees of freedom provides a flexible yet interpretable fit between nox and distance. All spline terms are statistically significant (p < 0.01). The plot shows a sharp decline in nox up to 5–6 miles, then levels off—reflecting higher pollution near urban centers. The model captures key trends without overfitting. Knots are placed automatically based on quantiles of the predictor.
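The knot placement can be inspected directly; with df = 4 and the default cubic basis, bs() uses a single interior knot at the median of dis:
# Interior knot(s) chosen by bs() when only df is supplied
attr(bs(Boston$dis, df = 4), "knots")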
rss_spline <- numeric(8)
dfs <- 3:10
for (i in seq_along(dfs)) {
  fit <- lm(nox ~ bs(dis, df = dfs[i]), data = Boston)
  rss_spline[i] <- sum(residuals(fit)^2)
}
plot(dfs, rss_spline, type = "b", xlab = "Degrees of Freedom", ylab = "RSS", main = "Spline DF vs RSS")
The RSS decreases sharply from df = 3 to df = 5, then levels off, indicating minimal gains in fit beyond 5 degrees of freedom. This suggests that around 5–6 df captures the key non-linear structure without adding unnecessary complexity.
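For reference, the RSS values can be tabulated against their degrees of freedom (a small sketch using the vectors computed above):
# Spline RSS by degrees of freedom
cbind(df = dfs, RSS = round(rss_spline, 4))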
range_dis <- range(Boston$dis)
cv_errors_spline <- numeric(10)
for (df in 3:10) {
  # Fixing the boundary knots to the full range of dis avoids out-of-range
  # warnings when predicting on held-out CV folds
  glm_fit <- glm(nox ~ bs(dis, df = df, Boundary.knots = range_dis), data = Boston)
  cv_errors_spline[df] <- cv.glm(Boston, glm_fit, K = 10)$delta[1]
}
plot(3:10, cv_errors_spline[3:10], type = "b", xlab = "Degrees of Freedom", ylab = "CV Error", main = "Spline CV Error")
Cross-validation suggests that 5 degrees of freedom minimizes the prediction error: df = 3 and 4 give slightly higher CV error, and df > 5 does not improve performance, so df = 5 offers the best balance of flexibility and generalization.
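As with the polynomials, the minimizing df can be extracted programmatically (indices 3:10 of cv_errors_spline hold the valid entries; the result may vary slightly across runs because the folds are random):
# Degrees of freedom with the smallest 10-fold CV error
which.min(cv_errors_spline[3:10]) + 2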
set.seed(1)
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 2 * x1 + 3 * x2 + rnorm(n)
beta0 <- beta1 <- beta2 <- 0
b0 <- b1 <- b2 <- numeric(1000)
for (i in 1:1000) {
  # Update beta2 keeping beta1 fixed, as in part (c)
  a <- y - beta1 * x1
  beta2 <- coef(lm(a ~ x2))[2]
  # Update beta1 keeping beta2 fixed, as in part (d)
  a <- y - beta2 * x2
  beta1 <- coef(lm(a ~ x1))[2]
  # Update beta0 using the new beta1 and beta2
  beta0 <- mean(y - beta1 * x1 - beta2 * x2)
  b0[i] <- beta0
  b1[i] <- beta1
  b2[i] <- beta2
}
plot(b1, type = "l", col = "blue", ylim = range(c(b0, b1, b2)),
     ylab = "Coefficient Values", xlab = "Iteration",
     main = "Backfitting Estimates Over Time")
lines(b2, col = "red")
lines(b0, col = "green")
legend("bottomright", legend = c("beta0", "beta1", "beta2"),
       col = c("green", "blue", "red"), lty = 1)
# Re-plot the backfitting paths and overlay the multiple regression estimates
plot(b1, type = "l", col = "blue", ylim = range(c(b0, b1, b2)),
     ylab = "Coefficient Values", xlab = "Iteration",
     main = "Backfitting Estimates Over Time")
lines(b2, col = "red")
lines(b0, col = "green")
legend("bottomright", legend = c("beta0", "beta1", "beta2"),
       col = c("green", "blue", "red"), lty = 1)
lm_fit <- lm(y ~ x1 + x2)
abline(h = coef(lm_fit)[1], col = "green", lty = 2)
abline(h = coef(lm_fit)[2], col = "blue", lty = 2)
abline(h = coef(lm_fit)[3], col = "red", lty = 2)
The dashed horizontal lines from the multiple linear regression fit align closely with the final backfitting values, confirming convergence. All three coefficients (beta0, beta1, beta2) stabilize quickly, well within the first 15–30 iterations, suggesting that relatively few backfitting passes are sufficient for a good approximation.
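A direct numerical comparison of the final backfitting estimates with the multiple regression coefficients makes the agreement concrete (a small sketch using the objects created above):
# Final backfitting estimates vs. coefficients from lm(y ~ x1 + x2)
rbind(backfitting = c(b0[1000], b1[1000], b2[1000]),
      lm = coef(lm_fit))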