Suppose that a curve ĝ is computed to smoothly fit a set of n points using the following formula:

ĝ = argmin_g ( Σᵢ₌₁ⁿ (yᵢ − g(xᵢ))² + λ ∫ [g^(m)(x)]² dx )

where g^(m) represents the mth derivative of g (and g^(0) = g). Provide example sketches of ĝ in each of the following scenarios.

Exercise 2 (A): λ = ∞, m = 0. In this case ĝ = 0, because the penalty is λ ∫ g(x)² dx and an infinite smoothing parameter forces g^(0)(x) = g(x) → 0 everywhere.

library(ggplot2)
set.seed(3)

# Simulate 50 points around the generating function sin(12(x + 0.2)) / (x + 0.2)
var1 = runif(50)
eps = rnorm(50)
var2 = sin(12*(var1 + 0.2)) / (var1 + 0.2) + eps
generating_function <- function(var1) {sin(12*(var1 + 0.2)) / (var1 + 0.2)}
data_frame = data.frame(var1, var2)

# Red horizontal line at 0: the limiting fit ĝ(x) = 0
ggplot(data_frame, aes(x = var1, y = var2)) +
  geom_point(alpha = 0.5) +
  stat_function(fun = generating_function, aes(col = "Generating Function")) +
  geom_hline(aes(yintercept = 0, linetype = "g(X)"), col = "red", size = 0.8) +
  scale_color_manual(values = "blue") +
  theme(legend.position = "bottom", legend.title = element_blank())
[Plot: simulated points, the generating function (blue), and the limiting fit ĝ(x) = 0 (red horizontal line)]

Exercise 2 (B): λ = ∞, m = 1. In this case ĝ(x) = c, a constant, because a large smoothing parameter forces g^(1)(x) → 0; the best such constant is the sample mean ȳ.
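A one-step minimization (a worked detail added here for clarity) shows why that constant is the sample mean: setting the derivative of the residual sum of squares to zero,

d/dc Σᵢ₌₁ⁿ (yᵢ − c)² = −2 Σᵢ₌₁ⁿ (yᵢ − c) = 0  ⟹  ĉ = ȳ

which is why the horizontal line in the code below is drawn at mean(var2).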

# Red horizontal line at mean(var2): the limiting constant fit ĝ(x) = ȳ
ggplot(data_frame, aes(x = var1, y = var2)) +
  geom_point(alpha = 0.5) +
  stat_function(fun = generating_function, aes(col = "Generating Function")) +
  geom_hline(aes(yintercept = mean(var2), linetype = "g(X)"), col = "red", size = 0.8) +
  scale_color_manual(values = "blue") +
  theme(legend.position = "bottom", legend.title = element_blank())
[Plot: simulated points, the generating function (blue), and the constant fit ĝ(x) = mean(var2) (red horizontal line)]

Exercise 2 (C): λ = ∞, m = 2. In this case ĝ(x) = cx + d, because a large smoothing parameter forces g^(2)(x) → 0, leaving the least-squares straight line.

# Red line: the least-squares straight line, i.e. the limit as g''(x) → 0
ggplot(data_frame, aes(x = var1, y = var2)) +
  geom_point(alpha = 0.5) +
  stat_function(fun = generating_function, aes(col = "Generating Function")) +
  geom_smooth(method = "lm", formula = "y ~ x", se = FALSE, size = 0.8, aes(col = "g(X)")) +
  scale_color_manual(values = c("blue", "red")) +
  theme(legend.position = "bottom", legend.title = element_blank())
[Plot: simulated points, the generating function (blue), and the least-squares line (red)]

Exercise 2 (D): λ = ∞, m = 3. In this case ĝ(x) = cx² + dx + e, because a large smoothing parameter forces g^(3)(x) → 0, leaving the least-squares quadratic.

# Red curve: the least-squares quadratic, i.e. the limit as g'''(x) → 0
ggplot(data_frame, aes(x = var1, y = var2)) +
  geom_point(alpha = 0.5) +
  stat_function(fun = generating_function, aes(col = "Generating Function")) +
  geom_smooth(method = "lm", formula = "y ~ x + I(x^2)", se = FALSE, size = 0.8, aes(col = "g(X)")) +
  scale_color_manual(values = c("blue", "red")) +
  theme(legend.position = "bottom", legend.title = element_blank())
[Plot: simulated points, the generating function (blue), and the least-squares quadratic (red)]
# The limiting fit is the least-squares quadratic, ĝ(x) = cx² + dx + e
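As a quick check (reusing the simulated data from above), the coefficients of that limiting quadratic can be read off a least-squares fit directly:

# Coefficients of the least-squares quadratic; coef() returns them as (e, d, c)
coef(lm(var2 ~ var1 + I(var1^2), data = data_frame))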

Exercise 2 (E): λ = 0, m = 3. The penalty term plays no role, so in this case ĝ(x) is the interpolating spline, which passes through every training point.

# A tiny lambda (rather than exactly 0) keeps the fit numerically stable
# while effectively interpolating the points
interp_spline = smooth.spline(x = data_frame$var1, y = data_frame$var2, all.knots = TRUE, lambda = 1e-13)
fitted = predict(interp_spline, x = seq(min(var1) - 0.02, max(var1) + 0.02, by = 0.0001))
fitted = data.frame(x = fitted$x, fitted_y = fitted$y)

ggplot(data_frame, aes(x = var1, y = var2)) +
  geom_point(alpha = 0.5) +
  stat_function(fun = generating_function, aes(col = "Generating Function")) +
  geom_line(data = fitted,
            aes(x = x, y = fitted_y, col = "g(X)"), size = 0.8) +
  scale_color_manual(values = c("blue", "red")) +
  theme(legend.position = "bottom", legend.title = element_blank())
[Plot: simulated points, the generating function (blue), and the interpolating spline (red)]
# For this reason (the spline interpolates every point), we can achieve training RSS = 0
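As a sanity check (reusing interp_spline from above), the training RSS of the near-interpolating fit is essentially zero:

# Training RSS at the observed points: ~0 up to numerical error
sum((var2 - predict(interp_spline, x = var1)$y)^2)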

Exercise 3: sketch the fitted curve Ŷ = 1 + X − 2(X − 1)² · I(X ≥ 1) over the range X ∈ [−2, 2].

seq_varX = seq(-2, 2, 0.01)
seq_varY = 1 + seq_varX - 2 * (seq_varX - 1)^2 * (seq_varX >= 1)
dataFrame = data.frame(seq_varX, seq_varY)

ggplot(dataFrame, aes(x = seq_varX, y = seq_varY)) +
  geom_vline(xintercept = 0, col = "green") +
  geom_vline(xintercept = 1, col = "blue") +
  geom_hline(yintercept = 0, col = "green") +
  geom_line(size = 1.5, col = "red")
[Plot: the fitted piecewise curve (red) with reference lines at X = 0 (green), X = 1 (blue), and Y = 0 (green)]

For X < 1, we have f(X) = 1 + X, so the slope and intercept are both 1. Because of the indicator I(X ≥ 1), the quadratic term only contributes for X ≥ 1, which is why the curve is linear before that point.

Substituting the indicator, for X ≥ 1 the curve is

f(X) = 1 + X − 2(X − 1)² = 1 + X − 2(X² − 2X + 1) = −2X² + 5X − 1

Taking the derivative gives f′(X) = −4X + 5, so the slope for X ≥ 1 varies with X. Setting this derivative equal to zero shows why the critical point in the graph occurs at X = 5/4.

In summary, the curve is linear between −2 and 1 with y = 1 + X, and quadratic between 1 and 2 with y = 1 + X − 2(X − 1)².
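A quick numeric check (using the seq_varX and seq_varY grid from the chunk above) confirms that the maximum of the curve lands at X = 5/4:

# Grid maximum of the piecewise curve: should print 1.25
seq_varX[which.max(seq_varY)]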

Exercise 4

Substituting the coefficient estimates into the fitted model gives

Ŷ = 1 + I(0 <= x <= 2) − (x − 1) I(1 <= x <= 2) + 3(x − 3) I(3 <= x <= 4) + 3 I(4 < x <= 5)

Since the range given is [−2, 2], the last two terms are always zero there, so Ŷ simplifies to

Ŷ = 1 + I(0 <= x <= 2) − (x − 1) I(1 <= x <= 2)

XVar = seq(-2, 2, 0.01)
YVar = 1 + (XVar >= 0 & XVar <= 2) - (XVar - 1)*(XVar >= 1 & XVar <= 2) +
  3*(XVar - 3)*(XVar >= 3 & XVar <= 4) + 3*(XVar > 4 & XVar <= 5)
data_frame_seq <- data.frame(XVar, YVar)

ggplot(data_frame_seq, aes(x = XVar, y = YVar)) +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0) +
  geom_line(size = 1.5, col = "purple")
[Plot: the piecewise-linear fitted curve (purple) with reference lines at x = 0 and y = 0]

If x < 0, f(x) = 1, so the slope is 0.

If 0 <= x <= 1, f(x) = 2, so the slope is 0.

If 1 <= x <= 2, f(x) = 3 − x, so the slope is −1.
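Evaluating the simplified fit at one point from each region (via a small helper function, f, defined here only for this check) reproduces these values:

# Helper for the simplified fit on [-2, 2], defined just for this check
f <- function(x) 1 + (x >= 0 & x <= 2) - (x - 1) * (x >= 1 & x <= 2)
f(c(-1, 0.5, 1.5))  # expect 1, 2, 1.5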

Exercise 5 - Consider two curves, ĝ1 and ĝ2, defined by

ĝ1 = argmin_g ( Σᵢ₌₁ⁿ (yᵢ − g(xᵢ))² + λ ∫ [g^(3)(x)]² dx )

ĝ2 = argmin_g ( Σᵢ₌₁ⁿ (yᵢ − g(xᵢ))² + λ ∫ [g^(4)(x)]² dx )

where g^(m) represents the mth derivative of g.

(a) As λ → ∞, will ĝ1 or ĝ2 have the smaller training RSS?

➔ ĝ2 will result in the smaller training RSS: due to the order of its penalty term, its λ → ∞ limit is a higher-order polynomial (a cubic, from g^(4) → 0, versus a quadratic for ĝ1, from g^(3) → 0), so it is the more flexible fit.
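Since the two limits are nested least-squares polynomial fits, this is easy to check numerically; a minimal sketch, reusing the simulated data from Exercise 2 and assuming the λ → ∞ limits are exactly the least-squares quadratic and cubic:

# ĝ1's limit (quadratic) vs ĝ2's limit (cubic); the cubic nests the
# quadratic, so its training RSS can only be smaller or equal
g1_limit <- lm(var2 ~ poly(var1, 2), data = data_frame)
g2_limit <- lm(var2 ~ poly(var1, 3), data = data_frame)
c(g1 = sum(resid(g1_limit)^2), g2 = sum(resid(g2_limit)^2))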

(b) As λ → ∞, will ĝ1 or ĝ2 have the smaller test RSS?

➔ Based on the above, ĝ2 is the more flexible of the two limits, so it may overfit the data; it will probably be ĝ1 that has the smaller test RSS.

➔ That said, the test RSS depends on the distribution of the test data, so the answer is not certain; taking the nature of the curves into account, ĝ2's extra flexibility makes it prone to overfitting and hence to a larger test RSS.

(c) For λ = 0, will ĝ1 or ĝ2 have the smaller training and test RSS?

➔ If λ = 0, the penalty terms vanish and ĝ1 = ĝ2; hence they will have the same training and test RSS.