Suppose that a curve ĝ is computed to smoothly fit a set of n points using the following formula:

ĝ = argmin_g ( Σᵢ₌₁ⁿ (yᵢ − g(xᵢ))² + λ ∫ [g^(m)(x)]² dx )

where g^(m) represents the mth derivative of g (and g^(0) = g). Provide example sketches of ĝ in each of the following scenarios.

Exercise 2 (A): λ = ∞, m = 0. In this case ĝ = 0, because the penalty is λ ∫ g(x)² dx and an infinite smoothing parameter forces g^(0)(x) = g(x) → 0 everywhere.

library(ggplot2)
set.seed(3)

# Simulate 50 points around the generating function sin(12(x + 0.2)) / (x + 0.2)
var1 = runif(50)
eps = rnorm(50)
var2 = sin(12*(var1 + 0.2)) / (var1 + 0.2) + eps
generating_function <- function(var1) {sin(12*(var1 + 0.2)) / (var1 + 0.2)}
data_frame = data.frame(var1, var2)

# Red horizontal line at 0: the limiting fit ĝ(x) = 0
ggplot(data_frame, aes(x = var1, y = var2)) +
  geom_point(alpha = 0.5) +
  stat_function(fun = generating_function, aes(col = "Generating Function")) +
  geom_hline(aes(yintercept = 0, linetype = "g(X)"), col = "red", size = 0.8) +
  scale_color_manual(values = "blue") +
  theme(legend.position = "bottom", legend.title = element_blank())
[Plot: simulated points, the generating function (blue), and the limiting fit ĝ(x) = 0 (red horizontal line)]

Exercise 2 (B): λ = ∞, m = 1. In this case ĝ(x) = c, a constant, because a large smoothing parameter forces g^(1)(x) → 0; the best such constant is the sample mean ȳ.
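A one-step minimization (a worked detail added here for clarity) shows why that constant is the sample mean: setting the derivative of the residual sum of squares to zero,

d/dc Σᵢ₌₁ⁿ (yᵢ − c)² = −2 Σᵢ₌₁ⁿ (yᵢ − c) = 0  ⟹  ĉ = ȳ

which is why the horizontal line in the code below is drawn at mean(var2).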

# Red horizontal line at mean(var2): the limiting constant fit ĝ(x) = ȳ
ggplot(data_frame, aes(x = var1, y = var2)) +
  geom_point(alpha = 0.5) +
  stat_function(fun = generating_function, aes(col = "Generating Function")) +
  geom_hline(aes(yintercept = mean(var2), linetype = "g(X)"), col = "red", size = 0.8) +
  scale_color_manual(values = "blue") +
  theme(legend.position = "bottom", legend.title = element_blank())
[Plot: simulated points, the generating function (blue), and the constant fit ĝ(x) = mean(var2) (red horizontal line)]

Exercise 2 (C): λ = ∞, m = 2. In this case ĝ(x) = cx + d, because a large smoothing parameter forces g^(2)(x) → 0, leaving the least-squares straight line.

# Red line: the least-squares straight line, i.e. the limit as g''(x) → 0
ggplot(data_frame, aes(x = var1, y = var2)) +
  geom_point(alpha = 0.5) +
  stat_function(fun = generating_function, aes(col = "Generating Function")) +
  geom_smooth(method = "lm", formula = "y ~ x", se = FALSE, size = 0.8, aes(col = "g(X)")) +
  scale_color_manual(values = c("blue", "red")) +
  theme(legend.position = "bottom", legend.title = element_blank())
[Plot: simulated points, the generating function (blue), and the least-squares line (red)]

Exercise 2 (D): λ = ∞, m = 3. In this case ĝ(x) = cx² + dx + e, because a large smoothing parameter forces g^(3)(x) → 0, leaving the least-squares quadratic.

# Red curve: the least-squares quadratic, i.e. the limit as g'''(x) → 0
ggplot(data_frame, aes(x = var1, y = var2)) +
  geom_point(alpha = 0.5) +
  stat_function(fun = generating_function, aes(col = "Generating Function")) +
  geom_smooth(method = "lm", formula = "y ~ x + I(x^2)", se = FALSE, size = 0.8, aes(col = "g(X)")) +
  scale_color_manual(values = c("blue", "red")) +
  theme(legend.position = "bottom", legend.title = element_blank())
[Plot: simulated points, the generating function (blue), and the least-squares quadratic (red)]
# The limiting fit is the least-squares quadratic, ĝ(x) = cx² + dx + e
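As a quick check (reusing the simulated data from above), the coefficients of that limiting quadratic can be read off a least-squares fit directly:

# Coefficients of the least-squares quadratic; coef() returns them as (e, d, c)
coef(lm(var2 ~ var1 + I(var1^2), data = data_frame))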

Exercise 2 (E): λ = 0, m = 3. The penalty term plays no role, so in this case ĝ(x) is the interpolating spline, which passes through every training point.

# A tiny lambda (rather than exactly 0) keeps the fit numerically stable
# while effectively interpolating the points
interp_spline = smooth.spline(x = data_frame$var1, y = data_frame$var2, all.knots = TRUE, lambda = 1e-13)
fitted = predict(interp_spline, x = seq(min(var1) - 0.02, max(var1) + 0.02, by = 0.0001))
fitted = data.frame(x = fitted$x, fitted_y = fitted$y)

ggplot(data_frame, aes(x = var1, y = var2)) +
  geom_point(alpha = 0.5) +
  stat_function(fun = generating_function, aes(col = "Generating Function")) +
  geom_line(data = fitted,
            aes(x = x, y = fitted_y, col = "g(X)"), size = 0.8) +
  scale_color_manual(values = c("blue", "red")) +
  theme(legend.position = "bottom", legend.title = element_blank())
[Plot: simulated points, the generating function (blue), and the interpolating spline (red)]
# For this reason (the spline interpolates every point), we can achieve training RSS = 0
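As a sanity check (reusing interp_spline from above), the training RSS of the near-interpolating fit is essentially zero:

# Training RSS at the observed points: ~0 up to numerical error
sum((var2 - predict(interp_spline, x = var1)$y)^2)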

Exercise 3: sketch the fitted curve Ŷ = 1 + X − 2(X − 1)² · I(X ≥ 1) over the range X ∈ [−2, 2].

seq_varX = seq(-2, 2, 0.01)
seq_varY = 1 + seq_varX - 2 * (seq_varX - 1)^2 * (seq_varX >= 1)
dataFrame = data.frame(seq_varX, seq_varY)

ggplot(dataFrame, aes(x = seq_varX, y = seq_varY)) +
  geom_vline(xintercept = 0, col = "green") +
  geom_vline(xintercept = 1, col = "blue") +
  geom_hline(yintercept = 0, col = "green") +
  geom_line(size = 1.5, col = "red")
[Plot: the fitted piecewise curve (red) with reference lines at X = 0 (green), X = 1 (blue), and Y = 0 (green)]

For X < 1, we have f(X) = 1 + X, so the slope and intercept are both 1. Because of the indicator I(X ≥ 1), the quadratic term only contributes for X ≥ 1, which is why the curve is linear before that point.

Substituting the indicator, for X ≥ 1 the curve is

f(X) = 1 + X − 2(X − 1)² = 1 + X − 2(X² − 2X + 1) = −2X² + 5X − 1

Taking the derivative gives f′(X) = −4X + 5, so the slope for X ≥ 1 varies with X. Setting this derivative equal to zero shows why the critical point in the graph occurs at X = 5/4.

In summary, the curve is linear between −2 and 1 with y = 1 + X, and quadratic between 1 and 2 with y = 1 + X − 2(X − 1)².
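A quick numeric check (using the seq_varX and seq_varY grid from the chunk above) confirms that the maximum of the curve lands at X = 5/4:

# Grid maximum of the piecewise curve: should print 1.25
seq_varX[which.max(seq_varY)]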

Exercise 4

Substituting the coefficient estimates into the fitted model gives

Ŷ = 1 + I(0 <= x <= 2) − (x − 1) I(1 <= x <= 2) + 3(x − 3) I(3 <= x <= 4) + 3 I(4 < x <= 5)

Since the range given is [−2, 2], the last two terms are always zero there, so Ŷ simplifies to

Ŷ = 1 + I(0 <= x <= 2) − (x − 1) I(1 <= x <= 2)

XVar = seq(-2, 2, 0.01)
YVar = 1 + (XVar >= 0 & XVar <= 2) - (XVar - 1)*(XVar >= 1 & XVar <= 2) +
  3*(XVar - 3)*(XVar >= 3 & XVar <= 4) + 3*(XVar > 4 & XVar <= 5)
data_frame_seq <- data.frame(XVar, YVar)

ggplot(data_frame_seq, aes(x = XVar, y = YVar)) +
  geom_vline(xintercept = 0) +
  geom_hline(yintercept = 0) +
  geom_line(size = 1.5, col = "purple")
[Plot: the piecewise-linear fitted curve (purple) with reference lines at x = 0 and y = 0]

If x < 0, f(x) = 1, so the slope is 0.

If 0 <= x <= 1, f(x) = 2, so the slope is 0.

If 1 <= x <= 2, f(x) = 3 − x, so the slope is −1.
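Evaluating the simplified fit at one point from each region (via a small helper function, f, defined here only for this check) reproduces these values:

# Helper for the simplified fit on [-2, 2], defined just for this check
f <- function(x) 1 + (x >= 0 & x <= 2) - (x - 1) * (x >= 1 & x <= 2)
f(c(-1, 0.5, 1.5))  # expect 1, 2, 1.5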

Exercise 5 - Consider two curves, ĝ1 and ĝ2, defined by

ĝ1 = argmin_g ( Σᵢ₌₁ⁿ (yᵢ − g(xᵢ))² + λ ∫ [g^(3)(x)]² dx )

ĝ2 = argmin_g ( Σᵢ₌₁ⁿ (yᵢ − g(xᵢ))² + λ ∫ [g^(4)(x)]² dx )

where g^(m) represents the mth derivative of g.

(a) As λ → ∞, will ĝ1 or ĝ2 have the smaller training RSS?

➔ ĝ2 will result in the smaller training RSS: due to the order of its penalty term, its λ → ∞ limit is a higher-order polynomial (a cubic, from g^(4) → 0, versus a quadratic for ĝ1, from g^(3) → 0), so it is the more flexible fit.
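Since the two limits are nested least-squares polynomial fits, this is easy to check numerically; a minimal sketch, reusing the simulated data from Exercise 2 and assuming the λ → ∞ limits are exactly the least-squares quadratic and cubic:

# ĝ1's limit (quadratic) vs ĝ2's limit (cubic); the cubic nests the
# quadratic, so its training RSS can only be smaller or equal
g1_limit <- lm(var2 ~ poly(var1, 2), data = data_frame)
g2_limit <- lm(var2 ~ poly(var1, 3), data = data_frame)
c(g1 = sum(resid(g1_limit)^2), g2 = sum(resid(g2_limit)^2))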

(b) As λ → ∞, will ĝ1 or ĝ2 have the smaller test RSS?

➔ Based on the above, ĝ2 is the more flexible of the two limits, so it may overfit the data; it will probably be ĝ1 that has the smaller test RSS.

➔ That said, the test RSS depends on the distribution of the test data, so the answer is not certain; taking the nature of the curves into account, ĝ2's extra flexibility makes it prone to overfitting and hence to a larger test RSS.

(c) For λ = 0, will ĝ1 or ĝ2 have the smaller training and test RSS?

➔ If λ = 0, the penalty terms vanish and ĝ1 = ĝ2; hence they will have the same training and test RSS.