Statistical Learning Exercise-7 (Chapter 9)

1. This problem involves hyperplanes in two dimensions.

(a) Sketch the hyperplane 1 + 3X1 − X2 = 0. Indicate the set of points for which 1 + 3X1 − X2 > 0, as well as the set of points for which 1 + 3X1 − X2 < 0.

x1 <- seq(-7, 7, by = 0.1)
x2 <- 1 + 3 * x1

plot(x1, x2, type = "l", col = "darkorange", lwd = 2,
     xlab = "X1", ylab = "X2", main = "Hyperplane: 1 + 3X1 - X2 = 0")
abline(h = 0, col = "lightgray", lty = 3)
abline(v = 0, col = "lightgray", lty = 3)

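Interpretation: The line X2 = 1 + 3X1 separates the plane into two regions. Points below the line (X2 < 1 + 3X1) satisfy 1 + 3X1 − X2 > 0, and points above it (X2 > 1 + 3X1) satisfy 1 + 3X1 − X2 < 0. For example, the origin lies below the line and gives 1 > 0.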
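A quick way to confirm which side of the line is which is to evaluate the left-hand side at test points off the line. The sketch below uses a hypothetical helper `h1` (not part of the original solution) and two illustrative points at X1 = 0.

```r
# Hypothetical helper: left-hand side of the hyperplane 1 + 3*X1 - X2 = 0
h1 <- function(x1, x2) 1 + 3 * x1 - x2

# The line crosses X1 = 0 at X2 = 1, so (0, 0) is below it and (0, 5) is above it
below <- h1(0, 0)   # 1 + 0 - 0
above <- h1(0, 5)   # 1 + 0 - 5

# sign() is +1 on the positive side of the hyperplane and -1 on the negative side
sign(c(below = below, above = above))
```

Any point not on the line works as a test point, since the sign is constant on each side.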

(b) On the same plot, sketch the hyperplane −2 + X1 + 2X2 = 0. Indicate the set of points for which −2+X1 +2X2 > 0, as well as the set of points for which −2 + X1 + 2X2 < 0.

x2_alt <- (2 - x1) / 2

plot(x1, x2, type = "l", col = "darkorange", lwd = 2, ylim = c(-6, 6),
     xlab = "X1", ylab = "X2", main = "Both Hyperplanes")
lines(x1, x2_alt, col = "deepskyblue", lwd = 2)

# Sample points, colored by the sign of -2 + X1 + 2*X2
px <- c(-5, -3, 1, 3, 5, -4, -2, 2, 4)
py <- c(5, 2, -1, 3, 0, -3, -4, 2, 5)
positive <- -2 + px + 2 * py > 0
points(px, py, col = ifelse(positive, "purple", "forestgreen"),
       pch = ifelse(positive, 16, 17))

legend("bottomleft", legend = c("1 + 3X1 - X2 = 0", "-2 + X1 + 2X2 = 0"),
       col = c("darkorange", "deepskyblue"), lwd = 2)

Interpretation: Both hyperplanes are plotted. The sample points are color-coded by their side of the second hyperplane: purple circles lie above the blue line, where −2 + X1 + 2X2 > 0, and green triangles lie below it, where −2 + X1 + 2X2 < 0.
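To indicate whole regions rather than a handful of points, one option is to evaluate the second hyperplane on a grid and color each grid point by sign. This is a minimal sketch; the helper `h2`, the grid spacing, and the colors are illustrative choices, not part of the original solution.

```r
# Hypothetical helper: left-hand side of the hyperplane -2 + X1 + 2*X2 = 0
h2 <- function(x1, x2) -2 + x1 + 2 * x2

# Regular grid over the plotting window
grid <- expand.grid(X1 = seq(-7, 7, by = 0.5), X2 = seq(-6, 6, by = 0.5))

# TRUE where -2 + X1 + 2*X2 > 0; grid points exactly on the line count as not positive
grid$positive <- h2(grid$X1, grid$X2) > 0

plot(grid$X1, grid$X2, pch = 20, cex = 0.5,
     col = ifelse(grid$positive, "deepskyblue", "gray70"),
     xlab = "X1", ylab = "X2", main = "Sign of -2 + X1 + 2X2")

# The boundary itself: X2 = 1 - 0.5 * X1
abline(a = 1, b = -0.5, lwd = 2)
```

Blue grid points mark the region where −2 + X1 + 2X2 > 0, and gray points mark the remainder of the window.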

2. We now investigate a non-linear decision boundary.

(a) Sketch the curve (1 + X1)^2 + (2 − X2)^2 = 4.

x1_curve <- seq(-3, 1, length.out = 500)
x2_lower <- 2 - sqrt(4 - (1 + x1_curve)^2)
x2_upper <- 2 + sqrt(4 - (1 + x1_curve)^2)

plot(x1_curve, x2_lower, type = "l", col = "tomato", lwd = 2, ylim = c(-1,5),
     asp = 1,  # equal axis scales so the curve is not drawn stretched
     xlab = "X1", ylab = "X2", main = "Nonlinear Decision Boundary")
lines(x1_curve, x2_upper, col = "tomato", lwd = 2)
abline(h = 0, col = "gray80", lty = 2)
abline(v = 0, col = "gray80", lty = 2)

Interpretation: This curve is a circle of radius 2 centered at (−1, 2). It is the non-linear boundary that separates points inside the circle from points outside it.
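Since (1 + X1)^2 + (2 − X2)^2 = 4 describes a circle of radius 2 centered at (−1, 2), the curve can also be drawn from a single angle parameter instead of separate lower and upper halves. The sketch below is one way to do that; the names `theta`, `cx`, and `cy` are illustrative.

```r
# Parametrise the circle by angle: centre (-1, 2), radius 2
theta <- seq(0, 2 * pi, length.out = 400)
cx <- -1 + 2 * cos(theta)
cy <-  2 + 2 * sin(theta)

# asp = 1 puts both axes on the same scale so the circle is not drawn stretched
plot(cx, cy, type = "l", col = "tomato", lwd = 2, asp = 1,
     xlab = "X1", ylab = "X2", main = "Circle (1 + X1)^2 + (2 - X2)^2 = 4")
points(-1, 2, pch = 3)  # mark the centre
```

The parametric form avoids the square root, so there is no gap or numerical issue where the two halves meet at X1 = −3 and X1 = 1.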

(b) On your sketch, indicate the set of points for which (1 + X1)^2 + (2 − X2)^2 > 4, as well as the set of points for which (1 + X1)^2 + (2 − X2)^2 ≤ 4.

plot(x1_curve, x2_lower, type = "l", col = "tomato", lwd = 2, ylim = c(-1,5),
     asp = 1,  # equal axis scales so the curve is not drawn stretched
     xlab = "X1", ylab = "X2", main = "Decision Regions")
lines(x1_curve, x2_upper, col = "tomato", lwd = 2)

# Points inside (<=4)
points(c(-2, -0.5, -1.5, 0.5), c(1.5, 2, 0.8, 2.2), col = "red", pch = 16)

# Points outside (>4)
points(c(-2.8, -1, 0, 1), c(3.5, 4.2, -0.5, 4.5), col = "blue", pch = 17)

legend("topright", legend = c("(1+X1)^2 + (2-X2)^2 > 4", "(1+X1)^2 + (2-X2)^2 ≤ 4"),
       col = c("blue", "red"), pch = c(17,16))

Interpretation: Points on or inside the circle (red) satisfy (1 + X1)^2 + (2 − X2)^2 ≤ 4, and points outside it (blue) satisfy (1 + X1)^2 + (2 − X2)^2 > 4.

(c) Suppose that a classifier assigns an observation to the blue class if (1 + X1)^2 + (2 − X2)^2 > 4, and to the red class otherwise. To what class is the observation (0, 0) classified? (−1, 1)? (2, 2)? (3, 8)?

classify_point <- function(x1, x2) {
  value <- (1 + x1)^2 + (2 - x2)^2
  if (value > 4) {
    return("Blue")
  } else {
    return("Red")
  }
}

points_classification <- data.frame(
  X1 = c(0, -1, 2, 3),
  X2 = c(0, 1, 2, 8)
)

apply(points_classification, 1, function(pt) classify_point(pt[1], pt[2]))

## [1] "Blue" "Red"  "Blue" "Blue"

Interpretation:

(0,0): 1 + 4 = 5 > 4, Blue
(−1,1): 0 + 1 = 1 ≤ 4, Red
(2,2): 9 + 0 = 9 > 4, Blue
(3,8): 16 + 36 = 52 > 4, Blue

Thus, only (−1,1) falls in the red region (on or inside the curve).
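The same classification can be done without `apply` by vectorising the rule with `ifelse`, which also makes it easy to show the value of (1 + X1)^2 + (2 − X2)^2 next to each label. This is a sketch; `classify_vec`, `x1_obs`, `x2_obs`, and `lhs` are hypothetical names introduced here.

```r
# Vectorised version of the rule: Blue if the value exceeds 4, Red otherwise
classify_vec <- function(x1, x2) {
  value <- (1 + x1)^2 + (2 - x2)^2
  ifelse(value > 4, "Blue", "Red")
}

# The four observations from part (c)
x1_obs <- c(0, -1, 2, 3)
x2_obs <- c(0, 1, 2, 8)

# Left-hand side of the boundary equation for each observation
lhs <- (1 + x1_obs)^2 + (2 - x2_obs)^2

data.frame(X1 = x1_obs, X2 = x2_obs, value = lhs,
           class = classify_vec(x1_obs, x2_obs))
```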

(d) Argue that while the decision boundary in (c) is not linear in terms of X1 and X2, it is linear in terms of X1, X1^2, X2, and X2^2.

Expanding the boundary equation:
- \((1+X1)^2 + (2-X2)^2 = 4\)
- Expands to \(1 + 2X1 + X1^2 + 4 - 4X2 + X2^2 = 4\)
- Simplifying: \(X1^2 + X2^2 + 2X1 - 4X2 + 1 = 0\)

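Thus, while the boundary is non-linear in \(X1\) and \(X2\), it has the form \(\beta_0 + \beta_1 X1 + \beta_2 X1^2 + \beta_3 X2 + \beta_4 X2^2 = 0\) with \((\beta_0, \beta_1, \beta_2, \beta_3, \beta_4) = (1, 2, 1, -4, 1)\), which is linear in the features \(X1\), \(X1^2\), \(X2\), and \(X2^2\).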
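The expansion can be checked numerically by treating \(1\), \(X1\), \(X1^2\), \(X2\), and \(X2^2\) as features and comparing a plain linear combination of them against the original expression. The sketch below does this on random points; `beta`, `features`, and the sampled ranges are illustrative assumptions.

```r
# Coefficients read off the expanded equation, in the order (1, X1, X1^2, X2, X2^2)
beta <- c(1, 2, 1, -4, 1)

# Hypothetical feature map: one row per observation, one column per feature
features <- function(x1, x2) cbind(1, x1, x1^2, x2, x2^2)

set.seed(1)
x1_r <- runif(20, -5, 5)
x2_r <- runif(20, -5, 5)

# Linear combination of the features
linear_form <- as.vector(features(x1_r, x2_r) %*% beta)

# Original non-linear expression, moved to one side
original <- (1 + x1_r)^2 + (2 - x2_r)^2 - 4

# TRUE when the two agree up to floating-point tolerance
all.equal(linear_form, original)
```

Because the two expressions agree at every point, the sign of the linear form reproduces the blue and red regions exactly.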