1. This problem involves hyperplanes in two dimensions.

a). Sketch the hyperplane \(1 + 3X_1 - X_2 = 0\). Indicate the set of points for which \(1 + 3X_1 - X_2 > 0\), as well as the set of points for which \(1 + 3X_1 - X_2 < 0\).

b). On the same plot, sketch the hyperplane \(-2 + X_1 + 2X_2 = 0\). Indicate the set of points for which \(-2 + X_1 + 2X_2 > 0\), as well as the set of points for which \(-2 + X_1 + 2X_2 < 0\).
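To plot each hyperplane (a line, since \(p = 2\)), solve each equation for \(X_2\):

\[ 1 + 3X_1 - X_2 = 0 \;\Rightarrow\; X_2 = 1 + 3X_1 \]

\[ -2 + X_1 + 2X_2 = 0 \;\Rightarrow\; X_2 = \frac{2 - X_1}{2} \]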

# Set up an empty plotting window large enough to show both lines
plot(NULL, xlim = c(-2, 3), ylim = c(-2, 5), xlab = "X1", ylab = "X2",
     main = "Exercise 1: Hyperplanes and Regions")
grid()

# Draw the line X2 = 1 + 3*X1, i.e. the hyperplane 1 + 3X1 - X2 = 0
curve(3 * x + 1, from = -2, to = 3, col = "red", lwd = 2, add = TRUE)
text(2, 3, "1 + 3X1 - X2 = 0", col = "red", pos = 4)

# Shade the region below the line, where 1 + 3X1 - X2 > 0. The vertices
# follow the boundary clipped to the window: the line crosses y = -2 at
# x = -1 and y = 5 at x = 4/3, which avoids a self-intersecting polygon.
polygon(x = c(-1, 4/3, 3, 3), y = c(-2, 5, 5, -2), border = NA, col = rgb(1, 0, 0, 0.1))

text(0, -1.5, "1 + 3X1 - X2 > 0", col = "red")
text(0, 4.5, "1 + 3X1 - X2 < 0", col = "red")

# Draw the line X2 = (2 - X1)/2, i.e. the hyperplane -2 + X1 + 2X2 = 0
curve((2 - x) / 2, from = -2, to = 3, col = "blue", lwd = 2, add = TRUE)
text(-1, 2.5, "-2 + X1 + 2X2 = 0", col = "blue", pos = 4)

# Shade the region below the second line, where -2 + X1 + 2X2 < 0
polygon(x = c(-2, 3, 3, -2), y = c(-2, -2, (2 - 3)/2, (2 - (-2))/2), border = NA, col = rgb(0, 0, 1, 0.1))

text(-1.5, 0, "-2 + X1 + 2X2 < 0", col = "blue")
text(1.5, 4.5, "-2 + X1 + 2X2 > 0", col = "blue")
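
As a quick sanity check on the region labels (a small verification sketch added here, not part of the original plot code), we can evaluate each expression at the coordinates where the labels are drawn and confirm the signs match:

h1 <- function(X1, X2) 1 + 3 * X1 - X2   # first hyperplane
h2 <- function(X1, X2) -2 + X1 + 2 * X2  # second hyperplane
h1(0, -1.5)   # 2.5  > 0, matching the "> 0" label below the red line
h1(0, 4.5)    # -3.5 < 0, matching the "< 0" label above it
h2(-1.5, 0)   # -3.5 < 0, matching the "< 0" label below the blue line
h2(1.5, 4.5)  # 8.5  > 0, matching the "> 0" label above it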

2. We have seen that in \(p = 2\) dimensions, a linear decision boundary takes the form \(\beta_0 + \beta_1 X_1 + \beta_2 X_2 = 0\). We now investigate a non-linear decision boundary.

a). Sketch the curve \((1 + X_1)^2 + (2 - X_2)^2 = 4\).
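
Before plotting, note that this equation is the standard form of a circle centered at \((-1, 2)\) with radius \(2\), since \((2 - X_2)^2 = (X_2 - 2)^2\):

\[ (X_1 - (-1))^2 + (X_2 - 2)^2 = 2^2 \]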

library(ggplot2)
# Evaluate the decision function on a fine grid of (X1, X2) values
x1 <- seq(-4, 3, length.out = 200)
x2 <- seq(-2, 6, length.out = 200)
grid <- expand.grid(X1 = x1, X2 = x2)

grid$decision <- (1 + grid$X1)^2 + (2 - grid$X2)^2

ggplot(grid, aes(x = X1, y = X2)) +
  geom_contour(aes(z = decision), breaks = 4, color = "black") +
  labs(title = "Circle: (1 + X1)^2 + (2 - X2)^2 = 4",
       x = "X1", y = "X2") +
  theme_minimal()
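
One optional refinement (a presentation choice, not required by the exercise): adding `coord_fixed()` forces equal scaling on both axes, so the circle is drawn round rather than as an ellipse.

last_plot() + coord_fixed()  # re-render the previous plot with equal axis units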

b). On your sketch, indicate the set of points for which \((1 + X_1)^2 + (2 - X_2)^2 > 4\), as well as the set of points for which \((1 + X_1)^2 + (2 - X_2)^2 \le 4\).

grid$class <- ifelse(grid$decision > 4, "Blue Region (> 4)", "Red Region (≤ 4)")

ggplot(grid, aes(x = X1, y = X2)) +
  geom_tile(aes(fill = class), alpha = 0.3) +
  geom_contour(aes(z = decision), breaks = 4, color = "black") +
  scale_fill_manual(values = c("Red Region (≤ 4)" = "red", "Blue Region (> 4)" = "skyblue")) +
  labs(title = "Shaded Regions Based on Circle Inequality",
       x = "X1", y = "X2", fill = "Region") +
  theme_minimal()

c). Suppose that a classifier assigns an observation to the blue class if \((1 + X_1)^2 + (2 - X_2)^2 > 4\), and to the red class otherwise. To what class is the observation \((0, 0)\) classified? \((-1, 1)\)? \((2, 2)\)? \((3, 8)\)?

points <- data.frame(
  X1 = c(0, -1, 2, 3),
  X2 = c(0, 1, 2, 8)
)

# Compute the decision value for each point and apply the classification rule
points$decision_value <- (1 + points$X1)^2 + (2 - points$X2)^2
points$class <- ifelse(points$decision_value > 4, "Blue", "Red")
points
##   X1 X2 decision_value class
## 1  0  0              5  Blue
## 2 -1  1              1   Red
## 3  2  2              9  Blue
## 4  3  8             52  Blue
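The classifications can be verified by hand:

\[ (1+0)^2 + (2-0)^2 = 1 + 4 = 5 > 4 \Rightarrow \text{Blue} \]

\[ (1-1)^2 + (2-1)^2 = 0 + 1 = 1 \le 4 \Rightarrow \text{Red} \]

\[ (1+2)^2 + (2-2)^2 = 9 + 0 = 9 > 4 \Rightarrow \text{Blue} \]

\[ (1+3)^2 + (2-8)^2 = 16 + 36 = 52 > 4 \Rightarrow \text{Blue} \]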
# Rebuild the grid over a slightly wider range so that the point (3, 8)
# falls inside the shaded region rather than above it
x1 <- seq(-4, 4, length.out = 200)
x2 <- seq(-2, 9, length.out = 200)
grid <- expand.grid(X1 = x1, X2 = x2)

grid$decision <- (1 + grid$X1)^2 + (2 - grid$X2)^2

grid$class <- ifelse(grid$decision > 4, "Blue Region (> 4)", "Red Region (≤ 4)")

# Attach display labels to the points classified above
points$label <- c("(0,0)", "(-1,1)", "(2,2)", "(3,8)")

ggplot(grid, aes(x = X1, y = X2)) +
  geom_tile(aes(fill = class), alpha = 0.3) +
  geom_contour(aes(z = decision), breaks = 4, color = "black") +
  geom_point(data = points, aes(x = X1, y = X2), color = "black", size = 3) +
  geom_text(data = points, aes(x = X1, y = X2, label = label), vjust = -1, color = "black") +
  scale_fill_manual(values = c("Red Region (≤ 4)" = "red", "Blue Region (> 4)" = "skyblue")) +
  labs(title = "Classified Points on Nonlinear Decision Boundary",
       x = "X1", y = "X2", fill = "Region") +
  theme_minimal()

d). Argue that while the decision boundary in (c) is not linear in terms of \(X_1\) and \(X_2\), it is linear in terms of \(X_1\), \(X_2\), \(X_1^2\), and \(X_2^2\).

Although the decision boundary in part (c) appears non-linear in \(X_1\) and \(X_2\), we can expand the expression:

\[ (1 + X_1)^2 + (2 - X_2)^2 = 4 \]

Expanding both squared terms:

\[ (1 + X_1)^2 = X_1^2 + 2X_1 + 1 \]

\[ (2 - X_2)^2 = X_2^2 - 4X_2 + 4 \]

Combining terms:

\[ X_1^2 + X_2^2 + 2X_1 - 4X_2 + 1 + 4 = 4 \]

\[ \Rightarrow X_1^2 + X_2^2 + 2X_1 - 4X_2 + 5 = 4 \]

\[ \Rightarrow X_1^2 + X_2^2 + 2X_1 - 4X_2 + 1 = 0 \]

This final equation is linear in terms of the variables:

  • \(X_1\)
  • \(X_2\)
  • \(X_1^2\)
  • \(X_2^2\)

Therefore, while the decision boundary is non-linear in the original feature space, it is linear in an expanded feature space that includes the quadratic terms \(X_1^2\) and \(X_2^2\). This principle underlies many nonlinear learning methods, such as support vector machines with polynomial kernels.
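
As a small numerical check (a sketch added here, not part of the exercise), we can confirm that the expanded form is a linear combination of the features \(X_1\), \(X_2\), \(X_1^2\), and \(X_2^2\) with coefficients \((2, -4, 1, 1)\) and intercept \(1\), and that it agrees with the original circle equation:

# Decision function written as a linear model in the expanded features:
# f(X1, X2) = 1 + 2*X1 - 4*X2 + 1*X1^2 + 1*X2^2
f_linear <- function(X1, X2) 1 + 2 * X1 - 4 * X2 + X1^2 + X2^2

# Original (non-linear) form, shifted so the boundary sits at zero
f_circle <- function(X1, X2) (1 + X1)^2 + (2 - X2)^2 - 4

# The two forms agree everywhere, so the boundary f = 0 is identical
all.equal(f_linear(0.7, -1.3), f_circle(0.7, -1.3))  # TRUE
all.equal(f_linear(3, 8), f_circle(3, 8))            # TRUE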