### Chapter 09 (page 398): 5, 7, 8
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(e1071)
## Warning: package 'e1071' was built under R version 4.4.2
library(ISLR)
## Warning: package 'ISLR' was built under R version 4.4.3
## 5. We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
(a) Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows:
x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y <- 1 * (x1^2 - x2^2 > 0)
n = 500
x1 = runif(n) - 0.5
x2 = runif(n) - 0.5
y = as.factor(1 * (x1^2-x2^2 > 0))
data = data.frame(x1 = x1, x2 = x2, y = y)
head(data)
## x1 x2 y
## 1 0.36840210 0.2062501 1
## 2 -0.13449459 -0.1185102 1
## 3 -0.18676613 0.2847052 0
## 4 -0.23368905 0.4791380 0
## 5 0.48259083 0.0208252 1
## 6 -0.08751612 -0.2908338 0
(b) Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the y-axis.
ggplot(data, aes(x = x1, y = x2, color = y)) +
  geom_point() +
  theme_minimal() +
  labs(title = "Simulated Data with Quadratic Decision Boundary",
       color = "Class")
(c) Fit a logistic regression model to the data, using X1 and X2 as predictors.
model = glm(y~., data = data, family = binomial)
summary(model)
##
## Call:
## glm(formula = y ~ ., family = binomial, data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.0002589 0.0894631 -0.003 0.998
## x1 0.0247395 0.3036886 0.081 0.935
## x2 0.0373975 0.3132706 0.119 0.905
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 693.15 on 499 degrees of freedom
## Residual deviance: 693.13 on 497 degrees of freedom
## AIC: 699.13
##
## Number of Fisher Scoring iterations: 3
(d) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.
probs = predict(model, type = "response")
class = ifelse(probs > 0.5, 1, 0)
data$class = as.factor(class)
ggplot(data, aes(x = x1, y = x2, color = class)) +
geom_point() +
theme_minimal() +
labs(
title = "Logistic Regression: Linear Decision Boundary",
color = "Predicted Class"
)
(e) Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X1^2, X1 × X2, log(X2), and so forth).
nonlinear = glm(y ~ x1 +x2 + I(x1^2) + I(x2^2) + I(x1 * x2), data = data, family = binomial)
## Warning: glm.fit: algorithm did not converge
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(nonlinear)
##
## Call:
## glm(formula = y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2), family = binomial,
## data = data)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -7.847 321.087 -0.024 0.981
## x1 28.180 6458.847 0.004 0.997
## x2 -67.365 6942.674 -0.010 0.992
## I(x1^2) 32994.070 634893.083 0.052 0.959
## I(x2^2) -32265.876 620224.869 -0.052 0.959
## I(x1 * x2) -1350.520 34386.015 -0.039 0.969
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 6.9315e+02 on 499 degrees of freedom
## Residual deviance: 1.0904e-05 on 494 degrees of freedom
## AIC: 12
##
## Number of Fisher Scoring iterations: 25
(f) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.
predicted_probs = predict(nonlinear, type = "response")
data$class = as.factor(ifelse(predicted_probs > 0.5, 1, 0))
# Evaluate the fitted model on a fine grid so the 0.5-probability decision
# boundary can be drawn as a contour line
boundary_grid = expand.grid(x1 = seq(-0.5, 0.5, length.out = 200),
                            x2 = seq(-0.5, 0.5, length.out = 200))
boundary_grid$prob = predict(nonlinear, newdata = boundary_grid, type = "response")
ggplot(data, aes(x = x1, y = x2)) +
  geom_point(aes(color = class), size = 1.5) +
  geom_contour(data = boundary_grid, aes(z = prob),
               breaks = 0.5, color = "black", linewidth = 1) +
  theme_minimal() +
  labs(
    title = "Non-linear Logistic Regression Classification",
    subtitle = "With Non-linear Decision Boundary",
    color = "Predicted Class"
  )
(g) Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
svc = svm(
y ~ x1 + x2,
data = data,
kernel = "linear",
cost = 1,
scale = TRUE
)
data$svc_pred = predict(svc, newdata = data)
ggplot(data, aes(x = x1, y = x2, color = svc_pred)) +
geom_point(size = 1.5) +
theme_minimal() +
labs(
title = "Support Vector Classifier (linear Kernel)",
color = "Predicted Class"
)
(h) Fit a SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
svm_rbf = svm(
y ~ x1 + x2,
data = data,
kernel = "radial",
cost = 1,
gamma = 1,
scale = TRUE
)
data$svm_rbf = predict(svm_rbf, newdata = data)
ggplot(data, aes(x = x1, y = x2, color = svm_rbf)) +
geom_point(size = 1.5) +
theme_minimal() +
labs(
title = "SVM with Radial Basis Function Kernel",
color = "Predicted Class"
)
(i) Comment on your results.
1. Linear logistic regression, glm(y ~ x1 + x2): poor performance, because a linear model in x1 and x2 cannot capture the quadratic boundary that generated the data.
2. Non-linear logistic regression, glm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1 * x2)): excellent performance; it separates the classes nearly perfectly, which is also why glm reported convergence warnings and fitted probabilities numerically 0 or 1.
3. Linear SVM, svm(y ~ x1 + x2, kernel = "linear"): poor performance, because it likewise cannot separate data with a non-linear boundary.
4. SVM with radial kernel, svm(y ~ x1 + x2, kernel = "radial"): very good performance; it adapts to the curved boundary without any manual feature engineering.
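As a quick check on these claims, here is a minimal sketch (assuming the objects model, nonlinear, svc, and svm_rbf fitted above are still in the workspace; the helper train_err is introduced here) that compares the training error rate of the four fits. The exact numbers depend on the data simulated in (a).
# Training error rate for each of the four models fitted above
train_err = function(pred) mean(as.character(pred) != as.character(data$y))
c(
  logistic_linear    = train_err(ifelse(predict(model, type = "response") > 0.5, 1, 0)),
  logistic_nonlinear = train_err(ifelse(predict(nonlinear, type = "response") > 0.5, 1, 0)),
  svc_linear         = train_err(predict(svc, newdata = data)),
  svm_radial         = train_err(predict(svm_rbf, newdata = data))
)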
## 7. In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
(a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.
data("Auto")
Auto$mpg_high = as.factor(ifelse(Auto$mpg > median(Auto$mpg), 1, 0))
table(Auto$mpg_high)
##
## 0 1
## 196 196
(b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results. Note you will need to fit the classifier without the gas mileage variable to produce sensible results.
auto_data = Auto |>
select(-mpg, -name)
set.seed(123)
tune_results = tune(svm, mpg_high ~ ., data = auto_data, kernel = "linear",
ranges = list(cost = c(0.01, 0.1, 1, 10, 100)),
scale = TRUE)
summary(tune_results)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.08397436
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-02 0.09166667 0.03588344
## 2 1e-01 0.09660256 0.04809549
## 3 1e+00 0.08397436 0.05047803
## 4 1e+01 0.08903846 0.05486624
## 5 1e+02 0.08903846 0.05486624
(c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.
set.seed(123)
tune_results = tune(svm, mpg_high ~ ., data = auto_data, kernel = "radial",
ranges = list(cost = c(0.01, 0.1, 1, 10),
gamma = c(0.01, 0.1, 1)),
scale = TRUE)
summary(tune_results)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 1 1
##
## - best performance: 0.07121795
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 0.01 0.01 0.58173077 0.04740051
## 2 0.10 0.01 0.11461538 0.05967456
## 3 1.00 0.01 0.08910256 0.03791275
## 4 10.00 0.01 0.09403846 0.04842472
## 5 0.01 0.10 0.15794872 0.10966981
## 6 0.10 0.10 0.09166667 0.03588344
## 7 1.00 0.10 0.09166667 0.03588344
## 8 10.00 0.10 0.08378205 0.05786777
## 9 0.01 1.00 0.58173077 0.04740051
## 10 0.10 1.00 0.08666667 0.03633961
## 11 1.00 1.00 0.07121795 0.04852869
## 12 10.00 1.00 0.08910256 0.04901413
set.seed(123)
tune_results = tune(svm, mpg_high ~ ., data = auto_data, kernel = "polynomial",
ranges = list(cost = c(0.01, 0.1, 1, 10),
degree = c(2, 3, 4)),
scale = TRUE)
summary(tune_results)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost degree
## 10 3
##
## - best performance: 0.07628205
##
## - Detailed performance results:
## cost degree error dispersion
## 1 0.01 2 0.46467949 0.12730541
## 2 0.10 2 0.27032051 0.08264456
## 3 1.00 2 0.23698718 0.10692050
## 4 10.00 2 0.18839744 0.08402108
## 5 0.01 3 0.26019231 0.08606441
## 6 0.10 3 0.18858974 0.10830522
## 7 1.00 3 0.10692308 0.04847066
## 8 10.00 3 0.07628205 0.05332556
## 9 0.01 4 0.37762821 0.10511711
## 10 0.10 4 0.27294872 0.09102554
## 11 1.00 4 0.20903846 0.07060112
## 12 10.00 4 0.16801282 0.08096483
For the linear kernel the best cost was 1 (CV error ≈ 0.084). For the radial kernel the best combination was cost = 1 and gamma = 1 (CV error ≈ 0.071), and for the polynomial kernel it was cost = 10 and degree = 3 (CV error ≈ 0.076). The radial kernel performed best: it adapts to non-linear boundaries and hidden complexity in the data. The polynomial kernel can handle a moderate amount of non-linearity, but it tends to over-fit at high degrees and under-fit at low degrees. Cross-validation is what lets us pick these tuning parameters instead of guessing them; a side-by-side comparison of the three kernels is sketched below.
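Because each tune() call above overwrites tune_results, one way to keep all three sets of results for that comparison is to rerun the same grids under separate names (a sketch only; auto_tune_linear, auto_tune_radial, and auto_tune_poly are names introduced here):
set.seed(123)
auto_tune_linear = tune(svm, mpg_high ~ ., data = auto_data, kernel = "linear",
                        ranges = list(cost = c(0.01, 0.1, 1, 10, 100)))
set.seed(123)
auto_tune_radial = tune(svm, mpg_high ~ ., data = auto_data, kernel = "radial",
                        ranges = list(cost = c(0.01, 0.1, 1, 10),
                                      gamma = c(0.01, 0.1, 1)))
set.seed(123)
auto_tune_poly = tune(svm, mpg_high ~ ., data = auto_data, kernel = "polynomial",
                      ranges = list(cost = c(0.01, 0.1, 1, 10),
                                    degree = c(2, 3, 4)))
# Best cross-validation error for each kernel, taken from the tune objects
data.frame(kernel = c("linear", "radial", "polynomial"),
           cv_error = c(auto_tune_linear$best.performance,
                        auto_tune_radial$best.performance,
                        auto_tune_poly$best.performance))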
(d) Make some plots to back up your assertions in (b) and (c). Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing plot(svmfit, dat) where svmfit contains your fitted model and dat is a data frame containing your data, you can type plot(svmfit, dat, x1 ~ x4) in order to plot just the first and fourth variables. However, you must replace x1 and x4 with the correct variable names.
library(e1071)
set.seed(123)
# Linear kernel
svm_linear <- svm(mpg_high ~ horsepower + weight, data = Auto, kernel = "linear", cost = 1)
# Radial kernel
svm_rbf <- svm(mpg_high ~ horsepower + weight, data = Auto, kernel = "radial", cost = 10, gamma = 0.1)
# Polynomial kernel
svm_poly <- svm(mpg_high ~ horsepower + weight, data = Auto, kernel = "polynomial", cost = 1, degree = 3)
# Generate grid
x_range <- seq(min(Auto$horsepower), max(Auto$horsepower), length = 100)
y_range <- seq(min(Auto$weight), max(Auto$weight), length = 100)
grid <- expand.grid(horsepower = x_range, weight = y_range)
grid$pred_linear <- predict(svm_linear, newdata = grid)
grid$pred_rbf <- predict(svm_rbf, newdata = grid)
grid$pred_poly <- predict(svm_poly, newdata = grid)
library(ggplot2)
plot_decision <- function(grid, pred_col, model_title) {
  ggplot() +
    geom_tile(data = grid, aes(x = horsepower, y = weight, fill = .data[[pred_col]]), alpha = 0.4) +
    geom_point(data = Auto, aes(x = horsepower, y = weight, color = mpg_high), size = 1.5) +
    theme_minimal() +
    labs(
      title = model_title,
      fill = "Predicted Class",
      color = "True Class"
    )
}
p1 <- plot_decision(grid, "pred_linear", "SVM with Linear Kernel")
p2 <- plot_decision(grid, "pred_rbf", "SVM with Radial Kernel")
p3 <- plot_decision(grid, "pred_poly", "SVM with Polynomial Kernel")
# Use patchwork or gridExtra to display side by side
library(patchwork)
## Warning: package 'patchwork' was built under R version 4.4.2
p1 + p2 + p3
svm_full <- svm(mpg_high ~ ., data = auto_data, kernel = "radial", cost = 10, gamma = 0.1)
plot(svm_full, auto_data, horsepower ~ weight)
plot(svm_full, auto_data, acceleration ~ displacement)
plot(svm_full, auto_data, cylinders ~ weight)
## 8. This problem involves the OJ data set which is part of the ISLR2 package.
(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
data("OJ")
set.seed(123) # for reproducibility
# Sample 800 random row indices for training
train_indices <- sample(1:nrow(OJ), 800)
# Split into training and test sets
oj_train <- OJ[train_indices, ]
oj_test <- OJ[-train_indices, ]
nrow(oj_train)
## [1] 800
nrow(oj_test)
## [1] 270
(b) Fit a support vector classifier to the training data using cost = 0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.
# Fit SVM with linear kernel and cost = 0.01
svm_linear_oj <- svm(
Purchase ~ .,
data = oj_train,
kernel = "linear",
cost = 0.01,
scale = TRUE
)
# View model summary
summary(svm_linear_oj)
##
## Call:
## svm(formula = Purchase ~ ., data = oj_train, kernel = "linear", cost = 0.01,
## scale = TRUE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
##
## Number of Support Vectors: 442
##
## ( 220 222 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
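With cost = 0.01 the support vector classifier uses 442 support vectors (220 from class CH and 222 from MM) out of the 800 training observations. Such a small cost produces a wide, permissive margin, so a large share of the observations lie on or inside the margin and therefore become support vectors.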
(c) What are the training and test error rates?
# Predict on training data
train_pred <- predict(svm_linear_oj, newdata = oj_train)
# Predict on test data
test_pred <- predict(svm_linear_oj, newdata = oj_test)
# Training error rate
train_error <- mean(train_pred != oj_train$Purchase)
# Test error rate
test_error <- mean(test_pred != oj_test$Purchase)
# Show results
train_error
## [1] 0.165
test_error
## [1] 0.1777778
(d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
tune_result <- tune(
svm,
Purchase ~ .,
data = oj_train,
kernel = "linear",
ranges = list(cost = c(0.01, 0.1, 1, 5, 10)),
scale = TRUE
)
# Show tuning summary
summary(tune_result)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 5
##
## - best performance: 0.16625
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.17625 0.03143004
## 2 0.10 0.17250 0.03425801
## 3 1.00 0.16875 0.03596391
## 4 5.00 0.16625 0.03007514
## 5 10.00 0.17250 0.02751262
The best cost is 5, with a cross-validation error of 0.16625 (dispersion 0.03007514).
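For reference, the chosen parameter and its CV error can also be read directly from the tune object (best.parameters and best.performance are standard components of e1071's tune() return value):
# Optimal cost and the corresponding cross-validation error
tune_result$best.parameters
tune_result$best.performance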
(e) Compute the training and test error rates using this new value for cost.
best_svm <- tune_result$best.model
train_pred_best <- predict(best_svm, newdata = oj_train)
train_error_best <- mean(train_pred_best != oj_train$Purchase)
# Predict on test set
test_pred_best <- predict(best_svm, newdata = oj_test)
test_error_best <- mean(test_pred_best != oj_test$Purchase)
# Show results
train_error_best
## [1] 0.16375
test_error_best
## [1] 0.162963
(f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.
svm_radial_01 <- svm(
Purchase ~ .,
data = oj_train,
kernel = "radial",
cost = 0.01,
scale = TRUE # Standardizes features
)
# Summary of the model
summary(svm_radial_01)
##
## Call:
## svm(formula = Purchase ~ ., data = oj_train, kernel = "radial", cost = 0.01,
## scale = TRUE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.01
##
## Number of Support Vectors: 629
##
## ( 313 316 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
# Predict on training and test sets
train_pred_r01 <- predict(svm_radial_01, newdata = oj_train)
test_pred_r01 <- predict(svm_radial_01, newdata = oj_test)
# Error rates
train_error_r01 <- mean(train_pred_r01 != oj_train$Purchase)
test_error_r01 <- mean(test_pred_r01 != oj_test$Purchase)
train_error_r01
## [1] 0.39125
test_error_r01
## [1] 0.3851852
set.seed(123)
tune_radial <- tune(
svm,
Purchase ~ .,
data = oj_train,
kernel = "radial",
ranges = list(cost = c(0.01, 0.1, 1, 5, 10)),
scale = TRUE
)
summary(tune_radial)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.16125
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.39125 0.04411554
## 2 0.10 0.17625 0.06108112
## 3 1.00 0.16125 0.04875178
## 4 5.00 0.16500 0.03717451
## 5 10.00 0.17000 0.04794383
# Extract best model
best_radial <- tune_radial$best.model
# Predictions
train_pred_r_best <- predict(best_radial, newdata = oj_train)
test_pred_r_best <- predict(best_radial, newdata = oj_test)
# Error rates
train_error_r_best <- mean(train_pred_r_best != oj_train$Purchase)
test_error_r_best <- mean(test_pred_r_best != oj_test$Purchase)
train_error_r_best
## [1] 0.13875
test_error_r_best
## [1] 0.1888889
(g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree = 2.
svm_poly_01 <- svm(
Purchase ~ .,
data = oj_train,
kernel = "polynomial",
degree = 2,
cost = 0.01,
scale = TRUE
)
# Summary of the model
summary(svm_poly_01)
##
## Call:
## svm(formula = Purchase ~ ., data = oj_train, kernel = "polynomial",
## degree = 2, cost = 0.01, scale = TRUE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 0.01
## degree: 2
## coef.0: 0
##
## Number of Support Vectors: 631
##
## ( 313 318 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
# Predictions
train_pred_p01 <- predict(svm_poly_01, newdata = oj_train)
test_pred_p01 <- predict(svm_poly_01, newdata = oj_test)
# Error rates
train_error_p01 <- mean(train_pred_p01 != oj_train$Purchase)
test_error_p01 <- mean(test_pred_p01 != oj_test$Purchase)
train_error_p01
## [1] 0.3725
test_error_p01
## [1] 0.3740741
set.seed(123)
tune_poly <- tune(
svm,
Purchase ~ .,
data = oj_train,
kernel = "polynomial",
degree = 2,
ranges = list(cost = c(0.01, 0.1, 1, 5, 10)),
scale = TRUE
)
# View tuning results
summary(tune_poly)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 5
##
## - best performance: 0.16875
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.39000 0.04281744
## 2 0.10 0.31000 0.05361903
## 3 1.00 0.19250 0.06645801
## 4 5.00 0.16875 0.05958479
## 5 10.00 0.17125 0.06010696
# Extract best polynomial SVM
best_poly <- tune_poly$best.model
# Predict
train_pred_p_best <- predict(best_poly, newdata = oj_train)
test_pred_p_best <- predict(best_poly, newdata = oj_test)
# Error rates
train_error_p_best <- mean(train_pred_p_best != oj_train$Purchase)
test_error_p_best <- mean(test_pred_p_best != oj_test$Purchase)
train_error_p_best
## [1] 0.14625
test_error_p_best
## [1] 0.2037037
(h) Overall, which approach seems to give the best results on this data?

| kernel     | best cost | CV error | dispersion | train error | test error |
|------------|-----------|----------|------------|-------------|------------|
| linear     | 5         | 0.16625  | 0.03007514 | 0.16375     | 0.162963   |
| radial     | 1         | 0.16125  | 0.04875178 | 0.13875     | 0.1888889  |
| polynomial | 5         | 0.16875  | 0.05958479 | 0.14625     | 0.2037037  |
The radial kernel has the lowest cross-validation error and the lowest training error, so based on cross-validation it seems to be the best approach for this data set. It is worth noting, though, that on this particular train/test split the linear kernel attains the lowest test error (0.163, versus 0.189 for radial and 0.204 for polynomial), so the radial kernel's advantage over the linear one is modest. The table above can also be assembled programmatically, as sketched below.
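A minimal sketch (assuming the error objects computed in the earlier parts are still in the workspace; the data frame name results is introduced here):
# Collect the training and test error rates of the three tuned models
results = data.frame(
  kernel      = c("linear", "radial", "polynomial"),
  train_error = c(train_error_best, train_error_r_best, train_error_p_best),
  test_error  = c(test_error_best, test_error_r_best, test_error_p_best)
)
results[order(results$test_error), ]  # sort by test error, best first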