1. Generate a simulated two-class data set with 100 observations and two features in which there is a visible but non-linear separation between the two classes. Show that in this setting, a support vector machine with a polynomial kernel (with degree greater than 1) or a radial kernel will outperform a support vector classifier on the training data. Which technique performs best on the test data? Make plots and report training and test error rates in order to back up your assertions.
library(e1071)
library(ggplot2)
library(gridExtra)

set.seed(123)

# Two Gaussian features; the classes are separated by the circle
# x1^2 + x2^2 = 1.5, so the boundary is visible but non-linear
n <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- ifelse(x1^2 + x2^2 > 1.5, 1, -1)
data <- data.frame(x1 = x1, x2 = x2, y = as.factor(y))

# 70/30 train/test split
train_idx <- sample(1:n, n * 0.7)
train <- data[train_idx, ]
test <- data[-train_idx, ]
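
Before fitting any models, a quick scatterplot of the training data (added here as a minimal sketch) confirms that the separation is visible but non-linear:

# Sketch: plot the raw training classes; no straight line can separate
# points inside the circle x1^2 + x2^2 = 1.5 from points outside it
ggplot(train, aes(x = x1, y = x2, color = y)) +
  geom_point(size = 2) +
  ggtitle("Simulated Training Data") +
  theme_minimal()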

svm_linear <- svm(y ~ ., data = train, kernel = "linear", cost = 1)
svm_poly   <- svm(y ~ ., data = train, kernel = "polynomial", cost = 1, degree = 3)
svm_rbf    <- svm(y ~ ., data = train, kernel = "radial", cost = 1, gamma = 1)

pred_train_linear <- predict(svm_linear, train)
pred_test_linear  <- predict(svm_linear, test)

pred_train_poly <- predict(svm_poly, train)
pred_test_poly  <- predict(svm_poly, test)

pred_train_rbf <- predict(svm_rbf, train)
pred_test_rbf  <- predict(svm_rbf, test)

train_error <- c(
  linear = mean(pred_train_linear != train$y),
  poly   = mean(pred_train_poly != train$y),
  rbf    = mean(pred_train_rbf != train$y)
)

test_error <- c(
  linear = mean(pred_test_linear != test$y),
  poly   = mean(pred_test_poly != test$y),
  rbf    = mean(pred_test_rbf != test$y)
)

print("Training Error Rates:")
## [1] "Training Error Rates:"
print(train_error)
##     linear       poly        rbf 
## 0.34285714 0.32857143 0.01428571
print("Test Error Rates:")
## [1] "Test Error Rates:"
print(test_error)
##     linear       poly        rbf 
## 0.33333333 0.16666667 0.06666667
# Plot the training points with the model's decision boundary overlaid.
# The contour layer must supply its own aesthetics (inherit.aes = FALSE),
# because the prediction grid has no y column to map to color.
plot_svm <- function(model, data, title) {
  grid <- expand.grid(
    x1 = seq(min(data$x1), max(data$x1), length = 100),
    x2 = seq(min(data$x2), max(data$x2), length = 100)
  )
  grid$pred <- predict(model, grid)

  ggplot(data, aes(x = x1, y = x2, color = y)) +
    geom_point(size = 2) +
    geom_contour(data = grid,
                 aes(x = x1, y = x2, z = as.numeric(pred)),
                 breaks = 1.5, color = "black", inherit.aes = FALSE) +
    ggtitle(title) +
    theme_minimal()
}

p1 <- plot_svm(svm_linear, train, "Linear SVM")
p2 <- plot_svm(svm_poly, train, "Polynomial SVM (degree=3)")
p3 <- plot_svm(svm_rbf, train, "RBF SVM (gamma=1)")

grid.arrange(p1, p2, p3, nrow = 1)

The radial (RBF) SVM performs best on both the training data (1.4% error) and the test data (6.7% error). This is expected: the true boundary is the circle x1^2 + x2^2 = 1.5, which a linear classifier cannot represent, while the radial kernel can approximate it closely. The polynomial kernel improves markedly on the linear fit on the test set but still trails the radial kernel.
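
These comparisons use fixed hyperparameters (cost = 1, degree = 3, gamma = 1). As a hedged robustness check, one could tune each kernel with e1071's tune(); the grid values below are illustrative choices, not part of the original analysis:

# Sketch: tune the radial kernel over a small, illustrative grid to check
# that its advantage is not an artifact of the fixed gamma and cost
tune_rbf <- tune(svm, y ~ ., data = train, kernel = "radial",
                 ranges = list(cost = c(0.1, 1, 10), gamma = c(0.5, 1, 2)))
summary(tune_rbf)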

2. In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
  (a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.
library(ISLR2)
library(e1071)
library(caret)
## Loading required package: lattice
data(Auto)
Auto1 <- na.omit(Auto)
Auto1$hgm <- as.factor(ifelse(Auto1$mpg > median(Auto1$mpg), 1, 0))  

The binary variable hgm is created from the median mpg split; it will now be used as the response in the classification models below.
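
Since hgm comes from a median split, the two classes should be nearly balanced; a quick sanity-check sketch:

# Sketch: the median split should give roughly equal class counts
table(Auto1$hgm)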

  (b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results. Note you will need to fit the classifier without the gas mileage variable to produce sensible results.
# Drop mpg (it defines hgm directly) and the high-cardinality name variable
Auto1 <- Auto1[, !(names(Auto1) %in% c("mpg", "name"))]

set.seed(1234)

ctrl <- trainControl(method = "cv", number = 10)

svm_linear <- train(
  hgm ~ ., data = Auto1,
  method = "svmLinear",
  preProcess = c("center", "scale"),
  trControl = ctrl,
  tuneGrid = expand.grid(C = 10^seq(-2, 3))
)

svm_linear$results
best_linear <- svm_linear$bestTune
results <- svm_linear$results

ggplot(results, aes(x = log10(C), y = Accuracy)) +
  geom_line(color = "blue") +
  geom_point(size = 3, color = "red") +
  labs(title = "SVM Linear Kernel: Cost vs Cross-Validation Accuracy",
       x = "log10(Cost)",
       y = "Cross-Validation Accuracy") +
  theme_minimal()

The plot shows that the optimal cost lies around log10(Cost) = 2, i.e., C = 100. The support vector classifier was tuned over six cost values using 10-fold cross-validation. The best-performing models are at C = 100 and C = 1000, both achieving an accuracy of 91.3%. Lower values such as C = 0.01 and C = 0.1 gave slightly lower accuracy, indicating underfitting from an overly soft margin; higher values of C penalize misclassification more heavily, producing narrower margins that fit the training data more closely.
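
For completeness, the best cost and its cross-validation error rate can be reported directly (a small added sketch reusing the objects computed above):

# Sketch: convert the best CV accuracy into an explicit error rate
best_acc <- max(svm_linear$results$Accuracy)
cat("Best cost:", best_linear$C,
    "| CV accuracy:", round(best_acc, 4),
    "| CV error:", round(1 - best_acc, 4), "\n")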

  (c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.
grid_radial <- expand.grid(
  sigma = c(0.01, 0.05, 0.1),
  C = c(0.1, 1, 10)
)
svm_radial <- train(
  hgm ~ ., data = Auto1,
  method = "svmRadial",
  preProcess = c("center", "scale"),
  trControl = ctrl,
  tuneGrid = grid_radial
)

grid_poly <- expand.grid(
  degree = c(2, 3),
  scale = 1,
  C = c(0.1, 1, 10)
)
svm_poly <- train(
  hgm ~ ., data = Auto1,
  method = "svmPoly",
  preProcess = c("center", "scale"),
  trControl = ctrl,
  tuneGrid = grid_poly
)

# comparison of the best tuning results
best_radial <- svm_radial$bestTune
best_poly <- svm_poly$bestTune

radial_error <- 1 - max(svm_radial$results$Accuracy)
poly_error <- 1 - max(svm_poly$results$Accuracy)

cat("Best Radial Kernel CV Error:", round(radial_error, 4), "\n")
## Best Radial Kernel CV Error: 0.0741
print(best_radial)
##   sigma  C
## 6  0.05 10
cat("Best Polynomial Kernel CV Error:", round(poly_error, 4), "\n")
## Best Polynomial Kernel CV Error: 0.074
print(best_poly)
##   degree scale   C
## 4      3     1 0.1

For the radial kernel, we tuned over the cost parameter (C = 0.1, 1, 10) and sigma (0.01, 0.05, 0.1). The best-performing model used C = 10 and sigma = 0.05, achieving a cross-validation error of 0.0741, which corresponds to about 92.6% accuracy. For the polynomial kernel, tuning was performed over degree = 2, 3 and C = 0.1, 1, 10, with scale fixed at 1. The best polynomial SVM used degree = 3, C = 0.1, and scale = 1, producing a cross-validation error of 0.074, also about 92.6% accuracy. The polynomial kernel had a negligible edge; the two kernels performed essentially identically.
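
caret provides a plot method for train objects that displays accuracy across the tuning grid; a minimal sketch to visualize both searches:

# Sketch: CV accuracy across each kernel's tuning grid
plot(svm_radial)
plot(svm_poly)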

  (d) Make some plots to back up your assertions in (b) and (c).

Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing plot(svmfit, dat) where svmfit contains your fitted model and dat is a data frame containing your data, you can type plot(svmfit, dat, x1 ∼ x4) in order to plot just the first and fourth variables. However, you must replace x1 and x4 with the correct variable names. To find out more, type ?plot.svm.

# Linear
svmfit_2d <- svm(hgm ~ horsepower + acceleration, data = Auto1,
                 kernel = "linear", cost = 1, scale = TRUE)
plot(svmfit_2d, Auto1, horsepower ~ acceleration,
     main = "Linear SVM Decision Boundary (Horsepower vs Acceleration)")

# Polynomial
svmfit_poly <- svm(hgm ~ horsepower + acceleration, data = Auto1,
                   kernel = "polynomial", degree = 3, cost = 0.1, scale = TRUE)
plot(svmfit_poly, Auto1, horsepower ~ acceleration,
     main = "Polynomial SVM (Degree = 3) Decision Boundary")

# Radial
svmfit_rbf <- svm(hgm ~ horsepower + acceleration, data = Auto1,
                  kernel = "radial", gamma = 0.05, cost = 10, scale = TRUE)
plot(svmfit_rbf, Auto1, horsepower ~ acceleration,
     main = "Radial SVM Decision Boundary")

These plots confirm the results above. The SVM decision boundaries clearly separate high and low gas mileage cars using just two features, horsepower and acceleration. The background shading shows the predicted class regions, and most misclassified points lie near the boundary, where the feature values of the two classes overlap.

3. This problem involves the OJ data set which is part of the ISLR2 package.
  (a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
data(OJ)
set.seed(1234)

train_indices <- sample(1:nrow(OJ), 800)

train_data <- OJ[train_indices, ]
test_data <- OJ[-train_indices, ]
  (b) Fit a support vector classifier to the training data using cost = 0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.
svm_oj <- svm(Purchase ~ ., data = train_data, kernel = "linear", cost = 0.01, scale = TRUE)

summary(svm_oj)
## 
## Call:
## svm(formula = Purchase ~ ., data = train_data, kernel = "linear", 
##     cost = 0.01, scale = TRUE)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  437
## 
##  ( 219 218 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

The support vector classifier with a linear kernel and cost = 0.01 uses 437 of the 800 training observations as support vectors, nearly evenly split between the two classes (CH and MM). The large number of support vectors suggests that the classes are not easily separable. The low cost encourages a wide margin but may lead to underfitting, which can reduce model accuracy.
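
One way to make this concrete (a small added sketch) is the fraction of training observations serving as support vectors; e1071's svm objects store their indices in $index:

# Sketch: proportion of training points used as support vectors
length(svm_oj$index) / nrow(train_data)  # 437 / 800 = 0.546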

  (c) What are the training and test error rates?
train_pred <- predict(svm_oj, train_data)
train_error <- mean(train_pred != train_data$Purchase)

test_pred <- predict(svm_oj, test_data)
test_error <- mean(test_pred != test_data$Purchase)

cat("Training Error Rate:", round(train_error, 4), "\n")
## Training Error Rate: 0.1688
cat("Test Error Rate:", round(test_error, 4), "\n")
## Test Error Rate: 0.1593

The training error rate is 16.88% and the test error rate is 15.93%. The test error is slightly lower than the training error, which can happen with the wide, soft margin imposed by the low cost (cost = 0.01) and ordinary sampling variability. The closeness of the two rates suggests the model is not overfitting and performs reasonably well.
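
A confusion matrix (added here as a quick sketch) shows how the test errors break down by class:

# Sketch: cross-tabulate test predictions against the true labels
table(Predicted = test_pred, Actual = test_data$Purchase)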

  (d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
tune_result <- tune(
  svm,
  Purchase ~ ., data = train_data,
  kernel = "linear",
  ranges = list(cost = c(0.01, 0.1, 1, 5, 10)),
  scale = TRUE
)
summary(tune_result)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##   0.1
## 
## - best performance: 0.17 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.17125 0.03866254
## 2  0.10 0.17000 0.04297932
## 3  1.00 0.17250 0.04401704
## 4  5.00 0.17500 0.04124790
## 5 10.00 0.17375 0.04185375

The best-performing cost was 0.10, with a cross-validation error of 17.00%. The error rates across all tested values were close, ranging from 17.00% to 17.50%, indicating that performance is not very sensitive to the cost parameter within this range; C = 0.10 slightly outperforms both lower and higher values.
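
e1071 also provides a plot method for tune objects; a minimal sketch for visualizing CV error against cost:

# Sketch: plot.tune displays CV error across the tested cost values
plot(tune_result)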

  (e) Compute the training and test error rates using this new value for cost.
best_svm <- tune_result$best.model

train_pred_best <- predict(best_svm, train_data)
train_error_best <- mean(train_pred_best != train_data$Purchase)

test_pred_best <- predict(best_svm, test_data)
test_error_best <- mean(test_pred_best != test_data$Purchase)

cat("Training Error Rate (for cost = 0.1):", round(train_error_best, 4), "\n")
## Training Error Rate (for cost = 0.1): 0.165
cat("Test Error Rate (for cost = 0.1):", round(test_error_best, 4), "\n")
## Test Error Rate (for cost = 0.1): 0.163

Using the model with cost = 0.1, the training error rate is 16.5% and the test error rate is 16.3%. These are very close, indicating that the model is not overfitting and generalizes well to unseen data. Compared with the cost = 0.01 model, the tuned model slightly reduced both training and test errors, illustrating the benefit of tuning.

  (f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.
svm_radial_01 <- svm(Purchase ~ ., data = train_data, kernel = "radial", cost = 0.01, scale = TRUE)
summary(svm_radial_01)
## 
## Call:
## svm(formula = Purchase ~ ., data = train_data, kernel = "radial", 
##     cost = 0.01, scale = TRUE)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  0.01 
## 
## Number of Support Vectors:  636
## 
##  ( 319 317 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
train_pred_r01 <- predict(svm_radial_01, train_data)
test_pred_r01 <- predict(svm_radial_01, test_data)
train_err_r01 <- mean(train_pred_r01 != train_data$Purchase)
test_err_r01 <- mean(test_pred_r01 != test_data$Purchase)

cat("Training Error (Radial, cost=0.01):", round(train_err_r01, 4), "\n")
## Training Error (Radial, cost=0.01): 0.3962
cat("Test Error (Radial, cost=0.01):", round(test_err_r01, 4), "\n")
## Test Error (Radial, cost=0.01): 0.3704
set.seed(1234)
tune_radial <- tune(
  svm,
  Purchase ~ ., data = train_data,
  kernel = "radial",
  ranges = list(cost = c(0.01, 0.1, 1, 5, 10)),
  scale = TRUE
)
summary(tune_radial)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     5
## 
## - best performance: 0.1875 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.39625 0.05466120
## 2  0.10 0.20625 0.05212498
## 3  1.00 0.18875 0.04267529
## 4  5.00 0.18750 0.03118048
## 5 10.00 0.20000 0.03632416
best_radial_svm <- tune_radial$best.model

train_pred_best_r <- predict(best_radial_svm, train_data)
test_pred_best_r <- predict(best_radial_svm, test_data)
train_err_best_r <- mean(train_pred_best_r != train_data$Purchase)
test_err_best_r <- mean(test_pred_best_r != test_data$Purchase)

cat("Training Error (Best Radial):", round(train_err_best_r, 4), "\n")
## Training Error (Best Radial): 0.1475
cat("Test Error (Best Radial):", round(test_err_best_r, 4), "\n")
## Test Error (Best Radial): 0.163

Using an initial cost of 0.01, the model gave a training error of 39.62% and a test error of 37.04%, indicating underfitting due to an overly soft margin. After tuning, the best performance was achieved with cost = 5, yielding a cross-validation error of 18.75%.

The tuned model improved generalization substantially, with a training error of 14.75% and a test error of 16.3%. Its test error matches the tuned linear SVM (16.3%) while its training error is lower, suggesting that the radial kernel's more flexible boundary captures some non-linear structure in the data.

  (g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree = 2.
svm_poly_01 <- svm(Purchase ~ ., data = train_data, 
                   kernel = "polynomial", degree = 2, 
                   cost = 0.01, scale = TRUE)
summary(svm_poly_01)
## 
## Call:
## svm(formula = Purchase ~ ., data = train_data, kernel = "polynomial", 
##     degree = 2, cost = 0.01, scale = TRUE)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  0.01 
##      degree:  2 
##      coef.0:  0 
## 
## Number of Support Vectors:  640
## 
##  ( 323 317 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
train_pred_poly01 <- predict(svm_poly_01, train_data)
test_pred_poly01 <- predict(svm_poly_01, test_data)
train_err_poly01 <- mean(train_pred_poly01 != train_data$Purchase)
test_err_poly01 <- mean(test_pred_poly01 != test_data$Purchase)

cat("Training Error (Poly, cost=0.01):", round(train_err_poly01, 4), "\n")
## Training Error (Poly, cost=0.01): 0.3825
cat("Test Error (Poly, cost=0.01):", round(test_err_poly01, 4), "\n")
## Test Error (Poly, cost=0.01): 0.3407
set.seed(1234)
tune_poly <- tune(
  svm,
  Purchase ~ ., data = train_data,
  kernel = "polynomial",
  degree = 2,
  ranges = list(cost = c(0.01, 0.1, 1, 5, 10)),
  scale = TRUE
)
summary(tune_poly)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##    10
## 
## - best performance: 0.18375 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.39625 0.06096732
## 2  0.10 0.34250 0.04901814
## 3  1.00 0.20250 0.03525699
## 4  5.00 0.18625 0.04185375
## 5 10.00 0.18375 0.03283481
best_poly <- tune_poly$best.model

train_pred_best_poly <- predict(best_poly, train_data)
test_pred_best_poly <- predict(best_poly, test_data)
train_err_best_poly <- mean(train_pred_best_poly != train_data$Purchase)
test_err_best_poly <- mean(test_pred_best_poly != test_data$Purchase)

cat("Training Error (Best Poly):", round(train_err_best_poly, 4), "\n")
## Training Error (Best Poly): 0.1562
cat("Test Error (Best Poly):", round(test_err_best_poly, 4), "\n")
## Test Error (Best Poly): 0.1556

With an initial cost of 0.01, the model produced a training error of 38.25% and a test error of 34.07%, indicating substantial underfitting due to the very soft margin. The optimal cost was found to be 10, yielding a cross-validation error of 18.38%. The tuned model achieved a training error of 15.62% and a test error of 15.56%, a substantial improvement over the initial polynomial model and slightly better than both the tuned linear and radial kernel SVMs.

  (h) Overall, which approach seems to give the best results on this data?

Based on the test error rates after tuning each model, the polynomial kernel (degree = 2) gave the best overall performance. It achieved the lowest test error rate of 15.56%, slightly outperforming both the radial SVM (16.3%) and the linear SVM (16.3%).

While all three models performed similarly after tuning, the polynomial kernel offered a better balance between flexibility and generalization, suggesting that the relationship between predictors and the target (Purchase) is non-linear, but does not require overly complex boundaries. Therefore, the polynomial SVM (degree = 2, cost = 10) seems to be the most effective approach.
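
To make the comparison concrete, the tuned test error rates can be collected into a single table (a small summary sketch reusing the objects computed above):

# Sketch: assemble the tuned test error rates from parts (e), (f), and (g)
summary_errors <- data.frame(
  kernel     = c("linear (cost = 0.1)",
                 "radial (cost = 5)",
                 "polynomial (degree = 2, cost = 10)"),
  test_error = c(test_error_best, test_err_best_r, test_err_best_poly)
)
summary_errors[order(summary_errors$test_error), ]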