library(e1071)
library(ggplot2)
library(caret)
## Loading required package: lattice
set.seed(1)
x <- matrix(rnorm(200 * 2), ncol = 2)
x[1:100, ] <- x[1:100, ] + 2
x[101:150, ] <- x[101:150, ] - 2
y <- c(rep(1, 150), rep(0, 50))
dat <- data.frame(X1 = x[, 1], X2 = x[, 2], y = as.factor(y))
train_index <- sample(1:200, 150)
train <- dat[train_index, ]
test <- dat[-train_index, ]
svm.linear <- svm(y ~ ., data = train, kernel = "linear", cost = 1)
train.pred.linear <- predict(svm.linear, train)
test.pred.linear <- predict(svm.linear, test)
train.err.linear <- mean(train.pred.linear != train$y)
test.err.linear <- mean(test.pred.linear != test$y)
train.err.linear; test.err.linear
## [1] 0.28
## [1] 0.16
svm.poly <- svm(y ~ ., data = train, kernel = "polynomial", degree = 3, cost = 1)
train.pred.poly <- predict(svm.poly, train)
test.pred.poly <- predict(svm.poly, test)
train.err.poly <- mean(train.pred.poly != train$y)
test.err.poly <- mean(test.pred.poly != test$y)
train.err.poly; test.err.poly
## [1] 0.28
## [1] 0.16
svm.rbf <- svm(y ~ ., data = train, kernel = "radial", gamma = 1, cost = 1)
train.pred.rbf <- predict(svm.rbf, train)
test.pred.rbf <- predict(svm.rbf, test)
train.err.rbf <- mean(train.pred.rbf != train$y)
test.err.rbf <- mean(test.pred.rbf != test$y)
train.err.rbf; test.err.rbf
## [1] 0.06666667
## [1] 0.16
x1.range <- seq(min(dat$X1) - 1, max(dat$X1) + 1, length = 200)
x2.range <- seq(min(dat$X2) - 1, max(dat$X2) + 1, length = 200)
grid <- expand.grid(X1 = x1.range, X2 = x2.range)
plot_svm_fixed <- function(model, title) {
grid$pred <- predict(model, grid)
ggplot() +
geom_tile(data = grid, aes(x = X1, y = X2, fill = pred), alpha = 0.3) +
geom_point(data = train, aes(x = X1, y = X2, color = y), size = 2) +
scale_fill_manual(values = c("0" = "skyblue", "1" = "salmon")) +
scale_color_manual(values = c("0" = "blue", "1" = "red")) +
labs(title = title, x = "X1", y = "X2") +
theme_minimal()
}
plot_svm_fixed(svm.linear, "SVC (Linear Kernel)")
plot_svm_fixed(svm.poly, "SVM with Polynomial Kernel (Degree 3)")
plot_svm_fixed(svm.rbf, "SVM with RBF Kernel")
results <- data.frame(
Model = c("Linear SVC", "Polynomial Kernel", "RBF Kernel"),
Train_Error = c(train.err.linear, train.err.poly, train.err.rbf),
Test_Error = c(test.err.linear, test.err.poly, test.err.rbf)
)
print(results)
## Model Train_Error Test_Error
## 1 Linear SVC 0.28000000 0.16
## 2 Polynomial Kernel 0.28000000 0.16
## 3 RBF Kernel 0.06666667 0.16
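We generated a simulated two-class dataset with 200 observations (150 in class 1, 50 in class 0) and two features, split into 150 training and 50 test observations. Class 0 sits between the two shifted clusters of class 1, so the classes are separated by a visibly nonlinear boundary.
The RBF SVM fit the training set far better, with a training error of ~6.7% versus 28% for both the linear and the degree-3 polynomial kernels.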
Conclusion: The RBF kernel clearly outperforms both the linear and polynomial kernels on the training data, indicating its superior ability to model complex nonlinear boundaries.
Although the RBF kernel fits the training data better, this did not translate into a lower test error in this specific sample: all three kernels misclassify 16% of the 50 test observations (8 of 50).
Conclusion: On this test set all three models have the same error rate, but only the RBF kernel avoids underfitting the training data. The 28% training error of the linear and polynomial fits is close to the overall 25% share of class 0, which would be consistent with these fits predicting class 1 almost everywhere.
The RBF kernel's much lower training error shows that it is more flexible, and it may generalize better in similar nonlinear settings; a 50-observation test set is too small to settle this.
If we repeated this simulation multiple times, the RBF kernel would likely outperform the others on average.
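One way to check the last claim is to rerun the whole pipeline under several seeds and average the test errors. The sketch below reuses the data-generating steps and SVM settings from above; the helper name `simulate_once` and the choice of 20 repetitions are assumptions made for illustration, and no results are quoted because this was not run as part of the original analysis.

```r
library(e1071)

# Hypothetical helper: one full simulate / split / fit / evaluate cycle
simulate_once <- function(seed) {
  set.seed(seed)
  x <- matrix(rnorm(200 * 2), ncol = 2)
  x[1:100, ] <- x[1:100, ] + 2
  x[101:150, ] <- x[101:150, ] - 2
  dat <- data.frame(X1 = x[, 1], X2 = x[, 2],
                    y = factor(c(rep(1, 150), rep(0, 50))))
  idx <- sample(nrow(dat), 150)
  train <- dat[idx, ]
  test <- dat[-idx, ]
  fits <- list(
    linear = svm(y ~ ., data = train, kernel = "linear", cost = 1),
    polynomial = svm(y ~ ., data = train, kernel = "polynomial",
                     degree = 3, cost = 1),
    radial = svm(y ~ ., data = train, kernel = "radial",
                 gamma = 1, cost = 1)
  )
  # Test misclassification rate for each kernel
  sapply(fits, function(fit) mean(predict(fit, test) != test$y))
}

# 3 x 20 matrix: one row per kernel, one column per repetition
test_errors <- sapply(1:20, simulate_once)
avg_test_error <- rowMeans(test_errors)
avg_test_error
```

Comparing `avg_test_error` across kernels gives a less noisy picture than a single 50-observation test set.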
library(ISLR2)
library(e1071)
library(ggplot2)
library(caret)
data(Auto)
Auto <- na.omit(Auto)
mpg.median <- median(Auto$mpg)
Auto$mpg01 <- ifelse(Auto$mpg > mpg.median, 1, 0)
Auto_svm <- Auto[, !(names(Auto) %in% c("mpg", "name"))]
Auto_svm$mpg01 <- as.factor(Auto_svm$mpg01)
set.seed(1)
tune.linear <- tune(svm, mpg01 ~ ., data = Auto_svm, kernel = "linear",
ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10)))
summary(tune.linear)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.08435897
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.13525641 0.05661708
## 2 1e-02 0.08923077 0.04698309
## 3 1e-01 0.09185897 0.04393409
## 4 1e+00 0.08435897 0.03662670
## 5 5e+00 0.08948718 0.03898410
## 6 1e+01 0.08948718 0.03898410
best.linear <- tune.linear$best.model
# best.performance is the CV error, so 1 minus it is the CV accuracy
best.linear.cv_accuracy <- 1 - tune.linear$best.performance
best.linear.cv_accuracy
## [1] 0.915641
Results: The lowest cross-validation error occurs at cost = 1, with a CV error of ~8.4% (0.0844); the 0.0366 in the same row is the dispersion of the fold errors, not the error itself.
The CV error drops sharply from ~13.5% at cost = 0.001 to ~8.9% at cost = 0.01, then stays between 8.4% and 9.2% for all larger costs.
Increasing cost beyond 1 (to 5 or 10) does not improve CV error; it rises slightly to ~8.9%, which may hint at mild overfitting, although the difference is well within the dispersion of ~0.04.
A very low cost (e.g., 0.001) leads to a high CV error (~13.5%) because the wide margin tolerates many violations, so the model underfits the data.
Conclusion: The optimal cost value is 1, as it achieves the lowest cross-validation error.
This cost strikes a good balance between model complexity and generalization.
These results align with the ISLR principle that moderate regularization (not too soft, not too hard) often gives the best results when tuning SVMs.
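The summary above reports two numbers per cost, `error` (mean CV misclassification rate) and `dispersion` (spread of the fold errors), and they are easy to mix up. As a small illustration, the sketch below copies the reported table into a data frame and reads off the best cost, its CV error, and the corresponding accuracy. The object names are chosen here for illustration; with the fitted object available, the same table should be accessible as `tune.linear$performances`.

```r
# CV results copied from the summary(tune.linear) output above
perf <- data.frame(
  cost = c(0.001, 0.01, 0.1, 1, 5, 10),
  error = c(0.13525641, 0.08923077, 0.09185897,
            0.08435897, 0.08948718, 0.08948718),
  dispersion = c(0.05661708, 0.04698309, 0.04393409,
                 0.03662670, 0.03898410, 0.03898410)
)

best <- perf[which.min(perf$error), ]
best_cost <- best$cost          # cost with the lowest CV error
best_error <- best$error        # CV error (misclassification rate)
best_accuracy <- 1 - best_error # CV accuracy, not an error

best
```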
#Radial#
set.seed(1)
tune.radial <- tune(svm, mpg01 ~ ., data = Auto_svm, kernel = "radial",
ranges = list(cost = c(0.1, 1, 10), gamma = c(0.5, 1, 2)))
summary(tune.radial)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost gamma
## 1 1
##
## - best performance: 0.06634615
##
## - Detailed performance results:
## cost gamma error dispersion
## 1 0.1 0.5 0.08666667 0.04687413
## 2 1.0 0.5 0.06884615 0.02963114
## 3 10.0 0.5 0.08923077 0.02732003
## 4 0.1 1.0 0.08673077 0.04535158
## 5 1.0 1.0 0.06634615 0.03244101
## 6 10.0 1.0 0.08923077 0.02732003
## 7 0.1 2.0 0.14282051 0.07578262
## 8 1.0 2.0 0.08673077 0.04371113
## 9 10.0 2.0 0.09429487 0.05387705
best.radial <- tune.radial$best.model
# best.performance is the CV error, so 1 minus it is the CV accuracy
best.radial.cv_accuracy <- 1 - tune.radial$best.performance
best.radial.cv_accuracy
## [1] 0.9336538
Comments and interpretation:
Best performance: The lowest cross-validation error is 0.0663 at cost = 1.0 and gamma = 1.0.
Very close is 0.0688 at cost = 1.0, gamma = 0.5.
These combinations offer a good bias-variance trade-off, balancing margin width and decision boundary flexibility.
High gamma is risky: When gamma = 2.0, CV error increases, especially with low cost.
Cost = 0.1, gamma = 2.0 gives a much higher error (0.1428).
High gamma produces overly localized decision boundaries, and here the highest gamma gives the worst CV error at every cost.
High cost is not always better: Cost = 10.0 gives a higher CV error than cost = 1.0 at every gamma (0.0892 to 0.0943 versus 0.0663 to 0.0867).
This aligns with ISLR's guidance that excessive cost leads to overfitting and poor generalization.
Conclusion: The best radial SVM configuration is cost = 1.0, gamma = 1.0 (or 0.5).
These models outperform the best linear SVM from 7b (CV error of ~0.0844) by about 1.8 percentage points, a gap smaller than the dispersion of either estimate.
Radial kernels offer flexibility but must be tuned carefully, since higher gamma and cost can lead to overfitting.
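To see why a large gamma makes the radial fit more local, it can help to evaluate the kernel directly. The sketch below assumes the radial kernel form documented for e1071, exp(-gamma * |u - v|^2); the function name `rbf_kernel` and the two example points are made up for illustration.

```r
# Radial kernel as documented for e1071: exp(-gamma * |u - v|^2)
rbf_kernel <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))

u <- c(0, 0)
v <- c(1, 1)              # squared distance from u is 2
gammas <- c(0.5, 1, 2)    # the gamma grid tuned above

# Similarity between the same two points under each gamma
similarity <- sapply(gammas, function(g) rbf_kernel(u, v, g))
data.frame(gamma = gammas, similarity = similarity)
```

The similarity falls from exp(-1) to exp(-4) as gamma goes from 0.5 to 2, so each training point influences a smaller neighborhood and the boundary can follow individual observations more closely.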
#Polynomial#
set.seed(1)
tune.poly <- tune(svm, mpg01 ~ ., data = Auto_svm, kernel = "polynomial",
ranges = list(cost = c(0.1, 1, 10), degree = c(2, 3)))
summary(tune.poly)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost degree
## 10 3
##
## - best performance: 0.08435897
##
## - Detailed performance results:
## cost degree error dispersion
## 1 0.1 2 0.27846154 0.09486227
## 2 1.0 2 0.25307692 0.13751948
## 3 10.0 2 0.18647436 0.05598001
## 4 0.1 3 0.20192308 0.11347783
## 5 1.0 3 0.09448718 0.04180527
## 6 10.0 3 0.08435897 0.04544023
best.poly <- tune.poly$best.model
# best.performance is the CV error, so 1 minus it is the CV accuracy
best.poly.cv_accuracy <- 1 - tune.poly$best.performance
best.poly.cv_accuracy
## [1] 0.915641
Comments and interpretation:
High cost and higher degree work best: The lowest cross-validation error occurs with cost = 10.0, degree = 3 (CV error = 0.0844).
Close second: cost = 1.0, degree = 3 (CV error = 0.0945).
This suggests that moderate-to-high cost and the higher polynomial degree allow the model to better fit the decision boundary.
Underfitting at low cost: Low cost values (0.1) result in very high CV error:
27.8% at degree = 2
20.2% at degree = 3
This confirms that low cost values regularize too heavily: the model allows too many margin violations, resulting in underfitting.
Degree = 2 underperforms: Even with high cost (10), degree = 2 performs worse than degree = 3:
Degree 2, cost 10: 18.6% CV error
Degree 3, cost 10: 8.4% CV error
This suggests the quadratic kernel cannot capture the structure in the data. A likely contributor is that svm() uses coef0 = 0 by default, so the degree-2 kernel contains only second-order terms and cannot reproduce the roughly linear boundary that already works well here.
Conclusion: The best-performing polynomial SVM used a degree of 3 and a cost of 10, with a CV error of ~8.4%.
This matches the best linear SVM (CV error of 0.0844 in both cases) and is not as good as the best radial model, which had a CV error of ~6.6%.
Polynomial kernels are flexible but sensitive to both cost and degree, so tuning both is essential for good performance.
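The gap between degree 2 and degree 3 is easier to interpret with the kernel written out. The sketch below assumes the polynomial kernel form documented for e1071, (gamma * u'v + coef0)^degree, with the default coef0 = 0; the function name `poly_kernel`, the gamma value, and the example vectors are arbitrary choices for illustration.

```r
# Polynomial kernel as documented for e1071: (gamma * u'v + coef0)^degree
poly_kernel <- function(u, v, degree, gamma = 0.5, coef0 = 0) {
  (gamma * sum(u * v) + coef0)^degree
}

u <- c(1, 2)
v <- c(3, -1)   # u'v = 1

# With coef0 = 0, an even degree cannot tell u from -u ...
k2_pos <- poly_kernel(u, v, degree = 2)
k2_neg <- poly_kernel(-u, v, degree = 2)

# ... while an odd degree flips sign
k3_pos <- poly_kernel(u, v, degree = 3)
k3_neg <- poly_kernel(-u, v, degree = 3)

# A nonzero coef0 adds lower-order terms and breaks the symmetry
k2_shift_pos <- poly_kernel(u, v, degree = 2, coef0 = 1)
k2_shift_neg <- poly_kernel(-u, v, degree = 2, coef0 = 1)

c(k2_pos, k2_neg, k3_pos, k3_neg, k2_shift_pos, k2_shift_neg)
```

If the degree-2 results matter, adding `coef0` to the tuning grid would be a natural follow-up; that was not done in the analysis above.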
plot(best.linear, Auto_svm, horsepower ~ weight)
plot(best.radial, Auto_svm, horsepower ~ weight)
plot(best.poly, Auto_svm, horsepower ~ weight)
library(ISLR2)
library(e1071)
library(caret)
set.seed(1)
data(OJ)
train_indices <- sample(1:nrow(OJ), 800)
OJ_train <- OJ[train_indices, ]
OJ_test <- OJ[-train_indices, ]
svm.linear.001 <- svm(Purchase ~ ., data = OJ_train, kernel = "linear", cost = 0.01, scale = TRUE)
summary(svm.linear.001)
##
## Call:
## svm(formula = Purchase ~ ., data = OJ_train, kernel = "linear", cost = 0.01,
## scale = TRUE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
##
## Number of Support Vectors: 435
##
## ( 219 216 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
Key output details: SVM-Type: C-classification - This is a standard classification SVM.
SVM-Kernel: Linear - The decision boundary is a straight hyperplane in feature space.
Cost parameter (C): 0.01 - A very small cost, which allows for many margin violations. This usually leads to a wider margin but also more support vectors and potentially underfitting.
Number of Support Vectors: 435 out of 800 training observations
Class CH: 219 support vectors
Class MM: 216 support vectors
Interpretation: The large number of support vectors (435 of 800, about 54%) indicates that the model enforces the margin very softly, likely due to the small cost value.
With so many observations on or inside the margin, the fit is heavily regularized and may underfit the data.
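The link between cost and the number of support vectors can be explored on a toy problem. The sketch below uses simulated data rather than OJ, so the data frame `toy`, its columns, and the cost grid are all assumptions for illustration; `tot.nSV` is the total support vector count stored on a fitted `svm` object.

```r
library(e1071)

set.seed(1)
n <- 200
x <- matrix(rnorm(n * 2), ncol = 2)
# Noisy linear rule so that the two classes overlap near the boundary
y <- factor(ifelse(x[, 1] + x[, 2] + rnorm(n, sd = 0.5) > 0, "A", "B"))
toy <- data.frame(x1 = x[, 1], x2 = x[, 2], y = y)

costs <- c(0.01, 0.1, 1, 10)
n_sv <- sapply(costs, function(cst) {
  fit <- svm(y ~ ., data = toy, kernel = "linear", cost = cst)
  fit$tot.nSV   # total number of support vectors
})

data.frame(cost = costs, support_vectors = n_sv)
```

A smaller cost typically yields more support vectors, because a wider margin leaves more observations on or inside it.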
train.pred.linear.001 <- predict(svm.linear.001, OJ_train)
test.pred.linear.001 <- predict(svm.linear.001, OJ_test)
train.error.linear.001 <- mean(train.pred.linear.001 != OJ_train$Purchase)
test.error.linear.001 <- mean(test.pred.linear.001 != OJ_test$Purchase)
train.error.linear.001
## [1] 0.175
test.error.linear.001
## [1] 0.1777778
Training error: 0.175 Testing error: 0.178
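The error rates above come from `mean(pred != truth)`. A confusion matrix gives the same rate while also showing which class is misclassified. The sketch below uses a small made-up pair of label vectors rather than the OJ predictions, so the counts are illustrative only.

```r
# Made-up labels for illustration (not the OJ predictions)
actual <- factor(c("CH", "CH", "CH", "MM", "MM", "CH", "MM", "CH"),
                 levels = c("CH", "MM"))
predicted <- factor(c("CH", "MM", "CH", "MM", "CH", "CH", "MM", "CH"),
                    levels = c("CH", "MM"))

# Rows are predictions, columns are true classes
conf_mat <- table(predicted = predicted, actual = actual)

# Off-diagonal share equals the misclassification rate
error_rate <- 1 - sum(diag(conf_mat)) / sum(conf_mat)

conf_mat
error_rate
```

Applied to the fitted model, the same pattern would be `table(predicted = test.pred.linear.001, actual = OJ_test$Purchase)`.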
set.seed(1)
tune.linear <- tune(svm, Purchase ~ ., data = OJ_train, kernel = "linear",
ranges = list(cost = c(0.01, 0.1, 1, 5, 10)))
summary(tune.linear)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.1
##
## - best performance: 0.1725
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.17625 0.02853482
## 2 0.10 0.17250 0.03162278
## 3 1.00 0.17500 0.02946278
## 4 5.00 0.17250 0.03162278
## 5 10.00 0.17375 0.03197764
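Conclusion: The optimal cost value selected is 0.10. It ties with cost = 5 for the lowest cross-validation error (0.1725), and tune() reports the first of the tied values, which is also the simpler, more regularized model. All five costs fall between 0.1725 and 0.17625, a spread far smaller than the dispersion of ~0.03, so the choice of cost matters little here.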
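In the table above, two cost values share the lowest CV error. The sketch below copies the reported errors into a data frame to make the tie explicit; the object names are chosen for illustration. Selecting with `which.min()` returns the first minimum, which matches the cost reported as best.

```r
# CV errors copied from the summary(tune.linear) output above
cv <- data.frame(
  cost = c(0.01, 0.1, 1, 5, 10),
  error = c(0.17625, 0.17250, 0.17500, 0.17250, 0.17375)
)

# All costs that reach the minimum CV error
tied_costs <- cv$cost[cv$error == min(cv$error)]

# which.min() keeps only the first of the tied rows
first_min_cost <- cv$cost[which.min(cv$error)]

tied_costs
first_min_cost
```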
best.linear <- tune.linear$best.model
train.pred.linear.best <- predict(best.linear, OJ_train)
test.pred.linear.best <- predict(best.linear, OJ_test)
train.error.linear.best <- mean(train.pred.linear.best != OJ_train$Purchase)
test.error.linear.best <- mean(test.pred.linear.best != OJ_test$Purchase)
train.error.linear.best
## [1] 0.165
test.error.linear.best
## [1] 0.162963
set.seed(1)
tune.rbf <- tune(svm, Purchase ~ ., data = OJ_train, kernel = "radial",
ranges = list(cost = c(0.01, 0.1, 1, 5, 10)))
summary(tune.rbf)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.17125
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.39375 0.04007372
## 2 0.10 0.18625 0.02853482
## 3 1.00 0.17125 0.02128673
## 4 5.00 0.18000 0.02220485
## 5 10.00 0.18625 0.02853482
best.rbf <- tune.rbf$best.model
train.error.rbf <- mean(predict(best.rbf, OJ_train) != OJ_train$Purchase)
test.error.rbf <- mean(predict(best.rbf, OJ_test) != OJ_test$Purchase)
train.error.rbf
## [1] 0.15125
test.error.rbf
## [1] 0.1851852
set.seed(1)
tune.poly <- tune(svm, Purchase ~ ., data = OJ_train, kernel = "polynomial",
ranges = list(cost = c(0.01, 0.1, 1, 5, 10)), degree = 2)
summary(tune.poly)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 10
##
## - best performance: 0.18125
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.39125 0.04210189
## 2 0.10 0.32125 0.05001736
## 3 1.00 0.20250 0.04116363
## 4 5.00 0.18250 0.03496029
## 5 10.00 0.18125 0.02779513
best.poly <- tune.poly$best.model
train.error.poly <- mean(predict(best.poly, OJ_train) != OJ_train$Purchase)
test.error.poly <- mean(predict(best.poly, OJ_test) != OJ_test$Purchase)
train.error.poly
## [1] 0.15
test.error.poly
## [1] 0.1888889
Interpretation of all results:
Linear kernel SVM:
Performed consistently well.
Best CV error: 0.1725 (cost = 0.10)
Training error: 0.1650; test error: 0.1630
Interpretation: A linear decision boundary seems to be a good fit for the OJ dataset.
Radial kernel SVM:
Best CV error was 0.1713 (cost = 1.0, default gamma), the lowest of all models.
Training error: 0.1513; test error: 0.1852, higher than the linear model, so the small CV advantage did not carry over to the test set.
Polynomial kernel SVM (degree 2):
Best CV error: 0.1813 (cost = 10.0), higher than both linear and RBF.
Training error: 0.1500; test error: 0.1889, the highest of the three.
Consistently worse performance, especially at low costs.
Based on cross-validation and test error, the linear SVM with cost = 0.10 appears to give the best overall results on this data. The radial kernel achieved a slightly better CV error (0.1713 versus 0.1725), but the linear kernel has the lowest test error (0.1630 versus 0.1852 and 0.1889), making it the simplest and most effective choice for this classification problem.
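For a side-by-side view, the reported numbers can be collected into one table. The sketch below copies the CV, training, and test errors from the outputs above; the data frame name and the `test_minus_train` column are additions made here for illustration.

```r
# Errors copied from the tuning summaries and error outputs above
oj_results <- data.frame(
  kernel = c("linear", "radial", "polynomial (degree 2)"),
  cost = c(0.1, 1, 10),
  cv_error = c(0.17250, 0.17125, 0.18125),
  train_error = c(0.16500, 0.15125, 0.15000),
  test_error = c(0.1629630, 0.1851852, 0.1888889)
)

# Gap between test and training error as a rough overfitting indicator
oj_results$test_minus_train <- oj_results$test_error - oj_results$train_error

best_by_cv <- oj_results$kernel[which.min(oj_results$cv_error)]
best_by_test <- oj_results$kernel[which.min(oj_results$test_error)]

oj_results
```

The radial and polynomial fits have the lowest training errors but the largest test-minus-train gaps, while the linear fit is the only one whose test error does not exceed its training error.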