library(ggplot2)
library(e1071)
library(ISLR2)
data(OJ)
data(Auto)
5. We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
Generate a data set with n = 500 and p = 2, such that the observations belong to two classes, with a quadratic decision boundary between them. For instance, you can do this as follows:
> x1 <- runif(500) - 0.5
> x2 <- runif(500) - 0.5
> y <- 1 * (x1^2 - x2^2 > 0)
A synthetic dataset was generated with 500 observations and two features (x1 and x2), each uniformly distributed between −0.5 and 0.5. The binary class label y was defined by the quadratic inequality x1² − x2² > 0, creating a nonlinear decision boundary.
set.seed(1)
x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y <- ifelse(x1^2 - x2^2 > 0, 1, 0)
dat <- data.frame(x1 = x1, x2 = x2, y = as.factor(y))
The observations were plotted on a 2D plane with x1 on the x-axis and x2 on the y-axis. Each point was colored by its true class label. The resulting plot revealed a distinct nonlinear class boundary consistent with a quadratic curve.
ggplot(dat, aes(x = x1, y = x2, color = y)) +
geom_point() +
theme_minimal() +
labs(title = "True Class Labels", color = "Class")
A logistic regression model was fitted using x1 and x2 as linear predictors. This model assumed a linear boundary between the two classes and thus lacked the capacity to accurately represent the true nonlinear boundary.
glm.linear <- glm(y ~ x1 + x2, data = dat, family = "binomial")
Class predictions were made using the linear logistic regression model and plotted. The predicted decision boundary was clearly linear and misclassified many observations, especially near the curved region, confirming the model’s limited flexibility.
pred.linear <- ifelse(predict(glm.linear, type = "response") > 0.5, 1, 0)
dat$pred_linear <- as.factor(pred.linear)
ggplot(dat, aes(x = x1, y = x2, color = pred_linear)) +
geom_point() +
theme_minimal() +
labs(title = "Logistic Regression with Linear Terms")
A second logistic regression model was fitted using nonlinear transformations of the predictors, including x1², x2², and x1 * x2. These additional terms allowed the model to capture curved patterns in the data.
glm.poly <- glm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1*x2), data = dat, family = "binomial")
The new model’s predictions were plotted, showing a nonlinear decision boundary that better aligned with the true structure of the data. The improved classification accuracy highlighted the importance of using nonlinear terms when the underlying relationship is curved.
pred.poly <- ifelse(predict(glm.poly, type = "response") > 0.5, 1, 0)
dat$pred_poly <- as.factor(pred.poly)
ggplot(dat, aes(x = x1, y = x2, color = pred_poly)) +
geom_point() +
theme_minimal() +
labs(title = "Logistic Regression with Polynomial Terms")
A support vector classifier with a linear kernel was fitted using x1 and x2 as predictors. The predicted class labels were plotted, and the resulting decision boundary was linear and resembled the boundary produced by the first logistic model.
svm.linear <- svm(y ~ x1 + x2, data = dat, kernel = "linear", cost = 1)
dat$svm_linear <- predict(svm.linear)
ggplot(dat, aes(x = x1, y = x2, color = svm_linear)) +
geom_point() +
theme_minimal() +
labs(title = "SVM with Linear Kernel")
A support vector machine with a radial basis kernel was trained and evaluated. The predictions showed a curved boundary that accurately reflected the true separation of the classes, outperforming the linear models. This confirmed that SVMs with nonlinear kernels can capture complex relationships without requiring explicit feature engineering.
svm.radial <- svm(y ~ x1 + x2, data = dat, kernel = "radial", cost = 1)
dat$svm_radial <- predict(svm.radial)
ggplot(dat, aes(x = x1, y = x2, color = svm_radial)) +
geom_point() +
theme_minimal() +
labs(title = "SVM with Radial Kernel")
The linear logistic regression and linear SVM models produced straight-line boundaries and did not fit the curved class separation well, leading to misclassifications. Adding squared and interaction terms to the logistic model improved its ability to capture the nonlinearity. The SVM with a radial kernel performed the best, creating a flexible curved boundary that closely matched the true class pattern without needing manual feature changes.
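To make the comparison concrete, the training misclassification rate of each of the four models can be computed from the objects created above (a quick check; the exact numbers depend on the simulation seed):
# Training misclassification rates for the four models fitted above
errors <- c(
  logistic_linear = mean(dat$pred_linear != dat$y),
  logistic_poly   = mean(dat$pred_poly != dat$y),
  svm_linear      = mean(dat$svm_linear != dat$y),
  svm_radial      = mean(dat$svm_radial != dat$y)
)
round(errors, 3)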
7. In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
A binary classification variable, mpg01, was created from the Auto dataset to indicate whether a car's miles per gallon (MPG) was above the median. Cars with MPG values greater than the median were labeled as 1 (high mileage), and those with values at or below the median were labeled as 0 (low mileage). The original mpg column was removed to prevent data leakage during model training.
Auto$mpg01 <- as.factor(ifelse(Auto$mpg > median(Auto$mpg), 1, 0))  # factor response so svm() performs classification
Auto <- Auto[, !names(Auto) %in% "mpg"]
A support vector classifier (SVC) with a linear kernel was fitted to the dataset using multiple values of the cost parameter. Ten-fold cross-validation was conducted to assess the performance of each model. Among the tested values, cost = 1 produced the lowest cross-validation error rate of approximately 9.6%. This result indicates that a moderate level of regularization provided the best generalization for classifying high and low MPG cars.
tune.linear <- tune(svm, mpg01 ~ ., data = Auto, kernel = "linear",
ranges = list(cost = c(0.01, 0.1, 1, 10, 100)))
summary(tune.linear)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 1
##
## - best performance: 0.09622359
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-02 0.10392176 0.03400146
## 2 1e-01 0.10163690 0.03851596
## 3 1e+00 0.09622359 0.02602388
## 4 1e+01 0.10691648 0.02886707
## 5 1e+02 0.12391867 0.03094478
Support vector machines using radial and polynomial kernels were trained on the same classification task. For the radial kernel, models were evaluated using a grid of cost values (0.1, 1, 10) and gamma values (0.5, 1, 2). For the polynomial kernel, combinations of cost (0.1, 1, 10) and polynomial degrees (2, 3, 4) were used. These kernels introduced nonlinear transformations that allowed the models to learn more complex decision boundaries than those possible with a linear kernel.
tune.radial <- tune(svm, mpg01 ~ ., data = Auto, kernel = "radial",
ranges = list(cost = c(0.1, 1, 10), gamma = c(0.5, 1, 2)))
tune.poly <- tune(svm, mpg01 ~ ., data = Auto, kernel = "polynomial",
ranges = list(cost = c(0.1, 1, 10), degree = c(2, 3, 4)))
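The cross-validated choices for each kernel can be inspected in the same way as for the linear fit; the exact values depend on the random folds, so the output is not reproduced here:
# Best tuning parameters and cross-validation error for each kernel
tune.radial$best.parameters
tune.radial$best.performance
tune.poly$best.parameters
tune.poly$best.performance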
To support the evaluation of model flexibility, a radial SVM was trained using only two predictors, horsepower and weight, from the Auto dataset. Using the kernlab package, a decision boundary was visualized by generating predictions over a grid and plotting the classification results. The resulting contour plot showed a visibly nonlinear boundary that effectively separated high and low MPG vehicles, confirming the radial kernel's ability to model complex patterns in the data.
library(kernlab)
##
## Attaching package: 'kernlab'
## The following object is masked from 'package:ggplot2':
##
## alpha
data(Auto)  # reload Auto so the mpg column removed earlier is available again
Auto$mpg01 <- as.factor(ifelse(Auto$mpg > median(Auto$mpg), 1, 0))
data <- na.omit(Auto[, c("horsepower", "weight", "mpg01")])
svm_fit <- ksvm(mpg01 ~ ., data = data, kernel = "rbfdot", C = 1)
grid <- expand.grid(
horsepower = seq(min(data$horsepower), max(data$horsepower), length = 100),
weight = seq(min(data$weight), max(data$weight), length = 100)
)
grid$mpg01 <- predict(svm_fit, grid)
ggplot(data, aes(horsepower, weight, color = mpg01)) +
geom_point() +
stat_contour(data = grid, aes(z = as.numeric(mpg01)), breaks = 1.5, color = "black") +
theme_minimal() +
labs(title = "SVM Decision Boundary (Radial Kernel)")
Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing > plot(svmfit, dat) where svmfit contains your fitted model and dat is a data frame containing your data, you can type > plot(svmfit, dat, x1 ~ x4) in order to plot just the first and fourth variables. However, you must replace x1 and x4 with the correct variable names. To find out more, type ? plot.svm.
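Following the hint, the same kind of plot can be produced with the plot() method for e1071 svm objects. A minimal sketch using the two-predictor data frame created above (the object name svm_e1071 is illustrative):
# Refit the radial SVM with e1071 so that plot.svm() can be used
svm_e1071 <- svm(mpg01 ~ horsepower + weight, data = data, kernel = "radial", cost = 1)
# With more than two predictors, the formula (here horsepower ~ weight) selects which pair to display
plot(svm_e1071, data, horsepower ~ weight)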
8. This problem involves the OJ data set which is part of the ISLR2 package.
The OJ dataset was randomly divided into a training set of 800 observations and a test set with the remaining data. This split allowed for evaluation of model generalization.
set.seed(1)
train_idx <- sample(1:nrow(OJ), 800)
train_OJ <- OJ[train_idx, ]
test_OJ <- OJ[-train_idx, ]
A support vector classifier with a linear kernel and cost = 0.01 was trained on the training data. The model used 435 support vectors, indicating a relatively soft margin due to the small cost parameter.
svm.linear <- svm(Purchase ~ ., data = train_OJ, kernel = "linear", cost = 0.01)
summary(svm.linear)
##
## Call:
## svm(formula = Purchase ~ ., data = train_OJ, kernel = "linear", cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
##
## Number of Support Vectors: 435
##
## ( 219 216 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
The training and test error rates were calculated as 17.5% and 17.8%, respectively. These values reflected moderate classification performance and suggested that the model could benefit from parameter tuning.
train_pred <- predict(svm.linear, train_OJ)
test_pred <- predict(svm.linear, test_OJ)
mean(train_pred != train_OJ$Purchase)
## [1] 0.175
mean(test_pred != test_OJ$Purchase)
## [1] 0.1777778
Using the tune() function, a range of cost values from 0.01 to 10 was evaluated via cross-validation. A cost of 10 yielded the lowest error rate. The tuned model reduced the training error to 16.4% and the test error to 14.8%, indicating improved accuracy through proper hyperparameter selection.
tune.linear <- tune(svm, Purchase ~ ., data = train_OJ, kernel = "linear",
ranges = list(cost = c(0.01, 0.1, 1, 10)))
summary(tune.linear)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 10
##
## - best performance: 0.17125
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.17375 0.03884174
## 2 0.10 0.17875 0.03064696
## 3 1.00 0.17500 0.03061862
## 4 10.00 0.17125 0.03488573
The improved performance with the tuned cost parameter demonstrated the importance of choosing the amount of regularization carefully. Higher cost values penalize margin violations more heavily, reducing the number of support vectors and yielding a stricter margin that, on this data, produced a more accurate decision boundary.
best.linear <- tune.linear$best.model
mean(predict(best.linear, train_OJ) != train_OJ$Purchase)
## [1] 0.16375
mean(predict(best.linear, test_OJ) != test_OJ$Purchase)
## [1] 0.1481481
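A test-set confusion matrix shows how the remaining errors of the tuned linear model are split between the two brands (output not shown):
# Confusion matrix for the tuned linear SVM on the test set
table(predicted = predict(best.linear, test_OJ), actual = test_OJ$Purchase)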
A support vector machine with a radial kernel and cost = 0.01 was trained on the same data. The model used 634 support vectors, and the error rates were substantially higher: 39.4% on training and 37.8% on test data. Training and test errors this high and this similar indicate that the model underfits at this small cost value, behaving much like a classifier that simply predicts the majority brand.
set.seed(1)
data(OJ)
train_idx <- sample(1:nrow(OJ), 800)
train_OJ <- OJ[train_idx, ]
test_OJ <- OJ[-train_idx, ]
svm_radial <- svm(Purchase ~ ., data = train_OJ, kernel = "radial", cost = 0.01)
summary(svm_radial)
##
## Call:
## svm(formula = Purchase ~ ., data = train_OJ, kernel = "radial", cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 0.01
##
## Number of Support Vectors: 634
##
## ( 319 315 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train_pred <- predict(svm_radial, train_OJ)
test_pred <- predict(svm_radial, test_OJ)
train_error <- mean(train_pred != train_OJ$Purchase)
test_error <- mean(test_pred != test_OJ$Purchase)
cat("Training Error (Radial):", round(train_error, 3), "\n")
## Training Error (Radial): 0.394
cat("Test Error (Radial):", round(test_error, 3), "\n")
## Test Error (Radial): 0.378
An SVM with a polynomial kernel (degree = 2, cost = 0.01) was also trained. It produced a training error of 37.2% and a test error of 36.7%. Like the radial model, the polynomial kernel underfit at this low cost value.
svm_poly <- svm(Purchase ~ ., data = train_OJ, kernel = "polynomial", degree = 2, cost = 0.01)
summary(svm_poly)
##
## Call:
## svm(formula = Purchase ~ ., data = train_OJ, kernel = "polynomial",
## degree = 2, cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 0.01
## degree: 2
## coef.0: 0
##
## Number of Support Vectors: 636
##
## ( 321 315 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
train_pred_poly <- predict(svm_poly, train_OJ)
test_pred_poly <- predict(svm_poly, test_OJ)
train_error_poly <- mean(train_pred_poly != train_OJ$Purchase)
test_error_poly <- mean(test_pred_poly != test_OJ$Purchase)
cat("Training Error (Poly):", round(train_error_poly, 3), "\n")
## Training Error (Poly): 0.372
cat("Test Error (Poly):", round(test_error_poly, 3), "\n")
## Test Error (Poly): 0.367
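As with the linear kernel, tune() can be used to select the cost for the radial and polynomial kernels before drawing final conclusions; a sketch of that step is shown below (the results vary with the cross-validation folds and are not reproduced here):
# Cross-validate cost for the nonlinear kernels, analogous to the linear case
tune.radial.oj <- tune(svm, Purchase ~ ., data = train_OJ, kernel = "radial",
  ranges = list(cost = c(0.01, 0.1, 1, 10)))
tune.poly.oj <- tune(svm, Purchase ~ ., data = train_OJ, kernel = "polynomial", degree = 2,
  ranges = list(cost = c(0.01, 0.1, 1, 10)))
tune.radial.oj$best.performance
tune.poly.oj$best.performance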
Among all models tested, the linear SVM with tuned cost = 10 achieved the best results, providing the lowest test error. Both the radial and polynomial kernels were badly underfit at the small, untuned cost value used, reinforcing the need for tuning and careful model selection.