library(ggplot2)
library(e1071)
library(ISLR2)
data(OJ)
data(Auto)

5. We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.

  (a) Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows:

> x1 <- runif(500) - 0.5

> x2 <- runif(500) - 0.5

> y <- 1 * (x1^2 - x2^2 > 0)

A synthetic dataset was generated with 500 observations and two features (x1 and x2), each uniformly distributed between −0.5 and 0.5. The binary class label y was defined by the quadratic inequality x1² − x2² > 0, creating a nonlinear decision boundary.

set.seed(1)
x1 <- runif(500) - 0.5
x2 <- runif(500) - 0.5
y <- ifelse(x1^2 - x2^2 > 0, 1, 0)
dat <- data.frame(x1 = x1, x2 = x2, y = as.factor(y))
  (b) Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the y-axis.

The observations were plotted on a 2D plane with x1 on the x-axis and x2 on the y-axis. Each point was colored by its true class label. The resulting plot revealed a distinct nonlinear class boundary consistent with a quadratic curve.

ggplot(dat, aes(x = x1, y = x2, color = y)) + 
  geom_point() + 
  theme_minimal() +
  labs(title = "True Class Labels", color = "Class")

  (c) Fit a logistic regression model to the data, using X1 and X2 as predictors.

A logistic regression model was fitted using x1 and x2 as linear predictors. This model assumed a linear boundary between the two classes and thus lacked the capacity to accurately represent the true nonlinear boundary.

glm.linear <- glm(y ~ x1 + x2, data = dat, family = "binomial")
  (d) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.

Class predictions were made using the linear logistic regression model and plotted. The predicted decision boundary was clearly linear and misclassified many observations, especially near the curved region, confirming the model’s limited flexibility.

pred.linear <- ifelse(predict(glm.linear, type = "response") > 0.5, 1, 0)
dat$pred_linear <- as.factor(pred.linear)

ggplot(dat, aes(x = x1, y = x2, color = pred_linear)) + 
  geom_point() + 
  theme_minimal() +
  labs(title = "Logistic Regression with Linear Terms")

  (e) Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X1², X1×X2, log(X2), and so forth).

A second logistic regression model was fitted using nonlinear transformations of the predictors, including x1², x2², and x1 * x2. These additional terms allowed the model to capture curved patterns in the data.

glm.poly <- glm(y ~ x1 + x2 + I(x1^2) + I(x2^2) + I(x1*x2), data = dat, family = "binomial")
  (f) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.

The new model’s predictions were plotted, showing a nonlinear decision boundary that better aligned with the true structure of the data. The improved classification accuracy highlighted the importance of using nonlinear terms when the underlying relationship is curved.

pred.poly <- ifelse(predict(glm.poly, type = "response") > 0.5, 1, 0)
dat$pred_poly <- as.factor(pred.poly)

ggplot(dat, aes(x = x1, y = x2, color = pred_poly)) + 
  geom_point() + 
  theme_minimal() +
  labs(title = "Logistic Regression with Polynomial Terms")

  (g) Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

A support vector classifier with a linear kernel was fitted using x1 and x2 as predictors. The predicted class labels were plotted, and the resulting decision boundary was linear and resembled the boundary produced by the first logistic model.

svm.linear <- svm(y ~ x1 + x2, data = dat, kernel = "linear", cost = 1)
dat$svm_linear <- predict(svm.linear)

ggplot(dat, aes(x = x1, y = x2, color = svm_linear)) + 
  geom_point() + 
  theme_minimal() +
  labs(title = "SVM with Linear Kernel")

  (h) Fit an SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

A support vector machine with a radial basis kernel was trained and evaluated. The predictions showed a curved boundary that accurately reflected the true separation of the classes, outperforming the linear models. This confirmed that SVMs with nonlinear kernels can capture complex relationships without requiring explicit feature engineering.

svm.radial <- svm(y ~ x1 + x2, data = dat, kernel = "radial", cost = 1)
dat$svm_radial <- predict(svm.radial)

ggplot(dat, aes(x = x1, y = x2, color = svm_radial)) + 
  geom_point() + 
  theme_minimal() +
  labs(title = "SVM with Radial Kernel")

  (i) Comment on your results.

The linear logistic regression and linear SVM models produced straight-line boundaries and did not fit the curved class separation well, leading to misclassifications. Adding squared and interaction terms to the logistic model improved its ability to capture the nonlinearity. The SVM with a radial kernel performed the best, creating a flexible curved boundary that closely matched the true class pattern without needing manual feature changes.
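
To quantify this comparison, the training error rate of each model can be computed from the predicted labels already stored in dat; a minimal sketch:

# Training error rate (fraction of misclassified observations) for each model.
sapply(
  list(logistic_linear = dat$pred_linear,
       logistic_poly   = dat$pred_poly,
       svm_linear      = dat$svm_linear,
       svm_radial      = dat$svm_radial),
  function(pred) mean(pred != dat$y)
)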

7. In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.

  (a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.

A binary classification variable, mpg01, was created from the Auto dataset to indicate whether a car’s miles per gallon (MPG) was above the median. Cars with MPG values greater than the median were labeled as 1 (high mileage), and those with values below the median were labeled as 0 (low mileage). The original mpg column was removed to prevent data leakage during model training.

# Encode mpg01 as a factor so that svm()/tune() perform classification rather than regression.
Auto$mpg01 <- as.factor(ifelse(Auto$mpg > median(Auto$mpg), 1, 0))
Auto <- Auto[, !names(Auto) %in% "mpg"]
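
A quick check that the median split produces two classes of roughly equal size:

# Class counts for the binary mileage label.
table(Auto$mpg01)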
  (b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results. Note you will need to fit the classifier without the gas mileage variable to produce sensible results.

A support vector classifier (SVC) with a linear kernel was fitted to the dataset using multiple values of the cost parameter. Ten-fold cross-validation was conducted to assess the performance of each model. Among the tested values, cost = 1 produced the lowest cross-validation error rate of approximately 9.6%. This result indicates that a moderate level of regularization provided the best generalization for classifying high and low MPG cars.

tune.linear <- tune(svm, mpg01 ~ ., data = Auto, kernel = "linear", 
                    ranges = list(cost = c(0.01, 0.1, 1, 10, 100)))
summary(tune.linear)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.09622359 
## 
## - Detailed performance results:
##    cost      error dispersion
## 1 1e-02 0.10392176 0.03400146
## 2 1e-01 0.10163690 0.03851596
## 3 1e+00 0.09622359 0.02602388
## 4 1e+01 0.10691648 0.02886707
## 5 1e+02 0.12391867 0.03094478
  (c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.

Support vector machines using radial and polynomial kernels were trained on the same classification task. For the radial kernel, models were evaluated using a grid of cost values (0.1, 1, 10) and gamma values (0.5, 1, 2). For the polynomial kernel, combinations of cost (0.1, 1, 10) and polynomial degrees (2, 3, 4) were used. These kernels introduced nonlinear transformations that allowed the models to learn more complex decision boundaries than those possible with a linear kernel.

tune.radial <- tune(svm, mpg01 ~ ., data = Auto, kernel = "radial", 
                    ranges = list(cost = c(0.1, 1, 10), gamma = c(0.5, 1, 2)))

tune.poly <- tune(svm, mpg01 ~ ., data = Auto, kernel = "polynomial", 
                  ranges = list(cost = c(0.1, 1, 10), degree = c(2, 3, 4)))
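
The cross-validation results for these fits can be inspected in the same way as for the linear kernel; a short sketch of how the best parameter combinations and their errors can be extracted:

# Cross-validation summaries for the radial and polynomial kernels.
summary(tune.radial)
summary(tune.poly)

# Best parameter combination and corresponding CV error for each kernel.
tune.radial$best.parameters
tune.radial$best.performance
tune.poly$best.parameters
tune.poly$best.performance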
  (d) Make some plots to back up your assertions in (b) and (c).

To support the evaluation of model flexibility, a radial SVM was trained using only two predictors—horsepower and weight—from the Auto dataset. Using the kernlab package, a decision boundary was visualized by generating predictions over a grid and plotting the classification results. The resulting contour plot showed a visibly nonlinear boundary that effectively separated high and low MPG vehicles, confirming the radial kernel’s ability to model complex patterns in the data.

library(kernlab)
## 
## Attaching package: 'kernlab'
## The following object is masked from 'package:ggplot2':
## 
##     alpha
# Reload Auto to restore the mpg column (removed above), then recreate the class label as a factor.
data(Auto)
Auto$mpg01 <- as.factor(ifelse(Auto$mpg > median(Auto$mpg), 1, 0))

data <- na.omit(Auto[, c("horsepower", "weight", "mpg01")])


svm_fit <- ksvm(mpg01 ~ ., data = data, kernel = "rbfdot", C = 1)

grid <- expand.grid(
  horsepower = seq(min(data$horsepower), max(data$horsepower), length = 100),
  weight = seq(min(data$weight), max(data$weight), length = 100)
)
grid$mpg01 <- predict(svm_fit, grid)

ggplot(data, aes(horsepower, weight, color = mpg01)) +
  geom_point() +
  stat_contour(data = grid, aes(z = as.numeric(mpg01)), breaks = 1.5, color = "black") +
  theme_minimal() +
  labs(title = "SVM Decision Boundary (Radial Kernel)")

Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing > plot(svmfit, dat) where svmfit contains your fitted model and dat is a data frame containing your data, you can type > plot(svmfit, dat, x1 ∼ x4) in order to plot just the first and fourth variables. However, you must replace x1 and x4 with the correct variable names. To find out more, type ? plot.svm.
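
For comparison, e1071's plot() method for svm objects can also be used directly in the p = 2 case; a minimal sketch, refitting a radial SVM on the horsepower/weight subset created above (svm_two is an illustrative name, not part of the original code):

# Refit a radial SVM on the two-predictor subset and plot it with plot.svm,
# which shades the predicted regions and marks the support vectors.
# With p > 2, a formula such as weight ~ horsepower selects the pair to display (see ?plot.svm).
svm_two <- svm(mpg01 ~ horsepower + weight, data = data, kernel = "radial", cost = 1)
plot(svm_two, data)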

8. This problem involves the OJ data set which is part of the ISLR2 package.

  (a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

The OJ dataset was randomly divided into a training set of 800 observations and a test set with the remaining data. This split allowed for evaluation of model generalization.

set.seed(1)
train_idx <- sample(1:nrow(OJ), 800)
train_OJ <- OJ[train_idx, ]
test_OJ <- OJ[-train_idx, ]
  (b) Fit a support vector classifier to the training data using cost = 0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.

A support vector classifier with a linear kernel and cost = 0.01 was trained on the training data. The model used 435 support vectors, indicating a relatively soft margin due to the small cost parameter.

svm.linear <- svm(Purchase ~ ., data = train_OJ, kernel = "linear", cost = 0.01)
summary(svm.linear)
## 
## Call:
## svm(formula = Purchase ~ ., data = train_OJ, kernel = "linear", cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  435
## 
##  ( 219 216 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
  (c) What are the training and test error rates?

The training and test error rates were calculated as 17.5% and 17.8%, respectively. These values reflected moderate classification performance and suggested that the model could benefit from parameter tuning.

train_pred <- predict(svm.linear, train_OJ)
test_pred <- predict(svm.linear, test_OJ)

mean(train_pred != train_OJ$Purchase)
## [1] 0.175
mean(test_pred != test_OJ$Purchase)
## [1] 0.1777778
  (d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.

Using the tune() function, a range of cost values from 0.01 to 10 was evaluated via cross-validation. A cost of 10 yielded the lowest error rate. The tuned model reduced the training error to 16.4% and the test error to 14.8%, indicating improved accuracy through proper hyperparameter selection.

tune.linear <- tune(svm, Purchase ~ ., data = train_OJ, kernel = "linear",
                    ranges = list(cost = c(0.01, 0.1, 1, 10)))
summary(tune.linear)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##    10
## 
## - best performance: 0.17125 
## 
## - Detailed performance results:
##    cost   error dispersion
## 1  0.01 0.17375 0.03884174
## 2  0.10 0.17875 0.03064696
## 3  1.00 0.17500 0.03061862
## 4 10.00 0.17125 0.03488573
  (e) Compute the training and test error rates using this new value for cost.

The improved performance with the tuned cost parameter demonstrated the importance of this regularization setting. A larger cost penalizes margin violations more heavily, yielding a narrower margin with fewer support vectors and, on this data, a more accurate decision boundary.

best.linear <- tune.linear$best.model
mean(predict(best.linear, train_OJ) != train_OJ$Purchase)
## [1] 0.16375
mean(predict(best.linear, test_OJ) != test_OJ$Purchase)
## [1] 0.1481481
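
The summary of the tuned model shows how many support vectors remain at cost = 10, which can be compared with the 435 used at cost = 0.01:

# Inspect the tuned model; a larger cost typically retains fewer support vectors.
summary(best.linear)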
  (f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value of gamma.

A support vector machine with a radial kernel and cost = 0.01 was trained on the same data. The model used 634 support vectors, and the error rates were substantially higher (39.4% on training and 37.8% on test data), indicating severe underfitting at this very small cost value.

set.seed(1)
data(OJ)
train_idx <- sample(1:nrow(OJ), 800)
train_OJ <- OJ[train_idx, ]
test_OJ <- OJ[-train_idx, ]

svm_radial <- svm(Purchase ~ ., data = train_OJ, kernel = "radial", cost = 0.01)

summary(svm_radial)
## 
## Call:
## svm(formula = Purchase ~ ., data = train_OJ, kernel = "radial", cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  0.01 
## 
## Number of Support Vectors:  634
## 
##  ( 319 315 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
train_pred <- predict(svm_radial, train_OJ)
test_pred <- predict(svm_radial, test_OJ)

train_error <- mean(train_pred != train_OJ$Purchase)
test_error <- mean(test_pred != test_OJ$Purchase)

cat("Training Error (Radial):", round(train_error, 3), "\n")
## Training Error (Radial): 0.394
cat("Test Error (Radial):", round(test_error, 3), "\n")
## Test Error (Radial): 0.378
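
Part (f) also calls for repeating the tuning step; a sketch of the corresponding tune() call and error computations (tune.radial.oj and best.radial are illustrative object names):

# Tune cost for the radial kernel (default gamma), mirroring part (d).
set.seed(1)
tune.radial.oj <- tune(svm, Purchase ~ ., data = train_OJ, kernel = "radial",
                       ranges = list(cost = c(0.01, 0.1, 1, 10)))
summary(tune.radial.oj)

best.radial <- tune.radial.oj$best.model
mean(predict(best.radial, train_OJ) != train_OJ$Purchase)  # training error
mean(predict(best.radial, test_OJ) != test_OJ$Purchase)    # test error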
  (g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree = 2.

An SVM with a polynomial kernel (degree = 2, cost = 0.01) was also trained. It produced a training error of 37.2% and a test error of 36.7%. Like the radial model, the polynomial kernel underfit at this low cost value.

svm_poly <- svm(Purchase ~ ., data = train_OJ, kernel = "polynomial", degree = 2, cost = 0.01)

summary(svm_poly)
## 
## Call:
## svm(formula = Purchase ~ ., data = train_OJ, kernel = "polynomial", 
##     degree = 2, cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  0.01 
##      degree:  2 
##      coef.0:  0 
## 
## Number of Support Vectors:  636
## 
##  ( 321 315 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM
train_pred_poly <- predict(svm_poly, train_OJ)
test_pred_poly <- predict(svm_poly, test_OJ)

train_error_poly <- mean(train_pred_poly != train_OJ$Purchase)
test_error_poly <- mean(test_pred_poly != test_OJ$Purchase)

cat("Training Error (Poly):", round(train_error_poly, 3), "\n")
## Training Error (Poly): 0.372
cat("Test Error (Poly):", round(test_error_poly, 3), "\n")
## Test Error (Poly): 0.367
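
The same tuning step applies to the degree-2 polynomial kernel; a sketch (tune.poly.oj and best.poly are illustrative object names):

# Tune cost for the degree-2 polynomial kernel, mirroring part (d).
set.seed(1)
tune.poly.oj <- tune(svm, Purchase ~ ., data = train_OJ, kernel = "polynomial", degree = 2,
                     ranges = list(cost = c(0.01, 0.1, 1, 10)))
summary(tune.poly.oj)

best.poly <- tune.poly.oj$best.model
mean(predict(best.poly, train_OJ) != train_OJ$Purchase)  # training error
mean(predict(best.poly, test_OJ) != test_OJ$Purchase)    # test error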
  (h) Overall, which approach seems to give the best results on this data?

Among all models tested, the linear SVM with tuned cost = 10 achieved the best results, providing the lowest test error (about 14.8%). At the untuned cost of 0.01, the radial and polynomial kernels were heavily over-regularized and underfit the data, reinforcing the need for cross-validated tuning and careful model selection.
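
A compact comparison of the test error rates computed above (using objects already created in this section) makes the ranking explicit:

# Side-by-side test error rates for the three fitted models.
data.frame(
  model = c("linear SVC (tuned, cost = 10)",
            "radial SVM (cost = 0.01)",
            "polynomial SVM (cost = 0.01, degree = 2)"),
  test_error = c(mean(predict(best.linear, test_OJ) != test_OJ$Purchase),
                 test_error,
                 test_error_poly)
)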