pnt130_Assignment8

Exercise 5

We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.

Exercise 5.(a)

Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them.

set.seed(1)

# Generate n=500 and p =2
X1 <- runif(500, -0.5, 0.5)
X2 <- runif(500, -0.5, 0.5)

# Quadratic
ClassLabel <- as.factor(ifelse(X1^2 - X2^2 > 0, 1, 0))

# Assemble into a data frame
DataSet <- data.frame(X1 = X1, X2 = X2, ClassLabel = ClassLabel)

Exercise 5.(b)

Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the y-axis.

library(ggplot2)
# Plot the observation
ggplot(DataSet, aes(x = X1, y = X2, color = ClassLabel)) +
  geom_point() +
  labs(
    x = "X1",
    y = "X2"
  ) +
  theme_minimal()

Exercise 5.(c)

Fit a logistic regression model to the data, using X1 and X2 as predictors.

# Fit a linear logistic regression
LogisticModelLinear <- glm(ClassLabel ~ X1 + X2, data = DataSet, family = binomial)

summary(LogisticModelLinear)

## 
## Call:
## glm(formula = ClassLabel ~ X1 + X2, family = binomial, data = DataSet)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.087260   0.089579  -0.974    0.330
## X1           0.196199   0.316864   0.619    0.536
## X2          -0.002854   0.305712  -0.009    0.993
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 692.18  on 499  degrees of freedom
## Residual deviance: 691.79  on 497  degrees of freedom
## AIC: 697.79
## 
## Number of Fisher Scoring iterations: 3

Exercise 5.(d)

# Compute predicted probabilities and convert to class labels
DataSet$PredProbLinear <- predict(LogisticModelLinear, type = "response")
DataSet$PredClassLinear <- as.factor(ifelse(DataSet$PredProbLinear > 0.5, 1, 0))

# Plot predicted classes
ggplot(DataSet, aes(x = X1, y = X2, color = PredClassLinear)) +
  geom_point() +
  labs(
    title = "Predicted Classes (Linear Logistic)",
    x = "X1",
    y = "X2"
  ) +
  theme_minimal()

Exercise 5.(e)

Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X21 , X1 ×X2, log(X2), and so forth).

# alternative with log, shift X2 up by 0.51 so it's >0
LogisticModelNonLinear2 <- glm(
  ClassLabel ~ I(X1^2)  + I(X1 * X2) + I(log(X2+0.5)),
  data   = DataSet,
  family = binomial
)

summary(LogisticModelNonLinear2)

## 
## Call:
## glm(formula = ClassLabel ~ I(X1^2) + I(X1 * X2) + I(log(X2 + 
##     0.5)), family = binomial, data = DataSet)
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)    
## (Intercept)       -1.2832     0.1991  -6.445 1.16e-10 ***
## I(X1^2)           24.7078     2.2214  11.122  < 2e-16 ***
## I(X1 * X2)        -0.2343     1.5664  -0.150    0.881    
## I(log(X2 + 0.5))   0.6839     0.1380   4.956 7.19e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 692.18  on 499  degrees of freedom
## Residual deviance: 452.45  on 496  degrees of freedom
## AIC: 460.45
## 
## Number of Fisher Scoring iterations: 5

Exercise 5.(f)

Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not,then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.

# Compute predicted probabilities and class labels
DataSet$PredProbNonLin <- predict(LogisticModelNonLinear2, type = "response")
DataSet$PredClassNonLin <- as.factor(ifelse(DataSet$PredProbNonLin > 0.5, 1, 0))

# Plot the new predicted classes
ggplot(DataSet, aes(x = X1, y = X2, color = PredClassNonLin)) +
  geom_point() +
  labs(
    x = "X1",
    y = "X2"
  ) +
  theme_minimal()

Exercise 5.(g)

Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

library(e1071)
library(ggplot2)

# Fit SVM with a linear kernel
SVMModelLinear <- svm(
  ClassLabel ~ X1 + X2,
  data   = DataSet,
  #kernel = "linear",
  kernel = "radial",
  cost   = 10
)

# Predict on the training data
DataSet$PredClassSVMLinear <- predict(SVMModelLinear, newdata = DataSet)

summary(DataSet$PredClassSVMLinear)

##   0   1 
## 259 241

# Plot the linear SVM decision regions
ggplot(DataSet, aes(x = X1, y = X2, color = PredClassSVMLinear)) +
  geom_point() +
  labs(
    x = "X1",
    y = "X2"
  ) +
  theme_minimal()

Exercise 5.(h)

Fit a SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

# Fit SVM with a radial (Gaussian) kernel
SVMModelRadial <- svm(
  ClassLabel ~ X1 + X2,
  data   = DataSet,
  kernel = "radial",
  gamma  = 1,     # width parameter for Gaussian kernel
  cost   = 1
)

# Predict on the training data
DataSet$PredClassSVMRadial <- predict(SVMModelRadial, newdata = DataSet)

summary(DataSet$PredClassSVMRadial)

##   0   1 
## 269 231

# Plot the radial SVM decision regions
ggplot(DataSet, aes(x = X1, y = X2, color = PredClassSVMRadial)) +
  geom_point() +
  labs(
    x = "X1",
    y = "X2"
  ) +
  theme_minimal()

Exercise 5.(i)

Answer:Linear models misclassify points near the curved boundary. non-linear logistic regression and radial SVM are more accurate.

Exercise 7

In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.

Exercise 7.(a)

Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.

library(ISLR2)
library(e1071)

data("Auto", package = "ISLR2")

AutoData <- na.omit(Auto)  

# Create a binary variable
MedianMPG <- median(AutoData$mpg)

AutoData$HighMPG <- as.factor(ifelse(AutoData$mpg > MedianMPG, 1, 0))

# Remove original mpg variable
AutoData$mpg <- NULL

head(AutoData)

##   cylinders displacement horsepower weight acceleration year origin
## 1         8          307        130   3504         12.0   70      1
## 2         8          350        165   3693         11.5   70      1
## 3         8          318        150   3436         11.0   70      1
## 4         8          304        150   3433         12.0   70      1
## 5         8          302        140   3449         10.5   70      1
## 6         8          429        198   4341         10.0   70      1
##                        name HighMPG
## 1 chevrolet chevelle malibu       0
## 2         buick skylark 320       0
## 3        plymouth satellite       0
## 4             amc rebel sst       0
## 5               ford torino       0
## 6          ford galaxie 500       0

Exercise 7.(b)

Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results. Note you will need to fit the classifier without the gas mileage variable to produce sensible results.

Answer: The lowest Cross-Validation error occurs at cost = 0.1.

set.seed(1)

LinearTune <- tune(svm, HighMPG ~ ., data   = AutoData,  kernel = "linear",  ranges = list(cost = c(0.01, 0.1, 1, 10, 100)))

# Summary cross-validation results
summary(LinearTune)

## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##   0.1
## 
## - best performance: 0.08673077 
## 
## - Detailed performance results:
##    cost      error dispersion
## 1 1e-02 0.08923077 0.04698309
## 2 1e-01 0.08673077 0.04040897
## 3 1e+00 0.09961538 0.04923181
## 4 1e+01 0.11237179 0.05701890
## 5 1e+02 0.11750000 0.06208951

Exercise 7.(c)

Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.

Answer: The radial kernel achieves the lowest error, while the polynomial kernel performs best at degree = 3, cost = 10.

set.seed(1)

# SVM with radial
RadialTune <- tune(svm, HighMPG ~ .,  data   = AutoData,  kernel = "radial",  ranges = list(cost  = c(0.1, 1, 10), gamma = c(0.01, 0.1, 1)))


#summary
summary(RadialTune)

## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost gamma
##    10     1
## 
## - best performance: 0.07897436 
## 
## - Detailed performance results:
##   cost gamma      error dispersion
## 1  0.1  0.01 0.11224359 0.03836937
## 2  1.0  0.01 0.08673077 0.04551036
## 3 10.0  0.01 0.08673077 0.03855882
## 4  0.1  0.10 0.08666667 0.04193895
## 5  1.0  0.10 0.08923077 0.04376306
## 6 10.0  0.10 0.08416667 0.05256241
## 7  0.1  1.00 0.55115385 0.04366593
## 8  1.0  1.00 0.07903846 0.04891067
## 9 10.0  1.00 0.07897436 0.04869339

set.seed(1)

# SVM with Polynomial
PolyTune <- tune(svm, HighMPG ~ ., data   = AutoData,  kernel = "polynomial",  ranges = list(cost   = c(0.1, 1, 10), degree = c(2, 3, 4)))

#Summary
summary(PolyTune)

## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost degree
##    10      2
## 
## - best performance: 0.520641 
## 
## - Detailed performance results:
##   cost degree     error dispersion
## 1  0.1      2 0.5511538 0.04366593
## 2  1.0      2 0.5511538 0.04366593
## 3 10.0      2 0.5206410 0.08505283
## 4  0.1      3 0.5511538 0.04366593
## 5  1.0      3 0.5511538 0.04366593
## 6 10.0      3 0.5511538 0.04366593
## 7  0.1      4 0.5511538 0.04366593
## 8  1.0      4 0.5511538 0.04366593
## 9 10.0      4 0.5511538 0.04366593

Exercise 7.(d)

Make some plots to back up your assertions in (b) and (c).

library(ggplot2)

# Extract best models
BestLinearModel <- LinearTune$best.model
BestRadialModel <- RadialTune$best.model

# Plot linear SVM on Horsepower and Weight
plot(BestLinearModel, AutoData, horsepower ~ weight, main = "Linear SVM")

# Plot radial SVM on Weight and Displacement
plot(BestRadialModel, AutoData, weight ~ displacement, main = "Radial SVM")

Exercise 8

This problem involves the OJ data set which is part of the ISLR2 package.

Exercise 8.(a)

Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

library(ISLR2)
library(e1071)

set.seed(1)  

TotalRows <- nrow(OJ)

TrainIndices <- sample(seq_len(TotalRows), size = 800)  

OJTrain <- OJ[TrainIndices, ]  # 800 samples for train
OJTest  <- OJ[-TrainIndices, ]  

head(OJTrain)

##      Purchase WeekofPurchase StoreID PriceCH PriceMM DiscCH DiscMM SpecialCH
## 1017       CH            236       7    1.75    1.99   0.00   0.40         0
## 679        CH            277       1    1.99    2.13   0.24   0.00         0
## 129        CH            267       2    1.86    2.18   0.00   0.40         0
## 930        MM            236       3    1.79    2.09   0.00   0.00         0
## 471        CH            274       7    1.86    2.13   0.47   0.54         1
## 299        MM            231       2    1.69    1.69   0.30   0.00         1
##      SpecialMM  LoyalCH SalePriceMM SalePriceCH PriceDiff Store7 PctDiscMM
## 1017         0 0.908732        1.59        1.75     -0.16    Yes  0.201005
## 679          0 0.585435        2.13        1.75      0.38     No  0.000000
## 129          1 0.890772        1.78        1.86     -0.08     No  0.183486
## 930          0 0.163840        2.09        1.79      0.30     No  0.000000
## 471          0 0.500000        1.59        1.39      0.20    Yes  0.253521
## 299          0 0.467200        1.69        1.39      0.30     No  0.000000
##      PctDiscCH ListPriceDiff STORE
## 1017  0.000000          0.24     0
## 679   0.120603          0.14     1
## 129   0.000000          0.32     2
## 930   0.000000          0.30     3
## 471   0.252688          0.27     0
## 299   0.177515          0.00     2

Exercise 8.(b)

Fit a support vector classifier to the training data using cost = 0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.

There are pretty large number of Suspport Vectors(=435), it could be very flexible.

# SVM with cost = 0.01
SVMLinearInitial <- svm(Purchase ~ ., data   = OJTrain, kernel = "linear", cost   = 0.01)

# Summary
summary(SVMLinearInitial)

## 
## Call:
## svm(formula = Purchase ~ ., data = OJTrain, kernel = "linear", cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  0.01 
## 
## Number of Support Vectors:  435
## 
##  ( 219 216 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

Exercise 8.(c)

What are the training and test error rates?

# Predictions on training set
PredTrainLinearInit <- predict(SVMLinearInitial, newdata = OJTrain)
TrainErrorLinearInit <- mean(PredTrainLinearInit != OJTrain$Purchase)

# Predictions on test set
PredTestLinearInit <- predict(SVMLinearInitial, newdata = OJTest)
TestErrorLinearInit <- mean(PredTestLinearInit != OJTest$Purchase)

cat(sprintf("\n>> Training Error Rate: %f\n\n", TrainErrorLinearInit))

## 
## >> Training Error Rate: 0.175000

cat(sprintf("\n>> Test     Error Rate: %f\n\n", TestErrorLinearInit))

## 
## >> Test     Error Rate: 0.177778

Exercise 8.(d)

Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.

# Cross‐validation
LinearTune <- tune(svm, Purchase ~ ., data   = OJTrain, kernel = "linear", ranges = list(cost = c(0.01, 0.1, 1, 10)))

# Best cost
BestCostLinear <- LinearTune$best.parameters$cost
BestCostLinear

## [1] 10

Exercise 8.(e)

Compute the training and test error rates using this new value for cost.

# fit with new value
SVMLinearTuned <- LinearTune$best.model

# Compute errors
TrainErrorLinearTuned <- mean(predict(SVMLinearTuned, OJTrain) != OJTrain$Purchase)
TestErrorLinearTuned  <- mean(predict(SVMLinearTuned, OJTest)  != OJTest$Purchase)

cat(sprintf("\n>> Training Error Rate: %f\n\n", TrainErrorLinearTuned))

## 
## >> Training Error Rate: 0.163750

cat(sprintf("\n>> Test     Error Rate: %f\n\n", TestErrorLinearTuned))

## 
## >> Test     Error Rate: 0.148148

Exercise 8.(a)

Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.

# Init radial SVM (cost = 0.01, default gamma)
SVMRadialInitial <- svm(Purchase ~ ., data   = OJTrain, kernel = "radial", cost   = 0.01)

#Summary
summary(SVMRadialInitial)

## 
## Call:
## svm(formula = Purchase ~ ., data = OJTrain, kernel = "radial", cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  radial 
##        cost:  0.01 
## 
## Number of Support Vectors:  634
## 
##  ( 319 315 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

# Init errors
TrainErrorRadialInit <- mean(predict(SVMRadialInitial, OJTrain) != OJTrain$Purchase)
TestErrorRadialInit  <- mean(predict(SVMRadialInitial, OJTest)  != OJTest$Purchase)

cat(sprintf("\n>> Training Error Rate: %f\n\n", TrainErrorRadialInit))

## 
## >> Training Error Rate: 0.393750

cat(sprintf("\n>> Test     Error Rate: %f\n\n", TestErrorRadialInit))

## 
## >> Test     Error Rate: 0.377778

# Tune cost for radial kernel
RadialTune <- tune(svm, Purchase ~ ., data   = OJTrain, kernel = "radial", ranges = list(cost = c(0.01, 0.1, 1, 10)))
BestCostRadial <- RadialTune$best.parameters$cost

# Errors with tuned radial SVM
SVMRadialTuned <- RadialTune$best.model

TrainErrorRadialTuned <- mean(predict(SVMRadialTuned, OJTrain) != OJTrain$Purchase)
TestErrorRadialTuned  <- mean(predict(SVMRadialTuned, OJTest)  != OJTest$Purchase)

cat(sprintf("\n>> Training Error Rate: %f\n\n", TrainErrorRadialTuned))

## 
## >> Training Error Rate: 0.151250

cat(sprintf("\n>> Test     Error Rate: %f\n\n", TestErrorRadialTuned))

## 
## >> Test     Error Rate: 0.185185

Exercise 8.(g)

Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree = 2.

# Init polynomial SVM (degree = 2)
SVMPolyInitial <- svm(Purchase ~ ., data   = OJTrain, kernel = "polynomial", degree = 2, cost   = 0.01)

summary(SVMPolyInitial)

## 
## Call:
## svm(formula = Purchase ~ ., data = OJTrain, kernel = "polynomial", 
##     degree = 2, cost = 0.01)
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  polynomial 
##        cost:  0.01 
##      degree:  2 
##      coef.0:  0 
## 
## Number of Support Vectors:  636
## 
##  ( 321 315 )
## 
## 
## Number of Classes:  2 
## 
## Levels: 
##  CH MM

# Init errors
TrainErrorPolyInit <- mean(predict(SVMPolyInitial, OJTrain) != OJTrain$Purchase)
TestErrorPolyInit  <- mean(predict(SVMPolyInitial, OJTest)  != OJTest$Purchase)

cat(sprintf("\n>> Training Error Rate: %f\n\n", TrainErrorPolyInit))

## 
## >> Training Error Rate: 0.372500

cat(sprintf("\n>> Test     Error Rate: %f\n\n", TestErrorPolyInit))

## 
## >> Test     Error Rate: 0.366667

# Tune cost for polynomial kernel
PolyTune <- tune(svm, Purchase ~ ., data   = OJTrain, kernel = "polynomial", degree = 2,ranges = list(cost = c(0.01, 0.1, 1, 10)))

BestCostPoly <- PolyTune$best.parameters$cost


# Errors with tuned polynomial SVM
SVMPolyTuned <- PolyTune$best.model

TrainErrorPolyTuned <- mean(predict(SVMPolyTuned, OJTrain) != OJTrain$Purchase)
TestErrorPolyTuned  <- mean(predict(SVMPolyTuned, OJTest)  != OJTest$Purchase)

cat(sprintf("\n>> Training Error Rate: %f\n\n", TrainErrorPolyTuned))

## 
## >> Training Error Rate: 0.150000

cat(sprintf("\n>> Test     Error Rate: %f\n\n", TestErrorPolyTuned))

## 
## >> Test     Error Rate: 0.188889

Exercise 8.(h)

Overall, which approach seems to give the best results on this data?

Answer: The radial‐kernel SVM with tuned cost yields the lowest test error, so it performs best on this data

pnt130_Assignment8

Jongbum Choi

2025-07-25

Exercise 5

Exercise 5.(a)

Exercise 5.(b)

Exercise 5.(c)

Exercise 5.(d)

Exercise 5.(e)

Exercise 5.(f)

Exercise 5.(g)

Exercise 5.(h)

Exercise 5.(i)

Exercise 7

Exercise 7.(a)

Exercise 7.(b)

Exercise 7.(c)

Exercise 7.(d)

Exercise 8

Exercise 8.(a)

Exercise 8.(b)

Exercise 8.(c)

Exercise 8.(d)

Exercise 8.(e)

Exercise 8.(a)

Exercise 8.(g)

Exercise 8.(h)