set.seed(421)
x1 = runif(500) - 0.5
x2 = runif(500) - 0.5
y = 1 * (x1^2 - x2^2 > 0)
The quadratic decision boundary between the two classes is \(x_1^2 - x_2^2 = 0\).
plot(x1[y == 0], x2[y == 0], col = "green", xlab = "X1", ylab = "X2", pch = "*")
points(x1[y == 1], x2[y == 1], col = "blue", pch = 1)
The plot shows a non-linear decision boundary.
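Since the true boundary \(x_1^2 - x_2^2 = 0\) is equivalent to \(|x_1| = |x_2|\), it can be overlaid on the plot as the two diagonal lines \(x_2 = x_1\) and \(x_2 = -x_1\); a minimal sketch, run right after the plot above:
# Overlay the true quadratic boundary x1^2 = x2^2, i.e. the lines x2 = x1 and x2 = -x1
abline(0, 1, lty = 2)
abline(0, -1, lty = 2)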
fit_lm = glm(y ~ x1 + x2, family = binomial)
summary(fit_lm)
##
## Call:
## glm(formula = y ~ x1 + x2, family = binomial)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.278 -1.227 1.089 1.135 1.175
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 0.11999 0.08971 1.338 0.181
## x1 -0.16881 0.30854 -0.547 0.584
## x2 -0.08198 0.31476 -0.260 0.795
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 691.35 on 499 degrees of freedom
## Residual deviance: 690.99 on 497 degrees of freedom
## AIC: 696.99
##
## Number of Fisher Scoring iterations: 3
Predictors \(X_1\) and \(X_2\) are not statistically significant predictors of y at the \(\alpha = 0.05\) significance level. This is consistent with the non-linear decision boundary shown in the plot.
data = data.frame(x1 = x1, x2 = x2, y = y)
prob = predict(fit_lm, data, type = "response")
pred = ifelse(prob > 0.52, 1, 0)
pos = data[pred == 1, ]
neg = data[pred == 0, ]
plot(pos$x1, pos$x2, col = "blue", xlab = "X1", ylab = "X2", pch = "*")
points(neg$x1, neg$x2, col = "green", pch = 1)
With a threshold of 0.52 the predicted labels are separated by a linear boundary, as expected from a model that is linear in \(X_1\) and \(X_2\).
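The location of this boundary can be read off the fitted coefficients: the threshold corresponds to the line \(\hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2 = \log(0.52/0.48)\). A minimal sketch that overlays it on the plot above, assuming fit_lm from earlier:
# Overlay the linear boundary implied by the 0.52 probability threshold
b = coef(fit_lm)
thr = log(0.52 / 0.48)  # log-odds corresponding to the threshold
abline(a = (thr - b[1]) / b[3], b = -b[2] / b[3], lty = 2)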
fit_poly = glm(y ~ poly(x1, 2) + log(x2, 2) + I(x1 * x2), data = data, family = binomial)
summary(fit_poly)
##
## Call:
## glm(formula = y ~ poly(x1, 2) + log(x2, 2) + I(x1 * x2), family = binomial,
## data = data)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -4.2453 -0.2280 0.0006 0.1853 1.7565
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.253 1.016 -6.153 7.61e-10 ***
## poly(x1, 2)1 12.537 17.244 0.727 0.467
## poly(x1, 2)2 74.737 11.176 6.688 2.27e-11 ***
## log(x2, 2) -2.881 0.448 -6.432 1.26e-10 ***
## I(x1 * x2) -4.653 8.011 -0.581 0.561
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 333.89 on 240 degrees of freedom
## Residual deviance: 102.21 on 236 degrees of freedom
## (259 observations deleted due to missingness)
## AIC: 112.21
##
## Number of Fisher Scoring iterations: 7
The new model with non-linear predictors has some statistically significant terms: the quadratic term in \(X_1\) and \(\log_2(X_2)\). Note that \(\log_2(X_2)\) is undefined for \(X_2 \le 0\), which is why 259 observations are reported as deleted due to missingness.
prob_p = predict(fit_poly, data, type = "response")
pred_p = ifelse(prob_p > 0.5, 1, 0)
pos_p = data[pred_p == 1, ]
neg_p = data[pred_p == 0, ]
plot(pos_p$x1, pos_p$x2, col = "blue", xlab = "X1", ylab = "X2", pch = 1)
points(neg_p$x1, neg_p$x2, col = "red", pch = 1)
The predictions are much better than those of the linear model. The new model produces a non-linear decision boundary, although it differs from the original boundary \(x_1^2 - x_2^2 = 0\).
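One way to quantify the improvement is to compare training error rates of the two logistic models; a quick sketch (the \(\log_2\) term makes prob_p undefined wherever \(x_2 \le 0\), so those observations are dropped from the second rate):
# Training misclassification rates: linear vs. non-linear logistic regression
mean(pred != y)                  # linear predictors
mean(pred_p != y, na.rm = TRUE)  # non-linear predictors (rows with x2 <= 0 are NA)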
library(e1071)
svm.fit = svm(as.factor(y) ~ x1 + x2, data, kernel = "linear", cost = 0.1)
svm.pred = predict(svm.fit, data)
pos = data[svm.pred == 1, ]
neg = data[svm.pred == 0, ]
plot(pos$x1, pos$x2, col = "blue", xlab = "X1", ylab = "X2", pch = "*")
points(neg$x1, neg$x2, col = "red", pch = 3)
The linear kernel produces a linear boundary; only a few observations fall on one side of it, and the majority are classified into a single class.
library(e1071)
svm.fit = svm(as.factor(y) ~ x1 + x2, data, gamma=1)
svm.pred = predict(svm.fit, data)
pos = data[svm.pred == 1, ]
neg = data[svm.pred == 0, ]
plot(pos$x1, pos$x2, col = "blue", xlab = "X1", ylab = "X2", pch = "+")
points(neg$x1, neg$x2, col = "red", pch = 4)
The predicted labels from the radial kernel are separated by a non-linear boundary similar to the original decision boundary.
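e1071 also provides a built-in plot method for svm objects that shades the predicted regions and marks the support vectors. A sketch, refitting on a data frame whose response is already a factor (dat2 and svm.rad are names introduced here, not part of the original code):
# Built-in decision-region plot from e1071; support vectors are drawn as crosses
dat2 = data.frame(x1 = x1, x2 = x2, y = as.factor(y))
svm.rad = svm(y ~ x1 + x2, data = dat2, kernel = "radial", gamma = 1)
plot(svm.rad, dat2)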
This experiment illustrates why SVMs are useful for fitting non-linear decision boundaries, and the gamma parameter can be tuned conveniently with cross-validation. Logistic regression with linear predictors handles a non-linear boundary poorly, a linear-kernel SVM with a small cost classifies most observations into a single class, while both logistic regression with non-linear predictors and a radial-kernel SVM recover the non-linear boundary well.
library(ISLR)
mileage.median <- median(Auto$mpg)
Auto$mb <- ifelse(Auto$mpg > mileage.median, 1, 0)
Auto$mpglevel = as.factor(Auto$mb)
cost.grid <- c(0.001,0.01, 0.1, 1, 5, 10, 100)
set.seed(1)
tune.res <- tune(svm, mb ~ . - mpg, data = Auto, kernel = 'linear', ranges = list(cost = cost.grid))
summary(tune.res)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 5
##
## - best performance: 0.001749865
##
## - Detailed performance results:
## cost error dispersion
## 1 1e-03 0.094369989 0.0221743982
## 2 1e-02 0.038760453 0.0128111486
## 3 1e-01 0.002059194 0.0003321964
## 4 1e+00 0.001750708 0.0001172866
## 5 5e+00 0.001749865 0.0001196184
## 6 1e+01 0.001749865 0.0001196184
## 7 1e+02 0.001749865 0.0001196184
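Note that mb is a numeric 0/1 variable, so svm fits a regression model here and the cross-validation errors above are mean squared errors rather than misclassification rates. A sketch of the classification version, using the factor response mpglevel and dropping mpg and mb from the predictors (tune.cls is a new name, and this is an assumption about the intended setup):
# Classification version: a factor response gives C-classification and CV misclassification error
set.seed(1)
tune.cls = tune(svm, mpglevel ~ . - mpg - mb, data = Auto, kernel = "linear", ranges = list(cost = cost.grid))
summary(tune.cls)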
grid <- c(0.01, 0.1, 1, 5, 10)
set.seed(1)
tune.out <- tune(svm, mb ~ . - mpg, data = Auto, kernel = 'radial', ranges = list(cost = grid), gamma=c(0.01, 0.1, 1, 5, 10, 100))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 10
##
## - best performance: 0.002002138
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.287651483 0.0351963527
## 2 0.10 0.078886860 0.0206536281
## 3 1.00 0.017484810 0.0071390321
## 4 5.00 0.002281495 0.0005543150
## 5 10.00 0.002002138 0.0004984504
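Because gamma is supplied outside of ranges in the call above, it is passed straight through to svm rather than searched over (only cost appears among the best parameters). To tune gamma jointly with cost, both belong inside ranges; a minimal sketch:
# Jointly tune cost and gamma for the radial kernel
set.seed(1)
tune.rad = tune(svm, mb ~ . - mpg, data = Auto, kernel = "radial", ranges = list(cost = grid, gamma = c(0.01, 0.1, 1, 5, 10, 100)))
summary(tune.rad)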
grid <- c(0.01, 0.1, 1, 5, 10)
set.seed(1)
tune.out <- tune(svm, mb ~ . - mpg, data = Auto, kernel = 'polynomial', ranges = list(cost = grid),degree=c(2,3,4))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 10
##
## - best performance: 0.3026392
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01 0.4981933 0.03934104
## 2 0.10 0.4956527 0.03969561
## 3 1.00 0.4706992 0.04479729
## 4 5.00 0.3777593 0.07172821
## 5 10.00 0.3026392 0.07693095
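The same applies to degree in the polynomial call: placed outside of ranges it is not part of the cross-validation search. A sketch of tuning cost and degree together:
# Jointly tune cost and degree for the polynomial kernel
set.seed(1)
tune.pol = tune(svm, mb ~ . - mpg, data = Auto, kernel = "polynomial", ranges = list(cost = grid, degree = c(2, 3, 4)))
summary(tune.pol)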
svm.linear = svm(mpglevel ~ ., data = Auto, kernel = "linear", cost = 1)
svm.poly = svm(mpglevel ~ ., data = Auto, kernel = "polynomial", cost = 100, degree = 2)
svm.radial = svm(mpglevel ~ ., data = Auto, kernel = "radial", cost = 100, gamma = 0.01)
# For the given SVM fit, plot the classification regions of mpg against each remaining variable
plotpairs = function(fit) {
  for (name in names(Auto)[!(names(Auto) %in% c("mpg", "mpglevel", "name"))]) {
    plot(fit, Auto, as.formula(paste("mpg~", name, sep = "")))
  }
}
plotpairs(svm.linear)
plotpairs(svm.poly)
plotpairs(svm.radial)
set.seed(1)
s = sample(dim(OJ)[1], 800)
train = OJ[s, ]
test = OJ[-s, ]
library(e1071)
svm.linear = svm(Purchase ~ ., kernel = "linear", data = train, cost = 0.01)
summary(svm.linear)
##
## Call:
## svm(formula = Purchase ~ ., data = train, kernel = "linear", cost = 0.01)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.01
##
## Number of Support Vectors: 435
##
## ( 219 216 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
The support vector classifier uses 435 support vectors out of the 800 training observations. They are split nearly evenly between the classes, with 219 from Citrus Hill and 216 from Minute Maid.
train.pred = predict(svm.linear, train)
mean(train.pred != train$Purchase)
## [1] 0.175
test.pred = predict(svm.linear, test)
mean(test.pred != test$Purchase)
## [1] 0.1777778
The training error rate is 17.5% and the test error rate is approximately 17.78%.
set.seed(1)
tune.out = tune(svm, Purchase ~ ., data = train, kernel = "linear", ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 3.162278
##
## - best performance: 0.16875
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.17625 0.02853482
## 2 0.01778279 0.17625 0.03143004
## 3 0.03162278 0.17125 0.02829041
## 4 0.05623413 0.17625 0.02853482
## 5 0.10000000 0.17250 0.03162278
## 6 0.17782794 0.17125 0.02829041
## 7 0.31622777 0.17125 0.02889757
## 8 0.56234133 0.17125 0.02703521
## 9 1.00000000 0.17500 0.02946278
## 10 1.77827941 0.17375 0.02729087
## 11 3.16227766 0.16875 0.03019037
## 12 5.62341325 0.17375 0.03304563
## 13 10.00000000 0.17375 0.03197764
The tuning selected cost = 3.162278 as optimal.
svm.linear = svm(Purchase ~ ., kernel = "linear", data = train, cost = tune.out$best.parameters$cost)
train.pred = predict(svm.linear, train)
table(train$Purchase, train.pred)
## train.pred
## CH MM
## CH 423 62
## MM 70 245
test.pred = predict(svm.linear, test)
mean(test.pred != test$Purchase)
## [1] 0.1518519
With the optimal cost, the test error rate decreases to approximately 15.19%.
set.seed(1)
svm.radial = svm(Purchase ~ ., data = train, kernel = "radial")
summary(svm.radial)
##
## Call:
## svm(formula = Purchase ~ ., data = train, kernel = "radial")
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 373
##
## ( 188 185 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
The radial basis kernel with the default gamma uses 373 support vectors out of the 800 training observations: 188 from Citrus Hill and 185 from Minute Maid.
train.pred = predict(svm.radial, train)
table(train$Purchase, train.pred)
## train.pred
## CH MM
## CH 441 44
## MM 77 238
test.pred = predict(svm.radial, test)
mean(test.pred != test$Purchase)
## [1] 0.1851852
The test error for the radial basis kernel is approximately 18.52%.
set.seed(1)
tune.out = tune(svm, Purchase ~ ., data = train, kernel = "radial", ranges = list(cost = 10^seq(-2, 1, by = 0.25)))
summary(tune.out)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 0.5623413
##
## - best performance: 0.16875
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.39375 0.04007372
## 2 0.01778279 0.39375 0.04007372
## 3 0.03162278 0.35750 0.05927806
## 4 0.05623413 0.19500 0.02443813
## 5 0.10000000 0.18625 0.02853482
## 6 0.17782794 0.18250 0.03291403
## 7 0.31622777 0.17875 0.03230175
## 8 0.56234133 0.16875 0.02651650
## 9 1.00000000 0.17125 0.02128673
## 10 1.77827941 0.17625 0.02079162
## 11 3.16227766 0.17750 0.02266912
## 12 5.62341325 0.18000 0.02220485
## 13 10.00000000 0.18625 0.02853482
The tuning selected cost = 0.5623413 as optimal.
svm.radial = svm(Purchase ~ ., data = train, kernel = "radial", cost = tune.out$best.parameters$cost)
train.pred = predict(svm.radial, train)
table(train$Purchase, train.pred)
## train.pred
## CH MM
## CH 437 48
## MM 71 244
test.pred<-predict(svm.radial, test)
mean(test.pred != test$Purchase)
## [1] 0.1777778
The test error rate for the radial basis kernel with the tuned cost is approximately 17.78%, a slight improvement over the 18.52% obtained with the default cost of 1.
svm.poly<-svm(Purchase~ ., kernel='polynomial', data=train, degree=2 )
summary(svm.poly)
##
## Call:
## svm(formula = Purchase ~ ., data = train, kernel = "polynomial",
## degree = 2)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: polynomial
## cost: 1
## degree: 2
## coef.0: 0
##
## Number of Support Vectors: 447
##
## ( 225 222 )
##
##
## Number of Classes: 2
##
## Levels:
## CH MM
The polynomial kernel of degree 2 uses 447 support vectors: 225 from Citrus Hill and 222 from Minute Maid.
train.pred<-predict(svm.poly, train)
table(train$Purchase, train.pred)
## train.pred
## CH MM
## CH 449 36
## MM 110 205
test.pred<-predict(svm.poly, test)
mean(test.pred != test$Purchase)
## [1] 0.2222222
The test error for the degree-2 polynomial kernel is approximately 22.22%.
tunepoly<-tune(svm, Purchase ~ ., data=train, kernel='polynomial', degree=2, ranges=list(cost=10^seq(-2,1, by=.5)))
summary(tunepoly)
##
## Parameter tuning of 'svm':
##
## - sampling method: 10-fold cross validation
##
## - best parameters:
## cost
## 3.162278
##
## - best performance: 0.18375
##
## - Detailed performance results:
## cost error dispersion
## 1 0.01000000 0.39125 0.05239076
## 2 0.03162278 0.36125 0.05993341
## 3 0.10000000 0.32125 0.05001736
## 4 0.31622777 0.20250 0.04031129
## 5 1.00000000 0.20000 0.04249183
## 6 3.16227766 0.18375 0.03682259
## 7 10.00000000 0.18875 0.04143687
The tuning selected cost = 3.162278 as optimal.
svm.poly<-svm(Purchase ~ ., kernel='polynomial', data=train, degree=2, cost=tunepoly$best.parameters$cost)
train.pred<-predict(svm.poly, train)
table(train$Purchase, train.pred)
## train.pred
## CH MM
## CH 451 34
## MM 90 225
test.pred<-predict(svm.poly, test)
mean(test.pred != test$Purchase)
## [1] 0.2037037
The test error for the polynomial kernel with the tuned cost is approximately 20.37%, a modest reduction from the 22.22% of the untuned degree-2 polynomial kernel.
Overall, the linear kernel with the tuned cost gives the best results on the test set for this data, with a test error rate of approximately 15.19%; the tuned radial basis kernel achieves the lowest training error rate, but its test error remains higher than that of the tuned linear kernel.
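A quick way to verify this is to recompute the test error rates of the three tuned fits side by side; a minimal sketch using the final svm.linear, svm.radial, and svm.poly objects from above:
# Test error rates of the three tuned kernels, side by side
sapply(list(linear = svm.linear, radial = svm.radial, polynomial = svm.poly), function(fit) mean(predict(fit, test) != test$Purchase))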