We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
(a) Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows:
x1=runif (500) -0.5
x2=runif (500) -0.5
y=1*(x1^2-x2^2 > 0)
str(df)
'data.frame': 500 obs. of 3 variables:
$ x1: num -0.212 0.288 -0.091 0.383 0.44 ...
$ x2: num -0.146 -0.134 -0.213 -0.42 -0.135 ...
$ y : num 1 1 0 0 1 1 0 1 0 0 ...
(b) Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the yaxis.
library(ggplot2)
ggplot(df,aes(x1, x2, color = y)) +
geom_point(alpha = 1)
(c) Fit a logistic regression model to the data, using X1 and X2 as predictors.
set.seed(123)
model_lr <- glm(y ~.,family='binomial',data=df)
summary(model_lr)
Call:
glm(formula = y ~ ., family = "binomial", data = df)
Deviance Residuals:
Min 1Q Median 3Q Max
-1.227 -1.200 1.133 1.157 1.188
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.04792 0.08949 0.535 0.592
x1 -0.03999 0.31516 -0.127 0.899
x2 0.11509 0.30829 0.373 0.709
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 692.86 on 499 degrees of freedom
Residual deviance: 692.71 on 497 degrees of freedom
AIC: 698.71
Number of Fisher Scoring iterations: 3
(d) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.
attach(df)
preds_lr <- predict(model_lr,df, type = "response")
plot(x1,x2, col = ifelse(preds_lr>=0.5,'red','blue'),
pch = 19,main = "Predicted obs. with linear decision boundary")
The above plot with the predicted observations from logistic regression has got a linear decision boundary and we can see that.
(e) Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X2 1 , X1×X2, log(X2), and so forth).
set.seed(123)
model_lr2 <- glm(y ~ I(x1*x2)+poly(x2,2)+poly(x1,3),family='binomial',data=df)
summary(model_lr2)
(f) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.
preds_lr2 <- predict(model_lr2, df, type = "response")
plot(x1,x2, col = ifelse(preds_lr2>=0.5,'#2A363B','#FF847C'),
pch = 19,main = "Predicted obs. with linear decision boundary")
(g) Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
e107
Error: object 'e107' not found
plot(x1,x2, col = ifelse(svm.preds!=0,'#8482b3','#82b2b3'),
pch = 19,main = "Predicted obs. with svm")
(h) Fit a SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.
set.seed(123)
svmfit2=svm(y~., data=df, kernel ="radial", gamma=1, cost=1)
plot(x1,x2, col = ifelse(svm.preds2!=0,'#7b0749','#e7b5d2'),
pch = 19,main = "Predicted obs. with non-linear kernel")
(i) Comment on your results.
library(caret)
preds_lr <- as.factor(ifelse(preds_lr >= 0.5,1,0))
confusionMatrix(preds_lr,df$y)
Confusion Matrix and Statistics
Reference
Prediction 0 1
0 40 9
1 204 247
Accuracy : 0.574
95% CI : (0.5293, 0.6178)
No Information Rate : 0.512
P-Value [Acc > NIR] : 0.003128
Kappa : 0.1312
Mcnemar's Test P-Value : < 2.2e-16
Sensitivity : 0.1639
Specificity : 0.9648
Pos Pred Value : 0.8163
Neg Pred Value : 0.5477
Prevalence : 0.4880
Detection Rate : 0.0800
Detection Prevalence : 0.0980
Balanced Accuracy : 0.5644
'Positive' Class : 0
preds_lr2 <- as.factor(ifelse(preds_lr2 >= 0.5,1,0))
confusionMatrix(preds_lr2,df$y)
confusionMatrix(svm.preds,df$y)
confusionMatrix(svm.preds2, df$y)
From the confusion matrices of all the models built above we can see that the accuarcies of the Support vector machine models are higher. Also logistic Regression with non-linear functions of the predictors shows an accuracy of 100%.
In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto
data set.
(a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.
library(ISLR)
data <- Auto
str(data)
'data.frame': 392 obs. of 9 variables:
$ mpg : num 18 15 18 16 17 15 14 14 14 15 ...
$ cylinders : num 8 8 8 8 8 8 8 8 8 8 ...
$ displacement: num 307 350 318 304 302 429 454 440 455 390 ...
$ horsepower : num 130 165 150 150 140 198 220 215 225 190 ...
$ weight : num 3504 3693 3436 3433 3449 ...
$ acceleration: num 12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
$ year : num 70 70 70 70 70 70 70 70 70 70 ...
$ origin : num 1 1 1 1 1 1 1 1 1 1 ...
$ name : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
summary(data) # 22.75
mpg cylinders displacement horsepower weight
Min. : 9.00 Min. :3.000 Min. : 68.0 Min. : 46.0 Min. :1613
1st Qu.:17.00 1st Qu.:4.000 1st Qu.:105.0 1st Qu.: 75.0 1st Qu.:2225
Median :22.75 Median :4.000 Median :151.0 Median : 93.5 Median :2804
Mean :23.45 Mean :5.472 Mean :194.4 Mean :104.5 Mean :2978
3rd Qu.:29.00 3rd Qu.:8.000 3rd Qu.:275.8 3rd Qu.:126.0 3rd Qu.:3615
Max. :46.60 Max. :8.000 Max. :455.0 Max. :230.0 Max. :5140
acceleration year origin name
Min. : 8.00 Min. :70.00 Min. :1.000 amc matador : 5
1st Qu.:13.78 1st Qu.:73.00 1st Qu.:1.000 ford pinto : 5
Median :15.50 Median :76.00 Median :1.000 toyota corolla : 5
Mean :15.54 Mean :75.98 Mean :1.577 amc gremlin : 4
3rd Qu.:17.02 3rd Qu.:79.00 3rd Qu.:2.000 amc hornet : 4
Max. :24.80 Max. :82.00 Max. :3.000 chevrolet chevette: 4
(Other) :365
From the output the median value of the mpg column is 22.75
Now we will create a binary variable which takes value for cars with gas mileage above the median and a 0 for cars with gas mileage below the median
data$class <- as.factor(ifelse(data$mpg > 22.75,1,0))
(b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.
set.seed(123)
tune.out=tune(svm, class~., data=data, kernel = 'linear',ranges=list(cost=c(0.1,1,10,100,1000),
gamma=c(0.5,1,2,3,4)))
summary(tune.out)
Parameter tuning of ‘svm’:
- sampling method: 10-fold cross validation
- best parameters:
- best performance: 0.01025641
- Detailed performance results:
NA
The best cost parameter we can use from the 10 fold cross validations is 1 with a 0.5 gamma value. the minimum error recorded was 0.07392109
. The cross validation errors for each combination of gamma and cost values are shown in the output above. We used the tune() to perform cross validation here.
(c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.
set.seed(123)
tune.out.radial=tune(svm, class~., data=data, kernel ="radial",
ranges=list(cost=c(0.1,1,10,100,1000),
gamma=c(0.5,1,2,3,4)))
summary(tune.out.radial)
Parameter tuning of ‘svm’:
- sampling method: 10-fold cross validation
- best parameters:
- best performance: 0.04576923
- Detailed performance results:
NA
set.seed(123)
tune.out.polynomial=tune(svm, class~., data=data, kernel ="polynomial",
ranges=list(cost=c(0.1,1,10,100,1000),
gamma=c(0.5,1,2,3,4)))
summary(tune.out.polynomial)
Parameter tuning of ‘svm’:
- sampling method: 10-fold cross validation
- best parameters:
- best performance: 0.03570513
- Detailed performance results:
tune.out.polynomial$best.model
Call:
best.tune(method = svm, train.x = class ~ ., data = data, ranges = list(cost = c(0.1,
1, 10, 100, 1000), gamma = c(0.5, 1, 2, 3, 4)), kernel = "polynomial")
Parameters:
SVM-Type: C-classification
SVM-Kernel: polynomial
cost: 1
degree: 3
coef.0: 0
Number of Support Vectors: 83
SVM with radial kernel:
When I performed the SVM classification with radial kernel the best parameters was cost = 1 and gamma = 0.5. And the lowest error recorded was 0.04837794
.
SVM with polynomial kernel:
When I performed the SVM classification with polynomial kernel the best parameters was cost = 0.1 and gamma = 0.5. And the lowest error recorded was 0.0961884
.
So for the linear and the radial kernels the cost and the gamma values chosen are 1 and 0.5 with errors 0.07392109 and 0.04837794. While the polynomial kernel error was 0.0961884. which is bigger than the linear and radial kernel classification. The best model among the three is the model with the minimum error which is radial model with an error of 0.04837794
(d) Make some plots to back up your assertions in (b) and (c). Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing plot(svmfit , dat) where svmfit contains your fitted model and dat is a data frame containing your data, you can type plot(svmfit , dat , x1∼x4)
# creation of a function to create plots with all the variables one by one
plotpairs = function(svmfit){
for(name in names(data)[!(names(data) %in% c("mpg","class","name"))]){
plot(svmfit,data,as.formula(paste("mpg",name,sep = "~")))
}
}
# models with their kerenels and best parameters
set.seed(123)
svm.linear = svm(class~., data=data, kernal="linear", cost=1,gamma = 0.5)
svm.radial = svm(class~., data=data, kernal="radial", cost=1, gamma=0.5)
svm.polynomial = svm(class~., data=data, kernal="polynomial", cost=0.1, degree=3,
gamma = 0.5)
# plotting to back up my assertions
plotpairs(svm.linear)
plotpairs(svm.radial)
NA
plotpairs(svm.polynomial)
This problem involves the OJ data set which is part of the ISLR package.
(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
oj <- OJ
str(oj)
'data.frame': 1070 obs. of 18 variables:
$ Purchase : Factor w/ 2 levels "CH","MM": 1 1 1 2 1 1 1 1 1 1 ...
$ WeekofPurchase: num 237 239 245 227 228 230 232 234 235 238 ...
$ StoreID : num 1 1 1 1 7 7 7 7 7 7 ...
$ PriceCH : num 1.75 1.75 1.86 1.69 1.69 1.69 1.69 1.75 1.75 1.75 ...
$ PriceMM : num 1.99 1.99 2.09 1.69 1.69 1.99 1.99 1.99 1.99 1.99 ...
$ DiscCH : num 0 0 0.17 0 0 0 0 0 0 0 ...
$ DiscMM : num 0 0.3 0 0 0 0 0.4 0.4 0.4 0.4 ...
$ SpecialCH : num 0 0 0 0 0 0 1 1 0 0 ...
$ SpecialMM : num 0 1 0 0 0 1 1 0 0 0 ...
$ LoyalCH : num 0.5 0.6 0.68 0.4 0.957 ...
$ SalePriceMM : num 1.99 1.69 2.09 1.69 1.69 1.99 1.59 1.59 1.59 1.59 ...
$ SalePriceCH : num 1.75 1.75 1.69 1.69 1.69 1.69 1.69 1.75 1.75 1.75 ...
$ PriceDiff : num 0.24 -0.06 0.4 0 0 0.3 -0.1 -0.16 -0.16 -0.16 ...
$ Store7 : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 2 2 2 2 2 ...
$ PctDiscMM : num 0 0.151 0 0 0 ...
$ PctDiscCH : num 0 0 0.0914 0 0 ...
$ ListPriceDiff : num 0.24 0.24 0.23 0 0 0.3 0.3 0.24 0.24 0.24 ...
$ STORE : num 1 1 1 1 0 0 0 0 0 0 ...
set.seed(123)
intrain <- createDataPartition(oj$Purchase, p = 0.746, list = FALSE)
train <- oj[intrain,]
test <- oj[-intrain,]
(b) Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.
set.seed(123)
svmfit=svm(Purchase~., kernel = "linear",data=train,cost=0.01)
summary(svmfit)
Call:
svm(formula = Purchase ~ ., data = train, kernel = "linear", cost = 0.01)
Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 0.01
Number of Support Vectors: 431
( 216 215 )
Number of Classes: 2
Levels:
CH MM
The svm classifier used the kernel linear one and got 627 support vectors. For this model fit I used all the explanatory variables with Purchase as the target variable.
# predicting on the test data
pred=predict(svmfit,newdata =test)
confusionMatrix(pred,as.factor(test$Purchase))
Confusion Matrix and Statistics
Reference
Prediction CH MM
CH 136 25
MM 29 80
Accuracy : 0.8
95% CI : (0.7472, 0.846)
No Information Rate : 0.6111
P-Value [Acc > NIR] : 2.106e-11
Kappa : 0.5821
Mcnemar's Test P-Value : 0.6831
Sensitivity : 0.8242
Specificity : 0.7619
Pos Pred Value : 0.8447
Neg Pred Value : 0.7339
Prevalence : 0.6111
Detection Rate : 0.5037
Detection Prevalence : 0.5963
Balanced Accuracy : 0.7931
'Positive' Class : CH
# Predicting on the train data
pred.train=predict(svmfit,newdata =train)
confusionMatrix(pred.train,as.factor(train$Purchase))
Confusion Matrix and Statistics
Reference
Prediction CH MM
CH 433 75
MM 55 237
Accuracy : 0.8375
95% CI : (0.8101, 0.8624)
No Information Rate : 0.61
P-Value [Acc > NIR] : < 2e-16
Kappa : 0.6545
Mcnemar's Test P-Value : 0.09563
Sensitivity : 0.8873
Specificity : 0.7596
Pos Pred Value : 0.8524
Neg Pred Value : 0.8116
Prevalence : 0.6100
Detection Rate : 0.5413
Detection Prevalence : 0.6350
Balanced Accuracy : 0.8235
'Positive' Class : CH
From the confusion matrices we got the testing and training error rates are 100-80 = 20
and 100-83.75 = 16.25
respectively. The svm preformed well balancing both the sensitivity and specificity scores.
(d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.
set.seed(123)
new_tune=tune(svm, Purchase~., data=train, kernel ="linear",
ranges=list(cost=c(0.01,0.05,0.1,0.5,1,1.5,10)))
summary(new_tune)
Parameter tuning of ‘svm’:
- sampling method: 10-fold cross validation
- best parameters:
- best performance: 0.16625
- Detailed performance results:
new_tune$best.model
Call:
best.tune(method = svm, train.x = Purchase ~ ., data = train, ranges = list(cost = c(0.01,
0.05, 0.1, 0.5, 1, 1.5, 10)), kernel = "linear")
Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 0.05
Number of Support Vectors: 356
The lowest error was 0.16625 with a best cost parameter value 0.05
(e) Compute the training and test error rates using this new value for cost.
set.seed(123)
svmfit.new=svm(Purchase~., kernel = "linear",data=train,cost=0.05)
summary(svmfit.new)
Call:
svm(formula = Purchase ~ ., data = train, kernel = "linear", cost = 0.05)
Parameters:
SVM-Type: C-classification
SVM-Kernel: linear
cost: 0.05
Number of Support Vectors: 356
( 179 177 )
Number of Classes: 2
Levels:
CH MM
# predicting on the test data
pred.new=predict(svmfit.new,newdata =test)
confusionMatrix(pred.new,as.factor(test$Purchase))
Confusion Matrix and Statistics
Reference
Prediction CH MM
CH 136 21
MM 29 84
Accuracy : 0.8148
95% CI : (0.7633, 0.8593)
No Information Rate : 0.6111
P-Value [Acc > NIR] : 4.049e-13
Kappa : 0.6157
Mcnemar's Test P-Value : 0.3222
Sensitivity : 0.8242
Specificity : 0.8000
Pos Pred Value : 0.8662
Neg Pred Value : 0.7434
Prevalence : 0.6111
Detection Rate : 0.5037
Detection Prevalence : 0.5815
Balanced Accuracy : 0.8121
'Positive' Class : CH
# Predicting on the train data
pred.new2=predict(svmfit.new,newdata =train)
confusionMatrix(pred.new2,as.factor(train$Purchase))
Confusion Matrix and Statistics
Reference
Prediction CH MM
CH 433 75
MM 55 237
Accuracy : 0.8375
95% CI : (0.8101, 0.8624)
No Information Rate : 0.61
P-Value [Acc > NIR] : < 2e-16
Kappa : 0.6545
Mcnemar's Test P-Value : 0.09563
Sensitivity : 0.8873
Specificity : 0.7596
Pos Pred Value : 0.8524
Neg Pred Value : 0.8116
Prevalence : 0.6100
Detection Rate : 0.5413
Detection Prevalence : 0.6350
Balanced Accuracy : 0.8235
'Positive' Class : CH
(f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.
confusionMatrix(pred.train,as.factor(train$Purchase))
Confusion Matrix and Statistics
Reference
Prediction CH MM
CH 488 312
MM 0 0
Accuracy : 0.61
95% CI : (0.5752, 0.644)
No Information Rate : 0.61
P-Value [Acc > NIR] : 0.5155
Kappa : 0
Mcnemar's Test P-Value : <2e-16
Sensitivity : 1.00
Specificity : 0.00
Pos Pred Value : 0.61
Neg Pred Value : NaN
Prevalence : 0.61
Detection Rate : 0.61
Detection Prevalence : 1.00
Balanced Accuracy : 0.50
'Positive' Class : CH
Tuning the model
tune.radial$best.model
Call:
best.tune(method = svm, train.x = Purchase ~ ., data = train, ranges = list(cost = c(0.01,
0.05, 0.1, 0.5, 1, 1.5, 10)), kernel = "radial")
Parameters:
SVM-Type: C-classification
SVM-Kernel: radial
cost: 1
Number of Support Vectors: 365
confusionMatrix(pred.radialtrain,as.factor(train$Purchase))
Confusion Matrix and Statistics
Reference
Prediction CH MM
CH 446 76
MM 42 236
Accuracy : 0.8525
95% CI : (0.826, 0.8764)
No Information Rate : 0.61
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.6838
Mcnemar's Test P-Value : 0.002382
Sensitivity : 0.9139
Specificity : 0.7564
Pos Pred Value : 0.8544
Neg Pred Value : 0.8489
Prevalence : 0.6100
Detection Rate : 0.5575
Detection Prevalence : 0.6525
Balanced Accuracy : 0.8352
'Positive' Class : CH
(g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2
confusionMatrix(pred.train,as.factor(train$Purchase))
Confusion Matrix and Statistics
Reference
Prediction CH MM
CH 488 312
MM 0 0
Accuracy : 0.61
95% CI : (0.5752, 0.644)
No Information Rate : 0.61
P-Value [Acc > NIR] : 0.5155
Kappa : 0
Mcnemar's Test P-Value : <2e-16
Sensitivity : 1.00
Specificity : 0.00
Pos Pred Value : 0.61
Neg Pred Value : NaN
Prevalence : 0.61
Detection Rate : 0.61
Detection Prevalence : 1.00
Balanced Accuracy : 0.50
'Positive' Class : CH
Tuning the model
tune.poly$best.model
Call:
best.tune(method = svm, train.x = Purchase ~ ., data = train, ranges = list(cost = c(0.01,
0.05, 0.1, 0.5, 1, 1.5, 10)), kernel = "polynomial")
Parameters:
SVM-Type: C-classification
SVM-Kernel: polynomial
cost: 1.5
degree: 3
coef.0: 0
Number of Support Vectors: 391
set.seed(123)
svmfit.polyrefit=svm(Purchase~., kernel = "polynomial",data=train,cost=1.5, degree = 3)
summary(svmfit.polyrefit)
Call:
svm(formula = Purchase ~ ., data = train, kernel = "polynomial", cost = 1.5,
degree = 3)
Parameters:
SVM-Type: C-classification
SVM-Kernel: polynomial
cost: 1.5
degree: 3
coef.0: 0
Number of Support Vectors: 391
( 196 195 )
Number of Classes: 2
Levels:
CH MM
# predicting on the test data
pred.polytest=predict(svmfit.polyrefit,newdata =test)
confusionMatrix(pred.polytest,as.factor(test$Purchase))
Confusion Matrix and Statistics
Reference
Prediction CH MM
CH 140 30
MM 25 75
Accuracy : 0.7963
95% CI : (0.7433, 0.8427)
No Information Rate : 0.6111
P-Value [Acc > NIR] : 5.34e-11
Kappa : 0.5677
Mcnemar's Test P-Value : 0.5896
Sensitivity : 0.8485
Specificity : 0.7143
Pos Pred Value : 0.8235
Neg Pred Value : 0.7500
Prevalence : 0.6111
Detection Rate : 0.5185
Detection Prevalence : 0.6296
Balanced Accuracy : 0.7814
'Positive' Class : CH
# Predicting on the train data
pred.polytrain=predict(svmfit.polyrefit,newdata =train)
confusionMatrix(pred.polytrain,as.factor(train$Purchase))
Confusion Matrix and Statistics
Reference
Prediction CH MM
CH 458 96
MM 30 216
Accuracy : 0.8425
95% CI : (0.8154, 0.8671)
No Information Rate : 0.61
P-Value [Acc > NIR] : < 2.2e-16
Kappa : 0.6559
Mcnemar's Test P-Value : 7.011e-09
Sensitivity : 0.9385
Specificity : 0.6923
Pos Pred Value : 0.8267
Neg Pred Value : 0.8780
Prevalence : 0.6100
Detection Rate : 0.5725
Detection Prevalence : 0.6925
Balanced Accuracy : 0.8154
'Positive' Class : CH
(h) Overall, which approach seems to give the best results on this data?
I am creating a dataframe of test and train accuracies with the used models below. I subtracted the accuracy from 100% to get the error since total of accuracy + error = 100%. We can also find the classification error rate by just counting the total number of false positives and false negatives from the table in the confusion matrix and divide it by the total observations.
We can see that the svm classifier with a radial kernel has got the minimum error among all the kernels. I used the coast =1 after tuning and finding out the optimal parameter.