Assignment 8
Mounica Miriyala, 4/25/2021

Problem 5

We have seen that we can fit an SVM with a non-linear kernel in order to perform classification using a non-linear decision boundary. We will now see that we can also obtain a non-linear decision boundary by performing logistic regression using non-linear transformations of the features.
(a) Generate a data set with n = 500 and p = 2, such that the observations belong to two classes with a quadratic decision boundary between them. For instance, you can do this as follows:

set.seed(123)
x1 = runif(500) - 0.5
x2 = runif(500) - 0.5
y = 1 * (x1^2 - x2^2 > 0)
df <- data.frame(x1, x2, y)
str(df)
'data.frame':   500 obs. of  3 variables:
 $ x1: num  -0.212 0.288 -0.091 0.383 0.44 ...
 $ x2: num  -0.146 -0.134 -0.213 -0.42 -0.135 ...
 $ y : num  1 1 0 0 1 1 0 1 0 0 ...
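df$y <- as.factor(df$y)   # treat the response as a factor for classification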

(b) Plot the observations, colored according to their class labels. Your plot should display X1 on the x-axis, and X2 on the y-axis.

library(ggplot2)
ggplot(df,aes(x1, x2, color = y)) +
  geom_point(alpha = 1)

(c) Fit a logistic regression model to the data, using X1 and X2 as predictors.

set.seed(123) 
model_lr <- glm(y ~.,family='binomial',data=df)
summary(model_lr)

Call:
glm(formula = y ~ ., family = "binomial", data = df)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-1.227  -1.200   1.133   1.157   1.188  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.04792    0.08949   0.535    0.592
x1          -0.03999    0.31516  -0.127    0.899
x2           0.11509    0.30829   0.373    0.709

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 692.86  on 499  degrees of freedom
Residual deviance: 692.71  on 497  degrees of freedom
AIC: 698.71

Number of Fisher Scoring iterations: 3

(d) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be linear.

attach(df)
preds_lr <- predict(model_lr,df, type = "response")

plot(x1,x2, col = ifelse(preds_lr>=0.5,'red','blue'), 
     pch = 19,main = "Predicted obs. with linear decision boundary")

The plot of the predicted observations from the logistic regression shows a decision boundary that is clearly linear.
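
As a quick sketch (using the model_lr coefficients above), the implied boundary can be added to the plot explicitly: it is the line along which the predicted log-odds equal zero.

# log-odds b0 + b1*x1 + b2*x2 = 0  =>  x2 = -(b0 + b1*x1)/b2
b <- coef(model_lr)
abline(a = -b[1]/b[3], b = -b[2]/b[3], lwd = 2)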

(e) Now fit a logistic regression model to the data using non-linear functions of X1 and X2 as predictors (e.g. X1^2, X1×X2, log(X2), and so forth).

set.seed(123) 
model_lr2 <- glm(y ~ I(x1*x2)+poly(x2,2)+poly(x1,3),family='binomial',data=df)
summary(model_lr2)

(f) Apply this model to the training data in order to obtain a predicted class label for each training observation. Plot the observations, colored according to the predicted class labels. The decision boundary should be obviously non-linear. If it is not, then repeat (a)-(e) until you come up with an example in which the predicted class labels are obviously non-linear.

preds_lr2 <- predict(model_lr2, df, type = "response")
plot(x1,x2, col = ifelse(preds_lr2>=0.5,'#2A363B','#FF847C'), 
     pch = 19,main = "Predicted obs. with non-linear decision boundary")

(g) Fit a support vector classifier to the data with X1 and X2 as predictors. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

library(e1071)
set.seed(123)
svmfit = svm(y~., data=df, cost=10, scale=FALSE)
svm.preds <- predict(svmfit, newdata=df, type='response')
plot(x1,x2, col = ifelse(svm.preds!=0,'#8482b3','#82b2b3'), 
     pch = 19,main = "Predicted obs. with svm")

(h) Fit a SVM using a non-linear kernel to the data. Obtain a class prediction for each training observation. Plot the observations, colored according to the predicted class labels.

set.seed(123)
svmfit2=svm(y~., data=df, kernel ="radial", gamma=1, cost=1)
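svm.preds2 <- predict(svmfit2, newdata=df, type='response')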
plot(x1,x2, col = ifelse(svm.preds2!=0,'#7b0749','#e7b5d2'), 
     pch = 19,main = "Predicted obs. with non-linear kernel")

(i) Comment on your results.

library(caret)
preds_lr <- as.factor(ifelse(preds_lr >= 0.5,1,0))
confusionMatrix(preds_lr,df$y)
Confusion Matrix and Statistics

          Reference
Prediction   0   1
         0  40   9
         1 204 247
                                          
               Accuracy : 0.574           
                 95% CI : (0.5293, 0.6178)
    No Information Rate : 0.512           
    P-Value [Acc > NIR] : 0.003128        
                                          
                  Kappa : 0.1312          
                                          
 Mcnemar's Test P-Value : < 2.2e-16       
                                          
            Sensitivity : 0.1639          
            Specificity : 0.9648          
         Pos Pred Value : 0.8163          
         Neg Pred Value : 0.5477          
             Prevalence : 0.4880          
         Detection Rate : 0.0800          
   Detection Prevalence : 0.0980          
      Balanced Accuracy : 0.5644          
                                          
       'Positive' Class : 0               
                                          
preds_lr2 <- as.factor(ifelse(preds_lr2 >= 0.5,1,0))
confusionMatrix(preds_lr2,df$y)
confusionMatrix(svm.preds,df$y)
confusionMatrix(svm.preds2, df$y)

From the confusion matrices of all the models built above, we can see that the accuracies of the support vector machine models are higher. The logistic regression with non-linear functions of the predictors also shows an accuracy of 100%.

Problem 7

In this problem, you will use support vector approaches in order to predict whether a given car gets high or low gas mileage based on the Auto data set.
(a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.

library(ISLR)
data <- Auto
str(data)
'data.frame':   392 obs. of  9 variables:
 $ mpg         : num  18 15 18 16 17 15 14 14 14 15 ...
 $ cylinders   : num  8 8 8 8 8 8 8 8 8 8 ...
 $ displacement: num  307 350 318 304 302 429 454 440 455 390 ...
 $ horsepower  : num  130 165 150 150 140 198 220 215 225 190 ...
 $ weight      : num  3504 3693 3436 3433 3449 ...
 $ acceleration: num  12 11.5 11 12 10.5 10 9 8.5 10 8.5 ...
 $ year        : num  70 70 70 70 70 70 70 70 70 70 ...
 $ origin      : num  1 1 1 1 1 1 1 1 1 1 ...
 $ name        : Factor w/ 304 levels "amc ambassador brougham",..: 49 36 231 14 161 141 54 223 241 2 ...
summary(data)   # 22.75
      mpg          cylinders      displacement     horsepower        weight    
 Min.   : 9.00   Min.   :3.000   Min.   : 68.0   Min.   : 46.0   Min.   :1613  
 1st Qu.:17.00   1st Qu.:4.000   1st Qu.:105.0   1st Qu.: 75.0   1st Qu.:2225  
 Median :22.75   Median :4.000   Median :151.0   Median : 93.5   Median :2804  
 Mean   :23.45   Mean   :5.472   Mean   :194.4   Mean   :104.5   Mean   :2978  
 3rd Qu.:29.00   3rd Qu.:8.000   3rd Qu.:275.8   3rd Qu.:126.0   3rd Qu.:3615  
 Max.   :46.60   Max.   :8.000   Max.   :455.0   Max.   :230.0   Max.   :5140  
                                                                               
  acceleration        year           origin                      name    
 Min.   : 8.00   Min.   :70.00   Min.   :1.000   amc matador       :  5  
 1st Qu.:13.78   1st Qu.:73.00   1st Qu.:1.000   ford pinto        :  5  
 Median :15.50   Median :76.00   Median :1.000   toyota corolla    :  5  
 Mean   :15.54   Mean   :75.98   Mean   :1.577   amc gremlin       :  4  
 3rd Qu.:17.02   3rd Qu.:79.00   3rd Qu.:2.000   amc hornet        :  4  
 Max.   :24.80   Max.   :82.00   Max.   :3.000   chevrolet chevette:  4  
                                                 (Other)           :365  

From the output, the median value of the mpg column is 22.75.
Now we create a binary variable that takes on a 1 for cars with gas mileage above the median and a 0 for cars with gas mileage below the median.

data$class <- as.factor(ifelse(data$mpg > 22.75,1,0))
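
As a quick sanity check (a sketch), splitting at the median should give two classes of roughly equal size:

table(data$class)   # roughly 196 observations in each class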

(b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.

set.seed(123)
tune.out = tune(svm, class~., data=data, kernel = 'linear',
                ranges = list(cost = c(0.1, 1, 10, 100, 1000),
                              gamma = c(0.5, 1, 2, 3, 4)))
summary(tune.out)

Parameter tuning of ‘svm’:

- sampling method: 10-fold cross validation 

- best parameters:

- best performance: 0.01025641 

- Detailed performance results:

The best cost parameter from the 10-fold cross-validation is 1, with a gamma value of 0.5; the minimum error recorded was 0.07392109. The cross-validation errors for each combination of gamma and cost are shown in the output above. We used the tune() function to perform the cross-validation here.
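
As a sketch (using the tune.out object above), the full grid of cross-validation errors and the chosen combination can be extracted directly:

# one row per (cost, gamma) combination, with its CV error estimate
tune.out$performances
# the combination with the lowest CV error
tune.out$best.parameters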

(c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.

set.seed(123)
tune.out.radial=tune(svm, class~., data=data, kernel ="radial",
              ranges=list(cost=c(0.1,1,10,100,1000),
                          gamma=c(0.5,1,2,3,4)))
summary(tune.out.radial)

Parameter tuning of ‘svm’:

- sampling method: 10-fold cross validation 

- best parameters:

- best performance: 0.04576923 

- Detailed performance results:
set.seed(123)
tune.out.polynomial=tune(svm, class~., data=data, kernel ="polynomial",
              ranges=list(cost=c(0.1,1,10,100,1000),
                          gamma=c(0.5,1,2,3,4)))
summary(tune.out.polynomial)

Parameter tuning of ‘svm’:

- sampling method: 10-fold cross validation 

- best parameters:

- best performance: 0.03570513 

- Detailed performance results:
tune.out.polynomial$best.model

Call:
best.tune(method = svm, train.x = class ~ ., data = data, ranges = list(cost = c(0.1, 
    1, 10, 100, 1000), gamma = c(0.5, 1, 2, 3, 4)), kernel = "polynomial")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  polynomial 
       cost:  1 
     degree:  3 
     coef.0:  0 

Number of Support Vectors:  83

SVM with radial kernel:
When I performed the SVM classification with radial kernel the best parameters was cost = 1 and gamma = 0.5. And the lowest error recorded was 0.04837794.
SVM with polynomial kernel:
When I performed the SVM classification with polynomial kernel the best parameters was cost = 0.1 and gamma = 0.5. And the lowest error recorded was 0.0961884.

So for the linear and the radial kernels the cost and the gamma values chosen are 1 and 0.5 with errors 0.07392109 and 0.04837794. While the polynomial kernel error was 0.0961884. which is bigger than the linear and radial kernel classification. The best model among the three is the model with the minimum error which is radial model with an error of 0.04837794
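
As a sketch (assuming the three tune objects above), the minimum CV errors can be compared side by side:

c(linear = tune.out$best.performance,
  radial = tune.out.radial$best.performance,
  polynomial = tune.out.polynomial$best.performance)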

(d) Make some plots to back up your assertions in (b) and (c). Hint: In the lab, we used the plot() function for svm objects only in cases with p = 2. When p > 2, you can use the plot() function to create plots displaying pairs of variables at a time. Essentially, instead of typing plot(svmfit , dat) where svmfit contains your fitted model and dat is a data frame containing your data, you can type plot(svmfit , dat , x1∼x4)

# helper function to plot the fitted SVM against each predictor, one at a time
plotpairs = function(svmfit){
  for(name in names(data)[!(names(data) %in% c("mpg","class","name"))]){
    plot(svmfit, data, as.formula(paste("mpg", name, sep = "~")))
  }
}
# models with their kernels and best parameters
set.seed(123)
svm.linear = svm(class~., data=data, kernel="linear", cost=1, gamma=0.5)
svm.radial = svm(class~., data=data, kernel="radial", cost=1, gamma=0.5)
svm.polynomial = svm(class~., data=data, kernel="polynomial", cost=0.1, degree=3,
                     gamma=0.5)
# plotting to back up my assertions
plotpairs(svm.linear)

plotpairs(svm.radial)


plotpairs(svm.polynomial)

Problem 8

This problem involves the OJ data set which is part of the ISLR package.
(a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

oj <- OJ
str(oj)
'data.frame':   1070 obs. of  18 variables:
 $ Purchase      : Factor w/ 2 levels "CH","MM": 1 1 1 2 1 1 1 1 1 1 ...
 $ WeekofPurchase: num  237 239 245 227 228 230 232 234 235 238 ...
 $ StoreID       : num  1 1 1 1 7 7 7 7 7 7 ...
 $ PriceCH       : num  1.75 1.75 1.86 1.69 1.69 1.69 1.69 1.75 1.75 1.75 ...
 $ PriceMM       : num  1.99 1.99 2.09 1.69 1.69 1.99 1.99 1.99 1.99 1.99 ...
 $ DiscCH        : num  0 0 0.17 0 0 0 0 0 0 0 ...
 $ DiscMM        : num  0 0.3 0 0 0 0 0.4 0.4 0.4 0.4 ...
 $ SpecialCH     : num  0 0 0 0 0 0 1 1 0 0 ...
 $ SpecialMM     : num  0 1 0 0 0 1 1 0 0 0 ...
 $ LoyalCH       : num  0.5 0.6 0.68 0.4 0.957 ...
 $ SalePriceMM   : num  1.99 1.69 2.09 1.69 1.69 1.99 1.59 1.59 1.59 1.59 ...
 $ SalePriceCH   : num  1.75 1.75 1.69 1.69 1.69 1.69 1.69 1.75 1.75 1.75 ...
 $ PriceDiff     : num  0.24 -0.06 0.4 0 0 0.3 -0.1 -0.16 -0.16 -0.16 ...
 $ Store7        : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 2 2 2 2 2 ...
 $ PctDiscMM     : num  0 0.151 0 0 0 ...
 $ PctDiscCH     : num  0 0 0.0914 0 0 ...
 $ ListPriceDiff : num  0.24 0.24 0.23 0 0 0.3 0.3 0.24 0.24 0.24 ...
 $ STORE         : num  1 1 1 1 0 0 0 0 0 0 ...
set.seed(123)
intrain <- createDataPartition(oj$Purchase, p = 0.746, list = FALSE)
train <- oj[intrain,]
test <- oj[-intrain,]

(b) Fit a support vector classifier to the training data using cost=0.01, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics, and describe the results obtained.

set.seed(123)
svmfit=svm(Purchase~., kernel = "linear",data=train,cost=0.01)
summary(svmfit)

Call:
svm(formula = Purchase ~ ., data = train, kernel = "linear", cost = 0.01)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  0.01 

Number of Support Vectors:  431

 ( 216 215 )


Number of Classes:  2 

Levels: 
 CH MM

The support vector classifier used a linear kernel and produced 431 support vectors (216 from one class and 215 from the other). For this fit I used all the explanatory variables, with Purchase as the target variable.

# predicting on the test data
pred=predict(svmfit,newdata =test)
confusionMatrix(pred,as.factor(test$Purchase))
Confusion Matrix and Statistics

          Reference
Prediction  CH  MM
        CH 136  25
        MM  29  80
                                         
               Accuracy : 0.8            
                 95% CI : (0.7472, 0.846)
    No Information Rate : 0.6111         
    P-Value [Acc > NIR] : 2.106e-11      
                                         
                  Kappa : 0.5821         
                                         
 Mcnemar's Test P-Value : 0.6831         
                                         
            Sensitivity : 0.8242         
            Specificity : 0.7619         
         Pos Pred Value : 0.8447         
         Neg Pred Value : 0.7339         
             Prevalence : 0.6111         
         Detection Rate : 0.5037         
   Detection Prevalence : 0.5963         
      Balanced Accuracy : 0.7931         
                                         
       'Positive' Class : CH             
                                         
# Predicting on the train data
pred.train=predict(svmfit,newdata =train)
confusionMatrix(pred.train,as.factor(train$Purchase))
Confusion Matrix and Statistics

          Reference
Prediction  CH  MM
        CH 433  75
        MM  55 237
                                          
               Accuracy : 0.8375          
                 95% CI : (0.8101, 0.8624)
    No Information Rate : 0.61            
    P-Value [Acc > NIR] : < 2e-16         
                                          
                  Kappa : 0.6545          
                                          
 Mcnemar's Test P-Value : 0.09563         
                                          
            Sensitivity : 0.8873          
            Specificity : 0.7596          
         Pos Pred Value : 0.8524          
         Neg Pred Value : 0.8116          
             Prevalence : 0.6100          
         Detection Rate : 0.5413          
   Detection Prevalence : 0.6350          
      Balanced Accuracy : 0.8235          
                                          
       'Positive' Class : CH              
                                          

From the confusion matrices, the test and training error rates are 100 - 80 = 20% and 100 - 83.75 = 16.25% respectively. The SVM performed well, balancing the sensitivity and specificity scores.
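
As a cross-check (a sketch using the pred and pred.train objects above), the same error rates can be computed directly as misclassification proportions:

mean(pred != test$Purchase)         # test error, about 0.20
mean(pred.train != train$Purchase)  # training error, about 0.1625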

(d) Use the tune() function to select an optimal cost. Consider values in the range 0.01 to 10.

set.seed(123)
new_tune=tune(svm, Purchase~., data=train, kernel ="linear",
              ranges=list(cost=c(0.01,0.05,0.1,0.5,1,1.5,10)))
summary(new_tune)

Parameter tuning of ‘svm’:

- sampling method: 10-fold cross validation 

- best parameters:

- best performance: 0.16625 

- Detailed performance results:
new_tune$best.model

Call:
best.tune(method = svm, train.x = Purchase ~ ., data = train, ranges = list(cost = c(0.01, 
    0.05, 0.1, 0.5, 1, 1.5, 10)), kernel = "linear")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  0.05 

Number of Support Vectors:  356

The lowest cross-validation error was 0.16625, obtained with a best cost value of 0.05.

(e) Compute the training and test error rates using this new value for cost.

set.seed(123)
svmfit.new=svm(Purchase~., kernel = "linear",data=train,cost=0.05)
summary(svmfit.new)

Call:
svm(formula = Purchase ~ ., data = train, kernel = "linear", cost = 0.05)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  linear 
       cost:  0.05 

Number of Support Vectors:  356

 ( 179 177 )


Number of Classes:  2 

Levels: 
 CH MM
# predicting on the test data
pred.new=predict(svmfit.new,newdata =test)
confusionMatrix(pred.new,as.factor(test$Purchase))
Confusion Matrix and Statistics

          Reference
Prediction  CH  MM
        CH 136  21
        MM  29  84
                                          
               Accuracy : 0.8148          
                 95% CI : (0.7633, 0.8593)
    No Information Rate : 0.6111          
    P-Value [Acc > NIR] : 4.049e-13       
                                          
                  Kappa : 0.6157          
                                          
 Mcnemar's Test P-Value : 0.3222          
                                          
            Sensitivity : 0.8242          
            Specificity : 0.8000          
         Pos Pred Value : 0.8662          
         Neg Pred Value : 0.7434          
             Prevalence : 0.6111          
         Detection Rate : 0.5037          
   Detection Prevalence : 0.5815          
      Balanced Accuracy : 0.8121          
                                          
       'Positive' Class : CH              
                                          
# Predicting on the train data
pred.new2=predict(svmfit.new,newdata =train)
confusionMatrix(pred.new2,as.factor(train$Purchase))
Confusion Matrix and Statistics

          Reference
Prediction  CH  MM
        CH 433  75
        MM  55 237
                                          
               Accuracy : 0.8375          
                 95% CI : (0.8101, 0.8624)
    No Information Rate : 0.61            
    P-Value [Acc > NIR] : < 2e-16         
                                          
                  Kappa : 0.6545          
                                          
 Mcnemar's Test P-Value : 0.09563         
                                          
            Sensitivity : 0.8873          
            Specificity : 0.7596          
         Pos Pred Value : 0.8524          
         Neg Pred Value : 0.8116          
             Prevalence : 0.6100          
         Detection Rate : 0.5413          
   Detection Prevalence : 0.6350          
      Balanced Accuracy : 0.8235          
                                          
       'Positive' Class : CH              
                                          

(f) Repeat parts (b) through (e) using a support vector machine with a radial kernel. Use the default value for gamma.

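set.seed(123)
svmfit.radialfit=svm(Purchase~., kernel = "radial",data=train,cost=0.01)
summary(svmfit.radialfit)

# predicting on the test data
pred=predict(svmfit.radialfit,newdata =test)
confusionMatrix(pred,as.factor(test$Purchase))

# Predicting on the train data
pred.train=predict(svmfit.radialfit,newdata =train)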
confusionMatrix(pred.train,as.factor(train$Purchase))
Confusion Matrix and Statistics

          Reference
Prediction  CH  MM
        CH 488 312
        MM   0   0
                                         
               Accuracy : 0.61           
                 95% CI : (0.5752, 0.644)
    No Information Rate : 0.61           
    P-Value [Acc > NIR] : 0.5155         
                                         
                  Kappa : 0              
                                         
 Mcnemar's Test P-Value : <2e-16         
                                         
            Sensitivity : 1.00           
            Specificity : 0.00           
         Pos Pred Value : 0.61           
         Neg Pred Value :  NaN           
             Prevalence : 0.61           
         Detection Rate : 0.61           
   Detection Prevalence : 1.00           
      Balanced Accuracy : 0.50           
                                         
       'Positive' Class : CH             
                                         

Tuning the model

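set.seed(123)
tune.radial=tune(svm, Purchase~., data=train, kernel ="radial",
              ranges=list(cost=c(0.01,0.05,0.1,0.5,1,1.5,10)))
summary(tune.radial)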
tune.radial$best.model

Call:
best.tune(method = svm, train.x = Purchase ~ ., data = train, ranges = list(cost = c(0.01, 
    0.05, 0.1, 0.5, 1, 1.5, 10)), kernel = "radial")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  radial 
       cost:  1 

Number of Support Vectors:  365
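
set.seed(123)
svmfit.radialrefit=svm(Purchase~., kernel = "radial",data=train,cost=1)
summary(svmfit.radialrefit)

# predicting on the test data
pred.radialtest=predict(svmfit.radialrefit,newdata =test)
confusionMatrix(pred.radialtest,as.factor(test$Purchase))

# Predicting on the train data
pred.radialtrain=predict(svmfit.radialrefit,newdata =train)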
confusionMatrix(pred.radialtrain,as.factor(train$Purchase))
Confusion Matrix and Statistics

          Reference
Prediction  CH  MM
        CH 446  76
        MM  42 236
                                         
               Accuracy : 0.8525         
                 95% CI : (0.826, 0.8764)
    No Information Rate : 0.61           
    P-Value [Acc > NIR] : < 2.2e-16      
                                         
                  Kappa : 0.6838         
                                         
 Mcnemar's Test P-Value : 0.002382       
                                         
            Sensitivity : 0.9139         
            Specificity : 0.7564         
         Pos Pred Value : 0.8544         
         Neg Pred Value : 0.8489         
             Prevalence : 0.6100         
         Detection Rate : 0.5575         
   Detection Prevalence : 0.6525         
      Balanced Accuracy : 0.8352         
                                         
       'Positive' Class : CH             
                                         

(g) Repeat parts (b) through (e) using a support vector machine with a polynomial kernel. Set degree=2

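set.seed(123)
svmfit.polyfit=svm(Purchase~., kernel = "polynomial",data=train,cost=0.01, degree = 2)
summary(svmfit.polyfit)

# predicting on the test data
pred=predict(svmfit.polyfit,newdata =test)
confusionMatrix(pred,as.factor(test$Purchase))

# Predicting on the train data
pred.train=predict(svmfit.polyfit,newdata =train)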
confusionMatrix(pred.train,as.factor(train$Purchase))
Confusion Matrix and Statistics

          Reference
Prediction  CH  MM
        CH 488 312
        MM   0   0
                                         
               Accuracy : 0.61           
                 95% CI : (0.5752, 0.644)
    No Information Rate : 0.61           
    P-Value [Acc > NIR] : 0.5155         
                                         
                  Kappa : 0              
                                         
 Mcnemar's Test P-Value : <2e-16         
                                         
            Sensitivity : 1.00           
            Specificity : 0.00           
         Pos Pred Value : 0.61           
         Neg Pred Value :  NaN           
             Prevalence : 0.61           
         Detection Rate : 0.61           
   Detection Prevalence : 1.00           
      Balanced Accuracy : 0.50           
                                         
       'Positive' Class : CH             
                                         

Tuning the model

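set.seed(123)
tune.poly=tune(svm, Purchase~., data=train, kernel ="polynomial",
              ranges=list(cost=c(0.01,0.05,0.1,0.5,1,1.5,10)))
summary(tune.poly)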
tune.poly$best.model

Call:
best.tune(method = svm, train.x = Purchase ~ ., data = train, ranges = list(cost = c(0.01, 
    0.05, 0.1, 0.5, 1, 1.5, 10)), kernel = "polynomial")


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  polynomial 
       cost:  1.5 
     degree:  3 
     coef.0:  0 

Number of Support Vectors:  391
set.seed(123)
svmfit.polyrefit=svm(Purchase~., kernel = "polynomial",data=train,cost=1.5, degree = 3)
summary(svmfit.polyrefit)

Call:
svm(formula = Purchase ~ ., data = train, kernel = "polynomial", cost = 1.5, 
    degree = 3)


Parameters:
   SVM-Type:  C-classification 
 SVM-Kernel:  polynomial 
       cost:  1.5 
     degree:  3 
     coef.0:  0 

Number of Support Vectors:  391

 ( 196 195 )


Number of Classes:  2 

Levels: 
 CH MM
# predicting on the test data
pred.polytest=predict(svmfit.polyrefit,newdata =test)
confusionMatrix(pred.polytest,as.factor(test$Purchase))
Confusion Matrix and Statistics

          Reference
Prediction  CH  MM
        CH 140  30
        MM  25  75
                                          
               Accuracy : 0.7963          
                 95% CI : (0.7433, 0.8427)
    No Information Rate : 0.6111          
    P-Value [Acc > NIR] : 5.34e-11        
                                          
                  Kappa : 0.5677          
                                          
 Mcnemar's Test P-Value : 0.5896          
                                          
            Sensitivity : 0.8485          
            Specificity : 0.7143          
         Pos Pred Value : 0.8235          
         Neg Pred Value : 0.7500          
             Prevalence : 0.6111          
         Detection Rate : 0.5185          
   Detection Prevalence : 0.6296          
      Balanced Accuracy : 0.7814          
                                          
       'Positive' Class : CH              
                                          
# Predicting on the train data
pred.polytrain=predict(svmfit.polyrefit,newdata =train)
confusionMatrix(pred.polytrain,as.factor(train$Purchase))
Confusion Matrix and Statistics

          Reference
Prediction  CH  MM
        CH 458  96
        MM  30 216
                                          
               Accuracy : 0.8425          
                 95% CI : (0.8154, 0.8671)
    No Information Rate : 0.61            
    P-Value [Acc > NIR] : < 2.2e-16       
                                          
                  Kappa : 0.6559          
                                          
 Mcnemar's Test P-Value : 7.011e-09       
                                          
            Sensitivity : 0.9385          
            Specificity : 0.6923          
         Pos Pred Value : 0.8267          
         Neg Pred Value : 0.8780          
             Prevalence : 0.6100          
         Detection Rate : 0.5725          
   Detection Prevalence : 0.6925          
      Balanced Accuracy : 0.8154          
                                          
       'Positive' Class : CH              
                                          

(h) Overall, which approach seems to give the best results on this data?

Below I create a data frame of the test and train errors for the models used. I subtract each accuracy from 100% to get the error, since accuracy + error = 100%. Equivalently, the classification error rate is the total number of false positives and false negatives in the confusion matrix divided by the total number of observations.

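model.kernel <- c("linear","radial", "polynomial")
test_error <-  c(18.52, 18.15, 20.37)
train_error <- c(16.25, 14.75, 15.75)
table_accuracy <- data.frame(model.kernel,train_error,test_error)

table_accuracy
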
We can see that the SVM classifier with a radial kernel has the minimum error among all the kernels; I used cost = 1 after tuning to find that optimal parameter.
