library(tidyverse)
library(e1071) #package with svm()
library(caret)

Assignment Prompt

(a) Perform an analysis of the dataset used in Homework #2 using the SVM algorithm. Compare the results with the results from previous homework.

Preparing the dataset exactly as I did in HW2.

stress <- read.csv("stress.csv", 
                 col.names = c("humidity", "temp", "steps", "stress_lvl"),
                 colClasses = c("numeric", "numeric", "numeric", "factor"))

#split into test/train set
set.seed(3190)
sample_set <- sample(nrow(stress), round(nrow(stress)*0.75), replace = FALSE)
stress_train <- stress[sample_set, ]
stress_test <- stress[-sample_set, ]
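The seeded call to `sample()` draws a simple random 75/25 split. That does not guarantee each stress level appears in the same proportion in both sets, so a quick balance check can be useful. Because stress.csv is not available here, the sketch below uses R's built-in `iris` data as a stand-in (an assumption), with `Species` playing the role of `stress_lvl`.

```r
# iris stands in for stress.csv; Species plays the role of stress_lvl
set.seed(3190)
sample_set <- sample(nrow(iris), round(nrow(iris) * 0.75), replace = FALSE)
iris_train <- iris[sample_set, ]
iris_test  <- iris[-sample_set, ]

# Share of each class in the training and test sets (each row sums to 1)
balance <- rbind(
  train = prop.table(table(iris_train$Species)),
  test  = prop.table(table(iris_test$Species))
)
round(balance, 3)
```

If the proportions drift noticeably, caret's `createDataPartition(y, p = 0.75, list = FALSE)` draws a split stratified on the outcome factor instead.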

Next I build the SVM model. It uses three predictor variables (humidity, temp, and steps), so the decision boundary lives in three dimensions and is hard to show in a single 2-D plot.

svm_model <- svm(stress_lvl ~ ., data = stress_train, type = 'C-classification', kernel = "linear")
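Although three predictors rule out a single 2-D picture of the whole separating surface, e1071's `plot()` method for `svm` objects can draw a 2-D slice: a formula picks the two axes, and the `slice` argument fixes the remaining predictor at a chosen value. The sketch below uses a three-predictor subset of R's built-in `iris` data as a stand-in (an assumption, since stress.csv is not available here) and holds the third predictor at its median.

```r
library(e1071)

# Three-predictor subset of iris, mirroring the three stress predictors
iris_sub <- iris[, c("Petal.Length", "Petal.Width", "Sepal.Width", "Species")]

m_sub <- svm(Species ~ ., data = iris_sub, type = "C-classification", kernel = "linear")

# Plot two predictors; the third (Sepal.Width) is held fixed at its median via `slice`
plot(m_sub, iris_sub, Petal.Width ~ Petal.Length,
     slice = list(Sepal.Width = median(iris_sub$Sepal.Width)))
```

For the stress model the analogous call would presumably be `plot(svm_model, stress_train, humidity ~ temp, slice = list(steps = median(stress_train$steps)))`; that line has not been run against the actual file.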

print(svm_model)
## 
## Call:
## svm(formula = stress_lvl ~ ., data = stress_train, type = "C-classification", 
##     kernel = "linear")
## 
## 
## Parameters:
##    SVM-Type:  C-classification 
##  SVM-Kernel:  linear 
##        cost:  1 
## 
## Number of Support Vectors:  136
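The summary above reports 136 support vectors at the default `cost = 1`. The fitted object also records how those support vectors split across classes and which training rows they are, and e1071's `tune.svm()` can choose `cost` by cross-validation (10-fold by default) instead of leaving it at the default. The sketch below again uses `iris` as a stand-in for the stress data (an assumption), and the candidate cost grid is an arbitrary choice.

```r
library(e1071)

set.seed(3190)
idx <- sample(nrow(iris), round(nrow(iris) * 0.75))
iris_train <- iris[idx, ]

m <- svm(Species ~ ., data = iris_train, type = "C-classification", kernel = "linear")

m$tot.nSV      # total number of support vectors
m$nSV          # support vectors per class
head(m$index)  # indices of the support vectors in the (preprocessed) training data

# Cross-validated grid search over cost for the linear kernel
tuned <- tune.svm(Species ~ ., data = iris_train, kernel = "linear",
                  cost = c(0.01, 0.1, 1, 10, 100))
tuned$best.parameters   # cost with the lowest cross-validated error
tuned$best.performance  # that cross-validated misclassification rate
```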

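Next I store the predictions and compare them against the test set of the stress data. The model classified 499 of the 500 test observations correctly (one level-2 observation was predicted as level 1), for 99.8% accuracy. In the previous homework the Random Forest model reached 100% accuracy, so it is technically the better model. However, the predictors in this data are so clearly related to stress level that both models are more complex than necessary, though building them is still good practice.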

test_pred <- predict(svm_model, newdata = stress_test)

confusionMatrix(table(test_pred, stress_test$stress_lvl))
## Confusion Matrix and Statistics
## 
##          
## test_pred   0   1   2
##         0 128   0   0
##         1   0 195   1
##         2   0   0 176
## 
## Overall Statistics
##                                           
##                Accuracy : 0.998           
##                  95% CI : (0.9889, 0.9999)
##     No Information Rate : 0.39            
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.997           
##                                           
##  Mcnemar's Test P-Value : NA              
## 
## Statistics by Class:
## 
##                      Class: 0 Class: 1 Class: 2
## Sensitivity             1.000   1.0000   0.9944
## Specificity             1.000   0.9967   1.0000
## Pos Pred Value          1.000   0.9949   1.0000
## Neg Pred Value          1.000   1.0000   0.9969
## Prevalence              0.256   0.3900   0.3540
## Detection Rate          0.256   0.3900   0.3520
## Detection Prevalence    0.256   0.3920   0.3520
## Balanced Accuracy       1.000   0.9984   0.9972
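The headline figures in the `confusionMatrix()` output can be checked by hand from the 3 x 3 table of counts. The sketch below copies those counts and recomputes accuracy, the no-information rate, Cohen's kappa, and the per-class sensitivity and positive predictive value using only base R.

```r
# Counts copied from the confusion matrix above (rows = predicted, columns = actual)
cm <- matrix(c(128,   0,   0,
                 0, 195,   1,
                 0,   0, 176),
             nrow = 3, byrow = TRUE,
             dimnames = list(predicted = c("0", "1", "2"), actual = c("0", "1", "2")))
n <- sum(cm)

accuracy    <- sum(diag(cm)) / n        # correct predictions / all predictions
nir         <- max(colSums(cm)) / n     # accuracy of always guessing the largest actual class
sensitivity <- diag(cm) / colSums(cm)   # recall for each actual class
ppv         <- diag(cm) / rowSums(cm)   # precision for each predicted class

# Cohen's kappa: observed agreement corrected for the agreement expected by chance
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_stat <- (accuracy - p_expected) / (1 - p_expected)

round(c(accuracy = accuracy, nir = nir, kappa = kappa_stat), 4)
round(sensitivity, 4)
round(ppv, 4)
```

These reproduce the printed values: accuracy 0.998 (499/500), no-information rate 0.39 (195/500), kappa 0.997, class 2 sensitivity 0.9944 (176/177), and class 1 positive predictive value 0.9949 (195/196).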

(b) Search for academic content (at least 3 articles) that compare the use of decision trees vs SVMs in your current area of expertise.

The first article compares SVM and decision tree classifiers for (geographic) image classification. In this case the SVM with a radial basis function (RBF) kernel had the highest accuracy.
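The RBF kernel, K(u, v) = exp(-gamma * |u - v|^2), is available in e1071's `svm()` through `kernel = "radial"`, which is also its default; `gamma` defaults to 1 / (data dimension). Switching the part (a) model to it is therefore a one-argument change. The sketch below compares the two kernels on R's built-in `iris` data as a stand-in for the stress data (an assumption); the accuracies depend on the split and are not meant to echo the article's results.

```r
library(e1071)

set.seed(3190)
idx <- sample(nrow(iris), round(nrow(iris) * 0.75))
iris_train <- iris[idx, ]
iris_test  <- iris[-idx, ]

# Linear kernel, as used in part (a)
lin <- svm(Species ~ ., data = iris_train, kernel = "linear")
# Radial basis function kernel with e1071's default gamma
rbf <- svm(Species ~ ., data = iris_train, kernel = "radial")

# Test-set accuracy of a fitted model
acc <- function(model) mean(predict(model, iris_test) == iris_test$Species)
res <- c(linear = acc(lin), radial = acc(rbf))
print(res)
```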
https://scialert.net/fulltext/?doi=itj.2009.64.70

A somewhat dated (2003) paper compares SVMs and decision trees for classifying gene expression data. When the researchers used less than 50% of the data as the training set, the accuracies were very close, with bagged and boosted trees sometimes outperforming SVMs. Interestingly, as the training set grew, the accuracy of the SVM models kept improving while that of the decision trees did not, and the trees actually began to overfit. https://www.aaai.org/Papers/FLAIRS/2003/Flairs03-019.pdf
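The training-set-size effect described in that paper can be explored with a simple learning curve: fit both model types on increasing fractions of the data and score each on the held-out rows. This is an illustration only, not a reproduction of the paper's experiment. It assumes R's built-in `iris` data, `rpart` (a recommended package shipped with R) for the decision tree, and a single random split per fraction, so the numbers will be noisy, and bagging and boosting are not included.

```r
library(e1071)
library(rpart)

set.seed(3190)
fractions <- c(0.2, 0.4, 0.6, 0.8)
n <- nrow(iris)
results <- data.frame(train_frac = fractions, svm_acc = NA_real_, tree_acc = NA_real_)

for (i in seq_along(fractions)) {
  # One random train/test split at this training fraction
  idx   <- sample(n, round(n * fractions[i]))
  train <- iris[idx, ]
  test  <- iris[-idx, ]

  svm_fit  <- svm(Species ~ ., data = train, kernel = "linear")
  tree_fit <- rpart(Species ~ ., data = train, method = "class")

  # Accuracy on the rows not used for training
  results$svm_acc[i]  <- mean(predict(svm_fit, test) == test$Species)
  results$tree_acc[i] <- mean(predict(tree_fit, test, type = "class") == test$Species)
}
print(results)
```

Averaging over several seeds per fraction would give a smoother curve; iris is also easy enough that both models may sit near their ceiling throughout.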

A paper on predicting student performance from 7 variables (GPA, major, type_of_school, etc.) compared KNN, SVM, and Decision Tree models. After tuning each model appropriately, the authors found that the SVM reached 95% accuracy compared with the Decision Tree's 93%. Looking deeper, the Decision Tree actually had the best accuracy for one specific class, non-active students. These results were all within 1% of the SVM model, though, so the authors chose to proceed with the SVM for its better overall accuracy.
https://pdfs.semanticscholar.org/c50b/3969d9a84ec1cc756bb10f057087a6e7060e.pdf
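The gap between overall accuracy and per-class accuracy noted in that paper is easy to check for any set of fitted models: compute recall (caret's "Sensitivity") for each class alongside the overall accuracy. The sketch below does this for SVM, decision tree, and KNN models on R's built-in `iris` data as a stand-in (an assumption); `rpart` and `class` are recommended packages shipped with R, and `k = 5` for KNN is an arbitrary choice, not the paper's tuning.

```r
library(e1071)
library(rpart)
library(class)

set.seed(3190)
idx   <- sample(nrow(iris), round(nrow(iris) * 0.75))
train <- iris[idx, ]
test  <- iris[-idx, ]

# Test-set predictions from each model type
pred <- list(
  svm  = predict(svm(Species ~ ., data = train, kernel = "linear"), test),
  tree = predict(rpart(Species ~ ., data = train, method = "class"), test, type = "class"),
  knn  = knn(train[, 1:4], test[, 1:4], cl = train$Species, k = 5)  # k = 5 is arbitrary
)

# Per-class recall: correct predictions of a class / actual members of that class
per_class <- sapply(pred, function(p) {
  cm <- table(p, test$Species)
  diag(cm) / colSums(cm)
})
overall <- sapply(pred, function(p) mean(p == test$Species))

round(per_class, 3)  # rows = classes, columns = models
round(overall, 3)
```

A model can win on `overall` while another model has the highest value in one row of `per_class`, which is the pattern the authors reported for non-active students.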