library(e1071)
library(caTools)
library(caret)
library(tidyverse)
library(ggplot2)
1) SVM on PlantGrowth
This dataset will be difficult for the SVM algorithm to classify accurately: there is only one feature with which to compare the three groups. The fitted model uses 28 support vectors out of only 30 observations, which supports the theory that the feature space is too small to create distance between these groups for the SVM algorithm.
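To illustrate the point, here is a quick sketch (using the already-loaded ggplot2) of how much the weight distributions overlap across the three groups:
#boxplot of the single feature by group; the heavy overlap hints at why a
#one-feature SVM struggles to separate the classes
ggplot(PlantGrowth, aes(x = group, y = weight)) + geom_boxplot()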
set.seed(3)
data<-PlantGrowth
#fitting our dataset in the svm()
svmmodel<-svm(group~.,data=data,kernel="linear",scale=FALSE)
#printing the model's summary
summary(svmmodel)
##
## Call:
## svm(formula = group ~ ., data = data, kernel = "linear", scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 1
##
## Number of Support Vectors: 28
##
## ( 10 9 9 )
##
##
## Number of Classes: 3
##
## Levels:
## ctrl trt1 trt2
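As a rough sanity check of the accuracy concern, we can compute the resubstitution accuracy; note this is training accuracy, not performance on a held-out set:
#fraction of training rows the model classifies correctly (optimistic estimate)
mean(predict(svmmodel,data)==data$group)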
#plotting the support vector plane; we cannot plot here because there is only one feature and plot.svm() needs at least two
#plot(svmmodel,data, weight~group)
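Although plot.svm() is unavailable here, the fitted decision rule can still be sketched along the single weight axis by predicting over a grid of values (grid is an illustrative name):
#predict the class over a grid of weights to see where the decision
#boundaries fall along the one feature
grid<-data.frame(weight=seq(min(data$weight),max(data$weight),length.out=200))
grid$pred<-predict(svmmodel,grid)
ggplot(grid,aes(x=weight,y=pred))+geom_point()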
3) SVM on Iris
Unlike the PlantGrowth dataset, the iris dataset has more than one feature on which to base its classification. Hence, plotting the data points against the model is now possible. The model is fit on four features, so to draw a two-dimensional plot we must pick two of them and hold the other two fixed. Following the tips in Raut's example of SVM, we use the slice argument to specify the values of the remaining dimensions for the plot.
In our plot, we see the support vectors along the boundaries of the virginica and setosa species. This suggests that the support vectors of those species are very influential in deciding which observations belong to them. From the single versicolor support vector visible in this slice, it can be inferred that its features are more difficult to categorize.
data<-iris
#fitting our dataset in the svm()
svmmodel<-svm(Species~.,data=data,scale=FALSE)
#printing the model's summary
summary(svmmodel)
##
## Call:
## svm(formula = Species ~ ., data = data, scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 45
##
## ( 7 19 19 )
##
##
## Number of Classes: 3
##
## Levels:
## setosa versicolor virginica
#Plotting the SV plane to see the margin between the species
plot(svmmodel,data, Petal.Width~Petal.Length,
slice=list(Sepal.Width=3, Sepal.Length=4))
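To tie the plot back to the point about which species supply the support vectors, the model's index field maps each support vector to its training row; counting them by species should match the ( 7 19 19 ) in the summary:
#count support vectors per species using the rows flagged in svmmodel$index
table(data$Species[svmmodel$index])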
4) Splitting iris for test/train
When we split our iris dataset 80/20, the model received a randomized pool of observations to build its algorithm on. As mentioned in the previous question, the setosa and virginica species appear to have the most accurate predictions, as they carry the majority of the support vectors. It can be theorized that these species' observations have more distinct features for the SVM algorithm to classify.
It is not a surprise that the inaccuracy in predictions lies with the versicolor species: there are not enough support vectors to define its features compared to the other species. The overlap in data points between versicolor and virginica may influence the algorithm to favor virginica as the final decision.
#Splitting the iris dataset 80/20 to test the svm accuracy
split<-sample.split(data$Species,SplitRatio = 0.8)
train_set<-subset(data,split==TRUE)
test_set<-subset(data,split==FALSE)
#fitting our training data in the svm()
svmmodel<-svm(Species~.,data=train_set,scale=FALSE)
#seeing the summary of the model
summary(svmmodel)
##
## Call:
## svm(formula = Species ~ ., data = train_set, scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: radial
## cost: 1
##
## Number of Support Vectors: 35
##
## ( 5 15 15 )
##
##
## Number of Classes: 3
##
## Levels:
## setosa versicolor virginica
#Making predictions on the test set with our model
svmpred<-predict(svmmodel,test_set)
#creating the "confusion matrix"
cm<-table(test_set$Species,svmpred)
cm
## svmpred
## setosa versicolor virginica
## setosa 10 0 0
## versicolor 0 10 0
## virginica 0 2 8
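The class-wise performance can be read straight off the confusion matrix; a short sketch of per-class recall (correct predictions over each true-class row):
#per-class recall: diagonal over row totals
diag(cm)/rowSums(cm)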
#calculating the accuracy of our model
acc<-sum(diag(cm))/nrow(test_set)
sprintf("The model is %0.2f%% accurate",acc)## [1] "The model is 0.93% accurate"