library(readxl)  # read_excel()
library(readr)   # read_csv()
ArcLakeGroupSummary <- read_excel("~/Desktop/EPSRC Project /ArcLakeGroupSummary.xlsx")
dundeedata <- read_csv("~/Desktop/EPSRC Project /dundeedata.csv.xls")
colnames(dundeedata)[1]<-"GloboLakes_ID" # change the GloboLID column name to GloboLakes_ID to make the merge easier.
Data<-merge(ArcLakeGroupSummary, dundeedata, by = "GloboLakes_ID", all = TRUE )
Data<-subset(Data, !is.na(Group)) # Drop rows with no Group: the data set is back to the original 732 rows, just with extra columns of information
Data$Group<-as.factor(Data$Group)
In order to compare different models and particular parameter values, I decided to split the data into a training (80%) and test (20%) set in a stratified way. The training set will be used in stratified 5-fold cross-validation to compare the performance of the different models. The model that performs best, i.e. has the lowest cross-validation error rate, will then be used to produce a test error rate.
An illustration of the scheme to be used in comparing various models using stratification can be found below.
Some brief justifications for some of the decisions made:
The overall scheme - A good justification is explained here.
The 80-20 split - After having tried out a few different splits (e.g. 60-40, 70-30, etc.), I found that the SVMs with quadratic and radial kernels tended to overfit when it came to performing cross-validation on the training set.
The 5-fold cross-validation - A lower number of folds induces a lower variance for the estimate of the cross-validation error rate. Also, as the training set consists of 80% of all observations, each of the 5 folds accounts for 16% of the entire data set, which is comparable to the test set's 20%.
The stratification throughout - Ideally we want the classifier to have been trained on observations of a given class before it is asked to predict that class; an illustration of an extreme problem that could arise if we didn't stratify is given below. In addition, as we are using a single seed, stratifying gives us a more stable estimate of our CV and test error rates. A quick check that stratification preserves the class proportions is sketched just after this list.
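As a quick sanity check (a sketch only, not part of the model comparison that follows), the class proportions in the full data can be compared with those in a stratified training and test split; with createDataPartition() the three sets of proportions should be nearly identical.
# Compare class proportions in the full data, a stratified training set and
# the corresponding test set. A plain random split could leave a rare class
# badly under-represented in one of the pieces.
library(caret)
set.seed(234)
idx <- createDataPartition(Data$Group, p = 0.8, list = FALSE)
round(cbind(full  = prop.table(table(Data$Group)),
            train = prop.table(table(Data$Group[idx])),
            test  = prop.table(table(Data$Group[-idx]))), 3)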
In order to use each method, I first prepare a suitable data frame - splitting it into training and test sets and then splitting the training set into 5 folds.
Data1<-data.frame(Data[,c("Group","PC1","PC2")])
# Stratify the entire training set into training and test sets
set.seed(234)
library(caret)    # createDataPartition() and createFolds()
library(MASS)     # lda() and qda()
library(e1071)    # svm()
library(ggplot2)  # static plots
library(plotly)   # ggplotly() and plot_ly()
train.index<-createDataPartition(Data1$Group, p=0.8, list = FALSE)
train.set<-Data1[train.index, ]
test.set<-Data1[-train.index, ]
# Stratify the training set into 5 folds
folds <- createFolds(y=factor(train.set$Group), k = 5, list = FALSE)
train.set$fold <- folds
Originally, I only considered QDA. However, including the performance of LDA may make for an interesting comparison.
# Using LDA to produce the CV error rate
CV.error<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
lda.fit<-lda(formula = Group~PC1+PC2, data=train.data)
lda.y <- valid.data$Group
lda.predy<-predict(lda.fit, valid.data)$class
ith.test.error<- mean(lda.y!=lda.predy)
CV.error<-c(CV.error,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
sum(CV.error)
## [1] 0.04745763
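Since this weighted-fold loop recurs for every model below, a hypothetical helper that wraps it is sketched here; the weight nrow(valid.data)/nrow(train.set) simply makes the summed CV error equal to the pooled proportion of misclassified validation observations across the 5 folds. The helper is illustrative only - all reported figures come from the explicit loops.
# Sketch of a reusable weighted 5-fold CV error helper (hypothetical refactor).
# fit_fun(train.data) must return a fitted model and pred_fun(model, newdata)
# must return predicted classes; the data is assumed to carry a 'fold' column.
cv.error.rate <- function(fit_fun, pred_fun, data = train.set, k = 5) {
  errors <- numeric(k)
  for (i in 1:k) {
    valid.data <- subset(data, fold == i)
    train.data <- subset(data, fold != i)
    model <- fit_fun(train.data)
    pred <- pred_fun(model, valid.data)
    # Weight each fold's error by its share of the data, so the sum over folds
    # equals the overall proportion of misclassified validation observations.
    errors[i] <- (nrow(valid.data) / nrow(data)) * mean(valid.data$Group != pred)
  }
  sum(errors)
}
# Example call (should reproduce the LDA figure above):
# cv.error.rate(function(d) lda(Group ~ PC1 + PC2, data = d),
#               function(m, d) predict(m, d)$class)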
# Using QDA to produce the CV error rate
CV.error<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
qda.fit<-qda(formula = Group~PC1+PC2, data=train.data)
qda.y <- valid.data$Group
qda.predy<-predict(qda.fit, valid.data)$class
ith.test.error<- mean(qda.y!=qda.predy)
CV.error<-c(CV.error,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
sum(CV.error)
## [1] 0.04067797
Comparing the two, QDA performed slightly better than LDA, with cross-validation error rates of 4.07% and 4.75%, respectively.
In exploring different cost values for the SVM with a linear kernel, I initially tried a couple of hundred values. However, fitting all of those models was computationally intensive, and smaller cost values tended to perform better, so many of the larger values are omitted here.
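As an aside, e1071 also ships its own tuner, which could replace the manual grid search; a minimal sketch is below. Note that tune() draws its own internal cross-validation folds rather than the stratified folds created above, so its error estimates would differ slightly from the figures reported here (the cost values shown are illustrative).
# Alternative grid search using e1071::tune() (not used for the reported results).
tuned <- tune(svm, Group ~ PC1 + PC2, data = train.set,
              kernel = "linear", scale = FALSE,
              ranges = list(cost = c(0.05, 0.1, 0.27, 0.5, 1, 10, 100)),
              tunecontrol = tune.control(sampling = "cross", cross = 5))
summary(tuned)         # CV error for every cost value tried
tuned$best.parameters  # cost with the lowest internal CV error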
# Searching for the best SVM with linear kernel changing cost
costs<-c(seq(0.05, 0.5, by=0.01 ),1, 10, 50, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000)
CV.errors<-numeric(length(costs))
for(j in 1:length(costs)){
errors<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
svmfit<-svm(Group~PC1+PC2,data = train.data, kernel="linear", cost=costs[j] ,scale=FALSE)
svm.y<-valid.data$Group
svm.predy<-predict(svmfit, valid.data)
ith.test.error<- mean(svm.y!=svm.predy)
errors<-c(errors,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
CV.errors[j]<-sum(errors)
}
min(CV.errors)
## [1] 0.01525424
costs[which.min(CV.errors)]
## [1] 0.27
Of all of the cost values considered, the SVM with linear kernel had the lowest CV error rate when the cost was set to 0.27. The CV error rate for this model was 1.53% - just over a third of the QDA CV error rate.
Below is a plot of different cost values and the CV error rates produced.
interactive<-ggplot(data = data.frame(cbind(costs, CV.errors)), aes(x=costs, y=CV.errors)) + geom_point()
ggplotly(interactive)
Here we retrieve and confirm the best performing SVM with linear kernel.
# Using SVM linear kernel cost = 0.27 - the best linear kernel SVM model considered.
CV.error<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
svmfit<-svm(Group~PC1+PC2,data = train.data, kernel="linear", cost=0.27 ,scale=FALSE)
svm.y<-valid.data$Group
svm.predy<-predict(svmfit, valid.data)
ith.test.error<- mean(svm.y!=svm.predy)
CV.error<-c(CV.error,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
sum(CV.error)
## [1] 0.01525424
Using the SVM with polynomial kernel, low degrees with moderate costs tended to perform well. The coef0 argument was kept at 0, as changing it did not produce better models. The default gamma value (1 / data dimension) was also kept, as varying it too would make the number of models to fit unmanageable - I'll look into this more deeply in the near future.
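If gamma (and coef0) were varied as well, the nested loops below could be replaced by a single loop over an expand.grid of parameter combinations. A minimal sketch of that idea follows; the parameter values are illustrative only, and the grid grows multiplicatively, which is exactly why gamma and coef0 were held at their defaults.
# Hypothetical joint search over cost, degree and gamma (illustrative values).
param.grid <- expand.grid(cost = c(1, 10, 100),
                          degree = 1:3,
                          gamma = c(0.1, 0.5, 1))
param.grid$CV.error <- NA
for (p in 1:nrow(param.grid)) {
  errors <- numeric(5)
  for (i in 1:5) {
    valid.data <- subset(train.set, fold == i)
    train.data <- subset(train.set, fold != i)
    fit <- svm(Group ~ PC1 + PC2, data = train.data, kernel = "polynomial",
               cost = param.grid$cost[p], degree = param.grid$degree[p],
               gamma = param.grid$gamma[p])
    # Weighted fold error, as in the loops above.
    errors[i] <- (nrow(valid.data) / nrow(train.set)) *
      mean(valid.data$Group != predict(fit, valid.data))
  }
  param.grid$CV.error[p] <- sum(errors)
}
param.grid[which.min(param.grid$CV.error), ]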
# Searching for the best SVM with polynomial kernel changing cost and degree
costs<-c(seq(0.1, 0.5, by=0.2 ),1, 10, 50, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 2000, 3000)
degrees<-c(1:8)
matrix.errors<-matrix(NA, nrow = length(costs), ncol = length(degrees))
for(j in 1:length(costs)){
for(l in 1:length(degrees)){
CV.error<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
svmfit<-svm(Group~PC1+PC2, data = train.data, kernel="polynomial", cost=costs[j], degree=degrees[l])
svm.y<-valid.data$Group
svm.predy<-predict(svmfit, valid.data)
ith.test.error<- mean(svm.y!=svm.predy)
CV.error<-c(CV.error,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
matrix.errors[j, l]<-sum(CV.error)
}
}
min(matrix.errors)
## [1] 0.01525424
# turn the matrix.error into a column vector
xgrid<-expand.grid(X1=costs, X2=degrees)
colnames(xgrid)<-c("costs", "degrees")
CV.Errors<-as.vector(matrix.errors)
xgrid<-cbind(xgrid, CV.Errors)
xgrid[which.min(CV.Errors), ]
## costs degrees CV.Errors
## 10 300 1 0.01525424
Of all the different combinations of parameter values considered, the best performance occurred when degree was 1 and cost was 300. The CV error rate was 1.53% - identical to the best performing SVM with linear kernel.
Retrieving and confirming the best performing SVM with polynomial kernel.
# Using SVM polynomial kernel degree = 1, cost = 300 - the best polynomial kernel SVM model considered.
CV.error<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
svmfit<-svm(Group~PC1+PC2, data = train.data, kernel="polynomial",degree = 1, cost=300)
svm.y<-valid.data$Group
svm.predy<-predict(svmfit, valid.data)
ith.test.error<- mean(svm.y!=svm.predy)
CV.error<-c(CV.error,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
sum(CV.error)
## [1] 0.01525424
Using the SVM with radial kernel, low values of gamma with moderate values of cost tended to perform well.
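For reference, the radial kernel used by svm() is K(u, v) = exp(-gamma * ||u - v||^2): a small gamma makes distant points look similar, giving smooth decision boundaries, while a large gamma lets each support vector influence only its immediate neighbourhood. A tiny sketch of the kernel itself (a hypothetical helper, purely for intuition):
# Radial (RBF) kernel as used by e1071::svm with kernel = "radial".
rbf.kernel <- function(u, v, gamma) exp(-gamma * sum((u - v)^2))
rbf.kernel(c(0, 0), c(1, 1), gamma = 0.01)  # ~0.98: points still look similar
rbf.kernel(c(0, 0), c(1, 1), gamma = 10)    # ~2e-9: similarity decays almost immediately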
# Searching for the best SVM with radial kernel changing cost and gamma
costs<-c(seq(0.1, 0.5, by=0.2 ),1, 10, 50, 100, 150, 200, 300, 400, 500, 40000)
gammas<-c(0.01, seq(0, 0.5, by=0.2),seq(1, 100, by=20), seq(100, 1000, by=200))
matrix.errors<-matrix(NA, nrow = length(costs), ncol = length(gammas))
for(j in 1:length(costs)){
for(l in 1:length(gammas)){
CV.error<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
svmfit<-svm(Group~PC1+PC2, data = train.data, kernel="radial", cost=costs[j], gamma=gammas[l])
svm.y<-valid.data$Group
svm.predy<-predict(svmfit, valid.data)
ith.test.error<- mean(svm.y!=svm.predy)
CV.error<-c(CV.error,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
matrix.errors[j, l]<-sum(CV.error)
}
}
min(matrix.errors)
## [1] 0.01525424
# turn the matrix.error into a column vector
xgrid<-expand.grid(X1=costs, X2=gammas)
colnames(xgrid)<-c("costs", "gammas")
CV.Errors<-as.vector(matrix.errors)
xgrid<-cbind(xgrid, CV.Errors)
xgrid[which.min(CV.Errors), ]
## costs gammas CV.Errors
## 13 40000 0.01 0.01525424
Across the grid, the minimum CV error rate (1.53%) occurred when cost was 40000 and gamma was 0.01, matching the best linear and polynomial models. However, a cost value that extreme is likely a sign of overfitting, so the more moderate combination of gamma = 0.2 and cost = 300 is carried forward; its CV error rate was 2.2%.
# Using SVM radial kernel cost = 300, gamma = 0.2 - the radial kernel SVM model carried forward.
CV.error<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
svmfit<-svm(Group~PC1+PC2, data = train.data, kernel="radial", cost=300, gamma=0.2)
svm.y<-valid.data$Group
svm.predy<-predict(svmfit, valid.data)
ith.test.error<- mean(svm.y!=svm.predy)
CV.error<-c(CV.error,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
sum(CV.error)
## [1] 0.0220339
Overall, there was a tie between the SVM with linear kernel (cost = 0.27) and the SVM with polynomial kernel (degree = 1, cost = 300). As the SVM with linear kernel is the simpler of the two, it is preferred. We now train this model on the entire training set and evaluate it on the test set to obtain the test error rate.
svmfit<-svm(Group~PC1+PC2,data = train.set,kernel="linear",cost=0.27 ,scale=FALSE)
svmfit
##
## Call:
## svm(formula = Group ~ PC1 + PC2, data = train.set, kernel = "linear",
## cost = 0.27, scale = FALSE)
##
##
## Parameters:
## SVM-Type: C-classification
## SVM-Kernel: linear
## cost: 0.27
## gamma: 0.5
##
## Number of Support Vectors: 57
A plot of how the model partitioned the space is given below - the observations represented by a cross are the support vectors.
The in-built plot function in the e1071 library plots the SVM classification in an awkward way - PC1 is placed on the y-axis and PC2 on the x-axis. Instead, I create my own classification plot over a fine grid.
xgrid<-expand.grid(X1=seq(min(Data$PC1), max(Data$PC1), length.out = 150), X2=seq(min(Data$PC2), max(Data$PC2), length.out = 150))
colnames(xgrid)<-c("PC1", "PC2")
group.train.set.pred<-predict(svmfit, xgrid)
xgrid<-cbind(xgrid, group.train.set.pred)
ggplot(xgrid, aes(x=PC1,y=PC2))+
geom_point(aes(colour=group.train.set.pred), alpha = 1/5)+
geom_point(data = train.set[-svmfit$index, ], aes(x=PC1, y=PC2, colour=Group))+
geom_point(data = train.set[svmfit$index, ], aes(x=PC1, y=PC2, colour=Group), shape=4)+
labs(colour = "Group", title = "Decision Surface With Training Set Observations")
svm.y<-test.set$Group
svm.predy<-predict(svmfit, test.set)
mean(svm.y!=svm.predy)
## [1] 0.03521127
The test error rate for this model was 3.52%.
The cross-classification table is given below.
table(svm.y, svm.predy)
## svm.predy
## svm.y 1 2 3 4 5 6 7 8 9
## 1 11 0 0 0 0 0 0 0 0
## 2 0 8 0 0 0 0 0 0 0
## 3 0 0 15 0 0 0 0 0 0
## 4 0 0 0 23 1 0 0 0 0
## 5 0 0 0 1 44 0 0 0 3
## 6 0 0 0 0 0 8 0 0 0
## 7 0 0 0 0 0 0 3 0 0
## 8 0 0 0 0 0 0 0 5 0
## 9 0 0 0 0 0 0 0 0 20
4 out of the 5 errors were observations of class 5 being misclassified as either group 4 or 9.
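To see exactly which lakes these are, the misclassified test observations can be pulled out directly (a quick check, not part of the original write-up):
# Misclassified test observations: true class alongside the predicted class.
misclassified <- test.set[svm.y != svm.predy, c("Group", "PC1", "PC2")]
misclassified$Predicted <- svm.predy[svm.y != svm.predy]
misclassified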
ggplot(xgrid, aes(x=PC1,y=PC2))+
geom_point(aes(colour=group.train.set.pred), alpha = 1/5)+
geom_point(data = test.set, aes(x=PC1, y=PC2, colour=Group))+
labs(colour = "Group", title = "Decision Surface With Test Observations")
In order to use each model, I again prepare a suitable data frame - splitting it into training and test sets and then splitting the training set into 5 folds.
Data2<-data.frame(Data[, c("Group", "Latitude", "Longitude", "OverallAvg")])
# Stratify the entire training set into training and test sets
set.seed(234)
library(caret)
train.index<-createDataPartition(Data2$Group, p=0.8, list = FALSE)
train.set<-Data2[train.index, ]
test.set<-Data2[-train.index, ]
# Stratify the training set into 5 folds
folds <- createFolds(y=factor(train.set$Group), k = 5, list = FALSE)
train.set$fold <- folds
Here we implement LDA and QDA; the comparison may be insightful.
# Using LDA to produce the CV error rate
CV.error<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
lda.fit<-lda(formula = Group~ Longitude + Latitude + OverallAvg, data=train.data)
lda.y <- valid.data$Group
lda.predy<-predict(lda.fit, valid.data)$class
ith.test.error<- mean(lda.y!=lda.predy)
CV.error<-c(CV.error,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
sum(CV.error)
## [1] 0.08983051
# Using QDA to produce the CV error rate
CV.error<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
qda.fit<-qda(formula = Group~ Longitude + Latitude + OverallAvg, data=train.data)
qda.y <- valid.data$Group
qda.predy<-predict(qda.fit, valid.data)$class
ith.test.error<- mean(qda.y!=qda.predy)
CV.error<-c(CV.error,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
sum(CV.error)
## [1] 0.05932203
As seen in the output above, QDA performed much better than LDA, with cross-validation error rates of 5.93% and 8.98%, respectively. The sizeable gap between the two rates suggests that more flexible models may perform better here.
Using an SVM with linear kernel, low values of cost tended to perform well.
# Searching for the best SVM with linear kernel changing cost
costs<-c(seq(0.05, 0.5, by=0.01 ),1, 10, 50, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000)
CV.errors<-numeric(length(costs))
for(j in 1:length(costs)){
errors<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
svmfit<-svm(Group~ Longitude + Latitude + OverallAvg, data = train.data, kernel="linear", cost=costs[j] ,scale=FALSE)
svm.y<-valid.data$Group
svm.predy<-predict(svmfit, valid.data)
ith.test.error<- mean(svm.y!=svm.predy)
errors<-c(errors,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
CV.errors[j]<-sum(errors)
}
min(CV.errors)
## [1] 0.0559322
costs[which.min(CV.errors)]
## [1] 10
Of all the cost values considered, the SVM with linear kernel had the lowest CV error rate when the cost was set to 10. The CV error rate for this model was 5.59% - slightly better than the performance of QDA.
Below is a plot of different cost values and the CV error rates produced.
interactive<-ggplot(data = data.frame(cbind(costs, CV.errors)), aes(x=costs, y=CV.errors)) + geom_point()
ggplotly(interactive)
Retrieving and confirming the best performing SVM with linear kernel.
# Using SVM linear kernel cost = 10 - the best linear kernel SVM model considered.
CV.error<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
svmfit<-svm(Group~ Longitude + Latitude + OverallAvg, data = train.data, kernel="linear", cost=10 ,scale=FALSE)
svm.y<-valid.data$Group
svm.predy<-predict(svmfit, valid.data)
ith.test.error<- mean(svm.y!=svm.predy)
CV.error<-c(CV.error,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
sum(CV.error)
## [1] 0.0559322
Using the SVM with polynomial kernel, low degrees with moderate costs tended to perform well. The treatment of coef0 and gamma is the same as described previously.
# Searching for the best SVM with polynomial kernel changing cost and degree
costs<-c(seq(0.1, 0.5, by=0.2 ),1, 10, 50, 100, 150, 200, 300, 400, 500, 600, 700, 800, 900, 1000, seq(1000, 3000, by=500))
degrees<-c(1:8)
matrix.errors<-matrix(NA, nrow = length(costs), ncol = length(degrees))
for(j in 1:length(costs)){
for(l in 1:length(degrees)){
CV.error<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
svmfit<-svm(Group~ Longitude + Latitude + OverallAvg,data = train.data, kernel="polynomial", cost=costs[j], degree=degrees[l])
svm.y<-valid.data$Group
svm.predy<-predict(svmfit, valid.data)
ith.test.error<- mean(svm.y!=svm.predy)
CV.error<-c(CV.error,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
matrix.errors[j, l]<-sum(CV.error)
}
}
min(matrix.errors)
## [1] 0.04576271
# turn the matrix.error into a column vector
xgrid<-expand.grid(X1=costs, X2=degrees)
colnames(xgrid)<-c("costs", "degrees")
CV.Errors<-as.vector(matrix.errors)
xgrid<-cbind(xgrid, CV.Errors)
xgrid[which.min(CV.Errors), ]
## costs degrees CV.Errors
## 12 500 1 0.04576271
Of all the different combinations of parameter values considered, the best performance occurred when degree was 1 and cost was 500. The CV error rate was 4.58%.
Retrieving and confirming the best performing SVM with polynomial kernel.
# Using SVM polynomial kernel degree = 1, cost = 500 - the best polynomial kernel SVM model considered.
CV.error<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
svmfit<-svm(Group~ Longitude + Latitude + OverallAvg, data = train.data, kernel="polynomial",degree = 1, cost=500)
svm.y<-valid.data$Group
svm.predy<-predict(svmfit, valid.data)
ith.test.error<- mean(svm.y!=svm.predy)
CV.error<-c(CV.error,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
sum(CV.error)
## [1] 0.04576271
Using the SVM with radial kernel, low values of gamma with moderate values of cost tended to perform well.
# Searching for the best SVM with radial kernel changing cost and gamma
# Grid adjusted: high costs with low gammas appear to be good choices.
costs<-c(seq(0.1, 0.5, by=0.2 ),1, 10, 50, 100, 150, 200, 300, 400, 500)
gammas<-c(seq(0, 0.5, by=0.2),seq(1, 100, by=20), seq(100, 1000, by=200))
matrix.errors<-matrix(NA, nrow = length(costs), ncol = length(gammas))
for(j in 1:length(costs)){
for(l in 1:length(gammas)){
CV.error<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
svmfit<-svm(Group~ Longitude + Latitude + OverallAvg, data = train.data, kernel="radial", cost=costs[j], gamma=gammas[l])
svm.y<-valid.data$Group
svm.predy<-predict(svmfit, valid.data)
ith.test.error<- mean(svm.y!=svm.predy)
CV.error<-c(CV.error,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
matrix.errors[j, l]<-sum(CV.error)
}
}
min(matrix.errors)
## [1] 0.04915254
# turn the matrix.error into a column vector
xgrid<-expand.grid(X1=costs, X2=gammas)
colnames(xgrid)<-c("costs", "gammas")
CV.Errors<-as.vector(matrix.errors)
xgrid<-cbind(xgrid, CV.Errors)
xgrid[which.min(CV.Errors), ]
## costs gammas CV.Errors
## 22 300 0.2 0.04915254
Of all the different combinations of parameter values considered, the best performance occurred when gamma was 0.2 and cost was 300 - the same combination carried forward when PC1 and PC2 were the explanatory variables. The CV error rate here was 4.92%.
Retrieving and confirming the best performing SVM with radial kernel.
# Using SVM radial kernel cost = 300, gamma = 0.2 - the best radial kernel SVM model considered.
CV.error<-NULL
for (i in 1:5) {
valid.data <- subset(train.set, fold == i)
train.data <- subset(train.set, fold != i)
svmfit<-svm(Group~ Longitude + Latitude + OverallAvg, data = train.data, kernel="radial", cost=300, gamma=0.2)
svm.y<-valid.data$Group
svm.predy<-predict(svmfit, valid.data)
ith.test.error<- mean(svm.y!=svm.predy)
CV.error<-c(CV.error,(nrow(valid.data)/nrow(train.set))*ith.test.error)
}
sum(CV.error)
## [1] 0.04915254
Overall, the best performing model was the SVM with polynomial kernel (degree = 1, cost = 500). We now train this model on the entire training set and test it on the test set to obtain the test error rate.
svmfit<-svm(Group ~ Longitude + Latitude + OverallAvg,data = train.set, kernel="polynomial", degree=1, cost=500)
svm.y<-test.set$Group
svm.predy<-predict(svmfit, test.set)
mean(svm.y!=svm.predy)
## [1] 0.04929577
The test error rate for this model was 4.93%. The cross-classification table is given below.
table(svm.y, svm.predy)
## svm.predy
## svm.y 1 2 3 4 5 6 7 8 9
## 1 11 0 0 0 0 0 0 0 0
## 2 0 8 0 0 0 0 0 0 0
## 3 0 0 15 0 0 0 0 0 0
## 4 0 0 0 24 0 0 0 0 0
## 5 0 0 0 3 44 0 0 0 1
## 6 0 0 1 0 0 7 0 0 0
## 7 0 0 0 0 0 0 2 1 0
## 8 0 0 0 0 0 0 0 5 0
## 9 0 0 0 0 1 0 0 0 19
Interestingly, 4 out of the 7 errors were observations of class 5 being misclassified as either group 4 or 9. High misclassification of class 5 was also evident when PC1 and PC2 were used as explanatory variables.
svmfit<-svm(Group ~ Longitude + Latitude + OverallAvg,data = train.set, kernel="polynomial", degree=1, cost=500)
xgrid<-expand.grid(X1=seq(min(Data$Longitude), max(Data$Longitude), length.out = 70), X2=seq(min(Data$Latitude), max(Data$Latitude), length.out = 70), X3=seq(min(Data$OverallAvg), max(Data$OverallAvg), length.out = 70))
colnames(xgrid)<-c("Longitude", "Latitude", "OverallAvg")
group.train.set.pred<-predict(svmfit, xgrid)
xgrid<-cbind(xgrid, group.train.set.pred)
There is no in-built function to plot the SVM classification in more than two dimensions, so the plot below of how the model partitions the space was created over a fine grid.
interactive <- plot_ly() %>%
add_trace(
x = ~xgrid$Longitude,
y = ~xgrid$Latitude,
z= ~xgrid$OverallAvg,
mode = "markers",
color= ~ xgrid$group.train.set.pred,
opacity=0.05,
text = ~paste("Predicted Group: ", xgrid$group.train.set.pred)) %>%
add_trace(
x = ~test.set$Longitude,
y = ~test.set$Latitude,
z= ~test.set$OverallAvg,
mode = "markers",
color= ~ test.set$Group,
opacity=1,
text = ~paste("Group: ", test.set$Group))%>%
layout(
title = "Predicted 3d space with test observations",
scene = list(
xaxis = list(title = "Longitude"),
yaxis = list(title = "Latitude"),
zaxis = list(title = "OverallAvg")))%>%
layout(annotations=list(yref="paper", xref="paper", y=1.05, x=1.1,text= "Predicted / Actual", showarrow=F))
interactive
## Warning in RColorBrewer::brewer.pal(N, "Set2"): n too large, allowed maximum for palette Set2 is 8
## Returning the palette you asked for with that many colors