I am trying to apply a Support Vector Machine (SVM) model to the Glass data.
library(caret)
library(e1071)
library(rpart)
library(mlbench)
## using the Glass data from the mlbench package
data(Glass, package = "mlbench")
## creating a training and a testing set
inTrain <- createDataPartition(y=Glass$Type, p=0.75, list=FALSE)
training <- Glass[inTrain,]
testing <- Glass[-inTrain,]
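Since createDataPartition samples within each level of Type, the split is stratified. As a quick sanity check (a sketch; the exact proportions depend on the random split), the class proportions in the training set should roughly match the full data:
## compare class proportions in the full data and the training set
round(prop.table(table(Glass$Type)), 2)
round(prop.table(table(training$Type)), 2)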
Here I prepare the training scheme: repeated cross-validation with 10 folds and 3 repeats.
control <- trainControl(method="repeatedcv", number=10, repeats=3)
# train the model using a support vector machine with a radial basis function kernel and the control defined above
model <- train(Type~., data=training, method="svmRadial", trControl=control, tuneLength=5)
# summarize the model
print(model)
## Support Vector Machines with Radial Basis Function Kernel
##
## 162 samples
## 9 predictor
## 6 classes: '1', '2', '3', '5', '6', '7'
##
## No pre-processing
## Resampling: Cross-Validated (10 fold, repeated 3 times)
## Summary of sample sizes: 147, 146, 147, 145, 147, 146, ...
## Resampling results across tuning parameters:
##
## C Accuracy Kappa Accuracy SD Kappa SD
## 0.25 0.6250136 0.4456794 0.08526855 0.1275448
## 0.50 0.6352315 0.4616255 0.07896820 0.1189824
## 1.00 0.6985321 0.5670292 0.09573832 0.1407493
## 2.00 0.7047522 0.5767124 0.09160754 0.1346595
## 4.00 0.7084259 0.5842125 0.07855055 0.1165467
##
## Tuning parameter 'sigma' was held constant at a value of 0.3044607
## Accuracy was used to select the optimal model using the largest value.
## The final values used for the model were sigma = 0.3044607 and C = 4.
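The tuning results can also be inspected directly from the train object, for example (a sketch using standard caret accessors):
# parameter combination selected by resampling
model$bestTune
# accuracy profile across the candidate cost values
plot(model)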
Guided by the cross-validation results above, I train my SVM model with e1071 using cost = 1 and gamma = 1 (note that caret's resampling actually selected C = 4 and sigma = 0.3044607).
svm.model <- svm(Type ~ ., data = training, cost = 1, gamma = 1)
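If you instead want to reuse the parameters caret selected, they can be pulled from the train object and passed to e1071; a minimal sketch (for the radial kernel, e1071's gamma plays the same role as kernlab's sigma; the names best and svm.tuned are placeholders):
## hypothetical refit with the resampling-selected parameters
best <- model$bestTune
svm.tuned <- svm(Type ~ ., data = training, cost = best$C, gamma = best$sigma)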
## using the model to predict the testing data
svm.pred <- predict(svm.model, testing[,-10])
## compute the confusion matrix from the predictions and the true values
confusionMatrix(svm.pred, testing[,10])
## Confusion Matrix and Statistics
##
## Reference
## Prediction 1 2 3 5 6 7
## 1 13 5 2 0 0 0
## 2 4 14 2 3 1 2
## 3 0 0 0 0 0 0
## 5 0 0 0 0 0 0
## 6 0 0 0 0 1 0
## 7 0 0 0 0 0 5
##
## Overall Statistics
##
## Accuracy : 0.6346
## 95% CI : (0.4896, 0.7638)
## No Information Rate : 0.3654
## P-Value [Acc > NIR] : 7.296e-05
##
## Kappa : 0.461
## Mcnemar's Test P-Value : NA
##
## Statistics by Class:
##
## Class: 1 Class: 2 Class: 3 Class: 5 Class: 6 Class: 7
## Sensitivity 0.7647 0.7368 0.00000 0.00000 0.50000 0.71429
## Specificity 0.8000 0.6364 1.00000 1.00000 1.00000 1.00000
## Pos Pred Value 0.6500 0.5385 NaN NaN 1.00000 1.00000
## Neg Pred Value 0.8750 0.8077 0.92308 0.94231 0.98039 0.95745
## Prevalence 0.3269 0.3654 0.07692 0.05769 0.03846 0.13462
## Detection Rate 0.2500 0.2692 0.00000 0.00000 0.01923 0.09615
## Detection Prevalence 0.3846 0.5000 0.00000 0.00000 0.01923 0.09615
## Balanced Accuracy 0.7824 0.6866 0.50000 0.50000 0.75000 0.85714
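For comparison, the model tuned by caret can be applied to the held-out data directly, without refitting through e1071; a short sketch (results will differ from the table above because the tuned parameters differ):
## predict on the test set with the caret-tuned model and summarize
caret.pred <- predict(model, newdata = testing)
confusionMatrix(caret.pred, testing$Type)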