###PART 3
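# stringsAsFactors = TRUE keeps `letter` a factor on R >= 4.0, where read.csv no longer converts strings by default
letterdata <- read.csv("C:/Users/Priya/Downloads/letterdata.csv", stringsAsFactors = TRUE)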
str(letterdata)
## 'data.frame': 20000 obs. of 17 variables:
## $ letter: Factor w/ 26 levels "A","B","C","D",..: 20 9 4 14 7 19 2 1 10 13 ...
## $ xbox : int 2 5 4 7 2 4 4 1 2 11 ...
## $ ybox : int 8 12 11 11 1 11 2 1 2 15 ...
## $ width : int 3 3 6 6 3 5 5 3 4 13 ...
## $ height: int 5 7 8 6 1 8 4 2 4 9 ...
## $ onpix : int 1 2 6 3 1 3 4 1 2 7 ...
## $ xbar : int 8 10 10 5 8 8 8 8 10 13 ...
## $ ybar : int 13 5 6 9 6 8 7 2 6 2 ...
## $ x2bar : int 0 5 2 4 6 6 6 2 2 6 ...
## $ y2bar : int 6 4 6 6 6 9 6 2 6 2 ...
## $ xybar : int 6 13 10 4 6 5 7 8 12 12 ...
## $ x2ybar: int 10 3 3 4 5 6 6 2 4 1 ...
## $ xy2bar: int 8 9 7 10 9 6 6 8 8 9 ...
## $ xedge : int 0 2 3 6 1 0 2 1 1 8 ...
## $ xedgey: int 8 8 7 10 7 8 8 6 6 1 ...
## $ yedge : int 0 4 3 2 5 9 7 2 1 1 ...
## $ yedgex: int 8 10 9 8 10 7 10 7 7 8 ...
letters_train <- letterdata[1:18000, ]
letters_test <- letterdata[18001:20000, ]
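###The split above takes the first 18,000 rows for training and the last 2,000 for testing, which assumes the rows are already in random order. If that is not guaranteed, a seeded random split is a safer choice. The sketch below is a minimal illustration only: the toy data frame, the seed value and the 90/10 ratio are assumptions, not part of the original analysis.

```r
# Toy stand-in for letterdata (hypothetical data, for illustration only)
set.seed(123)                                  # arbitrary seed so the split is reproducible
toy <- data.frame(letter = factor(sample(LETTERS, 200, replace = TRUE)),
                  xbox   = sample(0:15, 200, replace = TRUE))

n_train   <- floor(0.9 * nrow(toy))            # 90% of the rows go to training
train_idx <- sample(nrow(toy), n_train)        # row indices drawn without replacement

toy_train <- toy[train_idx, ]                  # sampled rows
toy_test  <- toy[-train_idx, ]                 # all remaining rows

nrow(toy_train)   # 180
nrow(toy_test)    # 20
```

###The same pattern would apply to letterdata by replacing toy with letterdata; the accuracy figures reported below would then differ from the ones shown here.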
library(kernlab)
## Warning: package 'kernlab' was built under R version 3.5.2
letter_classifier <- ksvm(letter ~ ., data = letters_train, kernel = "vanilladot")
## Setting default kernel parameters
summary(letter_classifier)
## Length Class Mode
## 1 ksvm S4
letter_predictions <- predict(letter_classifier, letters_test)
(p <- table(letter_predictions, letters_test$letter))
##
## letter_predictions A B C D E F G H I J K L M N O P Q R
## A 73 0 0 0 0 0 0 0 0 1 0 0 0 0 3 0 4 0
## B 0 61 0 3 2 0 1 1 0 0 1 1 0 0 0 2 0 1
## C 0 0 64 0 2 0 4 2 1 0 1 2 0 0 1 0 0 0
## D 2 1 0 67 0 0 1 3 3 2 1 2 0 3 4 2 1 2
## E 0 0 1 0 64 1 1 0 0 0 2 2 0 0 0 0 2 0
## F 0 0 0 0 0 70 1 1 4 0 0 0 0 0 0 5 1 0
## G 1 1 2 1 3 2 68 1 0 0 0 1 0 0 0 0 4 1
## H 0 0 0 1 0 1 0 46 0 2 3 1 1 1 9 0 0 5
## I 0 0 0 0 0 0 0 0 65 3 0 0 0 0 0 0 0 0
## J 0 1 0 0 0 1 0 0 3 61 0 0 0 0 1 0 0 0
## K 0 1 4 0 0 0 0 5 0 0 56 0 0 2 0 0 0 4
## L 0 0 0 0 1 0 0 1 0 0 0 63 0 0 0 0 0 0
## M 0 0 1 0 0 0 1 0 0 0 0 0 70 2 0 0 0 0
## N 0 0 0 0 0 0 0 0 0 0 0 0 0 77 0 0 0 1
## O 0 0 1 1 0 0 0 1 0 1 0 0 0 0 49 1 2 0
## P 0 0 0 0 0 3 0 0 0 0 0 0 0 0 2 69 0 0
## Q 0 0 0 0 0 0 3 1 0 0 0 2 0 0 2 1 52 0
## R 0 4 0 0 1 0 0 3 0 0 3 0 0 0 1 0 0 64
## S 0 1 0 0 1 1 1 0 1 1 0 0 0 0 0 0 6 0
## T 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0
## U 0 0 2 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## V 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0
## W 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
## X 0 1 0 0 1 0 0 1 0 0 1 4 0 0 0 0 0 1
## Y 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0
## Z 1 0 0 0 2 0 0 0 0 2 0 0 0 0 0 0 0 0
##
## letter_predictions S T U V W X Y Z
## A 0 1 2 0 1 0 0 0
## B 3 0 0 0 0 0 0 0
## C 0 0 0 0 0 0 0 0
## D 0 0 0 0 0 0 1 0
## E 6 0 0 0 0 1 0 0
## F 2 0 0 1 0 0 2 0
## G 3 2 0 0 0 0 0 0
## H 0 3 0 2 0 0 1 0
## I 2 0 0 0 0 2 1 0
## J 1 0 0 0 0 1 0 4
## K 0 1 2 0 0 4 0 0
## L 0 0 0 0 0 0 0 0
## M 0 0 1 0 6 0 0 0
## N 0 0 1 0 2 0 0 0
## O 0 0 1 0 0 0 0 0
## P 0 0 0 0 0 0 1 0
## Q 1 0 0 0 0 0 0 0
## R 0 1 0 1 0 0 0 0
## S 47 1 0 0 0 1 0 6
## T 1 83 1 0 0 0 2 2
## U 0 0 83 0 0 0 0 0
## V 0 0 0 64 1 0 1 0
## W 0 0 0 3 59 0 0 0
## X 0 0 0 0 0 76 1 0
## Y 0 1 0 0 0 1 58 0
## Z 5 1 0 0 0 0 0 70
(accuracy <- sum(diag(p)) / sum(p) * 100)
## [1] 83.95
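###The overall accuracy hides which letters are hard to classify. Because the table has predictions in rows and actual letters in columns, dividing the diagonal by the column sums gives the share of each actual letter that was predicted correctly. The sketch below uses a small made-up 3x3 confusion matrix (p_toy is hypothetical, not the real table) to show the calculation.

```r
# Small hypothetical confusion matrix: rows = predicted, columns = actual
p_toy <- matrix(c(8, 1, 0,
                  2, 7, 1,
                  0, 2, 9),
                nrow = 3, byrow = TRUE,
                dimnames = list(predicted = c("A", "B", "C"),
                                actual    = c("A", "B", "C")))

overall   <- sum(diag(p_toy)) / sum(p_toy) * 100   # same formula as used for p
per_class <- diag(p_toy) / colSums(p_toy) * 100    # percent of each actual class predicted correctly

overall     # 80
per_class   # A = 80, B = 70, C = 90
```

###Applied to the real table, diag(p) / colSums(p) * 100 would give one percentage per letter.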
###With the “vanilladot” (linear) kernel, the model reaches 83.95% accuracy on the test set. Let's change the kernel to “rbfdot” and “polydot” to see how the accuracy changes.
###Q4 - We may be able to do better than this by changing the kernel. Try polynomial and RBF kernels to improve the result.
###ANSWER - Q4:
letter_classifier <- ksvm(letter ~ ., data = letters_train, kernel = "rbfdot")
summary(letter_classifier)
## Length Class Mode
## 1 ksvm S4
letter_predictions <- predict(letter_classifier, letters_test)
(p <- table(letter_predictions, letters_test$letter))
##
## letter_predictions A B C D E F G H I J K L M N O P Q R
## A 75 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0
## B 0 67 0 2 0 1 0 0 0 0 0 1 0 1 0 2 1 1
## C 0 0 72 0 3 0 0 0 0 0 0 1 0 0 0 0 0 0
## D 1 1 0 71 0 0 1 2 2 2 1 0 0 0 0 2 1 1
## E 0 0 0 0 70 2 0 0 0 1 0 2 0 0 0 0 0 0
## F 0 0 0 0 0 76 0 0 3 0 0 0 0 0 0 6 0 0
## G 0 0 1 0 3 0 76 1 0 0 0 0 0 0 0 0 0 0
## H 0 0 0 1 0 0 1 58 0 1 0 1 1 0 0 0 1 1
## I 0 0 0 0 0 0 0 0 69 1 0 0 0 0 0 0 0 0
## J 0 0 0 0 0 0 0 0 2 66 0 0 0 0 0 0 0 0
## K 0 0 0 0 0 0 0 3 0 0 62 0 0 1 0 0 0 2
## L 0 0 0 0 0 0 1 0 0 0 0 69 0 0 0 0 0 0
## M 0 0 0 0 0 0 1 0 0 0 0 0 71 1 0 0 0 0
## N 0 0 0 0 0 1 0 0 0 0 0 0 0 78 0 0 0 0
## O 0 0 1 0 0 0 0 0 0 1 0 0 0 2 67 1 2 0
## P 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 72 0 0
## Q 0 0 0 0 0 0 0 1 0 0 0 0 0 0 3 1 65 0
## R 0 1 0 0 0 0 1 1 0 0 4 0 0 2 1 0 0 74
## S 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0
## T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## U 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## W 0 0 1 0 0 0 0 0 0 0 0 0 1 0 2 0 0 0
## X 0 1 0 0 0 0 0 0 0 0 2 4 0 0 0 0 0 0
## Y 3 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
## Z 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0
##
## letter_predictions S T U V W X Y Z
## A 0 1 0 0 0 0 0 0
## B 1 0 0 1 0 0 0 0
## C 0 0 0 0 0 0 0 0
## D 0 0 1 0 0 0 0 0
## E 0 0 0 0 0 0 0 0
## F 1 0 0 1 0 0 0 0
## G 0 0 0 0 0 0 0 0
## H 0 3 0 1 0 0 0 0
## I 0 0 0 0 0 2 0 0
## J 0 0 0 0 0 0 0 1
## K 0 0 0 0 0 0 0 0
## L 0 0 0 0 0 0 0 0
## M 0 0 0 0 2 0 0 0
## N 0 0 0 0 1 0 0 0
## O 0 0 0 0 0 0 0 0
## P 0 0 0 0 0 0 0 0
## Q 0 0 0 0 0 0 0 0
## R 0 1 0 0 0 0 0 0
## S 68 0 0 0 0 0 0 0
## T 0 88 0 0 0 0 1 0
## U 0 0 89 0 0 0 0 0
## V 0 0 0 68 0 0 1 0
## W 0 0 1 0 66 0 0 0
## X 0 0 0 0 0 84 1 0
## Y 0 1 0 0 0 0 65 0
## Z 1 0 0 0 0 0 0 81
(accuracy <- sum(diag(p)) / sum(p) * 100)
## [1] 93.35
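###To make the “rbfdot” result less of a black box, the sketch below writes out the Gaussian RBF kernel by hand, in the form kernlab documents for rbfdot: k(x, y) = exp(-sigma * ||x - y||^2). The function name rbf_kernel, the two feature vectors and the sigma value are assumptions for illustration; in the call above no sigma was supplied, so ksvm chose one itself.

```r
# Hand-written Gaussian RBF kernel: k(x, y) = exp(-sigma * ||x - y||^2)
rbf_kernel <- function(x, y, sigma) {
  exp(-sigma * sum((x - y)^2))
}

a <- c(2, 8, 3, 5)     # hypothetical feature vectors
b <- c(5, 12, 3, 7)

rbf_kernel(a, a, sigma = 0.05)   # 1: identical points have maximal similarity
rbf_kernel(a, b, sigma = 0.05)   # exp(-0.05 * 29), about 0.235
```

###Similarity decays with squared distance, which lets the SVM draw non-linear boundaries that a linear kernel cannot.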
###Q4 - Answer: The accuracy increases to 93.35% when using “rbfdot” as the kernel. Let's check the same for “polydot”.
letter_classifier <- ksvm(letter ~ ., data = letters_train, kernel = "polydot")
## Setting default kernel parameters
summary(letter_classifier)
## Length Class Mode
## 1 ksvm S4
letter_predictions <- predict(letter_classifier, letters_test)
(p <- table(letter_predictions, letters_test$letter))
##
## letter_predictions A B C D E F G H I J K L M N O P Q R
## A 73 0 0 0 0 0 0 0 0 1 0 0 0 0 3 0 4 0
## B 0 61 0 3 2 0 1 1 0 0 1 1 0 0 0 2 0 1
## C 0 0 64 0 2 0 4 2 1 0 1 2 0 0 1 0 0 0
## D 2 1 0 67 0 0 1 3 3 2 1 2 0 3 4 2 1 2
## E 0 0 1 0 64 1 1 0 0 0 2 2 0 0 0 0 2 0
## F 0 0 0 0 0 70 1 1 4 0 0 0 0 0 0 5 1 0
## G 1 1 2 1 3 2 68 1 0 0 0 1 0 0 0 0 4 1
## H 0 0 0 1 0 1 0 46 0 2 3 1 1 1 9 0 0 5
## I 0 0 0 0 0 0 0 0 65 2 0 0 0 0 0 0 0 0
## J 0 1 0 0 0 1 0 0 3 62 0 0 0 0 1 0 0 0
## K 0 1 4 0 0 0 0 5 0 0 56 0 0 2 0 0 0 4
## L 0 0 0 0 1 0 0 1 0 0 0 63 0 0 0 0 0 0
## M 0 0 1 0 0 0 1 0 0 0 0 0 70 2 0 0 0 0
## N 0 0 0 0 0 0 0 0 0 0 0 0 0 77 0 0 0 1
## O 0 0 1 1 0 0 0 1 0 1 0 0 0 0 49 1 2 0
## P 0 0 0 0 0 3 0 0 0 0 0 0 0 0 2 69 0 0
## Q 0 0 0 0 0 0 3 1 0 0 0 2 0 0 2 1 52 0
## R 0 4 0 0 1 0 0 3 0 0 3 0 0 0 1 0 0 64
## S 0 1 0 0 1 1 1 0 1 1 0 0 0 0 0 0 6 0
## T 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 0
## U 0 0 2 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0
## V 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 0 0 0
## W 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
## X 0 1 0 0 1 0 0 1 0 0 1 4 0 0 0 0 0 1
## Y 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0
## Z 1 0 0 0 2 0 0 0 0 2 0 0 0 0 0 0 0 0
##
## letter_predictions S T U V W X Y Z
## A 0 1 2 0 1 0 0 0
## B 3 0 0 0 0 0 0 0
## C 0 0 0 0 0 0 0 0
## D 0 0 0 0 0 0 1 0
## E 6 0 0 0 0 1 0 0
## F 2 0 0 1 0 0 2 0
## G 3 2 0 0 0 0 0 0
## H 0 3 0 2 0 0 1 0
## I 2 0 0 0 0 2 1 0
## J 1 0 0 0 0 1 0 4
## K 0 1 2 0 0 4 0 0
## L 0 0 0 0 0 0 0 0
## M 0 0 1 0 6 0 0 0
## N 0 0 1 0 2 0 0 0
## O 0 0 1 0 0 0 0 0
## P 0 0 0 0 0 0 1 0
## Q 1 0 0 0 0 0 0 0
## R 0 1 0 1 0 0 0 0
## S 47 1 0 0 0 1 0 6
## T 1 83 1 0 0 0 2 2
## U 0 0 83 0 0 0 0 0
## V 0 0 0 64 1 0 1 0
## W 0 0 0 3 59 0 0 0
## X 0 0 0 0 0 76 1 0
## Y 0 1 0 0 0 1 58 0
## Z 5 1 0 0 0 0 0 70
(accuracy <- sum(diag(p)) / sum(p) * 100)
## [1] 84
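###Q4 - Answer: Using “polydot” as the kernel, we get 84% accuracy, which is almost identical to the 83.95% of the first classifier built with “vanilladot”. This is expected: with kernlab's default parameters (degree = 1), “polydot” is effectively a linear kernel, so only “rbfdot” gives a real improvement here.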
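###The near-identical result can be checked with a hand-written polynomial kernel of the form kernlab documents for polydot: k(x, y) = (scale * <x, y> + offset)^degree, with all three parameters defaulting to 1. The function name poly_kernel and the feature vectors below are assumptions for illustration. At degree 1 the kernel is just the linear inner product plus a constant; a higher degree is needed for a genuinely polynomial model.

```r
# Hand-written polynomial kernel: k(x, y) = (scale * <x, y> + offset)^degree
poly_kernel <- function(x, y, degree = 1, scale = 1, offset = 1) {
  (scale * sum(x * y) + offset)^degree
}

a <- c(2, 8, 3, 5)     # hypothetical feature vectors
b <- c(5, 12, 3, 7)

linear <- sum(a * b)               # 150: what a linear kernel computes
poly_kernel(a, b)                  # 151: degree 1 is the linear value plus the offset
poly_kernel(a, b, degree = 2)      # 22801: 151 squared

# Untested suggestion for the letter data, assuming kernlab is loaded:
# ksvm(letter ~ ., data = letters_train, kernel = "polydot", kpar = list(degree = 2))
```

###Whether a higher degree actually beats “rbfdot” on this data would have to be checked by refitting; no result is claimed here.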