Preparing the data

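# Load the data; stringsAsFactors = TRUE keeps lettr as a factor (the default is FALSE since R 4.0)
# Note: this name masks R's built-in `letters` constant for the rest of the session
letters <- read.csv("letter-recognition.csv", stringsAsFactors = TRUE)
str(letters)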

## 'data.frame':    20000 obs. of  17 variables:
##  $ lettr: Factor w/ 26 levels "A","B","C","D",..: 20 9 4 14 7 19 2 1 10 13 ...
##  $ x.box: int  2 5 4 7 2 4 4 1 2 11 ...
##  $ y.box: int  8 12 11 11 1 11 2 1 2 15 ...
##  $ width: int  3 3 6 6 3 5 5 3 4 13 ...
##  $ high : int  5 7 8 6 1 8 4 2 4 9 ...
##  $ onpix: int  1 2 6 3 1 3 4 1 2 7 ...
##  $ x.bar: int  8 10 10 5 8 8 8 8 10 13 ...
##  $ y.bar: int  13 5 6 9 6 8 7 2 6 2 ...
##  $ x2bar: int  0 5 2 4 6 6 6 2 2 6 ...
##  $ y2bar: int  6 4 6 6 6 9 6 2 6 2 ...
##  $ xybar: int  6 13 10 4 6 5 7 8 12 12 ...
##  $ x2ybr: int  10 3 3 4 5 6 6 2 4 1 ...
##  $ xy2br: int  8 9 7 10 9 6 6 8 8 9 ...
##  $ x.ege: int  0 2 3 6 1 0 2 1 1 8 ...
##  $ xegvy: int  8 8 7 10 7 8 8 6 6 1 ...
##  $ y.ege: int  0 4 3 2 5 9 7 2 1 1 ...
##  $ yegvx: int  8 10 9 8 10 7 10 7 7 8 ...

An SVM classifier requires all features to be numeric. Fortunately, that is the case here: the 16 predictors are integers, and only the class label lettr is a factor.
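
The numeric-features requirement can also be checked programmatically rather than by reading the str() output. The following is a minimal sketch on a hypothetical toy data frame (`toy` and its values are made up for illustration and only mimic the first columns of the real data):

```r
# Hypothetical toy data frame standing in for the letter data
toy <- data.frame(
  lettr = factor(c("T", "I", "D")),
  x.box = c(2L, 5L, 4L),
  y.box = c(8L, 12L, 11L)
)

# Drop the class label and test every remaining column
features <- toy[, names(toy) != "lettr", drop = FALSE]
all_numeric <- all(vapply(features, is.numeric, logical(1)))
all_numeric
```

Running the same check on the full data frame should flag any column that would need recoding before fitting the SVM.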

We split the data into training and test sets:

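largo <- nrow(letters)           # total number of rows
s <- sample(largo, largo*0.8)    # random 80% of the row indices
letters_train <- letters[s, ]    # training set
letters_test <- letters[-s, ]    # remaining 20% as the test set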
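
Because sample() draws at random, each run produces a different split and therefore slightly different results. A minimal sketch of a reproducible split follows; the seed value and the names `train_idx` and `test_idx` are assumptions for illustration and are not part of the original analysis:

```r
set.seed(123)                     # arbitrary seed, chosen only for illustration
n <- 20000                        # number of rows reported by str(letters)

train_idx <- sample(n, n * 0.8)   # 80% of the row indices, without replacement
test_idx  <- setdiff(seq_len(n), train_idx)  # the remaining 20%

length(train_idx)
length(test_idx)
```

The two index sets are disjoint, so no observation used for training is reused for testing.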

We build the model with a linear kernel (vanilladot) and test it on the held-out set:

library(kernlab)
modelo <- ksvm(lettr ~ ., data=letters_train, kernel="vanilladot")

##  Setting default kernel parameters

predicciones <- predict(modelo, letters_test)
table(predicciones, letters_test$lettr)

##             
## predicciones   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
##            A 146   2   0   2   0   0   0   0   0   3   0   1   0   1   0
##            B   0 127   0   5   2   0   4   1   1   0   0   0   0   1   0
##            C   1   0 131   0   0   0   3   5   0   0   2   1   1   0   4
##            D   0   1   0 150   0   1   1  10   1   1   2   1   0   4   3
##            E   0   2   6   0 125   1   0   1   0   0   2   6   0   0   0
##            F   0   0   0   0   1 147   2   1   2   1   0   0   0   0   0
##            G   0   1   3   0   6   1 119   2   0   0   0   6   1   0   1
##            H   0   1   1   2   0   2   0  98   0   1   1   2   2   8  15
##            I   0   0   0   0   0   3   0   0 136   6   0   0   0   0   0
##            J   2   0   0   0   0   0   0   0   3 124   0   0   0   0   0
##            K   2   0   7   0   0   1   3   3   0   0 124   0   0   0   0
##            L   0   0   1   0   0   0   1   1   1   0   1 150   0   0   0
##            M   1   0   1   0   0   0   0   0   0   0   0   0 143   1   0
##            N   0   1   0   0   0   2   0   0   0   1   0   0   1 155   0
##            O   0   0   0   0   0   0   0   6   0   0   0   0   1   0 116
##            P   0   1   0   0   0   2   0   2   0   0   0   0   0   0   1
##            Q   0   0   0   0   3   0   8   1   0   0   1   1   0   0   2
##            R   0   5   0   3   0   0   1  11   0   0   6   0   2   1   2
##            S   0   0   0   0   3   1   2   0   4   1   0   0   0   0   0
##            T   0   0   0   1   1   4   0   0   0   0   0   0   0   0   0
##            U   0   0   1   1   0   0   0   1   0   0   0   0   2   0   0
##            V   0   0   0   0   0   1   5   0   0   0   0   0   0   1   0
##            W   0   0   0   0   0   0   1   0   0   0   1   0   0   0   5
##            X   0   1   0   0   0   0   1   1   1   2   8   0   0   0   1
##            Y   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0
##            Z   0   0   0   0   2   0   0   0   1   5   0   0   0   0   0
##             
## predicciones   P   Q   R   S   T   U   V   W   X   Y   Z
##            A   0   7   2   1   0   2   0   0   0   2   0
##            B   0   0   2   8   0   0   0   2   0   0   0
##            C   0   0   0   1   0   0   0   0   0   0   0
##            D   1   2   1   0   0   0   0   0   4   0   1
##            E   0   1   0   7   0   0   0   0   0   0   1
##            F  11   0   1   5   3   0   0   0   0   2   0
##            G   2   3   2   3   3   1   2   0   1   0   0
##            H   0   1   3   0   0   2   1   0   0   1   0
##            I   0   0   0   3   1   0   0   0   2   0   0
##            J   0   0   0   0   0   0   0   0   1   0   7
##            K   0   1   5   0   0   2   0   0   8   1   0
##            L   0   0   0   3   0   0   0   0   1   0   0
##            M   0   0   0   0   0   4   0   5   0   0   0
##            N   0   0   1   0   0   0   0   1   0   0   0
##            O   0   3   0   1   0   0   0   0   0   0   0
##            P 147   1   0   1   2   0   1   0   0   1   0
##            Q   0 116   0   3   0   0   0   0   0   2   1
##            R   0   2 139   3   1   0   1   0   0   0   0
##            S   0   8   0 101   3   0   0   0   1   1  12
##            T   0   0   0   2 137   2   0   0   1   3   1
##            U   0   0   0   0   0 160   0   0   0   0   0
##            V   0   0   0   0   0   0 149   0   0   9   0
##            W   0   1   0   0   0   1   2 135   0   0   0
##            X   0   0   0   1   2   0   0   0 126   0   0
##            Y   5   0   0   1   2   0   3   0   0 138   0
##            Z   0   1   0  10   2   0   0   0   0   0 117
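
A 26 x 26 confusion matrix is hard to read at a glance. One way to summarize it is the fraction of each true letter that was classified correctly. The sketch below uses a hypothetical three-class example (the vectors `predicted` and `actual` are made up), with the same orientation as the table above: predictions in rows, true labels in columns.

```r
# Hypothetical 3-class example; rows are predictions, columns are true labels
lv <- c("A", "B", "C")
predicted <- factor(c("A", "A", "B", "B", "C", "C", "A", "C"), levels = lv)
actual    <- factor(c("A", "A", "B", "C", "C", "C", "B", "C"), levels = lv)
confusion <- table(predicted, actual)

# Correct predictions sit on the diagonal; dividing by the column totals
# gives, for each true class, the share that was recognized correctly
per_class <- diag(confusion) / colSums(confusion)
per_class
```

Applied to the real table, the same two lines would show which letters the linear kernel struggles with most.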

We can quantify how good our predictions were:

correctos <- predicciones == letters_test$lettr
table(correctos)

## correctos
## FALSE  TRUE 
##   544  3456

prop.table(table(correctos))

## correctos
## FALSE  TRUE 
## 0.136 0.864
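
The proportion of TRUE values is the overall accuracy on the test set. As a quick check of the arithmetic, the sketch below recomputes it from the counts printed above; the short logical vector `correct` is a made-up example showing that mean() gives the same quantity directly.

```r
# Counts taken from the table(correctos) output above
counts <- c("FALSE" = 544, "TRUE" = 3456)
accuracy <- counts[["TRUE"]] / sum(counts)
accuracy

# Equivalent shortcut: the mean of a logical vector is the share of TRUE values
correct <- c(TRUE, TRUE, FALSE, TRUE)  # hypothetical example vector
mean(correct)
```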

Improving the model