2.1) I downloaded the information, loaded it as “Data”, and organized it:
Data <- read.table("/Users/mariajimenaolanov/Documents/R/credit_card_data-headers.txt", header=TRUE)
library(kernlab)
Plot <- as.matrix(Data[,1:10])
Classifier <- as.factor(Data[,11])
With this I was able to execute the ksvm function with C=100 to find a good classifier:
# Fit a linear (vanilladot) C-svc on the scaled predictors
model <- ksvm(Plot, Classifier, type="C-svc", kernel="vanilladot", C=100, scaled=TRUE)
# Coefficients a1..a10: support vectors weighted by their coefficients, summed per column
a <- colSums(model@xmatrix[[1]] * model@coef[[1]])
a
A1 A2 A3 A8 A9 A10 A11
-0.0011608980 -0.0006366002 -0.0015209679 0.0032020638 1.0041338724 -0.0033773669 0.0002428616
A12 A14 A15
-0.0004747021 -0.0011931900 0.1064450527
a0 <- -model@b
a0
[1] 0.08155226
pred <- predict(model, Data[,1:10])
pred
[1] 1 1 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1
[52] 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[103] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[154] 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[205] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0
[256] 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 1 0 1 0
[307] 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[358] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[409] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[460] 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[511] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1
[562] 1 1 1 0 1 1 1 1 1 1 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[613] 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Levels: 0 1
sum(pred == Data[,11]) / nrow(Data)
[1] 0.8639144
I then ran the ksvm function with different values of C to look for the best classifier, with the following results:
1) C=100: 86.39144%
2) C=1000: 86.23853%
3) C=1: 86.39144%
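The comparison across values of C can be sketched as a small loop instead of re-running the model by hand. This is only a sketch: the helper names training_accuracy and compare_C are hypothetical (they are not part of kernlab), and the accuracy is measured on the same data used for fitting, as in the trials above.

```r
library(kernlab)

# Hypothetical helper: training accuracy of a linear C-svc for one value of C.
training_accuracy <- function(x, y, C) {
  model <- ksvm(x, y, type = "C-svc", kernel = "vanilladot", C = C, scaled = TRUE)
  # Fraction of rows whose predicted class equals the observed class
  mean(predict(model, x) == y)
}

# Hypothetical wrapper: fit one model per C and return a named vector of accuracies.
compare_C <- function(x, y, C_values = c(1, 100, 1000)) {
  acc <- sapply(C_values, function(C) training_accuracy(x, y, C))
  names(acc) <- paste0("C=", C_values)
  acc
}

# Usage with the objects defined earlier (predictor matrix and factor response):
# compare_C(Plot, Classifier)
```

Wrapping the fit in a function keeps every trial on identical settings, so only C changes between runs.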
After these trials I chose C=100, the first option offered in the exercise, which classifies 86.39144% of the data points correctly. The equation of the classifier is as follows:
-0.0010065348*A1 - 0.0011729048*A2 - 0.0016261967*A3 + 0.0030064203*A8 + 1.0049405641*A9 - 0.0028259432*A10 + 0.0002600295*A11 - 0.0005349551*A12 - 0.0012283758*A14 + 0.1063633995*A15 + 0.08158492 = 0
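To make the classifier equation concrete, the sketch below evaluates a linear decision function a1*x1 + ... + a10*x10 + a0 in base R. The helper name linear_decision is hypothetical. It assumes the coefficients come from a model fit with scaled=TRUE, so the inputs are centred and scaled column by column before the coefficients are applied, and it assumes the sign of the result separates the two classes (which sign maps to which label is not confirmed here and should be checked against predict).

```r
# Hypothetical helper: evaluate a linear decision function on scaled inputs.
# X  : numeric matrix or data frame of predictors (one row per data point)
# a  : coefficient vector, one entry per column of X
# a0 : intercept
linear_decision <- function(X, a, a0) {
  # Centre and scale each column; on the training data this should match
  # what scaled=TRUE does (assumption: zero mean, unit standard deviation)
  Xs <- scale(as.matrix(X))
  # One decision value per row; drop() turns the n x 1 matrix into a vector
  drop(Xs %*% a + a0)
}

# Usage with the objects defined earlier:
# scores <- linear_decision(Data[,1:10], a, a0)
# table(sign(scores), pred)   # check which sign corresponds to which class
```

Points with a decision value of exactly 0 lie on the separating hyperplane; the further a value is from 0, the further the point is from the boundary.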