Question 2.1

Describe a situation or problem from your job, everyday life, current events, etc., for which a classification model would be appropriate. List some (up to 5) predictors that you might use.

In my current job, the Information Technology department maintains a list of requested projects in a wait state. Currently it is a table in an Excel spreadsheet with columns for Project Name, Description, Status, Start and Due Date, Sponsor, etc. In addition, several columns could serve as predictors if we gave them numeric weights: Category (Survival, Growth, Maintenance), Timing (Near term, Short term, Long term), Risk, Potential ROI, and Cost, among others. I think it would be a neat exercise to use a classification model to decide which projects should be started and which should remain in the wait state, which would also give us a starting order. We might factor in precedence or dependency as well. If we did so, we would transform the spreadsheet from a project list into a project portfolio.

Question 2.2

The files credit_card_data.txt (without headers) and credit_card_data-headers.txt (with headers) contain a dataset with 654 data points, 6 continuous and 4 binary predictor variables. It has anonymized credit card applications with a binary response variable (last column) indicating if the application was positive or negative. The dataset is the “Credit Approval Data Set” from the UCI Machine Learning Repository (https://archive.ics.uci.edu/ml/datasets/Credit+Approval) without the categorical variables and without data points that have missing values.

  1. Using the support vector machine function ksvm contained in the R package kernlab, find a good classifier for this data. Show the equation of your classifier, and how well it classifies the data points in the full data set. (Don’t worry about test/validation data yet; we’ll cover that topic soon.) Notes on ksvm:
  • You can use scaled=TRUE to get ksvm to scale the data as part of calculating a classifier.
  • The term λ we used in the SVM lesson to trade off the two components of correctness and margin is called C in ksvm. One of the challenges of this homework is to find a value of C that works well; for many values of C, almost all predictions will be “yes” or almost all predictions will be “no”.
  • ksvm does not directly return the coefficients a0 and a1…am. Instead, you need to do the last step of the calculation yourself.

filename = "E:/mzambrano/Documents/OneDrive/Personaldocs/Backtoschool/GeorgiaTech/MicroMasters/GTx_ISYE6501x_IntroductionToAnalyticsModeling/02 - HW1/Data2.2/credit_card_data-headers.txt"
# Get the data from file
ccdata <- as.matrix(read.delim(filename, header=TRUE))

# load library kernlab
library(kernlab)
## Warning: package 'kernlab' was built under R version 3.4.4
skernel = "vanilladot"
cvalue <- 100
# this only works if ccdata is a matrix
# we use the first 10 columns as predictors and the 11th column as the response
# as per the question we don't worry about splitting into test/validation; use all the data
model <- ksvm(ccdata[,1:10],ccdata[,11],type="C-svc",kernel=skernel,C=cvalue,scaled=TRUE)
##  Setting default kernel parameters
# Calculate a1...am
a <- colSums(model@xmatrix[[1]] * model@coef[[1]])

# Calculate a0
a0 <- -model@b

# model predictions (model vs data)
pred <- predict(model,ccdata[,1:10])

# see what fraction of the model's predictions match the actual classification
matching <- sum(pred == ccdata[,11]) / nrow(ccdata)
a
##            A1            A2            A3            A8            A9 
## -0.0010065348 -0.0011729048 -0.0016261967  0.0030064203  1.0049405641 
##           A10           A11           A12           A14           A15 
## -0.0028259432  0.0002600295 -0.0005349551 -0.0012283758  0.1063633995
a0
## [1] 0.08158492
matching
## [1] 0.8639144

The equation is that of a soft-margin linear classifier: a point is classified as 1 when a0 + a1*x1 + ... + am*xm > 0 and as 0 otherwise (the coefficients apply to the scaled predictors). Substituting the values computed above, the separating hyperplane is:

-0.0010065*A1 - 0.0011729*A2 - 0.0016262*A3 + 0.0030064*A8 + 1.0049406*A9 - 0.0028259*A10 + 0.0002600*A11 - 0.0005350*A12 - 0.0012284*A14 + 0.1063634*A15 + 0.0815849 = 0

With C = 100, this classifier matches 86.39% of the data points.
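
As a sanity check, we can reproduce the predictions by hand from a0 and a. This is a minimal sketch, under the assumption that ksvm's scaled=TRUE uses the same centering and scaling as R's scale() defaults:

# recompute predictions manually from the recovered coefficients
scaledX <- scale(ccdata[,1:10])                 # scale the predictors as ksvm does
decision <- scaledX %*% a + a0                  # a0 + sum(ai * xi) for each point
predManual <- as.integer(decision > 0)          # classify by the sign
sum(predManual == pred) / nrow(ccdata)          # 1 means both methods agree everywhere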

What is left is to evaluate values of lambda (C) across several orders of magnitude.

# Get the data from file
filename = "E:/mzambrano/Documents/OneDrive/Personaldocs/Backtoschool/GeorgiaTech/MicroMasters/GTx_ISYE6501x_IntroductionToAnalyticsModeling/02 - HW1/Data2.2/credit_card_data-headers.txt"
ccdata <- as.matrix(read.delim(filename, header=TRUE))

# load library kernlab
library(kernlab)

skernel = "vanilladot"
# vector containing c values (lambdas)
cvalues <- c(.00001,.0001,.001,.01,.1, 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000,100000000)
# vector containing labels for showing in plot
cxlabels <- c("1e-05","1e-04","1e-03","1e-02","1e-01","1e+00","1e+01","1e+02","1e+03","1e+04","1e+05","1e+06","1e+07","1e+08")
# Data frame to store results
results <- data.frame("kernelname" = character(), "value of C" = numeric(), matching = numeric(), stringsAsFactors = FALSE)


for (cvalue in cvalues)
{  
  # Execute ksvm with the current value of lambda (C); the 14 values span 1e-05 to 1e+08
  model <- ksvm(ccdata[,1:10],ccdata[,11],type="C-svc",kernel=skernel,C=cvalue,scaled=TRUE)
  # model predictions (model vs data)
  pred <- predict(model,ccdata[,1:10])
  
  # see what fraction of the model's predictions match the actual classification
  matching <- sum(pred == ccdata[,11]) / nrow(ccdata)
  
  # add to results dataframe
  results[nrow(results) + 1,] <- c(skernel,cvalue,matching)
}    
##  Setting default kernel parameters  (message printed once per fit; repeated 14 times)
results
##    kernelname value.of.C          matching
## 1  vanilladot      1e-05 0.547400611620795
## 2  vanilladot      1e-04 0.547400611620795
## 3  vanilladot      0.001 0.837920489296636
## 4  vanilladot       0.01 0.863914373088685
## 5  vanilladot        0.1 0.863914373088685
## 6  vanilladot          1 0.863914373088685
## 7  vanilladot         10 0.863914373088685
## 8  vanilladot        100 0.863914373088685
## 9  vanilladot       1000 0.862385321100917
## 10 vanilladot      10000 0.862385321100917
## 11 vanilladot      1e+05 0.863914373088685
## 12 vanilladot      1e+06 0.625382262996942
## 13 vanilladot      1e+07 0.545871559633027
## 14 vanilladot      1e+08 0.663608562691132
# let's visualize the results
plot(results[results[,1]==skernel,3], type = "o", col = "red", xlab = "Values of C (lambda)", ylab = "match %",
     main = "Kernels: matching % vs c values", axes = FALSE)


text(1:14, .57, cxlabels,cex=0.6)
text(1:14, .99, cxlabels,cex=0.6)
axis(side=1, at=c(1:14), cex.axis = 0.6, labels=cxlabels)
axis(side=2, at=seq(0, 1.1, by=0.025), cex.axis=0.6)
legend("center",legend=skernel, col="red", lty=1:2, cex=0.8)
grid()

We can see that the vanilladot kernel achieves a maximum match of 86.39% against the actual classification, that this level holds for C from 1e-02 through 1e+05 (with a slight dip to 86.24% at 1e+03 and 1e+04), and that accuracy drops off rapidly outside that range.
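
As an optional refinement (not required by the question), we could sweep C on a finer grid inside that plateau to confirm the accuracy really is flat there. A minimal sketch, reusing ccdata and the vanilladot kernel from above:

# finer sweep of C across the stable region, in half-decade steps
fineC <- 10^seq(-2, 2, by = 0.5)
fineAcc <- sapply(fineC, function(cv) {
  m <- ksvm(ccdata[,1:10], ccdata[,11], type="C-svc",
            kernel="vanilladot", C=cv, scaled=TRUE)
  sum(predict(m, ccdata[,1:10]) == ccdata[,11]) / nrow(ccdata)
})
round(rbind(C = fineC, accuracy = fineAcc), 4)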

  2. You are welcome, but not required, to try other (nonlinear) kernels as well; we’re not covering them in this course, but they can sometimes be useful and might provide better predictions than vanilladot.

This is the most generalized example: we try four kernels and 14 different values of lambda. The results are then plotted for easy identification.

# Get the data from file
filename = "E:/mzambrano/Documents/OneDrive/Personaldocs/Backtoschool/GeorgiaTech/MicroMasters/GTx_ISYE6501x_IntroductionToAnalyticsModeling/02 - HW1/Data2.2/credit_card_data-headers.txt"
ccdata <- as.matrix(read.delim(filename, header=TRUE))

# load library kernlab
library(kernlab)

# Auxiliary lists and variables
#---------------------------------------------------------------------------
# vector containing the kernel names to try
kernels <- c("rbfdot","splinedot","vanilladot","polydot")
# vector with colors to apply in plot
colors <- c("red","blue","green","black")
# vector containing c values (lambdas)
cvalues <- c(.00001,.0001,.001,.01,.1, 1, 10, 100, 1000, 10000, 100000, 1000000, 10000000,100000000)
# vector containing labels for showing in plot
cxlabels <- c("1e-05","1e-04","1e-03","1e-02","1e-01","1e+00","1e+01","1e+02","1e+03","1e+04","1e+05","1e+06","1e+07","1e+08")
# Data frame to store results
results <- data.frame("kernelname" = character(), "value of C" = numeric(), matching = numeric(), stringsAsFactors = FALSE)
n <- 0
#---------------------------------------------------------------------------

# main loop, get each kernel in vector
for (skernel in kernels)
{
  # counter
  n <- n + 1
  # loop lambda values
  for (cvalue in cvalues)
  {  
    # this only works if ccdata is a matrix
    model <- ksvm(ccdata[,1:10],ccdata[,11],type="C-svc",kernel=skernel,C=cvalue,scaled=TRUE)
    # if ccdata is a data frame, use as.matrix and as.factor instead:
    # model <- ksvm(as.matrix(ccdata[,1:10]),as.factor(ccdata[,11]),type="C-svc",kernel="vanilladot",C=1000,scaled=TRUE)
    
    # Calculate a1...am (kept for reference; not used in the comparison below)
    a <- colSums(model@xmatrix[[1]] * model@coef[[1]])
    
    # Calculate a0
    a0 <- -model@b
    
    # model predictions (model vs data)
    pred <- predict(model,ccdata[,1:10])
    
    # see what fraction of the model's predictions match the actual classification
    matching <- sum(pred == ccdata[,11]) / nrow(ccdata)
    
    # add to results
    results[nrow(results) + 1,] <- c(skernel,cvalue,matching)
  }
  
  if (n == 1)
  {
    plot(results[results[,1]==skernel,3], type = "o", col = colors[n], xlab = "Values of C", ylab = "match %",
       main = "Kernels: matching % vs c values", axes = FALSE)
    # text(1:14, .57, cxlabels,cex=0.6)
    text(1:14, .99, cxlabels,cex=0.6)
    axis(side=1, at=c(1:14), cex.axis = 0.6, labels=cxlabels)
    axis(side=2, at=seq(0, 1.1, by=0.025), cex.axis=0.6)
  }
  else
  {
    lines(results[results[,1]==skernel,3], type = "o", col = colors[n])
  }
}
##  Setting default kernel parameters  (message printed once per fit; repeated 42 times)
legend("center",legend=kernels, col=colors, lty=1:2, cex=0.8)
grid()

results
##    kernelname value.of.C          matching
## 1      rbfdot      1e-05 0.547400611620795
## 2      rbfdot      1e-04 0.547400611620795
## 3      rbfdot      0.001 0.547400611620795
## 4      rbfdot       0.01 0.567278287461774
## 5      rbfdot        0.1 0.859327217125382
## 6      rbfdot          1 0.871559633027523
## 7      rbfdot         10  0.90519877675841
## 8      rbfdot        100 0.954128440366973
## 9      rbfdot       1000 0.984709480122324
## 10     rbfdot      10000 0.995412844036697
## 11     rbfdot      1e+05 0.996941896024465
## 12     rbfdot      1e+06 0.998470948012232
## 13     rbfdot      1e+07                 1
## 14     rbfdot      1e+08                 1
## 15  splinedot      1e-05 0.577981651376147
## 16  splinedot      1e-04 0.623853211009174
## 17  splinedot      0.001 0.782874617737003
## 18  splinedot       0.01  0.81039755351682
## 19  splinedot        0.1 0.944954128440367
## 20  splinedot          1 0.966360856269113
## 21  splinedot         10 0.978593272171254
## 22  splinedot        100 0.978593272171254
## 23  splinedot       1000 0.978593272171254
## 24  splinedot      10000 0.978593272171254
## 25  splinedot      1e+05 0.978593272171254
## 26  splinedot      1e+06 0.943425076452599
## 27  splinedot      1e+07 0.877675840978593
## 28  splinedot      1e+08 0.865443425076453
## 29 vanilladot      1e-05 0.547400611620795
## 30 vanilladot      1e-04 0.547400611620795
## 31 vanilladot      0.001 0.837920489296636
## 32 vanilladot       0.01 0.863914373088685
## 33 vanilladot        0.1 0.863914373088685
## 34 vanilladot          1 0.863914373088685
## 35 vanilladot         10 0.863914373088685
## 36 vanilladot        100 0.863914373088685
## 37 vanilladot       1000 0.862385321100917
## 38 vanilladot      10000 0.862385321100917
## 39 vanilladot      1e+05 0.863914373088685
## 40 vanilladot      1e+06 0.625382262996942
## 41 vanilladot      1e+07 0.545871559633027
## 42 vanilladot      1e+08 0.663608562691132
## 43    polydot      1e-05 0.547400611620795
## 44    polydot      1e-04 0.547400611620795
## 45    polydot      0.001 0.837920489296636
## 46    polydot       0.01 0.863914373088685
## 47    polydot        0.1 0.863914373088685
## 48    polydot          1 0.863914373088685
## 49    polydot         10 0.863914373088685
## 50    polydot        100 0.863914373088685
## 51    polydot       1000 0.862385321100917
## 52    polydot      10000 0.862385321100917
## 53    polydot      1e+05 0.862385321100917
## 54    polydot      1e+06 0.331804281345566
## 55    polydot      1e+07 0.767584097859327
## 56    polydot      1e+08  0.67737003058104

From these results we can see that rbfdot and splinedot achieve very high matching with the actual classification (close to 100%), better than vanilladot: rbfdot stays above 99% from C = 1e+04 through 1e+08, and splinedot holds above 97% over the interval 1e+01 through 1e+05. Note, however, that we are measuring against the same points the models were trained on, so matching the data this closely most likely reflects overfitting rather than genuinely better prediction.
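
One way to test that suspicion (beyond the scope of this question) is a simple hold-out split. A minimal sketch, assuming a random 70/30 split; the seed is arbitrary and only there for reproducibility:

# hold out 30% of the data and see whether rbfdot's near-perfect match survives
set.seed(42)
idx <- sample(1:nrow(ccdata), round(0.7 * nrow(ccdata)))
ccTrain <- ccdata[idx, ]
ccTest  <- ccdata[-idx, ]
m <- ksvm(ccTrain[,1:10], ccTrain[,11], type="C-svc",
          kernel="rbfdot", C=1e+06, scaled=TRUE)
mean(predict(m, ccTest[,1:10]) == ccTest[,11])  # out-of-sample match fraction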

  3. Using the k-nearest-neighbors classification function kknn contained in the R kknn package, suggest a good value of k, and show how well it classifies the data points in the full data set. Don’t forget to scale the data (scale=TRUE in kknn).

First try:

# Read the data from file
filename = "E:/mzambrano/Documents/OneDrive/Personaldocs/Backtoschool/GeorgiaTech/MicroMasters/GTx_ISYE6501x_IntroductionToAnalyticsModeling/02 - HW1/Data2.2/credit_card_data-headers.txt"
ccdata <- read.delim(filename, header=TRUE)
# Load library
library(kknn)
## Warning: package 'kknn' was built under R version 3.4.4
# we take a sample of about 1/3 for learning and the remaining 2/3 for testing
numSample <- sample(1:654, 210)
learningData <- ccdata[numSample, ]
testingData <- ccdata[-numSample, ]

# Train model with different values of K
model <- train.kknn(R1 ~ ., data = learningData, kmax = 9, scale=TRUE)
model
## 
## Call:
## train.kknn(formula = R1 ~ ., data = learningData, kmax = 9, scale = TRUE)
## 
## Type of response variable: continuous
## minimal mean absolute error: 0.2422777
## Minimal mean squared error: 0.1373601
## Best kernel: optimal
## Best k: 9
prediction <- predict(model, testingData[, -11])
prediction
##   [1] 0.317254449 1.000000000 0.797797042 0.500065424 1.000000000
## (... 439 more continuous fitted values between 0 and 1; full printout omitted)
CM <- table(testingData[, 11], prediction)
CM
##    prediction
## (table of actual class 0/1 vs. each distinct continuous prediction value;
##  dozens of columns, full printout omitted)
# calculate accuracy
accuracy <- (sum(diag(CM)))/sum(CM)
accuracy
## [1] 0.1283784
# Plot model

plot(model)
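
The 12.8% "accuracy" above is an artifact: train.kknn treated R1 as a continuous response (note "Type of response variable: continuous" in the model summary), so the predictions are fractions, almost none of them exactly equal 0 or 1, and diag(CM) counts almost nothing. A quick patch, as a sketch, is to threshold the continuous predictions at 0.5 before tabulating:

# threshold the continuous kknn predictions at 0.5 and rebuild the table
predRounded <- as.integer(prediction > 0.5)
CM2 <- table(testingData[, 11], predRounded)
sum(diag(CM2)) / sum(CM2)   # accuracy on the hold-out set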

Second try: treating the response as a factor (as.factor(R1) in the formula) and varying the value of kmax.

# Read the data from file
filename = "E:/mzambrano/Documents/OneDrive/Personaldocs/Backtoschool/GeorgiaTech/MicroMasters/GTx_ISYE6501x_IntroductionToAnalyticsModeling/02 - HW1/Data2.2/credit_card_data-headers.txt"
ccdata <- read.delim(filename, header=TRUE)
# Load library
library(kknn)

# some insight gotten from
# https://stackoverflow.com/questions/57649227/how-to-predict-in-kknn-function-librarykknn

# we take a sample of about 1/3 for learning and the remaining 2/3 for testing
numSample <- sample(1:654, 210)
learningData <- ccdata[numSample, ]
testingData <- ccdata[-numSample, ]

# Auxiliary lists and variables
#---------------------------------------------------------------------------
# vector containing the numbers of neighbors (kmax values) to try (4 to 30)
numnneighbors <- seq(4,30)
# Data frame to store results (columns ordered as they are filled below)
results <- data.frame("K passed" = integer(), "best kernel" = character(), "best K" = integer(), accuracy = numeric(), stringsAsFactors = FALSE)
#---------------------------------------------------------------------------
n <- 0

# main loop: train the model with each value of kmax
for (kpar in numnneighbors)
{
  n <- n + 1
  # Train model with different values of K
  model <- train.kknn(as.factor(R1) ~ ., data = learningData, kmax = kpar, scale=TRUE)
  
  prediction <- predict(model, testingData[, -11])
  CM <- table(testingData[, 11], prediction)
  # calculate accuracy
  accuracy <- (sum(diag(CM)))/sum(CM)

  results[nrow(results) + 1,] <- c(kpar,model[["best.parameters"]][["kernel"]],model[["best.parameters"]][["k"]],accuracy)
  
}

plot(results[,3],results[,4], type='o',ylab = "Accuracy", xlab = "Best K",main = "Accuracy vs Best K",col = "red")

results
##    K.passed best.kernel best.K          accuracy
## 1            4  optimal      1 0.813063063063063
## 2            5  optimal      5 0.835585585585586
## 3            6  optimal      5 0.835585585585586
## 4            7  optimal      5 0.835585585585586
## 5            8  optimal      5 0.835585585585586
## 6            9  optimal      5 0.835585585585586
## 7           10  optimal      5 0.835585585585586
## 8           11  optimal      5 0.835585585585586
## 9           12  optimal      5 0.835585585585586
## 10          13  optimal      5 0.835585585585586
## 11          14  optimal      5 0.835585585585586
## 12          15  optimal      5 0.835585585585586
## 13          16  optimal      5 0.835585585585586
## 14          17  optimal      5 0.835585585585586
## 15          18  optimal      5 0.835585585585586
## 16          19  optimal      5 0.835585585585586
## 17          20  optimal      5 0.835585585585586
## 18          21  optimal      5 0.835585585585586
## 19          22  optimal      5 0.835585585585586
## 20          23  optimal      5 0.835585585585586
## 21          24  optimal      5 0.835585585585586
## 22          25  optimal      5 0.835585585585586
## 23          26  optimal      5 0.835585585585586
## 24          27  optimal      5 0.835585585585586
## 25          28  optimal      5 0.835585585585586
## 26          29  optimal      5 0.835585585585586
## 27          30  optimal      5 0.835585585585586

And from the results we can determine the optimal value of K: in the run shown above it is 5, while across various runs it has varied from 9 to 13, with accuracy of 82% and better. BUT the results vary wildly in successive runs, because the train/test split is drawn with sample() and no seed is set, so each run trains and tests on different data. I would like to understand this better, and I am looking forward to the homework review and to seeing what others have done when assessing peers.
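
One way to stabilize the estimate is to repeat the split/train/evaluate cycle and average the accuracies. A minimal sketch under that assumption; the seed and the 20 repetitions are arbitrary choices:

# average hold-out accuracy over repeated random 210/444 splits
set.seed(1)
accs <- replicate(20, {
  s <- sample(1:nrow(ccdata), 210)
  m <- train.kknn(as.factor(R1) ~ ., data = ccdata[s, ], kmax = 30, scale = TRUE)
  p <- predict(m, ccdata[-s, -11])
  mean(as.character(p) == as.character(ccdata[-s, 11]))
})
mean(accs)   # stabilized accuracy estimate
sd(accs)     # run-to-run spread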