The k-nearest neighbors (KNN) method is a supervised classification tool. It uses the feature values of a labeled dataset to predict the class of observations in another set of data. Each observation is treated as a vector (a "point"). To classify a new point, the algorithm sorts the training points from smallest to largest distance to it (Euclidean distance, although others can be used) and assigns the class that is most common among the k nearest ones, where k (the number of neighbors) is a value we specify.
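The steps described above (compute distances, sort, vote among the k closest points) can be sketched in a few lines of base R. This is only an illustrative sketch: the function name `knn_predict_one` and the toy data are hypothetical and are not part of the `class` package used in the rest of this post. Unlike `class::knn`, which breaks ties at random, this sketch simply takes the first class with the most votes.

```r
# Minimal k-nearest neighbors classifier in base R (illustrative sketch only).
# knn_predict_one and the toy data below are hypothetical names.
knn_predict_one <- function(train, labels, new_point, k) {
  # Euclidean distance from new_point to every training row
  diffs <- sweep(train, 2, new_point)
  dists <- sqrt(rowSums(diffs^2))
  # Indices of the k closest training points
  nearest <- order(dists)[seq_len(k)]
  # Majority vote among their labels (first class wins on ties)
  votes <- table(labels[nearest])
  names(votes)[which.max(votes)]
}

# Two well-separated toy groups
train <- matrix(c(1, 1,
                  1, 2,
                  2, 1,
                  8, 8,
                  8, 9,
                  9, 8), ncol = 2, byrow = TRUE)
labels <- factor(c("No", "No", "No", "Yes", "Yes", "Yes"))

pred_low  <- knn_predict_one(train, labels, c(1.5, 1.5), k = 3)
pred_high <- knn_predict_one(train, labels, c(8.5, 8.5), k = 3)
```

With these toy points, the query near (1, 1) is assigned "No" and the query near (8, 8) is assigned "Yes".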
library(ISLR)
library(Amelia)
library(class)
library(ggplot2)
df <- Caravan
str(df)
## 'data.frame': 5822 obs. of 86 variables:
## $ MOSTYPE : num 33 37 37 9 40 23 39 33 33 11 ...
## $ MAANTHUI: num 1 1 1 1 1 1 2 1 1 2 ...
## $ MGEMOMV : num 3 2 2 3 4 2 3 2 2 3 ...
## $ MGEMLEEF: num 2 2 2 3 2 1 2 3 4 3 ...
## $ MOSHOOFD: num 8 8 8 3 10 5 9 8 8 3 ...
## $ MGODRK : num 0 1 0 2 1 0 2 0 0 3 ...
## $ MGODPR : num 5 4 4 3 4 5 2 7 1 5 ...
## $ MGODOV : num 1 1 2 2 1 0 0 0 3 0 ...
## $ MGODGE : num 3 4 4 4 4 5 5 2 6 2 ...
## $ MRELGE : num 7 6 3 5 7 0 7 7 6 7 ...
## $ MRELSA : num 0 2 2 2 1 6 2 2 0 0 ...
## $ MRELOV : num 2 2 4 2 2 3 0 0 3 2 ...
## $ MFALLEEN: num 1 0 4 2 2 3 0 0 3 2 ...
## $ MFGEKIND: num 2 4 4 3 4 5 3 5 3 2 ...
## $ MFWEKIND: num 6 5 2 4 4 2 6 4 3 6 ...
## $ MOPLHOOG: num 1 0 0 3 5 0 0 0 0 0 ...
## $ MOPLMIDD: num 2 5 5 4 4 5 4 3 1 4 ...
## $ MOPLLAAG: num 7 4 4 2 0 4 5 6 8 5 ...
## $ MBERHOOG: num 1 0 0 4 0 2 0 2 1 2 ...
## $ MBERZELF: num 0 0 0 0 5 0 0 0 1 0 ...
## $ MBERBOER: num 1 0 0 0 4 0 0 0 0 0 ...
## $ MBERMIDD: num 2 5 7 3 0 4 4 2 1 3 ...
## $ MBERARBG: num 5 0 0 1 0 2 1 5 8 3 ...
## $ MBERARBO: num 2 4 2 2 0 2 5 2 1 3 ...
## $ MSKA : num 1 0 0 3 9 2 0 2 1 1 ...
## $ MSKB1 : num 1 2 5 2 0 2 1 1 1 2 ...
## $ MSKB2 : num 2 3 0 1 0 2 4 2 0 1 ...
## $ MSKC : num 6 5 4 4 0 4 5 5 8 4 ...
## $ MSKD : num 1 0 0 0 0 2 0 2 1 2 ...
## $ MHHUUR : num 1 2 7 5 4 9 6 0 9 0 ...
## $ MHKOOP : num 8 7 2 4 5 0 3 9 0 9 ...
## $ MAUT1 : num 8 7 7 9 6 5 8 4 5 6 ...
## $ MAUT2 : num 0 1 0 0 2 3 0 4 2 1 ...
## $ MAUT0 : num 1 2 2 0 1 3 1 2 3 2 ...
## $ MZFONDS : num 8 6 9 7 5 9 9 6 7 6 ...
## $ MZPART : num 1 3 0 2 4 0 0 3 2 3 ...
## $ MINKM30 : num 0 2 4 1 0 5 4 2 7 2 ...
## $ MINK3045: num 4 0 5 5 0 2 3 5 2 3 ...
## $ MINK4575: num 5 5 0 3 9 3 3 3 1 3 ...
## $ MINK7512: num 0 2 0 0 0 0 0 0 0 1 ...
## $ MINK123M: num 0 0 0 0 0 0 0 0 0 0 ...
## $ MINKGEM : num 4 5 3 4 6 3 3 3 2 4 ...
## $ MKOOPKLA: num 3 4 4 4 3 3 5 3 3 7 ...
## $ PWAPART : num 0 2 2 0 0 0 0 0 0 2 ...
## $ PWABEDR : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PWALAND : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PPERSAUT: num 6 0 6 6 0 6 6 0 5 0 ...
## $ PBESAUT : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PMOTSCO : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PVRAAUT : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PAANHANG: num 0 0 0 0 0 0 0 0 0 0 ...
## $ PTRACTOR: num 0 0 0 0 0 0 0 0 0 0 ...
## $ PWERKT : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PBROM : num 0 0 0 0 0 0 0 3 0 0 ...
## $ PLEVEN : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PPERSONG: num 0 0 0 0 0 0 0 0 0 0 ...
## $ PGEZONG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PWAOREG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PBRAND : num 5 2 2 2 6 0 0 0 0 3 ...
## $ PZEILPL : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PPLEZIER: num 0 0 0 0 0 0 0 0 0 0 ...
## $ PFIETS : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PINBOED : num 0 0 0 0 0 0 0 0 0 0 ...
## $ PBYSTAND: num 0 0 0 0 0 0 0 0 0 0 ...
## $ AWAPART : num 0 2 1 0 0 0 0 0 0 1 ...
## $ AWABEDR : num 0 0 0 0 0 0 0 0 0 0 ...
## $ AWALAND : num 0 0 0 0 0 0 0 0 0 0 ...
## $ APERSAUT: num 1 0 1 1 0 1 1 0 1 0 ...
## $ ABESAUT : num 0 0 0 0 0 0 0 0 0 0 ...
## $ AMOTSCO : num 0 0 0 0 0 0 0 0 0 0 ...
## $ AVRAAUT : num 0 0 0 0 0 0 0 0 0 0 ...
## $ AAANHANG: num 0 0 0 0 0 0 0 0 0 0 ...
## $ ATRACTOR: num 0 0 0 0 0 0 0 0 0 0 ...
## $ AWERKT : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ABROM : num 0 0 0 0 0 0 0 1 0 0 ...
## $ ALEVEN : num 0 0 0 0 0 0 0 0 0 0 ...
## $ APERSONG: num 0 0 0 0 0 0 0 0 0 0 ...
## $ AGEZONG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ AWAOREG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ABRAND : num 1 1 1 1 1 0 0 0 0 1 ...
## $ AZEILPL : num 0 0 0 0 0 0 0 0 0 0 ...
## $ APLEZIER: num 0 0 0 0 0 0 0 0 0 0 ...
## $ AFIETS : num 0 0 0 0 0 0 0 0 0 0 ...
## $ AINBOED : num 0 0 0 0 0 0 0 0 0 0 ...
## $ ABYSTAND: num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purchase: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
summary(df)
## MOSTYPE MAANTHUI MGEMOMV MGEMLEEF
## Min. : 1.00 Min. : 1.000 Min. :1.000 Min. :1.000
## 1st Qu.:10.00 1st Qu.: 1.000 1st Qu.:2.000 1st Qu.:2.000
## Median :30.00 Median : 1.000 Median :3.000 Median :3.000
## Mean :24.25 Mean : 1.111 Mean :2.679 Mean :2.991
## 3rd Qu.:35.00 3rd Qu.: 1.000 3rd Qu.:3.000 3rd Qu.:3.000
## Max. :41.00 Max. :10.000 Max. :5.000 Max. :6.000
## MOSHOOFD MGODRK MGODPR MGODOV
## Min. : 1.000 Min. :0.0000 Min. :0.000 Min. :0.00
## 1st Qu.: 3.000 1st Qu.:0.0000 1st Qu.:4.000 1st Qu.:0.00
## Median : 7.000 Median :0.0000 Median :5.000 Median :1.00
## Mean : 5.774 Mean :0.6965 Mean :4.627 Mean :1.07
## 3rd Qu.: 8.000 3rd Qu.:1.0000 3rd Qu.:6.000 3rd Qu.:2.00
## Max. :10.000 Max. :9.0000 Max. :9.000 Max. :5.00
## MGODGE MRELGE MRELSA MRELOV
## Min. :0.000 Min. :0.000 Min. :0.0000 Min. :0.00
## 1st Qu.:2.000 1st Qu.:5.000 1st Qu.:0.0000 1st Qu.:1.00
## Median :3.000 Median :6.000 Median :1.0000 Median :2.00
## Mean :3.259 Mean :6.183 Mean :0.8835 Mean :2.29
## 3rd Qu.:4.000 3rd Qu.:7.000 3rd Qu.:1.0000 3rd Qu.:3.00
## Max. :9.000 Max. :9.000 Max. :7.0000 Max. :9.00
## MFALLEEN MFGEKIND MFWEKIND MOPLHOOG
## Min. :0.000 Min. :0.00 Min. :0.0 Min. :0.000
## 1st Qu.:0.000 1st Qu.:2.00 1st Qu.:3.0 1st Qu.:0.000
## Median :2.000 Median :3.00 Median :4.0 Median :1.000
## Mean :1.888 Mean :3.23 Mean :4.3 Mean :1.461
## 3rd Qu.:3.000 3rd Qu.:4.00 3rd Qu.:6.0 3rd Qu.:2.000
## Max. :9.000 Max. :9.00 Max. :9.0 Max. :9.000
## MOPLMIDD MOPLLAAG MBERHOOG MBERZELF
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:2.000 1st Qu.:3.000 1st Qu.:0.000 1st Qu.:0.000
## Median :3.000 Median :5.000 Median :2.000 Median :0.000
## Mean :3.351 Mean :4.572 Mean :1.895 Mean :0.398
## 3rd Qu.:4.000 3rd Qu.:6.000 3rd Qu.:3.000 3rd Qu.:1.000
## Max. :9.000 Max. :9.000 Max. :9.000 Max. :5.000
## MBERBOER MBERMIDD MBERARBG MBERARBO
## Min. :0.0000 Min. :0.000 Min. :0.00 Min. :0.000
## 1st Qu.:0.0000 1st Qu.:2.000 1st Qu.:1.00 1st Qu.:1.000
## Median :0.0000 Median :3.000 Median :2.00 Median :2.000
## Mean :0.5223 Mean :2.899 Mean :2.22 Mean :2.306
## 3rd Qu.:1.0000 3rd Qu.:4.000 3rd Qu.:3.00 3rd Qu.:3.000
## Max. :9.0000 Max. :9.000 Max. :9.00 Max. :9.000
## MSKA MSKB1 MSKB2 MSKC
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:1.000 1st Qu.:1.000 1st Qu.:2.000
## Median :1.000 Median :2.000 Median :2.000 Median :4.000
## Mean :1.621 Mean :1.607 Mean :2.203 Mean :3.759
## 3rd Qu.:2.000 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:5.000
## Max. :9.000 Max. :9.000 Max. :9.000 Max. :9.000
## MSKD MHHUUR MHKOOP MAUT1
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.00
## 1st Qu.:0.000 1st Qu.:2.000 1st Qu.:2.000 1st Qu.:5.00
## Median :1.000 Median :4.000 Median :5.000 Median :6.00
## Mean :1.067 Mean :4.237 Mean :4.772 Mean :6.04
## 3rd Qu.:2.000 3rd Qu.:7.000 3rd Qu.:7.000 3rd Qu.:7.00
## Max. :9.000 Max. :9.000 Max. :9.000 Max. :9.00
## MAUT2 MAUT0 MZFONDS MZPART
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.000
## 1st Qu.:0.000 1st Qu.:1.000 1st Qu.:5.000 1st Qu.:1.000
## Median :1.000 Median :2.000 Median :7.000 Median :2.000
## Mean :1.316 Mean :1.959 Mean :6.277 Mean :2.729
## 3rd Qu.:2.000 3rd Qu.:3.000 3rd Qu.:8.000 3rd Qu.:4.000
## Max. :7.000 Max. :9.000 Max. :9.000 Max. :9.000
## MINKM30 MINK3045 MINK4575 MINK7512
## Min. :0.000 Min. :0.000 Min. :0.000 Min. :0.0000
## 1st Qu.:1.000 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:0.0000
## Median :2.000 Median :4.000 Median :3.000 Median :0.0000
## Mean :2.574 Mean :3.536 Mean :2.731 Mean :0.7961
## 3rd Qu.:4.000 3rd Qu.:5.000 3rd Qu.:4.000 3rd Qu.:1.0000
## Max. :9.000 Max. :9.000 Max. :9.000 Max. :9.0000
## MINK123M MINKGEM MKOOPKLA PWAPART
## Min. :0.0000 Min. :0.000 Min. :1.000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:3.000 1st Qu.:3.000 1st Qu.:0.0000
## Median :0.0000 Median :4.000 Median :4.000 Median :0.0000
## Mean :0.2027 Mean :3.784 Mean :4.236 Mean :0.7712
## 3rd Qu.:0.0000 3rd Qu.:4.000 3rd Qu.:6.000 3rd Qu.:2.0000
## Max. :9.0000 Max. :9.000 Max. :8.000 Max. :3.0000
## PWABEDR PWALAND PPERSAUT PBESAUT
## Min. :0.00000 Min. :0.00000 Min. :0.00 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :5.00 Median :0.00000
## Mean :0.04002 Mean :0.07162 Mean :2.97 Mean :0.04827
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:6.00 3rd Qu.:0.00000
## Max. :6.00000 Max. :4.00000 Max. :8.00 Max. :7.00000
## PMOTSCO PVRAAUT PAANHANG PTRACTOR
## Min. :0.0000 Min. :0.000000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.0000 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.0000 Median :0.000000 Median :0.00000 Median :0.00000
## Mean :0.1754 Mean :0.009447 Mean :0.02096 Mean :0.09258
## 3rd Qu.:0.0000 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :7.0000 Max. :9.000000 Max. :5.00000 Max. :6.00000
## PWERKT PBROM PLEVEN PPERSONG
## Min. :0.00000 Min. :0.000 Min. :0.0000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.00000
## Median :0.00000 Median :0.000 Median :0.0000 Median :0.00000
## Mean :0.01305 Mean :0.215 Mean :0.1948 Mean :0.01374
## 3rd Qu.:0.00000 3rd Qu.:0.000 3rd Qu.:0.0000 3rd Qu.:0.00000
## Max. :6.00000 Max. :6.000 Max. :9.0000 Max. :6.00000
## PGEZONG PWAOREG PBRAND PZEILPL
## Min. :0.00000 Min. :0.00000 Min. :0.000 Min. :0.0000000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.000 1st Qu.:0.0000000
## Median :0.00000 Median :0.00000 Median :2.000 Median :0.0000000
## Mean :0.01529 Mean :0.02353 Mean :1.828 Mean :0.0008588
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:4.000 3rd Qu.:0.0000000
## Max. :3.00000 Max. :7.00000 Max. :8.000 Max. :3.0000000
## PPLEZIER PFIETS PINBOED PBYSTAND
## Min. :0.00000 Min. :0.00000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.00000 Median :0.00000
## Mean :0.01889 Mean :0.02525 Mean :0.01563 Mean :0.04758
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :6.00000 Max. :1.00000 Max. :6.00000 Max. :5.00000
## AWAPART AWABEDR AWALAND APERSAUT
## Min. :0.000 Min. :0.00000 Min. :0.00000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.0000
## Median :0.000 Median :0.00000 Median :0.00000 Median :1.0000
## Mean :0.403 Mean :0.01477 Mean :0.02061 Mean :0.5622
## 3rd Qu.:1.000 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :2.000 Max. :5.00000 Max. :1.00000 Max. :7.0000
## ABESAUT AMOTSCO AVRAAUT AAANHANG
## Min. :0.00000 Min. :0.00000 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.00000
## Median :0.00000 Median :0.00000 Median :0.000000 Median :0.00000
## Mean :0.01048 Mean :0.04105 Mean :0.002233 Mean :0.01254
## 3rd Qu.:0.00000 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:0.00000
## Max. :4.00000 Max. :8.00000 Max. :3.000000 Max. :3.00000
## ATRACTOR AWERKT ABROM ALEVEN
## Min. :0.00000 Min. :0.000000 Min. :0.00000 Min. :0.00000
## 1st Qu.:0.00000 1st Qu.:0.000000 1st Qu.:0.00000 1st Qu.:0.00000
## Median :0.00000 Median :0.000000 Median :0.00000 Median :0.00000
## Mean :0.03367 Mean :0.006183 Mean :0.07042 Mean :0.07661
## 3rd Qu.:0.00000 3rd Qu.:0.000000 3rd Qu.:0.00000 3rd Qu.:0.00000
## Max. :4.00000 Max. :6.000000 Max. :2.00000 Max. :8.00000
## APERSONG AGEZONG AWAOREG ABRAND
## Min. :0.000000 Min. :0.000000 Min. :0.000000 Min. :0.0000
## 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.000000 1st Qu.:0.0000
## Median :0.000000 Median :0.000000 Median :0.000000 Median :1.0000
## Mean :0.005325 Mean :0.006527 Mean :0.004638 Mean :0.5701
## 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:0.000000 3rd Qu.:1.0000
## Max. :1.000000 Max. :1.000000 Max. :2.000000 Max. :7.0000
## AZEILPL APLEZIER AFIETS
## Min. :0.0000000 Min. :0.000000 Min. :0.00000
## 1st Qu.:0.0000000 1st Qu.:0.000000 1st Qu.:0.00000
## Median :0.0000000 Median :0.000000 Median :0.00000
## Mean :0.0005153 Mean :0.006012 Mean :0.03178
## 3rd Qu.:0.0000000 3rd Qu.:0.000000 3rd Qu.:0.00000
## Max. :1.0000000 Max. :2.000000 Max. :3.00000
## AINBOED ABYSTAND Purchase
## Min. :0.000000 Min. :0.00000 No :5474
## 1st Qu.:0.000000 1st Qu.:0.00000 Yes: 348
## Median :0.000000 Median :0.00000
## Mean :0.007901 Mean :0.01426
## 3rd Qu.:0.000000 3rd Qu.:0.00000
## Max. :2.000000 Max. :2.00000
We check whether the dataset contains any NA values. We can do it like this:
any(is.na(df))
## [1] FALSE
or with the missmap function from the Amelia package.
missmap(df, col=c("yellow","black"), legend=FALSE)
The dataset has no NAs.
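When a dataset does contain missing values, a per-column count can help locate them. The following is a small sketch on a hypothetical toy data frame (`toy` is not part of the Caravan data); the same call could be applied to `df`.

```r
# Small hypothetical data frame with two missing values
toy <- data.frame(a = c(1, NA, 3),
                  b = c("x", "y", NA),
                  c = c(TRUE, FALSE, TRUE))

# Count missing values per column; all zeros would mean no NAs
na.per.column <- colSums(is.na(toy))
na.per.column
```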
To apply the k-nearest neighbors method we must standardize the variables we will use; otherwise, variables measured on larger scales will have more influence on the distance.
var(df[,1])
## [1] 165.0378
var(df[,2])
## [1] 0.1647078
We can see that the variances of the two features differ greatly, so, as noted above, it is best to standardize every column we are going to work with, leaving out the Purchase column, which is the one we want to predict.
purchase <- df[,86]
standarized.df <- scale(df[,-86])
var(standarized.df[,1])
## [1] 1
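To make the standardization step above more concrete, the sketch below compares `scale()` with a manual "subtract the mean, divide by the standard deviation" computation. The matrix `x` is hypothetical toy data, chosen only so that one column has a much larger spread than the other.

```r
# Toy data (hypothetical): one wide-range feature and one narrow-range feature
x <- cbind(wide = c(10, 30, 50, 70), narrow = c(0.1, 0.4, 0.2, 0.3))

# Manual standardization: subtract the column mean, divide by the column sd
centered <- sweep(x, 2, colMeans(x))
manual <- sweep(centered, 2, apply(x, 2, sd), FUN = "/")

# scale() performs the same centering and scaling by default
scaled <- scale(x)

same_result <- isTRUE(all.equal(manual, scaled, check.attributes = FALSE))
col_vars <- apply(scaled, 2, var)  # both variances equal 1 after scaling
```

After scaling, both columns contribute on a comparable scale to the Euclidean distance, which is the reason for standardizing before running KNN.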
Although we normally use the caTools package, this time we will make a simple split: the first 1000 observations form the test set and the remaining ones form the training set.
test.index <- 1:1000
test.data <- standarized.df[test.index, ]
test.purchase <- purchase[test.index]
train.data <- standarized.df[-test.index, ]
train.purchase <- purchase[-test.index]
predicted.purchased <- knn(train.data, test.data, train.purchase, k=1)
head(predicted.purchased)
## [1] No No No No No No
## Levels: No Yes
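The split above takes the first 1000 rows. If the rows had some meaningful order, a random split might be preferable. A minimal base R sketch follows, assuming the 5822 rows reported by `str(df)`. The seed and the names `random.test.index` and `random.train.index` are arbitrary choices for illustration, and the error rates shown in this post were not obtained with this split.

```r
set.seed(101)        # arbitrary seed, only for reproducibility
n <- 5822            # number of rows in Caravan, per str(df) above
test.size <- 1000

# Draw 1000 distinct row indices at random instead of taking the first 1000
random.test.index <- sample(n, test.size)
random.train.index <- setdiff(seq_len(n), random.test.index)

# These could then be used in place of test.index,
# e.g. standarized.df[random.test.index, ]
length(random.test.index)
length(random.train.index)
```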
We compute the misclassification rate of the model on the test set.
misclass.error <- mean(predicted.purchased!=test.purchase)
misclass.error
## [1] 0.116
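The error rate above is the share of test observations whose predicted class differs from the true one. A confusion matrix gives the same number while also showing which class the errors fall in. The sketch below uses hypothetical toy vectors in place of `predicted.purchased` and `test.purchase`.

```r
# Hypothetical toy vectors standing in for predicted.purchased and test.purchase
predicted <- factor(c("No", "No", "Yes", "No", "Yes", "No"), levels = c("No", "Yes"))
actual    <- factor(c("No", "Yes", "Yes", "No", "No", "No"), levels = c("No", "Yes"))

# Cross-tabulate predictions against the true labels
conf.matrix <- table(predicted = predicted, actual = actual)

# Off-diagonal cells are the errors; their share equals mean(predicted != actual)
error.from.table <- 1 - sum(diag(conf.matrix)) / sum(conf.matrix)
error.from.mean  <- mean(predicted != actual)
```

On the real data, the equivalent call would be `table(predicted.purchased, test.purchase)`.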
One of the most common difficulties when using the k-nearest neighbors model is choosing the value of k correctly. We now run further tests with k=3 and k=5.
predicted.purchased <- knn(train.data, test.data, train.purchase, k=3)
misclass.error <- mean(predicted.purchased!=test.purchase)
misclass.error
## [1] 0.074
predicted.purchased <- knn(train.data, test.data, train.purchase, k=5)
misclass.error <- mean(predicted.purchased!=test.purchase)
misclass.error
## [1] 0.066
As we can see, the test error decreases as k increases.
Both because of the time it takes and because of how cumbersome the code becomes, it is not practical to try different values of k by hand to see which one is best. Instead, we use the elbow method.
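# Preallocate the vector that will hold the test error for each k
error.rate <- numeric(20)
for (i in 1:20) {
  # Fit KNN with k = i and store the share of misclassified test observations
  predicted.purchased <- knn(train.data, test.data, train.purchase, k = i)
  error.rate[i] <- mean(predicted.purchased != test.purchase)
}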
error.rate
## [1] 0.117 0.112 0.075 0.072 0.066 0.063 0.062 0.062 0.058 0.058 0.059
## [12] 0.059 0.059 0.059 0.059 0.059 0.059 0.059 0.059 0.059
We plot the values.
k <- 1:20
# Use syntactic column names and map the aesthetics to columns of error.df
error.df <- data.frame(k = k, error = error.rate)
ggplot(error.df, aes(x = k, y = error)) + geom_point() + geom_line(colour = 'red', linetype = 'dotted')
We look for the value beyond which the error no longer changes significantly. In this case it is k=9, where the error drops below 0.06 and barely rises or falls for the following values of k.
predicted.purchased <- knn(train.data, test.data, train.purchase, k=9)
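The visual choice of k described above can also be cross-checked numerically. The sketch below is one possible rule, not a standard definition of the elbow method: it picks the smallest k whose error is within a small tolerance of the lowest error observed. The error values are copied from the `error.rate` output above, and the tolerance of 0.002 is an arbitrary assumption.

```r
# Error rates copied from the error.rate output above (k = 1, ..., 20)
error.rate <- c(0.117, 0.112, 0.075, 0.072, 0.066, 0.063, 0.062, 0.062,
                0.058, 0.058, rep(0.059, 10))

# Arbitrary, hypothetical tolerance chosen for this illustration
tolerance <- 0.002

# Smallest k whose error is within the tolerance of the best error observed
best.error <- min(error.rate)
chosen.k <- min(which(error.rate <= best.error + tolerance))
chosen.k
```

With these values the rule selects k=9, which agrees with the choice read off the plot.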