Build and evaluate a KNN model to predict whether a person's tumor is BENIGN or MALIGNANT.
Load the libraries and data, then carry out the necessary preparation, applying the knn() function from the class library and the train.kknn() function from the kknn library.
library(readr)   # Read data
library(kknn)    # KNN model (train.kknn)
library(dplyr)   # Data processing and filtering
library(forcats) # To recode factor variables
library(class)   # For the knn() function
library(caret)   # Confusion matrix, among other tools
library(reshape) # To rename/reshape variables
library(knitr)   # For friendly tables
datos <- read.csv("https://raw.githubusercontent.com/rpizarrog/Machine-Learning-con-R/main/datos/wisc_bc_data.csv", encoding = "UTF-8")
kable(head(datos, 20), caption = "The data: first 20 records")
| id | diagnosis | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | points_mean | symmetry_mean | dimension_mean | radius_se | texture_se | perimeter_se | area_se | smoothness_se | compactness_se | concavity_se | points_se | symmetry_se | dimension_se | radius_worst | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | points_worst | symmetry_worst | dimension_worst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 87139402 | B | 12.32 | 12.39 | 78.85 | 464.1 | 0.10280 | 0.06981 | 0.039870 | 0.037000 | 0.1959 | 0.05955 | 0.2360 | 0.6656 | 1.670 | 17.43 | 0.008045 | 0.011800 | 0.016830 | 0.012410 | 0.01924 | 0.002248 | 13.50 | 15.64 | 86.97 | 549.1 | 0.1385 | 0.12660 | 0.124200 | 0.09391 | 0.2827 | 0.06771 |
| 8910251 | B | 10.60 | 18.95 | 69.28 | 346.4 | 0.09688 | 0.11470 | 0.063870 | 0.026420 | 0.1922 | 0.06491 | 0.4505 | 1.1970 | 3.430 | 27.10 | 0.007470 | 0.035810 | 0.033540 | 0.013650 | 0.03504 | 0.003318 | 11.88 | 22.94 | 78.28 | 424.8 | 0.1213 | 0.25150 | 0.191600 | 0.07926 | 0.2940 | 0.07587 |
| 905520 | B | 11.04 | 16.83 | 70.92 | 373.2 | 0.10770 | 0.07804 | 0.030460 | 0.024800 | 0.1714 | 0.06340 | 0.1967 | 1.3870 | 1.342 | 13.54 | 0.005158 | 0.009355 | 0.010560 | 0.007483 | 0.01718 | 0.002198 | 12.41 | 26.44 | 79.93 | 471.4 | 0.1369 | 0.14820 | 0.106700 | 0.07431 | 0.2998 | 0.07881 |
| 868871 | B | 11.28 | 13.39 | 73.00 | 384.8 | 0.11640 | 0.11360 | 0.046350 | 0.047960 | 0.1771 | 0.06072 | 0.3384 | 1.3430 | 1.851 | 26.33 | 0.011270 | 0.034980 | 0.021870 | 0.019650 | 0.01580 | 0.003442 | 11.92 | 15.77 | 76.53 | 434.0 | 0.1367 | 0.18220 | 0.086690 | 0.08611 | 0.2102 | 0.06784 |
| 9012568 | B | 15.19 | 13.21 | 97.65 | 711.8 | 0.07963 | 0.06934 | 0.033930 | 0.026570 | 0.1721 | 0.05544 | 0.1783 | 0.4125 | 1.338 | 17.72 | 0.005012 | 0.014850 | 0.015510 | 0.009155 | 0.01647 | 0.001767 | 16.20 | 15.73 | 104.50 | 819.1 | 0.1126 | 0.17370 | 0.136200 | 0.08178 | 0.2487 | 0.06766 |
| 906539 | B | 11.57 | 19.04 | 74.20 | 409.7 | 0.08546 | 0.07722 | 0.054850 | 0.014280 | 0.2031 | 0.06267 | 0.2864 | 1.4400 | 2.206 | 20.30 | 0.007278 | 0.020470 | 0.044470 | 0.008799 | 0.01868 | 0.003339 | 13.07 | 26.98 | 86.43 | 520.5 | 0.1249 | 0.19370 | 0.256000 | 0.06664 | 0.3035 | 0.08284 |
| 925291 | B | 11.51 | 23.93 | 74.52 | 403.5 | 0.09261 | 0.10210 | 0.111200 | 0.041050 | 0.1388 | 0.06570 | 0.2388 | 2.9040 | 1.936 | 16.97 | 0.008200 | 0.029820 | 0.057380 | 0.012670 | 0.01488 | 0.004738 | 12.48 | 37.16 | 82.28 | 474.2 | 0.1298 | 0.25170 | 0.363000 | 0.09653 | 0.2112 | 0.08732 |
| 87880 | M | 13.81 | 23.75 | 91.56 | 597.8 | 0.13230 | 0.17680 | 0.155800 | 0.091760 | 0.2251 | 0.07421 | 0.5648 | 1.9300 | 3.909 | 52.72 | 0.008824 | 0.031080 | 0.031120 | 0.012910 | 0.01998 | 0.004506 | 19.20 | 41.85 | 128.50 | 1153.0 | 0.2226 | 0.52090 | 0.464600 | 0.20130 | 0.4432 | 0.10860 |
| 862989 | B | 10.49 | 19.29 | 67.41 | 336.1 | 0.09989 | 0.08578 | 0.029950 | 0.012010 | 0.2217 | 0.06481 | 0.3550 | 1.5340 | 2.302 | 23.13 | 0.007595 | 0.022190 | 0.028800 | 0.008614 | 0.02710 | 0.003451 | 11.54 | 23.31 | 74.22 | 402.8 | 0.1219 | 0.14860 | 0.079870 | 0.03203 | 0.2826 | 0.07552 |
| 89827 | B | 11.06 | 14.96 | 71.49 | 373.9 | 0.10330 | 0.09097 | 0.053970 | 0.033410 | 0.1776 | 0.06907 | 0.1601 | 0.8225 | 1.355 | 10.80 | 0.007416 | 0.018770 | 0.027580 | 0.010100 | 0.02348 | 0.002917 | 11.92 | 19.90 | 79.76 | 440.0 | 0.1418 | 0.22100 | 0.229900 | 0.10750 | 0.3301 | 0.09080 |
| 91485 | M | 20.59 | 21.24 | 137.80 | 1320.0 | 0.10850 | 0.16440 | 0.218800 | 0.112100 | 0.1848 | 0.06222 | 0.5904 | 1.2160 | 4.206 | 75.09 | 0.006666 | 0.027910 | 0.040620 | 0.014790 | 0.01117 | 0.003727 | 23.86 | 30.76 | 163.20 | 1760.0 | 0.1464 | 0.35970 | 0.517900 | 0.21130 | 0.2480 | 0.08999 |
| 8711003 | B | 12.25 | 17.94 | 78.27 | 460.3 | 0.08654 | 0.06679 | 0.038850 | 0.023310 | 0.1970 | 0.06228 | 0.2200 | 0.9823 | 1.484 | 16.51 | 0.005518 | 0.015620 | 0.019940 | 0.007924 | 0.01799 | 0.002484 | 13.59 | 25.22 | 86.60 | 564.2 | 0.1217 | 0.17880 | 0.194300 | 0.08211 | 0.3113 | 0.08132 |
| 9113455 | B | 13.14 | 20.74 | 85.98 | 536.9 | 0.08675 | 0.10890 | 0.108500 | 0.035100 | 0.1562 | 0.06020 | 0.3152 | 0.7884 | 2.312 | 27.40 | 0.007295 | 0.031790 | 0.046150 | 0.012540 | 0.01561 | 0.003230 | 14.80 | 25.46 | 100.90 | 689.1 | 0.1351 | 0.35490 | 0.450400 | 0.11810 | 0.2563 | 0.08174 |
| 857810 | B | 13.05 | 19.31 | 82.61 | 527.2 | 0.08060 | 0.03789 | 0.000692 | 0.004167 | 0.1819 | 0.05501 | 0.4040 | 1.2140 | 2.595 | 32.96 | 0.007491 | 0.008593 | 0.000692 | 0.004167 | 0.02190 | 0.002990 | 14.23 | 22.25 | 90.24 | 624.1 | 0.1021 | 0.06191 | 0.001845 | 0.01111 | 0.2439 | 0.06289 |
| 9111805 | M | 19.59 | 25.00 | 127.70 | 1191.0 | 0.10320 | 0.09871 | 0.165500 | 0.090630 | 0.1663 | 0.05391 | 0.4674 | 1.3750 | 2.916 | 56.18 | 0.011900 | 0.019290 | 0.049070 | 0.014990 | 0.01641 | 0.001807 | 21.44 | 30.96 | 139.80 | 1421.0 | 0.1528 | 0.18450 | 0.397700 | 0.14660 | 0.2293 | 0.06091 |
| 925277 | B | 14.59 | 22.68 | 96.39 | 657.1 | 0.08473 | 0.13300 | 0.102900 | 0.037360 | 0.1454 | 0.06147 | 0.2254 | 1.1080 | 2.224 | 19.54 | 0.004242 | 0.046390 | 0.065780 | 0.016060 | 0.01638 | 0.004406 | 15.48 | 27.27 | 105.90 | 733.5 | 0.1026 | 0.31710 | 0.366200 | 0.11050 | 0.2258 | 0.08004 |
| 867387 | B | 15.71 | 13.93 | 102.00 | 761.7 | 0.09462 | 0.09462 | 0.071350 | 0.059330 | 0.1816 | 0.05723 | 0.3117 | 0.8155 | 1.972 | 27.94 | 0.005217 | 0.015150 | 0.016780 | 0.012680 | 0.01669 | 0.002330 | 17.50 | 19.25 | 114.30 | 922.8 | 0.1223 | 0.19490 | 0.170900 | 0.13740 | 0.2723 | 0.07071 |
| 89511502 | B | 12.67 | 17.30 | 81.25 | 489.9 | 0.10280 | 0.07664 | 0.031930 | 0.021070 | 0.1707 | 0.05984 | 0.2100 | 0.9505 | 1.566 | 17.61 | 0.006809 | 0.009514 | 0.013290 | 0.006474 | 0.02057 | 0.001784 | 13.71 | 21.10 | 88.70 | 574.4 | 0.1384 | 0.12120 | 0.102000 | 0.05602 | 0.2688 | 0.06888 |
| 89263202 | M | 20.09 | 23.86 | 134.70 | 1247.0 | 0.10800 | 0.18380 | 0.228300 | 0.128000 | 0.2249 | 0.07469 | 1.0720 | 1.7430 | 7.804 | 130.80 | 0.007964 | 0.047320 | 0.076490 | 0.019360 | 0.02736 | 0.005928 | 23.68 | 29.43 | 158.80 | 1696.0 | 0.1347 | 0.33910 | 0.493200 | 0.19230 | 0.3294 | 0.09469 |
| 866714 | B | 12.19 | 13.29 | 79.08 | 455.8 | 0.10660 | 0.09509 | 0.028550 | 0.028820 | 0.1880 | 0.06471 | 0.2005 | 0.8163 | 1.973 | 15.24 | 0.006773 | 0.024560 | 0.010180 | 0.008094 | 0.02662 | 0.004143 | 13.34 | 17.81 | 91.38 | 545.2 | 0.1427 | 0.25850 | 0.099150 | 0.08187 | 0.3469 | 0.09241 |
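Before preparing the data, it is worth confirming its size; for this Wisconsin breast cancer dataset the expected shape is 569 observations and 32 columns (id, diagnosis, and 30 numeric features). A minimal check:
dim(datos)        # expected: 569 rows, 32 columns
str(datos[, 1:4]) # types of the first few columns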
Remove the id column and keep the result in a data.frame named datos.p, meaning prepared data.
datos.p <- select(datos, diagnosis, radius_mean, texture_mean, perimeter_mean, area_mean, smoothness_mean, compactness_mean, concavity_mean, points_mean, symmetry_mean, dimension_mean, radius_se, texture_se, perimeter_se, area_se, smoothness_se, compactness_se, concavity_se, points_se, symmetry_se, dimension_se, radius_worst, texture_worst, perimeter_worst, area_worst, smoothness_worst, compactness_worst, concavity_worst, points_worst, symmetry_worst, dimension_worst)
# datos.p <- select(datos, -id) # More concise alternative
# datos.p <- datos[, -1]        # Base R alternative (drop the first column)
The numeric columns are normalized by a function to avoid very large or very small values and to put all variables on a common scale.
The scale() function can be used to standardize numeric values; for this exercise the handwritten function normalizar is used instead.
# Min-max normalization function
normalizar <- function(x){
  return ((x - min(x)) / (max(x) - min(x)))
}
datos.p[,2:31] <- normalizar(datos.p[, 2:31])
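Note that, written this way, normalizar() receives the whole block of 30 columns at once, so a single global minimum and maximum are used for every variable; this is why the normalized values in the tables below are much smaller than 1. If per-variable scaling were preferred, a sketch of the alternatives (not run here, since all results below come from the global version) would be:
# Min-max normalization applied column by column
# datos.p[, 2:31] <- as.data.frame(lapply(datos.p[, 2:31], normalizar))
# Or z-score standardization per column with the built-in scale()
# datos.p[, 2:31] <- scale(datos.p[, 2:31])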
Rename the diagnosis attribute to diagnostico, whose values are B = benign and M = malignant, and convert it to a factor.
datos.p <- dplyr::rename(datos.p, diagnostico = diagnosis) # explicit namespace avoids masking by reshape::rename
datos.p$diagnostico <- factor(datos.p$diagnostico)
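The forcats library loaded earlier could optionally spell out the factor labels; a hypothetical sketch (not applied here, so the B/M codes remain in the output):
# datos.p$diagnostico <- fct_recode(datos.p$diagnostico, Benigno = "B", Maligno = "M")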
Use sample() to select 70% of the records in datos.p as training data; the remaining 30% (those not selected) will be the validation data.
A seed is set so that the same random sample is reproduced on every run.
set.seed(2022)
n <- nrow(datos.p)
entrena <- sample(x = 1:n, size = round(n * 0.70), replace = FALSE)
entrena
## [1] 228 435 476 123 233 270 248 7 498 112 470 513 307 449 378 524 428 359
## [19] 150 197 523 107 276 515 452 194 290 493 256 511 234 353 287 56 1 382
## [37] 19 501 306 425 361 563 3 503 32 417 218 319 351 492 344 465 460 146
## [55] 555 550 110 6 289 426 247 279 104 424 282 477 207 33 95 140 366 471
## [73] 313 267 329 35 291 404 542 224 239 102 16 450 53 383 442 177 72 432
## [91] 337 348 464 373 15 554 114 187 309 221 200 81 365 566 201 295 130 403
## [109] 467 29 310 527 260 420 303 38 69 556 46 20 410 77 26 443 278 34
## [127] 191 368 315 193 336 510 371 388 448 301 113 259 314 539 385 242 412 160
## [145] 263 436 379 529 269 5 223 151 305 79 362 63 179 257 206 331 168 230
## [163] 139 122 483 173 210 78 302 163 131 509 422 514 262 181 55 133 338 433
## [181] 505 466 83 293 339 367 399 67 145 533 427 91 236 402 136 490 231 322
## [199] 121 166 430 521 111 48 407 350 18 175 266 240 347 416 189 132 205 437
## [217] 277 393 283 148 487 134 61 58 176 273 120 286 105 355 488 138 92 208
## [235] 485 188 37 549 330 59 534 238 127 157 411 172 31 80 522 220 326 235
## [253] 400 101 170 376 415 363 486 298 8 504 369 47 308 73 545 375 129 246
## [271] 474 274 438 96 24 334 25 39 272 167 106 552 537 182 174 431 229 340
## [289] 560 457 66 446 507 288 423 468 195 508 245 553 216 494 352 516 548 85
## [307] 169 41 489 557 506 86 40 304 419 414 518 251 380 253 144 185 156 227
## [325] 536 232 94 405 161 526 345 147 562 115 51 512 44 70 395 335 203 268
## [343] 311 391 186 372 82 243 341 4 462 561 49 342 103 184 23 321 381 497
## [361] 502 387 541 93 374 65 271 74 299 401 45 346 22 397 9 153 118 328
## [379] 530 211 531 154 42 472 255 54 89 451 117 517 463 124 36 480 178 209
## [397] 97 535
Training data
datos.entrenamiento <- datos.p[entrena, ]
kable(head(datos.entrenamiento, 10), caption = "Training data: first 10 records")
| | diagnostico | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | points_mean | symmetry_mean | dimension_mean | radius_se | texture_se | perimeter_se | area_se | smoothness_se | compactness_se | concavity_se | points_se | symmetry_se | dimension_se | radius_worst | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | points_worst | symmetry_worst | dimension_worst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 228 | M | 0.0047438 | 0.0045933 | 0.0314528 | 0.2938411 | 2.66e-05 | 3.50e-05 | 5.01e-05 | 2.96e-05 | 4.05e-05 | 1.42e-05 | 0.0001018 | 0.0002353 | 0.0007071 | 0.0123390 | 2.1e-06 | 6.40e-06 | 1.30e-05 | 4.5e-06 | 5.8e-06 | 9.0e-07 | 0.0051787 | 0.0058933 | 0.0343206 | 0.3476728 | 3.91e-05 | 0.0000692 | 0.0001248 | 5.11e-05 | 7.13e-05 | 1.90e-05 |
| 435 | B | 0.0025787 | 0.0040433 | 0.0168618 | 0.0873296 | 2.10e-05 | 2.62e-05 | 2.22e-05 | 8.50e-06 | 3.50e-05 | 1.56e-05 | 0.0000605 | 0.0003235 | 0.0006596 | 0.0042666 | 2.0e-06 | 1.09e-05 | 1.51e-05 | 4.2e-06 | 3.6e-06 | 1.2e-06 | 0.0029055 | 0.0063164 | 0.0211895 | 0.1119887 | 3.27e-05 | 0.0000960 | 0.0001123 | 3.66e-05 | 5.97e-05 | 2.24e-05 |
| 476 | M | 0.0042830 | 0.0044358 | 0.0279031 | 0.2414198 | 2.29e-05 | 2.63e-05 | 2.66e-05 | 1.87e-05 | 4.25e-05 | 1.33e-05 | 0.0000950 | 0.0001294 | 0.0005987 | 0.0114951 | 1.1e-06 | 3.90e-06 | 5.70e-06 | 2.7e-06 | 3.0e-06 | 6.0e-07 | 0.0051340 | 0.0058768 | 0.0331218 | 0.3490832 | 3.37e-05 | 0.0000650 | 0.0000906 | 4.17e-05 | 6.61e-05 | 1.93e-05 |
| 123 | B | 0.0027268 | 0.0057569 | 0.0174495 | 0.0980724 | 1.76e-05 | 1.34e-05 | 4.60e-06 | 3.10e-06 | 4.55e-05 | 1.38e-05 | 0.0000591 | 0.0004198 | 0.0004610 | 0.0042807 | 1.4e-06 | 5.50e-06 | 3.80e-06 | 1.6e-06 | 7.5e-06 | 5.0e-07 | 0.0029243 | 0.0074330 | 0.0191326 | 0.1120122 | 2.24e-05 | 0.0000320 | 0.0000170 | 1.13e-05 | 7.63e-05 | 1.59e-05 |
| 233 | B | 0.0033521 | 0.0046192 | 0.0229972 | 0.1480724 | 1.84e-05 | 5.25e-05 | 7.06e-05 | 1.83e-05 | 4.01e-05 | 1.83e-05 | 0.0000853 | 0.0003503 | 0.0007990 | 0.0068759 | 1.2e-06 | 1.75e-05 | 3.37e-05 | 5.4e-06 | 6.0e-06 | 3.1e-06 | 0.0035966 | 0.0055783 | 0.0251528 | 0.1666667 | 2.10e-05 | 0.0000986 | 0.0001594 | 3.54e-05 | 5.64e-05 | 2.54e-05 |
| 270 | M | 0.0045604 | 0.0055242 | 0.0303479 | 0.2715092 | 2.41e-05 | 3.66e-05 | 4.82e-05 | 2.09e-05 | 4.65e-05 | 1.41e-05 | 0.0001232 | 0.0004236 | 0.0009490 | 0.0142008 | 2.5e-06 | 7.60e-06 | 9.20e-06 | 3.7e-06 | 5.1e-06 | 9.0e-07 | 0.0050893 | 0.0071768 | 0.0340621 | 0.3330983 | 3.44e-05 | 0.0000698 | 0.0000813 | 3.68e-05 | 6.86e-05 | 1.79e-05 |
| 248 | B | 0.0034015 | 0.0058745 | 0.0225223 | 0.1543018 | 2.08e-05 | 2.89e-05 | 2.37e-05 | 9.10e-06 | 4.40e-05 | 1.49e-05 | 0.0000598 | 0.0002536 | 0.0006147 | 0.0054325 | 1.7e-06 | 1.09e-05 | 9.00e-06 | 2.7e-06 | 4.9e-06 | 1.4e-06 | 0.0038129 | 0.0074589 | 0.0266808 | 0.1901504 | 3.15e-05 | 0.0000988 | 0.0000950 | 2.83e-05 | 7.49e-05 | 2.40e-05 |
| 7 | B | 0.0027057 | 0.0056253 | 0.0175176 | 0.0948519 | 2.18e-05 | 2.40e-05 | 2.61e-05 | 9.60e-06 | 3.26e-05 | 1.54e-05 | 0.0000561 | 0.0006827 | 0.0004551 | 0.0039892 | 1.9e-06 | 7.00e-06 | 1.35e-05 | 3.0e-06 | 3.5e-06 | 1.1e-06 | 0.0029337 | 0.0087353 | 0.0193418 | 0.1114716 | 3.05e-05 | 0.0000592 | 0.0000853 | 2.27e-05 | 4.96e-05 | 2.05e-05 |
| 498 | M | 0.0030959 | 0.0051269 | 0.0200799 | 0.1249412 | 2.28e-05 | 2.46e-05 | 1.94e-05 | 1.23e-05 | 4.10e-05 | 1.45e-05 | 0.0000456 | 0.0001439 | 0.0003136 | 0.0034062 | 8.0e-07 | 3.30e-06 | 3.40e-06 | 1.6e-06 | 2.6e-06 | 4.0e-07 | 0.0038152 | 0.0070263 | 0.0248002 | 0.1741185 | 3.53e-05 | 0.0000918 | 0.0000876 | 3.78e-05 | 8.68e-05 | 2.26e-05 |
| 112 | B | 0.0019321 | 0.0048660 | 0.0125223 | 0.0479314 | 2.21e-05 | 3.07e-05 | 3.11e-05 | 5.10e-06 | 5.22e-05 | 1.94e-05 | 0.0000455 | 0.0004612 | 0.0002922 | 0.0024001 | 2.9e-06 | 1.27e-05 | 1.82e-05 | 2.4e-06 | 5.4e-06 | 2.8e-06 | 0.0021373 | 0.0069864 | 0.0136530 | 0.0587212 | 3.83e-05 | 0.0001013 | 0.0001265 | 1.85e-05 | 7.81e-05 | 3.49e-05 |
Validation data
datos.validacion <- datos.p[-entrena, ]
kable(head(datos.validacion, 10), caption = "Validation data: first 10 records")
| | diagnostico | radius_mean | texture_mean | perimeter_mean | area_mean | smoothness_mean | compactness_mean | concavity_mean | points_mean | symmetry_mean | dimension_mean | radius_se | texture_se | perimeter_se | area_se | smoothness_se | compactness_se | concavity_se | points_se | symmetry_se | dimension_se | radius_worst | texture_worst | perimeter_worst | area_worst | smoothness_worst | compactness_worst | concavity_worst | points_worst | symmetry_worst | dimension_worst |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | B | 0.0024918 | 0.0044546 | 0.0162858 | 0.0814292 | 2.28e-05 | 2.70e-05 | 1.50e-05 | 6.20e-06 | 4.52e-05 | 1.53e-05 | 0.0001059 | 0.0002814 | 0.0008063 | 0.0063705 | 1.8e-06 | 8.4e-06 | 7.90e-06 | 3.2e-06 | 8.2e-06 | 8e-07 | 0.0027927 | 0.0053926 | 0.0184015 | 0.0998590 | 2.85e-05 | 0.0000591 | 0.0000450 | 1.86e-05 | 6.91e-05 | 1.78e-05 |
| 10 | B | 0.0025999 | 0.0035167 | 0.0168054 | 0.0878937 | 2.43e-05 | 2.14e-05 | 1.27e-05 | 7.90e-06 | 4.17e-05 | 1.62e-05 | 0.0000376 | 0.0001933 | 0.0003185 | 0.0025388 | 1.7e-06 | 4.4e-06 | 6.50e-06 | 2.4e-06 | 5.5e-06 | 7e-07 | 0.0028021 | 0.0046780 | 0.0187494 | 0.1034321 | 3.33e-05 | 0.0000520 | 0.0000540 | 2.53e-05 | 7.76e-05 | 2.13e-05 |
| 11 | M | 0.0048402 | 0.0049929 | 0.0323930 | 0.3102962 | 2.55e-05 | 3.86e-05 | 5.14e-05 | 2.64e-05 | 4.34e-05 | 1.46e-05 | 0.0001388 | 0.0002858 | 0.0009887 | 0.0176516 | 1.6e-06 | 6.6e-06 | 9.50e-06 | 3.5e-06 | 2.6e-06 | 9e-07 | 0.0056088 | 0.0072308 | 0.0383639 | 0.4137283 | 3.44e-05 | 0.0000846 | 0.0001217 | 4.97e-05 | 5.83e-05 | 2.12e-05 |
| 12 | B | 0.0028796 | 0.0042172 | 0.0183992 | 0.1082040 | 2.03e-05 | 1.57e-05 | 9.10e-06 | 5.50e-06 | 4.63e-05 | 1.46e-05 | 0.0000517 | 0.0002309 | 0.0003488 | 0.0038811 | 1.3e-06 | 3.7e-06 | 4.70e-06 | 1.9e-06 | 4.2e-06 | 6e-07 | 0.0031946 | 0.0059285 | 0.0203573 | 0.1326281 | 2.86e-05 | 0.0000420 | 0.0000457 | 1.93e-05 | 7.32e-05 | 1.91e-05 |
| 13 | B | 0.0030889 | 0.0048754 | 0.0202116 | 0.1262106 | 2.04e-05 | 2.56e-05 | 2.55e-05 | 8.30e-06 | 3.67e-05 | 1.42e-05 | 0.0000741 | 0.0001853 | 0.0005435 | 0.0064410 | 1.7e-06 | 7.5e-06 | 1.08e-05 | 2.9e-06 | 3.7e-06 | 8e-07 | 0.0034791 | 0.0059850 | 0.0237189 | 0.1619887 | 3.18e-05 | 0.0000834 | 0.0001059 | 2.78e-05 | 6.02e-05 | 1.92e-05 |
| 14 | B | 0.0030677 | 0.0045393 | 0.0194194 | 0.1239304 | 1.89e-05 | 8.90e-06 | 2.00e-07 | 1.00e-06 | 4.28e-05 | 1.29e-05 | 0.0000950 | 0.0002854 | 0.0006100 | 0.0077480 | 1.8e-06 | 2.0e-06 | 2.00e-07 | 1.0e-06 | 5.1e-06 | 7e-07 | 0.0033451 | 0.0052304 | 0.0212130 | 0.1467090 | 2.40e-05 | 0.0000146 | 0.0000004 | 2.60e-06 | 5.73e-05 | 1.48e-05 |
| 17 | B | 0.0036930 | 0.0032746 | 0.0239774 | 0.1790550 | 2.22e-05 | 2.22e-05 | 1.68e-05 | 1.39e-05 | 4.27e-05 | 1.35e-05 | 0.0000733 | 0.0001917 | 0.0004636 | 0.0065679 | 1.2e-06 | 3.6e-06 | 3.90e-06 | 3.0e-06 | 3.9e-06 | 5e-07 | 0.0041138 | 0.0045252 | 0.0268688 | 0.2169252 | 2.87e-05 | 0.0000458 | 0.0000402 | 3.23e-05 | 6.40e-05 | 1.66e-05 |
| 21 | B | 0.0027527 | 0.0040409 | 0.0175552 | 0.0988011 | 2.30e-05 | 1.44e-05 | 9.00e-06 | 7.60e-06 | 3.56e-05 | 1.43e-05 | 0.0000576 | 0.0001799 | 0.0004095 | 0.0041984 | 1.6e-06 | 2.0e-06 | 4.60e-06 | 2.8e-06 | 4.5e-06 | 4e-07 | 0.0030583 | 0.0050282 | 0.0198449 | 0.1225905 | 3.11e-05 | 0.0000244 | 0.0000358 | 2.58e-05 | 6.05e-05 | 1.67e-05 |
| 27 | B | 0.0031617 | 0.0043018 | 0.0203573 | 0.1304890 | 2.40e-05 | 1.92e-05 | 9.30e-06 | 6.50e-06 | 3.85e-05 | 1.34e-05 | 0.0000693 | 0.0003228 | 0.0004934 | 0.0059285 | 1.4e-06 | 3.5e-06 | 4.40e-06 | 2.2e-06 | 4.4e-06 | 4e-07 | 0.0035496 | 0.0060978 | 0.0229408 | 0.1644100 | 3.15e-05 | 0.0000412 | 0.0000325 | 1.86e-05 | 6.30e-05 | 1.55e-05 |
| 28 | M | 0.0035966 | 0.0059403 | 0.0240715 | 0.1721674 | 2.54e-05 | 3.99e-05 | 3.96e-05 | 2.06e-05 | 4.53e-05 | 1.54e-05 | 0.0001032 | 0.0002379 | 0.0008223 | 0.0102257 | 1.2e-06 | 7.2e-06 | 8.40e-06 | 2.5e-06 | 4.2e-06 | 7e-07 | 0.0047649 | 0.0086295 | 0.0350964 | 0.2983075 | 3.86e-05 | 0.0001436 | 0.0001489 | 4.76e-05 | 9.47e-05 | 2.32e-05 |
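Before fitting the model, it is reasonable to verify that the 70/30 split preserves the class balance; a minimal check:
# Sizes of the two partitions
nrow(datos.entrenamiento); nrow(datos.validacion)
# Proportion of benign (B) and malignant (M) cases in each partition
prop.table(table(datos.entrenamiento$diagnostico))
prop.table(table(datos.validacion$diagnostico))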
Build the model with the KNN algorithm, where the diagnostico variable depends on all the numeric variables.
modelo <- train.kknn(data = datos.entrenamiento, formula = diagnostico ~ ., kmax = 30)
summary(modelo)
##
## Call:
## train.kknn(formula = diagnostico ~ ., data = datos.entrenamiento, kmax = 30)
##
## Type of response variable: nominal
## Minimal misclassification: 0.03015075
## Best kernel: optimal
## Best k: 12
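train.kknn() evaluates every k up to kmax (and several kernels) by leave-one-out cross-validation on the training data and keeps the combination with the lowest misclassification, here k = 12 with the "optimal" kernel. The error curve across values of k can be inspected with the plot method for train.kknn objects:
# Leave-one-out misclassification rate for each k tried
plot(modelo)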
predicciones <- predict(object = modelo, newdata = datos.validacion)
Only the first 20 records to be compared are shown.
datos.comparar <- data.frame("real" = datos.validacion$diagnostico, "predicho" = predicciones)
kable(head(datos.comparar, 20), caption = "Data to compare, prior to the confusion matrix")
| real | predicho |
|---|---|
| B | B |
| B | B |
| M | M |
| B | B |
| B | B |
| B | B |
| B | B |
| B | B |
| B | B |
| M | M |
| B | B |
| M | M |
| B | B |
| B | B |
| B | B |
| M | M |
| M | M |
| M | B |
| B | B |
| M | M |
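Before building the confusion matrix, a quick count of the disagreements over the full validation set gives a preview; it should match the sum of the off-diagonal cells of the matrix below (2 + 7 = 9):
# Number of validation cases where the prediction differs from the real class
sum(datos.comparar$real != datos.comparar$predicho)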
The confusionMatrix() function from caret reports, among other statistics, the Accuracy of the predictions.
matriz <- confusionMatrix(datos.comparar$real, datos.comparar$predicho)
matriz
## Confusion Matrix and Statistics
##
## Reference
## Prediction B M
## B 108 2
## M 7 54
##
## Accuracy : 0.9474
## 95% CI : (0.9024, 0.9757)
## No Information Rate : 0.6725
## P-Value [Acc > NIR] : <2e-16
##
## Kappa : 0.8832
##
## Mcnemar's Test P-Value : 0.1824
##
## Sensitivity : 0.9391
## Specificity : 0.9643
## Pos Pred Value : 0.9818
## Neg Pred Value : 0.8852
## Prevalence : 0.6725
## Detection Rate : 0.6316
## Detection Prevalence : 0.6433
## Balanced Accuracy : 0.9517
##
## 'Positive' Class : B
##
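One caveat: caret's confusionMatrix() expects the predictions as its first argument (data) and the true classes as its second (reference). In the call above the real values were passed first, which transposes the table and makes statistics such as sensitivity and positive predictive value trade places, although the accuracy itself is unaffected. The call in the documented order would be:
# Documented argument order for caret::confusionMatrix()
# matriz <- confusionMatrix(data = datos.comparar$predicho, reference = datos.comparar$real)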
The knn() function from the class library is now used to generate predictions. The variable predicciones.2 is used to distinguish them from the earlier predictions.
predicciones.2 <- knn(train = datos.entrenamiento[, 2:31], test = datos.validacion[, 2:31], cl = datos.entrenamiento$diagnostico, k = 12) # k = 12, the best k found by train.kknn()
Computing the confusion matrix with predicciones.2:
matriz2 <- confusionMatrix(datos.validacion$diagnostico, predicciones.2)
matriz2
## Confusion Matrix and Statistics
##
## Reference
## Prediction B M
## B 106 4
## M 9 52
##
## Accuracy : 0.924
## 95% CI : (0.8735, 0.9589)
## No Information Rate : 0.6725
## P-Value [Acc > NIR] : 3.84e-15
##
## Kappa : 0.8313
##
## Mcnemar's Test P-Value : 0.2673
##
## Sensitivity : 0.9217
## Specificity : 0.9286
## Pos Pred Value : 0.9636
## Neg Pred Value : 0.8525
## Prevalence : 0.6725
## Detection Rate : 0.6199
## Detection Prevalence : 0.6433
## Balanced Accuracy : 0.9252
##
## 'Positive' Class : B
##
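The accuracies discussed below can also be extracted programmatically from the caret objects instead of being read off the printouts:
# Accuracy of each model, taken from the confusionMatrix objects
matriz$overall["Accuracy"]  # train.kknn model
matriz2$overall["Accuracy"] # class::knn model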
The KNN model built with train.kknn() yields an accuracy of about 94.7%, meaning it classifies roughly 95 of every 100 patients correctly.
The KNN model built with knn() yields an accuracy of about 92.4%, meaning it classifies roughly 92 of every 100 patients correctly.
Although both functions implement the same algorithm, they produce different statistics. Each function encapsulates its own implementation, which depends on the package and its author; in particular, train.kknn() applies kernel-weighted voting (here the "optimal" kernel), whereas class::knn() uses a plain, unweighted majority vote among the k neighbors.
These results can be compared against other models:
Classification tree: accuracy pending.
SVM: accuracy pending.
Logistic regression: accuracy pending.