Principal component analysis (PCA) is an unsupervised learning technique that is commonly applied as part of exploratory data analysis. One of its applications is dimensionality reduction, that is, reducing the number of variables while losing as little information (variance) as possible. When a dataset has many quantitative variables that may be correlated with one another (a sign of redundant information), PCA reduces them to a smaller number of transformed variables, the principal components, that explain most of the variability in the data.
The evaluation and modification of the dataset carried out here is part of the CRISP-DM process, specifically the construction of new data.
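As a small illustration of how components "explain" variability, the sketch below fits a PCA on the built-in `iris` measurements (an assumption made for illustration, not the breast cancer data used in this document) and computes the cumulative share of variance per component. The 95% threshold is also an arbitrary choice for the example.

```r
# Illustrative data: the four numeric columns of the built-in iris dataset
x <- iris[, 1:4]

# Centre and standardise the variables, then fit the PCA
fit <- prcomp(x, center = TRUE, scale. = TRUE)

# The variance carried by each component is its squared standard deviation
var_explained <- fit$sdev^2 / sum(fit$sdev^2)
cum_explained <- cumsum(var_explained)

# Smallest number of components reaching the chosen threshold (assumed: 95%)
k <- which(cum_explained >= 0.95)[1]

round(cum_explained, 3)
k
```

The same two lines of arithmetic on `sdev` apply to any object returned by `prcomp`.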
data <- read.csv("https://raw.githubusercontent.com/cnahuina/data-mineria/master/breast-cancer.csv")
A general cleaning of the data was carried out beforehand.
As shown earlier, the last column is not used, and the identifier is excluded as well; the selection below keeps columns 3 to 32.
data_var <- data[,3:32]
Cleaning the data
Extreme values are capped: every observation falling beyond the whiskers, that is, more than 1.5 times the interquartile range below the first quartile or above the third quartile, is located and replaced. Values below the lower whisker are replaced with the 5th percentile, and values above the upper whisker with the 95th percentile.
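# Cap outliers: values beyond the 1.5 * IQR whiskers are replaced
# with the 5th (lower side) or 95th (upper side) percentile.
replace_outliers <- function(x, removeNA = TRUE){
  qrts <- quantile(x, probs = c(0.25, 0.75), na.rm = removeNA)  # Q1 and Q3
  caps <- quantile(x, probs = c(.05, 0.95), na.rm = removeNA)   # replacement values
  iqr <- qrts[2]-qrts[1]
  h <- 1.5*iqr                                                   # whisker length
  x[x<qrts[1]-h] <- caps[1]
  x[x>qrts[2]+h] <- caps[2]
  x
}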
data_var$radius_mean <- replace_outliers(data_var$radius_mean)
data_var$texture_mean <- replace_outliers(data_var$texture_mean)
data_var$perimeter_mean <- replace_outliers(data_var$perimeter_mean)
data_var$area_mean <- replace_outliers(data_var$area_mean)
data_var$smoothness_mean <- replace_outliers(data_var$smoothness_mean)
data_var$compactness_mean <- replace_outliers(data_var$compactness_mean)
data_var$concavity_mean <- replace_outliers(data_var$concavity_mean)
data_var$concave.points_mean <- replace_outliers(data_var$concave.points_mean)
data_var$symmetry_mean <- replace_outliers(data_var$symmetry_mean)
data_var$fractal_dimension_mean <- replace_outliers(data_var$fractal_dimension_mean)
data_var$radius_se <- replace_outliers(data_var$radius_se)
data_var$texture_se <- replace_outliers(data_var$texture_se)
data_var$perimeter_se <- replace_outliers(data_var$perimeter_se)
data_var$area_se <- replace_outliers(data_var$area_se)
data_var$smoothness_se <- replace_outliers(data_var$smoothness_se)
data_var$compactness_se <- replace_outliers(data_var$compactness_se)
data_var$concavity_se <- replace_outliers(data_var$concavity_se)
data_var$concave.points_se <- replace_outliers(data_var$concave.points_se)
data_var$symmetry_se <- replace_outliers(data_var$symmetry_se)
data_var$fractal_dimension_se <- replace_outliers(data_var$fractal_dimension_se)
data_var$radius_worst <- replace_outliers(data_var$radius_worst)
data_var$texture_worst <- replace_outliers(data_var$texture_worst)
data_var$perimeter_worst <- replace_outliers(data_var$perimeter_worst)
data_var$area_worst <- replace_outliers(data_var$area_worst)
data_var$smoothness_worst <- replace_outliers(data_var$smoothness_worst)
data_var$compactness_worst <- replace_outliers(data_var$compactness_worst)
data_var$concavity_worst <- replace_outliers(data_var$concavity_worst)
data_var$concave.points_worst <- replace_outliers(data_var$concave.points_worst)
data_var$symmetry_worst <- replace_outliers(data_var$symmetry_worst)
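The column-by-column calls above can also be written as a single `lapply` over the data frame. The sketch below is only an illustration: `cap_outliers` is a hypothetical helper that mirrors the capping rule described earlier, and the built-in `mtcars` columns stand in for `data_var`. Note that it caps every column it is given, which is slightly broader than the explicit list of calls above.

```r
# Hypothetical helper mirroring the capping rule described in the text
cap_outliers <- function(x, na.rm = TRUE) {
  qrts <- quantile(x, probs = c(0.25, 0.75), na.rm = na.rm, names = FALSE)
  caps <- quantile(x, probs = c(0.05, 0.95), na.rm = na.rm, names = FALSE)
  h <- 1.5 * (qrts[2] - qrts[1])           # whisker length: 1.5 * IQR
  x[which(x < qrts[1] - h)] <- caps[1]     # below lower whisker -> 5th percentile
  x[which(x > qrts[2] + h)] <- caps[2]     # above upper whisker -> 95th percentile
  x
}

# Illustrative data frame (built-in), standing in for data_var
df <- mtcars[, c("mpg", "hp", "wt", "qsec")]

# Apply the rule to every column; assigning into df[] keeps the data frame shape
df[] <- lapply(df, cap_outliers)

# Small check on a vector with one obvious outlier
v <- c(1:20, 1000)
capped <- cap_outliers(v)
```

Using `which()` for the indexing keeps the assignment safe when a column contains missing values.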
A seed is set.
set.seed(2018)
library(corrplot)
cor(data_var)
## radius_mean texture_mean perimeter_mean area_mean
## radius_mean 1.00000000 0.333768563 0.99746535 0.993332519
## texture_mean 0.33376856 1.000000000 0.34094427 0.343001580
## perimeter_mean 0.99746535 0.340944272 1.00000000 0.991450806
## area_mean 0.99333252 0.343001580 0.99145081 1.000000000
## smoothness_mean 0.16613701 -0.005972216 0.20335825 0.170815960
## compactness_mean 0.49768803 0.242677318 0.54877990 0.498715668
## concavity_mean 0.67402200 0.326599091 0.71374782 0.684270463
## concave.points_mean 0.80418961 0.304175168 0.83466039 0.810909295
## symmetry_mean 0.14495366 0.091591946 0.17923662 0.155082395
## fractal_dimension_mean -0.32220782 -0.064781678 -0.26964540 -0.303407288
## radius_se 0.67585052 0.324923667 0.68946986 0.709553294
## texture_se -0.11356719 0.422983238 -0.10458425 -0.087898037
## perimeter_se 0.67554203 0.337195035 0.69432336 0.707054681
## area_se 0.80040518 0.339638190 0.80964099 0.831377790
## smoothness_se -0.28629947 0.027542408 -0.26701928 -0.243455750
## compactness_se 0.23126066 0.226354384 0.27868133 0.241075746
## concavity_se 0.32774728 0.244162548 0.36851354 0.337608581
## concave.points_se 0.42452806 0.188294813 0.45730478 0.427872337
## symmetry_se -0.18296265 0.006935905 -0.16461220 -0.153721804
## fractal_dimension_se -0.01249584 0.115792221 0.03242500 0.008597376
## radius_worst 0.97300475 0.355625159 0.97401973 0.974071399
## texture_worst 0.30484292 0.906801131 0.31174473 0.308126243
## perimeter_worst 0.96606030 0.365727214 0.97242535 0.967257176
## area_worst 0.95786939 0.355199886 0.95920857 0.971475477
## smoothness_worst 0.13267495 0.079683802 0.16376463 0.142154866
## compactness_worst 0.45609714 0.272778738 0.50104761 0.447946862
## concavity_worst 0.56702510 0.311692282 0.60523518 0.565575668
## concave.points_worst 0.74391234 0.297646464 0.77310475 0.739347807
## symmetry_worst 0.19063984 0.107380091 0.21590882 0.187623516
## fractal_dimension_worst 0.01143701 0.118036583 0.05859202 0.014904028
## smoothness_mean compactness_mean concavity_mean
## radius_mean 0.166137006 0.49768803 0.67402200
## texture_mean -0.005972216 0.24267732 0.32659909
## perimeter_mean 0.203358251 0.54877990 0.71374782
## area_mean 0.170815960 0.49871567 0.68427046
## smoothness_mean 1.000000000 0.65887004 0.52629526
## compactness_mean 0.658870038 1.00000000 0.88993531
## concavity_mean 0.526295261 0.88993531 1.00000000
## concave.points_mean 0.561507233 0.83134376 0.92846135
## symmetry_mean 0.551348562 0.58234520 0.48253973
## fractal_dimension_mean 0.583862748 0.54855552 0.31370724
## radius_se 0.316839793 0.53103228 0.64005923
## texture_se 0.097608252 0.04073925 0.06155412
## perimeter_se 0.316829495 0.57748136 0.67321349
## area_se 0.276492223 0.53133275 0.67128381
## smoothness_se 0.344489796 0.12243847 0.06744569
## compactness_se 0.339053593 0.77817078 0.69525409
## concavity_se 0.309226306 0.72474243 0.77212387
## concave.points_se 0.419259182 0.70297190 0.71114374
## symmetry_se 0.148644407 0.13277751 0.07876289
## fractal_dimension_se 0.366883754 0.61762150 0.50050439
## radius_worst 0.227246454 0.54407142 0.70765327
## texture_worst 0.054539625 0.24573483 0.32194549
## perimeter_worst 0.249900841 0.59415210 0.74266754
## area_worst 0.227618632 0.53710142 0.70845249
## smoothness_worst 0.800757737 0.56484404 0.46989277
## compactness_worst 0.482702665 0.88200315 0.80282250
## concavity_worst 0.444025853 0.83331922 0.90731565
## concave.points_worst 0.512258451 0.82491388 0.88728707
## symmetry_worst 0.403887512 0.48875532 0.41521338
## fractal_dimension_worst 0.487057785 0.69226189 0.54298435
## concave.points_mean symmetry_mean
## radius_mean 0.804189608 0.14495366
## texture_mean 0.304175168 0.09159195
## perimeter_mean 0.834660394 0.17923662
## area_mean 0.810909295 0.15508240
## smoothness_mean 0.561507233 0.55134856
## compactness_mean 0.831343761 0.58234520
## concavity_mean 0.928461346 0.48253973
## concave.points_mean 1.000000000 0.45751099
## symmetry_mean 0.457510988 1.00000000
## fractal_dimension_mean 0.170648576 0.46304292
## radius_se 0.725692162 0.32984582
## texture_se 0.015494565 0.14414977
## perimeter_se 0.744721878 0.33320741
## area_se 0.777292123 0.27846203
## smoothness_se 0.005461134 0.20648819
## compactness_se 0.531826077 0.42018414
## concavity_se 0.585814564 0.36177637
## concave.points_se 0.678599440 0.37511588
## symmetry_se 0.022505900 0.39084369
## fractal_dimension_se 0.340670183 0.39403467
## radius_worst 0.830440842 0.19624965
## texture_worst 0.299144115 0.11112738
## perimeter_worst 0.852892438 0.22726538
## area_worst 0.826899258 0.19948912
## smoothness_worst 0.466239038 0.43013245
## compactness_worst 0.707309028 0.47375819
## concavity_worst 0.787789344 0.43645127
## concave.points_worst 0.918791350 0.42718608
## symmetry_worst 0.389031855 0.70758822
## fractal_dimension_worst 0.388921332 0.44821641
## fractal_dimension_mean radius_se texture_se
## radius_mean -0.322207815 0.675850516 -0.11356719
## texture_mean -0.064781678 0.324923667 0.42298324
## perimeter_mean -0.269645404 0.689469857 -0.10458425
## area_mean -0.303407288 0.709553294 -0.08789804
## smoothness_mean 0.583862748 0.316839793 0.09760825
## compactness_mean 0.548555519 0.531032284 0.04073925
## concavity_mean 0.313707241 0.640059233 0.06155412
## concave.points_mean 0.170648576 0.725692162 0.01549457
## symmetry_mean 0.463042921 0.329845821 0.14414977
## fractal_dimension_mean 1.000000000 0.009828685 0.16875952
## radius_se 0.009828685 1.000000000 0.24749700
## texture_se 0.168759516 0.247497000 1.00000000
## perimeter_se 0.042446603 0.968830153 0.24301857
## area_se -0.091186689 0.951676453 0.14588179
## smoothness_se 0.412528855 0.154228391 0.46039234
## compactness_se 0.539609971 0.405830493 0.23523157
## concavity_se 0.403415624 0.440636019 0.20986865
## concave.points_se 0.303575007 0.582591172 0.23888947
## symmetry_se 0.311261000 0.218366086 0.42885117
## fractal_dimension_se 0.711932490 0.295694551 0.30344083
## radius_worst -0.250903651 0.733499001 -0.11683611
## texture_worst -0.036638879 0.241637635 0.46441279
## perimeter_worst -0.200809161 0.733214853 -0.10755958
## area_worst -0.231863776 0.759618757 -0.09356314
## smoothness_worst 0.502454839 0.179201394 -0.03971119
## compactness_worst 0.450384065 0.364242740 -0.09663426
## concavity_worst 0.321575790 0.444296330 -0.06622163
## concave.points_worst 0.175884454 0.581234126 -0.10396139
## symmetry_worst 0.319493048 0.144517649 -0.13535425
## fractal_dimension_worst 0.752467599 0.090470233 -0.03159143
## perimeter_se area_se smoothness_se compactness_se
## radius_mean 0.6755420 0.80040518 -0.286299469 0.2312607
## texture_mean 0.3371950 0.33963819 0.027542408 0.2263544
## perimeter_mean 0.6943234 0.80964099 -0.267019276 0.2786813
## area_mean 0.7070547 0.83137779 -0.243455750 0.2410757
## smoothness_mean 0.3168295 0.27649222 0.344489796 0.3390536
## compactness_mean 0.5774814 0.53133275 0.122438466 0.7781708
## concavity_mean 0.6732135 0.67128381 0.067445691 0.6952541
## concave.points_mean 0.7447219 0.77729212 0.005461134 0.5318261
## symmetry_mean 0.3332074 0.27846203 0.206488195 0.4201841
## fractal_dimension_mean 0.0424466 -0.09118669 0.412528855 0.5396100
## radius_se 0.9688302 0.95167645 0.154228391 0.4058305
## texture_se 0.2430186 0.14588179 0.460392339 0.2352316
## perimeter_se 1.0000000 0.93042765 0.147009469 0.4774706
## area_se 0.9304277 1.00000000 0.025780361 0.3617208
## smoothness_se 0.1470095 0.02578036 1.000000000 0.3036438
## compactness_se 0.4774706 0.36172078 0.303643798 1.0000000
## concavity_se 0.5014484 0.41794250 0.256076935 0.8854517
## concave.points_se 0.6337872 0.54317733 0.325839005 0.7604251
## symmetry_se 0.2205236 0.08706032 0.461012120 0.3242693
## fractal_dimension_se 0.3388881 0.20959473 0.458977506 0.8281693
## radius_worst 0.7262819 0.84615382 -0.266911386 0.2464900
## texture_worst 0.2520989 0.27725866 -0.054683785 0.1669325
## perimeter_worst 0.7447576 0.84504736 -0.253416943 0.3043906
## area_worst 0.7509061 0.87098161 -0.224153975 0.2510187
## smoothness_worst 0.1737924 0.18212227 0.348520104 0.2573855
## compactness_worst 0.4228494 0.40444628 -0.056112230 0.7275143
## concavity_worst 0.4906278 0.49586555 -0.065792747 0.6670502
## concave.points_worst 0.6150970 0.65345520 -0.098728877 0.5355373
## symmetry_worst 0.1514968 0.16788087 -0.106334491 0.2586757
## fractal_dimension_worst 0.1246492 0.07286530 0.120158161 0.6158136
## concavity_se concave.points_se symmetry_se
## radius_mean 0.3277473 0.4245281 -0.182962654
## texture_mean 0.2441625 0.1882948 0.006935905
## perimeter_mean 0.3685135 0.4573048 -0.164612200
## area_mean 0.3376086 0.4278723 -0.153721804
## smoothness_mean 0.3092263 0.4192592 0.148644407
## compactness_mean 0.7247424 0.7029719 0.132777509
## concavity_mean 0.7721239 0.7111437 0.078762891
## concave.points_mean 0.5858146 0.6785994 0.022505900
## symmetry_mean 0.3617764 0.3751159 0.390843688
## fractal_dimension_mean 0.4034156 0.3035750 0.311261000
## radius_se 0.4406360 0.5825912 0.218366086
## texture_se 0.2098686 0.2388895 0.428851174
## perimeter_se 0.5014484 0.6337872 0.220523611
## area_se 0.4179425 0.5431773 0.087060324
## smoothness_se 0.2560769 0.3258390 0.461012120
## compactness_se 0.8854517 0.7604251 0.324269347
## concavity_se 1.0000000 0.7989200 0.250178804
## concave.points_se 0.7989200 1.0000000 0.272734387
## symmetry_se 0.2501788 0.2727344 1.000000000
## fractal_dimension_se 0.7027625 0.5945439 0.389641966
## radius_worst 0.3326466 0.4186309 -0.189616012
## texture_worst 0.1892518 0.1110552 -0.102164378
## perimeter_worst 0.3827335 0.4544369 -0.169198665
## area_worst 0.3352982 0.4153921 -0.164489060
## smoothness_worst 0.2469927 0.2664058 -0.067334485
## compactness_worst 0.6570193 0.5380887 -0.039477206
## concavity_worst 0.7541404 0.5875883 -0.056162772
## concave.points_worst 0.5863031 0.6633392 -0.103930996
## symmetry_worst 0.2133950 0.1425014 0.269917058
## fractal_dimension_worst 0.5038325 0.3316793 0.031368920
## fractal_dimension_se radius_worst texture_worst
## radius_mean -0.012495838 0.97300475 0.30484292
## texture_mean 0.115792221 0.35562516 0.90680113
## perimeter_mean 0.032425002 0.97401973 0.31174473
## area_mean 0.008597376 0.97407140 0.30812624
## smoothness_mean 0.366883754 0.22724645 0.05453963
## compactness_mean 0.617621503 0.54407142 0.24573483
## concavity_mean 0.500504389 0.70765327 0.32194549
## concave.points_mean 0.340670183 0.83044084 0.29914412
## symmetry_mean 0.394034668 0.19624965 0.11112738
## fractal_dimension_mean 0.711932490 -0.25090365 -0.03663888
## radius_se 0.295694551 0.73349900 0.24163764
## texture_se 0.303440832 -0.11683611 0.46441279
## perimeter_se 0.338888080 0.72628189 0.25209890
## area_se 0.209594727 0.84615382 0.27725866
## smoothness_se 0.458977506 -0.26691139 -0.05468379
## compactness_se 0.828169317 0.24649002 0.16693252
## concavity_se 0.702762493 0.33264658 0.18925175
## concave.points_se 0.594543946 0.41863087 0.11105524
## symmetry_se 0.389641966 -0.18961601 -0.10216438
## fractal_dimension_se 1.000000000 0.01384833 0.05194817
## radius_worst 0.013848326 1.00000000 0.35667314
## texture_worst 0.051948169 0.35667314 1.00000000
## perimeter_worst 0.062806500 0.99318265 0.36610900
## area_worst 0.032873160 0.99170436 0.34927512
## smoothness_worst 0.266369531 0.23719750 0.21614797
## compactness_worst 0.510513241 0.52818510 0.34122680
## concavity_worst 0.445355860 0.62614773 0.36804438
## concave.points_worst 0.301647791 0.79606365 0.35642920
## symmetry_worst 0.164601726 0.27907335 0.23936001
## fractal_dimension_worst 0.688976054 0.10515188 0.21471829
## perimeter_worst area_worst smoothness_worst
## radius_mean 0.9660603 0.95786939 0.13267495
## texture_mean 0.3657272 0.35519989 0.07968380
## perimeter_mean 0.9724254 0.95920857 0.16376463
## area_mean 0.9672572 0.97147548 0.14215487
## smoothness_mean 0.2499008 0.22761863 0.80075774
## compactness_mean 0.5941521 0.53710142 0.56484404
## concavity_mean 0.7426675 0.70845249 0.46989277
## concave.points_mean 0.8528924 0.82689926 0.46623904
## symmetry_mean 0.2272654 0.19948912 0.43013245
## fractal_dimension_mean -0.2008092 -0.23186378 0.50245484
## radius_se 0.7332149 0.75961876 0.17920139
## texture_se -0.1075596 -0.09356314 -0.03971119
## perimeter_se 0.7447576 0.75090611 0.17379239
## area_se 0.8450474 0.87098161 0.18212227
## smoothness_se -0.2534169 -0.22415398 0.34852010
## compactness_se 0.3043906 0.25101866 0.25738553
## concavity_se 0.3827335 0.33529823 0.24699269
## concave.points_se 0.4544369 0.41539208 0.26640576
## symmetry_se -0.1691987 -0.16448906 -0.06733449
## fractal_dimension_se 0.0628065 0.03287316 0.26636953
## radius_worst 0.9931827 0.99170436 0.23719750
## texture_worst 0.3661090 0.34927512 0.21614797
## perimeter_worst 1.0000000 0.98483132 0.25729775
## area_worst 0.9848313 1.00000000 0.23899867
## smoothness_worst 0.2572978 0.23899867 1.00000000
## compactness_worst 0.5819577 0.51106468 0.56931807
## concavity_worst 0.6692728 0.61392141 0.52684856
## concave.points_worst 0.8249048 0.78045867 0.54998997
## symmetry_worst 0.3053196 0.26252759 0.50889740
## fractal_dimension_worst 0.1523208 0.10219381 0.60114001
## compactness_worst concavity_worst concave.points_worst
## radius_mean 0.45609714 0.56702510 0.74391234
## texture_mean 0.27277874 0.31169228 0.29764646
## perimeter_mean 0.50104761 0.60523518 0.77310475
## area_mean 0.44794686 0.56557567 0.73934781
## smoothness_mean 0.48270266 0.44402585 0.51225845
## compactness_mean 0.88200315 0.83331922 0.82491388
## concavity_mean 0.80282250 0.90731565 0.88728707
## concave.points_mean 0.70730903 0.78778934 0.91879135
## symmetry_mean 0.47375819 0.43645127 0.42718608
## fractal_dimension_mean 0.45038407 0.32157579 0.17588445
## radius_se 0.36424274 0.44429633 0.58123413
## texture_se -0.09663426 -0.06622163 -0.10396139
## perimeter_se 0.42284945 0.49062777 0.61509699
## area_se 0.40444628 0.49586555 0.65345520
## smoothness_se -0.05611223 -0.06579275 -0.09872888
## compactness_se 0.72751426 0.66705021 0.53553732
## concavity_se 0.65701934 0.75414044 0.58630307
## concave.points_se 0.53808866 0.58758830 0.66333925
## symmetry_se -0.03947721 -0.05616277 -0.10393100
## fractal_dimension_se 0.51051324 0.44535586 0.30164779
## radius_worst 0.52818510 0.62614773 0.79606365
## texture_worst 0.34122680 0.36804438 0.35642920
## perimeter_worst 0.58195768 0.66927284 0.82490478
## area_worst 0.51106468 0.61392141 0.78045867
## smoothness_worst 0.56931807 0.52684856 0.54998997
## compactness_worst 1.00000000 0.90790640 0.82833879
## concavity_worst 0.90790640 1.00000000 0.87940442
## concave.points_worst 0.82833879 0.87940442 1.00000000
## symmetry_worst 0.59774079 0.52963659 0.50750722
## fractal_dimension_worst 0.77184344 0.66338519 0.51111415
## symmetry_worst fractal_dimension_worst
## radius_mean 0.1906398 0.01143701
## texture_mean 0.1073801 0.11803658
## perimeter_mean 0.2159088 0.05859202
## area_mean 0.1876235 0.01490403
## smoothness_mean 0.4038875 0.48705779
## compactness_mean 0.4887553 0.69226189
## concavity_mean 0.4152134 0.54298435
## concave.points_mean 0.3890319 0.38892133
## symmetry_mean 0.7075882 0.44821641
## fractal_dimension_mean 0.3194930 0.75246760
## radius_se 0.1445176 0.09047023
## texture_se -0.1353543 -0.03159143
## perimeter_se 0.1514968 0.12464916
## area_se 0.1678809 0.07286530
## smoothness_se -0.1063345 0.12015816
## compactness_se 0.2586757 0.61581361
## concavity_se 0.2133950 0.50383249
## concave.points_se 0.1425014 0.33167926
## symmetry_se 0.2699171 0.03136892
## fractal_dimension_se 0.1646017 0.68897605
## radius_worst 0.2790733 0.10515188
## texture_worst 0.2393600 0.21471829
## perimeter_worst 0.3053196 0.15232085
## area_worst 0.2625276 0.10219381
## smoothness_worst 0.5088974 0.60114001
## compactness_worst 0.5977408 0.77184344
## concavity_worst 0.5296366 0.66338519
## concave.points_worst 0.5075072 0.51111415
## symmetry_worst 1.0000000 0.52881130
## fractal_dimension_worst 0.5288113 1.00000000
corrplot(cor(data_var))
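A 30 x 30 correlation matrix is hard to read as printed output. One possible complement to the plot, sketched here on built-in `mtcars` columns rather than on `data_var`, is to list only the pairs whose absolute correlation exceeds a threshold. The 0.8 cutoff and the name `high_pairs` are assumptions chosen for the example.

```r
# Illustrative data (built-in), standing in for data_var
x <- mtcars[, c("mpg", "disp", "hp", "wt", "qsec")]
r <- cor(x)

# Look only at the upper triangle so that each pair is reported once
idx <- which(upper.tri(r) & abs(r) > 0.8, arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = r[idx]
)

# Strongest relationships first
high_pairs[order(-abs(high_pairs$r)), ]
```

Pairs with correlations close to 1 in absolute value are the redundant variables that PCA is expected to merge into shared components.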
Test 1
Hypotheses:
H0: the correlations between variables are equal to zero
H1: the correlations between variables are different from zero
library(psych)
We examine the p-value: if it is below the significance level of 0.05, H0 is rejected.
cortest(data_var)
## Tests of correlation matrices
## Call:cortest(R1 = data_var)
## Chi Square value 135780.1 with df = 435 with probability < 0
The reported probability is effectively 0, which is below 0.05, so H0 is rejected. In other words, there are significant correlations among the variables studied.
Test 2
bartlett.test(data_var)
##
## Bartlett test of homogeneity of variances
##
## data: data_var
## Bartlett's K-squared = 208467, df = 29, p-value < 2.2e-16
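The p-value (< 2.2e-16) is below 0.05, so H0 is rejected. Note that `bartlett.test` from base R is, as the output header states, a test of homogeneity of variances across the columns, so this rejection indicates that the variables have unequal variances. The assumption usually checked before PCA, that the correlation matrix differs from the identity matrix, corresponds to Bartlett's test of sphericity (`cortest.bartlett` in the psych package); the result of Test 1 already points in that direction.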
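The statement that the correlation matrix differs from the identity is what Bartlett's test of sphericity checks, which is a different test from the homogeneity-of-variances test printed above. The sketch below computes the sphericity statistic by hand in base R, using the standard formula -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom. `sphericity_test` is a hypothetical helper name, and the built-in `iris` measurements stand in for `data_var`.

```r
# Hypothetical helper: Bartlett's test of sphericity computed by hand
sphericity_test <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)   # number of observations
  p <- ncol(x)   # number of variables
  R <- cor(x)

  # log|R| computed directly on the log scale for numerical stability
  log_det <- as.numeric(determinant(R, logarithm = TRUE)$modulus)

  chisq <- -(n - 1 - (2 * p + 5) / 6) * log_det
  df <- p * (p - 1) / 2

  list(chisq = chisq, df = df,
       p.value = pchisq(chisq, df, lower.tail = FALSE))
}

# Illustrative data (built-in), standing in for data_var
res <- sphericity_test(iris[, 1:4])
res
```

A small p-value here means the off-diagonal correlations are jointly different from zero, which is the condition that makes PCA worthwhile.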
Test 3
We now run the Kaiser-Meyer-Olkin (KMO) test, which lets us assess whether the use of PCA is justified.
The rule is as follows: if the KMO value is greater than or equal to 0.5, the use of PCA is justified.
KMO(data_var)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = data_var)
## Overall MSA = 0.86
## MSA for each item =
## radius_mean texture_mean perimeter_mean
## 0.86 0.64 0.90
## area_mean smoothness_mean compactness_mean
## 0.90 0.84 0.94
## concavity_mean concave.points_mean symmetry_mean
## 0.92 0.93 0.79
## fractal_dimension_mean radius_se texture_se
## 0.92 0.85 0.47
## perimeter_se area_se smoothness_se
## 0.88 0.96 0.67
## compactness_se concavity_se concave.points_se
## 0.88 0.87 0.89
## symmetry_se fractal_dimension_se radius_worst
## 0.53 0.88 0.84
## texture_worst perimeter_worst area_worst
## 0.59 0.89 0.89
## smoothness_worst compactness_worst concavity_worst
## 0.77 0.88 0.88
## concave.points_worst symmetry_worst fractal_dimension_worst
## 0.91 0.67 0.88
Overall MSA = 0.86 >= 0.5, so the condition holds and the use of PCA is justified.
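For intuition about what the overall MSA measures, the sketch below computes it by hand as the ratio of squared correlations to squared correlations plus squared partial correlations, taken over the off-diagonal entries. This is a minimal illustration under the usual textbook definition, not the psych implementation; `kmo_overall` is a hypothetical helper name and the built-in `mtcars` columns stand in for `data_var`.

```r
# Hypothetical helper: overall KMO (measure of sampling adequacy) by hand
kmo_overall <- function(x) {
  R <- cor(as.matrix(x))
  Rinv <- solve(R)

  # Partial correlations from the inverse correlation matrix:
  # p_ij = -Rinv_ij / sqrt(Rinv_ii * Rinv_jj)
  P <- -Rinv / sqrt(outer(diag(Rinv), diag(Rinv)))

  off <- row(R) != col(R)   # off-diagonal entries only
  sum(R[off]^2) / (sum(R[off]^2) + sum(P[off]^2))
}

# Illustrative data (built-in), standing in for data_var
kmo <- kmo_overall(mtcars[, c("mpg", "disp", "hp", "wt", "qsec")])
kmo
```

Values near 1 mean the partial correlations are small relative to the raw correlations, so the variables share enough common variance to be summarised by a few components.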
The preliminary tests are satisfied, so we proceed with the PCA.
pca <- prcomp(data_var, center = TRUE,
              scale. = TRUE)
pca
## Standard deviations (1, .., p=30):
## [1] 3.74289107 2.36190685 1.65704734 1.38111718 1.26553770 1.07313219
## [7] 0.77981962 0.66078938 0.59281321 0.55604237 0.52872856 0.46723675
## [13] 0.41305156 0.37990028 0.31908018 0.25289457 0.23058952 0.22085332
## [19] 0.20425114 0.19441575 0.18353131 0.15455616 0.14462946 0.14217843
## [25] 0.13052403 0.10603961 0.09475057 0.05156586 0.03530385 0.02006342
##
## Rotation (n x k) = (30 x 30):
## PC1 PC2 PC3 PC4
## radius_mean -0.211842390 0.24156980 0.01072350 -0.04435498
## texture_mean -0.106172696 0.05375799 -0.18459065 0.57523302
## perimeter_mean -0.220520774 0.22184069 0.01374533 -0.04440885
## area_mean -0.214529913 0.23647961 -0.01217047 -0.04862866
## smoothness_mean -0.138992303 -0.19425890 0.10984386 -0.09482556
## compactness_mean -0.233821115 -0.15380581 0.07969792 -0.03228401
## concavity_mean -0.251644921 -0.05498377 0.02591704 -0.01645214
## concave.points_mean -0.254248452 0.02945602 0.02565609 -0.06712488
## symmetry_mean -0.132429040 -0.18916892 0.04423381 -0.02496103
## fractal_dimension_mean -0.059013428 -0.36934177 0.05920899 -0.02197684
## radius_se -0.205196940 0.08470080 -0.26157604 -0.12560526
## texture_se -0.018978657 -0.11000488 -0.44479529 0.29092769
## perimeter_se -0.212271896 0.06902213 -0.25685510 -0.12042615
## area_se -0.216614574 0.14155617 -0.19051139 -0.10523364
## smoothness_se -0.007855442 -0.23828173 -0.30741440 -0.08028699
## compactness_se -0.175721522 -0.22513817 -0.12109833 -0.03566960
## concavity_se -0.185279711 -0.16881507 -0.12444795 -0.04301721
## concave.points_se -0.191849047 -0.11540670 -0.19085816 -0.16070441
## symmetry_se -0.016653169 -0.18946053 -0.32124048 -0.12428698
## fractal_dimension_se -0.120436643 -0.29454926 -0.15242588 -0.05946485
## radius_worst -0.224320568 0.22042649 0.03629719 -0.01529300
## texture_worst -0.105091373 0.04216131 -0.08395817 0.64499571
## perimeter_worst -0.232231589 0.19914909 0.03927971 -0.01078620
## area_worst -0.224110112 0.21546095 0.01270754 -0.02482157
## smoothness_worst -0.128250716 -0.17625369 0.23672761 0.07008853
## compactness_worst -0.215649212 -0.12943042 0.21340747 0.09776673
## concavity_worst -0.230450442 -0.07627579 0.16711366 0.08440241
## concave.points_worst -0.247709007 0.01138582 0.15090704 0.01215740
## symmetry_worst -0.121917736 -0.12103730 0.26522136 0.11364176
## fractal_dimension_worst -0.130806798 -0.27433054 0.22698695 0.12195635
## PC5 PC6 PC7 PC8
## radius_mean 0.012621691 -0.0077002735 0.079433410 -0.014103993
## texture_mean 0.017240201 0.0045068893 -0.002459530 0.107648733
## perimeter_mean 0.017666144 -0.0079779012 0.071913673 -0.027454038
## area_mean -0.004838251 -0.0008891651 0.043367400 0.031458698
## smoothness_mean -0.375512697 0.3125307663 0.093812543 -0.347861496
## compactness_mean 0.050237462 0.0143289251 -0.015468919 -0.183458110
## concavity_mean 0.084074139 0.0170240079 0.125549209 -0.076933613
## concave.points_mean -0.056253928 0.0655331940 0.098246162 -0.162274740
## symmetry_mean -0.344374831 -0.3751681099 0.076561915 -0.259506211
## fractal_dimension_mean -0.028021197 0.0850444261 -0.342944589 -0.178397703
## radius_se -0.152306542 0.0101571265 -0.322929761 0.039664577
## texture_se -0.123590471 0.0527695973 0.065739779 -0.404002707
## perimeter_se -0.095568737 -0.0025709810 -0.289315651 0.020474232
## area_se -0.117761562 0.0160423627 -0.271030538 0.061132580
## smoothness_se -0.206475087 0.3397086831 0.216784018 0.539206501
## compactness_se 0.320217297 -0.0914691290 0.029362950 0.056277724
## concavity_se 0.340403756 -0.0472863344 0.264300393 0.052199510
## concave.points_se 0.165606961 0.0648265560 0.361198922 -0.130738811
## symmetry_se -0.217711838 -0.4667945645 0.134357881 0.208029686
## fractal_dimension_se 0.227110586 0.0023166122 -0.296683067 0.079636362
## radius_worst -0.039934261 0.0061643082 -0.042880731 0.066268276
## texture_worst -0.039982498 0.0308715599 0.007010371 -0.005643547
## perimeter_worst -0.019399266 -0.0063213346 -0.040163773 0.058876704
## area_worst -0.054303762 0.0180556827 -0.085736371 0.103996784
## smoothness_worst -0.336483456 0.3899813297 0.102668483 0.268384665
## compactness_worst 0.148847539 -0.0582655672 -0.047910331 0.084319447
## concavity_worst 0.166567755 -0.0323000863 0.131785749 0.080112131
## concave.points_worst 0.017005506 0.0444740319 0.164168651 -0.026615184
## symmetry_worst -0.305943001 -0.4877783518 0.063820137 0.224889304
## fractal_dimension_worst 0.116660254 0.0189583967 -0.369605560 0.111994232
## PC9 PC10 PC11 PC12
## radius_mean 0.24656798 -0.160171159 0.0178511810 0.063204307
## texture_mean -0.02499843 -0.119226764 -0.5337833257 -0.097344751
## perimeter_mean 0.24476425 -0.150566075 -0.0006470287 0.042268733
## area_mean 0.24033068 -0.160806570 0.0384012691 0.006041808
## smoothness_mean 0.14804225 0.133388356 -0.2121949051 -0.044580408
## compactness_mean 0.08700211 -0.005356641 -0.2040394486 -0.036623449
## concavity_mean 0.05519674 0.048051260 0.0228757134 -0.416643340
## concave.points_mean 0.11212399 0.010314805 -0.1041993158 -0.090665152
## symmetry_mean -0.31335612 -0.621986742 0.0242319064 -0.112156186
## fractal_dimension_mean 0.16976445 0.046032881 -0.1685082204 -0.107691310
## radius_se -0.28851085 0.148295393 -0.0140983366 -0.060479248
## texture_se 0.15660351 0.092670577 0.6079333722 0.094199661
## perimeter_se -0.30653640 0.178423952 -0.0620504248 -0.019536235
## area_se -0.18291532 0.072800966 0.0484992637 -0.032740051
## smoothness_se 0.01722871 -0.310790523 0.0659324205 -0.076853816
## compactness_se -0.01351116 -0.074756009 -0.0150999064 0.108962750
## concavity_se -0.16449663 0.054115901 0.1093949464 -0.337488382
## concave.points_se -0.25915721 0.070191954 -0.1943597837 0.566267954
## symmetry_se 0.43368161 0.366373264 -0.1826072893 -0.043135600
## fractal_dimension_se 0.25040965 -0.307105997 0.0416455171 0.186788678
## radius_worst 0.10010291 -0.046271347 0.0702802343 0.060049733
## texture_worst -0.04322910 0.052582076 -0.0188181530 0.077609586
## perimeter_worst 0.09018193 -0.031602620 0.0559307629 0.055187060
## area_worst 0.09434445 -0.053207728 0.0903113081 -0.002538133
## smoothness_worst -0.05330843 0.102985638 0.1233912204 0.069108475
## compactness_worst -0.05358964 0.116662183 0.0636626516 0.097411450
## concavity_worst -0.10980806 0.170592019 0.2094152209 -0.354778008
## concave.points_worst -0.08750940 0.142814548 -0.0016261104 0.242483996
## symmetry_worst -0.06715095 0.113339495 0.1315672575 0.192715257
## fractal_dimension_worst 0.09623668 -0.028114838 0.1544485516 0.153508932
## PC13 PC14 PC15 PC16
## radius_mean -0.044724072 -0.061567582 0.085331746 -0.091914923
## texture_mean 0.013384449 -0.071890969 -0.043169466 -0.145041576
## perimeter_mean -0.053622625 -0.026276006 0.086858346 -0.067591922
## area_mean -0.007104358 -0.049118820 0.125634334 -0.057760431
## smoothness_mean -0.259488247 -0.375774047 -0.191841562 -0.018265680
## compactness_mean -0.246096028 0.316545572 -0.014833499 0.074540583
## concavity_mean 0.259835478 0.124319684 -0.184463333 0.227298156
## concave.points_mean 0.269476626 0.217970846 -0.203695390 0.372630819
## symmetry_mean 0.008818221 0.021882132 -0.040915584 -0.236171375
## fractal_dimension_mean 0.185710818 0.158208666 0.708645461 -0.003107437
## radius_se 0.007288476 -0.026998203 -0.090593168 -0.026380859
## texture_se -0.047188376 0.113274797 0.024164060 -0.047905488
## perimeter_se -0.073582318 0.056390292 -0.068886850 -0.009168764
## area_se -0.005138938 -0.087589758 0.067367052 0.112924471
## smoothness_se 0.034398230 0.320937940 0.043525876 0.047027511
## compactness_se -0.500694839 0.044702585 -0.008819538 0.216123272
## concavity_se -0.031719288 -0.436959659 0.277248843 0.003980901
## concave.points_se 0.256686025 -0.055445568 0.141935965 -0.160954703
## symmetry_se 0.033160130 0.005299244 -0.106885527 -0.267017167
## fractal_dimension_se 0.114132741 -0.350239752 -0.285406092 0.244066399
## radius_worst -0.020859720 -0.046545894 0.086010550 -0.070679210
## texture_worst 0.030995052 -0.049200225 0.050343311 0.187372900
## perimeter_worst -0.063606473 0.005055075 0.097329911 -0.075995873
## area_worst 0.009206383 -0.026952457 0.131466257 -0.046266432
## smoothness_worst -0.099674624 -0.198978693 0.049230107 -0.062449845
## compactness_worst -0.413559136 0.336590481 -0.085377601 -0.183832013
## concavity_worst 0.109402347 -0.023313884 -0.087447530 -0.279781713
## concave.points_worst 0.268814041 0.193567113 -0.109700525 0.023902749
## symmetry_worst 0.019081856 -0.120554840 0.167453763 0.445251195
## fractal_dimension_worst 0.283889991 -0.034204156 -0.216950820 -0.355058173
## PC17 PC18 PC19 PC20
## radius_mean -0.036317037 -0.073046571 -0.14719975 -0.025966924
## texture_mean 0.045078101 -0.018724074 0.07584813 0.112601706
## perimeter_mean -0.038056759 -0.065159481 -0.17671295 -0.024507108
## area_mean -0.054077010 -0.014973000 -0.01888668 -0.054607489
## smoothness_mean 0.125372795 -0.379676318 0.19629795 -0.106135993
## compactness_mean -0.290255517 -0.020394665 -0.38810323 0.442877405
## concavity_mean -0.104555877 0.006130133 -0.03122774 0.035482231
## concave.points_mean -0.083424196 0.109205497 0.05534815 -0.197148159
## symmetry_mean -0.002635704 0.181200910 0.04957807 -0.073579725
## fractal_dimension_mean 0.175025758 0.065402214 0.08112002 -0.075059702
## radius_se -0.094324607 -0.034729859 -0.05607451 0.124574338
## texture_se 0.006475273 -0.013192017 -0.01917205 0.083137356
## perimeter_se 0.270821513 -0.037086179 -0.45642838 -0.388103451
## area_se -0.280988257 0.026292567 0.49322016 0.326135775
## smoothness_se 0.059907602 -0.330547963 0.03708732 -0.022054102
## compactness_se -0.158734155 0.160729846 0.27533620 -0.416953776
## concavity_se -0.166068595 -0.073869570 -0.12662217 -0.059570641
## concave.points_se -0.033393027 -0.086802239 -0.02186898 0.146791707
## symmetry_se -0.057994658 0.201350403 0.04029129 -0.041838650
## fractal_dimension_se 0.375244858 0.167160129 -0.08090806 0.265870669
## radius_worst 0.045757151 -0.020844228 0.05342835 0.005487339
## texture_worst -0.046909442 0.024870593 -0.05250541 -0.155607106
## perimeter_worst 0.164897476 -0.002299707 -0.05710063 -0.075900865
## area_worst 0.039966213 0.056444976 0.23364859 -0.010488508
## smoothness_worst -0.155635802 0.587474256 -0.20425526 0.049098768
## compactness_worst 0.216210169 -0.073245702 0.11882451 0.169647488
## concavity_worst 0.361658244 -0.043980365 0.08749700 0.148933599
## concave.points_worst 0.186084157 0.177397759 0.21162043 -0.140699496
## symmetry_worst 0.041612402 -0.322876090 -0.09572376 0.096536806
## fractal_dimension_worst -0.457908858 -0.289468007 -0.02974475 -0.246153443
## PC21 PC22 PC23 PC24
## radius_mean -0.232015941 0.099234357 0.206506014 0.0913006921
## texture_mean -0.284524593 -0.019403207 -0.229411153 -0.3365259248
## perimeter_mean -0.244988543 0.140085057 0.165617812 0.1183261331
## area_mean -0.183405270 0.035385858 0.051439950 0.0910575245
## smoothness_mean 0.079526386 -0.002545800 0.025738916 0.0251316054
## compactness_mean 0.332764999 0.065908922 -0.046229380 -0.1007000360
## concavity_mean -0.089894898 -0.428459422 -0.106141486 0.2066123970
## concave.points_mean -0.147707419 0.178016680 -0.071052527 -0.1001787112
## symmetry_mean 0.077083650 0.026455615 0.024192439 0.0692685204
## fractal_dimension_mean -0.088022536 -0.022002799 0.086437947 0.0226388952
## radius_se -0.080301020 -0.206021064 0.623298983 -0.2988369658
## texture_se -0.140434341 -0.009808999 -0.139671790 -0.2052001320
## perimeter_se -0.041551446 0.094570893 -0.318970528 0.1497332592
## area_se -0.076640133 0.322066797 -0.221191499 0.3169313274
## smoothness_se 0.085029127 0.065643445 0.051879018 0.0143609381
## compactness_se -0.122307718 -0.242069196 0.066535874 -0.0556231341
## concavity_se 0.166900111 0.332324281 -0.020686477 -0.2031028188
## concave.points_se -0.061225767 -0.264501382 -0.077232166 0.1922970102
## symmetry_se 0.120412203 0.044180958 0.035205436 0.0998814495
## fractal_dimension_se 0.056376352 0.046725545 0.022905565 0.0366193128
## radius_worst 0.247949664 -0.226063345 0.010715246 -0.1764486670
## texture_worst 0.377565475 0.028506437 0.322672725 0.4740689751
## perimeter_worst 0.246207286 -0.010251102 -0.252960270 -0.0612705262
## area_worst 0.352751552 -0.326813902 -0.213419409 -0.1517004201
## smoothness_worst -0.175163961 -0.064803545 -0.068625222 -0.0005696221
## compactness_worst -0.135983004 0.132063233 0.009555864 0.1403253603
## concavity_worst -0.099152855 -0.093128597 0.105293786 0.1313027052
## concave.points_worst 0.166095151 0.388528098 0.175422451 -0.3199084880
## symmetry_worst -0.173143223 -0.070909375 -0.070307314 -0.1310276037
## fractal_dimension_worst -0.003652793 0.017871237 -0.090297491 -0.0332870912
## PC25 PC26 PC27 PC28
## radius_mean 0.057921543 0.020469033 0.134382630 -0.279788963
## texture_mean 0.040645831 -0.014857017 0.037688679 -0.012279581
## perimeter_mean 0.031642586 0.053626687 0.181547516 -0.114934061
## area_mean 0.183110740 -0.285829689 -0.468772162 0.408755352
## smoothness_mean 0.037126772 -0.043056208 0.003427039 0.020500125
## compactness_mean 0.265320923 0.126881693 -0.120890534 -0.012278703
## concavity_mean 0.052455964 -0.402054720 0.361107423 0.038552143
## concave.points_mean -0.463556000 0.348785956 -0.293772627 -0.028380783
## symmetry_mean -0.043488106 -0.017757923 0.016690257 -0.004753927
## fractal_dimension_mean -0.007413677 0.005002139 0.046812241 -0.012669344
## radius_se -0.141360855 -0.024575936 -0.010240670 0.192418400
## texture_se 0.026171521 -0.008671482 0.013516189 -0.008572355
## perimeter_se 0.142592889 -0.072761674 -0.072984773 -0.176731019
## area_se 0.126110528 0.085739405 0.114008680 -0.044238407
## smoothness_se 0.025831170 0.029307890 0.015806801 -0.008497734
## compactness_se 0.233635165 0.178417463 0.037393341 0.022459213
## concavity_se -0.220437738 -0.149763123 -0.009544599 -0.027397394
## concave.points_se -0.095372021 0.050834866 -0.054486239 0.029773747
## symmetry_se -0.045064047 -0.001649603 0.010293989 -0.009327825
## fractal_dimension_se -0.024467360 -0.040300491 -0.011520204 -0.007228536
## radius_worst -0.158017838 0.181407105 0.254862466 -0.437858240
## texture_worst -0.058684936 0.019410386 -0.057764188 0.016000899
## perimeter_worst -0.146035070 0.256807841 0.385693693 0.659628823
## area_worst -0.026874706 -0.218137976 -0.422170071 -0.192344355
## smoothness_worst -0.032266875 0.003431287 0.013720311 -0.005738992
## compactness_worst -0.481580485 -0.326511659 -0.012533702 -0.029168224
## concavity_worst 0.268413983 0.465306806 -0.214129944 -0.010415461
## concave.points_worst 0.368311248 -0.263684363 0.168622076 -0.018861110
## symmetry_worst 0.074007503 0.014196148 -0.022864661 0.003272490
## fractal_dimension_worst 0.005676307 0.029220190 -0.005097350 0.012142839
## PC29 PC30
## radius_mean -0.0273038224 -0.7288048295
## texture_mean 0.0008313481 -0.0009068040
## perimeter_mean -0.4788483268 0.5964157597
## area_mean 0.4612104085 0.1372911682
## smoothness_mean -0.0028006234 0.0050942919
## compactness_mean 0.0167379047 -0.0147489225
## concavity_mean 0.0173247793 -0.0116804376
## concave.points_mean 0.0078183793 -0.0232073046
## symmetry_mean 0.0075963138 0.0033172523
## fractal_dimension_mean 0.0122407466 -0.0099607853
## radius_se -0.0702795575 -0.0217254860
## texture_se -0.0005125859 0.0026002717
## perimeter_se 0.0457789559 0.0139637116
## area_se 0.0267314831 -0.0027774688
## smoothness_se -0.0028538511 -0.0003150268
## compactness_se -0.0102948571 -0.0037929134
## concavity_se 0.0037361205 0.0006432705
## concave.points_se -0.0047960184 0.0034437940
## symmetry_se 0.0045938779 0.0012387126
## fractal_dimension_se 0.0034592228 0.0049532491
## radius_worst 0.5722381807 0.2544036736
## texture_worst -0.0028622058 -0.0018180606
## perimeter_worst -0.0977843021 -0.1386929974
## area_worst -0.4580475947 -0.0915326236
## smoothness_worst -0.0016690032 -0.0028329368
## compactness_worst 0.0314242337 -0.0019703336
## concavity_worst -0.0143448097 0.0031170729
## concave.points_worst -0.0048698113 0.0120322512
## symmetry_worst -0.0159307871 -0.0037460451
## fractal_dimension_worst -0.0120277777 -0.0003723341
scree(data_var)
According to the scree plot, the first 6 principal components are enough to build a new dataset with a reduced number of variables.
Reduced dataset.
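A numeric check can complement the visual reading of the scree plot. The sketch below computes the proportion of variance explained by each component and the smallest number of components that reaches a chosen threshold. It is only an illustration: `pca_demo` is a hypothetical stand-in for the report's `pca` object, fitted on the built-in `USArrests` data so that the block runs on its own, and the 90% threshold is an assumption, not a value taken from this analysis.

```r
# Hypothetical stand-in for the report's `pca` object (built-in data).
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Component variances are the squared standard deviations.
var_explained <- pca_demo$sdev^2 / sum(pca_demo$sdev^2)
cum_var <- cumsum(var_explained)

# Smallest number of components reaching the assumed 90% threshold.
threshold <- 0.90
n_comp <- which(cum_var >= threshold)[1]

print(round(cum_var, 3))
print(n_comp)
```

Applied to the real `pca` object, the same two lines (`pca$sdev^2 / sum(pca$sdev^2)` and `cumsum()`) would show how much variance the first 6 components retain.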
data_comp <- pca$x
comp_data <- data_comp[,1:6]
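The scores stored in `pca$x` are the preprocessed data projected onto the loadings (the rotation matrix printed above). The sketch below checks that relationship on the built-in `USArrests` data. It assumes centering and scaling were enabled in `prcomp()`; whether the report's `pca` object was fitted with those options is not shown in this section.

```r
# Hypothetical example on built-in data, centered and scaled.
pca_demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Scores = standardized data %*% loadings.
manual_scores <- scale(USArrests) %*% pca_demo$rotation

# Compare with the scores returned by prcomp(), ignoring attributes.
same_scores <- isTRUE(all.equal(manual_scores, pca_demo$x,
                                check.attributes = FALSE))
print(same_scores)

# Keeping the first k columns keeps the first k components.
k <- 2
reduced <- pca_demo$x[, 1:k]
print(dim(reduced))
```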
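We add the target variable.
# data.frame() keeps the scores numeric and names the label column;
# cbind() on a numeric matrix would coerce the values or drop the labels.
new_data <- data.frame(comp_data, diagnosis = data$diagnosis)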
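When a label column is attached to a matrix of scores, the choice between `cbind()` and `data.frame()` matters. With a character label, `cbind()` returns a character matrix, so the scores are no longer numeric; with a factor label it stores the integer codes instead of the labels. A minimal sketch with made-up values (the numbers and the names `scores` and `label` are illustrative only):

```r
# Made-up scores for two observations and two components.
scores <- matrix(c(1.5, -0.3, 0.8, 2.1), nrow = 2,
                 dimnames = list(NULL, c("PC1", "PC2")))
label <- c("M", "B")

# cbind() must fit everything into one matrix type: character here.
as_matrix <- cbind(scores, label)

# data.frame() keeps one type per column.
as_frame <- data.frame(scores, diagnosis = label)

print(is.character(as_matrix))
print(sapply(as_frame, class))
```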
write.csv(new_data, file = "dataCancerFinal2.csv")
getwd()
## [1] "D:/VII CICLO/MINERIA DE DATOS/FINAL/R-MARKDOWN"
The resulting dataset, with the summarized variables, can be viewed at: https://raw.githubusercontent.com/cnahuina/data-mineria/master/dataCancerFinal2.csv
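To confirm that an exported file has the expected columns, it can be read back and inspected. The sketch below does this with a temporary file and made-up values, so it does not touch the real export. Note that `write.csv()` writes row names as an extra unnamed first column by default; `row.names = FALSE` is used here as an optional choice to avoid that.

```r
# Made-up reduced dataset (illustrative values only).
demo <- data.frame(PC1 = c(1.5, -0.3), PC2 = c(0.8, 2.1),
                   diagnosis = c("M", "B"))

# Write to a temporary file without the row-name column.
tmp <- tempfile(fileext = ".csv")
write.csv(demo, file = tmp, row.names = FALSE)

# Read it back and inspect the structure.
back <- read.csv(tmp)
str(back)

unlink(tmp)
```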