PCA (principal component analysis)

Principal component analysis (PCA) is an unsupervised learning technique that is often applied as part of exploratory data analysis. One of its uses is dimensionality reduction: shrinking the number of variables while losing as little information (variance) as possible. When a dataset has many quantitative variables that may be correlated (a sign of redundant information), PCA reduces them to a smaller number of transformed variables (the principal components) that explain most of the variability in the data.
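
To make the idea of redundancy concrete, here is a minimal sketch on simulated data. The variables x1, x2, x3 and the seed are made up for illustration and are not part of the breast-cancer dataset. Two of the three variables are almost copies of each other, so two components should be enough to describe nearly all of the variance.

```r
# Simulated example: x2 is nearly a copy of x1, x3 is independent noise.
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
toy <- data.frame(x1, x2, x3)

# PCA on centered and standardized variables
toy_pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Proportion of total variance explained by each component, and its running sum
prop_var <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
cum_var  <- cumsum(prop_var)
print(round(rbind(prop_var, cum_var), 3))
```

The third component carries almost no variance here, which is the situation PCA exploits when variables are strongly correlated.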

This evaluation and modification of the dataset belongs to the CRISP-DM methodology, specifically to the step of constructing new data.

data <- read.csv("https://raw.githubusercontent.com/cnahuina/data-mineria/master/breast-cancer.csv")

A general cleaning of the data is carried out beforehand.

As shown earlier, the last column is left out, and so is the identifier. Selecting columns 3 to 32 keeps the 30 numeric variables used in the analysis.

data_var <- data[,3:32]

Cleaning the data

  1. A function, here called replace_outliers, is used to cap the outlier values in the data.

Extreme values are capped: every observation that falls beyond the boxplot whiskers, that is, more than 1.5 times the interquartile range below the first quartile or above the third quartile, is located and replaced. Values below the lower whisker are replaced with the 5th percentile, and values above the upper whisker are replaced with the 95th percentile.

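# Cap outliers: values more than 1.5 * IQR beyond the quartiles are replaced
# with the 5th percentile (low side) or the 95th percentile (high side).
replace_outliers <- function(x, removeNA = TRUE){
  qrts <- quantile(x, probs = c(0.25, 0.75), na.rm = removeNA)  # Q1 and Q3
  caps <- quantile(x, probs = c(0.05, 0.95), na.rm = removeNA)  # replacement values
  iqr <- qrts[2] - qrts[1]
  h <- 1.5 * iqr                                                # whisker length
  x[x < qrts[1] - h] <- caps[1]
  x[x > qrts[2] + h] <- caps[2]
  x
}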
data_var$radius_mean <- replace_outliers(data_var$radius_mean)
data_var$texture_mean  <- replace_outliers(data_var$texture_mean)
data_var$perimeter_mean  <- replace_outliers(data_var$perimeter_mean)
data_var$area_mean  <- replace_outliers(data_var$area_mean)
data_var$smoothness_mean  <- replace_outliers(data_var$smoothness_mean )
data_var$compactness_mean  <- replace_outliers(data_var$compactness_mean)
data_var$concavity_mean  <- replace_outliers(data_var$concavity_mean)
data_var$concave.points_mean  <- replace_outliers(data_var$concave.points_mean)
data_var$symmetry_mean <- replace_outliers(data_var$symmetry_mean)
data_var$fractal_dimension_mean  <- replace_outliers(data_var$fractal_dimension_mean)
data_var$radius_se <- replace_outliers(data_var$radius_se)
data_var$texture_se  <- replace_outliers(data_var$texture_se )
data_var$perimeter_se  <- replace_outliers(data_var$perimeter_se)
data_var$area_se <- replace_outliers(data_var$area_se)
data_var$smoothness_se  <- replace_outliers(data_var$smoothness_se)
data_var$compactness_se  <- replace_outliers(data_var$compactness_se)
data_var$concavity_se <- replace_outliers(data_var$concavity_se)
data_var$concave.points_se <- replace_outliers(data_var$concave.points_se)
data_var$symmetry_se <- replace_outliers(data_var$symmetry_se)
data_var$fractal_dimension_se <- replace_outliers(data_var$fractal_dimension_se)
data_var$radius_worst <- replace_outliers(data_var$radius_worst)
data_var$texture_worst <- replace_outliers(data_var$texture_worst )
data_var$perimeter_worst  <- replace_outliers(data_var$perimeter_worst)
data_var$area_worst  <- replace_outliers(data_var$area_worst)
data_var$smoothness_worst <- replace_outliers(data_var$smoothness_worst)
data_var$compactness_worst <- replace_outliers(data_var$compactness_worst)
data_var$concavity_worst <- replace_outliers(data_var$concavity_worst)
data_var$concave.points_worst <- replace_outliers(data_var$concave.points_worst)
data_var$symmetry_worst <- replace_outliers(data_var$symmetry_worst)
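
The column-by-column calls above could also be written as a single pass over the data frame. The following is a minimal sketch, assuming every column is numeric. The toy data frame is invented for illustration, and replace_outliers is repeated so the snippet runs on its own. Note that the calls above do not cap fractal_dimension_worst, so applying the function to every column of data_var would not exactly reproduce the results shown in this document.

```r
replace_outliers <- function(x, removeNA = TRUE) {
  qrts <- quantile(x, probs = c(0.25, 0.75), na.rm = removeNA)
  caps <- quantile(x, probs = c(0.05, 0.95), na.rm = removeNA)
  h <- 1.5 * (qrts[2] - qrts[1])
  x[x < qrts[1] - h] <- caps[1]
  x[x > qrts[2] + h] <- caps[2]
  x
}

# Toy data (hypothetical): column a has one extreme value, column b has none
toy <- data.frame(a = c(1:19, 100), b = as.numeric(1:20))

# Apply the capping function to every column at once;
# assigning into capped[] keeps the data frame structure
capped <- toy
capped[] <- lapply(toy, replace_outliers)

print(range(toy$a))     # range before capping
print(range(capped$a))  # range after capping
```

Only the extreme value in column a is changed; column b, which has no values beyond the whiskers, is returned as it was.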

A seed is set

set.seed(2018)
library(corrplot)
cor(data_var)
##                         radius_mean texture_mean perimeter_mean    area_mean
## radius_mean              1.00000000  0.333768563     0.99746535  0.993332519
## texture_mean             0.33376856  1.000000000     0.34094427  0.343001580
## perimeter_mean           0.99746535  0.340944272     1.00000000  0.991450806
## area_mean                0.99333252  0.343001580     0.99145081  1.000000000
## smoothness_mean          0.16613701 -0.005972216     0.20335825  0.170815960
## compactness_mean         0.49768803  0.242677318     0.54877990  0.498715668
## concavity_mean           0.67402200  0.326599091     0.71374782  0.684270463
## concave.points_mean      0.80418961  0.304175168     0.83466039  0.810909295
## symmetry_mean            0.14495366  0.091591946     0.17923662  0.155082395
## fractal_dimension_mean  -0.32220782 -0.064781678    -0.26964540 -0.303407288
## radius_se                0.67585052  0.324923667     0.68946986  0.709553294
## texture_se              -0.11356719  0.422983238    -0.10458425 -0.087898037
## perimeter_se             0.67554203  0.337195035     0.69432336  0.707054681
## area_se                  0.80040518  0.339638190     0.80964099  0.831377790
## smoothness_se           -0.28629947  0.027542408    -0.26701928 -0.243455750
## compactness_se           0.23126066  0.226354384     0.27868133  0.241075746
## concavity_se             0.32774728  0.244162548     0.36851354  0.337608581
## concave.points_se        0.42452806  0.188294813     0.45730478  0.427872337
## symmetry_se             -0.18296265  0.006935905    -0.16461220 -0.153721804
## fractal_dimension_se    -0.01249584  0.115792221     0.03242500  0.008597376
## radius_worst             0.97300475  0.355625159     0.97401973  0.974071399
## texture_worst            0.30484292  0.906801131     0.31174473  0.308126243
## perimeter_worst          0.96606030  0.365727214     0.97242535  0.967257176
## area_worst               0.95786939  0.355199886     0.95920857  0.971475477
## smoothness_worst         0.13267495  0.079683802     0.16376463  0.142154866
## compactness_worst        0.45609714  0.272778738     0.50104761  0.447946862
## concavity_worst          0.56702510  0.311692282     0.60523518  0.565575668
## concave.points_worst     0.74391234  0.297646464     0.77310475  0.739347807
## symmetry_worst           0.19063984  0.107380091     0.21590882  0.187623516
## fractal_dimension_worst  0.01143701  0.118036583     0.05859202  0.014904028
##                         smoothness_mean compactness_mean concavity_mean
## radius_mean                 0.166137006       0.49768803     0.67402200
## texture_mean               -0.005972216       0.24267732     0.32659909
## perimeter_mean              0.203358251       0.54877990     0.71374782
## area_mean                   0.170815960       0.49871567     0.68427046
## smoothness_mean             1.000000000       0.65887004     0.52629526
## compactness_mean            0.658870038       1.00000000     0.88993531
## concavity_mean              0.526295261       0.88993531     1.00000000
## concave.points_mean         0.561507233       0.83134376     0.92846135
## symmetry_mean               0.551348562       0.58234520     0.48253973
## fractal_dimension_mean      0.583862748       0.54855552     0.31370724
## radius_se                   0.316839793       0.53103228     0.64005923
## texture_se                  0.097608252       0.04073925     0.06155412
## perimeter_se                0.316829495       0.57748136     0.67321349
## area_se                     0.276492223       0.53133275     0.67128381
## smoothness_se               0.344489796       0.12243847     0.06744569
## compactness_se              0.339053593       0.77817078     0.69525409
## concavity_se                0.309226306       0.72474243     0.77212387
## concave.points_se           0.419259182       0.70297190     0.71114374
## symmetry_se                 0.148644407       0.13277751     0.07876289
## fractal_dimension_se        0.366883754       0.61762150     0.50050439
## radius_worst                0.227246454       0.54407142     0.70765327
## texture_worst               0.054539625       0.24573483     0.32194549
## perimeter_worst             0.249900841       0.59415210     0.74266754
## area_worst                  0.227618632       0.53710142     0.70845249
## smoothness_worst            0.800757737       0.56484404     0.46989277
## compactness_worst           0.482702665       0.88200315     0.80282250
## concavity_worst             0.444025853       0.83331922     0.90731565
## concave.points_worst        0.512258451       0.82491388     0.88728707
## symmetry_worst              0.403887512       0.48875532     0.41521338
## fractal_dimension_worst     0.487057785       0.69226189     0.54298435
##                         concave.points_mean symmetry_mean
## radius_mean                     0.804189608    0.14495366
## texture_mean                    0.304175168    0.09159195
## perimeter_mean                  0.834660394    0.17923662
## area_mean                       0.810909295    0.15508240
## smoothness_mean                 0.561507233    0.55134856
## compactness_mean                0.831343761    0.58234520
## concavity_mean                  0.928461346    0.48253973
## concave.points_mean             1.000000000    0.45751099
## symmetry_mean                   0.457510988    1.00000000
## fractal_dimension_mean          0.170648576    0.46304292
## radius_se                       0.725692162    0.32984582
## texture_se                      0.015494565    0.14414977
## perimeter_se                    0.744721878    0.33320741
## area_se                         0.777292123    0.27846203
## smoothness_se                   0.005461134    0.20648819
## compactness_se                  0.531826077    0.42018414
## concavity_se                    0.585814564    0.36177637
## concave.points_se               0.678599440    0.37511588
## symmetry_se                     0.022505900    0.39084369
## fractal_dimension_se            0.340670183    0.39403467
## radius_worst                    0.830440842    0.19624965
## texture_worst                   0.299144115    0.11112738
## perimeter_worst                 0.852892438    0.22726538
## area_worst                      0.826899258    0.19948912
## smoothness_worst                0.466239038    0.43013245
## compactness_worst               0.707309028    0.47375819
## concavity_worst                 0.787789344    0.43645127
## concave.points_worst            0.918791350    0.42718608
## symmetry_worst                  0.389031855    0.70758822
## fractal_dimension_worst         0.388921332    0.44821641
##                         fractal_dimension_mean   radius_se  texture_se
## radius_mean                       -0.322207815 0.675850516 -0.11356719
## texture_mean                      -0.064781678 0.324923667  0.42298324
## perimeter_mean                    -0.269645404 0.689469857 -0.10458425
## area_mean                         -0.303407288 0.709553294 -0.08789804
## smoothness_mean                    0.583862748 0.316839793  0.09760825
## compactness_mean                   0.548555519 0.531032284  0.04073925
## concavity_mean                     0.313707241 0.640059233  0.06155412
## concave.points_mean                0.170648576 0.725692162  0.01549457
## symmetry_mean                      0.463042921 0.329845821  0.14414977
## fractal_dimension_mean             1.000000000 0.009828685  0.16875952
## radius_se                          0.009828685 1.000000000  0.24749700
## texture_se                         0.168759516 0.247497000  1.00000000
## perimeter_se                       0.042446603 0.968830153  0.24301857
## area_se                           -0.091186689 0.951676453  0.14588179
## smoothness_se                      0.412528855 0.154228391  0.46039234
## compactness_se                     0.539609971 0.405830493  0.23523157
## concavity_se                       0.403415624 0.440636019  0.20986865
## concave.points_se                  0.303575007 0.582591172  0.23888947
## symmetry_se                        0.311261000 0.218366086  0.42885117
## fractal_dimension_se               0.711932490 0.295694551  0.30344083
## radius_worst                      -0.250903651 0.733499001 -0.11683611
## texture_worst                     -0.036638879 0.241637635  0.46441279
## perimeter_worst                   -0.200809161 0.733214853 -0.10755958
## area_worst                        -0.231863776 0.759618757 -0.09356314
## smoothness_worst                   0.502454839 0.179201394 -0.03971119
## compactness_worst                  0.450384065 0.364242740 -0.09663426
## concavity_worst                    0.321575790 0.444296330 -0.06622163
## concave.points_worst               0.175884454 0.581234126 -0.10396139
## symmetry_worst                     0.319493048 0.144517649 -0.13535425
## fractal_dimension_worst            0.752467599 0.090470233 -0.03159143
##                         perimeter_se     area_se smoothness_se compactness_se
## radius_mean                0.6755420  0.80040518  -0.286299469      0.2312607
## texture_mean               0.3371950  0.33963819   0.027542408      0.2263544
## perimeter_mean             0.6943234  0.80964099  -0.267019276      0.2786813
## area_mean                  0.7070547  0.83137779  -0.243455750      0.2410757
## smoothness_mean            0.3168295  0.27649222   0.344489796      0.3390536
## compactness_mean           0.5774814  0.53133275   0.122438466      0.7781708
## concavity_mean             0.6732135  0.67128381   0.067445691      0.6952541
## concave.points_mean        0.7447219  0.77729212   0.005461134      0.5318261
## symmetry_mean              0.3332074  0.27846203   0.206488195      0.4201841
## fractal_dimension_mean     0.0424466 -0.09118669   0.412528855      0.5396100
## radius_se                  0.9688302  0.95167645   0.154228391      0.4058305
## texture_se                 0.2430186  0.14588179   0.460392339      0.2352316
## perimeter_se               1.0000000  0.93042765   0.147009469      0.4774706
## area_se                    0.9304277  1.00000000   0.025780361      0.3617208
## smoothness_se              0.1470095  0.02578036   1.000000000      0.3036438
## compactness_se             0.4774706  0.36172078   0.303643798      1.0000000
## concavity_se               0.5014484  0.41794250   0.256076935      0.8854517
## concave.points_se          0.6337872  0.54317733   0.325839005      0.7604251
## symmetry_se                0.2205236  0.08706032   0.461012120      0.3242693
## fractal_dimension_se       0.3388881  0.20959473   0.458977506      0.8281693
## radius_worst               0.7262819  0.84615382  -0.266911386      0.2464900
## texture_worst              0.2520989  0.27725866  -0.054683785      0.1669325
## perimeter_worst            0.7447576  0.84504736  -0.253416943      0.3043906
## area_worst                 0.7509061  0.87098161  -0.224153975      0.2510187
## smoothness_worst           0.1737924  0.18212227   0.348520104      0.2573855
## compactness_worst          0.4228494  0.40444628  -0.056112230      0.7275143
## concavity_worst            0.4906278  0.49586555  -0.065792747      0.6670502
## concave.points_worst       0.6150970  0.65345520  -0.098728877      0.5355373
## symmetry_worst             0.1514968  0.16788087  -0.106334491      0.2586757
## fractal_dimension_worst    0.1246492  0.07286530   0.120158161      0.6158136
##                         concavity_se concave.points_se  symmetry_se
## radius_mean                0.3277473         0.4245281 -0.182962654
## texture_mean               0.2441625         0.1882948  0.006935905
## perimeter_mean             0.3685135         0.4573048 -0.164612200
## area_mean                  0.3376086         0.4278723 -0.153721804
## smoothness_mean            0.3092263         0.4192592  0.148644407
## compactness_mean           0.7247424         0.7029719  0.132777509
## concavity_mean             0.7721239         0.7111437  0.078762891
## concave.points_mean        0.5858146         0.6785994  0.022505900
## symmetry_mean              0.3617764         0.3751159  0.390843688
## fractal_dimension_mean     0.4034156         0.3035750  0.311261000
## radius_se                  0.4406360         0.5825912  0.218366086
## texture_se                 0.2098686         0.2388895  0.428851174
## perimeter_se               0.5014484         0.6337872  0.220523611
## area_se                    0.4179425         0.5431773  0.087060324
## smoothness_se              0.2560769         0.3258390  0.461012120
## compactness_se             0.8854517         0.7604251  0.324269347
## concavity_se               1.0000000         0.7989200  0.250178804
## concave.points_se          0.7989200         1.0000000  0.272734387
## symmetry_se                0.2501788         0.2727344  1.000000000
## fractal_dimension_se       0.7027625         0.5945439  0.389641966
## radius_worst               0.3326466         0.4186309 -0.189616012
## texture_worst              0.1892518         0.1110552 -0.102164378
## perimeter_worst            0.3827335         0.4544369 -0.169198665
## area_worst                 0.3352982         0.4153921 -0.164489060
## smoothness_worst           0.2469927         0.2664058 -0.067334485
## compactness_worst          0.6570193         0.5380887 -0.039477206
## concavity_worst            0.7541404         0.5875883 -0.056162772
## concave.points_worst       0.5863031         0.6633392 -0.103930996
## symmetry_worst             0.2133950         0.1425014  0.269917058
## fractal_dimension_worst    0.5038325         0.3316793  0.031368920
##                         fractal_dimension_se radius_worst texture_worst
## radius_mean                     -0.012495838   0.97300475    0.30484292
## texture_mean                     0.115792221   0.35562516    0.90680113
## perimeter_mean                   0.032425002   0.97401973    0.31174473
## area_mean                        0.008597376   0.97407140    0.30812624
## smoothness_mean                  0.366883754   0.22724645    0.05453963
## compactness_mean                 0.617621503   0.54407142    0.24573483
## concavity_mean                   0.500504389   0.70765327    0.32194549
## concave.points_mean              0.340670183   0.83044084    0.29914412
## symmetry_mean                    0.394034668   0.19624965    0.11112738
## fractal_dimension_mean           0.711932490  -0.25090365   -0.03663888
## radius_se                        0.295694551   0.73349900    0.24163764
## texture_se                       0.303440832  -0.11683611    0.46441279
## perimeter_se                     0.338888080   0.72628189    0.25209890
## area_se                          0.209594727   0.84615382    0.27725866
## smoothness_se                    0.458977506  -0.26691139   -0.05468379
## compactness_se                   0.828169317   0.24649002    0.16693252
## concavity_se                     0.702762493   0.33264658    0.18925175
## concave.points_se                0.594543946   0.41863087    0.11105524
## symmetry_se                      0.389641966  -0.18961601   -0.10216438
## fractal_dimension_se             1.000000000   0.01384833    0.05194817
## radius_worst                     0.013848326   1.00000000    0.35667314
## texture_worst                    0.051948169   0.35667314    1.00000000
## perimeter_worst                  0.062806500   0.99318265    0.36610900
## area_worst                       0.032873160   0.99170436    0.34927512
## smoothness_worst                 0.266369531   0.23719750    0.21614797
## compactness_worst                0.510513241   0.52818510    0.34122680
## concavity_worst                  0.445355860   0.62614773    0.36804438
## concave.points_worst             0.301647791   0.79606365    0.35642920
## symmetry_worst                   0.164601726   0.27907335    0.23936001
## fractal_dimension_worst          0.688976054   0.10515188    0.21471829
##                         perimeter_worst  area_worst smoothness_worst
## radius_mean                   0.9660603  0.95786939       0.13267495
## texture_mean                  0.3657272  0.35519989       0.07968380
## perimeter_mean                0.9724254  0.95920857       0.16376463
## area_mean                     0.9672572  0.97147548       0.14215487
## smoothness_mean               0.2499008  0.22761863       0.80075774
## compactness_mean              0.5941521  0.53710142       0.56484404
## concavity_mean                0.7426675  0.70845249       0.46989277
## concave.points_mean           0.8528924  0.82689926       0.46623904
## symmetry_mean                 0.2272654  0.19948912       0.43013245
## fractal_dimension_mean       -0.2008092 -0.23186378       0.50245484
## radius_se                     0.7332149  0.75961876       0.17920139
## texture_se                   -0.1075596 -0.09356314      -0.03971119
## perimeter_se                  0.7447576  0.75090611       0.17379239
## area_se                       0.8450474  0.87098161       0.18212227
## smoothness_se                -0.2534169 -0.22415398       0.34852010
## compactness_se                0.3043906  0.25101866       0.25738553
## concavity_se                  0.3827335  0.33529823       0.24699269
## concave.points_se             0.4544369  0.41539208       0.26640576
## symmetry_se                  -0.1691987 -0.16448906      -0.06733449
## fractal_dimension_se          0.0628065  0.03287316       0.26636953
## radius_worst                  0.9931827  0.99170436       0.23719750
## texture_worst                 0.3661090  0.34927512       0.21614797
## perimeter_worst               1.0000000  0.98483132       0.25729775
## area_worst                    0.9848313  1.00000000       0.23899867
## smoothness_worst              0.2572978  0.23899867       1.00000000
## compactness_worst             0.5819577  0.51106468       0.56931807
## concavity_worst               0.6692728  0.61392141       0.52684856
## concave.points_worst          0.8249048  0.78045867       0.54998997
## symmetry_worst                0.3053196  0.26252759       0.50889740
## fractal_dimension_worst       0.1523208  0.10219381       0.60114001
##                         compactness_worst concavity_worst concave.points_worst
## radius_mean                    0.45609714      0.56702510           0.74391234
## texture_mean                   0.27277874      0.31169228           0.29764646
## perimeter_mean                 0.50104761      0.60523518           0.77310475
## area_mean                      0.44794686      0.56557567           0.73934781
## smoothness_mean                0.48270266      0.44402585           0.51225845
## compactness_mean               0.88200315      0.83331922           0.82491388
## concavity_mean                 0.80282250      0.90731565           0.88728707
## concave.points_mean            0.70730903      0.78778934           0.91879135
## symmetry_mean                  0.47375819      0.43645127           0.42718608
## fractal_dimension_mean         0.45038407      0.32157579           0.17588445
## radius_se                      0.36424274      0.44429633           0.58123413
## texture_se                    -0.09663426     -0.06622163          -0.10396139
## perimeter_se                   0.42284945      0.49062777           0.61509699
## area_se                        0.40444628      0.49586555           0.65345520
## smoothness_se                 -0.05611223     -0.06579275          -0.09872888
## compactness_se                 0.72751426      0.66705021           0.53553732
## concavity_se                   0.65701934      0.75414044           0.58630307
## concave.points_se              0.53808866      0.58758830           0.66333925
## symmetry_se                   -0.03947721     -0.05616277          -0.10393100
## fractal_dimension_se           0.51051324      0.44535586           0.30164779
## radius_worst                   0.52818510      0.62614773           0.79606365
## texture_worst                  0.34122680      0.36804438           0.35642920
## perimeter_worst                0.58195768      0.66927284           0.82490478
## area_worst                     0.51106468      0.61392141           0.78045867
## smoothness_worst               0.56931807      0.52684856           0.54998997
## compactness_worst              1.00000000      0.90790640           0.82833879
## concavity_worst                0.90790640      1.00000000           0.87940442
## concave.points_worst           0.82833879      0.87940442           1.00000000
## symmetry_worst                 0.59774079      0.52963659           0.50750722
## fractal_dimension_worst        0.77184344      0.66338519           0.51111415
##                         symmetry_worst fractal_dimension_worst
## radius_mean                  0.1906398              0.01143701
## texture_mean                 0.1073801              0.11803658
## perimeter_mean               0.2159088              0.05859202
## area_mean                    0.1876235              0.01490403
## smoothness_mean              0.4038875              0.48705779
## compactness_mean             0.4887553              0.69226189
## concavity_mean               0.4152134              0.54298435
## concave.points_mean          0.3890319              0.38892133
## symmetry_mean                0.7075882              0.44821641
## fractal_dimension_mean       0.3194930              0.75246760
## radius_se                    0.1445176              0.09047023
## texture_se                  -0.1353543             -0.03159143
## perimeter_se                 0.1514968              0.12464916
## area_se                      0.1678809              0.07286530
## smoothness_se               -0.1063345              0.12015816
## compactness_se               0.2586757              0.61581361
## concavity_se                 0.2133950              0.50383249
## concave.points_se            0.1425014              0.33167926
## symmetry_se                  0.2699171              0.03136892
## fractal_dimension_se         0.1646017              0.68897605
## radius_worst                 0.2790733              0.10515188
## texture_worst                0.2393600              0.21471829
## perimeter_worst              0.3053196              0.15232085
## area_worst                   0.2625276              0.10219381
## smoothness_worst             0.5088974              0.60114001
## compactness_worst            0.5977408              0.77184344
## concavity_worst              0.5296366              0.66338519
## concave.points_worst         0.5075072              0.51111415
## symmetry_worst               1.0000000              0.52881130
## fractal_dimension_worst      0.5288113              1.00000000
corrplot(cor(data_var))
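
A 30 x 30 correlation matrix is hard to scan by eye. One possible way to pull out the strongly correlated pairs is sketched below using only base R. The helper name strong_pairs and the cutoff values are illustrative choices, and the built-in mtcars data stands in for data_var so the snippet runs on its own.

```r
# List variable pairs whose absolute correlation exceeds a cutoff.
strong_pairs <- function(df, cutoff = 0.9) {
  r <- cor(df)
  # Look only above the diagonal so each pair appears once
  idx <- which(abs(r) > cutoff & upper.tri(r), arr.ind = TRUE)
  out <- data.frame(
    var1 = rownames(r)[idx[, "row"]],
    var2 = colnames(r)[idx[, "col"]],
    r    = r[idx],
    stringsAsFactors = FALSE
  )
  # Strongest correlations first
  out[order(-abs(out$r)), ]
}

pairs_found <- strong_pairs(mtcars, cutoff = 0.8)
print(pairs_found)
```

Applied to the data in this document, strong_pairs(data_var) would be expected to surface pairs such as radius_mean and perimeter_mean, whose correlation is 0.997 in the matrix above.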

Hypothesis tests

Test 1

Hypothesis formulation:

  1. H0: the correlations in the matrix are equal to zero

  2. H1: the correlations in the matrix are different from zero

library(psych)

We evaluate the p-value: it must be below the significance level, which is 0.05. If it is, H0 is rejected.

cortest(data_var)
## Tests of correlation matrices 
## Call:cortest(R1 = data_var)
##  Chi Square value 135780.1  with df =  435   with probability < 0

The reported p-value is effectively 0, and 0 < 0.05, so the condition holds and H0 is rejected. That is, there are significant correlations among the variables studied.

Test 2

bartlett.test(data_var)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  data_var
## Bartlett's K-squared = 208467, df = 29, p-value < 2.2e-16

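We have p < 2.2e-16 < 0.05, so the condition holds and H0 is rejected. Strictly, bartlett.test() tests homogeneity of variances across the columns, so this result says that the variances of the variables differ. The claim that the correlation matrix is different from the identity matrix is the subject of Bartlett's test of sphericity (cortest.bartlett in the psych package), which is a different test.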
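
For reference, Bartlett's test of sphericity, whose null hypothesis is that the correlation matrix equals the identity matrix, can be computed directly from the correlation matrix R as chi-square = -(n - 1 - (2p + 5)/6) * ln|R|, with p(p - 1)/2 degrees of freedom, where n is the number of observations and p the number of variables. A minimal base-R sketch follows. The function name bartlett_sphericity is made up for illustration, and mtcars stands in for data_var so the snippet runs on its own.

```r
# Bartlett's test of sphericity computed from the correlation matrix.
bartlett_sphericity <- function(df) {
  R <- cor(df)
  n <- nrow(df)  # number of observations
  p <- ncol(df)  # number of variables
  chi2 <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  dof  <- p * (p - 1) / 2
  list(
    chisq   = chi2,
    df      = dof,
    p.value = pchisq(chi2, dof, lower.tail = FALSE)
  )
}

res <- bartlett_sphericity(mtcars)
print(res)
```

A small p-value rejects the hypothesis that the variables are uncorrelated, which is the condition PCA needs in order to be useful.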

Test 3

Next we run the KMO (Kaiser-Meyer-Olkin) test, which lets us assess whether using PCA is justified.

The rule is as follows: if the KMO value is greater than or equal to 0.5, the use of PCA is justified.

KMO(data_var)
## Kaiser-Meyer-Olkin factor adequacy
## Call: KMO(r = data_var)
## Overall MSA =  0.86
## MSA for each item = 
##             radius_mean            texture_mean          perimeter_mean 
##                    0.86                    0.64                    0.90 
##               area_mean         smoothness_mean        compactness_mean 
##                    0.90                    0.84                    0.94 
##          concavity_mean     concave.points_mean           symmetry_mean 
##                    0.92                    0.93                    0.79 
##  fractal_dimension_mean               radius_se              texture_se 
##                    0.92                    0.85                    0.47 
##            perimeter_se                 area_se           smoothness_se 
##                    0.88                    0.96                    0.67 
##          compactness_se            concavity_se       concave.points_se 
##                    0.88                    0.87                    0.89 
##             symmetry_se    fractal_dimension_se            radius_worst 
##                    0.53                    0.88                    0.84 
##           texture_worst         perimeter_worst              area_worst 
##                    0.59                    0.89                    0.89 
##        smoothness_worst       compactness_worst         concavity_worst 
##                    0.77                    0.88                    0.88 
##    concave.points_worst          symmetry_worst fractal_dimension_worst 
##                    0.91                    0.67                    0.88

Overall MSA = 0.86 >= 0.5, so the condition holds and the use of PCA is justified. Among the individual variables, only texture_se (0.47) falls below 0.5.
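
The overall MSA compares the squared correlations with the squared partial correlations: values near 1 mean the partial correlations are small relative to the raw ones. Below is a rough base-R sketch of that calculation, with a made-up function name (kmo_overall) and mtcars standing in for data_var. It is meant only to show the idea; KMO from psych remains the function used in this analysis.

```r
# Overall KMO: sum of squared correlations divided by the sum of squared
# correlations plus squared partial correlations (off-diagonal entries only).
kmo_overall <- function(df) {
  R    <- cor(df)
  Rinv <- solve(R)
  # Partial correlations from the inverse correlation matrix
  d <- sqrt(diag(Rinv))
  P <- -Rinv / outer(d, d)
  off <- row(R) != col(R)  # mask for off-diagonal entries
  r2 <- sum(R[off]^2)
  p2 <- sum(P[off]^2)
  r2 / (r2 + p2)
}

kmo_value <- kmo_overall(mtcars)
print(round(kmo_value, 2))
```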

The preliminary tests are satisfied, so we proceed with PCA.

Performing PCA

pca <- prcomp(data_var, center = TRUE,
              scale. = TRUE)
pca 
## Standard deviations (1, .., p=30):
##  [1] 3.74289107 2.36190685 1.65704734 1.38111718 1.26553770 1.07313219
##  [7] 0.77981962 0.66078938 0.59281321 0.55604237 0.52872856 0.46723675
## [13] 0.41305156 0.37990028 0.31908018 0.25289457 0.23058952 0.22085332
## [19] 0.20425114 0.19441575 0.18353131 0.15455616 0.14462946 0.14217843
## [25] 0.13052403 0.10603961 0.09475057 0.05156586 0.03530385 0.02006342
## 
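
The standard deviations above can be turned into the share of variance each component explains, since the variance of a component is its standard deviation squared. The sketch below simply copies the 30 printed values into a vector. The name sdev is chosen here for illustration; with the fitted object one would use pca$sdev instead.

```r
# Standard deviations copied from the prcomp output above
sdev <- c(3.74289107, 2.36190685, 1.65704734, 1.38111718, 1.26553770, 1.07313219,
          0.77981962, 0.66078938, 0.59281321, 0.55604237, 0.52872856, 0.46723675,
          0.41305156, 0.37990028, 0.31908018, 0.25289457, 0.23058952, 0.22085332,
          0.20425114, 0.19441575, 0.18353131, 0.15455616, 0.14462946, 0.14217843,
          0.13052403, 0.10603961, 0.09475057, 0.05156586, 0.03530385, 0.02006342)

# Share of total variance per component and cumulative share
var_explained <- sdev^2 / sum(sdev^2)
cum_explained <- cumsum(var_explained)

print(round(var_explained[1:6], 3))
print(round(cum_explained[1:6], 3))

# Number of components with variance above 1 (Kaiser's rule of thumb)
print(sum(sdev^2 > 1))
```

Because the variables were standardized, the variances sum to about 30, the number of variables. The first component alone accounts for roughly 47% of the variance, and the first six, the only ones with variance above 1, account for roughly 90%.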
## Rotation (n x k) = (30 x 30):
##                                  PC1         PC2         PC3         PC4
## radius_mean             -0.211842390  0.24156980  0.01072350 -0.04435498
## texture_mean            -0.106172696  0.05375799 -0.18459065  0.57523302
## perimeter_mean          -0.220520774  0.22184069  0.01374533 -0.04440885
## area_mean               -0.214529913  0.23647961 -0.01217047 -0.04862866
## smoothness_mean         -0.138992303 -0.19425890  0.10984386 -0.09482556
## compactness_mean        -0.233821115 -0.15380581  0.07969792 -0.03228401
## concavity_mean          -0.251644921 -0.05498377  0.02591704 -0.01645214
## concave.points_mean     -0.254248452  0.02945602  0.02565609 -0.06712488
## symmetry_mean           -0.132429040 -0.18916892  0.04423381 -0.02496103
## fractal_dimension_mean  -0.059013428 -0.36934177  0.05920899 -0.02197684
## radius_se               -0.205196940  0.08470080 -0.26157604 -0.12560526
## texture_se              -0.018978657 -0.11000488 -0.44479529  0.29092769
## perimeter_se            -0.212271896  0.06902213 -0.25685510 -0.12042615
## area_se                 -0.216614574  0.14155617 -0.19051139 -0.10523364
## smoothness_se           -0.007855442 -0.23828173 -0.30741440 -0.08028699
## compactness_se          -0.175721522 -0.22513817 -0.12109833 -0.03566960
## concavity_se            -0.185279711 -0.16881507 -0.12444795 -0.04301721
## concave.points_se       -0.191849047 -0.11540670 -0.19085816 -0.16070441
## symmetry_se             -0.016653169 -0.18946053 -0.32124048 -0.12428698
## fractal_dimension_se    -0.120436643 -0.29454926 -0.15242588 -0.05946485
## radius_worst            -0.224320568  0.22042649  0.03629719 -0.01529300
## texture_worst           -0.105091373  0.04216131 -0.08395817  0.64499571
## perimeter_worst         -0.232231589  0.19914909  0.03927971 -0.01078620
## area_worst              -0.224110112  0.21546095  0.01270754 -0.02482157
## smoothness_worst        -0.128250716 -0.17625369  0.23672761  0.07008853
## compactness_worst       -0.215649212 -0.12943042  0.21340747  0.09776673
## concavity_worst         -0.230450442 -0.07627579  0.16711366  0.08440241
## concave.points_worst    -0.247709007  0.01138582  0.15090704  0.01215740
## symmetry_worst          -0.121917736 -0.12103730  0.26522136  0.11364176
## fractal_dimension_worst -0.130806798 -0.27433054  0.22698695  0.12195635
##                                  PC5           PC6          PC7          PC8
## radius_mean              0.012621691 -0.0077002735  0.079433410 -0.014103993
## texture_mean             0.017240201  0.0045068893 -0.002459530  0.107648733
## perimeter_mean           0.017666144 -0.0079779012  0.071913673 -0.027454038
## area_mean               -0.004838251 -0.0008891651  0.043367400  0.031458698
## smoothness_mean         -0.375512697  0.3125307663  0.093812543 -0.347861496
## compactness_mean         0.050237462  0.0143289251 -0.015468919 -0.183458110
## concavity_mean           0.084074139  0.0170240079  0.125549209 -0.076933613
## concave.points_mean     -0.056253928  0.0655331940  0.098246162 -0.162274740
## symmetry_mean           -0.344374831 -0.3751681099  0.076561915 -0.259506211
## fractal_dimension_mean  -0.028021197  0.0850444261 -0.342944589 -0.178397703
## radius_se               -0.152306542  0.0101571265 -0.322929761  0.039664577
## texture_se              -0.123590471  0.0527695973  0.065739779 -0.404002707
## perimeter_se            -0.095568737 -0.0025709810 -0.289315651  0.020474232
## area_se                 -0.117761562  0.0160423627 -0.271030538  0.061132580
## smoothness_se           -0.206475087  0.3397086831  0.216784018  0.539206501
## compactness_se           0.320217297 -0.0914691290  0.029362950  0.056277724
## concavity_se             0.340403756 -0.0472863344  0.264300393  0.052199510
## concave.points_se        0.165606961  0.0648265560  0.361198922 -0.130738811
## symmetry_se             -0.217711838 -0.4667945645  0.134357881  0.208029686
## fractal_dimension_se     0.227110586  0.0023166122 -0.296683067  0.079636362
## radius_worst            -0.039934261  0.0061643082 -0.042880731  0.066268276
## texture_worst           -0.039982498  0.0308715599  0.007010371 -0.005643547
## perimeter_worst         -0.019399266 -0.0063213346 -0.040163773  0.058876704
## area_worst              -0.054303762  0.0180556827 -0.085736371  0.103996784
## smoothness_worst        -0.336483456  0.3899813297  0.102668483  0.268384665
## compactness_worst        0.148847539 -0.0582655672 -0.047910331  0.084319447
## concavity_worst          0.166567755 -0.0323000863  0.131785749  0.080112131
## concave.points_worst     0.017005506  0.0444740319  0.164168651 -0.026615184
## symmetry_worst          -0.305943001 -0.4877783518  0.063820137  0.224889304
## fractal_dimension_worst  0.116660254  0.0189583967 -0.369605560  0.111994232
##                                 PC9         PC10          PC11         PC12
## radius_mean              0.24656798 -0.160171159  0.0178511810  0.063204307
## texture_mean            -0.02499843 -0.119226764 -0.5337833257 -0.097344751
## perimeter_mean           0.24476425 -0.150566075 -0.0006470287  0.042268733
## area_mean                0.24033068 -0.160806570  0.0384012691  0.006041808
## smoothness_mean          0.14804225  0.133388356 -0.2121949051 -0.044580408
## compactness_mean         0.08700211 -0.005356641 -0.2040394486 -0.036623449
## concavity_mean           0.05519674  0.048051260  0.0228757134 -0.416643340
## concave.points_mean      0.11212399  0.010314805 -0.1041993158 -0.090665152
## symmetry_mean           -0.31335612 -0.621986742  0.0242319064 -0.112156186
## fractal_dimension_mean   0.16976445  0.046032881 -0.1685082204 -0.107691310
## radius_se               -0.28851085  0.148295393 -0.0140983366 -0.060479248
## texture_se               0.15660351  0.092670577  0.6079333722  0.094199661
## perimeter_se            -0.30653640  0.178423952 -0.0620504248 -0.019536235
## area_se                 -0.18291532  0.072800966  0.0484992637 -0.032740051
## smoothness_se            0.01722871 -0.310790523  0.0659324205 -0.076853816
## compactness_se          -0.01351116 -0.074756009 -0.0150999064  0.108962750
## concavity_se            -0.16449663  0.054115901  0.1093949464 -0.337488382
## concave.points_se       -0.25915721  0.070191954 -0.1943597837  0.566267954
## symmetry_se              0.43368161  0.366373264 -0.1826072893 -0.043135600
## fractal_dimension_se     0.25040965 -0.307105997  0.0416455171  0.186788678
## radius_worst             0.10010291 -0.046271347  0.0702802343  0.060049733
## texture_worst           -0.04322910  0.052582076 -0.0188181530  0.077609586
## perimeter_worst          0.09018193 -0.031602620  0.0559307629  0.055187060
## area_worst               0.09434445 -0.053207728  0.0903113081 -0.002538133
## smoothness_worst        -0.05330843  0.102985638  0.1233912204  0.069108475
## compactness_worst       -0.05358964  0.116662183  0.0636626516  0.097411450
## concavity_worst         -0.10980806  0.170592019  0.2094152209 -0.354778008
## concave.points_worst    -0.08750940  0.142814548 -0.0016261104  0.242483996
## symmetry_worst          -0.06715095  0.113339495  0.1315672575  0.192715257
## fractal_dimension_worst  0.09623668 -0.028114838  0.1544485516  0.153508932
##                                 PC13         PC14         PC15         PC16
## radius_mean             -0.044724072 -0.061567582  0.085331746 -0.091914923
## texture_mean             0.013384449 -0.071890969 -0.043169466 -0.145041576
## perimeter_mean          -0.053622625 -0.026276006  0.086858346 -0.067591922
## area_mean               -0.007104358 -0.049118820  0.125634334 -0.057760431
## smoothness_mean         -0.259488247 -0.375774047 -0.191841562 -0.018265680
## compactness_mean        -0.246096028  0.316545572 -0.014833499  0.074540583
## concavity_mean           0.259835478  0.124319684 -0.184463333  0.227298156
## concave.points_mean      0.269476626  0.217970846 -0.203695390  0.372630819
## symmetry_mean            0.008818221  0.021882132 -0.040915584 -0.236171375
## fractal_dimension_mean   0.185710818  0.158208666  0.708645461 -0.003107437
## radius_se                0.007288476 -0.026998203 -0.090593168 -0.026380859
## texture_se              -0.047188376  0.113274797  0.024164060 -0.047905488
## perimeter_se            -0.073582318  0.056390292 -0.068886850 -0.009168764
## area_se                 -0.005138938 -0.087589758  0.067367052  0.112924471
## smoothness_se            0.034398230  0.320937940  0.043525876  0.047027511
## compactness_se          -0.500694839  0.044702585 -0.008819538  0.216123272
## concavity_se            -0.031719288 -0.436959659  0.277248843  0.003980901
## concave.points_se        0.256686025 -0.055445568  0.141935965 -0.160954703
## symmetry_se              0.033160130  0.005299244 -0.106885527 -0.267017167
## fractal_dimension_se     0.114132741 -0.350239752 -0.285406092  0.244066399
## radius_worst            -0.020859720 -0.046545894  0.086010550 -0.070679210
## texture_worst            0.030995052 -0.049200225  0.050343311  0.187372900
## perimeter_worst         -0.063606473  0.005055075  0.097329911 -0.075995873
## area_worst               0.009206383 -0.026952457  0.131466257 -0.046266432
## smoothness_worst        -0.099674624 -0.198978693  0.049230107 -0.062449845
## compactness_worst       -0.413559136  0.336590481 -0.085377601 -0.183832013
## concavity_worst          0.109402347 -0.023313884 -0.087447530 -0.279781713
## concave.points_worst     0.268814041  0.193567113 -0.109700525  0.023902749
## symmetry_worst           0.019081856 -0.120554840  0.167453763  0.445251195
## fractal_dimension_worst  0.283889991 -0.034204156 -0.216950820 -0.355058173
##                                 PC17         PC18        PC19         PC20
## radius_mean             -0.036317037 -0.073046571 -0.14719975 -0.025966924
## texture_mean             0.045078101 -0.018724074  0.07584813  0.112601706
## perimeter_mean          -0.038056759 -0.065159481 -0.17671295 -0.024507108
## area_mean               -0.054077010 -0.014973000 -0.01888668 -0.054607489
## smoothness_mean          0.125372795 -0.379676318  0.19629795 -0.106135993
## compactness_mean        -0.290255517 -0.020394665 -0.38810323  0.442877405
## concavity_mean          -0.104555877  0.006130133 -0.03122774  0.035482231
## concave.points_mean     -0.083424196  0.109205497  0.05534815 -0.197148159
## symmetry_mean           -0.002635704  0.181200910  0.04957807 -0.073579725
## fractal_dimension_mean   0.175025758  0.065402214  0.08112002 -0.075059702
## radius_se               -0.094324607 -0.034729859 -0.05607451  0.124574338
## texture_se               0.006475273 -0.013192017 -0.01917205  0.083137356
## perimeter_se             0.270821513 -0.037086179 -0.45642838 -0.388103451
## area_se                 -0.280988257  0.026292567  0.49322016  0.326135775
## smoothness_se            0.059907602 -0.330547963  0.03708732 -0.022054102
## compactness_se          -0.158734155  0.160729846  0.27533620 -0.416953776
## concavity_se            -0.166068595 -0.073869570 -0.12662217 -0.059570641
## concave.points_se       -0.033393027 -0.086802239 -0.02186898  0.146791707
## symmetry_se             -0.057994658  0.201350403  0.04029129 -0.041838650
## fractal_dimension_se     0.375244858  0.167160129 -0.08090806  0.265870669
## radius_worst             0.045757151 -0.020844228  0.05342835  0.005487339
## texture_worst           -0.046909442  0.024870593 -0.05250541 -0.155607106
## perimeter_worst          0.164897476 -0.002299707 -0.05710063 -0.075900865
## area_worst               0.039966213  0.056444976  0.23364859 -0.010488508
## smoothness_worst        -0.155635802  0.587474256 -0.20425526  0.049098768
## compactness_worst        0.216210169 -0.073245702  0.11882451  0.169647488
## concavity_worst          0.361658244 -0.043980365  0.08749700  0.148933599
## concave.points_worst     0.186084157  0.177397759  0.21162043 -0.140699496
## symmetry_worst           0.041612402 -0.322876090 -0.09572376  0.096536806
## fractal_dimension_worst -0.457908858 -0.289468007 -0.02974475 -0.246153443
##                                 PC21         PC22         PC23          PC24
## radius_mean             -0.232015941  0.099234357  0.206506014  0.0913006921
## texture_mean            -0.284524593 -0.019403207 -0.229411153 -0.3365259248
## perimeter_mean          -0.244988543  0.140085057  0.165617812  0.1183261331
## area_mean               -0.183405270  0.035385858  0.051439950  0.0910575245
## smoothness_mean          0.079526386 -0.002545800  0.025738916  0.0251316054
## compactness_mean         0.332764999  0.065908922 -0.046229380 -0.1007000360
## concavity_mean          -0.089894898 -0.428459422 -0.106141486  0.2066123970
## concave.points_mean     -0.147707419  0.178016680 -0.071052527 -0.1001787112
## symmetry_mean            0.077083650  0.026455615  0.024192439  0.0692685204
## fractal_dimension_mean  -0.088022536 -0.022002799  0.086437947  0.0226388952
## radius_se               -0.080301020 -0.206021064  0.623298983 -0.2988369658
## texture_se              -0.140434341 -0.009808999 -0.139671790 -0.2052001320
## perimeter_se            -0.041551446  0.094570893 -0.318970528  0.1497332592
## area_se                 -0.076640133  0.322066797 -0.221191499  0.3169313274
## smoothness_se            0.085029127  0.065643445  0.051879018  0.0143609381
## compactness_se          -0.122307718 -0.242069196  0.066535874 -0.0556231341
## concavity_se             0.166900111  0.332324281 -0.020686477 -0.2031028188
## concave.points_se       -0.061225767 -0.264501382 -0.077232166  0.1922970102
## symmetry_se              0.120412203  0.044180958  0.035205436  0.0998814495
## fractal_dimension_se     0.056376352  0.046725545  0.022905565  0.0366193128
## radius_worst             0.247949664 -0.226063345  0.010715246 -0.1764486670
## texture_worst            0.377565475  0.028506437  0.322672725  0.4740689751
## perimeter_worst          0.246207286 -0.010251102 -0.252960270 -0.0612705262
## area_worst               0.352751552 -0.326813902 -0.213419409 -0.1517004201
## smoothness_worst        -0.175163961 -0.064803545 -0.068625222 -0.0005696221
## compactness_worst       -0.135983004  0.132063233  0.009555864  0.1403253603
## concavity_worst         -0.099152855 -0.093128597  0.105293786  0.1313027052
## concave.points_worst     0.166095151  0.388528098  0.175422451 -0.3199084880
## symmetry_worst          -0.173143223 -0.070909375 -0.070307314 -0.1310276037
## fractal_dimension_worst -0.003652793  0.017871237 -0.090297491 -0.0332870912
##                                 PC25         PC26         PC27         PC28
## radius_mean              0.057921543  0.020469033  0.134382630 -0.279788963
## texture_mean             0.040645831 -0.014857017  0.037688679 -0.012279581
## perimeter_mean           0.031642586  0.053626687  0.181547516 -0.114934061
## area_mean                0.183110740 -0.285829689 -0.468772162  0.408755352
## smoothness_mean          0.037126772 -0.043056208  0.003427039  0.020500125
## compactness_mean         0.265320923  0.126881693 -0.120890534 -0.012278703
## concavity_mean           0.052455964 -0.402054720  0.361107423  0.038552143
## concave.points_mean     -0.463556000  0.348785956 -0.293772627 -0.028380783
## symmetry_mean           -0.043488106 -0.017757923  0.016690257 -0.004753927
## fractal_dimension_mean  -0.007413677  0.005002139  0.046812241 -0.012669344
## radius_se               -0.141360855 -0.024575936 -0.010240670  0.192418400
## texture_se               0.026171521 -0.008671482  0.013516189 -0.008572355
## perimeter_se             0.142592889 -0.072761674 -0.072984773 -0.176731019
## area_se                  0.126110528  0.085739405  0.114008680 -0.044238407
## smoothness_se            0.025831170  0.029307890  0.015806801 -0.008497734
## compactness_se           0.233635165  0.178417463  0.037393341  0.022459213
## concavity_se            -0.220437738 -0.149763123 -0.009544599 -0.027397394
## concave.points_se       -0.095372021  0.050834866 -0.054486239  0.029773747
## symmetry_se             -0.045064047 -0.001649603  0.010293989 -0.009327825
## fractal_dimension_se    -0.024467360 -0.040300491 -0.011520204 -0.007228536
## radius_worst            -0.158017838  0.181407105  0.254862466 -0.437858240
## texture_worst           -0.058684936  0.019410386 -0.057764188  0.016000899
## perimeter_worst         -0.146035070  0.256807841  0.385693693  0.659628823
## area_worst              -0.026874706 -0.218137976 -0.422170071 -0.192344355
## smoothness_worst        -0.032266875  0.003431287  0.013720311 -0.005738992
## compactness_worst       -0.481580485 -0.326511659 -0.012533702 -0.029168224
## concavity_worst          0.268413983  0.465306806 -0.214129944 -0.010415461
## concave.points_worst     0.368311248 -0.263684363  0.168622076 -0.018861110
## symmetry_worst           0.074007503  0.014196148 -0.022864661  0.003272490
## fractal_dimension_worst  0.005676307  0.029220190 -0.005097350  0.012142839
##                                  PC29          PC30
## radius_mean             -0.0273038224 -0.7288048295
## texture_mean             0.0008313481 -0.0009068040
## perimeter_mean          -0.4788483268  0.5964157597
## area_mean                0.4612104085  0.1372911682
## smoothness_mean         -0.0028006234  0.0050942919
## compactness_mean         0.0167379047 -0.0147489225
## concavity_mean           0.0173247793 -0.0116804376
## concave.points_mean      0.0078183793 -0.0232073046
## symmetry_mean            0.0075963138  0.0033172523
## fractal_dimension_mean   0.0122407466 -0.0099607853
## radius_se               -0.0702795575 -0.0217254860
## texture_se              -0.0005125859  0.0026002717
## perimeter_se             0.0457789559  0.0139637116
## area_se                  0.0267314831 -0.0027774688
## smoothness_se           -0.0028538511 -0.0003150268
## compactness_se          -0.0102948571 -0.0037929134
## concavity_se             0.0037361205  0.0006432705
## concave.points_se       -0.0047960184  0.0034437940
## symmetry_se              0.0045938779  0.0012387126
## fractal_dimension_se     0.0034592228  0.0049532491
## radius_worst             0.5722381807  0.2544036736
## texture_worst           -0.0028622058 -0.0018180606
## perimeter_worst         -0.0977843021 -0.1386929974
## area_worst              -0.4580475947 -0.0915326236
## smoothness_worst        -0.0016690032 -0.0028329368
## compactness_worst        0.0314242337 -0.0019703336
## concavity_worst         -0.0143448097  0.0031170729
## concave.points_worst    -0.0048698113  0.0120322512
## symmetry_worst          -0.0159307871 -0.0037460451
## fractal_dimension_worst -0.0120277777 -0.0003723341
  • The following plot is used to decide how many components to keep, which yields a smaller and more manageable data set.
library(psych)  # scree() is assumed here to come from the psych package
scree(data_var)

According to the plot, the first 6 principal components are enough to build a new, reduced data set. This agrees with the standard deviations printed above: the first six are greater than 1, while the seventh drops to about 0.78.
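The choice of six components can also be checked numerically from the standard deviations that prcomp printed. The sketch below copies those 30 values by hand (the names sdev_printed, eigenvalues, prop_var, cum_var and n_above_one are illustrative and not part of the original analysis), squares them to get the eigenvalues, counts how many exceed 1, and accumulates the proportion of variance explained.

```r
# Standard deviations copied from the prcomp output above
sdev_printed <- c(
  3.74289107, 2.36190685, 1.65704734, 1.38111718, 1.26553770, 1.07313219,
  0.77981962, 0.66078938, 0.59281321, 0.55604237, 0.52872856, 0.46723675,
  0.41305156, 0.37990028, 0.31908018, 0.25289457, 0.23058952, 0.22085332,
  0.20425114, 0.19441575, 0.18353131, 0.15455616, 0.14462946, 0.14217843,
  0.13052403, 0.10603961, 0.09475057, 0.05156586, 0.03530385, 0.02006342
)

# Eigenvalues of the correlation matrix are the squared standard deviations
eigenvalues <- sdev_printed^2

# Share of the total variance carried by each component, and its running sum
prop_var <- eigenvalues / sum(eigenvalues)
cum_var <- cumsum(prop_var)

# Number of components with eigenvalue above 1
n_above_one <- sum(eigenvalues > 1)
print(n_above_one)

# Cumulative proportion of variance for the first six components
print(round(cum_var[1:6], 3))
```

With the fitted object at hand, the same quantities can be taken from pca$sdev, and summary(pca) reports the proportion and cumulative proportion of variance directly.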

Variable selection

Reduced data set.

data_comp <- pca$x             # component scores for every observation
comp_data <- data_comp[, 1:6]  # keep only the first six components
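
For intuition about what pca$x holds: the scores are the centered and scaled data multiplied by the rotation matrix, so keeping the first columns amounts to projecting the data onto the first loading vectors. A minimal sketch on the built-in iris measurements, used here only as a stand-in for data_var (the names toy, toy_pca, manual_scores and toy_reduced are illustrative):

```r
# Stand-in data: four numeric columns from the built-in iris data set
toy <- iris[, 1:4]
toy_pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Scores recomputed by hand: standardize, then project onto the loadings
manual_scores <- scale(toy, center = TRUE, scale = TRUE) %*% toy_pca$rotation

# Should agree with the scores stored by prcomp up to floating-point error
print(all.equal(unname(manual_scores), unname(toy_pca$x)))

# Keeping the first two components is a plain column subset of the scores
toy_reduced <- toy_pca$x[, 1:2]
print(dim(toy_reduced))
```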

We add the target.

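# data.frame() keeps the scores numeric and gives the target column a name;
# cbind() on a matrix would coerce the mixed columns to a single type
new_data <- data.frame(comp_data, diagnosis = data$diagnosis)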
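
One point to watch when attaching a label column to a matrix of scores: cbind() on a matrix returns a matrix, and a matrix can hold only one type, so numeric scores combined with character labels all become character (with a factor, the labels would be reduced to their integer codes instead). A minimal sketch with made-up values (scores, labels, as_matrix and as_frame are hypothetical names, and the "M"/"B" labels are placeholders):

```r
# Toy scores for two observations and two components
scores <- matrix(c(0.5, -1.2, 2.0, 0.3), nrow = 2,
                 dimnames = list(NULL, c("PC1", "PC2")))
labels <- c("M", "B")  # hypothetical diagnosis labels

# cbind() yields a matrix, so the numbers are converted to strings
as_matrix <- cbind(scores, labels)
print(typeof(as_matrix))

# data.frame() stores each column with its own type
as_frame <- data.frame(scores, diagnosis = labels)
print(sapply(as_frame, class))
```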

Final data format

write.csv(new_data, file = "dataCancerFinal2.csv")
getwd()
## [1] "D:/VII CICLO/MINERIA DE DATOS/FINAL/R-MARKDOWN"
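
As a side note on the exported file, write.csv() writes the row names as an extra first column unless row.names = FALSE is passed, which matters when the file is read back. The sketch below uses a small made-up data frame and a temporary file (toy, path, with_names and without_names are illustrative names, not part of the original workflow):

```r
# Small made-up data frame shaped like the reduced data set
toy <- data.frame(PC1 = c(0.5, -1.2), PC2 = c(2.0, 0.3),
                  diagnosis = c("M", "B"))

# Temporary file, so nothing is written to the working directory
path <- tempfile(fileext = ".csv")

# Default behaviour: row names are written as an unnamed first column
write.csv(toy, file = path)
with_names <- read.csv(path)
print(names(with_names))

# With row.names = FALSE only the data columns are written
write.csv(toy, file = path, row.names = FALSE)
without_names <- read.csv(path)
print(names(without_names))
```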

Data visualization

The resulting data set, with the summarized variables, can be viewed at: https://raw.githubusercontent.com/cnahuina/data-mineria/master/dataCancerFinal2.csv