The alzheimer.csv data set relates to Alzheimer's disease. Apply a dimensionality reduction technique and visualize, in two dimensions, the relationship between the variables obtained after applying it. Discuss the results.
Note: You may omit the first column, "IDENTIFIER".
Extra credit: Generate a correlogram (or any other plots you find useful) to analyze the relationships among the variables in the data set.
setwd("/Users/christianeugeniomartinez/Downloads")
D = read.csv("alzheimer.csv")
library(cowplot)
library(grid)
library(ggplot2)
summary(D)
## IDENTIFIER GSM21215 GSM21217 GSM21218
## Length:22215 Min. : 0.5 Min. : 0.9 Min. : 0.4
## Class :character 1st Qu.: 54.5 1st Qu.: 57.4 1st Qu.: 61.4
## Mode :character Median : 182.5 Median : 190.4 Median : 197.6
## Mean : 666.6 Mean : 694.7 Mean : 664.0
## 3rd Qu.: 563.0 3rd Qu.: 564.7 3rd Qu.: 584.5
## Max. :23313.3 Max. :37098.7 Max. :25044.2
## GSM21219 GSM21220 GSM21221 GSM21226
## Min. : 0.9 Min. : 0.5 Min. : 0.5 Min. : 0.5
## 1st Qu.: 68.7 1st Qu.: 60.1 1st Qu.: 60.1 1st Qu.: 75.9
## Median : 222.4 Median : 201.2 Median : 197.8 Median : 241.5
## Mean : 665.9 Mean : 691.1 Mean : 668.9 Mean : 663.9
## 3rd Qu.: 618.2 3rd Qu.: 586.9 3rd Qu.: 576.2 3rd Qu.: 618.1
## Max. :30867.1 Max. :31601.2 Max. :25266.9 Max. :30183.3
## GSM21231 GSM21232 GSM21204 GSM21205
## Min. : 0.6 Min. : 0.5 Min. : 0.9 Min. : 1.7
## 1st Qu.: 71.9 1st Qu.: 77.6 1st Qu.: 66.5 1st Qu.: 88.5
## Median : 227.7 Median : 248.7 Median : 217.8 Median : 274.4
## Mean : 686.9 Mean : 668.9 Mean : 669.0 Mean : 652.4
## 3rd Qu.: 596.6 3rd Qu.: 618.9 3rd Qu.: 611.1 3rd Qu.: 637.9
## Max. :36909.9 Max. :35048.4 Max. :36174.3 Max. :46027.2
## GSM21216 GSM21222 GSM21225 GSM21228
## Min. : 0.50 Min. : 0.8 Min. : 0.5 Min. : 0.60
## 1st Qu.: 58.15 1st Qu.: 68.5 1st Qu.: 63.0 1st Qu.: 75.05
## Median : 191.50 Median : 219.6 Median : 206.3 Median : 231.80
## Mean : 680.36 Mean : 661.9 Mean : 675.0 Mean : 677.70
## 3rd Qu.: 574.95 3rd Qu.: 598.6 3rd Qu.: 607.9 3rd Qu.: 606.90
## Max. :28901.30 Max. :27498.3 Max. :26635.7 Max. :32476.40
## GSM21233 GSM21209 GSM21210 GSM21211
## Min. : 0.9 Min. : 1.20 Min. : 0.8 Min. : 1.2
## 1st Qu.: 76.2 1st Qu.: 73.75 1st Qu.: 65.2 1st Qu.: 72.3
## Median : 235.8 Median : 235.40 Median : 213.0 Median : 224.4
## Mean : 664.0 Mean : 668.11 Mean : 679.5 Mean : 682.5
## 3rd Qu.: 607.2 3rd Qu.: 618.45 3rd Qu.: 600.4 3rd Qu.: 602.5
## Max. :30130.7 Max. :39691.40 Max. :30653.1 Max. :42913.5
## GSM21214 GSM21223 GSM21224 GSM21227
## Min. : 0.40 Min. : 0.8 Min. : 0.4 Min. : 1.3
## 1st Qu.: 66.75 1st Qu.: 69.5 1st Qu.: 66.0 1st Qu.: 72.4
## Median : 214.90 Median : 218.5 Median : 212.7 Median : 228.9
## Mean : 667.45 Mean : 675.4 Mean : 694.2 Mean : 694.2
## 3rd Qu.: 601.35 3rd Qu.: 597.9 3rd Qu.: 599.2 3rd Qu.: 610.2
## Max. :32213.70 Max. :32358.7 Max. :35002.8 Max. :44696.8
## GSM21230 GSM21203 GSM21206 GSM21207
## Min. : 1.1 Min. : 1.70 Min. : 0.5 Min. : 1.6
## 1st Qu.: 75.1 1st Qu.: 78.45 1st Qu.: 60.9 1st Qu.: 90.0
## Median : 241.4 Median : 248.50 Median : 203.7 Median : 272.5
## Mean : 699.7 Mean : 675.11 Mean : 694.6 Mean : 679.6
## 3rd Qu.: 611.5 3rd Qu.: 614.00 3rd Qu.: 590.1 3rd Qu.: 628.9
## Max. :52274.1 Max. :43540.90 Max. :35972.2 Max. :78292.3
## GSM21208 GSM21212 GSM21213 GSM21229
## Min. : 1.6 Min. : 0.40 Min. : 1.30 Min. : 0.7
## 1st Qu.: 71.6 1st Qu.: 80.95 1st Qu.: 75.55 1st Qu.: 67.6
## Median : 240.1 Median : 246.40 Median : 252.60 Median : 208.5
## Mean : 664.1 Mean : 660.91 Mean : 665.41 Mean : 673.0
## 3rd Qu.: 612.9 3rd Qu.: 626.55 3rd Qu.: 626.60 3rd Qu.: 589.3
## Max. :33719.0 Max. :41252.90 Max. :40707.20 Max. :27179.6
cor(D[,2:32])
## GSM21215 GSM21217 GSM21218 GSM21219 GSM21220 GSM21221 GSM21226
## GSM21215 1.0000000 0.9353832 0.9369323 0.8829778 0.9050716 0.9574731 0.8759577
## GSM21217 0.9353832 1.0000000 0.9097081 0.9000105 0.8926460 0.9191428 0.8769255
## GSM21218 0.9369323 0.9097081 1.0000000 0.9155014 0.9480002 0.9670626 0.9156078
## GSM21219 0.8829778 0.9000105 0.9155014 1.0000000 0.9536572 0.9234915 0.9672967
## GSM21220 0.9050716 0.8926460 0.9480002 0.9536572 1.0000000 0.9473730 0.9677615
## GSM21221 0.9574731 0.9191428 0.9670626 0.9234915 0.9473730 1.0000000 0.9170045
## GSM21226 0.8759577 0.8769255 0.9156078 0.9672967 0.9677615 0.9170045 1.0000000
## GSM21231 0.8907292 0.8754427 0.9295498 0.9554966 0.9695406 0.9436310 0.9621211
## GSM21232 0.8708923 0.8916677 0.9134709 0.9619263 0.9601073 0.9213772 0.9656313
## GSM21204 0.9102571 0.9226251 0.9061181 0.9424531 0.9235420 0.9004018 0.9390573
## GSM21205 0.7757866 0.8380126 0.8050809 0.9008846 0.8612062 0.8242918 0.8914688
## GSM21216 0.9485618 0.9365034 0.9635799 0.9311208 0.9457520 0.9812273 0.9164434
## GSM21222 0.9227848 0.8975757 0.9427718 0.9498366 0.9615706 0.9663345 0.9491900
## GSM21225 0.8895407 0.8605836 0.9205218 0.9527685 0.9543768 0.9333685 0.9604083
## GSM21228 0.8707796 0.8677730 0.9032156 0.9527764 0.9519923 0.9218126 0.9577838
## GSM21233 0.9202750 0.9189612 0.9571396 0.9496125 0.9562572 0.9530827 0.9455148
## GSM21209 0.8327451 0.8771304 0.8749440 0.9510477 0.9146536 0.8792508 0.9342113
## GSM21210 0.9155392 0.9221748 0.9420970 0.9638890 0.9589149 0.9602522 0.9556036
## GSM21211 0.8357918 0.8822178 0.8810223 0.9545664 0.9302910 0.8798769 0.9494094
## GSM21214 0.9137557 0.9304949 0.9622727 0.9438096 0.9442149 0.9533812 0.9375295
## GSM21223 0.8804141 0.8981663 0.9177721 0.9582545 0.9539457 0.9299030 0.9532457
## GSM21224 0.8668052 0.8533126 0.9100001 0.9467834 0.9503249 0.9221100 0.9541779
## GSM21227 0.8342587 0.8483455 0.8797820 0.9487702 0.9486456 0.8788347 0.9578691
## GSM21230 0.7980119 0.8058380 0.8441170 0.9348993 0.9160812 0.8555063 0.9361030
## GSM21203 0.8315814 0.8807255 0.8429234 0.9367617 0.9024829 0.8398039 0.9336650
## GSM21206 0.8410284 0.8312559 0.8942855 0.9104330 0.9503944 0.8843787 0.9287425
## GSM21207 0.6541584 0.7275266 0.6974657 0.8219366 0.7646096 0.6955096 0.8219120
## GSM21208 0.8977119 0.9317319 0.8813558 0.9192311 0.8994759 0.8852151 0.9096120
## GSM21212 0.7792472 0.8214978 0.8301819 0.9372894 0.8964881 0.8324948 0.9345452
## GSM21213 0.8311145 0.8606136 0.8358762 0.9326212 0.8979802 0.8399231 0.9341520
## GSM21229 0.9150516 0.8810147 0.9409905 0.9377914 0.9531923 0.9576165 0.9377179
## GSM21231 GSM21232 GSM21204 GSM21205 GSM21216 GSM21222 GSM21225
## GSM21215 0.8907292 0.8708923 0.9102571 0.7757866 0.9485618 0.9227848 0.8895407
## GSM21217 0.8754427 0.8916677 0.9226251 0.8380126 0.9365034 0.8975757 0.8605836
## GSM21218 0.9295498 0.9134709 0.9061181 0.8050809 0.9635799 0.9427718 0.9205218
## GSM21219 0.9554966 0.9619263 0.9424531 0.9008846 0.9311208 0.9498366 0.9527685
## GSM21220 0.9695406 0.9601073 0.9235420 0.8612062 0.9457520 0.9615706 0.9543768
## GSM21221 0.9436310 0.9213772 0.9004018 0.8242918 0.9812273 0.9663345 0.9333685
## GSM21226 0.9621211 0.9656313 0.9390573 0.8914688 0.9164434 0.9491900 0.9604083
## GSM21231 1.0000000 0.9590432 0.9081820 0.8660865 0.9421595 0.9676899 0.9560211
## GSM21232 0.9590432 1.0000000 0.9298719 0.9207286 0.9246538 0.9554746 0.9418241
## GSM21204 0.9081820 0.9298719 1.0000000 0.8689672 0.9042754 0.9191051 0.9199948
## GSM21205 0.8660865 0.9207286 0.8689672 1.0000000 0.8306715 0.8540553 0.8533325
## GSM21216 0.9421595 0.9246538 0.9042754 0.8306715 1.0000000 0.9618449 0.9214736
## GSM21222 0.9676899 0.9554746 0.9191051 0.8540553 0.9618449 1.0000000 0.9511271
## GSM21225 0.9560211 0.9418241 0.9199948 0.8533325 0.9214736 0.9511271 1.0000000
## GSM21228 0.9635618 0.9712237 0.9107672 0.8950367 0.9212517 0.9616646 0.9579677
## GSM21233 0.9443647 0.9458992 0.9357922 0.8615858 0.9505144 0.9473387 0.9304035
## GSM21209 0.9146071 0.9457953 0.9123814 0.9364932 0.8886191 0.9036591 0.9118186
## GSM21210 0.9602454 0.9621961 0.9318886 0.8973078 0.9640286 0.9614947 0.9516589
## GSM21211 0.9250589 0.9618063 0.9247127 0.9274742 0.8937857 0.9152187 0.9118840
## GSM21214 0.9350317 0.9500159 0.9392231 0.8850182 0.9569880 0.9420284 0.9282461
## GSM21223 0.9563713 0.9735143 0.9260527 0.8962114 0.9410446 0.9625016 0.9347330
## GSM21224 0.9593379 0.9590888 0.9002408 0.8579019 0.9180618 0.9566258 0.9532290
## GSM21227 0.9382405 0.9604796 0.9168877 0.8949529 0.8761743 0.9216461 0.9470527
## GSM21230 0.9306288 0.9473402 0.8952608 0.8830376 0.8509939 0.9120474 0.9300339
## GSM21203 0.9034878 0.9254139 0.9355011 0.8757557 0.8573052 0.8832277 0.8895711
## GSM21206 0.9299546 0.9124623 0.8759237 0.8051345 0.8927748 0.9116110 0.9008833
## GSM21207 0.7701360 0.8328114 0.7958050 0.8938662 0.7031994 0.7375880 0.7917405
## GSM21208 0.8978396 0.9109487 0.9474235 0.8520005 0.9023861 0.9115182 0.8710626
## GSM21212 0.8997396 0.9410798 0.8968142 0.9256117 0.8456394 0.8879670 0.9059730
## GSM21213 0.9046633 0.9337760 0.9435984 0.9065818 0.8503508 0.8874933 0.9023270
## GSM21229 0.9538355 0.9327523 0.9000234 0.8259457 0.9552943 0.9672369 0.9574021
## GSM21228 GSM21233 GSM21209 GSM21210 GSM21211 GSM21214 GSM21223
## GSM21215 0.8707796 0.9202750 0.8327451 0.9155392 0.8357918 0.9137557 0.8804141
## GSM21217 0.8677730 0.9189612 0.8771304 0.9221748 0.8822178 0.9304949 0.8981663
## GSM21218 0.9032156 0.9571396 0.8749440 0.9420970 0.8810223 0.9622727 0.9177721
## GSM21219 0.9527764 0.9496125 0.9510477 0.9638890 0.9545664 0.9438096 0.9582545
## GSM21220 0.9519923 0.9562572 0.9146536 0.9589149 0.9302910 0.9442149 0.9539457
## GSM21221 0.9218126 0.9530827 0.8792508 0.9602522 0.8798769 0.9533812 0.9299030
## GSM21226 0.9577838 0.9455148 0.9342113 0.9556036 0.9494094 0.9375295 0.9532457
## GSM21231 0.9635618 0.9443647 0.9146071 0.9602454 0.9250589 0.9350317 0.9563713
## GSM21232 0.9712237 0.9458992 0.9457953 0.9621961 0.9618063 0.9500159 0.9735143
## GSM21204 0.9107672 0.9357922 0.9123814 0.9318886 0.9247127 0.9392231 0.9260527
## GSM21205 0.8950367 0.8615858 0.9364932 0.8973078 0.9274742 0.8850182 0.8962114
## GSM21216 0.9212517 0.9505144 0.8886191 0.9640286 0.8937857 0.9569880 0.9410446
## GSM21222 0.9616646 0.9473387 0.9036591 0.9614947 0.9152187 0.9420284 0.9625016
## GSM21225 0.9579677 0.9304035 0.9118186 0.9516589 0.9118840 0.9282461 0.9347330
## GSM21228 1.0000000 0.9248756 0.9298796 0.9588593 0.9449169 0.9302105 0.9684000
## GSM21233 0.9248756 1.0000000 0.9079591 0.9559674 0.9206651 0.9705739 0.9380066
## GSM21209 0.9298796 0.9079591 1.0000000 0.9478485 0.9603123 0.9309922 0.9334896
## GSM21210 0.9588593 0.9559674 0.9478485 1.0000000 0.9455111 0.9686284 0.9648186
## GSM21211 0.9449169 0.9206651 0.9603123 0.9455111 1.0000000 0.9356373 0.9515491
## GSM21214 0.9302105 0.9705739 0.9309922 0.9686284 0.9356373 1.0000000 0.9484492
## GSM21223 0.9684000 0.9380066 0.9334896 0.9648186 0.9515491 0.9484492 1.0000000
## GSM21224 0.9759500 0.9231870 0.9200381 0.9564032 0.9297973 0.9241197 0.9612192
## GSM21227 0.9535323 0.9163270 0.9252712 0.9251677 0.9439120 0.9074515 0.9393593
## GSM21230 0.9500890 0.8872305 0.9110371 0.9086692 0.9283367 0.8829986 0.9359246
## GSM21203 0.9085978 0.8945546 0.9166614 0.9035943 0.9386201 0.8911623 0.9172872
## GSM21206 0.9061445 0.8960000 0.8716084 0.9022874 0.8914899 0.8841154 0.9217672
## GSM21207 0.8206373 0.7537538 0.8766557 0.7941095 0.8593297 0.7821910 0.7963008
## GSM21208 0.8874151 0.9139497 0.8794176 0.9121732 0.9065108 0.9115450 0.9221008
## GSM21212 0.9372529 0.8760736 0.9552284 0.9172282 0.9564975 0.8949556 0.9369720
## GSM21213 0.9229057 0.8865736 0.9284100 0.9090600 0.9394077 0.8932314 0.9245303
## GSM21229 0.9509479 0.9337889 0.8940929 0.9539362 0.8992849 0.9294250 0.9400248
## GSM21224 GSM21227 GSM21230 GSM21203 GSM21206 GSM21207 GSM21208
## GSM21215 0.8668052 0.8342587 0.7980119 0.8315814 0.8410284 0.6541584 0.8977119
## GSM21217 0.8533126 0.8483455 0.8058380 0.8807255 0.8312559 0.7275266 0.9317319
## GSM21218 0.9100001 0.8797820 0.8441170 0.8429234 0.8942855 0.6974657 0.8813558
## GSM21219 0.9467834 0.9487702 0.9348993 0.9367617 0.9104330 0.8219366 0.9192311
## GSM21220 0.9503249 0.9486456 0.9160812 0.9024829 0.9503944 0.7646096 0.8994759
## GSM21221 0.9221100 0.8788347 0.8555063 0.8398039 0.8843787 0.6955096 0.8852151
## GSM21226 0.9541779 0.9578691 0.9361030 0.9336650 0.9287425 0.8219120 0.9096120
## GSM21231 0.9593379 0.9382405 0.9306288 0.9034878 0.9299546 0.7701360 0.8978396
## GSM21232 0.9590888 0.9604796 0.9473402 0.9254139 0.9124623 0.8328114 0.9109487
## GSM21204 0.9002408 0.9168877 0.8952608 0.9355011 0.8759237 0.7958050 0.9474235
## GSM21205 0.8579019 0.8949529 0.8830376 0.8757557 0.8051345 0.8938662 0.8520005
## GSM21216 0.9180618 0.8761743 0.8509939 0.8573052 0.8927748 0.7031994 0.9023861
## GSM21222 0.9566258 0.9216461 0.9120474 0.8832277 0.9116110 0.7375880 0.9115182
## GSM21225 0.9532290 0.9470527 0.9300339 0.8895711 0.9008833 0.7917405 0.8710626
## GSM21228 0.9759500 0.9535323 0.9500890 0.9085978 0.9061445 0.8206373 0.8874151
## GSM21233 0.9231870 0.9163270 0.8872305 0.8945546 0.8960000 0.7537538 0.9139497
## GSM21209 0.9200381 0.9252712 0.9110371 0.9166614 0.8716084 0.8766557 0.8794176
## GSM21210 0.9564032 0.9251677 0.9086692 0.9035943 0.9022874 0.7941095 0.9121732
## GSM21211 0.9297973 0.9439120 0.9283367 0.9386201 0.8914899 0.8593297 0.9065108
## GSM21214 0.9241197 0.9074515 0.8829986 0.8911623 0.8841154 0.7821910 0.9115450
## GSM21223 0.9612192 0.9393593 0.9359246 0.9172872 0.9217672 0.7963008 0.9221008
## GSM21224 1.0000000 0.9361580 0.9308156 0.8930135 0.9172574 0.7796873 0.8751130
## GSM21227 0.9361580 1.0000000 0.9651219 0.9254819 0.9060811 0.8583386 0.8752800
## GSM21230 0.9308156 0.9651219 1.0000000 0.9153163 0.8844677 0.8574508 0.8548898
## GSM21203 0.8930135 0.9254819 0.9153163 1.0000000 0.8758111 0.8384629 0.9354166
## GSM21206 0.9172574 0.9060811 0.8844677 0.8758111 1.0000000 0.7241101 0.8700015
## GSM21207 0.7796873 0.8583386 0.8574508 0.8384629 0.7241101 1.0000000 0.7444494
## GSM21208 0.8751130 0.8752800 0.8548898 0.9354166 0.8700015 0.7444494 1.0000000
## GSM21212 0.9211414 0.9326141 0.9313565 0.9275345 0.8708196 0.8955919 0.8744143
## GSM21213 0.9071476 0.9270945 0.9200618 0.9567559 0.8686445 0.8565382 0.9214023
## GSM21229 0.9548155 0.9165759 0.8946151 0.8671144 0.9086429 0.7400790 0.8774467
## GSM21212 GSM21213 GSM21229
## GSM21215 0.7792472 0.8311145 0.9150516
## GSM21217 0.8214978 0.8606136 0.8810147
## GSM21218 0.8301819 0.8358762 0.9409905
## GSM21219 0.9372894 0.9326212 0.9377914
## GSM21220 0.8964881 0.8979802 0.9531923
## GSM21221 0.8324948 0.8399231 0.9576165
## GSM21226 0.9345452 0.9341520 0.9377179
## GSM21231 0.8997396 0.9046633 0.9538355
## GSM21232 0.9410798 0.9337760 0.9327523
## GSM21204 0.8968142 0.9435984 0.9000234
## GSM21205 0.9256117 0.9065818 0.8259457
## GSM21216 0.8456394 0.8503508 0.9552943
## GSM21222 0.8879670 0.8874933 0.9672369
## GSM21225 0.9059730 0.9023270 0.9574021
## GSM21228 0.9372529 0.9229057 0.9509479
## GSM21233 0.8760736 0.8865736 0.9337889
## GSM21209 0.9552284 0.9284100 0.8940929
## GSM21210 0.9172282 0.9090600 0.9539362
## GSM21211 0.9564975 0.9394077 0.8992849
## GSM21214 0.8949556 0.8932314 0.9294250
## GSM21223 0.9369720 0.9245303 0.9400248
## GSM21224 0.9211414 0.9071476 0.9548155
## GSM21227 0.9326141 0.9270945 0.9165759
## GSM21230 0.9313565 0.9200618 0.8946151
## GSM21203 0.9275345 0.9567559 0.8671144
## GSM21206 0.8708196 0.8686445 0.9086429
## GSM21207 0.8955919 0.8565382 0.7400790
## GSM21208 0.8744143 0.9214023 0.8774467
## GSM21212 1.0000000 0.9479129 0.8762190
## GSM21213 0.9479129 1.0000000 0.8710343
## GSM21229 0.8762190 0.8710343 1.0000000
colnames(D)
## [1] "IDENTIFIER" "GSM21215" "GSM21217" "GSM21218" "GSM21219"
## [6] "GSM21220" "GSM21221" "GSM21226" "GSM21231" "GSM21232"
## [11] "GSM21204" "GSM21205" "GSM21216" "GSM21222" "GSM21225"
## [16] "GSM21228" "GSM21233" "GSM21209" "GSM21210" "GSM21211"
## [21] "GSM21214" "GSM21223" "GSM21224" "GSM21227" "GSM21230"
## [26] "GSM21203" "GSM21206" "GSM21207" "GSM21208" "GSM21212"
## [31] "GSM21213" "GSM21229"
#pairs(D[,2:32], main = "Datos alzheimer")
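The printed correlation matrix is hard to read at this size, so a correlogram can help. The following is only a minimal sketch in base R: it uses a small synthetic matrix `X` as a stand-in (an assumption made so the example runs without alzheimer.csv), and the names `base`, `X` and `R` are illustrative. With the real data, `X` would be replaced by `D[, 2:32]`.

```r
# Synthetic stand-in for the expression matrix (assumption: the real input
# would be D[, 2:32]). Six columns share a common signal, so they correlate.
set.seed(1)
base <- rnorm(200)
X <- sapply(1:6, function(j) base + rnorm(200, sd = 0.5))
colnames(X) <- paste0("S", 1:6)

# Pairwise Pearson correlations between columns.
R <- cor(X)

# Draw the matrix as a heatmap, keeping the original column order
# (no dendrogram, no row scaling).
heatmap(R, Rowv = NA, Colv = NA, symm = TRUE, scale = "none",
        col = heat.colors(20),
        main = "Correlogram (synthetic data)")
```

Dropping `Rowv = NA, Colv = NA` would let `heatmap()` reorder the samples by hierarchical clustering, which may make groups of similar samples easier to spot.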
By default, prcomp() centers the variables so that they have mean 0. Passing scale = TRUE (which R partially matches to the function's scale. argument) additionally scales each variable to a standard deviation of 1.
pca_D <- prcomp(D[,2:32], scale = TRUE)
names(pca_D)
## [1] "sdev" "rotation" "center" "scale" "x"
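As a sanity check on what centering and scaling do, the sketch below compares the `center` and `scale` elements returned by prcomp() with column means and standard deviations computed directly. It runs on synthetic data (an assumption, so that it does not depend on alzheimer.csv), and the variable names are illustrative.

```r
# Synthetic data with non-zero means and non-unit variances (assumption).
set.seed(42)
X <- matrix(rnorm(100 * 4, mean = 50, sd = 10), ncol = 4)

pca <- prcomp(X, scale. = TRUE)

# prcomp() stores the column means and standard deviations it used.
center_ok <- all.equal(pca$center, colMeans(X))
scale_ok  <- all.equal(pca$scale, apply(X, 2, sd))

# The scores are the standardized data projected onto the rotation matrix.
Z <- scale(X)
scores_ok <- all.equal(unname(Z %*% pca$rotation), unname(pca$x))

c(center_ok, scale_ok, scores_ok)
```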
head(pca_D$rotation)[, 1:5]
## PC1 PC2 PC3 PC4 PC5
## GSM21215 -0.1724615 -0.32121197 0.224466505 -0.01516863 0.323072359
## GSM21217 -0.1741319 -0.17046472 0.445171166 -0.04265692 -0.003400844
## GSM21218 -0.1778878 -0.26025106 0.004798824 -0.14010214 0.049145007
## GSM21219 -0.1846141 0.03427660 -0.007839905 0.03303118 -0.013908558
## GSM21220 -0.1836665 -0.09009127 -0.148673511 0.06627419 -0.029124958
## GSM21221 -0.1791785 -0.27150471 -0.016802856 -0.20373098 0.060042035
dim(pca_D$rotation)
## [1] 31 31
There are 31 distinct principal components in total, since in general an n×p data set can have at most min(n−1, p) components. Here n = 22215 and p = 31, so min(22214, 31) = 31.
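The min(n−1, p) bound is easiest to see when there are fewer rows than columns. The following sketch, on synthetic matrices (an assumption), counts the components with non-negligible standard deviation; the tolerance `1e-8` and the names `wide` and `tall` are illustrative choices.

```r
set.seed(7)
wide <- matrix(rnorm(5 * 10), nrow = 5)      # n = 5,   p = 10
tall <- matrix(rnorm(200 * 10), nrow = 200)  # n = 200, p = 10

pca_wide <- prcomp(wide, scale. = TRUE)
pca_tall <- prcomp(tall, scale. = TRUE)

# Centering removes one degree of freedom, so the wide matrix has only
# n - 1 components with non-zero variance; the rest are numerically zero.
n_wide <- sum(pca_wide$sdev > 1e-8)
n_tall <- sum(pca_tall$sdev > 1e-8)

c(wide = n_wide, tall = n_tall)
```

For the data set in this report n is far larger than p, so the bound is simply p = 31.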
head(pca_D$x)[,1:5]
## PC1 PC2 PC3 PC4 PC5
## [1,] -14.9630735 4.53946688 0.8718804 -0.47007428 -0.62903225
## [2,] 1.9975394 -0.03608750 -0.1353925 0.02104120 -0.01231705
## [3,] 1.0101081 -0.14802532 -0.8790234 1.01826654 -0.94460721
## [4,] -2.5488601 0.68045326 0.4893180 -0.19788775 -0.10246445
## [5,] 1.6491719 -0.01297025 0.1338780 -0.01762962 -0.01101674
## [6,] 0.8183589 0.33244759 0.1019599 -0.08054798 -0.12982242
pca_D$sdev
## [1] 5.3163854 0.9534608 0.6419103 0.5087700 0.3888509 0.3650124 0.3117415
## [8] 0.3052550 0.2728228 0.2535404 0.2362288 0.2195945 0.2107304 0.1965574
## [15] 0.1855211 0.1805151 0.1761691 0.1705069 0.1651225 0.1583912 0.1554586
## [22] 0.1462308 0.1369696 0.1343752 0.1282888 0.1241790 0.1204769 0.1177992
## [29] 0.1095716 0.1053027 0.1013918
pca_D$sdev^2
## [1] 28.26395337 0.90908746 0.41204887 0.25884690 0.15120504 0.13323407
## [7] 0.09718276 0.09318061 0.07443230 0.06428275 0.05580404 0.04822176
## [13] 0.04440728 0.03863481 0.03441809 0.03258569 0.03103555 0.02907260
## [19] 0.02726544 0.02508777 0.02416738 0.02138345 0.01876067 0.01805670
## [25] 0.01645802 0.01542042 0.01451468 0.01387666 0.01200593 0.01108866
## [31] 0.01028029
As expected, the variance explained is largest for the first component and decreases for each subsequent one.
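With scaled variables, the squared standard deviations are the eigenvalues of the correlation matrix and they sum to the number of variables, which is why 28.26 out of a total of 31 corresponds to roughly 91% of the variance. The sketch below checks both properties on synthetic data (an assumption); the names are illustrative.

```r
set.seed(3)
X <- matrix(rnorm(150 * 5), ncol = 5)
X[, 2] <- X[, 1] + 0.3 * X[, 2]   # make two columns strongly correlated

pca <- prcomp(X, scale. = TRUE)

# Eigenvalues of the correlation matrix, in decreasing order.
eig <- eigen(cor(X), symmetric = TRUE)$values

same_eigen <- all.equal(pca$sdev^2, eig)
total_var  <- sum(pca$sdev^2)     # equals the number of columns

same_eigen
total_var
```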
Note that observations and variables are represented differently in the plots: observations are shown through their projections (scores), whereas variables are shown through their correlations with the components. The correlation between a component and a variable estimates the information they share (its loading, understood here as the eigenvector entry scaled by the component's standard deviation), so variables can be drawn as points in component space using these loadings as coordinates.
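Because the word "loadings" is used in more than one sense, it may help to check the relationship numerically. For a PCA on scaled variables, the correlation between a variable and a component equals the corresponding entry of `rotation` multiplied by that component's standard deviation. A minimal sketch on synthetic data (an assumption), with illustrative names:

```r
set.seed(11)
X <- matrix(rnorm(120 * 4), ncol = 4)
X[, 3] <- X[, 1] - 0.5 * X[, 3]   # introduce some correlation

pca <- prcomp(X, scale. = TRUE)

# Correlation of each original variable with each component score.
var_cor <- cor(X, pca$x)

# Eigenvector entries rescaled column-wise by the component sdev.
scaled_loadings <- sweep(pca$rotation, 2, pca$sdev, "*")

match_ok <- all.equal(var_cor, scaled_loadings, check.attributes = FALSE)
match_ok
```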
library(factoextra)
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
fviz_pca_ind(pca_D, geom.ind = "point",
             col.ind = "#FC4E07",
             axes = c(1, 2),
             pointsize = 1.5)
To plot the variables on the first two principal components we can use fviz_pca_var() from the factoextra package. The correlation between a variable and a principal component is used as that variable's coordinate on the component, which yields a variable correlation plot:
fviz_pca_var(pca_D, col.var = "cos2",
             geom.var = "arrow",
             labelsize = 2,
             repel = FALSE)
In this type of plot, besides the percentage of variance explained by the first (Dim1) and second (Dim2) components, positively correlated variables appear grouped together or close to one another, while negatively correlated variables appear on opposite sides of the origin (in opposite quadrants). The distance between a variable and the origin measures how well that variable is represented: the closer it lies to the correlation circle, the better the representation, and these are the variables that contribute most to the first two components. This quality is measured by the squared cosine (cos2) of the angle in the triangle formed by the origin, the point, and its projection onto the component. For any given variable, the sum of cos2 over all principal components equals 1; if the variable is perfectly represented by the first two components alone, its cos2 over those two already sums to 1. Variables located near the origin may indicate that more than two principal components are needed to represent them.
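The cos2 values described above can be computed by hand as the squared variable-component correlations. The sketch below does this on synthetic data (an assumption) and is meant to illustrate the definition rather than to reproduce factoextra's internals; the names `coord`, `cos2` and `cos2_2d` are illustrative.

```r
set.seed(5)
X <- matrix(rnorm(100 * 5), ncol = 5,
            dimnames = list(NULL, paste0("V", 1:5)))
X[, "V2"] <- X[, "V1"] + 0.4 * X[, "V2"]

pca <- prcomp(X, scale. = TRUE)

# Variable coordinates: correlations between variables and components.
coord <- sweep(pca$rotation, 2, pca$sdev, "*")

# Quality of representation on each component.
cos2 <- coord^2

# Over all components each variable's cos2 sums to 1.
total_cos2 <- rowSums(cos2)

# Quality of representation in the Dim1-Dim2 plane only.
cos2_2d <- rowSums(cos2[, 1:2])

round(cbind(total_cos2, cos2_2d), 3)
```

A variable whose `cos2_2d` is well below 1 would sit close to the origin in the two-dimensional correlation plot.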
Several representations can help when choosing the optimal number of principal components:
One option is a scree plot, which shows the eigenvalues ordered from largest to smallest. The fviz_screeplot() function from factoextra produces this plot regardless of which function was used to compute the principal components.
fviz_screeplot(pca_D, addlabels = TRUE)
With a large number of variables, we may choose to display only those with the largest contribution.
fviz_contrib(pca_D, choice = "var", axes = 1)
The red dashed line marks the average contribution expected if all variables contributed equally. For a given component, a variable whose contribution exceeds this threshold can be considered important for that component. In the plot above, variable GSM21232 contributes the most to PC1.
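Since each column of `rotation` has unit length, a variable's percentage contribution to a component can be sketched as 100 times its squared entry, and the uniform reference is 100/p. For example, the PC1 entry of -0.1846 shown earlier for GSM21219 corresponds to about 3.41%, against a reference of 100/31 ≈ 3.23%. The code below illustrates the calculation on synthetic data (an assumption); the names are illustrative.

```r
set.seed(9)
X <- matrix(rnorm(100 * 5), ncol = 5,
            dimnames = list(NULL, paste0("V", 1:5)))
X[, "V3"] <- X[, "V1"] + 0.5 * X[, "V3"]

pca <- prcomp(X, scale. = TRUE)

# Percentage contribution of each variable to PC1.
contrib_pc1 <- 100 * pca$rotation[, 1]^2

# Reference value if every variable contributed equally.
expected <- 100 / ncol(X)

# Variables contributing more than the uniform reference.
important <- names(contrib_pc1)[contrib_pc1 > expected]

sort(contrib_pc1, decreasing = TRUE)
important
```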
To compute the proportion of variance explained (PVE) by each principal component, we divide the variance of each component by the total variance across all components:
PVE <- 100*pca_D$sdev^2/sum(pca_D$sdev^2)
PVE
## [1] 91.17404313 2.93254018 1.32918991 0.83499000 0.48775818 0.42978733
## [7] 0.31349278 0.30058262 0.24010419 0.20736371 0.18001302 0.15555406
## [13] 0.14324930 0.12462842 0.11102610 0.10511512 0.10011469 0.09378258
## [19] 0.08795302 0.08092827 0.07795929 0.06897887 0.06051828 0.05824740
## [25] 0.05309039 0.04974330 0.04682154 0.04476341 0.03872879 0.03576987
## [31] 0.03316223
The first principal component explains 91.2% of the variance, while the second explains only 2.9%.
par(mfrow = c(1, 2))
plot(PVE, type = "o",
     ylab = "PVE",
     xlab = "Principal component",
     col = "blue")
plot(cumsum(PVE), type = "o",
     ylab = "Cumulative PVE",
     xlab = "Principal component",
     col = "brown3")
The PVE drops sharply after the first component: the cumulative PVE already exceeds 94% with two components and 97% with seven, so the curve flattens out almost immediately.
prop_varianza <- pca_D$sdev^2 / sum(pca_D$sdev^2)
prop_varianza_acum <- cumsum(prop_varianza)
prop_varianza_acum
## [1] 0.9117404 0.9410658 0.9543577 0.9627076 0.9675852 0.9718831 0.9750180
## [8] 0.9780238 0.9804249 0.9824985 0.9842987 0.9858542 0.9872867 0.9885330
## [15] 0.9896432 0.9906944 0.9916955 0.9926334 0.9935129 0.9943222 0.9951018
## [22] 0.9957915 0.9963967 0.9969792 0.9975101 0.9980075 0.9984758 0.9989234
## [29] 0.9993107 0.9996684 1.0000000
ggplot(data = data.frame(prop_varianza_acum, pc = 1:31),
       aes(x = pc, y = prop_varianza_acum, group = 1)) +
  geom_point() +
  geom_line() +
  geom_label(aes(label = round(prop_varianza_acum, 2))) +
  theme_bw() +
  labs(x = "Principal component",
       y = "Cumulative proportion of variance explained")
Taken together, the first 7 principal components explain about 97.5% of the variance in the data, which is a very high proportion.
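One way to turn the cumulative curve into a concrete choice is to pick the smallest number of components that reaches a target proportion. The sketch below reuses the eigenvalues printed earlier by `pca_D$sdev^2`; the 95% target and the names `eig`, `cum` and `k95` are illustrative assumptions, not part of the assignment.

```r
# Eigenvalues copied from the pca_D$sdev^2 output above.
eig <- c(28.26395337, 0.90908746, 0.41204887, 0.25884690, 0.15120504,
         0.13323407, 0.09718276, 0.09318061, 0.07443230, 0.06428275,
         0.05580404, 0.04822176, 0.04440728, 0.03863481, 0.03441809,
         0.03258569, 0.03103555, 0.02907260, 0.02726544, 0.02508777,
         0.02416738, 0.02138345, 0.01876067, 0.01805670, 0.01645802,
         0.01542042, 0.01451468, 0.01387666, 0.01200593, 0.01108866,
         0.01028029)

# Cumulative proportion of variance explained.
cum <- cumsum(eig) / sum(eig)

# Smallest number of components reaching the (assumed) 95% target.
k95 <- which(cum >= 0.95)[1]
k95
```

With these values the 95% target is reached at the third component, which is consistent with the cumulative proportions 0.9411 and 0.9544 printed for components 2 and 3.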