What are singular value decomposition and eigendecomposition?

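Singular value decomposition
  ◦ Decompose a matrix A as follows.

A = U D V^T

D is a matrix whose off-diagonal entries are all zero; its diagonal entries are called the singular values. The columns of U and V are called the left and right singular vectors, respectively.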

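Eigendecomposition
  ◦ Decompose a square matrix A as follows.

A = P Λ P^(-1)

The diagonal entries of Λ are called the eigenvalues, and the columns of P are called the eigenvectors. When A is symmetric (as a covariance matrix is), P can be chosen orthogonal, so that P^(-1) = P^T.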
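As a quick check of the two definitions, the following sketch rebuilds a matrix from its factors. The matrices A and B are made-up example values, not taken from the data used later.

```r
# Small symmetric matrix (hypothetical example values)
A <- matrix(c(2, 1,
              1, 3), nrow = 2, byrow = TRUE)

# SVD: A = U D V^T
s <- svd(A)
A_svd <- s$u %*% diag(s$d) %*% t(s$v)

# Eigendecomposition: A = P Lambda P^(-1)
e <- eigen(A)
A_eig <- e$vectors %*% diag(e$values) %*% solve(e$vectors)

# SVD also works for a non-square matrix (2 x 3 here)
B <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
sb <- svd(B)
B_svd <- sb$u %*% diag(sb$d) %*% t(sb$v)

# Each reconstruction should reproduce the original matrix
all.equal(A, A_svd)
all.equal(A, A_eig)
all.equal(B, B_svd)
```

eigen() would stop with an error on the non-square B, which is why only svd() is applied to it.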

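Singular value decomposition of the data's covariance matrix (#1)

Eigendecomposition of the data's covariance matrix (#2)

Principal component analysis without standardizing the data (#3)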

# library(MASS)  # not needed: USArrests is in the base 'datasets' package

data(USArrests)
head(USArrests)
##            Murder Assault UrbanPop Rape
## Alabama      13.2     236       58 21.2
## Alaska       10.0     263       48 44.5
## Arizona       8.1     294       80 31.0
## Arkansas      8.8     190       50 19.5
## California    9.0     276       91 40.6
## Colorado      7.9     204       78 38.7
summary(USArrests)
##      Murder          Assault         UrbanPop          Rape      
##  Min.   : 0.800   Min.   : 45.0   Min.   :32.00   Min.   : 7.30  
##  1st Qu.: 4.075   1st Qu.:109.0   1st Qu.:54.50   1st Qu.:15.07  
##  Median : 7.250   Median :159.0   Median :66.00   Median :20.10  
##  Mean   : 7.788   Mean   :170.8   Mean   :65.54   Mean   :21.23  
##  3rd Qu.:11.250   3rd Qu.:249.0   3rd Qu.:77.75   3rd Qu.:26.18  
##  Max.   :17.400   Max.   :337.0   Max.   :91.00   Max.   :46.00
str(USArrests)
## 'data.frame':    50 obs. of  4 variables:
##  $ Murder  : num  13.2 10 8.1 8.8 9 7.9 3.3 5.9 15.4 17.4 ...
##  $ Assault : int  236 263 294 190 276 204 110 238 335 211 ...
##  $ UrbanPop: int  58 48 80 50 91 78 77 72 80 60 ...
##  $ Rape    : num  21.2 44.5 31 19.5 40.6 38.7 11.1 15.8 31.9 25.8 ...
# 1: SVD of the covariance matrix
svd(var(USArrests))
## $d
## [1] 7011.114851  201.992366   42.112651    6.164246
## 
## $u
##             [,1]        [,2]        [,3]        [,4]
## [1,] -0.04170432  0.04482166 -0.07989066 -0.99492173
## [2,] -0.99522128  0.05876003  0.06756974  0.03893830
## [3,] -0.04633575 -0.97685748  0.20054629 -0.05816914
## [4,] -0.07515550 -0.20071807 -0.97408059  0.07232502
## 
## $v
##             [,1]        [,2]        [,3]        [,4]
## [1,] -0.04170432  0.04482166 -0.07989066 -0.99492173
## [2,] -0.99522128  0.05876003  0.06756974  0.03893830
## [3,] -0.04633575 -0.97685748  0.20054629 -0.05816914
## [4,] -0.07515550 -0.20071807 -0.97408059  0.07232502
# 2: eigendecomposition of the covariance matrix
eigen(var(USArrests))
## $values
## [1] 7011.114851  201.992366   42.112651    6.164246
## 
## $vectors
##             [,1]        [,2]        [,3]        [,4]
## [1,] -0.04170432  0.04482166  0.07989066  0.99492173
## [2,] -0.99522128  0.05876003 -0.06756974 -0.03893830
## [3,] -0.04633575 -0.97685748 -0.20054629  0.05816914
## [4,] -0.07515550 -0.20071807  0.97408059 -0.07232502
# 3: PCA without standardization
prcomp(USArrests)
## Standard deviations:
## [1] 83.732400 14.212402  6.489426  2.482790
## 
## Rotation:
##                 PC1         PC2         PC3         PC4
## Murder   0.04170432 -0.04482166  0.07989066 -0.99492173
## Assault  0.99522128 -0.05876003 -0.06756974  0.03893830
## UrbanPop 0.04633575  0.97685748 -0.20054629 -0.05816914
## Rape     0.07515550  0.20071807  0.97408059  0.07232502

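All three give the same result, up to the sign of each vector (eigendecomposition applies only to square matrices, whereas SVD applies to any matrix). The singular values of the covariance matrix equal its eigenvalues, and the standard deviations reported by prcomp are their square roots (83.7324^2 ≈ 7011.11). So principal component analysis amounts to a singular value decomposition of the data's covariance matrix.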
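The agreement can also be checked numerically rather than by eye. This is a minimal sketch; it compares absolute values because the sign of each vector is arbitrary.

```r
S  <- var(USArrests)        # 4 x 4 covariance matrix
sv <- svd(S)
ev <- eigen(S)
pc <- prcomp(USArrests)     # centered, not scaled

# Singular values of S equal its eigenvalues
# (S is symmetric and positive semi-definite)
all.equal(sv$d, ev$values)

# prcomp reports standard deviations: square roots of the eigenvalues
all.equal(pc$sdev^2, ev$values)

# Vectors agree up to sign, so compare absolute values
all.equal(abs(sv$u), abs(ev$vectors))
all.equal(abs(unname(pc$rotation)), abs(ev$vectors))
```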

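As shown next, principal component analysis of standardized data corresponds to the singular value decomposition of the correlation matrix. The right singular vectors of the standardized data matrix itself also match; its singular values d relate to the eigenvalues through d^2 / (n - 1), e.g. 11.024148^2 / 49 ≈ 2.4802.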

svd(cor(USArrests))
## $d
## [1] 2.4802416 0.9897652 0.3565632 0.1734301
## 
## $u
##            [,1]       [,2]       [,3]        [,4]
## [1,] -0.5358995  0.4181809 -0.3412327  0.64922780
## [2,] -0.5831836  0.1879856 -0.2681484 -0.74340748
## [3,] -0.2781909 -0.8728062 -0.3780158  0.13387773
## [4,] -0.5434321 -0.1673186  0.8177779  0.08902432
## 
## $v
##            [,1]       [,2]       [,3]        [,4]
## [1,] -0.5358995  0.4181809 -0.3412327  0.64922780
## [2,] -0.5831836  0.1879856 -0.2681484 -0.74340748
## [3,] -0.2781909 -0.8728062 -0.3780158  0.13387773
## [4,] -0.5434321 -0.1673186  0.8177779  0.08902432
svd(scale(USArrests))
## $d
## [1] 11.024148  6.964086  4.179904  2.915146
## 
## $u
##               [,1]        [,2]         [,3]          [,4]
##  [1,] -0.088502119  0.16111249 -0.105218608  0.0530665011
##  [2,] -0.175119011  0.15255799  0.483145153 -0.1489378244
##  [3,] -0.158329049 -0.10603826  0.012974042 -0.2834384049
##  [4,]  0.012699298  0.15917987  0.027135114 -0.0620804497
##  [5,] -0.226649068 -0.21932910  0.141759482 -0.1161380178
##  [6,] -0.136005136 -0.14038162  0.259336498  0.0004974586
##  [7,]  0.122004202 -0.15479183 -0.152346210 -0.0402308322
##  [8,] -0.004284214 -0.04624999 -0.170197773 -0.2995093258
##  [9,] -0.270566006  0.00557636 -0.136613685 -0.0326971796
## [10,] -0.147204794  0.18180252 -0.081106695  0.3656676469
## [11,]  0.081955040 -0.22324195  0.012026954  0.3065826885
## [12,]  0.147251202  0.02998994  0.061530174 -0.1694899353
## [13,] -0.123823808 -0.09692418 -0.160455000 -0.0414370084
## [14,]  0.045389560 -0.02154472  0.054011475  0.1442115222
## [15,]  0.202373535 -0.01479136  0.038974667  0.0059617844
## [16,]  0.071558552 -0.03840409  0.006051930  0.0701237800
## [17,]  0.067425851  0.13624293 -0.006718884  0.2277132300
## [18,] -0.140517959  0.12382100 -0.185555941  0.1544203417
## [19,]  0.215231159  0.05350432 -0.015555919 -0.1122203023
## [20,] -0.158347534  0.06079147 -0.037242408 -0.1898534932
## [21,]  0.043656895 -0.20960067 -0.144350623 -0.0609897144
## [22,] -0.189334383 -0.02208976  0.091150533  0.0347643443
## [23,]  0.151999911 -0.08987636  0.036252509  0.0228600295
## [24,] -0.089483486  0.34027971 -0.175449706  0.0731840095
## [25,] -0.062570302 -0.03743606  0.089392087  0.0766873549
## [26,]  0.106451539  0.07631705  0.058472149  0.0420214180
## [27,]  0.113651981 -0.02757065  0.041582129  0.0053970395
## [28,] -0.258115678 -0.11025209  0.275529769  0.1068057898
## [29,]  0.214071497 -0.00257041  0.008728664 -0.0112530536
## [30,] -0.016304324 -0.20604821 -0.181049718  0.0826499280
## [31,] -0.177802722  0.02030605  0.043504824 -0.1153016522
## [32,] -0.151092550 -0.11701618 -0.152302994 -0.0045791345
## [33,] -0.100877464  0.31671218 -0.204524432 -0.3240968906
## [34,]  0.268696706  0.08516514  0.071353148 -0.0862511360
## [35,]  0.020291306 -0.10550966 -0.007374849  0.1609363198
## [36,]  0.027997563 -0.04091867 -0.003625902  0.0035087357
## [37,] -0.005309061 -0.07696200  0.222585787 -0.0807475502
## [38,]  0.079778211 -0.08118230 -0.094883089  0.1219329726
## [39,]  0.077565243 -0.21208574 -0.324451737 -0.2083610270
## [40,] -0.118598723  0.27483477 -0.071178009 -0.0446445538
## [41,]  0.178498756  0.11703880  0.092198470 -0.0372092940
## [42,] -0.089775081  0.12228530  0.044544714  0.2217051038
## [43,] -0.121689077 -0.05863443 -0.116539361  0.2184216921
## [44,]  0.049439812 -0.20917537  0.069565218 -0.0279528910
## [45,]  0.251561949  0.19933619  0.199240942 -0.0492029259
## [46,]  0.008650709  0.02839251  0.002773945  0.0717790646
## [47,]  0.019477550 -0.13790380  0.147991604 -0.0749973365
## [48,]  0.189347337  0.20254292  0.024814359  0.0447947015
## [49,]  0.186754750 -0.08689225 -0.032888157  0.0625194853
## [50,]  0.056521430  0.04563221 -0.056996643 -0.0565930091
## 
## $v
##            [,1]       [,2]       [,3]        [,4]
## [1,] -0.5358995  0.4181809 -0.3412327  0.64922780
## [2,] -0.5831836  0.1879856 -0.2681484 -0.74340748
## [3,] -0.2781909 -0.8728062 -0.3780158  0.13387773
## [4,] -0.5434321 -0.1673186  0.8177779  0.08902432
prcomp(USArrests, scale. = TRUE)
## Standard deviations:
## [1] 1.5748783 0.9948694 0.5971291 0.4164494
## 
## Rotation:
##                 PC1        PC2        PC3         PC4
## Murder   -0.5358995  0.4181809 -0.3412327  0.64922780
## Assault  -0.5831836  0.1879856 -0.2681484 -0.74340748
## UrbanPop -0.2781909 -0.8728062 -0.3780158  0.13387773
## Rape     -0.5434321 -0.1673186  0.8177779  0.08902432

Principal component analysis is an eigendecomposition, or equivalently a singular value decomposition
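The relationship between the three standardized results can be sketched as follows. The variable names are chosen here for illustration, and n - 1 = 49 is the divisor R uses for the sample variance.

```r
X <- scale(USArrests)       # center each column and scale to unit variance
n <- nrow(X)

sx <- svd(X)                # SVD of the standardized data matrix
sr <- svd(cor(USArrests))   # SVD of the correlation matrix
pc <- prcomp(USArrests, scale. = TRUE)

# Squared singular values of X divided by n - 1
# are the eigenvalues of the correlation matrix
all.equal(sx$d^2 / (n - 1), sr$d)

# prcomp's standard deviations are d / sqrt(n - 1)
all.equal(sx$d / sqrt(n - 1), pc$sdev)

# Right singular vectors match the loadings up to sign
all.equal(abs(sx$v), abs(unname(pc$rotation)))
```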

[Descriptive analysis]

• Principal component analysis
  ◦ Dimensionality reduction via singular value decomposition

prcomp(USArrests, scale. = TRUE)
## Standard deviations:
## [1] 1.5748783 0.9948694 0.5971291 0.4164494
## 
## Rotation:
##                 PC1        PC2        PC3         PC4
## Murder   -0.5358995  0.4181809 -0.3412327  0.64922780
## Assault  -0.5831836  0.1879856 -0.2681484 -0.74340748
## UrbanPop -0.2781909 -0.8728062 -0.3780158  0.13387773
## Rape     -0.5434321 -0.1673186  0.8177779  0.08902432
summary(prcomp(USArrests, scale. = TRUE))
## Importance of components:
##                           PC1    PC2     PC3     PC4
## Standard deviation     1.5749 0.9949 0.59713 0.41645
## Proportion of Variance 0.6201 0.2474 0.08914 0.04336
## Cumulative Proportion  0.6201 0.8675 0.95664 1.00000
biplot(prcomp(USArrests, scale. = TRUE))
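
To make the "dimensionality reduction" concrete, here is a small sketch that keeps only the first two components. The names scores2, proj2 and kept are illustrative.

```r
pc <- prcomp(USArrests, scale. = TRUE)

# Scores on the first two components: a 50 x 2 summary of the 50 x 4 data
scores2 <- pc$x[, 1:2]

# The same scores, obtained by projecting the standardized data
# onto the first two loading vectors
proj2 <- scale(USArrests) %*% pc$rotation[, 1:2]
all.equal(scores2, proj2, check.attributes = FALSE)

# Share of the total variance retained by two components
kept <- sum(pc$sdev[1:2]^2) / sum(pc$sdev^2)
kept
```

The value of kept corresponds to the Cumulative Proportion for PC2 in the summary output above (0.8675).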

• Factor analysis

v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1,v2,v3,v4,v5,v6)
a <- factanal(m1, factors = 3) # varimax is the default
a$loadings
## 
## Loadings:
##    Factor1 Factor2 Factor3
## v1 0.944   0.182   0.267  
## v2 0.905   0.235   0.159  
## v3 0.236   0.210   0.946  
## v4 0.180   0.242   0.828  
## v5 0.242   0.881   0.286  
## v6 0.193   0.959   0.196  
## 
##                Factor1 Factor2 Factor3
## SS loadings      1.893   1.886   1.797
## Proportion Var   0.316   0.314   0.300
## Cumulative Var   0.316   0.630   0.929
factanal(m1, factors = 3, rotation = "promax")
## 
## Call:
## factanal(x = m1, factors = 3, rotation = "promax")
## 
## Uniquenesses:
##    v1    v2    v3    v4    v5    v6 
## 0.005 0.101 0.005 0.224 0.084 0.005 
## 
## Loadings:
##    Factor1 Factor2 Factor3
## v1          0.985         
## v2          0.951         
## v3                  1.003 
## v4                  0.867 
## v5  0.910                 
## v6  1.033                 
## 
##                Factor1 Factor2 Factor3
## SS loadings      1.903   1.876   1.772
## Proportion Var   0.317   0.313   0.295
## Cumulative Var   0.317   0.630   0.925
## 
## Factor Correlations:
##         Factor1 Factor2 Factor3
## Factor1   1.000   0.462   0.460
## Factor2   0.462   1.000   0.501
## Factor3   0.460   0.501   1.000
## 
## The degrees of freedom for the model is 0 and the fit was 0.4755

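Comparison with principal component analysis

Factor analysis and principal component analysis summarize the variables differently. In the factanal output above, each factor loads mainly on one pair of variables, whereas PC1 below loads about equally on all six. One reason is that factanal rotates its loadings (varimax by default); another is that it fits a different model, with a unique variance for each variable.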

prcomp(m1)
## Standard deviations:
## [1] 3.0368683 1.6313757 1.5818857 0.6344131 0.3190765 0.2649086
## 
## Rotation:
##          PC1         PC2        PC3        PC4        PC5         PC6
## v1 0.4168038 -0.52292304  0.2354298 -0.2686501  0.5157193 -0.39907358
## v2 0.3885610 -0.50887673  0.2985906  0.3060519 -0.5061522  0.38865228
## v3 0.4182779  0.01521834 -0.5555132 -0.5686880 -0.4308467 -0.08474731
## v4 0.3943646  0.02184360 -0.5986150  0.5922259  0.3558110  0.09124977
## v5 0.4254013  0.47017231  0.2923345 -0.2789775  0.3060409  0.58397162
## v6 0.4047824  0.49580764  0.3209708  0.2866938 -0.2682391 -0.57719858
summary(prcomp(m1))
## Importance of components:
##                           PC1    PC2    PC3    PC4     PC5     PC6
## Standard deviation     3.0369 1.6314 1.5819 0.6344 0.31908 0.26491
## Proportion of Variance 0.6165 0.1779 0.1673 0.0269 0.00681 0.00469
## Cumulative Proportion  0.6165 0.7943 0.9616 0.9885 0.99531 1.00000
prcomp(m1, scale. = TRUE)
## Standard deviations:
## [1] 1.9225064 1.0359124 1.0003870 0.4012524 0.2023886 0.1676783
## 
## Rotation:
##          PC1         PC2        PC3        PC4        PC5         PC6
## v1 0.4154985 -0.53088297  0.1760717 -0.2791358  0.5317514 -0.39223298
## v2 0.4007058 -0.54223870  0.2485226  0.3048547 -0.5042931  0.36932463
## v3 0.4133938  0.07418871 -0.5496063 -0.5693303 -0.4344463 -0.09302655
## v4 0.3940548  0.08433475 -0.5976225  0.5877130  0.3543977  0.09721936
## v5 0.4206885  0.44028459  0.3342420 -0.2798686  0.2920358  0.59484588
## v6 0.4045287  0.46655507  0.3691854  0.2850910 -0.2516003 -0.58121033
summary(prcomp(m1, scale. = TRUE))
## Importance of components:
##                          PC1    PC2    PC3     PC4     PC5     PC6
## Standard deviation     1.923 1.0359 1.0004 0.40125 0.20239 0.16768
## Proportion of Variance 0.616 0.1789 0.1668 0.02683 0.00683 0.00469
## Cumulative Proportion  0.616 0.7949 0.9617 0.98849 0.99531 1.00000
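
To see what rotation alone does, one can apply varimax() to the first three principal component loadings. This is only an illustrative sketch and not what factanal does internally: factanal fits a maximum likelihood factor model first and rotates afterwards. Taking k = 3 components and scaling each direction by its standard deviation are assumptions made here so that the numbers are on a scale comparable to factor loadings.

```r
v1 <- c(1,1,1,1,1,1,1,1,1,1,3,3,3,3,3,4,5,6)
v2 <- c(1,2,1,1,1,1,2,1,2,1,3,4,3,3,3,4,6,5)
v3 <- c(3,3,3,3,3,1,1,1,1,1,1,1,1,1,1,5,4,6)
v4 <- c(3,3,4,3,3,1,1,2,1,1,1,1,2,1,1,5,6,4)
v5 <- c(1,1,1,1,1,3,3,3,3,3,1,1,1,1,1,6,4,5)
v6 <- c(1,1,1,2,1,3,3,3,4,3,1,1,1,2,1,6,5,4)
m1 <- cbind(v1, v2, v3, v4, v5, v6)

pc <- prcomp(m1, scale. = TRUE)
k  <- 3   # number of components kept (assumption: same as the number of factors)

# Unrotated loadings: direction vectors scaled by their standard deviations
L <- pc$rotation[, 1:k] %*% diag(pc$sdev[1:k])

# Varimax rotation of those loadings
rot <- varimax(L)
print(round(unclass(rot$loadings), 3))

# The rotation matrix is orthogonal, so each variable's
# sum of squared loadings (communality) is unchanged
all.equal(rowSums(L^2), rowSums(unclass(rot$loadings)^2))
all.equal(crossprod(rot$rotmat), diag(k), check.attributes = FALSE)
```

Comparing the printed matrix with a$loadings shows how much of the difference between the two analyses is due to rotation and how much to the factor model itself.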