Dimension reduction + cluster analysis: suicide in Tokyo as an example

Loading the libraries

> library(readr)
> library(dplyr)
> library(clustrd)
> library(knitr)

Loading and displaying the data

> mori1 <- read_csv("mori1.csv")
> sui1 <- select(mori1, code, rate, industry1, aging:taxgain)
> sui1.tokyo <- sui1[which(as.numeric(sui1$code) %/% 1000 == 13),]
> kable(sui1.tokyo[1:10,])
code   rate   industry1  aging      unemployment  taxgain
13101  19.59  0.0003645  0.1761120  0.0181717     0.1134598
13102  19.77  0.0003888  0.1607417  0.0240528     0.1948065
13103  17.35  0.0006927  0.1754911  0.0273618     0.5428776
13104  20.16  0.0006748  0.1956889  0.0343030     0.3382430
13105  11.96  0.0006616  0.1909031  0.0272838     0.2612840
13106  18.35  0.0006166  0.2352163  0.0376425     0.1590275
13107  17.11  0.0007521  0.2270851  0.0371070     0.1969722
13108  16.95  0.0006992  0.2108695  0.0351113     0.4161563
13109  10.32  0.0009192  0.2022644  0.0329637     0.3759405
13110  11.20  0.0017310  0.1988243  0.0311272     0.3609426
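
The Tokyo filter above relies on the first two digits of the five-digit municipality code being the prefecture code (13 for Tokyo), so integer division by 1000 isolates it. A minimal sketch of that step, using a hypothetical code vector invented for illustration (not taken from mori1.csv):

```r
# Hypothetical municipality codes (illustration only, not from mori1.csv)
codes <- c("13101", "13421", "14100", "01100")

# Integer division by 1000 drops the last three digits,
# leaving the prefecture part of the code
pref <- as.numeric(codes) %/% 1000

# TRUE for codes whose prefecture part is 13 (Tokyo)
is_tokyo <- pref == 13
print(is_tokyo)
```

The same logical vector could be used directly as a row index, which would make the `which()` call optional.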

RKM (reduced K-means)

> sui1.outRKM <-  cluspca(sui1.tokyo[,-1], 3, 2, 
+                         method = "RKM", rotation = "varimax", 
+                         scale = TRUE, nstart = 10)

> summary(sui1.outRKM)
Solution with 3 clusters of sizes 51 (82.3%), 8 (12.9%), 3 (4.8%) in 2 dimensions. Variables were mean centered and standardized.

Cluster centroids:
           Dim.1   Dim.2
Cluster 1 0.5672 -0.1095
Cluster 2 0.5672 -0.1095
Cluster 3 0.5672 -0.1095

Variable scores:
               Dim.1   Dim.2
rate          0.0812 -0.7557
industry1    -0.6449  0.0022
aging        -0.4729 -0.5230
unemployment  0.4759 -0.3941
taxgain       0.3569  0.0085

Within cluster sum of squares by cluster:
[1] 27.8825 20.5320  3.7710
 (between_SS / total_SS =  0 %) 

Clustering vector:
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 2 2 2 2 2 3 2 2

Objective criterion value: 73.1532 

Available output:

 [1] "obscoord"  "attcoord"  "centroid"  "cluster"   "criterion"
 [6] "size"      "odata"     "scale"     "center"    "nstart"   
> plot(sui1.outRKM, cludesc = TRUE)
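
The cluster sizes and percentages in the first line of the summary can be checked against the clustering vector. A minimal sketch, assuming the vector is transcribed by hand from the printed output (in the live session it would presumably be available as `sui1.outRKM$cluster`):

```r
# Clustering vector transcribed from the printed summary:
# 51 observations in cluster 1, followed by the last 11 assignments
cl <- c(rep(1, 51), 3, 3, 2, 2, 2, 2, 2, 2, 3, 2, 2)

# Cluster sizes and their share of all observations, in percent
sizes <- table(cl)
pct <- round(100 * prop.table(sizes), 1)

print(sizes)
print(pct)
```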

FKM (factorial K-means)

> sui2.outRKM <- cluspca(sui1.tokyo[,-1], 3, 2, 
+                        method = "FKM", rotation = "varimax",
+                        scale = TRUE, smartStart = sui1.outRKM$cluster)

> sui2.outRKM
Solution with 3 clusters of sizes 51 (82.3%), 9 (14.5%), 2 (3.2%) in 2 dimensions. Variables were mean centered and standardized.

Cluster centroids:
            Dim.1  Dim.2
Cluster 1 -0.2936 0.1021
Cluster 2 -0.2936 0.1021
Cluster 3 -0.2936 0.1021

Variable scores:
               Dim.1   Dim.2
rate          0.1031  0.9589
industry1    -0.2011  0.2404
aging         0.5048 -0.1089
unemployment -0.8277 -0.0172
taxgain      -0.0953  0.1030

Within cluster sum of squares by cluster:
[1] 15.8906  4.9855  0.9112
 (between_SS / total_SS =  0 %) 

Objective criterion value: 21.7873 

Available output:

 [1] "obscoord"  "attcoord"  "centroid"  "cluster"   "criterion"
 [6] "size"      "odata"     "scale"     "center"    "nstart"   
> plot(sui2.outRKM, cludesc = TRUE)

Tandem

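> # Note: per the clustrd documentation, alpha = 1 gives the tandem approach
> # (PCA followed by K-means); alpha = 0.3 lies between FKM (0) and RKM (0.5).
> sui1.outTandem <- cluspca(sui1.tokyo[,-1], 3, 2, alpha = 0.3, scale = TRUE)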

> plot(sui1.outTandem, cludesc = TRUE)
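
For comparison, the classical tandem approach runs PCA first and then K-means on the component scores, as two separate steps. A minimal sketch in base R, assuming the built-in `USArrests` data as a stand-in for the Tokyo data (which is not reproduced here). It does not use clustrd and is not expected to reproduce the results in this document:

```r
# Stand-in data: built-in USArrests (an assumption, not the Tokyo data)
set.seed(1)                      # K-means uses random starts
X <- scale(USArrests)            # mean-center and standardize each variable

# Step 1: PCA, keeping the first two components
pca <- prcomp(X)
scores <- pca$x[, 1:2]

# Step 2: K-means with 3 clusters on the component scores
km <- kmeans(scores, centers = 3, nstart = 10)
print(table(km$cluster))
```

Because the two steps are optimized separately, the retained components need not be the ones that best separate the clusters, which is the motivation usually given for joint methods such as RKM and FKM.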