Dimension reduction + cluster analysis: suicide in Tokyo as an example

Loading the libraries

> library(readr)
> library(dplyr)
> library(clustrd)
> library(knitr)

Loading and displaying the data

> mori1 <- read_csv("mori1.csv")
> sui1 <- select(mori1, code, rate, industry1, aging:taxgain)
> sui1.tokyo <- sui1[which(as.numeric(sui1$code) %/% 1000 == 13),]
> kable(sui1.tokyo[1:10,])
code rate industry1 aging unemployment taxgain
13101 19.59 0.0003645 0.1761120 0.0181717 0.1134598
13102 19.77 0.0003888 0.1607417 0.0240528 0.1948065
13103 17.35 0.0006927 0.1754911 0.0273618 0.5428776
13104 20.16 0.0006748 0.1956889 0.0343030 0.3382430
13105 11.96 0.0006616 0.1909031 0.0272838 0.2612840
13106 18.35 0.0006166 0.2352163 0.0376425 0.1590275
13107 17.11 0.0007521 0.2270851 0.0371070 0.1969722
13108 16.95 0.0006992 0.2108695 0.0351113 0.4161563
13109 10.32 0.0009192 0.2022644 0.0329637 0.3759405
13110 11.20 0.0017310 0.1988243 0.0311272 0.3609426
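
The filter above relies on the layout of the municipality code: integer division of the five-digit code by 1000 leaves the two-digit prefecture prefix, and prefix 13 is Tokyo. A minimal base-R sketch of the same idea on a toy data frame follows. The rows in `toy` are hypothetical values made up for illustration and are not taken from mori1.csv.

```r
# Toy data: hypothetical rows, not taken from mori1.csv
toy <- data.frame(
  code = c("01100", "13101", "13102", "14100"),
  rate = c(20.0, 19.6, 19.8, 18.0),
  stringsAsFactors = FALSE
)

# Integer-divide the five-digit code by 1000 to get the prefecture prefix
pref <- as.numeric(toy$code) %/% 1000

# Keep only the rows whose prefix is 13
toy.tokyo <- toy[pref == 13, ]
print(toy.tokyo)
```

Indexing with the logical vector directly gives the same rows as wrapping it in `which()`, as long as the code column has no missing values.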

RKM (varimax, shown again)

> sui1.outRKM.1 <- cluspca(sui1.tokyo[, -1], nclus = 3, ndim = 2,
+                          method = "RKM",
+                          rotation = "varimax",
+                          scale = TRUE, nstart = 10, center = TRUE, seed = 1234)

> summary(sui1.outRKM.1)
Solution with 3 clusters of sizes 51 (82.3%), 8 (12.9%), 3 (4.8%) in 2 dimensions. Variables were mean centered and standardized.

Cluster centroids:
           Dim.1   Dim.2
Cluster 1 0.5672 -0.1095
Cluster 2 0.5672 -0.1095
Cluster 3 0.5672 -0.1095

Variable scores:
               Dim.1   Dim.2
rate          0.0812 -0.7557
industry1    -0.6449  0.0022
aging        -0.4729 -0.5230
unemployment  0.4759 -0.3941
taxgain       0.3569  0.0085

Within cluster sum of squares by cluster:
[1] 27.8825 20.5320  3.7710
 (between_SS / total_SS =  0 %) 

Clustering vector:
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 2 2 2 2 2 3 2 2

Objective criterion value: 73.1532 

Available output:

 [1] "obscoord"  "attcoord"  "centroid"  "cluster"   "criterion"
 [6] "size"      "odata"     "scale"     "center"    "nstart"   
> plot(sui1.outRKM.1, cludesc = TRUE)
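
The summary reports that the variables were mean centered and standardized (`center = TRUE`, `scale = TRUE`). As an illustration only, the sketch below applies a z-score to two columns of the ten rows printed earlier. It assumes the usual standardization with the sample standard deviation, which is what base `scale()` computes; the internals of clustrd may differ in detail, and `cluspca()` standardizes over all 62 rows rather than these ten.

```r
# rate and aging for the first ten rows, copied from the table printed above
first10 <- data.frame(
  rate  = c(19.59, 19.77, 17.35, 20.16, 11.96, 18.35, 17.11, 16.95, 10.32, 11.20),
  aging = c(0.1761120, 0.1607417, 0.1754911, 0.1956889, 0.1909031,
            0.2352163, 0.2270851, 0.2108695, 0.2022644, 0.1988243)
)

# Subtract each column mean and divide by each column's sample sd
z <- scale(first10, center = TRUE, scale = TRUE)

# After standardization every column has mean 0 and sd 1,
# so variables on very different scales contribute comparably
print(round(colMeans(z), 10))
print(apply(z, 2, sd))
```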

RKM (no rotation)

> sui1.outRKM.2 <- cluspca(sui1.tokyo[, -1], nclus = 3, ndim = 2,
+                          method = "RKM",
+                          rotation = "none",
+                          scale = TRUE, nstart = 10, center = TRUE, seed = 1234)

> summary(sui1.outRKM.2)
Solution with 3 clusters of sizes 51 (82.3%), 8 (12.9%), 3 (4.8%) in 2 dimensions. Variables were mean centered and standardized.

Cluster centroids:
            Dim.1   Dim.2
Cluster 1  0.5523  0.1693
Cluster 2 -3.2806  0.4883
Cluster 3 -0.6414 -4.1806

Variable scores:
               Dim.1   Dim.2
rate          0.4262 -0.6293
industry1    -0.5706 -0.3006
aging        -0.1723 -0.6837
unemployment  0.6051 -0.1248
taxgain       0.3112  0.1749

Within cluster sum of squares by cluster:
[1] 27.8825 20.5320  3.7710
 (between_SS / total_SS =  75.25 %) 

Clustering vector:
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 2 2 2 2 2 3 2 2

Objective criterion value: 73.1532 

Available output:

 [1] "obscoord"  "attcoord"  "centroid"  "cluster"   "criterion"
 [6] "size"      "odata"     "scale"     "center"    "nstart"   
> plot(sui1.outRKM.2, cludesc = TRUE)
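
In the two RKM summaries, the cluster sizes, the clustering vector, the within-cluster sums of squares and the objective criterion value are identical; only the coordinates differ. That is what one would expect from an orthogonal rotation, which changes the axes but not the distances between points. The base-R sketch below illustrates this on synthetic data. The points in `scores` are hypothetical, and a fixed 30-degree rotation matrix stands in for a varimax rotation; it does not reproduce what `cluspca()` does internally.

```r
set.seed(1)

# Hypothetical 2-D object scores: three well-separated groups of 20 points
scores <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 4, sd = 0.3), ncol = 2),
  cbind(rnorm(20, mean = 0, sd = 0.3), rnorm(20, mean = 4, sd = 0.3))
)

# An orthogonal rotation matrix (30 degrees), standing in for varimax
theta <- pi / 6
R <- matrix(c(cos(theta), sin(theta), -sin(theta), cos(theta)), nrow = 2)
rotated <- scores %*% R

# Pairwise distances are unchanged by the rotation (up to rounding error)
max.diff <- max(abs(dist(scores) - dist(rotated)))

# k-means started from the same three objects in both coordinate systems
start <- c(1, 21, 41)
km.raw <- kmeans(scores, centers = scores[start, ])
km.rot <- kmeans(rotated, centers = rotated[start, ])

print(max.diff)
print(table(raw = km.raw$cluster, rotated = km.rot$cluster))
print(c(km.raw$tot.withinss, km.rot$tot.withinss))
```

Rotation is therefore an aid to interpreting the dimensions, not a way to obtain a different partition.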

FKM (varimax, shown again)

> sui1.outFKM.1 <- cluspca(sui1.tokyo[, -1], nclus = 3, ndim = 2,
+                          method = "FKM",
+                          rotation = "varimax",
+                          scale = TRUE, nstart = 10, center = TRUE, seed = 1234)

> summary(sui1.outFKM.1)
Solution with 3 clusters of sizes 30 (48.4%), 26 (41.9%), 6 (9.7%) in 2 dimensions. Variables were mean centered and standardized.

Cluster centroids:
           Dim.1   Dim.2
Cluster 1 0.5507 -0.2206
Cluster 2 0.5507 -0.2206
Cluster 3 0.5507 -0.2206

Variable scores:
               Dim.1   Dim.2
rate          0.5946 -0.1470
industry1     0.0169  0.6093
aging        -0.6889 -0.3545
unemployment -0.1755  0.6824
taxgain      -0.3753  0.1260

Within cluster sum of squares by cluster:
[1] 4.5353 9.8674 2.8766
 (between_SS / total_SS =  0 %) 

Clustering vector:
 [1] 2 2 2 2 2 2 2 1 1 1 1 3 2 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 2 2 2
[36] 2 2 2 2 2 1 1 1 2 1 1 1 2 1 1 3 2 3 3 1 3 1 3 2 2 2 2

Objective criterion value: 17.2792 

Available output:

 [1] "obscoord"  "attcoord"  "centroid"  "cluster"   "criterion"
 [6] "size"      "odata"     "scale"     "center"    "nstart"   
> plot(sui1.outFKM.1, cludesc = TRUE)

FKM (no rotation)

> sui1.outFKM.2 <- cluspca(sui1.tokyo[, -1], nclus = 3, ndim = 2,
+                          method = "FKM",
+                          rotation = "none",
+                          scale = TRUE, nstart = 10, center = TRUE, seed = 1234)

> summary(sui1.outFKM.2)
Solution with 3 clusters of sizes 30 (48.4%), 26 (41.9%), 6 (9.7%) in 2 dimensions. Variables were mean centered and standardized.

Cluster centroids:
            Dim.1   Dim.2
Cluster 1 -0.1670  0.2605
Cluster 2  0.5641 -0.1839
Cluster 3 -1.6091 -0.5054

Variable scores:
               Dim.1   Dim.2
rate          0.6030 -0.1076
industry1    -0.0233  0.6091
aging        -0.6640 -0.3991
unemployment -0.2200  0.6693
taxgain      -0.3828  0.1011

Within cluster sum of squares by cluster:
[1] 4.5353 9.8674 2.8766
 (between_SS / total_SS =  62.74 %) 

Clustering vector:
 [1] 2 2 2 2 2 2 2 1 1 1 1 3 2 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 2 2 2
[36] 2 2 2 2 2 1 1 1 2 1 1 1 2 1 1 3 2 3 3 1 3 1 3 2 2 2 2

Objective criterion value: 17.2792 

Available output:

 [1] "obscoord"  "attcoord"  "centroid"  "cluster"   "criterion"
 [6] "size"      "odata"     "scale"     "center"    "nstart"   
> plot(sui1.outFKM.2, cludesc = TRUE)
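
RKM and FKM partition the same 62 municipalities quite differently (sizes 51/8/3 versus 30/26/6). One simple way to see how the two solutions relate is a cross-tabulation of the memberships. The sketch below types in the two clustering vectors printed above so that it runs on its own; the cluster labels are arbitrary, so only the pattern of overlap is meaningful.

```r
# Cluster memberships copied from the "Clustering vector" output above
rkm <- c(rep(1, 51), 3, 3, 2, 2, 2, 2, 2, 2, 3, 2, 2)
fkm <- c(2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
         2, 2, 1, 2, 1, 1, 1, 2, 2, 2,
         2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 3, 2, 3, 3, 1, 3, 1, 3, 2, 2,
         2, 2)

# Rows: RKM cluster, columns: FKM cluster
xtab <- table(RKM = rkm, FKM = fkm)
print(xtab)

# Marginal totals reproduce the cluster sizes reported in each summary
print(rowSums(xtab))
print(colSums(xtab))
```

In the session itself the same table can be built from the fitted objects, for example `table(sui1.outRKM.2$cluster, sui1.outFKM.2$cluster)`, since `cluster` is among the available output components. The objective criterion values of the two methods (73.1532 and 17.2792) come from different loss functions and should not be compared directly.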