Dimension reduction + cluster analysis: suicide in Tokyo as an example

Loading the libraries

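> library(readr)    # read_csv()
> library(dplyr)    # select()
> library(clustrd)  # cluspca(): joint dimension reduction and clustering
> library(knitr)    # kable() for printing tables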

Loading and displaying the data

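> mori1 <- read_csv("mori1.csv")
> # Keep the municipality code, the suicide rate and the explanatory variables
> sui1 <- select(mori1, code, rate, industry1, aging:taxgain)
> # Tokyo municipalities: the first two digits of the 5-digit code are 13
> sui1.tokyo <- sui1[which(as.numeric(sui1$code) %/% 1000 == 13), ]
> kable(sui1.tokyo[1:10, ])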
|  code|  rate| industry1|     aging| unemployment|   taxgain|
|-----:|-----:|---------:|---------:|------------:|---------:|
| 13101| 19.59| 0.0003645| 0.1761120|    0.0181717| 0.1134598|
| 13102| 19.77| 0.0003888| 0.1607417|    0.0240528| 0.1948065|
| 13103| 17.35| 0.0006927| 0.1754911|    0.0273618| 0.5428776|
| 13104| 20.16| 0.0006748| 0.1956889|    0.0343030| 0.3382430|
| 13105| 11.96| 0.0006616| 0.1909031|    0.0272838| 0.2612840|
| 13106| 18.35| 0.0006166| 0.2352163|    0.0376425| 0.1590275|
| 13107| 17.11| 0.0007521| 0.2270851|    0.0371070| 0.1969722|
| 13108| 16.95| 0.0006992| 0.2108695|    0.0351113| 0.4161563|
| 13109| 10.32| 0.0009192| 0.2022644|    0.0329637| 0.3759405|
| 13110| 11.20| 0.0017310| 0.1988243|    0.0311272| 0.3609426|
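The `sui1.tokyo` filter depends on how Japanese local government codes are laid out. In the 5-digit form shown in the table, the first two digits identify the prefecture, and Tokyo is 13. Integer division by 1000 therefore recovers the prefecture. The following is a minimal base-R sketch on hypothetical codes that do not come from `mori1.csv`:

```r
# Hypothetical 5-digit municipality codes (illustration only, not from mori1.csv)
codes <- c(13101, 13102, 14100, 27100, 13201)
toy <- data.frame(code = codes)

# %/% 1000 drops the last three digits, leaving the 2-digit prefecture code
toy$pref <- as.numeric(toy$code) %/% 1000

# Keep Tokyo (prefecture 13); which() silently drops NA comparisons
toy_tokyo <- toy[which(toy$pref == 13), ]
print(toy_tokyo$code)
```

Using `which()` instead of a bare logical index means that a missing `code` is simply dropped. A bare logical index would instead add a row of `NA`s to a base data frame. If the codes were stored in the 6-digit form that includes a check digit, you would need `%/% 10000` instead.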

Tandem (varimax rotation, shown again)

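> # 3 clusters in 2 dimensions; column 1 (code) is dropped before the analysis
> # alpha: relative weight of the PCA and clustering parts of the objective (see ?cluspca)
> sui1.outTandem.1 <- cluspca(sui1.tokyo[, -1], 3, 2,
+                             alpha = 0.3,
+                             rotation = "varimax",
+                             scale = TRUE, nstart = 10, center = TRUE, seed = 1234)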

> summary(sui1.outTandem.1)
Solution with 3 clusters of sizes 39 (62.9%), 14 (22.6%), 9 (14.5%) in 2 dimensions. Variables were mean centered and standardized.

Cluster centroids:
           Dim.1   Dim.2
Cluster 1 0.6141  1.5246
Cluster 2 0.4016 -0.3240
Cluster 3 0.4016 -0.3240

Variable scores:
               Dim.1   Dim.2
rate          0.0687  0.0033
industry1    -0.5536 -0.0628
aging        -0.7811  0.0093
unemployment  0.2804 -0.0602
taxgain      -0.0109  0.9962

Within cluster sum of squares by cluster:
[1] 11.4352 10.5876  3.2486
 (between_SS / total_SS =  34.01 %) 

Clustering vector:
 [1] 1 1 2 2 1 1 1 2 2 2 2 2 2 1 2 1 1 1 2 2 2 1 2 2 1 1 1 1 1 1 1 1 1 1 1
[36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 1 3 1 1

Objective criterion value: 64.996 

Available output:

 [1] "obscoord"  "attcoord"  "centroid"  "cluster"   "criterion"
 [6] "size"      "odata"     "scale"     "center"    "nstart"   
> plot(sui1.outTandem.1, cludesc = TRUE)
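The classical tandem idea works in two stages: first run a PCA on the standardized variables, then run k-means on the retained component scores. The sketch below is a minimal base-R version with an optional varimax rotation of the 2-D solution. It assumes the built-in `USArrests` data as a stand-in, because `mori1.csv` is not available here. It is not clustrd's internal implementation, so its numbers will not match the `cluspca` output above. The between/total ratio printed at the end is the one `kmeans` computes in the 2-D score space.

```r
# Tandem sketch with base R only; USArrests is stand-in data (assumption)
data(USArrests)

# 1) Center, standardize and run PCA (mirrors center = TRUE, scale = TRUE)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]              # keep 2 dimensions, as with ndim = 2

# 2) Varimax-rotate the 2-D loadings and apply the same rotation to the scores
vm <- varimax(pca$rotation[, 1:2])
scores_vm <- scores %*% vm$rotmat

# 3) k-means with 3 clusters and 10 random starts on the rotated scores
set.seed(1234)
km <- kmeans(scores_vm, centers = 3, nstart = 10)

# Share of the total sum of squares (in score space) between clusters
ratio <- km$betweenss / km$totss
print(km$size)
print(round(100 * ratio, 2))
```

Because `vm$rotmat` is orthogonal, the rotated scores multiplied by the rotated loadings reproduce the same rank-2 approximation as the unrotated pair. Only the axes used to describe the solution change.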

Tandem (no rotation)

> # Same settings as above, but without rotating the 2-D solution
> sui1.outTandem.2 <- cluspca(sui1.tokyo[, -1], 3, 2,
+                             alpha = 0.3,
+                             rotation = "none",
+                             scale = TRUE, nstart = 10, center = TRUE, seed = 1234)

> summary(sui1.outTandem.2)
Solution with 3 clusters of sizes 39 (62.9%), 14 (22.6%), 9 (14.5%) in 2 dimensions. Variables were mean centered and standardized.

Cluster centroids:
            Dim.1   Dim.2
Cluster 1  0.2222 -0.4657
Cluster 2  1.2128  1.1094
Cluster 3 -2.8493  0.2924

Variable scores:
               Dim.1   Dim.2
rate          0.0634 -0.0267
industry1    -0.5264  0.1826
aging        -0.7003  0.3460
unemployment  0.2268 -0.1754
taxgain       0.4207  0.9030

Within cluster sum of squares by cluster:
[1] 11.4352 10.5876  3.2486
 (between_SS / total_SS =  82.85 %) 

Clustering vector:
 [1] 1 1 2 2 1 1 1 2 2 2 2 2 2 1 2 1 1 1 2 2 2 1 2 2 1 1 1 1 1 1 1 1 1 1 1
[36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 3 3 3 3 3 3 1 3 1 1

Objective criterion value: 64.996 

Available output:

 [1] "obscoord"  "attcoord"  "centroid"  "cluster"   "criterion"
 [6] "size"      "odata"     "scale"     "center"    "nstart"   
> plot(sui1.outTandem.2, cludesc = TRUE)
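The two runs share the same cluster sizes, clustering vector, within-cluster sums of squares and objective value. Only the centroid coordinates and variable scores differ. This would follow if the varimax step is an orthogonal rotation of the low-dimensional space (an assumption about `cluspca`). An orthogonal rotation leaves all pairwise Euclidean distances unchanged, so a distance-based method such as k-means can find the same partition either way. The base-R sketch below checks this on the stand-in `USArrests` data (an assumption, not the suicide data):

```r
# Rotation-invariance check: k-means on unrotated vs. varimax-rotated scores
data(USArrests)
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]
R <- varimax(pca$rotation[, 1:2])$rotmat   # orthogonal 2 x 2 rotation matrix
scores_rot <- scores %*% R

# Pairwise distances are (numerically) unchanged by the rotation
max_diff <- max(abs(dist(scores) - dist(scores_rot)))

# Same seed -> same random starting rows in both runs
set.seed(1234); km_none <- kmeans(scores,     centers = 3, nstart = 10)
set.seed(1234); km_rot  <- kmeans(scores_rot, centers = 3, nstart = 10)

print(max_diff)
print(table(none = km_none$cluster, rotated = km_rot$cluster))
```

When the cross-table has exactly one non-zero cell per row, the two partitions are identical up to cluster labels. Only the coordinates describing the clusters depend on the rotation.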