20190910

Dimension reduction + cluster analysis: suicide in Tokyo as an example

Loading the libraries

> library(readr)
> library(dplyr)
> library(clustrd)
> library(knitr)

Loading and displaying the data

> mori1 <- read_csv("mori1.csv")
> sui1 <- select(mori1, code, rate, industry1, aging:taxgain)
> sui1.tokyo <- sui1[which(as.numeric(sui1$code) %/% 1000 == 13),]
> kable(sui1.tokyo[1:10,])

code	rate	industry1	aging	unemployment	taxgain
13101	19.59	0.0003645	0.1761120	0.0181717	0.1134598
13102	19.77	0.0003888	0.1607417	0.0240528	0.1948065
13103	17.35	0.0006927	0.1754911	0.0273618	0.5428776
13104	20.16	0.0006748	0.1956889	0.0343030	0.3382430
13105	11.96	0.0006616	0.1909031	0.0272838	0.2612840
13106	18.35	0.0006166	0.2352163	0.0376425	0.1590275
13107	17.11	0.0007521	0.2270851	0.0371070	0.1969722
13108	16.95	0.0006992	0.2108695	0.0351113	0.4161563
13109	10.32	0.0009192	0.2022644	0.0329637	0.3759405
13110	11.20	0.0017310	0.1988243	0.0311272	0.3609426
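
The Tokyo subset above is selected by integer-dividing the five-digit municipality code by 1000, which leaves the two-digit prefecture code (13 for Tokyo). A minimal sketch of that arithmetic follows; the `codes` vector is made up for illustration and is not taken from mori1.csv.

```r
# Hypothetical municipality codes, stored as strings as they might be in a CSV
codes <- c("01100", "13101", "13421", "14100")

# Integer division by 1000 drops the last three digits,
# leaving the prefecture part of the code
pref <- as.numeric(codes) %/% 1000

# Keep only the codes whose prefecture part is 13 (Tokyo)
tokyo <- codes[pref == 13]
tokyo
```

A logical index (`pref == 13`) is enough here; wrapping it in `which()` as in the session above gives the same rows as long as no code is missing.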

RKM (reduced k-means)

> sui1.outRKM <-  cluspca(sui1.tokyo[,-1], 3, 2, 
+                         method = "RKM", rotation = "varimax", 
+                         scale = TRUE, nstart = 10)


> summary(sui1.outRKM)

Solution with 3 clusters of sizes 51 (82.3%), 8 (12.9%), 3 (4.8%) in 2 dimensions. Variables were mean centered and standardized.

Cluster centroids:
           Dim.1   Dim.2
Cluster 1 0.5672 -0.1095
Cluster 2 0.5672 -0.1095
Cluster 3 0.5672 -0.1095

Variable scores:
               Dim.1   Dim.2
rate          0.0812 -0.7557
industry1    -0.6449  0.0022
aging        -0.4729 -0.5230
unemployment  0.4759 -0.3941
taxgain       0.3569  0.0085

Within cluster sum of squares by cluster:
[1] 27.8825 20.5320  3.7710
 (between_SS / total_SS =  0 %) 

Clustering vector:
 [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
[36] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 3 2 2 2 2 2 2 3 2 2

Objective criterion value: 73.1532 

Available output:

 [1] "obscoord"  "attcoord"  "centroid"  "cluster"   "criterion"
 [6] "size"      "odata"     "scale"     "center"    "nstart"

> plot(sui1.outRKM, cludesc = TRUE)

FKM (factorial k-means)

> sui2.outRKM <- cluspca(sui1.tokyo[,-1], 3, 2, 
+                        method = "FKM", rotation = "varimax",
+                        scale = TRUE, smartStart = sui1.outRKM$cluster)


> sui2.outRKM

Solution with 3 clusters of sizes 51 (82.3%), 9 (14.5%), 2 (3.2%) in 2 dimensions. Variables were mean centered and standardized.

Cluster centroids:
            Dim.1  Dim.2
Cluster 1 -0.2936 0.1021
Cluster 2 -0.2936 0.1021
Cluster 3 -0.2936 0.1021

Variable scores:
               Dim.1   Dim.2
rate          0.1031  0.9589
industry1    -0.2011  0.2404
aging         0.5048 -0.1089
unemployment -0.8277 -0.0172
taxgain      -0.0953  0.1030

Within cluster sum of squares by cluster:
[1] 15.8906  4.9855  0.9112
 (between_SS / total_SS =  0 %) 

Objective criterion value: 21.7873 

Available output:

 [1] "obscoord"  "attcoord"  "centroid"  "cluster"   "criterion"
 [6] "size"      "odata"     "scale"     "center"    "nstart"

> plot(sui2.outRKM, cludesc = TRUE)

Tandem

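> # Note: according to the clustrd documentation, alpha = 1 gives the tandem
> # approach, alpha = 0.5 gives RKM and alpha = 0 gives FKM; alpha = 0.3 as
> # used here therefore lies between FKM and RKM.
> sui1.outTandem <- cluspca(sui1.tokyo[,-1], 3, 2, alpha = 0.3, scale = TRUE)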


> plot(sui1.outTandem, cludesc = TRUE)
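
For comparison, the tandem approach is usually described as two separate steps: PCA first, then k-means on the retained component scores. The sketch below shows the idea with base R only (`prcomp` and `kmeans`). It runs on synthetic data, not on mori1.csv, and it is not what `cluspca` does internally; the group shifts and the seed are arbitrary choices for illustration.

```r
set.seed(1)

# Synthetic data: 60 observations, 5 variables, with two groups shifted
# away from the rest (illustrative only)
x <- matrix(rnorm(60 * 5), ncol = 5)
x[21:40, 1:2] <- x[21:40, 1:2] + 4
x[41:60, 3:4] <- x[41:60, 3:4] - 4

# Step 1: PCA on centered and standardized variables, keep 2 components
pca <- prcomp(x, center = TRUE, scale. = TRUE)
scores <- pca$x[, 1:2]

# Step 2: k-means with 3 clusters on the component scores
km <- kmeans(scores, centers = 3, nstart = 10)

# Cluster sizes
table(km$cluster)
```

Because the two steps are fitted separately, the components are chosen without regard to the cluster structure, which is the motivation for joint methods such as RKM and FKM.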